[Pw_forum] Problems with espresso-4.0 running example01
Axel Kohlmeyer
akohlmey at cmm.chem.upenn.edu
Mon Jul 14 22:26:44 CEST 2008
On Mon, 14 Jul 2008, Amos Leffler wrote:
AL> Dear Forum,
dear amos,
i don't use QE much for production runs, so i tend to compile and
test mostly the cvs version, but to prove that this is a PEBCAC-type
of problem, i just downloaded the current 4.0.1 version, ran ./configure
(QE picks up intel 9.1.040 as the default fortran 90 compiler, with
OpenMPI and MKL 10.0.1.014, on an x86_64 machine running fedora 6)
and compiled fine without further modifications of make.sys.
i then changed the directory to examples/example01 and did a
./run_example and it worked without a flaw. please note, that the
compiled executable is a parallel one, but with OpenMPI, starting
a parallel executable without mpirun is the same as 'mpirun -np 1'.
not all MPI libraries allow this and for those the file
examples/environment_variables must be modified (see below).
as a next step i told OpenMPI and QE to use g95 instead of ifort
(export OMPI_FC=g95; export OMPI_F77=g95 ; export F77=g95 ; export
F90=g95) and ran configure again and compiled. configure
again picked up MKL as BLAS/LAPACK and the compile went through.
now, i am getting a:
"error while loading shared libraries: libguide.so: cannot open shared
object files"
but this is _deservedly_ so, since i have a _non-standard_ mkl
installation, since i want to be able to use multiple versions of
it at the same time without much hassle (for benchmarks and tests),
and thus i don't set the LD_LIBRARY_PATH environment, but usually
modify my makefiles to encode the search path explicitly into the
executable upon linking. since here i didn't do this, i get this
failure, but running with:
env LD_LIBRARY_PATH=/path/to/mkl/lib/dir ./run_example
example01 completes just fine.
as a final test i reset my environment to replace intel 9.1 with intel
10.1.015. fresh compilation and again example01 runs just fine without
any modification.
AL> On June 13 Todd Beaudet reported trying to run example01 in
AL> espresso-4.0 using intel compiler 10.1.011, mkl 10.0.1.014 and
AL> reported the output file that stopped with the K_POINTS for Si even
AL> though a number of other materials were to be calculated. Some
AL> suggestions were made including the compiler being at fault.
i noticed two issues: intel 10.1 with patchlevels lower than 014
miscompiles parts of all plane wave codes that i've tested recently.
i also saw that for a while in the cvs there was a bug with the
parallel davidson that would crash pw.x after initialization when
you were running the parallel executable with only one mpi task.
the latter has since been resolved and for the former there are
newer patchlevels.
AL> I ran the sample example using the g95 compiler and got the exact
AL> same results showing it is not the compiler. On June 17 Paolo G.
AL> suggested modifying the ./configure command to disable the parallel.
AL> I tried this, modifying the compiler, but it didn't work either.
AL> Since then I have seen nothing in the Forum on this problem.
AL> If the espresso-4.0 example01 is replaced with the
AL> espresso-3.2 example 01 the output shown below results. Note
the run_example scripts depend on scripts in the upper levels of
the tree. you have to use the full tree.
if you want to test a parallel compile you usually also
_have_ to edit the PARA_PREFIX part (and PARA_POSTFIX in
a few cases). it is important to use quotation marks here.
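a minimal sketch of what such an edit to examples/environment_variables could look like. the launcher "mpirun -np 2" and the pw.x path are placeholders, and the way the command line is assembled is an assumption about what run_example does, so check it against your own copy:

```shell
# sketch of the relevant lines in examples/environment_variables;
# "mpirun -np 2" and the pw.x path are placeholders, not values from this thread
PARA_PREFIX="mpirun -np 2"   # quotes keep the launcher and its flags in one value
PARA_POSTFIX=""              # only needed for launchers that want trailing arguments

# assumed to mirror how run_example assembles the command line
PW_COMMAND="$PARA_PREFIX /path/to/espresso/bin/pw.x $PARA_POSTFIX"
echo "running pw.x as: $PW_COMMAND"
```

without the quotation marks the shell would read `PARA_PREFIX=mpirun -np 2` as a temporary assignment followed by an attempt to run a command called `-np`.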
AL> particularly the lines 82 and 128 where the location of the
AL> si.scf.in files was changed but permission was denied. Below that
AL> the original file location was used and si.band.david.out was used
AL> and it failed also. In the espresso-4.0 version the diagonalization
AL> is set to "david" but I don't know if the example was set up to run
AL> in parallel but I could not find "david" anywhere and reset the
AL> location of si.scf.in to the locations shown below in the output.
what about: the line "for diago in david cg; do" ?
this input tests two different diagonalizers from
the same section of the shell script.
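the pattern looks roughly like the sketch below: the loop variable is substituted into the here-document, so "david" shows up in the generated input files rather than as a literal string in the script. the input is cut down to a single namelist for illustration (the real inputs are much longer), and the scratch directory is only there to keep the sketch self-contained:

```shell
workdir=$(mktemp -d)          # scratch directory so nothing in the tree is touched
cd "$workdir" || exit 1

for diago in david cg; do
    # only the namelist entry that differs between the two runs is shown
    cat > si.scf.$diago.in << EOF
 &electrons
    diagonalization='$diago'
 /
EOF
    echo "wrote si.scf.$diago.in"
done
```

this produces si.scf.david.in and si.scf.cg.in, one per diagonalizer.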
AL> One further note. If the original example01 is run
AL> libguide.so is mentioned and cant seem to share libraries. Also the
come again? do you mean the error i mentioned above? that is
an indication of an incorrect installation or usage of MKL.
there are several ways to alleviate this. setting LD_LIBRARY_PATH
accordingly is one, passing an appropriate -rpath flag
to the linker a second and linking MKL statically a third.
this is not the fault of QE. QE _has_ to depend on a correct
setup of a machine. ...and even then an occasional tweak to
make.sys is needed to correct where configure guesses wrong
(there are far too many combinations of
linux/intel/mkl/g95/pgi/atlas/whatever installations around
to get it right all the time).
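as a sketch, the first two options could look like this. /path/to/mkl/lib/dir is the same placeholder as above, and RPATH_FLAGS is a hypothetical variable name used only to show the flags; where exactly they go in make.sys depends on what configure generated on your machine:

```shell
MKL_LIB_DIR=/path/to/mkl/lib/dir   # placeholder, use your actual MKL library directory

# option 1: extend the runtime search path (prepend, keep whatever was already set)
export LD_LIBRARY_PATH="$MKL_LIB_DIR${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# option 2: flags that encode the search path into the executable at link time;
# these would be added by hand to the link flags in make.sys
RPATH_FLAGS="-L$MKL_LIB_DIR -Wl,-rpath,$MKL_LIB_DIR"
echo "add to the link flags: $RPATH_FLAGS"

# to check the result on a real build (not run here):
#   ldd /path/to/espresso/bin/pw.x | grep 'not found'
```

option 1 has to be repeated in every shell or job script, while option 2 only has to be done once per build.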
AL> setup did not include any MPI files.
???
AL> Hopefully there is a simple resolution to this problem.
the (simple) resolution is, that there is no problem
that has to be resolved on the side of QE. if you have
a machine that is properly installed, it should work
just fine.
AL> Amos Leffler
AL> unaffiliated
AL> Script started on Mon 14 Jul 2008 11:11:39 AM PDT
AL> ]2;amos at leffler2:...examples/example01]1;leffler2amos at leffler2:~/Desktop/espresso-4.0/examples/example01> AL> ]./run_example
where do the escape sequences come from? do you have some sort
of "prompt hack" going (e.g. to update the xterminal title text
or colorized prompt) that produces output in a script environment
(i.e. it is set regardless of whether you are using /bin/sh or
/bin/bash) ?
AL> /home/amos/Desktop/espresso-4.0/examples/example01 : starting
AL>
AL> This example shows how to use pw.x to calculate the total energy and
AL> the band structure of four simple systems: Si, Al, Cu, Ni.
AL>
AL> executables directory: /home/amos/Desktop/espresso-4.0/bin
AL> pseudo directory: /home/amos/Desktop/espresso-4.0/pseudo
AL> temporary directory: /home/amos/tmp
AL> checking that needed directories and files exist... done
AL>
AL> running pw.x as: /home/amos/Desktop/espresso-4.0/bin/pw.x
AL> running bands.x as: /home/amos/Desktop/espresso-4.0/bin/bands.x
AL>
AL> cleaning /home/amos/tmp... done
AL> ./run_example: line 82: /home/amos/Desktop/espresso-4.0/GUI/PWgui/examples/pw/si.scf.in: Permission denied
where does this path come from? something is messed up that confuses
the shell script. it could be related to your shell prompt setup.
please try editing examples/environment_variables to give the value
of PREFIX explicitly. the current code changes the directory in a
subshell and if you have a "prompt hack" active, the value of PREFIX
could be messed up. of course, i'm assuming that you are running the
./run_example from 4.0.1 without any modifications...
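a sketch of what giving PREFIX explicitly could look like in examples/environment_variables. the install path is a placeholder, and the BIN_DIR and PSEUDO_DIR lines are assumed to follow the layout printed in the output above:

```shell
# placeholder path: replace with the absolute path of your unpacked tree
PREFIX=/path/to/espresso-4.0.1

# assumed to be derived from PREFIX as in the stock file
BIN_DIR=$PREFIX/bin
PSEUDO_DIR=$PREFIX/pseudo

echo "executables directory: $BIN_DIR"
echo "pseudo directory:      $PSEUDO_DIR"
```

with a literal value nothing printed by a subshell can leak into PREFIX.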
[...]
cheers,
axel.
--
=======================================================================
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.