[Pw_forum] Problems with espresso-4.0 running example01

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Mon Jul 14 22:26:44 CEST 2008


On Mon, 14 Jul 2008, Amos Leffler wrote:

AL> Dear Forum,

dear amos,

i don't use QE much for production runs, so i tend to compile and
test mostly the cvs version, but to prove that this is a PEBCAC-type 
of problem, i just downloaded the current 4.0.1 version, ran ./configure
(QE picks up intel 9.1.040 as default fortran 90 compiler, with 
OpenMPI, and MKL 10.0.1.014. on an x86_64 machine running fedora 6) 
and it compiled fine without further modifications of make.sys.

i then changed the directory to examples/example01 and did a 
./run_example and it worked without a flaw. please note, that the 
compiled executable is a parallel one, but with OpenMPI, starting 
a parallel executable without mpirun is the same as 'mpirun -np 1'. 
not all MPI libraries allow this and for those the file 
examples/environment_variables must be modified (see below).

as a next step i told OpenMPI and QE to use g95 instead of ifort 
(export OMPI_FC=g95; export OMPI_F77=g95 ; export F77=g95 ; export 
F90=g95) and ran configure again and compiled.  configure
again picked up MKL as BLAS/LAPACK and the compile went through.

now, i am getting a:
"error while loading shared libraries: libguide.so: cannot open shared 
object files"

but this is _deservedly_ so, since i have a _non-standard_ mkl 
installation, since i want to be able to use multiple versions of 
it at the same time without much hassle (for benchmarks and tests),
and thus i don't set the LD_LIBRARY_PATH environment, but usually
modify my makefiles to encode the search path explicitly into the 
executable upon linking. since here i didn't do this, i get this
failure, but running with:

env LD_LIBRARY_PATH=/path/to/mkl/lib/dir ./run_example

example01 completes just fine.

as a final test i reset my environment to replace intel 9.1 with intel 
10.1.015. fresh compilation and again example01 runs just fine without
any modification.

AL>         On June 13 Todd Beaudet reported trying to run example01 in 
AL> espresso-4.0 using intel compiler 10.1.011, mkl 10.0.1.014 and 
AL> reported the output file that stopped with the K_POINTS for Si even 
AL> though a number of other materials were to be calculated.  Some 
AL> suggestions were made including the compiler being at fault.

i noticed two issues: intel 10.1 with patchlevels lower than 014 
miscompiles parts of all plane wave codes that i've tested recently.
i also saw that for a while in the cvs there was a bug with the 
parallel davidson that would crash pw.x after initialization when
you were running the parallel executable with only one mpi task.
the latter has since been resolved, and for the former there are
newer patchlevels.

AL> I ran the sample example using the g95 compiler and got the exact 
AL> same results showing it is not the compiler.  On June 17 Paolo G. 
AL> suggested modifying the ./configure command to disable the parallel.  
AL> I tried this, modifying the compiler, but it didn't work either.  
AL> Since then I have seen nothing in the Forum on this problem.

AL>         If the espresso-4.0 example01 is replaced with the 
AL> espresso-3.2 example 01 the output shown below results. Note 

the run_example scripts depend on scripts in the upper level
directories. you have to use the full tree.

if you want to test a parallel compile you usually also
_have_ to edit the PARA_PREFIX part of examples/environment_variables
(and PARA_POSTFIX in a few cases). it is important to use quotation
marks here.
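
a rough sketch of what such an edit might look like. the task count
of 4 and the BIN_DIR path are placeholders, not values from any real
setup, and the way the prefix is glued in front of pw.x is only
assumed from the "running pw.x as:" line that run_example prints:

```shell
# Assumed shape of the relevant lines in examples/environment_variables.
# Without the quotes, the shell would treat "-np" as a command to run.
PARA_PREFIX="mpirun -np 4"   # launcher plus its arguments, kept as one value
PARA_POSTFIX=""              # trailing launcher arguments, usually empty

# Placeholder path (hypothetical); the real scripts derive this themselves.
BIN_DIR="/path/to/espresso/bin"

# Assumed composition of the command line from prefix, executable, postfix.
PW_COMMAND="$PARA_PREFIX $BIN_DIR/pw.x $PARA_POSTFIX"
echo "running pw.x as: $PW_COMMAND"
```

with an empty PARA_PREFIX the same composition leaves just the path to
pw.x, which is what a serial run (or OpenMPI started without mpirun)
would use.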

AL> particularly the lines 82 and 128 where the location of the 
AL> si.scf.in files was changed but permission was denied.  Below that 
AL> the original file location was used and si.band.david.out was used 
AL> and it failed also.  In the espresso-4.0 version the diagonalization 
AL> is set to "david" but I don't know if the example was set up to run 
AL> in parallel but I could not find "david" anywhere and reset the 
AL> location of si.scf.in to the locations shown below in the output.

what about: the line "for diago in david cg; do" ?
this loop tests two different diagonalizers from the
same section of the shell script.
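
a heavily abbreviated sketch of that loop pattern, not the actual
run_example script: the input is cut down to the one namelist entry
that changes, the file names merely follow the si.band.david.out
naming seen in the output, and the scratch directory is a
hypothetical stand-in.

```shell
# Scratch directory for the generated inputs (hypothetical stand-in).
workdir=$(mktemp -d)

# One pass per diagonalizer; the loop variable is substituted into the input.
for diago in david cg; do
    # Unquoted EOF so that $diago is expanded inside the here-document.
    cat > "$workdir/si.band.$diago.in" << EOF
 &electrons
    diagonalization='$diago'
 /
EOF
    # The real script would launch pw.x here; this sketch only reports it.
    echo "would run: pw.x < si.band.$diago.in > si.band.$diago.out"
done
```

so "david" never appears as a fixed string in an input file on disk
before the script runs. it only shows up as one of the values the
loop variable takes.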

AL>         One further note.  If the original example01 is run 
AL> libguide.so is mentioned and cant seem to share libraries.  Also the 

come again? do you mean the error i mentioned above? that is 
an indication of an incorrect installation or usage of MKL. 
there are several ways to alleviate this. setting LD_LIBRARY_PATH
accordingly is one, passing an appropriate -rpath flag
to the linker a second and linking MKL statically a third.
this is not the fault of QE. QE _has_ to depend on a correct
setup of a machine. ...and even then an occasional tweak to
make.sys is needed to correct where configure guesses wrong
(there are far too many combinations of 
linux/intel/mkl/g95/pgi/atlas/whatever installations around
to get it right all the time).
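
before picking one of those three remedies it can help to see which
libraries the loader fails to resolve. a minimal sketch using ldd,
where missing_libs is a hypothetical helper name and /bin/sh is only
a convenient binary to try it on:

```shell
# Print the names of shared libraries the dynamic loader cannot resolve
# for the given binary, one per line; print nothing if all are found.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# Try it on the shell itself, which normally resolves completely.
missing_libs /bin/sh
```

pointed at bin/pw.x on a machine with the problem described above,
this would be expected to list libguide.so. for the -rpath variant,
compiler drivers usually pass the option through to the linker as
-Wl,-rpath,/path/to/mkl/lib/dir. where exactly that goes in make.sys
depends on the installation.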

AL> setup did not include any MPI files.

???

AL>         Hopefully there is a simple resolution to this problem.

the (simple) resolution is, that there is no problem
that has to be resolved on the side of QE. if you have 
a machine that is properly installed, it should work 
just fine.

AL> Amos Leffler
AL> unaffiliated

AL> Script started on Mon 14 Jul 2008 11:11:39 AM PDT

AL> ]2;amos at leffler2:...examples/example01]1;leffler2amos at leffler2:~/Desktop/espresso-4.0/examples/example01> AL> ]./run_example

where do the escape sequences come from? do you have some sort
of "prompt hack" going (e.g. to update the xterminal title text
or colorized prompt) that produces output in a script environment
(i.e. it is set regardless of whether you are using /bin/sh or
/bin/bash) ?
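
if such a hack is something you control in your shell startup files,
one way to keep it out of scripted output is to emit the sequence
only when stdout is a terminal. a minimal sketch, where set_title is
a hypothetical function name and the title text is arbitrary:

```shell
# Emit the xterm title sequence (ESC ] 2 ; text BEL) only when stdout
# is a terminal, so captured output stays free of escape sequences.
set_title() {
    if [ -t 1 ]; then
        printf '\033]2;%s\007' "$1"
    fi
}

# Inside a command substitution stdout is a pipe, so nothing is emitted.
captured=$(set_title "example01")
echo "captured ${#captured} characters"
```

this is only a guess at the kind of setup involved. the point is that
anything a prompt or cd hook prints can end up inside a value that a
script captures.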


AL> /home/amos/Desktop/espresso-4.0/examples/example01 : starting
AL> 
AL> This example shows how to use pw.x to calculate the total energy and
AL> the band structure of four simple systems: Si, Al, Cu, Ni.
AL> 
AL>   executables directory: /home/amos/Desktop/espresso-4.0/bin
AL>   pseudo directory:      /home/amos/Desktop/espresso-4.0/pseudo
AL>   temporary directory:   /home/amos/tmp
AL>   checking that needed directories and files exist... done
AL> 
AL>   running pw.x as:  /home/amos/Desktop/espresso-4.0/bin/pw.x 
AL>   running bands.x as:  /home/amos/Desktop/espresso-4.0/bin/bands.x 
AL> 
AL>   cleaning /home/amos/tmp... done
AL> ./run_example: line 82: /home/amos/Desktop/espresso-4.0/GUI/PWgui/examples/pw/si.scf.in: Permission denied

where does this path come from? something is messed up that irritates
the shell script. it could be related to your shell prompt setup. 
please try editing examples/environment_variables to give the value 
of PREFIX explicitly. the current code changes the directory in a
subshell and if you have a "prompt hack" active, the value of PREFIX
could be messed up. of course, i'm assuming that you are running the
./run_example from 4.0.1 without any modifications...
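
to illustrate how this could happen: noisy_cd below is a hypothetical
stand-in for a cd that has been wrapped by a prompt hack, and the
explicit PREFIX path at the end is a placeholder. the exact line in
examples/environment_variables is not reproduced here.

```shell
# Hypothetical stand-in for a prompt hack: a cd wrapper that prints text.
noisy_cd() {
    command cd "$@" && printf 'TITLE:%s\n' "$PWD"
}

# Capturing a directory from a subshell picks up the wrapper's output too.
polluted=$(noisy_cd / ; pwd)

# The plain cd prints nothing, so only the output of pwd is captured.
clean=$(command cd / ; pwd)

printf 'polluted: [%s]\n' "$polluted"
printf 'clean:    [%s]\n' "$clean"

# Workaround: give the value explicitly instead of computing it
# (placeholder path, adjust to where the tree was unpacked).
PREFIX=/path/to/espresso-4.0.1
```

with the value given explicitly, nothing that cd or the prompt prints
can leak into the paths the example scripts build from PREFIX.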

[...]

cheers,
   axel.

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.

