[Pw_forum] MPI work via LSF
Paolo Giannozzi
giannozz at nest.sns.it
Wed Apr 2 21:19:35 CEST 2008
On Apr 2, 2008, at 16:19 , Charles Chen wrote:
> I just followed the directions from the ITS service here and
> submitted the job through LSF, e.g.
>
> bsub -q $queue -m $machine -n 4 -o $job.out -r $job.err -a mpichp4
> mpirun -np 4 $BIN_DIR/pw.x -in $job.in
>
> I have no clue what's wrong when LSF detects overload and suspends
> my job, and the administrator sends me a warning message.
Ask your system manager what is wrong and what they are complaining
about ("overload" of what? CPU? disk I/O?). This is not a
quantum-espresso problem: it is related to your specific software and
hardware installation. It is the responsibility of your computing
center to provide users with the information needed to run a job. All
we can do is provide a code that works and some general information on
how to run it (e.g. required communication hardware and software, how
to avoid a large I/O load). In particular, there is a section "running
on parallel machine" in the user guide: did you read it? If you want
to do serious calculations, you have to know the machine you are
running on.
> Then, any tip to figure out what's wrong? the -in option works
> quite well with other codes.
In the data file: verify that the terminator is / and not \ or &end.
Check for spurious control characters in the input file. Leave a space
before &inputgipaw.
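
The data-file checks above can be automated. What follows is only a
rough sketch, not part of quantum-espresso: the function name
check_namelist_text and its messages are made up for illustration, and
it assumes a simple layout in which each namelist starts on a line
beginning with &name and ends with a / that is outside any quotes.

```python
import re

# Quoted strings are removed before looking for terminators, so that a
# value such as outdir='/tmp/' is not mistaken for a closing "/".
QUOTED = re.compile(r"'[^']*'|\"[^\"]*\"")


def check_namelist_text(text):
    """Return a list of (line_number, message) for suspicious lines.

    Hypothetical helper: it only covers the three checks named above
    (terminator, control characters, space before the group name).
    """
    problems = []
    open_group = None   # name of the namelist currently open, if any
    open_line = 0       # line on which that namelist was opened
    # Split on "\n" only, so that a stray carriage return stays visible.
    for number, line in enumerate(text.split("\n"), start=1):
        for ch in line:
            if (ord(ch) < 32 and ch != "\t") or ord(ch) == 127:
                problems.append(
                    (number, "control character 0x%02x" % ord(ch)))
        bare = QUOTED.sub("", line).strip()
        if bare.startswith("&"):
            parts = bare[1:].split()
            name = parts[0].lower().rstrip("/") if parts else ""
            if name == "end":
                problems.append(
                    (number, "&end used as terminator, expected /"))
                open_group = None
                continue
            if open_group is not None:
                problems.append(
                    (number, "&%s not closed before &%s"
                     % (open_group, name)))
            if line.startswith("&"):
                problems.append((number, "no space before &%s" % name))
            open_group, open_line = name, number
        if bare.endswith("\\"):
            problems.append((number, "backslash where / was expected"))
        elif bare.endswith("/") and open_group is not None:
            open_group = None
    if open_group is not None:
        problems.append((open_line, "&%s never closed by /" % open_group))
    return problems
```

To try it on a real input file, read the file with
open(name, newline="").read() so that Python does not silently convert
DOS line endings, then print whatever the function returns. An empty
list only means that none of these three problems was found.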
In the code: locate the part of the gipaw code that reads the namelist
and check what every line does. Compare a case that works with one
that doesn't. Initialize variable ios to 0. Ignore the error message
and check what has been read. Extract a small program that just reads,
and experiment with it. Try a different compiler. Try serial execution
with parallel compilation, then serial compilation.
Paolo
---
Paolo Giannozzi, Dept of Physics, University of Udine
via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222