[Pw_forum] "MPI_COMM_RANK : Null communicator..." error through Platform LSF system
Axel Kohlmeyer
akohlmey at cmm.chem.upenn.edu
Wed Apr 9 12:57:11 CEST 2008
On Wed, 9 Apr 2008, wangxinquan at tju.edu.cn wrote:
WX> Dear users and developers,
WX>
WX> Recently I ran a test on Nankai Stars HPC. The error message
WX> "MPI_COMM_RANK : Null communicator ... Aborting program !" appeared when I ran
WX> an scf calculation on 2 CPUs (2 nodes).
this is not a quantum espresso problem, but a problem of your
local machine. please contact your system administrator to
investigate. you may want to try a simpler, "hello world"-style
MPI program first to verify this.
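
a minimal sketch of such a test, assuming the same mpicc wrapper that was
used to build pw.x is on the path (the file name hello_mpi.c is arbitrary).
compile it with "mpicc hello_mpi.c -o hello_mpi" and launch it through
mpirun.lsf from the same submit script in place of pw.x. if it also aborts
with the null communicator message, the fault lies in the MPI/LSF setup
and not in pw.x. note that pw.x is a Fortran code, so an equivalent test
built with mpif90 would exercise the Fortran wrapper more directly.

```c
/* hello_mpi.c: minimal MPI sanity check (illustrative, not part of espresso) */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank = -1, size = -1, len = 0;
    char host[MPI_MAX_PROCESSOR_NAME];

    /* Start the MPI runtime; stop early if initialization fails. */
    if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Init failed\n");
        return 1;
    }

    /* This is the call that fails in the report above. */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);

    /* Each process reports its rank, the total count, and its host. */
    printf("hello from rank %d of %d on %s\n", rank, size, host);

    MPI_Finalize();
    return 0;
}
```

with "#BSUB -n 2" a working installation should print one line per process,
each with a different rank.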
cheers,
axel.
WX>
WX> To solve this problem, I have found some hints from google, such as "please
WX> make sure that you used the same version of MPI for compiling and running, and
WX> included the corresponding header file mpi.h in your code."
WX> (http://www.ncsa.edu/UserInfo/Resources/Hardware/XeonCluster/FAQ/XeonJobs.html)
WX>
WX> According to the pwscf mailing list, "dynamic port number used in mpi
WX> intercommunication is not working. This is most probably an installation issue
WX> regarding LSF." may be the problem.
WX> (http://www.democritos.it/pipermail/pw_forum/2007-June/006689.html)
WX>
WX> According to the pwscf manual, "Your machine might be configured so as to
WX> disallow interactive execution" may be another cause.
WX>
WX> My question is:
WX> To solve the "MPI_COMM_RANK ..." problem, do I need to modify the pwscf code,
WX> the mpich_gm code or the LSF system?
WX>
WX> Calculation Details are as follows:
WX> ---------------------------------------------------------------------------------
WX> HPC background:
WX> Nankai Stars (http://202.113.29.200/introduce.htm)
WX> 800 Xeon 3.06 GHz CPU (400 nodes)
WX> 800 GB Memory
WX> 53T High-Speed Storage
WX> Myrinet
WX> Parallel jobs are run and debugged through Platform LSF system.
WX> Mpich_gm driver:1.2.6..13a
WX> Espresso-3.2.3
WX> ---------------------------------------------------------------------------------
WX>
WX> ---------------------------------------------------------------------------------
WX> Installation:
WX> ./configure CC=mpicc F77=mpif77 F90=mpif90
WX> make all
WX> ---------------------------------------------------------------------------------
WX>
WX> ---------------------------------------------------------------------------------
WX> Submit script :
WX> #!/bin/bash
WX> #BSUB -q normal
WX> #BSUB -J test.icymoon
WX> #BSUB -c 3:00
WX> #BSUB -a "mpich_gm"
WX> #BSUB -o %J.log
WX> #BSUB -n 2
WX>
WX> cd /nfs/s04r2p1/wangxq_tj
WX> echo "test icymoon"
WX>
WX> mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x < \
WX>     /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out
WX>
WX> echo "test icymoon end"
WX> ---------------------------------------------------------------------------------
WX>
WX> ---------------------------------------------------------------------------------
WX> Output file (%J.log):
WX>
WX> ...
WX> The output (if any) follows:
WX>
WX> test icymoon
WX> 0 - MPI_COMM_RANK : Null communicator
WX> [0] Aborting program !
WX> [0] Aborting program!
WX> test icymoon end
WX> ---------------------------------------------------------------------------------
WX>
WX> ---------------------------------------------------------------------------------
WX> <cu.scf.in>
WX> &control
WX>
WX> calculation='scf'
WX> restart_mode='from_scratch',
WX> pseudo_dir = '/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/pseudo/',
WX> outdir='/nfs/s04r2p1/wangxq_tj/',
WX> prefix='cu'
WX> /
WX>
WX> &system
WX>
WX> ibrav = 2, celldm(1) =6.73, nat= 1, ntyp= 1,
WX> ecutwfc = 25.0, ecutrho = 300.0
WX> occupations='smearing', smearing='methfessel-paxton', degauss=0.02
WX> noncolin = .true.
WX> starting_magnetization(1) = 0.5
WX> angle1(1) = 90.0
WX> angle2(1) = 0.0
WX> /
WX>
WX> &electrons
WX>
WX> conv_thr = 1.0e-8
WX> mixing_beta = 0.7
WX> /
WX>
WX> ATOMIC_SPECIES
WX> Cu 63.55 Cu.pz-d-rrkjus.UPF
WX> ATOMIC_POSITIONS
WX> Cu 0.0 0.0 0.0
WX> K_POINTS (automatic)
WX> 8 8 8 0 0 0
WX> --------------------------------------------------------------------------------
WX>
WX> ---------------------------------------------------------------------------------
WX> cu.scf.out
WX>
WX> 1 - MPI_COMM_RANK : Null communicator
WX> [1] Aborting program !
WX> [1] Aborting program!
WX>
WX> TID  HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
WX> ==== ========== ================ ======================= ===================
WX> 0001 node333                     Exit (255)              04/08/2008 19:36:59
WX> 0002 node284                     Exit (255)              04/08/2008 19:36:59
WX>
WX> ---------------------------------------------------------------------------------
WX>
WX> Any help will be deeply appreciated!
WX>
WX> Best regards,
WX>
WX> =====================================
WX>
WX> X.Q. Wang
WX>
WX> wangxinquan at tju.edu.cn
WX>
WX> School of Chemical Engineering and Technology
WX>
WX> Tianjin University
WX>
WX> 92 Weijin Road, Tianjin, P. R. China
WX>
WX> tel:86-22-27890268, fax: 86-22-27892301
WX>
WX> =====================================
WX>
WX>
WX>
--
=======================================================================
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.