[Pw_forum] "MPI_COMM_RANK : Null communicator..." error through Platform LSF system

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Wed Apr 9 12:57:11 CEST 2008


On Wed, 9 Apr 2008, wangxinquan at tju.edu.cn wrote:

WX> Dear users and developers,
WX> 
WX>      Recently I ran a test on the Nankai Stars HPC. The error message
WX> "MPI_COMM_RANK : Null communicator…Aborting program !" appeared when I ran
WX> an scf calculation on 2 CPUs (2 nodes).

this is not a quantum espresso problem, but a problem of your
local machine. please contact your system administrator to
investigate. you may want to try a simpler, "hello world"-style
MPI program first to verify this.

cheers,
   axel.

WX> 
WX>      To solve this problem, I have found some hints from Google, such as “please
WX> make sure that you used the same version of MPI for compiling and running, and
WX> included the corresponding header file mpi.h in your code.”
WX> (http://www.ncsa.edu/UserInfo/Resources/Hardware/XeonCluster/FAQ/XeonJobs.html)
WX> 
WX>      According to the pwscf mailing list, "dynamic port number used in mpi
WX> intercommunication is not working. This is most probably an installation issue
WX> regarding LSF." may be the problem.
WX> (http://www.democritos.it/pipermail/pw_forum/2007-June/006689.html)
WX> 
WX>      According to the pwscf manual, "Your machine might be configured so as to
WX> disallow interactive execution" may be another cause.
WX> 
WX>      My question is:
WX>      To solve the “MPI_COMM_RANK…” problem, do I need to modify the pwscf code,
WX> the mpich_gm code, or the LSF system?
WX> 
WX> Calculation Details are as follows:
WX> ---------------------------------------------------------------------------------
WX> HPC background:
WX> Nankai Stars (http://202.113.29.200/introduce.htm)
WX> 800 Xeon 3.06 GHz CPU (400 nodes)
WX> 800 GB Memory    
WX> 53T High-Speed Storage    
WX> Myrinet
WX> Parallel jobs are run and debugged through Platform LSF system.
WX> Mpich_gm driver:1.2.6..13a
WX> Espresso-3.2.3
WX> ---------------------------------------------------------------------------------
WX> 
WX> ---------------------------------------------------------------------------------
WX> Installation:
WX> ./configure CC=mpicc F77=mpif77 F90=mpif90
WX> make all
WX> ---------------------------------------------------------------------------------
WX> 
WX> ---------------------------------------------------------------------------------
WX> Submit script :
WX> #!/bin/bash
WX> #BSUB -q normal
WX> #BSUB -J test.icymoon
WX> #BSUB -c 3:00
WX> #BSUB -a "mpich_gm"
WX> #BSUB -o %J.log
WX> #BSUB -n 2 
WX> 
WX> cd /nfs/s04r2p1/wangxq_tj
WX> echo "test icymoon"
WX> 
WX> mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x \
WX>     < /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out
WX> 
WX> echo "test icymoon end"
WX> ---------------------------------------------------------------------------------
WX> 
WX> ---------------------------------------------------------------------------------
WX> Output file (%J.log):
WX> 
WX> … …
WX> The output (if any) follows:
WX> 
WX> test icymoon
WX> 0 - MPI_COMM_RANK : Null communicator
WX> [0]  Aborting program !
WX> [0] Aborting program!
WX> test icymoon end
WX> ---------------------------------------------------------------------------------
WX> 
WX> ---------------------------------------------------------------------------------
WX> <cu.scf.in>
WX> &control
WX> 
WX>     calculation='scf'
WX>     restart_mode='from_scratch',
WX>     pseudo_dir = '/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/pseudo/',
WX>     outdir='/nfs/s04r2p1/wangxq_tj/',
WX>     prefix='cu'
WX>  /
WX> 
WX>  &system
WX> 
WX>     ibrav = 2, celldm(1) =6.73, nat= 1, ntyp= 1,
WX>     ecutwfc = 25.0, ecutrho = 300.0
WX>     occupations='smearing', smearing='methfessel-paxton', degauss=0.02
WX>     noncolin = .true.
WX>     starting_magnetization(1) = 0.5
WX>     angle1(1) = 90.0
WX>     angle2(1) =  0.0
WX>  /
WX> 
WX>  &electrons
WX> 
WX>     conv_thr = 1.0e-8
WX>     mixing_beta = 0.7 
WX>  /
WX> 
WX> ATOMIC_SPECIES
WX>  Cu 63.55 Cu.pz-d-rrkjus.UPF
WX> ATOMIC_POSITIONS
WX>  Cu 0.0 0.0 0.0
WX> K_POINTS (automatic)
WX>  8 8 8 0 0 0
WX> --------------------------------------------------------------------------------
WX> 
WX> ---------------------------------------------------------------------------------
WX> cu.scf.out
WX> 
WX> 1 - MPI_COMM_RANK : Null communicator
WX> [1]  Aborting program !
WX> [1] Aborting program!
WX> 
WX> TID  HOST_NAME  COMMAND_LINE  STATUS      TERMINATION_TIME
WX> ==== ========== ============  ==========  ===================
WX> 0001 node333                  Exit (255)  04/08/2008 19:36:59
WX> 0002 node284                  Exit (255)  04/08/2008 19:36:59
WX> 
WX> ---------------------------------------------------------------------------------
WX> 
WX> Any help will be deeply appreciated!
WX> 
WX> Best regards,
WX> 
WX> =====================================
WX> 
WX> X.Q. Wang 
WX> 
WX> wangxinquan at tju.edu.cn
WX> 
WX> School of Chemical Engineering and Technology
WX> 
WX> Tianjin University
WX> 
WX> 92 Weijin Road, Tianjin, P. R. China
WX> 
WX> tel:86-22-27890268, fax: 86-22-27892301
WX> 
WX> =====================================
WX> 
WX> 
WX> 

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.