[Pw_forum] "MPI_COMM_RANK : Null communicator..." error through Platform LSF system

wangxinquan at tju.edu.cn wangxinquan at tju.edu.cn
Wed Apr 9 05:50:07 CEST 2008


Dear users and developers,

     Recently I ran a test on the Nankai Stars HPC cluster. The error message
"MPI_COMM_RANK : Null communicator... Aborting program !" appeared when I ran
an scf calculation on 2 CPUs (2 nodes).

     To solve this problem, I found some hints through Google, such as "please
make sure that you used the same version of MPI for compiling and running, and
included the corresponding header file mpi.h in your code."
(http://www.ncsa.edu/UserInfo/Resources/Hardware/XeonCluster/FAQ/XeonJobs.html)
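
One way to check the first hint might be the sketch below. It only reports where each MPI tool resolves on the current PATH; the tool names are taken from the configure line and submit script in this post, and report_tool is a hypothetical helper name, not part of MPICH or LSF.

```shell
#!/bin/bash
# Report where each MPI tool resolves on the current PATH.
# Tool names come from the configure line and the submit script in this post.

report_tool() {
    # Print "<tool> -> <path>", or a note when the tool is not on PATH.
    tool_path=$(command -v "$1" 2>/dev/null)
    if [ -n "$tool_path" ]; then
        echo "$1 -> $tool_path"
    else
        echo "$1 -> not found on PATH"
    fi
}

for tool in mpicc mpif77 mpif90 mpirun.lsf; do
    report_tool "$tool"
done
```

If the compiler wrappers resolved to a different MPI installation than the one the launcher belongs to, that would be consistent with the hint quoted above.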

     According to the pwscf mailing list, one possible cause is that "dynamic
port number used in mpi intercommunication is not working. This is most
probably an installation issue regarding LSF."
(http://www.democritos.it/pipermail/pw_forum/2007-June/006689.html)

     According to the pwscf manual, "Your machine might be configured so as to
disallow interactive execution" may be another cause.

     My question is:
     To solve the "MPI_COMM_RANK..." problem, do I need to modify the pwscf code,
the mpich_gm code, or the LSF system?

Calculation Details are as follows:
---------------------------------------------------------------------------------
HPC background:
Nankai Stars (http://202.113.29.200/introduce.htm)
800 Xeon 3.06 GHz CPUs (400 nodes)
800 GB memory
53 TB high-speed storage
Myrinet
Parallel jobs are run and debugged through the Platform LSF system.
Mpich_gm driver: 1.2.6..13a
Espresso-3.2.3
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Installation:
./configure CC=mpicc F77=mpif77 F90=mpif90
make all
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
Submit script :
#!/bin/bash
#BSUB -q normal
#BSUB -J test.icymoon
#BSUB -c 3:00
#BSUB -a "mpich_gm"
#BSUB -o %J.log
#BSUB -n 2 

cd /nfs/s04r2p1/wangxq_tj
echo "test icymoon"

mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x \
    < /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out

echo "test icymoon end"
---------------------------------------------------------------------------------
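
For debugging, lines like the following could be added to the submit script before the mpirun.lsf call. This is only a sketch: it assumes the job environment variables LSB_JOBID, LSB_HOSTS and LSB_MCPU_HOSTS are set by LSF on this cluster, and print_job_env is a hypothetical helper name.

```shell
#!/bin/bash
# Print the host the job script runs on and the LSF job variables,
# so the %J.log file shows what LSF actually allocated.

print_job_env() {
    echo "host=$(uname -n)"
    # Fall back to "unset" when a variable is missing or empty.
    echo "LSB_JOBID=${LSB_JOBID:-unset}"
    echo "LSB_HOSTS=${LSB_HOSTS:-unset}"
    echo "LSB_MCPU_HOSTS=${LSB_MCPU_HOSTS:-unset}"
}

print_job_env
```

With "#BSUB -n 2", two slots would be expected to show up in the host variables if the allocation worked as requested.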

---------------------------------------------------------------------------------
Output file (%J.log):

... ...
The output (if any) follows:

test icymoon
0 - MPI_COMM_RANK : Null communicator
[0]  Aborting program !
[0] Aborting program!
test icymoon end
---------------------------------------------------------------------------------

---------------------------------------------------------------------------------
<cu.scf.in>
&control

    calculation='scf'
    restart_mode='from_scratch',
    pseudo_dir = '/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/pseudo/',
    outdir='/nfs/s04r2p1/wangxq_tj/',
    prefix='cu'
 /

 &system

    ibrav = 2, celldm(1) =6.73, nat= 1, ntyp= 1,
    ecutwfc = 25.0, ecutrho = 300.0
    occupations='smearing', smearing='methfessel-paxton', degauss=0.02
    noncolin = .true.
    starting_magnetization(1) = 0.5
    angle1(1) = 90.0
    angle2(1) =  0.0
 /

 &electrons

    conv_thr = 1.0e-8
    mixing_beta = 0.7 
 /

ATOMIC_SPECIES
 Cu 63.55 Cu.pz-d-rrkjus.UPF
ATOMIC_POSITIONS
 Cu 0.0 0.0 0.0
K_POINTS (automatic)
 8 8 8 0 0 0
--------------------------------------------------------------------------------

---------------------------------------------------------------------------------
cu.scf.out

1 - MPI_COMM_RANK : Null communicator
[1]  Aborting program !
[1] Aborting program!

TID  HOST_NAME    COMMAND_LINE            STATUS            TERMINATION_TIME
==== ========== ================  =======================  ===================
0001 node333                      Exit (255)               04/08/2008 19:36:59
0002 node284                      Exit (255)               04/08/2008 19:36:59

---------------------------------------------------------------------------------

Any help will be deeply appreciated!

Best regards,

=====================================

X.Q. Wang 

wangxinquan at tju.edu.cn

School of Chemical Engineering and Technology

Tianjin University

92 Weijin Road, Tianjin, P. R. China

tel:86-22-27890268, fax: 86-22-27892301

=====================================



