[Pw_forum] "MPI_COMM_RANK : Null communicator..." error through Platform LSF system
wangxinquan
wangxinquan at tju.edu.cn
Thu Apr 10 04:47:07 CEST 2008
Dear all,
To solve the "MPI_COMM_RANK..." problem, I modified make.sys as follows:
IFLAGS=-I../include -I/usr/local/mpich/1.2.6..13a/gm-2.1.3aa2nks3/smp/intel32/ssh/include
MPI_LIBS=/usr/local/mpich/1.2.6..13a/gm-2.1.3aa2nks3/smp/intel32/ssh/lib/libmpichf90.a
The IFLAGS setting now includes the directory that contains mpif.h.
With this change, the "MPI_COMM_RANK..." problem was solved.
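Both settings above should point into the same MPI installation. A minimal sketch in Python of how that could be checked, assuming the common layout in which headers live in <root>/include and libraries in <root>/lib. The helper names mpi_roots and same_mpi_install are hypothetical and are not part of espresso or MPICH:

```python
import posixpath
import re


def mpi_roots(make_sys_text):
    """Return (include_roots, lib_roots) taken from IFLAGS and MPI_LIBS."""
    settings = {}
    for line in make_sys_text.splitlines():
        # Match simple "NAME = value" assignments; other lines are ignored.
        m = re.match(r"\s*(\w+)\s*=\s*(.*)$", line)
        if m:
            settings[m.group(1)] = m.group(2).strip()

    include_roots = set()
    for token in settings.get("IFLAGS", "").split():
        # Only absolute -I paths can point at an MPI installation.
        if token.startswith("-I/"):
            include_roots.add(posixpath.dirname(token[2:].rstrip("/")))

    lib_roots = set()
    for token in settings.get("MPI_LIBS", "").split():
        if token.startswith("/"):
            # <root>/lib/libfoo.a -> <root>
            lib_roots.add(posixpath.dirname(posixpath.dirname(token)))
        elif token.startswith("-L/"):
            # -L<root>/lib -> <root>
            lib_roots.add(posixpath.dirname(token[2:].rstrip("/")))
    return include_roots, lib_roots


def same_mpi_install(make_sys_text):
    """True if headers and libraries come from the same install root(s)."""
    include_roots, lib_roots = mpi_roots(make_sys_text)
    return bool(include_roots) and include_roots == lib_roots
```

Passing the contents of make.sys to same_mpi_install() returns True only when the absolute -I directories and the MPI library paths share the same root. It says nothing about which MPI the job launcher uses at run time, so that still has to be checked separately.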
Unfortunately, a new error appeared.
The output (if any) follows:
test icymoon
[0] MPI Abort by user Aborting program!
[0] Aborting program!
test icymoon end
CRASH file:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 0
from read_namelists : error # 1
reading namelist control
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
task # 1
from read_namelists : error # 1
reading namelist control
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The input file is as follows:
------------------------------------------------------------------
<cu.scf.in>
&control
calculation='scf'
restart_mode='from_scratch',
pseudo_dir='/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/pseudo/',
outdir='/nfs/s04r2p1/wangxq_tj/',
prefix='cu'
/
&system
... ...
------------------------------------------------------------------
I have checked the "control" section, but I cannot find any error in it.
I am not sure whether this is an input file error or an MPICH error.
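One way to rule out the input file itself is to scan it for characters that are hard to see in an editor but that a Fortran namelist reader may reject, such as carriage returns from DOS line endings, tabs, or non-ASCII quotes. A minimal sketch in Python; the function names find_suspect_chars and check_file are hypothetical, and whether any of these characters is actually the cause here is an assumption:

```python
def find_suspect_chars(text):
    """Return a list of (line_number, column, description) tuples."""
    findings = []
    # Split on "\n" only, so that a stray "\r" stays visible in each line.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "\r":
                findings.append((line_no, col, "carriage return"))
            elif ch == "\t":
                findings.append((line_no, col, "tab"))
            elif ord(ch) > 127:
                findings.append((line_no, col, "non-ASCII U+%04X" % ord(ch)))
    return findings


def check_file(path):
    """Scan a file on disk, e.g. check_file("cu.scf.in")."""
    # newline="" disables newline translation, so "\r\n" is preserved.
    with open(path, encoding="utf-8", errors="replace", newline="") as fh:
        return find_suspect_chars(fh.read())
```

An empty result means the file contains only plain ASCII text with Unix line endings. In that case the namelist syntax itself, or the way the input reaches the parallel processes, would be the next thing to look at.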
Any help will be deeply appreciated!
Best regards, XQ Wang
------------------------------
Message: 3
Date: Wed, 09 Apr 2008 11:50:07 +0800
From: "" <wangxinquan at tju.edu.cn >
Subject: [Pw_forum] "MPI_COMM_RANK : Null communicator..." error
through Platform LSF system
To: pw_forum at pwscf.org
Message-ID: <407713007.16944 at tju.edu.cn >
Content-Type: text/plain
Dear users and developers,
Recently I ran a test on the Nankai Stars HPC cluster. The error message
"MPI_COMM_RANK : Null communicator…Aborting program !" appeared when I ran
an scf calculation on 2 CPUs (2 nodes).
To solve this problem, I found some hints through Google, such as “please
make sure that you used the same version of MPI for compiling and running, and
included the corresponding header file mpi.h in your code.”
(http://www.ncsa.edu/UserInfo/Resources/Hardware/XeonCluster/FAQ/XeonJobs.html)
According to the pwscf mailing list, the cause may be that "dynamic port
number used in mpi intercommunication is not working. This is most probably an
installation issue regarding LSF."
(http://www.democritos.it/pipermail/pw_forum/2007-June/006689.html)
According to the pwscf manual, another possible cause is that "Your machine
might be configured so as to disallow interactive execution".
My question is:
To solve the “MPI_COMM_RANK…” problem, do I need to modify the pwscf code,
the mpich_gm code, or the LSF system?
Calculation Details are as follows:
---------------------------------------------------------------------------------
HPC background:
Nankai Stars (http://202.113.29.200/introduce.htm)
800 Xeon 3.06 GHz CPUs (400 nodes)
800 GB Memory
53 TB High-Speed Storage
Myrinet
Parallel jobs are run and debugged through the Platform LSF system.
Mpich_gm driver:1.2.6..13a
Espresso-3.2.3
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Installation:
./configure CC=mpicc F77=mpif77 F90=mpif90
make all
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Submit script :
#!/bin/bash
#BSUB -q normal
#BSUB -J test.icymoon
#BSUB -c 3:00
#BSUB -a "mpich_gm"
#BSUB -o %J.log
#BSUB -n 2
cd /nfs/s04r2p1/wangxq_tj
echo "test icymoon"
mpirun.lsf /nfs/s04r2p1/wangxq_tj/espresso-3.2.3/bin/pw.x < /nfs/s04r2p1/wangxq_tj/cu.scf.in > cu.scf.out
echo "test icymoon end"
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
Output file (%J.log):
… …
The output (if any) follows:
test icymoon
0 - MPI_COMM_RANK : Null communicator
[0] Aborting program !
[0] Aborting program!
test icymoon end
---------------------------------------------------------------------------------
---------------------------------------------------------------------------------
<cu.scf.in >
&control
calculation='scf'
restart_mode='from_scratch',
pseudo_dir = '/nfs/s04r2p1/wangxq_tj/espresso-3.2.3/pseudo/',
outdir='/nfs/s04r2p1/wangxq_tj/',
prefix='cu'
/
&system
ibrav = 2, celldm(1) =6.73, nat= 1, ntyp= 1,
ecutwfc = 25.0, ecutrho = 300.0
occupations='smearing', smearing='methfessel-paxton', degauss=0.02
noncolin = .true.
starting_magnetization(1) = 0.5
angle1(1) = 90.0
angle2(1) = 0.0
/
&electrons
conv_thr = 1.0e-8
mixing_beta = 0.7
/
ATOMIC_SPECIES
Cu 63.55 Cu.pz-d-rrkjus.UPF
ATOMIC_POSITIONS
Cu 0.0 0.0 0.0
K_POINTS (automatic)
8 8 8 0 0 0
--------------------------------------------------------------------------------
---------------------------------------------------------------------------------
cu.scf.out
1 - MPI_COMM_RANK : Null communicator
[1] Aborting program !
[1] Aborting program!
TID  HOST_NAME  COMMAND_LINE     STATUS                  TERMINATION_TIME
==== ========== ================ ======================= ===================
0001 node333                     Exit (255)              04/08/2008 19:36:59
0002 node284                     Exit (255)              04/08/2008 19:36:59
---------------------------------------------------------------------------------
Any help will be deeply appreciated!
Best regards,
=====================================
X.Q. Wang
wangxinquan at tju.edu.cn
School of Chemical Engineering and Technology
Tianjin University
92 Weijin Road, Tianjin, P. R. China
tel:86-22-27890268, fax: 86-22-27892301
=====================================