No subject
Sun Aug 24 10:11:18 CEST 2008
Dear all,

I just finished a relax calculation for 120 atoms. After the calculation was done, the output file reported the following:

     Program PWSCF v.4.0.1 starts ...
     Today is 16Sep2008 at 19:14:42

     Parallel version (MPI)

     Number of processors in use:      78
     K-points division:     npool     = 3
     R & G space division:  proc/pool = 26

     For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials or PAW
     ................................
     per-process dynamical memory:   129.6 Mb
     ................................

     PWSCF        :     0d 14h46m CPU time,        2d 18h 4m wall time

     init_run     :     91.49s CPU
     electrons    :  47137.56s CPU (      27 calls, 1745.836 s avg)
     update_pot   :    187.80s CPU (      26 calls,    7.223 s avg)
     forces       :   4492.20s CPU (      27 calls,  166.378 s avg)

     Called by init_run:
     wfcinit      :     23.68s CPU
     potinit      :      3.15s CPU

     Called by electrons:
     c_bands      :  23198.29s CPU (     258 calls,   89.916 s avg)
     sum_band     :  11159.67s CPU (     258 calls,   43.255 s avg)
     v_of_rho     :    167.39s CPU (     280 calls,    0.598 s avg)
     newd         :  13679.79s CPU (     280 calls,   48.856 s avg)
     mix_rho      :     30.14s CPU (     258 calls,    0.117 s avg)

     Called by c_bands:
     init_us_2    :     48.95s CPU (     517 calls,    0.095 s avg)
     cegterg      :  23038.94s CPU (     258 calls,   89.298 s avg)

     Called by *egterg:
     h_psi        :   8629.82s CPU (    1459 calls,    5.915 s avg)
     s_psi        :   2230.78s CPU (    1459 calls,    1.529 s avg)
     g_psi        :     34.68s CPU (    1200 calls,    0.029 s avg)
     cdiaghg      :   5929.74s CPU (    1427 calls,    4.155 s avg)

     Called by h_psi:
     add_vuspsi   :   2209.17s CPU (    1459 calls,    1.514 s avg)

     General routines
     calbec       :   2904.12s CPU (    1744 calls,    1.665 s avg)
     cft3s        :   4337.89s CPU (  950068 calls,    0.005 s avg)
     interpolate  :     34.87s CPU (     538 calls,    0.065 s avg)

     Parallel routines
     fft_scatter  :    538.83s CPU (  950068 calls,    0.001 s avg)

From the reported information, we can see that the efficiency of my calculation is quite low: only 14h46m of CPU time out of 2d 18h 4m of wall time.
I do not think this is up to snuff.
To make my problem easier to understand, here is more about my software, hardware and simulation model.

Simulation model: my system is a slab model of a certain metal oxide surface with 3 irregular k-points.
The pseudopotentials are ultrasoft (Vanderbilt) pseudopotentials.

Hardware: each node has two single-core Xeon CPUs and 2G of physical memory. The network is InfiniBand with 10G bandwidth.

Software: my Fortran and C compilers are Intel version 10.0.015. MKL is l_mkl_p_10.0.3.020.tgz.
FFTW is fftw-2.1.5.tar.gz. MPI is mpich2-1.0.7.tar.gz. All of the above are stored on an NFS location.
My QE was compiled on an NFS location, and outdir is also on NFS. wfcdir is on the local disk, in the /tmp/ folder. To reduce I/O, I also set disk_io = 'none'.

Could you tell me what makes my CPUs run at such low efficiency? Are there any hints for improving the parallel efficiency?

Do you think 10G InfiniBand is good enough for 39 nodes? Is it unnecessary to put so many files on the NFS location? Could you tell me which folders must be on an NFS location so that all the nodes can read and write them?

I also noticed that pw.x reported that 129.6 Mb of memory was required per process. In practice, however, I found that virtual memory was being used. Do you think pw.x greatly underestimates the memory requirement?

Thank you for reading.

Any hints on my problem will be deeply appreciated.

vega

==================================================================================
Vega Lew (weijia liu)
Ph.D. Candidate in Chemical Engineering
State Key Laboratory of Materials-oriented Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China
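
As a rough check on the figures quoted above, the CPU-to-wall-time ratio and the reported memory per node can be worked out from the numbers in the output. This is only a minimal sketch: the helper `to_seconds` and the variable names are illustrative and not part of pw.x, and the per-node memory assumes one MPI process per core (78 processes on 39 nodes).

```python
def to_seconds(days=0, hours=0, minutes=0):
    """Convert a d/h/m duration, as printed in the PWSCF summary line, to seconds."""
    return days * 86400 + hours * 3600 + minutes * 60


# Totals from the line "PWSCF : 0d 14h46m CPU time, 2d 18h 4m wall time"
cpu_s = to_seconds(hours=14, minutes=46)
wall_s = to_seconds(days=2, hours=18, minutes=4)
cpu_fraction = cpu_s / wall_s

# Assumption: one MPI process per core, so 78 processes / 39 nodes = 2 per node.
procs_per_node = 78 // 39
mem_per_node_mb = procs_per_node * 129.6  # "per-process dynamical memory: 129.6 Mb"

print(f"CPU time:  {cpu_s} s")
print(f"wall time: {wall_s} s")
print(f"CPU/wall:  {cpu_fraction:.1%}")
print(f"reported memory per node: {mem_per_node_mb:.1f} Mb")
```

With these inputs the CPU time is roughly 22% of the wall time, and the memory reported by pw.x adds up to about 259 Mb per node, well below the 2G of physical memory.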
More information about the Pw_forum
mailing list