
Dear all,

I just finished a relax calculation for 120 atoms. After the calculation was done, the output file reported the following:

     Program PWSCF     v.4.0.1  starts ...
     Today is 16Sep2008 at 19:14:42

     Parallel version (MPI)

     Number of processors in use:       78
     K-points division:     npool     =    3
     R & G space division:  proc/pool =   26

     For Norm-Conserving or Ultrasoft (Vanderbilt) Pseudopotentials or PAW
     ................................
     per-process dynamical memory:   129.6 Mb
     ................................

     PWSCF        :     0d   14h46m CPU time,        2d   18h 4m wall time

     init_run     :    91.49s CPU
     electrons    : 47137.56s CPU (      27 calls,1745.836 s avg)
     update_pot   :   187.80s CPU (      26 calls,   7.223 s avg)
     forces       :  4492.20s CPU (      27 calls, 166.378 s avg)

     Called by init_run:
     wfcinit      :    23.68s CPU
     potinit      :     3.15s CPU

     Called by electrons:
     c_bands      : 23198.29s CPU (     258 calls,  89.916 s avg)
     sum_band     : 11159.67s CPU (     258 calls,  43.255 s avg)
     v_of_rho     :   167.39s CPU (     280 calls,   0.598 s avg)
     newd         : 13679.79s CPU (     280 calls,  48.856 s avg)
     mix_rho      :    30.14s CPU (     258 calls,   0.117 s avg)

     Called by c_bands:
     init_us_2    :    48.95s CPU (     517 calls,   0.095 s avg)
     cegterg      : 23038.94s CPU (     258 calls,  89.298 s avg)

     Called by *egterg:
     h_psi        :  8629.82s CPU (    1459 calls,   5.915 s avg)
     s_psi        :  2230.78s CPU (    1459 calls,   1.529 s avg)
     g_psi        :    34.68s CPU (    1200 calls,   0.029 s avg)
     cdiaghg      :  5929.74s CPU (    1427 calls,   4.155 s avg)

     Called by h_psi:
     add_vuspsi   :  2209.17s CPU (    1459 calls,   1.514 s avg)

     General routines
     calbec       :  2904.12s CPU (    1744 calls,   1.665 s avg)
     cft3s        :  4337.89s CPU (  950068 calls,   0.005 s avg)
     interpolate  :    34.87s CPU (     538 calls,   0.065 s avg)

     Parallel routines
     fft_scatter  :   538.83s CPU (  950068 calls,   0.001 s avg)

From the reported information, we can see that the efficiency of my calculation is quite low: about 15 hours of CPU time against roughly 66 hours of wall time, i.e. the CPUs were busy only about 22% of the time. I think it may not be up to snuff. To give a better picture of my problem, I'll describe my simulation model, hardware, and software.

Simulation model: the system is a slab model of a certain metal oxide surface with 3 irreducible k-points.
The pseudopotentials are ultrasoft (Vanderbilt) pseudopotentials.

Hardware: each of the 39 nodes has two single-core Xeon CPUs and 2 GB of physical memory (78 cores in total).
The interconnect is InfiniBand with 10 Gb/s bandwidth.

Software: the Fortran and C compilers are Intel 10.0.015. MKL is l_mkl_p_10.0.3.020.tgz,
FFTW is fftw-2.1.5.tar.gz, and MPI is mpich2-1.0.7.tar.gz. All of the above are stored on an NFS location.
My QE was also compiled in an NFS location, and outdir is on the NFS share as well; wfcdir is on local
disk, in the /tmp/ folder. To reduce the I/O, I also set disk_io = 'none'.
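
In case it helps with the diagnosis, the relevant part of my &CONTROL namelist looks roughly like the sketch below (the prefix and directory paths are placeholders, not the real ones):

     &CONTROL
       calculation = 'relax'
       prefix      = 'slab'           ! placeholder name
       outdir      = '/nfs/scratch/'  ! placeholder for the output directory on NFS
       wfcdir      = '/tmp/'          ! wavefunctions kept on the local disk
       disk_io     = 'none'
       pseudo_dir  = '/nfs/pseudo/'   ! placeholder for the pseudopotential directory on NFS
     /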

Could you tell me what makes my CPUs run at such low efficiency? Are there any hints to improve the
parallel performance?
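
For completeness, the job is launched over the 78 cores with 3 k-point pools, roughly like the line below (the input and output file names are only placeholders, and the exact launcher invocation may differ with the MPI installation):

     mpiexec -n 78 pw.x -npool 3 < relax.in > relax.out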

Do you think 10 Gb/s InfiniBand is good enough for 39 nodes? Do you think it is unnecessary to put so many files
on the NFS location? Could you tell me which folders must be on an NFS location so that all the nodes can read and
write them?

I also noticed that pw.x reported that 129.6 Mb of per-process dynamical memory was required (about 260 MB per node
with two MPI processes), but in practice I found that virtual memory was being used. Do you think pw.x greatly
underestimates the memory requirement?
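
To be concrete about what I mean by "virtual memory was being used": the resident versus virtual sizes of the pw.x processes on a node can be compared with a standard Linux command such as the one below (this is only an illustration of the check, not part of the calculation):

     ps -C pw.x -o pid,rss,vsz,pmem,comm   # resident (RSS) and virtual (VSZ) sizes in kB, plus %MEM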

Thank you for reading.

Any hints on my problem will be deeply appreciated.

vega

======================================================================
Vega Lew (weijia liu)
Ph.D. Candidate in Chemical Engineering
State Key Laboratory of Materials-oriented Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China


