[Pw_forum] abysmal parallel performance of the CP code
Axel Kohlmeyer
axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Thu Sep 22 12:07:20 CEST 2005
On Wed, 21 Sep 2005, Konstantin Kudin wrote:
KK> Hi,
Kostya,
KK> I've done some parallel benchmarks for the CP code, so I thought I'd
KK> share them with the rest of the group. The system we have is a cluster
KK> of dual Opterons at 2.0 GHz with 1 Gbit ethernet.
Please keep in mind that for reasonable scaling with Car-Parrinello
MD you usually need a better interconnect than gigabit ethernet.
Some time ago I summarized test results for the CPMD code at:
http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/cpmd-bench.html#parallel
The general issues apply to the espresso CP code as well as to CPMD.
With gigabit ethernet (or TCP/IP in general) you suffer most from the
very high latencies. This hurts especially in the all-to-all
communication needed for the parallel FFTs, and it shows even more on
dual-CPU nodes; on quad-CPU nodes you should also be hitting the
bandwidth limit. How visible this becomes depends largely on the size
of the job. We have a cluster of dual Opteron 246 (2.0 GHz) nodes with
two separate gigabit networks (data and MPI), and it usually does not
pay to run jobs across more than 3-4 nodes; even then you already
'waste' about 20% of your CPU power. The only saving grace is that a
better interconnect costs much more than one wasted node.
KK> I looked at two different measures of time: CPU time, and wall time
KK> computed as the difference between "This run was started" and "This
KK> run was terminated". By the way, such a wall time could probably be
KK> printed by the code directly, to be readily available.
Probably, but you can get the number just as easily by launching the
jobs through the 'time' command.
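For example (input/output file names are just placeholders):

  time mpirun -np 4 cp.x < cp.in > cp.out

The 'real' line printed by 'time' at the end is the wall time; 'user'
and 'sys' only cover the processes started on the local node, so for
multi-node runs the wall time is the number to watch.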
KK> The system is a reasonably sized simulation cell with 20 CP
KK> (electronic+ionic) steps total.
KK>
KK> The compiler is IFC 9.0, the GOTO library is used for BLAS, and
KK> mpich 1.2.6 for MPI. The CP version is the CVS from Aug. 20, 2005.
KK>
KK> What is crazy is that even for 2 CPUs sitting in the same box, lots
KK> of CPU time is just lost somewhere. The strange thing is that the
KK> quad we have at 2.2 GHz seems to lose just as much wall time as 2
KK> duals talking across the network. And note how 4 CPUs barely do
KK> better than a 2x speedup over a single CPU if the wall-clock time
KK> is considered.
Please check whether your MPI library actually uses shared-memory
communication within a node, and whether your kernel supports setting
CPU and memory affinity (and that you set it). I have seen numbers
where this makes more than a 20% difference on a dual machine, and I
would expect it to matter even more on quad machines.
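Two things to check on a typical Linux box (adapt names and paths to
your setup): first, mpich-1.2.6 with the default ch_p4 device talks
TCP even between two processes on the same node, unless it was built
with shared-memory support (the '-comm=shared' configure option, if I
remember correctly). Second, you can test the effect of affinity by
pinning a run explicitly, e.g.:

  taskset -c 0 cp.x < cp.in > cp.out                         # bind to CPU 0
  numactl --cpunodebind=0 --membind=0 cp.x < cp.in > cp.out  # CPU + local memory

Comparing pinned and unpinned runs of the same job should tell you
quickly whether affinity is part of your problem.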
KK> I know Nicola Marzari has done some parallel benchmarks, but I do not
KK> think that wall times were being paid attention to ...
KK>
KK> Kostya
KK>
KK> P.S. Any suggestions as to what might be going on here?
You also have to take into account that when running a gamma-point-only
calculation you are missing the most efficient parallelization (across
k-points), which is what lets, e.g., pw.x run rather efficiently on
not-so-high-performance networks.
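As an illustration (the exact flag spelling may differ between
espresso versions, so check the documentation of your build):

  mpirun -np 8 pw.x -npool 4 < scf.in > scf.out

would split 8 tasks into 4 pools of 2, so the latency-sensitive FFT
all-to-all stays within a pool while the pools themselves exchange
very little data. A gamma-point-only CP run has no k-points to
distribute, so this way out is not available there.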
Best regards,
Axel.
KK>
KK>
KK> Ncpu      CPU time    Wall time
KK>  1        1h22m       1h24m
KK>  2        45m33.41s   57m13s
KK>  4        27m30.80s   44m21s
KK>  6        18m22.71s   43m18s
KK>  8        14m53.91s   45m56s
KK>
KK>  4 (quad) 37m18.56s   45m32s
--
=======================================================================
Dr. Axel Kohlmeyer e-mail: axel.kohlmeyer at theochem.ruhr-uni-bochum.de
Lehrstuhl fuer Theoretische Chemie Phone: ++49 (0)234/32-26673
Ruhr-Universitaet Bochum - NC 03/53 Fax: ++49 (0)234/32-14045
D-44780 Bochum http://www.theochem.ruhr-uni-bochum.de/~axel.kohlmeyer/
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.