[Pw_forum] P4HT problem?
Nicola Marzari
marzari at MIT.EDU
Fri Jul 9 15:39:57 CEST 2004
Dear All,
these are tests made by Matteo Cococcioni on our own cluster.
(The code used is CP, the Car-Parrinello molecular dynamics
code distributed by www.democritos.it - input and parts of
the architecture are common with PWSCF).
Our main observation is that bus speed is very important, as is
having DDR3200 memory; DDR2700 leads to a 10-15% decrease
in performance. Until last week, the fastest PIV had an 800 MHz
front side bus (FSB), and the fastest Xeons only a 533 MHz FSB.
For this reason, a dual Xeon performs only 0% to 40%
better than a single Xeon on memory-intensive applications,
and a dual Xeon is on par with, or even worse than, a single PIV with the
faster bus. Note that CP is a highly optimized code that relies
heavily on the BLAS/LAPACK zgemm as implemented by Intel in
the MKL library.
Gigabit ethernet seems to do poorly above 4 or maybe 6 nodes.
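As a rough check of the gigabit scaling remark, here is a minimal Python sketch that turns the "10-5" column of Matteo's uqbar table (DDR3200 over the 8-port gigabit switch, quoted further down) into speedup and parallel efficiency. The names `SECONDS_ITER_6_TO_10` and `scaling` are made up for this illustration, and speedup is taken as the usual T(1)/T(n), which is an assumption about how one would want to read the numbers, not something stated in the tests.

```python
# Wall-clock seconds for iterations 6-10 (the "10-5" column) on uqbar,
# DDR3200 over the 8-port gigabit switch, keyed by number of processors.
# Values are copied from the quoted table; the names are illustrative only.
SECONDS_ITER_6_TO_10 = {
    1: 1155, 2: 667, 3: 445, 4: 327,
    5: 304, 6: 244, 7: 232, 8: 242,
}


def scaling(times):
    """Return {nproc: (speedup, efficiency)} relative to the 1-processor run.

    speedup    = T(1) / T(n)
    efficiency = speedup / n
    """
    base = times[1]
    return {n: (base / t, base / (t * n)) for n, t in sorted(times.items())}


if __name__ == "__main__":
    for nproc, (speedup, efficiency) in scaling(SECONDS_ITER_6_TO_10).items():
        print(f"{nproc} proc: speedup {speedup:4.2f}, efficiency {efficiency:4.0%}")
```

With these numbers the efficiency stays around 87-88% up to 4 processors, falls below 80% from 5 processors on, and is about 60% at 8, where the run is actually slower than with 7.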
Intel has just released Nocona, the Xeon with an 800 MHz
FSB on the Tumwater E7525 chipset, and this would probably improve
the performance of dual-processor machines considerably.
Best,
nicola
> Matteo's tests - opensesame fast cp (uqbar: cluster of PIV 3.2 GHz,
> 800 MHz FSB, DDR3200 or DDR2700; dx6: dual xeon 2.4 GHz, 533 MHz FSB,
> DDR2100; grant: dual xeon 2.8 (?) GHz, 400 (?) MHz FSB)
> ifc 7.1, MKL 6.0, fftw 2.1.3. AgI with 108 atoms, 486 bands.
>
> -------------------------------------------------
> uqbar (DDR3200 over 8-port gigabit)
> procs   5 iter.   10 iter.   (10-5, in sec)
>   1      28:34     47:49       1155s
>   2      15:24     26:31        667s
>   3      10:34     17:59        445s
>   4       7:55     13:22        327s
>   5       7:10     12:14        304s
>   6       6:11     10:15        244s
>   7       5:50      9:42        232s
>   8       5:23      9:25        242s
> uqbar (DDR3200 over 4-port gigabit)
>           5 iter.   10 iter.                  5 iter.          10 iter.
> 1 proc     29:27     49:59    (10-5=1232)     42:51 (dx6)      71:12 (dx6)
> 2 proc     15:15     26:19    (10-5=664s)     39:59 (grant)    68:51 (grant)
> 3 proc     10:30     17:57    (10-5=447s)
> 4 proc      7:55     13:22    (10-5=327s)     22:03 (grant)    38:13 (grant)
> uqbar (DDR3200 over 48-port ethernet)
> 2 proc     18:45     31:58
> 3 proc     15:12     26:04
> uqbar (DDR2700 over 4-port gigabit)
> 1 proc     32:31     55:00
> 2 proc     17:00     29:26
> 3 proc     10:43     18:15
> 4 proc      8:50     14:55
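The DDR2700 penalty can be read off the two 4-port gigabit tables above. The sketch below is only an illustration: it compares the first timing column of each table (taken here to be the 5-iteration wall-clock time, as in the DDR3200 table), and the helper names `to_seconds` and `slowdown_percent` are invented for the example. It reports the extra run time of DDR2700 relative to DDR3200, which is one possible way of expressing the "decrease in performance".

```python
def to_seconds(mmss):
    """Convert a 'MM:SS' wall-clock string to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# First timing column of the two uqbar tables over the 4-port gigabit
# switch, keyed by number of processors (values copied from the tables).
DDR3200 = {1: "29:27", 2: "15:15", 3: "10:30", 4: "7:55"}
DDR2700 = {1: "32:31", 2: "17:00", 3: "10:43", 4: "8:50"}


def slowdown_percent(fast, slow):
    """Extra run time of `slow` relative to `fast`, in percent, per proc count."""
    return {
        n: 100.0 * (to_seconds(slow[n]) / to_seconds(fast[n]) - 1.0)
        for n in sorted(fast)
    }


if __name__ == "__main__":
    for nproc, pct in slowdown_percent(DDR3200, DDR2700).items():
        print(f"{nproc} proc: DDR2700 takes {pct:.1f}% longer than DDR3200")
```

On these timings DDR2700 takes roughly 10-12% longer for 1, 2 and 4 processors, and only about 2% longer for the 3-processor run.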
(For some reason, the two 1 proc tests on uqbar seem to be slightly
different.) We also have a variety of tests on machines with unknown
FSBs and RAM speeds:
> dual Xeon 2.4 GHz (6 cpus - myrinet) DDRAM macarthur 24:27
> dual Xeon 2.4 GHz (4 cpus - myrinet) DDRAM macarthur 36:21
> dual Xeon 2.8 GHz (4 cpus - grant 7,8 lam ethernet) DDRAM 50:45 +/- 15s
> dual Xeon 2.4 GHz (2 cpus) RDRAM 63 to 70 minutes
> dual Xeon 2.4 GHz (2 cpus - myrinet) DDRAM macarthur 68:30
> dual Xeon 2.8 GHz (2 cpus - grant node 9) DDRAM (200 MHz?) 74:45 +/- 15s
> dual Xeon 2.4 GHz (1 cpu) RDRAM 107: minutes
> dual Xeon 2.0 GHz (1 cpu) DDR 127: minutes
---------------------------------------------------------------------
Prof Nicola Marzari Department of Materials Science and Engineering
13-5066 MIT 77 Massachusetts Avenue Cambridge MA 02139-4307 USA
tel 617.4522758 fax 617.2586534 marzari at mit.edu http://nnn.mit.edu