[Pw_forum] P4HT problem?

Nicola Marzari marzari at MIT.EDU
Fri Jul 9 15:39:57 CEST 2004



Dear All,


these are tests made by Matteo Cococcioni on our own cluster.
(The code used is CP, the Car-Parrinello molecular dynamics
code distributed by www.democritos.it - input and parts of
the architecture are common with PWSCF).

Our notes are that the bus speed is very important, as is
having DDR3200 memory; DDR2700 leads to a 10-15% decrease
in performance. Until last week, the fastest PIV had an 800 MHz
front-side bus, and the fastest Xeons only a 533 MHz FSB.
For this reason, one dual Xeon performs only 0% to 40%
better than a single Xeon on memory-intensive applications,
and a dual Xeon is on par with, or even worse than, a single PIV
with the faster bus. Note that CP is a highly optimized code that
relies heavily on the BLAS/LAPACK zgemm as implemented by Intel in
the MKL library.

Gigabit ethernet seems to do poorly above 4 or maybe 6 nodes.
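
As a rough check on the gigabit scaling, here is a minimal sketch that
turns the "(10-5, in sec)" column of the 8-port gigabit table quoted
below into speedup and parallel efficiency relative to the 1-processor
run. The names `seconds`, `scaling` and `results` are only illustrative;
the numbers are copied from Matteo's table.

```python
# Seconds for iterations 6-10 (the "10-5" column) on uqbar,
# DDR3200 over the 8-port gigabit switch, keyed by processor count.
seconds = {1: 1155, 2: 667, 3: 445, 4: 327, 5: 304, 6: 244, 7: 232, 8: 242}


def scaling(times):
    """Return {nproc: (speedup, efficiency)} relative to the 1-processor run."""
    base = times[1]
    return {n: (base / t, base / (t * n)) for n, t in sorted(times.items())}


results = scaling(seconds)

if __name__ == "__main__":
    for n, (speedup, eff) in results.items():
        print(f"{n} proc: speedup {speedup:5.2f}, efficiency {eff:6.1%}")
```

With these timings the efficiency stays near 87-88% up to 4 processors,
drops below 80% at 5 and 6, and is about 60% at 8, where the run is
actually slower than on 7.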

Intel has just released Nocona, the Xeon with an 800 MHz
FSB on the Tumwater E7525 chipset, and this should probably improve
the performance of dual-processor machines considerably.

Best,


				nicola



> Matteo's tests - opensesame fast cp (uqbar: cluster of PIV 3.2 GHz,
> 800 MHz FSB, DDR3200 or DDR2700; dx6: dual xeon 2.4 GHz, 533 MHz FSB,
> DDR2100; grant: dual xeon 2.8 (?) GHz, 400 (?) MHz FSB)
> ifc 7.1, MKL 6.0, fftw 2.1.3. AgI with 108 atoms, 486 bands.
> 
> -------------------------------------------------
>     uqbar (DDR3200 over 8-port gigabit)
>             5 iter.      10 iter.    (10-5, in sec)
> 1             28:34        47:49     1155s
> 2             15:24        26:31      667s
> 3             10:34        17:59      445s
> 4              7:55        13:22      327s
> 5              7:10        12:14      304s
> 6              6:11        10:15      244s
> 7              5:50         9:42      232s
> 8              5:23         9:25      242s
>     uqbar (DDR3200 over 4-port gigabit)           
>             5 iter.       10 iter.                 5 iter.       10 iter.
>  1 proc       29:27        49:59   (10-5=1232s)    42:51 (dx6)   71:12 (dx6)
>  2 proc       15:15        26:19   (10-5=664s)     39:59 (grant) 68:51 (grant)
>  3 proc       10:30        17:57   (10-5=447s)
>  4 proc        7:55        13:22   (10-5=327s)     22:03 (grant) 38:13 (grant)
>     uqbar (DDR3200 over 48-port ethernet)
>  2 proc       18:45        31:58
>  3 proc       15:12        26:04
>     uqbar (DDR2700 over 4-port gigabit)
>  1 proc       32:31        55:00
>  2 proc       17:00        29:26
>  3 proc       10:43        18:15
>  4 proc        8:50        14:55
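
To put a number on the memory-speed penalty, the following is a small
sketch comparing the two 4-port gigabit tables above (DDR3200 vs
DDR2700) on the 10-iteration wall time. The helper names `to_seconds`
and `slowdown` are made up for this illustration, and the result is
extra wall time, which is not quite the same quantity as a percentage
decrease in performance.

```python
def to_seconds(mmss):
    """Convert a 'MM:SS' wall-clock string to seconds."""
    minutes, secs = mmss.split(":")
    return int(minutes) * 60 + int(secs)


# (5-iteration, 10-iteration) timings on uqbar over the 4-port gigabit
# switch, keyed by processor count; values copied from the tables above.
ddr3200 = {1: ("29:27", "49:59"), 2: ("15:15", "26:19"),
           3: ("10:30", "17:57"), 4: ("7:55", "13:22")}
ddr2700 = {1: ("32:31", "55:00"), 2: ("17:00", "29:26"),
           3: ("10:43", "18:15"), 4: ("8:50", "14:55")}


def slowdown(fast, slow):
    """Percent extra wall time of `slow` over `fast` for the 10-iteration run."""
    return {n: 100.0 * (to_seconds(slow[n][1]) / to_seconds(fast[n][1]) - 1.0)
            for n in fast}


penalty = slowdown(ddr3200, ddr2700)

if __name__ == "__main__":
    for n, pct in penalty.items():
        print(f"{n} proc: DDR2700 takes {pct:.1f}% longer than DDR3200")
```

On these numbers DDR2700 costs roughly 10-12% extra wall time on 1, 2
and 4 processors, but under 2% on 3 processors, which looks like an
outlier in the data.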

(For some reason, the two 1 proc tests on uqbar seem to be slightly
different.) We also have a variety of tests on machines with unknown
FSBs and RAM speeds:

> dual Xeon 2.4 GHz  (6 cpus - myrinet) DDRAM macarthur        24:27
> dual Xeon 2.4 GHz  (4 cpus - myrinet) DDRAM macarthur        36:21
> dual Xeon 2.8 GHz  (4 cpus - grant 7,8 lam ethernet) DDRAM   50:45 +/- 15s
> dual Xeon 2.4 GHz  (2 cpus) RDRAM                            63 to 70 minutes
> dual Xeon 2.4 GHz  (2 cpus - myrinet) DDRAM macarthur        68:30
> dual Xeon 2.8 GHz  (2 cpus - grant node 9) DDRAM  (200 MHz?) 74:45 +/- 15s
> dual Xeon 2.4 GHz  (1 cpu) RDRAM                            107:   minutes
> dual Xeon 2.0 GHz  (1 cpu) DDR                              127:   minutes


---------------------------------------------------------------------
Prof Nicola Marzari   Department of Materials Science and Engineering
13-5066   MIT   77 Massachusetts Avenue   Cambridge MA 02139-4307 USA
tel 617.4522758  fax 617.2586534  marzari at mit.edu  http://nnn.mit.edu


