[Pw_forum] FFT & MPI - issues with latency

Axel Kohlmeyer akohlmey at vitae.cmm.upenn.edu
Mon Jan 30 14:24:16 CET 2006


On Mon, 30 Jan 2006, Konstantin Kudin wrote:

KK>  Axel, Nicola, thanks for sharing the experiences!

dear kostya,

KK>  I am actually wondering now if the benchmarks that Nicola posted did
KK> not have the 2nd Xeon cpu running threads with the GOTO or MKL
KK> libraries (launched on their own). If such threads were running, then
KK> that would make the 1 cpu per node times artificially low, making it
KK> look like the 2nd cpu adds very little.

in my experience, this is less likely. a friend of mine ran tests,
and even when forcing MKL to use only one thread by setting the
environment variable OMP_NUM_THREADS to 1, the pentium IV processors
beat the opteron hands down. MKL does a fantastic job on those cpus.
the only PC machines where i found a _significant_ gain from using a
multithreaded BLAS/LAPACK were dual athlon MP 1600+/1800+ machines
with an AMD MPX chipset, but on those, using MPI over shared memory
was even more efficient. as nicola pointed out, memory bandwidth is
a key issue (btw. this is true for both opteron and xeon), and a dual
processor setup on x86/em64t/amd64 currently only makes sense with
opteron for most scientific applications.
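
for illustration, here is a minimal sketch of how a benchmark can be
launched with OMP_NUM_THREADS set to 1, so that a threaded BLAS cannot
quietly use the second cpu. the helper names are hypothetical, and the
child command below is only a stand-in that echoes the variable; it
would be replaced by the real benchmark command.

```python
import os
import subprocess
import sys


def single_thread_env(base=None):
    """Return a copy of the environment with OMP_NUM_THREADS forced to 1."""
    env = dict(os.environ if base is None else base)
    env["OMP_NUM_THREADS"] = "1"
    return env


def run_single_threaded(cmd):
    """Run cmd (a list of arguments) with the single-thread environment."""
    return subprocess.run(
        cmd,
        env=single_thread_env(),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in child process (assumption): it only reports the value it sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ['OMP_NUM_THREADS'])"]
    print(run_single_threaded(child).stdout.strip())
```

building a copy of the environment keeps the setting local to the
benchmark run instead of changing it for the whole session.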

KK>  As far as Axel's comments go, the idea of caching the G-space to
KK> R-space fourier transforms sounds intriguing. Perhaps going to 8 or even
KK> 16 cpus is nothing to high rollers like Axel :-) , however, for mere
KK> mortals using the regular Gigabit and cheap clusters that would be a
KK> step forward !

i'm all for it. hell, i've worked a lot on that kind of machine myself.
my point was only that, as i found, one has to weigh the value of
the time (and money) invested against the gain. you cannot expect to
get much further than the cluster of 8-node subclusters setup that
nicola was describing (which i think is a commendable idea for setting
up a cost-efficient cluster solution for those plane wave codes we all
love/hate/use so much).

KK>  Also, this parastation for Gigabit-not-really-Gigabit sounded like it
KK> could reduce the latency with no new hardware. Was it implied that such
KK> a solution was somewhat unstable? How easy is it to get it running, and
KK> to use it afterwards?

it is definitely worth a try. when i tried it about two years ago, it
was comparatively easy to set up. to me the most annoying part was having
to run a separate license manager and being locked into their
'middleware' solution. stability was good. i would recommend you try
it, especially if you qualify for the academic (i.e. no-cost) licensing.
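
to see why latency matters so much here, a rough sketch using the
usual "latency plus size over bandwidth" cost model for one message.
the 50 microsecond latency below is an assumed placeholder, not a
measurement of any setup discussed in this thread; 125e6 bytes/s is
simply 1 Gbit/s expressed in bytes. the function names are made up
for this example.

```python
def transfer_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time for one message: fixed latency plus payload time."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def latency_share(nbytes, latency_s, bandwidth_bytes_per_s):
    """Fraction of the estimated message time that is pure latency."""
    total = transfer_time(nbytes, latency_s, bandwidth_bytes_per_s)
    return latency_s / total


if __name__ == "__main__":
    ASSUMED_LATENCY_S = 50e-6   # placeholder value, not measured
    GIGABIT_BYTES_PER_S = 125e6  # 1 Gbit/s in bytes per second

    for nbytes in (1024, 64 * 1024, 1024 * 1024):
        share = latency_share(nbytes, ASSUMED_LATENCY_S, GIGABIT_BYTES_PER_S)
        print(f"{nbytes:>8} bytes: latency is {share:.1%} of the message time")
```

in this model small messages are dominated by the latency term, which
is why lowering latency without new hardware can pay off for codes
that exchange many small messages.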

axel


KK> 
KK>  Kostya
KK> 

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.



