[Pw_forum] IBM JS20 pw tests problems

Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu
Thu Nov 20 23:12:40 CET 2008


On Thu, 20 Nov 2008, Lazaro Calderin wrote:

LC> Hello everyone,

hello lazaro,

LC> I am getting some large discrepancies and errors when executing pw tests
LC> on a couple of IBM clusters. Here is a report for the first one.

actually, the discrepancies are not _that_ large, and are
most likely due to the fact that the reference was run with
fftw as FFT library, while you are using ESSL. fftw supports
many more FFT grid sizes than ESSL, so with ESSL the grid
may be rounded up to a different size, and as a consequence
you get differences in energies. try compiling without ESSL
and against the shipped internal fft and see if you still
have those differences...
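
as a rough illustration of why the allowed grid sizes matter, here
is a minimal python sketch. the two rules in it are assumptions for
the sake of the example, not the exact rules used by the code or by
either library: "generic" accepts any product of powers of 2, 3, 5,
7 and 11, while "restricted" is modelled on ESSL-style limits (at
least one factor of 2, at most two factors of 3, at most one each
of 5, 7 and 11). all function names are hypothetical.

```python
# sketch: how a restricted set of FFT lengths changes the grid size.
# both rules are ASSUMPTIONS for illustration; check the library
# documentation for the rule that applies to your installation.

PRIMES = (2, 3, 5, 7, 11)


def factor_powers(n):
    """Return ([powers of 2, 3, 5, 7, 11], leftover factor) for n >= 1."""
    powers = []
    for p in PRIMES:
        count = 0
        while n % p == 0:
            n //= p
            count += 1
        powers.append(count)
    return powers, n


def allowed_generic(n):
    """Assumed permissive rule: any product of powers of the small primes."""
    _, rest = factor_powers(n)
    return rest == 1


def allowed_restricted(n):
    """Assumed ESSL-like rule: 2^h 3^i 5^j 7^k 11^m, h>=1, i<=2, j,k,m<=1."""
    (h, i, j, k, m), rest = factor_powers(n)
    return rest == 1 and h >= 1 and i <= 2 and j <= 1 and k <= 1 and m <= 1


def next_allowed(n_min, rule):
    """Smallest n >= n_min that the given rule accepts."""
    if n_min < 1:
        raise ValueError("n_min must be a positive integer")
    n = n_min
    while not rule(n):
        n += 1
    return n


if __name__ == "__main__":
    # minimum sizes here are arbitrary examples, not taken from the tests
    for n_min in (25, 45, 81):
        print(n_min,
              next_allowed(n_min, allowed_generic),
              next_allowed(n_min, allowed_restricted))
```

whenever the two columns differ, the two builds run on different
real-space grids for the same cutoff, and small differences in the
total energy are to be expected.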

LC> 
LC> This is the arch.:
LC> 
LC> Machine:     IBM JS20 Blade Center
LC> OS:          Linux 2.6.5-7.244-pseries64
LC> Compilers:   xlf suite 10.1
LC> 
LC> espresso 4.0.3, configured with ARCH = ppc64, make.sys file modified to
LC> account for missing stuff (an example for the serial version is attached).
LC> 
LC> All parallel and serial versions, with or without optimization, give the
LC> same errors or discrepancies in the pw tests suite. The same errors are

which confirms my comments about the fft grids causing the
energy differences.

LC> also reproduced when using espresso's blas and lapack. A log of the pw
LC> tests is attached.
LC> 
LC> Main problems:
LC> Apart from some large discrepancies these are the most colorful errors:
LC> 
LC> - berry tests ask to increase ecutrho
LC> - paw-atom_l=2 blows up with a segmentation fault

those need to be investigated in more detail. the ecutrho
thing may be related to the grid sizes (check and compare
the two output files for the fft grids). the paw problem
should not happen, but this is rather new code. since you
get a segfault, it would be helpful if you could trigger
a coredump and provide a stack trace, so that the author
of the code can have a closer look...
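
one possible way to get a core file is sketched below, assuming a
unix-like system where the core size limit is what prevents the
dump. the wrapper name and the way it is invoked are hypothetical;
it only raises the soft core limit to the hard limit, runs the
given command, and reports whether it was killed by a signal.

```python
# sketch: run a command with core dumps enabled as far as the system
# allows. ASSUMPTION: the soft RLIMIT_CORE limit is what blocks the dump;
# where the core file ends up depends on the system configuration.
import resource
import signal
import subprocess
import sys


def run_with_core_dumps(cmd):
    """Raise the core-file size limit, run cmd, return its exit status."""
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # an unprivileged process may raise its soft limit up to the hard limit;
    # child processes inherit the new limit
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))

    # stdin/stdout are inherited, so input redirection still works
    result = subprocess.run(cmd)

    if result.returncode < 0:
        # a negative return code means the child was killed by a signal
        signum = -result.returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        print(f"command terminated by signal {signum} ({name})",
              file=sys.stderr)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    status = run_with_core_dumps(sys.argv[1:])
    # map "killed by signal" to a conventional non-zero exit code
    sys.exit(status if status >= 0 else 1)
```

if a core file is written, loading the executable together with
the core file into a debugger such as gdb and issuing "bt" should
print the stack trace. a build with debug symbols (-g) makes the
trace much more useful.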

LC> I thought that the tests may be outdated but they work fine on Linux/Intel
LC> Clusters Tool/Mkl.
LC> 
LC> I could not find any reports on these problems.
LC> The segmentation fault may be a compiler problem but what about all 
LC> the rest.

true, but making code work with multiple compilers is usually
a good thing. some compilers are overly forgiving towards
some programming practices that are not 100% standard. it is
always helpful to track these issues down...

cheers,
   axel.

LC> 
LC> Thanks,
LC> Lazaro
LC> 
LC> 

-- 
=======================================================================
Axel Kohlmeyer   akohlmey at cmm.chem.upenn.edu   http://www.cmm.upenn.edu
   Center for Molecular Modeling   --   University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582,  fax: 1-215-573-6233,  office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.

