[Pw_forum] k-points parallelization in pwscf 4.2.1

Gabriele Sclauzero sclauzer at sissa.it
Mon Feb 14 14:49:20 CET 2011


Dear Davide,

  it might be a memory-contention problem, since CPU cache sizes are of the order of a few MB. Please provide the detailed timings printed at the end of the runs (they are the first thing one should look at in order to interpret this kind of speedup test).
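
As a rough sketch of the arithmetic behind this (not a measurement): with k-point pools every process holds its own full copy of the arrays, so one can sum the estimated array sizes from the output quoted further down and compare them with the cache. The cache size used here is a hypothetical placeholder, and the entries marked "Each ..." are per-array sizes, so the sum is only a lower bound.

```python
# Estimated array sizes in MB, copied from the pw.x output quoted in this thread.
# "Each ..." entries are per-array sizes, so the total is a lower bound.
ARRAYS_MB = {
    "Kohn-Sham Wavefunctions": 5.68,
    "NL pseudopotentials": 13.33,
    "Each V/rho on FFT grid": 7.81,
    "Each G-vector array": 1.76,
    "G-vector shells": 0.09,
    "Auxiliary wavefunctions": 22.73,
    "Each subspace H/S matrix": 0.82,
    "Each <psi_i|beta_j> matrix": 0.12,
    "Arrays for rho mixing": 62.50,
}

# Hypothetical cache size ("a few MB"); replace with the value for your CPU.
ASSUMED_CACHE_MB = 8.0


def per_process_mb(arrays):
    """Sum of the listed array sizes for a single process."""
    return sum(arrays.values())


def total_mb(arrays, npools):
    """k-point pools replicate the data: total = npools * (serial size)."""
    return npools * per_process_mb(arrays)


if __name__ == "__main__":
    one = per_process_mb(ARRAYS_MB)
    six = total_mb(ARRAYS_MB, 6)
    print(f"per process: {one:.2f} MB")
    print(f"6 pools:     {six:.2f} MB")
    print(f"per process / assumed cache: {one / ASSUMED_CACHE_MB:.1f}x")
```

The point is only that the data touched by each process may be much larger than the cache even when it is a small fraction of the node's RAM, so six processes on one node could compete for cache and memory bandwidth. Whether that is what happens here can only be judged from the timings.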

Next time, please take a few seconds to sign your post with your full name and affiliation.

Regards,

GS


On 14 Feb 2011, at 12:18, Davide Sangalli wrote:

> Thank you for the answer.
> 
> I did a check to be sure, but these jobs use only a few MB of memory.
> The serial run uses just 2.5% of my node memory (so around 15% for the 
> run on 6 CPUs).
> It does not seem to me that this could be the problem.
> Moreover, in the FFT parallelization the memory was not distributed either.
> 
> Is it possible that pwscf is not properly compiled?
> Is there any other check that you would suggest doing?
> 
> Best regards,
> Davide
> 
> *************************************
>      Largest allocated arrays     est. size (Mb)     dimensions
>         Kohn-Sham Wavefunctions         5.68 Mb     (   6422,  58)
>         NL pseudopotentials            13.33 Mb     (   6422, 136)
>         Each V/rho on FFT grid          7.81 Mb     ( 512000)
>         Each G-vector array             1.76 Mb     ( 230753)
>         G-vector shells                 0.09 Mb     (  12319)
>      Largest temporary arrays     est. size (Mb)     dimensions
>         Auxiliary wavefunctions        22.73 Mb     (   6422, 232)
>         Each subspace H/S matrix        0.82 Mb     (    232, 232)
>         Each <psi_i|beta_j> matrix      0.12 Mb     (    136,  58)
>         Arrays for rho mixing          62.50 Mb     ( 512000,   8)
>      writing wfc files to a dedicated directory
> 
> 
> On 02/14/2011 11:34 AM, Paolo Giannozzi wrote:
>> Davide Sangalli wrote:
>> 
>>> What could my problem be?
>> the only reason I can think of is that k-point parallelization doesn't
>> (and cannot) distribute memory, so the total memory requirement will
>> be npools*(size of serial execution). If you run on the same node six
>> instances of a large executable, memory conflicts may slow down more
>> than parallelization can speed up.
>> 
>> P.


§ Gabriele Sclauzero, EPFL SB ITP CSEA
   PH H2 462, Station 3, CH-1015 Lausanne
