[Pw_forum] Nonlinear scaling with pool parallelization
Paolo Giannozzi
giannozz at democritos.it
Tue Apr 5 21:54:02 CEST 2011
On Apr 5, 2011, at 19:54 , Markus Meinert wrote:
> I used an _unshifted_ k-mesh
it doesn't matter if it is shifted or unshifted: only the number of
k-points matters for k-point parallelization.
> The slab has 20 k points.
20 k-points on 3 processors = 7+7+6: load balancing is not ideal.
This is likely to be a minor factor, though.
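
The 7+7+6 split above can be reproduced with a short sketch. This is only an illustration of the arithmetic: the function names are made up here, and the even "first pools get the extra point" rule is an assumption, not a statement about how pw.x actually assigns k-points to pools.

```python
def kpoints_per_pool(nkpts, npool):
    """Split nkpts k-points over npool pools as evenly as possible (assumed rule)."""
    base, extra = divmod(nkpts, npool)
    # The first `extra` pools get one additional k-point.
    return [base + 1 if i < extra else base for i in range(npool)]


def load_balance_efficiency(nkpts, npool):
    """Ideal work per pool divided by the work of the busiest pool."""
    counts = kpoints_per_pool(nkpts, npool)
    return (nkpts / npool) / max(counts)


print(kpoints_per_pool(20, 3))                    # [7, 7, 6]
print(round(load_balance_efficiency(20, 3), 3))   # 0.952
```

With 20 k-points on 3 pools the busiest pool carries 7 instead of the ideal 6.67 k-points, a loss of about 5%, which is consistent with this being a minor factor.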
> But, since a single iteration takes about 100 seconds, I do not
> see where the time is being spent, when the k points are independent.
you do not see because you do not know where to look. Not that it
is explained anywhere... have a look at the final report:
* the time spent in "c_bands" and called routines is proportional to the
  number of k-points, so it will scale linearly with the number of
  "k-point pools"
* the time spent in "sum_band" is only in part proportional to the
  number of k-points and will scale only partially
* the time spent in "v_of_rho", "newd", "mix_rho" is independent of the
  number of k-points and will not scale at all
* k-point parallelization does not reduce memory
* the rest is usually irrelevant
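
The three timing categories listed above can be turned into a rough estimate of what k-point pools can and cannot gain. The sketch below is a toy model, not anything taken from the code: the function name, the 60/20/20 second breakdown and the assumption that half of "sum_band" depends on the number of k-points are all hypothetical, chosen only so that the single-pool total matches the roughly 100 seconds per iteration mentioned in the question. Load imbalance is ignored.

```python
def estimated_time(npool, t_c_bands, t_sum_band, t_fixed, sum_band_kfrac=0.5):
    """Toy estimate of the time per SCF iteration with npool k-point pools."""
    # c_bands: proportional to the number of k-points, so divided by npool.
    scaled = t_c_bands / npool
    # sum_band: only the k-dependent fraction (assumed value) is divided by npool.
    partial = t_sum_band * (sum_band_kfrac / npool + (1.0 - sum_band_kfrac))
    # v_of_rho, newd, mix_rho: independent of the number of k-points, not divided.
    return scaled + partial + t_fixed


# Hypothetical single-pool timings in seconds (not measured values).
timings = dict(t_c_bands=60.0, t_sum_band=20.0, t_fixed=20.0)
t_one_pool = estimated_time(1, **timings)
for npool in (1, 2, 4):
    t = estimated_time(npool, **timings)
    print(npool, round(t, 1), round(t_one_pool / t, 2))
```

To apply this to a real run, the three inputs would be read from the timings printed in the final report; the larger the part that does not depend on k-points, the sooner the speedup saturates.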
Also note that
* FFT parallelization distributes most memory
* FFT parallelization speeds up (with varying efficiency) almost all
  routines, with the exception of "cdiaghg" or "rdiaghg"
* linear-algebra parallelization (which you are not using) will speed up
  "cdiaghg" or "rdiaghg" (though not always) and distribute more memory
All clear?
P.
---
Paolo Giannozzi, Dept of Chemistry&Physics&Environment,
Univ. Udine, via delle Scienze 208, 33100 Udine, Italy
Phone +39-0432-558216, fax +39-0432-558222