<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><br><div><div>On 21 Jan 2011, at 21:11, Yuyang Zhang wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Dear All,<div><br></div><div>I am currently running a calculation on a system with 2300 electrons in a 20&times;22&times;25 &Aring;<sup>3</sup> supercell.</div><div><br></div><div>I always submit jobs to a Blue Gene-type machine with 512 CPUs (Power 470) connected by InfiniBand.</div>

<div><br></div><div>In VASP, a single SCF step takes about 150 s (spin-polarized, Gamma point only, 400 eV energy cutoff).</div><div><br></div><div>In PWSCF, with a 25 Ry energy cutoff, a single SCF step takes 3000 s (spin-polarized, Gamma point only, -npool=2).</div>

<div><br></div><div>I checked the section on massively parallel calculations in the manual and the archive of this mailing list, and tried using the -ntg option when submitting the jobs, but saw no significant improvement.</div></blockquote><div><br></div><div>In that section you should also have found this:</div><div><p style="font-family: Times; ">Since v.4.1, ScaLAPACK can be used to diagonalize block-distributed matrices, yielding better speed-up than the default algorithms for large (&gt; 1000) matrices when using a large number of processors (&gt; 512). If you want to test ScaLAPACK, use&nbsp;<tt>configure --with-scalapack</tt>. This will add&nbsp;<tt>-D__SCALAPACK</tt>&nbsp;to DFLAGS in&nbsp;<tt>make.sys</tt>&nbsp;and set LAPACK_LIBS to something like:</p><span class="Apple-style-span" style="font-family: Times; "><pre>    LAPACK_LIBS = -lscalapack -lblacs -lblacsF77init -lblacs -llapack</pre></span><div><br></div><div>Are you using parallel diagonalization with ScaLAPACK?</div><div><br></div><div>GS</div></div><br><blockquote type="cite"><div><br></div><div>There should be no reason for PWSCF to run 20 times slower than VASP. Does anyone have experience improving the parallel efficiency for such large systems?</div>

<div><br></div><div>Best,<br clear="all"><br>Yuyang Zhang</div><div><br></div><div>Nanoscale Physics and Devices Laboratory<br>Institute of Physics, Chinese Academy of Sciences<br>Beijing 100190, P. R. China<br><br><br>

</div>

_______________________________________________<br>Pw_forum mailing list<br><a href="mailto:Pw_forum@pwscf.org">Pw_forum@pwscf.org</a><br>http://www.democritos.it/mailman/listinfo/pw_forum<br></blockquote></div><br><div>
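<div>For reference, a minimal sketch of a launch line combining these options, assuming a generic <tt>mpirun</tt> launcher (the Blue Gene launcher and its syntax may differ) and a pw.x built with <tt>-D__SCALAPACK</tt>. The <tt>-ndiag</tt> option (size of the linear-algebra group used by ScaLAPACK, which should be a square number no larger than the number of processors per pool) is an assumption to check against the documentation of your version; the <tt>-ntg</tt> value and the file names are purely illustrative:</div><pre>    # 512 MPI tasks split into 2 pools of 256 (one per spin channel, as with -npool 2),
    # 4 FFT task groups per pool, and a 16x16 = 256-task ScaLAPACK grid for diagonalization
    mpirun -np 512 pw.x -npool 2 -ntg 4 -ndiag 256 &lt; pw.in &gt; pw.out</pre><div>The idea is that the pools distribute the two spin channels, task groups spread the FFTs over more processors, and the ScaLAPACK grid handles the large subspace matrices that otherwise become the bottleneck for a system with this many electrons.</div><div><br></div>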

<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-size: medium; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; "><div><span class="Apple-style-span" style="color: rgb(126, 126, 126); font-size: 16px; font-style: italic; "><br class="Apple-interchange-newline">§ Gabriele Sclauzero,&nbsp;EPFL SB ITP CSEA</span></div><div><font class="Apple-style-span" color="#7E7E7E"><i>&nbsp;&nbsp; PH H2 462, Station 3,&nbsp;CH-1015 Lausanne</i></font></div></span>

</div>

<br></body></html>