[Pw_forum] [Fwd: diagonalization failure (david, cg) for large numbers of bands]
Stefano de Gironcoli
degironc at sissa.it
Tue Dec 1 22:42:40 CET 2009
Dear Vivek Ranjan... (or Joseph Turnbull?),
I do not know if this comment is relevant... I never tried to
compute so large a fraction of the band structure.
How many plane waves does your basis set contain?
The default operation of the Davidson diagonalization is that the basis
set is expanded up to diago_david_ndim times (default = 4 times) the
number of required bands (nbnd)... are you asking for more than 1/4 of
the total number of elements in your basis set?
It seems to me that NPW in your case should be something of the
order of 8000, shouldn't it?
You could try using diago_david_ndim=2...
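
As a rough back-of-the-envelope check (only a sketch: it uses the
standard free-electron estimate N_PW ~ V * kmax^3 / (6 pi^2), with
kmax = sqrt(ecutwfc) in Rydberg atomic units, and it ignores the
gamma-only reduction):

  import math

  ecutwfc = 35.0             # wavefunction cutoff (Ry)
  a = 20.52                  # cubic lattice parameter (bohr)
  nbnd = 3328                # requested bands
  V = a**3                   # cell volume (bohr^3)
  kmax = math.sqrt(ecutwfc)  # cutoff wavevector in bohr^-1 (Ry units: E = k^2)

  # free-electron estimate of the number of plane waves below the cutoff
  npw = V * kmax**3 / (6.0 * math.pi**2)

  for ndim in (4, 2):        # diago_david_ndim
      work = ndim * nbnd     # dimension of the expanded Davidson subspace
      print(f"NPW ~ {npw:.0f}, ndim*nbnd = {work}, "
            f"fraction of basis = {work / npw:.2f}")

If ndim*nbnd approaches NPW, the expanded subspace is no longer small
compared to the basis set, its vectors can become nearly linearly
dependent, and the overlap matrix may lose positive definiteness -
one plausible route to a Cholesky failure.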
stefano
Quoting Vivek Ranjan <vranjan at ncsu.edu>:
> Hi,
>
> ## Summary: Running pw.x on 128-1024 processors, testing a bulk 64-atom Si
> cell at gamma (gamma-only tricks not used because of incompatibility with
> subsequent calculations) with a "large" number of (extra) bands. No problems
> are reported when nbnd is small. With 128-256 processors and nbnd > 1300, the
> Davidson diagonalization exits before completing one scf step with a Cholesky
> decomposition failure; the iterative (cg) diagonalization fails at the same
> stage with the error "(ZHEGV*) failed". The system is a Cray XT4.
>
> ## Purpose: reproducing the beautiful results of Physical Review B 79,
> 201104 (2009), for GWW education purposes. :)
>
> ## Background: I have found similar-looking problems reported here, and have
> tried several of the recommendations (switching to ndiag 1 at runtime to use
> serial diagonalization instead of parallel; switching from david to cg).
>
> In addition, I have tried increasing the PW cutoff (to provide more PWs
> relative to the requested bands, for the sake of the Davidson
> diagonalization), but this does not really help.
>
> I also attempted a regular SCF calculation with no nbnd specification,
> followed by an NSCF calculation with the extra bands specified. The same
> errors are obtained.
>
> ## Current status: I am now trying to rule out memory-related errors (by
> running on more nodes), and will update this thread accordingly if the
> problem turns out to be related to memory requirements. Running on 512
> processors permitted nbnd=2500 (converged results should require ~3300 bands
> for this particular calculation, according to my understanding of the noted
> paper), and I have some 1024-processor runs queued up.
>
> It does not seem to me that such a system, even with so many states, should
> have such large memory demands, so I am wondering if I am doing something
> stupendously wrong (or perhaps not exactly doing something wrong, but failing
> to do something glaringly obvious that would solve the problem). Below is my
> input file, followed by some brief technical specs in case they are helpful.
>
> ## Sample input file:
>
> &control
> calculation='scf'
> restart_mode='from_scratch',
> prefix='si'
> outdir='/scr/josepht/espresso/bsi64/Large_GAMMA/STEP_B/tmp'
> pseudo_dir='/scr/josepht/espresso/bsi64/pseudo'
> /
> &system
> ibrav= 8,
> celldm(1)= 20.52,
> celldm(2)= 1,
> celldm(3)=1,
> nat= 64,
> ntyp= 1,
> ecutwfc = 35.0,
> nosym=.true.
> nbnd = 3328,
> /
> &electrons
> diagonalization='david',
> conv_thr = 1.0d-8,
> mixing_beta = 0.5,
> /
> ATOMIC_SPECIES
> Si 1. Si.pbe-rrkj.UPF
> ATOMIC_POSITIONS (bohr)
> Si 0.00000000 0.00000000 0.00000000
> Si 5.13000000 5.13000000 0.00000000
> Si 0.00000000 5.13000000 5.13000000
> Si 5.13000000 0.00000000 5.13000000
> Si 2.56500000 2.56500000 2.56500000
> Si 7.69500000 7.69500000 2.56500000
> Si 7.69500000 2.56500000 7.69500000
> Si 2.56500000 7.69500000 7.69500000
> Si 10.26000000 0.00000000 0.00000000
> Si 15.39000000 5.13000000 0.00000000
> Si 10.26000000 5.13000000 5.13000000
> Si 15.39000000 0.00000000 5.13000000
> Si 12.82500000 2.56500000 2.56500000
> Si 17.95500000 7.69500000 2.56500000
> Si 17.95500000 2.56500000 7.69500000
> Si 12.82500000 7.69500000 7.69500000
> Si 0.00000000 10.26000000 0.00000000
> Si 5.13000000 15.39000000 0.00000000
> Si 0.00000000 15.39000000 5.13000000
> Si 5.13000000 10.26000000 5.13000000
> Si 2.56500000 12.82500000 2.56500000
> Si 7.69500000 17.95500000 2.56500000
> Si 7.69500000 12.82500000 7.69500000
> Si 2.56500000 17.95500000 7.69500000
> Si 0.00000000 0.00000000 10.26000000
> Si 5.13000000 5.13000000 10.26000000
> Si 0.00000000 5.13000000 15.39000000
> Si 5.13000000 0.00000000 15.39000000
> Si 2.56500000 2.56500000 12.82500000
> Si 7.69500000 7.69500000 12.82500000
> Si 7.69500000 2.56500000 17.95500000
> Si 2.56500000 7.69500000 17.95500000
> Si 10.26000000 10.26000000 0.00000000
> Si 15.39000000 15.39000000 0.00000000
> Si 10.26000000 15.39000000 5.13000000
> Si 15.39000000 10.26000000 5.13000000
> Si 12.82500000 12.82500000 2.56500000
> Si 17.95500000 17.95500000 2.56500000
> Si 17.95500000 12.82500000 7.69500000
> Si 12.82500000 17.95500000 7.69500000
> Si 10.26000000 0.00000000 10.26000000
> Si 15.39000000 5.13000000 10.26000000
> Si 10.26000000 5.13000000 15.39000000
> Si 15.39000000 0.00000000 15.39000000
> Si 12.82500000 2.56500000 12.82500000
> Si 17.95500000 7.69500000 12.82500000
> Si 17.95500000 2.56500000 17.95500000
> Si 12.82500000 7.69500000 17.95500000
> Si 0.00000000 10.26000000 10.26000000
> Si 5.13000000 15.39000000 10.26000000
> Si 0.00000000 15.39000000 15.39000000
> Si 5.13000000 10.26000000 15.39000000
> Si 2.56500000 12.82500000 12.82500000
> Si 7.69500000 17.95500000 12.82500000
> Si 7.69500000 12.82500000 17.95500000
> Si 2.56500000 17.95500000 17.95500000
> Si 10.26000000 10.26000000 10.26000000
> Si 15.39000000 15.39000000 10.26000000
> Si 10.26000000 15.39000000 15.39000000
> Si 15.39000000 10.26000000 15.39000000
> Si 12.82500000 12.82500000 12.82500000
> Si 17.95500000 17.95500000 12.82500000
> Si 17.95500000 12.82500000 17.95500000
> Si 12.82500000 17.95500000 17.95500000
> K_POINTS
> 1
> 0.0 0.0 0.0 1.0
>
> ##END OF INPUT
>
> The above file runs when nbnd = 1280, and (possibly) relevant output from the
> successful run includes:
>
>      Each subspace H/S matrix     400.00 Mb     ( 5120, 5120)
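>
> (As a quick sanity check on that number - only a sketch, assuming complex
> double-precision matrix elements of 16 bytes each, which reproduces the
> 400 Mb reported above:)
>
>   # memory of one (diago_david_ndim * nbnd)^2 Davidson subspace matrix, in Mb
>   def subspace_mb(nbnd, ndim=4):
>       n = ndim * nbnd              # leading dimension of the subspace matrix
>       return n * n * 16 / 1024**2  # complex double = 16 bytes per element
>
>   print(subspace_mb(1280))  # 400.0 Mb, matching the output above
>   print(subspace_mb(3328))  # 2704.0 Mb per copy at nbnd = 3328
>
> If each MPI task keeps its own copy (for example with serial diagonalization,
> ndiag 1), a few such matrices per 16 GB node add up quickly.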
>
> ## Technical specs: The code was compiled on a Cray XT4 (I am unsure whether
> the compilation details would be helpful), and runs were performed on Cray
> XT4 nodes, each with two quad-core 2.3 GHz AMD Opteron processors and 16 GB
> of usable memory (requesting 4 cores per node).
>
> I've read here that the problem might be related to libraries/compilers
> (issues with PGI, ACML, et cetera)... if that is likely the case, I would be
> interested in insight regarding optimal compilation on Cray.
>
> Thanks in advance for any assistance, and I apologize if this question has
> essentially already been answered on the forum - I searched but did not come
> across an explicit solution to something matching this, though I admit that
> the general theme is present in several independent threads.
>
> Joseph Turnbull
> Department of Physics
> NC State University
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://www.democritos.it/mailman/listinfo/pw_forum
>