[Pw_forum] density files

Axel Kohlmeyer
Wed Feb 8 23:01:18 CET 2006

On Wed, 8 Feb 2006, Silviu Zilberman wrote:


Paolo Giannozzi wrote:
On Wednesday 08 February 2006 12:18, Silviu Zilberman wrote:
SZ> >
SZ> >   
SZ> >> I have been running calculations on Lemieux, which is an Alpha cluster
SZ> >> supercomputer in Pittsburgh. For some reason that is still mysterious
SZ> >> to me, writing these density files to the scratch space took a very long
SZ> >> time, ~30 (!) minutes for a 68MB file.
SZ> >>     
SZ> >
maybe you should rename your machine "Lepire" :-)

paolo, you should be careful with remarks like this.
people in pittsburgh take sports _very_ seriously and
don't like others making fun of the name of their idol 
from the pittsburgh penguins. you may get away with it 
since the steelers just won the super bowl last weekend...

SZ> > The charge density file should be written only when the wavefunctions
SZ> > are written, every "isave" steps and at the end of the run. If it is written
SZ> > at each time step, this is definitely wrong. 
SZ> >   
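The write schedule Paolo describes (density written together with the wavefunctions every `isave` steps, plus once at the end of the run) can be expressed as a small predicate. This is a minimal sketch, not the code's actual logic: the function name `should_write_density` is hypothetical, `isave` and `nsteps` are used here only as illustrative argument names, and steps are assumed to be numbered 1..`nsteps`.

```python
def should_write_density(step: int, isave: int, nsteps: int) -> bool:
    """Hypothetical helper: True if the charge density is dumped at `step`.

    Assumes steps are numbered 1..nsteps. The density is written every
    `isave` steps (if isave > 0) and always at the final step.
    """
    if step == nsteps:          # always write at the end of the run
        return True
    return isave > 0 and step % isave == 0


if __name__ == "__main__":
    # isave=100 in a 250-step run: writes at steps 100, 200 and 250.
    print([s for s in range(1, 251) if should_write_density(s, 100, 250)])
    # A very large isave leaves only the final write.
    print([s for s in range(1, 251) if should_write_density(s, 10**9, 250)])
```

Under this rule, raising `isave` can suppress the intermediate dumps but never the final one, so the end-of-run cost remains.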
SZ> The charge density is written only every isave steps, and I set isave to a 
SZ> very large number to avoid this time-consuming i/o. But even if I write 
SZ> it just once at the end of the calculation, it would still require ~90 
SZ> minutes for 3 files, which is completely crazy, given that the maximal 
SZ> time allocated per job is 12 hours on this supercomputer.
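For scale, here is a rough calculation from the numbers quoted in this thread. It takes MB as $10^6$ bytes and treats the "~30 minutes" per file as exact. That puts the effective write rate, and the share of a 12-hour job spent on three such files, at:

```latex
\[
  \frac{68 \times 10^{6}\ \mathrm{B}}{30 \times 60\ \mathrm{s}} \approx 3.8 \times 10^{4}\ \mathrm{B/s} \approx 38\ \mathrm{kB/s},
  \qquad
  \frac{3 \times 30\ \mathrm{min}}{12 \times 60\ \mathrm{min}} = \frac{90}{720} = 12.5\,\%.
\]
```

A rate of tens of kB/s is far below what a cluster scratch filesystem is normally expected to sustain. That gap points at the I/O pattern rather than at the amount of data.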

lemieux is an alpha and the dec compiler has the unfortunate property
of doing synchronous (unbuffered) i/o by default. this will have a disastrous
effect on a networked filesystem used by (too) many users.
please try compiling with '-assume buffered_io' and let me know
if that helps.
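The effect of buffering on the number of requests that reach the filesystem can be illustrated with Python's `io` module. This is only an analogy, not the DEC/Compaq Fortran runtime. The class `CountingRaw` and the function `write_records` are hypothetical, and each counted raw `write` stands in for one synchronous request to the (networked) file server. The record size and count are arbitrary.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts how many write calls reach it.

    Each call stands in for one synchronous request to the file server;
    the real per-request cost depends on the OS and the filesystem.
    """

    def __init__(self) -> None:
        super().__init__()
        self.calls = 0
        self.data = bytearray()

    def writable(self) -> bool:
        return True

    def write(self, b) -> int:
        self.calls += 1
        self.data += bytes(b)
        return len(b)


def write_records(stream, nrecords: int, record: bytes) -> None:
    """Write `nrecords` small records, then flush once at the end."""
    for _ in range(nrecords):
        stream.write(record)
    stream.flush()


record = b"\x00" * 80                       # one 80-byte record

# Unbuffered: every record goes straight to the raw stream.
unbuffered = CountingRaw()
write_records(unbuffered, 10_000, record)

# Buffered: records accumulate in a 1 MiB buffer and are written on flush.
raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=1 << 20)
write_records(buffered, 10_000, record)

print(unbuffered.calls, raw.calls)          # 10000 1
```

The same 800,000 bytes cost 10,000 requests without a buffer and a single request with one. When every request is a round trip to a busy NFS server, that difference dominates the run time.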

SZ> > The charge density in a parallel calculation is collected to a single node
SZ> > and written from there. Since it is not wise to collect it into a single
SZ> > array, each slice from each processor is collected and written. Maybe 
SZ> > this algorithm is not optimal (maybe it is even "pessimal"). You should 
SZ> > try to understand where exactly the machine spends all this time and
SZ> > why.
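The slice-by-slice strategy Paolo describes can be simulated in plain Python. One node writes the file, handling one processor's slice at a time, so it never allocates the full density array. This sketch uses no MPI. The "receive" of each slice is just a list lookup (in a real parallel code it would be a message from that processor). The function name `write_density_by_slices` and the three-slice layout are hypothetical.

```python
from array import array
from io import BytesIO
from typing import BinaryIO, Sequence


def write_density_by_slices(slices: Sequence[Sequence[float]], out: BinaryIO) -> int:
    """Write per-processor density slices in processor order.

    `slices[r]` stands in for the part of the density owned by processor r.
    The writer holds only one slice at a time: it "receives" the slice,
    writes it as raw doubles, and moves on. Returns the number of writes.
    """
    nwrites = 0
    for local in slices:                     # loop over processors in order
        buf = array("d", local)              # stand-in for receiving the slice
        out.write(buf.tobytes())             # write it immediately
        nwrites += 1
    return nwrites


# Three hypothetical processors owning slabs of different sizes.
slices = [[0.1, 0.2], [0.3, 0.4, 0.5], [0.6]]
out = BytesIO()
n = write_density_by_slices(slices, out)
full = array("d", [x for s in slices for x in s])
print(n, out.getvalue() == full.tobytes())   # 3 True
```

The file is identical to writing the gathered array in one go, but the number of separate writes grows with the number of slices. Combined with unbuffered i/o, each of those writes becomes its own synchronous request.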

another recommendation from the PSC staff is to use only 3 processors per 
node for the actual job (for memory-bandwidth-hungry jobs like DFT/PW/PP
codes, 3 per node is generally where performance tops out anyway), so that
there is some cpu capacity left for asynchronous operations (e.g. kernel
i/o and the MPI and NFS threads).
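A quick way to see what the 3-per-node recommendation costs in allocation is to compute the node count for a given number of MPI processes. This is a sketch that assumes 4 CPUs per node, as implied by "only use 3 processors per node". The function `plan_layout` and the 48-process example are hypothetical. How the placement is actually requested depends on the batch system and MPI launcher and is not shown.

```python
def plan_layout(nranks: int, cpus_per_node: int = 4, ranks_per_node: int = 3):
    """Return (nodes, idle_cpus) for placing `ranks_per_node` MPI processes
    on each node, leaving the remaining CPUs free for kernel i/o, MPI and NFS.
    """
    if not 0 < ranks_per_node <= cpus_per_node:
        raise ValueError("ranks_per_node must be in 1..cpus_per_node")
    nodes = (nranks + ranks_per_node - 1) // ranks_per_node   # ceiling division
    idle_cpus = nodes * cpus_per_node - nranks
    return nodes, idle_cpus


print(plan_layout(48))                     # (16, 16)
print(plan_layout(48, ranks_per_node=4))   # (12, 0)
```

Running the same 48 processes at 3 per node takes a third more nodes. If memory bandwidth already caps the fourth process per node, the loss in throughput is small, and the spare CPU per node can absorb the asynchronous work.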


SZ> >   
SZ> I may do it, but for the time being, these files are not very useful for 
SZ> me. I can change the code to respect the disk_io parameter again and 
SZ> avoid writing these files altogether. However, I would like to know 
SZ> first if there was some reasoning behind dumping these files by default 
SZ> without user control over it.
SZ> Thanks, Silviu.
