[Pw_forum] memory problem

Prasenjit Ghosh prasenjit.jnc at gmail.com
Tue Nov 4 23:32:27 CET 2008


Hi everybody,

While running a relaxation calculation with pw.x (v.4.0) on 80 processors,
the dynamical memory required by the code keeps increasing. If I grep the
lines of the output file that contain the information about memory usage,
I get the following output:

     per-process dynamical memory:   334.6 Mb
     per-process dynamical memory:   498.3 Mb
     per-process dynamical memory:   498.3 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:   507.1 Mb
     per-process dynamical memory:  1867.8 Mb
     per-process dynamical memory:  1867.8 Mb
     per-process dynamical memory:  1867.8 Mb
     per-process dynamical memory:  1867.8 Mb
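
To see at which reports the value changes, the grepped lines can be scanned with a short script. This is only a minimal sketch: the function names are made up for illustration, and the sample list is a shortened copy of the listing above with the repeated values trimmed.

```python
import re

# Matches lines such as "     per-process dynamical memory:   334.6 Mb"
MEMORY_LINE = re.compile(r"per-process dynamical memory:\s*([0-9.]+)\s*Mb")

def memory_values(lines):
    """Return the reported per-process memory values (in Mb), in order."""
    values = []
    for line in lines:
        match = MEMORY_LINE.search(line)
        if match:
            values.append(float(match.group(1)))
    return values

def memory_jumps(values):
    """Return (index, previous, current) wherever the reported value changes."""
    return [(i, values[i - 1], values[i])
            for i in range(1, len(values)) if values[i] != values[i - 1]]

# Shortened copy of the listing above (repeated values trimmed).
sample = [
    "     per-process dynamical memory:   334.6 Mb",
    "     per-process dynamical memory:   498.3 Mb",
    "     per-process dynamical memory:   498.3 Mb",
    "     per-process dynamical memory:   507.1 Mb",
    "     per-process dynamical memory:   507.1 Mb",
    "     per-process dynamical memory:  1867.8 Mb",
]

values = memory_values(sample)
jumps = memory_jumps(values)
for index, previous, current in jumps:
    print(f"report {index + 1}: {previous} Mb -> {current} Mb")
```

For a real run, the same functions can be fed the open output file instead of the sample list, e.g. memory_values(open("relax.out")), where relax.out is a placeholder file name.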

After this the job is killed with the following error message:

Oct 31 13:49:45 2008 13128 4 7.02 handleTSRegisterTerm(): TS reports task
<0> pid <30753> on host<node0833> killed or core dumped

My system consists of a cluster of 147 atoms (516 electrons) in a
47.3x47.3x51.6 bohr box. I'm using a wavefunction cutoff of 25 Ry and a
charge density cutoff of 210 Ry. For ion_dynamics I'm using bfgs, and for
the electrons I'm using the Davidson diagonalization scheme.
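
For orientation, the size of one copy of the wavefunctions can be estimated from these numbers with the usual plane-wave count N_pw ~ V * k_cut^3 / (6 * pi^2), where k_cut = sqrt(E_cut) in Rydberg atomic units. This is a rough sketch under assumptions the post does not state: an orthorhombic cell, a single k-point, only the 258 doubly occupied bands, and 16 bytes per complex coefficient. It is not how pw.x itself measures or reports memory.

```python
import math

# Values taken from the post.
box_bohr = (47.3, 47.3, 51.6)   # cell edges in bohr
ecut_wfc_ry = 25.0              # wavefunction cutoff in Ry
n_electrons = 516
n_processes = 80

# Assumptions (not stated in the post): orthorhombic cell, one k-point,
# doubly occupied bands only, 16 bytes per complex coefficient.
volume = box_bohr[0] * box_bohr[1] * box_bohr[2]
k_cut = math.sqrt(ecut_wfc_ry)              # in 1/bohr, since E = k^2 in Ry units
n_pw = volume * k_cut**3 / (6 * math.pi**2) # plane waves inside the cutoff sphere
n_bands = n_electrons // 2

wfc_bytes_total = n_pw * n_bands * 16
wfc_mb_per_process = wfc_bytes_total / n_processes / 1024**2

print(f"estimated plane waves: {n_pw:.0f}")
print(f"one wavefunction copy per process: {wfc_mb_per_process:.1f} Mb")
```

The estimate covers a single copy of the wavefunctions only. The Davidson work arrays and the real-space grids set by the 210 Ry density cutoff come on top of it, so the actual footprint is expected to be considerably larger.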

The machine details are:

Model: IBM BCX/5120
Architecture: eServer e326 Cluster Opteron
Processor Type: Opteron Dual Core 2.6 GHz
Number of Nodes: 1280 (4 cores per node)
Number of Processors/cores: 2560/5120
Memory: 8 GB/node
Internal Network: Infiniband (5Gb/s)
Disk Space: 100 TB + SAN
Operating System: Red Hat RHEL4
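
As a rough check of whether memory could be the cause, the last reported value can be compared with what a node offers. The sketch below reads "Mb" as megabytes and assumes one MPI process per core, so that four processes share each 8 GB node; the post does not say how the 80 processes were actually placed.

```python
# Figures taken from the post; "Mb" in the pw.x output is read as megabytes.
node_memory_mb = 8 * 1024     # 8 GB per node
cores_per_node = 4            # from the machine description
last_report_mb = 1867.8       # final per-process value before the job was killed

# Assumption: one MPI process per core, i.e. 4 processes share each node.
processes_per_node = cores_per_node

share_per_process_mb = node_memory_mb / processes_per_node
node_usage_mb = last_report_mb * processes_per_node
headroom_mb = node_memory_mb - node_usage_mb

print(f"share per process: {share_per_process_mb:.0f} Mb")
print(f"reported usage per node: {node_usage_mb:.1f} Mb")
print(f"left for everything else: {headroom_mb:.1f} Mb")
```

Under that assumption the reported dynamical memory alone takes most of a node, leaving little room for the operating system, MPI buffers, and anything the reported figure does not count.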

The code has been compiled using openmpi--1.2.5--intel--10.1

Can anyone please let me know why the job is getting killed? Is it due to
a memory problem?
For the same job, why does the memory requirement keep increasing?

Also, can you please let me know how the code calculates the memory
requirement? I tried to look into the clib/memstat.c file but couldn't make
much of it because I'm not familiar with the C programming language.

With regards,
Prasenjit

-- 
PRASENJIT GHOSH,
POST-DOC,
ROOM NO: 265, MAIN BUILDING,
CM SECTION, ICTP,
STRADA COSTERIA 11,
TRIESTE, 34104,
ITALY
PHONE: +39 040 2240 369 (O)
             +39 3807528672 (M)