[Pw_forum] hard drive becomes read only during parallel QE 4.1.1 run with openmpi-1.3.3 intelv11 compilers

Derek Stewart stewart at cnf.cornell.edu
Thu Oct 22 17:47:36 CEST 2009


Hi everyone,

I have recently been trying out the new version of QE (4.1.1) with different MPI libraries (mpich, openmpi, etc.). I came across a strange problem where a QE run for a nickel test case (large k-mesh, 48x48x48) would crash when I ran it in parallel across 5 nodes (10 processors) with openmpi-1.3.3 and the Intel v11 compilers/MKL 10.2. It appears that the hard drive on the last node becomes read-only and QE can no longer write to the local wavefunction files.

After the run, the local scratch drive remains read-only and I end up having to reboot the system to clear the problem. I have been able to reproduce this on the same node. However, when I remove this node from the list, QE runs fine. Also, running QE with mpich2 on that node does not trigger the problem.
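
One way to confirm from the affected node that the scratch filesystem really has been remounted read-only (rather than QE hitting a permissions problem) is to look at the mount options in /proc/mounts. The following is only a minimal sketch in Python; the function name and the /scratch path in the sample text are made up for illustration and are not taken from my setup. Filesystems such as ext3 can be mounted with errors=remount-ro, in which case the kernel switches the filesystem to read-only after an I/O error, which would be one possible explanation for this behavior.

```python
import os


def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains the bare 'ro' flag.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        # 'errors=remount-ro' is a different option and does not match here
        if "ro" in options:
            result.append(mount_point)
    return result


if __name__ == "__main__" and os.path.exists("/proc/mounts"):
    with open("/proc/mounts") as handle:
        for mount_point in read_only_mounts(handle.read()):
            print("read-only:", mount_point)
```

Running this on the node before and after the failing job would show whether the scratch mount point changes from rw to ro during the run.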

I suspect that it could be a hardware issue (a hard drive close to dying, perhaps) or an issue with openmpi, but I wanted to check whether anyone else has run into this problem while using QE.

For additional technical info, this is on a system with Red Hat Enterprise Linux 4, 2 Xeon processors (3 GHz), and 2 GB of RAM.

Thanks,

Derek

 

################################
Derek Stewart, Ph. D.
Scientific Computation Associate
** New Webpage **
http://sites.google.com/site/dft4nano/
250 Duffield Hall
Cornell Nanoscale Facility (CNF)
Ithaca, NY 14853
stewart (at) cnf.cornell.edu
(607) 255-2856



