[Pw_forum] hard drive becomes read only during parallel QE 4.1.1 run with openmpi-1.3.3 intelv11 compilers
Derek Stewart
stewart at cnf.cornell.edu
Thu Oct 22 17:47:36 CEST 2009
Hi everyone,
I have recently been trying out the new version of QE (4.1.1) with different MPI libraries (mpich, openmpi). I came across a strange problem: a QE run for the nickel test case (large 48x48x48 k-mesh) crashes when I run it in parallel across 5 nodes (10 processors) with openmpi-1.3.3 and the Intel v11 compilers/MKL 10.2. It appears that the hard drive on the last node becomes read-only and QE can no longer write to the local wavefunction files.
After the run, the local scratch drive remains read-only, and I end up having to reboot the system to clear the problem. I have been able to reproduce this on the same node. However, when I remove this node from the list, QE runs fine. Also, running QE with mpich2 on that node does not trigger the problem.
I suspect that it could be a hardware issue (a hard drive close to dying, perhaps) or an issue with openmpi, but I wanted to check whether anyone else has run into this problem while using QE.
For additional technical info, this is on a system with Red Hat Enterprise Linux 4, 2 Xeon processors (3 GHz), and 2 GB of RAM.
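One way to confirm which file system has gone read-only on a node, without waiting for QE to fail, is to inspect the mount options listed in /proc/mounts. The following is only a minimal sketch: the function name read_only_mounts and the sample mount table (including the /scratch mount point and device names) are hypothetical and not taken from the system described above. Linux file systems such as ext3 can be remounted read-only by the kernel after I/O errors when mounted with errors=remount-ro, so a scratch partition that shows "ro" here would be consistent with, but not proof of, a failing disk.

```python
def read_only_mounts(mounts_text):
    """Return the mount points whose option list contains 'ro'.

    mounts_text is expected to follow the /proc/mounts layout:
    device, mount point, fs type, comma-separated options, dump, pass.
    """
    ro = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point = fields[1]
        options = fields[3].split(",")
        if "ro" in options:
            ro.append(mount_point)
    return ro


# Hypothetical sample table; on a real node, read the text from /proc/mounts.
SAMPLE = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sdb1 /scratch ext3 ro,data=ordered 0 0
"""

print(read_only_mounts(SAMPLE))
```

On a compute node the same function could be fed open("/proc/mounts").read() before and after the parallel run to see whether the scratch partition changed state.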
Thanks,
Derek
################################
Derek Stewart, Ph. D.
Scientific Computation Associate
** New Webpage **
http://sites.google.com/site/dft4nano/
250 Duffield Hall
Cornell Nanoscale Facility (CNF)
Ithaca, NY 14853
stewart (at) cnf.cornell.edu
(607) 255-2856