[Pw_forum] MPI & disk unavailability
Gerardo Ballabio
g.ballabio at cineca.it
Mon Nov 28 18:34:54 CET 2005
On 11/28/2005 06:01:43 PM, Konstantin Kudin wrote:
> I have the following question. Occasionally on the cluster I am
> running Espresso (mostly cp.x) the storage becomes temporarily
> unavailable.
>
> This causes the jobs to die. It seems that while the output file
> can sometimes wait for the storage, the restart files absolutely
> cannot.
> So my question is whether it is possible to make MPI wait for disk
> availability when writing the restart files.
I suspect this is an operating-system-level issue. When the operating
system "talks" to the disk and doesn't get a reply, it can either
keep waiting or give up and report failure. There is probably a
configurable timeout. If it's a networked filesystem, the filesystem
daemons are most likely responsible for that.
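If the timeout cannot be changed, a program can in principle retry on
its own. This only helps in the "give up and report failure" case; a
write that simply hangs inside the operating system never returns
control to the program. Below is a minimal sketch of the idea in
Python, not Espresso's actual Fortran code. The function name
write_with_retry and its parameters are hypothetical, chosen only for
illustration.

```python
import os
import time


def write_with_retry(path, data, attempts=5, delay=1.0, sleep=time.sleep):
    """Write bytes to path, retrying when the OS reports an I/O failure.

    Hypothetical helper: returns the number of the attempt that succeeded.
    """
    tmp_path = path + ".tmp"
    for attempt in range(1, attempts + 1):
        try:
            # Write to a temporary file first, so that a failed attempt
            # never leaves a truncated restart file behind.
            with open(tmp_path, "wb") as f:
                f.write(data)
                f.flush()
                os.fsync(f.fileno())  # ask the OS to push the data to storage
            os.replace(tmp_path, path)  # rename over the final name
            return attempt
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: report the failure to the caller
            sleep(delay)  # wait for the storage to come back, then retry
```

Writing to a temporary name and renaming afterwards is a design choice
of this sketch: the previous restart file stays usable until the new
one has been written completely.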
Actually, unless this has changed recently, Espresso doesn't use the
MPI I/O functions: all I/O is handled by CPU 0, which reads and writes
locally. The "local" disk may be (and most often is) a networked
filesystem; but again, this is handled by the operating system and is
completely transparent to Espresso.
I also guess that output behaves differently than input because it is
buffered.
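The effect of buffering can be illustrated with a small Python sketch
(an illustration of the general mechanism, not of how Espresso writes
its files; the file name output.log is made up). A small write first
lands in a buffer inside the program, so the disk is not touched until
the buffer is flushed:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.log")

f = open(path, "w")  # regular files are block-buffered by default
f.write("step 1 done\n")  # goes into the in-process buffer only
size_before_flush = os.path.getsize(path)

f.flush()  # hand the buffered data to the operating system
size_after_flush = os.path.getsize(path)
f.close()

print("size before flush:", size_before_flush)
print("size after flush:", size_after_flush)
```

If the storage is unavailable only between flushes, a buffered writer
may never notice, which would be consistent with the output file
surviving outages that kill an unbuffered or explicitly flushed write.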
Gerardo