[Pw_forum] MPI & disk unavailability
Axel Kohlmeyer
akohlmey at vitae.cmm.upenn.edu
Mon Nov 28 18:15:56 CET 2005
On Mon, 28 Nov 2005, Konstantin Kudin wrote:
KK> Hi,
kostya,
KK> So my question is if it is possible to make MPI wait for disk
KK> availability when writing the restart files?
please note that this has _nothing_ to do with MPI itself,
but with making the _application_ fault tolerant. while this
may be desirable for certain environments, it also tends to
add a _lot_ of clutter to the code, which makes maintaining
it a nightmare. the best compromise usually is to make sure
that you can write intermediate restart files, optionally to
a special place, and that you can set the frequency (according
to the stability of the machine).
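
to illustrate the idea of periodic restart files, here is a minimal
sketch in python. it is not how pwscf or any other code does it; the
names write_restart, run, restart_every and restart_dir are all
hypothetical, and the "state" is a toy dictionary. the file is written
to a temporary name first and then renamed, so a crash in the middle of
a write leaves the previous restart file intact.

```python
import os
import pickle
import tempfile


def write_restart(state, restart_dir, name="restart.pkl"):
    """Write `state` so that readers never see a half-written file."""
    os.makedirs(restart_dir, exist_ok=True)
    final_path = os.path.join(restart_dir, name)
    # Temporary file lives in the same directory, so the rename below
    # never has to cross a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=restart_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as handle:
            pickle.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # Replace the old restart file in one step.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Keep the previous restart file and drop the partial one.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path


def run(n_steps, restart_every, restart_dir):
    """Toy loop: checkpoint every `restart_every` steps (0 disables it)."""
    state = {"step": 0, "energy": 0.0}
    written = []
    for step in range(1, n_steps + 1):
        state["step"] = step
        state["energy"] -= 0.5  # stand-in for the real work
        if restart_every > 0 and step % restart_every == 0:
            write_restart(dict(state), restart_dir)
            written.append(step)
    return written
```

on a fragile machine one would pick a small restart_every and point
restart_dir at a disk that is known to be reliable; on a stable
machine the frequency can be lowered to reduce the i/o overhead.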
while this is generally a problem of the setup of the respective
machine, and one usually cannot afford to make a code run on even
the most crappy hardware, one should have at least some helpful
options that provide some kind of 'performance degraded mode' for
fragile machines, as most high-end hardware tends to be quite
fragile during its introduction.
KK> 2946 0.04081 0.0 418.3 -1164.45530 -1164.45530 -1163.99768
KK> -1163.50459 0.0000 0.0000 -0.0001 2.0386
KK> bm_list_5117: p4_error: interrupt SIGx: 15
KK> rm_l_1_5140: p4_error: interrupt SIGx: 15
so, just one of your processes died, MPICH did not fully
recognize it, and it thus failed when writing to a
half-closed TCP socket, with an error message that is
cryptic to a regular unix user.
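
as a small aid for reading such messages: the number after "SIGx" is
a signal number, and 15 is SIGTERM on linux and most unix systems.
the message alone does not say who sent the signal. a sketch for
looking up the name with python's standard signal module (the helper
signal_name is hypothetical):

```python
import signal


def signal_name(number):
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(number).name
    except ValueError:
        # Number is not a known signal on this platform.
        return "unknown"


print(signal_name(15))
```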
best regards,
axel.
--
=======================================================================
Axel Kohlmeyer akohlmey at cmm.chem.upenn.edu http://www.cmm.upenn.edu
Center for Molecular Modeling -- University of Pennsylvania
Department of Chemistry, 231 S.34th Street, Philadelphia, PA 19104-6323
tel: 1-215-898-1582, fax: 1-215-573-6233, office-tel: 1-215-898-5425
=======================================================================
If you make something idiot-proof, the universe creates a better idiot.