[Pw_forum] Could you please help me to cope with the error message
vega
vegalew at hotmail.com
Mon Nov 10 20:56:46 CET 2008
Dear all,
I keep running into the following error message:
[node1][0,1,12][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8D51C Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFD060 Unknown Unknown Unknown
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
mca_oob_tcp.so 55F911B4 Unknown Unknown Unknown
Unknown 00000001 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
pw.x 0813EE72 Unknown Unknown Unknown
pw.x 0813E577 Unknown Unknown Unknown
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE40E Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE40E Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8C50F Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFD060 Unknown Unknown Unknown
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE40E Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8C50F Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFD060 Unknown Unknown Unknown
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
Unknown 00000003 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8C50B Unknown Unknown Unknown
pw.x 081CBD7B Unknown Unknown Unknown
pw.x 0823A95E Unknown Unknown Unknown
pw.x 08239C4A Unknown Unknown Unknown
pw.x 081DEDC9 Unknown Unknown Unknown
pw.x 081D4E9C Unknown Unknown Unknown
Unknown FFFFCDC0 Unknown Unknown Unknown
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
[node1][0,1,12][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8BF47 Unknown Unknown Unknown
pw.x 080EA567 Unknown Unknown Unknown
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8BF3B Unknown Unknown Unknown
pw.x 081E3C7B Unknown Unknown Unknown
Stack trace terminated abnormally.
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
[node8][0,1,23][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
. FFFFE410 Unknown Unknown Unknown
mpirun noticed that job rank 14 with PID 3519 on node node3 exited on signal 11 (Segmentation fault).
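A note for reading the log above: as far as I understand, most of the `forrtl: error (78)` lines are probably secondary, since mpirun sends SIGTERM to the remaining ranks once one rank has died, and errno=104 is ECONNRESET on Linux (the peer closed the connection). The line that likely matters most is the final one reporting signal 11. The following is only a minimal sketch of how such a log could be summarized. The function name `triage`, the variable `SAMPLE_LOG`, and the regular expressions are my own assumptions, written to match the message shapes seen above; they are not part of Open MPI or Quantum ESPRESSO.

```python
import re
from collections import Counter

# Patterns modeled on the message shapes in the log above (assumed, not official formats).
SIGNAL_EXIT = re.compile(
    r"mpirun noticed that job rank (\d+) with PID (\d+) on node (\S+) "
    r"exited on signal (\d+)"
)
TCP_ERROR = re.compile(r"^\[(\w+)\]\[[\d,]+\]\[btl_tcp_frag\.c.*errno=(\d+)")
SIGTERM_LINE = "forrtl: error (78): process killed (SIGTERM)"


def triage(log_text):
    """Separate the rank reported by mpirun from the follow-up SIGTERM/TCP noise."""
    summary = {"primary": None, "sigterm_count": 0, "tcp_errors": Counter()}
    for raw in log_text.splitlines():
        line = raw.strip()
        exit_match = SIGNAL_EXIT.search(line)
        tcp_match = TCP_ERROR.match(line)
        if exit_match and summary["primary"] is None:
            rank, pid, node, sig = exit_match.groups()
            summary["primary"] = {
                "rank": int(rank), "pid": int(pid),
                "node": node, "signal": int(sig),
            }
        elif line.startswith(SIGTERM_LINE):
            summary["sigterm_count"] += 1
        elif tcp_match:
            # Key is (node name, errno); 104 is ECONNRESET on Linux.
            summary["tcp_errors"][(tcp_match.group(1), int(tcp_match.group(2)))] += 1
    return summary


# A short excerpt of the log above, used only to exercise the sketch.
SAMPLE_LOG = """\
[node1][0,1,12][btl_tcp_frag.c:202:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed with errno=104
forrtl: error (78): process killed (SIGTERM)
Image PC Routine Line Source
libblas.so.3 55A8D51C Unknown Unknown Unknown
forrtl: error (78): process killed (SIGTERM)
mpirun noticed that job rank 14 with PID 3519 on node node3 exited on signal 11 (Segmentation fault).
"""

result = triage(SAMPLE_LOG)
print(result["primary"], result["sigterm_count"], dict(result["tcp_errors"]))
```

Run against the full output, this would point at rank 14 on node3 as the process to investigate first, with the SIGTERM and TCP messages counted separately.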
I could successfully relax a 72-atom system on my cluster using Open MPI. But when I tried to relax 84 atoms, this error stopped the calculation. I then tried MPICH2 on the same cluster. With MPICH2 I could relax up to 120 atoms, but the error came back when I tried 132 atoms. I have been stuck on this problem for quite a long time. Could someone give me some suggestions for dealing with it?
To make my question clearer, here are the details of my setup.
The cluster has 8 nodes connected by Ethernet.
CPU: Intel Q6600
Memory: 8 GB per node
Main board: Intel S3000AH
Hard disk: Seagate 750 GB (7200 rpm)
OS: Red Hat Enterprise Linux AS 4 Update 4
Fortran: Intel ifort 10.1.015
C: Intel icc 10.1.015
MPI: MPICH2 / Open MPI
FFTW: fftw 2.1.5
MKL: 10.0.1.014
Thank you for reading. Any hints will be deeply appreciated.
vega
=================================================================================
Vega Lew (weijia liu)
Ph.D. Candidate in Chemical Engineering
State Key Laboratory of Materials-oriented Chemical Engineering
College of Chemistry and Chemical Engineering
Nanjing University of Technology, 210009, Nanjing, Jiangsu, China