[Pw_forum] Crash when running pw.x for relaxing a structure

Huiqun Zhou hqzhou at nju.edu.cn
Wed Aug 2 08:04:09 CEST 2006


Paolo,

Thanks for your response. I have run more calculations over the past few
days. Here are some facts:

(1) The system I'm investigating has an orthorhombic unit cell. The volume
  is fixed at 1570.0 bohr^3, and the calculations were carried out by
  varying
  b/a = 0.275 to 0.400 in steps of 0.025
  c/a = 0.975 to 1.200 in steps of 0.025
(2) Not all calculations fail at the charge density extrapolation. For
  example, when b/a = 0.375, the calculation fails only when c/a = 1.200.
  This error is highly reproducible.
(3) The error occurs only when running on 4 CPU cores; there is no problem
  when running on one or two CPU cores. This is true for all failed cases.
(4) This problem can be reproduced on compute nodes with different dual-core
  processors, such as Intel Dempsey (Xeon 5060), Woodcrest (Xeon 5140), and
  AMD Opteron 280.
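
The scan in item (1) can be written out as a short Python sketch. Only the
fixed volume and the two ratio ranges come from the message; the helper names
(ratio_grid, cell_lengths) are hypothetical, and the relation
V = a^3 * (b/a) * (c/a) assumes a plain orthorhombic cell with edges a, b, c.
This only illustrates how each (b/a, c/a) pair fixes the cell edges; it is not
the script actually used to generate the pw.x inputs.

```python
# Hypothetical helpers; only VOLUME and the ratio ranges come from the message.
VOLUME = 1570.0  # bohr^3, fixed cell volume


def ratio_grid(start, count, step=0.025):
    # Integer stepping avoids accumulating floating-point error.
    return [round(start + i * step, 3) for i in range(count)]


def cell_lengths(volume, b_over_a, c_over_a):
    # Orthorhombic cell: V = a*b*c = a^3 * (b/a) * (c/a).
    a = (volume / (b_over_a * c_over_a)) ** (1.0 / 3.0)
    return a, a * b_over_a, a * c_over_a


b_ratios = ratio_grid(0.275, 6)    # 0.275 ... 0.400
c_ratios = ratio_grid(0.975, 10)   # 0.975 ... 1.200

# Every (b/a, c/a) pair in the scan.
grid = [(ba, ca) for ba in b_ratios for ca in c_ratios]

for ba, ca in grid:
    a, b, c = cell_lengths(VOLUME, ba, ca)
    print(f"b/a={ba:.3f} c/a={ca:.3f} a={a:.4f} b={b:.4f} c={c:.4f} bohr")
```

The failing case reported in item (2) corresponds to
cell_lengths(VOLUME, 0.375, 1.200).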

The OS of my cluster is RHEL4 U3 (kernel 2.6.9-34). I'm using Intel Fortran
9.0, MKL 8.0 and FFTW 2.1.5. I have no access to other commercial compilers,
and I failed to compile QE with g95.

The attached zip file includes the input for b/a = 0.375, c/a = 1.200, and
the output files from the runs on 1 core (successful) and 4 cores (failed).

Thanks again for your help.

Huiqun Zhou


----- Original Message ----- 
From: "Paolo Giannozzi" <giannozz at nest.sns.it>
To: <pw_forum at pwscf.org>
Sent: Tuesday, July 25, 2006 12:25 AM
Subject: Re: [Pw_forum] Crash when running pw.x for relaxing a structure


> On Tuesday 18 July 2006 12:21, Huiqun Zhou wrote:
>
>> I'm doing structural optimization for chromite with calcium ferrite
>> structure while changing b/a and c/a at fixed volume. But for every
>> run with a different pair of b/a and c/a, I always got the following error
>> after 3-5 rounds of SCF calculations:
>
>>      Writing output data file fecr2o4-cf-relax.save
>>
>>      second order charge density extrapolation
>> rank 1 in job 170  woodcrest_32906   caused collective abort of all ranks
>>   exit status of rank 1: return code 220
>
>> The job was running in parallel on one compute node with 4 CPU cores
>> (Intel woodcrest).
>>
>> Did I do anything wrong?
>
> difficult to say. Is it reproducible? does it happen on other machines
> or with other compilers or in serial execution ? If it is not reproducible
> it may not be related to the code itself
>
> P.
> -- 
> Paolo Giannozzi             Phone:   +39/050-509876
> DEMOCRITOS and SNS          Fax:     +39/050-563513
> Piazza dei Cavalieri 7      I-56126 Pisa, Italy
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fecr2o4-cf.zip
Type: application/octet-stream
Size: 25406 bytes
Desc: not available
Url : /pipermail/attachments/20060802/54688685/attachment.obj 

