[Pw_forum] PW 2.0 - problems with parallelization
Nicolas Mounet
mounet at MIT.EDU
Wed Mar 17 01:21:01 CET 2004
Dear PW users,
Here is the situation:
Last week I installed the new release of pwscf. Using the old
configuration procedure ("./configure.old"), I was able to compile both
the single-processor and the parallel versions. However, while the
single-processor version seems to work, the parallel one fails even on
simple things like the Si part of "example1" from pw_examples (see the
attached input script). The self-consistent calculation works, but
the non-self-consistent one behaves strangely:
- sometimes, pw just "hangs" without being killed: the CPUs stop
working and pw stops writing to the output after the following lines:
[...]
nbndx = 32 nbnd = 8 natomwfc = 8 npwx = 168
nelec = 8.00 nkb = 8 ngl = 65
- other times (depending on the energy cutoff!), odd errors appear at
the end of the non-self-consistent output file, like:
[...]
nbndx = 32 nbnd = 8 natomwfc = 8 npwx = 168
nelec = 8.00 nkb = 8 ngl = 65
The initial potential is read from file silicon.pot
Starting wfc are atomic
MPI_Recv: message truncated (rank 1, MPI_COMM_WORLD)
Rank (1, MPI_COMM_WORLD): Call stack within LAM:
Rank (1, MPI_COMM_WORLD): - MPI_Recv()
Rank (1, MPI_COMM_WORLD): - MPI_Bcast()
Rank (1, MPI_COMM_WORLD): - MPI_Allreduce()
Rank (1, MPI_COMM_WORLD): - main()
and the output stops there.
(With a cutoff energy of 20 Ryd, the first case happens, whereas
the second case happens with a cutoff of 18 Ryd.)
The same example works normally if the non-self-consistent calculation is
run in single-processor mode instead of in parallel.
The operating system is Linux, running on a PC cluster (dual Xeon). The
same kind of problem occurs when compiling with either ifc 6.0 or ifc
7.0, and with either mkl 5.1 or mkl 6.1. We use the mpif77 compiler
and LAM 7.0.3/MPI 2 C++ for the parallel build (older versions seem
to give the same problem). The FFTW environment used is local.
Note that the old version of pw (1.3.1) worked perfectly well.
An additional fact (that may be of no relevance) is that the new
configuration procedure ("./configure") is not able to detect the
parallel environment; in particular, it cannot find "zggev", "dgemm" and
"mpi_init" in the various libraries. Nevertheless, the old configuration
procedure ("./configure.old") leads to compilation without any problem.
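As a manual cross-check of what "./configure" reports, the output of "nm"
on each library could be scanned for these routines. The following is
only a minimal Python sketch under stated assumptions: the function names
(defined_symbols, find_routine) and the sample "nm" text are made up for
illustration, and the list of name-mangling variants (trailing
underscores, upper case) is a guess at what Fortran compilers commonly
produce, not something taken from the pwscf configure script.

```python
# Hypothetical helper: look for a Fortran/MPI routine in `nm` output,
# allowing for common (assumed) name-mangling variants.

# nm type letters that mark a symbol as defined in the object
# (text, data, bss, read-only, weak); "U" means undefined.
DEFINED_TYPES = set("TtDdBbRrWwVv")


def defined_symbols(nm_output):
    """Return the set of symbol names that nm lists as defined."""
    names = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Defined symbols look like "<address> <type> <name>";
        # undefined ones have only "<type> <name>" and are skipped.
        if len(parts) == 3 and parts[1] in DEFINED_TYPES:
            names.add(parts[2])
    return names


def find_routine(routine, nm_output):
    """Return the mangled name under which `routine` is defined, or None."""
    present = defined_symbols(nm_output)
    candidates = [
        routine,                # no mangling
        routine + "_",          # single trailing underscore
        routine + "__",         # double trailing underscore
        routine.upper(),        # upper case
        routine.upper() + "_",  # upper case with underscore
    ]
    for name in candidates:
        if name in present:
            return name
    return None


if __name__ == "__main__":
    # Made-up sample; in practice this text would come from `nm <library>`.
    sample = (
        "0000000000000000 T dgemm_\n"
        "                 U mpi_init_\n"
        "0000000000000040 T zggev_\n"
    )
    for routine in ("zggev", "dgemm", "mpi_init"):
        print(routine, "->", find_routine(routine, sample))
```

If a routine only shows up under a mangled name (or only as undefined),
that might explain why the link test in "./configure" fails while the
hand-written flags of "./configure.old" still work, but this is a guess.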
Thanks in advance,
Best regards,
Nicolas
PS: I also attach the make file used (which compiles well).
--
Nicolas Mounet
Prof. Marzari's Group
Department of Materials Science and Engineering
13-4084
Massachusetts Institute of Technology
77 Massachusetts Avenue
Cambridge MA 02139
USA
Tel: (+1)617-253-6026
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: input.txt
Url: /pipermail/attachments/20040316/c67ba320/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: make.sys.txt
Url: /pipermail/attachments/20040316/c67ba320/attachment-0001.txt