[Pw_forum] ph.x v3.2 on NEC SX-8
wlyim at puccini.che.pitt.edu
Fri Dec 29 17:02:01 CET 2006
Thanks to Axel for the fruitful discussion.
On the NEC, the larger nrxx comes from the "good_fft_dimension"
subroutine: nrxx=25x24x37 (=22200) instead of 24x24x36 (=20736) on Intel.
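To make the bookkeeping concrete, here is a small Python sketch (not code from the distribution) that compares the logical FFT grid with the padded one and lists the padding points. It assumes the serial layout, where nrxx = nrx1*nrx2*nrx3 (per the reply quoted below, this relation holds only in the serial case), and Fortran column-major storage with ir = i + (j-1)*nrx1 + (k-1)*nrx1*nrx2. The function names are hypothetical.

```python
def grid_size(n1, n2, n3):
    """Number of points in an n1 x n2 x n3 grid."""
    return n1 * n2 * n3


def padding_points(nr, nrx):
    """Return the 1-based Fortran indices ir that exist in the padded
    array (dimensions nrx) but lie outside the logical grid (dimensions nr)."""
    nr1, nr2, nr3 = nr
    nrx1, nrx2, nrx3 = nrx
    extra = []
    for k in range(1, nrx3 + 1):
        for j in range(1, nrx2 + 1):
            for i in range(1, nrx1 + 1):        # i runs fastest (column-major)
                if i > nr1 or j > nr2 or k > nr3:
                    ir = i + (j - 1) * nrx1 + (k - 1) * nrx1 * nrx2
                    extra.append(ir)
    return extra


nr = (24, 24, 36)    # logical grid: nr1, nr2, nr3
nrx = (25, 24, 37)   # padded dimensions reported on the NEC
print(grid_size(*nr))                 # 20736 (Intel: no padding)
print(grid_size(*nrx))                # 22200 (NEC: nrxx)
print(len(padding_points(nr, nrx)))   # 1464 padding points outside the grid
```

The 1464 extra entries are exactly the difference 22200 - 20736; the first of them is ir=25 (i=25, j=1, k=1).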
I found that I cannot simply ignore the "zero" elements; I tried two
things:
1. masking the overflow error lets the run pass the phq_setup step, but
after all the phonon cycles it hung at some point;
2. modifying the dmxc_spin subroutine so that the extra array elements stay
zero lets the program run to the end, but the result was wrong.
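The difference between trapping a division by zero (what stops the NEC run) and masking it (attempt 1, and what ifort does by default according to the reply quoted below) can be illustrated with Python's decimal module, whose context lets each signal be trapped or masked. This is only an analogy: decimal arithmetic is not the IEEE binary floating point used by the Fortran code, and the zero density below is an assumed stand-in for what a padding point passes to dmxc_spin.

```python
from decimal import Decimal, DivisionByZero, InvalidOperation, localcontext


def ratio(num, den, trap):
    """Divide num by den. With trap=False the DivisionByZero and
    InvalidOperation signals are masked and a special value is returned."""
    with localcontext() as ctx:
        ctx.traps[DivisionByZero] = trap
        ctx.traps[InvalidOperation] = trap
        return num / den


rho = Decimal(0)  # assumed density at a padding grid point

# Masked: the computation silently continues with special values.
print(ratio(Decimal(1), rho, trap=False))   # Infinity
print(ratio(Decimal(0), rho, trap=False))   # NaN

# Trapped: the first division by zero stops the computation.
try:
    ratio(Decimal(1), rho, trap=True)
except DivisionByZero:
    print("trapped: division by zero")
```

Masking only hides the problem: the Infinity/NaN values stay in the array, which is harmless only if nothing ever reads them.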
So far I have not found a way to solve the ph.x problem on the NEC (pw.x
works). I would appreciate it if someone could address this issue.
Alternatively, I compiled espresso-2.1.4 on the NEC with the following
makefile, using the internal FFTW and the __INTEL and __LINUX64 flags. On my
test case, both pw.x and ph.x worked properly.
========================================================================
OSHOME=/nfs/home5/HLRS/xol/xolwlyim/PWSCF.2.1.4/espresso-2.1.4.parallel
#
# System-dependent definitions for NEC SX6 - Contributed by Guido Roma
# Edit according to your needs
#
# Precompiler:
#
MAKE=sxmake
CPP = /SX/usr/lib/sxcpp
INC_DIR = ../include
# Internal FFTW, with the __INTEL and __LINUX64 flags (used for this build)
CPPFLAGS = -P -E -DLANGUAGE_FORTRAN -D__INTEL -D__LINUX64 -D__FFTW \
           -D__USE_INTERNAL_FFTW -D__MPI -D__PARA -I$(INC_DIR)
# For libfft library, part of Mathkeisan Libraries
#CPPFLAGS = -P -E -DLANGUAGE_FORTRAN -DHAS_ZHEGVX -D__SX6 -I$(INC_DIR)
# For libjmfft library (www.idris.fr) by Jean-Marie Teuler
#CPPFLAGS = -P -E -DZZFFT3D=ccfft3d -DHAS_ZHEGVX -DLANGUAGE_FORTRAN \
#           -D__SX6 -I$(INC_DIR)
HOST=-sx8
BASIC=-float0 -P stack $(HOST)
MISC = -I$(INC_DIR) -eab -R5 -Wf" -P nh -ptr byte" -Wf,"-Ncont -A dbl4 "
MISC1 = -I$(INC_DIR) -eab -R5 -Wf" -P nh -ptr byte" -Wf,"-cont -A dbl4 "
PROF=-p
FTRACE=-ftrace
OPT= -C hopt -Wf" -pvctl noifopt loopcnt=9999999 expand=12 fullmsg \
     vwork=stack -fusion -O noif"
OPTVSAFE= -C vsafe -Wf" -pvctl loopcnt=9999999 fullmsg vwork=stack "
OPT0= -C debug
DEBUG= -g
DEBUGOPT= -Wf" -init stack=zero heap=zero"
#
AR = sxar
ARFLAGS = rv
# This is needed to tell the compiler where modules are
#
MODULEFLAG= -I$(OSHOME)/Modules -I$(OSHOME)/PW -I$(OSHOME)/PH
#
# Fortran compiler:
#
#
F90 = sxmpif90
F77 = sxmpif90
FFLAGS = $(BASIC) $(MISC) $(OPT) $(DEBUGOPT)
#$(FTRACE)
#FFLAGS = $(BASIC) $(MISC) $(DEBUG) $(DEBUGOPT) $(OPT0)
#FCAUTIOUS=$(BASIC) $(MISC1) $(DEBUG) $(DEBUGOPT)
F90FLAGS=$(FFLAGS)
F77FLAGS=-f0 $(FFLAGS)
#
# C compiler:
#
#CC = sxc++
#CFLAGS = -DLANGUAGE_C -DNEC -DSX -I$(INC_DIR) -hfloat0,0,acct
CC = sxcc
CCLOCAL=cc
CCFLAGS = -D__INTEL -D__LINUX64 -D__FFTW -D__USE_INTERNAL_FFTW -D__MPI \
          -D__PARA -I$(INC_DIR)
#
# Libraries:
#
# With ASL fft libraries
LIBS = -llapack -lblas
# With libfft (Mathkeisan) libraries
# be careful, versions <= 1.4 are buggy (zzfft3d),
#wait for 1.5 (expected end of 2003)
#LIBS = -llapack -lblas $(OSHOME)/zzfft3d.o -lfft
# You can find the jmfft Cray compatible library written
# by Jean-Marie Teuler on www.idris.fr (search for jmfft)
#LIBS = -llapack -lblas -L$(HOME)/mylocal/lib -ljmfft
#
# Loader flags:
#
LD = $(F90)
#LDFLAGS = $(BASIC) $(PROF) $(FTRACE)
LDFLAGS = $(BASIC) $(DEBUG) $(DEBUGOPT) $(OSHOME)/flib/ptools.a \
$(OSHOME)/flib/flib.a $(OSHOME)/clib/clib.a \
-p -Wl" -f zero " $(LIBS)
RANLIB = echo
=============================================================================
Thanks!
Best regards,
William
On Thu, 28 Dec 2006, Axel Kohlmeyer wrote:
> On 12/28/06, wlyim at puccini.che.pitt.edu <wlyim at puccini.che.pitt.edu> wrote:
> > Thanks for your suggestion. I will try one of the examples as soon as
> > possible.
> >
> > Current status: ifort-compiled pw.x and ph.x can complete the job
> > normally. However, the NEC executables pass a larger "nrxx" value, 22200
> > in NEC vs 20736 in Intel, given that nr1=24,nr2=24,nr3=36. So in NEC, some
>
> that is very interesting.
>
> > zero "zeta" values were passed to the dmxc_spin subroutine, which led to a
> > "divide by zero" error at line 1192 in Modules/functionals.f90. Interestingly,
> > pw.x built with the sxcross compiler and with ifort gave the same scf results,
> > while ph.x on the NEC didn't work...
>
> no surprise here. pw.x does not need the derivatives of the exchange-
> correlation potential.
>
> > Any suggestion is welcome, e.g. compiler options, preprocessor flags...
>
> from looking at the code it seems that the relation nrxx=nrx1*nrx2*nrx3 is
> only true in the serial case. see Modules/fft_types.f90 lines 242ff.
>
> code compiled with the intel compiler usually continues with a non-finite
> value (NaN or Inf) after a division by zero (same as IBM xlf), and since the
> corresponding grid point is never accessed, this does not propagate.
>
> to remedy the situation you can try to a) compile a serial version of the code,
> b) look for a compiler flag to continue after a division by zero, or
> c) correct the code in PH/phq_setup.f90 to call dmxc()/dmxc_spin()
> only for values of 'ir' that correspond to valid grid points.
>
> cheers,
> axel.
>
>
> >
> > Best regards,
> > William
> >
>
>
>
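Axel's option c) amounts to skipping the padding points when the exchange-correlation derivative is evaluated. The following is a minimal Python sketch of such a loop, not the actual phq_setup.f90 code. It assumes the serial layout, where nrxx = nrx1*nrx2*nrx3 (the parallel layout differs, see Modules/fft_types.f90); dxc_model is a hypothetical stand-in for dmxc()/dmxc_spin() that divides by the density, and Python's ZeroDivisionError plays the role of the trapped exception on the NEC.

```python
def dxc_model(rho):
    """Hypothetical stand-in for dmxc(): any quantity that divides by rho."""
    return 1.0 / rho


def dxc_on_grid(rho, nr, nrx, skip_padding):
    """Evaluate dxc_model at the stored points of a padded grid.

    rho is a flat list of length nrx1*nrx2*nrx3 in Fortran (column-major)
    order; skipped padding points keep the value 0.0 in the result."""
    nr1, nr2, nr3 = nr
    nrx1, nrx2, nrx3 = nrx
    out = [0.0] * len(rho)
    for k in range(nrx3):
        for j in range(nrx2):
            for i in range(nrx1):
                ir = i + j * nrx1 + k * nrx1 * nrx2   # 0-based flat index
                if skip_padding and (i >= nr1 or j >= nr2 or k >= nr3):
                    continue                          # not a valid grid point
                out[ir] = dxc_model(rho[ir])
    return out


# Toy example: 2x2x2 logical grid stored in a padded 3x2x3 array.
nr, nrx = (2, 2, 2), (3, 2, 3)
rho = [0.0] * (3 * 2 * 3)                 # padding points stay at zero
for k in range(2):
    for j in range(2):
        for i in range(2):
            rho[i + j * 3 + k * 6] = 0.5  # density on the valid points

print(sum(dxc_on_grid(rho, nr, nrx, skip_padding=True)))  # 16.0 (8 x 1/0.5)
try:
    dxc_on_grid(rho, nr, nrx, skip_padding=False)
except ZeroDivisionError:
    print("padding point reached: division by zero")
```

Skipping the padding points, rather than zeroing the result inside the derivative routine as in attempt 2, leaves the physics on the valid points untouched.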
--
Dr. Wai-Leung Yim
Institut fuer Reine und Angewandte Chemie,
Theoretische Chemie,
Carl von Ossietzky Universitaet Oldenburg,
26129 Oldenburg,
Germany
Email: wlyim at puccini.che.pitt.edu
Phone: +49-441-798-3950 (office)
+49-441-798-5102 (home)
Fax: +49-441-798-3964