[Pw_forum] a strange behavior for QE-v4 compared with QE-v3
JR Schmidt
schmidt at chem.wisc.edu
Wed Nov 26 15:52:52 CET 2008
I ran into this same issue. For whatever reason, setting OMP_NUM_THREADS
or MKL_NUM_THREADS did not seem to fix the problem: MKL was still
creating many threads per process, though perhaps only one thread was
active at a time.
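To see the extra threads described above, one can count the threads of each running process. This is a minimal sketch that assumes a Linux system, because it reads the "Threads:" field of /proc/<pid>/status. The function name count_threads is hypothetical, and pw.x is assumed to be the executable name:

```shell
#!/bin/sh
# Hypothetical helper: print the number of threads of a process (Linux only).
# It reads the "Threads:" line from /proc/<pid>/status.
count_threads() {
  awk '/^Threads:/ { print $2 }' "/proc/$1/status"
}

# Example: list the thread count of every running pw.x process.
# (Prints nothing if no pw.x is running.)
for pid in $(pgrep -x pw.x); do
  echo "pw.x pid $pid: $(count_threads "$pid") threads"
done
```

If OMP_NUM_THREADS=1 is set and a pw.x process still reports many more than one thread, you are probably seeing the behavior described here.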
I found that a better solution, offering increased performance at least
for parallel jobs, was to link with the MKL sequential (non-threaded)
library. This can be done by modifying the following lines in make.sys
(for MKL 10):
BLAS_LIBS = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
LAPACK_LIBS = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
and then rebuilding:
make clean (probably unnecessary)
make
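After rebuilding, one way to check which MKL threading layer the executable picked up is to inspect its shared-library dependencies. This rough sketch assumes MKL is linked dynamically. With static linking, ldd will not list the MKL libraries and the function reports "unknown". The function name mkl_layer and the binary path are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: report which MKL threading layer a binary links to.
# Relies on ldd, so it only detects dynamically linked MKL libraries.
mkl_layer() {
  libs=$(ldd "$1" 2>/dev/null)
  case "$libs" in
    *libmkl_sequential*) echo sequential ;;
    *libmkl_intel_thread*|*libmkl_gnu_thread*) echo threaded ;;
    *) echo unknown ;;
  esac
}

# Example (path is an assumption; adjust to your build tree):
# mkl_layer ./bin/pw.x
```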
Although this gives WORSE performance when you are only running a single
job (since the library no longer spawns threads to use the other cores),
if you are trying to fully utilize your nodes by running several jobs or
parallel jobs, the threaded library results in a giant mess.
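Much of the "giant mess" comes down to arithmetic. With the threaded library, each process may start its own pool of threads, so the total is the number of ranks times the threads per rank, and that can far exceed the cores on the node. The small sketch below makes that check explicit. The function name and the way the numbers are passed in are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: compare total threads (ranks x threads per rank)
# with the number of cores on the node; returns 1 when oversubscribed.
check_oversubscription() {
  ranks=$1
  threads_per_rank=$2
  cores=$3
  total=$((ranks * threads_per_rank))
  if [ "$total" -gt "$cores" ]; then
    echo "oversubscribed: $total threads on $cores cores"
    return 1
  fi
  echo "ok: $total threads on $cores cores"
  return 0
}

# 8 MPI tasks with one thread each on an 8-core node: fine.
check_oversubscription 8 1 8
# 8 MPI tasks each threading across 8 cores: 64 threads on 8 cores.
check_oversubscription 8 8 8 || echo "consider sequential MKL or MKL_NUM_THREADS=1"
```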
> this looks a lot like you are using two different versions
> of MKL for the two compiles and thus are another "victim"
> of the automatic threading of MKL v10. you should not
> have a process with >100% CPU with a regular compile.
> now if you have 8 cores and 8 MPI tasks and each of them
> threads across 8 cores, you have a) a severe overload of
> the scheduler and b) a big mess and all kinds of bad
> performance issues.
>
> try setting OMP_NUM_THREADS=1 and check if that
> changes the behavior.
>
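Following the suggestion to set OMP_NUM_THREADS=1, a job script might pin both variables before launching. One possible reason the setting can appear to have no effect is that some MPI launchers do not forward environment variables to remote ranks. This is an unverified guess for this particular case. Open MPI's mpirun, for example, exports a variable to all ranks with -x. The rank count and the input and output file names below are placeholders:

```shell
#!/bin/sh
# Limit OpenMP and MKL to a single thread per MPI process.
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1

# Placeholder launch line (Open MPI syntax; -x forwards a variable to all ranks).
# Other MPI implementations use different options for this.
# mpirun -np 8 -x OMP_NUM_THREADS -x MKL_NUM_THREADS pw.x < scf.in > scf.out
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS MKL_NUM_THREADS=$MKL_NUM_THREADS"
```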
--
J.R. Schmidt
Assistant Professor of Chemistry
Room 8305D
Department of Chemistry
University of Wisconsin-Madison
1101 University Ave
Madison, WI 53706
Phone: (608) 262-2996
Fax: (608) 262-9918
E-mail: schmidt at chem.wisc.edu
http://www.chem.wisc.edu/people/profiles/schmidt.php