[Pw_forum] On the use of Quantum Espresso on GPUs

O. Baris Malcioglu baris.malcioglu at gmail.com
Mon Mar 8 11:01:47 CET 2010

Dear Hande,

Although not specifically related to QE you may find the following
information useful.

I recently attended a workshop on high performance computing and a
series of lectures/hands-on sessions co-organized by NVIDIA.

The good news is that CUDA (NVIDIA's GPU-accelerated computing
"platform", i.e. modified compilers etc.) now has a LAPACK/BLAS
implementation that can be used "out of the box" (like Intel MKL: in
principle no modification to QE is needed, just download it and link
as usual). The bad news is that the current generation of GPUs is not
so fantastic when performing general operations and/or when double
precision is required.

My -personal- opinion, based on -very limited- experience with the
matter, is that the prospect is not particularly promising at the
moment, at least until the new "Fermi" GPU and the accompanying
general-computing revision of CUDA are released. My -personal-
reasoning goes like this:

-There is a huge bottleneck when moving data over the PCI Express (PCIe)
bus between the card memory and the system memory, especially in
run-of-the-mill servers. Unless you use low-level tricks in your code
to optimize these copies, the code runs slower, not faster, when the
GPU is involved. Sadly, CUDA does not provide these low-level
optimizations in its Fortran wrappers, and implementing C wrappers
that perform this optimization in QE does not seem straightforward.

-The NVIDIA Linux drivers are not open source and taint the kernel.
NVIDIA's insistence on keeping its driver closed, unlike other
vendors, has some nasty performance consequences in some HPC
environments. You can browse HPC forums for details.

-The GPU is stupidly fast for some select single-precision (float)
operations, since the brain does not require that much precision when
interpreting scenes on a screen, but not so fast for double-precision
operations. Most, if not all, of the variables in QE are double
precision.

-Similarly, the GPU is stupidly fast for operations like
multiplication and addition, but not so fast for division. Also, one
must use highly GPU-specific tricks like "fused multiply-add" to reach
the headline results they are showing around.

-Since third-party companies do nasty things like overclocking the GPU
cards they supply (in the same spirit, precision is not mandatory for
visual applications), the platform is highly volatile for scientific
applications. Personally, I wouldn't trust the outcome of a run on a
specific machine unless that machine had been proven reliable (it
should be tested very well).

I will start modifying my code if I get access to a Fermi GPU, just to
give it a try; otherwise I prefer to spend my time on more pressing
topics.


2010/3/8 Hande Ustunel <hande at newton.physics.metu.edu.tr>:
> Dear Quantum-Espresso community,
> Following the acquisition of a couple of GPU servers by our national
> computer cluster, I decided to try and see if I can compile QE on
> them. Through some web searching I found some promising mention of current
> progress on the porting of QE to GPUs. I was wondering if I could get from
> any of you perhaps a more up-to-date idea of the current status, if it's
> not too much trouble.
> Thank you very much in advance.
> Hande
> --
> Hande Toffoli
> Department of Physics
> Office 439
> Middle East Technical University
> Ankara 06531, Turkey
> Tel : +90 312 210 3264
> http://www.physics.metu.edu.tr/~hande
> _______________________________________________
> Pw_forum mailing list
> Pw_forum at pwscf.org
> http://www.democritos.it/mailman/listinfo/pw_forum
