<br><div class="gmail_quote"><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

there are two issues that need to be considered.<br>

<br>

1) how large are your test jobs? if they are not large enough, timings are pointless.</blockquote><div>&nbsp;</div><div>About 15 minutes on an Intel quad-core. 66 atoms: Cd_30Te_30O_6, with 576 electrons in total. <br>My test may be very particular. If you have a balanced benchmark, I would like to run it.<br>

<br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><br>

2) it is most likely that you are still being tricked by the<br>

 &nbsp; auto-parallelization of intel MKL. the export OMP_NUM_THREADS<br>

 &nbsp; will usually only work for the _local_ copy, for some<br>

 &nbsp; MPI startup mechanisms not at all. thus your MPI jobs will<br>

 &nbsp; be slowed down.</blockquote><div>I am using only SMP. Sorry, I don&#39;t have a cluster of quad-cores yet. <br></div><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

<br>

<br>

 &nbsp; to make certain that you only link the serial version of<br>

 &nbsp; MKL with your MPI executable, please replace &nbsp;-lmkl_em64t<br>

 &nbsp; in your make.sys file with<br>

 &nbsp; -lmkl_intel_lp64 -lmkl_sequential -lmkl_core</blockquote><div><br>Yes, I also tried that. The test ran in 14m2s. Using only -lmkl_em64t, it ran in 14m31s. With a serial compilation, it ran in 12m20s.<br><br><br><br></div>
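<div>For reference, the link-line change suggested above might look roughly like this in make.sys. This is only a sketch: the variable name BLAS_LIBS is an assumption about how this make.sys is laid out, and any -L library path already present is assumed to stay unchanged.<br>
<pre>
# Sketch only: BLAS_LIBS is an assumed variable name, adapt it to your make.sys.
# Before: single MKL library, which may auto-parallelize with threads
# BLAS_LIBS = -lmkl_em64t

# After: explicitly link the sequential (non-threaded) MKL layer
BLAS_LIBS = -lmkl_intel_lp64 -lmkl_sequential -lmkl_core
</pre>
With the sequential layer linked, the MKL calls should no longer depend on OMP_NUM_THREADS.<br><br></div>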

Thanks,<br>Eduardo<br><br><br></div>