[Pw_forum] PW taskgroups and a large run on a BG/P

David Farrell davidfarrell2008 at u.northwestern.edu
Thu Feb 12 19:24:38 CET 2009


I pulled down the current CVS version, compiled as I did with the  
previous snapshot and got the same behavior:

When I ran on 128 cores in vn mode with -ntg 4 -ndiag 121, I got a
Cholesky error.

When I ran on 128 cores in dual mode with -ntg 4 -ndiag 121, I got the
same Cholesky error.

When I ran on 128 cores in smp mode with -ntg 4 -ndiag 121, it ran fine.
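
For reference, -ndiag 121 is an 11 x 11 grid, which is the largest
square process grid that fits in 128 MPI tasks. The snippet below is
only a sketch of that arithmetic. It assumes the run really has 128 MPI
tasks, and the function name is made up for illustration; it is not
part of PW.

```python
import math

def largest_square_grid(n_tasks):
    """Return (side, side*side) for the largest square grid that fits in n_tasks."""
    side = math.isqrt(n_tasks)  # integer square root, rounded down
    return side, side * side

side, used = largest_square_grid(128)
# Tasks inside the square grid, and tasks left outside it.
print(side, used, 128 - used)
```

Under that assumption, 7 of the 128 tasks sit outside the
diagonalization grid in every mode, so the grid size alone does not
distinguish the failing runs from the working one.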

I guess I have 2 options:

1) try larger systems in SMP mode with the CVS version, see how big I  
can get before things blow up. I'll just have to deal with the extra  
cost of the idle CPUs.

2) step through the code with a debugger to see what is going on
(what I want to know now is how much memory is actually available to
the code, how much it is using, and whether something odd happens in
the different modes). I'll probably have to construct a smaller
system that shows the same behavior first.
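
As a starting point for the memory question in option 2, here is a
rough sketch of the per-task memory ceiling in each mode. It assumes a
BG/P node with 4 cores and 2 GB of RAM (some installations differ) and
ignores whatever the compute node kernel takes, so the real numbers
will be lower. All names are hypothetical.

```python
# Assumed hardware values, not taken from this post.
CORES_PER_NODE = 4
NODE_MEM_MIB = 2048

# MPI tasks per node in each BG/P execution mode.
TASKS_PER_NODE = {"smp": 1, "dual": 2, "vn": 4}

def per_task_budget(mode, node_mem_mib=NODE_MEM_MIB):
    """Upper bound on memory per MPI task, plus cores left idle without threading."""
    tasks = TASKS_PER_NODE[mode]
    return {
        "tasks_per_node": tasks,
        "idle_cores": CORES_PER_NODE - tasks,
        "mem_per_task_mib": node_mem_mib // tasks,
    }

for mode in ("smp", "dual", "vn"):
    print(mode, per_task_budget(mode))
```

If the per-task ceiling is the problem, one would expect vn mode to
fail first and smp mode last, which can be checked against what the
debugger reports.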

I don't want to abandon PW/CP just yet because this code has
demonstrated decent physics, and other codes would either require me
to develop pseudopotentials (PPs) that give results I can be confident
in, or take far too much work to make scalable. Unfortunately, I also
need to get it running on the BG/P, as I have a big allocation on that
machine that would otherwise be wasted.

Dave


David E. Farrell
Post-Doctoral Fellow
Department of Materials Science and Engineering
Northwestern University
email: d-farrell2 at northwestern.edu
