On Mar 18, 2008, at 5:52 PM, Chembeti, Ramesh (S&T-Student) wrote:
> My question is: when I printed the results, the accelerations on
> processor 0 (i.e., from 1 to nmol/2) are the same as the results for
> the serial code, whereas they aren't the same for processor 1
> (nmol/2+1 to nmol). As I am learning MPI, I couldn't find where it
> went wrong in doing an all-to-all operation for the acceleration part
> ax(i,m), ay(i,m), az(i,m).

I can't really parse your question, and unfortunately I don't have
time to read through your code. I see that you're doing 3 bcasts
(they're not all-to-all, as your comment claims), but I don't know how
big they are.
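
To make the bcast vs. all-to-all distinction concrete, here is a toy sketch in plain Python. It does not use MPI at all: each "rank" is just a list, and the helper names (simulate_bcast, simulate_allgather) and the data are made up for illustration. It only mimics the documented semantics of MPI_Bcast (everyone receives the root's buffer) and MPI_Allgather (everyone receives every rank's block), and it is not a diagnosis of the posted code.

```python
# Toy simulation only: each "rank" is a list entry; no MPI library is used.

def simulate_bcast(buffers, root=0):
    """Every rank ends up with a copy of the root's buffer (MPI_Bcast-like)."""
    return [list(buffers[root]) for _ in buffers]

def simulate_allgather(blocks):
    """Every rank ends up with all blocks in rank order (MPI_Allgather-like)."""
    combined = [x for block in blocks for x in block]
    return [list(combined) for _ in blocks]

# Hypothetical data: 4 molecules split over 2 ranks. Each rank has computed
# only its own half; None marks entries that rank never computed.
rank0 = [1.0, 2.0, None, None]
rank1 = [None, None, 3.0, 4.0]

# A broadcast from rank 0 overwrites rank 1's buffer with rank 0's view,
# so rank 1's own results are lost.
after_bcast = simulate_bcast([rank0, rank1], root=0)

# An allgather combines each rank's own block, so both ranks see everything.
after_allgather = simulate_allgather([rank0[:2], rank1[2:]])

print(after_bcast[1])      # [1.0, 2.0, None, None]
print(after_allgather[1])  # [1.0, 2.0, 3.0, 4.0]
```

If every process needs the pieces computed by every other process, a single broadcast from one root will not deliver them; a gather-to-all style operation is the usual fit.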
The big issue here is how much work each process does compared to the
whole. If your problem is not "big enough", the communication costs
can outweigh the computation costs, and any benefit you might have
gained from parallelization will be lost. In short: you usually need a
big computational problem before you'll see benefits from
parallelization. (Yes, there are lots of corner cases; a textbook on
parallel computation covers the finer details.)
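
As a rough illustration of that tradeoff, here is a minimal sketch of a deliberately simplified cost model: parallel time is taken to be the serial compute time divided evenly across processes, plus a fixed communication cost. The function names and all the timings are invented for the example, and real programs rarely behave this cleanly.

```python
def parallel_time(serial_time, nprocs, comm_time):
    """Assumed model: perfectly divisible work plus a fixed communication cost."""
    return serial_time / nprocs + comm_time

def speedup(serial_time, nprocs, comm_time):
    """Serial time divided by modeled parallel time; below 1.0 means a slowdown."""
    return serial_time / parallel_time(serial_time, nprocs, comm_time)

# Hypothetical numbers (seconds): 5 ms of communication per run on 2 processes.
small = speedup(serial_time=0.001, nprocs=2, comm_time=0.005)
large = speedup(serial_time=10.0, nprocs=2, comm_time=0.005)

print(f"small problem speedup: {small:.2f}")
print(f"large problem speedup: {large:.2f}")
```

With these made-up numbers, the small problem gets slower on 2 processes (speedup below 1), while the large one gets close to the ideal factor of 2, because the same communication cost is spread over far more computation.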
Here's a writeup I did on basic parallel computing many years ago that
might be helpful:
http://www.osl.iu.edu/~jsquyres/bladeenc/details.php
--
Jeff Squyres
Cisco Systems