There is nothing MPI-specific in your code snippet.
You should try to find out what is different in your
code for node 0. You have mentioned that you have
moved the root to other nodes, so it's not machine
specific. You might be setting up the arrays differently
on the different nodes. You should also try other
timers such as clock_gettime or gettimeofday to see
whether the results are consistent.
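For instance (a minimal sketch; time_kernel() and compute() are
made-up names standing in for your timed code):

  #include <stdio.h>
  #include <time.h>
  #include <mpi.h>

  /* cross-check MPI_Wtime() against the POSIX clock_gettime() timer */
  void time_kernel(void (*compute)(void))
  {
      struct timespec t0, t1;
      double w0, w1;

      clock_gettime(CLOCK_MONOTONIC, &t0);
      w0 = MPI_Wtime();
      compute();                      /* the loop being measured */
      w1 = MPI_Wtime();
      clock_gettime(CLOCK_MONOTONIC, &t1);

      printf("MPI_Wtime: %f s, clock_gettime: %f s\n",
             w1 - w0,
             (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9);
  }

If the two timers disagree significantly, the measurement itself is
suspect rather than the loop.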
Also, are you running multiple threads on the same processor?
Did you try blocking, etc.?

42af...@niit.edu.pk wrote:
Hi All,
       Thanks for the help. I don't think I have a cache issue,
because all the processes have the same amount of data, accessed
in the same fashion. My problem is partially solved. I was using
2, 4, 8, 16, 32, and 64 processes for my application code. Now I
use one extra process: 3 instead of 2, 5 instead of 4, and so on,
with process 0 forced to do nothing but wait for the other
processes to finish. This way the time is the same on all
processes except process 0 for the code segment that was taking
longer on process 0. So I still need help, and I will be thankful
for any further suggestions.
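The arrangement is roughly this (a simplified sketch;
compute_segment() stands in for the timed loop):

  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank != 0)
      compute_segment();        /* ranks 1..N-1 do the real work */
  /* rank 0 skips the work and just waits at the barrier */
  MPI_Barrier(MPI_COMM_WORLD);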

regards Aftab Hussain


On Fri, October 26, 2007 4:13 pm, Jeff Squyres wrote:
This is not an MPI problem.


Without looking at your code in detail, I'm guessing that you're
accessing memory without any regard to memory layout and/or caching.
Such an access pattern will thrash your L1 and L2 caches and touch
memory in a truly horrible order that guarantees abysmal performance.
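
As a generic illustration (not your code; a is a row-major
size_x-by-size_y array stored in one dimension):

  /* cache-friendly: the inner index j moves through memory with stride 1 */
  for (i = 0; i < size_x; i++)
      for (j = 0; j < size_y; j++)
          a[i*size_y + j] += 1.0;

  /* cache-hostile: every access jumps size_y elements, thrashing L1/L2 */
  for (j = 0; j < size_y; j++)
      for (i = 0; i < size_x; i++)
          a[i*size_y + j] += 1.0;

The two loops do identical arithmetic; only the traversal order
differs, and that difference alone can cost an order of magnitude.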

Google around for cache effects or check out an operating systems
textbook; there's lots of material around about this kind of effect.

Good luck.




On Oct 26, 2007, at 5:10 AM, 42af...@niit.edu.pk wrote:


Thanks,


The array bounds are the same on all the nodes, and the
compute nodes are identical, i.e. SunFire V890 nodes. I have also
changed the root process to be on different nodes, but the problem
remains the same. I still don't understand the problem very well, and
my progress is at a standstill.

regards aftab hussain

Hi,


Please ensure that the following things are correct:
1) The array bounds are equal, i.e. "my_x" and "size_y" have the same
value on all nodes (a quick runtime check is sketched below).
2) The nodes are homogeneous. To check that, you could make a
different node the root and run the program again.
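
For point 1, something like this would verify the bounds at run time
(a sketch; adapt the variable names to your code):

  /* if the min and max of my_x and size_y over all ranks differ,
     the array bounds are not uniform */
  int bounds[2] = { my_x, size_y };
  int lo[2], hi[2];
  MPI_Allreduce(bounds, lo, 2, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
  MPI_Allreduce(bounds, hi, 2, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
  if (lo[0] != hi[0] || lo[1] != hi[1])
      printf("array bounds differ across ranks!\n");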

-Neeraj



On Fri, October 26, 2007 10:13 am, 42af...@niit.edu.pk wrote:

Thanks for your reply,



I used MPI_Wtime in my application, but even then process 0 took
longer to execute the mentioned code segment. I might be wrong, but
what I see is that process 0 takes more time to access the array
elements than the other processes. Now I don't see what to do, because
the mentioned code segment is creating a bottleneck for the timing of
my application.


Can anyone suggest something in this regard? I will be very thankful.



regards

Aftab Hussain




On Thu, October 25, 2007 9:38 pm, jody wrote:


Hi,
I'm not sure if that is the problem,
but in MPI applications you should use MPI_Wtime() for time
measurements.
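
For example (a minimal sketch):

  double start, elapsed;
  start = MPI_Wtime();            /* wall-clock time in seconds */
  /* ... the code segment to be measured ... */
  elapsed = MPI_Wtime() - start;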

Jody




On 10/25/07, 42af...@niit.edu.pk <42af...@niit.edu.pk> wrote:



Hi all,
I am a research assistant (RA) at NUST Pakistan in the High
Performance
Scientific Computing Lab. I am working on a parallel
implementation of the Finite-Difference Time-Domain (FDTD) method
using MPI. I am using the Open MPI environment on a cluster of 4
SunFire V890 nodes connected through Myrinet. My problem is that
when I run my code with, say, 4 processes, process 0 takes about
3 times longer than the other three processes to execute a for
loop, which is the main cause of load imbalance in my code. Below
is the code that is causing the problem. It is run by all the
processes simultaneously and independently, and I have timed it
separately from the other segments of the code.

start = gethrtime();
for (m = 1; m < my_x; m++) {
    for (n = 1; n < size_y-1; n++) {
        Ez(m,n) = Ez(m,n) + cezh*((Hy(m,n) - Hy(m-1,n))
                                - (Hx(m,n) - Hx(m,n-1)));
    }
}
stop = gethrtime();
time = (stop-start);

In my implementation I used 1-D arrays to realize 2-D arrays. I
have used the following macros for accessing the array elements.

#define Hx(I,J) hx[(I)*(size_y) + (J)]
#define Hy(I,J) hy[(I)*(size_y) + (J)]
#define Ez(I,J) ez[(I)*(size_y) + (J)]
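
For example, Ez(2,3) expands to ez[2*size_y + 3], so the inner
loop over n walks each array with stride 1.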




Can anyone tell me what I am doing wrong here? Are the macros
creating the problem, or could it be related to an OS issue? I will
be looking forward to help, because this problem has stopped my
progress for the last two weeks.

regards aftab hussain

RA, High Performance Scientific Computing Lab
NUST Institute of Information Technology
National University of Sciences and Technology, Pakistan





















--
Jeff Squyres
Cisco Systems


_______________________________________________
users mailing list us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




