Java uses *many* threads. Simply run
ls /proc/<pid>/task
and you will be amazed at how many threads are used.
Here is my guess, from the point of view of a given MPI process:
in case 1, the main thread and all the other threads do time sharing, so
basically, when another thread is working, the main thread is blocked.
in case 2, some parallelism is possible if another MPI task is sleeping:
the main thread is running, and another thread is running on another core.
in case 3, the main thread can move from one core to another
=> cache flush
=> QPI access if the memory in use is no longer local
so although there is more opportunity for parallelism, process migration
can slow everything down.
Bottom line: even with one thread, cases 1 and 2 are quite different
because Java uses so many threads per process, so I am not so
surprised by the difference in performance.
If you have the chance, I suggest you write a similar program in C.
Since only a few threads are used per process, I guess cases 1 and 2
will become pretty close.
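A minimal C skeleton of the same compute/allreduce pattern might look like the sketch below (the compute step is a placeholder loop, and N and ITER are arbitrary; it needs an MPI installation and is built with mpicc):

```c
#include <mpi.h>
#include <stdio.h>

#define N 1024
#define ITER 100

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double local[N], global[N];
    for (int i = 0; i < N; i++)
        local[i] = rank + i;

    for (int it = 0; it < ITER; it++) {
        /* compute(): placeholder local work */
        for (int i = 0; i < N; i++)
            local[i] = local[i] * 0.5 + 1.0;
        /* allreduce(): the collective step of the BSP pattern */
        MPI_Allreduce(local, global, N, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    }

    if (rank == 0)
        printf("done: global[0] = %f\n", global[0]);
    MPI_Finalize();
    return 0;
}
```

Since a C process has only the initial thread plus whatever the btl adds, the difference between bindings 1 and 2 should shrink.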
I also suggest that for cases 2 and 3, you bind processes to a socket
instead of not binding them at all.
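For example, with 8 processes per node on 2 sockets, something like the following (a sketch; the Java invocation is a placeholder) keeps each process, and all its JVM service threads, on one socket while still allowing migration within it:

```shell
# bind each process to a socket: 4 processes per socket, 8 per node;
# threads may migrate within the socket but never across the QPI
mpirun --map-by ppr:4:socket --bind-to socket java YourProgram
```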
Cheers,
Gilles
On 6/23/2016 2:41 PM, Saliya Ekanayake wrote:
Thank you, Gilles, for the quick response. The code comes from a
clustering application, but let me try to explain simply what the
pattern is. It's a bit longer than I expected.
The program follows a BSP pattern with /compute()/ followed by a
collective /allreduce()/, and it does many iterations over these two.
Each process is a Java process with just the main thread. However, in
Java the process and its main thread have their own PIDs and act as two
LWPs in Linux.
Now, let's take two binding scenarios. For simplicity, I'll assume a
node with 2 sockets, each with 4 cores. The real one I ran on has 2
sockets with 12 cores each.
1. *--map-by ppr:8:node:PE=1 --bind-to core* results in something like
below.
Inline image 3
where each process is bound to 1 core. The blue dots show the main
thread in Java. It too is bound to the same core as its parent process
by default.
2. *--map-by ppr:8:node --bind-to none* This is similar to 1, but
now processes are not bound (or bound to all cores). However, from the
program, we *explicitly bind its main thread to 1 core*. It gives
something like below.
Inline image 4
The results we got suggest approach 2 gives better communication
performance than approach 1. The btl used is openib. Here's a graph
showing the variation in timings. It also shows other cases that use
more than 1 thread for the computation. In all patterns, communication
is done through the main thread only.
What is peculiar is the two points within the dotted circle.
Intuitively they should overlap, as each Java process has only the main
thread and that main thread is bound to 1 core. The difference is how
the parent process is bound by MPI. The red line is for *Case 1* above
and the blue is for *Case 2*.
The green line is when both parent process and threads are unbound.
Inline image 6
On Thu, Jun 23, 2016 at 12:36 AM, Gilles Gouaillardet
<gil...@rist.or.jp <mailto:gil...@rist.or.jp>> wrote:
Can you please provide more details on your config, how the tests are
performed, and the results?
To be fair, you should only compare cases in which MPI tasks are
bound to the same sockets.
For example, if socket0 has cores [0-7] and socket1 has cores [8-15],
it is fair to compare {task0,task1} bound on
{0,8}, {[0-1],[8-9]}, {[0-7],[8-15]},
but it is unfair to compare
{0,1} with {0,8} or {[0-7],[8-15]},
since {0,1} does not involve traffic on the QPI, but {0,8} does.
Depending on the btl you are using, it might or might not involve
another "helper" thread.
If your task is bound to one core, and assuming there is no SMT,
then the task and the helper do time sharing.
But if the task is bound to more than one core, then the task and
the helper run in parallel.
Cheers,
Gilles
On 6/23/2016 1:21 PM, Saliya Ekanayake wrote:
Hi,
I am trying to understand a peculiar behavior where the
communication time in Open MPI changes depending on the number of
processing elements (cores) the process is bound to.
Is this expected?
Thank you,
saliya
--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
_______________________________________________
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
Subscription:https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this
post:http://www.open-mpi.org/community/lists/users/2016/06/29523.php