Thank you, this is really helpful. Yes, the other bookkeeping threads in
Java were what I was worried about too.

I think I can extract a part of it into a C program to check.

I've got a quick question. Besides these time-sharing constraints, does the
number of cores have any significance for MPI's communication decisions?
On Jun 23, 2016 2:18 AM, "Gilles Gouaillardet" <gil...@rist.or.jp> wrote:

> Java uses *many* threads. Simply run
>
> ls /proc/<pid>/task
>
> and you will be amazed at how many threads are in use.
> Here is my guess:
>
>
> from the point of view of a given MPI process :
>
> in case 1, the main thread and all the other threads do time sharing, so
> basically, when another thread is working, the main thread is blocked.
>
> in case 2, some parallelism is possible if another MPI task is sleeping:
> the main thread is running, and another thread is running on another core
>
> in case 3, the main thread can move from one core to another
> => cache flush
> => QPI access if the memory in use is no longer local
> so although there is more opportunity for parallelism, process migration
> can slow everything down
>
>
> bottom line: even with one thread, case 1 and case 2 are quite different
> because Java uses so many threads per process, so I am not so surprised
> by the difference in performance.
>
> if you have the chance, I suggest you write a similar program in C.
> Since only a few threads are used per process, I guess case 1 and case 2
> will become pretty close.
>
> I also suggest that for cases 2 and 3, you bind processes to a socket
> instead of not binding at all.
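>
> for example, with Open MPI something like the following (adjust the
> per-socket count to your layout; "YourApp" is just a placeholder) maps
> 4 tasks to each socket and binds each task to its socket:
>
> ```
> mpirun -np 8 --map-by ppr:4:socket --bind-to socket java YourApp
> ```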
>
> Cheers,
>
> Gilles
>
> On 6/23/2016 2:41 PM, Saliya Ekanayake wrote:
>
> Thank you, Gilles, for the quick response. The code comes from a clustering
> application, but let me try to explain simply what the pattern is. It's a
> bit longer than I expected.
>
>
>
> The program has a BSP pattern: *compute()* followed by a collective
> *allreduce()*, and it does many iterations over these two.
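>
> In outline, each iteration looks like this (arguments elided):
>
> ```
> for (int iter = 0; iter < maxIterations; iter++) {
>     compute();          // local work, done by the main thread
>     MPI_Allreduce(...); // collective across all processes
> }
> ```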
>
> Each process is a Java process with just the main thread. However, in Java
> the process and the main thread have their own PIDs and act as two LWPs in
> Linux.
>
> Now, let's take two binding scenarios. For simplicity, I'll assume a node
> with 2 sockets, each with 4 cores. The real one I ran on has 2 sockets
> with 12 cores each.
>
> 1. *--map-by ppr:8:node:PE=1 --bind-to core* results in something like
> below.
>
> [image: Inline image 3]
> where each process is bound to 1 core. The blue dots show the main thread
> in Java. It too is bound to the same core as its parent process by default.
>
> 2. *--map-by ppr:8:node --bind-to none* This is similar to 1, but now
> processes are not bound (or, equivalently, bound to all cores). However,
> from the program, we *explicitly bind the main thread to 1 core*. It gives
> something like below.
>
> [image: Inline image 4]
> The results we got suggest approach 2 gives better communication
> performance than 1. The btl used is openib. Here's a graph showing the
> variation in timings. It also shows other cases that use more than 1
> thread to do the computation. In all patterns, communication is done
> through the main thread only.
>
> What is peculiar is the two points within the dotted circle. Intuitively
> they should overlap, as each Java process has only the main thread, and
> that main thread is bound to 1 core. The difference is how the parent
> process is bound by MPI. The red line is for *Case 1* above and the blue
> is for *Case 2*.
>
> The green line is when both parent process and threads are unbound.
>
>
> [image: Inline image 6]
>
> On Thu, Jun 23, 2016 at 12:36 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> Can you please provide more details on your config, how the tests are
>> performed, and the results?
>>
>>
>> to be fair, you should only compare cases in which MPI tasks are bound to
>> the same sockets.
>>
>> for example, if socket0 has core[0-7] and socket1 has core[8-15]
>>
>> it is fair to compare {task0,task1} bound on
>>
>> {0,8}, {[0-1],[8-9]}, {[0-7],[8-15]}
>>
>> but it is unfair to compare
>>
>> {0,1} and {0,8} or {[0-7],[8-15]}
>>
>> since {0,1} does not involve traffic on the QPI, but {0,8} does.
>> depending on the btl you are using, it might or might not involve another
>> "helper" thread.
>> if your task is bound to one core, and assuming there is no SMT, then the
>> task and the helper do time sharing.
>> but if the task is bound to more than one core, then the task and the
>> helper run in parallel.
>>
>>
>> Cheers,
>>
>> Gilles
>>
>> On 6/23/2016 1:21 PM, Saliya Ekanayake wrote:
>>
>> Hi,
>>
>> I am trying to understand this peculiar behavior where the communication
>> time in Open MPI changes depending on the number of processing elements
>> (cores) the process is bound to.
>>
>> Is this expected?
>>
>> Thank you,
>> Saliya
>>
>> --
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2016/06/29523.php
>>
>>
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/06/29524.php
>>
>
>
>
> --
> Saliya Ekanayake
> Ph.D. Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2016/06/29529.php
>
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2016/06/29530.php
>
