Thank you, this is really helpful. Yes, Java's other bookkeeping threads were what I was worried about too.
I think I can extract a part of it into a C program to check. I've got a quick question: besides these time-sharing constraints, does the number of cores have any significance for MPI's communication decisions?

On Jun 23, 2016 2:18 AM, "Gilles Gouaillardet" <gil...@rist.or.jp> wrote:

> Java uses *many* threads. Simply run
>
> ls /proc/<pid>/task
>
> and you will be amazed at how many threads are used.
>
> Here is my guess, from the point of view of a given MPI process:
>
> in case 1, the main thread and all the other threads do time sharing, so
> basically, when another thread is working, the main thread is blocked.
>
> in case 2, some parallelism is possible if another MPI task is sleeping:
> the main thread is running, and another thread is running on another core
>
> in case 3, the main thread can move from one core to another
> => cache flush
> => QPI access if the memory in use is no longer local
> so though there is more opportunity for parallelism, process migration can
> slow down everything
>
> bottom line, even with one thread, case 1 and case 2 are quite different
> because Java uses so many threads per process, so i am not so surprised
> by the difference in performance.
>
> if you have any chance, i suggest you write a similar program in C.
> since only a few threads are used per process, i guess case 1 and case 2
> will become pretty close.
>
> i also suggest that for cases 2 and 3, you bind processes to a socket
> instead of no binding at all
>
> Cheers,
>
> Gilles
>
> On 6/23/2016 2:41 PM, Saliya Ekanayake wrote:
>
> Thank you, Gilles for the quick response. The code comes from a clustering
> application, but let me try to explain simply what the pattern is. It's a
> bit longer than I expected.
>
> The program has the BSP pattern, with *compute()* followed by a
> collective *allreduce()*, and it does many iterations over these two.
>
> Each process is a Java process with just the main thread.
> However, in Java the process and its main thread have their own PIDs and
> act as two LWPs in Linux.
>
> Now, let's take two binding scenarios. For simplicity, I'll assume a node
> with 2 sockets, each with 4 cores. The real one I ran has 2 sockets with 12
> cores each.
>
> 1. *--map-by ppr:8:node:PE=1 --bind-to core* results in something like
> below.
>
> [image: Inline image 3]
> where each process is bound to 1 core. The blue dots show the main thread
> in Java. It too is bound to the same core as its parent process by default.
>
> 2. *--map-by ppr:8:node --bind-to none* This is similar to 1, but now
> processes are not bound (or bound to all cores). However, from the program,
> we *explicitly bind its main thread to 1 core*. It gives something like
> below.
>
> [image: Inline image 4]
> The results we got suggest approach 2 gives better communication
> performance than 1. The btl used is openib. Here's a graph showing the
> variation in timings. It also shows other cases that use more than 1 thread
> to do the computation. In all patterns communication is done
> through the main thread only.
>
> What is peculiar is the two points within the dotted circle. Intuitively
> they should overlap, as each Java process has only the main thread and
> that main thread is bound to 1 core. The difference is how the parent
> process is bound with MPI. The red line is for *Case 1* above and the blue
> is for *Case 2*.
>
> The green line is when both parent process and threads are unbound.
>
> [image: Inline image 6]
>
> On Thu, Jun 23, 2016 at 12:36 AM, Gilles Gouaillardet <gil...@rist.or.jp>
> wrote:
>
>> Can you please provide more details on your config, how tests are
>> performed and the results?
>>
>> to be fair, you should only compare cases in which mpi tasks are bound to
>> the same sockets.
>>
>> for example, if socket0 has core[0-7] and socket1 has core[8-15]
>>
>> it is fair to compare {task0,task1} bound on
>>
>> {0,8}, {[0-1],[8-9]}, {[0-7],[8-15]}
>>
>> but it is unfair to compare
>>
>> {0,1} and {0,8} or {[0-7],[8-15]}
>>
>> since {0,1} does not involve traffic on the QPI, but {0,8} does.
>> depending on the btl you are using, it might or might not involve another
>> "helper" thread.
>> if your task is bound on one core, and assuming there is no SMT, then the
>> task and the helper do time sharing.
>> but if the task is bound on more than one core, then the task and the
>> helper run in parallel.
>>
>> Cheers,
>>
>> Gilles
>>
>> On 6/23/2016 1:21 PM, Saliya Ekanayake wrote:
>>
>> Hi,
>>
>> I am trying to understand this peculiar behavior where the communication
>> time in OpenMPI changes depending on the number of processing elements
>> (cores) the process is bound to.
>>
>> Is this expected?
>>
>> Thank you,
>> saliya
>>
>> --
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2016/06/29523.php
>
> --
> Saliya Ekanayake
> Ph.D.
> Candidate | Research Assistant
> School of Informatics and Computing | Digital Science Center
> Indiana University, Bloomington
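
A minimal sketch of two checks discussed in this thread, assuming Linux: counting the threads (LWPs) of a process by listing its entries under /proc/<pid>/task, and binding only the calling thread to one core from inside the program, as in case 2. It is written in Python for brevity rather than in the Java or C discussed above, the function names `count_threads` and `pin_calling_thread` are hypothetical, and it is not the code of the clustering application.

```python
import os
import threading


def count_threads(pid="self"):
    """Return the number of kernel threads (LWPs) of a process, or None.

    Assumption: Linux, where each thread has an entry under /proc/<pid>/task.
    None is returned when that directory does not exist.
    """
    task_dir = f"/proc/{pid}/task"
    if not os.path.isdir(task_dir):
        return None
    return len(os.listdir(task_dir))


def pin_calling_thread(core):
    """Bind only the calling thread to `core`; return its new affinity set.

    On Linux, pid 0 means "the calling thread", so threads that already
    exist keep their own affinity masks. Returns None where the call is
    not available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {core})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Start one helper thread so there is more than the main thread to count.
    stop = threading.Event()
    helper = threading.Thread(target=stop.wait)
    helper.start()
    print("threads in this process:", count_threads())

    if hasattr(os, "sched_getaffinity"):
        # Pick a core this process is currently allowed to run on.
        core = min(os.sched_getaffinity(0))
        print("main thread now bound to:", pin_calling_thread(core))

    stop.set()
    helper.join()
```

To inspect a running Java process instead of the script itself, pass its PID to `count_threads`. Comparing the affinity of each entry under /proc/<pid>/task is one way to see whether helper threads share the main thread's core (case 1) or can run elsewhere (case 2).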