I also assumed that was true. However, when communicating between two
procs, the TCP stack will use a shortcut in the loopback code if the two
procs are known to be on the same node. In the case of multiple orteds, it
isn't clear to me that the stack recognizes this situation, since the
orteds, at least, [...]
Though I did not repeat it, I assumed --mca btl tcp,self is always used, as
described in the initial email.
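On Linux, that shortcut is visible in the routing layer: packets addressed
to any of the host's own IPs match the "local" routing table and are
delivered over the loopback path without ever touching the NIC. A quick way
to check (a sketch, assuming the iproute2 tools; the address below is a
placeholder for the node's own IP):

  # Which route would the kernel pick to reach this host's own address?
  # "local ... dev lo" in the output confirms the loopback shortcut.
  ip route get 192.0.2.10

  # List the local table that holds these shortcut routes:
  ip route show table local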
Cheers,
Gilles
On Monday, January 25, 2016, Ralph Castain wrote:
> [...]
Ok, thank you Ralph and Gilles, I will continue testing and I'll update you
if there is any news.
Cheers,
Federico
2016-01-25 14:23 GMT+01:00 Ralph Castain:
> [...]
I believe the performance penalty will still always be greater than zero,
however, even though the TCP stack is smart enough to take an optimized
path for loopback as opposed to inter-node communication.
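One way to quantify that residual penalty is to run the same benchmark with
the shared-memory BTL instead of TCP (a sketch; skampi stands in for the
actual benchmark invocation, and the on-node BTL is named vader or sm
depending on the Open MPI release):

  # Force TCP even for on-node traffic:
  mpirun -np 16 --mca btl self,tcp skampi

  # Use the shared-memory BTL for comparison:
  mpirun -np 16 --mca btl self,vader skampi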
On Mon, Jan 25, 2016 at 4:28 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Federico,
I did not expect 0% degradation, since you are now comparing two different
cases:
- 1 orted means tasks are bound to sockets;
- 16 orteds means tasks are not bound at all.
A quick way to improve things is to use a wrapper that binds the MPI tasks:
mpirun --bind-to none wrapper.sh skampi
wrapper.sh can bind each task based on its local rank, as in the sketch
below.
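A minimal sketch of such a wrapper, assuming a Linux node with numactl
installed and two NUMA sockets (the socket count and the even/odd spreading
are assumptions to adapt to the real topology). Open MPI exports
OMPI_COMM_WORLD_LOCAL_RANK to every process it launches:

  #!/bin/sh
  # wrapper.sh - bind each MPI task before exec'ing the application.
  rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}

  # Spread ranks across the two sockets: even ranks on socket 0,
  # odd ranks on socket 1.
  socket=$((rank % 2))

  # Run the real application bound to that socket's cores and memory.
  exec numactl --cpunodebind=$socket --membind=$socket "$@"

It is launched exactly as above (mpirun --bind-to none wrapper.sh skampi),
so that mpirun itself imposes no binding and the wrapper is free to place
each task.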
Thank you Gilles, you're right: with --bind-to none we see ~15%
degradation rather than 50%.
It's much better now, but I think it should be (in theory) around 0%.
The benchmark is MPI bound (the standard benchmark provided with SkaMPI);
it tests functions such as MPI_Bcast, MPI_Barrier, [...]
Federico,
unless you already took care of that, I would guess all 16 orteds bound
their children MPI tasks to socket 0.
Can you try
mpirun --bind-to none ...
By the way, is your benchmark application CPU bound? memory bound? MPI
bound?
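To confirm where the tasks actually land, Open MPI can print each rank's
binding at launch with the --report-bindings option (skampi again stands in
for the real invocation):

  mpirun --report-bindings -np 16 skampi

  # Or, from inside a running task on Linux:
  grep Cpus_allowed_list /proc/self/status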
Cheers,
Gilles
On Monday, January 25, 2016, Federico wrote:
Hello,
we have executed a benchmark (SkaMPI) on the same machine (32-core Intel
Xeon, x86_64) with these two configurations:
- 1 orted with 16 processes, with the BTL forced to TCP (--mca btl self,tcp)
- 16 orteds with 1 process each (also using TCP)
We use a custom RAS to allow multiple orteds on the same node. [...]
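For reference, the first configuration corresponds to a launch like the
following (a sketch; skampi stands in for the actual benchmark command
line):

  # 1 orted, 16 ranks, TCP forced even for on-node traffic:
  mpirun -np 16 --mca btl self,tcp skampi

The second configuration (one orted per rank on the same host) is what the
custom RAS makes possible, since a stock mpirun starts a single daemon per
node.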