"Unfortunately" also Platform MPI benefits from disabled ASLR:
shared L2/L1I caches (core 0 and 1) and
enabled ASLR:
$ mpirun -np 2 taskset -c 0,1 ./NPmpi_pcmpi -u 1 -n 100
Now starting the main loop
0: 1 bytes 100 times --> 17.07 Mbps in 0.45 usec
disabled ASLR:
$
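(For reference, not part of the original mail: on a standard Linux system ASLR can be toggled system-wide via the kernel's randomize_va_space knob, or disabled for a single process with setarch; the commands below are a sketch under that assumption.)

$ cat /proc/sys/kernel/randomize_va_space        # 2 = full ASLR, 0 = disabled
$ sudo sysctl -w kernel.randomize_va_space=0     # disable system-wide; set back to 2 to re-enable
$ setarch $(uname -m) -R ./NPmpi_pcmpi -u 1 -n 100   # disable only for this process tree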
On Mar 15, 2012, at 8:06 AM, Matthias Jurenz wrote:
> We made a big step forward today!
>
> The kernel in use has a bug regarding the shared L1 instruction cache in AMD
> Bulldozer processors:
> See
>
> From: Matthias Jurenz [mailto:matthias.jur...@tu-dresden.de]
> Sent: Friday, March 09, 2012 08:50 AM
> To: Open MPI Developers <de...@open-mpi.org>
> Cc: Mora, Joshua
> Subject: Re: [OMPI devel] poor btl sm latency
>
> I just made an interesting observation:
>
> When binding the processes to two neighboring cores (L2
I just made an interesting observation:
When binding the processes to two neighboring cores (L2 sharing) NetPIPE shows
*sometimes* pretty good results: ~0.5us
$ mpirun -mca btl sm,self -np 1 hwloc-bind -v core:0 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0 : \
         -np 1 hwloc-bind -v core:1 ./NPmpi_ompi1.5.5 -u 4 -n 10 -p 0
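(Side note, not from the original mail: to double-check where the two ranks actually end up, the --report-bindings option suggested later in this thread can be added to a bound run; the NetPIPE arguments below are only illustrative.)

$ mpirun --report-bindings -mca btl sm,self -np 2 -bind-to-core ./NPmpi_ompi1.5.5 -u 4 -n 10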
Here are the SM BTL parameters:
$ ompi_info --param btl sm
MCA btl: parameter "btl_base_verbose" (current value: <0>, data source:
default value) Verbosity level of the BTL framework
MCA btl: parameter "btl" (current value: , data source: file
Please do a "ompi_info --param btl sm" on your environment. The lazy_free
direct the internals of the SM BTL not to release the memory fragments used to
communicate until the lazy limit is reached. The default value was deemed as
reasonable a while back when the number of default fragments was
Thanks to the OTPO tool, I figured out that setting the MCA parameter
btl_sm_fifo_lazy_free to 1 (default is 120) improves the latency significantly:
0.88µs
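(For reference, not from the original mail: an MCA parameter like this can be passed on the mpirun command line or exported with Open MPI's OMPI_MCA_ environment prefix; the NetPIPE arguments below are only illustrative.)

$ mpirun -np 2 -mca btl self,sm -mca btl_sm_fifo_lazy_free 1 ./NPmpi_ompi1.5.5 -u 4 -n 10
$ export OMPI_MCA_btl_sm_fifo_lazy_free=1   # equivalent, picked up by subsequent mpirun calls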
But somehow I get the feeling that this doesn't eliminate the actual
problem...
Matthias
On Friday 02 March 2012 15:37:03 Matthias Jurenz wrote:
On Friday 02 March 2012 14:58:45 Jeffrey Squyres wrote:
> Ok. Good that there's no oversubscription bug, at least. :-)
>
> Did you see my off-list mail to you yesterday about building with an
> external copy of hwloc 1.4 to see if that helps?
Yes, I did - I answered as well. Our mail server
Hah! I just saw your ticket about how --with-hwloc=/path/to/install is broken
in 1.5.5. So -- let me go look into that...
On Mar 2, 2012, at 8:58 AM, Jeffrey Squyres wrote:
> Ok. Good that there's no oversubscription bug, at least. :-)
>
> Did you see my off-list mail to you yesterday
Ok. Good that there's no oversubscription bug, at least. :-)
Did you see my off-list mail to you yesterday about building with an external
copy of hwloc 1.4 to see if that helps?
On Mar 2, 2012, at 8:26 AM, Matthias Jurenz wrote:
> To exclude a possible bug within the LSF component, I
To exclude a possible bug within the LSF component, I rebuilt Open MPI without
support for LSF (--without-lsf).
-> It makes no difference - the latency is still bad: ~1.1us.
Matthias
On Friday 02 March 2012 13:50:13 Matthias Jurenz wrote:
> SORRY, it was obviously a big mistake by me. :-(
>
>
SORRY, it was obviously a big mistake by me. :-(
Open MPI 1.5.5 was built with LSF support, so when starting an LSF job it's
necessary to request at least as many tasks/cores as are used for the
subsequent mpirun command. That was not the case - I forgot bsub's '-n'
option to specify the
When using Open MPI v1.4.5 I get ~1.1us. That's the same result as I get with
Open MPI v1.5.x using mpi_yield_when_idle=0.
So I think there is a bug in Open MPI (v1.5.4 and v1.5.5rc2) regarding the
automatic performance mode selection.
When enabling the degraded performance mode for Open MPI
Minor update:
I see some improvement when I set the MCA parameter mpi_yield_when_idle to 0
to enforce the "Agressive" performance mode:
$ mpirun -np 2 -mca mpi_yield_when_idle 0 -mca btl self,sm -bind-to-core -
cpus-per-proc 2 ./NPmpi_ompi1.5.5 -u 4 -n 10
0: n090
1: n090
Now starting the
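(Not from the original mail: the value that is actually in effect can be checked with ompi_info, assuming the standard parameter name.)

$ ompi_info --param mpi all | grep yield_when_idle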
On 13/02/12 22:11, Matthias Jurenz wrote:
> Do you have any idea? Please help!
Do you see the same bad latency in the old branch (1.4.5)?
cheers,
Chris
--
Christopher Samuel - Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Some additional information:
I tried several compilers for building Open MPI with optimizations enabled for
the AMD Bulldozer architecture:
* gcc 4.6.2 (-Ofast -mtune=bdver1 -march=bdver1)
* Open64 5.0 (-O3 -march=bdver1 -mtune=bdver1 -mso)
* Intel 12.1 (-O3 -msse4.2)
They all result in similar latencies
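(A sketch, not from the original mails: flags like these would typically be passed to Open MPI's configure via CFLAGS/CXXFLAGS; the gcc variant and the install prefix below are only examples.)

$ ./configure CC=gcc CXX=g++ CFLAGS="-Ofast -march=bdver1 -mtune=bdver1" \
              CXXFLAGS="-Ofast -march=bdver1 -mtune=bdver1" --prefix=$HOME/openmpi-1.5.5
$ make -j 8 && make install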
If the processes are bound so that they share the L2 (i.e. using the neighboring
cores pu:0 and pu:1), I get the *worst* latency results:
$ mpiexec -np 1 hwloc-bind pu:0 ./NPmpi -S -u 4 -n 10 : -np 1 hwloc-bind
pu:1 ./NPmpi -S -u 4 -n 10
Using synchronous sends
Using synchronous sends
0: n023
1: n023
On 16/02/2012 17:12, Matthias Jurenz wrote:
> Thanks for the hint, Brice.
> I'll forward this bug report to our cluster vendor.
>
> Could this be the reason for the bad latencies with Open MPI or does it only
> affect hwloc/lstopo?
It affects binding. So it may affect the performance you
Thanks for the hint, Brice.
I'll forward this bug report to our cluster vendor.
Could this be the reason for the bad latencies with Open MPI or does it only
affect hwloc/lstopo?
Matthias
On Thursday 16 February 2012 15:46:46 Brice Goglin wrote:
> On 16/02/2012 15:39, Matthias Jurenz wrote:
On Thursday 16 February 2012 16:50:55 Jeff Squyres wrote:
> On Feb 16, 2012, at 10:30 AM, Matthias Jurenz wrote:
> > $ mpirun -np 2 --bind-to-core --cpus-per-proc 2 hwloc-bind --get
> > 0x0003
> > 0x000c
>
> That seems right. From your prior email, 3 maps to 11 binary, which maps
> to:
Just a stupid question: in another sm-related thread the value of the
$TMPDIR variable was the problem, could this be the problem here as well?
Edgar
On 2/16/2012 9:30 AM, Matthias Jurenz wrote:
> On Thursday 16 February 2012 16:21:10 Jeff Squyres wrote:
>> On Feb 16, 2012, at 9:39 AM, Matthias
On Feb 16, 2012, at 10:30 AM, Matthias Jurenz wrote:
> $ mpirun -np 2 --bind-to-core --cpus-per-proc 2 hwloc-bind --get
> 0x0003
> 0x000c
That seems right. From your prior email, 3 maps to 11 binary, which maps to:
Socket L#0 (16GB)
NUMANode L#0 (P#0 8190MB) + L3 L#0 (6144KB)
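(Not from the original mail: if hwloc-calc is available, the two masks can be decoded into PU indexes; the exact option spelling may differ between hwloc versions.)

$ hwloc-calc --intersect pu 0x0003   # expected: 0,1
$ hwloc-calc --intersect pu 0x000c   # expected: 2,3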
On Thursday 16 February 2012 16:21:10 Jeff Squyres wrote:
> On Feb 16, 2012, at 9:39 AM, Matthias Jurenz wrote:
> > However, the latencies are constant now but still too high:
> >
> > $ mpirun -np 2 --bind-to-core --cpus-per-proc 2 ./NPmpi_ompi1.5.5 -S -u
> > 12 -n 10
>
> Can you run:
>
>
On Feb 16, 2012, at 9:39 AM, Matthias Jurenz wrote:
> However, the latencies are constant now but still too high:
>
> $ mpirun -np 2 --bind-to-core --cpus-per-proc 2 ./NPmpi_ompi1.5.5 -S -u 12 -n
> 10
Can you run:
mpirun -np 2 --bind-to-core --cpus-per-proc 2 hwloc-bind --get
I want to
On 16/02/2012 15:39, Matthias Jurenz wrote:
> Here is the output of lstopo from a single compute node. I'm wondering why the
> L1/L2 sharing isn't visible - also not in the graphical output...
That's a kernel bug. We're waiting for AMD to tell the kernel that L1i
and L2 are shared
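(Not from the original mail: what the kernel currently reports for cache sharing can be inspected via sysfs; the index numbering of the cache levels varies, so the level/type files should be read alongside the sharing lists.)

$ cat /sys/devices/system/cpu/cpu0/cache/index*/level
$ cat /sys/devices/system/cpu/cpu0/cache/index*/type
$ cat /sys/devices/system/cpu/cpu0/cache/index*/shared_cpu_list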
> >>>>> 1. Bind to socket is broken in 1.5.4 - fixed in next release
> >>>>>
> >>>>> 2. Add --report-bindings to cmd line and see where it thinks the
> >>>>> procs are bound
> >>>>>
>>>> binding.
>>>>>
>>>>> Sent from my iPad
>>>>>
>>>>> On Feb 13, 2012, at 7:07 AM, Matthias Jurenz <matthias.jurenz@tu-dresden.de> wrote:
>>>>>> Hi Sylvain,
> >>>> bind two ranks to one socket:
> >>>> $ mpirun -np 2 --bind-to-core ./all2all
> >>>> $ mpirun -np 2 -mca mpi_paffinity_alone 1 ./all2all
> >>>>
> >>>> bind two ranks to two different sockets:
> >>>> $ mpirun -np 2 --bind-to-socket
>>>> All three runs resulted in similar bad latencies (~1.4us).
>>>>
>>>> :-(
>>>>
>>>> Matthias
>>>>
>>>> On Monday 13 February 2012 12:43:22 sylvain.jeau...@bull.net wrote:
>>>>> Hi Matthias,
On Monday 13 February 2012 12:43:22 sylvain.jeau...@bull.net wrote:
> >>> Hi Matthias,
> >>>
> >>> You might want to play with process binding to see if your problem is
> >>> related to bad memory affinity.
> >>>
>>> Try to launch pingpong on two CPUs of the same socket, then on different
>>> sockets (i.e. bind each process to a core, and try different
>>> configurations).
>>>
>>> Sylvain
>>>
>>>
>>>
>>> From: Matthias Jurenz <matthias.jur...@tu-dresden.de>
>> sockets (i.e. bind each process to a core, and try different
>> configurations).
>>
>> Sylvain
>>
>>
>>
>> From: Matthias Jurenz <matthias.jur...@tu-dresden.de>
>> To: Open MPI Developers <de...@open-mpi.org>
>> Date: 13/
> From: Matthias Jurenz <matthias.jur...@tu-dresden.de>
> To: Open MPI Developers <de...@open-mpi.org>
> Date: 13/02/2012 12:12
> Subject: [OMPI devel] poor btl sm latency
> Sent by: devel-boun...@open-mpi.org
>
>
>
> Hello all,
>
> on our new AMD cluster (AMD Opteron 6274, 2.2GHz) we get very bad
>
Hello all,
on our new AMD cluster (AMD Opteron 6274, 2.2GHz) we get very bad latencies
(~1.5us) when performing 0-byte p2p communication on a single node using the
Open MPI sm BTL. When using Platform MPI we get ~0.5us latencies, which is
pretty good. The bandwidth results are similar for