Also, which OFED version (ofed_info -s) and which MXM version (rpm -qi mxm) are you using?
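For reference, those can be gathered on one of the compute nodes roughly like this (the ompi_info line is only a suggested extra check to confirm the MXM component was actually built into your Open MPI):

    ofed_info -s
    rpm -qi mxm
    ompi_info | grep -i mxm

A rough sketch of how the configure and mpirun flags mentioned in the thread below fit together is appended after the quoted exchange.
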
On Wed, Jun 12, 2013 at 3:30 AM, Ralph Castain <r...@open-mpi.org> wrote:
> Great! Would you mind showing the revised table? I'm curious as to the relative performance.
>
>
> On Jun 11, 2013, at 4:53 PM, eblo...@1scom.net wrote:
>
> > Problem solved. I did not configure with --with-mxm=/opt/mellanox/mcm and this location was not auto-detected. Once I rebuilt with this option, everything worked fine. Scaled better than MVAPICH out to 800. MVAPICH configure log showed that it had found this component of the OFED stack.
> >
> > Ed
> >
> >
> >> If you run at 224 and things look okay, then I would suspect something in the upper level switch that spans cabinets. At that point, I'd have to leave it to Mellanox to advise.
> >>
> >>
> >> On Jun 11, 2013, at 6:55 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>
> >>> I tried adding "-mca btl openib,sm,self" but it did not make any difference.
> >>>
> >>> Jesus' e-mail this morning has got me thinking. In our system, each cabinet has 224 cores, and we are reaching a different level of the system architecture when we go beyond 224. I got an additional data point at 256 and found that performance is already falling off. Perhaps I did not build OpenMPI properly to support the Mellanox adapters that are used in the backplane, or I need some configuration setting similar to FAQ #19 in the Tuning/Openfabrics section.
> >>>
> >>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> >>> Sent: Sunday, June 09, 2013 6:48 PM
> >>> To: Open MPI Users
> >>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem
> >>>
> >>> Strange - it looks like a classic oversubscription behavior. Another possibility is that it isn't using IB for some reason when extended to the other nodes. What does your cmd line look like? Have you tried adding "-mca btl openib,sm,self" just to ensure it doesn't use TCP for some reason?
> >>>
> >>>
> >>> On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>>
> >>> Correct. 20 nodes, 8 cores per socket on each dual-socket node = 320.
> >>>
> >>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> >>> Sent: Sunday, June 09, 2013 6:18 PM
> >>> To: Open MPI Users
> >>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance problem
> >>>
> >>> So, just to be sure - when you run 320 "cores", you are running across 20 nodes?
> >>>
> >>> Just want to ensure we are using "core" the same way - some people confuse cores with hyperthreads.
> >>>
> >>> On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>>
> >>> 16. Dual-socket Xeon, E5-2670.
> >>>
> >>> I am trying a larger model to see if the performance drop-off happens at a different number of cores. Also I'm running some intermediate core-count sizes to refine the curve a bit. I also added mpi_show_mca_params all and, at the same time, btl_openib_use_eager_rdma 1, just to see if that does anything.
> >>>
> >>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Ralph Castain
> >>> Sent: Sunday, June 09, 2013 5:04 PM
> >>> To: Open MPI Users
> >>> Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem
> >>>
> >>> Looks to me like things are okay thru 160, and then things fall apart after that point. How many cores are on a node?
> >>>
> >>>
> >>> On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>>
> >>>
> >>> I'm having some trouble getting good scaling with OpenMPI 1.6.4 and I don't know where to start looking. This is an Infiniband FDR network with Sandy Bridge nodes. I am using affinity (--bind-to-core) but no other options. As the number of cores goes up, the message sizes are typically going down. There seem to be lots of options in the FAQ, and I would welcome any advice on where to start. All these timings are on a completely empty system except for me.
> >>>
> >>> Thanks
> >>>
> >>> MPI     | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
> >>> ===============================================================================
> >>> MVAPICH |      16 |    8.6783 |     0.995 % |         2 |  16.000 |     1.0000
> >>> MVAPICH |      48 |    8.7665 |     1.937 % |         3 |  47.517 |     0.9899
> >>> MVAPICH |      80 |    8.8900 |     2.291 % |         3 |  78.095 |     0.9762
> >>> MVAPICH |     160 |    8.9897 |     2.409 % |         3 | 154.457 |     0.9654
> >>> MVAPICH |     320 |    8.9780 |     2.801 % |         3 | 309.317 |     0.9666
> >>> MVAPICH |     480 |    8.9704 |     2.316 % |         3 | 464.366 |     0.9674
> >>> MVAPICH |     640 |    9.0792 |     1.138 % |         3 | 611.739 |     0.9558
> >>> MVAPICH |     720 |    9.1328 |     1.052 % |         3 | 684.162 |     0.9502
> >>> MVAPICH |     800 |    9.1945 |     0.773 % |         3 | 755.079 |     0.9438
> >>> OpenMPI |      16 |    8.6743 |     2.335 % |         2 |  16.000 |     1.0000
> >>> OpenMPI |      48 |    8.7826 |     1.605 % |         2 |  47.408 |     0.9877
> >>> OpenMPI |      80 |    8.8861 |     0.120 % |         2 |  78.093 |     0.9762
> >>> OpenMPI |     160 |    8.9774 |     0.785 % |         2 | 154.598 |     0.9662
> >>> OpenMPI |     320 |   12.0585 |    16.950 % |         2 | 230.191 |     0.7193
> >>> OpenMPI |     480 |   14.8330 |     1.300 % |         2 | 280.701 |     0.5848
> >>> OpenMPI |     640 |   17.1723 |     2.577 % |         3 | 323.283 |     0.5051
> >>> OpenMPI |     720 |   18.2153 |     2.798 % |         3 | 342.868 |     0.4762
> >>> OpenMPI |     800 |   19.3603 |     2.254 % |         3 | 358.434 |     0.4480
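A note on reading the table above: the Speedup and Efficiency columns appear to be derived from the Ave. Rate column as Speedup(N) = N * Rate(16) / Rate(N) and Efficiency(N) = Speedup(N) / N, which reproduces the tabulated values and suggests Ave. Rate is a per-unit-of-work time (lower is better).

For completeness, here is a minimal sketch of how the build and launch options discussed in this thread combine. The install prefix, hostfile, and application name are placeholders; the MXM path should be whatever actually holds the MXM install on your system (the thread quotes /opt/mellanox/mcm, while /opt/mellanox/mxm is also common):

    # Rebuild Open MPI 1.6.4 with MXM support (paths are illustrative)
    ./configure --prefix=/opt/openmpi-1.6.4 --with-mxm=/opt/mellanox/mxm
    make all install

    # Launch with explicit byte-transfer layers, core binding, and the
    # diagnostic MCA parameters mentioned above (application is a placeholder)
    mpirun -np 320 --hostfile hosts --bind-to-core \
           -mca btl openib,sm,self \
           -mca mpi_show_mca_params all \
           -mca btl_openib_use_eager_rdma 1 \
           ./my_app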