Also, which OFED version (ofed_info -s) and MXM version (rpm -qi mxm) do you
use?
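
For reference, both can be checked from a shell on any node (these are the same
commands named above; the output format varies by OFED/MXM release):

    ofed_info -s     # prints the installed OFED release string
    rpm -qi mxm      # shows the MXM package details, if installed via RPM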


On Wed, Jun 12, 2013 at 3:30 AM, Ralph Castain <r...@open-mpi.org> wrote:

> Great! Would you mind showing the revised table? I'm curious as to the
> relative performance.
>
>
> On Jun 11, 2013, at 4:53 PM, eblo...@1scom.net wrote:
>
> > Problem solved. I had not configured with --with-mxm=/opt/mellanox/mxm, and
> > this location was not auto-detected. Once I rebuilt with this option,
> > everything worked fine, and it scaled better than MVAPICH out to 800 cores.
> > The MVAPICH configure log showed that it had found this component of the
> > OFED stack.
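> >
> > For anyone searching the archives, a minimal sketch of the rebuild (the
> > install prefix here is only illustrative, and the MXM path is whatever your
> > Mellanox stack actually uses):
> >
> >     ./configure --prefix=/opt/openmpi-1.6.4 --with-mxm=/opt/mellanox/mxm
> >     make all install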
> >
> > Ed
> >
> >
> >> If you run at 224 and things look okay, then I would suspect something in
> >> the upper-level switch that spans cabinets. At that point, I'd have to
> >> leave it to Mellanox to advise.
> >>
> >>
> >> On Jun 11, 2013, at 6:55 AM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>
> >>> I tried adding "-mca btl openib,sm,self" but it did not make any
> >>> difference.
> >>>
> >>> Jesus’ e-mail this morning has got me thinking. In our system, each
> >>> cabinet has 224 cores, so we cross into a different level of the system
> >>> architecture when we go beyond 224. I got an additional data point at 256
> >>> and found that performance is already falling off there. Perhaps I did
> >>> not build OpenMPI properly to support the Mellanox adapters used in the
> >>> backplane, or I need a configuration setting similar to FAQ #19 in the
> >>> Tuning/OpenFabrics section.
> >>>
> >>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> >>> Behalf Of Ralph Castain
> >>> Sent: Sunday, June 09, 2013 6:48 PM
> >>> To: Open MPI Users
> >>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance
> >>> problem
> >>>
> >>> Strange - it looks like a classic oversubscription behavior. Another
> >>> possibility is that it isn't using IB for some reason when extended to
> >>> the other nodes. What does your cmd line look like? Have you tried
> >>> adding "-mca btl openib,sm,self" just to ensure it doesn't use TCP for
> >>> some reason?
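> >>>
> >>> As a concrete example, a launch line with that setting in place might
> >>> look like the following (the executable and host file names are only
> >>> placeholders):
> >>>
> >>>     mpirun -np 320 --hostfile hosts --bind-to-core \
> >>>            -mca btl openib,sm,self ./a.out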
> >>>
> >>>
> >>> On Jun 9, 2013, at 4:31 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>>
> >>>
> >>> Correct. 20 nodes, dual-socket with 8 cores per socket on each node = 320.
> >>>
> >>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> >>> Behalf Of Ralph Castain
> >>> Sent: Sunday, June 09, 2013 6:18 PM
> >>> To: Open MPI Users
> >>> Subject: Re: [OMPI users] EXTERNAL: Re: Need advice on performance
> >>> problem
> >>>
> >>> So, just to be sure - when you run 320 "cores", you are running across
> >>> 20 nodes?
> >>>
> >>> Just want to ensure we are using "core" the same way - some people
> >>> confuse cores with hyperthreads.
> >>>
> >>> On Jun 9, 2013, at 3:50 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>>
> >>>
> >>>
> >>> 16. Dual-socket Xeon E5-2670.
> >>>
> >>> I am trying a larger model to see if the performance drop-off happens at
> >>> a different number of cores. I’m also running some intermediate core
> >>> counts to refine the curve a bit. In addition, I set mpi_show_mca_params
> >>> to "all" and, at the same time, btl_openib_use_eager_rdma to 1, just to
> >>> see if that does anything.
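> >>>
> >>> For reference, those parameters can be passed on the command line or set
> >>> in the per-user MCA parameter file; the two forms below are equivalent
> >>> sketches:
> >>>
> >>>     mpirun -mca mpi_show_mca_params all -mca btl_openib_use_eager_rdma 1 ...
> >>>
> >>>     # or in ~/.openmpi/mca-params.conf
> >>>     mpi_show_mca_params = all
> >>>     btl_openib_use_eager_rdma = 1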
> >>>
> >>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
> >>> Behalf Of Ralph Castain
> >>> Sent: Sunday, June 09, 2013 5:04 PM
> >>> To: Open MPI Users
> >>> Subject: EXTERNAL: Re: [OMPI users] Need advice on performance problem
> >>>
> >>> Looks to me like things are okay through 160, and then things fall apart
> >>> after that point. How many cores are on a node?
> >>>
> >>>
> >>> On Jun 9, 2013, at 1:59 PM, "Blosch, Edwin L" <edwin.l.blo...@lmco.com> wrote:
> >>>
> >>>
> >>>
> >>>
> >>> I’m having some trouble getting good scaling with OpenMPI 1.6.4 and I
> >>> don’t know where to start looking. This is an InfiniBand FDR network with
> >>> Sandy Bridge nodes. I am using affinity (--bind-to-core) but no other
> >>> options. As the number of cores goes up, the message sizes typically go
> >>> down. There seem to be lots of options in the FAQ, and I would welcome
> >>> any advice on where to start. All these timings are from a completely
> >>> empty system, except for me.
> >>>
> >>> Thanks
> >>>
> >>>
> >>> MPI        | # cores | Ave. Rate | Std. Dev. % | # timings | Speedup | Efficiency
> >>> ===================================================================================
> >>> MVAPICH    |      16 |    8.6783 |     0.995 % |         2 |  16.000 |     1.0000
> >>> MVAPICH    |      48 |    8.7665 |     1.937 % |         3 |  47.517 |     0.9899
> >>> MVAPICH    |      80 |    8.8900 |     2.291 % |         3 |  78.095 |     0.9762
> >>> MVAPICH    |     160 |    8.9897 |     2.409 % |         3 | 154.457 |     0.9654
> >>> MVAPICH    |     320 |    8.9780 |     2.801 % |         3 | 309.317 |     0.9666
> >>> MVAPICH    |     480 |    8.9704 |     2.316 % |         3 | 464.366 |     0.9674
> >>> MVAPICH    |     640 |    9.0792 |     1.138 % |         3 | 611.739 |     0.9558
> >>> MVAPICH    |     720 |    9.1328 |     1.052 % |         3 | 684.162 |     0.9502
> >>> MVAPICH    |     800 |    9.1945 |     0.773 % |         3 | 755.079 |     0.9438
> >>> OpenMPI    |      16 |    8.6743 |     2.335 % |         2 |  16.000 |     1.0000
> >>> OpenMPI    |      48 |    8.7826 |     1.605 % |         2 |  47.408 |     0.9877
> >>> OpenMPI    |      80 |    8.8861 |     0.120 % |         2 |  78.093 |     0.9762
> >>> OpenMPI    |     160 |    8.9774 |     0.785 % |         2 | 154.598 |     0.9662
> >>> OpenMPI    |     320 |   12.0585 |    16.950 % |         2 | 230.191 |     0.7193
> >>> OpenMPI    |     480 |   14.8330 |     1.300 % |         2 | 280.701 |     0.5848
> >>> OpenMPI    |     640 |   17.1723 |     2.577 % |         3 | 323.283 |     0.5051
> >>> OpenMPI    |     720 |   18.2153 |     2.798 % |         3 | 342.868 |     0.4762
> >>> OpenMPI    |     800 |   19.3603 |     2.254 % |         3 | 358.434 |     0.4480
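> >>>
> >>> (Note on the columns, inferred from the data itself: Speedup appears to
> >>> be computed as N * Rate(16) / Rate(N), and Efficiency as Speedup / N,
> >>> with each library's own 16-core run as the baseline; e.g. for OpenMPI at
> >>> 320 cores, 320 * 8.6743 / 12.0585 = 230.19.)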
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users