Not sure whether it is relevant in this case, but in January I spent nearly a week figuring out why the openib component was running very slowly with the newer Open MPI releases (it was the 2.x series at that time), and the culprit turned out to be the btl_openib_flags parameter. I used to set this parameter in earlier releases to get good performance on my cluster, but it led to absolutely disastrous performance with the new version. So if you have any parameters set, try to remove them completely and see whether this makes a difference.
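To see whether such a parameter is still set somewhere, a rough sketch (the prefix path is just an example, adjust to your own install):

ompi_info --param btl openib --level 9 | grep btl_openib_flags
grep btl_openib ~/.openmpi/mca-params.conf /home/rcohen/etc/openmpi-mca-params.conf 2>/dev/null
env | grep OMPI_MCA_

Anything reported there, or passed with --mca on the mpirun command line, is worth removing before re-testing.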

Edgar


On 3/23/2016 10:01 AM, Gilles Gouaillardet wrote:
Ronald,

out of curiosity, what kind of performance do you get with tcp and two nodes ?
e.g.
mpirun --mca btl tcp,vader,self ...

before that, you can
mpirun uptime
to ensure all your nodes are free
(e.g. no process was left running by another job)

you might also want to allocate your nodes exclusively (iirc, qsub -x) to avoid side effects
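for example, a minimal sketch (the task count and the xhpl binary are just placeholders for your own job):

mpirun uptime
mpirun --mca btl tcp,vader,self -np 16 ./xhpl

comparing that tcp number against the openib number on the same exclusive allocation tells you whether the interconnect layer is the problem at all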

Cheers,

Gilles

On Wednesday, March 23, 2016, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

    Ronald,

    first, can you make sure tm was built ?
    the easiest way is to
    configure --with-tm ...
    configure will fail if tm is not found
    if pbs/torque is not installed in a standard location, then you
    have to
    configure --with-tm=<dir>

    then you can omit -hostfile from your mpirun command line
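    for example, a rough sketch (/opt/torque is only a placeholder for wherever your
    torque headers and libraries are installed):

    ./configure --prefix=/home/rcohen --with-tm=/opt/torque
    ompi_info | grep ' tm '

    the second command should list the tm components (plm, ras) if the build picked them up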

    hpl is known to scale, assuming the problem size is big enough, you use an
    optimized blas, and the right number of openmp threads
    (e.g. if you run 8 tasks per node, then you can have up to 2 openmp
    threads per task, but if you use 8 or 16 threads, performance will be
    worse)
    first run xhpl on one node, and once you get about 80% of the peak
    performance, then you can run on two nodes.
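    for example, a sketch of a 2-node run with 8 tasks per node and 2 openmp threads
    per task (xhpl, the counts, and a threaded blas are assumptions about your setup):

    mpirun -np 16 --npernode 8 -x OMP_NUM_THREADS=2 ./xhpl

    and make sure the hpl problem size N is large enough to use most of the memory,
    otherwise communication will dominate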

    Cheers,

    Gilles

    On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:

        The configure line was simply:

         ./configure --prefix=/home/rcohen

        when I run:

        mpirun --mca btl self,vader,openib ...

        I get the same lousy results: 1.5 GFLOPS

        The output of the grep is:

        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15
        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15
        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15
        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15
        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15
        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15
        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15
        Cpus_allowed_list:      0-7
        Cpus_allowed_list:      8-15


        linpack (HPL) certainly is known to scale fine.

        I am running a standard benchmark--HPL--linpack.

        I think it is not the compiler, but I could try that.

        Ron




        ---
        Ron Cohen
        recoh...@gmail.com
        skypename: ronaldcohen
        twitter: @recohen3


        On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet
        <gilles.gouaillar...@gmail.com> wrote:
        > Ronald,
        >
        > the fix I mentioned landed in the v1.10 branch:
        > https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62
        >
        > can you please post your configure command line ?
        >
        > you can also try to
        > mpirun --mca btl self,vader,openib ...
        > to make sure your run will abort instead of falling back to tcp
        >
        > then you can
        > mpirun ... grep Cpus_allowed_list /proc/self/status
        > to confirm your tasks do not end up bound to the same cores when running on
        > two nodes.
        >
        > is your application known to scale on an infiniband network ?
        > or did you naively hope it would scale ?
        >
        > at first, I recommend you run a standard benchmark to make sure you get the
        > performance you expect from your infiniband network
        > (for example the IMB or OSU benchmarks)
        > and run this test in the same environment as your app (e.g. via a batch
        > manager if applicable)
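        > for example, a sketch of a point to point bandwidth check (osu_bw comes from the
        > OSU micro-benchmarks, which you would build separately):
        >
        > mpirun -np 2 --npernode 1 --mca btl self,openib ./osu_bw
        >
        > a 4X FDR link should report several GB/s for large messages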
        >
        > if you do not get the performance you expect, then I suggest you try the
        > stock gcc compiler shipped with your distro and see if it helps.
        >
        > Cheers,
        >
        > Gilles
        >
        > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
        >>
        >> Thank you! Here are the answers:
        >>
        >> I did not try a previous release of gcc.
        >> I built from a tarball.
        >> What should I do about the iirc issue--how should I check?
        >> Are there any flags I should be using for infiniband? Is this a
        >> problem with latency?
        >>
        >> Ron
        >>
        >>
        >> ---
        >> Ron Cohen
        >> recoh...@gmail.com
        >> skypename: ronaldcohen
        >> twitter: @recohen3
        >>
        >>
        >> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
        >> <gilles.gouaillar...@gmail.com> wrote:
        >> > Ronald,
        >> >
        >> > did you try to build openmpi with a previous gcc release ?
        >> > if yes, what about the performance ?
        >> >
        >> > did you build openmpi from a tarball or from git ?
        >> > if from git and without VPATH, then you need to
        >> > configure with --disable-debug
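        >> > e.g. a rough sketch (the prefix shown is only illustrative):
        >> >
        >> > ./configure --prefix=/home/rcohen --disable-debug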
        >> >
        >> > iirc, one issue was identified previously
        >> > (gcc optimization that prevents the memory wrapper from behaving as
        >> > expected) and I am not sure the fix landed in the v1.10 branch or master
        >> > ...
        >> >
        >> > thanks for the info about gcc 6.0.0
        >> > now that this is supported by a free compiler
        >> > (cray and intel already support it, but they are commercial
        >> > compilers),
        >> > I will resume my work on supporting this
        >> >
        >> > Cheers,
        >> >
        >> > Gilles
        >> >
        >> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
        >> >>
        >> >> I get 100 GFLOPS for 16 cores on one node, but 1 GFLOP running 8 cores
        >> >> on two nodes. It seems that quad-infiniband should do better than
        >> >> this. I built openmpi-1.10.2g with gcc version 6.0.0 20160317. Any
        >> >> ideas of what to do to get usable performance? Thank you!
        >> >>
        >> >> bstatus
        >> >> Infiniband device 'mlx4_0' port 1 status:
        >> >>         default gid:     fe80:0000:0000:0000:0002:c903:00ec:9301
        >> >>         base lid:        0x1
        >> >>         sm lid:          0x1
        >> >>         state:           4: ACTIVE
        >> >>         phys state:      5: LinkUp
        >> >>         rate:            56 Gb/sec (4X FDR)
        >> >>         link_layer:      InfiniBand
        >> >>
        >> >> Ron
        >> >> --
        >> >>
        >> >> Professor Dr. Ronald Cohen
        >> >> Ludwig Maximilians Universität
        >> >> Theresienstrasse 41 Room 207
        >> >> Department für Geo- und Umweltwissenschaften
        >> >> München
        >> >> 80333
        >> >> Deutschland
        >> >>
        >> >>
        >> >> ronald.co...@min.uni-muenchen.de
        >> >> skype: ronaldcohen
        >> >> +49 (0) 89 74567980
        >> >> ---
        >> >> Ronald Cohen
        >> >> Geophysical Laboratory
        >> >> Carnegie Institution
        >> >> 5251 Broad Branch Rd., N.W.
        >> >> Washington, D.C. 20015
        >> >> rco...@carnegiescience.edu
        >> >> office: 202-478-8937
        >> >> skype: ronaldcohen
        >> >> https://twitter.com/recohen3
        >> >> https://www.linkedin.com/profile/view?id=163327727
        >> >>
        >> >>
        >> >> ---
        >> >> Ron Cohen
        >> >> recoh...@gmail.com
        >> >> skypename: ronaldcohen
        >> >> twitter: @recohen3


--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335