So I want to thank you so much! My benchmark for my actual application
went from 5052 seconds to 266 seconds with this simple fix!

Ron

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3


On Wed, Mar 23, 2016 at 11:00 AM, Ronald Cohen <recoh...@gmail.com> wrote:
> Dear Gilles,
>
> --with-tm fails. I have now built with
> ./configure --prefix=/home/rcohen --with-tm=/opt/torque
> make clean
> make -j 8
> make install
>
> This rebuild greatly improved performance, from 1 GF to 32 GF on 2
> nodes for a matrix of size 2000.  For size 5000 it went up to 108 GF, so this
> sounds pretty good.
>
> Thank you so much! Is there a way to test and improve latency?
>
> Thanks!
>
> Ron
>
> ---
> Ron Cohen
> recoh...@gmail.com
> skypename: ronaldcohen
> twitter: @recohen3
>
>
> On Wed, Mar 23, 2016 at 10:38 AM, Gilles Gouaillardet
> <gilles.gouaillar...@gmail.com> wrote:
>> Ronald,
>>
>> first, can you make sure tm support was built ?
>> the easiest way is to
>> configure --with-tm ...
>> configure will fail if tm is not found
>> if pbs/torque is not installed in a standard location, then you have to
>> configure --with-tm=<dir>
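>>
>> (as a quick sanity check, assuming the ompi_info from this install is on your PATH, something like
>> ompi_info | grep tm
>> should list the tm components (e.g. plm and ras) if torque support was built in)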
>>
>> then you can omit -hostfile from your mpirun command line
>>
>> hpl is known to scale, assuming the problem size is big enough, you use an optimized
>> blas, and the right number of openmp threads
>> (e.g. if you run 8 tasks per node, then you can have up to 2 openmp threads,
>> but if you use 8 or 16 threads, then performance will be worse)
>> first run xhpl on one node, and once you get 80% of the peak performance, then
>> you can run on two nodes.
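>>
>> for example, with 8 tasks on a 16-core node, a minimal sketch (assuming a bash shell,
>> xhpl in the current directory, and a threaded blas that honours OMP_NUM_THREADS) would be
>> export OMP_NUM_THREADS=2
>> mpirun -np 8 -x OMP_NUM_THREADS ./xhpl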
>>
>> Cheers,
>>
>> Gilles
>>
>> On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
>>>
>>> The configure line was simply:
>>>
>>>  ./configure --prefix=/home/rcohen
>>>
>>> when I run:
>>>
>>> mpirun --mca btl self,vader,openib ...
>>>
>>> I get the same lousy results: 1.5 GFLOPS
>>>
>>> The output of the grep is:
>>>
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>> Cpus_allowed_list:      0-7
>>> Cpus_allowed_list:      8-15
>>>
>>>
>>> linpack (HPL) certainly is known to scale fine.
>>>
>>> I am running a standard benchmark, HPL (linpack).
>>>
>>> I think it is not the compiler, but I could try that.
>>>
>>> Ron
>>>
>>>
>>>
>>>
>>> ---
>>> Ron Cohen
>>> recoh...@gmail.com
>>> skypename: ronaldcohen
>>> twitter: @recohen3
>>>
>>>
>>> On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet
>>> <gilles.gouaillar...@gmail.com> wrote:
>>> > Ronald,
>>> >
>>> > the fix I mentioned landed in the v1.10 branch
>>> >
>>> > https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62
>>> >
>>> > can you please post your configure command line ?
>>> >
>>> > you can also try to
>>> > mpirun --mca btl self,vader,openib ...
>>> > to make sure your run will abort instead of falling back to tcp
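>>> >
>>> > (alternatively, you should also be able to exclude just the tcp btl:
>>> > mpirun --mca btl ^tcp ...
>>> > the ^ prefix tells open mpi to use every btl except the listed ones)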
>>> >
>>> > then you can
>>> > mpirun ... grep Cpus_allowed_list /proc/self/status
>>> > to confirm your tasks do not end up bound to the same cores when running
>>> > on
>>> > two nodes.
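>>> >
>>> > (as an illustration, for a 2-node run with 8 tasks per node that could look like
>>> > mpirun -np 16 -npernode 8 grep Cpus_allowed_list /proc/self/status
>>> > adjust -np / -npernode to your actual job; adding --report-bindings to mpirun
>>> > should also make open mpi print each rank's binding)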
>>> >
>>> > is your application known to scale on an infiniband network ?
>>> > or did you naively hope it would scale ?
>>> >
>>> > first, I recommend you run a standard benchmark to make sure you get the
>>> > performance you expect from your infiniband network
>>> > (for example the IMB or OSU benchmarks)
>>> > and run this test in the same environment as your app (e.g. via a batch
>>> > manager if applicable)
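>>> >
>>> > (for instance, with the OSU micro-benchmarks compiled against this open mpi,
>>> > checking latency and bandwidth between the two nodes might look like
>>> > mpirun -np 2 --map-by node ./osu_latency
>>> > mpirun -np 2 --map-by node ./osu_bw
>>> > where ./osu_latency and ./osu_bw stand for wherever the benchmark binaries end up)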
>>> >
>>> > if you do not get the performance you expect, then I suggest you try the
>>> > stock gcc compiler shipped with your distro and see if it helps.
>>> >
>>> > Cheers,
>>> >
>>> > Gilles
>>> >
>>> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
>>> >>
>>> >> Thank  you! Here are the answers:
>>> >>
>>> >> I did not try a previous release of gcc.
>>> >> I built from a tarball.
>>> >> What should I do about the memory wrapper issue you mentioned--how should I check for it?
>>> >> Are there any flags I should be using for infiniband? Is this a
>>> >> problem with latency?
>>> >>
>>> >> Ron
>>> >>
>>> >>
>>> >> ---
>>> >> Ron Cohen
>>> >> recoh...@gmail.com
>>> >> skypename: ronaldcohen
>>> >> twitter: @recohen3
>>> >>
>>> >>
>>> >> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
>>> >> <gilles.gouaillar...@gmail.com> wrote:
>>> >> > Ronald,
>>> >> >
>>> >> > did you try to build openmpi with a previous gcc release ?
>>> >> > if yes, what about the performance ?
>>> >> >
>>> >> > did you build openmpi from a tarball or from git ?
>>> >> > if from git and without VPATH, then you need to
>>> >> > configure with --disable-debug
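>>> >> >
>>> >> > (i.e. something along the lines of
>>> >> > ./configure --prefix=/home/rcohen --disable-debug
>>> >> > reusing the prefix from your existing build)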
>>> >> >
>>> >> > iirc, one issue was identified previously
>>> >> > (a gcc optimization that prevents the memory wrapper from behaving as
>>> >> > expected) and I am not sure the fix landed in the v1.10 branch or master
>>> >> > ...
>>> >> >
>>> >> > thanks for the info about gcc 6.0.0
>>> >> > now that this is supported by a free compiler
>>> >> > (cray and intel already support it, but they are commercial
>>> >> > compilers),
>>> >> > I will resume my work on supporting it
>>> >> >
>>> >> > Cheers,
>>> >> >
>>> >> > Gilles
>>> >> >
>>> >> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com>
>>> >> > wrote:
>>> >> >>
>>> >> >> I get 100 GFLOPS for 16 cores on one node, but 1 GFLOPS running 8 cores
>>> >> >> on two nodes. It seems that 4X FDR infiniband should do better than
>>> >> >> this. I built openmpi-1.10.2g with gcc version 6.0.0 20160317. Any
>>> >> >> ideas of what to do to get usable performance? Thank you!
>>> >> >>
>>> >> >> bstatus
>>> >> >> Infiniband device 'mlx4_0' port 1 status:
>>> >> >>         default gid:     fe80:0000:0000:0000:0002:c903:00ec:9301
>>> >> >>         base lid:        0x1
>>> >> >>         sm lid:          0x1
>>> >> >>         state:           4: ACTIVE
>>> >> >>         phys state:      5: LinkUp
>>> >> >>         rate:            56 Gb/sec (4X FDR)
>>> >> >>         link_layer:      InfiniBand
>>> >> >>
>>> >> >> Ron
>>> >> >> --
>>> >> >>
>>> >> >> Professor Dr. Ronald Cohen
>>> >> >> Ludwig Maximilians Universität
>>> >> >> Theresienstrasse 41 Room 207
>>> >> >> Department für Geo- und Umweltwissenschaften
>>> >> >> München
>>> >> >> 80333
>>> >> >> Deutschland
>>> >> >>
>>> >> >>
>>> >> >> ronald.co...@min.uni-muenchen.de
>>> >> >> skype: ronaldcohen
>>> >> >> +49 (0) 89 74567980
>>> >> >> ---
>>> >> >> Ronald Cohen
>>> >> >> Geophysical Laboratory
>>> >> >> Carnegie Institution
>>> >> >> 5251 Broad Branch Rd., N.W.
>>> >> >> Washington, D.C. 20015
>>> >> >> rco...@carnegiescience.edu
>>> >> >> office: 202-478-8937
>>> >> >> skype: ronaldcohen
>>> >> >> https://twitter.com/recohen3
>>> >> >> https://www.linkedin.com/profile/view?id=163327727
>>> >> >>
>>> >> >>
>>> >> >> ---
>>> >> >> Ron Cohen
>>> >> >> recoh...@gmail.com
>>> >> >> skypename: ronaldcohen
>>> >> >> twitter: @recohen3