not sure whether it is relevant in this case, but in January I spent
nearly a week figuring out why the openib component was running very
slowly with the new Open MPI releases (though it was the 2.x series at
that time), and the culprit turned out to be the
btl_openib_flags parameter. I used to set this parameter in earlier
releases to get good performance on my cluster, but it led to
absolutely disastrous performance with the new version. So if you have
any parameters set, try removing them completely and see whether that
makes a difference.
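To see where such parameters might be coming from, a minimal sketch like this can help; the file content below is a hypothetical sample, not taken from this thread, and the paths follow the usual Open MPI conventions:

```shell
# MCA parameters can be set per-user in ~/.openmpi/mca-params.conf,
# system-wide in $prefix/etc/openmpi-mca-params.conf, or via OMPI_MCA_*
# environment variables. This sketch scans a sample file for settings.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# sample mca-params.conf (hypothetical content)
btl_openib_flags = 1
EOF
# print the non-comment settings -- candidates to remove one by one
grep -Ev '^[[:space:]]*(#|$)' "$conf"
# environment overrides would show up as OMPI_MCA_<param>
env | grep '^OMPI_MCA_' || true
rm -f "$conf"
```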
Edgar
On 3/23/2016 10:01 AM, Gilles Gouaillardet wrote:
Ronald,
out of curiosity, what kind of performance do you get with tcp and two
nodes ?
e.g.
mpirun --mca btl tcp,vader,self ...
before that, you can
mpirun uptime
to ensure all your nodes are free
(e.g. no process was left running by another job)
you might also want to allocate your nodes exclusively (iirc, qsub -x)
to avoid side effects
Cheers,
Gilles
On Wednesday, March 23, 2016, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
Ronald,
first, can you make sure tm support was built?
the easiest way is to
configure --with-tm ...
configure will fail if tm is not found
if pbs/torque is not installed in a standard location, then you have to
configure --with-tm=<dir>
then you can omit -hostfile from your mpirun command line
hpl is known to scale, assuming the problem size is big enough, you use
an optimized BLAS, and the right number of openmp threads
(e.g. if you run 8 tasks per node, then you can have up to 2 openmp
threads per task, but if you use 8 or 16 threads, then performance will
be worse)
first run xhpl on one node, and once you get 80% of the peak
performance, then you can run on two nodes.
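for the 80% target, a back-of-the-envelope check can be sketched like this; the core count, clock speed, and FLOPs/cycle below are assumed example values, not measurements from this cluster:

```shell
# theoretical peak = cores * GHz * FLOPs/cycle; compare xhpl's result
# against it (all numbers below are illustrative assumptions)
cores=16; ghz=2.6; flops_per_cycle=8   # assumed AVX-class CPU
measured=100                           # GFLOPS reported by xhpl on one node
awk -v c="$cores" -v g="$ghz" -v f="$flops_per_cycle" -v m="$measured" \
    'BEGIN { p = c * g * f
             printf "peak=%.1f GFLOPS efficiency=%.0f%%\n", p, 100 * m / p }'
```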
Cheers,
Gilles
On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
The configure line was simply:
./configure --prefix=/home/rcohen
when I run:
mpirun --mca btl self,vader,openib ...
I get the same lousy results: 1.5 GFLOPS
The output of the grep is:
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
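A quick tally of such output shows how many tasks report each core range (the sample lines below just mirror a slice of the output above):

```shell
# count how many tasks report each Cpus_allowed_list range; many tasks
# stuck on one small range would explain poor multi-node performance
cat <<'EOF' | awk '{ n[$2]++ } END { for (r in n) print r, n[r] }' | sort
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
EOF
```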
Linpack (HPL) certainly is known to scale fine.
I am running a standard benchmark: HPL (Linpack).
I think it is not the compiler, but I could try that.
Ron
---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3
On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> Ronald,
>
> the fix I mentioned landed in the v1.10 branch
>
> https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62
>
> can you please post your configure command line ?
>
> you can also try to
> mpirun --mca btl self,vader,openib ...
> to make sure your run will abort instead of falling back to tcp
>
> then you can
> mpirun ... grep Cpus_allowed_list /proc/self/status
> to confirm your tasks do not end up bound to the same cores when running
> on two nodes.
>
> is your application known to scale on infiniband network ?
> or did you naively hope it would scale ?
>
> at first, I recommend you run a standard benchmark to make sure you get
> the performance you expect from your infiniband network
> (for example the IMB or OSU benchmarks)
> and run this test in the same environment as your app (e.g. via a batch
> manager if applicable)
>
> if you do not get the performance you expect, then I suggest you try the
> stock gcc compiler shipped with your distro and see if it helps.
>
> Cheers,
>
> Gilles
>
> On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
>>
>> Thank you! Here are the answers:
>>
>> I did not try a previous release of gcc.
>> I built from a tarball.
>> What should I do about the iirc issue--how should I check?
>> Are there any flags I should be using for infiniband? Is this a
>> problem with latency?
>>
>> Ron
>>
>>
>> ---
>> Ron Cohen
>> recoh...@gmail.com
>> skypename: ronaldcohen
>> twitter: @recohen3
>>
>>
>> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>> > Ronald,
>> >
>> > did you try to build openmpi with a previous gcc release ?
>> > if yes, what about the performance ?
>> >
>> > did you build openmpi from a tarball or from git ?
>> > if from git and without VPATH, then you need to
>> > configure with --disable-debug
>> >
>> > iirc, one issue was identified previously
>> > (a gcc optimization that prevents the memory wrapper from behaving as
>> > expected) and I am not sure the fix landed in the v1.10 branch or master
>> > ...
>> >
>> > thanks for the info about gcc 6.0.0
>> > now this is supported by a free compiler
>> > (cray and intel already support that, but they are commercial
>> > compilers),
>> > so I will resume my work on supporting this
>> >
>> > Cheers,
>> >
>> > Gilles
>> >
>> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
>> >>
>> >> I get 100 GFLOPS for 16 cores on one node, but 1 GFLOPS running 8 cores
>> >> on two nodes. It seems that 4X FDR infiniband should do better than
>> >> this. I built openmpi-1.10.2g with gcc version 6.0.0 20160317. Any
>> >> ideas of what to do to get usable performance? Thank you!
>> >>
>> >> bstatus
>> >> Infiniband device 'mlx4_0' port 1 status:
>> >> default gid: fe80:0000:0000:0000:0002:c903:00ec:9301
>> >> base lid: 0x1
>> >> sm lid: 0x1
>> >> state: 4: ACTIVE
>> >> phys state: 5: LinkUp
>> >> rate: 56 Gb/sec (4X FDR)
>> >> link_layer: InfiniBand
>> >>
>> >> Ron
>> >> --
>> >>
>> >> Professor Dr. Ronald Cohen
>> >> Ludwig Maximilians Universität
>> >> Theresienstrasse 41 Room 207
>> >> Department für Geo- und Umweltwissenschaften
>> >> München
>> >> 80333
>> >> Deutschland
>> >>
>> >>
>> >> ronald.co...@min.uni-muenchen.de
>> >> skype: ronaldcohen
>> >> +49 (0) 89 74567980
>> >> ---
>> >> Ronald Cohen
>> >> Geophysical Laboratory
>> >> Carnegie Institution
>> >> 5251 Broad Branch Rd., N.W.
>> >> Washington, D.C. 20015
>> >> rco...@carnegiescience.edu
>> >> office: 202-478-8937
>> >> skype: ronaldcohen
>> >> https://twitter.com/recohen3
>> >> https://www.linkedin.com/profile/view?id=163327727
>> >>
>> >>
>> >> _______________________________________________
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> Link to this post:
>> >> http://www.open-mpi.org/community/lists/users/2016/03/28791.php
--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab http://pstl.cs.uh.edu
Department of Computer Science University of Houston
Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335