Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
So I want to thank you so much! My benchmark for my actual application went from 5052 seconds to 266 seconds with this simple fix! Ron --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Wed, Mar 23, 2016 at 11:00 AM, Ronald Cohen wrote: > Dear Gilles, > > --with-tm f

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
I don't have any parameters set other than the defaults--thank you! Ron --- Ron Cohen recoh...@gmail.com skypename: ronaldcohen twitter: @recohen3 On Wed, Mar 23, 2016 at 11:07 AM, Edgar Gabriel wrote: > not sure whether it is relevant in this case, but I spent in January nearly > one week to
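A quick way to confirm that only defaults are in effect is to check the usual MCA parameter sources (a sketch; the /home/rcohen prefix is taken from later in the thread):
  grep -v '^#' $HOME/.openmpi/mca-params.conf 2>/dev/null          # per-user MCA settings, if any
  grep -v '^#' /home/rcohen/etc/openmpi-mca-params.conf 2>/dev/null # settings shipped with the install prefix
  env | grep OMPI_MCA_                                             # parameters set through the environment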

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Edgar Gabriel
not sure whether it is relevant in this case, but I spent nearly a week in January figuring out why the openib component was running very slow with the new Open MPI releases (though it was the 2.x series at that time), and the culprit turned out to be the btl_openib_flags parameter. I used to
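For reference, a sketch of how btl_openib_flags can be inspected and overridden at run time (the numeric value below is purely illustrative, not a recommendation from this thread):
  ompi_info --param btl openib --level 9 | grep btl_openib_flags     # show the current default and its description
  mpirun --mca btl_openib_flags 310 --mca btl self,vader,openib -n 16 ./xhpl   # override it for a single run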

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald, out of curiosity, what kind of performance do you get with tcp and two nodes ? e.g. mpirun --mca btl tcp,vader,self ... before that, you can mpirun uptime to ensure all your nodes are free (e.g. no process was left running by another job) you might also want to allocate your nodes exclusive
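A sketch of the two checks being suggested, with the btl parameter name spelled out (the hostfile usage is assumed from earlier messages in the thread):
  mpirun -hostfile $PBS_NODEFILE -n 16 uptime                              # load averages should be near zero on idle nodes
  mpirun --mca btl tcp,vader,self -hostfile $PBS_NODEFILE -n 16 ./xhpl     # same run, forced over tcp instead of openib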

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
Dear Gilles, --with-tm fails. I have now built with:
./configure --prefix=/home/rcohen --with-tm=/opt/torque
make clean
make -j 8
make install
This rebuild greatly improved performance, from 1 GF to 32 GF for 2 nodes for a matrix of size 2000. For 5000 it went up to 108 GF. So this sounds pretty good.

[OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald, first, can you make sure tm was built ? the easiest way is to configure --with-tm ... it will crash if tm is not found. if pbs/torque is not installed in a standard location, then you have to configure --with-tm=<path to your torque install>. then you can omit -hostfile from your mpirun command line. hpl is known to sc
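A minimal sketch of that build, using the prefix and torque path that appear later in the thread:
  ./configure --prefix=/home/rcohen --with-tm=/opt/torque   # configure aborts if tm support cannot be built
  make -j 8 && make install
  # with tm support, mpirun reads the torque allocation directly, so -hostfile can be dropped:
  mpirun -n 16 ./xhpl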

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
The configure line was simply:
./configure --prefix=/home/rcohen
when I run:
mpirun --mca btl self,vader,openib ...
I get the same lousy results: 1.5 GFLOPS. The output of the grep is:
Cpus_allowed_list: 0-7
Cpus_allowed_list: 8-15
Cpus_allowed_list: 0-7
Cpus_allowed_list:
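The Cpus_allowed_list output above can be reproduced with something like the following (a sketch; --report-bindings gives the same information from Open MPI itself):
  mpirun -hostfile $PBS_NODEFILE -n 16 grep Cpus_allowed_list /proc/self/status
  mpirun --report-bindings -hostfile $PBS_NODEFILE -n 16 ./xhpl   # prints each rank's binding to stderr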

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
I have tried:
mpirun --mca btl openib,self -hostfile $PBS_NODEFILE -n 16 xhpl > xhpl.out
and
mpirun -hostfile $PBS_NODEFILE -n 16 xhpl > xhpl.out
How do I run "sanity checks, like OSU latency and bandwidth benchmarks between the nodes"? I am not superuser. Thanks, Ron --- Ron Cohen recoh.
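The OSU micro-benchmarks do not need root; a sketch of building and running them from a home directory (the tarball version is omitted, and the install layout shown matches recent OSU releases but may differ):
  tar xzf osu-micro-benchmarks-<version>.tar.gz
  cd osu-micro-benchmarks-<version>
  ./configure CC=mpicc CXX=mpicxx --prefix=$HOME/osu
  make && make install
  # one rank per node so the traffic actually crosses infiniband
  mpirun -hostfile $PBS_NODEFILE -n 2 --map-by node $HOME/osu/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_latency
  mpirun -hostfile $PBS_NODEFILE -n 2 --map-by node $HOME/osu/libexec/osu-micro-benchmarks/mpi/pt2pt/osu_bw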

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald, the fix I mentioned landed in the v1.10 branch: https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62 can you please post your configure command line ? you can also try to mpirun --mca btl self,vader,openib ... to make sure your run will abort instead
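To make sure the rebuilt library is the one actually being used before re-running (a sketch):
  which mpirun ompi_info        # both should resolve under the new --prefix
  mpirun --version              # confirm the 1.10.x build in question
  mpirun --mca btl self,vader,openib -hostfile $PBS_NODEFILE -n 16 ./xhpl   # aborts rather than silently falling back to tcp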

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Joshua Ladd
Hi, Ron Please include the command line you used in your tests. Have you run any sanity checks, like OSU latency and bandwidth benchmarks between the nodes? Josh On Wed, Mar 23, 2016 at 8:47 AM, Ronald Cohen wrote: > Thank you! Here are the answers: > > I did not try a previous release of gcc

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
Thank you! Here are the answers: I did not try a previous release of gcc. I built from a tarball. What should I do about the iirc issue--how should I check? Are there any flags I should be using for infiniband? Is this a problem with latency? Ron --- Ron Cohen recoh...@gmail.com skypename: ron

Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Gilles Gouaillardet
Ronald, did you try to build openmpi with a previous gcc release ? if yes, what about the performance ? did you build openmpi from a tarball or from git ? if from git and without VPATH, then you need to configure with --disable-debug. iirc, one issue was identified previously (gcc optimization th
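A sketch of how to tell whether an existing installation was built with debugging enabled, and of the reconfigure step for a git build (tarball builds default to optimized):
  ompi_info | grep -i 'debug support'       # "Internal debug support: no" indicates an optimized build
  ./configure --prefix=/home/rcohen --disable-debug ...   # only needed when building from git without VPATH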

Re: [OMPI users] terrible infiniband performance for HPL, & gfortran message

2016-03-23 Thread Ronald Cohen
Attached is the output of ompi_info --all. Note that the message:
Fort use mpi_f08: yes
Fort mpi_f08 compliance: The mpi_f08 module is available, but due to limitations in the gfortran compiler, does not support the following: array subsections, direct passthru (where possible) to under

[OMPI users] terrible infiniband performance for

2016-03-23 Thread Ronald Cohen
I get 100 GFLOPS for 16 cores on one node, but 1 GFLOPS running 8 cores on two nodes. It seems that quad-rate (QDR) infiniband should do better than this. I built openmpi-1.10.2g with gcc version 6.0.0 20160317. Any ideas of what to do to get usable performance? Thank you!
ibstatus
Infiniband device 'mlx4_0'
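Beyond ibstatus, the link state and negotiated rate can be checked without root using the standard verbs/diag utilities, if they are installed (a sketch):
  ibstat                            # per-port State, Physical state, and Rate (40 for a QDR 4X link)
  ibv_devinfo | grep state          # should report PORT_ACTIVE for the port in use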