Dear Gilles,

--with-tm fails. I have now built with:

  ./configure --prefix=/home/rcohen --with-tm=/opt/torque
  make clean
  make -j 8
  make install
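One quick way to confirm that tm support actually made it into the new build is to ask `ompi_info` which components were compiled in. A sketch, assuming the `ompi_info` from the new prefix is the one on the PATH:

```shell
# List the built MCA components and look for the Torque/PBS (tm) ones.
# If --with-tm took effect, tm-based launcher/allocator components
# (e.g. "MCA plm: tm" and "MCA ras: tm") should show up here.
/home/rcohen/bin/ompi_info | grep -i ": tm"
```

If the grep prints nothing, the build silently fell back to a non-tm configuration and mpirun will not pick up the Torque allocation automatically.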
This rebuild greatly improved performance, from 1 GF to 32 GF on 2 nodes
for a matrix of size 2000. For size 5000 it went up to 108 GF. So this
sounds pretty good. Thank you so much! Is there a way to test and improve
latency?

Thanks!

Ron

---
Ron Cohen
recoh...@gmail.com
skypename: ronaldcohen
twitter: @recohen3

On Wed, Mar 23, 2016 at 10:38 AM, Gilles Gouaillardet
<gilles.gouaillar...@gmail.com> wrote:
> Ronald,
>
> first, can you make sure tm was built ?
> the easiest way is to
> configure --with-tm ...
> it will crash if tm is not found
> if pbs/torque is not installed in a standard location, then you have to
> configure --with-tm=<dir>
>
> then you can omit -hostfile from your mpirun command line
>
> hpl is known to scale, assuming the data is big enough, you use an
> optimized blas, and the right number of openmp threads
> (e.g. if you run 8 tasks per node, then you can have up to 2 openmp
> threads, but if you use 8 or 16 threads, then performance will be worse)
> first run xhpl on one node, and when you get 80% of the peak
> performance, then you can run on two nodes.
>
> Cheers,
>
> Gilles
>
> On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
>>
>> The configure line was simply:
>>
>> ./configure --prefix=/home/rcohen
>>
>> When I run:
>>
>> mpirun --mca btl self,vader,openib ...
>>
>> I get the same lousy results: 1.5 GFLOPS
>>
>> The output of the grep is:
>>
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>> Cpus_allowed_list: 0-7
>> Cpus_allowed_list: 8-15
>>
>> Linpack (HPL) certainly is known to scale fine.
>>
>> I am running a standard benchmark, HPL (Linpack).
>>
>> I think it is not the compiler, but I could try that.
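Gilles's 80%-of-peak target above can be estimated from the node's specs before comparing against xhpl output. The core count, clock, and FLOPs-per-cycle below are placeholder values, not the actual specs of these nodes; substitute the real numbers:

```shell
# Theoretical peak (GFLOPS) = cores x MHz x double-precision FLOPs/cycle / 1000.
# Placeholder values: 16 cores, 2.6 GHz, 8 FLOPs/cycle (AVX with FMA) --
# adjust to match the real hardware.
CORES=16
MHZ=2600
FLOPS_PER_CYCLE=8
PEAK=$(( CORES * MHZ * FLOPS_PER_CYCLE / 1000 ))   # GFLOPS
TARGET=$(( PEAK * 80 / 100 ))                      # 80% of peak
echo "peak: ${PEAK} GFLOPS; aim for xhpl >= ${TARGET} GFLOPS on one node"
```

With these made-up specs the one-node target would be 265 GFLOPS; the real threshold depends entirely on the actual CPU model.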
>>
>> Ron
>>
>> On Wed, Mar 23, 2016 at 9:32 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>> > Ronald,
>> >
>> > the fix I mentioned landed in the v1.10 branch
>> >
>> > https://github.com/open-mpi/ompi-release/commit/c376994b81030cfa380c29d5b8f60c3e53d3df62
>> >
>> > can you please post your configure command line ?
>> >
>> > you can also try to
>> > mpirun --mca btl self,vader,openib ...
>> > to make sure your run will abort instead of falling back to tcp
>> >
>> > then you can
>> > mpirun ... grep Cpus_allowed_list /proc/self/status
>> > to confirm your tasks do not end up bound to the same cores when
>> > running on two nodes.
>> >
>> > is your application known to scale on an infiniband network ?
>> > or did you naively hope it would scale ?
>> >
>> > at first, I recommend you run a standard benchmark to make sure you
>> > get the performance you expect from your infiniband network
>> > (for example the IMB or OSU benchmarks)
>> > and run this test in the same environment as your app (e.g. via a
>> > batch manager if applicable)
>> >
>> > if you do not get the performance you expect, then I suggest you try
>> > the stock gcc compiler shipped with your distro and see if it helps.
>> >
>> > Cheers,
>> >
>> > Gilles
>> >
>> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com> wrote:
>> >>
>> >> Thank you! Here are the answers:
>> >>
>> >> I did not try a previous release of gcc.
>> >> I built from a tarball.
>> >> What should I do about the iirc issue--how should I check?
>> >> Are there any flags I should be using for infiniband? Is this a
>> >> problem with latency?
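The OSU micro-benchmarks mentioned above are a common way to measure point-to-point latency and bandwidth directly. A sketch, assuming the benchmarks are already built and two nodes are allocated; paths and binary locations will differ on this system:

```shell
# Place one task on each node so messages actually cross the InfiniBand
# fabric, and forbid the tcp btl so a fallback fails loudly instead of
# silently running slow.
mpirun -np 2 --map-by node --mca btl self,vader,openib ./osu_latency
mpirun -np 2 --map-by node --mca btl self,vader,openib ./osu_bw
```

FDR InfiniBand typically shows small-message MPI latencies on the order of a microsecond or two; results far above that (or bandwidth far below the link rate) suggest traffic is taking the wrong path.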
>> >>
>> >> Ron
>> >>
>> >> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
>> >> <gilles.gouaillar...@gmail.com> wrote:
>> >> > Ronald,
>> >> >
>> >> > did you try to build openmpi with a previous gcc release ?
>> >> > if yes, what about the performance ?
>> >> >
>> >> > did you build openmpi from a tarball or from git ?
>> >> > if from git and without VPATH, then you need to
>> >> > configure with --disable-debug
>> >> >
>> >> > iirc, one issue was identified previously
>> >> > (a gcc optimization that prevents the memory wrapper from behaving
>> >> > as expected) and I am not sure the fix landed in the v1.10 branch
>> >> > or master ...
>> >> >
>> >> > thanks for the info about gcc 6.0.0
>> >> > now that this is supported by a free compiler
>> >> > (cray and intel already support it, but they are commercial
>> >> > compilers), I will resume my work on supporting this
>> >> >
>> >> > Cheers,
>> >> >
>> >> > Gilles
>> >> >
>> >> > On Wednesday, March 23, 2016, Ronald Cohen <recoh...@gmail.com>
>> >> > wrote:
>> >> >>
>> >> >> I get 100 GFLOPS for 16 cores on one node, but 1 GFLOP running 8
>> >> >> cores on two nodes. It seems that quad-infiniband should do better
>> >> >> than this. I built openmpi-1.10.2g with gcc version 6.0.0
>> >> >> 20160317. Any ideas of what to do to get usable performance?
>> >> >> Thank you!
>> >> >>
>> >> >> ibstatus
>> >> >> Infiniband device 'mlx4_0' port 1 status:
>> >> >>         default gid:     fe80:0000:0000:0000:0002:c903:00ec:9301
>> >> >>         base lid:        0x1
>> >> >>         sm lid:          0x1
>> >> >>         state:           4: ACTIVE
>> >> >>         phys state:      5: LinkUp
>> >> >>         rate:            56 Gb/sec (4X FDR)
>> >> >>         link_layer:      InfiniBand
>> >> >>
>> >> >> Ron
>> >> >> --
>> >> >>
>> >> >> Professor Dr.
>> >> >> Ronald Cohen
>> >> >> Ludwig Maximilians Universität
>> >> >> Theresienstrasse 41, Room 207
>> >> >> Department für Geo- und Umweltwissenschaften
>> >> >> 80333 München
>> >> >> Deutschland
>> >> >>
>> >> >> ronald.co...@min.uni-muenchen.de
>> >> >> skype: ronaldcohen
>> >> >> +49 (0) 89 74567980
>> >> >> ---
>> >> >> Ronald Cohen
>> >> >> Geophysical Laboratory
>> >> >> Carnegie Institution
>> >> >> 5251 Broad Branch Rd., N.W.
>> >> >> Washington, D.C. 20015
>> >> >> rco...@carnegiescience.edu
>> >> >> office: 202-478-8937
>> >> >> skype: ronaldcohen
>> >> >> https://twitter.com/recohen3
>> >> >> https://www.linkedin.com/profile/view?id=163327727
>> >> >>
>> >> >> _______________________________________________
>> >> >> users mailing list
>> >> >> us...@open-mpi.org
>> >> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> >> Link to this post:
>> >> >> http://www.open-mpi.org/community/lists/users/2016/03/28791.php