Thanks again Gilles. Ahh, better yet - I wasn't familiar with the config file way to set these parameters... it'll be easy to bake this into my AMI so that I don't have to set them each time while waiting for the next Open MPI release.
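For reference, baking it into the AMI just amounts to dropping the two lines from your message below into the per-user MCA parameter file, e.g.:

    # $HOME/.openmpi/mca-params.conf (the system-wide openmpi-mca-params.conf works too)
    btl_tcp_sndbuf = 0
    btl_tcp_rcvbuf = 0

If I'm remembering the tool right, running ompi_info --param btl tcp --level 9 afterwards is a quick way to confirm the new values got picked up.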
Out of mostly laziness I try to keep to the formal releases rather than applying patches myself, but thanks for the link to it (the commit comments were useful to understand why this improved performance).

-Adam

On Mon, Jul 10, 2017 at 12:04 AM, Gilles Gouaillardet <gil...@rist.or.jp> wrote:
> Adam,
>
> Thanks for letting us know your performance issue has been resolved.
>
> yes, https://www.open-mpi.org/faq/?category=tcp is the best place to look for this kind of information.
>
> i will add a reference to these parameters. i will also ask folks at AWS if they have additional/other recommendations.
>
> note you have a few options before 2.1.2 (or 3.0.0) is released :
>
> - update your system wide config file (/.../etc/openmpi-mca-params.conf) or user config file ($HOME/.openmpi/mca-params.conf) and add the following lines
>   btl_tcp_sndbuf = 0
>   btl_tcp_rcvbuf = 0
>
> - add the following environment variable to your environment
>   export OMPI_MCA_btl_tcp_sndbuf=0
>   export OMPI_MCA_btl_tcp_rcvbuf=0
>
> - use Open MPI 2.0.3
>
> - last but not least, you can manually download and apply the patch available at
>   https://github.com/open-mpi/ompi/commit/b64fedf4f652cadc9bfc7c4693f9c1ef01dfb69f.patch
>
> Cheers,
>
> Gilles
>
> On 7/9/2017 11:04 PM, Adam Sylvester wrote:
>> Gilles,
>>
>> Thanks for the fast response!
>>
>> The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of these flags... with a little Googling, is https://www.open-mpi.org/faq/?category=tcp the best place to look for this kind of information and any other tweaks I may want to try (or if there's a better FAQ out there, please let me know)?
>>
>> There is only eth0 on my machines so nothing to tweak there (though good to know for the future). I also didn't see any improvement by specifying more sockets per instance. But, your initial suggestion had a major impact.
>>
>> In general I try to stay relatively up to date with my Open MPI version; I'll be extra motivated to upgrade to 2.1.2 so that I don't have to remember to set these --mca flags on the command line. :o)
>>
>> -Adam
>>
>> On Sun, Jul 9, 2017 at 9:26 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:
>>
>> Adam,
>>
>> at first, you need to change the default send and receive socket buffers :
>> mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
>> /* note this will be the default from Open MPI 2.1.2 */
>>
>> hopefully, that will be enough to greatly improve the bandwidth for large messages.
>>
>> generally speaking, i recommend you use the latest (e.g. Open MPI 2.1.1) available version
>>
>> how many interfaces can be used to communicate between hosts ?
>> if there is more than one (for example a slow and a fast one), you'd rather only use the fast one.
>> for example, if eth0 is the fast interface, that can be achieved with
>> mpirun --mca btl_tcp_if_include eth0 ...
>>
>> also, you might be able to achieve better results by using more than one socket on the fast interface.
>> for example, if you want to use 4 sockets per interface
>> mpirun --mca btl_tcp_links 4 ...
>>
>> Cheers,
>>
>> Gilles
>>
>> On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com> wrote:
>> > I am using Open MPI 2.1.0 on RHEL 7.
>> > My application has one unavoidable pinch point where a large amount of data needs to be transferred (about 8 GB of data needs to be both sent to and received from all other ranks), and I'm seeing worse performance than I would expect; this step has a major impact on my overall runtime. In the real application, I am using MPI_Alltoall() for this step, but for the purpose of a simple benchmark, I simplified to a single MPI_Send() / MPI_Recv() of a 2 GB buffer between two ranks.
>> >
>> > I'm running this in AWS with instances that have 10 Gbps connectivity in the same availability zone (according to tracepath, there are no hops between them) and MTU set to 8801 bytes. Doing a non-MPI benchmark of sending data directly over TCP between these two instances, I reliably get around 4 Gbps. Between these same two instances with MPI_Send() / MPI_Recv(), I reliably get around 2.4 Gbps. This seems like a major performance degradation for a single MPI operation.
>> >
>> > I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings. I'm connecting between instances via ssh and, I assume, using TCP for the actual network transfer (I'm not setting any special command-line or programmatic settings). The actual command I'm running is:
>> > mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
>> >
>> > Any advice on other things to test or compilation and/or runtime flags to set would be much appreciated!
>> > -Adam
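For reference, the simplified benchmark from my original message below was essentially this (a trimmed-down sketch from memory rather than the exact program; the datatype, buffer initialization, and timing/reporting details are illustrative):

    /* Minimal sketch of a 2 GB MPI_Send()/MPI_Recv() bandwidth test between
     * two ranks.  Hypothetical code, not the exact program from the original
     * post; the datatype and reporting are illustrative. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char** argv)
    {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        if (size < 2)
        {
            fprintf(stderr, "Run with at least 2 ranks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);
        }

        /* 512 Mi floats = 2 GiB; the count still fits in a signed int */
        const int count = 512 * 1024 * 1024;
        const size_t bytes = (size_t)count * sizeof(float);
        float* buf = malloc(bytes);
        if (buf == NULL)
        {
            fprintf(stderr, "malloc of %zu bytes failed\n", bytes);
            MPI_Abort(MPI_COMM_WORLD, 1);
        }
        memset(buf, 1, bytes);  /* touch the pages before timing */

        MPI_Barrier(MPI_COMM_WORLD);
        double start = MPI_Wtime();

        if (rank == 0)
            MPI_Send(buf, count, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, count, MPI_FLOAT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);

        double elapsed = MPI_Wtime() - start;
        if (rank == 1)
            printf("%zu bytes in %.3f s = %.2f Gb/s\n",
                   bytes, elapsed, bytes * 8.0 / elapsed / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Compile with mpicc and launch with the same mpirun command quoted above (one rank per host).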
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users