Thanks again, Gilles.  Ah, better yet - I wasn't familiar with the config-file
way of setting these parameters... it'll be easy to bake this into my AMI
so that I don't have to set them each time while waiting for the next Open
MPI release.
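
For example, something along these lines in the AMI build script should do it
(assuming /opt/openmpi is the install prefix - adjust to wherever Open MPI is
actually installed, and run as root or with sudo as appropriate):

  # append the new default socket buffer settings to the system-wide
  # MCA parameter file
  echo 'btl_tcp_sndbuf = 0' >> /opt/openmpi/etc/openmpi-mca-params.conf
  echo 'btl_tcp_rcvbuf = 0' >> /opt/openmpi/etc/openmpi-mca-params.conf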

Mostly out of laziness, I try to stick to the formal releases rather than
applying patches myself, but thanks for the link (the commit comments were
useful for understanding why this change improves performance).

-Adam

On Mon, Jul 10, 2017 at 12:04 AM, Gilles Gouaillardet <gil...@rist.or.jp>
wrote:

> Adam,
>
>
> Thanks for letting us know your performance issue has been resolved.
>
>
> yes, https://www.open-mpi.org/faq/?category=tcp is the best place to look
> for this kind of information.
>
> I will add a reference to these parameters, and I will also ask the folks
> at AWS if they have any additional recommendations.
>
>
> note that you have a few options before 2.1.2 (or 3.0.0) is released:
>
>
> - update your system-wide config file (/.../etc/openmpi-mca-params.conf)
>   or your user config file ($HOME/.openmpi/mca-params.conf) and add the
>   following lines:
>
>   btl_tcp_sndbuf = 0
>   btl_tcp_rcvbuf = 0
>
>
> - add the following environment variables to your environment:
>
>   export OMPI_MCA_btl_tcp_sndbuf=0
>   export OMPI_MCA_btl_tcp_rcvbuf=0
>
>
> - use Open MPI 2.0.3
>
>
> - last but not least, you can manually download and apply the patch
>   available at
>   https://github.com/open-mpi/ompi/commit/b64fedf4f652cadc9bfc7c4693f9c1ef01dfb69f.patch
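>
>   for example, roughly (the directory name is just an illustration, use
>   wherever you unpacked the Open MPI source):
>
>   cd openmpi-2.1.0    # the unpacked Open MPI source tree
>   curl -LO https://github.com/open-mpi/ompi/commit/b64fedf4f652cadc9bfc7c4693f9c1ef01dfb69f.patch
>   patch -p1 < b64fedf4f652cadc9bfc7c4693f9c1ef01dfb69f.patch
>   # then rebuild and reinstall as usual (configure / make / make install)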
>
>
> Cheers,
>
> Gilles
>
> On 7/9/2017 11:04 PM, Adam Sylvester wrote:
>
>> Gilles,
>>
>> Thanks for the fast response!
>>
>> The --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 flags you recommended
>> made a huge difference - this got me up to 5.7 Gb/s! I wasn't aware of
>> these flags... with a little Googling, is
>> https://www.open-mpi.org/faq/?category=tcp the best place to look for this
>> kind of information and any other tweaks I may want to try (or if there's
>> a better FAQ out there, please let me know)?
>>
>> There is only eth0 on my machines so nothing to tweak there (though good
>> to know for the future). I also didn't see any improvement by specifying
>> more sockets per instance. But your initial suggestion had a major impact.
>>
>> In general I try to stay relatively up to date with my Open MPI version;
>> I'll be extra motivated to upgrade to 2.1.2 so that I don't have to
>> remember to set these --mca flags on the command line. :o)
>>
>> -Adam
>>
>> On Sun, Jul 9, 2017 at 9:26 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>>
>>     Adam,
>>
>>     first, you need to change the default send and receive socket
>>     buffers:
>>     mpirun --mca btl_tcp_sndbuf 0 --mca btl_tcp_rcvbuf 0 ...
>>     /* note this will be the default from Open MPI 2.1.2 */
>>
>>     hopefully, that will be enough to greatly improve the bandwidth for
>>     large messages.
>>
>>
>>     generally speaking, I recommend you use the latest available version
>>     (e.g. Open MPI 2.1.1).
>>
>>     how many interfaces can be used to communicate between hosts?
>>     if there is more than one (for example a slow one and a fast one),
>>     you should only use the fast one.
>>     for example, if eth0 is the fast interface, that can be achieved with
>>     mpirun --mca btl_tcp_if_include eth0 ...
>>
>>     also, you might be able to achieve better results by using more than
>>     one socket on the fast interface.
>>     for example, if you want to use 4 sockets per interface
>>     mpirun --mca btl_tcp_links 4 ...
>>
>>
>>
>>     Cheers,
>>
>>     Gilles
>>
>>     On Sun, Jul 9, 2017 at 10:10 PM, Adam Sylvester <op8...@gmail.com>
>>     wrote:
>>     > I am using Open MPI 2.1.0 on RHEL 7.  My application has one
>>     > unavoidable pinch point where a large amount of data needs to be
>>     > transferred (about 8 GB of data needs to be both sent to and
>>     > received from all other ranks), and I'm seeing worse performance
>>     > than I would expect; this step has a major impact on my overall
>>     > runtime.  In the real application, I am using MPI_Alltoall() for
>>     > this step, but for the purpose of a simple benchmark, I simplified
>>     > it to a single MPI_Send() / MPI_Recv() of a 2 GB buffer between
>>     > two ranks.
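>>     >
>>     > For reference, the benchmark is essentially the following (a minimal
>>     > sketch rather than the exact code; the datatype and exact timing are
>>     > illustrative):
>>     >
>>     > #include <mpi.h>
>>     > #include <stdio.h>
>>     > #include <stdlib.h>
>>     >
>>     > int main(int argc, char **argv) {
>>     >     MPI_Init(&argc, &argv);
>>     >     int rank;
>>     >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     >
>>     >     /* 2 GB expressed as doubles so the element count fits in an int */
>>     >     const int count = 256 * 1024 * 1024;
>>     >     double *buf = malloc((size_t)count * sizeof(double));
>>     >
>>     >     MPI_Barrier(MPI_COMM_WORLD);
>>     >     double t0 = MPI_Wtime();
>>     >     if (rank == 0)
>>     >         MPI_Send(buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
>>     >     else if (rank == 1)
>>     >         MPI_Recv(buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
>>     >                  MPI_STATUS_IGNORE);
>>     >     double t1 = MPI_Wtime();
>>     >
>>     >     if (rank == 1)   /* bandwidth in Gb/s as seen by the receiver */
>>     >         printf("%.2f Gb/s\n",
>>     >                8.0 * count * sizeof(double) / (t1 - t0) / 1e9);
>>     >
>>     >     free(buf);
>>     >     MPI_Finalize();
>>     >     return 0;
>>     > }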
>>     >
>>     > I'm running this in AWS with instances that have 10 Gbps
>>     > connectivity in the same availability zone (according to tracepath,
>>     > there are no hops between them) and MTU set to 8801 bytes.  Doing a
>>     > non-MPI benchmark of sending data directly over TCP between these
>>     > two instances, I reliably get around 4 Gbps.  Between these same two
>>     > instances with MPI_Send() / MPI_Recv(), I reliably get around 2.4
>>     > Gbps.  This seems like a major performance degradation for a single
>>     > MPI operation.
>>     >
>>     > I compiled Open MPI 2.1.0 with gcc 4.9.1 and default settings.  I'm
>>     > connecting between instances via ssh and, I assume, using TCP for
>>     > the actual network transfer (I'm not setting any special
>>     > command-line or programmatic settings).  The actual command I'm
>>     > running is:
>>     > mpirun -N 1 --bind-to none --hostfile hosts.txt my_app
>>     >
>>     > Any advice on other things to test or compilation and/or runtime
>>     > flags to set would be much appreciated!
>>     > -Adam
>>     >
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
