Gilles suggested your best next course of action: time the MPI_Bcast and 
MPI_Barrier calls and see whether there's a non-linear scaling effect as you 
increase group size.
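
For reference, here's a minimal sketch of the kind of PMPI wrapper Gilles
describes (the printf reporting is only illustrative; compile it into your
application or preload it as a shared library and aggregate the numbers however
suits your runs):

  #include <mpi.h>
  #include <stdio.h>

  /* Intercept MPI_Bcast via the PMPI profiling interface.  Time spent in
   * PMPI_Barrier() approximates synchronization overhead; the PMPI_Bcast()
   * that follows approximates the actual data transfer. */
  int MPI_Bcast(void *buffer, int count, MPI_Datatype datatype,
                int root, MPI_Comm comm)
  {
      double t0 = MPI_Wtime();
      PMPI_Barrier(comm);               /* wait for all ranks to arrive */
      double t1 = MPI_Wtime();
      int rc = PMPI_Bcast(buffer, count, datatype, root, comm);
      double t2 = MPI_Wtime();

      int rank;
      PMPI_Comm_rank(comm, &rank);
      printf("rank %d: sync %.6f s, transfer %.6f s\n",
             rank, t1 - t0, t2 - t1);
      return rc;
  }

Accumulate the sync and transfer times over all of your per-group broadcasts
and compare how each grows as you move from groups of 5 to groups of 10; that
should tell you whether synchronization or the transfers themselves are the
problem.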

You mention that you’re using m3.large instances; while this isn’t the list for 
in-depth discussion of EC2 instances (the AWS Forums are better for that), I’ll 
note that unless you’re tied to m3 for organizational or reserved-instance 
reasons, you’ll probably be happier on another instance type. m3 was one of the 
last instance families released that does not support Enhanced Networking. 
There’s significantly more jitter and latency in the m3 network stack compared 
to platforms that do support Enhanced Networking (including the m4 platform). If 
networking overhead is what’s causing your scaling problems, the first step will 
be migrating to a different instance type.

Brian

> On Oct 23, 2017, at 4:19 AM, Gilles Gouaillardet 
> <gilles.gouaillar...@gmail.com> wrote:
> 
> Konstantinos,
> 
> A simple way is to rewrite MPI_Bcast() so that it starts a timer and
> calls PMPI_Barrier() before invoking the real PMPI_Bcast().
> Time spent in PMPI_Barrier() can be seen as time NOT spent on actual
> data transmission, and since all tasks are synchronized upon exit,
> time spent in PMPI_Bcast() can be seen as time spent on actual data
> transmission. This is not perfect, but it is a pretty good
> approximation. You can add extra timers so you end up with an idea of
> how much time is spent in PMPI_Barrier() vs PMPI_Bcast().
> 
> Cheers,
> 
> Gilles
> 
> On Mon, Oct 23, 2017 at 4:16 PM, Konstantinos Konstantinidis
> <kostas1...@gmail.com> wrote:
>> In any case, do you think that the time NOT spent on actual data
>> transmission can impact the total time of the broadcast, especially when
>> there are so many groups communicating (please refer to the numbers I gave
>> before if you want to get an idea)?
>> 
>> Also, is there any way to quantify this impact, i.e. to measure the time
>> not spent on actual data transmission?
>> 
>> Kostas
>> 
>> On Fri, Oct 20, 2017 at 10:32 PM, Jeff Hammond <jeff.scie...@gmail.com>
>> wrote:
>>> 
>>> Broadcast is collective but not necessarily synchronous in the sense you
>>> imply. If the broadcast message size is under the eager limit, the root may
>>> return before any non-root processes enter the function. Data transfer may
>>> happen before processes enter the function. Only rendezvous forces
>>> synchronization between any two processes, but there may still be asynchrony
>>> between different levels of the broadcast tree.
>>> 
>>> Jeff
>>> 
>>> On Fri, Oct 20, 2017 at 3:27 PM Konstantinos Konstantinidis
>>> <kostas1...@gmail.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> I am running some tests on Amazon EC2 and they require a lot of
>>>> communication among m3.large instances.
>>>> 
>>>> I would like to give you an idea of what kind of communication takes
>>>> place. There are 40 m3.large instances. Now, 28672 groups of 5 instances
>>>> are formed in a specific manner (let's skip the details). Within each
>>>> group, each instance broadcasts some unsigned char data to the other 4
>>>> instances in the group. So within each group, exactly 5 broadcasts take
>>>> place.
>>>> 
>>>> The problem is that if I increase the group size from 5 to 10, the
>>>> transmission rate drops significantly, and based on some theoretical
>>>> results this should not happen.
>>>> 
>>>> I want to check whether one of the reasons this is happening is the time
>>>> the instances need to synchronize when they call MPI_Bcast(), since it's a
>>>> collective function. As far as I know, all of the machines in the
>>>> broadcast need to call it and then synchronize before the actual data
>>>> transfer starts. Is there any way to measure this synchronization time?
>>>> 
>>>> The code is in C++, and the MPI installation is described in the attached
>>>> file.
>>> 
>>> --
>>> Jeff Hammond
>>> jeff.scie...@gmail.com
>>> http://jeffhammond.github.io/
>> 

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
