Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-12-11 Thread George Bosilca
Your code looks correct. There are a few things I would change to improve it: - There are too many calls to clock(). I would move the operations on "time" (the variable) outside the outer loop. - Replace the 2 non-scalable constructs used to gather the 2 times on the root either by an MPI_Reduce(+)
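
A minimal sketch of the timing change being suggested, assuming MPI_Wtime() is used for the measurement; the variable names and the choice of reductions (MPI_MAX and MPI_SUM) are placeholders, not the original code:

    double t_start = MPI_Wtime();              // one clock read before the loops
    /* ... broadcast/shuffle loops, with no timing calls inside ... */
    double shuffle_time = MPI_Wtime() - t_start;

    // combine the per-rank timings on the root with single collectives
    double max_time, total_time;
    MPI_Reduce(&shuffle_time, &max_time,   1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    MPI_Reduce(&shuffle_time, &total_time, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    // on rank 0: max_time is the slowest rank, total_time / nprocs the average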

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-12-04 Thread Konstantinos Konstantinidis
Coming back to this discussion after a long time, let me clarify a few issues that you have addressed. 1. Yes, the list of communicators in G is ordered in the same way on all processes. 2. I am now using "mcComm != MPI_COMM_NULL" for the participation check. I have not seen much improvement but it's
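
A minimal sketch of that participation check, where mcComm is the communicator returned by the group-creation call and the payload, its size, and the group root are placeholder names:

    #include <mpi.h>

    // Ranks that are not members of a group get MPI_COMM_NULL back from
    // communicator creation and simply skip that group's broadcast.
    void group_bcast(MPI_Comm mcComm, void *payload, int payload_bytes, int group_root)
    {
        if (mcComm != MPI_COMM_NULL) {
            MPI_Bcast(payload, payload_bytes, MPI_BYTE, group_root, mcComm);
        }
        // ranks outside the group fall through immediately
    }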

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-07 Thread George Bosilca
On Tue, Nov 7, 2017 at 6:09 PM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote: > OK, I will try to explain a few more things about the shuffling and I have > attached only specific excerpts of the code to avoid confusion. I have > added many comments. > > First, let me note that this

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-07 Thread Konstantinos Konstantinidis
OK, I will try to explain a few more things about the shuffling and I have attached only specific excerpts of the code to avoid confusion. I have added many comments. First, let me note that this project is an implementation of the Terasort benchmark with a master node which assigns jobs to the

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-07 Thread George Bosilca
If each process sends a different amount of data, then the operation should be an allgatherv. This also requires that you know the amount each process will send, so you will need an allgather first. Schematically, the code should look like the following: long bytes_send_count = endata.size * sizeof(long);
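
Filling in the rest of that schematic as a hedged sketch against the plain C MPI interface; the size/data pair mirrors the endata object from the thread, while the names, types, and the use of MPI_BYTE are illustrative:

    #include <mpi.h>
    #include <vector>

    void exchange_variable_data(MPI_Comm comm, const char *send_buf, long send_bytes,
                                std::vector<char> &recv_buf)
    {
        int nprocs;
        MPI_Comm_size(comm, &nprocs);

        // 1. every rank announces how many bytes it will contribute
        std::vector<long> bytes_per_rank(nprocs);
        MPI_Allgather(&send_bytes, 1, MPI_LONG,
                      bytes_per_rank.data(), 1, MPI_LONG, comm);

        // 2. build receive counts and displacements (int, as Allgatherv requires)
        std::vector<int> recvcounts(nprocs), displs(nprocs);
        int total = 0;
        for (int i = 0; i < nprocs; ++i) {
            recvcounts[i] = static_cast<int>(bytes_per_rank[i]);
            displs[i]     = total;
            total        += recvcounts[i];
        }

        // 3. exchange the variable-sized payloads in a single collective
        recv_buf.resize(total);
        MPI_Allgatherv(send_buf, static_cast<int>(send_bytes), MPI_BYTE,
                       recv_buf.data(), recvcounts.data(), displs.data(), MPI_BYTE, comm);
    }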

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-07 Thread Konstantinos Konstantinidis
OK, I started implementing the above Allgather() idea without success (segmentation fault). So I will post the problematic lines here: comm.Allgather(&(endata.size), 1, MPI::UNSIGNED_LONG_LONG, &(endata_rcv.size), 1, MPI::UNSIGNED_LONG_LONG); endata_rcv.data = new unsigned
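
One hedged guess at the crash: Allgather writes one entry per rank of the communicator into the receive buffer, so gathering into the single scalar endata_rcv.size leaves too little room whenever the group has more than one member. A sketch of a receive side sized to the group, keeping the C++ bindings of the quoted lines; comm and endata are assumed to be as in the original code:

    int group_size = comm.Get_size();
    std::vector<unsigned long long> all_sizes(group_size);   // one slot per rank
    comm.Allgather(&endata.size, 1, MPI::UNSIGNED_LONG_LONG,
                   all_sizes.data(), 1, MPI::UNSIGNED_LONG_LONG);
    // all_sizes[i] now holds the element count announced by rank i of comm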

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-06 Thread George Bosilca
On Sun, Nov 5, 2017 at 10:23 PM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote: > Hi George, > > First, let me note that the cost of [q^(k-1)]*(q-1) communicators was fine > for the values of parameters q,k I am working with. Also, the whole point > of speeding up the shuffling phase

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-05 Thread Konstantinos Konstantinidis
Hi George, First, let me note that the cost of [q^(k-1)]*(q-1) communicators was fine for the values of parameters q,k I am working with. Also, the whole point of speeding up the shuffling phase is trying to reduce this number even more (compared to already known implementations), which is a major

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-10-31 Thread George Bosilca
It really depends on what you are trying to achieve. If the question is rhetorical: "can I write a code that does broadcasts in parallel on independent groups of processes?" then the answer is yes, this is certainly possible. If, however, you add a hint of practicality to your question "can I write an
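
A minimal, self-contained sketch of the "yes" case: MPI_Comm_split carves MPI_COMM_WORLD into disjoint sub-communicators, and each group's broadcast then runs independently of the others. The grouping rule and the payload below are purely illustrative:

    #include <mpi.h>
    #include <vector>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        const int group_size = 4;                  // illustrative group size k
        int color = rank / group_size;             // which group this rank joins
        MPI_Comm group_comm;
        MPI_Comm_split(MPI_COMM_WORLD, color, rank, &group_comm);

        std::vector<long> payload(1024, rank);     // dummy data
        // every group broadcasts at the same time, each from its local rank 0
        MPI_Bcast(payload.data(), static_cast<int>(payload.size()), MPI_LONG,
                  0, group_comm);

        MPI_Comm_free(&group_comm);
        MPI_Finalize();
        return 0;
    }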

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-10-31 Thread Konstantinos Konstantinidis
Let me clarify one thing. When I said "there are q-1 groups that can communicate in parallel at the same time", I meant that this is possible at any particular time. So at the beginning we have q-1 groups that could communicate in parallel, then another set of q-1 groups, and so on until we exhaust

[OMPI users] Parallel MPI broadcasts (parameterized)

2017-10-31 Thread Konstantinos Konstantinidis
Assume that we have K=q*k nodes (slaves) where q,k are positive integers >= 2. Based on the scheme that I am currently using I create [q^(k-1)]*(q-1) groups (along with their communicators). Each group consists of k nodes and within each group exactly k broadcasts take place (each node broadcasts
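
A hedged sketch of that setup: build one communicator per group of k nodes with MPI_Group_incl / MPI_Comm_create, then let every member broadcast in turn so that k broadcasts take place inside each group. How the [q^(k-1)]*(q-1) rank lists are enumerated is specific to the scheme and left out here; block sizes are assumed equal for brevity:

    #include <mpi.h>
    #include <vector>

    // Create a communicator for one group, given its ranks in MPI_COMM_WORLD.
    // Collective over MPI_COMM_WORLD; non-members receive MPI_COMM_NULL.
    MPI_Comm make_group_comm(const std::vector<int> &group_ranks)
    {
        MPI_Group world_group, sub_group;
        MPI_Comm  group_comm;
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);
        MPI_Group_incl(world_group, static_cast<int>(group_ranks.size()),
                       group_ranks.data(), &sub_group);
        MPI_Comm_create(MPI_COMM_WORLD, sub_group, &group_comm);
        MPI_Group_free(&sub_group);
        MPI_Group_free(&world_group);
        return group_comm;
    }

    // Within one group of k nodes, perform k broadcasts: each member is the
    // root of exactly one of them.
    void broadcast_round(MPI_Comm group_comm, const std::vector<long> &my_block)
    {
        if (group_comm == MPI_COMM_NULL) return;   // this rank is not in the group
        int k, me;
        MPI_Comm_size(group_comm, &k);
        MPI_Comm_rank(group_comm, &me);
        std::vector<long> recv_block(my_block.size());
        for (int root = 0; root < k; ++root) {
            if (root == me) {
                MPI_Bcast(const_cast<long *>(my_block.data()),
                          static_cast<int>(my_block.size()), MPI_LONG, root, group_comm);
            } else {
                MPI_Bcast(recv_block.data(),
                          static_cast<int>(recv_block.size()), MPI_LONG, root, group_comm);
            }
        }
    }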