On Mar 22, 2011, at 4:03 PM, George Bosilca wrote:
>
> On Mar 22, 2011, at 14:20 , Ralph Castain wrote:
>
>> Hi folks
>>
>> For those interested in trying it, I completed backporting the multicast
>> grpcomm module from my branch over the last weekend. This allows all modex
>> and other ORTE
On Mar 22, 2011, at 14:20 , Ralph Castain wrote:
> Hi folks
>
> For those interested in trying it, I completed backporting the multicast
> grpcomm module from my branch over the last weekend. This allows all modex
> and other ORTE-level collective operations to occur via multicast, which
> si
Hi folks
For those interested in trying it, I completed backporting the multicast
grpcomm module from my branch over the last weekend. This allows all modex and
other ORTE-level collective operations to occur via multicast, which
significantly improves the performance of those operations.
In o
Yes.
That was the problem, Ralph. Again, thanks a lot for your help; it was a
silly mistake of mine :).
Best regards.
Hugo Meyer
2011/3/22 Ralph Castain
> The problem is here:
>
> /* Pack the faulty vpid */
> if (ORT
Sounds good.
Would you mind reviewing the CMRs?
https://svn.open-mpi.org/trac/ompi/ticket/2756
https://svn.open-mpi.org/trac/ompi/ticket/2757
Thanks,
Josh
On Mar 22, 2011, at 10:19 AM, George Bosilca wrote:
> Josh,
>
> Your patch (r24551) looks fine. I think you should make a CMR for the 1
Josh,
Your patch (r24551) looks fine. I think you should make a CMR for the 1.4 and
1.5.
Thanks,
george.
On Mar 22, 2011, at 09:04 , Joshua Hursey wrote:
> George,
>
> I agree that it is difficult to come up with a good scenario, outside of
> resilience, in which MPI_Probe would retur
The problem is here:
/* Pack the faulty vpid */
if (ORTE_SUCCESS != (rc =
        opal_dss.pack(buffer, &proc, 1, ORTE_NAME))) {
    ORTE_ERROR_LOG(rc);
George,
I agree that it is difficult to come up with a good scenario, outside of
resilience, in which MPI_Probe would return an error (other than a bad argument
type of error - which does currently work). I agree with your assessment of the
value of the return code, and that it should trigger t
Thanks again Ralph for your reply.
> There's your problem - that module is run in the daemon, where the
> orte_job_data pointer array isn't used. You have to use the
> orte_local_jobdata and orte_local_children lists instead. So once the HNP
> replies with the jobid, you look up the orte_odls_job