Any idea where I can find some info on that? I never found anything like that.
Best,
Suraj
On Feb 22, 2014, at 6:38 PM, Ralph Castain wrote:
>
> On Feb 22, 2014, at 9:30 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
>> Thanks Ralph.
>>
Ralph Castain wrote:
>
> On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
>> Hmm.. but in actuality, the MPI_Comm_spawn of the parents and the MPI_Init of the
>> children never returned!
>
> Understood - my point was that the output shows no e
>
> On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
Ok, I figured out that it was not a problem with the node grsacc04, because I
have now run the same test on a totally different set of nodes.
I must really say that with the --bind-to none option, the program completed "many"
more times than before, but it still "sometimes" hangs! Attaching now the
, 2014, at 7:05 PM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
>> Thanks Ralph!
>>
>> I should have mentioned, though: without the Torque environment, spawning with
>> ssh works OK. But under the Torque environment, it does not.
>
> Ah, no - you f
specific behavior you see?
>
> You might try the attached program. It's a simple spawn test we use - 1.7.4
> seems happy with it.
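(The attached test program itself is not preserved in this archive. Assuming it is a small C program, the file name below being purely illustrative, it would typically be built and launched like this:)

```shell
# file name and process count are illustrative, not from the original attachment
mpicc simple_spawn.c -o simple_spawn
mpirun -np 1 ./simple_spawn
```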
>
>
>
> On Feb 20, 2014, at 10:14 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
>> I am using 1.7.4!
>
I am using 1.7.4!
On Feb 20, 2014, at 7:00 PM, Ralph Castain wrote:
> What OMPI version are you using?
>
> On Feb 20, 2014, at 7:56 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
Hello!
I am having a problem using MPI_Comm_spawn under Torque. It doesn't work when
spawning more than 12 processes across various nodes. To be more precise,
"sometimes" it works, and "sometimes" it doesn't!
Here is my case. I obtain 5 nodes with 3 cores per node, and my $PBS_NODEFILE
looks like below.
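(The actual nodefile content was cut off in the archive. For a Torque allocation of 5 nodes with 3 cores each, a $PBS_NODEFILE typically lists each host once per allocated core, 15 lines in total; the host names here are hypothetical:)

```text
grsacc01
grsacc01
grsacc01
grsacc02
grsacc02
grsacc02
grsacc03
...
```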
Sorry for the very late reply. Everything works now! Thanks a lot!!
On Sep 25, 2013, at 7:00 PM, Ralph Castain wrote:
> I've committed a fix to the trunk (r29245) and scheduled it for v1.7.3 -
> thanks for the debug info!
>
> Ralph
>
> On Sep 25, 2013, at 5:00 AM,
le more info as to exactly what you are doing? Perhaps
> send me your test code?
>
> On Sep 24, 2013, at 7:48 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
>> Hi Ralph,
>>
>> Output attached in a file.
>> Thanks a lot!
>>
>
e_verbose 10
>
> Thanks
> Ralph
>
> On Sep 24, 2013, at 6:35 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
>> Hi Ralph,
>>
>> I always got this output from any MPI job that ran on our nodes. There seems
>> to be a problem s
Check your
> cables, subnet manager configuration, etc. The openib BTL will be
> ignored for this job.
>
> Local host: %s
>
> Looks like at least one node being used doesn't have an active Infiniband
> port on it?
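(Not part of the original exchange, but a common workaround with stock Open MPI: if some nodes really have no active InfiniBand port, the warning can be avoided by excluding the openib BTL on the command line. The application name is illustrative.)

```shell
# ^openib means "use every available BTL except openib"
mpirun --mca btl ^openib -np 4 ./my_app
```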
>
>
> On Sep 24, 2013, at 6:11 AM, Suraj Prabhakaran <sura
>>> both the trunk and current 1.7.3 branch for "add-host" and both worked just
>>> fine. This was on my little test cluster which only has rsh available - no
>>> Torque.
>
> You might add "-mca plm_base_verbose 5" to your cmd line to get some debug
> output as to the problem.
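(Concretely, the suggested flag would be passed on the mpirun command line like this; the application name and process count are hypothetical:)

```shell
mpirun -mca plm_base_verbose 5 -np 4 ./spawn_test
```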
>
>
> On Sep 21, 2013, at 5:48 PM, Ralph Castain <r...@open-mpi.org> wrote:
>
>>
>> On Sep 21, 2013, at 4:54 P
Dear all,
Really, thanks a lot for your efforts. I too downloaded the trunk to check whether it
works for my case, and as of revision 29215 it works for the original case I
reported. Although it works, I still see the following in the output. Does it
mean anything?
Hi Ralph,
Thanks a lot!!! That's really cool!!
Best,
Suraj
On Sep 15, 2013, at 5:01 PM, Ralph Castain wrote:
> I fixed it and have filed a cmr to move it to 1.7.3
>
> Thanks for your patience, and for reminding me
> Ralph
>
> On Sep 13, 2013, at 12:05 PM, Suraj Prabhakar
and hence not resolved yet. I doubt it
> will make 1.7.3, though if you need it, I'll give it a try.
>
> On Sep 13, 2013, at 7:21 AM, Suraj Prabhakaran <
> suraj.prabhaka...@gmail.com> wrote:
>
Hello,
Is there a plan to fix the problem with MPI_Intercomm_merge in 1.7.3, as
stated in this ticket? We are really in need of this at the moment. Any hints?
We face the following problem.
Parents (x and y) spawn a child (z). (All of them execute on separate nodes.)
x is the root.
x, y, and z do
I am referring to v1.6.
On Sep 12, 2012, at 5:27 PM, Ralph Castain wrote:
> what version of ompi are you referring to?
>
> On Sep 12, 2012, at 8:13 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com>
> wrote:
>
>> Dear all,
>>
>> I observed a
Dear all,
I observed a strange behavior with MPI_Comm_connect and MPI_Comm_disconnect.
In short, after two processes connect to each other through a port and merge to
create an intracommunicator (rank 0 and rank 1), only one of them (the root) is
thereafter able to reach a third new process through
other tool.
Hope this is useful.
Best,
Suraj Prabhakaran
On 12/21/2010 03:12 PM, Ralph Castain wrote:
Are you using ompi-server for pub/sub, or just letting it default to
mpirun?
You might want to output the return value from lookup_name and
publish_name to see if they match. If they are different, then you
will definitely hang.
I used
Hello,
This is basically a repost of my previous mail regarding problems with
connect/accept and disconnect (**this is not related to spawning,
parent/child**).
I *sometimes* find processes blocking indefinitely at Connect/Accept
calls or at Disconnect calls. I have an example below.
On 12/17/2010 06:24 PM, George Bosilca wrote:
Let me try to round the edges on this one. It is not that we couldn't or wouldn't like to have a more "MPI"-compliant
approach on this, but the definition of connected processes in the MPI standard is [kind of] shady. One thing is clear, however,
MPI_Finalize();
return 0;
}
In the above simple example, the second printf will not be displayed,
clearly indicating that the child is really disconnected from the
parent. However, at exit() of the child, the parent terminates too.
Perhaps there is a way to avoid this kind of auto-cleanup?
Thanks,
Suraj Prabhakaran