Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Suraj Prabhakaran
…int idea where I can find some info on that? I never found anything like that. Best, Suraj On Feb 22, 2014, at 6:38 PM, Ralph Castain wrote: > On Feb 22, 2014, at 9:30 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> Thanks Ralph. >> I…

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-22 Thread Suraj Prabhakaran
…Ralph Castain wrote: > On Feb 21, 2014, at 5:55 PM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> Hmm... but in fact, the parents' MPI_Comm_spawn and the children's MPI_Init never returned! > Understood - my point was that the output shows no e…

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
…red? > On Feb 21, 2014, at 3:31 PM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> OK, I figured out that it was not a problem with the node grsacc04, because I have now run the same test on a totally different set of nodes. >> I mu…

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
OK, I figured out that it was not a problem with the node grsacc04, because I have now run the same test on a totally different set of nodes. I must say that with the --bind-to none option, the program completed "many" more times than before, but it still "sometimes" hangs! Attaching now the…

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-21 Thread Suraj Prabhakaran
…, 2014, at 7:05 PM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> Thanks Ralph! >> I should have mentioned, though: without the Torque environment, spawning with ssh works OK, but under the Torque environment it does not. > Ah, no - you f…

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Suraj Prabhakaran
…specific behavior you see? > You might try the attached program. It's a simple spawn test we use - 1.7.4 seems happy with it. > On Feb 20, 2014, at 10:14 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> I am using 1.7.4!

Re: [OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Suraj Prabhakaran
I am using 1.7.4! On Feb 20, 2014, at 7:00 PM, Ralph Castain wrote: > What OMPI version are you using? > On Feb 20, 2014, at 7:56 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> Hello! >> I am having a problem using MPI_Comm_s…

[OMPI devel] MPI_Comm_spawn under Torque

2014-02-20 Thread Suraj Prabhakaran
Hello! I am having a problem using MPI_Comm_spawn under Torque. It doesn't work when spawning more than 12 processes across several nodes. To be more precise, "sometimes" it works and "sometimes" it doesn't! Here is my case: I obtain 5 nodes with 3 cores per node, and my $PBS_NODEFILE looks as follows: …

Re: [OMPI devel] Intercomm Merge

2013-10-02 Thread Suraj Prabhakaran
Sorry for the very late reply. Everything works now! Thanks a lot!! On Sep 25, 2013, at 7:00 PM, Ralph Castain wrote: > I've committed a fix to the trunk (r29245) and scheduled it for v1.7.3 - thanks for the debug info! > Ralph > On Sep 25, 2013, at 5:00 AM, …

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
…le more info as to exactly what you are doing? Perhaps send me your test code? > On Sep 24, 2013, at 7:48 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> Hi Ralph, >> Output attached in a file. >> Thanks a lot!

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
…e_verbose 10 > Thanks > Ralph > On Sep 24, 2013, at 6:35 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> Hi Ralph, >> I have always gotten this output from every MPI job that ran on our nodes. There seems to be a problem s…

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
…Check your cables, subnet manager configuration, etc. The openib BTL will be ignored for this job. > Local host: %s > Looks like at least one node being used doesn't have an active Infiniband port on it? > On Sep 24, 2013, at 6:11 AM, Suraj Prabhakaran <sura…

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
…>>> both the trunk and current 1.7.3 branch for "add-host" and both worked just fine. This was on my little test cluster which only has rsh available - no Torque. >>> You might add "-mca plm_base_verbose 5" to your cmd l…

Re: [OMPI devel] Intercomm Merge

2013-09-22 Thread Suraj Prabhakaran
…e - no Torque. > You might add "-mca plm_base_verbose 5" to your cmd line to get some debug output as to the problem. > On Sep 21, 2013, at 5:48 PM, Ralph Castain <r...@open-mpi.org> wrote: >> On Sep 21, 2013, at 4:54 P…

Re: [OMPI devel] Intercomm Merge

2013-09-21 Thread Suraj Prabhakaran
Dear all, thank you very much for your efforts. I too downloaded the trunk to check whether it works for my case, and as of revision 29215 it works for the original case I reported. Although it works, I still see the following in the output. Does it mean anything?

Re: [OMPI devel] Intercomm Merge

2013-09-17 Thread Suraj Prabhakaran
Hi Ralph, Thanks a lot!!! That's really cool!! Best, Suraj On Sep 15, 2013, at 5:01 PM, Ralph Castain wrote: > I fixed it and have filed a cmr to move it to 1.7.3 > Thanks for your patience, and for reminding me > Ralph > On Sep 13, 2013, at 12:05 PM, Suraj Prabhakar…

Re: [OMPI devel] Intercomm Merge

2013-09-13 Thread Suraj Prabhakaran
…and hence not resolved yet. I doubt it will make 1.7.3, though if you need it, I'll give it a try. > On Sep 13, 2013, at 7:21 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: > > Hello, > > Is there a plan to fix the problem with MPI_Inter…

[OMPI devel] Intercomm Merge

2013-09-13 Thread Suraj Prabhakaran
Hello, Is there a plan to fix the problem with MPI_Intercomm_merge in 1.7.3, as described in this ticket? We really need this at the moment. Any hints? We face the following problem: parents (x and y) spawn a child (z), all of them running on separate nodes, with x as the root. x, y and z do…

Re: [OMPI devel] MPI_Comm_connect/accept does not work as it should

2012-09-12 Thread Suraj Prabhakaran
I am referring to v1.6. On Sep 12, 2012, at 5:27 PM, Ralph Castain wrote: > what version of ompi are you referring to? > On Sep 12, 2012, at 8:13 AM, Suraj Prabhakaran <suraj.prabhaka...@gmail.com> wrote: >> Dear all, >> I observed a…

[OMPI devel] MPI_Comm_connect/accept does not work as it should

2012-09-12 Thread Suraj Prabhakaran
Dear all, I observed strange behavior with MPI_Comm_connect and MPI_Comm_disconnect. In short, after two processes connect to each other through a port and merge to create an intracommunicator (rank 0 and rank 1), only one of them (the root) is thereafter able to reach a third, new process through…

[OMPI devel] Quick fix for MPI_Publish_name

2011-03-04 Thread Suraj Prabhakaran
…other tool. Hope this is useful. Best, Suraj Prabhakaran

Re: [OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Suraj Prabhakaran
On 12/21/2010 03:12 PM, Ralph Castain wrote: Are you using ompi-server for pub/sub, or just letting it default to mpirun? You might want to output the return value from lookup_name and publish_name to see if they match. If they are different, then you will definitely hang. I used…

[OMPI devel] Connect/Accept and Disconnect

2010-12-21 Thread Suraj Prabhakaran
Hello, This is basically a repost of my previous mail regarding problems with connect/accept and disconnect (**this is not related to spawning, parent/child**). I *sometimes* find processes blocking indefinitely at Connect/Accept calls or at Disconnect calls. I have an example below.

Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)

2010-12-17 Thread Suraj Prabhakaran
On 12/17/2010 06:24 PM, George Bosilca wrote: Let me try to round the edges on this one. It is not that we couldn't or wouldn't like to have a more "MPI" compliant approach on this, but the definition of connected processes in the MPI standard is [kind of] shady. One thing is clear however, …

Re: [OMPI devel] Parent terminates when child crashes/terminates (without finalizing)

2010-12-17 Thread Suraj Prabhakaran
…ize(); return 0; } In the simple example above, the second printf is not displayed, clearly indicating that the child really is disconnected from the parent. However, when the child calls exit(), the parent terminates too. Is there perhaps a way to avoid this kind of automatic cleanup? Thanks, Suraj Prabhakaran