Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
Hi Ralph, I tested it with the trunk r29228. I still have the following problem. Now, it even spawns the daemon on the new node through torque but then suddenly quits. The following is the output. Can you please have a look? Thanks Suraj [grsacc20:04511] [[6253,0],0] plm:base:receive process

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Ralph Castain
Your output shows that it launched your apps, but they exited. The error is reported here, though it appears we aren't flushing the message out before exiting due to a race condition: > [grsacc20:04511] 1 more process has sent help message help-mpi-btl-openib.txt > / no active ports found Here

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
Hi Ralph, I always got this output from any MPI job that ran on our nodes. There seems to be a problem somewhere but it never stopped the applications from running. But anyway, I ran it again now with only TCP, excluding InfiniBand, and I get the same output again. Except that this time,

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Ralph Castain
Afraid I don't see the problem offhand - can you add the following to your cmd line? -mca state_base_verbose 10 -mca errmgr_base_verbose 10 Thanks Ralph

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
Hi Ralph, Output attached in a file. Thanks a lot! Best, Suraj

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Ralph Castain
I'm going to need a little help here. The problem is that you launch two new daemons, and one of them exits immediately because it thinks it lost the connection back to mpirun - before it even gets a chance to create it. Can you give me a little more info as to exactly what you are doing? Perhap

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Suraj Prabhakaran
Hi Ralph, So here is what I do. I spawn just a "single" process on a new node which is basically not in the $PBS_NODEFILE list. My $PBS_NODEFILE list contains grsacc20 and grsacc19. I then start the app with just 2 processes. So one host gets one process and they are successfully spawned through th

Re: [OMPI devel] Intercomm Merge

2013-09-24 Thread Ralph Castain
What I find puzzling is that I don't see any output indicating that you went thru the Torque launcher to launch the daemons - not a peep of debug output. This makes me suspicious that something else is going on. Are you sure you sent me all the output? Try adding -novm to your mpirun cmd line a