Hi Ralph,
I tested it with trunk r29228 and I still have the following problem. Now it
even spawns the daemon on the new node through Torque, but then suddenly quits.
The output follows. Can you please have a look?
Thanks
Suraj
[grsacc20:04511] [[6253,0],0] plm:base:receive process
Your output shows that it launched your apps, but they exited. The error is
reported here, though it appears we aren't flushing the message out before
exiting due to a race condition:
> [grsacc20:04511] 1 more process has sent help message help-mpi-btl-openib.txt
> / no active ports found
Here
Hi Ralph,
I always got this output from any MPI job that ran on our nodes. There seems to
be a problem somewhere but it never stopped the applications from running. But
anyway, I ran it again now with only TCP, excluding InfiniBand, and I get
the same output again. Except that this time,
Afraid I don't see the problem offhand - can you add the following to your cmd
line?
-mca state_base_verbose 10 -mca errmgr_base_verbose 10
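
For anyone scripting their test runs, here is a minimal sketch of splicing the two verbosity flags above into an existing mpirun command. The helper name add_debug_flags, the "-np 2" count, and the "./my_app" binary are placeholders for illustration, not part of Open MPI or of the original report.

```python
import shlex

# MCA verbosity settings quoted from the message above.
DEBUG_MCA = [("state_base_verbose", "10"), ("errmgr_base_verbose", "10")]


def add_debug_flags(cmd, mca=DEBUG_MCA):
    """Return a copy of an mpirun argv with '-mca name value' triples
    inserted directly after the launcher (the first element)."""
    extra = []
    for name, value in mca:
        extra += ["-mca", name, value]
    return cmd[:1] + extra + cmd[1:]


if __name__ == "__main__":
    # Placeholder command line; substitute the real process count and binary.
    base = ["mpirun", "-np", "2", "./my_app"]
    print(shlex.join(add_debug_flags(base)))
```

The flags go right after the launcher so that they are read as mpirun options rather than as arguments to the application.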
Thanks
Ralph
On Sep 24, 2013, at 6:35 AM, Suraj Prabhakaran wrote:
> Hi Ralph,
>
> I always got this output from any MPI job that ran on our nodes. Th
Hi Ralph,
Output attached in a file.
Thanks a lot!
Best,
Suraj
I'm going to need a little help here. The problem is that you launch two new
daemons, and one of them exits immediately because it thinks it lost the
connection back to mpirun - before it even gets a chance to create it.
Can you give me a little more info as to exactly what you are doing? Perhap
Hi Ralph,
So here is what I do. I spawn just a "single" process on a new node which is
basically not in the $PBS_NODEFILE list.
My $PBS_NODEFILE list contains
grsacc20
grsacc19
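
To double-check which hosts are inside the allocation, a small sketch like the following can read the node file and test a target host against it. The function names and the host "grsacc18" are hypothetical, chosen only for illustration; the fallback text mirrors the two entries listed above.

```python
import os


def allocated_hosts(nodefile_text):
    """Return the unique hostnames from node-file text, in order of first appearance."""
    hosts = []
    for line in nodefile_text.splitlines():
        host = line.strip()
        if host and host not in hosts:
            hosts.append(host)
    return hosts


def outside_allocation(host, nodefile_text):
    """True if the host does not appear in the node file."""
    return host not in allocated_hosts(nodefile_text)


if __name__ == "__main__":
    path = os.environ.get("PBS_NODEFILE")
    if path:
        with open(path) as fh:
            text = fh.read()
    else:
        # Fallback for running outside a Torque job: the two entries listed above.
        text = "grsacc20\ngrsacc19\n"
    print(allocated_hosts(text))
    # "grsacc18" is a made-up name standing in for the new node.
    print(outside_allocation("grsacc18", text))
```
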
I then start the app with just 2 processes. So one host gets one process and
they are successfully spawned through th
What I find puzzling is that I don't see any output indicating that you went
thru the Torque launcher to launch the daemons - not a peep of debug output.
This makes me suspicious that something else is going on. Are you sure you sent
me all the output?
Try adding -novm to your mpirun cmd line a