RDMACM creates the same QPs with the same tunings as OOB, so I don't see how
the CPC could affect performance.
Pavel (Pasha) Shamis
---
Application Performance Tools Group
Computer Science and Math Division
Oak Ridge National Laboratory
On Jan 13, 2011, at 2:15 PM, Jeff Squyres wrote:
> +1 on
On Jan 13, 2011, at 14:08, Jeff Squyres wrote:
> Great!
>
> I see in your other mail that you pulled something from MPICH2 to make this
> work.
>
> Does that mean that there's an even-newer version of ROMIO that we should pull
> in its entirety? It's a little risky to pull most stuff from on
+1 on what Pasha said -- if using rdmacm fixes the problem, then there's
something else nefarious going on...
You might want to check padb with your hangs to see where all the processes are
hung to see if anything obvious jumps out. I'd be surprised if there's a bug
in the oob cpc; it's been a
Great!
I see in your other mail that you pulled something from MPICH2 to make this
work.
Does that mean that there's an even-newer version of ROMIO that we should pull
in its entirety? It's a little risky to pull most stuff from one released
version of ROMIO and then more stuff from another re
This assertion problem is now solved by a patch in ROMIO just
committed at http://bitbucket.org/devezep/new-romio-for-openmpi
I don't know of any other problems in this port of ROMIO.
Pascal
Pascal Deveze wrote:
Jeff Squyres wrote:
On Dec 16, 2010, at 3:31 AM, Pascal Deveze wrote:
A new patch in ROMIO solves this problem.
Thanks to Dave.
Pascal
Dave Goodell wrote:
Hmm... Apparently I was too optimistic about my untested patch. I'll work with
Rob this afternoon to straighten this out.
-Dave
On Jan 10, 2011, at 5:53 AM CST, Pascal Deveze wrote:
Dave,
Your propo
For the moment, that's true.
Abhishek's working on bringing over SOS and the notifier...
On Jan 12, 2011, at 5:57 PM, Ralph Castain wrote:
> You also have to remove all references to OPAL SOS...
>
>
> On Jan 12, 2011, at 1:25 PM, Jeff Squyres wrote:
>
>> I back-ported the trunk's paffinity
Try manually specifying the collective component with "-mca coll tuned".
You seem to be using the "sync" collective component. Are there any stale MCA
param files lying around?
--Nysal
On Tue, Jan 11, 2011 at 6:28 PM, Doron Shoham wrote:
> Hi
>
> All machines on the setup are IDataPlex with Nehalem 12 cores p