Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-12-02 Thread Sylvain Jeaugey
Ok, so I tried with RHEL5 and I get the same result (even at 6 nodes): when setting ORTE_RELAY_DELAY to 1, I systematically get the deadlock with the typical stack. Without my "reproducer patch", 80 nodes was the lower bound to reproduce the bug (and you needed a couple of runs to get it). But since

[OMPI devel] Mpi-request discussion

2009-12-02 Thread Jeff Squyres (jsquyres)
Reminder for people to fill out the doodle if you want to be on the call to discuss mpi-request issues next week. Please fill it out by tomorrow (thurs) cob - I'll pick a time and set up a call on fri morning. -jms Sent from my PDA. No type good.

Re: [OMPI devel] === CREATE FAILURE (v1.4) ===

2009-12-02 Thread Jeff Squyres
Oops. This was a mistake in how I initially set up the v1.4 nightly builds yesterday (i.e., a local config error on eddie, the machine that makes the nightly builds). I'll fix now... On Dec 1, 2009, at 9:00 PM, MPI Team wrote: > > ERROR: Command returned a non-zero exist status (v1.4): >

[OMPI devel] [PATCH] Not optimal SRQ resource allocation

2009-12-02 Thread Vasily Philipov
The attached patch should resolve the long-pending issue that we have in our Trac: https://svn.open-mpi.org/trac/ompi/ticket/1912. The issue: as part of the OpenIB BTL creation process, we also create a set of SRQs, and the corresponding receive fragments are allocated and posted on all SRQs. It means that a pro

Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-12-02 Thread Ralph Castain
I'm sorry, Sylvain - I simply cannot replicate this problem (tried yet another slurm system):
./configure --prefix=blah --with-platform=contrib/platform/iu/odin/debug
[rhc@odin ~]$ salloc -N 16 tcsh
salloc: Granted job allocation 75294
[rhc@odin mpi]$ mpirun -pernode ./hello
Hello, World, I am 1