Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-11-26 Thread Ralph Castain
Just to clarify something: I have been testing with the trunk, NOT the 1.5 branch. I haven't even bothered to look at that code since it was branched. From what little I have heard plus what I (and others) have done since the branch, I strongly suspect a complete ORTE refresh will be required

Re: [OMPI devel] Deadlocks with new (routed) orted launch algorithm

2009-11-26 Thread Ralph Castain
Hi Sylvain. Well, I hate to tell you this, but I cannot reproduce the "bug" even with this code in ORTE, no matter what value of ORTE_RELAY_DELAY I use. The system runs really slowly as I increase the delay, but it completes the job just fine. I ran jobs across 16 nodes on a slurm machine, 1-4 ppn,

Re: [OMPI devel] mca_btl_openib_post_srr() posts to an uncreated SRQ when ibv_resize_cq() has failed

2009-11-26 Thread Nadia Derbey
On Mon, 2009-10-26 at 15:06 -0700, Paul H. Hargrove wrote: > Retrying w/ fewer CQ entries as Jeff describes is a good idea to help > ensure that EINVAL actually does signify that the count exceeds the max > (instead of just assuming this is so). If it actually was signifying > some other error c
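
The retry idea in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not the openib BTL code: `resize_fn`, `fake_resize`, `FAKE_MAX_CQE`, and `resize_with_backoff` are hypothetical names, and the fake resizer merely stands in for a call such as ibv_resize_cq(), which is assumed here to report failure through an errno-style return value. If a smaller request eventually succeeds, EINVAL most likely did mean "too many entries"; if nothing succeeds, some other error is probably involved.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a CQ resize call: returns 0 on success or an
 * errno-style code on failure. */
typedef int (*resize_fn)(void *cq, int cqe);

/* Hypothetical device limit, used only by the fake resizer below. */
#define FAKE_MAX_CQE 1024

/* Fake resizer for illustration: rejects any request above the limit. */
static int fake_resize(void *cq, int cqe)
{
    (void)cq;
    return (cqe > 0 && cqe <= FAKE_MAX_CQE) ? 0 : EINVAL;
}

/* Try to resize to `wanted` entries. On EINVAL, halve the request and try
 * again, never going below `min_cqe`. Returns the size that was accepted,
 * or -1 if no attempt succeeded or a different error was reported. */
static int resize_with_backoff(resize_fn resize, void *cq,
                               int wanted, int min_cqe)
{
    int cqe = wanted;

    while (cqe > 0 && cqe >= min_cqe) {
        int rc = resize(cq, cqe);
        if (rc == 0) {
            return cqe;      /* a smaller count worked */
        }
        if (rc != EINVAL) {
            return -1;       /* different error: shrinking will not help */
        }
        cqe /= 2;            /* assume the count was too large; back off */
    }
    return -1;               /* EINVAL persisted down to the minimum */
}
```

With the fake limit of 1024, a call such as `resize_with_backoff(fake_resize, NULL, 5000, 16)` tries 5000, 2500, and 1250 before succeeding at 625. A caller that gets -1 back would know not to go on using resources that depend on the resized CQ.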