Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-07 Thread Ralph Castain
I may have finally tracked this down. At least, I can now get the correct devel map to come out, and found a memory corruption issue that only impacted hetero operations. I can’t know if this is the root cause of the problem Bill is seeing, however, as I have no way of actually running the job.

Re: [OMPI users] Error while launching Jobs in LSF with OpenMPI

2015-07-07 Thread Rahul Pisharody
Hello Ralph and everybody, The issue was finally tracked down. It had nothing to do with OpenMPI. The LSF environment variable LSF_DJOB_DISABLED was set to 'y'. This was preventing OpenMPI from launching jobs spanning multiple machines. Thank you all for your help and suggestions. Thanks, Rahul
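For anyone hitting the same symptom, a quick way to rule this out is to inspect the environment the job actually sees. The following is only a minimal sketch: the helper name lsf_djob_disabled is hypothetical, and treating only a case-insensitive 'y' as "disabled" is an assumption based on the value reported above, not on LSF documentation.

```python
import os


def lsf_djob_disabled(env=None):
    """Return True if LSF_DJOB_DISABLED is set to 'y' (case-insensitive).

    `env` defaults to the current process environment; passing a dict
    makes the check easy to exercise without touching os.environ.
    Assumption: only 'y' counts as disabled, as in the report above.
    """
    env = os.environ if env is None else env
    value = env.get("LSF_DJOB_DISABLED", "")
    return value.strip().lower() == "y"


if __name__ == "__main__":
    if lsf_djob_disabled():
        print("LSF_DJOB_DISABLED is 'y': multi-host launches may be blocked.")
    else:
        print("LSF_DJOB_DISABLED is not set to 'y'.")
```

Running this from inside the submitted job script, rather than from a login shell, shows the value the launcher itself inherits.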

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-07 Thread Lane, William
I'm sorry I haven't been able to get the lstopo information for all the nodes, but I had to get the latest version of hwloc installed first. They've even added in some more modern blades that also support hyperthreading, ugh. They've been doing some memory upgrades as well. I'm trying to get

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-07 Thread Ralph Castain
No need for the lstopo data anymore, Bill - I was able to recreate the situation using some very nice hwloc functions plus your prior descriptions. I'm not totally confident that this fix will resolve the problem but it will clear out at least one problem. We'll just have to see what happens and a