Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread George Bosilca
On Dec 15, 2013, at 15:40 , Ralph Castain wrote:
> Not true, George - look more closely at the code. We only retrieve the
> hostname if the number of procs is low. Otherwise, we do *not* retrieve it
> until we do a modex_recv, and thus the debug is now broken at scale. This was
> required for

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread Ralph Castain
On Dec 15, 2013, at 12:08 PM, George Bosilca wrote:
> On Dec 15, 2013, at 14:36 , Ralph Castain wrote:
>
>> Sure you can - just find the ompi_proc_t without setting the thread lock
>> since (as you point out) it is already being held.
>
> This is hardly thread-safe for all the cases where th

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread George Bosilca
On Dec 15, 2013, at 14:36 , Ralph Castain wrote:
> Sure you can - just find the ompi_proc_t without setting the thread lock
> since (as you point out) it is already being held.

This is hardly thread-safe for all the cases where this function is not called while owning the lock (aka all the mod

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread Ralph Castain
Sure you can - just find the ompi_proc_t without setting the thread lock since (as you point out) it is already being held. Or we could pass the ompi_proc_t into the call, if necessary. Either way should work.

On Dec 15, 2013, at 11:30 AM, George Bosilca wrote:
> The current code is correct. I

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread George Bosilca
The current code is correct. If the goal is to continue to retrieve the proc_name in a call that is asking for something else, I don’t see how you can remove this circular dependency between setting and using the ompi_procs.

George.

On Dec 15, 2013, at 13:52 , Ralph Castain wrote:
> Okay -

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread Ralph Castain
Okay - then let's correct the code rather than lose the debug. I'll fix that problem.

Thanks
Ralph

On Dec 15, 2013, at 9:54 AM, George Bosilca wrote:
> I understand your reasons but the code as it was in the trunk is not correct.
> In most of the cases when you reach one of the ompi_rte_db_fe

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread George Bosilca
I understand your reasons but the code as it was in the trunk is not correct. In most of the cases when you reach one of the ompi_rte_db_fetch calls, you are setting up an ompi_proc … which means you own the ompi_proc_lock mutex. As the ompi_rte_db_fetch was calling back into the proc infrastruc
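
The lock re-entry problem above can be illustrated with a minimal sketch, assuming a default (non-recursive) pthread mutex stands in for ompi_proc_lock. The names proc_t, proc_lookup_locked, proc_lookup, db_fetch_hostname and proc_setup are hypothetical and are not the Open MPI API. The sketch shows one of the two options Ralph proposes in the thread: pass the proc into the fetch so that the fetch never calls back into the proc lookup and never tries to take the lock a second time.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for ompi_proc_t; NOT the real Open MPI struct. */
typedef struct {
    int vpid;
    const char *hostname; /* stays NULL until something fetches it */
} proc_t;

#define NPROCS 4
static proc_t proc_table[NPROCS];
/* Plays the role of ompi_proc_lock: a default (non-recursive) mutex. */
static pthread_mutex_t proc_lock = PTHREAD_MUTEX_INITIALIZER;

/* Lookup that assumes the caller ALREADY holds proc_lock. */
static proc_t *proc_lookup_locked(int vpid)
{
    return (vpid >= 0 && vpid < NPROCS) ? &proc_table[vpid] : NULL;
}

/* Public lookup that takes proc_lock itself. Calling it from code that
 * already holds proc_lock re-locks a default mutex: a deadlock on Linux,
 * undefined behavior per POSIX. */
static proc_t *proc_lookup(int vpid)
{
    pthread_mutex_lock(&proc_lock);
    proc_t *p = proc_lookup_locked(vpid);
    pthread_mutex_unlock(&proc_lock);
    return p;
}

/* Hypothetical fetch that receives the proc directly, so it never calls
 * back into proc_lookup() and never tries to re-acquire proc_lock. */
static int db_fetch_hostname(proc_t *proc, const char *value)
{
    assert(value != NULL);
    if (proc == NULL) {
        return -1; /* unknown peer */
    }
    proc->hostname = value;
    return 0;
}

/* Setup path: holds proc_lock for its whole duration, like proc setup. */
static int proc_setup(int vpid, const char *host)
{
    pthread_mutex_lock(&proc_lock);
    proc_t *p = proc_lookup_locked(vpid); /* no second lock attempt */
    if (p != NULL) {
        p->vpid = vpid;
    }
    int rc = db_fetch_hostname(p, host);
    pthread_mutex_unlock(&proc_lock);
    return rc;
}
```

Ralph's other option, a lookup that skips the lock because the caller already holds it, corresponds to proc_lookup_locked(). George's objection is that such a function is unsafe for callers that do not hold the lock, which is why the sketch keeps the locked and unlocked variants under distinct names instead of one function that guesses.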

Re: [OMPI devel] [OMPI svn] svn:open-mpi r29917 - trunk/ompi/mca/rte/orte

2013-12-15 Thread Ralph Castain
This actually creates a bit of a problem. The reason we did this was because the OMPI-layer "show-help" calls want to report the hostname of the proc. Since we don't retrieve that info by default, the show-help calls all fail due to a NULL pointer. Nathan tried wrapping all the show-help calls

Re: [OMPI devel] Test suite of openMPI 1.7.3 fails

2013-12-15 Thread George Bosilca
We have a subtle bug with the atomic selection. On Linux the CMA support is linked together with the OPAL_ASSEMBLY_ARCH. If the BUILTIN atomics are enabled, and we are on Linux and we need to define the CMA syscall# (OMPI_BTL_SM_CMA_NEED_SYSCALL_DEFS), the opal/include/opal/sys/cma.h file is un

Re: [OMPI devel] Test suite of openMPI 1.7.3 fails

2013-12-15 Thread George Bosilca
Philipp,

Thanks for providing the config file. Based on its content I was able to replicate your issue. You just unearthed a huge bug in our atomic support and handling, one that will take some time to get fixed completely. Meanwhile, I have pushed a partial fix in the trunk (29915, 29916).