On Dec 15, 2013, at 15:40 , Ralph Castain wrote:
> Not true, George - look more closely at the code. We only retrieve the
> hostname if the number of procs is low. Otherwise, we do *not* retrieve it
> until we do a modex_recv, and thus the debug is now broken at scale. This was
> required for …
On Dec 15, 2013, at 12:08 PM, George Bosilca wrote:
> On Dec 15, 2013, at 14:36 , Ralph Castain wrote:
>
>> Sure you can - just find the ompi_proc_t without setting the thread lock
>> since (as you point out) it is already being held.
>
> This is hardly thread-safe for all the cases where this function is not
> called while owning the lock (aka all the mod…
Sure you can - just find the ompi_proc_t without setting the thread lock since
(as you point out) it is already being held. Or we could pass the ompi_proc_t
into the call, if necessary. Either way should work.
On Dec 15, 2013, at 11:30 AM, George Bosilca wrote:
> The current code is correct. I …
The current code is correct. If the goal is to continue to retrieve the
proc_name in a call that is asking for something else, I don’t see how you can
remove this circular dependency between setting and using the ompi_procs.
George.
On Dec 15, 2013, at 13:52 , Ralph Castain wrote:
Okay - then let's correct the code rather than lose the debug. I'll fix that
problem.
Thanks
Ralph
On Dec 15, 2013, at 9:54 AM, George Bosilca wrote:
I understand your reasons but the code as it was in the trunk is not correct.
In most of the cases when you reach one of the ompi_rte_db_fetch calls, you are
setting up an ompi_proc … which means you own the ompi_proc_lock mutex. As the
ompi_rte_db_fetch was calling back into the proc infrastruc…
This actually creates a bit of a problem. The reason we did this was because
the OMPI-layer "show-help" calls want to report the hostname of the proc. Since
we don't retrieve that info by default, the show-help calls all fail due to a
NULL pointer.
Nathan tried wrapping all the show-help calls …
We have a subtle bug with the atomic selection. On Linux the CMA support is
linked together with the OPAL_ASSEMBLY_ARCH. If the BUILTIN atomics are
enabled, and we are on Linux and we need to define the CMA syscall#
(OMPI_BTL_SM_CMA_NEED_SYSCALL_DEFS), the opal/include/opal/sys/cma.h file is
un…
Philipp,
Thanks for providing the config file. Based on its content I was able to
replicate your issue. You just unearthed a huge bug in our atomic support and
handling, one that will take some time to get fixed completely. Meanwhile, I
have pushed a partial fix in the trunk (29915, 29916).