Re: OMP

2019-06-28 Thread Pedro Larroy
ation, static construction and thread creation using pthread_atfork, I think we are mixing all those difficult and subtle actions with side effects, and we have the perfect storm. I think one of the possibilities is that two threads can initialize OpenMP at the same time, or at least if O
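The double-initialization hazard described above can be sketched in miniature. This is an illustrative sketch only, not MXNet's actual code: Python's `os.register_at_fork` is the analogue of `pthread_atfork`, and the `initialize` counter below is a stand-in for the OpenMP runtime's one-time setup.

```python
# Illustrative sketch (Linux): a fork handler that re-runs library
# initialization produces double initialization in the child, the same
# pattern as a pthread_atfork child handler that re-enters OMP init.
import os

init_count = 0

def initialize():
    global init_count
    init_count += 1  # stand-in for OpenMP runtime initialization

initialize()  # normal startup initialization

# Register a handler so the child re-initializes after fork.
os.register_at_fork(after_in_child=initialize)

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: the after_in_child handler has already run here.
    os.write(w, str(init_count).encode())
    os._exit(0)
else:
    os.close(w)
    child_count = int(os.read(r, 16).decode())
    os.waitpid(pid, 0)
    print("parent init_count:", init_count)   # 1
    print("child init_count:", child_count)   # 2
```

The point of the sketch: the parent initializes once, but the child sees a second initialization triggered purely by the fork handler, without any explicit call of its own.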

Re: OMP

2019-06-25 Thread Chris Olivier
1) I don't see how that code could cause reentrancy problems in omp. It doesn't make any OMP calls at all. Still doesn't look related to me. Setting an environment variable probably doesn't even do anything, because: a) It probably doesn't check the environment v

Re: OMP

2019-06-25 Thread Pedro Larroy
Nobody claimed that the original lockup has to do with OMP, but the fix caused re-entrancy into OMP initialization as explained below. So I agree with your statement that the bug that using pthread_atfork was fixing is not related with OMP, but the fix is causing interactions with OMP as described

Re: OMP

2019-06-25 Thread Chris Olivier
The call stacks there are mostly associated with the execution engine threads, which are not OMP threads. That lockup doesn't look to me to be related to OMP -- the execution engine uses its own thread pool logic -- I'm pretty familiar with that part of the code. Unless I am missing

Re: OMP

2019-06-25 Thread Chris Olivier
That doesn't look like it has anything to do with omp. On Mon, Jun 24, 2019 at 6:40 PM kellen sunderland < kellen.sunderl...@gmail.com> wrote: > I remember this hang as well, it was pretty hard to reproduce IIRC. I > believe the stacks for the hang are here: > https://

Re: OMP

2019-06-25 Thread Pedro Larroy
was forking and the engine was being re-entered. If that > >> situation isn't happening anymore, it might not be needed any longer. > >> I don't think we know why there was a fork inside CUDA, so > >> the code has grown around a fix for an issue whose r

Re: OMP

2019-06-24 Thread kellen sunderland
n isn't happening anymore, it might not be needed any longer. >> I don't think we know why there was a fork inside CUDA, so >> the code has grown around a fix for an issue whose root cause was >> not understood, and the side effects this fix caused aft

Re: OMP

2019-06-24 Thread kellen sunderland
root cause was > not understood, and the side effects this fix caused afterwards. > > My build uses MKL+LLVM OMP+DEBUG as seen in the container provided in > the link above, no libgomp. > > I didn't try the Make build. > > I would refactor the code linked above an

Re: OMP

2019-06-24 Thread Pedro Larroy
ation isn't happening anymore, it might not be needed any longer. I don't think we know why there was a fork inside CUDA, so the code has grown around a fix for an issue whose root cause was not understood, and the side effects this fix caused afterwards. My build uses MKL+LLVM

Re: OMP

2019-06-24 Thread Chris Olivier
one major advantage of intel/llvm omp is that it spawns a new thread pool after fork if a thread pool was already created. this is so that omp can be used in the forked processes. libgomp doesn’t do this so it’ll just lock up if you try to do omp in the forked process. is your build linking
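The libgomp failure mode Chris describes can be demonstrated without OpenMP at all. In this hedged sketch, a background thread stands in for an OpenMP worker holding runtime state: after `fork`, that thread no longer exists in the child, so a lock it held can never be released there. A blocking acquire would hang exactly as libgomp does, which is why Intel/LLVM OMP respawns its pool after fork instead.

```python
# Illustrative sketch (Linux): state held by a thread at fork time is
# inherited by the child in locked form, but the thread itself is not.
import os
import threading
import time

lock = threading.Lock()

def worker():
    # Simulates an OpenMP runtime whose internal lock is held when the
    # process forks.
    lock.acquire()
    time.sleep(2)
    lock.release()

t = threading.Thread(target=worker)
t.start()
time.sleep(0.2)  # ensure the worker holds the lock before forking

r, w = os.pipe()
pid = os.fork()
if pid == 0:
    # Child: the worker thread does not exist here, yet the lock it held
    # is still locked. A plain acquire() would hang forever, i.e. the
    # libgomp-style lockup; a timeout is used here so the demo terminates.
    acquired = lock.acquire(timeout=0.5)
    os.write(w, b"1" if acquired else b"0")
    os._exit(0)
else:
    os.close(w)
    child_got_lock = os.read(r, 1) == b"1"
    os.waitpid(pid, 0)
    t.join()
    print("child acquired lock:", child_got_lock)  # False: would deadlock
```

A runtime that re-creates its thread pool and resets its locks in an after-fork handler (as Intel/LLVM OMP reportedly does) avoids exactly this stranded-lock situation.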

Re: OMP

2019-06-24 Thread Pedro Larroy
nning unit tests. In my view, the root cause of the assertion is that we are re-entering OMP initialization when spawning threads in the following code through pthread_atfork https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L58 This causes double initialization of the

Re: OMP

2019-06-24 Thread Chris Olivier
, 2019 at 1:22 PM Pedro Larroy wrote: > Added a dockerfile, and reports of a crash in my local machine when > running MKL+OMP+DEBUG, with Anton's branch the crash happened as well. > I couldn't reproduce the crash on my EC2 machine: > Added the backtrace of the crash as well.

Re: OMP

2019-06-24 Thread Pedro Larroy
Added a dockerfile and a report of a crash on my local machine when running MKL+OMP+DEBUG; with Anton's branch the crash happened as well. I couldn't reproduce the crash on my EC2 machine. Added the backtrace of the crash as well. https://github.com/apache/incubator-mxnet/issues/10856

Re: OMP

2019-06-20 Thread Marco de Abreu
pinpoint the exact parts where people disagree or misunderstood each other. -Marco Pedro Larroy wrote on Thu., Jun 20, 2019, 21:47: > I can confirm that we are linking with two versions of omp, I'm > gaining more clarity into this topic, but I have still questions, the > facts

Re: OMP

2019-06-20 Thread Pedro Larroy
I can confirm that we are linking with two versions of omp. I'm gaining more clarity into this topic, but I still have questions; the facts that I have so far are the following: * #1: We are linking with two versions of omp, Intel's omp and LLVM OpenMP, when building with MKL enabled
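One way to check fact #1 empirically, sketched here for Linux only: scan `/proc/self/maps` for OpenMP runtime libraries mapped into the current process. The name patterns (`libgomp`, `libiomp5`, `libomp`) are assumptions about common runtime filenames; seeing more than one of them at once is the double-linking symptom described above.

```python
# Diagnostic sketch (Linux only): list the distinct OpenMP runtime
# shared objects mapped into this process.
import re

def loaded_omp_runtimes(maps_path="/proc/self/maps"):
    """Return the set of OpenMP runtime .so basenames found in the
    process memory map. More than one entry, e.g. both libgomp and
    libiomp5/libomp, signals that two OMP runtimes are loaded."""
    names = set()
    # Assumed common filenames for OMP runtimes; adjust as needed.
    pattern = re.compile(r"(libgomp|libiomp5|libomp)[^/\s]*\.so\S*")
    with open(maps_path) as f:
        for line in f:
            m = pattern.search(line)
            if m:
                names.add(m.group(0))
    return names

if __name__ == "__main__":
    # Empty set for a plain Python process; run inside an MXNet process
    # to see which runtimes it actually pulled in.
    print(loaded_omp_runtimes())
```

The same check can of course be done from a shell with `cat /proc/<pid>/maps | grep -E 'gomp|iomp|libomp'`, but doing it in-process catches libraries loaded lazily via `dlopen` as well.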

Re: OMP

2019-06-19 Thread kellen sunderland
"if you’re linking in two then you’re doing something wrong." Correct, that's one thing I believe we've got consensus on. So let's call that out as a bug to be fixed. Let's move forward with some reproducible numbers and then discuss the pros / cons of which pa

Re: OMP

2019-06-19 Thread Pedro Larroy
two then you’re doing something wrong. You can see by > my email yesterday that only one is linked in. This is also the case with > the mkl version built by the Makefile — only the Intel OMP library is used > (no libgomp). > > That being said, do you have clear evidence that using Int

Re: OMP

2019-06-19 Thread Chris Olivier
if you’re linking in two then you’re doing something wrong. You can see by my email yesterday that only one is linked in. This is also the case with the mkl version built by the Makefile — only the Intel OMP library is used (no libgomp). That being said, do you have clear evidence that using

Re: OMP

2019-06-19 Thread Pedro Larroy
> > Hi, Chris! > > > > Stas here - I've gathered that performance data. > > Sure thing, I can be wrong, but please elaborate a bit on what we are > > missing. > > Be assured, intentional misdirection was never a case. > > > > Thanks a lot for bei

Re: OMP

2019-06-19 Thread kellen sunderland
. > Sure thing, I can be wrong, but please elaborate a bit on what we are > missing. > Be assured, intentional misdirection was never a case. > > Thanks a lot for being constructive. > > > Turning Intel OMP on and off (and MKL as well, since it tends to pull in > omp, depend

Re: OMP

2019-06-19 Thread Tsukrov, Stanislav
Hi, Chris! Stas here - I've gathered that performance data. Sure thing, I can be wrong, but please elaborate a bit on what we are missing. Be assured, intentional misdirection was never the case. Thanks a lot for being constructive. > Turning Intel OMP on and off (and MKL as well, since

Re: OMP

2019-06-18 Thread Pedro Larroy
rs not really > understanding what they are building, or possibly intentional misdirection): > > Turning Intel OMP on and off (and MKL as well, since it tends to pull in > omp, depending which one is linked in). > There is a HUGE difference. This is consistent with my experience bef

Re: OMP

2019-06-18 Thread Per da Silva
; understanding what they are building, or possibly intentional > misdirection): > > Turning Intel OMP on and off (and MKL as well, since it tends to pull in > omp, depending which one is linked in). > There is a HUGE difference. This is consistent with my experience before > when it

OMP

2019-06-18 Thread Chris Olivier
Intel OMP on and off (and MKL as well, since it tends to pull in omp, depending which one is linked in). There is a HUGE difference. This is consistent with my experience before when it was added. default mnist: python ../example/image-classification/train_mnist.py INFO:root:start with arguments
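A minimal harness for the kind of on/off comparison described above might look like the following. The environment-variable names (`OMP_NUM_THREADS`, `MKL_DYNAMIC`) are real OpenMP/MKL knobs, but the settings chosen and the command run are hypothetical stand-ins; substitute the actual train_mnist.py invocation to reproduce a measurement like the one in this thread.

```python
# Sketch of a benchmark harness: time the same command under different
# OMP/MKL environment settings. Not the script used in the thread.
import os
import subprocess
import sys
import time

def timed_run(cmd, omp_env):
    """Run `cmd` with the given environment overrides applied on top of
    the current environment; return wall-clock seconds."""
    env = dict(os.environ, **omp_env)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start

# Example settings to compare (assumed, not taken from the thread):
settings = {
    "default": {},
    "single-thread": {"OMP_NUM_THREADS": "1"},
    "no-mkl-dynamic": {"MKL_DYNAMIC": "FALSE"},
}

if __name__ == "__main__":
    for name, omp_env in settings.items():
        # A trivial command keeps this sketch self-contained; replace it
        # with e.g. the train_mnist.py invocation to benchmark MXNet.
        t = timed_run([sys.executable, "-c", "pass"], omp_env)
        print(f"{name}: {t:.2f}s")
```

Running each configuration several times and comparing medians would make the "HUGE difference" claim above directly reproducible by anyone on the list.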