ation, static construction and thread
creation using pthread_atfork: I think we are mixing all those
difficult and subtle actions with side effects, and we have the perfect
storm.
I think one of the possibilities is that two threads can initialize
OpenMP at the same time, or at least if O
1) I don't see how that code could cause reentrancy problems in omp. It
doesn't make any OMP calls at all. Still doesn't look related to me.
Setting an environment variable probably doesn't even do anything, because:
a) It probably doesn't check the environment variable
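A standalone sketch of that point (not MXNet code; assumes a POSIX system
and a compiler with -fopenmp): once the runtime has initialized, changing
OMP_NUM_THREADS inside the running process doesn't change anything.

    // OMP_NUM_THREADS is normally read once, when the OpenMP runtime
    // initializes, so a later setenv() has no effect on this process.
    #include <cstdio>
    #include <cstdlib>
    #include <omp.h>

    int main() {
      // The first OpenMP call initializes the runtime and latches the
      // settings that were in the environment at that moment.
      std::printf("before setenv: %d\n", omp_get_max_threads());

      setenv("OMP_NUM_THREADS", "1", 1);  // too late, runtime is already up

      std::printf("after setenv:  %d\n", omp_get_max_threads());
      return 0;
    }

With libgomp and LLVM/Intel OMP both prints should show the same value,
which is consistent with point a) above.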
Nobody claimed that the original lockup has to do with OMP, but the
fix caused re-entrancy into OMP initialization as explained below. So
I agree with your statement that the bug which pthread_atfork was
fixing is not related to OMP, but the fix is causing interactions
with OMP as described
The call stacks there are mostly associated with the execution engine
threads, which are not OMP threads. That lockup doesn't look to me to be
related to OMP -- the execution engine uses its own thread pool logic --
I'm pretty familiar with that part of the code. Unless I am missing
That doesn't look like it has anything to do with omp
On Mon, Jun 24, 2019 at 6:40 PM kellen sunderland <
kellen.sunderl...@gmail.com> wrote:
> I remember this hang as well, it was pretty hard to reproduce IIRC. I
> believe the stacks for the hang are here:
> https://
> >> was forking and the engine was being re-entered. If that
> >> situation is not happening anymore it might not be needed any longer.
> >> I don't think we know why there was a fork inside cuda, so
> >> the code has grown around a fix for an issue whose root cause was
> >> not understood, and the side effects which this fix caused afterwards.
>
> My build uses MKL+LLVM OMP+DEBUG as seen in the container provided in
> the link above, no libgomp.
>
> I didn't try the Make build.
>
> I would refactor the code linked above an
ation is not happening anymore it might not be needed any longer.
I don't think we know why there was a fork inside cuda, so
the code has grown around a fix for an issue whose root cause was
not understood, and around the side effects which this fix caused afterwards.
My build uses MKL+LLVM
one major advantage of intel/llvm omp is that it spawns a new thread pool
after fork if a thread pool was already created. this is so that omp can be
used in the forked processes. libgomp doesn’t do this so it’ll just lock up
if you try to do omp in the forked process.
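A minimal repro of that difference (a sketch under assumptions: Linux, a
single OpenMP runtime linked in via -fopenmp; not MXNet code):

    #include <cstdio>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <omp.h>

    int main() {
      // Make the parent create its OpenMP worker pool before forking.
      #pragma omp parallel
      { }

      pid_t pid = fork();
      if (pid == 0) {
        // Child: the pre-fork workers don't exist in this process.
        // LLVM/Intel OMP re-creates the pool, so this region runs;
        // with libgomp the child can hang right here.
        #pragma omp parallel
        {
          #pragma omp single
          std::printf("child ran with %d threads\n", omp_get_num_threads());
        }
        _exit(0);
      }
      waitpid(pid, nullptr, 0);
      return 0;
    }

Whether the child runs the region or locks up is exactly the runtime
difference described above.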
is your build linking
nning unit tests.
In my view, the root cause of the assertion is that we are re-entering
OMP initialization when spawning threads in the following code through
pthread_atfork:
https://github.com/apache/incubator-mxnet/blob/master/src/initialize.cc#L58
This causes double initialization of the
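A stripped-down sketch of the pattern being described (hypothetical code,
not the actual initialize.cc logic): a pthread_atfork child handler that
spawns a thread which immediately touches OpenMP, so the child can end up
(re)initializing OMP while fork handling is still in flight.

    #include <pthread.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <thread>
    #include <omp.h>

    // Child-side fork handler, analogous to re-creating engine threads.
    static void AfterForkChild() {
      std::thread worker([] {
        // First OpenMP use in the child initializes the runtime while the
        // fork handlers are still running, which is the re-entrancy above.
        #pragma omp parallel
        { }
      });
      worker.join();
    }

    int main() {
      pthread_atfork(nullptr, nullptr, AfterForkChild);

      #pragma omp parallel
      { }                        // initialize OpenMP in the parent

      pid_t pid = fork();        // AfterForkChild runs in the child
      if (pid == 0) _exit(0);
      waitpid(pid, nullptr, 0);
      return 0;
    }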
Added a dockerfile, and reports of a crash on my local machine when
running MKL+OMP+DEBUG; with Anton's branch the crash happened as well.
I couldn't reproduce the crash on my EC2 machine.
Added the backtrace of the crash as well.
https://github.com/apache/incubator-mxnet/issues/10856
pinpoint the exact parts where people disagree or misunderstood each
other.
-Marco
Pedro Larroy wrote on Thu, 20 June 2019 at
21:47:
I can confirm that we are linking with two versions of omp. I'm
gaining more clarity into this topic, but I still have questions; the
facts that I have so far are the following:
* #1: We are linking with two versions of omp, Intel's OMP and LLVM
OpenMP, when building with MKL enabled
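To check which runtimes actually end up loaded, running ldd on libmxnet.so
works, and something along these lines can be dropped into a test program
(a Linux-only sketch; the library names listed are the usual suspects):

    #include <fstream>
    #include <iostream>
    #include <set>
    #include <string>

    // Scan /proc/self/maps for OpenMP runtimes mapped into this process.
    // Seeing more than one of these at once is the double-linking above.
    int main() {
      std::ifstream maps("/proc/self/maps");
      std::set<std::string> found;
      std::string line;
      while (std::getline(maps, line)) {
        for (const char* name : {"libgomp", "libiomp5", "libomp"}) {
          if (line.find(name) != std::string::npos) found.insert(name);
        }
      }
      for (const std::string& name : found) std::cout << name << "\n";
      return 0;
    }

Note it only reports what is mapped into the process running it, so it has
to be linked into (or dlopen the libraries of) the binary under test.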
"if you’re linking in two then you’re doing something wrong." Correct,
that's one thing I believe we've got consensus on. So let's call that out
as a bug to be fixed.
Let's move forward with some reproducible numbers and then discuss the pros
/ cons of which pa
if you’re linking in two then you’re doing something wrong. You can see by
my email yesterday that only one is linked in. This is also the case with
the mkl version built by the Makefile — only the Intel OMP library is used
(no libgomp).
That being said, do you have clear evidence that using Int
Hi, Chris!
Stas here - I've gathered that performance data.
Sure thing, I can be wrong, but please elaborate a bit on what we are missing.
Be assured, intentional misdirection was never the case.
Thanks a lot for being constructive.
> Turning Intel OMP on and off (and MKL as well, since
rs not really understanding what they are building, or possibly
intentional misdirection):
Turning Intel OMP on and off (and MKL as well, since it tends to pull in
omp, depending which one is linked in).
There is a HUGE difference. This is consistent with my experience before
when it was added.
default mnist:
python ../example/image-classification/train_mnist.py
INFO:root:start with arguments