Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-18 Thread Sheng Zha
This vote has been closed. We will make another tag and start the vote again. -sz > On Jun 18, 2019, at 5:24 PM, Lin Yuan wrote: > > With the PR https://github.com/apache/incubator-mxnet/pull/15213 I could > verify that building Horovod is successful with MXNet built from source. So > I will remove

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-18 Thread Lin Yuan
With the PR https://github.com/apache/incubator-mxnet/pull/15213 I could verify that building Horovod is successful with MXNet built from source. So I will remove my previous -1 vote. Best, Lin On Tue, Jun 18, 2019 at 2:10 PM Junru Shao wrote: > Dear community, > > I am happy to share some res

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-18 Thread Junru Shao
Dear community, I am happy to share some results with regard to commit 83d2c2d0e (PR #14192, link: https://github.com/apache/incubator-mxnet/pull/14192), which Pedro mentioned as causing a regression. First, using the exact model that Pedro provided, we did rigorous profiling and found that the

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-13 Thread Pedro Larroy
I'll reach out to you in private; the model is not public. I think we should be able to see this problem in a public model using LSTM. On Thu, Jun 13, 2019 at 11:15 AM Junru Shao wrote: > > Hi Pedro, > > Thanks for bringing this up! > > Could you provide your model so that we can dig into this? > > Thanks,

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-13 Thread Junru Shao
Hi Pedro, Thanks for bringing this up! Could you provide your model so that we can dig into this? Thanks, Junru On Thu, Jun 13, 2019 at 10:33 Pedro Larroy wrote: > I have isolated some of the commits that are causing performance > regressions in WaveNet-like models: > > Title: 83d2c2d0e:[MXNET

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-13 Thread Pedro Larroy
I have isolated some of the commits that are causing performance regressions in WaveNet-like models: Title: 83d2c2d0e:[MXNET-1324] Add NaiveRunGraph to imperative utils (#14192) Causes a regression making hybridize with static slower using GPU inference. [0f63659be5070af218095a6a460427d2a1b67aba

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-12 Thread Lai Wei
Hi @dev, I am canceling the vote as the issue Lin discovered requires a fix [1] and the solution is not ready yet. It's a general problem when building from source with MXNet, not only impacting Horovod use cases. Any help is appreciated. Other issues we are tracking: 1. Regression on hybridize wi

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Pedro Larroy
Tested on CPU: 2.6x slower, comparing master vs 1.4.1. Looks like a general regression. On Tue, Jun 11, 2019 at 2:31 PM Lai Wei wrote: > > Hi guys, > > Thanks for the updates. Currently, we are able to confirm Lin's issue with > Horovod, and there is a fix pending. [1] > Will update later tod

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Aaron Markham
-1 There's an autogenerated file that doesn't get cleaned up in the scala-package folder when you run make clean. This causes the scaladoc step to fail. I'm putting in workaround messaging in the error message and that'll go into master, but if anyone wants to specifically run the scaladocs for 1.5

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Lai Wei
Hi guys, Thanks for the updates. Currently, we are able to confirm Lin's issue with Horovod, and there is a fix pending. [1] Will update later today to see if we need to cancel this vote for the fix. As for the hybridize with static alloc performance regression, IMO it does not need to be a block

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Zhi Zhang
On 2019/06/11 18:53:56, Pedro Larroy wrote: > The stack trace doesn't seem to come from MXNet, do you have more info? > > On Tue, Jun 11, 2019 at 11:46 AM Zhi Zhang wrote: > > > > > > > > On 2019/06/11 17:36:09, Pedro Larroy wrote: > > > A bit more background into this: > > > > > > While tu

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Pedro Larroy
Correction, I wanted to say: 1.5 is 33% faster than 1.4.1 when using hybridize without static_alloc and static_shape. We are claiming that static_alloc should improve speed and in this case it makes it worse. Is that a blocker for the release? Pedro. On Tue, Jun 11, 2019 at 10:36 AM Pedro Larro

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Pedro Larroy
The stack trace doesn't seem to come from MXNet, do you have more info? On Tue, Jun 11, 2019 at 11:46 AM Zhi Zhang wrote: > > > > On 2019/06/11 17:36:09, Pedro Larroy wrote: > > A bit more background into this: > > > > While tuning a model using LSTM and convolutions we find that using > > hybri

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Zhi Zhang
On 2019/06/11 17:36:09, Pedro Larroy wrote: > A bit more background into this: > > While tuning a model using LSTM and convolutions we find that using > hybridize with static_alloc and static_shape is 15% slower in the > latest revision vs in version 1.4.1 in which using hybridize with > stat

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Zhang Zhi
-1. Built from source; importing mxnet in Python causes a segfault. Backtrace: Thread 1 "python3" received signal SIGSEGV, Segmentation fault. 0x7fff3e8a9f20 in ?? () (gdb) bt #0 0x7fff3e8a9f20 in ?? () #1 0x7fffebbf440c in ReadConfigFile(Configuration&, std::__cxx11::basic_string, std::
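
A crash on import like the one reported above can be checked without taking down the calling interpreter. The following is only a minimal sketch, not something from the thread: the helper name `check_import` is hypothetical, it assumes a POSIX system (where a child killed by a signal reports a negative return code), and it uses only the Python standard library.

```python
import signal
import subprocess
import sys


def check_import(module_name, timeout=120):
    """Import `module_name` in a child interpreter and report how it exited.

    Hypothetical helper for illustration; assumes a POSIX system.
    """
    # -X faulthandler asks CPython to dump the Python-level traceback
    # to stderr if the process receives a fatal signal such as SIGSEGV.
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", f"import {module_name}"],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a negative return code means the child was killed by that signal.
    segfault = proc.returncode == -signal.SIGSEGV
    return {"ok": proc.returncode == 0, "segfault": segfault, "stderr": proc.stderr}
```

Calling `check_import("mxnet")` after a source build would then separate a clean import, an ordinary Python exception, and a segfault. Native frames such as the ones in the gdb backtrace above still need a debugger.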

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-11 Thread Pedro Larroy
A bit more background into this: While tuning a model using LSTM and convolutions we find that using hybridize with static_alloc and static_shape is 15% slower in the latest revision vs in version 1.4.1 in which using hybridize with static_alloc and static_shape is 10% faster than without. Overwa
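
A comparison of this kind could be reproduced along the following lines. This is a hedged sketch rather than the benchmark used in the thread: `mean_latency`, `percent_slower`, `compare_hybridize`, `make_net` and `make_input` are hypothetical names, and the model itself is not public. The only MXNet calls assumed are Gluon's `hybridize(static_alloc=..., static_shape=...)` and `mx.nd.waitall()`.

```python
import time


def mean_latency(fn, warmup=10, iters=100):
    """Average wall-clock seconds per call of `fn`, measured after warm-up calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


def percent_slower(candidate, baseline):
    """How much slower `candidate` is than `baseline`, in percent (negative means faster)."""
    return (candidate / baseline - 1.0) * 100.0


def compare_hybridize(make_net, make_input):
    """Time a Gluon HybridBlock with and without static_alloc/static_shape.

    `make_net` (returns an initialized HybridBlock) and `make_input`
    (returns an NDArray) are hypothetical factories supplied by the caller.
    """
    import mxnet as mx  # imported lazily so the helpers above work without MXNet

    results = {}
    for label, flags in [("plain", {}),
                         ("static", {"static_alloc": True, "static_shape": True})]:
        net = make_net()
        net.hybridize(**flags)
        x = make_input()

        def step():
            net(x)
            mx.nd.waitall()  # MXNet runs asynchronously; block until the work is done

        results[label] = mean_latency(step)
    return results
```

Running `compare_hybridize` once per installed MXNet version and passing the two "static" latencies to `percent_slower` gives a figure comparable to the percentages quoted above.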

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-10 Thread Pedro Larroy
-1 We found a performance regression vs 1.4 related to CachedOp which affects Hybrid forward, which we are looking into. Pedro. On Mon, Jun 10, 2019 at 4:33 PM Lin Yuan wrote: > > -1 (Tentatively until resolved) > > I tried to build MXNet 1.5.0 from source and pip install horovod but got > the

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-10 Thread Lin Yuan
-1 (Tentatively until resolved) I tried to build MXNet 1.5.0 from source and pip install horovod but got the following error: Reproduce: 1) cp make/config.mk . 2) turn on USE_CUDA, USE_CUDNN, USE_NCCL 3) make -j MXNet can build successfully. 4) pip install horovod /home/ubuntu/src/incubator-m
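
Step 2 of the reproduction above (turning on the build flags) can be scripted. The sketch below is an illustration only: it assumes `config.mk` holds plain `FLAG = value` assignments, and `enable_flags` is a hypothetical helper, not part of the MXNet build system.

```python
import re


def enable_flags(config_text, flags):
    """Return `config_text` with each `FLAG = value` assignment in `flags` set to 1.

    Hypothetical helper; assumes make-style `FLAG = value` lines.
    Flags that are not present are appended at the end.
    """
    for flag in flags:
        # Match only uncommented assignments of exactly this flag name,
        # so USE_CUDA does not touch USE_CUDA_PATH.
        pattern = re.compile(
            rf"^([ \t]*{re.escape(flag)}[ \t]*=[ \t]*).*$", re.MULTILINE
        )
        config_text, count = pattern.subn(r"\g<1>1", config_text)
        if count == 0:
            config_text += f"\n{flag} = 1\n"
    return config_text
```

Applied to the contents of the copied `config.mk` with `["USE_CUDA", "USE_CUDNN", "USE_NCCL"]`, the result can be written back before running `make -j`.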

Re: [VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-08 Thread shiwen hu
+1 On Sun, Jun 9, 2019 at 4:12 AM Lai Wei wrote: > Dear MXNet community, > > This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0. > Voting on dev@ will start June 8, 23:59:59(PST) and close on June 11, > 23:59:59. > > 1) Link to release notes: > https://cwiki.apache.org/confluence/displa

[VOTE] Release Apache MXNet (incubating) version 1.5.0.rc0

2019-06-08 Thread Lai Wei
Dear MXNet community, This is the 3-day vote to release Apache MXNet (incubating) version 1.5.0. Voting on dev@ will start June 8, 23:59:59(PST) and close on June 11, 23:59:59. 1) Link to release notes: https://cwiki.apache.org/confluence/display/MXNET/1.5.0+Release+Notes 2) Link to release can