Hi,
We had to restart the master to mitigate an issue related to Jenkins slaves
being down.
You may have to retrigger some of your in-progress PRs. Apologies for the
inconvenience caused.
Anirudh
Hi,
I had to upgrade the CI to obtain some important security fixes:
https://jenkins.io/security/advisory/2020-01-29/ . You may have to
retrigger some of your in-progress PRs. Apologies for the inconvenience
caused.
Anirudh
Thanks for the thoughtful and valuable comments @arcadiaphy.
> I've deployed many models with the Scala API and run them in multiple threads.
> The whole system has run smoothly in a production environment for more than 2
> months.
> The backend of inference is graph executor, which is created for e
@ptrendx I am trying to open a PR by Friday. On the status: the two prerequisite
issues https://github.com/dmlc/dmlc-core/pull/573 and
https://github.com/apache/incubator-mxnet/issues/16434 have been better
understood and fixed/worked around. I have made C API and backend changes and
am currently still
Hi Akash,
Welcome to the project! https://mxnet.apache.org/community/contribute is a
good place to start.
Anirudh
On Fri, Oct 18, 2019 at 6:37 AM AKASH S M wrote:
> Hello,
> I'm Akash S M, an undergraduate from Indian Institute of
> Technology, Roorkee. I'd like to join the developer
Thanks @marcoabreu !
> Will the new C-API functions be thread-safe in general? That is, can I invoke
> them at any point in time from any thread without the need for a lock,
> sticky thread, or a thread hierarchy? (I'm thinking of the thread-safety being
> done at the backend level)
The issue I fo
Thanks to @nswamy for his inputs and design discussions related to this project
and @frankfliu for explaining the requirements and the use case from customer
perspective.
# Problem Statement
One of the big use cases not yet catered for in MXNet is loading a model and being
able to run parallel inference
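To make the use case concrete, here is a minimal sketch (illustrative only, not the proposed API) of what users want to be able to do safely: share one loaded Gluon model across several threads for inference.

```python
# Illustrative sketch of the desired usage, not the proposed API: one
# hybridized Gluon model shared by several Python threads for inference.
# Today the backend does not guarantee this pattern is safe.
import threading
import mxnet as mx
from mxnet.gluon.model_zoo import vision

net = vision.resnet18_v1(pretrained=True)
net.hybridize(static_alloc=True, static_shape=True)
# Warm-up call so the cached graph is built before the threads start.
net(mx.nd.zeros((1, 3, 224, 224))).wait_to_read()

def worker():
    data = mx.nd.random.uniform(shape=(1, 3, 224, 224))
    out = net(data)
    out.wait_to_read()  # block until the async engine finishes this batch

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```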
chaining back to the
> Python through normal signal channels. If I can get it to work I'll post a
> PR.
>
> On Mon, Sep 23, 2019 at 12:00 PM Anirudh Subramanian <
> anirudh2...@gmail.com>
> wrote:
>
> > Currently I don't see any special handling in the code base
Currently I don't see any special handling in the code base for this. We
have atexit.register, which invokes MXInvokeShutdown from Python, but that
doesn't work for signals.
Anirudh
On Sun, Sep 22, 2019 at 7:30 PM Chris Olivier wrote:
> question: how does gluon handle ctrl-c during a “long” imperative
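For context, a minimal sketch of the kind of signal handler that could complement the atexit path (this assumes MXNotifyShutdown is the C API shutdown entry point referred to above; the handler itself is hypothetical):

```python
# Minimal sketch, assuming the C API shutdown call is MXNotifyShutdown
# (the entry point referred to above); the handler name is illustrative.
import signal
import sys

from mxnet.base import _LIB, check_call

def _graceful_shutdown(signum, frame):
    # Mirror what atexit does on normal interpreter exit, then exit with
    # the conventional code for this signal.
    check_call(_LIB.MXNotifyShutdown())
    sys.exit(128 + signum)

signal.signal(signal.SIGINT, _graceful_shutdown)
signal.signal(signal.SIGTERM, _graceful_shutdown)
```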
+1
Build from source with cmake and ran unittest for gluon and amp.
Noticed that test_sync_batchnorm fails on p3.8xlarge (hidden by the CI
because it passes on machines with 1 or 2 GPUs).
I have opened an issue for the same:
https://github.com/apache/incubator-mxnet/issues/16214 though I think it's
no
+1
On Thu, Sep 12, 2019 at 1:15 PM Zach Kimberg
wrote:
> We had a discussion a while back about trying to improve the way we handle
> issues by assigning them to users who are working on them. However, the
> discussion ended because issues could only be assigned to those with write
> access (com
Hi Pedro,
I don't see anything "destructive" about Chris asking for justification for
your calling something "hacky". The only email in this thread where I see ad
hominems and disrespectful comments is your email.
On Sat, Sep 7, 2019, 10:18 PM Pedro Larroy
wrote:
> Apache mentors should have a lo
early so we can track and solve it
> in time rather than block the release during vote time.
>
> [1] https://travis-ci.org/awslabs/sockeye
>
>
> On Fri, Jun 21, 2019 at 7:01 AM Anirudh Subramanian
> wrote:
>
> > I was able to reproduce a crash
I was able to reproduce a crash with the commit
09202f7f261954383aa387144524d38f83f18d06 but not with the commit
a862270beb2d796c1ba311183f7f4a766a18ad6c.
Anirudh
On Thu, Jun 20, 2019 at 3:53 PM Lai Wei wrote:
> Hi Przemyslaw,
>
> Is there an issue with more details to track the problem?
>
>
>
+1, agree this should be done for both CUDA and CUDNN versions. At most CUDA
version N and CUDA version N - 1 should be supported in CI.
My question is what happens when we are at a position where we are on
CUDA version N and have removed support for CUDA version N - 1. Within a small
duration Nvidia
> HOWEVER, as I was writing this reply I realized that due to pure luck
this is not actually what happens - optimizers could in fact be in the
FP32_FUNCS list. That is because, as AMP's assumption is that the model
being changed is an FP32 model at the start, all weights (and so all the
gradients just
The assumption is that the AMP requirement is something that has a steep
learning curve. Developers may get confused by the name, but the question
the developer essentially has to answer is (and this can be added to the
error message):
1. If the operator can run in FP16 and FP32 modes, put it in
FP16_FP32_FUNCS
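To make rule 1 concrete, a hedged illustration (the list names follow the AMP lists module, but the ops shown are examples, not the authoritative classification):

```python
# Hedged illustration of rule 1; the ops shown are examples only.
FP16_FUNCS = ['Convolution', 'FullyConnected']  # benefit from FP16 (tensor cores)
FP32_FUNCS = ['softmax', 'log_softmax']         # numerically sensitive, keep in FP32
FP16_FP32_FUNCS = ['relu', 'Pooling']           # correct in either precision
```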
Hi,
I agree with Marco there are some easy wins to be had since many new GPU
operators come with FP16 support.
I think we can explore the overhead to the developer and try to reduce the
feedback time for developers, so
that the cost associated with adding support for the AMP feature is minimized.
Also
Hi all,
I had a discussion with Przemyslaw about this offline. There are two options
we can pursue to make the developer experience better (since currently
developers have to wait for CI to complete):
1. Obtain the current lists and check if the length of the combined lists
is the same as MXListAllOpNames, which
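A minimal sketch of what that check could look like (the lists module path is an assumption, and a real check would also include the conditional/cast lists):

```python
# Sketch of option 1: verify the union of the AMP lists covers every
# registered operator. The lists module path is an assumption.
import ctypes
from mxnet.base import _LIB, check_call, py_str
from mxnet.contrib.amp.lists import symbol as amp_lists

size = ctypes.c_uint()
names = ctypes.POINTER(ctypes.c_char_p)()
check_call(_LIB.MXListAllOpNames(ctypes.byref(size), ctypes.byref(names)))
all_ops = {py_str(names[i]) for i in range(size.value)}

covered = set(amp_lists.FP16_FUNCS) | set(amp_lists.FP32_FUNCS) \
          | set(amp_lists.FP16_FP32_FUNCS)
missing = all_ops - covered
assert not missing, "ops missing an AMP classification: %s" % sorted(missing)
```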
requests aiming for
> 1.5.0 that need attention.
> Please understand we already have around 650 commits in master that need
> to be released in time. We understand TensorRT test in CI is failing and
> are trying to fix it. Meanwhile please update the tracker if there is any
> cha
Hi Junru,
Overall, I appreciate the points you made about the proposal.
Having said that, I would like to remind everyone of the Apache Code of Conduct:
https://www.apache.org/foundation/policies/conduct.
"Be empathetic, welcoming, friendly and patient".
I find your tone condescending. Clearly you understa
Sent invite!
On Wed, May 8, 2019 at 6:43 AM Sem wrote:
> Requesting slack access
>
>
Hi Sheng,
I had a discussion with Nvidia folks offline today (@ptrendx et al.). I
strongly feel that the AMP feature should be included as part of the
release: https://github.com/apache/incubator-mxnet/pull/14173 .
The PR is aimed at completion next week, but reviews and RFC
discussions may take
whether cmake
> works on their side.
>
> Thanks,
> Junru
>
>
> On Fri, May 3, 2019 at 9:43 PM Anirudh Subramanian
> wrote:
>
> > Hi Junru,
> >
> > I am on v1.4.x, and my dmlc-core commit is this one:
> >
> >
> https://github.com/dmlc/dmlc
Also, could you check if you are testing on v1.4.x branch?
>
> Thanks,
> Junru
>
>
>
> On Fri, May 3, 2019 at 4:33 PM Anirudh Subramanian
> wrote:
>
> > -1 (binding)
> >
> > Is the cmake build failing for the 1.4.1 release tag? Is this a known
-1 (binding)
Is the cmake build failing for the 1.4.1 release tag? Is this a known
issue?
Did the following:
cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON
-DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DCUDNN_ROO
> mixed precision model we don't talk about training, and when talking about
> inference, INT8 quantization is not mentioned~
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, April 30, 2019 8:27 PM
> To: dev@mxnet.
passes.
Anirudh
On Mon, Apr 29, 2019 at 2:22 PM Anirudh Subramanian
wrote:
> Hi Zach,
>
> You raise an interesting point. Thank you for the pointer!
>
> Incorporating a CSE pass comes with its own cost, and the advantage it
> brings is to make the ReducePrecision nnvm pass more lightweight
y for other passes that
> could create duplicates or to remove duplicate expressions in general. This
> tutorial [2] talks about it a bit.
>
> Zach
>
> [1] - https://en.wikipedia.org/wiki/Common_subexpression_elimination
> [2] - https://blog.regehr.org/archives/1603
>
> On Mon,
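For readers following along, a toy before/after of the duplicate-cast case CSE would clean up (purely illustrative):

```python
# Toy illustration of common subexpression elimination (CSE) on symbols.
import mxnet as mx

x = mx.sym.Variable('x')

# Before CSE: the same cast subexpression is built twice.
a = mx.sym.cast(x, dtype='float16')
b = mx.sym.cast(x, dtype='float16')  # duplicate of `a`
y_before = a * b

# After CSE: the duplicate is computed once and reused.
t = mx.sym.cast(x, dtype='float16')
y_after = t * t
```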
support the lower
> > precision the previous one used?
> > - what will be saved in the final symbol.json and params file when
> > training is finished?
> > - more generally, what will be saved when users want to serialize
> > their model to disk?
> >
> > Th
params file when
> training is finished?
> - more generally, what will be saved when users want to serialize their
> model to disk?
>
> Thank you,
> -tao
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Monday, April 29, 2019
Hi all,
I have created a doc for conversion from FP32 to Mixed Precision Models:
https://cwiki.apache.org/confluence/display/MXNET/Conversion+from+FP32+to+Mixed+Precision+Models
I look forward to your feedback on the same.
Thanks,
Anirudh
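For a quick taste of the flow the doc describes, a hedged sketch (argument names follow my reading of the proposal and may differ in the final API):

```python
# Hedged sketch of FP32 -> mixed precision conversion; checkpoint prefix
# 'model' is a placeholder, and argument names may differ in the final API.
import mxnet as mx
from mxnet.contrib import amp

sym, arg_params, aux_params = mx.model.load_checkpoint('model', 0)
amp_sym, amp_args, amp_auxs = amp.convert_model(
    sym, arg_params, aux_params, target_dtype='float16')
mx.model.save_checkpoint('model-amp', 0, amp_sym, amp_args, amp_auxs)
```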
Hi,
Please join me to welcome Wang Jiajun (https://github.com/arcadiaphy) as a
new committer of Apache (incubating) MXNet!
Wang has been solving some tough bugs with respect to memory leaks, process
fork handling, dependency engine issues and custom op exception handling.
Issue Involvement:
http
> If there is a use-case where people can not even use our C++ package,
> then
> > we could have discussions about introducing a user-facing C-API, but
> right
> > now this approach to interface with our C-API (although I know that
> people
> > use it) seem a bit like u
ce a lot of duplicate
> code though.
>
> On Thu, Apr 11, 2019 at 8:50 AM Anirudh Subramanian
> wrote:
>
> > I was under the impression that C API does fall under semver. Has this
> been
> > discussed somewhere before? Is this also the case for C Predict API?
I was under the impression that the C API does fall under semver. Has this been
discussed somewhere before? Is this also the case for the C Predict API?
On Thu, Apr 11, 2019, 8:08 AM Marco de Abreu
wrote:
> In case only changes to the c-api are being made, it doesn't fall under our
> semantic versioning
Hi all,
Please join me to welcome Alex Zai as a new committer of Apache
(incubating) MXNet!
Alex has been instrumental in bringing MKLDNN from experimental to making it
the default on MXNet master. This involved adding Python and C++ unit tests,
improving CI coverage for MKLDNN, and testing MKLDNN on diff
Hi all,
Please join me to welcome Patric Zhao as a new committer of Apache
(incubating) MXNet!
Patric has put in great effort around MKLDNN integration into MXNet and has
been involved in features like quantization, graph fusion and fused RNN
operators for CPU.
Dev List activity:
https://lists.a
-0
Thanks Steffen for your release efforts!
Build from source works with make but fails with cmake for me.
cd build && cmake VERBOSE=1 -DUSE_CUDA=ON -DUSE_CUDNN=ON -DUSE_OPENMP=ON
-DCMAKE_BUILD_TYPE=Debug -DUSE_DIST_KVSTORE=0 -DUSE_OPENCV=1 -GNinja .. &&
ninja -v
FAILED: : && /usr/bin/c++ -
respect to
> backward compatibility, interface changes, testing, etc.
>
> (Lin) This is definitely an informative discussion. It would be better if
> we can put this in a more noticeable place for developers.
>
>
> On Thu, Dec 20, 2018 at 1:39 PM Anirudh Subramanian
> wrote:
1) Which guideline should we follow when updating the UI in MXNet operators?
A) MXNet follows semantic versioning, so breaking changes to operator
interfaces can be introduced only in major versions.
2) Who should approve the UI change?
A) Contributors who may have worked on the operator and/or ot
Hi Steffen,
I have created a PR to cherry pick the change to v1.4.x branch:
https://github.com/apache/incubator-mxnet/pull/13517
Anirudh
On Mon, Dec 3, 2018 at 11:29 AM Steffen Rochel
wrote:
> Thanks Haibin. Anirudh - please add PR for v1.4.x for
> https://github.com/apache/incubator-mxnet/pul
Support for instruction set extensions like AVX2, AVX512, etc. can vary between
AMD and Intel, and there can also be a time lag between when Intel supports
one versus when AMD supports it.
Also, in the future this setup may be useful in case MXNet supports AMD
GPUs and AWS also happens to have support for
+1
On Thu, Nov 29, 2018 at 2:38 PM Alex Zai wrote:
> What are people's thoughts on having AMD machines tested on the CI? AMD
> machines are now available on AWS.
>
> Best,
> Alex
>
cannot find any evidence about a patch release.
>
> -tao
>
> -Original Message-
> From: Anirudh Subramanian [mailto:anirudh2...@gmail.com]
> Sent: Tuesday, November 27, 2018 6:16 AM
> To: dev@mxnet.incubator.apache.org
> Subject: Re: Include MKLDNN into default mxnet pip pa
Hi Tao,
I agree with Steffen that we can start with a stable release for MKLDNN for
1.4.0. For your suggestion on using 0.17, can you provide info on what
versioning mechanism MKLDNN uses? Once a MKLDNN release is out and there
are some regressions found, like the LSTM regression, would it be possible
Thanks for the quick response and mitigation!
On Wed, Nov 21, 2018 at 3:55 PM Marco de Abreu
wrote:
> Hello,
>
> today, CI had some issues and I had to cancel all jobs a few minutes ago.
> This was basically caused by the high load that is currently being put on
> our CI system due to the pre-release
Hello all,
The Apache MXNet (incubating) Community announces the availability of
Apache MXNet (incubating) 1.2.1!
Apache MXNet (incubating) is a deep learning framework designed for
both efficiency and flexibility. It allows you to mix symbolic and
imperative programming to maximize efficiency and productivity.