"I ran a similar test(test_slice_batchnorm) for 5K times and I couldn't
reproduce the issue."
One thing to keep in mind is that the SelectAlgo call will cache results in
a registry that is in static scope. To repro you'd likely have to create a
new process each time you run the test. (Apologies
For GPU, we don't run any tests in parallel.
-Marco
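A minimal repro loop along those lines, spawning a fresh process per run so
the static-scope SelectAlgo registry starts empty each time. This is only a
sketch: it assumes the test can be driven with pytest and lives in
tests/python/gpu/test_operator_gpu.py (adjust both to the actual runner and
path):

import subprocess
import sys

# Run the test in a brand-new process on every iteration so the
# static-scope algorithm registry is rebuilt from scratch each time.
for i in range(5000):
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-x",
         "tests/python/gpu/test_operator_gpu.py::test_slice_batchnorm"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)
    if result.returncode != 0:
        print("Failed on iteration %d" % i)
        print(result.stdout)
        print(result.stderr)
        break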
Naveen Swamy wrote on Thu., Oct 4, 2018, 19:54:
> Looking at the error raised, you can see that the workspace size (GPU memory
> size) of 1 GB isn't sufficient. I am wondering if it is due to tests running
> in parallel on CI, if this is
Looking at the error raised, you can see that the workspace size (GPU memory
size) of 1 GB isn't sufficient. I am wondering if it is due to tests running
in parallel on CI; if this is true (tests running in parallel), is it
possible to reduce the parallelism?
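If the limit in question is the per-operator cuDNN convolution workspace
(MXNet's Convolution operator defaults to 1024 MB), it can also be raised
per operator rather than only by freeing GPU memory. A minimal sketch, with
the 2048 value picked arbitrarily:

import mxnet as mx

# Sketch: raise the maximum temporary cuDNN workspace (in MB) for a
# convolution so algorithm selection has more room than the 1 GB default.
data = mx.sym.Variable("data")
conv = mx.sym.Convolution(data=data, kernel=(3, 3), num_filter=32,
                          workspace=2048)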
Error:
"mxnet.base.MXNetError: [05:40:12]
It seems this is not the only test:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12726/5/pipeline
test_slice_batchnorm_reshape_batchnorm is also failing and hasn't been
touched for a while. It doesn't look like a problem with the test to me
(not a flaky
I have created an issue at
https://github.com/apache/incubator-mxnet/issues/12715 and a PR to disable
the test at https://github.com/apache/incubator-mxnet/pull/12716.
This test is pretty new and was submitted with a number of other
problematic (and disabled) tests:
I could not reproduce the error on an EC2 g3x8 instance, which makes it hard
to debug. I also suspect it was due to a resource usage limit on the CI
instance.
On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy wrote:
> It doesn't look like flakiness to me at first sight. I think it might be
> related to
Hi
I saw this failure on CI:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
Have you seen other cases where we fail to select the best CUDNN algorithm?
In which circumstances could this happen, and do you think it is a good idea
to have
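One related knob when poking at algorithm selection locally is the
MXNET_CUDNN_AUTOTUNE_DEFAULT environment variable (0 = off, 1 = tune within
the workspace limit, 2 = fastest regardless of workspace). A minimal sketch
of disabling autotune for a debugging run:

import os

# Must be set before mxnet is imported; 0 turns cuDNN autotuning off
# entirely, taking the workspace-bounded algorithm search out of play.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

import mxnet as mx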