"I ran a similar test(test_slice_batchnorm) for 5K times and I couldn't
reproduce the issue."
One thing to keep in mind is that the SelectAlgo call will cache results in
a registry that is in static scope. To repro you'd likely have to create a
new process each time you run the test. (Apologies
For GPU, we don't run any tests in parallel.
-Marco
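A minimal repro loop along those lines, spawning a fresh process per run so
the static-scope SelectAlgo registry starts empty each time. This is only a
sketch: it assumes the test can be driven with pytest and lives in
tests/python/gpu/test_operator_gpu.py (adjust both to the actual runner and
path):

import subprocess
import sys

# Run the test in a brand-new process on every iteration so the
# static-scope algorithm registry is rebuilt from scratch each time.
for i in range(5000):
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "-x",
         "tests/python/gpu/test_operator_gpu.py::test_slice_batchnorm"],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True)
    if result.returncode != 0:
        print("Failed on iteration %d" % i)
        print(result.stdout)
        print(result.stderr)
        break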
Naveen Swamy wrote on Thu., Oct 4, 2018, 19:54:
> Looking at the error raised, you can see that the workspace size (GPU memory
> size) of 1 GB isn't sufficient. I am wondering if it is due to tests running
> in parallel on CI, if this is
Looking at the error raised, you can see that the workspace size (GPU memory
size) of 1 GB isn't sufficient. I am wondering if it is due to tests running
in parallel on CI; if this is true (tests running in parallel), is it
possible to reduce the parallelism?
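If the limit in question is the per-operator cuDNN convolution workspace
(MXNet's Convolution operator defaults to 1024 MB), it can also be raised
per operator rather than only by freeing GPU memory. A minimal sketch, with
the 2048 value picked arbitrarily:

import mxnet as mx

# Sketch: raise the maximum temporary cuDNN workspace (in MB) for a
# convolution so algorithm selection has more room than the 1 GB default.
data = mx.sym.Variable("data")
conv = mx.sym.Convolution(data=data, kernel=(3, 3), num_filter=32,
                          workspace=2048)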
Error:
"mxnet.base.MXNetError: [05:40:12]
It seems this is not the only test:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/PR-12726/5/pipeline
test_slice_batchnorm_reshape_batchnorm is also failing and hasn't been
touched for a while. It doesn't look like a problem with the test to me
(not a flaky
I have created an issue at
https://github.com/apache/incubator-mxnet/issues/12715 and a PR to disable
the test at https://github.com/apache/incubator-mxnet/pull/12716.
This test is pretty new and was submitted with a number of other
problematic (and disabled) tests:
I could not reproduce the error on an EC2 g3x8 instance, which makes it hard
to debug. I also suspect it was due to a resource usage limit on the CI
instance.
On Mon, Oct 1, 2018 at 10:40 PM Pedro Larroy wrote:
> It doesn't look like flakiness to me at first sight. I think it might be
> related to
Hi
I saw this failure on CI:
http://jenkins.mxnet-ci.amazon-ml.com/blue/organizations/jenkins/incubator-mxnet/detail/master/1697/pipeline
Have you seen other cases where we fail to select the best CUDNN algorithm?
In which circumstances could this happen, and do you think it is a good idea
to have
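One related knob when poking at algorithm selection locally is the
MXNET_CUDNN_AUTOTUNE_DEFAULT environment variable (0 = off, 1 = tune within
the workspace limit, 2 = fastest regardless of workspace). A minimal sketch
of disabling autotune for a debugging run:

import os

# Must be set before mxnet is imported; 0 turns cuDNN autotuning off
# entirely, taking the workspace-bounded algorithm search out of play.
os.environ["MXNET_CUDNN_AUTOTUNE_DEFAULT"] = "0"

import mxnet as mx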