perdasilva opened a new issue #14502: [Test Failure] GPU Test failures across different CUDA versions
URL: https://github.com/apache/incubator-mxnet/issues/14502
 
 
   ## Description
   I am testing the mxnet library, compiled for the Python distribution, against different versions of CUDA.
   
   I'm getting the same strange failure on every CUDA version. The tests are run on a g3.8xlarge instance, inside a Docker container based on the nvidia/cuda:XXX-cudnn7-devel-ubuntu16.04 image (where XXX is the particular CUDA version).
   
   ```
   ======================================================================
   ERROR: test_gluon_gpu.test_lstmp
   ----------------------------------------------------------------------
   Traceback (most recent call last):
     File "/usr/local/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
       self.test(*self.arg)
     File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 177, in test_new
       orig_test(*args, **kwargs)
     File "/work/mxnet/tests/python/gpu/../unittest/common.py", line 110, in test_new
       orig_test(*args, **kwargs)
     File "/work/mxnet/tests/python/gpu/test_gluon_gpu.py", line 124, in test_lstmp
       check_rnn_layer_forward(gluon.rnn.LSTM(10, 2, projection_size=5), mx.nd.ones((8, 3, 20)))
     File "/work/mxnet/tests/python/gpu/../unittest/test_gluon_rnn.py", line 441, in check_rnn_layer_forward
       out = layer(inputs)
     File "/work/mxnet/python/mxnet/gluon/block.py", line 540, in __call__
       out = self.forward(*args)
     File "/work/mxnet/python/mxnet/gluon/block.py", line 917, in forward
       return self.hybrid_forward(ndarray, x, *args, **params)
     File "/work/mxnet/python/mxnet/gluon/rnn/rnn_layer.py", line 239, in hybrid_forward
       out = self._forward_kernel(F, inputs, states, **kwargs)
     File "/work/mxnet/python/mxnet/gluon/rnn/rnn_layer.py", line 270, in _forward_kernel
       lstm_state_clip_nan=self._lstm_state_clip_nan)
     File "<string>", line 145, in RNN
     File "/work/mxnet/python/mxnet/_ctypes/ndarray.py", line 92, in _imperative_invoke
       ctypes.byref(out_stypes)))
     File "/work/mxnet/python/mxnet/base.py", line 252, in check_call
       raise MXNetError(py_str(_LIB.MXGetLastError()))
   MXNetError: [14:22:01] src/operator/./rnn-inl.h:385: hidden layer projection is only supported for GPU with CuDNN later than 7.1.1
   
   Stack trace returned 10 entries:
   [bt] (0) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x42c70a) [0x7fcb9e25170a]
   [bt] (1) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x42cd31) [0x7fcb9e251d31]
   [bt] (2) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x3495ea8) [0x7fcba12baea8]
   [bt] (3) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x349612e) [0x7fcba12bb12e]
   [bt] (4) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x30ea87f) [0x7fcba0f0f87f]
   [bt] (5) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x75f9c5) [0x7fcb9e5849c5]
   [bt] (6) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::Imperative::InvokeOp(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::OpReqType, std::allocator<mxnet::OpReqType> > const&, mxnet::DispatchMode, mxnet::OpStatePtr)+0xb35) [0x7fcba0cdcc45]
   [bt] (7) /work/mxnet/python/mxnet/../../lib/libmxnet.so(mxnet::Imperative::Invoke(mxnet::Context const&, nnvm::NodeAttrs const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&, std::vector<mxnet::NDArray*, std::allocator<mxnet::NDArray*> > const&)+0x38c) [0x7fcba0cdd1cc]
   [bt] (8) /work/mxnet/python/mxnet/../../lib/libmxnet.so(+0x2db2d09) [0x7fcba0bd7d09]
   [bt] (9) /work/mxnet/python/mxnet/../../lib/libmxnet.so(MXImperativeInvokeEx+0x6f) [0x7fcba0bd82ff]
   
   
   -------------------- >> begin captured stdout << ---------------------
   checking gradient for lstm0_l0_h2h_bias
   checking gradient for lstm0_l0_h2h_weight
   checking gradient for lstm0_l0_i2h_weight
   checking gradient for lstm0_l0_i2h_bias
   checking gradient for lstm0_l0_h2r_weight
   
   --------------------- >> end captured stdout << ----------------------
   -------------------- >> begin captured logging << --------------------
   common: INFO: Setting test np/mx/python random seeds, use MXNET_TEST_SEED=1414687138 to reproduce.
   --------------------- >> end captured logging << ---------------------
   ```
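   
   For reference, here is a minimal standalone sketch of the failing call, reconstructed from the traceback above (the LSTM arguments and input shape are taken from `test_lstmp`; running it directly outside the nose harness is my assumption, not part of the original test):
   
   ```python
   # Minimal repro sketch reconstructed from the traceback: the failure is hit in
   # the LSTM projection path (projection_size), which dispatches to the CuDNN RNN
   # operator on GPU. Assumes a GPU context is available.
   import mxnet as mx
   from mxnet import gluon
   
   layer = gluon.rnn.LSTM(10, 2, projection_size=5)
   layer.initialize(ctx=mx.gpu(0))
   
   inputs = mx.nd.ones((8, 3, 20), ctx=mx.gpu(0))
   out = layer(inputs)  # raises MXNetError when the build rejects projection on CuDNN <= 7.1.1
   mx.nd.waitall()      # block on async execution so any deferred error also surfaces
   print(out.shape)
   ```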
   
   I have not yet tried to reproduce it separately, outside of Docker, on a GPU machine using the current 1.4.0 pip package.
   
   I find it strange that the PR builds aren't breaking, since they seem to be based on the same Docker image I'm using and run on the same instance type.
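   
   Since the error is about the CuDNN version, one way I can narrow this down is to check which CuDNN the container actually provides at runtime, e.g. with the sketch below (assuming `libcudnn.so.7` is on the loader path, as in the nvidia/cuda `*-cudnn7-devel` images). This only shows what the container ships; whether it matches the CuDNN headers the wheel was compiled against is a separate question.
   
   ```python
   # Query the runtime CuDNN version through its C API. cudnnGetVersion() returns
   # major*1000 + minor*100 + patch, e.g. 7301 for CuDNN 7.3.1.
   import ctypes
   
   libcudnn = ctypes.CDLL("libcudnn.so.7")  # assumes the cudnn7 runtime library is installed
   libcudnn.cudnnGetVersion.restype = ctypes.c_size_t
   version = libcudnn.cudnnGetVersion()
   print("CuDNN runtime version:", version)  # the error above requires something newer than 7101 (7.1.1)
   ```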
