arcadiaphy opened a new issue #14057: validation stucks when training gluoncv 
ssd model
URL: https://github.com/apache/incubator-mxnet/issues/14057
 
 
   ## Description
   When training the gluoncv SSD model, validation sometimes takes much longer 
than a training epoch. Debugging shows that the `box_nms` operator accounts 
for most of that time.
   
   ## Environment info (Required)
   
   ```
       Centos 7
       CUDA: 9.0
       cudnn: 7 
       mxnet: 1.4.0.rc2
       gluon-cv: latest
   
   ```
   
   ## Minimum reproducible example
   The following snippet shows that `box_nms` takes a very long time when 
processing a large number of prior boxes:
   ```
   import mxnet as mx
   import numpy as np
   
   np.random.seed(0)
   
   batch_size = 32
   prior_number = 100000
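   # Each box is laid out as [class_id, score, xmin, ymin, xmax, ymax]
   data = np.zeros((batch_size, prior_number, 6))
   # class_id is drawn from {-1, 0}; -1 is treated as background here
   data[:, :, 0] = np.random.randint(-1, 1, (batch_size, prior_number))
   # score is uniform in [0, 1)
   data[:, :, 1] = np.random.random((batch_size, prior_number))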
   
   xmin = np.random.random((batch_size, prior_number))
   ymin = np.random.random((batch_size, prior_number))
   width = np.random.random((batch_size, prior_number))
   height = np.random.random((batch_size, prior_number))
   data[:, :, 2] = xmin
   data[:, :, 3] = ymin
   data[:, :, 4] = xmin + width
   data[:, :, 5] = ymin + height
   
   mx_data = mx.nd.array(data, ctx=mx.gpu(0))
   rv = mx.nd.contrib.box_nms(mx_data, overlap_thresh=0.5, valid_thresh=0.01,
                              topk=400, score_index=1, id_index=0)
   mx.nd.waitall()
   
   ```
   
   ## What I have found out
   1. The GPU stable sort in the `SortByKey` function slows down sharply as the 
sort length grows.
   2. The `box_nms` operator does not remove background boxes when filtering 
valid boxes, which leads to a large sort length.
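
To make the second point concrete, here is a minimal NumPy sketch (not the operator's actual code) that counts how many boxes per image would reach the sort. It assumes that the valid-box filter keeps boxes whose score is greater than `valid_thresh` and that a negative class id marks background; the helper name `sort_lengths` is hypothetical.

```python
import numpy as np

def sort_lengths(data, valid_thresh=0.01, id_index=0, score_index=1):
    """Count sort candidates per image, without and with a background filter.

    data has shape (batch, boxes, 6) in the same layout as the example above.
    Assumption: a negative class id marks a background box.
    """
    scores = data[:, :, score_index]
    ids = data[:, :, id_index]
    score_ok = scores > valid_thresh   # score filter only
    foreground = ids >= 0              # extra background filter
    return score_ok.sum(axis=1), (score_ok & foreground).sum(axis=1)

rng = np.random.default_rng(0)
batch_size, prior_number = 4, 10000
data = np.zeros((batch_size, prior_number, 6))
data[:, :, 0] = rng.integers(-1, 1, (batch_size, prior_number))  # ids in {-1, 0}
data[:, :, 1] = rng.random((batch_size, prior_number))           # scores in [0, 1)

score_only, with_background_filter = sort_lengths(data)
print(score_only, with_background_filter)
```

With ids drawn evenly from {-1, 0}, the background filter roughly halves the number of candidates; in a real SSD validation run, where most boxes are background, the reduction would be far larger.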
   
   ## What I have done
   1. Added the SORT_WITH_THRUST compile definition in the Makefile: the 
validation process is still very slow.
   2. Added background box filtering to `box_nms`: validation speeds up 
dramatically, since most boxes are classified as background.
   
   I will open a PR for the second solution.
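
The idea behind the second solution can be sketched in NumPy for a single image: drop background boxes together with low-score boxes before sorting, so that the sort only sees foreground candidates. This is an illustration under stated assumptions, not the code of the PR; the function name `select_candidates` and the `background_id` parameter are hypothetical, and the overlap suppression step itself is left out.

```python
import numpy as np

def select_candidates(boxes, valid_thresh=0.01, topk=400,
                      id_index=0, score_index=1, background_id=-1):
    """Return indices of up to topk foreground boxes, highest score first.

    boxes has shape (N, 6): [class_id, score, xmin, ymin, xmax, ymax].
    background_id is a hypothetical parameter naming the background class.
    """
    scores = boxes[:, score_index]
    keep = scores > valid_thresh                   # drop low-score boxes
    keep &= boxes[:, id_index] != background_id    # drop background boxes
    idx = np.flatnonzero(keep)
    # Only the surviving boxes are sorted, so the sort length shrinks.
    order = np.argsort(-scores[idx], kind="stable")
    return idx[order[:topk]]

boxes = np.array([
    [ 0.0, 0.900, 0.1, 0.1, 0.5, 0.5],  # foreground, high score
    [-1.0, 0.950, 0.2, 0.2, 0.6, 0.6],  # background, filtered out
    [ 0.0, 0.005, 0.3, 0.3, 0.7, 0.7],  # foreground, below valid_thresh
    [ 0.0, 0.500, 0.4, 0.4, 0.8, 0.8],  # foreground, medium score
])
selected = select_candidates(boxes)
print(selected)  # indices of boxes 0 and 3, in that order
```

Overlap suppression would then run only on the boxes in `selected`.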
