QueensGambit commented on issue #8832: I didn't get a reasonable speed-up when 
applying depthwise convolution to VGG16
URL: 
https://github.com/apache/incubator-mxnet/issues/8832#issuecomment-513513890
 
 
   Hi @lawrencewxj @edmBernard 
   For a model architecture that heavily uses depthwise separable 
convolutions, I measured a 1.4x speed-up on GPU (mxnet-cu100 1.4.1, CUDA 10.0, 
cuDNN 7.5.1.10) but a 3x speed-up on CPU (mxnet-mkl 1.4.1).
   The main reason is that grouped convolutions cause memory fragmentation 
and have a high memory access cost relative to their arithmetic, which makes 
them a poor fit for GPUs.
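   To make the gap between theoretical and measured speed-up concrete, here is a 
rough FLOP comparison between a dense 3x3 convolution and a depthwise separable 
one (depthwise 3x3 followed by pointwise 1x1). The layer sizes are illustrative, 
not taken from the issue; the point is that the FLOP saving is close to 9x while 
the measured GPU speed-up above is only 1.4x, because the depthwise part is 
memory-bound rather than compute-bound.

```python
# Rough multiply-accumulate counts for a dense k x k convolution versus a
# depthwise separable convolution. Sizes below are illustrative assumptions.

def conv_flops(h, w, k, c_in, c_out):
    # dense convolution: every output channel sees every input channel
    return h * w * k * k * c_in * c_out

def depthwise_separable_flops(h, w, k, c_in, c_out):
    depthwise = h * w * k * k * c_in   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 convolution mixes channels
    return depthwise + pointwise

h, w, k, c_in, c_out = 56, 56, 3, 256, 256
dense = conv_flops(h, w, k, c_in, c_out)
separable = depthwise_separable_flops(h, w, k, c_in, c_out)
print(f"dense/separable FLOP ratio: {dense / separable:.1f}x")  # ~8.7x
```

The ratio simplifies to k^2 * c_out / (k^2 + c_out), so for 3x3 kernels it 
approaches 9x as the channel count grows, yet none of that accounts for memory 
traffic.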
   
   The following paper conducted an experiment on this (see Table 2).
   **ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design** 
(Ma et al., 2018)
   * https://arxiv.org/pdf/1807.11164.pdf
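The paper's guideline G2 ("excessive group convolution increases MAC") can be 
sketched numerically. For a 1x1 grouped convolution with spatial size hw, input 
channels c1, output channels c2, and groups g, the FLOPs are B = hw * c1 * c2 / g 
and the memory access cost is MAC = hw * (c1 + c2) + c1 * c2 / g. Holding B fixed 
and raising g lets c2 grow, so memory traffic rises even though compute does not. 
The concrete sizes below are my own assumptions, chosen only to show the trend:

```python
# Memory access cost (MAC) of a 1x1 grouped convolution at a fixed FLOP
# budget, following guideline G2 of ShuffleNet V2 (Ma et al., 2018).

def mac_grouped_1x1(hw, c1, g, flops):
    c2 = flops * g / (hw * c1)         # output channels allowed by the budget
    weights = c1 * c2 / g              # grouped 1x1 weight tensor size
    return hw * (c1 + c2) + weights    # read input + write output + read weights

hw, c1, budget = 56 * 56, 128, 128 * 1024 ** 2  # illustrative assumptions
for g in (1, 2, 4, 8):
    print(f"g={g}: MAC ~ {mac_grouped_1x1(hw, c1, g, budget):,.0f}")
```

MAC grows roughly linearly with g here, which matches the slowdown the paper 
reports in Table 2 for high group counts on GPU.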
   
   Recent cuDNN versions are improving the performance of grouped convolutions:
   * https://docs.nvidia.com/deeplearning/sdk/pdf/cuDNN-Release-Notes.pdf
   
   It would be worth verifying that MXNet makes use of these recent 
optimizations.
   
   Best,
   ~QueensGambit

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
