xrmzju opened a new issue #14516: can't achieve linear speedup with multiple GPUs
URL: https://github.com/apache/incubator-mxnet/issues/14516

## Description

I'm trying to run the ImageNet benchmark with ResNet-18 on CIFAR-10 (using image shape 3,32,32 instead); my results are below. I'm wondering:

1) Why is the speedup not linear?
2) Why does the speedup ratio increase with the batch size?
3) What factors affect the training speed?

| batch size | kv_store | dtype | num of gpus | speed (samples/sec) | speedup ratio |
| -- | -- | -- | -- | -- | -- |
| 256 | device | float16 | 1 | 6982.74 | 1 |
| | | | 2 | 9908.78 | 0.709519472 |
| | | | 4 | 9128.08 | 0.460605645 |
| | | float32 | 1 | 4192.01 | 1 |
| | | | 2 | 6686.33 | 0.797508832 |
| | | | 4 | 7094.3 | 0.530507767 |
| | local | float16 | 1 | 6866.89 | |
| | | | 2 | 7995.19 | 0.582155095 |
| | | | 4 | 5417.86 | 0.197245769 |
| | | float32 | 1 | 4216.72 | 1 |
| | | | 2 | 6814.4 | 0.808021401 |
| | | | 4 | 6005.8 | 0.356070595 |
| 512 | device | float16 | 1 | 7827.6 | 1 |
| | | | 2 | 13258 | 0.84687516 |
| | | | 4 | 13203 | 0.421680975 |
| | | float32 | 1 | 4274.25 | 1 |
| | | | 2 | 8046.32 | 0.941255191 |
| | | | 4 | 9978.2 | 0.583622858 |
| | local | float16 | 1 | 7685.1 | 1 |
| | | | 2 | 12535.5 | 0.815571691 |
| | | | 4 | 10960.6 | 0.356553591 |
| | | float32 | 1 | 4220.42 | 1 |
| | | | 2 | 8058.09 | 0.954654987 |
| | | | 4 | 10418.9 | 0.617171988 |
| 1024 | device | float16 | 1 | 5465.05 | 1 |
| | | | 2 | 15415.4 | 1.410362211 |
| | | | 4 | 21156.9 | 0.967827376 |
| | | float32 | 1 | 3835.76 | 1 |
| | | | 2 | 8428.88 | 1.410362211 |
| | | | 4 | 12995.3 | 0.967827376 |
| | local | float16 | 1 | 5473.84 | 1 |
| | | | 2 | 15174.3 | 1.386074492 |
| | | | 4 | 18561.5 | 0.847736689 |
| | | float32 | 1 | 3830.2 | 1 |
| | | | 2 | 8426.37 | 1.099990862 |
| | | | 4 | 14106.9 | 0.920767845 |
| 2048 | device | float16 | 1 | 4515.36 | 1 |
| | | | 2 | 10938 | 1.211199107 |
| | | | 4 | 28013.4 | 1.5510059 |
| | | float32 | 1 | 3387.51 | 1 |
| | | | 2 | 7633.34 | 1.211199107 |
| | | | 4 | 15375.4 | 1.5510059 |
| | local | float16 | 1 | 4494.73 | 1 |
| | | | 2 | 10821.7 | 1.203820919 |
| | | | 4 | 25833.7 | 1.436888311 |
| | | float32 | 1 | 3382.08 | 1 |
| | | | 2 | 7639.2 | 1.129364178 |
| | | | 4 | 16075.5 | 1.18828502 |

## Environment info

```shell
----------Python Info----------
Version      : 3.5.2
Compiler     : GCC 5.4.0 20160609
Build        : ('default', 'Nov 12 2018 13:43:14')
Arch         : ('64bit', 'ELF')
------------Pip Info-----------
Version      : 19.0.3
Directory    : /usr/local/lib/python3.5/dist-packages/pip
----------MXNet Info-----------
Version      : 1.4.0
Directory    : /usr/local/lib/python3.5/dist-packages/mxnet
Commit Hash  : a03d59ed867ba334d78d61246a1090cd1868f5da
----------System Info----------
Platform     : Linux-4.1.51-x86_64-with-Ubuntu-16.04-xenial
system       : Linux
node         : mxnet-no-nvlink-4-74c6599dc6-f854z
release      : 4.1.51
version      : #1 SMP Tue Feb 12 00:00:00 UTC 2019
----------Hardware Info----------
machine      : x86_64
processor    : x86_64
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             16
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 79
Model name:            Intel(R) Xeon(R) CPU E5-2690 v4 @ 2.60GHz
Stepping:              1
CPU MHz:               2599.996
BogoMIPS:              5199.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              4096K
L3 cache:              16384K
NUMA node0 CPU(s):     0-15
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon rep_good nopl xtopology eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch arat fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt
----------Network Test----------
Setting timeout: 10
Timing for FashionMNIST: https://apache-mxnet.s3-accelerate.dualstack.amazonaws.com/gluon/dataset/fashion-mnist/train-labels-idx1-ubyte.gz, DNS: 0.0346 sec, LOAD: 2.9720 sec.
Timing for Gluon Tutorial(en): http://gluon.mxnet.io, DNS: 3.3387 sec, LOAD: 1.8584 sec.
Timing for Conda: https://repo.continuum.io/pkgs/free/, DNS: 1.2275 sec, LOAD: 0.8845 sec.
Timing for MXNet: https://github.com/apache/incubator-mxnet, DNS: 0.0022 sec, LOAD: 0.9994 sec.
Timing for Gluon Tutorial(cn): https://zh.gluon.ai, DNS: 0.0532 sec, LOAD: 2.3628 sec.
Error open PYPI: https://pypi.python.org/pypi/pip, <urlopen error _ssl.c:629: The handshake operation timed out>, DNS finished in 0.029279708862304688 sec.
```

## Steps to reproduce

1. Run the command:

```shell
python train_imagenet.py --image-shape 3,32,32 --batch-size 512 --gpus 0,1 --num-epochs 1 --benchmark 1 --dtype=float32 --kv-store=device --num-layers 18 --network resnet
```

2. Vary the parameters above (batch size, number of GPUs, dtype, kv-store).

## What have you tried to solve it?

I tried increasing the image shape to 3,224,224, and the results became much better, but I'm still wondering why the speedup ratio can be greater than 1:
| batch size | kv_store | dtype | num of gpus | speed (samples/sec) | speedup ratio |
| -- | -- | -- | -- | -- | -- |
| 256 | device | float16 | 1 | 1389.26 | |
| | | | 2 | 2923.71 | 1.052254 |
| | | | 4 | 4735.49 | 0.852161 |
| | | float32 | 1 | 897.35 | |
| | | | 2 | 1901.02 | 1.059241 |
| | | | 4 | 3152.52 | 0.878286 |
| | local | float16 | 1 | 1390.71 | 1 |
| | | | 2 | 2921.01 | 1.050187 |
| | | | 4 | 4033.09 | 0.725006 |
| | | float32 | 1 | 899.573 | 1 |
| | | | 2 | 1904.76 | 1.058702 |
| | | | 4 | 3216.54 | 0.893907 |
| 512 | device | float16 | 1 | 1094.68 | 1 |
| | | | 2 | 2722.44 | 1.243487 |
| | | | 4 | 5097.52 | 1.164158 |
| | | float32 | 1 | 830.413 | 1 |
| | | | 2 | 1795.82 | 1.081281 |
| | | | 4 | 3578.45 | 1.07731 |
| | local | float16 | 1 | 1094.4 | 1 |
| | | | 2 | 2720.68 | 1.243001 |
| | | | 4 | 5066.31 | 1.157326 |
| | | float32 | 1 | 828.245 | 1 |
| | | | 2 | 1801.51 | 1.087547 |
| | | | 4 | 3572.59 | 1.078361 |
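For what it's worth, the "speedup ratio" column in both tables appears to be speed_n / (n × speed_1), i.e. per-GPU scaling efficiency relative to ideal linear scaling. A minimal sketch to recompute it (the sample numbers are taken from the 3,224,224 table above, batch 256, kv_store=device, float16):

```python
def scaling_efficiency(speed_1gpu: float, speed_ngpu: float, n: int) -> float:
    """Measured n-GPU throughput divided by ideal linear throughput (n * 1-GPU speed)."""
    return speed_ngpu / (n * speed_1gpu)

# batch 256, kv_store=device, float16, image shape 3,224,224:
eff_2 = scaling_efficiency(1389.26, 2923.71, n=2)  # slightly above 1: super-linear
eff_4 = scaling_efficiency(1389.26, 4735.49, n=4)  # below 1: sub-linear
```

A value of exactly 1 means perfect linear speedup; values below 1 indicate scaling overhead.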
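Regarding questions 1–3: one common way to reason about this is that each training step has a compute part (which shrinks as the batch is split across GPUs) and a gradient-synchronization part (which does not). With tiny 3,32,32 images, compute per batch is small, so synchronization dominates and scaling is poor; with 3,224,224 images or larger batches, compute dominates the same synchronization cost and scaling approaches linear. A toy cost model illustrating the shape of this effect — all constants here are hypothetical, chosen only for illustration, not measured:

```python
def toy_efficiency(t_compute: float, t_sync: float, n: int) -> float:
    """Per-GPU efficiency of data-parallel training under a toy cost model.

    t_compute: time for one GPU to process the whole batch
    t_sync:    gradient-synchronization overhead per step (does not shrink with n)
    """
    t_1 = t_compute                 # single-GPU step time
    t_n = t_compute / n + t_sync    # compute is split across n GPUs; sync is not
    return (t_1 / t_n) / n          # speedup divided by ideal speedup n

# Small images: compute is cheap, sync dominates -> poor scaling.
small = toy_efficiency(t_compute=1.0, t_sync=0.5, n=4)
# Large images: compute dominates the same sync cost -> near-linear scaling.
large = toy_efficiency(t_compute=20.0, t_sync=0.5, n=4)
```

This model does not explain ratios above 1 by itself; those usually involve a second effect, e.g. the single-GPU baseline running less efficiently at the largest batch sizes.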