Could we run more epochs to see the performance difference, or profile the
difference between a good and a bad run?
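For comparing runs at the operator level, here is a minimal sketch of
enabling MXNet's built-in profiler around the section under test (the output
filename is a placeholder):

    # Sketch: enable MXNet's built-in profiler around a training run so
    # per-operator timings from a good and a bad run can be compared.
    import mxnet as mx

    mx.profiler.set_config(profile_all=True, aggregate_stats=True,
                           filename='profile_run.json')
    mx.profiler.set_state('run')

    # ... the training epochs under test go here ...

    mx.nd.waitall()             # flush all pending asynchronous work
    mx.profiler.set_state('stop')
    print(mx.profiler.dumps())  # aggregated per-operator statistics

The aggregated dump from each version could then be diffed to see which
operators account for the gap.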
> -----Original Message-----
> From: Pedro Larroy [mailto:pedro.larroy.li...@gmail.com]
> Sent: Thursday, June 27, 2019 9:35 AM
> To: dev@mxnet.incubator.apache.org
> Cc: d...@mxnet.ap
I ran it again and the gap is bigger again; I guess we need to average the
times across several runs (see the sketch after the command output below):
piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
~/mxnet_1.5/py3_venv/bin/python cifar10.py --epochs 5
[23
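A minimal sketch of averaging wall-clock time over several runs instead of
relying on a single `time` invocation (the interpreter and script paths
follow the commands above; adjust per venv):

    # Sketch: run the benchmark several times and report mean/stdev of the
    # wall-clock time, so a single noisy run does not decide the comparison.
    import os
    import statistics
    import subprocess
    import time

    python = os.path.expanduser('~/mxnet_1.5/py3_venv/bin/python')  # or the 1.4 venv
    cmd = [python, 'cifar10.py', '--epochs', '5']

    times = []
    for _ in range(5):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)
        times.append(time.perf_counter() - start)

    print('mean %.1fs, stdev %.1fs' % (statistics.mean(times),
                                       statistics.stdev(times)))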
The difference looks smaller now, more like your numbers. I wonder if
something happened during the previous benchmark, like a system
update...
piotr@ip-172-31-63-171:0:~/deeplearning-benchmark/dawnbench (master)+$
time ~/mxnet_1.4/py3_venv/bin/python cifar10.py --epochs 5 && time
~/mxnet_1.5/py3_
Hi Ciyong, thanks for trying to reproduce this.
I used this one:
https://github.com/awslabs/deeplearning-benchmark/blob/master/dawnbench/cifar10.py
Could you provide hardware and OS details?
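For example, something like this run in each environment would capture the
basics (a minimal sketch; it assumes MXNet is importable in the venv under
test):

    # Sketch: collect the hardware/OS details requested above, plus the
    # exact MXNet build in the active virtualenv.
    import multiprocessing
    import platform
    import mxnet as mx

    print('OS:       ', platform.platform())
    print('Arch:     ', platform.machine())
    print('CPU count:', multiprocessing.cpu_count())
    print('Python:   ', platform.python_version())
    print('MXNet:    ', mx.__version__)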
I will rerun and repost numbers in a few minutes.
Pedro.
On Wed, Jun 26, 2019 at 4:18 AM Chen, Ciyong wrote:
Hi Pedro,
I'm looking at this case, using the script
"incubator-mxnet/example/image-classification/train_cifar10.py" to get
the timing data, but it seems there's not much difference between MXNet
1.4.1.rc0 and 1.5.0.rc1 on a C5.18xlarge.
Not sure if there's any difference in the Python script you used.
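One thing worth ruling out when the numbers disagree: whole-process `time`
also counts interpreter startup and the dataset download. A minimal sketch
of timing each epoch separately (train_one_epoch is a hypothetical stand-in
for either script's training loop):

    # Sketch: time epochs individually so one-off costs (imports, dataset
    # download) cannot skew the 1.4-vs-1.5 comparison.
    import time

    def train_one_epoch():
        ...  # placeholder for the real per-epoch training code

    for epoch in range(5):
        start = time.perf_counter()
        train_one_epoch()
        print('epoch %d: %.1fs' % (epoch, time.perf_counter() - start))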