[GitHub] [incubator-mxnet] TristonC commented on issue #18751: gluon.nn.BatchNorm seems to swap updated values of moving_mean and moving_var on GPU.

2020-07-29 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-665436383 @gilbertfrancois What is the BN supposed to do for your model in the tail? Is it supposed to batch-normalize every single value? ---

2020-07-23 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-663168367 I am not sure that moving the running-mean and running-var update into the backward pass is the solution. Inference-mode behavior can be achieved by setting autograd.record(train_mode=False). The p
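To make the train-mode distinction in the comment above concrete, here is a minimal conceptual sketch (plain NumPy, not MXNet's actual kernel): in train mode BatchNorm normalizes with the current batch statistics and updates the moving averages; with `autograd.record(train_mode=False)` the layer behaves like inference, normalizing with the stored statistics and leaving them untouched. The function and state names are illustrative, not MXNet API.

```python
import numpy as np

def batchnorm_forward(x, state, training, momentum=0.9, eps=1e-5):
    """Sketch of BatchNorm semantics. x: (N, C); state: dict with 'mean'/'var' of shape (C,)."""
    if training:
        mean = x.mean(axis=0)
        var = x.var(axis=0)
        # moving averages are updated only in train mode
        state["mean"] = momentum * state["mean"] + (1 - momentum) * mean
        state["var"] = momentum * state["var"] + (1 - momentum) * var
    else:
        # inference / train_mode=False: use the stored moving statistics
        mean, var = state["mean"], state["var"]
    return (x - mean) / np.sqrt(var + eps)

state = {"mean": np.zeros(3), "var": np.ones(3)}
x = np.random.default_rng(0).normal(2.0, 1.0, size=(8, 3))

y_infer = batchnorm_forward(x, state, training=False)  # state unchanged
assert np.allclose(state["mean"], 0.0)

y_train = batchnorm_forward(x, state, training=True)   # state updated
assert not np.allclose(state["mean"], 0.0)
```

This is why the choice of forward vs. backward path for the update matters only when gradients are (or are not) being recorded around the forward call.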

2020-07-22 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-662730136 After nn.Flatten(), the batch norm is actually performed on a 1xCx1x1 tensor, where C is 9408 for the first batch norm layer in the tail, and 32 for the second batch norm layer.
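The 1xCx1x1 shape described above makes the degenerate statistics easy to see: with batch size 1 and a 1x1 spatial extent, each channel's batch mean equals its single value, so the per-channel batch variance is exactly zero and train-mode normalization yields (x - x) / sqrt(0 + eps) = 0 for every channel. A NumPy illustration (conceptual, not MXNet's kernel; C = 9408 as in the comment):

```python
import numpy as np

C, eps = 9408, 1e-5
x = np.random.default_rng(0).normal(size=(1, C, 1, 1))  # NCHW, batch size 1

mean = x.mean(axis=(0, 2, 3), keepdims=True)  # equals x itself
var = x.var(axis=(0, 2, 3), keepdims=True)    # all zeros
y = (x - mean) / np.sqrt(var + eps)

assert np.all(var == 0)
assert np.allclose(y, 0.0)
```

The epsilon term is the only thing keeping this from dividing by zero, which is consistent with the observation elsewhere in the thread that the GPU output still contains finite numbers.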

2020-07-22 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-662721253 It looks like there is a bug in doing batch norm on a 1D array when the batch size is 1. For example, in this case, after flattening, the vector size is 9408, which m

2020-07-22 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-662687910 @gilbertfrancois I did a quick test to answer your question: > I don't understand why y_out from MyNet with BatchNorm on GPU still contains real numbers, given that th

2020-07-22 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-662547038 @gilbertfrancois Is your project for training or inference? Your script uses autograd, but it never calls backward(). The reason I ask is that BatchNorm behaves

2020-07-20 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-661448168 @szha It may not affect real training on either CPU or GPU, as the CPU version does update the running mean and running var in the backward path. Should we unify the

2020-07-20 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-661442792 In case anyone wants to do a PyTorch comparison for the CPU version:

2020-07-20 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-661225937 The CPU version updates the running mean and running var in the backward path, and there are some nuanced differences there between the CPU and GPU implementations. @gilbertfrancois, could you try t

2020-07-18 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660586280 On GPU, the first running mean is 0, while the following 3 running means are 0.1, 0.19 and 0.271, which can be explained as running_mean = 0.9 * running_mean + 0.1 * batch_mean (momentum 0.9)
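The quoted sequence can be reproduced with the exponential moving average that Gluon's BatchNorm applies (momentum defaults to 0.9). This short sketch assumes a per-batch mean of 1.0 at every step, which is what the numbers 0.1, 0.19, 0.271 imply:

```python
# Running-mean update: new = momentum * old + (1 - momentum) * batch_mean.
# Assumed inputs: momentum = 0.9, per-batch mean = 1.0, initial running mean = 0.
momentum = 0.9
batch_mean = 1.0
running = 0.0
history = [running]  # value observed before each update
for _ in range(3):
    running = momentum * running + (1 - momentum) * batch_mean
    history.append(round(running, 3))
print(history)  # [0.0, 0.1, 0.19, 0.271]
```

The geometric decay (each gap to 1.0 shrinks by the factor 0.9) is the signature of a correctly applied momentum update.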

2020-07-18 Thread GitBox
TristonC commented on issue #18751: URL: https://github.com/apache/incubator-mxnet/issues/18751#issuecomment-660583673 The CPU result seems wrong, while the GPU result seems more reasonable.