Re: Using AMP

2020-05-01 Thread Przemysław Trędak
Hi Naveen, The problem you see with the loss is caused by the model clipping the gradient, which under AMP is multiplied by the loss scale. For clipping to work, you need to apply the same loss scale to the threshold you use to clip the gradients. This is currently possible in
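The following is a minimal pure-Python sketch of the clipping point above. It does not use MXNet. The helper names (`global_norm`, `clip_by_global_norm`, `clip_scaled_grads`, `unscale`) are hypothetical, gradients are plain lists of floats, and the loss scale is assumed to be fixed. If the loss scale is dynamic, the threshold would have to track its current value. The sketch shows that clipping scaled gradients against an unscaled threshold shrinks them far below the intended bound. Multiplying the threshold by the same loss scale gives the same result as clipping the true gradients.

```python
import math
from typing import List

Grads = List[List[float]]  # one flat list of floats per parameter (assumed layout)


def global_norm(grads: Grads) -> float:
    """L2 norm over every gradient value of every parameter."""
    return math.sqrt(sum(x * x for g in grads for x in g))


def clip_by_global_norm(grads: Grads, max_norm: float) -> Grads:
    """Rescale all gradients so their global L2 norm is at most max_norm."""
    norm = global_norm(grads)
    if norm <= max_norm:
        return [list(g) for g in grads]  # already within bounds: copy unchanged
    factor = max_norm / norm
    return [[x * factor for x in g] for g in grads]


def clip_scaled_grads(scaled_grads: Grads, max_norm: float, loss_scale: float) -> Grads:
    """Clip loss-scaled gradients; the threshold gets the same loss scale."""
    return clip_by_global_norm(scaled_grads, max_norm * loss_scale)


def unscale(grads: Grads, loss_scale: float) -> Grads:
    """Divide out the loss scale to recover the true gradients."""
    return [[x / loss_scale for x in g] for g in grads]


if __name__ == "__main__":
    loss_scale = 1024.0        # assumed fixed loss scale for this demo
    max_norm = 10.0            # threshold chosen for the *unscaled* gradients
    true_grads = [[3.0, 4.0]]  # global norm 5, so no clipping should happen
    scaled = [[x * loss_scale for x in g] for g in true_grads]

    # Wrong: unscaled threshold applied to scaled gradients.
    naive = unscale(clip_by_global_norm(scaled, max_norm), loss_scale)
    # Right: threshold multiplied by the loss scale.
    fixed = unscale(clip_scaled_grads(scaled, max_norm, loss_scale), loss_scale)

    print("naive:", naive, "norm", global_norm(naive))
    print("fixed:", fixed, "norm", global_norm(fixed))
```

An equivalent alternative is to unscale the gradients first and then clip against the original `max_norm`. Either way, the threshold and the gradients must be expressed in the same (scaled or unscaled) units.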

Re: Using AMP

2020-05-01 Thread Przemysław Trędak
Just realized I did not actually link to the issue I mentioned, it is https://github.com/apache/incubator-mxnet/issues/17507 On 2020/05/01 18:19:27, Przemysław Trędak wrote: > Hi Naveen, > > The problem that you see with loss is due to the fact that the model clips > the gradient, which in

Re: Using AMP

2020-05-01 Thread Naveen Swamy
Thanks Przemek, appreciate your input. Let me apply the scale changes to the gradient clips and run the experiment again. On Fri, May 1, 2020 at 11:20 AM Przemysław Trędak wrote: > Just realized I did not actually link to the issue I mentioned, it is > https://github.com/apache/incubator-mxnet/i