Hi Pedro,
these are just helper functions; you have to check how the operator uses
them. In this case, the grad function is the derivative expressed as a
function of the *output*, which is cheaper to compute:

y = log(1 + exp(x))  =>  dy/dx = 1/(1 + exp(-x)) = 1 - exp(-y)

(since exp(-y) = 1/(1 + exp(x)), we get 1 - exp(-y) = exp(x)/(1 + exp(x))
= 1/(1 + exp(-x)), i.e. exactly the logistic sigmoid of x)
If you check all sorts of other ops, you'll find the same is the case:
the gradient is often written in terms of the output when that is cheaper.
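If it helps, here is a minimal numerical sketch (assuming NumPy; the
variable names are mine, not MXNet's) checking that the two expressions
agree when the grad formula is fed the output y rather than the input x:

    import numpy as np

    x = np.linspace(-5.0, 5.0, 101)            # arbitrary test inputs
    y = np.log1p(np.exp(x))                    # softrelu forward: log(1 + exp(x))
    grad_wrt_input = 1.0 / (1.0 + np.exp(-x))  # logistic sigmoid, in terms of the input x
    grad_wrt_output = 1.0 - np.exp(-y)         # same derivative, in terms of the output y
    assert np.allclose(grad_wrt_input, grad_wrt_output)

The second form only needs y, which the backward pass already has, so it
avoids recomputing exp(x).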
Pedro wrote:

I bumped into the definition of the softrelu gradient:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/mshadow_op.h#L170
It is defined as 1 - exp(-x).
As we define the forward pass of softrelu as the softplus function,
shouldn't the gradient be the logistic function?
Is my understanding correct?