Looking at this code:
<https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/ann/Layer.scala#L541>

    for (i <- (L - 2) to (0, -1)) {
      layerModels(i + 1).computePrevDelta(deltas(i + 1), outputs(i + 1), deltas(i))
    }

I want to understand why we are passing outputs(i + 1) instead of outputs(i) in the snippet above.

As far as I understand, this argument is only needed for the sigmoid activation layer, whose derivative is

    f'(x) = f(x) * (1 - f(x)) = outputs(i) * (1 - outputs(i))

which means that, in order to find prevDelta, we should be using outputs(i).
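To make the identity I am relying on concrete, here is a minimal standalone sketch. It is my own illustration and not Spark's code: the object and method names (SigmoidDerivativeCheck, derivativeFromOutput, numericalDerivative) are made up for this example. It only checks that the sigmoid derivative can be computed from the activation's output value y = f(x) rather than from its input x. Which of outputs(i) or outputs(i + 1) plays the role of y in Layer.scala is exactly the part I am unsure about.

```scala
object SigmoidDerivativeCheck {
  // Logistic sigmoid: f(x) = 1 / (1 + e^(-x)).
  def sigmoid(x: Double): Double = 1.0 / (1.0 + math.exp(-x))

  // Derivative expressed in terms of the activation's OUTPUT y = f(x):
  // f'(x) = y * (1 - y). Note that the input x is not needed here.
  def derivativeFromOutput(y: Double): Double = y * (1.0 - y)

  // Central finite difference on the input x, used only as an
  // independent reference to compare against.
  def numericalDerivative(x: Double, h: Double = 1e-6): Double =
    (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)

  def main(args: Array[String]): Unit = {
    for (x <- Seq(-2.0, 0.0, 0.5, 3.0)) {
      val y = sigmoid(x)                      // output of the activation
      val analytic = derivativeFromOutput(y)  // uses only the output
      val numeric = numericalDerivative(x)    // uses only the input
      assert(math.abs(analytic - numeric) < 1e-6)
      println(f"x=$x%5.2f  f(x)=$y%.6f  f'(x)=$analytic%.6f")
    }
  }
}
```

So whichever array holds the value of f(x) for a given sigmoid layer is the one the derivative needs.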