On Tue, Mar 11, 2014 at 10:18 PM, Michael Allman <m...@allman.ms> wrote:
> I'm seeing counterintuitive, sometimes nonsensical recommendations. For
> comparison, I've run the training data through Oryx's in-VM implementation
> of implicit ALS with the same parameters. Oryx uses the same algorithm.
> (Source in this file:
> https://github.com/cloudera/oryx/blob/master/als-common/src/main/java/com/cloudera/oryx/als/common/factorizer/als/AlternatingLeastSquares.java)
On this note, I should say that Oryx varies from that paper in a couple of small ways. In particular, the regularization parameter it ultimately uses is not just lambda, but lambda * alpha. (There are decent reasons for this.) So the difference with the "same" parameters could be down to this. What param values are you using? It might be the difference.

(There is another difference in the handling of negative values, but that is probably irrelevant to you? It is in Spark now too, though: it was not in 0.9.0 but is in HEAD.)

> However, it looks like this code is in fact computing YtY + YtY(Cu - I),
> which is the same as YtYCu. If so, that's a bug. Can someone familiar with
> this code evaluate my claim?

I too can't be 100% certain I'm not missing something, but from a look at that line, I don't think it is computing YtY(Cu - I). It is indeed trying to accumulate the value Yt(Cu - I)Y by building it up from pieces, from rows of Y. For one row of Y, that piece is, excusing my notation, Y(i)t (Cu(i) - 1) Y(i). The middle term is just a scalar, so it's fine to multiply it in at the end, as you see in that line.

You may wish to follow HEAD, which is a bit different:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala#L390
The computation is actually the same as before (for positive input), expressed a little differently.

Happy to help on this, given that I know this code a little and the code you are comparing it to a lot.
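To make the algebra concrete: the identity being used is YtY + sum_i (c_i - 1) * y_i y_i^t = Yt Cu Y, where Cu = diag(c) and the (c_i - 1) is exactly the scalar "middle term" above. Here is a small standalone check of that identity with plain arrays and made-up toy numbers; this is just a sketch of the math, not the actual Spark or Oryx code (the class and method names are mine):

```java
import java.util.Arrays;

public class AlsAccumulationCheck {

  // Toy item-factor matrix Y (3 items, 2 features) and per-item
  // confidences c for one user; values are made up for illustration.
  static final double[][] Y = {{1.0, 2.0}, {3.0, 4.0}, {5.0, 6.0}};
  static final double[] C = {2.5, 1.0, 4.0};

  /** Yt * diag(w) * Y, accumulated row by row as a sum of rank-1 pieces. */
  static double[][] weightedGram(double[] w) {
    double[][] m = new double[2][2];
    for (int i = 0; i < Y.length; i++) {
      for (int r = 0; r < 2; r++) {
        for (int s = 0; s < 2; s++) {
          m[r][s] += w[i] * Y[i][r] * Y[i][s];
        }
      }
    }
    return m;
  }

  /** Left side: YtY plus rank-1 updates (c_i - 1) * y_i y_i^t per row. */
  static double[][] accumulated() {
    double[][] m = weightedGram(new double[] {1.0, 1.0, 1.0});  // plain YtY
    for (int i = 0; i < Y.length; i++) {
      double scalar = C[i] - 1.0;  // the scalar "middle term" Cu(i) - 1
      for (int r = 0; r < 2; r++) {
        for (int s = 0; s < 2; s++) {
          m[r][s] += scalar * Y[i][r] * Y[i][s];
        }
      }
    }
    return m;
  }

  /** Right side: Yt * Cu * Y computed directly, with Cu = diag(C). */
  static double[][] direct() {
    return weightedGram(C);
  }

  public static void main(String[] args) {
    System.out.println(Arrays.deepToString(accumulated()));
    System.out.println(Arrays.deepToString(direct()));
    // The two should match exactly on these dyadic-rational inputs.
    System.out.println(Arrays.deepEquals(accumulated(), direct()));
  }
}
```

The point is that accumulating YtY(Cu - I) instead would scale whole YtY terms, which is not what the per-row rank-1 updates do.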