Thank you everyone for your feedback. It's been very helpful, and though I still haven't found the cause of the difference between Spark and Oryx, I feel I'm making progress.
Xiangrui asked me to create a ticket for this issue. The reason I didn't do that originally is that it isn't yet clear to me whether this is a bug in Spark or a mistake on my part. I'd like to see where this conversation goes and then file a more clear-cut issue if applicable.

Sean pointed out that Oryx differs from Spark in its use of the regularization parameter lambda. I'm aware of this and have been compensating for the difference from the start. The handling of negative values is indeed irrelevant here, since my data contains none. After reviewing Sean's analysis and running some calculations in the console, I agree that the Spark code does compute YtCuY correctly; I've copied the update rule in question after my sign-off for reference.

Regarding testing, I'm computing expected percentile ranking (EPR) on a test set as outlined in the paper, training on three weeks of data and testing on the following week (there's a sketch of the EPR computation after my sign-off as well). I recently updated my data sets, then rebuilt and tested the new models. The results were inconclusive: both models scored about the same.

I'm continuing to investigate the source of the wide difference in recommendations between the two implementations and will reply with my findings when I have something more definitive.

Cheers and thanks again.
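P.S. For reference, the user-factor update from the Hu/Koren/Volinsky implicit-feedback paper that the YtCuY term comes from is, in the paper's notation (Y is the item-factor matrix, C^u the diagonal confidence matrix for user u, and p(u) the binarized preference vector):

  x_u = (Y^T C^u Y + \lambda I)^{-1} Y^T C^u p(u)

  where  Y^T C^u Y = Y^T Y + Y^T (C^u - I) Y

The Y^T Y term is shared across all users, and the Y^T (C^u - I) Y correction only involves items the user has interacted with, which is the speedup the paper describes.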
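In case it helps anyone reproduce my numbers, the EPR metric from the same paper is

  rank_bar = sum_{u,i} r^t_{ui} * rank_{ui}  /  sum_{u,i} r^t_{ui}

where rank_{ui} is the percentile rank of item i in user u's ranked recommendation list (0% = top, 100% = bottom) and the r^t_{ui} are the held-out observations; lower is better, and random recommendations score around 50%. Below is a minimal sketch of that computation over plain Scala collections. The score(user, item) function is a hypothetical stand-in for the trained model's predicted preference, and this is not my actual evaluation code, just the arithmetic:

  // Expected percentile ranking over a held-out test set.
  // testRatings maps (user, item) -> r_ui from the test week;
  // allItems must contain every item that can be recommended.
  def expectedPercentileRank(
      testRatings: Map[(Int, Int), Double],
      allItems: Seq[Int],
      score: (Int, Int) => Double): Double = {
    val users = testRatings.keys.map(_._1).toSet
    var num = 0.0
    var den = 0.0
    for (u <- users) {
      // Rank every item for this user by descending predicted score.
      val ranked = allItems.sortBy(i => -score(u, i))
      val denom = math.max(ranked.size - 1, 1)
      for (((uu, i), r) <- testRatings if uu == u) {
        // rank_ui: 0.0 for the top-ranked item, 1.0 for the bottom one.
        val rankUi = ranked.indexOf(i).toDouble / denom
        num += r * rankUi
        den += r
      }
    }
    num / den
  }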