Jerry Lam created SPARK-24652:
---------------------------------

             Summary: Strange ALS Implementation for Implicit Feedback
                 Key: SPARK-24652
                 URL: https://issues.apache.org/jira/browse/SPARK-24652
             Project: Spark
          Issue Type: Bug
          Components: ML
    Affects Versions: 2.3.1
            Reporter: Jerry Lam
Hi there, I'm evaluating the ALS implementation in Spark ML. Does Spark implement the algorithm described in "Collaborative Filtering for Implicit Feedback Datasets" (Hu, Koren, Volinsky)? If it does, I think the implementation returns incorrect results. Here is the example:
{code:python}
from pyspark.ml.recommendation import ALS

als = ALS(
    maxIter=100,
    regParam=0.0,       # lambda = 0.0, i.e. no regularization
    alpha=1.0,
    nonnegative=False,
    implicitPrefs=True,
    rank=1)

# Two users, two items, one observed interaction each.
ratings = spark.createDataFrame([(0, 0, 1), (1, 1, 1)]).toDF('user', 'item', 'rating')
als_model = als.fit(ratings)
reco = als_model.recommendForAllUsers(10)
reco.show(truncate=False)
{code}
The result is:
{code}
+----+---------------------------------+
|user|recommendations                  |
+----+---------------------------------+
|0   |[[0, 0.6666667], [1, -0.6666667]]|
|1   |[[1, 0.6666667], [0, -0.6666667]]|
+----+---------------------------------+
{code}
I expected the result to be:
{code}
+----+---------------------+
|user|recommendations      |
+----+---------------------+
|0   |[[0, 1.0], [1, -1.0]]|
|1   |[[1, 1.0], [0, -1.0]]|
+----+---------------------+
{code}
The reason I believe the score should be 1.0 for (user=0, item=0) and for (user=1, item=1) is that, according to the paper, the model should fit the observed preferences (p_ui = 1) exactly in these two cases, given that lambda is 0.0 (no regularization).

Can someone describe which implementation of implicit feedback Spark is using? If it implements the same paper, why are the results so different? Thank you.
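To make the comparison concrete, here is a small standalone NumPy sketch (my own check, not Spark code) that evaluates the paper's cost function for two hypothetical rank-1 factorizations: one reproducing Spark's scores and one reproducing the scores I expected. It assumes p_ui = 1 when r_ui > 0, c_ui = 1 + alpha * r_ui, alpha = 1.0, and lambda = 0.0, matching the snippet above; the factor values themselves are made up just to reproduce those scores:
{code:python}
import numpy as np

R = np.array([[1.0, 0.0],
              [0.0, 1.0]])            # observed ratings (user x item)
P = (R > 0).astype(float)             # binary preferences p_ui
C = 1.0 + 1.0 * R                     # confidences c_ui with alpha = 1

def cost(X, Y, lam=0.0):
    # sum_{u,i} c_ui * (p_ui - x_u . y_i)^2 + lam * (|X|^2 + |Y|^2)
    S = X @ Y.T                       # predicted scores
    return float(np.sum(C * (P - S) ** 2) + lam * (np.sum(X ** 2) + np.sum(Y ** 2)))

# Hypothetical factors reproducing Spark's scores (+/- 0.6666667 = 2/3):
t = np.sqrt(2.0 / 3.0)
X_spark = np.array([[t], [-t]])
Y_spark = np.array([[t], [-t]])

# Hypothetical factors reproducing the scores I expected (+/- 1.0):
X_exp = np.array([[1.0], [-1.0]])
Y_exp = np.array([[1.0], [-1.0]])

print(cost(X_spark, Y_spark))  # 1.3333... (4/3) for Spark's scores
print(cost(X_exp, Y_exp))      # 2.0 for the scores I expected
{code}
Under that objective the 0.6666667 scores come out cheaper (4/3 vs. 2.0), because the unobserved cells still contribute to the cost with confidence 1 rather than being ignored; perhaps that is where my expectation and the implementation diverge. I'd still appreciate confirmation of which objective Spark actually minimizes.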