You're asking what happens when you put many ratings for one user-item pair in the input, right? I'm saying you shouldn't do that -- aggregate them into one pair in your application.
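Roughly what I mean by aggregating, as a minimal sketch for the count-like (implicit) case where summing is the sensible choice. It is plain Python rather than Spark so it runs on its own: the `Rating` tuple here is a local stand-in that mirrors the user/product/rating fields, and `sum_duplicates` is a hypothetical helper name, not something in MLlib.

```python
from collections import defaultdict, namedtuple

# Local stand-in with the same three fields as an MLlib Rating (assumption:
# this is only for illustration and is not imported from Spark).
Rating = namedtuple("Rating", ["user", "product", "rating"])


def sum_duplicates(ratings):
    """Collapse repeated (user, product) pairs into one by summing their values."""
    totals = defaultdict(float)
    for r in ratings:
        totals[(r.user, r.product)] += r.rating
    # Sort by key only so the output order is deterministic.
    return [Rating(u, p, v) for (u, p), v in sorted(totals.items())]


raw = [
    Rating(1, 10, 1.0),
    Rating(1, 10, 2.0),  # duplicate of the (1, 10) pair above
    Rating(2, 10, 1.0),
]
deduped = sum_duplicates(raw)
```

On an actual RDD the same idea would be to key each record by (user, product) and combine the values with reduceByKey before passing the result to trainImplicit.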
For rating-like (explicit) data, supplying several ratings for one user-item pair doesn't really make sense. The only sensible aggregation is to keep the last one, but there's no natural notion of 'last' in the RDD you supply. For count-like (implicit) data, it makes sense to sum the inputs, but I don't think that is done automatically. I skimmed the code and didn't see it. So you would need to sum the values per user-item pair yourself anyway.

On Mon, Feb 15, 2016 at 9:05 PM, Roberto Pagliari <roberto.pagli...@asos.com> wrote:
> Hi Sean,
> I'm not sure what you mean by aggregate. The input of trainImplicit is an
> RDD of Ratings.
>
> I find it odd that duplicate ratings would mess with ALS in the implicit
> case. It'd be nice if it didn't.
>
> Thank you,
>
> On 15/02/2016 20:49, "Sean Owen" <so...@cloudera.com> wrote:
>
>>I believe you need to aggregate inputs per user-item in your call. I
>>am actually not sure what happens if you don't. I think it would
>>compute the factors twice and one would win, so yes I think it would
>>effectively be ignored. For implicit, that wouldn't work correctly,
>>so you do need to aggregate.
>>
>>On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari
>><roberto.pagli...@asos.com> wrote:
>>> What happens when duplicate user/ratings are fed into ALS (the implicit
>>> version, specifically)? Are duplicates ignored?
>>>
>>> I'm asking because that would save me a distinct.
>>>
>>> Thank you,