You're asking what happens when the input contains several ratings for
the same user-item pair, right? I'm saying you shouldn't do that --
aggregate them into one rating per pair in your application first.

For rating-like (explicit) data, it doesn't really make sense
otherwise. The only sensible aggregation is to keep the last rating and
drop the earlier ones, but there's no natural notion of 'last' in the
RDD you supply.
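
To make the explicit case concrete, here is a minimal sketch in plain
Python of "keep the last rating per pair". It assumes your application
has a timestamp for each rating; that field and the name keep_latest are
hypothetical, since the Rating objects passed to ALS carry only user,
product and rating.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (user, item, rating, timestamp).
# The timestamp is an assumption about your own data, not part of MLlib.
Event = Tuple[int, int, float, int]


def keep_latest(events: Iterable[Event]) -> List[Tuple[int, int, float]]:
    """Keep only the most recent rating for each (user, item) pair."""
    latest: Dict[Tuple[int, int], Tuple[int, float]] = {}
    for user, item, rating, ts in events:
        key = (user, item)
        # Replace the stored rating only when this one is at least as recent.
        if key not in latest or ts >= latest[key][0]:
            latest[key] = (ts, rating)
    # Drop the timestamp; sort only to make the output order deterministic.
    return sorted((u, i, r) for (u, i), (_, r) in latest.items())


events = [(1, 10, 3.0, 100), (1, 10, 5.0, 200), (2, 10, 4.0, 150)]
deduped = keep_latest(events)
```

Here user 1's two ratings for item 10 collapse into the later one.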

For count-like (implicit) data, it makes sense to sum the inputs, but
I don't think that is done automatically. I skimmed the code and
didn't see it. So you would have to sum the values per user-item pair
yourself anyway.
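
By "aggregate" I mean something like the following, shown as a rough
sketch in plain Python so it runs without a cluster. The function name
sum_per_pair is made up for illustration. On an RDD the same idea would
typically be a map to ((user, product), rating) followed by reduceByKey
with addition, then a map back to Rating before calling trainImplicit.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[int, int, float]  # (user, item, value)


def sum_per_pair(ratings: Iterable[Triple]) -> List[Triple]:
    """Sum implicit-feedback values so each (user, item) pair appears once."""
    totals: Dict[Tuple[int, int], float] = defaultdict(float)
    for user, item, value in ratings:
        totals[(user, item)] += value
    # Sort only to make the output order deterministic.
    return sorted((u, i, v) for (u, i), v in totals.items())


raw = [(1, 10, 1.0), (1, 10, 2.0), (2, 10, 1.0), (1, 11, 1.0)]
aggregated = sum_per_pair(raw)
```

Note that this is not the same as a distinct: identical duplicate rows
are added together here rather than dropped.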

On Mon, Feb 15, 2016 at 9:05 PM, Roberto Pagliari
<roberto.pagli...@asos.com> wrote:
> Hi Sean,
> I'm not sure what you mean by aggregate. The input of trainImplicit is an
> RDD of Ratings.
>
> I find it odd that duplicate ratings would mess with ALS in the implicit
> case. It'd be nice if it didn't.
>
>
> Thank you,
>
> On 15/02/2016 20:49, "Sean Owen" <so...@cloudera.com> wrote:
>
>>I believe you need to aggregate inputs per user-item in your call. I
>>am actually not sure what happens if you don't. I think it would
>>compute the factors twice and one would win, so yes I think it would
>>effectively be ignored.  For implicit, that wouldn't work correctly,
>>so you do need to aggregate.
>>
>>On Mon, Feb 15, 2016 at 8:30 PM, Roberto Pagliari
>><roberto.pagli...@asos.com> wrote:
>>> What happens when duplicate user/ratings are fed into ALS (the implicit
>>> version, specifically)? Are duplicates ignored?
>>>
>>> I'm asking because that would save me a distinct.
>>>
>>>
>>>
>>> Thank you,
>>>
>
