The scale factor was used only to scale up the number of ratings in the
dataset for performance testing purposes, to illustrate the scalability of
Spark ALS.

It is not something you would normally do on your training dataset.
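
For illustration, here is a minimal sketch of that pseudo-user replication
(Scala; scaleRatings is a hypothetical helper, and ratings is assumed to be
an RDD of MLlib Rating objects):

import org.apache.spark.mllib.recommendation.Rating
import org.apache.spark.rdd.RDD

// Replicate each rating s times with shifted user ids, as the blog post
// describes: (userId, productId, rating) becomes (userId + i, productId,
// rating) for 0 <= i < s, multiplying the dataset size by s.
def scaleRatings(ratings: RDD[Rating], s: Int): RDD[Rating] =
  ratings.flatMap { r =>
    (0 until s).map(i => Rating(r.user + i, r.product, r.rating))
  }

Note that userId + i can collide with existing user ids unless the real ids
are spaced at least s apart; that is harmless for a timing benchmark, but it
is another reason not to apply this to real training data.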
On Fri, 23 Sep 2016 at 20:07, Roshani Nagmote <roshaninagmo...@gmail.com>
wrote:

> Hello,
>
> I was working on Spark MLlib ALS Matrix factorization algorithm and came
> across the following blog post:
>
>
> https://databricks.com/blog/2014/07/23/scalable-collaborative-filtering-with-spark-mllib.html
>
> Can anyone help me understand what the "s" scaling factor does, and
> whether it really gives better performance? What's the significance of this?
> If we convert the input data to scaledData with the help of "s", will it
> speed up the algorithm?
>
> Scaled data usage:
> *(For each user, we create pseudo-users that have the same ratings. That
> is, for every rating (userId, productId, rating), we generate (userId+i,
> productId, rating), where 0 <= i < s and s is the scaling factor)*
>
> Also, this blog post is for Spark 1.1, and I am currently using 2.0.
>
> Any help will be greatly appreciated.
>
> Thanks,
> Roshani
>
