Hi Pat,

ParallelALSFactorizationJob actually implements two different flavours of matrix factorization: one aimed at explicit feedback data (such as ratings), described in "Large-scale Parallel Collaborative Filtering for the Netflix Prize" [1], and another aimed at implicit feedback data (clicks, pageviews, etc.), described in "Collaborative Filtering for Implicit Feedback Datasets" [2]. The first approach is the default behavior; if you specify implicitFeedback = true, then the second approach is used.

I'd recommend having a look at the papers for the details of the parameters, but here's a brief explanation:

--numIterations controls the number of iterations to execute. It might be a burden to have users set this explicitly, but otherwise we would have to check training-error convergence after every iteration, which would slow down this already very slow job even more.

--numFeatures controls the number of latent features used to model the user and item factors. For a production setting I would start experimenting with a low number such as 10 or 20.

--alpha is a special parameter that is only necessary for handling implicit data; have a look at [2].

--lambda is a hyperparameter that controls the regularization. You want your solution to work well on unseen data and not overfit the training data. To find a good lambda, look at the RMSE your factorization gives on unseen data.

Best,
Sebastian

[1] http://www.hpl.hp.com/personal/Robert_Schreiber/papers/2008%20AAIM%20Netflix/netflix_aaim08(submitted).pdf
[2] http://research.yahoo.com/pub/2433

On 02.01.2013 17:27, Pat Ferrel wrote:
> What is the intuition regarding the choice or tuning of the ALS params?
>
> Job-Specific Options:
>
>   --lambda lambda                        regularization parameter
>   --implicitFeedback implicitFeedback    data consists of implicit feedback?
>   --alpha alpha                          confidence parameter (only used on
>                                          implicit feedback)
>   --numFeatures numFeatures              dimension of the feature space
>   --numIterations numIterations          number of iterations
>
> I've set up an iterative search for the lambda that gets the lowest rmse but
> what is the likely range? Can the range to search be determined from the data
> (all 1 or nothing in my case).
>
> I do plan to include implicit feedback (values less than 1) eventually. Not
> sure what this controls. I would think implicit feedback means preferences of
> varying strengths and that could be seen in the input so I'm unsure about
> this flag's meaning and use.
>
> No idea what the confidence factor should be or how it is used.
>
> Features? I suppose the number should be much less than the number of items
> but there is a rule of thumb that applies to SVD so I wonder if there is also
> one for ALS-WR?
>
> Iterations seems straightforward since the greater the number the better the
> results. I just need to see where the improvement is too small to warrant the
> time spent.
>
> The only parameter I wonder about for recommendfactorized is the maxRating? I
> assume it is just a scaling factor so all ratings are between 0 and
> maxRating? It doesn't do something unexpected like return anything >
> maxRating as maxRating? In my case I have prefs 0-1 so maxRating is 1? I
> imagine that the math might sometimes produce a rating higher than the max
> pref so this is to clean up the returned ratings range?
>
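P.S. Since you are already sweeping lambda against held-out RMSE, here is a rough, self-contained sketch of what that loop computes. This is plain Python, not Mahout code; it uses a single latent feature (numFeatures = 1) so each regularized least-squares solve reduces to a scalar, uses plain L2 regularization rather than the weighted-lambda variant from [1], and the toy ratings are invented for illustration:

```python
# Toy explicit-feedback ALS with one latent feature (NOT Mahout code).
# Sweeping lambda and picking the one with the lowest held-out RMSE is
# exactly the search described above; all data here is made up.

ratings = {          # (user, item) -> rating, the observed explicit feedback
    (0, 0): 5.0, (0, 1): 3.0,
    (1, 0): 4.0, (1, 2): 1.0,
    (2, 1): 2.0, (2, 2): 5.0,
}
held_out = {(0, 2): 2.0}   # unseen rating, used to estimate RMSE per lambda

num_users, num_items = 3, 3

def als_rmse(lambda_, num_iterations=20):
    u = [1.0] * num_users  # user factors (numFeatures = 1 for readability)
    v = [1.0] * num_items  # item factors
    for _ in range(num_iterations):
        # Fix V, solve the regularized least squares for each user:
        #   u_i = sum_j r_ij * v_j / (sum_j v_j^2 + lambda)
        for i in range(num_users):
            num = sum(r * v[j] for (ui, j), r in ratings.items() if ui == i)
            den = sum(v[j] ** 2 for (ui, j) in ratings if ui == i) + lambda_
            u[i] = num / den
        # Fix U, solve for each item symmetrically.
        for j in range(num_items):
            num = sum(r * u[i] for (i, ij), r in ratings.items() if ij == j)
            den = sum(u[i] ** 2 for (i, ij) in ratings if ij == j) + lambda_
            v[j] = num / den
    # RMSE on held-out data -- the quantity to minimize when sweeping lambda.
    sq = [(u[i] * v[j] - r) ** 2 for (i, j), r in held_out.items()]
    return (sum(sq) / len(sq)) ** 0.5

for lam in (0.01, 0.1, 1.0):
    print("lambda=%.2f  held-out RMSE=%.3f" % (lam, als_rmse(lam)))
```

With more features you would solve a small k-by-k linear system per user and per item instead of a scalar division, which is what the Hadoop job does in parallel.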