Just wondering: what does Mahout do for user/item pairs that do not have a rating? Does it fill them in with some average value? Fill with zeros? Something else?
On Apr 25, 2012, at 4:26 PM, Sean Owen wrote:

> I don't know what the particular issue is; I imagine there's something
> that needs some optimization in there.
>
> If you're definitely interested in ALS and recommenders, I don't feel
> bad promoting our attempts to commercialize Mahout: Myrrix
> (http://myrrix.com) is exactly an ALS-based recommender, and I know it
> will crunch this data set into a model in 16 seconds on my laptop.
> This part of it is also free / open source.
>
> Sean
>
> On Wed, Apr 25, 2012 at 9:28 PM, Daniel Quach <danqu...@cs.ucla.edu> wrote:
>> I tried it again with 30 features and 3 iterations on the same data set;
>> it's still running after 10+ minutes just to factorize for the
>> SVDRecommender and has yet to complete. Perhaps it is my machine?
>>
>> I am running on a MacBook Air with 4GB of RAM and an Intel i5 processor,
>> and I specified 2GB of memory for Java (-Xmx2048M).
>>
>> On Apr 25, 2012, at 12:25 PM, Sean Owen wrote:
>>
>>> There's no hard limit; the limit you would run into is memory,
>>> if anything.
>>>
>>> This sounds slow. It may be that this implementation could use some
>>> optimization somewhere. Are you running many iterations or using a
>>> large number of features?
>>>
>>> I have a different ALS implementation that finishes this data set (3
>>> iterations, 30 features -- quick and dirty) in more like 20 seconds.
>>> Here's some info on a run on a much larger data set, using ALS, for
>>> comparison: http://myrrix.com/example-performance/
>>>
>>> On Wed, Apr 25, 2012 at 8:17 PM, Daniel Quach <danqu...@cs.ucla.edu> wrote:
>>>> Regarding the factorization (I am using ALSWRFactorizer), is there a
>>>> limit to how large a data set can be factorized?
>>>>
>>>> I am trying to apply it to the 100K-rating data set from GroupLens
>>>> (approximately 1000 users by 1600 movies).
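For context on the ALS being discussed, the core of alternating least squares is easy to sketch. Below is a minimal rank-1 toy version in Python (hypothetical data; nothing like Mahout's actual ALSWRFactorizer in scale or code). It also illustrates one answer to the question at the top of the thread: explicit-feedback ALS of this form sums only over observed user/item pairs rather than filling in missing ratings.

```python
# Minimal rank-1 ALS on explicit ratings -- a sketch, NOT Mahout's ALSWRFactorizer.
# ratings[(u, i)] = observed rating; missing pairs are simply absent,
# and the update sums run only over the observed entries.
ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0,
           (1, 2): 1.0, (2, 1): 2.0, (2, 2): 5.0}
n_users, n_items, lam = 3, 3, 0.1

user_f = [1.0] * n_users   # user factors (rank 1, so one scalar per user)
item_f = [1.0] * n_items   # item factors (one scalar per item)

for _ in range(10):        # alternate: fix item factors, solve users; then reverse
    for u in range(n_users):
        num = sum(item_f[i] * r for (uu, i), r in ratings.items() if uu == u)
        den = sum(item_f[i] ** 2 for (uu, i), _ in ratings.items() if uu == u) + lam
        user_f[u] = num / den          # regularized least-squares solution for user u
    for i in range(n_items):
        num = sum(user_f[u] * r for (u, ii), r in ratings.items() if ii == i)
        den = sum(user_f[u] ** 2 for (u, ii), _ in ratings.items() if ii == i) + lam
        item_f[i] = num / den          # and symmetrically for item i

def predict(u, i):
    # a prediction exists even for pairs that were never rated
    return user_f[u] * item_f[i]
```

With rank-k factors each update becomes a small k-by-k linear solve per user/item, but the alternation is the same.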
>>>> It's been running for at least 10 minutes now; I am getting the feeling
>>>> it might not be wise to apply the factorizer to some of GroupLens's
>>>> larger data sets...
>>>>
>>>> On Apr 18, 2012, at 1:09 PM, Sean Owen wrote:
>>>>
>>>>> This paper doesn't address how to compute the SVD. There are two
>>>>> approaches implemented with SVDRecommender. One computes an SVD, one
>>>>> doesn't :) Really it ought to be called something like
>>>>> MatrixFactorizationRecommender. The SVD factorizer uses a fairly
>>>>> simple expectation-maximization approach; I don't know how well it
>>>>> scales. The other factorizer uses alternating least squares.
>>>>>
>>>>> What you come out with are not the 3 matrices of an SVD, but 2. The "S"
>>>>> matrix of singular values is folded into the left/right singular
>>>>> vectors.
>>>>>
>>>>> So to answer your question: the prediction expression is essentially
>>>>> the same, with two caveats:
>>>>>
>>>>> 1. The paper shows the prediction as the product of U, sqrt(S),
>>>>> sqrt(S), and V. What you get out of the factorizer are really the "U"
>>>>> and "V" with the two sqrt(S) factors already multiplied in. The product
>>>>> comes out the same; there is a conceptual difference, I suppose, but
>>>>> not a practical one. In both cases you're really just multiplying the
>>>>> matrix factors back together to make the predictions.
>>>>>
>>>>> 2. That model subtracts the customer's average rating at the beginning
>>>>> and adds it back at the end. The SVDRecommender doesn't do that,
>>>>> because, quite crucially, it turns sparse data into dense data (all
>>>>> the zeroes become non-zero), and that crushes scalability.
>>>>>
>>>>> So the answer is "mostly the same thing," yes. In fact this is broadly
>>>>> how all matrix factorization approaches work.
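Concretely, caveat 1 above means a prediction reduces to the dot product of one user's row of the first factor matrix with one item's row of the second. A tiny sketch with made-up 2-feature factors (not real Mahout output):

```python
# Sketch: predicting from two factor matrices with sqrt(S) already folded in.
# U and V are hypothetical 2-feature factors for 2 users and 2 items.
U = [[1.2, 0.3],    # user 0
     [0.4, 1.1]]    # user 1
V = [[0.9, 0.2],    # item 0
     [0.1, 1.5]]    # item 1

def predict(u, i):
    # estimated rating = dot product of user u's row and item i's row
    return sum(a * b for a, b in zip(U[u], V[i]))
```

Whether sqrt(S) is kept separate or folded into U and V, the product (and hence every prediction) is identical.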
>>>>> On Wed, Apr 18, 2012 at 2:49 PM, Daniel Quach <danqu...@cs.ucla.edu> wrote:
>>>>>> I am basing my knowledge on this paper:
>>>>>> http://www.grouplens.org/papers/pdf/webKDD00.pdf
>>>>>>
>>>>>> Your book provided algorithms for user-based, item-based, and
>>>>>> slope-one recommendation, but none for the SVDRecommender (I'm
>>>>>> guessing because it was experimental).
>>>>>>
>>>>>> Does the SVDRecommender just compute the resulting matrices and follow
>>>>>> a formula similar to the one at the top of page 5 of the linked paper?
>>>>>> I think I understand the process of SVD, but I'm wondering how exactly
>>>>>> it's applied to obtain recommendations in Mahout's case.
>>>>>>
>>>>>> On Apr 18, 2012, at 12:13 PM, Sean Owen wrote:
>>>>>>
>>>>>>> Yes, you could call it a model-based approach. I suppose I was
>>>>>>> thinking more of Bayesian implementations when I wrote that sentence.
>>>>>>>
>>>>>>> SVD is the singular value decomposition -- are you asking what the
>>>>>>> SVD is, what matrix factorization is, or something about specific
>>>>>>> code here? You can look up the SVD online.
>>>>>>>
>>>>>>> On Wed, Apr 18, 2012 at 12:49 PM, Daniel Quach <danqu...@cs.ucla.edu> wrote:
>>>>>>>> I had originally thought the experimental SVDRecommender in Mahout
>>>>>>>> was a model-based collaborative filtering technique. Looking at the
>>>>>>>> book "Mahout in Action", it mentions that model-based recommenders
>>>>>>>> are a future goal for Mahout, which implies to me that the
>>>>>>>> SVDRecommender is not considered model-based.
>>>>>>>>
>>>>>>>> How exactly does the SVDRecommender work in Mahout? I can't seem to
>>>>>>>> find any description of the algorithm underneath it.
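For comparison with caveat 2 earlier in the thread, where the paper's model removes each customer's mean rating before factorizing and restores it at prediction time, the prediction step looks roughly like this (means and factors made up for illustration; the paper's exact notation differs):

```python
# Sketch of "subtract the user mean, factorize, add the mean back".
# All numbers here are hypothetical, not derived from any real data set.
user_mean = [3.5, 2.0]            # per-user average rating, computed up front
U = [[0.8, -0.1],                 # user factors (sqrt(S) folded in)
     [0.2, 0.6]]
V = [[0.5, 0.3],                  # item factors (sqrt(S) folded in)
     [-0.2, 0.9]]

def predict(u, i):
    # factor product estimates the *deviation* from the user's mean,
    # so the mean is added back at the end
    return user_mean[u] + sum(a * b for a, b in zip(U[u], V[i]))
```

As Sean notes, this normalization makes every empty cell non-zero, which is exactly why the SVDRecommender avoids it on sparse data.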