Seems like most people agree that ranking is more important than rating
prediction in most recommender deployments. RMSE was used for a long time with
cross-validation (partly because it was Netflix's choice during the
competition), but it is really a measure of total rating error. In the past
we've used mean average precision (MAP) as a measure of ranking quality. We
chose hold-out tests based on time, so roughly the most recent 10% of the data
was held out for cross-validation, and we measured MAP@n when tuning
parameters.

http://en.wikipedia.org/wiki/Information_retrieval#Mean_average_precision
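
For concreteness, a minimal Python sketch of the metric (the function names
and the AP@n normalization are my own choices here, not code from any
particular library):

    def average_precision_at_n(recommended, held_out, n):
        # AP@n for one user: average of precision@k over the ranks k that hit.
        hits, precision_sum = 0, 0.0
        for k, item in enumerate(recommended[:n], start=1):
            if item in held_out:
                hits += 1
                precision_sum += hits / float(k)
        denom = min(len(held_out), n)
        return precision_sum / denom if denom else 0.0

    def map_at_n(recs_by_user, held_out_by_user, n=10):
        # MAP@n: mean AP@n over users that actually have held-out items.
        users = [u for u in recs_by_user if held_out_by_user.get(u)]
        return sum(average_precision_at_n(recs_by_user[u], held_out_by_user[u], n)
                   for u in users) / len(users)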

For our data (ecommerce shopping data) most of the ALS tuning parameters had
very little effect on MAP. However, cooccurrence recommenders performed much
better on the same data. Unfortunately, comparing two algorithms with offline
tests is of questionable value; still, with nothing else to go on, we went
with the cooccurrence recommender.
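
For what it's worth, the ALS tuning sweep was nothing fancy; a sketch of the
kind of loop we mean (train_als, recommend, train_interactions, and
held_out_by_user are hypothetical stand-ins for whatever implementation and
holdout split you use; map_at_n is the function sketched above):

    from itertools import product

    # Hypothetical grid over ALS hyperparameters.
    best = None
    for rank, lam in product([10, 20, 50, 100], [0.01, 0.05, 0.1]):
        model = train_als(train_interactions, rank=rank, lam=lam)
        recs = {u: recommend(model, u, n=10) for u in held_out_by_user}
        score = map_at_n(recs, held_out_by_user, n=10)
        if best is None or score > best[0]:
            best = (score, rank, lam)
    print("best MAP@10 %.4f at rank=%d lambda=%.2f" % best)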

On Mar 30, 2014, at 12:47 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:

Niklas,

http://en.wikipedia.org/wiki/Cross-validation_(statistics)

http://statweb.stanford.edu/~tibs/sta306b/cvwrong.pdf



On Sun, Mar 30, 2014 at 12:41 PM, Niklas Ekvall <niklas.ekv...@gmail.com>wrote:

> Hello Sebastian, could you give a deeper explanation or point to an article
> that covers the subject?
> 
> Best regards, Niklas
> 
> 
> 2014-03-30 20:50 GMT+02:00 Sebastian Schelter <s...@apache.org>:
> 
>> Use k-fold cross-validation or hold-out tests for estimating the quality
>> of different parameter combinations.
>> 
>> --sebastian
>> 
>> 
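
A bare-bones sketch of the k-fold approach Sebastian suggests, splitting
(user, item, rating) triples into k folds; train_als and rmse are hypothetical
placeholders for your own training and error functions:

    import random

    def k_fold_splits(triples, k=5, seed=42):
        # Shuffle (user, item, rating) triples and yield k train/test splits.
        triples = list(triples)
        random.Random(seed).shuffle(triples)
        folds = [triples[i::k] for i in range(k)]
        for i in range(k):
            test = folds[i]
            train = [t for j, fold in enumerate(folds) if j != i for t in fold]
            yield train, test

    def cv_error(triples, rank, lam, k=5):
        # Mean held-out error over k folds for one (rank, lambda) combination.
        errs = []
        for train, test in k_fold_splits(triples, k):
            model = train_als(train, rank=rank, lam=lam)  # placeholder
            errs.append(rmse(model, test))                # placeholder
        return sum(errs) / float(k)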
>> On 03/30/2014 11:53 AM, Niklas Ekvall wrote:
>> 
>>> Hi,
>>> 
>>> My name is Niklas Ekvall and I have an implementation of the recommender
>>> algorithm "Large-scale Parallel Collaborative Filtering for the Netflix
>>> Prize", and now I'm wondering how to choose the number of features and
>>> lambda. Could any of you explain a stepwise strategy for choosing or
>>> optimizing these two parameters?
>>> 
>>> Best regards, Niklas
>>> 
>>> 
>>> 2014-03-27 19:07 GMT+01:00 j.barrett Strausser <
>>> j.barrett.straus...@gmail.com>:
>>> 
>>>> Thanks Ted,
>>>> 
>>>> Yes for the time problem. We tend to use aggregations of session data. So
>>>> instead of asking for user recommendations we do things like user+session
>>>> recommendations.
>>>> 
>>>> Of course, deciding when sessions start and stop isn't trivial. Ideally,
>>>> what I would want to do is time-weight views using a kernel or
>>>> convolution. That's a bit heavy, so we typically have a global model,
>>>> which is basically all preferences over time, plus these user+session
>>>> type models. We can then combine these at another level to give
>>>> recommendations based on what you like throughout time versus what you
>>>> have been doing recently.
>>>> 
>>>> -b
>>>> 
>>>> 
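
The time-weighting of views mentioned above can be approximated cheaply with
an exponential decay kernel rather than a full convolution; a sketch, with an
assumed (made-up) half-life:

    import math, time

    HALF_LIFE_DAYS = 14.0  # assumed half-life; tune for your traffic

    def decayed_weight(event_ts, now=None):
        # Weight an event by 2^(-age/half-life) so recent views count more.
        now = time.time() if now is None else now
        age_days = (now - event_ts) / 86400.0
        return math.exp(-age_days * math.log(2.0) / HALF_LIFE_DAYS)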
>>>> On Thu, Mar 27, 2014 at 1:59 PM, Ted Dunning <ted.dunn...@gmail.com>
>>>> wrote:
>>>> 
>>>>> For the poly-syllable challenged,
>>>>> 
>>>>> heteroscedasticity - degree of variation changes. This is common with
>>>>> counts because you expect the standard deviation of count data to be
>>>>> proportional to sqrt(n).
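
One standard way to cope with that sqrt(n) behavior is a variance-stabilizing
transform before fitting; a small sketch (the Anscombe variant, my choice of
transform, not anything prescribed in the thread):

    import math

    def stabilized(count):
        # Anscombe transform: for Poisson-ish counts this makes the variance
        # roughly constant (~1) instead of growing with the mean.
        return 2.0 * math.sqrt(count + 3.0 / 8.0)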
>>>>> 
>>>>> time inhomogeneity - changes in behavior over time. One way to handle
>>>>> this (roughly) is to first remove variation in personal and item means
>>>>> over time (if using ratings) and then to segment user histories into
>>>>> episodes. By including both short and long episodes you get some repair
>>>>> for changes in personal preference. A great example of how this
>>>>> works/breaks is Christmas music. On December 26th, you want to *stop*
>>>>> recommending this music, so it really pays to limit histories at this
>>>>> point. By having an episodic user session that starts around November
>>>>> and runs to Christmas, you can get good recommendations for seasonal
>>>>> songs and not pollute the rest of the universe.
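
Segmenting histories into episodes as described can be as simple as cutting on
inactivity gaps; a sketch, with an assumed 30-day gap (the threshold and the
gap-based rule are illustrative choices, not Ted's exact method):

    GAP_SECONDS = 30 * 86400  # assumed: a 30-day silence ends an episode

    def split_into_episodes(events):
        # events: iterable of (timestamp, item) pairs for one user.
        episodes, current, last_ts = [], [], None
        for ts, item in sorted(events):
            if last_ts is not None and ts - last_ts > GAP_SECONDS:
                episodes.append(current)
                current = []
            current.append((ts, item))
            last_ts = ts
        if current:
            episodes.append(current)
        return episodes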
>>>>> 
>>>>> 
>>>>> 
>>>>> On Thu, Mar 27, 2014 at 8:30 AM, j.barrett Strausser <
>>>>> j.barrett.straus...@gmail.com> wrote:
>>>>> 
>>>>>> For my team it has usually been heteroscedasticity and time
>>>>>> inhomogeneity.
>>>>>> 
>>>>>> On Thu, Mar 27, 2014 at 10:18 AM, Tevfik Aytekin
>>>>>> <tevfik.ayte...@gmail.com>wrote:
>>>>>> 
>>>>>> Interesting topic,
>>>>>>> Ted, can you give examples of those mathematical assumptions
>>>>>>> under-pinning ALS which are violated by the real world?
>>>>>>> 
>>>>>>> On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning <ted.dunn...@gmail.com>
>>>>>>> wrote:
>>>>>>> 
>>>>>>>> How can there be any other practical method?  Essentially all of the
>>>>>>>> mathematical assumptions under-pinning ALS are violated by the real
>>>>>>>> world.  Why would any mathematical consideration of the number of
>>>>>>>> features be much more than heuristic?
>>>>>>>> 
>>>>>>>> That said, you can make an information content argument.  You can
>>>>>>>> also make the argument that if you take too many features, it doesn't
>>>>>>>> much hurt, so you should always take as many as you can compute.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter <s...@apache.org>
>>>>>>>> wrote:
>>>>>>> 
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> does anyone know of a principled approach to choosing the number of
>>>>>>>>> features for ALS (other than cross-validation)?
>>>>>>>>> 
>>>>>>>>> --sebastian
>>>>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> https://github.com/bearrito
>>>>>> @deepbearrito
>>>> 
>>>> --
>>>> https://github.com/bearrito
>>>> @deepbearrito
