Hi,

My name is Niklas Ekvall. I have an implementation of the recommender algorithm from "Large-scale Parallel Collaborative Filtering for the Netflix Prize", and I'm wondering how to choose the number of features and lambda. Could any of you help by explaining a stepwise strategy for choosing or optimizing these two parameters?

Best regards,
Niklas

2014-03-27 19:07 GMT+01:00 j.barrett Strausser <j.barrett.straus...@gmail.com>:

> Thanks Ted,
>
> Yes for the time problem. We tend to use aggregations of session data. So
> instead of asking for user recommendations we do things like user+sessions
> recommendations.
>
> Of course, deciding when sessions start and stop isn't trivial. Ideally,
> what I would want to do is time-weight views using a kernel or convolution.
> That's a bit heavy, so we typically have a global model that is
> basically all preferences over time, and then these user+session type models.
> We can then combine these at another level to give recommendations based on
> what you like throughout time versus what you have been doing recently.
>
> -b
>
> On Thu, Mar 27, 2014 at 1:59 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>
> > For the poly-syllable challenged,
> >
> > heteroscedasticity - degree of variation changes. This is common with
> > counts because you expect the standard deviation of count data to be
> > proportional to sqrt(n).
> >
> > time inhomogeneity - changes in behavior over time. One way to handle this
> > (roughly) is to first remove variation in personal and item means over time
> > (if using ratings) and then to segment user histories into episodes. By
> > including both short and long episodes you get some repair for changes in
> > personal preference. A great example of how this works/breaks is Christmas
> > music. On December 26th, you want to *stop* recommending this music, so it
> > really pays to limit histories at this point. By having an episodic user
> > session that starts around November and runs to Christmas, you can get good
> > recommendations for seasonal songs and not pollute the rest of the
> > universe.
> >
> > On Thu, Mar 27, 2014 at 8:30 AM, j.barrett Strausser <j.barrett.straus...@gmail.com> wrote:
> >
> > > For my team it has usually been heteroscedasticity and time inhomogeneity.
> > >
> > > On Thu, Mar 27, 2014 at 10:18 AM, Tevfik Aytekin <tevfik.ayte...@gmail.com> wrote:
> > >
> > > > Interesting topic,
> > > > Ted, can you give examples of those mathematical assumptions
> > > > underpinning ALS which are violated by the real world?
> > > >
> > > > On Thu, Mar 27, 2014 at 3:43 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> > > > > How can there be any other practical method? Essentially all of the
> > > > > mathematical assumptions underpinning ALS are violated by the real world.
> > > > > Why would any mathematical consideration of the number of features be
> > > > > much more than heuristic?
> > > > >
> > > > > That said, you can make an information content argument. You can also
> > > > > make the argument that if you take too many features, it doesn't hurt
> > > > > much, so you should always take as many as you can compute.
> > > > >
> > > > > On Thu, Mar 27, 2014 at 6:33 AM, Sebastian Schelter <s...@apache.org> wrote:
> > > > >
> > > > >> Hi,
> > > > >>
> > > > >> does anyone know of a principled approach to choosing the number of
> > > > >> features for ALS (other than cross-validation)?
> > > > >>
> > > > >> --sebastian
> > >
> > > --
> > > https://github.com/bearrito
> > > @deepbearrito
>
> --
> https://github.com/bearrito
> @deepbearrito
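
On the question at the top of this thread (how to pick the number of features and lambda): the cross-validation route mentioned in the quoted messages can be sketched as a plain grid search scored on held-out ratings. The code below is only an illustration under stated assumptions, not the implementation from the paper or from any library. The function names (`als_wr`, `grid_search`, `make_synthetic`), the synthetic data, and the grid values are all made up for the example. The regularization term is scaled by the number of ratings per user or item, which is my reading of the weighted-lambda scheme in the paper.

```python
import numpy as np

def als_wr(R, mask, rank, lam, iters=15, seed=0):
    """Fit R ~ U @ V.T on entries where mask is True, using alternating
    least squares with the penalty scaled by ratings per row/column."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    eye = np.eye(rank)
    for _ in range(iters):
        # Fix V, solve a small ridge regression per user.
        for u in range(n_users):
            idx = mask[u]
            n = idx.sum()
            if n:
                Vi = V[idx]
                U[u] = np.linalg.solve(Vi.T @ Vi + lam * n * eye, Vi.T @ R[u, idx])
        # Fix U, solve a small ridge regression per item.
        for i in range(n_items):
            idx = mask[:, i]
            n = idx.sum()
            if n:
                Uj = U[idx]
                V[i] = np.linalg.solve(Uj.T @ Uj + lam * n * eye, Uj.T @ R[idx, i])
    return U, V

def rmse(R, mask, U, V):
    """Root mean squared error over the entries selected by mask."""
    err = (R - U @ V.T)[mask]
    return float(np.sqrt(np.mean(err ** 2)))

def grid_search(R, train_mask, valid_mask, ranks, lams):
    """Train on train_mask for every (rank, lam) pair and score on valid_mask."""
    results = {}
    for rank in ranks:
        for lam in lams:
            U, V = als_wr(R, train_mask, rank, lam)
            results[(rank, lam)] = rmse(R, valid_mask, U, V)
    best = min(results, key=results.get)
    return best, results

def make_synthetic(n_users=60, n_items=40, true_rank=3, density=0.5, seed=1):
    """Hypothetical low-rank ratings plus noise, with disjoint train/validation masks."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(n_users, true_rank))
    B = rng.normal(size=(n_items, true_rank))
    R = A @ B.T + 0.1 * rng.normal(size=(n_users, n_items))
    observed = rng.random((n_users, n_items)) < density
    in_valid = rng.random((n_users, n_items)) < 0.2
    return R, observed & ~in_valid, observed & in_valid

if __name__ == "__main__":
    R, train_mask, valid_mask = make_synthetic()
    best, results = grid_search(R, train_mask, valid_mask,
                                ranks=[2, 3, 8], lams=[0.01, 0.05, 0.1, 0.5])
    for (rank, lam), score in sorted(results.items()):
        print(f"rank={rank:2d} lambda={lam:<5} validation RMSE={score:.4f}")
    print("best (rank, lambda):", best)
```

One possible stepwise reading of the advice quoted above: fix the number of features at the largest value you can afford to compute, sweep lambda over a coarse logarithmic grid, then refine lambda around the best value and only afterwards check whether fewer features give a similar validation error. Keep lambda strictly positive in this sketch, since a user or item with fewer ratings than features would otherwise produce a singular system.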