It's actually a pretty interesting challenge, once you get past the constraints of their API: you're optimizing explicitly for revenue-per-session, taking past sessions as input, and the data includes the kinds of practical things you'd want: each session is keyed by a userId (which naturally captures repeat customers), products have prices, and there are already categoryId labels.
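To make that objective concrete, here's a toy sketch of what "optimize for revenue-per-session" might look like at ranking time: instead of sorting candidates by raw predicted relevance, you sort by expected revenue (price times estimated purchase probability). All the names and numbers below are mine, not the contest's API:

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch: rank candidate products by expected revenue
// (price * estimated purchase probability) instead of raw relevance.
public class RevenueRanker {
    static final class Candidate {
        final String productId;
        final double price;        // catalog price
        final double purchaseProb; // whatever your model estimates
        Candidate(String productId, double price, double purchaseProb) {
            this.productId = productId;
            this.price = price;
            this.purchaseProb = purchaseProb;
        }
        double expectedRevenue() { return price * purchaseProb; }
    }

    // Return the top-k product ids by expected revenue, highest first.
    static List<String> rank(List<Candidate> candidates, int k) {
        return candidates.stream()
            .sorted(Comparator.comparingDouble(Candidate::expectedRevenue).reversed())
            .limit(k)
            .map(c -> c.productId)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Candidate> cands = Arrays.asList(
            new Candidate("cheapLikely",    10.0, 0.50),  // expected 5.0
            new Candidate("priceyUnlikely", 200.0, 0.04), // expected 8.0
            new Candidate("mid",            50.0, 0.10)); // expected 5.0
        System.out.println(rank(cands, 2)); // [priceyUnlikely, cheapLikely]
    }
}
```

The point of the toy numbers: a low-probability expensive item can still beat a high-probability cheap one on expected revenue, which is exactly where this metric diverges from plain rating prediction.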
Because of the whole Netflix data lawsuit, the training data is synthetic, which puts contestants at a disadvantage. Another interesting wrinkle: runtime performance matters. Your code will be run *live*, with your model producing recommendations under a hard 50ms timeout - if you miss it more than 20% of the time, you fail to progress to the end of the semi-final round. You're allowed to use open-source Apache-licensed code (and are in fact *required* to license your code under the ASL to compete), but their APIs, while extraordinarily similar to Hadoop and Mahout/Taste, are fixed, so you can't just do a drop-in replacement.

On Sat, May 14, 2011 at 6:45 PM, Grant Ingersoll <[email protected]> wrote:

> Ah, you are right. Read too quickly.
>
> On May 14, 2011, at 6:32 PM, Jake Mannix wrote:
>
> > You're allowed to be an individual, or a team not associated with an
> > academic institution, according to what I'm reading on that page...
> >
> > On Sat, May 14, 2011 at 3:13 PM, Grant Ingersoll <[email protected]> wrote:
> >
> > > Ah, never mind. Academics only. :-(
> > >
> > > On May 14, 2011, at 5:34 PM, Danny Bickson wrote:
> > >
> > > > Another interesting collaborative filtering contest with a big prize of
> > > > 1M $.
> > > > See http://overstockreclabprize.com/
> > > >
> > > > - Danny Bickson
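On the 50ms point above: one standard way to stay inside a hard per-request deadline is to run the real recommender under a Future with a timeout and fall back to something precomputed (say, global top sellers) when the budget is blown. A minimal sketch, assuming nothing about the contest's actual API - all names here are made up:

```java
import java.util.*;
import java.util.concurrent.*;

// Illustrative only: guard an expensive recommendation call with a hard
// deadline, returning a cheap precomputed fallback instead of missing it.
public class DeadlineGuard {
    private final ExecutorService pool = Executors.newSingleThreadExecutor();
    private final List<String> fallback; // e.g. precomputed best sellers

    public DeadlineGuard(List<String> fallback) { this.fallback = fallback; }

    public List<String> recommend(Callable<List<String>> model, long budgetMs) {
        Future<List<String>> f = pool.submit(model);
        try {
            return f.get(budgetMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | InterruptedException | ExecutionException e) {
            f.cancel(true);  // interrupt the slow computation
            return fallback; // never answer empty-handed past the deadline
        }
    }

    public void shutdown() { pool.shutdownNow(); }

    public static void main(String[] args) {
        DeadlineGuard guard = new DeadlineGuard(Arrays.asList("popular1", "popular2"));
        // A deliberately slow "model" that sleeps well past the 50ms budget:
        List<String> recs = guard.recommend(() -> {
            Thread.sleep(500);
            return Arrays.asList("personalized");
        }, 50);
        System.out.println(recs); // falls back: [popular1, popular2]
        guard.shutdown();
    }
}
```

With a 20% miss budget, a fallback like this turns "miss" into "serve a worse answer", which is a very different failure mode.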
