More contests at: http://challenge.gov/NIH/132-nlm-show-off-your-apps-innovative-uses-of-nlm-information

On May 15, 2011, at 10:25 PM, Alex Kozlov wrote:

> On Sat, May 14, 2011 at 9:11 PM, Jake Mannix <[email protected]> wrote:
>
>> Due to the whole Netflix data lawsuit, the training data is synthetic,
>> which puts the contestants at a disadvantage, and another interesting
>> fact: runtime performance is at issue: your code will be run *live*,
>> with your model being used to produce recommendations with a hard
>> timeout of 50ms - if you miss this more than 20% of the time, you fail
>> to progress to the end of the semi-final round.
>>
>
> If the dataset is synthetic (and I assume not random) is the goal to just
> guess the model that generated the dataset? Assuming it performs well, how
> far is the 'synthetic' model from the actual customer behavior so that there
> are no 'surprises' when it runs 'live'?
>
> Potentially, there are more avenues for a lawsuit than in the Netflix case
> since money is involved (just a thought).
>
> Alex K

--------------------------------------------
Grant Ingersoll
Join the LUCENE REVOLUTION
Lucene & Solr User Conference
May 25-26, San Francisco
www.lucenerevolution.org
