The link is http://www.occamslab.com/petricek/data/
The KDD or Netflix data are plenty big to play with. How big is big for your purpose?

On Fri, Jul 8, 2011 at 7:05 AM, web service <wbs...@gmail.com> wrote:
> Is it taken offline as well ?
>
> On Thu, Jul 7, 2011 at 10:40 PM, Alex Kozlov <ale...@cloudera.com> wrote:
>
> > There is still a libimseti dataset
> > http://www.occamslab.com/petricek/data with 17,359,346 ratings. People
> > are scared after the Netflix lawsuit.
> >
> > On Thu, Jul 7, 2011 at 10:17 PM, Ted Dunning <ted.dunn...@gmail.com>
> > wrote:
> >
> > > Those are both reasonably large, but not commercial in scale.
> > >
> > > At Veoh, we had about 10 non-zero elements in our raw data. I think
> > > Netflix has 100 million.
> > >
> > > On Thu, Jul 7, 2011 at 8:05 PM, Lance Norskog <goks...@gmail.com> wrote:
> > >
> > > > What recommendation datasets, that are available, are considered
> > > > "large" by Mahout testing standards? Yahoo KDD Cup is offline, the
> > > > Netflix data went under a cloud...
> > > >
> > > > --
> > > > Lance Norskog
> > > > goks...@gmail.com