Thanks.

Netflix & Yahoo KDD were my first choice, but are gone. It did not
occur to me that stashing such things away would be wise; packrat
though I am.

The purpose is testing large user/item or document/term databases.

On Fri, Jul 8, 2011 at 12:44 AM, Sebastian Schelter <s...@apache.org> wrote:
> Another dataset to play with is this compilation of song listenings scraped
> from the last.fm API:
>
> http://mtg.upf.edu/node/1671.
>
> Should include about 20M ratings.
>
> --sebastian
>
> On 08.07.2011 09:17, Sean Owen wrote:
>>
>> The link is http://www.occamslab.com/petricek/data/
>>
>> The KDD or Netflix data are plenty big to play with. How big is big for
>> your purpose?
>>
>> On Fri, Jul 8, 2011 at 7:05 AM, web service <wbs...@gmail.com> wrote:
>>
>>> Is it taken offline as well ?
>>>
>>> On Thu, Jul 7, 2011 at 10:40 PM, Alex Kozlov <ale...@cloudera.com> wrote:
>>>
>>>> There is still a libimseti dataset
>>>> http://www.occamslab.com/petricek/data with 17,359,346 ratings. People
>>>> are scared after the Netflix lawsuit.
>>>>
>>>> On Thu, Jul 7, 2011 at 10:17 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>>>
>>>>> Those are both reasonably large, but not commercial in scale.
>>>>>
>>>>> At Veoh, we had about 10 non-zero elements in our raw data. I think
>>>>> Netflix has 100 million.
>>>>>
>>>>> On Thu, Jul 7, 2011 at 8:05 PM, Lance Norskog <goks...@gmail.com> wrote:
>>>>>
>>>>>> Which available recommendation datasets are considered "large" by
>>>>>> Mahout testing standards? Yahoo KDD Cup is offline, and the Netflix
>>>>>> data went under a cloud...
>>>>>>
>>>>>> --
>>>>>> Lance Norskog
>>>>>> goks...@gmail.com
>>>>>>
>>>>>
>>>>
>>>
>>
>
>

-- 
Lance Norskog
goks...@gmail.com
