Re: Available datasets for recommendations

Lance Norskog Fri, 08 Jul 2011 15:04:29 -0700

Ratings are one case of something more general, variously called
"parallel universe", "dual space", or "dyadic" data (though "dyadic"
also means other things): correspondences between samples in two
different parallel spaces.


A mail corpus has several kinds of gleanable knowledge: word/subject-line
correspondences, authoritative vs. conversational mail, reply-to as a
one-way relationship within a single space, time-series aspects, and
more. It would be a good base for an examples/ set covering several
algorithms and interpreting all of these concepts. That's a trimester
course.
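
As a rough illustration of the two views above, here is a minimal Python
sketch. It assumes each message has already been reduced to a tuple of
(message id, author, in-reply-to id, thread id); that layout, the function
names, and the toy data are all made up for illustration and do not come
from the ASF archives or from Mahout.

```python
from collections import Counter

# Hypothetical toy records: (message_id, author, in_reply_to, thread_id).
# In a real archive these would be parsed from the mail headers.
messages = [
    ("m1", "alice", None, "t1"),
    ("m2", "bob", "m1", "t1"),
    ("m3", "carol", "m2", "t1"),
    ("m4", "bob", None, "t2"),
    ("m5", "alice", "m4", "t2"),
    ("m6", "bob", "m5", "t2"),
]


def reply_preferences(messages):
    """Two-space view: count replies per (author, thread) pair.

    The count acts as an implicit preference strength of a person
    (one space) for a thread (the other space).
    """
    counts = Counter()
    for _msg_id, author, in_reply_to, thread_id in messages:
        if in_reply_to is not None:  # thread starters are not replies
            counts[(author, thread_id)] += 1
    return counts


def reply_edges(messages):
    """Same-space view: directed (replier, replied-to author) pairs."""
    author_of = {msg_id: author for msg_id, author, _parent, _thread in messages}
    return [
        (author, author_of[parent])
        for _msg_id, author, parent, _thread in messages
        if parent in author_of  # skips None and parents missing from the data
    ]


if __name__ == "__main__":
    for (author, thread), n in sorted(reply_preferences(messages).items()):
        print(author, thread, n)
    print(reply_edges(messages))
```

The (author, thread, count) triples have the user, item, preference shape
that a ratings-style recommender expects, while the edge list keeps the
direction of each reply, so "bob replied to alice" does not imply the
reverse.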

Lance

On Fri, Jul 8, 2011 at 5:31 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> It's not a traditional ratings corpus, but the ASF mail archives I put up all 
> have clear provenance and are freely available and I don't think it is too 
> hard to make a recommender problem out of them, likely based on the replies.  
> There are 6m+ items in it.  And now that Amazon has free inbound, I may well 
> set up a job to do it on a more regular basis, perhaps quarterly.
>
> -Grant
>
> On Jul 7, 2011, at 11:05 PM, Lance Norskog wrote:
>
>> What recommendation datasets, that are available, are considered
>> "large" by Mahout testing standards? Yahoo KDD Cup is offline, the
>> Netflix data went under a cloud...
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
> --------------------------
> Grant Ingersoll
>
>
>
>



-- 
Lance Norskog
goks...@gmail.com
