Ratings, and more generally "parallel universe", "dual space", or "dyadic" data (though "dyadic" also means other things): correspondences between samples in two different parallel spaces.
A mail corpus has several kinds of gleanable knowledge: word/subject-line correspondences, authoritative vs. conversational mail, reply-to as a one-way relationship within the same space, time-series aspects, and more. It would be a good base for an examples/ set covering several algorithms and interpreting all the concepts. That's a trimester course.

Lance

On Fri, Jul 8, 2011 at 5:31 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> It's not a traditional ratings corpus, but the ASF mail archives I put up all
> have clear provenance and are freely available, and I don't think it is too
> hard to make a recommender problem out of them, likely based on the replies.
> There are 6m+ items in it. And now that Amazon has free inbound, I may well
> set up a job to do it on a more regular basis, perhaps quarterly.
>
> -Grant
>
> On Jul 7, 2011, at 11:05 PM, Lance Norskog wrote:
>
>> What recommendation datasets, that are available, are considered
>> "large" by Mahout testing standards? Yahoo KDD Cup is offline, the
>> Netflix data went under a cloud...
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>
> --------------------------
> Grant Ingersoll
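One way to read "a recommender problem ... based on the replies" is to treat each reply as an implicit preference of the replier for the author of the parent message. The following is a minimal sketch under that assumption, not anything from the ASF archive tooling. It uses only Python's standard `email` package. The function names `reply_counts` and `to_taste_csv` are hypothetical, and it simplifies by assuming `In-Reply-To` holds a single Message-ID. The output is `userID,itemID,preference` lines, the CSV layout Mahout's `FileDataModel` reads. Addresses are mapped to numeric IDs because `FileDataModel` expects numeric IDs. Repliers and authors share one ID namespace, since reply-to is a one-way relationship within the same space.

```python
import csv
import io
from collections import Counter
from email.message import Message
from email.utils import parseaddr
from typing import Dict, Iterable


def reply_counts(messages: Iterable[Message]) -> Counter:
    """Count (replier, original_author) pairs via In-Reply-To headers.

    Assumption: In-Reply-To contains exactly one Message-ID.
    """
    msgs = list(messages)
    # First pass: map each Message-ID to its sender's (lowercased) address.
    author_of: Dict[str, str] = {}
    for msg in msgs:
        mid = (msg.get("Message-ID") or "").strip()
        sender = parseaddr(msg.get("From", ""))[1].lower()
        if mid and sender:
            author_of[mid] = sender
    # Second pass: each reply counts as one implicit "vote" by the replier
    # for the parent message's author; self-replies are ignored.
    counts: Counter = Counter()
    for msg in msgs:
        parent = (msg.get("In-Reply-To") or "").strip()
        replier = parseaddr(msg.get("From", ""))[1].lower()
        author = author_of.get(parent)
        if replier and author and replier != author:
            counts[(replier, author)] += 1
    return counts


def to_taste_csv(counts: Counter) -> str:
    """Render counts as userID,itemID,preference lines with numeric IDs.

    Users and items are both mail addresses, so they share one ID space.
    """
    ids: Dict[str, int] = {}

    def num(addr: str) -> int:
        # Assign IDs in first-seen order.
        return ids.setdefault(addr, len(ids))

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for (replier, author), n in sorted(counts.items()):
        writer.writerow([num(replier), num(author), n])
    return out.getvalue()
```

To run it against an archive in mbox format, pass `mailbox.mbox("path/to/archive.mbox")` (path hypothetical) to `reply_counts`. Iterating a `mailbox.mbox` yields message objects that subclass `email.message.Message`. Raw reply counts are only one choice of preference value. A boolean (replied / did not reply) model is another.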