Hey guys,

Here is a dataset that sort of fits the bill; it is probably the closest thing out there. I got it extracted from BestBuy. It is more focused on search than on recommendations, but it could probably double for a recs problem.
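To make the "could double for a recs problem" idea a bit more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not anything BestBuy or Kaggle provides: it treats every click as an implicit (user, sku) preference, ignores the query text, and scores unseen skus by simple co-occurrence counts. The records and the field order (user, query, sku) are made up; the real files may use different column names.

```python
from collections import Counter, defaultdict

# Hypothetical click records: (user, query, sku). All values are invented.
clicks = [
    ("u1", "xbox", "sku_100"),
    ("u1", "xbox controller", "sku_200"),
    ("u2", "xbox 360", "sku_100"),
    ("u2", "kinect", "sku_300"),
    ("u3", "controller", "sku_200"),
    ("u3", "xbox", "sku_100"),
]

def user_item_sets(records):
    """Collapse click records into implicit feedback: user -> set of clicked skus."""
    prefs = defaultdict(set)
    for user, _query, sku in records:
        prefs[user].add(sku)
    return dict(prefs)

def cooccurrence(prefs):
    """For each sku, count how many users also clicked each other sku."""
    counts = defaultdict(Counter)
    for skus in prefs.values():
        for a in skus:
            for b in skus:
                if a != b:
                    counts[a][b] += 1
    return counts

def recommend(user, prefs, counts, n=3):
    """Rank skus the user has not clicked by summed co-occurrence with clicked ones."""
    seen = prefs.get(user, set())
    scores = Counter()
    for sku in seen:
        for other, c in counts[sku].items():
            if other not in seen:
                scores[other] += c
    return [sku for sku, _ in scores.most_common(n)]

prefs = user_item_sets(clicks)
counts = cooccurrence(prefs)
print(recommend("u2", prefs, counts))  # ['sku_200']
```

Raw co-occurrence is just the simplest thing that runs; a real experiment would want a proper similarity measure and would probably use the query text too.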
Basically, each userid is mapped to a query that resulted in a click on a particular sku (product_id). These are the real skus, so they can be mapped back to real products in BestBuy's products API (the product data is also provided in bulk on Kaggle):

https://bbyopen.com/api-profiles/products-api
http://www.kaggle.com/c/acm-sf-chapter-hackathon-big/data

On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> MAJOR may be too tame a word.
>
> Furthermore there are several enhancements the community could make to
> support retail data and retail recommenders. For one thing without public
> data a *public* cross-recommender will probably not get built.
>
> The cross-recommender needs to separate actions types and use them in
> slightly different ways so it is important to have a data set with user's
> purchases but also views, add-to-cart, impressions, purchases in groups
> (shopping carts)--whatever events are available with anonymized user IDs.
>
> This data set would be significant in getting new techniques into the
> community and therefore back to people like you.
>
> On Apr 15, 2013, at 9:49 AM, Koobas <koo...@gmail.com> wrote:
>
> Definitely of MAJOR interest.
> I am sure it would also draw all kinds of desired attention to your
> business.
> Movie Lens is way too small to be meaningful any more.
> Wikipedia articles and Stackoverflow tags are not retail data!
> By all means, post some real retail data, if you can.
> Meaningful sizes would be appreciated: millions of customers,
> thousands - tens of thousands products.
>
>
> On Mon, Apr 15, 2013 at 12:27 PM, Robin Morris <r...@baynote.com> wrote:
>
> > I asked management here a while ago whether there would be a problem with
> > releasing an anonymized set of data from one of our retail customers, and
> > didn't get too much push-back. If this is something that would be of
> > major interest, I can ask again and see whether there's something we can
> > put out as a community resource.
> >
> > Robin
> >
> >
> > On 4/10/13 8:37 PM, "Pat Ferrel" <p...@occamsmachete.com> wrote:
> >
> >> I have retail data but can't publish results from it. If I could get a
> >> public sample I'd share how the technique worked out.
> >>
> >> Not sure how to simulate this data. It has the important characteristic
> >> that every purchase is also a view but not the other way around and Ted's
> >> technique is a way to scrub the views that don't lead to purchases. All
> >> these are implicit preferences but that's not the important part for this
> >> technique.
> >>
> >> On Apr 10, 2013, at 4:15 PM, Koobas <koo...@gmail.com> wrote:
> >>
> >> Retail data may be hard to impossible, but one can improvise.
> >> It seems to be fairly common to use Wikipedia articles (Myrrix, GraphLab).
> >> Another idea is to use StackOverflow tags (Myrrix examples).
> >> Although they are only good for emulating implicit feedback.
> >>
> >>
> >> On Wed, Apr 10, 2013 at 6:48 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> >>
> >>> On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel <p...@occamsmachete.com> wrote:
> >>>
> >>>> Does anyone know of a public data set that provides things like views
> >>>> and purchases?
> >>>
> >>> I don't.