Generating similarity file(s) for item recommender?

2012-07-03 Thread Matt Mitchell
Hi, I'm just beginning to play with the Mahout recommendation framework. I'm wondering if I could get some advice for implementing this thing. My data comes from a web app's event logs, where the user accounts are only persisted for 30 days (cookie data). I'm thinking the session ID (in the co
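Mahout's Taste recommenders identify users and items by numeric `long` IDs, so an opaque cookie session string needs a numeric ID before it can serve as the user ID. Below is a minimal sketch, assuming a hypothetical log format `sessionId,productId,eventType` with numeric product IDs. `SessionLogConverter` is an illustrative name, not a Mahout class. The output lines use the `userID,itemID` shape (no preference value, i.e. boolean preferences); check that your Mahout version's file-based data model accepts that form.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical converter: turns event-log lines of the assumed form
 * "sessionId,productId,eventType" into "userID,itemID" preference lines,
 * using the cookie session ID as the user ID.
 */
final class SessionLogConverter {
    // Maps each opaque session string to a numeric user ID.
    private final Map<String, Long> sessionToUserId = new HashMap<>();

    /** Returns the numeric ID for a session, assigning 0, 1, 2, ... in first-seen order. */
    long userIdFor(String sessionId) {
        Long id = sessionToUserId.get(sessionId);
        if (id == null) {
            id = (long) sessionToUserId.size();
            sessionToUserId.put(sessionId, id);
        }
        return id;
    }

    /** Converts raw log lines; assumes product IDs are already numeric. */
    List<String> convert(List<String> logLines) {
        List<String> prefs = new ArrayList<>();
        for (String line : logLines) {
            String[] fields = line.split(",");
            if (fields.length < 2) {
                continue; // skip malformed lines
            }
            long userId = userIdFor(fields[0].trim());
            long itemId = Long.parseLong(fields[1].trim());
            prefs.add(userId + "," + itemId);
        }
        return prefs;
    }
}
```

The mapping lives only in memory here; if session-to-ID assignments must stay stable across runs, it would need to be persisted alongside the preference file.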

Re: Generating similarity file(s) for item recommender?

2012-07-03 Thread Mridul Kapoor
> I'm thinking the session ID (in the cookie) would be used as the user ID.
> The events are tied to product IDs, so these would be used in generating the preferences.

I guess if you consider product preference on a per-session basis (i.e. only items for which a user expresses preference for,
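Treating each session as a user means collapsing its events into the set of products that session touched. A minimal sketch, assuming the `userID,itemID` lines described earlier in the thread; `SessionPreferences` is a hypothetical name. Using a set means repeated views of one product in one session count as a single boolean preference.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: groups "userID,itemID" lines into per-session item sets. */
final class SessionPreferences {
    static Map<Long, Set<Long>> group(Iterable<String> prefLines) {
        Map<Long, Set<Long>> itemsBySession = new TreeMap<>();
        for (String line : prefLines) {
            String[] f = line.split(",");
            long userId = Long.parseLong(f[0].trim());
            long itemId = Long.parseLong(f[1].trim());
            // A set keeps one entry per item, so duplicate events collapse
            // into a single "expressed preference" for that session.
            itemsBySession.computeIfAbsent(userId, k -> new TreeSet<>()).add(itemId);
        }
        return itemsBySession;
    }
}
```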

Re: Generating similarity file(s) for item recommender?

2012-07-03 Thread Matt Mitchell
Thanks Mridul, I'll try this out. Does getItemIDs return every item ID from the file in your example? This kind of leads me to another, related question... I want to have my recommender engine recommend items to a user, but the items should be from a known set of item IDs. For example, if a user i

Re: Generating similarity file(s) for item recommender?

2012-07-03 Thread Sean Owen
I'm not sure if Mridul's suggestion does what you want. Do you want to recommend items to users? Then no, you do not start with item IDs and recommend to them. It sounds like your question is how to compute similarity data. The first answer is that you do not use Hadoop unless you must use Hadoop.
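At small scale, item-item similarity data can be computed in memory on one machine. The sketch below uses the Tanimoto (Jaccard) coefficient, the set-overlap measure behind Mahout's `TanimotoCoefficientSimilarity`, but computes it by hand rather than calling Mahout. It assumes per-session item sets as input and emits `itemA,itemB,similarity` lines; that line shape is an assumption to verify against whatever file-based item similarity reader you use. `ItemSimilarityFile` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical in-memory generator for an item-item similarity file. */
final class ItemSimilarityFile {
    /** Tanimoto (Jaccard) coefficient: size(A intersect B) / size(A union B). */
    static double tanimoto(Set<Long> a, Set<Long> b) {
        Set<Long> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        int union = a.size() + b.size() - intersection.size();
        return union == 0 ? 0.0 : (double) intersection.size() / union;
    }

    /**
     * Inverts per-session item sets into per-item session sets, then emits one
     * "itemA,itemB,similarity" line for every item pair with a nonzero score.
     */
    static List<String> similarityLines(Map<Long, Set<Long>> itemsBySession) {
        Map<Long, Set<Long>> sessionsByItem = new TreeMap<>();
        for (Map.Entry<Long, Set<Long>> e : itemsBySession.entrySet()) {
            for (long item : e.getValue()) {
                sessionsByItem.computeIfAbsent(item, k -> new TreeSet<>()).add(e.getKey());
            }
        }
        List<Long> items = new ArrayList<>(sessionsByItem.keySet());
        List<String> lines = new ArrayList<>();
        // All-pairs comparison is O(n^2) in the number of items: fine for a
        // small catalog, which is exactly the regime where Hadoop is unnecessary.
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                double sim = tanimoto(sessionsByItem.get(items.get(i)), sessionsByItem.get(items.get(j)));
                if (sim > 0.0) {
                    lines.add(items.get(i) + "," + items.get(j) + "," + sim);
                }
            }
        }
        return lines;
    }
}
```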

Re: Generating similarity file(s) for item recommender?

2012-07-04 Thread Matt Mitchell
Hi Sean, Myrrix does look interesting! I'll keep an eye on it. What I'd like to do is recommend items to users, yes. I looked at the IDRescorer and it did the job perfectly (pre-filtering). I was a little misleading in regard to the size of the data. The raw data files are around 1GB. But after t
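Mahout's `IDRescorer` declares two methods, `rescore(long id, double originalScore)` and `isFiltered(long id)`; an item for which `isFiltered` returns `true` is excluded from the recommendations. The sketch below does not depend on Mahout: `ItemIdRescorer` is a hypothetical local interface with the same two methods, and `AllowListRescorer` and `TopN` are illustrative names. It shows how an allow-list of known item IDs restricts what can be recommended.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical local stand-in with the same two methods as Mahout's IDRescorer. */
interface ItemIdRescorer {
    double rescore(long id, double originalScore);

    boolean isFiltered(long id);
}

/** Filters out any item not in a known allow-list; scores pass through unchanged. */
final class AllowListRescorer implements ItemIdRescorer {
    private final Set<Long> allowed;

    AllowListRescorer(Set<Long> allowed) {
        this.allowed = allowed;
    }

    @Override
    public double rescore(long id, double originalScore) {
        return originalScore;
    }

    @Override
    public boolean isFiltered(long id) {
        return !allowed.contains(id);
    }
}

/** Applies a rescorer to candidate scores and returns the top-n item IDs. */
final class TopN {
    static List<Long> topN(Map<Long, Double> candidates, ItemIdRescorer rescorer, int n) {
        List<Map.Entry<Long, Double>> kept = new ArrayList<>();
        for (Map.Entry<Long, Double> e : candidates.entrySet()) {
            long id = e.getKey();
            // Filtered items never reach the ranking step (pre-filtering).
            if (!rescorer.isFiltered(id)) {
                kept.add(new AbstractMap.SimpleEntry<Long, Double>(id, rescorer.rescore(id, e.getValue())));
            }
        }
        kept.sort(Map.Entry.<Long, Double>comparingByValue().reversed());
        List<Long> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, kept.size()); i++) {
            result.add(kept.get(i).getKey());
        }
        return result;
    }
}
```

Because filtering happens before ranking, a high-scoring item outside the known set can never displace an allowed item from the top n.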

Re: Generating similarity file(s) for item recommender?

2012-07-04 Thread Sean Owen
If your input is 10MB, then the good news is you are not near the scale where you need Hadoop. A simple non-distributed Mahout recommender works well, and includes the Rescorer capability you need. That's a fine place to start. The book ought to give a pretty good tour of how that works in chapter
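To make "simple non-distributed recommender" concrete, here is a minimal in-memory item-based scorer. It uses a simplified rule (a candidate's score is the sum of its similarities to the items already in the session), which is an illustration and not necessarily the estimator Mahout's item-based recommenders use. It assumes similarity lines of the form `itemA,itemB,similarity`; `SimpleItemBasedScorer` is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory item-based scorer (simplified): a candidate's score is
 * the sum of its similarities to the items already in the session.
 */
final class SimpleItemBasedScorer {
    private final Map<Long, Map<Long, Double>> sim = new HashMap<>();

    /** Loads symmetric similarities from "itemA,itemB,similarity" lines. */
    SimpleItemBasedScorer(List<String> similarityLines) {
        for (String line : similarityLines) {
            String[] f = line.split(",");
            long a = Long.parseLong(f[0].trim());
            long b = Long.parseLong(f[1].trim());
            double s = Double.parseDouble(f[2].trim());
            sim.computeIfAbsent(a, k -> new HashMap<>()).put(b, s);
            sim.computeIfAbsent(b, k -> new HashMap<>()).put(a, s);
        }
    }

    /** Scores every item similar to the session's items, skipping items already seen. */
    Map<Long, Double> score(Set<Long> sessionItems) {
        Map<Long, Double> scores = new HashMap<>();
        for (long seen : sessionItems) {
            Map<Long, Double> neighbours = sim.getOrDefault(seen, Collections.emptyMap());
            for (Map.Entry<Long, Double> n : neighbours.entrySet()) {
                if (!sessionItems.contains(n.getKey())) {
                    scores.merge(n.getKey(), n.getValue(), Double::sum);
                }
            }
        }
        return scores;
    }
}
```

The resulting candidate scores are what a rescorer or allow-list filter would then be applied to before picking the top items.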

Re: Generating similarity file(s) for item recommender?

2012-07-04 Thread Matt Mitchell
Thanks Sean! Nice to know I can stay simple for now.

- Matt

On Wed, Jul 4, 2012 at 9:59 AM, Sean Owen wrote:
> If your input is 10MB then the good news is you are not near the scale
> where you need Hadoop. A simple non-distributed Mahout recommender
> works well, and includes the Rescorer capab