Re: taking mahout into production

2011-05-21 Thread Lance Norskog
The existence of a rating, no matter what it is, generates an emotional engagement. "2.7? What idiots hate this? The kitten is a genius!" When I was involved in such a system, I wanted to randomly generate ratings. There is no SLA in a consumer site where you watch videos for free. You might get

Re: taking mahout into production

2011-05-21 Thread Ted Dunning
I bet the name becomes very appropriate very quickly. The other category of repeated viewing is click-spamming. They are very much worth ignoring as well. In any case, I have found that it is very important to almost entirely ignore the number of times that somebody interacts with a media item (

Re: taking mahout into production

2011-05-21 Thread Grant Ingersoll
On May 20, 2011, at 10:11 PM, Ted Dunning wrote:
> Also, from a practical point of view, people rarely watch videos repeatedly,
> even if they like them and want to see more.
>
> (people - excluding two year olds who will watch something they like until
> it wears out)
I would extend that from

Re: taking mahout into production

2011-05-20 Thread Lance Norskog
For using Mahout in production you need a feedback loop. The implementers are drawn to sexy things like great algorithms, and can print out a bunch of numbers and say, "ok, that looks right". I keep hacking up ways to interpret and view what Mahout spits out, and I'm not happy with any of them. On

Re: taking mahout into production

2011-05-20 Thread Ted Dunning
Also, from a practical point of view, people rarely watch videos repeatedly, even if they like them and want to see more. (people - excluding two year olds who will watch something they like until it wears out)
On Fri, May 20, 2011 at 7:04 PM, Sean Owen wrote:
> I agree that ratings contain rel

Re: taking mahout into production

2011-05-20 Thread Sean Owen
I agree that ratings contain relatively little data. Here you're not using direct ratings, but inferring some notion of rating from impressions. Does your scheme make sense? It's not illogical but not one I would choose. To me, there is the most "information" in the jump from 0 impressions to 1. Th
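The point that most of the information sits in the jump from 0 impressions to 1 can be illustrated with a minimal sketch. It assumes impressions arrive as (user, item, count) tuples; the function name `to_boolean_preferences` and the sample data are hypothetical and not part of Mahout or of anyone's system in this thread.

```python
from collections import defaultdict


def to_boolean_preferences(impressions):
    """Collapse (user, item, count) records into a set of viewed items per user.

    Any count above zero becomes a single "viewed" preference, so repeat
    views (or click-spam) carry no extra weight.
    """
    prefs = defaultdict(set)
    for user, item, count in impressions:
        if count > 0:
            prefs[user].add(item)
    return dict(prefs)


# Hypothetical sample data: (user, item, number of impressions)
impressions = [
    ("u1", "kitten_video", 14),  # repeated viewing is treated like one view
    ("u1", "news_clip", 1),
    ("u2", "kitten_video", 1),
    ("u2", "trailer", 0),        # never viewed, so no preference is recorded
]

prefs = to_boolean_preferences(impressions)
```

Under these assumptions, u1's fourteen views of the same video and u2's single view end up as the same signal, which is the behaviour the thread argues for.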

Re: taking mahout into production

2011-05-20 Thread Sebastian Schelter
I recently published an article on my blog at http://ssc.io that deals with scaling recommender systems; I'm sure it has some ideas you could adapt. --sebastian
On 20.05.2011 20:02, "Ted Dunning" wrote:
> Sean will be able to address scaling and configuration better than I, but I
> have built vi

Re: taking mahout into production

2011-05-20 Thread Ted Dunning
Sean will be able to address scaling and configuration better than I, but I have built video recommendation systems before and found that
a) ratings are nearly worthless, largely because so few people will rate things
b) the best preference data we ever found was whether the user viewed the asset

taking mahout into production

2011-05-20 Thread Varnit Khanna
Hi, I have been considering using Mahout for our recommendation engine needs and had a couple of questions about using it in production. Use Case: We need to provide recommendations on video assets (similar to Hulu) to a couple of million users, and we have over 100K assets. Since we are experiencing gr