Re: Popularity of recommender items

2014-02-06 Thread Ted Dunning
One way to deal with that is to build a model that predicts the ultimate number of views/plays/purchases for the item based on history so far. If this model can be made Bayesian enough to sample from the posterior distribution of total popularity, then you can use the Thompson sampling trick a
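
The message is cut off here, so the following is only one possible reading of the idea: a minimal Python sketch that ranks items by a draw from a posterior rather than by a point estimate. The Gamma-Poisson model, the per-day view rate used as a stand-in for "total popularity", the prior values, and the names sample_rate and thompson_rank are all assumptions made for illustration, not something stated in the thread.

```python
import random

def sample_rate(views, days, rng, prior_shape=1.0, prior_rate=1.0):
    """Draw one views-per-day rate from a Gamma posterior.

    Assumed model: daily views are Poisson with an unknown rate that has a
    Gamma(prior_shape, prior_rate) prior, so the posterior is
    Gamma(prior_shape + views, prior_rate + days).
    """
    shape = prior_shape + views
    rate = prior_rate + days
    # random.gammavariate takes (shape, scale), and scale = 1 / rate.
    return rng.gammavariate(shape, 1.0 / rate)

def thompson_rank(history, rng):
    """Rank items by a single posterior draw each (Thompson-style)."""
    draws = {item: sample_rate(v, d, rng) for item, (v, d) in history.items()}
    return sorted(draws, key=draws.get, reverse=True)

# Hypothetical history: item -> (views so far, days since introduction).
history = {"old_hit": (10000, 100), "new_item": (5, 1), "quiet_item": (3, 30)}
ranking = thompson_rank(history, random.Random(0))
print(ranking)
```

Items with little history have wide posteriors, so their position varies from draw to draw, while items with a long history stay close to their observed rate.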

Re: Popularity of recommender items

2014-02-06 Thread Pat Ferrel
A velocity measure of sorts makes a lot of sense for a “what’s hot” list. The particular thing I’m looking at now is how to rank a list of items by some measure of popularity when you don’t have a velocity. There is an introduction date though, so another way to look at popularity might be to de

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Suneel Marthi
Oops... Didn't see Ted's response before I had replied. Sent from my iPhone > On Feb 6, 2014, at 7:31 PM, Ted Dunning wrote: > > OK. Cool. > > That probably means that problem is much smaller and more likely to be > logistics. Your suggestion of an off-by-one issue is quite plausible. > >

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Suneel Marthi
Sent from my iPhone > On Feb 6, 2014, at 10:08 AM, Ted Dunning wrote: > > I can't comment on the specific question that you ask, but it should not > necessarily be expected that LDA will reconstruct the categories that you > have in mind. It will develop categories that explain the data as we

Re: Popularity of recommender items

2014-02-06 Thread Ted Dunning
Rising popularity is often a better match to what people want to see on a "most popular" page. The best measure for that in my experience is log((new_count + offset) / (old_count + offset)) where new and old counts are the number of views during the periods in question and offset is used partly to
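
A minimal Python sketch of this score, reading the formula as the log of the ratio of the two offset counts. The explanation of the offset is cut off in the message; the sketch assumes it acts as a smoothing constant that damps ratios built on very small counts and avoids dividing by zero. The function name, the offset value of 10, and the sample counts are made up for illustration.

```python
import math

def rising_popularity(new_count, old_count, offset=10.0):
    """Log ratio of offset view counts for a recent and an earlier period."""
    if offset <= 0:
        raise ValueError("offset must be positive")
    return math.log((new_count + offset) / (old_count + offset))

# Hypothetical items: name -> (views in the new period, views in the old period).
items = {"a": (120, 100), "b": (30, 2), "c": (3, 0)}
ranked = sorted(items, key=lambda name: rising_popularity(*items[name]), reverse=True)
print(ranked)
```

With these numbers, item "b" ranks first because its views grew the most in relative terms, while "c" is held back by the offset even though its old count is zero.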

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Ted Dunning
OK. Cool. That probably means that problem is much smaller and more likely to be logistics. Your suggestion of an off-by-one issue is quite plausible. On Thu, Feb 6, 2014 at 4:46 PM, Stamatis Rapanakis wrote: > That is correct. My problem is not the categories developed (which are > meaningfu

Re: Popularity of recommender items

2014-02-06 Thread Sean Owen
Agree - I thought by asking for most popular you meant to look for apple pie. Agree with you and Ted that the sum of similarity says something interesting even if it is not popularity exactly. On Feb 6, 2014 11:16 AM, "Pat Ferrel" wrote: > The problem with the usual preference count is that big

Re: Popularity of recommender items

2014-02-06 Thread Pat Ferrel
The problem with the usual preference count is that big hit items can be overwhelmingly popular. If you want to know which ones the most people saw and are likely to have an opinion about then this seems a good measure. But these hugely popular items may not differentiate taste. So we calculate

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Stamatis Rapanakis
That is correct. My problem is not the categories developed (which are meaningful by the way) but the fact that a certain document is not assigned to the proper (LDA generated) category. The document to topics assignment is really bad... On Thu, Feb 6, 2014 at 5:08 PM, Ted Dunning wrote: > I ca

Re: Popularity of recommender items

2014-02-06 Thread Ted Dunning
If you look at the indicator matrix (cooccurrence reduced by LLR), you will usually have asymmetry due to limitations on the number of indicators per row. This will give you some interesting results when you look at the column sums. I wouldn't call it popularity, but it is an interesting measure.
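
To illustrate where the asymmetry comes from, here is a small Python sketch. It assumes a symmetric table of item-to-item scores is already available (made-up numbers standing in for LLR scores, which are not computed here), keeps only the top k entries per row, and then counts how often each item appears as an indicator. The names top_k_indicators and column_sums are hypothetical.

```python
from collections import defaultdict

def top_k_indicators(scores, k):
    """Keep the k highest-scoring other items for each row item."""
    indicators = {}
    for item, row in scores.items():
        # Sort by descending score, breaking ties by item name.
        best = sorted(row.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
        indicators[item] = {other for other, _ in best}
    return indicators

def column_sums(indicators):
    """Count how many rows list each item as an indicator."""
    sums = defaultdict(int)
    for row in indicators.values():
        for other in row:
            sums[other] += 1
    return dict(sums)

# Symmetric, made-up scores: scores[x][y] == scores[y][x].
scores = {
    "a": {"b": 9.0, "c": 1.0, "d": 0.5},
    "b": {"a": 9.0, "c": 2.0, "d": 1.0},
    "c": {"a": 1.0, "b": 2.0, "d": 1.5},
    "d": {"a": 0.5, "b": 1.0, "c": 1.5},
}
indicators = top_k_indicators(scores, k=1)
sums = column_sums(indicators)
print(sums)
```

Even though the input scores are symmetric, the per-row limit makes "b" an indicator for "c" without "c" being an indicator for "b", which is why the column sums differ from item to item.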

Re: Extracting the topics of documents (LDA, Mahout 0.7)

2014-02-06 Thread Ted Dunning
I can't comment on the specific question that you ask, but it should not necessarily be expected that LDA will reconstruct the categories that you have in mind. It will develop categories that explain the data as well as it can, but that won't necessarily match the categories you intend. It is li

Re: Popularity of recommender items

2014-02-06 Thread Sean Owen
I have always defined popularity as just the number of ratings/prefs, yes. You could rank on some kind of 'net promoter score' -- good ratings minus bad ratings -- though that becomes more like 'most liked'. How do you get popularity from similarity -- similarity to what? Ranking by sum of similar

Re: Mahout 0.9 with cloudera

2014-02-06 Thread Sean Owen
Yeah, that's the version that's bundled with 4.x. 5.x has basically 0.8 plus patches to work on MR2. Mahout is not really something you have to install, even though it does get packaged and dumped onto the cluster nodes. Just use it against your cluster -- it can be from a machine that isn't part o

Re: Popularity of recommender items

2014-02-06 Thread Tevfik Aytekin
Well, I think what you are suggesting is to define popularity as being similar to other items. So in this way most popular items will be those which are most similar to all other items, like the centroids in K-means. I would first check the correlation between this definition and the standard one

Mahout 0.9 with cloudera

2014-02-06 Thread Kevin Moulart
Hi everyone, Is there a simple way to install Mahout 0.9 on a cluster running Cloudera's CDH 4.5? When I try what they advise in their doc (yum install mahout on my CentOS 6 node), it wants to install mahout version 0.7+22-1.cdh4.5.0.p0.14.el6. Thanks in advance! -- Kévin Moulart GSM France