Thanks Ted, it's awesome (and intuitive) how you reduced my problem by treating features as users!
Mridul

On 30 September 2013 10:47, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Yes. You can turn the normal item-item relationships around to get this.
>
> What you have is an item x feature matrix. Normally, one has a user x item
> matrix in cooccurrence analysis and you get an item x item matrix.
>
> If you consider the features to be "users" in the computation, then the
> resulting indicator matrix would be just what you want.
>
> The basic idea is that items would be related if they share features. Two
> items that have the same feature would be said to co-occur on that feature.
> Finding anomalous cooccurrence would be what you need to do to find items
> that co-occur on many features.
>
> This works by building a small 2x2 matrix that relates item A and item B.
> The entries would be feature counts. The upper left entry of the matrix
> is the number of features that A and B both have, the upper right is the
> number of features that B has that A does not and so on. Put another way,
> the columns represent features that A has or does not have respectively and
> the rows represent the features that B has or does not have respectively.
> Items that give high root log-likelihood ratio values should be considered
> connected. Those that have small values for root LLR should be considered
> not connected. The value of the root-LLR should be discarded after
> thresholding and should not be considered a measure of the strength of the
> relationship.
>
> I would recommend the same down-sampling that the rowSimilarityJob already
> does.
>
> On Sun, Sep 29, 2013 at 3:40 AM, Mridul Kapoor <mridulkap...@gmail.com>
> wrote:
>
> > Hi
> >
> > I have records - items - with many features.
> > Something like
> >
> > ID, feature1, feature2, ..., featureN
> >
> > Can I leverage Mahout's log-likelihood similarity metrics for calculating
> > the K most similar items to a given item X?
> >
> > -
> > Thanks
> > Mridul
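For reference, here is a minimal in-memory Python sketch of the approach Ted describes above. It is not Mahout code and does not run on Hadoop. It assumes each item is given as a set of feature IDs. The function names, the `threshold` default of 2.0 and the `max_items_per_feature` cap are hypothetical and would need tuning. The LLR arithmetic follows the standard entropy-based formula for a 2x2 contingency table, the same form used by Mahout's `LogLikelihood` class. The down-sampling is a simplified random cap, not necessarily the exact scheme rowSimilarityJob uses.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    # Unnormalized entropy: N*log(N) - sum(k*log(k)), with N = sum(counts).
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


def root_llr(k11: int, k12: int, k21: int, k22: int) -> float:
    root = math.sqrt(log_likelihood_ratio(k11, k12, k21, k22))
    # Negative when A and B co-occur less often than independence predicts;
    # compares k11/(k11+k12) with k21/(k21+k22) without dividing by zero.
    if k11 * (k21 + k22) < k21 * (k11 + k12):
        root = -root
    return root


def related_items(item_features, threshold=2.0, max_items_per_feature=500, seed=0):
    """Map each item to the set of items it is 'connected' to.

    threshold and max_items_per_feature are hypothetical tuning knobs.
    """
    # Transpose item -> features into feature -> items: features act as "users".
    feature_items = defaultdict(list)
    for item, feats in item_features.items():
        for f in set(feats):
            feature_items[f].append(item)

    # Down-sample overly common features (an assumed, simplified scheme).
    rng = random.Random(seed)
    for f, items in list(feature_items.items()):
        if len(items) > max_items_per_feature:
            feature_items[f] = rng.sample(items, max_items_per_feature)

    # Count features per item and features shared by each item pair.
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in feature_items.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1
    n = len(feature_items)  # total number of distinct features

    neighbours = defaultdict(set)
    for (a, b), both in pair_count.items():
        k11 = both                  # features A and B both have
        k12 = item_count[b] - both  # B has, A does not
        k21 = item_count[a] - both  # A has, B does not
        k22 = n - k11 - k12 - k21   # neither has
        # Keep only the yes/no decision; the score itself is discarded.
        if root_llr(k11, k12, k21, k22) > threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return dict(neighbours)
```

Calling `related_items({"a": {"f1", "f2", "f3"}, "b": {"f1", "f2", "f3"}, ...})` returns a dict of neighbour sets, with no scores attached, as Ted suggests. Pair counting is quadratic in the number of items per feature, which is why capping very common features matters on real data.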