I think that this is a really bad thing to do.

The LLR is really good for finding interesting things.  Once you have done
that, directly using the LLR in any form to produce a weight reduces the
method to something akin to Naive Bayes.  This is bad generally and very,
very bad in the case of small counts.

Typically LLR works extremely well when you use it as a filter only and
then use some global measure to compute a weight.  See the Luduan method [1]
for an example.  The use of a text retrieval engine to implement a search
engine, such as I have lately been nattering about much too much, is another
example.  A major reason that such methods work so unreasonably well is
that they don't make silly weighting decisions based on very small counts.
It is slightly paradoxical that looking at global counts rather than
counts specific to the cases of interest produces much better weights, but
the empirical evidence is pretty overwhelming.
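
To make the filter-then-weight idea concrete, here is a minimal sketch
(illustrative only, not code from the project or from Luduan).  It computes
the LLR from the 2x2 counts using the entropy form described in the blog
post cited at the end of this thread, uses the score only to decide whether
a pair survives, and then weights the survivors with a global, IDF-style
measure instead of the LLR itself.  The class name, the example counts, and
the particular global weight are all made-up assumptions.

import java.util.ArrayList;
import java.util.List;

public class LlrFilterSketch {

  // k11 = cooccurrences of A and B, k12 = B without A,
  // k21 = A without B, k22 = neither.
  static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
    double rowEntropy = entropy(k11 + k12, k21 + k22);
    double columnEntropy = entropy(k11 + k21, k12 + k22);
    double matrixEntropy = entropy(k11, k12, k21, k22);
    // Guard against tiny negative values from floating-point rounding.
    return Math.max(0.0, 2.0 * (rowEntropy + columnEntropy - matrixEntropy));
  }

  // Unnormalized entropy: sum*log(sum) minus the sum of k*log(k) terms.
  static double entropy(long... counts) {
    long sum = 0;
    double xLogX = 0.0;
    for (long k : counts) {
      xLogX += k == 0 ? 0.0 : k * Math.log(k);
      sum += k;
    }
    return (sum == 0 ? 0.0 : sum * Math.log(sum)) - xLogX;
  }

  // A hypothetical candidate pair carrying its 2x2 contingency counts.
  static class Candidate {
    final String itemA, itemB;
    final long k11, k12, k21, k22;
    Candidate(String itemA, String itemB, long k11, long k12, long k21, long k22) {
      this.itemA = itemA; this.itemB = itemB;
      this.k11 = k11; this.k12 = k12; this.k21 = k21; this.k22 = k22;
    }
  }

  public static void main(String[] args) {
    List<Candidate> candidates = new ArrayList<>();
    // Cooccur far more often than chance predicts:
    candidates.add(new Candidate("apple", "banana", 30, 1000, 900, 100000));
    // Cooccur about as often as chance predicts:
    candidates.add(new Candidate("apple", "rutabaga", 10, 1000, 900, 100000));

    // Step 1: LLR as a filter only.  3.84 is roughly the 95th percentile
    // of chi-squared with 1 df; anything below it is treated as noise.
    double cutoff = 3.84;
    for (Candidate c : candidates) {
      double llr = logLikelihoodRatio(c.k11, c.k12, c.k21, c.k22);
      if (llr < cutoff) {
        continue;                 // the LLR's job ends here
      }
      // Step 2: the weight comes from a global measure, not from the LLR.
      // Here, a made-up IDF-style weight based on how common itemB is overall.
      long frequencyOfB = c.k11 + c.k12;
      long totalEvents = c.k11 + c.k12 + c.k21 + c.k22;
      double weight = Math.log((double) totalEvents / (frequencyOfB + 1));
      System.out.printf("%s -> %s  weight=%.3f%n", c.itemA, c.itemB, weight);
    }
  }
}

The only job the LLR has here is the keep-or-drop decision; nothing
downstream ever sees its magnitude.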

Aside from such practical considerations, there is the fact that converting
a massive number of frequentist p-values into weights is either outright
heresy (from the frequentist point of view) or simply nutty (from the
Bayesian point of view).

In any case, I have never been able to get more than one bit of useful
information from an LLR score.  That one bit is extremely powerful, but
trying to get more seems to be a very bad idea.
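
To spell out what that one bit looks like in code (again just a sketch,
reusing the hypothetical logLikelihoodRatio helper from above): under the
null hypothesis the LLR is asymptotically chi-squared with one degree of
freedom, so the only thing worth reading off the score is whether it clears
a critical value.

double llr = logLikelihoodRatio(k11, k12, k21, k22);
// 3.84, 6.63, and 10.83 are the chi-squared (1 df) critical values for
// p = 0.05, 0.01, and 0.001; a stricter cutoff is sensible when scoring
// very many candidate pairs.
boolean interesting = llr > 10.83;   // the one bit; the magnitude is not a weight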


[1] http://arxiv.org/abs/1207.1847, especially chapter 7


On Thu, Jun 20, 2013 at 10:41 AM, Dan Filimon
<dangeorge.fili...@gmail.com> wrote:

> Awesome! Thanks for clarifying! :)
>
>
> On Thu, Jun 20, 2013 at 12:28 PM, Sean Owen <sro...@gmail.com> wrote:
>
> > Yes that should be all that's needed.
> > On Jun 20, 2013 10:27 AM, "Dan Filimon" <dangeorge.fili...@gmail.com>
> > wrote:
> >
> > > Right, makes sense. So, by normalize, I need to replace the counts in
> > > the matrix with probabilities.
> > > So, I would divide everything by the sum of all the counts in the
> > > matrix?
> > >
> > >
> > > On Thu, Jun 20, 2013 at 12:16 PM, Sean Owen <sro...@gmail.com> wrote:
> > >
> > > > I think the quickest answer is: the formula computes the test
> > > > statistic as a difference of log values, rather than log of ratio of
> > > > values. By not normalizing, the entropy is multiplied by a factor (sum
> > > > of the counts) vs normalized. So you do end up with a statistic N
> > > > times larger when counts are N times larger.
> > > >
> > > > On Thu, Jun 20, 2013 at 9:52 AM, Dan Filimon
> > > > <dangeorge.fili...@gmail.com> wrote:
> > > > > My understanding:
> > > > >
> > > > > Yes, the log-likelihood ratio (-2 log lambda) follows a chi-squared
> > > > > distribution with 1 degree of freedom in the 2x2 table case.
> > > > >       A   ~A
> > > > > B
> > > > > ~B
> > > > >
> > > > > We're testing to see if p(A | B) = p(A | ~B). That's the null
> > > > > hypothesis. I compute the LLR. The larger that is, the more unlikely
> > > > > the null hypothesis is to be true.
> > > > > I can then look at a table with df=1. And I'd get p, the probability
> > > > > of seeing that result or something worse (the upper tail).
> > > > > So, the probability of them being similar is 1 - p (which is exactly
> > > > > the CDF for that value of X).
> > > > >
> > > > > Now, my question is: in the contingency table case, why would I
> > > > > normalize? It's a ratio already, isn't it?
> > > > >
> > > > >
> > > > > On Thu, Jun 20, 2013 at 11:03 AM, Sean Owen <sro...@gmail.com> wrote:
> > > > >
> > > > >> someone can check my facts here, but the log-likelihood ratio
> > > > >> follows a chi-square distribution. You can figure an actual
> > > > >> probability from that in the usual way, from its CDF. You would
> > > > >> need to tweak the code you see in the project to compute an actual
> > > > >> LLR by normalizing the input.
> > > > >>
> > > > >> You could use 1-p then as a similarity metric.
> > > > >>
> > > > >> This also isn't how the test statistic is turned into a similarity
> > > > >> metric in the project now. But 1-p sounds nicer. Maybe the
> > > > >> historical reason was speed, or, ignorance.
> > > > >>
> > > > >> On Thu, Jun 20, 2013 at 8:53 AM, Dan Filimon
> > > > >> <dangeorge.fili...@gmail.com> wrote:
> > > > >> > When computing item-item similarity using the log-likelihood
> > > > >> > similarity [1], can I simply apply a sigmoid to the resulting
> > > > >> > values to get the probability that two items are similar?
> > > > >> >
> > > > >> > Is there any other processing I need to do?
> > > > >> >
> > > > >> > Thanks!
> > > > >> >
> > > > >> > [1] http://tdunning.blogspot.ro/2008/03/surprise-and-coincidence.html
> > > > >>
> > > >
> > >
> >
>
