Re: Log-likelihood based correlation test?

Andrew Troemner Fri, 17 Nov 2017 07:59:56 -0800

I'll echo Dan here. He and I went through the raw Mahout libraries called
by the Universal Recommender, and while Noelia's description is accurate
for an intermediate step, the indexing via Elasticsearch generates
separate relevancy scores based on its Lucene indexing scheme. The raw
LLR scores are used in building the index, but the final scores served
up by the API are post-processed and cannot be used to reconstruct
the raw LLRs (to my understanding).


There are also some additional steps, including down-sampling, which scrubs
out very rare combinations (which would otherwise get very high LLRs from
a single observation) and so partially corrects for the statistical problem
of multiple comparisons. But the underlying logic follows Ted Dunning's
research as summarized by Noelia, and it is a solid way to approach
interaction effects for tens of thousands of items, including secondary
indicators (like demographics or implicit preferences).
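
For anyone who wants the arithmetic, here is a minimal Python sketch of the
LLR score for one pair of items, using the entropy form of Dunning's
statistic. The function names and example counts are mine, for illustration
only; this is not the Mahout code itself. It assumes the 2x2 table holds
k11 = users with both A and B, k12 = A without B, k21 = B without A, and
k22 = neither.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is taken as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts:
    # N*log(N) - sum(k*log(k)), where N is the total count.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    # k11: A and B together, k12: A without B,
    # k21: B without A,      k22: neither A nor B.
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    matrix_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb small negative values from rounding.
    return max(0.0, 2.0 * (row_entropy + col_entropy - matrix_entropy))


if __name__ == "__main__":
    print(llr(10, 10, 10, 10))    # A and B independent
    print(llr(100, 0, 0, 100))    # A and B always together
    print(llr(1, 0, 0, 9999))     # one co-occurrence among 10,000 users
```

The last call is the rare-combination case: a single co-occurrence in a
large population still scores well above zero, which is the effect that
down-sampling and the LLR thresholds are meant to dampen.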


*ANDREW TROEMNER*
Associate Principal Data Scientist | salesforce.com
Office: 317.832.4404
Mobile: 317.531.0216

On Fri, Nov 17, 2017 at 9:55 AM, Daniel Gabrieli <dgabri...@salesforce.com>
wrote:

> Maybe someone can correct me if I am wrong, but from the code I believe
> Elasticsearch is used instead of the step where the "resulting LLR is what
> goes into the AB element in matrix PtP or PtL."
>
> By default the strongest 50 LLR scores get set as searchable values in
> Elasticsearch per item-event pair.
>
> You can configure the thresholds for significance using the configuration
> parameters maxCorrelatorsPerItem or minLLR. This configuration is
> important because at the default of 50 you may end up treating all
> "indicator values" as significant. More info here:
> http://actionml.com/docs/ur_config
>
>
>
> On Fri, Nov 17, 2017 at 4:50 AM Noelia Osés Fernández <no...@vicomtech.org>
> wrote:
>
>>
>> Let's see if I've understood how LLR is used in UR. Let P be the matrix
>> for the primary conversion indicator (say purchases) and Pt its transpose.
>>
>> Then, with a second matrix, which can be P again to make PtP or a matrix
>> for a secondary indicator (say L for likes) to make PtL, we take a row from
>> Pt (item A) and a column from the second matrix (either P or L, in this
>> example) (item B) and we calculate the table that Ted Dunning explains on
>> his webpage: the number of cooccurrences, where items A *AND* B have both
>> been purchased (or purchased AND liked), the number of times that item A
>> *OR* B has been purchased (or purchased OR liked), and the number of times
>> that *neither* item A nor B has been purchased (or purchased or liked). With
>> these counts we calculate LLR following the formulas that Ted Dunning
>> provides, and the resulting LLR is what goes into the AB element in matrix
>> PtP or PtL. Correct?
>>
>> Thank you!
>>
>> On 16 November 2017 at 17:03, Noelia Osés Fernández <no...@vicomtech.org>
>> wrote:
>>
>>> Wonderful! Thanks Daniel!
>>>
>>> Suneel, I'm still new to the Apache ecosystem, so I know only vaguely
>>> that Mahout is used... I still don't know the different parts well
>>> enough to have a good understanding of what each of them does (Spark,
>>> MLlib, PIO, Mahout,...)
>>>
>>> Thank you both!
>>>
>>> On 16 November 2017 at 16:59, Suneel Marthi <smar...@apache.org> wrote:
>>>
>>>> Indeed so. Ted Dunning is an Apache Mahout PMC member and committer, and
>>>> the whole idea of Search-based Recommenders stems from his work and
>>>> insights. If you didn't know, the PIO UR uses Apache Mahout under the
>>>> hood, and hence you see the LLR.
>>>>
>>>> On Thu, Nov 16, 2017 at 3:49 PM, Daniel Gabrieli <
>>>> dgabri...@salesforce.com> wrote:
>>>>
>>>>> I am pretty sure the LLR stuff in UR is based off of this blog post
>>>>> and associated paper:
>>>>>
>>>>> http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
>>>>>
>>>>> Accurate Methods for the Statistics of Surprise and Coincidence
>>>>> by Ted Dunning
>>>>>
>>>>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962
>>>>>
>>>>>
>>>>> On Thu, Nov 16, 2017 at 10:26 AM Noelia Osés Fernández <
>>>>> no...@vicomtech.org> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I've been trying to understand how the UR algorithm works and I think
>>>>>> I have a general idea. But I would like to have a *mathematical
>>>>>> description* of the step in which the LLR comes into play. In the
>>>>>> CCO presentations I have found it says:
>>>>>>
>>>>>> (PtP) compares column to column using
>>>>>> *log-likelihood based correlation test*
>>>>>>
>>>>>> However, I have searched for "log-likelihood based correlation test"
>>>>>> on Google but no joy. All I get are explanations of the likelihood-ratio
>>>>>> test to compare two models.
>>>>>>
>>>>>> I would very much appreciate a math explanation of the log-likelihood
>>>>>> based correlation test. Any pointers to papers or any other literature
>>>>>> that explains this specifically are much appreciated.
>>>>>>
>>>>>> Best regards,
>>>>>> Noelia
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
