This thread is very enlightening, thank you very much! Is there a way I can see what the P, PtP, and PtL matrices of an app are? In the handmade case, for example?
Are there any pio calls I can use to get these?

On 17 November 2017 at 19:52, Pat Ferrel <p...@occamsmachete.com> wrote:

> Mahout builds the model by doing matrix multiplication (PtP) and then
> calculating the LLR score for every non-zero value. We then keep the top K
> or use a threshold to decide whether to keep a value or not (both are
> supported in the UR). LLR is a metric for seeing how likely it is that 2
> events in a large group are correlated. Therefore LLR is only used to
> remove weak data from the model.
>
> So Mahout builds the model, then it is put into Elasticsearch, which is used
> as a KNN (K-nearest Neighbors) engine. The LLR score is not put into the
> model, only an indicator that the item survived the LLR test.
>
> The KNN is applied using the user’s history as the query and finding the
> items that most closely match it. Since PtP will have items in rows and each
> row will have correlating items, this “search” method works quite well to
> find items that had very similar items purchased with them as are in the
> user’s history.
>
> =============================== that is the simple explanation
> ========================================
>
> Item-based recs take the model items (correlated items by the LLR test) as
> the query, and the results are the most similar items: the items with the
> most similar correlating items.
>
> The model is items in rows and items in columns if you are only using one
> event: PtP. If you think it through, it is all purchased items as the
> row key and other items purchased along with the row key. LLR filters out
> the weakly correlating non-zero values (0 means no evidence of correlation
> anyway). If we didn’t do this it would be purely a “Cooccurrence”
> recommender, one of the first useful ones. But filtering based on
> cooccurrence strength (PtP values without LLR applied to them) produces
> much worse results than using LLR to filter for the most highly correlated
> cooccurrences.
> You get a similar effect with Matrix Factorization, but you
> can only use one type of event, for various reasons.
>
> Since LLR is a probabilistic metric that only looks at counts, it can be
> applied equally well to PtV (purchase, view), PtS (purchase, search terms),
> and PtC (purchase, category-preferences). We did an experiment using Mean
> Average Precision for the UR using video “Likes” vs. “Likes” and “Dislikes”
> (so LtL vs. LtL and LtD) scraped from rottentomatoes.com reviews and got a
> 20% lift in the MAP@k score by including data for “Dislikes”.
> https://developer.ibm.com/dwblog/2017/mahout-spark-correlated-cross-occurences/
>
> So the benefit and use of LLR is to filter weak data from the model and
> allow us to see if dislikes, and other events, correlate with likes. Adding
> this type of data, which is usually thrown away, is one of the most powerful
> reasons to use the algorithm. BTW the algorithm is called Correlated
> Cross-Occurrence (CCO).
>
> The benefit of using Lucene (at the heart of Elasticsearch) to do the KNN
> query is that it is fast, taking the user’s realtime events into the query,
> but also that it is trivial to add all sorts of business rules, like
> "give me recs based on user events but only ones from a certain category" or
> "give me recs but only ones tagged as in-stock". In fact the business rules
> can have inclusion rules, exclusion rules, and be mixed with ANDs and ORs.
>
> BTW there is a version ready for testing with PIO 0.12.0 and ES5 here:
> https://github.com/actionml/universal-recommender/tree/0.7.0-SNAPSHOT
> Instructions are in the readme, and notice it is in the 0.7.0-SNAPSHOT
> branch.
>
>
> On Nov 17, 2017, at 7:59 AM, Andrew Troemner <atroem...@salesforce.com>
> wrote:
>
> I'll echo Dan here.
> He and I went through the raw Mahout libraries called
> by the Universal Recommender, and while Noelia's description is accurate
> for an intermediate step, the indexing via Elasticsearch generates some
> separate relevancy scores based on its Lucene indexing scheme. The raw
> LLR scores are used in building this process, but the final scores served
> up by the APIs should be post-processed, and cannot be used to reconstruct
> the raw LLRs (to my understanding).
>
> There are also some additional steps, including down-sampling, which scrubs
> out very rare combinations (which otherwise would have very high LLRs for
> a single observation) and partially corrects for the statistical problem
> of multiple detection. But the underlying logic is per Ted Dunning's
> research as summarized by Noelia, and is a solid way to approach
> interaction effects for tens of thousands of items, including secondary
> indicators (like demographics, or implicit preferences).
>
>
> *ANDREW TROEMNER*
> Associate Principal Data Scientist | salesforce.com
> Office: 317.832.4404
> Mobile: 317.531.0216
>
> On Fri, Nov 17, 2017 at 9:55 AM, Daniel Gabrieli <dgabri...@salesforce.com>
> wrote:
>
>> Maybe someone can correct me if I am wrong, but in the code I believe
>> Elasticsearch is used instead of "resulting LLR is what goes into the AB
>> element in matrix PtP or PtL."
>>
>> By default the strongest 50 LLR scores get set as searchable values in
>> Elasticsearch per item-event pair.
>>
>> You can configure the thresholds for significance using the configuration
>> parameters maxCorrelatorsPerItem or minLLR. This configuration is
>> important because at the default of 50 you may end up treating all
>> "indicator values" as significant.
>> More info here: http://actionml.com/docs/ur_config
>>
>>
>> On Fri, Nov 17, 2017 at 4:50 AM Noelia Osés Fernández <
>> no...@vicomtech.org> wrote:
>>
>>> Let's see if I've understood how LLR is used in UR. Let P be the matrix
>>> for the primary conversion indicator (say purchases) and Pt its
>>> transpose.
>>>
>>> Then, with a second matrix, which can be P again to make PtP or a matrix
>>> for a secondary indicator (say L for likes) to make PtL, we take a row
>>> from Pt (item A) and a column from the second matrix (either P or L, in
>>> this example) (item B) and we calculate the table that Ted Dunning
>>> explains on his webpage: the number of cooccurrences in which items A
>>> *AND* B have been purchased (or purchased AND liked), the number of times
>>> that item A *OR* B has been purchased (or purchased OR liked), and the
>>> number of times that *neither* item A nor B has been purchased (or
>>> purchased or liked). With these counts we calculate LLR following the
>>> formulas that Ted Dunning provides, and the resulting LLR is what goes
>>> into the AB element in matrix PtP or PtL. Correct?
>>>
>>> Thank you!
>>>
>>> On 16 November 2017 at 17:03, Noelia Osés Fernández
>>> <no...@vicomtech.org> wrote:
>>>
>>>> Wonderful! Thanks Daniel!
>>>>
>>>> Suneel, I'm still new to the Apache ecosystem and so I know that Mahout
>>>> is used but only vaguely... I still don't know the different parts well
>>>> enough to have a good understanding of what each of them does (Spark,
>>>> MLLib, PIO, Mahout,...)
>>>>
>>>> Thank you both!
>>>>
>>>> On 16 November 2017 at 16:59, Suneel Marthi <smar...@apache.org> wrote:
>>>>
>>>>> Indeed so. Ted Dunning is an Apache Mahout PMC and committer and the
>>>>> whole idea of Search-based Recommenders stems from his work and
>>>>> insights. If u didn't know, the PIO UR uses Apache Mahout under the
>>>>> hood and hence u see the LLR.
>>>>>
>>>>> On Thu, Nov 16, 2017 at 3:49 PM, Daniel Gabrieli
>>>>> <dgabrieli@salesforce.com> wrote:
>>>>>
>>>>>> I am pretty sure the LLR stuff in UR is based off of this blog post
>>>>>> and associated paper:
>>>>>>
>>>>>> http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html
>>>>>>
>>>>>> Accurate Methods for the Statistics of Surprise and Coincidence
>>>>>> by Ted Dunning
>>>>>>
>>>>>> http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.14.5962
>>>>>>
>>>>>>
>>>>>> On Thu, Nov 16, 2017 at 10:26 AM Noelia Osés Fernández <
>>>>>> no...@vicomtech.org> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I've been trying to understand how the UR algorithm works and I
>>>>>>> think I have a general idea. But I would like to have a
>>>>>>> *mathematical description* of the step in which the LLR comes into
>>>>>>> play. In the CCO presentations I have found it says:
>>>>>>>
>>>>>>> (PtP) compares column to column using
>>>>>>> *log-likelihood based correlation test*
>>>>>>>
>>>>>>> However, I have searched for "log-likelihood based correlation test"
>>>>>>> on Google but no joy. All I get are explanations of the
>>>>>>> likelihood-ratio test to compare two models.
>>>>>>>
>>>>>>> I would very much appreciate a math explanation of the
>>>>>>> log-likelihood based correlation test. Any pointers to papers or any
>>>>>>> other literature that explains this specifically are much
>>>>>>> appreciated.
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Noelia

--
Noelia Osés Fernández, PhD
Senior Researcher
no...@vicomtech.org
+[34] 943 30 92 30
Data Intelligence for Energy and Industrial Processes
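
On the handmade-case question at the top of this message: one way to look at P, PtP, and PtL is to recompute them by hand from the raw events. Below is a minimal Python sketch under stated assumptions, not the Universal Recommender's or Mahout's actual code, and it does not use any pio call. The LLR function follows the entropy form of Ted Dunning's 2x2 contingency-table test. The function names (`llr`, `cooccurrence_llr`, `top_correlators`), the `max_correlators` and `min_llr` arguments, the toy data, and the choice to skip an item paired with itself are all hypothetical, chosen only to mirror the steps described in the quoted messages.

```python
from math import log


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users with both A and B      k12: users with A but not B
    k21: users with B but not A       k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def cooccurrence_llr(primary, secondary, n_users):
    """LLR score for every non-zero cell of (primary)t(secondary).

    primary, secondary: dict mapping item id -> set of user ids.
    Passing the same dict twice gives the PtP case.
    """
    scores = {}
    for a, users_a in primary.items():
        for b, users_b in secondary.items():
            if primary is secondary and a == b:
                continue  # assumption: ignore an item paired with itself
            k11 = len(users_a & users_b)
            if k11 == 0:
                continue  # zero cooccurrence: no evidence, nothing to score
            k12 = len(users_a) - k11
            k21 = len(users_b) - k11
            k22 = n_users - k11 - k12 - k21
            scores[(a, b)] = llr(k11, k12, k21, k22)
    return scores


def top_correlators(scores, max_correlators=50, min_llr=0.0):
    """Keep, per row item, the ids of the strongest correlators only."""
    rows = {}
    for (a, b), score in scores.items():
        if score >= min_llr:
            rows.setdefault(a, []).append((b, score))
    return {
        a: [b for b, _ in sorted(cands, key=lambda t: -t[1])[:max_correlators]]
        for a, cands in rows.items()
    }


# Toy data (hypothetical): item -> users who performed the event on it.
purchases = {
    "iphone": {"u1", "u2", "u3"},
    "ipad": {"u1", "u2"},
    "nexus": {"u4", "u5"},
    "galaxy": {"u4", "u5", "u6"},
}
likes = {
    "iphone-case": {"u1", "u2", "u3"},
    "android-case": {"u4", "u5", "u6"},
}
N_USERS = 6

ptp = cooccurrence_llr(purchases, purchases, N_USERS)
ptl = cooccurrence_llr(purchases, likes, N_USERS)

for name, scores in (("PtP", ptp), ("PtL", ptl)):
    print(name)
    for (a, b), score in sorted(scores.items()):
        print(f"  {a:8s} {b:13s} LLR={score:.3f}")
    print("  kept:", top_correlators(scores, max_correlators=2))
```

In line with the description quoted above, `top_correlators` returns only the ids of the items that survive the filter, not their LLR scores, since the scores are used for filtering rather than stored in the model.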