Thank you very much for all these clarifications. Yes, I have items with no conversions. I read in the literature that content-based recs are less sensitive to the cold-start problem, so I headed in that direction.
You suggested in a previous post to use Word2Vec for items with little content attached to them. I have already computed Word2Vec vectors for my items using a simple sum and want to use them to do some smoothing in the sparse user-item matrix. I was thinking that some kind of tensor operation might be used with CF, with the Word2Vec vectors attached to items.

2017-06-04 23:05 GMT+04:00 Pat Ferrel:

> TT’ does not solve cold start because you need user history for personalization. There are several other techniques that I’ve mentioned many times on the list that help with cold start, but TT’ is for a slightly different thing. Its use is when you have a user’s history of item preferences but the items are too old to recommend and you only want to recommend new ones with no history. If you think about news, it is close to being like this. Or patent applications, law opinions or judgments too. To be helpful there needs to be a lot of content for each item and you only want new things recommended.
>
> What cold start do you need to “solve”: new anonymous users with no history, or items with no conversions? Search the PIO list and AML group for past posts on this.
>
> Tag use is implemented as both CF and content similarity (not TT’). If you ask for item-based recommendations and the item has no conversions, you will get popular items by default. If you boost items with the same tags as the item the user is looking at, you get popular items mostly with similar tags. If you disable the popularity part you get items with similar tags. This requires that you attach tags to the items with $set, and your query should contain the tags (or any other properties) of the example item. There are many ways of mixing this. You could also just get recs and mix in new inventory by some small random amount. You can use different placements for these so you aren’t ruining recs with too many randomized cold items.
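To make the tag-boost setup described just above concrete, here is a minimal sketch of the two payloads involved: a $set event that attaches tags to an item, and an item-based query that carries the same tags. The item ID, the tag values, the property name "tags" and the bias value are all made up for illustration. The "fields" / "bias" layout reflects my reading of the UR docs (a bias above 1 acting as a boost) and should be checked against the version you run. Nothing is sent over the network here.

```python
import json

def set_item_tags(item_id, tags):
    """Build a $set event that attaches tag properties to an item."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": {"tags": list(tags)},  # "tags" is a hypothetical property name
    }

def item_query_with_tag_boost(item_id, tags, bias=2.0, num=10):
    """Build an item-based query that carries the example item's tags as a boost."""
    return {
        "item": item_id,
        "num": num,
        # Assumed field layout: name of the item property, values to match, bias.
        "fields": [{"name": "tags", "values": list(tags), "bias": bias}],
    }

tags = ["jazz", "vinyl"]  # illustrative values
set_event = set_item_tags("item-42", tags)
query = item_query_with_tag_boost("item-42", tags)

print(json.dumps(set_event, indent=2))
print(json.dumps(query, indent=2))
```

The query repeats the tags of the example item because, as noted above, the query itself has to contain the properties you want to boost on.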
>
> Anyway, the best way to do this depends on your GUI and data.
>
> On Jun 4, 2017, at 11:35 AM, Marius Rabenarivo wrote:
>
> I didn't mean to tell you what it means; I just wanted to make it clear on my side.
>
> As I understand it, the T part is a personalization that we should do if we want to use content-based information when making recommendations.
>
> For my use case, I want to use it to overcome the cold-start problem.
>
> I thought it was already implemented as you documented it in the slides, but I didn't find tag use in the code.
>
> Is it SimilarityAnalysis.rowSimilarity() in Mahout that implements TT'? (just to confirm)
>
> 2017-06-04 22:06 GMT+04:00 Pat Ferrel:
>
>> No offense Marius, but I wrote the slides and the equation, so I do indeed know what they are saying. Whether a user writes a tag or you are detecting the user preference for a tag you wrote, they are user indicators of preference. The LLR filtering of these secondary indicators is what CCO is all about, and it leaves you with a model that can be compared to a user’s history and contains only indicators that correlate to some conversion behavior.
>>
>> T in the "whole enchilada" is used to personalize content-based recommendations. Each row of T represents an item and its content as tokens. Tokens are stemmed, tokenized text terms, or can be entities in the item’s text (using some form of NLP), or tags, etc. TT’ then gives you, for each item, the items that are most similar in terms of whatever content you were using in T. Now you take the user’s history of content item preference, which articles they read for instance, and find the most similar items in TT’. These will be personalized content-based recommendations.
>>
>> This is not implemented in the UR but is in the CCO tools in Mahout.
>> The reason it is not implemented is that it still requires user history, and content-based recs are worse predictors than collaborative filtering with user history. In CF you treat the terms or tags as indicators of preference; you do not find items similar by content.
>>
>> The personalized content-based recs may serve for edge conditions where recommending items with no usage behavior is the most common case, like news articles, where you have new items all the time with no usage events. In this case extracting something better than “bag-of-words” for content is quite important. So highly detailed user tagging or NLP techniques can greatly increase the quality of results.
>>
>> On Jun 4, 2017, at 4:09 AM, Marius Rabenarivo wrote:
>>
>> IMHO, T represents tags in an anonymous tag (or property) labeling task, and what you propose is personalized tag (or property) labeling, as described in https://arxiv.org/pdf/1203.4487.pdf (Section 1.4.5, "Emerging new classification", p. 40).
>>
>> 2017-06-04 8:14 GMT+04:00 Marius Rabenarivo:
>>
>>> And what is the T in the slides for?
>>>
>>> How can we implement it if it is not implemented yet?
>>>
>>> 2017-06-04 8:11 GMT+04:00 Pat Ferrel:
>>>
>>>> By purchasing an item with a tag that you have given it, they are displaying a preference for that tag.
>>>>
>>>> On Jun 3, 2017, at 12:36 PM, Marius Rabenarivo wrote:
>>>>
>>>> So the tag here is assumed to be a tag given by the user to an item?
>>>>
>>>> I was thinking that it was some kind of tag we give to the item by some means (classification, LDA, etc.).
>>>>
>>>> 2017-06-03 21:14 GMT+04:00 Pat Ferrel:
>>>>
>>>>> A = history of all purchases (in the e-com case)
>>>>> B = history of all tag preferences
>>>>>
>>>>> r = [A’A]h_a + [A’B]h_b
>>>>>
>>>>> The part in the slides about content-based recs is not needed here because you have captured them as user preferences.
>>>>>
>>>>> On Jun 2, 2017, at 7:22 PM, Marius Rabenarivo wrote:
>>>>>
>>>>> Please correct "side" to "size" in my previous e-mail.
>>>>>
>>>>> 2017-06-03 6:14 GMT+04:00 Marius Rabenarivo <mariusrabenarivo@gmail.com>:
>>>>>
>>>>>> What will be the size of the matrix if we send an event like tag-pref?
>>>>>>
>>>>>> We will get a |U| x |T| matrix, I think (where T is the set of all tags).
>>>>>>
>>>>>> So [AtA] will be a |T| x |T| matrix and we will do a dot product with the user history hT to get recommendations, right?
>>>>>>
>>>>>> I was assuming that A should be of side |U| x |I|, where I is the set of all items, as it should be added to the other terms of the whole enchilada formula afterwards.
>>>>>>
>>>>>> Thank you for your guidance, Pat.
>>>>>>
>>>>>> 2017-06-02 21:35 GMT+04:00 Pat Ferrel:
>>>>>>
>>>>>>> Please refer to the documents. The “event” is the name of the type of event or indicator of preference; it implies the type of the targetEntityId. So a “tag-pref” event would be accompanied by a targetEntityId = tag-id. This is separate from attaching “tag” properties to items with the $set event for use with filter and boost rules. One looks at the data as a possible preference indicator and the other is used to restrict results.
>>>>>>> This is why we usually name events so they sound like a user preference of some type, whereas item property values are simply item attributes, intrinsic to the items and independent of an individual user.
>>>>>>>
>>>>>>> The event can have any name that makes sense to you.
>>>>>>>
>>>>>>> On Jun 2, 2017, at 9:19 AM, Marius Rabenarivo wrote:
>>>>>>>
>>>>>>> So the event field should be the token and targetEntityId the item ID, right?
>>>>>>>
>>>>>>> 2017-06-02 20:07 GMT+04:00 Pat Ferrel:
>>>>>>>
>>>>>>>> Yes, each is analyzed separately as a separate event. If you are using REST you can send up to 50 events in a single array. Some SDKs may support this too.
>>>>>>>>
>>>>>>>> On Jun 2, 2017, at 8:56 AM, Marius Rabenarivo wrote:
>>>>>>>>
>>>>>>>> So I have to send an event like category-preference for each tag associated with an item, right?
>>>>>>>>
>>>>>>>> entityId: user-id
>>>>>>>> event: category-preference
>>>>>>>> targetEntityId: tag/token
>>>>>>>>
>>>>>>>> 2017-06-02 19:47 GMT+04:00 Pat Ferrel:
>>>>>>>>
>>>>>>>>> When a user expresses a preference for a tag, word or term, as in search or even in content like descriptions, these can be considered secondary events. The most useful are tags and search terms in our experience. Content can be used, but each term/token needs to be sent as a separate preference, while search phrases can be used, though again turning them into tokens may be better.
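As a rough illustration of the "one event per tag, up to 50 events per array" exchange above, the sketch below builds one preference event per tag and splits the result into batches of at most 50. The event name "tag-pref" comes from the thread; the user ID, the tag values, and the entityType / targetEntityType labels are assumptions made for the example. No HTTP request is made, since the endpoint and access key depend on your deployment.

```python
def tag_preference_events(user_id, tags, event_name="tag-pref"):
    """One event per tag: the user is the entity, the tag is the target."""
    return [
        {
            "event": event_name,
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "tag",  # hypothetical label for the target type
            "targetEntityId": tag,
        }
        for tag in tags
    ]

def batches(events, size=50):
    """Split events into lists of at most `size` (the per-array limit quoted above)."""
    return [events[i:i + size] for i in range(0, len(events), size)]

# 120 made-up tags for a single user.
events = tag_preference_events("user-1", [f"tag-{n}" for n in range(120)])
chunks = batches(events)

print([len(c) for c in chunks])  # [50, 50, 20]
```

Each chunk would then be posted as a single JSON array.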
>>>>>>>>>
>>>>>>>>> Please look through the docs here: http://actionml.com/docs/ur or the slide deck here: https://www.slideshare.net/pferrel/unified-recommender-39986309
>>>>>>>>>
>>>>>>>>> The major innovation of CCO, the algorithm behind the UR, is the use of these cross-domain indicators. They are not guaranteed to predict conversions, but the CCO algo tests them and weights them low if they do not. So we tend to test for strength of prediction of the entire category of indicator and drop it if weak, or set a minLLR threshold and filter weak individual indicators out.
>>>>>>>>>
>>>>>>>>> Technically these are not called latent; that has another meaning in Machine Learning having to do with Latent Factor Analysis.
>>>>>>>>>
>>>>>>>>> On Jun 1, 2017, at 11:26 PM, Marius Rabenarivo wrote:
>>>>>>>>>
>>>>>>>>> Hello everyone!
>>>>>>>>>
>>>>>>>>> Do you have an idea of how to use latent information associated with items, like tags or word vector embeddings, in Mahout's SimilarityAnalysis.cooccurrences?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Marius
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google Groups "actionml-user" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>>>>>>> To post to this group, send email to actionml-user@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com.
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
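For the matrix-size question in the quoted exchange, a small worked example of r = [A’A]h_a + [A’B]h_b may help. This is only a sketch on made-up data, written in plain Python so it runs without extra libraries. It uses raw cooccurrence counts, whereas CCO as described in the thread would also apply LLR filtering, so treat it as an illustration of the shapes rather than of the actual algorithm. With A of size |U| x |I| and B of size |U| x |T|, A’A is |I| x |I| and A’B is |I| x |T|, so both terms yield one score per item and can be added.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    yt = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in yt] for row in x]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

# Rows are users. A: |U| x |I| purchases, B: |U| x |T| tag preferences.
# 4 users, 3 items, 2 tags; all values are made up.
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 1],
    [0, 1, 0],
]
B = [
    [1, 0],
    [0, 1],
    [1, 1],
    [1, 0],
]

At = transpose(A)
AtA = matmul(At, A)  # |I| x |I|: item-item cooccurrence
AtB = matmul(At, B)  # |I| x |T|: item-tag cross-occurrence

h_a = [1, 0, 0]  # this user's purchase history (length |I|)
h_b = [1, 0]     # this user's tag-preference history (length |T|)

# r = [A'A]h_a + [A'B]h_b, one score per item
r = [x + y for x, y in zip(matvec(AtA, h_a), matvec(AtB, h_b))]
print(r)  # [4, 3, 2]
```

In a real recommender the items already in the user's history would normally be dropped from the ranked result.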

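The LLR filtering and the minLLR threshold mentioned in the quoted replies can be illustrated with the standard log-likelihood ratio (G^2) test on a 2x2 table of counts. The sketch below is my own plain-Python version of that textbook formula, which I believe is the same statistic Mahout computes, but it is not taken from the Mahout source. The function names, the candidate counts and the threshold value are all invented for the example. Here k11 counts users with both events, k12 and k21 users with only one of them, and k22 users with neither.

```python
import math

def x_log_x(x):
    return x * math.log(x) if x > 0 else 0.0

def entropy(*counts):
    """Unnormalized entropy term used by the G^2 statistic."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table of event counts."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

# Made-up counts for two candidate indicators.
candidates = {
    "tag-a": (10, 0, 0, 10),    # always occurs together with the conversion
    "tag-b": (10, 10, 10, 10),  # statistically independent of the conversion
}

min_llr = 5.0  # illustrative threshold only
kept = {name: llr(*k) for name, k in candidates.items() if llr(*k) >= min_llr}
print(kept)
```

Only "tag-a" survives the threshold: an indicator that is independent of the conversion scores close to zero no matter how often it occurs, which is the sense in which weak indicators get filtered out.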