Re: Use of latent informations associated to items with Mahout's SimilarityAnalysis.cooccurrences

Marius Rabenarivo Sun, 04 Jun 2017 12:15:06 -0700

Thank you very much for all these clarifications!

Yes, I have items with no conversions.
I read in the literature that content-based recs are less sensitive to the
cold-start problem, so I turned to them.


In a previous post you suggested using Word2Vec for items with little content
attached to them.

I have already computed Word2Vec vectors for my items using a simple sum, and
I want to use them to do some smoothing in the sparse user-item matrix.

I was thinking that some kind of tensor operation could be used in CF with
the Word2Vec vectors attached to items.
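
One way the smoothing described above could look, as a minimal sketch only: this is not something Mahout or the UR provides. The function names, the `alpha` weight, and the toy 2-d vectors are all hypothetical. Each unobserved cell of a user's row borrows strength from the items the user did interact with, weighted by the cosine similarity of the item embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two dense vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def smooth_user_row(row, item_vectors, alpha=0.5, min_sim=0.0):
    """Fill the empty cells of one user's row from similar observed items.

    row: interaction strengths, one per item (mostly zeros).
    item_vectors: one embedding per item, e.g. summed Word2Vec word vectors.
    alpha: hypothetical weight of borrowed evidence relative to real data.
    min_sim: similarities at or below this value are ignored.
    """
    n = len(row)
    smoothed = list(row)
    for j in range(n):
        if row[j] != 0:
            continue  # observed values are left untouched
        borrowed = 0.0
        for k in range(n):
            if k == j or row[k] == 0:
                continue
            sim = cosine(item_vectors[j], item_vectors[k])
            if sim > min_sim:
                borrowed += sim * row[k]
        smoothed[j] = alpha * borrowed
    return smoothed


# Toy data: items 0 and 1 have similar content, item 2 is unrelated.
item_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
user_row = [1.0, 0.0, 0.0]  # the user converted on item 0 only
smoothed = smooth_user_row(user_row, item_vectors)
```

With this toy data, item 1 picks up a non-zero value because it is close to item 0, while item 2 stays at zero. Whether the smoothed values should then go through the same LLR filtering as real events is an open design question that this sketch does not settle.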

2017-06-04 23:05 GMT+04:00 Pat Ferrel <[email protected]>:

> TT’ does not solve cold start because you need user history for
> personalizations. There are several other techniques that I’ve mentioned
> many times on the list that help with cold start but TT’ is for a slightly
> different thing. Its use is when you have a user’s history of item
> preferences but the items are too old to recommend and you only want to
> recommend new ones with no history. If you think about news, it is close to
> being like this. The same goes for patent applications, legal opinions, or judgments. To
> be helpful there needs to be a lot of content for each item and you only
> want new things recommended.
>
> Which cold-start problem do you need to “solve”: new anonymous users with no
> history, or items with no conversions? Search the PIO list and AML group for
> past posts on this.
>
> Tag use is implemented as both CF and content similarity (not TT’). If you
> ask for item-based recommendation and the item has no conversions, you will
> get popular items by default. If you boost items with the same tags as the
> item the user is looking at, you get popular items mostly with similar
> tags. If you disable the popularity part you get items with similar tags.
> This requires that you attach tags to the items with $set and your query
> should contain the tags (or any other properties) of the example item.
> There are many ways of mixing this. You could also just get recs and mix-in
> new inventory by some small random amount. You can use different placements
> for these so you aren’t ruining recs with too many randomized cold-start items.
>
> Anyway, the best way to do this depends on your GUI and data.
>
>
> On Jun 4, 2017, at 11:35 AM, Marius Rabenarivo <[email protected]>
> wrote:
>
> I didn't mean to tell you what it means, but I just wanted to make it
> clear for my part.
>
> As I understand it, the T part is a personalization that we should apply if
> we want to use content-based information when making recommendations.
>
> For my use case, I want to use it to overcome the cold-start problem.
>
> I was thinking that it was already implemented as you documented it in the
> slides, but I didn't find tag use in the code.
>
> Is it SimilarityAnalysis.rowSimilarity() in Mahout that implements TT'?
> (just to confirm)
>
> 2017-06-04 22:06 GMT+04:00 Pat Ferrel <[email protected]>:
>
>> No offense Marius but I wrote the slides and the equation so I do indeed
>> know what they are saying. Whether a user writes a tag or you are detecting
>> the user preference for a tag you wrote, they are user indicators of
>> preference. The LLR filtering of these secondary indicators is what CCO is
>> all about and leaves you with a model that can be compared to a user’s
>> history and contains only indicators that correlate to some conversion
>> behavior.
>>
>> T in the "whole enchilada" is used to personalize content-based
>> recommendations. Each row of T represents an item and its content as
>> tokens. Tokens are stemmed, tokenized text terms, or can be entities in the
>> item’s text (using some form of NLP) or tags, etc. TT’ then gives you,
>> for each item, the items that are most similar in terms of whatever content
>> you were using in T. Now you take the user’s history of content item
>> preference, which articles they read for instance, and find the most similar
>> items in TT’. These will be personalized content-based recommendations.
>>
>> This is not implemented in the UR but is in the CCO tools in Mahout. The
>> reason it is not implemented is that it still requires user history, and
>> content-based recs are worse predictors than collaborative filtering with
>> user history. In CF you treat the terms or tags as indicators of preference;
>> you do not find items similar by content.
>>
>> The personalized content-based recs may serve for edge conditions where
>> recommending items with no usage behavior is the most common case, like
>> news articles, where you have new items all the time with no usage
>> events. In this case extracting something better than “bag-of-words” for
>> content is quite important. So highly detailed user tagging or NLP
>> techniques can greatly increase the quality of results.
>>
>>
>>
>>
>> On Jun 4, 2017, at 4:09 AM, Marius Rabenarivo <[email protected]>
>> wrote:
>>
>> IMHO, T represents an anonymous tag (or property) labeling task,
>> and what you propose is personalized tag (or property) labeling
>> as described in https://arxiv.org/pdf/1203.4487.pdf (Section 1.4.5
>> Emerging new classification) p. 40
>>
>> 2017-06-04 8:14 GMT+04:00 Marius Rabenarivo <[email protected]>:
>>
>>> And what is the T in the slides for?
>>>
>>> How can we implement it if it is not implemented yet?
>>>
>>> 2017-06-04 8:11 GMT+04:00 Pat Ferrel <[email protected]>:
>>>
>>>> By purchasing an item with a tag that you have given it, they are
>>>> displaying a preference for that tag.
>>>>
>>>>
>>>> On Jun 3, 2017, at 12:36 PM, Marius Rabenarivo <
>>>> [email protected]> wrote:
>>>>
>>>> So the tag here is assumed to be a tag given by the user to an item?
>>>>
>>>> I was thinking that it was some kind of tag we give to the item by some
>>>> means (classification, LDA, etc.)
>>>>
>>>> 2017-06-03 21:14 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>
>>>>> A = history of all purchases (in the e-com case)
>>>>> B = history of all tag preferences
>>>>>
>>>>> r = [A’A]h_a + [A’B]h_b
>>>>>
>>>>> The part in the slides about content-based recs is not needed here
>>>>> because you have captured them as user preferences.
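
A small worked example of the formula above, as a sketch under stated assumptions: the matrices are made-up toy data, the helper names are hypothetical, and raw cooccurrence counts stand in for the LLR-filtered indicators that CCO would actually keep. A is |U| x |I| (users by items), B is |U| x |T| (users by tags), so both A’A (|I| x |I|) and A’B (|I| x |T|) yield a score per item once multiplied by the matching user history vector.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two lists of rows."""
    y_cols = transpose(y)
    return [[sum(a * b for a, b in zip(row, col)) for col in y_cols] for row in x]


def matvec(m, v):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# A: purchases, rows = users, columns = items (toy data).
A = [
    [1, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
]
# B: tag preferences, rows = the same users, columns = tags (toy data).
B = [
    [1, 0],
    [0, 1],
    [1, 0],
]

At = transpose(A)
AtA = matmul(At, A)  # item-item cooccurrence, |I| x |I|
AtB = matmul(At, B)  # item-tag cross-occurrence, |I| x |T|

h_a = [1, 0, 0]  # current user's purchase history (bought item 0)
h_b = [1, 0]     # current user's tag preferences (prefers tag 0)

# r = [A'A]h_a + [A'B]h_b : one score per item
r = [x + y for x, y in zip(matvec(AtA, h_a), matvec(AtB, h_b))]
```

Here r comes out as [4, 2, 0]. Item 0 scores highest only because nothing in this sketch excludes items the user already bought. Both terms have length |I|, which is why the tag term can be added to the purchase term even though B itself has |T| columns.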
>>>>>
>>>>>
>>>>> On Jun 2, 2017, at 7:22 PM, Marius Rabenarivo <
>>>>> [email protected]> wrote:
>>>>>
>>>>> Please correct side to size in my previous e-mail
>>>>>
>>>>> 2017-06-03 6:14 GMT+04:00 Marius Rabenarivo <mariusrabenarivo@gmail.com>:
>>>>>
>>>>>> What will be the size of the matrix if we send an event like tag-pref?
>>>>>>
>>>>>> We will get a |U|x|T| matrix I think (where T is the set of all tags).
>>>>>>
>>>>>> So [AtA] will be a |T| x |T| matrix and we will do a dot product with
>>>>>> the user history hT to get recommendation right?
>>>>>>
>>>>>> I was assuming that A should be of side |U| x |I| where I is the set
>>>>>> of all items as it should be added to other terms of the whole enchilada
>>>>>> formula afterwards.
>>>>>>
>>>>>> Thank you for your guidance Pat.
>>>>>>
>>>>>> 2017-06-02 21:35 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>
>>>>>>> Please refer to the documents. The “event” is the name of the type
>>>>>>> of event or indicator of preference; it implies the type of
>>>>>>> the targetEntityId. So a “tag-pref” event would be accompanied by
>>>>>>> a targetEntityId = tag-id. This is separate from attaching “tag”
>>>>>>> properties to items with the $set event for use with filter and boost
>>>>>>> rules. One looks at the data as a possible preference indicator and the
>>>>>>> other is used to restrict results. This is why we usually name events so
>>>>>>> they sound like a user preference of some type, whereas item property
>>>>>>> values are simply item attributes, intrinsic to the items and independent
>>>>>>> of an individual user.
>>>>>>>
>>>>>>> The event can have any name that makes sense to you.
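
To make the distinction above concrete, here is a sketch of the two kinds of payload built as plain Python dicts. The field names follow the PredictionIO event format; the helper names, the "tag" value for targetEntityType, the "tags" property name, and the sample ids are assumptions for illustration. The batch size of 50 comes from the limit mentioned elsewhere in this thread.

```python
import json


def tag_pref_events(user_id, tags, event_name="tag-pref"):
    """One user-preference event per tag; the event name is your choice."""
    return [
        {
            "event": event_name,
            "entityType": "user",
            "entityId": user_id,
            "targetEntityType": "tag",  # assumed type label for tag targets
            "targetEntityId": tag,
        }
        for tag in tags
    ]


def item_set_event(item_id, tags):
    """A $set event attaching tags to an item as properties (for filter/boost)."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": {"tags": list(tags)},  # "tags" is an assumed property name
    }


def batches(events, size=50):
    """Split events into arrays of at most `size` for batch submission."""
    return [events[i:i + size] for i in range(0, len(events), size)]


events = tag_pref_events("user-1", ["sci-fi", "space"])
payloads = [json.dumps(batch) for batch in batches(events)]
item_event = item_set_event("item-42", ["sci-fi"])
```

The first helper produces indicator data that the model is trained on; the second only decorates the item so that queries can filter or boost on it.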
>>>>>>>
>>>>>>>
>>>>>>> On Jun 2, 2017, at 9:19 AM, Marius Rabenarivo <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>> so, the event field should be the token and targetEntityId the item
>>>>>>> ID, right?
>>>>>>>
>>>>>>> 2017-06-02 20:07 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>
>>>>>>>> Yes, each is analyzed separately as a separate event. If you are
>>>>>>>> using REST you can send up to 50 events in a single array. Some SDKs
>>>>>>>> may support this too.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Jun 2, 2017, at 8:56 AM, Marius Rabenarivo <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>> So I have to send an event like category-preference for each tag
>>>>>>>> associated to an item right?
>>>>>>>>
>>>>>>>> entityId: user-id
>>>>>>>> event: category-preference
>>>>>>>> targetEntityId : tag/token
>>>>>>>>
>>>>>>>> 2017-06-02 19:47 GMT+04:00 Pat Ferrel <[email protected]>:
>>>>>>>>
>>>>>>>>> When a user expresses a preference for a tag, word or term, as in
>>>>>>>>> search or even in content like descriptions, these can be considered
>>>>>>>>> secondary events. The most useful are tags and search terms in our
>>>>>>>>> experience. Content can be used, but each term/token needs to be sent
>>>>>>>>> as a separate preference. Search phrases can be used too, though again
>>>>>>>>> turning them into tokens may be better.
>>>>>>>>>
>>>>>>>>> Please look through the docs here: http://actionml.com/docs/ur or
>>>>>>>>> the slide deck here:
>>>>>>>>> https://www.slideshare.net/pferrel/unified-recommender-39986309
>>>>>>>>>
>>>>>>>>> The major innovation of CCO, the algorithm behind the UR, is the
>>>>>>>>> use of these cross-domain indicators. They are not guaranteed to
>>>>>>>>> predict conversions, but the CCO algo tests them and weights them low
>>>>>>>>> if they do not. So we tend to test for strength of prediction of the
>>>>>>>>> entire category of indicator and drop it if weak, or set a minLLR
>>>>>>>>> threshold and filter weak individual indicators out.
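
For readers unfamiliar with the LLR test mentioned above, here is a minimal sketch of the log-likelihood ratio score on a 2x2 contingency table, in the entropy-based form commonly used for cooccurrence analysis. It illustrates the statistic only and is not a copy of the CCO code; the function names, the sample counts, and the `min_llr` value are hypothetical.

```python
import math


def x_log_x(x):
    """x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    """Unnormalized Shannon entropy of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 table of counts.

    k11: users with both the indicator and the conversion
    k12: users with the indicator but not the conversion
    k21: users with the conversion but not the indicator
    k22: users with neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


# Hypothetical (indicator, conversion) pairs with their 2x2 counts.
candidates = {
    ("tag:sci-fi", "item-1"): (100, 1, 1, 100),  # strongly associated
    ("tag:misc", "item-1"): (10, 10, 10, 10),    # independent
}
min_llr = 5.0  # hypothetical threshold
kept = {pair: llr(*k) for pair, k in candidates.items() if llr(*k) >= min_llr}
```

An indicator that occurs independently of the conversion scores close to zero and is filtered out, while a strongly associated one scores high and is kept.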
>>>>>>>>>
>>>>>>>>> Technically these are not called latent, that has another meaning
>>>>>>>>> in Machine Learning having to do with Latent Factor Analysis.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Jun 1, 2017, at 11:26 PM, Marius Rabenarivo <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hello everyone!
>>>>>>>>>
>>>>>>>>> Do you have any idea how to use latent information associated
>>>>>>>>> with items, like tags or word vector embeddings, in Mahout's
>>>>>>>>> SimilarityAnalysis.cooccurrences?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>>
>>>>>>>>> Marius
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> You received this message because you are subscribed to the Google
>>>>>>>>> Groups "actionml-user" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>>>> send an email to [email protected].
>>>>>>>>> To post to this group, send email to actionml-user@googlegroups.com.
>>>>>>>>> To view this discussion on the web visit
>>>>>>>>> https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com
>>>>>>>>> <https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>>>>>> .
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>
>>
>>
>
>
>
