Re: Use of latent informations associated to items with Mahout's SimilarityAnalysis.cooccurrences

Pat Ferrel Sun, 04 Jun 2017 12:06:05 -0700

TT’ does not solve cold start because you need user history for 
personalization. There are several other techniques, which I’ve mentioned many 
times on the list, that help with cold start, but TT’ is for a slightly 
different thing. Its use is when you have a user’s history of item preferences 
but the items are too old to recommend and you only want to recommend new ones 
with no history. News is close to this case, as are patent applications, legal 
opinions, or judgments. To be helpful there needs to be a lot of content for 
each item, and you must only want new things recommended.


Which cold-start problem do you need to “solve”: new anonymous users with no 
history, or items with no conversions? Search the PIO list and AML group for 
past posts on this.

Tag use is implemented as both CF and content similarity (not TT’). If you ask 
for item-based recommendations and the item has no conversions, you will get 
popular items by default. If you boost items with the same tags as the item the 
user is looking at, you get popular items, mostly with similar tags. If you 
disable the popularity part, you get items with similar tags. This requires 
that you attach tags to the items with $set, and your query should contain the 
tags (or any other properties) of the example item. There are many ways of 
mixing these. You could also just get recs and mix in new inventory by some 
small random amount. You can use different placements for these so you aren’t 
ruining recs with too many randomized cold items.
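
A minimal sketch of the "mix in new inventory by some small random amount" 
idea, in plain Python. The function name, the `fraction` parameter, and the 
item ids are hypothetical, and replacing the tail of the list is only one of 
many possible placements:

```python
import random

def mix_in_new_items(recs, new_items, fraction=0.1, rng=None):
    """Replace the tail of a ranked rec list with randomly chosen new items."""
    rng = rng or random.Random()
    # Number of slots handed over to cold items; keep this small.
    slots = int(len(recs) * fraction)
    # Only consider new items that are not already recommended.
    candidates = [item for item in new_items if item not in recs]
    picked = rng.sample(candidates, min(slots, len(candidates)))
    if not picked:
        return list(recs)
    # Keep the top-ranked recs and fill the remaining slots with cold items.
    return list(recs[:len(recs) - len(picked)]) + picked

recs = ["i%d" % n for n in range(10)]          # pretend output of a rec query
new_inventory = ["new-a", "new-b", "new-c"]    # items with no history yet
mixed = mix_in_new_items(recs, new_inventory, fraction=0.2,
                         rng=random.Random(42))
```

Here the top eight recs are untouched and the last two slots hold cold items, 
so the randomized part stays confined to a small, known region of the list.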

Anyway, the best way to do this depends on your GUI and data.


On Jun 4, 2017, at 11:35 AM, Marius Rabenarivo <[email protected]> 
wrote:

I didn't mean to tell you what it means, but I just wanted to make it clear for 
my part.

As I understand, the T part is a personalization that we should make if we want
to use content based information when doing recommendation.

For my use case, I want to use it to overcome the cold start problem.

I was thinking that it was already implemented as you documented it in the 
slides
but I didn't find tag use in the code.

Is it SimilarityAnalysis.rowSimilarity() in Mahout that implements TT'? (just 
to confirm)

2017-06-04 22:06 GMT+04:00 Pat Ferrel <[email protected] 
<mailto:[email protected]>>:
No offense Marius but I wrote the slides and the equation so I do indeed know 
what they are saying. Whether a user writes a tag or you are detecting the user 
preference for a tag you wrote, they are user indicators of preference. The LLR 
filtering of these secondary indicators is what CCO is all about and leaves you 
with a model that can be compared to a user’s history and contains only 
indicators that correlate to some conversion behavior.

T in the "whole enchilada" is used to personalize content-based 
recommendations. Each row of T represents an item and its content as tokens. 
Tokens are stemmed, tokenized text terms, or can be entities in the item’s text 
(found using some form of NLP), or tags, etc. TT’ then gives you, for each 
item, the items that are most similar in terms of whatever content you used in 
T. Now you take the user’s history of content item preferences (which articles 
they read, for instance) and find the most similar items in TT’. These will be 
personalized content-based recommendations.
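
A toy sketch of that TT’ step in plain Python, assuming raw token overlap as 
the similarity. The data and helper names are made up for illustration, and 
this is not the Mahout implementation, which works on distributed matrices and 
applies LLR filtering:

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    # Plain nested-list matrix product: rows of a times columns of b.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# Rows of T are items, columns are content tokens (hypothetical data).
T = [
    [1, 1, 0, 0],   # item 0, an old article:  "election", "senate"
    [1, 1, 1, 0],   # item 1, a new article:   "election", "senate", "budget"
    [0, 0, 0, 1],   # item 2, a new article:   "football"
]

# TT' is items x items; each cell counts the tokens two items share.
TTt = matmul(T, transpose(T))

# The user's history over items: they read item 0 only.
h = [1, 0, 0]

# Score every item against the history, then rank the unread ones.
scores = matvec(TTt, h)
ranked = sorted((i for i in range(len(T)) if h[i] == 0),
                key=lambda i: -scores[i])
```

Item 1 shares two tokens with the article the user read, so it ranks ahead of 
item 2 even though neither new item has any usage events.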

This is not implemented in the UR but is in the CCO tools in Mahout. The reason 
it is not implemented is that it still requires user history, and content-based 
recs are worse predictors than collaborative filtering with user history. In CF 
you treat the terms or tags as indicators of preference; you do not find items 
similar by content.

The personalized content-based recs may serve for edge conditions where 
recommending items with no usage behavior is the most common case, like news 
articles, where you have new items all the time with no usage events. In this 
case extracting something better than “bag-of-words” for content is quite 
important, so highly detailed user tagging or NLP techniques can greatly 
increase the quality of results.




On Jun 4, 2017, at 4:09 AM, Marius Rabenarivo <[email protected] 
<mailto:[email protected]>> wrote:

IMHO, T represents an Anonymous tag (or property) labeling task,
and what you propose is Personalized tag (or property) labeling,
as described in https://arxiv.org/pdf/1203.4487.pdf 
(Section 1.4.5 Emerging new classification), p. 40

2017-06-04 8:14 GMT+04:00 Marius Rabenarivo <[email protected] 
<mailto:[email protected]>>:
And what is the T in the slides for?

How can we implement it if it is not implemented yet?

2017-06-04 8:11 GMT+04:00 Pat Ferrel <[email protected] 
<mailto:[email protected]>>:
By purchasing an item with a tag that you have given it, users are displaying a 
preference for that tag.


On Jun 3, 2017, at 12:36 PM, Marius Rabenarivo <[email protected] 
<mailto:[email protected]>> wrote:

So the tag here is assumed to be a tag given by the user to an item?

I was thinking that it was some kind of tag we give to the item by some means 
(classification, LDA, etc.)

2017-06-03 21:14 GMT+04:00 Pat Ferrel <[email protected] 
<mailto:[email protected]>>:
A = history of all purchases (in the e-com case)
B = history of all tag preferences

r = [A’A]h_a + [A’B]h_b

The part in the slides about content-based recs is not needed here because you 
have captured them as user preferences.
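
A toy sketch of the equation above, mainly to show the matrix shapes. It 
assumes raw cooccurrence counts, whereas CCO keeps only the LLR-filtered 
indicators, and the data and helper names are hypothetical. A is |U| x |I| 
and B is |U| x |T|, so A’A is |I| x |I| and A’B is |I| x |T|, and both terms 
produce one score per item:

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def matvec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

# A: users x items purchase history (4 users, 3 items).
A = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
]
# B: users x tags tag-preference history (same 4 users, 2 tags).
B = [
    [1, 0],
    [1, 0],
    [0, 1],
    [1, 0],
]

AtA = matmul(transpose(A), A)   # items x items
AtB = matmul(transpose(A), B)   # items x tags

h_a = [1, 0, 0]   # this user's purchases, indexed by item
h_b = [1, 0]      # this user's tag preferences, indexed by tag

# r = [A'A]h_a + [A'B]h_b, one score per item.
r = [x + y for x, y in zip(matvec(AtA, h_a), matvec(AtB, h_b))]
```

Because the rows of both A’A and A’B are indexed by item, the two terms can be 
added even though the user histories h_a and h_b have different lengths.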


On Jun 2, 2017, at 7:22 PM, Marius Rabenarivo <[email protected] 
<mailto:[email protected]>> wrote:

Please correct side to size in my previous e-mail

2017-06-03 6:14 GMT+04:00 Marius Rabenarivo <[email protected] 
<mailto:[email protected]>>:
What will be the size of the matrix if we send an event like tag-pref? 
We will get a |U| x |T| matrix, I think (where T is the set of all tags).

So [AtA] will be a |T| x |T| matrix and we will do a dot product with the user 
history hT to get recommendations, right?

I was assuming that A should be of side |U| x |I| where I is the set of all 
items as it should be added to other terms of the whole enchilada formula 
afterwards.

Thank you for your guidance Pat.

2017-06-02 21:35 GMT+04:00 Pat Ferrel <[email protected] 
<mailto:[email protected]>>:
Please refer to the documents. The “event” is the name of the type of event or 
indicator of preference; it implies the type of the targetEntityId. So a 
“tag-pref” event would be accompanied by a targetEntityId = tag-id. This is 
separate from attaching “tag” properties to items with the $set event for use 
with filter and boost rules. One looks at the data as a possible preference 
indicator and the other is used to restrict results. This is why we usually 
name events so they sound like a user preference of some type, whereas item 
property values are simply item attributes, intrinsic to the items and 
independent of an individual user.

The event can have any name that makes sense to you.
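
To make the distinction concrete, here is a hedged sketch of the two kinds of 
payload as Python dicts. The field names follow the PredictionIO event format 
as I understand it; the ids and the "tags" property name are made up, and the 
"targetEntityType" value in particular is an assumption to check against the 
UR docs:

```python
import json

def tag_pref_event(user_id, tag_id):
    """A user preference indicator: the user showed a preference for a tag."""
    return {
        "event": "tag-pref",           # any name that makes sense to you
        "entityType": "user",
        "entityId": user_id,
        "targetEntityType": "item",    # assumption, verify against the docs
        "targetEntityId": tag_id,      # the tag id, not an item id
    }

def set_item_tags_event(item_id, tags):
    """An item attribute: tags attached to the item for filter/boost rules."""
    return {
        "event": "$set",
        "entityType": "item",
        "entityId": item_id,
        "properties": {"tags": list(tags)},   # property name is hypothetical
    }

pref = tag_pref_event("user-1", "tag-red")
props = set_item_tags_event("item-9", ["tag-red", "tag-shoes"])
payload = json.dumps([pref, props], indent=2)
```

The first dict describes something a user did, so it can feed the model as an 
indicator. The second describes the item itself and carries no user at all.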


On Jun 2, 2017, at 9:19 AM, Marius Rabenarivo <[email protected] 
<mailto:[email protected]>> wrote:

so, the event field should be the token and targetEntityId the item ID, right?

2017-06-02 20:07 GMT+04:00 Pat Ferrel <[email protected] 
<mailto:[email protected]>>:
Yes, each is analyzed separately as a separate event. If you are using REST you 
can send up to 50 events in a single array. Some SDKs may support this too.
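
A small sketch of splitting events into arrays of at most 50 before sending 
them. The helper name is hypothetical and the actual HTTP call is left out on 
purpose:

```python
def batches(events, size=50):
    """Yield consecutive slices of `events`, each at most `size` long."""
    for start in range(0, len(events), size):
        yield events[start:start + size]

# 120 placeholder events, one per tag preference.
events = [{"event": "tag-pref", "entityId": "user-1",
           "targetEntityId": "tag-%d" % n} for n in range(120)]

batch_sizes = [len(batch) for batch in batches(events)]
```

Each batch would then be sent as one JSON array in a single request.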


On Jun 2, 2017, at 8:56 AM, Marius Rabenarivo <[email protected] 
<mailto:[email protected]>> wrote:

So I have to send an event like category-preference for each tag associated 
with an item, right?

entityId: user-id
event: category-preference
targetEntityId : tag/token

2017-06-02 19:47 GMT+04:00 Pat Ferrel <[email protected] 
<mailto:[email protected]>>:
When a user expresses a preference for a tag, word, or term, as in search or 
even in content like descriptions, these can be considered secondary events. 
The most useful are tags and search terms in our experience. Content can be 
used, but each term/token needs to be sent as a separate preference. Search 
phrases can be used as is, though again turning them into tokens may be better.
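
A rough sketch of turning a search phrase into one preference per token. The 
tokenizer is deliberately naive (lowercase, alphanumeric runs, no stemming), 
and the "search-pref" event name and the ids are hypothetical:

```python
import re

def search_tokens(phrase):
    """Lowercase a phrase and split it into unique tokens, keeping order."""
    seen = []
    for token in re.findall(r"[a-z0-9]+", phrase.lower()):
        if token not in seen:
            seen.append(token)
    return seen

def search_pref_events(user_id, phrase, event_name="search-pref"):
    """One event per token, with the token as the targetEntityId."""
    return [
        {"entityId": user_id, "event": event_name, "targetEntityId": token}
        for token in search_tokens(phrase)
    ]

events = search_pref_events("user-1", "Red running shoes, red laces")
tokens = [e["targetEntityId"] for e in events]
```

The repeated "red" collapses into a single preference, so this phrase yields 
four events rather than five.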

Please look through the docs here: http://actionml.com/docs/ur 
or the slide deck here: 
https://www.slideshare.net/pferrel/unified-recommender-39986309

The major innovation of CCO, the algorithm behind the UR, is the use of these 
cross-domain indicators. They are not guaranteed to predict conversions, but 
the CCO algo tests them and weights them low if they do not. So we tend to test 
the strength of prediction of the entire category of indicator and drop it if 
weak, or set a minLLR threshold and filter weak individual indicators out.
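
For reference, a sketch of the log-likelihood ratio test on a 2x2 contingency 
table, in the standard Dunning formulation, followed by a threshold filter in 
the spirit of minLLR. The function names and the counts are illustrative only; 
k11 is the number of users with both events, k12 and k21 the users with only 
one of them, and k22 the users with neither:

```python
import math

def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 cooccurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))

def filter_indicators(scored, min_llr):
    """Keep only the indicators whose LLR score reaches the threshold."""
    return {name: s for name, s in scored.items() if s >= min_llr}

independent = llr(10, 10, 10, 10)   # the two events are unrelated
correlated = llr(10, 0, 0, 10)      # the two events always occur together
kept = filter_indicators({"tag-a": correlated, "tag-b": independent},
                         min_llr=5.0)
```

An indicator that is statistically independent of the conversion scores about 
zero and is dropped, while a strongly correlated one survives the threshold.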

Technically these are not called latent; that has another meaning in Machine 
Learning, having to do with Latent Factor Analysis.


On Jun 1, 2017, at 11:26 PM, Marius Rabenarivo <[email protected] 
<mailto:[email protected]>> wrote:

Hello everyone!

Do you have any ideas on how to use latent information associated with items, 
like tags or word vector embeddings, in Mahout's 
SimilarityAnalysis.cooccurrences?

Regards,

Marius

-- 
You received this message because you are subscribed to the Google Groups 
"actionml-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected] 
<mailto:[email protected]>.
To post to this group, send email to [email protected] 
<mailto:[email protected]>.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com
 
<https://groups.google.com/d/msgid/actionml-user/CAC-ATVEO_YON-5E95iPJjBR-FUgEv8TQsOA0rtD-xg0u-tNA_g%40mail.gmail.com?utm_medium=email&utm_source=footer>.
For more options, visit https://groups.google.com/d/optout 
<https://groups.google.com/d/optout>.





