Answers below


On Apr 18, 2017, at 1:25 AM, Dennis Honders <dennishond...@gmail.com> wrote:

Hello Pat,

First of all, thanks a lot for the great explanation and the link to the
PowerPoint. I think it has already helped me a lot in understanding the
algorithms behind the templates. I also have some new questions based on the
email and the PowerPoint.

I currently understand matrix factorization as finding latent factors that
describe hidden relations between users and items. Is this correct?
And for finding these hidden latent factors, different algorithms exist, like
cooccurrence, ALS, and correlated cross-occurrence. Is this correct?
No, ALS finds latent factors; CCO finds explicit correlations.
In the PowerPoint, for the ALS algorithm: 'U' describes dimensionally reduced
users by “features”. What are the features here? I don't exactly understand
what is meant by "features are projection parameters into a space that is
optimized to reduce an error function".
Depending on who you talk to, factor = feature; they are two words for the same
thing.
I also watched your video about the cooccurrence algorithm
(https://www.youtube.com/watch?v=LWAY_XeoQoc). From the description of ALS in
the email, I don't see the difference between ALS and the Cooccurrence
algorithm as explained in the YouTube video.
The YouTube video was not meant to describe ALS.
Could the Correlated Cross-Occurrence algorithm be seen as an expansion of the
Cooccurrence algorithm that makes it multi-domain and multi-modal?
Yes, but in a rather fundamental way. The modes of behavior we are talking
about are different “tensors” in math jargon. Until the advent of LLR there was
no good way to compare vectors from one tensor to another, since the transforms
between them are unknown. LLR uses occurrence counts and so turns a
multi-tensor problem into one tensor space. In other words, it allows us to
compare one behavior to another.

Another accurate way to say this is that Cooccurrence is a special, limited
case of CCO.
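For reference, the LLR (log-likelihood ratio) test works on a 2x2 contingency
table of occurrence counts. Here is a minimal Python sketch of the computation,
following the structure of Mahout's LogLikelihood implementation (the counts in
the example at the end are made up, purely for illustration):

    from math import log

    def x_log_x(x):
        # Convention: 0 * log(0) = 0
        return x * log(x) if x > 0 else 0.0

    def entropy(*counts):
        # Unnormalized Shannon entropy over raw counts
        return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)

    def llr(k11, k12, k21, k22):
        """Log-likelihood ratio for a 2x2 contingency table:
        k11 = users who did both A and B
        k12 = users who did A but not B
        k21 = users who did B but not A
        k22 = users who did neither
        """
        row_entropy = entropy(k11 + k12, k21 + k22)
        col_entropy = entropy(k11 + k21, k12 + k22)
        mat_entropy = entropy(k11, k12, k21, k22)
        if row_entropy + col_entropy < mat_entropy:
            return 0.0  # guard against floating-point rounding
        return 2.0 * (row_entropy + col_entropy - mat_entropy)

    # Illustrative counts only: 13 users did both events
    print(llr(13, 1000, 1000, 100000))

A high score means the two events co-occur far more often than chance would
predict, which is what lets CCO compare counts across behavior modes without
knowing any transform between them.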
From the email: "It does give good results for the top ranked though when you 
have lots of “conversions” per user on average because ALS can only use 
conversions as input. in other words it can use only one kind of behavior 
data." For confirmation: Behavior data is like buy, view, etc?
Choose one; if it's “buy”, then yes, ALS can only use “buy”. Some use “rate”,
but ratings are in disfavor these days for many good reasons. Views, etc., you
may as well throw away with ALS.
From the email: "It does this for all users and so finds which of the 
indicators most often lead to conversion." What do you mean with conversion 
(also saw it in the PowerPoint)?
A conversion may go by many names, but it is the mode of behavior you want to
increase. Primary indicator, conversion event, “buy”, whatever you would like
to call it: we see it as the purest form of indication that a user prefers an
item. On news sites it might be that a user “shares” the article; on a video
site it might mean that they watch 95% of the video. This is the indicator that
we compare everything else with, and it is the behavior mode we want to
recommend. Typically this means for e-commerce we want “buys”, for news we want
“shares”, and for video we want “watch 95%”. In CCO the conversion is only the
purest indicator for the purpose; there is good data in the rest of the
behavior modes if they are tested for correlation with the conversion/primary
behavior mode.

CCO finds events from multiple modes of behavior that correlate with
conversions, “buy” in the e-commerce case. This is done on an individual-event
basis. For instance, some “views” lead to “buys”, but not all. Many views
happen just because of flashy pictures or because the item is above the fold
somewhere. Other views are found to correlate with buys in a significant way.
CCO finds these views and uses them as good-quality indicators of user
preference. This magnifies usable data (as compared to a single-mode
recommender like ALS) and therefore also increases user and item coverage.
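To make that concrete, here is a hedged numpy sketch of the cross-occurrence
step (toy data, not the actual Mahout implementation, which does this with
distributed matrix operations): count, for every (buy-item, view-item) pair,
how many users did both, then keep only the pairs that pass the LLR test,
reusing the llr() function from the earlier sketch.

    import numpy as np

    # Toy interaction matrices: rows = users, columns = items, 1 = event
    # happened. P holds primary/conversion events ("buy"), V holds "view".
    P = np.array([[1, 0, 1],
                  [0, 1, 1],
                  [1, 0, 0],
                  [0, 1, 1]])
    V = np.array([[1, 1, 0],
                  [0, 1, 1],
                  [1, 0, 1],
                  [1, 1, 0]])

    n_users = P.shape[0]

    # Cross-occurrence counts: C[i, j] = number of users who bought item i
    # and also viewed item j (P-transpose times V in matrix form)
    C = P.T @ V

    buys_per_item = P.sum(axis=0)
    views_per_item = V.sum(axis=0)

    threshold = 1.0  # illustrative; real deployments tune this or keep top K
    indicators = []
    for i in range(C.shape[0]):
        for j in range(C.shape[1]):
            k11 = C[i, j]                    # bought i and viewed j
            k12 = buys_per_item[i] - k11     # bought i, did not view j
            k21 = views_per_item[j] - k11    # viewed j, did not buy i
            k22 = n_users - k11 - k12 - k21  # did neither
            if llr(k11, k12, k21, k22) > threshold:
                indicators.append((i, j))    # "viewing j indicates buying i"

    print(indicators)

The surviving (item, item) pairs are the “views found to correlate with buys”;
everything else is discarded as noise.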

Greetings,

Dennis

2017-04-14 15:18 GMT+02:00 Vaghawan Ojha <vaghawan...@gmail.com>:
Sorry, the email was sent accidentally without finishing. It would be really
helpful for me if you could describe in which cases the multi-modal input is
being used.

On Fri, Apr 14, 2017 at 7:01 PM, Vaghawan Ojha <vaghawan...@gmail.com> wrote:
Hi Pat, 

This is really a great explanation. I had tried ALS myself before CCO, and in
my case CCO seems better. You had a nice presentation, but I was quite confused
regarding multi-modal recommendation.

In what cases does the UR make use of multi-modal data? Say I have a location
preference for every user event, and a category preference as well. Let's say I
trained the model and queried with the preference parameter; in that case is it
using multi-modal data for each preference?

If you could describe a bit about this, it would be reall

On Thu, Apr 13, 2017 at 9:15 PM, Pat Ferrel <p...@occamsmachete.com> wrote:
I’m surprised that ALS seemed clear, because it is based on a complicated
matrix factorization algorithm that transforms the user vectors into a
smaller-dimensional space composed of “important” features. These are not
interactions with items like “buys”; they can only be described as defining a
new feature space. The factorized matrices transform in and out of that space.
The factorized matrices are approximations of users x features and features x
items.

The user’s history is transformed into the feature space, which will be dense,
in other words indicating some preference for all features. Then, when this
dense user vector is transformed back into item space, the approximate nature
of ALS will give some preference value for all items. At this point they can be
ranked by score and the top few returned. This is clearly wrong, since a user
will never have a preference for all items and would never purchase or convert
on a large number of them no matter what the circumstances. It does give good
results for the top ranked, though, when you have lots of “conversions” per
user on average, because ALS can only use conversions as input. In other words,
it can use only one kind of behavior data.
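A toy numpy illustration of that point (the factor matrices here are random
stand-ins for what ALS would learn, just to show the mechanics): reconstructing
preferences from the factors is dense, so every user gets a score for every
item and everything can be ranked.

    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_items, rank = 5, 8, 3

    # Stand-ins for the two factor matrices ALS would learn:
    # U is users x features, V is items x features (random, for shape only)
    U = rng.normal(size=(n_users, rank))
    V = rng.normal(size=(n_items, rank))

    # The reconstruction U * V-transpose is dense: a score for every
    # (user, item) pair, including items the user would never convert on
    scores = U @ V.T

    user = 0
    top_3 = np.argsort(scores[user])[::-1][:3]  # rank by score, keep top few
    print("top items for user 0:", top_3)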

The CCO (Correlated Cross-Occurrence) algorithm from Mahout that is behind the
Universal Recommender is multi-domain and multi-modal, in that it takes
interactions of the user from many actions they perform and even contextual
data like profile info or location. It takes all of this and compares the
“indicators”, a name for these interactions or other user info, with the user’s
conversions. It does this for all users and so finds which of the indicators
most often lead to conversion. These highly correlated indicators are then
associated with items as properties. When a user recommendation is needed, we
see which items have the behavioral indicators most similar to the user's
history. This tells us that the user probably has an affinity for the item; we
can predict a preference for these items.
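The query step then looks roughly like the following hedged sketch (the index
layout and field names are hypothetical, not the UR's actual schema): each item
is an Elasticsearch document with one field per indicator holding the
correlated item ids, and the user's recent history becomes OR'd terms clauses.

    # Hedged sketch of a UR-style query; index and field names are
    # hypothetical, not the Universal Recommender's real schema.
    user_history = {
        "buy":  ["item-12", "item-97"],            # the user's conversions
        "view": ["item-3", "item-12", "item-55"],  # correlated secondary events
    }

    query = {
        "size": 10,
        "query": {
            "bool": {
                "should": [
                    # One terms clause per indicator; Elasticsearch's
                    # relevance scoring ranks items whose indicator fields
                    # best match the user's history
                    {"terms": {indicator: item_ids}}
                    for indicator, item_ids in user_history.items()
                ]
            }
        },
    }

    # es.search(index="ur-items", body=query)  # with the elasticsearch client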

The differences:
1) ALS can ingest only one type of behavior. This is not bad, but it is also
not very flexible, and it requires a good number of these interactions per
user.
2) Cross-behavioral recommendations cannot be made with ALS since no
cross-behavioral data is seen by it. This in turn means that users with few or
no conversions will not get recommendations. The Universal Recommender can make
recommendations to users with no conversions if they have other behavior to
draw from, so it is generally said to handle cool-start for users better.
Another way to say this is that “cold-start” for ALS is only “cool-start” for
CCO (in the UR). The same goes for item-based recommendations.
3) CCO can also use content directly for similar-item recommendations, which
helps solve the item “cold-start” problem. ALS cannot.
4) CCO is more like a landscape of Predictive AI algorithms, using all we know
about a user from multiple domains (conversions, page views, search terms,
category preferences, tag preferences, brand preferences, location, device
used, etc.) to make predictions in some specific domain. It can also work with
conversions alone.
5) To do queries with ALS in MLlib requires that the factorized matrices be in
memory. They are much smaller than the input, but this means running Spark to
make queries. This makes it rather heavy-weight for queries and makes scaling a
bit of a problem and fairly complicated (too much to explain here). CCO, on the
other hand, uses Spark only to create the indicators model, which it puts in
Elasticsearch. Elasticsearch finds the top-ranked items compared to the user’s
history at runtime, in real time. This makes scaling queries as easy as scaling
Elasticsearch, since it was meant to scale (see the sketch just after this
list).
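For contrast, a minimal PySpark sketch of the ALS side (toy data; trainImplicit
and recommendProducts are the MLlib calls, and the point is that queries go
through the in-memory model, hence through Spark):

    from pyspark import SparkContext
    from pyspark.mllib.recommendation import ALS, Rating

    sc = SparkContext(appName="als-serving-sketch")

    # Toy implicit-feedback events: (user, item, strength of the conversion)
    events = sc.parallelize([
        Rating(0, 10, 1.0),
        Rating(0, 20, 1.0),
        Rating(1, 10, 1.0),
        Rating(1, 30, 1.0),
    ])

    # trainImplicit factorizes the interaction matrix; rank is the number of
    # latent features, lambda_ the regularization strength
    model = ALS.trainImplicit(events, rank=10, iterations=10, lambda_=0.01)

    # Queries are served from the in-memory factor matrices, so the Spark
    # context must stay alive to answer them
    print(model.recommendProducts(0, 3))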

I have done cross-validation comparisons, but they are a bit unfair and the
winner depends on the dataset. In real life CCO serves more users than ALS,
since it uses more behavior, and so tends to win for this reason. It’s nearly
impossible to capture this with cross-validation, so A/B tests are our only
real metric.

We have a slide deck showing some of these comparisons here:
https://docs.google.com/presentation/d/1HpHZZiRmHpMKtu86rOKBJ70cd58VyTOUM1a8OmKSMTo/edit?usp=sharing


On Apr 13, 2017, at 2:39 AM, Dennis Honders <dennishond...@gmail.com> wrote:

Hello, 

I was using the Similar Product template. (I'm not a data scientist.)
The template uses the ALS algorithm and the Cooccurrence algorithm.

The ALS algorithm is quite well described on the Apache Spark MLlib website.
The Apache Mahout documentation describes the cooccurrence algorithm only in
general terms, and it is not clear what the differences between these
algorithms are. They both use matrices to describe relations but use a
different approach to factorize them?

I would also like to know a bit more about the parameters of both algorithms in
the engine.json. What could be the impact of changing the values?
ALS: rank, nIterations, lambda, and seed.
Cooccurrence: "n"
The algorithms give different results. Is there a general way of comparing
these results?

Greetings,

Dennis




