Hi Pat,

This is a really great explanation. I had tried ALS before CCO myself, but in my case CCO seems better. Your presentation was nice, but I was quite confused about multi-model recommendation.
In what case does UR make use of multi-model? Say I have a location preference for every user event, and a category preference as well. Let's say I trained the model and queried with the preference parameter; in that case, is it using multi-model for each preference? If you could describe a bit about this, it would be reall

On Thu, Apr 13, 2017 at 9:15 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> I’m surprised that ALS seemed clear because it is based on a complicated
> matrix factorization algorithm that transforms the user vectors into a
> lower-dimensional space that is composed of “important” features. These
> are not interactions with items like “buys”; they can only be described as
> defining a new feature space. The factorized matrices transform in and out
> of that space. The factorized matrices are approximations of user x
> features, and features x items.
>
> The user’s history is transformed into the feature space, which will be
> dense, in other words indicating some preference for all features. Then
> when this dense user vector is transformed back into item space the
> approximate nature of ALS will give some preference value for all items.
> At this point they can be ranked by score and the top few returned. This is
> clearly wrong since a user will never have a preference for all items and
> would never purchase or convert on a large number of them no matter what the
> circumstances. It does give good results for the top ranked though when you
> have lots of “conversions” per user on average because ALS can only use
> conversions as input. In other words it can use only one kind of behavior
> data.
>
> The CCO (Correlated Cross-Occurrence) algorithm from Mahout that is behind
> the Universal Recommender is multi-domain and multi-modal, in that it takes
> interactions of the user from many actions they perform and even contextual
> data like profile info or location.
> It takes all of these “indicators”, a name for these interactions or
> other user info, and compares them with the user’s conversions. It does
> this for all users and so finds which of the indicators most often lead to
> conversion. These highly correlated indicators are then associated with
> items as properties. When a user recommendation is needed we see which
> items have behavioral indicators most similar to the user’s history. This
> tells us that the user probably has an affinity for the item, so we can
> predict a preference for these items.
>
> The differences:
> 1) ALS can ingest only one type of behavior. This is not bad but also not
> very flexible and requires a good number of these interactions per user.
> 2) Cross-behavioral recommendations cannot be made with ALS since no
> cross-behavioral data is seen by it. This in turn means that users with
> few or no conversions will not get recommendations. The Universal
> Recommender can make recommendations to users with no conversions if they
> have other behavior to draw from, so it is generally said to handle
> cool-start for users better. Another way to say this is that “cold-start”
> for ALS is only “cool-start” for CCO (in the UR). The same goes for
> item-based recommendations.
> 3) CCO can also use content directly for similar item recommendations,
> which helps solve the item “cold-start” problem. ALS cannot.
> 4) CCO is more like a landscape of predictive AI algorithms using all we
> know about a user from multiple domains (conversions, page views, search
> terms, category preferences, tag preferences, brand preferences, location,
> device used, etc.) to make predictions in some specific domain. It can also
> work with conversions alone.
> 5) Doing queries with ALS in MLlib requires that the factorized
> matrices be in memory. They are much smaller than the input but this means
> running Spark to make queries.
> This makes it rather heavy-weight for
> queries and makes scaling a bit of a problem and fairly complicated (too
> much to explain here). CCO on the other hand uses Spark only to create the
> indicators model, which it puts in Elasticsearch. Elasticsearch finds the
> top ranked items compared to the user’s history at runtime in real time.
> This makes scaling queries as easy as scaling Elasticsearch since it was
> meant to scale.
>
> I have done cross-validation comparisons but they are a bit unfair and the
> winner depends on the dataset. In real life CCO serves more users than ALS
> since it uses more behavior and so tends to win for this reason. It’s
> nearly impossible to compare this with cross-validation so A/B tests are
> our only metric.
>
> We have a slide deck showing some of these comparisons here:
> https://docs.google.com/presentation/d/1HpHZZiRmHpMKtu86rOKBJ70cd58VyTOUM1a8OmKSMTo/edit?usp=sharing
>
>
> On Apr 13, 2017, at 2:39 AM, Dennis Honders <dennishond...@gmail.com>
> wrote:
>
> Hello,
>
> I was using the similar product template. (I'm not a data scientist.)
> The template uses the ALS algorithm and the Cooccurrence algorithm.
>
> The ALS algorithm is quite well described on the Apache Spark MLlib
> website. The Apache Mahout documentation about the cooccurrence algorithm
> is quite general, and it is not clear what the differences are
> between these algorithms. They both use matrices to describe relations but
> use a different approach to factorize the matrices?
>
> I would also like to know a bit more about the parameters of both
> algorithms in the engine.json. What could be the impact of changing the
> values?
>
> - ALS: rank, nIterations, lambda and seed.
> - Cooccurrence: "n"
>
> The algorithms bring different results. Is there a general way of
> comparing these results?
>
> Greetings,
>
> Dennis
>
>
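
To make the CCO description in Pat's reply a bit more concrete, here is a minimal Python sketch of the idea as I read it: correlate each indicator value with conversions across all users, attach the strongly correlated values to items as properties, then rank items by how well those properties match one user's history. This is not the Mahout or Universal Recommender code. The function names (`train`, `recommend`), the fixed `threshold`, the indicator names ("category-pref", "location") and the toy data are all assumptions made up for illustration, and the correlation test used here is the log-likelihood ratio (LLR) on a 2x2 table of user counts.

```python
from math import log


def x_log_x(x):
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized entropy of a list of counts, the building block of LLR.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio of a 2x2 table of user counts.

    k11: users with both the conversion and the indicator value
    k12: conversion only, k21: indicator value only, k22: neither
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def train(conversions, indicators, threshold):
    """Build {item: {indicator_name: {correlated values}}} (hypothetical layout)."""
    users = set(conversions)
    for by_user in indicators.values():
        users |= set(by_user)
    items = set().union(*conversions.values())
    model = {item: {} for item in items}
    for name, by_user in indicators.items():
        values = set().union(*by_user.values())
        for item in items:
            buyers = {u for u in users if item in conversions.get(u, ())}
            for value in values:
                holders = {u for u in users if value in by_user.get(u, ())}
                k11 = len(buyers & holders)
                if k11 == 0:
                    continue  # the pair never co-occurred, nothing to score
                k12 = len(buyers - holders)
                k21 = len(holders - buyers)
                k22 = len(users) - k11 - k12 - k21
                if llr(k11, k12, k21, k22) >= threshold:
                    model[item].setdefault(name, set()).add(value)
    return model


def recommend(model, history, top_n=3):
    """Rank items by how many of their indicator values the user matches."""
    scores = {}
    for item, fields in model.items():
        score = sum(len(fields.get(name, set()) & set(values))
                    for name, values in history.items())
        if score > 0:
            scores[item] = score
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


# Toy data (invented): u7 has never purchased anything.
purchases = {
    "u1": {"iphone"}, "u2": {"iphone"}, "u3": {"iphone"},
    "u4": {"galaxy"}, "u5": {"galaxy"}, "u6": {"galaxy"},
}
indicators = {
    "category-pref": {
        "u1": {"apple"}, "u2": {"apple"}, "u3": {"apple"}, "u7": {"apple"},
        "u4": {"android"}, "u5": {"android"}, "u6": {"android"},
    },
    "location": {
        "u1": {"NY"}, "u2": {"SF"}, "u3": {"NY"}, "u4": {"SF"},
        "u5": {"NY"}, "u6": {"SF"}, "u7": {"NY"},
    },
}

model = train(purchases, indicators, threshold=3.0)
# Query for u7, who has preferences but no conversions.
print(recommend(model, {"category-pref": ["apple"], "location": ["NY"]}))
```

In this sketch there is a single trained model with one field per indicator on each item, and a query simply supplies the user's history for whichever indicators it has. In the toy data the location values do not pass the threshold, so they end up contributing nothing to the ranking. The in-memory set intersection in `recommend` stands in for the Elasticsearch query Pat mentions; a real deployment would not score items this way.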