Hi Pat,

This is really a great explanation. I myself had tried ALS before CCO, and
in my case CCO seems better. You gave a nice presentation, but I was quite
confused about multi-modal recommendation.

In what cases does the UR make use of multiple modalities? Say I have a
location preference for every user event, and a category preference as
well. If I train the model and query with those preference parameters, is
it using a separate model for each preference?

If you could describe this a bit, it would be really helpful.

On Thu, Apr 13, 2017 at 9:15 PM, Pat Ferrel <p...@occamsmachete.com> wrote:

> I’m surprised that ALS seemed clear, because it is based on a complicated
> matrix factorization algorithm that transforms the user vectors into a
> smaller-dimensional space composed of “important” features. These are not
> interactions with items like “buys”; they can only be described as
> defining a new feature space. The factorized matrices transform in and out
> of that space and are approximations of users x features and features x
> items.
>
> The user’s history is transformed into the feature space, which will be
> dense, in other words indicating some preference for every feature. When
> this dense user vector is transformed back into item space, the
> approximate nature of ALS gives some preference value for every item. At
> that point the items can be ranked by score and the top few returned. This
> is clearly wrong, since a user will never have a preference for all items
> and would never purchase or convert on a large number of them no matter
> what the circumstances. It does give good results for the top-ranked
> items, though, when you have lots of “conversions” per user on average,
> because ALS can only use conversions as input. In other words, it can use
> only one kind of behavior data.
>
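> To make this concrete, here is a rough sketch in plain numpy of what the
> factorization produces and why every item ends up with a score. This is
> not MLlib’s actual code; the shapes and data are made up:
>
>     import numpy as np
>
>     m, n, k = 1000, 500, 10  # users, items, latent "features"
>     # Sparse 0/1 conversion matrix, users x items; ALS would fit the
>     # two factor matrices below so that U @ V.T approximates it.
>     R = (np.random.rand(m, n) < 0.01).astype(float)
>
>     U = np.random.rand(m, k)  # users x features (dense)
>     V = np.random.rand(n, k)  # items x features (dense)
>
>     # One user's row mapped through the feature space yields a dense
>     # score for ALL n items, which is then ranked:
>     scores = U[42] @ V.T
>     top_items = np.argsort(-scores)[:10]
>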
> The CCO (Correlated Cross-Occurrence) algorithm from Mahout that is behind
> the Universal Recommender is multi-domain and multi-modal, in that it
> takes interactions of the user from the many actions they perform and even
> contextual data like profile info or location. It takes all of this,
> treating each interaction or piece of user info as an “indicator”, and
> compares the indicators with the user’s conversions. It does this for all
> users and so finds which of the indicators most often lead to conversion.
> These highly correlated indicators are then associated with items as
> properties. When a user recommendation is needed, we see which items have
> the behavioral indicators most similar to the user's history. This tells
> us that the user probably has an affinity for the item; we can predict a
> preference for these items.
>
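> The test behind “most often lead to conversion” is Mahout’s
> log-likelihood ratio (LLR). As a rough sketch in simplified Python, with
> made-up counts, not the actual Mahout code:
>
>     import math
>
>     def x_log_x(x):
>         return 0.0 if x == 0 else x * math.log(x)
>
>     def entropy(*counts):
>         return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)
>
>     def llr(k11, k12, k21, k22):
>         # k11: users with both the indicator and the conversion
>         # k12: indicator only, k21: conversion only, k22: neither
>         row = entropy(k11 + k12, k21 + k22)
>         col = entropy(k11 + k21, k12 + k22)
>         mat = entropy(k11, k12, k21, k22)
>         return 2.0 * (row + col - mat)
>
>     # A high score means the indicator is anomalously correlated with
>     # the conversion and is kept as an item property.
>     print(llr(100, 50, 50, 10000))
>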
> The differences:
> 1) ALS can ingest only one type of behavior. This is not bad, but it is
> also not very flexible, and it requires a good number of these
> interactions per user.
> 2) Cross-behavioral recommendations cannot be made with ALS, since no
> cross-behavioral data is seen by it. This in turn means that users with
> few or no conversions will not get recommendations. The Universal
> Recommender can make recommendations to users with no conversions if they
> have other behavior to draw from, so it is generally said to handle
> cool-start for users better. Another way to say this is that “cold-start”
> for ALS is only “cool-start” for CCO (in the UR). The same goes for
> item-based recommendations.
> 3) CCO can also use content directly for similar item recommendations,
> which helps solve the item “cold-start” problem. ALS cannot.
> 4) CCO is more like a landscape of Predictive AI algorithms, using all we
> know about a user from multiple domains (conversions, page views, search
> terms, category preferences, tag preferences, brand preferences, location,
> device used, etc.) to make predictions in some specific domain. It can
> also work with conversions alone.
> 5) Doing queries with ALS in MLlib requires that the factorized matrices
> be in memory. They are much smaller than the input, but this means running
> Spark to make queries, which makes queries rather heavyweight and makes
> scaling a bit of a problem and fairly complicated (too much to explain
> here). CCO, on the other hand, uses Spark only to create the indicator
> model, which it puts in Elasticsearch. At runtime Elasticsearch finds the
> top-ranked items compared to the user’s history, in real time; see the
> sketch after this list. This makes scaling queries as easy as scaling
> Elasticsearch, since it was meant to scale.
>
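> To give a feel for the query side, here is a rough sketch of the kind of
> query the UR builds against Elasticsearch. Field names, item IDs, and the
> structure are simplified and hypothetical, not the UR’s exact query:
>
>     import json
>
>     # The user's recent history, one list per indicator type:
>     user_history = {
>         "purchase": ["ipad", "iphone"],
>         "view": ["macbook", "airpods"],
>     }
>
>     # Items whose correlated-indicator properties best match the
>     # history rank highest; Elasticsearch does the scoring.
>     query = {
>         "size": 10,
>         "query": {
>             "bool": {
>                 "should": [
>                     {"terms": {field: items}}
>                     for field, items in user_history.items()
>                 ]
>             }
>         },
>     }
>     print(json.dumps(query, indent=2))
>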
> I have done cross-validation comparisons, but they are a bit unfair and
> the winner depends on the dataset. In real life CCO serves more users than
> ALS, since it uses more behavior, and so tends to win for this reason.
> It’s nearly impossible to compare this with cross-validation, so A/B tests
> are our only metric.
>
> We have a slide deck showing some of these comparisons here:
> https://docs.google.com/presentation/d/1HpHZZiRmHpMKtu86rOKBJ70cd58VyTOUM1a8OmKSMTo/edit?usp=sharing
>
>
> On Apr 13, 2017, at 2:39 AM, Dennis Honders <dennishond...@gmail.com>
> wrote:
>
> Hello,
>
> I was using the similar product template. (I'm not a data scientist.)
> The template uses the ALS algorithm and the Cooccurrence algorithm.
>
> The ALS algorithm is described quite well on the Apache Spark MLlib
> website. The Apache Mahout documentation about the cooccurrence algorithm
> is quite general, and it is not clear what the differences are between
> these algorithms. They both use matrices to describe relations but use a
> different approach to factorize the matrices?
>
> I would also like to know a bit more about the parameters of both
> algorithms in the engine.json. What could be the impact of changing their
> values?
>
>    - ALS: rank, nIterations, lambda, and seed (see the sketch below).
>    - Cooccurrence: "n"
>
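> For context, here is a minimal sketch of where the ALS parameters show up
> in Spark MLlib's DataFrame API, with toy data; if I understand correctly,
> maxIter and regParam there correspond to nIterations and lambda:
>
>     from pyspark.sql import SparkSession
>     from pyspark.ml.recommendation import ALS
>
>     spark = SparkSession.builder.appName("als-sketch").getOrCreate()
>     ratings = spark.createDataFrame(
>         [(0, 0, 4.0), (0, 1, 2.0), (1, 1, 3.0), (1, 2, 4.0)],
>         ["userId", "itemId", "rating"],
>     )
>     als = ALS(rank=10, maxIter=10, regParam=0.1, seed=42,
>               userCol="userId", itemCol="itemId", ratingCol="rating")
>     model = als.fit(ratings)  # factorizes into user and item factors
>     model.transform(ratings).show()  # predicted preference scores
>     spark.stop()
>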
> The algorithms give different results. Is there a general way of
> comparing these results?
>
> Greetings,
>
> Dennis
>
>
