Part of the solr-recommender project is a cross-recommender based on Mahout. It
uses the MapReduce version of the RecommenderJob as a template and implements
an XRecommenderJob. Unfortunately the key part of the algorithm (the part
handled by RowSimilarityJob) is done with a simple matrix
Anonymizing the ids is a good start, especially if you have a relatively
small subset of the entire social graph and if the graph is publicly
visible in any case. If you have a complete crawl of the graph, then many
ids will be recoverable by reference back to the public version of the graph.
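Here is a minimal sketch of the anonymization step, assuming ids arrive as arbitrary strings. Each distinct id is replaced by a random dense integer, and the lookup table stays private. The function name `anonymize_ids` and the toy events are hypothetical. As noted above, this hides the ids themselves but not the shape of the graph, so it does not protect against matching the data back to a public crawl.

```python
import random


def anonymize_ids(raw_ids, seed=None):
    """Map each distinct raw id to a random integer in [0, n).

    The returned dict is the private lookup table; only the integers would
    be published. The graph structure itself is left untouched.
    """
    # Sort first so a fixed seed gives a reproducible mapping regardless of
    # set iteration order.
    distinct = sorted(set(raw_ids))
    codes = list(range(len(distinct)))
    random.Random(seed).shuffle(codes)  # random, not order-preserving
    return dict(zip(distinct, codes))


# Hypothetical (user, item) events.
events = [("alice", "item-9"), ("bob", "item-3"), ("alice", "item-3")]
user_map = anonymize_ids(user for user, _ in events)
item_map = anonymize_ids(item for _, item in events)
published = [(user_map[u], item_map[i]) for u, i in events]
```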
For the cross-recommender we need some replacement for a primary
action--purchases and a secondary action--views, clicks, impressions, something.
To use this data we would treat clicks like a purchase--the primary action we
want to recommend. Then the search-result-item-impressions is like
Primary action can be emitting a search term. Secondary can be click to
view.
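Following that suggestion, here is a sketch of how a search log could be split into the two action types. The primary rows hold the search terms a user emitted, and the secondary rows hold the items the user clicked to view. The `(user, kind, value)` record layout and the `"search"`/`"click"` labels are assumptions for illustration, not a format from the thread.

```python
from collections import defaultdict


def split_actions(log):
    """Split a search log into primary and secondary action rows.

    primary:   user -> set of search terms the user emitted
    secondary: user -> set of items the user clicked to view
    Each value is one sparse boolean row of the corresponding matrix.
    """
    primary = defaultdict(set)
    secondary = defaultdict(set)
    for user, kind, value in log:
        if kind == "search":
            primary[user].add(value)
        elif kind == "click":
            secondary[user].add(value)
        # Other event kinds (e.g. impressions) are ignored in this sketch.
    return dict(primary), dict(secondary)


# Hypothetical log records: (user, event kind, search term or item id).
log = [
    ("u1", "search", "red shoes"),
    ("u1", "click", "item-42"),
    ("u2", "search", "red shoes"),
    ("u2", "click", "item-7"),
]
primary, secondary = split_actions(log)
```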
On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel pat.fer...@gmail.com wrote:
For the cross-recommender we need some replacement for a primary
action--purchases and a secondary action--views, clicks, impressions
On Tue, Apr 16, 2013 at 10:29 AM, Ted Dunning ted.dunn...@gmail.com wrote:
Primary action can be emitting a search term. Secondary can be click to
view.
I asked management here a while ago whether there would be a problem with
releasing an anonymized set of data from one of our retail customers, and
didn't get too much push-back. If this is something that would be of
major interest, I can ask again and see whether there's something we can
put out
Definitely of MAJOR interest.
I am sure it would also draw all kinds of desired attention to your
business.
MovieLens is way too small to be meaningful any more.
Wikipedia articles and StackOverflow tags are not retail data!
By all means, post some real retail data, if you can.
Meaningful sizes
MAJOR may be too tame a word.
Furthermore there are several enhancements the community could make to support
retail data and retail recommenders. For one thing, without public data a
*public* cross-recommender will probably not get built.
The cross-recommender needs to separate action types
/data
On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel pat.fer...@gmail.com wrote:
MAJOR may be too tame a word.
Furthermore there are several enhancements the community could make to
support retail data and retail recommenders. For one thing without public
data a *public* cross-recommender
That looks like the best shortcut. It is one of the few places where the rows
of one and the columns of the other are seen together. Now I know why you
transpose the first input :-)
But I have begun to wonder whether it is the right thing to do for a cross
recommender because you
Getting this running with co-occurrence rather than using a similarity calc on
user rows finally forced me to understand what is going on in the base
recommender. And the answer implies further work.
[B'B] is not usually calculated in the standard item-based recommender. The matrix
that comes out
Do I have to create a SimilarityJob(matrixB, matrixA, similarityType)
to get this or have I missed something already in Mahout?
It could be worth investigating whether MatrixMultiplicationJob could
be extended to compute similarities instead of dot products.
Best,
Sebastian
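A dot product of two boolean columns is just a co-occurrence count. One similarity that can be computed from that same count is the log-likelihood ratio (G^2) over a 2x2 contingency table, which is the idea behind Mahout's log-likelihood similarity measures. The Python below is an independent sketch of the usual entropy formulation, not Mahout's code. The function names are hypothetical, and the sketch assumes the per-column counts and the total user count are available alongside the dot product.

```python
import math


def x_log_x(x):
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy, as used in the usual LLR formulation.
    return x_log_x(sum(counts)) - sum(x_log_x(k) for k in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 table: k11 both, k12 first only, k21 second only, k22 neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


def llr_from_cooccurrence(both, count_a, count_b, num_users):
    """Turn a raw co-occurrence count (the dot product) into an LLR score.

    both: users with both actions; count_a, count_b: users with each action;
    num_users: total number of users.
    """
    k11 = both
    k12 = count_a - both
    k21 = count_b - both
    k22 = num_users - count_a - count_b + both
    return log_likelihood_ratio(k11, k12, k21, k22)
```

With independent counts such as (1, 1, 1, 1) the score is 0. A perfectly associated table such as (10, 0, 0, 10) scores 40 ln 2.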
BTW I have this working on trivial data and am in the process of measuring its
results on some real-world data. It does a lot with DistributedRowMatrix and so
I'll be interested to see how it performs with a larger data set.
Does anyone know of a public data set that provides things like views
On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel p...@occamsmachete.com wrote:
Does anyone know of a public data set that provides things like views and
purchases?
I don't.
Retail data may be hard to impossible, but one can improvise.
It seems to be fairly common to use Wikipedia articles (Myrrix, GraphLab).
Another idea is to use StackOverflow tags (Myrrix examples).
Although they are only good for emulating implicit feedback.
On Wed, Apr 10, 2013 at 6:48 PM, Ted
I have retail data but can't publish results from it. If I could get a public
sample I'd share how the technique worked out.
Not sure how to simulate this data. It has the important characteristic that
every purchase is also a view but not the other way around and Ted's technique
is a way to
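If no public retail sample turns up, one way to improvise is toy data that at least preserves the property described above: every purchase is also a view, but not the other way around. This sketch draws views uniformly at random and then picks purchases only from each user's viewed set. The function name and parameters are hypothetical. Because the sampling is uniform, the data has none of the popularity skew or item-to-item structure of real retail logs, so it is only useful for smoke-testing a pipeline.

```python
import random


def simulate_views_and_purchases(num_users, num_items, views_per_user,
                                 purchase_prob, seed=0):
    """Generate toy boolean view/purchase rows where purchases are a subset of views."""
    rng = random.Random(seed)
    views, purchases = {}, {}
    for user in range(num_users):
        viewed = set(rng.sample(range(num_items), views_per_user))
        # A purchase can only happen on an item this user has viewed.
        bought = {item for item in viewed if rng.random() < purchase_prob}
        views[user] = viewed
        purchases[user] = bought
    return views, purchases


views, purchases = simulate_views_and_purchases(
    num_users=100, num_items=50, views_per_user=5, purchase_prob=0.2)
```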
On Sat, Apr 6, 2013 at 3:26 PM, Pat Ferrel p...@occamsmachete.com wrote:
I guess I don't understand this issue.
In my case both the item ids and user ids of the separate DistributedRowMatrix
will match and I know the size for the entire space from a previous
step where I create id maps. I
inline
On Apr 3, 2013, at 6:15 PM, Pat Ferrel wrote:
The non-symmetry of [B'A] and the fact that it is calculated from two
models leads me to a rather heavy-handed approach, at least for a first cut.
Let me know if this seems right:
//calculate the 'cross' co-occurrence matrix
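Here is a small in-memory sketch of that cross co-occurrence step. It assumes both matrices are stored as one boolean row (a set of item indices) per user, and that row u of B and row u of A describe the same user. Under that rows-as-users assumption, B'A is item by item: entry (i, j) counts users who have item i in B (purchased) and item j in A (viewed). The function name is hypothetical. In a MapReduce setting the per-user loop would roughly correspond to the map side and the summation to the reduce side.

```python
from collections import Counter


def cross_cooccurrence(B_rows, A_rows):
    """Return B'A for boolean user rows as a sparse Counter keyed by (i, j)."""
    if len(B_rows) != len(A_rows):
        raise ValueError("B and A must have the same user rows")
    counts = Counter()
    for b_row, a_row in zip(B_rows, A_rows):
        # Each user contributes the outer product of their B row and A row.
        for i in b_row:
            for j in a_row:
                counts[(i, j)] += 1
    return counts


B = [{0}, {1}, set()]    # purchases: one set of item indices per user
A = [{0, 1}, {1}, {0}]   # views: same users, same order
BtA = cross_cooccurrence(B, A)
```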
On Apr 4, 2013, at 5:17 PM, Pat Ferrel wrote:
One issue with the method below is that the two source matrices would not
have values for all users or items (rows or columns). I do know the entire
user and item id space from a previous step so I know the # of rows including
blank ones and #
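Here is a sketch of building that shared id space up front, assuming each log is a list of (user, item) pairs. There is one dense index per distinct user and item across all logs. As a result, the view and purchase matrices get identical row and column indices and a known size, with empty rows for users who appear in only one log. `build_id_maps` and `to_rows` are hypothetical names.

```python
def build_id_maps(*logs):
    """Assign one dense index per distinct user and per distinct item across all logs."""
    user_index, item_index = {}, {}
    for log in logs:
        for user, item in log:
            # setdefault evaluates len() before inserting, so indices are 0, 1, 2, ...
            user_index.setdefault(user, len(user_index))
            item_index.setdefault(item, len(item_index))
    return user_index, item_index


def to_rows(log, user_index, item_index):
    """Sparse boolean matrix: one (possibly empty) set of item indices per user row."""
    rows = [set() for _ in range(len(user_index))]
    for user, item in log:
        rows[user_index[user]].add(item_index[item])
    return rows


views = [("alice", "tv"), ("bob", "radio"), ("carol", "tv")]
purchases = [("alice", "tv")]
users, items = build_id_maps(views, purchases)
A = to_rows(views, users, items)      # views
B = to_rows(purchases, users, items)  # purchases, same shape as A
```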
This may not quite be true because the RSJ is able to take some liberties.
The origin of these is that A'A can be viewed as a self join. Thus as rows of
A are read, the cooccurrences can be emitted as they are read.
For B'A, we have to somehow get corresponding rows of A and B at the same time
Completely concur with that. MatrixMultiplicationJob is already using a
map-side merge join AFAIK.
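Here is the merge-join idea reduced to a single-process sketch. Given the rows of B and A as (user_id, row) pairs sorted by user_id, walk both inputs in step and pair up rows that share a user, reading each input once. The sketch assumes each user_id appears at most once per input. It illustrates the technique only and is not MatrixMultiplicationJob's actual code; the function name is hypothetical.

```python
def merge_join(b_rows, a_rows):
    """Yield (user_id, b_row, a_row) for users present in both sorted inputs.

    Users present in only one input contribute nothing to B'A, so they are skipped.
    """
    b_iter, a_iter = iter(b_rows), iter(a_rows)
    b, a = next(b_iter, None), next(a_iter, None)
    while b is not None and a is not None:
        if b[0] == a[0]:
            yield b[0], b[1], a[1]
            b, a = next(b_iter, None), next(a_iter, None)
        elif b[0] < a[0]:
            b = next(b_iter, None)  # advance the input that is behind
        else:
            a = next(a_iter, None)


purchases = [(1, {"tv"}), (3, {"radio"})]
views = [(1, {"tv", "radio"}), (2, {"tv"}), (3, {"radio"})]
joined = list(merge_join(purchases, views))
```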
On 05.04.2013 15:04, Ted Dunning wrote:
This may not quite be true because the RSJ is able to take some liberties.
The origin of these is that A'A can be viewed as a self join. Thus as rows
I guess I don't understand this issue.
In my case both the item ids and user ids of the separate DistributedRowMatrix
will match and I know the size for the entire space from a previous step where
I create id maps. I suppose you are saying the m/r code would be super
simple if a row of B'
I need to do the equivalent of the xrecommender.mostSimilarItems(long[] itemIDs, int howMany)
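Here is one possible in-memory analogue of mostSimilarItems(long[] itemIDs, int howMany) for the cross case. It assumes the B'A entries are kept in a dict keyed by (purchased_item, viewed_item). Each purchased item is scored by summing its weights against the given viewed items, and the top howMany are returned. Summing is an assumption about how to aggregate over several items, and the function name is hypothetical. This is not Mahout's implementation.

```python
def most_similar_items(cross, item_ids, how_many):
    """Rank row items by summed weight against the given column items.

    cross: dict (purchased_item, viewed_item) -> weight, e.g. entries of B'A.
    item_ids: viewed items to find associated purchased items for.
    """
    wanted = set(item_ids)
    scores = {}
    for (row_item, col_item), weight in cross.items():
        if col_item in wanted:
            scores[row_item] = scores.get(row_item, 0) + weight
    # Highest score first; ties broken by item id so the order is deterministic.
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return ranked[:how_many]


# Made-up B'A entries keyed by (purchased item, viewed item).
cross = {("tv", "tv"): 3, ("radio", "tv"): 1, ("radio", "radio"): 2, ("cable", "tv"): 2}
top = most_similar_items(cross, ["tv"], 2)
```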
To oversimplify this, in the standard Item-Based Recommender this is
equivalent to looking at the item similarities from the preference matrix
(similarity of item purchases by user). In the
o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity
as similarity measure.
On 02.04.2013 23:43, Pat Ferrel wrote:
Taking an idea from Ted, I'm working on a cross recommender starting
from mahout's m/r implementation of an item-based recommender. We have
purchases and views for items by user. It is straightforward to create a
recommender on purchases but using views
RowSimilarityJob on B'A, I think you would
need an equivalent of RowSimilarityJob to compute B'A. I guess you could
extend the MatrixMultiplicationJob to use the similarity measures from
RowSimilarityJob instead of standard dot products.
I really like the idea of such a cross recommender.
On 03.04.2013 08:33, Ted Dunning wrote:
Sebastian,
What about the assumption that the matrix is symmetric?
A'A is symmetric, but B'A
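A tiny dense example makes the point. With rows as users, A'A is always symmetric, while (B'A)' = A'B, so B'A is symmetric only by coincidence. Any computation that fills in half of the matrix and mirrors it is a valid shortcut for A'A but would be wrong for B'A. The matrices are made up for illustration.

```python
def transpose_times(X, Y):
    """Dense X'Y for matrices given as lists of rows (one row per user)."""
    n_x, n_y = len(X[0]), len(Y[0])
    return [[sum(X[u][i] * Y[u][j] for u in range(len(X))) for j in range(n_y)]
            for i in range(n_x)]


def is_symmetric(M):
    """True if a square list-of-rows matrix equals its transpose."""
    return all(M[i][j] == M[j][i] for i in range(len(M)) for j in range(len(M)))


# Rows are users, columns are items (0/1 entries).
A = [[1, 1, 0],   # views
     [0, 1, 1]]
B = [[1, 0, 0],   # purchases
     [0, 0, 1]]
AtA = transpose_times(A, A)  # symmetric by construction
BtA = transpose_times(B, A)  # in general not symmetric
```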
Taking an idea from Ted, I'm working on a cross recommender starting from
mahout's m/r implementation of an item-based recommender. We have purchases and
views for items by user. It is straightforward to create a recommender on
purchases but using views as a predictor of purchases does not work
To pick up an old thread…
A = views items x users
B = purchases items x users
A cross recommender B'A h_v + B'B h_p = r_p
The B'B h_p is the basic boolean mahout recommender trained on purchases and
we'll use that implementation I assume.
B'A gives cooccurrences of views and purchases
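Here is a dense sketch of scoring with that formula. It assumes rows are users and columns are items, so B'A and B'B are item by item and can multiply the user's item-indexed view history h_v and purchase history h_p. It uses raw co-occurrence counts as weights and drops already-purchased items in a separate ranking step. All of those choices, and the function names, are assumptions for illustration. If A and B are literally items x users, as written in the post, the same item-by-item products would be written BA' and BB' instead.

```python
def mat_vec(M, v):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def cross_scores(BtA, BtB, h_v, h_p):
    """r_p = B'A h_v + B'B h_p: predicted purchase scores per item."""
    return [x + y for x, y in zip(mat_vec(BtA, h_v), mat_vec(BtB, h_p))]


def top_items(r_p, h_p, how_many):
    # Assumed post-processing: skip items the user has already purchased.
    candidates = [i for i in range(len(r_p)) if not h_p[i]]
    return sorted(candidates, key=lambda i: -r_p[i])[:how_many]


BtA = [[1, 1, 0], [0, 0, 0], [0, 1, 1]]  # purchase-by-view co-occurrence counts
BtB = [[1, 0, 0], [0, 0, 0], [0, 0, 1]]  # purchase-by-purchase co-occurrence counts
h_v = [0, 1, 0]  # this user viewed item 1
h_p = [1, 0, 0]  # and purchased item 0
r_p = cross_scores(BtA, BtB, h_v, h_p)
```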