Cross Recommender Proposal

2014-01-23 Pat Ferrel
Part of the solr-recommender project is a cross-recommender based on Mahout. It uses the MapReduce version of the RecommenderJob as a template and implements an XRecommenderJob. Unfortunately the key part of the algorithm, the part handled by RowSimilarityJob, is done with a simple matrix

Re: More Cross-recommender thoughts

2013-05-17 Ted Dunning
Anonymizing the IDs is a good start, especially if you have a relatively small subset of the entire social graph and if the graph is publicly visible in any case. If you have a complete crawl of the graph, then many IDs will be recoverable by reference back to the public version of the graph.

Re: cross recommender

2013-04-16 Pat Ferrel
For the cross-recommender we need some replacement for a primary action--purchases and a secondary action--views, clicks, impressions, something. To use this data we would treat clicks like a purchase--the primary action we want to recommend. Then the search-result-item-impressions is like

Re: cross recommender

2013-04-16 Ted Dunning
Primary action can be emitting a search term. Secondary can be click to view. On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel pat.fer...@gmail.com wrote: For the cross-recommender we need some replacement for a primary action--purchases and a secondary action--views, clicks, impressions

Re: cross recommender

2013-04-16 Nick Kolegraff
Pat Ferrel pat.fer...@gmail.com wrote: For the cross-recommender we need some replacement for a primary action--purchases and a secondary action--views, clicks, impressions, something. To use this data we would treat clicks like a purchase--the primary action we want to recommend

Re: cross recommender

2013-04-16 Pat Ferrel
Secondary can be click to view. On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel pat.fer...@gmail.com wrote: For the cross-recommender we need some replacement for a primary action--purchases and a secondary action--views, clicks, impressions, something. To use this data we would treat clicks

Re: cross recommender

2013-04-16 Pat Ferrel
on that. On Tue, Apr 16, 2013 at 10:29 AM, Ted Dunning ted.dunn...@gmail.com wrote: Primary action can be emitting a search term. Secondary can be click to view. On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel pat.fer...@gmail.com wrote: For the cross-recommender we need some replacement

Re: cross recommender

2013-04-16 Nick Kolegraff
be click to view. On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel pat.fer...@gmail.com wrote: For the cross-recommender we need some replacement for a primary action--purchases and a secondary action--views, clicks, impressions, something. To use this data we would treat clicks like

Re: cross recommender

2013-04-16 Ted Dunning
For the cross-recommender we need some replacement for a primary action--purchases and a secondary action--views, clicks, impressions, something. To use this data we would treat clicks like a purchase--the primary action we want to recommend. Then the search-result-item-impressions is like

Re: cross recommender

2013-04-15 Robin Morris
I asked management here a while ago whether there would be a problem with releasing an anonymized set of data from one of our retail customers, and didn't get too much push-back. If this is something that would be of major interest, I can ask again and see whether there's something we can put out

Re: cross recommender

2013-04-15 Koobas
Definitely of MAJOR interest. I am sure it would also draw all kinds of desired attention to your business. MovieLens is way too small to be meaningful anymore. Wikipedia articles and Stack Overflow tags are not retail data! By all means, post some real retail data, if you can. Meaningful sizes

Re: cross recommender

2013-04-15 Pat Ferrel
MAJOR may be too tame a word. Furthermore, there are several enhancements the community could make to support retail data and retail recommenders. For one thing, without public data a *public* cross-recommender will probably not get built. The cross-recommender needs to separate action types
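A minimal sketch of separating action types, assuming a hypothetical event log of `(user_id, item_id, action)` tuples; the action strings `"purchase"` and `"view"` are placeholders, not a Mahout input format. Each action type becomes its own user-to-items mapping, which is what the cross-recommender consumes as its separate purchase (B) and view (A) inputs.

```python
from collections import defaultdict


def split_by_action(events: list[tuple[str, str, str]]) -> dict[str, dict[str, set[str]]]:
    """Group (user, item, action) events into one user -> {items} mapping per action.

    The tuple layout and action names are illustrative assumptions.
    """
    by_action: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))
    for user, item, action in events:
        by_action[action][user].add(item)
    return by_action


events = [("u1", "i1", "view"), ("u1", "i1", "purchase"),
          ("u1", "i2", "view"), ("u2", "i2", "view")]
split = split_by_action(events)
# split["purchase"]["u1"] is {"i1"}; split["view"]["u1"] is {"i1", "i2"}
```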

Re: cross recommender

2013-04-15 Nick Kolegraff
/data On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel pat.fer...@gmail.com wrote: MAJOR may be too tame a word. Furthermore there are several enhancements the community could make to support retail data and retail recommenders. For one thing without public data a *public* cross-recommender

Re: cross recommender

2013-04-12 Pat Ferrel
That looks like the best shortcut. It is one of the few places where the rows of one and the columns of the other are seen together. Now I know why you transpose the first input :-) But, I have begun to wonder whether it is the right thing to do for a cross recommender because you

Re: cross recommender

2013-04-12 Ted Dunning
...@occamsmachete.com wrote: That looks like the best shortcut. It is one of the few places where the rows of one and the columns of the other are seen together. Now I know why you transpose the first input :-) But, I have begun to wonder whether it is the right thing to do for a cross recommender because

Re: cross recommender

2013-04-11 Pat Ferrel
Getting this running with co-occurrence rather than using a similarity calculation on user rows finally forced me to understand what is going on in the base recommender. And the answer implies further work. [B'B] is usually not calculated in the usual item-based recommender. The matrix that comes out

Re: cross recommender

2013-04-11 Sebastian Schelter
Do I have to create a SimilarityJob(matrixB, matrixA, similarityType) to get this or have I missed something already in Mahout? It could be worth investigating whether MatrixMultiplicationJob could be extended to compute similarities instead of dot products. Best, Sebastian

Re: cross recommender

2013-04-10 Pat Ferrel
BTW I have this working on trivial data and am in the process of measuring its results on some real-world data. It does a lot with DistributedRowMatrix, so I'll be interested to see how it performs with a larger data set. Does anyone know of a public data set that provides things like views

Re: cross recommender

2013-04-10 Ted Dunning
On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel p...@occamsmachete.com wrote: Does anyone know of a public data set that provides things like views and purchases? I don't.

Re: cross recommender

2013-04-10 Koobas
Retail data may be hard or impossible to get, but one can improvise. It seems to be fairly common to use Wikipedia articles (Myrrix, GraphLab). Another idea is to use Stack Overflow tags (Myrrix examples), although they are only good for emulating implicit feedback. On Wed, Apr 10, 2013 at 6:48 PM, Ted

Re: cross recommender

2013-04-10 Pat Ferrel
I have retail data but can't publish results from it. If I could get a public sample I'd share how the technique worked out. Not sure how to simulate this data. It has the important characteristic that every purchase is also a view but not the other way around and Ted's technique is a way to
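One way to improvise test data with that property is sketched below. The uniform item popularity and the constant per-view purchase probability are placeholder assumptions, not a model of real retail behaviour; the only structural guarantee is the one described above: every purchase is drawn from the user's own views, so purchases are always a subset of views.

```python
import random


def simulate(n_users: int, n_items: int, views_per_user: int,
             p_purchase: float, seed: int = 0):
    """Return (views, purchases) as user -> set of items, with purchases a subset of views.

    Uniform popularity and a fixed purchase rate are illustrative assumptions only.
    Requires views_per_user <= n_items.
    """
    rng = random.Random(seed)
    views: dict[int, set[int]] = {}
    purchases: dict[int, set[int]] = {}
    for user in range(n_users):
        viewed = set(rng.sample(range(n_items), views_per_user))
        views[user] = viewed
        # Each purchase is drawn only from items this user already viewed.
        purchases[user] = {item for item in viewed if rng.random() < p_purchase}
    return views, purchases


views, purchases = simulate(n_users=100, n_items=50, views_per_user=8, p_purchase=0.2)
```

A seeded generator keeps runs reproducible, which helps when comparing a purchase-only recommender against the cross-recommender on the same synthetic set.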

Re: cross recommender

2013-04-08 Ted Dunning
On Sat, Apr 6, 2013 at 3:26 PM, Pat Ferrel p...@occamsmachete.com wrote: I guess I don't understand this issue. In my case both the item IDs and user IDs of the separate DistributedRowMatrix inputs will match, and I know the size for the entire space from a previous step where I create ID maps. I

Re: cross recommender

2013-04-06 Ted Dunning
inline On Apr 3, 2013, at 6:15 PM, Pat Ferrel wrote: The non-symmetry of the [B'A] and the fact that it is calculated from two models leads me to a rather heavy-handed approach, at least for a first cut. Let me know if this seems right: //calculate the 'cross' co-occurrence matrix

Re: cross recommender

2013-04-06 Ted Dunning
On Apr 4, 2013, at 5:17 PM, Pat Ferrel wrote: One issue with the method below is that the two source matrices would not have values for all users or items (rows or columns). I do know the entire user and item id space from a previous step so I know the # of rows including blank ones and #

Re: cross recommender

2013-04-06 Ted Dunning
This may not quite be true because the RSJ is able to take some liberties. The origin of these is that A'A can be viewed as a self join. Thus as rows of A are read, the cooccurrences can be emitted as they are read. For B'A, we have to somehow get corresponding rows of A and B at the same time
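The difference between the self join and the cross case can be sketched on a single machine. Assuming both matrices are held as hypothetical `user -> set(items)` maps (B for purchases, A for views), B'A is the sum over users of the outer product of that user's purchase row and view row, so a user contributes only when it appears in both inputs; that is the join on corresponding rows. Passing B twice gives the B'B self-join case. Raw counts here correspond to a plain co-occurrence-count measure; this is an illustration, not Mahout's MapReduce code.

```python
from collections import Counter


def cross_cooccurrence(b_rows: dict[str, set[str]],
                       a_rows: dict[str, set[str]]) -> Counter:
    """Compute B'A as counts keyed by (purchased_item, viewed_item).

    Rows are joined on user id; users missing from either input contribute nothing.
    """
    counts: Counter = Counter()
    for user, purchased in b_rows.items():
        viewed = a_rows.get(user)
        if not viewed:
            continue
        # Outer product of this user's purchase row and view row.
        for p in purchased:
            for v in viewed:
                counts[(p, v)] += 1
    return counts


B = {"u1": {"i1"}, "u2": {"i1", "i3"}}
A = {"u1": {"i1", "i2"}, "u2": {"i2"}, "u3": {"i3"}}
print(cross_cooccurrence(B, A)[("i1", "i2")])  # prints 2
```

In a distributed setting the same join is what a map-side merge join provides when both inputs are partitioned and sorted by user id.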

Re: cross recommender

2013-04-06 Sebastian Schelter
Completely concur with that. MatrixMultiplicationJob is already using a map-side merge join AFAIK. On 05.04.2013 15:04, Ted Dunning wrote: This may not quite be true because the RSJ is able to take some liberties. The origin of these is that A'A can be viewed as a self join. Thus as rows

Re: cross recommender

2013-04-06 Pat Ferrel
I guess I don't understand this issue. In my case both the item IDs and user IDs of the separate DistributedRowMatrix inputs will match, and I know the size for the entire space from a previous step where I create ID maps. I suppose you are saying the m/r code would be super simple if a row of B'
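The shared ID space mentioned above can be illustrated with a hypothetical helper that assigns contiguous integer indices to external user and item IDs over the union of all action types, so B and A get identical row and column dimensions even when some rows or columns are empty in one of them. The function name and the `(user, item)` pair input are assumptions for the example.

```python
def build_id_maps(*interaction_sets: list[tuple[str, str]]):
    """Map external user/item ids to contiguous ints over the union of all inputs."""
    user_index: dict[str, int] = {}
    item_index: dict[str, int] = {}
    for interactions in interaction_sets:
        for user, item in interactions:
            # setdefault assigns the next free index only the first time an id is seen.
            user_index.setdefault(user, len(user_index))
            item_index.setdefault(item, len(item_index))
    return user_index, item_index


purchases = [("u1", "i1")]
views = [("u1", "i1"), ("u2", "i2")]
users, items = build_id_maps(purchases, views)
# Both matrices are then sized len(users) x len(items), here 2 x 2.
```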

Re: cross recommender

2013-04-06 Pat Ferrel
I need to do the equivalent of xrecommender.mostSimilarItems(long[] itemIDs, int howMany). To oversimplify this, in the standard item-based recommender this is equivalent to looking at the item similarities from the preference matrix (similarity of item purchases by user). In the
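A rough sketch of a cross version of `mostSimilarItems`, assuming the cross co-occurrence (or similarity) scores are already available as a hypothetical `(purchased_item, viewed_item) -> score` mapping: sum each purchase item's scores against the given viewed items and keep the top `how_many`. It mirrors the quoted signature but is not a Mahout API. Input items are deliberately not excluded, since a viewed item can be a valid purchase recommendation.

```python
from collections import defaultdict


def most_similar_items(cross: dict[tuple[str, str], float],
                       item_ids: list[str], how_many: int) -> list[tuple[str, float]]:
    """Rank purchase items by summed cross scores against the given viewed items."""
    wanted = set(item_ids)
    totals: dict[str, float] = defaultdict(float)
    for (purchased, viewed), score in cross.items():
        if viewed in wanted:
            totals[purchased] += score
    # Highest total first; ties broken by ascending item id for determinism.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:how_many]


cross = {("p1", "v1"): 2.0, ("p2", "v1"): 1.0, ("p2", "v2"): 3.0, ("p3", "v3"): 5.0}
print(most_similar_items(cross, ["v1", "v2"], 2))  # prints [('p2', 4.0), ('p1', 2.0)]
```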

Re: cross recommender

2013-04-04 Pat Ferrel
o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity as similarity measure. On 02.04.2013 23:43, Pat Ferrel wrote: Taking an idea from Ted, I'm working on a cross recommender starting from mahout's m/r implementation of an item-based recommender. We have purchases and views for items by user. It is straightforward

Re: cross recommender

2013-04-03 Ted Dunning
On 02.04.2013 23:43, Pat Ferrel wrote: Taking an idea from Ted, I'm working on a cross recommender starting from mahout's m/r implementation of an item-based recommender. We have purchases and views for items by user. It is straightforward to create a recommender on purchases but using views

Re: cross recommender

2013-04-03 Sebastian Schelter
RowSimilarityJob on B'A, I think you would need an equivalent of RowSimilarityJob to compute B'A. I guess you could extend the MatrixMultiplicationJob to use the similarity measures from RowSimilarityJob instead of standard dot products. I really like the idea of such a cross recommender. On 03.04.2013 08
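Swapping the dot product for a similarity measure could look like the sketch below, which uses the log-likelihood ratio (one of the measures RowSimilarityJob offers) on a 2x2 contingency table of users for one (purchased item p, viewed item v) pair: k11 = users who purchased p and viewed v, k12 = purchased p but did not view v, k21 = viewed v but did not purchase p, k22 = neither. With N users, n_p purchasers of p and n_v viewers of v, that is k12 = n_p - k11, k21 = n_v - k11, k22 = N - n_p - n_v + k11, so k11 is exactly the B'A co-occurrence count. The function names are illustrative, not Mahout's.

```python
import math


def _x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def _entropy(*counts: int) -> float:
    # Unnormalized Shannon entropy: N log N - sum(k log k).
    return _x_log_x(sum(counts)) - sum(_x_log_x(k) for k in counts)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 contingency table of user counts."""
    row = _entropy(k11 + k12, k21 + k22)
    col = _entropy(k11 + k21, k12 + k22)
    mat = _entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 2.0 * (row + col - mat))


print(log_likelihood_ratio(10, 10, 10, 10))  # independent actions: score is 0
```

Unlike a raw dot product, this score stays near zero when purchases of p and views of v are independent, so very popular items do not dominate the cross matrix.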

Re: cross recommender

2013-04-03 Pat Ferrel
the MatrixMultiplicationJob to use the similarity measures from RowSimilarityJob instead of standard dot products. I really like the idea of such a cross recommender. On 03.04.2013 08:33, Ted Dunning wrote: Sebastian, What about the assumption that the matrix is symmetric? A'A is symmetric, but B'A

cross recommender

2013-04-02 Pat Ferrel
Taking an idea from Ted, I'm working on a cross recommender starting from mahout's m/r implementation of an item-based recommender. We have purchases and views for items by user. It is straightforward to create a recommender on purchases but using views as a predictor of purchases does not work

Re: cross recommender

2013-04-02 Sebastian Schelter
o.a.m.math.hadoop.similarity.cooccurrence.measures.CooccurrenceCountSimilarity as similarity measure. On 02.04.2013 23:43, Pat Ferrel wrote: Taking an idea from Ted, I'm working on a cross recommender starting from mahout's m/r implementation of an item-based recommender. We have purchases and views

[B'A] h_v cross recommender

2013-03-19 Pat Ferrel
To pick up an old thread… A = views, items x users; B = purchases, items x users. A cross recommender: B'A h_v + B'B h_p = r_p. The B'B h_p term is the basic boolean Mahout recommender trained on purchases, and we'll use that implementation, I assume. B'A gives cooccurrences of views and purchases
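The scoring formula r_p = B'A h_v + B'B h_p can be checked on toy numbers. This sketch assumes dense boolean lists with users as rows and items as columns, so that B'A and B'B come out item-by-item and h_v, h_p are one user's view and purchase history vectors over items; the snippet labels A and B "items x users", so this orientation is an assumption. The toy data are made up.

```python
def transpose_times(x: list[list[int]], y: list[list[int]]) -> list[list[int]]:
    """Return X'Y for row-major matrices that share the same rows (users)."""
    n_x, n_y = len(x[0]), len(y[0])
    out = [[0] * n_y for _ in range(n_x)]
    for row_x, row_y in zip(x, y):
        # Each user contributes the outer product of its two rows.
        for i, xi in enumerate(row_x):
            if xi:
                for j, yj in enumerate(row_y):
                    out[i][j] += xi * yj
    return out


def mat_vec(m: list[list[int]], v: list[int]) -> list[int]:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def score_purchases(a, b, h_v, h_p):
    """r_p = B'A h_v + B'B h_p for one user's view and purchase histories."""
    cross = mat_vec(transpose_times(b, a), h_v)   # B'A h_v
    direct = mat_vec(transpose_times(b, b), h_p)  # B'B h_p
    return [c + d for c, d in zip(cross, direct)]


# Toy data (made up): 3 users x 3 items, 1 = action observed.
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # views
B = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # purchases
print(score_purchases(A, B, h_v=[0, 1, 0], h_p=[0, 0, 1]))  # prints [1, 1, 1]
```

In practice the raw co-occurrence matrices would usually be replaced by sparsified similarity scores before multiplying by the history vectors; the dense form is only for checking the algebra.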

Re: [B'A] h_v cross recommender

2013-03-19 Pat Ferrel
Ferrel pat.fer...@gmail.com wrote: To pick up an old thread… A = views, items x users; B = purchases, items x users. A cross recommender: B'A h_v + B'B h_p = r_p. The B'B h_p term is the basic boolean Mahout recommender trained on purchases, and we'll use that implementation, I assume. B'A gives