Re: need help on mahout

2012-11-09 Thread Pat Ferrel
The confusion here may be over the term "supervised". Supervised classification assumes you know which group each user is in, and the classifier builds a model to classify new users into the predefined groups. Usually there is a classifier for each group that, when given a user vector, return h

How to interpret recommendation strength

2012-11-15 Thread Pat Ferrel
Using a boolean data model and log likelihood similarity I get recommendations with strengths. If I were using preference rating magnitudes, the recommendation strength would be interpreted as the likely magnitude that a user would rate the recommendation. Using the boolean model I get values approach

Re: How to interpret recommendation strength

2012-11-15 Thread Pat Ferrel
* however since it means results can't be ranked by preference value (all are 1). So instead this returns a * sum of similarities to any other user in the neighborhood who has also rated the item. */ On Nov 15, 2012, at 9:59 AM, Pat Ferrel wrote: Using a boolean data model and log l

Re: How to interpret recommendation strength

2012-11-15 Thread Pat Ferrel
ighted by count -- which is to say, it's a sum of similarities. This isn't terribly principled but works reasonably in practice. A simple average tends to over-weight unpopular items, but there are likely better ways to account for that. On Thu, Nov 15, 2012 at 5:59 PM, Pat Ferrel wrote

Re: How to interpret recommendation strength

2012-11-15 Thread Pat Ferrel
similarity, weighted by count -- which is to say, it's a sum of similarities. This isn't terribly principled but works reasonably in practice. A simple average tends to over-weight unpopular items, but there are likely better ways to account for that. On Thu, Nov 15, 2012 at 5:59 PM,

Re: How to interpret recommendation strength

2012-11-15 Thread Pat Ferrel
Trying to catch up. Isn't the sum of similarities actually a globally comparable number for strength of preference in a boolean model? I was thinking it wasn't but it is really. It may not be ideal but as an ordinal it should work, right? Is the logic behind the IDF idea that very popular items
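
The "sum of similarities" strength discussed in this thread can be illustrated with a small sketch. This is not Mahout's code: the neighbor names, similarity values, and item sets below are made up for illustration, and the function names are hypothetical. It only shows why a sum and a simple average can rank the same candidates differently.

```python
# Hypothetical data: similarity of each neighbor to the target user,
# and the set of items each neighbor has a boolean preference for.
neighbor_sims = {"u2": 0.9, "u3": 0.5, "u4": 0.2}
neighbor_items = {"u2": {"ipad", "nexus"}, "u3": {"ipad"}, "u4": {"galaxy"}}


def sum_strength(item):
    """Sum of similarities over neighbors who have the item."""
    return sum(s for u, s in neighbor_sims.items() if item in neighbor_items[u])


def mean_strength(item):
    """Simple average of the same similarities, for comparison."""
    sims = [s for u, s in neighbor_sims.items() if item in neighbor_items[u]]
    return sum(sims) / len(sims) if sims else 0.0


candidates = ["ipad", "nexus", "galaxy"]
by_sum = sorted(candidates, key=sum_strength, reverse=True)
by_mean = sorted(candidates, key=mean_strength, reverse=True)
```

With these made-up numbers the sum ranks ipad first because two neighbors have it, while the simple average ranks nexus first even though only one neighbor has it. That is the over-weighting of unpopular items mentioned above.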

Recommender Evaluator

2012-12-03 Thread Pat Ferrel
I'm doing a very simple recommender based on binary data. Using GenericRecommenderIRStatsEvaluator I get nDCG = NaN for each user. My data is still very incomplete, which means an extremely low cooccurrence rate but there are some since otherwise I'd expect P and R to be 0 and they are not. For

Re: Recommender Evaluator

2012-12-03 Thread Pat Ferrel
will have to decide what NaN means. I am happy to change that -- but would not pay attention to these tests at this scale. On Mon, Dec 3, 2012 at 7:55 PM, Pat Ferrel wrote: > I'm doing a very simple recommender based on binary data. Using > GenericRecommenderIRStatsEvaluator I g

splitDataset

2012-12-05 Thread Pat Ferrel
does anyone know if mahout/examples/bin/factorize-movielens-1M.sh is still working? CLI version of splitDataset is crashing in my build (latest trunk). Even as in "mahout splitDataset" to get the params. Wouldn't be the first time I mucked up a build though.

Re: splitDataset crashes

2012-12-07 Thread Pat Ferrel
it complete correctly. Not exactly sure how this is supposed to be done, it doesn't look like the options get parsed in the super class automatically. This will cause any invocation of splitDataset or DatasetSplitter to crash running the current trunk. On Dec 5, 2012, at 1:58 PM, Pat Ferre

Parameter choice and tuning parallelALS

2013-01-02 Thread Pat Ferrel
What is the intuition regarding the choice or tuning of the ALS params? Job-Specific Options: --lambda lambda regularization parameter --implicitFeedback implicitFeedback

Re: is Hadoop based SVD_ALS a complete feature?

2013-01-17 Thread Pat Ferrel
+1 this, found the same problems, same fixes. Haven't seen your last problem. On Jan 11, 2013, at 1:41 PM, Ying Liao wrote: I am trying factorize-movielens-1M.sh. I first find a bug in the sh file. Then I find a bug in org.apache.mahout.cf.taste.hadoop.als.DatasetSplitter, the argMap is not mapped

Re: is Hadoop based SVD_ALS a complete feature?

2013-01-17 Thread Pat Ferrel
elter wrote: Which version/distribution of Hadoop are you using? On 17.01.2013 16:08, Pat Ferrel wrote: > +1 this, found the same problems, same fixes. Haven't seen your last problem > > On Jan 11, 2013, at 1:41 PM, Ying Liao wrote: > > I am trying factorize-movielens-1M.sh.

Re: (near) real time recommender/predictor

2013-02-02 Thread Pat Ferrel
RE: Temporal effects. In CF you are interested in similarities. For instance in a User-based CF recommender you want to detect users similar to a given user. The time decay of the similarities is likely to be very slow. In other words, if I bought an iPad 1 and you bought an iPad 1, the similarity

Re: (near) real time recommender/predictor

2013-02-02 Thread Pat Ferrel
mporal dynamics. On Sat, Feb 2, 2013 at 9:54 AM, Pat Ferrel wrote: > RE: Temporal effects. In CF you are interested in similarities. For > instance in a User-based CF recommender you want to detect users similar to > a given user. The time decay of the similarities is likely to be ve

Using IDF in CF recommender

2013-02-05 Thread Pat Ferrel
2013 at 1:03 PM, Pat Ferrel wrote: > Indeed, please elaborate. Not sure what you mean by "this is an important > effect" > > Do you disagree with what I said re temporal decay? > No. I agree with it. Human relatedness decays much more quickly than item popularity. I

Re: Using IDF in CF recommender

2013-02-06 Thread Pat Ferrel
: On Tue, Feb 5, 2013 at 11:29 AM, Pat Ferrel wrote: > I think you meant: "Human relatedness decays much slower than item > popularity." > Yes. Oops. > So to make sure I understand the implications of using IDF… For > boolean/implicit preferences the sum of all pref

Re: Using IDF in CF recommender

2013-02-06 Thread Pat Ferrel
The effect of downweighting the popular items is very similar to removing them from recommendations so I still suspect precision will go down using IDF. Obviously this can pretty easily be tested, I just wondered if anyone had already done it. This brings up a problem with holdout based precisi

Implicit preferences

2013-02-09 Thread Pat Ferrel
I'd like to experiment with using several types of implicit preference values with recommenders. I have purchases as an implicit pref of high strength. I'd like to see if add-to-cart, view-product-details, impressions-seen, etc. can increase offline precision in holdout tests. These less than ob

Re: Implicit preferences

2013-02-09 Thread Pat Ferrel
nt for the effect: you looked at certain items and eventually purchased one and I looked at the same items so I might like what you purchased. It also seems to work better in the existing mahout framework. On Feb 9, 2013, at 11:50 AM, Pat Ferrel wrote: I'd like to experiment with using s

Re: Implicit preferences

2013-02-12 Thread Pat Ferrel
together but not as strongly as ought to > be obvious from the fact that they're the same. Still, interesting thought. > > There ought to be some 'signal' in this data, just a question of how much > vs noise. A purchase means much more than a page view of course; it'

Shopping cart

2013-02-14 Thread Pat Ferrel
There are several methods for recommending things given a shopping cart contents. At the risk of using the same tool for every problem I was thinking about a recommender's use here. I'd do something like train on shopping cart purchases so row = cartID, column = itemID. Given cart contents I co

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
53 AM, Pat Ferrel wrote: > There are several methods for recommending things given a shopping cart > contents. At the risk of using the same tool for every problem I was > thinking about a recommender's use here. > > I'd do something like train on shopping cart purch

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
eas you've mentioned here. Given N items in a cart, which next item most frequently occurs in a purchased cart? On Thu, Feb 14, 2013 at 6:30 PM, Pat Ferrel wrote: > I thought you might say that but we don't have the add-to-cart action. We > have to calculate cart purchases by ma

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
own version of it. Yes you are computing similarity for k carted items by all N items, but is N so large? hundreds of thousands of products? this is still likely pretty fast even if the similarity is over millions of carts. Some smart precomputation and caching goes a long way too. On Thu, Feb 14

Re: Shopping cart

2013-02-14 Thread Pat Ferrel
2013, at 6:09 PM, Ted Dunning wrote: Do you see the contents of the cart? Is the cart ID opaque? Does it persist as a surrogate for a user? On Thu, Feb 14, 2013 at 10:30 AM, Pat Ferrel wrote: > I thought you might say that but we don't have the add-to-cart action. We > have t

Re: Problems with Mahout's RecommenderIRStatsEvaluator

2013-02-17 Thread Pat Ferrel
Time splits are fine but may contain anomalies that bias the data. If you are going to compare two recommenders based on time splits, make sure the data is exactly the same for each recommender. One time split we did to create a 90-10 training to test set had a split date of 12/24! Some form of

Cross recommendation

2013-02-21 Thread Pat Ferrel
be some 'signal' in this data, just a question of how much > vs noise. A purchase means much more than a page view of course; it's not > as subject to noise. Finding a means to use that info is probably going to > help. > > > > > On Sat, Feb 9, 2

Re: Cross recommendation

2013-02-22 Thread Pat Ferrel
My plan was to NOT use lucene to start with though I see the benefits. This is because I want to experiment with weighting--doing idf, no weighting, and with a non-log idf. Also I want to experiment with temporal decay of recommendability and maybe blend item similarity based results in certain c

Re: Cross recommendation

2013-02-23 Thread Pat Ferrel
combined item recommendation matrix which is roughly twice as much work as you need to do and it also doesn't let you adjust weightings separately. But it is probably the simplest way to get going with cross recommendation. On Fri, Feb 22, 2013 at 9:48 AM, Pat Ferrel wrote: > There

Re: Cross recommendation

2013-02-24 Thread Pat Ferrel
rm set of users to connect the items together. When you compute the cooccurrence matrix you get A_1' A_1 + A_2' A_2 which gives you recommendations from 1=>1 and from 2=>2, but no recommendations 1=>2 or 2=>1. Thus, no cross recommendations. On Sat, Feb 23, 2013 at 10

[B'A] h_v cross recommender

2013-03-19 Thread Pat Ferrel
To pick up an old thread… A = views (items x users), B = purchases (items x users). A cross recommender: B'A h_v + B'B h_p = r_p. The B'B h_p is the basic boolean mahout recommender trained on purchases and we'll use that implementation I assume. B'A gives cooccurrences of views and purchases multiplyi
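
As a rough sketch of the formula in this post, the snippet below computes r_p = B'A h_v + B'B h_p on tiny made-up matrices. It assumes rows are users and columns are items, so that B'A and B'B both come out item-by-item. That layout, the data, and the helper names are assumptions for illustration, not the Mahout implementation.

```python
def transpose(m):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def matmul(x, y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def matvec(m, v):
    """Matrix times column vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


# Assumed layout: one row per user, one column per item (3 users, 3 items).
A = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]  # views (made-up data)
B = [[1, 0, 0], [0, 1, 0], [1, 0, 1]]  # purchases (made-up data)

BtA = matmul(transpose(B), A)  # view/purchase cooccurrence, not symmetric
BtB = matmul(transpose(B), B)  # purchase/purchase cooccurrence, symmetric

h_v = [0, 1, 0]  # the user's view history (viewed item 1)
h_p = [0, 0, 1]  # the user's purchase history (purchased item 2)

# r_p = B'A h_v + B'B h_p
r_p = [x + y for x, y in zip(matvec(BtA, h_v), matvec(BtB, h_p))]
```

Here BtB equals its own transpose while BtA does not, which is why it matters whether rows or columns of [B'A] are used.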

Re: [B'A] h_v cross recommender

2013-03-19 Thread Pat Ferrel
rong since view similarity unfiltered by purchase is not ideal) or the cooccurrences in [B'A] and since this is not symmetric it will matter whether I look at columns or rows. Either correspond to item ids but similarities will be different. Has anyone tried this sort of thing? On Mar 19, 2

cross recommender

2013-04-02 Thread Pat Ferrel
Taking an idea from Ted, I'm working on a cross recommender starting from mahout's m/r implementation of an item-based recommender. We have purchases and views for items by user. It is straightforward to create a recommender on purchases but using views as a predictor of purchases does not work

Re: cross recommender

2013-04-03 Thread Pat Ferrel
to each row of the >> input matrix. You can think of it as computing A'A and sparsifying the >> result afterwards. Furthermore it allows to plug in a similarity measure >> of your choice. >> >> If you want to have a cooccurrence matrix, you can use >> >

Re: cross recommender

2013-04-04 Thread Pat Ferrel
ed it. I will need to pass in the size of the matrices as the size of the user and item space, Correct? On Apr 3, 2013, at 9:15 AM, Pat Ferrel wrote: The non-symmetry of the [B'A] and the fact that it is calculated from two models leads me to a rather heavy handed approach at least for a

Re: cross recommender

2013-04-06 Thread Pat Ferrel
I guess I don't understand this issue. In my case both the item ids and user ids of the separate DistributedRow Matrix will match and I know the size for the entire space from a previous step where I create id maps. I suppose you are saying that the m/r code would be super simple if a row of B'

Re: cross recommender

2013-04-06 Thread Pat Ferrel
I need to do the equivalent of the xrecommender.mostSimilarItems(long[] itemIDs, int howMany) To oversimplify this, in the standard Item-Based Recommender this is equivalent to looking at the item similarities from the preference matrix (similarity of item purchases by user). In the xrecommen

Re: cross recommender

2013-04-10 Thread Pat Ferrel
like views and purchases? On Apr 8, 2013, at 2:31 PM, Ted Dunning wrote: On Sat, Apr 6, 2013 at 3:26 PM, Pat Ferrel wrote: > I guess I don't understand this issue. > > In my case both the item ids and user ids of the separate DistributedRow > Matrix will match and I know th

Re: cross recommender

2013-04-10 Thread Pat Ferrel
to use Wikipedia articles (Myrrix, GraphLab). Another idea is to use StackOverflow tags (Myrrix examples). Although they are only good for emulating implicit feedback. On Wed, Apr 10, 2013 at 6:48 PM, Ted Dunning wrote: > On Wed, Apr 10, 2013 at 10:38 AM, Pat Ferrel > wrote: > >

Re: cross recommender

2013-04-11 Thread Pat Ferrel
Getting this running with co-occurrence rather than using a similarity calc on user rows finally forced me to understand what is going on in the base recommender. And the answer implies further work. [B'B] is usually not calculated in the usual item based recommender. The matrix that comes out

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Pat Ferrel
Or you may want to look at recording purchases by user ID. Then use the standard recommender to train on (userID, itemsID, boolean). Then query the trained recommender thus: recommender.mostSimilarItems(long itemID, int howMany) This does what you want but uses more data than just what items wer

Re: Is Mahout the right tool to recommend cross sales?

2013-04-11 Thread Pat Ferrel
Do you not have a user ID? No matter (though if you do I'd use it) you can use the item ID as a surrogate for a user ID in the recommender. And there will be no filtering if you ask for recommender.mostSimilarItems(long itemID, int howMany), which has no user ID in the call and so will not filte

Re: cross recommender

2013-04-12 Thread Pat Ferrel
That looks like the best shortcut. It is one of the few places where the rows of one and the columns of the other are seen together. Now I know why you transpose the first input :-) But, I have begun to wonder whether it is the right thing to do for a cross recommender because you are comparing

Re: cross recommender

2013-04-15 Thread Pat Ferrel
esource. > > Robin > > > On 4/10/13 8:37 PM, "Pat Ferrel" wrote: > >> I have retail data but can't publish results from it. If I could get a >> public sample I'd share how the technique worked out. >> >> Not sure how to simulate

Re: cross recommender

2013-04-16 Thread Pat Ferrel
om/api-profiles/products-api http://www.kaggle.com/c/acm-sf-chapter-hackathon-big/data On Mon, Apr 15, 2013 at 2:03 PM, Pat Ferrel wrote: > MAJOR may be too tame a word. > > Furthermore there are several enhancements the community could make to > support retail data and retail recommen

Re: cross recommender

2013-04-16 Thread Pat Ferrel
k to > view. > > > On Tue, Apr 16, 2013 at 4:53 PM, Pat Ferrel wrote: > >> For the cross-recommender we need some replacement for a primary >> action--purchases and a secondary action--views, clicks, impressions, >> something. >> >> To use this da

Re: cross recommender

2013-04-16 Thread Pat Ferrel
u can infer the search from the data, just not all search results. On Apr 16, 2013, at 1:24 PM, Pat Ferrel wrote: I think Ted is talking about a different application of this idea: http://www.slideshare.net/tdunning/search-as-recommendation The IDs in my case must be in the same space, at very

Re: Clustering product views and sales

2013-05-07 Thread Pat Ferrel
You always will have a "cold start" problem for a subset of users--the new ones to a site. Popularity doesn't always work either. Sometimes you have a flat purchase frequency distribution, as I've seen. In these cases a metadata or content based recommender is nice to fill in. If you have no met

More Cross-recommender thoughts

2013-05-17 Thread Pat Ferrel
I'm doing an experiment creating a recommender from a Pinterest crawl I have going. I have at least three actions that relate to recommendations: Goal: recommend people you (a pinterest user) might want to follow Actions mined by crawling: follows (user, user) followed by (user, user) repinned

Re: Which database should I use with Mahout

2013-05-19 Thread Pat Ferrel
Using a Hadoop version of a Mahout recommender will create some number of recs for all users as its output. Sean is talking about Myrrix I think which uses factorization to get much smaller models and so can calculate the recs at runtime for fairly large user sets. However if you are using Maho

Re: Which database should I use with Mahout

2013-05-19 Thread Pat Ferrel
On May 19, 2013 6:27 PM, "Pat Ferrel" wrote: > Using a Hadoop version of a Mahout recommender will create some number of > recs for all users as its output. Sean is talking about Myrrix I think > which uses factorization to get much smaller models and so can calculate > the

Re: Which database should I use with Mahout

2013-05-19 Thread Pat Ferrel
no user data in the matrix. Or are you talking about using the user history as the query? in which case you have to remember somewhere all users' history and look it up for the query, no? On May 19, 2013, at 8:09 PM, Ted Dunning wrote: On Sun, May 19, 2013 at 8:04 PM, Pat Ferrel wrote:

Re: Which database should I use with Mahout

2013-05-20 Thread Pat Ferrel
I certainly have questions about this architecture mentioned below but first let me make sure I understand. You use the user history vector as a query? This will be a list of item IDs and strength-of-preference values (maybe 1s for purchases). The cooccurrence matrix has columns treated like t

Re: Which database should I use with Mahout

2013-05-21 Thread Pat Ferrel
In the interest of getting some empirical data out about various architectures: On Mon, May 20, 2013 at 9:46 AM, Pat Ferrel wrote: >> ... >> You use the user history vector as a query? > > The most recent suffix of the history vector. How much is used varies by > the

Re: Which database should I use with Mahout

2013-05-22 Thread Pat Ferrel
This data was for a mobile shopping app. Other answers below. > On May 21, 2013, at 5:42 PM, Ted Dunning wrote: > > Inline > > > On Tue, May 21, 2013 at 8:59 AM, Pat Ferrel wrote: > >> In the interest of getting some empirical data out about various >> arc

Re: Interpreting Cluster Dump Metrics

2013-05-24 Thread Pat Ferrel
e averages for all clusters? I don't think I've heard of this before. Seems interesting is there a paper? On May 21, 2013, at 9:53 PM, Ted Dunning wrote: On Tue, May 21, 2013 at 8:47 PM, Pat Ferrel wrote: > For this sample it looks like about 20-40 clusters is "best"? Loo

Re: Interpreting Cluster Dump Metrics

2013-05-25 Thread Pat Ferrel
proportional to log-likelihood (with an offset) for the mixture of Gaussian model that underlies k-means clustering. See this paper for a use of mean squared distance to nearest cluster. On Fri, May 24, 2013 at 9:46 AM, Pat Ferrel wrote: > I'm trying to automate something like a hier
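
The "mean squared distance to nearest cluster" metric mentioned in this reply can be sketched as follows. The paper referred to is not named in this excerpt, so this is only a generic illustration with made-up points and centroids. The function name is hypothetical.

```python
def mean_squared_distance(points, centroids):
    """Average, over all points, of the squared Euclidean distance
    from each point to its nearest centroid."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids)
    return total / len(points)


# Made-up data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]

one_cluster = mean_squared_distance(points, [(5.0, 1.0)])
two_clusters = mean_squared_distance(points, [(0.0, 1.0), (10.0, 1.0)])
```

Comparing the value across different numbers of clusters (here one versus two) is one way to look for a "best" k. The drop from one to two clusters is large on this toy data because the points form two obvious groups.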

Re: Blending initial recommendations for cross recommendation

2013-05-31 Thread Pat Ferrel
I've got a cross-recommender too. It was originally conceived to do a multi-action ensemble from Ted's notes. I'm now gathering a new data set and building the meta-model learner. Even with the same scale you need to learn the weighting factors. Consider a simple ensemble case: R_p is the matr

Re: Blending initial recommendations for cross recommendation

2013-06-01 Thread Pat Ferrel
u really perform gradient descent learning of the weights using hadoop/mahout? Isn't this too costly to perform due to the overheads of the JVM and hadoop? On Jun 1, 2013, at 1:21 AM, Pat Ferrel wrote: > I've got a cross-recommender too. It was originally conceived to do a

Controlling output locations

2013-06-04 Thread Pat Ferrel
Am I losing my mind or did the --outputPath option get removed from the MatrixMultiplicationJob recently? It looks like it is now in 'productWith-xxx' so I'll have to search for the most recent dir of that name? And why isn't there a --outputPath option to transpose? I have to search for the mo

Re: Blending initial recommendations for cross recommendation

2013-06-04 Thread Pat Ferrel
and, of course, eventually A/B test it. You don't always have time associated with actions. In the data I'm mining from Pinterest, for example, the date one user followed another user is not available. So there is no reasonable way to do truncation. Maybe Pinterest could do better.

Re: Blending initial recommendations for cross recommendation

2013-06-04 Thread Pat Ferrel
cky and will require some sort of grid search for good parameters (which might be sped up by using an evolutionary algorithm picking the best intermediate solutions). Since, most of what I wrote above about evaluation is still in the planning stage, any suggestions are welcome! On Jun 4, 2013, at

Negative Preferences in a Recommender

2013-06-17 Thread Pat Ferrel
In the case where you know a user did not like an item, how should the information be treated in a recommender? Normally for retail recommendations you have an implicit 1 for a purchase and no value otherwise. But what if you knew the user did not like an item? Maybe you have records of "I want

Re: Negative Preferences in a Recommender

2013-06-18 Thread Pat Ferrel
lidation search, which is initially quite expensive (even for >> distributed machine cluster tech), but could be incrementally bail out > much >> sooner after previous good guess is already known. >> >> MR doesn't work well for this though since it requires A LOT of >

Re: Negative Preferences in a Recommender

2013-06-18 Thread Pat Ferrel
distributed machine cluster tech), but could be incrementally bail out much sooner after previous good guess is already known. MR doesn't work well for this though since it requires A LOT of iterations. On Mon, Jun 17, 2013 at 5:51 PM, Pat Ferrel wrote: > In the case where you know a user did

Re: Preserve contents of keys after running k-means

2013-07-05 Thread Pat Ferrel
I think https://issues.apache.org/jira/browse/MAHOUT-1030 may be the wrong issue #. The problem is that the Names from NamedVectorWritable are not used in the cluster map after kmeans. You need to maintain your own map of your vector name to internal Mahout id ints. NamedVectors work all the w

Re: Preserve contents of keys after running k-means

2013-07-06 Thread Pat Ferrel
ther processing. On Jul 5, 2013, at 10:28 PM, Andrew Musselman wrote: I want to have the core feature of k-means which is to find out which vectors landed in what cluster, and I'm open to discussion beyond that. Best Andrew On Jul 5, 2013, at 5:43 PM, Pat Ferrel wrote: > I think ht

Re: Setting up a recommender

2013-07-21 Thread Pat Ferrel
Read the paper, and the preso. As to the 'offline to Solr' part. It sounds like you are suggesting an item item similarity matrix be stored and indexed in Solr. One would have to create the action matrix from user profile data (preference history), do a rowsimilarity job on it (using LLR similar

Re: Setting up a recommender

2013-07-21 Thread Pat Ferrel
no reason to not use a real data set. There is a strong reason to use a synthetic dataset, however, in that it can be trivially scaled up and down both in items and users. Also, the synthetic dataset doesn't require that the real data be found and downloaded. On Sun, Jul 21, 2013 at 2:17 PM

Re: Setting up a recommender

2013-07-22 Thread Pat Ferrel
can handle two kinds of input, then > RowSimilarity can be easily modified to be CrossRowSimilarity. Likewise, > if we have two DRM's with the same row id's in the same order, we can do a > map-side merge. Such a merge can be very efficient on a system like MapR > where

Re: Setting up a recommender

2013-07-22 Thread Pat Ferrel
inline BTW if there is an LLR cross-similarity job (replacing [B'A]) it is easy to integrate. On Jul 22, 2013, at 12:09 PM, Ted Dunning wrote: On Mon, Jul 22, 2013 at 9:20 AM, Pat Ferrel wrote: > +10 > > Love the academics but I agree with this. Recently saw a VP from Netfli

Re: Setting up a recommender

2013-07-23 Thread Pat Ferrel
lans for the next couple weeks as it happens anyway. Let me know if you want me to proceed. On Jul 22, 2013, at 3:42 PM, Ted Dunning wrote: On Mon, Jul 22, 2013 at 12:40 PM, Pat Ferrel wrote: > Yes. And the combined recommender would query on both at the same time. > > Pat-- does

Re: Setting up a recommender

2013-07-23 Thread Pat Ferrel
w. On Jul 23, 2013, at 10:37 AM, Ted Dunning wrote: This sounds great. Go for it. Put a comment on the design doc with a pointer to text that I should import. On Tue, Jul 23, 2013 at 9:39 AM, Pat Ferrel wrote: I can supply: 1) a Maven based project in a public github repo as a baseline

Re: Setting up a recommender

2013-07-23 Thread Pat Ferrel
arity rank is not something we want to lose so unless someone has a better idea I'll just order the IDs in the fields and call it good for now. On Jul 23, 2013, at 12:03 PM, Pat Ferrel wrote: Will do. For what it's worth… The project I'm working on is an online recommender

Re: Setting up a recommender

2013-07-24 Thread Pat Ferrel
looks like similarity and TFIDF are pluggable in Solr and seem pretty easy to change. Planning to use cosine for the first cut since it's default. On Jul 24, 2013, at 4:10 AM, Michael Sokolov wrote: On 7/23/13 7:26 PM, Pat Ferrel wrote: > Honestly not trying to make this more co

Re: Setting up a recommender

2013-07-24 Thread Pat Ferrel
"elegant" or "home-style" might be good indicators for different restaurants even if those terms don't appear in a restaurant description. Sent from my iPhone On Jul 23, 2013, at 18:26, Pat Ferrel wrote: > Honestly not trying to make this more complicated but… >

Re: Setting up a recommender

2013-07-27 Thread Pat Ferrel
On Jul 24, 2013, at 8:32 PM, Pat Ferrel wrote: Understood, catalog categories, tags, etc will make good metadata to be included in the query and putting in separate fields allows us to separately boost each in the query. UserIDs that have interacted with the item is an interesting idea. Howe

Re: Setting up a recommender

2013-07-30 Thread Pat Ferrel
Well it's a work in progress but you can see it here: https://github.com/pferrel/solr-recommender There is no Solr integration yet, it is just ingest, create id indexes, run RecommenderJob, and XRecommenderJob. These create the item similarity matrices, which will be put into Solr. They also cre

Re: Setting up a recommender

2013-07-30 Thread Pat Ferrel
ter this week. On 23.07.2013 19:38, Ted Dunning wrote: > On Tue, Jul 23, 2013 at 9:39 AM, Pat Ferrel wrote: > >> This pipeline lacks downsampling since I had to replace >> PreparePreferenceMatrixJob and potentially LLR for [B'A]. I assume >> Sebastian is the person to

Re: Setting up a recommender

2013-07-30 Thread Pat Ferrel
In the cross-recommender the similarity matrix is calculated doing [B'A]. We want the rows to be stored as the item-item similarities in Solr, right? [B'B] is symmetric so just want to make sure I have it straight for [B'A]. B = purchases iphone ipad nexus galaxy surface u1 1

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
A few architectural questions: http://bit.ly/18vbbaT I created a local instance of the LucidWorks Search on my dev machine. I can quite easily save the similarity vectors from the DRMs into docs at special locations and index them with LucidWorks. But to ingest the docs and put them in separate

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
Jul 31, 2013, at 11:20 AM, Pat Ferrel wrote: A few architectural questions: http://bit.ly/18vbbaT I created a local instance of the LucidWorks Search on my dev machine. I can quite easily save the similarity vectors from the DRMs into docs at special locations and index them with LucidWorks. But

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
OK and yes. The docs will look like: ipad iphone iphone nexus iphone ipad ipad galaxy On Jul 31, 2013, at 11:42 AM, B Lyon wrote: I'm interested in helping as well. Btw I thought that what was stored in the solr fields were the

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
isioned in the design > doc, although I could be wrong on this. Anyway I'm pretty open to helping > wherever needed. > > Thanks, > Andrew > > > > > > On 7/31/13 12:20 PM, "Pat Ferrel" wrote: > >> A few architectural questions: http://

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
I'd vote for csv then. On Jul 31, 2013, at 12:00 PM, Ted Dunning wrote: On Wed, Jul 31, 2013 at 11:20 AM, Pat Ferrel wrote: A few architectural questions: http://bit.ly/18vbbaT I created a local instance of the LucidWorks Search on my dev machine. I can quite easily save the simil

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
to be retrieved. Better to have the tags for the single doc on all the related docs so that a single retrieval will pull them all in with their details. On Wed, Jul 31, 2013 at 11:51 AM, Pat Ferrel wrote: > OK and yes. The docs will look like: > > > > ipad >

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
oops, mistyped… If the LLR created DRM has a row: Key: 1, Value { 0:1.0,} where 0 -> iphone and 1 -> ipad then wouldn't the doc look like ipad iphone On Jul 31, 2013, at 12:14 PM, Pat Ferrel wrote: Sorry not sure what you are saying. If the LLR created DRM has a r
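
The mapping described here, from a row of the similarity DRM to a document, might look roughly like the sketch below. The `similar_items` field name appears elsewhere in this thread; the `id` field, the function name, and the ordering by strength are assumptions for illustration only, not the solr-recommender code.

```python
def rows_to_docs(drm_rows, id_to_name):
    """Turn {row key: {column index: strength}} into one doc per row.

    The doc id is the external name of the row's item, and the similar
    items are written as a space-separated field, strongest first.
    """
    docs = []
    for key, vector in sorted(drm_rows.items()):
        ranked = sorted(vector.items(), key=lambda kv: kv[1], reverse=True)
        docs.append({
            "id": id_to_name[key],  # assumed field name
            "similar_items": " ".join(id_to_name[i] for i, _ in ranked),
        })
    return docs


# The example from the post: Key: 1, Value {0:1.0}, 0 -> iphone, 1 -> ipad.
id_to_name = {0: "iphone", 1: "ipad"}
drm_rows = {1: {0: 1.0}}

docs = rows_to_docs(drm_rows, id_to_name)
```

Under these assumptions the single row becomes a doc for ipad whose similar-items field contains iphone, matching the reading in the post.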

Re: Setting up a recommender

2013-07-31 Thread Pat Ferrel
o find a system--free tier AWS, Ted's box, etc. Then install all the needed stuff. I'll get the output working to csv. On Jul 31, 2013, at 11:51 AM, Pat Ferrel wrote: OK and yes. The docs will look like: ipad iphone iphone nexus iphone

Re: Setting up a recommender

2013-08-01 Thread Pat Ferrel
larities there is no need to do more than fetch one doc that contains the similarities, right? I've successfully used this method with the Mahout recommender but please correct me if something above is wrong. On Jul 31, 2013, at 4:52 PM, Ted Dunning wrote: Pat, See inline O

Re: Setting up a recommender

2013-08-01 Thread Pat Ferrel
cross-validation tests. On Aug 1, 2013, at 9:49 AM, Ted Dunning wrote: On Thu, Aug 1, 2013 at 8:46 AM, Pat Ferrel wrote: > > For item similarities there is no need to do more than fetch one doc that > contains the similarities, right? I've successfully used this method with

Re: Setting up a recommender

2013-08-01 Thread Pat Ferrel
agreed to store the rows there too because they were from Bs items. This was the discussion about having different items for cross actions. The excerpt below is Ted responding to my question. So do we want the columns of [B'A]? It's only a transpose away. > On Tue, Jul 30, 2013 at

Re: Setting up a recommender

2013-08-02 Thread Pat Ferrel
o maybe someone else can check this reasoning. Have a look at the data here https://github.com/pferrel/solr-recommender/blob/master/src/test/resources/Recommender%20Math.xlsx On Aug 1, 2013, at 6:00 PM, Pat Ferrel wrote: Yes, storing the similar_items in a field, cross_action_similar_items in a

Re: Setting up a recommender

2013-08-02 Thread Pat Ferrel
that, the rows do. Going from rows to columns is the trivial addition of a transpose so I'm going to go ahead with rows for now. This affects the cross_action_similar_items and so only the cross-recommender part of the whole. On Aug 2, 2013, at 8:00 AM, Pat Ferrel wrote: I put so

Re: Setting up a recommender

2013-08-02 Thread Pat Ferrel
ns of the google matrix ( https://googledrive.com/host/0B2GQktu-wcTiaWw5OFVqT1k3bDA/). There are lots of other different pieces here of course, but show connections soup-to-nuts as much as possible. On Friday, August 2, 2013, Pat Ferrel wrote: > I put some thought into this (actually I sle

Re: Setting up a recommender

2013-08-02 Thread Pat Ferrel
We doing a hangout at 2 on the Solr recommender?

Re: Setting up a recommender

2013-08-02 Thread Pat Ferrel
Assuming Ted needs to call it, not sure if an invite has gone out, I haven't seen one. On Aug 2, 2013, at 12:49 PM, B Lyon wrote: I am planning on sitting in as flaky connection allows. On Aug 2, 2013 3:21 PM, "Pat Ferrel" wrote: > We doing a hangout at 2 on the Solr recommender? >

Re: Setting up a recommender

2013-08-02 Thread Pat Ferrel
sed on composite behavior composed of h_a and h_b query is [b-a-links: h_a b-b-links: h_b] Does this make sense by being more explicit? Now, it is pretty clear that we could have an index of A objects as well but the link fields would have to be a-a-links and a-b-links, of course. On

Re: Setting up a recommender

2013-08-02 Thread Pat Ferrel
Got away with that stupid comment. All doc ids will be from B items even in the general case. On Aug 2, 2013, at 2:39 PM, Pat Ferrel wrote: Thanks, well put. In order to have the ultimate impl with two id spaces for A and B would we have to create different docs for A'B and B'B?

Re: solr-recommender, recent changes to ToItemVectorsMapper

2013-08-04 Thread Pat Ferrel
I'll refresh my copy of the trunk and look into it. If this happens a lot I'll put my version of Mahout on github until it settles down. Had to copy the code for a couple Mahout classes like Recommender and ToItemsVectorReducer to get access to private statics, no substantive changes. I haven't
