I'm creating a matrix of cart IDs by item IDs, so cart x items-in-cart. The 'preference' is then (cartID, itemID). I think this will create the correct matrix.
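A minimal sketch of that matrix, assuming purchases arrive as (cartID, itemID) pairs. This is plain Python rather than Mahout, and every function name here is hypothetical. It stores the binary cart x item matrix sparsely as one item set per cart, counts item-item co-occurrences across carts, and scores candidate items for a partial cart by summing those counts, which is one possible way to "recommend to a collection of items". A brute-force nearest-cart lookup using Jaccard similarity (an assumed choice of measure) is included for comparison.

```python
from collections import defaultdict
from itertools import combinations


def build_carts(pairs):
    """Group (cart_id, item_id) pairs into {cart_id: set of item_ids}.

    This is the sparse form of the binary cart x item matrix.
    """
    carts = defaultdict(set)
    for cart_id, item_id in pairs:
        carts[cart_id].add(item_id)
    return dict(carts)


def cooccurrence(carts):
    """Count how often each pair of items appears in the same cart."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in carts.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend_for_items(cart_items, counts, top_n=5):
    """Rank items not yet in the cart by summed co-occurrence with it."""
    scores = defaultdict(int)
    for item in cart_items:
        for other, c in counts.get(item, {}).items():
            if other not in cart_items:
                scores[other] += c
    # Highest score first; ties broken by item id for a stable order.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


def most_similar_cart(cart_items, carts):
    """Brute-force search: N Jaccard comparisons against the training carts."""
    def jaccard(a, b):
        union = a | b
        return len(a & b) / len(union) if union else 0.0
    return max(carts, key=lambda cid: jaccard(cart_items, carts[cid]))


# Toy data, purely illustrative.
pairs = [
    ("c1", "milk"), ("c1", "bread"), ("c1", "eggs"),
    ("c2", "milk"), ("c2", "bread"),
    ("c3", "bread"), ("c3", "jam"),
    ("c4", "milk"), ("c4", "eggs"),
]
carts = build_carts(pairs)
counts = cooccurrence(carts)
print(recommend_for_items({"milk", "bread"}, counts))
print(most_similar_cart({"milk", "bread"}, carts))
```

The co-occurrence route only touches the rows for the k items in the current cart, so it avoids the N cart-to-cart comparisons that the nearest-cart lookup needs.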
For any cart ID I would get a ranked list of recommended items calculated from other carts. This seems like what is needed in a shopping cart (SC) recommender. So doing this should give a "recommend to this collection of items", right?

The only issue is finding the best cart to get the recs from. I would be doing a pairwise similarity comparison of N carts against the current cart contents, and the result would have to come back in a very short amount of time, on the order of the time to get recs for 3M users and 100K items. Not sure what N is yet, but the # of items is the same as in the purchase matrix. So finding the best cart to get recs for will be N similarity comparisons, worst case. Each cart is likely to have only a few items in it and I imagine this speeds up the similarity calc.

I guess I'll try it as described and optimize for speed if the precision is good compared to the apriori algo.

On Feb 14, 2013, at 10:57 AM, Sean Owen <sro...@gmail.com> wrote:

I don't think it's necessarily slow; this is how item-based recommenders work. The only thing stopping you from using Mahout directly is that I don't think there's an easy way to say "recommend to this collection of items". But that's what is happening inside when you recommend for a user. You can just roll your own version of it.

Yes, you are computing similarity for k carted items by all N items, but is N so large? Hundreds of thousands of products? This is still likely pretty fast even if the similarity is over millions of carts. Some smart precomputation and caching goes a long way too.

On Thu, Feb 14, 2013 at 7:10 PM, Pat Ferrel <pat.fer...@gmail.com> wrote:

> Yes, one time tested way to do this is the "apriori" algo which looks at
> frequent item sets and creates rules.
>
> I was looking for a shortcut using a recommender, which would be super
> easy to try. The rule builder is a little harder to implement but we can
> also test precision on that and compare the two.
>
> The recommender method below should be reasonable AFAICT except for the
> method(s) of retrieving recs, which seem likely to be slow.
>
> On Feb 14, 2013, at 9:45 AM, Sean Owen <sro...@gmail.com> wrote:
>
> This sounds like a job for frequent item set mining, which is kind of a
> special case of the ideas you've mentioned here. Given N items in a cart,
> which next item most frequently occurs in a purchased cart?
>
> On Thu, Feb 14, 2013 at 6:30 PM, Pat Ferrel <pat.fer...@gmail.com> wrote:
>
>> I thought you might say that but we don't have the add-to-cart action. We
>> have to calculate cart purchases by matching cart IDs or session IDs. So we
>> only have cart purchases with items.
>>
>> If we had the add-to-cart and the purchase we could use your cross-action
>> method for getting recs by training only on those two actions.
>>
>> Still without the add-to-cart the method below should work, right? The
>> main problem being finding a similar cart in the training set quickly. Are
>> there other problems?
>>
>> On Feb 14, 2013, at 9:19 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>> I think that this is an excellent use case for cross recommendation from
>> cart contents (items) to cart purchases (items). The cross aspect is that
>> the recommendation is from two different kinds of actions, not two kinds of
>> things. The first action is insertion into a cart and the second is
>> purchase of an item.
>>
>> On Thu, Feb 14, 2013 at 9:53 AM, Pat Ferrel <pat.fer...@gmail.com> wrote:
>>
>>> There are several methods for recommending things given a shopping cart
>>> contents. At the risk of using the same tool for every problem I was
>>> thinking about a recommender's use here.
>>>
>>> I'd do something like train on shopping cart purchases so row = cartID,
>>> column = itemID.
>>> Given cart contents I could find the most similar cart in the training set
>>> by using a similarity measure then get recs for this closest matched cart.
>>>
>>> The search for similar carts may be slow if I have to check for pairwise
>>> similarity so I could cluster and find the best cluster then search it for
>>> the best cart. I could create a decision tree on all trained carts and walk
>>> as far as I can down the tree to find the cart with the most cooccurrences.
>>> There may be other cooccurrence based methods in mahout??? With the id of
>>> the cart I can then get recs from the training set. I could also fold-in
>>> the new cart contents to the training set and ask for recs based on it
>>> (this seems like it would take a long time to compute). This last would
>>> also pollute the trained matrix with partial carts over time.
>>>
>>> This seems like another place where Lucene might help but are there other
>>> mahout methods to look at before diving into Lucene?