Haven’t read the whole thread but it sounds like you just need some simple start-here info…
To do collaborative filtering you need three fields: user-id, item-id, and an action/weight. For a minimal commerce CF cooccurrence recommender this is typically something like:

user-id, item-id, 1=purchase

To use Mahout you will have to translate the ids into positive integers. Treat these as keys into an item-id or user-id lookup. So the input into Mahout will be:

user-id-key, item-id-key, 1

You can do CF with anonymous user-ids, meaning an individual took the action but you don’t know who. However, to use this data you will need some way of tying an id to a real person. Using transaction ids as a proxy for user-ids will work in the training data, but once you want to make a recommendation you will have to know some real user history so the recommender can compare it with the transactions.

Then you calculate recommendations using some Mahout recommender. If you are using the Hadoop version, the output will be a row per user-id-key containing some number of recommended item-id-keys and their recommendation weights for sorting purposes. You then write your own retrieval code to get the recs for a given user-id-key, since they are all pre-calculated and stored in a Sequence File. If you are using the in-memory recommender you can ask for recs for a given user-id-key and get the list returned.

You can also use transaction data alone to make anonymous recommendations, but that is market basket analysis. In that case you have:

transaction-id-key, item-id-key, 1

Then at recommendation time you have a list of items in a single basket. There are several ways to get this to work, so I’ll stop here unless it’s what you need, in which case let us know.

On Jan 11, 2014, at 1:38 PM, Tim Smith <timsmit...@hotmail.com> wrote:

> Is it about how to arrange your data to use this computation? The references below might help with that.

Yes, I read and tried the recommendation examples from MIA and there is a mention of item to item similarity, but I am not sure what form the file should take.
The examples are along the lines of userid,itemid,value

In section 6.2 of MIA we are multiplying the Co-occur matrix X User preferences = Recommendations (top of page 97), so if I do not have preferences should I just default them all to the same value?

Taken together with your previous comments, is this how I should be preparing my data?

Raw Sample Data (format: Transaction|Item)

123|Sun Glasses
124|Sun Glasses
124|Sun Glass Case
125|Sun Glass Case
126|Sun Glasses
126|Glass Repair Kit
127|Glass Repair Kit

Are you suggesting that I just simply use (format: userid|item|value)

123|Sun Glasses|1
124|Sun Glasses|1
124|Sun Glass Case|1
125|Sun Glass Case|1
126|Sun Glasses|1
126|Glass Repair Kit|1
127|Glass Repair Kit|1

> Is it regarding the specifics of how you do the computation? I can help with that, but would need a pointer to the difficulty.

Not quite yet. I am working through the intuition first, I'll fight through the math once, if ever, the fog clears
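
To make the data preparation concrete, here is a rough sketch in plain Python (not Mahout itself) that takes the sample Transaction|Item rows, assigns positive integer keys, builds the "transaction-id-key, item-id-key, 1" triples, and counts item cooccurrences per basket. The helper names build_keys, cooccurrence and recommend are made up for illustration and are not part of any Mahout API; the scoring is a simple summed cooccurrence count, which is only one of several ways to do this.

```python
from collections import defaultdict
from itertools import combinations

# Sample rows from the thread, in Transaction|Item form.
RAW = """123|Sun Glasses
124|Sun Glasses
124|Sun Glass Case
125|Sun Glass Case
126|Sun Glasses
126|Glass Repair Kit
127|Glass Repair Kit"""


def build_keys(values):
    """Assign each distinct id a positive integer key, in first-seen order."""
    keys = {}
    for v in values:
        if v not in keys:
            keys[v] = len(keys) + 1
    return keys


rows = [line.split("|") for line in RAW.splitlines()]
txn_keys = build_keys(t for t, _ in rows)
item_keys = build_keys(i for _, i in rows)

# Triples in the "transaction-id-key, item-id-key, 1" form.
triples = [(txn_keys[t], item_keys[i], 1) for t, i in rows]


def cooccurrence(triples):
    """Count how often each ordered pair of distinct items shares a basket."""
    baskets = defaultdict(set)
    for txn, item, _ in triples:
        baskets[txn].add(item)
    counts = defaultdict(int)
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend(basket, counts):
    """Score items outside the basket by summed cooccurrence with basket items."""
    scores = defaultdict(int)
    for (a, b), n in counts.items():
        if a in basket and b not in basket:
            scores[b] += n
    # Highest score first; ties broken by item key.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


counts = cooccurrence(triples)
name_of = {key: name for name, key in item_keys.items()}
for key, score in recommend({item_keys["Sun Glasses"]}, counts):
    print(name_of[key], score)
```

With every value set to 1, multiplying the cooccurrence matrix by the preference vector reduces to adding up the cooccurrence counts for the items already in the basket, which is what recommend does here. The key lookups (item_keys, name_of) are what you would keep around to translate Mahout's integer output back into real item ids.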