Haven’t read the whole thread but it sounds like you just need some simple 
start-here info…

To do collaborative filtering you must have user-id, item-id, action/weight.

For a minimal commerce CF cooccurrence recommender this is typically something 
like: 
user-id, item-id, 1=purchase

To use Mahout you will have to translate the ids into positive integers. Treat 
these like keys to an item-id or user-id lookup. So input into Mahout will be:
user-id-key, item-id-key, 1
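
As a rough illustration of that key translation (plain Python, not Mahout code; the user ids and the build_key_index helper are made up for the example), one way to build the lookups and the input rows might be:

```python
def build_key_index(ids):
    """Assign each distinct id a positive integer key, in first-seen order."""
    index = {}
    for raw_id in ids:
        if raw_id not in index:
            index[raw_id] = len(index) + 1  # keys start at 1, so all are positive
    return index

# Hypothetical purchase records: (user-id, item-id)
purchases = [
    ("u-alice", "Sun Glasses"),
    ("u-bob", "Sun Glasses"),
    ("u-bob", "Sun Glass Case"),
]

user_keys = build_key_index(u for u, _ in purchases)
item_keys = build_key_index(i for _, i in purchases)

# Rows in the form: user-id-key, item-id-key, 1
rows = [f"{user_keys[u]},{item_keys[i]},1" for u, i in purchases]

# Reverse lookup, to turn recommended item-id-keys back into item-ids
item_by_key = {key: item for item, key in item_keys.items()}
```

Keep both lookups around after training, since the recommender only ever sees and returns the integer keys.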

You can do CF with anonymous user-ids, meaning an individual took the action 
but you don’t know who. However, to use this data you will have to have some way 
of tying an id to a real person. Using transaction ids as a proxy for user-ids 
will work in the training data, but once you want to make a recommendation you 
will have to know some real user history to allow the recommender to compare it 
with the transactions.

Then you calculate recommendations using some Mahout recommender. If you are 
using the Hadoop version the output will be a row per user-id-key that will 
contain some number of recommended item-id-keys and their recommendation 
weights for sorting purposes. You then write your own retrieval code to get the 
recs for a given user-id-key, since they are all pre-calculated and in a 
SequenceFile. If you are using the in-memory recommender you can ask for recs 
for a given user-id-key and get the list returned.

You can also use transaction data alone to make anonymous recommendations, but 
that is market basket analysis. In that case you have:
transaction-id-key, item-id-key, 1

Then at recommendation time you have a list of items in a single basket. There 
are several ways to get this to work so I’ll stop here unless it’s what you 
need, in which case let us know.
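
For intuition only, here is a small plain-Python sketch of one such way: count how often items share a transaction, then score candidates against the items in the current basket. It uses the sample data from this thread. The function names are made up, and this is not how Mahout implements it.

```python
from collections import defaultdict
from itertools import permutations

# Sample data from this thread, as (transaction-id, item) pairs.
rows = [
    (123, "Sun Glasses"),
    (124, "Sun Glasses"),
    (124, "Sun Glass Case"),
    (125, "Sun Glass Case"),
    (126, "Sun Glasses"),
    (126, "Glass Repair Kit"),
    (127, "Glass Repair Kit"),
]

def cooccurrence_counts(rows):
    """Count, for each ordered item pair, the transactions containing both."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    counts = defaultdict(lambda: defaultdict(int))
    for items in baskets.values():
        for a, b in permutations(sorted(items), 2):
            counts[a][b] += 1
    return counts

def recommend_for_basket(basket, counts, how_many=3):
    """Score items not in the basket by summed cooccurrence with basket items."""
    scores = defaultdict(int)
    for item in basket:
        for other, count in counts.get(item, {}).items():
            if other not in basket:
                scores[other] += count
    # Highest score first; ties broken alphabetically to keep output stable.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:how_many]

counts = cooccurrence_counts(rows)
recs = recommend_for_basket({"Sun Glasses"}, counts)
```

With data this small every score is a tie at 1, so the ordering says very little. Real data would usually also need some weighting so that very popular items do not dominate.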


On Jan 11, 2014, at 1:38 PM, Tim Smith <timsmit...@hotmail.com> wrote:

> Is it about how to arrange your data to use this computation?  The
> references below might help with that.

Yes, I read and tried the recommendation examples from MIA and there is a 
mention of item to item similarity, but I am not sure what form the file should 
take.  The examples are along the lines of  userid,itemid,value

In section 6.2 of MIA we are multiplying the Co-occur matrix X User 
preferences = Recommendations (top of page 97), so if I do not have preferences 
should I just default them all to the same value?  Taken together with your 
previous comments, is this how I should be preparing my data?

Raw Sample Data (format: Transaction|Item)
123|Sun Glasses
124|Sun Glasses
124|Sun Glass Case
125|Sun Glass Case
126|Sun Glasses
126|Glass Repair Kit
127|Glass Repair Kit

Are you suggesting that I simply use the following? (format: userid|item|value)
123|Sun Glasses|1
124|Sun Glasses|1
124|Sun Glass Case|1
125|Sun Glass Case|1
126|Sun Glasses|1
126|Glass Repair Kit|1
127|Glass Repair Kit|1

> Is it regarding the specifics of how you do the computation?  I can help
> with that, but would need a pointer to the difficulty.

Not quite yet.  I am working through the intuition first; I'll fight through 
the math once, if ever, the fog clears.
                                          
