You can also use the new MultithreadedBatchItemSimilarities class to efficiently precompute item similarities on a single machine without having to go to MapReduce.
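To make the request from further down the thread concrete (item #1 ordered together with #4 and #9, with the more frequent pairing ranked first), here is a minimal plain-Java sketch of precomputing "ordered together" counts on a single machine. It does not use Mahout at all: the class OrderCooccurrence and its methods are hypothetical names made up for this illustration, and using raw co-occurrence counts as the similarity is an assumption of the sketch, not what any Mahout similarity class computes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration, not part of Mahout.
public class OrderCooccurrence {

    // For every item, count how often each other item appears in the same order.
    static Map<Long, Map<Long, Integer>> countCooccurrences(List<Set<Long>> orders) {
        Map<Long, Map<Long, Integer>> counts = new HashMap<>();
        for (Set<Long> order : orders) {
            for (long a : order) {
                for (long b : order) {
                    if (a != b) {
                        counts.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Return up to howMany items most often ordered with itemId,
    // highest count first; ties are broken by the lower item ID.
    static List<Long> mostSimilarItems(Map<Long, Map<Long, Integer>> counts,
                                       long itemId, int howMany) {
        return counts.getOrDefault(itemId, Collections.<Long, Integer>emptyMap())
                .entrySet().stream()
                .sorted(Map.Entry.<Long, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<Long, Integer>comparingByKey()))
                .limit(howMany)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Toy data: item 1 is ordered with item 4 twice and with item 9 three times.
        List<Set<Long>> orders = Arrays.asList(
                new HashSet<>(Arrays.asList(1L, 4L)),
                new HashSet<>(Arrays.asList(1L, 4L)),
                new HashSet<>(Arrays.asList(1L, 9L)),
                new HashSet<>(Arrays.asList(1L, 9L)),
                new HashSet<>(Arrays.asList(1L, 9L)),
                new HashSet<>(Arrays.asList(4L, 9L)));
        Map<Long, Map<Long, Integer>> counts = countCooccurrences(orders);
        System.out.println(mostSimilarItems(counts, 1L, 2)); // prints [9, 4]
    }
}
```

Each order plays the role of a "user" here, which is the same substitution suggested in the thread. The nested loop is quadratic in the size of an order, so this is only reasonable for small baskets and modest catalogues.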

On 12.04.2013 00:54, Pat Ferrel wrote:
> Do you not have a user ID? No matter (though if you do I'd use it) you can
> use the item ID as a surrogate for a user ID in the recommender. And there
> will be no filtering if you ask for recommender.mostSimilarItems(long itemID,
> int howMany), which has no user ID in the call and so will not filter. Since
> the recommender doesn't know you are using item IDs for user IDs this should
> work fine.
>
> This allows you to use the in-memory version of the recommender as it is
> described in MiA. The Row and ItemSimilarityJobs are MapReduce and will
> produce results for all items in a batch. This is fine and will produce much
> the same results but you will have to code up the query part yourself as a
> runtime/live/service component. Using the in-memory recommender gives you a
> query interface to call whenever you are showing a page to the user.
>
> Using the user ID will allow you to make and blend in user based
> recommendations, which are calculated based on individual user history. They
> may not be your primary recommendations, but when you don't have enough item
> similarities, you can fall back or blend in user recommendations.
>
> On Apr 11, 2013, at 2:42 PM, Sean Owen <sro...@gmail.com> wrote:
>
> You can actually create a "user" #6 for your new order. Or you can use
> the "anonymous user" function of the library, although it's hacky.
>
> We may be mixing up terms here. "DataModel" is a class that has
> nothing to do with Hadoop. Hadoop in turn has no part in real-time
> anything, like recommending to a brand-new "user". However it could
> build an offline model of item-item similarities and you could do
> something like a most-similar-items computation for a given new basket
> of goods. That is effectively what the "anonymous user" function is
> doing anyway.
>
> You can precompute all recommendations for all items but that's a lot
> of work! It's easy to get away with it with a thousand items, but with
> a million this may be infeasibly slow.
>
> On Thu, Apr 11, 2013 at 10:38 PM, Billy <b...@ntlworld.com> wrote:
>> As in the example data 'intro.csv' in the MIA it has users 1-5 so if I ask
>> for recommendations for user 1 then this works but if I ask for
>> recommendations for user 6 (a new user yet to be added to the data model)
>> then I get no recommendations ... so if I substitute users for orders then
>> again I will get no recommendations ... which I sort of understand so do I
>> need to inject my 'new' active order, along with its attached item/s, into
>> the data model first and then ask for the recommendations for the order by
>> offering up the new orderId? Or is there a way of merely offering up an
>> 'item' and then getting recommendations based merely on the item using the
>> data already stored and the relationships with my item?
>>
>> My assumptions:
>> #1
>> I am assuming the data model is a static island of data that has been
>> processed (flattened) overnight (most probably by a Hadoop process) due to
>> the size of this data ... rather than a living document that is updated as
>> soon as new data is available.
>> #2
>> I'm also assuming that instead of reading in the data model and
>> providing recommendations 'on the fly' I will have to run through every item
>> in my catalogue and find out the top 5 recommended items that are ordered
>> with each item (most probably via a Hadoop process) and then store this
>> output in DynamoDB or Lucene for quick access.
>>
>> Sorry for all the questions but it's such an interesting subject.
>>
>>
>> On 11 April 2013 22:04, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> Actually, making this user based is a really good thing because you get
>>> recommendations from one session to the next. These may be much more
>>> valuable for cross-sell than things in the same order.
>>>
>>>
>>> On Thu, Apr 11, 2013 at 12:50 PM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> You can try treating your orders as the 'users'. Then just compute
>>>> item-item similarities per usual.
>>>>
>>>> On Thu, Apr 11, 2013 at 7:59 PM, Billy <b...@ntlworld.com> wrote:
>>>>> Thanks for replying,
>>>>>
>>>>> I don't have users, well I do :-) but in this case it should not
>>>>> influence the recommendations, these need to be based on the
>>>>> relationship between "items ordered with other items in the
>>>>> 'same order'".
>>>>>
>>>>> E.g. If item #1 has been ordered with item #4 [22] times and item #1
>>>>> has been ordered with item #9 [57] times then if I added item #1 to my
>>>>> order these would both be recommended but item #9 would be recommended
>>>>> above item #4 purely based on the fact that the relationship between
>>>>> item #1 and item #9 is greater than the relationship with item #4.
>>>>>
>>>>> What I don't want is: if a user ordered items #A, #B, #C separately
>>>>> 'at some point in their order history' then recommend #A and #C to
>>>>> other users who order #B ... I still don't want this if the items are
>>>>> similar and/or the users similar.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Billy
>>>>>
>>>>>
>>>>> On 11 Apr 2013 18:28, "Sean Owen" <sro...@gmail.com> wrote:
>>>>>>
>>>>>> This sounds like just a most-similar-items problem. That's good news
>>>>>> because that's simpler. The only question is how you want to compute
>>>>>> item-item similarities. That could be based on user-item interactions.
>>>>>> If you're on Hadoop, try the RowSimilarityJob (where you will need
>>>>>> rows to be items, columns the users).
>>>>>>
>>>>>> On Thu, Apr 11, 2013 at 6:11 PM, Billy <b...@ntlworld.com> wrote:
>>>>>>> I am very new to Mahout and have currently just read up to chapter 5
>>>>>>> of 'MIA' but after reading about the various User centric and Item
>>>>>>> centric recommenders they all seem to still need a userId so I am
>>>>>>> still unsure if Mahout can help with a fairly common recommendation.
>>>>>>>
>>>>>>> My requirement is to produce 'n' item recommendations based on a
>>>>>>> chosen item.
>>>>>>>
>>>>>>> E.g. "if I've added item #1 to my order then based on all the
>>>>>>> other items, in all the other orders for this site, what are the
>>>>>>> likely items that I may also want to add to my order based on the
>>>>>>> item to item relationship in the history of orders of this site?"
>>>>>>>
>>>>>>> Most probably using the most popular relationship between the item I
>>>>>>> have chosen and all the items in all the other orders.
>>>>>>>
>>>>>>> My data is not 'user' specific, and I don't think it should be, but
>>>>>>> more like order specific as it's the pattern of items in each order
>>>>>>> that should determine the recommendation.
>>>>>>>
>>>>>>> I have no preference values so merely boolean preferences will be
>>>>>>> used.
>>>>>>>
>>>>>>> If Mahout can perform these calculations then how must I present the
>>>>>>> data?
>>>>>>>
>>>>>>> Will I need to shape the data in some way to feed into Mahout
>>>>>>> (currently versed in using Hadoop via AWS EMR using Java)?
>>>>>>>
>>>>>>> Thanks for the advice in advance,
>>>>>>>
>>>>>>> Billy