Re: Is Mahout the right tool to recommend cross sales?

Sebastian Schelter Thu, 11 Apr 2013 22:23:49 -0700

You can also use the new MultithreadedBatchItemSimilarities class to
efficiently precompute item similarities on a single machine without
having to go to MapReduce.
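
For a feel of what such a single-machine precomputation amounts to in the "orders as users" case quoted below, here is a minimal plain-Java sketch. It does not use Mahout or MultithreadedBatchItemSimilarities. The class and method names (CoOccurrence, count, topN) are made up for illustration, it assumes each order lists an item at most once, and it ranks by raw "ordered together" counts rather than by a similarity measure such as log-likelihood.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Mahout: counts how often items share an order.
class CoOccurrence {

    // For every item, count how often each other item appears in the same order.
    // Assumes an item appears at most once per order.
    static Map<Long, Map<Long, Integer>> count(List<List<Long>> orders) {
        Map<Long, Map<Long, Integer>> counts = new HashMap<>();
        for (List<Long> order : orders) {
            for (Long a : order) {
                for (Long b : order) {
                    if (!a.equals(b)) {
                        counts.computeIfAbsent(a, k -> new HashMap<>())
                              .merge(b, 1, Integer::sum);
                    }
                }
            }
        }
        return counts;
    }

    // Items most often ordered together with itemId, highest count first.
    // Ties are broken by the smaller item ID so the result is deterministic.
    static List<Long> topN(Map<Long, Map<Long, Integer>> counts, long itemId, int howMany) {
        Map<Long, Integer> row = counts.getOrDefault(itemId, new HashMap<>());
        List<Long> items = new ArrayList<>(row.keySet());
        items.sort((x, y) -> {
            int byCount = Integer.compare(row.get(y), row.get(x));
            return byCount != 0 ? byCount : Long.compare(x, y);
        });
        return items.subList(0, Math.min(howMany, items.size()));
    }

    public static void main(String[] args) {
        // Each inner list is one order; the values are item IDs.
        List<List<Long>> orders = Arrays.asList(
            Arrays.asList(1L, 9L),
            Arrays.asList(1L, 9L, 4L),
            Arrays.asList(1L, 9L),
            Arrays.asList(2L, 4L));
        // Item 1 was ordered with item 9 three times and with item 4 once.
        System.out.println(topN(count(orders), 1L, 5)); // prints [9, 4]
    }
}
```

The counts map plays the role of the precomputed item-item table: it can be built offline and the top-N lists stored wherever the live site can read them quickly.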


On 12.04.2013 00:54, Pat Ferrel wrote:
> Do you not have a user ID? No matter (though if you do I'd use it) you can 
> use the item ID as a surrogate for a user ID in the recommender. And there 
> will be no filtering if you ask for recommender.mostSimilarItems(long itemID, 
> int howMany), which has no user ID in the call and so will not filter. Since 
> the recommender doesn't know you are using item IDs for user IDs this should 
> work fine.
> 
> This allows you to use the in-memory version of the recommender as it is 
> described in MiA. The RowSimilarityJob and ItemSimilarityJob are MapReduce and will 
> produce results for all items in a batch. This is fine and will produce much 
> the same results but you will have to code up the query part yourself as a 
> runtime/live/service component. Using the in-memory recommender gives you a 
> query interface to call whenever you are showing a page to the user.
> 
> Using the user ID will allow you to make and blend in user based 
> recommendations, which are calculated based on individual user history. They 
> may not be your primary recommendations, but when you don't have enough item 
> similarities, you can fall back or blend in user recommendations.
> 
> On Apr 11, 2013, at 2:42 PM, Sean Owen <sro...@gmail.com> wrote:
> 
> You can actually create a "user" #6 for your new order. Or you can use
> the "anonymous user" function of the library, although it's hacky.
> 
> We may be mixing up terms here. "DataModel" is a class that has
> nothing to do with Hadoop. Hadoop in turn has no part in real-time
> anything, like recommending to a brand-new "user". However it could
> build an offline model of item-item similarities and you could do
> something like a most-similar-items computation for a given new basket
> of goods. That is effectively what the "anonymous user" function is
> doing anyway.
> 
> You can precompute all recommendations for all items but that's a lot
> of work! It's easy to get away with it with a thousand items, but with
> a million this may be infeasibly slow.
> 
> On Thu, Apr 11, 2013 at 10:38 PM, Billy <b...@ntlworld.com> wrote:
>> As in the example data 'intro.csv' in the MIA it has users 1-5 so if I ask
>> for recommendations for user 1 then this works but if I ask for
>> recommendations for user 6 (a new user yet to be added to the data model)
>> then I get no recommendations ... so if I substitute users for orders then
>> again I will get no recommendations ... which I sort of understand so do I
>> need to inject my 'new' active order; along with its attached item/s into
>> the data model first and then ask for the recommendations for the order by
>> offering up the new orderId? or is there a way of merely offering up an
>> 'item' and then getting recommendations based merely on the item using the
>> data already stored and the relationships with my item?
>>
>> My assumptions:
>> #1
>> I am assuming the data model is a static island of data that has been
>> processed (flattened) overnight (most probably by a Hadoop process) due to
>> the size of this data ... rather than a living document that is updated as
>> soon as new data is available.
>> #2
>> I'm also assuming that instead of reading in the data model and
>> providing recommendations 'on the fly' I will have to run through every item
>> in my catalogue and find out the top 5 recommended items that are ordered
>> with each item (most probably via a Hadoop process) and then store this
>> output in DynamoDB or Lucene for quick access.
>>
>> Sorry for all the questions but it is such an interesting subject.
>>
>>
>> On 11 April 2013 22:04, Ted Dunning <ted.dunn...@gmail.com> wrote:
>>
>>> Actually, making this user based is a really good thing because you get
>>> recommendations from one session to the next.  These may be much more
>>> valuable for cross-sell than things in the same order.
>>>
>>>
>>> On Thu, Apr 11, 2013 at 12:50 PM, Sean Owen <sro...@gmail.com> wrote:
>>>
>>>> You can try treating your orders as the 'users'. Then just compute
>>>> item-item similarities per usual.
>>>>
>>>> On Thu, Apr 11, 2013 at 7:59 PM, Billy <b...@ntlworld.com> wrote:
>>>>> Thanks for replying,
>>>>>
>>>>>
>>>>> I don't have users, well I do :-) but in this case it should not
>>>>> influence the recommendations, these need to be based on the
>>>>> relationship between "items ordered with other items in the 'same
>>>>> order'".
>>>>>
>>>>> E.g. If item #1 has been ordered with item #4 [22] times and item #1
>>>>> has been ordered with item #9 [57] times then if I added item #1 to my
>>>>> order these would both be recommended, but item #9 would be recommended
>>>>> above item #4 purely based on the fact that the relationship between
>>>>> item #1 and item #9 is greater than the relationship with item #4.
>>>>>
>>>>> What I don't want is; if a user ordered items #A, #B, #C separately
>>>>> 'at some point in their order history' then recommend #A and #C to
>>>>> other users who order #B ... I still don't want this if the items are
>>>>> similar and/or the users similar.
>>>>>
>>>>> Cheers
>>>>>
>>>>> Billy
>>>>>
>>>>>
>>>>>
>>>>> On 11 Apr 2013 18:28, "Sean Owen" <sro...@gmail.com> wrote:
>>>>>>
>>>>>> This sounds like just a most-similar-items problem. That's good news
>>>>>> because that's simpler. The only question is how you want to compute
>>>>>> item-item similarities. That could be based on user-item interactions.
>>>>>> If you're on Hadoop, try the RowSimilarityJob (where you will need
>>>>>> rows to be items, columns the users).
>>>>>>
>>>>>> On Thu, Apr 11, 2013 at 6:11 PM, Billy <b...@ntlworld.com> wrote:
>>>>>>> I am very new to Mahout and have currently just read up to chapter 5
>>>>>>> of 'MIA', but after reading about the various user-centric and
>>>>>>> item-centric recommenders they all seem to still need a userId, so I
>>>>>>> am still unsure if Mahout can help with a fairly common
>>>>>>> recommendation.
>>>>>>>
>>>>>>> My requirement is to produce 'n' item recommendations based on a
>>>>>>> chosen item.
>>>>>>>
>>>>>>> E.g. "if I've added item #1 to my order then, based on all the
>>>>>>> other items in all the other orders for this site, what are the
>>>>>>> likely items that I may also want to add to my order, based on the
>>>>>>> item to item relationship in the history of orders of this site?"
>>>>>>>
>>>>>>> Most probably using the most popular relationship between the item I
>>>>>>> have chosen and all the items in all the other orders.
>>>>>>>
>>>>>>> My data is not 'user' specific, and I don't think it should be, but
>>>>>>> more like order specific, as it's the pattern of items in each order
>>>>>>> that should determine the recommendation.
>>>>>>>
>>>>>>> I have no preference values so merely boolean preferences will be
>>>>>>> used.
>>>>>>>
>>>>>>> If Mahout can perform these calculations then how must I present the
>>>>>>> data?
>>>>>>>
>>>>>>> Will I need to shape the data in some way to feed into Mahout? (I am
>>>>>>> currently versed in using Hadoop via AWS EMR using Java.)
>>>>>>>
>>>>>>> Thanks for the advice in advance,
>>>>>>>
>>>>>>> Billy
>>>>
>>>
>>>
>
