No. You seem to be describing combined vendor+service recommendations, so your
input will have the form

user ID, combined ID, rating

The way you are creating a combined ID is fine, but it must still be mapped to a 
Mahout ID. The user IDs and the combined IDs must _each_ be mapped to their own 
contiguous range starting at 0. Think of Mahout IDs as row and column numbers in 
one big input table: row # = Mahout user ID, column # = Mahout item ID. Mapping 
to and from these IDs is your task. 

user ID —> 0..n
combined ID —> 0..m

Then you will have input that Mahout can ingest. For example

0,0,3
1,0,2
…
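
The mapping described above can be sketched in Python. This is only an 
illustration, not part of Mahout: the `IdMap` class and the sample ratings are 
made up, and it assumes the ratings are already available as 
(user ID, combined ID, rating) tuples.

```python
class IdMap:
    """Assigns contiguous integers 0, 1, 2, ... to external IDs in first-seen order."""

    def __init__(self):
        self.to_mahout = {}    # external ID -> Mahout ID
        self.to_external = []  # Mahout ID -> external ID (list index = Mahout ID)

    def add(self, external_id):
        # Only the first occurrence of an ID gets a new integer.
        if external_id not in self.to_mahout:
            self.to_mahout[external_id] = len(self.to_external)
            self.to_external.append(external_id)
        return self.to_mahout[external_id]


# Hypothetical ratings: (user ID, combined vendor+service ID, rating)
ratings = [
    ("user1", "601501", 3),
    ("user2", "601501", 2),
    ("user2", "602503", 5),
]

# Users and items each get their own, independent 0-based numbering.
users, items = IdMap(), IdMap()
lines = [f"{users.add(u)},{items.add(i)},{r}" for u, i, r in ratings]
print("\n".join(lines))
```

Keep both `IdMap` objects (or persist them) after writing the input file, 
since `to_external` is what you need later to translate results back.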

The calculated recommendations will use the Mahout IDs so you must map them 
back into yours.
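
Mapping back could look roughly like the sketch below. It assumes the 
recommender writes one line per user in the form 
`userID<TAB>[itemID:score,itemID:score]`; check that against your actual 
output. The lookup lists and sample line are hypothetical, and splitting the 
combined ID assumes three-digit vendor IDs as in your 601/501 scheme.

```python
# Reverse lookups built while creating the input (list index = Mahout ID).
# The values here are hypothetical.
user_ids = ["user1", "user2"]
combined_ids = ["601501", "602503"]


def parse_rec_line(line):
    """Parse one 'userID<TAB>[itemID:score,...]' line into (user, [(item, score)])."""
    user_part, items_part = line.strip().split("\t")
    recs = []
    for pair in items_part.strip("[]").split(","):
        if pair:  # skip the empty string produced by an empty list "[]"
            item_id, score = pair.split(":")
            recs.append((int(item_id), float(score)))
    return int(user_part), recs


def translate(line):
    """Map Mahout IDs back to application IDs and split the combined ID."""
    mahout_user, recs = parse_rec_line(line)
    out = []
    for mahout_item, score in recs:
        combined = combined_ids[mahout_item]
        # Assumes vendor IDs are exactly three digits (e.g. 601), rest is the service ID.
        vendor_id, service_id = combined[:3], combined[3:]
        out.append((vendor_id, service_id, score))
    return user_ids[mahout_user], out


print(translate("0\t[1:4.5,0:3.0]"))
```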

 
On Oct 1, 2014, at 5:58 PM, vinayakb malagatti <vinayakbmalaga...@gmail.com> 
wrote:

Hi Pat,

Please correct me if I am wrong. If we take table 2 (user 2), he rated
vendor 1 through vendor 3.

  1. I am going to assign each user an ID from 1 to N.
  2. Vendors will have the IDs 601, 602, 603, ...
  3. Services will have the IDs 501, 502, 503, ...
  4. Combining the vendor and service IDs gives
  601501, 601502, 601503, ...
  5. The input to Mahout will be USER ID, COMBINED ID, RATING.
  6. The output from Mahout will be COMBINED IDs for each user, and I will
  have to split each COMBINED ID back into a Vendor ID and a Service ID.

Is this the correct flow?


Thanks and Regards,
Vinayak B


On Thu, Oct 2, 2014 at 12:23 AM, Pat Ferrel <p...@occamsmachete.com> wrote:

> First, I agree with Ted that LLR (log-likelihood ratio) is better. I've
> tried all of the similarity methods in Mahout on exactly the same dataset
> and got far higher cross-validation scores for LLR. You may still use
> Pearson with Mahout 0.9 and 1.0, but it is not supported in the Mahout 1.0
> Spark jobs.
> 
> If you have data in tables you need to create single interactions. These
> will look like:
> 
> user1,vendor1,rating
> userN,vendorM,rating
> ...
> 
> If you are recommending vendors (not specific services of specific
> vendors), you need to map your IDs into IDs that the recommender can ingest.
> If the same user rated multiple services of the same vendor, you can't tell
> which of the separate ratings will be used, so you should decide which
> rating you want to use as input.
> 
> You need to translate your IDs into Mahout IDs. Let's say you go through
> all of your users, assign the first one a Mahout ID of integer = 0, then
> the next unique user you see will get Mahout ID = 1, and so on. You need
> to do this for your items (vendors) as well.
> 
> Formatted as Mahout User ID, Mahout Item ID, rating, your input files will
> contain:
> 
> 0,0,1
> 0,2000,3
> 0,4,5
> 1,3,1
> 1000,2000,5
> …
> 
> Then, after you run the Mahout item-based recommender, you will get back a
> list of recommendations for each user. The key will be an integer equal to
> the Mahout user ID. The value will be a list of Mahout item IDs with
> strengths. You will need to map the Mahout IDs back into your application
> IDs. Since you are recommending vendors, the vendors are the items, so map
> all Mahout item IDs into your vendor IDs and the Mahout user IDs into your
> user IDs.
> 
> On Sep 30, 2014, at 6:55 PM, vinayakb malagatti <
> vinayakbmalaga...@gmail.com> wrote:
> 
> Thank you @Ted, but my guide suggests going with what Pat proposed.
> @Pat, could you please tell me how the vendors should be grouped if I want
> to recommend vendors to the user from the table? You also mentioned "*your
> recs will be returned using the same integer IDs so you will have to
> translate your “user1” and “vendor1-service1” into non-negative contiguous
> integers*". I don't know about this translation; could you please explain
> it in more detail?
> 
> Thanks and Regards,
> Vinayak B
> 
> 
> On Tue, Sep 30, 2014 at 10:36 AM, Ted Dunning <ted.dunn...@gmail.com>
> wrote:
> 
>> Yes.  But I strongly suggest that you not use Pearson Correlation.
>> 
>> Use the LLR similarity to compute indicator actions for each vendor.  Then
>> use a user's history of actions to score vendors.  This is not only much
>> simpler than what you are asking for, it will be more accurate.
>> 
>> You should also measure additional actions besides ratings.
>> 
>> 
>> 
>> On Mon, Sep 29, 2014 at 6:56 PM, vinayakb malagatti <
>> vinayakbmalaga...@gmail.com> wrote:
>> 
>>> @Pat and @Ted, thank you so much for the reply. I was looking for the
>>> solution Pat suggested. I want to suggest to a user the vendors he has
>>> not yet used, by taking that user's history and comparing it with other
>>> users who have rated vendors in common. If we take the table:
>>> 
>>>  - User 1 has rated Vendor 1, Vendor 3 and Vendor 4, and User 2
>>>  has rated Vendor 1, Vendor 2 and Vendor 3.
>>>  - Common between User 2 and User 1 are Vendor 1 and Vendor 3.
>>>  - Assume the Pearson correlation between them is nearly 1; then we
>>>  can recommend Vendor 2 to User 1, who has not used it.
>>> 
>>> Can we do this using Apache Mahout? If yes, could you please give
>>> a brief idea of how?
>>> 
>>> Thanks and Regards,
>>> Vinayak B
>>> 
>>> 
>>> On Tue, Sep 30, 2014 at 2:10 AM, Ted Dunning <ted.dunn...@gmail.com>
>>> wrote:
>>> 
>>>> I would recommend that you look at actions other than ratings as well.
>>>> 
>>>> Did a user expand and read 1 review?  Did they read >3 reviews?
>>>> 
>>>> Did they mark a rating as useful?
>>>> 
>>>> Did they ask for contact information?
>>>> 
>>>> You know your system better than I possibly could, but using other
>>>> information in addition to ratings is very important for getting the
>>>> highest quality predictive information.
>>>> 
>>>> You can start with ratings, but you should push to get other kinds of
>>>> information as much as possible.  Ratings are often given by only a very
>>>> small number of people.  That severely limits how much value you can add
>>>> with a recommendation engine.  At the same time most people are busy not
>>>> giving you ratings, they are doing lots of other things that tell you what
>>>> they are thinking and reacting to.  If you don't pay attention to that
>>>> additional information, you are handicapping yourself severely.
>>>> 
>>>> 
>>>> On Mon, Sep 29, 2014 at 9:53 AM, vinayakb malagatti <
>>>> vinayakbmalaga...@gmail.com> wrote:
>>>> 
>>>>> Hi all,
>>>>> 
>>>>> I have a table in the DB that looks something like this:
>>>>> 
>>>>> 
>>>>> rating table
>>>>> <https://docs.google.com/spreadsheets/d/1PrShX7X70PqnfIQg0Dfv6mIHtX1k7KSZHTBfTPMv_Do/edit?usp=drive_web>
>>>>> 
>>>>> Thanks and Regards,
>>>>> Vinayak B
>>>>> 
>>>> 
>>> 
>> 
> 
> 
