On 02.01.2011 11:21, Stefano Bellasio wrote:
> One question related to users.txt, where I specify the users: how can I
> enter more users? In what format? Right now I think it is one number per
> row, is that right? Thanks
Exactly, it's one userID per line.

--sebastian

> On 02 Jan 2011, at 11:08, Sebastian Schelter wrote:
>
>> Hi Stefano, happy new year too!
>>
>> The running time of RecommenderJob is proportional neither to the number
>> of users you want to compute recommendations for nor to the number of
>> recommendations per single user. Those parameters only influence the
>> last step of the job; most of the time is spent earlier, computing
>> item-item similarities, which is done independently of the number of
>> users you want recommendations for or the number of recommendations
>> per user.
>>
>> We have some parameters to control the amount of data considered in the
>> recommendation process. Have you tried adjusting them to your needs? If
>> you haven't, I think playing with those should be the best place to
>> start for you:
>>
>> --maxPrefsPerUser maxPrefsPerUser
>>   Maximum number of preferences considered per user in the final
>>   recommendation phase
>>
>> --maxSimilaritiesPerItem maxSimilaritiesPerItem
>>   Maximum number of similarities considered per item
>>
>> --maxCooccurrencesPerItem (-o) maxCooccurrencesPerItem
>>   Tries to cap the number of co-occurrences per item at this number
>>
>> It would be very cool if you could keep us up to date with your
>> progress and maybe provide some numbers. I think there are a lot of
>> things in RecommenderJob that could be optimized by us to increase its
>> performance and scalability, and I think we'd be happy to patch it for
>> you if you encounter a problem.
>>
>> --sebastian
>>
>> On 02.01.2011 10:36, Stefano Bellasio wrote:
>>> Hi guys, happy new year :) Well, after several weeks of testing I
>>> finally have a complete Amazon EC2/Hadoop working environment, thanks
>>> to the Cloudera EC2 scripts. Right now I'm doing some tests with
>>> MovieLens (the 10M version), and I just need to compute
>>> recommendations with different similarities via RecommenderJob; all
>>> is OK.
>>> I ran an Amazon EC2 cluster with 3 instances, 1 master node and 2
>>> worker nodes (large instances), but even though I know the
>>> recommender is not fast, I was thinking that 3 instances would be
>>> very fast. My process took about 3 hours to complete for 1 user (I
>>> specified the user that needs recommendations in a users.txt file)
>>> and just 10 recommendations. So my question is: what is the correct
>>> setup for my cluster? How many nodes? How many data nodes, and so on?
>>> Is there something I can do to speed up this process? My goal is to
>>> compute recommendations on a dataset of about 20-30 GB and 200
>>> million items, so I'm worried about that.
>>>
>>> Thanks :) Stefano
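To make the users.txt format and the capping flags discussed above concrete, here is a minimal sketch. Only the one-userID-per-line format and the flag names --maxPrefsPerUser, --maxSimilaritiesPerItem, and --maxCooccurrencesPerItem come from the thread; the jar name, input/output paths, userIDs, and flag values are placeholder assumptions you would adapt to your own install and data.

```shell
# Sketch only: jar name, paths, and values below are assumptions.

# users.txt: one userID per line (the format asked about above).
printf '123\n456\n789\n' > users.txt

# Run RecommenderJob only if a Hadoop installation is available, capping
# the data considered during the expensive item-item similarity phase.
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar mahout-core-0.4-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    --input ratings.csv \
    --output recs \
    --usersFile users.txt \
    --numRecommendations 10 \
    --maxPrefsPerUser 50 \
    --maxSimilaritiesPerItem 100 \
    --maxCooccurrencesPerItem 100
fi
```

Lowering the three cap values shrinks the similarity computation, which is where most of the 3 hours is going, at some cost in recommendation quality.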
