Re: recommendations with Hadoop and RecommenderJob in Amazon EC2, suggestions for performance?

Stefano Bellasio Sun, 02 Jan 2011 02:18:53 -0800

Thanks Sebastian! I'm working on it, so this evening I will come back with 
some data and tests based on your hints :) Thanks again.
On 02 Jan 2011, at 11:08, Sebastian Schelter wrote:


> Hi Stefano, happy new year too!
> 
> The running time of RecommenderJob is neither proportional to the number
> of users you wanna compute recommendations for nor to the number of
> recommendations per single user. Those parameters just influence the
> last step of the job, but most time will be spent before when computing
> item-item-similarities, which is done independently of the number of
> users you wanna have recommendations for or the number of
> recommendations per user.
> 
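
To make the point above concrete, here is a minimal sketch in plain Python. It is not Mahout's implementation: it swaps in simple co-occurrence counting as the item-item similarity, and the function names and toy data are hypothetical. It only illustrates that the expensive pairwise step reads every user's preferences, while the requested user and the number of recommendations appear only in the cheap final step.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(prefs):
    """Expensive step: count item pairs over ALL users' preferences.

    Note that no 'requested user' or 'number of recommendations'
    appears here, so this cost does not depend on either.
    """
    counts = defaultdict(int)
    for items in prefs.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
    return dict(counts)


def recommend(prefs, counts, user, how_many):
    """Cheap final step: score unseen items for one user."""
    seen = prefs[user]
    scores = defaultdict(int)
    for (a, b), c in counts.items():
        if a in seen and b not in seen:
            scores[b] += c
        elif b in seen and a not in seen:
            scores[a] += c
    # Highest score first, ties broken by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


# Hypothetical toy data: user -> set of preferred items.
prefs = {
    "u1": {"A", "B"},
    "u2": {"A", "B", "C"},
    "u3": {"B", "C", "D"},
}
counts = cooccurrence_counts(prefs)       # same cost for 1 user or all users
print(recommend(prefs, counts, "u1", 10))
```

Asking for one user or for all three changes only how often `recommend` is called; `cooccurrence_counts` does the same work either way.
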
> We have some parameters to control the amount of data considered in the
> recommendation process, have you tried adjusting them to your needs? If
> you haven't I think playing with those should be the best place to start
> for you:
> 
>  --maxPrefsPerUser maxPrefsPerUser
>       Maximum number of preferences considered per user in final
>       recommendation phase
> 
>  --maxSimilaritiesPerItem maxSimilaritiesPerItem
>       Maximum number of similarities considered per item
> 
>  --maxCooccurrencesPerItem (-o) maxCooccurrencesPerItem
>       try to cap the number of cooccurrences per item to this number
> 
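
As a rough illustration of why caps like the ones listed above can help, the following Python sketch shows that the number of item pairs produced by a single user grows quadratically with that user's preference count, so limiting it shrinks the work sharply. The thread does not say how Mahout chooses which entries to keep; keeping the highest values is an assumption made here, and all function names are hypothetical.

```python
import heapq


def pair_count(n_prefs):
    """Number of item pairs generated by one user with n_prefs preferences."""
    return n_prefs * (n_prefs - 1) // 2


def cap_preferences(user_prefs, max_prefs):
    """Keep at most max_prefs (item, rating) entries.

    Assumption: the highest ratings are kept; the real selection
    policy is not described in this thread.
    """
    return heapq.nlargest(max_prefs, user_prefs, key=lambda p: p[1])


def cap_similarities(similar_items, max_similarities):
    """Keep only the strongest (item, similarity) entries for one item."""
    return heapq.nlargest(max_similarities, similar_items, key=lambda s: s[1])


# Hypothetical heavy user with 1000 rated items (ratings 1..5).
heavy_user = [(f"item{i}", (i % 5) + 1) for i in range(1000)]
capped = cap_preferences(heavy_user, 100)

print(pair_count(len(heavy_user)))  # 499500 pairs without a cap
print(pair_count(len(capped)))      # 4950 pairs with a cap of 100
```

In this toy case a tenfold cut in preferences per user gives roughly a hundredfold cut in pairs, which is why these limits are a sensible first thing to tune.
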
> 
> It would be very cool if you could keep us up to date with your progress
> and maybe provide some numbers. I think there are a lot of things in the
> RecommenderJob that could be optimized by us to increase its performance
> and scalability, I think we'd be happy to patch it for you if you
> encounter a problem.
> 
> --sebastian
> 
> 
> On 02.01.2011 10:36, Stefano Bellasio wrote:
>> Hi guys, happy new year :) Well, after several weeks of testing I finally 
>> have a complete Amazon EC2 Hadoop working environment, thanks to the 
>> Cloudera EC2 script. Right now I'm doing some tests with MovieLens (the 10 
>> million version), and I just need to compute recommendations with different 
>> similarity measures using RecommenderJob; all is OK. I ran an Amazon EC2 
>> cluster with 3 instances, 1 master node and 2 worker nodes (large 
>> instances). Even though I know the recommender is not fast, I thought 3 
>> instances would be very fast... my process took about 3 hours to complete 
>> for 1 user (I specified the user that needs recommendations with a user.txt 
>> file)... and just 10 recommendations. So, my question is, what is the 
>> correct setup for my cluster? How many nodes? How many data nodes, and so 
>> on? Is there something that I can do to speed up this process? My goal is 
>> to recommend with a dataset of about 20/30 GB and 200 million items, so 
>> I'm worried about that. 
>> 
>> Thanks :) Stefano
>
