recommendations with Hadoop and RecommenderJob in Amazon EC2, suggestions for performance?

Stefano Bellasio Sun, 02 Jan 2011 01:38:05 -0800

Hi guys, happy new year :) After several weeks of testing I finally have a
complete Amazon EC2 Hadoop working environment, thanks to the Cloudera EC2
script. Right now I'm running some tests with MovieLens (the 10 million
version), and I just need to compute recommendations with different similarity
measures using RecommenderJob; all of that works.

I ran an Amazon EC2 cluster with 3 instances: 1 master node and 2 worker nodes
(large instances). I know the recommender is not fast, but I expected 3
instances to be very fast. Instead, the job took about 3 hours to complete for
1 user (I specified the user who needs recommendations with a user.txt file)
and just 10 recommendations.

So my question is: what is the correct setup for my cluster? How many nodes?
How many data nodes, and so on? Is there anything I can do to speed up this
process? My goal is to produce recommendations from a dataset of about 20-30 GB
and 200 million items, so I'm worried about that.


Thanks :) Stefano
