Hi guys, happy new year :) After several weeks of testing I finally have a complete, working Amazon EC2 Hadoop environment, thanks to the Cloudera EC2 script.

Right now I'm running some tests with MovieLens (the 10M version). I just need to compute recommendations with different similarity measures using RecommenderJob, and that all works. I ran an EC2 cluster with 3 large instances: 1 master node and 2 worker nodes. I know the recommender is not fast, but I expected 3 instances to be quick. Instead, the job took about 3 hours to complete for 1 user (I specified the user who needs recommendations in a user.txt file) and just 10 recommendations.

So my question is: what is the correct setup for my cluster? How many nodes, how many data nodes, and so on? Is there anything I can do to speed up this process? My goal is to generate recommendations from a dataset of about 20-30 GB and 200 million items, so I'm worried about that.
Thanks :) Stefano
