One question about users.txt, where I specify the user numbers: how can I enter more users, and in what format? Right now I think it is one number per row. Is that right? Thanks
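To make the question concrete, here is a minimal sketch of how I would generate the file, assuming the one-number-per-row format described above is what RecommenderJob expects (that is my guess, not something confirmed). The helper name `write_users_file` and the example IDs are made up for illustration.

```python
from pathlib import Path


def write_users_file(user_ids, path="users.txt"):
    """Write one numeric user ID per line and return the text written.

    Assumption: the users file takes one ID per row, as guessed above.
    """
    # int() rejects anything that is not a whole number before it reaches the file.
    lines = [str(int(uid)) for uid in user_ids]
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="utf-8")
    return text


if __name__ == "__main__":
    # Hypothetical user IDs, only to show the resulting layout.
    print(write_users_file([1, 42, 1337]), end="")
```

With this, adding more users would just mean adding more rows to the file.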
On 2 Jan 2011, at 11:08, Sebastian Schelter wrote:

> Hi Stefano, happy new year too!
>
> The running time of RecommenderJob is proportional neither to the number
> of users you want to compute recommendations for nor to the number of
> recommendations per user. Those parameters only influence the last step
> of the job; most of the time is spent earlier, computing item-item
> similarities, which is done independently of both.
>
> We have some parameters to control the amount of data considered in the
> recommendation process. Have you tried adjusting them to your needs? If
> you haven't, I think playing with those is the best place to start:
>
> --maxPrefsPerUser maxPrefsPerUser
>     Maximum number of preferences considered per user in final
>     recommendation phase
>
> --maxSimilaritiesPerItem maxSimilaritiesPerItem
>     Maximum number of similarities considered per item
>
> --maxCooccurrencesPerItem (-o) maxCooccurrencesPerItem
>     try to cap the number of cooccurrences per item to this number
>
> It would be very cool if you could keep us up to date with your progress
> and maybe provide some numbers. I think there are a lot of things in
> RecommenderJob that we could optimize to increase its performance and
> scalability, and I think we'd be happy to patch it for you if you
> encounter a problem.
>
> --sebastian
>
> On 02.01.2011 10:36, Stefano Bellasio wrote:
>> Hi guys, happy new year :) After several weeks of testing I finally
>> have a complete Amazon EC2 Hadoop working environment, thanks to the
>> Cloudera EC2 script. Right now I'm running some tests with MovieLens
>> (the 10 million version), and I just need to compute recommendations
>> with different similarity measures using RecommenderJob; all is OK.
>> I ran an Amazon EC2 cluster with 3 instances, 1 master node and 2
>> worker nodes (large instances). Even though I know the recommender is
>> not fast, I expected 3 instances to be very fast... my process took
>> about 3 hours to complete for 1 user (I specified the user that needs
>> recommendations with a user.txt file)... and just 10 recommendations.
>> So my question is: what is the correct setup for my cluster? How many
>> nodes? How many data nodes, and so on? Is there something I can do to
>> speed up this process? My goal is to recommend with a dataset of about
>> 20-30 GB and 200 million items, so I'm worried about that.
>>
>> Thanks :) Stefano
