On 02.01.2011 11:21, Stefano Bellasio wrote:
> One question related to users.txt, where I specify the users: how can I
> enter more users? In what format? Right now I think it is one number per
> row, is that right? Thanks
Exactly, it's one userID per line.

--sebastian

> On 02 Jan 2011, at 11:08, Sebastian Schelter wrote:
>
>> Hi Stefano, happy new year too!
>>
>> The running time of RecommenderJob is proportional neither to the number
>> of users you want to compute recommendations for nor to the number of
>> recommendations per single user. Those parameters only influence the
>> last step of the job; most of the time is spent earlier, computing
>> item-item similarities, which is done independently of the number of
>> users you want recommendations for or the number of recommendations
>> per user.
>>
>> We have some parameters to control the amount of data considered in the
>> recommendation process. Have you tried adjusting them to your needs? If
>> you haven't, I think playing with those should be the best place to
>> start for you:
>>
>> --maxPrefsPerUser maxPrefsPerUser
>>   Maximum number of preferences considered per user in the final
>>   recommendation phase
>>
>> --maxSimilaritiesPerItem maxSimilaritiesPerItem
>>   Maximum number of similarities considered per item
>>
>> --maxCooccurrencesPerItem (-o) maxCooccurrencesPerItem
>>   Tries to cap the number of co-occurrences per item at this number
>>
>> It would be very cool if you could keep us up to date with your
>> progress and maybe provide some numbers. I think there are a lot of
>> things in RecommenderJob that could be optimized by us to increase its
>> performance and scalability, and I think we'd be happy to patch it for
>> you if you encounter a problem.
>>
>> --sebastian
>>
>> On 02.01.2011 10:36, Stefano Bellasio wrote:
>>> Hi guys, happy new year :) Well, after several weeks of testing I
>>> finally have a complete Amazon EC2/Hadoop working environment, thanks
>>> to the Cloudera EC2 scripts. Right now I'm doing some tests with
>>> MovieLens (the 10M version), and I just need to compute
>>> recommendations with different similarities via RecommenderJob; all
>>> is OK.
>>> I ran an Amazon EC2 cluster with 3 instances, 1 master node and 2
>>> worker nodes (large instances), but even though I know the
>>> recommender is not fast, I was thinking that 3 instances would be
>>> very fast. My process took about 3 hours to complete for 1 user (I
>>> specified the user that needs recommendations in a users.txt file)
>>> and just 10 recommendations. So my question is: what is the correct
>>> setup for my cluster? How many nodes? How many data nodes, and so on?
>>> Is there something I can do to speed up this process? My goal is to
>>> compute recommendations on a dataset of about 20-30 GB and 200
>>> million items, so I'm worried about that.
>>>
>>> Thanks :) Stefano
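To make the users.txt format and the capping flags discussed above concrete, here is a minimal sketch. Only the one-userID-per-line format and the flag names --maxPrefsPerUser, --maxSimilaritiesPerItem, and --maxCooccurrencesPerItem come from the thread; the jar name, input/output paths, userIDs, and flag values are placeholder assumptions you would adapt to your own install and data.

```shell
# Sketch only: jar name, paths, and values below are assumptions.

# users.txt: one userID per line (the format asked about above).
printf '123\n456\n789\n' > users.txt

# Run RecommenderJob only if a Hadoop installation is available, capping
# the data considered during the expensive item-item similarity phase.
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar mahout-core-0.4-job.jar \
    org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
    --input ratings.csv \
    --output recs \
    --usersFile users.txt \
    --numRecommendations 10 \
    --maxPrefsPerUser 50 \
    --maxSimilaritiesPerItem 100 \
    --maxCooccurrencesPerItem 100
fi
```

Lowering the three cap values shrinks the similarity computation, which is where most of the 3 hours is going, at some cost in recommendation quality.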
