Hi,

I am using Mahout's recommenditembased algorithm on a data set with nearly 10,000 (implicit) user ratings. This is the command I used:

*mahout recommenditembased --input ratings.csv --output recommendation --usersFile users.dat --tempDir temp --similarityClassname SIMILARITY_LOGLIKELIHOOD --numRecommendations 3*
Although the output is generated successfully, the process takes nearly 7 minutes to produce recommendations for a single user. The Hadoop cluster has 8 nodes, and the machine on which Mahout is invoked is an AWS EC2 c3.2xlarge server. When I tracked the MapReduce jobs, I noticed that *no* more than one machine is utilized at a time, and that the *recommenditembased* command runs 9 MapReduce jobs altogether, each taking approximately 45 seconds. Since this is too slow for real-time recommendations, it would be really helpful to know whether I am missing any additional options or configuration that would enable faster performance.

Thanks,
Warunikay
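
P.S. As a rough sanity check on the timings above, here is a minimal sketch that assumes the 9 jobs run strictly one after another at about 45 seconds each. The variable names are only for illustration, and both figures are the approximate values I observed, not exact measurements.

```python
# Back-of-the-envelope check: do the sequential MapReduce jobs
# account for the total run time? (Assumes no overlap between jobs.)
num_jobs = 9            # MapReduce jobs launched by recommenditembased
seconds_per_job = 45    # approximate wall-clock time per job

total_seconds = num_jobs * seconds_per_job
total_minutes = total_seconds / 60

print(f"Estimated total: {total_seconds} s (~{total_minutes:.2f} min)")
```

Under these assumptions the estimate comes to a little under 7 minutes, so the sequential jobs appear to account for essentially all of the run time.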