1M ratings take up something like 20 megabytes. At this data size it does not make any sense to use Hadoop. Just try the single-machine implementation.
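A rough back-of-the-envelope check of that figure follows. It assumes about 20 bytes per rating when each rating is stored as one CSV line of the form userID,itemID,preference; the thread does not state that figure.

```latex
10^{6}\ \text{ratings} \times 20\ \tfrac{\text{bytes}}{\text{rating}} = 2 \times 10^{7}\ \text{bytes} \approx 20\ \text{MB}
```

By the same estimate, the 6 million ratings mentioned later in the thread would come to about 120 MB. That is still small by single-machine standards.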

--sebastian



On 06/06/2014 12:01 PM, Warunika Ranaweera wrote:
Hi Sebastian,

Thanks for your prompt response. This is just a sample data set from our
database; the full data set may grow to 6 million ratings. Since performance
was already poor on the smaller data set, I assumed it would be even worse on
a larger one. As per your suggestion, I also ran the same command on
1 million user ratings from approx. 6,000 users and got the same running
time.

What is the average running time for the Mahout distributed recommendation
job on 1 million ratings? Does it usually take more than 1 minute?

Thanks in advance,
Warunika


On Fri, Jun 6, 2014 at 2:42 PM, Sebastian Schelter <s...@apache.org> wrote:

You should not use Hadoop for such a tiny dataset. Use the
GenericItemBasedRecommender on a single machine in Java.
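Sebastian's suggestion amounts to running item-based collaborative filtering inside a single JVM. With Mahout on the classpath, this is typically done by combining a FileDataModel over ratings.csv, a LogLikelihoodSimilarity, and GenericItemBasedRecommender.recommend(userID, 3). For implicit data without preference values, Mahout also provides GenericBooleanPrefItemBasedRecommender.

The sketch below does not use Mahout at all, so that it runs without any dependencies. Instead, it re-implements the same idea in plain Java, scoring item pairs with Dunning's log-likelihood ratio over implicit (user, item) pairs. Several parts are choices made for this illustration, not Mahout's code:

- the class and method names (ItemBasedSketch, parse, recommend),
- the 1 - 1/(1 + LLR) mapping to a similarity,
- ranking candidates by summed similarity.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;

// Hypothetical single-JVM sketch of item-based collaborative filtering on implicit
// (user, item) data with log-likelihood similarity. Illustration only, not Mahout's code.
public class ItemBasedSketch {

    // x * ln(x), using the convention 0 * ln(0) = 0
    static double xLogX(long x) {
        return x == 0 ? 0.0 : x * Math.log(x);
    }

    // Unnormalized Shannon entropy of a list of counts: N ln N - sum(k ln k)
    static double entropy(long... counts) {
        long total = 0;
        double sumXLogX = 0.0;
        for (long k : counts) { total += k; sumXLogX += xLogX(k); }
        return xLogX(total) - sumXLogX;
    }

    // Log-likelihood ratio (G^2) of the 2x2 co-occurrence table [[k11, k12], [k21, k22]]
    static double logLikelihoodRatio(long k11, long k12, long k21, long k22) {
        double rowEntropy = entropy(k11 + k12, k21 + k22);
        double colEntropy = entropy(k11 + k21, k12 + k22);
        double matrixEntropy = entropy(k11, k12, k21, k22);
        // Clamp tiny negative values caused by floating-point rounding
        return Math.max(0.0, 2.0 * (rowEntropy + colEntropy - matrixEntropy));
    }

    // Similarity of two items, given the sets of users who interacted with each
    static double similarity(Set<Long> usersA, Set<Long> usersB, long numUsers) {
        long both = usersA.stream().filter(usersB::contains).count();
        long onlyA = usersA.size() - both;
        long onlyB = usersB.size() - both;
        long neither = numUsers - both - onlyA - onlyB;
        double llr = logLikelihoodRatio(both, onlyA, onlyB, neither);
        return 1.0 - 1.0 / (1.0 + llr); // squash [0, inf) into [0, 1)
    }

    // Parse "userID,itemID[,preference]" lines; any preference value is ignored (implicit data)
    static Map<Long, Set<Long>> parse(List<String> lines) {
        Map<Long, Set<Long>> userItems = new HashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;
            String[] fields = line.split(",");
            long user = Long.parseLong(fields[0].trim());
            long item = Long.parseLong(fields[1].trim());
            userItems.computeIfAbsent(user, k -> new HashSet<>()).add(item);
        }
        return userItems;
    }

    // Top-howMany unseen items for userID, scored by summed similarity to the user's items
    static List<Long> recommend(Map<Long, Set<Long>> userItems, long userID, int howMany) {
        // Invert user -> items into item -> users
        Map<Long, Set<Long>> itemUsers = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> e : userItems.entrySet()) {
            for (Long item : e.getValue()) {
                itemUsers.computeIfAbsent(item, k -> new HashSet<>()).add(e.getKey());
            }
        }
        long numUsers = userItems.size();
        Set<Long> seen = userItems.getOrDefault(userID, Collections.emptySet());
        Map<Long, Double> scores = new HashMap<>();
        for (Map.Entry<Long, Set<Long>> cand : itemUsers.entrySet()) {
            if (seen.contains(cand.getKey())) continue;
            double score = 0.0;
            for (Long owned : seen) {
                Set<Long> ownedUsers = itemUsers.get(owned);
                // Only item pairs that co-occur for at least one user contribute
                if (!Collections.disjoint(cand.getValue(), ownedUsers)) {
                    score += similarity(cand.getValue(), ownedUsers, numUsers);
                }
            }
            if (score > 0.0) scores.put(cand.getKey(), score);
        }
        List<Long> ranked = new ArrayList<>(scores.keySet());
        // Highest score first; ties broken by smaller item ID so the output is deterministic
        ranked.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : Long.compare(a, b);
        });
        return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ItemBasedSketch [ratings.csv [userID]]; with no arguments a tiny inline sample is used
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("1,101", "1,102", "2,101", "2,102", "2,103",
                                "3,101", "3,102", "3,103", "4,102", "4,104", "5,105");
        long userID = args.length > 1 ? Long.parseLong(args[1]) : 1L;
        System.out.println(recommend(parse(lines), userID, 3));
    }
}
```

Run it as java ItemBasedSketch ratings.csv 42, where the file name and user ID are placeholders. With no arguments it uses the small inline sample and recommends for user 1. The sketch recomputes similarities on every call and does no sampling or caching, so it shows the technique rather than Mahout's performance characteristics. Even so, everything happens in memory, with no job scheduling involved.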

--sebastian


On 06/06/2014 11:10 AM, Warunika Ranaweera wrote:

Hi,

I am using Mahout's recommenditembased algorithm on a data set with nearly
10,000 (implicit) user ratings. This is the command I used:
    mahout recommenditembased --input ratings.csv --output recommendation \
        --usersFile users.dat --tempDir temp \
        --similarityClassname SIMILARITY_LOGLIKELIHOOD --numRecommendations 3


Although the output is generated successfully, this process takes nearly 7
minutes to produce recommendations for a single user. The Hadoop cluster
has 8 nodes, and the machine on which Mahout is invoked is an AWS EC2
c3.2xlarge server. When I tracked the MapReduce jobs, I noticed that only
*one* machine is utilized at a time, and the *recommenditembased* command
runs 9 MapReduce jobs altogether, taking approx. 45 seconds per job.
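The figures in this message are consistent with each other. The nine jobs at roughly 45 seconds each account for nearly all of the observed running time:

```latex
9\ \text{jobs} \times 45\ \tfrac{\text{s}}{\text{job}} = 405\ \text{s} \approx 6.75\ \text{min} \approx 7\ \text{min}
```

The thread does not break the per-job time down. If it is mostly fixed startup and scheduling overhead, which is an assumption, the total would depend on the number of jobs rather than on the size of the input.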

Since this is too slow for real-time recommendations, it would be really
helpful to know whether I'm missing any commands or configuration settings
that would enable faster performance.

Thanks,
Warunika