Hi Sebastian,

Thanks for your prompt response. It's just a sample data set from our
database, and it may expand up to 6 million ratings. Since performance
was low even for the smaller data set, I expected it to be worse for a
larger one. As per your suggestion, I also ran the same command on
1 million user ratings for approx. 6,000 users and got the same
performance level.

What is the average running time for the Mahout distributed recommendation
job on 1 million ratings? Does it usually take more than 1 minute?

Thanks in advance,
Warunika


On Fri, Jun 6, 2014 at 2:42 PM, Sebastian Schelter <s...@apache.org> wrote:

> You should not use Hadoop for such a tiny dataset. Use the
> GenericItemBasedRecommender on a single machine in Java.
>
> --sebastian
>
>
> On 06/06/2014 11:10 AM, Warunika Ranaweera wrote:
>
>> Hi,
>>
>> I am using Mahout's recommenditembased algorithm on a data set with nearly
>> 10,000 (implicit) user ratings. This is the command I used:
>> mahout recommenditembased --input ratings.csv --output recommendation \
>>   --usersFile users.dat --tempDir temp --similarityClassname \
>>   SIMILARITY_LOGLIKELIHOOD --numRecommendations 3
>>
>>
>> Although the output is successfully generated, this process takes nearly 7
>> minutes to produce recommendations for a single user. The Hadoop cluster
>> has 8 nodes, and the machine on which Mahout is invoked is an AWS EC2
>> c3.2xlarge server. When I tracked the MapReduce jobs, I noticed that only
>> one machine is utilized at a time, and the *recommenditembased* command
>> runs 9 MapReduce jobs altogether, with approx. 45 seconds taken per job.
>>
>> Since this performance is too slow for real-time recommendations, it would
>> be really helpful to know whether I'm missing any additional commands or
>> configurations that would enable faster performance.
>>
>> Thanks,
>> Warunika
>>
>>
>
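For readers following the thread, Sebastian's single-machine suggestion can be sketched with Mahout's Taste API roughly as follows. This is a minimal, illustrative sketch, not a tuned solution: the input file name matches the `ratings.csv` from the original command (Taste's `FileDataModel` expects `userID,itemID[,preference]` lines), while the user ID and Mahout version on the classpath are assumptions.

```java
import java.io.File;
import java.util.List;

import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.recommender.GenericItemBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;

public class SingleMachineRecommender {

    public static void main(String[] args) throws Exception {
        // Load the same ratings file used in the Hadoop job.
        // FileDataModel parses CSV lines of the form: userID,itemID[,preference]
        DataModel model = new FileDataModel(new File("ratings.csv"));

        // Log-likelihood similarity, the in-memory counterpart of
        // SIMILARITY_LOGLIKELIHOOD from the distributed job.
        LogLikelihoodSimilarity similarity = new LogLikelihoodSimilarity(model);

        GenericItemBasedRecommender recommender =
            new GenericItemBasedRecommender(model, similarity);

        // Top-3 recommendations for one user (user ID 42 is a placeholder).
        List<RecommendedItem> recs = recommender.recommend(42L, 3);
        for (RecommendedItem rec : recs) {
            System.out.println(rec.getItemID() + " : " + rec.getValue());
        }
    }
}
```

For a data set of this size (1-6 million ratings), the in-memory model typically fits comfortably on a single machine, and per-user recommendation calls complete in milliseconds rather than minutes, which avoids the fixed MapReduce job-startup overhead dominating the distributed run described above.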
