Hi Charly, can you tell us which MapReduce step executed last before you ran out of disk space?
I'm not familiar with the Netflix dataset and can only guess what happened, but I would say that you ran out of disk space because ItemSimilarityJob currently uses all preferences to compute the similarities. This makes it scale quadratically in the number of occurrences of the most popular item, which is a bad thing if that number is huge. We need a way to limit the number of preferences considered per item; there is already a ticket for this (https://issues.apache.org/jira/browse/MAHOUT-460) and I plan to provide a patch in the coming days.

--sebastian

On 12.08.2010 00:15, Charly Lizarralde wrote:
> Hi, I am testing ItemSimilarityJob with Netflix data (2.6 GB) and I have
> just run out of disk space (160 GB) in my mapred.local.dir when running
> RowSimilarityJob.
>
> Is this normal behaviour? How can I improve this?
>
> Thanks!
> Charly
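To make the quadratic blow-up concrete, here is a small back-of-the-envelope sketch (plain Python, not Mahout code; the function names are made up for illustration). An item that occurs in n user profiles contributes on the order of n*(n-1)/2 co-occurrence pairs to the intermediate data, so sampling each item down to at most `cap` preferences, as proposed in MAHOUT-460, bounds that cost:

```python
def pairs_emitted(prefs_per_item):
    """Total co-occurrence pairs produced when every preference is used.

    Each item seen by n users contributes n*(n-1)/2 pairs, so one very
    popular item dominates the intermediate output quadratically.
    """
    return sum(n * (n - 1) // 2 for n in prefs_per_item)


def pairs_emitted_capped(prefs_per_item, cap):
    """Same count when each item is sampled down to at most `cap` prefs."""
    return sum(min(n, cap) * (min(n, cap) - 1) // 2 for n in prefs_per_item)


# A single item rated by 100,000 users (plausible for Netflix) already
# yields ~5 billion pairs on its own; capping at 1,000 shrinks that to
# under half a million.
print(pairs_emitted([100_000]))          # 4999950000
print(pairs_emitted_capped([100_000], 1_000))  # 499500
```

That 160 GB of mapred.local.dir filling up is consistent with a handful of extremely popular movies each emitting billions of intermediate pairs.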
