Hi Charly, can you tell us which MapReduce step executed last before you ran out of disk space?
I'm not familiar with the Netflix dataset and can only guess what happened, but I would say that you ran out of disk space because ItemSimilarityJob currently uses all preferences to compute the similarities. This makes it scale quadratically in the number of occurrences of the most popular item, which is a bad thing if that number is huge. We need a way to limit the number of preferences considered per item; there is already a ticket for this (https://issues.apache.org/jira/browse/MAHOUT-460) and I plan to provide a patch in the coming days.

--sebastian

On 12.08.2010 00:15, Charly Lizarralde wrote:
> Hi, I am testing ItemSimilarityJob with Netflix data (2.6 GB) and I have
> just run out of disk space (160 GB) in my mapred.local.dir when running
> RowSimilarityJob.
>
> Is this normal behaviour? How can I improve this?
>
> Thanks!
> Charly
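To make the quadratic blow-up concrete, here is a small back-of-the-envelope sketch (plain Python, not Mahout code; the function names are made up for illustration). An item that occurs in n user profiles contributes on the order of n*(n-1)/2 co-occurrence pairs to the intermediate data, so sampling each item down to at most `cap` preferences, as proposed in MAHOUT-460, bounds that cost:

```python
def pairs_emitted(prefs_per_item):
    """Total co-occurrence pairs produced when every preference is used.

    Each item seen by n users contributes n*(n-1)/2 pairs, so one very
    popular item dominates the intermediate output quadratically.
    """
    return sum(n * (n - 1) // 2 for n in prefs_per_item)


def pairs_emitted_capped(prefs_per_item, cap):
    """Same count when each item is sampled down to at most `cap` prefs."""
    return sum(min(n, cap) * (min(n, cap) - 1) // 2 for n in prefs_per_item)


# A single item rated by 100,000 users (plausible for Netflix) already
# yields ~5 billion pairs on its own; capping at 1,000 shrinks that to
# under half a million.
print(pairs_emitted([100_000]))          # 4999950000
print(pairs_emitted_capped([100_000], 1_000))  # 499500
```

That 160 GB of mapred.local.dir filling up is consistent with a handful of extremely popular movies each emitting billions of intermediate pairs.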
