Canopy Clustering is a two-step process: Canopy Generation followed by Canopy
Clustering.

Canopy Generation uses a single reducer (and this cannot be overridden),
while the Clustering task uses multiple reducers.

You seem to be hitting OOM during the Canopy generation phase.
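
As a rough illustration of the two phases described above, here is a minimal
in-memory sketch in Python. It is not Mahout's code: the function names are
hypothetical, T1 and T2 are the usual canopy distance thresholds (T1 > T2), and
using the seed point as the canopy center (rather than a computed centroid) is
a simplifying assumption. It is only meant to show why the generation step has
to see every candidate center at once, while the assignment step can be split
up freely.

```python
import math


def generate_canopies(points, t1, t2):
    """Phase 1 (sketch): one pass that keeps every canopy center in memory.

    t1 is the loose threshold and t2 the tight one. Only t2 decides whether
    a point seeds a new canopy; t1 is shown for completeness.
    """
    if t1 <= t2:
        raise ValueError("t1 must be greater than t2")
    centers = []
    for p in points:
        # A point closer than t2 to an existing center is strongly bound to
        # that canopy and does not start a new one.
        if not any(math.dist(p, c) < t2 for c in centers):
            centers.append(p)
    return centers


def assign_to_canopies(points, centers):
    """Phase 2 (sketch): each point is handled independently of the others,
    which is why this step can be spread over many workers."""
    return [
        min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        for p in points
    ]


points = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (0.0, 0.4)]
centers = generate_canopies(points, t1=3.0, t2=1.0)
labels = assign_to_canopies(points, centers)
print(centers)
print(labels)
```

In this sketch the number of centers held during generation grows as T2
shrinks, so the memory needed by the first phase depends on the thresholds as
well as on the size of the data.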





On Monday, November 25, 2013 6:09 PM, Chih-Hsien Wu <chjaso...@gmail.com> wrote:
 
Hi all,  I have been experiencing a memory issue while working with the Mahout
canopy algorithm on a big data set on Hadoop. I noticed that only one
reducer was running while the other nodes were idle. I was wondering if
increasing the number of reduce tasks would ease the memory usage and
speed up the procedure. However, I realized that configuring
"mapred.reduce.tasks" on Hadoop has no effect on the canopy reduce tasks. It's
still running with only one reducer. Now I'm wondering whether canopy is set up
that way, or whether I am not configuring Hadoop correctly?
