Re: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-26 Thread Suneel Marthi
... forgot to ask? How many dimensions are you trying to cluster on? Adding a combiner (none exists at present) may address this excessive memory usage in the reducer.

RE: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-26 Thread fx MA XIAOJUN
From: Suneel Marthi [mailto:suneel_mar...@yahoo.com]
Sent: Thursday, March 27, 2014 9:19 AM
To: Roland von Herget; user@mahout.apache.org
Subject: Re: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans
... forgot to ask? How many dimensions are you trying to cluster on? Adding a combiner

Re: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-26 Thread Suneel Marthi
... forgot to ask? How many dimensions are you trying to cluster on? Adding a combiner (none exists at present) may address this excessive memory usage in the reducer.
On Wednesday, March 26, 2014 8:10 PM, Suneel Marthi wrote: Hi Roland, Could you tell me how many intermediate centroids

Re: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-26 Thread Suneel Marthi
Hi Roland, Could you tell me how many intermediate centroids were being emitted from the mappers to the single reducer in your scenario? You have 6GB allocated for a reducer, which is way more than what I can get on my work cluster (only 2GB :-)). I take it that you have not specified the -rskm
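Suneel's two questions (how many dimensions, and how many intermediate centroids reach the single reducer) together set how much the reducer has to hold in memory. Below is a rough Python sketch of that product, assuming each mapper emits about -km centroids and that each centroid is stored as a dense vector of 8-byte doubles. The function names, the mapper count and the 1000-dimension figure are hypothetical; Mahout's real footprint will differ (object overhead, centroid weights, sparse vectors).

```python
# Hypothetical back-of-the-envelope estimate; not Mahout's own memory accounting.
BYTES_PER_DOUBLE = 8


def reducer_centroid_bytes(num_mappers: int, km: int, dims: int) -> int:
    """Raw payload of intermediate centroids arriving at one reducer.

    Assumes every mapper emits roughly `km` centroids and that each centroid
    is a dense vector of `dims` doubles. Object overhead and centroid weights
    are ignored, and sparse storage could make the real figure smaller.
    """
    total_centroids = num_mappers * km
    return total_centroids * dims * BYTES_PER_DOUBLE


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30


if __name__ == "__main__":
    # Hypothetical inputs: 10 mappers, -km 17070, 1000 dimensions.
    payload = reducer_centroid_bytes(num_mappers=10, km=17070, dims=1000)
    print(payload)                        # 1365600000
    print(f"{to_gib(payload):.2f} GiB")   # 1.27 GiB
```

In this model the estimate is linear in mappers, -km and dimensions, so anything that cuts the number of centroids per mapper before they reach the reducer (a smaller -km, or the combiner suggested in the thread) shrinks it proportionally.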

Re: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-26 Thread Roland von Herget
Hi Suneel,
I have the exact same problem with the following values:
No. of docs: 25,904,599
Command-line params: -k 1000 -km 17070
Reducer Xmx is 6GB, running in full Map/Reduce mode.
Do you have any other idea what to try?
Thanks,
Roland
On Tue, Mar 25, 2014 at 7:13 PM, Suneel Marthi wrote:

Re: GC Overhead limit exceed in sequential mode of Mahout Streamingkmeans

2014-03-25 Thread Suneel Marthi
What's your value for -km? Based on what you had provided, -km should be k * ln(n) = 10000 * ln(2000000) ≈ 145090. Try reducing your no. of clusters to 1000 and -km to 14509.
On Tuesday, March 25, 2014 2:45 AM, fx MA XIAOJUN wrote: I am using Mahout Streamingkmeans in sequential mode. With a dataset of
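The -km values in this thread follow the rule of thumb -km ≈ k · ln(n), where k is the number of final clusters and n the number of input points. A minimal Python check of that arithmetic is below. estimate_km is a hypothetical helper, not part of Mahout, and n ≈ 2,000,000 for the original poster's dataset is an assumption inferred from the quoted 145090 and 14509, not something stated in the visible text.

```python
import math


def estimate_km(k: int, n: int) -> float:
    """Rule of thumb used in this thread: -km is about k * ln(n)."""
    return k * math.log(n)  # math.log with one argument is the natural log


if __name__ == "__main__":
    # Roland's run: k = 1000 over 25,904,599 documents.
    print(round(estimate_km(1000, 25_904_599)))  # 17070
    # Assumed n of about 2,000,000 points (inferred, see lead-in).
    print(round(estimate_km(10000, 2_000_000)))  # 145087 (rounding ln(n) to 14.509 gives 145090)
    print(round(estimate_km(1000, 2_000_000)))   # 14509
```

Because -km is linear in k, cutting k tenfold cuts the suggested -km tenfold, which in turn reduces the number of intermediate centroids each mapper sends to the reducer.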