From: Suneel Marthi [mailto:suneel_mar...@yahoo.com]
Sent: Thursday, March 27, 2014 9:19 AM
To: Roland von Herget; user@mahout.apache.org
Subject: Re: GC Overhead limit exceed in sequential mode of Mahout
Streamingkmeans
... forgot to ask: how many dimensions are you trying to cluster on?
Adding a combiner (there presently isn't one) may address the excessive memory
usage in the reducer.
On Wednesday, March 26, 2014 8:10 PM, Suneel Marthi wrote:
Hi Roland,
Could you tell me how many intermediate centroids were being emitted from the
mappers to the single reducer in your scenario? You have 6GB allocated for a
reducer, which is way more than what I can get on my work cluster (only 2GB :-)).
I take it that you have not specified the -rskm
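For context, -rskm is the switch that enables reduce-side StreamingKMeans in the streamingkmeans driver. A hedged invocation sketch, reusing Roland's -k/-km values from this thread; the input and output paths are placeholders, not from the thread:

```shell
# Hypothetical invocation enabling reduce-side StreamingKMeans (-rskm);
# the HDFS paths below are placeholders.
mahout streamingkmeans \
  -i /user/roland/vectors \
  -o /user/roland/skm-out \
  -k 1000 -km 17070 \
  -rskm
```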
Hi Suneel,
I have the exact same problem with the following values:
No. of docs: 25,904,599
Command-line params: -k 1000 -km 17070
Reducer Xmx is 6GB, running in full MapReduce mode.
Do you have any other idea what to try?
Thanks,
Roland
On Tue, Mar 25, 2014 at 7:13 PM, Suneel Marthi wrote:
What's your value for -km?
Based on what you had provided, -km should be = 1 * ln(200) = 145090.
Try reducing your no. of clusters to 1000 and -km = 14509.
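The rule of thumb used in this thread is -km ≈ k · ln(n). It can be sanity-checked against Roland's figures above (k = 1000 clusters, n = 25,904,599 docs, -km 17070); a quick one-liner sketch with awk (the variable names are mine):

```shell
# Rule of thumb from this thread: -km ≈ k * ln(n).
# Using Roland's numbers: k = 1000 clusters, n = 25,904,599 docs.
k=1000
n=25904599
km=$(awk -v k="$k" -v n="$n" 'BEGIN { printf "%.0f", k * log(n) }')
echo "$km"   # prints 17070, matching Roland's -km
```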
On Tuesday, March 25, 2014 2:45 AM, fx MA XIAOJUN wrote:
I am using Mahout Streamingkmeans in sequential mode.
With a dataset of 200 objects and 128 variables, I would like to get 1
clusters.
A "GC overhead limit exceeded" error occurred.
How do I set the Java memory limit for sequential mode?
Yours Sincerely,
Ma
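Regarding the question about the Java memory limit in sequential mode: the stock bin/mahout launcher reads the MAHOUT_HEAPSIZE environment variable (in megabytes) to set -Xmx on the local client JVM, so raising it before launching should work. A sketch; the 4096 MB value is an arbitrary example, and the -k/-km values are Roland's from this thread:

```shell
# Sequential mode runs inside the client JVM started by bin/mahout,
# so raise that JVM's heap via MAHOUT_HEAPSIZE (in MB) before launching.
export MAHOUT_HEAPSIZE=4096
mahout streamingkmeans -i input-vectors -o skm-out -k 1000 -km 17070
```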