From: Suneel Marthi [mailto:suneel_mar...@yahoo.com]
Sent: Wednesday, March 19, 2014 9:08 AM
To: fx MA XIAOJUN; user@mahout.apache.org
Subject: Re: reduce is too slow in StreamingKmeans
When dealing with Streaming KMeans, it would be helpful for troubleshooting
purposes if u could provide the values for k
Sent: Tuesday, March 18, 2014 10:50 AM
To: Suneel Marthi; user@mahout.apache.org
Subject: RE: reduce is too slow in StreamingKmeans
Thank you for your extremely quick reply.
What do u mean by this? kmeans hasn't changed between 0.8 and 0.9. Did u
mean Streaming KMeans here?
I want to try using -rskm
: Suneel Marthi [mailto:suneel_mar...@yahoo.com]
Sent: Wednesday, February 19, 2014 1:08 AM
To: user@mahout.apache.org
Subject: Re: reduce is too slow in StreamingKmeans
Streaming KMeans runs with a single reducer that runs Ball KMeans and hence the
slow performance that you have been experiencing.
compatible with Hadoop 0.20?
-----Original Message-----
From: Suneel Marthi [mailto:suneel_mar...@yahoo.com]
Sent: Monday, March 17, 2014 6:21 PM
To: fx MA XIAOJUN; user@mahout.apache.org
Subject: Re: reduce is too slow in StreamingKmeans
On Monday, March 17, 2014 3:43 AM, fx MA XIAOJUN xiaojun...@fujixerox.co.jp
wrote:
Thank you for your quick reply.
As to -km, I thought it was log10, instead of ln. I was wrong...
This time I set -km 14 and ran mahout streamingkmeans again. (CDH
Streaming KMeans runs with a single reducer that runs Ball KMeans and hence the
slow performance that you have been experiencing.
How did u come up with -km 63000?
Given that u would like 10000 clusters (= k) and have 2,000,000 datapoints (=
n), k * ln(n) = 10000 * ln(2 * 10^6) = 145087 (rounded).
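
For anyone checking the arithmetic, here is a small sketch in Python (not Mahout code). It assumes k = 10000 and n = 2,000,000, the values that reproduce both figures quoted in this thread: 145087 with the natural log, and roughly 63000 with log10. The function name estimated_map_clusters is made up for illustration.

```python
import math


def estimated_map_clusters(k: int, n: int) -> int:
    """Rule of thumb quoted in the thread: -km is roughly k * ln(n)."""
    return round(k * math.log(n))  # math.log is the natural logarithm


k = 10_000      # assumed number of final clusters
n = 2_000_000   # number of datapoints mentioned in the thread

# Natural log, as recommended in the reply.
print(estimated_map_clusters(k, n))

# log10 instead of ln gives the much smaller value close to -km 63000.
print(round(k * math.log10(n)))
```

The gap between the two results is just the factor ln(10), about 2.3, which is why mixing up log10 and ln underestimates -km by more than half.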
I am using Mahout 0.8 embedded in CDH 5.0.0 provided by Cloudera and found
that the reduce phase of mahout streamingkmeans is extremely slow.
For example:
With a dataset of 2,000,000 objects, 128 variables, I would like to get 10000
clusters.
The command executed is as follows.
mahout streamingkmeans