RE: reduce is too slow in StreamingKmeans

2014-03-27 Thread fx MA XIAOJUN
: Suneel Marthi [mailto:suneel_mar...@yahoo.com] Sent: Wednesday, March 19, 2014 9:08 AM To: fx MA XIAOJUN; user@mahout.apache.org Subject: Re: reduce is too slow in StreamingKmeans When dealing with Streaming KMeans, it would be helpful for troubleshooting purposes if u could provide the values for k

Re: reduce is too slow in StreamingKmeans

2014-03-18 Thread Suneel Marthi
: Tuesday, March 18, 2014 10:50 AM To: Suneel Marthi; user@mahout.apache.org Subject: RE: reduce is too slow in StreamingKmeans Thank you for your extremely quick reply. What do u mean by this? kmeans hasn't changed between 0.8 and 0.9. Did u mean Streaming KMeans here? I want to try using -rskm

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
: Suneel Marthi [mailto:suneel_mar...@yahoo.com] Sent: Wednesday, February 19, 2014 1:08 AM To: user@mahout.apache.org Subject: Re: reduce is too slow in StreamingKmeans Streaming KMeans runs with a single reducer that runs Ball KMeans and hence the slow performance that you have been experiencing

Re: reduce is too slow in StreamingKmeans

2014-03-17 Thread Suneel Marthi
@mahout.apache.org Subject: Re: reduce is too slow in StreamingKmeans Streaming KMeans runs with a single reducer that runs Ball KMeans and hence the slow performance that you have been experiencing. How did u come up with -km 63000? Given that u would like 10000 clusters (= k) and have 2,000,000

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
compatible with Hadoop 0.20? -----Original Message----- From: Suneel Marthi [mailto:suneel_mar...@yahoo.com] Sent: Monday, March 17, 2014 6:21 PM To: fx MA XIAOJUN; user@mahout.apache.org Subject: Re: reduce is too slow in StreamingKmeans On Monday, March 17, 2014 3:43 AM, fx MA XIAOJUN xiaojun

Re: reduce is too slow in StreamingKmeans

2014-03-17 Thread Suneel Marthi
: Re: reduce is too slow in StreamingKmeans On Monday, March 17, 2014 3:43 AM, fx MA XIAOJUN xiaojun...@fujixerox.co.jp wrote: Thank you for your quick reply. As to -km, I thought it was log10, instead of ln. I was wrong... This time I set -km 14 and run mahout streamingkmeans again.(CDH

RE: reduce is too slow in StreamingKmeans

2014-03-17 Thread fx MA XIAOJUN
[mailto:xiaojun...@fujixerox.co.jp] Sent: Tuesday, March 18, 2014 10:50 AM To: Suneel Marthi; user@mahout.apache.org Subject: RE: reduce is too slow in StreamingKmeans Thank you for your extremely quick reply. What do u mean by this? kmeans hasn't changed between 0.8 and 0.9. Did u mean Streaming KMeans

Re: reduce is too slow in StreamingKmeans

2014-02-18 Thread Suneel Marthi
Streaming KMeans runs with a single reducer that runs Ball KMeans and hence the slow performance that you have been experiencing. How did u come up with -km 63000? Given that u would like 10000 clusters (= k) and have 2,000,000 datapoints (= n) so k * ln(n) = 10000 * ln(2 * 10^6) = 145087
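
The arithmetic in this reply can be checked with a few lines of Python. This is only a sketch: the helper name estimate_km is hypothetical and not part of Mahout, and k = 10000 is an assumption, taken as the value implied by the quoted result 145087 for n = 2,000,000. The second call uses log10 instead of ln, which lands close to the -km 63000 being asked about.

```python
import math

def estimate_km(k, n, log=math.log):
    """Hypothetical helper (not a Mahout API): k * log(n), rounded to an integer."""
    return round(k * log(n))

k = 10000        # assumed number of clusters, implied by the quoted 145087
n = 2_000_000    # number of datapoints, as stated in the reply

km_ln = estimate_km(k, n)                 # natural log, as in the reply
km_log10 = estimate_km(k, n, math.log10)  # base-10 log, for comparison

print("k * ln(n)    =", km_ln)
print("k * log10(n) =", km_log10)
```

The two values differ by a factor of ln(10), about 2.3, so mixing up the logarithm base underestimates -km by more than half.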

reduce is too slow in StreamingKmeans

2014-02-17 Thread Sylvia Ma
I am using Mahout 0.8 embedded in CDH 5.0.0 provided by Cloudera and found that the reduce phase of mahout streamingkmeans is extremely slow. For example: With a dataset of 2,000,000 objects, 128 variables, I would like to get 10000 clusters. The command executed is as follows. mahout streamingkmeans