mapreduce streaming k-means invocation with -rskm switch times out

2015-10-07 Thread hsharma mailinglists
Hello, I'm trying to run streaming k-means in Hadoop (non-local) mode with the reduce_streaming_k_means option enabled. The map stage completes successfully, but the reduce stage does not and times out. I'm trying to understand why. I have quite a few questions, but first here's a

issue MAHOUT-1469

2015-10-07 Thread hsharma mailinglists
Hello Mahout committers, Are there plans to resolve issues.apache.org/jira/browse/MAHOUT-1469? I noticed that it's marked for fixing in version 0.11.1, but that it's been there since version 0.9. Thanks! - Harsh

[mahout 0.9 | k-means] methodology for selecting k to cluster very large datasets

2015-09-15 Thread hsharma mailinglists
Hello, I have some questions around large-scale clustering. I would like to arrive at a methodology that I can use to determine an appropriate value of K to run K-means clustering for (at least for my scenario, if not in general). More details follow below (apologies for the verbosity, but I