I see the error below:
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB:
/home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
15/03/10 11:50:20 INFO common.AbstractJob: Command line arguments:
{--clustering=null,
--clusters=[hdfs://maste
On Tuesday, March 10, 2015 11:45 AM, Suneel Marthi
wrote:
Try
./mahout kmeans -i
http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-0
-o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -c
-dm
org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow
-cl -k 25
I don't have a machine in front of me.
OK, so if -c is required, how can I give it? Or at least, is there a way to
remove -k itself?
./mahout kmeans -i
http://master:50070/explorer.html#/user/netlog/upload/output4/tfidf-vectors/part-r-0
-o /usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters -dm
org.apache.mahout.common.dist
Oops! I meant to say that -c is required for the random centroid
initialization if -k is specified.
It initializes k random centroids in the folder specified by -c, so yes, -c
is required.
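For reference, here is a minimal sketch of an invocation that passes both -c and -k, following the explanation above. The paths are copied from commands posted in this thread, and reusing the original -c directory as the place for the random centroids is an assumption, not something confirmed here. The script only assembles and prints the command (a dry run), so it can be checked before being run on a cluster.

```shell
#!/bin/sh
# Dry-run sketch: build the kmeans command as a string and print it.
# All paths below are taken from messages in this thread; adjust as needed.
INPUT="hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-0"
OUTPUT="/usr/netlog/upload/output4/tfidf-vectors-kmeans-clusters"

# Assumption: reuse the -c directory from the original command as the folder
# where the k random initial centroids are written.
CENTROIDS="hdfs://master:54310/user/netlog/upload/mahoutoutput"

DM="org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure"
K=25

# -x 5: at most 5 iterations; -ow: overwrite output; -cl: run clustering step.
CMD="./mahout kmeans -i $INPUT -o $OUTPUT -c $CENTROIDS -dm $DM -x 5 -k $K -ow -cl"

echo "$CMD"
```

Once the printed command looks right, it can be pasted into the shell from the Mahout bin directory.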
On Tue, Mar 10, 2015 at 1:42 AM, Raghuveer
wrote:
No, I have removed the -c option now, so I get the mentioned exception that -c is
mandatory.
On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi
wrote:
Are you still specifying the -c option? It's only needed if you have initial
centroids to launch KMeans from; otherwise KMeans picks random centroids.
Also, CosineDistanceMeasure doesn't make sense with KMeans, which is in
Euclidean space; try using SquaredEuclidean or Euclidean distances.
On Tue, Mar
Hi All,
I am trying to run the command:
./mahout kmeans -i
hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-0 -o
hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
-c hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
org.apache