Are you still specifying the -c option? You only need to supply your own centroids there if you have initial centroids to launch KMeans from; otherwise KMeans picks random centroids.
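Going by the usage text in the quoted message, --clusters looks to be required even when -k is given: with -k, a random set of vectors is written to that path first. A rough sketch of the invocation, assuming that usage text matches your build. The paths are copied from the quoted command; INPUT, OUTPUT, CLUSTERS and CMD are just variable names I made up for readability, and the distance measure is swapped for SquaredEuclideanDistanceMeasure, one of the Euclidean options suggested in this reply.

```shell
# Sketch only: paths are copied from the command quoted in this thread.
# INPUT, OUTPUT, CLUSTERS and CMD are hypothetical variable names.
INPUT=hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
OUTPUT=hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
CLUSTERS=hdfs://master:54310/user/netlog/upload/mahoutoutput

# Keep -c on the command line. Per the quoted usage text, when -k is also
# given, a random set of vectors is written to the -c path first.
CMD="./mahout kmeans -i $INPUT -o $OUTPUT -c $CLUSTERS -dm org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure -x 5 -ow -cl -k 25 -xm mapreduce"

# Print the command for review before running it on the cluster.
echo "$CMD"
```

Once the printed command looks right, it can be run with `eval "$CMD"` from the Mahout bin directory.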
Also, CosineDistanceMeasure doesn't make sense with KMeans, which works in Euclidean space. Try using the SquaredEuclidean or Euclidean distance measures instead.

On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <alwaysra...@yahoo.com.invalid> wrote:

> Hi All,
> I am trying to run the command:
>
> ./mahout kmeans -i
> hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> -o
> hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> -c hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
> -xm mapreduce
>
> Since I don't have any clusters yet to give it as input, the forums
> suggested I can remove it. But now I get the error:
>
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option --clusters
> Missing required option --clusters
>
> Usage:
>  [--input <input> --output <output> --distanceMeasure <distanceMeasure>
>  --clusters <clusters> --numClusters <k> --randomSeed <randomSeed1>
>  [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter <maxIter>
>  --overwrite --clustering --method <method> --outlierThreshold
>  <outlierThreshold> --help --tempDir <tempDir> --startPhase <startPhase>
>  --endPhase <endPhase>]
>  --clusters (-c) clusters    The input centroids, as Vectors. Must be a
>                              SequenceFile of Writable, Cluster/Canopy. If k is
>                              also specified, then a random set of vectors will
>                              be selected and written out to this path first
> 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes: 0.006166666666666667)
>
> Kindly help me out.
> Thanks