Oops! I meant to say that -c is required for random centroid initialization when -k is specified. KMeans picks k random centroids and writes them to the folder given by -c, so yes, -c is required.
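
Something along these lines should work. It is an untested sketch: the paths and flags are copied from your original command, and the only change is the distance measure, which I swapped to SquaredEuclideanDistanceMeasure as suggested earlier. The RUN variable is just a hypothetical guard so the script prints the command unless you ask it to execute.

```shell
#!/bin/sh
# Sketch only: paths come from the original command in this thread.
INPUT="hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000"
OUTPUT="hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer"
# With -k given, the random initial centroids are written to this path first.
CLUSTERS="hdfs://master:54310/user/netlog/upload/mahoutoutput"

# Keep both -c and -k; only the distance measure differs from the original.
DM="org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure"
CMD="./mahout kmeans -i $INPUT -o $OUTPUT -c $CLUSTERS -dm $DM -x 5 -ow -cl -k 25 -xm mapreduce"

# Print the command; set RUN=1 (hypothetical guard) to actually execute it.
echo "$CMD"
if [ "${RUN:-0}" = "1" ]; then
  $CMD
fi
```

The folder passed to -c does not need to contain clusters beforehand when -k is set.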

On Tue, Mar 10, 2015 at 1:42 AM, Raghuveer <alwaysra...@yahoo.com.invalid> wrote:

> No i have removed the -c option now so i get the mentioned exception that
> -c is mandatory.
>
> On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi <suneel.mar...@gmail.com> wrote:
>
> R u still specifying the -c option, its only needed if u have initial
> centroids to launch the KMEans from otherwise KMeans picks random
> centroids.
>
> Also CosineDistanceMeasure doesn't make sense with kMeans which is in
> Euclidean space -try using SquaredEuclidean or Euclidean distances.
>
> On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <alwaysra...@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> > I am trying to run the command:
> > ./mahout kmeans -i
> > hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> > -o
> > hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> > -c hdfs://master:54310/user/netlog/upload/mahoutoutput -dm
> > org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
> > -xm mapreduce
> > Since i dont have any clusters yet to give it as an input i can remove it
> > is what forums suggested. But now i get the error
> >
> > Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> > MAHOUT-JOB:
> > /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> > 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option
> > --clusters
> > Missing required option
> > --clusters
> >
> > Usage:
> >  [--input <input> --output <output> --distanceMeasure <distanceMeasure>
> >  --clusters <clusters> --numClusters <k> --randomSeed <randomSeed1>
> >  [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter <maxIter>
> >  --overwrite --clustering --method <method> --outlierThreshold
> >  <outlierThreshold> --help --tempDir <tempDir> --startPhase <startPhase>
> >  --endPhase <endPhase>]
> >  --clusters (-c) clusters    The input centroids, as Vectors. Must be a
> >                              SequenceFile of Writable, Cluster/Canopy. If k is
> >                              also specified, then a random set of vectors will
> >                              be selected and written out to this path first
> > 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes:
> > 0.006166666666666667)
> > Kindly help me out.
> > Thanks