Oops! I meant to say that -c is required for random centroid
initialization when -k is specified.
KMeans writes the k random centroids to the folder specified by -c, so yes,
-c is required.
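
A minimal sketch of what the invocation could look like with both -c and -k,
assuming the paths from the original command quoted below and swapping in
EuclideanDistanceMeasure as suggested earlier in this thread. The variable
names are illustrative only, and the script just prints the command instead
of launching the job.

```shell
# Paths are the ones quoted in this thread; variable names are illustrative.
INPUT=hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
OUTPUT=hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
# With -k given, the k random centroids are written to this path first.
CLUSTERS=hdfs://master:54310/user/netlog/upload/mahoutoutput
K=25

# Build the argument list once so it can be inspected before launching.
set -- kmeans \
  -i "$INPUT" \
  -o "$OUTPUT" \
  -c "$CLUSTERS" \
  -k "$K" \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -x 5 -ow -cl -xm mapreduce

# Print the command; drop the leading "echo" to actually run it.
echo ./mahout "$@"
```

Keeping -c and -k side by side matches the usage text: -c names where the
initial centroids live, and -k asks for them to be sampled at random.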

On Tue, Mar 10, 2015 at 1:42 AM, Raghuveer <alwaysra...@yahoo.com.invalid>
wrote:

> No, I have removed the -c option now, so I get the mentioned exception that
> -c is mandatory.
>
>
>      On Tuesday, March 10, 2015 11:06 AM, Suneel Marthi <
> suneel.mar...@gmail.com> wrote:
>
>
>  Are you still specifying the -c option? It's only needed if you have initial
> centroids to launch KMeans from; otherwise KMeans picks random
> centroids.
>
> Also, CosineDistanceMeasure doesn't make sense with KMeans, which works in
> Euclidean space; try using the SquaredEuclidean or Euclidean distance measure.
>
> On Tue, Mar 10, 2015 at 1:27 AM, Raghuveer <alwaysra...@yahoo.com.invalid>
> wrote:
>
> > Hi All,
> > I am trying to run the command:
> > ./mahout kmeans -i hdfs://master:54310/user/netlog/upload/output4/tfidf-vectors/part-r-00000
> > -o hdfs://master:54310//user/netlog/upload/output4/tfidf-vectors-kmeans-clusters-raghuveer
> > -c hdfs://master:54310/user/netlog/upload/mahoutoutput
> > -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 5 -ow -cl -k 25
> > -xm mapreduce
> > Since I don't have any clusters yet to give as input, the forums suggested
> > that I can remove the -c option. But now I get the error:
> >
> > Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> > MAHOUT-JOB: /home/raghuveer/trunk/examples/target/mahout-examples-1.0-SNAPSHOT-job.jar
> > 15/03/10 10:52:53 ERROR common.AbstractJob: Missing required option --clusters
> > Missing required option --clusters
> >
> > Usage:
> >  [--input <input> --output <output> --distanceMeasure <distanceMeasure>
> > --clusters <clusters> --numClusters <k> --randomSeed <randomSeed1>
> > [<randomSeed2> ...] --convergenceDelta <convergenceDelta> --maxIter <maxIter>
> > --overwrite --clustering --method <method> --outlierThreshold
> > <outlierThreshold> --help --tempDir <tempDir> --startPhase <startPhase>
> > --endPhase <endPhase>]
> > --clusters (-c) clusters    The input centroids, as Vectors.  Must be a
> >                            SequenceFile of Writable, Cluster/Canopy.  If k is
> >                            also specified, then a random set of vectors will
> >                            be selected and written out to this path first
> > 15/03/10 10:52:53 INFO driver.MahoutDriver: Program took 370 ms (Minutes: 0.006166666666666667)
> > Kindly help me out.
> > Thanks
> >
> >
> >
>
>
>
