Thanks for pointing that out. I corrected the Wiki page.
________________________________
From: Marco <zentrop...@yahoo.co.uk>
To: "user@mahout.apache.org" <user@mahout.apache.org>
Sent: Thursday, August 1, 2013 3:08 PM
Subject: Re: k-means issues

Thanks a lot. I will try your suggestions ASAP. I was more or less following this: http://goo.gl/u8VFZN

----- Original message -----
From: Jeff Eastman <j...@windwardsolutions.com>
To: user@mahout.apache.org
Cc:
Sent: Thursday, August 1, 2013 21:02
Subject: Re: k-means issues

The clustering arguments are usually directories, not files. Try:

mahout clusterdump -d mahout/vectors/dictionary.file-0 -dt sequencefile -i mahout/kmeans-clusters/clusters-1-final -n 20 -b 100 -o cdump.txt -p mahout/kmeans-clusters/clusteredPoints

On 8/1/13 2:51 PM, Marco wrote:
> mahout clusterdump -d mahout/vectors/dictionary.file-0 -dt sequencefile -i mahout/kmeans-clusters/clusters-1-final/part-r-00000 -n 20 -b 100 -o cdump.txt -p mahout/kmeans-clusters/clusteredPoints
>
> ----- Original message -----
> From: Suneel Marthi <suneel_mar...@yahoo.com>
> To: "user@mahout.apache.org" <user@mahout.apache.org>; Marco <zentrop...@yahoo.co.uk>
> Cc:
> Sent: Thursday, August 1, 2013 17:24
> Subject: Re: k-means issues
>
> Could you post the command line you are using for clusterdump?
>
> ________________________________
> From: Marco <zentrop...@yahoo.co.uk>
> To: "user@mahout.apache.org" <user@mahout.apache.org>; Suneel Marthi <suneel_mar...@yahoo.com>
> Sent: Thursday, August 1, 2013 10:29 AM
> Subject: Re: k-means issues
>
> OK, I added -cl and got clusteredPoints, but then I run clusterdump and always get "Wrote 0 clusters".
>
> ----- Original message -----
> From: Suneel Marthi <suneel_mar...@yahoo.com>
> To: "user@mahout.apache.org" <user@mahout.apache.org>; Marco <zentrop...@yahoo.co.uk>
> Cc:
> Sent: Thursday, August 1, 2013 16:04
> Subject: Re: k-means issues
>
> Check examples/bin/cluster_reuters.sh for kmeans (it exists in Mahout 0.7 too :))
>
> You need to specify the clustering option -cl in your kmeans command.
>
> ________________________________
> From: Marco <zentrop...@yahoo.co.uk>
> To: "user@mahout.apache.org" <user@mahout.apache.org>
> Sent: Thursday, August 1, 2013 9:55 AM
> Subject: k-means issues
>
> So I've got 13000 text files representing topics in certain newspaper articles.
> Each file is just a tab-separated list of topics (so something like "china japan senkaku dispute" or "italy lampedusa immigration").
>
> I want to run k-means clustering on them.
>
> Here's what I do (I'm actually doing it on a subset of 100 files):
>
> 1) run seqdirectory to produce a sequence file from the raw text files
> 2) run seq2sparse to produce vectors from the sequence file
>
> (if I run seqdumper on tfidf-vectors/part-r-00000 I get something like
> Key: /filename1: Value: /filename1:{72:0.7071067811865476,0:0.7071067811865476}
> and if I run it on dictionary.file-0 I get
> Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.IntWritable
> Key: china: Value: 0
> Key: japan: Value: 1)
>
> 3) run k-means (mahout kmeans -i mahout/vectors/tfidf-vectors/ -k 10 -o mahout/kmeans-clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 --clusters mahout/tmp)
> The first thing I notice here is that it logs:
> INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}
> The "Input Vectors: {}" part puzzles me.
>
> Even worse, this doesn't seem to create the clusteredPoints directory at all.
>
> What am I doing wrong?
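
For reference, the two fixes from this thread (adding -cl to the kmeans command, and passing the clusters directory rather than a part file to clusterdump) can be put together as a dry-run sketch. The RAW and SEQ paths, the run helper, and the DRY_RUN switch are assumptions made up for illustration and do not come from the thread. The clusters-1-final name is the one used above and will differ if kmeans stops at another iteration.

```shell
#!/bin/sh
# Sketch of the workflow from this thread with both fixes applied.
# By default it only prints the commands; set DRY_RUN=0 to execute them.

RAW=mahout/raw-text          # hypothetical: directory holding the raw text files
SEQ=mahout/seqfiles          # hypothetical: where seqdirectory writes its output
VEC=mahout/vectors           # seq2sparse output (path used in the thread)
OUT=mahout/kmeans-clusters   # kmeans output (path used in the thread)

# Hypothetical helper: print the command, run it only when DRY_RUN=0.
run() {
  echo "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

pipeline() {
  # 1) raw text files -> sequence file
  run mahout seqdirectory -i "$RAW" -o "$SEQ"
  # 2) sequence file -> sparse vectors (tfidf-vectors, dictionary.file-0)
  run mahout seq2sparse -i "$SEQ" -o "$VEC"
  # 3) k-means; -cl is what makes it write the clusteredPoints directory
  run mahout kmeans -i "$VEC/tfidf-vectors/" -k 10 -o "$OUT" \
    -dm org.apache.mahout.common.distance.CosineDistanceMeasure \
    -x 10 --clusters mahout/tmp -cl
  # 4) clusterdump; -i is the clusters directory, not .../part-r-00000
  run mahout clusterdump -d "$VEC/dictionary.file-0" -dt sequencefile \
    -i "$OUT/clusters-1-final" -n 20 -b 100 -o cdump.txt \
    -p "$OUT/clusteredPoints"
}

pipeline
```

Running it as is prints the four commands so the paths can be checked before anything touches the cluster.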