Check examples/bin/cluster_reuters.sh for kmeans (it exists in Mahout 0.7 too :))
You need to specify the clustering option -cl (--clustering) in your kmeans command. Without it, kmeans only computes the cluster centers and does not assign the input points to them, so no clusteredPoints directory is written.

________________________________
From: Marco <zentrop...@yahoo.co.uk>
To: "user@mahout.apache.org" <user@mahout.apache.org>
Sent: Thursday, August 1, 2013 9:55 AM
Subject: k-means issues

So I've got 13000 text files representing topics in certain newspaper articles. Each file is just a tab-separated list of topics (so something like "china japan senkaku dispute" or "italy lampedusa immigration"). I want to run k-means clustering on them.

Here's what I do (I'm actually doing it on a subset of 100 files):

1) Run seqdirectory to produce a sequence file from the raw text files.

2) Run seq2sparse to produce vectors from the sequence file. If I do seqdumper on tfidf-vectors/part-r-00000 I get something like

    Key: /filename1: Value: /filename1:{72:0.7071067811865476,0:0.7071067811865476}

and if I do it on dictionary.file-0 I get

    Key class: class org.apache.hadoop.io.Text Value Class: class org.apache.hadoop.io.IntWritable
    Key: china: Value: 0
    Key: japan: Value: 1

3) Run k-means:

    mahout kmeans -i mahout/vectors/tfidf-vectors/ -k 10 -o mahout/kmeans-clusters -dm org.apache.mahout.common.distance.CosineDistanceMeasure -x 10 --clusters mahout/tmp

The first thing I notice here is that it logs:

    INFO kmeans.KMeansDriver: convergence: 0.5 max Iterations: 10 num Reduce Tasks: org.apache.mahout.math.VectorWritable Input Vectors: {}

The "Input Vectors: {}" part puzzles me. Even worse, this doesn't seem to create the clusteredPoints directory at all.

What am I doing wrong?
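
For reference, here is a minimal sketch of the -cl suggestion from the reply applied to the k-means command in step 3 of the quoted message. The paths, distance measure, -k and -x values are copied from that message. The variable names and the dry-run guard (the command is only executed when a mahout binary is on the PATH) are assumptions added for illustration.

```shell
#!/bin/sh
# Paths taken from the quoted message; adjust to your own layout.
INPUT=mahout/vectors/tfidf-vectors/
SEEDS=mahout/tmp
OUTPUT=mahout/kmeans-clusters
DISTANCE=org.apache.mahout.common.distance.CosineDistanceMeasure

# Same options as the original command, with -cl (--clustering) appended
# so that points are assigned to clusters after the iterations finish.
KMEANS_CMD="mahout kmeans -i $INPUT -c $SEEDS -o $OUTPUT -dm $DISTANCE -x 10 -k 10 -cl"

# Always show the command that would be run.
echo "$KMEANS_CMD"

# Illustrative guard: only execute when mahout is actually installed.
if command -v mahout >/dev/null 2>&1; then
  # Left unquoted on purpose so the string splits into separate arguments
  # (safe here because none of the paths contain spaces).
  $KMEANS_CMD
fi
```

If the run succeeds with -cl, the point-to-cluster assignments should end up in a clusteredPoints directory under the output path (mahout/kmeans-clusters here).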