See https://cwiki.apache.org/MAHOUT/quick-tour-of-text-analysis-using-the-mahout-command-line.html
and https://cwiki.apache.org/MAHOUT/k-means-clustering.html. Study the shell script referenced in the link. Hope that helps.