Hi Valerio,

All the Mahout clustering implementations operate over Hadoop sequence files of the Mahout type VectorWritable. These files represent dense or sparse numeric vectors, which can additionally be wrapped in NamedVector to attach vector names to the data set. If you can run Hadoop jobs or call Java from Weka, then you may be able to use our code directly; look at the driver class under each algorithm for entry points. If all else fails, we also have a command-line interface.
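For the command-line route, a k-means run might look roughly like the sketch below. The HDFS paths and parameter values are placeholders I've made up for illustration; check `bin/mahout kmeans --help` in your Mahout install for the exact options your version supports.

```shell
# Run k-means over a directory of VectorWritable sequence files.
# All paths below are hypothetical examples.
bin/mahout kmeans \
  -i /user/valerio/vectors \
  -c /user/valerio/initial-clusters \
  -o /user/valerio/kmeans-output \
  -dm org.apache.mahout.common.distance.EuclideanDistanceMeasure \
  -k 10 \
  -x 20 \
  -cl
```

Here `-i` is the input vectors directory, `-c` the initial cluster centers (with `-k` given, k random seeds are chosen for you), `-o` the output, `-x` the maximum number of iterations, and `-cl` asks for the clusteredPoints output as well.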

All the clustering jobs accept VectorWritable input files and produce Hadoop directories (clusters-i) containing the clusters produced by each clustering iteration, plus an optional directory (clusteredPoints) containing sequence files of clustered points. These are keyed by clusterId and hold WeightedVectorWritable wrappers around the original input vectors; the weight encodes the pdf of the cluster assignment.
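To see what actually landed in those directories, Mahout ships dump utilities you can point at the output. Again, the paths below are placeholders, and the exact directory name of the final iteration depends on how many iterations ran:

```shell
# Dump the raw clustered points (clusterId -> WeightedVectorWritable).
# Paths are hypothetical examples.
bin/mahout seqdumper -i /user/valerio/kmeans-output/clusteredPoints

# Pretty-print the clusters from the last iteration together with
# their assigned points.
bin/mahout clusterdump \
  -i /user/valerio/kmeans-output/clusters-10 \
  -p /user/valerio/kmeans-output/clusteredPoints \
  -o clusters.txt
```

seqdumper is handy for sanity-checking that your input vectors were written correctly in the first place, too.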

Hope this helps,
Jeff

On 8/27/10 12:06 PM, Valerio wrote:
hi all,

I need a guide that explains how to use Mahout with the k-means algorithm, and first of all, what type of dataset does Mahout use?
I'm doing my thesis and I must run k-means clustering in Weka, but Weka must call Hadoop in the background to parallelize the job. I discovered that Mahout runs k-means on Hadoop, so I will call it from Weka, but I don't understand what type of files Mahout's k-means reads as input and how it works.

Can someone help me?

Thanks all,
Valerio Ceraudo
