The simplest way would be:
1) Create a Maven project.
2) Add mahout-core as a dependency.
3) Use KMeansDriver's run method.
If you set the parameter runSequential=true, you don't even need a Hadoop cluster, but you will not be able to cluster really large datasets.

So try it out with a smaller number of records (vectors) first, then move to a Hadoop cluster (by setting runSequential=false and setting up HADOOP_HOME).

You can find the documentation for K-Means here:
https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering
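If it helps to see what the driver is iterating before wiring Mahout in, here is a minimal in-memory sketch of Lloyd's k-means in plain Java. It does not use Mahout at all; the class and method names (KMeansSketch, cluster, nearest) are hypothetical, Euclidean distance is an assumption, and the convergenceDelta and maxIterations parameters only loosely mirror the options described in the K-Means documentation. It is mainly useful as a sanity check on a small sample, in the spirit of starting small.

```java
import java.util.Arrays;

/** Minimal in-memory k-means (Lloyd's algorithm) sketch. Not Mahout's API. */
public class KMeansSketch {

    /** Squared Euclidean distance (an assumption; Mahout lets you choose a measure). */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to point p. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = squaredDistance(p, centroids[c]);
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    /**
     * Runs k-means from the given initial centroids (which are copied, not modified).
     * Stops when no centroid moves more than convergenceDelta, or after maxIterations.
     */
    static double[][] cluster(double[][] points, double[][] initialCentroids,
                              double convergenceDelta, int maxIterations) {
        int k = initialCentroids.length;
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = initialCentroids[c].clone();
        }
        for (int iter = 0; iter < maxIterations; iter++) {
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            // Assignment step: each point goes to its nearest centroid.
            for (double[] p : points) {
                int best = nearest(p, centroids);
                counts[best]++;
                for (int j = 0; j < dim; j++) {
                    sums[best][j] += p[j];
                }
            }
            // Update step: move each centroid to the mean of its assigned points.
            boolean converged = true;
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: leave its centroid where it is
                }
                double[] updated = new double[dim];
                for (int j = 0; j < dim; j++) {
                    updated[j] = sums[c][j] / counts[c];
                }
                if (Math.sqrt(squaredDistance(updated, centroids[c])) > convergenceDelta) {
                    converged = false;
                }
                centroids[c] = updated;
            }
            if (converged) {
                break;
            }
        }
        return centroids;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {1.5, 2}, {1, 1.5}, {8, 8}, {9, 8.5}, {8.5, 9}};
        double[][] init = {{1, 1}, {8, 8}};
        System.out.println(Arrays.deepToString(cluster(points, init, 0.001, 10)));
    }
}
```

On the six sample points in main, the centroids settle on the second pass at roughly (1.17, 1.5) and (8.5, 8.5). Mahout's KMeansDriver does the same kind of assign/update loop, but as Hadoop jobs (or sequentially with runSequential=true) over vectors stored on disk.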

You can also start with Canopy Clustering and then move on to K-Means, as it's simpler and faster.
https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
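Canopy clustering works with two distance thresholds, T1 > T2: each new canopy takes every remaining point within T1 as a loose member, and points within T2 are removed so they cannot start another canopy. The sketch below is plain Java, not Mahout's implementation; the names (CanopySketch, Canopy, canopies) are hypothetical, and Euclidean distance and picking centers in input order are assumptions. One common use of the canopy centers is as initial centroids for the k-means step.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Minimal canopy-clustering sketch (not Mahout's implementation). */
public class CanopySketch {

    /** A canopy: the point that started it plus every point loosely within T1. */
    static final class Canopy {
        final double[] center;
        final List<double[]> members = new ArrayList<>();

        Canopy(double[] center) {
            this.center = center;
        }
    }

    /** Euclidean distance (an assumption; any cheap distance measure works). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static List<Canopy> canopies(List<double[]> points, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("T1 must be greater than T2");
        }
        List<double[]> remaining = new ArrayList<>(points);
        List<Canopy> result = new ArrayList<>();
        while (!remaining.isEmpty()) {
            // Take the next unclaimed point (input order) as a new canopy center.
            double[] center = remaining.remove(0);
            Canopy canopy = new Canopy(center);
            canopy.members.add(center);
            // Loose membership: remaining points within T1 join this canopy;
            // those farther than T2 stay in the list and may join later canopies.
            for (double[] p : remaining) {
                if (distance(p, center) < t1) {
                    canopy.members.add(p);
                }
            }
            // Tight membership: points within T2 can no longer start a canopy.
            remaining.removeIf(p -> distance(p, center) < t2);
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {1, 1}, new double[] {1.5, 2}, new double[] {1, 1.5},
            new double[] {8, 8}, new double[] {9, 8.5}, new double[] {8.5, 9});
        for (Canopy c : canopies(points, 3.0, 1.5)) {
            System.out.println(Arrays.toString(c.center) + " members=" + c.members.size());
        }
    }
}
```

With T1 = 3.0 and T2 = 1.5 on the sample points, this yields two canopies of three points each, centered at [1.0, 1.0] and [8.0, 8.0]. Because each point costs only a few cheap distance checks, this pass is fast, which is why it is a reasonable warm-up before K-Means.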

I recommend Mahout 0.7-SNAPSHOT (i.e. the current code on trunk).

On 16-04-2012 10:57, OSCAR wrote:
Hello

My name is Oscar González. I'm studying Systems Engineering at the Universidad El
Bosque in Colombia, and I have the following question:

I have a web application with Hibernate, and I need to use clustering with the
k-means algorithm. I want to use Mahout, but I don't know how to apply Mahout in
my project. I'm using NetBeans. Please answer me.

Thanks

Oscar Miguel Gonzalez
