The simplest way would be:
1) Create a Maven project.
2) Add mahout-core as a dependency.
3) Use the run method of KMeansDriver.
If you set the parameter runSequential=true, you don't even need a
Hadoop cluster, but you will not be able to cluster really large datasets.
So try it out with a smaller number of records (vectors) first, then move
to a Hadoop cluster (by setting runSequential=false and setting up
HADOOP_HOME).
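If it helps to see what k-means computes on a small dataset before wiring in Mahout, here is a minimal in-memory sketch in plain Java. It is not Mahout's implementation and does not use KMeansDriver. The class name KMeansSketch, its method names, the choice of the first k points as initial centroids, and the sample points are all assumptions made for illustration.

```java
import java.util.Arrays;

/** Plain in-memory k-means (Lloyd's algorithm). Illustrative only, not Mahout code. */
final class KMeansSketch {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point (lowest index wins ties). */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (squaredDistance(point, centroids[c]) < squaredDistance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    /**
     * Clusters the points into k groups and returns one cluster index per point.
     * Assumption: the first k points are used as the initial centroids.
     */
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] labels = new int[points.length];
        Arrays.fill(labels, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int n = nearest(points[p], centroids);
                if (n != labels[p]) {
                    labels[p] = n;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched cluster
            }

            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[labels[p]]++;
                for (int i = 0; i < dim; i++) {
                    sums[labels[p]][i] += points[p][i];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // empty cluster: keep its previous centroid
                }
                for (int i = 0; i < dim; i++) {
                    centroids[c][i] = sums[c][i] / counts[c];
                }
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Made-up sample data: two visibly separate groups of 2-D points.
        double[][] points = {
            {1.0, 1.0}, {9.0, 9.0}, {1.5, 2.0}, {8.0, 8.5}, {1.0, 0.5}, {9.5, 8.0}
        };
        int[] labels = cluster(points, 2, 20);
        System.out.println(Arrays.toString(labels)); // prints [0, 1, 0, 1, 0, 1]
    }
}
```

In a Hibernate application the rows you load would first have to be turned into numeric vectors like the double[] arrays above. For anything beyond a toy dataset, Mahout's own driver is the better choice.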
You can find the documentation for KMeans here:
https://cwiki.apache.org/confluence/display/MAHOUT/K-Means+Clustering
You can also start with Canopy Clustering and then move on to K-Means,
since Canopy is simpler and fast.
https://cwiki.apache.org/confluence/display/MAHOUT/Canopy+Clustering
I recommend Mahout 0.7-SNAPSHOT (i.e. the current code on trunk).
On 16-04-2012 10:57, OSCAR wrote:
Hello
My name is Oscar González. I'm studying Systems Engineering at the Universidad El
Bosque in Colombia, and I have the following question:
I have a web application that uses Hibernate, and I need to do clustering with the
k-means algorithm. I want to use Mahout, but I don't know how to apply it in my
project. I'm using NetBeans. Please answer me.
Thanks
Oscar Miguel Gonzalez
Sent from my iPad