Re: Process UnStructured Data in Mahout for Clustering

2014-12-05 Thread Ted Dunning
On Thu, Dec 4, 2014 at 5:38 AM, Shahid Shaikh shaikhshah...@gmail.com wrote: "I see the problem is with the way the data is written." What exactly do you mean by this?

Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hi all, I have been trying Mahout clustering on unstructured data, i.e. human-written data. I have tried Mahout clustering algorithms like KMeans, Canopy+KMeans and LDA, but the results produced are not helpful. I see the problem is with the way the data is written. Can someone please provide

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Donni Khan
Shaikh shaikhshah...@gmail.com wrote: Hi all, I have been trying Mahout clustering on unstructured data, i.e. human-written data. I have tried Mahout clustering algorithms like KMeans, Canopy+KMeans and LDA, but the results produced are not helpful. I see the problem is with the way data

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
PM, Shahid Shaikh shaikhshah...@gmail.com wrote: Hi all, I have been trying Mahout clustering on unstructured data, i.e. human-written data. I have tried Mahout clustering algorithms like KMeans, Canopy+KMeans and LDA, but the results produced are not helpful. I see the problem

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Brian Dolan
parameters to the clustering algorithm, like the number of topics or the number of clusters. Cheers, Donni On Thu, Dec 4, 2014 at 2:38 PM, Shahid Shaikh shaikhshah...@gmail.com wrote: Hi all, I have been trying Mahout clustering on unstructured data, i.e. human-written data. I have tried Mahout
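The thread does not say how the free text was turned into vectors before clustering, and that step often matters as much as the choice of algorithm or the number of clusters. As a hedged illustration only, here is a plain-Java sketch of lowercasing, tokenizing and TF-IDF weighting (weight = tf * ln(N / df)). The class and method names are hypothetical; this is not Mahout's own vectorizer (Mahout provides the `seq2sparse` tool for that), and the sample documents are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Mahout: lowercase -> tokenize -> TF-IDF,
// producing one sparse term -> weight map per document.
class TfIdfSketch {

    // Lowercase and split on any run of characters that are not letters or digits.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // weight(term, doc) = (count of term in doc) * ln(N / number of docs containing term)
    static List<Map<String, Double>> tfIdf(List<String> docs) {
        List<Map<String, Integer>> termCounts = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> counts = new HashMap<>();
            for (String tok : tokenize(doc)) {
                counts.merge(tok, 1, Integer::sum);
            }
            // Each distinct term in this document raises its document frequency by one.
            for (String term : counts.keySet()) {
                docFreq.merge(term, 1, Integer::sum);
            }
            termCounts.add(counts);
        }
        int n = docs.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (Map<String, Integer> counts : termCounts) {
            Map<String, Double> vec = new HashMap<>();
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                vec.put(e.getKey(), e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
            "The server crashed again!",
            "Server crashed after the update",
            "Great food at the new cafe");
        System.out.println(tfIdf(docs));
    }
}
```

In this toy input the word "the" occurs in every document, so its IDF is ln(1) = 0 and it drops out of every vector; down-weighting such ubiquitous words is one common way to keep them from dominating the distances between documents.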

Apache Mahout - KMeans Clustering

2014-05-21 Thread Aleksander Sadecki
Hi, I am following the book Mahout in Action. I downloaded the sources and I am trying to run this piece of code:
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

Re: Apache Mahout - KMeans Clustering

2014-05-21 Thread tuxdna
You are using version 0.9 of Mahout and version 1.0 of mahout-collections. The API may have changed considerably. I suggest you check out the code from here: https://github.com/tdunning/MiA/tree/mahout-0.7 This code works with mahout-0.7. Regards, Saleem On Wed, May 21, 2014 at 4:49 PM,

Apache Mahout - KMeans Clustering

2014-05-21 Thread Aleksander Sadecki
Hi, thank you for your answer. I changed my pom.xml:
<mahout.version>0.7</mahout.version>
<mahout.groupid>org.apache.mahout</mahout.groupid>
<dependency>
  <groupId>${mahout.groupid}</groupId>
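Because this thread is mostly about which Mahout version exposes which k-means API, it may help to see the algorithm itself without any library. The sketch below is plain Lloyd's k-means in Java: assign each point to its nearest center, move each center to the mean of its points, and stop when no assignment changes. It is a hypothetical illustration, not the Mahout in Action listing and not Mahout's KMeans classes; the points and starting centers are made up.

```java
import java.util.Arrays;

// Hypothetical illustration, not a Mahout API: plain Lloyd's k-means
// on small in-memory points with caller-supplied starting centers.
class KMeansSketch {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Returns the cluster index of each point after convergence or maxIter rounds.
    static int[] cluster(double[][] points, double[][] initialCenters, int maxIter) {
        int k = initialCenters.length;
        int dim = points[0].length;
        double[][] centers = new double[k][];
        for (int c = 0; c < k; c++) {
            centers[c] = initialCenters[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: each point goes to its nearest center.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[p], centers[c]) < squaredDistance(points[p], centers[best])) {
                        best = c;
                    }
                }
                if (assignment[p] != best) {
                    assignment[p] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: move each center to the mean of its assigned points.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) {
                    continue; // an empty cluster keeps its previous center
                }
                for (int d = 0; d < dim; d++) {
                    centers[c][d] = sums[c][d] / counts[c];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] points = {{1, 1}, {2, 1}, {1, 2}, {8, 8}, {9, 8}, {8, 9}};
        double[][] start = {{1, 1}, {9, 9}};
        System.out.println(Arrays.toString(cluster(points, start, 10))); // [0, 0, 0, 1, 1, 1]
    }
}
```

Results depend on the starting centers, which is why Mahout examples often seed k-means from Canopy output or random points rather than from hand-picked values as done here.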

Mahout for clustering

2013-12-02 Thread Sameer Tilak
Hi all, we are using Apache Pig to build our data pipeline. We have data in the following form: userid, age, items {code 1, code 2, …}, plus a few other features. Each item has a unique alphanumeric code. I would like to use Mahout to cluster it. Based on my current reading I see

RE: Mahout for clustering

2013-12-02 Thread Sameer Tilak
I am looking for some input on how to vectorize my data. From: ssti...@live.com To: user@mahout.apache.org Subject: Mahout for clustering Date: Mon, 2 Dec 2013 16:22:03 -0800 Hi all, we are using Apache Pig to build our data pipeline. We have data in the following form

Re: Mahout for clustering

2013-12-02 Thread Andrew Musselman
@mahout.apache.org Subject: Mahout for clustering Date: Mon, 2 Dec 2013 16:22:03 -0800 Hi all, we are using Apache Pig to build our data pipeline. We have data in the following form: userid, age, items {code 1, code 2, …}, plus a few other features. Each item has a unique

Re: Mahout for clustering

2013-12-02 Thread Ted Dunning
in the following fashion: userid, age, items {code 1, code 2, …}, plus a few other features. Each item has a unique alphanumeric code. I would like to use Mahout to cluster it. Based on my current reading I see the following options: 1. Map each alphanumeric item code to a numeric code: A1 -> 0
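The first option in this thread, mapping each alphanumeric item code to a numeric index, can be sketched in plain Java as a growing dictionary plus a sparse per-user vector of item counts. The class name, the first-seen index order and the count weighting are assumptions made for illustration; in a Mahout job the resulting index/value pairs would typically be loaded into one of Mahout's sparse vector classes, which is not shown here.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not a Mahout class): item code -> column index,
// plus a sparse per-user vector of item counts.
class ItemCodeVectorizer {
    // Column indices are assigned in first-seen order: A1 -> 0, the next new code -> 1, ...
    private final Map<String, Integer> dictionary = new HashMap<>();

    int indexOf(String itemCode) {
        Integer idx = dictionary.get(itemCode);
        if (idx == null) {
            idx = dictionary.size();
            dictionary.put(itemCode, idx);
        }
        return idx;
    }

    // Number of distinct item codes seen so far (the vector dimension).
    int dimension() {
        return dictionary.size();
    }

    // One user's item list -> sparse map of column index -> count, sorted by index.
    Map<Integer, Double> vectorize(List<String> itemCodes) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String code : itemCodes) {
            vector.merge(indexOf(code), 1.0, Double::sum);
        }
        return vector;
    }

    public static void main(String[] args) {
        ItemCodeVectorizer v = new ItemCodeVectorizer();
        System.out.println(v.vectorize(List.of("A1", "B7", "A1"))); // {0=2.0, 1=1.0}
        System.out.println(v.vectorize(List.of("B7", "C3")));       // {1=1.0, 2=1.0}
        System.out.println(v.dimension());                          // 3
    }
}
```

Since the dictionary grows as new codes appear, a real pipeline would more likely build it in a first pass over all users (for example in the Pig step) and emit vectors in a second pass, so that every vector shares the same dimension. Other features such as age would need their own columns and some scaling so they do not swamp the item counts.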

Re: Does something like an explain feature exist in Mahout for clustering.

2013-02-05 Thread Chris Harrington
I'm currently using KMeans with Canopy, and cosine as the distance measure. The data I'm using has been somewhat curated into categories, so I expected the documents to cluster alongside the other documents in their respective categories. Some of them fall nicely into the clusters I'd expect, but others are like the
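With cosine as the measure, only the direction of each vector matters, not its length, so a short note and a long article with a similar mix of term weights can end up in the same cluster. The sketch below computes cosine distance as 1 minus cosine similarity, which is the usual definition; it is a plain-Java illustration of the formula, not Mahout's distance-measure class, and the example vectors are made up.

```java
// Hypothetical sketch (plain Java, not a Mahout class):
// cosine distance = 1 - (a . b) / (|a| * |b|).
class CosineDistanceSketch {

    static double cosineDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        // An all-zero vector (e.g. a document left empty after stop-word removal)
        // gives 0 / 0 = NaN here, so such rows are best filtered out beforehand.
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosineDistance(new double[]{1, 0}, new double[]{2, 0})); // 0.0 (same direction)
        System.out.println(cosineDistance(new double[]{1, 0}, new double[]{0, 1})); // 1.0 (orthogonal)
        System.out.println(cosineDistance(new double[]{1, 2}, new double[]{2, 4})); // about 0, up to rounding
    }
}
```

For instance, [1, 2] and [2, 4] come out at (approximately) zero distance even though one is twice as long as the other, which is one reason length-normalized text vectors from very different sources can still be grouped together.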

Does something like an explain feature exist in Mahout for clustering.

2013-02-04 Thread Chris Harrington
I was wondering whether there is an explain feature in Mahout: something that gives the reason why it did what it did, shows the values of the various features it used to evaluate and choose the result, etc. I ask because I have some wildly different text data being clustered together, for example it

Re: Does something like an explain feature exist in Mahout for clustering.

2013-02-04 Thread Steven Bourke
Sent from phone On 4 Feb 2013, at 18:57, Chris Harrington ch...@heystaks.com wrote: I was wondering whether there is an explain feature in Mahout: something that gives the reason why it did what it did, shows the values of the various features it used to evaluate and choose the result, etc.

Re: Does something like an explain feature exist in Mahout for clustering.

2013-02-04 Thread Jeff Eastman
That's a really good question. Mahout does not have an explain feature; however, you can use the ClusterDumper to print out the cluster centers and the vectors clustered within each cluster. The output is quite verbose and, since large text vectors are truncated, might not be that useful. You might
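Short of a built-in explain feature, one rough way to see why a document landed in a cluster is to list the heaviest terms of the cluster's center and the terms that contribute most to the document's dot product with that center. For a fixed document/center pair the norms are constant, so the same ranking holds for cosine similarity. The sketch below does this in plain Java over arrays; the helper names, the dictionary array and the numbers are hypothetical, and it does not parse ClusterDumper output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical "explain" helpers (not part of Mahout or ClusterDumper).
class ClusterExplainSketch {

    // Dictionary terms with the largest scores, highest first (at most n of them).
    static List<String> topTerms(double[] scores, String[] dictionary, int n) {
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort term indices by descending score.
        Arrays.sort(order, (x, y) -> Double.compare(scores[y], scores[x]));
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(n, order.length); i++) {
            result.add(dictionary[order[i]]);
        }
        return result;
    }

    // Per-term contribution of a document to its dot product with a cluster center:
    // the largest products are the terms that pulled the document toward that center.
    static double[] contributions(double[] doc, double[] center) {
        double[] c = new double[doc.length];
        for (int i = 0; i < doc.length; i++) {
            c[i] = doc[i] * center[i];
        }
        return c;
    }

    public static void main(String[] args) {
        String[] dict = {"apple", "server", "crash"};
        double[] center = {0.1, 0.9, 0.5};
        double[] doc = {1.0, 0.0, 1.0};
        System.out.println(topTerms(center, dict, 2));                      // [server, crash]
        System.out.println(topTerms(contributions(doc, center), dict, 1)); // [crash]
    }
}
```

If a surprising document's top contributions are generic or noisy terms, that usually points back at the vectorization (stop words, weighting, tokenization) rather than at the clustering algorithm itself.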