Apache Mahout 0.9 LDA CVB Example

2014-09-23 Thread Shahid Shaikh
Hi, I am currently working on a project that needs categorization of documents (unstructured data) based on the internal context of each document. I am using the Apache Mahout clustering solution for this. So far we have explored k-means and Canopy with k-means. We have also used Lucene analyzer

Interpret CVB output

2014-09-26 Thread Shahid Shaikh
I have successfully run the CVB job with input k (number of topics) as 20 -x and got the following files as output of the job: 1. CVB output for parameter "-o", which is a vector file with 20 records. 2. "doc-topic-distributions" for parameter "-dt", which is a vector file with 806 records

Issue in interpreting Mahout CVB Output

2014-10-14 Thread Shahid Shaikh
I am also stuck at the same stage and could not figure out how to relate documents to topic distributions. I saw that the CVB output is a sequence file of vectors as "org.apache.hadoop.io.IntWritable%org.apache.mahout.math.VectorWritable" and the topic distribution file is also the same "org.apache
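
One way to relate documents to topics, sketched here under assumptions not confirmed in the thread: if each record of the doc-topic distribution file is keyed by an integer document id and holds one weight per topic, then the dominant topic of a document is simply the index of the largest weight in its vector. The sketch below assumes the vectors have already been dumped from the sequence file into plain Python lists; the function name `dominant_topics` and the sample values are hypothetical, and no Mahout or Hadoop API is used.

```python
from typing import Dict, List, Tuple


def dominant_topics(
    doc_topic: Dict[int, List[float]], top_n: int = 1
) -> Dict[int, List[Tuple[int, float]]]:
    """Return the top_n (topic index, weight) pairs for every document.

    doc_topic maps a document id to its per-topic weights
    (assumed layout: one entry per topic, as in a doc-topic vector).
    """
    result = {}
    for doc_id, weights in doc_topic.items():
        # Pair each weight with its topic index, then sort by weight, largest first.
        ranked = sorted(enumerate(weights), key=lambda pair: pair[1], reverse=True)
        result[doc_id] = ranked[:top_n]
    return result


# Hypothetical values for illustration only (3 topics, 2 documents).
sample = {
    0: [0.1, 0.7, 0.2],
    1: [0.6, 0.3, 0.1],
}
assignments = dominant_topics(sample)
for doc_id, topics in assignments.items():
    print(doc_id, topics)
```

The integer keys are only document ids; mapping them back to file names or titles would need whatever id-to-document index was produced when the input vectors were built.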

Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
Hi All, I have been trying Mahout clustering on unstructured data, i.e. human-written data. I have tried Mahout clustering algorithms like k-means, Canopy + k-means and LDA, but the results produced are not helpful. I see the problem is with the way the data is written. Can someone please provide me

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Shahid Shaikh
nds on the nature of data you are clustering. If you have knowledge > about your data, you can figure out the results and you can also set the > correct parameters to the clustering algorithm like number of topics or > number of clusters. > > Cheers, > Donni > > On Thu, Dec 4,

Re: General Instructions about Mahout

2014-12-09 Thread Shahid Shaikh
Yes, downloading the source code and building it in Eclipse will be a good choice. For that you need to have Maven installed and configured on your machine. Once you are done, you can simply configure the Mahout distribution in your class path to use the classes of Mahout and create your own project t