Top terms in the results of Kmeans clustering

2016-05-09 Thread Donni Khan
Hello everyone, I want to know how the top terms in the results of K-means clustering are computed. Is there any semantic relationship between them, or are they just the most frequent terms? I would be glad if anyone could give me some tips or point me to tutorials about that. Thank you, Donni
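
One common way such top terms are derived, shown here as a minimal sketch and not as Mahout's actual code: assume each cluster centroid is the mean of its members' TF-IDF vectors, and the top terms are the dictionary entries with the largest centroid weights. Under that assumption the ranking reflects weight only, not any semantic link between the terms. The names `mean_vector`, `top_terms`, and the toy dictionary are hypothetical.

```python
from typing import Dict, List, Tuple

SparseVector = Dict[int, float]  # term index -> TF-IDF weight


def mean_vector(vectors: List[SparseVector]) -> SparseVector:
    """Average a list of sparse vectors (hypothetical helper)."""
    totals: Dict[int, float] = {}
    for vec in vectors:
        for index, weight in vec.items():
            totals[index] = totals.get(index, 0.0) + weight
    return {index: total / len(vectors) for index, total in totals.items()}


def top_terms(centroid: SparseVector,
              dictionary: Dict[int, str],
              n: int = 3) -> List[Tuple[str, float]]:
    """Return the n terms with the largest weights in the centroid."""
    ranked = sorted(centroid.items(), key=lambda item: item[1], reverse=True)
    return [(dictionary[index], weight) for index, weight in ranked[:n]]


# Toy data: two documents assigned to the same cluster.
dictionary = {0: "mahout", 1: "hadoop", 2: "cluster"}
members = [{0: 0.5, 1: 0.2}, {0: 0.3, 2: 0.4}]
centroid = mean_vector(members)
best = top_terms(centroid, dictionary, n=2)
print([term for term, _ in best])
```

Because the weights are TF-IDF values, a term can rank high by being frequent within the cluster while rare elsewhere; raw frequency alone is not what is being sorted.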

Speed up LDA in Mahout 0.9

2015-05-05 Thread Donni Khan
Hello Mahout Users, I'm running an LDA job (Mahout 0.9) from Java code, but running the algorithm even on a small dataset takes a long time. Is there any way to speed up the processing time (for example, by changing the parameter values)? Thanks in advance, Donni

Re: Text clustering with SVD

2015-03-31 Thread Donni Khan
for k100 as it would become quite slow in the power iterations step. To your other questions, e.g. the U*sigma result output, see the overview and usage link given here: http://mahout.apache.org/users/dim-reduction/ssvd.html On Mon, Mar 30, 2015 at 2:19 AM, Donni Khan prince.don...@googlemail.com

Text clustering with SVD

2015-03-30 Thread Donni Khan
Hello Mahout users, I'm working on text clustering and would like to reduce the number of features to improve the clustering process. I would like to apply Singular Value Decomposition before the clustering step. I would be thankful to hear from anyone who has used this before. Is it a good idea for clustering? Is

Re: Text clustering with SVD

2015-03-30 Thread Donni Khan
clustering step 3. Run KMeans (or any other clustering algo) with the U*Sigma from (2) as input. On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan prince.don...@googlemail.com wrote: Hello Mahout users, I'm working on text clustering, I would like to reduce the features to enhance

Re: Process UnStructured Data in Mahout for Clustering

2014-12-04 Thread Donni Khan
Hi, it depends on the nature of the data you are clustering. If you have knowledge about your data, you can make sense of the results and also set appropriate parameters for the clustering algorithm, such as the number of topics or the number of clusters. Cheers, Donni On Thu, Dec 4, 2014 at 2:38 PM, Shahid

Re: Clusters developed using Mahout 0.8 version, can I use clusterdump from Mahout 0.9 version?

2014-11-25 Thread Donni Khan
Hi, you can use WeightedVectorWritable (version 0.8) instead of WeightedPropertyVectorWritable (version 0.9). Donni On Wed, Nov 26, 2014 at 12:27 AM, Viral Parikh viral.par...@match.com wrote: To Whomsoever It May Concern - I have kmeans clusters from Mahout 0.8 version. Since then we have

How to choose the initial clusters for K-means from TF-IDF vectors

2014-11-17 Thread Donni Khan
Hi All, I'm working on text clustering. I want to select specific documents (as vectors) to be the centroids for k-means. I have created the TF-IDF vectors for my dataset using Mahout, and I would like to choose the initial clusters from those TF-IDF vectors. Does anyone have an idea how I can do this with Mahout?
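
The idea of seeding k-means with hand-picked documents can be sketched as follows. This is a plain-Python illustration of the technique, not Mahout code and not an answer about Mahout's own options: the function name `kmeans_with_seeds`, the dense toy vectors, and the use of squared Euclidean distance are all assumptions made for the example.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two dense vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_with_seeds(vectors: List[List[float]],
                      seed_indices: List[int],
                      iterations: int = 10) -> Tuple[List[List[float]], List[int]]:
    """Run k-means where the initial centroids are chosen documents."""
    # Copy the selected document vectors to use as starting centroids.
    centroids = [list(vectors[i]) for i in seed_indices]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        assignments = [
            min(range(len(centroids)),
                key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignments


# Toy TF-IDF-like vectors; documents 0 and 2 are picked as seeds.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
centroids, assignments = kmeans_with_seeds(docs, seed_indices=[0, 2])
print(assignments)
```

The only difference from randomly seeded k-means is the first line of the function body: the starting centroids are copies of the selected document vectors.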

Remove instance from SequenceFile

2014-11-11 Thread Donni Khan
Hi All, I'm working on text mining using Mahout algorithms. I'm calculating the similarity between text documents. First I computed the TF-IDF for all documents (SequenceFile format). While computing the similarity, I found that a lot of documents do not have any similar documents. So I would like to

Re: Mahout documentation

2014-11-03 Thread Donni Khan
You can also use the parameter -h to look at the description of all params. On Mon, Nov 3, 2014 at 8:29 AM, sleefd sle...@gmail.com wrote: Just go and read the source code (only the start of the job driver class). It's easy to learn what the params mean. Or you should find carefully

Map RowSimilarity results to the original documents

2014-09-17 Thread Donni Khan
Hi all, I ran RowSimilarity between text documents. My documents are ordered as follows: *DocID DocText* 0 x 1 2 .. .. The DocID is sorted from 0 and so

Understanding Canopy with Cosine Similarity

2014-09-10 Thread Donni Khan
Hi all, I'm using Canopy clustering with the cosine similarity measure as input to k-means clustering. I'm wondering how the similarity between documents is calculated with respect to the t1 and t2 parameters. Let's say t1=0.8 and t2=0.5. For the cosine similarity if s(d1,d2)0.8 that means they are
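
To make the role of t1 and t2 concrete, here is a minimal sketch of canopy clustering, under the assumption that the thresholds are applied to a cosine distance d = 1 - s rather than to the similarity s itself (which is how Mahout's cosine distance measure is generally understood to work), and that t1 > t2. The function names and toy vectors are hypothetical; this is not Mahout's implementation.

```python
import math
from typing import List, Sequence, Tuple


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance, defined here as 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def canopy(points: List[List[float]], t1: float, t2: float) -> List[Tuple[int, List[int]]]:
    """Return (center index, member indices) pairs; requires t1 > t2."""
    remaining = list(range(len(points)))
    canopies = []
    while remaining:
        center = remaining.pop(0)  # next unprocessed point starts a canopy
        members = [center]
        still_candidates = []
        for i in remaining:
            d = cosine_distance(points[center], points[i])
            if d < t1:
                members.append(i)  # loosely close: joins this canopy
            if d >= t2:
                still_candidates.append(i)  # not tightly close: may start or join others
        remaining = still_candidates
        canopies.append((center, members))
    return canopies


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
result = canopy(docs, t1=0.8, t2=0.5)
print(result)
```

With t1=0.8 and t2=0.5 read as distances, a document joins a canopy when its similarity to the center exceeds 0.2, and it is removed from further consideration when that similarity exceeds 0.5.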

Clustering with Dynamic Data

2014-09-03 Thread Donni Khan
Hi guys, I'm working on a text mining project. I would like to apply text clustering to identify trends in the documents. The set of documents entering the clustering process is dynamic, depending on the query results from a big repository. I applied some Mahout algorithms like LDA, KMeans with

Mahout Configuration

2014-07-11 Thread Donni Khan
Hello, I have started to develop a text mining tool which will apply weight vectors, clustering, classification, and topic detection methods. I decided to use Mahout for that. Here are some questions whose answers I would like to know. - What is better (with respect to the time of