Hello everyone,
I would like to know how the top terms in the results of KMeans clustering are
computed. Is there any semantic relationship between them, or are they just the most frequent terms?
I would be glad if anyone could give me some tips or point me to a tutorial about that.
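A minimal sketch, assuming the top terms are simply the dictionary entries with the largest weights in the cluster centroid (the mean of the members' TF-IDF vectors). That is my understanding of how clusterdump picks them when it is given a dictionary, but check the ClusterDumper source to confirm. The class and method names below are hypothetical, and no semantic relation between terms is involved.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper (not Mahout API): ranks a cluster centroid's terms by weight.
public class TopTerms {

    // Centroid = element-wise mean of the member documents' TF-IDF vectors.
    public static double[] centroid(List<double[]> members) {
        double[] c = new double[members.get(0).length];
        for (double[] v : members) {
            for (int i = 0; i < c.length; i++) {
                c[i] += v[i];
            }
        }
        for (int i = 0; i < c.length; i++) {
            c[i] /= members.size();
        }
        return c;
    }

    // The n dictionary terms with the largest centroid weights, highest first.
    // Only weights are compared: no semantic relation between terms is used.
    public static List<String> topTerms(double[] weights, String[] dictionary, int n) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < weights.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> weights[i]).reversed());
        List<String> top = new ArrayList<>();
        for (int k = 0; k < Math.min(n, order.size()); k++) {
            top.add(dictionary[order.get(k)]);
        }
        return top;
    }

    public static void main(String[] args) {
        String[] dictionary = {"apache", "cluster", "mahout", "vector"};
        List<double[]> members = List.of(
                new double[] {0.1, 0.9, 0.4, 0.0},
                new double[] {0.3, 0.7, 0.6, 0.2});
        System.out.println(topTerms(centroid(members), dictionary, 2)); // [cluster, mahout]
    }
}
```

Because TF-IDF weights both frequency and rarity, a term ranks high when it is frequent inside the cluster and not common across the whole corpus.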
Thank you,
Donni
Hello Mahout Users,
I'm running an LDA job (Mahout 0.9) from Java code, but running the
algorithm on a small dataset takes a long time.
Is there any way to speed up the processing time (for example, by changing
the parameter values)?
Thanks in advance,
Donni
for k > 100, as it would become quite slow in the power iterations
step.
As for your other questions, e.g. the U*Sigma result output, see the overview and
usage page here:
http://mahout.apache.org/users/dim-reduction/ssvd.html
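For intuition about why the power iterations step is costly, here is the plain power-iteration idea on a small dense matrix: each iteration multiplies by A and then by A^T, i.e. two full passes over the data. Mahout's SSVD applies power iterations inside a randomized projection, which this sketch does not reproduce; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration, not Mahout's SSVD: plain power iteration for the
// leading singular value/vector of a small dense matrix A (rows = documents).
public class PowerIteration {

    // y = A x
    static double[] times(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < x.length; c++) {
                y[r] += a[r][c] * x[c];
            }
        }
        return y;
    }

    // x = A^T y
    static double[] transposeTimes(double[][] a, double[] y) {
        double[] x = new double[a[0].length];
        for (int r = 0; r < a.length; r++) {
            for (int c = 0; c < x.length; c++) {
                x[c] += a[r][c] * y[r];
            }
        }
        return x;
    }

    static double norm(double[] v) {
        double s = 0;
        for (double e : v) {
            s += e * e;
        }
        return Math.sqrt(s);
    }

    // Repeatedly applies A^T A to a start vector and normalizes; converges to the
    // leading right singular vector v unless the start vector is orthogonal to it.
    // Each iteration is two full passes over A.
    public static double[] leadingRightSingularVector(double[][] a, int iterations) {
        double[] v = new double[a[0].length];
        Arrays.fill(v, 1.0);
        for (int it = 0; it < iterations; it++) {
            double[] next = transposeTimes(a, times(a, v));
            double n = norm(next);
            for (int i = 0; i < next.length; i++) {
                next[i] /= n;
            }
            v = next;
        }
        return v;
    }

    // sigma = ||A v||; the vector A v itself is the first column of U*Sigma.
    public static double singularValue(double[][] a, double[] v) {
        return norm(times(a, v));
    }

    public static void main(String[] args) {
        double[][] a = {{3, 0}, {0, 1}};
        double[] v = leadingRightSingularVector(a, 50);
        double sigma = singularValue(a, v);
        System.out.println(Math.round(sigma * 1e6) / 1e6); // 3.0
    }
}
```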
On Mon, Mar 30, 2015 at 2:19 AM, Donni Khan prince.don...@googlemail.com
Hello Mahout users,
I'm working on text clustering, and I would like to reduce the number of features to
improve the clustering process.
I would like to apply Singular Value Decomposition before the clustering
process. I would be thankful to hear from anyone who has used this before: is it a good
idea for clustering?
Is …
… clustering step.
3. Run KMeans (or any other clustering algorithm) with the U*Sigma from (2) as
input.
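Step 3 works because, for A = U Sigma V^T with orthonormal V, multiplying the document-term matrix by the top-k right singular vectors gives A V_k = U_k Sigma_k: each row is one document's coordinates in k dimensions. A minimal sketch, assuming V_k is already available (for example from an SVD run); the helper name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper (not Mahout API): projects documents onto the top-k right
// singular vectors. Because A = U Sigma V^T with orthonormal V, A V_k = U_k Sigma_k.
public class ReducedDocuments {

    // a: n x d document-term matrix; v: d x k matrix whose columns are the top-k
    // right singular vectors. Row i of the result is document i in k dimensions.
    public static double[][] projectRows(double[][] a, double[][] v) {
        int n = a.length, d = v.length, k = v[0].length;
        double[][] out = new double[n][k];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < k; j++) {
                for (int t = 0; t < d; t++) {
                    out[i][j] += a[i][t] * v[t][j];
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A^T A = [[13, 0], [0, 1]], so the top right singular vector is (1, 0).
        double[][] a = {{3, 0}, {0, 1}, {2, 0}};
        double[][] v = {{1}, {0}};
        System.out.println(Arrays.deepToString(projectRows(a, v))); // [[3.0], [0.0], [2.0]]
    }
}
```

The k-dimensional rows are what would then be clustered; k is much smaller than the vocabulary size, which is the point of the reduction.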
On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan prince.don...@googlemail.com
wrote:
Hi,
It depends on the nature of the data you are clustering. If you have some knowledge
about your data, you can interpret the results, and you can also set
suitable parameters for the clustering algorithm, such as the number of topics or the
number of clusters.
Cheers,
Donni
On Thu, Dec 4, 2014 at 2:38 PM, Shahid
Hi,
You can use WeightedVectorWritable (version 0.8) instead of
WeightedPropertyVectorWritable (version 0.9).
Donni
On Wed, Nov 26, 2014 at 12:27 AM, Viral Parikh viral.par...@match.com
wrote:
To Whomsoever It May Concern -
I have k-means clusters from Mahout version 0.8. Since then we have
Hi All,
I'm working on text clustering. I want to select specific documents (as
vectors) to be the initial centroids for k-means.
I have created the TF-IDF vectors for my dataset using Mahout, and I would like
to choose the initial clusters from these TF-IDF vectors.
Does anyone have an idea how I can do this with Mahout?
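The general idea is to start k-means from the chosen documents' TF-IDF vectors instead of random picks. Below is a self-contained sketch of plain Lloyd's k-means with user-chosen seeds (Euclidean distance, dense arrays). It does not show how to hand initial clusters to Mahout's own k-means driver, and the names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch (not Mahout's k-means driver): Lloyd's k-means whose initial
// centroids are documents chosen by the user instead of random picks.
public class SeededKMeans {

    static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            s += diff * diff;
        }
        return s;
    }

    // docs: TF-IDF vectors; seeds: indices of the documents used as initial centroids.
    // Returns the cluster index assigned to each document.
    public static int[] cluster(double[][] docs, int[] seeds, int maxIterations) {
        int k = seeds.length;
        int dim = docs[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = docs[seeds[c]].clone(); // copy so docs are not modified
        }
        int[] assignment = new int[docs.length];
        Arrays.fill(assignment, -1);
        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: nearest centroid by Euclidean distance.
            boolean changed = false;
            for (int i = 0; i < docs.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(docs[i], centroids[c]) < squaredDistance(docs[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no document changed cluster
            }
            // Update step: each centroid moves to the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < docs.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dim; d++) {
                    sums[assignment[i]][d] += docs[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // an empty cluster keeps its previous centroid
                    for (int d = 0; d < dim; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 0}, {0.9, 0.1}, {0, 1}, {0.1, 0.9}};
        int[] seeds = {0, 2}; // documents 0 and 2 chosen as initial centroids
        System.out.println(Arrays.toString(cluster(docs, seeds, 10))); // [0, 0, 1, 1]
    }
}
```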
Hi All,
I'm working on text mining using Mahout algorithms. I'm calculating
the similarity between text documents. First, I computed the TF-IDF vectors for all
documents (SequenceFile format). When computing the similarity, I found that
a lot of documents do not have any similar documents. So I would like to
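A brute-force sketch of what the similarity step computes: cosine similarity between every pair of TF-IDF vectors, then the documents whose best match stays below a threshold. The class name and the 0.5 threshold are hypothetical; at real scale this O(n²) loop is what a distributed job such as RowSimilarity replaces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: finds documents that have no sufficiently similar neighbour.
public class IsolatedDocuments {

    // Cosine similarity of two TF-IDF vectors (0 if either vector is empty).
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of documents with no other document at similarity >= threshold.
    public static List<Integer> withoutSimilar(double[][] docs, double threshold) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            boolean found = false;
            for (int j = 0; j < docs.length && !found; j++) {
                found = i != j && cosine(docs[i], docs[j]) >= threshold;
            }
            if (!found) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 1, 0}, {1, 0.9, 0}, {0, 0, 1}};
        System.out.println(withoutSimilar(docs, 0.5)); // [2]
    }
}
```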
You can also use the -h parameter to see a description of all
parameters.
On Mon, Nov 3, 2014 at 8:29 AM, sleefd sle...@gmail.com wrote:
Just go and read the source code (only the beginning of the job
driver class). It's easy to learn what the parameters mean. Or you should look
carefully
Hi all,
I ran RowSimilarity between text documents. My documents are sorted as
follows:
DocID   DocText
0       x
1
2
..      ..
The DocIDs are sorted starting from 0, and so
Hi all,
I'm using Canopy clustering with the cosine similarity measure as input to
k-means clustering. I'm wondering how the similarity between documents is
evaluated with respect to the t1 and t2 parameters.
Say t1=0.8 and t2=0.5. For the cosine similarity, if s(d1,d2) > 0.8,
that means they are
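My understanding is that Mahout's CosineDistanceMeasure returns a distance, 1 - cos(d1, d2), so t1 and t2 are compared against distances rather than similarities, with t1 > t2. With t1 = 0.8 and t2 = 0.5, a document would join a canopy when its cosine similarity to the canopy center is above 0.2, and could no longer start a new canopy when the similarity is above 0.5. The sketch below is the textbook canopy algorithm (McCallum et al.), not Mahout's CanopyClusterer, whose details may differ; the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical sketch of textbook canopy clustering (McCallum et al.), not
// Mahout's CanopyClusterer. Thresholds are distances with t1 > t2.
public class CosineCanopy {

    // Cosine distance = 1 - cosine similarity: 0 for same direction, 1 for orthogonal.
    public static double cosineDistance(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        if (na == 0 || nb == 0) {
            return 1.0; // treat an empty vector as unrelated to everything
        }
        return 1.0 - dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Returns canopies as lists of document indices; a document may be in several.
    public static List<List<Integer>> canopies(double[][] docs, double t1, double t2) {
        if (t1 <= t2) {
            throw new IllegalArgumentException("t1 must be greater than t2");
        }
        List<Integer> candidates = new LinkedList<>();
        for (int i = 0; i < docs.length; i++) {
            candidates.add(i);
        }
        List<List<Integer>> result = new ArrayList<>();
        while (!candidates.isEmpty()) {
            int center = candidates.remove(0);
            List<Integer> canopy = new ArrayList<>();
            canopy.add(center);
            // Loose threshold: every other document closer than t1 joins this canopy.
            for (int i = 0; i < docs.length; i++) {
                if (i != center && cosineDistance(docs[center], docs[i]) < t1) {
                    canopy.add(i);
                }
            }
            // Tight threshold: documents closer than t2 may not start a new canopy.
            candidates.removeIf(j -> cosineDistance(docs[center], docs[j]) < t2);
            result.add(canopy);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] docs = {{1, 0}, {0.9, 0.1}, {0, 1}};
        // t1 = 0.8 and t2 = 0.5 as distances (similarity above 0.2 and above 0.5).
        System.out.println(canopies(docs, 0.8, 0.5)); // [[0, 1], [2]]
    }
}
```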
Hi guys,
I'm working on a text mining project, and I would like to apply text clustering
to identify trends inside the documents. The set of documents that enters the
clustering process is dynamic, depending on the query results from a big
repository.
I applied some Mahout algorithms like LDA, KMeans with
Hello,
I started to develop a text mining tool which will apply weight vectors,
clustering, classification, and topic detection methods. I decided to use
Mahout to do that.
Here are some questions I would like answered:
- What is better (with respect to the time of