Hi,
I was running the example k-means program by following the instructions here:
https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html
I increased the input file synthetic_control.data from around 200 KB to 1.2
GB by duplicating its contents.
The maximum number of iterations is set to 10, so afte
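Anyone who wants to reproduce this kind of scale-up can use the minimal Python sketch below. It assumes the input is a plain-text file with one record per line, as synthetic_control.data is, and appends whole copies of the file until a target size is reached. The function name replicate_file and the demo file names are hypothetical and are not part of Mahout.

```python
import os
import tempfile


def replicate_file(src_path, dst_path, target_bytes):
    """Write whole copies of src_path into dst_path until dst_path holds at
    least target_bytes bytes. Returns the number of copies written.
    (Hypothetical helper, not a Mahout tool.)"""
    with open(src_path, "rb") as src:
        data = src.read()
    if not data:
        raise ValueError("source file is empty")
    if not data.endswith(b"\n"):
        # Make each copy end with a newline so records never get glued together.
        data += b"\n"
    copies = 0
    with open(dst_path, "wb") as dst:
        # Stream copies to disk instead of building a 1 GB+ string in memory.
        while copies * len(data) < target_bytes:
            dst.write(data)
            copies += 1
    return copies


if __name__ == "__main__":
    # Tiny demo with placeholder records standing in for synthetic_control.data.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "sample.data")
        dst = os.path.join(tmp, "sample_big.data")
        with open(src, "w") as f:
            f.write("1.0 2.0 3.0\n4.0 5.0 6.0\n")
        n = replicate_file(src, dst, 1000)
        print(n, "copies,", os.path.getsize(dst), "bytes")
```

Because every point is duplicated the same number of times, the k-means cluster structure is unchanged. The larger file therefore tests throughput rather than clustering quality.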
LDA is vastly slower than LSA because LSA can use large-scale SVD
algorithms.
LDA may be better for some applications, but even the fastest
implementations tend to be much slower than large-scale SVD. The LDA
implementations in Mahout are not particularly fast.
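To make concrete what "LSA is an SVD" means, here is a small in-memory sketch. Given a document-by-term matrix (one document per row), LSA keeps only the top-k singular triplets. The sketch uses plain power iteration with deflation in pure Python for illustration only. It is not Mahout's ssvd or any other distributed solver, and it would not scale to the data sizes discussed in this thread. The function names are hypothetical.

```python
import math
import random


def _normalize(v):
    """Return (v / ||v||, ||v||); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return v, 0.0
    return [x / norm for x in v], norm


def _matvec(M, x):
    """Dense matrix (list of rows) times vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def truncated_svd(A, k, iters=300, seed=0):
    """Top-k singular triplets of a dense matrix A (list of rows), found by
    power iteration on R^T R followed by deflation. Returns (U, S, V) where
    U[i] and V[i] are the i-th left and right singular vectors."""
    rng = random.Random(seed)
    R = [row[:] for row in A]  # residual matrix; A itself is left untouched
    n_cols = len(A[0])
    U, S, V = [], [], []
    for _ in range(k):
        Rt = [list(col) for col in zip(*R)]
        v, _norm = _normalize([rng.uniform(-1.0, 1.0) for _ in range(n_cols)])
        for _step in range(iters):
            v, norm = _normalize(_matvec(Rt, _matvec(R, v)))
            if norm == 0.0:
                break  # residual is zero: A has rank < k
        u, sigma = _normalize(_matvec(R, v))
        U.append(u)
        S.append(sigma)
        V.append(v)
        # Deflate: R <- R - sigma * u v^T, so the next pass finds the next triplet.
        R = [[R[i][j] - sigma * u[i] * v[j] for j in range(n_cols)]
             for i in range(len(R))]
    return U, S, V
```

For a rank-k matrix, sum_i S[i] * U[i][r] * V[i][c] reconstructs A[r][c]. For larger matrices, the rows of U (scaled by S) are the documents' reduced LSA coordinates.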
On Wed, Dec 26, 2012 at 5:01 PM, V
LDA (Latent Dirichlet Allocation) is implemented in Mahout, and LDA is better
than LSA.
I don't see the need to implement LSA in Mahout.
Sincerely
Vince Wei
-----Original Message-----
From: thyme@gmail.com [mailto:thyme@gmail.com] On Behalf Of Osman Başkaya
Sent: December 27, 2012 2:58
Ted:
Thanks for the helpful pointers.
> Do you have thousands of labeled documents for each category?
Yes, I have several years' worth of human-classified documents. I can
get my hands on as many labeled documents as needed.
> Are the categories groupable into very similar clusters?
I don't under
Thank you so much guys. You are always very kind and helpful :)
On Wed, Dec 26, 2012 at 11:02 PM, Dmitriy Lyubimov wrote:
Yes, LSA is possible with Mahout using the seqdirectory, seq2sparse and ssvd
commands.
You may need additional help on this forum with the structure of the seq2sparse
dictionary if you plan to do LSI and on-the-fly fold-in operations.
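"Fold-in" here means projecting a document that was not part of the decomposed matrix into the existing reduced space. The sketch below assumes the matrix stores one document per row, as seq2sparse output does, so that A ≈ U_k Σ_k V_k^T with the term weights in V. A new document's term vector q then maps to q V_k Σ_k^{-1}. The dictionary argument plays the role of seq2sparse's term dictionary, but here it is a plain Python dict, not Mahout's on-disk format. The function name is hypothetical.

```python
def fold_in(doc_terms, dictionary, V, S):
    """Project a new document into an existing k-dimensional LSA space.

    dictionary maps term -> column index of the document-by-term matrix,
    V[i] is the i-th right singular vector (one weight per term) and S[i]
    the matching singular value. Returns q V_k Sigma_k^{-1}.
    """
    # Raw term counts for the new document; a real pipeline would apply the
    # same weighting (e.g. tf-idf with training document frequencies).
    q = [0.0] * len(V[0])
    for term in doc_terms:
        idx = dictionary.get(term)
        if idx is not None:  # out-of-vocabulary terms are ignored
            q[idx] += 1.0
    return [sum(v_j * q_j for v_j, q_j in zip(v, q)) / s for v, s in zip(V, S)]
```

Because A V_k = U_k Σ_k, folding in an existing row of A reproduces that document's row of U_k. New documents therefore land in the same coordinate system without recomputing the SVD.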
On Wed, Dec 26, 2012 at 10:57 AM, Osman Başkaya wrote:
> Greetings e
Hi Osman,
Mahout has all the building blocks you need to create an LSA pipeline:
You have to vectorize your documents using seqdirectory and seq2sparse
to get the term-document matrix.
After that you can use one of our two SVD implementations [1,2] to
compute the decomposition necessary for L
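As a rough picture of what the vectorization step produces, the sketch below builds a term dictionary and a document-by-term tf-idf matrix in memory. It stands in for seqdirectory + seq2sparse only conceptually. The tokenizer is a simple lowercase regex, and the weighting is the textbook tf * log(N / df), which is not necessarily the exact formula seq2sparse applies. The function name is hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_vectors(docs):
    """Turn raw documents into (dictionary, rows): a term -> column index map
    and a dense tf-idf matrix with one row per document."""
    tokenized = [re.findall(r"[a-z0-9]+", doc.lower()) for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    dictionary = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    rows = []
    for tokens in tokenized:
        row = [0.0] * len(vocab)
        for term, count in Counter(tokens).items():
            # Terms that occur in every document get weight log(1) = 0.
            row[dictionary[term]] = count * math.log(n_docs / df[term])
        rows.append(row)
    return dictionary, rows
```

The returned rows are the kind of matrix the SVD step would then decompose. A term such as "the" that appears everywhere contributes nothing after weighting.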
Do you have thousands of labeled documents for each category?
Are the categories groupable into very similar clusters?
Do categories come and go?
What is high accuracy to you?
My first recommendation for text classification is always L1-regularized
logistic regression. Since your training dat
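For readers unfamiliar with the technique, here is a minimal sketch of L1-regularized logistic regression trained by proximal gradient descent: a gradient step on the mean log-loss, followed by soft-thresholding of the weights. It is illustrative only and is not Mahout's implementation. The function names, learning rate and regularization strength lam are arbitrary choices. The L1 penalty is what drives the weights of uninformative features to exactly zero, which is useful for text, where most terms carry no signal for a given category.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    """Proximal operator of t * |w|: shrink w toward zero by t."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def train_l1_logreg(X, y, lam=0.05, lr=0.1, epochs=500):
    """Minimize mean log-loss + lam * ||w||_1 with full-batch proximal
    gradient descent. X is a list of feature rows, y a list of 0/1 labels."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Gradient step on the smooth loss, then soft-threshold (L1 prox).
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b  # the bias is not penalized
    return w, b


def predict(w, b, x):
    """Probability that x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

With a toy dataset where the first feature determines the label and the second is pure noise, the noise feature's weight ends at exactly 0.0 while the informative one stays positive.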
Greetings everyone,
I want to use Latent Semantic Analysis in Mahout. Is there an
implementation of this algorithm? I checked but couldn't find one. LSA is
very similar to SVD, so I assumed that was the reason there is no dedicated LSA
implementation. Could you clarify this for me, please?
Thank you so
It could be a contradiction indeed. I wonder if you can help us to
characterize it further, perhaps by reading the code or by running your
data in sequential debug mode? Without a little more information it is
difficult to get to the root of your problem.
On 12/25/12 8:21 PM, yoshihiro fujimo