Out Of Memory for JVM issue running Kmeans example, not sure where to increase the xmx heapsize

2012-12-26 Thread Yunming Zhang
Hi, I was running the example kmeans program following the link here: https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html I increased the size of the input file Synthetic_control.data from around 200 KB to 1.2 GB by copying the data itself. The max iteration is set to 10, so afte

Re: LSA in Mahout

2012-12-26 Thread Ted Dunning
LDA is vastly slower than LSA because LSA can use large scale SVD algorithms. LDA may be better for some applications, but even the fastest implementations tend to be much slower than large scale SVD. The LDA implementations in Mahout are not particularly fast. On Wed, Dec 26, 2012 at 5:01 PM, V

RE: LSA in Mahout

2012-12-26 Thread Vince Wei (jianwei)
LDA (Latent Dirichlet Allocation) is implemented in Mahout, and LDA is better than LSA. I cannot see the necessity of implementing LSA in Mahout. Sincerely Vince Wei -Original Message- From: thyme@gmail.com [mailto:thyme@gmail.com] On Behalf Of Osman Başkaya Sent: December 27, 2012 2:58

Re: Document Classification - Recommended Algorithms?

2012-12-26 Thread Magesh Sarma
Ted: Thanks for the helpful pointers. > Do you have thousands of labeled documents for each category? Yes, I have several years' worth of human-classified documents. I can get my hands on as many labeled documents as needed. > Are the categories groupable into very similar clusters? I don't under

Re: LSA in Mahout

2012-12-26 Thread Osman Başkaya
Thank you so much guys. You are always very kind and helpful :) On Wed, Dec 26, 2012 at 11:02 PM, Dmitriy Lyubimov wrote: > yes, LSA is possible with Mahout using seqdirectory, seq2sparse and ssvd > commands. > > You may need additional help on this forum with the structure of seq2sparse > dictio

Re: LSA in Mahout

2012-12-26 Thread Dmitriy Lyubimov
yes, LSA is possible with Mahout using seqdirectory, seq2sparse and ssvd commands. You may need additional help on this forum with the structure of seq2sparse dictionary if you plan to do LSI and on-the-fly fold in operations. On Wed, Dec 26, 2012 at 10:57 AM, Osman Başkaya wrote: > Greetings e

Re: LSA in Mahout

2012-12-26 Thread Sebastian Schelter
Hi Osman, Mahout has all the building blocks you need to create an LSA pipeline: You have to vectorize your documents using seqdirectory and seq2sparse to get the term-document matrix. After that you can use one of our two SVD implementations [1,2] to compute the decomposition necessary for L

Re: Document Classification - Recommended Algorithms?

2012-12-26 Thread Ted Dunning
Do you have thousands of labeled documents for each category? Are the categories groupable into very similar clusters? Do categories come and go? What is high accuracy to you? My first recommendation for text classification always is L_1 regularized logistic regression. Since your training dat

LSA in Mahout

2012-12-26 Thread Osman Başkaya
Greetings everyone, I want to use Latent Semantic Analysis in Mahout. Is there any implementation of this algorithm? I checked but couldn't find one. LSA is very similar to SVD, so I thought that might be the reason why there is no concrete LSA implementation. Could you clarify this for me, please? Thank you so

Re: About Dirichlet clustering's threshold

2012-12-26 Thread Jeff Eastman
It could be a contradiction indeed. I wonder if you can help us to characterize it further, perhaps by reading the code or by running your data in sequential debug mode? Without a little more information it is difficult to get to the root of your problem. On 12/25/12 8:21 PM, yoshihiro fujimo