date:20150330

Re: Text clustering with SVD

2015-03-30 Thread Ted Dunning

Lanczos may be more accurate than SSVD, but if you use a power step or three, this difference goes away as well. The best way to select k is actually to pick a value k_max larger than you expect to need and then pick random vectors instead of singular vectors. To evaluate how many singular vectors
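
To illustrate what a "power step" does in a stochastic SVD, here is a minimal sketch assuming NumPy and a small dense in-memory matrix. It is not Mahout's SSVD implementation; the function name `randomized_svd` and the parameters `oversample` and `power_iters` are illustrative assumptions.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, power_iters=2, seed=0):
    """Approximate the top-k SVD of A by random projection plus power steps."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    p = min(n, k + oversample)  # work with a few more vectors than requested

    # Sample the range of A with random vectors and orthonormalize.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, p)))

    # Each power step multiplies by A^T and A again, which damps the
    # contribution of small singular values and improves accuracy.
    for _ in range(power_iters):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Solve the small projected problem exactly, then map back.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# Usage: compare against the exact singular values on a toy matrix.
demo = np.random.default_rng(42).standard_normal((200, 50))
_, s_approx, _ = randomized_svd(demo, k=5, power_iters=3)
s_exact = np.linalg.svd(demo, compute_uv=False)[:5]
print(np.abs(s_approx - s_exact) / s_exact)  # relative error per singular value
```

Setting `power_iters=0` in this sketch gives the plain random-projection estimate, so the effect of adding power steps can be checked directly on your own data.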

Re: Latent Semantic Analysis for Document Categorization

2015-03-30 Thread Ted Dunning

Hersheeta, for linking information, you should go for whatever you can find. For instance: 1) if the documents are HTML, href elements (aka web links) are an ideal kind of linkage. This is what PageRank was based on. 2) If the documents refer to people, places or things, then you have a secon

Re: Text clustering with SVD

2015-03-30 Thread Dmitriy Lyubimov

Note that these instructions actually mean running PCA, not SVD, but that's probably the intention here. I don't think just running SVD helps. On Mon, Mar 30, 2015 at 1:04 AM, Suneel Marthi wrote: > Here are the steps if you are using Mahout-mrlegacy in the present Mahout > trunk: > > 1. Generate tfi
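
The PCA versus SVD distinction comes down to mean-centering: PCA is an SVD of the data after the column means have been subtracted. A minimal NumPy sketch of that relationship follows, assuming a small dense matrix with one row per document; `pca_via_svd` is a hypothetical helper name, not a Mahout API.

```python
import numpy as np

def pca_via_svd(X, k):
    """Return the top-k principal component scores, directions, and column means."""
    mean = X.mean(axis=0)
    # Subtracting the column means is the only difference from a plain SVD.
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    scores = U[:, :k] * s[:k]   # documents in the reduced space
    return scores, Vt[:k], mean

# Usage: with an offset in the data, plain SVD spends its first component
# on the mean, while PCA describes the variation around it.
X = np.random.default_rng(0).standard_normal((100, 6)) + 10.0
scores, components, mean = pca_via_svd(X, k=2)
print(scores.shape, components.shape)
```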

Re: Text clustering with SVD

2015-03-30 Thread Suneel Marthi

Lanczos has since been deprecated and will be removed in the upcoming release, so please desist from using/suggesting Lanczos. On Mon, Mar 30, 2015 at 3:00 PM, Dmitriy Lyubimov wrote: > I am not aware of _any_ scenario under which Lanczos would be faster (see > N. Halko's dissertation for compa

Re: Text clustering with SVD

2015-03-30 Thread Dmitriy Lyubimov

I am not aware of _any_ scenario under which Lanczos would be faster (see N. Halko's dissertation for comparisons), although admittedly I did not study all possible cases. Having -k=100 is probably enough for anything. I would not recommend running -q>0 for k>100 as it would become quite slow in

Re: Text clustering with SVD

2015-03-30 Thread Fernando Fernández

SSVD is just one of many ways to compute a partial SVD. In Mahout you also have the Lanczos method, which I have found faster and more reliable in some applications, but most people here seem to prefer SSVD. In fact, I think Lanczos is (or has been) planned to be deprecated. This may also have changed

Re: Text clustering with SVD

2015-03-30 Thread Donni Khan

Hello Suneel, Thanks for the fast reply. Is SSVD like SVD? Which one is better? I ran SSVD from Java code on my data, but how do I compute U*Sigma? Can I do that with Mahout? Is there an optimal method to determine k? Another question: how do I relate the SSVD output to the words dictionary
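
On the U*Sigma question: because Sigma is diagonal, the product just scales column j of U by the j-th singular value, and the result is the set of documents expressed in the reduced k-dimensional space. The sketch below assumes U and the singular values are already available as NumPy arrays (reading Mahout's output files is not shown), and `reduced_docs` is a hypothetical helper name.

```python
import numpy as np

def reduced_docs(U, sigma):
    """Compute U * diag(sigma) without building the diagonal matrix."""
    # Broadcasting multiplies column j of U by sigma[j].
    return U * sigma

# Usage on a toy document-term matrix (rows = documents, columns = terms).
A = np.random.default_rng(0).random((30, 12))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
docs_k = reduced_docs(U[:, :k], s[:k])

# The same coordinates are obtained by projecting A onto the top-k
# right singular vectors, whose entries are indexed by term.
print(np.allclose(docs_k, A @ Vt[:k].T))
```

The rows of `docs_k` are what would be passed to a clustering step; the rows of `Vt` are indexed by term position, which is where a word dictionary would come in.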

Re: Text clustering with SVD

2015-03-30 Thread Suneel Marthi

Here are the steps if you are using Mahout-mrlegacy in the present Mahout trunk: 1. Generate tfidf vectors from the input corpus using seq2sparse (I am assuming you have done this before and am hence omitting the details) 2. Run SSVD on the generated tfidf vectors from (1) ./bin/mahout ssvd -i -o

Text clustering with SVD

2015-03-30 Thread Donni Khan

Hello Mahout users, I'm working on text clustering and would like to reduce the number of features to improve the clustering process. I would like to apply Singular Value Decomposition before the clustering step. I would be thankful to hear from anyone who has used this before. Is it a good idea for clustering? Is there

Re: Latent Semantic Analysis for Document Categorization

2015-03-30 Thread Hersheeta Chandankar

Hi Ted, Thank you for a quick reply. It would be of great help if you could please explain what kind of 'linking information between documents' I should look for. On Fri, Mar 27, 2015 at 2:45 AM, Ted Dunning wrote: > Also, if you can include linking information between documents, you should > b

Latent Semantic Analysis for Document Categorization

2015-03-30 Thread Hersheeta Chandankar

Hi Ted, Thank you for a quick reply. It would be of great help if you could please explain what kind of 'linking information between documents' I should look for.

