I am not aware of _any_ scenario under which Lanczos would be faster (see N. Halko's dissertation for comparisons), although admittedly I did not study all possible cases.
Having -k=100 is probably enough for anything. I would not recommend running -q>0 for k>100, as it would become quite slow in the power iterations step.

To your other questions, e.g. the U*Sigma result output, see the "overview and usage" link given here: http://mahout.apache.org/users/dim-reduction/ssvd.html

On Mon, Mar 30, 2015 at 2:19 AM, Donni Khan <prince.don...@googlemail.com> wrote:

> Hello Suneel,
> Thanks for the fast reply.
> Is SSVD like SVD? Which one is better?
> I ran SSVD from Java code on my data, but how do I compute U*Sigma? Can I do that with Mahout?
> Is there an optimal method to determine K?
>
> Another question: how do I relate the SSVD output to the word dictionary (real words)?
>
> Thank you
> Donni
>
> On Mon, Mar 30, 2015 at 10:04 AM, Suneel Marthi <suneel.mar...@gmail.com> wrote:
>
> > Here are the steps if you are using Mahout-mrlegacy in the present Mahout trunk:
> >
> > 1. Generate tfidf vectors from the input corpus using seq2sparse (I am assuming you have done this before, hence skipping the details)
> >
> > 2. Run SSVD on the generated tfidf vectors from (1)
> >
> >    ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true -us true -U false -V false
> >
> >    k = no. of reduced basis vectors
> >
> >    You would need the U*Sigma output of the PCA flow for the next clustering step
> >
> > 3. Run KMeans (or any other clustering algo) with the U*Sigma from (2) as input.
> >
> >
> > On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan <prince.don...@googlemail.com> wrote:
> >
> > > Hello Mahout users,
> > >
> > > I'm working on text clustering, and I would like to reduce the features to enhance the clustering process.
> > > I would like to use Singular Value Decomposition before the clustering process. I would be thankful to hear from anyone who has used this before. Is it a good idea for clustering?
> > > Is there any other method in Mahout to reduce the text features before clustering?
> > > Does anyone have an idea how I can apply SVD using Java code?
> > >
> > > Thanks in advance,
> > > Donni
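
For reference, here is a rough NumPy sketch of the randomized SVD idea that SSVD builds on (after Halko et al.), to show what k, the oversampling and the power iterations q do, and how U*Sigma is formed. It is not Mahout's implementation: the function names (randomized_svd, u_sigma), the oversampling value p and the toy matrix are all illustrative assumptions, and it needs k + p to be no larger than the smaller dimension of the input.

```python
import numpy as np


def randomized_svd(A, k, p=15, q=0, seed=0):
    """Approximate rank-k SVD of A (hypothetical helper, not a Mahout API).

    k: number of singular triplets to keep
    p: oversampling (extra random directions), assumed k + p <= min(A.shape)
    q: number of power iterations
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]

    # Project A onto a random (k + p)-dimensional subspace and orthonormalize.
    omega = rng.standard_normal((n, k + p))
    Q, _ = np.linalg.qr(A @ omega)

    # In this sketch each power iteration multiplies by A twice more,
    # which is why q > 0 gets expensive as k grows.
    for _ in range(q):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)

    # Exact SVD of the small (k + p) x n projected matrix.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]


def u_sigma(U, s):
    """Rows of U scaled by the singular values: the reduced representation."""
    return U * s  # broadcasting scales column j of U by s[j]


if __name__ == "__main__":
    # Toy stand-in for a document-term matrix: 200 "documents", 50 "terms".
    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 50))
    U, s, Vt = randomized_svd(A, k=5, p=10, q=1)
    reduced = u_sigma(U, s)
    print(reduced.shape)  # one k-dimensional row per input row
```

The rows of the array returned by u_sigma play the role of the U*Sigma output in the steps quoted above: one k-dimensional vector per document, which is what the clustering step would take as input.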