Lanczos has since been deprecated and will be removed in the upcoming release, so please refrain from using or suggesting it.
On Mon, Mar 30, 2015 at 3:00 PM, Dmitriy Lyubimov <dlie...@gmail.com> wrote:

> I am not aware of _any_ scenario under which Lanczos would be faster (see
> N. Halko's dissertation for comparisons), although admittedly I did not
> study all possible cases.
>
> Having -k=100 is probably enough for anything. I would not recommend
> running -q>0 for k>100, as it would become quite slow in the power
> iterations step.
>
> As to your other questions, e.g. the U*Sigma result output, see the
> "overview and usage" link given here:
> http://mahout.apache.org/users/dim-reduction/ssvd.html
>
> On Mon, Mar 30, 2015 at 2:19 AM, Donni Khan <prince.don...@googlemail.com>
> wrote:
>
> > Hello Suneel,
> > Thanks for the fast reply.
> > Is SSVD like SVD? Which one is better?
> > I ran SSVD from Java code on my data, but how do I compute U*Sigma?
> > Can I do that with Mahout?
> > Is there an optimal method to determine k?
> >
> > Another question: how do I relate the SSVD output to the word
> > dictionary (the real words)?
> >
> > Thank you
> > Donni
> >
> > On Mon, Mar 30, 2015 at 10:04 AM, Suneel Marthi <suneel.mar...@gmail.com>
> > wrote:
> >
> > > Here are the steps if you are using Mahout-mrlegacy in the present
> > > Mahout trunk:
> > >
> > > 1. Generate tfidf vectors from the input corpus using seq2sparse (I am
> > > assuming you have done this before and hence am skipping the details).
> > >
> > > 2. Run SSVD on the generated tfidf vectors from (1):
> > >
> > > ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true -us true -U false -V false
> > >
> > > k = no. of reduced basis vectors
> > >
> > > You would need the U*Sigma output of the PCA flow for the next
> > > clustering step.
> > >
> > > 3. Run KMeans (or any other clustering algo) with the U*Sigma from (2)
> > > as input.
> > >
> > >
> > > On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan <prince.don...@googlemail.com>
> > > wrote:
> > >
> > > > Hello Mahout users,
> > > >
> > > > I'm working on text clustering and would like to reduce the number
> > > > of features to improve the clustering process.
> > > > I would like to apply Singular Value Decomposition before the
> > > > clustering step. I would be thankful to hear from anyone who has
> > > > used this before. Is it a good idea for clustering?
> > > > Is there any other method in Mahout to reduce the text features
> > > > before clustering?
> > > > Does anyone have an idea how I can apply SVD using Java code?
> > > >
> > > > Thanks in advance,
> > > > Donni
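On the "how do I compute U*Sigma" question in the thread: since A = U * Sigma * V^T with orthonormal V, it follows that U*Sigma = A*V, so each row of U*Sigma is one document (one row of the tfidf matrix) projected onto the reduced basis. The sketch below illustrates only that identity on a toy matrix in plain Python. It is not Mahout's SSVD and does not use any Mahout API; the matrix values and the helper names (matvec, transpose, norm, top_singular_triplet) are made up for illustration, and it recovers only the single largest singular triplet by power iteration rather than k of them.

```python
import math

def matvec(a, x):
    # Multiply matrix a (list of rows) by vector x.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def norm(x):
    return math.sqrt(sum(xi * xi for xi in x))

def top_singular_triplet(a, iterations=200):
    # Power iteration on A^T A; returns (sigma, u, v) for the largest
    # singular value. Hypothetical helper, not part of Mahout.
    at = transpose(a)
    v = [1.0] * len(a[0])  # arbitrary deterministic start vector
    for _ in range(iterations):
        w = matvec(at, matvec(a, v))
        n = norm(w)
        v = [wi / n for wi in w]
    av = matvec(a, v)
    sigma = norm(av)
    u = [x / sigma for x in av]
    return sigma, u, v

# Toy "documents x terms" matrix (made-up values standing in for tfidf).
docs_by_terms = [
    [2.0, 1.0, 0.0],
    [1.0, 2.0, 0.0],
    [0.0, 0.0, 1.0],
]

sigma, u, v = top_singular_triplet(docs_by_terms)

# One column of U*Sigma: one reduced coordinate per document.
u_sigma = [ui * sigma for ui in u]

# The same numbers obtained as A*v, i.e. projecting each document onto v.
projected = matvec(docs_by_terms, v)

print("sigma:", sigma)
print("U*Sigma column:", u_sigma)
print("A*v:", projected)
```

In this toy setup, entry j of v is the weight of column j of the input matrix, which is how a reduced dimension can be traced back to individual terms, assuming you know which term each column index stands for.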