Note that these instructions actually run PCA (that is what -pca true does), not a plain SVD, but that's probably the intention here. I don't think running a plain, uncentered SVD would help much.
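To make the PCA-versus-SVD distinction concrete: PCA is an SVD of the column-centered matrix, and the U*Sigma output is the centered data projected onto the top-k right singular vectors. The sketch below is a minimal illustration under stated assumptions, not Mahout code. An exact dense SVD (numpy.linalg.svd) stands in for Mahout's stochastic SSVD. The input is a small random non-negative matrix rather than real tf-idf vectors. The function names pca_us and svd_us are hypothetical.

```python
import numpy as np

def pca_us(X, k):
    """Return (U_k * Sigma_k, V_k^T) from an exact SVD of the column-centered matrix.

    Centering is what turns the SVD into PCA. Mahout's ssvd uses a stochastic
    (randomized) SVD instead, so its output would only approximate this.
    """
    Xc = X - X.mean(axis=0)                      # subtract each column (term) mean
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]              # one k-dim row per document

def svd_us(X, k):
    """Same truncation on the raw, uncentered matrix: a plain SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k], Vt[:k]

rng = np.random.default_rng(0)
X = rng.random((6, 5))                           # made-up non-negative "documents x terms" data

us_pca, V_pca = pca_us(X, k=2)
us_svd, _ = svd_us(X, k=2)

# U*Sigma equals the centered data projected onto the top-k basis vectors.
print(np.allclose(us_pca, (X - X.mean(axis=0)) @ V_pca.T))   # True
# PCA coordinates are centered at the origin...
print(np.allclose(us_pca.mean(axis=0), 0))                   # True
# ...while on non-negative data the first plain-SVD coordinate has one sign for every row.
print(np.allclose(us_svd.mean(axis=0), 0))                   # False
```

On non-negative data like tf-idf, the first plain-SVD coordinate has the same sign for every document. It therefore mostly encodes the overall mean rather than differences between documents, and centering avoids spending a component on that.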
On Mon, Mar 30, 2015 at 1:04 AM, Suneel Marthi <suneel.mar...@gmail.com> wrote:

> Here are the steps if you are using Mahout-mrlegacy in the present Mahout
> trunk:
>
> 1. Generate tf-idf vectors from the input corpus using seq2sparse (I am
> assuming you have done this before, so I'm skipping the details).
>
> 2. Run SSVD on the tf-idf vectors generated in (1):
>
> ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true -us true -U false -V false
>
> k = number of reduced basis vectors
>
> You will need the U*Sigma output of the PCA flow for the next
> clustering step.
>
> 3. Run KMeans (or any other clustering algorithm) with the U*Sigma from (2)
> as input.
>
> On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan <prince.don...@googlemail.com>
> wrote:
>
> > Hello Mahout users,
> >
> > I'm working on text clustering, and I would like to reduce the features
> > to improve the clustering process.
> > I would like to use Singular Value Decomposition before the clustering
> > process. I would be thankful to hear from anyone who has used this before.
> > Is it a good idea for clustering?
> > Is there any other method in Mahout to reduce the text features before
> > clustering?
> > Does anyone have an idea how I can apply SVD using Java code?
> >
> > Thanks in advance,
> > Donni
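The three steps above can be sketched end to end on a toy corpus. This is not Mahout code, and each step uses a swapped-in stand-in:

* Step 1: a hand-rolled tf-idf (raw count times log(N/df)), which may differ from seq2sparse's weighting.
* Step 2: an exact NumPy SVD of the centered matrix, in place of ssvd -pca true -us true.
* Step 3: plain Lloyd's k-means seeded with two hypothetical rows.

The corpus and the function names (tfidf_matrix, pca_us, kmeans) are made up for illustration.

```python
import numpy as np

def tfidf_matrix(docs):
    """Step 1 stand-in: raw term count * log(N / df); not necessarily seq2sparse's weighting."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: j for j, w in enumerate(vocab)}
    X = np.zeros((len(docs), len(vocab)))
    for i, d in enumerate(docs):
        for w in d.split():
            X[i, index[w]] += 1.0
    df = (X > 0).sum(axis=0)                   # number of documents containing each term
    idf = np.log(len(docs) / df)
    return X * idf, vocab

def pca_us(X, k):
    """Step 2 stand-in: U*Sigma of the column-centered matrix (exact SVD, not stochastic SSVD)."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

def kmeans(Y, init_rows, max_iter=100):
    """Step 3 stand-in: Lloyd's k-means seeded with the given rows of Y."""
    centroids = Y[list(init_rows)].copy()
    labels = None
    for _ in range(max_iter):
        # squared distance from every point to every centroid
        dists = ((Y[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                              # assignments stopped changing
        labels = new_labels
        for c in range(len(centroids)):
            members = Y[labels == c]
            if len(members) > 0:
                centroids[c] = members.mean(axis=0)
    return labels

# Made-up corpus with two topics that share no terms.
docs = [
    "cat dog pet", "dog pet food", "cat pet vet",
    "stock market trade", "market price stock", "trade price profit",
]
X, vocab = tfidf_matrix(docs)
US = pca_us(X, k=1)                            # reduced representation, one row per document
labels = kmeans(US, init_rows=(0, 3))
print(labels)                                  # [0 0 0 1 1 1]
```

Because the toy corpus has only two topics, a single component already separates them. On a real corpus you would keep many more components (the thread uses -k 80), and the k-means seeding would matter far more than it does here.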