Re: Text clustering with SVD

Dmitriy Lyubimov Mon, 30 Mar 2015 12:01:20 -0700

I am not aware of _any_ scenario under which Lanczos would be faster than
SSVD (see N. Halko's dissertation for comparisons), although admittedly I
did not study all possible cases.


Having -k=100 is probably enough for anything. I would not recommend
running -q>0 for k>100, as the power iterations step would become quite
slow.
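
To illustrate why -q adds cost, here is a minimal pure-Python sketch of the
range-finding stage of a randomized SVD with power iterations, in the spirit
of the scheme described in Halko's work. It is not Mahout's implementation:
the function names, the oversampling parameter p, and the multiplication
counter are assumptions made for illustration only.

```python
import random

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    # Plain dense product of two row-major matrices.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def orthonormalize(Y):
    """Gram-Schmidt on the columns of Y (two passes for stability)."""
    cols = []
    for v in transpose(Y):
        for _ in range(2):
            for u in cols:
                dot = sum(x * y for x, y in zip(u, v))
                v = [x - dot * y for x, y in zip(v, u)]
        norm = sum(x * x for x in v) ** 0.5
        cols.append([x / norm for x in v])
    return transpose(cols)

def randomized_range(A, k, p, q, seed=0):
    """Return (Q, mults): an orthonormal basis with k+p columns that
    approximates the range of A, plus the number of multiplications
    by A or its transpose that were needed (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(A[0])
    omega = [[rng.gauss(0, 1) for _ in range(k + p)] for _ in range(n)]
    At = transpose(A)
    Q = orthonormalize(matmul(A, omega))    # initial projection: 1 multiply
    mults = 1
    for _ in range(q):                      # each power iteration: 2 more
        Q = orthonormalize(matmul(At, Q))
        Q = orthonormalize(matmul(A, Q))
        mults += 2
    return Q, mults
```

Every power iteration multiplies by the input and its transpose once more,
and each of those products is k+p columns wide, so the work grows with both
q and k.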

As for your other questions, e.g. the U*Sigma output, see the "overview and
usage" link given here:
http://mahout.apache.org/users/dim-reduction/ssvd.html
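
For readers unsure what U*Sigma means: it is U with each column scaled by
the corresponding singular value, so each row is one document expressed in
the k reduced dimensions. A minimal sketch, assuming U is held as a list of
rows and sigma as a list of singular values (the function name is
hypothetical and this is not Mahout code):

```python
def scale_by_sigma(U, sigma):
    """Multiply column j of U by sigma[j], i.e. compute U * diag(sigma)."""
    return [[u * s for u, s in zip(row, sigma)] for row in U]

# Two documents in a 2-dimensional reduced space.
U = [[0.5, -0.5],
     [0.5, 0.5]]
sigma = [4.0, 2.0]
us = scale_by_sigma(U, sigma)   # [[2.0, -1.0], [2.0, 1.0]]
```

The rows of the result are what a clustering algorithm such as k-means
would take as input points.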

On Mon, Mar 30, 2015 at 2:19 AM, Donni Khan <prince.don...@googlemail.com>
wrote:

> Hello Suneel,
> Thanks for the fast reply.
> Is SSVD like SVD? Which one is better?
> I ran SSVD from Java code on my data, but how do I compute U*Sigma? Can
> I do that with Mahout?
> Is there an optimal method to determine k?
>
> Another question: how do I relate the SSVD output back to the word
> dictionary (the real words)?
>
> Thank you
> Donni
>
> On Mon, Mar 30, 2015 at 10:04 AM, Suneel Marthi <suneel.mar...@gmail.com>
> wrote:
>
> > Here are the steps if you are using Mahout-mrlegacy in the present Mahout
> > trunk:
> >
> > 1. Generate tfidf vectors from the input corpus using seq2sparse (I am
> > assuming you have done this before, so I am skipping the details)
> >
> > 2. Run SSVD on the generated tfidf vectors from (1)
> >
> >       ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true \
> >         -us true -U false -V false
> >
> >      k = no. of reduced basis vectors
> >
> >     You would need the U*Sigma output of the PCA flow for the next
> > clustering step
> >
> > 3. Run KMeans (or any other clustering algorithm) with the U*Sigma from
> > (2) as input.
> >
> >
> > On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan <prince.don...@googlemail.com>
> > wrote:
> >
> > > Hello Mahout users,
> > >
> > > I'm working on text clustering, and I would like to reduce the number
> > > of features to improve the clustering process.
> > > I would like to apply Singular Value Decomposition before the
> > > clustering step. I would be thankful to hear from anyone who has used
> > > this before. Is it a good idea for clustering?
> > > Is there any other method in Mahout to reduce the text features before
> > > clustering?
> > > Does anyone have an idea how I can apply SVD from Java code?
> > >
> > > Thanks in advance,
> > > Donni
> > >
> >
>
