Re: Text clustering with SVD

Dmitriy Lyubimov Mon, 30 Mar 2015 13:52:05 -0700

Note that these instructions actually mean running PCA, not SVD, but that's
probably the intention here. I don't think just running SVD helps.
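
A small illustration of the PCA-versus-SVD distinction, as a plain Python
sketch (not Mahout's stochastic SSVD). PCA amounts to an SVD of the
column-centered matrix. Because A = U*Sigma*V^T implies A*V = U*Sigma, the
U*Sigma columns are the centered rows projected onto the right singular
vectors. Here the leading vector is found by power iteration on A^T*A. The toy
matrix and function names are hypothetical. When every document shares a large
weight on one term, the top direction of the plain SVD just follows that
shared mean. PCA instead picks the direction along which the documents
actually differ.

```python
import math

def center_columns(rows):
    """Subtract each column (term) mean from every row (document).
    This centering step is what turns a plain SVD into PCA."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def leading_right_singular_vector(rows, iters=500):
    """Power iteration on A^T A. Converges to the top right singular vector v1,
    assuming the start vector is not orthogonal to it."""
    d = len(rows[0])
    v = [float(j + 1) for j in range(d)]  # arbitrary non-uniform start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        av = [sum(a * b for a, b in zip(r, v)) for r in rows]  # A v
        w = [sum(av[i] * rows[i][j] for i in range(len(rows)))  # A^T (A v)
             for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate input (e.g. all-zero matrix)
            break
        v = [x / norm for x in w]
    return v

def project(rows, v):
    """Dot each row with v. For centered rows this gives one column of U*Sigma."""
    return [sum(a * b for a, b in zip(r, v)) for r in rows]

# Hypothetical data: all documents share weight 5.0 on term 0 (a large common
# mean) and differ only in their weight on term 1.
docs = [[5.0, 1.0], [5.0, -1.0], [5.0, 2.0], [5.0, -2.0]]

svd_dir = leading_right_singular_vector(docs)                  # plain SVD
pca_dir = leading_right_singular_vector(center_columns(docs))  # PCA
print("plain SVD v1:", [round(x, 6) for x in svd_dir])
print("PCA v1:      ", [round(x, 6) for x in pca_dir])
print("PCA scores:  ", project(center_columns(docs), pca_dir))
```

The plain SVD direction lies along term 0, which every document shares. The
PCA direction lies along term 1, which is what separates the documents.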


On Mon, Mar 30, 2015 at 1:04 AM, Suneel Marthi <suneel.mar...@gmail.com>
wrote:

> Here are the steps if you are using Mahout-mrlegacy in the present
> Mahout trunk:
>
> 1. Generate tfidf vectors from the input corpus using seq2sparse (I am
> assuming you have done this before, so I am skipping the details)
>
> 2. Run SSVD on the generated tfidf vectors from (1)
>
>       ./bin/mahout ssvd -i <tfidf vectors> -o <svd output> -k 80 -pca true -us true -U false -V false
>
>      k = number of reduced basis vectors
>
>     You would need the U*Sigma output of the PCA flow for the next
> clustering step
>
> 3. Run KMeans (or any other clustering algorithm) with the U*Sigma output
> from (2) as input.
>
>
> On Mon, Mar 30, 2015 at 3:39 AM, Donni Khan <prince.don...@googlemail.com>
> wrote:
>
> > Hello Mahout users,
> >
> > I'm working on text clustering, and I would like to reduce the number of
> > features to improve the clustering process.
> > I would like to apply Singular Value Decomposition before the clustering
> > step. I would be thankful to hear from anyone who has used this before:
> > is it a good idea for clustering?
> > Is there any other method in Mahout to reduce the text features before
> > clustering?
> > Does anyone have an idea how I can apply SVD using Java code?
> >
> > Thanks in advance,
> > Donni
> >
>
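
In step 1, seq2sparse turns the corpus into sparse tf-idf vectors. The plain
Python sketch below only illustrates what such vectors look like. It uses a
naive lowercase/whitespace tokenizer and weights each term by raw count times
log(N/df). That weighting is a textbook variant assumed here for
illustration, and seq2sparse may use a different formula. The corpus and the
function name are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse tf-idf vectors (dict: term -> weight), one per document.
    Weight = raw term count * log(N / document frequency); illustrative only."""
    tokenized = [doc.lower().split() for doc in docs]  # naive tokenizer
    n_docs = len(tokenized)
    df = Counter()  # number of documents containing each term
    for tokens in tokenized:
        df.update(set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts in this document
        vectors.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return vectors

corpus = [
    "svd reduces text features",
    "kmeans clusters text",
    "svd before kmeans",
]
for vec in tfidf_vectors(corpus):
    print(vec)
```

Under this variant, a term that occurs in every document gets weight
log(1) = 0, so it contributes nothing to the later distance computations.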

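Step 3 then clusters the rows of U*Sigma, with one reduced vector per
document. The following is a plain Lloyd's k-means sketch in Python that
illustrates what that step computes. It is not Mahout's KMeans driver. The
input rows are hypothetical stand-ins for U*Sigma output with a reduced rank
of 2. Initializing from the first rows is a simplification chosen to keep
the example deterministic.

```python
import math

def kmeans(points, n_clusters, max_iters=100):
    """Lloyd's algorithm: alternately assign each row to its nearest centroid
    and move each centroid to the mean of the rows assigned to it."""
    centroids = [list(p) for p in points[:n_clusters]]  # naive init: first rows
    labels = None
    for _ in range(max_iters):
        # Assignment step: nearest centroid by Euclidean distance.
        new_labels = [min(range(n_clusters),
                          key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # converged: no assignment changed
            break
        labels = new_labels
        # Update step: recompute each centroid as the mean of its members.
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if the cluster became empty
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids

# Hypothetical U*Sigma rows, one per document.
u_sigma = [[-2.0, 0.1], [2.1, -0.2], [-1.9, -0.1], [2.0, 0.3], [-2.2, 0.0]]
labels, centroids = kmeans(u_sigma, n_clusters=2)
print(labels)
print(centroids)
```

With real data, k-means++ seeding or several random restarts is usually a
safer choice than taking the first rows.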