One thing to watch out for in clustering SVD stuff is eigenspokes.

www.cs.cmu.edu/~badityap/papers/eigenspokes-pakdd10.pdf

On Mon, Jun 6, 2011 at 12:08 PM, Stefan Wienert <ste...@wienert.cc> wrote:

> Yeah :) That's what I assumed. So this problem is solved. Thanks!
>
> And now your questions:
>
> I want to do LSA/LSI with Mahout. Therefore I take the TF-IDF
> document-term matrix and reduce it using Lanczos. This gives me the
> document-concept vectors. These vectors are then compared with cosine
> similarity to find similar documents. I can also cluster these
> documents with k-means...
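>
> In plain numpy terms (just a sketch with made-up sizes and names, not the
> actual Mahout code), the comparison step I have in mind looks roughly like
> this:
>
> import numpy as np
>
> rng = np.random.default_rng(0)
> # pretend output of the Lanczos step: one reduced vector per document
> doc_concept = rng.random((1000, 100))      # docs x concepts (made-up sizes)
>
> def cosine(a, b):
>     # cosine similarity between two document-concept vectors
>     return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
>
> # similarity of document 0 to every other document
> norms = np.linalg.norm(doc_concept, axis=1)
> sims = doc_concept @ doc_concept[0] / (norms * norms[0])
> print(cosine(doc_concept[0], doc_concept[1]), sims[1])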
>
> If you have any suggestions, feel free to tell me. I'm also interested
> in other document-similarity-techniques.
>
> Cheers,
> Stefan
>
> 2011/6/6 Danny Bickson <danny.bick...@gmail.com>:
> > If I understand your question correctly, you simply need to transpose M
> > before you start the run; that way you will get the other singular
> > vectors.
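> >
> > As a toy numpy check of that (not Mahout code; a small random matrix just
> > to illustrate): the left singular vectors of M' are the right singular
> > vectors of M, up to column signs, so a second run on the transposed input
> > yields the other factor:
> >
> > import numpy as np
> >
> > rng = np.random.default_rng(1)
> > M = rng.random((8, 5))                                 # toy docs x terms
> > U, s, Vt = np.linalg.svd(M, full_matrices=False)       # M  = U  diag(s)  Vt
> > U2, s2, Vt2 = np.linalg.svd(M.T, full_matrices=False)  # M' = U2 diag(s2) Vt2
> >
> > print(np.allclose(s, s2))                  # same singular values
> > # U2 (left vectors of M') matches V (right vectors of M) up to signs,
> > # assuming the singular values are distinct
> > print(np.allclose(np.abs(U2), np.abs(Vt.T)))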
> >
> > May I ask what problem you are working on and why you need the
> > singular vectors?
> > Could you consider using another matrix decomposition technique, for
> > example alternating least squares, which gives you two lower-rank
> > matrices that approximate the large decomposed matrix?
> >
> > On Mon, Jun 6, 2011 at 1:30 PM, Stefan Wienert <ste...@wienert.cc> wrote:
> >
> >> Hi Danny!
> >>
> >> I understand that for M*M' (and for M'*M) the left and right
> >> eigenvectors are identical. But that is not exactly what I want. The
> >> Lanczos solver from Mahout gives me the eigenvectors of M*M', which
> >> are the left singular vectors of M. But I need the right singular
> >> vectors of M (and not of M*M'). How do I get them?
> >>
> >> Sorry, my matrix math is not as good as it should be, but I hope you
> >> can help me!
> >>
> >> Thanks,
> >> Stefan
> >>
> >> 2011/6/6 Danny Bickson <danny.bick...@gmail.com>:
> >> > Hi Stefan!
> >> > For a positive semidefinite matrix, the left and right eigenvectors
> >> > are identical.
> >> > See the SVD Wikipedia article: when M is also positive
> >> > semi-definite <http://en.wikipedia.org/wiki/Positive-definite_matrix>,
> >> > the decomposition M = U D U* is also a singular value decomposition.
> >> > So you don't need to worry about the other singular vectors.
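> >> >
> >> > A toy numpy illustration of that statement (not Mahout code; A is just
> >> > a small random positive semidefinite matrix):
> >> >
> >> > import numpy as np
> >> >
> >> > rng = np.random.default_rng(2)
> >> > B = rng.random((6, 4))
> >> > A = B.T @ B                          # symmetric positive semidefinite
> >> >
> >> > w, Q = np.linalg.eigh(A)             # A = Q diag(w) Q'  (w ascending)
> >> > U, s, Vt = np.linalg.svd(A)          # A = U diag(s) Vt  (s descending)
> >> >
> >> > print(np.allclose(w[::-1], s))       # eigenvalues == singular values
> >> > # the eigenvector matrix serves as both U and V, up to column signs/order
> >> > print(np.allclose(np.abs(Q[:, ::-1]), np.abs(U)))
> >> > print(np.allclose(np.abs(Q[:, ::-1]), np.abs(Vt.T)))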
> >> >
> >> > Hope this helps!
> >> >
> >> > On Mon, Jun 6, 2011 at 12:57 PM, Stefan Wienert <ste...@wienert.cc> wrote:
> >> >
> >> >> Hi.
> >> >>
> >> >> Thanks for the help.
> >> >>
> >> >> The important points from Wikipedia are:
> >> >> - The left singular vectors of M are eigenvectors of M*M' .
> >> >> - The right singular vectors of M are eigenvectors of M'*M.
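> >> >>
> >> >> A toy numpy check of those two points (not Mahout code, just the plain
> >> >> linear algebra on a small random matrix):
> >> >>
> >> >> import numpy as np
> >> >>
> >> >> rng = np.random.default_rng(3)
> >> >> M = rng.random((7, 4))                              # toy docs x terms
> >> >> U, s, Vt = np.linalg.svd(M, full_matrices=False)    # M = U diag(s) Vt
> >> >>
> >> >> wl, Ul = np.linalg.eigh(M @ M.T)     # eigenvectors of M*M'
> >> >> wr, Vr = np.linalg.eigh(M.T @ M)     # eigenvectors of M'*M
> >> >>
> >> >> k = len(s)
> >> >> # eigenvalues of M'*M are the squared singular values
> >> >> print(np.allclose(np.sort(wr)[::-1], s**2))
> >> >> # eigenvectors of M*M'  == left singular vectors  (up to sign), and
> >> >> # eigenvectors of M'*M  == right singular vectors (up to sign)
> >> >> print(np.allclose(np.abs(Ul[:, ::-1][:, :k]), np.abs(U)))
> >> >> print(np.allclose(np.abs(Vr[:, ::-1]), np.abs(Vt.T)))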
> >> >>
> >> >> As you describe, the Mahout Lanczos solver calculates A=M'*M (I think
> >> >> it actually does A=M*M', but that is not a problem). Therefore it
> >> >> already calculates the right (or left) singular vectors of M.
> >> >>
> >> >> But my question is: how can I get the other singular vectors? I could
> >> >> transpose M, but then I have to calculate two SVDs, one for the right
> >> >> and one for the left singular vectors... I think there is a better way
> >> >> :)
> >> >>
> >> >> Hope you can help me with this...
> >> >> Thanks
> >> >> Stefan
> >> >>
> >> >>
> >> >> 2011/6/6 Danny Bickson <danny.bick...@gmail.com>:
> >> >> > Hi
> >> >> > Mahout's SVD implementation computes the Lanczos iteration:
> >> >> > http://en.wikipedia.org/wiki/Lanczos_algorithm
> >> >> > Denote the non-square input matrix as M. First a symmetric matrix A
> >> >> > is computed as A = M'*M.
> >> >> > Then an approximating tridiagonal matrix T and a matrix V of Lanczos
> >> >> > vectors are computed such that A =~ V*T*V'
> >> >> > (this process is done in a distributed way).
> >> >> >
> >> >> > Next, the matrix T is decomposed into eigenvectors and eigenvalues,
> >> >> > which is the returned result. (This process is serial.)
> >> >> >
> >> >> > The third step makes the returned eigenvectors orthogonal to each
> >> >> > other (which is optional IMHO).
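> >> >> >
> >> >> > A rough single-machine sketch of those steps in numpy (plain symmetric
> >> >> > Lanczos with full reorthogonalization; this only illustrates the idea
> >> >> > and is not the Mahout implementation):
> >> >> >
> >> >> > import numpy as np
> >> >> >
> >> >> > def lanczos(A, k, seed=0):
> >> >> >     """Return V (n x k) and tridiagonal T (k x k) with A ~ V T V'."""
> >> >> >     n = A.shape[0]
> >> >> >     rng = np.random.default_rng(seed)
> >> >> >     V = np.zeros((n, k))
> >> >> >     alphas, betas = np.zeros(k), np.zeros(k - 1)
> >> >> >     v = rng.standard_normal(n)
> >> >> >     V[:, 0] = v / np.linalg.norm(v)
> >> >> >     for j in range(k):
> >> >> >         w = A @ V[:, j]
> >> >> >         alphas[j] = V[:, j] @ w
> >> >> >         w -= alphas[j] * V[:, j]
> >> >> >         if j > 0:
> >> >> >             w -= betas[j - 1] * V[:, j - 1]
> >> >> >         # full reorthogonalization against all previous Lanczos vectors
> >> >> >         w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
> >> >> >         if j < k - 1:
> >> >> >             betas[j] = np.linalg.norm(w)
> >> >> >             V[:, j + 1] = w / betas[j]
> >> >> >     T = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
> >> >> >     return V, T
> >> >> >
> >> >> > rng = np.random.default_rng(4)
> >> >> > M = rng.random((30, 12))             # toy docs x terms
> >> >> > A = M.T @ M                          # step 1: symmetric A = M'*M
> >> >> > V, T = lanczos(A, 12)                # step 2: A ~ V*T*V'
> >> >> > theta, S = np.linalg.eigh(T)         # step 3 (serial): eigen-decompose T
> >> >> > ritz_vectors = V @ S                 # approximate eigenvectors of A
> >> >> > print(np.allclose(np.sort(theta), np.linalg.eigvalsh(A)))
> >> >> > # (the optional re-orthogonalization pass over the returned
> >> >> > #  eigenvectors is skipped in this sketch)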
> >> >> >
> >> >> > The heart of the code is found at:
> >> >> > ./math/src/main/java/org/apache/mahout/math/decomposer/lanczos/LanczosSolver.java
> >> >> > At least that is where it was in version 0.4; I am not sure whether
> >> >> > it has changed in version 0.5.
> >> >> >
> >> >> > Anyway, Mahout does not compute the SVD directly. If you are
> >> >> > interested in learning more about the relation to SVD, look at
> >> >> > http://en.wikipedia.org/wiki/Singular_value_decomposition,
> >> >> > subsection "Relation to eigenvalue decomposition".
> >> >> >
> >> >> > Hope this helps,
> >> >> >
> >> >> > Danny Bickson
> >> >> >
> >> >> > On Mon, Jun 6, 2011 at 9:35 AM, Stefan Wienert <ste...@wienert.cc> wrote:
> >> >> >
> >> >> >> After reading this thread:
> >> >> >>
> >> >> >> http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3caanlktinq5k4xrm7nabwn8qobxzgvobbot2rtjzsv4...@mail.gmail.com%3E
> >> >> >>
> >> >> >> Wiki-SVD: M = U S V* (* = transposed)
> >> >> >>
> >> >> >> The output of Mahout-SVD is (U S), right?
> >> >> >>
> >> >> >> So... how do I get V from (U S) and M?
> >> >> >>
> >> >> >> Is V = M (U S)* (because that is what the calculation in the
> >> >> >> example does)?
> >> >> >>
> >> >> >> Thanks
> >> >> >> Stefan
> >> >> >>
> >> >> >> 2011/6/6 Stefan Wienert <ste...@wienert.cc>:
> >> >> >> > https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
> >> >> >> >
> >> >> >> > What is done:
> >> >> >> >
> >> >> >> > Input:
> >> >> >> > tf-idf-matrix (docs x terms) 6076937 x 20444
> >> >> >> >
> >> >> >> > "SVD" of tf-idf-matrix (rank 100) produces the eigenvector (and
> >> >> >> > eigenvalues) of tf-idf-matrix, called:
> >> >> >> > svd (concepts x terms) 87 x 20444
> >> >> >> >
> >> >> >> > transpose tf-idf-matrix:
> >> >> >> > tf-idf-matrix-transpose (terms x docs) 20444 x 6076937
> >> >> >> >
> >> >> >> > transpose svd:
> >> >> >> > svd-transpose (terms x concepts) 20444 x 87
> >> >> >> >
> >> >> >> > matrix multiply (the multiplication job effectively computes A' x B):
> >> >> >> > tf-idf-matrix-transpose x svd-transpose = result
> >> >> >> > (terms x docs)' x (terms x concepts) = (docs x concepts)
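> >> >> >> >
> >> >> >> > If the "svd" rows really are (something like) the right singular
> >> >> >> > vectors, then in dense numpy terms the pipeline above boils down to
> >> >> >> > roughly this (toy sizes and made-up names, not the distributed jobs):
> >> >> >> >
> >> >> >> > import numpy as np
> >> >> >> >
> >> >> >> > rng = np.random.default_rng(5)
> >> >> >> > n_docs, n_terms, n_concepts = 100, 40, 10
> >> >> >> > tfidf = rng.random((n_docs, n_terms))      # docs x terms
> >> >> >> >
> >> >> >> > U, s, Vt = np.linalg.svd(tfidf, full_matrices=False)
> >> >> >> > svd_out = Vt[:n_concepts]                  # concepts x terms ("87 x 20444")
> >> >> >> >
> >> >> >> > # the final multiplication projects every document onto the concepts
> >> >> >> > doc_vectors = tfidf @ svd_out.T            # docs x concepts ("6076937 x 87")
> >> >> >> >
> >> >> >> > # which is exactly U*S restricted to the kept concepts
> >> >> >> > print(np.allclose(doc_vectors, U[:, :n_concepts] * s[:n_concepts]))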
> >> >> >> >
> >> >> >> > so... I do understand that the "svd" here is not the SVD from
> >> >> >> > Wikipedia. It only does the Lanczos algorithm and some magic which
> >> >> >> > produces the
> >> >> >> >> Instead either the left or right (but usually the right)
> >> >> >> >> eigenvectors premultiplied by the diagonal or the square root of
> >> >> >> >> the diagonal element.
> >> >> >> > from
> >> >> >> > http://mail-archives.apache.org/mod_mbox/mahout-user/201102.mbox/%3CAANLkTi=rta7tfrm8zi60vcfya5xf+dbfrj8pcds2n...@mail.gmail.com%3E
> >> >> >> >
> >> >> >> > so my question: what is the output of the SVD in Mahout? And what
> >> >> >> > do I have to calculate to get the "right singular vectors" from the
> >> >> >> > svd output?
> >> >> >> >
> >> >> >> > Thanks,
> >> >> >> > Stefan
> >> >> >> >
> >> >> >> > 2011/6/6 Stefan Wienert <ste...@wienert.cc>:
> >> >> >> >> https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
> >> >> >> >>
> >> >> >> >> the last step is the matrix multiplication:
> >> >> >> >>  --arg --numRowsA --arg 20444 \
> >> >> >> >>  --arg --numColsA --arg 6076937 \
> >> >> >> >>  --arg --numRowsB --arg 20444 \
> >> >> >> >>  --arg --numColsB --arg 87 \
> >> >> >> >> so the result is a 6,076,937 x 87 matrix
> >> >> >> >>
> >> >> >> >> the input has 6,076,937 documents (each with 20,444 terms), so
> >> >> >> >> the result of the matrix multiplication has to be the right
> >> >> >> >> singular vectors, judging by the dimensions.
> >> >> >> >>
> >> >> >> >> so the result is the "concept-document vector matrix" (which, I
> >> >> >> >> think, is also called the "document vectors"?)
> >> >> >> >>
> >> >> >> >> 2011/6/6 Ted Dunning <ted.dunn...@gmail.com>:
> >> >> >> >>> Yes.  These are term vectors, not document vectors.
> >> >> >> >>>
> >> >> >> >>> There is an additional step that can be run to produce document
> >> >> >> >>> vectors.
> >> >> >> >>>
> >> >> >> >>> On Sun, Jun 5, 2011 at 1:16 PM, Stefan Wienert <ste...@wienert.cc> wrote:
> >> >> >> >>>
> >> >> >> >>>> compared to SVD, is the result the "right singular vectors"?
> >> >> >> >>>>
> >> >> >> >>>
> --
> Stefan Wienert
>
> http://www.wienert.cc
> ste...@wienert.cc
>
> Telefon: +495251-2026838
> Mobil: +49176-40170270
>
