The output from the LanczosSolver is not the final set of results.  The
fact that you passed --cleansvd "true" to the system means that you want it
to do some cleanup and remove any spurious singular vector/value pairs
(like the zero-eigenvalue case here).

Do you also see log output which looks like "appending {some vector info}
to {some path}" on your console output?  The vectors printed here should
include their eigenvalue and relative approximation error.

Your results also highlight a common fact about doing Lanczos on tiny
matrices: Lanczos is iterative, and can only iterate up to the overall
dimension of your input matrix, but it only finds an approximation to the
singular vectors / values, which gets better as the iterations continue.
This is why you are getting a very accurate measure of the top singular
value, but progressively worse ones for the lower values.

Try running this on a larger matrix (try 100 x 100), and look at, say, the
top 50 singular vector/value pairs.  Those should be significantly more
accurate, and the only reason you would want to do a *distributed* SVD is
if a) your data is HUGE, and b) you're only wanting to look at the top few
(up to maybe hundreds) singular vector/value pairs.  Point b) is a point of
practicality if you have point a).

  -jake

On Tue, Nov 8, 2011 at 8:56 AM, Ed Fine <edward.f...@gmail.com> wrote:

> I am a Mahout newbie, so I might be wrong, but I strongly suspect it has
> to do with one of your eigenvalues being 0. That implies a singular
> matrix. You will see that your first two eigenvalues are equal to the
> singular values. Parsing the structure in the smaller eigenvalues gets
> numerically unstable in a near-singular matrix. I bet that is your
> issue.  I think you can find a description of this issue in Numerical
> Linear Algebra by Trefethen and Bau.
>
> On Nov 8, 2011, at 4:11 AM, motta <motta....@gmail.com> wrote:
>
> > Hi everybody,
> > I have completed my first Mahout experiment with a Hadoop local
> > installation (single machine) and I obtained different results from
> > Scilab and the Mahout Distributed Lanczos Solver. Could someone explain
> > why this happens? Am I doing something wrong?
> >
> > This is my matrix
> > 2,0,8,6,0
> > 1,6,0,1,7
> > 5,0,7,4,0
> > 7,0,8,5,0
> > 0,10,0,0,7
> >
> > This is my Mahout invocation
> > ./hadoop jar
> > /home/hadoop-user/mahout/mahout-distribution-0.5/mahout-examples-0.5-job.jar
> > org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver --input
> > /user/hadoop-user/mahout-input --output /user/hadoop-user/mahout-output
> > --numCols 5 --numRows 5 --cleansvd "true" --rank 5
> >
> > These are the Mahout results
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: 4 passes through the corpus so far...
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 1.0869992925693057
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 3.4305998309907
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 3 found with eigenvalue 15.171371217397603
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 4 found with eigenvalue 17.918370809987454
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: LanczosSolver finished.
> >
> > And these are the results from Scilab (svd(X))
> > -->[U,S,V]=svd(X);
> > -->S
> > S  =
> >
> >    17.918371    0.           0.          0.           0.
> >    0.           15.171372    0.          0.           0.
> >    0.           0.           3.564002    0.           0.
> >    0.           0.           0.          1.9842282    0.
> >    0.           0.           0.          0.           0.3495557
> >
> > thank you,
> > Alfredo
> >
> >
> > --
> > View this message in context:
> http://lucene.472066.n3.nabble.com/Comparing-results-of-Mahout-SVD-and-Scilab-tp3490066p3490066.html
> > Sent from the Mahout User List mailing list archive at Nabble.com.
>
