Hi Eshwaran,

  The results of the SVD via Lanczos, for an asymmetric matrix (like the one
you have in this example), are the square roots of the eigenvalues of
a.times(a.transpose()).  Lanczos is also iterative (it's just fancy power
iteration, after all), so you won't converge well on a matrix of only rank 3
(which means you only iterate twice, since you start with an input seed
vector).
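
As a rough illustration of the relationship above (not Mahout's
implementation), the sketch below forms the symmetric product A^T A for the
3x3 matrix from the quoted message and runs plain power iteration on it.
A^T A and A A^T share the same nonzero eigenvalues, so either product works
here. The class and method names (SingularValueSketch, gram,
dominantEigenvalue) are hypothetical, and it uses plain arrays rather than
Mahout or COLT classes.

```java
import java.util.Arrays;

// Sketch only: plain Java arrays, no Mahout or COLT classes.
class SingularValueSketch {

    // Returns the Gram matrix A^T A, which is symmetric even when A is not.
    static double[][] gram(double[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] g = new double[cols][cols];
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < cols; j++) {
                double sum = 0.0;
                for (int k = 0; k < rows; k++) {
                    sum += a[k][i] * a[k][j];
                }
                g[i][j] = sum;
            }
        }
        return g;
    }

    // Plain power iteration on a symmetric matrix s, starting from a
    // uniform unit vector. Returns the Rayleigh quotient v^T s v, which
    // estimates the eigenvalue of largest magnitude.
    static double dominantEigenvalue(double[][] s, int iterations) {
        int n = s.length;
        double[] v = new double[n];
        Arrays.fill(v, 1.0 / Math.sqrt(n));
        double lambda = 0.0;
        for (int it = 0; it < iterations; it++) {
            double[] w = new double[n];
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    w[i] += s[i][j] * v[j];
                }
            }
            double dot = 0.0;
            double normSq = 0.0;
            for (int i = 0; i < n; i++) {
                dot += v[i] * w[i];       // v has unit length, so this is v^T s v
                normSq += w[i] * w[i];
            }
            lambda = dot;
            double norm = Math.sqrt(normSq);
            if (norm == 0.0) {
                break;                    // s annihilated v; nothing left to iterate
            }
            for (int i = 0; i < n; i++) {
                v[i] = w[i] / norm;       // renormalize for the next pass
            }
        }
        return lambda;
    }

    public static void main(String[] args) {
        double[][] a = {
            { 3.12,  -3.12121, -3.0},
            {-3.111,  1.5,      2.12122},
            {-7.0,   -8.0,     -4.0}
        };
        double lambda = dominantEigenvalue(gram(a), 200);
        System.out.println("largest eigenvalue of A^T A: " + lambda);
        System.out.println("largest singular value of A: " + Math.sqrt(lambda));
    }
}
```

The first printed value should land close to the largest entry of b in the
Matlab output quoted below; its square root is the corresponding singular
value.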

  Can you try a) using a larger matrix, say 10x10, and b) taking the
square roots of the values you get from the Matlab eigen decomposition
of the (square of the) input matrix, and tell me if your results look
closer?

  We currently do effectively this in
TestLanczosSolver.testEigenvalueCheck(), but only for a symmetric matrix,
by comparing Lanczos against COLT's eigen decomposition code (see lines 48-49
in that test - this is calling out to COLT, which we absorbed).  As you can
see from the output, the errors in comparison to COLT, on a 100x100
symmetric matrix, are around 10^-16 or smaller for the first 10
eigenvectors.  If you print out Math.abs((s-e)/e) from line 56, you can see
the fractional error in the eigenvalue as well, which stays less than 10^-10
all the way out past eigenvalue 45, but then quickly passes 1% error at
around eigenvalue 55, and gets completely horrible after eigenvalue 60.
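
The fractional error mentioned above is simply |s - e| / |e|. Here is a
minimal sketch of that check, assuming s is the Lanczos estimate and e is the
reference eigenvalue (the class and method names are hypothetical, not part
of the Mahout test). It is applied to the two larger eigenvalue pairs from
the quoted message below.

```java
// Sketch only: the fractional (relative) error between an estimate and a reference.
class FractionalErrorSketch {

    // s: estimated eigenvalue (assumed to come from Lanczos)
    // e: reference eigenvalue (assumed to come from a full decomposition)
    static double fractionalError(double s, double e) {
        return Math.abs((s - e) / e);
    }

    public static void main(String[] args) {
        // Mahout value vs. Matlab value for the largest eigenvalue
        System.out.println(fractionalError(129.2456107625402, 131.2552));
        // Mahout value vs. Matlab value for the middle eigenvalue
        System.out.println(fractionalError(4.755295040050496, 42.6175));
    }
}
```

On those numbers the largest eigenvalue is off by roughly 1.5%, while the
middle one is off by close to 89%, which fits the picture of only the leading
value being usable after two iterations.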

  Lanczos is designed to get the first few eigenvectors and eigenvalues, and
does a very good job at getting those vector/value pairs to very high
accuracy, but don't expect an accurate "full rank" decomposition from this
algorithm, especially on small matrices.

  -jake

On Tue, Jun 14, 2011 at 2:02 PM, Eshwaran Vijaya Kumar <
[email protected]> wrote:

> Hello all,
>  I am trying to compare the Mahout (0.5 RELEASE) Lanczos Solver results
> with Matlab and am having trouble satisfying myself regarding the
> correctness of Mahout's output.  I would appreciate some clarification from
> someone who has looked at the code for a longer period of time than I have.
> I used code similar to what Danny did here (
> https://issues.apache.org/jira/browse/MAHOUT-369 )
>
>
> I added to TestLanczosSolver the following code:
>
>
>  @Test
>  public void testLanczosSolver2() throws Exception {
>    int numRows = 3; int numCols = 3;
>    SparseRowMatrix m = new SparseRowMatrix(new int[]{numRows, numCols});
>    /**
>     *  3.1200  -3.1212  -3.0000
>     * -3.1110   1.5000   2.1212
>     * -7.0000  -8.0000  -4.0000
>     */
>    m.set(0,0,3.12);
>    m.set(0,1,-3.12121);
>    m.set(0,2,-3);
>    m.set(1,0,-3.111);
>    m.set(1,1,1.5);
>    m.set(1,2,2.12122);
>    m.set(2,0,-7);
>    m.set(2,1,-8);
>    m.set(2,2,-4);
>
>    int rank = 3;
>    System.out.println("******** Starting Eshwaran's Tests *************");
>    Vector initialVector = new DenseVector(numCols);
>    initialVector.assign(1d / Math.sqrt(numCols));
>    LanczosState state = new LanczosState(m, numCols, rank, initialVector);
>    long time = timeLanczos(m, state, rank, false);
>    assertTrue("Lanczos taking too long! Are you in the debugger? ", time < 10000);
>    //assertOrthonormal(eigens);
>    ////assertEigen(eigens, m, 0.1, false);
>   }
>
> Note that I had to slightly modify Danny's test code to get it working with
> the (latest ?) Mahout API.
>
>
> I printed out the value of realEigen in LanczosSolver.java. I also
> commented out the normalization step
> ( //nextVector.assign(new Scale(1.0 / state.getScaleFactor())); )
> as was recommended in that discussion.
>
> My output:
>
> ******** Starting Eshwaran's Tests -I *************
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Finding 3 singular vectors of matrix with 3 rows, via Lanczos
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: 1 passes through the corpus so far...
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: 2 passes through the corpus so far...
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal
> auxiliary matrix.
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Eigenvector [0.5536042338073482, 0.7356862573677923,
> 0.39024105759229055] found with eigenvalue 0.0
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Eigenvector [0.16585402589950515, -0.5566129719645326,
> 0.8140481813343339] found with eigenvalue 4.755295040050496
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Eigenvector [0.8160972946919413, -0.38593746923689726,
> -0.43015982545504394] found with eigenvalue 129.2456107625402
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: LanczosSolver finished.
>
>
>
>
> Comparing  with Matlab.
>
> A =
>
>    3.1200   -3.1212   -3.0000
>   -3.1110    1.5000    2.1212
>   -7.0000   -8.0000   -4.0000
>
>
>  [a,b] = eig(A'*A)
>
> a =
>
>    0.2132   -0.8010   -0.5593
>   -0.5785    0.3578   -0.7330
>    0.7873    0.4799   -0.3871
>
>
> b =
>
>    0.0314         0         0
>         0   42.6175         0
>         0         0  131.2552
>
>
>
>
> Note that only one of the eigenvalues matches. Uncommenting the
> normalization step obviously ensured that nothing matched. Furthermore,
> there are sign changes in the eigenvectors and they don't appear to be
> correctly matched up. For example, the eigenvector corresponding to value
> (131) in Mahout's case is [0.8160972946919413, -0.38593746923689726,
> -0.43015982545504394]  which, as you can see from the Matlab output, is the
> eigenvector associated with 42.61.
>
>
> Can someone clarify what I am missing here?
>
> Thanks in advance
> Eshwaran
