Hello all,
  I am trying to compare the Mahout (0.5 release) Lanczos solver results with 
Matlab and am having trouble satisfying myself that Mahout's output is 
correct. I would appreciate some clarification from someone who has looked at 
the code for longer than I have. I used code similar to what Danny wrote here 
( https://issues.apache.org/jira/browse/MAHOUT-369 ).


I added to TestLanczosSolver the following code: 


  @Test
  public void testLanczosSolver2() throws Exception {
    int numRows = 3; int numCols = 3;
    SparseRowMatrix m = new SparseRowMatrix(new int[]{numRows, numCols});
    /*
     * Test matrix:
     *
     *    3.1200  -3.1212  -3.0000
     *   -3.1110   1.5000   2.1212
     *   -7.0000  -8.0000  -4.0000
     */
    m.set(0,0,3.12);
    m.set(0,1,-3.12121);
    m.set(0,2,-3);
    m.set(1,0,-3.111);
    m.set(1,1,1.5);
    m.set(1,2,2.12122);
    m.set(2,0,-7);
    m.set(2,1,-8);
    m.set(2,2,-4);

    int rank = 3;
    System.out.println("******** Starting Eshwaran's Tests *************");
    Vector initialVector = new DenseVector(numCols);
    initialVector.assign(1d / Math.sqrt(numCols));
    LanczosState state = new LanczosState(m, numCols, rank, initialVector);
    long time = timeLanczos(m, state, rank, false);
    assertTrue("Lanczos taking too long! Are you in the debugger? ", time < 10000);
    //assertOrthonormal(eigens);
    ////assertEigen(eigens, m, 0.1, false);
  }
 
Note that I had to modify Danny's test code slightly to get it working with the 
(latest?) Mahout API. 


I printed out the value of realEigen in LanczosSolver.java. I also commented 
out the normalization step,

    //nextVector.assign(new Scale(1.0 / state.getScaleFactor()));

as was recommended in that discussion. 

My output:

******** Starting Eshwaran's Tests -I *************
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Finding 3 singular vectors of matrix with 3 rows, via Lanczos
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: 1 passes through the corpus so far...
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: 2 passes through the corpus so far...
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal 
auxiliary matrix.
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector [0.5536042338073482, 0.7356862573677923, 0.39024105759229055] 
found with eigenvalue 0.0
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector [0.16585402589950515, -0.5566129719645326, 
0.8140481813343339] found with eigenvalue 4.755295040050496
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector [0.8160972946919413, -0.38593746923689726, 
-0.43015982545504394] found with eigenvalue 129.2456107625402
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: LanczosSolver finished.




Comparing with Matlab:

A =

    3.1200   -3.1212   -3.0000
   -3.1110    1.5000    2.1212
   -7.0000   -8.0000   -4.0000


 [a,b] = eig(A'*A)

a =

    0.2132   -0.8010   -0.5593
   -0.5785    0.3578   -0.7330
    0.7873    0.4799   -0.3871


b =

    0.0314         0         0
         0   42.6175         0
         0         0  131.2552




Note that only one of the eigenvalues matches. Re-enabling the normalization 
step obviously ensured that nothing matched. Furthermore, there are sign 
changes in the eigenvectors, and they don't appear to be paired with the 
correct eigenvalues. For example, the eigenvector that Mahout reports for its 
largest eigenvalue (129.25, versus Matlab's 131.26) is [0.8160972946919413, 
-0.38593746923689726, -0.43015982545504394], which, as you can see from the 
Matlab output, is approximately the negation of the eigenvector that Matlab 
associates with 42.6175. 
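
One way to check each reported pair without relying on either tool is to test 
the defining relation (A'A)v = lambda*v directly. Below is a minimal sketch in 
plain Java with no Mahout dependency. The class EigenCheck and its helper 
methods are hypothetical names made up for this illustration, and the Matlab 
vector is copied at the four-digit precision printed above, so its residual 
should be small but not exactly zero.

```java
public class EigenCheck {
    // The 3x3 test matrix, using the same entries as the m.set(...) calls above.
    static final double[][] A = {
        { 3.12,  -3.12121, -3.0    },
        {-3.111,  1.5,      2.12122},
        {-7.0,   -8.0,     -4.0    }
    };

    // Returns A^T * A, the symmetric matrix whose eigenpairs are being compared.
    static double[][] gram(double[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] g = new double[cols][cols];
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < rows; k++) {
                    g[i][j] += a[k][i] * a[k][j];
                }
            }
        }
        return g;
    }

    // Euclidean norm of (M v - lambda v); close to zero for a true eigenpair.
    static double residual(double[][] m, double[] v, double lambda) {
        double sum = 0.0;
        for (int i = 0; i < m.length; i++) {
            double r = -lambda * v[i];
            for (int j = 0; j < v.length; j++) {
                r += m[i][j] * v[j];
            }
            sum += r * r;
        }
        return Math.sqrt(sum);
    }

    // Absolute cosine of the angle between u and v; 1.0 means equal up to sign and scale.
    static double absCosine(double[] u, double[] v) {
        double dot = 0.0, uu = 0.0, vv = 0.0;
        for (int i = 0; i < u.length; i++) {
            dot += u[i] * v[i];
            uu += u[i] * u[i];
            vv += v[i] * v[i];
        }
        return Math.abs(dot) / (Math.sqrt(uu) * Math.sqrt(vv));
    }

    // Sum of the diagonal entries; equals the sum of the eigenvalues of m.
    static double trace(double[][] m) {
        double t = 0.0;
        for (int i = 0; i < m.length; i++) {
            t += m[i][i];
        }
        return t;
    }

    public static void main(String[] args) {
        double[][] g = gram(A);

        // Vector and eigenvalue copied from the Mahout log above.
        double[] mahoutV = {0.8160972946919413, -0.38593746923689726, -0.43015982545504394};
        double mahoutLambda = 129.2456107625402;

        // Second column of Matlab's eigenvector matrix and its eigenvalue.
        double[] matlabV = {-0.8010, 0.3578, 0.4799};
        double matlabLambda = 42.6175;

        System.out.println("trace(A'A)        = " + trace(g));
        System.out.println("residual (Mahout) = " + residual(g, mahoutV, mahoutLambda));
        System.out.println("residual (Matlab) = " + residual(g, matlabV, matlabLambda));
        System.out.println("|cos(angle)|      = " + absCosine(mahoutV, matlabV));
    }
}
```

Because an eigenvector is only defined up to sign, absCosine compares 
directions rather than components; a value near 1.0 means the two vectors 
agree up to sign. The eigenvalues of A'A also sum to its trace, which gives 
one more number to compare both outputs against.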


Can someone clarify what I am missing here?

Thanks in advance
Eshwaran









 

