Hello all,
I am trying to compare the results of the Mahout (0.5 release) Lanczos solver
with Matlab, and I am having trouble satisfying myself that Mahout's output is
correct. I would appreciate some clarification from someone who has looked at
the code for longer than I have. I used code similar to what Danny did here (
https://issues.apache.org/jira/browse/MAHOUT-369 ).
I added the following code to TestLanczosSolver:
@Test
public void testLanczosSolver2() throws Exception {
  int numRows = 3;
  int numCols = 3;
  SparseRowMatrix m = new SparseRowMatrix(new int[]{numRows, numCols});
  /*
   *  3.1200 -3.1212 -3.0000
   * -3.1110  1.5000  2.1212
   * -7.0000 -8.0000 -4.0000
   */
  m.set(0, 0, 3.12);
  m.set(0, 1, -3.12121);
  m.set(0, 2, -3);
  m.set(1, 0, -3.111);
  m.set(1, 1, 1.5);
  m.set(1, 2, 2.12122);
  m.set(2, 0, -7);
  m.set(2, 1, -8);
  m.set(2, 2, -4);
  int rank = 3;
  System.out.println("******** Starting Eshwaran's Tests *************");
  Vector initialVector = new DenseVector(numCols);
  initialVector.assign(1d / Math.sqrt(numCols));
  LanczosState state = new LanczosState(m, numCols, rank, initialVector);
  long time = timeLanczos(m, state, rank, false);
  assertTrue("Lanczos taking too long! Are you in the debugger? ", time < 10000);
  //assertOrthonormal(eigens);
  ////assertEigen(eigens, m, 0.1, false);
}
Note that I had to slightly modify Danny's test code to get it working with the
(latest?) Mahout API.
I printed out the value of realEigen in LanczosSolver.java. I also commented out
the normalization step ( //nextVector.assign(new Scale(1.0 /
state.getScaleFactor()));
) as was recommended in that discussion.
My output:
******** Starting Eshwaran's Tests -I *************
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Finding 3 singular vectors of matrix with 3 rows, via Lanczos
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: 1 passes through the corpus so far...
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: 2 passes through the corpus so far...
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal
auxiliary matrix.
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector [0.5536042338073482, 0.7356862573677923, 0.39024105759229055]
found with eigenvalue 0.0
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector [0.16585402589950515, -0.5566129719645326,
0.8140481813343339] found with eigenvalue 4.755295040050496
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: Eigenvector [0.8160972946919413, -0.38593746923689726,
-0.43015982545504394] found with eigenvalue 129.2456107625402
Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
INFO: LanczosSolver finished.
Comparing with Matlab:
A =
3.1200 -3.1212 -3.0000
-3.1110 1.5000 2.1212
-7.0000 -8.0000 -4.0000
[a,b] = eig(A'*A)
a =
0.2132 -0.8010 -0.5593
-0.5785 0.3578 -0.7330
0.7873 0.4799 -0.3871
b =
0.0314 0 0
0 42.6175 0
0 0 131.2552
Note that only one of the eigenvalues matches, and only approximately (129.25
from Mahout vs. 131.26 from Matlab). Re-enabling the normalization step, as
expected, ensured that nothing matched. Furthermore, there are sign changes in
the eigenvectors, and they don't appear to be matched up with the correct
eigenvalues. For example, the eigenvector Mahout reports for its largest
eigenvalue (129.25, vs. 131.26 in Matlab) is [0.8160972946919413,
-0.38593746923689726, -0.43015982545504394], which, as you can see from the
Matlab output, is (up to sign) the eigenvector associated with 42.6175.
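To check the reported pairs independently of both Mahout and Matlab, one option
is to form A'A directly and measure how far each (value, vector) pair is from
satisfying A'A v = lambda v. Below is a minimal sketch in plain Java. The class
EigenCheck and all of its method names are hypothetical helpers written for
this post and are not part of Mahout. It also prints the trace of A'A, which
must equal the sum of all its eigenvalues, and compares vectors up to sign,
since an eigenvector is only defined up to a scalar factor.

```java
// Hypothetical helper for illustration only; not part of Mahout or its tests.
final class EigenCheck {

    /** Gram matrix A^T * A of a dense row-major matrix. */
    public static double[][] gram(double[][] a) {
        int rows = a.length;
        int cols = a[0].length;
        double[][] g = new double[cols][cols];
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < cols; j++) {
                double sum = 0.0;
                for (int k = 0; k < rows; k++) {
                    sum += a[k][i] * a[k][j];
                }
                g[i][j] = sum;
            }
        }
        return g;
    }

    public static double dot(double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += x[i] * y[i];
        }
        return sum;
    }

    public static double[] multiply(double[][] m, double[] v) {
        double[] out = new double[m.length];
        for (int i = 0; i < m.length; i++) {
            out[i] = dot(m[i], v);
        }
        return out;
    }

    /** Sum of the diagonal; equals the sum of all eigenvalues. */
    public static double trace(double[][] m) {
        double sum = 0.0;
        for (int i = 0; i < m.length; i++) {
            sum += m[i][i];
        }
        return sum;
    }

    /** Rayleigh quotient v^T G v / v^T v: the eigenvalue estimate implied by v. */
    public static double rayleigh(double[][] g, double[] v) {
        return dot(v, multiply(g, v)) / dot(v, v);
    }

    /** Relative residual ||G v - lambda v|| / ||v||; near zero for a true eigenpair. */
    public static double residual(double[][] g, double[] v, double lambda) {
        double[] gv = multiply(g, v);
        double sq = 0.0;
        for (int i = 0; i < v.length; i++) {
            double d = gv[i] - lambda * v[i];
            sq += d * d;
        }
        return Math.sqrt(sq / dot(v, v));
    }

    /** |cos| of the angle between x and y; 1.0 means equal up to sign and scale. */
    public static double absCosine(double[] x, double[] y) {
        return Math.abs(dot(x, y)) / Math.sqrt(dot(x, x) * dot(y, y));
    }

    public static void main(String[] args) {
        double[][] a = {
            { 3.12,  -3.12121, -3.0},
            {-3.111,  1.5,      2.12122},
            {-7.0,   -8.0,     -4.0}
        };
        double[][] g = gram(a);

        // Eigenpairs copied from the Mahout log output in this post.
        double[] values = {0.0, 4.755295040050496, 129.2456107625402};
        double[][] vectors = {
            {0.5536042338073482, 0.7356862573677923, 0.39024105759229055},
            {0.16585402589950515, -0.5566129719645326, 0.8140481813343339},
            {0.8160972946919413, -0.38593746923689726, -0.43015982545504394}
        };
        // Second column of Matlab's eigenvector matrix (eigenvalue 42.6175).
        double[] matlabCol2 = {-0.8010, 0.3578, 0.4799};

        System.out.printf("trace(A'A) = %.4f%n", trace(g));
        for (int i = 0; i < values.length; i++) {
            System.out.printf("reported=%.4f rayleigh=%.4f residual=%.4f%n",
                values[i], rayleigh(g, vectors[i]), residual(g, vectors[i], values[i]));
        }
        System.out.printf("|cos| vs Matlab column 2 = %.4f%n",
            absCosine(vectors[2], matlabCol2));
    }
}
```

If a reported value differs noticeably from the Rayleigh quotient of its own
vector, or the residual is far from zero, that pair is not an eigenpair of A'A,
whatever the sign or ordering conventions are.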
Can someone clarify what I am missing here?
Thanks in advance
Eshwaran