Hi Eshwaran,

The results of the SVD via Lanczos, for an asymmetric matrix (like the one you have in this example), are the square roots of the eigenvalues of a.times(a.transpose()). Lanczos is also iterative (it's just fancy power iteration, after all), so you won't converge well on a matrix of rank only 3 (which means you only iterate twice, since you start with an input seed vector).
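
If it helps to see that relationship concretely, here is a minimal sketch in plain Java. It is ordinary power iteration on A^T A, not Lanczos and not Mahout's LanczosSolver, and the class and method names (GramPowerIteration, timesGram, largestGramEigenvalue) are made up for illustration. It only recovers the largest eigenvalue, whose square root is the largest singular value of A.

```java
import java.util.Arrays;

// Illustrative sketch only: plain power iteration on A^T A, not Mahout's LanczosSolver.
final class GramPowerIteration {

    // Computes A^T (A x) without forming A^T A explicitly.
    static double[] timesGram(double[][] a, double[] x) {
        int rows = a.length;
        int cols = a[0].length;
        double[] y = new double[cols];
        for (int i = 0; i < rows; i++) {
            double axi = 0.0;
            for (int j = 0; j < cols; j++) {
                axi += a[i][j] * x[j];
            }
            for (int j = 0; j < cols; j++) {
                y[j] += a[i][j] * axi;
            }
        }
        return y;
    }

    // Estimates the largest eigenvalue of A^T A, starting from a uniform unit seed vector.
    static double largestGramEigenvalue(double[][] a, int iterations) {
        int cols = a[0].length;
        double[] x = new double[cols];
        Arrays.fill(x, 1.0 / Math.sqrt(cols));
        double lambda = 0.0;
        for (int k = 0; k < iterations; k++) {
            double[] y = timesGram(a, x);
            double rayleigh = 0.0;
            double normSq = 0.0;
            for (int j = 0; j < cols; j++) {
                rayleigh += x[j] * y[j];   // Rayleigh quotient, since x has unit length
                normSq += y[j] * y[j];
            }
            lambda = rayleigh;
            if (normSq == 0.0) {
                break; // the seed lies in the null space, so there is nothing to iterate on
            }
            double norm = Math.sqrt(normSq);
            for (int j = 0; j < cols; j++) {
                x[j] = y[j] / norm;        // renormalize for the next pass
            }
        }
        return lambda;
    }

    public static void main(String[] args) {
        // The 3x3 matrix from the test in the quoted message.
        double[][] a = {
            {3.12, -3.12121, -3.0},
            {-3.111, 1.5, 2.12122},
            {-7.0, -8.0, -4.0}
        };
        double lambda = largestGramEigenvalue(a, 100);
        System.out.println("largest eigenvalue of A^T A: " + lambda);
        System.out.println("largest singular value of A: " + Math.sqrt(lambda));
    }
}
```

On the 3x3 matrix from your test, the first printed value should land close to the 131.2552 that Matlab reports for eig(A'*A). The nonzero eigenvalues of A^T A and A A^T coincide, so either product gives the same singular values.
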
Can you try a) using a larger matrix, say 10x10, and b) taking the square roots of the values you get from the Matlab eigendecomposition of the (square of the) input matrix, and tell me if your results look closer?

We currently effectively do this in TestLanczosSolver.testEigenvalueCheck(), but only for a symmetric matrix, by comparing Lanczos against COLT's eigendecomposition code (see lines 48-49 in that test; this is calling out to COLT, which we absorbed). As you can see from the output, the errors in comparison to COLT, on a 100x100 symmetric matrix, are around 10^-16 or smaller for the first 10 eigenvectors. If you print out Math.abs((s-e)/e) from line 56, you can see the fractional error in the eigenvalue as well. It stays below 10^-10 all the way out past eigenvalue 45, but then quickly passes 1% error at around eigenvalue 55 and gets completely horrible after eigenvalue 60.

Lanczos is designed to get the first few eigenvectors and eigenvalues, and does a very good job at getting those vector/value pairs to very high accuracy, but don't expect an accurate "full rank" decomposition from this algorithm, especially on small matrices.

-jake

On Tue, Jun 14, 2011 at 2:02 PM, Eshwaran Vijaya Kumar < [email protected]> wrote:

> Hello all,
> I am trying to compare the Mahout (0.5 RELEASE) Lanczos Solver Results
> with Matlab and am having issues satisfying myself regarding the
> correctness of Mahout's output. I would appreciate some clarification from
> someone who has looked at the code for a longer period of time than I have.
> I used a similar code to what Danny has done here (
> https://issues.apache.org/jira/browse/MAHOUT-369 )
>
> I added to TestLanczosSolver the following code:
>
> @Test
> public void testLanczosSolver2() throws Exception {
>   int numRows = 3; int numCols = 3;
>   SparseRowMatrix m = new SparseRowMatrix(new int[]{numRows, numCols});
>   /**
>    *
>    *  3.1200 -3.1212 -3.0000
>    * -3.1110  1.5000  2.1212
>    * -7.0000 -8.0000 -4.0000
>    *
>    * */
>   m.set(0,0,3.12);
>   m.set(0,1,-3.12121);
>   m.set(0,2,-3);
>   m.set(1,0,-3.111);
>   m.set(1,1,1.5);
>   m.set(1,2,2.12122);
>   m.set(2,0,-7);
>   m.set(2,1,-8);
>   m.set(2,2,-4);
>
>   int rank = 3;
>   System.out.println("******** Starting Eshwaran's Tests *************");
>   Vector initialVector = new DenseVector(numCols);
>   initialVector.assign(1d / Math.sqrt(numCols));
>   LanczosState state = new LanczosState(m, numCols, rank, initialVector);
>   long time = timeLanczos(m, state, rank, false);
>   assertTrue("Lanczos taking too long! Are you in the debugger? ", time < 10000);
>   //assertOrthonormal(eigens);
>   ////assertEigen(eigens, m, 0.1, false);
> }
>
> Note that I had to slightly modify Danny's test code to get it working with
> the (latest ?) Mahout API.
>
> I printed out the value of realEigen in LanczosSolver.java. I also
> commented the normalization step ( //nextVector.assign(new Scale(1.0 /
> state.getScaleFactor()));
> ) as was recommended in that discussion.
>
> My output:
>
> ******** Starting Eshwaran's Tests -I *************
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Finding 3 singular vectors of matrix with 3 rows, via Lanczos
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: 1 passes through the corpus so far...
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: 2 passes through the corpus so far...
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal
> auxiliary matrix.
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Eigenvector [0.5536042338073482, 0.7356862573677923,
> 0.39024105759229055] found with eigenvalue 0.0
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Eigenvector [0.16585402589950515, -0.5566129719645326,
> 0.8140481813343339] found with eigenvalue 4.755295040050496
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: Eigenvector [0.8160972946919413, -0.38593746923689726,
> -0.43015982545504394] found with eigenvalue 129.2456107625402
> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
> INFO: LanczosSolver finished.
>
> Comparing with Matlab.
>
> A =
>
>     3.1200   -3.1212   -3.0000
>    -3.1110    1.5000    2.1212
>    -7.0000   -8.0000   -4.0000
>
> [a,b] = eig(A'*A)
>
> a =
>
>     0.2132   -0.8010   -0.5593
>    -0.5785    0.3578   -0.7330
>     0.7873    0.4799   -0.3871
>
> b =
>
>     0.0314         0          0
>          0   42.6175          0
>          0         0   131.2552
>
> Note that only one of the Eigen Values matches. Uncommenting the
> normalization step obviously ensured that nothing matched. Furthermore,
> there are sign changes in the eigen vectors and they don't appear to be
> correctly matched up. For example, the eigen vector corresponding to value
> (131) in Mahout's case is [0.8160972946919413, -0.38593746923689726,
> -0.43015982545504394] which as you can see from the Matlab output is the
> Eigen Vector associated with 42.61.
>
> Can someone clarify what I am missing here?
>
> Thanks in advance
> Eshwaran
