Well, one of the first things I tried was rank=4. None of my log statements were executed in that case. I had assumed from the discussion in the patch ( https://issues.apache.org/jira/browse/MAHOUT-369 ) that this was one of the issues sorted out since the previous release.

I am not especially worried about sign changes. After all, if A*x = \lambda*x, then A*(-x) = \lambda*(-x), i.e. both vectors lie in the same eigenspace. My main concern is the issue the patch is supposed to rectify: eigenvectors were not being matched properly with their associated eigenvalues.

On Jun 15, 2011, at 6:46 AM, Danny Bickson wrote:
> Hi Eshwaran!
> Can you please try with rank=4 and let me know what you get? If I recall
> correctly the requested rank should be 4, and
> then you get 3 eigenvalues. Take a look at:
> http://bickson.blogspot.com/2011/02/some-thoughts-about-accuracy-of-mahouts.html
>
> Regarding sign changes, I remember seeing this as well...
> The best way to debug is to download the Lanczos code I wrote in Matlab:
> http://www.cs.cmu.edu/~bickson/gabp/#download
> and then run iteration by iteration in the debugger. Instructions for setting up
> the debugging environment in Eclipse
> are found here:
> http://bickson.blogspot.com/2011/02/hadoopmahout-setting-up-development.html
>
> Best,
>
> DB
>
> On Tue, Jun 14, 2011 at 5:02 PM, Eshwaran Vijaya Kumar <
> evijayaku...@mozilla.com> wrote:
>
>> Hello all,
>> I am trying to compare the Mahout (0.5 RELEASE) Lanczos Solver results
>> with Matlab and am having issues satisfying myself regarding the
>> correctness of Mahout's output. I would appreciate some clarification from
>> someone who has looked at the code for a longer period of time than I have.
>> I used code similar to what Danny has done here (
>> https://issues.apache.org/jira/browse/MAHOUT-369 )
>>
>> I added to TestLanczosSolver the following code:
>>
>> @Test
>> public void testLanczosSolver2() throws Exception {
>>   int numRows = 3; int numCols = 3;
>>   SparseRowMatrix m = new SparseRowMatrix(new int[]{numRows, numCols});
>>   /**
>>    *
>>    *  3.1200 -3.1212 -3.0000
>>    * -3.1110  1.5000  2.1212
>>    * -7.0000 -8.0000 -4.0000
>>    *
>>    * */
>>   m.set(0,0,3.12);
>>   m.set(0,1,-3.12121);
>>   m.set(0,2,-3);
>>   m.set(1,0,-3.111);
>>   m.set(1,1,1.5);
>>   m.set(1,2,2.12122);
>>   m.set(2,0,-7);
>>   m.set(2,1,-8);
>>   m.set(2,2,-4);
>>
>>   int rank = 3;
>>   System.out.println("******** Starting Eshwaran's Tests *************");
>>   Vector initialVector = new DenseVector(numCols);
>>   initialVector.assign(1d / Math.sqrt(numCols));
>>   LanczosState state = new LanczosState(m, numCols, rank, initialVector);
>>   long time = timeLanczos(m, state, rank, false);
>>   assertTrue("Lanczos taking too long! Are you in the debugger? ", time < 10000);
>>   //assertOrthonormal(eigens);
>>   ////assertEigen(eigens, m, 0.1, false);
>> }
>>
>> Note that I had to slightly modify Danny's test code to get it working with
>> the (latest ?) Mahout API.
>>
>> I printed out the value of realEigen in LanczosSolver.java. I also
>> commented out the normalization step
>> ( //nextVector.assign(new Scale(1.0 / state.getScaleFactor())); )
>> as was recommended in that discussion.
>>
>> My output:
>>
>> ******** Starting Eshwaran's Tests -I *************
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Finding 3 singular vectors of matrix with 3 rows, via Lanczos
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: 1 passes through the corpus so far...
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: 2 passes through the corpus so far...
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal
>> auxiliary matrix.
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Eigenvector [0.5536042338073482, 0.7356862573677923, 0.39024105759229055] found with eigenvalue 0.0
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Eigenvector [0.16585402589950515, -0.5566129719645326, 0.8140481813343339] found with eigenvalue 4.755295040050496
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Eigenvector [0.8160972946919413, -0.38593746923689726, -0.43015982545504394] found with eigenvalue 129.2456107625402
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: LanczosSolver finished.
>>
>> Comparing with Matlab:
>>
>> A =
>>
>>     3.1200   -3.1212   -3.0000
>>    -3.1110    1.5000    2.1212
>>    -7.0000   -8.0000   -4.0000
>>
>> [a,b] = eig(A'*A)
>>
>> a =
>>
>>     0.2132   -0.8010   -0.5593
>>    -0.5785    0.3578   -0.7330
>>     0.7873    0.4799   -0.3871
>>
>> b =
>>
>>     0.0314         0         0
>>          0   42.6175         0
>>          0         0  131.2552
>>
>> Note that only one of the eigenvalues matches. Uncommenting the
>> normalization step obviously ensured that nothing matched. Furthermore,
>> there are sign changes in the eigenvectors and they don't appear to be
>> correctly matched up. For example, the eigenvector corresponding to value
>> (131) in Mahout's case is [0.8160972946919413, -0.38593746923689726,
>> -0.43015982545504394], which as you can see from the Matlab output is the
>> eigenvector associated with 42.61.
>>
>> Can someone clarify what I am missing here?
>>
>> Thanks in advance
>> Eshwaran
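
One way to check which eigenvalue a reported vector belongs to, without worrying about its sign, is the Rayleigh quotient v'*M*v / v'*v: it is identical for v and -v and equals the eigenvalue when v is an exact eigenvector of M. The sketch below is plain Java with no Mahout dependency. The class and method names are made up for illustration, and it assumes that the matrix of interest is A'*A (as in the Matlab call above). The numbers are copied from the quoted test and logs.

```java
import java.util.Arrays;

// Sketch only: class and method names are illustrative, not part of Mahout.
class RayleighCheck {

    // Test matrix A, copied from the m.set(...) calls in the quoted test.
    static final double[][] A = {
        { 3.12,  -3.12121, -3.0    },
        {-3.111,  1.5,      2.12122},
        {-7.0,   -8.0,     -4.0    }
    };

    // Vectors copied from the quoted Mahout log, in the order printed there
    // (logged with eigenvalues 0.0, 4.755..., 129.245...).
    static final double[][] MAHOUT_VECTORS = {
        {0.5536042338073482,   0.7356862573677923,   0.39024105759229055},
        {0.16585402589950515, -0.5566129719645326,   0.8140481813343339},
        {0.8160972946919413,  -0.38593746923689726, -0.43015982545504394}
    };

    // Eigenvalues of A'*A as printed by Matlab in the quoted message.
    static final double[] MATLAB_EIGENVALUES = {0.0314, 42.6175, 131.2552};

    /** Computes A^T * A. Assumption: this is the matrix being decomposed. */
    static double[][] gram(double[][] a) {
        int rows = a.length, cols = a[0].length;
        double[][] g = new double[cols][cols];
        for (int i = 0; i < cols; i++) {
            for (int j = 0; j < cols; j++) {
                for (int k = 0; k < rows; k++) {
                    g[i][j] += a[k][i] * a[k][j];
                }
            }
        }
        return g;
    }

    /** Rayleigh quotient v^T M v / v^T v. Replacing v by -v leaves it unchanged. */
    static double rayleighQuotient(double[][] m, double[] v) {
        double num = 0.0, den = 0.0;
        for (int i = 0; i < v.length; i++) {
            double mv = 0.0;
            for (int j = 0; j < v.length; j++) {
                mv += m[i][j] * v[j];
            }
            num += v[i] * mv;
            den += v[i] * v[i];
        }
        return num / den;
    }

    /** Index of the candidate closest to the given value. */
    static int nearestIndex(double value, double[] candidates) {
        int best = 0;
        for (int i = 1; i < candidates.length; i++) {
            if (Math.abs(candidates[i] - value) < Math.abs(candidates[best] - value)) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] ata = gram(A);
        for (double[] v : MAHOUT_VECTORS) {
            double rq = rayleighQuotient(ata, v);
            int k = nearestIndex(rq, MATLAB_EIGENVALUES);
            System.out.println(Arrays.toString(v) + " -> Rayleigh quotient " + rq
                + ", nearest Matlab eigenvalue " + MATLAB_EIGENVALUES[k]);
        }
    }
}
```

With the logged numbers, the vector printed next to eigenvalue 0.0 should come out close to Matlab's 131.2552, and the one printed next to 129.24 close to 42.6175. That is consistent with the mismatch described in the quoted message, although it says nothing about where in the solver the pairing goes wrong.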