Well, I think one of the first things I tried was using rank=4. None of my log
statements were executed in that case, and I assumed from the discussion in
the patch ( https://issues.apache.org/jira/browse/MAHOUT-369 ) that this was
one of the issues sorted out in the previous release. I am not especially worried
about sign changes: after all, if A*x = \lambda*x, then A*(-x) = \lambda*(-x),
i.e. both vectors lie in the same eigenspace. My main concern is the issue the
patch is supposed to rectify: eigenvectors were not being properly matched with
their associated eigenvalues.
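One sign-independent way to check the eigenvalue/eigenvector pairing is the Rayleigh quotient r(x) = x'Mx / x'x with M = A'*A, the same matrix the Matlab comparison below passes to eig(). Since r(-x) = r(x), a flipped sign cannot affect it, and for an exact eigenvector it equals the corresponding eigenvalue. The plain-Java sketch below has no Mahout dependency; the class and method names are illustrative only, and it assumes Mahout's reported vectors are meant to be eigenvectors of A'*A, as that comparison does. Worked through by hand with the three vectors from the log quoted below, the quotients come out at roughly 131.25, 0.18 and 42.48 (close to Matlab's 131.2552, 0.0314 and 42.6175), whereas Mahout reported 0.0, 4.755 and 129.2456 for those same vectors.

```java
// Hypothetical diagnostic, not part of Mahout: sign-insensitive check of which
// eigenvalue of A'A each reported vector actually belongs to.
final class RayleighCheck {

    // Gram matrix A'A: entry (i, j) is the sum over rows k of A[k][i] * A[k][j].
    static double[][] gram(double[][] a) {
        int n = a[0].length;
        double[][] g = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (double[] row : a) {
                    g[i][j] += row[i] * row[j];
                }
            }
        }
        return g;
    }

    // Rayleigh quotient x'Mx / x'x; identical for x and -x.
    static double rayleigh(double[][] m, double[] x) {
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < x.length; i++) {
            double mx = 0.0; // i-th entry of M * x
            for (int j = 0; j < x.length; j++) {
                mx += m[i][j] * x[j];
            }
            num += x[i] * mx;
            den += x[i] * x[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        // The 3x3 test matrix from the unit test quoted below.
        double[][] a = {
            {3.12, -3.12121, -3},
            {-3.111, 1.5, 2.12122},
            {-7, -8, -4}
        };
        // Eigenvectors and eigenvalues as printed in the Mahout log quoted below.
        double[][] mahoutVectors = {
            {0.5536042338073482, 0.7356862573677923, 0.39024105759229055},
            {0.16585402589950515, -0.5566129719645326, 0.8140481813343339},
            {0.8160972946919413, -0.38593746923689726, -0.43015982545504394}
        };
        double[] mahoutValues = {0.0, 4.755295040050496, 129.2456107625402};

        double[][] g = gram(a);
        for (int k = 0; k < mahoutVectors.length; k++) {
            System.out.printf("reported eigenvalue %.4f, Rayleigh quotient %.4f%n",
                mahoutValues[k], rayleigh(g, mahoutVectors[k]));
        }
    }
}
```

If a vector's quotient is far from the eigenvalue printed next to it, the vector and value were not paired correctly, regardless of the vector's sign.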
On Jun 15, 2011, at 6:46 AM, Danny Bickson wrote:

> Hi Eshwaran!
> Can you please try with rank=4 and let me know what you get? If I recall
> correctly, the requested rank should be 4, and then you get 3 eigenvalues.
> Take a look at:
> http://bickson.blogspot.com/2011/02/some-thoughts-about-accuracy-of-mahouts.html
> 
> Regarding sign changes, I remember seeing this as well...
> The best way to debug is to download the Lanczos code I wrote in Matlab:
> http://www.cs.cmu.edu/~bickson/gabp/#download
> and then run it iteration by iteration in the debugger. Instructions for
> setting up the debugging environment in Eclipse can be found here:
> http://bickson.blogspot.com/2011/02/hadoopmahout-setting-up-development.html
> 
> Best,
> 
> DB
> 
> On Tue, Jun 14, 2011 at 5:02 PM, Eshwaran Vijaya Kumar <evijayaku...@mozilla.com> wrote:
> 
>> Hello all,
>> I am trying to compare the Mahout (0.5 RELEASE) Lanczos Solver results
>> with Matlab and am having trouble convincing myself that Mahout's output
>> is correct. I would appreciate some clarification from someone who has
>> looked at the code for longer than I have. I used code similar to what
>> Danny used here ( https://issues.apache.org/jira/browse/MAHOUT-369 ).
>> 
>> 
>> I added the following test to TestLanczosSolver:
>> 
>> 
>> @Test
>> public void testLanczosSolver2() throws Exception {
>>   int numRows = 3; int numCols = 3;
>>   SparseRowMatrix m = new SparseRowMatrix(new int[]{numRows, numCols});
>>   /*
>>    * Test matrix A (entries rounded to 4 decimal places):
>>    *
>>    *    3.1200  -3.1212  -3.0000
>>    *   -3.1110   1.5000   2.1212
>>    *   -7.0000  -8.0000  -4.0000
>>    */
>>   m.set(0,0,3.12);
>>   m.set(0,1,-3.12121);
>>   m.set(0,2,-3);
>>   m.set(1,0,-3.111);
>>   m.set(1,1,1.5);
>>   m.set(1,2,2.12122);
>>   m.set(2,0,-7);
>>   m.set(2,1,-8);
>>   m.set(2,2,-4);
>> 
>>   int rank = 3;
>>   System.out.println("******** Starting Eshwaran's Tests *************");
>>   // Start Lanczos from the unit-length vector with all entries equal.
>>   Vector initialVector = new DenseVector(numCols);
>>   initialVector.assign(1d / Math.sqrt(numCols));
>>   LanczosState state = new LanczosState(m, numCols, rank, initialVector);
>>   long time = timeLanczos(m, state, rank, false);
>>   assertTrue("Lanczos taking too long! Are you in the debugger? ", time < 10000);
>>   //assertOrthonormal(eigens);
>>   //assertEigen(eigens, m, 0.1, false);
>> }
>> 
>> Note that I had to modify Danny's test code slightly to get it working with
>> the latest(?) Mahout API.
>> 
>> 
>> I printed out the value of realEigen in LanczosSolver.java. I also
>> commented out the normalization step
>> ( //nextVector.assign(new Scale(1.0 / state.getScaleFactor())); ),
>> as was recommended in that discussion.
>> 
>> My output:
>> 
>> ******** Starting Eshwaran's Tests -I *************
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Finding 3 singular vectors of matrix with 3 rows, via Lanczos
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: 1 passes through the corpus so far...
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: 2 passes through the corpus so far...
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Lanczos iteration complete - now to diagonalize the tri-diagonal
>> auxiliary matrix.
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Eigenvector [0.5536042338073482, 0.7356862573677923,
>> 0.39024105759229055] found with eigenvalue 0.0
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Eigenvector [0.16585402589950515, -0.5566129719645326,
>> 0.8140481813343339] found with eigenvalue 4.755295040050496
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: Eigenvector [0.8160972946919413, -0.38593746923689726,
>> -0.43015982545504394] found with eigenvalue 129.2456107625402
>> Jun 14, 2011 1:54:12 PM org.slf4j.impl.JCLLoggerAdapter info
>> INFO: LanczosSolver finished.
>> 
>> Comparing with Matlab:
>> 
>> A =
>> 
>>   3.1200   -3.1212   -3.0000
>>  -3.1110    1.5000    2.1212
>>  -7.0000   -8.0000   -4.0000
>> 
>> 
>> [a,b] = eig(A'*A)
>> 
>> a =
>> 
>>   0.2132   -0.8010   -0.5593
>>  -0.5785    0.3578   -0.7330
>>   0.7873    0.4799   -0.3871
>> 
>> 
>> b =
>> 
>>   0.0314         0         0
>>        0   42.6175         0
>>        0         0  131.2552
>> 
>> Note that only one of the eigenvalues matches. Leaving the normalization
>> step in place (i.e. uncommented) meant that nothing matched at all.
>> Furthermore, there are sign changes in the eigenvectors, and the vectors
>> don't appear to be correctly paired with their eigenvalues. For example,
>> the eigenvector Mahout associates with its largest eigenvalue (129.2456,
>> vs. 131.2552 in Matlab) is [0.8160972946919413, -0.38593746923689726,
>> -0.43015982545504394], which, as you can see from the Matlab output, is
>> (up to sign) the eigenvector associated with 42.6175.
>> 
>> 
>> Can someone clarify what I am missing here?
>> 
>> Thanks in advance
>> Eshwaran
>> 
