The output from the LanczosSolver is not the final set of results. The fact that you passed --cleansvd "true" to the system means that you want it to do some cleanup and remove any spurious singular vector/value pairs (like the zero-eigenvalue case here).
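
To make the cleanup idea concrete, here is a rough NumPy sketch of the kind of check such a pass could apply. This is not Mahout's code: the function name `relative_error`, the residual formula, and the `MAX_ERROR` threshold are all assumptions chosen for illustration. It takes a candidate (singular value, right singular vector) pair for the 5 x 5 matrix from this thread and measures how far it is from satisfying A^T A v = sigma^2 v.

```python
import numpy as np

def relative_error(a, sigma, v):
    """Relative residual of a candidate (singular value, right singular vector) pair."""
    v = v / np.linalg.norm(v)
    if sigma <= 0.0:
        return float("inf")  # a zero value cannot be verified by this ratio
    residual = a.T @ (a @ v) - sigma**2 * v
    return float(np.linalg.norm(residual) / sigma**2)

# The 5 x 5 matrix from the original question.
x = np.array([
    [2, 0, 8, 6, 0],
    [1, 6, 0, 1, 7],
    [5, 0, 7, 4, 0],
    [7, 0, 8, 5, 0],
    [0, 10, 0, 0, 7],
], dtype=float)

_, s, vt = np.linalg.svd(x)
rng = np.random.default_rng(0)
candidates = [
    (s[0], vt[0]),                  # genuine top pair from a dense SVD
    (0.0, rng.standard_normal(5)),  # stand-in for a spurious zero-valued pair
]

MAX_ERROR = 0.05  # illustrative threshold, not a documented Mahout default
kept = [(sigma, v) for sigma, v in candidates
        if relative_error(x, sigma, v) < MAX_ERROR]

for sigma, v in candidates:
    print(f"sigma={sigma:.6f} relative error={relative_error(x, sigma, v):.3e}")
print("pairs kept:", len(kept))
```

The genuine pair passes with an error near machine precision, while the zero-valued pair is dropped.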
Do you also see log output which looks like "appending {some vector info} to {some path}" on your console? The vectors printed there should include their eigenvalue and relative approximation error.

Your results also highlight a common fact about doing Lanczos on tiny matrices: Lanczos is iterative, and can only iterate up to the overall dimension of your input matrix, but it only finds an approximation to the singular vectors / values, which gets better as the iterations continue. This is why you are getting a very accurate measure of the top singular value, but progressively worse ones for the lower values.

Try running this on a larger matrix (try 100 x 100), and look at, say, the top 50 singular vector/value pairs. Those should be significantly more accurate. The only reason you would want to do a *distributed* SVD is if a) your data is HUGE, and b) you only want to look at the top few (up to maybe hundreds) singular vector/value pairs. Point b) is a point of practicality if you have point a).

  -jake

On Tue, Nov 8, 2011 at 8:56 AM, Ed Fine <edward.f...@gmail.com> wrote:

> I am a Mahout newbie, so please keep in mind that I might be wrong, but I
> strongly suspect it has to do with one of your Eigenvalues being 0. That
> implies a singular matrix. You will see that your first two Eigenvalues are
> equal to the singular values. Parsing the structure in smaller eigenvalues
> gets numerically unstable in a near singular matrix. I bet that is your
> issue. I think you can find a description of this issue in Numerical
> Linear Algebra by Trefethen and Bau.
>
> On Nov 8, 2011, at 4:11 AM, motta <motta....@gmail.com> wrote:
>
> > Hi everybody,
> > I have completed my first Mahout experiment with an Hadoop local
> > installation (single machine) and I obtained different results from Scilab
> > and the Mahout Distributed Lanczos Solver. Could someone explain why this
> > happens? Am I doing something wrong?
> >
> > This is my matrix
> > 2,0,8,6,0
> > 1,6,0,1,7
> > 5,0,7,4,0
> > 7,0,8,5,0
> > 0,10,0,0,7
> >
> > This is my Mahout invocation
> > ./hadoop jar \
> >   /home/hadoop-user/mahout/mahout-distribution-0.5/mahout-examples-0.5-job.jar \
> >   org.apache.mahout.math.hadoop.decomposer.DistributedLanczosSolver \
> >   --input /user/hadoop-user/mahout-input \
> >   --output /user/hadoop-user/mahout-output \
> >   --numCols 5 --numRows 5 --cleansvd "true" --rank 5
> >
> > These are the Mahout results
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: 4 passes through the corpus so far...
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Lanczos iteration complete - now to diagonalize the tri-diagonal auxiliary matrix.
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 0 found with eigenvalue 0.0
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 1 found with eigenvalue 1.0869992925693057
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 2 found with eigenvalue 3.4305998309907
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 3 found with eigenvalue 15.171371217397603
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: Eigenvector 4 found with eigenvalue 17.918370809987454
> > 11/11/08 12:45:04 INFO lanczos.LanczosSolver: LanczosSolver finished.
> >
> > And these are the results from Scilab (svd(X))
> > -->[U,S,V]=svd(X);
> > -->S
> >  S  =
> >
> >     17.918371    0.           0.          0.           0.
> >     0.           15.171372    0.          0.           0.
> >     0.           0.           3.564002    0.           0.
> >     0.           0.           0.          1.9842282    0.
> >     0.           0.           0.          0.           0.3495557
> >
> > thank you,
> > Alfredo
> >
> > --
> > View this message in context:
> > http://lucene.472066.n3.nabble.com/Comparing-results-of-Mahout-SVD-and-Scilab-tp3490066p3490066.html
> > Sent from the Mahout User List mailing list archive at Nabble.com.
>
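
For anyone who wants to reproduce the point about Lanczos accuracy outside of Hadoop, the following is a minimal NumPy sketch, not Mahout's implementation. It assumes a plain Lanczos iteration on the Gram matrix A^T A with full reorthogonalization and a random start vector; the function name `lanczos_singular_values`, the seeds, and the random 100 x 100 test matrix are made up for illustration. It first cross-checks the Scilab values for the 5 x 5 matrix with a dense SVD, then shows how the estimates for the top five singular values of the larger matrix improve as the number of Lanczos steps grows.

```python
import numpy as np

def lanczos_singular_values(a, steps, seed=0):
    """Estimate singular values of `a` with plain Lanczos on a.T @ a."""
    n = a.shape[1]
    steps = min(steps, n)  # cannot take more steps than the matrix dimension
    rng = np.random.default_rng(seed)
    q = rng.standard_normal(n)
    basis = [q / np.linalg.norm(q)]
    alphas, betas = [], []
    for j in range(steps):
        w = a.T @ (a @ basis[j])
        alphas.append(basis[j] @ w)
        # Full reorthogonalization against every basis vector so far.
        for v in basis:
            w -= (v @ w) * v
        beta = np.linalg.norm(w)
        if j == steps - 1 or beta < 1e-10:
            break  # finished, or the Krylov subspace is exhausted
        betas.append(beta)
        basis.append(w / beta)
    # Diagonalize the tridiagonal auxiliary matrix.
    t = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
    ritz = np.linalg.eigvalsh(t)[::-1]
    return np.sqrt(np.clip(ritz, 0.0, None))

# The 5 x 5 matrix from the original question.
x = np.array([
    [2, 0, 8, 6, 0],
    [1, 6, 0, 1, 7],
    [5, 0, 7, 4, 0],
    [7, 0, 8, 5, 0],
    [0, 10, 0, 0, 7],
], dtype=float)
print("dense SVD of X:", np.linalg.svd(x, compute_uv=False))

# A larger (hypothetical) test matrix, as suggested in the reply.
rng = np.random.default_rng(42)
a = rng.integers(0, 11, size=(100, 100)).astype(float)
exact = np.linalg.svd(a, compute_uv=False)

for steps in (5, 20, 50):
    approx = lanczos_singular_values(a, steps)
    errors = np.abs(approx[:5] - exact[:5]) / exact[:5]
    print(f"{steps:3d} steps, relative error of top 5:", np.round(errors, 6))
```

In a sketch like this the top value settles almost immediately, while the values further down need more steps before they match the dense SVD.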