Hi, I have just read this page: https://cwiki.apache.org/confluence/display/MAHOUT/Dimensional+Reduction
This looks similar to LSA/LSI, but I have some questions.

In this example, the "tfidf-vectors" matrix has 6,076,937 rows and 20,444 columns. My first question is: do the rows represent the documents and the columns the terms of a traditional term-document matrix?

After the SVD job, you get 87 eigenvectors, each with 20,444 columns (representing the terms). These seem to be the eigenvectors of tfidf-vectors, but reduced to only 87 documents? What is this mathematically?

And why do you then calculate tfidf-vectors^T * svdOut^T? I cannot find an explanation for this step in terms of the SVD. Is the result the "right singular vectors"?

I know it works, but I don't understand some of these steps. Please help... :)

-- 
Stefan Wienert

http://www.wienert.cc
ste...@wienert.cc

Phone: +495251-2026838 (new number since 20.06.10)
Mobile: +49176-40170270