I'm running SVD as:
  ./mahout svd --input /tmp/solr-clust-n2/part-out.vec \
    --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut \
    --rank 200 --numCols 65458 --numRows 130103

  ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut \
    --corpusInput /tmp/solr-clust-n2/part-out.vec \
    --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0

part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB, and
largestCleanEigens is 88 MB. That doesn't feel right to me.

Is there a guide anywhere on interpreting the output of SVD? Intuitively, I
would expect the output to be a lot smaller than the input. That's the point
of the decomposition, right?
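
For what it's worth, here is a rough back-of-envelope check on those sizes. It assumes (and this is only an assumption, nothing in the output confirms it) that each eigenvector is stored densely as one 8-byte double per column, ignoring any file-format overhead. The helper name dense_size_bytes is made up for illustration and is not part of Mahout.

```python
# Back-of-envelope estimate of the eigenvector output size.
# ASSUMPTION: each eigenvector is a dense vector of 8-byte doubles,
# one entry per column; container/file-format overhead is ignored.

RANK = 200            # value passed as --rank
NUM_COLS = 65458      # value passed as --numCols
BYTES_PER_DOUBLE = 8  # assumed storage width per entry


def dense_size_bytes(num_vectors, num_cols, bytes_per_value=BYTES_PER_DOUBLE):
    """Hypothetical helper: raw size of num_vectors dense vectors."""
    return num_vectors * num_cols * bytes_per_value


svd_out_bytes = dense_size_bytes(RANK, NUM_COLS)
print(f"estimated svdOut size: {svd_out_bytes / 1e6:.1f} MB")
```

Under that assumption, 200 vectors of 65458 doubles come to about 105 million bytes, which is in the same range as the 104 MB I am seeing for svdOut. If that is what is happening, the sparse 52 MB input is being turned into dense eigenvectors, which would explain why the output is larger rather than smaller.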

I can share the vector if you want.

-Grant

--------------------------
Grant Ingersoll
http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
