I'm running SVD as:

./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 --numCols 65458 --numRows 130103

./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
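
For a rough sense of scale, here is a back-of-envelope sketch of how large the svd output would be under one assumption that is not confirmed anywhere in this thread: that each of the --rank 200 output vectors is written as a dense vector of --numCols 65458 double-precision values. The function name and the 8-bytes-per-value figure are illustrative choices, not anything taken from Mahout itself.

```python
# Flags copied from the svd command above.
RANK = 200
NUM_COLS = 65458

# Assumption: dense storage, one 8-byte double per entry.
BYTES_PER_DOUBLE = 8


def dense_size_bytes(num_vectors: int, num_cols: int) -> int:
    """Raw payload of num_vectors dense vectors, ignoring file-format overhead."""
    return num_vectors * num_cols * BYTES_PER_DOUBLE


estimate = dense_size_bytes(RANK, NUM_COLS)
print(f"{estimate} bytes (~{estimate / 1e6:.1f} MB)")
# prints "104732800 bytes (~104.7 MB)"
```

The estimate ignores any container overhead (keys, headers, compression), so it is only a sanity check on order of magnitude.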
part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB, and largestCleanEigens is 88 MB. This doesn't feel right to me. Is there a guide anywhere on interpreting the output of SVD? Intuitively, I'd expect the output to be a lot smaller than the input; isn't that the point? I can share the vector file if that helps.

-Grant

--------------------------
Grant Ingersoll
http://lucenerevolution.org
Lucene/Solr Conference, Boston Oct 7-8
