In particular, our sparse representation needs an int (4 bytes) and a double (8 bytes), 12 bytes in all, to store one non-zero entry, while a dense row needs only 8 bytes per entry. With rank 200, each dense row takes 200 * 8 = 1600 bytes, so your original data would require less storage if it has fewer than 200 * 8 / 12 = 133 non-zero entries per row on average. Depending on the data set, this could be very likely or totally implausible.
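To make that arithmetic concrete, here is a rough sketch in Python. It assumes the 4-byte int plus 8-byte double per sparse entry described above and ignores any per-vector or file-format overhead; the function names are made up for illustration. The last line further assumes that svdOut holds `rank` dense vectors of `numCols` doubles each, which I have not checked against the code, although the figure it gives is close to the 104 MB reported for svdOut.

```python
INT_BYTES = 4      # index of a non-zero entry in a sparse vector
DOUBLE_BYTES = 8   # value of an entry


def sparse_row_bytes(nnz):
    """Bytes for one sparse row with nnz non-zero entries (index + value each)."""
    return nnz * (INT_BYTES + DOUBLE_BYTES)


def dense_row_bytes(rank):
    """Bytes for one dense row with `rank` double entries."""
    return rank * DOUBLE_BYTES


def break_even_nnz(rank):
    """Non-zeros per row at which sparse and dense rows cost the same."""
    return dense_row_bytes(rank) / (INT_BYTES + DOUBLE_BYTES)


def dense_matrix_bytes(rows, cols):
    """Bytes for a fully dense rows x cols matrix of doubles."""
    return rows * cols * DOUBLE_BYTES


rank, num_cols = 200, 65458  # values from the svd command line in this thread

print(break_even_nnz(rank))                # about 133.3 non-zeros per row
print(dense_matrix_bytes(rank, num_cols))  # 104732800 bytes, roughly 104 MB
```

So a sparse row with 133 non-zeros (1596 bytes) is just under the 1600 bytes of a dense rank-200 row, and anything sparser than that is cheaper to keep in its original form.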
SVD is still useful in these cases because it can provide useful smoothing.

On Sun, Aug 29, 2010 at 3:29 PM, Akshay Bhat <[email protected]> wrote:

> Even though the SVD is supposed to reduce dimensionality it does not means
> that your results will have smaller size [in terms of memory], since U , S
> and V are dense matrices. except if you are using too few eigenvectors. Your
> input matrix is a sparse, had it been represented as a dense matrix it would
> have far large size.
>
>
> On Sun, Aug 29, 2010 at 5:13 PM, Grant Ingersoll <[email protected]> wrote:
>
> > Should be noted, that cranking the rank down to 20 produces a significantly
> > smaller result.
> >
> >
> > On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:
> >
> > > I'm running SVD as:
> > > ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 --numCols 65458 --numRows 130103
> > > ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
> > >
> > > part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB and
> > > largestCleanEigens is 88 MB. For some reason, this really doesn't feel
> > > right.
> > >
> > > Is there a guide on interpreting the output of SVD anywhere?
> > > Intuitively, I believe the output should be a lot smaller? I mean that's
> > > the point, right?
> > >
> > > I can share the vector if you want.
> > >
> > > -Grant
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> > >
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
> >
>
> --
> Akshay Uday Bhat.
> Graduate Student, Computer Science, Cornell University
> Website: http://www.akshaybhat.com
