Like Jake said.

On Sun, Aug 29, 2010 at 4:48 PM, Ted Dunning <[email protected]> wrote:
>
> In particular, since our sparse representation requires an int (4 bytes)
> and a double (8 bytes) to store one non-zero entry, while a dense row
> requires only 8 bytes per entry, your original data would require less
> storage than the dense rank-200 result if it has fewer than
> 200 * 8 / 12 = 133 non-zero entries per row on average. Depending on the
> data set, this could be very likely or totally implausible.
>
> SVD is still useful in these cases because it can provide useful smoothing.
>
>
> On Sun, Aug 29, 2010 at 3:29 PM, Akshay Bhat <[email protected]> wrote:
>
>> Even though the SVD is supposed to reduce dimensionality, that does not
>> mean your results will be smaller [in terms of memory], since U, S and V
>> are dense matrices, unless you are using very few eigenvectors. Your
>> input matrix is sparse; had it been represented as a dense matrix, it
>> would have been far larger.
>>
>>
>> On Sun, Aug 29, 2010 at 5:13 PM, Grant Ingersoll <[email protected]> wrote:
>>
>> > It should be noted that cranking the rank down to 20 produces a
>> > significantly smaller result.
>> >
>> >
>> > On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:
>> >
>> > > I'm running SVD as:
>> > > ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 --numCols 65458 --numRows 130103
>> > > ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
>> > >
>> > > part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB and
>> > > largestCleanEigens is 88 MB. For some reason, this really doesn't
>> > > feel right.
>> > >
>> > > Is there a guide on interpreting the output of SVD anywhere?
>> > > Intuitively, I believe the output should be a lot smaller? I mean,
>> > > that's the point, right?
>> > >
>> > > I can share the vector if you want.
>> > >
>> > > -Grant
>> > >
>> > > --------------------------
>> > > Grant Ingersoll
>> > > http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
>> > >
>> >
>> > --------------------------
>> > Grant Ingersoll
>> > http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
>> >
>>
>> --
>> Akshay Uday Bhat.
>> Graduate Student, Computer Science, Cornell University
>> Website: http://www.akshaybhat.com
>>
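
For anyone who wants to check the break-even arithmetic quoted above, here is a rough Python sketch. The byte sizes (4-byte int index plus 8-byte double per sparse entry, 8 bytes per dense entry) come from Ted's message. The function names are made up for illustration. The last function assumes that svdOut holds `rank` dense vectors of length `numCols` with no container overhead, which is my guess and not something confirmed in this thread.

```python
# Byte sizes as stated in the thread: a sparse entry stores an int index
# (4 bytes) plus a double value (8 bytes); a dense entry is one double.
SPARSE_BYTES_PER_ENTRY = 4 + 8
DENSE_BYTES_PER_ENTRY = 8


def break_even_nonzeros(rank: int) -> float:
    """Average non-zeros per row at which a sparse input row costs the
    same number of bytes as one dense row of length `rank`."""
    return rank * DENSE_BYTES_PER_ENTRY / SPARSE_BYTES_PER_ENTRY


def dense_vectors_bytes(num_vectors: int, length: int) -> int:
    """Raw size of `num_vectors` dense double vectors of the given length.

    Assumption (not confirmed in the thread): the SVD output is stored as
    plain dense vectors, ignoring any file-format overhead.
    """
    return num_vectors * length * DENSE_BYTES_PER_ENTRY


if __name__ == "__main__":
    # Ranks mentioned in the thread: 200 (original run) and 20 (smaller run).
    for rank in (200, 20):
        print(f"rank {rank}: break-even at about "
              f"{break_even_nonzeros(rank):.1f} non-zeros per row")

    # --rank 200 and --numCols 65458 are taken from Grant's command line.
    size = dense_vectors_bytes(200, 65458)
    print(f"200 dense vectors of length 65458: {size / 1e6:.1f} million bytes")
```

Under that storage assumption, 200 dense vectors of length 65458 come to about 104.7 million bytes, which is in the same range as the 104 MB Grant reported for svdOut. Treat that as a plausibility check only.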
