In particular, since our sparse representation requires an int (4 bytes) and
a double (8 bytes) to store one non-zero entry, while a dense row requires
only 8 bytes per entry, your original data would require less storage than
the rank-200 result if it has fewer than 200 * 8 / 12 = 133 non-zero
entries per row on average.  Depending on the data set, this could be very
likely or totally implausible.
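
For anyone who wants to plug in their own numbers, here is a rough sketch of
that arithmetic in Python.  The function names are made up for illustration,
and it assumes exactly 4 + 8 bytes per sparse entry and 8 bytes per dense
entry, with no per-vector or file-format overhead (real files will carry
some).  The last function additionally assumes that the eigenvector output
is simply rank dense vectors of length numCols, which is a guess about the
layout rather than something checked against the code.

```python
# Per-entry sizes from the estimate above; container overhead is ignored.
INDEX_BYTES = 4   # int column index of a sparse entry
VALUE_BYTES = 8   # double value (sparse or dense)


def dense_row_bytes(rank):
    """Bytes for one row stored as `rank` dense doubles."""
    return rank * VALUE_BYTES


def sparse_row_bytes(nonzeros):
    """Bytes for one sparse row holding `nonzeros` (index, value) pairs."""
    return nonzeros * (INDEX_BYTES + VALUE_BYTES)


def break_even_nonzeros(rank):
    """Average non-zeros per row at which both layouts cost the same."""
    return dense_row_bytes(rank) / (INDEX_BYTES + VALUE_BYTES)


def dense_basis_bytes(rank, num_cols):
    """Bytes for `rank` dense vectors of length `num_cols` (assumed layout)."""
    return rank * num_cols * VALUE_BYTES


# Figures from the thread: --rank 200, --numCols 65458.
print(int(break_even_nonzeros(200)))       # 133
print(dense_basis_bytes(200, 65458))       # 104732800 bytes
print(dense_basis_bytes(20, 65458))        # the same estimate at rank 20
```

Under those assumptions, 200 dense vectors of length 65458 come to about
105 million bytes, which is at least in the same ballpark as the 104 MB
reported for svdOut.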

SVD is still worthwhile in these cases because it can provide useful smoothing.


On Sun, Aug 29, 2010 at 3:29 PM, Akshay Bhat <[email protected]> wrote:

> Even though the SVD is supposed to reduce dimensionality, that does not mean
> that your results will be smaller [in terms of memory], since U, S
> and V are dense matrices, unless you are keeping very few eigenvectors.
> Your input matrix is sparse; had it been represented as a dense matrix, it
> would have been far larger.
>
>
> On Sun, Aug 29, 2010 at 5:13 PM, Grant Ingersoll <[email protected]
> >wrote:
>
> > Should be noted, that cranking the rank down to 20 produces a
> > significantly smaller result.
> >
> >
> > On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:
> >
> > > I'm running SVD as:
> > > ./mahout svd --input /tmp/solr-clust-n2/part-out.vec --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut --rank 200 --numCols 65458 --numRows 130103
> > > ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut --corpusInput /tmp/solr-clust-n2/part-out.vec --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
> > >
> > > part-out.vec is 52 MB.  The output from SVD (svdOut) is 104 MB and
> > > largestCleanEigens is 88 MB.  For some reason, this really doesn't feel
> > > right.
> > >
> > > Is there a guide on interpreting the output of SVD anywhere?
> > > Intuitively, I believe the output should be a lot smaller?  I mean
> > > that's the point, right?
> > >
> > > I can share the vector if you want.
> > >
> > > -Grant
> > >
> > > --------------------------
> > > Grant Ingersoll
> > > http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
> > >
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucenerevolution.org Apache Lucene/Solr Conference, Boston Oct 7-8
> >
> >
>
>
> --
> Akshay Uday Bhat.
> Graduate Student, Computer Science, Cornell University
> Website: http://www.akshaybhat.com
>
