Thanks to all for the explanations.  

On Aug 29, 2010, at 7:49 PM, Ted Dunning wrote:

> Like Jake said.
> 
> On Sun, Aug 29, 2010 at 4:48 PM, Ted Dunning <[email protected]> wrote:
> 
>> 
>> In particular, our sparse representation needs an int (4 bytes) and a
>> double (8 bytes) to store each non-zero entry, while a dense row needs
>> only 8 bytes per entry. A dense rank-200 row therefore takes 200 * 8 =
>> 1600 bytes, so your original data requires less storage than the dense
>> rank-200 rows if it has fewer than 1600 / 12, or about 133, non-zero
>> entries per row on average. Depending on the data set, this could be
>> very likely or totally implausible.
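Here is a minimal Python sketch of the arithmetic above. It assumes the per-entry costs given in the reply (a 4-byte int index plus an 8-byte double for sparse entries, one 8-byte double for dense entries) and ignores any file-format overhead. The function names are hypothetical and are not part of Mahout.

```python
# Sketch of the storage break-even described above. Assumes a sparse entry
# costs a 4-byte int index plus an 8-byte double value, and a dense entry
# costs one 8-byte double; serialization overhead is ignored.

INDEX_BYTES = 4   # int index of a non-zero entry
VALUE_BYTES = 8   # double value (sparse or dense)


def sparse_row_bytes(nnz: int) -> int:
    """Bytes for a sparse row holding `nnz` non-zero entries."""
    return nnz * (INDEX_BYTES + VALUE_BYTES)


def dense_row_bytes(rank: int) -> int:
    """Bytes for a dense row with `rank` entries."""
    return rank * VALUE_BYTES


def break_even_nnz(rank: int) -> float:
    """Average non-zeros per row below which the sparse input is smaller."""
    return dense_row_bytes(rank) / (INDEX_BYTES + VALUE_BYTES)


if __name__ == "__main__":
    for k in (200, 20):
        print(f"rank {k}: sparse input is smaller below "
              f"{break_even_nnz(k):.1f} non-zeros per row")
    # prints 133.3 for rank 200 and 13.3 for rank 20
```

The break-even point scales linearly with the rank. At rank 20 it drops tenfold, to about 13 non-zeros per row.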
>> 
>> SVD is still worthwhile in these cases because it can provide useful smoothing.
>> 
>> 
>> On Sun, Aug 29, 2010 at 3:29 PM, Akshay Bhat <[email protected]>wrote:
>> 
>>> Even though SVD is supposed to reduce dimensionality, that does not mean
>>> your results will be smaller in terms of memory, since U, S and V are
>>> dense matrices (unless you keep only a few eigenvectors). Your input
>>> matrix is sparse; had it been represented as a dense matrix, it would
>>> be far larger.
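To make the point about dense factors concrete, the sketch below estimates the size of each factor for the `--numRows` and `--numCols` values in the svd command quoted in this thread. It assumes U is numRows x k, S holds k singular values, V is numCols x k, and every entry is an 8-byte double. It does not assume which of these factors Mahout actually writes to disk. `factor_bytes` is a hypothetical helper, not a Mahout API.

```python
# Rough in-memory size of dense SVD factors, assuming U is num_rows x k,
# S holds k singular values, V is num_cols x k, and every entry is an
# 8-byte double. Which factors a given tool writes to disk is not assumed;
# each one is reported separately.

VALUE_BYTES = 8


def factor_bytes(num_rows: int, num_cols: int, rank: int) -> dict:
    """Bytes needed for each dense factor of a rank-`rank` SVD."""
    return {
        "U": num_rows * rank * VALUE_BYTES,
        "S": rank * VALUE_BYTES,
        "V": num_cols * rank * VALUE_BYTES,
    }


if __name__ == "__main__":
    rows, cols = 130103, 65458  # --numRows / --numCols from the svd command
    for k in (200, 20):
        mb = {name: b / 1e6 for name, b in factor_bytes(rows, cols, k).items()}
        print(f"rank {k}: U {mb['U']:.1f} MB, "
              f"S {mb['S']:.4f} MB, V {mb['V']:.1f} MB")
```

With these inputs, V alone comes to about 104.7 MB at rank 200 and about 10.5 MB at rank 20. Every factor scales linearly in k, which is consistent with the much smaller result reported at rank 20.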
>>> 
>>> 
>>> On Sun, Aug 29, 2010 at 5:13 PM, Grant Ingersoll <[email protected]> wrote:
>>> 
>>>> It should be noted that cranking the rank down to 20 produces a
>>>> significantly smaller result.
>>>> 
>>>> 
>>>> On Aug 29, 2010, at 4:38 PM, Grant Ingersoll wrote:
>>>> 
>>>>> I'm running SVD as:
>>>>> ./mahout svd --input /tmp/solr-clust-n2/part-out.vec \
>>>>>   --tempDir /tmp/solr-clust-n2/svdTemp --output /tmp/solr-clust-n2/svdOut \
>>>>>   --rank 200 --numCols 65458 --numRows 130103
>>>>> ./mahout cleansvd --eigenInput /tmp/solr-clust-n2/svdOut \
>>>>>   --corpusInput /tmp/solr-clust-n2/part-out.vec \
>>>>>   --output /tmp/solr-clust-n2/svdFinal --maxError 0.1 --minEigenvalue 10.0
>>>>> 
>>>>> part-out.vec is 52 MB. The output from SVD (svdOut) is 104 MB and
>>>>> largestCleanEigens is 88 MB. For some reason, this really doesn't feel
>>>>> right.
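As a rough cross-check, the sketch below converts the file sizes reported above into counts under some loudly stated assumptions, none of which the thread confirms. It takes "MB" to mean 10**6 bytes (if it means MiB, the figures shift by about 5%), assumes the input costs about 12 bytes per non-zero, treats the output vectors as dense doubles of length numCols, and ignores file-format overhead. The helper names are hypothetical.

```python
# Back-of-the-envelope reading of the reported file sizes. Assumptions
# (not confirmed in this thread): "MB" means 10**6 bytes, the input costs
# about 12 bytes per non-zero (4-byte index + 8-byte double), output
# vectors are dense doubles of length numCols, and file-format overhead
# is ignored. Treat the results as rough estimates only.

NUM_ROWS = 130103
NUM_COLS = 65458


def implied_nnz_per_row(file_bytes: float, num_rows: int,
                        bytes_per_nnz: int = 12) -> float:
    """Average non-zeros per row if the whole file were sparse entries."""
    return file_bytes / bytes_per_nnz / num_rows


def implied_dense_vectors(file_bytes: float, length: int,
                          bytes_per_value: int = 8) -> float:
    """How many dense vectors of `length` doubles would fill the file."""
    return file_bytes / (length * bytes_per_value)


if __name__ == "__main__":
    print(f"part-out.vec: ~{implied_nnz_per_row(52e6, NUM_ROWS):.0f} "
          "non-zeros per row")
    for name, mb in (("svdOut", 104), ("largestCleanEigens", 88)):
        print(f"{name}: ~{implied_dense_vectors(mb * 1e6, NUM_COLS):.0f} "
              f"dense vectors of length {NUM_COLS}")
```

Under these assumptions, the input averages roughly 33 non-zeros per row, well under the break-even of about 133 for rank 200. The 104 MB output is about what 200 dense vectors of length 65458 would occupy. This is consistent with the output simply being larger than the sparse input, but the sketch cannot confirm what the files actually contain.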
>>>>> 
>>>>> Is there a guide on interpreting the output of SVD anywhere?
>>>>> Intuitively, I believe the output should be a lot smaller. I mean,
>>>>> that's the point, right?
>>>>> 
>>>>> I can share the vector if you want.
>>>>> 
>>>>> -Grant
>>>>> 
>>>>> --------------------------
>>>>> Grant Ingersoll
>>>>> http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
>>>>> 
>>>> 
>>> 
>>> 
>>> --
>>> Akshay Uday Bhat.
>>> Graduate Student, Computer Science, Cornell University
>>> Website: http://www.akshaybhat.com
>>> 
>> 
>> 

