The problem is that SparseMatrix does not iterate through the matrix
in a sparse manner. It uses the algorithms from AbstractMatrix which
iterate directly through the rows & columns. This is appropriate for
DenseMatrix, not SparseMatrix.

The way to handle this is to recast the Matrix/Matrix options via the
methods that get row and column vectors and walk them via the
iterators.

Multiplying by +1/-1 does not alter the runtime. SparseMatrix has
special encoding for 0-value entries. If you do a special version of
SparseMatrix that also encodes for +1/-1 entries, then your batch jobs
would really fly.

Are +1/-1/0/double matrices common in algorithms? Would this be
commonly used in Mahout?

Lance

On Sun, Jun 26, 2011 at 3:13 PM, Vincent Xue <xue....@gmail.com> wrote:
> NonZero Elements: >1,000,000
> Input Matrix Size: 30,000 x 480,000 and 480,000 x 30,000
>
> Reading a matrix is dependent on how many non zero elements there are.
> To read each element of the matrix, including non zero, would probably
> take just as long as it currently does using the
> SparseMatrix.getQuick().
>
> I will test this implementation using larger, denser matrices. So far,
> I have been using it only with my design matrix but I will generate
> some denser random matrices and test the performance soon.
>
> Vincent
>
> On Sun, Jun 26, 2011 at 10:29 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
>> Regarding speed:
>>
>> How many non-zero elements?
>>
>> What is the size of your input matrices?
>>
>> How long does it take to read the matrices without doing any multiplication?
>>
>> Your test matrices seem small for big sparse matrices.
>>
>> This sort of thing could be very useful.
>>
>> On Sun, Jun 26, 2011 at 1:47 PM, Vincent Xue <xue....@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> I was wondering how useful an in memory sparse matrix multiplier would
>>> be for mahout. In my current project I needed to multiply many large
>>> sparse matrices but submitting hundreds of jobs to the cluster creates
>>> too much overhead.
>>>
>>> I wrote up an implementation of sparse matrix multiplication using
>>> threads which can multiply a 30,000 x 48,0000 matrix by its transpose
>>> in about 5 minutes using 16 cores. Granted this matrix is composed
>>> mostly of 1s, and -1s, (with about 16 elements per row), is this
>>> considered fast? I have seen that my implementation is much faster
>>> than iterating though a matrix naively and would like some input to
>>> whether or not my 5 minute benchmark is by skewed.
>>>
>>> Many thanks for the input,
>>> Vincent
>>>
>>
>



-- 
Lance Norskog
goks...@gmail.com

Reply via email to