Anthony,
In that case, I think you are hitting the 1GB PostgreSQL limit.
Operations on the sparse matrix format require loading into memory 2
INTEGERs per entry for row/col, plus the value (INTEGER, DOUBLE PRECISION,
whatever size it is). Your matrix has roughly 1.25e8 x 100 x 1% = 1.25e8
nonzero entries, so the 2 INTEGERs alone are ~1.25e8 x 8 bytes = 1.00E+09
bytes, which is already at the limit without even considering the values.
So I would suggest you do the computation in blocks. One approach to this:
* chunk your tall matrix into n smaller VIEWs, say n=10 (MADlib matrix
operations do work on VIEWs)
* call matrix_vec_mult() on each chunk
* reassemble the n result vectors into the final vector
You could do this in a PL/pgSQL or PL/Python function; a rough sketch
follows.
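Here is a minimal PL/pgSQL sketch of the idea. It assumes the sparse matrix
lives in a table mat_a(row_id, col_id, value) with row_ids 1..1.25e8; the
names chunked_mat_vec_mult, mat_a_chunk, and result_vec are made up for
illustration, and I'm assuming matrix_vec_mult() returns the product as a
FLOAT8[] as in your session below. It writes each partial result into a
table rather than concatenating one giant array, so the reassembled vector
does not itself run into the 1GB limit:

CREATE TABLE result_vec (idx BIGINT, value FLOAT8);

CREATE OR REPLACE FUNCTION chunked_mat_vec_mult(vec FLOAT8[])
RETURNS VOID AS $$
DECLARE
    n        INTEGER := 10;            -- number of chunks
    rows_tot BIGINT  := 125000000;     -- total rows in the matrix
    chunk    BIGINT  := rows_tot / n;  -- rows per chunk
    part     FLOAT8[];
    i        INTEGER;
BEGIN
    FOR i IN 0 .. n - 1 LOOP
        -- View over one horizontal slice; re-base row_id so every chunk
        -- starts at row 1 (otherwise the inferred dimensions are wrong).
        EXECUTE 'CREATE OR REPLACE VIEW mat_a_chunk AS '
             || 'SELECT row_id - ' || (i * chunk) || ' AS row_id, col_id, value '
             || 'FROM mat_a WHERE row_id > ' || (i * chunk)
             || ' AND row_id <= ' || ((i + 1) * chunk);
        part := madlib.matrix_vec_mult('mat_a_chunk',
                                       'row=row_id, col=col_id, val=value',
                                       vec);
        -- Persist this chunk's slice of the output at its global offset.
        INSERT INTO result_vec
        SELECT (i * chunk) + s.j, part[s.j]
        FROM generate_series(1, array_upper(part, 1)) AS s(j);
    END LOOP;
    DROP VIEW IF EXISTS mat_a_chunk;
END;
$$ LANGUAGE plpgsql;

Then SELECT chunked_mat_vec_mult(your_vector); and read the result out of
result_vec ordered by idx.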
There is one subtlety to be aware of, though, because you are working with
sparse matrices. For each of the n chunks, if there is no non-zero value in
the 100th column, you will get an error that looks like this:
madlib=# SELECT madlib.matrix_vec_mult('mat_a_view',
NULL,
array[1,2,3,4,5,6,7,8,9,10]
);
ERROR: plpy.Error: Matrix error: Dimension mismatch between matrix (1 x 9)
and vector (10 x 1)
CONTEXT: Traceback (most recent call last):
PL/Python function "matrix_vec_mult", line 24, in <module>
matrix_in, in_args, vector)
PL/Python function "matrix_vec_mult", line 2031, in matrix_vec_mult
PL/Python function "matrix_vec_mult", line 77, in _assert
PL/Python function "matrix_vec_mult"
See the explanation at the top of
http://madlib.apache.org/docs/latest/group__grp__matrix.html
regarding dimensionality of sparse matrices.
One way around this is to add a (fake) row to the bottom of your VIEW with
a 0 in the 100th column. But if you do this, be sure to drop the last
(fake) entry of each of the n intermediate vectors before you assemble them
into the final vector.
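Concretely, the padded view for one chunk might look like this (a sketch
using the made-up names from above, a 12,500,000-row chunk, and a FLOAT8
value column):

CREATE OR REPLACE VIEW mat_a_chunk_padded AS
SELECT row_id, col_id, value
FROM mat_a_chunk
UNION ALL
SELECT 12500001, 100, 0.0::FLOAT8;  -- fake entry pins dimensions at 12500001 x 100

The multiply then returns 12,500,001 elements, and you would slice off the
last one, e.g. part := part[1:12500000], before reassembling.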
Frank
On Wed, Jan 3, 2018 at 8:15 PM, Anthony Thomas <[email protected]>
wrote:
> Thanks Frank - the answer to both your questions is "yes"
>
> Best,
>
> Anthony
>
> On Wed, Jan 3, 2018 at 3:13 PM, Frank McQuillan <[email protected]>
> wrote:
>
>>
>> Anthony,
>>
>> Correct, the install-check error you are seeing is not related.
>>
>> A couple of questions:
>>
>> (1)
>> Are you using:
>>
>> -- Multiply matrix with vector
>> matrix_vec_mult( matrix_in, in_args, vector)
>>
>> (2)
>> Is matrix_in encoded in sparse format as shown at the top of
>> http://madlib.apache.org/docs/latest/group__grp__matrix.html
>>
>> e.g., like this?
>>
>> row_id | col_id | value
>> --------+--------+-------
>> 1 | 1 | 9
>> 1 | 5 | 6
>> 1 | 6 | 6
>> 2 | 1 | 8
>> 3 | 1 | 3
>> 3 | 2 | 9
>> 4 | 7 | 0
>>
>>
>> Frank
>>
>>
>> On Wed, Jan 3, 2018 at 2:52 PM, Anthony Thomas <[email protected]>
>> wrote:
>>
>>> Okay - thanks Ivan, and good to know about support for Ubuntu from
>>> Greenplum!
>>>
>>> Best,
>>>
>>> Anthony
>>>
>>> On Wed, Jan 3, 2018 at 2:38 PM, Ivan Novick <[email protected]> wrote:
>>>
>>>> Hi Anthony, this does NOT look like an Ubuntu problem, and in fact
>>>> there is OSS Greenplum officially on Ubuntu, which you can see here:
>>>> http://greenplum.org/install-greenplum-oss-on-ubuntu/
>>>>
>>>> Greenplum and PostgreSQL do limit each field (row/col combination) to
>>>> 1 GB, but there are techniques to manage data sets within these
>>>> constraints. I will let someone else who has more experience than me
>>>> working with matrices answer what the best way to do so is in a case
>>>> like the one you have provided.
>>>>
>>>> Cheers,
>>>> Ivan
>>>>
>>>> On Wed, Jan 3, 2018 at 2:22 PM, Anthony Thomas <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Madlib folks,
>>>>>
>>>>> I have a large tall and skinny sparse matrix which I'm trying to
>>>>> multiply by a dense vector. The matrix is 1.25e8 by 100 with approximately
>>>>> 1% nonzero values. This operation always triggers an error from
>>>>> Greenplum:
>>>>>
>>>>> plpy.SPIError: invalid memory alloc request size 1073741824 (context
>>>>> 'accumArrayResult') (mcxt.c:1254) (plpython.c:4957)
>>>>> CONTEXT: Traceback (most recent call last):
>>>>> PL/Python function "matrix_vec_mult", line 24, in <module>
>>>>> matrix_in, in_args, vector)
>>>>> PL/Python function "matrix_vec_mult", line 2044, in matrix_vec_mult
>>>>> PL/Python function "matrix_vec_mult", line 2001, in
>>>>> _matrix_vec_mult_dense
>>>>> PL/Python function "matrix_vec_mult"
>>>>>
>>>>> Some Googling suggests this error is caused by a hard limit from
>>>>> Postgres which restricts the maximum size of an array to 1GB. If this is
>>>>> indeed the cause of the error I'm seeing, does anyone have any suggestions
>>>>> about how to circumvent this issue? This comes up in other cases as well
>>>>> like transposing a tall and skinny matrix. MVM with smaller matrices works
>>>>> fine.
>>>>>
>>>>> Here is relevant version information:
>>>>>
>>>>> SELECT VERSION();
>>>>> PostgreSQL 8.3.23 (Greenplum Database 5.1.0 build dev) on
>>>>> x86_64-pc-linux-gnu, compiled by GCC gcc
>>>>> (Ubuntu 5.4.0-6ubuntu1~16.04.5) 5.4.0 20160609 compiled on Dec 21
>>>>> 2017 09:09:46
>>>>>
>>>>> SELECT madlib.version();
>>>>> MADlib version: 1.12, git revision: unknown, cmake configuration time:
>>>>> Thu Dec 21 18:04:47 UTC 2017, build type: RelWithDebInfo, build system:
>>>>> Linux-4.4.0-103-generic, C compiler: gcc 4.9.3, C++ compiler: g++ 4.9.3
>>>>>
>>>>> Madlib install-check reported one error in the "convex" module related
>>>>> to "loss too high" which seems unrelated to the issue described above. I
>>>>> know Ubuntu isn't officially supported by Greenplum so I'd like to be
>>>>> confident this issue isn't just the result of using an unsupported OS.
>>>>> Please let me know if any other information would be helpful.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Anthony
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ivan Novick, Product Manager Pivotal Greenplum
>>>> [email protected] -- (Mobile) 408-230-6491
>>>> https://www.youtube.com/GreenplumDatabase
>>>>
>>>>
>>>
>>
>