(array([[0.91263427],
[0.98520395],
[0.98082576],
[0.97490447],
[0.94312307],
[0.90573414],
[0.95057244],
[0.94955576],
[0.90342821]]), array([0, 9], dtype=int64), array([ 0, 4, 33, 38, 46, 49, 61, 64, 83], dtype=int64))
vs.
>>> acsr.shape
(1, 100)
On 7/6/2022 4:01 PM, dl wrote:
I have tabular data with one record field of type scipy.sparse.csr_matrix. I want to convert this tabular data to a pyarrow table. So far I have been converting each csr_matrix to a custom representation using three fields (shape, keys, indices) and building the pyarrow table using a schema with the types of these fields and table data with a separate list for each field (each list having one entry per input record). I was hoping I could use a single pyarrow.SparseCSRMatrix field instead of the custom three-field representation. Is that possible? Incidentally, the shape of the csr_matrix is typically (1,N), where N may vary between records. But I don't think "typically (1,N)" matters; it would also work with a variable shape (M,N). The shape field has type pyarrow.List with value_type = pyarrow.int32().
On 7/6/2022 2:53 PM, Rok Mihevc wrote:
Hey David,
I don't think Table is designed in a way that lets you "populate" it with a 2D tensor. It should rather be populated with a collection of equal-length arrays. A sparse CSR tensor, on the other hand, is composed of three arrays (indices, indptr, values), and manipulating those takes somewhat more involved logic than regular arrays do. See [1] for the memory layout definition.
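To make the three CSR arrays concrete, here is a small dependency-free sketch. The function dense_to_csr is a hypothetical helper written for this illustration, not an Arrow or SciPy API.

```python
def dense_to_csr(rows):
    """Convert a list of dense rows into CSR (values, indices, indptr)."""
    values, indices, indptr = [], [], [0]
    for row in rows:
        for col, x in enumerate(row):
            if x != 0:
                values.append(x)     # the non-zero entry itself
                indices.append(col)  # its column within the row
        # indptr[r + 1] marks where row r ends in values/indices
        indptr.append(len(values))
    return values, indices, indptr


dense = [
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 3.0],
]
values, indices, indptr = dense_to_csr(dense)
print(values)   # [2.0, 1.0, 3.0]
print(indices)  # [1, 0, 3]
print(indptr)   # [0, 1, 1, 3]
```

Note that values and indices have one entry per non-zero element, while indptr has one entry per row plus one, so the three arrays generally differ in length.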
What are you looking to accomplish? What access patterns are you expecting?
Rok
On Wed, Jul 6, 2022 at 10:48 PM dl <dydx...@yahoo.com> wrote:
Hi Rok,
What data type would I use for a pyarrow SparseCSRMatrix in a schema? I need to build a table with rows which include a field of this type. I don't see a related example in the test module. I'm doing something like:
schema = pyarrow.schema(fields, metadata=metadata)
table = pyarrow.Table.from_arrays(table_data, schema=schema)
where fields is a list of tuples of the form (field_name, pyarrow_type), e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a SparseCSRMatrix field? Or will this not work?
Thanks,
David
On 7/1/2022 9:18 AM, Rok Mihevc wrote:
We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are perhaps the most extensive description of what is doable: https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
Rok
On Fri, Jul 1, 2022 at 5:38 PM dl via user <user@arrow.apache.org> wrote:
So, I guess this is supported in 8.0.0. I can do this:
import numpy as np
import pyarrow as pa
from scipy.sparse import csr_matrix
a = np.random.rand(100)
a[a < .9] = 0.0
s = csr_matrix(a)
arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
Now, how do I use that to build a pyarrow table? Stay tuned...
On 7/1/2022 8:19 AM, dl wrote:
I find pyarrow.SparseCSRMatrix mentioned here. But how do I use that? Is there documentation for that class?
On 7/1/2022 7:47 AM, dl wrote:
Hi,
I'm trying to understand support for sparse tensors in Arrow. It looks like there is "experimental" support using the C++ API. When was this introduced? I see in the code base here Cython sparse array classes. Can these be accessed using the Python API? Are they included in the 8.0.0 release? Is there any other support for sparse arrays/tensors in the Python API? Are there good examples for any of this, in particular for using the 8.0.0 Python API to create sparse tensors?
Thanks,
David