arrow_sparse_csr_matrix.to_numpy() returns the underlying CSR components (data, indptr, indices), not a 2-D array.
arrow_sparse_csr_matrix.to_tensor().to_numpy() should return a dense version of the original matrix.

On Thu, Jul 7, 2022 at 3:12 AM dl <dydx...@yahoo.com> wrote:

> Minor separate question. The method pyarrow.SparseCSRMatrix.to_numpy()
> doesn't seem to preserve the shape of the matrix. Am I wrong? For example
> using the code from my original message, printing the result of
> arrow_sparse_csr_matrix.to_numpy() in one case gives:
>
> (array([[0.91263427],
>        [0.98520395],
>        [0.98082576],
>        [0.97490447],
>        [0.94312307],
>        [0.90573414],
>        [0.95057244],
>        [0.94955576],
>        [0.90342821]]), array([0, 9], dtype=int64), array([ 0,  4, 33, 38,
> 46, 49, 61, 64, 83], dtype=int64))
>
> vs.
>
> >>> acsr.shape
> (1, 100)
>
>
> On 7/6/2022 4:01 PM, dl wrote:
>
> I have tabular data with one record field of type scipy.sparse.csr_matrix.
> I want to convert this tabular data to a pyarrow table. I had been
> converting the csr_matrix first to a custom representation using three
> fields (shape, keys, indices) and building the pyarrow table using a schema
> with the types of these fields and table data with a separate list for each
> field (and each list having one entry per input record). I was hoping I
> could use a single pyarrow.SparseCSRMatrix field instead of the custom
> three field representation. Is that possible? Incidentally, the shape of
> the csr_matrix is typically (1,N) where N may vary for different records.
> But I don't think "typically (1,N)" matters. It would work with variable
> shape (M,N). The shape field has type pyarrow.List with value_type =
> pyarrow.int32().
>
> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>
> Hey David,
>
> I don't think Table is designed in a way that you could "populate" it with
> a 2D tensor. It should rather be populated with a collection of equal
> length arrays.
> Sparse CSR tensor on the other hand is composed of three arrays (indices,
> indptr, values) and you need a bit more involved logic to manipulate those
> than regular arrays. See [1] for memory layout definition.
>
> What are you looking to accomplish? What access patterns are you expecting?
>
> Rok
>
> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>
> On Wed, Jul 6, 2022 at 10:48 PM dl <dydx...@yahoo.com> wrote:
>
>> Hi Rok,
>>
>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
>> need to build a table with rows which include a field of this type. I don't
>> see a related example in the test module. I'm doing something like:
>>
>> schema = pyarrow.schema(fields, metadata=metadata)
>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>
>> where fields is a list of tuples of the form (field_name, pyarrow_type),
>> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
>> SparseCSRMatrix field? Or will this not work?
>>
>> Thanks,
>> David
>>
>>
>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>
>> We lack pyarrow sparse tensor documentation (PRs welcome), so tests are
>> perhaps the most extensive description of what is doable:
>> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>
>> Rok
>>
>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <user@arrow.apache.org> wrote:
>>
>>> So, I guess this is supported in 8.0.0. I can do this:
>>>
>>> import numpy as np
>>> import pyarrow as pa
>>> from scipy.sparse import csr_matrix
>>>
>>> a = np.random.rand(100)
>>> a[a < .9] = 0.0
>>> s = csr_matrix(a)
>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>
>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>
>>> On 7/1/2022 8:19 AM, dl wrote:
>>>
>>> I find pyarrow.SparseCSRMatrix mentioned here
>>> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
>>> But how do I use that? Is there documentation for that class?
>>>
>>> On 7/1/2022 7:47 AM, dl wrote:
>>>
>>>
>>> Hi,
>>>
>>> I'm trying to understand support for sparse tensors in Arrow. It looks
>>> like there is "experimental" support using the C++ API
>>> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
>>> When was this introduced? I see in the code base here
>>> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
>>> Cython sparse array classes. Can these be accessed using the Python API?
>>> Are they included in the 8.0.0 release? Is there any other support for
>>> sparse arrays/tensors in the Python API? Are there good examples for any of
>>> this, in particular for using the 8.0.0 Python API to create sparse tensors?
>>>
>>> Thanks,
>>> David