If `l` is a plain list there, I don't think it's possible. The __arrow_array__ protocol requires a type of your own on which you can define the method, and a built-in list gives you nowhere to put it. I also don't think there are other customization hooks for pa.array(), but maybe someone else knows better.

On Tue, Jul 12, 2022, at 17:18, dl via user wrote:
> Hi David,
>
> Are there any good examples for the first section
> <https://arrow.apache.org/docs/python/extending_types.html#controlling-conversion-to-pyarrow-array-with-the-arrow-array-protocol>
> of your reference [1]: Controlling conversion to pyarrow.Array with the
> __arrow_array__ protocol?
>
> I find examples of creating an extension array using an extension type with
> explicit code in test_extension_type.py
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py>,
> e.g. in test_ext_array_basics. I'm thinking it might be possible to have the
> array type inferred by pyarrow.array() or pyarrow.Table.from_arrays() using a
> extension array type as suggested there. Am I right about this? If so is
> there a good example? I haven't been able to get this to work.
>
> For the record, here is what I can do.
>
> l = list()
> for i in range(4):
>     s = csr_matrix(random_dense())
>     struct = [('shape', s.shape),
>               ('keys', s.data),
>               ('indexes', s.indices)]
>     l.append(struct)
> struct_type = pa.struct([('shape', pa.list_(pa.int32())),
>                          ('keys', pa.list_(pa.float64())),
>                          ('indexes', pa.list_(pa.int64()))])
> arrow_array = pa.array(l,struct_type)
> extension_array = pa.ExtensionArray.from_storage(SparseStructType(),
>                                                  arrow_array)
>
> class SparseStructType(pa.PyExtensionType):
>     storage_type = pa.struct([('shape', pa.list_(pa.int32())),
>                               ('keys', pa.list_(pa.float64())),
>                               ('indexes', pa.list_(pa.int64()))])
>     def __init__(self):
>         pa.PyExtensionType.__init__(self,self.storage_type)
>
>     def __reduce__(self):
>         return SparseStructType, ()
>
> I would like to be able to do something like
>
>
> extension_array = pa.array(l,SparseStructType())
>
> having the extension type of the array inferred by pa.array. Is that possible?
>
> Thanks,
> David
>
>
> On 7/6/2022 4:26 PM, David Li wrote:
>> If I'm not mistaken, what you want is basically an extension type [1] for
>> tensors, so you can have a column where each row contains a tensor/matrix.
>> This has been discussed for quite some time [2].
>>
>> Incidentally, you can keep the three-field representation but pack it into a
>> single toplevel field with the Struct type.
>>
>> [1]: https://arrow.apache.org/docs/python/extending_types.html
>> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>>
>> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>>> I have tabular data with one record field of type scipy.sparse.csr_matrix.
>>> I want to convert this tabular data to a pyarrow table. I had been first
>>> converting the csr_matrix first to a custom representation using three
>>> fields (shape, keys, indices) and building the pyarrow table using a schema
>>> with the types of these fields and table data with a separate list for each
>>> field (and each list having one entry per input record). I was hoping I
>>> could use a single pyarrow.SparseCSRMatrix field instead of the custom
>>> three field representation. Is that possible? Incidentally, the shape of
>>> the csr_matrix is typically (1,N) where N may vary for different records.
>>> But I don't think "typically (1,N)" matters. It would work with variable
>>> shape (M,N). The shape field has type pyarrow.List with value_type =
>>> pyarrow.int32().
>>>
>>>
>>> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>>>> Hey David,
>>>>
>>>> I don't think Table is designed in a way that you could "populate" it with
>>>> a 2D tensor. It should rather be populated with a collection of equal
>>>> length arrays.
>>>> Sparse CSR tensor on the other hand is composed of three arrays (indices,
>>>> indptr, values) and you need a bit more involved logic to manipulate those
>>>> than regular arrays. See [1] for memory layout definition.
>>>>
>>>> What are you looking to accomplish? What access patterns are you expecting?
>>>>
>>>> Rok
>>>>
>>>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>>>>
>>>> On Wed, Jul 6, 2022 at 10:48 PM dl <[email protected]> wrote:
>>>>> Hi Rok,
>>>>>
>>>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I
>>>>> need to build a table with rows which include a field of this type. I
>>>>> don't see a related example in the test module. I'm doing something like:
>>>>>
>>>>> schema = pyarrow.schema(fields, metadata=metadata)
>>>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>>>>
>>>>> where fields is a list of tuples of the form (field_name, pyarrow_type),
>>>>> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a
>>>>> SparseCSRMatrix field? Or will this not work?
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>>
>>>>>
>>>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>>>> We lack pyarow sparse tensor documentation (PRs welcome), so tests are
>>>>>> perhaps most extensive description of what is doable:
>>>>>> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>>>>
>>>>>>
>>>>>> Rok
>>>>>>
>>>>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <[email protected]> wrote:
>>>>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>>>>
>>>>>>> import numpy as np
>>>>>>> import pyarrow as pa
>>>>>>> from scipy.sparse import csr_matrix
>>>>>>>
>>>>>>>
>>>>>>> a = np.random.rand(100)
>>>>>>> a[a < .9] = 0.0
>>>>>>> s = csr_matrix(a)
>>>>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>>>>
>>>>>>>
>>>>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>>>>
>>>>>>>
>>>>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>>>>> I find pyarrow.SparseCSRMatrix mentioned here
>>>>>>>> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
>>>>>>>> But how do I use that? Is there documentation for that class?
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I'm trying to understand support for sparse tensors in Arrow. It
>>>>>>>>> looks like there is "experimental" support using the C++ API
>>>>>>>>> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
>>>>>>>>> When was this introduced? I see in the code base here
>>>>>>>>> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
>>>>>>>>> Cython sparse array classes. Can these be accessed using the Python
>>>>>>>>> API. Are they included in the 8.0.0 release? Is there any other
>>>>>>>>> support for sparse arrays/tensors in the Python API? Are there good
>>>>>>>>> examples for any of this, in particular for using the 8.0.0 Python
>>>>>>>>> API to create sparse tensors?
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
