If `l` is a plain list there, I don't think it's possible. The __arrow_array__ 
protocol relies on you having a type of your own to define the method on. I also 
don't think there are other customization hooks for pa.array(), but maybe 
someone else knows better.

On Tue, Jul 12, 2022, at 17:18, dl via user wrote:
> Hi David,
> 
> Are there any good examples for the first section 
> <https://arrow.apache.org/docs/python/extending_types.html#controlling-conversion-to-pyarrow-array-with-the-arrow-array-protocol>
>  of your reference [1]: Controlling conversion to pyarrow.Array with the 
> __arrow_array__ protocol?
> 
> I find examples of creating an extension array using an extension type with 
> explicit code in test_extension_type.py 
> <https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_extension_type.py>,
>  e.g. in test_ext_array_basics. I'm thinking it might be possible to have the 
> array type inferred by pyarrow.array() or pyarrow.Table.from_arrays() using an 
> extension array type as suggested there. Am I right about this? If so, is 
> there a good example? I haven't been able to get this to work.
> 
> For the record, here is what I can do.
> 
> l = list()
> for i in range(4):
>     s = csr_matrix(random_dense())
>     struct = [('shape', s.shape),
>               ('keys', s.data),
>               ('indexes', s.indices)]
>     l.append(struct)
> 
> struct_type = pa.struct([('shape', pa.list_(pa.int32())),
>                          ('keys', pa.list_(pa.float64())),
>                          ('indexes', pa.list_(pa.int64()))])
> arrow_array = pa.array(l, struct_type)
> extension_array = pa.ExtensionArray.from_storage(SparseStructType(),
>                                                  arrow_array)
> 
> class SparseStructType(pa.PyExtensionType):
>     storage_type = pa.struct([('shape', pa.list_(pa.int32())),
>                               ('keys', pa.list_(pa.float64())),
>                               ('indexes', pa.list_(pa.int64()))])
>     def __init__(self):
>         pa.PyExtensionType.__init__(self, self.storage_type)
> 
>     def __reduce__(self):
>         return SparseStructType, ()
> 
> I would like to be able to do something like
> 
> 
> extension_array = pa.array(l,SparseStructType())
> 
> having the extension type of the array inferred by pa.array. Is that possible?
> 
> Thanks,
> David
> 
> 
> On 7/6/2022 4:26 PM, David Li wrote:
>> If I'm not mistaken, what you want is basically an extension type [1] for 
>> tensors, so you can have a column where each row contains a tensor/matrix. 
>> This has been discussed for quite some time [2].
>> 
>> Incidentally, you can keep the three-field representation but pack it into a 
>> single toplevel field with the Struct type. 
>> 
>> [1]: https://arrow.apache.org/docs/python/extending_types.html
>> [2]: https://issues.apache.org/jira/browse/ARROW-1614
>> 
>> On Wed, Jul 6, 2022, at 19:01, dl via user wrote:
>>> I have tabular data with one record field of type scipy.sparse.csr_matrix. 
>>> I want to convert this tabular data to a pyarrow table. I had been 
>>> converting the csr_matrix first to a custom representation using three 
>>> fields (shape, keys, indices) and building the pyarrow table using a schema 
>>> with the types of these fields and table data with a separate list for each 
>>> field (and each list having one entry per input record). I was hoping I 
>>> could use a single pyarrow.SparseCSRMatrix field instead of the custom 
>>> three field representation. Is that possible? Incidentally, the shape of 
>>> the csr_matrix is typically (1,N) where N may vary for different records. 
>>> But I don't think "typically (1,N)" matters. It would work with variable 
>>> shape (M,N). The shape field has type pyarrow.List with value_type = 
>>> pyarrow.int32().
>>> 
>>> 
>>> On 7/6/2022 2:53 PM, Rok Mihevc wrote:
>>>> Hey David, 
>>>> 
>>>> I don't think Table is designed in a way that you could "populate" it with 
>>>> a 2D tensor. It should rather be populated with a collection of equal 
>>>> length arrays.
>>>> Sparse CSR tensor on the other hand is composed of three arrays (indices, 
>>>> indptr, values) and you need a bit more involved logic to manipulate those 
>>>> than regular arrays. See [1] for memory layout definition.
>>>> 
>>>> What are you looking to accomplish? What access patterns are you expecting?
>>>> 
>>>> Rok
>>>> 
>>>> [1] https://github.com/apache/arrow/blob/master/format/SparseTensor.fbs
>>>> 
>>>> On Wed, Jul 6, 2022 at 10:48 PM dl <[email protected]> wrote:
>>>>> Hi Rok,
>>>>> 
>>>>> What data type would I use for a pyarrow SparseCSRMatrix in a schema? I 
>>>>> need to build a table with rows which include a field of this type. I 
>>>>> don't see a related example in the test module. I'm doing something like:
>>>>> 
>>>>> schema = pyarrow.schema(fields, metadata=metadata)
>>>>> table = pyarrow.Table.from_arrays(table_data, schema=schema)
>>>>> 
>>>>> where fields is a list of tuples of the form (field_name, pyarrow_type), 
>>>>> e.g. ('field1', pyarrow.string()). What should pyarrow_type be for a 
>>>>> SparseCSRMatrix field? Or will this not work?
>>>>> 
>>>>> Thanks,
>>>>> David
>>>>> 
>>>>> 
>>>>> 
>>>>> On 7/1/2022 9:18 AM, Rok Mihevc wrote:
>>>>>> We lack pyarrow sparse tensor documentation (PRs welcome), so the tests are 
>>>>>> perhaps the most extensive description of what is doable: 
>>>>>> https://github.com/apache/arrow/blob/master/python/pyarrow/tests/test_sparse_tensor.py
>>>>>>  
>>>>>> 
>>>>>> Rok
>>>>>> 
>>>>>> On Fri, Jul 1, 2022 at 5:38 PM dl via user <[email protected]> wrote:
>>>>>>> So, I guess this is supported in 8.0.0. I can do this:
>>>>>>> 
>>>>>>> import numpy as np
>>>>>>> import pyarrow as pa
>>>>>>> from scipy.sparse import csr_matrix
>>>>>>> 
>>>>>>> 
>>>>>>> a = np.random.rand(100)
>>>>>>> a[a < .9] = 0.0
>>>>>>> s = csr_matrix(a)
>>>>>>> arrow_sparse_csr_matrix = pa.SparseCSRMatrix.from_scipy(s)
>>>>>>> 
>>>>>>> 
>>>>>>> Now, how do I use that to build a pyarrow table? Stay tuned...
>>>>>>> 
>>>>>>> 
>>>>>>> On 7/1/2022 8:19 AM, dl wrote:
>>>>>>>> I find pyarrow.SparseCSRMatrix mentioned here 
>>>>>>>> <https://arrow.apache.org/docs/python/integration/extending.html?highlight=sparse#pyarrow.pyarrow_wrap_sparse_csr_matrix>.
>>>>>>>>  But how do I use that? Is there documentation for that class?
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 7/1/2022 7:47 AM, dl wrote:
>>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> I'm trying to understand support for sparse tensors in Arrow. It 
>>>>>>>>> looks like there is "experimental" support using the C++ API 
>>>>>>>>> <https://arrow.apache.org/docs/cpp/api/tensor.html?highlight=sparse#sparse-tensors>.
>>>>>>>>>  When was this introduced? I see in the code base here 
>>>>>>>>> <https://github.com/apache/arrow/blob/master/python/pyarrow/tensor.pxi>
>>>>>>>>>  Cython sparse array classes. Can these be accessed using the Python 
>>>>>>>>> API? Are they included in the 8.0.0 release? Is there any other 
>>>>>>>>> support for sparse arrays/tensors in the Python API? Are there good 
>>>>>>>>> examples for any of this, in particular for using the 8.0.0 Python 
>>>>>>>>> API to create sparse tensors?
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>> 
>>>>>>>>> 
>> 
