Hi there,

I'm writing a document analyzing different options for a Python dataframe
exchange protocol, and I have a question about the `__arrow_array__`
protocol.

I checked the code, and it looks like the producer is expected to build and
return an Arrow array itself, and the consumer just receives it. This is the
code I'm looking at, which I believe is the relevant part:
https://github.com/apache/arrow/blob/master/python/pyarrow/array.pxi#L110
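
To make sure I'm reading it right, here is a minimal sketch of how I
understand the two sides. `MyColumn` and `consume` are names I made up for
illustration; the only parts I'm taking from pyarrow are the
`__arrow_array__(self, type=None)` hook and `pyarrow.array()`.

```python
class MyColumn:
    """Hypothetical producer object (not part of pyarrow)."""

    def __init__(self, values):
        self._values = list(values)

    def __arrow_array__(self, type=None):
        # The producer imports pyarrow and builds the Arrow array itself,
        # so it has a hard dependency on pyarrow.
        import pyarrow as pa

        return pa.array(self._values, type=type)


def consume(obj):
    # Consumer side: pyarrow.array() checks for __arrow_array__ on the
    # object and uses whatever Arrow array the producer returns.
    import pyarrow as pa

    return pa.array(obj)
```

If that is right, calling `consume(MyColumn([1, 2, 3]))` only works when
pyarrow is installed on the producer side as well.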

Compared to NumPy's array interface (`__array_interface__`), it works a bit
differently. With the NumPy one, the producer only exposes the pointer, the
size, and so on. So the producer doesn't need to depend on NumPy or any
other library, and the consumer can simply call `numpy.array(obj)` to build
the NumPy array. And if other implementations supported the protocol (I'm
not sure whether they do), they could call something like
`tensorflow.Tensor(obj)`, and NumPy would not be used at all.
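
For comparison, this is roughly what I mean on the NumPy side. It is only a
sketch: `RawColumn` and `consume` are hypothetical names, and I'm assuming a
one-dimensional column of float64 values. The producer uses nothing but the
standard library and describes its memory through `__array_interface__`.

```python
import array
import sys


class RawColumn:
    """Hypothetical producer that never imports NumPy."""

    def __init__(self, values):
        # Contiguous buffer of C doubles owned by the producer.
        self._buf = array.array("d", values)

    @property
    def __array_interface__(self):
        # buffer_info() returns (memory address, number of elements).
        address, length = self._buf.buffer_info()
        byte_order = "<" if sys.byteorder == "little" else ">"
        return {
            "version": 3,
            "shape": (length,),
            "typestr": byte_order + "f8",  # 8-byte float, native byte order
            "data": (address, False),      # (pointer, read-only flag)
        }


def consume(obj):
    # Only the consumer needs NumPy; it reads the dict above and wraps
    # the producer's memory.
    import numpy as np

    return np.asarray(obj)
```

Here only the consumer has to know about NumPy, which is the property I'm
hoping to find an Arrow equivalent for.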

Am I understanding the `__arrow_array__` protocol correctly? And if I am,
is there anything similar to the NumPy protocol that can be used to
exchange Arrow data without relying on a particular implementation?

Thanks in advance!
