[ 
https://issues.apache.org/jira/browse/ARROW-4283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16747321#comment-16747321
 ] 

Paul Taylor commented on ARROW-4283:
------------------------------------

[~pitrou] Thanks for the feedback.

I want to clarify: my Python skills aren't sharp, and I'm not familiar with the 
pyarrow API or Python's asyncio/async-iterable primitives, so filter my 
comments through the lens of a beginner.

The little experience I do have is using the RecordBatchStreamReader to read 
from stdin (via {{sys.stdin.buffer}}) and named file descriptors (via 
{{os.fdopen()}}). Since Python's so friendly (and I have no idea how the Python 
IO primitives work), I thought maybe I could pass aiohttp's {{Request.stream}} 
to the RecordBatchStreamReader constructor, and quickly learned that no, I 
can't ;).

In the JS implementation we have two main entry points for reading RecordBatch 
streams:
 # a static 
[{{RecordBatchReader.from(source)}}|https://github.com/apache/arrow/blob/cc1ce6194b905768b1a6d9f0e209270f62dc558a/js/src/ipc/reader.ts#L142],
 which accepts heterogeneous source types and returns a RecordBatchReader for 
the underlying Arrow type (file, stream, or JSON) and conforms to sync/async 
semantics of the source input type
 # methods that create [through/transform 
streams|https://github.com/apache/arrow/blob/cc1ce6194b905768b1a6d9f0e209270f62dc558a/js/bin/file-to-stream.js#L33]
 from the RecordBatchReader and RecordBatchWriter, for use with node's native 
stream primitives

Each link in the streaming pipeline is a sort of transform stream, and a 
significant amount of effort went into supporting all the different 
node/browser IO primitives, so I understand if that's too much to ask at this 
point.

As an alternative, would it be possible to add a method that accepts a Python 
byte stream and returns a zero-copy AsyncIterable of RecordBatches? Or maybe 
add an example in the 
[python/ipc|https://arrow.apache.org/docs/python/ipc.html#writing-and-reading-streams]
 docs page of how to do that?
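
To make the request concrete, here is a minimal sketch of the reader-side half of such an adapter, using only the standard library. It assumes the reader is an ordinary blocking Python iterable (which RecordBatchStreamReader appears to be). The names {{aiter_blocking}} and {{collect}} are hypothetical and not part of pyarrow. It does not address feeding an async source such as aiohttp's {{Request.stream}} into the reader, and it makes no zero-copy guarantee.

```python
import asyncio
from typing import AsyncIterator, Iterable, List, TypeVar

T = TypeVar("T")

# Sentinel returned by next() when the blocking iterator is exhausted.
_DONE = object()


async def aiter_blocking(source: Iterable[T]) -> AsyncIterator[T]:
    """Yield items from a blocking iterable without stalling the event loop.

    Hypothetical helper, not a pyarrow API. `source` is assumed to be any
    synchronous iterable, for example a RecordBatchStreamReader wrapping
    a file object.
    """
    loop = asyncio.get_running_loop()
    it = iter(source)
    while True:
        # Run next(it, _DONE) in the default thread pool, so a read that
        # blocks on the underlying file object does not block the loop.
        item = await loop.run_in_executor(None, next, it, _DONE)
        if item is _DONE:
            return
        yield item


async def collect(source: Iterable[T]) -> List[T]:
    """Drain the async iterator into a list (for demonstration only)."""
    return [item async for item in aiter_blocking(source)]
```

Under those assumptions, a handler could write {{async for batch in aiter_blocking(reader)}} where {{reader}} is a stream reader opened on a blocking file object. Each read costs a thread-pool hop, so this is a stopgap rather than a native async reader.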

> Should RecordBatchStreamReader/Writer be AsyncIterable?
> -------------------------------------------------------
>
>                 Key: ARROW-4283
>                 URL: https://issues.apache.org/jira/browse/ARROW-4283
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Paul Taylor
>            Priority: Minor
>             Fix For: 0.13.0
>
>
> Filing this issue after a discussion today with [~xhochy] about how to 
> implement streaming pyarrow http services. I had attempted to use both Flask 
> and [aiohttp|https://aiohttp.readthedocs.io/en/stable/streams.html]'s 
> streaming interfaces because they seemed familiar, but no dice. I have no 
> idea how hard this would be to add -- supporting all the async-iterable 
> primitives in JS was non-trivial.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)