Thanks for that; hopefully those links will be fruitful.

A question before I get started: is there an API exposed that would let me
do the reverse of this?

import asyncio
import pathlib
import struct

import grpc
import pyarrow as pa
import pyarrow.flight as pf

import Flight_pb2, Flight_pb2_grpc

async def main():
    ticket = pf.Ticket("tick")
    async with grpc.aio.insecure_channel("localhost:1234") as channel:
        stub = Flight_pb2_grpc.FlightServiceStub(channel)
        schema = None
        async for data in stub.DoGet(Flight_pb2.Ticket(ticket=ticket.ticket)):
            # 4 bytes: Need IPC continuation token
            token = b'\xff\xff\xff\xff'
            # 4 bytes: message length (little-endian)
            length = struct.pack('<I', len(data.data_header))
            buf = pa.py_buffer(
                token + length + data.data_header + data.data_body)
            message = pa.ipc.read_message(buf)
            print(message)
            if schema is None:
                # This should work but is unimplemented
                # print(pa.ipc.read_schema(message))
                schema = pa.ipc.read_schema(buf)
                print(schema)
            else:
                batch = pa.ipc.read_record_batch(message, schema)
                print(batch)
                print(batch.to_pydict())

asyncio.run(main())
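
To make the framing step above concrete, here is a minimal standard-library sketch of it and of its inverse. It only mirrors what the snippet does (continuation token, little-endian header length, header, body); the function names `frame_message` and `split_message` are hypothetical, and it assumes the buffer holds exactly one message, so everything after the header is treated as the body.

```python
import struct
from typing import Tuple

# Continuation token written before the header length, as in the snippet above.
CONTINUATION = b"\xff\xff\xff\xff"


def frame_message(header: bytes, body: bytes) -> bytes:
    """Prefix the header with the token and its little-endian length, then append the body."""
    return CONTINUATION + struct.pack("<I", len(header)) + header + body


def split_message(buf: bytes) -> Tuple[bytes, bytes]:
    """Inverse of frame_message: return (header, body) from a single framed message."""
    if buf[:4] != CONTINUATION:
        raise ValueError("missing IPC continuation token")
    if len(buf) < 8:
        raise ValueError("buffer too short to hold a header length")
    (length,) = struct.unpack_from("<I", buf, 4)
    end = 8 + length
    if end > len(buf):
        raise ValueError("header length exceeds buffer size")
    # Assumption: one message per buffer, so the remainder is the body.
    return buf[8:end], buf[end:]
```

If this framing holds for buffers produced on the sending side, the two parts returned by `split_message` would be the candidates for `data_header` and `data_body`; that is an untested assumption rather than a documented API.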


On Tue, 8 Feb 2022 at 18:33, David Li wrote:

> Unfortunately Flight wraps the C++ Flight implementation, which uses
> gRPC/C++, which is mostly a separate library entirely from grpcio and does
> not benefit from any improvements there. (The two do share a common network
> stack, but that's all; also, grpcio doesn't expose any of the lower level
> APIs that might make it possible to combine the two somehow.)
>
> You might ask why pyarrow.flight didn't use grpcio directly (with bindings
> to transmute from FlightData to RecordBatch). However, at the time the
> thought was that we would also have non-gRPC transports (which are finally
> being worked on) and so a from-scratch grpcio/Python implementation was not
> desirable.
>
> That said, there are issues filed about better documenting FlightData. See
> ARROW-15287[1] which links a StackOverflow answer that demonstrates how to
> glue together asyncio/grpcio/PyArrow.
>
> There's also some previous discussion about adding async to Flight more
> generally [2].
>
> [1]: https://issues.apache.org/jira/browse/ARROW-15287
> [2]: "[C++] Async Arrow Flight" 2021/06/02
> https://lists.apache.org/thread/jrj6yx53gyj0tr18pfdghtb8krp4gpfv
>
> -David
>
> On Tue, Feb 8, 2022, at 13:24, R KB wrote:
>
> gRPC has pretty good AsyncIO support at this point, and since Flight is
> essentially a wrapper around some gRPC types, why can't we just expose
> something that generates FlightData gRPC objects?
>