Progress on Arrow RPC a.k.a. Arrow Flight

2018-05-24 Thread Jacques Nadeau
Hey All, I used my Strata talk today as a forcing function to make additional progress on a GRPC-based Arrow RPC protocol [1]. I’m calling it “Apache Arrow Flight”. You can take a look at the work here [2]. I’ll work to clean up my work and explain my thoughts about the protocol in the coming day

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-09 Thread Wes McKinney
hi folks, I left some feedback on this PR. If others could take a look (particularly at the .proto service definition) that would be useful. We should decide on an approach to getting multiple production-worthy Flight/RPC implementations ready to go. It would be a good goal to deliver (end-to-end

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-10 Thread Antoine Pitrou
Hi, Sorry, I'll be limiting myself to very high-level remarks, but I have a bit of trouble understanding the whole design. - Why is it called "Arrow RPC" while it doesn't seem to be providing any kind of RPC service to the user? - Is the server supposed to store all "streams" (which mean data,

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-10 Thread Wes McKinney
hi, On Fri, Aug 10, 2018 at 3:57 AM, Antoine Pitrou wrote: > > Hi, > > Sorry, I'll be limiting myself to very high-level remarks, but I have a > bit of trouble understanding the whole design. > > - Why is it called "Arrow RPC" while it doesn't seem to be providing any > kind of RPC service to the

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-16 Thread Jacques Nadeau
I'm out of town this week (vacation) and will be reviewing your feedback next week. Thanks for the feedback! On Thu, Aug 9, 2018, 8:45 PM Wes McKinney wrote: > hi folks, > > I left some feedback on this PR. If others could take a look > (particularly at the .proto service definition) that would

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-16 Thread Wes McKinney
To give some extra color on my personal motivation for interest in Arrow Flight: Systems that expose databases on a network frequently send data very slowly. For example, ODBC is in general extremely slow. What I would like to see is servers that can expose a "sql" action type. So, in considerati

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-16 Thread Julian Hyde
If your use case is SQL RPC, then you are getting close to Avatica's territory. Avatica[1] is a protocol for implementing language-independent JDBC and ODBC stacks. Now, I agree that many ODBC implementations are inefficient. Some ODBC stacks make more round trips than necessary, and do more copyi

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-16 Thread Wes McKinney
hi Julian, Thanks for chiming in. On Thu, Aug 16, 2018 at 1:16 PM, Julian Hyde wrote: > If your use case is SQL RPC, then you are getting close to Avatica's > territory. Avatica[1] is a protocol for implementing > language-independent JDBC and ODBC stacks. I'm not proposing to develop a SQL RPC

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-24 Thread Li Jin
One question I have is around the choice of using protobufs - It seems that flatbuffers has better support for zero-copy and works with grpc as well. What's the rationale behind picking protobuf over flatbuffers? On Thu, Aug 16, 2018 at 7:41 PM Wes McKinney wrote: > hi Julian, > > Thanks for chimi

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-25 Thread Wes McKinney
Hi Li -- Protobuf is the "native" wire format for GRPC [1]. You can use Flatbuffers with it, too [2], but if we are aiming for fairly broad support at the RPC level then using Protobuf is probably a safer bet. One question might be "Well, Arrow already uses Flatbuffers". That's true, but a system

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-26 Thread Jacques Nadeau
Wes nailed my thinking. There are autobindings for every language for the envelope if you use protobuf meaning someone can send/receive an arrow stream without having to know how to read the arrow stream. On Sat, Aug 25, 2018 at 6:00 PM Wes McKinney wrote: > Hi Li -- Protobuf is the "native" wir

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-08-27 Thread Li Jin
Thank you both for the explanation, it makes sense. Another piece of feedback I have is around flight.proto - some of the messages (such as FlightDescriptor and FlightPutInstruction) are not very clear to me - it would be helpful to get some more explanation for those here or on the PR. Thanks! Li On Sun,

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-09-01 Thread Wes McKinney
Jacques added more comments and expanded the protocol a bit. I've been working this past week on an initial C++ implementation: https://github.com/wesm/arrow/tree/flight-cpp-prototype I hope to have a PR up for review this coming week. There are a number of details (particularly the abstract serv

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-05-24 Thread Jacques Nadeau
FYI, if you want to see an example server you can run with a GRPC generated client, you can run the ExampleFlightServer located at [1]. Very basic 'test' with that class and client is located at [2]. [1] https://github.com/jacques-n/arrow/tree/flight/java/flight/src/main/java/org/apache/arrow/flig

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-05-27 Thread Wes McKinney
hey Jacques, This is great news, I look forward to digging into this. My biggest initial question is the Protobuf encapsulation, specifically: https://github.com/jacques-n/arrow/blob/flight/java/flight/src/main/protobuf/flight.proto#L99 My understanding of Protocol Buffers is that on read, the "

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-05-28 Thread Jacques Nadeau
Cutting through the layers of GRPC will be a per language approach thing. Assuming that each GRPC language implementation does a good job of separating message encapsulation from the base library, this should be straight-forward-ish. Hope improves around this as I see creation of non-protobuf proto

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-05-30 Thread Wes McKinney
I see; looking more closely I see you've sidestepped the standard Protobuf serialization to write the stream as tagged components: https://github.com/apache/arrow/compare/master...jacques-n:flight#diff-02cfc9235e22653fce8a7636c9f95507R241 and then reading the fields of the message tag by tag htt

Re: Progress on Arrow RPC a.k.a. Arrow Flight

2018-05-30 Thread Jacques Nadeau
Correct, I'm maintaining standard protobuf encoding so a consumer that doesn't go byte by byte can still consume/produce the messages. More impls: for sure. On Wed, May 30, 2018 at 9:01 AM, Wes McKinney wrote: > I see; looking more closely I see you've sidestepped the standard > Protobuf seria
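
Editor's sketch of the "tagged components" idea discussed in the last two messages: a writer can emit a message field by field while staying byte-compatible with standard Protobuf encoding, and a reader can walk the same bytes tag by tag and take zero-copy views of the payloads. This is only an illustration of the Protobuf wire format for length-delimited fields, not the code from the PR. The field numbers (1 for a header, 2 for a body) and all function names are hypothetical and are not taken from flight.proto.

```python
# Minimal sketch of hand-written Protobuf framing for length-delimited fields.
# Assumption: field numbers below are hypothetical, not those of flight.proto.

LENGTH_DELIMITED = 2   # Protobuf wire type for bytes, strings, sub-messages
HEADER_FIELD = 1       # hypothetical field number for a metadata header
BODY_FIELD = 2         # hypothetical field number for the data body


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a Protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_field(field_number: int, payload: bytes) -> bytes:
    """Emit one field as tag, length, payload."""
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + bytes(payload)


def write_message(header: bytes, body: bytes) -> bytes:
    """Write the message component by component instead of via a generated class."""
    return write_field(HEADER_FIELD, header) + write_field(BODY_FIELD, body)


def read_message(buf: bytes) -> dict:
    """Walk the buffer tag by tag, returning memoryview slices (no payload copy)."""
    view = memoryview(buf)
    fields = {}
    pos = 0
    while pos < len(view):
        tag, pos = decode_varint(view, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        if wire_type != LENGTH_DELIMITED:
            raise ValueError(f"unsupported wire type {wire_type}")
        length, pos = decode_varint(view, pos)
        fields[field_number] = view[pos:pos + length]
        pos += length
    return fields


if __name__ == "__main__":
    encoded = write_message(b"schema", b"x" * 300)
    decoded = read_message(encoded)
    print(bytes(decoded[HEADER_FIELD]), len(decoded[BODY_FIELD]))
```

Because the bytes follow the standard encoding, a consumer that uses an ordinary generated Protobuf class for a message with two `bytes` fields at the same field numbers should be able to parse the same buffer, at the cost of copying the payloads.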