Is it actually a good idea to send complex nested data structures over Arrow? Don’t you lose a lot of its benefits?
Just asking why? Genuinely curious as I have some data that is ostensibly time series (a time series of trade positions, each snapshot is a hashmap differing in cardinality so it can’t feasibly be split into a stream per key).

> On 11 May 2022, at 16:27, Gavin Ray <[email protected]> wrote:
>
> > For the Request, we have a Protobuf message that consists of two strings:
> > the GraphQL query and an optional JSON string for variable definitions. We
> > marshal the protobuf message to bytes which are used as the “ticket” for
> > the `DoGet` request through Flight.
>
> Ah okay, so for the request you would just follow the standard "/graphql"
> query object, with/without "operationName"
>
> > Because Arrow already contains complex types like nested structs / lists /
> > etc. it’s not too difficult to construct an arrow Schema from the expected
> > GraphQL response schema and just return a stream of record batches.
>
> Nice to know this is doable, it seemed like it might be overly complicated to
> write the transform
>
> > Since pretty much every existing GraphQL engine outputs JSON right now,
> > we’ve essentially built our own execution engine at this point by utilizing
> > the planner from https://pkg.go.dev/github.com/jensneuse/graphql-go-tools
> > and a custom built execution layer to execute the generated plan.
>
> This library is really neat, thanks for posting.
> Seems to have wrapper datafetchers too for REST/GQL/static datasources, which
> is nice.
>
> I'd be using "graphql-java", where a resolver/datafetcher can return
> arbitrary types so I don't think the JSON bit would be a hangup at least --
> could directly return record batches from query execution.
>
>> On Wed, May 11, 2022 at 10:55 AM Matthew Topol <[email protected]> wrote:
>> So I’m actually doing this currently in production for a service, as I spoke
>> about in a talk at the Subsurface conference.
>>
>> For the Request, we have a Protobuf message that consists of two strings:
>> the GraphQL query and an optional JSON string for variable definitions. We
>> marshal the protobuf message to bytes which are used as the “ticket” for the
>> `DoGet` request through Flight.
>>
>> Since pretty much every existing GraphQL engine outputs JSON right now,
>> we’ve essentially built our own execution engine at this point by utilizing
>> the planner from https://pkg.go.dev/github.com/jensneuse/graphql-go-tools
>> and a custom built execution layer to execute the generated plan. Because
>> Arrow already contains complex types like nested structs / lists / etc. it’s
>> not too difficult to construct an arrow Schema from the expected GraphQL
>> response schema and just return a stream of record batches.
>>
>> --Matt
>>
>> From: Gavin Ray <[email protected]>
>> Sent: Wednesday, May 11, 2022 10:30 AM
>> To: [email protected]
>> Subject: GraphQL over Arrow (+ Flight)?
>>
>> If anyone is familiar with both GraphQL and Arrow, I'm curious how exactly
>> using these two together might look.
>>
>> GraphQL is transport-agnostic, so you can theoretically use it over
>> anything, a good case study being Dan Luu's article here:
>>
>> https://danluu.com/simple-architectures/
>>
>> > "Some areas where we’re happy with our choices even though they may not
>> > sound like the simplest feasible solution are with our API, where we use
>> > GraphQL, with our transport protocols, where we had a custom protocol for a
>> > while, and our host management, where we use Kubernetes. For our transport
>> > protocols, we used to use a custom protocol that runs on top of UDP, with an
>> > SMS and USSD fallback, for the performance reasons described in this talk.
>> > With the rollout of HTTP/3, we’ve been able to replace our custom protocol
>> > with HTTP/3 and we generally only need USSD for events like the recent
>> > internet shutdowns in Mali)."
>>
>> I've seen also GraphQL done over Protobuf/gRPC, TCP/MsgPack, and a custom
>> binary format:
>>
>> - https://github.com/google/rejoiner/blob/b1cb09e9bbf7ac68bfd9c93f23a73b691e6ead72/examples-gradle/src/main/java/com/google/api/graphql/examples/streaming/graphqlserver/GraphQlGrpcServer.java#L44
>> - https://github.com/OlegIlyenko/sangria-tcp-msgpack-example
>> - https://github.com/esseswann/graphql-binary
>>
>> If someone were interested in using Arrow as the encoding layer, how would
>> this work in practice?
>>
>> Arrow messages need to have a well-defined schema, and GraphQL
>> queries return dynamic, nested data, so I'm having a hard time understanding
>> how you'd go about representing/encoding that in an Arrow message.
>>
>> Thank you =)
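
For anyone following the schema question in this thread: the point that the response shape is fixed once the query and the GraphQL schema are known can be sketched roughly as below. This is only an illustration in plain Python, not the implementation described above. The `TYPES` table, the `arrow_type` helper, and the example query are all hypothetical, and the output is just an Arrow-style type string (no Arrow library is called).

```python
# Hypothetical mini GraphQL type system: object name -> {field: (type, is_list)}.
# Real engines (graphql-java, graphql-go-tools) expose far richer type info.
TYPES = {
    "Query": {"account": ("Account", False)},
    "Account": {"id": ("ID", False), "positions": ("Position", True)},
    "Position": {"symbol": ("String", False), "quantity": ("Float", False)},
}

# GraphQL built-in scalars mapped to the Arrow types they naturally fit:
# Int is a signed 32-bit integer, Float is a double, ID serializes as a string.
SCALAR_TO_ARROW = {
    "ID": "string",
    "String": "string",
    "Int": "int32",
    "Float": "float64",
    "Boolean": "bool",
}


def arrow_type(type_name, selection):
    """Return an Arrow-style type string for the selected fields of type_name.

    selection is a nested dict: field name -> sub-selection (None for scalars).
    """
    if type_name in SCALAR_TO_ARROW:
        return SCALAR_TO_ARROW[type_name]
    fields = []
    for name, sub in selection.items():
        field_type, is_list = TYPES[type_name][name]
        inner = arrow_type(field_type, sub or {})
        if is_list:
            inner = "list<" + inner + ">"
        fields.append(name + ": " + inner)
    return "struct<" + ", ".join(fields) + ">"


# Roughly: { account { id positions { symbol quantity } } }
selection = {
    "account": {"id": None, "positions": {"symbol": None, "quantity": None}}
}
schema = arrow_type("Query", selection)
print(schema)
```

A different query yields a different, but still fixed, schema, so each request could get its own schema before any record batches are streamed. With a real Arrow library the strings would be replaced by actual struct and list types.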
