Re: Long title on github page

2021-05-16 Thread Nate Bauernfeind
Suggestion: faster -> more efficiently "Apache Arrow is a cross-language development platform for in-memory data. It enables systems to process and transport data more efficiently." On Sun, May 16, 2021 at 11:35 AM Wes McKinney wrote: > Here's what's there now: > > "Apache Arrow is a cross-langua

Re: [Flight Extension] Request for Comments

2021-06-02 Thread Nate Bauernfeind
the > > feature > > > as > > > > > a new field in the protobuf so that it can be used in contexts with > > > other > > > > > header metadata types? Do you have time to riff on the format that > > will > > > > > apply to the other c

[C++] Async Arrow Flight

2021-06-02 Thread Nate Bauernfeind
It seems to me that the c++ arrow flight implementation uses only the synchronous version of the gRPC API. gRPC supports asynchronous message delivery in C++ via a CompletionQueue that must be polled. Has there been any desire to standardize on a solution for asynchronous use cases, perhaps deliver

Re: [C++] Async Arrow Flight

2021-06-03 Thread Nate Bauernfeind
a > useful addition. What sorts of things would it enable for you? > > -David > > On Wed, Jun 2, 2021, at 16:20, Nate Bauernfeind wrote: > > It seems to me that the c++ arrow flight implementation uses only the > > synchronous version of the gRPC API. gRPC supports asynchr

Re: [C++] Async Arrow Flight

2021-06-21 Thread Nate Bauernfeind
finding someone to do the work. > > Best, > David > > On Thu, Jun 3, 2021, at 12:11, Nate Bauernfeind wrote: > > In addition to Arrow Flight we have other gRPC APIs that work together > as a > > whole. For example, the API client establishes a session with the server

Re: [ANNOUNCE] New Arrow PMC member: David M Li

2021-06-21 Thread Nate Bauernfeind
Congratulations! Well earned! On Mon, Jun 21, 2021 at 4:20 PM Ian Cook wrote: > Congratulations, David! > > Ian > > > On Mon, Jun 21, 2021 at 6:19 PM Wes McKinney wrote: > > > > The Project Management Committee (PMC) for Apache Arrow has invited > > David M Li to become a PMC member and we are

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-23 Thread Nate Bauernfeind
Thanks for writing this up! I added a few general comments, but have a question on the approach because it's not quite what I was expecting. I am slightly concerned that the proposal looks more like support for "multiplexing" IPC streams into a single RPC stream rather than support for a changing

Re: [STRAW POLL] (How) should Arrow define storage for "Instant"s

2021-06-24 Thread Nate Bauernfeind
Option C. On Thu, Jun 24, 2021 at 1:53 PM Joris Peeters wrote: > C > > On Thu, Jun 24, 2021 at 8:39 PM Antoine Pitrou wrote: > > > > > Option C. > > > > > > Le 24/06/2021 à 21:24, Weston Pace a écrit : > > > > > > This proposal states that Arrow should define how to encode an Instant > > > into

Re: [C++] Reducing branching in compute/kernels/vector_selection.cc

2021-06-24 Thread Nate Bauernfeind
FYI, the bench was slightly broken; but the results stand. > benchmark::DoNotOptimize(output[rand()]); Since rand() returns values from 0 to RAND_MAX, the index blows past the output array (of length 4k). It segfaults in GCC; I'm not sure why the Clang benchmark is happy with that. I modified [1] it to: > ben
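The modified benchmark line is cut off in this snippet, so the following is only a minimal sketch of one common way to keep a rand()-derived index inside a fixed-length array. The helper names `BoundedRandomIndex` and `ReadRandomElement` are hypothetical and do not come from the thread or from the Arrow code base.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical helper (not from the thread): reduce rand() into [0, size).
// std::rand() returns a value in [0, RAND_MAX], which is usually far larger
// than a 4k-element array, so the raw value cannot be used as an index.
inline std::size_t BoundedRandomIndex(std::size_t size) {
  assert(size > 0);  // the modulo below is undefined for an empty array
  return static_cast<std::size_t>(std::rand()) % size;
}

// Hypothetical helper: read one pseudo-random element without leaving the
// bounds of the array, which is what the benchmark's final read needs.
inline int ReadRandomElement(const std::vector<int>& output) {
  return output[BoundedRandomIndex(output.size())];
}
```

In a benchmark, the returned element would be passed to benchmark::DoNotOptimize in place of output[rand()]. Modulo reduction is slightly biased when the size does not divide RAND_MAX + 1, which should not matter when the read exists only to keep the compiler from discarding the output.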

Re: [C++] Reducing branching in compute/kernels/vector_selection.cc

2021-06-24 Thread Nate Bauernfeind
> Basically, it reset/set the borrow bit in eflag register based on the if condition, and runs `outpos = outpos - (-1) - borrow_bit`. That's clever, and I clearly didn't see that! On Thu, Jun 24, 2021 at 8:57 PM Yibo Cai wrote: > > > On 6/25/21 6:58 AM, Nate Bauernfei
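The borrow-bit trick quoted above is one way a compiler can lower a conditional increment without a jump. A minimal sketch of the source-level pattern follows, assuming a simple byte-per-element filter; `FilterBranchless` is a hypothetical name and this is not the code in vector_selection.cc. Whether a given compiler emits a flag-based sequence or something else for it is not guaranteed.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical filter (illustration only): copies values[i] to the output
// when keep[i] is nonzero. Assumes keep.size() == values.size().
inline std::size_t FilterBranchless(const std::vector<int32_t>& values,
                                    const std::vector<uint8_t>& keep,
                                    std::vector<int32_t>& out) {
  out.resize(values.size());  // worst case: every element is kept
  std::size_t outpos = 0;
  for (std::size_t i = 0; i < values.size(); ++i) {
    out[outpos] = values[i];   // unconditional store; overwritten if not kept
    outpos += (keep[i] != 0);  // adds 0 or 1, so no branch is needed
  }
  out.resize(outpos);  // trim to the number of kept elements
  return outpos;
}
```

Because outpos never exceeds i inside the loop, the unconditional store always lands inside the preallocated output. The trade-off is an extra store for rejected elements in exchange for removing a branch that is hard to predict on random selection masks.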

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-25 Thread Nate Bauernfeind
lexing" point >>>> > while at the same time it gives enough flexibility to address both >>>> Nate's >>>> > and our use cases. >>>> > 2. To David's point about other transports: in fact currently we are >>>> using

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-26 Thread Nate Bauernfeind
> > > > makes it more difficult to bring schema evolution back into the > > > IPC Stream format (i.e. it would live only in flight) > > > > Gosh's proposal extends the flatbuffer structures not the protobufs. Can > > you help me understand how difficult it would be to bring the `schema_id` > > appr

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-07-07 Thread Nate Bauernfeind
s a separate file alongside the Arrow file (indexed by > record > > > > > batch index) where you can take advantage of whatever format is > most > > > > > suitable. > > > > > > > > > > -David > > > > > > > >

Re: Arrow sync call July 7 at 12:00 US/Eastern, 16:00 UTC

2021-07-07 Thread Nate Bauernfeind
Is this still happening today? On Tue, Jul 6, 2021 at 11:07 AM Ian Cook wrote: > Hi all, > > Our biweekly sync call is tomorrow at > https://meet.google.com/vtm-teks-phx. All are welcome to join. Notes > will be shared with the mailing list afterward. > > Ian > --

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-07-07 Thread Nate Bauernfeind
n it is empty" is a > > feature, then we may not want to allocate space for those nodes given > that > > the record batch length will likely be greater than zero. > > > Having conflicting RecordBatch and top-level field nodes is something that > I believe we have pushed ba

WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

2021-07-23 Thread Nate Bauernfeind
In flight.proto [1] it states that the encoded bytes are as described in the flatbuffer schema.
```
/*
 * Wrap the result of a getSchema call
 */
message SchemaResult {
  // schema of the dataset as described in Schema.fbs::Schema.
  bytes schema = 1;
}
```
However, both this schema and the schem

[DISCUSS] next iteration of flatbuffer structures

2021-07-26 Thread Nate Bauernfeind
Wes suggested that maybe there are enough new ideas that it may make sense to evolve past the existing structures rather than to bolt on new functionality. I would like to learn what requirements exist should new structures be adopted, and if applicable, would like to turn this into a full POC prop

Re: WireFormat of Flight SchemaResult and Flight FlightInfo's Schema

2021-07-26 Thread Nate Bauernfeind
need a > > > > vote to update this since it is changing files in the format dir. > > > > > > > > I did check the Java implementation quickly and even in the initial > > > > version, the schema is IPC-encapsulated[1]. > > > > > > >

Re: HTTP traffic of Arrow Flight

2021-09-07 Thread Nate Bauernfeind
HTTP (and HTTP/2) traffic is sent over TCP. You might need to be more specific, or possibly do some more research on your end. Which Arrow Flight client are you using in your test? Java? C++? Which version? Can you provide a simple gRPC server/client example that shows up in Wireshark as you expec

Re: Arrow sync call September 15 at 12:00 US/Eastern, 16:00 UTC

2021-09-15 Thread Nate Bauernfeind
Meeting notes for arrow-sync on 09/15/2021. Attendees: - Nate Bauernfeind - Nic Crane - Alenka Frim - Rok Mihevc - Niranda Perera There was no discussion this week; all attendees were here to lurk, listen, and be-a-fly-on-the-wall. See you in two weeks. -- On Wed, Sep 15, 2021 at 8:08 AM

Re: [DISCUSS] next iteration of flatbuffer structures

2021-11-08 Thread Nate Bauernfeind
ng existing users of RecordBatch to rather different behavior. > > > > > > For #3, a different thread was discussing some of the points there - it > > sounds like it may be possible to relax from map to > > map. > > > > > > -David > > > >

Re: [DISCUSS] next iteration of flatbuffer structures

2021-11-08 Thread Nate Bauernfeind
> I'm not sure anyone is actively working on RLE or other encoding schemes > at the moment. > > -David > > On Mon, Nov 8, 2021, at 13:19, Nate Bauernfeind wrote: > > I've written up the ColumnBag proposal addressing items 1 and 2 on the > > list. I'm open

Re: updates on top of arrow

2021-11-17 Thread Nate Bauernfeind
It's not clear if this will actually hit your use case or not. Specifically, low overhead means different things to different people. Also, my suggestion is not a database -- it is an analytics engine. The persistence part of the problem is off the table at this time, too. I wanted to mention it up

[Flight Extension] Request for Comments

2021-03-03 Thread Nate Bauernfeind
forward to your feedback; thank you! Nate Bauernfeind Deephaven Data Labs - https://deephaven.io/ --

Re: [Flight Extension] Request for Comments

2021-03-03 Thread Nate Bauernfeind
's existing metadata fields/API that would prevent you from using > them, as that way you (and we!) don't have to fully duplicate one of > Arrow's format definitions. Similarly, Flight already has a bidirectional > streaming endpoint, DoExchange, that allows arbitrary payloads (with

Re: [Flight Extension] Request for Comments

2021-03-03 Thread Nate Bauernfeind
tly was > planning to look at JavaScript support for Flight (using WebSockets as the > transport, IIRC) and it might make sense to join forces if that's a path > you were also going to pursue. > > Best, > David > > On Wed, Mar 3, 2021, at 18:05, Nate Bauernfeind wrot

Re: [Flight Extension] Request for Comments

2021-03-04 Thread Nate Bauernfeind
t support - there's an existing ticket: > https://issues.apache.org/jira/browse/ARROW-9860 > > I was sure I had seen another organization talking about browser support > recently, but now I can't find them. I'll update here if I do figure it out. > > Best, > Davi

Re: [Flight Extension] Request for Comments

2021-03-05 Thread Nate Bauernfeind
g. switching dictionary encoding on/off). > > -Micah > > > On Fri, Mar 5, 2021 at 11:42 AM David Li wrote: > > > (responses inline) > > > > On Thu, Mar 4, 2021, at 17:26, Nate Bauernfeind wrote: > > > Regarding the BarrageRecordBatch: > > > >

Re: [Flight Extension] Request for Comments

2021-03-05 Thread Nate Bauernfeind
Batch flatbuffer for added rows. - A set of FlightData record batches also using the normal RecordBatch flatbuffer for modified rows. On Fri, Mar 5, 2021 at 11:00 PM Nate Bauernfeind < natebauernfe...@deephaven.io> wrote: > > It seems that atomic application could also be something controlled i

Re: [Flight Extension] Request for Comments

2021-03-08 Thread Nate Bauernfeind
e first record > batch, without having to modify anything about the record batch itself, and > without having to define a new metadata header at the Arrow level - > everything could be implemented on top of the existing definitions. > > David > > On Sat, Mar 6, 2021, at 01:07

Re: Is Zulip still the preferred chat application for Arrow?

2021-03-10 Thread Nate Bauernfeind
> I also found out today that there is an official ASF slack with multiple Arrow channels, but this is only open to people who already have an apache.org email address (committers / PMC). FYI, non committers / PMC members can join the slack using this link: https://s.apache.org/slack-invite On We

Re: No replacement dictionaries supported in pyarrow?

2021-03-18 Thread Nate Bauernfeind
If dictionary replacements were supported, then the IPC file format couldn't guarantee random access reads. Personally, I would like to support a stream-based file format that is a series of the Flight protobufs. In my extension of arrow flight, by stuffing our state-based data into the app_metada

Re: No replacement dictionaries supported in pyarrow?

2021-03-19 Thread Nate Bauernfeind
mpression that the file format is supposed to support > deltas, but not replacements. Is this not implemented in C++? > > On Thu, Mar 18, 2021 at 9:57 PM Nate Bauernfeind < > nate.bauernfe...@gmail.com> > wrote: > > > If dictionary replacements were supported, then the

Re: No replacement dictionaries supported in pyarrow?

2021-03-19 Thread Nate Bauernfeind
a lot of wiggle room for alternatives. On Fri, Mar 19, 2021 at 10:03 AM Nate Bauernfeind < nate.bauernfe...@gmail.com> wrote: > The dictionary is not allowed to change throughout the file; which is > ultimately OP's request. This is because all of the dictionary definition > is i

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-04-13 Thread Nate Bauernfeind
> possibly in coordination with the Deephaven/Barrage team, if they're also still interested Good opportunity for me to chime in =). I think we still have interest in this feature. On the other thread, it took a little cajoling, but I've come around to agree with the conclusions of taking a Record

Re: [Java] Source control of generated flatbuffers code

2021-04-14 Thread Nate Bauernfeind
It would also be nice to upgrade that java flatbuffer version from 1.9 to 1.12. Is anyone planning on doing this work (as listed in ARROW-12111)? If I did this work today, might it be possible to get it included in the 4.0.0 release? On Fri, Mar 26, 2021 at 3:25 PM bobtins wrote: > OK, original

Re: [Java] Source control of generated flatbuffers code

2021-04-14 Thread Nate Bauernfeind
ng. > The methods of generation are all over the map, and some have no script or > build file, just doc. Would there be any value in making this more uniform? > > On 2021/04/14 16:36:47, Nate Bauernfeind > wrote: > > It would also be nice to upgrade that java flatbuffer ver

Re: [Java] Source control of generated flatbuffers code

2021-04-15 Thread Nate Bauernfeind
ion to an apache project; please let me know if there is anything else that I need to do to get this past the finish line. https://github.com/apache/arrow/pull/10058 Thanks, Nate On Wed, Apr 14, 2021 at 11:45 PM Nate Bauernfeind < natebauernfe...@deephaven.io> wrote: > Hey Bob, > > So