Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-25 Thread Nate Bauernfeind
> makes it more difficult to bring schema evolution back into the > IPC Stream format (i.e. it would live only in flight) Gosh's proposal extends the flatbuffer structures not the protobufs. Can you help me understand how difficult it would be to bring the `schema_id` approach to the IPC stream

Re: [C++]Handling client disconnects on DoExchange (memory leak?)

2021-06-25 Thread David Li
Hmm, something like that shouldn't leak memory and a disconnect certainly shouldn't cause the server to leak memory. A bug report would certainly be appreciated. -David On Fri, Jun 25, 2021, at 17:59, Radu Teodorescu wrote: > Hi, > I am seeing a memory leak server side caused by calls to

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-25 Thread Micah Kornfield
Sorry for the second reply: >1. In our case we do expect relatively frequent changes in the schema >of the batch being sent out. I don't see that pattern changing in the mid >term for a good reason. However long term maybe it will be possible to >leverage separate RPC calls. I

[C++]Handling client disconnects on DoExchange (memory leak?)

2021-06-25 Thread Radu Teodorescu
Hi, I am seeing a memory leak server side caused by calls to DoExchange: The simplest repro is having a Flight server that implements DoExchange like this DoExchange(…) { while (true) { … reader->Next(); if (chunk.app_metadata == nullptr)

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-25 Thread Micah Kornfield
> > >1. Re complexity of "one schema at a time" vs "schema id based": i >think they are not much different, right? In fact the second one is more of >an optimization to the first one which is beneficial to us. Anyways even >with the first approach you need to add some logic of

Re: [Discuss] If and how we should integrate geospatial data (specs) in Arrow

2021-06-25 Thread Max Burke
We've been using binary field types in Parquet and Arrow for WKB-formatted data and we've been finding that it works very well. Having a geospatial type in Arrow that allowed an optional SRID to be passed along would be nice but would be more useful if it came with a corresponding Parquet logical

Re: [Discuss] If and how we should integrate geospatial data (specs) in Arrow

2021-06-25 Thread M. Edward (Ed) Borasky
I don't know about GeoPandas but in R there are two main in-memory GIS data types: the old-ish "sp" format and the new "sf" (simple features) format. As an R GIS developer, I would expect any Arrow GIS capability to efficiently facilitate "sf" / "tidyverse" operations. See

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-25 Thread Gosh Arzumanyan
Hi Micah, Sure, let me do it here: 1. In our case we do expect relatively frequent changes in the schema of the batch being sent out. I don't see that pattern changing in the mid term for a good reason. However long term maybe it will be possible to leverage separate RPC calls. I

Re: [Discuss] If and how we should integrate geospatial data (specs) in Arrow

2021-06-25 Thread Julian Hyde
Cc += geospatial@. I think allowing WKB and WKT is sufficient. Perhaps Geometry could be a composite type (WKT, SRID) or (WKB, SRID). SRID (spatial reference identifier) is almost always needed to qualify a geometry value. It is analogous to how TimeZone is needed (implicitly or explicitly) to
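
To make the composite (WKB, SRID) idea above concrete, here is a minimal sketch in plain Python. It assumes the standard OGC WKB layout for a 2D point (a byte-order flag, a uint32 geometry type, then two float64 coordinates). The `Geometry` class and the helper names are hypothetical illustrations, not part of Arrow or of any proposal in this thread.

```python
import struct
from dataclasses import dataclass
from typing import Tuple

WKB_POINT = 1  # geometry type code for a 2D Point in WKB


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1 byte order flag (1 = little endian), uint32 type, two float64 values.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def wkb_to_point(buf: bytes) -> Tuple[float, float]:
    """Decode a WKB point, honouring the byte-order flag."""
    fmt = "<Idd" if buf[0] == 1 else ">Idd"
    geom_type, x, y = struct.unpack_from(fmt, buf, 1)
    if geom_type != WKB_POINT:
        raise ValueError("not a WKB point: type %d" % geom_type)
    return x, y


@dataclass(frozen=True)
class Geometry:
    """Hypothetical composite value: WKB payload plus its SRID."""
    wkb: bytes
    srid: int


# 4326 is the widely used SRID for WGS 84 longitude/latitude.
geom = Geometry(wkb=point_to_wkb(-122.4, 37.8), srid=4326)
```

The SRID travels next to the bytes rather than inside them, which mirrors how a time zone qualifies a timestamp without changing the stored value.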

Re: [Discuss] If and how we should integrate geospatial data (specs) in Arrow

2021-06-25 Thread Emilio Lahr-Vivaz
Hello, Re: other projects, I'd like to point out the approach that we've taken on GeoMesa, a geospatial project that I work on. We model geometries in Arrow similarly to the GeoJSON spec[1], as lists of pairs of coordinates. We used FixedSizeList vectors of size 2 to represent each

Re: [VOTE][RUST] Release Apache Arrow Rust 4.4.0 RC1

2021-06-25 Thread Jorge Cardoso Leitão
+1 Ran verification script on Apple intel. On Fri, Jun 25, 2021 at 12:16 AM Andrew Lamb wrote: > Hi, > > I would like to propose a release of Apache Arrow Rust Implementation, > version 4.4.0. > > This release candidate is based on commit: > 32b835e5bee228d8a52015190596f4c33765849a [1] > >

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Jorge Cardoso Leitão
+1 On Fri, Jun 25, 2021 at 7:47 PM Julian Hyde wrote: > +1 > > > On Jun 25, 2021, at 10:36 AM, Antoine Pitrou wrote: > > > > > > Le 24/06/2021 à 21:16, Weston Pace a écrit : > >> The discussion in [1] led to the following proposal which I would like > >> to submit for a vote. > >> --- > >>

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Julian Hyde
+1 > On Jun 25, 2021, at 10:36 AM, Antoine Pitrou wrote: > > > Le 24/06/2021 à 21:16, Weston Pace a écrit : >> The discussion in [1] led to the following proposal which I would like >> to submit for a vote. >> --- >> Arrow allows a timestamp column to omit the time zone property. This >> has

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Antoine Pitrou
Le 24/06/2021 à 21:16, Weston Pace a écrit : The discussion in [1] led to the following proposal which I would like to submit for a vote. --- Arrow allows a timestamp column to omit the time zone property. This has caused confusion because some people have interpreted a timestamp without a
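
As a rough illustration of the "LocalDateTime" concept named in the subject line, the sketch below uses only Python's standard `datetime` module. It is an analogy, not Arrow code: a naive `datetime` plays the role of a timestamp without a time zone, and the helper `as_instant` is a hypothetical name introduced here.

```python
from datetime import datetime, timedelta, timezone

# A naive datetime carries calendar and clock fields but no zone,
# so on its own it does not identify a single instant.
local = datetime(2021, 6, 25, 10, 36)


def as_instant(local_dt: datetime, tz: timezone) -> datetime:
    """Interpret the same wall-clock fields in a chosen time zone."""
    return local_dt.replace(tzinfo=tz)


utc_reading = as_instant(local, timezone.utc)
plus_two_reading = as_instant(local, timezone(timedelta(hours=2)))

# Same fields, different instants: the readings are two hours apart.
gap = utc_reading - plus_two_reading
```

The point of the sketch is that the wall-clock fields stay identical while the instant they denote depends on a zone supplied from outside the value.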

Re: [Discuss] If and how we should integrate geospatial data (specs) in Arrow

2021-06-25 Thread Mauricio Vargas
Dear Jon, Thanks for sending this. Based on previous projects, WKB works well with SQLite, DuckDB and others, at the expense of creating larger columns compared to PostGIS. To experiment, it could be interesting to use the CENSO 2017 shape files:

[Discuss] If and how we should integrate geospatial data (specs) in Arrow

2021-06-25 Thread Jonathan Keane
Hello, There is an emerging spec[1] for how to store geospatial data in Arrow + pass through parquet files in the geopandas world. There is even a new R package that implements a wrapper to do the same in R[2]. These both define a serialization[3] for storing geospatial data as an Arrow table

Re: [VOTE][RUST] Release Apache Arrow Rust 4.4.0 RC1

2021-06-25 Thread Wes McKinney
+1 (binding) Ran the verification script on Apple aarch64 On Fri, Jun 25, 2021 at 2:23 AM Sutou Kouhei wrote: > > +1 > > I ran the following command line on Debian GNU/Linux sid: > > dev/release/verify-release-candidate.sh 4.4.0 1 > > > Thanks, > -- > kou > > In > "[VOTE][RUST] Release

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Wes McKinney
+1 (binding) On Fri, Jun 25, 2021 at 3:53 PM David Li wrote: > > +1 (binding) > > On Fri, Jun 25, 2021, at 08:30, Jonathan Keane wrote: > > +1 > > > > -Jon > > > > On Fri, Jun 25, 2021 at 5:30 AM Rok Mihevc wrote: > > > > > > +1 (non-binding) > > > > > > On Fri, Jun 25, 2021 at 11:21 AM Eduardo

Re: Status of Arrow Julia implementation?

2021-06-25 Thread Wes McKinney
hi Jacob — that's great to hear. We're standing by to help you out with this. On Fri, Jun 25, 2021 at 4:57 PM Jacob Quinn wrote: > > Hi Kou, > > Sorry for the slow response here, but it's been great to see how the new > Rust process has shaken out and I think it is working well. I'd like to move >

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-25 Thread Micah Kornfield
> > 1. It seems like renaming stream_id to schema_id and delegating "logical > stream" distinction to app_metadata mitigates the "multiplexing" point > while at the same time it gives enough flexibility to address both Nate's > and our use cases. I don't think this is the case. It seems that

Re: [Python] Drop Python 3.6 and Numpy 1.16 support?

2021-06-25 Thread Micah Kornfield
Hi Joris, The reason why I asked for an extension, is I believe several products from my company that have Arrow as a dependency still support Python 3.6. Most are aiming to get off by next quarter. I'm not sure what version they will jump to. I can't speak to what a general rational policy

Re: Status of Arrow Julia implementation?

2021-06-25 Thread Jacob Quinn
Hi Kou, Sorry for the slow response here, but it's been great to see how the new Rust process has shaken out and I think it is working well. I'd like to move forward with transferring the JuliaData/Arrow.jl repository to apache/arrow-julia and following a similar process to Rust in terms of

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread David Li
+1 (binding) On Fri, Jun 25, 2021, at 08:30, Jonathan Keane wrote: > +1 > > -Jon > > On Fri, Jun 25, 2021 at 5:30 AM Rok Mihevc wrote: > > > > +1 (non-binding) > > > > On Fri, Jun 25, 2021 at 11:21 AM Eduardo Ponce wrote: > > > > > +1 (non-binding) > > > > > > On Fri, Jun 25, 2021 at 4:31 AM

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Jonathan Keane
+1 -Jon On Fri, Jun 25, 2021 at 5:30 AM Rok Mihevc wrote: > > +1 (non-binding) > > On Fri, Jun 25, 2021 at 11:21 AM Eduardo Ponce wrote: > > > +1 (non-binding) > > > > On Fri, Jun 25, 2021 at 4:31 AM Joris Peeters > > wrote: > > > > > +1 > > > > > > On Fri, Jun 25, 2021 at 9:29 AM Joris Van

Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-06-25 Thread Gosh Arzumanyan
Hi guys, Thanks for sharing your insights/concerns! I also left some comments based on the discussion we had. Briefly: 1. It seems like renaming stream_id to schema_id and delegating "logical stream" distinction to app_metadata mitigates the "multiplexing" point while at the same time it gives

Re: [Discuss] Consider renaming "Arrow" in HO2 benchmarks?

2021-06-25 Thread Wes McKinney
I recommend sending a PR to the benchmark repo that clarifies that it's executing the query using the arrow R/C++ library, when in fact the query is actually primarily handled by dplyr and not Arrow at all. The benchmark is very misleading in its current form. On Fri, Jun 25, 2021 at 11:55 AM

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Rok Mihevc
+1 (non-binding) On Fri, Jun 25, 2021 at 11:21 AM Eduardo Ponce wrote: > +1 (non-binding) > > On Fri, Jun 25, 2021 at 4:31 AM Joris Peeters > wrote: > > > +1 > > > > On Fri, Jun 25, 2021 at 9:29 AM Joris Van den Bossche < > > jorisvandenboss...@gmail.com> wrote: > > > > > +1 > > > > > > On

[Discuss] Consider renaming "Arrow" in HO2 benchmarks?

2021-06-25 Thread Jorge Cardoso Leitão
Hi, H2O has a set of benchmarks comparing different query engines [1]. There is currently an implementation named "Arrow", backed by the Arrow R implementation [2]. This is one of the least performant implementations evaluated. I sense that this may negatively affect the Arrow format, as people

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Eduardo Ponce
+1 (non-binding) On Fri, Jun 25, 2021 at 4:31 AM Joris Peeters wrote: > +1 > > On Fri, Jun 25, 2021 at 9:29 AM Joris Van den Bossche < > jorisvandenboss...@gmail.com> wrote: > > > +1 > > > > On Thu, 24 Jun 2021 at 21:21, Micah Kornfield > > wrote: > > > > > +1 (binding) > > > > > > On Thu, Jun

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Joris Peeters
+1 On Fri, Jun 25, 2021 at 9:29 AM Joris Van den Bossche < jorisvandenboss...@gmail.com> wrote: > +1 > > On Thu, 24 Jun 2021 at 21:21, Micah Kornfield > wrote: > > > +1 (binding) > > > > On Thu, Jun 24, 2021 at 12:17 PM Weston Pace > > wrote: > > > > > The discussion in [1] led to the

Re: [VOTE] Clarify meaning of timestamp without time zone to equal the concept of "LocalDateTime"

2021-06-25 Thread Joris Van den Bossche
+1 On Thu, 24 Jun 2021 at 21:21, Micah Kornfield wrote: > +1 (binding) > > On Thu, Jun 24, 2021 at 12:17 PM Weston Pace > wrote: > > > The discussion in [1] led to the following proposal which I would like > > to submit for a vote. > > > > --- > > Arrow allows a timestamp column to omit the