Re: [DISCUSS] Binary Values in Key value pairs WAS: Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-09-01 Thread Micah Kornfield
It still sounds like adding a new type might be the safest approach (and marking the old type as discouraged). On Mon, Aug 23, 2021 at 11:18 AM David Li wrote: > I believe so. > > The encoding of a string in Flatbuffers is [byte] with a null terminator > not included in the length, so old files

Re: [Python/C++] The ABI compatibility of pyarrow cython/c++ api

2021-09-01 Thread Shawn Yang
Thanks for the suggestion. It seems to be the only solution. On Wed, Sep 1, 2021 at 10:02 PM Wes McKinney wrote: > There is no ABI stability between major versions of pyarrow either at > the Cython or C++ levels at the moment (furthermore, it seems unlikely > to be the case in the near future). I wo

Re: [Java] C Data Interface and dictionaries

2021-09-01 Thread Micah Kornfield
It is quite possible the dictionary related code in Java could use some rethinking. I recall working with them has been a little bit awkward and I think we had some open JIRAs related to this. On Thu, Aug 26, 2021 at 12:52 AM roee shlomo wrote: > > It seems that we have both raw value and encod

Re: Set of primitive physical types

2021-09-01 Thread Micah Kornfield
I agree, it is what I would have proposed for the interval type if there wasn't an interval type in Arrow already. I think FixedSizeList has for better or worse solved a lot of the problems that a struct type would be used for (e.g. coordinates) Cheers, Micah On Tue, Aug 31, 2021 at 8:27 AM Wes

Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-09-01 Thread Phillip Cloud
Hey everyone, As many of you know, the compute IR project has a lot of interested parties and has generated a lot of feedback. In light of some of the feedback we’ve received, we want to stress that the specification is intended to have input from many diverse points of view and that we welcome fo

Re: [Python/C++] The ABI compatibility of pyarrow cython/c++ api

2021-09-01 Thread Wes McKinney
There is no ABI stability between major versions of pyarrow either at the Cython or C++ levels at the moment (furthermore, it seems unlikely to be the case in the near future). I would recommend pinning the pyarrow version you depend on and bumping the pin when new major versions are released. On

Arrow sync call September 1 at 12:00 US/Eastern, 16:00 UTC

2021-09-01 Thread Ian Cook
Hi all, Our biweekly sync call is today at 12:00 noon Eastern time. We have switched to using Zoom instead of Google Meet. The Zoom meeting URL for this and future Arrow sync calls is: https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09 Alternatively, enter this information into t

Re: C++ Determine Size of RecordBatch

2021-09-01 Thread Jorge Cardoso Leitão
Note that that would be an upper bound, because buffers can be shared between arrays. On Wed, Sep 1, 2021 at 2:15 PM Antoine Pitrou wrote: > On Tue, 31 Aug 2021 21:46:23 -0700 > Rares Vernica wrote: > > > > I'm storing RecordBatch objects in a local cache to improve performance. > I > > want to

Re: C++ Determine Size of RecordBatch

2021-09-01 Thread Antoine Pitrou
On Tue, 31 Aug 2021 21:46:23 -0700 Rares Vernica wrote: > > I'm storing RecordBatch objects in a local cache to improve performance. I > want to keep track of the memory usage to stay within bounds. The arrays > stored in the batch are not nested. > > The best way I came up to compute the size o

[Python/C++] The ABI compatibility of pyarrow cython/c++ api

2021-09-01 Thread Shawn Yang
I built a multi-language library based on arrow and the python implementation used arrow cython and c++ api. I want my library to be compatible with multiple pyarrow versions. But the dynamic library in my python implementation depended on `libarrow.xxx.dylib` and for every pyarrow version the `lib

Re: Merging sorted tables/ record batches

2021-09-01 Thread Antoine Pitrou
On 01/09/2021 at 03:58, Micah Kornfield wrote: According to Wikipedia there is a min-heap approach that is O(N log k); not sure if this matches Niranda's proposal [1]. On the surface the analysis makes sense to me, but I could be missing something. [1] https://en.m.wikipedia.org/wiki/K-wa
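
For readers unfamiliar with the min-heap approach mentioned above, here is a minimal sketch in plain Python. It is not Arrow code and not necessarily what was proposed in the thread: the function name `merge_sorted_runs` is hypothetical, and plain lists stand in for sorted tables or record batches. With N total elements across k sorted inputs, each element is pushed and popped once on a heap of size at most k, which gives the O(N log k) bound.

```python
import heapq


def merge_sorted_runs(runs):
    """Merge k already-sorted lists into one sorted list using a min-heap.

    Hypothetical helper for illustration only; plain lists stand in for
    sorted tables or record batches.
    """
    # Seed the heap with the first element of every non-empty run.
    # Each entry is (value, run index, position inside that run); the run
    # index also breaks ties between equal values deterministically.
    heap = [(run[0], run_index, 0) for run_index, run in enumerate(runs) if run]
    heapq.heapify(heap)

    merged = []
    while heap:
        value, run_index, position = heapq.heappop(heap)
        merged.append(value)
        next_position = position + 1
        # Refill the heap from the run the smallest element came from.
        if next_position < len(runs[run_index]):
            next_value = runs[run_index][next_position]
            heapq.heappush(heap, (next_value, run_index, next_position))
    return merged


if __name__ == "__main__":
    print(merge_sorted_runs([[1, 4, 9], [2, 3, 10], [], [5]]))
```

The standard library's `heapq.merge` implements the same idea lazily as an iterator, so the explicit loop here is only meant to make the heap bookkeeping visible.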