Re: [C++] The best method to pass null from struct to its children & visitors

2021-02-18 Thread Micah Kornfield
Hi Ying, > I have a need to standardize an Arrow Array so that it is fit for cheaper conversion into ORC by making sure that all the children (and grandchildren etc.) of null struct entries are null. Is there an established method to achieve that? I'm not aware of one. Maybe there is somethin

[C++] The best method to pass null from struct to its children & visitors

2021-02-18 Thread Ying Zhou
Hi, I'm now working on addressing the last concerns about my ORC writer (https://github.com/apache/arrow/pull/8648) and have two questions. I have a need to standardize an Arrow Array so that it is fit for cheaper conversion into ORC by making sure that all

Re: [Python] A user friendly way to filter parquet partitions

2021-02-18 Thread Bill Zhao
Hi Micah, Thank you for looking into this matter. I understand your goal of keeping dependencies minimal and solving the problem in C++ for multi-language support. Given that, we cannot switch to the condition package as I proposed. However, I had a difficult time making partition filtering

Re: Arrow sync call February 17 at 12:00 US/Eastern, 17:00 UTC

2021-02-18 Thread Neal Richardson
Apologies for the delay in sending the notes out. Attendees: Ian Cook, Andy Grove, Jonathan Keane, Micah Kornfield, Hoi Link, Tiffany Lam, Andrew Lamb, Jorge Cardoso Leitão, Kirill Lykov, Rok Mihevc, Weston Pace, Ruan Pearce-Authers, Neal Richardson, Naman Udasi, Dmitry Unknown. Discussion: * PR/JIRA tool: An

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
Thanks, then we'll just go ahead and address the remaining comments. Cheers, Gidon On Thu, Feb 18, 2021 at 5:45 PM Antoine Pitrou wrote: > > I don't think there's any concern around having a process-global shared > key cache. The discussion was just around the implementation. > > Also, FTR, a

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Antoine Pitrou
I don't think there's any concern around having a process-global shared key cache. The discussion was just around the implementation. Also, FTR, a standalone LRU cache class is proposed here, which may reduce the amount of original code in the Parquet encryption PR: https://github.com/apache/ar

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
I believe the shared structures that were debated are the key caches. Cheers, Gidon On Thu, Feb 18, 2021 at 6:37 AM Micah Kornfield wrote: > > > > I don't think any notion of threading should be present in the > > implementation, except for the required locks around shared structures. > > > I

Re: Any standard way for min/max values per record-batch?

2021-02-18 Thread Ben Kietzman
Unfortunately, FieldNode is a `struct` instead of a `table`, so fields may not be added or deprecated. On Thu, Feb 18, 2021, 04:38 Antoine Pitrou wrote: > > On 18/02/2021 at 04:37, Micah Kornfield wrote: > > There is key-value metadata available on Message which might be able to > > work in the

[NIGHTLY] Arrow Build Report for Job nightly-2021-02-18-0

2021-02-18 Thread Crossbow
Arrow Build Report for Job nightly-2021-02-18-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-18-0 Failed Tasks: - conda-linux-gcc-py36-aarch64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-02-18-0-drone-conda-linux

Re: Any standard way for min/max values per record-batch?

2021-02-18 Thread Antoine Pitrou
On 18/02/2021 at 04:37, Micah Kornfield wrote: > There is key-value metadata available on Message which might be able to > work in the short term (some sort of encoded message). I think > standardizing how we store statistics per batch does make sense. > > We unfortunately can't add anything