Re: Fwd: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-03-10 Thread Antoine Pitrou
/apache/arrow/pull/8023 Cheers, Gidon -- Forwarded message - From: Gidon Gershinsky Date: Thu, Feb 18, 2021 at 6:25 PM Subject: Re: Exposing low-level Parquet encryption to Python user (or, maybe not) To: dev Thanks, then we'll just go ahead and address the remaining comments

Fwd: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-03-09 Thread Gidon Gershinsky
, Feb 18, 2021 at 6:25 PM Subject: Re: Exposing low-level Parquet encryption to Python user (or, maybe not) To: dev Thanks, then we'll just go ahead and address the remaining comments. Cheers, Gidon On Thu, Feb 18, 2021 at 5:45 PM Antoine Pitrou wrote: > > I don't think there's any c

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
Thanks, then we'll just go ahead and address the remaining comments. Cheers, Gidon On Thu, Feb 18, 2021 at 5:45 PM Antoine Pitrou wrote: > > I don't think there's any concern around having a process-global shared > key cache. The discussion was just around the implementation. > > Also, FTR,

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Antoine Pitrou
I don't think there's any concern around having a process-global shared key cache. The discussion was just around the implementation. Also, FTR, a standalone LRU cache class is proposed here, which may reduce the amount of original code in the Parquet encryption PR:
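For readers following the thread, here is a minimal sketch of what a process-global key cache with "the required locks around shared structures" might look like. It is written in Python for brevity and is not the code from the Arrow PR or the proposed C++ LRU class. The names SharedKeyCache, get_or_insert, loader and capacity are hypothetical and chosen only for illustration.

```python
import threading
from collections import OrderedDict


class SharedKeyCache:
    """Hypothetical LRU cache shared by all reader threads in a process.

    Illustration only: the class and method names are assumptions,
    not part of the Arrow or Parquet APIs.
    """

    def __init__(self, capacity=128):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._entries = OrderedDict()   # key id -> key material, oldest first
        self._lock = threading.Lock()   # the only synchronization in the class

    def get_or_insert(self, key_id, loader):
        """Return the cached key, calling loader(key_id) on a miss."""
        with self._lock:
            if key_id in self._entries:
                # Mark the entry as most recently used.
                self._entries.move_to_end(key_id)
                return self._entries[key_id]
            # Loading under the lock keeps the sketch simple; it also
            # means other threads wait while a key is being fetched.
            value = loader(key_id)
            self._entries[key_id] = value
            if len(self._entries) > self._capacity:
                # Evict the least recently used entry.
                self._entries.popitem(last=False)
            return value

    def __len__(self):
        with self._lock:
            return len(self._entries)


# Usage: any thread may call get_or_insert; no "main thread" is assumed.
cache = SharedKeyCache(capacity=2)
key = cache.get_or_insert("footer-key", lambda key_id: b"secret-" + key_id.encode())
```

The cache itself has no notion of which thread calls it; the lock is the only threading-related element, which is the property argued for in the thread.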

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
I believe the shared structures that were debated are the key caches. Cheers, Gidon On Thu, Feb 18, 2021 at 6:37 AM Micah Kornfield wrote: > > > > I don't think any notion of threading should be present in the > > implementation, except for the required locks around shared structures. > > > I

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Micah Kornfield
> > I don't think any notion of threading should be present in the > implementation, except for the required locks around shared structures. I seem to recall the debate was how to model some class interactions to determine what should be considered shared structures and what should not. On Wed,

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
This certainly sounds good to me. Cheers, Gidon On Wed, Feb 17, 2021 at 7:36 PM Antoine Pitrou wrote: > > I don't think any notion of threading should be present in the > implementation, except for the required locks around shared structures. > I don't know where the idea of a "main thread"

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Antoine Pitrou
I don't think any notion of threading should be present in the implementation, except for the required locks around shared structures. I don't know where the idea of a "main thread" comes from, but it probably shouldn't exist in a C++ library. Regards Antoine. Le 17/02/2021 à 18:34, Gidon

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
Just to clarify: there are two options; which one do you refer to? A design with a main thread that handles projections and the keys (relevant for the projected columns); or the current code with any thread allowed to handle full file reading, including the footer, column projections and their keys? Can

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
From the doc, "To maintain consistency with the style of parquet-cpp, the above structures should not be explicitly synchronized with individual mutexes. In the case of a parquet::arrow::FileReader, the request to read a given selection of row groups and columns is issued from a single main

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Antoine Pitrou
I'm not sure a threading model is expected for an encryption layer. Am I missing something? Regards Antoine. Le 17/02/2021 à 06:59, Gidon Gershinsky a écrit : > Precisely, the main change is in the threading model. Afaik, the document > proposes a model that fits pandas, but might be

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
Precisely, the main change is in the threading model. Afaik, the document proposes a model that fits pandas, but might be problematic for other users of this library. Technically, this is not a showstopper though; if the community decides on this model, it will be compatible with the high-level

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
Hi Antoine, My part there is mostly review and some advice. The bulk of the work is done by Tham, and by the community members who've reviewed the PR; my frustration is with seeing it in limbo for a while now. Regarding the remaining comments - currently, the main sticking points are the change

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Micah Kornfield
I think some of the comments might be conflicting. One of the concerns (covered in Ben's doc; I would need to refresh my memory of it to offer an opinion) was the threading model we expect in the library. On Tue, Feb 16, 2021 at 8:03 AM Antoine Pitrou wrote: > > Hi Gidon, > > Le

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Antoine Pitrou
Hi Gidon, Le 16/02/2021 à 16:42, Gidon Gershinsky a écrit : > Regarding the high-level layer, I think it is waiting for progress at > https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit?usp=sharing > No activity there since last November. This is unfortunate,

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
Regarding the high-level layer, I think it is waiting for progress at https://docs.google.com/document/d/11qz84ajysvVo5ZAV9mXKOeh6ay4-xgkBrubggCP5220/edit?usp=sharing No activity there since last November. This is unfortunate, because Tham has put a lot of work into coding the high-level layer (and

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Itamar Turner-Trauring
On Mon, Feb 15, 2021, at 2:49 PM, Micah Kornfield wrote: > Sorry I realized I had a typo in my email. We should definitely namespace > dangerous apis appropriately. Decryption doesn't seem necessarily dangerous? In any case, I will start with a PR for decryption only and we can see how that

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-15 Thread Micah Kornfield
Sorry I realized I had a typo in my email. We should definitely namespace dangerous apis appropriately. On Monday, February 15, 2021, Itamar Turner-Trauring wrote: > > > On Fri, Feb 12, 2021, at 11:52 PM, Micah Kornfield wrote: > > 2. I'm open to exposing the lower level encryption libraries

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-15 Thread Itamar Turner-Trauring
On Fri, Feb 12, 2021, at 11:52 PM, Micah Kornfield wrote: > 2. I'm open to exposing the lower level encryption libraries in python > (without appropriate namespacing/communication). It seems at least for > reading, there is potentially less harm (I'll caveat that with I'm not a > security

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-12 Thread Micah Kornfield
My thoughts: 1. I've lost track of the higher level encryption implementation in C++. I think we were trying to come to a consensus on the threading/thread safety model? 2. I'm open to exposing the lower level encryption libraries in python (without appropriate namespacing/communication). It

Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-10 Thread Itamar Turner-Trauring
Hi, Since the PR for the high-level C++ Parquet encryption API appears stalled (https://github.com/apache/arrow/pull/8023), I'm looking into exposing the low-level Parquet encryption API to Python. Arguments for doing this: the low-level API is all the users I'm talking to need, at the moment, so