Re: [ANNOUNCE] New Arrow PMC member: L. C. Hsieh

2022-09-03 Thread Gidon Gershinsky
Congrats Liang-Chi!! Cheers, Gidon On Sun, Sep 4, 2022 at 7:37 AM Micah Kornfield wrote: > Congrats! > > On Sat, Sep 3, 2022 at 8:19 PM QP Hou wrote: > > > Congrats Liang-Chi! > > > > On Sat, Sep 3, 2022 at 8:25 PM Remzi Yang <1371656737...@gmail.com> > wrote: > > > > > Congratulation Liang-C

Re: [ANNOUNCE] New Arrow committer: Liang-Chi Hsieh

2022-04-27 Thread Gidon Gershinsky
Congrats Liang-Chi! Cheers, Gidon On Thu, Apr 28, 2022 at 4:17 AM Yang hao <1371656737...@gmail.com> wrote: > Congratulations Liang-Chi! > > From: Weston Pace > Date: Thursday, April 28, 2022 at 05:19 > To: dev@arrow.apache.org > Subject: Re: [ANNOUNCE] New Arrow committer: Liang-Chi Hsieh >

Fwd: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-03-09 Thread Gidon Gershinsky
Hi Antoine, All comments have been handled. Can we ask you to shepherd this PR for the remainder of its lifecycle? (hopefully, most of this is already behind us). https://github.com/apache/arrow/pull/8023 Cheers, Gidon -- Forwarded message - From: Gidon Gershinsky Date: Thu

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
mentation. > > Also, FTR, a standalone LRU cache class is proposed here, which may > reduce the amount of original code in the Parquet encryption PR: > https://github.com/apache/arrow/pull/8716 > > Best regards > > Antoine. > > > Le 18/02/2021 à 16:40, Gidon Gershinsk

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-18 Thread Gidon Gershinsky
red structures. > > > I seem to recall the debate was how to model some class interactions to > determine what should be considered shared structures and what should not. > > On Wed, Feb 17, 2021 at 9:52 AM Gidon Gershinsky wrote: > > > This certainly sounds good to me

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
idea of a "main thread" comes from, but it > probably shouldn't exist in a C++ library. > > Regards > > Antoine. > > > > Le 17/02/2021 à 18:34, Gidon Gershinsky a écrit : > > Just to clarify. There are two options, which one do you refer to? A > desi

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
be needed). Cheers, Gidon On Wed, Feb 17, 2021 at 2:40 PM Antoine Pitrou wrote: > > > Le 17/02/2021 à 12:47, Gidon Gershinsky a écrit : > > From the doc, > > "To maintain consistency with the style of parquet-cpp, the above > > structures should not be explici

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-17 Thread Gidon Gershinsky
> I think some of the comments might be conflicting. One of the concerns > >> (that I would need to refresh myself on to offer an opinion which was > >> covered in Ben's doc) was the threading model we expect in the library. > >> > >>

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
> On Tue, Feb 16, 2021 at 8:03 AM Antoine Pitrou wrote: > > > > > Hi Gidon, > > > > Le 16/02/2021 à 16:42, Gidon Gershinsky a écrit : > > > Regarding the high-level layer, I think it waits for a progress at > > > > > > https://docs.google.com/d

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
hange proposals in this googledoc. Once their status is clarified, I hope Tham will be able to resume addressing the comments (I'll help with some of them if needed). Cheers, Gidon On Tue, Feb 16, 2021 at 6:03 PM Antoine Pitrou wrote: > > Hi Gidon, > > Le 16/02/2021 à 16:42, Gi

Re: Exposing low-level Parquet encryption to Python user (or, maybe not)

2021-02-16 Thread Gidon Gershinsky
's plausible others would also find some > > benefit in having the Pyarrow API expose low-level Parquet encryption. > Then > > again, it might only be this one company and no one else cares. > > > > The arguments against, per Gidon Gershinsky: > > > > > *

Re: [DISCUSS] Alternative design for KMS interaction in parquet-cpp

2020-11-16 Thread Gidon Gershinsky
ker tasks are launched. > After we assemble this ahead-of-time set of keys it will not change during > the course of a read, so the > DecryptionKeyRetriever can safely access it without mutexes. I've added a > comment to the doc > > On Fri, Nov 13, 2020 at 3:09 AM Gidon G

Re: [DISCUSS] Alternative design for KMS interaction in parquet-cpp

2020-11-13 Thread Gidon Gershinsky
Hi all, Glad to see the parquet-cpp progress on this! Can I suggest creating a googledoc for the technical discussion? The current md doc format seems to be harder for pinpointed comments. I got a few, but they are too minor for sending to the two mailing lists. Cheers, Gidon On Fri, Nov 13, 20

Re: Adding Parquet encryption support to PyArrow

2020-09-09 Thread Gidon Gershinsky
Thanks guys. I'll go over the intro sections to merge/streamline the text there. I've added a "commenter" access for all, so everybody could take part in the doc's discussion threads. For edit access, please contact Itamar (by pressing the request button). Cheers, Gidon On Wed, Sep 9, 2020 at 1:

Re: Adding Parquet encryption support to PyArrow

2020-09-06 Thread Gidon Gershinsky
> > > > Regards > > > > Antoine. > > > > > > Le 03/09/2020 à 22:31, Gidon Gershinsky a écrit : > > > Why would the low level API be exposed directly.. This will break the > > > interop between the two analytic ecosystems down the road.

Re: Adding Parquet encryption support to PyArrow

2020-09-06 Thread Gidon Gershinsky
se two API levels > are, and to what usage they correspond. > Is Parquet encryption used only with that Spark? While Spark > interoperability is important, Parquet files are more ubiquitous than that. > > Regards > > Antoine. > > > Le 03/09/2020 à 22:31, Gidon Gershi

Re: Adding Parquet encryption support to PyArrow

2020-09-04 Thread Gidon Gershinsky
. > Is Parquet encryption used only with that Spark? While Spark > interoperability is important, Parquet files are more ubiquitous than that. > > Regards > > Antoine. > > > Le 03/09/2020 à 22:31, Gidon Gershinsky a écrit : > > Why would the low level API be exposed directl

Re: Adding Parquet encryption support to PyArrow

2020-09-03 Thread Gidon Gershinsky
Why would the low level API be exposed directly.. This will break the interop between the two analytic ecosystems down the road. Again, let me suggest leveraging the high level interface, based on the PropertiesDrivenCryptoFactory. It should address your technical requirements; if it doesn't, we ca

Re: Adding Parquet encryption support to PyArrow

2020-09-03 Thread Gidon Gershinsky
Hi Antoine, Sounds good to me. This PR is already being actively reviewed, and it'd be good to have Itamar's assessment. Cheers, Gidon On Thu, Sep 3, 2020 at 6:01 PM Antoine Pitrou wrote: > > Hi Gidon, > > Le 03/09/2020 à 16:53, Gidon Gershinsky a écrit : >

Re: Adding Parquet encryption support to PyArrow

2020-09-03 Thread Gidon Gershinsky
Hi Itamar, My suggestion would be to wrap a different API in Python - the high-level encryption interface of https://github.com/apache/arrow/pull/8023 This will enable interoperability with Apache Spark (and other frameworks), where we don't expose the low level parquet encryption API. If such a low

Re: Property-driven Parquet encryption

2020-07-12 Thread Gidon Gershinsky
Hi Micah, Thanks for your comments here, and at the design googledoc. We'll get started, we've got the input we were looking for. Cheers, Gidon

Fwd: Property-driven Parquet encryption

2020-07-10 Thread Gidon Gershinsky
Sorry, Micah, and thanks again. Cheers, Gidon -- Forwarded message - From: Gidon Gershinsky Date: Fri, Jul 10, 2020 at 10:41 AM Subject: Re: Property-driven Parquet encryption To: dev , Hi Michah, Thanks! I was hoping for community feedback, it's better to discuss

Re: Property-driven Parquet encryption

2020-07-10 Thread Gidon Gershinsky
> I'm not sure I understand. By column key metadata, do you mean the column_keys parameter? Cheers, Gidon > > > On Wed, Jul 8, 2020 at 11:06 PM Gidon Gershinsky wrote: > > > Ok, so we had a look with Tham at the current pyarrow and parquet-cpp > > configuration ob

Fwd: Property-driven Parquet encryption

2020-07-08 Thread Gidon Gershinsky
_algorithm; }; Cheers, Gidon -- Forwarded message ----- From: Gidon Gershinsky Date: Tue, Jul 7, 2020 at 9:35 AM Subject: Property-driven Parquet encryption To: dev Cc: tham Hi all, We are working on the Parquet modular encryption, and are currently adding a high-level interface that

Property-driven Parquet encryption

2020-07-06 Thread Gidon Gershinsky
Hi all, We are working on the Parquet modular encryption, and are currently adding a high-level interface that allows encrypting/decrypting Parquet files via properties only (without calling the low level API). In the spark/parquet-mr domain, we're using the Hadoop configuration properties for that p

[jira] [Created] (ARROW-8018) Parquet Modular Encryption in parquet-cpp

2020-03-05 Thread Gidon Gershinsky (Jira)
Gidon Gershinsky created ARROW-8018: --- Summary: Parquet Modular Encryption in parquet-cpp Key: ARROW-8018 URL: https://issues.apache.org/jira/browse/ARROW-8018 Project: Apache Arrow Issue

Re: Merged C++ Parquet Encryption implementation PARQUET-1300

2019-11-08 Thread Gidon Gershinsky
Wes, Thank you for reviewing and merging this project. Regarding the note - we'll have interop testers in parquet-mr, so that cpp-written files, encrypted in various modes, would be tested by java readers - and vice versa. These manual tests could be run during development and ahead of releases. F