Further reading: https://en.wikipedia.org/wiki/Authenticated_encryption

AES-GCM is a form of authenticated encryption: decryption fails if the
ciphertext has been modified.
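
If an application-level check is still wanted on top of that, comparing
checksums of the file bytes will not be reliable, since (as noted in the
question below) the rewritten file can differ in metadata. One option is to
hash the logical rows instead. The following is only a minimal sketch using
the Python standard library; the helper name content_checksum and the sample
rows are hypothetical, and with pyarrow the rows could presumably be obtained
from Table.to_pylist(). It is sensitive to row order, and it relies on str()
for values that JSON cannot encode, so it is an illustration rather than a
vetted canonical form.

```python
import hashlib
import json


def content_checksum(rows):
    """Return a SHA-256 hex digest over logical rows, ignoring file metadata.

    `rows` is an iterable of dicts (one dict per record). This helper is
    hypothetical and not part of pyarrow or Parquet.
    """
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys gives a stable field order; default=str is a fallback
        # for values json cannot encode natively (dates, decimals, bytes).
        encoded = json.dumps(row, sort_keys=True, default=str).encode("utf-8")
        # Length prefix keeps record boundaries unambiguous.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()


# Same logical content, different field order: the digests should match.
original = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
decrypted = [{"name": "a", "id": 1}, {"name": "b", "id": 2}]
print(content_checksum(original) == content_checksum(decrypted))
```

The digest would be computed once before encryption and once after
decryption, and only the two hex strings compared.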

On Thu, Feb 27, 2025 at 3:33 AM Antoine Pitrou <[email protected]> wrote:

>
> Hello,
>
> Parquet encryption ensures integrity if you use the default encryption
> algorithm AES_GCM (not AES_CTR). You don't have to checksum the file
> yourself.
>
> Regards
>
> Antoine.
>
>
> On Tue, 25 Feb 2025 16:19:59 +0700
> Jason Sebastian Kusuma <[email protected]> wrote:
> > Hi everyone,
> > I want to ask the proper practice for doing checksums on parquet with
> > modular encryption using pyarrow. My current process is:
> > 1. Create a parquet file (not yet encrypted) and generate checksum.
> > 2. Create the encrypted version of the file using the ParquetWriter with
> > encryption properties.
> > 3. Send the encrypted file and checksum to somewhere.
> > 4. Decrypt the file using ParquetFile and write it as a decrypted parquet
> > file.
> > 5. Compare checksum
> >
> > I want to do checksum on the original file and the decrypted file to
> > ensure data integrity. But, I am aware that there could be metadata
> > difference because of different writer version. What is the proper way
> > to do this?
> >
> > I am also wondering if checksums are not necessary in this case. Is there
> > already a mechanism to ensure the integrity in between encrypt, transfer,
> > and decrypt process?
> >
> > Thank you
> >
>
