Further reading: https://en.wikipedia.org/wiki/Authenticated_encryption
AES-GCM is a form of Authenticated Encryption.

On Thu, Feb 27, 2025 at 3:33 AM Antoine Pitrou <[email protected]> wrote:
>
> Hello,
>
> Parquet encryption ensures integrity if you use the default encryption
> algorithm AES_GCM (not AES_CTR). You don't have to checksum the file
> yourself.
>
> Regards
>
> Antoine.
>
>
> On Tue, 25 Feb 2025 16:19:59 +0700
> Jason Sebastian Kusuma <[email protected]> wrote:
> > Hi everyone,
> > I want to ask the proper practice for doing checksums on parquet with
> > modular encryption using pyarrow. My current process is:
> > 1. Create a parquet file (not yet encrypted) and generate checksum.
> > 2. Create the encrypted version of the file using the ParquetWriter with
> > encryption properties.
> > 3. Send the encrypted file and checksum to somewhere.
> > 4. Decrypt the file using ParquetFile and write it as a decrypted parquet
> > file.
> > 5. Compare checksum
> >
> > I want to do checksum on the original file and the decrypted file to ensure
> > data integrity. But, I am aware that there could be metadata difference
> > because of different writer version. What is the proper way to do this?
> >
> > I am also wondering if checksums are not necessary in this case. Is there
> > already a mechanism to ensure the integrity in between encrypt, transfer,
> > and decrypt process?
> >
> > Thank you
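
To illustrate why an authenticated mode such as AES-GCM makes a separate checksum unnecessary, here is a minimal sketch. It assumes the third-party `cryptography` package is installed. It does not use pyarrow and does not reproduce how Parquet applies AES-GCM internally. The function name `roundtrip_and_tamper` is hypothetical. The sketch only shows the general property: decryption fails if the ciphertext was changed in any way.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def roundtrip_and_tamper(plaintext: bytes) -> tuple[bytes, bool]:
    """Encrypt with AES-GCM, decrypt, then check that tampering is rejected.

    Hypothetical helper for illustration only; not part of pyarrow.
    """
    key = AESGCM.generate_key(bit_length=128)
    aead = AESGCM(key)
    nonce = os.urandom(12)  # 96-bit nonce; must never repeat for the same key

    # The returned ciphertext has the 16-byte authentication tag appended.
    ciphertext = aead.encrypt(nonce, plaintext, None)
    recovered = aead.decrypt(nonce, ciphertext, None)

    # Flip a single bit, as a corrupted transfer might.
    tampered = bytearray(ciphertext)
    tampered[0] ^= 0x01
    try:
        aead.decrypt(nonce, bytes(tampered), None)
        detected = False
    except InvalidTag:
        # The tag no longer matches, so decryption refuses to return data.
        detected = True
    return recovered, detected


if __name__ == "__main__":
    recovered, detected = roundtrip_and_tamper(b"example column data")
    print("round trip ok:", recovered == b"example column data")
    print("tampering detected:", detected)
```

An unauthenticated mode such as AES-CTR has no tag to verify, so the same bit flip would decrypt without error to altered data. That is the case where an external checksum would still be needed.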
