Hi everyone,
I want to ask about the proper practice for computing checksums on Parquet
files with modular encryption using pyarrow. My current process is:
1. Create a parquet file (not yet encrypted) and generate a checksum.
2. Create the encrypted version of the file using the ParquetWriter with
encryption properties.
3. Send the encrypted file and the checksum to the destination.
4. Decrypt the file using ParquetFile and write it as a decrypted parquet
file.
5. Compare the checksums.
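
A minimal sketch of the file-level checksum in steps 1 and 5, assuming
SHA-256 and a hypothetical helper named `file_sha256` (neither is
prescribed by pyarrow; this is plain standard-library Python):

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of the raw bytes of the file at `path`.

    Hypothetical helper for illustration. It hashes every byte of the
    file, so any difference in the Parquet footer or metadata changes
    the result, even when the table data is identical.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Step 5 would then be a comparison of `file_sha256(...)` for the original
file and for the decrypted file, which only matches if the two files are
byte-identical.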

I want to compare the checksum of the original file with that of the
decrypted file to ensure data integrity. However, I am aware that the
metadata could differ, for example because of a different writer version,
so the two files may not be byte-identical. What is the proper way to do
this?
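
One alternative I have considered is a checksum over the logical content
instead of the file bytes. The following is only a sketch under my own
assumptions: `rows_checksum` is a hypothetical helper, SHA-256 is an
arbitrary choice, and rows are represented as plain Python dicts.

```python
import hashlib


def rows_checksum(column_names, rows):
    """Return a SHA-256 hex digest of column names and row values.

    Hypothetical helper for illustration. File-level metadata (writer
    version, footer layout, row-group sizes) never enters the hash, so
    two files holding the same rows in the same order give the same
    digest. Row order is significant.
    """
    column_names = list(column_names)
    digest = hashlib.sha256()
    # Hash the column names first so a renamed column changes the digest.
    digest.update(repr(column_names).encode("utf-8"))
    for row in rows:
        # Fix the value order by column name, not by dict insertion order.
        values = [row[name] for name in column_names]
        digest.update(repr(values).encode("utf-8"))
    return digest.hexdigest()
```

With pyarrow, the inputs could come from `table.column_names` and
`table.to_pylist()` on the table read from each file. That materializes
the whole table as Python objects, so I expect it to be slow for large
files.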

I am also wondering whether checksums are even necessary in this case. Is
there already a mechanism that ensures integrity across the encrypt,
transfer, and decrypt steps?

Thank you
