Hi everyone, I want to ask about the proper practice for checksumming Parquet files written with modular encryption in pyarrow. My current process is:
1. Create a Parquet file (not yet encrypted) and generate its checksum.
2. Create the encrypted version of the file using ParquetWriter with encryption properties.
3. Send the encrypted file and the checksum to the destination.
4. Decrypt the file using ParquetFile and write it out as a decrypted Parquet file.
5. Compare the checksum of the decrypted file with the original checksum.
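
For context, the file-level checksum in steps 1 and 5 could be computed roughly as in the sketch below. SHA-256 and the helper name `file_sha256` are assumptions chosen for illustration; the process above does not say which algorithm is used.

```python
import hashlib


def file_sha256(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of the raw bytes of the file at `path`.

    Hypothetical helper: the name and the choice of SHA-256 are assumptions.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

A digest like this covers every byte of the file, including the footer, so two Parquet files holding identical rows can still produce different digests if anything in the file metadata differs.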
I want to compare the checksum of the original file with that of the decrypted file to ensure data integrity. However, I am aware that the two files could differ in their metadata because of different writer versions. What is the proper way to do this? I am also wondering whether checksums are even necessary in this case. Is there already a mechanism that ensures integrity across the encrypt, transfer, and decrypt steps? Thank you.
