Hi, I'm working on a data processing tool that guarantees reproducibility / determinism of operations. I frequently need to verify that one dataset (Table) is equivalent to another.
I didn't find any functions for computing hash sums in Arrow, so I'm wondering if anyone knows of existing implementations. If I were to implement hashing over chunked arrays myself (roughly along the lines of the first sketch below), does Arrow guarantee that any padding between aligned values is zeroed out, so that the resulting hashes are perfectly stable?

Bonus question: has anyone seen hashing algorithms for tabular data that check for equivalence rather than equality? That is, I'd consider two datasets equivalent if they contain the same set of records, but not necessarily in the same order (the second sketch below shows the kind of thing I have in mind).
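To make the first question concrete, here is a minimal sketch (in pyarrow; the function name is mine) of what I mean by hashing over chunked arrays: just feeding each chunk's raw Arrow buffers into SHA-256. This is exactly where the padding question bites, since any non-zeroed unused bytes would make the digest unstable:

```python
import hashlib
import pyarrow as pa

def naive_column_hash(column: pa.ChunkedArray) -> str:
    """Hash a column by feeding the raw Arrow buffers of every chunk
    into SHA-256. Only stable if padding / unused bytes in those
    buffers are guaranteed to be zeroed; it is also sensitive to
    chunk boundaries and array offsets, which is a separate problem."""
    h = hashlib.sha256()
    for chunk in column.chunks:
        for buf in chunk.buffers():
            if buf is not None:   # e.g. a validity bitmap may be absent
                h.update(buf)     # pyarrow.Buffer supports the buffer protocol
    return h.hexdigest()
```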
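For the bonus question, the rough idea I have is to digest each row separately and combine the per-row digests with a commutative operation, so row order stops mattering. A minimal sketch, again with names of my own invention, and with repr() standing in for a proper canonical, type-aware row serialization (Table.to_pylist needs a reasonably recent pyarrow):

```python
import hashlib
import pyarrow as pa

def equivalence_hash(table: pa.Table) -> bytes:
    """Order-independent hash of a table: digest each row, then
    combine the digests by modular addition. Addition is commutative,
    so row order doesn't matter; summing (rather than XOR-ing) keeps
    duplicate rows from cancelling out, so the result covers the
    multiset of rows, not just the set."""
    acc = 0
    for row in table.to_pylist():  # one dict per row
        # Sort by column name so field order doesn't affect the hash;
        # repr() here is only a placeholder for a real canonical encoding.
        digest = hashlib.sha256(repr(sorted(row.items())).encode()).digest()
        acc = (acc + int.from_bytes(digest, "big")) % (1 << 256)
    return acc.to_bytes(32, "big")
```

This is obviously slow (it round-trips through Python objects), so I'd be very interested in anything that does the same thing natively.

Thank you! - Sergii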
