One possibility could be to calculate the hash of the logical data when serializing and then put the hash in the metadata.
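
To make that concrete, here is a rough sketch in plain Python (standard library only, not the Arrow API). It assumes an int32 column padded to a 64-byte boundary. The helper names `make_padded_buffer`, `buffer_hash` and `logical_hash` are made up for illustration. The resulting digest is what would go into the metadata, under whatever key we agree on.

```python
import hashlib
import struct

ALIGNMENT = 64  # assumption: pad to the 64-byte boundary mentioned in this thread


def make_padded_buffer(values, pad_byte):
    """Pack int32 values little-endian, then pad to ALIGNMENT with pad_byte.

    pad_byte stands in for whatever an allocator leaves in the padding.
    """
    raw = struct.pack(f"<{len(values)}i", *values)
    pad_len = -len(raw) % ALIGNMENT
    return raw + bytes([pad_byte]) * pad_len


def buffer_hash(buf):
    """Hash the raw bytes, padding included (the unstable approach)."""
    return hashlib.sha256(buf).hexdigest()


def logical_hash(buf, length, validity):
    """Hash only the logical content: length, validity, and valid values."""
    h = hashlib.sha256()
    h.update(struct.pack("<q", length))
    # Read exactly `length` values; bytes past them are never touched.
    values = struct.unpack_from(f"<{length}i", buf)
    for value, valid in zip(values, validity):
        if valid:
            h.update(b"\x01" + struct.pack("<i", value))
        else:
            # The stored value of a null slot is ignored.
            h.update(b"\x00")
    return h.hexdigest()


zeroed = make_padded_buffer([1, 2, 3], 0x00)
garbage = make_padded_buffer([1, 2, 3], 0xAB)
all_valid = [True, True, True]

print(buffer_hash(zeroed) == buffer_hash(garbage))
print(logical_hash(zeroed, 3, all_valid) == logical_hash(garbage, 3, all_valid))
```

The two buffers hold the same three values but differ in their padding bytes, so the buffer-wise digests differ while the logical digests match. A real implementation would also have to cover offsets, nested types and dictionary encoding, which this sketch does not attempt.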

> I'm not even sure this can actually happen ... After all buffers should only
> carry primitive types (not some complex structs) and they all seem to be
> 16/32/64/128 bit long and should produce "gapless" buffers.

Arrow buffers are aligned on 8- or 64-byte boundaries, and there is a preference to align on 64-byte boundaries. So I think gaps/padding are a real possibility.

On Fri, Dec 3, 2021 at 3:05 PM Sergii Mikhtoniuk <[email protected]> wrote:
>
> Apologies for the confusion, I was using wrong terminology. When I was
> talking about "array chunks" I meant Buffers - contiguous slices of memory
> with nullability, offsets, or value data.
>
> If Arrow is not explicit about Buffers having to be memset to zero before use
> - whenever the size of the value is not a multiple of its alignment we would
> have garbage in between, messing up the stability of a buffer-wise hash.
>
> I'm not even sure this can actually happen ... After all buffers should only
> carry primitive types (not some complex structs) and they all seem to be
> 16/32/64/128 bit long and should produce "gapless" buffers.
