One possibility would be to calculate a hash of the logical data
when serializing and then store that hash in the metadata.
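
A rough sketch of that idea in plain Python, with no Arrow dependency.
The function name `logical_hash`, the metadata key `content_sha256`, and
the encoding (a one-byte null marker followed by a little-endian int64)
are assumptions made up for illustration, not anything Arrow defines.

```python
import hashlib
import struct

def logical_hash(values):
    """Hash a column of optional int64 values by logical content only."""
    h = hashlib.sha256()
    for v in values:
        if v is None:
            h.update(b"\x00")  # null marker; no value bytes follow
        else:
            h.update(b"\x01" + struct.pack("<q", v))  # valid marker + value
    return h.hexdigest()

# Hypothetical usage: compute the hash at serialization time and carry it
# alongside the data as a metadata entry.
column = [1, None, 3]
metadata = {"content_sha256": logical_hash(column)}
```

Because only the values and their validity feed the hash, the result does
not depend on how the underlying buffers are laid out or padded.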

> I'm not even sure this can actually happen ... After all buffers should only 
> carry primitive types (not some complex structs) and they all seem to be 
> 16/32/64/128 bit long and should produce "gapless" buffers.

Arrow buffers are aligned on 8- or 64-byte boundaries, with a
preference for 64-byte alignment.  So I think gaps/padding are
a real possibility.
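
To make the concern concrete, here is a small simulation in plain Python
(not using Arrow itself). The 64-byte padding target, the int32 values,
and the helper `padded_buffer` are assumptions for illustration; the
point is only that two buffers with identical values can hash
differently if the padding bytes are not zeroed.

```python
import hashlib
import struct

ALIGNMENT = 64  # assumed padding target for this sketch

def padded_buffer(values, fill):
    """Pack int32 values, then pad to a multiple of ALIGNMENT with `fill`."""
    data = struct.pack(f"<{len(values)}i", *values)
    pad = -len(data) % ALIGNMENT  # bytes needed to reach the next multiple
    return data + bytes([fill]) * pad

values = [1, 2, 3]                     # 12 bytes of data, 52 bytes of padding
zeroed = padded_buffer(values, 0x00)   # padding memset to zero
dirty = padded_buffer(values, 0xAB)    # padding left as "garbage"

data_len = len(values) * 4
same_logical = zeroed[:data_len] == dirty[:data_len]
same_raw_hash = hashlib.sha256(zeroed).digest() == hashlib.sha256(dirty).digest()
```

Here `same_logical` is true while `same_raw_hash` is false, which is the
instability a buffer-wise hash would run into.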

On Fri, Dec 3, 2021 at 3:05 PM Sergii Mikhtoniuk <[email protected]> wrote:
>
> Apologies for the confusion, I was using wrong terminology. When I was 
> talking about "array chunks" I meant Buffers - contiguous slices of memory 
> with nullability, offsets, or value data.
>
> If Arrow is not explicit about Buffers having to be memset to zero before use 
> - whenever the size of the value is not a multiple of its alignment we would 
> have garbage in between, messing up the stability of a buffer-wise hash.
>
> I'm not even sure this can actually happen ... After all buffers should only 
> carry primitive types (not some complex structs) and they all seem to be 
> 16/32/64/128 bit long and should produce "gapless" buffers.
