Re: [Discuss] Single offset per array has a non-trivial performance implication

2021-10-28 Thread Jörn Horstmann
Hi,
On Wed, Oct 27, 2021 at 7:57 PM Antoine Pitrou wrote:
> This seems to assume that many or most arrays will have non-zero
> offsets. Is this something that commonly happens in the Rust Arrow
> world? In Arrow C++ I'm not sure non-zero offsets appear very frequently.
>
> Regards

Re: [Discuss] Single offset per array has a non-trivial performance implication

2021-10-27 Thread Jorge Cardoso Leitão
Hi,
> A big +1 to this, covering all the edge cases with slices is pretty complicated (there was at least one long-standing bug related to this in the 6.0 release). I imagine there are potentially more lurking in the code base.
Thanks for this observation, arrow-rs faces a similar issue: it is

Re: [Discuss] Single offset per array has a non-trivial performance implication

2021-10-27 Thread Antoine Pitrou
On 26/10/2021 at 21:30, Jorge Cardoso Leitão wrote:
Hi, One aspect of the design of "arrow2" is that it deals with array slices differently from the rest of the implementations. Essentially, the offset is not stored in ArrayData, but on each individual Buffer. Some important consequences

Re: [Discuss] Single offset per array has a non-trivial performance implication

2021-10-26 Thread Micah Kornfield
> To understand why this is the case, consider comparing two boolean arrays
> (a, b), where "a" has been sliced and has a validity and "b" does not. In
> this case, we could compare the values of the arrays (taking into account
> "a"'s offset), and clone "a"'s validity. However, this does not

Re: [Discuss] Single offset per array has a non-trivial performance implication

2021-10-26 Thread Weston Pace
I don't think the presence of array-level offsets precludes the presence of buffer-level offsets. For example, in the C++ implementation we have both buffer offsets and array offsets. Buffer offsets are used mainly in the IPC layer, I think, when we are constructing arrays from larger memory

[Discuss] Single offset per array has a non-trivial performance implication

2021-10-26 Thread Jorge Cardoso Leitão
Hi,
One aspect of the design of "arrow2" is that it deals with array slices differently from the rest of the implementations. Essentially, the offset is not stored in ArrayData, but on each individual Buffer. Some important consequences are:
* people can work with buffers and bitmaps without