Given the constraints of not changing the existing struct definitions,
adding a new buffer seems like the only way forward from what I understand.
It is unfortunate that each array now needs a new allocation (the
buffer lengths) when passing via FFI, but I don't have any other
suggestions
+1 for the original proposal as well.
---
The (minor) problem I see with flags is that there isn't much point to this
feature if you are gating on a flag. I'm assuming the goal is what Dewey
originally mentioned which is making buffer calculations easier. However,
if you're gating the feature
I agree with the approach originally proposed by Ben. It seems like the
most straightforward way to implement within the current protocol.
On Sun, Oct 29, 2023 at 4:59 PM Dewey Dunnington
wrote:
> In the absence of a general solution to the C data interface omitting
> buffer sizes, I think the
In the absence of a general solution to the C data interface omitting
buffer sizes, I think the original proposal is the best way
forward...this is the first type to be added whose buffer sizes cannot
be calculated without looping over every element of the array; the
buffer sizes are needed to
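To illustrate the "looping over every element" point, here is a minimal C sketch of how a consumer could derive a lower bound on each variadic data buffer's size from the views alone. The 16-byte view layout follows the Arrow columnar format for Utf8View/BinaryView; the names SketchView and sketch_min_variadic_sizes are hypothetical and not part of nanoarrow or Arrow C++. The sketch ignores the validity bitmap and the array offset, and out-of-range buffer indices are skipped rather than reported.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One 16-byte view: strings of at most 12 bytes are stored inline,
 * longer ones point into one of the variadic data buffers. */
typedef struct {
  int32_t length;
  union {
    uint8_t inlined[12];
    struct {
      uint8_t prefix[4];
      int32_t buffer_index;
      int32_t offset;
    } ref;
  } u;
} SketchView;

/* Hypothetical helper: writes into sizes[0..n_variadic) the smallest size
 * each variadic buffer must have for every view to be in bounds.
 * This costs O(n_views) because every element has to be visited. */
void sketch_min_variadic_sizes(const SketchView* views, int64_t n_views,
                               int64_t* sizes, int32_t n_variadic) {
  assert(views != NULL && sizes != NULL && n_variadic >= 0);
  memset(sizes, 0, (size_t)n_variadic * sizeof(int64_t));
  for (int64_t i = 0; i < n_views; i++) {
    const SketchView* v = &views[i];
    if (v->length <= 12) continue; /* inline view, no data buffer involved */
    int32_t b = v->u.ref.buffer_index;
    if (b < 0 || b >= n_variadic) continue; /* invalid index, skipped here */
    int64_t end = (int64_t)v->u.ref.offset + v->length;
    if (end > sizes[b]) sizes[b] = end;
  }
}
```

Note that the result is only a lower bound: the producer's buffers may be larger than the highest byte any view references.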
> This begs the question of what happens if a consumer receives an unknown
> flag value.
It seems to me that ignoring unknown flags is the primary case to consider at
this point, since consumers may ignore unknown flags. Since that is the case,
it seems adding any flag which would break such a
> This begs the question of what happens if a consumer receives an unknown flag
> value
That's a great point...I might be the only person who has implemented
a deep copy of an ArrowSchema in C, but it does blindly pass along a
schema's flag value (which in the scenario I proposed could lead to a
I'm afraid I've derailed the discussion into solving a bigger problem
than strictly necessary. I don't think this is the time to solve the
general problem of the C data interface having no way to communicate
buffer sizes, particularly since there's no immediate agreement on its
utility or
On 26/10/2023 at 20:02, Benjamin Kietzman wrote:
Is this buffer lengths buffer only present if the array type is Utf8View?
IIUC, the proposal would add the buffer lengths buffer for all types if the
schema's flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it appealing to
avoid the
> Is this buffer lengths buffer only present if the array type is Utf8View?
IIUC, the proposal would add the buffer lengths buffer for all types if the
schema's flags include ARROW_FLAG_BUFFER_LENGTHS. I do find it appealing to
avoid the special case and that `n_buffers` would continue to be
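For readers following along, a rough C sketch of the flag-gated alternative discussed here: the buffer lengths sit one slot past the declared buffers, at array->buffers[array->n_buffers], and are only looked at when the schema advertises the flag. The thread names ARROW_FLAG_BUFFER_LENGTHS but gives no value, so the value 8 below (the next bit after the existing flags 1, 2 and 4) is purely an assumption, as are the int64_t element type and the helper name sketch_buffer_lengths.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
/* Struct as defined by the Arrow C data interface. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif

/* Assumed value: the thread does not say which bit the flag would use. */
#define SKETCH_ARROW_FLAG_BUFFER_LENGTHS 8

/* Hypothetical helper: returns the buffer lengths if the producer
 * advertised them via the schema flags, or NULL otherwise. */
const int64_t* sketch_buffer_lengths(int64_t schema_flags,
                                     const struct ArrowArray* array) {
  assert(array != NULL);
  if ((schema_flags & SKETCH_ARROW_FLAG_BUFFER_LENGTHS) == 0) {
    return NULL; /* no flag: buffers[n_buffers] must not be read */
  }
  /* n_buffers itself is unchanged; the lengths sit one slot past it. */
  return (const int64_t*)array->buffers[array->n_buffers];
}
```

A consumer that does not know the flag never reads past n_buffers, which is the appeal of this variant; it also means the feature is only usable when the flag is present.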
Is this buffer lengths buffer only present if the array type is Utf8View?
Or are you suggesting that other types might want to adopt this as well?
On Thu, Oct 26, 2023 at 10:00 AM Dewey Dunnington
wrote:
> > I expect C code to not be much longer than this :-)
>
> nanoarrow's
On 26/10/2023 at 18:59, Dewey Dunnington wrote:
That sounds a bit hackish to me.
Including only *some* buffer sizes in array->buffers[array->n_buffers]
special-cased for only two types (or altering the number of buffers
required by the IPC format vs. the number of buffers required by the
> I expect C code to not be much longer than this :-)
nanoarrow's buffer-length-calculation and validation concepts are
(perhaps inadvisably) intertwined...even with both it is not that much
code (perhaps I was remembering how much time it took me to figure out
which 35 lines to write :-))
>
On 26/10/2023 at 17:45, Dewey Dunnington wrote:
The lack of buffer sizes is something that has come up for me a few
times working with nanoarrow (which dedicates a significant amount of
code to calculating buffer sizes, which it uses to do validation and
more efficient copying).
By the
On 26/10/2023 at 17:45, Dewey Dunnington wrote:
> A potential alternative might be to allow any ArrowArray to declare
> its buffer sizes in array->buffers[array->n_buffers], perhaps with a
> new flag in schema->flags to advertise that capability.
That sounds a bit hackish to me.
I'd rather
Ben kindly explained to me offline that the need for the buffer sizes
is because when Arrow C++ imports an Array it creates Buffer class
wrappers around the imported pointers. Arrow C++ does not have a
notion of a buffer of unknown size to my knowledge, which leaves two
undesirable alternatives:
Hello,
We might want to keep the variadic buffers at the end and instead export
the buffer sizes as buffer #2? Though that's mostly stylistic...
Regards
Antoine.
On 25/10/2023 at 18:36, Benjamin Kietzman wrote:
Hello all,
The C ABI does not store buffer lengths explicitly, which
Worth noting: the C data interface explicitly forbids adding new members
[1] to its structs, so simply adding ArrowArray::buffer_sizes is not viable.
[1]
https://github.com/bkietz/arrow/blob/0afb739a16672483b69894c6fe3f5ece5cfc79d8/docs/source/format/CDataInterface.rst?plain=1#L984-L986
On Wed,
Hello all,
The C ABI does not store buffer lengths explicitly, which presents a
problem for Utf8View since buffer lengths are not trivially extractable
from other data in the array. A potential solution is to store the lengths
in an extra buffer after the variadic data buffers. I've adopted this
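As a reading aid, here is a small C sketch of what consuming such a layout might look like for a Utf8View array. It assumes the buffers are ordered as validity, views, the variadic data buffers, and finally the lengths buffer, and that the lengths are stored as int64_t; both the element type and the helper names (sketch_n_variadic, sketch_variadic_lengths) are assumptions for illustration, not taken from the proposal text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef ARROW_C_DATA_INTERFACE
#define ARROW_C_DATA_INTERFACE
/* Struct as defined by the Arrow C data interface. */
struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};
#endif

/* Number of variadic data buffers: everything except validity, views,
 * and the trailing lengths buffer (assumed layout). */
int64_t sketch_n_variadic(const struct ArrowArray* array) {
  assert(array != NULL && array->n_buffers >= 3);
  return array->n_buffers - 3;
}

/* The last buffer holds one length per variadic data buffer
 * (int64_t is an assumption of this sketch). */
const int64_t* sketch_variadic_lengths(const struct ArrowArray* array) {
  assert(array != NULL && array->n_buffers >= 3);
  return (const int64_t*)array->buffers[array->n_buffers - 1];
}
```

With this shape the lengths buffer is counted in n_buffers, so no struct member is added and the importer can size each data buffer without scanning the views.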