Matt already mentioned this earlier (thanks Matt!), but I wanted to add another
voice from RAPIDS saying that the new representation should work fine for
libcudf and would certainly be helpful.
On 2024/07/25 13:48:32 Joel Lubinitsky wrote:
> Thank you everyone for contributing to this discussion
Thank you everyone for contributing to this discussion.
I'd like to summarize where I think we've landed at this point:
- After considering pros/cons of first-class vs canonical extension type
and historical precedent, adopting Bool8 as a canonical extension type
seems reasonable for this proposal
From a historical perspective, if we had had extension types / canonical
extension types, it would have made more sense to have the millisecond
dates as an extension type.
The goal of having the extra type was to avoid an unnecessary serialization
in systems where there is a benefit to moving dat
On 22/07/2024 at 21:25, Joel Lubinitsky wrote:
If Canonical Extensions had existed at the time, I think there's a chance
we may have ended up with int32 Date as a first class type and int64
MillisecondDate as a Canonical Extension type.
Agreed.
Are there any lessons we've
learned from im
> As a counterpoint we have two different units for date [1] that don't
> really convey meaningful new information. Implementations have to deal
> with this somehow, and I think the only reason this exists is effectively
> to support different type systems.
Thanks for raising this Micah,
The pre
On Fri, 19 Jul 2024 10:03:48 -0300
Dewey Dunnington wrote:
> The extension-ness of it is a valid point...all the other cases where
> we have multiple Arrow types for the same element type (e.g., String,
> LargeString, StringView) are first-class types. For a Bool8, the
> tradeoffs are roughly the
I'm fine in principle with this being an extension type; I just want to make
sure we had this conversation. Some replies inline.
> I think it would confuse implementors of the spec and people implementing
> kernels way too much. “the bool Arrow type” should probably not start
> meaning two different
The extension-ness of it is a valid point...all the other cases where
we have multiple Arrow types for the same element type (e.g., String,
LargeString, StringView) are first-class types. For a Bool8, the
tradeoffs are roughly the same (less support for StringView and
LargeString, more space requir
Agreed with Felipe. This is meant for communicating with non-Arrow type
systems, but shouldn't be regarded as an alternative first-class boolean
type.
Regards
Antoine.
On 19/07/2024 at 06:30, Felipe Oliveira Carvalho wrote:
I think it would confuse implementors of the spec and people i
I think it would confuse implementors of the spec and people implementing
kernels way too much. “the bool Arrow type” should probably not start
meaning two different things.
—
Felipe
On Fri, 19 Jul 2024 at 01:26 Micah Kornfield wrote:
> As Boolean is already in the arrow type system I think it
As Boolean is already in the arrow type system I think it might be worth
asking the question as to whether this should be an extension type or a
first class type.
Given what I think of the last discussion on the trade-offs [1], I think
there is room for debate here, since Boolean is not currently
Thank you Joel for working on this! I have also come across
the need for byte-packed boolean support when implementing the
Python dataframe interchange protocol and also DLPack, which
is implemented in Arrow C++. The extension type is a great solution.
I will comment on the PR if I have any questi
Thanks Joel and Matt. This looks good to me.
I think it's worth saying here that Arrow-producing components should still
by default emit Booleans in the standard bit-packed Arrow layout. This
proposed bool8 canonical extension type is intended to be used in
applications where the producer knows th
Just chiming in that the libcudf documentation[1] states that this proposal
should work just fine. Bool8 type is described as "0 == false, else true".
--Matt
[1]:
https://docs.rapids.ai/api/libcudf/stable/group__utility__types#gadf077607da617d1dadcc5417e2783539
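
To make the two layouts discussed in this thread concrete, here is a minimal
sketch in plain Python. It assumes the "0 == false, else true" reading quoted
above for the byte-per-value form, and least-significant-bit-first ordering
for the bit-packed form. The helper names bool8_to_bools and pack_bits are
hypothetical and are not part of Arrow or libcudf.

```python
def bool8_to_bools(buf: bytes) -> list[bool]:
    """Read one boolean per byte: 0 is false, any other value is true."""
    return [b != 0 for b in buf]


def pack_bits(values: list[bool]) -> bytes:
    """Pack booleans into a bitmap, least-significant bit first."""
    out = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


# Nine byte-per-value booleans; 2 and 255 are both "true" under this reading.
bool8_buffer = bytes([0, 1, 2, 0, 255, 0, 0, 1, 1])
values = bool8_to_bools(bool8_buffer)
bitmap = pack_bits(values)

print(values)
print(len(bool8_buffer), "bytes as bool8 vs", len(bitmap), "bytes bit-packed")
```

The bit-packed form is smaller, but producing it from a byte-per-value
buffer requires the conversion pass shown in pack_bits, which is the copy
a byte-per-value extension type would let a consumer skip.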
On Wed, Jul 17, 2024, 3:18 PM Joel
Thank you for your comments.
I spent some time trying to confirm definitively that this proposal would
enable zero copy sharing both ways between pyarrow and numpy. I put
together the following gist [1] with my experiment.
To summarize the results:
- I was able to share the underlying value buffe
>> Before the vote, I would like to see verification that this truly enables
>> zero-copy to/from NumPy bool arrays in Python.
> I think this is an implementation issue more than a specification
> issue...I am not personally worried about any provisions on the
> specification that might make this impo
Thank you for this! I have definitely run across the one-byte-per-item
bool in numpy, DuckDB, and cudf. I haven't heard any discussion about
DuckDB here but I am fairly sure that they represent their boolean
type as an int8 as well [1].
> Before the vote, I would like to see verification that this
Hi Joel,
This looks good to me in principle. Can you split the spec and the
implementation(s) into separate PRs?
Regards
Antoine.
On 16/07/2024 at 13:18, Joel Lubinitsky wrote:
Hi Arrow devs,
I'm working on adding an extension type for 8-bit booleans, and wanted to
start a discuss
Thanks for taking the initiative on this!
As demonstrated by [1], the wish for an 8-bit Boolean extension type is
long-standing. I think this is a worthwhile addition to Arrow's canonical
extension types.
Before the vote, I would like to see verification that this truly enables
zero-copy to/from
Hi Arrow devs,
I'm working on adding an extension type for 8-bit booleans, and wanted to
start a discussion about it here because it could be valuable to others if
adopted as a canonical extension type.
The native implementation of the Boolean type uses 1 bit to encode each
value, enabling a very