Matt already mentioned this earlier (thanks Matt!), but I wanted to add another 
voice from RAPIDS saying that the new representation should work fine for 
libcudf and would certainly be helpful.
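
For anyone skimming the thread, here is a minimal sketch of the "any nonzero value is true, 1 preferred when producing" rule from Joel's summary quoted below. It assumes one byte per value and uses only the Python standard library. The function names are hypothetical, for illustration only, and are not part of Arrow or libcudf.

```python
def bool8_to_bools(buf: bytes) -> list:
    """Read Bool8 data: any nonzero byte is interpreted as true."""
    # Signedness of the byte does not matter for a "!= 0" test.
    return [b != 0 for b in buf]


def bools_to_bool8(values) -> bytes:
    """Produce Bool8 data, writing the preferred value 1 for true."""
    return bytes(1 if v else 0 for v in values)


def normalize_bool8(buf: bytes) -> bytes:
    """Rewrite existing Bool8 data so every true value is stored as 1."""
    return bools_to_bool8(bool8_to_bools(buf))
```

A consumer would only need the first function. The other two matter when a system produces or casts to Bool8 and wants to emit the preferred value.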

On 2024/07/25 13:48:32 Joel Lubinitsky wrote:
> Thank you everyone for contributing to this discussion.
>
> I'd like to summarize where I think we've landed at this point:
> - After considering pros/cons of first-class vs canonical extension type
> and historical precedent, adopting Bool8 as a canonical extension type
> seems reasonable for this proposal.
> - There was some discussion about "true == 1" vs "true != 0" semantics. The
> conclusion is that all systems must interpret any nonzero value as true for
> interoperability, but 1 is preferred when producing/casting Bool8 if
> implementations are deciding on a canonical value.
>
> Additionally the format change [1] and Go implementation [2] have been
> split into separate PRs as requested by several reviewers.
>
> Please share any additional comments or anything I may have missed. If this
> all seems reasonable, I will move forward with an additional implementation
> in C++ and open this to a formal vote.
>
> Thanks,
> Joel
>
>
> [1]: https://github.com/apache/arrow/pull/43234
> [2]: https://github.com/apache/arrow/pull/43323
>
> On Mon, Jul 22, 2024 at 5:59 PM Wes McKinney <we...@gmail.com> wrote:
>
> > From a historical perspective, if we had had extension types / canonical
> > extension types, it would have made more sense to have the millisecond
> > dates as an extension type.
> >
> > The goal of having the extra type was to avoid an unnecessary serialization
> > in systems where there is a benefit to moving data efficiently over the
> > wire, and here it is the same — to be able to move 8-bit boolean data
> > without serialization from process to process in a reasonably standardized
> > way.
> >
> > Because boolean data is used much more than date data (in general), it
> > seems like it would be more burdensome for implementations if an 8-bit
> > boolean type were promoted to equal status with the 1-bit type.
> >
> > On Mon, Jul 22, 2024 at 2:33 PM Antoine Pitrou <an...@python.org> wrote:
> >
> > >
> > > On 22/07/2024 at 21:25, Joel Lubinitsky wrote:
> > > >
> > > > > If Canonical Extensions had existed at the time, I think there's a
> > > > > chance we may have ended up with int32 Date as a first class type and
> > > > > int64 MillisecondDate as a Canonical Extension type.
> > >
> > > Agreed.
> > >
> > > > Are there any lessons we've
> > > > learned from implementing both as first-class types as opposed to this
> > > > hypothetical first-class / extension split?
> > >
> > > In Arrow C++, not many lessons I'd say, because those date types don't
> > > support many operations.
> > >
> > > Regards
> > >
> > > Antoine.
> > >
> >
>
