Matt already mentioned this earlier (thanks Matt!), but I wanted to add another voice from RAPIDS saying that the new representation should work fine for libcudf and would certainly be helpful.
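
In case it helps anyone skimming the thread, here is a rough sketch of the "any nonzero value is true, 1 preferred when producing" rule from Joel's summary quoted below. It is plain Python over int8 values from the standard library, not the Arrow or libcudf API, and the helper names `bool8_to_bools` and `bools_to_bool8` are made up purely for illustration.

```python
from array import array


def bool8_to_bools(storage):
    """Read int8 storage values as booleans: any nonzero value counts as true."""
    return [value != 0 for value in storage]


def bools_to_bool8(values):
    """Write booleans as int8 storage, using 1 as the canonical value for true."""
    return array("b", [1 if value else 0 for value in values])


# Storage as it might arrive from another system: several different "true" values.
storage = array("b", [0, 1, -1, 2, 127])

decoded = bool8_to_bools(storage)    # every nonzero entry is read as True
roundtrip = bools_to_bool8(decoded)  # producers normalize true to 1
```

Under these assumptions a consumer accepts whatever nonzero byte it is handed, while a producer that has to pick a value writes 1, so a round trip normalizes the storage without changing its logical contents.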
On 2024/07/25 13:48:32 Joel Lubinitsky wrote:
> Thank you everyone for contributing to this discussion.
>
> I'd like to summarize where I think we've landed at this point:
> - After considering pros/cons of first-class vs canonical extension type
> and historical precedent, adopting Bool8 as a canonical extension type
> seems reasonable for this proposal.
> - There was some discussion about "true == 1" vs "true != 0" semantics. The
> conclusion is that all systems must interpret any nonzero value as true for
> interoperability, but 1 is preferred when producing/casting Bool8 if
> implementations are deciding on a canonical value.
>
> Additionally the format change [1] and Go implementation [2] have been
> split into separate PRs as requested by several reviewers.
>
> Please share any additional comments or anything I may have missed. If this
> all seems reasonable, I will move forward with an additional implementation
> in C++ and open this to a formal vote.
>
> Thanks,
> Joel
>
>
> [1]: https://github.com/apache/arrow/pull/43234
> [2]: https://github.com/apache/arrow/pull/43323
>
> On Mon, Jul 22, 2024 at 5:59 PM Wes McKinney
> <we...@gmail.com> wrote:
>
> > From a historical perspective, if we had had extension types / canonical
> > extension types, it would have made more sense to have the millisecond
> > dates as an extension type.
> >
> > The goal of having the extra type was to avoid an unnecessary serialization
> > in systems where there is a benefit to moving data efficiently over the
> > wire, and here it is the same — to be able to move 8-bit boolean data
> > without serialization from process to process in a reasonably standardized
> > way.
> >
> > Because boolean data is used much more than date data (in general), it
> > seems like it would be more burdensome for implementations if a 8-bit
> > boolean type were promoted to equal status with the 1-bit type.
> >
> > On Mon, Jul 22, 2024 at 2:33 PM Antoine Pitrou
> > <an...@python.org> wrote:
> >
> > >
> > > On 22/07/2024 at 21:25, Joel Lubinitsky wrote:
> > > >
> > > > If Canonical Extensions had existed at the time, I think there's a chance
> > > > we may have ended up with int32 Date as a first class type and int64
> > > > MillisecondDate as a Canonical Extension type.
> > >
> > > Agreed.
> > >
> > > > Are there any lessons we've
> > > > learned from implementing both as first-class types as opposed to this
> > > > hypothetical first-class / extension split?
> > >
> > > In Arrow C++, not many lessons I'd say, because those date types don't
> > > support many operations.
> > >
> > > Regards
> > >
> > > Antoine.