Re: [DISCUSS] Semantics of extension types

2023-12-15 Thread Dewey Dunnington
I also like these equivalence traits...in addition to being easy for extension type authors to specify when registering an extension type in Arrow C++, implementations that allow registration like pyarrow and arrow/R would be able to specify them easily, whereas implementing methods, compute functi

Re: [DISCUSS] Semantics of extension types

2023-12-14 Thread Jin Shang
I'm in favor of Antoine's proposal of storage equivalence traits[1]. For the sake of clarity I'll paste it here:
> I would suggest we perhaps need a more general semantic description of
> storage type equivalence.
> Draft:
> class ExtensionType {
> public:
> // Storage equivalence for equality testi

Re: [DISCUSS] Semantics of extension types

2023-12-14 Thread Weston Pace
I agree engines can use their own strategy. Requiring explicit casts is probably ok as long as it is well documented, but I think I lean slightly towards implicitly falling back to the storage type. I do think people still shy away from extension types. Adding the extension type to an impli
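
To make the two strategies mentioned above concrete (explicit casts versus implicitly falling back to the storage type), here is a minimal sketch of a toy kernel lookup. Everything in it is hypothetical: `ExtType`, `KERNELS`, and `resolve` are illustration-only names and do not correspond to Acero's or any Arrow implementation's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtType:
    """Hypothetical extension type: a name plus the name of its storage type."""
    name: str
    storage: str


# Hypothetical kernel registry keyed by (function name, type name).
KERNELS = {("add", "int64"): lambda a, b: a + b}


def resolve(func, dtype, implicit_fallback):
    """Look up a kernel for (func, dtype); optionally retry on the storage type."""
    type_name = dtype.name if isinstance(dtype, ExtType) else dtype
    kernel = KERNELS.get((func, type_name))
    if kernel is not None:
        return kernel
    if isinstance(dtype, ExtType) and implicit_fallback:
        # Implicit strategy: silently reuse the storage type's kernel.
        return resolve(func, dtype.storage, implicit_fallback)
    # Explicit strategy: the caller has to cast to storage first.
    raise TypeError(f"no kernel {func!r} for {type_name!r}; cast to storage explicitly")


celsius = ExtType("celsius", "int64")
implicit_kernel = resolve("add", celsius, implicit_fallback=True)
```

With `implicit_fallback=False` the same lookup raises `TypeError`, which is the "explicit cast required" behavior; which default is preferable is exactly the open question in this thread.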

Re: [DISCUSS] Semantics of extension types

2023-12-13 Thread Dewey Dunnington
Thank you for opening the discussion here and opening it up! I agree that attaching semantics as metadata and/or documenting them in a central repository is an unreasonable burden to put on extension type authors and Arrow implementations in general. I also agree that operations other than filter

Re: [DISCUSS] Semantics of extension types

2023-12-13 Thread Antoine Pitrou
Hi, For now, I would suggest that each implementation decides on their own strategy, because we don't have a clear idea of which is better (and extension types are probably not getting a lot of use yet). Regards Antoine. On 13/12/2023 at 17:39, Benjamin Kietzman wrote: The main proble

Re: [DISCUSS] Semantics of extension types

2023-12-13 Thread Benjamin Kietzman
The main problem I see with adding properties to ExtensionType is I'm not sure where that information would reside. Allowing type authors to declare in which ways the type is equivalent (or not) to its storage is appealing, but it seems to need an official extension field like ARROW:extension:seman

[DISCUSS] Semantics of extension types

2023-12-13 Thread Benjamin Kietzman
Hello all, Recently, a PR to Arrow C++ [1] was opened to allow implicit casting from any extension type to its storage type in Acero. This raises questions about the validity of applying operations to an extension array's storage. For example, some extension type authors may intend different order
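
As a small illustration of the concern raised above, the sketch below shows how an operation applied to the storage can disagree with what a type author intends. The "version string stored as a string" type is a made-up example, not one taken from the thread, and `version_key` is a hypothetical helper.

```python
def version_key(v: str) -> tuple:
    """Intended ordering for the hypothetical type: compare dotted parts numerically."""
    return tuple(int(part) for part in v.split("."))


# Storage values of a hypothetical "version" extension array (string storage).
storage = ["1.9", "1.10", "1.2"]

# Sorting the raw storage uses lexicographic string order.
storage_order = sorted(storage)

# Sorting with the author's intended semantics uses numeric component order.
semantic_order = sorted(storage, key=version_key)

print(storage_order)   # ['1.10', '1.2', '1.9']
print(semantic_order)  # ['1.2', '1.9', '1.10']
```

Equality still agrees between the two views here, while ordering does not, which suggests equivalence to storage may need to be considered per kind of operation rather than as a single yes/no property.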