Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Paul Balança
If I may, I would be really interested in being kept in the loop as well. I have been working on a small library that makes it easy to declare Python types and automatically get them supported in PyArrow as extension types (and then benefit from vectorized ops): https://github.com/balancap/arrowbic

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Micah Kornfield
> > I do not know if we voted on a naming convention, but we may want to > reserve a namespace for us (e.g. "arrow"). +1 to calling out in docs that the arrow namespace should be reserved. maybe "apache.arrow" to lower the possibility of collisions with people who already have extension types? (I

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Jorge Cardoso Leitão
Note that we do not have tests on tensor arrays, so testing the extension type on these may be hindered by divergences between implementations. I do not think we even have json integration files for them. If the focus is extension types, maybe it would be best to cover types whose physical

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-08 Thread Joris Van den Bossche
On Mon, 7 Feb 2022 at 21:02, Rok Mihevc wrote: > To follow up the discussion from the bi-weekly Arrow sync: > > - JSON seems the most suitable candidate for the extension metadata. > E.g.: TensorArray > {"key": "ARROW:extension:name", "value": "tensor 3, 4), strides=(12, 4, 1)>"}, > {"key":

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-02-07 Thread Rok Mihevc
To follow up the discussion from the bi-weekly Arrow sync: - JSON seems the most suitable candidate for the extension metadata. E.g.: TensorArray {"key": "ARROW:extension:name", "value": "tensor"}, {"key": "ARROW:extension:metadata", "value": "{'type': 'int64', 'shape': [3, 3, 4], 'strides': [12,
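
A minimal sketch of the key/value layout proposed above, using only the Python standard library. The key names, the "tensor" name, the "int64" type, and the shape come from the message itself. The message is cut off after the first stride, so the strides [12, 4, 1] are taken from the version quoted in Joris's reply. The helper names are hypothetical. Note that the example in the message uses single quotes inside the metadata string, which is not strict JSON; the sketch assumes strict JSON (double quotes) was intended.

```python
import json

# Key names as quoted in the thread.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


def tensor_extension_metadata(value_type, shape, strides):
    """Build the two key/value entries for the proposed 'tensor' extension type.

    Hypothetical helper for illustration; not part of any Arrow library.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    params = {"type": value_type, "shape": list(shape), "strides": list(strides)}
    return [
        {"key": EXTENSION_NAME_KEY, "value": "tensor"},
        # The parameters are serialized to a JSON string, as suggested in the thread.
        {"key": EXTENSION_METADATA_KEY, "value": json.dumps(params)},
    ]


def parse_tensor_extension_metadata(entries):
    """Recover the tensor parameters from a list of key/value entries."""
    by_key = {entry["key"]: entry["value"] for entry in entries}
    if by_key.get(EXTENSION_NAME_KEY) != "tensor":
        raise ValueError("not a tensor extension type")
    return json.loads(by_key[EXTENSION_METADATA_KEY])


entries = tensor_extension_metadata("int64", [3, 3, 4], [12, 4, 1])
params = parse_tensor_extension_metadata(entries)
```

Keeping the parameters in a JSON string lets an implementation that does not know the extension type pass the field through unchanged, since it only has to preserve two opaque string values.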

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-25 Thread Antoine Pitrou
On 25/01/2022 at 10:12, Joris Van den Bossche wrote: On Sat, 22 Jan 2022 at 20:27, Rok Mihevc wrote: Thanks for the input Weston! How about arrow/experimental/format/ExtensionTypes.fbs or arrow/format/ExtensionTypes.fbs for language independent schema and loosely arrow//extensions for

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-25 Thread Joris Van den Bossche
On Sat, 22 Jan 2022 at 20:27, Rok Mihevc wrote: > > Thanks for the input Weston! > > How about arrow/experimental/format/ExtensionTypes.fbs or > arrow/format/ExtensionTypes.fbs for language independent schema and > loosely arrow//extensions for implementations? > > Having machine readable

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-22 Thread Micah Kornfield
Sorry, I meant to add that I think the C++ implementation should go wherever is most convenient to make it work well in the system (unless the type requires heavy third-party dependencies). On Sat, Jan 22, 2022 at 8:53 PM Micah Kornfield wrote: > Do we need a vote on this? > > I was imagining

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-22 Thread Micah Kornfield
> > Do we need a vote on this? I was imagining well known types would follow roughly the same process that new types follow (requiring two different language implementations and an integration test). I don't think we need to stick to Java as the second language though. On Sat, Jan 22, 2022 at

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-22 Thread Rok Mihevc
Thanks for the input Weston! How about arrow/experimental/format/ExtensionTypes.fbs or arrow/format/ExtensionTypes.fbs for language independent schema and loosely arrow//extensions for implementations? Having machine readable definitions could perhaps be useful for generating implementations in

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-20 Thread Weston Pace
Those all seem to be C++ locations. If we want to define cross-implementation "Well Known Extension Types" then it seems we would want to come up with some kind of language independent agreement (could just be a markdown file but maybe there is some advantage to having something programmatically

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2022-01-20 Thread Rok Mihevc
To continue the ExtensionType part of this thread - I would like to add TensorArray [1] as an ExtensionType to Arrow but we have not yet agreed on an "official" location for "Well Known Extension Types". Where could we put these? Some suggestions: * implementation folders (e.g.

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2021-05-01 Thread Andrew Lamb
I agree with others on this thread. Thanks for writing this down Micah On Fri, Apr 30, 2021 at 11:16 AM Antoine Pitrou wrote: > > I concur with both what Wes and Micah said. > > As for temporal types, they have wide-spread use and their semantics > require dedicated treatment for arithmetic and

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2021-04-30 Thread Antoine Pitrou
I concur with both what Wes and Micah said. As for temporal types, they have widespread use and their semantics require dedicated treatment for arithmetic and conversion, so it's helpful to define dedicated types for them, as opposed to mere annotations. Regards Antoine. On 30/04/2021

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2021-04-30 Thread Wes McKinney
I agree that the bar for adding new types to the Type union in Schema.fbs should be quite high going forward. Using extension types increasingly for adding specializations of built-in types will mean less burden for implementations to simply "propagate forward" this data (by preserving the extra

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2021-04-30 Thread Brian Hulette
+1 this looks good to me. My only concern is with criterion #3, "Is the underlying encoding of the type already semantically supported by a type?". I think this is a good criterion, but it's inconsistent with the current spec. By that criterion some existing types (Timestamp, Time, Duration, Date)

Re: [DISCUSS] New Types (Schema.fbs vs Extension Types)

2021-04-29 Thread Jorge Cardoso Leitão
Thanks for writing this. I agree. That is a good decision tree. +1 Best, Jorge On Thu, Apr 29, 2021 at 6:08 PM Micah Kornfield wrote: > The discussion around adding another interval type to the Schema.fbs raises > the issue of when do we decide to add a new type to the Schema.fbs vs using >

[DISCUSS] New Types (Schema.fbs vs Extension Types)

2021-04-29 Thread Micah Kornfield
The discussion around adding another interval type to the Schema.fbs raises the issue of when do we decide to add a new type to the Schema.fbs vs using other means (primarily extension types [1]). A few criteria come to mind that could help decide (feedback welcome): 1. Is the type a new