Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-10 Thread Joris Van den Bossche
On Thu, 10 Feb 2022 at 14:22, Antoine Pitrou wrote: > > > On 10/02/2022 at 14:09, Alessandro Molina wrote: > > Mentioned this already to Joris, but want to make sure we don't miss it. > > > > C-Data and thus ARROW:extension:metadata was mostly designed for shipping > > data to different

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-10 Thread Antoine Pitrou
On 10/02/2022 at 14:09, Alessandro Molina wrote: Mentioned this already to Joris, but want to make sure we don't miss it. C-Data and thus ARROW:extension:metadata was mostly designed for shipping data to different processes within the same host. ARROW:extension:metadata is unrelated to

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-10 Thread Alessandro Molina
Mentioned this already to Joris, but want to make sure we don't miss it. C-Data and thus ARROW:extension:metadata was mostly designed for shipping data to different processes within the same host. If we start using the spec for further uses, including saving it to files that could be read across

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Dewey Dunnington
I'll share a bit more about geospatial extension types that Joris mentioned. I'm new to the Arrow community and didn't know that there were any restrictions on metadata values (the C Data interface docs don't seem to indicate that there are restrictions, or if it's there I missed it!), so I used

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Weston Pace
I think I'm +0 but lean slightly towards JSON. In favor of binary I would guess that most extension types are going to have relatively simple parameterization (to the point that protobuf/flatbuffers isn't really needed). For example, the Substrait consumer PR has five extension types at the

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Joris Van den Bossche
On Tue, 8 Feb 2022 at 17:37, Jorge Cardoso Leitão wrote: > ... > > Wrt to binary, imo the challenge is: > * we state that backward incompatible changes to the c data interface > require a new spec [1] > Note that this discussion wouldn't change anything about the C Data Interface spec itself.

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Micah Kornfield
> > One possible alternative could be to use the format as specified in the C > Data Interface for key-value metadata: > > https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata > (there it is used for the actual key-value metadata of a field, while here > it is for
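
For readers unfamiliar with the key-value format referenced above, here is a minimal sketch of it in Python. It assumes the layout described in the C Data Interface docs for ArrowSchema.metadata: an int32 count of pairs, then for each pair an int32 key length, the key bytes, an int32 value length, and the value bytes, with integers in native endianness and no null terminators. The function names and the sample "shape" parameter are hypothetical, chosen only for illustration, and are not part of any Arrow library.

```python
import struct


def encode_kv_metadata(pairs):
    """Serialize a dict of bytes -> bytes into the key-value layout sketched above."""
    out = [struct.pack("=i", len(pairs))]           # int32: number of key/value pairs
    for key, value in pairs.items():
        out.append(struct.pack("=i", len(key)))     # int32: byte length of the key
        out.append(key)                             # key bytes, not null-terminated
        out.append(struct.pack("=i", len(value)))   # int32: byte length of the value
        out.append(value)                           # value bytes, not null-terminated
    return b"".join(out)


def decode_kv_metadata(buf):
    """Parse a buffer produced by encode_kv_metadata back into a dict."""
    (count,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    pairs = {}
    for _ in range(count):
        (key_len,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        key = buf[offset:offset + key_len]
        offset += key_len
        (value_len,) = struct.unpack_from("=i", buf, offset)
        offset += 4
        pairs[key] = buf[offset:offset + value_len]
        offset += value_len
    return pairs


# Hypothetical parameters for an extension type, for illustration only.
blob = encode_kv_metadata({b"shape": b"2,3"})
print(decode_kv_metadata(blob))
```

Since the integers use native endianness, a blob written this way is only portable between machines with the same byte order, which is relevant to the cross-host concern raised elsewhere in this thread.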

Re: [Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Jorge Cardoso Leitão
Hi, Great questions and write-up. Thanks! IMO, dragging in a JSON reader and writer to read official extension types' metadata seems overkill. The C Data Interface is expected to be quite low level. IMO we should aim for a (non-human-readable) binary format. For non-official, IMO you are spot on -

[Discuss] Best practice for storing key-value metadata for Extension Types

2022-02-08 Thread Joris Van den Bossche
Hi all, There is currently some discussion regarding how we can formalize/document "well known" extension types (see the "[DISCUSS] New Types (Schema.fbs vs Extension Types)" thread). There is ongoing work on an extension type to store arrays / tensors by Rok (