Re: Parquet File Meta Data & Compatibility

2020-12-10 Thread Gabor Szadovszky
Created PARQUET-1950 to track this. Feel free to comment in the jira or wait for the draft PR. On Tue, Dec 8, 2020 at 8:06 PM Tim Armstrong wrote: > I agree with Gabor's idea - I love getting into nitty gritty but this > probably isn't the rig

Re: Parquet File Meta Data & Compatibility

2020-12-08 Thread Tim Armstrong
I agree with Gabor's idea - I love getting into nitty gritty but this probably isn't the right place. On Tue, Dec 8, 2020 at 4:26 AM Gabor Szadovszky wrote: > Without discussing further what should be included in the list of core > features I would propose a framing of this idea. > > Instead of

Re: Parquet File Meta Data & Compatibility

2020-12-08 Thread Gabor Szadovszky
Without discussing further what should be included in the list of core features, I would propose a framing of this idea. Instead of annotating the different objects in the Thrift file (or in other related documentation) as "experimental" or similar, I think it is clearer to have a separate

Re: Parquet File Meta Data & Compatibility

2020-12-07 Thread Tim Armstrong
> Introducing new logical types as "experimental" is a bit tricky. Maybe experimental is a bad term. I think mostly new features in the format do need to be backwards compatible and not buggy because data lasts a long time once it's written. Maybe "incubating" or "preview" is a better term. I guess

Re: Parquet File Meta Data & Compatibility

2020-12-07 Thread Gabor Szadovszky
I agree on separating out the features that are not widely used, to make life easier for implementers and to improve compatibility between these implementations. Meanwhile, it is not always clear how to define the core features. For example, the encryption feature will be released soon in parquet-mr and I

Re: Parquet File Meta Data & Compatibility

2020-12-06 Thread Micah Kornfield
I tried to get some level of clarification in a PR [1]. It kind of stalled because we had further conversation on a sync call a while ago that I have not had a chance to follow up on. I'm happy to revise if we can come up with some sort of consensus for experimental/non-experimental. The things

Re: Parquet File Meta Data & Compatibility

2020-12-04 Thread Antoine Pitrou
On Fri, 4 Dec 2020 11:21:58 -0800 Tim Armstrong wrote: > I probably didn't say it very clearly, but my opinion as a consumer of the > Parquet spec is that the format needs a reset where encodings, logical > types and other metadata that are not widely adopted are removed from the > core spec and p

Re: Parquet File Meta Data & Compatibility

2020-12-04 Thread Tim Armstrong
I probably didn't say it very clearly, but my opinion as a consumer of the Parquet spec is that the format needs a reset where encodings, logical types and other metadata that are not widely adopted are removed from the core spec and put in an "experimental" category from which we can later promote

Re: Parquet File Meta Data & Compatibility

2020-12-04 Thread Tim Armstrong
I think it would be good for the project to define a core set of features that a Parquet implementation must support to be able to correctly read all files written by another compliant writer with the same version. There are then additional extensions like page indices that are not required to act
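
The core-versus-extension idea in the message above can be sketched as a simple set check. This is only an illustration under stated assumptions: the feature names, the FileFeatures type, and the helper functions are hypothetical and are not taken from the Parquet spec or from any Parquet library.

```python
from dataclasses import dataclass

# Hypothetical core feature set, for illustration only (not from the Parquet spec).
CORE_FEATURES = frozenset({"plain_encoding", "dictionary_encoding", "snappy_compression"})


@dataclass(frozen=True)
class FileFeatures:
    """Features used by one file (hypothetical model)."""
    required: frozenset               # a reader must understand these to decode the data
    optional: frozenset = frozenset() # extensions a reader may safely ignore


def is_compliant_reader(supported):
    """A compliant reader supports at least every core feature."""
    return CORE_FEATURES <= frozenset(supported)


def is_compliant_file(file):
    """A compliant writer only relies on core features for required data."""
    return file.required <= CORE_FEATURES


def can_read(supported, file):
    """A reader can decode a file if it supports every required feature."""
    return file.required <= frozenset(supported)


# A file that needs only core features but also carries an optional extension.
core_file = FileFeatures(
    required=frozenset({"plain_encoding", "snappy_compression"}),
    optional=frozenset({"page_index"}),
)
# A file that depends on a non-core feature to be decoded at all.
exotic_file = FileFeatures(required=frozenset({"plain_encoding", "experimental_encoding"}))

minimal_reader = set(CORE_FEATURES)
```

Under this model, any compliant reader can read any compliant file, whether or not it understands the optional extensions the file carries; a file that requires a non-core feature gives no such guarantee.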

Re: Parquet File Meta Data & Compatibility

2020-10-16 Thread Micah Kornfield
> > IMHO, shouldn't the spec mention - quite precisely - what versions exist > and what features can be used in which version, so an implementation can > say "yes, I can fully write this version" or "no, I can't" instead of > having a fuzzy set of features where some are described to "not work on

Parquet File Meta Data & Compatibility

2020-10-16 Thread Jan Finis
Hey folks, First of all, thanks for this great project! I am currently writing a library for reading/writing Parquet, and I am a bit confused by some points, which I would like to discuss here. I think they will be relevant to anyone wanting to write their own Parquet reading/writing logic in a new