Re: Is Parquet Meant As a Standalone Database or is a Catalog/Metastore Required?

2024-05-18 Thread Curt Hagenlocher
While CSV is still the undisputed monarch of exchanging data via files, Parquet is arguably "top 3" -- and this is a scenario in which the file does really need to be self-contained. On Sat, May 18, 2024 at 9:01 AM Raphael Taylor-Davies wrote: > Hi Fokko, > > I am aware of catalogs such as

Re: Interest in Parquet V3

2024-05-13 Thread Curt Hagenlocher
Mihevc wrote: > I would be quite interested in working on data skipping and metadata > bottlenecks (points 1. and 2.). > > On Mon, May 13, 2024 at 5:28 PM Curt Hagenlocher > wrote: > > > One of the things they've done in the Delta table format which I think is > > sm

Re: Interest in Parquet V3

2024-05-13 Thread Curt Hagenlocher
One of the things they've done in the Delta table format which I think is smart is to stop using version numbers and instead start identifying specific features used by the table in a generic fashion. So instead of checking an opaque version number, a reader looks at the list of features and can
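A minimal sketch of the feature-list idea described in the message above, assuming a reader that simply compares the features a file or table declares against the set it implements. The feature names and the helpers `unsupported_features` and `can_read` are hypothetical, chosen for illustration only; they are not taken from the Delta or Parquet specifications.

```python
# Hypothetical feature names, for illustration only (not from any spec).
SUPPORTED_READER_FEATURES = frozenset({"feature_a", "feature_b"})


def unsupported_features(required, supported=SUPPORTED_READER_FEATURES):
    """Return, sorted, the required features that this reader lacks."""
    return sorted(set(required) - set(supported))


def can_read(required, supported=SUPPORTED_READER_FEATURES):
    """A reader can proceed only if it implements every required feature."""
    return not unsupported_features(required, supported)


if __name__ == "__main__":
    declared = ["feature_a", "feature_c"]  # features a table might declare
    missing = unsupported_features(declared)
    if missing:
        print("cannot read; unsupported features:", ", ".join(missing))
    else:
        print("all required features are supported")
```

Compared with a single opaque version number, a check like this lets a reader report exactly which capability is missing instead of rejecting everything above a version threshold.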

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-30 Thread Curt Hagenlocher
u can check out code in parquet rust module! > > I am not sure about parquet-cpp, we can use that implementation as > reference there. > > > On Mon, 29 Apr 2019 at 5:39 PM, Curt Hagenlocher > wrote: > > > Would that be covered by PARQUET-458 ( > > https://issues.

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Curt Hagenlocher
Would that be covered by PARQUET-458 (https://issues.apache.org/jira/browse/PARQUET-458)? On Mon, Apr 29, 2019 at 8:18 AM Wes McKinney wrote: > Is there a JIRA issue about data page v2 issues in parquet-cpp? > > On Mon, Apr 29, 2019 at 9:57 AM Curt Hagenlocher > wrote: > >

Re: Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Curt Hagenlocher
ou have > encountered could be related to the fact that parquet-cpp does not support > decoding of data page v2. > > > Cheers, > > Ivan > > On Mon, 29 Apr 2019 at 3:36 PM, Curt Hagenlocher > wrote: > >> To the best of my ability to tell, there is i

Error in parquet-testing/data/datapage_v2.snappy.parquet?

2019-04-29 Thread Curt Hagenlocher
To the best of my ability to tell, there is invalid Snappy data in the file parquet-testing/data/datapage_v2.snappy.parquet. I can neither read it with my own code nor with pyarrow 0.13.0. Is this expected to work? Thanks! -Curt