While CSV is still the undisputed monarch of exchanging data via files,
Parquet is arguably "top 3" -- and this is a scenario in which the file
really does need to be self-contained.
On Sat, May 18, 2024 at 9:01 AM Raphael Taylor-Davies
wrote:
> Hi Fokko,
>
> I am aware of catalogs such as
Mihevc wrote:
> I would be quite interested in working on data skipping and metadata
> bottlenecks (points 1. and 2.).
>
> On Mon, May 13, 2024 at 5:28 PM Curt Hagenlocher
> wrote:
>
One of the things they've done in the Delta table format which I think is
smart is to stop using version numbers and instead start identifying
specific features used by the table in a generic fashion. So instead of
checking an opaque version number, a reader looks at the list of features
and can
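
To make the idea above concrete, here is a minimal sketch in Python of a
feature-based compatibility check. The feature names and both helper
functions are hypothetical, made up for illustration; they are not taken
from the Delta protocol or from any Parquet implementation.

```python
# Hypothetical feature names this reader implements (illustration only).
SUPPORTED_FEATURES = frozenset({"bloom_filters", "page_index"})


def unsupported_features(table_features, supported=SUPPORTED_FEATURES):
    """Return, sorted, the features the table uses that the reader lacks."""
    return sorted(set(table_features) - set(supported))


def can_read(table_features, supported=SUPPORTED_FEATURES):
    """A reader can open the table only if it knows every listed feature."""
    return not unsupported_features(table_features, supported)


if __name__ == "__main__":
    # A table that only uses features the reader knows about.
    print(can_read(["bloom_filters"]))
    # A table that also uses a feature the reader has never heard of.
    print(unsupported_features(["bloom_filters", "new_encoding"]))
```

The point of the set difference is that a reader can name exactly which
feature it is missing, rather than rejecting a file because of an opaque
version number.
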
you can check out the code in the parquet Rust module!
>
> I am not sure about parquet-cpp, we can use that implementation as
> reference there.
>
>
> On Mon, 29 Apr 2019 at 5:39 PM, Curt Hagenlocher
> wrote:
>
Would that be covered by PARQUET-458 (
https://issues.apache.org/jira/browse/PARQUET-458)?
On Mon, Apr 29, 2019 at 8:18 AM Wes McKinney wrote:
> Is there a JIRA issue about data page v2 issues in parquet-cpp?
>
> On Mon, Apr 29, 2019 at 9:57 AM Curt Hagenlocher
> wrote:
> >
> you have
> encountered could be related to the fact that parquet-cpp does not support
> decoding of data page v2.
>
>
> Cheers,
>
> Ivan
>
> On Mon, 29 Apr 2019 at 3:36 PM, Curt Hagenlocher
> wrote:
>
To the best of my ability to tell, there is invalid Snappy data in the file
parquet-testing/data/datapage_v2.snappy.parquet. I can neither read it with
my own code nor with pyarrow 0.13.0. Is this expected to work?
Thanks!
-Curt