Re: Developing a "dataset" API / framework for Arrow C++ users

2019-02-25 Thread Joel Pfaff
Hello, Thanks for the write-up. Have you considered sharing this document with the Apache Iceberg community? My feeling is that there are some shared goals here between the two projects. And while their implementation is in Java, their spec is language agnostic. Regards, Joel On Sun, Feb 24,

Re: How to append to parquet file periodically and read intermediate data - pyarrow.lib.ArrowIOError: Invalid parquet file. Corrupt footer.

2018-12-19 Thread Joel Pfaff
> worth it compared with using a slower file format (like Avro) > > > > - Wes > > > >> On Wed, Dec 19, 2018 at 7:37 AM Joel Pfaff > wrote: > >> > >> Hello, > >> > >> For my company's usecases, we have found that the number

Re: How to append to parquet file periodically and read intermediate data - pyarrow.lib.ArrowIOError: Invalid parquet file. Corrupt footer.

2018-12-19 Thread Joel Pfaff
Hello, For my company's use cases, we have found that the number of files was a critical factor in the time spent building the execution plan, so we found the idea of very regularly writing small Parquet files to be rather inefficient. There are some formats that support `append` semantics (I have