[DISCUSS] Release Python Datafusion 0.3.0

2021-07-20 Thread Jorge Cardoso Leitão
Hi, I would like to gauge your interest in a release of the Python bindings for DataFusion. There have been a tremendous number of updates to them, including support for Python 3.9. This release is backward compatible and there are no blockers. This would be the first time a release of this is

Re: C++ parquet::TypedColumnReader::ReadBatchSpaced() replacement?

2021-07-20 Thread Micah Kornfield
Hi Adam, > "ReadBatchSpaced() in a loop is faster than reading an entire record batch." Could you elaborate on this? What code path were you using for reading record batches that was slower? Did you try adjusting the batch size with ArrowReaderProperties [1] to be ~1000 rows also (by default

C++ parquet::TypedColumnReader::ReadBatchSpaced() replacement?

2021-07-20 Thread Adam Hooper
Hi list, Updating some code to Arrow 4.0, I noticed https://issues.apache.org/jira/browse/PARQUET-1899 deprecated parquet::TypedColumnReader::ReadBatchSpaced(). I use this function in a parquet-to-csv converter. It reads batches of 1,000 values at a time, allowing nulls. ReadBatchSpaced() in a
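To make the "allowing nulls" part concrete: the preview above is cut off and does not name a replacement, so the following is only a sketch of one possible approach, not the thread's answer. It assumes a flat (non-nested) nullable column whose definition levels and densely packed non-null values have already been read (for example with the non-deprecated ReadBatch()), and it spaces the values out by hand. The helper name SpaceValues and its signature are hypothetical and not part of the Parquet C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (NOT part of the Parquet C++ API).
// Expands the densely packed non-null values of a flat, nullable column
// into one slot per row. Assumption: a row holds a value if and only if
// its definition level equals max_def_level.
template <typename T>
std::vector<std::optional<T>> SpaceValues(const std::vector<int16_t>& def_levels,
                                          int16_t max_def_level,
                                          const std::vector<T>& dense_values) {
  std::vector<std::optional<T>> spaced;
  spaced.reserve(def_levels.size());
  std::size_t next = 0;  // index of the next unread dense value
  for (int16_t level : def_levels) {
    if (level == max_def_level) {
      // A defined row consumes exactly one dense value.
      assert(next < dense_values.size());
      spaced.emplace_back(dense_values[next++]);
    } else {
      // A lower definition level marks a null row.
      spaced.emplace_back(std::nullopt);
    }
  }
  // Every dense value should have been placed in some row.
  assert(next == dense_values.size());
  return spaced;
}
```

A CSV writer could then walk the returned vector and emit an empty field for each slot without a value. This requires C++17 for std::optional, and nested or repeated columns would need the repetition levels as well, which this sketch ignores.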

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

2021-07-20 Thread Andrew Lamb
What I meant is that when you decide arrow2 is suitable for release to existing arrow users, I stand ready to help you incorporate it into arrow. All the feedback I have heard so far from the rest of the community is that we are ready. One might even say we are anxious to do so :) Andrew

Re: Apache Arrow Cookbook

2021-07-20 Thread Alessandro Molina
The Pull Request for the Cookbook has been created ( https://github.com/apache/arrow-cookbook/pull/1 ). I left comments in the PR listing the steps needed to enable compilation of the cookbook once the PR is merged (enabling actions, gh pages, etc.). Anyone willing to merge it should

Re: [Discuss] [Rust] Arrow2/parquet2 going foward

2021-07-20 Thread Jorge Cardoso Leitão
Hi, I meant to stop releasing "arrow" on crates.io and start releasing it as "arrow2" under a different versioning scheme, like "psycopg" -> "psycopg2" on PyPI and others that underwent large architectural changes requiring a different versioning that better represents the state of the