lidavidm opened a new pull request #9480: URL: https://github.com/apache/arrow/pull/9480
This changes two things:

- ScanTask.execute() is now eagerly evaluated, so any work involved in creating a record batch reader is done up front. For example, a Parquet file will actually begin reading (but not decoding) data. This mirrors the behavior in C++. It also makes it easier to separate stages of work when pipelining reads (by manually dispatching scan tasks).
- This has the side effect of working around a SIGSEGV in the Cython generator. The crash occurs because Cython raises StopIteration without holding the GIL when a generator uses "with [no]gil". In my tests this is fixed on Cython master but not in any stable or development release.
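To illustrate the lazy versus eager distinction described above, here is a minimal pure-Python sketch. It does not use the pyarrow API: `open_reader`, `execute_lazy`, and `execute_eager` are hypothetical names invented for this example, and the list of integers stands in for record batches. It only shows that a generator function defers its set-up work until the first `next()`, while a plain function that returns an iterator does that work at call time.

```python
from typing import Callable, Iterator, List

events: List[str] = []  # records when the (hypothetical) set-up step runs


def open_reader() -> Iterator[int]:
    """Hypothetical stand-in for opening a file and starting I/O."""
    events.append("opened")
    return iter([1, 2, 3])  # stand-in for a stream of record batches


def execute_lazy(opener: Callable[[], Iterator[int]]) -> Iterator[int]:
    # Generator function: none of this body runs until the first next().
    reader = opener()
    yield from reader


def execute_eager(opener: Callable[[], Iterator[int]]) -> Iterator[int]:
    # Plain function: the reader is created at call time;
    # only the iteration over its results is deferred.
    reader = opener()
    return iter(reader)


lazy_task = execute_lazy(open_reader)
opened_after_lazy_call = list(events)   # set-up has not run yet

eager_task = execute_eager(open_reader)
opened_after_eager_call = list(events)  # set-up ran during the call

lazy_batches = list(lazy_task)    # set-up for the lazy task runs here
eager_batches = list(eager_task)
```

With the eager form, a caller can create several tasks first (starting their set-up work) and consume the results later, which is the kind of stage separation the description mentions for pipelined reads.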