soumya-ghosh closed pull request #1057: Optimize reads of record batches by
pushing limit to file level
URL: https://github.com/apache/iceberg-python/pull/1057
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
soumya-ghosh commented on PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2323414028
Yep, closed.
--
kevinjqliu commented on PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2323398340
@soumya-ghosh since #1043 is merged, can we close this PR?
--
kevinjqliu commented on PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2290148441
Thanks @soumya-ghosh. I think #1043 is addressing this same issue.
Can we use this PR to standardize a test suite for the read path to ensure
the optimization is applied?
soumya-ghosh commented on PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2290024214
@sungwy @kevinjqliu I misunderstood how `_task_to_record_batches` works
and ended up making unnecessary changes; thanks for the review comments.
After running some
soumya-ghosh commented on code in PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#discussion_r1717613511
##
pyiceberg/io/pyarrow.py:
##
@@ -1366,6 +1373,7 @@ def project_table(
case_sensitive,
table_metadata.name_mapping(),
kevinjqliu commented on code in PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#discussion_r1716133239
##
pyiceberg/io/pyarrow.py:
##
@@ -1194,6 +1194,7 @@ def _task_to_record_batches(
case_sensitive: bool,
name_mapping: Optional[NameMapping] = None,
sungwy commented on PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2287395213
Hi @soumya-ghosh - thank you for picking this issue up! I'm working on
refactoring this part of the codebase, and I have a different, but similar,
approach for pushing the limit down.
soumya-ghosh commented on PR #1057:
URL: https://github.com/apache/iceberg-python/pull/1057#issuecomment-2287206429
@kevinjqliu any thoughts on this implementation? Is this what you had in
mind?
I have tested with a file of approximately 50 MB and verified that fewer
batches are scanned with this approach.