Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]
alamb closed pull request #7541: WIP: Avoid allocations when applying filters to RowSelection URL: https://github.com/apache/arrow-rs/pull/7541 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]
alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905238332

Hmm, the benchmark fails like this:
```
thread 'main' panicked at parquet/benches/arrow_reader_clickbench.rs:826:9:
assertion `left == right` failed: Expected 3312 rows, but got 98 in Q1
  left: 98
 right: 3312
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: bench failed, to rerun pass `-p parquet --bench arrow_reader_clickbench`
```
So there is some bug I need to address.
Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]
alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905224814

🤖: Benchmark completed
```
group           alamb_less_allocations             main
-----           ----------------------             ----
and_then        1.00    963.4±3.25µs  ? ?/sec      1.11  1069.7±4.44µs  ? ?/sec
from_filters    1.48    910.3±30.01µs ? ?/sec      1.00   613.5±9.90µs  ? ?/sec
intersection    1.05      2.0±0.00ms  ? ?/sec      1.00  1912.0±4.44µs  ? ?/sec
union           1.08      2.2±0.01ms  ? ?/sec      1.00     2.0±0.00ms  ? ?/sec
```
Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]
alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905213806

🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_allocations (0d1074f9e76a9f8b68e3eebbdc393d960556b108) to e9df239980baa6d0f7eb4384eb01078bdd9b1701 [diff](https://github.com/apache/arrow-rs/compare/e9df239980baa6d0f7eb4384eb01078bdd9b1701..0d1074f9e76a9f8b68e3eebbdc393d960556b108)
BENCH_NAME=arrow_reader_clickbench
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_allocations
Results will be posted here when complete
Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]
alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905216060

🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running
Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing alamb/less_allocations (0d1074f9e76a9f8b68e3eebbdc393d960556b108) to e9df239980baa6d0f7eb4384eb01078bdd9b1701 [diff](https://github.com/apache/arrow-rs/compare/e9df239980baa6d0f7eb4384eb01078bdd9b1701..0d1074f9e76a9f8b68e3eebbdc393d960556b108)
BENCH_NAME=row_selector
BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_selector
BENCH_FILTER=
BENCH_BRANCH_NAME=alamb_less_allocations
Results will be posted here when complete
Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]
alamb commented on code in PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#discussion_r2105082947

## parquet/src/arrow/arrow_reader/read_plan.rs:
##
@@ -112,10 +112,9 @@ impl ReadPlanBuilder {
             };
         }

-        let raw = RowSelection::from_filters(&filters);
         self.selection = match self.selection.take() {
-            Some(selection) => Some(selection.and_then(&raw)),
-            None => Some(raw),
+            Some(selection) => Some(selection.apply_filters(&filters)),

Review Comment:
The whole point of this PR is to call `apply_filters` here and avoid creating `raw` when there is an existing filter. I will run some benchmarks and see.
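
For readers following the thread, here is a minimal std-only sketch of the idea in the review comment: folding the filter results straight into an existing selection instead of first building the intermediate `raw` selection and then combining it with `and_then`. It is not the PR's implementation and does not use the parquet crate. `Selector`, `push_run`, and the free functions `from_filters`, `and_then`, and `apply_filters` are hypothetical simplified stand-ins (plain `Vec<bool>` masks instead of Arrow `BooleanArray`s) that only mirror the names in the diff.

```rust
/// Hypothetical stand-in for a row selector: a run of rows that is either
/// skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Append a run, merging it into the previous run when both are the same kind.
fn push_run(out: &mut Vec<Selector>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    match out.last_mut() {
        Some(last) if last.skip == skip => last.row_count += row_count,
        _ => out.push(Selector { row_count, skip }),
    }
}

/// Two-step baseline, step 1: run-length encode the boolean masks (`raw`).
fn from_filters(filters: &[Vec<bool>]) -> Vec<Selector> {
    let mut out = Vec::new();
    for keep in filters.iter().flatten() {
        push_run(&mut out, 1, !*keep);
    }
    out
}

/// Two-step baseline, step 2: `other` describes only the rows that
/// `selection` selects; rows skipped by `selection` stay skipped.
fn and_then(selection: &[Selector], other: &[Selector]) -> Vec<Selector> {
    let mut out = Vec::new();
    let mut other_iter = other.iter().copied();
    let mut current = other_iter.next();
    for s in selection {
        if s.skip {
            push_run(&mut out, s.row_count, true);
            continue;
        }
        let mut remaining = s.row_count;
        while remaining > 0 {
            let o = current.as_mut().expect("`other` covers fewer rows than selected");
            let n = remaining.min(o.row_count);
            push_run(&mut out, n, o.skip);
            remaining -= n;
            o.row_count -= n;
            if o.row_count == 0 {
                current = other_iter.next();
            }
        }
    }
    out
}

/// Fused variant: walk the masks directly, never materializing `raw`.
fn apply_filters(selection: &[Selector], filters: &[Vec<bool>]) -> Vec<Selector> {
    let mut out = Vec::new();
    let mut mask = filters.iter().flatten().copied();
    for s in selection {
        if s.skip {
            push_run(&mut out, s.row_count, true);
        } else {
            for _ in 0..s.row_count {
                let keep = mask.next().expect("filters cover fewer rows than selected");
                push_run(&mut out, 1, !keep);
            }
        }
    }
    out
}

fn main() {
    // Existing selection over 8 rows: select 3, skip 2, select 3.
    let selection = vec![
        Selector { row_count: 3, skip: false },
        Selector { row_count: 2, skip: true },
        Selector { row_count: 3, skip: false },
    ];
    // Predicate results for the 6 selected rows, split over two batches.
    let filters = vec![vec![true, false, false, true], vec![true, false]];

    let two_step = and_then(&selection, &from_filters(&filters));
    let fused = apply_filters(&selection, &filters);
    // Both paths must agree on the resulting runs and on the row count.
    assert_eq!(two_step, fused);
    let selected: usize = fused.iter().filter(|s| !s.skip).map(|s| s.row_count).sum();
    println!("{fused:?} selects {selected} rows");
}
```

Comparing the fused path against the two-step path on small inputs, as `main` does here, is one way to catch a row-count mismatch like the one the clickbench assertion reported.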