Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]

2025-06-25 Thread via GitHub


alamb closed pull request #7541: WIP: Avoid allocations when applying filters to RowSelection
URL: https://github.com/apache/arrow-rs/pull/7541


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]

2025-05-23 Thread via GitHub


alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905238332

   Hmm, the benchmark fails like this:
   
   ```
   thread 'main' panicked at parquet/benches/arrow_reader_clickbench.rs:826:9:
   assertion `left == right` failed: Expected 3312 rows, but got 98 in Q1
     left: 98
    right: 3312
   note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
   
   error: bench failed, to rerun pass `-p parquet --bench arrow_reader_clickbench`
   ```
   
   So there is a bug I need to address.
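   The benchmark only notices the problem at the level of a whole query (98 rows instead of 3312). A cheaper, local check that any fused "apply filters to a selection" step should satisfy: because the filter masks cover exactly the rows the selection already selected, the selection produced afterwards must select as many rows as there are `true` values across all masks. The sketch below assumes a hypothetical `Run` type standing in for parquet's `RowSelector` and plain `Vec<bool>` masks instead of `BooleanArray`; it checks counts only, not row positions, so it can catch this kind of row-count mismatch but not every bug.

```rust
/// Hypothetical stand-in for parquet's `RowSelector`:
/// a run of `row_count` rows that are either skipped or selected.
#[derive(Debug, Clone, Copy)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Total number of rows a selection selects (sum of non-skip runs).
fn selected_rows(selection: &[Run]) -> usize {
    selection.iter().filter(|r| !r.skip).map(|r| r.row_count).sum()
}

/// After filtering, the selected-row count must equal the number of `true`
/// values across all filter masks (assumption: the masks cover exactly the
/// rows that were selected before filtering).
fn check_filter_invariant(result: &[Run], filters: &[Vec<bool>]) -> Result<(), String> {
    let expected = filters.iter().flatten().filter(|&&keep| keep).count();
    let actual = selected_rows(result);
    if actual == expected {
        Ok(())
    } else {
        Err(format!("Expected {expected} rows, but got {actual}"))
    }
}

fn main() {
    // A filtered selection over 10 rows: skip 3, select 2, skip 5.
    let result = vec![
        Run { row_count: 3, skip: true },
        Run { row_count: 2, skip: false },
        Run { row_count: 5, skip: true },
    ];
    // Two filter batches with 2 `true` values in total, so the counts agree.
    let filters = vec![vec![false, true, false], vec![true]];
    assert!(check_filter_invariant(&result, &filters).is_ok());

    // Masks with 3 `true` values do not match a selection of 2 rows.
    let bad = vec![vec![true, true, true]];
    println!("{:?}", check_filter_invariant(&result, &bad));
}
```

   A check like this could sit behind `debug_assert!` in a development build, so a fused implementation that drops or duplicates rows fails close to the faulty call rather than at the end of a query.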



Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]

2025-05-23 Thread via GitHub


alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905224814

   🤖: Benchmark completed
   
   ```
   group           alamb_less_allocations              main
   -----           ----------------------              ----
   and_then        1.00     963.4±3.25µs   ? ?/sec     1.11    1069.7±4.44µs   ? ?/sec
   from_filters    1.48     910.3±30.01µs  ? ?/sec     1.00     613.5±9.90µs   ? ?/sec
   intersection    1.05       2.0±0.00ms   ? ?/sec     1.00    1912.0±4.44µs   ? ?/sec
   union           1.08       2.2±0.01ms   ? ?/sec     1.00       2.0±0.00ms   ? ?/sec
   ```
   

Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]

2025-05-23 Thread via GitHub


alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905213806

   🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running
   Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr  2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_allocations (0d1074f9e76a9f8b68e3eebbdc393d960556b108) to e9df239980baa6d0f7eb4384eb01078bdd9b1701 [diff](https://github.com/apache/arrow-rs/compare/e9df239980baa6d0f7eb4384eb01078bdd9b1701..0d1074f9e76a9f8b68e3eebbdc393d960556b108)
   BENCH_NAME=arrow_reader_clickbench
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_allocations
   Results will be posted here when complete
   



Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]

2025-05-23 Thread via GitHub


alamb commented on PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#issuecomment-2905216060

   🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running
   Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr  2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
   Comparing alamb/less_allocations (0d1074f9e76a9f8b68e3eebbdc393d960556b108) to e9df239980baa6d0f7eb4384eb01078bdd9b1701 [diff](https://github.com/apache/arrow-rs/compare/e9df239980baa6d0f7eb4384eb01078bdd9b1701..0d1074f9e76a9f8b68e3eebbdc393d960556b108)
   BENCH_NAME=row_selector
   BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench row_selector
   BENCH_FILTER=
   BENCH_BRANCH_NAME=alamb_less_allocations
   Results will be posted here when complete
   



Re: [PR] WIP: Avoid allocations when applying filters to RowSelection [arrow-rs]

2025-05-23 Thread via GitHub


alamb commented on code in PR #7541:
URL: https://github.com/apache/arrow-rs/pull/7541#discussion_r2105082947


##
parquet/src/arrow/arrow_reader/read_plan.rs:
##
@@ -112,10 +112,9 @@ impl ReadPlanBuilder {
 };
 }
 
-let raw = RowSelection::from_filters(&filters);
 self.selection = match self.selection.take() {
-Some(selection) => Some(selection.and_then(&raw)),
-None => Some(raw),
+Some(selection) => Some(selection.apply_filters(&filters)),

Review Comment:
   The whole point of this PR is to call `apply_filters` here and avoid creating the intermediate `raw` selection when there is an existing selection.
   
   I will run some benchmarks to see whether it helps.
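   To make the change concrete, here is a minimal sketch of the two code paths, assuming a hypothetical `Run` type in place of parquet's `RowSelector` and `Vec<bool>` masks in place of `BooleanArray`. The helper names `from_filters`, `and_then` and `apply_filters` only mirror the calls in the diff above; this is not the arrow-rs implementation. The two-step path first materializes the filter results as their own selection (`raw`) and then narrows the existing selection by it, where `raw` describes only the rows the existing selection keeps. The fused path walks the filter bits while walking the existing selection, so no intermediate selection is built. Both must produce the same result.

```rust
/// Hypothetical stand-in for parquet's `RowSelector`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Run {
    row_count: usize,
    skip: bool,
}

/// Append a run, merging it into the previous run when the `skip` flags match.
fn push_run(out: &mut Vec<Run>, row_count: usize, skip: bool) {
    if row_count == 0 {
        return;
    }
    match out.last_mut() {
        Some(last) if last.skip == skip => last.row_count += row_count,
        _ => out.push(Run { row_count, skip }),
    }
}

/// Step 1 of the two-step path: turn the filter masks into a selection (`raw`).
fn from_filters(filters: &[Vec<bool>]) -> Vec<Run> {
    let mut out = Vec::new();
    for &keep in filters.iter().flatten() {
        push_run(&mut out, 1, !keep);
    }
    out
}

/// Step 2: narrow `selection` by `other`, which covers only the selected rows.
fn and_then(selection: &[Run], other: &[Run]) -> Vec<Run> {
    let mut out = Vec::new();
    let mut other_iter = other.iter().copied();
    let mut pending: Option<Run> = None; // partially consumed run of `other`
    for run in selection {
        if run.skip {
            push_run(&mut out, run.row_count, true);
            continue;
        }
        let mut remaining = run.row_count;
        while remaining > 0 {
            let mut cur = pending
                .take()
                .or_else(|| other_iter.next())
                .expect("`other` covers fewer rows than `selection` selects");
            let take = cur.row_count.min(remaining);
            push_run(&mut out, take, cur.skip);
            remaining -= take;
            cur.row_count -= take;
            if cur.row_count > 0 {
                pending = Some(cur);
            }
        }
    }
    out
}

/// Fused path: consume filter bits directly; no intermediate selection.
fn apply_filters(selection: &[Run], filters: &[Vec<bool>]) -> Vec<Run> {
    let mut out = Vec::new();
    let mut bits = filters.iter().flatten().copied();
    for run in selection {
        if run.skip {
            push_run(&mut out, run.row_count, true);
        } else {
            for _ in 0..run.row_count {
                let keep = bits.next().expect("filters cover fewer rows than selected");
                push_run(&mut out, 1, !keep);
            }
        }
    }
    out
}

fn main() {
    // 10 rows: skip 2, select 5, skip 3.
    let selection = vec![
        Run { row_count: 2, skip: true },
        Run { row_count: 5, skip: false },
        Run { row_count: 3, skip: true },
    ];
    // Filter results for the 5 selected rows, split across two batches.
    let filters = vec![vec![true, false, false], vec![true, true]];
    let two_step = and_then(&selection, &from_filters(&filters));
    let fused = apply_filters(&selection, &filters);
    assert_eq!(two_step, fused);
    println!("{fused:?}");
}
```

   The sketch consumes one bit at a time for clarity; a production version would more likely scan runs of set and unset bits at once. Comparing the fused result against the two-step result on the same inputs, as `main` does, is also a straightforward way to test such a change.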
