Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2907789055 > > But i think we can start from the basic optimization, only use batch size window to make the decision to choose bitmap or selector. And later, we can optimize further. > > This is an interesting idea and I think it is worth explroing > > > Maybe we can only have selector for ReadPlan, but for adaptive window size(currently fixed with batch size), we can change to bitmap if it's dense for the first step... > > 👍 > > Another thing that makes this tricky in my mind is that if `batch_size` is `8000` that requires the total number of `1`s in the mask needs to be `8000` -- the mask itself can be substantially larger (e.g. it could be `16000` and select every other row) 🤔 Very good point! @alamb It's hard for us to reduce it's overhead, maybe we can setting something like max_bitmap_iterator: When bitmap iterator hit > max_bitmap_iterator, we can consume it first as a output batch, and then to merge those batch finally. But i am not sure if it will make the performance worse than using selector. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2907755726 > But i think we can start from the basic optimization, only use batch size window to make the decision to choose bitmap or selector. And later, we can optimize further. This is an interesting idea and I think it is worth explroing > Maybe we can only have selector for ReadPlan, but for adaptive window size(currently fixed with batch size), we can change to bitmap if it's dense for the first step... 👍 Another thing that makes this tricky in my mind is that if `batch_size` is `8000` that requires the total number of `1`s in the mask needs to be `8000` -- the mask itself can be substantially larger (e.g. it could be `16000` and select every other row) 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2906813523 > > And the default is selector because i use it to compute avg_size_of_selector. > > Make sense -- thank you > > I found [`SlicesIterator`](https://docs.rs/arrow/latest/arrow/compute/struct.SlicesIterator.html) when looking at the Bitmap --> RowSelection code the other day. I think that could be used to determine the "average run length" so we could continue to use `skip/select` for large contiguous runs but switch to bitmap when there are smaller > > The other thing I couldn't easily work out was if there was any way to switch from `select/skip` _within_ a output batch, or if the plan needs to be either `RowSelector` or `BitMap` for each output batch > > Or maybe we could just add a third type of `ReadPlan`, namely `ReadPlan::Bitmap` 🤔 Thank you @alamb , this is very good point: 1. I was testing for output batch, we both use either `RowSelector` or `BitMap` for each output batch: Because, it may happen 8192 => bitmap, 8192 => selector, 8192 => bitmap... We can't use only one to make it optimize. 2. I think the best optimize way is : - We have the basic default window size for adaptive batch size 8192, just like above case we setting bitmap/selector for batch size. - But we also support merging window for the same type batch window: For example, we have a output batch, after selecting 5 batch size: 1) 8192 => bitmap 2) 8192 => bitmap 3) 8192 => selector 4) 8192 => selector 5) 8192 => bitmap We can merge 1, 2 because they are all bitmap. We can merge 4,5 because they are all selectors. And remaining one bitmap But i think we can start from the basic optimization, only use batch size window to make the decision to choose bitmap or selector. And later, we can optimize further. Maybe we can only have selector for ReadPlan, but for adaptive window size(currently fixed with batch size), we can change to bitmap if it's dense for the first step... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2906794445 > And the default is selector because i use it to compute avg_size_of_selector. Make sense -- thank you I found [`SlicesIterator`](https://docs.rs/arrow/latest/arrow/compute/struct.SlicesIterator.html) when looking at the Bitmap --> RowSelection code the other day. I think that could be used to determine the "average run length" so we could continue to use `skip/select` for large contiguous runs but switch to bitmap when there are smaller The other thing I couldn't easily work out was if there was any way to switch from `select/skip` *within* a output batch, or if the plan needs to be either `RowSelector` or `BitMap` for each output batch Or maybe we could just add a third type of `ReadPlan`, namely `ReadPlan::Bitmap` 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2906792622 > > > Ok, the latest benchmark result I think are now better and show no regression thanks to @zhuqi-lucas 's suggestion. I will try @Dandandan 's idea to use a Vec and see if that helps > > > > > > Thank you @alamb, it's a good news! > > This PR has shown me that for some queries the dispatch logic for RowSelection is quite high (as in just doing an extra compare in that loop made a measurable difference). > > @zhuqi-lucas in your testing, did you measure where the cutoff for using a bitmap vs a RowSelector was? I think I remember seeing a value of `10` somewhere I agree @alamb , i was testing with 10 for cutoff for using a bitmap vs a RowSelector, it's a very basic cutoff: avg_size_of_selector = total row / selectors if avg_size_of_selector > 10 using selector if avg_size_of_selector <= 10 using bitmap And the default is selector because i use it to compute avg_size_of_selector. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2906789235 > > Ok, the latest benchmark result I think are now better and show no regression thanks to @zhuqi-lucas 's suggestion. I will try @Dandandan 's idea to use a Vec and see if that helps > > Thank you @alamb, it's a good news! This PR has shown me that for some queries the dispatch logic for RowSelection is quite high (as in just doing an extra compare in that loop made a measurable difference). @zhuqi-lucas in your testing, did you measure where the cutoff for using a bitmap vs a RowSelector was? I think I remember seeing a value of `10` somewhere -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2906345215 > Ok, the latest benchmark result I think are now better and show no regression thanks to @zhuqi-lucas 's suggestion. I will try @Dandandan 's idea to use a Vec and see if that helps Thank you @alamb, it's a good news! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905679427 🤖: Benchmark completed Details ``` groupalamb_row_selection_plan main - arrow_reader_clickbench/async/Q1 1.00 2.4±0.01ms? ?/sec 1.00 2.3±0.01ms? ?/sec arrow_reader_clickbench/async/Q101.00 14.3±0.50ms? ?/sec 1.00 14.3±0.09ms? ?/sec arrow_reader_clickbench/async/Q111.00 16.1±0.09ms? ?/sec 1.00 16.2±0.08ms? ?/sec arrow_reader_clickbench/async/Q121.00 37.2±0.24ms? ?/sec 1.00 37.1±0.17ms? ?/sec arrow_reader_clickbench/async/Q131.01 50.9±0.34ms? ?/sec 1.00 50.5±0.25ms? ?/sec arrow_reader_clickbench/async/Q141.01 48.7±0.38ms? ?/sec 1.00 48.2±0.31ms? ?/sec arrow_reader_clickbench/async/Q191.00 4.9±0.03ms? ?/sec 1.02 5.0±0.03ms? ?/sec arrow_reader_clickbench/async/Q201.00158.0±0.59ms? ?/sec 1.00157.2±0.79ms? ?/sec arrow_reader_clickbench/async/Q211.00202.5±0.87ms? ?/sec 1.07217.4±1.07ms? ?/sec arrow_reader_clickbench/async/Q221.00405.0±1.25ms? ?/sec 1.18478.0±2.49ms? ?/sec arrow_reader_clickbench/async/Q231.09506.7±7.67ms? ?/sec 1.00463.4±4.56ms? ?/sec arrow_reader_clickbench/async/Q241.01 56.7±0.81ms? ?/sec 1.00 56.1±1.00ms? ?/sec arrow_reader_clickbench/async/Q271.00159.5±0.91ms? ?/sec 1.01160.3±0.69ms? ?/sec arrow_reader_clickbench/async/Q281.00157.2±0.92ms? ?/sec 1.01158.5±0.78ms? ?/sec arrow_reader_clickbench/async/Q301.03 65.1±0.78ms? ?/sec 1.00 63.2±0.35ms? ?/sec arrow_reader_clickbench/async/Q361.00165.1±1.18ms? ?/sec 1.00165.3±0.81ms? ?/sec arrow_reader_clickbench/async/Q371.01 98.3±0.41ms? ?/sec 1.00 97.8±0.42ms? ?/sec arrow_reader_clickbench/async/Q381.00 38.4±0.27ms? ?/sec 1.00 38.5±0.24ms? ?/sec arrow_reader_clickbench/async/Q391.01 48.3±0.26ms? ?/sec 1.00 47.7±0.30ms? ?/sec arrow_reader_clickbench/async/Q401.00 53.0±0.63ms? ?/sec 1.01 53.4±0.92ms? ?/sec arrow_reader_clickbench/async/Q411.01 40.1±0.33ms? ?/sec 1.00 39.8±0.47ms? ?/sec arrow_reader_clickbench/async/Q421.01 14.3±0.10ms? ?/sec 1.00 14.2±0.04ms? ?/sec arrow_reader_clickbench/sync/Q1 1.01 2.2±0.01ms? ?/sec 1.00 2.2±0.00ms? ?/sec arrow_reader_clickbench/sync/Q10 1.00 13.1±0.06ms? ?/sec 1.00 13.0±0.07ms? ?/sec arrow_reader_clickbench/sync/Q11 1.00 15.0±0.05ms? ?/sec 1.00 14.9±0.06ms? ?/sec arrow_reader_clickbench/sync/Q12 1.00 39.3±0.27ms? ?/sec 1.00 39.1±0.29ms? ?/sec arrow_reader_clickbench/sync/Q13 1.01 52.3±0.32ms? ?/sec 1.00 51.8±0.35ms? ?/sec arrow_reader_clickbench/sync/Q14 1.00 50.6±0.33ms? ?/sec 1.00 50.5±0.26ms? ?/sec arrow_reader_clickbench/sync/Q19 1.00 4.2±0.01ms? ?/sec 1.02 4.3±0.03ms? ?/sec arrow_reader_clickbench/sync/Q20 1.00174.8±0.73ms? ?/sec 1.00174.1±0.77ms? ?/sec arrow_reader_clickbench/sync/Q21 1.00230.1±2.96ms? ?/sec 1.00230.1±1.93ms? ?/sec arrow_reader_clickbench/sync/Q22 1.00467.1±1.94ms? ?/sec 1.01469.8±2.79ms? ?/sec arrow_reader_clickbench/sync/Q23 1.01 439.6±12.85ms? ?/sec 1.00 434.9±15.22ms? ?/sec arrow_reader_clickbench/sync/Q24 1.00 53.6±0.53ms? ?/sec 1.00 53.4±0.93ms? ?/sec arrow_reader_clickbench/sync/Q27 1.00150.8±0.81ms? ?/sec 1.00150.4±0.93ms? ?/sec arrow_reader_clickbench/sync/Q28 1.01150.1±0.84ms? ?/sec 1.00148.9±0.94ms? ?/sec arrow_reader_clickbench/sync/Q30 1.01 62.2±0.32ms? ?/sec 1.00 61.3±0.40ms? ?/sec arrow_reader_clickbench/sync/Q36 1.00154.1±0.78ms? ?/sec 1.00154.1±0.75ms? ?/sec arrow_reader_clickbench/sync/Q37 1.00 90.4±0.59ms? ?/sec 1.00 90.5±0.55ms? ?/sec arrow_reader_clickbench/sync/Q38 1.01 31.1±0.34ms? ?/sec 1.00 30.8±0.20ms? ?/sec arrow_reader_clickbe
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905593200 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/row_selection_plan (948374a299340464cd8d2fc07c029444d03edfd1) to e9df239980baa6d0f7eb4384eb01078bdd9b1701 [diff](https://github.com/apache/arrow-rs/compare/e9df239980baa6d0f7eb4384eb01078bdd9b1701..948374a299340464cd8d2fc07c029444d03edfd1) BENCH_NAME=arrow_reader_clickbench BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench BENCH_FILTER= BENCH_BRANCH_NAME=alamb_row_selection_plan Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905589446 🤔 seems like vec made things slower (see [results](https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905570574) here) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905570574 🤖: Benchmark completed Details ``` groupalamb_row_selection_plan main - arrow_reader_clickbench/async/Q1 1.00 2.4±0.01ms? ?/sec 1.00 2.3±0.01ms? ?/sec arrow_reader_clickbench/async/Q101.02 14.5±0.06ms? ?/sec 1.00 14.2±0.06ms? ?/sec arrow_reader_clickbench/async/Q111.02 16.3±0.07ms? ?/sec 1.00 16.0±0.06ms? ?/sec arrow_reader_clickbench/async/Q121.05 39.2±0.22ms? ?/sec 1.00 37.3±0.21ms? ?/sec arrow_reader_clickbench/async/Q131.04 52.5±0.27ms? ?/sec 1.00 50.6±0.29ms? ?/sec arrow_reader_clickbench/async/Q141.04 50.6±0.40ms? ?/sec 1.00 48.5±0.39ms? ?/sec arrow_reader_clickbench/async/Q191.00 5.0±0.19ms? ?/sec 1.00 5.0±0.04ms? ?/sec arrow_reader_clickbench/async/Q201.03162.5±0.59ms? ?/sec 1.00157.3±0.76ms? ?/sec arrow_reader_clickbench/async/Q211.04208.9±0.91ms? ?/sec 1.00201.8±0.67ms? ?/sec arrow_reader_clickbench/async/Q221.07433.8±1.22ms? ?/sec 1.00407.3±1.45ms? ?/sec arrow_reader_clickbench/async/Q231.03 496.1±14.49ms? ?/sec 1.00481.9±5.63ms? ?/sec arrow_reader_clickbench/async/Q241.03 57.6±0.38ms? ?/sec 1.00 56.2±0.53ms? ?/sec arrow_reader_clickbench/async/Q271.03165.4±1.09ms? ?/sec 1.00160.1±0.91ms? ?/sec arrow_reader_clickbench/async/Q281.03162.7±0.90ms? ?/sec 1.00158.4±1.39ms? ?/sec arrow_reader_clickbench/async/Q301.02 64.8±0.38ms? ?/sec 1.00 63.2±0.45ms? ?/sec arrow_reader_clickbench/async/Q361.03168.6±0.91ms? ?/sec 1.00164.2±0.87ms? ?/sec arrow_reader_clickbench/async/Q371.06103.2±0.53ms? ?/sec 1.00 97.7±0.42ms? ?/sec arrow_reader_clickbench/async/Q381.02 39.1±0.46ms? ?/sec 1.00 38.5±0.27ms? ?/sec arrow_reader_clickbench/async/Q391.03 48.8±0.72ms? ?/sec 1.00 47.5±0.27ms? ?/sec arrow_reader_clickbench/async/Q401.00 52.9±0.59ms? ?/sec 1.01 53.1±0.60ms? ?/sec arrow_reader_clickbench/async/Q411.00 40.1±0.32ms? ?/sec 1.00 39.9±0.46ms? ?/sec arrow_reader_clickbench/async/Q421.00 14.2±0.10ms? ?/sec 1.00 14.3±0.07ms? ?/sec arrow_reader_clickbench/sync/Q1 1.00 2.2±0.01ms? ?/sec 1.00 2.2±0.01ms? ?/sec arrow_reader_clickbench/sync/Q10 1.02 13.3±0.06ms? ?/sec 1.00 13.0±0.05ms? ?/sec arrow_reader_clickbench/sync/Q11 1.02 15.2±0.07ms? ?/sec 1.00 14.9±0.06ms? ?/sec arrow_reader_clickbench/sync/Q12 1.05 41.4±0.26ms? ?/sec 1.00 39.4±0.31ms? ?/sec arrow_reader_clickbench/sync/Q13 1.04 53.7±0.31ms? ?/sec 1.00 51.6±0.42ms? ?/sec arrow_reader_clickbench/sync/Q14 1.06 52.5±0.43ms? ?/sec 1.00 49.6±0.28ms? ?/sec arrow_reader_clickbench/sync/Q19 1.00 4.3±0.02ms? ?/sec 1.00 4.3±0.02ms? ?/sec arrow_reader_clickbench/sync/Q20 1.03178.4±0.82ms? ?/sec 1.00173.9±0.57ms? ?/sec arrow_reader_clickbench/sync/Q21 1.04238.4±1.66ms? ?/sec 1.00229.9±2.06ms? ?/sec arrow_reader_clickbench/sync/Q22 1.05491.4±3.19ms? ?/sec 1.00467.3±2.22ms? ?/sec arrow_reader_clickbench/sync/Q23 1.00431.0±6.85ms? ?/sec 1.01 435.4±16.15ms? ?/sec arrow_reader_clickbench/sync/Q24 1.03 54.8±0.52ms? ?/sec 1.00 53.2±0.63ms? ?/sec arrow_reader_clickbench/sync/Q27 1.03155.4±0.63ms? ?/sec 1.00151.0±0.70ms? ?/sec arrow_reader_clickbench/sync/Q28 1.03153.0±0.65ms? ?/sec 1.00148.4±0.80ms? ?/sec arrow_reader_clickbench/sync/Q30 1.02 62.4±0.33ms? ?/sec 1.00 61.0±0.32ms? ?/sec arrow_reader_clickbench/sync/Q36 1.02157.7±0.70ms? ?/sec 1.00153.9±0.84ms? ?/sec arrow_reader_clickbench/sync/Q37 1.06 95.5±0.59ms? ?/sec 1.00 90.1±0.66ms? ?/sec arrow_reader_clickbench/sync/Q38 1.02 31.6±0.27ms? ?/sec 1.00 31.0±0.29ms? ?/sec arrow_reader_clickbe
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905472077 Ok, the latest benchmark result I think are now better and show now regression. I will try @Dandandan 's idea to use a Vec and see if that helps -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905494564 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/row_selection_plan (76b1ef6c851ac8cc5c7fdc8865ec40ab20f4be6a) to 0d774fe4b3d08fba73bbbacfba34c35af9ca2251 [diff](https://github.com/apache/arrow-rs/compare/0d774fe4b3d08fba73bbbacfba34c35af9ca2251..76b1ef6c851ac8cc5c7fdc8865ec40ab20f4be6a) BENCH_NAME=arrow_reader_clickbench BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench BENCH_FILTER= BENCH_BRANCH_NAME=alamb_row_selection_plan Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2105234219 ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -131,13 +129,101 @@ impl ReadPlanBuilder { selection, } = self; -let selection = selection.map(|s| s.trim().into()); +// If the batch size is 0, read "all rows" +if batch_size == 0 { +return ReadPlan::All { batch_size: 0 }; +} + +// If no selection is provided, read all rows +let Some(selection) = selection else { +return ReadPlan::All { batch_size }; +}; + +let iterator = SelectionIterator::new(batch_size, selection.into()); +ReadPlan::Subset { iterator } +} +} + +/// Incrementally returns [`RowSelector`]s that describe reading from a Parquet file. +/// +/// The returned stream of [`RowSelector`]s is guaranteed to have: +/// 1. No empty selections (that select no rows) +/// 2. No selections that span batch_size boundaries +/// 3. No trailing skip selections +/// +/// For example, if the `batch_size` is 100 and we are selecting all 200 rows +/// from a Parquet file, the selectors will be: +/// - `RowSelector::select(100) <-- forced break at batch_size boundary` +/// - `RowSelector::select(100)` +#[derive(Debug, Clone)] +pub(crate) struct SelectionIterator { +/// how many rows to read in each batch +batch_size: usize, +/// how many records have been read by RowSelection in the "current" batch +read_records: usize, +/// Input selectors to read from +input_selectors: VecDeque, Review Comment: I used VecDeque as that is how the current code does it I can try and see if it makes any difference,. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
Dandandan commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2105229592 ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -131,13 +129,101 @@ impl ReadPlanBuilder { selection, } = self; -let selection = selection.map(|s| s.trim().into()); +// If the batch size is 0, read "all rows" +if batch_size == 0 { +return ReadPlan::All { batch_size: 0 }; +} + +// If no selection is provided, read all rows +let Some(selection) = selection else { +return ReadPlan::All { batch_size }; +}; + +let iterator = SelectionIterator::new(batch_size, selection.into()); +ReadPlan::Subset { iterator } +} +} + +/// Incrementally returns [`RowSelector`]s that describe reading from a Parquet file. +/// +/// The returned stream of [`RowSelector`]s is guaranteed to have: +/// 1. No empty selections (that select no rows) +/// 2. No selections that span batch_size boundaries +/// 3. No trailing skip selections +/// +/// For example, if the `batch_size` is 100 and we are selecting all 200 rows +/// from a Parquet file, the selectors will be: +/// - `RowSelector::select(100) <-- forced break at batch_size boundary` +/// - `RowSelector::select(100)` +#[derive(Debug, Clone)] +pub(crate) struct SelectionIterator { +/// how many rows to read in each batch +batch_size: usize, +/// how many records have been read by RowSelection in the "current" batch +read_records: usize, +/// Input selectors to read from +input_selectors: VecDeque, Review Comment: I think `Vec` can be used here (track an index)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905429539 🤖: Benchmark completed Details ``` groupalamb_row_selection_plan main - arrow_reader_clickbench/async/Q1 1.01 2.4±0.05ms? ?/sec 1.00 2.3±0.00ms? ?/sec arrow_reader_clickbench/async/Q101.00 14.1±0.10ms? ?/sec 1.01 14.3±0.11ms? ?/sec arrow_reader_clickbench/async/Q111.00 16.4±0.15ms? ?/sec 1.00 16.4±0.19ms? ?/sec arrow_reader_clickbench/async/Q121.00 37.4±0.20ms? ?/sec 1.01 37.8±0.29ms? ?/sec arrow_reader_clickbench/async/Q131.00 50.9±0.38ms? ?/sec 1.01 51.6±0.64ms? ?/sec arrow_reader_clickbench/async/Q141.00 48.5±0.20ms? ?/sec 1.01 48.9±0.27ms? ?/sec arrow_reader_clickbench/async/Q191.00 4.9±0.04ms? ?/sec 1.01 5.0±0.05ms? ?/sec arrow_reader_clickbench/async/Q201.00157.3±0.67ms? ?/sec 1.08170.2±0.62ms? ?/sec arrow_reader_clickbench/async/Q211.00220.4±0.81ms? ?/sec 1.00220.4±1.27ms? ?/sec arrow_reader_clickbench/async/Q221.01487.1±1.89ms? ?/sec 1.00482.5±3.29ms? ?/sec arrow_reader_clickbench/async/Q231.01 488.8±19.49ms? ?/sec 1.00484.6±9.86ms? ?/sec arrow_reader_clickbench/async/Q241.01 57.1±0.82ms? ?/sec 1.00 56.4±0.42ms? ?/sec arrow_reader_clickbench/async/Q271.00161.4±0.98ms? ?/sec 1.00160.7±1.14ms? ?/sec arrow_reader_clickbench/async/Q281.00158.9±0.78ms? ?/sec 1.01159.8±1.08ms? ?/sec arrow_reader_clickbench/async/Q301.02 64.4±0.50ms? ?/sec 1.00 62.9±0.32ms? ?/sec arrow_reader_clickbench/async/Q361.00165.7±0.96ms? ?/sec 1.00165.3±1.03ms? ?/sec arrow_reader_clickbench/async/Q371.00 98.5±0.54ms? ?/sec 1.00 98.0±0.52ms? ?/sec arrow_reader_clickbench/async/Q381.00 38.4±0.28ms? ?/sec 1.01 38.7±0.19ms? ?/sec arrow_reader_clickbench/async/Q391.00 47.5±0.30ms? ?/sec 1.01 48.1±0.26ms? ?/sec arrow_reader_clickbench/async/Q401.00 53.1±0.42ms? ?/sec 1.00 53.2±0.26ms? ?/sec arrow_reader_clickbench/async/Q411.00 39.9±0.28ms? ?/sec 1.00 39.8±0.47ms? ?/sec arrow_reader_clickbench/async/Q421.00 14.3±0.09ms? ?/sec 1.00 14.3±0.07ms? ?/sec arrow_reader_clickbench/sync/Q1 1.01 2.2±0.01ms? ?/sec 1.00 2.2±0.00ms? ?/sec arrow_reader_clickbench/sync/Q10 1.01 13.1±0.06ms? ?/sec 1.00 13.0±0.04ms? ?/sec arrow_reader_clickbench/sync/Q11 1.01 15.0±0.06ms? ?/sec 1.00 14.9±0.05ms? ?/sec arrow_reader_clickbench/sync/Q12 1.01 39.5±0.22ms? ?/sec 1.00 39.3±0.29ms? ?/sec arrow_reader_clickbench/sync/Q13 1.03 52.7±0.42ms? ?/sec 1.00 51.3±0.40ms? ?/sec arrow_reader_clickbench/sync/Q14 1.00 50.3±0.37ms? ?/sec 1.00 50.3±0.33ms? ?/sec arrow_reader_clickbench/sync/Q19 1.00 4.2±0.03ms? ?/sec 1.02 4.3±0.01ms? ?/sec arrow_reader_clickbench/sync/Q20 1.00173.9±0.68ms? ?/sec 1.00174.5±0.70ms? ?/sec arrow_reader_clickbench/sync/Q21 1.00228.7±3.39ms? ?/sec 1.00227.9±2.17ms? ?/sec arrow_reader_clickbench/sync/Q22 1.00468.2±1.80ms? ?/sec 1.00468.1±2.34ms? ?/sec arrow_reader_clickbench/sync/Q23 1.03 438.5±15.37ms? ?/sec 1.00 426.0±12.10ms? ?/sec arrow_reader_clickbench/sync/Q24 1.02 54.3±0.35ms? ?/sec 1.00 53.5±0.56ms? ?/sec arrow_reader_clickbench/sync/Q27 1.00150.2±0.74ms? ?/sec 1.00150.7±0.78ms? ?/sec arrow_reader_clickbench/sync/Q28 1.00148.4±0.85ms? ?/sec 1.00148.9±0.75ms? ?/sec arrow_reader_clickbench/sync/Q30 1.01 62.0±0.34ms? ?/sec 1.00 61.2±0.40ms? ?/sec arrow_reader_clickbench/sync/Q36 1.00154.0±0.89ms? ?/sec 1.00154.5±0.92ms? ?/sec arrow_reader_clickbench/sync/Q37 1.00 90.2±0.44ms? ?/sec 1.00 90.6±0.45ms? ?/sec arrow_reader_clickbench/sync/Q38 1.00 31.2±0.25ms? ?/sec 1.00 31.3±0.28ms? ?/sec arrow_reader_clickbe
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905385143 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/row_selection_plan (00a0f1f747ae2208db63b132195bffc558630728) to 0d774fe4b3d08fba73bbbacfba34c35af9ca2251 [diff](https://github.com/apache/arrow-rs/compare/0d774fe4b3d08fba73bbbacfba34c35af9ca2251..00a0f1f747ae2208db63b132195bffc558630728) BENCH_NAME=arrow_reader_clickbench BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench BENCH_FILTER= BENCH_BRANCH_NAME=alamb_row_selection_plan Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905235600 Hmm, the benchmark fails like this: ``` thread 'main' panicked at parquet/benches/arrow_reader_clickbench.rs:826:9: assertion `left == right` failed: Expected 3312 rows, but got 98 in Q1 left: 98 right: 3312 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace error: bench failed, to rerun pass `-p parquet --bench arrow_reader_clickbench` ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2905197667 > It looks like remove this check, can improve a little performance, but still regression some cases, can't find the root cause until now. Thansk @zhuqi-lucas -- I will remove that check -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2904683633 It looks like remove this check, can improve a little performance, but still regression some cases, can't find the root cause until now. ```rust // Reader should read exactly `batch_size` records except for last batch if !end_of_stream && (read_records != batch_size) { return Err(general_err!( "Internal Error: unexpected read count. Expected {batch_size} got {read_records}" )); } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2904410078 > Update here is that this is not looking super promising and I am somewhat stuck with how to integrate mask based selections into the logic more cleanly. I need to think about it some more. > > I may park this for a while and continue working on filter results caching some more Thank you very much @alamb , i can continue help investigate why the performance has some regression for this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2104577693 ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -808,54 +808,45 @@ impl ParquetRecordBatchReader { /// Returns `Result>` rather than `Option>` to /// simplify error handling with `?` fn next_inner(&mut self) -> Result> { +let mut end_of_stream = false; let mut read_records = 0; let batch_size = self.batch_size(); -match self.read_plan.selection_mut() { -Some(selection) => { -while read_records < batch_size && !selection.is_empty() { -let front = selection.pop_front().unwrap(); -if front.skip { -let skipped = self.array_reader.skip_records(front.row_count)?; - -if skipped != front.row_count { -return Err(general_err!( -"failed to skip rows, expected {}, got {}", -front.row_count, -skipped -)); -} -continue; -} +while read_records < batch_size { +let Some(front) = self.read_plan.next() else { +end_of_stream = true; +break; +}; -//Currently, when RowSelectors with row_count = 0 are included then its interpreted as end of reader. -//Fix is to skip such entries. See https://github.com/apache/arrow-rs/issues/2669 -if front.row_count == 0 { -continue; -} +if front.skip { +let skipped = self.array_reader.skip_records(front.row_count)?; -// try to read record -let need_read = batch_size - read_records; -let to_read = match front.row_count.checked_sub(need_read) { -Some(remaining) if remaining != 0 => { -// if page row count less than batch_size we must set batch size to page row count. -// add check avoid dead loop - selection.push_front(RowSelector::select(remaining)); -need_read -} -_ => front.row_count, -}; -match self.array_reader.read_records(to_read)? { -0 => break, -rec => read_records += rec, -}; +if skipped != front.row_count { +return Err(general_err!( +"Internal Error: failed to skip rows, expected {}, got {}", +front.row_count, +skipped +)); +} +} else { +let read = self.array_reader.read_records(front.row_count)?; Review Comment: > I am not quite sure what you mean here -- if `read < row_count` I think that means the array is exhausted and the row group is done. > > What sort of fast path would it be? This right @alamb , sorry for that i am not describing it correctly, i mean do we need to break early for it. It looks like not needed. ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -808,54 +808,45 @@ impl ParquetRecordBatchReader { /// Returns `Result>` rather than `Option>` to /// simplify error handling with `?` fn next_inner(&mut self) -> Result> { +let mut end_of_stream = false; let mut read_records = 0; let batch_size = self.batch_size(); -match self.read_plan.selection_mut() { -Some(selection) => { -while read_records < batch_size && !selection.is_empty() { -let front = selection.pop_front().unwrap(); -if front.skip { -let skipped = self.array_reader.skip_records(front.row_count)?; - -if skipped != front.row_count { -return Err(general_err!( -"failed to skip rows, expected {}, got {}", -front.row_count, -skipped -)); -} -continue; -} +while read_records < batch_size { +let Some(front) = self.read_plan.next() else { +end_of_stream = true; +break; +}; -//Currently, when RowSelectors with row_count = 0 are included then its interpreted as end of reader. -//Fix is to skip such entries. See https://github.com/apache/arrow-rs/issues/2669 -
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2904400494 🤖: Benchmark completed Details ``` groupalamb_row_selection_plan main - arrow_reader_clickbench/async/Q1 1.01 2.4±0.02ms? ?/sec 1.00 2.4±0.01ms? ?/sec arrow_reader_clickbench/async/Q101.05 15.4±0.18ms? ?/sec 1.00 14.6±0.10ms? ?/sec arrow_reader_clickbench/async/Q111.02 16.9±0.11ms? ?/sec 1.00 16.5±0.17ms? ?/sec arrow_reader_clickbench/async/Q121.04 39.8±0.35ms? ?/sec 1.00 38.1±0.27ms? ?/sec arrow_reader_clickbench/async/Q131.01 53.7±0.44ms? ?/sec 1.00 53.4±0.56ms? ?/sec arrow_reader_clickbench/async/Q141.02 51.4±0.34ms? ?/sec 1.00 50.4±0.24ms? ?/sec arrow_reader_clickbench/async/Q191.00 5.1±0.04ms? ?/sec 1.00 5.0±0.04ms? ?/sec arrow_reader_clickbench/async/Q201.03164.3±0.52ms? ?/sec 1.00159.0±1.15ms? ?/sec arrow_reader_clickbench/async/Q211.13231.1±1.03ms? ?/sec 1.00204.3±1.42ms? ?/sec arrow_reader_clickbench/async/Q221.08513.1±1.80ms? ?/sec 1.00475.6±2.61ms? ?/sec arrow_reader_clickbench/async/Q231.02496.8±8.85ms? ?/sec 1.00 486.4±11.94ms? ?/sec arrow_reader_clickbench/async/Q241.02 59.2±0.90ms? ?/sec 1.00 58.1±0.73ms? ?/sec arrow_reader_clickbench/async/Q271.03167.1±0.70ms? ?/sec 1.00161.8±0.83ms? ?/sec arrow_reader_clickbench/async/Q281.04167.4±1.78ms? ?/sec 1.00160.3±0.84ms? ?/sec arrow_reader_clickbench/async/Q301.04 66.7±0.33ms? ?/sec 1.00 64.4±0.55ms? ?/sec arrow_reader_clickbench/async/Q361.04173.2±0.94ms? ?/sec 1.00166.5±0.90ms? ?/sec arrow_reader_clickbench/async/Q371.05104.0±0.72ms? ?/sec 1.00 98.8±0.67ms? ?/sec arrow_reader_clickbench/async/Q381.02 39.8±0.30ms? ?/sec 1.00 38.9±0.29ms? ?/sec arrow_reader_clickbench/async/Q391.04 50.0±0.87ms? ?/sec 1.00 48.0±0.27ms? ?/sec arrow_reader_clickbench/async/Q401.00 53.9±0.57ms? ?/sec 1.00 54.0±0.36ms? ?/sec arrow_reader_clickbench/async/Q411.01 40.5±0.33ms? ?/sec 1.00 40.2±0.35ms? ?/sec arrow_reader_clickbench/async/Q421.00 14.4±0.09ms? ?/sec 1.01 14.5±0.12ms? ?/sec arrow_reader_clickbench/sync/Q1 1.01 2.2±0.02ms? ?/sec 1.00 2.2±0.01ms? ?/sec arrow_reader_clickbench/sync/Q10 1.04 13.7±0.07ms? ?/sec 1.00 13.2±0.06ms? ?/sec arrow_reader_clickbench/sync/Q11 1.03 15.6±0.09ms? ?/sec 1.00 15.1±0.07ms? ?/sec arrow_reader_clickbench/sync/Q12 1.00 38.5±0.21ms? ?/sec 1.05 40.2±0.44ms? ?/sec arrow_reader_clickbench/sync/Q13 1.00 51.1±0.24ms? ?/sec 1.07 54.7±0.64ms? ?/sec arrow_reader_clickbench/sync/Q14 1.00 50.0±0.34ms? ?/sec 1.02 51.1±0.36ms? ?/sec arrow_reader_clickbench/sync/Q19 1.01 4.3±0.03ms? ?/sec 1.00 4.3±0.03ms? ?/sec arrow_reader_clickbench/sync/Q20 1.03179.7±0.75ms? ?/sec 1.00174.5±1.01ms? ?/sec arrow_reader_clickbench/sync/Q21 1.03236.0±1.17ms? ?/sec 1.00230.2±2.01ms? ?/sec arrow_reader_clickbench/sync/Q22 1.04492.8±1.95ms? ?/sec 1.00472.5±2.10ms? ?/sec arrow_reader_clickbench/sync/Q23 1.04 456.8±15.44ms? ?/sec 1.00 440.6±12.59ms? ?/sec arrow_reader_clickbench/sync/Q24 1.00 56.3±0.62ms? ?/sec 1.00 56.4±0.61ms? ?/sec arrow_reader_clickbench/sync/Q27 1.05159.2±1.04ms? ?/sec 1.00152.0±0.86ms? ?/sec arrow_reader_clickbench/sync/Q28 1.04157.1±1.01ms? ?/sec 1.00151.3±0.72ms? ?/sec arrow_reader_clickbench/sync/Q30 1.04 64.5±0.37ms? ?/sec 1.00 62.2±0.25ms? ?/sec arrow_reader_clickbench/sync/Q36 1.03161.2±1.01ms? ?/sec 1.00156.0±0.87ms? ?/sec arrow_reader_clickbench/sync/Q37 1.05 97.0±0.52ms? ?/sec 1.00 92.0±0.62ms? ?/sec arrow_reader_clickbench/sync/Q38 1.02 32.6±0.25ms? ?/sec 1.00 32.1±0.21ms? ?/sec arrow_reader_clickbe
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2904368666 Update here is that this is not looking super promising and I am somewhat stuck with how to integrate mask based selections into the logic more cleanly. I need to think about it some more. I may park this for a while and continue working on filter results caching some more -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2104531117 ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -808,54 +808,45 @@ impl ParquetRecordBatchReader { /// Returns `Result>` rather than `Option>` to /// simplify error handling with `?` fn next_inner(&mut self) -> Result> { +let mut end_of_stream = false; let mut read_records = 0; let batch_size = self.batch_size(); -match self.read_plan.selection_mut() { -Some(selection) => { -while read_records < batch_size && !selection.is_empty() { -let front = selection.pop_front().unwrap(); -if front.skip { -let skipped = self.array_reader.skip_records(front.row_count)?; - -if skipped != front.row_count { -return Err(general_err!( -"failed to skip rows, expected {}, got {}", -front.row_count, -skipped -)); -} -continue; -} +while read_records < batch_size { +let Some(front) = self.read_plan.next() else { +end_of_stream = true; +break; +}; -//Currently, when RowSelectors with row_count = 0 are included then its interpreted as end of reader. -//Fix is to skip such entries. See https://github.com/apache/arrow-rs/issues/2669 -if front.row_count == 0 { -continue; -} +if front.skip { +let skipped = self.array_reader.skip_records(front.row_count)?; -// try to read record -let need_read = batch_size - read_records; -let to_read = match front.row_count.checked_sub(need_read) { -Some(remaining) if remaining != 0 => { -// if page row count less than batch_size we must set batch size to page row count. -// add check avoid dead loop - selection.push_front(RowSelector::select(remaining)); -need_read -} -_ => front.row_count, -}; -match self.array_reader.read_records(to_read)? { -0 => break, -rec => read_records += rec, -}; +if skipped != front.row_count { +return Err(general_err!( +"Internal Error: failed to skip rows, expected {}, got {}", +front.row_count, +skipped +)); +} +} else { +let read = self.array_reader.read_records(front.row_count)?; Review Comment: I am not quite sure what you mean here -- if `read < row_count` I think that means the array is exhausted and the row group is done. What sort of fast path would it be? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2904328566 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/row_selection_plan (ee714ca3c974b394ee6d7412cedb256f77d2eedc) to 0d774fe4b3d08fba73bbbacfba34c35af9ca2251 [diff](https://github.com/apache/arrow-rs/compare/0d774fe4b3d08fba73bbbacfba34c35af9ca2251..ee714ca3c974b394ee6d7412cedb256f77d2eedc) BENCH_NAME=arrow_reader_clickbench BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench BENCH_FILTER= BENCH_BRANCH_NAME=alamb_row_selection_plan Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2904327330 > 🤖: Benchmark completed 🤔 it seems it is less efficient -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2103735833 ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -808,54 +808,45 @@ impl ParquetRecordBatchReader { /// Returns `Result>` rather than `Option>` to /// simplify error handling with `?` fn next_inner(&mut self) -> Result> { +let mut end_of_stream = false; let mut read_records = 0; let batch_size = self.batch_size(); -match self.read_plan.selection_mut() { -Some(selection) => { -while read_records < batch_size && !selection.is_empty() { -let front = selection.pop_front().unwrap(); -if front.skip { -let skipped = self.array_reader.skip_records(front.row_count)?; - -if skipped != front.row_count { -return Err(general_err!( -"failed to skip rows, expected {}, got {}", -front.row_count, -skipped -)); -} -continue; -} +while read_records < batch_size { +let Some(front) = self.read_plan.next() else { +end_of_stream = true; +break; +}; -//Currently, when RowSelectors with row_count = 0 are included then its interpreted as end of reader. -//Fix is to skip such entries. See https://github.com/apache/arrow-rs/issues/2669 -if front.row_count == 0 { -continue; -} +if front.skip { +let skipped = self.array_reader.skip_records(front.row_count)?; -// try to read record -let need_read = batch_size - read_records; -let to_read = match front.row_count.checked_sub(need_read) { -Some(remaining) if remaining != 0 => { -// if page row count less than batch_size we must set batch size to page row count. -// add check avoid dead loop - selection.push_front(RowSelector::select(remaining)); -need_read -} -_ => front.row_count, -}; -match self.array_reader.read_records(to_read)? { -0 => break, -rec => read_records += rec, -}; +if skipped != front.row_count { +return Err(general_err!( +"Internal Error: failed to skip rows, expected {}, got {}", +front.row_count, +skipped +)); +} +} else { +let read = self.array_reader.read_records(front.row_count)?; Review Comment: Minor, do we have fast path when read < front.row_count? ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -131,13 +129,101 @@ impl ReadPlanBuilder { selection, } = self; -let selection = selection.map(|s| s.trim().into()); +// If the batch size is 0, read "all rows" +if batch_size == 0 { +return ReadPlan::All { batch_size: 0 }; +} + +// If no selection is provided, read all rows +let Some(selection) = selection else { +return ReadPlan::All { batch_size }; +}; + +let iterator = SelectionIterator::new(batch_size, selection.into()); +ReadPlan::Subset { iterator } +} +} + +/// Incrementally returns [`RowSelector`]s that describe reading from a Parquet file. +/// +/// The returned stream of [`RowSelector`]s is guaranteed to have: +/// 1. No empty selections (that select no rows) +/// 2. No selections that span batch_size boundaries +/// 3. No trailing skip selections +/// +/// For example, if the `batch_size` is 100 and we are selecting all 200 rows +/// from a Parquet file, the selectors will be: +/// - `RowSelector::select(100) <-- forced break at batch_size boundary` +/// - `RowSelector::select(100)` Review Comment: Great work ! ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -808,54 +809,45 @@ impl ParquetRecordBatchReader { /// Returns `Result>` rather than `Option>` to /// simplify error handling with `?` fn next_inner(&mut self) -> Result> { +let mut end_of_stream = false; let mut read_records = 0; let batch_size = self.batch_size(); -match self.read_plan.selection_mut() { -Some(selection) => { -while read_records < batch_size && !selection.is_empty() { -let front = selection.pop
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2902444038 🤖: Benchmark completed Details ``` groupalamb_row_selection_plan main - arrow_reader_clickbench/async/Q1 1.00 2.3±0.01ms? ?/sec 1.00 2.3±0.01ms? ?/sec arrow_reader_clickbench/async/Q101.01 14.9±0.10ms? ?/sec 1.00 14.8±0.13ms? ?/sec arrow_reader_clickbench/async/Q111.01 16.9±0.16ms? ?/sec 1.00 16.7±0.14ms? ?/sec arrow_reader_clickbench/async/Q121.01 39.5±0.32ms? ?/sec 1.00 39.0±0.36ms? ?/sec arrow_reader_clickbench/async/Q131.01 53.2±0.41ms? ?/sec 1.00 52.7±0.26ms? ?/sec arrow_reader_clickbench/async/Q141.01 51.0±0.34ms? ?/sec 1.00 50.3±0.26ms? ?/sec arrow_reader_clickbench/async/Q191.00 5.0±0.05ms? ?/sec 1.05 5.2±0.10ms? ?/sec arrow_reader_clickbench/async/Q201.02163.2±0.59ms? ?/sec 1.00160.2±0.72ms? ?/sec arrow_reader_clickbench/async/Q211.03210.9±0.99ms? ?/sec 1.00205.5±0.88ms? ?/sec arrow_reader_clickbench/async/Q221.04434.6±1.73ms? ?/sec 1.00419.3±2.68ms? ?/sec arrow_reader_clickbench/async/Q231.04505.1±9.31ms? ?/sec 1.00483.4±5.74ms? ?/sec arrow_reader_clickbench/async/Q241.05 58.7±0.62ms? ?/sec 1.00 55.8±0.59ms? ?/sec arrow_reader_clickbench/async/Q271.05168.4±1.17ms? ?/sec 1.00160.5±1.00ms? ?/sec arrow_reader_clickbench/async/Q281.07169.4±0.66ms? ?/sec 1.00158.1±0.79ms? ?/sec arrow_reader_clickbench/async/Q301.07 67.9±0.27ms? ?/sec 1.00 63.6±0.72ms? ?/sec arrow_reader_clickbench/async/Q361.06174.5±1.02ms? ?/sec 1.00165.2±1.09ms? ?/sec arrow_reader_clickbench/async/Q371.06105.0±0.61ms? ?/sec 1.00 98.8±0.77ms? ?/sec arrow_reader_clickbench/async/Q381.04 40.6±0.24ms? ?/sec 1.00 39.0±0.34ms? ?/sec arrow_reader_clickbench/async/Q391.05 50.6±0.25ms? ?/sec 1.00 48.2±0.40ms? ?/sec arrow_reader_clickbench/async/Q401.04 54.8±0.38ms? ?/sec 1.00 52.9±0.46ms? ?/sec arrow_reader_clickbench/async/Q411.05 41.9±0.18ms? ?/sec 1.00 39.8±0.31ms? ?/sec arrow_reader_clickbench/async/Q421.04 14.8±0.15ms? ?/sec 1.00 14.3±0.06ms? ?/sec arrow_reader_clickbench/sync/Q1 1.00 2.2±0.00ms? ?/sec 1.00 2.2±0.01ms? ?/sec arrow_reader_clickbench/sync/Q10 1.03 13.7±0.04ms? ?/sec 1.00 13.3±0.08ms? ?/sec arrow_reader_clickbench/sync/Q11 1.02 15.5±0.06ms? ?/sec 1.00 15.3±0.09ms? ?/sec arrow_reader_clickbench/sync/Q12 1.01 41.5±0.20ms? ?/sec 1.00 41.0±0.29ms? ?/sec arrow_reader_clickbench/sync/Q13 1.01 54.7±0.33ms? ?/sec 1.00 54.3±0.35ms? ?/sec arrow_reader_clickbench/sync/Q14 1.01 53.1±0.41ms? ?/sec 1.00 52.7±0.41ms? ?/sec arrow_reader_clickbench/sync/Q19 1.00 4.3±0.01ms? ?/sec 1.01 4.3±0.03ms? ?/sec arrow_reader_clickbench/sync/Q20 1.02180.4±0.71ms? ?/sec 1.00177.4±0.65ms? ?/sec arrow_reader_clickbench/sync/Q21 1.03239.5±2.89ms? ?/sec 1.00233.0±2.13ms? ?/sec arrow_reader_clickbench/sync/Q22 1.04496.7±3.98ms? ?/sec 1.00477.6±2.59ms? ?/sec arrow_reader_clickbench/sync/Q23 1.00438.9±8.52ms? ?/sec 1.01 444.3±14.93ms? ?/sec arrow_reader_clickbench/sync/Q24 1.02 56.9±1.10ms? ?/sec 1.00 55.9±0.68ms? ?/sec arrow_reader_clickbench/sync/Q27 1.02156.6±1.00ms? ?/sec 1.00153.1±0.68ms? ?/sec arrow_reader_clickbench/sync/Q28 1.03155.3±0.91ms? ?/sec 1.00151.4±0.84ms? ?/sec arrow_reader_clickbench/sync/Q30 1.00 63.8±0.35ms? ?/sec 1.00 63.6±0.53ms? ?/sec arrow_reader_clickbench/sync/Q36 1.02161.6±1.35ms? ?/sec 1.00158.5±0.96ms? ?/sec arrow_reader_clickbench/sync/Q37 1.04 96.3±0.44ms? ?/sec 1.00 92.6±0.39ms? ?/sec arrow_reader_clickbench/sync/Q38 1.00 32.2±0.42ms? ?/sec 1.00 32.1±0.11ms? ?/sec arrow_reader_clickbe
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2902388172 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubuntu SMP Wed Apr 2 16:34:16 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/row_selection_plan (ee714ca3c974b394ee6d7412cedb256f77d2eedc) to 0d774fe4b3d08fba73bbbacfba34c35af9ca2251 [diff](https://github.com/apache/arrow-rs/compare/0d774fe4b3d08fba73bbbacfba34c35af9ca2251..ee714ca3c974b394ee6d7412cedb256f77d2eedc) BENCH_NAME=arrow_reader_clickbench BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench arrow_reader_clickbench BENCH_FILTER= BENCH_BRANCH_NAME=alamb_row_selection_plan Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#issuecomment-2902385734 I am pretty happy with how this currently looks, but before I mark it for review I want to make a proof of concept that I can actually improve performance with it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2102821625 ## parquet/src/arrow/arrow_reader/mod.rs: ## @@ -808,54 +809,45 @@ impl ParquetRecordBatchReader { /// Returns `Result>` rather than `Option>` to /// simplify error handling with `?` fn next_inner(&mut self) -> Result> { +let mut end_of_stream = false; let mut read_records = 0; let batch_size = self.batch_size(); -match self.read_plan.selection_mut() { -Some(selection) => { -while read_records < batch_size && !selection.is_empty() { -let front = selection.pop_front().unwrap(); -if front.skip { -let skipped = self.array_reader.skip_records(front.row_count)?; - -if skipped != front.row_count { -return Err(general_err!( -"failed to skip rows, expected {}, got {}", -front.row_count, -skipped -)); -} -continue; -} +while read_records < batch_size { +let Some(front) = self.read_plan.next_selector() else { +end_of_stream = true; +break; +}; -//Currently, when RowSelectors with row_count = 0 are included then its interpreted as end of reader. -//Fix is to skip such entries. See https://github.com/apache/arrow-rs/issues/2669 -if front.row_count == 0 { -continue; -} +if front.skip { Review Comment: Other than error checking, the inner loop now simply reads a `RowSelection` and does what it says ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -225,25 +287,436 @@ impl LimitedReadPlanBuilder { } } -/// A plan reading specific rows from a Parquet Row Group. +/// A plan for reading specific rows from a Parquet Row Group. /// /// See [`ReadPlanBuilder`] to create `ReadPlan`s -pub(crate) struct ReadPlan { -/// The number of rows to read in each batch -batch_size: usize, -/// Row ranges to be selected from the data source -selection: Option>, +#[derive(Debug, Clone, PartialEq)] +pub(crate) enum ReadPlan { +/// Read all rows in `batch_sized` chunks +All { +/// The number of rows to read in each batch +batch_size: usize, +}, +/// Read only a specific subset of rows +Subset { +/// The number of rows to read in each batch +batch_size: usize, +/// Pattern in which to decode the rows from the Parquet file +/// +/// This is a queue of [`RowSelector`]s that are guaranteed to have: +/// 1. No empty selections (that select no rows) +/// 2. Not span batch_size boundaries +/// 3. No trailing skip selections +/// +/// For example, if the `batch_size` is 100 and we are selecting all 200 rows +/// from a Parquet file, the selectors will be: +/// /// - `RowSelector::select(100) <-- forced break at batch_size boundary` +/// /// - `RowSelector::select(100)` +// In the future, we hope to replace `RowSelector` with a different enum +// that 1) represents batch emission and 2) has a more efficient mask for +// the selection +selectors: VecDeque, +}, } impl ReadPlan { -/// Returns a mutable reference to the selection, if any -pub(crate) fn selection_mut(&mut self) -> Option<&mut VecDeque> { -self.selection.as_mut() +/// Returns the next read instruction +pub(crate) fn next_selector(&mut self) -> Option { +match self { +Self::All { batch_size } => { +// If we are reading all rows, return a selector that selects +// the next batch_size rows +Some(RowSelector::select(*batch_size)) +} +Self::Subset { +batch_size: _, +selectors, +} => { +// If we are reading a specific set of rows, return the next +// selector in the queue +selectors.pop_front() +} +} } - /// Return the number of rows to read in each output batch #[inline(always)] pub fn batch_size(&self) -> usize { -self.batch_size +match self { +Self::All { batch_size } => *batch_size, +Self::Subset { batch_size, .. } => *batch_size, +} +} +} + +#[cfg(test)] +mod tests { Review Comment: Technically the plan generation code is already fully covered by other tests in this crate, but I added new unit tests here to: 1. Document the behavior better 2. Make it easier to write
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2102323500 ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -247,3 +328,23 @@ impl ReadPlan { self.batch_size } } + +/// How to select the next batch of rows to read from the Parquet file +/// +/// This allows the reader to dynamically choose between decoding strategies +pub(crate) enum RowsPlan { Review Comment: It makes sense! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
alamb commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2102257029 ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -247,3 +328,23 @@ impl ReadPlan { self.batch_size } } + +/// How to select the next batch of rows to read from the Parquet file +/// +/// This allows the reader to dynamically choose between decoding strategies +pub(crate) enum RowsPlan { Review Comment: Thank you. It isn't quite done yet, but it is getting close. I am thinking I will likely make 2 PRs 1. one PR that rearranges when the selectors are created, but still uses `RowSelector` 2. a second PR that will switch to use this enum However, I will leave out the BooleanBuffer part of the enum initially I think, to keep the review load manageabe Then I am thinking we can can adapt the code from these PRs - https://github.com/apache/arrow-rs/pull/7454 - https://github.com/apache/arrow-rs/pull/7503 - https://github.com/apache/arrow-rs/pull/7524 To implement filtering via mask -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] Move Selection logic into ReadPlan builder [arrow-rs]
zhuqi-lucas commented on code in PR #7537: URL: https://github.com/apache/arrow-rs/pull/7537#discussion_r2102118063 ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -231,14 +306,20 @@ impl LimitedReadPlanBuilder { pub(crate) struct ReadPlan { /// The number of rows to read in each batch batch_size: usize, -/// Row ranges to be selected from the data source -selection: Option>, +/// In what pattern to decode the rows from the Parquet file +/// +/// This is a queue of [`RowSelector`]s that are guaranteed: +/// 1. To have no empty selections (that select no rows) +/// 2. fall on a batch_size boundary (e.g. 0, 100, 200, 300) +/// +/// TODO change this structure to an enum with emit + mask Review Comment: Thank you @alamb , this is great idea, it means we can build the range/bitmap at the build time, and also the adaptive policy can applied here. ## parquet/src/arrow/arrow_reader/read_plan.rs: ## @@ -247,3 +328,23 @@ impl ReadPlan { self.batch_size } } + +/// How to select the next batch of rows to read from the Parquet file +/// +/// This allows the reader to dynamically choose between decoding strategies +pub(crate) enum RowsPlan { Review Comment: Beautiful enum, it will include all the cases! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org