Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2985195223 This is the last remaining one ``` filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.09 97.0±0.99ms? ?/sec1.00 88.8±0.70ms? ?/sec ``` I could reproduce a very small delta locally (I measured 5-7% slower) I am pretty sure I can make it up when I bring the filtering into the coalesce kernel so I would like to proceed with this PR. I will polish it up and make sure it is ready for review tomorrow morning -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2984892252 🤖: Benchmark completed Details ``` group alamb_faster_coalesce main - - filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001 1.00309.0±2.48ms? ?/sec1.00308.3±3.48ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01 1.02 9.2±0.11ms? ?/sec1.00 9.0±0.11ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1 1.00 4.3±0.08ms? ?/sec1.05 4.5±0.05ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8 1.00 3.5±0.03ms? ?/sec1.04 3.6±0.04ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001 1.10305.0±2.69ms? ?/sec1.00277.6±2.14ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01 1.00 10.6±0.07ms? ?/sec1.03 10.9±0.28ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1 1.00 4.7±0.13ms? ?/sec1.02 4.8±0.12ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8 1.00 4.6±0.02ms? ?/sec1.03 4.7±0.03ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001 1.06 74.5±0.84ms? ?/sec1.00 70.3±0.74ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01 1.00 12.9±0.15ms? ?/sec1.02 13.2±0.18ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1 1.00 10.2±0.29ms? ?/sec1.03 10.5±0.29ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.00 8.4±0.21ms? ?/sec1.13 9.5±0.20ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.09 97.0±0.99ms? ?/sec1.00 88.8±0.70ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01 1.03 15.4±0.13ms? ?/sec1.00 15.0±0.14ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1 1.00 9.9±0.09ms? ?/sec1.05 10.4±0.28ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8 1.00 10.0±0.25ms? ?/sec1.13 11.3±0.34ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001 1.00 59.4±0.26ms? ?/sec1.19 70.6±0.29ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01 1.00 7.6±0.05ms? ?/sec1.20 9.1±0.03ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.11.00 4.8±0.20ms? ?/sec1.11 5.3±0.26ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.81.00 3.1±0.03ms? ?/sec1.15 3.6±0.05ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.0011.00 80.5±0.28ms? ?/sec1.12 90.4±0.38ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01 1.00 10.9±0.05ms? ?/sec1.11 12.1±0.08ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1 1.00 5.6±0.29ms? ?/sec1.08 6.0±0.22ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.00 3.8±0.01ms? ?/sec1.02 3.8±0.01ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001 1.00 52.5±0.16ms? ?/sec1.19 62.6±0.55ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.011.00 6.3±0.03ms? ?/sec1.15 7.2±0.03ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1 1.00 2.8±0.19ms? ?/sec1.13 3.1±0.18ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8 1.00 2.2±0.01ms
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2984839673 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/faster_coalesce (d63cac7b26c0254a72c903a6fcce4c1e336d0cb5) to 6227419d2238a6e943bbfd7a7251615ea4d60ba8 [diff](https://github.com/apache/arrow-rs/compare/6227419d2238a6e943bbfd7a7251615ea4d60ba8..d63cac7b26c0254a72c903a6fcce4c1e336d0cb5) BENCH_NAME=coalesce_kernels BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels BENCH_FILTER= BENCH_BRANCH_NAME=alamb_faster_coalesce Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2984471796
Ok, I figured out what is going on and why the `mixed_utf8view` test is
slowing down. The issue is that the new utf8view code is triggering garbage
collection (string copying) when the old one did not. I put some `println!` and
on main it shows
```
ideal_buffer_size: 370022, actual_buffer_size: 614032
```
This is right under the cutoff load factor (0.5) that would force a a copy
of the strings into new buffers
However, on this branch, because the GC happens *after* the input is sliced
the overall load factor is smaller which triggers the GC in some cases
```
ideal_buffer_size: 246034, actual_buffer_size: 614032
ideal_buffer_size: 123988, actual_buffer_size: 614032
ideal_buffer_size: 13, actual_buffer_size: 614032
Need GC
```
If I hard code the gc heuristic to be different
```diff
index 0be8702c1b..5e4695dd7e 100644
--- a/arrow-select/src/coalesce/byte_view.rs
+++ b/arrow-select/src/coalesce/byte_view.rs
@@ -290,7 +290,7 @@ impl InProgressArray for
InProgressByteViewArray {
// Copying the strings into a buffer can be time-consuming so
// only do it if the array is sparse
-if actual_buffer_size > (ideal_buffer_size * 2) {
+if actual_buffer_size > (ideal_buffer_size * 100) {
self.append_views_and_copy_strings(s.views(),
ideal_buffer_size, buffers);
} else {
self.append_views_and_update_buffer_index(s.views(), buffers);
```
The performance for this benchmark is the same as on main
I am thinking about how best to fix this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2984356370 Well, I got some of the performance back but I still can't explain why it is 10% slower. The only thing I can come up with is that the coalesce kernel is slower due to allocations or something ``` group alamb_faster_coalesce main - - filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.09 96.2±0.63ms? ?/sec1.00 88.1±0.72ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.21 4.6±0.03ms? ?/sec1.00 3.9±0.02ms? ?/sec ``` Maybe I can show that i can make up the difference with some other optimization -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2984142533 🤖: Benchmark completed Details ``` group alamb_faster_coalesce main - - filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001 1.08300.7±2.67ms? ?/sec1.00279.6±3.08ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01 1.02 9.3±0.08ms? ?/sec1.00 9.1±0.09ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1 1.00 4.4±0.13ms? ?/sec1.01 4.4±0.09ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8 1.01 3.5±0.02ms? ?/sec1.00 3.5±0.02ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001 1.00263.3±2.61ms? ?/sec1.14299.4±2.48ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01 1.05 10.8±0.10ms? ?/sec1.00 10.3±0.10ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1 1.00 4.7±0.10ms? ?/sec1.01 4.8±0.09ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8 1.00 4.6±0.02ms? ?/sec1.02 4.7±0.03ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001 1.03 72.6±0.62ms? ?/sec1.00 70.5±0.68ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01 1.00 13.0±0.18ms? ?/sec1.01 13.2±0.16ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1 1.02 10.1±0.44ms? ?/sec1.00 9.9±0.12ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.00 8.2±0.17ms? ?/sec1.30 10.7±0.23ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.09 96.2±0.63ms? ?/sec1.00 88.1±0.72ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01 1.02 15.2±0.11ms? ?/sec1.00 14.9±0.13ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1 1.00 10.1±0.12ms? ?/sec1.03 10.4±0.31ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8 1.00 9.5±0.14ms? ?/sec1.10 10.4±0.23ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001 1.00 62.8±0.25ms? ?/sec1.15 72.0±0.71ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01 1.00 8.0±0.04ms? ?/sec1.14 9.1±0.04ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.11.00 5.2±0.26ms? ?/sec1.06 5.5±0.27ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.81.00 3.1±0.03ms? ?/sec1.09 3.4±0.03ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.0011.00 83.1±0.27ms? ?/sec1.09 90.9±0.39ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01 1.00 10.9±0.03ms? ?/sec1.12 12.2±0.06ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1 1.00 5.7±0.17ms? ?/sec1.05 5.9±0.14ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.21 4.6±0.03ms? ?/sec1.00 3.9±0.02ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001 1.00 55.0±0.14ms? ?/sec1.16 63.7±0.33ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.011.00 6.5±0.03ms? ?/sec1.12 7.3±0.01ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1 1.00 2.7±0.08ms? ?/sec1.07 2.9±0.15ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8 1.00 2.3±0.01ms
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2984073132 I removed the use of `RecordBatch::slice` but I still see a difference locally for reasons I can't understand I am currently wondering if it is related to caching effects. I am going to try and change the benchmark so it regenerates the inputs each time rather than pre-computing them -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2984068473 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/faster_coalesce (83029492349d38936a132e2ae04dc65f7aff4549) to 6227419d2238a6e943bbfd7a7251615ea4d60ba8 [diff](https://github.com/apache/arrow-rs/compare/6227419d2238a6e943bbfd7a7251615ea4d60ba8..83029492349d38936a132e2ae04dc65f7aff4549) BENCH_NAME=coalesce_kernels BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels BENCH_FILTER= BENCH_BRANCH_NAME=alamb_faster_coalesce Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2983703192 I played around with this PR for a while this morning to see why the coalesce kernel slower for high selectivity filters even when the algorithm should be the same. I could reproduce it locally: this branch: ``` cargo bench --bench coalesce_kernels -- "mixed_utf8, 8192, nulls: 0, selectivity: 0.8" filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 time: [3.7678 ms 3.8577 ms 3.9434 ms] time: [4.0622 ms 4.1009 ms 4.1415 ms] -- why does this slow down? time: [4.0251 ms 4.0980 ms 4.1655 ms] ``` on main, there is a real and measurable difference ``` filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 time: [3.9407 ms 3.9883 ms 4.0385 ms] time: [3.5373 ms 3.6533 ms 3.7646 ms] time: [3.4424 ms 3.5605 ms 3.6821 ms] ``` I poked around with the profiles and it seems like the answer may be the overhead due to RecordBatch::slice -- I am looking into avoiding that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2981477925 Context: https://github.com/apache/arrow-rs/issues/7688 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2981461003 I found that `interleave_views` is also *really* slow - perhaps we can combine the improvements to `concat` and `coalesce` here (it will benefit sorting). Another option would be to move to use the BatchCoalescer in the sort implementation as well 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2977865411 Here are three runs on main on my gcp VM: ``` alamb@aal-dev:~/arrow-rs$ cargo bench --bench coalesce_kernels -- 'mixed_utf8, 8192, nulls: 0, selectivity: 0.8' .. filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 time: [8.0962 ms 8.1297 ms 8.1635 ms] ... filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 time: [7.9674 ms 7.9984 ms 8.0300 ms] ... time: [8.2022 ms 8.2490 ms 8.2975 ms] change: [+2.4166% +3.1325% +3.8706%] (p = 0.00 < 0.05) ... ``` Hera are three runs on this branch (basically the same, I can't reproduce the different numbers) 🤔 ``` Running benches/coalesce_kernels.rs (target/release/deps/coalesce_kernels-57e0af298c8ccbf1) filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 time: [8.1231 ms 8.1549 ms 8.1878 ms] change: [−1.8215% −1.1398% −0.4447%] (p = 0.00 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild alamb@aal-dev:~/arrow-rs$ cargo bench --bench coalesce_kernels -- 'mixed_utf8, 8192, nulls: 0, selectivity: 0.8' && cargo bench --bench coalesce_kernels -- 'mixed_utf8, 8192, nulls: 0, selectivity: 0.8' Finished `bench` profile [optimized] target(s) in 0.17s Running benches/coalesce_kernels.rs (target/release/deps/coalesce_kernels-57e0af298c8ccbf1) filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 time: [8.2279 ms 8.2742 ms 8.3229 ms] change: [+0.7339% +1.4627% +2.1941%] (p = 0.00 < 0.05) Change within noise threshold. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe Finished `bench` profile [optimized] target(s) in 0.17s Running benches/coalesce_kernels.rs (target/release/deps/coalesce_kernels-57e0af298c8ccbf1) filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 time: [8.1797 ms 8.2236 ms 8.2693 ms] change: [−1.4209% −0.6114% +0.1819%] (p = 0.13 > 0.05) No change in performance detected. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2977784303 > I need to look into one case that reports slower now: > > ``` > filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.23 10.4±0.21ms? ?/sec1.00 8.5±0.18ms? ?/sec > filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.24 4.9±0.03ms? ?/sec1.00 3.9±0.02ms? ?/sec > ``` > > I suspect it is something silly as the utf8 case doesn't even use the codepath I added 🤔 Perhaps it could be impacted by the change in the other codepaths? I think what might happen in certain cases is that one benchmark "warms" the memory allocator by already allocating memory regions, which makes the next benchmark go faster. Perhaps making one benchmark faster for something (doing fewer allocations) will make others that relied on this before run slower. What happens when running only those benchmarks without running the others? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2150673931
##
arrow-select/src/coalesce/generic.rs:
##
@@ -0,0 +1,60 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use super::InProgressArray;
+use crate::concat::concat;
+use arrow_array::{Array, ArrayRef};
+use arrow_schema::ArrowError;
+
+/// Fallback implementation for [`InProgressArray`]
+///
+/// Internally, buffers arrays and calls [`concat`]
+///
+/// [`concat`]: crate::concat::concat
+#[derive(Debug)]
+pub(crate) struct GenericInProgressArray {
+/// The buffered arrays
+buffered_arrays: Vec,
+}
+
+impl GenericInProgressArray {
+/// Create a new `GenericInProgressArray`
+pub(crate) fn new() -> Self {
+Self {
+buffered_arrays: vec![],
+}
+}
+}
+impl InProgressArray for GenericInProgressArray {
+fn push_array(&mut self, array: ArrayRef) {
+self.buffered_arrays.push(array);
+}
+
+fn finish(&mut self) -> Result {
+// Concatenate all buffered arrays into a single array, which uses 2x
+// peak memory
+let array = concat(
+&self
+.buffered_arrays
+.iter()
+.map(|array| array as &dyn Array)
Review Comment:
casting could be done in `push_array` to avoid the extra collect?
We can use `std::mem::take` here in that case.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2977322168 I need to look into one case that reports slower now: ``` filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.23 10.4±0.21ms? ?/sec1.00 8.5±0.18ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.24 4.9±0.03ms? ?/sec1.00 3.9±0.02ms? ?/sec ``` I suspect it is something silly as the utf8 case doesn't even use the codepath I added 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2977288667 🤖: Benchmark completed Details ``` group alamb_faster_coalesce main - - filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001 1.00261.8±1.69ms? ?/sec1.15301.0±2.34ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01 1.00 8.6±0.11ms? ?/sec1.04 8.9±0.07ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1 1.00 4.1±0.05ms? ?/sec1.02 4.2±0.06ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8 1.00 3.5±0.03ms? ?/sec1.00 3.5±0.02ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001 1.00254.4±1.79ms? ?/sec1.14288.9±2.58ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01 1.00 10.0±0.07ms? ?/sec1.03 10.3±0.11ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1 1.00 4.6±0.09ms? ?/sec1.01 4.6±0.06ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8 1.00 4.6±0.03ms? ?/sec1.01 4.6±0.02ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001 1.00 61.3±0.62ms? ?/sec1.08 66.1±0.51ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01 1.00 11.8±0.13ms? ?/sec1.08 12.7±0.13ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1 1.00 9.3±0.14ms? ?/sec1.04 9.7±0.36ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.23 10.4±0.21ms? ?/sec1.00 8.5±0.18ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.02 84.5±0.60ms? ?/sec1.00 83.0±0.55ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01 1.00 14.1±0.14ms? ?/sec1.02 14.3±0.17ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1 1.00 9.5±0.20ms? ?/sec1.03 9.7±0.19ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8 1.00 9.9±0.19ms? ?/sec1.02 10.1±0.23ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001 1.00 50.6±0.62ms? ?/sec1.34 68.0±0.32ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01 1.00 6.7±0.03ms? ?/sec1.30 8.8±0.03ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.11.00 4.5±0.16ms? ?/sec1.11 5.0±0.25ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.81.00 3.2±0.06ms? ?/sec1.02 3.2±0.03ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.0011.00 70.2±0.27ms? ?/sec1.23 86.4±0.36ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01 1.00 9.8±0.05ms? ?/sec1.20 11.8±0.06ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1 1.00 5.5±0.23ms? ?/sec1.03 5.7±0.24ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.24 4.9±0.03ms? ?/sec1.00 3.9±0.02ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001 1.00 43.5±0.13ms? ?/sec1.37 59.4±0.38ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.011.00 5.6±0.02ms? ?/sec1.26 7.1±0.02ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1 1.00 2.6±0.15ms? ?/sec1.11 2.9±0.17ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8 1.00 2.3±0.01ms
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2977206850 > filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.25 4.9±0.03ms? ?/sec1.00 3.9±0.02ms? ?/sec Hmm, maybe got too carried away -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2150372756
##
arrow-select/src/coalesce.rs:
##
@@ -428,43 +381,76 @@ mod tests {
#[test]
fn test_string_view_no_views() {
-Test::new()
+let output_batches = Test::new()
// both input batches have no views, so no need to compact
.with_batch(stringview_batch([Some("foo"), Some("bar")]))
.with_batch(stringview_batch([Some("baz"), Some("qux")]))
.with_expected_output_sizes(vec![4])
.run();
+
+expect_buffer_layout(
+col_as_string_view("c0", output_batches.first().unwrap()),
+vec![],
+);
}
#[test]
fn test_string_view_batch_small_no_compact() {
// view with only short strings (no buffers) --> no need to compact
let batch = stringview_batch_repeated(1000, [Some("a"), Some("b"),
Some("c")]);
-let gc_batches = Test::new()
+let output_batches = Test::new()
.with_batch(batch.clone())
.with_expected_output_sizes(vec![1000])
.run();
let array = col_as_string_view("c0", &batch);
-let gc_array = col_as_string_view("c0", gc_batches.first().unwrap());
+let gc_array = col_as_string_view("c0",
output_batches.first().unwrap());
assert_eq!(array.data_buffers().len(), 0);
assert_eq!(array.data_buffers().len(), gc_array.data_buffers().len());
// no compaction
+
+expect_buffer_layout(gc_array, vec![]);
}
#[test]
fn test_string_view_batch_large_no_compact() {
// view with large strings (has buffers) but full --> no need to
compact
let batch = stringview_batch_repeated(1000, [Some("This string is
longer than 12 bytes")]);
-let gc_batches = Test::new()
+let output_batches = Test::new()
.with_batch(batch.clone())
.with_batch_size(1000)
.with_expected_output_sizes(vec![1000])
.run();
let array = col_as_string_view("c0", &batch);
-let gc_array = col_as_string_view("c0", gc_batches.first().unwrap());
+let gc_array = col_as_string_view("c0",
output_batches.first().unwrap());
assert_eq!(array.data_buffers().len(), 5);
assert_eq!(array.data_buffers().len(), gc_array.data_buffers().len());
// no compaction
+
+expect_buffer_layout(
Review Comment:
I also added unit tests for the expected buffer layout as a lot of the
complexity is related to calculating this layour
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2977238746 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/faster_coalesce (8d4fad51706144034eacc08090865771e9a1426c) to f48efc21a2fcbde01b8a7fa354d03c6b70607127 [diff](https://github.com/apache/arrow-rs/compare/f48efc21a2fcbde01b8a7fa354d03c6b70607127..8d4fad51706144034eacc08090865771e9a1426c) BENCH_NAME=coalesce_kernels BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels BENCH_FILTER= BENCH_BRANCH_NAME=alamb_faster_coalesce Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2150368515
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +249,339 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
+match data_type {
+DataType::Utf8View =>
Box::new(InProgressStringViewArray::new(batch_size)),
+_ => Box::new(GenericInProgressArray::new()),
+}
+}
+
+/// Incrementally builds in progress arrays
///
-/// If the average size of each view is larger than 32 bytes, we compact the
array.
+/// There are different specialized implementations of this trait for different
+/// array types (e.g., [`StringViewArray`], [`UInt32Array`], etc.).
///
-/// `StringViewArray` include pointers to buffer that hold the underlying data.
-/// One of the great benefits of `StringViewArray` is that many operations
-/// (e.g., `filter`) can be done without copying the underlying data.
+/// This is a subset of the ArrayBuilder APIs, but specialized for
+/// the incremental usecase
+trait InProgressArray: std::fmt::Debug {
Review Comment:
done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2977204742 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/faster_coalesce (7e10eb0363df3ef132a1a9acc6a5b9fca4f7ac30) to f48efc21a2fcbde01b8a7fa354d03c6b70607127 [diff](https://github.com/apache/arrow-rs/compare/f48efc21a2fcbde01b8a7fa354d03c6b70607127..7e10eb0363df3ef132a1a9acc6a5b9fca4f7ac30) BENCH_NAME=coalesce_kernels BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels BENCH_FILTER= BENCH_BRANCH_NAME=alamb_faster_coalesce Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2976562026 > 🤖: Benchmark completed Now [that looks good to me ](https://github.com/apache/arrow-rs/pull/7650#issuecomment-2976543890) -- basically as good or better for all cases.  I will polish this PR up and mark it ready for review -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2976543890 🤖: Benchmark completed Details ``` group alamb_faster_coalesce main - - filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001 1.00158.9±1.27ms? ?/sec1.85294.1±2.29ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01 1.00 8.6±0.11ms? ?/sec1.02 8.8±0.10ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1 1.00 4.2±0.10ms? ?/sec1.03 4.3±0.12ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8 1.00 3.5±0.03ms? ?/sec1.00 3.5±0.02ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001 1.00137.3±1.37ms? ?/sec1.81248.9±1.91ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01 1.03 10.3±0.11ms? ?/sec1.00 10.0±0.07ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1 1.05 4.8±0.05ms? ?/sec1.00 4.6±0.09ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8 1.00 4.6±0.02ms? ?/sec1.01 4.6±0.02ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001 1.00 62.1±0.67ms? ?/sec1.05 65.0±0.40ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01 1.00 11.8±0.10ms? ?/sec1.07 12.5±0.15ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1 1.00 9.5±0.31ms? ?/sec1.03 9.8±0.34ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.00 8.4±0.22ms? ?/sec1.22 10.3±0.31ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.02 84.6±0.64ms? ?/sec1.00 82.7±0.63ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01 1.00 14.4±0.13ms? ?/sec1.00 14.4±0.11ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1 1.03 10.1±0.29ms? ?/sec1.00 9.8±0.28ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8 1.00 10.0±0.20ms? ?/sec1.00 10.0±0.16ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001 1.00 51.9±0.37ms? ?/sec1.30 67.3±0.41ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01 1.00 7.1±0.03ms? ?/sec1.24 8.8±0.35ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.11.00 4.8±0.30ms? ?/sec1.00 4.8±0.12ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.81.00 3.3±0.04ms? ?/sec1.00 3.3±0.03ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.0011.00 71.9±0.37ms? ?/sec1.19 85.6±0.44ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01 1.00 9.9±0.07ms? ?/sec1.18 11.7±0.04ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1 1.00 5.7±0.26ms? ?/sec1.02 5.8±0.25ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.25 4.9±0.03ms? ?/sec1.00 3.9±0.02ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001 1.00 44.5±0.30ms? ?/sec1.33 59.0±0.26ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.011.00 5.6±0.02ms? ?/sec1.24 7.0±0.02ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1 1.00 2.5±0.04ms? ?/sec1.17 3.0±0.18ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8 1.00 2.3±0.01ms
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2976493133 I found limiting the max buffer size to 1MB (from 2MB) seems to have improved performance substantially. Rerunning now to check -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2976495597 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/faster_coalesce (5aed0b88d14d6d5378f5ac34ff2b07c57a9d6889) to f48efc21a2fcbde01b8a7fa354d03c6b70607127 [diff](https://github.com/apache/arrow-rs/compare/f48efc21a2fcbde01b8a7fa354d03c6b70607127..5aed0b88d14d6d5378f5ac34ff2b07c57a9d6889) BENCH_NAME=coalesce_kernels BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels BENCH_FILTER= BENCH_BRANCH_NAME=alamb_faster_coalesce Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2976197147 I have been thinking about this PR. My proposal is going to be to merge it as is (even though it shows a performance regression for many large strings). The thinking is that even though this particular change will slow down for large strings, we are still better off overall due to - XXX Also, I think when I incorporate actual filtering as well we'll further make up in performance - https://github.com/apache/arrow-rs/pull/7652 So I plan to: 1. Do some more testing 2. Make a POC with adding native filtering on top of this PR to see if we can get the performance back down -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2148839214
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +249,339 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
+match data_type {
+DataType::Utf8View =>
Box::new(InProgressStringViewArray::new(batch_size)),
+_ => Box::new(GenericInProgressArray::new()),
+}
+}
+
+/// Incrementally builds in progress arrays
///
-/// If the average size of each view is larger than 32 bytes, we compact the
array.
+/// There are different specialized implementations of this trait for different
+/// array types (e.g., [`StringViewArray`], [`UInt32Array`], etc.).
///
-/// `StringViewArray` include pointers to buffer that hold the underlying data.
-/// One of the great benefits of `StringViewArray` is that many operations
-/// (e.g., `filter`) can be done without copying the underlying data.
+/// This is a subset of the ArrayBuilder APIs, but specialized for
+/// the incremental usecase
+trait InProgressArray: std::fmt::Debug {
Review Comment:
For using it in DataFusion, this needs to be Send + Sync
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2148839141
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +249,339 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
Review Comment:
For using it in DataFusion (https://github.com/apache/datafusion/pull/16249)
this needs to be Send + Sync.
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +249,339 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
Review Comment:
For using it in DataFusion (https://github.com/apache/datafusion/pull/16249)
this needs to be Send + Sync.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2968329495 🤖: Benchmark completed Details ``` group alamb_faster_coalesce main - - filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001 1.00304.8±2.68ms? ?/sec1.01308.4±2.14ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01 1.00 8.6±0.09ms? ?/sec1.04 9.0±0.09ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1 1.00 4.3±0.13ms? ?/sec1.05 4.5±0.11ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8 1.00 3.5±0.02ms? ?/sec1.03 3.6±0.02ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001 1.22304.8±2.53ms? ?/sec1.00250.3±2.87ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01 1.00 10.0±0.10ms? ?/sec1.03 10.3±0.08ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1 1.00 4.6±0.11ms? ?/sec1.07 4.9±0.08ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8 1.00 4.5±0.02ms? ?/sec1.03 4.7±0.03ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001 1.00 62.9±0.61ms? ?/sec1.09 68.7±0.77ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01 1.00 12.1±0.09ms? ?/sec1.07 12.9±0.19ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1 1.00 9.8±0.34ms? ?/sec1.01 9.8±0.38ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.00 8.2±0.15ms? ?/sec1.01 8.3±0.16ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.01 86.1±0.75ms? ?/sec1.00 85.5±0.49ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01 1.00 14.2±0.16ms? ?/sec1.06 15.1±0.15ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1 1.00 10.0±0.25ms? ?/sec1.03 10.3±0.37ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8 1.04 10.3±0.24ms? ?/sec1.00 9.8±0.25ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001 1.00 55.7±0.71ms? ?/sec1.25 69.5±0.45ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01 1.00 7.8±0.04ms? ?/sec1.15 9.0±0.08ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.11.06 5.5±0.23ms? ?/sec1.00 5.2±0.25ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.81.10 3.6±0.13ms? ?/sec1.00 3.3±0.04ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.0011.00 71.7±0.39ms? ?/sec1.24 89.0±0.48ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01 1.00 9.8±0.05ms? ?/sec1.25 12.3±0.07ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1 1.00 5.2±0.09ms? ?/sec1.20 6.2±0.14ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.26 4.7±0.03ms? ?/sec1.00 3.7±0.01ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001 1.00 44.8±0.16ms? ?/sec1.36 61.0±0.29ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.011.00 5.5±0.02ms? ?/sec1.29 7.1±0.03ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1 1.00 2.7±0.22ms? ?/sec1.17 3.2±0.18ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8 1.00 2.2±0.01ms
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2968302569 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/faster_coalesce (eb6e123ad0044d3afc20b4030a451f8b1c79092d) to e32f545c75923a17cf5ab91d0040a72fbdc771a2 [diff](https://github.com/apache/arrow-rs/compare/e32f545c75923a17cf5ab91d0040a72fbdc771a2..eb6e123ad0044d3afc20b4030a451f8b1c79092d) BENCH_NAME=coalesce_kernels BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels BENCH_FILTER= BENCH_BRANCH_NAME=alamb_faster_coalesce Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143743156
##
arrow-select/src/coalesce.rs:
##
@@ -158,14 +164,12 @@ impl BatchCoalescer {
/// Push next batch into the Coalescer
///
/// See [`Self::next_completed_batch()`] to retrieve any completed batches.
-pub fn push_batch(&mut self, batch: RecordBatch) -> Result<(), ArrowError>
{
+pub fn push_batch(&mut self, mut batch: RecordBatch) -> Result<(),
ArrowError> {
if batch.num_rows() == 0 {
Review Comment:
Thanks -- that is a good idea -- I need to review what, if anything, assumes
perfectly sized target batches...
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143742195
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +250,333 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
+match data_type {
+DataType::Utf8View =>
Box::new(InProgressStringViewArray::new(batch_size)),
+_ => Box::new(GenericInProgressArray::new()),
+}
+}
+
+/// Incrementally builds in progress arrays
///
-/// If the average size of each view is larger than 32 bytes, we compact the
array.
+/// There are different specialized implementations of this trait for different
+/// array types (e.g., [`StringViewArray`], [`UInt32Array`], etc.).
///
-/// `StringViewArray` include pointers to buffer that hold the underlying data.
-/// One of the great benefits of `StringViewArray` is that many operations
-/// (e.g., `filter`) can be done without copying the underlying data.
+/// This is a subset of the ArrayBuilder APIs, but specialized for
+/// the incremental usecase
+trait InProgressArray: std::fmt::Debug {
+/// Push a new array to the in-progress array
+fn push_array(&mut self, array: ArrayRef);
+
+/// Finish the currently in-progress array and clear state for the next
+fn finish(&mut self) -> Result;
+}
+
+/// Fallback implementation for [`InProgressArray`]
///
-/// However, after a while (e.g., after `FilterExec` or `HashJoinExec`) the
-/// `StringViewArray` may only refer to a small portion of the buffer,
-/// significantly increasing memory usage.
-fn gc_string_view_batch(batch: RecordBatch) -> RecordBatch {
-let (schema, columns, num_rows) = batch.into_parts();
-let new_columns: Vec = columns
-.into_iter()
-.map(|c| {
-// Try to re-create the `StringViewArray` to prevent holding the
underlying buffer too long.
-let Some(s) = c.as_string_view_opt() else {
-return c;
-};
-if s.data_buffers().is_empty() {
-// If there are no data buffers, we can just return the array
as is
-return c;
-}
-let ideal_buffer_size: usize = s
-.views()
+/// Internally, buffers arrays and calls [`concat`]
+#[derive(Debug)]
+struct GenericInProgressArray {
+/// The buffered arrays
+buffered_arrays: Vec,
+}
+
+impl GenericInProgressArray {
+/// Create a new `GenericInProgressArray`
+pub fn new() -> Self {
+Self {
+buffered_arrays: vec![],
+}
+}
+}
+impl InProgressArray for GenericInProgressArray {
+fn push_array(&mut self, array: ArrayRef) {
+self.buffered_arrays.push(array);
+}
+
+fn finish(&mut self) -> Result {
+// Concatenate all buffered arrays into a single array, which uses 2x
+// peak memory
+let array = concat(
+&self
+.buffered_arrays
.iter()
-.map(|v| {
-let len = (*v as u32) as usize;
-if len > 12 {
-len
-} else {
-0
-}
-})
-.sum();
-let actual_buffer_size = s.get_buffer_memory_size();
-let buffers = s.data_buffers();
-
-// Re-creating the array copies data and can be time consuming.
-// We only do it if the array is sparse
-if actual_buffer_size > (ideal_buffer_size * 2) {
-if ideal_buffer_size == 0 {
-// If the ideal buffer size is 0, all views are inlined
-// so just reuse the views
-return Arc::new(unsafe {
-StringViewArray::new_unchecked(
-s.views().clone(),
-vec![],
-s.nulls().cloned(),
-)
-});
-}
-// We set the block size to `ideal_buffer_size` so that the
new StringViewArray only has one buffer, which accelerate later concat_batches.
-// See https://github.com/apache/arrow-rs/issues/6094 for more
details.
-let mut buffer: Vec =
Vec::with_capacity(ideal_buffer_size);
-
-let views: Vec = s
-.views()
-.as_ref(
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143676299
##
arrow-select/src/coalesce.rs:
##
@@ -158,14 +164,12 @@ impl BatchCoalescer {
/// Push next batch into the Coalescer
///
/// See [`Self::next_completed_batch()`] to retrieve any completed batches.
-pub fn push_batch(&mut self, batch: RecordBatch) -> Result<(), ArrowError>
{
+pub fn push_batch(&mut self, mut batch: RecordBatch) -> Result<(),
ArrowError> {
if batch.num_rows() == 0 {
Review Comment:
Maybe consider not trying to coalesce the batch whenever it has more than
1/2 (or some other factor) of rows of `batch_size`?
Probably best to have it configurable, as it depends on the usage if this is
a good idea or not.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143676299
##
arrow-select/src/coalesce.rs:
##
@@ -158,14 +164,12 @@ impl BatchCoalescer {
/// Push next batch into the Coalescer
///
/// See [`Self::next_completed_batch()`] to retrieve any completed batches.
-pub fn push_batch(&mut self, batch: RecordBatch) -> Result<(), ArrowError>
{
+pub fn push_batch(&mut self, mut batch: RecordBatch) -> Result<(),
ArrowError> {
if batch.num_rows() == 0 {
Review Comment:
Maybe consider not trying to coalesce the batch whenever it has more than
1/2 (or some other factor) of values of `batch_size`?
Probably best to have it configurable, as it depends on the usage if this is
a good idea or not.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143678004
##
arrow-select/src/coalesce.rs:
##
@@ -158,14 +164,12 @@ impl BatchCoalescer {
/// Push next batch into the Coalescer
///
/// See [`Self::next_completed_batch()`] to retrieve any completed batches.
-pub fn push_batch(&mut self, batch: RecordBatch) -> Result<(), ArrowError>
{
+pub fn push_batch(&mut self, mut batch: RecordBatch) -> Result<(),
ArrowError> {
if batch.num_rows() == 0 {
Review Comment:
This logic was once in DataFusion, but is somehow gone 🤔
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
Dandandan commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143633902
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +250,333 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
+match data_type {
+DataType::Utf8View =>
Box::new(InProgressStringViewArray::new(batch_size)),
+_ => Box::new(GenericInProgressArray::new()),
+}
+}
+
+/// Incrementally builds in progress arrays
///
-/// If the average size of each view is larger than 32 bytes, we compact the
array.
+/// There are different specialized implementations of this trait for different
+/// array types (e.g., [`StringViewArray`], [`UInt32Array`], etc.).
///
-/// `StringViewArray` include pointers to buffer that hold the underlying data.
-/// One of the great benefits of `StringViewArray` is that many operations
-/// (e.g., `filter`) can be done without copying the underlying data.
+/// This is a subset of the ArrayBuilder APIs, but specialized for
+/// the incremental usecase
+trait InProgressArray: std::fmt::Debug {
+/// Push a new array to the in-progress array
+fn push_array(&mut self, array: ArrayRef);
+
+/// Finish the currently in-progress array and clear state for the next
+fn finish(&mut self) -> Result;
+}
+
+/// Fallback implementation for [`InProgressArray`]
///
-/// However, after a while (e.g., after `FilterExec` or `HashJoinExec`) the
-/// `StringViewArray` may only refer to a small portion of the buffer,
-/// significantly increasing memory usage.
-fn gc_string_view_batch(batch: RecordBatch) -> RecordBatch {
-let (schema, columns, num_rows) = batch.into_parts();
-let new_columns: Vec = columns
-.into_iter()
-.map(|c| {
-// Try to re-create the `StringViewArray` to prevent holding the
underlying buffer too long.
-let Some(s) = c.as_string_view_opt() else {
-return c;
-};
-if s.data_buffers().is_empty() {
-// If there are no data buffers, we can just return the array
as is
-return c;
-}
-let ideal_buffer_size: usize = s
-.views()
+/// Internally, buffers arrays and calls [`concat`]
+#[derive(Debug)]
+struct GenericInProgressArray {
+/// The buffered arrays
+buffered_arrays: Vec,
+}
+
+impl GenericInProgressArray {
+/// Create a new `GenericInProgressArray`
+pub fn new() -> Self {
+Self {
+buffered_arrays: vec![],
+}
+}
+}
+impl InProgressArray for GenericInProgressArray {
+fn push_array(&mut self, array: ArrayRef) {
+self.buffered_arrays.push(array);
+}
+
+fn finish(&mut self) -> Result {
+// Concatenate all buffered arrays into a single array, which uses 2x
+// peak memory
+let array = concat(
+&self
+.buffered_arrays
.iter()
-.map(|v| {
-let len = (*v as u32) as usize;
-if len > 12 {
-len
-} else {
-0
-}
-})
-.sum();
-let actual_buffer_size = s.get_buffer_memory_size();
-let buffers = s.data_buffers();
-
-// Re-creating the array copies data and can be time consuming.
-// We only do it if the array is sparse
-if actual_buffer_size > (ideal_buffer_size * 2) {
-if ideal_buffer_size == 0 {
-// If the ideal buffer size is 0, all views are inlined
-// so just reuse the views
-return Arc::new(unsafe {
-StringViewArray::new_unchecked(
-s.views().clone(),
-vec![],
-s.nulls().cloned(),
-)
-});
-}
-// We set the block size to `ideal_buffer_size` so that the
new StringViewArray only has one buffer, which accelerate later concat_batches.
-// See https://github.com/apache/arrow-rs/issues/6094 for more
details.
-let mut buffer: Vec =
Vec::with_capacity(ideal_buffer_size);
-
-let views: Vec = s
-.views()
-.as_
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143592404
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +250,333 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
+match data_type {
+DataType::Utf8View =>
Box::new(InProgressStringViewArray::new(batch_size)),
+_ => Box::new(GenericInProgressArray::new()),
+}
+}
+
+/// Incrementally builds in progress arrays
///
-/// If the average size of each view is larger than 32 bytes, we compact the
array.
+/// There are different specialized implementations of this trait for different
+/// array types (e.g., [`StringViewArray`], [`UInt32Array`], etc.).
///
-/// `StringViewArray` include pointers to buffer that hold the underlying data.
-/// One of the great benefits of `StringViewArray` is that many operations
-/// (e.g., `filter`) can be done without copying the underlying data.
+/// This is a subset of the ArrayBuilder APIs, but specialized for
+/// the incremental usecase
+trait InProgressArray: std::fmt::Debug {
+/// Push a new array to the in-progress array
+fn push_array(&mut self, array: ArrayRef);
+
+/// Finish the currently in-progress array and clear state for the next
+fn finish(&mut self) -> Result;
+}
+
+/// Fallback implementation for [`InProgressArray`]
///
-/// However, after a while (e.g., after `FilterExec` or `HashJoinExec`) the
-/// `StringViewArray` may only refer to a small portion of the buffer,
-/// significantly increasing memory usage.
-fn gc_string_view_batch(batch: RecordBatch) -> RecordBatch {
Review Comment:
The logic of `gc_string_view_batch` has been incorporated into
`InProgressStringViewArray`, and thus saves a (third!!!) copy of the strings
for some StringViewArrays
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143592404
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +250,333 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
+match data_type {
+DataType::Utf8View =>
Box::new(InProgressStringViewArray::new(batch_size)),
+_ => Box::new(GenericInProgressArray::new()),
+}
+}
+
+/// Incrementally builds in progress arrays
///
-/// If the average size of each view is larger than 32 bytes, we compact the
array.
+/// There are different specialized implementations of this trait for different
+/// array types (e.g., [`StringViewArray`], [`UInt32Array`], etc.).
///
-/// `StringViewArray` include pointers to buffer that hold the underlying data.
-/// One of the great benefits of `StringViewArray` is that many operations
-/// (e.g., `filter`) can be done without copying the underlying data.
+/// This is a subset of the ArrayBuilder APIs, but specialized for
+/// the incremental usecase
+trait InProgressArray: std::fmt::Debug {
+/// Push a new array to the in-progress array
+fn push_array(&mut self, array: ArrayRef);
+
+/// Finish the currently in-progress array and clear state for the next
+fn finish(&mut self) -> Result;
+}
+
+/// Fallback implementation for [`InProgressArray`]
///
-/// However, after a while (e.g., after `FilterExec` or `HashJoinExec`) the
-/// `StringViewArray` may only refer to a small portion of the buffer,
-/// significantly increasing memory usage.
-fn gc_string_view_batch(batch: RecordBatch) -> RecordBatch {
Review Comment:
The logic of `gc_string_view_batch` has been incorporated into
`InProgressStringViewArray`, and thus saves a copy for arrays that are not
copied much
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143591380
##
arrow-select/src/coalesce.rs:
##
@@ -222,122 +250,333 @@ impl BatchCoalescer {
}
}
-/// Heuristically compact `StringViewArray`s to reduce memory usage, if needed
-///
-/// Decides when to consolidate the StringView into a new buffer to reduce
-/// memory usage and improve string locality for better performance.
-///
-/// This differs from `StringViewArray::gc` because:
-/// 1. It may not compact the array depending on a heuristic.
-/// 2. It uses a precise block size to reduce the number of buffers to track.
-///
-/// # Heuristic
+/// Return a new `InProgressArray` for the given data type
+fn create_in_progress_array(data_type: &DataType, batch_size: usize) ->
Box {
+match data_type {
+DataType::Utf8View =>
Box::new(InProgressStringViewArray::new(batch_size)),
+_ => Box::new(GenericInProgressArray::new()),
+}
+}
+
+/// Incrementally builds in progress arrays
///
-/// If the average size of each view is larger than 32 bytes, we compact the
array.
+/// There are different specialized implementations of this trait for different
+/// array types (e.g., [`StringViewArray`], [`UInt32Array`], etc.).
///
-/// `StringViewArray` include pointers to buffer that hold the underlying data.
-/// One of the great benefits of `StringViewArray` is that many operations
-/// (e.g., `filter`) can be done without copying the underlying data.
+/// This is a subset of the ArrayBuilder APIs, but specialized for
+/// the incremental usecase
+trait InProgressArray: std::fmt::Debug {
+/// Push a new array to the in-progress array
+fn push_array(&mut self, array: ArrayRef);
+
+/// Finish the currently in-progress array and clear state for the next
+fn finish(&mut self) -> Result;
+}
+
+/// Fallback implementation for [`InProgressArray`]
///
-/// However, after a while (e.g., after `FilterExec` or `HashJoinExec`) the
-/// `StringViewArray` may only refer to a small portion of the buffer,
-/// significantly increasing memory usage.
-fn gc_string_view_batch(batch: RecordBatch) -> RecordBatch {
-let (schema, columns, num_rows) = batch.into_parts();
-let new_columns: Vec = columns
-.into_iter()
-.map(|c| {
-// Try to re-create the `StringViewArray` to prevent holding the
underlying buffer too long.
-let Some(s) = c.as_string_view_opt() else {
-return c;
-};
-if s.data_buffers().is_empty() {
-// If there are no data buffers, we can just return the array
as is
-return c;
-}
-let ideal_buffer_size: usize = s
-.views()
+/// Internally, buffers arrays and calls [`concat`]
+#[derive(Debug)]
+struct GenericInProgressArray {
+/// The buffered arrays
+buffered_arrays: Vec,
+}
+
+impl GenericInProgressArray {
+/// Create a new `GenericInProgressArray`
+pub fn new() -> Self {
+Self {
+buffered_arrays: vec![],
+}
+}
+}
+impl InProgressArray for GenericInProgressArray {
+fn push_array(&mut self, array: ArrayRef) {
+self.buffered_arrays.push(array);
+}
+
+fn finish(&mut self) -> Result {
+// Concatenate all buffered arrays into a single array, which uses 2x
Review Comment:
The default fallback does the same as the existing coalesce kernel and
buffers all the input arrays until it is time to output and then calls `concat`
Over time I hope to replace this memory inefficient algorithm with one that
is both more memory efficient and faster
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on code in PR #7650:
URL: https://github.com/apache/arrow-rs/pull/7650#discussion_r2143589876
##
arrow-select/src/coalesce.rs:
##
@@ -123,8 +123,8 @@ pub struct BatchCoalescer {
schema: SchemaRef,
/// output batch size
batch_size: usize,
-/// In-progress buffered batches
-buffer: Vec,
+/// In-progress arrays
Review Comment:
The main change of this PR is to introduce per-type "InProgressArrays" that
have specializations for each type of array
This PR adds a specialization for StringViewArray (to remove the need for
`gc_string_view_batch`)
In follow on PRs (maybe tickets) we can add specialized versions for
PrimitiveArray, StringArray, etc
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2968026818 All cases faster except for this one (where I think there are a few more allocations). I'll try to squeeze a little more performance ``` > filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.22 4.6±0.03ms? ?/sec1.00 3.8±0.01ms? ?/sec ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2968020458 🤖: Benchmark completed Details ``` group alamb_faster_coalesce main - - filter: mixed_dict, 8192, nulls: 0, selectivity: 0.001 1.00285.3±2.13ms? ?/sec1.02291.8±1.70ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.01 1.00 8.6±0.07ms? ?/sec1.04 8.9±0.11ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.1 1.00 4.2±0.11ms? ?/sec1.02 4.3±0.03ms? ?/sec filter: mixed_dict, 8192, nulls: 0, selectivity: 0.8 1.00 3.5±0.12ms? ?/sec1.02 3.5±0.02ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.001 1.00280.3±3.59ms? ?/sec1.00279.6±3.05ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.01 1.00 10.0±0.04ms? ?/sec1.02 10.2±0.09ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.1 1.00 4.6±0.05ms? ?/sec1.08 5.0±0.10ms? ?/sec filter: mixed_dict, 8192, nulls: 0.1, selectivity: 0.8 1.00 4.5±0.03ms? ?/sec1.02 4.6±0.02ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.001 1.00 61.9±0.62ms? ?/sec1.09 67.6±0.54ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.01 1.00 12.1±0.17ms? ?/sec1.07 12.9±0.12ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.1 1.02 9.9±0.36ms? ?/sec1.00 9.8±0.41ms? ?/sec filter: mixed_utf8, 8192, nulls: 0, selectivity: 0.8 1.00 7.7±0.08ms? ?/sec1.09 8.5±0.06ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.001 1.01 85.2±0.43ms? ?/sec1.00 84.4±0.49ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.01 1.00 14.1±0.13ms? ?/sec1.06 15.0±0.17ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.1 1.00 9.9±0.29ms? ?/sec1.03 10.2±0.33ms? ?/sec filter: mixed_utf8, 8192, nulls: 0.1, selectivity: 0.8 1.00 9.4±0.11ms? ?/sec1.01 9.5±0.15ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.001 1.00 52.8±0.68ms? ?/sec1.31 68.9±0.40ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.01 1.00 6.7±0.04ms? ?/sec1.34 9.0±0.04ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.11.00 4.8±0.26ms? ?/sec1.05 5.1±0.22ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0, selectivity: 0.81.00 2.9±0.02ms? ?/sec1.12 3.3±0.02ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.0011.00 70.7±0.18ms? ?/sec1.24 87.3±0.30ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.01 1.00 9.6±0.03ms? ?/sec1.29 12.3±0.06ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.1 1.00 5.4±0.19ms? ?/sec1.12 6.1±0.04ms? ?/sec filter: mixed_utf8view (max_string_len=128), 8192, nulls: 0.1, selectivity: 0.8 1.22 4.6±0.03ms? ?/sec1.00 3.8±0.01ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.001 1.00 43.9±0.11ms? ?/sec1.38 60.4±0.34ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.011.00 5.4±0.03ms? ?/sec1.32 7.2±0.13ms ? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.1 1.00 2.7±0.18ms? ?/sec1.08 2.9±0.14ms? ?/sec filter: mixed_utf8view (max_string_len=20), 8192, nulls: 0, selectivity: 0.8 1.00 2.2±0.01ms
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2967987095 🤖 `./gh_compare_arrow.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_arrow.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubuntu SMP Thu Apr 24 20:41:05 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux Comparing alamb/faster_coalesce (6dd3aa76fa7038cbd203cb3e988b4e6de741760a) to e32f545c75923a17cf5ab91d0040a72fbdc771a2 [diff](https://github.com/apache/arrow-rs/compare/e32f545c75923a17cf5ab91d0040a72fbdc771a2..6dd3aa76fa7038cbd203cb3e988b4e6de741760a) BENCH_NAME=coalesce_kernels BENCH_COMMAND=cargo bench --features=arrow,async,test_common,experimental --bench coalesce_kernels BENCH_FILTER= BENCH_BRANCH_NAME=alamb_faster_coalesce Results will be posted here when complete -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] WIP: Optimize coalesce kernel for StringView [arrow-rs]
alamb commented on PR #7650: URL: https://github.com/apache/arrow-rs/pull/7650#issuecomment-2967892992 Update here is that this PR is quite a bit slower than main when coopying many string values. I am looking into why -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
