Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2031035548 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > I might be missing some obvious points, but > > 1. applying the same check for rhs is redundant? (does it approximately requires the same computation if we continue the execution as is) > 2. why aren't we also covering the cases when **lhs is all true and op is AND** / **lhs is all false and op is OR** I've optimized the comments, so it may look a bit clearer. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783526836 > The only thing I think is needed for this PR is to either > > 1. remove the bechmark > 2. update the benchmark ti actually test `BinaryExpr` evaluation time done -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
berkaysynnada commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2034007815 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > > Doesn't this brings a clear gain by short-circuiting? > > I understand these now. Sorry, I didn't notice it just now and thought it was only a positional difference. > > This is a great idea. I think we can open a new issue and add this optimization. what about these? > [xxx] AND [false,false,false...false] => return rhs, which is [false,false,false...false] > [xxx] OR [true, true, true... true] => return rhs, which is [true, true, true...true] -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
berkaysynnada commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2034007019 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > I don't think you can rewrite expressions like this without changing the short-circuit semantics (if lHS is false then don't run the RHS in `a AND b`) We don't need a rewrite. evaluate() needs to return a ColumnarValue, so we can just return the short-circuited versions in my examples -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2787055441 the extended tests appear to be failing after this PR: - https://github.com/apache/datafusion/issues/15641 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2033602903 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: I don't think you can rewrite expressions like this without changing the short-circuit semantics (if lHS is false then don't run the RHS in `a AND b`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2033205911 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > Doesn't this brings a clear gain by short-circuiting? I understand these now. Sorry, I didn't notice it just now and thought it was only a positional difference. This is a great idea. I think we can open a new issue and add this optimization. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
berkaysynnada commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2033185234 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > but wouldn't it be better to do it this way: the optimizer rewrites simpler expressions to the left as much as possible? Is there a planned work to implement this? If it is so, trying first lhs makes sense. But even if that's the case, checking the rhs values as well, before doing the all binary computation, is a preferable option? Let's imagine this case also (I adapt my second example such that homogenous side is lhs) ``` [true, true, true... true] AND [xxx] => return rhs, which is [xxx] [false, false, false... false] OR [xxx] => return rhs, which is [xxx] ``` Doesn't this brings a clear gain by short-circuiting? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2033160515 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: Okay, I understand now. However, the current execution order of BinaryExpr is fixed to evaluate the left side first and then the right side. The situation you mentioned I had also considered before, but wouldn't it be better to do it this way: the optimizer rewrites simpler expressions to the left as much as possible? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
berkaysynnada commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2033101460 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: I think I couldn't be clear in my last comment. This PR short-circuits this case: ``` [false,false,false...false] AND [xxx] => return lhs, which is [false,false,false...false] [true, true, true... true] OR [xxx] => return lhs, which is [true, true, true...true] ``` I have 2 further ideas to discuss: 1) ``` [xxx] AND [false,false,false...false] => return rhs, which is [false,false,false...false] [xxx] OR [true, true, true... true] => return rhs, which is [true, true, true...true] ``` Why don't we check this case as well? 2) ``` [xxx] AND [true, true, true... true] => return lhs, which is [xxx] [xxx] OR [false, false, false... false] => return lhs, which is [xxx] ``` isn't this case optimizable as well? Are we handling those cases in another place? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2032964400 ## datafusion/physical-expr/benches/binary_op.rs: ## @@ -0,0 +1,373 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use arrow::{ +array::BooleanArray, +compute::{bool_and, bool_or}, +datatypes::{DataType, Field, Schema}, +}; +use arrow::{array::StringArray, record_batch::RecordBatch}; +use criterion::{black_box, criterion_group, criterion_main, Criterion}; +use datafusion_expr::{and, binary_expr, col, lit, or, Operator}; +use datafusion_physical_expr::{ +expressions::{BinaryExpr, Column}, +planner::logical2physical, +PhysicalExpr, +}; +use std::sync::{Arc, LazyLock}; + +/// Generates BooleanArrays with different true/false distributions for benchmarking. +/// +/// Returns a vector of tuples containing scenario name and corresponding BooleanArray. +/// +/// # Arguments +/// - `TEST_ALL_FALSE` - Used to generate what kind of test data +/// - `len` - Length of the BooleanArray to generate +fn generate_boolean_cases( +len: usize, +) -> Vec<(String, BooleanArray)> { +let mut cases = Vec::with_capacity(6); + +// Scenario 1: All elements false or all elements true +if TEST_ALL_FALSE { +let all_false = BooleanArray::from(vec![false; len]); +cases.push(("all_false".to_string(), all_false)); +} else { +let all_true = BooleanArray::from(vec![true; len]); +cases.push(("all_true".to_string(), all_true)); +} + +// Scenario 2: Single true at first position or single false at first position +if TEST_ALL_FALSE { +let mut first_true = vec![false; len]; +first_true[0] = true; +cases.push(("one_true_first".to_string(), BooleanArray::from(first_true))); +} else { +let mut first_false = vec![true; len]; +first_false[0] = false; +cases.push(( +"one_false_first".to_string(), +BooleanArray::from(first_false), +)); +} + +// Scenario 3: Single true at last position or single false at last position +if TEST_ALL_FALSE { +let mut last_true = vec![false; len]; +last_true[len - 1] = true; +cases.push(("one_true_last".to_string(), BooleanArray::from(last_true))); +} else { +let mut last_false = vec![true; len]; +last_false[len - 1] = false; +cases.push(("one_false_last".to_string(), BooleanArray::from(last_false))); +} + +// Scenario 4: Single true at exact middle or single false at exact middle +let mid = len / 2; +if TEST_ALL_FALSE { +let mut mid_true = vec![false; len]; +mid_true[mid] = true; +cases.push(("one_true_middle".to_string(), BooleanArray::from(mid_true))); +} else { +let mut mid_false = vec![true; len]; +mid_false[mid] = false; +cases.push(( +"one_false_middle".to_string(), +BooleanArray::from(mid_false), +)); +} + +// Scenario 5: Single true at 25% position or single false at 25% position +let mid_left = len / 4; +if TEST_ALL_FALSE { +let mut mid_left_true = vec![false; len]; +mid_left_true[mid_left] = true; +cases.push(( +"one_true_middle_left".to_string(), +BooleanArray::from(mid_left_true), +)); +} else { +let mut mid_left_false = vec![true; len]; +mid_left_false[mid_left] = false; +cases.push(( +"one_false_middle_left".to_string(), +BooleanArray::from(mid_left_false), +)); +} + +// Scenario 6: Single true at 75% position or single false at 75% position +let mid_right = (3 * len) / 4; +if TEST_ALL_FALSE { +let mut mid_right_true = vec![false; len]; +mid_right_true[mid_right] = true; +cases.push(( +"one_true_middle_right".to_string(), +BooleanArray::from(mid_right_true), +)); +} else { +let mut mid_right_false = vec![true; len]; +mid_right_false[mid_right] = false; +cases.push(( +"one_false_middle_right".to_string(), +BooleanArray::from(mid_r
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2786029908 Filed https://github.com/apache/datafusion/issues/15631 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb merged PR #15462: URL: https://github.com/apache/datafusion/pull/15462 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2786018330 Thanks again for the wonderful work @acking-you -- I ran the benchmarks locally and indeed they show a non trivial amount of time being spent in count_ones.  I also confirmed that without the short circuit optimization the benchmarks get much much slower ``` Gnuplot not found, using plotters backend bench_all_false_and time: [503.28 µs 505.29 µs 507.36 µs] change: [+1711392% +1718962% +1727169%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe bench_all_true_or time: [504.57 µs 506.27 µs 508.10 µs] change: [+1704732% +1713204% +1721665%] (p = 0.00 < 0.05) Performance has regressed. Found 7 outliers among 100 measurements (7.00%) 6 (6.00%) high mild ``` Let's merge this PR in and I will file a follow on ticket to track potentially improving performance even more -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2784190060 > I think the likely part of it getting slower is short-circuiting whenever a `true` is observed (having a branch for each item to check). > > For this case it might be interesting to compare it with `array.values().bit_chunks().iter_padded().fold(true, |acc, i: u64| i != 0 && acc)` (e.g. fully evaluate the entire array) and see if it is any faster than the `true_count` implementation. > > For `max_boolean` it might be interesting to see if we can make it faster by doing both, early exiting and generating more efficient code. Maybe it could evaluate it per `n` values instead of per-value to benefit from faster code / branch prediction while still exiting early enough. I’m certain that the semantics of the `if` statement, which preemptively exits based on a condition, caused the compiler to abandon SIMD optimization. To test this, I conducted the following experiment: I wrote two simple functions to simulate the two scenarios and then examined the optimized assembly code generated by the compiler to compare the differences. You can view the specific assembly code generated from my simulated experiment here: [https://godbolt.org/z/63bEvYP4d](https://godbolt.org/z/63bEvYP4d). ## Records Rust code as follows: ```rust /// Scene 1: Counting all set bits (the number of 1s) #[no_mangle] #[inline(never)] pub fn count_ones(ba: &[u8]) -> usize { ba.iter().map(|x| x.count_ones() as usize).sum() } /// Scene 2: Check if a bit is set #[no_mangle] #[inline(never)] pub fn find_any(ba: &[u8]) -> bool { ba.iter().any(|&x| x != 0) } ``` Assembly code as follows: ```asm count_ones: testrsi, rsi je .LBB0_1 cmp rsi, 4 jae .LBB0_5 xor ecx, ecx xor eax, eax jmp .LBB0_8 .LBB0_1: xor eax, eax ret .LBB0_5: mov rcx, rsi and rcx, -4 pxorxmm0, xmm0 xor eax, eax movdqa xmm2, xmmword ptr [rip + .LCPI0_0] movdqa xmm3, xmmword ptr [rip + .LCPI0_1] movdqa xmm5, xmmword ptr [rip + .LCPI0_2] pxorxmm4, xmm4 pxorxmm1, xmm1 .LBB0_6: movzx edx, word ptr [rdi + rax] movdxmm7, edx movzx edx, word ptr [rdi + rax + 2] movdxmm6, edx punpcklbw xmm7, xmm0 punpcklwd xmm7, xmm0 punpckldq xmm7, xmm0 movdqa xmm8, xmm7 psrlw xmm8, 1 pandxmm8, xmm2 psubb xmm7, xmm8 movdqa xmm8, xmm7 pandxmm8, xmm3 psrlw xmm7, 2 pandxmm7, xmm3 paddb xmm7, xmm8 movdqa xmm8, xmm7 psrlw xmm8, 4 paddb xmm8, xmm7 pandxmm8, xmm5 psadbw xmm8, xmm0 paddq xmm4, xmm8 punpcklbw xmm6, xmm0 punpcklwd xmm6, xmm0 punpckldq xmm6, xmm0 movdqa xmm7, xmm6 psrlw xmm7, 1 pandxmm7, xmm2 psubb xmm6, xmm7 movdqa xmm7, xmm6 pandxmm7, xmm3 psrlw xmm6, 2 pandxmm6, xmm3 paddb xmm6, xmm7 movdqa xmm7, xmm6 psrlw xmm7, 4 paddb xmm7, xmm6 pandxmm7, xmm5 psadbw xmm7, xmm0 paddq xmm1, xmm7 add rax, 4 cmp rcx, rax jne .LBB0_6 paddq xmm1, xmm4 pshufd xmm0, xmm1, 238 paddq xmm0, xmm1 movqrax, xmm0 cmp rcx, rsi je .LBB0_2 .LBB0_8: movzx edx, byte ptr [rdi + rcx] imuledx, edx, 134480385 shr edx, 3 and edx, 286331153 imuledx, edx, 286331153 shr edx, 28 add rax, rdx inc rcx cmp rsi, rcx jne .LBB0_8 .LBB0_2: ret find_any: xor ecx, ecx .LBB1_1: mov rax, rcx cmp rsi, rcx je .LBB1_3 lea rcx, [rax + 1] cmp byte ptr [rdi + rax], 0 je .LBB1_1 .LBB1_3: cmp rsi, rax setne al ret ``` ## Conclusion 1. **Vectorization Optimization Differences**: - The assembly code for `count_ones` utilizes SIMD instructions (such as `movdqa`, `psrlw`, and `paddb`), achieving automatic vectorization. The compiler packs four bytes (32 bits) of data into an XMM register for parallel processing, efficiently calculating the number of set bits
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783230355 > > Security audit failure is not related to this PR > > How can this be resolved? - it is tracked by https://github.com/apache/datafusion/issues/15571 There are ideas on there of how to resolve the isseu To be clear I don't think this is blocking this PR from merging The only thing I think is needed for this PR is to either 1. remove the bechmark 2. update the benchmark ti actually test `BinaryExpr` evaluation time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
Dandandan commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783121877 > > The related results are as follows. I’m not sure what you think about these results — should we continue using true_count/false_count, or look for a better solution? @alamb @Dandandan > > Thank you for your diligence @acking-you > > I think we should merge this PR in (with the count bits and the benchmarks) and file a follow on ticket to potentially improve the performance in some other way. > > I don't fully understand your performance results given that the two functions seem very similar -- maybe it has to do with option hanlding messing auto vectorization or something > > https://docs.rs/arrow-array/54.3.1/src/arrow_array/array/boolean_array.rs.html#160 https://docs.rs/arrow-arith/54.3.1/src/arrow_arith/aggregate.rs.html#751 I think the likely part of it getting slower is short-circuiting whenever a `true` is observed. For this case it might be interesting to compare it with `array.values().bit_chunks().iter_padded().fold(true, |acc, i: u64| i != 0 && acc)` and see if it is any faster than the `true_count` implementation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2031069949 ## datafusion/physical-expr/benches/boolean_op.rs: ## @@ -0,0 +1,187 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use arrow::{ +array::BooleanArray, +compute::{bool_and, bool_or}, +}; +use criterion::{black_box, criterion_group, criterion_main, Criterion}; +use std::sync::{Arc, LazyLock}; + +/// Generates BooleanArrays with different true/false distributions for benchmarking. +/// +/// Returns a vector of tuples containing scenario name and corresponding BooleanArray. +/// +/// # Arguments +/// - `TEST_ALL_FALSE` - Used to generate what kind of test data +/// - `len` - Length of the BooleanArray to generate +fn generate_boolean_cases( +len: usize, +) -> Vec<(String, BooleanArray)> { +let mut cases = Vec::with_capacity(6); + +// Scenario 1: All elements false or all elements true +if TEST_ALL_FALSE { +let all_false = BooleanArray::from(vec![false; len]); +cases.push(("all_false".to_string(), all_false)); +} else { +let all_true = BooleanArray::from(vec![true; len]); +cases.push(("all_true".to_string(), all_true)); +} + +// Scenario 2: Single true at first position or single false at first position +if TEST_ALL_FALSE { +let mut first_true = vec![false; len]; +first_true[0] = true; +cases.push(("one_true_first".to_string(), BooleanArray::from(first_true))); +} else { +let mut first_false = vec![true; len]; +first_false[0] = false; +cases.push(( +"one_false_first".to_string(), +BooleanArray::from(first_false), +)); +} + +// Scenario 3: Single true at last position or single false at last position +if TEST_ALL_FALSE { +let mut last_true = vec![false; len]; +last_true[len - 1] = true; +cases.push(("one_true_last".to_string(), BooleanArray::from(last_true))); +} else { +let mut last_false = vec![true; len]; +last_false[len - 1] = false; +cases.push(("one_false_last".to_string(), BooleanArray::from(last_false))); +} + +// Scenario 4: Single true at exact middle or single false at exact middle +let mid = len / 2; +if TEST_ALL_FALSE { +let mut mid_true = vec![false; len]; +mid_true[mid] = true; +cases.push(("one_true_middle".to_string(), BooleanArray::from(mid_true))); +} else { +let mut mid_false = vec![true; len]; +mid_false[mid] = false; +cases.push(( +"one_false_middle".to_string(), +BooleanArray::from(mid_false), +)); +} + +// Scenario 5: Single true at 25% position or single false at 25% position +let mid_left = len / 4; +if TEST_ALL_FALSE { +let mut mid_left_true = vec![false; len]; +mid_left_true[mid_left] = true; +cases.push(( +"one_true_middle_left".to_string(), +BooleanArray::from(mid_left_true), +)); +} else { +let mut mid_left_false = vec![true; len]; +mid_left_false[mid_left] = false; +cases.push(( +"one_false_middle_left".to_string(), +BooleanArray::from(mid_left_false), +)); +} + +// Scenario 6: Single true at 75% position or single false at 75% position +let mid_right = (3 * len) / 4; +if TEST_ALL_FALSE { +let mut mid_right_true = vec![false; len]; +mid_right_true[mid_right] = true; +cases.push(( +"one_true_middle_right".to_string(), +BooleanArray::from(mid_right_true), +)); +} else { +let mut mid_right_false = vec![true; len]; +mid_right_false[mid_right] = false; +cases.push(( +"one_false_middle_right".to_string(), +BooleanArray::from(mid_right_false), +)); +} + +cases +} + +fn benchmark_boolean_ops(c: &mut Criterion) { Review Comment: > in other words, make micro benchmarks for expressions like A AND B AND C AND D similar to what you added to the clickbench extended suite that shows the performance of ex
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783033518 > I don't fully understand your performance results given that the two functions seem very similar -- maybe it has to do with option hanlding messing auto vectorization or something Yes, I was also quite surprised. The only difference in the implementation of these two functions might be whether or not `find` was used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2031057246 ## datafusion/physical-expr/benches/boolean_op.rs: ## @@ -0,0 +1,187 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use arrow::{ +array::BooleanArray, +compute::{bool_and, bool_or}, +}; +use criterion::{black_box, criterion_group, criterion_main, Criterion}; +use std::sync::{Arc, LazyLock}; + +/// Generates BooleanArrays with different true/false distributions for benchmarking. +/// +/// Returns a vector of tuples containing scenario name and corresponding BooleanArray. +/// +/// # Arguments +/// - `TEST_ALL_FALSE` - Used to generate what kind of test data +/// - `len` - Length of the BooleanArray to generate +fn generate_boolean_cases( +len: usize, +) -> Vec<(String, BooleanArray)> { +let mut cases = Vec::with_capacity(6); + +// Scenario 1: All elements false or all elements true +if TEST_ALL_FALSE { +let all_false = BooleanArray::from(vec![false; len]); +cases.push(("all_false".to_string(), all_false)); +} else { +let all_true = BooleanArray::from(vec![true; len]); +cases.push(("all_true".to_string(), all_true)); +} + +// Scenario 2: Single true at first position or single false at first position +if TEST_ALL_FALSE { +let mut first_true = vec![false; len]; +first_true[0] = true; +cases.push(("one_true_first".to_string(), BooleanArray::from(first_true))); +} else { +let mut first_false = vec![true; len]; +first_false[0] = false; +cases.push(( +"one_false_first".to_string(), +BooleanArray::from(first_false), +)); +} + +// Scenario 3: Single true at last position or single false at last position +if TEST_ALL_FALSE { +let mut last_true = vec![false; len]; +last_true[len - 1] = true; +cases.push(("one_true_last".to_string(), BooleanArray::from(last_true))); +} else { +let mut last_false = vec![true; len]; +last_false[len - 1] = false; +cases.push(("one_false_last".to_string(), BooleanArray::from(last_false))); +} + +// Scenario 4: Single true at exact middle or single false at exact middle +let mid = len / 2; +if TEST_ALL_FALSE { +let mut mid_true = vec![false; len]; +mid_true[mid] = true; +cases.push(("one_true_middle".to_string(), BooleanArray::from(mid_true))); +} else { +let mut mid_false = vec![true; len]; +mid_false[mid] = false; +cases.push(( +"one_false_middle".to_string(), +BooleanArray::from(mid_false), +)); +} + +// Scenario 5: Single true at 25% position or single false at 25% position +let mid_left = len / 4; +if TEST_ALL_FALSE { +let mut mid_left_true = vec![false; len]; +mid_left_true[mid_left] = true; +cases.push(( +"one_true_middle_left".to_string(), +BooleanArray::from(mid_left_true), +)); +} else { +let mut mid_left_false = vec![true; len]; +mid_left_false[mid_left] = false; +cases.push(( +"one_false_middle_left".to_string(), +BooleanArray::from(mid_left_false), +)); +} + +// Scenario 6: Single true at 75% position or single false at 75% position +let mid_right = (3 * len) / 4; +if TEST_ALL_FALSE { +let mut mid_right_true = vec![false; len]; +mid_right_true[mid_right] = true; +cases.push(( +"one_true_middle_right".to_string(), +BooleanArray::from(mid_right_true), +)); +} else { +let mut mid_right_false = vec![true; len]; +mid_right_false[mid_right] = false; +cases.push(( +"one_false_middle_right".to_string(), +BooleanArray::from(mid_right_false), +)); +} + +cases +} + +fn benchmark_boolean_ops(c: &mut Criterion) { Review Comment: Can you please update these benchmarks for the performance of boolean expression evaluation? As of now, these are benchmarks for `bool_and` and `bool_or`? In other words,
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783035350 > Security audit failure is not related to this PR How can this be resolved? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2782983563 I sincerely apologize for the delay in updating this PR. I have now designed a detailed comparative test for the `bool_or/bool_and` issue described by @Dandandan. The related results are as follows. I’m not sure what you think about these results — should we continue using `true_count/false_count`, or look for a better solution? @alamb @Dandandan 1. I designed six distribution scenarios([code](https://github.com/apache/datafusion/pull/15462/files#diff-8710f6b44dd74240d19e6fcdfbf971c034f59cd48c022e51b07ed876a8cc7c5eR37-R120)) for the `true_count/false_count` and `bool_or/bool_and` of the boolean array to conduct performance testing comparisons. The final results are as follows: | Test Case | true_count/false_count | bool_or/bool_and | Change | Performance Impact | | -- | -- | --- | | -- | | all_false | 1.6593 µs - 1.6695 µs | 4.1013 µs - 4.1774 µs | 150.91% | 🚫 Regressed | | one_true_first | 1.6726 µs - 1.6771 µs | **1.6885 ns - 1.7430 ns** | **-99.898%** | ✅ Improved | | one_true_last | 1.6663 µs - 1.6714 µs | 4.1096 µs - 4.1554 µs | +147.14% | 🚫 Regressed | | one_true_middle| 1.6723 µs - 1.6819 µs | 2.1505 µs - 2.2117 µs | +30.180% | 🚫 Regressed | | one_true_middle_left | 1.6672 µs - 1.6727 µs | 1.1088 µs - 1.1483 µs | -32.995% | ✅ Improved | | one_true_middle_right | 1.6689 µs - 1.6741 µs | 3.1562 µs - 3.2521 µs | +93.762% | 🚫 Regressed | | all_true | 1.6711 µs - 1.6747 µs | 4.2779 µs - 4.4291 µs | +155.35% | 🚫 Regressed | | one_false_first| 1.6722 µs - 1.6782 µs | **1.6278 ns - 1.6360 ns** | **-99.903%** | ✅ Improved | | one_false_last | 1.6818 µs - 1.7512 µs | 4.2175 µs - 4.2930 µs | +153.50% | 🚫 Regressed | | one_false_middle | 1.8437 µs - 1.9665 µs | 2.0575 µs - 2.0871 µs | +11.931% | 🚫 Regressed | | one_false_middle_left | 2.0004 µs - 2.3194 µs | 1.0243 µs - 1.0398 µs | -57.059% | ✅ Improved | | one_false_middle_right | 2.0770 µs - 2.2721 µs | 3.0275 µs - 3.0582 µs | +47.668% | 🚫 Regressed | It can be seen that when `false/true` is located slightly to the left of the middle, `bool_or/bool_and` has a significant advantage, with up to 10^3 times the performance lead when in the first position. However, in other cases, using `true_count/false_count` performs better, showing relatively stable behavior across various scenarios. 2. The test can be reproduced with the following command: ```sh # test true_count/false_count TEST_BOOL_COUNT=1 cargo bench --bench boolean_op # test bool_or/bool_and cargo bench --bench boolean_op ``` detail benchmark code: https://github.com/apache/datafusion/pull/15462/files#diff-8710f6b44dd74240d19e6fcdfbf971c034f59cd48c022e51b07ed876a8cc7c5e -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783021714 > The related results are as follows. I’m not sure what you think about these results — should we continue using true_count/false_count, or look for a better solution? @alamb @Dandandan Thank you for your diligence @acking-you I think we should merge this PR in (with the count bits and the benchmarks) and file a follow on ticket to potentially improve the performance in some other way. I don't fully understand your performance results given that the two functions seem very similar -- maybe it has to do with option hanlding messing auto vectorization or something https://docs.rs/arrow-array/54.3.1/src/arrow_array/array/boolean_array.rs.html#160 https://docs.rs/arrow-arith/54.3.1/src/arrow_arith/aggregate.rs.html#751 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2783022496 Security audit is not related to this RP -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2030375948 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: Thank you very much for your reply, your opinion is correct. I actually did the same. I'm very sorry. Maybe my comment is a little confusing. I will fix it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2030375948 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: Thank you very much for your CR, your opinion is correct. I actually did the same. I'm very sorry. Maybe my comment is a little confusing. I will fix it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
berkaysynnada commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2030260718 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: I might be missing some obvious points, but 1) applying the same check for rhs is redundant? (does it approximately requires the same computation if we continue the execution as is) 2) why aren't we also covering the cases when **lhs is all true and op is AND** / **lhs is all false and op is OR** -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020615756 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: > I wonder if it helps to short optimize this expression (e.g. match until we get a chunk of the bitmap != 0) I think the overhead added here should be very small (the compiler optimization should work well), and the test results we discussed before were sometimes fast and sometimes slow (maybe noise). Your suggestion of making an early judgment and returning false seems like a good idea, but I'm not sure if it will be effective. The concern I have with this approach is that it requires adding an `if` condition inside the `for` loop, which will most likely disable the compiler's SIMD instruction optimization (I've encountered a similar situation before, and I had to manually unroll the SIMD...). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
Dandandan commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020558418 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: I remember this had some overhead (for calculating the counts) from a previous try. I wonder if it helps to short optimize this expression (e.g. match until we get a chunk of the bitmap `!= 0`) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2026912500 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: I looked more carefully at bool_or and I do think it would be faster than this implementation on the case where there are some true values (as it stops as soon as it finds a single non zero): https://docs.rs/arrow/latest/arrow/compute/fn.bool_or.html -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
alamb commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2775475696 > > I looked for a binary expression and/or micro benchmark and could not find one: https://github.com/apache/datafusion/tree/c929a1cd133590e4944bc2c7900611220450335a/datafusion/physical-expr/benches > > Thank you for your suggestions and help. I also noticed this part of the code yesterday. I've had a cold and fever in the past few days, so I haven't tried the method mentioned by @Dandandan for benchmarking yet. I should be able to work on it today. I hope you feel better soon I have been running some benchmark numbers (below) and I have gotten some strange results. I definitely see this PR improves performance on the newly added clickbench extended benchmark (almost 3x faster!) However `clickbench_1` seems to show a slowdown compared to `clickbench_partitioned` which I don't understand. I will try and reproduce this locally Details ``` Benchmark clickbench_1.json ┏━━┳┳━━━┳━━┓ ┃ Query┃ main_base ┃ add_short_circuit ┃ Change ┃ ┡━━╇╇━━━╇━━┩ │ QQuery 0 │ 0.58ms │0.59ms │no change │ │ QQuery 1 │70.93ms │ 72.44ms │no change │ │ QQuery 2 │ 118.33ms │ 120.96ms │no change │ │ QQuery 3 │ 124.52ms │ 130.17ms │no change │ │ QQuery 4 │ 788.26ms │ 835.71ms │ 1.06x slower │ │ QQuery 5 │ 891.18ms │ 1033.71ms │ 1.16x slower │ │ QQuery 6 │68.02ms │ 66.47ms │no change │ │ QQuery 7 │79.51ms │ 80.38ms │no change │ │ QQuery 8 │ 951.66ms │ 962.46ms │no change │ │ QQuery 9 │ 1242.83ms │ 1277.46ms │no change │ │ QQuery 10│ 305.01ms │ 312.25ms │no change │ │ QQuery 11│ 334.19ms │ 341.58ms │no change │ │ QQuery 12│ 941.89ms │ 1079.60ms │ 1.15x slower │ │ QQuery 13│ 1341.24ms │ 1528.55ms │ 1.14x slower │ │ QQuery 14│ 874.85ms │ 1057.59ms │ 1.21x slower │ │ QQuery 15│ 1118.77ms │ 1134.54ms │no change │ │ QQuery 16│ 1803.30ms │ 1920.39ms │ 1.06x slower │ │ QQuery 17│ 1669.97ms │ 1780.42ms │ 1.07x slower │ │ QQuery 18│ 3269.95ms │ 3297.16ms │no change │ │ QQuery 19│ 120.62ms │ 116.02ms │no change │ │ QQuery 20│ 1222.20ms │ 1302.52ms │ 1.07x slower │ │ QQuery 21│ 1471.42ms │ 1638.85ms │ 1.11x slower │ │ QQuery 22│ 2740.23ms │ 4508.20ms │ 1.65x slower │ │ QQuery 23│ 8733.62ms │10619.68ms │ 1.22x slower │ │ QQuery 24│ 515.74ms │ 695.11ms │ 1.35x slower │ │ QQuery 25│ 431.26ms │ 608.92ms │ 1.41x slower │ │ QQuery 26│ 568.41ms │ 743.79ms │ 1.31x slower │ │ QQuery 27│ 1865.82ms │ 1966.63ms │ 1.05x slower │ │ QQuery 28│ 13400.82ms │13320.07ms │no change │ │ QQuery 29│ 574.24ms │ 590.87ms │no change │ │ QQuery 30│ 863.90ms │ 1059.33ms │ 1.23x slower │ │ QQuery 31│ 940.63ms │ 1115.46ms │ 1.19x slower │ │ QQuery 32│ 2790.62ms │ 2933.31ms │ 1.05x slower │ │ QQuery 33│ 3534.40ms │ 3583.06ms │no change │ │ QQuery 34│ 3515.54ms │ 3592.40ms │no change │ │ QQuery 35│ 1327.76ms │ 1384.98ms │no change │ │ QQuery 36│ 257.33ms │ 269.35ms │no change │ │ QQuery 37│ 124.48ms │ 187.43ms │ 1.51x slower │ │ QQuery 38│ 167.20ms │ 182.73ms │ 1.09x slower │ │ QQuery 39│ 454.63ms │ 474.86ms │no change │ │ QQuery 40│82.41ms │ 83.28ms │no change │ │ QQuery 41│82.20ms │ 80.21ms │no change │ │ QQuery 42│78.13ms │ 87.76ms │ 1.12x slower │ └──┴┴───┴──┘ ┏━━┳┓ ┃ Benchmark Summary┃┃ ┡━━╇┩ │ Total Time (main_base) │ 61858.61ms │ │ Total Time (add_short_circuit) │ 68177.23ms │ │ Average Time (main_base) │ 1438.57ms │ │ Average Time (add_short_circuit) │ 1585.52ms │ │ Queries Faster │ 0 │ │ Queries Slower │ 21 │ │ Queries with No Change │ 22 │ └──┴┘ Benchmark clickbench_extended.json ┏━━┳┳━
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2774487207 > I looked for a binary expression and/or micro benchmark and could not find one: https://github.com/apache/datafusion/tree/c929a1cd133590e4944bc2c7900611220450335a/datafusion/physical-expr/benches Thank you for your suggestions and help. I also noticed this part of the code yesterday. I've had a cold and fever in the past few days, so I haven't tried the method mentioned by @Dandandan for benchmarking yet. I should be able to work on it today. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2023116930 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: > Might be overkill, but one _could_ try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop. Thank you for your suggestion, but if we're only applying conditional checks to the first few blocks, then I feel this optimization might not be meaningful. If nearly all blocks can be filtered out by the preceding filter, the optimization will no longer be effective. >If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example) I tend to agree with @alamb's point that if the overhead of verification is somewhat unacceptable, adopting some heuristic approaches would be better. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
ctsk commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2022894121 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: Might be overkill, but one *could* try a sampling approach: Run the loop with the early exit for the first few chunks, and then switch over to the unconditional loop. Almost seems like something the compiler could automagically do... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020615756 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: > I wonder if it helps to short optimize this expression (e.g. match until we get a chunk of the bitmap != 0) I think the overhead added here should be very small (the compiler optimization should work well), and the test results we discussed before were sometimes fast and sometimes slow (maybe noise). Your suggestion of making an early judgment and returning false seems like a good idea, but I'm not sure if it will be effective. The concern I have with this approach is that it requires adding an `if` condition inside the `for` loop, which will most likely disable the compiler's SIMD instruction optimization (I've encountered a similar situation before when writing code, and I had to manually unroll the SIMD...). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020726805 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: > Either way, we can use `bool_and` (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and `bool_or` which operates on `u64` values to test performance changes. Thank you for your suggestion. I will try it later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
Dandandan commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020695065 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: Either way, we can use `bool_and` (https://docs.rs/arrow/latest/arrow/compute/fn.bool_and.html) and `bool_or` which operates on `u64` values to test performance changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
Dandandan commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020695065 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +/// 1. For the `AND` operator, if the `lhs` result all are `false` +/// 2. For the `OR` operator, if the `lhs` result all are `true` +/// 3. Otherwise, it does not meet the short-circuit condition +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); Review Comment: Either way, we can use `max_boolean` (https://docs.rs/arrow/latest/arrow/compute/fn.max_boolean.html) and `min_boolean` which operates on `u64` values to test performance changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2764998480 > Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change? I have successfully split Q6 into: https://github.com/apache/datafusion/pull/15500 Thank you very much for your CR. @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2020360286 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -358,7 +358,50 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) -> Result { use arrow::compute::kernels::numeric::*; +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: I've moved the function outside and added some comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2019944906 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -358,7 +358,50 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) -> Result { use arrow::compute::kernels::numeric::*; +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > Is there any reason to have this function defined in the evaluate method? There was no particular reason. Maybe I couldn't find a suitable place to write it at the time, haha. Where do you think this function should be placed? > If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example) I also agree that -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2019945905 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -358,7 +358,50 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) -> Result { use arrow::compute::kernels::numeric::*; +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { +let data_type = arg.data_type(); +match (data_type, op) { +(DataType::Boolean, Operator::And) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.false_count() == array.len(); +} +} +ColumnarValue::Scalar(scalar) => { +if let ScalarValue::Boolean(Some(value)) = scalar { +return !value; +} +} +} +false +} +(DataType::Boolean, Operator::Or) => { +match arg { +ColumnarValue::Array(array) => { +if let Ok(array) = as_boolean_array(&array) { +return array.true_count() == array.len(); +} +} +ColumnarValue::Scalar(scalar) => { +if let ScalarValue::Boolean(Some(value)) = scalar { +return *value; +} +} +} +false +} +_ => false, +} +} Review Comment: I'll add these tests right away. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2019944906 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -358,7 +358,50 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) -> Result { use arrow::compute::kernels::numeric::*; +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > Is there any reason to have this function defined in the evaluate method? There was no particular reason. Maybe I couldn't find a suitable place to write it at the time, haha. > If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example) I also agree that ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -358,7 +358,50 @@ impl PhysicalExpr for BinaryExpr { fn evaluate(&self, batch: &RecordBatch) -> Result { use arrow::compute::kernels::numeric::*; +fn check_short_circuit(arg: &ColumnarValue, op: &Operator) -> bool { Review Comment: > Is there any reason to have this function defined in the evaluate method? There was no particular reason. Maybe I couldn't find a suitable place to write it at the time, haha. > If we find that this slows down some other performance we could also add some sort of heuristic check to calling false_count / true_count -- like for example if the rhs arg is "complex" (not a Column for example) I also agree that -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]
acking-you commented on PR #15462: URL: https://github.com/apache/datafusion/pull/15462#issuecomment-2764093707 > Also, could you please add the new Q6 benchmark in a separate PR so I can more easily run my benchmark scripts before/after your code change? Okey,I got it.Do you mean that Q6 and its related description in the current branch need to be completed with a separate PR? But this way, it seems that you would still need to cherry-pick Q6 to the corresponding branch when testing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org