Re: [PR] Fix parsing EXECUTE (...) with a more general string expression [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud commented on PR #2295: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2295#issuecomment-4213042044 Looks good @romanb! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] MySQL: Add support for `ORDER BY` on single-table `UPDATE` [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud commented on PR #2296: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2296#issuecomment-4213055349 Looks good @tpyo, let's just see that the checks pass and the branch is rebased.

Re: [PR] Support optional AS keyword in CTE definitions for Databricks [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
funcpp commented on code in PR #2286: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2286#discussion_r3056830207 ## src/parser/mod.rs: ## @@ -14060,7 +14060,7 @@ impl<'a> Parser<'a> { }) } -/// Parse a CTE (`alias [( col1, col2, ... )] AS (subqu

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213466423 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213389643) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
Dandandan commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213469404 run benchmarks

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213471572 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213469404) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

[PR] Enable `arrow-ipc/zstd` in `datasource-arrow` [datafusion]

2026-04-09 Thread via GitHub
AdamGS opened a new pull request, #21504: URL: https://github.com/apache/datafusion/pull/21504 ## Which issue does this PR close? - Closes #21503. ## Rationale for this change The spill manager assumes that all available compressions are actually available, which current

Re: [I] `ProjectionExec` produces unknown statistics for all `ScalarFunctionExpr` outputs [datafusion]

2026-04-09 Thread via GitHub
xudong963 commented on issue #21307: URL: https://github.com/apache/datafusion/issues/21307#issuecomment-4213476776 I haven't had a chance to take a careful look at https://github.com/apache/datafusion/pull/21122, but it looks good. @friendlymatthew, if the ExpressionAnalyzer can mak

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213482629 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213469404) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213485386 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213469404) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213480505 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213388053) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213485753 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213388053) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] Make `test_display_pg_json` pass regardless of build setup and dependencies [datafusion]

2026-04-09 Thread via GitHub
AdamGS commented on PR #21502: URL: https://github.com/apache/datafusion/pull/21502#issuecomment-4214031861 Just realized that there's an SLT test that has the same issue in `explain.slt:642`; it just runs `explain format pgjson select * from values (1);`. I'll add the same fix to the `sq

Re: [PR] feat: Optimise convert_to_state for SUM and BIT_OR_XOR [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21506: URL: https://github.com/apache/datafusion/pull/21506#issuecomment-4214060970 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21506#issuecomment-4214049145) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] fix: Update TPC-DS q36a golden file for Spark 4.0 decimal UNION widening change [datafusion-comet]

2026-04-09 Thread via GitHub
andygrove merged PR #3915: URL: https://github.com/apache/datafusion-comet/pull/3915

Re: [PR] feat: Optimise convert_to_state for SUM and BIT_OR_XOR [datafusion]

2026-04-09 Thread via GitHub
Dandandan commented on PR #21506: URL: https://github.com/apache/datafusion/pull/21506#issuecomment-4214049145 run benchmarks

Re: [PR] feat: Optimise convert_to_state for SUM and BIT_OR_XOR [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21506: URL: https://github.com/apache/datafusion/pull/21506#issuecomment-4214064704 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21506#issuecomment-4214049145) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] feat: Optimise convert_to_state for SUM and BIT_OR_XOR [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21506: URL: https://github.com/apache/datafusion/pull/21506#issuecomment-4214064406 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21506#issuecomment-4214049145) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] feat: Use single spill file for multiple partitions in native shuffle [datafusion-comet]

2026-04-09 Thread via GitHub
andygrove commented on PR #3903: URL: https://github.com/apache/datafusion-comet/pull/3903#issuecomment-4214091602 I will run benchmarks with this PR today

Re: [PR] feat: add cast_to_type UDF for type-based casting [datafusion]

2026-04-09 Thread via GitHub
alamb merged PR #21322: URL: https://github.com/apache/datafusion/pull/21322

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on code in PR #21362: URL: https://github.com/apache/datafusion/pull/21362#discussion_r3058256894 ## datafusion/common/src/functional_dependencies.rs: ## @@ -590,6 +590,46 @@ pub fn get_required_group_by_exprs_indices( .collect() } +/// Returns i

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on code in PR #21362: URL: https://github.com/apache/datafusion/pull/21362#discussion_r3058253384 ## datafusion/optimizer/src/eliminate_duplicated_expr.rs: ## @@ -76,12 +76,34 @@ impl OptimizerRule for EliminateDuplicatedExpr { .map(|wra

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
neilconway commented on PR #21362: URL: https://github.com/apache/datafusion/pull/21362#issuecomment-4214755472 > My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I d

Re: [PR] fix: Update TPC-DS q36a golden file for Spark 4.0 decimal UNION widening change [datafusion-comet]

2026-04-09 Thread via GitHub
parthchandra commented on PR #3915: URL: https://github.com/apache/datafusion-comet/pull/3915#issuecomment-4214762746 Thank you @mbutrovich @andygrove !

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on PR #21362: URL: https://github.com/apache/datafusion/pull/21362#issuecomment-4214741135 > Thanks for iterating on this! > > Can you "resolve" comment threads for review comments you believe have been addressed, please? @neilconway My previous understandi

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on PR #21362: URL: https://github.com/apache/datafusion/pull/21362#issuecomment-4214779237 > > My previous understanding was that the issue would only be marked as "resolved" after I had fixed it and the reviewer had confirmed that everything was in order; that is why I

Re: [I] Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2026-04-09 Thread via GitHub
parthchandra closed issue #325: Implement Spark-compatible CAST from String to Decimal URL: https://github.com/apache/datafusion-comet/issues/325

Re: [I] Implement Spark-compatible CAST from String to Decimal [datafusion-comet]

2026-04-09 Thread via GitHub
parthchandra commented on issue #325: URL: https://github.com/apache/datafusion-comet/issues/325#issuecomment-4214785296 Completed in https://github.com/apache/datafusion-comet/pull/3884

Re: [PR] Add a memory bound FileStatisticsCache for the Listing Table [datafusion]

2026-04-09 Thread via GitHub
mkleen commented on code in PR #20047: URL: https://github.com/apache/datafusion/pull/20047#discussion_r3058315947 ## datafusion/sqllogictest/test_files/encrypted_parquet.slt: ## @@ -85,5 +85,5 @@ float_field float ) STORED AS PARQUET LOCATION 'test_files/scratch/encrypted_par

Re: [PR] Support optional AS keyword in CTE definitions for Databricks [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud commented on PR #2286: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2286#issuecomment-4212876346 @funcpp, please address the conflicts and see if my comment makes sense. Otherwise, looks good.

Re: [PR] Support optional AS keyword in CTE definitions for Databricks [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud commented on code in PR #2286: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2286#discussion_r3056596906 ## src/parser/mod.rs: ## @@ -14060,7 +14060,7 @@ impl<'a> Parser<'a> { }) } -/// Parse a CTE (`alias [( col1, col2, ... )] AS (su

Re: [PR] fix: apply the left side schema on the right side in set expressions [datafusion]

2026-04-09 Thread via GitHub
gruuya commented on code in PR #21052: URL: https://github.com/apache/datafusion/pull/21052#discussion_r3056599401 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1984,6 +2001,17 @@ fn project_with_validation( } } } + +// When inside a set e

Re: [PR] MySQL: Support `SHOW FULL PROCESSLIST` syntax [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud commented on code in PR #2292: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2292#discussion_r3056659178 ## src/parser/mod.rs: ## @@ -15047,6 +15047,10 @@ impl<'a> Parser<'a> { Ok(self.parse_show_views(terse, false)?) } else if self

Re: [PR] Fix parsing EXECUTE (...) with a more general string expression [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud merged PR #2295: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2295

Re: [PR] PostgreSQL `ALTER FUNCTION` / `ALTER AGGREGATE` [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud commented on PR #2248: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2248#issuecomment-4213189498 Looks good @LucaCappelletti94. Please rebase and we should be good to go!

Re: [PR] Optimize `regexp_replace` by stripping trailing .* from anchored patterns. 2.4x improvement (ClickBench Q28) [datafusion]

2026-04-09 Thread via GitHub
Dandandan merged PR #21379: URL: https://github.com/apache/datafusion/pull/21379

Re: [I] Optimize regexp_replace by stripping trailing .* from anchored patterns [datafusion]

2026-04-09 Thread via GitHub
Dandandan closed issue #21382: Optimize regexp_replace by stripping trailing .* from anchored patterns URL: https://github.com/apache/datafusion/issues/21382

Re: [PR] Enable numeric-prefix identifiers for Databricks dialect [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
yoavcloud commented on PR #2290: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2290#issuecomment-4212911169 Looks good @funcpp!

[PR] fix: improve inner join cardinality estimation for FK joins [datafusion]

2026-04-09 Thread via GitHub
Dandandan opened a new pull request, #21500: URL: https://github.com/apache/datafusion/pull/21500 ## Which issue does this PR close? - Closes #. ## Rationale for this change When distinct count statistics are absent (common when join keys involve CAST expressions or when

Re: [I] Update ClickBench benchmarks with DataFusion 53.0.0 (when released) [datafusion]

2026-04-09 Thread via GitHub
AdamGS commented on issue #20602: URL: https://github.com/apache/datafusion/issues/20602#issuecomment-4213512789 Take, I'm going to re-run Datafusion-Vortex next week so might as well run Parquet.

Re: [I] Update ClickBench benchmarks with DataFusion 53.0.0 (when released) [datafusion]

2026-04-09 Thread via GitHub
AdamGS commented on issue #20602: URL: https://github.com/apache/datafusion/issues/20602#issuecomment-4213513397 take

Re: [PR] Support multi-column aliases in SELECT items [datafusion-sqlparser-rs]

2026-04-09 Thread via GitHub
funcpp commented on code in PR #2289: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2289#discussion_r3056872913 ## tests/sqlparser_databricks.rs: ## @@ -644,3 +644,17 @@ fn parse_databricks_json_accessor() { "SELECT raw:store.bicycle.price::DOUBLE FROM st

[I] `test_display_pg_json` currently fails with default features [datafusion]

2026-04-09 Thread via GitHub
AdamGS opened a new issue, #21501: URL: https://github.com/apache/datafusion/issues/21501 ### Describe the bug The default feature set for `datafusion-expr` causes the test to fail; getting it to pass relies on feature unification for `serde_json`. ### To Reproduce `carg

Re: [PR] feat: jupyter notebook support [datafusion-ballista]

2026-04-09 Thread via GitHub
milenkovicm merged PR #1513: URL: https://github.com/apache/datafusion-ballista/pull/1513

Re: [I] Improve Jupyter notebook support with SQL magic commands and examples [datafusion-ballista]

2026-04-09 Thread via GitHub
milenkovicm closed issue #1398: Improve Jupyter notebook support with SQL magic commands and examples URL: https://github.com/apache/datafusion-ballista/issues/1398

Re: [PR] fix: apply the left side schema on the right side in set expressions [datafusion]

2026-04-09 Thread via GitHub
jonahgao merged PR #21052: URL: https://github.com/apache/datafusion/pull/21052

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213403534 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213388053) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213403420 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213389643) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213401628 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213389643) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213398992 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213388053) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] feat: Add pluggable StatisticsRegistry for operator-level statistics propagation [datafusion]

2026-04-09 Thread via GitHub
asolimando commented on code in PR #21483: URL: https://github.com/apache/datafusion/pull/21483#discussion_r3057935779 ## datafusion/physical-optimizer/src/join_selection.rs: ## @@ -53,36 +56,49 @@ impl JoinSelection { } } +/// Get statistics for a plan node, using the r

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
neilconway commented on code in PR #21362: URL: https://github.com/apache/datafusion/pull/21362#discussion_r3057916421 ## datafusion/common/src/functional_dependencies.rs: ## @@ -590,6 +590,46 @@ pub fn get_required_group_by_exprs_indices( .collect() } +/// Returns i

Re: [I] Expose Spark functions [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer commented on issue #1482: URL: https://github.com/apache/datafusion-python/issues/1482#issuecomment-4214360456 After some back and forth with an agent, we came up with this plan. This does not need to be the final plan, but it looks good to me as a starting point. # Plan:

Re: [PR] feat: Optimise convert_to_state for SUM and BIT_OR_XOR [datafusion]

2026-04-09 Thread via GitHub
Mark1626 commented on PR #21506: URL: https://github.com/apache/datafusion/pull/21506#issuecomment-4214373147 Not sure why ClickBench Q19 is degrading; it doesn't have an aggregation: ``` SELECT "UserID" FROM hits WHERE "UserID" = 435090932899640449; ```

Re: [PR] feat: replace custom shuffle block format with Arrow IPC streams [datafusion-comet]

2026-04-09 Thread via GitHub
andygrove commented on PR #3911: URL: https://github.com/apache/datafusion-comet/pull/3911#issuecomment-4214374621 I ran TPC-H @ 1TB and did not see any significant change in performance

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21456: URL: https://github.com/apache/datafusion/pull/21456#issuecomment-4214396580 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21456#issuecomment-4214262367) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] feat: replace custom shuffle block format with Arrow IPC streams [datafusion-comet]

2026-04-09 Thread via GitHub
andygrove commented on PR #3911: URL: https://github.com/apache/datafusion-comet/pull/3911#issuecomment-4214397922 @Kontinuation fyi

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21456: URL: https://github.com/apache/datafusion/pull/21456#issuecomment-4214404522 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21456#issuecomment-4214262367) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] feat: add initial support for `array_exists` with lambda expression support [datafusion-comet]

2026-04-09 Thread via GitHub
andygrove commented on code in PR #3611: URL: https://github.com/apache/datafusion-comet/pull/3611#discussion_r3057984890 ## native/spark-expr/src/array_funcs/array_exists.rs: ## @@ -0,0 +1,546 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat: add initial support for `array_exists` with lambda expression support [datafusion-comet]

2026-04-09 Thread via GitHub
andygrove commented on code in PR #3611: URL: https://github.com/apache/datafusion-comet/pull/3611#discussion_r3057988240 ## spark/src/test/scala/org/apache/comet/CometArrayExpressionSuite.scala: ## @@ -930,6 +930,82 @@ class CometArrayExpressionSuite extends CometTestBase with

Re: [PR] feat: add initial support for `array_exists` with lambda expression support [datafusion-comet]

2026-04-09 Thread via GitHub
andygrove commented on code in PR #3611: URL: https://github.com/apache/datafusion-comet/pull/3611#discussion_r3057990563 ## spark/src/main/scala/org/apache/comet/serde/arrays.scala: ## @@ -622,6 +622,80 @@ object CometArrayFilter extends CometExpressionSerde[ArrayFilter] {

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on code in PR #21362: URL: https://github.com/apache/datafusion/pull/21362#discussion_r3058052640 ## datafusion/common/src/functional_dependencies.rs: ## @@ -590,6 +590,46 @@ pub fn get_required_group_by_exprs_indices( .collect() } +/// Returns i

Re: [I] Missing docstring examples for `to_date`, `to_time`, and `to_local_time` in scalar temporal functions [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer closed issue #1433: Missing docstring examples for `to_date`, `to_time`, and `to_local_time` in scalar temporal functions URL: https://github.com/apache/datafusion-python/issues/1433

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on code in PR #21362: URL: https://github.com/apache/datafusion/pull/21362#discussion_r3058053631 ## datafusion/sqllogictest/test_files/window.slt: ## Review Comment: Done!

Re: [I] [EPIC] Expose all operators and expressions in Python [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer closed issue #191: [EPIC] Expose all operators and expressions in Python URL: https://github.com/apache/datafusion-python/issues/191

Re: [I] [EPIC] Expose all operators and expressions in Python [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer commented on issue #191: URL: https://github.com/apache/datafusion-python/issues/191#issuecomment-4214498405 Closed by the combination of issues listed in https://github.com/apache/datafusion-python/pull/1460

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on code in PR #21362: URL: https://github.com/apache/datafusion/pull/21362#discussion_r3058059373 ## datafusion/optimizer/src/eliminate_duplicated_expr.rs: ## @@ -76,12 +76,50 @@ impl OptimizerRule for EliminateDuplicatedExpr { .map(|wra

Re: [I] EPIC: Add all `SessionContext` and `DataFrame` methods to Python API [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer closed issue #24: EPIC: Add all `SessionContext` and `DataFrame` methods to Python API URL: https://github.com/apache/datafusion-python/issues/24

Re: [I] EPIC: Add all `SessionContext` and `DataFrame` methods to Python API [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer commented on issue #24: URL: https://github.com/apache/datafusion-python/issues/24#issuecomment-4214493256 Closed by #1475 and #1472

Re: [PR] feat: Optimize ORDER BY by Pruning Functionally Redundant Sort Keys [datafusion]

2026-04-09 Thread via GitHub
xiedeyantu commented on code in PR #21362: URL: https://github.com/apache/datafusion/pull/21362#discussion_r3058064144 ## datafusion/optimizer/src/eliminate_duplicated_expr.rs: ## @@ -175,4 +214,40 @@ mod tests { TableScan: test ") } + +#[test] +

Re: [I] Incorrect query results for GROUP BY with UNIQUE constraint [datafusion]

2026-04-09 Thread via GitHub
asolimando commented on issue #21507: URL: https://github.com/apache/datafusion/issues/21507#issuecomment-4214500598 Postgres has the same behavior as DuckDB; you can see an exact reproducer here: https://www.db-fiddle.com/f/33K5ckQbQFNHMC5ro8JmEx/0

Re: [I] fail to sql query if column contains capitalized letter [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer commented on issue #399: URL: https://github.com/apache/datafusion-python/issues/399#issuecomment-4214514761 Closing because this is documented in the online site.

Re: [I] fail to sql query if column contains capitalized letter [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer closed issue #399: fail to sql query if column contains capitalized letter URL: https://github.com/apache/datafusion-python/issues/399

Re: [PR] Add more regexp_replace test coverage [datafusion]

2026-04-09 Thread via GitHub
alamb commented on PR #21485: URL: https://github.com/apache/datafusion/pull/21485#issuecomment-4214545189 Thanks @Dandandan

Re: [PR] fix: Use codepoints in `lpad`, `rpad`, `translate` [datafusion]

2026-04-09 Thread via GitHub
alamb commented on PR #21405: URL: https://github.com/apache/datafusion/pull/21405#issuecomment-4214563440 🚀

Re: [PR] fix: Use codepoints in `lpad`, `rpad`, `translate` [datafusion]

2026-04-09 Thread via GitHub
alamb commented on PR #21405: URL: https://github.com/apache/datafusion/pull/21405#issuecomment-4214565012 I can click the merge button with the best of them!

Re: [PR] feat: add cast_to_type UDF for type-based casting [datafusion]

2026-04-09 Thread via GitHub
alamb commented on PR #21322: URL: https://github.com/apache/datafusion/pull/21322#issuecomment-4214579995 Looks like this PR has a bunch of approvals and no outstanding comments, so merging!

Re: [I] ParserError when "WITHIN GROUP" is specified in SELECT [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer commented on issue #528: URL: https://github.com/apache/datafusion-python/issues/528#issuecomment-4214591298 I'm not sure which version corrected it, but the error no longer exists.

Re: [I] ParserError when "WITHIN GROUP" is specified in SELECT [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer closed issue #528: ParserError when "WITHIN GROUP" is specified in SELECT URL: https://github.com/apache/datafusion-python/issues/528

Re: [I] test_binary_string_functions fails locally [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer closed issue #531: test_binary_string_functions fails locally URL: https://github.com/apache/datafusion-python/issues/531 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [I] test_binary_string_functions fails locally [datafusion-python]

2026-04-09 Thread via GitHub
timsaucer commented on issue #531: URL: https://github.com/apache/datafusion-python/issues/531#issuecomment-4214604875 No longer fails locally. Please feel free to reopen the issue if you continue to have problems. -- This is an automated message from the Apache Git Service. To respond t

Re: [I] `test_display_pg_json` currently fails with default features [datafusion]

2026-04-09 Thread via GitHub
AdamGS commented on issue #21501: URL: https://github.com/apache/datafusion/issues/21501#issuecomment-4213186742 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] fix: improve inner join cardinality estimation for FK joins [datafusion]

2026-04-09 Thread via GitHub
Dandandan commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213222926 run benchmarks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Improve inner join cardinality estimation for FK joins without distinct count statistics [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213434948 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213348828) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [I] `test_spill_compression` fails with default features [datafusion]

2026-04-09 Thread via GitHub
AdamGS commented on issue #21503: URL: https://github.com/apache/datafusion/issues/21503#issuecomment-4213454074 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[I] `test_spill_compression` fails with default features [datafusion]

2026-04-09 Thread via GitHub
AdamGS opened a new issue, #21503: URL: https://github.com/apache/datafusion/issues/21503 ### Describe the bug The `test_spill_compression` test fails with default features because it can't spill with `zstd` compression. It implicitly relies on the `datafusion-datasource-arrow` crate

[PR] feat: Optimise convert_to_state for SUM and BIT_OR_XOR [datafusion]

2026-04-09 Thread via GitHub
Mark1626 opened a new pull request, #21506: URL: https://github.com/apache/datafusion/pull/21506 ## Which issue does this PR close? ## Rationale for this change I have a query where GroupedHashAggregateStream was switching to SkippingAggregation. I noticed `convert_to_state` co

Re: [PR] fix: propagate column statistics through CAST in projections [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213974136 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213800450) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] fix: propagate column statistics through CAST in projections [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21500: URL: https://github.com/apache/datafusion/pull/21500#issuecomment-4213962537 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21500#issuecomment-4213800450) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] Consolidate special case `regexp_match` logic [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21486: URL: https://github.com/apache/datafusion/pull/21486#issuecomment-4213973218 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21486#issuecomment-4213804340) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] feat: add vector distance and array math functions [datafusion]

2026-04-09 Thread via GitHub
alamb commented on PR #21371: URL: https://github.com/apache/datafusion/pull/21371#issuecomment-4214252980 Hi -- thank you for this PR. I think it will be challenging to review a PR of this size. To help review, can you please: 1. file a ticket to track adding these new functions, wi

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
alamb commented on PR #21456: URL: https://github.com/apache/datafusion/pull/21456#issuecomment-4214262367 run benchmarks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Add a memory bound FileStatisticsCache for the Listing Table [datafusion]

2026-04-09 Thread via GitHub
mkleen commented on PR #20047: URL: https://github.com/apache/datafusion/pull/20047#issuecomment-4214197175 @martin-g Could you also do another review round please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] feat: add array_sum, array_product, array_avg functions and list_min alias [datafusion]

2026-04-09 Thread via GitHub
alamb commented on PR #21376: URL: https://github.com/apache/datafusion/pull/21376#issuecomment-4214237771 > Question for the committers (@alamb / @neilconway) — would you prefer these split further? Options I see: Short answer is yes -- as I think it is unlikely I'll have time to

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21456: URL: https://github.com/apache/datafusion/pull/21456#issuecomment-4214283966 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21456#issuecomment-4214262367) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
alamb commented on code in PR #21456: URL: https://github.com/apache/datafusion/pull/21456#discussion_r3057860299 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -83,3 +83,7 @@ harness = false [[bench]] name = "first_last" harness = false + +[[bench]] +name = "count_disti

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21456: URL: https://github.com/apache/datafusion/pull/21456#issuecomment-4214283917 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21456#issuecomment-4214262367) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
alamb commented on code in PR #21456: URL: https://github.com/apache/datafusion/pull/21456#discussion_r3057868955 ## datafusion/functions-aggregate-common/src/aggregate/count_distinct/native.rs: ## @@ -165,3 +165,354 @@ impl Accumulator for FloatDistinctCountAccumulato

Re: [PR] perf : Optimize count distinct [datafusion]

2026-04-09 Thread via GitHub
adriangbot commented on PR #21456: URL: https://github.com/apache/datafusion/pull/21456#issuecomment-4214284037 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21456#issuecomment-4214262367) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] ci: add breaking change detector [datafusion]

2026-04-09 Thread via GitHub
rluvaton commented on PR #21499: URL: https://github.com/apache/datafusion/pull/21499#issuecomment-4214304633 @alamb, @comphead, I would love your review. Once this is merged I will iterate on it, but it works from what I tested -- This is an automated message from the Apache Git Service. To
