Re: [I] feat: allow math functions to parse input from Utf8 [datafusion]

2025-08-16 Thread via GitHub
caicancai commented on issue #9302: URL: https://github.com/apache/datafusion/issues/9302#issuecomment-3194173986 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] Remove redundant `plan` from extension's check_invariants [datafusion]

2025-08-16 Thread via GitHub
findepi merged PR #17199: URL: https://github.com/apache/datafusion/pull/17199

[PR] chore: Add drop table test on create_drop.rs [datafusion]

2025-08-16 Thread via GitHub
caicancai opened a new pull request, #17219: URL: https://github.com/apache/datafusion/pull/17219 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

[I] Support display newline inside error message [datafusion]

2025-08-16 Thread via GitHub
2010YOUY01 opened a new issue, #17218: URL: https://github.com/apache/datafusion/issues/17218 ### Is your feature request related to a problem or challenge? If a newline `\n` is included in the error message, and the error is created with the macro like `exec_datafusion_err!(...)`, th

[PR] Minor: improve error message when file creation failed [datafusion]

2025-08-16 Thread via GitHub
2010YOUY01 opened a new pull request, #17217: URL: https://github.com/apache/datafusion/pull/17217 ## Which issue does this PR close? - Closes #17194 ## Rationale for this change When running the tests with `cargo test --test fuzz`, some spilling execution t

Re: [PR] Enable dynamic filter pushdown for LEFT/RIGHT/SEMI/ANTI/Mark joins; surface probe metadata in plans; add join-preservation docs [datafusion]

2025-08-16 Thread via GitHub
kosiew commented on PR #17090: URL: https://github.com/apache/datafusion/pull/17090#issuecomment-3194107741 Thanks @adriangb, @alamb for your review. I am putting this into draft and will split it into smaller PRs.

Re: [PR] Implement `partition_statistics` API for `RepartitionExec` [datafusion]

2025-08-16 Thread via GitHub
xudong963 commented on code in PR #17061: URL: https://github.com/apache/datafusion/pull/17061#discussion_r2280694564 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -755,10 +756,42 @@ impl ExecutionPlan for RepartitionExec { } fn partition_statistics(&self

Re: [I] [substrait] [sqllogictest] Null cast is not valid [datafusion]

2025-08-16 Thread via GitHub
jkosh44 commented on issue #16272: URL: https://github.com/apache/datafusion/issues/16272#issuecomment-3194020152 The failure happens when converting the DF logical plan into substrait. The logical plan of the above query looks like ``` Projection(Projection { expr: [

Re: [I] Metric `files_pruned_statistics` is incorrect / wrongly named [datafusion]

2025-08-16 Thread via GitHub
adriangb closed issue #16586: Metric `files_pruned_statistics` is incorrect / wrongly named URL: https://github.com/apache/datafusion/issues/16586

Re: [I] Metric `files_pruned_statistics` is incorrect / wrongly named [datafusion]

2025-08-16 Thread via GitHub
adriangb commented on issue #16586: URL: https://github.com/apache/datafusion/issues/16586#issuecomment-3194007748 We've since renamed it and given it extensive documentation: https://github.com/apache/datafusion/blob/8f15991f33bf6aca9d4da8958141b59d196b2ed6/datafusion/datasource-parq

Re: [PR] feat: implement_ansi_eval_mode_arithmetic [datafusion-comet]

2025-08-16 Thread via GitHub
coderfender commented on PR #2136: URL: https://github.com/apache/datafusion-comet/pull/2136#issuecomment-3193999424 @kazuyukitanimura, it seems the issue is with `coalesce`, which in Spark ignores the divide-by-zero exception, while it can't perform the same operation with Comet

Re: [I] [substrait] [sqllogictest] Null cast is not valid [datafusion]

2025-08-16 Thread via GitHub
jkosh44 commented on issue #16272: URL: https://github.com/apache/datafusion/issues/16272#issuecomment-3193914378 Here's a more minimal .slt repro ``` statement ok CREATE TABLE null_cast (a INT); statement ok set datafusion.sql_parser.dialect = 'Postgres'; query

Re: [I] [substrait] [sqllogictest] Null cast is not valid [datafusion]

2025-08-16 Thread via GitHub
jkosh44 commented on issue #16272: URL: https://github.com/apache/datafusion/issues/16272#issuecomment-3193912323 take

[PR] chore: replace Schema with SchemaRef in PruningExpressionBuilder [datafusion]

2025-08-16 Thread via GitHub
etolbakov opened a new pull request, #17216: URL: https://github.com/apache/datafusion/pull/17216 Signed-off-by: Eugene Tolbakov ## Which issue does this PR close? - Closes # https://github.com/apache/datafusion/issues/17198 ## Rationale for this change p

Re: [PR] chore: Improve Arrow FFI documentation [datafusion-comet]

2025-08-16 Thread via GitHub
rluvaton commented on PR #2163: URL: https://github.com/apache/datafusion-comet/pull/2163#issuecomment-3193882896 Maybe we should add an assertion in the code to make sure the data is no longer in use before it is freed and reused.

[PR] build(deps): bump async-trait from 0.1.88 to 0.1.89 [datafusion-python]

2025-08-16 Thread via GitHub
dependabot[bot] opened a new pull request, #1203: URL: https://github.com/apache/datafusion-python/pull/1203 Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.88 to 0.1.89. Release notes: sourced from async-trait's releases (https://github.com/dtolnay/async-trait/releases).

[PR] build(deps): bump uuid from 1.17.0 to 1.18.0 [datafusion-python]

2025-08-16 Thread via GitHub
dependabot[bot] opened a new pull request, #1202: URL: https://github.com/apache/datafusion-python/pull/1202 Bumps [uuid](https://github.com/uuid-rs/uuid) from 1.17.0 to 1.18.0. Release notes: sourced from uuid's releases (https://github.com/uuid-rs/uuid/releases). v1.18.0

Re: [PR] Only update TopK dynamic filters if the new ones are more selective [datafusion]

2025-08-16 Thread via GitHub
adriangb commented on PR #16433: URL: https://github.com/apache/datafusion/pull/16433#issuecomment-3193850159 Okay I've re-run after 310100b and am now seeing only improvements across the board. @Dandandan I've requested another review from you since I think this may be in a good place now.

Re: [PR] chore: Improve Arrow FFI documentation [datafusion-comet]

2025-08-16 Thread via GitHub
andygrove commented on PR #2163: URL: https://github.com/apache/datafusion-comet/pull/2163#issuecomment-3193849800 @rluvaton fyi, since we discussed this some time ago

Re: [PR] feat: implement_ansi_eval_mode_arithmetic [datafusion-comet]

2025-08-16 Thread via GitHub
coderfender commented on PR #2136: URL: https://github.com/apache/datafusion-comet/pull/2136#issuecomment-3193840124 After some debugging, early exception raising (from the DataFusion custom ANSI kernel) seems to be the issue. Below is the exact test which is causing a failure in Spark 4.0

Re: [PR] fix(ci): update `datafusion-physical-expr-adapter` version to 49.0.1 in Cargo.lock [datafusion]

2025-08-16 Thread via GitHub
adriangb commented on PR #17209: URL: https://github.com/apache/datafusion/pull/17209#issuecomment-3193819083 Thank you! Sorry I did not notice the version was out of date before merging.

Re: [PR] Fix dynamic filter pushdown in HashJoinExec [datafusion]

2025-08-16 Thread via GitHub
adriangb commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2280525475 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -286,5 +286,37 @@ explain select a from t where CAST(a AS string) = '0123'; physical_plan DataS

Re: [PR] Fix dynamic filter pushdown in HashJoinExec [datafusion]

2025-08-16 Thread via GitHub
adriangb commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2280521294 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -286,5 +286,37 @@ explain select a from t where CAST(a AS string) = '0123'; physical_plan DataS

Re: [PR] Fix dynamic filter pushdown in HashJoinExec [datafusion]

2025-08-16 Thread via GitHub
adriangb commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2280520051 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -286,5 +286,37 @@ explain select a from t where CAST(a AS string) = '0123'; physical_plan DataS

Re: [I] Add a way to get what takes memory [datafusion]

2025-08-16 Thread via GitHub
rluvaton commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3193805380 This PR was more what I meant: - https://github.com/apache/datafusion/pull/16926

Re: [PR] Fix HashJoinExec sideways information passing for partitioned queries [datafusion]

2025-08-16 Thread via GitHub
adriangb commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3193804053 @nuno-faria thank you for your patience. I can reproduce now, I must have been on the wrong commit or something. My laptop started having issues a couple days ago so I've been reset

Re: [I] Add a way to get what takes memory [datafusion]

2025-08-16 Thread via GitHub
rluvaton commented on issue #16904: URL: https://github.com/apache/datafusion/issues/16904#issuecomment-3193801681 > Specifically I think [@rluvaton](https://github.com/rluvaton) is requesting something that is already possible with TrackConsumersPool -- and if that is the case, then we cou

Re: [PR] trivial: remove unnecessary clone() [datafusion-comet]

2025-08-16 Thread via GitHub
isimluk commented on PR #2066: URL: https://github.com/apache/datafusion-comet/pull/2066#issuecomment-3193800870 > @isimluk could you rebase this on main? done

Re: [PR] chore: Improve documentation for `CometExecIterator` [datafusion-comet]

2025-08-16 Thread via GitHub
andygrove commented on code in PR #2169: URL: https://github.com/apache/datafusion-comet/pull/2169#discussion_r2280511357 ## spark/src/main/scala/org/apache/comet/CometExecIterator.scala: ## @@ -240,29 +287,37 @@ class CometExecIterator( traceMemoryUsage() } -

[PR] chore: Improve documentation for `CometExecIterator` [datafusion-comet]

2025-08-16 Thread via GitHub
andygrove opened a new pull request, #2169: URL: https://github.com/apache/datafusion-comet/pull/2169 ## Which issue does this PR close? N/A ## Rationale for this change ## What changes are included in this PR? - Add scaladoc documentati

Re: [PR] chore: Improve Arrow FFI documentation [datafusion-comet]

2025-08-16 Thread via GitHub
comphead commented on code in PR #2163: URL: https://github.com/apache/datafusion-comet/pull/2163#discussion_r2280502690 ## native/core/src/execution/utils.rs: ## @@ -81,7 +81,7 @@ impl SparkArrowConvert for ArrayData { Ok(ffi_array) } -/// Move this ArrowDat

[PR] minor: clean up distinct window code [datafusion]

2025-08-16 Thread via GitHub
zhuqi-lucas opened a new pull request, #17215: URL: https://github.com/apache/datafusion/pull/17215 ## Which issue does this PR close? Minor: code clean up Address comments from here: https://github.com/apache/datafusion/pull/16925#discussion_r2240964937 ## Rationale f

Re: [PR] docs: Add comprehensive JavaDoc/ScalaDoc to CometBatchIterator and CometExecIterator [datafusion-comet]

2025-08-16 Thread via GitHub
andygrove closed pull request #2165: docs: Add comprehensive JavaDoc/ScalaDoc to CometBatchIterator and CometExecIterator URL: https://github.com/apache/datafusion-comet/pull/2165

Re: [PR] Doc: Update upgrade guide for the rewritten NLJ operator [datafusion]

2025-08-16 Thread via GitHub
xudong963 merged PR #17202: URL: https://github.com/apache/datafusion/pull/17202

Re: [PR] fix(ci): update `datafusion-physical-expr-adapter` version to 49.0.1 in Cargo.lock [datafusion]

2025-08-16 Thread via GitHub
alamb commented on PR #17209: URL: https://github.com/apache/datafusion/pull/17209#issuecomment-3193672555 Thank you @miroim

[PR] minor: Fix potential issue in CometBatchIterator [datafusion-comet]

2025-08-16 Thread via GitHub
andygrove opened a new pull request, #2168: URL: https://github.com/apache/datafusion-comet/pull/2168 ## Which issue does this PR close? N/A ## Rationale for this change The current code makes me nervous that it is possible for a batch to be garbage-colle

Re: [I] [EPIC] Improve the performance of ListingTable [datafusion]

2025-08-16 Thread via GitHub
alamb commented on issue #9964: URL: https://github.com/apache/datafusion/issues/9964#issuecomment-3193670421 We have made some progress on this recently so I am going to close down this old epic. I have filed a new planning epic for future improvements - https://github.com/apache

Re: [I] [EPIC] Improve the performance of ListingTable [datafusion]

2025-08-16 Thread via GitHub
alamb closed issue #9964: [EPIC] Improve the performance of ListingTable URL: https://github.com/apache/datafusion/issues/9964

Re: [PR] Fix: ListingTableFactory hive column detection [datafusion]

2025-08-16 Thread via GitHub
alamb commented on PR #17050: URL: https://github.com/apache/datafusion/pull/17050#issuecomment-3193669061 I also started collecting listing table improvements together here to make them more discoverable - https://github.com/apache/datafusion/issues/17214

Re: [PR] Fix: ListingTableFactory hive column detection [datafusion]

2025-08-16 Thread via GitHub
alamb commented on PR #17050: URL: https://github.com/apache/datafusion/pull/17050#issuecomment-3193668747 > @alamb Since #17049 and #17212 are now separate issues, would you like me to close this PR and split the fixes into new PRs so the PRs are more directly aligned with the issues?

[I] [EPIC] ListingTable improvements [datafusion]

2025-08-16 Thread via GitHub
alamb opened a new issue, #17214: URL: https://github.com/apache/datafusion/issues/17214 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] feat: support `Utf8View` for more args of `regexp_replace` [datafusion]

2025-08-16 Thread via GitHub
alamb commented on code in PR #17195: URL: https://github.com/apache/datafusion/pull/17195#discussion_r2280428090 ## datafusion/functions/src/regex/regexpreplace.rs: ## @@ -94,14 +98,30 @@ impl Default for RegexpReplaceFunc { impl RegexpReplaceFunc { pub fn new() -> Self

Re: [PR] Testing: Try test optimize performance for coalesce [datafusion]

2025-08-16 Thread via GitHub
zhuqi-lucas commented on PR #17193: URL: https://github.com/apache/datafusion/pull/17193#issuecomment-3193655089 > Very cool! Added a comment on the upstream PR, I think it makes sense to see if we can avoid the (small) regressions. Thank you @Dandandan for review!

[PR] fix: potential native broadcast failure in scenarios with ReusedExchange [datafusion-comet]

2025-08-16 Thread via GitHub
akupchinskiy opened a new pull request, #2167: URL: https://github.com/apache/datafusion-comet/pull/2167 ## Which issue does this PR close? Closes #. ## Rationale for this change Current CometBroadcast implementation might cause some complex plans to fail during the

Re: [PR] Fix dynamic filter pushdown in HashJoinExec [datafusion]

2025-08-16 Thread via GitHub
xudong963 commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2280383934 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -804,8 +805,8 @@ impl ExecutionPlan for HashJoinExec { self.mode, self.null_e

Re: [PR] Fix HashJoinExec sideways information passing for partitioned queries [datafusion]

2025-08-16 Thread via GitHub
nuno-faria commented on PR #17197: URL: https://github.com/apache/datafusion/pull/17197#issuecomment-3193592981 @adriangb I tested again and still see the issues. I'm testing with `datafusion-cli` using the debug mode, in 481f7f9382dd83d7d5416cdfb1bf52d26d7cec40. ```shell ❯ git lo

Re: [PR] Fix dynamic filter pushdown in HashJoinExec [datafusion]

2025-08-16 Thread via GitHub
nuno-faria commented on code in PR #17201: URL: https://github.com/apache/datafusion/pull/17201#discussion_r2280354181 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -286,5 +286,37 @@ explain select a from t where CAST(a AS string) = '0123'; physical_plan Dat

Re: [I] [datafusion-cli`] Add a way to see what object store requests are made [datafusion]

2025-08-16 Thread via GitHub
geoffreyclaude commented on issue #17207: URL: https://github.com/apache/datafusion/issues/17207#issuecomment-3193576589 > I think that is the exact usecase of the https://github.com/datafusion-contrib/datafusion-tracing project which might be worth a look It makes a lot of sense for

Re: [I] Fast parquet order inversion [datafusion]

2025-08-16 Thread via GitHub
zhuqi-lucas commented on issue #17172: URL: https://github.com/apache/datafusion/issues/17172#issuecomment-3193464795 Thank you @crepererum @alamb, I have a couple of clarification questions: 1. Is the idea here essentially to push down the TopK (ORDER BY … LIMIT k) into the ParquetEx

Re: [PR] Improve GitHub actions/python workflows [datafusion-ballista]

2025-08-16 Thread via GitHub
milenkovicm commented on PR #1289: URL: https://github.com/apache/datafusion-ballista/pull/1289#issuecomment-3193541754 > I keep the Python portion only. The later Rust update seems to improve compile time, so it would be nice to have that as well. Which part helps compile time rust edi

Re: [PR] Testing: Try test optimize performance for coalesce [datafusion]

2025-08-16 Thread via GitHub
Dandandan commented on PR #17193: URL: https://github.com/apache/datafusion/pull/17193#issuecomment-3193480856 Very cool! Added a comment on the upstream PR, I think it makes sense to see if we can avoid the (small) regressions.

Re: [PR] Unnest Correlated Subquery [datafusion]

2025-08-16 Thread via GitHub
irenjj commented on PR #17110: URL: https://github.com/apache/datafusion/pull/17110#issuecomment-3193472284 > Could you share how to merge this safely into the main branch? Should we maintain a feature flag and let the two subquery decorrelation frameworks exist at the same time (current scatte

Re: [PR] Testing: Try test optimize performance for coalesce [datafusion]

2025-08-16 Thread via GitHub
zhuqi-lucas commented on PR #17193: URL: https://github.com/apache/datafusion/pull/17193#issuecomment-3193457194 Polished the upstream PR to support it now: https://github.com/apache/arrow-rs/pull/8146

Re: [PR] Testing: Try test optimize performance for coalesce [datafusion]

2025-08-16 Thread via GitHub
zhuqi-lucas commented on code in PR #17193: URL: https://github.com/apache/datafusion/pull/17193#discussion_r2280283675 ## datafusion/physical-plan/src/coalesce/mod.rs: ## @@ -15,290 +15,158 @@ // specific language governing permissions and limitations // under the License.