Re: [PR] chore: `native_datafusion` to report scan task input metrics [datafusion-comet]

2026-03-31 Thread via GitHub
martin-g commented on code in PR #3842: URL: https://github.com/apache/datafusion-comet/pull/3842#discussion_r3013790462 ## spark/src/main/scala/org/apache/spark/sql/comet/CometExecRDD.scala: ## @@ -139,6 +139,13 @@ private[spark] class CometExecRDD( ctx.addTaskCompletion

[PR] feat: feature-gate datafusion-substrait behind optional 'substrait' feature [datafusion]

2026-03-31 Thread via GitHub
zhuqi-lucas opened a new pull request, #21268: URL: https://github.com/apache/datafusion/pull/21268 ## Which issue does this PR close? N/A (build improvement) ## Rationale for this change `datafusion-substrait` is a heavyweight dependency that most developers and users d

[I] Feature-gate datafusion-substrait behind optional feature to reduce compile time [datafusion]

2026-03-31 Thread via GitHub
zhuqi-lucas opened a new issue, #21269: URL: https://github.com/apache/datafusion/issues/21269 **Is your feature request related to a problem or challenge?** `datafusion-substrait` is always compiled as a dependency of `datafusion-sqllogictest`, even though most developers and CI jobs
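
As a rough illustration of what feature-gating looks like in Rust, the sketch below compiles one code path only when a `substrait` Cargo feature is enabled and a fallback otherwise. Only the feature name comes from the PR title; the function names are hypothetical and are not taken from the DataFusion source.

```rust
// Hypothetical sketch: the names below are illustrative, not DataFusion APIs.

/// Compiled only when the crate is built with `--features substrait`.
#[cfg(feature = "substrait")]
pub fn substrait_round_trip_status() -> &'static str {
    "substrait tests enabled"
}

/// Fallback compiled when the feature is off, so callers still build.
#[cfg(not(feature = "substrait"))]
pub fn substrait_round_trip_status() -> &'static str {
    "substrait tests skipped"
}

/// Reports whether the feature was selected at compile time.
pub fn substrait_enabled() -> bool {
    cfg!(feature = "substrait")
}
```

If the feature is declared as optional in `Cargo.toml`, a default build takes the fallback path and the gated dependency does not need to be compiled.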

Re: [PR] feat: feature-gate datafusion-substrait behind optional 'substrait' feature [datafusion]

2026-03-31 Thread via GitHub
zhuqi-lucas commented on PR #21268: URL: https://github.com/apache/datafusion/pull/21268#issuecomment-4160473831 CI failure in `verify benchmark results` is a pre-existing issue on main — `explain.slt` JSON format mismatch, unrelated to this PR (we only changed substrait feature gating).

Re: [PR] feat: add support for parquet content defined chunking options [datafusion]

2026-03-31 Thread via GitHub
kszucs commented on code in PR #21110: URL: https://github.com/apache/datafusion/pull/21110#discussion_r3014082004 ## docs/source/user-guide/configs.md: ## @@ -112,6 +112,7 @@ The following configuration settings are available: | datafusion.execution.parquet.allow_single_file_p

Re: [PR] feat: feature-gate datafusion-substrait behind optional 'substrait' feature [datafusion]

2026-03-31 Thread via GitHub
Copilot commented on code in PR #21268: URL: https://github.com/apache/datafusion/pull/21268#discussion_r3013970567 ## .github/workflows/rust.yml: ## @@ -537,7 +537,7 @@ jobs: # command cannot be run for all the .slt files. Run it for just one that works (limit.slt)

Re: [PR] Allow Spark partial / Comet final for compatible aggregates [datafusion-comet]

2026-03-31 Thread via GitHub
Shekharrajak commented on PR #2994: URL: https://github.com/apache/datafusion-comet/pull/2994#issuecomment-4160573420 The tests below assert on Spark-internal AQE optimization behavior (empty relation propagation, partition coalescing) that legitimately doesn't work when Comet's native operato

Re: [PR] feat: support GroupsAccumulator for first_value and last_value with string/binary types [datafusion]

2026-03-31 Thread via GitHub
UBarney commented on PR #21090: URL: https://github.com/apache/datafusion/pull/21090#issuecomment-4160617969 ``` > │ QQuery 5 │ 9810.71 / 10101.94 ±187.72 / 10392.09 ms │ 9705.66 / 10087.60 ±202.53 / 10258.60 ms │ no change │ > │ QQuery 6 │ 986.14 / 1000.78 ±18.89 / 1037.76 ms │

Re: [I] CaseWhen does not work with custom implemented column expression [datafusion]

2026-03-31 Thread via GitHub
rluvaton commented on issue #21231: URL: https://github.com/apache/datafusion/issues/21231#issuecomment-4160717417 you only need to pass through the same naming scope -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[I] [DISCUSSION] Maybe support planning scalar function in `ExprPlanner` as well? [datafusion]

2026-03-31 Thread via GitHub
MichaelScofield opened a new issue, #21270: URL: https://github.com/apache/datafusion/issues/21270 I see there are `plan_aggregate` and `plan_window` in `ExprPlanner`, for planning aggregate functions and window functions respectively. However, there's no similar method for scalar functions. Any blocker

[PR] fix: disable atan2 instead of tan [datafusion-comet]

2026-03-31 Thread via GitHub
kazuyukitanimura opened a new pull request, #3849: URL: https://github.com/apache/datafusion-comet/pull/3849 ## Which issue does this PR close? Related: #1897 ## Rationale for this change #1897 claims `tan` is incompatible; however, what is really incompatible is `atan2`

Re: [I] tan(-0.0) produces incorrect result [datafusion-comet]

2026-03-31 Thread via GitHub
kazuyukitanimura commented on issue #1897: URL: https://github.com/apache/datafusion-comet/issues/1897#issuecomment-4160813340 It looks like what is actually wrong is `atan2` (#3849). I plan to update the description of this issue later.

[PR] chore(deps): bump astral-sh/setup-uv from 7.6.0 to 8.0.0 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21272: URL: https://github.com/apache/datafusion/pull/21272 Bumps [astral-sh/setup-uv](https://github.com/astral-sh/setup-uv) from 7.6.0 to 8.0.0. Release notes Sourced from https://github.com/astral-sh/setup-uv/releases astral-sh/setup

Re: [PR] Refactor: expose predicate constant inference from physical-expr [datafusion]

2026-03-31 Thread via GitHub
xudong963 merged PR #21167: URL: https://github.com/apache/datafusion/pull/21167

[PR] chore(deps): bump object_store from 0.13.1 to 0.13.2 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21275: URL: https://github.com/apache/datafusion/pull/21275 Bumps [object_store](https://github.com/apache/arrow-rs-object-store) from 0.13.1 to 0.13.2. Changelog Sourced from https://github.com/apache/arrow-rs-object-store/blob/main/CHAN

[PR] chore(deps): bump rustyline from 17.0.2 to 18.0.0 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21276: URL: https://github.com/apache/datafusion/pull/21276 Bumps [rustyline](https://github.com/kkawakam/rustyline) from 17.0.2 to 18.0.0. Release notes Sourced from https://github.com/kkawakam/rustyline/releases rustyline's releases.

[PR] chore(deps): bump getrandom from 0.3.4 to 0.4.2 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21278: URL: https://github.com/apache/datafusion/pull/21278 Bumps [getrandom](https://github.com/rust-random/getrandom) from 0.3.4 to 0.4.2. Changelog Sourced from https://github.com/rust-random/getrandom/blob/master/CHANGELOG.md getran

[PR] chore(deps): bump the all-other-cargo-deps group with 7 updates [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21274: URL: https://github.com/apache/datafusion/pull/21274 Bumps the all-other-cargo-deps group with 7 updates: | Package | From | To | | --- | --- | --- | | [insta](https://github.com/mitsuhiko/insta) | `1.46.3` | `1.47.2` | | [uui

[PR] chore(deps): bump md-5 from 0.10.6 to 0.11.0 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21279: URL: https://github.com/apache/datafusion/pull/21279 Bumps [md-5](https://github.com/RustCrypto/hashes) from 0.10.6 to 0.11.0. Commits https://github.com/RustCrypto/hashes/commit/b5051e5a5e7dc86a6c27c1ec7a390744ebcfb97a b5051e

[PR] chore(deps): bump sha2 from 0.10.9 to 0.11.0 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21281: URL: https://github.com/apache/datafusion/pull/21281 Bumps [sha2](https://github.com/RustCrypto/hashes) from 0.10.9 to 0.11.0. Commits https://github.com/RustCrypto/hashes/commit/ffe093984c004769747e998f77da8ff7c0e7a765 ffe093

[PR] chore(deps): bump ctor from 0.6.3 to 0.8.0 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21282: URL: https://github.com/apache/datafusion/pull/21282 Bumps [ctor](https://github.com/mmastrac/rust-ctor) from 0.6.3 to 0.8.0. Commits See full diff in compare view (https://github.com/mmastrac/rust-ctor/commits)

Re: [I] Wrapped unary negation under IS NULL can escape analyzer type validation [datafusion]

2026-03-31 Thread via GitHub
myandpr closed issue #20988: Wrapped unary negation under IS NULL can escape analyzer type validation URL: https://github.com/apache/datafusion/issues/20988

[PR] chore(deps): bump taiki-e/install-action from 2.69.7 to 2.70.3 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21271: URL: https://github.com/apache/datafusion/pull/21271 Bumps [taiki-e/install-action](https://github.com/taiki-e/install-action) from 2.69.7 to 2.70.3. Release notes Sourced from https://github.com/taiki-e/install-action/releases t

Re: [PR] CI: Add CodeQL workflow for GitHub Actions security scanning [datafusion-python]

2026-03-31 Thread via GitHub
kosiew commented on PR #1408: URL: https://github.com/apache/datafusion-python/pull/1408#issuecomment-4161017358 Merging as this is a straightforward improvement PR

Re: [PR] CI: Add CodeQL workflow for GitHub Actions security scanning [datafusion-python]

2026-03-31 Thread via GitHub
kosiew merged PR #1408: URL: https://github.com/apache/datafusion-python/pull/1408

[PR] chore(deps): bump snmalloc-rs from 0.3.8 to 0.7.4 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21280: URL: https://github.com/apache/datafusion/pull/21280 Bumps [snmalloc-rs](https://github.com/microsoft/snmalloc) from 0.3.8 to 0.7.4. Release notes Sourced from https://github.com/microsoft/snmalloc/releases snmalloc-rs's release

[PR] chore(deps): bump github/codeql-action from 4.34.1 to 4.35.1 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21273: URL: https://github.com/apache/datafusion/pull/21273 Bumps [github/codeql-action](https://github.com/github/codeql-action) from 4.34.1 to 4.35.1. Release notes Sourced from https://github.com/github/codeql-action/releases github/

[PR] chore(deps): bump sha1 from 0.10.6 to 0.11.0 [datafusion]

2026-03-31 Thread via GitHub
dependabot[bot] opened a new pull request, #21277: URL: https://github.com/apache/datafusion/pull/21277 Bumps [sha1](https://github.com/RustCrypto/hashes) from 0.10.6 to 0.11.0. Commits https://github.com/RustCrypto/hashes/commit/2f00175af936de46b3ddefe65c4de93cb4e876e4 2f0017

Re: [PR] Add end-to-end Parquet tests for List and LargeList struct schema evolution [datafusion]

2026-03-31 Thread via GitHub
kosiew commented on code in PR #20840: URL: https://github.com/apache/datafusion/pull/20840#discussion_r3014540056 ## datafusion/sqllogictest/test_files/schema_evolution_nested.slt: ## @@ -0,0 +1,124 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more con

Re: [PR] Add end-to-end Parquet tests for List and LargeList struct schema evolution [datafusion]

2026-03-31 Thread via GitHub
kosiew merged PR #20840: URL: https://github.com/apache/datafusion/pull/20840

[I] CometCastSuite: negative precision cast test compares `Option[Throwable]` to a `String` [datafusion-comet]

2026-03-31 Thread via GitHub
manuzhang opened a new issue, #3850: URL: https://github.com/apache/datafusion-comet/issues/3850 ### Describe the bug The test is broken because `checkSparkAnswerMaybeThrows` returns `Option[Throwable]`, so `expected.contains("PARSE_SYNTAX_ERROR")` compares a `Throwable` option to a

[PR] fix: correct invalid Option.contains assertion in cast test [datafusion-comet]

2026-03-31 Thread via GitHub
manuzhang opened a new pull request, #3851: URL: https://github.com/apache/datafusion-comet/pull/3851 ## Which issue does this PR close? Closes #3850. ## Rationale for this change The test is broken because `checkSparkAnswerMaybeThrows` returns `Option[Throwa

Re: [PR] feat: support GroupsAccumulator for first_value and last_value with string/binary types [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21090: URL: https://github.com/apache/datafusion/pull/21090#issuecomment-4160174867 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21090#issuecomment-4160090093) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

[PR] chore: fix upgrade guide link for object_store release notes [datafusion]

2026-03-31 Thread via GitHub
haohuaijin opened a new pull request, #21283: URL: https://github.com/apache/datafusion/pull/21283 Fix the object_store release notes link in the DataFusion 53 upgrade guide.

Re: [PR] Spark quarter function implementation [datafusion]

2026-03-31 Thread via GitHub
kosiew commented on code in PR #20808: URL: https://github.com/apache/datafusion/pull/20808#discussion_r3014862064 ## datafusion/spark/src/function/datetime/quarter.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

[PR] fix: include scan impl in PR Linux artifact names [datafusion-comet]

2026-03-31 Thread via GitHub
manuzhang opened a new pull request, #3853: URL: https://github.com/apache/datafusion-comet/pull/3853 ## Which issue does this PR close? Closes #3852. ## Rationale for this change ## What changes are included in this PR? ## How are these cha

[I] CI: Artifact name conflict (409) in PR Build (Linux) when two matrix profiles share the same name [datafusion-comet]

2026-03-31 Thread via GitHub
manuzhang opened a new issue, #3852: URL: https://github.com/apache/datafusion-comet/issues/3852 ### Describe the bug In `.github/workflows/pr_build_linux.yml`, the `linux-test` job constructs artifact names as: ```yaml artifact_name: ${{ matrix.profile.name }}-${{ matrix.su

Re: [PR] Skip probe-side consumption when hash join build side is empty [datafusion]

2026-03-31 Thread via GitHub
LiaCastaneda commented on PR #21068: URL: https://github.com/apache/datafusion/pull/21068#issuecomment-4161629388 I was not aware there was a PR for this already -- I will try to review this week 👀

Re: [PR] Add `FileStreamBuilder` for creating FileStreams [datafusion]

2026-03-31 Thread via GitHub
alamb commented on code in PR #21261: URL: https://github.com/apache/datafusion/pull/21261#discussion_r3011842221 ## datafusion/datasource/src/file_stream/mod.rs: ## @@ -539,10 +535,13 @@ mod tests { .with_limit(self.limit) .build(); let me

[PR] Add FixedSizeList struct support for nested schema evolution (planner/runtime parity) [datafusion]

2026-03-31 Thread via GitHub
kosiew opened a new pull request, #21284: URL: https://github.com/apache/datafusion/pull/21284 ## Which issue does this PR close? * Part of #20835 --- ## Rationale for this change DataFusion already supports recursive schema evolution for several nested container

Re: [PR] perf: improve json read [datafusion]

2026-03-31 Thread via GitHub
ariel-miculas commented on PR #20823: URL: https://github.com/apache/datafusion/pull/20823#issuecomment-4162022807 @alamb can this be merged or is there something more you'd like me to do?

Re: [PR] fix(sql): fix a bug when planning semi- or antijoins [datafusion]

2026-03-31 Thread via GitHub
aalexandrov commented on code in PR #20990: URL: https://github.com/apache/datafusion/pull/20990#discussion_r3015343761 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4847,6 +4847,40 @@ fn test_using_join_wildcard_schema() { ); } +#[test] +fn test_using_join_wildcard

[PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
alamb opened a new pull request, #21285: URL: https://github.com/apache/datafusion/pull/21285 ## Which issue does this PR close? - part of https://github.com/apache/datafusion/issues/20529 - Broken out of https://github.com/apache/datafusion/pull/20820 ## Rationale for this c

Re: [PR] Refactor parquet datasource into an explicit state machine [datafusion]

2026-03-31 Thread via GitHub
alamb commented on code in PR #21190: URL: https://github.com/apache/datafusion/pull/21190#discussion_r3015363551 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -125,15 +133,338 @@ pub(super) struct ParquetOpener { pub reverse_row_groups: bool, } +/// States for [

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
Dandandan commented on code in PR #21240: URL: https://github.com/apache/datafusion/pull/21240#discussion_r3015384978 ## datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part: ## @@ -61,40 +61,41 @@ logical_plan 03)Aggregate: groupBy=[[custsale.cntrycode]], aggr=[[coun

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
Dandandan commented on code in PR #21240: URL: https://github.com/apache/datafusion/pull/21240#discussion_r3015387714 ## datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part: ## @@ -61,40 +61,41 @@ logical_plan 03)Aggregate: groupBy=[[custsale.cntrycode]], aggr=[[coun

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
Dandandan commented on code in PR #21240: URL: https://github.com/apache/datafusion/pull/21240#discussion_r3015389470 ## datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part: ## @@ -61,40 +61,41 @@ logical_plan 03)Aggregate: groupBy=[[custsale.cntrycode]], aggr=[[coun

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
Dandandan commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4162087969 We're close now in terms of performance!

Re: [PR] fix(sql): fix a bug when planning semi- or antijoins [datafusion]

2026-03-31 Thread via GitHub
aalexandrov commented on code in PR #20990: URL: https://github.com/apache/datafusion/pull/20990#discussion_r3015407690 ## datafusion/expr/src/utils.rs: ## Review Comment: I don't have a strong preference here—in the initial PR I was trying to do the minimal amount of chan

Re: [PR] Spark quarter function implementation [datafusion]

2026-03-31 Thread via GitHub
kazantsev-maksim commented on code in PR #20808: URL: https://github.com/apache/datafusion/pull/20808#discussion_r3015543413 ## datafusion/spark/src/function/datetime/quarter.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] Spark quarter function implementation [datafusion]

2026-03-31 Thread via GitHub
kazantsev-maksim commented on code in PR #20808: URL: https://github.com/apache/datafusion/pull/20808#discussion_r3015554736 ## datafusion/spark/src/function/datetime/quarter.rs: ## @@ -0,0 +1,106 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
asolimando commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4162286204 > ## Rationale for this change > Previously, DataFusion evaluated uncorrelated scalar subqueries by transforming them into joins. This has two shortcomings: > > 1. Scalar

Re: [PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
alamb commented on PR #21285: URL: https://github.com/apache/datafusion/pull/21285#issuecomment-4162481196 run benchmarks

Re: [PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21285: URL: https://github.com/apache/datafusion/pull/21285#issuecomment-4162497344 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21285#issuecomment-4162481196) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21285: URL: https://github.com/apache/datafusion/pull/21285#issuecomment-4162497348 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21285#issuecomment-4162481196) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21285: URL: https://github.com/apache/datafusion/pull/21285#issuecomment-4162497375 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21285#issuecomment-4162481196) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] docs: Add `RESET` Command Documentation [datafusion]

2026-03-31 Thread via GitHub
alamb commented on PR #21245: URL: https://github.com/apache/datafusion/pull/21245#issuecomment-4162567809 TIL about RESET. Thank you @erenavsarogullari and @2010YOUY01

Re: [PR] docs: Add `RESET` Command Documentation [datafusion]

2026-03-31 Thread via GitHub
alamb merged PR #21245: URL: https://github.com/apache/datafusion/pull/21245

Re: [I] Add RESET Command Documentation [datafusion]

2026-03-31 Thread via GitHub
alamb closed issue #21244: Add RESET Command Documentation URL: https://github.com/apache/datafusion/issues/21244

Re: [I] Spurious error when subquery column alias clashes with outer query identifier [datafusion]

2026-03-31 Thread via GitHub
alamb closed issue #21206: Spurious error when subquery column alias clashes with outer query identifier URL: https://github.com/apache/datafusion/issues/21206

Re: [PR] fix: Fix three bugs in query decorrelation [datafusion]

2026-03-31 Thread via GitHub
alamb merged PR #21208: URL: https://github.com/apache/datafusion/pull/21208

Re: [I] IN subqueries can return incorrect results [datafusion]

2026-03-31 Thread via GitHub
alamb closed issue #21205: IN subqueries can return incorrect results URL: https://github.com/apache/datafusion/issues/21205

Re: [I] Ambiguous reference to __always_true with multiple correlated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
alamb closed issue #20315: Ambiguous reference to __always_true with multiple correlated scalar subqueries URL: https://github.com/apache/datafusion/issues/20315

Re: [PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21285: URL: https://github.com/apache/datafusion/pull/21285#issuecomment-4162576684 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21285#issuecomment-4162481196) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21285: URL: https://github.com/apache/datafusion/pull/21285#issuecomment-4162610519 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21285#issuecomment-4162481196) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] refactor: Split Parquet BloomFilter CPU and IO into separate states [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21285: URL: https://github.com/apache/datafusion/pull/21285#issuecomment-4162613868 🤖 Benchmark completed (GKE) | [trigger](https://github.com/apache/datafusion/pull/21285#issuecomment-4162481196) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) CPU

Re: [PR] fix: include scan impl in PR Linux artifact names [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove merged PR #3853: URL: https://github.com/apache/datafusion-comet/pull/3853

Re: [I] CI: Artifact name conflict (409) in PR Build (Linux) when two matrix profiles share the same name [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove closed issue #3852: CI: Artifact name conflict (409) in PR Build (Linux) when two matrix profiles share the same name URL: https://github.com/apache/datafusion-comet/issues/3852

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
neilconway commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4162738604 @asolimando Thanks for the thoughtful comment! > I am not aware of any database going down this route This technique is widely used: Postgres does almost exactly what

Re: [PR] Sketch out a Morselize API [datafusion]

2026-03-31 Thread via GitHub
alamb commented on code in PR #20820: URL: https://github.com/apache/datafusion/pull/20820#discussion_r3016032590 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -108,48 +132,136 @@ pub(super) struct ParquetOpener { pub enable_row_group_stats_pruning: bool, /// C

Re: [I] [DISCUSSION] Maybe support planning scalar function in `ExprPlanner` as well? [datafusion]

2026-03-31 Thread via GitHub
alamb commented on issue #21270: URL: https://github.com/apache/datafusion/issues/21270#issuecomment-4162784117 Seems like a natural extension to me

[PR] test: add SQL file test for casting double to string [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove opened a new pull request, #3854: URL: https://github.com/apache/datafusion-comet/pull/3854 ## Which issue does this PR close? N/A - adds test coverage. ## Rationale for this change Add SQL file test coverage for casting double to string, specifically covering

Re: [PR] test: cast negative zero to string [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove merged PR #3829: URL: https://github.com/apache/datafusion-comet/pull/3829

Re: [PR] fix: disable atan2 instead of tan [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove commented on PR #3849: URL: https://github.com/apache/datafusion-comet/pull/3849#issuecomment-4162954294 Thanks @kazuyukitanimura. Could you add SQL file based tests, since this is the preferred approach now? See https://github.com/apache/datafusion-comet/pull/3854 for an example.

[I] Experiment: immediate-mode shuffle [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove opened a new issue, #3855: URL: https://github.com/apache/datafusion-comet/issues/3855 ### What is the problem the feature request solves? I have been experimenting with "immediate mode" shuffle, where we immediately repartition batches using interleave record batches to pro

Re: [I] Experiment: immediate-mode shuffle [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove commented on issue #3855: URL: https://github.com/apache/datafusion-comet/issues/3855#issuecomment-4162984087 The main issue I am running into with this approach is higher peak memory from allocator churn, because interleave creates so many small arrays. This approach uses 2x th

Re: [PR] doc: Add documentation explaining the behavior of `null` values in struct comparisons [datafusion]

2026-03-31 Thread via GitHub
xiedeyantu commented on PR #21226: URL: https://github.com/apache/datafusion/pull/21226#issuecomment-4163001705 @alamb Could you please take a look at it again?

[I] add Expr FFI for building and combining expressions [datafusion]

2026-03-31 Thread via GitHub
yuvalif opened a new issue, #21286: URL: https://github.com/apache/datafusion/issues/21286 ### Is your feature request related to a problem or challenge? I am implementing C bindings for the lancedb project (see: https://github.com/lancedb/lancedb-c) and need datafusion expression FFI

Re: [PR] test: ceil and floor works correctly for Decimal128 [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove commented on code in PR #3848: URL: https://github.com/apache/datafusion-comet/pull/3848#discussion_r3016304065 ## native/spark-expr/src/math_funcs/ceil.rs: ## @@ -178,9 +177,9 @@ mod test { unreachable!() }; let expected = Decimal128Arra

Re: [PR] fix: `SELECT * EXCLUDE(...)` silently returns empty rows when all columns are excluded [datafusion]

2026-03-31 Thread via GitHub
xiedeyantu commented on PR #21259: URL: https://github.com/apache/datafusion/pull/21259#issuecomment-4163155425 @alamb Sorry to bother you. Could someone please help me review this PR?

Re: [I] Experiment: immediate-mode shuffle [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove commented on issue #3855: URL: https://github.com/apache/datafusion-comet/issues/3855#issuecomment-4163190297 Gluten follows this approach: - Pre-allocate one set of column buffers per partition - Scatter-write rows from each input batch directly into partition buffers u
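
The two steps visible in the comment above (pre-allocate one set of column buffers per partition, then scatter-write rows from each input batch into them) can be sketched roughly as follows. This is a minimal illustration only, not Gluten's or Comet's code: the `PartitionBuffers` type, its single `i64` column, and the `scatter` method are hypothetical names, and the partition id of each row is assumed to be computed already.

```rust
/// Hypothetical holder for one i64 column, split into one buffer per partition.
struct PartitionBuffers {
    /// columns[p] receives every value routed to partition p.
    columns: Vec<Vec<i64>>,
}

impl PartitionBuffers {
    /// Allocate every partition buffer once, up front, with a fixed capacity.
    fn new(num_partitions: usize, capacity: usize) -> Self {
        let columns = (0..num_partitions)
            .map(|_| Vec::with_capacity(capacity))
            .collect();
        PartitionBuffers { columns }
    }

    /// Scatter-write one input batch: row i goes to partition partition_ids[i].
    /// Rows are appended in place, so no small intermediate array is built per batch.
    fn scatter(&mut self, values: &[i64], partition_ids: &[usize]) {
        assert_eq!(values.len(), partition_ids.len());
        for (value, partition) in values.iter().zip(partition_ids.iter()) {
            self.columns[*partition].push(*value);
        }
    }
}
```

Calling `scatter` once per input batch appends to the same long-lived buffers, which is the property the comment contrasts with creating many small arrays. A real implementation would also need to handle multiple columns, nulls, variable-width data, and flushing a buffer when it fills.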

Re: [PR] fix: correct invalid Option.contains assertion in cast test [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove merged PR #3851: URL: https://github.com/apache/datafusion-comet/pull/3851 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] CometCastSuite: negative precision cast test compares `Option[Throwable]` to a `String` [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove closed issue #3850: CometCastSuite: negative precision cast test compares `Option[Throwable]` to a `String` URL: https://github.com/apache/datafusion-comet/issues/3850 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[I] bug: Native Iceberg reader can return wrong results for migrated Parquet files with INT96 timestamps [datafusion-comet]

2026-03-31 Thread via GitHub
mbutrovich opened a new issue, #3856: URL: https://github.com/apache/datafusion-comet/issues/3856 ### Describe the bug CometIcebergNativeScan reads INT96 timestamps incorrectly, resulting in ~1170 year offset. **Example:** - Correct (Spark/Java): `3332-12-14 11:33:10.965`
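
For context on the INT96 encoding mentioned above, here is a minimal sketch of decoding such a value, assuming the usual layout of 8 little-endian bytes of nanoseconds within the day followed by 4 little-endian bytes holding the Julian day number. The function names are hypothetical and this is not the Comet or Iceberg reader code. The sketch returns microseconds because an `i64` count of nanoseconds since the Unix epoch only covers roughly the years 1677 to 2262, so a timestamp in the year 3332 would not fit; whether that is the cause of the reported offset is not stated in the visible text.

```rust
/// Julian day number of the Unix epoch (1970-01-01).
const JULIAN_DAY_OF_UNIX_EPOCH: i64 = 2_440_588;
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Decode a 12-byte INT96 timestamp into microseconds since the Unix epoch.
/// Assumed layout: bytes 0..8 = nanoseconds within the day (little-endian),
/// bytes 8..12 = Julian day number (little-endian).
fn int96_to_micros(bytes: [u8; 12]) -> i64 {
    let mut nanos_raw = [0u8; 8];
    nanos_raw.copy_from_slice(&bytes[0..8]);
    let mut day_raw = [0u8; 4];
    day_raw.copy_from_slice(&bytes[8..12]);

    let nanos_of_day = i64::from_le_bytes(nanos_raw);
    let julian_day = i64::from(i32::from_le_bytes(day_raw));

    // Convert whole days to microseconds first, then add the truncated
    // sub-day part, so no nanosecond-since-epoch value is ever formed.
    (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * MICROS_PER_DAY + nanos_of_day / 1_000
}

/// Helper for the illustration: build the 12-byte encoding from its two parts.
fn int96_from_parts(julian_day: i32, nanos_of_day: i64) -> [u8; 12] {
    let mut out = [0u8; 12];
    out[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    out[8..12].copy_from_slice(&julian_day.to_le_bytes());
    out
}
```

With this layout, `int96_to_micros(int96_from_parts(2_440_588, 0))` yields `0`, the Unix epoch itself.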

Re: [PR] chore: `native_datafusion` to report scan task input metrics [datafusion-comet]

2026-03-31 Thread via GitHub
comphead commented on code in PR #3842: URL: https://github.com/apache/datafusion-comet/pull/3842#discussion_r3016561532 ## spark/src/test/scala/org/apache/spark/sql/comet/CometTaskMetricsSuite.scala: ## @@ -91,4 +94,66 @@ class CometTaskMetricsSuite extends CometTestBase with

Re: [PR] feat: enable native Iceberg reader by default [datafusion-comet]

2026-03-31 Thread via GitHub
mbutrovich commented on PR #3819: URL: https://github.com/apache/datafusion-comet/pull/3819#issuecomment-4163406255 I would say this issue is blocking for this PR: https://github.com/apache/datafusion-comet/issues/3856. -- This is an automated message from the Apache Git Service. To resp

Re: [PR] test: cast negative zero to string [datafusion-comet]

2026-03-31 Thread via GitHub
kazuyukitanimura commented on PR #3829: URL: https://github.com/apache/datafusion-comet/pull/3829#issuecomment-4163419358 Thank you @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] chore: `native_datafusion` to report scan task input metrics [datafusion-comet]

2026-03-31 Thread via GitHub
comphead commented on code in PR #3842: URL: https://github.com/apache/datafusion-comet/pull/3842#discussion_r3016598050 ## spark/src/test/scala/org/apache/spark/sql/comet/CometTaskMetricsSuite.scala: ## @@ -91,4 +94,66 @@ class CometTaskMetricsSuite extends CometTestBase with

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
neilconway commented on code in PR #21240: URL: https://github.com/apache/datafusion/pull/21240#discussion_r3016663839 ## datafusion/sqllogictest/test_files/tpch/plans/q22.slt.part: ## @@ -61,40 +61,41 @@ logical_plan 03)Aggregate: groupBy=[[custsale.cntrycode]], aggr=[[cou

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
neilconway commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4163500190 run benchmark tpch tpch10 tpcds -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
neilconway commented on code in PR #21240: URL: https://github.com/apache/datafusion/pull/21240#discussion_r3016674981 ## datafusion/physical-plan/src/scalar_subquery.rs: ## @@ -0,0 +1,416 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4163519101 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21240#issuecomment-4163500190) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4163521005 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21240#issuecomment-4163500190) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] perf: Implement physical execution of uncorrelated scalar subqueries [datafusion]

2026-03-31 Thread via GitHub
adriangbot commented on PR #21240: URL: https://github.com/apache/datafusion/pull/21240#issuecomment-4163519153 🤖 Benchmark running (GKE) | [trigger](https://github.com/apache/datafusion/pull/21240#issuecomment-4163500190) **Instance:** `c4a-highmem-16` (12 vCPU / 65 GiB) | `Linux bench-

Re: [PR] chore: `native_datafusion` to report scan task input metrics [datafusion-comet]

2026-03-31 Thread via GitHub
comphead commented on code in PR #3842: URL: https://github.com/apache/datafusion-comet/pull/3842#discussion_r3016711255 ## spark/src/main/scala/org/apache/spark/sql/comet/CometExecRDD.scala: ## @@ -139,6 +139,13 @@ private[spark] class CometExecRDD( ctx.addTaskCompletion

Re: [PR] fix: native_datafusion: case-insensitive mode doesn't detect duplicate/ambiguous Parquet fields [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove merged PR #3808: URL: https://github.com/apache/datafusion-comet/pull/3808 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] native_datafusion: case-insensitive mode doesn't detect duplicate/ambiguous Parquet fields [datafusion-comet]

2026-03-31 Thread via GitHub
andygrove closed issue #3760: native_datafusion: case-insensitive mode doesn't detect duplicate/ambiguous Parquet fields URL: https://github.com/apache/datafusion-comet/issues/3760 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] support dynamic filtering on partitioned data from file source [datafusion]

2026-03-31 Thread via GitHub
alamb commented on issue #20195: URL: https://github.com/apache/datafusion/issues/20195#issuecomment-4163587474 There is a lot of good discussion on https://github.com/apache/datafusion/pull/20901 -- This is an automated message from the Apache Git Service. To respond to the message, plea

Re: [PR] perf: Optimize `string_to_array` for scalar args [datafusion]

2026-03-31 Thread via GitHub
comphead merged PR #21131: URL: https://github.com/apache/datafusion/pull/21131 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf
