Re: [PR] chore: Collect fallback reasons for spark sql tests [datafusion-comet]

2025-09-04 Thread via GitHub
codecov-commenter commented on PR #2313: URL: https://github.com/apache/datafusion-comet/pull/2313#issuecomment-3257255278 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2313?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] better preserve statistics when applying limits [datafusion]

2025-09-04 Thread via GitHub
xudong963 commented on code in PR #17381: URL: https://github.com/apache/datafusion/pull/17381#discussion_r2324235835 ## datafusion/common/src/stats.rs: ## @@ -391,62 +391,85 @@ impl Statistics { /// parameter to compute global statistics in a multi-partition setting.

Re: [PR] Add PhysicalExpr::is_volatile [datafusion]

2025-09-04 Thread via GitHub
xudong963 commented on code in PR #17351: URL: https://github.com/apache/datafusion/pull/17351#discussion_r2324227931 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -377,6 +377,19 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug + DynEq + DynHash {

[PR] chore: Collect fallback reasons for spark sql tests [datafusion-comet]

2025-09-04 Thread via GitHub
wForget opened a new pull request, #2313: URL: https://github.com/apache/datafusion-comet/pull/2313 ## Which issue does this PR close? Closes #2302 . ## Rationale for this change Helps us find operators that comet does not support ## What changes are includ

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-04 Thread via GitHub
stuartcarnie commented on PR #17431: URL: https://github.com/apache/datafusion/pull/17431#issuecomment-3257175640 > I'm surprised binary data isnt part of the join fuzz testing, this could be put in a follow up issue Sounds good. Where is the fuzz testing, as I was looking for a place

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-04 Thread via GitHub
jonathanc-n commented on PR #17431: URL: https://github.com/apache/datafusion/pull/17431#issuecomment-3257195263 Here is the file for slt testing SMJ: https://github.com/apache/datafusion/blob/main/datafusion/sqllogictest/test_files/sort_merge_join.slt -- This is an automated message from

Re: [PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-04 Thread via GitHub
jonathanc-n commented on PR #17431: URL: https://github.com/apache/datafusion/pull/17431#issuecomment-3257192246 > > I'm surprised binary data isnt part of the join fuzz testing, this could be put in a follow up issue > > Sounds good. Where is the fuzz testing, as I was looking for a

Re: [PR] feat: Improve some confusing fallback reasons [datafusion-comet]

2025-09-04 Thread via GitHub
wForget commented on code in PR #2301: URL: https://github.com/apache/datafusion-comet/pull/2301#discussion_r2324130066 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -540,8 +540,13 @@ case class CometExecRule(session: SparkSession) extends Rule[Spark

Re: [PR] refactor: Use `BufferedBatchState` enum for SMJ spilling [datafusion]

2025-09-04 Thread via GitHub
jonathanc-n commented on code in PR #17429: URL: https://github.com/apache/datafusion/pull/17429#discussion_r2323986334 ## datafusion/physical-plan/src/joins/sort_merge_join/stream.rs: ## @@ -849,11 +854,13 @@ impl SortMergeJoinStream { fn free_reservation(&mut self, buff

Re: [PR] chore: [1941-Part2]: Introduce map_to_list scalar function [datafusion-comet]

2025-09-04 Thread via GitHub
rishvin commented on PR #2312: URL: https://github.com/apache/datafusion-comet/pull/2312#issuecomment-3257130644 @comphead / @andygrove split the second function. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[PR] chore: [1941-Part2]: Introduce map_to_list scalar function [datafusion-comet]

2025-09-04 Thread via GitHub
rishvin opened a new pull request, #2312: URL: https://github.com/apache/datafusion-comet/pull/2312 ## Which issue does this PR close? Addresses Part of #1941 ## Rationale for this change Introduces `map_to_list` which converts a `MapArray` to `ListArray`.

[PR] refactor: Use `BufferedBatchState` enum for SMJ spilling [datafusion]

2025-09-04 Thread via GitHub
jonathanc-n opened a new pull request, #17429: URL: https://github.com/apache/datafusion/pull/17429 ## Which issue does this PR close? This was brought up by @alamb in the SMJ spilling pull request. ## Rationale for this change Using enum to represent batch spill sta

[I] Add configuration to choose specific join implementation [datafusion]

2025-09-04 Thread via GitHub
2010YOUY01 opened a new issue, #17432: URL: https://github.com/apache/datafusion/issues/17432 ### Is your feature request related to a problem or challenge? DataFusion has several join implementations like Nested Loop Join, Hash Join, etc. For a given join SQL query, there might be mu

[PR] feat: Support binary data types for `SortMergeJoin` `on` clause [datafusion]

2025-09-04 Thread via GitHub
stuartcarnie opened a new pull request, #17431: URL: https://github.com/apache/datafusion/pull/17431 ## Which issue does this PR close? N/A ## Rationale for this change Adds the ability for the `SortMergeJoin` physical node to join on binary types: - `Binary`,

Re: [PR] refactor: Use `BufferedBatchState` enum for SMJ spilling [datafusion]

2025-09-04 Thread via GitHub
2010YOUY01 commented on code in PR #17429: URL: https://github.com/apache/datafusion/pull/17429#discussion_r2324056495 ## datafusion/physical-plan/src/joins/sort_merge_join/stream.rs: ## @@ -185,11 +185,12 @@ impl StreamedBatch { } /// A buffered batch that contains contiguo

[PR] docs: Render `--` properly in profiling docs [datafusion]

2025-09-04 Thread via GitHub
petern48 opened a new pull request, #17430: URL: https://github.com/apache/datafusion/pull/17430 ## Which issue does this PR close? ## Rationale for this change I noticed in the profiling docs the `--` renders as a larger dash in the text (top of the pic). I

Re: [PR] fix: return ALL constants in `EquivalenceProperties::constants` [datafusion]

2025-09-04 Thread via GitHub
alamb commented on PR #17404: URL: https://github.com/apache/datafusion/pull/17404#issuecomment-3256972386 🤖: Benchmark completed Details ``` group crepererum_issue17372 main -

Re: [PR] fix: modify the type coercion logic to avoid planning error [datafusion]

2025-09-04 Thread via GitHub
2010YOUY01 commented on PR #17418: URL: https://github.com/apache/datafusion/pull/17418#issuecomment-3256917413 I'm not familiar with the related code at this moment, but 1. If this code path is only used for null literals like `NULL % NULL` 2. No more other hack changes to pass th

[I] Extension metadata improperly propagated across IS NULL operator [datafusion]

2025-09-04 Thread via GitHub
paleolimbot opened a new issue, #17428: URL: https://github.com/apache/datafusion/issues/17428 ### Describe the bug In the expression `some_function_that_returns_an_extension() IS NULL`, the field metadata is propagated into the output boolean. This causes errors when collecting an e

Re: [PR] feat(spark): Implement Spark functions `url_encode` and `url_decode` [datafusion]

2025-09-04 Thread via GitHub
anhvdq commented on PR #17399: URL: https://github.com/apache/datafusion/pull/17399#issuecomment-3256869783 Thank @timsaucer for your feedback. Let me check and update accordingly -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [I] Support `MemoryExec` in proto `try_from_physical_plan` [datafusion]

2025-09-04 Thread via GitHub
lewiszlw closed issue #14082: Support `MemoryExec` in proto `try_from_physical_plan` URL: https://github.com/apache/datafusion/issues/14082 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] implement handle_scalar_subquery [datafusion]

2025-09-04 Thread via GitHub
github-actions[bot] commented on PR #16691: URL: https://github.com/apache/datafusion/pull/16691#issuecomment-3256824353 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix: Remove duplicate filter from `CrossJoin` unparsing [datafusion]

2025-09-04 Thread via GitHub
Jefffrey merged PR #17382: URL: https://github.com/apache/datafusion/pull/17382 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

[I] [EPIC]: Make `PiecewiseMergeJoin` work in Datafusion [datafusion]

2025-09-04 Thread via GitHub
jonathanc-n opened a new issue, #17427: URL: https://github.com/apache/datafusion/issues/17427 ### Is your feature request related to a problem or challenge? I will organize the following tasks that will be done in separate pull requests as #16660 got a little large and confusing.

[I] TakeOrderedAndProjectExec is not reporting all fallback reasons [datafusion-comet]

2025-09-04 Thread via GitHub
kazuyukitanimura opened a new issue, #2311: URL: https://github.com/apache/datafusion-comet/issues/2311 ### Describe the bug https://github.com/apache/datafusion-comet/blob/main/spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala#L384 TakeOrderedAndProjectExec fallb

Re: [PR] fix: Validating object store configs should not throw exception [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove merged PR #2308: URL: https://github.com/apache/datafusion-comet/pull/2308 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] auto scan mode does not fall back on error validating object store configs [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove closed issue #2305: auto scan mode does not fall back on error validating object store configs URL: https://github.com/apache/datafusion-comet/issues/2305 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL

Re: [I] Unparsing of CROSS JOINs with filters is generating incorrect queries [datafusion]

2025-09-04 Thread via GitHub
Jefffrey closed issue #17359: Unparsing of CROSS JOINs with filters is generating incorrect queries URL: https://github.com/apache/datafusion/issues/17359 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [PR] fix: Remove duplicate filter from `CrossJoin` unparsing [datafusion]

2025-09-04 Thread via GitHub
Jefffrey commented on code in PR #17382: URL: https://github.com/apache/datafusion/pull/17382#discussion_r2323782764 ## datafusion/sql/src/unparser/plan.rs: ## @@ -696,13 +696,6 @@ impl Unparser<'_> { join_filters.as_ref(), )?; -

Re: [PR] Add PhysicalExpr::is_volatile [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on PR #17351: URL: https://github.com/apache/datafusion/pull/17351#issuecomment-3255965541 @findepi could you take a look at this please? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

[I] Extension metadata dropped from literals in SQL VALUES clause [datafusion]

2025-09-04 Thread via GitHub
paleolimbot opened a new issue, #17425: URL: https://github.com/apache/datafusion/issues/17425 ### Describe the bug Function calls that return scalars can be used in SQL VALUES; however if they contain extension metadata the metadata is dropped. ### To Reproduce Output:

[I] Update docs to explain that native_iceberg_compat uses the system CA certificates and not JVM key store [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove opened a new issue, #2310: URL: https://github.com/apache/datafusion-comet/issues/2310 ### What is the problem the feature request solves? Update docs to explain that native_iceberg_compat uses the system CA certificates and not JVM key store ### Describe the potentia

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-04 Thread via GitHub
Blizzara commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323707179 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it int

Re: [PR] Enable merge queue in sqlparser-rs [datafusion-sqlparser-rs]

2025-09-04 Thread via GitHub
blaginin commented on PR #2007: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2007#issuecomment-3255477010 I'm going to merge to test it. Will keep an eye on the repo - @iffyio if something is inconvenient with the new setup, feel free to ping me and I'll fix it -- This is

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-04 Thread via GitHub
vbarua commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323688863 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it into

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-04 Thread via GitHub
vbarua commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323680328 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it into

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-04 Thread via GitHub
xanderbailey commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r232319 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-04 Thread via GitHub
xanderbailey commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323660433 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-04 Thread via GitHub
xanderbailey commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323653717 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-04 Thread via GitHub
vbarua commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2323596891 ## datafusion/substrait/tests/cases/logical_plans.rs: ## @@ -144,6 +144,47 @@ mod tests { Ok(()) } +#[tokio::test] +async fn null_literal_be

[I] Write documentation about enabling native logging via log4rs and COMET_CONF_DIR [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove opened a new issue, #2309: URL: https://github.com/apache/datafusion-comet/issues/2309 ### What is the problem the feature request solves? Write documentation about enabling native logging via log4rs and COMET_CONF_DIR ### Describe the potential solution _No re

Re: [PR] WIP: Upgrade to arrow 56.1.0 [datafusion]

2025-09-04 Thread via GitHub
alamb commented on PR #17275: URL: https://github.com/apache/datafusion/pull/17275#issuecomment-3254382330 Thanks @nuno-faria -- I'll try and polish this over the next few days if no one beats me to it -- This is an automated message from the Apache Git Service. To respond to the message

Re: [PR] Enable merge queue in sqlparser-rs [datafusion-sqlparser-rs]

2025-09-04 Thread via GitHub
blaginin commented on code in PR #2007: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2007#discussion_r2323436373 ## .github/workflows/rust.yml: ## @@ -17,13 +17,17 @@ name: Rust -on: [push, pull_request] +on: + push: +branches-ignore: + - 'gh-reado

Re: [PR] Enable merge queue in sqlparser-rs [datafusion-sqlparser-rs]

2025-09-04 Thread via GitHub
blaginin commented on PR #2007: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2007#issuecomment-3255517640 All steps are now required - seems to be working :) https://github.com/user-attachments/assets/ee1de8f7-3a4f-4228-a4fa-b025a82e844b Asked the infra tim

Re: [PR] Enable merge queue in sqlparser-rs [datafusion-sqlparser-rs]

2025-09-04 Thread via GitHub
blaginin commented on PR #2007: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2007#issuecomment-3255451276 Small note for clarity: this pr itself won't enable MQ - this has to be done manually by the ASF Infra. But it'll make necessary changes on our side for that 🙂 -- Thi

Re: [PR] fix bounds accumulator reset in HashJoinExec dynamic filter pushdown [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on code in PR #17371: URL: https://github.com/apache/datafusion/pull/17371#discussion_r2323370782 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -837,7 +842,6 @@ impl ExecutionPlan for HashJoinExec { )?, // Keep the dy

Re: [PR] fix: Validating object store configs should not throw exception [datafusion-comet]

2025-09-04 Thread via GitHub
codecov-commenter commented on PR #2308: URL: https://github.com/apache/datafusion-comet/pull/2308#issuecomment-3255368070 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2308?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix bounds accumulator reset in HashJoinExec dynamic filter pushdown [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on code in PR #17371: URL: https://github.com/apache/datafusion/pull/17371#discussion_r2323320054 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -837,7 +842,6 @@ impl ExecutionPlan for HashJoinExec { )?, // Keep the dy

Re: [I] Typo in the manual [datafusion-python]

2025-09-04 Thread via GitHub
timsaucer closed issue #1176: Typo in the manual URL: https://github.com/apache/datafusion-python/issues/1176 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-

Re: [PR] fix bounds accumulator reset in HashJoinExec dynamic filter pushdown [datafusion]

2025-09-04 Thread via GitHub
rkrishn7 commented on code in PR #17371: URL: https://github.com/apache/datafusion/pull/17371#discussion_r2323274666 ## datafusion/physical-plan/src/joins/hash_join/exec.rs: ## @@ -837,7 +842,6 @@ impl ExecutionPlan for HashJoinExec { )?, // Keep the dy

Re: [I] `FileScanConfig::output_ordering` must be vector of optionals [datafusion]

2025-09-04 Thread via GitHub
destrex271 commented on issue #17354: URL: https://github.com/apache/datafusion/issues/17354#issuecomment-3255293786 @crepererum : Opened the PR https://github.com/apache/datafusion/pull/17423 . I have made as minimal changes possible while preserving the functionality and signatures of oth

[PR] fix: updated `output_ordering` in file_scan_config.rs to use Vec> instead of just Vec [datafusion]

2025-09-04 Thread via GitHub
destrex271 opened a new pull request, #17423: URL: https://github.com/apache/datafusion/pull/17423 ## Which issue does this PR close? - Closes #17354 ## What changes are included in this PR? - Updated the type of output_ordering in `FileScanConfig` to `Vec>` - Upda

Re: [PR] docs: update link to user example for custom table provider [datafusion-python]

2025-09-04 Thread via GitHub
timsaucer commented on PR #1224: URL: https://github.com/apache/datafusion-python/pull/1224#issuecomment-3255202880 > I don’t intend for them to come across as nitpicks or anything. 😄 I greatly appreciate these corrections. -- This is an automated message from the Apache Git Servi

Re: [PR] feat: allow passing a slice to an expression with the [] indexing [datafusion-python]

2025-09-04 Thread via GitHub
timsaucer commented on code in PR #1215: URL: https://github.com/apache/datafusion-python/pull/1215#discussion_r2323210208 ## python/tests/test_functions.py: ## @@ -494,6 +494,26 @@ def py_flatten(arr): lambda col: f.list_slice(col, literal(-1), literal(2)),

[I] Fix regressions in `CometToPrettyStringSuite` [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove opened a new issue, #2307: URL: https://github.com/apache/datafusion-comet/issues/2307 ### Describe the bug `CometToPrettyStringSuite` has not been running in CI, and the tests are failing. Once the test is passing it should be added to the PR workflows and the `dev/

Re: [PR] docs: update link to user example for custom table provider [datafusion-python]

2025-09-04 Thread via GitHub
timsaucer merged PR #1224: URL: https://github.com/apache/datafusion-python/pull/1224 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [I] Invalid url example reference in documentation [datafusion-python]

2025-09-04 Thread via GitHub
timsaucer closed issue #1223: Invalid url example reference in documentation URL: https://github.com/apache/datafusion-python/issues/1223 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] docs: fix CaseBuilder documentation example [datafusion-python]

2025-09-04 Thread via GitHub
timsaucer merged PR #1225: URL: https://github.com/apache/datafusion-python/pull/1225 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [PR] Use return_field instead of return_type for calling aggregates via FFI [datafusion]

2025-09-04 Thread via GitHub
timsaucer commented on PR #17407: URL: https://github.com/apache/datafusion/pull/17407#issuecomment-3254873367 > lgtm thank @timsaucer WDYT about mentioning the change in `upgrading.md` ? I don't think there is any update anyone needs to do unless they have a unit test or something ca

[PR] feat(spark): Implement Spark functions `url_encode` and `url_decode` [datafusion]

2025-09-04 Thread via GitHub
anhvdq opened a new pull request, #17399: URL: https://github.com/apache/datafusion/pull/17399 ## Which issue does this PR close? - Part of #15914 ## Rationale for this change ## What changes are included in this PR? Implement Spark functions `url_encode` and `url_

[I] Improve some confusing fallback reasons [datafusion-comet]

2025-09-04 Thread via GitHub
wForget opened a new issue, #2300: URL: https://github.com/apache/datafusion-comet/issues/2300 ### What is the problem the feature request solves? > Comet cannot accelerate ProjectExec because: Comet is not enabled We do not need to add fallback info for spark plan node when Com

Re: [I] Any plan to support flink [datafusion-comet]

2025-09-04 Thread via GitHub
PHILO-HE commented on issue #1311: URL: https://github.com/apache/datafusion-comet/issues/1311#issuecomment-3252534814 Gluten's Flink support is still in its early stages. Much of the existing Velox code originally developed for Spark and Presto cannot be directly used for Flink. I expect

[PR] chore(deps): bump actions/stale from 9.1.0 to 10.0.0 [datafusion]

2025-09-04 Thread via GitHub
dependabot[bot] opened a new pull request, #17409: URL: https://github.com/apache/datafusion/pull/17409 Bumps [actions/stale](https://github.com/actions/stale) from 9.1.0 to 10.0.0. Release notes Sourced from actions/stale's releases (https://github.com/actions/stale/releases).

Re: [PR] doc: Document caveats of `swap_inputs()` interface in join executors [datafusion]

2025-09-04 Thread via GitHub
alamb merged PR #17373: URL: https://github.com/apache/datafusion/pull/17373 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[I] Extension metadata dropped in some queries (possibly involving subqueries) [datafusion]

2025-09-04 Thread via GitHub
paleolimbot opened a new issue, #17422: URL: https://github.com/apache/datafusion/issues/17422 ### Describe the bug I haven't yet figured out exactly what part of this query causes the issue, but in the query: ```sql SELECT L.id l_id FROM L WHERE EXISTS (SELECT 1 FROM R W

Re: [PR] Use DataFusionError instead of ArrowError in FileOpenFuture [datafusion]

2025-09-04 Thread via GitHub
adriangb merged PR #17397: URL: https://github.com/apache/datafusion/pull/17397 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Use DataFusionError instead of ArrowError in FileOpenFuture [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on PR #17397: URL: https://github.com/apache/datafusion/pull/17397#issuecomment-3254972431 I also spoke with @alamb about this change, he agreed it's a positive change and that the breakage should be minimal because (1) not many people are implementing `FileOpenFuture` an

Re: [PR] fix: Expose hash to FFI udf/udaf/udwf to fix their Eq [datafusion]

2025-09-04 Thread via GitHub
timsaucer commented on PR #17350: URL: https://github.com/apache/datafusion/pull/17350#issuecomment-3254937174 > If two different functions return same hash (e.g. `42`), will Eq return incorrect result in such case? I must be missing something - how is that any different than any of t

Re: [PR] feat: Support `FILTER` clause in aggregate window functions [datafusion]

2025-09-04 Thread via GitHub
geoffreyclaude commented on PR #17378: URL: https://github.com/apache/datafusion/pull/17378#issuecomment-3254277786 > Would be nice to have a test via DataFrame API if possible. Also for the proto, I think we can raise an issue so we can have it tracked in GitHub. > > Should we also u

Re: [I] `DataFrame.cache()` does not work in distributed environments [datafusion]

2025-09-04 Thread via GitHub
alamb commented on issue #17297: URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3254532336 > I believe we do not need to bring CachePhysicalExec to DataFusion, we just need to provide a LogicalPlan::Cache. That makes sense to me and the idea of being able to spec

Re: [I] September 2025 ASF Board Report [datafusion]

2025-09-04 Thread via GitHub
xudong963 commented on issue #16259: URL: https://github.com/apache/datafusion/issues/16259#issuecomment-3253660868 The doc looks good to me, thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-09-04 Thread via GitHub
alamb commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3253647623 I think we should start preparing to make the release -- maybe we can try and test/finalize anything needed this week and shoot for making the release candidate late next week.

Re: [PR] fix: Fix regression in NativeConfigSuite [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove commented on code in PR #2299: URL: https://github.com/apache/datafusion-comet/pull/2299#discussion_r2322245615 ## spark/src/test/scala/org/apache/comet/objectstore/NativeConfigSuite.scala: ## @@ -119,15 +119,18 @@ class NativeConfigSuite extends AnyFunSuite with Match

[I] Aggregate function via FFI calls `return_type()` instead of `return_field()` [datafusion]

2025-09-04 Thread via GitHub
paleolimbot opened a new issue, #17400: URL: https://github.com/apache/datafusion/issues/17400 ### Describe the bug I'm not sure if this is intentional, but this code: https://github.com/apache/datafusion/blob/d83a290d1d534f7db9849b11c39d2b0a289a62e4/datafusion/ffi/src/udaf/mod

Re: [PR] fix: return ALL constants in `EquivalenceProperties::constants` [datafusion]

2025-09-04 Thread via GitHub
alamb commented on code in PR #17404: URL: https://github.com/apache/datafusion/pull/17404#discussion_r2322384067 ## datafusion/physical-expr/src/equivalence/properties/mod.rs: ## @@ -255,10 +255,11 @@ impl EquivalenceProperties { pub fn constants(&self) -> Vec { s

Re: [PR] Use DataFusionError instead of ArrowError in FileOpenFuture [datafusion]

2025-09-04 Thread via GitHub
comphead commented on code in PR #17397: URL: https://github.com/apache/datafusion/pull/17397#discussion_r2322969878 ## datafusion/core/tests/physical_optimizer/filter_pushdown/util.rs: ## @@ -344,7 +338,7 @@ impl TestStream { } impl Stream for TestStream { -type Item =

Re: [PR] Use DataFusionError instead of ArrowError in FileOpenFuture [datafusion]

2025-09-04 Thread via GitHub
comphead commented on code in PR #17397: URL: https://github.com/apache/datafusion/pull/17397#discussion_r2322968565 ## datafusion/datasource/src/file_stream.rs: ## @@ -345,7 +343,7 @@ impl RecordBatchStream for FileStream { /// A fallible future that resolves to a stream of

Re: [I] Support GROUP BY and DISTINCT with FixedSizeList values [datafusion]

2025-09-04 Thread via GitHub
adriangb closed issue #16442: Support GROUP BY and DISTINCT with FixedSizeList values URL: https://github.com/apache/datafusion/issues/16442 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Extract complex default impls from AggregateUDFImpl trait [datafusion]

2025-09-04 Thread via GitHub
comphead commented on code in PR #17391: URL: https://github.com/apache/datafusion/pull/17391#discussion_r2322960477 ## datafusion/expr/src/udaf.rs: ## @@ -991,6 +779,259 @@ impl PartialOrd for dyn AggregateUDFImpl { } } +pub fn udaf_default_schema_name( +func: &F, +

[PR] Test grouping by FixedSizeList [datafusion]

2025-09-04 Thread via GitHub
findepi opened a new pull request, #17415: URL: https://github.com/apache/datafusion/pull/17415 Support for grouping by FixedSizeList values was added in Arrow. This adds a regression test in DataFusion for the SQL-level feature this unlocked. - closes https://github.com/apache/datafu

Re: [PR] Use a struct for ProjectionExpr [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on code in PR #17398: URL: https://github.com/apache/datafusion/pull/17398#discussion_r2322315202 ## datafusion/core/tests/physical_optimizer/limit_pushdown.rs: ## @@ -52,9 +52,18 @@ fn projection_exec( ) -> Result> { Ok(Arc::new(ProjectionExec::try_new(

Re: [PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-09-04 Thread via GitHub
alamb commented on PR #17396: URL: https://github.com/apache/datafusion/pull/17396#issuecomment-3254826195 Replaced by - #17275

Re: [PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-09-04 Thread via GitHub
alamb closed pull request #17396: chore(deps): bump the arrow-parquet group with 7 updates URL: https://github.com/apache/datafusion/pull/17396

Re: [PR] chore(deps): bump the arrow-parquet group with 7 updates [datafusion]

2025-09-04 Thread via GitHub
dependabot[bot] commented on PR #17396: URL: https://github.com/apache/datafusion/pull/17396#issuecomment-3254823943 This pull request was built based on a group rule. Closing it will not ignore any of these versions in future pull requests. To ignore these dependencies, configure [ig

Re: [PR] fix: Remove duplicate filter from `CrossJoin` unparsing [datafusion]

2025-09-04 Thread via GitHub
alamb commented on code in PR #17382: URL: https://github.com/apache/datafusion/pull/17382#discussion_r2322953374 ## datafusion/sql/src/unparser/plan.rs: ## @@ -696,13 +696,6 @@ impl Unparser<'_> { join_filters.as_ref(), )?; -

Re: [PR] Use DataFusionError instead of ArrowError in FileOpenFuture [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on PR #17397: URL: https://github.com/apache/datafusion/pull/17397#issuecomment-3254817035 Thanks @comphead! Addressed your feedback

Re: [PR] chore(deps): bump tonic from 0.13.1 to 0.14.2 [datafusion]

2025-09-04 Thread via GitHub
alamb closed pull request #17403: chore(deps): bump tonic from 0.13.1 to 0.14.2 URL: https://github.com/apache/datafusion/pull/17403

Re: [PR] chore(deps): bump tonic from 0.13.1 to 0.14.2 [datafusion]

2025-09-04 Thread via GitHub
dependabot[bot] commented on PR #17403: URL: https://github.com/apache/datafusion/pull/17403#issuecomment-3254806668 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] chore(deps): bump tonic from 0.13.1 to 0.14.2 [datafusion]

2025-09-04 Thread via GitHub
alamb commented on PR #17403: URL: https://github.com/apache/datafusion/pull/17403#issuecomment-3254806485 This needs a new version of tonic, which requires a new arrow

Re: [I] Make EncryptionFactory trait async [datafusion]

2025-09-04 Thread via GitHub
alamb closed issue #17341: Make EncryptionFactory trait async URL: https://github.com/apache/datafusion/issues/17341

Re: [PR] feat: Make Parquet EncryptionFactory async [datafusion]

2025-09-04 Thread via GitHub
alamb commented on PR #17342: URL: https://github.com/apache/datafusion/pull/17342#issuecomment-3254804398 🚀

Re: [PR] feat: Make Parquet EncryptionFactory async [datafusion]

2025-09-04 Thread via GitHub
alamb merged PR #17342: URL: https://github.com/apache/datafusion/pull/17342

[I] Fail to merge schema field for partitioned table with dict. [datafusion]

2025-09-04 Thread via GitHub
valkum opened a new issue, #17421: URL: https://github.com/apache/datafusion/issues/17421 ### Describe the bug I am trying out datafusion for some refactoring. I am testing with the following setup: ``` +-+

Re: [PR] Push down sorts into `TableScan` logical plan node [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on code in PR #17337: URL: https://github.com/apache/datafusion/pull/17337#discussion_r2322907841 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -2525,6 +2525,8 @@ pub struct TableScan { pub filters: Vec, /// Optional number of rows to read

Re: [PR] fix: implement lazy evaluation in Coalesce function [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove commented on code in PR #2270: URL: https://github.com/apache/datafusion-comet/pull/2270#discussion_r2319898702 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -394,6 +394,20 @@ class CometExpressionSuite extends CometTestBase with AdaptiveS

Re: [I] `DataFrame.cache()` does not work in distributed environments [datafusion]

2025-09-04 Thread via GitHub
milenkovicm commented on issue #17297: URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3254759427 Creating a memory table may work: the input property may be used as lineage, and the table reference can capture the cache ID and session ID. One thing I need to check if DDL statements

Re: [PR] fix: Fix regression in NativeConfigSuite [datafusion-comet]

2025-09-04 Thread via GitHub
andygrove merged PR #2299: URL: https://github.com/apache/datafusion-comet/pull/2299

Re: [PR] Refactor TableProvider::scan into TableProvider::scan_with_args [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on code in PR #17336: URL: https://github.com/apache/datafusion/pull/17336#discussion_r2322914487 ## datafusion/catalog/src/table.rs: ## @@ -299,6 +317,68 @@ pub trait TableProvider: Debug + Sync + Send { } } +#[derive(Debug, Clone, Default)] +pub stru

Re: [PR] Refactor DataSourceExec::try_swapping_with_projection to simplify and remove abstraction leakage [datafusion]

2025-09-04 Thread via GitHub
adriangb commented on code in PR #17395: URL: https://github.com/apache/datafusion/pull/17395#discussion_r2320467543 ## datafusion/physical-plan/src/projection.rs: ## @@ -147,6 +147,8 @@ impl ProjectionExec { } } +pub type ProjectionExpr = (Arc, String); Review Comment:
