Re: [PR] chore: Split expression serde hash map into separate categories [datafusion-comet]

2025-09-08 Thread via GitHub
rishvin commented on PR #2322: URL: https://github.com/apache/datafusion-comet/pull/2322#issuecomment-3268588193 Looks good! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] fix: Fallback length function with binary input [datafusion-comet]

2025-09-08 Thread via GitHub
codecov-commenter commented on PR #2349: URL: https://github.com/apache/datafusion-comet/pull/2349#issuecomment-3268681463 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2349?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Implementing `From` for `sqlparser::ast::Statement` variants [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub
LucaCappelletti94 commented on issue #2020: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2020#issuecomment-3268966727 @iffyio could you kindly lmk your opinion on the matter before I start a PR?

Re: [PR] Added derive trait `Copy` to `OrderByOptions` struct [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub
iffyio merged PR #2021: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2021

Re: [PR] Add support for ClickHouse CSE. [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub
pravic commented on code in PR #2024: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2024#discussion_r2332055186 ## src/parser/mod.rs: ## @@ -12260,6 +12260,27 @@ impl<'a> Parser<'a> { }) } +/// Parse a CTE or CSE. +pub fn parse_cte_or_cse(&

[PR] build: Fix CI? [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new pull request, #2353: URL: https://github.com/apache/datafusion-comet/pull/2353 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub
petern48 commented on issue #17480: URL: https://github.com/apache/datafusion/issues/17480#issuecomment-3268882942 > Good idea! Perhaps we can override `EXPLAIN ANALYZE` too? That's a good idea, too! I tried it in the cli, and `explain analyze` already overrides to `indent`. The code

Re: [PR] docs: Add note about Root CA Certificate location with native scans [datafusion-comet]

2025-09-08 Thread via GitHub
mbutrovich commented on code in PR #2325: URL: https://github.com/apache/datafusion-comet/pull/2325#discussion_r2331486201 ## docs/source/user-guide/latest/datasources.md: ## @@ -175,6 +175,13 @@ The `native_datafusion` and `native_iceberg_compat` Parquet scan implementations

Re: [I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub
2010YOUY01 commented on issue #17480: URL: https://github.com/apache/datafusion/issues/17480#issuecomment-3268722339 Good idea! Perhaps we can override `EXPLAIN ANALYZE` too?

[PR] ignore [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new pull request, #2352: URL: https://github.com/apache/datafusion-comet/pull/2352 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] POC: `ClassicJoin` for PWMJ [datafusion]

2025-09-08 Thread via GitHub
jonathanc-n commented on code in PR #17482: URL: https://github.com/apache/datafusion/pull/17482#discussion_r2331953794 ## datafusion/sqllogictest/test_files/joins.slt: ## @@ -5161,6 +5178,44 @@ WHERE k1 < 0 +# PiecewiseMergeJoin Test +statement ok +set datafusion.exec

[PR] POC: `ClassicJoin` for PWMJ [datafusion]

2025-09-08 Thread via GitHub
jonathanc-n opened a new pull request, #17482: URL: https://github.com/apache/datafusion/pull/17482 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

[PR] Always use 'indent' format for explain verbose [datafusion]

2025-09-08 Thread via GitHub
petern48 opened a new pull request, #17481: URL: https://github.com/apache/datafusion/pull/17481 ## Which issue does this PR close? - Closes #17480 ## Rationale for this change `datafusion-cli` uses `tree` format by default. In order to get proper explain ver
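
The override discussed in #17480 and implemented here reduces to a simple rule: `EXPLAIN VERBOSE` (and, per the discussion, `EXPLAIN ANALYZE`) is rendered with `indent` regardless of the configured default, while plain `EXPLAIN` keeps the default (`tree` in `datafusion-cli`). A minimal sketch of that rule; the `ExplainFormat` enum and `effective_format` function are hypothetical and not DataFusion's actual API:

```rust
/// Hypothetical model of the two explain output formats (not DataFusion's types).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExplainFormat {
    Indent,
    Tree,
}

/// Picks the format actually used to render an EXPLAIN statement.
/// VERBOSE and ANALYZE are forced to `Indent`; plain EXPLAIN honours
/// whatever format is configured.
pub fn effective_format(configured: ExplainFormat, verbose: bool, analyze: bool) -> ExplainFormat {
    if verbose || analyze {
        ExplainFormat::Indent
    } else {
        configured
    }
}
```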

Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-08 Thread via GitHub
milenkovicm commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2331232678 ## ballista/executor/src/execution_loop.rs: ## @@ -88,8 +90,29 @@ pub async fn poll_loop match poll_work_result { Ok(result) =>

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-08 Thread via GitHub
BlakeOrth commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3268395582 > Also, BTW tried it out but it doesn't seem to be working anymore @alamb I've found the bug and fixed this behavior. Although this is one of those scenarios where I'm somewh

Re: [I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub
petern48 commented on issue #17480: URL: https://github.com/apache/datafusion/issues/17480#issuecomment-3268694949 take

[I] `EXPLAIN VERBOSE` only works when format is set to (non-default) 'indent' [datafusion]

2025-09-08 Thread via GitHub
petern48 opened a new issue, #17480: URL: https://github.com/apache/datafusion/issues/17480 ### Describe the bug On the `datafusion-cli`, `tree` was made the default explain format in [this PR](https://github.com/apache/datafusion/pull/15427). Now, when we use `EXPLAIN VERBOSE`, we s

[PR] chore: Add hdfs feature test job [datafusion-comet]

2025-09-08 Thread via GitHub
wForget opened a new pull request, #2350: URL: https://github.com/apache/datafusion-comet/pull/2350 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] feat: Make supported hadoop filesystem schemes configurable [datafusion-comet]

2025-09-08 Thread via GitHub
parthchandra merged PR #2272: URL: https://github.com/apache/datafusion-comet/pull/2272

[PR] fix: Fallback length function with non-string input [datafusion-comet]

2025-09-08 Thread via GitHub
wForget opened a new pull request, #2349: URL: https://github.com/apache/datafusion-comet/pull/2349 ## Which issue does this PR close? Closes #2338. ## Rationale for this change length function panics with binary input ## What changes are included in this PR

Re: [PR] feat: implement job data cleanup in pull-staged strategy #1219 [datafusion-ballista]

2025-09-08 Thread via GitHub
KR-bluejay commented on code in PR #1314: URL: https://github.com/apache/datafusion-ballista/pull/1314#discussion_r2331838811 ## ballista/executor/src/execution_loop.rs: ## @@ -88,8 +90,29 @@ pub async fn poll_loop match poll_work_result { Ok(result) =>

Re: [PR] Support csv truncated rows in datafusion [datafusion]

2025-09-08 Thread via GitHub
zhuqi-lucas merged PR #17465: URL: https://github.com/apache/datafusion/pull/17465

Re: [PR] Support csv truncated rows in datafusion [datafusion]

2025-09-08 Thread via GitHub
zhuqi-lucas commented on PR #17465: URL: https://github.com/apache/datafusion/pull/17465#issuecomment-326862 Thank you @xudong963, @alamb!
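
"Truncated rows" presumably refers to CSV rows with fewer fields than the schema, where the missing trailing values are read as nulls instead of raising an error. A naive sketch of that behaviour, assuming no quoting or escaping; `parse_row` and its `allow_truncated` flag are hypothetical, not the arrow-rs or DataFusion API:

```rust
/// Naive CSV row parser (no quoting or escaping): splits on commas and, if
/// `allow_truncated` is set, pads missing trailing fields with `None` (null).
pub fn parse_row(
    line: &str,
    num_columns: usize,
    allow_truncated: bool,
) -> Result<Vec<Option<String>>, String> {
    let mut fields: Vec<Option<String>> =
        line.split(',').map(|f| Some(f.to_string())).collect();
    let found = fields.len();
    // Too many fields is always an error; too few only when truncation is disallowed.
    if found > num_columns || (found < num_columns && !allow_truncated) {
        return Err(format!("expected {} fields, found {}", num_columns, found));
    }
    // Truncated row: fill the missing trailing columns with nulls (no-op if complete).
    fields.resize(num_columns, None);
    Ok(fields)
}
```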

Re: [I] Various issues with Comet's handling of aggregates [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove commented on issue #2294: URL: https://github.com/apache/datafusion-comet/issues/2294#issuecomment-3268247088 duplicate of https://github.com/apache/datafusion-comet/issues/1267

Re: [PR] test: add fuzz test for doing aggregation with larger than memory groups and sorting with limited memory [datafusion]

2025-09-08 Thread via GitHub
github-actions[bot] closed pull request #15727: test: add fuzz test for doing aggregation with larger than memory groups and sorting with limited memory URL: https://github.com/apache/datafusion/pull/15727

Re: [PR] fix: Incorrect memory accounting in `array_agg` function [datafusion]

2025-09-08 Thread via GitHub
github-actions[bot] closed pull request #16519: fix: Incorrect memory accounting in `array_agg` function URL: https://github.com/apache/datafusion/pull/16519

Re: [PR] Statistics: Implement SampledDistribution variant to Distribution to … [datafusion]

2025-09-08 Thread via GitHub
github-actions[bot] closed pull request #16614: Statistics: Implement SampledDistribution variant to Distribution to … URL: https://github.com/apache/datafusion/pull/16614

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub
comphead commented on code in PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331760084 ## common/src/main/java/org/apache/comet/vector/CometSelectionVector.java: ## @@ -0,0 +1,279 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub
parthchandra commented on code in PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331698398 ## native/core/src/execution/operators/scan.rs: ## @@ -239,6 +239,87 @@ impl ScanExec { let mut timer = arrow_ffi_time.timer(); +// C

Re: [PR] feat(spark): implement Spark `map` function `map_from_arrays` [datafusion]

2025-09-08 Thread via GitHub
SparkApplicationMaster commented on code in PR #17456: URL: https://github.com/apache/datafusion/pull/17456#discussion_r2331438737 ## datafusion/spark/src/function/map/map_from_arrays.rs: ## @@ -0,0 +1,207 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] chore(deps): bump wasm-bindgen-test from 0.3.50 to 0.3.51 [datafusion]

2025-09-08 Thread via GitHub
comphead merged PR #17470: URL: https://github.com/apache/datafusion/pull/17470

[PR] chore: Remove IcebergCometBatchReader.java [datafusion-comet]

2025-09-08 Thread via GitHub
comphead opened a new pull request, #2347: URL: https://github.com/apache/datafusion-comet/pull/2347 ## Which issue does this PR close? Remove unused code Closes #. ## Rationale for this change ## What changes are included in this PR? ##

Re: [PR] WIP: Upgrade to arrow 56.1.0 [datafusion]

2025-09-08 Thread via GitHub
alamb commented on PR #17275: URL: https://github.com/apache/datafusion/pull/17275#issuecomment-3267868833 I saw this, @nuno-faria. I hope to look at it tomorrow.

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub
hsiang-c commented on code in PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331595304 ## common/src/main/java/org/apache/comet/vector/CometSelectionVector.java: ## @@ -0,0 +1,279 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub
hsiang-c commented on code in PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331580405 ## native/core/src/execution/operators/scan.rs: ## @@ -239,6 +239,87 @@ impl ScanExec { let mut timer = arrow_ffi_time.timer(); +// Check

Re: [PR] feat: Support log for Decimal128 and Decimal256 [datafusion]

2025-09-08 Thread via GitHub
theirix commented on code in PR #17023: URL: https://github.com/apache/datafusion/pull/17023#discussion_r2331486576 ## datafusion/functions/src/math/log.rs: ## @@ -121,55 +198,68 @@ impl ScalarUDFImpl for LogFunc { fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Re

Re: [PR] Auto detect hive column partitioning with ListingTableFactory / `CREATE EXTERNAL TABLE` [datafusion]

2025-09-08 Thread via GitHub
BlakeOrth commented on PR #17232: URL: https://github.com/apache/datafusion/pull/17232#issuecomment-3268288321 I've made just a couple very small changes to the docs with this most recent commit to fix a spelling error (I love the fact CI runs spell check here! :heart: ) and fixes a small c

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub
parthchandra commented on code in PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331548761 ## dev/diffs/iceberg/1.8.1.diff: ## @@ -1,5 +1,5 @@ diff --git a/build.gradle b/build.gradle -index 7327b38..7967109 100644 Review Comment: Updated t

Re: [PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub
parthchandra commented on code in PR #2346: URL: https://github.com/apache/datafusion-comet/pull/2346#discussion_r2331543965 ## dev/diffs/iceberg/1.8.1.diff: ## @@ -1,5 +1,5 @@ diff --git a/build.gradle b/build.gradle -index 7327b38..7967109 100644 Review Comment: Oops. I f

Re: [I] [iceberg] Storage Partition Join (SPJ) returns mismatch results [datafusion-comet]

2025-09-08 Thread via GitHub
hsiang-c closed issue #2119: [iceberg] Storage Partition Join (SPJ) returns mismatch results URL: https://github.com/apache/datafusion-comet/issues/2119

Re: [I] [iceberg] `Comet execution only takes Arrow Arrays, but got class org.apache.iceberg.spark.data.vectorized.ColumnVectorWithFilter` [datafusion-comet]

2025-09-08 Thread via GitHub
hsiang-c commented on issue #2117: URL: https://github.com/apache/datafusion-comet/issues/2117#issuecomment-3268262991 This issue should be fixed by https://github.com/apache/datafusion-comet/pull/2346

[I] Add support for Spark 4.0.1 [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new issue, #2345: URL: https://github.com/apache/datafusion-comet/issues/2345 ### What is the problem the feature request solves? Spark 4.0.1 has been released. We should update Comet to support this release instead of 4.0.0 since it contains many important bug and

[PR] feat: [iceberg] delete rows support using selection vectors [datafusion-comet]

2025-09-08 Thread via GitHub
parthchandra opened a new pull request, #2346: URL: https://github.com/apache/datafusion-comet/pull/2346 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/2060 ## Rationale for this change Current Iceberg integration does not sup
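
A selection vector records which rows of a batch are still live, so deleted rows can be skipped without rewriting the underlying column data. A minimal sketch of the idea, assuming deletes arrive as row positions relative to the batch; the functions are hypothetical and are not Comet's `CometSelectionVector`:

```rust
use std::collections::HashSet;

/// Builds a selection vector: the in-batch row indices that survive deletion.
/// `deleted` holds batch-relative row positions marked as deleted.
pub fn selection_vector(num_rows: usize, deleted: &HashSet<usize>) -> Vec<usize> {
    (0..num_rows).filter(|i| !deleted.contains(i)).collect()
}

/// Gathers the live values of one column through the selection vector,
/// leaving the original column untouched.
pub fn apply_selection<T: Clone>(column: &[T], selection: &[usize]) -> Vec<T> {
    selection.iter().map(|&i| column[i].clone()).collect()
}
```

The same selection vector can be reused for every column in the batch, which is what makes it cheaper than filtering each column independently.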

Re: [I] Various issues with Comet's handling of aggregates [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove closed issue #2294: Various issues with Comet's handling of aggregates URL: https://github.com/apache/datafusion-comet/issues/2294

Re: [I] Update docs to explain that native_iceberg_compat uses the system CA certificates and not JVM key store [datafusion-comet]

2025-09-08 Thread via GitHub
mbutrovich closed issue #2310: Update docs to explain that native_iceberg_compat uses the system CA certificates and not JVM key store URL: https://github.com/apache/datafusion-comet/issues/2310

Re: [PR] docs: Add note about Root CA Certificate location with native scans [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove commented on code in PR #2325: URL: https://github.com/apache/datafusion-comet/pull/2325#discussion_r2331493432 ## docs/source/user-guide/latest/datasources.md: ## @@ -175,6 +175,13 @@ The `native_datafusion` and `native_iceberg_compat` Parquet scan implementations

Re: [I] Release Comet 0.10.0 (September 2025) [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove commented on issue #1970: URL: https://github.com/apache/datafusion-comet/issues/1970#issuecomment-3268191949 I plan on creating a release candidate later this week

Re: [PR] docs: Add note about Root CA Certificate location with native scans [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove commented on code in PR #2325: URL: https://github.com/apache/datafusion-comet/pull/2325#discussion_r2331483346 ## docs/source/user-guide/latest/datasources.md: ## @@ -175,6 +175,13 @@ The `native_datafusion` and `native_iceberg_compat` Parquet scan implementations

Re: [PR] chore(deps): bump log from 0.4.27 to 0.4.28 in /native [datafusion-comet]

2025-09-08 Thread via GitHub
mbutrovich merged PR #2333: URL: https://github.com/apache/datafusion-comet/pull/2333

Re: [PR] chore(deps): bump cc from 1.2.35 to 1.2.36 in /native [datafusion-comet]

2025-09-08 Thread via GitHub
mbutrovich merged PR #2337: URL: https://github.com/apache/datafusion-comet/pull/2337

Re: [PR] feat: Support log for Decimal128 and Decimal256 [datafusion]

2025-09-08 Thread via GitHub
theirix commented on code in PR #17023: URL: https://github.com/apache/datafusion/pull/17023#discussion_r2331472397 ## datafusion/functions/src/math/log.rs: ## @@ -58,21 +64,91 @@ impl Default for LogFunc { impl LogFunc { pub fn new() -> Self { -use DataType::*;
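
One way to take a logarithm of a decimal stored as an unscaled integer `u` with scale `s` (value `u * 10^-s`) is the identity `log_b(u * 10^-s) = (ln(u) - s * ln(10)) / ln(b)`, evaluated in `f64`. A minimal sketch, assuming a Decimal128 value is passed as `(unscaled, scale)`; this is not necessarily how the PR implements it, and returning `None` for non-positive input is an assumed convention:

```rust
/// Computes log_base of a Decimal128 value given as `(unscaled, scale)`,
/// where the value is `unscaled * 10^(-scale)`.
/// Returns `None` for non-positive values (assumed convention).
pub fn log_decimal128(unscaled: i128, scale: i8, base: f64) -> Option<f64> {
    if unscaled <= 0 {
        return None;
    }
    // ln(u * 10^-s) = ln(u) - s * ln(10); the i128 -> f64 conversion may
    // round very large unscaled values.
    let ln_value = (unscaled as f64).ln() - f64::from(scale) * std::f64::consts::LN_10;
    // Change of base: log_b(x) = ln(x) / ln(b).
    Some(ln_value / base.ln())
}
```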

[I] Allow more control over enabling/disabling specific expressions [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new issue, #2344: URL: https://github.com/apache/datafusion-comet/issues/2344 ### What is the problem the feature request solves? Currently, we have a single config `spark.comet.expression.allowIncompatible` which enables or disables all incompatible expressions.
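
Finer-grained control could layer on top of the existing global flag: a setting for a specific expression wins, otherwise `spark.comet.expression.allowIncompatible` applies. A minimal sketch of that lookup; the per-expression key pattern `spark.comet.expression.<Name>.allowIncompatible` is purely hypothetical, as is defaulting to `false` when nothing is set:

```rust
use std::collections::HashMap;

/// Hypothetical lookup: a per-expression setting overrides the global
/// `spark.comet.expression.allowIncompatible` flag; unset means `false`.
pub fn allow_incompatible(conf: &HashMap<String, String>, expr_name: &str) -> bool {
    // Hypothetical per-expression key, e.g. "spark.comet.expression.Cast.allowIncompatible".
    let per_expr_key = format!("spark.comet.expression.{}.allowIncompatible", expr_name);
    conf.get(&per_expr_key)
        .or_else(|| conf.get("spark.comet.expression.allowIncompatible"))
        .map(|v| v.eq_ignore_ascii_case("true"))
        .unwrap_or(false)
}
```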

Re: [I] [native_iceberg_compat] Add support for Parquet modular decryption [datafusion-comet]

2025-09-08 Thread via GitHub
parthchandra commented on issue #2339: URL: https://github.com/apache/datafusion-comet/issues/2339#issuecomment-3268157028 It would be nice if we can validate that native side parquet decryption support passes tests equivalent to the ones in `org.apache.parquet.crypto.TestPropertiesDrivenE

Re: [PR] Extract complex default impls from AggregateUDFImpl trait [datafusion]

2025-09-08 Thread via GitHub
alamb commented on PR #17391: URL: https://github.com/apache/datafusion/pull/17391#issuecomment-3266688062 🚀

Re: [PR] feat(spark): implement Spark `map` function `map_from_arrays` [datafusion]

2025-09-08 Thread via GitHub
comphead commented on PR #17456: URL: https://github.com/apache/datafusion/pull/17456#issuecomment-3268150583 Thanks @SparkApplicationMaster I'll check this again later this week

[I] Various issues with Comet support for HashAggregateExec [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new issue, #1267: URL: https://github.com/apache/datafusion-comet/issues/1267 ### What is the problem the feature request solves? Add support for distinct aggregates and enable "distinct" test in CometAggregateSuite - https://github.com/apache/datafusion-come

Re: [PR] docs: Add note about Root CA Certificate location with native scans [datafusion-comet]

2025-09-08 Thread via GitHub
mbutrovich commented on code in PR #2325: URL: https://github.com/apache/datafusion-comet/pull/2325#discussion_r2331454885 ## docs/source/user-guide/latest/datasources.md: ## @@ -175,6 +175,13 @@ The `native_datafusion` and `native_iceberg_compat` Parquet scan implementations

[I] Upgrade to DataFusion 50.0.0 [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new issue, #2343: URL: https://github.com/apache/datafusion-comet/issues/2343 ### What is the problem the feature request solves? _No response_ ### Describe the potential solution _No response_ ### Additional context _No response_

Re: [PR] make `giscus` comment section opt-in to comply with ASF policy [datafusion-site]

2025-09-08 Thread via GitHub
kevinjqliu commented on PR #106: URL: https://github.com/apache/datafusion-site/pull/106#issuecomment-3267013012 thank you for the screenshot showing network traffic! I updated the PR with your suggestion

Re: [I] `DataFrame.cache()` does not work in distributed environments [datafusion]

2025-09-08 Thread via GitHub
milenkovicm commented on issue #17297: URL: https://github.com/apache/datafusion/issues/17297#issuecomment-3267800666 > > [datafusion/datafusion/core/src/execution/context/mod.rs](https://github.com/apache/datafusion/blob/fd7df66724f958a2d44ba1fda1b11dc6833f0296/datafusion/core/src/execution

Re: [PR] feature: sort by/cluster by/distribute by [datafusion]

2025-09-08 Thread via GitHub
alamb commented on PR #16310: URL: https://github.com/apache/datafusion/pull/16310#issuecomment-3267969860 Sadly I don't think I will have time to review this feature for a while.

Re: [D] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet - Apache DataFusion Blog [datafusion-site]

2025-09-08 Thread via GitHub
GitHub user alamb added a comment to the discussion: Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet - Apache DataFusion Blog The test works! GitHub link: https://github.com/apache/datafusion-site/discussions/108#discussioncomment-1434304

Re: [PR] fix: modify the type coercion logic to avoid planning error [datafusion]

2025-09-08 Thread via GitHub
etolbakov commented on PR #17418: URL: https://github.com/apache/datafusion/pull/17418#issuecomment-3267677632 Hi @kosiew! Thanks a lot for the review! I've addressed all points! Please let me know what you think.

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-08 Thread via GitHub
BlakeOrth commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3267718285 > Its also cool to see that it works on the local object store, but the output appears duplicated: @nuno-faria thanks for giving this a test run! Looking at this output I in

Re: [PR] feat: Support distributed plan in `EXPLAIN` command [datafusion-ballista]

2025-09-08 Thread via GitHub
milenkovicm commented on code in PR #1309: URL: https://github.com/apache/datafusion-ballista/pull/1309#discussion_r2331186349 ## ballista/client/tests/context_unsupported.rs: ## Review Comment: I guess this test should not panic; it should move to `context_checks.rs`.

Re: [I] Support aggregates and constant filters in `QUALIFY` [datafusion]

2025-09-08 Thread via GitHub
alamb closed issue #17210: Support aggregates and constant filters in `QUALIFY` URL: https://github.com/apache/datafusion/issues/17210

[I] Improve performance of `PartialOrd` for logical nodes [datafusion]

2025-09-08 Thread via GitHub
alamb opened a new issue, #17477: URL: https://github.com/apache/datafusion/issues/17477 `PartialOrd` / `PartialEq` are used during planning and thus affect planning performance. We found some improvements that are possible in - https://github.com/apache/datafusion/pull/17438

Re: [PR] Fix array types coercion: preserve child element nullability for list types [datafusion]

2025-09-08 Thread via GitHub
sgrebnov commented on PR #17306: URL: https://github.com/apache/datafusion/pull/17306#issuecomment-3267701179 Thank you @alamb and @joroKr21 for the review!

Re: [PR] Unnest Correlated Subquery [datafusion]

2025-09-08 Thread via GitHub
duongcongtoai commented on PR #17110: URL: https://github.com/apache/datafusion/pull/17110#issuecomment-3267559977 PR to fix null propagation: https://github.com/irenjj/datafusion/pull/1/files

[I] Custom authentication [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new issue, #2341: URL: https://github.com/apache/datafusion-comet/issues/2341 ### What is the problem the feature request solves? # Custom Authentication & External File Systems *(Access hdfs/hadoop-aws via JNI)* ## 1. HDFS support via `fs-hdfs` - [x] Fo

Re: [PR] make `giscus` comment section opt-in to comply with ASF policy [datafusion-site]

2025-09-08 Thread via GitHub
alamb commented on PR #106: URL: https://github.com/apache/datafusion-site/pull/106#issuecomment-3267328816 > one step closer! thanks for the review thank you for pushing this forward

Re: [I] TakeOrderedAndProjectExec is not reporting all fallback reasons [datafusion-comet]

2025-09-08 Thread via GitHub
kazuyukitanimura closed issue #2311: TakeOrderedAndProjectExec is not reporting all fallback reasons URL: https://github.com/apache/datafusion-comet/issues/2311

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-08 Thread via GitHub
xanderbailey commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2329515274 ## datafusion/substrait/tests/cases/logical_plans.rs: ## @@ -144,6 +144,47 @@ mod tests { Ok(()) } +#[tokio::test] +async fn null_lite

[I] [native_iceberg_compat] Add support for custom authentication [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new issue, #2340: URL: https://github.com/apache/datafusion-comet/issues/2340 ### What is the problem the feature request solves? This is mostly a documentation and testing task, since this is already implemented. ### Describe the potential solution _

[I] [native_iceberg_compat] Add support for Parquet modular decryption [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove opened a new issue, #2339: URL: https://github.com/apache/datafusion-comet/issues/2339 ### What is the problem the feature request solves? Placeholder. Details TBD. - Comet needs native KMS provider that can call into Spark via JNI ### Describe the potential sol

[PR] Add support for ClickHouse CSE. [datafusion-sqlparser-rs]

2025-09-08 Thread via GitHub
pravic opened a new pull request, #2024: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2024 https://clickhouse.com/docs/sql-reference/statements/select/with#common-scalar-expressions: ```sql WITH <expression> AS <identifier> ``` fixes #1514. Unfortunately, this changes the publi
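
Because both forms start with `WITH ... AS`, the parser has to tell them apart: a CTE is `name AS ( subquery )`, while a ClickHouse common scalar expression is `<expression> AS <identifier>`. A toy classifier over pre-split tokens shows one way to make that decision; `WithItem` and `classify_with_item` are hypothetical and are not the `parse_cte_or_cse` added in this PR:

```rust
/// Hypothetical classification of one item in a WITH clause.
#[derive(Debug, PartialEq)]
pub enum WithItem<'a> {
    /// `name AS ( subquery )`
    Cte { name: &'a str },
    /// `<expression> AS <identifier>`
    Cse { alias: &'a str },
}

/// `tokens` is one WITH item, already split into tokens by a toy lexer.
pub fn classify_with_item<'a>(tokens: &[&'a str]) -> Option<WithItem<'a>> {
    // CTE: identifier, AS, then an opening parenthesis.
    if tokens.len() >= 3 && tokens[1].eq_ignore_ascii_case("AS") && tokens[2] == "(" {
        return Some(WithItem::Cte { name: tokens[0] });
    }
    // CSE: a non-empty expression followed by `AS <identifier>`.
    match tokens {
        [expr @ .., as_kw, alias] if !expr.is_empty() && as_kw.eq_ignore_ascii_case("AS") => {
            Some(WithItem::Cse { alias: *alias })
        }
        _ => None,
    }
}
```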

Re: [D] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet - Apache DataFusion Blog [datafusion-site]

2025-09-08 Thread via GitHub
GitHub user giscus[bot] closed a discussion: Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet - Apache DataFusion Blog # Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet - Apache DataFusion

Re: [I] Push down entire hash table from HashJoinExec into scans [datafusion]

2025-09-08 Thread via GitHub
alamb commented on issue #17171: URL: https://github.com/apache/datafusion/issues/17171#issuecomment-3267280510 > My counter argument to this would be that this is only a problem if the size of your build side ≈ the size of your probe side, but if that's the case you already probably have a
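
The idea behind this issue is that the build side of a hash join already holds every key that can match, so the probe-side scan could drop rows whose key is absent before they ever reach the join. A minimal sketch with a single `i64` key, using a `HashSet` as a stand-in for the join's hash table; the functions are hypothetical, not `HashJoinExec` internals:

```rust
use std::collections::HashSet;

/// Build phase: collect the distinct join keys seen on the build side.
pub fn build_key_set(build_keys: &[i64]) -> HashSet<i64> {
    build_keys.iter().copied().collect()
}

/// Scan-side pre-filter: keep only probe rows whose key is in the build-side
/// set; in an inner join every other row is guaranteed to find no match.
pub fn prefilter_probe<V: Clone>(probe: &[(i64, V)], keys: &HashSet<i64>) -> Vec<(i64, V)> {
    probe
        .iter()
        .filter(|(key, _)| keys.contains(key))
        .cloned()
        .collect()
}
```

The trade-off raised in the thread follows directly: the membership test costs roughly a hash lookup per probe row, so it pays off mainly when it eliminates many rows early.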

Re: [PR] make `giscus` comment section opt-in to comply with ASF policy [datafusion-site]

2025-09-08 Thread via GitHub
alamb merged PR #106: URL: https://github.com/apache/datafusion-site/pull/106

Re: [PR] make `giscus` comment section opt-in to comply with ASF policy [datafusion-site]

2025-09-08 Thread via GitHub
alamb commented on PR #106: URL: https://github.com/apache/datafusion-site/pull/106#issuecomment-3267236037 Thanks again @kevinjqliu

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-09-08 Thread via GitHub
BlakeOrth commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3267229694 @alamb Thanks for the review! I'll take a look into why it's suddenly stopped working (or perhaps it's a "works on my machine" situation, which is also never good). > I thin

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub
djanderson commented on code in PR #103: URL: https://github.com/apache/datafusion-site/pull/103#discussion_r2330868489 ## content/blog/2025-09-10-dynamic-filters.md: ## @@ -0,0 +1,643 @@ +--- +layout: post +title: Dynamic Filters: Passing Information Between Operators During Ex

Re: [PR] Generalize struct-to-struct casting with CastOptions and SchemaAdapter integration [datafusion]

2025-09-08 Thread via GitHub
adriangb commented on PR #17468: URL: https://github.com/apache/datafusion/pull/17468#issuecomment-3267209382 Btw I approved but let's leave this up for another day or so to see if anyone else has feedback

Re: [PR] Generalize struct-to-struct casting with CastOptions and SchemaAdapter integration [datafusion]

2025-09-08 Thread via GitHub
adriangb commented on code in PR #17468: URL: https://github.com/apache/datafusion/pull/17468#discussion_r2330852818 ## datafusion/common/src/nested_struct.rs: ## @@ -215,40 +271,81 @@ mod tests { }; } +fn field(name: &str, data_type: DataType) -> Field { +

Re: [PR] feat: Implement `DFSchema.print_schema_tree()` method [datafusion]

2025-09-08 Thread via GitHub
comphead commented on PR #17459: URL: https://github.com/apache/datafusion/pull/17459#issuecomment-3267152875 > BTW I was wondering if this would be a better default Display for DFSchema, but it seems like there is already a default implementation `Display` provides more information n

Re: [PR] fix: modify the type coercion logic to avoid planning error [datafusion]

2025-09-08 Thread via GitHub
kosiew commented on code in PR #17418: URL: https://github.com/apache/datafusion/pull/17418#discussion_r2329411507 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -316,6 +321,17 @@ impl<'a> BinaryTypeCoercer<'a> { } } +#[inline] +fn is_both_null(lhs: &DataTy

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub
adriangb commented on code in PR #103: URL: https://github.com/apache/datafusion-site/pull/103#discussion_r2330804693 ## content/images/dynamic-filters/execution-time.svg: ## Review Comment: shared with you

Re: [I] Simplify `col1 || 'a' || 'b' || col2` to `col1 || 'ab' || col2` [datafusion]

2025-09-08 Thread via GitHub
pepijnve commented on issue #17158: URL: https://github.com/apache/datafusion/issues/17158#issuecomment-3265646839 @alamb I had a quick look at the current implementation that handles this. I believe this case is now handled by `ConstEvaluator`. If I'm reading the code correctly the impleme

Re: [PR] Improve `PartialEq`, `Eq` speed for `LexOrdering`, make `PartialEq` and `PartialOrd` consistent [datafusion]

2025-09-08 Thread via GitHub
alamb commented on code in PR #17442: URL: https://github.com/apache/datafusion/pull/17442#discussion_r2330795717 ## datafusion/physical-expr-common/src/sort_expr.rs: ## @@ -367,8 +367,21 @@ impl LexOrdering { /// Creates a new [`LexOrdering`] from the given vector of sort

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-08 Thread via GitHub
adriangb commented on code in PR #17337: URL: https://github.com/apache/datafusion/pull/17337#discussion_r2330795001 ## datafusion/optimizer/src/push_down_sort.rs: ## @@ -0,0 +1,580 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Dynamic filters blog post (rev 2) [datafusion-site]

2025-09-08 Thread via GitHub
djanderson commented on code in PR #103: URL: https://github.com/apache/datafusion-site/pull/103#discussion_r2330769637 ## content/images/dynamic-filters/execution-time.svg: ## Review Comment: The figure description did make it clear, btw, but I got stuck on the figure its

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-09-08 Thread via GitHub
vbarua commented on code in PR #17299: URL: https://github.com/apache/datafusion/pull/17299#discussion_r2330752844 ## datafusion/substrait/src/logical_plan/consumer/rel/project_rel.rs: ## @@ -62,7 +62,17 @@ pub async fn from_project_rel( // to transform it into

Re: [PR] docs: Update supported expressions and operators in user guide [datafusion-comet]

2025-09-08 Thread via GitHub
comphead commented on code in PR #2327: URL: https://github.com/apache/datafusion-comet/pull/2327#discussion_r2330746423 ## docs/source/user-guide/latest/operators.md: ## @@ -22,16 +22,24 @@ The following Spark operators are currently replaced with native versions. Query stage

Re: [PR] fix: Support aggregate expressions in `QUALIFY` [datafusion]

2025-09-08 Thread via GitHub
alamb merged PR #17313: URL: https://github.com/apache/datafusion/pull/17313

[PR] Support join cardinality estimation if distinct_count is set [datafusion]

2025-09-08 Thread via GitHub
jackkleeman opened a new pull request, #17476: URL: https://github.com/apache/datafusion/pull/17476 The goal of this PR is to allow cardinality statistics to be passed through joins even if fields don't have max and min values set, as long as a distinct value estimate is provided. Cu

Re: [PR] [branch-50] fix: Implement AggregateUDFImpl::reverse_expr for StringAgg (#17165) [datafusion]

2025-09-08 Thread via GitHub
alamb commented on PR #17473: URL: https://github.com/apache/datafusion/pull/17473#issuecomment-3266991329 Thanks @comphead and @nuno-faria

Re: [PR] Push down preferred sorts into `TableScan` logical plan node [datafusion]

2025-09-08 Thread via GitHub
adriangb commented on code in PR #17337: URL: https://github.com/apache/datafusion/pull/17337#discussion_r2330741000 ## datafusion/optimizer/src/push_down_sort.rs: ## @@ -0,0 +1,580 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] docs: Update supported expressions and operators in user guide [datafusion-comet]

2025-09-08 Thread via GitHub
comphead commented on code in PR #2327: URL: https://github.com/apache/datafusion-comet/pull/2327#discussion_r2330739057 ## docs/source/user-guide/latest/datatypes.md: ## @@ -19,27 +19,29 @@ # Supported Spark Data Types Review Comment: when Comet says supported does it me

Re: [PR] docs: Use `sphinx-reredirects` for redirects [datafusion-comet]

2025-09-08 Thread via GitHub
andygrove merged PR #2324: URL: https://github.com/apache/datafusion-comet/pull/2324

[PR] fix(SubqueryAlias): use maybe_project_redundant_column [datafusion]

2025-09-08 Thread via GitHub
notfilippo opened a new pull request, #17478: URL: https://github.com/apache/datafusion/pull/17478 ## Which issue does this PR close? - Closes #17405. ## Rationale for this change When creating nested `SubqueryAlias` operations in complex joins, DataFusion was incorrectl
