[I] The SQL Unparser `unproject_sort_expr` does not handle arbitrary expressions [datafusion]

2025-05-20 Thread via GitHub
phillipleblanc opened a new issue, #16126: URL: https://github.com/apache/datafusion/issues/16126 ### Describe the bug DataFusion turns aggregation computations from a LogicalPlan node into column references in the higher level plans. To illustrate, consider the following plan:

Re: [PR] Set `TrackConsumersPool` as default in datafusion-cli [datafusion]

2025-05-20 Thread via GitHub
ding-young commented on PR #16081: URL: https://github.com/apache/datafusion/pull/16081#issuecomment-2896605817 Thank you @2010YOUY01 @jfahne . I'll address them by today. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] Set `TrackConsumersPool` as default in datafusion-cli [datafusion]

2025-05-20 Thread via GitHub
2010YOUY01 commented on code in PR #16081: URL: https://github.com/apache/datafusion/pull/16081#discussion_r2099336841 ## datafusion-cli/tests/cli_integration.rs: ## @@ -122,6 +122,42 @@ fn test_cli_format<'a>(#[case] format: &'a str) { assert_cmd_snapshot!(cmd); } +#[rs

Re: [PR] Set `TrackConsumersPool` as default in datafusion-cli [datafusion]

2025-05-20 Thread via GitHub
Copilot commented on code in PR #16081: URL: https://github.com/apache/datafusion/pull/16081#discussion_r2099329043 ## datafusion-cli/src/main.rs: ## @@ -169,9 +179,22 @@ async fn main_inner() -> Result<()> { if let Some(memory_limit) = args.memory_limit { // set m

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2099256280 ## src/parser/mod.rs: ## @@ -475,6 +475,10 @@ impl<'a> Parser<'a> { if expecting_statement_delimiter && word.keyword == Keyword::E

[PR] Phillip/250521 fix sort unproject unparser upstream [datafusion]

2025-05-20 Thread via GitHub
phillipleblanc opened a new pull request, #16127: URL: https://github.com/apache/datafusion/pull/16127 ## Which issue does this PR close? - Closes #16126 ## Rationale for this change DataFusion turns aggregation computations from a LogicalPlan node into column references

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2896488108 The illustration makes a lot of sense to me, thanks @aharpervc! I think we should be able to go with the current idea in this PR. Could you take a look at the conflicts in

Re: [PR] Handle optional datatypes properly in `CREATE FUNCTION` statements [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio merged PR #1826: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1826

Re: [PR] pretty-print CREATE VIEW statements [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
iffyio merged PR #1855: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1855

[PR] [datafusion-spark] Implement `factorical` function [datafusion]

2025-05-20 Thread via GitHub
tlm365 opened a new pull request, #16125: URL: https://github.com/apache/datafusion/pull/16125 ## Which issue does this PR close? - Closes #16124 . ## Rationale for this change ## What changes are included in this PR? Implement spark-compatible `factori

[I] [datafusion-spark] Implement `factorial` function [datafusion]

2025-05-20 Thread via GitHub
tlm365 opened a new issue, #16124: URL: https://github.com/apache/datafusion/issues/16124 ### Is your feature request related to a problem or challenge? - Part of #15914 ### Describe the solution you'd like Implement spark-compatible [factorial](https://spark.apache.org

Re: [I] Move prepare/parameter handling tests into their own module [datafusion]

2025-05-20 Thread via GitHub
liamzwbao commented on issue #16056: URL: https://github.com/apache/datafusion/issues/16056#issuecomment-2896214930 take

Re: [PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
codecov-commenter commented on PR #1761: URL: https://github.com/apache/datafusion-comet/pull/1761#issuecomment-2896101349 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1761?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: Add support for `expm1` expression from `datafusion-spark` crate [datafusion-comet]

2025-05-20 Thread via GitHub
codecov-commenter commented on PR #1711: URL: https://github.com/apache/datafusion-comet/pull/1711#issuecomment-2896081936 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1711?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1761: URL: https://github.com/apache/datafusion-comet/pull/1761#issuecomment-2896038701 Perhaps this PR should also remove this item from the compatibility guide? ``` - Reading legacy INT96 timestamps contained within complex types can produce different

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#discussion_r2098991685 ## spark/src/test/scala/org/apache/comet/parquet/ParquetReadSuite.scala: ## @@ -1233,7 +1233,9 @@ abstract class ParquetReadSuite extends CometTestBas

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098980947 ## native/core/src/execution/planner.rs: ## @@ -1108,6 +1108,44 @@ impl PhysicalPlanner { .map(|expr| self.create_expr(expr, Arc::clon

Re: [PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on PR #1761: URL: https://github.com/apache/datafusion-comet/pull/1761#issuecomment-2895978064 This should be ready for review now.

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2098954205 ## datafusion/core/src/lib.rs: ## @@ -488,16 +488,16 @@ //! DataFusion automatically runs each plan with multiple CPU cores using //! a [Tokio] [`Runtime`] as a

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove merged PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710

Re: [I] commit 304488d3... (2025-02-05) broke JOIN ... USING("UPPERCASE_FIELD_NAME") [datafusion]

2025-05-20 Thread via GitHub
jfahne commented on issue #16120: URL: https://github.com/apache/datafusion/issues/16120#issuecomment-2895962861 take

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2098957264 ## datafusion/core/src/lib.rs: ## @@ -311,9 +311,9 @@ //! ``` //! //! A [`TableProvider`] provides information for planning and -//! an [`ExecutionPlan`]s for

Re: [PR] chore: Use materialized data for filter pushdown tests [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16123: URL: https://github.com/apache/datafusion/pull/16123#discussion_r2098950624 ## datafusion/core/tests/parquet/filter_pushdown.rs: ## @@ -32,50 +32,41 @@ use arrow::compute::concat_batches; use arrow::record_batch::RecordBatch; use datafu

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
comphead commented on code in PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#discussion_r2098944769 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -41,36 +41,44 @@ import org.apache.comet.parquet.{CometParquetScan, SupportsComet}

[PR] chore: Use pre created data for filter pushdown tests [datafusion]

2025-05-20 Thread via GitHub
comphead opened a new pull request, #16123: URL: https://github.com/apache/datafusion/pull/16123 ## Which issue does this PR close? - Closes #. ## Rationale for this change When working on #16062 I found random data generators based on `random` crate are prone to hav

Re: [PR] chore: Upgrade rand crate and some other minor crates [datafusion]

2025-05-20 Thread via GitHub
comphead commented on PR #16062: URL: https://github.com/apache/datafusion/pull/16062#issuecomment-2895914995 depends on #16123

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098863135 ## native/core/src/execution/planner.rs: ## @@ -1108,6 +1108,44 @@ impl PhysicalPlanner { .map(|expr| self.create_expr(expr, Arc::clone

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
patrickcsullivan commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2098899042 ## datafusion/core/src/lib.rs: ## @@ -617,8 +617,8 @@ //! The state required to execute queries is managed by the following //! structures: //! -//! 1.

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-05-20 Thread via GitHub
jonathanc-n commented on PR #16083: URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2895840655 @2010YOUY01 I think I'll try to get SQL queries optimized into a right mark join after support for symmetric hash join + sort merge join. Right mark is equivalent to left

Re: [PR] Move PruningStatistics into datafusion::common [datafusion]

2025-05-20 Thread via GitHub
adriangb commented on PR #16069: URL: https://github.com/apache/datafusion/pull/16069#issuecomment-2895831498 Anything needed on my end to merge?

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#issuecomment-2895804263 Open for review again.

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on code in PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#discussion_r2098855366 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -229,7 +232,13 @@ public NativeBatchReader(AbstractColumnReader[] columnRe

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#issuecomment-2895798125 Ouch.

[PR] Shift from Field to FieldRef for all user defined functions [datafusion]

2025-05-20 Thread via GitHub
timsaucer opened a new pull request, #16122: URL: https://github.com/apache/datafusion/pull/16122 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16121 ## Rationale for this change With the switch from `DataType` to `Field` we may h

Re: [PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
findepi commented on code in PR #16119: URL: https://github.com/apache/datafusion/pull/16119#discussion_r2098815968 ## datafusion/core/src/lib.rs: ## @@ -617,8 +617,8 @@ //! The state required to execute queries is managed by the following //! structures: //! -//! 1. [`Sessio

Re: [PR] Clean up ExternalSorter and use upstream kernel [datafusion]

2025-05-20 Thread via GitHub
Dandandan merged PR #16109: URL: https://github.com/apache/datafusion/pull/16109

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-20 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2895687205 @kosiew that sounds good but please keep this working branch around on the `schema-adapter` branch. Maybe you can cherry-pick the changes onto a new branch when you break these

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#issuecomment-2895677097 > Is it possible to have arrow-rs 55.1.0 in datafusion 48.0.0.? A performance improvement went in for int8/int16 which was as a result of the unsigned int issues we raised. Th

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098749560 ## native/core/src/execution/operators/copy.rs: ## @@ -148,7 +148,7 @@ impl ExecutionPlan for CopyExec { } fn statistics(&self) -> DataFusionResu

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098749131 ## native/core/src/execution/planner.rs: ## @@ -884,7 +884,7 @@ impl PhysicalPlanner { func_name, fun_expr,

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098732085 ## native/core/src/parquet/schema_adapter.rs: ## @@ -226,6 +227,13 @@ impl SchemaMapper for SchemaMapping { let record_batch = RecordBatch::try_new_

Re: [PR] build: disable doctests in miri workflow [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove closed pull request #1749: build: disable doctests in miri workflow URL: https://github.com/apache/datafusion-comet/pull/1749

Re: [PR] build: disable doctests in miri workflow [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1749: URL: https://github.com/apache/datafusion-comet/pull/1749#issuecomment-2895640145 This fix got merged as part of https://github.com/apache/datafusion-comet/pull/1746 so I will close this PR. Thanks for the review @kazuyukitanimura.

[I] commit 304488d3... (2025-02-05) broke JOIN ... USING("UPPERCASE_FIELD_NAME") [datafusion]

2025-05-20 Thread via GitHub
brunal opened a new issue, #16120: URL: https://github.com/apache/datafusion/issues/16120 ### Describe the bug Commit https://github.com/apache/datafusion/commit/304488d348ad2c952ce24f93064a81046155da79 updated sqlparser (0.53->0.54) and updated datafusion source for it. It b

Re: [PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1761: URL: https://github.com/apache/datafusion-comet/pull/1761#discussion_r2098673404 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -220,12 +220,7 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSpar

[PR] chore: Reenable nested types for CometFuzzTestSuite with int96 [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich opened a new pull request, #1761: URL: https://github.com/apache/datafusion-comet/pull/1761 Builds on #1710. ## Which issue does this PR close? Closes #. ## Rationale for this change https://github.com/apache/datafusion/pull/16058 Fixed s

Re: [I] Schema contains qualified field name left."concat('a', 'b')" and unqualified field name "concat('a', 'b')" which would be ambiguous [datafusion]

2025-05-20 Thread via GitHub
LiaCastaneda commented on issue #16114: URL: https://github.com/apache/datafusion/issues/16114#issuecomment-2895490377 take

Re: [PR] pipe column orderings into pruning predicate creation [datafusion]

2025-05-20 Thread via GitHub
etseidl commented on code in PR #15821: URL: https://github.com/apache/datafusion/pull/15821#discussion_r2098666984 ## datafusion/physical-optimizer/src/pruning.rs: ## @@ -1566,6 +1599,50 @@ fn build_predicate_expression( return expr; } +// Special handlng fo

Re: [I] Schema contains qualified field name left."concat('a', 'b')" and unqualified field name "concat('a', 'b')" which would be ambiguous [datafusion]

2025-05-20 Thread via GitHub
LiaCastaneda commented on issue #16114: URL: https://github.com/apache/datafusion/issues/16114#issuecomment-2895491802 Going to try a solution I have in mind; it should (hopefully) be quite straightforward.

[PR] docs: Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
patrickcsullivan opened a new pull request, #16119: URL: https://github.com/apache/datafusion/pull/16119 ## Which issue does this PR close? - Closes #16118. ## What changes are included in this PR? Fixes various minor typos and grammatical issues in the Architecture docs.

Re: [PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#issuecomment-2895481116 @parthchandra @mbutrovich This is ready for review now. I don't know if we want to keep in draft until more complete or merge and iterate. I also did not make auto the default

Re: [PR] feat: Add auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2098652013 ## docs/source/user-guide/compatibility.md: ## @@ -29,12 +29,6 @@ Comet aims to provide consistent results with the version of Apache Spark that i This g

[I] Fix typos and minor grammatical issues in Architecture docs [datafusion]

2025-05-20 Thread via GitHub
patrickcsullivan opened a new issue, #16118: URL: https://github.com/apache/datafusion/issues/16118 While reading the [architecture docs](https://docs.rs/datafusion/latest/datafusion/#architecture) for the first time I noticed a few typos and minor grammatical issues. Example:

Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-20 Thread via GitHub
GitHub user Epicism added a comment to the discussion: Indexing Support in DataFusion? This is amazing! I can't wait to go through this. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210286 This is an automatically sent email for github@datafusi

Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-20 Thread via GitHub
GitHub user Epicism deleted a comment on the discussion: Indexing Support in DataFusion? This is amazing! I can't wait to go through this. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210271

Re: [D] Indexing Support in DataFusion? [datafusion]

2025-05-20 Thread via GitHub
GitHub user Epicism added a comment to the discussion: Indexing Support in DataFusion? This is amazing! I can't wait to go through this. GitHub link: https://github.com/apache/datafusion/discussions/9963#discussioncomment-13210271

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#issuecomment-2895355782 `SQLConf.PARQUET_FIELD_ID_READ_ENABLED` is enabled in **all** Spark tests, so not sure what to do about this now.

Re: [PR] chore: Add `scanImpl` attribute to `CometScanExec` [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove merged PR #1746: URL: https://github.com/apache/datafusion-comet/pull/1746

Re: [PR] chore: Enable more complex type tests [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on PR #1753: URL: https://github.com/apache/datafusion-comet/pull/1753#issuecomment-2895323808 Late approval. Thanks for the new tests!

[PR] Docs: Setup Comet on IntelliJ [datafusion-comet]

2025-05-20 Thread via GitHub
coderfender opened a new pull request, #1760: URL: https://github.com/apache/datafusion-comet/pull/1760 ## Which issue does this PR close? Closes #1759 Closes #. ## Rationale for this change Docs updated to set up Comet and Spark in IntelliJ and open up debugging

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098534086 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2327,18 +2346,18 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098532793 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -99,6 +100,56 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSpark

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098530916 ## spark/src/test/scala/org/apache/comet/CometFuzzTestSuite.scala: ## @@ -99,6 +100,56 @@ class CometFuzzTestSuite extends CometTestBase with AdaptiveSparkP

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098530199 ## native/core/src/parquet/schema_adapter.rs: ## @@ -196,15 +198,43 @@ impl SchemaMapper for SchemaMapping { // go through each field in the pr

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#issuecomment-2895301640 Pre-emptively approving this. We can defer field_id support for native readers for the time being.

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#issuecomment-2895297102 Is it possible to have arrow-rs 55.1.0 in datafusion 48.0.0.? A performance improvement went in for int8/int16 which was as a result of the unsigned int issues we raised. T

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098529235 ## native/core/src/parquet/parquet_support.rs: ## @@ -60,9 +60,6 @@ pub struct SparkParquetOptions { pub allow_incompat: bool, /// Support casting

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098527578 ## native/core/src/parquet/mod.rs: ## @@ -715,6 +715,7 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat fil

Re: [PR] fix: default values for experimental native_datafusion scan [datafusion-comet]

2025-05-20 Thread via GitHub
mbutrovich commented on code in PR #1756: URL: https://github.com/apache/datafusion-comet/pull/1756#discussion_r2098528439 ## native/core/src/parquet/parquet_exec.rs: ## @@ -61,12 +63,14 @@ pub(crate) fn init_datasource_exec( file_groups: Vec>, projection_vector: Optio

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
codecov-commenter commented on PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#issuecomment-2895250157 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1757?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: make error handling in indent explain consistent with that in tree [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #16097: URL: https://github.com/apache/datafusion/pull/16097

Re: [PR] chore(deps): bump testcontainers from 0.23.3 to 0.24.0 [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #15989: URL: https://github.com/apache/datafusion/pull/15989

Re: [PR] chore(deps): bump testcontainers-modules from 0.11.6 to 0.12.0 [datafusion]

2025-05-20 Thread via GitHub
alamb closed pull request #16107: chore(deps): bump testcontainers-modules from 0.11.6 to 0.12.0 URL: https://github.com/apache/datafusion/pull/16107

Re: [PR] chore(deps): bump testcontainers-modules from 0.11.6 to 0.12.0 [datafusion]

2025-05-20 Thread via GitHub
dependabot[bot] commented on PR #16107: URL: https://github.com/apache/datafusion/pull/16107#issuecomment-2895243129 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [I] mac CI tests failing with `fatal runtime error: stack overflow` [datafusion]

2025-05-20 Thread via GitHub
alamb commented on issue #16117: URL: https://github.com/apache/datafusion/issues/16117#issuecomment-2895171572 Weird but some CI runs on main also pass: https://github.com/apache/datafusion/actions/runs/15142926577/job/42571476975

Re: [PR] Support metadata on scalar values [datafusion]

2025-05-20 Thread via GitHub
tobixdev commented on code in PR #16053: URL: https://github.com/apache/datafusion/pull/16053#discussion_r2098428747 ## datafusion/physical-expr/src/expressions/literal.rs: ## @@ -34,15 +36,37 @@ use datafusion_expr_common::interval_arithmetic::Interval; use datafusion_expr_com

Re: [PR] Minor: Add `ScalarFunctionArgs::return_type` method [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16113: URL: https://github.com/apache/datafusion/pull/16113#issuecomment-2895156844 FYI @timsaucer

Re: [PR] pretty-print CREATE TABLE statements [datafusion-sqlparser-rs]

2025-05-20 Thread via GitHub
alamb merged PR #1854: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1854

[I] mac CI tests failing with `fatal runtime error: stack overflow` [datafusion]

2025-05-20 Thread via GitHub
alamb opened a new issue, #16117: URL: https://github.com/apache/datafusion/issues/16117 ### Describe the bug CI is failing on main and some PRs https://github.com/apache/datafusion/actions/runs/15138796927/job/42557036842 ``` thread 'tokio-runtime-worker' has over

Re: [PR] Use qualified names on DELETE selections [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #16033: URL: https://github.com/apache/datafusion/pull/16033

Re: [PR] Use qualified names on DELETE selections [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #16033: URL: https://github.com/apache/datafusion/pull/16033#issuecomment-2895133818 The CI failure is unrelated to this PR - https://github.com/apache/datafusion/issues/16117 So merging in. Thanks @nuno-faria and @comphead

Re: [I] Type coercion does not handle `Float16` correctly [datafusion]

2025-05-20 Thread via GitHub
alamb closed issue #15815: Type coercion does not handle `Float16` correctly URL: https://github.com/apache/datafusion/issues/15815

Re: [PR] fix: Add coercion rules for Float16 types [datafusion]

2025-05-20 Thread via GitHub
alamb merged PR #15816: URL: https://github.com/apache/datafusion/pull/15816

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098402731 ## native/core/src/execution/planner.rs: ## @@ -884,7 +884,7 @@ impl PhysicalPlanner { func_name, fun_expr,

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098399653 ## native/core/src/parquet/schema_adapter.rs: ## @@ -226,6 +227,13 @@ impl SchemaMapper for SchemaMapping { let record_batch = RecordBatch::t

Re: [PR] fix: correct schema type checking in native_iceberg_compat [datafusion-comet]

2025-05-20 Thread via GitHub
parthchandra commented on PR #1755: URL: https://github.com/apache/datafusion-comet/pull/1755#issuecomment-2895101624 Thanks for the initial review Andy. I've changed this to draft while I investigate the ci failures.

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098377650 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -137,7 +137,7 @@ impl ExecutionPlan for ShuffleWriterExec { } fn statistics(&self)

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#issuecomment-2895073250 This may make too many tests fall back because Spark may be enabling this by default in all tests ... investigating

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
comphead commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098373625 ## native/core/src/execution/shuffle/shuffle_writer.rs: ## @@ -137,7 +137,7 @@ impl ExecutionPlan for ShuffleWriterExec { } fn statistics(&self) -

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
comphead commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098371824 ## native/core/src/execution/planner.rs: ## @@ -884,7 +884,7 @@ impl PhysicalPlanner { func_name, fun_expr,

Re: [PR] chore: Prepare for DataFusion 48.0.0 [datafusion-comet]

2025-05-20 Thread via GitHub
comphead commented on code in PR #1710: URL: https://github.com/apache/datafusion-comet/pull/1710#discussion_r2098368603 ## native/core/src/execution/operators/copy.rs: ## @@ -148,7 +148,7 @@ impl ExecutionPlan for CopyExec { } fn statistics(&self) -> DataFusionResul

Re: [PR] chore: Enable more complex type tests [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove merged PR #1753: URL: https://github.com/apache/datafusion-comet/pull/1753

[I] [native_datafusion] PARQUET_FIELD_ID_READ_ENABLED is not respected [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove opened a new issue, #1758: URL: https://github.com/apache/datafusion-comet/issues/1758 ### Describe the bug There are a number of Spark SQL tests failing related to field ids when reading Parquet files when we use native_datafusion / native_iceberg_compat because these new

Re: [PR] Improve the DML / DDL Documentation [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16115: URL: https://github.com/apache/datafusion/pull/16115#discussion_r2098353174 ## datafusion/expr/src/logical_plan/dml.rs: ## @@ -89,8 +89,27 @@ impl Hash for CopyTo { } } -/// The operator that modifies the content of a database (ad

Re: [PR] Improve the DML / DDL Documentation [datafusion]

2025-05-20 Thread via GitHub
comphead commented on code in PR #16115: URL: https://github.com/apache/datafusion/pull/16115#discussion_r2098352617 ## datafusion/expr/src/logical_plan/dml.rs: ## @@ -89,8 +89,27 @@ impl Hash for CopyTo { } } -/// The operator that modifies the content of a database (ad

Re: [PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove commented on code in PR #1757: URL: https://github.com/apache/datafusion-comet/pull/1757#discussion_r2098328838 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -41,36 +41,44 @@ import org.apache.comet.parquet.{CometParquetScan, SupportsComet}

[PR] fix: Fallback to Spark when PARQUET_FIELD_ID_READ_ENABLED=true for new native scans [datafusion-comet]

2025-05-20 Thread via GitHub
andygrove opened a new pull request, #1757: URL: https://github.com/apache/datafusion-comet/pull/1757 ## Which issue does this PR close? N/A ## Rationale for this change Fix some Spark SQL test failures when the new native scans are enabled ## What

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2894976792 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark sort_tpch.json ┏━━━

Re: [PR] Perf: Optimize in memory sort [datafusion]

2025-05-20 Thread via GitHub
alamb commented on PR #15380: URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2894965327 🤖: Benchmark completed Details ``` Comparing HEAD and concat_batches_for_sort Benchmark clickbench_extended.json --
