Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1605955848 ## docs/source/user-guide/expressions.md: ## @@ -304,6 +304,16 @@ select log(-1), log(0), sqrt(-1); | rollup(exprs)

Re: [PR] Improve signature of `get_field` function [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 merged PR #10569: URL: https://github.com/apache/datafusion/pull/10569 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] Improve signature of `get_field` is function [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 closed issue #10566: Improve signature of `get_field` is function URL: https://github.com/apache/datafusion/issues/10566

[PR] Minor: Move group accumulator for aggregate function to physical-expr-common, and add ahash physical-expr-common [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 opened a new pull request, #10574: URL: https://github.com/apache/datafusion/pull/10574 ## Which issue does this PR close? Closes #. ## Rationale for this change 1. add ahash for common, used for distinct count accumulator #10484 2. move other g

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-18 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605950326 ## datafusion/sqllogictest/test_files/array.slt: ## Review Comment: This could be a bug after the JSON path parse changes.

Re: [PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya merged PR #447: URL: https://github.com/apache/datafusion-comet/pull/447

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-18 Thread via GitHub
backkem commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2119108342 Yes, these are basically the same object. The one in DataFusion was put there temporarily until the trait extension in the sqlparser repo has landed and been pushed to crates.io.

Re: [PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on PR #447: URL: https://github.com/apache/datafusion-comet/pull/447#issuecomment-2119108275 Merged. Thanks @sunchao

Re: [I] CometNativeExec.doCanonicalize should canonicalize SparkPlan in Product parameters [datafusion-comet]

2024-05-18 Thread via GitHub
viirya closed issue #448: CometNativeExec.doCanonicalize should canonicalize SparkPlan in Product parameters URL: https://github.com/apache/datafusion-comet/issues/448

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-18 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605947314 ## datafusion/sqllogictest/test_files/array.slt: ## Review Comment: New failure: ``` Running "array.slt" External error: query failed: DataFusio

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-18 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605946538 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -689,7 +689,7 @@ select column1, column2, column3, column4, column5 from nested_arrays; # values table

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-18 Thread via GitHub
goldmedal commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2119066019 As mentioned in `dialect.rs` https://github.com/apache/datafusion/blob/e7858ff0ab1c282ab46bd93cabc3dc83db583165/datafusion/sql/src/unparser/dialect.rs#L19 I think

[PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-18 Thread via GitHub
goldmedal opened a new pull request, #10573: URL: https://github.com/apache/datafusion/pull/10573 ## Which issue does this PR close? Closes #10557 ## Rationale for this change ## What changes are included in this PR? Only implement the default dialect in this PR.

[PR] Draft: Add pyi stubs for type hinting [datafusion-python]

2024-05-18 Thread via GitHub
timsaucer opened a new pull request, #709: URL: https://github.com/apache/datafusion-python/pull/709 # Which issue does this PR close? This PR does not close an issue, but it aims to address part of the discussion in https://github.com/apache/datafusion-python/issues/440 . This takes

Re: [I] select multiple columns in a single `Expr` [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 commented on issue #10102: URL: https://github.com/apache/datafusion/issues/10102#issuecomment-2119060296 I didn't find equivalent behavior in postgres. I'm not sure whether we should support this kind of `returns subset of columns based on column name matching`

Re: [PR] Improve round-robin repartitioning [datafusion]

2024-05-18 Thread via GitHub
github-actions[bot] closed pull request #6047: Improve round-robin repartitioning URL: https://github.com/apache/datafusion/pull/6047

Re: [I] `select array_concat([])` panicked [datafusion]

2024-05-18 Thread via GitHub
jayzhan211 commented on issue #10200: URL: https://github.com/apache/datafusion/issues/10200#issuecomment-2119054943 Actually, I'm thinking about whether we should change the behavior of array_concat similar to postgres and duckdb. It is one of the earliest array functions that we don't f

Re: [PR] feat: Supports UUID column [datafusion-comet]

2024-05-18 Thread via GitHub
huaxingao commented on code in PR #395: URL: https://github.com/apache/datafusion-comet/pull/395#discussion_r1605917689 ## common/src/main/java/org/apache/comet/parquet/CometParquetToSparkSchemaConverter.scala: ## @@ -0,0 +1,403 @@ +/* + * Licensed to the Apache Software Foundat

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-18 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2119039796 Thanks for your update! I'll work on the tests.

Re: [PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on code in PR #447: URL: https://github.com/apache/datafusion-comet/pull/447#discussion_r1605914565 ## spark/src/test/resources/tpcds-plan-stability/approved-plans-v2_7/q5a/explain.txt: ## @@ -72,70 +72,16 @@ TakeOrderedAndProject (137) :

[PR] feat: add hex scalar function [datafusion-comet]

2024-05-18 Thread via GitHub
tshauck opened a new pull request, #449: URL: https://github.com/apache/datafusion-comet/pull/449 ## Which issue does this PR close? Related to https://github.com/apache/datafusion-comet/issues/341. ## Rationale for this change I recently added `unhex` so this PR adds `he

[I] CometNativeExec.doCanonicalize should canonicalize SparkPlan in Product parameters [datafusion-comet]

2024-05-18 Thread via GitHub
viirya opened a new issue, #448: URL: https://github.com/apache/datafusion-comet/issues/448 ### Describe the bug `SparkPlan.doCanonicalize` default implementation canonicalizes expressions in Product parameters, but not for `SparkPlan` because derived classes in Spark don't have su

[PR] fix: fix CometNativeExec.doCanonicalize for ReusedExchangeExec [datafusion-comet]

2024-05-18 Thread via GitHub
viirya opened a new pull request, #447: URL: https://github.com/apache/datafusion-comet/pull/447 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-18 Thread via GitHub
tshauck commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1605893848 ## docs/source/contributor-guide/adding_a_new_expression.md: ## @@ -0,0 +1,212 @@ + + +# Adding a Expression + +There are a number of Spark expression that are n

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-18 Thread via GitHub
tshauck commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1605893537 ## docs/source/contributor-guide/adding_a_new_expression.md: ## @@ -0,0 +1,212 @@ + + +# Adding a Expression + +There are a number of Spark expression that are n

[PR] build(deps): bump prost-types from 0.12.3 to 0.12.6 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #708: URL: https://github.com/apache/datafusion-python/pull/708 Bumps [prost-types](https://github.com/tokio-rs/prost) from 0.12.3 to 0.12.6. Commits https://github.com/tokio-rs/prost/commit/d42c85e790263f78f6c626ceb0dac5fda0edcb41

[PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #707: URL: https://github.com/apache/datafusion-python/pull/707 Bumps [object_store](https://github.com/apache/arrow-rs) from 0.9.1 to 0.10.1. Changelog Sourced from https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md

[PR] build(deps): bump syn from 2.0.63 to 2.0.64 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #706: URL: https://github.com/apache/datafusion-python/pull/706 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.63 to 2.0.64. Release notes Sourced from https://github.com/dtolnay/syn/releases (syn's releases). 2.0.64 Su

[PR] build(deps): bump prost from 0.12.4 to 0.12.6 [datafusion-python]

2024-05-18 Thread via GitHub
dependabot[bot] opened a new pull request, #705: URL: https://github.com/apache/datafusion-python/pull/705 Bumps [prost](https://github.com/tokio-rs/prost) from 0.12.4 to 0.12.6. Commits https://github.com/tokio-rs/prost/commit/d42c85e790263f78f6c626ceb0dac5fda0edcb41

Re: [PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-18 Thread via GitHub
codecov-commenter commented on PR #445: URL: https://github.com/apache/datafusion-comet/pull/445#issuecomment-2118934008 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/445?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] feat: Add HashJoin support for BuildRight [datafusion-comet]

2024-05-18 Thread via GitHub
codecov-commenter commented on PR #437: URL: https://github.com/apache/datafusion-comet/pull/437#issuecomment-2118931077 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/437?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[I] Row groups are read out of order or with completely different values [datafusion]

2024-05-18 Thread via GitHub
twitu opened a new issue, #10572: URL: https://github.com/apache/datafusion/issues/10572 ### Describe the bug Datafusion is reading row groups out of order and sometimes with completely different values for the row groups. The data is verified by reading the same files using the Pyth

Re: [PR] Minor: Move proxy to datafusion common [datafusion]

2024-05-18 Thread via GitHub
comphead commented on code in PR #10561: URL: https://github.com/apache/datafusion/pull/10561#discussion_r1605851167 ## datafusion/functions/Cargo.toml: ## @@ -74,7 +74,7 @@ datafusion-common = { workspace = true } datafusion-execution = { workspace = true } datafusion-expr =

Re: [PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-18 Thread via GitHub
andygrove commented on code in PR #445: URL: https://github.com/apache/datafusion-comet/pull/445#discussion_r1605843645 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -261,7 +261,10 @@ abstract class CometTestBase } val extendedInfo = ne

Re: [PR] fix: newFileScanRDD should not take constructor from custom Spark versions [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#issuecomment-2118898041 Merged. Thanks @ceppelli @kazuyukitanimura @andygrove

Re: [I] compatibility issue with AWS EMR 6.15.0 SPARK 3.4.1 [datafusion-comet]

2024-05-18 Thread via GitHub
viirya closed issue #411: compatibility issue with AWS EMR 6.15.0 SPARK 3.4.1 URL: https://github.com/apache/datafusion-comet/issues/411

Re: [PR] fix: newFileScanRDD should not take constructor from custom Spark versions [datafusion-comet]

2024-05-18 Thread via GitHub
viirya merged PR #412: URL: https://github.com/apache/datafusion-comet/pull/412

Re: [PR] fix: newFileScanRDD should not take constructor from custom Spark versions [datafusion-comet]

2024-05-18 Thread via GitHub
codecov-commenter commented on PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#issuecomment-2118876243 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/412?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-18 Thread via GitHub
goldmedal commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2118875053 > https://github.com/sqlparser-rs/sqlparser-rs/blob/54184460b5d873a67c2801e8b7c6e4f145bc65df/src/dialect/mod.rs#L113-L116 > > The dialect specific implementations just n

Re: [PR] fix: Reuse CometBroadcastExchangeExec with Spark ReuseExchangeAndSubquery rule [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on PR #441: URL: https://github.com/apache/datafusion-comet/pull/441#issuecomment-2118872698 Merged. Thanks @kazuyukitanimura @sunchao

Re: [PR] fix: Reuse CometBroadcastExchangeExec with Spark ReuseExchangeAndSubquery rule [datafusion-comet]

2024-05-18 Thread via GitHub
viirya merged PR #441: URL: https://github.com/apache/datafusion-comet/pull/441

Re: [I] `CometBroadcastExchangeExec` cannot be reused by Spark `ReuseExchangeAndSubquery` rule [datafusion-comet]

2024-05-18 Thread via GitHub
viirya closed issue #439: `CometBroadcastExchangeExec` cannot be reused by Spark `ReuseExchangeAndSubquery` rule URL: https://github.com/apache/datafusion-comet/issues/439

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-18 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1605813002 ## tpch/queries/q15.sql: ## @@ -0,0 +1,33 @@ +-- SQLBench-H query 15 derived from TPC-H query 15 under the terms of the TPC Fair Use Policy. +-- TPC-H queries

[PR] Add script to generate TPC-H data and convert it to Parquet using DataFusion [datafusion-benchmarks]

2024-05-18 Thread via GitHub
andygrove opened a new pull request, #2: URL: https://github.com/apache/datafusion-benchmarks/pull/2 (no comment)

Re: [PR] fix: newFileScanRDD should not take constructor from custom Spark versions [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#issuecomment-2118865381 I took the liberty of committing some suggestions on code comments and style, as there has been no response for days. I will merge this once CI passes.

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-18 Thread via GitHub
timsaucer commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2118864412 Great! I've rebased @alamb 's branch and added the changes I suggested. I was about to start testing the code and then I was going to write up the unit tests. My work in progres

Re: [PR] fix: Reuse CometBroadcastExchangeExec with Spark ReuseExchangeAndSubquery rule [datafusion-comet]

2024-05-18 Thread via GitHub
viirya commented on code in PR #441: URL: https://github.com/apache/datafusion-comet/pull/441#discussion_r1605809454 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -576,11 +576,13 @@ class CometSparkSessionExtensions // exchange. It is

Re: [I] datafusion-cli not installed [datafusion]

2024-05-18 Thread via GitHub
MohamedAbdeen21 commented on issue #9294: URL: https://github.com/apache/datafusion/issues/9294#issuecomment-2118859061 Hey @l1t1, as per Andy's comments on #9452, datafusion-cli releases should be handled in the python repo.

Re: [I] Make it easier to create WindowFunctions with the Expr API [datafusion]

2024-05-18 Thread via GitHub
shanretoo commented on issue #6747: URL: https://github.com/apache/datafusion/issues/6747#issuecomment-2118840627 I am willing to help with this task.

Re: [I] Implement a way to preserve partitioning through `UnionExec` without losing ordering [datafusion]

2024-05-18 Thread via GitHub
xinlifoobar commented on issue #10314: URL: https://github.com/apache/datafusion/issues/10314#issuecomment-2118825197 Hi @alamb, I am trying to work on this. I am not very familiar with the `InterleaveExec` in the optimizer. As an initial thought, the InterleaveExec is acting as a **Repart

[I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-18 Thread via GitHub
LorrensP-2158466 opened a new issue, #10571: URL: https://github.com/apache/datafusion/issues/10571 ### Is your feature request related to a problem or challenge? This is not really a feature request but more of a question. Currently `UserDefinedLogicalNode::from_template` only retu

Re: [I] UserDefinedLogicalNode::from_template does not return a Result<...> > [datafusion]

2024-05-18 Thread via GitHub
LorrensP-2158466 closed issue #10570: UserDefinedLogicalNode::from_template does not return a Result<...> > URL: https://github.com/apache/datafusion/issues/10570

[I] UserDefinedLogicalNode::from_template does not return a Result<...> > [datafusion]

2024-05-18 Thread via GitHub
LorrensP-2158466 opened a new issue, #10570: URL: https://github.com/apache/datafusion/issues/10570 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've cons

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-18 Thread via GitHub
andygrove commented on code in PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#discussion_r1605784115 ## spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala: ## @@ -123,7 +134,7 @@ class CometExpressionCoverageSuite extends CometTestBase w

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-18 Thread via GitHub
advancedxy commented on code in PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#discussion_r1605766340 ## spark/src/main/scala/org/apache/spark/sql/comet/operators.scala: ## @@ -899,6 +899,40 @@ case class CometSortMergeJoinExec( "join_time" -> SQLMetric

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-18 Thread via GitHub
backkem commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2118742184 Indeed, there is already a function on the sqlparser::dialect trait that takes this into account: https://github.com/sqlparser-rs/sqlparser-rs/blob/54184460b5d873a67c2801

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-18 Thread via GitHub
goldmedal commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2118721347 Here is something I surveyed. I think we can follow how Calcite handles the quoting issue. The `SqlDialect` of Calcite has a check rule `identifierNeedsQuote`. ht

Re: [I] Implement Spark-compatible cast between decimals with different precision and scale [datafusion-comet]

2024-05-18 Thread via GitHub
caicancai commented on issue #375: URL: https://github.com/apache/datafusion-comet/issues/375#issuecomment-2118707359 I am working on it.

Re: [PR] Implement Unparse `GroupingSet` Expr --> String Support sql [datafusion]

2024-05-18 Thread via GitHub
xinlifoobar commented on code in PR #10555: URL: https://github.com/apache/datafusion/pull/10555#discussion_r1605709554 ## datafusion/sql/src/unparser/expr.rs: ## @@ -411,9 +411,34 @@ impl Unparser<'_> { Expr::Wildcard { qualifier: _ } => { not_impl