[PR] Improve signature of `get_field` function [datafusion]

2024-05-17 Thread via GitHub
lewiszlw opened a new pull request, #10569: URL: https://github.com/apache/datafusion/pull/10569 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/10566. ## Rationale for this change ## What changes are included in this

Re: [PR] Remove `Expr::GetIndexedField` and fix panic of `field`, `index` and `range` [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on code in PR #10568: URL: https://github.com/apache/datafusion/pull/10568#discussion_r1605684566 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -1037,29 +1037,6 @@ mod tests { assert_optimized_plan_equal(plan, expected) } -

Re: [PR] Remove `Expr::GetIndexedField` and fix panic of `field`, `index` and `range` [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on code in PR #10568: URL: https://github.com/apache/datafusion/pull/10568#discussion_r1605684497 ## datafusion/core/tests/optimizer_integration.rs: ## @@ -233,3 +241,120 @@ impl TableSource for MyTableSource { self.schema.clone() } } + +#[tes

[PR] Remove `Expr::GetIndexedField` and fix panic of `field`, `index` and `range` [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 opened a new pull request, #10568: URL: https://github.com/apache/datafusion/pull/10568 ## Which issue does this PR close? Closes #10374. Closes #10565 ## Rationale for this change ## What changes are included in this PR? ## Ar

Re: [PR] fix: Reuse CometBroadcastExchangeExec with Spark ReuseExchangeAndSubquery rule [datafusion-comet]

2024-05-17 Thread via GitHub
kazuyukitanimura commented on code in PR #441: URL: https://github.com/apache/datafusion-comet/pull/441#discussion_r1605682420 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -576,11 +576,13 @@ class CometSparkSessionExtensions // excha

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-17 Thread via GitHub
kazuyukitanimura commented on code in PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#discussion_r1605680912 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -698,6 +698,31 @@ class CometSparkSessionExtensions withInf

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-17 Thread via GitHub
kazuyukitanimura commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1605660821 ## docs/source/contributor-guide/adding_a_new_expression.md: ## @@ -0,0 +1,212 @@ + + +# Adding a Expression + +There are a number of Spark expression t

Re: [I] Using `Expr::field` panics [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on issue #10565: URL: https://github.com/apache/datafusion/issues/10565#issuecomment-2118650583 I can fix this along with #10374 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Enable codecov [datafusion]

2024-05-17 Thread via GitHub
github-actions[bot] closed pull request #6067: Enable codecov URL: https://github.com/apache/datafusion/pull/6067 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe

Re: [PR] Transform with payload [datafusion]

2024-05-17 Thread via GitHub
github-actions[bot] closed pull request #8664: Transform with payload URL: https://github.com/apache/datafusion/pull/8664 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

Re: [PR] rewrite nvl function [datafusion]

2024-05-17 Thread via GitHub
github-actions[bot] closed pull request #9413: rewrite nvl function URL: https://github.com/apache/datafusion/pull/9413 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] Move `Count` to `functions-aggregate` [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on code in PR #10484: URL: https://github.com/apache/datafusion/pull/10484#discussion_r1599247532 ## datafusion/physical-expr-common/src/aggregate/count_distinct/mod.rs: ## @@ -0,0 +1,23 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// o

Re: [PR] Move aggregate statistic , combine aggregate and limited distinct aggregate test to slt [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on PR #10554: URL: https://github.com/apache/datafusion/pull/10554#issuecomment-2118531004 I think I can use functions-aggregate in physical optimizer directly 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to G

Re: [PR] Move aggregate statistic , combine aggregate and limited distinct aggregate test to slt [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 closed pull request #10554: Move aggregate statistic , combine aggregate and limited distinct aggregate test to slt URL: https://github.com/apache/datafusion/pull/10554 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1605623477 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -621,9 +623,12 @@ async fn roundtrip_expr_api() -> Result<()> { lit(1), )

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1605592641 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,31 +819,31 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1605592641 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,31 +819,31 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1605591431 ## datafusion/expr/src/expr.rs: ## @@ -1389,6 +1390,201 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn hash_

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1605591431 ## datafusion/expr/src/expr.rs: ## @@ -1389,6 +1390,201 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn hash_

Re: [PR] doc: Add Tuning Guide with shuffle configs [datafusion-comet]

2024-05-17 Thread via GitHub
viirya commented on PR #443: URL: https://github.com/apache/datafusion-comet/pull/443#issuecomment-2118449502 Thanks @andygrove -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] docs: Add benchmarking guide [datafusion-comet]

2024-05-17 Thread via GitHub
andygrove merged PR #444: URL: https://github.com/apache/datafusion-comet/pull/444 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [PR] doc: Add Tuning Guide with shuffle configs [datafusion-comet]

2024-05-17 Thread via GitHub
andygrove merged PR #443: URL: https://github.com/apache/datafusion-comet/pull/443 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

[PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-17 Thread via GitHub
andygrove opened a new pull request, #445: URL: https://github.com/apache/datafusion-comet/pull/445 ## Which issue does this PR close? N/A ## Rationale for this change The current message `Native shuffle is not enabled` is not very informative. There are

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file [datafusion]

2024-05-17 Thread via GitHub
NGA-TRAN commented on PR #10537: URL: https://github.com/apache/datafusion/pull/10537#issuecomment-2118417052 @alamb : The PR is ready for review. Some notes: 1. The tests in `datafusion/core/src/datasource/physical_plan/parquet/statistics.rs` are tests for different dat types. Instead o

Re: [I] Connection reset by peer on AWS S3 object store. [datafusion]

2024-05-17 Thread via GitHub
Maxsparrow commented on issue #10478: URL: https://github.com/apache/datafusion/issues/10478#issuecomment-2118409774 We've been hitting this issue too, the object store doesn't retry these errors even if you set retries: ``` Error: External(ParquetError(General("AsyncChunkReader::get_b

Re: [PR] Handle dictionary values in ScalarValue serde [datafusion]

2024-05-17 Thread via GitHub
thinkharderdev merged PR #10563: URL: https://github.com/apache/datafusion/pull/10563 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [PR] docs: Add benchmarking guide [datafusion-comet]

2024-05-17 Thread via GitHub
andygrove commented on code in PR #444: URL: https://github.com/apache/datafusion-comet/pull/444#discussion_r1605543934 ## docs/source/contributor-guide/benchmarking.md: ## @@ -0,0 +1,43 @@ +# Comet Benchmarking Guide Review Comment: Good catch. Looks like we have no CI chec

Re: [PR] docs: Add benchmarking guide [datafusion-comet]

2024-05-17 Thread via GitHub
viirya commented on code in PR #444: URL: https://github.com/apache/datafusion-comet/pull/444#discussion_r1605541803 ## docs/source/contributor-guide/benchmarking.md: ## @@ -0,0 +1,43 @@ +# Comet Benchmarking Guide Review Comment: license header? -- This is an automated

[PR] docs: Add benchmarking guide [datafusion-comet]

2024-05-17 Thread via GitHub
andygrove opened a new pull request, #444: URL: https://github.com/apache/datafusion-comet/pull/444 ## Which issue does this PR close? N/A ## Rationale for this change Explain how to run TPC-* benchmarks ## What changes are included in this PR?

[PR] doc: Add Tuning Guide with shuffle configs [datafusion-comet]

2024-05-17 Thread via GitHub
viirya opened a new pull request, #443: URL: https://github.com/apache/datafusion-comet/pull/443 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

Re: [I] Support custom SchemaAdapter on ParquetExec [datafusion]

2024-05-17 Thread via GitHub
comphead closed issue #10398: Support custom SchemaAdapter on ParquetExec URL: https://github.com/apache/datafusion/issues/10398 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] feat: Expose Parquet Schema Adapter [datafusion]

2024-05-17 Thread via GitHub
comphead merged PR #10515: URL: https://github.com/apache/datafusion/pull/10515 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-17 Thread via GitHub
alamb commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1605497952 ## datafusion/proto/tests/cases/roundtrip_logical_plan.rs: ## @@ -31,8 +31,10 @@ use datafusion::datasource::TableProvider; use datafusion::execution::context::Sess

Re: [PR] feat: Expose Parquet Schema Adapter [datafusion]

2024-05-17 Thread via GitHub
HawaiianSpork commented on PR #10515: URL: https://github.com/apache/datafusion/pull/10515#issuecomment-2118311478 Sorry, to keep you hanging and thank you for the quick review. I agree with the suggestions but haven't circled back around to completing. If you want to wait, I will get to

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-17 Thread via GitHub
alamb commented on PR #10560: URL: https://github.com/apache/datafusion/pull/10560#issuecomment-2118310034 This looks super cool @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] Implement Unparse `GroupingSet` Expr --> String Support sql [datafusion]

2024-05-17 Thread via GitHub
alamb commented on code in PR #10555: URL: https://github.com/apache/datafusion/pull/10555#discussion_r1605496677 ## datafusion/sql/src/unparser/expr.rs: ## @@ -411,9 +411,34 @@ impl Unparser<'_> { Expr::Wildcard { qualifier: _ } => { not_impl_err!(

Re: [PR] Add examples of how to convert logical plan to/from sql strings [datafusion]

2024-05-17 Thread via GitHub
alamb commented on code in PR #10558: URL: https://github.com/apache/datafusion/pull/10558#discussion_r1605494227 ## datafusion-examples/examples/plan_to_sql.rs: ## @@ -77,3 +81,132 @@ fn simple_expr_to_sql_demo_escape_mysql_style() -> Result<()> { assert_eq!(sql, r#"((`a`

Re: [PR] Minor: Move proxy to datafusion common [datafusion]

2024-05-17 Thread via GitHub
alamb commented on code in PR #10561: URL: https://github.com/apache/datafusion/pull/10561#discussion_r1605490880 ## datafusion/common/src/memory_pool/mod.rs: ## @@ -0,0 +1,18 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agree

Re: [PR] Minor: Fix name in ArrayFunctionRewriter, error not panic if `Expr::GetStructField` is planned [datafusion]

2024-05-17 Thread via GitHub
alamb commented on code in PR #10564: URL: https://github.com/apache/datafusion/pull/10564#discussion_r1605486971 ## datafusion/core/src/physical_planner.rs: ## @@ -234,7 +234,7 @@ fn create_physical_name(e: &Expr, is_first_expr: bool) -> Result { stride: _

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-17 Thread via GitHub
appletreeisyellow commented on code in PR #10527: URL: https://github.com/apache/datafusion/pull/10527#discussion_r1605477700 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -64,63 +67,55 @@ impl SingleDistinctToGroupBy { } /// Check whether all aggregate ex

[I] Improve signature of `get_field` is function [datafusion]

2024-05-17 Thread via GitHub
alamb opened a new issue, #10566: URL: https://github.com/apache/datafusion/issues/10566 ### Is your feature request related to a problem or challenge? The signature of [`get_field`](https://docs.rs/datafusion/latest/datafusion/functions/core/expr_fn/fn.get_field.html) is akward -- i

[I] Using `Expr::field` panics [datafusion]

2024-05-17 Thread via GitHub
alamb opened a new issue, #10565: URL: https://github.com/apache/datafusion/issues/10565 ### Describe the bug After https://github.com/apache/datafusion/pull/10375 merged [`Expr::field`](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.Expr.html#method.field) no

Re: [PR] Minor: Fix names in ArrayFunctionRewriter [datafusion]

2024-05-17 Thread via GitHub
alamb commented on code in PR #10564: URL: https://github.com/apache/datafusion/pull/10564#discussion_r1605446551 ## datafusion/core/src/physical_planner.rs: ## @@ -234,7 +234,7 @@ fn create_physical_name(e: &Expr, is_first_expr: bool) -> Result { stride: _

Re: [PR] Minor: Fix names in ArrayFunctionRewriter [datafusion]

2024-05-17 Thread via GitHub
alamb commented on code in PR #10564: URL: https://github.com/apache/datafusion/pull/10564#discussion_r1605444906 ## datafusion/core/src/physical_planner.rs: ## @@ -234,7 +234,7 @@ fn create_physical_name(e: &Expr, is_first_expr: bool) -> Result { stride: _

[PR] Minor: Fix names in ArrayFunctionRewriter [datafusion]

2024-05-17 Thread via GitHub
alamb opened a new pull request, #10564: URL: https://github.com/apache/datafusion/pull/10564 ## Which issue does this PR close? N/A ## Rationale for this change While debugging a DataFusion upgrade in InfluxDB the different names for the same thing is making it hard to

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on code in PR #10527: URL: https://github.com/apache/datafusion/pull/10527#discussion_r1605443737 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -64,63 +67,55 @@ impl SingleDistinctToGroupBy { } /// Check whether all aggregate expr

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on code in PR #10527: URL: https://github.com/apache/datafusion/pull/10527#discussion_r1605443422 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -64,63 +67,55 @@ impl SingleDistinctToGroupBy { } /// Check whether all aggregate expr

[PR] Bump rexml from 3.2.6 to 3.2.8 [datafusion-site]

2024-05-17 Thread via GitHub
dependabot[bot] opened a new pull request, #2: URL: https://github.com/apache/datafusion-site/pull/2 Bumps [rexml](https://github.com/ruby/rexml) from 3.2.6 to 3.2.8. Release notes Sourced from https://github.com/ruby/rexml/releases";>rexml's releases. REXML 3.2.8 - 2024-05-

Re: [PR] Create default jekyll site with old datafusion posts [datafusion-site]

2024-05-17 Thread via GitHub
andygrove merged PR #1: URL: https://github.com/apache/datafusion-site/pull/1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafus

Re: [PR] Create default jekyll site with old datafusion posts [datafusion-site]

2024-05-17 Thread via GitHub
andygrove commented on PR #1: URL: https://github.com/apache/datafusion-site/pull/1#issuecomment-2118217121 Thanks for the review @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on PR #10527: URL: https://github.com/apache/datafusion/pull/10527#issuecomment-2118196189 There are still some clones but I think they are necessary here because we are building two aggregates from one -- This is an automated message from the Apache Git Service.

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-17 Thread via GitHub
erratic-pattern commented on PR #10527: URL: https://github.com/apache/datafusion/pull/10527#issuecomment-2118186691 Since what I was working on overlapped a bit with the code changes here, I made a PR on this branch to remove some additional clones, @appletreeisyellow https://githu

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-17 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1605335130 ## tpch/queries/sf=1/q1.sql: ## @@ -0,0 +1,23 @@ +-- SQLBench-H query 1 derived from TPC-H query 1 under the terms of the TPC Fair Use Policy. +-- TPC-H querie

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-17 Thread via GitHub
andygrove commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1605321768 ## tpch/queries/sf=1/q1.sql: ## @@ -0,0 +1,23 @@ +-- SQLBench-H query 1 derived from TPC-H query 1 under the terms of the TPC Fair Use Policy. +-- TPC-H que

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-17 Thread via GitHub
andygrove commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1605314268 ## tpch/queries/sf=1/q1.sql: ## @@ -0,0 +1,23 @@ +-- SQLBench-H query 1 derived from TPC-H query 1 under the terms of the TPC Fair Use Policy. +-- TPC-H que

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-17 Thread via GitHub
andygrove commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1605314268 ## tpch/queries/sf=1/q1.sql: ## @@ -0,0 +1,23 @@ +-- SQLBench-H query 1 derived from TPC-H query 1 under the terms of the TPC Fair Use Policy. +-- TPC-H que

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-17 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1605283914 ## runners/datafusion-comet/tpcbench.py: ## @@ -0,0 +1,79 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreem

Re: [PR] feat: Expose Parquet Schema Adapter [datafusion]

2024-05-17 Thread via GitHub
comphead commented on PR #10515: URL: https://github.com/apache/datafusion/pull/10515#issuecomment-2117965789 @HawaiianSpork please address the suggestions, you dont need to change the code, just make a response -- This is an automated message from the Apache Git Service. To respond to th

Re: [PR] feat: Expose Parquet Schema Adapter [datafusion]

2024-05-17 Thread via GitHub
alamb commented on PR #10515: URL: https://github.com/apache/datafusion/pull/10515#issuecomment-2117960822 > are we waiting feedbacks from @HawaiianSpork or we good to merge the PR? I was waiting to see if they wanted to make any of the suggestions. If not, let's merge it in -- Thi

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-17 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1605271540 ## tpch/queries/sf=1/q1.sql: ## @@ -0,0 +1,23 @@ +-- SQLBench-H query 1 derived from TPC-H query 1 under the terms of the TPC Fair Use Policy. +-- TPC-H querie

Re: [PR] Implement unparse `ScalarVariable` to String [datafusion]

2024-05-17 Thread via GitHub
alamb merged PR #10541: URL: https://github.com/apache/datafusion/pull/10541 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] `ScalarVariable` Expr --> String Support [datafusion]

2024-05-17 Thread via GitHub
alamb closed issue #10518: `ScalarVariable` Expr --> String Support URL: https://github.com/apache/datafusion/issues/10518 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] Docs: Update PR workflow documentation [datafusion]

2024-05-17 Thread via GitHub
viirya commented on code in PR #10532: URL: https://github.com/apache/datafusion/pull/10532#discussion_r1605266696 ## docs/source/contributor-guide/index.md: ## @@ -66,24 +66,33 @@ ideas with the community to get feedback on implementation. ## Pull Request Overview -We welc

[PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-17 Thread via GitHub
andygrove opened a new pull request, #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1 This PR adds a README and some initial scripts for running TPC-H against DataFusion Python and DataFusion Comet. There is a lot that needs to be done following on from this initial com

Re: [PR] Docs: Update PR workflow documentation [datafusion]

2024-05-17 Thread via GitHub
viirya commented on PR #10532: URL: https://github.com/apache/datafusion/pull/10532#issuecomment-2117908529 > I realize I often wait some time between approve and merge which can be confusing and is not really documented. Most recently this came up with @shanretoo on https://github.com/apac

Re: [PR] feat: Expose Parquet Schema Adapter [datafusion]

2024-05-17 Thread via GitHub
comphead commented on PR #10515: URL: https://github.com/apache/datafusion/pull/10515#issuecomment-2117879593 are we waiting feedbacks from @HawaiianSpork or we good to merge the PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] Docs: Update PR workflow documentation [datafusion]

2024-05-17 Thread via GitHub
comphead commented on code in PR #10532: URL: https://github.com/apache/datafusion/pull/10532#discussion_r1605212601 ## docs/source/contributor-guide/index.md: ## @@ -66,24 +66,33 @@ ideas with the community to get feedback on implementation. ## Pull Request Overview -We we

Re: [PR] fix: Reuse CometBroadcastExchangeExec with Spark ReuseExchangeAndSubquery rule [datafusion-comet]

2024-05-17 Thread via GitHub
viirya commented on code in PR #441: URL: https://github.com/apache/datafusion-comet/pull/441#discussion_r1605142664 ## spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala: ## @@ -62,6 +63,39 @@ class CometExecSuite extends CometTestBase { } } + test("Reus

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-17 Thread via GitHub
leoluan2009 commented on PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#issuecomment-2117773652 @viirya thanks for your review, resolve your commets -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-17 Thread via GitHub
leoluan2009 commented on code in PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#discussion_r1605131157 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2545,6 +2545,18 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] feature: Add a WindowUDFImpl::simplify() API [datafusion]

2024-05-17 Thread via GitHub
milenkovicm commented on PR #9906: URL: https://github.com/apache/datafusion/pull/9906#issuecomment-2117747727 I guess at some point we'd need to change `UDF::simplify` to align interfaces -- This is an automated message from the Apache Git Service. To respond to the message, please log on

[PR] Handle dictionary values in ScalarValue serde [datafusion]

2024-05-17 Thread via GitHub
thinkharderdev opened a new pull request, #10563: URL: https://github.com/apache/datafusion/pull/10563 ## Which issue does this PR close? Closes #10562 ## Rationale for this change ## What changes are included in this PR? 1. Handle serde fo

Re: [I] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-17 Thread via GitHub
alamb commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2117733259 After working through an actual example in https://github.com/apache/datafusion/pull/10549 I have a new API proposal: https://github.com/NGA-TRAN/arrow-datafusion/pull/118

Re: [I] Ensure examples stay updated in CI. [datafusion-python]

2024-05-17 Thread via GitHub
Michael-J-Ward commented on issue #696: URL: https://github.com/apache/datafusion-python/issues/696#issuecomment-2117705572 My default would be *not* to add the data to the repo, but I'll let @andygrove decide that. Maybe we could add it to the test-data submodule https://github.com/apache

[PR] Move proxy to datafusion common [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 opened a new pull request, #10561: URL: https://github.com/apache/datafusion/pull/10561 ## Which issue does this PR close? Closes #. Part of #10484 ## Rationale for this change These two allocExtension is used in binary map (datafusion/p

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-17 Thread via GitHub
viirya commented on code in PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#discussion_r1605056410 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2545,6 +2545,18 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-17 Thread via GitHub
viirya commented on code in PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#discussion_r1605054299 ## spark/src/main/scala/org/apache/spark/sql/comet/operators.scala: ## @@ -899,6 +899,39 @@ case class CometSortMergeJoinExec( "join_time" -> SQLMetrics.cr

Re: [PR] introduce expr builder for aggregate function [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on PR #10560: URL: https://github.com/apache/datafusion/pull/10560#issuecomment-2117670563 I plan to add `distinct` macro in the follow up PR for `count`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] introduce expr builder for aggregate function [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1605052852 ## datafusion/functions-aggregate/src/macros.rs: ## @@ -48,49 +48,98 @@ macro_rules! make_udaf_expr_and_func { None, )) }

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-17 Thread via GitHub
viirya commented on code in PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#discussion_r1605052739 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -698,6 +698,31 @@ class CometSparkSessionExtensions withInfo(s, Seq(m

Re: [PR] feat: Support CartesianProductExec in comet [datafusion-comet]

2024-05-17 Thread via GitHub
viirya commented on code in PR #442: URL: https://github.com/apache/datafusion-comet/pull/442#discussion_r1605052327 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -698,6 +698,31 @@ class CometSparkSessionExtensions withInfo(s, Seq(m

[PR] introduce expr builder for aggregate function [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 opened a new pull request, #10560: URL: https://github.com/apache/datafusion/pull/10560 ## Which issue does this PR close? Closes #10545. ## Rationale for this change After this PR, there are two kinds of expression API. One is the normal one

[I] Dynamic schema for custom TableProvider [datafusion]

2024-05-17 Thread via GitHub
maronavenue opened a new issue, #10559: URL: https://github.com/apache/datafusion/issues/10559 ### Is your feature request related to a problem or challenge? Hi team, We have been exploring DataFusion to provide unified access to all our various data sources. For cases where we

Re: [I] Ensure examples stay updated in CI. [datafusion-python]

2024-05-17 Thread via GitHub
timsaucer commented on issue #696: URL: https://github.com/apache/datafusion-python/issues/696#issuecomment-2117480476 Ok, made some good progress on this. Will try to wrap up tomorrow. https://github.com/timsaucer/datafusion-python/blob/tsaucer/prepare_tpch_examples_for_ci/examples/tpch/_t

Re: [I] Stop copying `Expr`s and LogicalPlans so much during Common Subexpression Elimination [datafusion]

2024-05-17 Thread via GitHub
alamb commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2117476032 Thanks @peter-toth -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comm

Re: [PR] Move aggregate statistic , combine aggregate and limited distinct aggregate test to slt [datafusion]

2024-05-17 Thread via GitHub
jayzhan211 commented on code in PR #10554: URL: https://github.com/apache/datafusion/pull/10554#discussion_r1604805920 ## datafusion/core/src/physical_optimizer/combine_partial_final_agg.rs: ## @@ -194,282 +194,3 @@ fn discard_column_index(group_expr: Arc) -> Arc { -

Re: [I] Incorrect Order Preserving Cases for Cast Expressions [datafusion]

2024-05-17 Thread via GitHub
ozankabak closed issue #9832: Incorrect Order Preserving Cases for Cast Expressions URL: https://github.com/apache/datafusion/issues/9832 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-17 Thread via GitHub
ozankabak merged PR #10504: URL: https://github.com/apache/datafusion/pull/10504 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-17 Thread via GitHub
ozankabak commented on PR #10504: URL: https://github.com/apache/datafusion/pull/10504#issuecomment-2117335725 I will go ahead and merge this and we will fix any issues with a quick follow-on PR in case @alamb discovers any when he has time to take a look. -- This is an automated message

Re: [I] Stop copying `Expr`s and LogicalPlans so much during Common Subexpression Elimination [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2117302110 @alamb, I think you shouldn't wait for my https://github.com/apache/datafusion/pull/10473, as there is still some preliminary work required I need to finish (https://github.co

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604616059 ## datafusion/expr/src/expr.rs: ## @@ -1389,6 +1390,201 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn hash_node(

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604645685 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -48,7 +51,42 @@ use indexmap::IndexMap; /// Since an identifier is likely to be copied many time

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604616059 ## datafusion/expr/src/expr.rs: ## @@ -1389,6 +1390,201 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn hash_node(

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604645685 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -48,7 +51,42 @@ use indexmap::IndexMap; /// Since an identifier is likely to be copied many time

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604645685 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -48,7 +51,42 @@ use indexmap::IndexMap; /// Since an identifier is likely to be copied many time

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604645685 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -48,7 +51,42 @@ use indexmap::IndexMap; /// Since an identifier is likely to be copied many time

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604629158 ## datafusion/expr/src/expr.rs: ## @@ -1389,6 +1390,201 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn hash_node(

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604625075 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -782,31 +819,31 @@ impl TreeNodeRewriter for CommonSubexprRewriter<'_> { return Ok(T

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604616059 ## datafusion/expr/src/expr.rs: ## @@ -1389,6 +1390,201 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn hash_node(

Re: [PR] Better CSE identifier [datafusion]

2024-05-17 Thread via GitHub
peter-toth commented on code in PR #10473: URL: https://github.com/apache/datafusion/pull/10473#discussion_r1604616059 ## datafusion/expr/src/expr.rs: ## @@ -1389,6 +1390,201 @@ impl Expr { | Expr::Placeholder(..) => false, } } + +pub fn hash_node(

  1   2   >