Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597542555 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597540519 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 commented on PR #418: URL: https://github.com/apache/datafusion-comet/pull/418#issuecomment-2106099066 @viirya Help to start CI, thanks -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[PR] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 opened a new pull request, #418: URL: https://github.com/apache/datafusion-comet/pull/418 ## Which issue does this PR close? Closes #417. ## Rationale for this change ## What changes are included in this PR? ## How are these chan

[I] Update the DataFusion in Python website [datafusion-python]

2024-05-11 Thread via GitHub
Weijun-H opened a new issue, #687: URL: https://github.com/apache/datafusion-python/issues/687 **Describe the bug** The logo and the links on the website are outdated and deprecated. We should check and update them.

[I] chore: Rename some columnar shuffle configs for code consistently [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 opened a new issue, #417: URL: https://github.com/apache/datafusion-comet/issues/417 ### What is the problem the feature request solves? Some config names for columnar shuffle are inconsistent and should be refined. ### Describe the potential solution _No response_

Re: [PR] Improve round-robin repartitioning [datafusion]

2024-05-11 Thread via GitHub
github-actions[bot] commented on PR #6047: URL: https://github.com/apache/datafusion/pull/6047#issuecomment-2106085595 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Add `async` UDF example [datafusion]

2024-05-11 Thread via GitHub
github-actions[bot] closed pull request #6713: Add `async` UDF example URL: https://github.com/apache/datafusion/pull/6713

Re: [PR] UpdateD pool.rs [datafusion]

2024-05-11 Thread via GitHub
github-actions[bot] closed pull request #6943: UpdateD pool.rs URL: https://github.com/apache/datafusion/pull/6943

Re: [I] Apply guarantee rewriter to sql workflow [datafusion]

2024-05-11 Thread via GitHub
yyy1000 commented on issue #10456: URL: https://github.com/apache/datafusion/issues/10456#issuecomment-2106074548 On second glance, this looks difficult. 😥 When simplifying a LogicalPlan, it seems impossible to get the underlying data needed to build the `guarantees`.

Re: [PR] Add simplifier additional between [datafusion]

2024-05-11 Thread via GitHub
yyy1000 closed pull request #10463: Add simplifier additional between URL: https://github.com/apache/datafusion/pull/10463

Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead commented on code in PR #10356: URL: https://github.com/apache/datafusion/pull/10356#discussion_r1597527103 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -98,25 +101,75 @@ fn analyze_internal( // select t2.c2 from t1 where t1.c1 in (select t2.c1 from

Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead commented on code in PR #10356: URL: https://github.com/apache/datafusion/pull/10356#discussion_r1597526532 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -98,25 +101,75 @@ fn analyze_internal( // select t2.c2 from t1 where t1.c1 in (select t2.c1 from

Re: [PR] Stop copying LogicalPlan and Exprs in `TypeCoercion` (10% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead commented on code in PR #10356: URL: https://github.com/apache/datafusion/pull/10356#discussion_r1597526491 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -66,26 +67,28 @@ impl AnalyzerRule for TypeCoercion { } fn analyze(&self, plan: LogicalPl

Re: [PR] Add simplifier additional between [datafusion]

2024-05-11 Thread via GitHub
yyy1000 commented on PR #10463: URL: https://github.com/apache/datafusion/pull/10463#issuecomment-2106065020 The example from #10456 with this PR looks like this: ``` > create table t (c int) as values (1), (3), (5); 0 row(s) fetched. Elapsed 0.031 seconds. > explain verbose sel

[PR] Add simplifier additional between [datafusion]

2024-05-11 Thread via GitHub
yyy1000 opened a new pull request, #10463: URL: https://github.com/apache/datafusion/pull/10463 ## Which issue does this PR close? Closes #10456. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597524103 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols );

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2106058219 Do you mean df-python always panics if `stride` is not given? ``` #[pyfunction] #[pyo3(signature = (array, begin, end, stride = 1))] fn array_slice(array: PyExp

[I] Add to_unixtime function to scalar functions doc [datafusion]

2024-05-11 Thread via GitHub
Omega359 opened a new issue, #10462: URL: https://github.com/apache/datafusion/issues/10462 ### Describe the bug The to_unixtime function is missing from the scalar functions doc in the user guide. ### To Reproduce See https://datafusion.apache.org/user-guide/sql/scalar

[I] Add to_date function to scalar functions doc [datafusion]

2024-05-11 Thread via GitHub
Omega359 opened a new issue, #10461: URL: https://github.com/apache/datafusion/issues/10461 ### Describe the bug The to_date function is missing from the scalar functions doc in the user guide. ### To Reproduce See https://datafusion.apache.org/user-guide/sql/scalar_fun

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10454: URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2106006470 Wow -- according to my benchmarks this change makes a non-trivial difference in performance. We just keep driving these numbers down ``` group

[PR] build(deps): bump datafusion-expr from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #686: URL: https://github.com/apache/datafusion-python/pull/686 Bumps [datafusion-expr](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed7568f

[PR] build(deps): bump datafusion-sql from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #685: URL: https://github.com/apache/datafusion-python/pull/685 Bumps [datafusion-sql](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed7568f6

[PR] build(deps): bump datafusion-functions-array from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #684: URL: https://github.com/apache/datafusion-python/pull/684 Bumps [datafusion-functions-array](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c

[PR] build(deps): bump datafusion-substrait from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #683: URL: https://github.com/apache/datafusion-python/pull/683 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed

Re: [PR] Introduce user-defined signature [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10439: URL: https://github.com/apache/datafusion/pull/10439#issuecomment-2105998161 🎉

Re: [PR] build(deps): bump syn from 2.0.48 to 2.0.58 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] closed pull request #631: build(deps): bump syn from 2.0.48 to 2.0.58 URL: https://github.com/apache/datafusion-python/pull/631

Re: [PR] build(deps): bump syn from 2.0.48 to 2.0.58 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] commented on PR #631: URL: https://github.com/apache/datafusion-python/pull/631#issuecomment-2105997735 Superseded by #682.

[PR] build(deps): bump syn from 2.0.60 to 2.0.63 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #682: URL: https://github.com/apache/datafusion-python/pull/682 Bumps [syn](https://github.com/dtolnay/syn) from 2.0.60 to 2.0.63. Release notes Sourced from syn's releases (https://github.com/dtolnay/syn/releases). 2.0.63 Pa

[PR] build(deps): bump datafusion-optimizer from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #681: URL: https://github.com/apache/datafusion-python/pull/681 Bumps [datafusion-optimizer](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed

[PR] build(deps): bump datafusion-common from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #679: URL: https://github.com/apache/datafusion-python/pull/679 Bumps [datafusion-common](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed756

[PR] build(deps): bump datafusion from 37.1.0 to 38.0.0 [datafusion-python]

2024-05-11 Thread via GitHub
dependabot[bot] opened a new pull request, #680: URL: https://github.com/apache/datafusion-python/pull/680 Bumps [datafusion](https://github.com/apache/datafusion) from 37.1.0 to 38.0.0. Commits https://github.com/apache/datafusion/commit/cafbc9ddceb5af8c6408d0c8bbfed7568f655dd

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10430: URL: https://github.com/apache/datafusion/pull/10430#issuecomment-2105995932 Thanks for the review @comphead 🚀

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597493578 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597493524 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105992692 > hi, @alamb, I'd like to know if those two expressions are semantically identical: Yes, I believe they are the same

Re: [PR] refactor: Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10454: URL: https://github.com/apache/datafusion/pull/10454#issuecomment-2105992517 Thanks @erratic-pattern -- I took the liberty of merging the branch up from main to resolve a merge conflict as well

Re: [PR] Fix Docs [datafusion-python]

2024-05-11 Thread via GitHub
andygrove merged PR #676: URL: https://github.com/apache/datafusion-python/pull/676

Re: [I] doc builds are broken [datafusion-python]

2024-05-11 Thread via GitHub
andygrove closed issue #675: doc builds are broken URL: https://github.com/apache/datafusion-python/issues/675

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-11 Thread via GitHub
viirya commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2105965894 Triggered CI pipelines.

Re: [PR] fix: Unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-11 Thread via GitHub
viirya commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105965411 > Ci failed because of connection timeout I will re-trigger failed pipelines.

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597480063 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597478938 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on code in PR #10460: URL: https://github.com/apache/datafusion/pull/10460#discussion_r1597478628 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -88,60 +94,72 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { input,

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-11 Thread via GitHub
Michael-J-Ward commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105957139 Since the `regexp_*` UDFs have the same problem, I suspect that `array_slice` was just our first encounter w/ the underlying issue: any "inner" UDF implementation that be

[PR] refactor: replace distinct with aggr [datafusion]

2024-05-11 Thread via GitHub
ClSlaid opened a new pull request, #10460: URL: https://github.com/apache/datafusion/pull/10460 ## Which issue does this PR close? Closes #10293. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105950929 hi, @alamb, I'd like to know if those two expressions differ: ```rust let aggr_expr = vec![Expr::AggregateFunction(AggregateFunction::new(

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-11 Thread via GitHub
comphead merged PR #10430: URL: https://github.com/apache/datafusion/pull/10430

Re: [PR] Remove `AggregateFunctionDefinition::Name` [datafusion]

2024-05-11 Thread via GitHub
comphead merged PR #10441: URL: https://github.com/apache/datafusion/pull/10441

Re: [PR] Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [datafusion]

2024-05-11 Thread via GitHub
ozankabak commented on PR #10404: URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2105942784 Thank you. We will address your review feedback and then merge afterwards. 🚀

[PR] fix: make `columnize_expr` resistant to display_name collisions [datafusion]

2024-05-11 Thread via GitHub
jonahgao opened a new pull request, #10459: URL: https://github.com/apache/datafusion/pull/10459 ## Which issue does this PR close? Closes #10413. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597465482 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-11 Thread via GitHub
peter-toth commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105745741 > I like the idea of generalizing the `(u64, &Expr)` struct into something reusable across optimizations. Honestly, I don't know those referenced use cases, but I f

Re: [PR] refactor: use Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
erratic-pattern commented on code in PR #10454: URL: https://github.com/apache/datafusion/pull/10454#discussion_r1597448118 ## datafusion/expr/src/expr.rs: ## @@ -1654,28 +1654,42 @@ fn fmt_function( write!(f, "{}({}{})", fun, distinct_str, args.join(", ")) } -fn create_

Re: [I] Will Comet support closed-source forks of Apache Spark (e.g. CSP versions)? [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on issue #414: URL: https://github.com/apache/datafusion-comet/issues/414#issuecomment-2105743676 I plan on creating a PR to update our documentation to make it clear that we only support Apache Spark and not other Spark implementations.

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-11 Thread via GitHub
andygrove commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105743297 > I think this is more a user experience problem, how should we design it is discussable. It is more than a user experience issue. The current API is causing test failu

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597446354 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -525,6 +527,18 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597446255 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1207,7 +1223,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde {

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597445759 ## core/src/execution/proto/expr.proto: ## @@ -233,12 +233,20 @@ message Remainder { DataType return_type = 4; } +enum EvalMode { + LEGACY = 0; + TRY =

Re: [PR] feat: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
andygrove commented on code in PR #415: URL: https://github.com/apache/datafusion-comet/pull/415#discussion_r1597445680 ## core/src/execution/datafusion/planner.rs: ## @@ -346,10 +346,10 @@ impl PhysicalPlanner { let child = self.create_expr(expr.child.as_ref().

Re: [I] Support "User defined coercion" rules [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 closed issue #10423: Support "User defined coercion" rules URL: https://github.com/apache/datafusion/issues/10423

Re: [PR] Introduce user-defined signature [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 merged PR #10439: URL: https://github.com/apache/datafusion/pull/10439

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-11 Thread via GitHub
erratic-pattern commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105737013 Thanks for the detailed write up @peter-toth . Though I did mention `HashSet` specifically, my suggestion more generally goes along the lines of using the `Hash` implem

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 merged PR #10457: URL: https://github.com/apache/datafusion/pull/10457

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on PR #10457: URL: https://github.com/apache/datafusion/pull/10457#issuecomment-2105730542 Thanks @NoeB and @Jefffrey

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-11 Thread via GitHub
vidyasankarv commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2105729144 Hi @parthchandra @andygrove, I made the changes as suggested and ported the date parsing logic [from SparkDateTimeUtils](https://github.com/apache/spark/blob/9d79ab42b127d1a12164cec260

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597438771 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols );

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597437435 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols );

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1597436309 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -20,23 +20,124 @@ use std::sync::Arc; use crate::signature::{ ArrayFunctionSignature, FIXED_SIZE_

Re: [PR] build: Add spark-4.0 profile and shims [datafusion-comet]

2024-05-11 Thread via GitHub
kazuyukitanimura commented on PR #407: URL: https://github.com/apache/datafusion-comet/pull/407#issuecomment-2105690563 @viirya @andygrove passed all the tests on my personal github actions

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597433886 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597432978 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

[PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-11 Thread via GitHub
vaibhawvipul opened a new pull request, #416: URL: https://github.com/apache/datafusion-comet/pull/416 ## Which issue does this PR close? Closes #374 . ## Rationale for this change ## What changes are included in this PR? ## How are these ch

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597428939 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

Re: [I] Stop copying `Expr`s and LogicalPlans so much during Common Subexpression Elimination [datafusion]

2024-05-11 Thread via GitHub
peter-toth commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2105681913 > UPDATE: It looks like Expr already derives Hash. Is there a reason we're not using that instead of string keys? I forgot to comment on this thread that my detailed answ

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105681808 Of course, thank you @ClSlaid -- just be aware these PRs have tended to be tricky. I recommend doing it incrementally if possible. Let me know if you need some help 🙏

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1597427699 ## datafusion/expr/src/type_coercion/functions.rs: ## @@ -20,23 +20,124 @@ use std::sync::Arc; use crate::signature::{ ArrayFunctionSignature, FIXED_SIZE_LIST_

Re: [I] Stop copying `Expr`s and LogicalPlans so much during Common Subexpression Elimination [datafusion]

2024-05-11 Thread via GitHub
alamb commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2105680216 > UPDATE: It looks like Expr already derives Hash. Is there a reason we're not using that instead of string keys? I believe CSE may predate the Hash impl for Expr

Re: [PR] refactor: use Reduce string allocations in Expr::display_name (use write instead of format!) [datafusion]

2024-05-11 Thread via GitHub
alamb commented on code in PR #10454: URL: https://github.com/apache/datafusion/pull/10454#discussion_r1597426402 ## datafusion/expr/src/expr.rs: ## @@ -1654,28 +1654,42 @@ fn fmt_function( write!(f, "{}({}{})", fun, distinct_str, args.join(", ")) } -fn create_function_n

Re: [PR] doc: fix old master branch references to main [datafusion]

2024-05-11 Thread via GitHub
alamb merged PR #10458: URL: https://github.com/apache/datafusion/pull/10458

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
NoeB commented on PR #10457: URL: https://github.com/apache/datafusion/pull/10457#issuecomment-2105674257 @Jefffrey Thank you for the review, I added a new commit which adds the missing distinct for the type

[PR] doc: fix old master branch references to main [datafusion]

2024-05-11 Thread via GitHub
Jefffrey opened a new pull request, #10458: URL: https://github.com/apache/datafusion/pull/10458 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Minor: Add usecase to comments in `LogicalPlan::recompute_schema` [datafusion]

2024-05-11 Thread via GitHub
alamb commented on PR #10443: URL: https://github.com/apache/datafusion/pull/10443#issuecomment-2105669432 Thanks for the review @Jefffrey and @yyy1000

Re: [PR] Minor: Add usecase to comments in `LogicalPlan::recompute_schema` [datafusion]

2024-05-11 Thread via GitHub
alamb merged PR #10443: URL: https://github.com/apache/datafusion/pull/10443

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-11 Thread via GitHub
peter-toth commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105664520 > Are there any potential issues with simply using the existing `Hash` implementation of `Expr` to create `HashSet`s? > > Several other optimization passes use string

Re: [PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
Jefffrey commented on code in PR #10457: URL: https://github.com/apache/datafusion/pull/10457#discussion_r1597413465 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -2285,6 +2285,201 @@ ORDER BY tag 33 11 NULL 33 11 NULL 33 11 NULL B +# bit_and_i32 +statement ok

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-11 Thread via GitHub
b41sh commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597413786 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

[PR] Move bit_and_or_xor unit tests to slt [datafusion]

2024-05-11 Thread via GitHub
NoeB opened a new pull request, #10457: URL: https://github.com/apache/datafusion/pull/10457 ## Which issue does this PR close? part of #10384 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested? ##

Re: [I] Use `min_value` and `max_value` on statistics to avoid `ExecutionPlan.execute` [datafusion]

2024-05-11 Thread via GitHub
samuelcolvin commented on issue #10400: URL: https://github.com/apache/datafusion/issues/10400#issuecomment-2105649193 @alamb using `PruningPredicate` makes sense, but please can you point me at where I need to make changes to add this functionality?

[I] Apply guarantee rewriter to sql workflow [datafusion]

2024-05-11 Thread via GitHub
jayzhan211 opened a new issue, #10456: URL: https://github.com/apache/datafusion/issues/10456 ### Is your feature request related to a problem or challenge? While deprecating `Expr::GetIndexedField`, I found there are many test cases that are not covered in sqllogictest, for example `

Re: [I] Stop copying LogicalPlan and Exprs in `ReplaceDistinctWithAggregate` [datafusion]

2024-05-11 Thread via GitHub
ClSlaid commented on issue #10293: URL: https://github.com/apache/datafusion/issues/10293#issuecomment-2105629456 May I take a look? Merci.

Re: [PR] fix: Unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105628167 > thanks @leoluan2009 appreciate if you could add the unit test to prevent regression @comphead can you give me an example? Thanks

Re: [PR] fix: Unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-11 Thread via GitHub
leoluan2009 commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105618196 CI failed because of connection timeout:

Re: [I] chore: Use enum to represent CAST eval_mode in expr.proto [datafusion-comet]

2024-05-11 Thread via GitHub
prashantksharma commented on issue #361: URL: https://github.com/apache/datafusion-comet/issues/361#issuecomment-2105614143 @andygrove cc: @viirya I have opened a draft PR. I have tested the changes using - `make test-rust` - `make test-jvm` Details on PR messa
