[I] Introduce user-defined-optimizer for user-defined function [datafusion]

2024-05-10 Thread via GitHub
jayzhan211 opened a new issue, #10455: URL: https://github.com/apache/datafusion/issues/10455 ### Is your feature request related to a problem or challenge? While moving `count` to UDAF, I found that we need to call UDAF in the optimize rule [`single_distinct_to_groupby`](https://git

[PR] test: parametrize test_array_functions [datafusion-python]

2024-05-10 Thread via GitHub
Michael-J-Ward opened a new pull request, #678: URL: https://github.com/apache/datafusion-python/pull/678 test_array_functions now has 56 passing test cases and 1 expected failure (`array_slice` being the expected failure Ref #670). test_array_function_flatten was broken out as a sing

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-10 Thread via GitHub
rohitrastogi commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1596117835 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -232,6 +232,240 @@ macro_rules! cast_int_to_int_macro { }}; } +// When Spark casts to By

Re: [PR] feat: Implement Spark-compatible CAST from non-integral numeric types to integral types [datafusion-comet]

2024-05-10 Thread via GitHub
rohitrastogi commented on code in PR #399: URL: https://github.com/apache/datafusion-comet/pull/399#discussion_r1596117964 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -232,6 +232,240 @@ macro_rules! cast_int_to_int_macro { }}; } +// When Spark casts to By

Re: [I] short-circuited expression should be evaluated one by one [datafusion]

2024-05-10 Thread via GitHub
liukun4515 commented on issue #8927: URL: https://github.com/apache/datafusion/issues/8927#issuecomment-2105525720 > I also find some very special cases when I meet the error. This is example sql: > > ``` > SELECT > ( > case > when SUM(col1) != 0 then SUM(col1)

Re: [PR] fix: Fix unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-10 Thread via GitHub
leoluan2009 commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105518050 I have updated the plan stability results; can you help trigger CI? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [I] short-circuited expression should be evaluated one by one [datafusion]

2024-05-10 Thread via GitHub
liukun4515 commented on issue #8927: URL: https://github.com/apache/datafusion/issues/8927#issuecomment-2105501373 I also find some very special cases when I meet the error. This is example sql: ``` SELECT ( case when SUM(col1) != 0 then SUM(col1) / SUM(col2)

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-10 Thread via GitHub
yyy1000 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597339778 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

Re: [PR] fix: Fix unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105462846 You need to update plan stability results, including `CometTPCDSV1_4_PlanStabilitySuite` and `CometTPCDSV2_7_PlanStabilitySuite`.

Re: [PR] fix: Fix unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-10 Thread via GitHub
comphead commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105462682 > Can anyone help to find why some checks failed and how can I fix. @comphead @viirya TPCDS plans comparison failed, supposedly because of this change the plan representati

Re: [PR] fix: Fix unknown operator id when explain with formatted mode [datafusion-comet]

2024-05-10 Thread via GitHub
leoluan2009 commented on PR #410: URL: https://github.com/apache/datafusion-comet/pull/410#issuecomment-2105460599 Can anyone help me find out why some checks failed and how I can fix them? @comphead @viirya

Re: [PR] Enable codecov [datafusion]

2024-05-10 Thread via GitHub
github-actions[bot] commented on PR #6067: URL: https://github.com/apache/datafusion/pull/6067#issuecomment-2105440010 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] feat: support postgres types transparently in queries [datafusion]

2024-05-10 Thread via GitHub
github-actions[bot] closed pull request #7156: feat: support postgres types transparently in queries URL: https://github.com/apache/datafusion/pull/7156

Re: [PR] Support array aggregate sum function [datafusion]

2024-05-10 Thread via GitHub
github-actions[bot] closed pull request #7242: Support array aggregate sum function URL: https://github.com/apache/datafusion/pull/7242

Re: [PR] feat: Implement quantile_cont()/quantile_disc() aggregate functions [datafusion]

2024-05-10 Thread via GitHub
github-actions[bot] closed pull request #7337: feat: Implement quantile_cont()/quantile_disc() aggregate functions URL: https://github.com/apache/datafusion/pull/7337

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-10 Thread via GitHub
Michael-J-Ward commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105439058 Agreed @jayzhan211, these are two separate issues. The panic issue was filed separately as #10425

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-10 Thread via GitHub
jayzhan211 commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597326136 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols );

Re: [I] Support "Tracing" / Spans [datafusion]

2024-05-10 Thread via GitHub
erratic-pattern commented on issue #9415: URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2105427704 `create_physical_plan` would be a good follow-up to PoC instrumentation of async code.

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-10 Thread via GitHub
jayzhan211 commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105427649 > ```sql > select array_slice(column1, -1, 2, 1) from data3; > ``` I think there are two issues here, one is the panic issue in `datafusion-cli` and another one i

Re: [I] Support "Tracing" / Spans [datafusion]

2024-05-10 Thread via GitHub
erratic-pattern commented on issue #9415: URL: https://github.com/apache/datafusion/issues/9415#issuecomment-2105425559 After spending some time with the optimizer, I think it would be a good candidate to PoC the `tracing` crate. * It is not async, which makes it much easier to instru

[PR] refactor: use Write instead of format! to implement display_name [datafusion]

2024-05-10 Thread via GitHub
erratic-pattern opened a new pull request, #10454: URL: https://github.com/apache/datafusion/pull/10454 Just a small refactor that should, in theory, reduce string allocations and thus benefit concurrent throughput by reducing allocator lock contention. However, when running concurrency ben
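A minimal sketch of the technique this PR describes, assuming a hypothetical, heavily simplified `Expr` enum (not DataFusion's real type or its real `display_name` code): the `format!` version builds and returns a fresh `String` at every node, while the `std::fmt::Write` version has every node append into one buffer supplied by the caller, so a nested expression needs one growing allocation instead of one per node.

```rust
use std::fmt::{self, Write};

// Hypothetical, simplified expression tree; not DataFusion's `Expr`.
enum Expr {
    Column(String),
    Literal(i64),
    BinaryExpr { left: Box<Expr>, op: &'static str, right: Box<Expr> },
}

impl Expr {
    // Allocating version: every recursive call returns a new String.
    fn display_name_format(&self) -> String {
        match self {
            Expr::Column(name) => name.clone(),
            Expr::Literal(v) => format!("{v}"),
            Expr::BinaryExpr { left, op, right } => format!(
                "{} {} {}",
                left.display_name_format(),
                op,
                right.display_name_format()
            ),
        }
    }

    // Write-based version: children append into the caller's buffer.
    fn write_name<W: Write>(&self, w: &mut W) -> fmt::Result {
        match self {
            Expr::Column(name) => w.write_str(name),
            Expr::Literal(v) => write!(w, "{v}"),
            Expr::BinaryExpr { left, op, right } => {
                left.write_name(w)?;
                write!(w, " {op} ")?;
                right.write_name(w)
            }
        }
    }

    // Single allocation point: one String shared by the whole traversal.
    fn display_name(&self) -> String {
        let mut buf = String::new();
        self.write_name(&mut buf)
            .expect("writing into a String cannot fail");
        buf
    }
}
```

Both methods produce the same name for `a + 1`; as the PR itself notes, whether the saved allocations show up in concurrency benchmarks is a separate, empirical question.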

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-10 Thread via GitHub
jayzhan211 commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1597305859 ## datafusion/sqllogictest/test_files/coalesce.slt: ## @@ -23,7 +23,7 @@ select coalesce(1, 2, 3); 1 # test with first null -query IT +query ?T Review Comm

Re: [PR] fix: newFileScanRDD should not take constructor from custom Spark versions [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on code in PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#discussion_r1597304147 ## spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala: ## @@ -69,6 +69,8 @@ trait ShimCometScanExec { readSchema: StructType, o

[I] Add `regex_replace` example back to docs [datafusion-python]

2024-05-10 Thread via GitHub
Michael-J-Ward opened a new issue, #677: URL: https://github.com/apache/datafusion-python/issues/677 `regexp_replace` is broken in the same way as `array_slice` in #670. The example was removed from the docs in #676

Re: [PR] Fix Docs [datafusion-python]

2024-05-10 Thread via GitHub
Michael-J-Ward commented on PR #676: URL: https://github.com/apache/datafusion-python/pull/676#issuecomment-2105373343 Thanks @comphead. I had included some `generated` files by accident in one of the commits.

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-10 Thread via GitHub
erratic-pattern commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2105361124 Are there any potential issues with simply using the existing `Hash` implementation of `Expr` to create `HashSet`s? Several other optimization passes use string n
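A minimal sketch of the alternative raised in this comment, assuming a hypothetical simplified `Expr` that derives `Hash`, `Eq`, and `PartialEq` (DataFusion's real `Expr` and its `CommonSubexprEliminate` pass are more involved): subexpressions are counted in a `HashMap` keyed by the expression itself instead of by a string identifier built from it. It is only an illustration of the idea; it still clones each subtree into a map key, which is one of the costs discussed in the related issues.

```rust
use std::collections::HashMap;

// Hypothetical, simplified expression type with a derived Hash.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Column(String),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Post-order walk that counts every non-leaf subexpression by value.
fn count_subexprs(expr: &Expr, counts: &mut HashMap<Expr, usize>) {
    match expr {
        Expr::Column(_) | Expr::Literal(_) => {}
        Expr::Add(l, r) | Expr::Mul(l, r) => {
            count_subexprs(l, counts);
            count_subexprs(r, counts);
            *counts.entry(expr.clone()).or_insert(0) += 1;
        }
    }
}

// Subexpressions seen more than once across all inputs are CSE candidates.
fn common_subexprs(exprs: &[Expr]) -> Vec<Expr> {
    let mut counts = HashMap::new();
    for e in exprs {
        count_subexprs(e, &mut counts);
    }
    counts
        .into_iter()
        .filter(|(_, n)| *n > 1)
        .map(|(e, _)| e)
        .collect()
}
```

For `(a + b) * 2` and `(a + b) * 3`, only `a + b` is reported as common.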

Re: [PR] Fix Docs [datafusion-python]

2024-05-10 Thread via GitHub
comphead commented on PR #676: URL: https://github.com/apache/datafusion-python/pull/676#issuecomment-2105358788 > Any clues on why `Dev / Release Audit Tool (RAT)` would have broken from this PR? There are *.rst files added which are not covered by the Apache license header. You need eithe

Re: [PR] Fix Docs [datafusion-python]

2024-05-10 Thread via GitHub
Michael-J-Ward commented on PR #676: URL: https://github.com/apache/datafusion-python/pull/676#issuecomment-2105312645 Any clues on why `Dev / Release Audit Tool (RAT)` would have broken from this PR?

Re: [I] Stop copying `Expr`s and LogicalPlans so much during Common Subexpression Elimination [datafusion]

2024-05-10 Thread via GitHub
erratic-pattern commented on issue #9873: URL: https://github.com/apache/datafusion/issues/9873#issuecomment-2105306903 @peter-toth > the issue of String identifiers are explained in my PR. I have a solution for the issue, but https://github.com/apache/datafusion/pull/10396 is already hu

[PR] Fix Docs [datafusion-python]

2024-05-10 Thread via GitHub
Michael-J-Ward opened a new pull request, #676: URL: https://github.com/apache/datafusion-python/pull/676 # Which issue does this PR close? Closes #675 # Rationale for this change # What changes are included in this PR? 1) Remove offending function call `regex_rep

Re: [PR] fix: newFileScanRDD should not take constructor from custom Spark versions [datafusion-comet]

2024-05-10 Thread via GitHub
kazuyukitanimura commented on code in PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#discussion_r1597246287 ## spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala: ## @@ -69,6 +69,8 @@ trait ShimCometScanExec { readSchema: StructType

Re: [I] Remove "Execution error: " prefix from error messages from Rust [datafusion-comet]

2024-05-10 Thread via GitHub
comphead commented on issue #293: URL: https://github.com/apache/datafusion-comet/issues/293#issuecomment-2105286369 Depends on DF 38.0.0

[I] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-05-10 Thread via GitHub
alamb opened a new issue, #10453: URL: https://github.com/apache/datafusion/issues/10453 ### Is your feature request related to a problem or challenge? There are at least three places that parquet statistics are extracted into ArrayRefs today 1. ParquetExec (pruning Row Groups)

Re: [PR] chore: Improve release process for next time [datafusion]

2024-05-10 Thread via GitHub
andygrove merged PR #10447: URL: https://github.com/apache/datafusion/pull/10447

Re: [I] DataFusion `38.0.0` Release [datafusion]

2024-05-10 Thread via GitHub
andygrove commented on issue #10217: URL: https://github.com/apache/datafusion/issues/10217#issuecomment-2105234195 Release has been completed

Re: [I] DataFusion `38.0.0` Release [datafusion]

2024-05-10 Thread via GitHub
andygrove closed issue #10217: DataFusion `38.0.0` Release URL: https://github.com/apache/datafusion/issues/10217

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-10 Thread via GitHub
andygrove commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105220455 I am +1 for this solution suggested by @Michael-J-Ward ```rust #[doc = "returns a slice of the array."] pub fn array_slice(array: Expr, begin: Expr, end: Expr, str

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-10 Thread via GitHub
Omega359 commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105200102 I recall hitting something like this with the substr/substring function. One would think they would be identical; however, they were not (since Rust doesn't do variadic functions

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-10 Thread via GitHub
Michael-J-Ward commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105192400 I believe we encountered the same issue again with `regex_replace`, so I suspect this issue isn't constrained to `array_slice` and instead applies to any `UDF` that previ

Re: [PR] Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [datafusion]

2024-05-10 Thread via GitHub
alamb commented on PR #10404: URL: https://github.com/apache/datafusion/pull/10404#issuecomment-2105191465 I also updated this PR's title to reflect the user-visible changes and added the `api-change` label

Re: [PR] Make `CREATE EXTERNAL TABLE` format options consistent, remove special syntax for `HEADER ROW`, `DELIMITER` and `COMPRESSION` [datafusion]

2024-05-10 Thread via GitHub
alamb commented on code in PR #10404: URL: https://github.com/apache/datafusion/pull/10404#discussion_r1597163747 ## datafusion/core/src/datasource/stream.rs: ## @@ -58,12 +58,22 @@ impl TableProviderFactory for StreamTableFactory { let schema: SchemaRef = Arc::new(cmd.

Re: [I] doc builds are broken [datafusion-python]

2024-05-10 Thread via GitHub
Michael-J-Ward commented on issue #675: URL: https://github.com/apache/datafusion-python/issues/675#issuecomment-2105182877 @andygrove, `regexp_replace` looks like another example where the `UDF` version has a non-optional argument, the same as we encountered with `array_slice`: https://gi

[PR] Minor: Improve documentation for `catalog.has_header` config option [datafusion]

2024-05-10 Thread via GitHub
alamb opened a new pull request, #10452: URL: https://github.com/apache/datafusion/pull/10452 ## Which issue does this PR close? Part of #7013 ## Rationale for this change While reviewing https://github.com/apache/datafusion/pull/10404 I found the documentation on this

[I] Document `CREATE EXTERNAL TABLE ... OPTIONS` [datafusion]

2024-05-10 Thread via GitHub
alamb opened a new issue, #10451: URL: https://github.com/apache/datafusion/issues/10451 ### Is your feature request related to a problem or challenge? While reviewing https://github.com/apache/datafusion/pull/10404/files I could not find documentation on what the syntax / available s

Re: [I] `stride` is not optional for new `array_slice` UDF [datafusion]

2024-05-10 Thread via GitHub
Michael-J-Ward commented on issue #10424: URL: https://github.com/apache/datafusion/issues/10424#issuecomment-2105156463 First, I'd like to emphasize that if passing `stride=1` to the UDF worked the same as *not* providing a stride in the SQL api, then this would be just porcelain / a n

Re: [PR] Minor: format comments in `PushDownFilter` rule [datafusion]

2024-05-10 Thread via GitHub
alamb merged PR #10437: URL: https://github.com/apache/datafusion/pull/10437

Re: [PR] Fix values with different data types caused failure [datafusion]

2024-05-10 Thread via GitHub
alamb commented on code in PR #10445: URL: https://github.com/apache/datafusion/pull/10445#discussion_r1597133318 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -188,37 +181,61 @@ impl LogicalPlanBuilder { n_cols ); } -

[PR] DRAFT: example fix to make `stride` optional in `array_slice` UDF [datafusion]

2024-05-10 Thread via GitHub
Michael-J-Ward opened a new pull request, #10450: URL: https://github.com/apache/datafusion/pull/10450 All I did was expand the `make_udf_function` macro and add the `if let Some(stride) = stride` conditional. To me, making the argument `Option<_>` is the natural way to make it optio
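A minimal sketch of the `Option<_>` approach described in this draft PR, assuming a hypothetical simplified `Expr` with a hypothetical `ScalarFunction` variant standing in for a UDF call (this is not the real expansion of `make_udf_function`): the stride is only forwarded when the caller supplies one, so omitting it yields a three-argument call, matching SQL where `stride` is left out.

```rust
// Hypothetical, simplified expression type; not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    // Stand-in for a scalar UDF invocation: function name plus arguments.
    ScalarFunction { name: &'static str, args: Vec<Expr> },
}

/// Returns a slice of the array; `stride` is optional.
fn array_slice(array: Expr, begin: Expr, end: Expr, stride: Option<Expr>) -> Expr {
    let mut args = vec![array, begin, end];
    // Only pass the stride through when one was actually provided.
    if let Some(stride) = stride {
        args.push(stride);
    }
    Expr::ScalarFunction { name: "array_slice", args }
}
```

The design choice is that "no stride" and "stride = 1" stay distinguishable at the call site, which matters if the UDF treats them differently, as the discussion in #10424 suggests.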

Re: [PR] chore: Improve release process for next time [datafusion]

2024-05-10 Thread via GitHub
alamb commented on code in PR #10447: URL: https://github.com/apache/datafusion/pull/10447#discussion_r1597128139 ## datafusion/functions-aggregate/README.md: ## @@ -0,0 +1,27 @@ + + +# DataFusion Aggregate Function Library Review Comment: ❤️

Re: [PR] Introduce coercion signature `VariadicCoercion` and `UniformCoercion` [datafusion]

2024-05-10 Thread via GitHub
alamb commented on code in PR #10439: URL: https://github.com/apache/datafusion/pull/10439#discussion_r1597109610 ## datafusion/functions-array/src/make_array.rs: ## @@ -111,6 +111,25 @@ impl ScalarUDFImpl for MakeArray { fn aliases(&self) -> &[String] { &self.alia

[I] make some datasource listing helper functions public? [datafusion]

2024-05-10 Thread via GitHub
samuelcolvin opened a new issue, #10449: URL: https://github.com/apache/datafusion/issues/10449 ### Is your feature request related to a problem or challenge? I'm trying to implement a custom variant of `ListingTable`, and I'm running into an issue that the following three helper func

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-10 Thread via GitHub
comphead commented on PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#issuecomment-2105096394 > As we add support for more Spark expressions, I can see that it makes sense to see if these expressions are already implemented in DataFusion, but we would still have to impleme

[PR] Add `Expr::try_as_col`, deprecate `Expr::try_into_col` [datafusion]

2024-05-10 Thread via GitHub
alamb opened a new pull request, #10448: URL: https://github.com/apache/datafusion/pull/10448 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/9637 ## Rationale for this change There are places in the code that check if an expr is a c
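A minimal sketch of the borrow-versus-clone distinction implied by the two method names, assuming hypothetical simplified `Column` and `Expr` types (the real DataFusion signatures may differ): a `try_as_col`-style accessor returns `Option<&Column>`, so code that only needs to check whether an expression is a column does so without cloning, while a `try_into_col`-style method taking `&self` must clone to hand back an owned `Column`.

```rust
// Hypothetical, simplified types; not DataFusion's `Column` or `Expr`.
#[derive(Debug, Clone, PartialEq)]
struct Column {
    name: String,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(Column),
    Literal(i64),
}

impl Expr {
    // Owned conversion from `&self`: has to clone the column.
    fn try_into_col(&self) -> Result<Column, String> {
        match self {
            Expr::Column(c) => Ok(c.clone()),
            other => Err(format!("{other:?} is not a column")),
        }
    }

    // Borrowing accessor: no allocation, just a reference into `self`.
    fn try_as_col(&self) -> Option<&Column> {
        if let Expr::Column(c) = self {
            Some(c)
        } else {
            None
        }
    }
}
```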

Re: [I] Will Comet support closed-source forks of Apache Spark (e.g. CSP versions)? [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on issue #414: URL: https://github.com/apache/datafusion-comet/issues/414#issuecomment-2105071185 There is also a reported compatibility issue with Databricks Spark: #190

Re: [PR] fix: workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on code in PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#discussion_r1597072728 ## spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala: ## @@ -69,6 +69,8 @@ trait ShimCometScanExec { readSchema: StructType, o

Re: [PR] fix: workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on code in PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#discussion_r1597071283 ## spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala: ## @@ -69,6 +69,8 @@ trait ShimCometScanExec { readSchema: StructType, o

Re: [I] Will Comet support closed-source forks of Apache Spark (e.g. CSP versions)? [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on issue #414: URL: https://github.com/apache/datafusion-comet/issues/414#issuecomment-2105063433 Thanks @andygrove for creating this. I don't think we claim that Comet supports closed-source forks of Spark right now. It would be impossible to make such claims as

Re: [PR] fix: workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation [datafusion-comet]

2024-05-10 Thread via GitHub
andygrove commented on code in PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#discussion_r1597065878 ## spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala: ## @@ -69,6 +69,8 @@ trait ShimCometScanExec { readSchema: StructType,

[I] Will Comet support closed-source forks of Apache Spark (e.g. CSP versions)? [datafusion-comet]

2024-05-10 Thread via GitHub
andygrove opened a new issue, #414: URL: https://github.com/apache/datafusion-comet/issues/414 ### What is the problem the feature request solves? We have our first PR up that works around an issue with Comet working with AWS Spark (https://github.com/apache/datafusion-comet/pull/412)

Re: [PR] Draft: Add examples from TPC-H [datafusion-python]

2024-05-10 Thread via GitHub
timsaucer commented on PR #666: URL: https://github.com/apache/datafusion-python/pull/666#issuecomment-2105051976 Thanks for the feedback. I am seeing a few differences between a couple of the results I'm getting and what's in the answers file, so I want to get those resolved before commit

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-10 Thread via GitHub
andygrove commented on PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#issuecomment-2105051501 > Just to understand, backed by datafusion does not automatically mean that has Spark compatibility? I have a similar question, and I'm not sure that I really understand th

Re: [PR] chore: Make COMET_EXEC_BROADCAST_FORCE_ENABLED internal config [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on PR #413: URL: https://github.com/apache/datafusion-comet/pull/413#issuecomment-2105016299 > I don't think we need to change the description of the config. We just need to make it internal so that it doesn't appear in the public documentation. Yes, I agree. Thanks

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-10 Thread via GitHub
comphead commented on PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#issuecomment-2104990426 @parthchandra @advancedxy @kazuyukitanimura @viirya @andygrove if you guys have time to have a second look

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-10 Thread via GitHub
comphead commented on PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#issuecomment-2104989539 Changes: - only 1 spark coverage file: `docs/spark_builtin_expr_coverage.txt` - The file contains table with `spark function name`, `query`, `result`, `cometMessage`, `datafu

Re: [PR] Coverage: Add a manual test to show what Spark built in expression the DF can support directly [datafusion-comet]

2024-05-10 Thread via GitHub
comphead commented on code in PR #331: URL: https://github.com/apache/datafusion-comet/pull/331#discussion_r1597014414 ## spark/src/test/scala/org/apache/comet/CometExpressionCoverageSuite.scala: ## @@ -135,6 +146,97 @@ class CometExpressionCoverageSuite extends CometTestBase w

Re: [PR] Draft: Add examples from TPC-H [datafusion-python]

2024-05-10 Thread via GitHub
andygrove commented on PR #666: URL: https://github.com/apache/datafusion-python/pull/666#issuecomment-2104977334 These examples are looking really nice @timsaucer. Don't feel that you have to wait until all of them are implemented before we start merging into main. We could do this in sta

Re: [PR] chore: Make COMET_EXEC_BROADCAST_FORCE_ENABLED internal config [datafusion-comet]

2024-05-10 Thread via GitHub
andygrove merged PR #413: URL: https://github.com/apache/datafusion-comet/pull/413

Re: [PR] chore: Make COMET_EXEC_BROADCAST_FORCE_ENABLED internal config [datafusion-comet]

2024-05-10 Thread via GitHub
andygrove commented on PR #413: URL: https://github.com/apache/datafusion-comet/pull/413#issuecomment-2104961167 > lgtm thanks @viirya should we also modify the text saying this param is internal? I don't think we need to change the description of the config. We just need to make it

Re: [PR] chore: Make COMET_EXEC_BROADCAST_FORCE_ENABLED internal config [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on PR #413: URL: https://github.com/apache/datafusion-comet/pull/413#issuecomment-2104924176 cc @andygrove

Re: [I] bug: TPC-H q16 failed with "ColumnarToRow does not implement doExecuteBroadcast" [datafusion-comet]

2024-05-10 Thread via GitHub
andygrove commented on issue #408: URL: https://github.com/apache/datafusion-comet/issues/408#issuecomment-2104912739 Thanks @viirya, that resolves this

Re: [I] bug: TPC-H q16 failed with "ColumnarToRow does not implement doExecuteBroadcast" [datafusion-comet]

2024-05-10 Thread via GitHub
andygrove closed issue #408: bug: TPC-H q16 failed with "ColumnarToRow does not implement doExecuteBroadcast" URL: https://github.com/apache/datafusion-comet/issues/408

[PR] chore: Improve release process for next time [datafusion]

2024-05-10 Thread via GitHub
andygrove opened a new pull request, #10447: URL: https://github.com/apache/datafusion/pull/10447 ## Which issue does this PR close? N/A ## Rationale for this change I ran into a couple of minor issues during the 38.0.0 release process ## What chang

Re: [PR] fix: workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on code in PR #412: URL: https://github.com/apache/datafusion-comet/pull/412#discussion_r1596962572 ## spark/src/main/spark-3.x/org/apache/comet/shims/ShimCometScanExec.scala: ## @@ -69,6 +69,8 @@ trait ShimCometScanExec { readSchema: StructType, o

Re: [PR] Minor: Simplify conjunction and disjunction, improve docs [datafusion]

2024-05-10 Thread via GitHub
alamb commented on code in PR #10446: URL: https://github.com/apache/datafusion/pull/10446#discussion_r1596956807 ## datafusion/expr/src/utils.rs: ## @@ -1107,20 +1107,49 @@ fn split_binary_impl<'a>( /// assert_eq!(conjunction(split), Some(expr)); /// ``` pub fn conjunction(f
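A minimal sketch of how a `conjunction` helper can fold a list of predicates into a single `AND` chain with `Iterator::reduce`, assuming a hypothetical simplified `Expr` (the real DataFusion function works on its own `Expr` type and may differ in detail). An empty input yields `None`, and splitting the result apart again recovers the original predicates, which is the round trip the doc comment in the diff asserts.

```rust
// Hypothetical, simplified expression type; not DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    And(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn and(self, other: Expr) -> Expr {
        Expr::And(Box::new(self), Box::new(other))
    }
}

// Combines predicates left to right: [a, b, c] -> (a AND b) AND c.
fn conjunction(filters: impl IntoIterator<Item = Expr>) -> Option<Expr> {
    filters.into_iter().reduce(Expr::and)
}

// Flattens a nested AND chain back into its individual predicates.
fn split_conjunction(expr: &Expr) -> Vec<&Expr> {
    match expr {
        Expr::And(l, r) => {
            let mut out = split_conjunction(l);
            out.extend(split_conjunction(r));
            out
        }
        other => vec![other],
    }
}
```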

[PR] Minor: Simplify conjunction and disjunction, improve docs [datafusion]

2024-05-10 Thread via GitHub
alamb opened a new pull request, #10446: URL: https://github.com/apache/datafusion/pull/10446 ## Which issue does this PR close? N/A ## Rationale for this change I ran into this code while working on #10291 ## What changes are included in this PR? 1. Simpli

[PR] Fix values with different data types caused failure [datafusion]

2024-05-10 Thread via GitHub
b41sh opened a new pull request, #10445: URL: https://github.com/apache/datafusion/pull/10445 ## Which issue does this PR close? Closes #10440 ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [I] [EPIC] Support native execution for all TPC-H queries [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on issue #391: URL: https://github.com/apache/datafusion-comet/issues/391#issuecomment-2104827125 Please disable `spark.comet.exec.broadcast.enabled`, which should not be used in normal queries: https://github.com/apache/datafusion-comet/issues/408#issuecomment-2104818958

[PR] chore: Make COMET_EXEC_BROADCAST_FORCE_ENABLED internal config [datafusion-comet]

2024-05-10 Thread via GitHub
viirya opened a new pull request, #413: URL: https://github.com/apache/datafusion-comet/pull/413 ## Which issue does this PR close? Closes #408. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [I] bug: TPC-H q16 failed with "ColumnarToRow does not implement doExecuteBroadcast" [datafusion-comet]

2024-05-10 Thread via GitHub
viirya commented on issue #408: URL: https://github.com/apache/datafusion-comet/issues/408#issuecomment-2104818958 Oh, please disable `spark.comet.exec.broadcast.enabled`, which is only used to enforce enabling broadcast for invalid cases. I should make it internal config. -- This is an

[PR] Stop copying LogicalPlan and Exprs in `PushDownFilter` [datafusion]

2024-05-10 Thread via GitHub
alamb opened a new pull request, #10444: URL: https://github.com/apache/datafusion/pull/10444 Draft * builds on https://github.com/apache/datafusion/pull/10437 * Still has clones I am removing ## Which issue does this PR close? Closes https://github.com/apache/datafusio

[PR] [FIX] - workaround for Amazon EMR version: emr-6.15.0 and Spark 3.4.1 custom implementation [datafusion-comet]

2024-05-10 Thread via GitHub
ceppelli opened a new pull request, #412: URL: https://github.com/apache/datafusion-comet/pull/412 ## Which issue does this PR close? Closes #411. ## Rationale for this change The file spark-sql_2.12-3.4.1-amzn-2.jar is a custom version of Spark and

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-10 Thread via GitHub
alamb commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1596890143 ## datafusion/optimizer/src/join_key_set.rs: ## @@ -0,0 +1,240 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-10 Thread via GitHub
comphead commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1596888406 ## datafusion/optimizer/src/join_key_set.rs: ## @@ -0,0 +1,240 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-10 Thread via GitHub
comphead commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1596887351 ## datafusion/optimizer/src/join_key_set.rs: ## @@ -0,0 +1,240 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

[I] compatibility issue with AWS EMR 6.15.0 SPARK 3.4.1 [datafusion-comet]

2024-05-10 Thread via GitHub
ceppelli opened a new issue, #411: URL: https://github.com/apache/datafusion-comet/issues/411 ### Describe the bug Compiling and running datafusion-comet for AWS EMR version emr-6.15.0 with Spark 3.4.1 won't work # how to reproduce the issue ``` scala> (0 until 10).to

Re: [PR] Add `LogicalPlan::recompute_schema` for handling rewrite passes [datafusion]

2024-05-10 Thread via GitHub
alamb commented on PR #10410: URL: https://github.com/apache/datafusion/pull/10410#issuecomment-2104788296 PR https://github.com/apache/datafusion/pull/10443 to add comments based on @yyy1000 's feedback on this PR

[PR] Minor: Clarify usecase for `LogicalPlan::recompute_schema` [datafusion]

2024-05-10 Thread via GitHub
alamb opened a new pull request, #10443: URL: https://github.com/apache/datafusion/pull/10443 ## Which issue does this PR close? ## Rationale for this change As part of the review https://github.com/apache/datafusion/pull/10410 @yyy1000 had a good question https://gith

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-10 Thread via GitHub
comphead commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1596885617 ## datafusion/optimizer/src/join_key_set.rs: ## @@ -0,0 +1,240 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-10 Thread via GitHub
comphead commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1596884516 ## datafusion/optimizer/src/join_key_set.rs: ## @@ -0,0 +1,240 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-10 Thread via GitHub
comphead commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1596884320 ## datafusion/optimizer/src/join_key_set.rs: ## @@ -0,0 +1,240 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] Add `LogicalPlan::recompute_schema` for handling rewrite passes [datafusion]

2024-05-10 Thread via GitHub
alamb commented on PR #10410: URL: https://github.com/apache/datafusion/pull/10410#issuecomment-2104782564 Included as part of https://github.com/apache/datafusion/pull/10410, so closing this PR

Re: [PR] Add `LogicalPlan::recompute_schema` for handling rewrite passes [datafusion]

2024-05-10 Thread via GitHub
alamb closed pull request #10410: Add `LogicalPlan::recompute_schema` for handling rewrite passes URL: https://github.com/apache/datafusion/pull/10410

Re: [PR] Stop copying LogicalPlan and Exprs in `OptimizeProjections` (2% faster planning) [datafusion]

2024-05-10 Thread via GitHub
alamb commented on PR #10405: URL: https://github.com/apache/datafusion/pull/10405#issuecomment-2104780979 Thanks for the review @comphead fyi @mustafasrepo

Re: [PR] Remove some Expr clones in `EliminateCrossJoin`(3%-5% faster planning) [datafusion]

2024-05-10 Thread via GitHub
comphead commented on code in PR #10430: URL: https://github.com/apache/datafusion/pull/10430#discussion_r1596878039 ## datafusion/optimizer/src/eliminate_cross_join.rs: ## @@ -204,8 +204,8 @@ fn try_flatten_join_inputs( fn find_inner_join( left_input: &LogicalPlan, r

Re: [PR] Stop copying LogicalPlan and Exprs in `OptimizeProjections` (2% faster planning) [datafusion]

2024-05-10 Thread via GitHub
comphead merged PR #10405: URL: https://github.com/apache/datafusion/pull/10405

Re: [I] Stop copying LogicalPlan and Exprs in `OptimizeProjections` [datafusion]

2024-05-10 Thread via GitHub
comphead closed issue #10209: Stop copying LogicalPlan and Exprs in `OptimizeProjections` URL: https://github.com/apache/datafusion/issues/10209
