Re: [I] Many `DEBUG datafusion_functions_array] Overwrite existing UDF: array_to_string` messages in log [datafusion]

2024-05-24 Thread via GitHub
goldmedal commented on issue #10658: URL: https://github.com/apache/datafusion/issues/10658#issuecomment-2130807751 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] bug: substring with negative indices produces incorrect results [datafusion-comet]

2024-05-24 Thread via GitHub
sonhmai commented on issue #463: URL: https://github.com/apache/datafusion-comet/issues/463#issuecomment-2130650900 i'm working on this.

Re: [PR] Fix `NULL["field"]` for expr_API [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on code in PR #10655: URL: https://github.com/apache/datafusion/pull/10655#discussion_r1614200131 ## datafusion/functions/src/core/getfield.rs: ## @@ -106,6 +106,9 @@ impl ScalarUDFImpl for GetFieldFunc { }; let access_schema =

Re: [I] Many `DEBUG datafusion_functions_array] Overwrite existing UDF: array_to_string` messages in log [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on issue #10658: URL: https://github.com/apache/datafusion/issues/10658#issuecomment-2130541520 Just cleanup aliases in those functions

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1614116423 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] Fix incorrect statistics read for binary columns in parquet [datafusion]

2024-05-24 Thread via GitHub
xinlifoobar commented on PR #10645: URL: https://github.com/apache/datafusion/pull/10645#issuecomment-2130457023 The gate issue is not related to the PR. Could we rerun?

Re: [I] Sort Merge Join. LeftAnti issues [datafusion]

2024-05-24 Thread via GitHub
edmondop commented on issue #10380: URL: https://github.com/apache/datafusion/issues/10380#issuecomment-2130435075 Thanks. Will work on #10659 !

[I] SortMergeJoin: Add fuzz tests for SMJ when the JoinFilter is set [datafusion]

2024-05-24 Thread via GitHub
comphead opened a new issue, #10659: URL: https://github.com/apache/datafusion/issues/10659 ### Is your feature request related to a problem or challenge? SMJ is currently in an experimental state, and the test coverage can be improved. Because of poor test coverage, multiple bugs were found

Re: [I] Sort Merge Join. LeftAnti issues [datafusion]

2024-05-24 Thread via GitHub
comphead commented on issue #10380: URL: https://github.com/apache/datafusion/issues/10380#issuecomment-2130415661 Thanks @edmondop, I'm already working on it. If you have spare time and would like to work on SMJ, it's really needed to add fuzz tests for SMJ when the filter is set. I'll create a

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-24 Thread via GitHub
alamb commented on code in PR #10543: URL: https://github.com/apache/datafusion/pull/10543#discussion_r1614057653 ## datafusion/physical-plan/src/work_table.rs: ## @@ -169,7 +169,7 @@ impl ExecutionPlan for WorkTableExec { } -fn children() -> Vec> { +fn

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-24 Thread via GitHub
comphead commented on PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#issuecomment-2130409303 @andygrove @advancedxy I fixed the test, implementing extra parsing and manual small tests if the parsing is complicated. I hope we now have a better picture.

Re: [PR] Add `FileScanConfig::new()` API [datafusion]

2024-05-24 Thread via GitHub
alamb commented on PR #10623: URL: https://github.com/apache/datafusion/pull/10623#issuecomment-2130400446 Thank you @metegenez

Re: [PR] Minor: Csv Options Clean-up [datafusion]

2024-05-24 Thread via GitHub
alamb commented on code in PR #10650: URL: https://github.com/apache/datafusion/pull/10650#discussion_r1614042615 ## datafusion/functions/Cargo.toml: ## @@ -80,7 +80,7 @@ itertools = { workspace = true } log = { workspace = true } md-5 = { version = "^0.10.0", optional = true

Re: [I] Library Guide: Building LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
edmondop commented on issue #7306: URL: https://github.com/apache/datafusion/issues/7306#issuecomment-2130346827 @alamb I checked the previous @andygrove PR and it seems like we have a good setup. I read the docs, and noticed they lack a paragraph on how to translate the LogicalPlan into

[PR] change version to 38.0.1 [datafusion-python]

2024-05-24 Thread via GitHub
andygrove opened a new pull request, #716: URL: https://github.com/apache/datafusion-python/pull/716 # Which issue does this PR close? N/A # Rationale for this change We cannot upload 38.0.0 RC2 to test.pypi.org so bumping to 38.0.1 # What changes

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-24 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2130307887 @metegenez Of course on it being a community decision. It is an ASF project after all. I've probably led everyone astray by naming this "BigQuery Options" because that just

Re: [PR] feat: Add support for RLike [datafusion-comet]

2024-05-24 Thread via GitHub
kazuyukitanimura commented on code in PR #469: URL: https://github.com/apache/datafusion-comet/pull/469#discussion_r1613974875 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1094,24 +1094,46 @@ object QueryPlanSerde extends Logging with

Re: [I] Sort Merge Join. LeftAnti issues [datafusion]

2024-05-24 Thread via GitHub
edmondop commented on issue #10380: URL: https://github.com/apache/datafusion/issues/10380#issuecomment-2130287032 @comphead can I pick this or are you already working on a solution?

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1613965861 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1613963307 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1613955491 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode,

[I] Many `DEBUG datafusion_functions_array] Overwrite existing UDF: array_to_string` messages in log [datafusion]

2024-05-24 Thread via GitHub
alamb opened a new issue, #10658: URL: https://github.com/apache/datafusion/issues/10658 ### Describe the bug We noticed some additional expected log messages upstream in InfluxDB. I found the same messages are present in `datafusion-cli` ### To Reproduce ```

Re: [PR] fix wrong type validation on unnest expr [datafusion]

2024-05-24 Thread via GitHub
duongcongtoai commented on code in PR #10657: URL: https://github.com/apache/datafusion/pull/10657#discussion_r1613955127 ## datafusion/sql/src/utils.rs: ## @@ -311,7 +311,7 @@ pub(crate) fn recursive_transform_unnest( tnr: _, } =

[PR] fix wrong type validation on unnest expr [datafusion]

2024-05-24 Thread via GitHub
duongcongtoai opened a new pull request, #10657: URL: https://github.com/apache/datafusion/pull/10657 ## Which issue does this PR close? Closes #10656 ## Rationale for this change ## What changes are included in this PR? ## Are these

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
tshauck commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1613950678 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1613948812 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[I] Wrong error thrown when unnesting a list of struct [datafusion]

2024-05-24 Thread via GitHub
duongcongtoai opened a new issue, #10656: URL: https://github.com/apache/datafusion/issues/10656 ### Describe the bug Given this slt ``` statement ok CREATE TABLE temp AS VALUES ([struct(1,2)]) ; query ? select unnest(column1) as struct_elem from temp;

Re: [I] Wrong error thrown when unnesting a list of struct [datafusion]

2024-05-24 Thread via GitHub
duongcongtoai commented on issue #10656: URL: https://github.com/apache/datafusion/issues/10656#issuecomment-2130247278 I'll open a PR to fix this soon

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2130243583 FWIW there is a lot more to SQL evaluation than just the expression evaluation, so that might be a reason to use DataFusion even if you had to implement your own expressions

[I] Selecting struct field within field produces unexpected results [datafusion-python]

2024-05-24 Thread via GitHub
timsaucer opened a new issue, #715: URL: https://github.com/apache/datafusion-python/issues/715 **Describe the bug** When you have a column that is a struct of struct and you attempt to index into the lowest level, if there is a null at the first level of the struct you get an

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1613906351 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,20 @@ class CometExpressionSuite extends CometTestBase with

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-24 Thread via GitHub
kazuyukitanimura commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1613863992 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,89 @@ impl Cast { self.eval_mode,

[I] Error on `NULL["field_name"]`: The expression to get an indexed field is only valid for `List`, `Struct`, or `Map` types, got Null [datafusion]

2024-05-24 Thread via GitHub
alamb opened a new issue, #10654: URL: https://github.com/apache/datafusion/issues/10654 ### Describe the bug Expr::field is broken for ScalarValue::Null After https://github.com/apache/datafusion/pull/10375 merged `Expr::field` is broken when we try and do it on

Re: [I] Parquet Predicate Pushdown Does Not Handle Type Coercion [datafusion]

2024-05-24 Thread via GitHub
jeffreyssmith2nd commented on issue #7925: URL: https://github.com/apache/datafusion/issues/7925#issuecomment-2130106971 The case we're running into in InfluxDB when enabling timezones is slightly different. It is a parquet file with Timestamp without a timezone and then querying with

Re: [I] bug: ABS should only overflow in ANSI mode [datafusion-comet]

2024-05-24 Thread via GitHub
planga82 commented on issue #464: URL: https://github.com/apache/datafusion-comet/issues/464#issuecomment-2130081236 I want to try this!!

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-24 Thread via GitHub
tshauck commented on PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#issuecomment-2130054380 I think I made the updates requested in the latest round. I left the dictionary handling the same, but I can look into flattening the dictionary specific to hex if you guys think

Re: [PR] Minor: Csv Options Clean-up [datafusion]

2024-05-24 Thread via GitHub
metegenez commented on code in PR #10650: URL: https://github.com/apache/datafusion/pull/10650#discussion_r1613754503 ## datafusion/core/src/datasource/file_format/csv.rs: ## @@ -301,13 +296,7 @@ impl CsvFormat { while let Some(chunk) =

Re: [PR] feat: Add support for RLike [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove commented on code in PR #469: URL: https://github.com/apache/datafusion-comet/pull/469#discussion_r1613743037 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1094,24 +1094,46 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] feat: Add support for RLike [datafusion-comet]

2024-05-24 Thread via GitHub
viirya commented on code in PR #469: URL: https://github.com/apache/datafusion-comet/pull/469#discussion_r1613739740 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1094,24 +1094,46 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] Support consuming Substrait with compound signature function names [datafusion]

2024-05-24 Thread via GitHub
Blizzara commented on PR #10653: URL: https://github.com/apache/datafusion/pull/10653#issuecomment-2129941680 @alamb curious what you think of fixing just the consumer side first, without touching the producer - if that'd be okay, then I can add some unit tests to this PR?

[PR] Support consuming Substrait with compound signature function names [datafusion]

2024-05-24 Thread via GitHub
Blizzara opened a new pull request, #10653: URL: https://github.com/apache/datafusion/pull/10653 Substrait 0.32.0+ requires functions to be specified using compound names, which include the function name as well as the arguments it takes. We don't necessarily need that information while
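To illustrate what a compound name looks like: Substrait compound function names take the form `name:argtypes`, for example `add:i64_i64`. The following is a minimal sketch of recovering the plain function name from such a string. The helper `base_function_name` and the sample names are hypothetical and are not taken from the PR, whose actual approach is not visible in the truncated excerpt above.

```rust
/// Hypothetical helper: returns the plain function name from a
/// Substrait-style compound name such as "add:i64_i64".
/// Names without a ':' separator are returned unchanged.
fn base_function_name(compound: &str) -> &str {
    match compound.split_once(':') {
        // Everything before the first ':' is the function name;
        // the remainder encodes the argument types and is ignored here.
        Some((name, _signature)) => name,
        None => compound,
    }
}

fn main() {
    // Sample names chosen for illustration only.
    for name in ["add:i64_i64", "not:bool", "sum"] {
        println!("{name} -> {}", base_function_name(name));
    }
}
```

Falling back to the unchanged input keeps plans produced with plain names readable, which seems to be the consumer-side concern raised in the thread.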

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-24 Thread via GitHub
faucct commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2129907158 I think that compiling SQL-expressions to UDFs by hand would kinda kill the whole point of the framework, but it seems like most of the framework would be irrelevant for the

[PR] feat: Add support for RLike [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove opened a new pull request, #469: URL: https://github.com/apache/datafusion-comet/pull/469 ## Which issue does this PR close? N/A ## Rationale for this change Regular expression support is usually important in ETL jobs, so we should start adding

Re: [PR] Add tests for reading numeric limits in parquet statistics [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10642: URL: https://github.com/apache/datafusion/pull/10642#discussion_r1613715175 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -212,13 +218,13 @@ impl Test { let expected_null_counts = Arc::new(expected_null_counts) as

Re: [PR] Add tests for reading numeric limits in parquet statistics [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10642: URL: https://github.com/apache/datafusion/pull/10642#discussion_r1613714632 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -212,13 +218,13 @@ impl Test { let expected_null_counts = Arc::new(expected_null_counts) as

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613685757 ## datafusion/proto-common/src/common.rs: ## @@ -0,0 +1,22 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613687288 ## datafusion/proto-common/src/lib.rs: ## @@ -0,0 +1,62 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613684688 ## datafusion/proto-common/gen/src/main.rs: ## @@ -0,0 +1,57 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
comphead commented on code in PR #10649: URL: https://github.com/apache/datafusion/pull/10649#discussion_r1613680726 ## datafusion/proto-common/Cargo.toml: ## @@ -0,0 +1,54 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [I] bug: hash expression is not consistent with Spark [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove closed issue #427: bug: hash expression is not consistent with Spark URL: https://github.com/apache/datafusion-comet/issues/427

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove merged PR #433: URL: https://github.com/apache/datafusion-comet/pull/433

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
cisaacson commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129782772 Got it, that makes total sense. I will only care about the accepted pushdown filters, so the `TableScan` will work for what we need.

Re: [I] Support convert LogicalPlan JOIN with `Using` constraint to SQL String [datafusion]

2024-05-24 Thread via GitHub
goldmedal commented on issue #10652: URL: https://github.com/apache/datafusion/issues/10652#issuecomment-2129764802 By the way, I saw there're other unimplemented plans in `plan.rs`: - Distinct - Union - Window - Extension (I guess we need to provide some method for

[I] Support convert LogicalPlan JOIN with `Using` constraint to SQL String [datafusion]

2024-05-24 Thread via GitHub
goldmedal opened a new issue, #10652: URL: https://github.com/apache/datafusion/issues/10652 ### Is your feature request related to a problem or challenge? We currently only support converting a JOIN with an `ON` constraint to a SQL string. The SQL below can't be converted yet. ```

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-24 Thread via GitHub
Blizzara commented on PR #10531: URL: https://github.com/apache/datafusion/pull/10531#issuecomment-2129741689 @jonahgao @alamb this last one is ready now too :)

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129736024 > If TableScan has filters why would that not catch all filters? Depending on the value of

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
cisaacson commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129731243 This looks very good, pretty much what I implemented. The only question I have remaining is: If `TableScan` has `filters` why would that not catch all filters? What

Re: [I] Bad CPU type in executable protoc-jar [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove commented on issue #227: URL: https://github.com/apache/datafusion-comet/issues/227#issuecomment-2129653865 This is no longer an issue for me and we have not had other reports of this happening, so I will close this.

Re: [I] Bad CPU type in executable protoc-jar [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove closed issue #227: Bad CPU type in executable protoc-jar URL: https://github.com/apache/datafusion-comet/issues/227

[I] bug: CAST timestamp to string ignores timezone prior to Spark 3.4 [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove opened a new issue, #468: URL: https://github.com/apache/datafusion-comet/issues/468 ### Describe the bug In `CometExpressionSuite` we have two tests that are ignored for Spark 3.2 and 3.3. ```scala test("cast timestamp and timestamp_ntz to string") { //

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1613550658 ## datafusion/functions-aggregate/Cargo.toml: ## @@ -39,6 +39,7 @@ path = "src/lib.rs" [dependencies] arrow = { workspace = true } +arrow-schema = {

Re: [PR] Move Median to `functions-aggregate` and Introduce Numeric signature [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on code in PR #10644: URL: https://github.com/apache/datafusion/pull/10644#discussion_r1613540288 ## datafusion/functions-aggregate/src/median.rs: ## @@ -15,71 +15,105 @@ // specific language governing permissions and limitations // under the License.

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-24 Thread via GitHub
Omega359 commented on PR #10573: URL: https://github.com/apache/datafusion/pull/10573#issuecomment-2129537623 This shouldn't have passed checks. ``` + cargo fmt --all -- --check `cargo metadata` exited with an error: error: failed to load manifest for workspace member

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-24 Thread via GitHub
faucct commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2129536338 Though the paper that you have mentioned admits that JIT-compilation is beneficial for OLTP workloads: > Besides OLAP performance, other factors also play an important role.

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-24 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2129511762 > So that suggests to me it treats NaN as the largest floating point value This is confirmed in the latest version of the [postgres

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-05-24 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2129505697 > So that suggests to me it treats NaN as the largest floating point value If this is the case then there is divergence between postgres and arrow-rs. Which takes
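As a small illustration of the behaviour discussed in this thread, the sketch below uses Rust's standard `f64::total_cmp` (the IEEE 754 totalOrder predicate) to compute min and max. Under that ordering a NaN with a positive sign bit compares greater than `+inf`, so it becomes the maximum. The helpers `total_max` and `total_min` are hypothetical names for this example and are not DataFusion's accumulator code.

```rust
/// Hypothetical helper: maximum of a slice under IEEE 754 total ordering.
/// A NaN with a positive sign bit sorts above +infinity.
fn total_max(values: &[f64]) -> Option<f64> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}

/// Hypothetical helper: minimum of a slice under the same ordering.
fn total_min(values: &[f64]) -> Option<f64> {
    values.iter().copied().min_by(|a, b| a.total_cmp(b))
}

fn main() {
    // Force a positive sign bit so the NaN sorts above every other value.
    let nan = f64::NAN.copysign(1.0);
    let values = [1.0, nan, f64::INFINITY, -2.5];

    let max = total_max(&values).unwrap();
    let min = total_min(&values).unwrap();
    println!("max is NaN: {}, min: {}", max.is_nan(), min);
}
```

Note that a NaN with a negative sign bit sorts below `-inf` under the same ordering, so the sign of the NaN matters when comparing against systems that always treat NaN as the largest value.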

Re: [PR] Minor: Add tests showing aggregate behavior for NaNs [datafusion]

2024-05-24 Thread via GitHub
westonpace commented on code in PR #10634: URL: https://github.com/apache/datafusion/pull/10634#discussion_r1613461895 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -4374,6 +4374,42 @@ GROUP BY dummy text1, text1, text1 +# Tests for aggregating with NaN

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao commented on PR #10640: URL: https://github.com/apache/datafusion/pull/10640#issuecomment-2129492219 I plan to merge this PR now as it might conflict with #10646.

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao merged PR #10640: URL: https://github.com/apache/datafusion/pull/10640

[PR] Convert Sum to UDAF [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 opened a new pull request, #10651: URL: https://github.com/apache/datafusion/pull/10651 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Start setting up new StreamTable config [datafusion]

2024-05-24 Thread via GitHub
matthewmturner commented on PR #10600: URL: https://github.com/apache/datafusion/pull/10600#issuecomment-2129456972 @alamb I have a working example now. I have an idea to update it to show more of the streaming nature (i.e. write to the fifo and get batches multiple times) but won't have time

Re: [I] Versions >32.0.0 on PyPI have broken substrait support [datafusion-python]

2024-05-24 Thread via GitHub
mbwhite commented on issue #646: URL: https://github.com/apache/datafusion-python/issues/646#issuecomment-2129449063 FYI, tried v38.0.0 from pypi-test and the problem remains. Rebuilding the code locally and using the resulting wheel works fine: `maturin build --features substrait`

Re: [I] Implement Spark-compatible CAST from String to Timestamp [datafusion-comet]

2024-05-24 Thread via GitHub
andygrove commented on issue #328: URL: https://github.com/apache/datafusion-comet/issues/328#issuecomment-2129430050 There is a follow on issue to complete this work: https://github.com/apache/datafusion-comet/issues/376

Re: [PR] Convert first, last aggregate function to UDAF [datafusion]

2024-05-24 Thread via GitHub
jayzhan211 commented on code in PR #10648: URL: https://github.com/apache/datafusion/pull/10648#discussion_r1613370248 ## datafusion/functions-aggregate/src/first_last.rs: ## @@ -161,6 +144,23 @@ impl AggregateUDFImpl for FirstValue { fn aliases() -> &[String] {

[PR] Minor: Csv Options Clean-up [datafusion]

2024-05-24 Thread via GitHub
berkaysynnada opened a new pull request, #10650: URL: https://github.com/apache/datafusion/pull/10650 ## Which issue does this PR close? Closes #. ## Rationale for this change When CSV header option is not specified from the options clause, it is set

[PR] Factor out common datafusion types into another proto file [datafusion]

2024-05-24 Thread via GitHub
mustafasrepo opened a new pull request, #10649: URL: https://github.com/apache/datafusion/pull/10649 ## Which issue does this PR close? Closes #10477. ## Rationale for this change See [issue

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-24 Thread via GitHub
crepererum commented on code in PR #10647: URL: https://github.com/apache/datafusion/pull/10647#discussion_r1613322378 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -75,7 +75,79 @@ pub use metrics::ParquetFileMetrics; pub use

[PR] Convert first, last aggregate function to UDAF [datafusion]

2024-05-24 Thread via GitHub
mustafasrepo opened a new pull request, #10648: URL: https://github.com/apache/datafusion/pull/10648 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [I] Make TaskContext wrap SessionState [datafusion]

2024-05-24 Thread via GitHub
crepererum commented on issue #10631: URL: https://github.com/apache/datafusion/issues/10631#issuecomment-2129296188 I would suggest a rather larger refactoring? We have: - `SessionState` - `SessionConfig` - `SessionContext` - `TaskContext` - `RuntimeConfig` -

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
Blizzara commented on code in PR #10640: URL: https://github.com/apache/datafusion/pull/10640#discussion_r1613246909 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1686,7 +1691,7 @@ fn to_substrait_bounds(window_frame: ) -> Result<(Bound, Bound)> { )) }

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2129137066 Moving out of Slack into GitHub so it might be more easily found. If your use case is to get the list of filters and tables that appear in a query, one way to do this is:

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-24 Thread via GitHub
ozankabak commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2129132050 @davisp what I meant was looking at how other dialects (and systems using those dialects, such as BigQuery) handle column-specific metadata and analyze pros and cons of various

Re: [PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-24 Thread via GitHub
alamb commented on PR #10647: URL: https://github.com/apache/datafusion/pull/10647#issuecomment-2129087302 @thinkharderdev , @tustvold, @Ted-Jiang and @crepererum: if you have time, could you double check that this correctly describes `ParquetExec` to your understanding?

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-24 Thread via GitHub
metegenez commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2129065420 There is a robust method to define column-specific options in DataFusion table options. I believe there should be a single way to do this in DataFusion, but BigQuery is widely

[PR] Improve `ParquetExec` and related documentation [datafusion]

2024-05-24 Thread via GitHub
alamb opened a new pull request, #10647: URL: https://github.com/apache/datafusion/pull/10647 ## Which issue does this PR close? Part of #10549 ## Rationale for this change While trying to make an example that uses ParquetExec, I found its documentation could be improved

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao commented on code in PR #10640: URL: https://github.com/apache/datafusion/pull/10640#discussion_r1613097509 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1175,6 +1180,47 @@ pub(crate) fn from_substrait_type(dt: ::proto::Type) -> Result Result { +fn

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-24 Thread via GitHub
jonahgao commented on code in PR #10640: URL: https://github.com/apache/datafusion/pull/10640#discussion_r1613068493 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -1686,7 +1691,7 @@ fn to_substrait_bounds(window_frame: ) -> Result<(Bound, Bound)> { )) }

[PR] feat: add substrait support for Interval types and literals [datafusion]

2024-05-24 Thread via GitHub
waynexia opened a new pull request, #10646: URL: https://github.com/apache/datafusion/pull/10646 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? Support convert to/from

Re: [PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-24 Thread via GitHub
alamb merged PR #10638: URL: https://github.com/apache/datafusion/pull/10638

Re: [I] Docker CLI build fails in WSL2 - "Ubuntu 22.04.4 LTS" [datafusion]

2024-05-24 Thread via GitHub
alamb closed issue #10472: Docker CLI build fails in WSL2 - "Ubuntu 22.04.4 LTS" URL: https://github.com/apache/datafusion/issues/10472

Re: [PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-24 Thread via GitHub
alamb commented on code in PR #10638: URL: https://github.com/apache/datafusion/pull/10638#discussion_r1613016952 ## datafusion-cli/Dockerfile: ## @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. -FROM rust:1.73-bullseye as

Re: [PR] Minor: add runtime asserts to `RowGroup` [datafusion]

2024-05-24 Thread via GitHub
alamb commented on PR #10641: URL: https://github.com/apache/datafusion/pull/10641#issuecomment-2128826061 Thanks @viirya and @advancedxy

Re: [I] June 2024 ASF Board Report [datafusion]

2024-05-24 Thread via GitHub
alamb commented on issue #10155: URL: https://github.com/apache/datafusion/issues/10155#issuecomment-2128812716 Draft report: https://docs.google.com/document/d/1h4yjvomQO0XdzxKuE4aBSWGNliFFmn8GADd8DlPuXBw/edit

Re: [I] feat: Support ANSI mode for round [datafusion-comet]

2024-05-24 Thread via GitHub
vidyasankarv commented on issue #466: URL: https://github.com/apache/datafusion-comet/issues/466#issuecomment-2128653555 I will work on this one
