Re: [I] feat: Support ANSI mode for round [datafusion-comet]

2024-05-23 Thread via GitHub
vidyasankarv commented on issue #466: URL: https://github.com/apache/datafusion-comet/issues/466#issuecomment-2128653555 I will work on this one -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-23 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1612654513 ## runners/datafusion-comet/tpcbench.py: ## @@ -0,0 +1,108 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agree

Re: [PR] Add initial README and scripts [datafusion-benchmarks]

2024-05-23 Thread via GitHub
viirya commented on code in PR #1: URL: https://github.com/apache/datafusion-benchmarks/pull/1#discussion_r1612654116 ## runners/datafusion-comet/tpcbench.py: ## @@ -0,0 +1,108 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agree

Re: [PR] Minor: add runtime asserts to `RowGroup` [datafusion]

2024-05-23 Thread via GitHub
viirya merged PR #10641: URL: https://github.com/apache/datafusion/pull/10641

Re: [PR] Replace logical plan from Arc to Box [datafusion]

2024-05-23 Thread via GitHub
jayzhan211 closed pull request #9763: Replace logical plan from Arc to Box URL: https://github.com/apache/datafusion/pull/9763

[PR] Fix incorrect statistics read for binary columns in parquet [datafusion]

2024-05-23 Thread via GitHub
xinlifoobar opened a new pull request, #10645: URL: https://github.com/apache/datafusion/pull/10645 ## Which issue does this PR close? Closes #10605 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-23 Thread via GitHub
advancedxy commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1612607959 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -38,42 +38,100 @@ use crate::physical_optimizer::pruning::{PruningPredicate, Pruni

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612605807 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -644,14 +644,16 @@ impl Cast { | DataType::Float32 | DataTyp

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-23 Thread via GitHub
advancedxy commented on PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#issuecomment-2128357191 Thanks everyone for reviewing.

Re: [I] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-23 Thread via GitHub
vaibhawvipul commented on issue #465: URL: https://github.com/apache/datafusion-comet/issues/465#issuecomment-2128357365 I am working on this.

Re: [PR] Replace logical plan from Arc to Box [datafusion]

2024-05-23 Thread via GitHub
github-actions[bot] commented on PR #9763: URL: https://github.com/apache/datafusion/pull/9763#issuecomment-2128338094 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Avoid clone for LogicalPlan during optimizer passes [datafusion]

2024-05-23 Thread via GitHub
github-actions[bot] commented on PR #9768: URL: https://github.com/apache/datafusion/pull/9768#issuecomment-2128338041 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

[PR] Move Median to `functions-aggregate` [datafusion]

2024-05-23 Thread via GitHub
jayzhan211 opened a new pull request, #10644: URL: https://github.com/apache/datafusion/pull/10644 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1612495192 ## core/src/execution/datafusion/expressions/scalar_funcs/hex.rs: ## @@ -0,0 +1,371 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612492325 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -503,41 +503,37 @@ impl Cast { fn cast_array(&self, array: ArrayRef) -> DataFusionResult {

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#issuecomment-2128254128 This is ready for review now @viirya @parthchandra @kazuyukitanimura @huaxingao

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612478631 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -622,14 +590,91 @@ impl Cast { self.eval_mode, from_type,

[I] Add tests for casting between timestamp types [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new issue, #467: URL: https://github.com/apache/datafusion-comet/issues/467 ### What is the problem the feature request solves? We currently delegate to DataFusion when casting between timestamps (as discovered in https://github.com/apache/datafusion-comet/pull/461)

Re: [I] Create a DataFusion blog [datafusion]

2024-05-23 Thread via GitHub
andygrove closed issue #10535: Create a DataFusion blog URL: https://github.com/apache/datafusion/issues/10535

Re: [I] Create a DataFusion blog [datafusion]

2024-05-23 Thread via GitHub
andygrove commented on issue #10535: URL: https://github.com/apache/datafusion/issues/10535#issuecomment-2128187298 This task is complete: https://datafusion.apache.org/blog/

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-23 Thread via GitHub
appletreeisyellow commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2128169898 > * Fall Back: When the clocks move backward, there is an "extra" hour. For example, in US central time zone, when DST ends at 2:00 AM, the clocks are set back to 1:00
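The fall-back ambiguity can be sketched in Rust with only the standard library. The zone below is hypothetical and hard-coded. It models US Central around the 2024-11-03 transition: UTC-5 before 07:00 UTC and UTC-6 from then on. It is not a real time-zone database and not DataFusion's `date_bin`. Two distinct UTC instants fall into the same local one-hour bin, and the start of that bin maps back to two UTC instants.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian date (Howard Hinnant's algorithm).
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let mp = (m + 9) % 12; // March = 0 ... February = 11
    let doy = (153 * mp + 2) / 5 + d - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Hypothetical zone: CDT (UTC-5) until 2024-11-03 07:00 UTC, CST (UTC-6) afterwards.
fn transition_utc() -> i64 {
    days_from_civil(2024, 11, 3) * 86_400 + 7 * 3_600
}

fn offset_secs(utc: i64) -> i64 {
    if utc < transition_utc() {
        -5 * 3_600
    } else {
        -6 * 3_600
    }
}

/// Bin a UTC instant by local wall-clock time (result is seconds on the local clock).
fn bin_local(utc: i64, stride: i64) -> i64 {
    let local = utc + offset_secs(utc);
    local - local.rem_euclid(stride)
}

/// Every UTC instant whose local wall-clock reading equals `local`.
fn utc_candidates(local: i64) -> Vec<i64> {
    [-5 * 3_600, -6 * 3_600]
        .into_iter()
        .filter_map(|off| {
            let utc = local - off;
            // Keep the candidate only if the zone really uses `off` at that instant.
            (offset_secs(utc) == off).then_some(utc)
        })
        .collect()
}
```

Here 06:30 UTC and 07:30 UTC both read 01:30 locally, so they share the 01:00 local bin. The bin start then resolves to both 06:00 UTC (CDT) and 07:00 UTC (CST), which is the ambiguity a timezone-aware `date_bin` has to resolve.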

Re: [I] Support `date_bin` on timestamps with timezone, properly accounting for Daylight Savings Time [datafusion]

2024-05-23 Thread via GitHub
appletreeisyellow commented on issue #10602: URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2128168367 To make `date_bin` timezone aware, there are some edge cases we need to consider when designing it: 1. **Daylight Saving Time (DST) Transitions:** - Spring F

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove merged PR #451: URL: https://github.com/apache/datafusion-comet/pull/451

Re: [PR] Simplify `ParquetExec::new()` [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10643: URL: https://github.com/apache/datafusion/pull/10643#issuecomment-2128118789 The more I think about this the more I like https://github.com/apache/datafusion/pull/10636 and deprecate the ParquetExec::new function...

Re: [PR] Simplify `ParquetExec::new()` [datafusion]

2024-05-23 Thread via GitHub
alamb closed pull request #10643: Simplify `ParquetExec::new()` URL: https://github.com/apache/datafusion/pull/10643

[PR] Simplify `ParquetExec::new()` [datafusion]

2024-05-23 Thread via GitHub
alamb opened a new pull request, #10643: URL: https://github.com/apache/datafusion/pull/10643 ## Which issue does this PR close? Part of #10546 ## Rationale for this change While working on https://github.com/apache/datafusion/pull/10549 it was cumbersome to create a

Re: [PR] Fix Already Borrowed Panic When SessionContext Used in Multiple Threads [datafusion-python]

2024-05-23 Thread via GitHub
andygrove commented on PR #367: URL: https://github.com/apache/datafusion-python/pull/367#issuecomment-2128107098 I'll go ahead and close this since it has been open for a year and is in draft. @kylebrooks-8451 feel free to re-open if you resume work on this

Re: [PR] Fix Already Borrowed Panic When SessionContext Used in Multiple Threads [datafusion-python]

2024-05-23 Thread via GitHub
andygrove closed pull request #367: Fix Already Borrowed Panic When SessionContext Used in Multiple Threads URL: https://github.com/apache/datafusion-python/pull/367

Re: [PR] Use int64 for TPC-H keys and set input schema to not nullable [datafusion-python]

2024-05-23 Thread via GitHub
andygrove merged PR #714: URL: https://github.com/apache/datafusion-python/pull/714

Re: [PR] test: parametrize test_array_functions [datafusion-python]

2024-05-23 Thread via GitHub
andygrove merged PR #678: URL: https://github.com/apache/datafusion-python/pull/678

Re: [PR] Add `ParquetExec::builder` API [datafusion]

2024-05-23 Thread via GitHub
alamb closed pull request #10636: Add `ParquetExec::builder` API URL: https://github.com/apache/datafusion/pull/10636

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612318862 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -644,14 +644,16 @@ impl Cast { | DataType::Float32 | Data

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612317278 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -644,14 +644,16 @@ impl Cast { | DataType::Float32 | Data

Re: [I] Comet doesn't support Spark BroadcastHashJoinExec if it is null-aware anti-join [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on issue #457: URL: https://github.com/apache/datafusion-comet/issues/457#issuecomment-2128028530 For details about Spark null-aware anti join, see https://issues.apache.org/jira/browse/SPARK-32290.
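Here is a minimal sketch of the `NOT IN` semantics that a null-aware anti join has to preserve. It uses a single key column, with SQL NULL modeled as `None`. The function name is hypothetical, and this is not Spark's or Comet's implementation.

```rust
/// Rows of `left` that survive `left_key NOT IN (right keys)` under SQL three-valued logic.
pub fn null_aware_anti_join(left: &[Option<i32>], right: &[Option<i32>]) -> Vec<Option<i32>> {
    // Empty subquery: NOT IN is TRUE for every row, even one with a NULL key.
    if right.is_empty() {
        return left.to_vec();
    }
    // A NULL on the build side makes every non-matching comparison UNKNOWN.
    let right_has_null = right.iter().any(|v| v.is_none());
    left.iter()
        .copied()
        .filter(|l| match l {
            // NULL probe key against a non-empty subquery: UNKNOWN, so the row is dropped.
            None => false,
            Some(_) => !right_has_null && !right.contains(l),
        })
        .collect()
}
```

The NULL-key and NULL-in-subquery cases are what keep a plain anti join on equality from producing the same result.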

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#issuecomment-2128021619 Merged. Thanks @sunchao @kazuyukitanimura @andygrove

Re: [I] Automatically pick best shuffle implementation for a query [datafusion-comet]

2024-05-23 Thread via GitHub
viirya closed issue #459: Automatically pick best shuffle implementation for a query URL: https://github.com/apache/datafusion-comet/issues/459

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya merged PR #460: URL: https://github.com/apache/datafusion-comet/pull/460

[I] feat: Support ANSI mode for round [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new issue, #466: URL: https://github.com/apache/datafusion-comet/issues/466 ### What is the problem the feature request solves? Comet does not support ANSI mode for `round`. ## Create test data ``` val df = Seq(Int.MaxValue, Int.MinValue).toDF("a")
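This sketch shows the kind of overflow the `round` issue is about. It assumes integer `round` uses HALF_UP with a negative scale, for example `round(a, -1)`; the test data uses `Int.MaxValue`, so that is presumably the case. The function name and the `ansi` flag are hypothetical. The legacy branch assumes the result is narrowed to 32 bits, the way a Java `BigDecimal.intValue()` would do it. That part is not confirmed by the issue text.

```rust
/// Round an `i32` to `scale` decimal places using HALF_UP (ties away from zero).
/// Integers are unchanged by a non-negative scale. `ansi` is a hypothetical flag.
pub fn round_i32(value: i32, scale: i32, ansi: bool) -> Result<i32, String> {
    if scale >= 0 {
        return Ok(value);
    }
    // Cap at 10^18 so the factor fits in i64; i32 inputs already round to 0 well before that.
    let digits = scale.unsigned_abs().min(18);
    let factor = 10i64.pow(digits);
    let x = i64::from(value);
    // Round the magnitude half-up in i64, then restore the sign.
    let magnitude = (x.abs() + factor / 2) / factor * factor;
    let rounded = if x < 0 { -magnitude } else { magnitude };
    if ansi {
        // ANSI: a result outside the i32 range is an overflow error.
        i32::try_from(rounded)
            .map_err(|_| format!("ARITHMETIC_OVERFLOW: round({value}, {scale})"))
    } else {
        // Legacy (assumption): keep the low 32 bits, like a narrowing conversion.
        Ok(rounded as i32)
    }
}
```

With `value = i32::MAX` and `scale = -1`, the HALF_UP result is 2147483650, which does not fit in an `i32`.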

[I] feat: Implement ANSI support for UnaryMinus [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new issue, #465: URL: https://github.com/apache/datafusion-comet/issues/465 ### What is the problem the feature request solves? Comet does not support ANSI mode for UnaryMinus. ## Create test data ``` val df = Seq(Int.MaxValue, Int.MinValue).toDF("a"
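Here is a sketch of the two behaviours for negating an `i32`. It assumes legacy mode wraps around like JVM integer negation and ANSI mode reports an overflow. `unary_minus_i32` and the `ansi` flag are hypothetical names, not Comet APIs.

```rust
/// Negate an `i32`; `ansi` is a hypothetical stand-in for Spark's ANSI setting.
pub fn unary_minus_i32(value: i32, ansi: bool) -> Result<i32, String> {
    if ansi {
        // ANSI: -Int.MinValue does not fit in i32, so report an overflow.
        value
            .checked_neg()
            .ok_or_else(|| format!("ARITHMETIC_OVERFLOW: -({value})"))
    } else {
        // Legacy: two's-complement wrap-around, so -Int.MinValue == Int.MinValue.
        Ok(value.wrapping_neg())
    }
}
```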

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#issuecomment-2127914834 Oh, it is from one patch just merged.

Re: [PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-23 Thread via GitHub
Omega359 commented on code in PR #10638: URL: https://github.com/apache/datafusion/pull/10638#discussion_r1612224719 ## datafusion-cli/Dockerfile: ## @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. -FROM rust:1.73-bullseye as

Re: [PR] fix Incorrect statistics read for i8 i16 columns in parquet [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10629: URL: https://github.com/apache/datafusion/pull/10629#issuecomment-2127905830 Thanks again @Lordworms

Re: [I] Incorrect statistics read for `i8` `i16` columns in parquet [datafusion]

2024-05-23 Thread via GitHub
alamb closed issue #10585: Incorrect statistics read for `i8` `i16` columns in parquet URL: https://github.com/apache/datafusion/issues/10585

Re: [PR] fix Incorrect statistics read for i8 i16 columns in parquet [datafusion]

2024-05-23 Thread via GitHub
alamb merged PR #10629: URL: https://github.com/apache/datafusion/pull/10629

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-23 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1612223812 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

[I] bug: ABS should only overflow in ANSI mode [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new issue, #464: URL: https://github.com/apache/datafusion-comet/issues/464 ### Describe the bug Comet currently fails with an overflow when ANSI mode is disabled, but we should return the original value instead. ## Create test data ``` val df = Seq
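The behaviour the issue asks for can be sketched with Rust's standard integer methods. `wrapping_abs` returns `i32::MIN` unchanged (the original value), while `checked_abs` exposes the overflow for ANSI mode. The function name and the `ansi` flag are hypothetical.

```rust
/// `abs` on an `i32` that only fails in (hypothetical) ANSI mode.
pub fn abs_i32(value: i32, ansi: bool) -> Result<i32, String> {
    if ansi {
        // ANSI: |Int.MinValue| is not representable, so surface the overflow.
        value
            .checked_abs()
            .ok_or_else(|| format!("ARITHMETIC_OVERFLOW: abs({value})"))
    } else {
        // Legacy: wrapping_abs(i32::MIN) == i32::MIN, i.e. the original value is returned.
        Ok(value.wrapping_abs())
    }
}
```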

Re: [PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-23 Thread via GitHub
edmondop commented on code in PR #10638: URL: https://github.com/apache/datafusion/pull/10638#discussion_r1612203716 ## datafusion-cli/Dockerfile: ## @@ -15,7 +15,7 @@ # specific language governing permissions and limitations # under the License. -FROM rust:1.73-bullseye as

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya closed pull request #460: feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode URL: https://github.com/apache/datafusion-comet/pull/460

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#issuecomment-2127852781 ``` Error: /Users/runner/work/datafusion-comet/datafusion-comet/spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:1215: value COMET_COLUMNAR_SHUFFLE_ENABLED i

[I] bug: substring with negative indices produces incorrect results [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new issue, #463: URL: https://github.com/apache/datafusion-comet/issues/463 ### Describe the bug repro is to modify this existing test in `CometExpressionSuite` to also test negative indices: ```scala test("string type and substring") { withParque
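For reference, here is a sketch of SQL `substring(str, pos, len)` with negative positions. It is modeled on one reading of Spark's `UTF8String.substringSQL`:

- `pos` is 1-based.
- A negative `pos` counts from the end.
- `pos = 0` behaves like 1.

The sketch operates on `char`s and illustrates those assumptions; it is not Comet's code.

```rust
/// Hypothetical helper: Spark-style `substring(s, pos, len)` over characters.
pub fn substring_sql(s: &str, pos: i32, len: i32) -> String {
    let num_chars = s.chars().count() as i64;
    let pos = i64::from(pos);
    // Translate the 1-based / from-the-end position into a 0-based start index.
    let start = if pos > 0 {
        pos - 1
    } else if pos < 0 {
        num_chars + pos
    } else {
        0
    };
    // The end is computed before clamping `start`, so a far-negative `pos` shortens the result.
    let end = (start + i64::from(len)).min(num_chars);
    let start = start.max(0);
    if start >= end {
        return String::new();
    }
    s.chars()
        .skip(start as usize)
        .take((end - start) as usize)
        .collect()
}
```

Under these rules, `substring_sql("Spark SQL", -3, 2)` starts at the third character from the end and yields `"SQ"`.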

Re: [I] bug: LIKE with custom escape char produces incorrect results [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on issue #462: URL: https://github.com/apache/datafusion-comet/issues/462#issuecomment-2127834178 Note that we could choose to fall back to Spark if there is a custom escape character.

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#issuecomment-2127820215 Thank you @sunchao

Re: [PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on code in PR #461: URL: https://github.com/apache/datafusion-comet/pull/461#discussion_r1612162690 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -644,14 +644,16 @@ impl Cast { | DataType::Float32 | DataTyp

[I] bug: LIKE with custom escape char produces incorrect results [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new issue, #462: URL: https://github.com/apache/datafusion-comet/issues/462 ### Describe the bug Repro test case for `CometExpressionSuite`: ```scala test("like with custom escape") { val names = Seq("", "a_b", "d_e_f") withTempDir { dir =>
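A simplified LIKE matcher with a caller-supplied escape character, sketched to show the semantics the test exercises. `%` matches any run of characters, `_` matches exactly one, and the escape character makes the next character literal. It is a hypothetical helper, not Comet's or DataFusion's implementation. It treats any escaped character as a literal rather than rejecting invalid escape sequences.

```rust
/// Pattern tokens after escape processing.
#[derive(Clone, Copy)]
enum Tok {
    Any,       // `%`: any run of characters, including none
    One,       // `_`: exactly one character
    Lit(char), // a literal, including an escaped `%`, `_` or escape char
}

/// Hypothetical helper: simplified SQL LIKE with a caller-chosen escape character.
pub fn sql_like(input: &str, pattern: &str, escape: char) -> bool {
    // Tokenize so that an escaped wildcard becomes a literal.
    let mut toks = Vec::new();
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        let tok = if c == escape {
            // Simplification: a trailing escape is treated as a literal escape char.
            Tok::Lit(chars.next().unwrap_or(escape))
        } else if c == '%' {
            Tok::Any
        } else if c == '_' {
            Tok::One
        } else {
            Tok::Lit(c)
        };
        toks.push(tok);
    }
    // dp[j]: the characters consumed so far match the first j tokens.
    let mut dp = vec![false; toks.len() + 1];
    dp[0] = true;
    for j in 1..=toks.len() {
        dp[j] = dp[j - 1] && matches!(toks[j - 1], Tok::Any);
    }
    for ch in input.chars() {
        let mut next = vec![false; toks.len() + 1];
        for j in 1..=toks.len() {
            next[j] = match toks[j - 1] {
                Tok::Any => next[j - 1] || dp[j],
                Tok::One => dp[j - 1],
                Tok::Lit(c) => dp[j - 1] && c == ch,
            };
        }
        dp = next;
    }
    dp[toks.len()]
}
```

With `/` as the escape, `a/_b` matches only the literal string `a_b`, while `a_b` also matches `axb`.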

Re: [PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-23 Thread via GitHub
Omega359 closed pull request #10638: Update cli Dockerfile to a newer ubuntu release, newer rust release URL: https://github.com/apache/datafusion/pull/10638

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1612105585 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -1131,9 +1129,8 @@ class CometAggregateSuite extends CometTestBase with AdaptiveSp

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#issuecomment-2127740760 > LGTM in general. How do we pick shuffle mode when it is `auto`? I don't seem to find the logic in this PR. When it is `auto`, Comet chooses native shuffle if possible as it
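The selection logic described for `auto` can be sketched as follows. The comment is truncated, and only "prefer native shuffle when possible" comes from it. The enum names, the JVM fallback and the final fallback to Spark are all assumptions made for illustration.

```rust
/// Hypothetical stand-ins for the values a COMET_SHUFFLE_MODE-style setting could take.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ShuffleMode {
    Native,
    Jvm,
    Auto,
}

/// Which shuffle implementation ends up being used (hypothetical).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ShuffleImpl {
    Native,
    Jvm,
    SparkFallback,
}

pub fn pick_shuffle(mode: ShuffleMode, native_ok: bool, jvm_ok: bool) -> ShuffleImpl {
    match mode {
        ShuffleMode::Native if native_ok => ShuffleImpl::Native,
        ShuffleMode::Jvm if jvm_ok => ShuffleImpl::Jvm,
        // `auto`: prefer native when possible, otherwise try the JVM shuffle (assumption).
        ShuffleMode::Auto if native_ok => ShuffleImpl::Native,
        ShuffleMode::Auto if jvm_ok => ShuffleImpl::Jvm,
        // Nothing Comet can run for this exchange: leave it to Spark.
        _ => ShuffleImpl::SparkFallback,
    }
}
```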

Re: [PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-23 Thread via GitHub
Blizzara commented on PR #10640: URL: https://github.com/apache/datafusion/pull/10640#issuecomment-2127732837 @alamb @jonahgao - this next one is up for review/merge now :) And thanks for merging the struct one!

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-23 Thread via GitHub
vidyasankarv commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2127726660 @andygrove @parthchandra @kazuyukitanimura thank you for reviews and support in helping me through my first open source contribution. It's been a great learning experience. Sti

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-23 Thread via GitHub
comphead commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1612087281 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_coun

Re: [PR] fix Incorrect statistics read for i8 i16 columns in parquet [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10629: URL: https://github.com/apache/datafusion/pull/10629#issuecomment-2127712749 I took the liberty of merging up from main to resolve some conflicts

Re: [PR] Add support for Substrait Struct literals and type [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10622: URL: https://github.com/apache/datafusion/pull/10622#issuecomment-2127703710 Thanks again @Blizzara and @jonahgao

Re: [PR] Add support for Substrait Struct literals and type [datafusion]

2024-05-23 Thread via GitHub
alamb merged PR #10622: URL: https://github.com/apache/datafusion/pull/10622

Re: [PR] Add support for Substrait Struct literals and type [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10622: URL: https://github.com/apache/datafusion/pull/10622#issuecomment-2127703510 > Not sure what's up with the checks here - cargo check runs fine for me locally and I don't see how the errors would be related to my changes? I think this was a logical conflic

Re: [I] Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove closed issue #327: Implement Spark-compatible CAST from String to Date URL: https://github.com/apache/datafusion-comet/issues/327

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove merged PR #383: URL: https://github.com/apache/datafusion-comet/pull/383

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-23 Thread via GitHub
alamb commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2127696472 Thank you @cisaacson

Re: [I] Dynamic schema for custom TableProvider [datafusion]

2024-05-23 Thread via GitHub
alamb commented on issue #10559: URL: https://github.com/apache/datafusion/issues/10559#issuecomment-2127695450 > @alamb - Thanks for responding. This sounds interesting. Could you elaborate on how this can be achieved? > What information gets passed down to SchemaProvider that I can

Re: [PR] Minor: Generate the supported Spark builtin expression list into MD file [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on code in PR #455: URL: https://github.com/apache/datafusion-comet/pull/455#discussion_r1612070732 ## docs/spark_expressions_support.md: ## @@ -0,0 +1,477 @@ + + +# Supported Spark Expressions + +### agg_funcs + - [ ] any + - [ ] any_value + - [ ] approx_cou

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-23 Thread via GitHub
alamb commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1612063262 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -38,42 +38,100 @@ use crate::physical_optimizer::pruning::{PruningPredicate, PruningSta

[PR] Minor: add runtime asserts to `RowGroup` [datafusion]

2024-05-23 Thread via GitHub
alamb opened a new pull request, #10641: URL: https://github.com/apache/datafusion/pull/10641 ## Which issue does this PR close? Follow on to https://github.com/apache/datafusion/pull/10607/ ## Rationale for this change @advancedxy noted https://github.com/apache/datafus

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-23 Thread via GitHub
alamb commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1612054787 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -38,42 +38,100 @@ use crate::physical_optimizer::pruning::{PruningPredicate, PruningSta

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-23 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1612010377 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,6 +1407,56 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-23 Thread via GitHub
Blizzara commented on PR #10531: URL: https://github.com/apache/datafusion/pull/10531#issuecomment-2127582955 @jonahgao Thanks for the review - I pushed a new version but it builds on top of https://github.com/apache/datafusion/pull/10622 and https://github.com/apache/datafusion/pull/10640

[PR] More properly handle nullability of types/literals in Substrait [datafusion]

2024-05-23 Thread via GitHub
Blizzara opened a new pull request, #10640: URL: https://github.com/apache/datafusion/pull/10640 ## Which issue does this PR close? Extracted from #10531 Builds on top of #10622 so I'll rebase this once it's merged. ## Rationale for this change More clo

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
sunchao commented on code in PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1611986030 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -131,14 +132,18 @@ object CometConf { .booleanConf .createWithDefault(false) -

[PR] fix: Only delegate to DataFusion cast when we know that it is compatible with Spark [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove opened a new pull request, #461: URL: https://github.com/apache/datafusion-comet/pull/461 ## Which issue does this PR close? N/A ## Rationale for this change We have a catchall block in `cast.rs` that delegates to DataFusion for any cast that we
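Here is a sketch of the allow-list approach the PR title describes: the catch-all is replaced by explicit `(from, to)` pairs. The `Ty` enum and the integer-widening entries are hypothetical placeholders, not the list from the PR. Any pair not listed is reported as unsupported, so the caller can fall back to Spark.

```rust
/// A tiny hypothetical subset of Arrow-like types, for illustration only.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Ty {
    Int8,
    Int16,
    Int32,
    Int64,
    Float64,
    Utf8,
}

/// Which path a cast takes.
#[derive(Debug, PartialEq)]
pub enum CastPlan {
    DelegateToDataFusion,
    Unsupported,
}

/// Explicit allow-list instead of a catch-all.
pub fn plan_cast(from: Ty, to: Ty) -> CastPlan {
    use Ty::*;
    match (from, to) {
        // Placeholder entries: lossless integer widening, assumed to match Spark.
        (Int8, Int16 | Int32 | Int64) | (Int16, Int32 | Int64) | (Int32, Int64) => {
            CastPlan::DelegateToDataFusion
        }
        // Everything else is not known to be compatible, so do not delegate it.
        _ => CastPlan::Unsupported,
    }
}
```

Adding a new pair then becomes a deliberate, testable change rather than something a catch-all silently picks up.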

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-23 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1611984584 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,7 +1404,84 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-23 Thread via GitHub
vidyasankarv commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2127552719 @andygrove thank you very much for looking into this. tested fuzz test with your suggestions and is working now. pushed changes in the latest commit https://github.com/apache/

[I] Implement protobuf serialization for LogicalPlan::Unnest [datafusion]

2024-05-23 Thread via GitHub
jamesmcm opened a new issue, #10639: URL: https://github.com/apache/datafusion/issues/10639 ### Is your feature request related to a problem or challenge? This job in Ballista fails: ```rust let avro_file = "gs://..."; let metadata_df = ctx .read_avr

Re: [I] Examples of using `TreeNode` APIs to walk and manipulate LogicalPlans [datafusion]

2024-05-23 Thread via GitHub
cisaacson commented on issue #10628: URL: https://github.com/apache/datafusion/issues/10628#issuecomment-2127532983 @alamb This is a great idea. As I learn more perhaps I can help.

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-23 Thread via GitHub
alamb commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2127444050 I hope to review this PR later today or tomorrow

Re: [PR] Minor: Move median test [datafusion]

2024-05-23 Thread via GitHub
alamb merged PR #10611: URL: https://github.com/apache/datafusion/pull/10611

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2127417428 @vidyasankarv I figured out what the issue is. I don't fully understand why, but when the fuzz test creates the DataFrame, the cast operation that gets performed is from a

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-23 Thread via GitHub
huaxingao commented on PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#issuecomment-2127416716 Thanks @viirya @andygrove @kazuyukitanimura

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on PR #460: URL: https://github.com/apache/datafusion-comet/pull/460#issuecomment-2127412513 cc @sunchao

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-23 Thread via GitHub
viirya commented on PR #456: URL: https://github.com/apache/datafusion-comet/pull/456#issuecomment-2127410051 Merged. Thanks @huaxingao @andygrove @kazuyukitanimura

Re: [PR] feat: correlation support [datafusion-comet]

2024-05-23 Thread via GitHub
viirya merged PR #456: URL: https://github.com/apache/datafusion-comet/pull/456

Re: [PR] Add SessionContext::register_object_store [datafusion]

2024-05-23 Thread via GitHub
comphead merged PR #10621: URL: https://github.com/apache/datafusion/pull/10621

Re: [PR] Add support for Substrait Struct literals and type [datafusion]

2024-05-23 Thread via GitHub
Blizzara commented on PR #10622: URL: https://github.com/apache/datafusion/pull/10622#issuecomment-2127396639 Thanks @jonahgao , that seems to work indeed. This is ready to merge by me :)

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-23 Thread via GitHub
andygrove commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2127348412 @vidyasankarv I am also very confused .. values that fail in the fuzz test work in the other test 🤔 I am debugging and will let you know when I get to the bottom of this m

Re: [I] Render tables using html in notebooks. [datafusion-python]

2024-05-23 Thread via GitHub
Michael-J-Ward commented on issue #713: URL: https://github.com/apache/datafusion-python/issues/713#issuecomment-2127332326 Rounding out options. I recently came across this python library dedicated to creating nicely formatted html tables [great-tables](https://github.com/posit-dev

[PR] Update cli Dockerfile to a newer ubuntu release, newer rust release [datafusion]

2024-05-23 Thread via GitHub
Omega359 opened a new pull request, #10638: URL: https://github.com/apache/datafusion/pull/10638 ## Which issue does this PR close? Closes #10472 ## Rationale for this change The current Dockerfile is based on a pretty old ubuntu release which doesn't build when

Re: [PR] Refactor parquet row group pruning into a struct (use new statistics API, part 1) [datafusion]

2024-05-23 Thread via GitHub
advancedxy commented on code in PR #10607: URL: https://github.com/apache/datafusion/pull/10607#discussion_r1611810009 ## datafusion/core/src/datasource/physical_plan/parquet/row_groups.rs: ## @@ -38,42 +38,100 @@ use crate::physical_optimizer::pruning::{PruningPredicate, Pruni

Re: [PR] Add support for Substrait Struct literals and type [datafusion]

2024-05-23 Thread via GitHub
jonahgao commented on PR #10622: URL: https://github.com/apache/datafusion/pull/10622#issuecomment-2127227068 > Not sure what's up with the checks here - cargo check runs fine for me locally and I don't see how the errors would be related to my changes? CI is attempting to merge the c

Re: [PR] Add support for Substrait Struct literals and type [datafusion]

2024-05-23 Thread via GitHub
Blizzara commented on PR #10622: URL: https://github.com/apache/datafusion/pull/10622#issuecomment-2127110699 Not sure what's up with the checks here - cargo check runs fine for me locally and I don't see how the errors would be related to my changes?

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-23 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1611638831 ## datafusion-examples/examples/udaf_expr.rs: ## @@ -0,0 +1,45 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Introduce expr builder for aggregate function [datafusion]

2024-05-23 Thread via GitHub
jayzhan211 commented on code in PR #10560: URL: https://github.com/apache/datafusion/pull/10560#discussion_r1611644593 ## datafusion-examples/examples/udaf_expr.rs: ## @@ -0,0 +1,45 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license
