Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-15 Thread via GitHub
berkaysynnada commented on PR #10504: URL: https://github.com/apache/datafusion/pull/10504#issuecomment-2111732264 > I haven't had a chance to review this PR yet @berkaysynnada -- I wonder if you have seen the API in #10117 from @tinfoil-knight Yes, I have. It is becoming a nicer and

Re: [I] Use `min_value` and `max_value` on statistics to avoid `ExecutionPlan.execute` [datafusion]

2024-05-15 Thread via GitHub
samuelcolvin commented on issue #10400: URL: https://github.com/apache/datafusion/issues/10400#issuecomment-2111742794 Actually I'm on a flight today, so might have some time to work on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Implement conversion from ColumnStatistics to NullableInterval [datafusion]

2024-05-15 Thread via GitHub
berkaysynnada commented on code in PR #10510: URL: https://github.com/apache/datafusion/pull/10510#discussion_r1601094751 ## datafusion/expr/src/interval_arithmetic.rs: ## @@ -1469,6 +1472,8 @@ pub enum NullableInterval { MaybeNull { values: Interval }, /// The value i

Re: [PR] Implement conversion from ColumnStatistics to NullableInterval [datafusion]

2024-05-15 Thread via GitHub
dmitrybugakov commented on code in PR #10510: URL: https://github.com/apache/datafusion/pull/10510#discussion_r1601133136 ## datafusion/expr/src/interval_arithmetic.rs: ## @@ -1469,6 +1472,8 @@ pub enum NullableInterval { MaybeNull { values: Interval }, /// The value i

Re: [PR] Implement conversion from ColumnStatistics to NullableInterval [datafusion]

2024-05-15 Thread via GitHub
dmitrybugakov commented on code in PR #10510: URL: https://github.com/apache/datafusion/pull/10510#discussion_r1601138710 ## datafusion/expr/src/interval_arithmetic.rs: ## @@ -1469,6 +1472,8 @@ pub enum NullableInterval { MaybeNull { values: Interval }, /// The value i

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-15 Thread via GitHub
vidyasankarv commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1601143103 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -563,9 +563,33 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHel

Re: [PR] Implement conversion from ColumnStatistics to NullableInterval [datafusion]

2024-05-15 Thread via GitHub
berkaysynnada commented on code in PR #10510: URL: https://github.com/apache/datafusion/pull/10510#discussion_r1601146396 ## datafusion/expr/src/interval_arithmetic.rs: ## @@ -1469,6 +1472,8 @@ pub enum NullableInterval { MaybeNull { values: Interval }, /// The value i

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-15 Thread via GitHub
vidyasankarv commented on code in PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#discussion_r1600987926 ## core/src/execution/datafusion/expressions/cast.rs: ## @@ -1444,13 +1483,136 @@ fn parse_str_to_time_only_timestamp(value: &str) -> CometResult> { O

Re: [PR] Implement conversion from ColumnStatistics to NullableInterval [datafusion]

2024-05-15 Thread via GitHub
dmitrybugakov commented on code in PR #10510: URL: https://github.com/apache/datafusion/pull/10510#discussion_r1601162206 ## datafusion/expr/src/interval_arithmetic.rs: ## @@ -1469,6 +1472,8 @@ pub enum NullableInterval { MaybeNull { values: Interval }, /// The value i

Re: [PR] Implement conversion from ColumnStatistics to NullableInterval [datafusion]

2024-05-15 Thread via GitHub
berkaysynnada commented on code in PR #10510: URL: https://github.com/apache/datafusion/pull/10510#discussion_r1601184358 ## datafusion/expr/src/interval_arithmetic.rs: ## @@ -1469,6 +1472,8 @@ pub enum NullableInterval { MaybeNull { values: Interval }, /// The value i

[PR] Update substrait requirement from 0.32.0 to 0.33.3 [datafusion]

2024-05-15 Thread via GitHub
dependabot[bot] opened a new pull request, #10516: URL: https://github.com/apache/datafusion/pull/10516 Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version. Release notes Sourced from https://github.com/substrait-io/su

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1601235381 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-15 Thread via GitHub
crepererum commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2112020903 FWIW I've also seen the high cost of expression string formatting (using `Display`/`to_string()`) in a good number of profiles. I think there's nothing wrong about havi

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1601299434 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1601303058 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 commented on code in PR #10386: URL: https://github.com/apache/datafusion/pull/10386#discussion_r1601306168 ## datafusion/optimizer/src/rewrite_cycle.rs: ## @@ -0,0 +1,262 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 commented on PR #10386: URL: https://github.com/apache/datafusion/pull/10386#issuecomment-2112038666 I think the logic is correct, although it takes me some time to understand the difference between the first pass and the other passes, and I do not have a better design for this.

Re: [I] Port aggregate test to sqllogictest [datafusion]

2024-05-15 Thread via GitHub
xinlifoobar commented on issue #10384: URL: https://github.com/apache/datafusion/issues/10384#issuecomment-2112041649 Hello @jayzhan211 , I am trying to ramp up the repo via porting the min_max tests to sqllogictests. During the migrations, however, I found there are inconsistent beha

[I] Release DataFusion `39.0.0` [datafusion]

2024-05-15 Thread via GitHub
alamb opened a new issue, #10517: URL: https://github.com/apache/datafusion/issues/10517 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [I] Release DataFusion `38.0.0` [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #10255: URL: https://github.com/apache/datafusion/issues/10255#issuecomment-2112148435 Filed https://github.com/apache/datafusion/issues/10517 for version 39

Re: [I] DataFusion `38.0.0` Release [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #10217: URL: https://github.com/apache/datafusion/issues/10217#issuecomment-2112148915 Filed https://github.com/apache/datafusion/issues/10517 for next one

Re: [I] [Regression] Query using ARRAY_AGG(DISTINCT) causes panic [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #10486: URL: https://github.com/apache/datafusion/issues/10486#issuecomment-2112150826 Added to https://github.com/apache/datafusion/issues/10517

[I] `ScalarVariable` Expr --> String Support [datafusion]

2024-05-15 Thread via GitHub
alamb opened a new issue, #10518: URL: https://github.com/apache/datafusion/issues/10518 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/9726 to complete the Expr Converting `Expr` back to `SQL` is valuable for s

[I] `IsNull` / `IsNotNull` Expr --> String Support [datafusion]

2024-05-15 Thread via GitHub
alamb opened a new issue, #10519: URL: https://github.com/apache/datafusion/issues/10519 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Remove `file_type()` from `FileFormat` [datafusion]

2024-05-15 Thread via GitHub
Jefffrey commented on PR #10499: URL: https://github.com/apache/datafusion/pull/10499#issuecomment-2112169994 > As you go through implementing ORC support, if you hit anything else that would make it easier to add new format support to the core and/or listing table that would be great.

[I] `OuterColumnReference` Expr --> String Support [datafusion]

2024-05-15 Thread via GitHub
alamb opened a new issue, #10523: URL: https://github.com/apache/datafusion/issues/10523 (no comment)

Re: [I] Complete support for `Expr --> String ` [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #9726: URL: https://github.com/apache/datafusion/issues/9726#issuecomment-2112176103 I filed tickets for the remaining issues These ones are likely straightforward. - [ ] https://github.com/apache/datafusion/issues/10518 - [ ] https://github.com/apac

Re: [I] Port aggregate test to sqllogictest [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 commented on issue #10384: URL: https://github.com/apache/datafusion/issues/10384#issuecomment-2112295991 I think we don't need to keep the test. The reason that you didn't get the error is that we have coercion already in building values. It is the expected result.

Re: [PR] Implement conversion from ColumnStatistics to NullableInterval [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 commented on code in PR #10510: URL: https://github.com/apache/datafusion/pull/10510#discussion_r1601481220 ## datafusion/expr/src/interval_arithmetic.rs: ## @@ -1469,6 +1472,8 @@ pub enum NullableInterval { MaybeNull { values: Interval }, /// The value is d

Re: [PR] chore: update to maturin's recommended project layout for rust/python… [datafusion-python]

2024-05-15 Thread via GitHub
davidhewitt commented on PR #695: URL: https://github.com/apache/datafusion-python/pull/695#issuecomment-2112332332 Looks like you figured this out already, but yes I agree no need to change `lib.name` here 👍

[PR] UDAF: Extend more args to `state_fields` and `groups_accumulator_supported` and introduce `ReversedUDAF` [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 opened a new pull request, #10525: URL: https://github.com/apache/datafusion/pull/10525 ## Which issue does this PR close? Closes #. ## Rationale for this change This PR is pulled from #10484 and it has a similar rationale in #10391 but the cha

[PR] fix: Handle compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-15 Thread via GitHub
advancedxy opened a new pull request, #433: URL: https://github.com/apache/datafusion-comet/pull/433 ## Which issue does this PR close? Closes #427 ## Rationale for this change Bug fixes. When submitting #344, we found there's a bug in spark_hash, which doesn't handle dictionar

Re: [PR] feat: Add xxhash64 function support [datafusion-comet]

2024-05-15 Thread via GitHub
advancedxy commented on code in PR #424: URL: https://github.com/apache/datafusion-comet/pull/424#discussion_r1601584368 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -193,27 +241,67 @@ macro_rules! hash_array_decimal { fn create_hashes_dictionary( array: &ArrayRef

Re: [PR] feat: Add xxhash64 function support [datafusion-comet]

2024-05-15 Thread via GitHub
advancedxy commented on PR #424: URL: https://github.com/apache/datafusion-comet/pull/424#issuecomment-2112436642 @andygrove @viirya I have created #433 and marked this as a draft. We should merge that first and then come back to this PR. PTAL when you have time.

[I] Bump maturin version to satisfy conda-forge constraints? [datafusion-python]

2024-05-15 Thread via GitHub
charlesbluca opened a new issue, #701: URL: https://github.com/apache/datafusion-python/issues/701 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Availability of this package on conda-forge for each Python minor version is depend

[PR] Support merge batch for distinct array aggregate function [datafusion]

2024-05-15 Thread via GitHub
jayzhan211 opened a new pull request, #10526: URL: https://github.com/apache/datafusion/pull/10526 ## Which issue does this PR close? Closes #10486 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [I] `IsNull` / `IsNotNull` Expr --> String Support [datafusion]

2024-05-15 Thread via GitHub
goldmedal commented on issue #10519: URL: https://github.com/apache/datafusion/issues/10519#issuecomment-2112578650 Hi @alamb, I'm new to DataFusion. Could I take this issue?

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-15 Thread via GitHub
ozankabak commented on PR #10504: URL: https://github.com/apache/datafusion/pull/10504#issuecomment-2112592630 That PR and this are orthogonal. The work in this PR will simply inherit/benefit from the refactor in the other PR. I will review this one in detail tomorrow.

[PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-15 Thread via GitHub
appletreeisyellow opened a new pull request, #10527: URL: https://github.com/apache/datafusion/pull/10527 ## Which issue does this PR close? Closes #10295 ## Rationale for this change Make the optimizer faster by not copying ## What changes are incl

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-15 Thread via GitHub
berkaysynnada commented on code in PR #10504: URL: https://github.com/apache/datafusion/pull/10504#discussion_r1601707815 ## datafusion/core/src/physical_optimizer/enforce_distribution.rs: ## @@ -3572,7 +3572,11 @@ pub(crate) mod tests { expr: col("c", &schema).unwr

[PR] Example for simple conversion [datafusion]

2024-05-15 Thread via GitHub
edmondop opened a new pull request, #10528: URL: https://github.com/apache/datafusion/pull/10528 ## Which issue does this PR close? Closes #10524 . At the time of opening this PR, the example fails with: ``` thread 'main' panicked at datafusion-examples/examples/plan_t

Re: [PR] Example for simple conversion [datafusion]

2024-05-15 Thread via GitHub
edmondop commented on PR #10528: URL: https://github.com/apache/datafusion/pull/10528#issuecomment-2112657414 Using the `expr_to_sql` api, we get the following error: ``` assertion `left == right` failed left: "((\"a\" < 5) OR (\"a\" = 8))" right: "a < 5 OR a = 8" ```

Re: [I] Add an example of how to use the SQL parser/unparser API [datafusion]

2024-05-15 Thread via GitHub
edmondop commented on issue #10524: URL: https://github.com/apache/datafusion/issues/10524#issuecomment-211245 @alamb I wasn't able to get a simple sql example to pass, if I convert the `Expr` to String directly, the literal is wrapped in a type like so. ``` Running `/hom

Re: [PR] feat: expose `named_struct` in python [datafusion-python]

2024-05-15 Thread via GitHub
andygrove merged PR #700: URL: https://github.com/apache/datafusion-python/pull/700

Re: [I] Expose named_struct in python [datafusion-python]

2024-05-15 Thread via GitHub
andygrove closed issue #692: Expose named_struct in python URL: https://github.com/apache/datafusion-python/issues/692

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-15 Thread via GitHub
codecov-commenter commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2112718643 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/383?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] Minor: Extract parent/child limit calculation into a function, improve docs [datafusion]

2024-05-15 Thread via GitHub
comphead commented on code in PR #10501: URL: https://github.com/apache/datafusion/pull/10501#discussion_r1601777537 ## datafusion/optimizer/src/push_down_limit.rs: ## @@ -217,6 +183,78 @@ impl OptimizerRule for PushDownLimit { } } +/// Combines two limits into a single

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r1601785741 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -617,50 +663,17 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde wi

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r1601786519 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -950,7 +950,8 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#issuecomment-2112795131 > Hmm, I think this is for publishing cargo package. But I think Comet should be released as a maven package at maven repository? So users can simply add Comet as a dependency in

[PR] Implement unparse `IS_NULL` to String and enhance the tests [datafusion]

2024-05-15 Thread via GitHub
goldmedal opened a new pull request, #10529: URL: https://github.com/apache/datafusion/pull/10529 ## Which issue does this PR close? Closes #10519 ## Rationale for this change ## What changes are included in this PR? I found that `is_not_null` has been supported. I on

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-15 Thread via GitHub
comphead commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1601861040 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -991,6 +992,9 @@ impl SMJStream { Ordering::Equal => { if matches!(s

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
vaibhawvipul commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r1601865299 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -950,7 +950,8 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelp

Re: [PR] feat: expose `named_struct` in python [datafusion-python]

2024-05-15 Thread via GitHub
timsaucer commented on PR #700: URL: https://github.com/apache/datafusion-python/pull/700#issuecomment-2112904138 That was fast! Thank you!

[PR] Website fixes [datafusion-python]

2024-05-15 Thread via GitHub
Michael-J-Ward opened a new pull request, #702: URL: https://github.com/apache/datafusion-python/pull/702 # Which issue does this PR close? Closes #699. Closes #687. # Rationale for this change Datafusion is now an Apache Top Level Project and has new logo branding.

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-15 Thread via GitHub
tshauck commented on PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#issuecomment-2112964714 Notes from comet community meeting: * Possible to get link for presentation slides and video? * Andy to publish video * Liang Chi has slides here: https://docs.goog

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-15 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1601943239 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -991,6 +992,9 @@ impl SMJStream { Ordering::Equal => { if matches!(sel

Re: [PR] Website fixes [datafusion-python]

2024-05-15 Thread via GitHub
andygrove merged PR #702: URL: https://github.com/apache/datafusion-python/pull/702

Re: [I] links to `examples` on PyPI page are broken [datafusion-python]

2024-05-15 Thread via GitHub
andygrove closed issue #699: links to `examples` on PyPI page are broken URL: https://github.com/apache/datafusion-python/issues/699

Re: [I] Update the DataFusion in Python website [datafusion-python]

2024-05-15 Thread via GitHub
andygrove closed issue #687: Update the DataFusion in Python website URL: https://github.com/apache/datafusion-python/issues/687

[PR] Support Substrait's VirtualTables [datafusion]

2024-05-15 Thread via GitHub
Blizzara opened a new pull request, #10531: URL: https://github.com/apache/datafusion/pull/10531 ## Which issue does this PR close? Closes #10530 ## Rationale for this change ## What changes are included in this PR? Adds support for Substrait's

Re: [PR] feat: RewriteCycle API for short-circuiting optimizer loops [datafusion]

2024-05-15 Thread via GitHub
erratic-pattern commented on PR #10386: URL: https://github.com/apache/datafusion/pull/10386#issuecomment-2113033273 > I think the logic is correct, although it takes me some time to understand the difference between the first pass and other pass, but I did not have a better design about thi

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-15 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1601978505 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1137,7 +1220,8 @@ fn from_substrait_type(dt: &substrait::proto::Type) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-15 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1601979954 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -1277,6 +1407,56 @@ pub(crate) fn from_substrait_literal(lit: &Literal) -> Result {

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-15 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1601980361 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -165,6 +168,53 @@ pub fn to_substrait_rel( }))), })) } +

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-15 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1601980707 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -165,6 +168,53 @@ pub fn to_substrait_rel( }))), })) } +

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-15 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1601981711 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2113042972 There is one test failure with JDK 8 / Spark 3.2: ``` - cast StringType to DateType *** FAILED *** (349 milliseconds) "[CAST_INVALID_INPUT] The value '0' of the type

Re: [I] Substrait integration doesn't recognize typed functions [datafusion]

2024-05-15 Thread via GitHub
Blizzara commented on issue #10412: URL: https://github.com/apache/datafusion/issues/10412#issuecomment-2113045304 Cool, I'll take a look in the next days!

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r1601983256 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -948,10 +948,8 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelper

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
vaibhawvipul commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r1601993517 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -948,10 +948,8 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHel

Re: [I] TPCDS query 91 throws java.io.IOException: Could not read object from config with key parquet.private.read.filter.predicate exception [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on issue #182: URL: https://github.com/apache/datafusion-comet/issues/182#issuecomment-2113071500 I just ran into the same issue with TPC-H q2 on my MBP. I will debug and add some notes here once I know more.

Re: [PR] Remove `file_type()` from `FileFormat` [datafusion]

2024-05-15 Thread via GitHub
alamb merged PR #10499: URL: https://github.com/apache/datafusion/pull/10499

Re: [I] Make/Remove `FileType` enum and replace with a `trait` [datafusion]

2024-05-15 Thread via GitHub
alamb closed issue #8657: Make/Remove `FileType` enum and replace with a `trait` URL: https://github.com/apache/datafusion/issues/8657

Re: [PR] Remove `file_type()` from `FileFormat` [datafusion]

2024-05-15 Thread via GitHub
alamb commented on PR #10499: URL: https://github.com/apache/datafusion/pull/10499#issuecomment-2113074511 Thanks again @Jefffrey

Re: [I] TPCDS query 91 throws java.io.IOException: Could not read object from config with key parquet.private.read.filter.predicate exception [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on issue #182: URL: https://github.com/apache/datafusion-comet/issues/182#issuecomment-2113077753 My repro: Using latest commit from main (`1a04805be5e0f3a634521a821b24c0e0efb43d31`) I ran `make release`. Started Spark shell with: ```shell $SPARK_

Re: [PR] Minor: add a test for `current_time` (no args) [datafusion]

2024-05-15 Thread via GitHub
alamb merged PR #10509: URL: https://github.com/apache/datafusion/pull/10509

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #10482: URL: https://github.com/apache/datafusion/issues/10482#issuecomment-2113090667 Review Queue: - [ ] https://github.com/apache/datafusion/pull/10386 - [ ]

Re: [PR] Using Union's input schema when recompute schema [datafusion]

2024-05-15 Thread via GitHub
alamb commented on code in PR #10494: URL: https://github.com/apache/datafusion/pull/10494#discussion_r1602014671 ## datafusion/optimizer/src/propagate_empty_relation.rs: ## @@ -154,14 +156,14 @@ impl OptimizerRule for PropagateEmptyRelation { Ok(Transfo

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
kazuyukitanimura commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r1602009700 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -617,50 +663,17 @@ object QueryPlanSerde extends Logging with ShimQueryPlanS

Re: [I] TPCDS query 91 throws java.io.IOException: Could not read object from config with key parquet.private.read.filter.predicate exception [datafusion-comet]

2024-05-15 Thread via GitHub
andygrove commented on issue #182: URL: https://github.com/apache/datafusion-comet/issues/182#issuecomment-2113096704 I do the same on Linux. I am using JDK 11 in both cases.

Re: [I] Add an example of how to use the SQL parser/unparser API [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #10524: URL: https://github.com/apache/datafusion/issues/10524#issuecomment-2113100593 > is it the expectation that column names are always quoted? I think so. @phillipleblanc added a way to get the expression back without the quotes -- https://github.com/apa

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
vaibhawvipul commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r1602021543 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -617,50 +663,17 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [PR] fix: parsing timestamp with date format [datafusion]

2024-05-15 Thread via GitHub
alamb merged PR #10476: URL: https://github.com/apache/datafusion/pull/10476

Re: [I] to_date with a date string and format fails with error parsing timestamp [datafusion]

2024-05-15 Thread via GitHub
alamb closed issue #10471: to_date with a date string and format fails with error parsing timestamp URL: https://github.com/apache/datafusion/issues/10471

Re: [I] Make `CommonSubexprEliminate` faster by avoiding the use of strings [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #10426: URL: https://github.com/apache/datafusion/issues/10426#issuecomment-2113109684 > FWIW I've also seen the high cost of expression string formatting (using `Display`/`to_string()`) in a good number of profiles. > I think there's nothing wrong about havin

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-15 Thread via GitHub
kazuyukitanimura commented on code in PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#discussion_r1602020418 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -227,27 +248,11 @@ pub fn create_hashes<'a>( arrays: &[ArrayRef], hashes_buffer: &'a mut

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-15 Thread via GitHub
alamb commented on PR #10504: URL: https://github.com/apache/datafusion/pull/10504#issuecomment-2113111066 > @berkaysynnada kindly reminded me that the type undergoing the refactor disappears in this extended formulation. In that case this PR may supersede the refactor one. Indeed, t

Re: [PR] PhysicalExpr Orderings with Range Information [datafusion]

2024-05-15 Thread via GitHub
alamb commented on PR #10504: URL: https://github.com/apache/datafusion/pull/10504#issuecomment-2113113341 > Having range information would help ScalarFunctionExpr's order calculations since many of them have monotonicity pattern on some defined intervals. I have given an example of it for

Re: [PR] feat: Add support for TryCast expression in Spark 3.2 and 3.3 [datafusion-comet]

2024-05-15 Thread via GitHub
vaibhawvipul commented on code in PR #416: URL: https://github.com/apache/datafusion-comet/pull/416#discussion_r160202 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -617,50 +663,17 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [I] `IsNull` / `IsNotNull` Expr --> String Support [datafusion]

2024-05-15 Thread via GitHub
alamb commented on issue #10519: URL: https://github.com/apache/datafusion/issues/10519#issuecomment-2113115526 > Hi @alamb, I'm new to DataFusion. Could I take this issue? Absolutely! Thank you @goldmedal -- in general feel free to take any issue, as described in https://datafusion

Re: [I] Support complex datatypes [datafusion-comet]

2024-05-15 Thread via GitHub
viirya commented on issue #434: URL: https://github.com/apache/datafusion-comet/issues/434#issuecomment-2113115795 Actually Comet columnar shuffle already supports some complex data types. You can find some tests using complex types in Comet shuffle test suites. But Comet scan operat

Re: [PR] Implement unparse `IS_NULL` to String and enhance the tests [datafusion]

2024-05-15 Thread via GitHub
alamb commented on code in PR #10529: URL: https://github.com/apache/datafusion/pull/10529#discussion_r1602033644 ## datafusion/sql/src/unparser/expr.rs: ## @@ -391,7 +391,9 @@ impl Unparser<'_> { Expr::ScalarVariable(_, _) => { not_impl_err!("Unsup
