Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-17 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2813394021 > Great collaboration @adriangb, thank you. I hope more will come. Me too, great stuff! -- This is an automated message from the Apache Git Service. To respond to the messag

Re: [PR] Final release note touchups [datafusion]

2025-04-17 Thread via GitHub
alamb commented on PR #15741: URL: https://github.com/apache/datafusion/pull/15741#issuecomment-2812931042 Thanks @xudong963

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-17 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2049125883 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone()

Re: [PR] Perf: Support automatically concat_batches for sort which will improve performance [datafusion]

2025-04-17 Thread via GitHub
zhuqi-lucas commented on code in PR #15380: URL: https://github.com/apache/datafusion/pull/15380#discussion_r2049170187 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -662,53 +665,152 @@ impl ExternalSorter { let elapsed_compute = metrics.elapsed_compute().clone()

[PR] fix: parquet coerce_int96 schema [datafusion]

2025-04-17 Thread via GitHub
chenkovsky opened a new pull request, #15750: URL: https://github.com/apache/datafusion/pull/15750 ## Which issue does this PR close? - Closes #15721. ## Rationale for this change coerce_int96 is ignored when inferring the schema. ## What changes are included in this P

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-17 Thread via GitHub
berkaysynnada merged PR #15566: URL: https://github.com/apache/datafusion/pull/15566

[PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-17 Thread via GitHub
andygrove opened a new pull request, #1655: URL: https://github.com/apache/datafusion-comet/pull/1655 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] Add DataFusion 47.0.0 Upgrade Guide [datafusion]

2025-04-17 Thread via GitHub
comphead commented on code in PR #15749: URL: https://github.com/apache/datafusion/pull/15749#discussion_r2049135077 ## docs/source/library-user-guide/upgrading.md: ## @@ -19,6 +19,112 @@ # Upgrade Guides +## DataFusion `47.0.0` + +This section calls out some of the major c

Re: [PR] fix: parquet coerce_int96 schema [datafusion]

2025-04-17 Thread via GitHub
comphead commented on PR #15750: URL: https://github.com/apache/datafusion/pull/15750#issuecomment-2813202353 @mbutrovich cc

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-17 Thread via GitHub
adriangb commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2049207122 ## datafusion/physical-optimizer/src/push_down_filter.rs: ## @@ -0,0 +1,535 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] enable `supports_filter_during_aggregation` for Generic [datafusion-sqlparser-rs]

2025-04-17 Thread via GitHub
goldmedal commented on PR #1815: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1815#issuecomment-2812858094 @alamb, What do you think about this? I think this change is related to the default behavior of DataFusion.

[PR] feat: make task distribution policies pluggable [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm opened a new pull request, #1243: URL: https://github.com/apache/datafusion-ballista/pull/1243 # Which issue does this PR close? Closes #1238. # Rationale for this change - it is not easy to introduce new task distribution policies without changing ball

Re: [I] Spark SQL test failures in native_datafusion scan [datafusion-comet]

2025-04-17 Thread via GitHub
mbutrovich commented on issue #1545: URL: https://github.com/apache/datafusion-comet/issues/1545#issuecomment-2813380228 ``` catalyst: Passed: Total 7224, Failed 0, Errors 0, Passed 7224, Ignored 5, Canceled 1 core 1: Failed: Total 9138, Failed 46, Errors 0, Passed 9092, Ignored 292,

Re: [PR] Apply pre-selection and computation skipping to short-circuit optimization [datafusion]

2025-04-17 Thread via GitHub
alamb merged PR #15694: URL: https://github.com/apache/datafusion/pull/15694

Re: [I] [Epic] A collection of dynamic filtering related items [datafusion]

2025-04-17 Thread via GitHub
acking-you commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2813577217 > Thank you [@acking-you](https://github.com/acking-you) for the idea. Is it similar to parquet filter pushdown? We are already trying to make it default. [#3463](https://g

Re: [I] Add more short-circuit optimization scenarios for `OR` and `AND` [datafusion]

2025-04-17 Thread via GitHub
alamb closed issue #15636: Add more short-circuit optimization scenarios for `OR` and `AND` URL: https://github.com/apache/datafusion/issues/15636

Re: [PR] Updated extending operators documentation [datafusion]

2025-04-17 Thread via GitHub
alamb commented on PR #15612: URL: https://github.com/apache/datafusion/pull/15612#issuecomment-2813589469 Another alternative to make the doc tests pass is to comment the code out, using ``` # /* ... */ ``` For example, see - https://github.com/apache/datafusion/pull/
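
As a rough illustration of the trick mentioned in the comment above (a minimal sketch, not code from the PR): rustdoc hides doc-example lines that start with `# ` from the rendered page but still passes them to the compiler with the prefix stripped. Wrapping an example in `# /*` and `# */` therefore shows the example to readers while the doc test sees only a block comment. The function `column_count` and the helper `build_plan_somehow` are hypothetical names invented for this sketch, and the inner fence uses `~~~` only to avoid nesting backtick fences here.

```rust
/// Returns the number of columns in a (hypothetical) operator schema.
///
/// The example below is rendered in the docs, but the doc test compiles
/// it as a block comment because of the hidden `# /*` and `# */` lines.
///
/// ~~~
/// # /*
/// let plan = build_plan_somehow(); // hypothetical helper, never compiled
/// assert_eq!(plan.column_count(), 3);
/// # */
/// ~~~
pub fn column_count(columns: &[&str]) -> usize {
    // Plain helper so the item carrying the doc comment does something.
    columns.len()
}
```

The trade-off is that a commented-out example is no longer checked by `cargo test --doc`, so it can silently go stale.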

Re: [I] Nested correlated subquery error with a depth exceeding 1 [datafusion]

2025-04-17 Thread via GitHub
alamb commented on issue #15558: URL: https://github.com/apache/datafusion/issues/15558#issuecomment-2813832115 I think support for this kind of query will require a more unified approach, such as the one described by @duongcongtoai in - https://github.com/apache/datafusion/issues/14554

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-04-17 Thread via GitHub
alamb commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2813833799 I really like the idea of the incremental approach -- I think it is practically speaking the only one we are likely to be able to pull off. Thank you @duongcongtoai There

Re: [I] [Epic] Add snapshot tests (migrate to `insta` for tests) [datafusion]

2025-04-17 Thread via GitHub
blaginin commented on issue #15178: URL: https://github.com/apache/datafusion/issues/15178#issuecomment-2813832937 for sure, will do! thank you!!

[PR] Minor: simplify code in datafusion-proto [datafusion]

2025-04-17 Thread via GitHub
alamb opened a new pull request, #15752: URL: https://github.com/apache/datafusion/pull/15752 ## Which issue does this PR close? - Follow on to https://github.com/apache/datafusion/pull/15737 ## Rationale for this change I noticed a potential simplification while revi

Re: [PR] Improve documentation for format `OPTIONS` clause [datafusion]

2025-04-17 Thread via GitHub
alamb commented on code in PR #15708: URL: https://github.com/apache/datafusion/pull/15708#discussion_r2049366426 ## docs/source/user-guide/sql/format_options.md: ## @@ -0,0 +1,209 @@ + + +# Format Options + +DataFusion supports customizing how data is read from or written to di

Re: [I] Building project takes a *long* time (esp compilation time for `datafusion` core crate) [datafusion]

2025-04-17 Thread via GitHub
alamb commented on issue #13814: URL: https://github.com/apache/datafusion/issues/13814#issuecomment-2813842960 > Interesting material comes up this week https://www.feldera.com/blog/cutting-down-rust-compile-times-from-30-to-2-minutes-with-one-thousand-crates And the nature of this project

Re: [PR] fix: serialize listing table without partition column [datafusion]

2025-04-17 Thread via GitHub
alamb commented on PR #15737: URL: https://github.com/apache/datafusion/pull/15737#issuecomment-2813800026 Thanks again @chenkovsky

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049483574 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -106,6 +107,44 @@ impl EmitTo { /// [`Accumulator`]: crate::accumulator::Accumulator /// [Aggregating

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-17 Thread via GitHub
comphead commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2049436227 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2849,33 +2825,23 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [I] [Epic] A collection of dynamic filtering related items [datafusion]

2025-04-17 Thread via GitHub
acking-you commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2814555616 > [@acking-you](https://github.com/acking-you) have you seen [#15301](https://github.com/apache/datafusion/pull/15301)? Thank you for your hint. I haven't looked into t

Re: [PR] Improve `simplify_expressions` rule [datafusion]

2025-04-17 Thread via GitHub
alamb commented on code in PR #15735: URL: https://github.com/apache/datafusion/pull/15735#discussion_r2049565548 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -198,6 +198,7 @@ impl ExprSimplifier { /// /// See [Self::simplify] for details

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049483295 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -163,6 +177,50 @@ impl GroupsAccumulatorAdapter { /// invokes f(accumul

Re: [I] Improve performance of `dropDuplicates` [datafusion-comet]

2025-04-17 Thread via GitHub
andygrove closed issue #1275: Improve performance of `dropDuplicates` URL: https://github.com/apache/datafusion-comet/issues/1275

Re: [PR] support OR operator in binary `evaluate_bounds` [datafusion]

2025-04-17 Thread via GitHub
davidhewitt commented on PR #15716: URL: https://github.com/apache/datafusion/pull/15716#issuecomment-2814011078 Thanks, I will probably get around to fixing this up on Tuesday 👍

Re: [I] Ballista example project does not build [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm closed issue #482: Ballista example project does not build URL: https://github.com/apache/datafusion-ballista/issues/482

Re: [I] ShuffleWriterExec::schema mismatch [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm closed issue #483: ShuffleWriterExec::schema mismatch URL: https://github.com/apache/datafusion-ballista/issues/483

Re: [I] Update datafusion <> homebrew instructions [datafusion]

2025-04-17 Thread via GitHub
alamb commented on issue #15751: URL: https://github.com/apache/datafusion/issues/15751#issuecomment-2813743648 This is pretty amazing -- I don't really know who set up the homebrew stuff initially, so I don't know how much value there is in the instructions

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-17 Thread via GitHub
comphead commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2049450280 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2889,6 +2857,48 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] enable `supports_filter_during_aggregation` for Generic dialect [datafusion-sqlparser-rs]

2025-04-17 Thread via GitHub
alamb commented on PR #1815: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1815#issuecomment-2813825103 > oops, I forgot to mention the tests. I found it will be covered by the original case: I was thinking of a test that would protect against regressions in a future ref

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-17 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2812885176 Ok sounds good to me for now. We'll measure in our production system and if there's overhead from planning time we can come back and edit the API.

Re: [I] [Epic] Add snapshot tests (migrate to `insta` for tests) [datafusion]

2025-04-17 Thread via GitHub
alamb commented on issue #15178: URL: https://github.com/apache/datafusion/issues/15178#issuecomment-2813827406 @blaginin any chance you could file some good first issue tickets to cover the places that @xudong963 identified in https://github.com/apache/datafusion/issues/15178#issuecomment

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
alamb commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049469308 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -106,6 +107,44 @@ impl EmitTo { /// [`Accumulator`]: crate::accumulator::Accumulator /// [Aggregating Mi

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
alamb commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r1982212016 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -299,6 +420,12 @@ impl GroupsAccumulatorAdapter { } impl GroupsAccumulator fo

[I] sql planning benchmark fails with internal type coercion error [datafusion]

2025-04-17 Thread via GitHub
alamb opened a new issue, #15753: URL: https://github.com/apache/datafusion/issues/15753 ### Describe the bug I get an error when trying to run the sql planning benchmark: here is the error: thread 'main' panicked at datafusion/core/benches/sql_planner.rs:60:14: c

Re: [PR] Improve `simplify_expressions` rule [datafusion]

2025-04-17 Thread via GitHub
alamb commented on PR #15735: URL: https://github.com/apache/datafusion/pull/15735#issuecomment-2814024296 BTW I tried to run the planning benchmarks to see if this made things better, but sadly I found a bug: - https://github.com/apache/datafusion/issues/15753

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-17 Thread via GitHub
alamb commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2814025946 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.8.0-1016-gcp #18-Ubuntu SMP Fr

Re: [PR] fix: handle missing field correctly in native_iceberg_compat [datafusion-comet]

2025-04-17 Thread via GitHub
codecov-commenter commented on PR #1656: URL: https://github.com/apache/datafusion-comet/pull/1656#issuecomment-2813754271 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1656?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: handle missing field correctly in native_iceberg_compat [datafusion-comet]

2025-04-17 Thread via GitHub
parthchandra commented on PR #1656: URL: https://github.com/apache/datafusion-comet/pull/1656#issuecomment-2813993987 Merged. Thank you @andygrove

Re: [PR] Minor: simplify code in datafusion-proto [datafusion]

2025-04-17 Thread via GitHub
alamb commented on PR #15752: URL: https://github.com/apache/datafusion/pull/15752#issuecomment-2813961646 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.8.0-1016-gcp #18-Ubuntu

Re: [PR] Minor: simplify code in datafusion-proto [datafusion]

2025-04-17 Thread via GitHub
xudong963 merged PR #15752: URL: https://github.com/apache/datafusion/pull/15752

Re: [I] Issue with partitioned `ListingTable` [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm commented on issue #1239: URL: https://github.com/apache/datafusion-ballista/issues/1239#issuecomment-2814065630 Fix for this issue should be available with DataFusion 48. Thanks @chenkovsky for your effort

Re: [PR] Add nulls checks to generated pruning predicates [datafusion]

2025-04-17 Thread via GitHub
github-actions[bot] closed pull request #14297: Add nulls checks to generated pruning predicates URL: https://github.com/apache/datafusion/pull/14297

Re: [PR] Draft: LogicalScalar [datafusion]

2025-04-17 Thread via GitHub
github-actions[bot] commented on PR #14609: URL: https://github.com/apache/datafusion/pull/14609#issuecomment-2814346641 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] adding `RowsReader` and writer [datafusion]

2025-04-17 Thread via GitHub
github-actions[bot] closed pull request #14149: adding `RowsReader` and writer URL: https://github.com/apache/datafusion/pull/14149

Re: [PR] Bump `rand` to `0.9`, `rand_distr` to `0.5.0`, `getrandom` to `0.3.1` [datafusion]

2025-04-17 Thread via GitHub
github-actions[bot] closed pull request #14447: Bump `rand` to `0.9`, `rand_distr` to `0.5.0`, `getrandom` to `0.3.1` URL: https://github.com/apache/datafusion/pull/14447

Re: [I] crate readme has outdated version [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm closed issue #465: crate readme has outdated version URL: https://github.com/apache/datafusion-ballista/issues/465

Re: [I] [DISCUSS] Add open table format support. [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm commented on issue #1241: URL: https://github.com/apache/datafusion-ballista/issues/1241#issuecomment-2814093033 For reference - #456

Re: [I] Upgrade to latest Tokio version(1.21.1) [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm closed issue #281: Upgrade to latest Tokio version(1.21.1) URL: https://github.com/apache/datafusion-ballista/issues/281

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-17 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2813216975 @berkaysynnada if CI passes I think this is ready to merge and we can tweak later as we implement in more places 😄

Re: [I] Release DataFusion `47.0.0` (April 2025) [datafusion]

2025-04-17 Thread via GitHub
alamb commented on issue #15072: URL: https://github.com/apache/datafusion/issues/15072#issuecomment-2812715785 Here is a draft upgrade guide: - https://github.com/apache/datafusion/pull/15749

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049436982 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -163,6 +177,50 @@ impl GroupsAccumulatorAdapter { /// invokes f(accumul

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
alamb commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049489035 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -106,6 +107,44 @@ impl EmitTo { /// [`Accumulator`]: crate::accumulator::Accumulator /// [Aggregating Mi

Re: [PR] fix: serialize listing table without partition column [datafusion]

2025-04-17 Thread via GitHub
alamb merged PR #15737: URL: https://github.com/apache/datafusion/pull/15737

Re: [I] Partitioned `ListingTable` read fails after logical plan ser/de [datafusion]

2025-04-17 Thread via GitHub
alamb closed issue #15718: Partitioned `ListingTable` read fails after logical plan ser/de URL: https://github.com/apache/datafusion/issues/15718

Re: [I] Refactor config [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm commented on issue #479: URL: https://github.com/apache/datafusion-ballista/issues/479#issuecomment-2814085453 Closing this, as Ballista now uses the same configuration as DataFusion. Please reopen if needed

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-17 Thread via GitHub
berkaysynnada commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2811986744 > @berkaysynnada I updated the SLT tests. I'll have another review tomorrow but things I'd like to point out now: > > 1. We should still think about the `retry` parameter

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-17 Thread via GitHub
berkaysynnada commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2811993429 > Yep, I've noticed that too, but how those plans didn't change before? We were trying to pushdown filters over RepartitionExec's and CoalesceBatches Do you agree that we

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-04-17 Thread via GitHub
berkaysynnada commented on PR #14523: URL: https://github.com/apache/datafusion/pull/14523#issuecomment-2812071975 Thank you @ch-sc. I will review it today

Re: [I] [DISCUSS] Add open table format support. [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm commented on issue #1241: URL: https://github.com/apache/datafusion-ballista/issues/1241#issuecomment-2812079618 I believe we can set up an external project where integration effort can happen, we could build binaries there. It would make everyone's life easier. I'd be hap

Re: [I] crate readme has outdated version [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm commented on issue #465: URL: https://github.com/apache/datafusion-ballista/issues/465#issuecomment-2814087305 Looks outdated, closing it

Re: [I] [Epic] A collection of dynamic filtering related items [datafusion]

2025-04-17 Thread via GitHub
adriangb commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2814612031 Well the point of this recent work is to create filters from the order by clause :)

Re: [I] [Epic] A collection of dynamic filtering related items [datafusion]

2025-04-17 Thread via GitHub
acking-you commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2814587680 > DataFusion already does late materialization: it orders filters from least to most expensive, then scans only the columns that the filter needs. Once it applies all filters i

Re: [PR] Add a fast path for `optimize_projection` [datafusion]

2025-04-17 Thread via GitHub
xudong963 commented on PR #15746: URL: https://github.com/apache/datafusion/pull/15746#issuecomment-2814633013 > 🤖: Benchmark completed > > Details > Quote reply It doesn't seem to have any obvious influence.

Re: [I] [Epic] A collection of dynamic filtering related items [datafusion]

2025-04-17 Thread via GitHub
acking-you commented on issue #15512: URL: https://github.com/apache/datafusion/issues/15512#issuecomment-2814537239 I also noticed that recently, a PR was merged in ClickHouse that does exactly what was described above: https://github.com/ClickHouse/ClickHouse/pull/55518, with the correspo

Re: [PR] Coerce and simplify FixedSizeBinary equality to literal binary [datafusion]

2025-04-17 Thread via GitHub
jayzhan211 commented on PR #15726: URL: https://github.com/apache/datafusion/pull/15726#issuecomment-2814290610 Thanks @leoyvens @alamb

Re: [PR] Coerce and simplify FixedSizeBinary equality to literal binary [datafusion]

2025-04-17 Thread via GitHub
jayzhan211 merged PR #15726: URL: https://github.com/apache/datafusion/pull/15726

Re: [I] Allow parsing byte literals as FixedSizeBinary [datafusion]

2025-04-17 Thread via GitHub
jayzhan211 closed issue #15686: Allow parsing byte literals as FixedSizeBinary URL: https://github.com/apache/datafusion/issues/15686

[I] Support `FixedSizeBinary` to `BinaryView` [datafusion]

2025-04-17 Thread via GitHub
jayzhan211 opened a new issue, #15755: URL: https://github.com/apache/datafusion/issues/15755 As a follow-on PR, it might be good to support `BinaryView` as well _Originally posted by @alamb in https://github.com/apache/datafusion/pull/15726#discussion_r2049064058_

Re: [I] `flatten` should be single-step, not recursive [datafusion]

2025-04-17 Thread via GitHub
alamb closed issue #13757: `flatten` should be single-step, not recursive URL: https://github.com/apache/datafusion/issues/13757

Re: [PR] enable `supports_filter_during_aggregation` for Generic dialect [datafusion-sqlparser-rs]

2025-04-17 Thread via GitHub
goldmedal commented on PR #1815: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1815#issuecomment-2814306305 > > oops, I forgot to mention the tests. I found it will be covered by the original case: > > I was thinking of a test that would protect against regressions in a f

Re: [I] Comet 0.7.0 (March 2025) [datafusion-comet]

2025-04-17 Thread via GitHub
andygrove closed issue #1420: Comet 0.7.0 (March 2025) URL: https://github.com/apache/datafusion-comet/issues/1420

Re: [PR] Improve push down limit (logical optimizer rule) [datafusion]

2025-04-17 Thread via GitHub
xudong963 commented on PR #15744: URL: https://github.com/apache/datafusion/pull/15744#issuecomment-2814322918 > Topk IIUC, the topk in https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs is only used for test.

Re: [PR] Minor: simplify code in datafusion-proto [datafusion]

2025-04-17 Thread via GitHub
alamb commented on PR #15752: URL: https://github.com/apache/datafusion/pull/15752#issuecomment-2813964914 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.8.0-1016-gcp #18-Ubuntu

Re: [I] Ballista standalone mode tests fail: `context::tests::test_task_stuck_when_referenced_task_failed` [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm closed issue #25: Ballista standalone mode tests fail: `context::tests::test_task_stuck_when_referenced_task_failed` URL: https://github.com/apache/datafusion-ballista/issues/25

Re: [I] Upgrade dependency datafusion-objectstore-hdfs to 0.1.2 [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm closed issue #450: Upgrade dependency datafusion-objectstore-hdfs to 0.1.2 URL: https://github.com/apache/datafusion-ballista/issues/450

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-17 Thread via GitHub
codecov-commenter commented on PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#issuecomment-2814488451 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1655?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-04-17 Thread via GitHub
berkaysynnada commented on code in PR #14523: URL: https://github.com/apache/datafusion/pull/14523#discussion_r2049071746 ## datafusion/expr-common/src/interval_arithmetic.rs: ## @@ -963,6 +961,23 @@ pub fn apply_operator(op: &Operator, lhs: &Interval, rhs: &Interval) -> Result

Re: [PR] Change `flatten` so it does only a level, not recursively [datafusion]

2025-04-17 Thread via GitHub
alamb merged PR #15160: URL: https://github.com/apache/datafusion/pull/15160

Re: [I] Ballista: Partition columns are duplicated in protobuf decoding. [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm commented on issue #484: URL: https://github.com/apache/datafusion-ballista/issues/484#issuecomment-2813671752 Will be fixed in https://github.com/apache/datafusion/pull/15737

Re: [PR] feat: Add option to adjust writer buffer size for query output [datafusion]

2025-04-17 Thread via GitHub
alamb commented on code in PR #15747: URL: https://github.com/apache/datafusion/pull/15747#discussion_r2049392845 ## datafusion/datasource/src/write/mod.rs: ## @@ -88,6 +91,21 @@ pub async fn create_writer( file_compression_type.convert_async_writer(buf_writer) } +/// Re

Re: [PR] Add DataFusion 47.0.0 Upgrade Guide [datafusion]

2025-04-17 Thread via GitHub
comphead commented on code in PR #15749: URL: https://github.com/apache/datafusion/pull/15749#discussion_r2049129318 ## docs/source/library-user-guide/upgrading.md: ## @@ -19,6 +19,112 @@ # Upgrade Guides +## DataFusion `47.0.0` + +This section calls out some of the major c

[PR] fix: handle missing field correctly in native_iceberg_compat [datafusion-comet]

2025-04-17 Thread via GitHub
parthchandra opened a new pull request, #1656: URL: https://github.com/apache/datafusion-comet/pull/1656 ## Part of #1542 ## Rationale for this change In some cases, the field that is being requested does not exist in the file being read. With primitive types, one has to c

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-17 Thread via GitHub
andygrove commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2049397190 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2849,33 +2825,23 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] fix: handle missing field correctly in native_iceberg_compat [datafusion-comet]

2025-04-17 Thread via GitHub
parthchandra commented on PR #1656: URL: https://github.com/apache/datafusion-comet/pull/1656#issuecomment-2813644237 @mbutrovich @andygrove fyi some more test failures fixed in native_iceberg_compat.

Re: [I] Ballista: Partition columns are duplicated in protobuf decoding. [datafusion-ballista]

2025-04-17 Thread via GitHub
milenkovicm commented on issue #484: URL: https://github.com/apache/datafusion-ballista/issues/484#issuecomment-2813669738 I believe this is addressed in #1239 and subsequent datafusion commits

Re: [PR] fix: better int96 support for experimental native scans [datafusion-comet]

2025-04-17 Thread via GitHub
andygrove commented on code in PR #1652: URL: https://github.com/apache/datafusion-comet/pull/1652#discussion_r2049084879 ## spark/src/test/scala/org/apache/comet/CometCastSuite.scala: ## @@ -901,7 +901,6 @@ class CometCastSuite extends CometTestBase with AdaptiveSparkPlanHelpe

Re: [PR] feat: Add support for complex types in native shuffle [datafusion-comet]

2025-04-17 Thread via GitHub
andygrove commented on code in PR #1655: URL: https://github.com/apache/datafusion-comet/pull/1655#discussion_r2049396215 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2889,6 +2855,31 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049436982 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -163,6 +177,50 @@ impl GroupsAccumulatorAdapter { /// invokes f(accumul

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049446800 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -299,6 +420,12 @@ impl GroupsAccumulatorAdapter { } impl GroupsAccumulator

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049451786 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -106,6 +107,44 @@ impl EmitTo { /// [`Accumulator`]: crate::accumulator::Accumulator /// [Aggregating

Re: [PR] feat: add `with_group_indices_order_mode` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-17 Thread via GitHub
rluvaton commented on code in PR #15022: URL: https://github.com/apache/datafusion/pull/15022#discussion_r2049452472 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -106,6 +107,44 @@ impl EmitTo { /// [`Accumulator`]: crate::accumulator::Accumulator /// [Aggregating
