[I] Diagram display problem [datafusion-site]

2024-08-05 Thread via GitHub
lewiszlw opened a new issue, #16: URL: https://github.com/apache/datafusion-site/issues/16 ![image](https://github.com/user-attachments/assets/a1fad003-f82c-48bf-8be7-3793f98be079) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

[PR] chore(deps): update rstest requirement from 0.21.0 to 0.22.0 [datafusion]

2024-08-05 Thread via GitHub
dependabot[bot] opened a new pull request, #11811: URL: https://github.com/apache/datafusion/pull/11811 Updates the requirements on [rstest](https://github.com/la10736/rstest) to permit the latest version. Release notes Sourced from https://github.com/la10736/rstest/releases rste

Re: [I] Scalar NULL literal in aggregate functions are not supported (SQLancer) [datafusion]

2024-08-05 Thread via GitHub
xinlifoobar commented on issue #11749: URL: https://github.com/apache/datafusion/issues/11749#issuecomment-2268492991 I tested it on my machine and even on PostgreSQL the NULL literal is very random. I limited the change to min/max while there is better error handling in other aggregate f

[PR] Support NULL literal in Min/Max [datafusion]

2024-08-05 Thread via GitHub
xinlifoobar opened a new pull request, #11812: URL: https://github.com/apache/datafusion/pull/11812 ## Which issue does this PR close? Closes #11749 ## Rationale for this change ## What changes are included in this PR? ## Are these changes t

[PR] Don't implement `create_sliding_accumulator` repeatedly [datafusion]

2024-08-05 Thread via GitHub
lewiszlw opened a new pull request, #11813: URL: https://github.com/apache/datafusion/pull/11813 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

[PR] Add valid Distinct case for aggregation [datafusion]

2024-08-05 Thread via GitHub
mertak-synnada opened a new pull request, #11814: URL: https://github.com/apache/datafusion/pull/11814 ## Which issue does this PR close? - Closes #. ## Rationale for this change This change is for satisfying the related conversation: https://github.com/apache/datafus

Re: [PR] Improve `AccumulatorArgs` by removing the usages of `input_types` [datafusion]

2024-08-05 Thread via GitHub
jayzhan211 commented on code in PR #11761: URL: https://github.com/apache/datafusion/pull/11761#discussion_r1703900440 ## datafusion/expr/src/expressions/column.rs: ## Review Comment: @xinlifoobar We should not move `physical-expr` in `expr`, what is the issue you met tha

Re: [PR] Improve `AccumulatorArgs` by removing the usages of `input_types` [datafusion]

2024-08-05 Thread via GitHub
jayzhan211 commented on code in PR #11761: URL: https://github.com/apache/datafusion/pull/11761#discussion_r1703903524 ## datafusion/expr/src/expressions/column.rs: ## Review Comment: We might do the refactor #11359 before removing `input_types`

Re: [PR] Improve MSRV CI check to print out problems to log [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11789: URL: https://github.com/apache/datafusion/pull/11789

Re: [PR] Improve MSRV CI check to print out problems to log [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11789: URL: https://github.com/apache/datafusion/pull/11789#issuecomment-2268779262 Thanks again @korowa and @jayzhan211

Re: [I] CI cargo msrv check fails silently [datafusion]

2024-08-05 Thread via GitHub
alamb closed issue #11788: CI cargo msrv check fails silently URL: https://github.com/apache/datafusion/issues/11788

Re: [I] Some memory reservations of GroupedHashAggregateStream seem to be mis-tagged as spillable while they do not allow spilling [datafusion]

2024-08-05 Thread via GitHub
Ablu commented on issue #11390: URL: https://github.com/apache/datafusion/issues/11390#issuecomment-2268783795 Simply sorting seems to run into the same kind of problems: ``` External(External(ResourcesExhausted("Failed to allocate additional 2470912 bytes for ExternalSorterMerge[0] wi

Re: [PR] Skipping partial aggregation when it is not helping for high cardinality aggregates [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11627: URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2268786466 🚀

Re: [PR] Skipping partial aggregation when it is not helping for high cardinality aggregates [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11627: URL: https://github.com/apache/datafusion/pull/11627

Re: [PR] Add valid Distinct case for aggregation [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11814: URL: https://github.com/apache/datafusion/pull/11814#discussion_r1703936973 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -82,10 +82,11 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { for dep in i

Re: [PR] Improve log func tests stability [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11808: URL: https://github.com/apache/datafusion/pull/11808

[I] Metrics for when partial aggregation mode is hit [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new issue, #11815: URL: https://github.com/apache/datafusion/issues/11815 ### Is your feature request related to a problem or challenge? @korowa added "skip partial aggregation mode" in https://github.com/apache/datafusion/pull/11627 which helps with high cardinality

[I] Improve performance of `GeometricMean` aggregate: implement `convert_to_state` [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new issue, #11816: URL: https://github.com/apache/datafusion/issues/11816 ### Is your feature request related to a problem or challenge? @korowa added "skip partial aggregation mode" in https://github.com/apache/datafusion/pull/11627 which helps with high cardinality a

[PR] Add create index plan [datafusion]

2024-08-05 Thread via GitHub
lewiszlw opened a new pull request, #11817: URL: https://github.com/apache/datafusion/pull/11817 ## Which issue does this PR close? Closes #. ## Rationale for this change Like prepare/transaction statements, we could provide create index plan to users and let

Re: [I] Improve performance of `Avg` aggregate: implement `convert_to_state` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #11816: URL: https://github.com/apache/datafusion/issues/11816#issuecomment-2268818210 I have a draft PR for this here: https://github.com/apache/datafusion/pull/11734

[I] Implement `convert_to_state` for BooleanGroupsAccumulator [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new issue, #11818: URL: https://github.com/apache/datafusion/issues/11818 ### Is your feature request related to a problem or challenge? dd ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered _No resp

Re: [I] Metrics for when partial aggregation mode is hit [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #11815: URL: https://github.com/apache/datafusion/issues/11815#issuecomment-2268823100 I have a draft PR for this here: https://github.com/apache/datafusion/pull/11706

[I] ddd [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new issue, #11819: URL: https://github.com/apache/datafusion/issues/11819 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Skipping partial aggregation when it is not helping for high cardinality aggregates [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11627: URL: https://github.com/apache/datafusion/pull/11627#issuecomment-2268830451 Thank you again everyone for all your work. I am hoping this is the first step towards some significantly improved TPCH / ClickBench performance I filed the following fol

[PR] Improve comments in row_hash.rs for skipping aggregation [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11820: URL: https://github.com/apache/datafusion/pull/11820 ## Which issue does this PR close? Follow on to https://github.com/apache/datafusion/pull/11627 ## Rationale for this change @2010YOUY01 had some good suggestions on https://git

Re: [PR] Skipping partial aggregation when it is not helping for high cardinality aggregates [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11627: URL: https://github.com/apache/datafusion/pull/11627#discussion_r1703970839 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -90,6 +94,69 @@ struct SpillState { merging_group_by: PhysicalGroupBy, } +struct SkipAggregatio

[PR] Minor: refactor probe check into function `should_skip_aggregation` [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11821: URL: https://github.com/apache/datafusion/pull/11821 ## Which issue does this PR close? Follow on to https://github.com/apache/datafusion/pull/11627 ## Rationale for this change Per https://github.com/apache/datafusion/pull/116

Re: [PR] Minor: refactor probe check into function `should_skip_aggregation` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11821: URL: https://github.com/apache/datafusion/pull/11821#discussion_r1703979218 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -635,11 +635,7 @@ impl Stream for GroupedHashAggregateStream { (

Re: [PR] Skipping partial aggregation when it is not helping for high cardinality aggregates [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11627: URL: https://github.com/apache/datafusion/pull/11627#discussion_r1703979815 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -484,6 +612,12 @@ impl Stream for GroupedHashAggregateStream { (

[PR] Add missing expressions to wrapper export [datafusion-python]

2024-08-05 Thread via GitHub
timsaucer opened a new pull request, #795: URL: https://github.com/apache/datafusion-python/pull/795 # Which issue does this PR close? Fixes CI for newly added expressions # Rationale for this change In #793 two expressions were added to the internal module that were not

[I] Support of timestamps and steps of less than a day for `generate_series` [datafusion]

2024-08-05 Thread via GitHub
Abdullahsab3 opened a new issue, #11822: URL: https://github.com/apache/datafusion/issues/11822 ### Is your feature request related to a problem or challenge? I would like to have support of time generation using time ranges and with steps that are less than 1 day (e.g. an hour, custo

Re: [I] Support of timestamps and steps of less than a day for `generate_series` [datafusion]

2024-08-05 Thread via GitHub
Abdullahsab3 commented on issue #11822: URL: https://github.com/apache/datafusion/issues/11822#issuecomment-2268892380 I opened https://github.com/apache/datafusion/issues/11823 for the bug as well

[I] `generate_series` hangs indefinitely when providing a step of less than 1 day [datafusion]

2024-08-05 Thread via GitHub
Abdullahsab3 opened a new issue, #11823: URL: https://github.com/apache/datafusion/issues/11823 ### Describe the bug When using dates for `generate_series` and providing a step of less than 1 day, Datafusion hangs indefinitely without any failures/successes (or maybe I didn't wait lo

Re: [I] Aggregate SQL query with filter not working in datafusion CLI [datafusion]

2024-08-05 Thread via GitHub
2010YOUY01 closed issue #11783: Aggregate SQL query with filter not working in datafusion CLI URL: https://github.com/apache/datafusion/issues/11783

Re: [I] Aggregate SQL query with filter not working in datafusion CLI [datafusion]

2024-08-05 Thread via GitHub
2010YOUY01 commented on issue #11783: URL: https://github.com/apache/datafusion/issues/11783#issuecomment-2268950476 > For an aggregate function with a filter, you should enable PostgreSQL dialect by using `set datafusion.sql_parser.dialect = 'Postgres';` I see, thank you

Re: [PR] Add valid Distinct case for aggregation [datafusion]

2024-08-05 Thread via GitHub
mertak-synnada commented on code in PR #11814: URL: https://github.com/apache/datafusion/pull/11814#discussion_r1704073637 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -82,10 +82,11 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { for

Re: [I] SQL Logical Plan Drops ordering within UDAFs [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #7531: URL: https://github.com/apache/datafusion/issues/7531#issuecomment-2269012692 I think we may be close to / done fixing this as part of https://github.com/apache/datafusion/issues/8708

Re: [PR] Add valid Distinct case for aggregation [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11814: URL: https://github.com/apache/datafusion/pull/11814

Re: [PR] Add valid Distinct case for aggregation [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11814: URL: https://github.com/apache/datafusion/pull/11814#discussion_r1704081412 ## datafusion/optimizer/src/replace_distinct_aggregate.rs: ## @@ -82,10 +82,11 @@ impl OptimizerRule for ReplaceDistinctWithAggregate { for dep in i

Re: [PR] Add `skipped_aggregation_rows` to aggregate operator [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11706: URL: https://github.com/apache/datafusion/pull/11706#discussion_r1704094369 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -611,6 +629,9 @@ impl Stream for GroupedHashAggregateStream { match ready!(self.in

[I] feat: better exception when table doesn't exist [datafusion-python]

2024-08-05 Thread via GitHub
gforsyth opened a new issue, #796: URL: https://github.com/apache/datafusion-python/issues/796 Ibis supports datafusion as one of our backends, and we make use of `datafusion-python` in service of that. One of the thornier issues we face right is that Datafusion returns `Exception` a

Re: [I] Improve performance of other non GroupsAdapter aggregates: implement `convert_to_state` [datafusion]

2024-08-05 Thread via GitHub
Rachelint commented on issue #11819: URL: https://github.com/apache/datafusion/issues/11819#issuecomment-2269242948 take

Re: [PR] Change name of MAX/MIN udaf to lowercase max/min [datafusion]

2024-08-05 Thread via GitHub
jayzhan211 merged PR #11795: URL: https://github.com/apache/datafusion/pull/11795

Re: [PR] Don't implement `create_sliding_accumulator` repeatedly [datafusion]

2024-08-05 Thread via GitHub
jayzhan211 merged PR #11813: URL: https://github.com/apache/datafusion/pull/11813

Re: [PR] Support NULL literal in Min/Max [datafusion]

2024-08-05 Thread via GitHub
jayzhan211 commented on code in PR #11812: URL: https://github.com/apache/datafusion/pull/11812#discussion_r1704244482 ## datafusion/core/tests/dataframe/describe.rs: ## @@ -102,8 +102,8 @@ async fn describe_null() -> Result<()> { "| null_count | 0| 1|",

Re: [PR] chore(deps): update rstest requirement from 0.21.0 to 0.22.0 [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11811: URL: https://github.com/apache/datafusion/pull/11811

Re: [I] Align UDF names to lowercase or uppercase [datafusion]

2024-08-05 Thread via GitHub
edmondop closed issue #11779: Align UDF names to lowercase or uppercase URL: https://github.com/apache/datafusion/issues/11779

Re: [PR] Minor: Update expected output due to logical conflict [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11824: URL: https://github.com/apache/datafusion/pull/11824#discussion_r1704272966 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -4530,21 +4530,21 @@ EXPLAIN SELECT DISTINCT c3, min(c1) FROM aggregate_test_100 group by c3 limit 5; --

[PR] Minor: Update expected output due to logical conflict [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11824: URL: https://github.com/apache/datafusion/pull/11824 ## Which issue does this PR close? N/A ## Rationale for this change CI is failing on main, for example https://github.com/apache/datafusion/actions/runs/10251066368/job/28358425606

Re: [PR] Minor: Update expected output due to logical conflict [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11824: URL: https://github.com/apache/datafusion/pull/11824#issuecomment-2269328261 Thanks @lewiszlw -- let's merge this once the tests pass green to get CI back to success

Re: [PR] Minor: Update expected output due to logical conflict [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11824: URL: https://github.com/apache/datafusion/pull/11824#issuecomment-2269412652 ✅

Re: [PR] Minor: Update expected output due to logical conflict [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11824: URL: https://github.com/apache/datafusion/pull/11824

Re: [PR] Minor: Update expected output due to logical conflict [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11824: URL: https://github.com/apache/datafusion/pull/11824#issuecomment-2269412858 Thanks for the speedy review @lewiszlw

[PR] Use `filtered_null_mask` in `CountGroupsAccumulator` and `PrimitiveGroupsAccumulator` [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11825: URL: https://github.com/apache/datafusion/pull/11825 ## Which issue does this PR close? Draft: - [ ] Builds on https://github.com/apache/datafusion/pull/11734 - [ ] Run end to end benchmarks - [ ] Run `cargo bench` benchmarks

Re: [PR] Single mode for multi column group by -- Almost 2x for ClickBench Q32 [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11792: URL: https://github.com/apache/datafusion/pull/11792#issuecomment-2269439713 I am running some benchmarks on this one

Re: [I] Comet library not initializing [datafusion-comet]

2024-08-05 Thread via GitHub
andygrove commented on issue #773: URL: https://github.com/apache/datafusion-comet/issues/773#issuecomment-2269448057 Thanks for reporting this @zelda89. Could you show me the output of the following command? ``` $ jar tvf spark/target/comet-spark-spark3.4_2.12-0.2.0-SNAPSHOT | gr

Re: [PR] doc: Add support for `map` and `make_map` functions [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11799: URL: https://github.com/apache/datafusion/pull/11799

Re: [I] Document the Map function in the documentation [datafusion]

2024-08-05 Thread via GitHub
alamb closed issue #11435: Document the Map function in the documentation URL: https://github.com/apache/datafusion/issues/11435

Re: [PR] Single mode for multi column group by -- Almost 2x for ClickBench Q32 [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11792: URL: https://github.com/apache/datafusion/pull/11792#issuecomment-2269563693 I also measured some non trivial improvements (16 core) along with some slowdowns ``` Benchmark clickbench_1.json ┏━

Re: [I] Support of timestamps and steps of less than a day for `generate_series` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #11822: URL: https://github.com/apache/datafusion/issues/11822#issuecomment-2269570836 Thank you @Abdullahsab3

Re: [PR] [Minor] Short circuit `ApplyFunctionRewrites` if there are no function rewrites [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11765: URL: https://github.com/apache/datafusion/pull/11765#issuecomment-2269612599 > However, the tightest bottleneck as per lldb is actually ApplyFunctionRewrites, which can't be opted out of, even though after https://github.com/apache/datafusion/pull/11155 it has

Re: [PR] minor: always time batch_filter even when the result is an empty batch [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11775: URL: https://github.com/apache/datafusion/pull/11775#issuecomment-2269613815 👌 this is a nice one

[I] DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new issue, #11826: URL: https://github.com/apache/datafusion/issues/11826 Follow on to https://github.com/apache/datafusion/issues/11710 My (personal) North ⭐ : 1000 projects are built using DataFusion 📈 **It would be great for other contributors to DataFusion to

Re: [I] DataFusion weekly project plan (Andrew Lamb) - July 29, 2024 [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #11710: URL: https://github.com/apache/datafusion/issues/11710#issuecomment-2269621430 Next week's plan: https://github.com/apache/datafusion/issues/11826

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #11826: URL: https://github.com/apache/datafusion/issues/11826#issuecomment-2269622464 Review Queue: DataFusion - [ ] https://github.com/apache/datafusion/pull/11792 - [ ] https://github.com/apache/datafusion/pull/11758 - [ ] https://github.com/apache/

Re: [I] DataFusion weekly project plan (Andrew Lamb) - July 29, 2024 [datafusion]

2024-08-05 Thread via GitHub
alamb closed issue #11710: DataFusion weekly project plan (Andrew Lamb) - July 29, 2024 URL: https://github.com/apache/datafusion/issues/11710

[I] Fix TODO comment in QueryPlanSerde handling of Coalesce [datafusion-comet]

2024-08-05 Thread via GitHub
andygrove opened a new issue, #779: URL: https://github.com/apache/datafusion-comet/issues/779 ### What is the problem the feature request solves? I think that we can remove this cast call now. ```rust case a @ Coalesce(_) => val exprChildren = a.children

[PR] Impl `convert_to_state` for `GroupsAccumulatorAdapter`. [datafusion]

2024-08-05 Thread via GitHub
Rachelint opened a new pull request, #11827: URL: https://github.com/apache/datafusion/pull/11827 ## Which issue does this PR close? Closes #11819 ## Rationale for this change See #11819 ## What changes are included in this PR? Impl `convert_to_sta

Re: [PR] Improve readme page in crates.io [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11809: URL: https://github.com/apache/datafusion/pull/11809

Re: [PR] Improve readme page in crates.io [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11809: URL: https://github.com/apache/datafusion/pull/11809#issuecomment-2269663560 Thanks again @lewiszlw

Re: [PR] Pass scalar to `eq` inside `nullif` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11697: URL: https://github.com/apache/datafusion/pull/11697#issuecomment-2269664634 Thanks again @simonvandel and @comphead

Re: [PR] Pass scalar to `eq` inside `nullif` [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11697: URL: https://github.com/apache/datafusion/pull/11697

Re: [PR] Add `LogicalPlan::CreateIndex` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11817: URL: https://github.com/apache/datafusion/pull/11817#discussion_r1704494633 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4426,6 +4427,35 @@ fn test_parse_escaped_string_literal_value() { ) } +#[test] +fn plan_create_index() {

Re: [I] Add MetricValue that returns the max during aggregation [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #11754: URL: https://github.com/apache/datafusion/issues/11754#issuecomment-2269674208 I worry: 1. This is something very special 2. The addition of a new metric type solely to change how the aggregator works will cause confusion to other users of metrics

Re: [PR] feat: Add MaxTime type for a Time that returns the max on aggregation [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11755: URL: https://github.com/apache/datafusion/pull/11755#issuecomment-2269674625 I had some more thoughts on this PR here: https://github.com/apache/datafusion/issues/11754

Re: [PR] fix: invalid sqls when unparsing derived table with columns contains calculations, limit/order/distinct [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11756: URL: https://github.com/apache/datafusion/pull/11756#discussion_r1704499611 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -373,6 +373,38 @@ fn roundtrip_statement_with_dialect() -> Result<()> { parser_dialect: Box::new(Gen

Re: [PR] fix: hash join tests with forced collisions [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11806: URL: https://github.com/apache/datafusion/pull/11806#discussion_r1704502590 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -2029,12 +2031,21 @@ mod tests { assert_eq!(columns, vec!["a1", "b2", "c1", "a1", "b2", "c2"]);

Re: [PR] Support `convert_to_state` for `AVG` accumulator [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11734: URL: https://github.com/apache/datafusion/pull/11734#issuecomment-2269687928 This PR is failing like the following when running on benchmarks on TPCH. I think there may be a bug related to types in the intermediates. I will keep debugging ``` Query 19

[PR] Minor: make it clearer that clone() is not slow [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11828: URL: https://github.com/apache/datafusion/pull/11828 ## Which issue does this PR close? Builds on https://github.com/apache/datafusion/pull/11802 ## Rationale for this change While reviewing https://github.com/apache/datafusion

[PR] Minor: Avoid need for PartitionedFile default [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11829: URL: https://github.com/apache/datafusion/pull/11829 ## Which issue does this PR close? Follow on to https://github.com/apache/datafusion/pull/11802 ## Rationale for this change While reviewing https://github.com/apache/datafusion/

[PR] fix: withInfo was overwriting information in some cases [datafusion-comet]

2024-08-05 Thread via GitHub
andygrove opened a new pull request, #780: URL: https://github.com/apache/datafusion-comet/pull/780 ## Which issue does this PR close? N/A ## Rationale for this change Calls to withInfo would sometimes overwrite information from previous calls to withInfo

Re: [PR] Reduce clone of `Statistics` in `ListingTable` and `PartitionedFile` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11802: URL: https://github.com/apache/datafusion/pull/11802#discussion_r1704504377 ## datafusion/core/src/datasource/listing/mod.rs: ## @@ -78,10 +78,11 @@ pub struct PartitionedFile { /// /// DataFusion relies on these statistics for pla

Re: [PR] Reduce clone of `Statistics` in `ListingTable` and `PartitionedFile` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11802: URL: https://github.com/apache/datafusion/pull/11802#issuecomment-2269749336 > But when I pulling main and rebasing, the results become different... > I can almost make sure that, it is not really related to codes... So strange... I think there can be

Re: [PR] Impl `convert_to_state` for `GroupsAccumulatorAdapter`. [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11827: URL: https://github.com/apache/datafusion/pull/11827#discussion_r1704538507 ## datafusion/physical-expr/src/aggregate/groups_accumulator/adapter.rs: ## @@ -342,6 +374,50 @@ impl GroupsAccumulator for GroupsAccumulatorAdapter { fn size(&

Re: [PR] fix: withInfo was overwriting information in some cases [datafusion-comet]

2024-08-05 Thread via GitHub
andygrove commented on PR #780: URL: https://github.com/apache/datafusion-comet/pull/780#issuecomment-2269753958 @parthchandra could you review?

Re: [PR] move `aggregate_statistics`, `enforce_sorting`, `limited_distinct_aggregtion` and `replace_with_order_preserving_variants` to `datafusion-physical-optimizer` crate [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11793: URL: https://github.com/apache/datafusion/pull/11793#issuecomment-2269754485 Thank you @Weijun-H

Re: [PR] Improve `AccumulatorArgs` by removing the usgaes of `input_types` [datafusion]

2024-08-05 Thread via GitHub
alamb commented on code in PR #11761: URL: https://github.com/apache/datafusion/pull/11761#discussion_r1704540401 ## datafusion/expr/src/expressions/column.rs: ## Review Comment: Thank you all for this -- yes getting the physical-expr and `Expr` types untangled will be a g

Re: [I] Improve parquet ListingTable speed with parquet metadata (short clickbench queries) [datafusion]

2024-08-05 Thread via GitHub
alamb commented on issue #11719: URL: https://github.com/apache/datafusion/issues/11719#issuecomment-2269756478 https://github.com/apache/datafusion/pull/11802 is very nice 👌 It would be fascinating to know what the flamegraph looks like after that PR (aka what are next highest bottleneck)

Re: [PR] Add missing expressions to wrapper export [datafusion-python]

2024-08-05 Thread via GitHub
andygrove merged PR #795: URL: https://github.com/apache/datafusion-python/pull/795

Re: [PR] feat: support `Utf8View` type in `starts_with` function [datafusion]

2024-08-05 Thread via GitHub
alamb commented on PR #11787: URL: https://github.com/apache/datafusion/pull/11787#issuecomment-2269767653 > With the tests, things look good generally, though `STARTS_WITH(column1_utf8, column2_utf8view)` is a bit unfortunate as casts the first column to a utf8view. I think in this

[PR] build: Re-enable TPCDS q72 [datafusion-comet]

2024-08-05 Thread via GitHub
viirya opened a new pull request, #781: URL: https://github.com/apache/datafusion-comet/pull/781 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

Re: [PR] feat: support `Utf8View` type in `starts_with` function [datafusion]

2024-08-05 Thread via GitHub
tshauck commented on PR #11787: URL: https://github.com/apache/datafusion/pull/11787#issuecomment-2269793471 Sounds good, I'll leave this as is then, and will move on to making the tickets for the rest of the functions as discussed.

[PR] chore: bump DataFusion to rev c6f0d3c [datafusion-comet]

2024-08-05 Thread via GitHub
andygrove opened a new pull request, #782: URL: https://github.com/apache/datafusion-comet/pull/782 ## Which issue does this PR close? N/A ## Rationale for this change Weekly DataFusion revision bump. ## What changes are included in this PR?

Re: [PR] chore: bump DataFusion to rev c6f0d3c [datafusion-comet]

2024-08-05 Thread via GitHub
andygrove commented on PR #782: URL: https://github.com/apache/datafusion-comet/pull/782#issuecomment-2269819481 Build fails with: ``` error[E0432]: unresolved imports `datafusion::physical_expr::expressions::Max`, `datafusion::physical_expr::expressions::Min` --> core/src/ex

Re: [PR] refactor: move `aggregate_statistics` to `datafusion-physical-optimizer` [datafusion]

2024-08-05 Thread via GitHub
alamb merged PR #11798: URL: https://github.com/apache/datafusion/pull/11798

[PR] Move optimizer integration tests to `core_integration` [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11830: URL: https://github.com/apache/datafusion/pull/11830 ## Which issue does this PR close? Follow up https://github.com/apache/datafusion/pull/11798 Part of https://github.com/apache/datafusion/issues/11502 ## Rationale for this change

Re: [PR] chore: bump DataFusion to rev c6f0d3c [datafusion-comet]

2024-08-05 Thread via GitHub
huaxingao commented on PR #782: URL: https://github.com/apache/datafusion-comet/pull/782#issuecomment-2269857613 > Looks like we need to update Min and Max as we have done recently for Sum and Count Thanks for pinging me. I will fix this.

[PR] Minor: move path_partition into `core_integration` [datafusion]

2024-08-05 Thread via GitHub
alamb opened a new pull request, #11831: URL: https://github.com/apache/datafusion/pull/11831 ## Which issue does this PR close? N/A ## Rationale for this change Each individual .rs file in the `tests` directory results in a new test binary which means: * 10s of MB of new
