Re: [PR] Implement intermediate result blocked approach sketch [datafusion]

2025-04-28 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2834179862 This PR may be ready for review now. It is a formal version of #11943 (which proved the idea promising, 10%~20% faster) with some improvements: - Almost don't lead any regression w

Re: [I] main 5a7f638 is broken [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer commented on issue #1118: URL: https://github.com/apache/datafusion-python/issues/1118#issuecomment-2834954709 Duplicate of https://github.com/apache/datafusion-python/issues/1116 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] fix: recursive import [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer commented on PR #1117: URL: https://github.com/apache/datafusion-python/pull/1117#issuecomment-2834958727 Thank you so much. I also have a branch for this that I was going to push up, but I'll prefer your solution. I do want to add unit tests, so I might push up what I have on th

[PR] Unparse `UNNEST` projection with the table column alias [datafusion]

2025-04-28 Thread via GitHub
goldmedal opened a new pull request, #15879: URL: https://github.com/apache/datafusion/pull/15879 ## Which issue does this PR close? - Closes #15233 . ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Use `interleave` to speed up hash repartitioning [datafusion]

2025-04-28 Thread via GitHub
comphead commented on code in PR #15768: URL: https://github.com/apache/datafusion/pull/15768#discussion_r2063914703 ## datafusion/physical-plan/src/repartition/mod.rs: ## @@ -260,33 +260,43 @@ impl BatchPartitioner { } => { let idx = *next_

Re: [PR] docs: Add documentation for accelerating Iceberg Parquet scans with Comet [datafusion-comet]

2025-04-28 Thread via GitHub
parthchandra commented on code in PR #1683: URL: https://github.com/apache/datafusion-comet/pull/1683#discussion_r2064055623 ## common/src/main/java/org/apache/comet/CometSchemaImporter.java: ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one

Re: [PR] docs: Add documentation for accelerating Iceberg Parquet scans with Comet [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove commented on code in PR #1683: URL: https://github.com/apache/datafusion-comet/pull/1683#discussion_r2064062673 ## common/src/main/java/org/apache/comet/CometSchemaImporter.java: ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] docs: Add documentation for accelerating Iceberg Parquet scans with Comet [datafusion-comet]

2025-04-28 Thread via GitHub
huaxingao commented on PR #1683: URL: https://github.com/apache/datafusion-comet/pull/1683#issuecomment-2835847941 > The scan is a org.apache.iceberg.spark.source.SparkBatchQueryScan which does not implement SupportsComet Sorry, I forgot to mention that I have to [PR](https://github

[PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
aharpervc opened a new pull request, #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831 This PR is a followup to https://github.com/apache/datafusion-sqlparser-rs/pull/1821 to address several difficulties I had parsing real world SQL files. There are 3 related en

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2064171122 ## src/ast/mod.rs: ## @@ -2226,7 +2226,33 @@ impl fmt::Display for IfStatement { } } -/// A block within a [Statement::Case] or [Statement::I

Re: [PR] feat: add `register_metadata` function for `GroupsAccumulator` to help create specialized impl [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15022: URL: https://github.com/apache/datafusion/pull/15022#issuecomment-2835680575 > That is not always the case, some users like Comet for example build PhysicalPlan directly and execute that and does not use the optimizer at all. I wonder if we can take a ste

Re: [PR] Support `GroupsAccumulator` for Avg duration [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15748: URL: https://github.com/apache/datafusion/pull/15748#issuecomment-2835731298 I think the groups accumulator will result in faster performance, not new functionality. @shruti2522 maybe you can verify this with `datafusion-cli` and using `generate_series`

Re: [PR] Add slt tests for `datafusion.execution.parquet.coerce_int96` setting [datafusion]

2025-04-28 Thread via GitHub
alamb merged PR #15723: URL: https://github.com/apache/datafusion/pull/15723

Re: [PR] Add slt tests for `datafusion.execution.parquet.coerce_int96` setting [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15723: URL: https://github.com/apache/datafusion/pull/15723#issuecomment-2835794392 Thanks @comphead

[PR] fix: typo for `instr` in fuzz testing [datafusion-comet]

2025-04-28 Thread via GitHub
mbutrovich opened a new pull request, #1686: URL: https://github.com/apache/datafusion-comet/pull/1686 ## Which issue does this PR close? Closes #. ## Rationale for this change There's a typo in our fuzz test query generator. `instr` is misspelled as `in_

Re: [I] Release sqlparser-rs version `0.56.0` around 2024-04-20 [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
alamb commented on issue #1756: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1756#issuecomment-2835916691 @iffyio do you think we are ready to make a release candidate? Specifically is there any other PR you think we should try and get into this release? If not, I'

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-28 Thread via GitHub
timsaucer commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2835919583 @berkaysynnada do you think you'll be able to review soon? I know we wanted to get this in earlier in the 48 cycle to shake out any bugs since it is a big change

Re: [PR] Prepare for 0.56.0 release: Version and CHANGELOG [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
alamb commented on PR #1822: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1822#issuecomment-2835918737 I'll plan to merge this PR (after update) when we are happy with the release contents - See this comment for more https://github.com/apache/datafusion-sqlparser-rs/i

Re: [I] How to install ballista python package? [datafusion-ballista]

2025-04-28 Thread via GitHub
Wuerike commented on issue #1257: URL: https://github.com/apache/datafusion-ballista/issues/1257#issuecomment-2835923772 Thanks @milenkovicm!

Re: [I] How to install ballista python package? [datafusion-ballista]

2025-04-28 Thread via GitHub
Wuerike closed issue #1257: How to install ballista python package? URL: https://github.com/apache/datafusion-ballista/issues/1257

Re: [PR] Add support for `XMLTABLE` [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
alamb commented on PR #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817#issuecomment-2835929793 Epic

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
alamb commented on PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808#issuecomment-2835931034 Thanks again @iffyio and @aharpervc

Re: [PR] Add `OR ALTER` support for `CREATE VIEW` [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
alamb commented on PR #1818: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1818#issuecomment-2835931909 🚀

[I] Implement method to apply scalar or aggregate function to Array elements [datafusion]

2025-04-28 Thread via GitHub
timsaucer opened a new issue, #15882: URL: https://github.com/apache/datafusion/issues/15882 ### Is your feature request related to a problem or challenge? Suppose I have a DataFrame in which one column contains arrays. I wish to be able to apply any scalar expr to each value of that

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-04-28 Thread via GitHub
kazantsev-maksim commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2064138880 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -90,6 +90,29 @@ class CometExpressionSuite extends CometTestBase with Adap

[I] Weekly Plan: Andrew Lamb 2025-04-28 [datafusion]

2025-04-28 Thread via GitHub
alamb opened a new issue, #15880: URL: https://github.com/apache/datafusion/issues/15880 This is my plan this week for reviews, etc. I am putting it here to make it visible and keep myself organized

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove commented on code in PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#discussion_r2063976818 ## docs/source/contributor-guide/benchmarking_macos.md: ## @@ -0,0 +1,145 @@ + + +# Comet Benchmarking on macOS + +This guide is for setting up TPC-H benchma

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove merged PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove commented on PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#issuecomment-2835706462 > are we okay to merge this PR? Yes, I'll go ahead and merge and we can follow up with change

[I] Set up Comet + Iceberg integration tests in CI [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove opened a new issue, #1685: URL: https://github.com/apache/datafusion-comet/issues/1685 ### What is the problem the feature request solves? We want to avoid making code changes in Comet that cause regressions with the Iceberg integration. I would like to add an integra

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
Rachelint commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2835790300 My list (still mainly about aggregation performance, which has fallen far behind DuckDB on ClickBench...): - [ ] Push forward #7065 , which has been proved to be really promi

[PR] POC: Eliminate unnecessary group by keys (q35 in clickbench 1.35x faster) [datafusion]

2025-04-28 Thread via GitHub
Rachelint opened a new pull request, #13617: URL: https://github.com/apache/datafusion/pull/13617 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

[I] Internal error with `generate_series`: Internal error: could not cast array of type Date32 [datafusion]

2025-04-28 Thread via GitHub
alamb opened a new issue, #15881: URL: https://github.com/apache/datafusion/issues/15881 ### Describe the bug When invalid arguments are passed to `generate_series`, it generates internal errors rather than normal execution errors. Also the errors are not clear how to fix the is

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2064175724 ## src/ast/mod.rs: ## @@ -3403,6 +3447,10 @@ pub enum Statement { /// Cursor name name: Ident, direction: FetchDirection,

Re: [PR] Improve support for cursors for SQL Server [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
aharpervc commented on code in PR #1831: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1831#discussion_r2064175279 ## src/ast/mod.rs: ## @@ -3032,6 +3068,14 @@ pub enum Statement { partition: Option>, }, /// ```sql +/// OPEN cursor_name +

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-04-28 Thread via GitHub
duongcongtoai commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2836125284 Yes, that paper basically gave a pretty neat skeleton for a decorrelation framework

Re: [I] Release sqlparser-rs version `0.56.0` around 2024-04-20 [datafusion-sqlparser-rs]

2025-04-28 Thread via GitHub
iffyio commented on issue #1756: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1756#issuecomment-2836134518 @alamb yeah I think we should be good to make a release candidate!

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-04-28 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2836123267 According to the discussions in this issue, I think we can list the following items to support a subquery decorrelation framework: - Unify the optimizer for correlat

Re: [I] An error occurs when ordering by an aggregate function (like AVG) that is not included in the SELECT list. [datafusion]

2025-04-28 Thread via GitHub
UBarney commented on issue #15875: URL: https://github.com/apache/datafusion/issues/15875#issuecomment-2834567285 take

[I] An error occurs when ordering by an aggregate function (like AVG) that is not included in the SELECT list. [datafusion]

2025-04-28 Thread via GitHub
UBarney opened a new issue, #15875: URL: https://github.com/apache/datafusion/issues/15875 ### Describe the bug Executing the SQL query SELECT value, max(value) + min(value) FROM generate_series(1, 5) GROUP BY value ORDER BY avg(value); results in an error ### To Reproduce
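
For orientation, here is a minimal sketch in plain Python of the result the query in this report would presumably be expected to produce once the error is fixed. It does not use DataFusion at all. It assumes `generate_series(1, 5)` yields the integers 1 through 5 inclusive, and the variable names are illustrative only.

```python
from collections import defaultdict
from statistics import mean

# Assumption: generate_series(1, 5) yields 1, 2, 3, 4, 5 (inclusive bounds).
rows = list(range(1, 6))

# GROUP BY value: collect the rows that share each key.
groups = defaultdict(list)
for value in rows:
    groups[value].append(value)

# ORDER BY avg(value): the aggregate is computed per group and used only
# as a sort key; it does not appear in the projected columns.
ordered_keys = sorted(groups, key=lambda k: mean(groups[k]))

# SELECT value, max(value) + min(value)
result = [(k, max(groups[k]) + min(groups[k])) for k in ordered_keys]
print(result)
```

The point of the sketch is that the sort key is an aggregate evaluated alongside the projected aggregates, so nothing in the query's semantics requires `avg(value)` to be in the SELECT list.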

Re: [I] Rewrite `datafusion-sqlancer` in Rust [datafusion]

2025-04-28 Thread via GitHub
xudong963 commented on issue #14535: URL: https://github.com/apache/datafusion/issues/14535#issuecomment-2834586678 > > I bet we are not the only project that would like to have SQLancer type support in Rust > > That's really cool! Databend will definitely be interested in building

[PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-04-28 Thread via GitHub
UBarney opened a new pull request, #15876: URL: https://github.com/apache/datafusion/pull/15876 ## Which issue does this PR close? - Closes #15875. ## Rationale for this change ## What changes are included in this PR? If `sorts.expr` contains agg_func n

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
srh commented on PR #15836: URL: https://github.com/apache/datafusion/pull/15836#issuecomment-2834607199 @comphead It isn't parser related, it's a bug in evaluation, with this optimizer. It seems a query like `"SELECT s.col ILIKE 'A' FROM (SELECT 'a' AS col) AS s"` will trigger the b
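
To illustrate why such a rewrite would be wrong, here is a small Python sketch. It assumes, based only on the PR title, that the mistaken optimization replaces a wildcard-free `ILIKE` with plain string equality. Both helper functions are hypothetical stand-ins and are not DataFusion code.

```python
def ilike_literal(value: str, pattern: str) -> bool:
    """Case-insensitive match for a pattern that contains no wildcards."""
    return value.casefold() == pattern.casefold()


def rewritten_as_equality(value: str, pattern: str) -> bool:
    """Hypothetical faulty rewrite: treats ILIKE as case-sensitive equality."""
    return value == pattern


# Mirrors the query in the comment: the column holds 'a', the pattern is 'A'.
value, pattern = "a", "A"
expected = ilike_literal(value, pattern)
mistaken = rewritten_as_equality(value, pattern)
print(expected, mistaken)
```

Under that assumption the two functions disagree on exactly the input from the comment, which is the kind of mismatch a regression test for the fix would need to cover.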

Re: [PR] docs: Add documentation for accelerating Iceberg Parquet scans with Comet [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove commented on code in PR #1683: URL: https://github.com/apache/datafusion-comet/pull/1683#discussion_r2063690526 ## common/src/main/java/org/apache/comet/CometSchemaImporter.java: ## @@ -0,0 +1,30 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + *

Re: [PR] [wip] docs: Add documentation for accelerating Iceberg Parquet scans with Comet [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove commented on PR #1683: URL: https://github.com/apache/datafusion-comet/pull/1683#issuecomment-2835237562 The scan is a `org.apache.iceberg.spark.source.SparkBatchQueryScan` which does not implement `SupportsComet`. In fact, I do not see any references to `SupportsComet` in Iceber

Re: [PR] fix: recursive import [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer merged PR #1117: URL: https://github.com/apache/datafusion-python/pull/1117

Re: [PR] fix: recursive import [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer commented on PR #1117: URL: https://github.com/apache/datafusion-python/pull/1117#issuecomment-2835242970 Actually, it's going to take a while to get all these unit tests written up so I'll merge your solution in to fix CI and then put my PR up separate.

Re: [PR] docs: Add documentation for accelerating Iceberg Parquet scans with Comet [datafusion-comet]

2025-04-28 Thread via GitHub
andygrove commented on PR #1683: URL: https://github.com/apache/datafusion-comet/pull/1683#issuecomment-2835258164 After modifying Iceberg to make `SparkBatchQueryScan` implement `SupportsComet`, this now appears to be fully working. ``` scala> spark.sql(s"SELECT * from t1").show(

Re: [PR] Update extending-operators.md [datafusion]

2025-04-28 Thread via GitHub
xudong963 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2835290751 > > You can rebase with main > > does this solve the issue? You can open the failed CI and see what's wrong: ``` error[E0599]: no method named `unwrap` found for

Re: [I] main 5a7f638 is broken [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer commented on issue #1118: URL: https://github.com/apache/datafusion-python/issues/1118#issuecomment-2835314986 Closed by https://github.com/apache/datafusion-python/pull/1117

Re: [I] main 5a7f638 is broken [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer closed issue #1118: main 5a7f638 is broken URL: https://github.com/apache/datafusion-python/issues/1118

Re: [I] Fix regression on main with circular import on expr [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer closed issue #1116: Fix regression on main with circular import on expr URL: https://github.com/apache/datafusion-python/issues/1116

Re: [I] Fix regression on main with circular import on expr [datafusion-python]

2025-04-28 Thread via GitHub
timsaucer commented on issue #1116: URL: https://github.com/apache/datafusion-python/issues/1116#issuecomment-2835315276 Closed by https://github.com/apache/datafusion-python/pull/1117

Re: [PR] Use `interleave` to speed up hash repartitioning [datafusion]

2025-04-28 Thread via GitHub
Dandandan commented on PR #15768: URL: https://github.com/apache/datafusion/pull/15768#issuecomment-2835325709 @alamb could you run the benchmarks maybe?

[I] Keeping pull request in sync with the base branch [datafusion]

2025-04-28 Thread via GitHub
xudong963 opened a new issue, #15877: URL: https://github.com/apache/datafusion/issues/15877 https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/proposing-changes-to-your-work-with-pull-requests/keeping-your-pull-request-in-sync-with-the-base-branch How to set up

Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-28 Thread via GitHub
xudong963 commented on PR #15852: URL: https://github.com/apache/datafusion/pull/15852#issuecomment-2834656400 GitHub is down, my recent update is delayed

Re: [I] Sorting is not maintained after using a window function [datafusion]

2025-04-28 Thread via GitHub
daphnenhuch-at commented on issue #15833: URL: https://github.com/apache/datafusion/issues/15833#issuecomment-2836826844 > Thanks for the report [@daphnenhuch-at](https://github.com/daphnenhuch-at) > > If you want the output sorted in a particular way I think you need to explicitly ad

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
comphead commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836840621 I'll try to start with https://github.com/apache/datafusion/issues/14510

Re: [PR] Improve `ListingTable` / `ListingTableOptions` docs [datafusion]

2025-04-28 Thread via GitHub
alamb merged PR #15767: URL: https://github.com/apache/datafusion/pull/15767

Re: [PR] fix: describe Parquet schema with coerce_int96 [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15750: URL: https://github.com/apache/datafusion/pull/15750#issuecomment-2836855789 Nice

Re: [PR] fix: clickbench type err [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15773: URL: https://github.com/apache/datafusion/pull/15773#issuecomment-2836854794 Thank you so much @chenkovsky and @xudong963

Re: [PR] test: add fuzz test for doing aggregation with larger than memory groups and sorting with limited memory [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15727: URL: https://github.com/apache/datafusion/pull/15727#issuecomment-2836864703 What I suggest we should do with this PR is 1. `#[ignore]` the tests that are failing 2. leave a comment with link to the PR / ticket to fix them 3. Merge this PR

Re: [PR] Upgrade-guide: Downgrade "FileScanConfig –> FileScanConfigBuilder" headline [datafusion]

2025-04-28 Thread via GitHub
alamb merged PR #15883: URL: https://github.com/apache/datafusion/pull/15883

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15836: URL: https://github.com/apache/datafusion/pull/15836#discussion_r2064905078 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1606,8 +1606,9 @@ impl TreeNodeRewriter for Simplifier<'_, S> {

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15836: URL: https://github.com/apache/datafusion/pull/15836#issuecomment-2836883716 Thank you for this PR @srh

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-28 Thread via GitHub
comphead commented on code in PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#discussion_r2064904374 ## native/spark-expr/src/array_funcs/array_repeat.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

[I] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
maxburke opened a new issue, #15886: URL: https://github.com/apache/datafusion/issues/15886 ### Describe the bug Referenced discussion: https://the-asf.slack.com/archives/C01QUFS30TD/p1745875862723149 Given this table: ``` > create table d1 (ul_node_id string); ```

Re: [I] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
comphead commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2836896989 Simpler test case ORDER BY in outer query ``` > explain select x.* from (select 1 a union all select null) x order by a nulls last; +---+--
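
As a point of reference for what `ORDER BY a NULLS LAST` is expected to return for this test case, here is a minimal plain-Python sketch. It models SQL NULL as `None`. The helper name `nulls_last_key` is illustrative and not part of any library.

```python
# Rows produced by: select 1 a union all select null
rows = [1, None]


def nulls_last_key(v):
    # Non-null values sort first (False < True), then by their own value.
    return (v is None, 0 if v is None else v)


ordered = sorted(rows, key=nulls_last_key)
print(ordered)
```

If the outer sort were dropped from the plan, the output order would no longer be guaranteed to match this.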

Re: [I] [REGRESSION] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
maxburke commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2836968414 I bisected this to commit 02eab80cd62e02fcb68dee8b99d63aaac680a66c

Re: [PR] Do not add redundant subquery ordering into plan [datafusion]

2025-04-28 Thread via GitHub
maxburke commented on PR #12003: URL: https://github.com/apache/datafusion/pull/12003#issuecomment-2836969736 I think this change causes this bug: https://github.com/apache/datafusion/issues/15886

Re: [PR] feat: make execution_graph.stages() public [datafusion-ballista]

2025-04-28 Thread via GitHub
andygrove commented on code in PR #1256: URL: https://github.com/apache/datafusion-ballista/pull/1256#discussion_r2064981773 ## ballista/scheduler/src/state/execution_graph.rs: ## @@ -218,7 +218,7 @@ impl ExecutionGraph { new_tid } -pub(crate) fn stages(&sel

Re: [PR] Migrate Optimizer tests to insta, part2 [datafusion]

2025-04-28 Thread via GitHub
qstommyshu commented on PR #15884: URL: https://github.com/apache/datafusion/pull/15884#issuecomment-2836975145 Hi @alamb @blaginin , This PR is ready for review; I’ll tackle the remaining migrations in subsequent PRs to keep each set of changes manageable.

Re: [I] Migrate optimizer tests to `insta` [datafusion]

2025-04-28 Thread via GitHub
qstommyshu commented on issue #15396: URL: https://github.com/apache/datafusion/issues/15396#issuecomment-2836976596 Hi @alamb and @blaginin , Do you mind reopening this issue just to indicate the status of this issue is not done yet?

[PR] chore: update dev/release/rat_exclude_files.txt [datafusion-comet]

2025-04-28 Thread via GitHub
hsiang-c opened a new pull request, #1689: URL: https://github.com/apache/datafusion-comet/pull/1689 ## Which issue does this PR close? Closes #. https://github.com/apache/datafusion-comet/issues/1678 ## Rationale for this change Update Rat's exclude files

Re: [I] [REGRESSION] Sorts being removed from inner expressions [datafusion]

2025-04-28 Thread via GitHub
comphead commented on issue #15886: URL: https://github.com/apache/datafusion/issues/15886#issuecomment-2836992611 > I'm getting the results I expect if I revert the changes from that commit ^ in this file: datafusion/sql/src/relation/mod.rs (ie: remove the call to `optimize_subquery_sort`)

Re: [I] Fix rat check errors during release process [datafusion-comet]

2025-04-28 Thread via GitHub
hsiang-c commented on issue #1678: URL: https://github.com/apache/datafusion-comet/issues/1678#issuecomment-2836800737 take

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
skyzh commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836799420 I think #14595 is in a decent shape and could be merged :) Though it cannot unnest all queries and might produce wrong result for some lateral joins, that would be a good starting

[I] [DISCUSSION] JOIN "task force" / project team [datafusion]

2025-04-28 Thread via GitHub
alamb opened a new issue, #15885: URL: https://github.com/apache/datafusion/issues/15885 # What I see (what problem we are trying to solve) DataFusion's current join implementations are fairly basic. They are functional enough to run TPCH and TPC-DS, but lack other features such as large

Re: [I] Decorrelate scalar subqueries with more complex filter expressions [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #14554: URL: https://github.com/apache/datafusion/issues/14554#issuecomment-2836816251 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [D] How does 'sort' interact with record batches? [datafusion]

2025-04-28 Thread via GitHub
GitHub user daphnenhuch-at added a comment to the discussion: How does 'sort' interact with record batches? That doesn't fix this problem unfortunately. When I swap the order I still get the record batch starting with 8192 first GitHub link: https://github.com/apache/datafusion/discussions/1

Re: [I] [EPIC] More Subquery support [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #5483: URL: https://github.com/apache/datafusion/issues/5483#issuecomment-2836816535 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [I] Implement nested join optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #3843: URL: https://github.com/apache/datafusion/issues/3843#issuecomment-2836816875 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2836818904 - I am trying to organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836821370 > I think [#14595](https://github.com/apache/datafusion/pull/14595) is in a decent shape and could be merged :) Though it cannot unnest all queries and might produce wrong result

Re: [I] [DISCUSSION] DataFusion Road Map: Q3-Q4 2025 [datafusion]

2025-04-28 Thread via GitHub
alamb commented on issue #15878: URL: https://github.com/apache/datafusion/issues/15878#issuecomment-2836819606 - I am also trying to help organize a join task force for planning joins / subqueries: https://github.com/apache/datafusion/issues/15885

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-28 Thread via GitHub
parthchandra commented on code in PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#discussion_r2064867088 ## native/spark-expr/src/array_funcs/array_repeat.rs: ## @@ -0,0 +1,216 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more cont

Re: [PR] chore: Remove fallback reason "because the children were not native" [datafusion-comet]

2025-04-28 Thread via GitHub
comphead commented on PR #1672: URL: https://github.com/apache/datafusion-comet/pull/1672#issuecomment-2837020381 Are we okay to merge it?

Re: [I] Sorting is not maintained after using a window function [datafusion]

2025-04-28 Thread via GitHub
akurmustafa commented on issue #15833: URL: https://github.com/apache/datafusion/issues/15833#issuecomment-2837020145 The sort function in `DataFusion` accepts a vector. This way, you can pass the desired lexicographical ordering, such as (`.sort(vec![ident("userPrimaryKey").sort(true, true

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
srh commented on code in PR #15836: URL: https://github.com/apache/datafusion/pull/15836#discussion_r2065080834 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1606,8 +1606,9 @@ impl TreeNodeRewriter for Simplifier<'_, S> {

Re: [PR] feat: Add `datafusion-spark` crate [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15168: URL: https://github.com/apache/datafusion/pull/15168#discussion_r2065131934 ## datafusion/sqllogictest/test_files/spark/README.md: ## @@ -0,0 +1,57 @@ + + +# Spark Test Files + +This directory contains test files for the `spark` test suite.

Re: [PR] chore: Remove fallback reason "because the children were not native" [datafusion-comet]

2025-04-28 Thread via GitHub
parthchandra merged PR #1672: URL: https://github.com/apache/datafusion-comet/pull/1672

Re: [PR] chore: Remove fallback reason "because the children were not native" [datafusion-comet]

2025-04-28 Thread via GitHub
parthchandra commented on PR #1672: URL: https://github.com/apache/datafusion-comet/pull/1672#issuecomment-2837134447 Merged. Thanks @andygrove, @comphead

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
srh commented on PR #15836: URL: https://github.com/apache/datafusion/pull/15836#issuecomment-2837073697 > Thanks @srh for providing the test case please add the query to one of select.slt files to preserve the regression I have added a test case (not the one with an inner select, bec

Re: [PR] Map file-level column statistics to the table-level [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15865: URL: https://github.com/apache/datafusion/pull/15865#discussion_r2065156402 ## datafusion/core/src/datasource/listing/table.rs: ## @@ -1129,7 +1130,17 @@ impl ListingTable { let (file_group, inexact_stats) = get_files_w

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-04-28 Thread via GitHub
kosiew commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2837154356 Closing, no update from @TheBuilderJR

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-04-28 Thread via GitHub
kosiew closed pull request #15295: Enhance Schema adapter to accommodate evolving struct URL: https://github.com/apache/datafusion/pull/15295

Re: [PR] Set HashJoin seed [datafusion]

2025-04-28 Thread via GitHub
alamb commented on PR #15783: URL: https://github.com/apache/datafusion/pull/15783#issuecomment-2837160708 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15836: URL: https://github.com/apache/datafusion/pull/15836#discussion_r2065170350 ## datafusion/sqllogictest/test_files/strings.slt: ## @@ -115,6 +115,12 @@ p1 p1e1 p1m1e1 +query T rowsort Review Comment: In case anyone is curious, without

Re: [PR] Set HashJoin seed [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15783: URL: https://github.com/apache/datafusion/pull/15783#discussion_r2065171783 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -86,6 +86,10 @@ use datafusion_physical_expr_common::physical_expr::fmt_sql; use futures::{ready, Stream,

Re: [PR] Feat: introduce partition statistics API [datafusion]

2025-04-28 Thread via GitHub
alamb commented on code in PR #15852: URL: https://github.com/apache/datafusion/pull/15852#discussion_r2065176521 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -941,49 +994,15 @@ impl ExecutionPlan for AggregateExec { } fn statistics(&self) -> Result { -
