[I] Cannot use Projection::new_from_schema to set parquet field ids. [datafusion]

2025-04-23 Thread via GitHub
init-js opened a new issue, #15837: URL: https://github.com/apache/datafusion/issues/15837 ### Describe the bug Our goal is to take an existing `DataFrame` and change the parquet field ids (after the fact) of its schema. The function `Projection::new_from_schema` looks promising, in

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in `min` and `max` [datafusion]

2025-04-23 Thread via GitHub
rluvaton commented on PR #13991: URL: https://github.com/apache/datafusion/pull/13991#issuecomment-2826533597 I'm really sorry, had a crazy week with the baby, will work on it today -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in `min` and `max` [datafusion]

2025-04-23 Thread via GitHub
gabotechs commented on PR #13991: URL: https://github.com/apache/datafusion/pull/13991#issuecomment-2826530936 @rluvaton do you have an estimate of when this might be shipped? We’re currently blocked by this, so we’d be glad to handle it ourselves if that would help ease your workload.

Re: [PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2826457369 > > > > I want to ask: did we have to rewrite the part of the code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 after

Re: [PR] Fix `ILIKE` expression support in SQL unparser [datafusion]

2025-04-23 Thread via GitHub
phillipleblanc commented on PR #15820: URL: https://github.com/apache/datafusion/pull/15820#issuecomment-2826473113 Correct, DataFusion was correctly handling Like and ILike - but DataFusion stores that as a single `Expr::Like` with a boolean for whether it's case insensitive. When t
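
To illustrate the point about a single LIKE-style node carrying a case-insensitivity flag, here is a minimal Rust sketch. The `LikeExpr` struct and `unparse_like` function are hypothetical stand-ins invented for this example; they are not DataFusion's actual types or unparser API. The sketch only shows why the flag has to be consulted when the expression is turned back into SQL.

```rust
/// Hypothetical stand-in for a LIKE-style expression node.
/// This is NOT DataFusion's real `Expr::Like`; it only mirrors the idea
/// of one node with a boolean for case insensitivity.
struct LikeExpr {
    negated: bool,
    case_insensitive: bool,
    expr: String,
    pattern: String,
}

/// Render the node as SQL text, choosing the keyword from the flag.
/// Quoting is deliberately naive: the pattern is not escaped.
fn unparse_like(like: &LikeExpr) -> String {
    let keyword = if like.case_insensitive { "ILIKE" } else { "LIKE" };
    let not = if like.negated { "NOT " } else { "" };
    format!("{} {}{} '{}'", like.expr, not, keyword, like.pattern)
}
```

If the flag were ignored and `LIKE` were always emitted, a case-insensitive match would silently become case-sensitive once the SQL is handed to another engine.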

[PR] fix: Avoid mistaken ILike to string equality optimization [datafusion]

2025-04-23 Thread via GitHub
srh opened a new pull request, #15836: URL: https://github.com/apache/datafusion/pull/15836 ## Which issue does this PR close? - Closes #15835. ## Rationale for this change Bugfix ## What changes are included in this PR? Bugfix and unit test cases for the op

[I] ILike with no wildcards is mistakenly optimized to string equality [datafusion]

2025-04-23 Thread via GitHub
srh opened a new issue, #15835: URL: https://github.com/apache/datafusion/issues/15835 ### Describe the bug `'a' ILIKE 'A'` ends up evaluating as false. PR incoming. ### To Reproduce _No response_ ### Expected behavior _No response_ ### Additio
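
A minimal Rust sketch of why this rewrite is unsafe, under stated assumptions: the three functions below are hypothetical helpers written for this example, not DataFusion code, and the lowercase comparison is a simplification of real case-insensitive matching (escape characters are ignored).

```rust
/// True if the pattern contains no LIKE wildcards (`%` or `_`).
fn has_no_wildcards(pattern: &str) -> bool {
    !pattern.contains('%') && !pattern.contains('_')
}

/// Simplified ILIKE for wildcard-free patterns: compare case-insensitively.
fn ilike_no_wildcards(value: &str, pattern: &str) -> bool {
    value.to_lowercase() == pattern.to_lowercase()
}

/// The mistaken rewrite: treating a wildcard-free ILIKE as plain equality.
fn mistaken_equality(value: &str, pattern: &str) -> bool {
    value == pattern
}
```

For the value `a` and the pattern `A`, the case-insensitive comparison matches while plain equality does not, so the equality shortcut is only valid for the case-sensitive `LIKE`.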

Re: [PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
xudong963 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2826449563 > > > I want to ask: did we have to rewrite the part of the code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 after

Re: [PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2826434082 > > I want to ask: did we have to rewrite the part of the code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterward

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
xudong963 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2057547218 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
xudong963 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2826402327 > I want to ask: did we have to rewrite the part of the code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L24 afterward

Re: [PR] replace reassign_predicate_columns helper with PhysicalExpr::with_schema [datafusion]

2025-04-23 Thread via GitHub
adriangb commented on code in PR #15779: URL: https://github.com/apache/datafusion/pull/15779#discussion_r2057512794 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -333,6 +333,15 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash {

Re: [PR] Fix `ILIKE` expression support in SQL unparser [datafusion]

2025-04-23 Thread via GitHub
ewgenius commented on PR #15820: URL: https://github.com/apache/datafusion/pull/15820#issuecomment-2826267482 @comphead thanks for the response. The problem we faced at Spice is that `ILIKE` expression pushed down as `LIKE` to the data source: - https://github.com/spiceai/spiceai/i

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
Garamda commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2826255767 @jayzhan211 Alright, I see. Thank you!

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-04-23 Thread via GitHub
adamreeve commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2826229093 With the KMS API not being included in arrow-rs but being built as a third-party crate (https://github.com/apache/arrow-rs/pull/7387#issuecomment-2819908130), I would assume

[PR] Minor: cleanup hash table after emit all [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 opened a new pull request, #15834: URL: https://github.com/apache/datafusion/pull/15834 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] Support exposing setting memory limit of memory pool [datafusion]

2025-04-23 Thread via GitHub
Rachelint closed issue #15830: Support exposing setting memory limit of memory pool URL: https://github.com/apache/datafusion/issues/15830

Re: [PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
Rachelint merged PR #15828: URL: https://github.com/apache/datafusion/pull/15828

Re: [PR] support simple/cross lateral joins [datafusion]

2025-04-23 Thread via GitHub
github-actions[bot] commented on PR #14595: URL: https://github.com/apache/datafusion/pull/14595#issuecomment-282594 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Consolidate feature flags into configuration guide [datafusion]

2025-04-23 Thread via GitHub
github-actions[bot] commented on PR #14657: URL: https://github.com/apache/datafusion/pull/14657#issuecomment-2825977721 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] feat: implement contextualized ObjectStore [datafusion]

2025-04-23 Thread via GitHub
github-actions[bot] commented on PR #14805: URL: https://github.com/apache/datafusion/pull/14805#issuecomment-2825977607 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] Support integration with Parquet modular encryption [datafusion]

2025-04-23 Thread via GitHub
corwinjoy commented on issue #15216: URL: https://github.com/apache/datafusion/issues/15216#issuecomment-2825947274 @alamb @adamreeve With the modular encryption essentially complete in arrow-rs, we are interested in beginning to move forward with adding support for this feature in datafus

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2057129418 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2057125845 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #15825: URL: https://github.com/apache/datafusion/pull/15825#issuecomment-2825860245 You can merge this PR, I had kept the commit from @vadimpiven

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15825: URL: https://github.com/apache/datafusion/pull/15825#discussion_r2057104794 ## datafusion/core/tests/execution/logical_plan.rs: ## @@ -96,3 +100,37 @@ where }; element } + +#[test] +fn inline_scan_projection_test() -> Result<

Re: [PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove merged PR #1676: URL: https://github.com/apache/datafusion-comet/pull/1676

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2825855767 I think supporting both queries would be confusing; if we plan to end up supporting the new syntax in the end, it is better not to keep the old syntax

Re: [PR] feat: support `array_repeat` [datafusion-comet]

2025-04-23 Thread via GitHub
comphead commented on PR #1680: URL: https://github.com/apache/datafusion-comet/pull/1680#issuecomment-2825758696 Now the problem: DF and Spark return different values if count is null. DF returns an empty array ``` > select array_repeat(null, arrow_cast(null, 'Int32')); +---

[PR] feat: support `array_repeat` [datafusion-comet]

2025-04-23 Thread via GitHub
comphead opened a new pull request, #1680: URL: https://github.com/apache/datafusion-comet/pull/1680 ## Which issue does this PR close? Replaces #1205 . Closes #1347 ## Rationale for this change ## What changes are included in this PR?

Re: [PR] feat: More warning info for users [datafusion-comet]

2025-04-23 Thread via GitHub
parthchandra commented on code in PR #1667: URL: https://github.com/apache/datafusion-comet/pull/1667#discussion_r2056942543 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -161,10 +161,28 @@ class CometSparkSessionExtensions }

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056934141 ## src/parser/mod.rs: ## @@ -484,8 +488,18 @@ impl<'a> Parser<'a> { } let statement = self.parse_statement()?; +

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056929658 ## src/parser/mod.rs: ## @@ -484,8 +488,18 @@ impl<'a> Parser<'a> { } let statement = self.parse_statement()?; +

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056927589 ## src/parser/mod.rs: ## @@ -618,6 +632,7 @@ impl<'a> Parser<'a> { // `COMMENT` is snowflake specific https://docs.snowflake.com/en

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056927872 ## src/parser/mod.rs: ## @@ -3939,6 +3954,26 @@ impl<'a> Parser<'a> { }) } +/// Return nth previous token, possibly whitespac

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056908951 ## src/dialect/mssql.rs: ## @@ -116,7 +116,17 @@ impl Dialect for MsSqlDialect { true } -fn is_column_alias(&self, kw: &Keyword,

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056886926 ## src/dialect/mssql.rs: ## @@ -116,7 +116,17 @@ impl Dialect for MsSqlDialect { true } -fn is_column_alias(&self, kw: &Keyword,

[I] Sorting is not maintained after using a window function [datafusion]

2025-04-23 Thread via GitHub
daphnenhuch-at opened a new issue, #15833: URL: https://github.com/apache/datafusion/issues/15833 ### Describe the bug I have a query which sorts the data by a column called "userPrimaryKey" and then uses a window function to add a row number column to the data frame. I've set `t

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056903407 ## src/parser/mod.rs: ## @@ -4055,6 +4090,38 @@ impl<'a> Parser<'a> { ) } +/// Look backwards in the token stream and expect that

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#issuecomment-2825528849 Here's another example case this PR should parse properly, before merging (on my todo list...) ``` USE some_database; GO ;WITH cte AS ( SELECT 1

[PR] Add `DECLARE ... CURSOR FOR` support for SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc opened a new pull request, #1821: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1821 This PR adds support for declaring cursors on queries for SQL Server ([docs](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/declare-cursor-transact-sql)) Eg, thi

[PR] ignore: explore jemalloc and snmalloc instead of mimalloc [datafusion-comet]

2025-04-23 Thread via GitHub
mbutrovich opened a new pull request, #1679: URL: https://github.com/apache/datafusion-comet/pull/1679 ## Which issue does this PR close? Closes #. ## Rationale for this change Comet currently supports mimalloc as its memory allocator (`make release COMET_FEAT

Re: [PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
Rachelint commented on PR #15828: URL: https://github.com/apache/datafusion/pull/15828#issuecomment-2824818687 @waynexia

Re: [PR] Add `CREATE TRIGGER` support for SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
aharpervc commented on PR #1810: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1810#issuecomment-2825353800 Rebased again now that https://github.com/apache/datafusion-sqlparser-rs/pull/1808 has been merged. Should be ready for review.

Re: [PR] Add support for `GO` batch delimiter in SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio commented on code in PR #1809: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1809#discussion_r2056434400 ## src/parser/mod.rs: ## @@ -484,8 +488,18 @@ impl<'a> Parser<'a> { } let statement = self.parse_statement()?; +

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
codecov-commenter commented on PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#issuecomment-2825234249 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1677?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-23 Thread via GitHub
gabotechs commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2825064512 > Thoughts? I have only a very slight preference for smaller pieces, but since I wasn’t on the front lines coding this, I trust your judgment much more. I’ll apply your sugg

Re: [PR] docs: Add changelog for 0.8.0 [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove merged PR #1675: URL: https://github.com/apache/datafusion-comet/pull/1675

[PR] bug: build fails with `--no-default-features` [datafusion-ballista]

2025-04-23 Thread via GitHub
milenkovicm opened a new pull request, #1255: URL: https://github.com/apache/datafusion-ballista/pull/1255 # Which issue does this PR close? Closes #1254 # Rationale for this change # What changes are included in this PR? - Bug fix - GitHub action check if

Re: [PR] Factor out Substrait consumers into separate files [datafusion]

2025-04-23 Thread via GitHub
Blizzara commented on PR #15794: URL: https://github.com/apache/datafusion/pull/15794#issuecomment-2824982349 Thanks! This has indeed been a long time todo :) also cc @vbarua I think personally I'd prefer a bit less files, but that's just a suggestion: I'd probably do something like:

Re: [PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
Adez017 commented on PR #15832: URL: https://github.com/apache/datafusion/pull/15832#issuecomment-2824970974 Hi @xudong963, I want to ask: did we have to rewrite the part of the code https://github.com/apache/datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs#L18-L2

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-04-23 Thread via GitHub
acking-you commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2824964842 > Relevant: https://clickhouse.com/blog/clickhouse-gets-lazier-and-faster-introducing-lazy-materialization Thank you so much for sharing this blog link—it’s truly an ex

[PR] Update extending-operators.md [datafusion]

2025-04-23 Thread via GitHub
Adez017 opened a new pull request, #15832: URL: https://github.com/apache/datafusion/pull/15832 ## Which issue does this PR close? - Closes #15774 ## Rationale for this change updated the extending-operators.md file

Re: [PR] Add `OR ALTER` support for `CREATE VIEW` [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio merged PR #1818: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1818

Re: [PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
waynexia commented on code in PR #15828: URL: https://github.com/apache/datafusion/pull/15828#discussion_r2056500052 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -141,6 +141,24 @@ pub trait MemoryPool: Send + Sync + std::fmt::Debug { /// Return the total amount o

[I] Ensure Substrait producer for `BinaryExpr` includes `output_type` [datafusion]

2025-04-23 Thread via GitHub
kadinrabo opened a new issue, #15831: URL: https://github.com/apache/datafusion/issues/15831 ### Describe the bug When converting `BinaryExpr` expressions to Substrait using `from_binary_expr`, the resulting scalar function omits the `output_type` field. This happens via the `make_bi

Re: [PR] Add support for `XMLTABLE` [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
lovasoa commented on PR #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817#issuecomment-2824821901 Thanks for merging, @iffyio !

Re: [I] xmltable(...) function support [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio closed issue #1816: xmltable(...) function support URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1816

Re: [D] Should ExecutionPlan spawn tasks in `execute` function [datafusion]

2025-04-23 Thread via GitHub
GitHub user pepijnve added a comment to the discussion: Should ExecutionPlan spawn tasks in `execute` function I can't give you an authoritative answer on this one, but FWIW `CoalescePartitionsExec::execute` also requires a current/active Tokio context since it spawns a task for each partitio

[I] Support exposing setting memory limit of memory pool [datafusion]

2025-04-23 Thread via GitHub
Rachelint opened a new issue, #15830: URL: https://github.com/apache/datafusion/issues/15830 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [PR] Add `CREATE FUNCTION` support for SQL Server [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio merged PR #1808: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1808

Re: [PR] Add support for `XMLTABLE` [datafusion-sqlparser-rs]

2025-04-23 Thread via GitHub
iffyio merged PR #1817: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1817

[PR] Minor: fix potential flaky test in aggregate.slt [datafusion]

2025-04-23 Thread via GitHub
bikbov opened a new pull request, #15829: URL: https://github.com/apache/datafusion/pull/15829 ## Which issue does this PR close? - Closes #15789. ## Rationale for this change Tests improvement ## What changes are included in this PR? Fix potential flaky

[PR] Add `MemoryPool::memory_limit` to expose setting memory usage limit [datafusion]

2025-04-23 Thread via GitHub
Rachelint opened a new pull request, #15828: URL: https://github.com/apache/datafusion/pull/15828 …fusion. ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are the

Re: [I] Make ClickBench Q23 Go Faster [datafusion]

2025-04-23 Thread via GitHub
acking-you commented on issue #15177: URL: https://github.com/apache/datafusion/issues/15177#issuecomment-2824790924 > I tried the rewrite into a Semi join and indeed it is over 2x slower (5.3sec vs 12sec) > > > SELECT * from 'hits_partitioned' WHERE "URL" LIKE '%google%' ORDER BY "E

Re: [PR] Support unparsing `UNION` for distinct results [datafusion]

2025-04-23 Thread via GitHub
goldmedal commented on PR #15814: URL: https://github.com/apache/datafusion/pull/15814#issuecomment-2824780534 Thanks @phillipleblanc and @sgrebnov for reviewing

Re: [I] The SQL Unparser does not correctly handle `UNION` [datafusion]

2025-04-23 Thread via GitHub
goldmedal closed issue #15813: The SQL Unparser does not correctly handle `UNION` URL: https://github.com/apache/datafusion/issues/15813

Re: [PR] Support unparsing `UNION` for distinct results [datafusion]

2025-04-23 Thread via GitHub
goldmedal merged PR #15814: URL: https://github.com/apache/datafusion/pull/15814

Re: [I] Unnecessary casting in stats & filter evaluation [datafusion]

2025-04-23 Thread via GitHub
leoyvens commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2824716928 To understand how this happens in the logical optimizer, as part of the `SimplifyExpressions` pass, you can look at [unwrap_cast.rs](https://github.com/apache/datafusion/blob/m

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
vadimpiven commented on code in PR #15825: URL: https://github.com/apache/datafusion/pull/15825#discussion_r2056286726 ## datafusion/core/tests/execution/logical_plan.rs: ## @@ -96,3 +100,37 @@ where }; element } + +#[test] +fn inline_scan_projection_test() -> Result<

Re: [PR] Implement min max for dictionary types [datafusion]

2025-04-23 Thread via GitHub
XiangpengHao commented on code in PR #15827: URL: https://github.com/apache/datafusion/pull/15827#discussion_r2056278172 ## datafusion/functions-aggregate/src/min_max.rs: ## @@ -1854,9 +1866,31 @@ mod tests { #[test] fn test_get_min_max_return_type_coerce_dictionary()

[PR] Implement min max for dictionary types [datafusion]

2025-04-23 Thread via GitHub
XiangpengHao opened a new pull request, #15827: URL: https://github.com/apache/datafusion/pull/15827 ## Which issue does this PR close? - Closes #. ## Rationale for this change I hit a run time error when passing a dictionary type to the min/max aggregation.

Re: [PR] feat: update datafusion dependency 47 [datafusion-python]

2025-04-23 Thread via GitHub
robtandy commented on code in PR #1107: URL: https://github.com/apache/datafusion-python/pull/1107#discussion_r2056227303 ## src/functions.rs: ## @@ -698,8 +677,22 @@ pub fn approx_percentile_cont_with_weight( add_builder_fns_to_aggregate(agg_fn, None, filter, None, None)

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
vadimpiven commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2824581546 You can merge your change and just close my PR, there is no difference to the result for me.

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056187263 ## native/Cargo.toml: ## @@ -38,16 +38,16 @@ arrow = { version = "55.0.0", features = ["prettyprint", "ffi", "chrono-tz"] } async-trait = { version = "0.1"

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-04-23 Thread via GitHub
berkaysynnada commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2824434533 > > What's the status of this PR? > > It's ready to review. I'm still waiting for someone to help review it. Thanks @goldmedal. We'll need this as well, so let's re

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
EmilyMatt commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056074078 ## .github/workflows/miri.yml: ## @@ -38,6 +38,12 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v3 + - name: Install

Re: [PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
codecov-commenter commented on PR #1676: URL: https://github.com/apache/datafusion-comet/pull/1676#issuecomment-2824337650 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1676?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
EmilyMatt commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056071541 ## native/Cargo.toml: ## @@ -38,16 +38,16 @@ arrow = { version = "55.0.0", features = ["prettyprint", "ffi", "chrono-tz"] } async-trait = { version = "0.1"

Re: [PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1677: URL: https://github.com/apache/datafusion-comet/pull/1677#discussion_r2056028940 ## native/Cargo.toml: ## @@ -38,16 +38,16 @@ arrow = { version = "55.0.0", features = ["prettyprint", "ffi", "chrono-tz"] } async-trait = { version = "0.1"

[I] Fix rat check errors during release process [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove opened a new issue, #1678: URL: https://github.com/apache/datafusion-comet/issues/1678 ### Describe the bug The rat exclude list needs updating to ignore these files: ``` NOT APPROVED: docs/source/_static/images/comet-dataflow.excalidraw (apache-datafusion-comet-0.

Re: [I] Adjust sizeInBytes estimation for Comet exchanges to avoid join strategy regressions [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove closed issue #1671: Adjust sizeInBytes estimation for Comet exchanges to avoid join strategy regressions URL: https://github.com/apache/datafusion-comet/issues/1671

Re: [PR] chore(deps): bump env_logger from 0.11.7 to 0.11.8 [datafusion]

2025-04-23 Thread via GitHub
xudong963 merged PR #15823: URL: https://github.com/apache/datafusion/pull/15823

[PR] chore: Update viable crates [datafusion-comet]

2025-04-23 Thread via GitHub
EmilyMatt opened a new pull request, #1677: URL: https://github.com/apache/datafusion-comet/pull/1677 ## Rationale for this change Reduce the number of duplicate crates due to crates that use outdated versions, thereby improving compile times and reducing binary size. Some

Re: [PR] docs: add ArkFlow [datafusion]

2025-04-23 Thread via GitHub
xudong963 merged PR #15826: URL: https://github.com/apache/datafusion/pull/15826

Re: [PR] Make `Diagnostic` easy/convinient to attach by using macro and avoiding `map_err` [datafusion]

2025-04-23 Thread via GitHub
logan-keede commented on PR #15796: URL: https://github.com/apache/datafusion/pull/15796#issuecomment-2823666019 > @logan-keede please run the planner tests to check if this change affects planner performance, we got some experience in the past #7522 using ```sh cargo bench --bench

Re: [PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1676: URL: https://github.com/apache/datafusion-comet/pull/1676#discussion_r2055934997 ## dev/release/README.md: ## @@ -60,7 +60,6 @@ Create a PR against the main branch to prepare for developing the next release: - Update the Rust crate vers

[PR] chore: Start 0.9.0 development [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove opened a new pull request, #1676: URL: https://github.com/apache/datafusion-comet/pull/1676 ## Which issue does this PR close? N/A ## Rationale for this change Now that the release branch `branch-0.8` has been created, it is time to switch `main

[PR] docs: Add changelog for 0.8.0 [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove opened a new pull request, #1675: URL: https://github.com/apache/datafusion-comet/pull/1675 ## Which issue does this PR close? N/A ## Rationale for this change ## What changes are included in this PR? ## How are these changes teste

Re: [PR] perf: Experimental fix to avoid join strategy regression [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on PR #1674: URL: https://github.com/apache/datafusion-comet/pull/1674#issuecomment-2824086340 I will merge this so that I can start the 0.8.0 release process. Thanks for the reviews @comphead and @kazuyukitanimura.

Re: [PR] perf: Experimental fix to avoid join strategy regression [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove merged PR #1674: URL: https://github.com/apache/datafusion-comet/pull/1674

Re: [I] Join on pandas dataframe from python API fails due to schema metadata [datafusion]

2025-04-23 Thread via GitHub
lesam commented on issue #15754: URL: https://github.com/apache/datafusion/issues/15754#issuecomment-2824080701 https://github.com/apache/datafusion/issues/12736#issuecomment-2613005807 also seems to be the same issue

Re: [PR] docs: Add instructions on running TPC-H on macOS [datafusion-comet]

2025-04-23 Thread via GitHub
andygrove commented on code in PR #1647: URL: https://github.com/apache/datafusion-comet/pull/1647#discussion_r2055896317 ## docs/source/contributor-guide/benchmarking_macos.md: ## @@ -0,0 +1,145 @@ + + +# Comet Benchmarking on macOS + +This guide is for setting up TPC-H benchma

[PR] docs: add ArkFlow [datafusion]

2025-04-23 Thread via GitHub
chenquan opened a new pull request, #15826: URL: https://github.com/apache/datafusion/pull/15826 ## Which issue does this PR close? no. ## Rationale for this change ## What changes are included in this PR? Add Arkflow to the document

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on PR #15811: URL: https://github.com/apache/datafusion/pull/15811#issuecomment-2823998323 https://github.com/apache/datafusion/pull/15825 I couldn't open a PR to your repo. Here is the fix I did

Re: [PR] Preserve projection for inline scan [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 commented on code in PR #15811: URL: https://github.com/apache/datafusion/pull/15811#discussion_r2055814059 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -498,7 +498,7 @@ impl LogicalPlanBuilder { TableScan::try_new(table_name, table_source, proje

[PR] Project inline [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 opened a new pull request, #15825: URL: https://github.com/apache/datafusion/pull/15825 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [PR] Fix: fetch is missing in `EnforceSorting` optimizer (two places) [datafusion]

2025-04-23 Thread via GitHub
xudong963 commented on code in PR #15822: URL: https://github.com/apache/datafusion/pull/15822#discussion_r2055810839 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -137,6 +137,12 @@ fn plan_with_order_preserving_variants(

Re: [I] Standardize APPROX_PERCENTILE_CONT / PERCENTILE_CONT and similar aggregation functions [datafusion]

2025-04-23 Thread via GitHub
jayzhan211 closed issue #11732: Standardize APPROX_PERCENTILE_CONT / PERCENTILE_CONT and similar aggregation functions URL: https://github.com/apache/datafusion/issues/11732

Re: [PR] Support WITHIN GROUP syntax to standardize certain existing aggregate functions [datafusion]

2025-04-23 Thread via GitHub
Garamda commented on PR #13511: URL: https://github.com/apache/datafusion/pull/13511#issuecomment-2823913238 @jayzhan211 Thank you for reviewing! However, I have one concern. Is it okay to merge this PR right away, considering https://github.com/apache/datafusion/pull/13511#pul
