Re: [PR] feat: add macros for DataFusionError variants [datafusion]

2025-05-11 Thread via GitHub
Chen-Yuan-Lai commented on PR #15946: URL: https://github.com/apache/datafusion/pull/15946#issuecomment-2871094058 ### Summary of this change 1. **Enhancement of `DataFusionError` Enum Variants** * Updated the `ExecutionJoin` and `External` variants to maintain consistent backt

[PR] Set execution options in init using set [datafusion-ray]

2025-05-11 Thread via GitHub
pranavJibhakate opened a new pull request, #87: URL: https://github.com/apache/datafusion-ray/pull/87 Fix for the issue #72 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[PR] fix: Check acquired memory when CometMemoryPool grows [datafusion-comet]

2025-05-11 Thread via GitHub
wForget opened a new pull request, #1732: URL: https://github.com/apache/datafusion-comet/pull/1732 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max [datafusion]

2025-05-11 Thread via GitHub
gabotechs commented on PR #15857: URL: https://github.com/apache/datafusion/pull/15857#issuecomment-2870958657 Here's the new PR: https://github.com/apache/datafusion/pull/16025

[PR] Min max over lists [datafusion]

2025-05-11 Thread via GitHub
gabotechs opened a new pull request, #16025: URL: https://github.com/apache/datafusion/pull/16025 ## Which issue does this PR close? - Closes #13987. ## Rationale for this change Reuse the work done on https://github.com/apache/datafusion/pull/15667 for a

[PR] fix: Avoid releasing unacquired memory [datafusion-comet]

2025-05-11 Thread via GitHub
wForget opened a new pull request, #1731: URL: https://github.com/apache/datafusion-comet/pull/1731 ## Which issue does this PR close? Closes #. ## Rationale for this change The memory acquired by `CometMemoryPool.grow` may be less than the actual request, so `M
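The bookkeeping described by this PR and by #1732 ("Check acquired memory when CometMemoryPool grows") can be sketched in Rust: count and release only the memory that the upstream allocator actually granted. This is a minimal sketch under stated assumptions. `Upstream`, `TrackedPool`, `try_grow`, and `shrink` are hypothetical names, not Comet's or DataFusion's API, and rejecting a partial grant (instead of keeping it) is an assumed policy.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical upstream allocator (standing in for e.g. a JVM-side memory
/// manager) that may grant fewer bytes than requested.
pub struct Upstream {
    available: AtomicUsize,
}

impl Upstream {
    pub fn new(available: usize) -> Self {
        Self { available: AtomicUsize::new(available) }
    }

    /// Grants at most `requested` bytes and returns the amount actually granted.
    pub fn acquire(&self, requested: usize) -> usize {
        let prev = self
            .available
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |a| Some(a - a.min(requested)))
            .expect("closure always returns Some");
        prev.min(requested)
    }

    pub fn release(&self, bytes: usize) {
        self.available.fetch_add(bytes, Ordering::SeqCst);
    }
}

/// Hypothetical pool that only tracks memory it was actually granted.
pub struct TrackedPool {
    upstream: Upstream,
    used: AtomicUsize,
}

impl TrackedPool {
    pub fn new(upstream: Upstream) -> Self {
        Self { upstream, used: AtomicUsize::new(0) }
    }

    /// Grows by `additional` bytes. On a partial grant the granted bytes are
    /// handed back and an error is returned, so `used` never includes memory
    /// that was not acquired.
    pub fn try_grow(&self, additional: usize) -> Result<(), String> {
        let acquired = self.upstream.acquire(additional);
        if acquired < additional {
            self.upstream.release(acquired);
            return Err(format!("requested {additional} bytes, only {acquired} granted"));
        }
        self.used.fetch_add(acquired, Ordering::SeqCst);
        Ok(())
    }

    /// Releases `bytes`; panics rather than returning more than was acquired.
    pub fn shrink(&self, bytes: usize) {
        self.used
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |u| u.checked_sub(bytes))
            .expect("releasing more memory than was acquired");
        self.upstream.release(bytes);
    }

    pub fn used(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}
```

With 100 bytes available upstream, growing by 60 succeeds; a second 60-byte request is only partially grantable, so it fails, hands the 40 granted bytes back, and leaves `used()` at 60.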

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max [datafusion]

2025-05-11 Thread via GitHub
gabotechs closed pull request #15857: feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max URL: https://github.com/apache/datafusion/pull/15857

Re: [PR] feat(datafusion-functions-aggregate): add support for lists and other nested types in min and max [datafusion]

2025-05-11 Thread via GitHub
gabotechs commented on PR #15857: URL: https://github.com/apache/datafusion/pull/15857#issuecomment-2870925010 After reviewing https://github.com/apache/datafusion/pull/15667, I think it makes more sense to reuse the work done there rather than what this PR proposes. Thanks @alamb for point

Re: [I] Add NullState::is_null public method [datafusion]

2025-05-11 Thread via GitHub
joroKr21 commented on issue #11591: URL: https://github.com/apache/datafusion/issues/11591#issuecomment-2870882363 This method would be unusable because the callback-based APIs of `NullState` borrow it as mutable

Re: [I] Add NullState::is_null public method [datafusion]

2025-05-11 Thread via GitHub
joroKr21 closed issue #11591: Add NullState::is_null public method URL: https://github.com/apache/datafusion/issues/11591

Re: [PR] fix: Fix data race in memory profiling [datafusion-comet]

2025-05-11 Thread via GitHub
wForget commented on code in PR #1727: URL: https://github.com/apache/datafusion-comet/pull/1727#discussion_r2083780711 ## spark/src/main/java/org/apache/spark/CometTaskMemoryManager.java: ## @@ -30,36 +34,41 @@ * memory manager. This assumes Spark's off-heap memory mode is en

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-11 Thread via GitHub
brayanjuls commented on code in PR #16019: URL: https://github.com/apache/datafusion/pull/16019#discussion_r2083769931 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,17 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [PR] chores: Add lint rule to enforce string formatting style [datafusion]

2025-05-11 Thread via GitHub
kosiew commented on code in PR #16024: URL: https://github.com/apache/datafusion/pull/16024#discussion_r2083735160 ## datafusion/ffi/src/tests/async_provider.rs: ## @@ -270,7 +270,7 @@ impl Stream for AsyncTestRecordBatchStream { None => std::task::Poll::Ready(N

[PR] Add lint rule to enforce string formatting style [datafusion]

2025-05-11 Thread via GitHub
Lordworms opened a new pull request, #16024: URL: https://github.com/apache/datafusion/pull/16024 ## Which issue does this PR close? - Closes #16021 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [I] Table function supports non-literal args [datafusion]

2025-05-11 Thread via GitHub
Lordworms commented on issue #14958: URL: https://github.com/apache/datafusion/issues/14958#issuecomment-2870484350 waiting for https://github.com/apache/datafusion/pull/16015 to be merged

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-11 Thread via GitHub
qstommyshu commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2083691379 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4607,82 +4607,58 @@ fn test_prepare_statement_to_plan_params_as_constants() { } #[test] -fn test_infer_t

Re: [PR] Include data types in logical plans of inferred prepare statements [datafusion]

2025-05-11 Thread via GitHub
qstommyshu commented on code in PR #16019: URL: https://github.com/apache/datafusion/pull/16019#discussion_r2083691801 ## datafusion/sql/src/statement.rs: ## @@ -710,6 +710,17 @@ impl SqlToRel<'_, S> { *statement, &mut planner_context,

Re: [I] Add lint rule to enforce string formatting style [datafusion]

2025-05-11 Thread via GitHub
Lordworms commented on issue #16021: URL: https://github.com/apache/datafusion/issues/16021#issuecomment-2870436642 take

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-11 Thread via GitHub
adriangb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2870271047 @alamb please review again I implemented and added a test 😄

[PR] Fix big performance issue in string serialization [datafusion-sqlparser-rs]

2025-05-11 Thread via GitHub
lovasoa opened a new pull request, #1848: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1848 - **optimize string escaping by writing chunks instead of individual chars** - **add comments**
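The idea in the first commit can be sketched in Rust: rather than pushing characters one at a time, locate each quote character and write the unescaped run before it as a single `&str` slice. `EscapeQuoted` is a hypothetical type for illustration, not sqlparser-rs's actual escaping code, and doubling the quote (as in the SQL literal `'it''s'`) is the assumed escaping rule.

```rust
use std::fmt::{self, Write};

/// Hypothetical wrapper that renders `s` with every `quote` char doubled.
pub struct EscapeQuoted<'a> {
    pub s: &'a str,
    pub quote: char,
}

impl fmt::Display for EscapeQuoted<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let mut start = 0;
        // Visit only the positions of the quote character; everything up to
        // and including each quote is written as one chunk.
        for (idx, _) in self.s.match_indices(self.quote) {
            let end = idx + self.quote.len_utf8();
            f.write_str(&self.s[start..end])?;
            // Write the quote a second time to escape it.
            f.write_char(self.quote)?;
            start = end;
        }
        // Trailing run after the last quote (or the whole string if none).
        f.write_str(&self.s[start..])
    }
}
```

A string with no quotes is written with a single `write_str` call, which is where the chunked approach saves the most work compared with per-character writes.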

[PR] DRAFT: Eliminate Self Joins [datafusion]

2025-05-11 Thread via GitHub
atahanyorganci opened a new pull request, #16023: URL: https://github.com/apache/datafusion/pull/16023 (no comment)

Re: [I] support `regexp_match` as predicate [datafusion]

2025-05-11 Thread via GitHub
juju4 commented on issue #15872: URL: https://github.com/apache/datafusion/issues/15872#issuecomment-2870214294 it does or need subquery to evaluate ``` > select * from 'test-datafusion.csv' where regexp_match(b, '-O') is not null; +---+--+ | a | b| +--

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-11 Thread via GitHub
brayanjuls commented on code in PR #15743: URL: https://github.com/apache/datafusion/pull/15743#discussion_r2083604044 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4607,82 +4607,58 @@ fn test_prepare_statement_to_plan_params_as_constants() { } #[test] -fn test_infer_t

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-11 Thread via GitHub
alamb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2083599017 ## datafusion/datasource/src/file_stream.rs: ## @@ -367,7 +368,7 @@ impl Default for OnError { pub trait FileOpener: Unpin + Send + Sync { /// Asynchronously o

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2870055758 > > > > How would that work going from sync -> async? For example: `1 = 2 OR 1 = call_llm_model_async()`. I imagine this would build something like `BinaryExpr(BinaryExpr(1, Eq, 2), Or

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-11 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2870037513 Found that Discord is banned on my current Mac (a work Mac belonging to my company); I will switch to my personal Mac and start communicating there later today.

Re: [I] `GenericDialect` should support multi-table DELETE and DELETE without FROM clause [datafusion-sqlparser-rs]

2025-05-11 Thread via GitHub
piki commented on issue #1846: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1846#issuecomment-2869951944 It looks like #1120 is where this changed. `BigQueryDialect` got the ability to parse `DELETE` statements without the `FROM` keyword. `GenericDialect` got treated the

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-11 Thread via GitHub
adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2083553168 ## datafusion/datasource/src/file_stream.rs: ## @@ -367,7 +368,7 @@ impl Default for OnError { pub trait FileOpener: Unpin + Send + Sync { /// Asynchronousl

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2869915327 > > > How would that work going from sync -> async? For example: `1 = 2 OR 1 = call_llm_model_async()`. I imagine this would build something like `BinaryExpr(BinaryExpr(1, Eq,

Re: [PR] refactor: remove deprecated `MemoryExec` [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on PR #16007: URL: https://github.com/apache/datafusion/pull/16007#issuecomment-2869904980 https://github.com/apache/datafusion/pull/16005#issuecomment-2869904849

Re: [PR] refactor: remove deprecated `JsonExec` [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on PR #16005: URL: https://github.com/apache/datafusion/pull/16005#issuecomment-2869904849 Hi @comphead, you're right, but I think we can make an exception here for the greater good. The reasoning is well summarized here: https://github.com/apache/datafusion/issues/1

Re: [PR] refactor: remove deprecated `ArrowExec` [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on PR #16006: URL: https://github.com/apache/datafusion/pull/16006#issuecomment-2869904936 https://github.com/apache/datafusion/pull/16005#issuecomment-2869904849

Re: [I] Select with order by from empty table triggers SanityCheckPlan error [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on issue #16001: URL: https://github.com/apache/datafusion/issues/16001#issuecomment-2869898307 I think there shouldn't be a SortExec there at all

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-11 Thread via GitHub
Rachelint commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2869898423 > Thanks for the details [@Rachelint](https://github.com/Rachelint). I see you made significant progress here, and what you provide and the roadmap sound logical. Of course

Re: [PR] Add configuration for eliminating sort in subquery [datafusion]

2025-05-11 Thread via GitHub
irenjj commented on PR #15993: URL: https://github.com/apache/datafusion/pull/15993#issuecomment-2869891230 > @irenjj I've set the default value of this config to "false", and auto-completed the slt tests. This is one of the changes: > > https://private-user-images.githubusercontent.co

Re: [I] Linear Aggregate Functions Optimization [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on issue #15633: URL: https://github.com/apache/datafusion/issues/15633#issuecomment-2869894299 Thanks for the details @Rachelint. I see you made significant progress here, and what you provide and the roadmap sound logical. Of course I'm happy to let you take it t

Re: [PR] Add configuration for eliminating sort in subquery [datafusion]

2025-05-11 Thread via GitHub
irenjj closed pull request #15993: Add configuration for eliminating sort in subquery URL: https://github.com/apache/datafusion/pull/15993

Re: [PR] Add configuration for eliminating sort in subquery [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on PR #15993: URL: https://github.com/apache/datafusion/pull/15993#issuecomment-2869869805 @irenjj I've set the default value of this config to "false", and auto-completed the slt tests. This is one of the changes: https://github.com/user-attachments/assets/a277

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2083522199 ## datafusion/datasource/src/file_stream.rs: ## @@ -367,7 +368,7 @@ impl Default for OnError { pub trait FileOpener: Unpin + Send + Sync { /// Asynchro

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083518706 ## datafusion/sqllogictest/test_files/prepare.slt: ## @@ -264,16 +264,19 @@ WHERE run_id = 'foo' ORDER BY random() LIMIT $1 -query I +query error EXECUT

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083517075 ## datafusion/physical-optimizer/src/optimizer.rs: ## @@ -126,6 +119,13 @@ impl PhysicalOptimizer { // into an `order by max(x) limit y`. In thi

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2869838330 > The rule issue is not very trivial because we cannot just track and eliminate some hardcoded patterns, since we also need to be aware of upper parts of the plan, and new patterns

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2869838533 @berkaysynnada you might need to resolve the conflicts for CI to run

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083516712 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -229,7 +232,9 @@ EXPLAIN select * from t_pushdown where val != 'c'; logical_plan 0

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083516280 ## datafusion/sqllogictest/test_files/prepare.slt: ## @@ -264,16 +264,19 @@ WHERE run_id = 'foo' ORDER BY random() LIMIT $1 -query I +query error EXECUT

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083516211 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -570,6 +680,47 @@ impl TopKHeap { + self.store.size() + self.owned_bytes } + +
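The general idea behind TopK dynamic filter pushdown can be sketched as follows: once the heap holds `k` rows, its worst entry is a threshold every new row must beat, so a scan could use that threshold as a filter and skip rows that cannot enter the result. This is a simplified, hypothetical `TopK` over `i64` values for `ORDER BY x ASC LIMIT k`, not DataFusion's `TopKHeap`.

```rust
use std::collections::BinaryHeap;

/// Hypothetical TopK for `ORDER BY x ASC LIMIT k`: keeps the k smallest
/// values seen so far in a max-heap, so the worst kept value is on top.
pub struct TopK {
    k: usize,
    heap: BinaryHeap<i64>,
}

impl TopK {
    pub fn new(k: usize) -> Self {
        Self { k, heap: BinaryHeap::with_capacity(k) }
    }

    /// Once the heap is full, its maximum is the current k-th smallest value.
    /// Rows with `x >= threshold` cannot change the result, which is the
    /// predicate a dynamic filter would push down to the scan.
    pub fn threshold(&self) -> Option<i64> {
        if self.k > 0 && self.heap.len() == self.k {
            self.heap.peek().copied()
        } else {
            None
        }
    }

    pub fn insert(&mut self, x: i64) {
        if self.k == 0 {
            return;
        }
        if self.heap.len() < self.k {
            self.heap.push(x);
        } else if let Some(mut top) = self.heap.peek_mut() {
            // Replace the current worst value; `PeekMut` restores heap order on drop.
            if x < *top {
                *top = x;
            }
        }
    }

    /// Returns the kept values in ascending order.
    pub fn into_sorted(self) -> Vec<i64> {
        self.heap.into_sorted_vec()
    }
}
```

After inserting `5, 1, 9, 3, 7, 2` with `k = 3`, the threshold tightens from 9 to 3, and the final result is `[1, 2, 3]`.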

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083515926 ## datafusion/core/tests/fuzz_cases/topk_filter_pushdown.rs: ## @@ -0,0 +1,354 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083513975 ## datafusion/sqllogictest/test_files/parquet_filter_pushdown.slt: ## @@ -86,7 +86,8 @@ logical_plan physical_plan 01)SortPreservingMergeExec: [a@0 ASC NUL

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2869830929 The PR looks really nice BTW, and every line is clearly understandable. Of course there are some possible optimizations (as you've also noticed some of them like the todo durin

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2083506643 ## datafusion/physical-plan/src/topk/mod.rs: ## @@ -570,6 +680,47 @@ impl TopKHeap { + self.store.size() + self.owned_bytes }

Re: [PR] Update Dependency to arrow/parquet `55.1.0` [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #16012: URL: https://github.com/apache/datafusion/pull/16012#issuecomment-2869817974 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_arrow_55.1_update Benchmark clickbench_extended.json --

Re: [PR] Update Dependency to arrow/parquet `55.1.0` [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #16012: URL: https://github.com/apache/datafusion/pull/16012#issuecomment-2869787134 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2869790733 > Not sure how to construct the empty stream. You can use something like https://docs.rs/futures/latest/futures/stream/fn.iter.html perhaps -- like `futures::stream::iter(v

Re: [PR] Add late pruning of file based on file level statistics [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #16014: URL: https://github.com/apache/datafusion/pull/16014#issuecomment-2869788980 > It might be nice to implement pruning for Vec where each statistic represents an arbitrary container (e.g. partition or file). Yes this would be super nice -- the more we can d
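Pruning over a list of per-container statistics could look roughly like this minimal Rust sketch: each entry holds the min/max of one column for one container (a file or a partition), and a container is kept only if its range could satisfy the predicate. `ColumnStats`, `Cmp`, and `prune` are hypothetical and far narrower than DataFusion's real pruning machinery; missing statistics are treated conservatively as "may match".

```rust
/// Hypothetical min/max statistics of one column for one container.
#[derive(Debug, Clone, Copy)]
pub struct ColumnStats {
    pub min: Option<i64>,
    pub max: Option<i64>,
}

/// Hypothetical comparison against a literal: `col > value` or `col < value`.
#[derive(Debug, Clone, Copy)]
pub enum Cmp {
    Gt,
    Lt,
}

/// Returns one flag per container: `true` if it *may* contain matching rows,
/// `false` if it can be skipped entirely.
pub fn prune(stats: &[ColumnStats], cmp: Cmp, value: i64) -> Vec<bool> {
    stats
        .iter()
        .map(|s| match cmp {
            // Some row can exceed `value` only if the container's max does.
            Cmp::Gt => s.max.map_or(true, |max| max > value),
            // Some row can be below `value` only if the container's min is.
            Cmp::Lt => s.min.map_or(true, |min| min < value),
        })
        .collect()
}
```

Because the function only sees a slice of statistics, the same code works whether each entry describes a file, a partition, or a row group.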

Re: [PR] Update Dependency to arrow/parquet `55.1.0` [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #16012: URL: https://github.com/apache/datafusion/pull/16012#issuecomment-2869783033 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_arrow_55.1_update Benchmark clickbench_extended.json --

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-11 Thread via GitHub
adriangb commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2869780441 > > How would that work going from sync -> async? For example: `1 = 2 OR 1 = call_llm_model_async()`. I imagine this would build something like `BinaryExpr(BinaryExpr(1, Eq, 2), Or,

Re: [PR] Update Dependency to arrow/parquet `55.1.0` [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #16012: URL: https://github.com/apache/datafusion/pull/16012#issuecomment-2869734535 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on code in PR #14837: URL: https://github.com/apache/datafusion/pull/14837#discussion_r2083486922 ## datafusion/core/src/physical_planner.rs: ## @@ -775,12 +776,44 @@ impl DefaultPhysicalPlanner { let runtime_expr =

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-11 Thread via GitHub
berkaysynnada commented on PR #14837: URL: https://github.com/apache/datafusion/pull/14837#issuecomment-2869684912 > How would that work going from sync -> async? For example: `1 = 2 OR 1 = call_llm_model_async()`. I imagine this would build something like `BinaryExpr(BinaryExpr(1, Eq, 2),

Re: [I] Weekly Plan: Andrew Lamb 2025-05-05 [datafusion]

2025-05-11 Thread via GitHub
alamb closed issue #15943: Weekly Plan: Andrew Lamb 2025-05-05 URL: https://github.com/apache/datafusion/issues/15943

[I] Weekly Plan: Andrew Lamb 2025-05-05 [datafusion]

2025-05-11 Thread via GitHub
alamb opened a new issue, #16022: URL: https://github.com/apache/datafusion/issues/16022 This is my plan this week for reviews, etc. I am putting it here to make it visible and keep myself organized - [ ] Arrow Filter Performance: Complete ClickBench benchmark: https://github.com/apa

Re: [I] Weekly Plan: Andrew Lamb 2025-05-05 [datafusion]

2025-05-11 Thread via GitHub
alamb commented on issue #15943: URL: https://github.com/apache/datafusion/issues/15943#issuecomment-2869680894 Next week: - https://github.com/apache/datafusion/issues/16022

Re: [PR] Introduce Async User Defined Functions [datafusion]

2025-05-11 Thread via GitHub
alamb commented on code in PR #14837: URL: https://github.com/apache/datafusion/pull/14837#discussion_r2083479793 ## datafusion/core/src/physical_planner.rs: ## @@ -775,12 +776,44 @@ impl DefaultPhysicalPlanner { let runtime_expr = self.cr

Re: [PR] Cascaded spill merge and re-spill [datafusion]

2025-05-11 Thread via GitHub
alamb commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2869674992 > I've tried to use this branch to sort data larger than memory. For 24GB parquet file, it produce error `Error: ArrowError(IoError("No space left on device (os error 28)", Os { code:

Re: [PR] Cascaded spill merge and re-spill [datafusion]

2025-05-11 Thread via GitHub
LogicFan commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2869669114 I've tried to use this branch to sort data larger than memory. For 24GB parquet file, it produce error `Error: ArrowError(IoError("No space left on device (os error 28)", Os { code:

Re: [PR] Updated extending operators documentation [datafusion]

2025-05-11 Thread via GitHub
Max-Meldrum commented on PR #15612: URL: https://github.com/apache/datafusion/pull/15612#issuecomment-2869668640 I believe this PR is now ready to be merged @alamb

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-11 Thread via GitHub
UBarney commented on PR #15954: URL: https://github.com/apache/datafusion/pull/15954#issuecomment-2869638528 @xudong963 Thanks for reviewing. All comments have been addressed, PTAL

Re: [PR] fix: Allow ORDER BY aggregates not present in SELECT list [datafusion]

2025-05-11 Thread via GitHub
UBarney commented on code in PR #15876: URL: https://github.com/apache/datafusion/pull/15876#discussion_r2083447146 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -797,26 +807,146 @@ impl LogicalPlanBuilder { } // remove pushed down sort columns -