Re: [PR] feat: add test to check for `ctx.read_json()` [datafusion-ballista]

2025-04-03 Thread via GitHub
westhide commented on code in PR #1212: URL: https://github.com/apache/datafusion-ballista/pull/1212#discussion_r2028175201 ## ballista/scheduler/src/scheduler_server/grpc.rs: ## @@ -125,10 +125,15 @@ impl SchedulerGrpc let mut tasks = vec![]; for (_

Re: [PR] feat: Improve fetch partition performance, support skip validation arrow ipc files [datafusion-ballista]

2025-04-03 Thread via GitHub
westhide commented on PR #1216: URL: https://github.com/apache/datafusion-ballista/pull/1216#issuecomment-2777671990 > > Q1: As the `BallistaFlightService` keeps listening on each Executor, writing it allows a client to send a `do_get` request, and without checking the `FetchPartition` action's `pat

Re: [PR] parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on code in PR #15561: URL: https://github.com/apache/datafusion/pull/15561#discussion_r2027781065 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1445,119 +1465,26 @@ mod tests { // batch1: c1(string) let batch1 = string_ba

Re: [PR] parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on code in PR #15561: URL: https://github.com/apache/datafusion/pull/15561#discussion_r2027668758 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1445,119 +1465,26 @@ mod tests { // batch1: c1(string) let batch1 = string_ba

Re: [I] Blog post about user defined window functions [datafusion]

2025-04-03 Thread via GitHub
Adez017 commented on issue #6781: URL: https://github.com/apache/datafusion/issues/6781#issuecomment-2776464521 Hi @alamb, it sounds pretty interesting! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [I] Optimize repartitioning logic in ShuffleWriterExec using interleave_record_batch [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove closed issue #1235: Optimize repartitioning logic in ShuffleWriterExec using interleave_record_batch URL: https://github.com/apache/datafusion-comet/issues/1235 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] Add coerce int96 option for Parquet to support different TimeUnits, test int96_from_spark.parquet from parquet-testing [datafusion]

2025-04-03 Thread via GitHub
mbutrovich commented on PR #15537: URL: https://github.com/apache/datafusion/pull/15537#issuecomment-2776452713 https://github.com/apache/parquet-testing/pull/73 merged so I updated the parquet-testing dependency. Now waiting on an arrow-rs release and DF bumping to that version. -- This

Re: [I] [comet-parquet-exec] Track remaining test failures in POC 1 & 2 [datafusion-comet]

2025-04-03 Thread via GitHub
mbutrovich commented on issue #1228: URL: https://github.com/apache/datafusion-comet/issues/1228#issuecomment-2776728769 > Has there been an update on this? > > My workflow specifies a schema to my read function through a JSON file, date-related fields are specified to be timestamp t

Re: [PR] Migrate datafusion/sql tests to insta, part5 [datafusion]

2025-04-03 Thread via GitHub
qstommyshu commented on code in PR #15567: URL: https://github.com/apache/datafusion/pull/15567#discussion_r2027702690 ## datafusion/sql/tests/sql_integration.rs: ## @@ -4665,18 +4674,18 @@ Projection: person.id, person.age } #[test] -fn test_prepare_statement_infer_types_fr

Re: [PR] Docs : Added Sql examples for window Functions : `nth_val` , etc [datafusion]

2025-04-03 Thread via GitHub
Adez017 commented on PR #1: URL: https://github.com/apache/datafusion/pull/1#issuecomment-2777473262 Hey @alamb, I'm pretty confused here. Could you please help specify what is failing? -- This is an automated message from the Apache Git Service. To respond to t

Re: [I] Audit-check fails in main branch [datafusion]

2025-04-03 Thread via GitHub
xudong963 commented on issue #15554: URL: https://github.com/apache/datafusion/issues/15554#issuecomment-2777425499 dup with https://github.com/apache/datafusion/issues/15571 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [I] Internal error: PhysicalExpr Column references bound error, Failure in spilling for `AggregateMode::Single` [datafusion]

2025-04-03 Thread via GitHub
alamb closed issue #15530: Internal error: PhysicalExpr Column references bound error, Failure in spilling for `AggregateMode::Single` URL: https://github.com/apache/datafusion/issues/15530 -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

Re: [I] Remove record_batch! macro once upstream updates [datafusion]

2025-04-03 Thread via GitHub
alamb commented on issue #13037: URL: https://github.com/apache/datafusion/issues/13037#issuecomment-2776834231 Thanks @ByteBaker -- I don't have any strong opinion / advice here unfortunately -- This is an automated message from the Apache Git Service. To respond to the message, please

Re: [I] Blog post about user defined window functions [datafusion]

2025-04-03 Thread via GitHub
Adez017 commented on issue #6781: URL: https://github.com/apache/datafusion/issues/6781#issuecomment-2776464810 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [I] Audit-check fails in main branch [datafusion]

2025-04-03 Thread via GitHub
xudong963 closed issue #15554: Audit-check fails in main branch URL: https://github.com/apache/datafusion/issues/15554 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [I] Run all benchmarks on merge to main branch [datafusion]

2025-04-03 Thread via GitHub
jayzhan211 commented on issue #15511: URL: https://github.com/apache/datafusion/issues/15511#issuecomment-2777328776 > You are also right here about the cost, but what if we can have 2 modes for benchmarks, one for the actual benchmarking purpose, and one with just to validate. If it is in

Re: [I] limit max disk usage for spilling queries [datafusion]

2025-04-03 Thread via GitHub
alamb closed issue #15358: limit max disk usage for spilling queries URL: https://github.com/apache/datafusion/issues/15358 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [I] A complete solution for stable and safe sort with spill [datafusion]

2025-04-03 Thread via GitHub
qstommyshu commented on issue #14692: URL: https://github.com/apache/datafusion/issues/14692#issuecomment-2777376142 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] A complete solution for stable and safe sort with spill [datafusion]

2025-04-03 Thread via GitHub
qstommyshu commented on issue #14692: URL: https://github.com/apache/datafusion/issues/14692#issuecomment-2777376035 This issue looks interesting to me. I'll try to work on it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [I] Erroneous warning on unset options during FFI table operation [datafusion]

2025-04-03 Thread via GitHub
chenkovsky commented on issue #15565: URL: https://github.com/apache/datafusion/issues/15565#issuecomment-2777362621 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[PR] chore: update clickbench [datafusion]

2025-04-03 Thread via GitHub
chenkovsky opened a new pull request, #15574: URL: https://github.com/apache/datafusion/pull/15574 ## Which issue does this PR close? - Closes #15509 . ## Rationale for this change ## What changes are included in this PR? remove ::INT::DATE in clickbench.

Re: [I] `count` fails for FFI Table Providers [datafusion]

2025-04-03 Thread via GitHub
chenkovsky commented on issue #15569: URL: https://github.com/apache/datafusion/issues/15569#issuecomment-2777357348 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] `cargo audit` is failing on main [datafusion]

2025-04-03 Thread via GitHub
jayzhan211 commented on issue #15571: URL: https://github.com/apache/datafusion/issues/15571#issuecomment-2777340299 Can we try `proc-macro-error2` or the latest version of `proc-macro-error`? If it is not trivial or causes too many breaking changes, then we can add a warning instead --

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-04-03 Thread via GitHub
jayzhan211 merged PR #15281: URL: https://github.com/apache/datafusion/pull/15281 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2776871923 This is first thing on my list to review tomorrow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [I] [comet-parquet-exec] Track remaining test failures in POC 1 & 2 [datafusion-comet]

2025-04-03 Thread via GitHub
comphead closed issue #1228: [comet-parquet-exec] Track remaining test failures in POC 1 & 2 URL: https://github.com/apache/datafusion-comet/issues/1228 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] fix: Queries similar to `count-bug` produce incorrect results [datafusion]

2025-04-03 Thread via GitHub
jayzhan211 commented on PR #15281: URL: https://github.com/apache/datafusion/pull/15281#issuecomment-2777259273 Thanks @suibianwanwank -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [I] Similar to the "count-bug" case that produces incorrect results [datafusion]

2025-04-03 Thread via GitHub
jayzhan211 closed issue #15032: Similar to the "count-bug" case that produces incorrect results URL: https://github.com/apache/datafusion/issues/15032 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Minor: add Arc for statistics in FileGroup [datafusion]

2025-04-03 Thread via GitHub
jayzhan211 commented on PR #15564: URL: https://github.com/apache/datafusion/pull/15564#issuecomment-2777258566 Thanks @xudong963 @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Enable repartitioning on MemTable. [datafusion]

2025-04-03 Thread via GitHub
wiedld commented on PR #15409: URL: https://github.com/apache/datafusion/pull/15409#issuecomment-2777243569 Thanks for the review. Haven't had time to do the updates. Converting to draft, and will mark ready again after updates. -- This is an automated message from the Apache Git

Re: [I] Table function supports non-literal args [datafusion]

2025-04-03 Thread via GitHub
Lordworms commented on issue #14958: URL: https://github.com/apache/datafusion/issues/14958#issuecomment-2777223899 Like what duckdb did ## 1. Detect Correlated Columns - Walk subquery expressions and collect all outer (correlated) column references. - Represent them as `Expr::Outer

Re: [I] Trivial WHERE filter not eliminated when combined with CTE [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on issue #15387: URL: https://github.com/apache/datafusion/issues/15387#issuecomment-2777214161 But is that possible to evaluate via a simplifier? I'd think that in general we don't know that until execution time. -- This is an automated message from the Apache

Re: [PR] Add short circuit evaluation for `AND` and `OR` [datafusion]

2025-04-03 Thread via GitHub
alamb commented on code in PR #15462: URL: https://github.com/apache/datafusion/pull/15462#discussion_r2026912500 ## datafusion/physical-expr/src/expressions/binary.rs: ## @@ -805,6 +811,47 @@ impl BinaryExpr { } } +/// Check if it meets the short-circuit condition +///

Re: [I] `cargo audit` is failing on main [datafusion]

2025-04-03 Thread via GitHub
Jiashu-Hu commented on issue #15571: URL: https://github.com/apache/datafusion/issues/15571#issuecomment-2777171963 For now, it might be best to just allow this warning, since everything is working fine as it is. Alternatively, we could try to use `proc-macro-error2` instead, but that might

Re: [I] `cargo audit` is failing on main [datafusion]

2025-04-03 Thread via GitHub
Jiashu-Hu commented on issue #15571: URL: https://github.com/apache/datafusion/issues/15571#issuecomment-2777150559 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on code in PR #15561: URL: https://github.com/apache/datafusion/pull/15561#discussion_r2027851655 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -295,3 +312,89 @@ fn create_initial_plan( // default to scanning all row groups Ok(ParquetAccessPl

Re: [I] Include Apple macOS support in jars in Maven central [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove commented on issue #1010: URL: https://github.com/apache/datafusion-comet/issues/1010#issuecomment-2775877457 Closing as duplicate of https://github.com/apache/datafusion-comet/issues/947 -- This is an automated message from the Apache Git Service. To respond to the message, ple

Re: [PR] Migrate datafusion/sql tests to insta, part5 [datafusion]

2025-04-03 Thread via GitHub
qstommyshu commented on code in PR #15567: URL: https://github.com/apache/datafusion/pull/15567#discussion_r2027712552 ## datafusion/sql/tests/sql_integration.rs: ## @@ -3388,26 +3389,15 @@ fn ident_normalization_parser_options_ident_normalization() -> ParserOptions { } }

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove closed issue #1450: 【TPCH】Comet do not show performance advantages over native Spark? URL: https://github.com/apache/datafusion-comet/issues/1450 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [I] TPCH DataGen Not working [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove commented on issue #1157: URL: https://github.com/apache/datafusion-comet/issues/1157#issuecomment-2775885159 Closing this issue since it is for an issue with a Databricks repo that we do not control -- This is an automated message from the Apache Git Service. To respond to the

Re: [I] Add more developer documentation [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove closed issue #230: Add more developer documentation URL: https://github.com/apache/datafusion-comet/issues/230 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

Re: [I] 【TPCH】Comet do not show performance advantages over native Spark? [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove commented on issue #1450: URL: https://github.com/apache/datafusion-comet/issues/1450#issuecomment-2775796160 I'll close this issue for now since I cannot reproduce. Please feel free to reopen it if you'd like to continue with this. -- This is an automated message from the Apac

[PR] Fix clippy lint on rust 1.86 [datafusion-sqlparser-rs]

2025-04-03 Thread via GitHub
iffyio opened a new pull request, #1796: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1796 Fixes [lint failures in CI](https://github.com/apache/datafusion-sqlparser-rs/actions/runs/14236730212/job/39938621832?pr=1790) from the latest rust release -- This is an automated m

Re: [PR] Docs : Added Sql examples for window Functions : `nth_val` , etc [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #1: URL: https://github.com/apache/datafusion/pull/1#issuecomment-2776823505 It looks like there are a few small errors to fix to get a clean CI run -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on code in PR #15561: URL: https://github.com/apache/datafusion/pull/15561#discussion_r2027772249 ## datafusion/sqllogictest/test_files/parquet.slt: ## @@ -625,7 +625,7 @@ physical_plan 01)CoalesceBatchesExec: target_batch_size=8192 02)--FilterExec: column1@

Re: [PR] parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener [datafusion]

2025-04-03 Thread via GitHub
alamb commented on code in PR #15561: URL: https://github.com/apache/datafusion/pull/15561#discussion_r2027656995 ## datafusion/core/src/datasource/physical_plan/parquet.rs: ## @@ -1445,119 +1465,26 @@ mod tests { // batch1: c1(string) let batch1 = string_batch

Re: [PR] Fix Possible Congestion Scenario in `SortPreservingMergeExec` [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #12302: URL: https://github.com/apache/datafusion/pull/12302#issuecomment-2776791139 FYI @rluvaton and I have noticed a `clone()` introduced in this PR appearing in some traces: - https://github.com/apache/datafusion/issues/15573 -- This is an automated messag

Re: [I] [EPIC] A collection of tickets for improving sorting larger than memory datasets / spilling sorts [datafusion]

2025-04-03 Thread via GitHub
alamb commented on issue #15271: URL: https://github.com/apache/datafusion/issues/15271#issuecomment-2776837729 > so If I have a lot of spill files or if every batch is really huge (contains very large lists - like result for array_agg on large dataset) we have all of this in memory.

Re: [I] [comet-parquet-exec] Track remaining test failures in POC 1 & 2 [datafusion-comet]

2025-04-03 Thread via GitHub
comphead commented on issue #1228: URL: https://github.com/apache/datafusion-comet/issues/1228#issuecomment-2776744809 Closing this in favor of #1441 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to g

Re: [I] [comet-parquet-exec] Track remaining test failures in POC 1 & 2 [datafusion-comet]

2025-04-03 Thread via GitHub
mkgada commented on issue #1228: URL: https://github.com/apache/datafusion-comet/issues/1228#issuecomment-2776590893 Has there been an update on this? My workflow specifies a schema to my read function through a JSON file, date-related fields are specified to be timestamp type.

Re: [PR] Migrate datafusion/sql tests to insta, part5 [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15567: URL: https://github.com/apache/datafusion/pull/15567#issuecomment-2776729981 is this the last one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Migrate datafusion/sql tests to insta, part5 [datafusion]

2025-04-03 Thread via GitHub
qstommyshu commented on PR #15567: URL: https://github.com/apache/datafusion/pull/15567#issuecomment-2776882350 > is this the last one? This is the last one for migrating tests in `tests/sql_integration.rs`. There are still some cases in `tests/cases/plan_to_sql.rs` and `tests

Re: [PR] perf: replace `merge` `uninitiated_partitions` `VecDeque` with custom fixed size queue [datafusion]

2025-04-03 Thread via GitHub
rluvaton commented on PR #15562: URL: https://github.com/apache/datafusion/pull/15562#issuecomment-2776872261 I removed the custom fixed-size queue and replaced it with matching `VecDeque` functions that are zero-cost, as they are just index manipulation like my previous implementation -- This is an automated m

Re: [PR] perf: replace `merge` `uninitiated_partitions` `VecDeque` with custom fixed size queue [datafusion]

2025-04-03 Thread via GitHub
rluvaton commented on code in PR #15562: URL: https://github.com/apache/datafusion/pull/15562#discussion_r2027711140 ## datafusion/physical-plan/src/sorts/merge.rs: ## @@ -241,10 +239,13 @@ impl SortPreservingMergeStream { _ => { //

[I] `cargo audit` is failing on main [datafusion]

2025-04-03 Thread via GitHub
alamb opened a new issue, #15571: URL: https://github.com/apache/datafusion/issues/15571 ### Describe the bug We are seeing a cargo audit failure on @zebsme 's PR: https://github.com/apache/datafusion/pull/15454 ``` Crate: proc-macro-error Version: 1.0.4 Warn

Re: [I] Remove unwraps in `hash_array_small_decimal` [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove closed issue #1599: Remove unwraps in `hash_array_small_decimal` URL: https://github.com/apache/datafusion-comet/issues/1599 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [PR] Update concepts-readings-events.md [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15541: URL: https://github.com/apache/datafusion/pull/15541#issuecomment-2776844631 Thanks @oznur-synnada and @berkaysynnada -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [I] [substrait] Build basic test suite to validate produced Substrait plans [datafusion]

2025-04-03 Thread via GitHub
alamb commented on issue #15069: URL: https://github.com/apache/datafusion/issues/15069#issuecomment-2776844057 > So I don't believe automatically running the full sqllogictests through Substrait is feasible today. A more realistic goal would be to have a few hand-crafted queries (TPCH quer

Re: [PR] Add topk information into tree explain plans [datafusion]

2025-04-03 Thread via GitHub
alamb merged PR #15547: URL: https://github.com/apache/datafusion/pull/15547 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Add topk information into tree explain plans [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15547: URL: https://github.com/apache/datafusion/pull/15547#issuecomment-2776824435 Thanks again @kumarlokesh -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Add `topk` information into `tree` explain plans [datafusion]

2025-04-03 Thread via GitHub
alamb closed issue #15546: Add `topk` information into `tree` explain plans URL: https://github.com/apache/datafusion/issues/15546 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [I] Blog post about user defined window functions [datafusion]

2025-04-03 Thread via GitHub
alamb commented on issue #6781: URL: https://github.com/apache/datafusion/issues/6781#issuecomment-2776822081 Thanks @Adez017 -- You can make a blog post for the DataFusion blog by making a PR to this repo: https://github.com/alamb/datafusion-site -- This is an automated messa

Re: [I] Reduce number of tokio blocking threads in SortExec spill [datafusion]

2025-04-03 Thread via GitHub
alamb commented on issue #15323: URL: https://github.com/apache/datafusion/issues/15323#issuecomment-2776820608 > I think I have the same problem but in `AggregateExec` when using `row_hash`, as it spills as well and uses `SortPreservingMergeStream`. > > I think the solution should

Re: [PR] tpcbench.py add --query support to run custom query [datafusion-ray]

2025-04-03 Thread via GitHub
jazracherif commented on code in PR #84: URL: https://github.com/apache/datafusion-ray/pull/84#discussion_r2027648678 ## tpch/tpcbench.py: ## @@ -186,8 +186,28 @@ def main( args = parser.parse_args() +if (args.qnum != -1 and args.query is not None): +print("

Re: [PR] Minor: add Arc for statistics in FileGroup [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15564: URL: https://github.com/apache/datafusion/pull/15564#issuecomment-2776794019 Thank you @xudong963 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Fix Possible Congestion Scenario in `SortPreservingMergeExec` [datafusion]

2025-04-03 Thread via GitHub
alamb commented on code in PR #12302: URL: https://github.com/apache/datafusion/pull/12302#discussion_r2027651936 ## datafusion/physical-plan/src/sorts/merge.rs: ## @@ -154,14 +165,36 @@ impl SortPreservingMergeStream { if self.aborted { return Poll::Ready(

Re: [PR] Migrate datafusion/sql tests to insta, part5 [datafusion]

2025-04-03 Thread via GitHub
blaginin commented on code in PR #15567: URL: https://github.com/apache/datafusion/pull/15567#discussion_r2027650462 ## datafusion/sql/tests/sql_integration.rs: ## @@ -3388,26 +3389,15 @@ fn ident_normalization_parser_options_ident_normalization() -> ParserOptions { } }

Re: [PR] perf: replace `merge` `uninitiated_partitions` `VecDeque` with custom fixed size queue [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15562: URL: https://github.com/apache/datafusion/pull/15562#issuecomment-2776787483 BTW I filed a ticket to track this as I have seen it too - https://github.com/apache/datafusion/issues/15573 -- This is an automated message from the Apache Git Service. To respon

[I] Improve time for SortPreservingMerge stream / uninitiated_partitions VecDeque [datafusion]

2025-04-03 Thread via GitHub
alamb opened a new issue, #15573: URL: https://github.com/apache/datafusion/issues/15573 ### Is your feature request related to a problem or challenge? Both @rluvaton and I have seen https://github.com/user-attachments/assets/cd91c702-51fa-45b7-9214-7913c1281161 !

Re: [I] Explore integration with Delta Lake [datafusion-comet]

2025-04-03 Thread via GitHub
dennyglee commented on issue #174: URL: https://github.com/apache/datafusion-comet/issues/174#issuecomment-2776310481 Oh brilliant! Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] perf: replace `merge` `uninitiated_partitions` `VecDeque` with custom fixed size queue [datafusion]

2025-04-03 Thread via GitHub
alamb commented on code in PR #15562: URL: https://github.com/apache/datafusion/pull/15562#discussion_r2027641047 ## datafusion/physical-plan/src/sorts/merge.rs: ## @@ -241,10 +239,13 @@ impl SortPreservingMergeStream { _ => { // If

Re: [PR] chore: return `404` for api requests if path does not exist [datafusion-ballista]

2025-04-03 Thread via GitHub
milenkovicm commented on PR #1224: URL: https://github.com/apache/datafusion-ballista/pull/1224#issuecomment-2776764135 clippy errors should be fixed in #1225 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[I] Extend benchmarking to "TopK" queries [datafusion]

2025-04-03 Thread via GitHub
geoffreyclaude opened a new issue, #15559: URL: https://github.com/apache/datafusion/issues/15559 ### Is your feature request related to a problem or challenge? Currently, the benchmarks folder in DataFusion does not include dedicated benchmarks for TopK queries (i.e., queries formatt

Re: [PR] Add dynamic pruning filters from TopK state [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on PR #15301: URL: https://github.com/apache/datafusion/pull/15301#issuecomment-2776668146 I'm closing this now-massive PR in favor of splitting it up into units of work: https://github.com/apache/datafusion/issues/15512#issuecomment-2776631138 Thank you all for the

Re: [PR] Migrate datafusion/sql tests to insta, part5 [datafusion]

2025-04-03 Thread via GitHub
alamb commented on code in PR #15567: URL: https://github.com/apache/datafusion/pull/15567#discussion_r2027613057 ## datafusion/sql/tests/sql_integration.rs: ## @@ -15,6 +15,7 @@ // specific language governing permissions and limitations // under the License. +use core::pani

Re: [PR] Add topk information into tree explain plans [datafusion]

2025-04-03 Thread via GitHub
alamb commented on code in PR #15547: URL: https://github.com/apache/datafusion/pull/15547#discussion_r2027592484 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -1112,7 +1112,10 @@ impl DisplayAs for SortExec { impl ExecutionPlan for SortExec { fn name(&self) -> &'

Re: [PR] fix: update group by columns for merge phase after spill [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15531: URL: https://github.com/apache/datafusion/pull/15531#issuecomment-2776696614 Thank you @rluvaton -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [I] Internal error: PhysicalExpr Column references bound error, Failure in spilling for `AggregateMode::Single` [datafusion]

2025-04-03 Thread via GitHub
alamb closed issue #15530: Internal error: PhysicalExpr Column references bound error, Failure in spilling for `AggregateMode::Single` URL: https://github.com/apache/datafusion/issues/15530 -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

Re: [PR] fix: update group by columns for merge phase after spill [datafusion]

2025-04-03 Thread via GitHub
alamb merged PR #15531: URL: https://github.com/apache/datafusion/pull/15531 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-04-03 Thread via GitHub
alamb commented on PR #15520: URL: https://github.com/apache/datafusion/pull/15520#issuecomment-2776693763 Thanks again @2010YOUY01 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] feat: Add config `max_temp_directory_size` to limit max disk usage for spilling queries [datafusion]

2025-04-03 Thread via GitHub
alamb merged PR #15520: URL: https://github.com/apache/datafusion/pull/15520 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] address failure caused by method signature change in SPARK-48791 [datafusion-comet]

2025-04-03 Thread via GitHub
parthchandra commented on issue #692: URL: https://github.com/apache/datafusion-comet/issues/692#issuecomment-2776495289 Seems like a perennial issue. This signature changes in every release it appears (it is private after all). https://github.com/apache/datafusion-comet/issues/1576 --

Re: [I] Dynamic pruning filters from TopK state (optimize `ORDER BY LIMIT` queries) [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on issue #15037: URL: https://github.com/apache/datafusion/issues/15037#issuecomment-2776667178 Sorry closed the issue instead of the PR, my bad! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [I] Dynamic pruning filters from TopK state (optimize `ORDER BY LIMIT` queries) [datafusion]

2025-04-03 Thread via GitHub
adriangb closed issue #15037: Dynamic pruning filters from TopK state (optimize `ORDER BY LIMIT` queries) URL: https://github.com/apache/datafusion/issues/15037 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

[PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-03 Thread via GitHub
adriangb opened a new pull request, #15568: URL: https://github.com/apache/datafusion/pull/15568 Work towards #15512 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsu

[PR] chore: return `404` for api requests if path does not exist [datafusion-ballista]

2025-04-03 Thread via GitHub
milenkovicm opened a new pull request, #1224: URL: https://github.com/apache/datafusion-ballista/pull/1224 # Which issue does this PR close? Closes #1223 . # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

Re: [I] Reduce number of tokio blocking threads in SortExec spill [datafusion]

2025-04-03 Thread via GitHub
rluvaton commented on issue #15323: URL: https://github.com/apache/datafusion/issues/15323#issuecomment-2776605858 I think I have the same problem but in `AggregateExec` when using `row_hash`, as it spills as well and uses `SortPreservingMergeStream`. I think the solution should ac

[PR] chore: fix clippy issues after update to rust 1.86 [datafusion-ballista]

2025-04-03 Thread via GitHub
milenkovicm opened a new pull request, #1225: URL: https://github.com/apache/datafusion-ballista/pull/1225 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? - fix clippy issues after rust 1.86 u

Re: [PR] parquet reader: move pruning predicate creation from ParquetSource to ParquetOpener [datafusion]

2025-04-03 Thread via GitHub
adriangb commented on code in PR #15561: URL: https://github.com/apache/datafusion/pull/15561#discussion_r2027429964 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -67,13 +67,13 @@ pub(crate) mod test_util { .into_iter() .zip(tmp_files.

Re: [PR] Fix duplicate unqualified Field name (schema error) on join queries [datafusion]

2025-04-03 Thread via GitHub
LiaCastaneda commented on code in PR #15438: URL: https://github.com/apache/datafusion/pull/15438#discussion_r2025709682 ## datafusion/physical-expr/src/equivalence/projection.rs: ## @@ -66,9 +66,9 @@ impl ProjectionMapping { let idx = col.index();

Re: [PR] Respect ignore_nulls in array_agg [datafusion]

2025-04-03 Thread via GitHub
Dandandan commented on code in PR #15544: URL: https://github.com/apache/datafusion/pull/15544#discussion_r2027495310 ## datafusion/functions-aggregate/src/array_agg.rs: ## @@ -500,11 +548,30 @@ impl Accumulator for OrderSensitiveArrayAggAccumulator { return Ok(());

Re: [I] Native scan panic with native_iceberg_compat on hdfs [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove closed issue #1553: Native scan panic with native_iceberg_compat on hdfs URL: https://github.com/apache/datafusion-comet/issues/1553 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spe

Re: [PR] fix: avoid panic caused by close null handle of parquet reader [datafusion-comet]

2025-04-03 Thread via GitHub
andygrove merged PR #1604: URL: https://github.com/apache/datafusion-comet/pull/1604 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

[PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-03 Thread via GitHub
adriangb opened a new pull request, #15566: URL: https://github.com/apache/datafusion/pull/15566 Work towards #15037 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns
