Re: [PR] build: Fix CI failures resulting from GitHub change [datafusion-comet]

2025-11-22 Thread via GitHub
codecov-commenter commented on PR #2816: URL: https://github.com/apache/datafusion-comet/pull/2816#issuecomment-3567011450 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2816?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] ListingTable handling of missing partition values [datafusion]

2025-11-22 Thread via GitHub
corasaurus-hex commented on issue #18083: URL: https://github.com/apache/datafusion/issues/18083#issuecomment-3567031005 I'm interested in picking this up late next week, but I'll hold off on taking it until then. By Hive, do you mean Apache Hive? I'm inclined to follow whatever the "standard" is f

[PR] build(deps): bump datafusion from 50.3.0 to 51.0.0 [datafusion-python]

2025-11-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1306: URL: https://github.com/apache/datafusion-python/pull/1306 Bumps [datafusion](https://github.com/apache/datafusion) from 50.3.0 to 51.0.0. Commits https://github.com/apache/datafusion/commit/fd35a09438a2b4841431f5e86ffef378cbbda

[PR] build(deps): bump datafusion-substrait from 50.3.0 to 51.0.0 [datafusion-python]

2025-11-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1308: URL: https://github.com/apache/datafusion-python/pull/1308 Bumps [datafusion-substrait](https://github.com/apache/datafusion) from 50.3.0 to 51.0.0. Commits https://github.com/apache/datafusion/commit/fd35a09438a2b4841431f5e86ff

[PR] build(deps): bump datafusion-ffi from 50.3.0 to 51.0.0 [datafusion-python]

2025-11-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1307: URL: https://github.com/apache/datafusion-python/pull/1307 Bumps [datafusion-ffi](https://github.com/apache/datafusion) from 50.3.0 to 51.0.0. Commits https://github.com/apache/datafusion/commit/fd35a09438a2b4841431f5e86ffef378c

[PR] build(deps): bump datafusion-proto from 50.3.0 to 51.0.0 [datafusion-python]

2025-11-22 Thread via GitHub
dependabot[bot] opened a new pull request, #1309: URL: https://github.com/apache/datafusion-python/pull/1309 Bumps [datafusion-proto](https://github.com/apache/datafusion) from 50.3.0 to 51.0.0. Commits https://github.com/apache/datafusion/commit/fd35a09438a2b4841431f5e86ffef37

[PR] build: Fix CI failures resulting from GitHub change [datafusion-comet]

2025-11-22 Thread via GitHub
andygrove opened a new pull request, #2816: URL: https://github.com/apache/datafusion-comet/pull/2816 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Fast scalar path for array_slice [datafusion]

2025-11-22 Thread via GitHub
dqkqd commented on issue #18458: URL: https://github.com/apache/datafusion/issues/18458#issuecomment-3566982557 Oh, I found that we don't have support for `ListView` and `LargeListView` in `ScalarValue`, so this approach might have to wait. https://github.com/apache/datafusion/blob/6

Re: [PR] Add PostgreSQL `CREATE USER` and `ALTER USER` support [datafusion-sqlparser-rs]

2025-11-22 Thread via GitHub
github-actions[bot] commented on PR #2015: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2015#issuecomment-3567388230 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on PR #18832: URL: https://github.com/apache/datafusion/pull/18832#issuecomment-3567537044 > 🤖: Benchmark completed > > Details > > ``` > group main specialize > -

Re: [PR] chore: update datafusion to 51.0 [datafusion-ballista]

2025-11-22 Thread via GitHub
danielhumanmod commented on code in PR #1345: URL: https://github.com/apache/datafusion-ballista/pull/1345#discussion_r2553813039 ## Cargo.toml: ## @@ -27,27 +27,28 @@ resolver = "2" # edition = "2021" # we should try to follow datafusion version -rust-version = "1.86.0" +

Re: [PR] fix: preserve byte-size statistics in AggregateExec [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on PR #18885: URL: https://github.com/apache/datafusion/pull/18885#issuecomment-3567537555 Thanks for opening the PR! I fired the tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on code in PR #18832: URL: https://github.com/apache/datafusion/pull/18832#discussion_r2553808933 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -198,68 +211,127 @@ impl ArrayStaticFilter { } } -struct Int32StaticFilter { -null_coun
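For readers unfamiliar with the `Int32StaticFilter` idea in the diff above, here is a minimal sketch of a type-specialized IN-list filter. It assumes a plain `HashSet<i32>` plus a count of NULL literals, and it implements SQL three-valued `IN` semantics. The names `Int32InList` and `contains` are hypothetical and are not DataFusion's actual API.

```rust
use std::collections::HashSet;

/// Hypothetical specialized IN-list filter for i32 values (not DataFusion's real type).
struct Int32InList {
    values: HashSet<i32>,
    // Number of NULL literals in the IN list; any NULL turns a miss into NULL.
    null_count: usize,
}

impl Int32InList {
    fn new(list: &[Option<i32>]) -> Self {
        let values = list.iter().flatten().copied().collect();
        let null_count = list.iter().filter(|v| v.is_none()).count();
        Self { values, null_count }
    }

    /// SQL three-valued logic: Some(true) on a hit; None (NULL) if the probe is NULL,
    /// or if there is no hit and the list contains a NULL; otherwise Some(false).
    fn contains(&self, probe: Option<i32>) -> Option<bool> {
        let v = probe?;
        if self.values.contains(&v) {
            Some(true)
        } else if self.null_count > 0 {
            None
        } else {
            Some(false)
        }
    }
}
```

Specializing per type presumably avoids generic, per-row `ScalarValue` comparisons. Whether that pays off for every type is exactly what the benchmarks in this thread are measuring.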

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on code in PR #18832: URL: https://github.com/apache/datafusion/pull/18832#discussion_r2553808461 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -198,68 +211,127 @@ impl ArrayStaticFilter { } } -struct Int32StaticFilter { -null_coun

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
adriangb commented on PR #18832: URL: https://github.com/apache/datafusion/pull/18832#issuecomment-3567553330 Yes. Slowdowns for i32 are concerning. I won’t merge this until it’s all speedups or neutral. I may also make a support PR to add more benchmarks for other types so we can make bett

[I] Support `ListView`, `LargeListView` in `ScalarValue` [datafusion]

2025-11-22 Thread via GitHub
dqkqd opened a new issue, #18886: URL: https://github.com/apache/datafusion/issues/18886 ### Is your feature request related to a problem or challenge? While looking at #18458, I realized we don't have `ListView` and `LargeListView` in `ScalarValue`. There are some recent PRs s

Re: [I] Support `ListView`, `LargeListView` in `ScalarValue` [datafusion]

2025-11-22 Thread via GitHub
dqkqd commented on issue #18886: URL: https://github.com/apache/datafusion/issues/18886#issuecomment-3567483755 take

Re: [PR] Avoid skew in Roundrobin repartition [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on PR #18880: URL: https://github.com/apache/datafusion/pull/18880#issuecomment-3567577493 Couldn't show a performance difference on my 10-core machine.

Re: [PR] chore: update datafusion to 51.0 [datafusion-ballista]

2025-11-22 Thread via GitHub
danielhumanmod commented on PR #1345: URL: https://github.com/apache/datafusion-ballista/pull/1345#issuecomment-3567578813 Hey @milenkovicm, I continued your work by adding the new metrics type support, adjusting DF API usages, and updating the related tests. CI looks fine, appreciate you

Re: [PR] Push down InList or hash table references from HashJoinExec depending on the size of the build side [datafusion]

2025-11-22 Thread via GitHub
asolimando commented on code in PR #18393: URL: https://github.com/apache/datafusion/pull/18393#discussion_r2552698506 ## datafusion/common/src/config.rs: ## @@ -1019,6 +1019,22 @@ config_namespace! { /// will be collected into a single partition pub hash_join_

Re: [PR] Support PostgreSQL C Functions with Multiple AS Parameters [datafusion-sqlparser-rs]

2025-11-22 Thread via GitHub
LucaCappelletti94 commented on code in PR #2095: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2095#discussion_r2552561563 ## src/parser/mod.rs: ## @@ -10224,17 +10224,32 @@ impl<'a> Parser<'a> { /// Parse the body of a `CREATE FUNCTION` specified as a string

Re: [I] Fast scalar path for array_slice [datafusion]

2025-11-22 Thread via GitHub
dqkqd commented on issue #18458: URL: https://github.com/apache/datafusion/issues/18458#issuecomment-3566077618 I was looking into this. For `ListArray` and `LargeListArray` inputs, I think we have to copy child arrays to construct the return array, regardless of whether they are conti
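The copy-versus-view distinction in this thread can be illustrated with a simplified model of the list-view layout from the Arrow columnar format. In that layout, each row is an `(offset, size)` pair into a shared child buffer, so slicing every row only rewrites those pairs and never copies child values. This sketch uses plain Rust collections, not arrow-rs's list view arrays. `ListViewModel` and `slice_rows` are hypothetical names, and the sketch takes a 0-based `start` and a `len` rather than `array_slice`'s own argument conventions.

```rust
use std::sync::Arc;

/// Simplified model of a list-view layout (hypothetical, not arrow-rs types).
struct ListViewModel {
    values: Arc<[i64]>,  // shared child values, never copied by slicing
    offsets: Vec<usize>, // start of each row in `values`
    sizes: Vec<usize>,   // length of each row
}

impl ListViewModel {
    /// Keep elements [start, start + len) of every row, clamped to each row's length.
    /// Only `offsets` and `sizes` are rebuilt; the child buffer is shared via Arc.
    fn slice_rows(&self, start: usize, len: usize) -> ListViewModel {
        let mut offsets = Vec::with_capacity(self.offsets.len());
        let mut sizes = Vec::with_capacity(self.sizes.len());
        for (&off, &size) in self.offsets.iter().zip(&self.sizes) {
            let s = start.min(size);
            let e = (start + len).min(size);
            offsets.push(off + s);
            sizes.push(e - s);
        }
        ListViewModel { values: Arc::clone(&self.values), offsets, sizes }
    }

    /// Borrow row `i` as a slice of the shared child buffer.
    fn row(&self, i: usize) -> &[i64] {
        &self.values[self.offsets[i]..self.offsets[i] + self.sizes[i]]
    }
}
```

With a `ListArray`, offsets must be monotonically increasing and contiguous, which is why a slice generally has to gather child values into a new buffer.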

[PR] feat: add `array_slice` benchmark [datafusion]

2025-11-22 Thread via GitHub
dqkqd opened a new pull request, #18879: URL: https://github.com/apache/datafusion/pull/18879 ## Which issue does this PR close? - Part of #18458. ## Rationale for this change - Add benchmark for `array_slice` ## What changes are included in this PR? Bench

Re: [PR] feat: add `array_slice` benchmark [datafusion]

2025-11-22 Thread via GitHub
dqkqd commented on PR #18879: URL: https://github.com/apache/datafusion/pull/18879#issuecomment-3566147442 Output from my laptop ```bash Gnuplot not found, using plotters backend Benchmarking array_slice: input List(nullable Int64), array args: Warming up for 3. s W

Re: [PR] impl `Spanned` for MERGE statements [datafusion-sqlparser-rs]

2025-11-22 Thread via GitHub
xitep commented on code in PR #2100: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2100#discussion_r2552626909 ## src/parser/mod.rs: ## @@ -17225,66 +17229,84 @@ impl<'a> Parser<'a> { self.expect_keyword_is(Keyword::THEN)?; +macro_rule

Re: [PR] Push down InList or hash table references from HashJoinExec depending on the size of the build side [datafusion]

2025-11-22 Thread via GitHub
adriangb commented on code in PR #18393: URL: https://github.com/apache/datafusion/pull/18393#discussion_r2552634204 ## datafusion/common/src/config.rs: ## @@ -1019,6 +1019,22 @@ config_namespace! { /// will be collected into a single partition pub hash_join_si

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
adriangb commented on code in PR #18832: URL: https://github.com/apache/datafusion/pull/18832#discussion_r2552638215 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -198,68 +206,122 @@ impl ArrayStaticFilter { } } -struct Int32StaticFilter { -null_count

Re: [PR] Add PhysicalOptimizerRule::optimize_plan to allow passing more context into optimizer rules [datafusion]

2025-11-22 Thread via GitHub
Copilot commented on code in PR #18739: URL: https://github.com/apache/datafusion/pull/18739#discussion_r2552644385 ## datafusion/physical-optimizer/src/optimizer.rs: ## @@ -49,12 +89,49 @@ use datafusion_physical_plan::ExecutionPlan; /// /// [`SessionState::add_physical_optim

Re: [PR] optimizer: Support dynamic filter in `MIN/MAX` aggregates [datafusion]

2025-11-22 Thread via GitHub
LiaCastaneda commented on code in PR #18644: URL: https://github.com/apache/datafusion/pull/18644#discussion_r2553021314 ## datafusion/physical-plan/src/aggregates/mod.rs: ## @@ -1293,12 +1293,31 @@ impl ExecutionPlan for AggregateExec { ) -> Result>> { let mut res
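The general idea behind a dynamic filter for `MIN`/`MAX` can be sketched as follows. While `MAX(x)` is being accumulated, the running maximum becomes a threshold. Any later data whose known upper bound (for example, from row group statistics) is at or below that threshold cannot change the result. This self-contained sketch rests on that assumption. `RunningMax` and `can_skip_batch` are hypothetical names, not the PR's code.

```rust
/// Hypothetical running MAX accumulator that exposes a dynamic threshold.
struct RunningMax {
    current: Option<i64>,
}

impl RunningMax {
    /// Fold a batch of values into the running maximum.
    fn update(&mut self, batch: &[i64]) {
        for &v in batch {
            self.current = Some(self.current.map_or(v, |c| c.max(v)));
        }
    }

    /// A batch whose known upper bound is <= the current max cannot raise MAX(x),
    /// so a scan could skip it; before any value is seen, nothing can be skipped.
    fn can_skip_batch(&self, batch_max: i64) -> bool {
        matches!(self.current, Some(c) if batch_max <= c)
    }
}
```

In an engine, the same threshold would more likely be expressed as a predicate such as `x > current_max` pushed into the scan, so that statistics-based pruning can use it.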

Re: [D] DISCUSSION: DataFusion Meetup in Denver, CO USA [datafusion]

2025-11-22 Thread via GitHub
GitHub user alamb added a comment to the discussion: DISCUSSION: DataFusion Meetup in Denver, CO USA We have an official venue and date 🎉 Date: Wednesday, July 22, 2026, 6:00 PM - 8:00 PM Location: [Code Talent, 3412 Blake St, Denver, CO 80205](https://www.google.com/maps/place/3412+Blake+St

Re: [I] Support duration in sum [datafusion]

2025-11-22 Thread via GitHub
Weijun-H closed issue #18771: Support duration in sum URL: https://github.com/apache/datafusion/issues/18771

Re: [PR] implement sum for durations [datafusion]

2025-11-22 Thread via GitHub
Weijun-H merged PR #18853: URL: https://github.com/apache/datafusion/pull/18853

Re: [PR] implement sum for durations [datafusion]

2025-11-22 Thread via GitHub
Weijun-H commented on PR #18853: URL: https://github.com/apache/datafusion/pull/18853#issuecomment-3566761738 Thanks @martin-g @Jefffrey and @alamb for reviewing

Re: [PR] Support reverse parquet scan and fast parquet order inversion at row group level [datafusion]

2025-11-22 Thread via GitHub
zhuqi-lucas commented on PR #18817: URL: https://github.com/apache/datafusion/pull/18817#issuecomment-3566729760 Thank you @xudong963 @suremarc. I made a lot of changes compared to our internal implementation in this PR, but I think the overall design is similar to our internal version

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553149497 ## datafusion/ffi/README.md: ## @@ -101,6 +101,36 @@ In this crate we have a variety of structs which closely mimic the behavior of their internal counterparts

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553155986 ## datafusion-examples/examples/ffi/ffi_module_loader/src/main.rs: ## @@ -49,13 +49,13 @@ async fn main() -> Result<()> { ))?(); // In order

Re: [I] Fast parquet order inversion [datafusion]

2025-11-22 Thread via GitHub
zhuqi-lucas commented on issue #17172: URL: https://github.com/apache/datafusion/issues/17172#issuecomment-3566722062 cc @alamb @crepererum @xudong963 The PR is ready for review now!

Re: [PR] Support reverse parquet scan and fast parquet order inversion at row group level [datafusion]

2025-11-22 Thread via GitHub
zhuqi-lucas commented on PR #18817: URL: https://github.com/apache/datafusion/pull/18817#issuecomment-3566722719 cc @alamb @crepererum @xudong963 The PR is ready for review now!

[I] Roundrobin repartition skews to left-side partitions [datafusion]

2025-11-22 Thread via GitHub
Dandandan opened a new issue, #18883: URL: https://github.com/apache/datafusion/issues/18883 ### Describe the bug Currently, round-robin repartitioning always starts each input partition at output partition 0, skewing data toward the first output partitions. This problem becomes bigger when the output partitions
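The skew described in this issue can be reproduced with a toy model. If every input partition starts its round-robin cursor at output 0, then whenever inputs send fewer batches than there are outputs, the low-numbered outputs receive more batches. One plausible mitigation, assumed here and not necessarily what PR #18880 does, is to spread each input's starting output evenly. `distribute` is a hypothetical helper that only counts batches, and it assumes at least one output partition.

```rust
/// Count how many batches each output partition receives when `num_inputs`
/// input partitions each send `batches_per_input` batches round-robin.
/// With `staggered == false`, every input starts at output 0 (the skewed case);
/// with `staggered == true`, input i starts at i * num_outputs / num_inputs,
/// spreading the starting points evenly (a hypothetical mitigation).
fn distribute(
    num_inputs: usize,
    batches_per_input: usize,
    num_outputs: usize,
    staggered: bool,
) -> Vec<usize> {
    let mut counts = vec![0; num_outputs];
    for input in 0..num_inputs {
        let mut next = if staggered { input * num_outputs / num_inputs } else { 0 };
        for _ in 0..batches_per_input {
            counts[next] += 1;
            next = (next + 1) % num_outputs;
        }
    }
    counts
}
```

For example, with 4 inputs each sending 2 batches into 8 outputs, the unstaggered version fills only outputs 0 and 1 (4 batches each), while the staggered version sends exactly one batch to every output.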

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553157110 ## datafusion/ffi/src/udaf/accumulator.rs: ## @@ -173,9 +177,11 @@ unsafe extern "C" fn retract_batch_fn_wrapper( } unsafe extern "C" fn release_fn_wrapper(a

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553157892 ## datafusion/ffi/src/udaf/accumulator.rs: ## @@ -173,9 +177,11 @@ unsafe extern "C" fn retract_batch_fn_wrapper( } unsafe extern "C" fn release_fn_wrapper(a

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553158027 ## datafusion/ffi/src/udaf/accumulator.rs: ## @@ -173,9 +177,11 @@ unsafe extern "C" fn retract_batch_fn_wrapper( } unsafe extern "C" fn release_fn_wrapper(a

Re: [PR] feat: Improve read types support [Avro] [datafusion]

2025-11-22 Thread via GitHub
EmilyMatt closed pull request #18809: feat: Improve read types support [Avro] URL: https://github.com/apache/datafusion/pull/18809

Re: [PR] feat: Improve read types support [Avro] [datafusion]

2025-11-22 Thread via GitHub
EmilyMatt commented on PR #18809: URL: https://github.com/apache/datafusion/pull/18809#issuecomment-3566860990 > > Performance of the Avro reader is currently dismal :( > > FWIW this is a major reason we have been working on the new `arrow-avro` crate- https://arrow.apache.org/blog/2025

Re: [PR] feat: Support Ref types in Scan [Avro] [datafusion]

2025-11-22 Thread via GitHub
EmilyMatt closed pull request #18812: feat: Support Ref types in Scan [Avro] URL: https://github.com/apache/datafusion/pull/18812

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
alamb commented on code in PR #18832: URL: https://github.com/apache/datafusion/pull/18832#discussion_r2552997998 ## datafusion/physical-expr/src/expressions/in_list.rs: ## @@ -1028,6 +1104,612 @@ mod tests { Ok(()) } +#[test] +fn in_list_int8() -> Result

Re: [PR] Avoid repartition skew [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on code in PR #18880: URL: https://github.com/apache/datafusion/pull/18880#discussion_r2552840621 ## datafusion/core/tests/physical_optimizer/partition_statistics.rs: ## @@ -878,9 +878,9 @@ mod test { partition_row_counts.push(total_rows);

Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

2025-11-22 Thread via GitHub
alamb commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3566702940 For GroupedHashAggregate stream in particular, another potential solution would be to implement a GroupsAccumulator for whatever aggregate you are working on, rather than rely on

[I] Deprecate `AggregateUDFImpl::is_nullable` in favour of `return_field` [datafusion]

2025-11-22 Thread via GitHub
Jefffrey opened a new issue, #18882: URL: https://github.com/apache/datafusion/issues/18882 ### Is your feature request related to a problem or challenge? Deprecate `is_nullable` as the same information is already encoded by the field from `return_field` https://github.com/apa

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
alamb commented on PR #18832: URL: https://github.com/apache/datafusion/pull/18832#issuecomment-3566646830 🤖: Benchmark completed Details ``` Comparing HEAD and specialize Benchmark clickbench_extended.json ┏━━

Re: [PR] Optimize planning for projected nested union [datafusion]

2025-11-22 Thread via GitHub
logan-keede commented on PR #18713: URL: https://github.com/apache/datafusion/pull/18713#issuecomment-3566686436 @alamb @Omega359 Do you want to merge these changes? If so, I will update this branch. Please feel free to close it otherwise.

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
alamb commented on PR #18832: URL: https://github.com/apache/datafusion/pull/18832#issuecomment-3566687657 🤖: Benchmark completed Details ``` group main specialize -

Re: [PR] Support reverse parquet scan and fast parquet order inversion at row group level [datafusion]

2025-11-22 Thread via GitHub
xudong963 commented on PR #18817: URL: https://github.com/apache/datafusion/pull/18817#issuecomment-3566725990 Also cc @suremarc: we're finally contributing our reversed Parquet optimization upstream, and I guess you may be interested in seeing it.

Re: [PR] [WIP] Update to `arrow`, `parquet` 57.1.0 [datafusion]

2025-11-22 Thread via GitHub
rluvaton commented on PR #18820: URL: https://github.com/apache/datafusion/pull/18820#issuecomment-3566823185 Thank you, I have some more up my sleeve.

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553215194 ## datafusion/ffi/src/udaf/accumulator.rs: ## @@ -173,9 +177,11 @@ unsafe extern "C" fn retract_batch_fn_wrapper( } unsafe extern "C" fn release_fn_wrapper(a

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553215751 ## datafusion/ffi/src/lib.rs: ## @@ -58,5 +58,31 @@ pub extern "C" fn version() -> u64 { version.major } +static LIBRARY_MARKER: u8 = 0; + +/// This util
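The diff introduces `static LIBRARY_MARKER: u8 = 0;`. One common way such a marker can be used (an assumption here, since the doc comment is truncated) is to compare its address. Every separately loaded copy of a library has its own instance of the static. An FFI struct that records the marker address of the library that created it can therefore tell whether it is being consumed by that same library, and in that case skip the FFI wrapper and use the inner Rust object directly. `library_marker_id` and `FfiHandle` are hypothetical names.

```rust
// Each compiled copy of this library has its own instance of this static,
// so its address identifies "this" library at runtime.
static LIBRARY_MARKER: u8 = 0;

/// Hypothetical helper: an opaque id for the library this code was compiled into.
fn library_marker_id() -> usize {
    &LIBRARY_MARKER as *const u8 as usize
}

/// Hypothetical FFI-facing struct that remembers which library created it.
#[repr(C)]
struct FfiHandle {
    library_marker_id: usize,
}

impl FfiHandle {
    fn new() -> Self {
        Self { library_marker_id: library_marker_id() }
    }

    /// True when the handle was created by this same library, in which case a
    /// consumer could bypass the FFI layer and use the underlying Rust type.
    fn is_local(&self) -> bool {
        self.library_marker_id == library_marker_id()
    }
}
```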

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553215442 ## datafusion/ffi/src/udaf/accumulator.rs: ## @@ -70,6 +70,10 @@ pub struct FFI_Accumulator { /// Internal data. This is only to be accessed by the provider

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2553215845 ## docs/source/library-user-guide/upgrading.md: ## @@ -152,6 +152,47 @@ Instead of silently succeeding. The remove API no longer requires a mutable instance

Re: [PR] FFI: return underlying trait type when converting from FFI structs [datafusion]

2025-11-22 Thread via GitHub
timsaucer commented on PR #18672: URL: https://github.com/apache/datafusion/pull/18672#issuecomment-3566848589 @alamb Thank you for the review! I believe I've addressed all of your comments. The changes I've pushed are only in documentation.

Re: [PR] implement sum for durations [datafusion]

2025-11-22 Thread via GitHub
logan-keede commented on PR #18853: URL: https://github.com/apache/datafusion/pull/18853#issuecomment-356523 Thanks @alamb @martin-g @Jefffrey for the reviews and suggestions

Re: [I] Support limit pruning [datafusion]

2025-11-22 Thread via GitHub
xudong963 commented on issue #18860: URL: https://github.com/apache/datafusion/issues/18860#issuecomment-3566674121 @2010YOUY01 Limit pruning works at the row group/page level. Each partition in the picture above can be read as a row group or a page in Parquet. Without this limit pruning f
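As described in this comment, limit pruning works at the row group (or page) level. Here is a minimal sketch of the idea. It assumes that, after ordinary predicate pruning, the pruner knows each surviving row group's row count and whether statistics prove that every row in it matches the filter. For a `LIMIT` without `ORDER BY`, once the fully matched row groups kept so far can supply enough rows, the remaining row groups need not be read. `RowGroupInfo` and `limit_prune` are hypothetical names.

```rust
/// Hypothetical per-row-group info produced by statistics-based pruning.
struct RowGroupInfo {
    num_rows: usize,
    /// True if statistics prove every row in the group satisfies the filter.
    fully_matched: bool,
}

/// Returns the indices of row groups to read for `LIMIT limit` (no ORDER BY).
/// If fully matched row groups alone can supply `limit` rows, only those are kept;
/// otherwise nothing is pruned.
fn limit_prune(groups: &[RowGroupInfo], limit: usize) -> Vec<usize> {
    let mut kept = Vec::new();
    let mut rows = 0;
    for (i, g) in groups.iter().enumerate() {
        if rows >= limit {
            break;
        }
        if g.fully_matched {
            kept.push(i);
            rows += g.num_rows;
        }
    }
    if rows >= limit {
        kept
    } else {
        (0..groups.len()).collect()
    }
}
```

Partially matched row groups are never counted toward the limit, because statistics cannot say how many of their rows survive the filter.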

Re: [PR] Row group limit pruning [datafusion]

2025-11-22 Thread via GitHub
xudong963 commented on code in PR #18868: URL: https://github.com/apache/datafusion/pull/18868#discussion_r2553116326 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -407,8 +407,12 @@ impl FileOpener for ParquetOpener { .add_matched(n_remaining_row_gro

Re: [PR] Consolidate dataframe examples (#18142) [datafusion]

2025-11-22 Thread via GitHub
alamb commented on code in PR #18862: URL: https://github.com/apache/datafusion/pull/18862#discussion_r2553120326 ## datafusion-examples/examples/dataframe/main.rs: ## @@ -0,0 +1,107 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
alamb commented on PR #18832: URL: https://github.com/apache/datafusion/pull/18832#issuecomment-3566677313 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~

Re: [I] ListingTable handling of missing partition values [datafusion]

2025-11-22 Thread via GitHub
alamb commented on issue #18083: URL: https://github.com/apache/datafusion/issues/18083#issuecomment-3566711735 > and an associated issue [duckdb/duckdb#12921](https://github.com/duckdb/duckdb/issues/12921) > > I do like the idea of the user being able to configure which value should
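For context on the "standard" discussed in this thread: by default, Hive writes NULL or empty partition values as the directory segment `key=__HIVE_DEFAULT_PARTITION__`. Below is a minimal sketch of parsing one partition path segment with a caller-supplied null sentinel, in the spirit of letting the user configure that value. `parse_partition_segment` and its sentinel parameter are assumptions for illustration, not ListingTable's API, and the sketch ignores percent-decoding.

```rust
/// Parse one Hive-style path segment such as `year=2024` into (column, value).
/// A value equal to `null_sentinel` (Hive's default is `__HIVE_DEFAULT_PARTITION__`)
/// maps to `None`, i.e. SQL NULL. Returns `None` if the segment has no `=`.
fn parse_partition_segment<'a>(
    segment: &'a str,
    null_sentinel: &str,
) -> Option<(&'a str, Option<&'a str>)> {
    let (key, value) = segment.split_once('=')?;
    let value = if value == null_sentinel { None } else { Some(value) };
    Some((key, value))
}
```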

Re: [PR] Optimize planning for projected nested union [datafusion]

2025-11-22 Thread via GitHub
alamb commented on code in PR #18713: URL: https://github.com/apache/datafusion/pull/18713#discussion_r2553138394 ## datafusion/optimizer/src/eliminate_nested_union.rs: ## @@ -54,7 +54,7 @@ impl OptimizerRule for EliminateNestedUnion { plan: LogicalPlan, _confi

Re: [I] TPC-DS query 72 slow on modest (10) scale factors [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on issue #17494: URL: https://github.com/apache/datafusion/issues/17494#issuecomment-3566780230 > When running TPC-DS q72, I've noticed that regardless of the underlying file format, latency increases dramatically even with relatively modest scale factors like 10. I've m

Re: [PR] impl `Spanned` for MERGE statements [datafusion-sqlparser-rs]

2025-11-22 Thread via GitHub
iffyio merged PR #2100: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2100

Re: [I] TPC-DS query 72 slow on modest (10) scale factors [datafusion]

2025-11-22 Thread via GitHub
AdamGS commented on issue #17494: URL: https://github.com/apache/datafusion/issues/17494#issuecomment-3566792370 What I remember from my investigation at the time is that it seems like a lot of this is just a hot loop inside `chain_traverse`. Pushing into the vec is slower than just writing

Re: [PR] Reduce FFI wrappers when round tripping code [datafusion]

2025-11-22 Thread via GitHub
alamb commented on code in PR #18672: URL: https://github.com/apache/datafusion/pull/18672#discussion_r2552911550 ## datafusion/ffi/src/udaf/accumulator.rs: ## @@ -173,9 +177,11 @@ unsafe extern "C" fn retract_batch_fn_wrapper( } unsafe extern "C" fn release_fn_wrapper(accum

[I] New lint `clippy::allow_attributes` [datafusion]

2025-11-22 Thread via GitHub
Jefffrey opened a new issue, #18881: URL: https://github.com/apache/datafusion/issues/18881 ### Is your feature request related to a problem or challenge? Lint reference: https://rust-lang.github.io/rust-clippy/master/index.html?search=allow_att#allow_attributes > Checks for us
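The `clippy::allow_attributes` lint flags `#[allow(...)]` and suggests `#[expect(...)]`, which has been stable since Rust 1.81. The practical difference is that `#[expect]` triggers an `unfulfilled_lint_expectations` warning once the suppressed lint stops firing, so stale suppressions surface on their own. A minimal illustration (the function `answer` is hypothetical):

```rust
// Under `clippy::allow_attributes`, writing `#[allow(unused_variables)]` here
// would be flagged. `#[expect]` also silences the lint, but rustc warns with
// `unfulfilled_lint_expectations` if `scratch` is later used or removed.
#[expect(unused_variables)]
fn answer() -> u32 {
    let scratch = [0u8; 4]; // intentionally unused, so the expected lint fires
    42
}
```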

Re: [I] New lint `clippy::allow_attributes` [datafusion]

2025-11-22 Thread via GitHub
Jefffrey commented on issue #18881: URL: https://github.com/apache/datafusion/issues/18881#issuecomment-3566506662 Thoughts @2010YOUY01 ?

Re: [I] Stricter `Clippy` checks in CI [datafusion]

2025-11-22 Thread via GitHub
Jefffrey commented on issue #18467: URL: https://github.com/apache/datafusion/issues/18467#issuecomment-3566506781 Another candidate: https://github.com/apache/datafusion/issues/18881

Re: [PR] [WIP] Update to `arrow`, `parquet` 57.1.0 [datafusion]

2025-11-22 Thread via GitHub
alamb commented on PR #18820: URL: https://github.com/apache/datafusion/pull/18820#issuecomment-3566507909 > It seems to be quite a bit faster even without filter pushdown 🚀 It is like someone has been optimizing low level filter kernels 😆 (but seriously I think major credit is due t

Re: [I] [Tracking] Rollout of new lint `clippy::needless_pass_by_value` in all datafusion crates [datafusion]

2025-11-22 Thread via GitHub
Jefffrey commented on issue #18503: URL: https://github.com/apache/datafusion/issues/18503#issuecomment-3566508390 @2010YOUY01 are we good to enable this at the workspace level to close off this issue now?

Re: [PR] add specialized InList implementations for common scalar types [datafusion]

2025-11-22 Thread via GitHub
alamb commented on PR #18832: URL: https://github.com/apache/datafusion/pull/18832#issuecomment-3566518295 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubun

Re: [PR] chore: update Repartition DisplayAs to indicate maintained sort order [datafusion]

2025-11-22 Thread via GitHub
gabotechs commented on PR #18673: URL: https://github.com/apache/datafusion/pull/18673#issuecomment-3566519291 Thanks @ruchirK for the PR and @adriangb, @Jefffrey, @NGA-TRAN and @alamb for the input!

Re: [I] Display sort order in Hash Repartition [datafusion]

2025-11-22 Thread via GitHub
gabotechs closed issue #18594: Display sort order in Hash Repartition URL: https://github.com/apache/datafusion/issues/18594

Re: [PR] chore: update Repartition DisplayAs to indicate maintained sort order [datafusion]

2025-11-22 Thread via GitHub
gabotechs merged PR #18673: URL: https://github.com/apache/datafusion/pull/18673

Re: [I] Different result for different target partition size for clickbench Q32 [datafusion]

2025-11-22 Thread via GitHub
alchemist51 commented on issue #18863: URL: https://github.com/apache/datafusion/issues/18863#issuecomment-3566534716 Thanks folks for the clarification! Closing the issue.

Re: [I] Different result for different target partition size for clickbench Q32 [datafusion]

2025-11-22 Thread via GitHub
alchemist51 closed issue #18863: Different result for different target partition size for clickbench Q32 URL: https://github.com/apache/datafusion/issues/18863

Re: [I] Fast scalar path for array_slice [datafusion]

2025-11-22 Thread via GitHub
Jefffrey commented on issue #18458: URL: https://github.com/apache/datafusion/issues/18458#issuecomment-3566537544 Ah you're right about that 🤔 Actually we could use `return_field_from_args` to control the return datatype based on if we have scalar inputs or not, that way we could ret

Re: [I] Fast scalar path for array_slice [datafusion]

2025-11-22 Thread via GitHub
dqkqd commented on issue #18458: URL: https://github.com/apache/datafusion/issues/18458#issuecomment-3567260813 > I don't know if it is preferable to always return view types or not 🤔 I looked it up and found that a similar idea has been adopted for `StringView` in #10918. I also

Re: [I] Limited Inline Comment Support in AST [datafusion-sqlparser-rs]

2025-11-22 Thread via GitHub
xitep commented on issue #2065: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/2065#issuecomment-3566349938 Ah, I'm sorry for being misleading :-/ I'll try to explain in more detail: in my proof-of-concept program I do have a `fn find_between(&self, start: Location, e

Re: [PR] fix #18683 by enabling sort by aggregate [datafusion]

2025-11-22 Thread via GitHub
dqkqd commented on code in PR #18831: URL: https://github.com/apache/datafusion/pull/18831#discussion_r2552794225 ## datafusion/sql/src/select.rs: ## @@ -988,11 +1012,42 @@ impl SqlToRel<'_, S> { None }; +// Rewrite the ORDER BY expressions to use

[PR] Avoid repartition skew [datafusion]

2025-11-22 Thread via GitHub
Dandandan opened a new pull request, #18880: URL: https://github.com/apache/datafusion/pull/18880 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

[PR] fix: preserve byte-size statistics in AggregateExec [datafusion]

2025-11-22 Thread via GitHub
Tamar-Posen opened a new pull request, #18885: URL: https://github.com/apache/datafusion/pull/18885 Previously, AggregateExec dropped total_byte_size statistics (Precision::Absent) through aggregation operations, preventing the optimizer from making informed decisions about memory allocatio
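One simple way an operator could keep a byte-size estimate instead of dropping it is to scale the input's total byte size by the ratio of estimated output rows to input rows, and mark the result as inexact. This approach is assumed here for illustration and is not necessarily what the PR does. The sketch uses a hand-rolled `Estimate` enum loosely modeled on DataFusion's `Precision` (`Exact` / `Inexact` / `Absent`); it is not the real type, and `estimate_output_bytes` is a hypothetical function.

```rust
/// Hand-rolled stand-in for a statistics precision type (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Estimate {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Estimate {
    fn value(self) -> Option<usize> {
        match self {
            Estimate::Exact(v) | Estimate::Inexact(v) => Some(v),
            Estimate::Absent => None,
        }
    }
}

/// Scale input bytes by output_rows / input_rows. A scaled value is only an
/// estimate, so it is reported as Inexact; missing inputs yield Absent.
fn estimate_output_bytes(input_bytes: Estimate, input_rows: Estimate, output_rows: Estimate) -> Estimate {
    match (input_bytes.value(), input_rows.value(), output_rows.value()) {
        (Some(bytes), Some(in_rows), Some(out_rows)) if in_rows > 0 => {
            // Average bytes per input row times the estimated output row count;
            // u128 avoids overflow in the intermediate product.
            let scaled = (bytes as u128 * out_rows as u128 / in_rows as u128) as usize;
            Estimate::Inexact(scaled)
        }
        _ => Estimate::Absent,
    }
}
```

This is deliberately crude. An aggregate's output schema (group keys plus aggregate columns) differs from its input, so per-row width changes too, which is why column-level statistics come up in the related issue.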

Re: [I] AggregateExec drops byte-size statistics causing incorrect join build-side selection and broken dynamic filtering [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on issue #18850: URL: https://github.com/apache/datafusion/issues/18850#issuecomment-3567117712 > The fully correct approach would involve post-aggregation statistics, as in Spark’s CBO, or rely on column-level statistics, similar to DuckDB, to estimate both row counts a

Re: [PR] Upgrade hashbrown to 0.16, keep 0.14 around [datafusion]

2025-11-22 Thread via GitHub
Dandandan commented on PR #18751: URL: https://github.com/apache/datafusion/pull/18751#issuecomment-3566960652 @alamb could you perhaps fire some benchmarks?

Re: [PR] Prepare tokenizer for using borrowed strings instead of allocations. [datafusion-sqlparser-rs]

2025-11-22 Thread via GitHub
eyalleshem commented on PR #2073: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2073#issuecomment-3566962498 Thanks @iffyio! I've changed the PR to target this branch. Do we want to keep the branch protected to enforce reviews?

[PR] feat: support `ListView` and `LargeListView` in `ScalarValue` [datafusion]

2025-11-22 Thread via GitHub
dqkqd opened a new pull request, #18884: URL: https://github.com/apache/datafusion/pull/18884 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?