Re: [I] Improve aggregate performance with adaptive sizing in accumulators / avoiding reallocations in accumulators [datafusion]

2025-08-12 Thread via GitHub
avantgardnerio commented on issue #7065: URL: https://github.com/apache/datafusion/issues/7065#issuecomment-3182347912 Have we considered using the `fallible_collections` crate to avoid OOMs in general? -- This is an automated message from the Apache Git Service. To respond to the message

Re: [I] [iceberg] TestSparkDataWrite show mismatch results [datafusion-comet]

2025-08-12 Thread via GitHub
hsiang-c commented on issue #2118: URL: https://github.com/apache/datafusion-comet/issues/2118#issuecomment-3182226876 This is fixed by https://github.com/apache/iceberg/pull/13793

Re: [I] [iceberg] `Comet execution only takes Arrow Arrays, but got class org.apache.iceberg.spark.data.vectorized.ColumnVectorWithFilter` [datafusion-comet]

2025-08-12 Thread via GitHub
hsiang-c commented on issue #2117: URL: https://github.com/apache/datafusion-comet/issues/2117#issuecomment-3182224035 This is fixed by https://github.com/apache/iceberg/pull/13793

Re: [I] [iceberg] Storage Partition Join (SPJ) returns mismatch results [datafusion-comet]

2025-08-12 Thread via GitHub
hsiang-c commented on issue #2119: URL: https://github.com/apache/datafusion-comet/issues/2119#issuecomment-3182225278 This is fixed by https://github.com/apache/iceberg/pull/13793

Re: [PR] Rewrite Nested Loop Join executor for 5× speed and 1% memory usage [datafusion]

2025-08-12 Thread via GitHub
2010YOUY01 commented on PR #16996: URL: https://github.com/apache/datafusion/pull/16996#issuecomment-3182175534 There are no new changes included. The speedup reaches 5× simply because the NLJ micro-benchmark is extended with cases where the join predicate is very cheap to evaluate (see ht

Re: [PR] Rewrite Nested Loop Join executor for 3.5× speed and 1% memory usage [datafusion]

2025-08-12 Thread via GitHub
2010YOUY01 commented on code in PR #16996: URL: https://github.com/apache/datafusion/pull/16996#discussion_r2272059908 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -660,529 +691,1168 @@ async fn collect_left_input( )) } -/// This enumeration represent

Re: [PR] chore(deps): Update sqlparser to 0.56 [datafusion]

2025-08-12 Thread via GitHub
Jefffrey commented on PR #16456: URL: https://github.com/apache/datafusion/pull/16456#issuecomment-3182124169 @Dimchikkk would you be able to pick up the work involving upgrade to 0.58 as mentioned? Otherwise I can build upon your work here to get this PR over the line 🙂

[PR] [Draft] More accurate memory accounting in external sort [datafusion]

2025-08-12 Thread via GitHub
ding-young opened a new pull request, #17163: URL: https://github.com/apache/datafusion/pull/17163 ## Which issue does this PR close? - Closes #14748 and #16979 . ## Rationale for this change ## What changes are included in this PR? ## A

Re: [PR] #17128 Add support for chr(0) [datafusion]

2025-08-12 Thread via GitHub
Jefffrey commented on code in PR #17131: URL: https://github.com/apache/datafusion/pull/17131#discussion_r2271916893 ## datafusion/functions/src/string/ascii.rs: ## @@ -30,7 +30,7 @@ use std::sync::Arc; #[user_doc( doc_section(label = "String Functions"), -descriptio

Re: [PR] feat: Add optional extended metrics to sort_batch function [datafusion]

2025-08-12 Thread via GitHub
ding-young commented on code in PR #17147: URL: https://github.com/apache/datafusion/pull/17147#discussion_r2271887118 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -430,6 +430,9 @@ pub(crate) struct GroupedHashAggregateStream { /// Execution metrics

Re: [PR] Update dev env documentation to reflect pinned rust version [datafusion]

2025-08-12 Thread via GitHub
Jefffrey merged PR #17107: URL: https://github.com/apache/datafusion/pull/17107

Re: [PR] Fix column definition `COLLATE` parsing [datafusion-sqlparser-rs]

2025-08-12 Thread via GitHub
mvzink commented on code in PR #1986: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1986#discussion_r2271878827 ## src/parser/mod.rs: ## @@ -1248,6 +1248,12 @@ impl<'a> Parser<'a> { debug!("parsing expr"); let mut expr = self.parse_prefix()?; +

Re: [PR] fix: Iceberg scan buffer reuse [WIP] [datafusion-comet]

2025-08-12 Thread via GitHub
codecov-commenter commented on PR #2135: URL: https://github.com/apache/datafusion-comet/pull/2135#issuecomment-3181880951 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2135?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-12 Thread via GitHub
coderfender commented on PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#issuecomment-3181875196 Thank you very much for merging the feature branch, Andy. I created a new issue to extend these changes and support ANSI mode for the above arithmetic operations: #2137 (and r

Re: [PR] Update tests due to new simplification rules [datafusion-testing]

2025-08-12 Thread via GitHub
Omega359 commented on PR #10: URL: https://github.com/apache/datafusion-testing/pull/10#issuecomment-3181873419 I'll go through this tomorrow

Re: [I] Enable ANSI support for arithmetic operations [datafusion-comet]

2025-08-12 Thread via GitHub
coderfender commented on issue #2137: URL: https://github.com/apache/datafusion-comet/issues/2137#issuecomment-3181872476 Working on this (raised a draft PR) WIP.

[I] Enable ANSI support for arithmetic operations [datafusion-comet]

2025-08-12 Thread via GitHub
coderfender opened a new issue, #2137: URL: https://github.com/apache/datafusion-comet/issues/2137 ### What is the problem the feature request solves? Now that Try eval mode is supported for native comet expressions, the goal is to extend it further and enable ANSI support ###

[PR] feat: init_ansi_mode_enabled [datafusion-comet]

2025-08-12 Thread via GitHub
coderfender opened a new pull request, #2136: URL: https://github.com/apache/datafusion-comet/pull/2136 ## Which issue does this PR close? Support ANSI mode for arithmetic operations in Spark Closes #. ## Rationale for this change ## What changes are includ

[PR] fix: Iceberg scan buffer reuse [WIP] [datafusion-comet]

2025-08-12 Thread via GitHub
andygrove opened a new pull request, #2135: URL: https://github.com/apache/datafusion-comet/pull/2135 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/2131 ## Rationale for this change Needed for Iceberg integrati

Re: [PR] fix: Quick fix for memory corruption in SortExec queries [datafusion-comet]

2025-08-12 Thread via GitHub
andygrove closed pull request #2132: fix: Quick fix for memory corruption in SortExec queries URL: https://github.com/apache/datafusion-comet/pull/2132

[I] Remove emitWarning and add config option to log fallback reasons [datafusion-comet]

2025-08-12 Thread via GitHub
andygrove opened a new issue, #2134: URL: https://github.com/apache/datafusion-comet/issues/2134 ### What is the problem the feature request solves? When Comet falls back to Spark, we tag the operator or expression with the reason using `withInfo`. Sometimes (but not often) we

Re: [PR] refactor `character_length` impl by unifying null handling logic [datafusion]

2025-08-12 Thread via GitHub
waynexia merged PR #16877: URL: https://github.com/apache/datafusion/pull/16877

Re: [I] Add Memory Profiling Functionality [datafusion]

2025-08-12 Thread via GitHub
zheniasigayev commented on issue #14510: URL: https://github.com/apache/datafusion/issues/14510#issuecomment-3181216887 Very exciting progress! One thing that would be incredibly helpful is to extend the configuration flag, `--top-memory-consumers` from the DataFusion CLI into the configs (

Re: [PR] ci: add typo checker [datafusion]

2025-08-12 Thread via GitHub
waynexia commented on code in PR #17135: URL: https://github.com/apache/datafusion/pull/17135#discussion_r2271414447 ## .github/workflows/rust.yml: ## @@ -781,3 +781,11 @@ jobs: - name: Check datafusion-proto working-directory: datafusion/proto run: carg

Re: [I] [iceberg] Document configuration flags needed for Comet Iceberg to work correctly [datafusion-comet]

2025-08-12 Thread via GitHub
hsiang-c commented on issue #2092: URL: https://github.com/apache/datafusion-comet/issues/2092#issuecomment-3181198077 One more in `0.10.0` release ```shell "spark.comet.exec.shuffle.enabled" -> "false" ```

Re: [PR] ci: add typo checker [datafusion]

2025-08-12 Thread via GitHub
waynexia commented on PR #17135: URL: https://github.com/apache/datafusion/pull/17135#issuecomment-3181192719 Thank you! I removed the job from this PR. I'll check if it's permitted while waiting for this PR to merge, and file another one dedicated for the checker

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-12 Thread via GitHub
andygrove merged PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073

Re: [I] try_ arithmetic functions return incorrect results [datafusion-comet]

2025-08-12 Thread via GitHub
andygrove closed issue #2021: try_ arithmetic functions return incorrect results URL: https://github.com/apache/datafusion-comet/issues/2021

Re: [PR] refactor `character_length` impl by unifying null handling logic [datafusion]

2025-08-12 Thread via GitHub
waynexia commented on PR #16877: URL: https://github.com/apache/datafusion/pull/16877#issuecomment-3181185852 >Interestingly the times are significantly slower than what was listed above - this was run using Rust 1.89 on a m7i.4xlarge instance in aws. I run it with Rust 1.89 on AMD 79

Re: [I] Clean up APIs around `FileScanConfigBuilder`, `FileScanConfig` and `FileSource` [datafusion]

2025-08-12 Thread via GitHub
friendlymatthew commented on issue #15952: URL: https://github.com/apache/datafusion/issues/15952#issuecomment-3181185291 Hi, I've been staring at `FileScanConfig` for the past day now and had some thoughts about the redesign. ## One approach: create a new struct `FileScan` over the

Re: [PR] Pass the input schema to stats_projection for ProjectionExpr [datafusion]

2025-08-12 Thread via GitHub
hareshkh commented on PR #17123: URL: https://github.com/apache/datafusion/pull/17123#issuecomment-3181171024 @alamb : Yes, once this PR merges, I can create a PR to the `branch-49`

Re: [I] [iceberg] SIGSEGV while testing Storage Partition Join [datafusion-comet]

2025-08-12 Thread via GitHub
hsiang-c commented on issue #2121: URL: https://github.com/apache/datafusion-comet/issues/2121#issuecomment-3181156145 The `SIGSEGV` issue is gone in the latest CI run: https://github.com/apache/datafusion-comet/actions/runs/16919663072/job/47943854988

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-12 Thread via GitHub
coderfender commented on PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#issuecomment-3181147532 @andygrove the checks have all passed. Thank you for your approval. Please merge once you get a chance

Re: [PR] Update tests due to new simplification rules [datafusion-testing]

2025-08-12 Thread via GitHub
alamb commented on PR #10: URL: https://github.com/apache/datafusion-testing/pull/10#issuecomment-3181146539 Ok, I just pushed the updates due to the latest changes on main. I think the changes here now look good / expected, but would appreciate another set of eyes (I also need another com

Re: [PR] feat: Make parquet_encryption a non-default feature [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #17137: URL: https://github.com/apache/datafusion/pull/17137#issuecomment-3181138557 Updating to get clean CI run

Re: [I] Test TopK Dynamic Filter Optimization w/ ORDER BY on multiple columns [datafusion]

2025-08-12 Thread via GitHub
adriangb commented on issue #16464: URL: https://github.com/apache/datafusion/issues/16464#issuecomment-3181137065 @alamb maybe we've fixed it or it was never reproducible and unfortunately I can't find your original comment, but I'm adding a test in https://github.com/apache/datafusion/pul

Re: [I] `security_audit` CI check is failing on main [datafusion]

2025-08-12 Thread via GitHub
alamb closed issue #17144: `security_audit` CI check is failing on main URL: https://github.com/apache/datafusion/issues/17144

Re: [PR] chore(deps): bump slab from 0.4.10 to 0.4.11 [datafusion]

2025-08-12 Thread via GitHub
alamb merged PR #17161: URL: https://github.com/apache/datafusion/pull/17161

[I] Inconsistent handling of CopyExec with string expressions [datafusion-comet]

2025-08-12 Thread via GitHub
andygrove opened a new issue, #2133: URL: https://github.com/apache/datafusion-comet/issues/2133 ### Describe the bug In `QueryPlanSerde`, there is specific handling of some string expressions for `FilterExec`. ```scala // Some native expressions do not support ope

[PR] add test for multi-column topk dynamic filter pushdown [datafusion]

2025-08-12 Thread via GitHub
adriangb opened a new pull request, #17162: URL: https://github.com/apache/datafusion/pull/17162 closes #16464

Re: [PR] feat: Support `PiecewiseMergeJoin` to speed up single range predicate joins [datafusion]

2025-08-12 Thread via GitHub
jonathanc-n commented on PR #16660: URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3181045863 Yes I think so too, I don't know if it will be worth the complexity though since this is a very niche workload (single range filter + higher selectivity for filter) I thin

Re: [PR] chore(deps): bump apache-avro from 0.17.0 to 0.18.0 [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #16092: URL: https://github.com/apache/datafusion/pull/16092#issuecomment-3181042350 @dependabot rebase

Re: [PR] chore(deps): bump apache-avro from 0.17.0 to 0.18.0 [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #16092: URL: https://github.com/apache/datafusion/pull/16092#issuecomment-3181044646 @dependabot recreate

Re: [PR] chore(deps): bump apache-avro from 0.17.0 to 0.18.0 [datafusion]

2025-08-12 Thread via GitHub
dependabot[bot] commented on PR #16092: URL: https://github.com/apache/datafusion/pull/16092#issuecomment-3181042476 Looks like this PR has been edited by someone other than Dependabot. That means Dependabot can't rebase it - sorry! If you're happy for Dependabot to recreate it from s

Re: [PR] chore(deps): bump on-headers and compression in /datafusion/wasmtest/datafusion-wasm-app [datafusion]

2025-08-12 Thread via GitHub
alamb merged PR #16812: URL: https://github.com/apache/datafusion/pull/16812

Re: [PR] chore(deps): bump clap from 4.5.43 to 4.5.44 [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #17148: URL: https://github.com/apache/datafusion/pull/17148#issuecomment-3181039345 Security audit will be fixed via - https://github.com/apache/datafusion/pull/17161

Re: [PR] chore(deps): bump thiserror from 2.0.12 to 2.0.14 [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #17160: URL: https://github.com/apache/datafusion/pull/17160#issuecomment-3181037086 Security audit failing due to - https://github.com/apache/datafusion/pull/17161

Re: [I] `security_audit` CI check is failing on main [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #17144: URL: https://github.com/apache/datafusion/issues/17144#issuecomment-3181035671 Dependabot to the rescue: - https://github.com/apache/datafusion/pull/17161

[PR] chore(deps): bump slab from 0.4.10 to 0.4.11 [datafusion]

2025-08-12 Thread via GitHub
dependabot[bot] opened a new pull request, #17161: URL: https://github.com/apache/datafusion/pull/17161 Bumps [slab](https://github.com/tokio-rs/slab) from 0.4.10 to 0.4.11. Release notes Sourced from slab's releases (https://github.com/tokio-rs/slab/releases). v0.4.11

Re: [I] `security_audit` CI check is failing on main [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #17144: URL: https://github.com/apache/datafusion/issues/17144#issuecomment-3181026960 Example failure: https://github.com/apache/datafusion/actions/runs/16910959426/job/47912188340?pr=17137

Re: [PR] feat: Make parquet_encryption a non-default feature [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #17137: URL: https://github.com/apache/datafusion/pull/17137#issuecomment-3181026261 CI failure tracked by - https://github.com/apache/datafusion/issues/17144

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-08-12 Thread via GitHub
BlakeOrth commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3180996128 @alamb > If you are willing to potentially help with this work, I can spec it out in a ticket / epic. This sounds great. As you've likely noted I've already star

Re: [PR] fix: rpad_bug_fix [datafusion-comet]

2025-08-12 Thread via GitHub
coderfender commented on code in PR #2099: URL: https://github.com/apache/datafusion-comet/pull/2099#discussion_r2271180183 ## native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs: ## @@ -71,44 +101,101 @@ fn spark_read_side_padding2( } } +enum RPa

Re: [PR] Differentiate 0-row and 1-row EmptyRelation in EXPLAIN [datafusion]

2025-08-12 Thread via GitHub
findepi commented on PR #17145: URL: https://github.com/apache/datafusion/pull/17145#issuecomment-3180964097 Rebased after https://github.com/apache/datafusion/pull/17139 merged, to resolve logical conflicts.

Re: [PR] fix: rpad_bug_fix [datafusion-comet]

2025-08-12 Thread via GitHub
coderfender commented on code in PR #2099: URL: https://github.com/apache/datafusion-comet/pull/2099#discussion_r2271177077 ## native/spark-expr/src/static_invoke/char_varchar_utils/read_side_padding.rs: ## @@ -71,44 +101,101 @@ fn spark_read_side_padding2( } } +enum RPa

[PR] chore(deps): bump thiserror from 2.0.12 to 2.0.14 [datafusion]

2025-08-12 Thread via GitHub
dependabot[bot] opened a new pull request, #17160: URL: https://github.com/apache/datafusion/pull/17160 Bumps [thiserror](https://github.com/dtolnay/thiserror) from 2.0.12 to 2.0.14. Release notes Sourced from thiserror's releases (https://github.com/dtolnay/thiserror/releases).

Re: [PR] Update tests due to new simplification rules [datafusion-testing]

2025-08-12 Thread via GitHub
findepi commented on PR #10: URL: https://github.com/apache/datafusion-testing/pull/10#issuecomment-3180945572 My 2c is that, if the Upgrade guide starts to look like a change log / release notes, it will lose its core value. IMO it should contain only actionable information, so that all the

Re: [PR] #17128 Add support for chr(0) [datafusion]

2025-08-12 Thread via GitHub
pepijnve commented on code in PR #17131: URL: https://github.com/apache/datafusion/pull/17131#discussion_r2271156842 ## datafusion/functions/src/string/chr.rs: ## @@ -47,22 +47,14 @@ pub fn chr(args: &[ArrayRef]) -> Result { for integer in integer_array { match int

Re: [PR] fix: use spark ParquetFilters [datafusion-comet]

2025-08-12 Thread via GitHub
parthchandra commented on PR #2100: URL: https://github.com/apache/datafusion-comet/pull/2100#issuecomment-3180938744 @josh0yeh The PR is causing regressions. Specifically, tests are failing because of - ``` [info] Cause: java.lang.UnsupportedOperationException: visit in is not

Re: [PR] Eliminate all redundant aggregations [datafusion]

2025-08-12 Thread via GitHub
findepi merged PR #17139: URL: https://github.com/apache/datafusion/pull/17139

Re: [I] Redundant aggregation elimination regression [datafusion]

2025-08-12 Thread via GitHub
findepi closed issue #17138: Redundant aggregation elimination regression URL: https://github.com/apache/datafusion/issues/17138

Re: [PR] Update tests due to new simplification rules [datafusion-testing]

2025-08-12 Thread via GitHub
alamb commented on PR #10: URL: https://github.com/apache/datafusion-testing/pull/10#issuecomment-3180929338 > Nothing changed in this regard, and IMO it's waste of reader's time to mention that in Upgrade guide. I think it is worth putting in the upgrade guide so that if anyone hits

Re: [PR] Update tests due to new simplification rules [datafusion-testing]

2025-08-12 Thread via GitHub
alamb commented on code in PR #10: URL: https://github.com/apache/datafusion-testing/pull/10#discussion_r2271148887 ## data/sqlite/random/aggregates/slt_good_119.slt: ## @@ -670,13 +670,10 @@ SELECT ALL * FROM tab1 WHERE NOT + - CAST ( NULL AS INTEGER ) + - col2 - + col0

Re: [PR] chore: Clarify `EmptyRelation` description [datafusion]

2025-08-12 Thread via GitHub
comphead merged PR #17157: URL: https://github.com/apache/datafusion/pull/17157

Re: [PR] Eliminate all redundant aggregations [datafusion]

2025-08-12 Thread via GitHub
findepi commented on PR #17139: URL: https://github.com/apache/datafusion/pull/17139#issuecomment-3180929800 Merging to unblock - https://github.com/apache/datafusion-testing/pull/10.

Re: [PR] Update tests due to new simplification rules [datafusion-testing]

2025-08-12 Thread via GitHub
findepi commented on code in PR #10: URL: https://github.com/apache/datafusion-testing/pull/10#discussion_r2271142832 ## data/sqlite/random/aggregates/slt_good_119.slt: ## @@ -670,13 +670,10 @@ SELECT ALL * FROM tab1 WHERE NOT + - CAST ( NULL AS INTEGER ) + - col2 - + col0 ---

Re: [PR] Eliminate all redundant aggregations [datafusion]

2025-08-12 Thread via GitHub
alamb commented on code in PR #17139: URL: https://github.com/apache/datafusion/pull/17139#discussion_r2271141129 ## datafusion/sqllogictest/test_files/issue_17138.slt: ## @@ -0,0 +1,36 @@ +statement ok Review Comment: Yeah, maybe that would be better. But if there is alread

Re: [PR] Eliminate all redundant aggregations [datafusion]

2025-08-12 Thread via GitHub
alamb commented on code in PR #17139: URL: https://github.com/apache/datafusion/pull/17139#discussion_r2271139363 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -153,23 +153,16 @@ fn optimize_projections( // Only use the absolutely necessary aggreg

Re: [I] Replace `AggregateUDFImpl::{equals,hash_value}` with `UdfHash`, `UdfEq` traits [datafusion]

2025-08-12 Thread via GitHub
findepi closed issue #16872: Replace `AggregateUDFImpl::{equals,hash_value}` with `UdfHash`, `UdfEq` traits URL: https://github.com/apache/datafusion/issues/16872

Re: [I] Implement PartialEq, Hash for all UDAFs (`AggregateUDFImpl`) [datafusion]

2025-08-12 Thread via GitHub
findepi closed issue #16869: Implement PartialEq, Hash for all UDAFs (`AggregateUDFImpl`) URL: https://github.com/apache/datafusion/issues/16869

Re: [PR] Derive `AggregateUDFImpl` equality, hash from `Eq`, `Hash` traits [datafusion]

2025-08-12 Thread via GitHub
findepi merged PR #17130: URL: https://github.com/apache/datafusion/pull/17130

Re: [PR] Update tests due to new simplification rules [datafusion-testing]

2025-08-12 Thread via GitHub
findepi commented on PR #10: URL: https://github.com/apache/datafusion-testing/pull/10#issuecomment-3180920166 > What do we think about adding a note to the upgrade guide explaining that some queries may now succeed (b/c they are optimized away)? That might help anyone upgrading and seeing

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3180916288 > At the end of the day I'm going to be working on some way to get listing results cached, and I'd much rather make those changes here to contribute back to open source than keep

Re: [PR] chore: Enforce checks for RC branches [datafusion]

2025-08-12 Thread via GitHub
findepi commented on code in PR #17132: URL: https://github.com/apache/datafusion/pull/17132#discussion_r2271130926 ## .asf.yaml: ## @@ -50,6 +50,74 @@ github: main: required_pull_request_reviews: required_approving_review_count: 1 +# needs to be updated

Re: [I] Incorrect implementation of `PartialOrd` for `ScalarUDF`, `WindowUDF` and `AggregateUDF` [datafusion]

2025-08-12 Thread via GitHub
findepi commented on issue #17064: URL: https://github.com/apache/datafusion/issues/17064#issuecomment-3180908298 Getting it consistent would require delegating to UDF impl, just like we do with Eq and Hash in https://github.com/apache/datafusion/issues/16677. (and thus a breaking change).

Re: [PR] Eliminate all redundant aggregations [datafusion]

2025-08-12 Thread via GitHub
findepi commented on code in PR #17139: URL: https://github.com/apache/datafusion/pull/17139#discussion_r2271116423 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -153,23 +153,16 @@ fn optimize_projections( // Only use the absolutely necessary aggr

Re: [I] Optimize `ORDER BY time DESC LIMIT 1` queries ( TopK or aggr rewrite) [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #17098: URL: https://github.com/apache/datafusion/issues/17098#issuecomment-3180893182 sure -- feel free @Thearas However, FYI I am not sure how easy/hard this will be

Re: [PR] Docs: Consolidate feature proposal content into roadmap [datafusion]

2025-08-12 Thread via GitHub
alamb commented on code in PR #17156: URL: https://github.com/apache/datafusion/pull/17156#discussion_r2271089079 ## docs/source/contributor-guide/roadmap.md: ## @@ -56,3 +56,71 @@ For more information: 1. [Search for issues labeled `roadmap`](https://github.com/apache/datafus

Re: [PR] Pass `batch_size` directly when creating file opener [datafusion]

2025-08-12 Thread via GitHub
comphead commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3180870723 > Thank you @friendlymatthew and @adriangb -- I think this change makes sense to me > > Is it ok with you too @comphead ? I think we were inclined to discuss refactor `

Re: [I] [Parquet Metadata Cache] Add an API to review the contents of the Cache [datafusion]

2025-08-12 Thread via GitHub
alamb closed issue #17091: [Parquet Metadata Cache] Add an API to review the contents of the Cache URL: https://github.com/apache/datafusion/issues/17091

Re: [PR] feat: Add the ability to review the contents of the Metadata Cache [datafusion]

2025-08-12 Thread via GitHub
alamb merged PR #17126: URL: https://github.com/apache/datafusion/pull/17126

Re: [PR] feat: Add the ability to review the contents of the Metadata Cache [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #17126: URL: https://github.com/apache/datafusion/pull/17126#issuecomment-3180863758 Thanks again @nuno-faria

Re: [I] Update workspace to use Rust 1.89 [datafusion]

2025-08-12 Thread via GitHub
alamb closed issue #17072: Update workspace to use Rust 1.89 URL: https://github.com/apache/datafusion/issues/17072

Re: [PR] Update workspace to use Rust 1.89 [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #17100: URL: https://github.com/apache/datafusion/pull/17100#issuecomment-3180860848 🚀

Re: [PR] Update workspace to use Rust 1.89 [datafusion]

2025-08-12 Thread via GitHub
alamb merged PR #17100: URL: https://github.com/apache/datafusion/pull/17100

Re: [PR] Pass `batch_size` directly when creating file opener [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3180857144 > > Do we have an issue/RFC to discuss the future refactor? > > The closest thing to that in my mind is #15952 I marked this one with my experimental "PROPOSED EPIC" tag

Re: [PR] Eliminate all redundant aggregations [datafusion]

2025-08-12 Thread via GitHub
comphead commented on code in PR #17139: URL: https://github.com/apache/datafusion/pull/17139#discussion_r2271059592 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -153,23 +153,16 @@ fn optimize_projections( // Only use the absolutely necessary agg

Re: [PR] feat: add `datafusion-physical-adapter`, implement predicate adaptation missing fields of structs [datafusion]

2025-08-12 Thread via GitHub
alamb commented on PR #16589: URL: https://github.com/apache/datafusion/pull/16589#issuecomment-3180841833 @kosiew is this PR ok with you?

Re: [I] [Epic] A collection of Substrait conversion issues [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #16248: URL: https://github.com/apache/datafusion/issues/16248#issuecomment-3180838120 Here is another list of potential substrait issues - https://github.com/apache/datafusion/issues/17159

Re: [I] [EPIC] Tracking issue of support substrait logical plan [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #8149: URL: https://github.com/apache/datafusion/issues/8149#issuecomment-3180833881 I consolidated the outstanding substrait work in a new epic, so let's close this one and continue discussion there - https://github.com/apache/datafusion/issues/17159

Re: [I] [EPIC] Tracking issue of support substrait logical plan [datafusion]

2025-08-12 Thread via GitHub
alamb closed issue #8149: [EPIC] Tracking issue of support substrait logical plan URL: https://github.com/apache/datafusion/issues/8149

Re: [I] [EPIC] Substrait: Add producer and consumer for physical plans [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #5173: URL: https://github.com/apache/datafusion/issues/5173#issuecomment-3180833582 I consolidated the outstanding substrait work in a new epic, so let's close this one and continue discussion there - https://github.com/apache/datafusion/issues/17159

Re: [I] [EPIC] Substrait: Add producer and consumer for physical plans [datafusion]

2025-08-12 Thread via GitHub
alamb closed issue #5173: [EPIC] Substrait: Add producer and consumer for physical plans URL: https://github.com/apache/datafusion/issues/5173

[I] [EPIC] Substrait: Additional features for producer and consumer for physical plans [datafusion]

2025-08-12 Thread via GitHub
alamb opened a new issue, #17159: URL: https://github.com/apache/datafusion/issues/17159 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** This is a collection of remaining items for substrait physical plans **Describe the sol

Re: [I] [EPIC] Optimize the `CONCAT` expression in logical plan [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #3599: URL: https://github.com/apache/datafusion/issues/3599#issuecomment-3180812282 I filed the last remaining issue as - https://github.com/apache/datafusion/issues/17158 I don't think the epic needs to be open anymore -- we can just track the work in the

Re: [I] [EPIC] Optimize the `CONCAT` expression in logical plan [datafusion]

2025-08-12 Thread via GitHub
alamb closed issue #3599: [EPIC] Optimize the `CONCAT` expression in logical plan URL: https://github.com/apache/datafusion/issues/3599

Re: [I] Discussion: DataFusion Improvement Proposal (DIPs) Process? [datafusion]

2025-08-12 Thread via GitHub
alamb commented on issue #16886: URL: https://github.com/apache/datafusion/issues/16886#issuecomment-3180796316 Also, for my own sanity I tried out adding some new [labels](https://github.com/apache/datafusion/issues/labels) to help find proposals Specifically, I added two labels [`E

[I] `col1 || 'a' || 'b' || col2 -> col1 || 'ab' || col2` [datafusion]

2025-08-12 Thread via GitHub
alamb opened a new issue, #17158: URL: https://github.com/apache/datafusion/issues/17158 (no comment)

Re: [PR] fix: Quick fix for memory corruption in SortExec queries [datafusion-comet]

2025-08-12 Thread via GitHub
codecov-commenter commented on PR #2132: URL: https://github.com/apache/datafusion-comet/pull/2132#issuecomment-3180765800 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2132?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Eliminate all redundant aggregations [datafusion]

2025-08-12 Thread via GitHub
findepi commented on code in PR #17139: URL: https://github.com/apache/datafusion/pull/17139#discussion_r2270995227 ## datafusion/optimizer/src/optimize_projections/mod.rs: ## @@ -153,23 +153,16 @@ fn optimize_projections( // Only use the absolutely necessary aggr
