Re: [I] Postgres: Support negative scale for `NUMERIC` [datafusion-sqlparser-rs]

2025-08-08 Thread via GitHub
iffyio closed issue #1923: Postgres: Support negative scale for `NUMERIC` URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1923 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Postgres: enhance NUMERIC/DECIMAL parsing to support negative scale [datafusion-sqlparser-rs]

2025-08-08 Thread via GitHub
iffyio merged PR #1990: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1990

Re: [I] Postgres: Incorrect ending line reported in `Span` for multi-line query [datafusion-sqlparser-rs]

2025-08-08 Thread via GitHub
iffyio closed issue #1858: Postgres: Incorrect ending line reported in `Span` for multi-line query URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1858

Re: [PR] feat: Include end token in `ALTER TABLE` statement [datafusion-sqlparser-rs]

2025-08-08 Thread via GitHub
iffyio merged PR #1999: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1999

Re: [PR] feat: Use Cached Metadata for ListingTable Statistics [datafusion]

2025-08-08 Thread via GitHub
shehabgamin commented on code in PR #17022: URL: https://github.com/apache/datafusion/pull/17022#discussion_r2264513839 ## datafusion/core/tests/parquet/page_pruning.rs: ## @@ -903,8 +903,8 @@ async fn without_pushdown_filter() { ) .unwrap(); -// Same amount of b

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-08-08 Thread via GitHub
comphead commented on code in PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2264507470 ## native/proto/src/proto/types.proto: ## @@ -0,0 +1,41 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [I] Implement write support in VirtualObjectStore [datafusion]

2025-08-08 Thread via GitHub
kosiew commented on issue #17086: URL: https://github.com/apache/datafusion/issues/17086#issuecomment-3169846380 After more thought on this, I think it is more reasonable that VirtualObjectStore stays read-only. In practice when you write you already know which backend you want (S3, local F

Re: [I] Implement write support in VirtualObjectStore [datafusion]

2025-08-08 Thread via GitHub
kosiew closed issue #17086: Implement write support in VirtualObjectStore URL: https://github.com/apache/datafusion/issues/17086

[PR] Revive to use upstream arrow coalesce [datafusion]

2025-08-08 Thread via GitHub
zhuqi-lucas opened a new pull request, #17105: URL: https://github.com/apache/datafusion/pull/17105 ## Which issue does this PR close? Revive https://github.com/apache/datafusion/pull/16249 ## Rationale for this change Revive https://github.com/apache/datafusion/pull/

Re: [PR] Feat: Revive to use upstream arrow coalesce [datafusion]

2025-08-08 Thread via GitHub
zhuqi-lucas commented on PR #17105: URL: https://github.com/apache/datafusion/pull/17105#issuecomment-3169768311 FYI @alamb @Dandandan I am trying to revive the PR https://github.com/apache/datafusion/pull/16249, and we may run the benchmark for this PR to see if anything has changed since then, th

Re: [PR] [BLOG] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
zhuqi-lucas commented on code in PR #99: URL: https://github.com/apache/datafusion-site/pull/99#discussion_r2264201932 ## content/blog/2025-08-15-external-parquet-indexes.md: ## @@ -0,0 +1,772 @@ +--- +layout: post +title: Using External Indexes, Metadata Stores, Catalogs and Ca

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-08-08 Thread via GitHub
github-actions[bot] closed pull request #16166: Set Formatted TableOptions Enum URL: https://github.com/apache/datafusion/pull/16166

Re: [PR] Test experiment load page index with columns [datafusion]

2025-08-08 Thread via GitHub
github-actions[bot] commented on PR #16329: URL: https://github.com/apache/datafusion/pull/16329#issuecomment-3169654735 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Chore: refactor datetime related expressions out of QueryPlanSerde [datafusion-comet]

2025-08-08 Thread via GitHub
andygrove merged PR #2085: URL: https://github.com/apache/datafusion-comet/pull/2085

Re: [PR] [BLOG] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
shehabgamin commented on PR #99: URL: https://github.com/apache/datafusion-site/pull/99#issuecomment-3169605025 Really solid blog post!

Re: [PR] Postgres: enhance NUMERIC/DECIMAL parsing to support negative scale [datafusion-sqlparser-rs]

2025-08-08 Thread via GitHub
IndexSeek commented on code in PR #1990: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1990#discussion_r2264190270 ## src/parser/mod.rs: ## @@ -11229,6 +11229,30 @@ impl<'a> Parser<'a> { } } +/// Parse an optionally signed integer literal. +
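
The diff excerpt above is cut off right after the doc comment "Parse an optionally signed integer literal." As a rough illustration of what such a helper has to handle for a declaration like `NUMERIC(2, -3)`, here is a minimal standalone sketch. The function name `parse_signed_integer` and its string-based signature are assumptions made for this example; it is not the code from the PR and does not use the sqlparser tokenizer.

```rust
/// Hypothetical helper (not sqlparser's API): parse an optionally signed
/// integer literal such as `-3`, `+2` or `5` from a string slice.
/// Returns `None` for anything that is not a sign followed by ASCII digits.
pub fn parse_signed_integer(text: &str) -> Option<i64> {
    let text = text.trim();

    // Split off at most one leading sign character.
    let (negative, digits) = match text.strip_prefix('-') {
        Some(rest) => (true, rest),
        None => (false, text.strip_prefix('+').unwrap_or(text)),
    };

    // Reject empty input and anything that is not purely digits,
    // so inputs like `--1` or `+-1` are not accepted.
    if digits.is_empty() || !digits.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Parsing the magnitude as i64 means i64::MIN itself is rejected;
    // that is acceptable for a sketch about small scale values.
    let magnitude: i64 = digits.parse().ok()?;
    Some(if negative { -magnitude } else { magnitude })
}
```

A real parser would work on tokens rather than raw strings, consuming an optional minus or plus token before the number token, but the accept/reject logic is the same idea.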

Re: [PR] fix: rpad_bug_fix [datafusion-comet]

2025-08-08 Thread via GitHub
codecov-commenter commented on PR #2099: URL: https://github.com/apache/datafusion-comet/pull/2099#issuecomment-3169626650 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2099?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: include end token in ALTER TABLE statement for span calculation [datafusion-sqlparser-rs]

2025-08-08 Thread via GitHub
IndexSeek commented on code in PR #1999: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1999#discussion_r2264184902 ## tests/sqlparser_common.rs: ## @@ -4758,6 +4758,25 @@ fn parse_alter_table() { } } +#[test] +fn alter_table_span_includes_semicolon() { +

Re: [PR] fix: Add missing member to visitor for ConfigFileEncryptionProperties [datafusion]

2025-08-08 Thread via GitHub
corwinjoy commented on PR #17103: URL: https://github.com/apache/datafusion/pull/17103#issuecomment-3169612864 @adamreeve

Re: [PR] fix: Add missing member to visitor for ConfigFileEncryptionProperties [datafusion]

2025-08-08 Thread via GitHub
corwinjoy commented on PR #17103: URL: https://github.com/apache/datafusion/pull/17103#issuecomment-3169612801 I'm unsure what's causing the failed CI test. I don't see an error locally, and it is flagging a nonexistent line number, so I don't know what to fix.

Re: [I] rpad expression panics if length input is not a literal value [datafusion-comet]

2025-08-08 Thread via GitHub
coderfender commented on issue #2096: URL: https://github.com/apache/datafusion-comet/issues/2096#issuecomment-3169605057 Raised a PR with explanation on the issue

Re: [PR] Implment Spark `map` function `map` [datafusion]

2025-08-08 Thread via GitHub
Standing-Man commented on PR #16940: URL: https://github.com/apache/datafusion/pull/16940#issuecomment-3169598741 Hi @alamb, no rush, but when you have some time, I’d love your review on this PR. Thanks in advance!

Re: [PR] fix: rpad_bug_fix [datafusion-comet]

2025-08-08 Thread via GitHub
coderfender commented on PR #2099: URL: https://github.com/apache/datafusion-comet/pull/2099#issuecomment-3169604791 @andygrove, the issue is that the implementation of `rpad` only supports the (col, int) signature. Rather than reverting to native Spark code, I went ahead and implemented native

Re: [PR] feat: Use Cached Metadata for ListingTable Statistics [datafusion]

2025-08-08 Thread via GitHub
shehabgamin commented on code in PR #17022: URL: https://github.com/apache/datafusion/pull/17022#discussion_r2264173720 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -968,19 +973,41 @@ pub async fn fetch_parquet_metadata( meta: &ObjectMeta, size_hint: Opti

Re: [PR] feat: Use Cached Metadata for ListingTable Statistics [datafusion]

2025-08-08 Thread via GitHub
shehabgamin commented on code in PR #17022: URL: https://github.com/apache/datafusion/pull/17022#discussion_r2264173426 ## datafusion/datasource-parquet/src/mod.rs: ## @@ -24,7 +24,7 @@ pub mod file_format; mod metrics; mod opener; mod page_filter; -mod reader; +pub mod reade

Re: [I] minor: Incorrect visit function for ConfigFileEncryption properties [datafusion]

2025-08-08 Thread via GitHub
corwinjoy commented on issue #17104: URL: https://github.com/apache/datafusion/issues/17104#issuecomment-3169598205 Fixed by https://github.com/apache/datafusion/pull/17103.

[I] minor: Incorrect visit function for ConfigFileEncryption properties [datafusion]

2025-08-08 Thread via GitHub
corwinjoy opened a new issue, #17104: URL: https://github.com/apache/datafusion/issues/17104 ### Describe the bug The visit function for ConfigFileEncryption properties failed to visit the member column_encryption_properties. ### To Reproduce _No response_ ### Exp

[PR] fix: Add missing member to visitor for ConfigFileEncryptionProperties [datafusion]

2025-08-08 Thread via GitHub
corwinjoy opened a new pull request, #17103: URL: https://github.com/apache/datafusion/pull/17103 ## Which issue does this PR close? - Closes #. ## Rationale for this change The visit function for `ConfigFileEncryption` properties failed to visit the member `colu

Re: [PR] fix: use spark ParquetFilters [datafusion-comet]

2025-08-08 Thread via GitHub
parthchandra commented on code in PR #2100: URL: https://github.com/apache/datafusion-comet/pull/2100#discussion_r2264165976 ## spark/src/main/scala/org/apache/comet/parquet/CometNativeFilters.scala: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (ASF) u

Re: [PR] fix: Make cast from float/double to decimal compatible with Spark [datafusion-comet]

2025-08-08 Thread via GitHub
parthchandra commented on PR #1915: URL: https://github.com/apache/datafusion-comet/pull/1915#issuecomment-3169545796 @andygrove It does look like the creator of dragonbox is recommending we fork their code. It does create some bloat if we include it in Comet. Is there any other alternativ

[PR] Make macros in common::test_util hygenic and not dependent on user dependencies [datafusion]

2025-08-08 Thread via GitHub
AdamGS opened a new pull request, #17102: URL: https://github.com/apache/datafusion/pull/17102 ## Which issue does this PR close? - Closes #17101. ## Rationale for this change Make both macros slightly nicer to use, and prevent some potentially confusing error messag

[I] A couple of macros in `datafusion-common` aren't hygenic, requiring the arrow crate as a dependency [datafusion]

2025-08-08 Thread via GitHub
AdamGS opened a new issue, #17101: URL: https://github.com/apache/datafusion/issues/17101 Both `create_array!` and `record_batch!` currently require `arrow` as a dependency exactly, not allowing users to pull subcrates or even re-use the re-exported arrow version.

Re: [I] [Parquet Metadata Cache] Document the parquet metadata cache [datafusion]

2025-08-08 Thread via GitHub
shruti2522 commented on issue #17048: URL: https://github.com/apache/datafusion/issues/17048#issuecomment-3169431248 take

Re: [PR] fix: Remove duplicate serde code [datafusion-comet]

2025-08-08 Thread via GitHub
kazuyukitanimura commented on code in PR #2098: URL: https://github.com/apache/datafusion-comet/pull/2098#discussion_r2264066285 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -852,62 +852,6 @@ object QueryPlanSerde extends Logging with CometExprShim

Re: [PR] feat: Support Array Literal [datafusion-comet]

2025-08-08 Thread via GitHub
parthchandra commented on code in PR #2057: URL: https://github.com/apache/datafusion-comet/pull/2057#discussion_r2264063147 ## native/proto/src/proto/types.proto: ## @@ -0,0 +1,41 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[PR] Update workspace to use Rust 1.89 [datafusion]

2025-08-08 Thread via GitHub
shruti2522 opened a new pull request, #17100: URL: https://github.com/apache/datafusion/pull/17100 ## Which issue does this PR close? - Closes #17072 . ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [PR] fix: use spark ParquetFilters [datafusion-comet]

2025-08-08 Thread via GitHub
codecov-commenter commented on PR #2100: URL: https://github.com/apache/datafusion-comet/pull/2100#issuecomment-3169397214 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2100?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: use spark ParquetFilters [datafusion-comet]

2025-08-08 Thread via GitHub
kazuyukitanimura commented on code in PR #2100: URL: https://github.com/apache/datafusion-comet/pull/2100#discussion_r2264041752 ## spark/src/main/scala/org/apache/comet/parquet/CometNativeFilters.scala: ## @@ -0,0 +1,211 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] fix: use spark ParquetFilters [datafusion-comet]

2025-08-08 Thread via GitHub
josh0yeh commented on PR #2100: URL: https://github.com/apache/datafusion-comet/pull/2100#issuecomment-3169331714 CC: @andygrove @parthchandra (please feel free to cc others I might overlook)

[PR] fix: use spark ParquetFilters [datafusion-comet]

2025-08-08 Thread via GitHub
josh0yeh opened a new pull request, #2100: URL: https://github.com/apache/datafusion-comet/pull/2100 Before Spark 3.4.3, `ParquetFilters` were added originally so we could shade Parquet in Comet. Since then, `ParquetFilters` has updated In/NotIn pushdown, which Comet needs. Remove `ParquetFilters

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-08 Thread via GitHub
coderfender commented on PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#issuecomment-3169248160 @andygrove, thank you for restarting the failed job; glad to see that the checks have all passed. Please review once you get a chance and let me know if you think we ne

Re: [PR] Derive `WindowUDFImpl` equality, hash from `Eq`, `Hash` traits [datafusion]

2025-08-08 Thread via GitHub
findepi commented on code in PR #17081: URL: https://github.com/apache/datafusion/pull/17081#discussion_r2263934637 ## datafusion/expr/src/udwf.rs: ## @@ -305,10 +305,7 @@ where /// .build() /// .unwrap(); /// ``` -pub trait WindowUDFImpl: Debug + Send + Sync { -/

Re: [PR] fix : implement_try_eval_mode_arithmetic [datafusion-comet]

2025-08-08 Thread via GitHub
coderfender commented on PR #2073: URL: https://github.com/apache/datafusion-comet/pull/2073#issuecomment-3169097452 One of the TPC-H checks failed with a network exception. @andygrove could you please re-trigger that workflow whenever you get a chance? Thank you

Re: [PR] Derive `WindowUDFImpl` equality, hash from `Eq`, `Hash` traits [datafusion]

2025-08-08 Thread via GitHub
findepi commented on PR #17081: URL: https://github.com/apache/datafusion/pull/17081#issuecomment-3169195271 The dependency PRs have been merged, will rebase - https://github.com/apache/datafusion/pull/17078 - https://github.com/apache/datafusion/pull/17080 - https://github.com/a

Re: [PR] Fill missing methods in aliased UDF impls [datafusion]

2025-08-08 Thread via GitHub
alamb commented on code in PR #17080: URL: https://github.com/apache/datafusion/pull/17080#discussion_r2263843097 ## datafusion/expr/src/udaf.rs: ## @@ -1059,6 +1056,7 @@ impl AliasedAggregateUDFImpl { } } +#[warn(clippy::missing_trait_methods)] // Delegates, so it shoul

Re: [PR] Fill missing methods in aliased UDF impls [datafusion]

2025-08-08 Thread via GitHub
alamb merged PR #17080: URL: https://github.com/apache/datafusion/pull/17080

Re: [PR] Add Memory Profiling Support to DataFusion CLI [datafusion]

2025-08-08 Thread via GitHub
alamb commented on code in PR #17021: URL: https://github.com/apache/datafusion/pull/17021#discussion_r2263831579 ## datafusion-cli/src/exec.rs: ## @@ -227,9 +227,19 @@ pub(super) async fn exec_and_print( let statements = DFParser::parse_sql_with_dialect(&sql, dialect.as_

Re: [PR] Add VirtualObjectStore to support routing paths to multiple ObjectStores [datafusion]

2025-08-08 Thread via GitHub
alamb commented on PR #17084: URL: https://github.com/apache/datafusion/pull/17084#issuecomment-3168958975 FYI @EmilyMatt -- I am not sure if you have seen what @kosiew is working on

Re: [PR] Pin github actions to commit sha [datafusion]

2025-08-08 Thread via GitHub
findepi commented on PR #16964: URL: https://github.com/apache/datafusion/pull/16964#issuecomment-3169060386 I think it would make sense to add some validation to GH workflows that all 3rd party actions are indeed sha-pinned. It can be yet another workflow, but it can also be just a Rust

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-08 Thread via GitHub
alamb commented on PR #16779: URL: https://github.com/apache/datafusion/pull/16779#issuecomment-3168980534 Thanks again @adamreeve

Re: [PR] Fix hash/equality issues for ScalarFunctionExpr [datafusion]

2025-08-08 Thread via GitHub
findepi commented on PR #17078: URL: https://github.com/apache/datafusion/pull/17078#issuecomment-3169039418 > `a == b` will effectively call `PartialEq` yes > and we dont need separate impl for `Eq`? `Eq` is a marker trait. Implementing `Eq` is a promise that `P
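
The reply above is truncated, but the point it makes about `PartialEq` and `Eq` can be illustrated with plain standard-library Rust. The sketch below uses a hypothetical `FuncCall` type (not a DataFusion type): `a == b` calls `PartialEq::eq`, while `Eq` adds no methods and only asserts that equality is a full equivalence relation, which hash-based collections rely on together with a `Hash` impl that agrees with `eq`.

```rust
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an expression node; not a DataFusion type.
#[derive(Debug)]
pub struct FuncCall {
    pub name: String,
    pub arg_count: usize,
}

// `a == b` desugars to `PartialEq::eq(&a, &b)`; this is the only place
// where the comparison logic lives.
impl PartialEq for FuncCall {
    fn eq(&self, other: &Self) -> bool {
        self.name == other.name && self.arg_count == other.arg_count
    }
}

// `Eq` is a marker trait with no methods. The empty impl is a promise that
// `eq` is reflexive as well as symmetric and transitive.
impl Eq for FuncCall {}

// `Hash` must agree with `eq`: values that compare equal must hash equally.
impl Hash for FuncCall {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.name.hash(state);
        self.arg_count.hash(state);
    }
}

/// Count distinct calls; `HashSet` requires both `Eq` and `Hash`.
pub fn distinct_calls(calls: Vec<FuncCall>) -> usize {
    calls.into_iter().collect::<HashSet<_>>().len()
}
```

If the `Hash` impl hashed a field that `eq` ignores, two equal values could land in different buckets and `distinct_calls` would over-count, which is the general class of hash/equality mismatch the PR title refers to.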

Re: [PR] Run config_docs CI check on PRs to change auto generated docs [datafusion]

2025-08-08 Thread via GitHub
findepi commented on code in PR #17046: URL: https://github.com/apache/datafusion/pull/17046#discussion_r2263819885 ## .github/workflows/docs_generated.yaml: ## @@ -0,0 +1,77 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreem

Re: [PR] feat: Add `Arc` to `ScalarFunctionArgs`, don't copy `ConfigOptions` on each query [datafusion]

2025-08-08 Thread via GitHub
findepi commented on code in PR #16970: URL: https://github.com/apache/datafusion/pull/16970#discussion_r2263815378 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -164,6 +175,42 @@ impl fmt::Display for ScalarFunctionExpr { } } +impl DynEq for ScalarFunctionEx

[PR] Improve Hash speed for ScalarFunctionExpr [datafusion]

2025-08-08 Thread via GitHub
findepi opened a new pull request, #17099: URL: https://github.com/apache/datafusion/pull/17099 implements: - https://github.com/apache/datafusion/pull/17078#discussion_r2263312644

Re: [PR] Fix hash/equality issues for ScalarFunctionExpr [datafusion]

2025-08-08 Thread via GitHub
findepi commented on code in PR #17078: URL: https://github.com/apache/datafusion/pull/17078#discussion_r2263813896 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -175,42 +173,47 @@ impl fmt::Display for ScalarFunctionExpr { } } -impl DynEq for ScalarFunctionE

Re: [PR] Fix hash/equality issues for ScalarFunctionExpr [datafusion]

2025-08-08 Thread via GitHub
findepi commented on code in PR #17078: URL: https://github.com/apache/datafusion/pull/17078#discussion_r2263811540 ## datafusion/physical-expr/src/scalar_function.rs: ## @@ -175,42 +173,47 @@ impl fmt::Display for ScalarFunctionExpr { } } -impl DynEq for ScalarFunctionE

Re: [PR] Fix hash/equality issues for ScalarFunctionExpr [datafusion]

2025-08-08 Thread via GitHub
findepi merged PR #17078: URL: https://github.com/apache/datafusion/pull/17078

Re: [PR] Run config_docs CI check on PRs to change auto generated docs [datafusion]

2025-08-08 Thread via GitHub
gopidesupavan commented on code in PR #17046: URL: https://github.com/apache/datafusion/pull/17046#discussion_r2263797875 ## .github/workflows/docs_generated.yaml: ## @@ -0,0 +1,77 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license

Re: [PR] (Re)Support old syntax for `approx_percentile_cont` and `approx_percentile_cont_with_weight` [datafusion]

2025-08-08 Thread via GitHub
alamb commented on PR #16999: URL: https://github.com/apache/datafusion/pull/16999#issuecomment-3169021559 > The docs need to be updated too. In 6c283dfca

Re: [I] Blog post about using external indexes with Parquet [datafusion]

2025-08-08 Thread via GitHub
alamb commented on issue #17010: URL: https://github.com/apache/datafusion/issues/17010#issuecomment-3168966927 The blog is ready for review PR: - https://github.com/apache/datafusion-site/pull/99 Rendered Preview: https://datafusion.staged.apache.org/blog/2025/08/15/externa

Re: [PR] Run config_docs CI check on PRs to change auto generated docs [datafusion]

2025-08-08 Thread via GitHub
alamb commented on code in PR #17046: URL: https://github.com/apache/datafusion/pull/17046#discussion_r2263770464 ## .github/workflows/docs_generated.yaml: ## @@ -0,0 +1,77 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreemen

Re: [PR] Run config_docs CI check on PRs to change auto generated docs [datafusion]

2025-08-08 Thread via GitHub
alamb commented on code in PR #17046: URL: https://github.com/apache/datafusion/pull/17046#discussion_r2263772727 ## .github/workflows/docs_generated.yaml: ## @@ -0,0 +1,77 @@ + +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreemen

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-08 Thread via GitHub
alamb merged PR #16779: URL: https://github.com/apache/datafusion/pull/16779

Re: [I] More flexible Parquet encryption configuration [datafusion]

2025-08-08 Thread via GitHub
alamb closed issue #16778: More flexible Parquet encryption configuration URL: https://github.com/apache/datafusion/issues/16778

Re: [PR] feat: Dynamic Parquet encryption and decryption properties [datafusion]

2025-08-08 Thread via GitHub
alamb commented on PR #16779: URL: https://github.com/apache/datafusion/pull/16779#issuecomment-3168980303 Let's get it merged so we can begin testing more easily @mbutrovich it would be great if you could test this with comet before the 50.0.0 release so that we can make any adjustm

Re: [PR] [branch-49] Backport "Add ExecutionPlan::reset_state (apache#17028)" to v49 [datafusion]

2025-08-08 Thread via GitHub
alamb merged PR #17096: URL: https://github.com/apache/datafusion/pull/17096

Re: [I] Unify how various `FileSource`s are applying projections? [datafusion]

2025-08-08 Thread via GitHub
alamb commented on issue #17095: URL: https://github.com/apache/datafusion/issues/17095#issuecomment-3168956601 > It would be nice to unify this logic, and make the design/relationship between FileSource and FileScanConfig much simpler. I agree it would be nice to make this simpler

Re: [PR] Pass `batch_size` directly when creating file opener [datafusion]

2025-08-08 Thread via GitHub
adriangb commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3168950104 Right big picture here I think it's quite evident that there are large issues with the current design: coupling, circular references, etc. Just take a look at this: https://

[I] Optimize `ORDER BY time DESC LIMIT 1` queries ( TopK or rewrite??) [datafusion]

2025-08-08 Thread via GitHub
alamb opened a new issue, #17098: URL: https://github.com/apache/datafusion/issues/17098 ### Is your feature request related to a problem or challenge? This came up in the context of an internal investigation I did at InfluxData. I am not sure how common this use case is, but since I
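
The issue body is truncated here, so the following is only a generic sketch of the TopK idea named in the title, not DataFusion's operator or the approach the issue proposes. Assuming a single `time` column of `i64` timestamps, `ORDER BY time DESC LIMIT k` can be answered with a bounded min-heap of size `k` instead of a full sort; for `k = 1` this degenerates into a single scan for the maximum. The function name `top_k_desc` is hypothetical.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Return the `k` largest timestamps in descending order, i.e. the rows that
/// `ORDER BY time DESC LIMIT k` would keep for a single `time` column.
pub fn top_k_desc(times: &[i64], k: usize) -> Vec<i64> {
    if k == 0 {
        return Vec::new();
    }

    // `Reverse` turns the max-heap into a min-heap, so the smallest of the
    // current top-k candidates sits at the top and is cheap to evict.
    let mut heap: BinaryHeap<Reverse<i64>> = BinaryHeap::with_capacity(k);

    for &t in times {
        if heap.len() < k {
            heap.push(Reverse(t));
        } else if heap.peek().map_or(false, |smallest| t > smallest.0) {
            // The new value beats the weakest candidate: replace it.
            heap.pop();
            heap.push(Reverse(t));
        }
    }

    // Emit the survivors in descending order.
    let mut out: Vec<i64> = heap.into_iter().map(|Reverse(t)| t).collect();
    out.sort_unstable_by(|a, b| b.cmp(a));
    out
}
```

The heap never holds more than `k` values, so memory stays bounded by the limit rather than by the input size, and each input row costs at most one heap update.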

Re: [PR] Pass `batch_size` directly when creating file opener [datafusion]

2025-08-08 Thread via GitHub
friendlymatthew commented on PR #17076: URL: https://github.com/apache/datafusion/pull/17076#issuecomment-3168929125 > Thanks @friendlymatthew > > While I can see the rationale behind this change, I'd like to understand more about why this approach is definitively better than the exis

Re: [I] TPC-DS query 39 fails if parquet filter pushdown is enabled [datafusion]

2025-08-08 Thread via GitHub
AdamGS commented on issue #17097: URL: https://github.com/apache/datafusion/issues/17097#issuecomment-3168876916 pretty sure it's a duplicate of #17077, we ran into it with another file source/format, the only difference as far as I can tell is the return value of `FileSource::try_with_push

Re: [PR] fix: Add support for StringDecode in Spark 4.0.0 [datafusion-comet]

2025-08-08 Thread via GitHub
mbutrovich merged PR #2075: URL: https://github.com/apache/datafusion-comet/pull/2075

Re: [PR] fix: Add support for StringDecode in Spark 4.0.0 [datafusion-comet]

2025-08-08 Thread via GitHub
peter-toth commented on PR #2075: URL: https://github.com/apache/datafusion-comet/pull/2075#issuecomment-3168788232 Sure, I'm happy to open a follow-up PR next week. Thanks for the review @mbutrovich, @comphead and @andygrove!

Re: [I] Add support for StringDecode in Spark 4.0.0 [datafusion-comet]

2025-08-08 Thread via GitHub
mbutrovich closed issue #1942: Add support for StringDecode in Spark 4.0.0 URL: https://github.com/apache/datafusion-comet/issues/1942

Re: [I] Enable `clone_on_ref_ptr` Clippy lint for the whole workspace [datafusion]

2025-08-08 Thread via GitHub
Adez017 commented on issue #17083: URL: https://github.com/apache/datafusion/issues/17083#issuecomment-3168734031 take

Re: [PR] fix: Remove duplicate serde code [datafusion-comet]

2025-08-08 Thread via GitHub
codecov-commenter commented on PR #2098: URL: https://github.com/apache/datafusion-comet/pull/2098#issuecomment-3168569681 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2098?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[I] TPC-DS query 39 fails if parquet filter pushdown is enabled [datafusion]

2025-08-08 Thread via GitHub
AdamGS opened a new issue, #17097: URL: https://github.com/apache/datafusion/issues/17097 ### Describe the bug When running TPC-DS q39 with filter pushdown enabled, the query fails with the following error: ``` ... does not satisfy order requirements: [w_warehouse_sk@10 ASC,

Re: [I] rpad expression panics if length input is not a literal value [datafusion-comet]

2025-08-08 Thread via GitHub
coderfender commented on issue #2096: URL: https://github.com/apache/datafusion-comet/issues/2096#issuecomment-3168700788 Apologies for not immediately updating the issue

Re: [I] rpad expression panics if length input is not a literal value [datafusion-comet]

2025-08-08 Thread via GitHub
coderfender commented on issue #2096: URL: https://github.com/apache/datafusion-comet/issues/2096#issuecomment-3168699849 @CuteChuanChuan I just started working on this too and raised a draft PR: https://github.com/apache/datafusion-comet/pull/2099

[PR] fix: rpad_bug_fix [datafusion-comet]

2025-08-08 Thread via GitHub
coderfender opened a new pull request, #2099: URL: https://github.com/apache/datafusion-comet/pull/2099 ## Which issue does this PR close? https://github.com/apache/datafusion-comet/issues/2096 Closes #. Implement comet native logic to support rpad(column, column) API in

Re: [I] Release DataFusion `49.0.1` (patch) [datafusion]

2025-08-08 Thread via GitHub
adriangb commented on issue #17036: URL: https://github.com/apache/datafusion/issues/17036#issuecomment-3168691369 > I think we also need a backport for > > * [Add ExecutionPlan::reset_state  #17028](https://github.com/apache/datafusion/pull/17028) https://github.com/apache/data

[PR] Backport https://github.com/apache/datafusion/pull/17028 to 49.0.1 [datafusion]

2025-08-08 Thread via GitHub
adriangb opened a new pull request, #17096: URL: https://github.com/apache/datafusion/pull/17096 https://github.com/apache/datafusion/issues/17036#issuecomment-3168369188

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-08-08 Thread via GitHub
BlakeOrth commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-3168656416 At the end of the day I'm going to be working on some way to get listing results cached, and I'd much rather make those changes here to contribute back to open source than ke

Re: [PR] feat: Support `PiecewiseMergeJoin` to speed up single range predicate joins [datafusion]

2025-08-08 Thread via GitHub
jonathanc-n commented on PR #16660: URL: https://github.com/apache/datafusion/pull/16660#issuecomment-3168485592 Yes, PMG should perform better than IE join. They are used to tackle different things regardless. IE joins are used on multi range while PMG is for a single range. You can see tha

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-08-08 Thread via GitHub
petern48 commented on PR #2026: URL: https://github.com/apache/datafusion-comet/pull/2026#issuecomment-3168459130 Yes, sorry. Just been really busy lately and it's not obvious to me how to fix the CI failures. I don't think I messed something up about copying things over, so there's probab

Re: [PR] chore: Remove obsolete supportedSortType function after Arrow updates [datafusion-comet]

2025-08-08 Thread via GitHub
andygrove commented on PR #1946: URL: https://github.com/apache/datafusion-comet/pull/1946#issuecomment-3168424899 This PR seems inactive, so moving to draft for now

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-08-08 Thread via GitHub
andygrove commented on PR #2026: URL: https://github.com/apache/datafusion-comet/pull/2026#issuecomment-3168422096 This PR seems inactive, so moving to draft for now

Re: [PR] [BLOG] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
alamb commented on PR #99: URL: https://github.com/apache/datafusion-site/pull/99#issuecomment-3168412824 This PR is now ready for review

Re: [PR] [BLOG] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
alamb commented on PR #99: URL: https://github.com/apache/datafusion-site/pull/99#issuecomment-3168407554 FYI @XiangpengHao @zhuqi-lucas and @JigaoLuo as you may be interested in this content

Re: [PR] [BLOG] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
alamb commented on PR #99: URL: https://github.com/apache/datafusion-site/pull/99#issuecomment-3168412068 FYI @nuno-faria @shehabgamin @jonathanc-n @zhuqi-lucas and @etseidl as you are mentioned in the blog post

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-08-08 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3168404348 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_pushdown Benchmark clickbench_partitioned.json ---

[PR] [BLOG] Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
alamb opened a new pull request, #99: URL: https://github.com/apache/datafusion-site/pull/99 - Closes https://github.com/apache/datafusion/issues/17010 This is my attempt at technical evangelism / explanation about when one would use external indexes and how to do so with DataFusion

Re: [PR] [BLOG] Using External Indexes and Metadata Stores to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
alamb commented on PR #98: URL: https://github.com/apache/datafusion-site/pull/98#issuecomment-3168401674 Will continue work in https://github.com/apache/datafusion-site/pull/99

Re: [PR] [BLOG] Using External Indexes and Metadata Stores to Accelerate Queries on Apache Parquet [datafusion-site]

2025-08-08 Thread via GitHub
alamb closed pull request #98: [BLOG] Using External Indexes and Metadata Stores to Accelerate Queries on Apache Parquet URL: https://github.com/apache/datafusion-site/pull/98

Re: [PR] Link UdfEq and PtrEq to help understand relationship [datafusion]

2025-08-08 Thread via GitHub
alamb merged PR #17082: URL: https://github.com/apache/datafusion/pull/17082

Re: [I] rpad expression panics if length input is not a literal value [datafusion-comet]

2025-08-08 Thread via GitHub
CuteChuanChuan commented on issue #2096: URL: https://github.com/apache/datafusion-comet/issues/2096#issuecomment-3168388686 I would like to work on this issue.

[PR] fix: Remove duplicate serde code [datafusion-comet]

2025-08-08 Thread via GitHub
andygrove opened a new pull request, #2098: URL: https://github.com/apache/datafusion-comet/pull/2098 ## Which issue does this PR close? N/A ## Rationale for this change I'm not sure if this is from a merge conflict or just something I overlooked, but we

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-08-08 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3168373629 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_test_pushdown Benchmark clickbench_pushdown.json

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-08-08 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3168373753 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun
