[PR] Snowflake: Improve support for reserved keywords for table factor [datafusion-sqlparser-rs]

2025-07-13 Thread via GitHub
yoavcloud opened a new pull request, #1942: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1942 Aligned logic in the `Dialect` trait to other similar flows (`is_column_alias`, `is_table_alias`, etc) and added a list of reserved keywords in Snowflake. -- This is an automated

Re: [PR] fix: add `order_requirement` & `dist_requirement` to `OutputRequirementExec` display [datafusion]

2025-07-13 Thread via GitHub
xudong963 commented on code in PR #16726: URL: https://github.com/apache/datafusion/pull/16726#discussion_r2203889037 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -983,7 +983,7 @@ async fn test_soft_hard_requirements_with_multiple_soft_requirements_and_

Re: [PR] add filter to handle backtrace [datafusion]

2025-07-13 Thread via GitHub
geetanshjuneja commented on PR #16752: URL: https://github.com/apache/datafusion/pull/16752#issuecomment-3067828326 @blaginin I added the filter which removes the whole backtrace block. -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

Re: [PR] Add Configurable RecordBatch Splitting for Large Input Batches [datafusion]

2025-07-13 Thread via GitHub
2010YOUY01 commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3067676557 > hi @2010YOUY01 > > Do you mean this PR should be narrowed to only split MemTable datasources by extending #15409? > > > ... some data sources may produce much la

[PR] Refactor BinaryTypeCoercer to Handle Null Coercion Early and Avoid Redundant Checks [datafusion]

2025-07-13 Thread via GitHub
kosiew opened a new pull request, #16768: URL: https://github.com/apache/datafusion/pull/16768 ## Which issue does this PR close? - Closes #16766. ## Rationale for this change This change refactors the `BinaryTypeCoercer` to handle `NULL` coercion at the beginning of the

Re: [PR] Update README.md - add Sqawk to users list [datafusion-sqlparser-rs]

2025-07-13 Thread via GitHub
github-actions[bot] commented on PR #1838: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1838#issuecomment-3067560514 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or

[I] Restructure `binary.rs` Tests into Dedicated Modules [datafusion]

2025-07-13 Thread via GitHub
kosiew opened a new issue, #16767: URL: https://github.com/apache/datafusion/issues/16767 ### Summary The `datafusion/expr-common/src/type_coercion/binary.rs` file has grown significantly and now exceeds 3000 lines, making it difficult to navigate and maintain — especially the test cases

Re: [PR] Support Type Coercion for NULL in Binary Arithmetic Expressions [datafusion]

2025-07-13 Thread via GitHub
kosiew commented on PR #16761: URL: https://github.com/apache/datafusion/pull/16761#issuecomment-3067499379 @2010YOUY01 Thanks for your review and suggestion.

Re: [PR] Support Type Coercion for NULL in Binary Arithmetic Expressions [datafusion]

2025-07-13 Thread via GitHub
kosiew merged PR #16761: URL: https://github.com/apache/datafusion/pull/16761

Re: [I] Arithmetic expression on `Date` type with `Null` returns planning error (SQLancer) [datafusion]

2025-07-13 Thread via GitHub
kosiew closed issue #16760: Arithmetic expression on `Date` type with `Null` returns planning error (SQLancer) URL: https://github.com/apache/datafusion/issues/16760

[I] Simplify `signature()` Null Handling by Addressing at Function Entry [datafusion]

2025-07-13 Thread via GitHub
kosiew opened a new issue, #16766: URL: https://github.com/apache/datafusion/issues/16766 ### Summary The current implementation of `signature()` contains scattered checks and logic for handling `NULL` literals throughout the function. This results in unnecessary complexity and makes the

Re: [I] FixedSizeBinary support in min/max accumulators [datafusion]

2025-07-13 Thread via GitHub
theirix commented on issue #16513: URL: https://github.com/apache/datafusion/issues/16513#issuecomment-3067327290 take

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-13 Thread via GitHub
huaxingao commented on code in PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#discussion_r2203531091 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -290,15 +290,137 @@ public static ColumnDescriptor buildColumnDescriptor(ParquetColumnSpe

Re: [PR] fix: [iceberg] Add LogicalTypeAnnotation in ParquetColumnSpec [datafusion-comet]

2025-07-13 Thread via GitHub
huaxingao commented on code in PR #2000: URL: https://github.com/apache/datafusion-comet/pull/2000#discussion_r2203530913 ## common/src/main/java/org/apache/comet/parquet/Utils.java: ## @@ -290,15 +290,137 @@ public static ColumnDescriptor buildColumnDescriptor(ParquetColumnSpe

[PR] Support min/max aggregates for FixedSizeBinary type [datafusion]

2025-07-13 Thread via GitHub
theirix opened a new pull request, #16765: URL: https://github.com/apache/datafusion/pull/16765 ## Which issue does this PR close? - Closes #16513 ## Rationale for this change This ```sql CREATE TABLE binaries AS VALUES (X'000103', 1); CREATE VIEW fixed_size_

Re: [PR] Feat/better compaction [datafusion]

2025-07-13 Thread via GitHub
ctsk closed pull request #16764: Feat/better compaction URL: https://github.com/apache/datafusion/pull/16764

[PR] Feat/better compaction [datafusion]

2025-07-13 Thread via GitHub
ctsk opened a new pull request, #16764: URL: https://github.com/apache/datafusion/pull/16764 Goal: decouple the compaction of string view / binary view buffers from the coalescing of batches. Why: Avoid compacting string views after hash partitioning How: Add CompactExec ExecutionPlan

Re: [PR] share staging infrastructure [datafusion-site]

2025-07-13 Thread via GitHub
kevinjqliu commented on PR #88: URL: https://github.com/apache/datafusion-site/pull/88#issuecomment-3067169457 I found an interesting way to stage multiple branches to `https://datafusion.staged.apache.org` The folks at ASF Infra have already thought of this https://cwiki.apache.

Re: [I] Add support for `MapSort` expression in Spark 4.0.0 [datafusion-comet]

2025-07-13 Thread via GitHub
rishvin commented on issue #1941: URL: https://github.com/apache/datafusion-comet/issues/1941#issuecomment-3067159959 I would like to work on this.

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-13 Thread via GitHub
adriangb commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3067125182 We introduced new APIs in https://github.com/apache/datafusion/pull/16461 and more importantly removed the existing SchemaAdapter. I am not sure how upgrading other systems has

Re: [D] Difficulty reconciling table option types for listing vs direct usage [datafusion]

2025-07-13 Thread via GitHub
GitHub user ianthetechie edited a discussion: Difficulty reconciling table option types for listing vs direct usage When adding a data source to the context, I think the current story around types is a bit clunky, but I wanted to start a discussion to make sure that 1) I'm not missing somethi

Re: [PR] feat: imporve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-13 Thread via GitHub
haohuaijin commented on PR #16762: URL: https://github.com/apache/datafusion/pull/16762#issuecomment-3067108870 cc @debajyoti-truefoundry @alamb

[PR] feat: imporve LiteralGuarantee for the case like `(a=1 AND b=1) OR (a=2 AND b=3)` [datafusion]

2025-07-13 Thread via GitHub
haohuaijin opened a new pull request, #16762: URL: https://github.com/apache/datafusion/pull/16762 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/16697 ## Rationale for this change improve LiteralGuarantee to handle the case like

Re: [PR] chore: Make `GroupValues` and APIs on `PhysicalGroupBy` aggregation APIs public [datafusion]

2025-07-13 Thread via GitHub
haohuaijin commented on PR #16733: URL: https://github.com/apache/datafusion/pull/16733#issuecomment-3067088312 Thank you @alamb

[PR] Snowflake: Improve accuracy of lookahead in implicit LIMIT alias [datafusion-sqlparser-rs]

2025-07-13 Thread via GitHub
yoavcloud opened a new pull request, #1941: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1941 Some needed adjustments to the lookahead logic when considering the `LIMIT` keyword as an implicit table alias.

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-13 Thread via GitHub
blaginin commented on PR #16644: URL: https://github.com/apache/datafusion/pull/16644#issuecomment-3067051288 There's a minio alternative which actually uses datafusion internally 😱 https://github.com/rustfs/rustfs

Re: [PR] feat: expose intersect distinct/except distinct in dataframe api [datafusion]

2025-07-13 Thread via GitHub
alamb merged PR #16578: URL: https://github.com/apache/datafusion/pull/16578

Re: [PR] Extend binary coercion rules to support Decimal arithmetic operations with integer(signed and unsigned) types [datafusion]

2025-07-13 Thread via GitHub
alamb commented on PR #16668: URL: https://github.com/apache/datafusion/pull/16668#issuecomment-3067026604 Thanks again @jatin510

Re: [PR] Extend binary coercion rules to support Decimal arithmetic operations with integer(signed and unsigned) types [datafusion]

2025-07-13 Thread via GitHub
alamb merged PR #16668: URL: https://github.com/apache/datafusion/pull/16668

Re: [I] Decimal & UInt Binary operation giving wrong output [datafusion]

2025-07-13 Thread via GitHub
alamb closed issue #16667: Decimal & UInt Binary operation giving wrong output URL: https://github.com/apache/datafusion/issues/16667

Re: [PR] chore: Make `GroupValues` and APIs on `PhysicalGroupBy` aggregation APIs public [datafusion]

2025-07-13 Thread via GitHub
alamb merged PR #16733: URL: https://github.com/apache/datafusion/pull/16733

Re: [I] Discussion: public some aggregate related function and struct [datafusion]

2025-07-13 Thread via GitHub
alamb closed issue #16724: Discussion: public some aggregate related function and struct URL: https://github.com/apache/datafusion/issues/16724

Re: [PR] Benchmark for char expression [datafusion]

2025-07-13 Thread via GitHub
ajita-asthana commented on PR #16743: URL: https://github.com/apache/datafusion/pull/16743#issuecomment-3066991730 ``` Running benches/char.rs (/Users/ajeetaasthana/datafusion/target/release/deps/char-439076073244eaaa) Gnuplot not found, using plotters backend char

Re: [PR] Optimize Hex Function [datafusion]

2025-07-13 Thread via GitHub
ajita-asthana commented on PR #16077: URL: https://github.com/apache/datafusion/pull/16077#issuecomment-3066987604 Would it be feasible to add a benchmark for the new Spark-compatible hex function as well, if it doesn't exist?

Re: [PR] share staging infrastructure [datafusion-site]

2025-07-13 Thread via GitHub
alamb commented on PR #88: URL: https://github.com/apache/datafusion-site/pull/88#issuecomment-3066966490 I wonder if we could use github pages or something in the local fork. That would require setup for the other users but we wouldn't have permission problems 🤔

Re: [PR] Add Configurable RecordBatch Splitting for Large Input Batches [datafusion]

2025-07-13 Thread via GitHub
kosiew commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3066962320 hi @2010YOUY01 Do you mean this PR should be narrowed to only split MemTable datasources by extending #15409? > ... some data sources may produce much larger batches

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-13 Thread via GitHub
2010YOUY01 commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2203255450 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices( probe_indices: UInt32Array, filte

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-13 Thread via GitHub
Dandandan commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2203255187 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices( probe_indices: UInt32Array, filter

Re: [PR] Add Configurable RecordBatch Splitting for Large Input Batches [datafusion]

2025-07-13 Thread via GitHub
2010YOUY01 commented on PR #16734: URL: https://github.com/apache/datafusion/pull/16734#issuecomment-3066735438 I got an alternative idea that might be simpler to implement: We have already implemented MemTable repartition in https://github.com/apache/datafusion/pull/15409, this is no

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-13 Thread via GitHub
UBarney commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2203230458 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -843,24 +844,56 @@ pub(crate) fn apply_join_filter_to_indices( probe_indices: UInt32Array, filter: