Re: [PR] feat(spark): implement Spark `make_dt_interval` function [datafusion]

2025-09-23 Thread via GitHub
davidlghellin commented on code in PR #17728: URL: https://github.com/apache/datafusion/pull/17728#discussion_r2374645081 ## datafusion/sqllogictest/test_files/spark/datetime/make_dt_interval.slt: ## @@ -23,15 +23,141 @@ ## Original Query: SELECT make_dt_interval(1, 12, 30, 0

Re: [PR] feat : Display function alias in output column name [datafusion]

2025-09-23 Thread via GitHub
Jefffrey commented on code in PR #17690: URL: https://github.com/apache/datafusion/pull/17690#discussion_r2374608816 ## datafusion/sql/src/expr/function.rs: ## @@ -270,7 +271,27 @@ impl SqlToRel<'_, S> { // User-defined function (UDF) should have precedence if

Re: [PR] Snowflake: ALTER USER and KeyValueOptions Refactoring [datafusion-sqlparser-rs]

2025-09-23 Thread via GitHub
yoavcloud commented on code in PR #2035: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2035#discussion_r2374526777 ## src/ast/mod.rs: ## @@ -10558,6 +10564,199 @@ impl fmt::Display for CreateUser { } } +/// Modifies the properties of a user +/// +/// Synta

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3326322116 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubun

Re: [PR] Add support for calling async UDF as aggregation expression [datafusion]

2025-09-23 Thread via GitHub
goldmedal commented on PR #17620: URL: https://github.com/apache/datafusion/pull/17620#issuecomment-3326506859 Thanks @simonvandel @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] Add support for calling async UDF as aggregation expression [datafusion]

2025-09-23 Thread via GitHub
goldmedal merged PR #17620: URL: https://github.com/apache/datafusion/pull/17620 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [I] Release DataFusion `50.1.0` (patch) [datafusion]

2025-09-23 Thread via GitHub
Omega359 commented on issue #17594: URL: https://github.com/apache/datafusion/issues/17594#issuecomment-3324955721 51.0.0 or 50.1.0? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] minor: create `OptimizerContext` with provided `ConfigOptions` [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17742: URL: https://github.com/apache/datafusion/pull/17742#issuecomment-3325021792 Thanks @MichaelScofield and @xudong963 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3326322009 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow-56.2 Benchmark clickbench_partitioned.json ---

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3326367719 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow-56.2 Benchmark tpch_mem_sf1.json ┏

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3326336971 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow-56.2 Benchmark clickbench_extended.json --

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3326337015 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubun

Re: [PR] Add case expr simplifiers for literal comparisons [datafusion]

2025-09-23 Thread via GitHub
jackkleeman commented on code in PR #17743: URL: https://github.com/apache/datafusion/pull/17743#discussion_r2373030753 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1447,7 +1480,11 @@ impl TreeNodeRewriter for Simplifier<'_, S> {

Re: [PR] Support `LargeList` in `array_has` simplification to `InList` [datafusion]

2025-09-23 Thread via GitHub
Jefffrey commented on code in PR #17732: URL: https://github.com/apache/datafusion/pull/17732#discussion_r2373952849 ## datafusion/functions-nested/src/array_has.rs: ## @@ -131,40 +131,42 @@ impl ScalarUDFImpl for ArrayHas { // if the haystack is a constant list, we c

Re: [PR] fix: distributed RangePartitioning bounds calculation with native shuffle [datafusion-comet]

2025-09-23 Thread via GitHub
comphead commented on code in PR #2258: URL: https://github.com/apache/datafusion-comet/pull/2258#discussion_r2373951786 ## native/core/src/execution/shuffle/comet_partitioning.rs: ## @@ -26,15 +27,15 @@ pub enum CometPartitioning { Hash(Vec>, usize), /// Allocate rows

[I] `ScalarValue::convert_array_to_scalar_vec` doesn't respect null list elements [datafusion]

2025-09-23 Thread via GitHub
Jefffrey opened a new issue, #17749: URL: https://github.com/apache/datafusion/issues/17749 Because it used to do `array.as_list::().value(index)`, this never checked for nulls before. So in the test case I added below: ```rust // Funky (null slot has non-zero list offsets)

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3326281724 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubun

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3326281602 🤖: Benchmark completed Details ``` Comparing HEAD and alamb_update_arrow-56.2 Benchmark clickbench_pushdown.json --

Re: [PR] Link to actual change logs in CHANGELOG.md [datafusion-sqlparser-rs]

2025-09-23 Thread via GitHub
iffyio merged PR #2040: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2040 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

Re: [PR] feat: do not fallback to Spark for distinct aggregates [datafusion-comet]

2025-09-23 Thread via GitHub
andygrove commented on code in PR #2429: URL: https://github.com/apache/datafusion-comet/pull/2429#discussion_r2372520786 ## spark/src/test/scala/org/apache/comet/CometFuzzAggregateSuite.scala: ## @@ -26,8 +26,18 @@ class CometFuzzAggregateSuite extends CometFuzzTestBase {

Re: [I] Configuration page on website has sidebar overlapping table [datafusion]

2025-09-23 Thread via GitHub
Jefffrey closed issue #17720: Configuration page on website has sidebar overlapping table URL: https://github.com/apache/datafusion/issues/17720 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the s

Re: [PR] Allow comparison between boolean and int values [datafusion]

2025-09-23 Thread via GitHub
github-actions[bot] closed pull request #16798: Allow comparison between boolean and int values URL: https://github.com/apache/datafusion/pull/16798 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] Add reproducing test cases for stackoverflows [datafusion]

2025-09-23 Thread via GitHub
github-actions[bot] closed pull request #16787: Add reproducing test cases for stackoverflows URL: https://github.com/apache/datafusion/pull/16787 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] docs: fix sidebar overlapping table on configuration page on website [datafusion]

2025-09-23 Thread via GitHub
Jefffrey commented on PR #17738: URL: https://github.com/apache/datafusion/pull/17738#issuecomment-3326168344 Thanks @saimahendra282 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [PR] feat(spark): implement Spark `make_dt_interval` function [datafusion]

2025-09-23 Thread via GitHub
Jefffrey commented on code in PR #17728: URL: https://github.com/apache/datafusion/pull/17728#discussion_r2373839757 ## datafusion/spark/src/function/datetime/make_dt_interval.rs: ## @@ -0,0 +1,479 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more con

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3326147892 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1016-gcp #17~24.04.1-Ubun

Re: [PR] chore: Action some old TODOs in github actions [datafusion]

2025-09-23 Thread via GitHub
Jefffrey commented on code in PR #17694: URL: https://github.com/apache/datafusion/pull/17694#discussion_r2373830312 ## .github/workflows/rust.yml: ## @@ -308,17 +308,20 @@ jobs: name: cargo test datafusion-cli (amd64) needs: linux-build-lib runs-on: ubuntu-latest

Re: [PR] Implement `partition_statistics` API for `InterleaveExec` [datafusion]

2025-09-23 Thread via GitHub
liamzwbao commented on code in PR #17051: URL: https://github.com/apache/datafusion/pull/17051#discussion_r2373764889 ## datafusion/core/tests/physical_optimizer/partition_statistics.rs: ## @@ -387,6 +388,64 @@ mod test { Ok(()) } +#[tokio::test] +async f

[PR] Correctly tokenize nested comments in Databricks, Clickhouse, and ANSI [datafusion-sqlparser-rs]

2025-09-23 Thread via GitHub
jmhain opened a new pull request, #2044: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2044 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] Add array_transform function [datafusion]

2025-09-23 Thread via GitHub
timsaucer commented on PR #17289: URL: https://github.com/apache/datafusion/pull/17289#issuecomment-3325920674 Adding a note, mostly for myself, to address later. The approach here works but it is sub-optimal. Suppose I have this DataFrame: ``` +--+--+

Re: [PR] Update version number, add changelog [datafusion-python]

2025-09-23 Thread via GitHub
timsaucer merged PR #1249: URL: https://github.com/apache/datafusion-python/pull/1249 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-09-23 Thread via GitHub
mbutrovich commented on code in PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#discussion_r2373571298 ## native/core/src/parquet/parquet_exec.rs: ## @@ -122,9 +145,131 @@ pub(crate) fn init_datasource_exec( Ok(Arc::new(DataSourceExec::new(Arc::new(file_

Re: [PR] feat: data source sampling via extension [datafusion]

2025-09-23 Thread via GitHub
theirix commented on code in PR #17633: URL: https://github.com/apache/datafusion/pull/17633#discussion_r2373547918 ## datafusion-examples/examples/table_sample.rs: ## @@ -0,0 +1,1353 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [PR] Prevent exponential planning time for Window functions - v2 [datafusion]

2025-09-23 Thread via GitHub
berkaysynnada commented on PR #17684: URL: https://github.com/apache/datafusion/pull/17684#issuecomment-3324395879 I think it's ready. @findepi feel free to commit directly if you'd like to make any changes ``` DataFusion CLI v50.0.0 > WITH source AS ( SELECT 1

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-09-23 Thread via GitHub
hsiang-c commented on code in PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#discussion_r2373409278 ## native/core/src/parquet/parquet_exec.rs: ## @@ -122,9 +145,131 @@ pub(crate) fn init_datasource_exec( Ok(Arc::new(DataSourceExec::new(Arc::new(file_sc

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-09-23 Thread via GitHub
hsiang-c commented on code in PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#discussion_r2373403929 ## native/core/src/parquet/parquet_exec.rs: ## @@ -122,9 +145,131 @@ pub(crate) fn init_datasource_exec( Ok(Arc::new(DataSourceExec::new(Arc::new(file_sc

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-09-23 Thread via GitHub
hsiang-c commented on code in PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#discussion_r2373388517 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -410,6 +410,16 @@ public void init() throws Throwable { } } +

Re: [I] Release DataFusion `50.1.0` (minor) [datafusion]

2025-09-23 Thread via GitHub
alamb commented on issue #17594: URL: https://github.com/apache/datafusion/issues/17594#issuecomment-3325462734 I have created a PR with version update and changelog: - https://github.com/apache/datafusion/pull/17748 -- This is an automated message from the Apache Git Service. To respo

[PR] [branch-50] Prepare for 50.1.0 release [datafusion]

2025-09-23 Thread via GitHub
alamb opened a new pull request, #17748: URL: https://github.com/apache/datafusion/pull/17748 ## Which issue does this PR close? - part of https://github.com/apache/datafusion/issues/17594 ## Rationale for this change Prepare for release ## What changes are included in thi

Re: [PR] fix: distributed RangePartitioning bounds calculation with native shuffle [datafusion-comet]

2025-09-23 Thread via GitHub
comphead commented on code in PR #2258: URL: https://github.com/apache/datafusion-comet/pull/2258#discussion_r2373379906 ## native/core/src/execution/planner.rs: ## @@ -2344,16 +2351,58 @@ impl PhysicalPlanner { )) } PartitioningStruct:

Re: [PR] feat: Parquet Modular Encryption with Spark KMS for native readers [datafusion-comet]

2025-09-23 Thread via GitHub
codecov-commenter commented on PR #2447: URL: https://github.com/apache/datafusion-comet/pull/2447#issuecomment-3325352610 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2447?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: distributed RangePartitioning bounds calculation with native shuffle [datafusion-comet]

2025-09-23 Thread via GitHub
comphead commented on code in PR #2258: URL: https://github.com/apache/datafusion-comet/pull/2258#discussion_r2373316456 ## native/core/src/execution/shuffle/comet_partitioning.rs: ## @@ -26,15 +27,15 @@ pub enum CometPartitioning { Hash(Vec>, usize), /// Allocate rows

[PR] feat: Parquet Modular Encryption support for native_datafusion and native_iceberg_compat readers [datafusion-comet]

2025-09-23 Thread via GitHub
mbutrovich opened a new pull request, #2447: URL: https://github.com/apache/datafusion-comet/pull/2447 This is draft for now. I have some duplicate code to clean up and other minor refactoring to do, but it's ready to start playing in CI. I'll expand on this description soon. #

Re: [PR] docs: fix sidebar overlapping table on configuration page on website [datafusion]

2025-09-23 Thread via GitHub
petern48 commented on PR #17738: URL: https://github.com/apache/datafusion/pull/17738#issuecomment-3324199580 > Anyways.. I will update my fixes and create PR again @saimahendra282 There's no need to create a new PR. You can just push to the same branch, and GitHub will update this PR

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-23 Thread via GitHub
adriangb commented on code in PR #115: URL: https://github.com/apache/datafusion-site/pull/115#discussion_r2373291374 ## content/blog/2025-09-24-datafusion-50.0.0.md: ## @@ -0,0 +1,389 @@ +--- +layout: post +title: Apache DataFusion 50.0.0 Released +date: 2025-09-24 +author: pmc

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-23 Thread via GitHub
adriangb commented on code in PR #115: URL: https://github.com/apache/datafusion-site/pull/115#discussion_r2373293807 ## content/blog/2025-09-24-datafusion-50.0.0.md: ## @@ -0,0 +1,389 @@ +--- +layout: post +title: Apache DataFusion 50.0.0 Released +date: 2025-09-24 +author: pmc

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-23 Thread via GitHub
adriangb commented on code in PR #115: URL: https://github.com/apache/datafusion-site/pull/115#discussion_r2373289162 ## content/blog/2025-09-24-datafusion-50.0.0.md: ## @@ -0,0 +1,389 @@ +--- +layout: post +title: Apache DataFusion 50.0.0 Released +date: 2025-09-24 +author: pmc

Re: [PR] Prevent exponential planning time for Window functions - v2 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on code in PR #17684: URL: https://github.com/apache/datafusion/pull/17684#discussion_r2372892813 ## datafusion/physical-plan/src/windows/mod.rs: ## @@ -371,17 +371,41 @@ pub(crate) fn window_equivalence_properties( for (i, expr) in window_exprs.iter().enume

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3325043993 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubun

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3325252237 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.14.0-1016-gcp #17~

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3325223470 My scripts are having problems ``` + cargo run --release --bin tpch -- benchmark datafusion --iterations 5 --path /home/alamb/arrow-datafusion/benchmarks/data/tpch_sf1 --prefer_ha

Re: [PR] Add case expr simplifiers for literal comparisons [datafusion]

2025-09-23 Thread via GitHub
alamb commented on code in PR #17743: URL: https://github.com/apache/datafusion/pull/17743#discussion_r2372961652 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1471,6 +1508,56 @@ impl TreeNodeRewriter for Simplifier<'_, S> { // Do

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3325222308 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubun

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3325222112 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.14.0-1014-gcp #15~

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3325219277 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.14.0-1014-gcp #15~

Re: [PR] Proposed enhancement to intro and conclusion of Metadata handling blog [datafusion-site]

2025-09-23 Thread via GitHub
timsaucer merged PR #114: URL: https://github.com/apache/datafusion-site/pull/114 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] chore(deps): bump taiki-e/install-action from 2.61.8 to 2.62.4 [datafusion-sandbox]

2025-09-23 Thread via GitHub
dependabot[bot] commented on PR #12: URL: https://github.com/apache/datafusion-sandbox/pull/12#issuecomment-3323701535 ### Labels The following labels could not be found: `auto-dependencies`. Please create it before Dependabot can add it to a pull request. Please fix the a

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-23 Thread via GitHub
nuno-faria commented on code in PR #115: URL: https://github.com/apache/datafusion-site/pull/115#discussion_r2373157130 ## content/blog/2025-09-24-datafusion-50.0.0.md: ## @@ -0,0 +1,390 @@ +--- +layout: post +title: Apache DataFusion 50.0.0 Released +date: 2025-09-24 +author: p

Re: [PR] Update `arrow` / `parquet` to 56.2.0 [datafusion]

2025-09-23 Thread via GitHub
alamb commented on PR #17631: URL: https://github.com/apache/datafusion/pull/17631#issuecomment-3325044329 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubun

Re: [PR] add column descriptions to info schemas [datafusion]

2025-09-23 Thread via GitHub
adriangb commented on PR #17734: URL: https://github.com/apache/datafusion/pull/17734#issuecomment-3325031934 I think we're going to go in another direction: we'll add an internal endpoint to do pretty much the same thing. I think this change makes sense and is a nice addition but I would l

Re: [PR] add column descriptions to info schemas [datafusion]

2025-09-23 Thread via GitHub
adriangb closed pull request #17734: add column descriptions to info schemas URL: https://github.com/apache/datafusion/pull/17734 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Improve documentation for ordered set aggregate functions [datafusion]

2025-09-23 Thread via GitHub
alamb merged PR #17744: URL: https://github.com/apache/datafusion/pull/17744 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] chore: remove homebrew publish instructions from release steps [datafusion]

2025-09-23 Thread via GitHub
alamb merged PR #17735: URL: https://github.com/apache/datafusion/pull/17735 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Support `LargeList` in `array_has` simplification to `InList` [datafusion]

2025-09-23 Thread via GitHub
alamb commented on code in PR #17732: URL: https://github.com/apache/datafusion/pull/17732#discussion_r2373055683 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6497,7 +6495,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))] 05)---

Re: [PR] Relax constraint that file sort order must only reference individual columns [datafusion]

2025-09-23 Thread via GitHub
pepijnve commented on PR #17419: URL: https://github.com/apache/datafusion/pull/17419#issuecomment-3324901306 @alamb still iterating a bit on the documentation, but could you kick off another benchmark run? I've tried to reduce the amount of extra work to a minimum. -- This is an automat

Re: [PR] add column descriptions to info schemas [datafusion]

2025-09-23 Thread via GitHub
alamb commented on code in PR #17734: URL: https://github.com/apache/datafusion/pull/17734#discussion_r2373051430 ## datafusion/catalog/src/information_schema.rs: ## @@ -891,6 +894,10 @@ impl InformationSchemaColumnsBuilder { self.datetime_precisions.append_option(Non

Re: [PR] perf: boolean group values implementations [datafusion]

2025-09-23 Thread via GitHub
alamb commented on code in PR #17726: URL: https://github.com/apache/datafusion/pull/17726#discussion_r2373034969 ## datafusion/physical-plan/src/aggregates/group_values/mod.rs: ## @@ -174,23 +174,26 @@ pub fn new_group_values( downcast_helper!(Decimal128Type, d

Re: [PR] Add case expr simplifiers for literal comparisons [datafusion]

2025-09-23 Thread via GitHub
jackkleeman commented on code in PR #17743: URL: https://github.com/apache/datafusion/pull/17743#discussion_r2373022716 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1471,6 +1508,56 @@ impl TreeNodeRewriter for Simplifier<'_, S> {

Re: [PR] Blog: Add blog post about DataFusion 50.0.0 release [datafusion-site]

2025-09-23 Thread via GitHub
alamb commented on PR #115: URL: https://github.com/apache/datafusion-site/pull/115#issuecomment-3324931916 Amazing! Thank you @nuno-faria -- I will review this PR today or tomorrow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [I] Better name display for CAST [datafusion]

2025-09-23 Thread via GitHub
alamb commented on issue #10274: URL: https://github.com/apache/datafusion/issues/10274#issuecomment-3324921857 FWIW I am not sure there is consensus on what the correct behavior in this case is -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] chore: Action some old TODOs in github actions [datafusion]

2025-09-23 Thread via GitHub
blaginin commented on code in PR #17694: URL: https://github.com/apache/datafusion/pull/17694#discussion_r237256 ## .github/workflows/rust.yml: ## @@ -308,17 +308,20 @@ jobs: name: cargo test datafusion-cli (amd64) needs: linux-build-lib runs-on: ubuntu-latest

Re: [PR] Metadata handling announcement [datafusion-site]

2025-09-23 Thread via GitHub
timsaucer commented on PR #73: URL: https://github.com/apache/datafusion-site/pull/73#issuecomment-3324891884 Thank you @paleolimbot @alamb and @2010YOUY01 ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Prevent exponential planning time for Window functions - v2 [datafusion]

2025-09-23 Thread via GitHub
findepi commented on code in PR #17684: URL: https://github.com/apache/datafusion/pull/17684#discussion_r2372957164 ## datafusion/physical-plan/src/windows/mod.rs: ## @@ -467,23 +493,44 @@ pub(crate) fn window_equivalence_properties( // utilize set-monotonicity

Re: [PR] Metadata handling announcement [datafusion-site]

2025-09-23 Thread via GitHub
paleolimbot commented on code in PR #73: URL: https://github.com/apache/datafusion-site/pull/73#discussion_r2372959896 ## content/blog/2025-09-21-custom-types-using-metadata.md: ## @@ -0,0 +1,296 @@ +--- +layout: post +title: Custom types in DataFusion using Metadata +date: 2025

Re: [PR] Proposed enhancement to intro and conclusion of Metadata handling blog [datafusion-site]

2025-09-23 Thread via GitHub
timsaucer commented on PR #114: URL: https://github.com/apache/datafusion-site/pull/114#issuecomment-3324849234 This is great - thank you -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [I] Restore window sort optimizations without exponential planning time [datafusion]

2025-09-23 Thread via GitHub
findepi closed issue #17624: Restore window sort optimizations without exponential planning time URL: https://github.com/apache/datafusion/issues/17624 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] (fix): Lag function creates unwanted projection (#17630) [datafusion]

2025-09-23 Thread via GitHub
renato2099 commented on PR #17639: URL: https://github.com/apache/datafusion/pull/17639#issuecomment-3323552390 > Yes, I'll merge it soon. thank you! > The pattern in this repo has frequently been to give a few days to give people an opportunity to comment. oh I see, ma

Re: [PR] fix: Specify reqwest crate features [datafusion-comet]

2025-09-23 Thread via GitHub
codecov-commenter commented on PR #2446: URL: https://github.com/apache/datafusion-comet/pull/2446#issuecomment-332477 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2446?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Keep aggregate udaf schema names unique when missing an order-by [datafusion]

2025-09-23 Thread via GitHub
alamb commented on code in PR #17731: URL: https://github.com/apache/datafusion/pull/17731#discussion_r2372194270 ## datafusion/sqllogictest/test_files/aggregate.slt: ## @@ -1821,6 +1821,29 @@ c 122 d 124 e 115 + +# using approx_percentile_cont on 2 columns with same signatu

Re: [I] GitHub action to unassign stale `take`s [datafusion]

2025-09-23 Thread via GitHub
Omega359 commented on issue #17733: URL: https://github.com/apache/datafusion/issues/17733#issuecomment-3324570407 I've actually seen this as well. 3 months might be a bit long imho but I think 60 days would be reasonable. I wish the actions/stale GitHub action would support this how

[PR] fix: Specify reqwest crate features [datafusion-comet]

2025-09-23 Thread via GitHub
andygrove opened a new pull request, #2446: URL: https://github.com/apache/datafusion-comet/pull/2446 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] fix: potential native broadcast failure in scenarios with ReusedExhange [datafusion-comet]

2025-09-23 Thread via GitHub
akupchinskiy commented on PR #2167: URL: https://github.com/apache/datafusion-comet/pull/2167#issuecomment-3324690682 The functionality of this MR got covered by recently merged https://github.com/apache/datafusion-comet/pull/2398 alongside with https://github.com/apache/datafusion-comet/p

Re: [I] Support smaller decimal types through SQL interface [datafusion]

2025-09-23 Thread via GitHub
AdamGS commented on issue #17747: URL: https://github.com/apache/datafusion/issues/17747#issuecomment-3324603759 take

[I] Support smaller decimal types through SQL interface [datafusion]

2025-09-23 Thread via GitHub
AdamGS opened a new issue, #17747: URL: https://github.com/apache/datafusion/issues/17747 Part of https://github.com/apache/datafusion/issues/17489. Due to a bug in `arrow-rs`, the new smaller decimal values could not be used through the SQL interface. Now that arrow 56.2.0 is r

Re: [PR] fix: Remove parquet encryption feature from root deps [datafusion]

2025-09-23 Thread via GitHub
alamb merged PR #17700: URL: https://github.com/apache/datafusion/pull/17700

Re: [PR] docs: fix sidebar overlapping table on configuration page on website [datafusion]

2025-09-23 Thread via GitHub
saimahendra282 commented on PR #17738: URL: https://github.com/apache/datafusion/pull/17738#issuecomment-3324517853 @Jefffrey can you please review again in your free time, i modified the css now, if you want in other ways like adding border to columns and rows please do specify them too..

[I] Maven rat check should exclude `docs/comet-*` [datafusion-comet]

2025-09-23 Thread via GitHub
andygrove opened a new issue, #2445: URL: https://github.com/apache/datafusion-comet/issues/2445 ### What is the problem the feature request solves? Generating documentation results in git clones of previous versions of the repo. The rat check takes a really long time when these clones

[I] Add link to datafusion ballista at the datafusion site landing page [datafusion]

2025-09-23 Thread via GitHub
milenkovicm opened a new issue, #17746: URL: https://github.com/apache/datafusion/issues/17746 ### Is your feature request related to a problem or challenge? At the moment [datafusion.apache.org](https://datafusion.apache.org) landing page links only DF python and comet as user facing

Re: [PR] [WIP] feat: Prefer to columnar [datafusion-comet]

2025-09-23 Thread via GitHub
codecov-commenter commented on PR #2444: URL: https://github.com/apache/datafusion-comet/pull/2444#issuecomment-3324370803 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2444?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Remove pyarrow as required dependency, relying on Arrow PyCapsule Interface [datafusion-python]

2025-09-23 Thread via GitHub
H0TB0X420 commented on issue #1227: URL: https://github.com/apache/datafusion-python/issues/1227#issuecomment-3324313864 I'll pick this up this week. Apologies for the delay, I've been traveling.

Re: [PR] feat: support multi-threaded writing of Parquet files with modular encryption [datafusion]

2025-09-23 Thread via GitHub
rok commented on PR #16738: URL: https://github.com/apache/datafusion/pull/16738#issuecomment-3324334058 cc @corwinjoy

[PR] [WIP] feat: Prefer to columnar [datafusion-comet]

2025-09-23 Thread via GitHub
wForget opened a new pull request, #2444: URL: https://github.com/apache/datafusion-comet/pull/2444 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] feat: do not fallback to Spark for distinct aggregates [datafusion-comet]

2025-09-23 Thread via GitHub
andygrove commented on code in PR #2429: URL: https://github.com/apache/datafusion-comet/pull/2429#discussion_r2372525360 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -558,12 +558,6 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Make Session Context `pyclass` frozen so interior mutability is only managed by rust [datafusion-python]

2025-09-23 Thread via GitHub
ntjohnson1 commented on PR #1248: URL: https://github.com/apache/datafusion-python/pull/1248#issuecomment-3324296817 > I'd say all pyclasses should be frozen by default (and I think pyo3 intends to make that the default in the future?). I can create an issue for follow up on this to

Re: [PR] feat: do not fallback to Spark for distinct aggregates [datafusion-comet]

2025-09-23 Thread via GitHub
comphead commented on code in PR #2429: URL: https://github.com/apache/datafusion-comet/pull/2429#discussion_r2372538781 ## spark/src/test/scala/org/apache/comet/CometFuzzAggregateSuite.scala: ## @@ -26,8 +26,18 @@ class CometFuzzAggregateSuite extends CometFuzzTestBase { d

Re: [PR] feat: do not fallback to Spark for distinct aggregates [datafusion-comet]

2025-09-23 Thread via GitHub
andygrove commented on code in PR #2429: URL: https://github.com/apache/datafusion-comet/pull/2429#discussion_r2372526693 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -558,12 +558,6 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] feat: do not fallback to Spark for distinct aggregates [datafusion-comet]

2025-09-23 Thread via GitHub
andygrove commented on code in PR #2429: URL: https://github.com/apache/datafusion-comet/pull/2429#discussion_r2372466235 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -558,12 +558,6 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [D] why the binary of datafusion-cli build by pip on termux is very big? [datafusion]

2025-09-23 Thread via GitHub
GitHub user l1t1 closed a discussion: why the binary of datafusion-cli build by pip on termux is very big? ``` $ ./d root@localhost:~# datafusion-clibash: datafusion-cli: command not foundroot@localhost:~# pip install datafusion-cliLooking in indexes: https://pypi.tuna.tsinghua.edu.cn/simpleC
