[PR] refactor: remove deprecated `AvroExec` [datafusion]

2025-05-07 Thread via GitHub
miroim opened a new pull request, #15987: URL: https://github.com/apache/datafusion/pull/15987 ## Which issue does this PR close? Part of #15950 . ## Rationale for this change The `AvroExec` structure was deprecated in DataFusion 46 and is scheduled for removal. Dev

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
kazuyukitanimura commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078969812 ## .github/workflows/iceberg_spark_test.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-07 Thread via GitHub
kazuyukitanimura commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2078943165 ## native/spark-expr/src/bitwise_funcs/bitwise_count.rs: ## @@ -0,0 +1,177 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo
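
For context on what `bit_count` computes: Spark's `bit_count(expr)` returns the number of set bits in its argument. My recollection is that Spark evaluates integral inputs with `java.lang.Long.bitCount`, which means it widens them to a 64-bit LONG (with sign extension) before counting. Treat that as an assumption to verify against Spark. The std-only sketch below encodes that assumption. The helper names are hypothetical, and this is not the code under review.

```rust
/// Spark-style bit_count sketch. ASSUMPTION: Spark widens every integral
/// input to a 64-bit LONG (sign-extending) before counting, so a negative
/// narrow value also counts its extended sign bits.
pub fn spark_bit_count<T: Into<i64>>(v: T) -> i32 {
    let widened: i64 = v.into();
    widened.count_ones() as i32
}

/// Boolean input counts as 1 (true) or 0 (false).
pub fn spark_bit_count_bool(v: bool) -> i32 {
    i32::from(v)
}
```

Under this assumption, `spark_bit_count(-1i8)` is 64 rather than 8. That edge case is worth pinning down in a test against real Spark output.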

Re: [D] Multiple 'group by's, one scan [datafusion]

2025-05-07 Thread via GitHub
GitHub user pepijnve edited a discussion: Multiple 'group by's, one scan In the system I'm working on I want to perform multiple aggregates using different group by criteria over large data sets. I don't think grouping sets are an option since those support computing a single set of aggregates
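
The core of this request is to scan the input once and fold every row into several independent aggregations, each with its own grouping key. The std-only sketch below illustrates that idea. The `Row` type, its columns and the function name are hypothetical; this is not a DataFusion API.

```rust
use std::collections::HashMap;

/// Hypothetical input row: two candidate grouping keys and a value to sum.
pub struct Row {
    pub region: String,
    pub product: String,
    pub amount: i64,
}

/// Computes SUM(amount) GROUP BY region and SUM(amount) GROUP BY product
/// in a single pass over `rows`, instead of scanning the input twice.
pub fn sums_by_region_and_product(
    rows: &[Row],
) -> (HashMap<String, i64>, HashMap<String, i64>) {
    let mut by_region: HashMap<String, i64> = HashMap::new();
    let mut by_product: HashMap<String, i64> = HashMap::new();
    for row in rows {
        // Each row updates every aggregation; the scan itself happens once.
        *by_region.entry(row.region.clone()).or_insert(0) += row.amount;
        *by_product.entry(row.product.clone()).or_insert(0) += row.amount;
    }
    (by_region, by_product)
}
```

Unlike grouping sets, each map here can carry a different set of aggregates, which appears to be what the discussion is after.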

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
irenjj commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2078862746 ## datafusion/spark/src/function/math/ceil_floor.rs: ## @@ -0,0 +1,720 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[PR] Support Schema Evolution in iceberg [datafusion-comet]

2025-05-07 Thread via GitHub
huaxingao opened a new pull request, #1723: URL: https://github.com/apache/datafusion-comet/pull/1723 ## Which issue does this PR close? We originally had `CometConf.COMET_SCHEMA_EVOLUTION_ENABLED` to enable schema evolution in the scan rule if the scan is an Iceberg table scan. However, i

Re: [PR] Re-Add CodeCov [datafusion]

2025-05-07 Thread via GitHub
2010YOUY01 commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2861663728 > Basically my opinion here is that the few times I tried to review the codecov reports, I found them useless. Maybe it has gotten better since they were originally > > Whe

Re: [PR] Show LogicalType name for `INFORMATION_SCHEMA` [datafusion]

2025-05-07 Thread via GitHub
goldmedal commented on PR #15965: URL: https://github.com/apache/datafusion/pull/15965#issuecomment-2861534893 Thanks @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] Make Expr::alias and alias_qualified smarter by calling unalias [datafusion]

2025-05-07 Thread via GitHub
github-actions[bot] closed pull request #14749: Make Expr::alias and alias_qualified smarter by calling unalias URL: https://github.com/apache/datafusion/pull/14749
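
As I read the title, the idea is that aliasing an expression which already carries an alias should replace the old name rather than produce nested `Alias(Alias(..))` nodes. Below is a minimal sketch on a hypothetical two-variant expression type. DataFusion's real `Expr` and its `unalias` are much richer than this.

```rust
/// Hypothetical, heavily simplified expression type.
#[derive(Debug, Clone, PartialEq)]
pub enum Expr {
    Column(String),
    Alias(Box<Expr>, String),
}

impl Expr {
    /// Strips one level of alias, if present.
    pub fn unalias(self) -> Expr {
        match self {
            Expr::Alias(inner, _) => *inner,
            other => other,
        }
    }

    /// Aliases `self`, first removing any existing alias so aliases never nest.
    pub fn alias(self, name: &str) -> Expr {
        Expr::Alias(Box::new(self.unalias()), name.to_string())
    }
}
```

With this rule, `col("a").alias("x").alias("y")` ends up as a single alias `y` over the column.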

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2078776570 ## datafusion/sqllogictest/test_files/spark/math/ceil.slt: ## @@ -0,0 +1,141 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contribu

Re: [I] `native_datafusion/native_iceberg_compat` scans case sensitive [datafusion-comet]

2025-05-07 Thread via GitHub
wForget commented on issue #1574: URL: https://github.com/apache/datafusion-comet/issues/1574#issuecomment-2861237181 > Isn't this addressed by [#1575](https://github.com/apache/datafusion-comet/pull/1575) ? As commented in https://github.com/apache/datafusion-comet/pull/1575#discus
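
Some background for this issue: with `spark.sql.caseSensitive=false` (Spark's default), a requested column should match a file's field name regardless of case, and more than one match is treated as ambiguous. The std-only sketch below implements that lookup under those assumed rules. The function is hypothetical and is not taken from #1575. It also uses an ASCII-only case comparison for simplicity.

```rust
/// Resolves `requested` against the field names found in a file.
/// Returns Ok(None) if the column is absent, Ok(Some(index)) on a unique
/// match, and Err on an ambiguous case-insensitive match.
pub fn resolve_field(
    file_fields: &[&str],
    requested: &str,
    case_sensitive: bool,
) -> Result<Option<usize>, String> {
    if case_sensitive {
        return Ok(file_fields.iter().position(|f| *f == requested));
    }
    // ASCII-only comparison; Spark may lower-case with full Unicode rules.
    let matches: Vec<usize> = file_fields
        .iter()
        .enumerate()
        .filter(|(_, f)| f.eq_ignore_ascii_case(requested))
        .map(|(i, _)| i)
        .collect();
    match matches.as_slice() {
        [] => Ok(None), // column absent from this file: read as NULLs
        [i] => Ok(Some(*i)),
        _ => Err(format!(
            "ambiguous column `{requested}`: {} case-insensitive matches",
            matches.len()
        )),
    }
}
```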

Re: [PR] BUG: schema_force_view_type configuration not working for CREATE EXTERNAL TABLE [datafusion]

2025-05-07 Thread via GitHub
github-actions[bot] commented on PR #14922: URL: https://github.com/apache/datafusion/pull/14922#issuecomment-2861252728 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [wip] attach diagnostic to duplicate table name error [datafusion]

2025-05-07 Thread via GitHub
github-actions[bot] closed pull request #14767: [wip] attach diagnostic to duplicate table name error URL: https://github.com/apache/datafusion/pull/14767

Re: [PR] Introducing mutation testing [datafusion]

2025-05-07 Thread via GitHub
github-actions[bot] commented on PR #14590: URL: https://github.com/apache/datafusion/pull/14590#issuecomment-2861252887 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2078773309 ## datafusion/spark/src/function/math/ceil_floor.rs: ## @@ -0,0 +1,720 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2078772494 ## datafusion/spark/src/function/math/ceil_floor.rs: ## @@ -0,0 +1,720 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] [datafusion-spark] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on code in PR #15958: URL: https://github.com/apache/datafusion/pull/15958#discussion_r2078769590 ## datafusion/spark/src/function/math/ceil_floor.rs: ## @@ -0,0 +1,720 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] [wip] feat: Add framework for supporting multiple telemetry providers [datafusion-comet]

2025-05-07 Thread via GitHub
codecov-commenter commented on PR #1722: URL: https://github.com/apache/datafusion-comet/pull/1722#issuecomment-2861237379 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1722?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Enhance Schema adapter to accommodate evolving struct [datafusion]

2025-05-07 Thread via GitHub
TheBuilderJR commented on PR #15295: URL: https://github.com/apache/datafusion/pull/15295#issuecomment-2861173475 @kosiew so I think the tricky part is that there are actually multiple evolutions. Basically my code currently looks like this ``` let con

[PR] [wip] feat: Add framework for supporting multiple telemetry providers [datafusion-comet]

2025-05-07 Thread via GitHub
andygrove opened a new pull request, #1722: URL: https://github.com/apache/datafusion-comet/pull/1722 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1718 ## Rationale for this change Experimenting with supporti

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078668025 ## dev/diffs/iceberg/1.8.1.diff: ## @@ -0,0 +1,266 @@ +diff --git a/spark/v3.4/build.gradle b/spark/v3.4/build.gradle +index 6eb26e8..c288e72 100644 +--- a/sp

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078673638 ## dev/diffs/iceberg/1.8.1.diff: ## @@ -0,0 +1,266 @@ +diff --git a/spark/v3.4/build.gradle b/spark/v3.4/build.gradle +index 6eb26e8..c288e72 100644 +--- a/sp

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078667555 ## dev/diffs/iceberg/1.8.1.diff: ## @@ -0,0 +1,266 @@ +diff --git a/spark/v3.4/build.gradle b/spark/v3.4/build.gradle +index 6eb26e8..c288e72 100644 +--- a/sp

Re: [PR] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on PR #15958: URL: https://github.com/apache/datafusion/pull/15958#issuecomment-2860681100 @parthchandra @huaxingao @comphead @mbutrovich @kazuyukitanimura, FYI, in case you want to review. In Comet, we are currently using DataFusion's ceil and floor functions and have n

Re: [PR] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on PR #15958: URL: https://github.com/apache/datafusion/pull/15958#issuecomment-2860674623 Hi @irenjj. Could you explain in the PR description how these functions differ from the standard DataFusion implementation (i.e., what is Spark-specific about them)? That will help
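
One example of the kind of difference this comment asks about: as far as I recall, Spark's one-argument `ceil`/`floor` on a DOUBLE return a BIGINT (Java's `(long) Math.ceil(x)`), whereas DataFusion's built-in versions return a floating-point value. The std-only sketch below assumes that behavior. It does not cover decimals or the two-argument `ceil(expr, scale)` form, and the function names are hypothetical.

```rust
/// DataFusion-style ceil for comparison: the result stays a float.
pub fn ceil_as_float(x: f64) -> f64 {
    x.ceil()
}

/// Spark-style ceil for DOUBLE input, ASSUMING Spark returns BIGINT.
/// Rust's `as` cast saturates at i64::MIN/MAX and maps NaN to 0,
/// which matches Java's (long) cast.
pub fn spark_ceil_f64(x: f64) -> i64 {
    x.ceil() as i64
}

/// Spark-style floor for DOUBLE input, under the same assumption.
pub fn spark_floor_f64(x: f64) -> i64 {
    x.floor() as i64
}
```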

Re: [PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-07 Thread via GitHub
codecov-commenter commented on PR #1720: URL: https://github.com/apache/datafusion-comet/pull/1720#issuecomment-2860675507 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1720?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Implement ceil&floor function for spark [datafusion]

2025-05-07 Thread via GitHub
shehabgamin commented on PR #15958: URL: https://github.com/apache/datafusion/pull/15958#issuecomment-2860636654 @irenjj I'll review this by tomorrow! I'm not a committer though, so we'll still need a review from someone else as well. cc @alamb @andygrove

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078579775 ## .github/workflows/iceberg_spark_test_native_datafusion.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on PR #1720: URL: https://github.com/apache/datafusion-comet/pull/1720#issuecomment-2860572261 > While this is ok to pass tests what we need is for CometNativeScanExec to implement support for bucketing like here https://github.com/apache/spark/blob/bc013c031b6b3e0c3

Re: [PR] minor: Warn if memory pool is dropped with bytes still reserved [datafusion-comet]

2025-05-07 Thread via GitHub
andygrove commented on code in PR #1721: URL: https://github.com/apache/datafusion-comet/pull/1721#discussion_r2078557579 ## native/core/src/execution/memory_pools/unified_pool.rs: ## @@ -76,6 +76,15 @@ impl CometMemoryPool { } } +impl Drop for CometMemoryPool { +fn

Re: [PR] minor: Warn if memory pool is dropped with bytes still reserved [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1721: URL: https://github.com/apache/datafusion-comet/pull/1721#discussion_r2078566439 ## native/core/src/execution/memory_pools/unified_pool.rs: ## @@ -76,6 +76,15 @@ impl CometMemoryPool { } } +impl Drop for CometMemoryPool { +

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078553944 ## .github/workflows/iceberg_spark_test_native_datafusion.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more co

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on PR #15947: URL: https://github.com/apache/datafusion/pull/15947#issuecomment-2860489815 > Thanks @shehabgamin and @andygrove > > What I think we should do is to merge this PR and file a follow on ticket for any follow on performance optimizations (aka avoid allo

[I] [datafusion-spark] Optimize hex function [datafusion]

2025-05-07 Thread via GitHub
andygrove opened a new issue, #15986: URL: https://github.com/apache/datafusion/issues/15986 ### Is your feature request related to a problem or challenge? PR https://github.com/apache/datafusion/pull/15947 adds a Spark-compatible hex function. The feedback on the PR includes some sug
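
For reference, Spark's `hex` prints a BIGINT as the uppercase hex of its two's-complement bits without leading zeros (so `hex(17)` is `11`), and prints STRING/BINARY input as two uppercase hex digits per byte. One way to avoid per-row allocation is to append every row's output into a single contiguous buffer plus an offsets vector, similar in spirit to how an Arrow string array is laid out. The std-only sketch below does that. The function names are hypothetical, and this is not the code from PR #15947.

```rust
use std::fmt::Write;

/// Appends the Spark-style hex form of a BIGINT to `out`
/// (two's-complement bits printed as unsigned, uppercase, no leading zeros).
pub fn hex_i64_into(value: i64, out: &mut String) {
    // Writing into a String cannot fail, so ignoring the Result is safe.
    let _ = write!(out, "{:X}", value as u64);
}

/// Appends two uppercase hex digits per byte, as for STRING/BINARY input.
pub fn hex_bytes_into(bytes: &[u8], out: &mut String) {
    const DIGITS: &[u8; 16] = b"0123456789ABCDEF";
    out.reserve(bytes.len() * 2);
    for &b in bytes {
        out.push(DIGITS[(b >> 4) as usize] as char);
        out.push(DIGITS[(b & 0x0F) as usize] as char);
    }
}

/// Hex-encodes a BIGINT column into one buffer plus row offsets,
/// so no per-row String is allocated.
pub fn hex_column(values: &[i64]) -> (String, Vec<usize>) {
    let mut data = String::with_capacity(values.len() * 16);
    let mut offsets = Vec::with_capacity(values.len() + 1);
    offsets.push(0);
    for &v in values {
        hex_i64_into(v, &mut data);
        offsets.push(data.len());
    }
    (data, offsets)
}
```

Row `i` of the result is `&data[offsets[i]..offsets[i + 1]]`.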

Re: [PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-07 Thread via GitHub
qstommyshu commented on PR #15984: URL: https://github.com/apache/datafusion/pull/15984#issuecomment-2860488095 1 more PR, then this should be done

Re: [PR] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-07 Thread via GitHub
gabotechs commented on PR #15869: URL: https://github.com/apache/datafusion/pull/15869#issuecomment-2860408558 > @gabotechs is this PR good to merge from your perspective? Sure! The added tests look really good 💯

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-07 Thread via GitHub
andygrove commented on code in PR #15947: URL: https://github.com/apache/datafusion/pull/15947#discussion_r2078546921 ## datafusion/spark/src/function/math/hex.rs: ## @@ -0,0 +1,404 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078541035 ## .github/workflows/iceberg_spark_test_native_datafusion.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or mor

Re: [PR] minor: Warn if memory pool is dropped with bytes still reserved [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1721: URL: https://github.com/apache/datafusion-comet/pull/1721#discussion_r2078527149 ## native/core/src/execution/memory_pools/unified_pool.rs: ## @@ -76,6 +76,15 @@ impl CometMemoryPool { } } +impl Drop for CometMemoryPool { +

Re: [PR] feat: Add Aggregate UDF to FFI crate [datafusion]

2025-05-07 Thread via GitHub
alamb commented on code in PR #14775: URL: https://github.com/apache/datafusion/pull/14775#discussion_r2078483270 ## datafusion/ffi/src/arrow_wrappers.rs: ## @@ -31,30 +32,37 @@ use log::error; #[derive(Debug, StableAbi)] pub struct WrappedSchema(#[sabi(unsafe_opaque_field)] p

Re: [PR] Substrait: Handle inner map fields in schema renaming [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15869: URL: https://github.com/apache/datafusion/pull/15869#issuecomment-2860421183 I'll plan to merge tomorrow unless we get any more feedback

Re: [I] `native_datafusion/native_iceberg_compat` scans case sensitive [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on issue #1574: URL: https://github.com/apache/datafusion-comet/issues/1574#issuecomment-2860407714 Isn't this addressed by #1575?

Re: [PR] minor: Warn if memory pool is dropped with bytes still reserved [datafusion-comet]

2025-05-07 Thread via GitHub
codecov-commenter commented on PR #1721: URL: https://github.com/apache/datafusion-comet/pull/1721#issuecomment-2860400666 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1721?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] [datafusion-spark] Add Spark-compatible hex function [datafusion]

2025-05-07 Thread via GitHub
alamb commented on code in PR #15947: URL: https://github.com/apache/datafusion/pull/15947#discussion_r2078491599 ## datafusion/sqllogictest/test_files/spark/math/hex.slt: ## @@ -0,0 +1,26 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor li

Re: [PR] Re-Add CodeCov [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15256: URL: https://github.com/apache/datafusion/pull/15256#issuecomment-2860320308 Basically my opinion here is that the few times I tried to review the codecov reports, I found them useless. Maybe it has gotten better since they were originally When I tried t

Re: [PR] [wip] update list & struct coercion to support incrementality [datafusion]

2025-05-07 Thread via GitHub
alamb commented on code in PR #15259: URL: https://github.com/apache/datafusion/pull/15259#discussion_r2078470521 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -2250,6 +2291,23 @@ mod tests { ); // nullable because the RHS is nullable } +#[test

[PR] Add `PrimitiveDistinctCountGroupsAccumulator` [datafusion]

2025-05-07 Thread via GitHub
Dandandan opened a new pull request, #15985: URL: https://github.com/apache/datafusion/pull/15985 ## Which issue does this PR close? - Closes #. ## Rationale for this change Speed up queries with group by + distinct count. The original code is taken from @
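
The general shape of a "groups accumulator" for `COUNT(DISTINCT x)` is one distinct-value set per dense group id, updated batch by batch. The std-only sketch below assumes an `i64` column. Its method names loosely mirror DataFusion's `GroupsAccumulator` (`update_batch` with `group_indices` and `total_num_groups`), but the type is hypothetical and is not the PR's implementation.

```rust
use std::collections::HashSet;

/// Hypothetical, simplified accumulator for COUNT(DISTINCT x) per group.
pub struct DistinctCountGroups {
    sets: Vec<HashSet<i64>>,
}

impl DistinctCountGroups {
    pub fn new() -> Self {
        Self { sets: Vec::new() }
    }

    /// `group_indices[i]` is the group id of `values[i]`; `None` models SQL NULL,
    /// which COUNT(DISTINCT) ignores.
    pub fn update_batch(
        &mut self,
        values: &[Option<i64>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        // Grow the per-group state when new groups appear in this batch.
        if self.sets.len() < total_num_groups {
            self.sets.resize_with(total_num_groups, HashSet::new);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            if let Some(v) = value {
                self.sets[group].insert(*v);
            }
        }
    }

    /// Returns the distinct count for every group, in group-id order.
    pub fn evaluate(&self) -> Vec<i64> {
        self.sets.iter().map(|s| s.len() as i64).collect()
    }
}
```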

Re: [PR] feat: Add `array_min` function support [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #14417: URL: https://github.com/apache/datafusion/pull/14417#issuecomment-2860324372 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read

Re: [I] Spark executors failing occasionally on SIGSEGV [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on issue #1714: URL: https://github.com/apache/datafusion-comet/issues/1714#issuecomment-2860323801 > at �.wrap(ByteBufferInputStream.java:38) Seems like memory got overwritten by some unsafe code which would be consistent with getting a SEGV. Just to

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-07 Thread via GitHub
aharpervc commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2078474669 ## tests/sqlparser_mssql.rs: ## @@ -283,6 +294,50 @@ fn parse_create_function() { END\ "; let _ = ms().verified_stmt(create_functi

Re: [PR] Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15539: URL: https://github.com/apache/datafusion/pull/15539#issuecomment-2860321107 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read
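
Why statistics become inexact in the PR above: once repartitioning splits a file into byte ranges, that file's original row count no longer describes exactly what each new file group will read. The sketch below uses a hypothetical mini version of a `Precision`-style enum. DataFusion has a similar `Precision<T>` type, but this is not its code.

```rust
/// Hypothetical mini version of an exact/inexact statistic.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Precision {
    Exact(usize),
    Inexact(usize),
    Absent,
}

impl Precision {
    /// Keeps the value but drops the exactness guarantee.
    pub fn to_inexact(self) -> Precision {
        match self {
            Precision::Exact(v) | Precision::Inexact(v) => Precision::Inexact(v),
            Precision::Absent => Precision::Absent,
        }
    }

    /// Sums two statistics; the result is exact only if both inputs are.
    pub fn plus(self, other: Precision) -> Precision {
        match (self, other) {
            (Precision::Exact(a), Precision::Exact(b)) => Precision::Exact(a + b),
            (
                Precision::Exact(a) | Precision::Inexact(a),
                Precision::Exact(b) | Precision::Inexact(b),
            ) => Precision::Inexact(a + b),
            _ => Precision::Absent,
        }
    }
}

/// Row count of a file group: `(rows, was_split)` per file. A file that was
/// split into a byte range contributes only an inexact estimate.
pub fn group_num_rows(files: &[(Precision, bool)]) -> Precision {
    files.iter().fold(Precision::Exact(0), |acc, &(rows, was_split)| {
        let rows = if was_split { rows.to_inexact() } else { rows };
        acc.plus(rows)
    })
}
```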

Re: [PR] [wip] update list & struct coercion to support incrementality [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15259: URL: https://github.com/apache/datafusion/pull/15259#issuecomment-2860310961 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2860307673 🤖: Benchmark completed Details ``` Comparing HEAD and experiment_repartition-optimization Benchmark clickbench_extended.json --

Re: [PR] Support inferring new predicates to push down [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15906: URL: https://github.com/apache/datafusion/pull/15906#issuecomment-2860308364 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read

Re: [PR] chore(deps): bump sha2 from 0.10.8 to 0.10.9 [datafusion]

2025-05-07 Thread via GitHub
alamb merged PR #15970: URL: https://github.com/apache/datafusion/pull/15970

Re: [PR] WIP: Testing parquet page cache reader [datafusion]

2025-05-07 Thread via GitHub
alamb closed pull request #15903: WIP: Testing parquet page cache reader URL: https://github.com/apache/datafusion/pull/15903

Re: [PR] Add support for table valued functions for SQL Server [datafusion-sqlparser-rs]

2025-05-07 Thread via GitHub
aharpervc commented on code in PR #1839: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1839#discussion_r2078461916 ## tests/sqlparser_mssql.rs: ## @@ -283,6 +294,50 @@ fn parse_create_function() { END\ "; let _ = ms().verified_stmt(create_functi

Re: [PR] Implementation for regex_instr [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15928: URL: https://github.com/apache/datafusion/pull/15928#issuecomment-2860276419 Can you please resolve the CI error: https://github.com/apache/datafusion/actions/runs/14820525339/job/41754009017?pr=15928 > # If you encounter an error, run './dev/update_funct

Re: [PR] Feat: support bit_get function [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on PR #1713: URL: https://github.com/apache/datafusion-comet/pull/1713#issuecomment-2860273632 Same comment as in #1602 (https://github.com/apache/datafusion-comet/pull/1602#discussion_r2078442637)
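
For context on `bit_get`: Spark's `bit_get(expr, pos)` returns the bit (0 or 1) at `pos`, with positions numbered right to left from zero. As I understand it, an out-of-range position is an error rather than NULL. The std-only sketch below covers a LONG input under those assumed semantics. The function name and error messages are hypothetical.

```rust
/// Spark-style bit_get for a 64-bit LONG input (ASSUMED semantics):
/// positions count from 0 at the least-significant bit, and a position
/// outside [0, 64) is reported as an error.
pub fn spark_bit_get_i64(value: i64, pos: i32) -> Result<i8, String> {
    if pos < 0 {
        return Err(format!("invalid bit position: {pos} is less than zero"));
    }
    if pos >= 64 {
        return Err(format!("invalid bit position: {pos} exceeds the bit upper limit"));
    }
    // Shift the requested bit down to position 0 and mask everything else off.
    Ok(((value >> pos) & 1) as i8)
}
```

For narrower inputs (BYTE, SHORT, INT), the upper limit would presumably be that type's bit width instead of 64.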

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078447398 ## .github/workflows/iceberg_spark_test_native_datafusion.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more co

Re: [PR] deprecate schema expressions [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15847: URL: https://github.com/apache/datafusion/pull/15847#issuecomment-2860258573 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to clear the review queue. Please mark it as ready for review when it is ready for another look

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2078437197 ## native/spark-expr/src/bitwise_funcs/bitwise_count.rs: ## @@ -0,0 +1,103 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more c

Re: [PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-07 Thread via GitHub
andygrove commented on code in PR #1720: URL: https://github.com/apache/datafusion-comet/pull/1720#discussion_r2078441658 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -96,6 +96,13 @@ case class CometScanRule(session: SparkSession) extends Rule[Spark

Re: [PR] chore(deps): bump tonic from 0.12.3 to 0.13.1 [datafusion]

2025-05-07 Thread via GitHub
alamb closed pull request #15957: chore(deps): bump tonic from 0.12.3 to 0.13.1 URL: https://github.com/apache/datafusion/pull/15957

Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-07 Thread via GitHub
alamb commented on code in PR #15980: URL: https://github.com/apache/datafusion/pull/15980#discussion_r2078420969 ## datafusion/expr/src/expr.rs: ## @@ -1775,6 +1775,27 @@ impl Expr { | Expr::SimilarTo(Like { expr, pattern, .. }) => { rewrit

Re: [PR] chore(deps): bump tonic from 0.12.3 to 0.13.1 [datafusion]

2025-05-07 Thread via GitHub
dependabot[bot] commented on PR #15957: URL: https://github.com/apache/datafusion/pull/15957#issuecomment-2860237232 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version

Re: [PR] chore(deps): bump tonic from 0.12.3 to 0.13.1 [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15957: URL: https://github.com/apache/datafusion/pull/15957#issuecomment-2860237150 This needs the next arrow update, so closing it now

Re: [PR] chore: Improve reporting of fallback reasons for CollectLimit [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1694: URL: https://github.com/apache/datafusion-comet/pull/1694#discussion_r2078430098 ## spark/src/main/scala/org/apache/comet/rules/CometExecRule.scala: ## @@ -196,18 +198,34 @@ case class CometExecRule(session: SparkSession) extends Rule

Re: [PR] Fix Infer prepare statement type tests [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15743: URL: https://github.com/apache/datafusion/pull/15743#issuecomment-2860184929 Since @kczimm is, I think, working on this feature, maybe he has some thoughts about how this PR should be structured

Re: [PR] refactor: remove deprecated `ParquetExec` [datafusion]

2025-05-07 Thread via GitHub
alamb commented on code in PR #15973: URL: https://github.com/apache/datafusion/pull/15973#discussion_r2078396964 ## datafusion/core/src/datasource/physical_plan/mod.rs: ## @@ -35,9 +35,7 @@ pub use avro::{AvroExec, AvroSource}; pub use datafusion_datasource_parquet::source::Pa

Re: [PR] Add support for parsing with semicolons optional [datafusion-sqlparser-rs]

2025-05-07 Thread via GitHub
aharpervc commented on code in PR #1843: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1843#discussion_r2078393308 ## tests/sqlparser_common.rs: ## @@ -666,6 +666,23 @@ fn parse_select_with_table_alias() { ); } +#[test] +fn parse_consecutive_queries() { R

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2860166649 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

[PR] minor: Warn if memory pool is dropped with bytes still reserved [datafusion-comet]

2025-05-07 Thread via GitHub
andygrove opened a new pull request, #1721: URL: https://github.com/apache/datafusion-comet/pull/1721 ## Which issue does this PR close? N/A ## Rationale for this change If we drop memory pools before all bytes are freed, it could lead to a memory leak ov
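
The rationale above boils down to checking, in `Drop`, whether the pool still has bytes reserved and warning if so. The std-only sketch below shows that pattern. `TrackingPool` and its methods are hypothetical stand-ins for `CometMemoryPool`. The warning goes to stderr via `eprintln!`, whereas a real implementation would more likely use a logging crate.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, simplified pool that only tracks reserved bytes.
pub struct TrackingPool {
    used: AtomicUsize,
}

impl TrackingPool {
    pub fn new() -> Self {
        Self { used: AtomicUsize::new(0) }
    }

    pub fn grow(&self, bytes: usize) {
        self.used.fetch_add(bytes, Ordering::Relaxed);
    }

    pub fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::Relaxed);
    }

    pub fn reserved(&self) -> usize {
        self.used.load(Ordering::Relaxed)
    }
}

impl Drop for TrackingPool {
    fn drop(&mut self) {
        // A non-zero count means some consumer never released its
        // reservation, which is the leak the PR wants to surface.
        let remaining = self.reserved();
        if remaining > 0 {
            eprintln!("warning: memory pool dropped with {remaining} bytes still reserved");
        }
    }
}
```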

Re: [PR] Show LogicalType name for `INFORMATION_SCHEMA` [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15965: URL: https://github.com/apache/datafusion/pull/15965#issuecomment-2860153585 Thanks again @goldmedal -- looks like a nice change to me

Re: [PR] Show LogicalType name for `INFORMATION_SCHEMA` [datafusion]

2025-05-07 Thread via GitHub
alamb merged PR #15965: URL: https://github.com/apache/datafusion/pull/15965

[PR] Migrate Optimizer tests to insta, part6 [datafusion]

2025-05-07 Thread via GitHub
qstommyshu opened a new pull request, #15984: URL: https://github.com/apache/datafusion/pull/15984 ## Which issue does this PR close? - Related #15396 , #15446, #15884, #15893, #15937, #15945 ## Rationale for this change ## What changes are included in thi

[PR] Postgresql ALTER TABLE operation: REPLICA IDENTITY [datafusion-sqlparser-rs]

2025-05-07 Thread via GitHub
MohamedAbdeen21 opened a new pull request, #1844: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1844 Add support for the PostgreSQL-specific `ALTER TABLE` operation `REPLICA IDENTITY`. Docs: https://www.postgresql.org/docs/current/sql-altertable.html
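
PostgreSQL documents this clause as `REPLICA IDENTITY { DEFAULT | USING INDEX index_name | FULL | NOTHING }`. The std-only sketch below shows one way to model and round-trip it. `ReplicaIdentity` and `parse_replica_identity` are hypothetical, and the type actually added to sqlparser-rs by this PR may be shaped differently.

```rust
use std::fmt;

/// Hypothetical AST node for `ALTER TABLE ... REPLICA IDENTITY ...`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ReplicaIdentity {
    Default,
    UsingIndex(String),
    Full,
    Nothing,
}

impl fmt::Display for ReplicaIdentity {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReplicaIdentity::Default => write!(f, "REPLICA IDENTITY DEFAULT"),
            ReplicaIdentity::UsingIndex(idx) => write!(f, "REPLICA IDENTITY USING INDEX {idx}"),
            ReplicaIdentity::Full => write!(f, "REPLICA IDENTITY FULL"),
            ReplicaIdentity::Nothing => write!(f, "REPLICA IDENTITY NOTHING"),
        }
    }
}

/// Parses the tokens following `REPLICA IDENTITY`; keywords are
/// case-insensitive, and the index name keeps its original spelling.
pub fn parse_replica_identity(tokens: &[&str]) -> Result<ReplicaIdentity, String> {
    let upper: Vec<String> = tokens.iter().map(|t| t.to_ascii_uppercase()).collect();
    let words: Vec<&str> = upper.iter().map(String::as_str).collect();
    match words.as_slice() {
        ["DEFAULT"] => Ok(ReplicaIdentity::Default),
        ["FULL"] => Ok(ReplicaIdentity::Full),
        ["NOTHING"] => Ok(ReplicaIdentity::Nothing),
        ["USING", "INDEX", _] => Ok(ReplicaIdentity::UsingIndex(tokens[2].to_string())),
        _ => Err(format!(
            "expected DEFAULT, FULL, NOTHING or USING INDEX <name>, found {tokens:?}"
        )),
    }
}
```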

Re: [I] Filter cache based on the paper "Predicate Caching: Query-Driven Secondary Indexing for Cloud Data" [datafusion]

2025-05-07 Thread via GitHub
alamb commented on issue #15585: URL: https://github.com/apache/datafusion/issues/15585#issuecomment-2860009642 There is some vestigial code in the cache_manager crate https://docs.rs/datafusion/latest/datafusion/execution/cache/cache_manager/index.html that I think could provide a hom

[PR] fix: Bucketed scan fallback for native_datafusion Parquet scan [datafusion-comet]

2025-05-07 Thread via GitHub
mbutrovich opened a new pull request, #1720: URL: https://github.com/apache/datafusion-comet/pull/1720 ## Which issue does this PR close? Closes #. ## Rationale for this change See https://github.com/apache/datafusion-comet/issues/1719 ## What chang

[I] feat: bucketed scan for native_datafusion Parquet scan [datafusion-comet]

2025-05-07 Thread via GitHub
mbutrovich opened a new issue, #1719: URL: https://github.com/apache/datafusion-comet/issues/1719 ### What is the problem the feature request solves? The native_datafusion Parquet scan does not support bucketed scan, and fails most of the tests in Spark's BucketedReadSuite without a f
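
Background on what a bucketed scan needs: Spark encodes each file's bucket id in the file name, and my understanding is that it extracts the id with the pattern `.*_(\d+)(?:\..*)?$`. The scan then produces one input partition per bucket. The std-only sketch below covers that grouping step, assuming that naming convention. The function names are hypothetical, and this is not Comet code.

```rust
use std::collections::BTreeMap;

/// Extracts a bucket id from names like `part-00000-<uuid>_00003.c000.snappy.parquet`.
/// Tries each '_' from the right, mirroring the greedy regex backtracking:
/// the digits after '_' must be followed by end-of-string or a '.'.
pub fn bucket_id_from_file_name(name: &str) -> Option<u32> {
    for (pos, _) in name.rmatch_indices('_') {
        let rest = &name[pos + 1..];
        let digits_len = rest.bytes().take_while(|b| b.is_ascii_digit()).count();
        if digits_len == 0 {
            continue;
        }
        let (digits, tail) = rest.split_at(digits_len);
        if tail.is_empty() || tail.starts_with('.') {
            return digits.parse().ok();
        }
    }
    None
}

/// Groups files by bucket id, rejecting names without an id or with an
/// id outside `0..num_buckets`.
pub fn group_files_by_bucket<'a>(
    files: &[&'a str],
    num_buckets: u32,
) -> Result<BTreeMap<u32, Vec<&'a str>>, String> {
    let mut groups: BTreeMap<u32, Vec<&'a str>> = BTreeMap::new();
    for &f in files {
        let id = bucket_id_from_file_name(f)
            .ok_or_else(|| format!("not a bucketed file name: {f}"))?;
        if id >= num_buckets {
            return Err(format!("bucket id {id} out of range for {num_buckets} buckets"));
        }
        groups.entry(id).or_default().push(f);
    }
    Ok(groups)
}
```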

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
huaxingao commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078282350 ## .github/workflows/iceberg_spark_test_native_datafusion.yml: ## @@ -0,0 +1,80 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more c

Re: [PR] Add support for parsing with semicolons optional [datafusion-sqlparser-rs]

2025-05-07 Thread via GitHub
aharpervc commented on PR #1843: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1843#issuecomment-2859830502 Keeping this in draft at the moment to work out the "compile-no-std" CI job, and possibly some more test cases. However, anyone feel free to post a review/thoughts

Re: [PR] Migrate Optimizer tests to insta, part5 [datafusion]

2025-05-07 Thread via GitHub
alamb commented on PR #15945: URL: https://github.com/apache/datafusion/pull/15945#issuecomment-2859832899 > > I wonder if we are actually done now (❤️ thanks again @qstommyshu ) > > Not yet 😅. I just checked, there are at least 5 files in optimizer tests that still need to be migrated.

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
Dandandan commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859776487 Nice, that seems like a great result! I think the main improvement after this would be using the `take_in` API you proposed in arrow-rs (mainly to avoid `concat`)

Re: [PR] Migrate Optimizer tests to insta, part5 [datafusion]

2025-05-07 Thread via GitHub
qstommyshu commented on PR #15945: URL: https://github.com/apache/datafusion/pull/15945#issuecomment-2859773824 > I wonder if we are actually done now (❤️ thanks again @qstommyshu ) Not yet 😅. I just checked, there are at least 5 files in optimizer tests that still need to be migrated.

Re: [PR] feat: Set/cancel with job tag and make max broadcast table size configurable [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1693: URL: https://github.com/apache/datafusion-comet/pull/1693#discussion_r2078146339 ## spark/src/main/spark-3.4/org/apache/comet/shims/ShimCometBroadcastExchangeExec.scala: ## @@ -0,0 +1,49 @@ +/* + * Licensed to the Apache Software Found

Re: [PR] perf: Add performance tracing capability [datafusion-comet]

2025-05-07 Thread via GitHub
parthchandra commented on code in PR #1706: URL: https://github.com/apache/datafusion-comet/pull/1706#discussion_r2078115426 ## native/core/src/execution/tracing.rs: ## @@ -0,0 +1,111 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

[I] Placeholder datatype not inferred after `LIMIT` clause [datafusion]

2025-05-07 Thread via GitHub
kczimm opened a new issue, #15978: URL: https://github.com/apache/datafusion/issues/15978 ### Describe the bug When using a parameterized query with a placeholder indicating the value in the `LIMIT` clause, the datatype is not inferred. ### To Reproduce ```rust let sc

Re: [I] Reuse Rows allocation in SortPreservingMergeStream / `RowCursorStream` [datafusion]

2025-05-07 Thread via GitHub
acking-you commented on issue #15720: URL: https://github.com/apache/datafusion/issues/15720#issuecomment-2859387925 > Interesting @acking-you, do you have some code/branch you could share? is here: https://github.com/acking-you/arrow-datafusion/commit/f020522eab82f1ff8a7b42b97b1c9

Re: [I] Reuse Rows allocation in SortPreservingMergeStream / `RowCursorStream` [datafusion]

2025-05-07 Thread via GitHub
Dandandan commented on issue #15720: URL: https://github.com/apache/datafusion/issues/15720#issuecomment-2859376503 Interesting @acking-you, do you have some code/branch you could share? I wonder if you tried out using `RowConverter::append` (e.g. clear + append) https://docs.rs/a

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
ctsk commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859359952 I've run clickbench_partitioned and tpch_mem10 on a machine with 16 cores. The clickbench results are pretty much the same; tpch_mem10 ran significantly faster. data

Re: [PR] chore: Comet + Iceberg (1.8.1) CI [datafusion-comet]

2025-05-07 Thread via GitHub
hsiang-c commented on code in PR #1715: URL: https://github.com/apache/datafusion-comet/pull/1715#discussion_r2078058620 ## dev/diffs/iceberg/1.8.1.diff: ## @@ -0,0 +1,179 @@ +diff --git a/spark/v3.4/build.gradle b/spark/v3.4/build.gradle +index 6eb26e8..90d848d 100644 +--- a/sp

Re: [PR] implement `AggregateExec.partition_statistics` [datafusion]

2025-05-07 Thread via GitHub
xudong963 commented on PR #15954: URL: https://github.com/apache/datafusion/pull/15954#issuecomment-2858784424 @UBarney Thank you, I'll review it in two days

Re: [PR] Optimize hash partitioning for cache friendliness [datafusion]

2025-05-07 Thread via GitHub
Dandandan commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859289524 nice, could you share some perf numbers of this approach?

Re: [I] Reuse Rows allocation in SortPreservingMergeStream / `RowCursorStream` [datafusion]

2025-05-07 Thread via GitHub
acking-you commented on issue #15720: URL: https://github.com/apache/datafusion/issues/15720#issuecomment-2859288602 After implementing Rows reuse, I found no improvement in the overall execution of `SortPreservingMergeExec`. @Dandandan Therefore, I measured the r

[I] Replace `ObjectStoreRegistry` with `object_store`'s new `ObjectStoreRegistry` [datafusion]

2025-05-07 Thread via GitHub
criccomini opened a new issue, #15983: URL: https://github.com/apache/datafusion/issues/15983 ### Is your feature request related to a problem or challenge? I'm working with the `object_store` folks on upstreaming DataFusion's [`ObjectStoreRegistry`](https://docs.rs/datafusion/latest/

[I] Treat truncated parquet stats as inexact [datafusion]

2025-05-07 Thread via GitHub
robert3005 opened a new issue, #15976: URL: https://github.com/apache/datafusion/issues/15976 ### Describe the bug When reading Parquet files with truncated stats, DataFusion reports the min/max as exact even though metadata in the file indicates that min/max has been truncated

Re: [PR] Allow stored procedures to be defined without `BEGIN`/`END` [datafusion-sqlparser-rs]

2025-05-07 Thread via GitHub
aharpervc commented on code in PR #1834: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1834#discussion_r2078041514 ## tests/sqlparser_mssql.rs: ## @@ -100,48 +100,52 @@ fn parse_mssql_delimited_identifiers() { #[test] fn parse_create_procedure() { -let sql
