Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
berkaysynnada commented on code in PR #16166: URL: https://github.com/apache/datafusion/pull/16166#discussion_r2113294832 ## datafusion/datasource/src/file_format.rs: ## @@ -120,7 +121,26 @@ pub trait FileFormatFactory: Sync + Send + GetExt + fmt::Debug { &self,

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918429534 yep, it should be merged after every point is clear, to reduce review burden -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
berkaysynnada commented on code in PR #16166: URL: https://github.com/apache/datafusion/pull/16166#discussion_r2113286434 ## datafusion/common/src/config.rs: ## @@ -1612,42 +1623,241 @@ impl TableOptions { }; e.0.set(key, value) } +} -/// Initializes

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
berkaysynnada commented on PR #16166: URL: https://github.com/apache/datafusion/pull/16166#issuecomment-2918419765 > Thank you for this contribution @berkaysynnada and @mertak-synnada > > I am a little confused about the new structure and exactly what problem is being solved with this

[PR] Reduce size of `Expr` struct [datafusion]

2025-05-28 Thread via GitHub
hendrikmakait opened a new pull request, #16207: URL: https://github.com/apache/datafusion/pull/16207 ## Which issue does this PR close? - Closes #16199. ## What changes are included in this PR? * Add a test for the size of `Expr` * Change `Expr::WindowFunction

Re: [PR] Implement schema adapter support for FileSource and add integration tests [datafusion]

2025-05-28 Thread via GitHub
kosiew commented on code in PR #16148: URL: https://github.com/apache/datafusion/pull/16148#discussion_r2113115289 ## datafusion/datasource/src/test_util.rs: ## @@ -81,6 +83,8 @@ impl FileSource for MockSource { fn file_type(&self) -> &str { "mock" } + +im

Re: [PR] feat: create builder for disk manager [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on code in PR #16191: URL: https://github.com/apache/datafusion/pull/16191#discussion_r2113090658 ## datafusion/execution/src/disk_manager.rs: ## @@ -32,7 +32,95 @@ use crate::memory_pool::human_readable_size; const DEFAULT_MAX_TEMP_DIRECTORY_SIZE: u64 =

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2918115640 I was using the IDE to run the test and terminate it, so maybe it was terminated by the IDE... We can try again based on the PR, because it seems the PR does not affect performance.

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2918118294 > 🤖: Benchmark completed > > Details > > ``` > Comparing HEAD and issue_16193 > > Benchmark clickbench_extended.json > --

Re: [I] Excessive Arc-clone in HashJoinStream with StringView on build-side [datafusion]

2025-05-28 Thread via GitHub
jonathanc-n commented on issue #16206: URL: https://github.com/apache/datafusion/issues/16206#issuecomment-2917999199 take

Re: [PR] Spark : Fix AQE Tests [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1811: URL: https://github.com/apache/datafusion-comet/pull/1811#issuecomment-2917941265 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1811?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: translate missing or corrupt file exceptions in NativeUtil, fall back native scans if asked to ignore [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on PR #1765: URL: https://github.com/apache/datafusion-comet/pull/1765#issuecomment-2917912411 @mbutrovich looks like this is causing ci failures.

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra merged PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809

[PR] Spark : Fix AQE Tests [datafusion-comet]

2025-05-28 Thread via GitHub
coderfender opened a new pull request, #1811: URL: https://github.com/apache/datafusion-comet/pull/1811 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these chang

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-28 Thread via GitHub
irenjj commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112941516 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -287,6 +287,105 @@ pub enum LogicalPlan { Unnest(Unnest), /// A variadic query (e.g. "Recursive CTEs")

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
irenjj commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917853644 It looks like @duongcongtoai addressed the depth issue in #16016. Maybe this PR can be merged with #16016 to better verify the depth-related problem?

Re: [PR] fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra merged PR #1785: URL: https://github.com/apache/datafusion-comet/pull/1785

Re: [PR] fix: translate missing or corrupt file exceptions in NativeUtil, fall back native scans if asked to ignore [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1765: URL: https://github.com/apache/datafusion-comet/pull/1765#issuecomment-2917808380 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1765?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809#issuecomment-2917796361 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1809?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2917789948 > I still think there is a bug here: > > For this test (when running on main): > > ```scala > test("debug datafusion native filter") { > val schema = Struc

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112886589 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with Adaptiv

Re: [PR] fix: fallback to Spark scan if encryption is enabled (native_datafusion/native_iceberg_compat) [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on PR #1785: URL: https://github.com/apache/datafusion-comet/pull/1785#issuecomment-2917762700 @andygrove @mbutrovich

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917741777 cc @alamb

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917730231 True, I've just realized it. Looks like a feature branch for us to work on is the way, then?

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917721651 > Beware that this error is thrown after the planning stage has completed, and it is expected because of the current limitation of subquery decorrelation. Oh I was under the i

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917717164 or an easier way is to have a large feature branch :thinking:

[I] perf: only check Parquet type once in NativeBatchReader [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich opened a new issue, #1810: URL: https://github.com/apache/datafusion-comet/issues/1810 NativeBatchReader calls `checkParquetType` on all of the columns on every invocation of `loadNextBatch`. I tried moving it up to `init` but some Spark SQL tests expect the exceptions that this

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich commented on code in PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809#discussion_r2112818597 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -613,7 +611,10 @@ public void close() throws IOException { @SuppressWarn

Re: [PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich commented on code in PR #1809: URL: https://github.com/apache/datafusion-comet/pull/1809#discussion_r2112818183 ## common/src/main/java/org/apache/comet/parquet/NativeBatchReader.java: ## @@ -321,8 +321,6 @@ public void init() throws Throwable { } long[]

[PR] fix: native_iceberg_compat: move checking parquet types above fetching batch [datafusion-comet]

2025-05-28 Thread via GitHub
mbutrovich opened a new pull request, #1809: URL: https://github.com/apache/datafusion-comet/pull/1809 ## Which issue does this PR close? Partially address #1542. ## Rationale for this change ## What changes are included in this PR? We valid

[I] Excessive Arc-clone in HashJoinStream with StringView on build-side [datafusion]

2025-05-28 Thread via GitHub
ctsk opened a new issue, #16206: URL: https://github.com/apache/datafusion/issues/16206 ### Describe the bug An unfortunate pattern in the hash join implementation leads to excessive Arc-cloning: Assume the build-side carries a string-view column as a payload. Let N be the number of

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917689696 So here are my thoughts (the plan is to split the work into smaller PRs while avoiding breaking things as much as possible): 1. we introduce 3 optimizers, declared in the order b

Re: [I] Spark-compatible CAST operation [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on issue #11201: URL: https://github.com/apache/datafusion/issues/11201#issuecomment-2917688776 The Comet implementation already has a `PhysicalExpr` for cast. I was wondering if we could make it DataFusion-compatible (perhaps it already is), and while making physical exp

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112798304 ## datafusion/expr/src/logical_plan/tree_node.rs: ## @@ -400,6 +403,8 @@ impl LogicalPlan { mut f: F, ) -> Result { match self { +

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917657542 > The results are a little inconsistent. __scalar_sq_2."avg(e3.salary)", __scalar_sq_2.dept_id are not valid fields in the above context. Ideally, all the fields in e1, e2 and e

[PR] Add support for parameter default values in SQL Server [datafusion-sqlparser-rs]

2025-05-28 Thread via GitHub
aharpervc opened a new pull request, #1866: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1866 This PR adds support for default values for parameters, as documented here: https://learn.microsoft.com/en-us/sql/t-sql/statements/create-function-transact-sql?view=sql-server-ver17#-

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917647569 🤖: Benchmark completed Details ``` Comparing HEAD and improve-primitive-group-values Benchmark clickbench_extended.json ---

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2917574821 Thanks for all your help @Rachelint and congratulations on the new job

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917562287 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917556654 Currently the error looks like: ```sql > explain SELECT e1.employee_name, e1.salary FROM employees e1 WHERE e1.salary > ( SELECT AVG(e2.salary) FROM employee

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
Rachelint commented on PR #16136: URL: https://github.com/apache/datafusion/pull/16136#issuecomment-2917542628 > I think this PR makes things better so approving. Nice work @Rachelint. Thanks @alamb , I think still two blocked things before merging it: - Maybe we should also compare

Re: [PR] Specialized `GroupValues` for `primitive` and `large_primitive` [datafusion]

2025-05-28 Thread via GitHub
Rachelint commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2112691325 ## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ## @@ -116,42 +122,60 @@ where { fn intern(&mut self, cols: &[Arr

Re: [PR] Implement intermediate result blocked approach to aggregation memory management [datafusion]

2025-05-28 Thread via GitHub
Rachelint commented on PR #15591: URL: https://github.com/apache/datafusion/pull/15591#issuecomment-2917518553 Thanks @adriangb @Dandandan. I just started my new job this week and am a bit busy, but I will continue to push it forward this weekend. The new targets for this one may b

Re: [PR] chore: [native scans] Ignore Spark SQL test for string predicate pushdown [datafusion-comet]

2025-05-28 Thread via GitHub
parthchandra commented on code in PR #1768: URL: https://github.com/apache/datafusion-comet/pull/1768#discussion_r2112650217 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1330,6 +1330,25 @@ class CometExpressionSuite extends CometTestBase with Adap

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917443275 🤖: Benchmark completed Details ``` Comparing HEAD and issue_16193 Benchmark clickbench_extended.json ┏━

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917407566 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917408198 Running the benchmarks again to gather more details

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112613924 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112606619 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files based

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917351712 🤖: Benchmark completed Details ``` Comparing HEAD and issue_16193 Benchmark clickbench_extended.json ┏━

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2112550252 ## datafusion/optimizer/src/decorrelate_general.rs: ## @@ -0,0 +1,1137 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] build(deps): bump object_store from 0.12.0 to 0.12.1 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] closed pull request #1127: build(deps): bump object_store from 0.12.0 to 0.12.1 URL: https://github.com/apache/datafusion-python/pull/1127

Re: [PR] build(deps): bump arrow from 55.0.0 to 55.1.0 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] closed pull request #1128: build(deps): bump arrow from 55.0.0 to 55.1.0 URL: https://github.com/apache/datafusion-python/pull/1128

Re: [PR] build(deps): bump ring from 0.17.9 to 0.17.14 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] commented on PR #1124: URL: https://github.com/apache/datafusion-python/pull/1124#issuecomment-2917263369 Looks like ring is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump ring from 0.17.9 to 0.17.14 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] closed pull request #1124: build(deps): bump ring from 0.17.9 to 0.17.14 URL: https://github.com/apache/datafusion-python/pull/1124

Re: [PR] build(deps): bump arrow from 55.0.0 to 55.1.0 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] commented on PR #1128: URL: https://github.com/apache/datafusion-python/pull/1128#issuecomment-2917263195 Looks like arrow is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump object_store from 0.12.0 to 0.12.1 [datafusion-python]

2025-05-28 Thread via GitHub
dependabot[bot] commented on PR #1127: URL: https://github.com/apache/datafusion-python/pull/1127#issuecomment-2917263077 Looks like object_store is up-to-date now, so this is no longer needed.

Re: [PR] Release DataFusion 47.0.0 [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer merged PR #1130: URL: https://github.com/apache/datafusion-python/pull/1130

Re: [I] Release DataFusion-Python 47.0.0 [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer closed issue #1115: Release DataFusion-Python 47.0.0 URL: https://github.com/apache/datafusion-python/issues/1115

Re: [PR] feat: support inability to yield for loop when it's not using Tok… [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2917252077 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] fix: Re-enable Spark 4 tests on Linux [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove merged PR #1806: URL: https://github.com/apache/datafusion-comet/pull/1806

Re: [PR] Set Formatted TableOptions Enum [datafusion]

2025-05-28 Thread via GitHub
alamb commented on code in PR #16166: URL: https://github.com/apache/datafusion/pull/16166#discussion_r2112474913 ## datafusion/datasource/src/file_format.rs: ## @@ -120,7 +121,26 @@ pub trait FileFormatFactory: Sync + Send + GetExt + fmt::Debug { &self, state

Re: [PR] fix: fall back on nested types for default values [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove merged PR #1799: URL: https://github.com/apache/datafusion-comet/pull/1799

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-05-28 Thread via GitHub
duongcongtoai commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2917215746 There is also one thing I want to highlight: in DuckDB, a SubqueryExpr may result in 2 output exprs after decorrelation. This is because they want to support this q

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112477825 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112474979 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917175705 🤖: Benchmark completed Details ``` Comparing HEAD and issue-15969-error-on-buffer-overflow Benchmark clickbench_extended.json -

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2917168337 Thanks again @ctsk

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-28 Thread via GitHub
alamb merged PR #16165: URL: https://github.com/apache/datafusion/pull/16165

Re: [PR] Set aggregation hash seed [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16165: URL: https://github.com/apache/datafusion/pull/16165#issuecomment-2917168057 Second performance run looks as good / better so let's merge this in!

Re: [PR] Shift from Field to FieldRef for all user defined functions [datafusion]

2025-05-28 Thread via GitHub
alamb merged PR #16122: URL: https://github.com/apache/datafusion/pull/16122

Re: [I] Reduce Field Copy operations before releasing 48.0.0 [datafusion]

2025-05-28 Thread via GitHub
alamb closed issue #16121: Reduce Field Copy operations before releasing 48.0.0 URL: https://github.com/apache/datafusion/issues/16121

Re: [PR] Shift from Field to FieldRef for all user defined functions [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16122: URL: https://github.com/apache/datafusion/pull/16122#issuecomment-2917166418 Thanks @timsaucer

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-05-28 Thread via GitHub
alamb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2917159985 In general I agree with the premise that making the filter pushdown APIs easier to use / understand would be very valuable to DataFusion -- the goals @kosiew describe all sound w

Re: [PR] perf: Only add CopyExec if source of `ScanExec` is `native_comet` [datafusion-comet]

2025-05-28 Thread via GitHub
codecov-commenter commented on PR #1808: URL: https://github.com/apache/datafusion-comet/pull/1808#issuecomment-2917149316 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1808?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16139: URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2917083133 > I'm not able to request reviews. I think only committers can do that and I'm not a committer (yet). I think you will need to do the gitbox thing with your Apache account (when i

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917078781 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [PR] Return an error on overflow in `do_append_val_inner` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16201: URL: https://github.com/apache/datafusion/pull/16201#issuecomment-2917076809 Thank you @liamzwbao -- this looks good to me. I'll start some benchmarks on this PR and as long as that looks good this PR looks nice to me Thanks again

Re: [PR] Propagate .execute() calls immediately in `RepartitionExec` [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #16093: URL: https://github.com/apache/datafusion/pull/16093#issuecomment-2917050839 Looks all good to me, so let's go! 🚀

Re: [I] Simplify Filter Pushdown APIs for Better Maintainability and Developer Experience [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on issue #16188: URL: https://github.com/apache/datafusion/issues/16188#issuecomment-2917056155 Thank you @kosiew. Clearly what we have now needs work but I think I'd like to defer cleaning this up until some other folks try to implement more things with these APIs

Re: [I] Interuptable queries in jupyter notebooks [datafusion-python]

2025-05-28 Thread via GitHub
kylebarron commented on issue #1136: URL: https://github.com/apache/datafusion-python/issues/1136#issuecomment-2917052106 See https://pyo3.rs/v0.25.0/faq.html#ctrl-c-doesnt-do-anything-while-my-rust-code-is-executing and https://docs.rs/pyo3/latest/pyo3/marker/struct.Python.html#method.ch

Re: [I] RepartitionExec not immediately propagating `.execute()` calls to children [datafusion]

2025-05-28 Thread via GitHub
alamb closed issue #16088: RepartitionExec not immediately propagating `.execute()` calls to children URL: https://github.com/apache/datafusion/issues/16088

Re: [PR] Propagate .execute() calls immediately in `RepartitionExec` [datafusion]

2025-05-28 Thread via GitHub
alamb merged PR #16093: URL: https://github.com/apache/datafusion/pull/16093

Re: [PR] Example for using a separate threadpool for CPU bound work (try 2) [datafusion]

2025-05-28 Thread via GitHub
alamb commented on PR #14286: URL: https://github.com/apache/datafusion/pull/14286#issuecomment-2917020530 BTW the SpawnService is what should be used now: https://github.com/apache/arrow-rs-object-store/pull/332 Sadly, the docs are broken for the current version of object_store so I

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2917010803 Just tested on Linux. With `USE_TASK = false` I see this ``` Running query; will time out after 5 seconds InfiniteStream::poll_next 1 times InfiniteStream::po

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-05-28 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2916986710 > [@alamb](https://github.com/alamb), how about starting test next week? I think that would be a great idea. Thanks @xudong963

Re: [PR] Feat: support bit_count function [datafusion-comet]

2025-05-28 Thread via GitHub
kazantsev-maksim commented on code in PR #1602: URL: https://github.com/apache/datafusion-comet/pull/1602#discussion_r2112325535 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -99,6 +100,73 @@ class CometExpressionSuite extends CometTestBase with Ada

Re: [I] AggregateExec not cancellable [datafusion]

2025-05-28 Thread via GitHub
pepijnve commented on issue #16193: URL: https://github.com/apache/datafusion/issues/16193#issuecomment-2916952966 🤔 testing on my machine your adapted version of the code still just keeps on running. ctrl-c does nothing. The only change I've made is to replace `tokio::test` with `tokio::ma

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on PR #16139: URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2916950224 > Sorry for late, I'll check tomorrow (feel free to directly invite me to review by the button, then I'll notice more) I'm not able to request reviews. I think only committers

Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on code in PR #16157: URL: https://github.com/apache/datafusion/pull/16157#discussion_r2112300128 ## docs/source/user-guide/sql/ddl.md: ## @@ -91,6 +93,23 @@ STORED AS PARQUET LOCATION '/mnt/nyctaxi/tripdata.parquet'; ``` +:::{note} Review Comment: >

Re: [PR] Clarify documentation about gathering statistics for parquet files [datafusion]

2025-05-28 Thread via GitHub
xudong963 merged PR #16157: URL: https://github.com/apache/datafusion/pull/16157

[I] Interuptable queries in jupyter notebooks [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer opened a new issue, #1136: URL: https://github.com/apache/datafusion-python/issues/1136 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** As a user, if I have written a query that takes a long time, I want to be able t

Re: [I] union all +aggregate function in the recursive cte results an infinite loop [datafusion-python]

2025-05-28 Thread via GitHub
timsaucer commented on issue #1131: URL: https://github.com/apache/datafusion-python/issues/1131#issuecomment-2916920418 Ok to close this issue?

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on PR #16139: URL: https://github.com/apache/datafusion/pull/16139#issuecomment-2916919380 Sorry for late, I'll check tomorrow

Re: [PR] chore: add assertion that not using comet scan but using native scan [datafusion-comet]

2025-05-28 Thread via GitHub
rluvaton commented on PR #1793: URL: https://github.com/apache/datafusion-comet/pull/1793#issuecomment-2916913507 I still think there is a bug here: For this test (when running on main): ```scala test("debug datafusion native filter") { val schema = StructType( Seq(

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-05-28 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2916890212 @alamb, how about starting test next week?

Re: [I] Spike: evaluate if cuDF can be used with datafusion-python [datafusion-python]

2025-05-28 Thread via GitHub
paleolimbot commented on issue #936: URL: https://github.com/apache/datafusion-python/issues/936#issuecomment-2916913205 Just two things I was involved in that may be useful here: - `cudf::from_arrow()`: https://github.com/rapidsai/cudf/blob/2789fa83d943649b982493d68bbba852f848d82c/c

Re: [PR] chore: manual "git bisect" to try and determine when CI failures started [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove closed pull request #1804: chore: manual "git bisect" to try and determine when CI failures started URL: https://github.com/apache/datafusion-comet/pull/1804

Re: [PR] Chore: Moved strings expressions to separate file [datafusion-comet]

2025-05-28 Thread via GitHub
andygrove merged PR #1792: URL: https://github.com/apache/datafusion-comet/pull/1792

Re: [PR] Add new stats pruning helpers to allow combining partition values in file level stats [datafusion]

2025-05-28 Thread via GitHub
adriangb commented on code in PR #16139: URL: https://github.com/apache/datafusion/pull/16139#discussion_r2112282032 ## datafusion/common/src/pruning.rs: ## @@ -122,3 +126,1002 @@ pub trait PruningStatistics { values: &HashSet, ) -> Option; } + +/// Prune files ba
