Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-06 Thread via GitHub
ozankabak commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2951967426 > I will investigate that if we can remove some internal yield logic, such as repartition? etc Good idea, I'm curious to see if you can. `RepartitionExec` is a little bit of

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-06 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2951966229 RC3 voting is out: https://lists.apache.org/thread/b7r28j9bzk82cvgcoorxk2cz4c90lso1 -- This is an automated message from the Apache Git Service. To respond to the message, p

Re: [I] Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries [datafusion]

2025-06-06 Thread via GitHub
ozankabak closed issue #13748: Exponential planning time (100s of seconds) with `UNION` and `ORDER BY` queries URL: https://github.com/apache/datafusion/issues/13748

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-06 Thread via GitHub
ozankabak merged PR #16217: URL: https://github.com/apache/datafusion/pull/16217

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-06 Thread via GitHub
ozankabak commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2951959753 Thanks @Omega359. Given that we didn't hear any concerns yesterday, I will go ahead and merge this. Thanks everyone for all the reviews!

Re: [PR] feat: rewrite subquery into dependent join logical plan [datafusion]

2025-06-06 Thread via GitHub
duongcongtoai commented on code in PR #16016: URL: https://github.com/apache/datafusion/pull/16016#discussion_r2133457827 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -287,6 +287,105 @@ pub enum LogicalPlan { Unnest(Unnest), /// A variadic query (e.g. "Recursive

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-06 Thread via GitHub
xudong963 commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2951911403 I'm cooking rc3!

Re: [PR] [branch-48] Update CHANGELOG for latest 48.0.0 release [datafusion]

2025-06-06 Thread via GitHub
xudong963 merged PR #16314: URL: https://github.com/apache/datafusion/pull/16314

Re: [PR] [branch-48] Update CHANGELOG for latest 48.0.0 release [datafusion]

2025-06-06 Thread via GitHub
xudong963 commented on PR #16314: URL: https://github.com/apache/datafusion/pull/16314#issuecomment-2951907953 Thank you

Re: [PR] MySQL: `[[NOT] ENFORCED]` in CHECK constraint [datafusion-sqlparser-rs]

2025-06-06 Thread via GitHub
iffyio merged PR #1870: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1870

[PR] doc: Add SQL examples for SEMI + ANTI Joins [datafusion]

2025-06-06 Thread via GitHub
jonathanc-n opened a new pull request, #16316: URL: https://github.com/apache/datafusion/pull/16316 ## Which issue does this PR close? - Closes #16245 . ## Rationale for this change We are currently missing documentation on `Left Mark Join`, `Right Mark Join`, `Left

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-06 Thread via GitHub
zhuqi-lucas commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2951559003 > I found some time to work on this tonight and it looks good to me now. > > To summarize where we are: > > * We add yields to all leaf nodes, but no yields to any in

Re: [PR] docs: Expand `MemoryPool` docs with related structs [datafusion]

2025-06-06 Thread via GitHub
2010YOUY01 commented on code in PR #16289: URL: https://github.com/apache/datafusion/pull/16289#discussion_r2133222136 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -98,6 +98,64 @@ pub use pool::*; /// operator will spill the intermediate buffers to disk, and release me

Re: [PR] docs: Expand `MemoryPool` docs with related structs [datafusion]

2025-06-06 Thread via GitHub
ding-young commented on code in PR #16289: URL: https://github.com/apache/datafusion/pull/16289#discussion_r2133202144 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -98,6 +98,64 @@ pub use pool::*; /// operator will spill the intermediate buffers to disk, and release me

Re: [PR] Track peak_mem_used in ExternalSorter [datafusion]

2025-06-06 Thread via GitHub
2010YOUY01 commented on PR #16192: URL: https://github.com/apache/datafusion/pull/16192#issuecomment-2951468269 > @2010YOUY01 Hi, I’ve been struggling a bit with tracking peak memory in SPM step, and I was wondering if I could ask for some help. > > ### 1. Can we add the memory for co

Re: [PR] docs: Expand `MemoryPool` docs with related structs [datafusion]

2025-06-06 Thread via GitHub
2010YOUY01 commented on PR #16289: URL: https://github.com/apache/datafusion/pull/16289#issuecomment-2951425666 @ding-young Could you take a look and point out anything that doesn't make sense to help refine the doc?

Re: [PR] docs: Expand `MemoryPool` docs with related structs [datafusion]

2025-06-06 Thread via GitHub
2010YOUY01 commented on code in PR #16289: URL: https://github.com/apache/datafusion/pull/16289#discussion_r2133176809 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -98,6 +98,61 @@ pub use pool::*; /// operator will spill the intermediate buffers to disk, and release me

Re: [PR] MySQL: `[[NOT] ENFORCED]` in CHECK constraint [datafusion-sqlparser-rs]

2025-06-06 Thread via GitHub
MohamedAbdeen21 commented on code in PR #1870: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1870#discussion_r2133132479 ## src/parser/mod.rs: ## @@ -8134,7 +8134,19 @@ impl<'a> Parser<'a> { self.expect_token(&Token::LParen)?; let

Re: [PR] chore: enable map_values testing since we fall back on nested types for defa… [datafusion-comet]

2025-06-06 Thread via GitHub
comphead closed pull request #1813: chore: enable map_values testing since we fall back on nested types for defa… URL: https://github.com/apache/datafusion-comet/pull/1813

Re: [PR] chore: enable map_values testing since we fall back on nested types for defa… [datafusion-comet]

2025-06-06 Thread via GitHub
comphead commented on PR #1813: URL: https://github.com/apache/datafusion-comet/pull/1813#issuecomment-2951293667 Closing this PR in favor of #1835

Re: [PR] Feat: Support Spark 4.0.0 part1 [datafusion-comet]

2025-06-06 Thread via GitHub
parthchandra commented on code in PR #1830: URL: https://github.com/apache/datafusion-comet/pull/1830#discussion_r2133051606 ## common/src/main/java/org/apache/comet/parquet/TypeUtil.java: ## @@ -74,7 +74,8 @@ public static ColumnDescriptor convertToParquet(StructField field) {

Re: [PR] feat: [branch-48] add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
andygrove merged PR #16315: URL: https://github.com/apache/datafusion/pull/16315

[PR] upgraded spark 3.5.5 to 3.5.6 [datafusion-comet]

2025-06-06 Thread via GitHub
YanivKunda opened a new pull request, #1861: URL: https://github.com/apache/datafusion-comet/pull/1861 ## Which issue does this PR close? Closes #1857 ## Rationale for this change Spark 3.5.6 is the latest stable 3.5.x version, and should be supported.

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-06 Thread via GitHub
Omega359 commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2951240245 I'll be able to do my tests Sat afternoon or Sun morning. I'm good with doing that against either main or the branch.

Re: [PR] fix: support read Struct by user schema [datafusion-comet]

2025-06-06 Thread via GitHub
codecov-commenter commented on PR #1860: URL: https://github.com/apache/datafusion-comet/pull/1860#issuecomment-2951225627 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1860?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] feat: [branch-48] add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
andygrove opened a new pull request, #16315: URL: https://github.com/apache/datafusion/pull/16315 ## Which issue does this PR close? Backports https://github.com/apache/datafusion/pull/16170 to branch-48 ## Rationale for this change ## What changes are inc

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2951210118 Here is the cherry-pick PR to backport https://github.com/apache/datafusion/pull/16170 into `branch-48`. Once this is merged, we can cut rc3. https://github.com/apache/

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-06-06 Thread via GitHub
irenjj commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2951204424 > I am sorry I have not had a chance to review this. I will try and find time over the weekend, but sadly I have several other projects that are higher priority than subqueries to

Re: [PR] fix: support read Struct by user schema [datafusion-comet]

2025-06-06 Thread via GitHub
comphead commented on code in PR #1860: URL: https://github.com/apache/datafusion-comet/pull/1860#discussion_r2133038151 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -862,6 +862,7 @@ abstract class CometTestBase testName: String = "test",

Re: [I] Support metadata on literal values [datafusion]

2025-06-06 Thread via GitHub
andygrove closed issue #15797: Support metadata on literal values URL: https://github.com/apache/datafusion/issues/15797

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
andygrove merged PR #16170: URL: https://github.com/apache/datafusion/pull/16170

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2951189324 > Is that right? I took a quick look at Andrew's proposal and it looks like a good idea. I just wanted to avoid too much API churn. That is also my understanding. I will go a

Re: [I] Enter tokio runtime during other FFI calls, such as execute [datafusion]

2025-06-06 Thread via GitHub
westonpace commented on issue #16312: URL: https://github.com/apache/datafusion/issues/16312#issuecomment-2951162788 https://github.com/lancedb/lance/pull/3954 should resolve this as a workaround while we wait for the df-python change. It turns out it was the exec adding the row-addr colum

Re: [I] Enter tokio runtime during other FFI calls, such as execute [datafusion]

2025-06-06 Thread via GitHub
westonpace commented on issue #16312: URL: https://github.com/apache/datafusion/issues/16312#issuecomment-2951175493 One possibility is to wrap execute... ``` let plan = plan.clone(); // Should be cheap since users almost always start with Arc let schema = plan.schema(); let

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
timsaucer commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2951165941 I'm a bit under the weather at the moment, but it sounds like people are leaning towards - merging this to main, cherry picking into rc48 branch - adding Andrew's metadata

Re: [I] Enter tokio runtime during other FFI calls, such as execute [datafusion]

2025-06-06 Thread via GitHub
westonpace commented on issue #16312: URL: https://github.com/apache/datafusion/issues/16312#issuecomment-2951164842 Whoops, you can ignore that mention. I accidentally commented here instead of https://github.com/lancedb/lance/issues/3953 😰

Re: [PR] fix: support read Struct by user schema [datafusion-comet]

2025-06-06 Thread via GitHub
comphead commented on code in PR #1860: URL: https://github.com/apache/datafusion-comet/pull/1860#discussion_r2133038738 ## native/core/src/parquet/parquet_support.rs: ## @@ -239,14 +242,24 @@ fn cast_struct_to_struct( parquet_options,

Re: [PR] fix: support read Struct by user schema [datafusion-comet]

2025-06-06 Thread via GitHub
comphead commented on code in PR #1860: URL: https://github.com/apache/datafusion-comet/pull/1860#discussion_r2133040038 ## native/core/src/execution/planner.rs: ## @@ -3331,4 +3347,146 @@ mod tests { Ok(()) } + +#[tokio::test] +async fn test_nested_types

[PR] fix: support read Struct by user schema [datafusion-comet]

2025-06-06 Thread via GitHub
comphead opened a new pull request, #1860: URL: https://github.com/apache/datafusion-comet/pull/1860 ## Which issue does this PR close? Closes #1843 . ## Rationale for this change ## What changes are included in this PR? ## How are these cha

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-06 Thread via GitHub
ozankabak commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2951003937 OK, I resolved the conflicts and all seems OK. Should we go ahead after CI passes?

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2950933126 🤖: Benchmark completed Details ``` group feat_metadata-on-logical-literal main -

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-06 Thread via GitHub
ozankabak commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2950910237 I think there are some incoming big PRs, so I'd like to go ahead with this and help with fixing any issues @Omega359 finds after merge. In addition to the time cost of going throug

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2950877975 > I will try and whip up a prototype of what I am talking about Here is a proposed PR: - https://github.com/timsaucer/datafusion/pull/2

Re: [I] Support reading multiple parquet files via `datafusion-cli` [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #16303: URL: https://github.com/apache/datafusion/issues/16303#issuecomment-2950888923 Maybe @robtandy could help

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2950813601 🤖 `./gh_compare_branch_bench.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch_bench.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2950813508 🤖: Benchmark completed Details ``` Comparing HEAD and feat_metadata-on-logical-literal Benchmark clickbench_extended.json -

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16195: URL: https://github.com/apache/datafusion/pull/16195#issuecomment-2950705663 🚀

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2950732437 BTW I think we should cut any release candidate off the `release-48` branch and main is now open for 49.0.0 features

Re: [PR] Extend benchmark comparison script with more detailed statistics [datafusion]

2025-06-06 Thread via GitHub
alamb merged PR #16262: URL: https://github.com/apache/datafusion/pull/16262

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
alamb commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2132891294 ## datafusion/expr/src/expr.rs: ## @@ -274,16 +275,16 @@ use sqlparser::ast::{ /// assert!(rewritten.transformed); /// // to 42 = 5 AND b = 6 /// assert_eq!(rewri

Re: [I] Support columns having the same alias [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #6543: URL: https://github.com/apache/datafusion/issues/6543#issuecomment-2950723461 > What is the best place to create this visitor? I am not 100% sure about an alias In general I think code that would create unique aliases would belong in the planner

[PR] [branch-48] Update CHANGELOG for latest 48.0.0 release [datafusion]

2025-06-06 Thread via GitHub
alamb opened a new pull request, #16314: URL: https://github.com/apache/datafusion/pull/16314 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/15771 ## Rationale for this change We added some more commits to the 48 r

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2950734976 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1013-gcp #13~24.04.1-Ubun

Re: [I] Support Glob Expressions for S3 [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #7393: URL: https://github.com/apache/datafusion/issues/7393#issuecomment-2950720194 > Looks good. Thanks LOL now I am just 🎣 for people to actually write the code ;)

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-06 Thread via GitHub
ozankabak commented on code in PR #16196: URL: https://github.com/apache/datafusion/pull/16196#discussion_r2132866332 ## datafusion/datasource/src/source.rs: ## @@ -179,12 +180,17 @@ pub trait DataSource: Send + Sync + Debug { /// the [`FileSource`] trait. /// /// [`FileSourc

Re: [PR] [MAJOR] Equivalence System Overhaul [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16217: URL: https://github.com/apache/datafusion/pull/16217#issuecomment-2950730918 > I removed the conflicts, this is in a good state now. Did we get to the 48 cut-off point yet? We are close I think

Re: [I] General framework to decorrelate the subqueries [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #5492: URL: https://github.com/apache/datafusion/issues/5492#issuecomment-2950729008 I am sorry I have not had a chance to review this. I will try and find time over the weekend, but sadly I have several other projects that are higher priority than subqueries to att

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-06 Thread via GitHub
alamb merged PR #16195: URL: https://github.com/apache/datafusion/pull/16195

Re: [PR] feat: Support defining custom MetricValues in PhysicalPlans [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16195: URL: https://github.com/apache/datafusion/pull/16195#issuecomment-2950705278 We have now made the release-48 branch so what is merged into main will be released as part of DataFusion 49.0.0

Re: [I] Support distribution as a MetricValue in ExecutionPlan [datafusion]

2025-06-06 Thread via GitHub
alamb closed issue #16044: Support distribution as a MetricValue in ExecutionPlan URL: https://github.com/apache/datafusion/issues/16044

Re: [PR] Extend benchmark comparison script with more detailed statistics [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16262: URL: https://github.com/apache/datafusion/pull/16262#issuecomment-2950701725 🚀 thank you @pepijnve @zhuqi-lucas and @2010YOUY01

[PR] Minor: Add upgrade guide for `Expr::WindowFunction` [datafusion]

2025-06-06 Thread via GitHub
alamb opened a new pull request, #16313: URL: https://github.com/apache/datafusion/pull/16313 ## Which issue does this PR close? - Part of https://github.com/apache/datafusion/issues/15771 - Related to https://github.com/apache/datafusion/pull/16207 ## Rationale for this chan

Re: [PR] chore: Ignore Spark SQL WholeStageCodegenSuite tests [datafusion-comet]

2025-06-06 Thread via GitHub
codecov-commenter commented on PR #1859: URL: https://github.com/apache/datafusion-comet/pull/1859#issuecomment-2950689342 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1859?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [D] Should ExecutionPlan spawn tasks in `execute` function [datafusion]

2025-06-06 Thread via GitHub
GitHub user timsaucer added a comment to the discussion: Should ExecutionPlan spawn tasks in `execute` function I have converted this discussion into this issue to track correction: https://github.com/apache/datafusion/issues/16312 GitHub link: https://github.com/apache/datafusion/discussion

[I] Enter tokio runtime during other FFI calls, such as execute [datafusion]

2025-06-06 Thread via GitHub
timsaucer opened a new issue, #16312: URL: https://github.com/apache/datafusion/issues/16312 Please see the discussion in the original post below. Some users wish to spawn tasks during calls like `execute` and others. For pure rust implementations without FFI this isn't a problem. How

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2950586335 Update: - @andygrove reverted the regression in the `release-48`: https://github.com/apache/datafusion/pull/16307 - I am pretty sure the upgrade works for Delta.rs: https:/

Re: [I] Spark Test fails `vectorized reader: missing all struct fields` [datafusion-comet]

2025-06-06 Thread via GitHub
parthchandra commented on issue #1843: URL: https://github.com/apache/datafusion-comet/issues/1843#issuecomment-2950544301 Sure. FWIW, I also think it is acceptable to document this as an incompatible result and leave it at that.

[PR] chore: Ignore Spark SQL WholeStageCodegenSuite tests [datafusion-comet]

2025-06-06 Thread via GitHub
andygrove opened a new pull request, #1859: URL: https://github.com/apache/datafusion-comet/pull/1859 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1852 ## Rationale for this change `WholeStageCodegenSuite` con

Re: [PR] fix: Update broadcast exchange logic to support reused exchanges [datafusion-comet]

2025-06-06 Thread via GitHub
codecov-commenter commented on PR #1858: URL: https://github.com/apache/datafusion-comet/pull/1858#issuecomment-2950470900 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1858?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Panic in `datafusion_expr::window_state::WindowAggState::update` [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #16308: URL: https://github.com/apache/datafusion/issues/16308#issuecomment-2950399925 We reverted the change in DF 48: - https://github.com/apache/datafusion/pull/16307 We can focus on fixing it for real for DataFusion 49.0.0 FYI @suibianwanwank would yo

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on PR #16170: URL: https://github.com/apache/datafusion/pull/16170#issuecomment-2950425160 @alamb @xudong963 I think that we can include this in the next DF 48 rc

Re: [I] Panic in `datafusion_expr::window_state::WindowAggState::update` [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #16308: URL: https://github.com/apache/datafusion/issues/16308#issuecomment-2950422006 I also added this ticket to the list of things we need to do on DataFusion 49 prior to release - https://github.com/apache/datafusion/issues/16235

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-06 Thread via GitHub
alamb commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2950411440 > I suggest we revert this PR for now and then add more tests based on the failing tests in Spark/Comet so that we can have more confidence when the PR is updated. Update:@andyg

Re: [PR] fix: [branch-48] Revert "Improve performance of constant aggregate window expression" [datafusion]

2025-06-06 Thread via GitHub
alamb merged PR #16307: URL: https://github.com/apache/datafusion/pull/16307

Re: [PR] fix: Remove `COMET_SHUFFLE_FALLBACK_TO_COLUMNAR` hack [datafusion-comet]

2025-06-06 Thread via GitHub
andygrove commented on PR #1736: URL: https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2950371179 Tests now pass.

Re: [I] Update or ignore tests in Spark SQL WholeStageCodegenSuite [datafusion-comet]

2025-06-06 Thread via GitHub
andygrove commented on issue #1852: URL: https://github.com/apache/datafusion-comet/issues/1852#issuecomment-2950335615 Comet does not support codegen, so these tests seem irrelevant. @kazuyukitanimura @parthchandra, is there any objection to adding `IgnoreComet` to these tests?

[PR] fix: Update broadcast exchange logic to support reused exchanges [datafusion-comet]

2025-06-06 Thread via GitHub
andygrove opened a new pull request, #1858: URL: https://github.com/apache/datafusion-comet/pull/1858 ## Which issue does this PR close? N/A ## Rationale for this change This fix was needed to fix some Spark SQL test failures in https://github.com/apache/

Re: [PR] chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate [datafusion-comet]

2025-06-06 Thread via GitHub
andygrove merged PR #1847: URL: https://github.com/apache/datafusion-comet/pull/1847

Re: [PR] chore: Update documentation and ignore Spark SQL tests for known issue with count distinct on NaN in aggregate [datafusion-comet]

2025-06-06 Thread via GitHub
andygrove commented on PR #1847: URL: https://github.com/apache/datafusion-comet/pull/1847#issuecomment-2950239412 Thanks for the review @parthchandra. I will go ahead and merge this and then re-enable the tests once we upgrade to DataFusion 48

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Duration [datafusion]

2025-06-06 Thread via GitHub
jkosh44 commented on issue #16285: URL: https://github.com/apache/datafusion/issues/16285#issuecomment-2950167607 Of course, yet another solution would be to add the Duration type to substrait, but they'd need to be interested in doing that.

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Duration [datafusion]

2025-06-06 Thread via GitHub
jkosh44 commented on issue #16285: URL: https://github.com/apache/datafusion/issues/16285#issuecomment-2950146627 The query that fails looks like ```sql create table foo (val int, ts1 timestamp, ts2 timestamp, i interval) ... SELECT val, ts1 - ts2 FROM foo ORDER BY ts2 - ts1; ```

Re: [PR] feat: Allow cancelling of grouping operations which are CPU bound [datafusion]

2025-06-06 Thread via GitHub
pepijnve commented on PR #16196: URL: https://github.com/apache/datafusion/pull/16196#issuecomment-2950129956 One performance aspect I've been looking at is the cost of yielding. There's no magic as far as I can tell. Returning a Pending simply leads to a full unwind of the call stack by vi

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Float16 [datafusion]

2025-06-06 Thread via GitHub
jatin510 commented on issue #16298: URL: https://github.com/apache/datafusion/issues/16298#issuecomment-2950117043 take

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2950061908 I still see test failures: ``` 2025-06-06T18:02:24.9752586Z - aggregate window function for all types *** FAILED *** (260 milliseconds) 2025-06-06T18:02:24.9755112Z 6

Re: [I] Intermittent failures in CI in `test_files/limit.slt` [datafusion]

2025-06-06 Thread via GitHub
andygrove closed issue #16180: Intermittent failures in CI in `test_files/limit.slt` URL: https://github.com/apache/datafusion/issues/16180 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Fix intermittent SQL logic test failure in limit.slt by adding ORDER BY clause [datafusion]

2025-06-06 Thread via GitHub
andygrove merged PR #16257: URL: https://github.com/apache/datafusion/pull/16257 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [PR] Improve DataFusion subcrate readme files [datafusion]

2025-06-06 Thread via GitHub
andygrove merged PR #16263: URL: https://github.com/apache/datafusion/pull/16263 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@data

Re: [I] Support `DESC ` statement [datafusion]

2025-06-06 Thread via GitHub
ajita-asthana commented on issue #16311: URL: https://github.com/apache/datafusion/issues/16311#issuecomment-2950009153 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] feat: add metadata to literal expressions [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on code in PR #16170: URL: https://github.com/apache/datafusion/pull/16170#discussion_r2132597100 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -342,15 +342,15 @@ mod test { ) .unwrap(); let snap = dynamic_fi

[I] Upgrade to spark-3.5.6 [datafusion-comet]

2025-06-06 Thread via GitHub
YanivKunda opened a new issue, #1857: URL: https://github.com/apache/datafusion-comet/issues/1857 ### What is the problem the feature request solves? Spark 3.5.6 has been released - it should be added to the tested versions and be the default 3.5.x version. ### Describe the po

Re: [PR] fix: [branch-48] Revert "Improve performance of constant aggregate window expression" [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on PR #16307: URL: https://github.com/apache/datafusion/pull/16307#issuecomment-2949922131 @alamb I have confirmed that reverting this change from branch-48 resolves the Comet issue -- This is an automated message from the Apache Git Service. To respond to the message,

Re: [I] [substrait] [sqllogictest] Unsupported cast type: Duration [datafusion]

2025-06-06 Thread via GitHub
jkosh44 commented on issue #16285: URL: https://github.com/apache/datafusion/issues/16285#issuecomment-2949920467 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To un

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2949909716 > @alamb I'm sorry for the issue that occurred. After a preliminary review, I suspect the cause might be: > > ``` > diff --git a/datafusion/physical-expr/src/window/window

Re: [I] Release DataFusion `48.0.0` (June 2025) [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on issue #15771: URL: https://github.com/apache/datafusion/issues/15771#issuecomment-2949900182 > Upgrading Comet to use rc2 causes tests to fail with a `attempt to subtract with overflow` panic. This did not happen with rc1. I have not debugged this yet to find the root

Re: [D] DISCUSSION: DataFusion Meetup in New York, NY, USA [datafusion]

2025-06-06 Thread via GitHub
GitHub user leoDYL added a comment to the discussion: DISCUSSION: DataFusion Meetup in New York, NY, USA That would be amazing! GitHub link: https://github.com/apache/datafusion/discussions/16265#discussioncomment-13392571 This is an automatically sent email for github@datafusion.apache

Re: [I] Panic in `datafusion_expr::window_state::WindowAggState::update` [datafusion]

2025-06-06 Thread via GitHub
andygrove commented on issue #16308: URL: https://github.com/apache/datafusion/issues/16308#issuecomment-2949893396 I did confirm that reverting https://github.com/apache/datafusion/pull/16234 fixes the issue -- This is an automated message from the Apache Git Service. To respond to the m

Re: [I] Panic in `datafusion_expr::window_state::WindowAggState::update` [datafusion]

2025-06-06 Thread via GitHub
alamb commented on issue #16308: URL: https://github.com/apache/datafusion/issues/16308#issuecomment-294988 Revert PR: - https://github.com/apache/datafusion/pull/16307 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] fix: Remove `COMET_SHUFFLE_FALLBACK_TO_COLUMNAR` hack [datafusion-comet]

2025-06-06 Thread via GitHub
andygrove commented on PR #1736: URL: https://github.com/apache/datafusion-comet/pull/1736#issuecomment-2949829487 The remaining failure is related to exchange reuse in TPC-DS q44. ``` 2025-06-06T00:48:02.3079684Z OUTPUT: TakeOrderedAndProject(limit=100, orderBy=[rnk#18761 ASC NUL

Re: [PR] Improve performance of constant aggregate window expression [datafusion]

2025-06-06 Thread via GitHub
suibianwanwank commented on PR #16234: URL: https://github.com/apache/datafusion/pull/16234#issuecomment-2949822560 @alamb I'm sorry for the issue that occurred. After a preliminary review, I suspect the cause might be: ``` diff --git a/datafusion/physical-expr/src/window/window_expr.r

Re: [PR] chore(deps): bump sqllogictest from 0.28.2 to 0.28.3 [datafusion]

2025-06-06 Thread via GitHub
comphead merged PR #16286: URL: https://github.com/apache/datafusion/pull/16286 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf
