[PR] Support Type Coercion for NULL in Binary Arithmetic Expressions [datafusion]

2025-07-12 Thread via GitHub
kosiew opened a new pull request, #16761: URL: https://github.com/apache/datafusion/pull/16761 ## Which issue does this PR close? - Closes #16760 ## Rationale for this change Currently, binary arithmetic expressions involving `NULL` (e.g., `NULL - DATE '1984-02-28'`) fai

Re: [I] Bloom filters are unused for certain where clause patterns (improve LiteralGuarantee) [datafusion]

2025-07-12 Thread via GitHub
haohuaijin commented on issue #16697: URL: https://github.com/apache/datafusion/issues/16697#issuecomment-3066576864 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[I] Arithmetic expression on `Date` type with `Null` returns planning error (SQLancer) [datafusion]

2025-07-12 Thread via GitHub
2010YOUY01 opened a new issue, #16760: URL: https://github.com/apache/datafusion/issues/16760 ### Describe the bug datafusion-cli is compiled from the latest main commit 4dd78255f ### To Reproduce ```sh DataFusion CLI v48.0.0 > SELECT NULL - DATE '1984-02-28';

Re: [I] Enable comments on datafusion-site via giscus [datafusion-site]

2025-07-12 Thread via GitHub
kevinjqliu commented on issue #80: URL: https://github.com/apache/datafusion-site/issues/80#issuecomment-3066142358 I think this can only be done by the apache org admin, according to [github docs](https://docs.github.com/en/apps/using-github-apps/installing-a-github-app-from-github-marketpl

Re: [PR] feat(datafusion-proto): allow TableSource to be serialized [datafusion]

2025-07-12 Thread via GitHub
colinmarc commented on PR #16750: URL: https://github.com/apache/datafusion/pull/16750#issuecomment-3066014621 Somewhat unrelated, but maybe something like the following would be cleaner (untested)? ```rust trait TableProvider: TableSource { // only scan and get_table_defin

Re: [PR] feat(datafusion-proto): allow TableSource to be serialized [datafusion]

2025-07-12 Thread via GitHub
colinmarc commented on code in PR #16750: URL: https://github.com/apache/datafusion/pull/16750#discussion_r2202889418 ## datafusion/catalog/src/default_table_source.rs: ## @@ -45,6 +45,26 @@ impl DefaultTableSource { pub fn new(table_provider: Arc) -> Self { Self {

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-12 Thread via GitHub
Omega359 commented on code in PR #16644: URL: https://github.com/apache/datafusion/pull/16644#discussion_r2202895589 ## datafusion-cli/CONTRIBUTING.md: ## @@ -29,47 +29,15 @@ cargo test ## Running Storage Integration Tests -By default, storage integration tests are not run.

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-12 Thread via GitHub
Omega359 commented on PR #16644: URL: https://github.com/apache/datafusion/pull/16644#issuecomment-3066007049 > @Omega359 did you ever figure out the problem on your machine? Given that this won't affect anyone unless they are running with `INTEGRATION_TESTS` I think we can merge it and per

Re: [I] Blog post for the DataFusion 49 release [datafusion]

2025-07-12 Thread via GitHub
alamb commented on issue #16758: URL: https://github.com/apache/datafusion/issues/16758#issuecomment-3066004900 I am also standing by to help with this one (given how much stuff is coming out in 49!)

Re: [I] Blog post for the DataFusion 49 release [datafusion]

2025-07-12 Thread via GitHub
Omega359 commented on issue #16758: URL: https://github.com/apache/datafusion/issues/16758#issuecomment-3066003385 take

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-12 Thread via GitHub
colinmarc commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3065994600 > I have thought of two approaches... Either of these seems plausible to me! It still seems like a shame that the `LogicalExtensionCodec` has to be written with `F

[PR] fix: return NULL if any of the param to make_date is NULL [datafusion]

2025-07-12 Thread via GitHub
feniljain opened a new pull request, #16759: URL: https://github.com/apache/datafusion/pull/16759 ## Which issue does this PR close? - Closes #16746 ## Rationale for this change Make `make_date` behavior consistent with DuckDB and Postgres ## What changes are included in

Re: [PR] chore: Add benchmarking scripts [datafusion-comet]

2025-07-12 Thread via GitHub
andygrove commented on PR #2025: URL: https://github.com/apache/datafusion-comet/pull/2025#issuecomment-3065967125 Thanks for the review @comphead

Re: [PR] chore: Add benchmarking scripts [datafusion-comet]

2025-07-12 Thread via GitHub
andygrove merged PR #2025: URL: https://github.com/apache/datafusion-comet/pull/2025

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-07-12 Thread via GitHub
codecov-commenter commented on PR #2026: URL: https://github.com/apache/datafusion-comet/pull/2026#issuecomment-3065965553 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2026?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Add benchmarking scripts [datafusion-comet]

2025-07-12 Thread via GitHub
comphead commented on code in PR #2025: URL: https://github.com/apache/datafusion-comet/pull/2025#discussion_r2202869321 ## README.md: ## @@ -48,7 +48,7 @@ The following chart shows the time it takes to run the 22 TPC-H queries against using a single executor with 8 cores. See

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-07-12 Thread via GitHub
petern48 commented on code in PR #2026: URL: https://github.com/apache/datafusion-comet/pull/2026#discussion_r2202868497 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1790,92 +1793,6 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Chore: Refactor QueryPlanSerde, move math exprs in separate file [datafusion-comet]

2025-07-12 Thread via GitHub
codecov-commenter commented on PR #2027: URL: https://github.com/apache/datafusion-comet/pull/2027#issuecomment-3065957871 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2027?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat: add CopyExec and move CopyExec handling to Spark [datafusion-comet]

2025-07-12 Thread via GitHub
dharanad commented on PR #2001: URL: https://github.com/apache/datafusion-comet/pull/2001#issuecomment-3065953457 @mbutrovich I would appreciate it if you could take a look at this PR and provide some early feedback on whether I'm headed in the right direction

Re: [PR] feat: Upgrade to the official DataFusion 49.0.0 release [datafusion-comet]

2025-07-12 Thread via GitHub
dharanad commented on PR #1997: URL: https://github.com/apache/datafusion-comet/pull/1997#issuecomment-3065949501 Will resume the work on this issue tomorrow, my time.

Re: [I] Bug: `make_date(year, month, day)` reports error if one of the fileds is NULL [datafusion]

2025-07-12 Thread via GitHub
feniljain commented on issue #16746: URL: https://github.com/apache/datafusion/issues/16746#issuecomment-3065936211 take

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-07-12 Thread via GitHub
andygrove commented on code in PR #2026: URL: https://github.com/apache/datafusion-comet/pull/2026#discussion_r2202858577 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -1790,92 +1793,6 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Chore: Refactor QueryPlanSerde, move math exprs in separate file [datafusion-comet]

2025-07-12 Thread via GitHub
andygrove commented on PR #2027: URL: https://github.com/apache/datafusion-comet/pull/2027#issuecomment-3065931575 There is some overlap between this PR and https://github.com/apache/datafusion-comet/pull/2018, but I can rebase https://github.com/apache/datafusion-comet/pull/2018 on this P

Re: [I] Serialize user defined functions and table providers via protobuf [datafusion-python]

2025-07-12 Thread via GitHub
timsaucer commented on issue #1181: URL: https://github.com/apache/datafusion-python/issues/1181#issuecomment-3065925816 I've been spending some time today looking at this. I'm trying to capture the exact use case and how it might work for the implementation. Suppose we want to create a

[PR] Chore: Refactor QueryPlanSerde, move math exprs in separate file [datafusion-comet]

2025-07-12 Thread via GitHub
kazantsev-maksim opened a new pull request, #2027: URL: https://github.com/apache/datafusion-comet/pull/2027 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/2019 Closes #. ## Rationale for this change See https://github.

[PR] refactor: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-07-12 Thread via GitHub
petern48 opened a new pull request, #2026: URL: https://github.com/apache/datafusion-comet/pull/2026 ## Which issue does this PR close? Part of #2019 ## Rationale for this change See https://github.com/apache/datafusion-comet/issues/2019 ## What changes are

Re: [PR] ensure MemTable has at least one partition [datafusion]

2025-07-12 Thread via GitHub
waynexia merged PR #16754: URL: https://github.com/apache/datafusion/pull/16754

Re: [PR] ensure MemTable has at least one partition [datafusion]

2025-07-12 Thread via GitHub
waynexia commented on PR #16754: URL: https://github.com/apache/datafusion/pull/16754#issuecomment-3065887081 Thanks for your review ❤️

Re: [PR] chore: Add benchmarking scripts [datafusion-comet]

2025-07-12 Thread via GitHub
codecov-commenter commented on PR #2025: URL: https://github.com/apache/datafusion-comet/pull/2025#issuecomment-3065847027 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2025?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Fix `next_up` and `next_down` behavior for zero float values [datafusion]

2025-07-12 Thread via GitHub
berkaysynnada commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3065846343 > Thank you for pointing that out, @berkaysynnada and @ozankabak! The fix in `next_up` and `next_down` makes more sense. > > As for the `total_cmp` changes, that was orig

[PR] chore: Add benchmarking scripts [datafusion-comet]

2025-07-12 Thread via GitHub
andygrove opened a new pull request, #2025: URL: https://github.com/apache/datafusion-comet/pull/2025 ## Which issue does this PR close? N/A ## Rationale for this change Add benchmarking scripts for transparency. ## What changes are included in this

Re: [PR] Fix invalid intervals in `satisfy_greater` [datafusion]

2025-07-12 Thread via GitHub
liamzwbao commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3065760187 Thank you for pointing that out, @berkaysynnada and @ozankabak! The fix in `next_up` and `next_down` makes more sense. As for the `total_cmp` changes, that was originally rai

Re: [PR] Fix invalid intervals in `satisfy_greater` [datafusion]

2025-07-12 Thread via GitHub
berkaysynnada commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3065737207 BTW, maybe we should modify how floating point ScalarValue's are compared (https://github.com/apache/datafusion/blob/ce3f62a6b2b3a17ba562c72276139174a8da9f50/datafusion/common/s

Re: [PR] feat: Optimize `collect_left_input` processing [datafusion]

2025-07-12 Thread via GitHub
jonathanc-n commented on PR #16727: URL: https://github.com/apache/datafusion/pull/16727#issuecomment-3065643465 Maybe we should do this again when tests are passing after #16716 is merged

Re: [D] How does Comet compare to Gluten? Are there any plans to integrate with Gluten? [datafusion-comet]

2025-07-12 Thread via GitHub
GitHub user andygrove closed a discussion: How does Comet compare to Gluten? Are there any plans to integrate with Gluten? Comet and Gluten have similar architectures: - They both replace Spark's physical plan with a custom plan - They both use an Intermediate Representation to encode the plan

Re: [PR] 48.0.1 [datafusion]

2025-07-12 Thread via GitHub
xudong963 commented on PR #16755: URL: https://github.com/apache/datafusion/pull/16755#issuecomment-3065568234 mistakes?

Re: [PR] 48.0.1 [datafusion]

2025-07-12 Thread via GitHub
xudong963 commented on PR #16755: URL: https://github.com/apache/datafusion/pull/16755#issuecomment-3065568225 mistakes?

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-12 Thread via GitHub
xudong963 commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3065563872 @alamb Yeah, let's do it (time flies

[PR] Update CI rules [datafusion-python]

2025-07-12 Thread via GitHub
timsaucer opened a new pull request, #1188: URL: https://github.com/apache/datafusion-python/pull/1188 # Which issue does this PR close? None # Rationale for this change This PR copies the configuration of the `arrow-rs` repository. It does not require a committer to si

Re: [PR] build(deps): bump tokio from 1.45.0 to 1.46.1 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] closed pull request #1180: build(deps): bump tokio from 1.45.0 to 1.46.1 URL: https://github.com/apache/datafusion-python/pull/1180

Re: [PR] build(deps): bump arrow from 55.1.0 to 55.2.0 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] closed pull request #1174: build(deps): bump arrow from 55.1.0 to 55.2.0 URL: https://github.com/apache/datafusion-python/pull/1174

Re: [PR] build(deps): bump object_store from 0.12.1 to 0.12.2 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] commented on PR #1149: URL: https://github.com/apache/datafusion-python/pull/1149#issuecomment-3065252160 Looks like object_store is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump tokio from 1.45.0 to 1.46.1 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] commented on PR #1180: URL: https://github.com/apache/datafusion-python/pull/1180#issuecomment-3065252213 Looks like tokio is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump arrow from 55.1.0 to 55.2.0 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] commented on PR #1174: URL: https://github.com/apache/datafusion-python/pull/1174#issuecomment-3065252136 Looks like arrow is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump object_store from 0.12.1 to 0.12.2 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] closed pull request #1149: build(deps): bump object_store from 0.12.1 to 0.12.2 URL: https://github.com/apache/datafusion-python/pull/1149

Re: [PR] build(deps): bump mimalloc from 0.1.46 to 0.1.47 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] closed pull request #1165: build(deps): bump mimalloc from 0.1.46 to 0.1.47 URL: https://github.com/apache/datafusion-python/pull/1165

Re: [PR] build(deps): bump mimalloc from 0.1.46 to 0.1.47 [datafusion-python]

2025-07-12 Thread via GitHub
dependabot[bot] commented on PR #1165: URL: https://github.com/apache/datafusion-python/pull/1165#issuecomment-3065252125 Looks like mimalloc is up-to-date now, so this is no longer needed.

Re: [PR] 48.0.0 Release [datafusion-python]

2025-07-12 Thread via GitHub
timsaucer merged PR #1175: URL: https://github.com/apache/datafusion-python/pull/1175

Re: [I] Blog post for the DataFusion 47, 48, and 49 releases [datafusion]

2025-07-12 Thread via GitHub
alamb commented on issue #16347: URL: https://github.com/apache/datafusion/issues/16347#issuecomment-3065185292 Since we ended up going with a post per release, I broke out the other 48 and 49 releases into their own tickets for tracking - https://github.com/apache/datafusion/issues/16757

Re: [I] Blog post for the DataFusion 47, 48, and 49 releases [datafusion]

2025-07-12 Thread via GitHub
alamb closed issue #16347: Blog post for the DataFusion 47, 48, and 49 releases URL: https://github.com/apache/datafusion/issues/16347

Re: [I] Blog post for the DataFusion 47, 48, and 49 releases [datafusion]

2025-07-12 Thread via GitHub
alamb commented on issue #16347: URL: https://github.com/apache/datafusion/issues/16347#issuecomment-3065185462 Thank you very much @Omega359

[I] Blog post for the DataFusion 49 release [datafusion]

2025-07-12 Thread via GitHub
alamb opened a new issue, #16758: URL: https://github.com/apache/datafusion/issues/16758 ### Is your feature request related to a problem or challenge? - Part of https://github.com/apache/datafusion/issues/14836 - Broken out from https://github.com/apache/datafusion/issues/16347 (wh

[I] Blog post for the DataFusion 48 releases [datafusion]

2025-07-12 Thread via GitHub
alamb opened a new issue, #16757: URL: https://github.com/apache/datafusion/issues/16757 ### Is your feature request related to a problem or challenge? - Part of https://github.com/apache/datafusion/issues/14836 - Broken out from https://github.com/apache/datafusion/issues/16347 (wh

Re: [PR] Fix in list round trip in df proto [datafusion]

2025-07-12 Thread via GitHub
alamb merged PR #16744: URL: https://github.com/apache/datafusion/pull/16744

Re: [I] TPC-H Q16 fails during deserialization [datafusion]

2025-07-12 Thread via GitHub
alamb closed issue #16665: TPC-H Q16 fails during deserialization URL: https://github.com/apache/datafusion/issues/16665

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16709: URL: https://github.com/apache/datafusion/pull/16709#issuecomment-3065147959 13% faster -- not bad!

Re: [PR] POC: Test DataFusion with experimental Parquet Filter Pushdown (try 4) [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16711: URL: https://github.com/apache/datafusion/pull/16711#issuecomment-3065140418 > > Thus I think we should proceed trying to get [apache/arrow-rs#7850](https://github.com/apache/arrow-rs/pull/7850) merged. > > Great! I plan to take another look in a few days

Re: [PR] feat(datafusion-proto): allow TableSource to be serialized [datafusion]

2025-07-12 Thread via GitHub
alamb commented on code in PR #16750: URL: https://github.com/apache/datafusion/pull/16750#discussion_r2202549466 ## datafusion/catalog/src/default_table_source.rs: ## @@ -45,6 +45,26 @@ impl DefaultTableSource { pub fn new(table_provider: Arc) -> Self { Self { tab

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
blaginin commented on PR #16709: URL: https://github.com/apache/datafusion/pull/16709#issuecomment-3065139181 before https://github.com/apache/datafusion/actions/runs/16237392836/attempts/1 after https://github.com/apache/datafusion/actions/runs/16237392836 seems t

[I] [BLOG] Blog post about writing your own SQL dialect / extending SQL with DataFusion [datafusion]

2025-07-12 Thread via GitHub
alamb opened a new issue, #16756: URL: https://github.com/apache/datafusion/issues/16756 ### Is your feature request related to a problem or challenge? https://github.com/apache/datafusion/issues/14836 @pepijnve [asked in discord](https://discord.com/channels/885562378132000778

Re: [I] Blog Post for Accelerating Query Processing with Specialized Indexes [datafusion]

2025-07-12 Thread via GitHub
alamb commented on issue #16372: URL: https://github.com/apache/datafusion/issues/16372#issuecomment-3065091603 We plan to publish this July 14

Re: [I] Release DataFusion `49.0.0` (July 2025) [datafusion]

2025-07-12 Thread via GitHub
alamb commented on issue #16235: URL: https://github.com/apache/datafusion/issues/16235#issuecomment-3065090399 I am thinking we should begin the 49.0.0 release next week. I'll publicize to the lists / etc

Re: [PR] feat: Add a configuration to make parquet encryption optional [datafusion]

2025-07-12 Thread via GitHub
alamb commented on code in PR #16649: URL: https://github.com/apache/datafusion/pull/16649#discussion_r2202528175 ## datafusion/datasource-parquet/src/file_format.rs: ## @@ -959,14 +953,18 @@ pub async fn fetch_parquet_metadata( store: &dyn ObjectStore, meta: &ObjectMe

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
alamb merged PR #16709: URL: https://github.com/apache/datafusion/pull/16709

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16709: URL: https://github.com/apache/datafusion/pull/16709#issuecomment-3065073253 Well, let's merge it and see how it works in action and iterate !

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-12 Thread via GitHub
alamb commented on code in PR #16644: URL: https://github.com/apache/datafusion/pull/16644#discussion_r2202509446 ## datafusion-cli/tests/cli_integration.rs: ## @@ -35,6 +40,67 @@ fn make_settings() -> Settings { settings } +async fn setup_minio_container() -> ContainerA

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
blaginin commented on code in PR #16709: URL: https://github.com/apache/datafusion/pull/16709#discussion_r2202511856 ## .github/actions/setup-macos-aarch64-builder/action.yaml: ## @@ -45,5 +45,7 @@ runs: rustup component add rustfmt - name: Setup rust cache

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
blaginin commented on code in PR #16709: URL: https://github.com/apache/datafusion/pull/16709#discussion_r2202511007 ## .github/actions/setup-macos-aarch64-builder/action.yaml: ## @@ -45,5 +45,7 @@ runs: rustup component add rustfmt - name: Setup rust cache

Re: [I] Update Fuzz tests to include Dict with null values [datafusion]

2025-07-12 Thread via GitHub
kosiew closed issue #16266: Update Fuzz tests to include Dict with null values URL: https://github.com/apache/datafusion/issues/16266

Re: [PR] Improve dictionary null handling in hashing and expand aggregate test coverage for nulls [datafusion]

2025-07-12 Thread via GitHub
kosiew merged PR #16466: URL: https://github.com/apache/datafusion/pull/16466

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
blaginin commented on PR #16709: URL: https://github.com/apache/datafusion/pull/16709#issuecomment-3065061371 > I did look at the timings for the CI jobs on this PR and on main and they didn't seem all that different. However since most of this caching is happening on main I suspect we'll s

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16709: URL: https://github.com/apache/datafusion/pull/16709#issuecomment-3065058878 https://github.com/apache/datafusion/actions/runs/16207285209/job/45760367631?pr=16709 Seems to still be 5 minutes: https://github.com/user-attachments/assets/478cef15-5e66-4965-9

Re: [PR] Improve Ci cache [datafusion]

2025-07-12 Thread via GitHub
alamb commented on code in PR #16709: URL: https://github.com/apache/datafusion/pull/16709#discussion_r2202500711 ## .github/actions/setup-macos-aarch64-builder/action.yaml: ## @@ -45,5 +45,7 @@ runs: rustup component add rustfmt - name: Setup rust cache use

Re: [PR] Auto start testcontainers for `datafusion-cli` [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16644: URL: https://github.com/apache/datafusion/pull/16644#issuecomment-3065047688 This looks great -- I am testing it out now

Re: [PR] feat: Optimize `collect_left_input` processing [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16727: URL: https://github.com/apache/datafusion/pull/16727#issuecomment-3065047251 🤖: Benchmark completed Details ``` Comparing HEAD and optimize-hj Benchmark tpch_mem_sf10.json ┏━━━

Re: [PR] feat: Optimize `collect_left_input` processing [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16727: URL: https://github.com/apache/datafusion/pull/16727#issuecomment-3065047230 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] feat: Optimize `collect_left_input` processing [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16727: URL: https://github.com/apache/datafusion/pull/16727#issuecomment-3065047209 🤖: Benchmark completed Details ``` Comparing HEAD and optimize-hj Benchmark clickbench_extended.json ┏━

Re: [PR] Fix invalid intervals in `satisfy_greater` [datafusion]

2025-07-12 Thread via GitHub
berkaysynnada commented on PR #16745: URL: https://github.com/apache/datafusion/pull/16745#issuecomment-3065030622 Currently, when computing the next representable float from +0.0 or -0.0, the behavior incorrectly skips directly to the smallest subnormal (±ε) instead of transitioning betwee
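
The zero-handling described in the comment above can be illustrated with a small sketch. This is not DataFusion's code: both function names are hypothetical, and because the comment is truncated, the second variant only assumes that the intended transition is from -0.0 to +0.0. The first function follows the IEEE 754 `nextUp` convention, in which either zero steps straight to the smallest positive subnormal; the second treats -0.0 and +0.0 as distinct neighbours, as they are under `f64::total_cmp`.

```rust
/// Hypothetical helper: IEEE 754 style nextUp.
/// +0.0 and -0.0 are treated as the same value, so both step
/// directly to the smallest positive subnormal.
fn next_up_ieee(x: f64) -> f64 {
    if x.is_nan() || x == f64::INFINITY {
        return x;
    }
    if x == 0.0 {
        // Matches both +0.0 and -0.0.
        return f64::from_bits(1);
    }
    let bits = x.to_bits();
    if x > 0.0 {
        f64::from_bits(bits + 1) // larger magnitude, larger value
    } else {
        f64::from_bits(bits - 1) // smaller magnitude, larger value
    }
}

/// Hypothetical helper: successor under a total order in which
/// -0.0 sorts immediately before +0.0 (an assumption, see lead-in).
fn next_up_total_order(x: f64) -> f64 {
    if x.is_nan() || x == f64::INFINITY {
        return x;
    }
    let bits = x.to_bits();
    if bits == (-0.0f64).to_bits() {
        // -0.0 steps to +0.0 rather than skipping it.
        return 0.0;
    }
    if x.is_sign_positive() {
        f64::from_bits(bits + 1)
    } else {
        f64::from_bits(bits - 1)
    }
}
```

The two functions differ only at -0.0: `next_up_ieee(-0.0)` yields the smallest positive subnormal, while `next_up_total_order(-0.0)` yields +0.0. Which convention suits interval arithmetic depends on how the surrounding code compares floats.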

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-12 Thread via GitHub
UBarney commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202485385 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -828,15 +845,125 @@ impl NestedLoopJoinStream { let poll = handle_state!(self

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-12 Thread via GitHub
UBarney commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202485179 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -705,8 +696,29 @@ impl NestedLoopJoinStreamState { } } +/// Tracks incremental output of j

Re: [PR] feat: Optimize `collect_left_input` processing [datafusion]

2025-07-12 Thread via GitHub
alamb commented on PR #16727: URL: https://github.com/apache/datafusion/pull/16727#issuecomment-3065018540 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1016-gcp #16~24.04.1-Ubun

Re: [PR] limit intermediate batch size in nested_loop_join [datafusion]

2025-07-12 Thread via GitHub
UBarney commented on code in PR #16443: URL: https://github.com/apache/datafusion/pull/16443#discussion_r2202340283 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -828,13 +833,127 @@ impl NestedLoopJoinStream { handle_state!(self.process_pr

Re: [PR] ensure MemTable has at least one partition [datafusion]

2025-07-12 Thread via GitHub
2010YOUY01 commented on code in PR #16754: URL: https://github.com/apache/datafusion/pull/16754#discussion_r2202414020 ## datafusion/catalog/src/memory/table.rs: ## @@ -69,6 +69,10 @@ pub struct MemTable { impl MemTable { /// Create a new in-memory table from the provided