[PR] perf: Respect Spark's PARQUET_FILTER_PUSHDOWN_ENABLED config [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove opened a new pull request, #1619: URL: https://github.com/apache/datafusion-comet/pull/1619 ## Which issue does this PR close? N/A ## Rationale for this change My primary motivation was to be able to run benchmarks with the new scans with and wi

Re: [I] Benchmark / program to test Spilling Sorts [datafusion]

2025-04-09 Thread via GitHub
ding-young commented on issue #15664: URL: https://github.com/apache/datafusion/issues/15664#issuecomment-2791674000 I think we can refer to the existing microbenchmark for external aggregation ([external_aggr.rs](https://github.com/apache/datafusion/blob/main/benchmarks/src/bin/external_agg

Re: [PR] (WIP) Upgrade to arrow/parquet 55 [datafusion]

2025-04-09 Thread via GitHub
Dandandan commented on PR #15466: URL: https://github.com/apache/datafusion/pull/15466#issuecomment-2791658055 I wonder if `alamb_test_upgrade_54` has the latest version of 54? Some performance improvements happened there as well (e.g. https://github.com/apache/arrow-rs/pull/7195/files shou

Re: [PR] Fix tokenization of qualified identifiers with numeric prefix. [datafusion-sqlparser-rs]

2025-04-09 Thread via GitHub
iffyio commented on code in PR #1803: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1803#discussion_r2036578066 ## src/tokenizer.rs: ## @@ -895,7 +895,7 @@ impl<'a> Tokenizer<'a> { }; let mut location = state.location(); -while let Some

Re: [PR] Add support for MySQL's STRAIGHT_JOIN join operator. [datafusion-sqlparser-rs]

2025-04-09 Thread via GitHub
iffyio commented on code in PR #1802: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1802#discussion_r2036520283 ## src/ast/query.rs: ## @@ -2197,6 +2200,8 @@ pub enum JoinOperator { match_condition: Expr, constraint: JoinConstraint, }, +

Re: [PR] Support for projection item prefix operator (CONNECT_BY_ROOT) [datafusion-sqlparser-rs]

2025-04-09 Thread via GitHub
iffyio commented on PR #1780: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1780#issuecomment-2791591778 Marking as draft in the meantime as this PR is no longer pending review. @tomershaniii please feel free to undraft and ping when ready! -- This is an automated message f

Re: [PR] Add support for 'IN ' [datafusion-sqlparser-rs]

2025-04-09 Thread via GitHub
iffyio commented on PR #1793: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1793#issuecomment-2791589818 Marking as draft in the meantime as this is no longer pending review, @adamchainz please feel free to undraft and ping when ready! -- This is an automated message from t

Re: [PR] add support to nested join_without parentheses snowflake [datafusion-sqlparser-rs]

2025-04-09 Thread via GitHub
iffyio commented on code in PR #1799: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1799#discussion_r2036518287 ## src/parser/mod.rs: ## @@ -11823,7 +11828,16 @@ impl<'a> Parser<'a> { } _ => break, }; -

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-09 Thread via GitHub
2010YOUY01 commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2791568836 I tried a simple benchmark: 1. Under `datafusion/datafusion-cli`, compile and run with 100M memory limit `cargo run --profile release-nonlto -- --mem-pool-type fair -m 100

Re: [PR] Allow literal backslash escapes for string literals in Redshift dialect. [datafusion-sqlparser-rs]

2025-04-09 Thread via GitHub
iffyio merged PR #1801: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1801 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr

[I] Benchmark / program to test Spilling Joins [datafusion]

2025-04-09 Thread via GitHub
alamb opened a new issue, #15664: URL: https://github.com/apache/datafusion/issues/15664 ### Is your feature request related to a problem or challenge? - Part of https://github.com/apache/datafusion/issues/15271 There are many interesting ideas on how to improve DataFusion whil

Re: [I] Fix PREPARE statement tests [datafusion]

2025-04-09 Thread via GitHub
brayanjuls commented on issue #15577: URL: https://github.com/apache/datafusion/issues/15577#issuecomment-2791527017 I was investigating this issue and it seems those tests are supposed to be for PREPARE statements, as the goal when they were implemented was to test infer types on prepare state

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-09 Thread via GitHub
2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2791514957 > And some other thoughts: > > 1. This is a pretty complicated program, maybe we should write some unit tests to make sure it doesn't break for future modifications?

[PR] feat: support min/max for struct [datafusion]

2025-04-09 Thread via GitHub
chenkovsky opened a new pull request, #15667: URL: https://github.com/apache/datafusion/pull/15667 ## Which issue does this PR close? - Closes #15666. ## Rationale for this change datafusion doesn't support min/max for struct. ## What changes are included in this P

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-09 Thread via GitHub
2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2791500223 Thank you all for the review! @qstommyshu I agree with the implementation-level feedback. I will address it in the refactor. @alamb Regarding parallel merging: I w

Re: [I] min/max aggregation function for struct [datafusion]

2025-04-09 Thread via GitHub
chenkovsky commented on issue #15666: URL: https://github.com/apache/datafusion/issues/15666#issuecomment-2791497910 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] feat: support inner iejoin [datafusion]

2025-04-09 Thread via GitHub
github-actions[bot] commented on PR #12754: URL: https://github.com/apache/datafusion/pull/12754#issuecomment-2791365482 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix: recursion protection for physical plan node [datafusion]

2025-04-09 Thread via GitHub
xudong963 commented on PR #15600: URL: https://github.com/apache/datafusion/pull/15600#issuecomment-2786981726 > I move out the function and reduce the "stack size" the stack overflow is gone. `sql_array_literal` is another one example too. `try_from_physical_plan` has the large function bo

Re: [PR] fix: union all by name [datafusion]

2025-04-09 Thread via GitHub
Omega359 commented on PR #15603: URL: https://github.com/apache/datafusion/pull/15603#issuecomment-2790862097 Thanks for looking into the nullable issue, it's been on my plate for a bit to look into some more. It's really the last blocker I know of for union by name to work correctly.

Re: [PR] perf: Add new Comet PARQUET_FILTER_PUSHDOWN_ENABLED config [datafusion-comet]

2025-04-09 Thread via GitHub
codecov-commenter commented on PR #1619: URL: https://github.com/apache/datafusion-comet/pull/1619#issuecomment-2784433520 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1619?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2025-04-09 Thread via GitHub
github-actions[bot] closed pull request #12523: Add hook for sharing join state in distributed execution URL: https://github.com/apache/datafusion/pull/12523 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [PR] feat: Add more testы for nested types combinations for `native_datafusion` [datafusion-comet]

2025-04-09 Thread via GitHub
comphead commented on code in PR #1632: URL: https://github.com/apache/datafusion-comet/pull/1632#discussion_r2036308811 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -143,4 +143,96 @@ class CometNativeReaderSuite extends CometTestBase with A

[PR] feat: Add more test for nested types combinations [datafusion-comet]

2025-04-09 Thread via GitHub
comphead opened a new pull request, #1632: URL: https://github.com/apache/datafusion-comet/pull/1632 ## Which issue does this PR close? Related to #1595 . ## Rationale for this change Adding unit tests for more nested types combinations ## What changes are

[I] GlobalLimitExec execution offset pagination query results in internal error [datafusion]

2025-04-09 Thread via GitHub
lalaorya opened a new issue, #15665: URL: https://github.com/apache/datafusion/issues/15665 ### Describe the bug When using the LIMIT clause, simple `LIMIT N` syntax (such as `LIMIT 10`) works normally, but when using the syntax with an offset (such as `LIMIT 10,20`), it fails and re

Re: [PR] chore: refactor v2 scan conversion [datafusion-comet]

2025-04-09 Thread via GitHub
comphead commented on code in PR #1621: URL: https://github.com/apache/datafusion-comet/pull/1621#discussion_r2036316165 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -236,16 +171,70 @@ class CometSparkSessionExtensions CometScanE

Re: [PR] feat: Add more test for nested types combinations [datafusion-comet]

2025-04-09 Thread via GitHub
comphead commented on code in PR #1632: URL: https://github.com/apache/datafusion-comet/pull/1632#discussion_r2036308811 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -143,4 +143,96 @@ class CometNativeReaderSuite extends CometTestBase with A

Re: [I] Read STRUCT of MAP fields` with datafusion reader fails with schema issue [datafusion-comet]

2025-04-09 Thread via GitHub
comphead commented on issue #1633: URL: https://github.com/apache/datafusion-comet/issues/1633#issuecomment-2791245930 To reproduce ``` test("native reader - read STRUCT of MAP fields") { testSingleLineQuery( """ |select named_struct('m0', map('a', 1))

Re: [PR] Fix: after repartitioning, the `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed [datafusion]

2025-04-09 Thread via GitHub
xudong963 commented on code in PR #15539: URL: https://github.com/apache/datafusion/pull/15539#discussion_r2036278558 ## datafusion/datasource/src/statistics.rs: ## @@ -410,23 +410,24 @@ pub async fn get_statistics_with_limit( } /// Generic function to compute statistics acr

Re: [PR] feat: Add more test for nested types combinations [datafusion-comet]

2025-04-09 Thread via GitHub
comphead commented on code in PR #1632: URL: https://github.com/apache/datafusion-comet/pull/1632#discussion_r2036308245 ## spark/src/test/scala/org/apache/comet/exec/CometNativeReaderSuite.scala: ## @@ -143,4 +143,96 @@ class CometNativeReaderSuite extends CometTestBase with A

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-09 Thread via GitHub
andygrove commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2791138548 I created a PR in Comet to use DF from this PR - https://github.com/apache/datafusion-comet/pull/1629 I did not have time to run benchmarks today but hope to tomorrow -- T

Re: [PR] [BLOG] tpchgen-rs: World’s fastest open source TPCH data generator, written in Rust [datafusion-site]

2025-04-09 Thread via GitHub
alamb commented on code in PR #67: URL: https://github.com/apache/datafusion-site/pull/67#discussion_r2036245247 ## content/blog/2025-04-10-fastest-tpch-generator.md: ## @@ -0,0 +1,613 @@ +--- +layout: post +title: tpchgen-rs World’s fastest open source TPC-H data generator, wri

Re: [PR] chore: refactor v2 scan conversion [datafusion-comet]

2025-04-09 Thread via GitHub
parthchandra commented on code in PR #1621: URL: https://github.com/apache/datafusion-comet/pull/1621#discussion_r2036258826 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -246,6 +183,65 @@ class CometSparkSessionExtensions } } + pri

Re: [PR] chore: refactor v2 scan conversion [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove commented on code in PR #1621: URL: https://github.com/apache/datafusion-comet/pull/1621#discussion_r2036189365 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -246,6 +183,65 @@ class CometSparkSessionExtensions } } + privat

Re: [PR] perf: Introduce sort prefix computation for early TopK exit optimization on partially sorted input (10x speedup on top10 bench) [datafusion]

2025-04-09 Thread via GitHub
berkaysynnada commented on PR #15563: URL: https://github.com/apache/datafusion/pull/15563#issuecomment-2788659010 @geoffreyclaude I'm going to merge this if you're done with this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] [BLOG] tpchgen-rs: World’s fastest open source TPCH data generator, written in Rust [datafusion-site]

2025-04-09 Thread via GitHub
alamb commented on code in PR #67: URL: https://github.com/apache/datafusion-site/pull/67#discussion_r2036251572 ## content/blog/2025-04-10-fastest-tpch-generator.md: ## @@ -0,0 +1,613 @@ +--- +layout: post +title: tpchgen-rs World’s fastest open source TPC-H data generator, wri

Re: [PR] docs: docs for benchmarking in aws ec2 [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove merged PR #1601: URL: https://github.com/apache/datafusion-comet/pull/1601 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@

Re: [I] Add documentation for benchmarking Comet in AWS with S3 data source [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove closed issue #1583: Add documentation for benchmarking Comet in AWS with S3 data source URL: https://github.com/apache/datafusion-comet/issues/1583 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

[I] Dockerfile for generating TPC-DS data not working [datafusion-benchmarks]

2025-04-09 Thread via GitHub
viirya opened a new issue, #22: URL: https://github.com/apache/datafusion-benchmarks/issues/22 Run `docker build -t datafusion-benchmarks/tpcdsgen .` according to the README.md under `tpcds`. `tpctools v0.7.0` cannot be built currently. ``` 8.751 error: failed to compile `tpctools

Re: [PR] docs: docs for benchmarking in aws ec2 [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove commented on code in PR #1601: URL: https://github.com/apache/datafusion-comet/pull/1601#discussion_r2036223583 ## docs/source/contributor-guide/benchmarking_aws_ec2.md: ## @@ -0,0 +1,223 @@ + + +# Comet Benchmarking in AWS + +This guide is for setting up benchmarks on

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-09 Thread via GitHub
adriangb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2036221582 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -283,6 +284,51 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { /

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-09 Thread via GitHub
alamb commented on code in PR #15654: URL: https://github.com/apache/datafusion/pull/15654#discussion_r2036219770 ## datafusion/physical-plan/src/spill/mod.rs: ## @@ -24,27 +24,156 @@ use std::fs::File; use std::io::BufReader; use std::path::{Path, PathBuf}; use std::ptr::Non

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-09 Thread via GitHub
alamb commented on PR #15654: URL: https://github.com/apache/datafusion/pull/15654#issuecomment-2791117066 > > Does anyone know if we have benchmarks for sorting / spilling I could run to verify the impact of this PR on their behavior? > > I took a brief look but didn't find any >

Re: [PR] perf: Use a global tokio runtime [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove commented on PR #1614: URL: https://github.com/apache/datafusion-comet/pull/1614#issuecomment-2783551229 @comphead @parthchandra could I get a committer review? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and us

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2791076569 > Maybe it is possible to move the recursion into the optimizer rule but still keep a `ExecutionPlan` method by making a complex call signature, maybe something like this:

Re: [PR] Enhance: simplify x=x [datafusion]

2025-04-09 Thread via GitHub
alamb commented on code in PR #15589: URL: https://github.com/apache/datafusion/pull/15589#discussion_r2036171313 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6140,21 +6140,19 @@ logical_plan 02)--Aggregate: groupBy=[[]], aggr=[[count(Int64(1))]] 03)SubqueryAlia

Re: [PR] fix: add map coercion for binary ops [datafusion]

2025-04-09 Thread via GitHub
alamb commented on code in PR #15551: URL: https://github.com/apache/datafusion/pull/15551#discussion_r2036175257 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -987,6 +988,25 @@ fn coerce_fields(common_type: DataType, lhs: &FieldRef, rhs: &FieldRef) -> Field

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-09 Thread via GitHub
ashdnazg commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2036184085 ## datafusion/common-runtime/src/common.rs: ## @@ -77,17 +82,32 @@ impl SpawnedTask { } } +impl Future for SpawnedTask { +type Output = Result; + +

Re: [PR] fix decimal precision issue in simplify expression optimize rule [datafusion]

2025-04-09 Thread via GitHub
shehabgamin commented on code in PR #15588: URL: https://github.com/apache/datafusion/pull/15588#discussion_r2032217056 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1997,6 +2010,78 @@ fn is_exactly_true(expr: Expr, info: &impl SimplifyInfo) -> Res

Re: [PR] fix decimal precision issue in simplify expression optimize rule [datafusion]

2025-04-09 Thread via GitHub
jayzhan211 commented on code in PR #15588: URL: https://github.com/apache/datafusion/pull/15588#discussion_r2032396391 ## datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs: ## @@ -1997,6 +2010,78 @@ fn is_exactly_true(expr: Expr, info: &impl SimplifyInfo) -> Resu

Re: [PR] chore: Add manually-triggered CI jobs for testing Spark SQL with native scans [datafusion-comet]

2025-04-09 Thread via GitHub
parthchandra commented on code in PR #1624: URL: https://github.com/apache/datafusion-comet/pull/1624#discussion_r2036139372 ## .github/workflows/spark_sql_test_native_datafusion.yml: ## @@ -0,0 +1,71 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more co

Re: [D] Gathering Ideas for WASM web playground design [datafusion]

2025-04-09 Thread via GitHub
GitHub user qstommyshu added a comment to the discussion: Gathering Ideas for WASM web playground design Thanks for your idea @backkem ! > IDK if there is broad enough interest for it but I think querying a remote > DataFusion instance would be a cool feature. Technically, I think this can be

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
alamb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2791020940 Maybe it is possible to move the recursion into the optimizer rule but still keep a `ExecutionPlan` method by making a complex call signature, maybe something like this: ```rust

Re: [PR] chore: Rename protobuf Java package [datafusion]

2025-04-09 Thread via GitHub
alamb merged PR #15658: URL: https://github.com/apache/datafusion/pull/15658 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
alamb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2791020563 > It was @alamb that suggested we do it this way, unless I misunderstood his suggestion. > > I think it's possible to do the recursion as an optimizer rule but making the APIs f

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
alamb commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2036145638 ## datafusion/core/tests/physical_optimizer/filter_pushdown.rs: ## @@ -0,0 +1,529 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-09 Thread via GitHub
adriangb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2035755891 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -283,6 +284,55 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug + DynEq + DynHash { /

Re: [PR] feat: add MAP type support for first level [datafusion-comet]

2025-04-09 Thread via GitHub
comphead merged PR #1603: URL: https://github.com/apache/datafusion-comet/pull/1603 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@d

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2025-04-09 Thread via GitHub
Dandandan closed pull request #11943: Sketch for aggregation intermediate results blocked management URL: https://github.com/apache/datafusion/pull/11943 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-09 Thread via GitHub
paleolimbot commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2790938185 I'll take a look this evening! It's mostly updating to Arrow 55, but https://github.com/apache/datafusion/pull/15663 (in particular https://github.com/apache/datafusion/pu

[PR] [WIP] Experiment with DataFusion against Arrow with Extension DataType support [datafusion]

2025-04-09 Thread via GitHub
paleolimbot opened a new pull request, #15663: URL: https://github.com/apache/datafusion/pull/15663 ## Which issue does this PR close? Another experiment in pursuit of https://github.com/apache/datafusion/issues/12644 ## Rationale for this change It has been suggested th

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2025-04-09 Thread via GitHub
Dandandan commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r2036098003 ## datafusion/common/src/config.rs: ## @@ -338,6 +338,19 @@ config_namespace! { /// if the source of statistics is accurate. /// We plan to mak

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-09 Thread via GitHub
alamb commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2035644581 ## datafusion/common-runtime/src/common.rs: ## @@ -15,18 +15,23 @@ // specific language governing permissions and limitations // under the License. -use std::fut

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
adriangb commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2036084527 ## datafusion/expr/src/filter_pushdown.rs: ## @@ -0,0 +1,55 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

[PR] chore(deps): bump tokio from 1.44.1 to 1.44.2 [datafusion]

2025-04-09 Thread via GitHub
dependabot[bot] opened a new pull request, #15627: URL: https://github.com/apache/datafusion/pull/15627 Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.44.1 to 1.44.2. Release notes Sourced from [tokio's releases](https://github.com/tokio-rs/tokio/releases). Tokio v1.4

Re: [PR] feat: Add `array_min` function support [datafusion]

2025-04-09 Thread via GitHub
rluvaton commented on code in PR #14417: URL: https://github.com/apache/datafusion/pull/14417#discussion_r2035971319 ## datafusion/functions-nested/src/min.rs: ## @@ -0,0 +1,140 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agr

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
geoffreyclaude commented on code in PR #15566: URL: https://github.com/apache/datafusion/pull/15566#discussion_r2036042823 ## datafusion/expr/src/filter_pushdown.rs: ## @@ -0,0 +1,55 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

[PR] chore: Rename protobuf Java package [datafusion]

2025-04-09 Thread via GitHub
andygrove opened a new pull request, #15658: URL: https://github.com/apache/datafusion/pull/15658 ## Which issue does this PR close? N/A ## Rationale for this change The Java package name needs updating now that DataFusion is a top-level project. #

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-09 Thread via GitHub
eshed-flarion commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2036057402 ## datafusion/common-runtime/src/common.rs: ## @@ -15,18 +15,23 @@ // specific language governing permissions and limitations // under the License. -use

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-09 Thread via GitHub
ashdnazg commented on PR #15653: URL: https://github.com/apache/datafusion/pull/15653#issuecomment-2790857599 Tests should be more reliable now :facepalm:. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above

Re: [PR] Implement Future for SpawnedTask. [datafusion]

2025-04-09 Thread via GitHub
ashdnazg commented on code in PR #15653: URL: https://github.com/apache/datafusion/pull/15653#discussion_r2036059191 ## datafusion/common-runtime/src/common.rs: ## @@ -15,18 +15,23 @@ // specific language governing permissions and limitations // under the License. -use std::

Re: [PR] chore: refactor v2 scan conversion [datafusion-comet]

2025-04-09 Thread via GitHub
parthchandra commented on code in PR #1621: URL: https://github.com/apache/datafusion-comet/pull/1621#discussion_r2036049975 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -246,6 +183,65 @@ class CometSparkSessionExtensions } } + pri

Re: [PR] Show current SQL recursion limit in RecursionLimitExceeded error message [datafusion]

2025-04-09 Thread via GitHub
kumarlokesh commented on code in PR #15644: URL: https://github.com/apache/datafusion/pull/15644#discussion_r2036056433 ## datafusion/sql/src/parser.rs: ## @@ -455,9 +457,16 @@ impl<'a> DFParser<'a> { if let Token::Word(w) = self.parser.peek_nth_token(1

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
ozankabak commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2790048758 This is a great contribution and is very close to merge. However, let's make sure the design is right to avoid API churn. Why do you think `try_swap_with_projection` is similar? I

Re: [PR] [BLOG] tpchgen-rs: World’s fastest open source TPCH data generator, written in Rust [datafusion-site]

2025-04-09 Thread via GitHub
Omega359 commented on code in PR #67: URL: https://github.com/apache/datafusion-site/pull/67#discussion_r2036041501 ## content/blog/2025-04-10-fastest-tpch-generator.md: ## @@ -0,0 +1,613 @@ +--- +layout: post +title: tpchgen-rs World’s fastest open source TPC-H data generator,

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-09 Thread via GitHub
qstommyshu commented on code in PR #15610: URL: https://github.com/apache/datafusion/pull/15610#discussion_r2036048151 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -535,56 +457,262 @@ impl ExternalSorter { // reserved again for the next spill. self.merg

Re: [PR] Show current SQL recursion limit in RecursionLimitExceeded error message [datafusion]

2025-04-09 Thread via GitHub
kumarlokesh commented on code in PR #15644: URL: https://github.com/apache/datafusion/pull/15644#discussion_r2036043878 ## datafusion/sql/src/parser.rs: ## @@ -469,17 +478,31 @@ impl<'a> DFParser<'a> { } _ => { /

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2790824023 For comparison, here is roughly what I had before doing the recursion as part of the `OptimizerRule`: https://github.com/pydantic/datafusion/blob/fbf93a2bdd0a5c1532336026dfa71ac7305

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-09 Thread via GitHub
qstommyshu commented on code in PR #15610: URL: https://github.com/apache/datafusion/pull/15610#discussion_r2036031118 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -535,56 +457,262 @@ impl ExternalSorter { // reserved again for the next spill. self.merg

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-09 Thread via GitHub
qstommyshu commented on code in PR #15610: URL: https://github.com/apache/datafusion/pull/15610#discussion_r2036027203 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -535,56 +457,262 @@ impl ExternalSorter { // reserved again for the next spill. self.merg

Re: [PR] Introduce DynamicFilterSource and DynamicPhysicalExpr [datafusion]

2025-04-09 Thread via GitHub
adriangb commented on code in PR #15568: URL: https://github.com/apache/datafusion/pull/15568#discussion_r2036022198 ## datafusion/physical-expr/src/expressions/dynamic_filters.rs: ## @@ -0,0 +1,442 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more co

Re: [PR] POC: Cascaded spill merge and re-spill [datafusion]

2025-04-09 Thread via GitHub
qstommyshu commented on code in PR #15610: URL: https://github.com/apache/datafusion/pull/15610#discussion_r2036013722 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -535,56 +457,262 @@ impl ExternalSorter { // reserved again for the next spill. self.merg

Re: [PR] ExecutionPlan: add APIs for filter pushdown & optimizer rule to apply them [datafusion]

2025-04-09 Thread via GitHub
adriangb commented on PR #15566: URL: https://github.com/apache/datafusion/pull/15566#issuecomment-2790778558 Some implementations do recurse I think, for similar reasons to our recursion here: https://github.com/pydantic/datafusion/blob/f8a6384bdf21b2eeb7bcfe3f08e52712735bb285/datafusio

Re: [I] deprecating `return_type` in favor of `return_type_from_args` [datafusion]

2025-04-09 Thread via GitHub
rluvaton commented on issue #15662: URL: https://github.com/apache/datafusion/issues/15662#issuecomment-2790765737 @alamb take: > > We can start `return_type` deprecation after this PR > > I recommend we *do not* deprecate return_type - after this PR I think we have a nice API. Nam

Re: [PR] Add Extension Type / Metadata support for Scalar UDFs [datafusion]

2025-04-09 Thread via GitHub
timsaucer commented on PR #15646: URL: https://github.com/apache/datafusion/pull/15646#issuecomment-2790756798 I think Aggregate and Window UDFs should come as a separate PR. I did notice however that for Aggregates the input portion is already viable with this PR. Since `AccumulatorArgs` a

Re: [PR] Remove waits from blocking threads reading spill files. [datafusion]

2025-04-09 Thread via GitHub
rluvaton commented on code in PR #15654: URL: https://github.com/apache/datafusion/pull/15654#discussion_r2035960200 ## datafusion/physical-plan/src/spill/mod.rs: ## @@ -24,27 +24,156 @@ use std::fs::File; use std::io::BufReader; use std::path::{Path, PathBuf}; use std::ptr::

Re: [PR] chore: Change default Spark version to 3.5 [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove commented on code in PR #1620: URL: https://github.com/apache/datafusion-comet/pull/1620#discussion_r2035981075 ## .github/actions/setup-spark-builder/action.yaml: ## @@ -19,13 +19,11 @@ name: Setup Spark Builder description: 'Setup Apache Spark to run SQL tests' inp

Re: [PR] feat: Add unique id for every memory consumer [datafusion]

2025-04-09 Thread via GitHub
EmilyMatt commented on PR #15613: URL: https://github.com/apache/datafusion/pull/15613#issuecomment-2790693147 @alamb Thanks for taking the time to look at this. I've addressed the relevant points and added some general documentation, I've also modified the reserved field to an AtomicUsize(

Re: [PR] chore: Change default Spark version to 3.5 [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove commented on code in PR #1620: URL: https://github.com/apache/datafusion-comet/pull/1620#discussion_r2035965475 ## pom.xml: ## Review Comment: I tried doing this, but it caused regressions in Spark 4 support, so I'd prefer to create a separate PR. I filed a track

[I] Update Maven plugin versions to match Spark 3.5 [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove opened a new issue, #1631: URL: https://github.com/apache/datafusion-comet/issues/1631 ### What is the problem the feature request solves? See comment in https://github.com/apache/datafusion-comet/pull/1620#discussion_r2035772596 for context ### Describe the potentia

Re: [PR] feat: Add unique id for every memory consumer [datafusion]

2025-04-09 Thread via GitHub
EmilyMatt commented on code in PR #15613: URL: https://github.com/apache/datafusion/pull/15613#discussion_r2035961703 ## datafusion/execution/src/memory_pool/pool.rs: ## @@ -261,7 +268,7 @@ fn insufficient_capacity_err( pub struct TrackConsumersPool { inner: I, top: N

Re: [PR] feat: Add unique id for every memory consumer [datafusion]

2025-04-09 Thread via GitHub
EmilyMatt commented on code in PR #15613: URL: https://github.com/apache/datafusion/pull/15613#discussion_r2035959521 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -149,21 +150,65 @@ pub trait MemoryPool: Send + Sync + std::fmt::Debug { /// For help with allocation acco

Re: [D] Gathering Ideas for WASM web playground design [datafusion]

2025-04-09 Thread via GitHub
GitHub user backkem added a comment to the discussion: Gathering Ideas for WASM web playground design IDK if there is broad enough interest for it but I think querying a remote DataFusion instance would be a cool feature. Technically, I think this can be done by adding [grpc-web](https://gith

[PR] Fix tokenization of qualified identifiers with numeric prefix. [datafusion-sqlparser-rs]

2025-04-09 Thread via GitHub
romanb opened a new pull request, #1803: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1803 This PR is a follow-up to https://github.com/apache/datafusion-sqlparser-rs/pull/856. The remaining problem is that queries with qualified identifiers having numeric prefixes currently

Re: [PR] Update datafusion-testing pin (to fix extended test on main) [datafusion]

2025-04-09 Thread via GitHub
alamb commented on PR #15655: URL: https://github.com/apache/datafusion/pull/15655#issuecomment-2790030814 Thanks @phillipleblanc and @xudong963

Re: [PR] chore: refactor v2 scan conversion [datafusion-comet]

2025-04-09 Thread via GitHub
andygrove commented on code in PR #1621: URL: https://github.com/apache/datafusion-comet/pull/1621#discussion_r2035921195 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -246,6 +183,65 @@ class CometSparkSessionExtensions } } + privat

Re: [PR] fix!: incorrect coercion when comparing with string literals [datafusion]

2025-04-09 Thread via GitHub
alan910127 commented on code in PR #15482: URL: https://github.com/apache/datafusion/pull/15482#discussion_r2029954233 ## datafusion/sqllogictest/test_files/push_down_filter.slt: ## @@ -230,19 +230,19 @@ logical_plan TableScan: t projection=[a], full_filters=[t.a != Int32(100)]
