Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
Dandandan commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978979555 > 🤖: Benchmark completed > > Details > 🤖: Benchmark completed > > Details Could you maybe confirm the topk benchmark results, @alamb? `topk_tpch`

Re: [I] Support map lookup by key operation [datafusion-comet]

2025-06-16 Thread via GitHub
dharanad commented on issue #1884: URL: https://github.com/apache/datafusion-comet/issues/1884#issuecomment-2978981590 @comphead can I take this up? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [I] Implement Single Join [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on issue #16425: URL: https://github.com/apache/datafusion/issues/16425#issuecomment-2978925316 take

Re: [I] Implement Single Join [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on issue #16425: URL: https://github.com/apache/datafusion/issues/16425#issuecomment-2978925656 Thoughts @Dandandan?

[I] Implement Single Join [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n opened a new issue, #16425: URL: https://github.com/apache/datafusion/issues/16425 ### Is your feature request related to a problem or challenge? This is part of #13181 for looking into different joins. ## What is a Single Join Single joins are similar to a regula

Re: [PR] feat: mapping sql Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
xudong963 commented on code in PR #16290: URL: https://github.com/apache/datafusion/pull/16290#discussion_r2151302101 ## datafusion/common/src/config.rs: ## @@ -259,10 +259,10 @@ config_namespace! { /// string length and thus DataFusion can not enforce such limits.

Re: [I] Mapping Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
xudong963 closed issue #16288: Mapping Char/Text/String default to Utf8View URL: https://github.com/apache/datafusion/issues/16288

Re: [PR] feat: mapping sql Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
xudong963 merged PR #16290: URL: https://github.com/apache/datafusion/pull/16290

Re: [PR] Use Tokio's task budget consistently [datafusion]

2025-06-16 Thread via GitHub
zhuqi-lucas commented on code in PR #16398: URL: https://github.com/apache/datafusion/pull/16398#discussion_r2151253373 ## datafusion/core/tests/execution/coop.rs: ## @@ -0,0 +1,722 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

Re: [PR] Add support of parsing struct field's options in BigQuery [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
git-hulk commented on PR #1890: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1890#issuecomment-2978804373 cc @iffyio

[PR] Add support of parsing struct field's options in BigQuery [datafusion-sqlparser-rs]

2025-06-16 Thread via GitHub
git-hulk opened a new pull request, #1890: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1890 According to BigQuery syntax[1], the `OPTIONS` is allowed in both of top-level column definition and struct field. [1] https://cloud.google.com/bigquery/docs/reference/standard

Re: [PR] feat: mapping sql Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
zhuqi-lucas commented on code in PR #16290: URL: https://github.com/apache/datafusion/pull/16290#discussion_r2151238976 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -6082,7 +6082,7 @@ physical_plan 04)--AggregateExec: mode=Partial, gby=[], aggr=[count(Int64(1))]

Re: [I] Evaluate filter pushdown against the physical schema for performance and correctness [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on issue #15780: URL: https://github.com/apache/datafusion/issues/15780#issuecomment-2978786991 Thinking out loud about some of the tricky bits: there's going to be cases where we necessarily need to convert to the table schema's data type, e.g. `opaque_udf(col)`: we can'

Re: [PR] Eliminate Self Joins [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on code in PR #16023: URL: https://github.com/apache/datafusion/pull/16023#discussion_r2150927969 ## datafusion/optimizer/src/eliminate_self_join/mod.rs: ## @@ -0,0 +1,150 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat: mapping sql Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
zhuqi-lucas commented on code in PR #16290: URL: https://github.com/apache/datafusion/pull/16290#discussion_r2151211169 ## datafusion/common/src/config.rs: ## @@ -259,10 +259,10 @@ config_namespace! { /// string length and thus DataFusion can not enforce such limits.

Re: [PR] fix: Move `null_equals_null` todo in `NestedLoopJoin` to Physical Planner [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n closed pull request #16390: fix: Move `null_equals_null` todo in `NestedLoopJoin` to Physical Planner URL: https://github.com/apache/datafusion/pull/16390

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
alamb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2150916597 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -524,6 +511,91 @@ fn should_enable_page_index( .unwrap_or(false) } +/// Prune based on partitio

Re: [PR] updatted github action by change version tag to sha hashes [datafusion]

2025-06-16 Thread via GitHub
github-actions[bot] commented on PR #15315: URL: https://github.com/apache/datafusion/pull/15315#issuecomment-2978700211 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Support bounds evaluation for temporal data types [datafusion]

2025-06-16 Thread via GitHub
github-actions[bot] commented on PR #14523: URL: https://github.com/apache/datafusion/pull/14523#issuecomment-2978700370 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] refactor!: consistent null handling in coercible signatures [datafusion]

2025-06-16 Thread via GitHub
github-actions[bot] commented on PR #15404: URL: https://github.com/apache/datafusion/pull/15404#issuecomment-2978700123 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] wip: proto to physical plan conversion [datafusion]

2025-06-16 Thread via GitHub
github-actions[bot] commented on PR #14530: URL: https://github.com/apache/datafusion/pull/14530#issuecomment-2978700324 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Feature/parse float as decimal default true [datafusion]

2025-06-16 Thread via GitHub
github-actions[bot] closed pull request #14752: Feature/parse float as decimal default true URL: https://github.com/apache/datafusion/pull/14752

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978663984 @alamb I think we're ready to merge this and keep chipping away in https://github.com/apache/datafusion/pull/16424 and other spots, right?

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-16 Thread via GitHub
alamb merged PR #16083: URL: https://github.com/apache/datafusion/pull/16083

Re: [PR] Use dedicated NullEquality enum instead of null_equals_null boolean [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on PR #16419: URL: https://github.com/apache/datafusion/pull/16419#issuecomment-2977470034 cc @UBarney

Re: [PR] feat: support array_max [datafusion-comet]

2025-06-16 Thread via GitHub
codecov-commenter commented on PR #1892: URL: https://github.com/apache/datafusion-comet/pull/1892#issuecomment-2978597479 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1892?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

[PR] feat: support array_max [datafusion-comet]

2025-06-16 Thread via GitHub
drexler-sky opened a new pull request, #1892: URL: https://github.com/apache/datafusion-comet/pull/1892 ## Which issue does this PR close? Closes #. ## Rationale for this change adds support for array_max ## What changes are included in this PR?

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
alamb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2150940087 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -509,8 +510,22 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { /// /// The default i

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-16 Thread via GitHub
shehabgamin commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2151078955 ## datafusion/sqllogictest/test_files/spark/README.md: ## @@ -21,6 +21,15 @@ This directory contains test files for the `spark` test suite. +## Implementa

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978219556 I queued up some benchmarks. Looking at naming now.

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
corwinjoy commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2151077910 ## datafusion/common/src/config.rs: ## @@ -591,6 +930,12 @@ config_namespace! { /// writing out already in-memory data, such as from a cached /

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-16 Thread via GitHub
comphead commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2151065878 ## datafusion/sqllogictest/test_files/spark/README.md: ## @@ -21,6 +21,15 @@ This directory contains test files for the `spark` test suite. +## Implementatio

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-16 Thread via GitHub
comphead commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2151067088 ## datafusion/sqllogictest/test_files/spark/README.md: ## @@ -21,6 +21,15 @@ This directory contains test files for the `spark` test suite. +## Implementatio

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-16 Thread via GitHub
comphead commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2151065280 ## datafusion/sqllogictest/test_files/spark/README.md: ## @@ -21,6 +21,15 @@ This directory contains test files for the `spark` test suite. +## Implementatio

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
adamreeve commented on code in PR #16351: URL: https://github.com/apache/datafusion/pull/16351#discussion_r2151059658 ## datafusion/common/src/config.rs: ## @@ -188,6 +195,338 @@ macro_rules! config_namespace { } } +#[derive(Clone, Default, Debug, PartialEq)] +pub struct

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-2978213655 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubun

Re: [PR] feat: add SchemaProvider::table_type(table_name: &str) [datafusion]

2025-06-16 Thread via GitHub
epgif commented on PR #16401: URL: https://github.com/apache/datafusion/pull/16401#issuecomment-2978427374 Please take another look @comphead -- thanks!

Re: [PR] feat: add SchemaProvider::table_type(table_name: &str) [datafusion]

2025-06-16 Thread via GitHub
epgif commented on code in PR #16401: URL: https://github.com/apache/datafusion/pull/16401#discussion_r2151032750 ## datafusion/catalog/src/schema.rs: ## @@ -54,6 +55,14 @@ pub trait SchemaProvider: Debug + Sync + Send { name: &str, ) -> Result>, DataFusionError>;

Re: [PR] feat: add SchemaProvider::table_type(table_name: &str) [datafusion]

2025-06-16 Thread via GitHub
epgif commented on code in PR #16401: URL: https://github.com/apache/datafusion/pull/16401#discussion_r2151033231 ## datafusion/catalog/src/schema.rs: ## @@ -54,6 +55,14 @@ pub trait SchemaProvider: Debug + Sync + Send { name: &str, ) -> Result>, DataFusionError>;

Re: [I] Streamline github actions [datafusion-ballista]

2025-06-16 Thread via GitHub
Huy1Ng commented on issue #1128: URL: https://github.com/apache/datafusion-ballista/issues/1128#issuecomment-2978426251 I can give this a try, but do we have any idea of what an ideal workflow should be? For example, the datafusion Rust workflow also takes 25-30 min to complete: https://githu

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978416696 🤖: Benchmark completed Details ``` Comparing HEAD and topk-dynamic-filters Benchmark sort_tpch.json ┏━━

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978414721 🤖: Benchmark completed Details ``` Comparing HEAD and topk-dynamic-filters Benchmark clickbench_extended.json

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978414775 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubun

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-2978314187 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubun

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-06-16 Thread via GitHub
xiedeyantu commented on PR #16386: URL: https://github.com/apache/datafusion/pull/16386#issuecomment-2978374953 > > @alamb Could you help review this PR? > > Thanks @xiedeyantu! We normally need to add tests as part of any code PR -- could you look into adding some tests and document

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2150949273 ## datafusion/physical-plan/src/execution_plan.rs: ## @@ -509,8 +510,22 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { /// /// The defaul

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978330938 🤖 `./gh_compare_branch.sh` [Benchmark Script](https://github.com/alamb/datafusion-benchmarking/blob/main/gh_compare_branch.sh) Running Linux aal-dev 6.11.0-1015-gcp #15~24.04.1-Ubun

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-2978330851 🤖: Benchmark completed Details ``` Comparing HEAD and prune-rg Benchmark clickbench_1.json ┏━━┳

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-2978314117 🤖: Benchmark completed Details ``` Comparing HEAD and prune-rg Benchmark clickbench_extended.json ┏

Re: [I] panic `StructBuilder and field_builder with index 0 (Utf8) are of unequal lengths: (1 != 0)` when running with delta lake extension and `spark.comet.exec.shuffle.fallbackToColumnar` is true [d

2025-06-16 Thread via GitHub
andygrove commented on issue #1867: URL: https://github.com/apache/datafusion-comet/issues/1867#issuecomment-2977635741 @rluvaton were you able to confirm if this issue is now resolved?

Re: [PR] Add hooks to `SchemaAdapter` to add custom column generators [datafusion]

2025-06-16 Thread via GitHub
alamb commented on code in PR #15261: URL: https://github.com/apache/datafusion/pull/15261#discussion_r2150964083 ## datafusion/datasource/src/schema_adapter.rs: ## @@ -98,6 +100,46 @@ pub trait SchemaMapper: Debug + Send + Sync { fn map_batch(&self, batch: RecordBatch) ->

Re: [PR] Eliminate Self Joins [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on code in PR #16023: URL: https://github.com/apache/datafusion/pull/16023#discussion_r2150948098 ## datafusion/optimizer/src/eliminate_self_join/unique_keyed.rs: ## @@ -0,0 +1,323 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mor

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2978284818 I am sorry I haven't had a chance to review this yet. It would be great if @mbutrovich could also take a look. I have this on my list to review but I haven't been able to find the tim

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2150957763 ## datafusion/physical-optimizer/src/filter_pushdown.rs: ## @@ -362,17 +363,25 @@ use itertools::izip; /// [`ProjectionExec`]: datafusion_physical_plan::projecti

Re: [PR] feat: mapping sql Char/Text/String default to Utf8View [datafusion]

2025-06-16 Thread via GitHub
alamb commented on code in PR #16290: URL: https://github.com/apache/datafusion/pull/16290#discussion_r2150955809 ## datafusion/common/src/config.rs: ## @@ -259,10 +259,10 @@ config_namespace! { /// string length and thus DataFusion can not enforce such limits.

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #16424: URL: https://github.com/apache/datafusion/pull/16424#issuecomment-2978224779 Do we expect the benchmarks to show anything? I don't think they're using dynamic filters, right? Maybe we need to merge https://github.com/apache/datafusion/pull/15770 and then we c

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-16 Thread via GitHub
djanderson commented on PR #75: URL: https://github.com/apache/datafusion-site/pull/75#issuecomment-2978267914 One other thing I'm curious about. This write-up discusses the change in terms of enabling long-running tasks to be cancelled, but would making CPU-intensive exec blocks more coop

Re: [PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16386: URL: https://github.com/apache/datafusion/pull/16386#issuecomment-2978264463 > @alamb Could you help review this PR? Thanks @xiedeyantu! We normally need to add tests as part of any code PR -- could you look into adding some tests and documentation abou

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #15770: URL: https://github.com/apache/datafusion/pull/15770#discussion_r2150951172 ## datafusion/physical-optimizer/src/optimizer.rs: ## @@ -131,6 +131,8 @@ impl PhysicalOptimizer { // replacing operators with fetching variants, or

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2978256702 Thank you Andrew! I will do the renames, docs edits, etc., push those tonight and we can merge this tomorrow evening if there is no more feedback.

Re: [PR] Eliminate Self Joins [datafusion]

2025-06-16 Thread via GitHub
jonathanc-n commented on code in PR #16023: URL: https://github.com/apache/datafusion/pull/16023#discussion_r2150926283 ## datafusion/optimizer/src/eliminate_self_join/unique_keyed.rs: ## @@ -0,0 +1,323 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mor

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2150931060 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -384,6 +353,24 @@ impl FileOpener for ParquetOpener { .map(move |maybe_batch| {

Re: [PR] build(deps): bump datafusion-proto from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] commented on PR #1151: URL: https://github.com/apache/datafusion-python/pull/1151#issuecomment-2978185483 Looks like datafusion-proto is up-to-date now, so this is no longer needed.

Re: [PR] chore(deps): bump rust_decimal from 1.37.1 to 1.37.2 [datafusion]

2025-06-16 Thread via GitHub
comphead merged PR #16422: URL: https://github.com/apache/datafusion/pull/16422

Re: [PR] Blog post on query cancellation [datafusion-site]

2025-06-16 Thread via GitHub
djanderson commented on code in PR #75: URL: https://github.com/apache/datafusion-site/pull/75#discussion_r2150859260 ## content/blog/2025-06-15-cancellation.md: ## @@ -0,0 +1,328 @@ +# Query Cancellation + +## The Challenge of Cancelling Long-Running Queries + +Have you ever tr

Re: [PR] build(deps): bump datafusion from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] closed pull request #1150: build(deps): bump datafusion from 47.0.0 to 48.0.0 URL: https://github.com/apache/datafusion-python/pull/1150

Re: [I] Improve performance of `datafusion-cli` when reading from remote storage [datafusion]

2025-06-16 Thread via GitHub
comphead commented on issue #16365: URL: https://github.com/apache/datafusion/issues/16365#issuecomment-2978192301 Thanks @swaingotnochill I would probably start with profiling. The profiling techniques can be found https://github.com/apache/datafusion/blob/main/docs/source/library-use

Re: [PR] build(deps): bump datafusion-substrait from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] closed pull request #1148: build(deps): bump datafusion-substrait from 47.0.0 to 48.0.0 URL: https://github.com/apache/datafusion-python/pull/1148

Re: [PR] feat: upgrade df48 dependency [datafusion-python]

2025-06-16 Thread via GitHub
timsaucer merged PR #1143: URL: https://github.com/apache/datafusion-python/pull/1143

Re: [PR] build(deps): bump datafusion from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] commented on PR #1150: URL: https://github.com/apache/datafusion-python/pull/1150#issuecomment-2978185174 Looks like datafusion is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump datafusion-ffi from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] commented on PR #1147: URL: https://github.com/apache/datafusion-python/pull/1147#issuecomment-2978185243 Looks like datafusion-ffi is up-to-date now, so this is no longer needed.

Re: [PR] build(deps): bump datafusion-proto from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] closed pull request #1151: build(deps): bump datafusion-proto from 47.0.0 to 48.0.0 URL: https://github.com/apache/datafusion-python/pull/1151

Re: [PR] build(deps): bump datafusion-ffi from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] closed pull request #1147: build(deps): bump datafusion-ffi from 47.0.0 to 48.0.0 URL: https://github.com/apache/datafusion-python/pull/1147

Re: [PR] build(deps): bump datafusion-substrait from 47.0.0 to 48.0.0 [datafusion-python]

2025-06-16 Thread via GitHub
dependabot[bot] commented on PR #1148: URL: https://github.com/apache/datafusion-python/pull/1148#issuecomment-2978185196 Looks like datafusion-substrait is up-to-date now, so this is no longer needed.

Re: [PR] Add design process section to the docs [datafusion]

2025-06-16 Thread via GitHub
comphead merged PR #16397: URL: https://github.com/apache/datafusion/pull/16397

Re: [PR] Update Roadmap documentation [datafusion]

2025-06-16 Thread via GitHub
comphead commented on code in PR #16399: URL: https://github.com/apache/datafusion/pull/16399#discussion_r2150904489 ## docs/source/contributor-guide/roadmap.md: ## @@ -46,81 +46,12 @@ make review efficient and avoid surprises. # Quarterly Roadmap -A quarterly roadmap will

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16391: URL: https://github.com/apache/datafusion/pull/16391#issuecomment-2978146604 Thanks again @jonathanc-n

Re: [PR] fix: Fixed error handling for `generate_series/range` [datafusion]

2025-06-16 Thread via GitHub
alamb merged PR #16391: URL: https://github.com/apache/datafusion/pull/16391

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16083: URL: https://github.com/apache/datafusion/pull/16083#issuecomment-2978141974 🚀 -- I am feeling physically nervous that there are so many PRs open so starting the merge train!

Re: [PR] Always add parentheses when formatting `BinaryExpr` with `SchemaDisplay` [datafusion]

2025-06-16 Thread via GitHub
alamb commented on PR #16209: URL: https://github.com/apache/datafusion/pull/16209#issuecomment-2978143366 Marking as draft as I think this PR is no longer waiting on feedback and I am trying to make it easier to find PRs in need of review. Please mark it as ready for review when it is read

Re: [I] RecordBatch might have logical row mapping on physical arrays [datafusion-comet]

2025-06-16 Thread via GitHub
huaxingao commented on issue #974: URL: https://github.com/apache/datafusion-comet/issues/974#issuecomment-2978083709 Yes, we still need this

Re: [I] Spark ColumnarToRowExec cannot pass CometBuffer safety check [datafusion-comet]

2025-06-16 Thread via GitHub
viirya commented on issue #1059: URL: https://github.com/apache/datafusion-comet/issues/1059#issuecomment-2978021382 Yes

Re: [I] Spark ColumnarToRowExec cannot pass CometBuffer safety check [datafusion-comet]

2025-06-16 Thread via GitHub
viirya closed issue #1059: Spark ColumnarToRowExec cannot pass CometBuffer safety check URL: https://github.com/apache/datafusion-comet/issues/1059

Re: [PR] chore: generate basic spark function tests [datafusion]

2025-06-16 Thread via GitHub
shehabgamin commented on code in PR #16409: URL: https://github.com/apache/datafusion/pull/16409#discussion_r2150803520 ## datafusion/sqllogictest/test_files/spark/array/array.slt: ## @@ -0,0 +1,22 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contr

Re: [PR] Use dedicated NullEquality enum instead of null_equals_null boolean [datafusion]

2025-06-16 Thread via GitHub
tobixdev commented on code in PR #16419: URL: https://github.com/apache/datafusion/pull/16419#discussion_r2150797422 ## datafusion/common/src/null_equality.rs: ## @@ -0,0 +1,34 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] Prune files during streams and avoid additional pruning if there are no dynamic filters [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on code in PR #16424: URL: https://github.com/apache/datafusion/pull/16424#discussion_r2150409636 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -524,6 +512,91 @@ fn should_enable_page_index( .unwrap_or(false) } +/// Prune based on parti

Re: [I] Queries with exchange reuse sometimes fail in Comet [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1798: Queries with exchange reuse sometimes fail in Comet URL: https://github.com/apache/datafusion-comet/issues/1798

Re: [I] Fix rat check errors during release process [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1678: URL: https://github.com/apache/datafusion-comet/issues/1678#issuecomment-2977832506 fixed in https://github.com/apache/datafusion-comet/pull/1689

Re: [PR] chore: Enable tests for casting timestamp to numeric types [datafusion-comet]

2025-06-16 Thread via GitHub
codecov-commenter commented on PR #1891: URL: https://github.com/apache/datafusion-comet/pull/1891#issuecomment-2977904786 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1891?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] TopK dynamic filter pushdown attempt 2 [datafusion]

2025-06-16 Thread via GitHub
adriangb commented on PR #15770: URL: https://github.com/apache/datafusion/pull/15770#issuecomment-2977884089 I'd be curious to see perf impact after we merge https://github.com/apache/datafusion/pull/16424 as well

Re: [I] Fix rat check errors during release process [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1678: Fix rat check errors during release process URL: https://github.com/apache/datafusion-comet/issues/1678

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1576: URL: https://github.com/apache/datafusion-comet/issues/1576#issuecomment-2977847545 This issue was likely resolved by https://github.com/apache/datafusion-comet/pull/693 so will close for now. @mkgada feel free to reopen if this is still an issue

Re: [I] NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1576: NoSuchMethodError: java.lang.Object org.apache.spark.executor.TaskMetrics.withExternalAccums(scala.Function1) URL: https://github.com/apache/datafusion-comet/issues/1576

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-16 Thread via GitHub
mbutrovich commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2977839997 Thank you and @adamreeve for driving so much of the modular encryption work! I'll take a look at this branch this week and see how this might get Comet supporting modular encrypti

Re: [I] NoSuchMethodError with Spark 3.5.3 (EMR 7.6) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove commented on issue #1451: URL: https://github.com/apache/datafusion-comet/issues/1451#issuecomment-2977837094 Closing this issue since it is inactive.

Re: [I] NoSuchMethodError with Spark 3.5.3 (EMR 7.6) [datafusion-comet]

2025-06-16 Thread via GitHub
andygrove closed issue #1451: NoSuchMethodError with Spark 3.5.3 (EMR 7.6) URL: https://github.com/apache/datafusion-comet/issues/1451

Re: [I] Few tests fail on windows [datafusion-ballista]

2025-06-16 Thread via GitHub
milenkovicm closed issue #1117: Few tests fail on windows URL: https://github.com/apache/datafusion-ballista/issues/1117

Re: [I] Update all github workflow to use actions tied to sha hashes [datafusion]

2025-06-16 Thread via GitHub
findepi commented on issue #15298: URL: https://github.com/apache/datafusion/issues/15298#issuecomment-2977828294 > A recent [supply chain attack](https://arstechnica.com/information-technology/2025/03/supply-chain-attack-exposing-credentials-affects-23k-users-of-tj-actions/) has made it ex

Re: [PR] fix: fix tests failing on windows [datafusion-ballista]

2025-06-16 Thread via GitHub
milenkovicm commented on PR #1273: URL: https://github.com/apache/datafusion-ballista/pull/1273#issuecomment-2977828544 thank you very much @Huy1Ng this issue was there for a long time
