Re: [I] Enable udf hashing across FFI boundary [datafusion]

2025-08-28 Thread via GitHub
crystalxyz commented on issue #17087: URL: https://github.com/apache/datafusion/issues/17087#issuecomment-3235551112 Hi @timsaucer, I'm returning to the community and refreshing my memory on FFI! I've just implemented a quick fix for this issue, and please let me know if this is what you ha

Re: [PR] Fix incorrect memory accounting for sliced `StringViewArray` [datafusion]

2025-08-28 Thread via GitHub
ding-young commented on code in PR #17315: URL: https://github.com/apache/datafusion/pull/17315#discussion_r2308923216 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -194,7 +195,84 @@ impl GetSlicedSize for RecordBatch { for array in self.columns() {

Re: [PR] Update development guide in README.md [datafusion-python]

2025-08-28 Thread via GitHub
timsaucer merged PR #1213: URL: https://github.com/apache/datafusion-python/pull/1213 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...

Re: [I] External sort failing with non-spillable operators as input (RepartitionExec) [datafusion]

2025-08-28 Thread via GitHub
ding-young commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3235438790 > I think some specific timing can cause the non-spillable operators to OOM, however if spillable operators can spill earlier, the execution should be possible to finish. ([@

Re: [PR] 49.0.0 release [datafusion-python]

2025-08-28 Thread via GitHub
timsaucer merged PR #1211: URL: https://github.com/apache/datafusion-python/pull/1211

Re: [I] bug: test `cast StructType to String` fails for duplicated col [datafusion-comet]

2025-08-28 Thread via GitHub
comphead commented on issue #2256: URL: https://github.com/apache/datafusion-comet/issues/2256#issuecomment-3235337339 This happens because of ``` if (struct.names.length != struct.names.distinct.length) { withInfo(expr, "CreateNamedStruct with duplicate field nam

Re: [PR] feat: Make Parquet EncryptionFactory async [datafusion]

2025-08-28 Thread via GitHub
adamreeve commented on code in PR #17342: URL: https://github.com/apache/datafusion/pull/17342#discussion_r2308836875 ## datafusion/datasource-parquet/src/opener.rs: ## @@ -152,16 +152,18 @@ impl FileOpener for ParquetOpener { let mut predicate_file_schema = Arc::clone(

[I] Remove workaround disabling Parquet page index reading for encrypted files [datafusion]

2025-08-28 Thread via GitHub
adamreeve opened a new issue, #17352: URL: https://github.com/apache/datafusion/issues/17352 https://github.com/apache/datafusion/blob/d19bf524e384bc24e509c70f1806b6f330829529/datafusion/datasource-parquet/src/opener.rs#L158-L162 This was fixed in https://github.com/apache/arrow-rs/pu

Re: [PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsighed [datafusion]

2025-08-28 Thread via GitHub
Jefffrey commented on code in PR #17013: URL: https://github.com/apache/datafusion/pull/17013#discussion_r2308807125 ## datafusion/spark/src/function/bitwise/bit_shift.rs: ## @@ -0,0 +1,678 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] Fix ambiguous column names in substrait conversion as a result of literals having the same name during conversion. [datafusion]

2025-08-28 Thread via GitHub
xanderbailey commented on PR #17299: URL: https://github.com/apache/datafusion/pull/17299#issuecomment-3235211365 Anything else needed from me?

Re: [PR] fix: handle cast to dictionary vector introduced by case when [datafusion-comet]

2025-08-28 Thread via GitHub
parthchandra commented on code in PR #2044: URL: https://github.com/apache/datafusion-comet/pull/2044#discussion_r2308725990 ## native/spark-expr/src/conversion_funcs/cast.rs: ## @@ -866,6 +867,40 @@ pub fn spark_cast( } } +// copied from datafusion common scalar/mod.rs

Re: [PR] fix: RangePartitioning with native shuffle [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich commented on PR #2258: URL: https://github.com/apache/datafusion-comet/pull/2258#issuecomment-3234923462 The next challenge to figure out is adding some flexibility for dictionary-encoded columns. The current approach with one schema is too rigid.

Re: [PR] fix: Fall back to `native_comet` when object store not supported by `native_iceberg_compat` [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on PR #2251: URL: https://github.com/apache/datafusion-comet/pull/2251#issuecomment-3234967366 > lgtm. I only wonder if this will end up making a bunch of S3 calls every time ScanRule is called (but perhaps for the time being this is ok until we address this better).

Re: [I] Bug: UDAF unexpectedly returns non-empty result for empty table [datafusion]

2025-08-28 Thread via GitHub
samueleresca commented on issue #17269: URL: https://github.com/apache/datafusion/issues/17269#issuecomment-3234944839 This issue is also affecting other aggregates (e.g. `sum()`). I think the root cause is the parsing between `ScalarValue` to `ArrayRef` happening at the finalisation of the

[I] 0.9.2 Release [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove opened a new issue, #2260: URL: https://github.com/apache/datafusion-comet/issues/2260 ### What is the problem the feature request solves? _No response_ ### Describe the potential solution _No response_ ### Additional context _No response_ -- Th

Re: [PR] Add PhysicalExpr::is_volatile [datafusion]

2025-08-28 Thread via GitHub
adriangb commented on code in PR #17351: URL: https://github.com/apache/datafusion/pull/17351#discussion_r2308522201 ## datafusion/physical-expr-common/src/physical_expr.rs: ## @@ -377,6 +377,40 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug + DynEq + DynHash {

[PR] Add PhysicalExpr::is_volatile [datafusion]

2025-08-28 Thread via GitHub
adriangb opened a new pull request, #17351: URL: https://github.com/apache/datafusion/pull/17351 Closes https://github.com/apache/datafusion/issues/17318

[PR] fix: shuffle reader should return statistics [datafusion-ballista]

2025-08-28 Thread via GitHub
milenkovicm opened a new pull request, #1302: URL: https://github.com/apache/datafusion-ballista/pull/1302 # Which issue does this PR close? Closes #. # Rationale for this change with datafusion 48 method which is to be used for Exec statistic information changed. Shuff

[PR] fix: Fix Eq by adding hash to FFI udf/udaf/udwf [datafusion]

2025-08-28 Thread via GitHub
crystalxyz opened a new pull request, #17350: URL: https://github.com/apache/datafusion/pull/17350 ## Which issue does this PR close? - Closes #17087 . ## Rationale for this change ## What changes are included in this PR? ## Are these change

Re: [I] Avoid schema deep clone in PruningExpressionBuilder [datafusion]

2025-08-28 Thread via GitHub
etolbakov commented on issue #17198: URL: https://github.com/apache/datafusion/issues/17198#issuecomment-3234710847 @findepi I believe this could be closed

Re: [PR] Push the limits past window functions [datafusion]

2025-08-28 Thread via GitHub
avantgardnerio commented on code in PR #17347: URL: https://github.com/apache/datafusion/pull/17347#discussion_r2308362009 ## datafusion/physical-optimizer/src/limit_pushdown_past_window.rs: ## @@ -0,0 +1,145 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-08-28 Thread via GitHub
BlakeOrth commented on PR #17266: URL: https://github.com/apache/datafusion/pull/17266#issuecomment-3234598245 @alamb I've made some changes that I think get the functional side of the code where it needs to be and pushed those for review. This is still lacking tests and docs, but I thought

[I] Subquery expressions in UDTF's parsed but elided [datafusion]

2025-08-28 Thread via GitHub
davisp opened a new issue, #17349: URL: https://github.com/apache/datafusion/issues/17349 ### Describe the bug I got nerd sniped by a question in the DataFusion Discord channel into seeing if I couldn't implement something similar to [BigQuery's gap_fill function](https://cloud.googl

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-08-28 Thread via GitHub
BlakeOrth commented on code in PR #17266: URL: https://github.com/apache/datafusion/pull/17266#discussion_r2308256808 ## datafusion-cli/src/object_storage.rs: ## @@ -563,6 +563,592 @@ pub(crate) async fn get_object_store( Ok(store) } +pub mod instrumented { Review Comme

Re: [PR] POC: datafusion-cli instrumented object store [datafusion]

2025-08-28 Thread via GitHub
BlakeOrth commented on code in PR #17266: URL: https://github.com/apache/datafusion/pull/17266#discussion_r2308250575 ## datafusion-cli/src/print_options.rs: ## @@ -73,6 +77,8 @@ pub struct PrintOptions { pub quiet: bool, pub maxrows: MaxRows, pub color: bool, +

[PR] chore: Refactor serde for more array and struct expressions [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove opened a new pull request, #2257: URL: https://github.com/apache/datafusion-comet/pull/2257 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/2019 ## Rationale for this change Continued refactoring as pa

Re: [I] External sort failing with non-spillable operators as input (RepartitionExec) [datafusion]

2025-08-28 Thread via GitHub
16pierre commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3234493420 > I think some specific timing can cause the non-spillable operators to OOM, however if spillable operators can spill earlier, the execution should be possible to finish

Re: [PR] fix: Fall back to `native_comet` when object store not supported by `native_iceberg_compat` [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on PR #2251: URL: https://github.com/apache/datafusion-comet/pull/2251#issuecomment-3233741851 @Kontinuation fyi

[PR] Chore: Refactor serde for math expressions [datafusion-comet]

2025-08-28 Thread via GitHub
kazantsev-maksim opened a new pull request, #2259: URL: https://github.com/apache/datafusion-comet/pull/2259 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/2019 ## Rationale for this change Part of https://github.com/apache/da

[PR] Push the limits past window functions [datafusion]

2025-08-28 Thread via GitHub
avantgardnerio opened a new pull request, #17347: URL: https://github.com/apache/datafusion/pull/17347 ## Which issue does this PR close? - Closes #17346. ## Rationale for this change We would like plans with window functions to go faster. ## What changes are inclu

Re: [PR] Push the limits past window functions [datafusion]

2025-08-28 Thread via GitHub
avantgardnerio commented on code in PR #17347: URL: https://github.com/apache/datafusion/pull/17347#discussion_r2308160226 ## datafusion/common/src/config.rs: ## @@ -727,6 +727,10 @@ config_namespace! { /// during aggregations, if possible pub enable_topk_aggre

Re: [I] Add type parameter to `CometAggregateExpressionSerde` [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich closed issue #2245: Add type parameter to `CometAggregateExpressionSerde` URL: https://github.com/apache/datafusion-comet/issues/2245

Re: [PR] chore: Add type parameter to CometAggregateExpressionSerde [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich merged PR #2249: URL: https://github.com/apache/datafusion-comet/pull/2249

Re: [I] decryption not supported when using `auto` scan mode to read from Parquet [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich closed issue #2198: decryption not supported when using `auto` scan mode to read from Parquet URL: https://github.com/apache/datafusion-comet/issues/2198

Re: [PR] fix: Fall back to `native_comet` for encrypted Parquet scans [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich merged PR #2250: URL: https://github.com/apache/datafusion-comet/pull/2250

Re: [PR] Push the limits past window functions [datafusion]

2025-08-28 Thread via GitHub
Dandandan commented on code in PR #17347: URL: https://github.com/apache/datafusion/pull/17347#discussion_r2308149858 ## datafusion/physical-optimizer/src/limit_pushdown_past_window.rs: ## @@ -0,0 +1,145 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

Re: [PR] Push the limits past window functions [datafusion]

2025-08-28 Thread via GitHub
Dandandan commented on code in PR #17347: URL: https://github.com/apache/datafusion/pull/17347#discussion_r2308131304 ## datafusion/common/src/config.rs: ## @@ -727,6 +727,10 @@ config_namespace! { /// during aggregations, if possible pub enable_topk_aggregatio

Re: [PR] chore: Add spark compatible `MapSort` function along with limited support for grouping on Map type [datafusion-comet]

2025-08-28 Thread via GitHub
rishvin commented on PR #2221: URL: https://github.com/apache/datafusion-comet/pull/2221#issuecomment-3234381573 > Thanks @rishvin I feel like this PR would be hard to review, would be that possible to break it down to smaller parts? One function a time? Thanks @comphead for the feed

Re: [PR] fix: Fall back to `native_comet` when object store not supported by `native_iceberg_compat` [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on code in PR #2251: URL: https://github.com/apache/datafusion-comet/pull/2251#discussion_r2308038000 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -288,8 +292,25 @@ case class CometScanRule(session: SparkSession) extends Rule[Spa

Re: [PR] fix: RangePartitioning with native shuffle [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich commented on PR #2258: URL: https://github.com/apache/datafusion-comet/pull/2258#issuecomment-3234211470 I'll try to investigate how TPC-H Correctness suite ended up with `Error from DataFusion: Empty iterator passed to ScalarValue::iter_to_array.` It might be trying to go dow

Re: [PR] fix: RangePartitioning boundaries with native shuffle [datafusion-comet]

2025-08-28 Thread via GitHub
codecov-commenter commented on PR #2258: URL: https://github.com/apache/datafusion-comet/pull/2258#issuecomment-3234191347 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2258?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: RangePartitioning boundaries with native shuffle [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich commented on PR #2258: URL: https://github.com/apache/datafusion-comet/pull/2258#issuecomment-3234182214 cc @Kontinuation

Re: [PR] chore: Add type parameter to CometAggregateExpressionSerde [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on PR #2249: URL: https://github.com/apache/datafusion-comet/pull/2249#issuecomment-3234171837 @peter-toth fyi

Re: [PR] minor: make shuffle exec display consistent [datafusion-ballista]

2025-08-28 Thread via GitHub
andygrove merged PR #1299: URL: https://github.com/apache/datafusion-ballista/pull/1299

[PR] fix: RangePartitioning boundaries with native shuffle [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich opened a new pull request, #2258: URL: https://github.com/apache/datafusion-comet/pull/2258 ## Which issue does this PR close? Closes #1906. ## Rationale for this change #1862 tried to implement RangePartitioning with native shuffle. The implem

Re: [PR] Add `TableProvider::scan_with_args` to support pushdown sorting [datafusion]

2025-08-28 Thread via GitHub
adriangb commented on PR #17273: URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3234115401 I've made an epic to track this work: https://github.com/apache/datafusion/issues/17348

[I] [EPIC] Sort order pushdown [datafusion]

2025-08-28 Thread via GitHub
adriangb opened a new issue, #17348: URL: https://github.com/apache/datafusion/issues/17348 **Background** We already have an excellent blog post that I think does a great introduction to the importance of optimizing sort order information in queries: https://datafusion.apache.org/bl

Re: [PR] fix: Fall back to `native_comet` for encrypted Parquet scans [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on code in PR #2250: URL: https://github.com/apache/datafusion-comet/pull/2250#discussion_r2307885326 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -131,11 +131,19 @@ case class CometScanRule(session: SparkSession) extends Rule[Sp

Re: [PR] fix: split `expr.proto` file [datafusion-comet]

2025-08-28 Thread via GitHub
comphead commented on PR #2046: URL: https://github.com/apache/datafusion-comet/pull/2046#issuecomment-3234065474 Thanks @kination and @andygrove, for the proto we should probably take one step at a time. Let's try to move all datatype-related structures into the `types.proto` and make the co

[I] Optimizer should push limits past window functions [datafusion]

2025-08-28 Thread via GitHub
avantgardnerio opened a new issue, #17346: URL: https://github.com/apache/datafusion/issues/17346 ### Is your feature request related to a problem or challenge? Window functions are cardinality preserving, they just might need to see more data than the limits applied after them, so we

Re: [PR] chore: fix struct to string test for `native_iceberg_compat` [datafusion-comet]

2025-08-28 Thread via GitHub
comphead merged PR #2253: URL: https://github.com/apache/datafusion-comet/pull/2253

Re: [I] `cast StructType to StringType` test fails if default scan is `auto` [datafusion-comet]

2025-08-28 Thread via GitHub
comphead closed issue #2175: `cast StructType to StringType` test fails if default scan is `auto` URL: https://github.com/apache/datafusion-comet/issues/2175

Re: [PR] fix: Fall back to `native_comet` when object store not supported by `native_iceberg_compat` [datafusion-comet]

2025-08-28 Thread via GitHub
comphead commented on code in PR #2251: URL: https://github.com/apache/datafusion-comet/pull/2251#discussion_r2307830288 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -288,8 +292,25 @@ case class CometScanRule(session: SparkSession) extends Rule[Spar

Re: [PR] fix: Fall back to `native_comet` when object store not supported by `native_iceberg_compat` [datafusion-comet]

2025-08-28 Thread via GitHub
comphead commented on code in PR #2251: URL: https://github.com/apache/datafusion-comet/pull/2251#discussion_r2307826742 ## native/core/src/parquet/mod.rs: ## @@ -686,9 +715,9 @@ pub unsafe extern "system" fn Java_org_apache_comet_parquet_Native_initRecordBat ) -> jlong {

Re: [PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsighed [datafusion]

2025-08-28 Thread via GitHub
chenkovsky commented on code in PR #17013: URL: https://github.com/apache/datafusion/pull/17013#discussion_r2307499989 ## datafusion/spark/src/function/bitwise/bit_shift.rs: ## @@ -0,0 +1,678 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] fix: Fall back to `native_comet` for encrypted Parquet scans [datafusion-comet]

2025-08-28 Thread via GitHub
comphead commented on code in PR #2250: URL: https://github.com/apache/datafusion-comet/pull/2250#discussion_r2307746115 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -131,11 +131,19 @@ case class CometScanRule(session: SparkSession) extends Rule[Spa

Re: [PR] chore: Refactor serde for more array and struct expressions [datafusion-comet]

2025-08-28 Thread via GitHub
codecov-commenter commented on PR #2257: URL: https://github.com/apache/datafusion-comet/pull/2257#issuecomment-3233973612 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/2257?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] fix: Fall back to `native_comet` for encrypted Parquet scans [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on code in PR #2250: URL: https://github.com/apache/datafusion-comet/pull/2250#discussion_r2307754474 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -131,11 +131,19 @@ case class CometScanRule(session: SparkSession) extends Rule[Sp

[PR] Allow wilrdacrd for all `SEMANTIC_VIEW` types [datafusion-sqlparser-rs]

2025-08-28 Thread via GitHub
bombsimon opened a new pull request, #2017: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2017 Followup on #2016, I didn't update all the types. `DIMENSIONS` also support wildcard, from [the docs](https://docs.snowflake.com/en/sql-reference/constructs/semantic_view) > T

Re: [I] chore: Fix Scala code warnings [datafusion-comet]

2025-08-28 Thread via GitHub
akupchinskiy commented on issue #2255: URL: https://github.com/apache/datafusion-comet/issues/2255#issuecomment-3233941579 I can work on that

Re: [PR] fix: split `expr.proto` file [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on PR #2046: URL: https://github.com/apache/datafusion-comet/pull/2046#issuecomment-3233855052 > Sorry I couldn't reproduce PR build failure in local(macOS). Could somebody give me some tip to check these? @kination I pulled these changes locally and ran `make cle

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove closed pull request #2026: chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde URL: https://github.com/apache/datafusion-comet/pull/2026

Re: [PR] chore: Refactor GetArrayItem, ElementAt, GetArrayStructFields out of QueryPlanSerde [datafusion-comet]

2025-08-28 Thread via GitHub
andygrove commented on PR #2026: URL: https://github.com/apache/datafusion-comet/pull/2026#issuecomment-3233843793 I'll go ahead and close this. I started https://github.com/apache/datafusion-comet/pull/2257 to refactor these expressions.

Re: [PR] chore: Introduce `strict-warning` profile for Scala [datafusion-comet]

2025-08-28 Thread via GitHub
comphead merged PR #2254: URL: https://github.com/apache/datafusion-comet/pull/2254

Re: [I] External sort failing with non-spillable operators as input (RepartitionExec) [datafusion]

2025-08-28 Thread via GitHub
16pierre commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-3233816123 Thanks a lot for the quick feedback! I'm a bit stretched on dev bandwidth as we're still in the early phases of our DataFusion production integration, but will try my best to

Re: [PR] feat(spark): implement Spark `width_bucket` function [datafusion]

2025-08-28 Thread via GitHub
davidlghellin commented on code in PR #17331: URL: https://github.com/apache/datafusion/pull/17331#discussion_r2307652098 ## datafusion/spark/src/function/math/width_bucket.rs: ## @@ -0,0 +1,506 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

[PR] feat: allow passing a slice to and expression with the [] indexing [datafusion-python]

2025-08-28 Thread via GitHub
timsaucer opened a new pull request, #1215: URL: https://github.com/apache/datafusion-python/pull/1215 # Which issue does this PR close? This is a user request but no issue was opened. # Rationale for this change With this you can do slicing of arrays via expression. For

Re: [PR] Redesign ownership model between `FileScanConfig` and `FileSource`s [datafusion]

2025-08-28 Thread via GitHub
friendlymatthew commented on PR #17242: URL: https://github.com/apache/datafusion/pull/17242#issuecomment-3233675439 > It would be better to have a high-level diagram to describe the current relationship. Fwiw, this change involves removing the `FileScanConfig` node and moving it in

Re: [PR] Redesign ownership model between `FileScanConfig` and `FileSource`s [datafusion]

2025-08-28 Thread via GitHub
friendlymatthew commented on PR #17242: URL: https://github.com/apache/datafusion/pull/17242#issuecomment-3233489562 > It would be better to have a high-level diagram to describe the current relationship. Hi, sure happy to do that. Do you have a link to that diagram? I'd like to make

Re: [I] Ballista client keep blocking when prepare_task_definition or prepare_multi_task_definition fail [datafusion-ballista]

2025-08-28 Thread via GitHub
andygrove closed issue #1214: Ballista client keep blocking when prepare_task_definition or prepare_multi_task_definition fail URL: https://github.com/apache/datafusion-ballista/issues/1214

Re: [PR] fix: fail job in case of serde error (pull-mode) [datafusion-ballista]

2025-08-28 Thread via GitHub
andygrove merged PR #1297: URL: https://github.com/apache/datafusion-ballista/pull/1297

Re: [PR] chore: update datafusion to 49.0.2 [datafusion-ballista]

2025-08-28 Thread via GitHub
andygrove merged PR #1298: URL: https://github.com/apache/datafusion-ballista/pull/1298

Re: [PR] fix EquivalenceProperties calculation in DataSourceExec [datafusion]

2025-08-28 Thread via GitHub
adriangb merged PR #17323: URL: https://github.com/apache/datafusion/pull/17323

Re: [PR] fix EquivalenceProperties calculation in DataSourceExec [datafusion]

2025-08-28 Thread via GitHub
adriangb commented on PR #17323: URL: https://github.com/apache/datafusion/pull/17323#issuecomment-3233367866 I think we should apply this first, especially since we can pretty easily put it out as a hot fix, there are no or very minimal breaking changes in this PR versus the other one is g

Re: [PR] Redesign ownership model between `FileScanConfig` and `FileSource`s [datafusion]

2025-08-28 Thread via GitHub
xudong963 commented on PR #17242: URL: https://github.com/apache/datafusion/pull/17242#issuecomment-3233360860 It would be better to have a high-level diagram to describe the current relationship. Before the PR, it looks like https://github.com/user-attachments/assets/1b77f071-145a-49eb-9

Re: [I] External sort failing with non-spillable operators as input (RepartitionExec) [datafusion]

2025-08-28 Thread via GitHub
2010YOUY01 commented on issue #17334: URL: https://github.com/apache/datafusion/issues/17334#issuecomment-326093 The current `FairSpillPool` implementation seems problematic. See its behavior: https://github.com/apache/datafusion/blob/5021b397b1e63277b217dd3f8111b64b3458d484/datafusion

Re: [PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsighed [datafusion]

2025-08-28 Thread via GitHub
Jefffrey commented on code in PR #17013: URL: https://github.com/apache/datafusion/pull/17013#discussion_r2307154278 ## datafusion/spark/src/function/bitwise/bit_shift.rs: ## @@ -0,0 +1,678 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] fix: Remove unreachable code in `CometScanRule` [datafusion-comet]

2025-08-28 Thread via GitHub
mbutrovich merged PR #2252: URL: https://github.com/apache/datafusion-comet/pull/2252

Re: [PR] chore(deps): bump apache-avro from 0.17.0 to 0.18.0 [datafusion]

2025-08-28 Thread via GitHub
Jefffrey commented on PR #16092: URL: https://github.com/apache/datafusion/pull/16092#issuecomment-3232117851 @dependabot recreate

Re: [PR] chore(deps): bump libmimalloc-sys from 0.1.43 to 0.1.44 [datafusion]

2025-08-28 Thread via GitHub
Jefffrey merged PR #17343: URL: https://github.com/apache/datafusion/pull/17343

Re: [PR] Redshift: UNLOAD [datafusion-sqlparser-rs]

2025-08-28 Thread via GitHub
iffyio commented on code in PR #2013: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2013#discussion_r2306975457 ## src/ast/mod.rs: ## @@ -8796,16 +8831,46 @@ pub enum CopyLegacyOption { Delimiter(char), /// EMPTYASNULL EmptyAsNull, +/// ENCRYPTE

Re: [PR] chore(deps): bump actions/checkout from 4.2.2 to 5.0.0 [datafusion]

2025-08-28 Thread via GitHub
Jefffrey merged PR #17345: URL: https://github.com/apache/datafusion/pull/17345

Re: [PR] chore: avoid very cheap copy in `SchemaMapping` [datafusion]

2025-08-28 Thread via GitHub
Jefffrey merged PR #17344: URL: https://github.com/apache/datafusion/pull/17344

Re: [PR] Support wildcard metrics for `SEMANTIC_VIEW` [datafusion-sqlparser-rs]

2025-08-28 Thread via GitHub
iffyio merged PR #2016: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/2016

[PR] chore(deps): bump libmimalloc-sys from 0.1.43 to 0.1.44 [datafusion]

2025-08-28 Thread via GitHub
dependabot[bot] opened a new pull request, #17343: URL: https://github.com/apache/datafusion/pull/17343 Bumps [libmimalloc-sys](https://github.com/purpleprotocol/mimalloc_rust) from 0.1.43 to 0.1.44. Release notes Sourced from https://github.com/purpleprotocol/mimalloc_rust/release

[PR] chore: avoid very cheap copy in `SchemaMapping` [datafusion]

2025-08-28 Thread via GitHub
rluvaton opened a new pull request, #17344: URL: https://github.com/apache/datafusion/pull/17344 ## Which issue does this PR close? None ## Rationale for this change the copy is very cheap and maybe optimized away but nonetheless ## What changes are included in thi

[PR] chore(deps): bump actions/checkout from 4.2.2 to 5.0.0 [datafusion]

2025-08-28 Thread via GitHub
dependabot[bot] opened a new pull request, #17345: URL: https://github.com/apache/datafusion/pull/17345 Bumps [actions/checkout](https://github.com/actions/checkout) from 4.2.2 to 5.0.0. Release notes Sourced from https://github.com/actions/checkout/releases actions/checkout's r

Re: [PR] optimizer: Rewrite `IS NOT DISTINCT FROM` joins as Hash Joins [datafusion]

2025-08-28 Thread via GitHub
2010YOUY01 commented on code in PR #17319: URL: https://github.com/apache/datafusion/pull/17319#discussion_r2306871250 ## datafusion/optimizer/src/extract_equijoin_predicate.rs: ## @@ -112,22 +151,82 @@ impl OptimizerRule for ExtractEquijoinPredicate { } } +/// Splits an

Re: [PR] Fix incorrect memory accounting for sliced `StringViewArray` [datafusion]

2025-08-28 Thread via GitHub
2010YOUY01 commented on code in PR #17315: URL: https://github.com/apache/datafusion/pull/17315#discussion_r2306863053 ## datafusion/physical-plan/src/spill/spill_manager.rs: ## @@ -194,7 +195,84 @@ impl GetSlicedSize for RecordBatch { for array in self.columns() {

Re: [PR] feat: make sql an optional feature [datafusion]

2025-08-28 Thread via GitHub
crepererum commented on code in PR #17332: URL: https://github.com/apache/datafusion/pull/17332#discussion_r2306804318 ## .github/workflows/rust.yml: ## @@ -579,7 +579,7 @@ jobs: with: rust-version: stable - name: Run datafusion-common tests -r

Re: [PR] feat: make sql an optional feature [datafusion]

2025-08-28 Thread via GitHub
crepererum commented on PR #17332: URL: https://github.com/apache/datafusion/pull/17332#issuecomment-3232662859 I've adjusted the description because I think that also closes #17258 :partying_face:

Re: [PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsighed [datafusion]

2025-08-28 Thread via GitHub
chenkovsky commented on code in PR #17013: URL: https://github.com/apache/datafusion/pull/17013#discussion_r2306633810 ## datafusion/spark/src/function/bitwise/bit_shift.rs: ## @@ -0,0 +1,678 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [I] Release DataFusion `50.0.0` (Aug/Sep 2025) [datafusion]

2025-08-28 Thread via GitHub
shehabgamin commented on issue #16799: URL: https://github.com/apache/datafusion/issues/16799#issuecomment-3232376292 The DataFusion 50 upgrade may be challenging for many people. The Sail update currently involves more than 100 file changes, and I am still not finished. https://github.c

Re: [PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsighed [datafusion]

2025-08-28 Thread via GitHub
Jefffrey commented on code in PR #17013: URL: https://github.com/apache/datafusion/pull/17013#discussion_r2306554329 ## datafusion/spark/src/function/bitwise/bit_shift.rs: ## @@ -0,0 +1,678 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [PR] feat(spark): implement Spark bitwise function shiftleft/shiftright/shiftrightunsighed [datafusion]

2025-08-28 Thread via GitHub
chenkovsky commented on code in PR #17013: URL: https://github.com/apache/datafusion/pull/17013#discussion_r2306428541 ## datafusion/spark/src/function/bitwise/bit_shift.rs: ## @@ -0,0 +1,678 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] chore(deps): bump apache-avro from 0.17.0 to 0.18.0 [datafusion]

2025-08-28 Thread via GitHub
Jefffrey commented on PR #16092: URL: https://github.com/apache/datafusion/pull/16092#issuecomment-3232297447 fyi @alamb
