Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
kosiew commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2969200984 @drtconway You're welcome. Can you close this issue? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
drtconway commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2969217442 Sure!

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
drtconway closed issue #16366: row-wise min and max URL: https://github.com/apache/datafusion/issues/16366

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144282145 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
drtconway commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2969144940 Ah yes! Thank you! But they're not in the Rust part of the documentation (https://datafusion.apache.org/user-guide/expressions.html), which is why I didn't know they we

Re: [PR] POC: Reduce `Arc` cloning on hashmap build side [datafusion]

2025-06-12 Thread via GitHub
Dandandan commented on PR #16380: URL: https://github.com/apache/datafusion/pull/16380#issuecomment-2969139729 > I've noticed that it is possible for `interleave` to perform worse than `take` despite the `Arc` clones from `take`. This happens twice as well for `equal_row_arr` and `build_bat

Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n closed pull request #16390: fix: Remove `null_equals_null` todo in `NestedLoopJoin` URL: https://github.com/apache/datafusion/pull/16390

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144201354 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

Re: [I] RFC: What 3 level naming system should we use for catalog providers? [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1142: URL: https://github.com/apache/datafusion-python/issues/1142#issuecomment-2969114420 ## Context & Problem Statement - **Current state** - **Datafusion core repo:** uses `catalog/schema/table` - **Datafusion Python repo:** uses `ca

Re: [I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1144: URL: https://github.com/apache/datafusion-python/issues/1144#issuecomment-2969068899 hi @timsaucer I can see some errors when I pytest _test_window_udf.py in #1145 but I want to be sure it is the same error as you are reporting in this issue. Ca

Re: [PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on PR #16390: URL: https://github.com/apache/datafusion/pull/16390#issuecomment-2969058654 > however @UBarney was able to point out that a on clause already included in the join filter 🤦. I mean that a non-equal condition (e.g. `<=>`) in `on` will be included in the joi

Re: [I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1144: URL: https://github.com/apache/datafusion-python/issues/1144#issuecomment-2969030143 Never mind, I got the file from #1145

Re: [I] Panic in FFI UDWF when using wrapping lead function [datafusion-python]

2025-06-12 Thread via GitHub
kosiew commented on issue #1144: URL: https://github.com/apache/datafusion-python/issues/1144#issuecomment-2969025577 hi @timsaucer , Can you share the `examples/datafusion-ffi-example/src/window_udf.rs`? I don't believe it's in `main` yet.

[PR] fix: Remove `null_equals_null` todo in `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n opened a new pull request, #16390: URL: https://github.com/apache/datafusion/pull/16390 ## Which issue does this PR close? - Closes #. ## Rationale for this change I had created #16210 to add `null_equals_null` join support however @UBarney was able to po

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
kosiew commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2968970691 ```sql DataFusion CLI v48.0.0 > -- Define sample data CREATE TABLE t1 (a INT, b INT, c INT) AS VALUES (4, NULL, NULL), (1, 2, 3), (3, 1, 2), (1, NULL,

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144169706 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144167426 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [I] row-wise min and max [datafusion]

2025-06-12 Thread via GitHub
kosiew commented on issue #16366: URL: https://github.com/apache/datafusion/issues/16366#issuecomment-2968958912 hi @drtconway , pmin sounds a lot like https://datafusion.apache.org/user-guide/sql/scalar_functions.html#least and pmax like https://datafusion.apache.org/use

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144121144 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2144133973 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2968831295 > Made some progress on the problem statement already. I gave the AI the facts, it turned it into something I would actually enjoy reading. I'm going to work on the way thin

Re: [PR] Minor: add testing case for add YieldStreamExec and polish docs [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on code in PR #16369: URL: https://github.com/apache/datafusion/pull/16369#discussion_r2144101877 ## datafusion/physical-optimizer/Cargo.toml: ## @@ -49,6 +49,7 @@ datafusion-physical-plan = { workspace = true } itertools = { workspace = true } log = { wo

Re: [I] Optimize `NestedLoopJoinExec` Memory Usage [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on issue #16364: URL: https://github.com/apache/datafusion/issues/16364#issuecomment-2968778612 > limiting the intermediate result to ~1 batch size is enough to keep the performance. Do you mean we should also limit num_row of [`left_side, right_side`](http

Re: [PR] Use pager and allow configuration via `\pset` [datafusion]

2025-06-12 Thread via GitHub
github-actions[bot] commented on PR #15597: URL: https://github.com/apache/datafusion/pull/15597#issuecomment-2968768204 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [I] `datafusion-cli`: Use correct S3 region if it is not specified [datafusion]

2025-06-12 Thread via GitHub
liamzwbao commented on issue #16306: URL: https://github.com/apache/datafusion/issues/16306#issuecomment-2968741593 Hi @alamb, from the upstream ticket, I think we can use `resolve_bucket_region` to get the region if it's not specified. However, I'm wondering what should be the expect

Re: [I] `datafusion-cli`: Use correct S3 region if it is not specified [datafusion]

2025-06-12 Thread via GitHub
liamzwbao commented on issue #16306: URL: https://github.com/apache/datafusion/issues/16306#issuecomment-296858 take

Re: [PR] feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-06-12 Thread via GitHub
parthchandra commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2143805849 ## spark/src/main/scala/org/apache/comet/rules/CometScanRule.scala: ## @@ -105,8 +105,49 @@ case class CometScanRule(session: SparkSession) extends Rule[

Re: [PR] feat: Parquet modular encryption [datafusion]

2025-06-12 Thread via GitHub
corwinjoy commented on PR #16351: URL: https://github.com/apache/datafusion/pull/16351#issuecomment-2968354595 @alamb One piece I would like to solicit feedback on is if there is a way to leverage the existing tests to more thoroughly vet encryption. What I mean by that, is that we uncovere

Re: [I] Document semi join, anti semi join and more supported join types [datafusion]

2025-06-12 Thread via GitHub
alamb closed issue #16245: Document semi join, anti semi join and more supported join types URL: https://github.com/apache/datafusion/issues/16245

Re: [PR] doc: Add SQL examples for SEMI + ANTI Joins [datafusion]

2025-06-12 Thread via GitHub
alamb merged PR #16316: URL: https://github.com/apache/datafusion/pull/16316

Re: [PR] Chore: implement predicate exprs as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1864: URL: https://github.com/apache/datafusion-comet/pull/1864#discussion_r2143696076 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -952,32 +947,23 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Chore: implement predicate exprs as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1864: URL: https://github.com/apache/datafusion-comet/pull/1864#discussion_r2143694940 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -952,32 +947,23 @@ object QueryPlanSerde extends Logging with CometExprShim {

Re: [PR] Chore: implement hour func as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on PR #1874: URL: https://github.com/apache/datafusion-comet/pull/1874#issuecomment-2968155976 This is looking good so far! I know it's not your change, but it made me wonder: do you know if we have a test that exercises this code path? `"Hour(scalar) should be fold in

Re: [PR] feat: Add experimental auto mode for `COMET_PARQUET_SCAN_IMPL` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1747: URL: https://github.com/apache/datafusion-comet/pull/1747#discussion_r2143610758 ## .github/workflows/spark_sql_test_native_auto.yml: ## @@ -0,0 +1,71 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on PR #1870: URL: https://github.com/apache/datafusion-comet/pull/1870#issuecomment-2968083119 These changes already got merged in https://github.com/apache/datafusion-comet/pull/1869

Re: [PR] chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed pull request #1870: chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 URL: https://github.com/apache/datafusion-comet/pull/1870

[PR] Add fast paths for try_process_unnest [datafusion]

2025-06-12 Thread via GitHub
simonvandel opened a new pull request, #16389: URL: https://github.com/apache/datafusion/pull/16389 ## Which issue does this PR close? - Closes https://github.com/apache/datafusion/issues/16242 ## Rationale for this change Reduce planning work for unnest e

Re: [I] Fix failed Spark SQL tests due to shuffle enabled [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed issue #231: Fix failed Spark SQL tests due to shuffle enabled URL: https://github.com/apache/datafusion-comet/issues/231

Re: [PR] chore: Enable more Spark SQL tests [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove merged PR #1869: URL: https://github.com/apache/datafusion-comet/pull/1869

Re: [I] Utf8View and BinaryView (i.e., StringView in Arrow, colloquially German-style strings) support [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on issue #1403: URL: https://github.com/apache/datafusion-comet/issues/1403#issuecomment-2967992880 I'll be picking this up again now that we dropped Java 8 support and bumped our Arrow Java version.

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
adriangb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2967987004 Very excited about this!

Re: [PR] fix: cast_struct_to_struct aligns to Spark behavior [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1879: URL: https://github.com/apache/datafusion-comet/pull/1879#issuecomment-2967951510 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1879?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Add support for SMJ with RightSemi join [datafusion-comet]

2025-06-12 Thread via GitHub
dharanad commented on issue #1725: URL: https://github.com/apache/datafusion-comet/issues/1725#issuecomment-2967896852 @andygrove Unlike Datafusion, Spark does not natively support RightSemi join type. This presents a challenge, and I was hoping to get your thoughts on the best way to han

[PR] fix: cast_struct_to_struct aligns to Spark behavior [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich opened a new pull request, #1879: URL: https://github.com/apache/datafusion-comet/pull/1879 ## Which issue does this PR close? Closes #1875. ## Rationale for this change ## What changes are included in this PR? - `cast_struct_to_s

Re: [PR] feat: Implement `doCanonicalize` for `CometShuffleExchangeExec` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed pull request #1878: feat: Implement `doCanonicalize` for `CometShuffleExchangeExec` URL: https://github.com/apache/datafusion-comet/pull/1878

Re: [I] [Spark SQL] Fix InsertSuite failure when using native_iceberg_compat with Spark 3.4.3 [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on issue #1875: URL: https://github.com/apache/datafusion-comet/issues/1875#issuecomment-2967624631 Spark 3.4 and 3.5 handle struct conversion in this test case differently. 3.4 inserts a `cast` expression in the Project operator, while 3.5 used a `named_struct` expres

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143240105 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::prett

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2143227797 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet,

Re: [PR] feat: Implement `doCanonicalize` for `CometShuffleExchangeExec` [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1878: URL: https://github.com/apache/datafusion-comet/pull/1878#issuecomment-2967573528 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1878?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] Spark SQL test failures in native_iceberg_compat mode [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove closed issue #1542: Spark SQL test failures in native_iceberg_compat mode URL: https://github.com/apache/datafusion-comet/issues/1542

Re: [PR] chore: Enable Spark SQL tests for `native_iceberg_compat` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove merged PR #1876: URL: https://github.com/apache/datafusion-comet/pull/1876

[PR] feat: Implement doCanonicilize for CometShuffleExchangeExec [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove opened a new pull request, #1878: URL: https://github.com/apache/datafusion-comet/pull/1878 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] [Spark SQL] Enable all tests in DynamicPartitionPruningSuite [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on issue #1839: URL: https://github.com/apache/datafusion-comet/issues/1839#issuecomment-296748 The "canonicalization and exchange reuse" test is expected to fail and should be ignored until Comet supports DPP. The two exchanges are different. One contains a `CometSca

Re: [I] [Spark SQL] Enable all tests in DynamicPartitionPruningSuite [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on issue #1839: URL: https://github.com/apache/datafusion-comet/issues/1839#issuecomment-2967483655 All of these tests will need to be ignored until we support DPP (cc @coderfender). This is not a correctness issue but a performance issue due to not reusing exchanges.

Re: [PR] Migrate core test to insta, part1 [datafusion]

2025-06-12 Thread via GitHub
Chen-Yuan-Lai commented on code in PR #16324: URL: https://github.com/apache/datafusion/pull/16324#discussion_r2143152053 ## datafusion/core/tests/sql/explain_analyze.rs: ## @@ -52,43 +54,45 @@ async fn explain_analyze_baseline_metrics() { let formatted = arrow::util::prett

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-12 Thread via GitHub
kevinjqliu commented on PR #74: URL: https://github.com/apache/datafusion-site/pull/74#issuecomment-2967429852 Also a nit: any link with `https://github.com/apache/datafusion/blob/main/` runs the risk of going stale later. For example, if a file path was moved to a different l

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-12 Thread via GitHub
kevinjqliu commented on code in PR #74: URL: https://github.com/apache/datafusion-site/pull/74#discussion_r2143137504 ## content/blog/2025-06-15-optimizing-sql-dataframes-part-two.md: ## @@ -0,0 +1,533 @@ +--- +layout: post +title: Optimizing SQL (and DataFrames) in DataFusion,

Re: [PR] Blog: Optimizing SQL and DataFrames [datafusion-site]

2025-06-12 Thread via GitHub
kevinjqliu commented on code in PR #74: URL: https://github.com/apache/datafusion-site/pull/74#discussion_r2143107153 ## content/blog/2025-06-15-optimizing-sql-dataframes-part-one.md: ## @@ -0,0 +1,250 @@ +--- +layout: post +title: Optimizing SQL (and DataFrames) in DataFusion,

Re: [PR] Minor: add testing case for add YieldStreamExec and polish docs [datafusion]

2025-06-12 Thread via GitHub
simonvandel commented on code in PR #16369: URL: https://github.com/apache/datafusion/pull/16369#discussion_r2143061438 ## datafusion/physical-optimizer/Cargo.toml: ## @@ -49,6 +49,7 @@ datafusion-physical-plan = { workspace = true } itertools = { workspace = true } log = { wo

[PR] Extend exception handling [datafusion-sqlparser-rs]

2025-06-12 Thread via GitHub
bombsimon opened a new pull request, #1884: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1884 - Exception handling can handle multiple `WHEN` arms - Exception can re-raise with `RAISE` keyword - Snowflake can now also parse `BEGIN ... EXCEPTION ... END` Example:

Re: [I] Optimize `NestedLoopJoinExec` Memory Usage [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on issue #16364: URL: https://github.com/apache/datafusion/issues/16364#issuecomment-2967225093 > limiting the intermediate result to ~1 batch size is enough to keep the performance. Do you mean we should also limit num_row of [`left_side, right_side`](http

Re: [I] [Epic] Pipeline breaking cancellation support and improvement [datafusion]

2025-06-12 Thread via GitHub
pepijnve commented on issue #16353: URL: https://github.com/apache/datafusion/issues/16353#issuecomment-2967172675 Made some progress on the problem statement already. I gave the AI the facts, it turned it into something I would actually enjoy reading. I'm going to work on the way things wo

Re: [PR] feat: Support null aware + equijoins for `NestedLoopJoin` [datafusion]

2025-06-12 Thread via GitHub
UBarney commented on code in PR #16210: URL: https://github.com/apache/datafusion/pull/16210#discussion_r2142968386 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -178,6 +187,18 @@ pub struct NestedLoopJoinExec { metrics: ExecutionPlanMetricsSet, ///

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
alamb commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2967050602 🎉

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
alamb closed issue #16383: Can't publish datafusion-spark crate due to error URL: https://github.com/apache/datafusion/issues/16383

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
xudong963 commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2967031389 Nice, just published datafusion-sqllogictest by fixing the files locally

[PR] Simplify expressions passed to table functions [datafusion]

2025-06-12 Thread via GitHub
simonvandel opened a new pull request, #16388: URL: https://github.com/apache/datafusion/pull/16388 ## Which issue does this PR close? Fixes https://github.com/apache/datafusion/issues/14958 ## Rationale for this change Table functions don't need to special case `

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-12 Thread via GitHub
pepijnve commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2966838912 I don't think it's necessary TBH. I applied this patch (which I think is what @berkaysynnada meant) and the test then fails in the way it's intended to. ``` Index: datafusi

Re: [PR] feat: Upgrade to official DataFusion 48.0.0 release [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1877: URL: https://github.com/apache/datafusion-comet/pull/1877#issuecomment-2966837862 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1877?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [I] [Spark SQL] Fix InsertSuite failure when using native_iceberg_compat with Spark 3.4.3 [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on issue #1875: URL: https://github.com/apache/datafusion-comet/issues/1875#issuecomment-2966682640 > [@mbutrovich](https://github.com/mbutrovich) Could you work on this one? Sounds good!

Re: [I] [Spark SQL] Fix InsertSuite failure when using native_iceberg_compat with Spark 3.4.3 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on issue #1875: URL: https://github.com/apache/datafusion-comet/issues/1875#issuecomment-2966675923 @mbutrovich Could you work on this one?

[PR] chore: Enable Spark SQL tests for `native_iceberg_compat` [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove opened a new pull request, #1876: URL: https://github.com/apache/datafusion-comet/pull/1876 ## Which issue does this PR close? Part of https://github.com/apache/datafusion-comet/issues/1542 ## Rationale for this change Disable the one remaining f

Re: [PR] chore: Enable Spark SQL tests for `native_iceberg_compat` [datafusion-comet]

2025-06-12 Thread via GitHub
codecov-commenter commented on PR #1876: URL: https://github.com/apache/datafusion-comet/pull/1876#issuecomment-2966738000 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1876?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] chore: Stop Running Spark SQL tests for Spark 3.5.4 and 3.5.5 [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove commented on code in PR #1870: URL: https://github.com/apache/datafusion-comet/pull/1870#discussion_r2142666410 ## .github/workflows/pr_build_linux.yml: ## @@ -74,14 +74,14 @@ jobs: maven_opts: "-Pspark-3.4 -Pscala-2.12" scan_impl: "native_com

[PR] feat: Add support for glob patterns in CREATE EXTERNAL TABLE commands [datafusion]

2025-06-12 Thread via GitHub
a-agmon opened a new pull request, #16387: URL: https://github.com/apache/datafusion/pull/16387 Partly closes #16303 The purpose of this PR is to enable using CREATE command with glob pattern and a URL scheme - i.e., ``` CREATE EXTERNAL TABLE ee3 STORED AS CSV LOCATION

[PR] feat: Upgrade to official DataFusion 48.0.0 release [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove opened a new pull request, #1877: URL: https://github.com/apache/datafusion-comet/pull/1877 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [I] Can't publish datafusion-spark crate due to error [datafusion]

2025-06-12 Thread via GitHub
alamb commented on issue #16383: URL: https://github.com/apache/datafusion/issues/16383#issuecomment-2966455712 I manually updated the cargo file (applied the patch above) and ran `cargo publish` to get this to publish at 48.0.0: - https://crates.io/crates/datafusion-spark/48.0.0

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
alamb commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966448392 Nice @zhuqi-lucas -- BTW I am not sure how easy it will be to use the parquet APIs to do this (specifically write arbitrary bytes to the inner writer) so it may take some fiddlin

Re: [PR] Minor: add testing case for add YieldStreamExec and polish docs [datafusion]

2025-06-12 Thread via GitHub
alamb commented on code in PR #16369: URL: https://github.com/apache/datafusion/pull/16369#discussion_r2142595175 ## datafusion/physical-optimizer/src/insert_yield_exec.rs: ## @@ -32,9 +34,10 @@ use datafusion_physical_plan::yield_stream::YieldStreamExec; use datafusion_physica

Re: [I] Enable merge queue in github to avoid commit confliction. [datafusion]

2025-06-12 Thread via GitHub
crepererum commented on issue #6880: URL: https://github.com/apache/datafusion/issues/6880#issuecomment-2966444088 > Based on ASF Slack, I believe MQ aren't currently supported in `.asf.yaml` because there's no API support ([github.com/orgs/community/discussions/50893](https://github.com/or

Re: [PR] Disable `datafusion-cli` tests for hash_collision tests, fix extended CI [datafusion]

2025-06-12 Thread via GitHub
alamb merged PR #16382: URL: https://github.com/apache/datafusion/pull/16382

[PR] Improved experience when remote object store URL does not end in / [datafusion]

2025-06-12 Thread via GitHub
xiedeyantu opened a new pull request, #16386: URL: https://github.com/apache/datafusion/pull/16386 ## Which issue does this PR close? - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] Support reading multiple parquet files via `datafusion-cli` [datafusion]

2025-06-12 Thread via GitHub
a-agmon commented on issue #16303: URL: https://github.com/apache/datafusion/issues/16303#issuecomment-293348 Hi @comphead and @alamb I thought it might be a good idea to split this issue to several PRs 1 - add the support to use `CREATE TABLE` syntax with glob patterns and remot

Re: [PR] bug: remove busy-wait while sort is ongoing [datafusion]

2025-06-12 Thread via GitHub
alamb commented on PR #16322: URL: https://github.com/apache/datafusion/pull/16322#issuecomment-2966502775 Is there an additional test we should write perhaps, to add the coverage @berkaysynnada suggests?

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966416266 take

Re: [PR] Chore: implement hour func as ScalarUDFImpl [datafusion-comet]

2025-06-12 Thread via GitHub
mbutrovich commented on PR #1874: URL: https://github.com/apache/datafusion-comet/pull/1874#issuecomment-2966636271 Thanks for the contribution, @trompa! I'll take a look through this later today.

[I] [Spark SQL] Fix InsertSuite failure when using native_iceberg_compat [datafusion-comet]

2025-06-12 Thread via GitHub
andygrove opened a new issue, #1875: URL: https://github.com/apache/datafusion-comet/issues/1875 ### Describe the bug ``` 2025-06-12T03:28:33.2846280Z [info] - INSERT INTO TABLE - complex type but different names *** FAILED *** (223 milliseconds) 2025-06-12T03:28:33.3248391Z

Re: [I] Add an example of embedding indexes *inside* a parquet file [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on issue #16374: URL: https://github.com/apache/datafusion/issues/16374#issuecomment-2966419013 I am interested in this, and I want to become familiar with embedding indexes.

Re: [PR] Minor: add testing case for add YieldStreamExec and polish docs [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on code in PR #16369: URL: https://github.com/apache/datafusion/pull/16369#discussion_r2142643694 ## datafusion/physical-optimizer/src/insert_yield_exec.rs: ## @@ -92,3 +96,23 @@ impl PhysicalOptimizerRule for InsertYieldExec { true } } + +#[

Re: [PR] Disable `datafusion-cli` tests for hash_collision tests, fix extended CI [datafusion]

2025-06-12 Thread via GitHub
alamb commented on PR #16382: URL: https://github.com/apache/datafusion/pull/16382#issuecomment-2966480185 > fair point! Thank you for agreeing. I have to admit it seems somewhat like cheating, but I still think it is true 😆

Re: [PR] feat: Support RightMark join for NestedLoop and Hash join [datafusion]

2025-06-12 Thread via GitHub
jonathanc-n commented on code in PR #16083: URL: https://github.com/apache/datafusion/pull/16083#discussion_r2142033043 ## datafusion/physical-plan/src/joins/utils.rs: ## @@ -1102,6 +1129,30 @@ where .collect() } +pub(crate) fn get_mark_indices( +range: &Range, +

Re: [PR] Minor: add testing case for add YieldStreamExec and polish docs [datafusion]

2025-06-12 Thread via GitHub
zhuqi-lucas commented on code in PR #16369: URL: https://github.com/apache/datafusion/pull/16369#discussion_r2142611695 ## datafusion/physical-optimizer/src/insert_yield_exec.rs: ## @@ -15,10 +15,12 @@ // specific language governing permissions and limitations // under the Lic
