Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-14 Thread via GitHub
Weijun-H commented on PR #14411: URL: https://github.com/apache/datafusion/pull/14411#issuecomment-2660797386 > I ran some tests yesterday and I can confirm the runtime improvements. I do get some high memory usage however especially with some queries (TPC-H Query 18 I believe) than when us

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
Kontinuation commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660794608 > Edge case: let's say input is a deduplicated `StringViewArray` (like a 10k rows batch with only 100 distinct values, but payload content are stored without duplication, the ar

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
2010YOUY01 commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660783992 > Here is the detailed explanation: > > 1. We reserve 2X memory for each batch on insertion, so when `in_mem_batches` holds 5 batches and consumes 50 MB memory, we have alre

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957058065 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -641,7 +702,15 @@ pub fn sort_batch( lexsort_to_indices(&sort_columns, fetch)? }; -let

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957057990 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -302,31 +299,16 @@ impl ExternalSorter { } self.reserve_memory_for_merge()?; -

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957056599 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -641,7 +702,15 @@ pub fn sort_batch( lexsort_to_indices(&sort_columns, fetch)? }; -let

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
Kontinuation commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957056835 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -641,7 +702,15 @@ pub fn sort_batch( lexsort_to_indices(&sort_columns, fetch)? }; -le

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957054182 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -641,7 +702,15 @@ pub fn sort_batch( lexsort_to_indices(&sort_columns, fetch)? }; -let

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957055119 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -408,50 +395,100 @@ impl ExternalSorter { debug!("Spilling sort data of ExternalSorter to disk

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
Kontinuation commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957054215 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -302,31 +299,16 @@ impl ExternalSorter { } self.reserve_memory_for_merge()?; -

[I] Feature request: hermetic build [datafusion]

2025-02-14 Thread via GitHub
dentiny opened a new issue, #14678: URL: https://github.com/apache/datafusion/issues/14678 ### Is your feature request related to a problem or challenge? When I was building datafusion for the first time (with `cargo test`), I met an error: ``` error: failed to run custom build

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660771854 > > Thank you @kazuyukitanimura for the PR, i applied the PR try to fix the testing, but the above testing is still failed for me, i am not sure if i am missing something. >

Re: [PR] feat: Support On-Demand Repartition [datafusion]

2025-02-14 Thread via GitHub
Weijun-H commented on code in PR #14411: URL: https://github.com/apache/datafusion/pull/14411#discussion_r1957051847 ## datafusion/physical-plan/src/repartition/on_demand_repartition.rs: ## @@ -0,0 +1,1589 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on code in PR #14644: URL: https://github.com/apache/datafusion/pull/14644#discussion_r1957051337 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -302,31 +299,16 @@ impl ExternalSorter { } self.reserve_memory_for_merge()?; -

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660767316 > > Thank you @kazuyukitanimura for the PR, i applied the PR try to fix the testing, but the above testing is still failed for me, i am not sure if i am missing something. >

Re: [PR] Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1957023902 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -126,29 +126,65 @@ fn update_sort_ctx_children( /// [`CoalescePartitionsExec`] descendant(s) for

Re: [PR] Improve EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1957046184 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -84,42 +84,56 @@ impl EnforceSorting { } } -/// This object is used within the [`EnforceS

Re: [PR] Improve EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1957033990 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -151,10 +165,51 @@ fn update_coalesce_ctx_children( }; } -/// The boolean flag `repartiti

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
Kontinuation commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660748423 > Thank you @kazuyukitanimura for the PR, i applied the PR try to fix the testing, but the above testing is still failed for me, i am not sure if i am missing something.

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
Dandandan commented on code in PR #14675: URL: https://github.com/apache/datafusion/pull/14675#discussion_r1957035222 ## datafusion/functions/src/string/uuid.rs: ## @@ -87,7 +88,13 @@ impl ScalarUDFImpl for UuidFunc { if !args.is_empty() { return internal_e

Re: [PR] Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1957033043 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -138,27 +138,25 @@ fn is_coalesce_to_remove( node: &Arc, parent: &Arc, ) -> bool { -

Re: [PR] Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1957032310 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -138,27 +138,25 @@ fn is_coalesce_to_remove( node: &Arc, parent: &Arc, ) -> bool { -

Re: [PR] Demonstrate EnforceSorting can remove a needed coalesce [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1957024693 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -298,20 +355,11 @@ pub fn parallelize_sorts( vec![requirements], ),

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
zhuqi-lucas commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660707498 ```rust use arrow::array::{RecordBatch, StringBuilder}; use arrow_schema::{DataType, Field, Schema}; use datafusion::execution::disk_manager::DiskManagerConfig; use dat

[PR] feat: pretty explain [datafusion]

2025-02-14 Thread via GitHub
irenjj opened a new pull request, #14677: URL: https://github.com/apache/datafusion/pull/14677 ## Which issue does this PR close? - Part of #9371 ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660658972 > > What about `.github/actions/setup-spark-builder/action.yaml`? > > There is no need to override the default value since it gets replaced dynamically, but I filed an is

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1956991849 ## native/core/src/execution/planner.rs: ## @@ -1155,12 +1154,9 @@ impl PhysicalPlanner { )) }); -

Re: [PR] Signature::Coercible with user defined implicit casting [datafusion]

2025-02-14 Thread via GitHub
shehabgamin commented on PR #14440: URL: https://github.com/apache/datafusion/pull/14440#issuecomment-2660643722 @jayzhan211 I will re-review by tomorrow EOD, exciting progress! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
codecov-commenter commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660637358 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1405?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Reorganize the Parser module [datafusion-sqlparser-rs]

2025-02-14 Thread via GitHub
github-actions[bot] closed pull request #1581: Reorganize the Parser module URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1581

Re: [PR] PoC Adaptive round robin repartitioning [datafusion]

2025-02-14 Thread via GitHub
github-actions[bot] closed pull request #13699: PoC Adaptive round robin repartitioning URL: https://github.com/apache/datafusion/pull/13699

Re: [PR] Refactor signatures for lpad, rpad, left, and right [datafusion]

2025-02-14 Thread via GitHub
github-actions[bot] commented on PR #13420: URL: https://github.com/apache/datafusion/pull/13420#issuecomment-2660627372 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660616259 > What about `.github/actions/setup-spark-builder/action.yaml`? There is no need to override the default value since it gets replaced dynamically, but I filed an issue to

[I] Remove hard-coded Comet version numbers from GitHub actions [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove opened a new issue, #1406: URL: https://github.com/apache/datafusion-comet/issues/1406 ### What is the problem the feature request solves? We hard-code the current snapshot version in some GitHub actions. We should get the version number from the pom.xml instead to remove th

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660614850 > The title says `[branch-0.6]` but this PR is against `main`? Thanks. Updated.

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405#issuecomment-2660613861 The title says `[branch-0.6]` but this PR is against `main`?

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956971872 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

[PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove opened a new pull request, #1405: URL: https://github.com/apache/datafusion-comet/pull/1405 ## Which issue does this PR close? Fix incorrect Comet version in Spark diffs. ## Rationale for this change Fixing this just in case anyone wants to run S

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2660597523 Not directly related to the point of this PR but regarding ` I had a hard time making DataFusion Comet work on cloud instances with 4GB memory per CPU core, partially b

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660587673 Thanks @viirya @EmilyMatt Did you mean that we should be able to run Comet aggregation even Comet shuffle is disabled by > I believe I've seen a few

Re: [I] Attach `Diagnostic` to "more than one column in subquery" error [datafusion]

2025-02-14 Thread via GitHub
irenjj commented on issue #14438: URL: https://github.com/apache/datafusion/issues/14438#issuecomment-2660573715 > Hey [@irenjj](https://github.com/irenjj) how is it going with this ticket :) Can I help with anything? Hi, @eliaperantoni, Sorry for not updating my status for a long tim

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#issuecomment-2660574425 oops, @comphead would you mind merging the latest main into this PR branch in order to resolve the conflict?

Re: [PR] fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura merged PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-14 Thread via GitHub
viirya commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660573325 Because this? ``` // When Comet shuffle is disabled, we don't want to transform the HashAggregate // to CometHashAggregate. Otherwise, we probably get partial Co

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2660569970 Merged thanks @mbutrovich @parthchandra @comphead @andygrove

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura merged PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150

Re: [I] AQE may materialize a non-supported Final-mode HashAggregate [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on issue #1389: URL: https://github.com/apache/datafusion-comet/issues/1389#issuecomment-2660567872 cc @viirya I forgot why we did this in #991 https://github.com/apache/datafusion-comet/blob/f099e6e40aa18441c7882e5bffd9d6dfb10c6c19/spark/src/main/scala/or

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956946498 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comme

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
comphead commented on PR #14675: URL: https://github.com/apache/datafusion/pull/14675#issuecomment-2660554270 @simonvandel I'd like to ask you to create a slt test for UUID(), I know it is non guaranteed output, but we can check the v4 validity format I suppose.

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
comphead commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1956937048 ## native/core/src/execution/planner.rs: ## @@ -1155,12 +1154,9 @@ impl PhysicalPlanner { )) }); -let

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#issuecomment-2660546336 #1389 mentioned https://github.com/apache/datafusion-comet/blob/f099e6e40aa18441c7882e5bffd9d6dfb10c6c19/spark/src/main/scala/org/apache/comet/CometSparkSessionExtens

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2660530828 > > @mbutrovich Do you plan to overwrite spark/benchmarks/CometReadBenchmark-jdk11-results.txt ? > > > Thanks for running this benchmark @mbutrovich. Slowness in nati

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on code in PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#discussion_r1956903894 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,7 +37,7 @@ trait DataTypeSupport { private def isGloballySupported(dt: Data

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove merged PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1956879942 ## native/core/src/execution/planner.rs: ## @@ -1155,12 +1154,9 @@ impl PhysicalPlanner { )) }); -

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
comphead commented on code in PR #14675: URL: https://github.com/apache/datafusion/pull/14675#discussion_r1956877727 ## datafusion/functions/src/string/uuid.rs: ## @@ -87,7 +88,13 @@ impl ScalarUDFImpl for UuidFunc { if !args.is_empty() { return internal_er

Re: [PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
simonvandel commented on PR #14675: URL: https://github.com/apache/datafusion/pull/14675#issuecomment-2660484055 Oops, need to generate valid uuidv4

[PR] chore: adding Linkedin follow page [datafusion]

2025-02-14 Thread via GitHub
comphead opened a new pull request, #14676: URL: https://github.com/apache/datafusion/pull/14676 ## Which issue does this PR close? Related to #14389 - Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are th

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#discussion_r1956865360 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -216,6 +216,17 @@ object CometConf extends ShimCometConf { val COMET_EXEC_INITCAP_

Re: [PR] fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-14 Thread via GitHub
parthchandra commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2660476596 @kazuyukitanimura please go ahead and merge this.

Re: [PR] fix: AQE creating a non-supported Final HashAggregate post-shuffle [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1390: URL: https://github.com/apache/datafusion-comet/pull/1390#discussion_r1956799716 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -216,6 +216,17 @@ object CometConf extends ShimCometConf { val COMET_EXEC_INITCAP_

[PR] Speed up `uuid` UDF (20x faster) [datafusion]

2025-02-14 Thread via GitHub
simonvandel opened a new pull request, #14675: URL: https://github.com/apache/datafusion/pull/14675 ## Which issue does this PR close? N/A ## Rationale for this change It seems to be faster to generate random u128's in bulk, and then converting them to Uuids.
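
Editor's sketch of the idea in the preview above (draw random `u128` values in bulk, then convert each one into a UUID). This is not the code from PR #14675, which is not shown in this digest. It assumes only the standard library; the `XorShift64` generator and the helpers `to_uuid_v4_bits`, `format_uuid`, and `bulk_uuids` are hypothetical names introduced for illustration, and the generator is not cryptographically secure.

```rust
/// Hypothetical stand-in RNG (xorshift64*), used only so the sketch needs no crates.
/// Not cryptographically secure; a real implementation would use a proper RNG.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x >> 12;
        x ^= x << 25;
        x ^= x >> 27;
        self.0 = x;
        x.wrapping_mul(0x2545_F491_4F6C_DD1D)
    }

    /// Combine two 64-bit draws into one 128-bit value.
    fn next_u128(&mut self) -> u128 {
        ((self.next_u64() as u128) << 64) | self.next_u64() as u128
    }
}

/// Overwrite the version nibble (4) and the variant bits (binary 10)
/// so that arbitrary random bits become a well-formed version 4 UUID.
fn to_uuid_v4_bits(r: u128) -> u128 {
    let r = (r & !(0xF_u128 << 76)) | (0x4_u128 << 76); // version: bits 79..76
    (r & !(0x3_u128 << 62)) | (0x2_u128 << 62) // variant: bits 63..62
}

/// Render 128 bits in the canonical 8-4-4-4-12 lowercase hex layout.
fn format_uuid(bits: u128) -> String {
    let h = format!("{:032x}", bits);
    format!(
        "{}-{}-{}-{}-{}",
        &h[0..8],
        &h[8..12],
        &h[12..16],
        &h[16..20],
        &h[20..32]
    )
}

/// Step 1: generate all random values in one pass.
/// Step 2: convert each value into a formatted UUID string.
fn bulk_uuids(seed: u64, n: usize) -> Vec<String> {
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    let randoms: Vec<u128> = (0..n).map(|_| rng.next_u128()).collect();
    randoms
        .into_iter()
        .map(|r| format_uuid(to_uuid_v4_bits(r)))
        .collect()
}
```

Calling `bulk_uuids(42, 3)` would return three 36-character strings whose third group starts with `4`. Forcing the version and variant bits is what separates a valid v4 UUID from raw random bits.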

Re: [I] [DISCUSSION] 2025 Q1-Q2 Roadmap [datafusion]

2025-02-14 Thread via GitHub
comphead commented on issue #14580: URL: https://github.com/apache/datafusion/issues/14580#issuecomment-2660433774 I'll try to chase https://github.com/apache/datafusion/issues/13816 https://github.com/apache/datafusion/issues/14389

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
codecov-commenter commented on PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#issuecomment-2660420263 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1404?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Improve EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
alamb commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1956781210 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -151,10 +165,51 @@ fn update_coalesce_ctx_children( }; } -/// The boolean flag `repartitio

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-14 Thread via GitHub
alamb commented on code in PR #14650: URL: https://github.com/apache/datafusion/pull/14650#discussion_r1956785686 ## datafusion/physical-optimizer/src/enforce_sorting/replace_with_order_preserving_variants.rs: ## @@ -45,7 +45,7 @@ use itertools::izip; pub type OrderPreservation

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-14 Thread via GitHub
alamb commented on PR #14650: URL: https://github.com/apache/datafusion/pull/14650#issuecomment-2660389460 FYI @ozankabak and @berkaysynnada

Re: [PR] feat: Add support for distinct aggregates [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove closed pull request #1261: feat: Add support for distinct aggregates URL: https://github.com/apache/datafusion-comet/pull/1261

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-14 Thread via GitHub
comphead commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1956765242 ## native/core/Cargo.toml: ## @@ -77,6 +77,7 @@ datafusion-comet-proto = { workspace = true } object_store = { workspace = true } url = { workspace = true }

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#discussion_r1956762065 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,7 +37,7 @@ trait DataTypeSupport { private def isGloballySupported(dt: DataTyp

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956760901 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956760033 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove commented on code in PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#discussion_r1956753958 ## dev/diffs/3.4.3.diff: ## @@ -7,7 +7,7 @@ index d3544881af1..26ab186c65d 100644 2.5.1 2.0.8 +3.4 -+0.5.0-SNAPSHOT Review Comment:

[PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove opened a new pull request, #1404: URL: https://github.com/apache/datafusion-comet/pull/1404 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] docs: Add changelog for 0.6.0 release [datafusion-comet]

2025-02-14 Thread via GitHub
andygrove merged PR #1402: URL: https://github.com/apache/datafusion-comet/pull/1402

Re: [PR] test: Add experimental native scans to CometReadBenchmark [datafusion-comet]

2025-02-14 Thread via GitHub
mbutrovich commented on PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#issuecomment-2660305494 > Sorry, one more question @mbutrovich do we need to add the DataFusion/IcebergCompat scans to `readerBenchmark` as well? That benchmark is more of a microbenchmark tha

[PR] Update GitHub CI run image [datafusion]

2025-02-14 Thread via GitHub
findepi opened a new pull request, #14674: URL: https://github.com/apache/datafusion/pull/14674 GitHub runs include this warning The Ubuntu-20.04 brownout takes place from 2025-02-01. For more details, see https://github.com/actions/runner-images/issues/11101 Let's t

Re: [PR] Minor: remove confusing `update_plan_from_children` call from `EnforceSorting` [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14650: URL: https://github.com/apache/datafusion/pull/14650#discussion_r1956722551 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -119,7 +119,7 @@ fn update_sort_ctx_children( } node.data = data; -node.update_pl

Re: [PR] Simple Functions Preview [datafusion]

2025-02-14 Thread via GitHub
findepi commented on PR #14668: URL: https://github.com/apache/datafusion/pull/14668#issuecomment-2660287208 i now also have support for various numeric

```rust
#[excalibur_function]
fn add(a: i32, b: u32) -> i64 {
    a as i64 + b as i64
}
```

nullable fu

[PR] Update EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
wiedld opened a new pull request, #14673: URL: https://github.com/apache/datafusion/pull/14673 ## Which issue does this PR close? Helps with the docs effort https://github.com/apache/datafusion/issues/7013. ## Rationale for this change Noticed while reviewing https://gith

Re: [PR] Update EnforceSorting docs. [datafusion]

2025-02-14 Thread via GitHub
wiedld commented on code in PR #14673: URL: https://github.com/apache/datafusion/pull/14673#discussion_r1956718287 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -84,42 +84,56 @@ impl EnforceSorting { } } -/// This object is used within the [`EnforceS

[PR] bug: fix offset type mismatch when prepending lists [datafusion]

2025-02-14 Thread via GitHub
friendlymatthew opened a new pull request, #14672: URL: https://github.com/apache/datafusion/pull/14672 Closes #14613 `array_prepend` would error when attempting to concatenate certain `List` data types due to an incorrect offset type assumption. The error occurs because the impleme

Re: [PR] fix: Change default value of COMET_SCAN_ALLOW_INCOMPATIBLE and add documentation [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on code in PR #1398: URL: https://github.com/apache/datafusion-comet/pull/1398#discussion_r1956696728 ## spark/src/main/scala/org/apache/comet/DataTypeSupport.scala: ## @@ -37,7 +37,7 @@ trait DataTypeSupport { private def isGloballySupported(dt:

Re: [I] Tracking: date_time related features [datafusion]

2025-02-14 Thread via GitHub
Omega359 commented on issue #14661: URL: https://github.com/apache/datafusion/issues/14661#issuecomment-2660211300 https://github.com/apache/datafusion/issues/8282

Re: [PR] fix: Reduce cast.rs and utils.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-14 Thread via GitHub
kazuyukitanimura commented on PR #1387: URL: https://github.com/apache/datafusion-comet/pull/1387#issuecomment-2660195143 @parthchandra any other comments? otherwise I can merge this

Re: [PR] feat: add resolved `target` to `DmlStatement` (to eliminate need for table lookup after deserialization) [datafusion]

2025-02-14 Thread via GitHub
milenkovicm commented on PR #14631: URL: https://github.com/apache/datafusion/pull/14631#issuecomment-2660174463 @alamb this PR looks fine I verified it with `datafusion.execution.parquet.pushdown_filters=true` but still same problem like https://github.com/apache/datafusion/pull/14631#iss
