Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-08 Thread via GitHub
Eason0729 commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2401475753 Here is one stacktrace. [partial_stacktrace.txt](https://github.com/user-attachments/files/17303367/partial_stacktrace.txt) I am trying to understand which stack fram

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on PR #12800: URL: https://github.com/apache/datafusion/pull/12800#issuecomment-2401465597 I refactored a fair bit to address the comments. - renamed `apply_projection` to `apply_masking` - Split `ensure_schema_compatibility` into two functions. new `ensure_schema_

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792935095 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -340,41 +337,20 @@ pub fn extract_projection( .iter() .map(|it

Re: [PR] Add Aggregation fuzzer framework [datafusion]

2024-10-08 Thread via GitHub
Rachelint commented on code in PR #12667: URL: https://github.com/apache/datafusion/pull/12667#discussion_r1792927844 ## datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs: ## @@ -44,6 +44,118 @@ use rand::rngs::StdRng; use rand::{Rng, SeedableRng}; use tokio::task::JoinSet;

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792879962 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -781,10 +757,11 @@ pub async fn from_substrait_rel( from_substrait_named_struct(nam

Re: [I] Panic in `nth_value` window function (SQLancer) [datafusion]

2024-10-08 Thread via GitHub
HuSen8891 commented on issue #12815: URL: https://github.com/apache/datafusion/issues/12815#issuecomment-2401346812 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] Ordered Set Aggregate Functions [datafusion]

2024-10-08 Thread via GitHub
Garamda commented on issue #12824: URL: https://github.com/apache/datafusion/issues/12824#issuecomment-2401308768 Alright. Before I take the issue, let me be clear. What I understood is that the things to be implemented are those five functions below, https://github.com/user-attachmen

Re: [PR] Fix: approx_percentile_cont_with_weight Panic [datafusion]

2024-10-08 Thread via GitHub
jonathanc-n commented on code in PR #12823: URL: https://github.com/apache/datafusion/pull/12823#discussion_r1792831269 ## datafusion/functions-aggregate-common/src/tdigest.rs: ## @@ -641,8 +641,15 @@ impl TDigest { v => panic!("invalid centroids type {v:?}"),

Re: [I] Release DataFusion 42.1.0 [datafusion]

2024-10-08 Thread via GitHub
andygrove commented on issue #12813: URL: https://github.com/apache/datafusion/issues/12813#issuecomment-2401248796 Sounds good to me

Re: [I] Parse real number literals as the Decimal type [datafusion]

2024-10-08 Thread via GitHub
andygrove commented on issue #12817: URL: https://github.com/apache/datafusion/issues/12817#issuecomment-2401246744 +1 for this. The current behavior is not consistent with ANSI SQL.

Re: [I] Performance: Add "read strings as binary" option for parquet [datafusion]

2024-10-08 Thread via GitHub
jayzhan211 commented on issue #12788: URL: https://github.com/apache/datafusion/issues/12788#issuecomment-2401216381 Yes, we need to support binary -> utf8view in arrow cast

Re: [PR] feat: Implement bloom_filter_agg [datafusion-comet]

2024-10-08 Thread via GitHub
viirya commented on PR #987: URL: https://github.com/apache/datafusion-comet/pull/987#issuecomment-2401208327 I do not have time to look at this error yet. I may take a look after the conference.

[PR] Bump cookie and express in /datafusion/wasmtest/datafusion-wasm-app [datafusion]

2024-10-08 Thread via GitHub
dependabot[bot] opened a new pull request, #12825: URL: https://github.com/apache/datafusion/pull/12825 Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together. Updates `cookie` from 0.6.0 to 0

Re: [PR] Chore: Move `aggregate statistics` optimizer test from core to optimizer crate [datafusion]

2024-10-08 Thread via GitHub
jayzhan211 merged PR #12783: URL: https://github.com/apache/datafusion/pull/12783

Re: [PR] Chore: Move `aggregate statistics` optimizer test from core to optimizer crate [datafusion]

2024-10-08 Thread via GitHub
jayzhan211 commented on PR #12783: URL: https://github.com/apache/datafusion/pull/12783#issuecomment-2401204660 Thanks @alamb

Re: [PR] Chore: Move `aggregate statistics` optimizer test from core to optimizer crate [datafusion]

2024-10-08 Thread via GitHub
jayzhan211 commented on code in PR #12783: URL: https://github.com/apache/datafusion/pull/12783#discussion_r1792662810 ## datafusion/physical-optimizer/Cargo.toml: ## @@ -32,9 +32,15 @@ rust-version = { workspace = true } workspace = true [dependencies] +arrow = { workspace

Re: [I] Ordered Set Aggregate Functions [datafusion]

2024-10-08 Thread via GitHub
jayzhan211 commented on issue #12824: URL: https://github.com/apache/datafusion/issues/12824#issuecomment-2401201975 > Hi, I'm new here. I want to take the issue or participate in developing one of those functions. Would it be possible? Sure, just type `take` then you will be assigned

Re: [PR] feat: Use fair-spill pool when `spark.memory.offHeap.enabled=false` [datafusion-comet]

2024-10-08 Thread via GitHub
Kontinuation commented on PR #1004: URL: https://github.com/apache/datafusion-comet/pull/1004#issuecomment-2401198781 This is better than using a greedy memory pool. It makes spillable operators work correctly under memory pressure, especially when running sort-merge-join where multiple so
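The fairness idea behind a fair-spill pool can be modeled simply: DataFusion documents its `FairSpillPool` as limiting each spillable reservation to roughly `(pool_size - unspillable_memory) / num_spillable_reservations`, so one operator cannot starve the others. The standalone Rust model below illustrates only that rule; `FairShareModel` is hypothetical and does not use DataFusion's `MemoryPool` trait.

```rust
/// Hypothetical, simplified model of a fair spill pool (not DataFusion's type).
/// Each spillable consumer may hold at most
/// (pool_size - unspillable) / num_spillable bytes.
struct FairShareModel {
    pool_size: usize,
    unspillable: usize,
    num_spillable: usize,
}

impl FairShareModel {
    /// Per-consumer cap; 0 if no spillable consumers are registered.
    fn per_consumer_limit(&self) -> usize {
        if self.num_spillable == 0 {
            return 0;
        }
        self.pool_size.saturating_sub(self.unspillable) / self.num_spillable
    }

    /// Returns true if a spillable consumer currently holding `held` bytes
    /// may grow by `additional` bytes; false means it should spill first.
    fn try_grow(&self, held: usize, additional: usize) -> bool {
        held.saturating_add(additional) <= self.per_consumer_limit()
    }
}
```

By contrast, a greedy pool would let the first consumer take everything that is free, which is why several concurrent spillable operators (for example the sorts feeding a sort-merge join) can fail under pressure.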

Re: [PR] Minor: add documentation note about `NullState` [datafusion]

2024-10-08 Thread via GitHub
jonahgao merged PR #12791: URL: https://github.com/apache/datafusion/pull/12791

Re: [I] Panic in `simplify_expressions` optimizer rules when running an aggregate query (SQLancer) [datafusion]

2024-10-08 Thread via GitHub
jonathanc-n commented on issue #12814: URL: https://github.com/apache/datafusion/issues/12814#issuecomment-2401129822 take

[I] Ordered Set Aggregate Functions [datafusion]

2024-10-08 Thread via GitHub
jayzhan211 opened a new issue, #12824: URL: https://github.com/apache/datafusion/issues/12824 ### Is your feature request related to a problem or challenge? DataFusion doesn't support ordered-set aggregate functions yet. Those functions are supported in [Postgres](https://www.
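For readers unfamiliar with ordered-set aggregates: PostgreSQL's `percentile_cont(fraction) WITHIN GROUP (ORDER BY x)` sorts the group and linearly interpolates at position `fraction * (n - 1)`. The Rust sketch below illustrates those semantics over non-null `f64` values only; the function name and signature are hypothetical and are not a proposed DataFusion API.

```rust
/// Hypothetical sketch of PostgreSQL-style
/// `percentile_cont(fraction) WITHIN GROUP (ORDER BY x)` over non-null inputs.
/// Returns None for an empty group or a fraction outside [0, 1].
fn percentile_cont(values: &[f64], fraction: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&fraction) {
        return None;
    }
    let mut sorted = values.to_vec();
    // total_cmp gives a total order, so a stray NaN cannot panic the sort.
    sorted.sort_by(|a, b| a.total_cmp(b));
    // Interpolate between the two rows surrounding the fractional position.
    let pos = fraction * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    let weight = pos - lo as f64;
    Some(sorted[lo] + weight * (sorted[hi] - sorted[lo]))
}
```

For example, the median (`0.5`) of `[1.0, 2.0, 3.0, 4.0]` falls halfway between `2.0` and `3.0`, giving `2.5`.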

Re: [PR] [logical-types] fix conflicts [datafusion]

2024-10-08 Thread via GitHub
jayzhan211 merged PR #12820: URL: https://github.com/apache/datafusion/pull/12820

Re: [PR] Fix: approx_percentile_cont_with_weight Panic [datafusion]

2024-10-08 Thread via GitHub
jonahgao commented on code in PR #12823: URL: https://github.com/apache/datafusion/pull/12823#discussion_r1792710487 ## datafusion/functions-aggregate-common/src/tdigest.rs: ## @@ -641,8 +641,15 @@ impl TDigest { v => panic!("invalid centroids type {v:?}"),

Re: [PR] Fix: handle NULL input in lead window function [datafusion]

2024-10-08 Thread via GitHub
HuSen8891 commented on PR #12811: URL: https://github.com/apache/datafusion/pull/12811#issuecomment-2401092415 > Thanks @HuSen8891 I have some feeling that we should fix the out_data_type earlier so it will serve both LAG/LEAD Thanks! handle NULL input in lag window function
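As a reference for the row-level semantics of LEAD/LAG with nullable input (as in PostgreSQL's `lead(value, offset, default)`): the default applies only when the offset row lies outside the partition, while a NULL stored in that row is returned as NULL. A minimal Rust sketch over one partition; `shift` is a hypothetical helper, not DataFusion's window code, and it does not address the output-type question discussed in the PR.

```rust
/// Hypothetical sketch of LEAD/LAG over a single partition.
/// Positive `offset` looks ahead (LEAD); negative looks back (LAG).
/// `default` is used only when the target row is outside the partition;
/// a NULL (None) in the target row is returned unchanged.
fn shift<T: Clone>(values: &[Option<T>], offset: i64, default: Option<T>) -> Vec<Option<T>> {
    let n = values.len() as i64;
    (0..n)
        .map(|i| {
            let j = i + offset;
            if (0..n).contains(&j) {
                values[j as usize].clone()
            } else {
                default.clone()
            }
        })
        .collect()
}
```

With input `[1, NULL, 3]`, `shift(.., 1, Some(0))` yields `[NULL, 3, 0]`: the first row picks up the stored NULL, and only the last row falls back to the default.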

Re: [I] Ordered Set Aggregate Functions [datafusion]

2024-10-08 Thread via GitHub
Garamda commented on issue #12824: URL: https://github.com/apache/datafusion/issues/12824#issuecomment-2401138148 Hi, I'm new here. I want to take the issue or participate in developing one of those functions. Would it be possible?

Re: [PR] Reuse hash [datafusion]

2024-10-08 Thread via GitHub
github-actions[bot] closed pull request #11708: Reuse hash URL: https://github.com/apache/datafusion/pull/11708

Re: [PR] Remove unused dependencies and features [datafusion]

2024-10-08 Thread via GitHub
jonahgao commented on code in PR #12808: URL: https://github.com/apache/datafusion/pull/12808#discussion_r1792685135 ## datafusion/physical-plan/Cargo.toml: ## @@ -69,6 +68,7 @@ rand = { workspace = true } tokio = { workspace = true } [dev-dependencies] +datafusion-functions

Re: [I] Add option to replace SortMergeJoin with ShuffleHashJoin [datafusion-comet]

2024-10-08 Thread via GitHub
viirya commented on issue #1006: URL: https://github.com/apache/datafusion-comet/issues/1006#issuecomment-2401086952 It sounds reasonable. The vectorized implementation of SMJ looks inefficient in DataFusion. I'm not sure if there is any optimized algorithm for SMJ in vectorized execution.

Re: [PR] chore: Remove NativeBase static initializer (to improve error handling when native lib fails to load) [datafusion-comet]

2024-10-08 Thread via GitHub
parthchandra commented on code in PR #1000: URL: https://github.com/apache/datafusion-comet/pull/1000#discussion_r1792654082 ## common/src/main/java/org/apache/comet/parquet/Native.java: ## @@ -24,6 +24,13 @@ import org.apache.comet.NativeBase; public final class Native exte

[PR] Fix: approx_percentile_cont_with_weight Panic [datafusion]

2024-10-08 Thread via GitHub
jonathanc-n opened a new pull request, #12823: URL: https://github.com/apache/datafusion/pull/12823 ## Which issue does this PR close? Closes #12716. ## Rationale for this change Handles case where min/max are infinite or not a number (NaN) ## What changes
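The rationale above mentions min/max values that are infinite or NaN. One possible guard, sketched in Rust under the assumption that non-finite values should simply be ignored when computing bounds; `finite_bounds` is hypothetical, and the PR's actual change to `TDigest` may handle these cases differently.

```rust
/// Hypothetical helper: (min, max) over the finite values only.
/// NaN and +/-infinity are skipped so later interpolation never works with
/// a non-finite bound. Returns None if no finite value exists.
fn finite_bounds(values: &[f64]) -> Option<(f64, f64)> {
    let mut bounds: Option<(f64, f64)> = None;
    for &v in values.iter().filter(|v| v.is_finite()) {
        bounds = Some(match bounds {
            None => (v, v),
            Some((lo, hi)) => (lo.min(v), hi.max(v)),
        });
    }
    bounds
}
```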

Re: [PR] Add TPC-DS scripts and documentation [datafusion-benchmarks]

2024-10-08 Thread via GitHub
mbutrovich commented on PR #7: URL: https://github.com/apache/datafusion-benchmarks/pull/7#issuecomment-2400939004 SF10 was good too.

[I] Add option to replace SortMergeJoin with ShuffleHashJoin [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove opened a new issue, #1006: URL: https://github.com/apache/datafusion-comet/issues/1006 ### What is the problem the feature request solves? Other Spark accelerators, such as Spark RAPIDS and Apache Gluten, replace SortMergeJoin with ShuffleHashJoin for improved performance. W
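For context, a shuffle hash join builds a hash table from one input of each partition and probes it with the other, so unlike SortMergeJoin it does not need sorted inputs. A minimal in-memory inner-join sketch in Rust; `hash_inner_join` is hypothetical and ignores spilling, null keys, and skew handling that production operators need.

```rust
use std::collections::HashMap;

/// Hypothetical inner hash join over (key, payload) rows within one partition.
/// Build phase: index the (ideally smaller) build side by key.
/// Probe phase: stream the probe side and emit one row per matching build row.
fn hash_inner_join<'a>(
    build: &'a [(i64, &'a str)],
    probe: &'a [(i64, &'a str)],
) -> Vec<(i64, &'a str, &'a str)> {
    let mut table: HashMap<i64, Vec<&'a str>> = HashMap::new();
    for &(k, v) in build {
        table.entry(k).or_default().push(v);
    }
    let mut out = Vec::new();
    for &(k, pv) in probe {
        if let Some(matches) = table.get(&k) {
            for &bv in matches {
                out.push((k, bv, pv));
            }
        }
    }
    out
}
```

The trade-off is memory: the build side must fit in the hash table (or be spilled), whereas a sort-merge join pays for sorting but streams both sides.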

Re: [PR] WIP: Generate docs from macros. [datafusion]

2024-10-08 Thread via GitHub
comphead commented on PR #12822: URL: https://github.com/apache/datafusion/pull/12822#issuecomment-2400893643 WDYT should we move in this direction?

Re: [PR] WIP: move SMJ join filtered part out of join_output stage. LeftOuter experiment [datafusion]

2024-10-08 Thread via GitHub
comphead commented on PR #12764: URL: https://github.com/apache/datafusion/pull/12764#issuecomment-2400878363 @korowa I'm planning to move all other join variants to the same approach so the filtered logic will be in a single place, test it out and make the PR ready for your review

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
vbarua commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792548576 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -327,11 +325,10 @@ pub async fn from_substrait_extended_expr( }) } -/// parse projection -pub fn

Re: [PR] Ts/minor updates release process [datafusion-python]

2024-10-08 Thread via GitHub
timsaucer merged PR #903: URL: https://github.com/apache/datafusion-python/pull/903

Re: [I] Enable GitHub discussions [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove commented on issue #368: URL: https://github.com/apache/datafusion-comet/issues/368#issuecomment-2400854694 This was completed

Re: [PR] [logical-types] update working branch [datafusion]

2024-10-08 Thread via GitHub
notfilippo commented on PR #12812: URL: https://github.com/apache/datafusion/pull/12812#issuecomment-2400849906 > The CI appears to be failing on the `logical-types` branch -- perhaps you can make a follow on PR to fix that? Filed this PR: #12820

Re: [I] Enable GitHub discussions [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove closed issue #368: Enable GitHub discussions URL: https://github.com/apache/datafusion-comet/issues/368

Re: [PR] docs: Various documentation improvements [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove merged PR #1005: URL: https://github.com/apache/datafusion-comet/pull/1005

Re: [PR] WIP: Generate docs from macros. [datafusion]

2024-10-08 Thread via GitHub
comphead commented on code in PR #12822: URL: https://github.com/apache/datafusion/pull/12822#discussion_r1792510641 ## datafusion/functions/src/math/log.rs: ## @@ -37,6 +38,7 @@ use datafusion_expr::{ }; use datafusion_expr::{ScalarUDFImpl, Signature, Volatility}; +#[udf_do

Re: [PR] WIP: Generate docs from macros. [datafusion]

2024-10-08 Thread via GitHub
comphead commented on PR #12822: URL: https://github.com/apache/datafusion/pull/12822#issuecomment-2400818202 @Omega359 @alamb I tried to play with custom attributes to wrap up the documentation on top of what @Omega359 already built. I'm experimenting with just 2 fields (description and

[PR] WIP: Generate docs from macros. [datafusion]

2024-10-08 Thread via GitHub
comphead opened a new pull request, #12822: URL: https://github.com/apache/datafusion/pull/12822 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Add Aggregation fuzzer framework [datafusion]

2024-10-08 Thread via GitHub
alamb commented on PR #12667: URL: https://github.com/apache/datafusion/pull/12667#issuecomment-2400810535 > And unfortunately, seems a bug about min/max for string found... Maybe we should check and try to fix it in other pr. Yes, let's handle that as a separate PR / issue.

Re: [PR] Remove unnecessary `DFSchema::check_ambiguous_name` [datafusion]

2024-10-08 Thread via GitHub
alamb merged PR #12805: URL: https://github.com/apache/datafusion/pull/12805

Re: [I] Easier way to convert between `ParquetExec` and `ParquetExecBuilder` [datafusion]

2024-10-08 Thread via GitHub
alamb closed issue #12737: Easier way to convert between `ParquetExec` and `ParquetExecBuilder` URL: https://github.com/apache/datafusion/issues/12737

Re: [PR] API from `ParquetExec` to `ParquetExecBuilder` [datafusion]

2024-10-08 Thread via GitHub
alamb merged PR #12799: URL: https://github.com/apache/datafusion/pull/12799

Re: [PR] Converting rank builtin function to UDWF [datafusion]

2024-10-08 Thread via GitHub
alamb commented on PR #12718: URL: https://github.com/apache/datafusion/pull/12718#issuecomment-2400807798 Awesome -- thank you @jatin510 -- I have started the CI and plan to review this tomorrow morning

Re: [PR] WIP: Generate docs from macros. [datafusion]

2024-10-08 Thread via GitHub
comphead commented on code in PR #12822: URL: https://github.com/apache/datafusion/pull/12822#discussion_r1792510829 ## datafusion/functions/src/math/log.rs: ## @@ -472,4 +471,16 @@ mod tests { SortProperties::Unordered ); } + +#[test] +fn test

Re: [PR] WIP Implement special min/max accumulator for Strings and Binary (10% faster for Clickbench Q28) [datafusion]

2024-10-08 Thread via GitHub
alamb commented on PR #12792: URL: https://github.com/apache/datafusion/pull/12792#issuecomment-2400821900 I need to write some more specific / targeted tests here and we'll be good.

Re: [PR] WIP: Generate docs from macros. [datafusion]

2024-10-08 Thread via GitHub
comphead commented on code in PR #12822: URL: https://github.com/apache/datafusion/pull/12822#discussion_r1792507680 ## Cargo.toml: ## @@ -48,6 +48,8 @@ members = [ "datafusion-examples", "test-utils", "benchmarks", +"datafusion/macros", +"datafusion/pre-m

Re: [PR] Add Aggregation fuzzer framework [datafusion]

2024-10-08 Thread via GitHub
alamb commented on code in PR #12667: URL: https://github.com/apache/datafusion/pull/12667#discussion_r1792503168 ## datafusion/core/tests/fuzz_cases/aggregate_fuzz.rs: ## @@ -44,6 +44,307 @@ use rand::rngs::StdRng; use rand::{Rng, SeedableRng}; use tokio::task::JoinSet; +us

Re: [PR] chore: make ParquetExec's with_file_groups public [datafusion]

2024-10-08 Thread via GitHub
alamb closed pull request #12726: chore: make ParquetExec's with_file_groups public URL: https://github.com/apache/datafusion/pull/12726

Re: [PR] API from `ParquetExec` to `ParquetExecBuilder` [datafusion]

2024-10-08 Thread via GitHub
alamb commented on PR #12799: URL: https://github.com/apache/datafusion/pull/12799#issuecomment-2400809002 Thanks again @jayzhan211 and @NGA-TRAN

Re: [I] [DISCUSSION] Make DataFusion is the fastest engine for querying parquet data in ClickBench [datafusion]

2024-10-08 Thread via GitHub
alamb commented on issue #12821: URL: https://github.com/apache/datafusion/issues/12821#issuecomment-2400801350 Changes I think will make these queries significantly faster: - [x] https://github.com/apache/datafusion/pull/11627 - @korowa - [x] https://github.com/apache/datafusion/p

[I] [DISCUSSION] Make DataFusion is the fastest engine for querying parquet data in ClickBench [datafusion]

2024-10-08 Thread via GitHub
alamb opened a new issue, #12821: URL: https://github.com/apache/datafusion/issues/12821 ### Is your feature request related to a problem or challenge? I am mostly writing this up to record what I think is an ongoing work with @jayzhan211 @Rachelint @korowa and myself TLDR, we

[PR] docs: Various documentation improvements [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove opened a new pull request, #1005: URL: https://github.com/apache/datafusion-comet/pull/1005 ## Which issue does this PR close? N/A ## Rationale for this change Various documentation improvements: - Installation Guide: - Remove old cont

[PR] [logical-types] Fix conflicts [datafusion]

2024-10-08 Thread via GitHub
notfilippo opened a new pull request, #12820: URL: https://github.com/apache/datafusion/pull/12820 (no comment)

Re: [PR] WIP Implement special min/max accumulator for Strings and Binary [datafusion]

2024-10-08 Thread via GitHub
alamb commented on code in PR #12792: URL: https://github.com/apache/datafusion/pull/12792#discussion_r1792438202 ## datafusion/functions-aggregate/src/min_max/min_max_bytes.rs: ## @@ -0,0 +1,596 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr
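As background for the min/max accumulator work: a grouped MIN/MAX over strings or binary values keeps one current best value per group and compares bytes lexicographically. The Rust sketch below is a simplified, hypothetical illustration, not the contents of `min_max_bytes.rs`; a production version would likely try to reduce the per-value allocations this one makes.

```rust
/// Hypothetical grouped MIN/MAX accumulator for byte strings.
/// One slot per group id; None means no non-null value seen yet.
/// `&[u8]` comparison in Rust is lexicographic by byte.
struct MinMaxBytes {
    mins: Vec<Option<Vec<u8>>>,
    maxs: Vec<Option<Vec<u8>>>,
}

impl MinMaxBytes {
    fn new(num_groups: usize) -> Self {
        Self { mins: vec![None; num_groups], maxs: vec![None; num_groups] }
    }

    /// Feeds one batch: `group_ids[i]` is the group of `values[i]`.
    /// NULL inputs are skipped, as SQL MIN/MAX ignore NULLs.
    fn update(&mut self, values: &[Option<&[u8]>], group_ids: &[usize]) {
        for (v, &g) in values.iter().zip(group_ids) {
            let Some(v) = v else { continue };
            if self.mins[g].as_deref().map_or(true, |cur| *v < cur) {
                self.mins[g] = Some(v.to_vec());
            }
            if self.maxs[g].as_deref().map_or(true, |cur| *v > cur) {
                self.maxs[g] = Some(v.to_vec());
            }
        }
    }
}
```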

Re: [I] Performance: Add "read strings as binary" option for parquet [datafusion]

2024-10-08 Thread via GitHub
alamb commented on issue #12788: URL: https://github.com/apache/datafusion/issues/12788#issuecomment-2400705951 Thank you @goldmedal -- I am checking it out now.

Re: [PR] Add Aggregation fuzzer framework [datafusion]

2024-10-08 Thread via GitHub
Rachelint commented on PR #12667: URL: https://github.com/apache/datafusion/pull/12667#issuecomment-2400592077 @alamb I have added tests to cover more aggregations: - basic prim aggr(sum/sum distinct/max/min/count/avg) - basic string aggr(count/count distinct/min/max) I make it t
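The general idea of differential aggregation fuzzing is to run the same aggregate two ways on random input and check that the results agree. A std-only Rust sketch of that idea, comparing a single-pass grouped SUM with a two-phase (partial, then final) SUM; all names are hypothetical, and the tiny LCG stands in for the `rand` crate's `StdRng` that the PR's test file imports.

```rust
use std::collections::BTreeMap;

/// Tiny deterministic LCG so the sketch needs no external crates.
struct Lcg(u64);
impl Lcg {
    fn next(&mut self) -> u64 {
        self.0 = self.0.wrapping_mul(6364136223846793005).wrapping_add(1442695040888963407);
        self.0 >> 33
    }
}

/// Reference strategy: single-pass grouped SUM over (group, value) rows.
fn grouped_sum(rows: &[(u8, i64)]) -> BTreeMap<u8, i64> {
    let mut out = BTreeMap::new();
    for &(g, v) in rows {
        *out.entry(g).or_insert(0) += v;
    }
    out
}

/// Strategy under test: partial sums per chunk, then a final merge.
fn two_phase_sum(rows: &[(u8, i64)], chunk: usize) -> BTreeMap<u8, i64> {
    let mut merged = BTreeMap::new();
    for part in rows.chunks(chunk) {
        for (g, s) in grouped_sum(part) {
            *merged.entry(g).or_insert(0) += s;
        }
    }
    merged
}

/// Generates random rows and a random chunk size, then compares strategies.
fn fuzz_once(seed: u64, n: usize) -> bool {
    let mut rng = Lcg(seed);
    let rows: Vec<(u8, i64)> = (0..n)
        .map(|_| ((rng.next() % 5) as u8, (rng.next() % 1000) as i64 - 500))
        .collect();
    let chunk = 1 + (rng.next() as usize % 16); // chunks() panics on 0
    grouped_sum(&rows) == two_phase_sum(&rows, chunk)
}
```

Looping `fuzz_once` over many seeds gives reproducible failures: a mismatch can be replayed from the seed alone.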

Re: [PR] Refactor `DependencyMap` and `Dependencies` into structs [datafusion]

2024-10-08 Thread via GitHub
alamb merged PR #12761: URL: https://github.com/apache/datafusion/pull/12761

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792359380 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -340,41 +337,20 @@ pub fn extract_projection( .iter() .map(|it

Re: [PR] Use OnceLock to store TokioRuntime [datafusion-python]

2024-10-08 Thread via GitHub
Michael-J-Ward commented on code in PR #895: URL: https://github.com/apache/datafusion-python/pull/895#discussion_r1792406544 ## src/utils.rs: ## @@ -20,20 +20,19 @@ use crate::TokioRuntime; use datafusion::logical_expr::Volatility; use pyo3::prelude::*; use std::future::Futu
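`std::sync::OnceLock` (stable since Rust 1.70) lets a process-wide value be initialized lazily exactly once and then shared by reference from any thread. A minimal sketch of the pattern; `Runtime` here is a hypothetical stand-in for the `TokioRuntime` type named in the PR, so the example runs without Tokio.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for an expensive, process-wide resource.
struct Runtime {
    id: u32,
}

static RUNTIME: OnceLock<Runtime> = OnceLock::new();

/// Initializes the global on the first call; every later call, from any
/// thread, returns a reference to the same instance.
fn get_runtime() -> &'static Runtime {
    RUNTIME.get_or_init(|| Runtime { id: 42 })
}
```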

Re: [PR] feat: Use fair-spill pool when `spark.executor.offHeap.enabled=false` [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove commented on PR #1004: URL: https://github.com/apache/datafusion-comet/pull/1004#issuecomment-2400661907 I was able to get benchmarks running by allocating more memory to Comet.

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792393774 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -990,24 +978,46 @@ pub async fn from_substrait_rel( fn ensure_schema_compatability( table: DataFra

Re: [I] Implement physical optimizer rule for common subexpression elimination [datafusion]

2024-10-08 Thread via GitHub
peter-toth commented on issue #12599: URL: https://github.com/apache/datafusion/issues/12599#issuecomment-2400621255 Just a quick update that I've started working on this. Will try to submit a PR this week.

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792372978 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -990,24 +978,46 @@ pub async fn from_substrait_rel( fn ensure_schema_compatability( table: DataFra

Re: [I] Performance: Add "read strings as binary" option for parquet [datafusion]

2024-10-08 Thread via GitHub
goldmedal commented on issue #12788: URL: https://github.com/apache/datafusion/issues/12788#issuecomment-2400448053 @alamb @jayzhan211 I drafted a PR #12816 for a simple POC. In this PR, we can use it like ```rust let ctx = SessionContext::new(); ctx.sql( r#"

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
westonpace commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792348343 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -340,41 +337,20 @@ pub fn extract_projection( .iter() .map

Re: [PR] Converting rank builtin function to UDWF [datafusion]

2024-10-08 Thread via GitHub
jatin510 commented on PR #12718: URL: https://github.com/apache/datafusion/pull/12718#issuecomment-2400556759 The PR is ready to be reviewed @jcsherin @alamb

[PR] feat: Use fair-spill pool when `spark.executor.offHeap.enabled=false` [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove opened a new pull request, #1004: URL: https://github.com/apache/datafusion-comet/pull/1004 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/996 ## Rationale for this change ## What changes are included

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792350577 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -865,9 +846,16 @@ pub async fn from_substrait_rel( let name = filename.unwrap();

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792254336 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -990,24 +989,44 @@ pub async fn from_substrait_rel( fn ensure_schema_compatability( table: DataFra

Re: [PR] feat: Use fair-spill pool when `spark.executor.offHeap.enabled=false` [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove commented on PR #1004: URL: https://github.com/apache/datafusion-comet/pull/1004#issuecomment-2400570194 @Kontinuation Is this the general approach you were suggesting?

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792346770 ## datafusion/substrait/tests/cases/substrait_validations.rs: ## @@ -104,20 +103,18 @@ mod tests { ); // the DataFusion schema { b, a, c,

Re: [PR] feat: Use fair-spill pool when `spark.executor.offHeap.enabled=false` [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove commented on PR #1004: URL: https://github.com/apache/datafusion-comet/pull/1004#issuecomment-2400569123 I tried running benchmarks with this PR but ran into: ``` Failed to allocate additional 917708800 bytes for ShuffleRepartitioner[0] with 0 bytes already allocated for

Re: [PR] Add TPC-DS scripts and documentation [datafusion-benchmarks]

2024-10-08 Thread via GitHub
mbutrovich commented on PR #7: URL: https://github.com/apache/datafusion-benchmarks/pull/7#issuecomment-2400498041 Data generation ran successfully for SF 100. I'll try with SF 10 and SF 1, but that's an approval from me!

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
Blizzara commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792280636 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -835,6 +812,10 @@ pub async fn from_substrait_rel( })) } So

Re: [PR] Minor: clean up TODO comments in unnest.slt [datafusion]

2024-10-08 Thread via GitHub
alamb commented on PR #12795: URL: https://github.com/apache/datafusion/pull/12795#issuecomment-2400500164 🚀

Re: [PR] Minor: clean up TODO comments in unnest.slt [datafusion]

2024-10-08 Thread via GitHub
alamb merged PR #12795: URL: https://github.com/apache/datafusion/pull/12795

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-08 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1792258366 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -783,7 +783,6 @@ pub async fn from_substrait_rel( let t = ctx.table(table_reference.c

Re: [I] Comet 0.2.0 Release [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove closed issue #843: Comet 0.2.0 Release URL: https://github.com/apache/datafusion-comet/issues/843

[I] Access children `DataType` in `ScalarUDFImpl::invoke` [datafusion]

2024-10-08 Thread via GitHub
joseph-isaacs opened a new issue, #12819: URL: https://github.com/apache/datafusion/issues/12819 ### Is your feature request related to a problem or challenge? I am trying to create a scalar UDF, pack, which operates on struct arrays. It packs many arrays into a struct array each with

Re: [I] Comet 0.3.0 Release [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove closed issue #965: Comet 0.3.0 Release URL: https://github.com/apache/datafusion-comet/issues/965

Re: [I] Migrate documentation for all core functions from scalar_functions.md to code [datafusion]

2024-10-08 Thread via GitHub
Omega359 commented on issue #12801: URL: https://github.com/apache/datafusion/issues/12801#issuecomment-2400210996 Sure! I was going by modules which do not line up perfectly with the sections in the docs but it shouldn't be that much of an issue. Otherwise I expect if I churn through them

Re: [PR] chore: Remove NativeBase static initializer (to improve error handling when native lib fails to load) [datafusion-comet]

2024-10-08 Thread via GitHub
viirya commented on code in PR #1000: URL: https://github.com/apache/datafusion-comet/pull/1000#discussion_r1792210948 ## common/src/main/java/org/apache/comet/parquet/Native.java: ## @@ -24,6 +24,13 @@ import org.apache.comet.NativeBase; public final class Native extends Na

Re: [PR] Fix bug in TopK aggregates [datafusion]

2024-10-08 Thread via GitHub
avantgardnerio merged PR #12766: URL: https://github.com/apache/datafusion/pull/12766

Re: [I] Probable bug in TopKAggregate [datafusion]

2024-10-08 Thread via GitHub
avantgardnerio closed issue #12748: Probable bug in TopKAggregate URL: https://github.com/apache/datafusion/issues/12748

Re: [PR] Minor: Added reference to cloned physical expression [datafusion]

2024-10-08 Thread via GitHub
jonathanc-n commented on PR #12818: URL: https://github.com/apache/datafusion/pull/12818#issuecomment-2400348749 Sorry, is someone able to close this? I didn't catch that it was being dereferenced immediately.

Re: [PR] API from `ParquetExec` to `ParquetExecBuilder` [datafusion]

2024-10-08 Thread via GitHub
alamb commented on code in PR #12799: URL: https://github.com/apache/datafusion/pull/12799#discussion_r1792190479 ## datafusion/core/src/datasource/physical_plan/parquet/mod.rs: ## @@ -459,6 +498,32 @@ impl ParquetExec { ParquetExecBuilder::new(file_scan_config) }

Re: [PR] [logical-types] update working branch [datafusion]

2024-10-08 Thread via GitHub
alamb commented on PR #12812: URL: https://github.com/apache/datafusion/pull/12812#issuecomment-2400329901 Merging as this is to a feature branch

Re: [PR] [logical-types] update working branch [datafusion]

2024-10-08 Thread via GitHub
alamb merged PR #12812: URL: https://github.com/apache/datafusion/pull/12812

[PR] Minor: Added reference to cloned physical expression [datafusion]

2024-10-08 Thread via GitHub
jonathanc-n opened a new pull request, #12818: URL: https://github.com/apache/datafusion/pull/12818 ## Which issue does this PR close? Closes #. ## Rationale for this change While working on one of the SQLancer bugs, I noticed that an expression wasn't receiving refer

Re: [PR] [WIP] Create table with `FixedSizeList` column [datafusion]

2024-10-08 Thread via GitHub
alamb commented on PR #12810: URL: https://github.com/apache/datafusion/pull/12810#issuecomment-2400186329 > Thanks for your help! Do I add these tests under sqllogictest? If so, is there an existing test_files/*.slt I should add them to? Maybe arrow_typeof.slt? Perhaps you can add th

Re: [PR] feat: Implement shared memory pool for case where spark.memory.offHeap.enabled=false [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove commented on PR #1002: URL: https://github.com/apache/datafusion-comet/pull/1002#issuecomment-2400258972 > I'm a bit worried about this approach because we are implementing greedy mode inside `CometTaskMemoryManager`, which is known to starve consumers frequently. I prefer using

Re: [PR] Add Aggregation fuzzer framework [datafusion]

2024-10-08 Thread via GitHub
Rachelint commented on code in PR #12667: URL: https://github.com/apache/datafusion/pull/12667#discussion_r1792087233 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/fuzzer.rs: ## @@ -0,0 +1,266 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[I] Parse real number literals as the Decimal type [datafusion]

2024-10-08 Thread via GitHub
doupache opened a new issue, #12817: URL: https://github.com/apache/datafusion/issues/12817 ### Is your feature request related to a problem or challenge? During the fix for [12655](https://github.com/apache/datafusion/issues/12655), I found that the root cause was that we parse real

Re: [PR] chore: Remove NativeBase static initializer (to improve error handling when native lib fails to load) [datafusion-comet]

2024-10-08 Thread via GitHub
andygrove commented on code in PR #1000: URL: https://github.com/apache/datafusion-comet/pull/1000#discussion_r1792138764 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -1105,7 +1105,8 @@ object CometSparkSessionExtensions extends Logging {
