[PR] Move `CombinePartialFinalAggregate` rule into physical-optimizer crate [datafusion]

2024-08-25 Thread via GitHub
lewiszlw opened a new pull request, #12167: URL: https://github.com/apache/datafusion/pull/12167 ## Which issue does this PR close? part of https://github.com/apache/datafusion/issues/11502. ## Rationale for this change ## What changes are included in this

[PR] Use Result.unwrap_or_else where applicable [datafusion]

2024-08-25 Thread via GitHub
findepi opened a new pull request, #12166: URL: https://github.com/apache/datafusion/pull/12166 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

Re: [PR] fix: Skip buffered rows which are not joined with streamed side when checking join filter results [datafusion]

2024-08-25 Thread via GitHub
viirya commented on code in PR #12159: URL: https://github.com/apache/datafusion/pull/12159#discussion_r1730695511 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1474,6 +1474,12 @@ impl SMJStream { [chunk.buffered_batch_idx.unwr

Re: [PR] feat: Support sort merge join with a join condition [datafusion-comet]

2024-08-25 Thread via GitHub
viirya commented on code in PR #553: URL: https://github.com/apache/datafusion-comet/pull/553#discussion_r1730694991 ## spark/src/test/scala/org/apache/spark/sql/CometTestBase.scala: ## @@ -75,7 +75,6 @@ abstract class CometTestBase conf.set(MEMORY_OFFHEAP_SIZE.key, "2g")

Re: [PR] feat: Support sort merge join with a join condition [datafusion-comet]

2024-08-25 Thread via GitHub
viirya commented on code in PR #553: URL: https://github.com/apache/datafusion-comet/pull/553#discussion_r1730694210 ## spark/src/test/scala/org/apache/comet/exec/CometJoinSuite.scala: ## @@ -336,4 +337,115 @@ class CometJoinSuite extends CometTestBase { } } } + +

Re: [PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2024-08-25 Thread via GitHub
wiedld commented on code in PR #12142: URL: https://github.com/apache/datafusion/pull/12142#discussion_r1730651101 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1331,7 +1331,17 @@ pub fn validate_unique_names<'a>( }) } -/// Union two logical plans. +/// Union tw

Re: [PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2024-08-25 Thread via GitHub
wiedld commented on code in PR #12142: URL: https://github.com/apache/datafusion/pull/12142#discussion_r1730641937 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1331,7 +1331,17 @@ pub fn validate_unique_names<'a>( }) } -/// Union two logical plans. +/// Union tw

Re: [PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2024-08-25 Thread via GitHub
wiedld commented on PR #12142: URL: https://github.com/apache/datafusion/pull/12142#issuecomment-2309284826 > I'm thinking should we add the coerced union recommendation to the error message I'm not sure which error message you are referring to? 😅 🙏🏼

Re: [PR] Provide documentation of expose APIs to enable handling of type coercion at UNION plan construction. [datafusion]

2024-08-25 Thread via GitHub
wiedld commented on code in PR #12142: URL: https://github.com/apache/datafusion/pull/12142#discussion_r1730642179 ## datafusion/expr/src/logical_plan/builder.rs: ## @@ -1331,7 +1331,17 @@ pub fn validate_unique_names<'a>( }) } -/// Union two logical plans. +/// Union tw

[PR] Minor: Support protobuf serialization for Utf8View and BinaryView [datafusion]

2024-08-25 Thread via GitHub
Lordworms opened a new pull request, #12165: URL: https://github.com/apache/datafusion/pull/12165 …calarValue::BinaryView ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR?

Re: [PR] Introducing repartitioning outside [datafusion]

2024-08-25 Thread via GitHub
github-actions[bot] closed pull request #10338: Introducing repartitioning outside URL: https://github.com/apache/datafusion/pull/10338

[PR] Apply non-nested kernel for non-nested in `array_has` and `inlist` [datafusion]

2024-08-25 Thread via GitHub
jayzhan211 opened a new pull request, #12164: URL: https://github.com/apache/datafusion/pull/12164 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [I] Implement fast min/max accumulator for binary / strings (now it uses the slower path) [datafusion]

2024-08-25 Thread via GitHub
devanbenz commented on issue #6906: URL: https://github.com/apache/datafusion/issues/6906#issuecomment-2309105124 @alamb is there anyone working on this + is this issue still relevant? I would love to tackle it as it seems like an interesting feature/optimization.

Re: [I] `array_has` is 3200x slower than it "should be" [datafusion]

2024-08-25 Thread via GitHub
jayzhan211 closed issue #12062: `array_has` is 3200x slower than it "should be" URL: https://github.com/apache/datafusion/issues/12062

[I] Optimize set-associative nested function with Eq Kernel [datafusion]

2024-08-25 Thread via GitHub
jayzhan211 opened a new issue, #12163: URL: https://github.com/apache/datafusion/issues/12163 ### Is your feature request related to a problem or challenge? We found that computing with Eq Kernel is much faster than RowConverter for `array_has` There are other functions that ha

[I] Error aliasing on double unnest on List[Struct] [datafusion]

2024-08-25 Thread via GitHub
Jeadie opened a new issue, #12162: URL: https://github.com/apache/datafusion/issues/12162 ### Describe the bug Using a datafusion table with a column (`col`) of type `List[struct[]]`. Calling ```sql SELECT unnest(unnest(col)) as col_name FROM tbl ``` Results in the

[PR] feat: Add DateFieldExtractStyle::Strftime support for SqliteDialect [datafusion]

2024-08-25 Thread via GitHub
peasee opened a new pull request, #12161: URL: https://github.com/apache/datafusion/pull/12161 ## Which issue does this PR close? Closes #12160 ## Rationale for this change * In SQLite, the only way to extract date information is through `strftime`. The unparser

[I] Bug: SQLite unparser does not support date extraction functions [datafusion]

2024-08-25 Thread via GitHub
peasee opened a new issue, #12160: URL: https://github.com/apache/datafusion/issues/12160 ### Describe the bug Using an SQLite connection with `SqliteDialect`, and run a query that uses a date extraction function like `extract(year from date_column)`. The unparser rewrites this to `d
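The SQLite behaviour behind this report can be reproduced outside DataFusion. The following is a minimal sketch using Python's standard `sqlite3` module, not the unparser's actual output: the `events` table and its sample row are hypothetical, and wrapping `strftime` in `CAST(... AS INTEGER)` is just one plausible way to get a numeric year.

```python
import sqlite3

# Hypothetical in-memory table standing in for a real date column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (date_column TEXT)")
conn.execute("INSERT INTO events VALUES ('2024-08-25')")

# SQLite has no EXTRACT(... FROM ...) syntax, so this statement fails to parse.
try:
    conn.execute("SELECT extract(year FROM date_column) FROM events")
    extract_supported = True
except sqlite3.OperationalError:
    extract_supported = False

# strftime returns the year as text; CAST converts it to an integer.
year = conn.execute(
    "SELECT CAST(strftime('%Y', date_column) AS INTEGER) FROM events"
).fetchone()[0]

print(extract_supported, year)
```

Other date parts follow the same pattern with different format codes, for example `%m` for the month and `%d` for the day.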

Re: [I] Update `regexp_replace` scalar function to support Utf8View [datafusion]

2024-08-25 Thread via GitHub
devanbenz commented on issue #11912: URL: https://github.com/apache/datafusion/issues/11912#issuecomment-2309024127 @PsiACE If you are not currently working on this I'd be happy to take it. @2010YOUY01 I recently worked on: https://github.com/apache/datafusion/pull/11967 which had a similar

Re: [PR] fix: Skip buffered rows which are not joined with streamed side when checking join filter results [datafusion]

2024-08-25 Thread via GitHub
viirya commented on code in PR #12159: URL: https://github.com/apache/datafusion/pull/12159#discussion_r1730441508 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1474,6 +1474,12 @@ impl SMJStream { [chunk.buffered_batch_idx.unwr

[PR] fix: Skip buffered rows which are not joined with streamed side when checking join filter results [datafusion]

2024-08-25 Thread via GitHub
viirya opened a new pull request, #12159: URL: https://github.com/apache/datafusion/pull/12159 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Bloom filter Join Step I: create benchmark [datafusion]

2024-08-25 Thread via GitHub
Lordworms commented on PR #11933: URL: https://github.com/apache/datafusion/pull/11933#issuecomment-2309009653 I think it is worth a try to implement join predicate pushdown.

Re: [PR] Bloom filter Join Step I: create benchmark [datafusion]

2024-08-25 Thread via GitHub
Lordworms commented on PR #11933: URL: https://github.com/apache/datafusion/pull/11933#issuecomment-2309009565 For the second case, 95% of the time is spent on the join ![image](https://github.com/user-attachments/assets/31f9821e-4988-42c5-afcc-16f72740af7d)

Re: [PR] Bloom filter Join Step I: create benchmark [datafusion]

2024-08-25 Thread via GitHub
Lordworms commented on PR #11933: URL: https://github.com/apache/datafusion/pull/11933#issuecomment-2308986991 for TPCH query 17, when we create 100 rows for lineitem and part table, the time spent on join is 50% (the other 80% of time spent on creating parquet files) ![Screenshot 20

Re: [I] Memory account not adding up in SortExec [datafusion]

2024-08-25 Thread via GitHub
yjshen commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2308985824 Another point of code worth noticing is inside the current `sort_batch` implementation: https://github.com/apache/datafusion/blob/79fa6f9098be9a6e5b269cd3642694765b230ff1/data

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1730419416 ## datafusion/physical-plan/src/aggregates/row_hash.rs: ## @@ -529,10 +552,53 @@ impl GroupedHashAggregateStream { spill_state, group_v

Re: [I] Update `CONTAINS` scalar function to support `Utf8View` [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on issue #11838: URL: https://github.com/apache/datafusion/issues/11838#issuecomment-2308968595 > @Rachelint a gentle ping 😃 Are you working on this? I'm willing to take this one to speed up the progress of this epic task 💪 Ok, feel free to take it, I am still stru

Re: [PR] Minor: refine Partitioning documentation [datafusion]

2024-08-25 Thread via GitHub
comphead commented on code in PR #12145: URL: https://github.com/apache/datafusion/pull/12145#discussion_r1730417769 ## datafusion/physical-expr/src/partitioning.rs: ## @@ -24,8 +24,8 @@ use crate::{physical_exprs_equal, EquivalenceProperties, PhysicalExpr}; /// Output parti

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2308963372 Current progress: 1. Tried best to merge the accumulator logic in two modes (flat and blocked) Mentioned in: https://github.com/apache/datafusion/pull/11943#discussi

Re: [I] Memory account not adding up in SortExec [datafusion]

2024-08-25 Thread via GitHub
yjshen commented on issue #10073: URL: https://github.com/apache/datafusion/issues/10073#issuecomment-2308963109 > Through examining the current implementation of multi-column sort's spill-to-disk strategies, I find we are asking for more memory during spill, which I think is worth discussi

Re: [I] Update `CONTAINS` scalar function to support `Utf8View` [datafusion]

2024-08-25 Thread via GitHub
tlm365 commented on issue #11838: URL: https://github.com/apache/datafusion/issues/11838#issuecomment-2308960689 @Rachelint a gentle ping 😃 Are you working on this? I'm willing to take this one to speed up the progress of this epic task 💪

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1730379309 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator/prim_op.rs: ## @@ -92,32 +101,69 @@ where opt_filter: Option<&BooleanArray>,

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1730409701 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -367,6 +379,289 @@ impl VecAllocExt for Vec { } } +pub trait EmitToEx

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1730408835 ## datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs: ## @@ -367,6 +379,289 @@ impl VecAllocExt for Vec { } } +pub trait EmitToEx

Re: [PR] Make RuntimeEnvBuilder rather than RuntimeConfig [datafusion]

2024-08-25 Thread via GitHub
devanbenz commented on code in PR #12157: URL: https://github.com/apache/datafusion/pull/12157#discussion_r1730406539 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -141,7 +141,7 @@ async fn join_by_expression() { TestCase::new() .with_query("select t1.* from

Re: [PR] chore: Update versions to 0.3.0 / 0.3.0-SNAPSHOT [datafusion-comet]

2024-08-25 Thread via GitHub
viirya commented on PR #868: URL: https://github.com/apache/datafusion-comet/pull/868#issuecomment-2308946515 We also need to update Spark diffs: ``` [error] not found: https://maven-central.storage-download.googleapis.com/maven2/org/apache/comet/comet-spark-spark4.0_2.13/0.2.0-S

Re: [I] Create a scalar from array of type Map [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on issue #6485: URL: https://github.com/apache/datafusion/issues/6485#issuecomment-2308937485 > @Rachelint are you still working on this? If not I would like to take it. No, I am not working on it, but it seems similar to #11128? And I have fixed #11128.

Re: [PR] Make RuntimeEnvBuilder rather than RuntimeConfig [datafusion]

2024-08-25 Thread via GitHub
yjshen commented on code in PR #12157: URL: https://github.com/apache/datafusion/pull/12157#discussion_r1730400765 ## datafusion/core/tests/memory_limit/mod.rs: ## @@ -141,7 +141,7 @@ async fn join_by_expression() { TestCase::new() .with_query("select t1.* from t t

Re: [I] Create a scalar from array of type Map [datafusion]

2024-08-25 Thread via GitHub
devanbenz commented on issue #6485: URL: https://github.com/apache/datafusion/issues/6485#issuecomment-2308935258 @Rachelint are you still working on this? If not I would like to take it.

Re: [I] Remove `Alias` from `Expr` [datafusion]

2024-08-25 Thread via GitHub
findepi commented on issue #1468: URL: https://github.com/apache/datafusion/issues/1468#issuecomment-2308928369 Besides fixing some problems (described in this issue + https://github.com/apache/datafusion/issues/1468#issuecomment-2308275476), separating "expression" and "named expression" c

Re: [PR] chore: Update versions to 0.3.0 / 0.3.0-SNAPSHOT [datafusion-comet]

2024-08-25 Thread via GitHub
codecov-commenter commented on PR #868: URL: https://github.com/apache/datafusion-comet/pull/868#issuecomment-2308914498 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/868?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] WIP: experiment with SMJ last buffered batch [datafusion]

2024-08-25 Thread via GitHub
korowa commented on code in PR #12082: URL: https://github.com/apache/datafusion/pull/12082#discussion_r1730388571 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1356,16 +1392,82 @@ impl SMJStream { pre_mask.clone()

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1730389082 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -143,6 +145,25 @@ pub trait GroupsAccumulator: Send { /// [`Accumulator::state`]: crate::accumula

Re: [PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-25 Thread via GitHub
Rachelint commented on code in PR #11943: URL: https://github.com/apache/datafusion/pull/11943#discussion_r1730388850 ## datafusion/expr-common/src/groups_accumulator.rs: ## @@ -143,6 +145,25 @@ pub trait GroupsAccumulator: Send { /// [`Accumulator::state`]: crate::accumula

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-25 Thread via GitHub
matthewmturner commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2308908529 @alamb I still need to add back the query history / context info, and I am improving the ergonomics of navigating query results now.

Re: [PR] Minor: allow to build RuntimeEnv from RuntimeConfig [datafusion]

2024-08-25 Thread via GitHub
theirix commented on PR #12151: URL: https://github.com/apache/datafusion/pull/12151#issuecomment-2308908105 Thank you! I think tests would benefit from this change, and I see @devanbenz is up to improving tests.

Re: [PR] Unparse TableScan with projections, filters or fetch to SQL string [datafusion]

2024-08-25 Thread via GitHub
goldmedal commented on code in PR #12158: URL: https://github.com/apache/datafusion/pull/12158#discussion_r1730386778 ## datafusion/sql/src/unparser/plan.rs: ## @@ -507,6 +523,72 @@ impl Unparser<'_> { } } +fn unparse_table_scan_pushdown( +plan: &Logi

[PR] Unparse TableScan with projections, filters or fetch to SQL string [datafusion]

2024-08-25 Thread via GitHub
goldmedal opened a new pull request, #12158: URL: https://github.com/apache/datafusion/pull/12158 ## Which issue does this PR close? Closes #12154. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] Remove `Alias` from `Expr` [datafusion]

2024-08-25 Thread via GitHub
alamb commented on issue #1468: URL: https://github.com/apache/datafusion/issues/1468#issuecomment-2308899611 > one thing is how easy it is to get there (i get that likely pretty hard per "alias is used widely") I agree > and the other is whether we would want to get there at a

Re: [I] Proposal: Create `dfdb`, a new CLI different than `datafusion-cli` with pre-built integrations [datafusion]

2024-08-25 Thread via GitHub
alamb commented on issue #11979: URL: https://github.com/apache/datafusion/issues/11979#issuecomment-2308896943 > @alamb When did you plan on starting to work on this? I keep telling myself "tomorrow" but then I end up getting carried away reviewing all the other good stuff going on

[PR] Update versions to 0.3.0 / 0.3.0-SNAPSHOT [datafusion-comet]

2024-08-25 Thread via GitHub
andygrove opened a new pull request, #868: URL: https://github.com/apache/datafusion-comet/pull/868 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] WIP: add documentation on `EXPLAIN PLAN` [datafusion]

2024-08-25 Thread via GitHub
devanbenz commented on PR #12122: URL: https://github.com/apache/datafusion/pull/12122#issuecomment-2308894495 @alamb Please let me know if there's anything else you would like me to elaborate on within the aggregate plan. Taking a look at the original issue section I was seeing: https:

[PR] Make RuntimeEnvBuilder rather than RuntimeConfig [datafusion]

2024-08-25 Thread via GitHub
devanbenz opened a new pull request, #12157: URL: https://github.com/apache/datafusion/pull/12157 Signed-off-by: Devan ## Which issue does this PR close? Closes #12156 ## Rationale for this change Please see the issue #12156 for rationale. ## What

Re: [PR] fix: Support type coercion for ScalarUDFs [datafusion-comet]

2024-08-25 Thread via GitHub
andygrove commented on PR #865: URL: https://github.com/apache/datafusion-comet/pull/865#issuecomment-2308893139 Thanks @Kimahriman. I think this looks good but want to do some testing before approving.

Re: [PR] Fix performance regression with `stddev` being enabled by default [datafusion-comet]

2024-08-25 Thread via GitHub
andygrove closed pull request #840: Fix performance regression with `stddev` being enabled by default URL: https://github.com/apache/datafusion-comet/pull/840

Re: [PR] Fix performance regression with `stddev` being enabled by default [datafusion-comet]

2024-08-25 Thread via GitHub
andygrove commented on PR #840: URL: https://github.com/apache/datafusion-comet/pull/840#issuecomment-2308891890 I am closing this issue because we can disable this expression via a config now that https://github.com/apache/datafusion-comet/pull/855 is merged

Re: [I] Remove `Alias` from `Expr` [datafusion]

2024-08-25 Thread via GitHub
findepi commented on issue #1468: URL: https://github.com/apache/datafusion/issues/1468#issuecomment-2308891696 > I think Sort would be an easier thing to remove / fix -- `Expr::Sort` as an expression is also bad as it means the signatures of `fn order_by(...)` are in terms of `Expr`, meani

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-25 Thread via GitHub
andygrove commented on PR #866: URL: https://github.com/apache/datafusion-comet/pull/866#issuecomment-2308890338 Thanks for the review @comphead and @kazuyukitanimura. I created a `branch-0.2` and pushed `0.2.0-rc1` tag. Once the Docker images are published by https://github.com/apache/data

Re: [PR] Implement groups accumulator for stddev and variance [datafusion]

2024-08-25 Thread via GitHub
alamb commented on PR #12095: URL: https://github.com/apache/datafusion/pull/12095#issuecomment-2308889354 I'll plan to merge this tomorrow unless there are additional comments

Re: [PR] chore: Prepare 0.2.0 release [datafusion-comet]

2024-08-25 Thread via GitHub
andygrove merged PR #866: URL: https://github.com/apache/datafusion-comet/pull/866

Re: [PR] Improve StringView support for SUBSTR [datafusion]

2024-08-25 Thread via GitHub
alamb commented on PR #12044: URL: https://github.com/apache/datafusion/pull/12044#issuecomment-2308886041 Thanks @Kev1n8 and @XiangpengHao -- I am running some benchmarks on this now. BTW I was thinking that once we have completed https://github.com/apache/datafusion/issues/12

Re: [PR] Add ability to return `LogicalPlan` by value from `TableProvider` [datafusion]

2024-08-25 Thread via GitHub
alamb commented on code in PR #12113: URL: https://github.com/apache/datafusion/pull/12113#discussion_r1730375686 ## datafusion/optimizer/src/analyzer/inline_table_scan.rs: ## @@ -56,24 +56,23 @@ fn analyze_internal(plan: LogicalPlan) -> Result> { match plan {

Re: [PR] Set of small features [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on PR #839: URL: https://github.com/apache/datafusion-python/pull/839#issuecomment-2308873120 TODO: - [ ] Add unit tests for dataframe transform - [ ] Add unit test for _repr_html_ - [ ] Add documentation example for chaining dataframe operations - [ ] Ad

[PR] Set of small features [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer opened a new pull request, #839: URL: https://github.com/apache/datafusion-python/pull/839 # Which issue does this PR close? Closes #713 Closes #807 Closes #810 Also no issue assigned but fixes a bug in `functions.array()` # Rationale for this change

Re: [I] Make `RuntimeEnvBuilder` rather than `RuntimeConfig` [datafusion]

2024-08-25 Thread via GitHub
devanbenz commented on issue #12156: URL: https://github.com/apache/datafusion/issues/12156#issuecomment-2308863091 take

Re: [I] Add min_by and max_by aggregate functions [datafusion]

2024-08-25 Thread via GitHub
alamb commented on issue #12075: URL: https://github.com/apache/datafusion/issues/12075#issuecomment-2308848278 > As I'm thinking about it, I'm not sure you can get around doing the sort since you have an arbitrary number of ordering clauses. I think what you've proposed is the best option.

Re: [I] Deterministic IDs for ExecutionPlan [datafusion]

2024-08-25 Thread via GitHub
ozankabak commented on issue #11364: URL: https://github.com/apache/datafusion/issues/11364#issuecomment-2308848324 Overall sounds reasonable, I will circle back tomorrow after discussing with Synnada folks. Maybe we can upstream some code to help.

[I] Replace `with_column` with `with_columns` and allow for multiple at one [datafusion-python]

2024-08-25 Thread via GitHub
ion-elgreco opened a new issue, #838: URL: https://github.com/apache/datafusion-python/issues/838 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** I want to add a bunch of columns at once. Would be cleaner if you can do this in

Re: [PR] Run ruff format in CI [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on PR #837: URL: https://github.com/apache/datafusion-python/pull/837#issuecomment-2308844407 @Michael-J-Ward This should resolve the formatting issues in my other PR

Re: [PR] Run ruff format in CI [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on PR #837: URL: https://github.com/apache/datafusion-python/pull/837#issuecomment-2308844033 Evidence that it passes when formatting applied: https://github.com/apache/datafusion-python/actions/runs/10547389324/job/29220013949?pr=837

Re: [PR] Run ruff format in CI [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on PR #837: URL: https://github.com/apache/datafusion-python/pull/837#issuecomment-2308842457 Evidence this is correctly finding files that do not have ruff formatting applied: https://github.com/apache/datafusion-python/actions/runs/10547368222/job/29219963769?p

[PR] Run ruff format in CI [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer opened a new pull request, #837: URL: https://github.com/apache/datafusion-python/pull/837 # Which issue does this PR close? Adds format check for python files to match pre-commit # Rationale for this change We are getting some commits that are reformatting cod

Re: [PR] Add Window Functions for use with function builder [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on code in PR #808: URL: https://github.com/apache/datafusion-python/pull/808#discussion_r1730340120 ## python/datafusion/functions.py: ## @@ -1479,12 +1502,17 @@ def approx_percentile_cont( """Returns the value that is approximately at a given percentil

[PR] Feature/expose when function [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer opened a new pull request, #836: URL: https://github.com/apache/datafusion-python/pull/836 # Which issue does this PR close? None. # Rationale for this change We already have `case` functions exposed, but we do not have the `when` function exposed, which gives

Re: [PR] Add Window Functions for use with function builder [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on PR #808: URL: https://github.com/apache/datafusion-python/pull/808#issuecomment-2308818882 That's a great suggestion about using the defaults instead. I'll convert this over to that approach. It will be a much tighter interface.

Re: [PR] Add Window Functions for use with function builder [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on code in PR #808: URL: https://github.com/apache/datafusion-python/pull/808#discussion_r1730328211 ## python/datafusion/functions.py: ## @@ -1479,12 +1502,17 @@ def approx_percentile_cont( """Returns the value that is approximately at a given percentil

Re: [PR] Add Window Functions for use with function builder [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on code in PR #808: URL: https://github.com/apache/datafusion-python/pull/808#discussion_r1730328056 ## docs/source/user-guide/common-operations/windows.rst: ## @@ -40,54 +43,86 @@ We'll use the pokemon dataset (from Ritchie Vink) in the following examples.

Re: [I] Add min_by and max_by aggregate functions [datafusion]

2024-08-25 Thread via GitHub
timsaucer commented on issue #12075: URL: https://github.com/apache/datafusion/issues/12075#issuecomment-2308817141 As I'm thinking about it, I'm not sure you can get around doing the sort since you have an arbitrary number of ordering clauses. I think what you've proposed is the best optio

Re: [PR] Add PyCapsule support for Arrow import and export [datafusion-python]

2024-08-25 Thread via GitHub
timsaucer commented on code in PR #825: URL: https://github.com/apache/datafusion-python/pull/825#discussion_r1730325969 ## src/dataframe.rs: ## @@ -451,6 +458,40 @@ impl PyDataFrame { Ok(table) } +fn __arrow_c_stream__<'py>( +&'py mut self, +

Re: [PR] Improve documentation on `StringArrayType` trait [datafusion]

2024-08-25 Thread via GitHub
alamb commented on PR #12027: URL: https://github.com/apache/datafusion/pull/12027#issuecomment-2308791022 > The StringArrays may be the nicest API wise but it does incur unavoidable overhead for anything but .iter() (or at least I couldn't find a way to make that approach faster). I would

Re: [I] Deterministic IDs for ExecutionPlan [datafusion]

2024-08-25 Thread via GitHub
alamb commented on issue #11364: URL: https://github.com/apache/datafusion/issues/11364#issuecomment-2308790111 > Once the final physical plan is generated in create_physical_plan() after all the optimizer passes, we can then visit all the nodes and call "with_node_id" on them. Does this so
