Re: [PR] fix: bugs when having and group by are all false [datafusion]

2024-08-12 Thread via GitHub
jonahgao commented on PR #11897: URL: https://github.com/apache/datafusion/pull/11897#issuecomment-2283248937 > So this PR actually fixed two bugs, (1. global group_by + having false. 2. global group_by + having true) The following case seems to be caused by `EliminateGroupByConstant`

[PR] Sketch for aggregation intermediate results blocked management [datafusion]

2024-08-12 Thread via GitHub
Rachelint opened a new pull request, #11943: URL: https://github.com/apache/datafusion/pull/11943 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] fix: bugs when having and group by are all false [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 commented on PR #11897: URL: https://github.com/apache/datafusion/pull/11897#issuecomment-2283302352 Ideally group by constant should be eliminated, but the result is different when there is no row and we can't differentiate it after `EliminateGroupByConstant`. I think thi

[PR] TEST CI only [datafusion-comet]

2024-08-12 Thread via GitHub
viirya opened a new pull request, #812: URL: https://github.com/apache/datafusion-comet/pull/812 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

Re: [PR] Generate GroupByHash output in multiple RecordBatches [datafusion]

2024-08-12 Thread via GitHub
JasonLi-cn commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2283323779 > > > I agree, finally it should be a big change which switches the group values and related states managed by block like duckdb , and I am working on this(#11931). > > > >

Re: [PR] fix: bugs when having and group by are all false [datafusion]

2024-08-12 Thread via GitHub
jonahgao commented on PR #11897: URL: https://github.com/apache/datafusion/pull/11897#issuecomment-2283416590 I agree to disable `EliminateGroupByConstant` because it does not work correctly with empty input. -- This is an automated message from the Apache Git Service. To respond to the m
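The empty-input problem discussed in this thread can be sketched without DataFusion at all. In standard SQL, an aggregate with no grouping keys returns exactly one row even when the input is empty, while a grouped aggregate returns one row per group and therefore no rows at all. Rewriting `GROUP BY <constant>` into a global aggregate therefore changes the result for empty input. The functions below are hypothetical illustrations in plain Rust, not DataFusion APIs.

```rust
use std::collections::BTreeMap;

/// COUNT(*) with no GROUP BY: always exactly one output row,
/// even when `rows` is empty (the count is then 0).
fn global_count(rows: &[i64]) -> Vec<usize> {
    vec![rows.len()]
}

/// COUNT(*) with a GROUP BY key: one output row per distinct key,
/// so an empty input produces zero output rows.
fn grouped_count<K: Ord>(rows: &[i64], key: impl Fn(&i64) -> K) -> Vec<usize> {
    let mut groups: BTreeMap<K, usize> = BTreeMap::new();
    for row in rows {
        *groups.entry(key(row)).or_insert(0) += 1;
    }
    groups.into_values().collect()
}
```

With a constant key such as `|_| true`, both functions agree on non-empty input, but on empty input `global_count` yields one row and `grouped_count` yields none.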

Re: [PR] Parse Sqllogictest column types from physical schema [datafusion]

2024-08-12 Thread via GitHub
jonahgao commented on PR #11929: URL: https://github.com/apache/datafusion/pull/11929#issuecomment-2283419507 Thanks for the review @alamb @jayzhan211

Re: [PR] Parse Sqllogictest column types from physical schema [datafusion]

2024-08-12 Thread via GitHub
jonahgao merged PR #11929: URL: https://github.com/apache/datafusion/pull/11929

[PR] chore(deps): update substrait requirement from 0.36.0 to 0.40.0 [datafusion]

2024-08-12 Thread via GitHub
dependabot[bot] opened a new pull request, #11944: URL: https://github.com/apache/datafusion/pull/11944 Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version. Release notes Sourced from https://github.com/substrait-io/su

Re: [PR] chore(deps): update substrait requirement from 0.36.0 to 0.39.0 [datafusion]

2024-08-12 Thread via GitHub
dependabot[bot] closed pull request #11842: chore(deps): update substrait requirement from 0.36.0 to 0.39.0 URL: https://github.com/apache/datafusion/pull/11842

Re: [PR] chore(deps): update substrait requirement from 0.36.0 to 0.39.0 [datafusion]

2024-08-12 Thread via GitHub
dependabot[bot] commented on PR #11842: URL: https://github.com/apache/datafusion/pull/11842#issuecomment-2283421829 Superseded by #11944.

Re: [I] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-08-12 Thread via GitHub
crepererum commented on issue #4028: URL: https://github.com/apache/datafusion/issues/4028#issuecomment-2283437288 No need to rush. I think @alamb's comment was mostly meant as an assignment signal, so that nobody else starts to work on it and we end up wasting resources 🙂

Re: [PR] Generate GroupByHash output in multiple RecordBatches [datafusion]

2024-08-12 Thread via GitHub
Rachelint commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2283482260 @JasonLi-cn As I think, maybe we should impl the special block based `GroupValues` impls: - We pass the `block size` when initializing it - It manage the inner values block by

[PR] Move `LimitPushdown` to physical-optimizer crate [datafusion]

2024-08-12 Thread via GitHub
lewiszlw opened a new pull request, #11945: URL: https://github.com/apache/datafusion/pull/11945 ## Which issue does this PR close? part of https://github.com/apache/datafusion/issues/11502. ## Rationale for this change ## What changes are included in this

Re: [PR] Generate GroupByHash output in multiple RecordBatches [datafusion]

2024-08-12 Thread via GitHub
JasonLi-cn commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2283566894 > @JasonLi-cn As I think, `GroupValues` impls maybe should not care about the `batch size`? And we just do the `split and merge` work in the `GroupedHashAggregateStream::poll` , i

Re: [I] Write a blog post about implementing StringView in DataFusion [datafusion]

2024-08-12 Thread via GitHub
alamb commented on issue #11603: URL: https://github.com/apache/datafusion/issues/11603#issuecomment-2283594648 We are done with the draft. We expect it to be published in the next few weeks (it turns out to be a two part series)

Re: [I] Return TableProviderFilterPushDown::Exact when Parquet Pushdown Enabled [datafusion]

2024-08-12 Thread via GitHub
alamb commented on issue #4028: URL: https://github.com/apache/datafusion/issues/4028#issuecomment-2283600292 > No need to rush. I think @alamb's comment was mostly meant as an assignment signal, so that nobody else starts to work on it and we end up wasting resources 🙂 Yes this is w

Re: [PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
alamb merged PR #11888: URL: https://github.com/apache/datafusion/pull/11888

Re: [I] Update `INITCAP` scalar function to support `Utf8View` [datafusion]

2024-08-12 Thread via GitHub
alamb closed issue #11853: Update `INITCAP` scalar function to support `Utf8View` URL: https://github.com/apache/datafusion/issues/11853

Re: [PR] Update INITCAP scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
alamb commented on PR #11888: URL: https://github.com/apache/datafusion/pull/11888#issuecomment-2283612658 Thanks again @xinlifoobar and @XiangpengHao

Re: [PR] Implement native support StringView for Octet Length [datafusion]

2024-08-12 Thread via GitHub
alamb merged PR #11906: URL: https://github.com/apache/datafusion/pull/11906

Re: [PR] Implement native support StringView for Octet Length [datafusion]

2024-08-12 Thread via GitHub
alamb commented on PR #11906: URL: https://github.com/apache/datafusion/pull/11906#issuecomment-2283613083 Thanks again @PsiACE

Re: [I] Update `OCTET_LENGTH` scalar function to support `Utf8View` [datafusion]

2024-08-12 Thread via GitHub
alamb closed issue #11858: Update `OCTET_LENGTH` scalar function to support `Utf8View` URL: https://github.com/apache/datafusion/issues/11858

Re: [I] Implement `ordering` serialization for `AggregateUdf` in `datafusion-proto` [datafusion]

2024-08-12 Thread via GitHub
alamb closed issue #11804: Implement `ordering` serialization for `AggregateUdf` in `datafusion-proto` URL: https://github.com/apache/datafusion/issues/11804

Re: [PR] chore: Add SessionState to MockContextProvider just like SessionContextProvider [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11940: URL: https://github.com/apache/datafusion/pull/11940#discussion_r1713505076 ## datafusion/sql/tests/sql_integration.rs: ## @@ -2739,39 +2742,44 @@ fn logical_plan_with_dialect_and_options( dialect: &dyn Dialect, options: ParserOpti

Re: [PR] fix: impl ordering for serialization/deserialization for AggregateUdf [datafusion]

2024-08-12 Thread via GitHub
alamb merged PR #11926: URL: https://github.com/apache/datafusion/pull/11926

Re: [PR] fix: impl ordering for serialization/deserialization for AggregateUdf [datafusion]

2024-08-12 Thread via GitHub
alamb commented on PR #11926: URL: https://github.com/apache/datafusion/pull/11926#issuecomment-2283616969 Thanks @jayzhan211 and @haohuaijin

Re: [PR] Support tuples as types [datafusion]

2024-08-12 Thread via GitHub
alamb commented on PR #11896: URL: https://github.com/apache/datafusion/pull/11896#issuecomment-2283621173 I took the liberty of merging up from main to resolve a conflict -- I think we could merge this PR and address the comments as a follow on PR, or we can do it prior to merging this one

Re: [PR] Move `LimitPushdown` to physical-optimizer crate [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11945: URL: https://github.com/apache/datafusion/pull/11945#discussion_r1713520236 ## datafusion/common/src/utils/mod.rs: ## @@ -683,6 +683,69 @@ pub fn transpose<T>(original: Vec<Vec<T>>) -> Vec<Vec<T>> { } } +/// Computes the `skip` and `fetch` parameters

Re: [PR] Update RPAD scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11942: URL: https://github.com/apache/datafusion/pull/11942#discussion_r1713532418 ## datafusion/functions/src/unicode/rpad.rs: ## @@ -77,96 +82,122 @@ impl ScalarUDFImpl for RPadFunc { fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {

Re: [PR] Generate GroupByHash output in multiple RecordBatches [datafusion]

2024-08-12 Thread via GitHub
Rachelint commented on PR #11758: URL: https://github.com/apache/datafusion/pull/11758#issuecomment-2283653639 > > @JasonLi-cn As I think, `GroupValues` impls maybe should not care about the `batch size`? And we just do the `split and merge` work in the `GroupedHashAggregateStream::poll` ,

Re: [I] DataFusion weekly project plan (Andrew Lamb) - Aug 5, 2024 [datafusion]

2024-08-12 Thread via GitHub
alamb commented on issue #11826: URL: https://github.com/apache/datafusion/issues/11826#issuecomment-2283655669 PRs in need of review: DataFusion - [ ] https://github.com/apache/datafusion/pull/11456 - [ ] https://github.com/apache/datafusion/pull/11938 - [ ] https://github.co

Re: [I] Configurable null_equals_null flag [datafusion]

2024-08-12 Thread via GitHub
berkaysynnada commented on issue #11883: URL: https://github.com/apache/datafusion/issues/11883#issuecomment-2283657039 That makes sense. If such usages exist, there's no need to expose a new flag. It would be great if we could handle these two versions and link them to our internal join fl

Re: [PR] test: re-enable window function over parquet with forced collisions [datafusion]

2024-08-12 Thread via GitHub
alamb commented on PR #11939: URL: https://github.com/apache/datafusion/pull/11939#issuecomment-2283661518 I re-ran the relevant test: ```shell cargo test --test sqllogictests --features=force_hash_collisions -- parquet ``` And indeed it passes locally for me too `

Re: [I] Update `ENDS_WITH` scalar function to support `Utf8View` [datafusion]

2024-08-12 Thread via GitHub
alamb closed issue #11852: Update `ENDS_WITH` scalar function to support `Utf8View` URL: https://github.com/apache/datafusion/issues/11852

Re: [PR] Implement native support StringView for Ends With [datafusion]

2024-08-12 Thread via GitHub
alamb merged PR #11924: URL: https://github.com/apache/datafusion/pull/11924

Re: [PR] Implement native support StringView for Ends With [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11924: URL: https://github.com/apache/datafusion/pull/11924#discussion_r1713543104 ## datafusion/functions/src/string/ends_with.rs: ## @@ -43,14 +41,15 @@ impl Default for EndsWithFunc { impl EndsWithFunc { pub fn new() -> Self { -u

Re: [PR] Implement native support StringView for Levenshtein [datafusion]

2024-08-12 Thread via GitHub
alamb merged PR #11925: URL: https://github.com/apache/datafusion/pull/11925

Re: [I] Update `levenshtein` scalar function to support `Utf8View` [datafusion]

2024-08-12 Thread via GitHub
alamb closed issue #11854: Update `levenshtein` scalar function to support `Utf8View` URL: https://github.com/apache/datafusion/issues/11854

Re: [I] Update `OVERLAY` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
PsiACE commented on issue #11909: URL: https://github.com/apache/datafusion/issues/11909#issuecomment-2283746445 take

Re: [I] Update `REGEXP_LIKE` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
PsiACE commented on issue #11910: URL: https://github.com/apache/datafusion/issues/11910#issuecomment-2283749092 take

Re: [PR] Implement Utf8View for lpad scalar function [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11941: URL: https://github.com/apache/datafusion/pull/11941#discussion_r1713604384 ## datafusion/functions/src/unicode/lpad.rs: ## @@ -76,300 +87,450 @@ impl ScalarUDFImpl for LPadFunc { } fn invoke(&self, args: &[ColumnarValue]) -> Res

Re: [PR] Implement native stringview support for BTRIM [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11920: URL: https://github.com/apache/datafusion/pull/11920#discussion_r1713618503 ## datafusion/functions/src/string/common.rs: ## @@ -68,6 +69,74 @@ pub(crate) fn general_trim( }, }; +if use_string_view { +string_view_

Re: [I] [Epic] Native `StringView` support for string functions [datafusion]

2024-08-12 Thread via GitHub
alamb commented on issue #11790: URL: https://github.com/apache/datafusion/issues/11790#issuecomment-2283777166 One thing I have noticed during implementations is that some functions such as `ltrim`/`rtrim`/`btrim` could be more efficient if they produced Utf8View as *output* in addition to
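One reason view output can help trimming functions is that a trimmed string is a sub-range of its input, so a view-based result can point at the same bytes instead of copying them. The sketch below illustrates that idea in plain Rust. `View` and `ltrim_view` are hypothetical names, and the struct is a deliberate simplification: the real Arrow `Utf8View` layout also carries an inline prefix and a buffer index, which are omitted here.

```rust
/// Hypothetical, simplified string view: a byte range into a shared buffer.
/// (Assumption: real Arrow views also store a prefix and a buffer index.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Trims leading spaces by narrowing the view; no string bytes are copied.
fn ltrim_view(buffer: &str, view: View) -> View {
    let s = &buffer[view.offset..view.offset + view.len];
    let trimmed = s.trim_start_matches(' ');
    // Number of leading bytes dropped by the trim.
    let skipped = s.len() - trimmed.len();
    View {
        offset: view.offset + skipped,
        len: trimmed.len(),
    }
}
```

Producing a contiguous `Utf8` array instead would require copying every trimmed value into a new data buffer.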

Re: [I] Update `LTRIM` scalar function to support `Utf8View` [datafusion]

2024-08-12 Thread via GitHub
alamb commented on issue #11856: URL: https://github.com/apache/datafusion/issues/11856#issuecomment-2283779062 After https://github.com/apache/datafusion/pull/11920 from @Kev1n8 I think this one will be quick to implement

Re: [I] Update `RTRIM` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
alamb commented on issue #11916: URL: https://github.com/apache/datafusion/issues/11916#issuecomment-2283779205 After https://github.com/apache/datafusion/pull/11920 from @Kev1n8 I think this one will be quick to implement

Re: [PR] Implement native stringview support for BTRIM [datafusion]

2024-08-12 Thread via GitHub
alamb merged PR #11920: URL: https://github.com/apache/datafusion/pull/11920

Re: [I] Update the `BTRIM` scalar function to support `Utf8View` [datafusion]

2024-08-12 Thread via GitHub
alamb closed issue #11835: Update the `BTRIM` scalar function to support `Utf8View` URL: https://github.com/apache/datafusion/issues/11835

[I] Add additional regexp functions [datafusion]

2024-08-12 Thread via GitHub
timsaucer opened a new issue, #11946: URL: https://github.com/apache/datafusion/issues/11946 ### Is your feature request related to a problem or challenge? I would like to see the following regexp functions implemented. These exist in some, but not all, versions of PostgreSQL.

Re: [PR] Make `Precision` copy to make it clear clones are not expensive [datafusion]

2024-08-12 Thread via GitHub
crepererum commented on code in PR #11828: URL: https://github.com/apache/datafusion/pull/11828#discussion_r1713654802 ## datafusion/common/src/stats.rs: ## @@ -25,7 +25,7 @@ use arrow_schema::Schema; /// Represents a value with a degree of certainty. `Precision` is used to

Re: [PR] fix: Sort on single struct should fallback to Spark [datafusion-comet]

2024-08-12 Thread via GitHub
andygrove commented on code in PR #811: URL: https://github.com/apache/datafusion-comet/pull/811#discussion_r1713770493 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2501,6 +2501,13 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde w

Re: [PR] feat: Implement to_json for subset of types [datafusion-comet]

2024-08-12 Thread via GitHub
andygrove commented on PR #805: URL: https://github.com/apache/datafusion-comet/pull/805#issuecomment-2283984392 @parthchandra could you review?

Re: [PR] Update RPAD scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
Omega359 commented on code in PR #11942: URL: https://github.com/apache/datafusion/pull/11942#discussion_r1713779456 ## datafusion/functions/src/unicode/rpad.rs: ## @@ -77,96 +82,122 @@ impl ScalarUDFImpl for RPadFunc { fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {

[PR] minor: Update release documentation based on 41.0.0 release [datafusion]

2024-08-12 Thread via GitHub
andygrove opened a new pull request, #11947: URL: https://github.com/apache/datafusion/pull/11947 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/11476 ## Rationale for this change Update documentation to add new crat

[PR] Add native stringview support for LTRIM & RTRIM [datafusion]

2024-08-12 Thread via GitHub
Kev1n8 opened a new pull request, #11948: URL: https://github.com/apache/datafusion/pull/11948 ## Which issue does this PR close? Closes #11856 and #11916 ## Rationale for this change ## What changes are included in this PR? Added `ltrim` and `rtrim` fu

Re: [PR] minor: Update release documentation based on 41.0.0 release [datafusion]

2024-08-12 Thread via GitHub
andygrove commented on code in PR #11947: URL: https://github.com/apache/datafusion/pull/11947#discussion_r1713783391 ## datafusion/catalog/Cargo.toml: ## @@ -17,6 +17,7 @@ [package] name = "datafusion-catalog" +description = "datafusion-catalog" Review Comment: I had to

Re: [PR] Update RPAD scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
Omega359 commented on code in PR #11942: URL: https://github.com/apache/datafusion/pull/11942#discussion_r1713789099 ## datafusion/functions/src/unicode/rpad.rs: ## @@ -46,8 +48,11 @@ impl RPadFunc { signature: Signature::one_of( vec![

Re: [PR] Implement Utf8View for lpad scalar function [datafusion]

2024-08-12 Thread via GitHub
Omega359 commented on PR #11941: URL: https://github.com/apache/datafusion/pull/11941#issuecomment-2284011527 I was looking at @Lordworms implementation in https://github.com/apache/datafusion/pull/11942 and I think it would make sense to align the two implementations. In some aspects I lik

Re: [PR] Move `LimitPushdown` to physical-optimizer crate [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 merged PR #11945: URL: https://github.com/apache/datafusion/pull/11945

Re: [PR] Move `LimitPushdown` to physical-optimizer crate [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 commented on PR #11945: URL: https://github.com/apache/datafusion/pull/11945#issuecomment-2284039319 Thanks @lewiszlw @alamb

Re: [PR] chore: Enable shuffle in micro benchmarks [datafusion-comet]

2024-08-12 Thread via GitHub
andygrove merged PR #806: URL: https://github.com/apache/datafusion-comet/pull/806

Re: [I] Update `RIGHT` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
Kev1n8 commented on issue #11917: URL: https://github.com/apache/datafusion/issues/11917#issuecomment-2284063257 take

Re: [PR] Minor: Improve comments in row_hash.rs for skipping aggregation [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 commented on PR #11820: URL: https://github.com/apache/datafusion/pull/11820#issuecomment-2284066506 Thanks all

Re: [PR] Minor: Improve comments in row_hash.rs for skipping aggregation [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 merged PR #11820: URL: https://github.com/apache/datafusion/pull/11820

Re: [I] Update `REGEXP_MATCH` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
PsiACE commented on issue #11911: URL: https://github.com/apache/datafusion/issues/11911#issuecomment-2284066645 take

Re: [I] Update `regexp_replace` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
PsiACE commented on issue #11912: URL: https://github.com/apache/datafusion/issues/11912#issuecomment-2284067440 take

Re: [I] Manage group values and states by blocks in aggregation [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 commented on issue #11931: URL: https://github.com/apache/datafusion/issues/11931#issuecomment-2284071358 > /// For example, `n= 10`, `block size=4`, `n` will be aligned to 12, /// and finally 3 blocks will be returned. FirstBlocks(usize) I think emitting with
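The alignment arithmetic quoted in this comment (`n = 10`, `block size = 4`, aligned to 12, 3 blocks) can be checked with a small helper. This is a minimal sketch of the rounding only; `align_to_blocks` is a hypothetical name and is not taken from the DataFusion code under discussion.

```rust
/// Rounds `n` up to the next multiple of `block_size` and returns
/// `(aligned_n, number_of_blocks)`.
fn align_to_blocks(n: usize, block_size: usize) -> (usize, usize) {
    assert!(block_size > 0, "block size must be non-zero");
    // Ceiling division: how many whole blocks are needed to cover `n` rows.
    let blocks = (n + block_size - 1) / block_size;
    (blocks * block_size, blocks)
}
```

For the quoted example, `align_to_blocks(10, 4)` gives `(12, 3)`; a value that is already a multiple of the block size, such as 8, is left unchanged.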

Re: [PR] Update RPAD scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
Omega359 commented on PR #11942: URL: https://github.com/apache/datafusion/pull/11942#issuecomment-2284082623 You may want to look at the tests I added in https://github.com/apache/datafusion/pull/11941 as an example to verify that all the signature variants are tested.

Re: [I] [Epic] Native `StringView` support for string functions [datafusion]

2024-08-12 Thread via GitHub
2010YOUY01 commented on issue #11790: URL: https://github.com/apache/datafusion/issues/11790#issuecomment-2284100621 Inspired by @Omega359 's great PR https://github.com/apache/datafusion/pull/11941, I have some suggestion on testing `Utf8View` support for functions: Although most im

Re: [PR] Implement Utf8View for lpad scalar function [datafusion]

2024-08-12 Thread via GitHub
alamb commented on PR #11941: URL: https://github.com/apache/datafusion/pull/11941#issuecomment-2284118946 > I was looking at @Lordworms implementation in #11942 and I think it would make sense to align the two implementations. In some aspects I like his approach much more than the one I to

Re: [PR] Make `Precision` copy to make it clear clones are not expensive [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11828: URL: https://github.com/apache/datafusion/pull/11828#discussion_r1713874112 ## datafusion/common/src/stats.rs: ## @@ -25,7 +25,7 @@ use arrow_schema::Schema; /// Represents a value with a degree of certainty. `Precision` is used to /// p

Re: [PR] chore: Add SessionState to MockContextProvider just like SessionContextProvider [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 merged PR #11940: URL: https://github.com/apache/datafusion/pull/11940

Re: [PR] chore: Add SessionState to MockContextProvider just like SessionContextProvider [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 commented on PR #11940: URL: https://github.com/apache/datafusion/pull/11940#issuecomment-2284135678 Thanks @dharanad @alamb

Re: [I] chore: Add `SessionState` to `MockContextProvider` just like `SessionContextProvider` [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 closed issue #11531: chore: Add `SessionState` to `MockContextProvider` just like `SessionContextProvider` URL: https://github.com/apache/datafusion/issues/11531

Re: [I] Improve performance of inner hash join [datafusion-comet]

2024-08-12 Thread via GitHub
andygrove commented on issue #808: URL: https://github.com/apache/datafusion-comet/issues/808#issuecomment-2284140087 The `FilterExec` in the above example is even more expensive than the `HashJoinExec`. Evaluating the predicate is cheap but copying data to the filtered batch takes 99% of

Re: [PR] Implement Utf8View for lpad scalar function [datafusion]

2024-08-12 Thread via GitHub
Omega359 commented on PR #11941: URL: https://github.com/apache/datafusion/pull/11941#issuecomment-2284141051 > Do you think we should merge this PR and work on improvements in a follow on PR? I have no objections to that.

Re: [I] Support creating arrays with non-nullable elements in `make_array` [datafusion]

2024-08-12 Thread via GitHub
Kimahriman commented on issue #11923: URL: https://github.com/apache/datafusion/issues/11923#issuecomment-2284199250 After digging a little more it seems like there's some other issues that might make this significantly more difficult: - When `invoke` is called, there's no way to know

Re: [PR] feat: `CreateArray` support [datafusion-comet]

2024-08-12 Thread via GitHub
Kimahriman commented on code in PR #793: URL: https://github.com/apache/datafusion-comet/pull/793#discussion_r1713950119 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -2348,6 +2348,27 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde

Re: [I] Support creating arrays with non-nullable elements in `make_array` [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 commented on issue #11923: URL: https://github.com/apache/datafusion/issues/11923#issuecomment-2284236770 > ```rust > Ok(Arc::new(GenericListArraytry_new( > ``` We could have another `invoke_with_schema` function to get the `nullable` from schema. Of course, anoth

Re: [PR] Parse Sqllogictest column types from physical schema [datafusion]

2024-08-12 Thread via GitHub
alamb commented on code in PR #11929: URL: https://github.com/apache/datafusion/pull/11929#discussion_r1713973317 ## datafusion/sqllogictest/src/engines/datafusion_engine/runner.rs: ## @@ -69,9 +72,12 @@ impl sqllogictest::AsyncDB for DataFusion { async fn run_query(ctx: &Ses

Re: [I] Support creating arrays with non-nullable elements in `make_array` [datafusion]

2024-08-12 Thread via GitHub
jayzhan211 commented on issue #11923: URL: https://github.com/apache/datafusion/issues/11923#issuecomment-2284262048 I remember there was a discussion about replacing `ColumnarValue::Array(ArrayRef)` with `RecordBatch`, which contains a `Schema` ```rust #[derive(Clone, Debug, PartialEq)]

Re: [PR] Update RPAD scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
Omega359 commented on code in PR #11942: URL: https://github.com/apache/datafusion/pull/11942#discussion_r1713832682 ## datafusion/functions/src/unicode/rpad.rs: ## @@ -77,96 +82,122 @@ impl ScalarUDFImpl for RPadFunc { fn invoke(&self, args: &[ColumnarValue]) -> Result {

[PR] Use the tracked-consumers memory pool be the default. [datafusion]

2024-08-12 Thread via GitHub
wiedld opened a new pull request, #11949: URL: https://github.com/apache/datafusion/pull/11949 ## Which issue does this PR close? Closes #11523 ## Rationale for this change We would like the improved OOM error messages, which list the top overall memory reservation con

[I] Update `SPLIT_PART` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
tshauck opened a new issue, #11950: URL: https://github.com/apache/datafusion/issues/11950 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

[I] Update `STRPOS` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
tshauck opened a new issue, #11951: URL: https://github.com/apache/datafusion/issues/11951 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

[I] Update `SUBSTR` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
tshauck opened a new issue, #11952: URL: https://github.com/apache/datafusion/issues/11952 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

[I] Update `TRANSLATE` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
tshauck opened a new issue, #11953: URL: https://github.com/apache/datafusion/issues/11953 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

[I] Update `FIND_IN_SET` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
tshauck opened a new issue, #11954: URL: https://github.com/apache/datafusion/issues/11954 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [I] Update `REVERSE` scalar function to support Utf8View [datafusion]

2024-08-12 Thread via GitHub
Omega359 commented on issue #11915: URL: https://github.com/apache/datafusion/issues/11915#issuecomment-2284343484 take

Re: [PR] Add tests for StringView / character functions, fix `regexp_like` and `regexp_match` to work with StringView [datafusion]

2024-08-12 Thread via GitHub
tshauck commented on code in PR #11753: URL: https://github.com/apache/datafusion/pull/11753#discussion_r1714025767 ## datafusion/sqllogictest/test_files/string_view.slt: ## @@ -594,8 +518,417 @@ SELECT 228 0 NULL +## Ensure no casts for BTRIM +query TT +EXPLAIN SELECT

Re: [PR] Use the tracked-consumers memory pool be the default. [datafusion]

2024-08-12 Thread via GitHub
wiedld commented on PR #11949: URL: https://github.com/apache/datafusion/pull/11949#issuecomment-2284349835 Benchmark clickbench_1 | Query | main_base | 11523_tracked-consumers-default | Change | | ---

Re: [I] Improve performance of broadcast hash join [datafusion-comet]

2024-08-12 Thread via GitHub
andygrove commented on issue #808: URL: https://github.com/apache/datafusion-comet/issues/808#issuecomment-2284350076 The filter on the probe input is very simple (`col_0@0 IS NOT NULL`) and it should be possible to push down to the parquet scan?

Re: [PR] chore: Enable Comet shuffle with AQE coalesce partitions [datafusion-comet]

2024-08-12 Thread via GitHub
viirya closed pull request #651: chore: Enable Comet shuffle with AQE coalesce partitions URL: https://github.com/apache/datafusion-comet/pull/651

Re: [PR] fix: bugs when having and group by are all false [datafusion]

2024-08-12 Thread via GitHub
Lordworms commented on PR #11897: URL: https://github.com/apache/datafusion/pull/11897#issuecomment-2284447751 > EliminateGroupByConstant > Ideally group by constant should be eliminated, but the result is different when there is no row and we can't differentiate it after `El

Re: [I] Change default value of `datafusion.catalog.has_header` to `true` [datafusion]

2024-08-12 Thread via GitHub
alamb commented on issue #11936: URL: https://github.com/apache/datafusion/issues/11936#issuecomment-2284459200 I think defaulting to "no headers" is unlikely to have been a conscious choice (I suspect it was a historical accident)

[PR] Add native stringview support for RIGHT [datafusion]

2024-08-12 Thread via GitHub
Kev1n8 opened a new pull request, #11955: URL: https://github.com/apache/datafusion/pull/11955 ## Which issue does this PR close? Closes #11917 ## Rationale for this change ## What changes are included in this PR? Update signature of RIGHT to accept Utf

Re: [PR] Add native stringview support for RIGHT [datafusion]

2024-08-12 Thread via GitHub
Kev1n8 commented on PR #11955: URL: https://github.com/apache/datafusion/pull/11955#issuecomment-2284463629 Pasting a possible implementation of `StringViewArray` output here (current implementation returns `StringArray`): ``` fn string_view_right(args: &[ArrayRef]) -> Result {

Re: [PR] Update labeler.yml to match crates [datafusion]

2024-08-12 Thread via GitHub
alamb commented on PR #11937: URL: https://github.com/apache/datafusion/pull/11937#issuecomment-2284501265 Thanks for the review @comphead -- let's see how this works!
