Re: [I] CometSparkToColumnar should have different name for row vs columnar input [datafusion-comet]

2024-09-20 Thread via GitHub
JensonChoi commented on issue #936: URL: https://github.com/apache/datafusion-comet/issues/936#issuecomment-2365034248 Hey, I'd love to pick this up if possible. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[PR] Create try_new and change calls to new [datafusion]

2024-09-20 Thread via GitHub
OussamaSaoudi opened a new pull request, #12566: URL: https://github.com/apache/datafusion/pull/12566 ## Which issue does this PR close? Closes #12554 ## What changes are included in this PR? - Move existing functionality of `RuntimeEnv::new` to `RuntimeEnv::t

Re: [I] Make it clearer that `RuntimeEnv::new()` is fallable [datafusion]

2024-09-20 Thread via GitHub
OussamaSaoudi commented on issue #12554: URL: https://github.com/apache/datafusion/issues/12554#issuecomment-2365024042 take

Re: [I] Make it clearer that `RuntimeEnv::new()` is fallable [datafusion]

2024-09-20 Thread via GitHub
OussamaSaoudi commented on issue #12554: URL: https://github.com/apache/datafusion/issues/12554#issuecomment-2365022670 I'd like to take this :)

[PR] Implement GROUPING aggregate function (following Postgres behavior.) [datafusion]

2024-09-20 Thread via GitHub
bgjackma opened a new pull request, #12565: URL: https://github.com/apache/datafusion/pull/12565 ## Which issue does this PR close? Closes #5647. ## Rationale for this change Implements the GROUPING function as per Postgres. https://www.postgresql.org/docs/15/func

Re: [I] Automate testing / ensuring that string functions get the same answer for String, LargeString, StringView, DictionaryString, etc [datafusion]

2024-09-20 Thread via GitHub
goldmedal commented on issue #12415: URL: https://github.com/apache/datafusion/issues/12415#issuecomment-2365010349 Move the TODO list here: ### Some TODO items should be finished in the follow-up PR - The remaining tests in the `string_view.slt` - [ ] `LIKE/ILIKE` - [

Re: [PR] Automate sqllogictest for String, LargeString and StringView behavior [datafusion]

2024-09-20 Thread via GitHub
goldmedal commented on PR #12525: URL: https://github.com/apache/datafusion/pull/12525#issuecomment-2365010145 Thanks @alamb @2010YOUY01 🙇‍♂️

[PR] docs: :memo: Add expected answers to `DataFrame` method examples [datafusion]

2024-09-20 Thread via GitHub
Eason0729 opened a new pull request, #12564: URL: https://github.com/apache/datafusion/pull/12564 ## Which issue does this PR close? Closes #12527. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

Re: [I] Add expected answers to `DataFrame` method examples [datafusion]

2024-09-20 Thread via GitHub
Eason0729 commented on issue #12527: URL: https://github.com/apache/datafusion/issues/12527#issuecomment-2364991754 I have added expected result in my fork. Could I also change doctest data to make methods' usage more apparent?

Re: [PR] Avoid RowConverter for multi column grouping [datafusion]

2024-09-20 Thread via GitHub
jayzhan211 commented on code in PR #12269: URL: https://github.com/apache/datafusion/pull/12269#discussion_r1769462304 ## datafusion/physical-expr-common/src/group_value_row.rs: ## @@ -0,0 +1,393 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

[PR] feat: Support adding a single new table factory to SessionStateBuilder [datafusion]

2024-09-20 Thread via GitHub
Weijun-H opened a new pull request, #12563: URL: https://github.com/apache/datafusion/pull/12563 ## Which issue does this PR close? Closes #12552 ## Rationale for this change ## What changes are included in this PR? ## Are these changes test

Re: [I] Support adding a single new table factory to `SessionStateBuilder` [datafusion]

2024-09-20 Thread via GitHub
Weijun-H commented on issue #12552: URL: https://github.com/apache/datafusion/issues/12552#issuecomment-2364961770 \take

Re: [PR] Avoid RowConverter for multi column grouping [datafusion]

2024-09-20 Thread via GitHub
jayzhan211 commented on code in PR #12269: URL: https://github.com/apache/datafusion/pull/12269#discussion_r1769459683 ## datafusion/physical-expr-common/src/group_value_row.rs: ## @@ -0,0 +1,393 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contr

Re: [PR] Support List type coercion for CASE-WHEN-THEN expression [datafusion]

2024-09-20 Thread via GitHub
Weijun-H commented on PR #12490: URL: https://github.com/apache/datafusion/pull/12490#issuecomment-2364947286 ship it!

Re: [PR] fix: coalesce schema issues [datafusion]

2024-09-20 Thread via GitHub
jayzhan211 commented on code in PR #12308: URL: https://github.com/apache/datafusion/pull/12308#discussion_r1769452189 ## datafusion/functions/src/encoding/inner.rs: ## @@ -49,17 +48,8 @@ impl Default for EncodeFunc { impl EncodeFunc { pub fn new() -> Self { -use

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
Rachelint commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2364931941 > > Nope, sorry -- I just need to take a final review of the logic. I will do so now. Sorry for the delay > > Basically I wanted to make sure it improved performance before I

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
Rachelint commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2364892694 > BTW have you profiled the benchmarks now? I wonder where time is being spent? Perhaps there are more places to improve Planned but still not, I will do it now.

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
Rachelint commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1769427084 ## datafusion/functions/src/string/common.rs: ## @@ -72,65 +94,126 @@ pub(crate) fn general_trim( }; if use_string_view { -string_view_trim::

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
Rachelint commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1769415355 ## datafusion/functions/src/string/common.rs: ## @@ -72,65 +94,126 @@ pub(crate) fn general_trim( }; if use_string_view { -string_view_trim::

Re: [PR] Add user defined window function support [datafusion-python]

2024-09-20 Thread via GitHub
timsaucer commented on PR #880: URL: https://github.com/apache/datafusion-python/pull/880#issuecomment-2364792265 Also should add an example.

Re: [PR] Add user defined window function support [datafusion-python]

2024-09-20 Thread via GitHub
timsaucer commented on PR #880: URL: https://github.com/apache/datafusion-python/pull/880#issuecomment-2364792223 I'll move it out of draft after adding unit tests.

[PR] Add user defined window function support [datafusion-python]

2024-09-20 Thread via GitHub
timsaucer opened a new pull request, #880: URL: https://github.com/apache/datafusion-python/pull/880 # Which issue does this PR close? Closes #845 # Rationale for this change Currently users can only create user defined scalar functions and user defined window function

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove merged PR #946: URL: https://github.com/apache/datafusion-comet/pull/946

Re: [PR] fix: CometScanExec on Spark 3.5.2 [datafusion-comet]

2024-09-20 Thread via GitHub
Kimahriman commented on PR #915: URL: https://github.com/apache/datafusion-comet/pull/915#issuecomment-2364731723 Done, I think CI failures are unrelated, looks like failures downloading dependencies in Hive tests

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
parthchandra commented on code in PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#discussion_r1769335335 ## dev/release/publish-to-maven.sh: ## @@ -0,0 +1,178 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contribu

Re: [PR] WIP: Account for constant equivalence properties in union, tests [datafusion]

2024-09-20 Thread via GitHub
alamb commented on code in PR #12562: URL: https://github.com/apache/datafusion/pull/12562#discussion_r1769323644 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -2617,379 +2633,356 @@ mod tests { )) } -#[tokio::test] -async fn test_union

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-20 Thread via GitHub
alamb commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2364693093 I made progress and started a PR here: https://github.com/apache/datafusion/pull/12562. It is all tests at the moment, but I think I now have a better handle on what is needed. I'

[PR] WIP: Account for constant equivalence properties in union, tests [datafusion]

2024-09-20 Thread via GitHub
alamb opened a new pull request, #12562: URL: https://github.com/apache/datafusion/pull/12562 Draft as I haven't actually fixed the bug yet, I am just writing tests ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/12446 Closes https://

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove commented on code in PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#discussion_r1769302390 ## dev/release/publish-to-maven.sh: ## @@ -0,0 +1,178 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove commented on code in PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#discussion_r1769300429 ## dev/release/publish-to-maven.sh: ## @@ -0,0 +1,178 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove commented on code in PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#discussion_r1769298721 ## dev/release/publish-to-maven.sh: ## @@ -0,0 +1,178 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove commented on code in PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#discussion_r1769289628 ## dev/release/publish-to-maven.sh: ## @@ -0,0 +1,178 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove commented on code in PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#discussion_r1769286272 ## dev/release/publish-to-maven.sh: ## @@ -0,0 +1,178 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove commented on code in PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#discussion_r1769284275 ## dev/release/publish-to-maven.sh: ## @@ -0,0 +1,178 @@ +#!/bin/bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

Re: [I] Add support for user defined window functions [datafusion-python]

2024-09-20 Thread via GitHub
timsaucer commented on issue #845: URL: https://github.com/apache/datafusion-python/issues/845#issuecomment-2364639693 Mostly complete, should be ready early next week

Re: [PR] Implement mode function [datafusion]

2024-09-20 Thread via GitHub
dmitrybugakov commented on PR #12385: URL: https://github.com/apache/datafusion/pull/12385#issuecomment-2364612979 @alamb thank you!

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
thinkharderdev commented on PR #12523: URL: https://github.com/apache/datafusion/pull/12523#issuecomment-2364578331 > Maybe its easier to build some diagram in draw.io or something? I got the point about shared state but I'm not sure how it will be travelling from caller side and to caller

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
thinkharderdev commented on code in PR #12523: URL: https://github.com/apache/datafusion/pull/12523#discussion_r1769224360 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -71,11 +71,68 @@ use datafusion_physical_expr::equivalence::{ use datafusion_physical_expr::Physi

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
thinkharderdev commented on code in PR #12523: URL: https://github.com/apache/datafusion/pull/12523#discussion_r1769224022 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -71,11 +71,68 @@ use datafusion_physical_expr::equivalence::{ use datafusion_physical_expr::Physi

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
thinkharderdev commented on code in PR #12523: URL: https://github.com/apache/datafusion/pull/12523#discussion_r1769221343 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -131,8 +192,29 @@ impl JoinLeftData { /// Decrements the counter of running threads, and retu

Re: [PR] support EXTRACT on intervals and durations [datafusion]

2024-09-20 Thread via GitHub
alamb commented on code in PR #12514: URL: https://github.com/apache/datafusion/pull/12514#discussion_r1769218895 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -224,10 +240,28 @@ fn seconds(array: &dyn Array, unit: TimeUnit) -> Result { let subsecs = date_part(a

Re: [I] Extract from interval type failed [datafusion]

2024-09-20 Thread via GitHub
alamb closed issue #6327: Extract from interval type failed URL: https://github.com/apache/datafusion/issues/6327

Re: [PR] support EXTRACT on intervals and durations [datafusion]

2024-09-20 Thread via GitHub
alamb merged PR #12514: URL: https://github.com/apache/datafusion/pull/12514

Re: [I] is there a way to get a result similar to `datediff` function [datafusion]

2024-09-20 Thread via GitHub
alamb closed issue #7097: is there a way to get a result similar to `datediff` function URL: https://github.com/apache/datafusion/issues/7097

Re: [PR] Fix unparsing offset [datafusion]

2024-09-20 Thread via GitHub
alamb commented on PR #12539: URL: https://github.com/apache/datafusion/pull/12539#issuecomment-2364569158 🚀 -- thanks again

Re: [PR] Fix unparsing offset [datafusion]

2024-09-20 Thread via GitHub
alamb merged PR #12539: URL: https://github.com/apache/datafusion/pull/12539

Re: [I] Fix unparsing OFFSET [datafusion]

2024-09-20 Thread via GitHub
alamb closed issue #12538: Fix unparsing OFFSET URL: https://github.com/apache/datafusion/issues/12538

Re: [PR] fix: CometScanExec on Spark 3.5.2 [datafusion-comet]

2024-09-20 Thread via GitHub
codecov-commenter commented on PR #915: URL: https://github.com/apache/datafusion-comet/pull/915#issuecomment-2364557819 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/915?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] Implement mode function [datafusion]

2024-09-20 Thread via GitHub
alamb commented on PR #12385: URL: https://github.com/apache/datafusion/pull/12385#issuecomment-2364555142 > Should I first patch ArrowBytesMap and then wait for the new release of the DataFusion core, or is there a more efficient way to proceed? I would recommend: 1. Make a PR to

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
alamb commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2364540711 BTW have you profiled the benchmarks now? I wonder where time is being spent? Perhaps there are more places to improve

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
alamb commented on code in PR #12395: URL: https://github.com/apache/datafusion/pull/12395#discussion_r1769179503 ## datafusion/functions/src/string/common.rs: ## @@ -21,17 +21,39 @@ use std::fmt::{Display, Formatter}; use std::sync::Arc; use arrow::array::{ -new_null_ar

Re: [PR] Improve performance of `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
alamb commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2364518461 > Nope, sorry -- I just need to take a final review of the logic. I will do so now. Sorry for the delay Basically I wanted to make sure it improved performance before I spent tim

Re: [PR] chore: Show reason for falling back to Spark when SMJ with join condition is not enabled [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove merged PR #956: URL: https://github.com/apache/datafusion-comet/pull/956

Re: [PR] Fix regression on register_udaf [datafusion-python]

2024-09-20 Thread via GitHub
Michael-J-Ward commented on PR #878: URL: https://github.com/apache/datafusion-python/pull/878#issuecomment-2364502368 Python 3.12 did change the way missing abstract methods are reported. Issue: https://github.com/python/cpython/issues/98284 Changelog: https://docs.python.org/3/w

Re: [PR] Improve `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
alamb commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2364495550 > Hi, some other works needed before merging? Nope, sorry -- I just need to take a final review of the logic. I will do so now. Sorry for the delay

Re: [PR] Automate sqllogictest for String, LargeString and StringView behavior [datafusion]

2024-09-20 Thread via GitHub
alamb merged PR #12525: URL: https://github.com/apache/datafusion/pull/12525

Re: [PR] Automate sqllogictest for String, LargeString and StringView behavior [datafusion]

2024-09-20 Thread via GitHub
alamb commented on PR #12525: URL: https://github.com/apache/datafusion/pull/12525#issuecomment-2364459295 THANK YOU SO MUCH @goldmedal and @2010YOUY01 -- this looks really great

Re: [PR] Improve `trim` for string view [datafusion]

2024-09-20 Thread via GitHub
Rachelint commented on PR #12395: URL: https://github.com/apache/datafusion/pull/12395#issuecomment-2364458710 > ``` > ++ critcmp main string-view-trim > group main stri

Re: [PR] Support List type coercion for CASE-WHEN-THEN expression [datafusion]

2024-09-20 Thread via GitHub
alamb commented on code in PR #12490: URL: https://github.com/apache/datafusion/pull/12490#discussion_r1769137425 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -1021,6 +1021,22 @@ fn list_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option { use arr

Re: [PR] fix: CometScanExec on Spark 3.5.2 [datafusion-comet]

2024-09-20 Thread via GitHub
Kimahriman commented on PR #915: URL: https://github.com/apache/datafusion-comet/pull/915#issuecomment-2364453012 Ok wasn't too bad to find/replace fix the issues again, we'll see if I messed anything up in the CI

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
comphead commented on PR #12523: URL: https://github.com/apache/datafusion/pull/12523#issuecomment-2364397811 Maybe its easier to build some diagram in draw.io or something? I got the point about shared state but I'm not sure how it will be travelling from caller side and to caller side

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
comphead commented on code in PR #12523: URL: https://github.com/apache/datafusion/pull/12523#discussion_r1769078580 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -131,8 +192,29 @@ impl JoinLeftData { /// Decrements the counter of running threads, and returns `t

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
comphead commented on code in PR #12523: URL: https://github.com/apache/datafusion/pull/12523#discussion_r1769076385 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -71,11 +71,68 @@ use datafusion_physical_expr::equivalence::{ use datafusion_physical_expr::PhysicalExp

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
comphead commented on code in PR #12523: URL: https://github.com/apache/datafusion/pull/12523#discussion_r1769075068 ## datafusion/physical-plan/src/joins/hash_join.rs: ## @@ -71,11 +71,68 @@ use datafusion_physical_expr::equivalence::{ use datafusion_physical_expr::PhysicalExp

Re: [PR] Add hook for sharing join state in distributed execution [datafusion]

2024-09-20 Thread via GitHub
comphead commented on PR #12523: URL: https://github.com/apache/datafusion/pull/12523#issuecomment-2364357186 @korowa FYI

Re: [PR] fix: CometScanExec on Spark 3.5.2 [datafusion-comet]

2024-09-20 Thread via GitHub
Kimahriman commented on PR #915: URL: https://github.com/apache/datafusion-comet/pull/915#issuecomment-2364344001 > @Kimahriman would you be able to rebase this PR so that we can merge it? Oof 95 conflicts I would have to manually resolve, let me just regenerate these plans all again

Re: [PR] Fix NestedLoopJoin performance regression [datafusion]

2024-09-20 Thread via GitHub
korowa commented on code in PR #12531: URL: https://github.com/apache/datafusion/pull/12531#discussion_r1769051751 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -456,21 +458,72 @@ struct NestedLoopJoinStream { // null_equals_null: bool /// Join execu

Re: [PR] Implement mode function [datafusion]

2024-09-20 Thread via GitHub
dmitrybugakov commented on PR #12385: URL: https://github.com/apache/datafusion/pull/12385#issuecomment-2364311428 Hi @alamb, Could you help me choose the right approach? I’ve started working on [this issue](https://github.com/apache/datafusion/issues/12254#issuecomment-2356571

Re: [PR] Automate sqllogictest for String, LargeString and StringView behavior [datafusion]

2024-09-20 Thread via GitHub
goldmedal commented on PR #12525: URL: https://github.com/apache/datafusion/pull/12525#issuecomment-2364281908 > This is well organized 👍🏼 > > I have one minor suggestion: we should add a separate README under `test_files/string/` to explain the structure Nice idea! I have adde

Re: [PR] Avoid RowConverter for multi column grouping [datafusion]

2024-09-20 Thread via GitHub
alamb commented on code in PR #12269: URL: https://github.com/apache/datafusion/pull/12269#discussion_r1769015724 ## datafusion/physical-expr-common/src/group_value_row.rs: ## @@ -0,0 +1,393 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [I] Support Grouping functions with Group By CUBE/ROLLUP/GROUPING SETS [datafusion]

2024-09-20 Thread via GitHub
eejbyfeldt commented on issue #5647: URL: https://github.com/apache/datafusion/issues/5647#issuecomment-2364228167 Note that @mingmwang (the author of this ticket) had an alternative approach here: https://github.com/apache/datafusion/pull/5749 but it seems like it was never pushed above th

Re: [PR] Support List type coercion for CASE-WHEN-THEN expression [datafusion]

2024-09-20 Thread via GitHub
goldmedal commented on PR #12490: URL: https://github.com/apache/datafusion/pull/12490#issuecomment-2364227006 > The position is incorrect, > > ``` > query ? > select arrow_cast([1,2,3], 'FixedSizeList(3, Int32)'); > > [1, 2, 3] > ``` Thanks @jayzhan211 for

[PR] chore: Show reason for falling back to Spark when SMJ with join condition is not enabled [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove opened a new pull request, #956: URL: https://github.com/apache/datafusion-comet/pull/956 ## Which issue does this PR close? N/A ## Rationale for this change The fallback message in the plan was not very helpful. Before: ``` Sort

Re: [PR] build: Upgrade Spark 4.0 to preview2 [datafusion-comet]

2024-09-20 Thread via GitHub
viirya commented on PR #955: URL: https://github.com/apache/datafusion-comet/pull/955#issuecomment-2364222493 The preview1 diff cannot be used on preview2 due to some conflicts. I need to fix them.

Re: [PR] Support List type coercion for CASE-WHEN-THEN expression [datafusion]

2024-09-20 Thread via GitHub
goldmedal commented on code in PR #12490: URL: https://github.com/apache/datafusion/pull/12490#discussion_r1768980396 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -1021,6 +1021,22 @@ fn list_coercion(lhs_type: &DataType, rhs_type: &DataType) -> Option { use

[PR] build: Upgrade Spark 4.0 to preview2 [datafusion-comet]

2024-09-20 Thread via GitHub
viirya opened a new pull request, #955: URL: https://github.com/apache/datafusion-comet/pull/955 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes test

Re: [PR] Add `field` trait method to `WindowUDFImpl` [datafusion]

2024-09-20 Thread via GitHub
jcsherin commented on PR #12374: URL: https://github.com/apache/datafusion/pull/12374#issuecomment-2364187830 > If this is "too late" for this kind of comment, please let me know and I'll delete. I hadn't seen the issue / PR work until today. @Michael-J-Ward Appreciate the extra set

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-20 Thread via GitHub
findepi commented on PR #12536: URL: https://github.com/apache/datafusion/pull/12536#issuecomment-2364175838 Brief code-level documentation -- what this new Scalar is, what it is not, what are the constraints and what's the goal of this evolution is a must for this. I would like to r

[PR] add flags for temporary ddl [datafusion]

2024-09-20 Thread via GitHub
hailelagi opened a new pull request, #12561: URL: https://github.com/apache/datafusion/pull/12561 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-20 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1768940239 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -195,9 +197,12 @@ impl ColumnarValue { kernels::cast::cast_with_options(array, cast_type,

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-20 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1768938948 ## datafusion/expr-common/src/columnar_value.rs: ## @@ -89,7 +91,7 @@ pub enum ColumnarValue { /// Array of values Array(ArrayRef), /// A single val

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-20 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1768935063 ## datafusion/expr-common/src/scalar.rs: ## @@ -0,0 +1,109 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-20 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1768934156 ## datafusion/expr-common/src/scalar.rs: ## @@ -0,0 +1,109 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] Fix regression on register_udaf [datafusion-python]

2024-09-20 Thread via GitHub
timsaucer commented on PR #878: URL: https://github.com/apache/datafusion-python/pull/878#issuecomment-2364156427 Looks like a small difference on the captured exception string, will try to address soon

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-20 Thread via GitHub
findepi commented on code in PR #12536: URL: https://github.com/apache/datafusion/pull/12536#discussion_r1768932532 ## datafusion/expr-common/src/scalar.rs: ## @@ -0,0 +1,109 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreem

Re: [PR] fix: window function range offset should be long instead of int [datafusion-comet]

2024-09-20 Thread via GitHub
viirya commented on code in PR #733: URL: https://github.com/apache/datafusion-comet/pull/733#discussion_r1768932963 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -3277,4 +3295,41 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with

Re: [PR] Support List type coercion for CASE-WHEN-THEN expression [datafusion]

2024-09-20 Thread via GitHub
goldmedal commented on code in PR #12490: URL: https://github.com/apache/datafusion/pull/12490#discussion_r1768930022 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -1811,6 +1811,211 @@ mod test { Ok(()) } +#[test] +fn tes_case_when_list() -

Re: [PR] Remove ScalarValue::Dictionary [datafusion]

2024-09-20 Thread via GitHub
findepi commented on PR #12488: URL: https://github.com/apache/datafusion/pull/12488#issuecomment-2364151547 @notfilippo @alamb do you see any value in this PR, or should we just close it?

Re: [PR] Remove ScalarValue::Dictionary [datafusion]

2024-09-20 Thread via GitHub
findepi commented on PR #12488: URL: https://github.com/apache/datafusion/pull/12488#issuecomment-2364149964 > 2\. `Datum` (single row array) is quite a bit less efficient than `ScalarValue` (e.g. a single row StringArray will have several allocations for buffers, offsets, etc) for f

Re: [PR] parquet: Make page_index/pushdown metrics consistent with row_group metrics [datafusion]

2024-09-20 Thread via GitHub
progval commented on code in PR #12545: URL: https://github.com/apache/datafusion/pull/12545#discussion_r1768922823 ## datafusion/core/src/datasource/physical_plan/parquet/page_filter.rs: ## @@ -276,6 +281,14 @@ fn rows_skipped(selection: &RowSelection) -> usize { .fold

Re: [I] Fall back to Spark if query uses DPP to avoid perf regressions in TPC-DS [datafusion-comet]

2024-09-20 Thread via GitHub
andygrove commented on issue #895: URL: https://github.com/apache/datafusion-comet/issues/895#issuecomment-2364137728 This is resolved for v1 data sources but not for v2.

Re: [PR] fix: window function range offset should be long instead of int [datafusion-comet]

2024-09-20 Thread via GitHub
viirya commented on code in PR #733: URL: https://github.com/apache/datafusion-comet/pull/733#discussion_r1768920385 ## native/core/src/execution/datafusion/planner.rs: ## @@ -1692,16 +1692,33 @@ impl PhysicalPlanner { .and_then(|inner| inner.lower_frame_bound_struc

Re: [PR] fix: Normalize NaN and zeros for floating number comparison [datafusion-comet]

2024-09-20 Thread via GitHub
kazuyukitanimura commented on PR #953: URL: https://github.com/apache/datafusion-comet/pull/953#issuecomment-2364131269 Related https://github.com/apache/datafusion-comet/pull/585

Re: [PR] Fix unparse table scan with the projection pushdown [datafusion]

2024-09-20 Thread via GitHub
goldmedal commented on PR #12534: URL: https://github.com/apache/datafusion/pull/12534#issuecomment-2364120177 Thanks @alamb @sgrebnov @phillipleblanc for the review 👍

[I] Potential regression in Schema / nullability calculations after upgrade to 42.0.0 [datafusion]

2024-09-20 Thread via GitHub
alamb opened a new issue, #12560: URL: https://github.com/apache/datafusion/issues/12560 ### Describe the bug @phillipleblanc and @itsjunetime have both hit upgrades related to nullability and other metadata in schemas after the DataFusion 42 upgrade. In addition, @ion-elgre

Re: [I] TPC-H queries are failing on main branch [datafusion-ballista]

2024-09-20 Thread via GitHub
my-vegetable-has-exploded commented on issue #1058: URL: https://github.com/apache/datafusion-ballista/issues/1058#issuecomment-2364096102 ref to

Re: [PR] feat(planner): Allowing setting sort order of parquet files without specifying the schema [datafusion]

2024-09-20 Thread via GitHub
alamb commented on code in PR #12466: URL: https://github.com/apache/datafusion/pull/12466#discussion_r1768884987 ## datafusion/sql/src/statement.rs: ## @@ -1136,14 +1136,29 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { schema: &DFSchemaRef, planner_context

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-20 Thread via GitHub
alamb commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2364081374 > Actually IMO the result should be [[a, a0, c], [a0, a, c]]. To do that, input orderings should be adjusted such that: Ah, yes that makes sense -- that `c` can be at the en
