[PR] Add support for parsing MsSql alias with equals [datafusion-sqlparser-rs]

2024-10-10 Thread via GitHub
yoavcloud opened a new pull request, #1467: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1467 This PR addresses the MsSql syntax for select item alias using the `=` operator, for example: ``` SELECT col_alias = col FROM tbl ``` -- This is an automated message f
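Editor's sketch: the `alias = expr` select-item form described in this PR can be illustrated with a few lines of standard-library Rust. This is not the sqlparser-rs implementation; `SelectItem`, `is_identifier`, and `parse_select_item` are hypothetical names introduced only for this example, and the splitting rule is a simplification that assumes the item contains no string literals or nested expressions with `=`.

```rust
/// A projected item: an expression with an optional alias (hypothetical type).
#[derive(Debug, PartialEq)]
struct SelectItem {
    alias: Option<String>,
    expr: String,
}

/// Returns true if `s` looks like a plain, unquoted identifier.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Reads `alias = expr` as an aliased item; anything else is a bare expression.
/// Operators such as `>=` or `!=` are not mistaken for the alias form because
/// the text to the left of `=` is then not a plain identifier.
fn parse_select_item(item: &str) -> SelectItem {
    if let Some((lhs, rhs)) = item.split_once('=') {
        let (lhs, rhs) = (lhs.trim(), rhs.trim());
        if is_identifier(lhs) && !rhs.is_empty() {
            return SelectItem {
                alias: Some(lhs.to_string()),
                expr: rhs.to_string(),
            };
        }
    }
    SelectItem {
        alias: None,
        expr: item.trim().to_string(),
    }
}

fn main() {
    println!("{:?}", parse_select_item("col_alias = col"));
}
```

In dialects other than MsSql, `col_alias = col` would normally be read as a comparison, so a real parser has to make this interpretation dialect-specific.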

Re: [PR] Minor: more doc to `MemoryPool` module [datafusion]

2024-10-10 Thread via GitHub
jonahgao merged PR #12849: URL: https://github.com/apache/datafusion/pull/12849 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-10 Thread via GitHub
tokoko commented on PR #12800: URL: https://github.com/apache/datafusion/pull/12800#issuecomment-2406572500 @alamb can you force a rerun on the clippy job? seems like it failed on apt update. thanks

Re: [I] Regression on coercing Array of Structs [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 closed issue #12843: Regression on coercing Array of Structs URL: https://github.com/apache/datafusion/issues/12843

Re: [PR] Support struct coercion in `type_union_resolution` [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 merged PR #12839: URL: https://github.com/apache/datafusion/pull/12839

Re: [PR] Support struct coercion in `type_union_resolution` [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on PR #12839: URL: https://github.com/apache/datafusion/pull/12839#issuecomment-2406568395 Thanks @alamb @kazuyukitanimura

Re: [PR] Support struct coercion in `type_union_resolution` [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on PR #12839: URL: https://github.com/apache/datafusion/pull/12839#issuecomment-2406567077 I will take a look at https://github.com/apache/datafusion/issues/5046 and struct in coalesce

Re: [PR] chore: remove legacy comet-spark-shell [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove merged PR #1013: URL: https://github.com/apache/datafusion-comet/pull/1013

Re: [I] Standardize APPROX_PERCENTILE_CONT / PERCENTILE_CONT and similar aggregation functions [datafusion]

2024-10-10 Thread via GitHub
Garamda commented on issue #11732: URL: https://github.com/apache/datafusion/issues/11732#issuecomment-2406548172 take (cf. #12824)

Re: [PR] Fix: handle NULL offset of NTH_VALUE window function [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n commented on PR #12851: URL: https://github.com/apache/datafusion/pull/12851#issuecomment-2406531849 @HuSen8891 This lgtm!

Re: [PR] Added check for aggregate functions in optimizer rules [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n commented on PR #12860: URL: https://github.com/apache/datafusion/pull/12860#issuecomment-2406516092 @jonahgao thank you for the review!

Re: [I] [EPIC] Automatically generate all function content from code [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n commented on issue #12740: URL: https://github.com/apache/datafusion/issues/12740#issuecomment-2406512268 @alamb I think it should be mentioned here that ./dev/update_function_docs.sh should be run, as it wasn't immediately obvious, and when I did my first migration I took a bit

Re: [I] Improve error message for invalid aggregate queries [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n commented on issue #12006: URL: https://github.com/apache/datafusion/issues/12006#issuecomment-2406509430 Figuring out the optimization part is tricky, as the error is being called during the optimization traversal. I'll unassign myself as I currently want to learn another part

Re: [I] Improve error message for invalid aggregate queries [datafusion]

2024-10-10 Thread via GitHub
2010YOUY01 commented on issue #12006: URL: https://github.com/apache/datafusion/issues/12006#issuecomment-2406493263 > @2010YOUY01 Might need some help with this one. From my understanding the function is right now being called when the type coercion rule is being called during the optimize

Re: [PR] Aggregate Function Migration [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n commented on PR #12861: URL: https://github.com/apache/datafusion/pull/12861#issuecomment-2406478202 @alamb @Omega359 I wasn't sure how to deal with the regr functions, as their file is a little unique. We could possibly deal with them in

Re: [I] [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on issue #12821: URL: https://github.com/apache/datafusion/issues/12821#issuecomment-2406477789 https://github.com/apache/datafusion/pull/12697#discussion_r1789659808 Only Q1 slows down, but given it has nothing to do with grouping, I think we can ignore it. Thi

[PR] Aggregate Function Migration [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n opened a new pull request, #12861: URL: https://github.com/apache/datafusion/pull/12861 ## Which issue does this PR close? Closes #12827. ## Rationale for this change Migrate aggregate functions from static docs to new format. ## What changes ar

Re: [PR] Support struct coercion in `type_union_resolution` [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on code in PR #12839: URL: https://github.com/apache/datafusion/pull/12839#discussion_r1796350180 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -476,6 +478,26 @@ fn type_union_resolution_coercion( type_union_resolution_coercio

Re: [I] [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench [datafusion]

2024-10-10 Thread via GitHub
Rachelint commented on issue #12821: URL: https://github.com/apache/datafusion/issues/12821#issuecomment-2406458137 > > And I think maybe we can make clearer about when partial can help, and when partial will even get slower? > > In my mind the challenge with tweaking the "switch to p

Re: [PR] Added check for aggregate functions in optimizer rules [datafusion]

2024-10-10 Thread via GitHub
jonahgao commented on code in PR #12860: URL: https://github.com/apache/datafusion/pull/12860#discussion_r1796355617 ## datafusion/sql/src/select.rs: ## @@ -477,6 +477,15 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { let filter_expr =

Re: [PR] Support struct coercion in `type_union_resolution` [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on code in PR #12839: URL: https://github.com/apache/datafusion/pull/12839#discussion_r1796333672 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -476,6 +478,26 @@ fn type_union_resolution_coercion( type_union_resolution_coercio

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2406383445 I have run into a deadlock when running TPC-DS benchmarks with this feature, so I am moving to draft while I investigate. It is possibly related to the memory pool issues that

Re: [I] Unify the error handling for the RecordBatchStream [datafusion]

2024-10-10 Thread via GitHub
YjyJeff closed issue #12641: Unify the error handling for the RecordBatchStream URL: https://github.com/apache/datafusion/issues/12641

Re: [PR] [logical-types] add NativeType and LogicalType [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on code in PR #12853: URL: https://github.com/apache/datafusion/pull/12853#discussion_r1796251479 ## datafusion/common/src/types/builtin.rs: ## @@ -0,0 +1,39 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [I] `octet_length()` function not working for StringView columns (SQLancer) [datafusion]

2024-10-10 Thread via GitHub
Omega359 commented on issue #12149: URL: https://github.com/apache/datafusion/issues/12149#issuecomment-2406269460 Array-string dependency was updated to 53.1.0 which includes the update from apache/arrow-rs#6305. I'll work on a PR to verify the fix in octet_length()

Re: [I] Regression on coercing Array of Structs [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on issue #12843: URL: https://github.com/apache/datafusion/issues/12843#issuecomment-2406247191 ``` D create table t(s1 struct(a int, b varchar), s2 struct(a float, b varchar)); D insert into t values (row(1, 'r'), row(2.2, 'b')); D select [s1, s2] from t; ┌──

Re: [PR] Add DuckDB struct test and row as alias [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on PR #12841: URL: https://github.com/apache/datafusion/pull/12841#issuecomment-2406239454 Thanks @alamb

Re: [PR] Add DuckDB struct test and row as alias [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 merged PR #12841: URL: https://github.com/apache/datafusion/pull/12841

Re: [PR] Add DuckDB struct test and row as alias [datafusion]

2024-10-10 Thread via GitHub
jayzhan211 commented on code in PR #12841: URL: https://github.com/apache/datafusion/pull/12841#discussion_r1796230654 ## datafusion/sqllogictest/test_files/struct.slt: ## @@ -373,3 +373,52 @@ You reached the bottom! statement ok drop view complex_view; + +# Test row alias +

Re: [PR] Migrate documentation for all core functions from scalar_functions.md to code [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n commented on PR #12854: URL: https://github.com/apache/datafusion/pull/12854#issuecomment-2406223766 @Omega359 This looks good to me!

[PR] Added check for aggregate functions in optimizer rules [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n opened a new pull request, #12860: URL: https://github.com/apache/datafusion/pull/12860 ## Which issue does this PR close? Closes #12814 . ## Rationale for this change ## What changes are included in this PR? The query is invalid due to aggr

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-10 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1796173482 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -984,30 +984,64 @@ pub async fn from_substrait_rel( /// 1. All fields present in the Substrait schema a

Re: [PR] Add TPC-DS scripts and documentation [datafusion-benchmarks]

2024-10-10 Thread via GitHub
andygrove merged PR #7: URL: https://github.com/apache/datafusion-benchmarks/pull/7

Re: [PR] Add TPC-DS scripts and documentation [datafusion-benchmarks]

2024-10-10 Thread via GitHub
andygrove commented on PR #7: URL: https://github.com/apache/datafusion-benchmarks/pull/7#issuecomment-2406138068 Thanks @kazuyukitanimura and @mbutrovich

Re: [PR] feat: Implement bloom_filter_agg [datafusion-comet]

2024-10-10 Thread via GitHub
kazuyukitanimura commented on PR #987: URL: https://github.com/apache/datafusion-comet/pull/987#issuecomment-2406118022 @mbutrovich Is it possible to trace back where `children` and `childVectors` are populated?

Re: [I] Improve error message for invalid aggregate queries [datafusion]

2024-10-10 Thread via GitHub
jonathanc-n commented on issue #12006: URL: https://github.com/apache/datafusion/issues/12006#issuecomment-2406117737 @2010YOUY01 Might need some help with this one. From my understanding the function is right now being called when the type coercion rule is being called during the optimize

[PR] chore: remove legacy comet-spark-shell [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove opened a new pull request, #1013: URL: https://github.com/apache/datafusion-comet/pull/1013 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these changes

Re: [PR] docs: clarify that Maven central only has jars for Linux [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove merged PR #1009: URL: https://github.com/apache/datafusion-comet/pull/1009

[PR] WIP: Upgrade to Datafusion 43 [datafusion-python]

2024-10-10 Thread via GitHub
Michael-J-Ward opened a new pull request, #905: URL: https://github.com/apache/datafusion-python/pull/905 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing ch

[I] java.lang.NoClassDefFoundError: Could not initialize class org.apache.comet.package$ [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove opened a new issue, #1012: URL: https://github.com/apache/datafusion-comet/issues/1012 ### Describe the bug I am building a Docker image using the Dockerfile from this repo and then trying to execute queries in k8s. I get the following error. ``` Caused by: java.la

Re: [PR] Make PruningPredicate's rewrite public [datafusion]

2024-10-10 Thread via GitHub
adriangb commented on code in PR #12850: URL: https://github.com/apache/datafusion/pull/12850#discussion_r1796096929 ## datafusion/core/src/physical_optimizer/pruning.rs: ## @@ -1316,23 +1353,43 @@ const MAX_LIST_VALUE_SIZE_REWRITE: usize = 20; /// expression that will evaluate

Re: [I] Plugin can fail to initialize native library and hide the root cause [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove commented on issue #999: URL: https://github.com/apache/datafusion-comet/issues/999#issuecomment-2406037729 Here is another related issue that I am currently running into: ``` │ Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.comet.packag

Re: [I] Migrate documentation for all datetime functions from scalar_functions.md to code [datafusion]

2024-10-10 Thread via GitHub
alamb commented on issue #12859: URL: https://github.com/apache/datafusion/issues/12859#issuecomment-2406032588 I think this is a good first issue as the pattern is clear and it would be a good way to learn how the process works

Re: [I] Migrate documentation for all encoding functions from scalar_functions.md to code [datafusion]

2024-10-10 Thread via GitHub
alamb commented on issue #12858: URL: https://github.com/apache/datafusion/issues/12858#issuecomment-2406030376 Actually this is already done in https://github.com/apache/datafusion/pull/12668

Re: [PR] Migrate documentation for all core functions from scalar_functions.md to code [datafusion]

2024-10-10 Thread via GitHub
alamb commented on code in PR #12854: URL: https://github.com/apache/datafusion/pull/12854#discussion_r1796085574 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -2806,93 +2751,10 @@ are not allowed ## Struct Functions -- [struct](#struct) -- [named_struct](#named_

Re: [PR] Make PruningPredicate's rewrite public [datafusion]

2024-10-10 Thread via GitHub
adriangb commented on code in PR #12850: URL: https://github.com/apache/datafusion/pull/12850#discussion_r1796085755 ## datafusion/core/src/physical_optimizer/pruning.rs: ## @@ -1316,23 +1353,43 @@ const MAX_LIST_VALUE_SIZE_REWRITE: usize = 20; /// expression that will evaluate

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1796085416 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -263,6 +263,12 @@ object CometConf extends ShimCometConf { .booleanConf .creat

Re: [PR] Make PruningPredicate's rewrite public [datafusion]

2024-10-10 Thread via GitHub
alamb commented on code in PR #12850: URL: https://github.com/apache/datafusion/pull/12850#discussion_r1796084081 ## datafusion/core/src/physical_optimizer/pruning.rs: ## @@ -1316,23 +1353,43 @@ const MAX_LIST_VALUE_SIZE_REWRITE: usize = 20; /// expression that will evaluate to

Re: [PR] Crypto Function Migration [datafusion]

2024-10-10 Thread via GitHub
alamb merged PR #12840: URL: https://github.com/apache/datafusion/pull/12840

Re: [PR] Macro for creating record batch from literal slice [datafusion]

2024-10-10 Thread via GitHub
alamb commented on code in PR #12846: URL: https://github.com/apache/datafusion/pull/12846#discussion_r1796075363 ## datafusion/common/src/test_util.rs: ## @@ -279,6 +279,83 @@ pub fn get_data_dir( } } +#[macro_export] +macro_rules! create_array { +(Boolean, $values:

Re: [PR] Minor: more doc to `MemoryPool` module [datafusion]

2024-10-10 Thread via GitHub
alamb commented on code in PR #12849: URL: https://github.com/apache/datafusion/pull/12849#discussion_r1796073350 ## datafusion/execution/src/memory_pool/mod.rs: ## @@ -68,11 +68,35 @@ pub use pool::*; /// Note that a `MemoryPool` can be shared by concurrently executing plans,

Re: [PR] Add DuckDB struct test and row as alias [datafusion]

2024-10-10 Thread via GitHub
alamb commented on code in PR #12841: URL: https://github.com/apache/datafusion/pull/12841#discussion_r1796072537 ## datafusion/sqllogictest/test_files/struct.slt: ## @@ -373,3 +373,52 @@ You reached the bottom! statement ok drop view complex_view; + +# Test row alias + +que

Re: [PR] Support struct coercion in `type_union_resolution` [datafusion]

2024-10-10 Thread via GitHub
alamb commented on code in PR #12839: URL: https://github.com/apache/datafusion/pull/12839#discussion_r1796062667 ## datafusion/expr-common/src/type_coercion/binary.rs: ## @@ -476,6 +478,26 @@ fn type_union_resolution_coercion( type_union_resolution_coercion(lhs

Re: [I] Regression on coercing Array of Structs [datafusion]

2024-10-10 Thread via GitHub
alamb commented on issue #12843: URL: https://github.com/apache/datafusion/issues/12843#issuecomment-2406001661 @jayzhan211 has a PR to fix this here: https://github.com/apache/datafusion/pull/12839 > however, this caused a regression on coerce_types against Array of Structs.

Re: [PR] Minor: Small comment changes in sql folder [datafusion]

2024-10-10 Thread via GitHub
alamb merged PR #12838: URL: https://github.com/apache/datafusion/pull/12838

Re: [PR] Support DictionaryString for Regex matching operators [datafusion]

2024-10-10 Thread via GitHub
alamb commented on PR #12768: URL: https://github.com/apache/datafusion/pull/12768#issuecomment-2405982363 Thanks @blaginin and @goldmedal for the review

Re: [PR] feat(substrait): add intersect support to consumer [datafusion]

2024-10-10 Thread via GitHub
alamb commented on PR #12830: URL: https://github.com/apache/datafusion/pull/12830#issuecomment-2405983114 Thank you again @tokoko and @vbarua for the review

Re: [PR] Support DictionaryString for Regex matching operators [datafusion]

2024-10-10 Thread via GitHub
alamb merged PR #12768: URL: https://github.com/apache/datafusion/pull/12768

Re: [I] Convert `BuiltInWindowFunction::{Rank, PercentRank, DenseRank}` to a user defined functions [datafusion]

2024-10-10 Thread via GitHub
alamb closed issue #12648: Convert `BuiltInWindowFunction::{Rank, PercentRank, DenseRank}` to a user defined functions URL: https://github.com/apache/datafusion/issues/12648

Re: [PR] feat(substrait): add intersect support to consumer [datafusion]

2024-10-10 Thread via GitHub
alamb merged PR #12830: URL: https://github.com/apache/datafusion/pull/12830

Re: [I] Support DictionaryString for Regex matching operators [datafusion]

2024-10-10 Thread via GitHub
alamb closed issue #12618: Support DictionaryString for Regex matching operators URL: https://github.com/apache/datafusion/issues/12618

Re: [PR] Convert `rank` / `dense_rank` and `percent_rank` builtin functions to UDWF [datafusion]

2024-10-10 Thread via GitHub
alamb commented on PR #12718: URL: https://github.com/apache/datafusion/pull/12718#issuecomment-2405981840 🚀

Re: [PR] Convert `rank` / `dense_rank` and `percent_rank` builtin functions to UDWF [datafusion]

2024-10-10 Thread via GitHub
alamb merged PR #12718: URL: https://github.com/apache/datafusion/pull/12718

Re: [I] Add Python 3.13 to CI [datafusion-python]

2024-10-10 Thread via GitHub
Michael-J-Ward commented on issue #902: URL: https://github.com/apache/datafusion-python/issues/902#issuecomment-2405953777 Looks like we'll need to wait until `pyarrow` 18 to add python 3.13. https://github.com/apache/arrow/issues/43519#issuecomment-2402461382

Re: [I] ASOF join support / Specialize Range Joins [datafusion]

2024-10-10 Thread via GitHub
simonvandel commented on issue #318: URL: https://github.com/apache/datafusion/issues/318#issuecomment-2405946866 There seems to be a draft PR for iejoin here https://github.com/apache/datafusion/pull/12754. It looks like the author is ready for an initial review

Re: [I] Simple Functions [datafusion]

2024-10-10 Thread via GitHub
Omega359 commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2405845576 ``` pub fn call(x: i64, y: i64) -> Result { datafusion::functions::math::gcd::compute_gcd(x, y) } ``` Wouldn't this incur a significant amount
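Editor's sketch: the row-level `call(x, y)` shape quoted above can be lifted over whole columns by a generic helper. The code below is a minimal standard-library illustration under stated assumptions: it uses plain slices rather than Arrow arrays, and `gcd_i64` and `apply_binary` are hypothetical names, not DataFusion APIs (the real `compute_gcd` referenced in the comment is not reproduced here).

```rust
use std::convert::TryFrom;

/// Row-level function: Euclid's algorithm on the unsigned magnitudes.
/// Returns an error when the result does not fit in an i64
/// (for example gcd(i64::MIN, 0) = 2^63).
fn gcd_i64(x: i64, y: i64) -> Result<i64, String> {
    let (mut a, mut b) = (x.unsigned_abs(), y.unsigned_abs());
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    i64::try_from(a).map_err(|_| format!("gcd({}, {}) overflows i64", x, y))
}

/// Column-level wrapper: applies a fallible row-level function pairwise
/// and stops at the first error.
fn apply_binary<F>(xs: &[i64], ys: &[i64], f: F) -> Result<Vec<i64>, String>
where
    F: Fn(i64, i64) -> Result<i64, String>,
{
    if xs.len() != ys.len() {
        return Err(format!("length mismatch: {} vs {}", xs.len(), ys.len()));
    }
    xs.iter().zip(ys).map(|(&x, &y)| f(x, y)).collect()
}

fn main() {
    println!("{:?}", apply_binary(&[12, 7], &[18, 5], gcd_i64));
}
```

The wrapper pays one function call and one `Result` check per row; whether that cost matters in practice depends on inlining and on the real array representation, which this sketch does not model.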

Re: [PR] Ballista reloaded - proposed changes to core ballista [datafusion-ballista]

2024-10-10 Thread via GitHub
metegenez commented on PR #1066: URL: https://github.com/apache/datafusion-ballista/pull/1066#issuecomment-2405843070 An extensible distributed query engine could be a game-changer for Ballista, as it would reduce the time required to maintain the codebase. Shifting part of the code would

Re: [PR] WIP: Generate docs from macros. [datafusion]

2024-10-10 Thread via GitHub
comphead commented on code in PR #12822: URL: https://github.com/apache/datafusion/pull/12822#discussion_r1795932350 ## datafusion/pre-macros/src/lib.rs: ## @@ -0,0 +1,179 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreement

[PR] chore(ci): Add python 3.13 to CI [datafusion-python]

2024-10-10 Thread via GitHub
Michael-J-Ward opened a new pull request, #904: URL: https://github.com/apache/datafusion-python/pull/904 # Which issue does this PR close? Closes #902. # Rationale for this change Python 3.13 is the latest python release. https://devguide.python.org/versions/

Re: [PR] WIP: Generate docs from macros. [datafusion]

2024-10-10 Thread via GitHub
comphead commented on PR #12822: URL: https://github.com/apache/datafusion/pull/12822#issuecomment-2405804413 @alamb @Omega359 please have a look at a real example, `to_date` (I still need to include arguments to be called with the builder). As you can see, it is the same approach as before, th

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-10 Thread via GitHub
vbarua commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1795889580 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -984,30 +984,64 @@ pub async fn from_substrait_rel( /// 1. All fields present in the Substrait schema a

Re: [PR] feat(substrait): add intersect support to consumer [datafusion]

2024-10-10 Thread via GitHub
vbarua commented on code in PR #12830: URL: https://github.com/apache/datafusion/pull/12830#discussion_r1795877281 ## datafusion/substrait/tests/testdata/test_plans/intersect.substrait.json: ## @@ -0,0 +1,118 @@ +{ + "relations": [ +{ + "root": { +"input": { +

[PR] wip: Convert `BuiltInWindowFunction::{Lead, Lag}` to a user defined window function [datafusion]

2024-10-10 Thread via GitHub
jcsherin opened a new pull request, #12857: URL: https://github.com/apache/datafusion/pull/12857 ## Which issue does this PR close? Closes #12802. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [I] Bug in `Display` for `ScalarValue::Struct` [datafusion]

2024-10-10 Thread via GitHub
avantgardnerio closed issue #12855: Bug in `Display` for `ScalarValue::Struct` URL: https://github.com/apache/datafusion/issues/12855

Re: [PR] Fix Bug in Display for ScalarValue::Struct [datafusion]

2024-10-10 Thread via GitHub
avantgardnerio merged PR #12856: URL: https://github.com/apache/datafusion/pull/12856

Re: [PR] Introduce `binary_as_string` parquet option [datafusion]

2024-10-10 Thread via GitHub
goldmedal commented on PR #12816: URL: https://github.com/apache/datafusion/pull/12816#issuecomment-2405668858 @alamb I have confirmed this feature works well and added some tests for it. Only some concerns about https://github.com/apache/datafusion/pull/12816#discussion_r1795799303.

Re: [PR] Ballista reloaded - proposed changes to core ballista [datafusion-ballista]

2024-10-10 Thread via GitHub
andygrove commented on PR #1066: URL: https://github.com/apache/datafusion-ballista/pull/1066#issuecomment-2405617156 There seems to be general consensus that this is a good direction. It would be good to know who is willing to work on this initiative. I imagine that it will require

Re: [PR] Introduce `binary_as_string` parquet option [datafusion]

2024-10-10 Thread via GitHub
goldmedal commented on code in PR #12816: URL: https://github.com/apache/datafusion/pull/12816#discussion_r1795800690 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -302,6 +302,87 @@ pub(crate) fn coerce_file_schema_to_view_type( )) } +/// Transform a schema

Re: [PR] Introduce `binary_as_string` parquet option [datafusion]

2024-10-10 Thread via GitHub
goldmedal commented on code in PR #12816: URL: https://github.com/apache/datafusion/pull/12816#discussion_r1795799303 ## datafusion/core/src/datasource/file_format/mod.rs: ## @@ -302,6 +302,87 @@ pub(crate) fn coerce_file_schema_to_view_type( )) } +/// Transform a schema

Re: [I] Include Apple OSX support in jars in Maven central [datafusion-comet]

2024-10-10 Thread via GitHub
parthchandra commented on issue #1010: URL: https://github.com/apache/datafusion-comet/issues/1010#issuecomment-2405591638 Do we want this in a single uber jar? It'll increase the size of the jar considerably. If we have individual jars for each platform we'll end up with a lot of combi

[PR] broken commit [datafusion]

2024-10-10 Thread via GitHub
avantgardnerio opened a new pull request, #12856: URL: https://github.com/apache/datafusion/pull/12856 ## Which issue does this PR close? Closes #12855. ## Rationale for this change Fix the bug ## What changes are included in this PR? A broken test (TODO)

[I] Bug in `Display` for `ScalarValue::Struct` [datafusion]

2024-10-10 Thread via GitHub
avantgardnerio opened a new issue, #12855: URL: https://github.com/apache/datafusion/issues/12855 ### Describe the bug There is an assertion failure in `Display` for `ScalarValue::Struct` ### To Reproduce See PR test ### Expected behavior It works ###

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2405546782 @viirya @parthchandra This is now ready for review. The new option is disabled by default and I added a section to the tuning guide explaining why users may want to enable thi

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1795751218 ## kube/Dockerfile: ## @@ -65,4 +65,4 @@ ENV SCALA_VERSION=2.12 USER root # note the use of a wildcard in the file name so that this works with both sna

Re: [I] ASOF join support / Specialize Range Joins [datafusion]

2024-10-10 Thread via GitHub
xudong963 commented on issue #318: URL: https://github.com/apache/datafusion/issues/318#issuecomment-2405492685 Leaving aside the ASOF join syntax: for a range join, which contains non-equi join conditions, e.g. `t1.a > t2.b`, but no equi conditions (timing scenario), such S

Re: [PR] Add additional regexp function regexp_count() [datafusion]

2024-10-10 Thread via GitHub
Omega359 commented on PR #12080: URL: https://github.com/apache/datafusion/pull/12080#issuecomment-2405489396 @alamb - since @xinlifoobar seems to be dormant, my thought on this PR is to merge it and file a couple of tickets to improve it, primarily the performance discrepancy.

Re: [PR] feat: Use fair-spill pool when `spark.memory.offHeap.enabled=false` [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove commented on PR #1004: URL: https://github.com/apache/datafusion-comet/pull/1004#issuecomment-2405461737 Thanks for the detailed feedback @Kontinuation. I plan to resume work on this today/tomorrow.

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2405460156 I will add documentation to this PR today, explaining pros/cons of this feature in our tuning guide.

Re: [PR] perf: Enable replaceSortMergeJoin by default [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove closed pull request #1008: perf: Enable replaceSortMergeJoin by default URL: https://github.com/apache/datafusion-comet/pull/1008

Re: [I] Simple Functions [datafusion]

2024-10-10 Thread via GitHub
comphead commented on issue #12635: URL: https://github.com/apache/datafusion/issues/12635#issuecomment-2405413561 > I experimented with this on the way from DataFusion meetup in Belgrade. > > i came up with something like this > > function author would write this > row-

Re: [I] Detect stack overflow and reduce stack usage on debug build [datafusion-sqlparser-rs]

2024-10-10 Thread via GitHub
alamb commented on issue #1465: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1465#issuecomment-2405409424 Related discussions about recursion limits: - [ ] https://github.com/apache/datafusion-sqlparser-rs/issues/305 - [ ] https://github.com/apache/datafusion-sqlparser

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-10 Thread via GitHub
alamb closed issue #12731: Stack overflow with LEAD and LAG functions URL: https://github.com/apache/datafusion/issues/12731

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-10 Thread via GitHub
alamb commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2405404843 Let's use https://github.com/apache/datafusion-sqlparser-rs/issues/1465 to track this issue

Re: [PR] Implement special min/max accumulator for Strings and Binary (10% faster for Clickbench Q28) [datafusion]

2024-10-10 Thread via GitHub
alamb commented on PR #12792: URL: https://github.com/apache/datafusion/pull/12792#issuecomment-2405399916 Thank you for the reviews @jayzhan211 and @Dandandan. I plan to merge this tomorrow to give other people a chance to review it if they would like.

Re: [PR] Ballista reloaded - proposed changes to core ballista [datafusion-ballista]

2024-10-10 Thread via GitHub
alamb commented on PR #1066: URL: https://github.com/apache/datafusion-ballista/pull/1066#issuecomment-2405397448 > Datafusion team did great job with Sink support. Kudos to @metesynnada and @devindangelo

[I] Include Apple OSX support in jars in Maven central [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove opened a new issue, #1010: URL: https://github.com/apache/datafusion-comet/issues/1010 ### What is the problem the feature request solves? _No response_ ### Describe the potential solution _No response_ ### Additional context _No response_

[PR] docs: clarify that Maven central only has jars for Linux [datafusion-comet]

2024-10-10 Thread via GitHub
andygrove opened a new pull request, #1009: URL: https://github.com/apache/datafusion-comet/pull/1009 ## Which issue does this PR close? N/A ## Rationale for this change Address a cause of confusion in our installation guide. ## What changes are inc

Re: [I] [EPIC] Decouple logical from physical types [datafusion]

2024-10-10 Thread via GitHub
notfilippo commented on issue #12622: URL: https://github.com/apache/datafusion/issues/12622#issuecomment-2405371092 > i was hoping we get logical types sooner than later, even if nothing uses them initially. Simple functions https://github.com/apache/datafusion/issues/12635 is currently bl

Re: [PR] [logical-types] add NativeType and LogicalType [datafusion]

2024-10-10 Thread via GitHub
notfilippo commented on code in PR #12853: URL: https://github.com/apache/datafusion/pull/12853#discussion_r1795618569 ## datafusion/common/src/types/logical.rs: ## @@ -0,0 +1,41 @@ +use core::fmt; +use std::{cmp::Ordering, hash::Hash, sync::Arc}; + +use super::NativeType; + +//
