Re: [PR] chore: Bump arrow-rs to 53.1.0 and datafusion [datafusion-comet]

2024-10-09 Thread via GitHub
kazuyukitanimura commented on PR #1001: URL: https://github.com/apache/datafusion-comet/pull/1001#issuecomment-2404186717 Confirmed it was about adding structure support. Opened https://github.com/apache/datafusion/issues/12843 Ideal to fix it before the next release, otherwise it is a r

[I] Regression on coercing Array of Structs [datafusion]

2024-10-09 Thread via GitHub
kazuyukitanimura opened a new issue, #12843: URL: https://github.com/apache/datafusion/issues/12843 ### Describe the bug https://github.com/apache/datafusion/pull/12753/files#diff-f1e354d4fe26237064d8194e10a6008efa4f88e2b68b8a8352086a5d011180b8R108 introduced to use `type_union_resol

[I] Clippy error on `datafusion-wasmtest` [datafusion]

2024-10-09 Thread via GitHub
jayzhan211 opened a new issue, #12842: URL: https://github.com/apache/datafusion/issues/12842 ### Describe the bug CI failed due to the following error ``` error: function `datafusion_test` is never used --> datafusion/wasmtest/src/lib.rs:91:8 | 91 | fn data

[PR] Add DuckDB struct test and row as alias [datafusion]

2024-10-09 Thread via GitHub
jayzhan211 opened a new pull request, #12841: URL: https://github.com/apache/datafusion/pull/12841 ## Which issue does this PR close? Closes #. ## Rationale for this change Add support for DuckDB-like syntax and functions to facilitate testing ## Wh

Re: [PR] feat(substrait): add intersect support to consumer [datafusion]

2024-10-09 Thread via GitHub
tokoko commented on code in PR #12830: URL: https://github.com/apache/datafusion/pull/12830#discussion_r1794620771 ## datafusion/substrait/tests/testdata/test_plans/intersect.substrait.json: ## @@ -0,0 +1,118 @@ +{ + "relations": [ +{ + "root": { +"input": { +

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-09 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1794614220 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -984,30 +984,64 @@ pub async fn from_substrait_rel( /// 1. All fields present in the Substrait schema a

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-09 Thread via GitHub
tokoko commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1794571803 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -984,30 +984,64 @@ pub async fn from_substrait_rel( /// 1. All fields present in the Substrait schema a

Re: [PR] feat: Added DataFrameWriteOptions option when writing as csv, json, p… [datafusion-python]

2024-10-09 Thread via GitHub
allinux closed pull request #857: feat: Added DataFrameWriteOptions option when writing as csv, json, p… URL: https://github.com/apache/datafusion-python/pull/857 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL ab

Re: [PR] chore: Reserve memory for native shuffle writer per partition [datafusion-comet]

2024-10-09 Thread via GitHub
Kontinuation commented on code in PR #988: URL: https://github.com/apache/datafusion-comet/pull/988#discussion_r1794544786 ## native/core/src/execution/datafusion/shuffle_writer.rs: ## @@ -1504,25 +1586,65 @@ mod test { #[test] #[cfg_attr(miri, ignore)] // miri can't c

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-09 Thread via GitHub
Eason0729 commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2403788185 https://github.com/apache/datafusion-sqlparser-rs/issues/1465

Re: [I] Detect stack overflow and reduce stack usage on debug build [datafusion-sqlparser-rs]

2024-10-09 Thread via GitHub
Eason0729 commented on issue #1465: URL: https://github.com/apache/datafusion-sqlparser-rs/issues/1465#issuecomment-2403787866 I would like to take `stacker` approach, that is: 1. implement stack guard at some point 2. if red zone is reached, try grow stack 3. if unable to grow stac

Re: [I] Unify the error handling for the RecordBatchStream [datafusion]

2024-10-09 Thread via GitHub
YjyJeff commented on issue #12641: URL: https://github.com/apache/datafusion/issues/12641#issuecomment-2403766616 > > CollectErrorThenEmitStream wrapper to avoid the repeated logic. Users of the datafusion may have to use this stream in many places. When a user wants to modify the error han

Re: [I] Panic in scalar function `approx_percentile_cont_with_weight` (SQLancer) [datafusion]

2024-10-09 Thread via GitHub
jonahgao closed issue #12716: Panic in scalar function `approx_percentile_cont_with_weight` (SQLancer) URL: https://github.com/apache/datafusion/issues/12716

Re: [PR] Fix: approx_percentile_cont_with_weight Panic [datafusion]

2024-10-09 Thread via GitHub
jonahgao merged PR #12823: URL: https://github.com/apache/datafusion/pull/12823

Re: [I] Migrate documentation for aggregate functions from aggregate_functions.md to code [datafusion]

2024-10-09 Thread via GitHub
jonathanc-n commented on issue #12827: URL: https://github.com/apache/datafusion/issues/12827#issuecomment-2403722048 Will work on this after @Omega359 approves of my first pr!

Re: [PR] Crypto Function Migration [datafusion]

2024-10-09 Thread via GitHub
jonathanc-n commented on PR #12840: URL: https://github.com/apache/datafusion/pull/12840#issuecomment-2403694167 @Omega359 Does this look correct? That was so cool btw, running the script and then everything appearing

[PR] Crypto Function Migration [datafusion]

2024-10-09 Thread via GitHub
jonathanc-n opened a new pull request, #12840: URL: https://github.com/apache/datafusion/pull/12840 ## Which issue does this PR close? Closes #12828. ## Rationale for this change Migration of crypto functions ## What changes are included in this PR? Migr

[PR] Support struct coercion in `type_union_resolution` [datafusion]

2024-10-09 Thread via GitHub
jayzhan211 opened a new pull request, #12839: URL: https://github.com/apache/datafusion/pull/12839 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested

Re: [PR] feat(substrait): add intersect support to consumer [datafusion]

2024-10-09 Thread via GitHub
vbarua commented on code in PR #12830: URL: https://github.com/apache/datafusion/pull/12830#discussion_r1794414412 ## datafusion/substrait/tests/testdata/test_plans/intersect.substrait.json: ## @@ -0,0 +1,118 @@ +{ + "relations": [ +{ + "root": { +"input": { +

Re: [PR] fix(substrait): remove optimize calls from substrait consumer [datafusion]

2024-10-09 Thread via GitHub
vbarua commented on code in PR #12800: URL: https://github.com/apache/datafusion/pull/12800#discussion_r1794387461 ## datafusion/substrait/src/logical_plan/consumer.rs: ## @@ -984,30 +984,64 @@ pub async fn from_substrait_rel( /// 1. All fields present in the Substrait schema a

Re: [I] Parse real number literals as the Decimal type [datafusion]

2024-10-09 Thread via GitHub
jayzhan211 commented on issue #12817: URL: https://github.com/apache/datafusion/issues/12817#issuecomment-2403641487 I think the challenge is that we need to make sure decimal is supported, so when we switch from float to decimal, every function works as expected as well. Existing test mig

Re: [PR] Ballista reloaded - proposed changes to core ballista [datafusion-ballista]

2024-10-09 Thread via GitHub
thinkharderdev commented on PR #1066: URL: https://github.com/apache/datafusion-ballista/pull/1066#issuecomment-2403631898 I think it makes a lot of sense

Re: [PR] chore: Bump arrow-rs to 53.1.0 and datafusion [datafusion-comet]

2024-10-09 Thread via GitHub
jayzhan211 commented on PR #1001: URL: https://github.com/apache/datafusion-comet/pull/1001#issuecomment-2403628342 We need to support Struct, but I'm not sure whether we should discard dictionary.

Re: [PR] chore: Bump arrow-rs to 53.1.0 and datafusion [datafusion-comet]

2024-10-09 Thread via GitHub
kazuyukitanimura commented on PR #1001: URL: https://github.com/apache/datafusion-comet/pull/1001#issuecomment-2403621922 Just realized that `type_union_resolution_coercion` is not handling struct...

Re: [PR] Improve description of function migration [datafusion]

2024-10-09 Thread via GitHub
alamb merged PR #12743: URL: https://github.com/apache/datafusion/pull/12743

Re: [PR] feat(substrait): add intersect support to consumer [datafusion]

2024-10-09 Thread via GitHub
tokoko commented on code in PR #12830: URL: https://github.com/apache/datafusion/pull/12830#discussion_r1794381211 ## datafusion/substrait/tests/testdata/test_plans/intersect.substrait.json: ## @@ -0,0 +1,118 @@ +{ + "relations": [ +{ + "root": { +"input": { +

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
codecov-commenter commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2403600852 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1007?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] feat(substrait): add intersect support to consumer [datafusion]

2024-10-09 Thread via GitHub
vbarua commented on code in PR #12830: URL: https://github.com/apache/datafusion/pull/12830#discussion_r1794316654 ## datafusion/substrait/tests/testdata/test_plans/intersect.substrait.json: ## @@ -0,0 +1,118 @@ +{ + "relations": [ +{ + "root": { +"input": { +

Re: [PR] chore: Bump arrow-rs to 53.1.0 and datafusion [datafusion-comet]

2024-10-09 Thread via GitHub
kazuyukitanimura commented on PR #1001: URL: https://github.com/apache/datafusion-comet/pull/1001#issuecomment-2403549408 @alamb @jayzhan211 I am still investigating but looks like there is a regression in DataFusion related to https://github.com/apache/datafusion/pull/12753 ```

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
andygrove commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1794312492 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -263,6 +263,12 @@ object CometConf extends ShimCometConf { .booleanConf .creat

[PR] perf: Enable replaceSortMergeJoin by default [datafusion-comet]

2024-10-09 Thread via GitHub
andygrove opened a new pull request, #1008: URL: https://github.com/apache/datafusion-comet/pull/1008 ## Which issue does this PR close? Follows on from https://github.com/apache/datafusion-comet/pull/1007 ## Rationale for this change ## What changes are i

Re: [PR] chore: Reserve memory for native shuffle writer per partition [datafusion-comet]

2024-10-09 Thread via GitHub
viirya commented on code in PR #988: URL: https://github.com/apache/datafusion-comet/pull/988#discussion_r1794297714 ## native/core/src/execution/datafusion/shuffle_writer.rs: ## @@ -1504,25 +1586,65 @@ mod test { #[test] #[cfg_attr(miri, ignore)] // miri can't call fo

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
parthchandra commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1794297212 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -263,6 +263,12 @@ object CometConf extends ShimCometConf { .booleanConf .cr

Re: [PR] feat: Implement bloom_filter_agg [datafusion-comet]

2024-10-09 Thread via GitHub
mbutrovich commented on PR #987: URL: https://github.com/apache/datafusion-comet/pull/987#issuecomment-2403495574 Can't say I see a huge difference in TPC-H or TPC-DS locally, but the plans I looked at were typically building filters over very small relations.

Re: [PR] Add support for quantified comparison predicates (ALL/ANY/SOME) [datafusion-sqlparser-rs]

2024-10-09 Thread via GitHub
alamb commented on PR #1459: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1459#issuecomment-2403489408 Thanks again @yoavcloud

Re: [PR] Add support for quantified comparison predicates (ALL/ANY/SOME) [datafusion-sqlparser-rs]

2024-10-09 Thread via GitHub
alamb merged PR #1459: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1459

Re: [PR] chore: Reserve memory for native shuffle writer per partition [datafusion-comet]

2024-10-09 Thread via GitHub
viirya commented on PR #988: URL: https://github.com/apache/datafusion-comet/pull/988#issuecomment-2403479002 Okay, it is the error I expected before: ``` ret: Err(ArrowError(ExternalError(IoError(Custom { kind: Uncategorized, error: PathError { path: "/var/folders/t_/mmhnh941511_

Re: [I] `datafusion-query-cache` - caching intermediate results for faster repeated queries [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12779: URL: https://github.com/apache/datafusion/issues/12779#issuecomment-2403472521 > I'd love to donate the project to datafusion-contrib Done -- https://github.com/datafusion-contrib/datafusion-query-cache

Re: [I] [DISCUSSION] Make DataFusion the fastest engine for querying parquet data in ClickBench [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12821: URL: https://github.com/apache/datafusion/issues/12821#issuecomment-2403467422 > And I think maybe we can make clearer about when partial can help, and when partial will even get slower? In my mind the challenge with tweaking the "switch to partial mod

Re: [PR] Ballista reloaded - proposed changes to core ballista [datafusion-ballista]

2024-10-09 Thread via GitHub
alamb commented on PR #1066: URL: https://github.com/apache/datafusion-ballista/pull/1066#issuecomment-240345 I agree that tracking the change in the overall goal as a ticket would be very helpful -- both for tracking follow on items as @Dandandan says as well as ensuring we have the v

Re: [PR] chore: Reserve memory for native shuffle writer per partition [datafusion-comet]

2024-10-09 Thread via GitHub
viirya commented on PR #988: URL: https://github.com/apache/datafusion-comet/pull/988#issuecomment-2403452537 Hmm, these tests for large partition number shuffle fail on macOS runners only. And no stack trace... But I cannot reproduce it locally.

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
viirya commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1794242814 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -263,6 +263,12 @@ object CometConf extends ShimCometConf { .booleanConf .createWi

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
viirya commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1794243470 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -263,6 +263,12 @@ object CometConf extends ShimCometConf { .booleanConf .createWi

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
andygrove commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2403440894 Current benchmarks: ![tpch_allqueries](https://github.com/user-attachments/assets/36950a1b-40e0-46db-a476-287cfbd59909) Speedup of using HashJoin instead of SortM

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
andygrove commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1794237336 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -263,6 +263,12 @@ object CometConf extends ShimCometConf { .booleanConf .creat

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
viirya commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1794234836 ## common/src/main/scala/org/apache/comet/CometConf.scala: ## @@ -263,6 +263,12 @@ object CometConf extends ShimCometConf { .booleanConf .createWi

[PR] Minor: Small comment changes in sql folder [datafusion]

2024-10-09 Thread via GitHub
jonathanc-n opened a new pull request, #12838: URL: https://github.com/apache/datafusion/pull/12838 ## Which issue does this PR close? Closes #. ## Rationale for this change I was learning the codebase (sql folder) and thought, "might as well do some light clean up"

Re: [PR] Fix convert_to_state bug in `GroupsAccumulatorAdapter` [datafusion]

2024-10-09 Thread via GitHub
alamb commented on PR #12834: URL: https://github.com/apache/datafusion/pull/12834#issuecomment-2403421853 This seems like a non-controversial bug fix, so merging it in

Re: [PR] Fix convert_to_state bug in `GroupsAccumulatorAdapter` [datafusion]

2024-10-09 Thread via GitHub
alamb commented on PR #12834: URL: https://github.com/apache/datafusion/pull/12834#issuecomment-2403422042 Thanks @Dandandan for the review

Re: [I] Error in `min/max` queries: InvalidArgumentError("number of columns(1) must match number of fields(2) in schema" [datafusion]

2024-10-09 Thread via GitHub
alamb closed issue #12833: Error in `min/max` queries: InvalidArgumentError("number of columns(1) must match number of fields(2) in schema" URL: https://github.com/apache/datafusion/issues/12833

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2403418881 https://github.com/apache/datafusion-sqlparser-rs/pulls

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2403418736 Given this is a stack overflow in sqlparser, perhaps we can move the ticket there and try to solve it in the sqlparser library?

Re: [PR] Fix convert_to_state bug in `GroupsAccumulatorAdapter` [datafusion]

2024-10-09 Thread via GitHub
alamb merged PR #12834: URL: https://github.com/apache/datafusion/pull/12834

Re: [PR] Make HashJoinExec::join_schema public [datafusion]

2024-10-09 Thread via GitHub
alamb merged PR #12807: URL: https://github.com/apache/datafusion/pull/12807

Re: [I] Physical optimizers cannot rewrite `HashJoinExec` due to `join_schema` being private [datafusion]

2024-10-09 Thread via GitHub
alamb closed issue #12806: Physical optimizers cannot rewrite `HashJoinExec` due to `join_schema` being private URL: https://github.com/apache/datafusion/issues/12806

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2403410112 > I think it's better to open another issue. I would like this issue to focus on reducing stack usage. I agree

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2403409783 The sqlparser library already supports limiting recursion: https://docs.rs/sqlparser/latest/sqlparser/parser/struct.Parser.html#method.with_recursion_limit Which is a pretty
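The recursion-limit idea mentioned in this comment can be illustrated with a minimal, self-contained sketch. This is not sqlparser's implementation: `ToyParser`, `ParseError`, and the default depth of 50 are hypothetical names and values chosen for illustration, and only the method name `with_recursion_limit` is borrowed from the linked documentation. The toy parser understands nothing but nested parentheses and returns an error instead of overflowing the stack when nesting exceeds the configured depth.

```rust
/// Errors the toy parser can report (hypothetical, for illustration only).
#[derive(Debug, PartialEq)]
pub enum ParseError {
    RecursionLimitExceeded,
    UnbalancedParens,
}

/// Hypothetical toy parser that only understands nested parentheses.
pub struct ToyParser<'a> {
    input: &'a [u8],
    pos: usize,
    remaining_depth: usize,
}

impl<'a> ToyParser<'a> {
    pub fn new(input: &'a str) -> Self {
        // The default of 50 is an arbitrary choice for this sketch.
        Self { input: input.as_bytes(), pos: 0, remaining_depth: 50 }
    }

    /// Builder-style setter for the maximum nesting depth.
    pub fn with_recursion_limit(mut self, limit: usize) -> Self {
        self.remaining_depth = limit;
        self
    }

    /// Parses the whole input and returns the deepest nesting level seen.
    pub fn parse(&mut self) -> Result<usize, ParseError> {
        let depth = self.parse_groups()?;
        if self.pos != self.input.len() {
            return Err(ParseError::UnbalancedParens);
        }
        Ok(depth)
    }

    fn parse_groups(&mut self) -> Result<usize, ParseError> {
        let mut max_depth = 0;
        while self.pos < self.input.len() && self.input[self.pos] == b'(' {
            // Guard: refuse to recurse once the depth budget is used up.
            if self.remaining_depth == 0 {
                return Err(ParseError::RecursionLimitExceeded);
            }
            self.remaining_depth -= 1;
            self.pos += 1; // consume '('
            let inner = self.parse_groups()?;
            if self.pos >= self.input.len() || self.input[self.pos] != b')' {
                return Err(ParseError::UnbalancedParens);
            }
            self.pos += 1; // consume ')'
            self.remaining_depth += 1; // give the budget back on the way out
            max_depth = max_depth.max(inner + 1);
        }
        Ok(max_depth)
    }
}
```

A depth counter like this bounds recursion by nesting level rather than by actual stack bytes, so the limit still has to be chosen conservatively for builds with large stack frames.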

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
parthchandra commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2403336632 > if there is spilling, then SMJ can frequently lead to better performance I have seen this happen with Spark with some TPC-DS queries at SF10.

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
parthchandra commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2403334431 There is a small danger in enabling this without having a good estimate of the size of the build side. ShuffleHashJoin has limits on how much data it can process efficientl

Re: [I] Update supported Spark and Java versions in installation guide [datafusion-comet]

2024-10-09 Thread via GitHub
zemin-piao commented on issue #742: URL: https://github.com/apache/datafusion-comet/issues/742#issuecomment-2403332316 Hey folks, I installed the jar https://mvnrepository.com/artifact/org.apache.datafusion/comet-parent-spark3.5_2.12/0.3.0 on my cluster where we run spark 3.5

Re: [PR] Improve description of function migration [datafusion]

2024-10-09 Thread via GitHub
Omega359 commented on PR #12743: URL: https://github.com/apache/datafusion/pull/12743#issuecomment-2403317712 lgtm as well

[PR] Fix panic on wrong number of arguments to substr [datafusion]

2024-10-09 Thread via GitHub
eejbyfeldt opened a new pull request, #12837: URL: https://github.com/apache/datafusion/pull/12837 ## Which issue does this PR close? Closes #12699. ## Rationale for this change Fixes a panic. ## What changes are included in this PR? Adds guard that

Re: [PR] Wordsmith project description [datafusion]

2024-10-09 Thread via GitHub
Omega359 commented on code in PR #12778: URL: https://github.com/apache/datafusion/pull/12778#discussion_r1794145593 ## README.md: ## @@ -44,8 +44,10 @@ DataFusion is an extensible query engine written in [Rust] that uses [Apache Arrow] as its in-memory format. -The DataFusi

Re: [PR] Add support for quantified comparison predicates (ALL/ANY/SOME) [datafusion-sqlparser-rs]

2024-10-09 Thread via GitHub
yoavcloud commented on PR #1459: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1459#issuecomment-2403273336 @alamb can you please trigger the workflow again? Fixed an issue with the URLs in the docs

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
andygrove commented on code in PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#discussion_r1793944208 ## spark/src/test/resources/tpcds-plan-stability/approved-plans-v1_4-spark3_5/q16/simplified.txt: ## @@ -1,44 +1,59 @@ -WholeStageCodegen (2) +WholeStageCode

Re: [I] Release DataFusion 42.1.0 [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12813: URL: https://github.com/apache/datafusion/issues/12813#issuecomment-2403186661 > We do already have the https://github.com/apache/datafusion/tree/branch-42 branch Sorry - I have removed the maint-42 branch Let's make the PRs to the branch-42 bra

Re: [PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
andygrove commented on PR #1007: URL: https://github.com/apache/datafusion-comet/pull/1007#issuecomment-2403021870 Here is a teaser for the performance improvement. This is for TPC-H q11 with broadcast joins disabled (I am looking into a regression with those). I ran the query 5 times each

Re: [I] Release DataFusion 42.1.0 [datafusion]

2024-10-09 Thread via GitHub
andygrove commented on issue #12813: URL: https://github.com/apache/datafusion/issues/12813#issuecomment-2403006661 Here is some info on the current branching policy: https://github.com/apache/datafusion/tree/main/dev/release#branching-policy

Re: [I] Release DataFusion 42.1.0 [datafusion]

2024-10-09 Thread via GitHub
andygrove commented on issue #12813: URL: https://github.com/apache/datafusion/issues/12813#issuecomment-2403003968 > I have created the https://github.com/apache/datafusion/tree/maint-42.x branch > > It would be super helpful if someone could make cherry-pick PR(s) with the changes

Re: [I] Aggregation fuzz testing [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12114: URL: https://github.com/apache/datafusion/issues/12114#issuecomment-2402995970 I will try and do so over the next few days. Thanks @Rachelint

Re: [PR] chore: Reserve memory for native shuffle writer per partition [datafusion-comet]

2024-10-09 Thread via GitHub
codecov-commenter commented on PR #988: URL: https://github.com/apache/datafusion-comet/pull/988#issuecomment-2402979135 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/988?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] Cleanup TODO in recursive unnest [datafusion]

2024-10-09 Thread via GitHub
duongcongtoai commented on code in PR #12836: URL: https://github.com/apache/datafusion/pull/12836#discussion_r1793954013 ## datafusion/expr/src/logical_plan/plan.rs: ## @@ -3300,39 +3300,6 @@ pub enum Partitioning { DistributeBy(Vec), } -/// Represents the unnesting ope

Re: [I] Release DataFusion 42.1.0 [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12813: URL: https://github.com/apache/datafusion/issues/12813#issuecomment-2402969956 > Is there still interest in the idea of maintaining a LTS version of DataFusion? Would 42 be a good foundation for that? I can help with this, but I am already stretched pr

Re: [I] Release DataFusion 42.1.0 [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12813: URL: https://github.com/apache/datafusion/issues/12813#issuecomment-2402973725 I have created the https://github.com/apache/datafusion/tree/maint-42.x branch It would be super helpful if someone could make cherry-pick PR(s) with the changes mentioned

[PR] Cleanup TODO in recursive unnest [datafusion]

2024-10-09 Thread via GitHub
duongcongtoai opened a new pull request, #12836: URL: https://github.com/apache/datafusion/pull/12836 ## Which issue does this PR close? Cleanup #11577 Closes #. ## Rationale for this change ## What changes are included in this PR? ## Ar

Re: [PR] Improve description of function migration [datafusion]

2024-10-09 Thread via GitHub
alamb commented on PR #12743: URL: https://github.com/apache/datafusion/pull/12743#issuecomment-2402962232 Thank you for the review @comphead 🙏

Re: [PR] Support creating tables via SQL with `FixedSizeList` column (e.g. `a int[3]`) [datafusion]

2024-10-09 Thread via GitHub
alamb merged PR #12810: URL: https://github.com/apache/datafusion/pull/12810

Re: [I] Create fixed size list table with syntax [] [datafusion]

2024-10-09 Thread via GitHub
alamb closed issue #10303: Create fixed size list table with syntax [] URL: https://github.com/apache/datafusion/issues/10303

Re: [PR] Improve AggregationFuzzer error reporting [datafusion]

2024-10-09 Thread via GitHub
Rachelint commented on code in PR #12832: URL: https://github.com/apache/datafusion/pull/12832#discussion_r1793942272 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/fuzzer.rs: ## @@ -237,45 +252,53 @@ struct AggregationFuzzTestTask { } impl AggregationFuzzTestTask {

Re: [PR] Support DictionaryString for Regex matching operators [datafusion]

2024-10-09 Thread via GitHub
alamb commented on code in PR #12768: URL: https://github.com/apache/datafusion/pull/12768#discussion_r1793941960 ## datafusion/expr-common/src/operator.rs: ## @@ -164,6 +164,15 @@ impl Operator { ) } +/// Return true if the comparison operator can be used in

Re: [PR] Support creating tables via SQL with `FixedSizeList` column (e.g. `a int[3]`) [datafusion]

2024-10-09 Thread via GitHub
jandremarais commented on PR #12810: URL: https://github.com/apache/datafusion/pull/12810#issuecomment-2402948711 > Very nice @jandremarais 👍 > > Thank you! Thank you for your friendly support. Looking forward to contributing more

[PR] perf: Add option to replace SortMergeJoin with ShuffledHashJoin [datafusion-comet]

2024-10-09 Thread via GitHub
andygrove opened a new pull request, #1007: URL: https://github.com/apache/datafusion-comet/pull/1007 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/1006 ## Rationale for this change Improved performance

Re: [PR] Retry apt-get and rustup on CI [datafusion]

2024-10-09 Thread via GitHub
comphead merged PR #12714: URL: https://github.com/apache/datafusion/pull/12714

Re: [I] `Setup rust toolchain` build step is flaky [datafusion]

2024-10-09 Thread via GitHub
comphead closed issue #12713: `Setup rust toolchain` build step is flaky URL: https://github.com/apache/datafusion/issues/12713

Re: [I] Stack overflow with LEAD and LAG functions [datafusion]

2024-10-09 Thread via GitHub
iilyak commented on issue #12731: URL: https://github.com/apache/datafusion/issues/12731#issuecomment-2402887951 Another option is to make parser tail recursive and annotate the function using https://docs.rs/tailcall/latest/tailcall/

Re: [I] Performance: Add "read strings as binary" option for parquet [datafusion]

2024-10-09 Thread via GitHub
goldmedal commented on issue #12788: URL: https://github.com/apache/datafusion/issues/12788#issuecomment-2402885802 > BTW thinking more about this, I do think we need to support the cast, but in this PR we should effectively change the _file_ schema (not just the table schema) when we setup

Re: [PR] Add support for quantified comparison predicates (ALL/ANY/SOME) [datafusion-sqlparser-rs]

2024-10-09 Thread via GitHub
coveralls commented on PR #1459: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1459#issuecomment-2402878572 ## Pull Request Test Coverage Report for [Build 11239890912](https://coveralls.io/builds/70253361) ### Warning: This coverage report may be inaccurate. Thi

Re: [PR] Ballista reloaded - a proposed changes to core ballista [datafusion-ballista]

2024-10-09 Thread via GitHub
Dandandan commented on PR #1066: URL: https://github.com/apache/datafusion-ballista/pull/1066#issuecomment-2402875987 It would be nice to create a few tickets from this, e.g. supporting ballista in datafusion python, creating a contrib project for the ballista UI, Flight SQL etc.

Re: [I] Unify the error handling for the RecordBatchStream [datafusion]

2024-10-09 Thread via GitHub
alamb commented on issue #12641: URL: https://github.com/apache/datafusion/issues/12641#issuecomment-2402869545 > CollectErrorThenEmitStream wrapper to avoid the repeated logic. Users of the datafusion may have to use this stream in many places. When a user wants to modify the error handlin

Re: [PR] Fix: approx_percentile_cont_with_weight Panic [datafusion]

2024-10-09 Thread via GitHub
jonathanc-n commented on code in PR #12823: URL: https://github.com/apache/datafusion/pull/12823#discussion_r1793845419 ## datafusion/functions-aggregate-common/src/tdigest.rs: ## @@ -641,10 +641,19 @@ impl TDigest { v => panic!("invalid centroids type {v:?}"),

Re: [PR] Improve AggregationFuzzer error reporting [datafusion]

2024-10-09 Thread via GitHub
alamb commented on code in PR #12832: URL: https://github.com/apache/datafusion/pull/12832#discussion_r1793874972 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/fuzzer.rs: ## @@ -132,7 +133,17 @@ struct QueryGroup { } impl AggregationFuzzer { +/// Run the fuzzer,

Re: [PR] Ballista reloaded - a proposed changes to core ballista [datafusion-ballista]

2024-10-09 Thread via GitHub
Dandandan commented on PR #1066: URL: https://github.com/apache/datafusion-ballista/pull/1066#issuecomment-2402869185 I think it makes a lot of sense to reduce the size of Ballista in order to keep maintaining it. I think it also makes sense to move python support to Python DataFusion (to

Re: [PR] Improve AggregationFuzzer error reporting [datafusion]

2024-10-09 Thread via GitHub
alamb commented on code in PR #12832: URL: https://github.com/apache/datafusion/pull/12832#discussion_r1793876484 ## datafusion/core/tests/fuzz_cases/aggregation_fuzzer/fuzzer.rs: ## @@ -237,45 +252,53 @@ struct AggregationFuzzTestTask { } impl AggregationFuzzTestTask { -

Re: [PR] Convert `rank` / `dense_rank` and `percen_rank` builtin functions to UDWF [datafusion]

2024-10-09 Thread via GitHub
jatin510 commented on PR #12718: URL: https://github.com/apache/datafusion/pull/12718#issuecomment-2402859905 Thanks, @alamb and @jcsherin, for all your help! You both are awesome!

Re: [PR] Fix: approx_percentile_cont_with_weight Panic [datafusion]

2024-10-09 Thread via GitHub
jonathanc-n commented on code in PR #12823: URL: https://github.com/apache/datafusion/pull/12823#discussion_r1793845419 ## datafusion/functions-aggregate-common/src/tdigest.rs: ## @@ -641,10 +641,19 @@ impl TDigest { v => panic!("invalid centroids type {v:?}"),

Re: [PR] Make PruningPredicate's rewrite public [datafusion]

2024-10-09 Thread via GitHub
adriangb commented on code in PR #12835: URL: https://github.com/apache/datafusion/pull/12835#discussion_r1793869493 ## datafusion/core/src/physical_optimizer/pruning.rs: ## @@ -478,6 +478,31 @@ pub struct PruningPredicate { literal_guarantees: Vec, } +/// Hook to handle

Re: [PR] Make PruningPredicate's rewrite public [datafusion]

2024-10-09 Thread via GitHub
adriangb commented on code in PR #12835: URL: https://github.com/apache/datafusion/pull/12835#discussion_r1793869081 ## datafusion/core/src/physical_optimizer/pruning.rs: ## @@ -1315,24 +1351,23 @@ const MAX_LIST_VALUE_SIZE_REWRITE: usize = 20; /// Translate logical filter expr

[PR] Make PruningPredicate's rewrite public [datafusion]

2024-10-09 Thread via GitHub
adriangb opened a new pull request, #12835: URL: https://github.com/apache/datafusion/pull/12835 Replaces https://github.com/apache/datafusion/pull/12606 As per https://github.com/apache/datafusion/pull/12606#issuecomment-2392303014 the plan was to split the rewrite logic from the re
