Re: [PR] Fix NestedLoopJoin performance regression [datafusion]

2024-09-19 Thread via GitHub
alihan-synnada commented on code in PR #12531: URL: https://github.com/apache/datafusion/pull/12531#discussion_r1768086606 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -456,21 +458,72 @@ struct NestedLoopJoinStream { // null_equals_null: bool /// Jo

[I] Implement nested expression support in Substrait [datafusion]

2024-09-19 Thread via GitHub
EpsilonPrime opened a new issue, #12541: URL: https://github.com/apache/datafusion/issues/12541 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like Here's a sample Substrait plan which uses a nested expression:

Re: [I] add documentation how to read a list of files [datafusion-python]

2024-09-19 Thread via GitHub
djouallah closed issue #535: add documentation how to read a list of files URL: https://github.com/apache/datafusion-python/issues/535 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

Re: [I] Show documentation how to use Delta table [datafusion-python]

2024-09-19 Thread via GitHub
djouallah closed issue #414: Show documentation how to use Delta table URL: https://github.com/apache/datafusion-python/issues/414

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-19 Thread via GitHub
findepi commented on PR #12536: URL: https://github.com/apache/datafusion/pull/12536#issuecomment-2362708504 Thank you @notfilippo for working on this! > `ScalarValue::Utf8View` and `ScalarValue::LargeUtf8` while keeping `ScalarValue::Utf8`) in order to keep track of the represented t

[PR] Produce informative error message on insert plan type mismatch [datafusion]

2024-09-19 Thread via GitHub
findepi opened a new pull request, #12540: URL: https://github.com/apache/datafusion/pull/12540 (no comment)

Re: [PR] Add JOB benchmark dataset [1/N] (imdb dataset) [datafusion]

2024-09-19 Thread via GitHub
doupache commented on PR #12497: URL: https://github.com/apache/datafusion/pull/12497#issuecomment-2362691758 Thanks @austin362667 and @alamb. I have updated the PR and learned some Cargo tips from @austin362667. Using debug build during development is much faster. ```

Re: [PR] return absent stats when filters are pushed down [datafusion]

2024-09-19 Thread via GitHub
waruto210 commented on PR #12471: URL: https://github.com/apache/datafusion/pull/12471#issuecomment-2362590513 @alamb PTAL

Re: [PR] support EXTRACT on intervals and durations [datafusion]

2024-09-19 Thread via GitHub
nrc commented on PR #12514: URL: https://github.com/apache/datafusion/pull/12514#issuecomment-2362570825 Null handling and tests addressed. It took a tiny bit more code to make `epoch` work. @alamb ready for re-review, thanks!
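For durations, `epoch` amounts to converting the stored integer to seconds according to its time unit. A minimal sketch, assuming a local `TimeUnit` enum that mirrors Arrow's four units and a hypothetical `epoch_seconds` helper (this is not the code from the PR, and interval types, which mix months, days and nanoseconds, are not covered):

```rust
/// Local stand-in for Arrow's time units (assumption, not Arrow's own type).
#[derive(Debug, Clone, Copy)]
enum TimeUnit {
    Second,
    Millisecond,
    Microsecond,
    Nanosecond,
}

/// Hypothetical helper: seconds represented by a duration `value` stored in `unit`.
fn epoch_seconds(value: i64, unit: TimeUnit) -> f64 {
    // Number of stored ticks per second for each unit.
    let per_second = match unit {
        TimeUnit::Second => 1.0,
        TimeUnit::Millisecond => 1_000.0,
        TimeUnit::Microsecond => 1_000_000.0,
        TimeUnit::Nanosecond => 1_000_000_000.0,
    };
    value as f64 / per_second
}
```

Because Date64 stores milliseconds since the Unix epoch, the same millisecond conversion turns `1969-12-31` (-86,400,000 ms) into -86400 seconds.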

Re: [PR] Support List type coercion for CASE-WHEN-THEN expression [datafusion]

2024-09-19 Thread via GitHub
Weijun-H commented on code in PR #12490: URL: https://github.com/apache/datafusion/pull/12490#discussion_r1767812352 ## datafusion/optimizer/src/analyzer/type_coercion.rs: ## @@ -1811,6 +1811,211 @@ mod test { Ok(()) } +#[test] +fn tes_case_when_list() ->

Re: [I] Proposal to donate Ray SQL to the DataFusion Project (not into the Python subproject) [datafusion-python]

2024-09-19 Thread via GitHub
andygrove commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2362554275 I ran some benchmarks from @timsaucer's repo where he has upgraded to DataFusion 41. Performance is looking good. ![Screenshot from 2024-09-19 19-55-55](http

Re: [PR] Fix unparse table scan with the projection pushdown [datafusion]

2024-09-19 Thread via GitHub
sgrebnov commented on PR #12534: URL: https://github.com/apache/datafusion/pull/12534#issuecomment-2362539701 @goldmedal, @alamb - the change looks good, thank you!

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function, add `AggregateUDFImpl::is_null` [datafusion]

2024-09-19 Thread via GitHub
jayzhan211 commented on PR #11989: URL: https://github.com/apache/datafusion/pull/11989#issuecomment-2362482071 > I think (correct me if I'm wrong) that field metadata doesn't actually need to be equivalent for the invariant that this error is trying to catch to be upheld If the meta

Re: [PR] perf: Avoid memcpy during decimal precision check in decimal aggregates (sum and avg) [datafusion-comet]

2024-09-19 Thread via GitHub
mbutrovich commented on PR #952: URL: https://github.com/apache/datafusion-comet/pull/952#issuecomment-2362415067 I was a bit surprised to see a performance win from changing an if-else to a compound boolean expression, since this seems like something that an optimizing compiler should hand
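For context, a decimal of precision `p` stores an unscaled integer `v` that must satisfy -(10^p - 1) <= v <= 10^p - 1. A minimal sketch, assuming an `i128` unscaled value and hypothetical helper names (not taken from the PR), of writing that check as one compound boolean expression rather than an if/else chain:

```rust
/// Largest unscaled value representable with `precision` decimal digits
/// (valid for precision 1..=38, the range an i128 can hold).
fn max_unscaled(precision: u8) -> i128 {
    10_i128.pow(precision as u32) - 1
}

/// Hypothetical helper: true if `value` fits in a decimal of `precision`.
/// The bound check is a single compound comparison instead of if/else.
fn fits_precision(value: i128, precision: u8) -> bool {
    let max = max_unscaled(precision);
    value >= -max && value <= max
}
```

Whether the compiler emits different machine code for the two forms depends on the optimizer, which is why a benchmark such as the one in this thread is the reliable way to tell.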

[PR] Fix unparsing offset [datafusion]

2024-09-19 Thread via GitHub
Stazer opened a new pull request, #12539: URL: https://github.com/apache/datafusion/pull/12539 ## Which issue does this PR close? Closes #12538. ## Rationale for this change ## What changes are included in this PR? I have added a missing block for u

[I] Fix unparsing OFFSET [datafusion]

2024-09-19 Thread via GitHub
Stazer opened a new issue, #12538: URL: https://github.com/apache/datafusion/issues/12538 ### Describe the bug Unparsing OFFSET does not work as expected. ### To Reproduce Parse and unparse the query `SELECT 1 OFFSET 95`. ### Expected behavior _No response_
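The report concerns a LIMIT/OFFSET node whose skip count has to survive the trip back to SQL. A minimal standalone sketch, assuming a hypothetical `LimitNode` with a `skip` count and an optional `fetch` count (illustrative names, not DataFusion's unparser API), of emitting both clauses:

```rust
/// Hypothetical stand-in for a logical LIMIT node: `skip` rows are dropped
/// (OFFSET) and at most `fetch` rows are returned (LIMIT).
struct LimitNode {
    skip: usize,
    fetch: Option<usize>,
}

/// Hypothetical helper: appends LIMIT/OFFSET clauses to an already-rendered query.
fn unparse_limit(base_sql: &str, node: &LimitNode) -> String {
    let mut sql = base_sql.to_string();
    if let Some(fetch) = node.fetch {
        sql.push_str(&format!(" LIMIT {fetch}"));
    }
    // Without this branch the sketch would silently drop the OFFSET.
    if node.skip > 0 {
        sql.push_str(&format!(" OFFSET {}", node.skip));
    }
    sql
}
```

With `skip: 95, fetch: None` the sketch renders `SELECT 1 OFFSET 95`, matching the query in the report.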

Re: [PR] PartialOrd for Expr and sub fields/structs [datafusion]

2024-09-19 Thread via GitHub
ngli-me commented on PR #12481: URL: https://github.com/apache/datafusion/pull/12481#issuecomment-2362328143 Thanks for the feedback, really appreciate it! I'm happy to help out, I'll start digging for some more issues following this (I had some personal stuff come up, so sorry about leavin

Re: [PR] PartialOrd for Expr and sub fields/structs [datafusion]

2024-09-19 Thread via GitHub
ngli-me commented on code in PR #12481: URL: https://github.com/apache/datafusion/pull/12481#discussion_r1767604834 ## datafusion/expr/src/logical_plan/ddl.rs: ## @@ -232,8 +232,61 @@ impl Hash for CreateExternalTable { } } +impl PartialOrd for CreateExternalTable { Rev

[I] `PartialOrd` for structs with incomparable fields [datafusion]

2024-09-19 Thread via GitHub
ngli-me opened a new issue, #12537: URL: https://github.com/apache/datafusion/issues/12537 ### Is your feature request related to a problem or challenge? > What requires this manual derivation? Is it some other extension trait would need to also require `PartialOrd`? > > I am t
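One common way to handle this in Rust is to compare only the orderable fields and return `None` when they tie but the values still differ, which keeps `partial_cmp` consistent with the derived `PartialEq`. A minimal sketch, assuming a hypothetical `CreateThing` struct whose `options` field is a `HashMap` (which does not implement `PartialOrd`); this illustrates the pattern and is not the approach taken in the PR:

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical stand-in for a plan node such as a DDL statement.
#[derive(Debug, PartialEq)]
struct CreateThing {
    name: String,
    if_not_exists: bool,
    // HashMap has no ordering, so PartialOrd cannot be derived.
    options: HashMap<String, String>,
}

impl PartialOrd for CreateThing {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        // Order by the comparable fields first.
        let lhs = (&self.name, self.if_not_exists);
        let rhs = (&other.name, other.if_not_exists);
        match lhs.partial_cmp(&rhs) {
            // Tie on comparable fields: report Equal only when the values are
            // truly equal (so partial_cmp agrees with ==); otherwise they are
            // incomparable.
            Some(Ordering::Equal) if self != other => None,
            ord => ord,
        }
    }
}
```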

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-19 Thread via GitHub
codecov-commenter commented on PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#issuecomment-2362297875 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/946?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] perf: Avoid memcpy during decimal precision check in decimal aggregates (sum and avg) [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove commented on PR #952: URL: https://github.com/apache/datafusion-comet/pull/952#issuecomment-2362244962 Comparison between main branch and this PR: ![tpch_allqueries](https://github.com/user-attachments/assets/688ad559-81f7-4465-9b1e-c49a018d3428) ![tpch_queries_speed

Re: [I] Comet cannot read decimals with physical type BINARY [datafusion-comet]

2024-09-19 Thread via GitHub
parthchandra commented on issue #567: URL: https://github.com/apache/datafusion-comet/issues/567#issuecomment-2362243267 Yes, let's close this. We can revisit this if more people report it.

Re: [PR] feat: Publish artifacts to maven [datafusion-comet]

2024-09-19 Thread via GitHub
parthchandra commented on PR #946: URL: https://github.com/apache/datafusion-comet/pull/946#issuecomment-2362220409 @andygrove This is ready to be tried out.

Re: [PR] PartialOrd for Expr and sub fields/structs [datafusion]

2024-09-19 Thread via GitHub
ngli-me commented on code in PR #12481: URL: https://github.com/apache/datafusion/pull/12481#discussion_r1767607106 ## datafusion/expr/src/logical_plan/ddl.rs: ## @@ -284,6 +346,15 @@ pub struct CreateCatalogSchema { pub schema: DFSchemaRef, } +impl PartialOrd for Create

Re: [PR] PartialOrd for Expr and sub fields/structs [datafusion]

2024-09-19 Thread via GitHub
ngli-me commented on code in PR #12481: URL: https://github.com/apache/datafusion/pull/12481#discussion_r1767604101 ## datafusion/expr/src/logical_plan/ddl.rs: ## @@ -232,8 +232,61 @@ impl Hash for CreateExternalTable { } } +impl PartialOrd for CreateExternalTable { Rev

Re: [I] Proposal to donate Ray SQL to the DataFusion Project (not into the Python subproject) [datafusion-python]

2024-09-19 Thread via GitHub
andygrove commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2362194332 @austin362667 I created the new repository: https://github.com/apache/datafusion-ray Could you open a draft PR against this repository to add the code including ASF

Re: [I] Support Grouping functions with Group By CUBE/ROLLUP/GROUPING SETS [datafusion]

2024-09-19 Thread via GitHub
bgjackma commented on issue #5647: URL: https://github.com/apache/datafusion/issues/5647#issuecomment-2362160489 Thanks for the heads up, that's helpful; however, I think I have a slightly nicer solution that doesn't force changes on existing accumulators.

Re: [I] Support Grouping functions with Group By CUBE/ROLLUP/GROUPING SETS [datafusion]

2024-09-19 Thread via GitHub
alamb commented on issue #5647: URL: https://github.com/apache/datafusion/issues/5647#issuecomment-2362125251 Note there is a PR with a proposed implementation from @JasonLi-cn in https://github.com/apache/datafusion/pull/10208

Re: [PR] support EXTRACT on intervals and durations [datafusion]

2024-09-19 Thread via GitHub
nrc commented on code in PR #12514: URL: https://github.com/apache/datafusion/pull/12514#discussion_r1767512808 ## datafusion/functions/src/datetime/date_part.rs: ## @@ -223,9 +239,17 @@ fn seconds(array: &dyn Array, unit: TimeUnit) -> Result { let secs = as_int32_array(se

Re: [PR] support EXTRACT on intervals and durations [datafusion]

2024-09-19 Thread via GitHub
nrc commented on code in PR #12514: URL: https://github.com/apache/datafusion/pull/12514#discussion_r1767512124 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -1472,6 +1472,135 @@ SELECT extract(epoch from arrow_cast('1969-12-31', 'Date64')) -86400 +# test_extra

Re: [I] 2024 Q3-Q4 Roadmap? [datafusion]

2024-09-19 Thread via GitHub
alamb commented on issue #11442: URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2362034533 One possibility might be to try and arrange something colocated with CIDR in Amsterdam https://www.cidrdb.org/cidr2025/ -- there might be many people in town already that could b

Re: [PR] Add JOB benchmark dataset [1/N] (imdb dataset) [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12497: URL: https://github.com/apache/datafusion/pull/12497#issuecomment-2362019502 Thanks @doupache -- I started the CI jobs, and I will try and test this out manually locally over the next few days

Re: [PR] Automate sqllogictest for String, LargeString and StringView behavior [datafusion]

2024-09-19 Thread via GitHub
alamb commented on code in PR #12525: URL: https://github.com/apache/datafusion/pull/12525#discussion_r1767493120 ## datafusion/sqllogictest/test_files/string/string_view.slt: ## @@ -0,0 +1,356 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contribut

Re: [I] Create binary releases [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove closed issue #721: Create binary releases URL: https://github.com/apache/datafusion-comet/issues/721

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove merged PR #932: URL: https://github.com/apache/datafusion-comet/pull/932

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12536: URL: https://github.com/apache/datafusion/pull/12536#issuecomment-2362010952 I will try and find time to review this tomorrow

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12536: URL: https://github.com/apache/datafusion/pull/12536#issuecomment-2362010698 FYI @findepi

Re: [PR] Fix unparse table scan with the projection pushdown [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12534: URL: https://github.com/apache/datafusion/pull/12534#issuecomment-2362010077 FYI @phillipleblanc and @sgrebnov

Re: [PR] Support REPLACE INTO for INSERT statements [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12516: URL: https://github.com/apache/datafusion/pull/12516#issuecomment-2362009358 > The purpose of such a change would be so that TableProvider::insert_into could take an op: InsertOp argument instead of two boolean arguments for overwrite and replace_into. I think
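To illustrate the shape of that proposal, here is a minimal sketch assuming a hypothetical `InsertOp` enum with `Append`, `Overwrite` and `Replace` variants; the names are illustrative, and `describe_insert` is not DataFusion's `TableProvider::insert_into`. A single enum argument makes the meaningless "both flags set" state unrepresentable:

```rust
/// Hypothetical enum standing in for the two boolean flags.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum InsertOp {
    /// Plain INSERT INTO: add rows to the existing data.
    Append,
    /// INSERT OVERWRITE: replace the existing data.
    Overwrite,
    /// REPLACE INTO: insert, replacing rows that conflict on a key.
    Replace,
}

/// Hypothetical migration helper from the old `(overwrite, replace_into)` pair.
/// Setting both flags is assumed here to be invalid.
fn insert_op_from_flags(overwrite: bool, replace_into: bool) -> Result<InsertOp, String> {
    match (overwrite, replace_into) {
        (false, false) => Ok(InsertOp::Append),
        (true, false) => Ok(InsertOp::Overwrite),
        (false, true) => Ok(InsertOp::Replace),
        (true, true) => Err("overwrite and replace_into are mutually exclusive".to_string()),
    }
}

/// Hypothetical sink method taking a single `op` argument instead of two booleans.
fn describe_insert(table: &str, op: InsertOp) -> String {
    match op {
        InsertOp::Append => format!("INSERT INTO {table}"),
        InsertOp::Overwrite => format!("INSERT OVERWRITE {table}"),
        InsertOp::Replace => format!("REPLACE INTO {table}"),
    }
}
```

An exhaustive `match` on the enum also means a future variant forces every implementor to handle it at compile time, which two booleans cannot do.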

Re: [PR] docs: Add more detailed architecture documentation [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove merged PR #922: URL: https://github.com/apache/datafusion-comet/pull/922

Re: [PR] docs: Add more detailed architecture documentation [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove commented on PR #922: URL: https://github.com/apache/datafusion-comet/pull/922#issuecomment-2362008604 Thanks for the reviews @comphead @mbutrovich

Re: [PR] support EXTRACT on intervals and durations [datafusion]

2024-09-19 Thread via GitHub
alamb commented on code in PR #12514: URL: https://github.com/apache/datafusion/pull/12514#discussion_r1767487219 ## datafusion/sqllogictest/test_files/expr.slt: ## @@ -1472,6 +1472,135 @@ SELECT extract(epoch from arrow_cast('1969-12-31', 'Date64')) -86400 +# test_ext

Re: [I] Improve performance of high cardinality grouping by reusing hash values [datafusion]

2024-09-19 Thread via GitHub
Rachelint commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2362005473 > > > Does anyone know what the rationale of having repartition + coalesce, what kind of query benefits from it > > > > > > The primary reason is scalability. Effici
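The idea in the issue title, computing each row's hash once and reusing it in later stages instead of rehashing the group keys, can be sketched briefly. Everything here is hypothetical (string keys, `DefaultHasher`, a `PassThroughHasher`, the `repartition_and_count` helper) and only illustrates the technique: the hash used to pick a partition is carried along and fed straight into that partition's group table.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hash, Hasher};

/// Group key bundled with its precomputed hash.
#[derive(Debug, Clone, PartialEq, Eq)]
struct HashedKey {
    hash: u64,
    key: String,
}

impl Hash for HashedKey {
    // Only the cached hash is fed to the hasher; the key is never rehashed.
    fn hash<H: Hasher>(&self, state: &mut H) {
        state.write_u64(self.hash);
    }
}

/// Hasher that returns the u64 it is given unchanged.
#[derive(Default)]
struct PassThroughHasher(u64);

impl Hasher for PassThroughHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Fallback for non-u64 input; HashedKey only ever calls write_u64.
        for &b in bytes {
            self.0 = self.0.rotate_left(8) ^ u64::from(b);
        }
    }
    fn write_u64(&mut self, n: u64) {
        self.0 = n;
    }
    fn finish(&self) -> u64 {
        self.0
    }
}

type GroupTable = HashMap<HashedKey, u64, BuildHasherDefault<PassThroughHasher>>;

/// Hashes each key once, routes it to a partition with that hash, then counts
/// rows per group in that partition reusing the same hash. `partitions` must be > 0.
fn repartition_and_count(keys: &[&str], partitions: usize) -> Vec<GroupTable> {
    let mut tables: Vec<GroupTable> = (0..partitions).map(|_| GroupTable::default()).collect();
    for key in keys {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        let hash = hasher.finish();
        let partition = (hash % partitions as u64) as usize;
        let entry = HashedKey { hash, key: key.to_string() };
        *tables[partition].entry(entry).or_insert(0) += 1;
    }
    tables
}
```

One caveat in this sketch: every hash in a given partition shares the same `hash % partitions` residue, so a real implementation may want to route on different hash bits than the table uses for bucket selection.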

Re: [PR] Bump aws-sdk-sso to 1.43.0, aws-sdk-sts to 1.43.0 and aws-sdk-ssooidc from 1.40.0 to 1.44.0 in /datafusion-cli [datafusion]

2024-09-19 Thread via GitHub
alamb merged PR #12409: URL: https://github.com/apache/datafusion/pull/12409

Re: [PR] SanityCheckPlan should compare UnionExec inputs to requirements for output (parent). [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12414: URL: https://github.com/apache/datafusion/pull/12414#issuecomment-2361962999 I plan to take over this PR / fix (likely tomorrow) -- I will keep https://github.com/apache/datafusion/issues/12446 updated as well

Re: [PR] SanityCheckPlan should compare UnionExec inputs to requirements for output (parent). [datafusion]

2024-09-19 Thread via GitHub
alamb commented on code in PR #12414: URL: https://github.com/apache/datafusion/pull/12414#discussion_r1767463048 ## datafusion/physical-expr/src/equivalence/properties.rs: ## @@ -2720,8 +2799,8 @@ mod tests { Arc::clone(&schema3), )

Re: [PR] SanityCheckPlan should compare UnionExec inputs to requirements for output (parent). [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12414: URL: https://github.com/apache/datafusion/pull/12414#issuecomment-2361953854 I looked at this problem some more (not really the code, but just a smaller reproducer) and left my notes here https://github.com/apache/datafusion/issues/12446#issuecomment-2361896831

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-19 Thread via GitHub
alamb commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2361950461 I am trying to figure out where the bug is (is it in the union equivalence properties calculation). I am not sure it is. I added some debugging information to my small repr

Re: [PR] Update datafusion protobuf definitions [datafusion-ballista]

2024-09-19 Thread via GitHub
andygrove merged PR #1057: URL: https://github.com/apache/datafusion-ballista/pull/1057

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-19 Thread via GitHub
alamb commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2361896831 Even simpler: ```sql select * from (select c, a, NULL::int as a0 from t order by a, c) t1 union all select * from (select c, NULL::int as a, a0 from t order by a0, c

Re: [I] SanityChecker rejects certain valid `UNION` plans [datafusion]

2024-09-19 Thread via GitHub
alamb commented on issue #12446: URL: https://github.com/apache/datafusion/issues/12446#issuecomment-2361894958 I spent some time finding a smaller standalone reproducer ```sql create table t(a0 int, a int, b int, c int) as values (1, 2, 3, 4), (5, 6, 7, 8); select * from (

Re: [PR] Improve flamegraph profiling instructions [datafusion]

2024-09-19 Thread via GitHub
alamb commented on PR #12521: URL: https://github.com/apache/datafusion/pull/12521#issuecomment-2361865663 Thank you @Rachelint and @comphead

Re: [PR] Fix NestedLoopJoin performance regression [datafusion]

2024-09-19 Thread via GitHub
korowa commented on code in PR #12531: URL: https://github.com/apache/datafusion/pull/12531#discussion_r1767370510 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -456,21 +458,72 @@ struct NestedLoopJoinStream { // null_equals_null: bool /// Join execu

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-19 Thread via GitHub
parthchandra commented on code in PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#discussion_r1767359468 ## Makefile: ## @@ -46,6 +46,22 @@ format: ./mvnw compile test-compile scalafix:scalafix -Psemanticdb $(PROFILES) ./mvnw spotless:apply $(PR

Re: [I] Comet cannot read decimals with physical type BINARY [datafusion-comet]

2024-09-19 Thread via GitHub
comphead commented on issue #567: URL: https://github.com/apache/datafusion-comet/issues/567#issuecomment-2361840041 Well, the issue still exists; however, it's related to deprecated Parquet formats where Decimal is represented as BINARY. We probably should mention this in doc that such kind

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function, add `AggregateUDFImpl::is_null` [datafusion]

2024-09-19 Thread via GitHub
itsjunetime commented on PR #11989: URL: https://github.com/apache/datafusion/pull/11989#issuecomment-2361812330 > What is the logical schema and physical schema you have (the error)? I'm getting issues where both schemas are equivalent except for the metadata on the fields. I think (
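To make the distinction concrete, here is a minimal sketch of an equivalence check that compares name, type and nullability but ignores per-field metadata. The `Field` struct and `fields_equivalent_ignoring_metadata` are hypothetical stand-ins, not Arrow's or DataFusion's types, and whether metadata should be ignored is exactly the open question in this thread:

```rust
use std::collections::HashMap;

/// Hypothetical simplified schema field (data type kept as a string).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
    nullable: bool,
    metadata: HashMap<String, String>,
}

/// True if both field lists match on everything except metadata.
fn fields_equivalent_ignoring_metadata(a: &[Field], b: &[Field]) -> bool {
    a.len() == b.len()
        && a.iter().zip(b).all(|(x, y)| {
            x.name == y.name && x.data_type == y.data_type && x.nullable == y.nullable
        })
}
```

The derived `PartialEq` still treats fields with different metadata as unequal, so the two checks can disagree in exactly the situation described above.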

Re: [PR] build(deps): bump com.google.protobuf:protobuf-java from 3.19.6 to 3.25.5 [datafusion-comet]

2024-09-19 Thread via GitHub
codecov-commenter commented on PR #954: URL: https://github.com/apache/datafusion-comet/pull/954#issuecomment-2361743926 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/954?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [I] Comet cannot read decimals with physical type BINARY [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove commented on issue #567: URL: https://github.com/apache/datafusion-comet/issues/567#issuecomment-2361649356 @comphead @parthchandra can we close this issue?

Re: [I] chore: extended explain info can be an object instead of class [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove commented on issue #452: URL: https://github.com/apache/datafusion-comet/issues/452#issuecomment-2361643314 I think we can close this one now

Re: [I] chore: extended explain info can be an object instead of class [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove closed issue #452: chore: extended explain info can be an object instead of class URL: https://github.com/apache/datafusion-comet/issues/452

Re: [I] Correct build commend in Comet Development Guide [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove closed issue #305: Correct build commend in Comet Development Guide URL: https://github.com/apache/datafusion-comet/issues/305

Re: [I] Parquet column with integer logical type cannot read as Spark date column [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove commented on issue #44: URL: https://github.com/apache/datafusion-comet/issues/44#issuecomment-2361585685 This issue appears to be resolved, so I will close this issue. Thanks @okue for reporting it. ``` scala> Seq(15901).toDF("dt").write.parquet("/tmp/dt") 24/09/19

Re: [I] Parquet column with integer logical type cannot read as Spark date column [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove closed issue #44: Parquet column with integer logical type cannot read as Spark date column URL: https://github.com/apache/datafusion-comet/issues/44

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-19 Thread via GitHub
viirya commented on PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#issuecomment-2361561256 Looks good to me, with a few minor questions.

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-19 Thread via GitHub
viirya commented on code in PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#discussion_r1767185241 ## dev/release/build-release-comet.sh: ## @@ -0,0 +1,202 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor l

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-19 Thread via GitHub
viirya commented on code in PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#discussion_r1767167884 ## Makefile: ## @@ -46,6 +46,22 @@ format: ./mvnw compile test-compile scalafix:scalafix -Psemanticdb $(PROFILES) ./mvnw spotless:apply $(PROFILES

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-19 Thread via GitHub
viirya commented on code in PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#discussion_r1767163890 ## Makefile: ## @@ -46,6 +46,22 @@ format: ./mvnw compile test-compile scalafix:scalafix -Psemanticdb $(PROFILES) ./mvnw spotless:apply $(PROFILES

Re: [I] 2024 Q3-Q4 Roadmap? [datafusion]

2024-09-19 Thread via GitHub
Abdullahsab3 commented on issue #11442: URL: https://github.com/apache/datafusion/issues/11442#issuecomment-2361442745 > Any chance anyone on this issue (or @Abdullahsab3 ) wants to help organize another European meetup? Yes! My team and I would be interested in helping organize a me

Re: [PR] feat: implement scripts for binary release build [datafusion-comet]

2024-09-19 Thread via GitHub
parthchandra commented on PR #932: URL: https://github.com/apache/datafusion-comet/pull/932#issuecomment-2361465911 @viirya Any further comments?

Re: [I] Proposal to donate Ray SQL to the DataFusion Project (not into the Python subproject) [datafusion-python]

2024-09-19 Thread via GitHub
franklsf95 commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2361460897 > @franklsf95 Would you be able to file an ICLA (unless you already have one)? The instructions are at https://www.apache.org/licenses/contributor-agreements.html >

[PR] build(deps): bump com.google.protobuf:protobuf-java from 3.19.6 to 3.25.5 [datafusion-comet]

2024-09-19 Thread via GitHub
dependabot[bot] opened a new pull request, #954: URL: https://github.com/apache/datafusion-comet/pull/954 Bumps [com.google.protobuf:protobuf-java](https://github.com/protocolbuffers/protobuf) from 3.19.6 to 3.25.5. Release notes Sourced from https://github.com/protocolbuffers/pro

[PR] Bump google-protobuf from 4.26.1 to 4.27.5 [datafusion-site]

2024-09-19 Thread via GitHub
dependabot[bot] opened a new pull request, #27: URL: https://github.com/apache/datafusion-site/pull/27 Bumps [google-protobuf](https://github.com/protocolbuffers/protobuf) from 4.26.1 to 4.27.5. Commits See full diff in https://github.com/protocolbuffers/protobuf/commits

Re: [I] Proposal to donate Ray SQL to the DataFusion Project (not into the Python subproject) [datafusion-python]

2024-09-19 Thread via GitHub
austin362667 commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2361443689 The PR [Add ASF license header](https://github.com/datafusion-contrib/ray-sql/pull/50)

Re: [I] Unable to register UDAFs using SessionContext's `register_udaf` [datafusion-python]

2024-09-19 Thread via GitHub
timsaucer commented on issue #874: URL: https://github.com/apache/datafusion-python/issues/874#issuecomment-2361418801 Verified. I have a fix I will push up tonight or first thing in the morning. **Thank you for the bug report!** In the meantime, are you able to use the udaf via dat

Re: [I] Proposal to donate Ray SQL to the DataFusion Project (not into the Python subproject) [datafusion-python]

2024-09-19 Thread via GitHub
austin362667 commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2361390265 @andygrove Thanks for updating the title and yes, I can help. I'll send a PR to https://github.com/datafusion-contrib/ray-sql that adds: ``` # Licensed to the Apa

Re: [PR] SanityCheckPlan should compare UnionExec inputs to requirements for output (parent). [datafusion]

2024-09-19 Thread via GitHub
wiedld commented on PR #12414: URL: https://github.com/apache/datafusion/pull/12414#issuecomment-2361387578 > I believe you don't need to add a new logic for sort expr's. You should only focus on the following conversion, when constant expressions appear as the global order of the other chi

Re: [I] Proposal to Introduce Ray SQL into DataFusion Python [datafusion-python]

2024-09-19 Thread via GitHub
andygrove commented on issue #872: URL: https://github.com/apache/datafusion-python/issues/872#issuecomment-2361357793 @franklsf95 Would you be able to file an ICLA (unless you already have one)? The instructions are at https://www.apache.org/licenses/contributor-agreements.html @au

Re: [I] Alias `APPROX_PERCENTILE_CONT` as `PERCENTILE_CONT`? [datafusion]

2024-09-19 Thread via GitHub
alamb commented on issue #12533: URL: https://github.com/apache/datafusion/issues/12533#issuecomment-2361342279 I think the expectation for `PERCENTILE_CONT` is that it will implement an exact calculation -- and to do so the implementation needs to keep all the actual values (e.g. the same
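The cost difference is easy to see in code: an exact continuous percentile needs every input value, sorted, before it can interpolate. A minimal sketch of the usual linear-interpolation definition (fractional position `p * (n - 1)` between adjacent sorted values), using a hypothetical `percentile_cont` function and making no claim about how DataFusion would implement it:

```rust
/// Hypothetical exact continuous percentile of `values` for fraction `p` in [0, 1].
/// Unlike an approximate sketch, this buffers and sorts every input value.
/// Returns None for empty input or an out-of-range (or NaN) `p`.
fn percentile_cont(values: &[f64], p: f64) -> Option<f64> {
    if values.is_empty() || !(0.0..=1.0).contains(&p) {
        return None;
    }
    // Keep all values and sort them; this is the memory cost of exactness.
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    // Fractional position between two neighbouring sorted values.
    let pos = p * (sorted.len() - 1) as f64;
    let lower = pos.floor() as usize;
    let upper = pos.ceil() as usize;
    let frac = pos - lower as f64;

    // Linear interpolation between the two neighbours.
    Some(sorted[lower] + (sorted[upper] - sorted[lower]) * frac)
}
```

For example, the median of `[1.0, 2.0, 3.0, 4.0]` falls at position 1.5 and interpolates to 2.5, a value that never appears in the input.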

[I] Unable to register UDAFs using SessionContext's `register_udaf` [datafusion-python]

2024-09-19 Thread via GitHub
emanueledomingo opened a new issue, #874: URL: https://github.com/apache/datafusion-python/issues/874 **Describe the bug** During the update of Datafusion from 39 to 41, my script got broken because the `register_udaf` crashes with the following error: ``` in SessionContext.regi

Re: [PR] Fix NestedLoopJoin performance regression [datafusion]

2024-09-19 Thread via GitHub
berkaysynnada commented on PR #12531: URL: https://github.com/apache/datafusion/pull/12531#issuecomment-2361305449 cc @korowa, @comphead

Re: [PR] Fix NestedLoopJoin performance regression [datafusion]

2024-09-19 Thread via GitHub
ozankabak commented on code in PR #12531: URL: https://github.com/apache/datafusion/pull/12531#discussion_r1767011103 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -595,27 +650,17 @@ fn join_left_and_right_batch( column_indices: &[ColumnIndex], schem

Re: [PR] Fix NestedLoopJoin performance regression [datafusion]

2024-09-19 Thread via GitHub
ozankabak commented on PR #12531: URL: https://github.com/apache/datafusion/pull/12531#issuecomment-2361286349 /benchmark

Re: [PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-19 Thread via GitHub
notfilippo commented on PR #12536: URL: https://github.com/apache/datafusion/pull/12536#issuecomment-2361270997 The main change is contained in this file: https://github.com/apache/datafusion/pull/12536/files#diff-2cc034babb8e7f8601dda34ecaa2119104eccd76d8ad2f9e19b26b01463634d6 All ot

Re: [PR] feat(function): add greatest function [datafusion]

2024-09-19 Thread via GitHub
comphead commented on PR #12474: URL: https://github.com/apache/datafusion/pull/12474#issuecomment-2361269219 Related to https://github.com/apache/datafusion/issues/6531

Re: [PR] Replace some usages of `Expr::to_field` with `Expr::qualified_name` [datafusion]

2024-09-19 Thread via GitHub
comphead merged PR #12522: URL: https://github.com/apache/datafusion/pull/12522

Re: [PR] Fix NestedLoopJoin performance regression [datafusion]

2024-09-19 Thread via GitHub
alihan-synnada commented on code in PR #12531: URL: https://github.com/apache/datafusion/pull/12531#discussion_r1766996632 ## datafusion/physical-plan/src/joins/nested_loop_join.rs: ## @@ -458,19 +457,47 @@ struct NestedLoopJoinStream { join_metrics: BuildProbeJoinMetrics,

Re: [PR] Expose DataFrame select_exprs method [datafusion]

2024-09-19 Thread via GitHub
comphead commented on PR #12520: URL: https://github.com/apache/datafusion/pull/12520#issuecomment-2361255895 Thanks everyone

Re: [I] DataFrame select_exprs [datafusion]

2024-09-19 Thread via GitHub
comphead closed issue #12519: DataFrame select_exprs URL: https://github.com/apache/datafusion/issues/12519

Re: [PR] Expose DataFrame select_exprs method [datafusion]

2024-09-19 Thread via GitHub
comphead merged PR #12520: URL: https://github.com/apache/datafusion/pull/12520

Re: [PR] tests: Fix typo in config setting name [datafusion]

2024-09-19 Thread via GitHub
alamb merged PR #12535: URL: https://github.com/apache/datafusion/pull/12535

Re: [PR] perf: Add metric for time spent in CometSparkToColumnarExec [datafusion-comet]

2024-09-19 Thread via GitHub
andygrove merged PR #931: URL: https://github.com/apache/datafusion-comet/pull/931

Re: [PR] feat(planner): Allowing setting sort order of parquet files without specifying the schema [datafusion]

2024-09-19 Thread via GitHub
devanbenz commented on code in PR #12466: URL: https://github.com/apache/datafusion/pull/12466#discussion_r1766966988 ## datafusion/sql/src/statement.rs: ## @@ -1136,14 +1136,29 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { schema: &DFSchemaRef, planner_con

[PR] Introduce `Scalar` type for ColumnarValue [datafusion]

2024-09-19 Thread via GitHub
notfilippo opened a new pull request, #12536: URL: https://github.com/apache/datafusion/pull/12536 This PR represents the first step originating from experiment #11978, which itself stems from the broad objective described in proposal #11513. --- ## Rationale for this change

Re: [PR] Add JOB benchmark dataset [1/N] (imdb dataset) [datafusion]

2024-09-19 Thread via GitHub
austin362667 commented on code in PR #12497: URL: https://github.com/apache/datafusion/pull/12497#discussion_r1766347074 ## benchmarks/src/imdb/convert.rs: ## @@ -0,0 +1,112 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreeme

Re: [PR] feat(planner): Allowing setting sort order of parquet files without specifying the schema [datafusion]

2024-09-19 Thread via GitHub
devanbenz commented on PR #12466: URL: https://github.com/apache/datafusion/pull/12466#issuecomment-2361106309 @alamb I have this working but I'm unsure if the *original* implementation is working as expected. Shouldn't the times be descending in this first selection? ``` > create

[PR] tests: Fix typo in config setting name [datafusion]

2024-09-19 Thread via GitHub
progval opened a new pull request, #12535: URL: https://github.com/apache/datafusion/pull/12535 (no comment)

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function, add `AggregateUDFImpl::is_null` [datafusion]

2024-09-19 Thread via GitHub
phillipleblanc commented on PR #11989: URL: https://github.com/apache/datafusion/pull/11989#issuecomment-2361050053 > Could we keep all the fields for the logical plan to make them consistent? That seems fine to me.

Re: [PR] Fix the schema mismatch between logical and physical for aggregate function, add `AggregateUDFImpl::is_null` [datafusion]

2024-09-19 Thread via GitHub
jayzhan211 commented on PR #11989: URL: https://github.com/apache/datafusion/pull/11989#issuecomment-2361034346 I think the ideal way is to have something like `wildcard field` for both logical and physical 🤔
