[PR] Migrate testing optimizer rules [datafusion]

2024-05-20 Thread via GitHub
lewiszlw opened a new pull request, #10576: URL: https://github.com/apache/datafusion/pull/10576 ## Which issue does this PR close? part of https://github.com/apache/datafusion/issues/9637. ## Rationale for this change ## What changes are included in this

Re: [I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-20 Thread via GitHub
LorrensP-2158466 commented on issue #10571: URL: https://github.com/apache/datafusion/issues/10571#issuecomment-2119941766 I don't see this as a really difficult API change. Is it ok if I do this? -- This is an automated message from the Apache Git Service. To respond to the message, pleas

[PR] Improve ContextProvider [datafusion]

2024-05-20 Thread via GitHub
lewiszlw opened a new pull request, #10577: URL: https://github.com/apache/datafusion/pull/10577 ## Which issue does this PR close? Renaming like `SchemaProvider::table_names`, add docs and remove deprecated code. ## Rationale for this change ## What chan

[PR] Update prost-build requirement from =0.12.4 to =0.12.6 [datafusion]

2024-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #10578: URL: https://github.com/apache/datafusion/pull/10578 Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version. Commits https://github.com/tokio-rs/prost/commit/d42c85e790263f78f6

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-20 Thread via GitHub
jonahgao commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1606501759 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [I] API in ParquetExec to pass in RowSelections to `ParquetExec` (enable custom indexes, finer grained pushdown) [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #9929: URL: https://github.com/apache/datafusion/issues/9929#issuecomment-2120223824 Update here is that I found it was maybe too large a step to get to the row level access initially -- instead I started with a basic example of building a *file level index* -- htt

Re: [PR] Stop copying LogicalPlan and Exprs in `SingleDistinctToGroupBy` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10527: URL: https://github.com/apache/datafusion/pull/10527#discussion_r1606615253 ## datafusion/optimizer/src/single_distinct_to_groupby.rs: ## @@ -131,177 +126,190 @@ fn contains_grouping_set(expr: &[Expr]) -> bool { impl OptimizerRule for Singl

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-20 Thread via GitHub
peter-toth commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2120329963 I'm still working on an alternative to this PR and will need a couple more days to test a few different ideas...

Re: [PR] Add reference visitor `TreeNode` APIs [datafusion]

2024-05-20 Thread via GitHub
ozankabak commented on PR #10543: URL: https://github.com/apache/datafusion/pull/10543#issuecomment-2120338004 No worries. Will be happy to review and help iterate once you are ready

Re: [PR] feat: Implement Spark-compatible CAST from String to Date [datafusion-comet]

2024-05-20 Thread via GitHub
vidyasankarv commented on PR #383: URL: https://github.com/apache/datafusion-comet/pull/383#issuecomment-2120433348 https://github.com/apache/datafusion-comet/suites/23883332179/logs?attempt=2 In the logs for ubuntu-latest/java 17-spark-3.4-scala-2.12/java - which included the fuzz te

Re: [I] UserDefindedLogicalNode::from_template does not return a Result<...>. [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10571: URL: https://github.com/apache/datafusion/issues/10571#issuecomment-2120463462 > I don't see this as a really difficult API change. Is it ok if I do this? Edit: there is a PR already, did not see it, sorry. @lewiszlw beat us to it! (BTW

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-20 Thread via GitHub
therealsharath commented on PR #707: URL: https://github.com/apache/datafusion-python/pull/707#issuecomment-2120466027 @dependabot rebase

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-20 Thread via GitHub
dependabot[bot] commented on PR #707: URL: https://github.com/apache/datafusion-python/pull/707#issuecomment-2120466102 Sorry, only users with push access can use that command.

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-05-20 Thread via GitHub
therealsharath commented on PR #707: URL: https://github.com/apache/datafusion-python/pull/707#issuecomment-2120467630 Hello, is it possible to merge this in because https://github.com/apache/arrow-rs/issues/5589 was fixed in object store `0.10.1`. Thanks!

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-20 Thread via GitHub
aditanase commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2120467689 I was recently trying to query the NYC dataset from ballista. Path looks something like https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2024-01.parquet

Re: [PR] Implement Unparse `GroupingSet` Expr --> String Support sql [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10555: URL: https://github.com/apache/datafusion/pull/10555

Re: [I] `GroupingSet` Expr --> String Support [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #10521: `GroupingSet` Expr --> String Support URL: https://github.com/apache/datafusion/issues/10521

Re: [I] Complete support for `Expr --> String ` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #9726: URL: https://github.com/apache/datafusion/issues/9726#issuecomment-2120522626 With the completion of https://github.com/apache/datafusion/pull/10555 from @xinlifoobar I think this epic is now done!

Re: [I] Complete support for `Expr --> String ` [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #9726: Complete support for `Expr --> String ` URL: https://github.com/apache/datafusion/issues/9726

Re: [PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-20 Thread via GitHub
viirya merged PR #445: URL: https://github.com/apache/datafusion-comet/pull/445

Re: [PR] chore: improve fallback message when comet native shuffle is not enabled [datafusion-comet]

2024-05-20 Thread via GitHub
viirya commented on PR #445: URL: https://github.com/apache/datafusion-comet/pull/445#issuecomment-2120582785 Merged. Thanks @andygrove @advancedxy

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-20 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1601980707 ## datafusion/substrait/src/logical_plan/producer.rs: ## @@ -165,6 +168,53 @@ pub fn to_substrait_rel( }))), })) } +

Re: [PR] test: parametrize test_array_functions [datafusion-python]

2024-05-20 Thread via GitHub
Michael-J-Ward commented on PR #678: URL: https://github.com/apache/datafusion-python/pull/678#issuecomment-2120634135 @andygrove could we merge this?

Re: [PR] Add script to generate TPC-H data and convert it to Parquet using DataFusion [datafusion-benchmarks]

2024-05-20 Thread via GitHub
viirya commented on code in PR #2: URL: https://github.com/apache/datafusion-benchmarks/pull/2#discussion_r1606905078 ## tpch/tpchgen.py: ## @@ -0,0 +1,89 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
comphead commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606905348 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1365,6 +1402,69 @@ fn get_filter_column( filter_columns } +/// Get `buffered_indices` row

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
comphead commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2120655736 > I've seen some issues in this patch. It doesn't look like a correct fix. The tests are currently in sync with what hash join returns, is there a test showing the opposite?

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606910619 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1365,6 +1402,69 @@ fn get_filter_column( filter_columns } +/// Get `buffered_indices` rows

Re: [I] DataFusion to run SQL queries on Parquet files with error No suitable object store found for file [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #9280: URL: https://github.com/apache/datafusion/issues/9280#issuecomment-2120680185 @aditanase how are you running the external statement? It seems to work well from `datafusion-cli` ```shell andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606924344 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1365,6 +1402,69 @@ fn get_filter_column( filter_columns } +/// Get `buffered_indices` rows

Re: [PR] Minor: Move proxy to datafusion common [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10561: URL: https://github.com/apache/datafusion/pull/10561

Re: [I] `GroupingSet` Expr --> String Support [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10521: URL: https://github.com/apache/datafusion/issues/10521#issuecomment-2120689989 Thanks again @xinlifoobar

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-20 Thread via GitHub
Michael-J-Ward commented on code in PR #710: URL: https://github.com/apache/datafusion-python/pull/710#discussion_r1606931001 ## .github/workflows/test.yaml: ## @@ -111,3 +134,9 @@ jobs: source venv/bin/activate pip install -e . -vv pytest -v . +

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606942756 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606944266 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10304: URL: https://github.com/apache/datafusion/pull/10304#discussion_r1606948252 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -989,8 +996,21 @@ impl SMJStream { } } Ordering::Equal =>

[I] DataFusion weekly project plan (Andrew Lamb) - May 20, 2024 [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10579: URL: https://github.com/apache/datafusion/issues/10579 Follow on to https://github.com/apache/datafusion/issues/10482 My (personal) North ⭐ : 1000 projects are built using DataFusion 📈 **It would be great for other contributors to DataFusion wh

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10482: URL: https://github.com/apache/datafusion/issues/10482#issuecomment-2120727825 Next week: https://github.com/apache/datafusion/issues/10579

Re: [I] DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #10482: DataFusion weekly project plan (Andrew Lamb) - May 13, 2024 URL: https://github.com/apache/datafusion/issues/10482

Re: [PR] Fix: Sort Merge Join LeftSemi issues when JoinFilter is set [datafusion]

2024-05-20 Thread via GitHub
viirya commented on PR #10304: URL: https://github.com/apache/datafusion/pull/10304#issuecomment-2120729543 > I've seen some issues in this patch. It doesn't look like a correct fix. Took another look. Looks okay to me.

[I] Advanced example for building an external index for Row Groups *within* parquet files [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10580: URL: https://github.com/apache/datafusion/issues/10580 ### Is your feature request related to a problem or challenge? It is common in databases and other analytic systems to have additional external "indexes" (perhaps stored in the "metadata catalog",

Re: [PR] fix double blog path [datafusion-site]

2024-05-20 Thread via GitHub
alamb merged PR #3: URL: https://github.com/apache/datafusion-site/pull/3

Re: [PR] fix double blog path [datafusion-site]

2024-05-20 Thread via GitHub
alamb commented on PR #3: URL: https://github.com/apache/datafusion-site/pull/3#issuecomment-2120752894 🚀

Re: [I] Using `Expr::field` panics [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10565: URL: https://github.com/apache/datafusion/issues/10565#issuecomment-2120754098 Thank you @jayzhan211 🙏 -- I will review it now

Re: [PR] Remove `Expr::GetIndexedField` and fix panic of `field`, `index` and `range` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10568: URL: https://github.com/apache/datafusion/pull/10568#discussion_r1606971468 ## datafusion/core/tests/expr_api/mod.rs: ## @@ -61,7 +63,7 @@ fn test_eq_with_coercion() { #[test] fn test_get_field() { evaluate_expr_test( -get_fie

Re: [PR] Remove `Expr::GetIndexedField`, replace `Expr::{field,index,range}` with `FieldAccessor`, `IndexAccessor`, and `SliceAccessor` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10568: URL: https://github.com/apache/datafusion/pull/10568#discussion_r1606978717 ## datafusion/functions/src/core/expr_ext.rs: ## @@ -0,0 +1,68 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agre

Re: [PR] Support Substrait's VirtualTables [datafusion]

2024-05-20 Thread via GitHub
Blizzara commented on code in PR #10531: URL: https://github.com/apache/datafusion/pull/10531#discussion_r1606981891 ## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ## @@ -607,6 +608,15 @@ async fn qualified_catalog_schema_table_reference() -> Result<()> { r

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2120775312 https://github.com/apache/datafusion/pull/10392 is the upgrade to sqlparser -- I think it is pretty close but @tisonkun hit an issue during upgrade.

Re: [I] [EPIC] JIT support for `DataFusion` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #2703: URL: https://github.com/apache/datafusion/issues/2703#issuecomment-2120780299 Hi @leoluan2009 In my opinion, I don't think DataFusion needs JIT to get good performance. In general, I find the paper ["Everything You Always Wanted to Know About C

Re: [PR] Tsaucer/prepare tpch examples for ci [datafusion-python]

2024-05-20 Thread via GitHub
timsaucer commented on code in PR #710: URL: https://github.com/apache/datafusion-python/pull/710#discussion_r1606991649 ## .github/workflows/test.yaml: ## @@ -111,3 +134,9 @@ jobs: source venv/bin/activate pip install -e . -vv pytest -v . + +

Re: [PR] build(deps): upgrade sqlparser to 0.46.0 [datafusion]

2024-05-20 Thread via GitHub
tisonkun commented on code in PR #10392: URL: https://github.com/apache/datafusion/pull/10392#discussion_r1605950326 ## datafusion/sqllogictest/test_files/array.slt: ## Review Comment: Can be a bug after the JSON path parse changes - https://github.com/sqlparser-rs/sqlpars

Re: [I] Make SQL strings generated from `Expr`s "prettier" [datafusion]

2024-05-20 Thread via GitHub
tisonkun commented on issue #10557: URL: https://github.com/apache/datafusion/issues/10557#issuecomment-2120801283 > #10392 is the upgrade to sqlparser -- I think it is pretty close but @tisonkun hit an issue during upgrade. We may need a 0.46.1 for resolving the regressions: *

[I] chore: extended explain info can be an object instead of class [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra opened a new issue, #452: URL: https://github.com/apache/datafusion-comet/issues/452 ### Describe the bug ExtendedExplainInfo is declared as a class, but it can be an object instead. ### Steps to reproduce _No response_ ### Expected behavior _N

Re: [PR] feat: Add logging to explain reasons for Comet not being able to run a query stage natively [datafusion-comet]

2024-05-20 Thread via GitHub
parthchandra commented on code in PR #397: URL: https://github.com/apache/datafusion-comet/pull/397#discussion_r1607016398 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -734,6 +734,23 @@ class CometSparkSessionExtensions } else {

Re: [PR] Add script to generate TPC-H data and convert it to Parquet using DataFusion [datafusion-benchmarks]

2024-05-20 Thread via GitHub
andygrove commented on PR #2: URL: https://github.com/apache/datafusion-benchmarks/pull/2#issuecomment-2120828494 Thanks for the review @viirya

Re: [PR] Add script to generate TPC-H data and convert it to Parquet using DataFusion [datafusion-benchmarks]

2024-05-20 Thread via GitHub
andygrove merged PR #2: URL: https://github.com/apache/datafusion-benchmarks/pull/2

Re: [PR] feat: Add random row generator in data generator [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #451: URL: https://github.com/apache/datafusion-comet/pull/451#discussion_r1607045862 ## spark/src/test/scala/org/apache/comet/DataGenerator.scala: ## @@ -95,4 +102,55 @@ class DataGenerator(r: Random) { Range(0, n).map(_ => r.next

Re: [PR] Improve `UserDefinedLogicalNode::from_template` API to return `Result` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10575: URL: https://github.com/apache/datafusion/pull/10575#discussion_r1607048271 ## datafusion/expr/src/logical_plan/extension.rs: ## @@ -76,27 +76,20 @@ pub trait UserDefinedLogicalNode: fmt::Debug + Send + Sync { /// For example: `TopK: k=

Re: [PR] Improve ContextProvider [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10577: URL: https://github.com/apache/datafusion/pull/10577#issuecomment-2120882095 I'll leave this open for a day as it is an API change, in case anyone else wants a chance to review

Re: [PR] Update prost-build requirement from =0.12.4 to =0.12.6 [datafusion]

2024-05-20 Thread via GitHub
comphead merged PR #10578: URL: https://github.com/apache/datafusion/pull/10578

Re: [PR] Minor: Fix name in ArrayFunctionRewriter, error not panic if `Expr::GetStructField` is planned [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10564: URL: https://github.com/apache/datafusion/pull/10564#issuecomment-2120885272 @jayzhan211 has a better fix in https://github.com/apache/datafusion/pull/10568

Re: [PR] Minor: Fix name in ArrayFunctionRewriter, error not panic if `Expr::GetStructField` is planned [datafusion]

2024-05-20 Thread via GitHub
alamb closed pull request #10564: Minor: Fix name in ArrayFunctionRewriter, error not panic if `Expr::GetStructField` is planned URL: https://github.com/apache/datafusion/pull/10564

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-20 Thread via GitHub
comphead commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1607055206 ## datafusion/sql/Cargo.toml: ## @@ -47,6 +47,7 @@ arrow-schema = { workspace = true } datafusion-common = { workspace = true, default-features = true } datafus

[PR] Minor: Fix ArrayFunctionRewriter name [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new pull request, #10581: URL: https://github.com/apache/datafusion/pull/10581 This confused me while debugging something else

Re: [PR] refactor: reduce allocations in push down filter [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10567: URL: https://github.com/apache/datafusion/pull/10567#discussion_r1607060754 ## datafusion/optimizer/src/push_down_filter.rs: ## @@ -861,16 +861,12 @@ impl OptimizerRule for PushDownFilter { .collect(); l

Re: [PR] Add examples of how to convert logical plan to/from sql strings [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10558: URL: https://github.com/apache/datafusion/pull/10558

Re: [I] Add an example of how to convert LogicalPlan to/from SQL Strings [datafusion]

2024-05-20 Thread via GitHub
alamb closed issue #10550: Add an example of how to convert LogicalPlan to/from SQL Strings URL: https://github.com/apache/datafusion/issues/10550

Re: [PR] Implement a dialect-specific rule for unparsing an identifier with or without quotes [datafusion]

2024-05-20 Thread via GitHub
comphead commented on code in PR #10573: URL: https://github.com/apache/datafusion/pull/10573#discussion_r1607067858 ## datafusion/sql/src/unparser/expr.rs: ## @@ -504,6 +508,14 @@ impl Unparser<'_> { .collect::>>() } +pub(super) fn new_ident_quoted_if_ne

Re: [PR] Minor: Improve documentation in sql_to_plan example [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10582: URL: https://github.com/apache/datafusion/pull/10582#discussion_r1607076013 ## datafusion-examples/examples/plan_to_sql.rs: ## @@ -22,36 +22,45 @@ use datafusion::sql::unparser::expr_to_sql; use datafusion_sql::unparser::dialect::CustomDial

[I] HashJoin LeftAnti Join handles nulls incorrectly [datafusion]

2024-05-20 Thread via GitHub
viirya opened a new issue, #10583: URL: https://github.com/apache/datafusion/issues/10583 ### Describe the bug While working on https://github.com/apache/datafusion-comet/pull/437, a few Spark join tests failed when delegating to DataFusion HashJoin. This is because Dat

[PR] HashJoin LeftAnti Join should handle nulls correctly [datafusion]

2024-05-20 Thread via GitHub
viirya opened a new pull request, #10584: URL: https://github.com/apache/datafusion/pull/10584 ## Which issue does this PR close? Closes #10583. ## Rationale for this change ## What changes are included in this PR? ## Are these changes teste

Re: [PR] HashJoin LeftAnti Join should handle nulls correctly [datafusion]

2024-05-20 Thread via GitHub
viirya commented on PR #10584: URL: https://github.com/apache/datafusion/pull/10584#issuecomment-2120931328 Added the test case first. I will find some time to work on the fix.

Re: [PR] HashJoin LeftAnti Join should handle nulls correctly [datafusion]

2024-05-20 Thread via GitHub
viirya commented on code in PR #10584: URL: https://github.com/apache/datafusion/pull/10584#discussion_r1607085982 ## datafusion/sqllogictest/test_files/join.slt: ## @@ -793,3 +793,19 @@ DROP TABLE companies statement ok DROP TABLE leads + + +# LeftAnti Join with null +state

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607087345 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607091830 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607104678 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[I] Incorrect statistics read for `i8` `i16` [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10585: URL: https://github.com/apache/datafusion/issues/10585 ### Describe the bug As @NGA-TRAN found in https://github.com/apache/datafusion/pull/10537 when i8 and i16 values are written to parquet and then the statistics are extracted, the returned min/m

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
NGA-TRAN commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607104991 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor

Re: [I] Incorrect statistics read for `i8` `i16` columns in parquet [datafusion]

2024-05-20 Thread via GitHub
alamb commented on issue #10585: URL: https://github.com/apache/datafusion/issues/10585#issuecomment-2120956336 Possibly related to https://github.com/apache/datafusion/issues/9779

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607107595 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607110465 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,652 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

[I] DataFusion ignores "column order" parquet statistics specification [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10586: URL: https://github.com/apache/datafusion/issues/10586 ### Describe the bug As @tustvold points out, there is a [`column_order` API](https://docs.rs/parquet/latest/parquet/file/metadata/struct.FileMetaData.html#method.column_order) defined in parquet

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607118413 ## datafusion/core/src/datasource/physical_plan/parquet/arrow_statistics.rs: ## @@ -0,0 +1,43 @@ +use arrow_array::ArrayRef; +use arrow_schema::DataType; +use datafu

[I] DataFusion reads Date32 and Date64 parquet statistics in as [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new issue, #10587: URL: https://github.com/apache/datafusion/issues/10587 ### Describe the bug When reading a Date32 or Date64 column from a parquet file, DataFusion currently returns an Int32 array ### To Reproduce You can see the issue in https:/

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10537: URL: https://github.com/apache/datafusion/pull/10537#discussion_r1607125587 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -0,0 +1,654 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor lic

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb commented on PR #10537: URL: https://github.com/apache/datafusion/pull/10537#issuecomment-2120986025 I have filed the following tickets * https://github.com/apache/datafusion/issues/10585 * https://github.com/apache/datafusion/issues/10586 * #10587 I think this PR is

Re: [I] Row groups are read out of order or with completely different values [datafusion]

2024-05-20 Thread via GitHub
twitu closed issue #10572: Row groups are read out of order or with completely different values URL: https://github.com/apache/datafusion/issues/10572

Re: [PR] Minor: Improve documentation in sql_to_plan example [datafusion]

2024-05-20 Thread via GitHub
edmondop commented on code in PR #10582: URL: https://github.com/apache/datafusion/pull/10582#discussion_r1607146553 ## datafusion-examples/examples/plan_to_sql.rs: ## @@ -22,36 +22,45 @@ use datafusion::sql::unparser::expr_to_sql; use datafusion_sql::unparser::dialect::CustomD

[PR] Minor: Consolidate some integration tests into `core_integration` [datafusion]

2024-05-20 Thread via GitHub
alamb opened a new pull request, #10588: URL: https://github.com/apache/datafusion/pull/10588 ## Which issue does this PR close? ## Rationale for this change In an effort to make it faster to develop and test datafusion, it would be nice if the resources required to run th

Re: [PR] Minor: Consolidate some integration tests into `core_integration` [datafusion]

2024-05-20 Thread via GitHub
alamb commented on code in PR #10588: URL: https://github.com/apache/datafusion/pull/10588#discussion_r1607149443 ## datafusion/core/tests/custom_sources.rs: ## @@ -1,308 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one Review Comment: This was moved to

Re: [PR] fix: Compute murmur3 hash with dictionary input correctly [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #433: URL: https://github.com/apache/datafusion-comet/pull/433#discussion_r1607151015 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1452,17 +1452,55 @@ class CometExpressionSuite extends CometTestBase with A

Re: [PR] feat: API for collecting statistics/index for metadata of a parquet file + tests [datafusion]

2024-05-20 Thread via GitHub
alamb merged PR #10537: URL: https://github.com/apache/datafusion/pull/10537

Re: [I] DataFusion reads Date32 and Date64 parquet statistics in as [datafusion]

2024-05-20 Thread via GitHub
edmondop commented on issue #10587: URL: https://github.com/apache/datafusion/issues/10587#issuecomment-2121024857 @alamb the title here doesn't make much sense, are you saying that the `min` and `max` are not extracted as Date32/Date64?

[I] Pass per-field BigQuery `OPTIONS` values to the LogicalPlan's Arrow Schema [datafusion]

2024-05-20 Thread via GitHub
davisp opened a new issue, #10589: URL: https://github.com/apache/datafusion/issues/10589 ### Is your feature request related to a problem or challenge? I've been reading and learning the TableProvider APIs and have finally gotten around to taking a serious look at implementing suppor

Re: [PR] feat: add hex scalar function [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #449: URL: https://github.com/apache/datafusion-comet/pull/449#discussion_r1607158744 ## spark/src/test/scala/org/apache/comet/CometExpressionSuite.scala: ## @@ -1038,6 +1038,46 @@ class CometExpressionSuite extends CometTestBase with Ad

Re: [PR] docs: add guide to adding a new expression [datafusion-comet]

2024-05-20 Thread via GitHub
kazuyukitanimura commented on code in PR #422: URL: https://github.com/apache/datafusion-comet/pull/422#discussion_r1607164180 ## docs/source/index.rst: ## @@ -58,7 +58,11 @@ as a native runtime to achieve improvement in terms of query efficiency and quer Comet Plugin Overv

[PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-20 Thread via GitHub
davisp opened a new pull request, #10590: URL: https://github.com/apache/datafusion/pull/10590 ## Which issue does this PR close? Closes #10589 ## Rationale for this change Provide per-column key/value options in the `CREATE EXTERNAL TABLE` statement. ## What changes

Re: [PR] Pass BigQuery options to the ArrowSchema [datafusion]

2024-05-20 Thread via GitHub
davisp commented on PR #10590: URL: https://github.com/apache/datafusion/pull/10590#issuecomment-2121051712 Also, for anyone more familiar with datafusion and/or sqlparser, one thing I wasn't 100% on was how to represent the metadata value. For now I've just called format on it, but I have
