[I] 404 in repo Website: https://datafusion.apache.org/ballista [datafusion-ballista]

2024-07-09 Thread via GitHub
StepfenShawn opened a new issue, #1035: URL: https://github.com/apache/datafusion-ballista/issues/1035 **Describe the bug** A clear and concise description of what the bug is. **To Reproduce** Steps to reproduce the behavior: **Expected behavior** A clear and concise de

Re: [PR] Remove redundant `unalias_nested` calls for creating Filter's [datafusion]

2024-07-09 Thread via GitHub
jonahgao commented on code in PR #11340: URL: https://github.com/apache/datafusion/pull/11340#discussion_r1669909386 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -344,12 +344,10 @@ impl CommonSubexprEliminate { self.try_unary_plan(expr, input, config)?

[I] Support serialization/deserialization for custom physical exprs in proto [datafusion]

2024-07-09 Thread via GitHub
lewiszlw opened a new issue, #11350: URL: https://github.com/apache/datafusion/issues/11350 ### Is your feature request related to a problem or challenge? We support serialization/deserialization for custom physical plans in proto; it would be better to support custom physical exprs as well.

Re: [PR] Fix bug when pushing projection under joins [datafusion]

2024-07-09 Thread via GitHub
berkaysynnada commented on PR #11333: URL: https://github.com/apache/datafusion/pull/11333#issuecomment-2216857903 > @berkaysynnada, could you please help review this? Thanks for investigating those issues, @jonahgao. Great catch and fix. I have searched for other instances of name eq

Re: [I] Release DataFusion `40.0.0` [datafusion]

2024-07-09 Thread via GitHub
samuelcolvin commented on issue #11077: URL: https://github.com/apache/datafusion/issues/11077#issuecomment-2217040619 https://github.com/datafusion-contrib/datafusion-functions-json is now up to date with datafusion 40 on main, arrow and question mark operators work and the new union behav

[PR] Add standardization methods for TableOptions [datafusion]

2024-07-09 Thread via GitHub
emrecakmakyurdu opened a new pull request, #11351: URL: https://github.com/apache/datafusion/pull/11351 ## Which issue does this PR close? ## Rationale for this change ## What changes are included in this PR? ## Are these changes tested?

[PR] Update prost-build requirement from =0.12.6 to =0.13.0 [datafusion]

2024-07-09 Thread via GitHub
dependabot[bot] opened a new pull request, #11352: URL: https://github.com/apache/datafusion/pull/11352 Updates the requirements on [prost-build](https://github.com/tokio-rs/prost) to permit the latest version. Changelog Sourced from https://github.com/tokio-rs/prost/blob/master/CH

[PR] Update tonic requirement from 0.11 to 0.12 [datafusion]

2024-07-09 Thread via GitHub
dependabot[bot] opened a new pull request, #11353: URL: https://github.com/apache/datafusion/pull/11353 Updates the requirements on [tonic](https://github.com/hyperium/tonic) to permit the latest version. Release notes Sourced from https://github.com/hyperium/tonic/releases

[PR] Update prost requirement from 0.12.0 to 0.13.0 [datafusion]

2024-07-09 Thread via GitHub
dependabot[bot] opened a new pull request, #11354: URL: https://github.com/apache/datafusion/pull/11354 Updates the requirements on [prost](https://github.com/tokio-rs/prost) to permit the latest version. Changelog Sourced from https://github.com/tokio-rs/prost/blob/master/CHANGELO

[PR] Update prost-derive requirement from 0.12 to 0.13 [datafusion]

2024-07-09 Thread via GitHub
dependabot[bot] opened a new pull request, #11355: URL: https://github.com/apache/datafusion/pull/11355 Updates the requirements on [prost-derive](https://github.com/tokio-rs/prost) to permit the latest version. Release notes Sourced from https://github.com/tokio-rs/prost/releases

Re: [PR] Implement prettier SQL unparsing (more human readable) [datafusion]

2024-07-09 Thread via GitHub
MohamedAbdeen21 commented on code in PR #11186: URL: https://github.com/apache/datafusion/pull/11186#discussion_r1670090368 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -314,3 +310,78 @@ fn test_table_references_in_plan_to_sql() { "SELECT \"table\".id, \"table\".

Re: [PR] Add standardization methods for TableOptions [datafusion]

2024-07-09 Thread via GitHub
emrecakmakyurdu closed pull request #11351: Add standardization methods for TableOptions URL: https://github.com/apache/datafusion/pull/11351 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the spec

Re: [PR] Fix bug when pushing projection under joins [datafusion]

2024-07-09 Thread via GitHub
jonahgao commented on PR #11333: URL: https://github.com/apache/datafusion/pull/11333#issuecomment-2217171773 > Would you prefer updating this part as well? @berkaysynnada Updated as suggested in https://github.com/apache/datafusion/pull/11333/commits/6e8eec76017e73827b25cd90221b458c

Re: [PR] Fix bug when pushing projection under joins [datafusion]

2024-07-09 Thread via GitHub
berkaysynnada commented on PR #11333: URL: https://github.com/apache/datafusion/pull/11333#issuecomment-2217211550 > > Would you prefer updating this part as well? > > @berkaysynnada Updated as suggested in [6e8eec7](https://github.com/apache/datafusion/commit/6e8eec76017e73827b25cd9

Re: [I] Write DataFusion paper for (SIGMOD / VLDB / ICDE) [datafusion]

2024-07-09 Thread via GitHub
alamb commented on issue #6782: URL: https://github.com/apache/datafusion/issues/6782#issuecomment-2217256777 Final ACM link: https://dl.acm.org/doi/10.1145/3626246.3653368

[I] Add documentation and annotations for all user facing python classes and functions [datafusion-python]

2024-07-09 Thread via GitHub
timsaucer opened a new issue, #749: URL: https://github.com/apache/datafusion-python/issues/749 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Per discussion in the datafusion python discord channel, some users feel that the

[PR] Minor: correct filename for 2024-05-07 announcement [datafusion-site]

2024-07-09 Thread via GitHub
alamb opened a new pull request, #4: URL: https://github.com/apache/datafusion-site/pull/4 The filename says "2025" but the post is from "2024"

[PR] Minor: Add link to blog to main DataFusion website [datafusion]

2024-07-09 Thread via GitHub
alamb opened a new pull request, #11356: URL: https://github.com/apache/datafusion/pull/11356 ## Which issue does this PR close? Part of #9602 ## Rationale for this change If we are going to have a blog we should make it easier for people to find it ## What change

[PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
timsaucer opened a new pull request, #750: URL: https://github.com/apache/datafusion-python/pull/750 # Which issue does this PR close? Closes #749. # Rationale for this change As described in the [issue linked above](https://github.com/apache/datafusion-python/issues/7

[PR] Minor: Add instructions for building with docker [datafusion-site]

2024-07-09 Thread via GitHub
alamb opened a new pull request, #5: URL: https://github.com/apache/datafusion-site/pull/5 part of https://github.com/apache/datafusion/issues/9602 I am preparing to write a blog post for the imminent release of DataFusion 40 and I need to know how to see such posts locally

Re: [PR] Minor: Add instructions for building with docker [datafusion-site]

2024-07-09 Thread via GitHub
alamb commented on code in PR #5: URL: https://github.com/apache/datafusion-site/pull/5#discussion_r1670427526 ## README.md: ## @@ -30,13 +30,28 @@ Should be `ruby 3.1.3p185 (2022-11-24 revision 1a6b16756e) [arm64-darwin23]` or gem install jekyll bundler ``` -## Preview sit

[PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-09 Thread via GitHub
alamb opened a new pull request, #6: URL: https://github.com/apache/datafusion-site/pull/6 Closes https://github.com/apache/datafusion/issues/9602

Re: [PR] feat: Upgrade to DataFusion 40.0.0-rc1 [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on PR #644: URL: https://github.com/apache/datafusion-comet/pull/644#issuecomment-2217564012 > > I recall that count, max, min are supported as window function recently. Maybe there are some changes in DataFusion. > > maybe this [PR](https://github.com/apache/dataf

Re: [I] Blog post with DataFusion Jan - June 2024 [datafusion]

2024-07-09 Thread via GitHub
alamb commented on issue #9602: URL: https://github.com/apache/datafusion/issues/9602#issuecomment-2217583939 Started gathering ideas https://github.com/apache/datafusion-site/pull/6

[PR] Improve `CommonSubexprEliminate` rule with surely and conditionally evaluated stats [datafusion]

2024-07-09 Thread via GitHub
peter-toth opened a new pull request, #11357: URL: https://github.com/apache/datafusion/pull/11357 ## Which issue does this PR close? Part of https://github.com/apache/datafusion/issues/11194. ## Rationale for this change Currently `CommonSubexprEliminate` doesn't recurse

Re: [PR] Bump rexml from 3.2.6 to 3.2.8 [datafusion-site]

2024-07-09 Thread via GitHub
alamb merged PR #2: URL: https://github.com/apache/datafusion-site/pull/2

Re: [PR] Bump rexml from 3.2.6 to 3.2.8 [datafusion-site]

2024-07-09 Thread via GitHub
alamb commented on PR #2: URL: https://github.com/apache/datafusion-site/pull/2#issuecomment-2217689428 I tested this locally using the docker setup in https://github.com/apache/datafusion-site/pull/5 and it seems to have worked great

Re: [PR] Support `NULL` literals in where clause [datafusion]

2024-07-09 Thread via GitHub
alamb merged PR #11266: URL: https://github.com/apache/datafusion/pull/11266

Re: [I] `where` clause incorrectly reject `NULL` literal (by SQLancer-NoREC) [datafusion]

2024-07-09 Thread via GitHub
alamb closed issue #11248: `where` clause incorrectly reject `NULL` literal (by SQLancer-NoREC) URL: https://github.com/apache/datafusion/issues/11248

Re: [PR] Support `NULL` literals in where clause [datafusion]

2024-07-09 Thread via GitHub
alamb commented on PR #11266: URL: https://github.com/apache/datafusion/pull/11266#issuecomment-2217711150 Thanks again @xinlifoobar

Re: [PR] Minor: Add link to blog to main DataFusion website [datafusion]

2024-07-09 Thread via GitHub
alamb commented on PR #11356: URL: https://github.com/apache/datafusion/pull/11356#issuecomment-2217711806 🚀

Re: [PR] Minor: Add link to blog to main DataFusion website [datafusion]

2024-07-09 Thread via GitHub
alamb merged PR #11356: URL: https://github.com/apache/datafusion/pull/11356

Re: [I] Separate Spark-compatibility expressions into a library [datafusion-comet]

2024-07-09 Thread via GitHub
advancedxy commented on issue #630: URL: https://github.com/apache/datafusion-comet/issues/630#issuecomment-2217714745 Thanks for raising this issue and Andy's quick response on this. I was thinking about adding a similar crate to DataFusion as well, which could be expanded to suppo

Re: [PR] feat: add UDF to_local_time() [datafusion]

2024-07-09 Thread via GitHub
alamb commented on code in PR #11347: URL: https://github.com/apache/datafusion/pull/11347#discussion_r1670518147 ## datafusion/functions/src/datetime/to_local_time.rs: ## @@ -0,0 +1,601 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor li

[I] Add `to_local_time` function for converting timestamps with timezones to timestamps without timezones [datafusion]

2024-07-09 Thread via GitHub
alamb opened a new issue, #11358: URL: https://github.com/apache/datafusion/issues/11358 ### Is your feature request related to a problem or challenge? The actual need is implementing `date_bin` that correctly bins dates on timezone-correct timestamps (e.g. not UTC) as described by @A

Re: [PR] Improve `CommonSubexprEliminate` rule with surely and conditionally evaluated stats [datafusion]

2024-07-09 Thread via GitHub
peter-toth commented on PR #11357: URL: https://github.com/apache/datafusion/pull/11357#issuecomment-2217754075 cc @alamb, @haohuaijin. This PR fixes https://github.com/apache/datafusion/pull/11197#discussion_r1662618797 / https://github.com/apache/datafusion/pull/11265#discussion_r16669881

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-09 Thread via GitHub
phillipleblanc commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1670547313 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,248 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:0

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-09 Thread via GitHub
phillipleblanc commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1670551071 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,248 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:0

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1670551445 ## native/core/src/execution/datafusion/expressions/cast.rs: ## @@ -501,18 +504,24 @@ impl Cast { let array = array_with_timezone(array, self.timezone

Re: [PR] Blog post for release 40.0.0 [datafusion-site]

2024-07-09 Thread via GitHub
phillipleblanc commented on code in PR #6: URL: https://github.com/apache/datafusion-site/pull/6#discussion_r1670551846 ## _posts/2024-07-09-datafusion-40.0.0.md: ## @@ -0,0 +1,248 @@ +--- +layout: post +title: "Apache Arrow DataFusion 40.0.0 Released" +date: "2024-07-09 00:00:0

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1670554215 ## native/core/src/execution/datafusion/expressions/cast.rs: ## @@ -501,18 +504,24 @@ impl Cast { let array = array_with_timezone(array, self.timezone

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1670557442 ## native/core/src/execution/datafusion/expressions/cast.rs: ## @@ -91,22 +92,24 @@ macro_rules! cast_utf8_to_int { result }}; } - macro_rules!

Re: [PR] feat: Create new `datafusion-comet-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
advancedxy commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670518489 ## native/spark-expr/src/abs.rs: ## @@ -68,17 +71,15 @@ impl ScalarUDFImpl for CometAbsFunc { fn invoke(&self, args: &[ColumnarValue]) -> Result {

Re: [PR] fix: Optimize some functions to rewrite dictionary-encoded strings [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on code in PR #627: URL: https://github.com/apache/datafusion-comet/pull/627#discussion_r1670554215 ## native/core/src/execution/datafusion/expressions/cast.rs: ## @@ -501,18 +504,24 @@ impl Cast { let array = array_with_timezone(array, self.timezone

Re: [PR] feat: Create new `datafusion-comet-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670584739 ## native/spark-expr/Cargo.toml: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] feat: Create new `datafusion-comet-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670588372 ## native/spark-expr/src/abs.rs: ## @@ -68,17 +71,15 @@ impl ScalarUDFImpl for CometAbsFunc { fn invoke(&self, args: &[ColumnarValue]) -> Result {

Re: [PR] feat: Upgrade to DataFusion 40.0.0-rc1 [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on PR #644: URL: https://github.com/apache/datafusion-comet/pull/644#issuecomment-2217850578 @viirya @huaxingao I made a change so that we fall back to Spark for count in window aggregate for now, and I will file a follow on issue. This PR is ready for review now.

[I] Review use of logical expressions in physical AggregateFunctionExpr [datafusion]

2024-07-09 Thread via GitHub
andygrove opened a new issue, #11359: URL: https://github.com/apache/datafusion/issues/11359 ### Is your feature request related to a problem or challenge? DataFusion 40.0.0 added a new `logical_args: Vec<Expr>` field to

[I] Improve `SingleDistinctToGroupBy` to get the same plan as the `group by` query [datafusion]

2024-07-09 Thread via GitHub
jayzhan211 opened a new issue, #11360: URL: https://github.com/apache/datafusion/issues/11360 ### Is your feature request related to a problem or challenge? While working on #11299, I ran into the issue that the `single distinct plan` is different from the `group by` plan. https://github.co

Re: [I] use StringViewArray when reading String columns from Parquet [datafusion]

2024-07-09 Thread via GitHub
XiangpengHao commented on issue #10921: URL: https://github.com/apache/datafusion/issues/10921#issuecomment-2217882361 Want to share some thoughts here on when to use `StringViewArray` and when not. We only consider the cost of loading data from parquet to narrow the scope. To

Re: [PR] chore: Make shuffle compression level configurable [datafusion-comet]

2024-07-09 Thread via GitHub
advancedxy commented on code in PR #632: URL: https://github.com/apache/datafusion-comet/pull/632#discussion_r1670631896 ## core/src/execution/datafusion/shuffle_writer.rs: ## @@ -80,6 +79,8 @@ pub struct ShuffleWriterExec { /// Metrics metrics: ExecutionPlanMetricsSet

Re: [PR] feat: Create new `datafusion-spark-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
advancedxy commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670637816 ## native/spark-expr/Cargo.toml: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] feat: ANSI support for Add [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on PR #616: URL: https://github.com/apache/datafusion-comet/pull/616#issuecomment-2217922631 Thanks for the contribution @planga82. I am reviewing this today.

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
andygrove commented on PR #750: URL: https://github.com/apache/datafusion-python/pull/750#issuecomment-2217977268 @jdye64 @charlesbluca fyi

[I] Re-implement support for count in window aggregate [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove opened a new issue, #645: URL: https://github.com/apache/datafusion-comet/issues/645 ### What is the problem the feature request solves? When upgrading to DataFusion 40.0.0-rc1 in https://github.com/apache/datafusion-comet/pull/644 I had to disable support for count in wind

Re: [PR] feat: Create new `datafusion-spark-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670743013 ## native/spark-expr/Cargo.toml: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] feat: Create new `datafusion-spark-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670747580 ## native/spark-expr/Cargo.toml: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [I] Re-implement support for count in window aggregate [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on issue #645: URL: https://github.com/apache/datafusion-comet/issues/645#issuecomment-2218036771 cc @huaxingao

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670758586 ## python/datafusion/udf.py: ## @@ -0,0 +1,62 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670758586 ## python/datafusion/udf.py: ## @@ -0,0 +1,62 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] feat: Create new `datafusion-spark-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
andygrove commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670767325 ## native/spark-expr/Cargo.toml: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670771945 ## python/datafusion/record_batch.py: ## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreeme

Re: [PR] Support SortMerge spilling [datafusion]

2024-07-09 Thread via GitHub
comphead commented on PR #11218: URL: https://github.com/apache/datafusion/pull/11218#issuecomment-2218067577 All initial tests passed; I'm planning to add more tests related to result correctness in a separate PR

Re: [PR] Support SortMerge spilling [datafusion]

2024-07-09 Thread via GitHub
comphead commented on code in PR #11218: URL: https://github.com/apache/datafusion/pull/11218#discussion_r1670774467 ## datafusion/physical-plan/src/sorts/sort.rs: ## @@ -333,10 +333,7 @@ impl ExternalSorter { for spill in self.spills.drain(..) {

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670778616 ## python/datafusion/record_batch.py: ## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreeme

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670779264 ## python/datafusion/expr.py: ## @@ -15,9 +15,256 @@ # specific language governing permissions and limitations # under the License. +from __future__ impor

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670778616 ## python/datafusion/record_batch.py: ## @@ -0,0 +1,52 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreeme

Re: [PR] feat: Create new `datafusion-spark-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
huaxingao commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670785329 ## native/spark-expr/Cargo.toml: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670823051 ## python/datafusion/udf.py: ## @@ -0,0 +1,62 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670825912 ## python/datafusion/expr.py: ## @@ -15,9 +15,256 @@ # specific language governing permissions and limitations # under the License. +from __future__ impor

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670834793 ## python/datafusion/context.py: ## @@ -0,0 +1,1167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

Re: [I] Potential memory issue when using COPY with PARTITIONED BY [datafusion]

2024-07-09 Thread via GitHub
alamb commented on issue #11042: URL: https://github.com/apache/datafusion/issues/11042#issuecomment-2218133961 Possibly related to https://github.com/apache/datafusion/issues/11344, where the memory tracking for the parquet writing could be improved

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670834793 ## python/datafusion/context.py: ## @@ -0,0 +1,1167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements

Re: [I] Track memory used by parquet writers. [datafusion]

2024-07-09 Thread via GitHub
alamb commented on issue #11344: URL: https://github.com/apache/datafusion/issues/11344#issuecomment-2218134406 Someone else also saw similar issues in https://github.com/apache/datafusion/issues/11042

Re: [I] Potential memory issue when using COPY with PARTITIONED BY [datafusion]

2024-07-09 Thread via GitHub
hveiga commented on issue #11042: URL: https://github.com/apache/datafusion/issues/11042#issuecomment-2218143802 > BTW something we have seen in InfluxDB, especially for very compressible data, was that the arrow writer was consuming substantial memory. > > Something that might be wor

Re: [I] Re-implement support for count in window aggregate [datafusion-comet]

2024-07-09 Thread via GitHub
huaxingao commented on issue #645: URL: https://github.com/apache/datafusion-comet/issues/645#issuecomment-2218146640 Looks like `min` and `max` will be removed from the built-in functions as well. I will re-implement the window aggregate support using `AggregateUDF`

Re: [PR] feat: ANSI support for Add [datafusion-comet]

2024-07-09 Thread via GitHub
dharanad commented on code in PR #616: URL: https://github.com/apache/datafusion-comet/pull/616#discussion_r1670855040 ## core/src/execution/datafusion/expressions/binary.rs: ## @@ -0,0 +1,202 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [PR] feat: add UDF to_local_time() [datafusion]

2024-07-09 Thread via GitHub
Abdullahsab3 commented on code in PR #11347: URL: https://github.com/apache/datafusion/pull/11347#discussion_r1670852619 ## datafusion/functions/src/datetime/to_local_time.rs: ## @@ -0,0 +1,610 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [I] Implement initial version of to_json [datafusion-comet]

2024-07-09 Thread via GitHub
dharanad commented on issue #631: URL: https://github.com/apache/datafusion-comet/issues/631#issuecomment-2218221471 @andygrove QQ: Upon checking, I found out that DataFusion doesn't currently support a built-in `to_json` function. While implementing it directly in Comet is an option, there

Re: [PR] feat: add UDF to_local_time() [datafusion]

2024-07-09 Thread via GitHub
Abdullahsab3 commented on code in PR #11347: URL: https://github.com/apache/datafusion/pull/11347#discussion_r1670830134 ## datafusion/functions/src/datetime/to_local_time.rs: ## @@ -0,0 +1,601 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] feat: add UDF to_local_time() [datafusion]

2024-07-09 Thread via GitHub
Abdullahsab3 commented on code in PR #11347: URL: https://github.com/apache/datafusion/pull/11347#discussion_r1670830134 ## datafusion/functions/src/datetime/to_local_time.rs: ## @@ -0,0 +1,601 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
emgeee commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670891866 ## python/datafusion/expr.py: ## @@ -15,9 +15,256 @@ # specific language governing permissions and limitations # under the License. +from __future__ import a

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
max-muoto commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670892929 ## python/datafusion/expr.py: ## @@ -15,9 +15,256 @@ # specific language governing permissions and limitations # under the License. +from __future__ impor

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
timsaucer commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1670896802 ## python/datafusion/expr.py: ## @@ -15,9 +15,256 @@ # specific language governing permissions and limitations # under the License. +from __future__ impor

[PR] Implement ScalarFunction `MAKE_MAP` and `MAP` [datafusion]

2024-07-09 Thread via GitHub
goldmedal opened a new pull request, #11361: URL: https://github.com/apache/datafusion/pull/11361 ## Which issue does this PR close? Closes #11268. ## Rationale for this change ## What changes are included in this PR? ## Are these changes te

[I] Regression in eliminating monotonic sorts [datafusion]

2024-07-09 Thread via GitHub
suremarc opened a new issue, #11362: URL: https://github.com/apache/datafusion/issues/11362 ### Describe the bug After PR #10434, there is a slight regression in eliminating certain sorts during physical planning. In particular, consider the following case: ``` SortExec: [c ASC]

[PR] fix: Fix eq properties regression from #10434 [datafusion]

2024-07-09 Thread via GitHub
suremarc opened a new pull request, #11363: URL: https://github.com/apache/datafusion/pull/11363 ## Which issue does this PR close? Closes #11362. ## Rationale for this change This PR fixes a regression in #10434. The previous PR implements "discovery" of

Re: [I] Potential memory issue when using COPY with PARTITIONED BY [datafusion]

2024-07-09 Thread via GitHub
alamb commented on issue #11042: URL: https://github.com/apache/datafusion/issues/11042#issuecomment-2218330107 > In general I have been having a hard time trying to debug this since there is no heaptrack for Mac and the build process for heaptrack_gui is also broken at the moment as I cann

Re: [I] Integrate with the substrait integration test [datafusion]

2024-07-09 Thread via GitHub
richtia commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2218334234 > @richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/main/s

Re: [PR] feat: Use unified allocator for execution iterators [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on PR #613: URL: https://github.com/apache/datafusion-comet/pull/613#issuecomment-2218343486 This only got failures on `CometTPCDSQuerySuite` with sort merge join configs (broadcast and hash join configs are passed). But I don't see any details about the failure in CI

Re: [PR] feat: Upgrade to DataFusion 40.0.0-rc1 [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on code in PR #644: URL: https://github.com/apache/datafusion-comet/pull/644#discussion_r1670984028 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -208,7 +208,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with Co

Re: [PR] feat: Upgrade to DataFusion 40.0.0-rc1 [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on code in PR #644: URL: https://github.com/apache/datafusion-comet/pull/644#discussion_r1670985170 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -208,7 +208,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with Co

Re: [PR] feat: Upgrade to DataFusion 40.0.0-rc1 [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on code in PR #644: URL: https://github.com/apache/datafusion-comet/pull/644#discussion_r1670985545 ## spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala: ## @@ -208,7 +208,7 @@ object QueryPlanSerde extends Logging with ShimQueryPlanSerde with Co

Re: [PR] feat: Create new `datafusion-spark-expr` crate containing Spark-compatible DataFusion expressions [datafusion-comet]

2024-07-09 Thread via GitHub
viirya commented on code in PR #638: URL: https://github.com/apache/datafusion-comet/pull/638#discussion_r1670998962 ## native/spark-expr/Cargo.toml: ## @@ -0,0 +1,38 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See

Re: [PR] Support SortMerge spilling [datafusion]

2024-07-09 Thread via GitHub
viirya commented on PR #11218: URL: https://github.com/apache/datafusion/pull/11218#issuecomment-2218402570 I will review this in the next few days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to th

Re: [PR] Implement ScalarFunction `MAKE_MAP` and `MAP` [datafusion]

2024-07-09 Thread via GitHub
goldmedal commented on PR #11361: URL: https://github.com/apache/datafusion/pull/11361#issuecomment-2218411989 I noticed the map and struct of Arrow allow duplicate keys but I think this behavior is wrong. ``` > select named_struct('abc', 1, 'abc', 2); +

[I] Deterministic IDs for ExecutionPlan [datafusion]

2024-07-09 Thread via GitHub
ameyc opened a new issue, #11364: URL: https://github.com/apache/datafusion/issues/11364 ### Is your feature request related to a problem or challenge? Currently, execution plans do not have an ID associated with them, which makes it hard to compare metrics across runs. Additionally we wo

[I] StateBackend in DataFusion's RuntimeEnv [datafusion]

2024-07-09 Thread via GitHub
ameyc opened a new issue, #11365: URL: https://github.com/apache/datafusion/issues/11365 ### Is your feature request related to a problem or challenge? Currently DataFusion operators communicate via a narrow API i.e. forwarding `SendableRecordBatchStreams`. In some instances, in parti

Re: [PR] Python wrapper classes for all user interfaces [datafusion-python]

2024-07-09 Thread via GitHub
slyons commented on code in PR #750: URL: https://github.com/apache/datafusion-python/pull/750#discussion_r1671038077 ## python/datafusion/context.py: ## @@ -0,0 +1,1167 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements.

Re: [I] Integrate with the substrait integration test [datafusion]

2024-07-09 Thread via GitHub
Lordworms commented on issue #10710: URL: https://github.com/apache/datafusion/issues/10710#issuecomment-2218479864 > > @richtia hi! when I was trying to do plan 7,8 and 9, I find the substrait json file is empty, any reason for this? https://github.com/substrait-io/consumer-testing/tree/ma
