Re: [I] date_part returning wrong results due to overflows [datafusion]

2025-02-17 Thread via GitHub
gabotechs commented on issue #14738: URL: https://github.com/apache/datafusion/issues/14738#issuecomment-2664836889 It seems like even without integer overflows, the overall logic is wrong: ```sql SELECT date_part('seconds', interval '1 hour') -- returns 3600, but the result should be 0 ```

Re: [I] Prepared physical plan reusage [datafusion]

2025-02-17 Thread via GitHub
askalt commented on issue #14342: URL: https://github.com/apache/datafusion/issues/14342#issuecomment-2664835375 @alamb Sorry for the delay on my side. I investigated the plan re-use question again, and here is what I noticed. - There is also a problem with nodes state `shared across partiti

Re: [I] date_part returning wrong results due to overflows [datafusion]

2025-02-17 Thread via GitHub
gabotechs commented on issue #14738: URL: https://github.com/apache/datafusion/issues/14738#issuecomment-2664833514 It can also be replicated with intervals: ```sql SELECT date_part('microseconds', interval '1 hour') -- returns -694967296, but the result should be 0 ```
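
The value reported above is consistent with the total number of microseconds in one hour being narrowed to a 32-bit signed integer. The Rust sketch below only reproduces that arithmetic; it is not DataFusion's implementation, and the helper names `total_micros` and `seconds_field` are hypothetical.

```rust
/// Hypothetical helper: total microseconds in `hours` hours, held in 64 bits.
fn total_micros(hours: i64) -> i64 {
    hours * 3_600 * 1_000_000
}

/// Hypothetical helper: the seconds *field* of an interval (0..=59),
/// as opposed to the interval's total length in seconds.
fn seconds_field(total_micros: i64) -> i64 {
    (total_micros / 1_000_000) % 60
}

fn main() {
    let micros = total_micros(1);

    // 3_600_000_000 exceeds i32::MAX (2_147_483_647), so a truncating
    // `as` cast wraps around instead of reporting an error.
    let wrapped = micros as i32;
    println!("{micros} as i32 = {wrapped}");

    // A checked conversion surfaces the problem instead of hiding it.
    println!("fits in i32: {}", i32::try_from(micros).is_ok());

    // Field extraction, assuming the expected semantics are "seconds within the minute".
    println!("seconds field = {}", seconds_field(micros));
}
```

Under these assumptions, a checked conversion or a 64-bit return type would avoid the wraparound, and extracting the field rather than the total would give 0 for a one-hour interval.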

Re: [PR] fix: Remove more cast.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-17 Thread via GitHub
parthchandra commented on PR #1413: URL: https://github.com/apache/datafusion-comet/pull/1413#issuecomment-2664526822 @mbutrovich I opened https://github.com/apache/datafusion-comet/pull/1415 which modifies `cast_supported` to address a couple of failures. Can you verify if I can remove t

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-02-17 Thread via GitHub
xudong963 commented on code in PR #14207: URL: https://github.com/apache/datafusion/pull/14207#discussion_r1959179376 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -3172,3 +3181,78 @@ fn optimize_away_unnecessary_repartition2() -> Result<()> {

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-02-17 Thread via GitHub
xudong963 commented on code in PR #14207: URL: https://github.com/apache/datafusion/pull/14207#discussion_r1959138660 ## datafusion/physical-optimizer/src/enforce_distribution.rs: ## @@ -1020,23 +1033,26 @@ fn remove_dist_changing_operators( /// ``` fn replace_order_preserving

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-02-17 Thread via GitHub
xudong963 commented on code in PR #14207: URL: https://github.com/apache/datafusion/pull/14207#discussion_r1959171412 ## datafusion/core/tests/physical_optimizer/enforce_distribution.rs: ## @@ -3172,3 +3181,78 @@ fn optimize_away_unnecessary_repartition2() -> Result<()> {

Re: [I] ListingTable cannot handle partition evolution [datafusion]

2025-02-17 Thread via GitHub
TheBuilderJR commented on issue #13270: URL: https://github.com/apache/datafusion/issues/13270#issuecomment-2664700070 +1 I'm also blocked on this. It'd be nice if schema evolution could be a first class citizen in datafusion. It's been pretty painful/stressful running into schema evolution

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-17 Thread via GitHub
zhuqi-lucas commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2664713258 Hi @westonpace , I think the problem is that we need to set the partition count and also increase the memory limit for your case: ``` 1. setting the partition

Re: [I] ListingTable cannot handle partition evolution [datafusion]

2025-02-17 Thread via GitHub
zhuqi-lucas commented on issue #13270: URL: https://github.com/apache/datafusion/issues/13270#issuecomment-2664731578 Just noticed we have a solution for partition evolution; see the PR for details. Maybe we need some improvements based on it? https://github.com/apache/datafusion/pull/12683/f

Re: [PR] bug: Fix memory reservation and allocation problems for SortExec [datafusion]

2025-02-17 Thread via GitHub
westonpace commented on PR #14644: URL: https://github.com/apache/datafusion/pull/14644#issuecomment-2664703991 Thanks for the ping and thanks for working on this! This is an important feature for us (for training secondary indices on string columns) so I'm very thankful to see the effort

Re: [I] Support remote shuffle service [datafusion-comet]

2025-02-17 Thread via GitHub
lifulong commented on issue #1241: URL: https://github.com/apache/datafusion-comet/issues/1241#issuecomment-2664679756 Many companies are using Celeborn, looking forward to Comet supporting Celeborn -- This is an automated message from the Apache Git Service. To respond to the message, p

[PR] fix: graceful NULL and type error handling in array functions [datafusion]

2025-02-17 Thread via GitHub
alan910127 opened a new pull request, #14737: URL: https://github.com/apache/datafusion/pull/14737 ## Which issue does this PR close? - Closes #14451. ## Rationale for this change This change ensures that array functions handle `NULL` and incorrect argument t

[I] cannot write dataframe due to incompatibility of Timestamp [datafusion]

2025-02-17 Thread via GitHub
TheBuilderJR opened a new issue, #14736: URL: https://github.com/apache/datafusion/issues/14736 ### Describe the bug The following code ``` df .clone() .write_parquet( file_path.to_str().ok_or(anyhow!("Invalid file path"))?

[PR] feat(examples): boundary analysis example for the case of conjunctions [datafusion]

2025-02-17 Thread via GitHub
clflushopt opened a new pull request, #14735: URL: https://github.com/apache/datafusion/pull/14735 ## Which issue does this PR close? - - Part of #3929 (see the recent followup discussion) and a follow up for #14688 ## Rationale for this change The goal of th

Re: [I] [EPIC] Improved support for nested / structured types (`Struct` , `List`, `ListArray`, and other Composite types) [datafusion]

2025-02-17 Thread via GitHub
TheBuilderJR commented on issue #2326: URL: https://github.com/apache/datafusion/issues/2326#issuecomment-2664575861 @alamb I just tried this on 45.0 and it still doesn't seem supported: ``` Failed to collect data frame results: Plan("Cannot cast file schema field additionalInfo o

Re: [I] Need help running benchmarks and other pyspark jobs. [datafusion-comet]

2025-02-17 Thread via GitHub
Noah-FetchRewards closed issue #1411: Need help running benchmarks and other pyspark jobs. URL: https://github.com/apache/datafusion-comet/issues/1411

Re: [I] Need help running benchmarks and other pyspark jobs. [datafusion-comet]

2025-02-17 Thread via GitHub
Noah-FetchRewards commented on issue #1411: URL: https://github.com/apache/datafusion-comet/issues/1411#issuecomment-2664570522 Thanks @andygrove, your suggestion of disabling Comet features and running on Spark first, then re-enabling them, worked for me. For some reason disabling the docume

Re: [PR] migrate string functions to `invoke_with_args` [datafusion]

2025-02-17 Thread via GitHub
zjregee commented on PR #14722: URL: https://github.com/apache/datafusion/pull/14722#issuecomment-2664558268 I also noticed that #14686 introduced a call to `invoke_batch`, which is about to become outdated, so I changed it here.

Re: [PR] migrate string functions to `invoke_with_args` [datafusion]

2025-02-17 Thread via GitHub
zjregee commented on code in PR #14722: URL: https://github.com/apache/datafusion/pull/14722#discussion_r1959024013 ## datafusion/functions/benches/concat.rs: ## @@ -39,8 +40,15 @@ fn criterion_benchmark(c: &mut Criterion) { let mut group = c.benchmark_group("concat fun

Re: [I] [EPIC] Substrait: Add producer and consumer for physical plans [datafusion]

2025-02-17 Thread via GitHub
niebayes commented on issue #5173: URL: https://github.com/apache/datafusion/issues/5173#issuecomment-2664538226 @andygrove @alamb Could you recommend the best path for implementing these tasks? Since we’re building a distributed query engine based on DataFusion, which requires splitting a

Re: [PR] fix: Substrait serializer clippy error: not calling truncate [datafusion]

2025-02-17 Thread via GitHub
niebayes commented on code in PR #14723: URL: https://github.com/apache/datafusion/pull/14723#discussion_r1959012334 ## datafusion/substrait/src/serializer.rs: ## @@ -27,10 +27,13 @@ use substrait::proto::Plan; use std::fs::OpenOptions; use std::io::{Read, Write}; -#[allow(c
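
For context on why a missing `truncate` matters when a file is opened with `write(true).create(true)`, here is a minimal Rust sketch using only `std::fs::OpenOptions`. It is not the code from the PR; the function names and the temporary file name are hypothetical, and the assumption is that the Clippy warning in question concerns this pattern.

```rust
use std::fs::{self, OpenOptions};
use std::io::Write;
use std::path::Path;

/// Opens (or creates) the file and writes from offset 0 WITHOUT truncating.
/// Bytes from a longer, older file survive past the end of the new data.
fn write_without_truncate(path: &Path, data: &[u8]) -> std::io::Result<()> {
    let mut file = OpenOptions::new().write(true).create(true).open(path)?;
    file.write_all(data)
}

/// Same as above, but discards any existing contents first.
fn write_with_truncate(path: &Path, data: &[u8]) -> std::io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .create(true)
        .truncate(true)
        .open(path)?;
    file.write_all(data)
}

fn main() -> std::io::Result<()> {
    // Hypothetical scratch file in the system temp directory.
    let path = std::env::temp_dir().join(format!("truncate_demo_{}.bin", std::process::id()));

    write_with_truncate(&path, b"hello world")?;
    write_without_truncate(&path, b"bye")?;
    println!("without truncate: {:?}", fs::read_to_string(&path)?);

    write_with_truncate(&path, b"bye")?;
    println!("with truncate:    {:?}", fs::read_to_string(&path)?);

    fs::remove_file(&path)
}
```

For a serializer that rewrites a plan file, stale trailing bytes would likely make the file unreadable, which is why being explicit about truncation is the safer default.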

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-02-17 Thread via GitHub
xudong963 commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2664517439 Yes, the bug still lives.

Re: [PR] fix: Substrait serializer clippy error: not calling truncate [datafusion]

2025-02-17 Thread via GitHub
niebayes commented on code in PR #14723: URL: https://github.com/apache/datafusion/pull/14723#discussion_r1959010437 ## datafusion/substrait/src/serializer.rs: ## @@ -27,10 +27,13 @@ use substrait::proto::Plan; use std::fs::OpenOptions; use std::io::{Read, Write}; -#[allow(c

[PR] fix: fix various unit test failures in native_datafusion and native_iceberg_compat readers [datafusion-comet]

2025-02-17 Thread via GitHub
parthchandra opened a new pull request, #1415: URL: https://github.com/apache/datafusion-comet/pull/1415 major changes : - allow Uint64 to decimal and FixedWidthBinary to Binary conversions in complex readers - do not enable prefetch reads in tests if complex reader is enabled

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-17 Thread via GitHub
comphead commented on PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#issuecomment-2664351633 Thanks everyone

Re: [PR] chore: Migrate Core Functions to invoke_with_args [datafusion]

2025-02-17 Thread via GitHub
niebayes commented on code in PR #14725: URL: https://github.com/apache/datafusion/pull/14725#discussion_r1958982284 ## datafusion/functions/src/core/coalesce.rs: ## @@ -93,11 +95,8 @@ impl ScalarUDFImpl for CoalesceFunc { } /// coalesce evaluates to the first value

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion-comet]

2025-02-17 Thread via GitHub
wForget commented on issue #1412: URL: https://github.com/apache/datafusion-comet/issues/1412#issuecomment-2664447686 > Nice find. Thanks [@wForget](https://github.com/wForget). Do you plan on working on a fix? I will try to fix it.
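
The operands in the issue title are the one case where a 32-bit signed remainder overflows in Rust: `i32::MIN % -1` panics, because the implied division `i32::MIN / -1` is not representable. The sketch below shows the standard-library alternatives; it is an illustration only, not the fix planned for Comet, and the function names are hypothetical.

```rust
/// Hypothetical helper: remainder that reports overflow or division by zero as `None`.
fn checked_remainder(a: i32, b: i32) -> Option<i32> {
    a.checked_rem(b)
}

/// Hypothetical helper: remainder that wraps on overflow, so `i32::MIN % -1`
/// yields 0 (the result Java's `%` gives for the same operands).
/// Division by zero is still reported as `None`.
fn wrapping_remainder(a: i32, b: i32) -> Option<i32> {
    if b == 0 {
        None
    } else {
        Some(a.wrapping_rem(b))
    }
}

fn main() {
    let (a, b) = (i32::MIN, -1);
    println!("checked:  {:?}", checked_remainder(a, b));
    println!("wrapping: {:?}", wrapping_remainder(a, b));
    println!("ordinary: {:?}", checked_remainder(7, -3));
}
```

Which variant is appropriate depends on whether the engine is expected to raise an error or to mirror JVM semantics; that choice is not stated in the thread.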

Re: [PR] feat: add Win-amd64 profile [datafusion-comet]

2025-02-17 Thread via GitHub
wForget commented on PR #1410: URL: https://github.com/apache/datafusion-comet/pull/1410#issuecomment-2664446363 > Thanks @wForget. I have no way to test this, but LGTM. @andygrove Thank you. I have verified this locally, I'll provide screenshots later.

Re: [PR] feat: add Win-amd64 profile [datafusion-comet]

2025-02-17 Thread via GitHub
wForget commented on code in PR #1410: URL: https://github.com/apache/datafusion-comet/pull/1410#discussion_r1958961256 ## pom.xml: ## @@ -477,6 +477,20 @@ under the License. + + Win-amd64 Review Comment: Yes, they have different cpu architectures. M

Re: [PR] Feat: Implement hf:// / "hugging face" integration in datafusion-cli [datafusion]

2025-02-17 Thread via GitHub
github-actions[bot] commented on PR #10792: URL: https://github.com/apache/datafusion/pull/10792#issuecomment-2664435664 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Adding node_id to ExecutionPlanProperties [datafusion]

2025-02-17 Thread via GitHub
github-actions[bot] commented on PR #12186: URL: https://github.com/apache/datafusion/pull/12186#issuecomment-2664435616 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] fix: [branch-0.6] Fix Comet version in Spark diffs [datafusion-comet]

2025-02-17 Thread via GitHub
andygrove merged PR #1405: URL: https://github.com/apache/datafusion-comet/pull/1405

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-17 Thread via GitHub
andygrove merged PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404

Re: [PR] chore: Prepare for 0.7.0 development [datafusion-comet]

2025-02-17 Thread via GitHub
andygrove commented on PR #1404: URL: https://github.com/apache/datafusion-comet/pull/1404#issuecomment-2664433714 Thanks @kazuyukitanimura

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-17 Thread via GitHub
comphead merged PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359

Re: [I] Create optional HDFS feature for Comet [datafusion-comet]

2025-02-17 Thread via GitHub
comphead closed issue #1337: Create optional HDFS feature for Comet URL: https://github.com/apache/datafusion-comet/issues/1337

Re: [PR] fix: Remove more cast.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-17 Thread via GitHub
codecov-commenter commented on PR #1413: URL: https://github.com/apache/datafusion-comet/pull/1413#issuecomment-2664355521 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/1413?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_ca

Re: [PR] Remove CountWildcardRule in Analyzer and move the functionality in ExprPlanner, add `plan_aggregate` and `plan_window` to planner [datafusion]

2025-02-17 Thread via GitHub
jayzhan211 commented on PR #14689: URL: https://github.com/apache/datafusion/pull/14689#issuecomment-2664346172 We need display name / schema name for WindowFunction as well

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-17 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958906563 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -2280,3 +2283,49 @@ async fn test_not_replaced_with_partial_sort_for_unbounded_input() -> Res

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-17 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958892432 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -246,32 +282,50 @@ fn replace_with_partial_sort( /// This function turns plans of the form ///

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-17 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958881333 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -546,19 +594,7 @@ fn remove_bottleneck_in_subplan( }) .collect::>()?;

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-17 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958892575 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -246,32 +282,50 @@ fn replace_with_partial_sort( /// This function turns plans of the form ///

Re: [PR] feat: add Win-amd64 profile [datafusion-comet]

2025-02-17 Thread via GitHub
kazuyukitanimura commented on code in PR #1410: URL: https://github.com/apache/datafusion-comet/pull/1410#discussion_r1958881323 ## pom.xml: ## @@ -477,6 +477,20 @@ under the License. + + Win-amd64 Review Comment: Just curious, we already have `Win-x

Re: [I] Improve release candidate numbering [datafusion-python]

2025-02-17 Thread via GitHub
kevinjqliu commented on issue #1025: URL: https://github.com/apache/datafusion-python/issues/1025#issuecomment-2664277456 Great thing about the above approach is we can reuse the github action for nightly build https://github.com/apache/iceberg-python/blob/main/.github/workflows/nightl

Re: [I] Improve release candidate numbering [datafusion-python]

2025-02-17 Thread via GitHub
kevinjqliu commented on issue #1025: URL: https://github.com/apache/datafusion-python/issues/1025#issuecomment-2664276975 > Currently we have a divergence in a few releases in the datafusion-python version number and their upstream datafusion they rely on. This isn't a big problem but it d

Re: [PR] make DefaultSubstraitProducer public [datafusion]

2025-02-17 Thread via GitHub
xudong963 merged PR #14721: URL: https://github.com/apache/datafusion/pull/14721

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-17 Thread via GitHub
wiedld commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958868756 ## datafusion/core/tests/physical_optimizer/enforce_sorting.rs: ## @@ -2280,3 +2283,49 @@ async fn test_not_replaced_with_partial_sort_for_unbounded_input() -> Res

Re: [I] Proper NULL handling in array functions [datafusion]

2025-02-17 Thread via GitHub
alan910127 commented on issue #14451: URL: https://github.com/apache/datafusion/issues/14451#issuecomment-2664259352 > I don't think it's _that_ important to match DuckDB errors 100%, but I might be the wrong person to ask. Yeah, I agree. But in this case, I feel like their implementa

Re: [PR] make DefaultSubstraitProducer public [datafusion]

2025-02-17 Thread via GitHub
vbarua commented on PR #14721: URL: https://github.com/apache/datafusion/pull/14721#issuecomment-2664247702 Ah, yes it makes sense for this to be public.

Re: [I] pyarrow-19.0.0 breaks unit test [datafusion-python]

2025-02-17 Thread via GitHub
timsaucer closed issue #1023: pyarrow-19.0.0 breaks unit test URL: https://github.com/apache/datafusion-python/issues/1023

[I] Improve release candidate numbering [datafusion-python]

2025-02-17 Thread via GitHub
timsaucer opened a new issue, #1025: URL: https://github.com/apache/datafusion-python/issues/1025 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Currently we have a divergence in a few releases in the `datafusion-python` vers

Re: [PR] feat: add experimental remote HDFS support for native DataFusion reader [datafusion-comet]

2025-02-17 Thread via GitHub
kazuyukitanimura commented on code in PR #1359: URL: https://github.com/apache/datafusion-comet/pull/1359#discussion_r1958826427 ## docs/source/user-guide/datasources.md: ## @@ -35,3 +35,81 @@ converted into Arrow format, allowing native execution to happen after that. Comet

[PR] Support aliases in ConstEvaluator [datafusion]

2025-02-17 Thread via GitHub
joroKr21 opened a new pull request, #14734: URL: https://github.com/apache/datafusion/pull/14734 Not sure why they are not supported. It seems that if we're not careful, some transformations can introduce aliases nested inside other expressions. ## Which issue does this PR close?

Re: [PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
andygrove commented on code in PR #56: URL: https://github.com/apache/datafusion-site/pull/56#discussion_r1958807208 ## content/blog/2025-02-17-datafusion-comet-0.6.0.md: ## @@ -0,0 +1,89 @@ +--- +layout: post +title: Apache DataFusion Comet 0.6.0 Release +date: 2025-02-17 +auth

Re: [PR] minor: remove custom extract_ok! macro [datafusion]

2025-02-17 Thread via GitHub
ctsk commented on PR #14733: URL: https://github.com/apache/datafusion/pull/14733#issuecomment-2664118881 @alamb I see that you've previously attempted something similar in https://github.com/apache/datafusion/pull/7025 - but a larger refactoring is not (no longer?) necessary to drop extrac

Re: [PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
kazuyukitanimura commented on code in PR #56: URL: https://github.com/apache/datafusion-site/pull/56#discussion_r1958757936 ## content/blog/2025-02-17-datafusion-comet-0.6.0.md: ## @@ -0,0 +1,89 @@ +--- +layout: post +title: Apache DataFusion Comet 0.6.0 Release +date: 2025-02-1

[PR] minor: remove custom extract_ok! macro [datafusion]

2025-02-17 Thread via GitHub
ctsk opened a new pull request, #14733: URL: https://github.com/apache/datafusion/pull/14733 When reading through aggregation code, I saw that the `extract_ok!` macro seems to do the same thing as the `?` operator in this context. ```rust /// Extracts a successful Ok(_) or returns

[PR] dependabot: group arrow/parquet minor/patch bumps, remove limit [datafusion]

2025-02-17 Thread via GitHub
mbrobbel opened a new pull request, #14730: URL: https://github.com/apache/datafusion/pull/14730 ## Which issue does this PR close? None. ## Rationale for this change It would be nice if minor/patch bumps of `arrow*` and `parquet` are grouped. In addition major parquet b

Re: [PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
andygrove commented on code in PR #56: URL: https://github.com/apache/datafusion-site/pull/56#discussion_r1958758039 ## content/blog/2025-02-17-datafusion-comet-0.6.0.md: ## @@ -0,0 +1,89 @@ +--- +layout: post +title: Apache DataFusion Comet 0.6.0 Release +date: 2025-02-17 +auth

Re: [PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
comphead commented on code in PR #56: URL: https://github.com/apache/datafusion-site/pull/56#discussion_r1958757161 ## content/blog/2025-02-17-datafusion-comet-0.6.0.md: ## @@ -0,0 +1,89 @@ +--- +layout: post +title: Apache DataFusion Comet 0.6.0 Release +date: 2025-02-17 +autho

Re: [PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
andygrove commented on code in PR #56: URL: https://github.com/apache/datafusion-site/pull/56#discussion_r1958756377 ## content/blog/2025-02-17-datafusion-comet-0.6.0.md: ## @@ -0,0 +1,86 @@ +--- +layout: post +title: Apache DataFusion Comet 0.6.0 Release +date: 2025-02-17 +auth

Re: [PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
robtandy commented on code in PR #56: URL: https://github.com/apache/datafusion-site/pull/56#discussion_r1958746660 ## content/blog/2025-02-17-datafusion-comet-0.6.0.md: ## @@ -0,0 +1,86 @@ +--- +layout: post +title: Apache DataFusion Comet 0.6.0 Release +date: 2025-02-17 +autho

Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-17 Thread via GitHub
timsaucer commented on code in PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#discussion_r1958743672 ## Cargo.toml: ## @@ -17,7 +17,7 @@ [package] name = "datafusion-python" -version = "44.0.0" +version = "45.1.0" Review Comment: Yes, unfortunately

Re: [PR] Set projection before configuring the source [datafusion]

2025-02-17 Thread via GitHub
blaginin commented on PR #14685: URL: https://github.com/apache/datafusion/pull/14685#issuecomment-2664009187 Thanks for checking! I also thought that, but couldn't do because `project` relies on other fields to be set. You can see that for example in `partition_column_projector` test, wher

Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-17 Thread via GitHub
timsaucer commented on code in PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#discussion_r1958742717 ## python/tests/test_dataframe.py: ## @@ -755,13 +755,20 @@ def test_execution_plan(aggregate_df): assert "CsvExec:" in indent ctx = SessionCon

[PR] Site/comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
andygrove opened a new pull request, #56: URL: https://github.com/apache/datafusion-site/pull/56 (no comment)

Re: [PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
andygrove closed pull request #55: Comet 0.6.0 URL: https://github.com/apache/datafusion-site/pull/55

[PR] Comet 0.6.0 [datafusion-site]

2025-02-17 Thread via GitHub
andygrove opened a new pull request, #55: URL: https://github.com/apache/datafusion-site/pull/55 (no comment)

Re: [PR] fix: fetch is missed during EnforceDistribution [datafusion]

2025-02-17 Thread via GitHub
berkaysynnada commented on PR #14207: URL: https://github.com/apache/datafusion/pull/14207#issuecomment-2663989003 @xudong963 the bug still lives?

[PR] fix: Remove more cast.rs logic from parquet_support.rs for experimental native scans [datafusion-comet]

2025-02-17 Thread via GitHub
mbutrovich opened a new pull request, #1413: URL: https://github.com/apache/datafusion-comet/pull/1413 ## Which issue does this PR close? Closes #. ## Rationale for this change See #1387 for more discussion on this topic. ## What changes are include

Re: [PR] Minor: Further Clean-up in Enforce Sorting [datafusion]

2025-02-17 Thread via GitHub
berkaysynnada closed pull request #14732: Minor: Further Clean-up in Enforce Sorting URL: https://github.com/apache/datafusion/pull/14732

[PR] Minor: Further Clean-up in Enforce Sorting [datafusion]

2025-02-17 Thread via GitHub
berkaysynnada opened a new pull request, #14732: URL: https://github.com/apache/datafusion/pull/14732 ## Which issue does this PR close? - Closes #. ## Rationale for this change This is a follow-up on #14650. @xudong963 had shown that rather than a change

Re: [PR] fix: EnforceSorting should not remove a needed coalesces [datafusion]

2025-02-17 Thread via GitHub
xudong963 commented on code in PR #14637: URL: https://github.com/apache/datafusion/pull/14637#discussion_r1958375906 ## datafusion/physical-optimizer/src/enforce_sorting/mod.rs: ## @@ -126,29 +126,65 @@ fn update_sort_ctx_children( /// [`CoalescePartitionsExec`] descendant(s)

Re: [PR] Add union_tag scalar function [datafusion]

2025-02-17 Thread via GitHub
gstvg commented on code in PR #14687: URL: https://github.com/apache/datafusion/pull/14687#discussion_r1958697188 ## datafusion/functions/src/core/union_tag.rs: ## @@ -0,0 +1,223 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license ag

Re: [PR] Dataframe with_column and with_column_renamed performance improvements [datafusion]

2025-02-17 Thread via GitHub
Omega359 commented on PR #14653: URL: https://github.com/apache/datafusion/pull/14653#issuecomment-2663930834 ``` with_column_10 time: [6.1112 ms 6.2616 ms 6.4226 ms] change: [+18.276% +23.739% +29.703%] (p = 0.00 < 0.05) with_column_100 ti

Re: [PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-17 Thread via GitHub
kevinjqliu commented on code in PR #1024: URL: https://github.com/apache/datafusion-python/pull/1024#discussion_r1958697079 ## python/tests/test_dataframe.py: ## @@ -755,13 +755,20 @@ def test_execution_plan(aggregate_df): assert "CsvExec:" in indent ctx = SessionCo

Re: [PR] Add support for Postgres `ALTER TYPE` [datafusion-sqlparser-rs]

2025-02-17 Thread via GitHub
iffyio merged PR #1727: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1727

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-02-17 Thread via GitHub
blaginin commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r1958693053 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1779,37 +1779,82 @@ impl DataFrame { .config_options() .sql_parser .enable_

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-02-17 Thread via GitHub
blaginin commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r1958680507 ## datafusion/core/src/dataframe/mod.rs: ## @@ -1779,37 +1779,82 @@ impl DataFrame { .config_options() .sql_parser .enable_

Re: [PR] fix: workaround to get benchmarks working again [datafusion-ballista]

2025-02-17 Thread via GitHub
milenkovicm commented on PR #1184: URL: https://github.com/apache/datafusion-ballista/pull/1184#issuecomment-2663862812 Unfortunately two places at the moment, I've missed duplicated code

Re: [PR] Reuse last projection layer when renaming columns [datafusion]

2025-02-17 Thread via GitHub
blaginin commented on code in PR #14684: URL: https://github.com/apache/datafusion/pull/14684#discussion_r1958663144 ## datafusion/core/tests/dataframe/mod.rs: ## @@ -1617,9 +1617,19 @@ async fn with_column_renamed() -> Result<()> { // accepts table qualifier .

[PR] Chore: Release datafusion-python 45 [datafusion-python]

2025-02-17 Thread via GitHub
timsaucer opened a new pull request, #1024: URL: https://github.com/apache/datafusion-python/pull/1024 # Which issue does this PR close? This is to release DataFusion Python 45. # Rationale for this change Next release

Re: [PR] docs: Add instruction to build [datafusion]

2025-02-17 Thread via GitHub
comphead commented on code in PR #14694: URL: https://github.com/apache/datafusion/pull/14694#discussion_r1958646842 ## docs/source/contributor-guide/development_environment.md: ## @@ -37,6 +37,22 @@ developing DataFusion in an isolated environment either locally or remote if d

Re: [PR] fix: workaround to get benchmarks working again [datafusion-ballista]

2025-02-17 Thread via GitHub
andygrove commented on PR #1184: URL: https://github.com/apache/datafusion-ballista/pull/1184#issuecomment-2663789239 Yes, disabling string views would be ideal. I wasn't sure where to do that.

Re: [PR] chore(deps): bump arrow from 54.1.0 to 54.2.0 [datafusion]

2025-02-17 Thread via GitHub
findepi commented on PR #14716: URL: https://github.com/apache/datafusion/pull/14716#issuecomment-2663635380 This should include regression test for https://github.com/apache/arrow-rs/issues/7069. I know this works in Arrow and we **currently** just use arrow impl directly, but it does n

Re: [I] Deprecate and eventually remove `ScalarUDF::invoke_batch` [datafusion]

2025-02-17 Thread via GitHub
goldmedal commented on issue #14652: URL: https://github.com/apache/datafusion/issues/14652#issuecomment-2663631166 > Just FYI: As I note in [#14729](https://github.com/apache/datafusion/issues/14729), it seems rust (neither compile nor clippy) doesn't warn about _implementing_ a deprecated

Re: [PR] chore" Migrate Regex function to invoke_with_args [datafusion]

2025-02-17 Thread via GitHub
goldmedal commented on code in PR #14728: URL: https://github.com/apache/datafusion/pull/14728#discussion_r1958504929 ## datafusion/functions/src/regex/regexpcount.rs: ## @@ -655,11 +657,12 @@ mod tests { let v_sv = ScalarValue::Utf8(Some(v.to_string()));

Re: [I] Documentation regarding running/regenerating stability test plans [datafusion-comet]

2025-02-17 Thread via GitHub
andygrove commented on issue #1393: URL: https://github.com/apache/datafusion-comet/issues/1393#issuecomment-2663582946 > [@EmilyMatt](https://github.com/EmilyMatt) This may be because the diff files in the main branch currently have the wrong Comet version. This is fixed in [#1404](https:

Re: [I] Need help running benchmarks and other pyspark jobs. [datafusion-comet]

2025-02-17 Thread via GitHub
andygrove commented on issue #1411: URL: https://github.com/apache/datafusion-comet/issues/1411#issuecomment-2663580349 Hi @Noah-FetchRewards. It looks like you are trying to do two things at once - run TPC-H on Spark on k8s, and run Comet. If your requirement is to run in k8s then I

Re: [I] Overflow happened on: -2147483648 % -1 [datafusion-comet]

2025-02-17 Thread via GitHub
andygrove commented on issue #1412: URL: https://github.com/apache/datafusion-comet/issues/1412#issuecomment-2663563958 Nice find. Thanks @wForget. Do you plan on working on a fix?
