Re: [I] Add tests for Scalar and Inverval values for UnaryMinus [datafusion-comet]

2024-06-05 Thread via GitHub
vaibhawvipul commented on issue #508: URL: https://github.com/apache/datafusion-comet/issues/508#issuecomment-2151535000 I will work on this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

Re: [PR] chore: Remove 3.4.2.diff [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura commented on PR #528: URL: https://github.com/apache/datafusion-comet/pull/528#issuecomment-2151533724 Thank you merged @viirya -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] chore: Remove 3.4.2.diff [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura merged PR #528: URL: https://github.com/apache/datafusion-comet/pull/528 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubsc

[PR] Fix `ScalarUDFImpl::propagate_constraints` doc [datafusion]

2024-06-05 Thread via GitHub
lewiszlw opened a new pull request, #10810: URL: https://github.com/apache/datafusion/pull/10810 ## Which issue does this PR close? Closes #. ## Rationale for this change I'm reading some UDF code, and found the example in `ScalarUDFImpl::propagate_constr

Re: [PR] fix: Input batch to ShuffleRepartitioner.insert_batch should not be larger than configured batch size [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on PR #523: URL: https://github.com/apache/datafusion-comet/pull/523#issuecomment-2151485524 cc @huaxingao @kazuyukitanimura -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[PR] chore: Remove 3.4.2.diff [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura opened a new pull request, #528: URL: https://github.com/apache/datafusion-comet/pull/528 ## Which issue does this PR close? Closes #. ## Rationale for this change ## What changes are included in this PR? ## How are these ch

Re: [PR] chore: Add UnboundColumn to carry datatype for unbound reference [datafusion-comet]

2024-06-05 Thread via GitHub
viirya merged PR #518: URL: https://github.com/apache/datafusion-comet/pull/518 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] chore: Add UnboundColumn to carry datatype for unbound reference [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on PR #518: URL: https://github.com/apache/datafusion-comet/pull/518#issuecomment-2151361526 Merged. Thanks @huaxingao @advancedxy -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [I] Incorrect return type on aggregate functions implemented by AggregateUDF when upgrading to latest DataFusion [datafusion-comet]

2024-06-05 Thread via GitHub
viirya closed issue #511: Incorrect return type on aggregate functions implemented by AggregateUDF when upgrading to latest DataFusion URL: https://github.com/apache/datafusion-comet/issues/511 -- This is an automated message from the Apache Git Service. To respond to the message, please log

Re: [PR] Add `ParquetAccessPlan`, unify RowGroup selection and PagePruning selection [datafusion]

2024-06-05 Thread via GitHub
Ted-Jiang commented on code in PR #10738: URL: https://github.com/apache/datafusion/pull/10738#discussion_r1628732681 ## datafusion/core/src/datasource/physical_plan/parquet/page_filter.rs: ## @@ -236,6 +225,24 @@ impl PagePruningPredicate { } } +/// returns the number o

Re: [PR] Add `ParquetAccessPlan`, unify RowGroup selection and PagePruning selection [datafusion]

2024-06-05 Thread via GitHub
Ted-Jiang commented on PR #10738: URL: https://github.com/apache/datafusion/pull/10738#issuecomment-2151312356 Thanks for pinging me; I will review this carefully later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] fix: Input batch to ShuffleRepartitioner.insert_batch should not be larger than configured batch size [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on code in PR #523: URL: https://github.com/apache/datafusion-comet/pull/523#discussion_r1628700174 ## core/src/execution/datafusion/shuffle_writer.rs: ## @@ -947,12 +948,18 @@ async fn external_shuffle( partitioning, metrics, context.

Re: [PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya closed pull request #527: build: Update plan stability for Spark 4.0 URL: https://github.com/apache/datafusion-comet/pull/527 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

[PR] update deps [datafusion-python]

2024-06-05 Thread via GitHub
Michael-J-Ward opened a new pull request, #723: URL: https://github.com/apache/datafusion-python/pull/723 I'd like to clear out the `dependabot` pull requests. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

Re: [PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on PR #527: URL: https://github.com/apache/datafusion-comet/pull/527#issuecomment-2151280110 I guess that with #526, we don't need to update the plan stability. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub

Re: [PR] chore: Upgrade spark to 4.0.0-preview1 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya merged PR #526: URL: https://github.com/apache/datafusion-comet/pull/526 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Upgrade spark4.0 to 4.0.0-preview1 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya closed issue #525: Upgrade spark4.0 to 4.0.0-preview1 URL: https://github.com/apache/datafusion-comet/issues/525 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on code in PR #527: URL: https://github.com/apache/datafusion-comet/pull/527#discussion_r1628673092 ## spark/src/test/resources/tpcds-plan-stability/approved-plans-v1_4-spark4_0/q1/explain.txt: ## @@ -31,9 +31,9 @@ TakeOrderedAndProject (40) : :

Re: [PR] feat: Use file cache to list partitions if available [datafusion]

2024-06-05 Thread via GitHub
github-actions[bot] commented on PR #9655: URL: https://github.com/apache/datafusion/pull/9655#issuecomment-2151257622 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Add GroupValuesFullyOrdered mode to GroupValues trait for aggregate grouping. [datafusion]

2024-06-05 Thread via GitHub
github-actions[bot] closed pull request #9662: Add GroupValuesFullyOrdered mode to GroupValues trait for aggregate grouping. URL: https://github.com/apache/datafusion/pull/9662 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

Re: [PR] fix: move array_except in SetOp and support Null columnar in `array_except`, `array_union` and `array_intersect` [datafusion]

2024-06-05 Thread via GitHub
github-actions[bot] closed pull request #9710: fix: move array_except in SetOp and support Null columnar in `array_except`, `array_union` and `array_intersect` URL: https://github.com/apache/datafusion/pull/9710 -- This is an automated message from the Apache Git Service. To respond to the me

Re: [I] `select array_concat([])` panicked [datafusion]

2024-06-05 Thread via GitHub
jonahgao commented on issue #10200: URL: https://github.com/apache/datafusion/issues/10200#issuecomment-2151251654 Fixed by #10790 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specif

Re: [PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-05 Thread via GitHub
jonahgao commented on code in PR #10790: URL: https://github.com/apache/datafusion/pull/10790#discussion_r1628633358 ## datafusion/sqllogictest/test_files/array.slt: ## @@ -2622,6 +2629,12 @@ drop table large_array_repeat_table; ## array_concat (aliases: `array_cat`, `list_co

Re: [I] `select array_concat([])` panicked [datafusion]

2024-06-05 Thread via GitHub
jonahgao closed issue #10200: `select array_concat([])` panicked URL: https://github.com/apache/datafusion/issues/10200 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsub

Re: [PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
codecov-commenter commented on PR #527: URL: https://github.com/apache/datafusion-comet/pull/527#issuecomment-2151220549 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/527?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] chore: Upgrade spark to 4.0.0-preview1 [datafusion-comet]

2024-06-05 Thread via GitHub
codecov-commenter commented on PR #526: URL: https://github.com/apache/datafusion-comet/pull/526#issuecomment-2151219521 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/526?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura commented on code in PR #527: URL: https://github.com/apache/datafusion-comet/pull/527#discussion_r1628603328 ## spark/src/test/resources/tpcds-plan-stability/approved-plans-v1_4-spark4_0/q1/explain.txt: ## @@ -31,9 +31,9 @@ TakeOrderedAndProject (40) :

Re: [PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on PR #527: URL: https://github.com/apache/datafusion-comet/pull/527#issuecomment-2151168828 cc @kazuyukitanimura @andygrove @huaxingao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on PR #527: URL: https://github.com/apache/datafusion-comet/pull/527#issuecomment-2151168689 The plan stability results for Spark 4.0 have changed, which causes the current CI pipeline failures. -- This is an automated message from the Apache Git Service. To respond to the message

[PR] build: Update plan stability for Spark 4.0 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya opened a new pull request, #527: URL: https://github.com/apache/datafusion-comet/pull/527 ## Which issue does this PR close? Closes #. ## Rationale for this change Fixing broken CI pipelines. ## What changes are included in this PR?

Re: [PR] chore: Upgrade spark to 4.0.0-preview1 [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy commented on PR #526: URL: https://github.com/apache/datafusion-comet/pull/526#issuecomment-2151168099 @kazuyukitanimura would you mind taking a look at this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[PR] chore: Upgrade spark to 4.0.0-preview1 [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy opened a new pull request, #526: URL: https://github.com/apache/datafusion-comet/pull/526 ## Which issue does this PR close? Closes #525 ## Rationale for this change Reduce CI time and for local testing with spark 4.0. ## What changes are included in this PR?

[I] Upgrade spark4.0 to 4.0.0-preview1 [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy opened a new issue, #525: URL: https://github.com/apache/datafusion-comet/issues/525 ### What is the problem the feature request solves? Since spark 4.0.0-preview1 has already been [released](https://lists.apache.org/thread/y0fhglwjdrt90qjd0ntgvy0qodzdtzmn), we should use

Re: [PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-05 Thread via GitHub
jayzhan211 merged PR #10790: URL: https://github.com/apache/datafusion/pull/10790 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [I] `Int64` as default type for `make_array` function empty or null case [datafusion]

2024-06-05 Thread via GitHub
jayzhan211 closed issue #10789: `Int64` as default type for `make_array` function empty or null case URL: https://github.com/apache/datafusion/issues/10789 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

Re: [PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-05 Thread via GitHub
jayzhan211 commented on PR #10790: URL: https://github.com/apache/datafusion/pull/10790#issuecomment-2151149267 Thanks @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [PR] Int64 as default type for make_array function empty or null case [datafusion]

2024-06-05 Thread via GitHub
jayzhan211 commented on PR #10790: URL: https://github.com/apache/datafusion/pull/10790#issuecomment-2151148927 > Looks like a reasonable change to me. Thanks @jayzhan211 > > BTW I tested in duckdb and it seems like the default type is actually `int32` but I think `int64` is close eno

Re: [PR] fix: Input batch to ShuffleRepartitioner.insert_batch should not be larger than configured batch size [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy commented on code in PR #523: URL: https://github.com/apache/datafusion-comet/pull/523#discussion_r1628574405 ## core/src/execution/datafusion/shuffle_writer.rs: ## @@ -947,12 +948,18 @@ async fn external_shuffle( partitioning, metrics, cont

Re: [I] Support join filter in NestedLoopJoin in fizz join test cases [datafusion]

2024-06-05 Thread via GitHub
comphead commented on issue #10787: URL: https://github.com/apache/datafusion/issues/10787#issuecomment-2151145063 That's interesting, I'll try to add a test case for it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use

[PR] Bench: Add `PREFER_HASH_JOIN` env variable [datafusion]

2024-06-05 Thread via GitHub
comphead opened a new pull request, #10809: URL: https://github.com/apache/datafusion/pull/10809 ## Which issue does this PR close? Closes #. ## Rationale for this change By default, benches run with the hash join algorithm; this PR introduces a new env variable

Re: [PR] feat: Add fuzz testing for arithmetic expressions [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on PR #519: URL: https://github.com/apache/datafusion-comet/pull/519#issuecomment-2151137653 Thanks for the review @huaxingao @kazuyukitanimura @viirya I have added the bitwise expressions. -- This is an automated message from the Apache Git Service. To respond to the

Re: [PR] chore: Add UnboundColumn to carry datatype for unbound reference [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on PR #518: URL: https://github.com/apache/datafusion-comet/pull/518#issuecomment-2151133054 > LGTM. > > Do we need to contribute the UnboundColumn back to DataFusion? DataFusion doesn't need it. -- This is an automated message from the Apache Git Service. To

Re: [PR] Experiment: Coalesce batches after scan [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove closed pull request #496: Experiment: Coalesce batches after scan URL: https://github.com/apache/datafusion-comet/pull/496 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comme

Re: [PR] Experiment: Coalesce batches after scan [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on PR #496: URL: https://github.com/apache/datafusion-comet/pull/496#issuecomment-2151128821 I ran benchmarks with this change and saw no improvement, so I am closing this as a failed experiment. -- This is an automated message from the Apache Git Service. To respond to th

Re: [I] TPC-H q8 hangs with xxhash64 enabled [datafusion-comet]

2024-06-05 Thread via GitHub
parthchandra commented on issue #517: URL: https://github.com/apache/datafusion-comet/issues/517#issuecomment-2151122785 Was this running on a K8s cluster? It should not result in an OutOfMemory if CPU limits are reached. -- This is an automated message from the Apache Git Service. To res

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1628556183 ## dev/release/run-rat.sh: ## @@ -0,0 +1,43 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license ag

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1627813262 ## dev/release/run-rat.sh: ## @@ -0,0 +1,43 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license ag

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1628554607 ## dev/release/create-tarball.sh: ## @@ -0,0 +1,135 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor li

Re: [I] Plan Comet 0.1.0 Release [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy commented on issue #369: URL: https://github.com/apache/datafusion-comet/issues/369#issuecomment-2151114500 > We plan to do binary release, although it might not be able to catch up the 0.1.0 source release. I think we are on the same page. > Comet involves native co

[I] [EPIC] Document all known incompatibilities [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove opened a new issue, #524: URL: https://github.com/apache/datafusion-comet/issues/524 ### What is the problem the feature request solves? As we approach the [0.1.0 release](https://github.com/apache/datafusion-comet/issues/369), we want to make sure that we address all (curr

Re: [PR] chore: Add UnboundColumn to carry datatype for unbound reference [datafusion-comet]

2024-06-05 Thread via GitHub
codecov-commenter commented on PR #518: URL: https://github.com/apache/datafusion-comet/pull/518#issuecomment-215554 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/518?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

Re: [I] Possible regression in memory usage [datafusion-comet]

2024-06-05 Thread via GitHub
advancedxy commented on issue #517: URL: https://github.com/apache/datafusion-comet/issues/517#issuecomment-2151102014 > all/most cores are at 100% utilization So it might be more about cpu usage rather than memory consumption? If it’s indeed related to xxhash64, one possible reason

Re: [I] Support OneRowRelation to Support Scalar Inputs? [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura commented on issue #516: URL: https://github.com/apache/datafusion-comet/issues/516#issuecomment-2151095074 Thank you @tshauck, I have assigned this to you. The `OneRowRelation` issue and the `scalar` test issues are now decoupled. If you would like to separate the ticket, please do so.

Re: [I] Plan Comet 0.1.0 Release [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on issue #369: URL: https://github.com/apache/datafusion-comet/issues/369#issuecomment-2151086300 I created a milestone where we can track the priority issues for the 0.1.0 release https://github.com/apache/datafusion-comet/milestone/1 -- This is an automated m

Re: [PR] chore: Add UnboundColumn to carry datatype for unbound reference [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on code in PR #518: URL: https://github.com/apache/datafusion-comet/pull/518#discussion_r1628506549 ## core/src/execution/datafusion/expressions/unbound.rs: ## @@ -0,0 +1,110 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [I] Possible regression in memory usage [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on issue #517: URL: https://github.com/apache/datafusion-comet/issues/517#issuecomment-2151057179 The issue is happening with query 8. The behavior I see is that all/most cores are at 100% utilization, and memory is not actually high, but the cluster becomes unresponsiv

Re: [PR] chore: Add UnboundColumn to carry datatype for unbound reference [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on code in PR #518: URL: https://github.com/apache/datafusion-comet/pull/518#discussion_r1628505993 ## core/src/execution/datafusion/expressions/unbound.rs: ## @@ -0,0 +1,110 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribut

Re: [PR] chore: Add UnboundColumn to carry datatype for unbound reference [datafusion-comet]

2024-06-05 Thread via GitHub
huaxingao commented on code in PR #518: URL: https://github.com/apache/datafusion-comet/pull/518#discussion_r1628501503 ## core/src/execution/datafusion/expressions/unbound.rs: ## @@ -0,0 +1,110 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contri

Re: [PR] docs: changes in documentation [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura commented on PR #512: URL: https://github.com/apache/datafusion-comet/pull/512#issuecomment-2151046096 Thank you @SemyonSinchenko @andygrove @viirya merged -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and u

Re: [I] Remove git-commit-id-maven-plugin [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura closed issue #191: Remove git-commit-id-maven-plugin URL: https://github.com/apache/datafusion-comet/issues/191 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [I] [NOT A BUG] Why comet does not convert the HashAggregate expression to native in my query? [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura closed issue #503: [NOT A BUG] Why comet does not convert the HashAggregate expression to native in my query? URL: https://github.com/apache/datafusion-comet/issues/503 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] docs: changes in documentation [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura merged PR #512: URL: https://github.com/apache/datafusion-comet/pull/512 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubsc

Re: [I] Plan Comet 0.1.0 Release [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on issue #369: URL: https://github.com/apache/datafusion-comet/issues/369#issuecomment-2151002022 > I assume the source release will tag the repo with a `release-0.1.0` tag. Even though a maven artifact would not be published, it does allow projects to build their own,

Re: [PR] chore: Simplify code in CometExecIterator and avoid some small overhead [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove merged PR #522: URL: https://github.com/apache/datafusion-comet/pull/522 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [I] Possible regression in memory usage [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on issue #517: URL: https://github.com/apache/datafusion-comet/issues/517#issuecomment-2150948824 Is the query that failed with OOM one that uses xxhash64, so that it was not run natively before? -- This is an automated message from the Apache Git Service. To respond to the message, pl

Re: [PR] fix: Input batch to ShuffleRepartitioner.insert_batch should not be larger than configured batch size [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on PR #523: URL: https://github.com/apache/datafusion-comet/pull/523#issuecomment-2150944200 cc @andygrove @sunchao -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Possible regression in memory usage [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on issue #517: URL: https://github.com/apache/datafusion-comet/issues/517#issuecomment-2150901696 The issue starts with the PR that added xxhash64 (https://github.com/apache/datafusion-comet/pull/424). I wonder if this is an issue with hashing itself or just that

Re: [PR] docs: changes in documentation [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on code in PR #512: URL: https://github.com/apache/datafusion-comet/pull/512#discussion_r1628376610 ## docs/source/user-guide/tuning.md: ## @@ -39,6 +39,8 @@ It must be set before the Spark context is created. You can enable or disable Co at runtime by setting

[PR] fix: Input batch to ShuffleRepartitioner.insert_batch should not be larger than configured batch size [datafusion-comet]

2024-06-05 Thread via GitHub
viirya opened a new pull request, #523: URL: https://github.com/apache/datafusion-comet/pull/523 ## Which issue does this PR close? Closes #498. ## Rationale for this change ## What changes are included in this PR? ## How are these changes t

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
marvinlanhenke commented on PR #10801: URL: https://github.com/apache/datafusion/pull/10801#issuecomment-2150851089 > > 1. Interval statistics not supported; possibly due to those [lines](https://github.com/apache/datafusion/issues/10752#issuecomment-2150024521) > > 2. Type IntervalUnit::

Re: [PR] chore: Make ANSI fallback more granular [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on PR #509: URL: https://github.com/apache/datafusion-comet/pull/509#issuecomment-2150842134 This is a pretty interesting test failure: ``` - postgreSQL/groupingsets.sql *** FAILED *** (4 seconds, 331 milliseconds) postgreSQL/groupingsets.sql Expecte

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
alamb commented on PR #10801: URL: https://github.com/apache/datafusion/pull/10801#issuecomment-2150838943 > 1. Interval statistics not supported; possibly due to those [lines](https://github.com/apache/datafusion/issues/10752#issuecomment-2150024521) > 2. Type IntervalUnit::MonthDayNano

Re: [I] bug: org.apache.comet.CometNativeException: range end index 8221 out of range for slice of length 8192 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on issue #498: URL: https://github.com/apache/datafusion-comet/issues/498#issuecomment-2150837074 Ah, I probably found where it goes wrong. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abov

Re: [PR] Prune Parquet RowGroup in a single call to `PruningPredicate::prune`, update StatisticsExtractor API [datafusion]

2024-06-05 Thread via GitHub
NGA-TRAN commented on code in PR #10802: URL: https://github.com/apache/datafusion/pull/10802#discussion_r1628162150 ## datafusion-examples/examples/parquet_index.rs: ## @@ -518,21 +518,17 @@ impl ParquetMetadataIndexBuilder { // extract the parquet statistics from th

Re: [I] bug: attempt to multiply with overflow in cast string to date [datafusion-comet]

2024-06-05 Thread via GitHub
eejbyfeldt commented on issue #481: URL: https://github.com/apache/datafusion-comet/issues/481#issuecomment-2150830474 @andygrove Then there is more than one issue. I also recreated a similar crash using the date `29-01-01` inside a timestamp. Spark will correctly return the year give

Re: [I] bug: org.apache.comet.CometNativeException: range end index 8221 out of range for slice of length 8192 [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on issue #498: URL: https://github.com/apache/datafusion-comet/issues/498#issuecomment-2150814297 I haven't seen this before. If there is a reproducible example for this, it will be easier to debug. -- This is an automated message from the Apache Git Service. To respond to th

Re: [PR] feat: Support Ansi mode in abs function [datafusion-comet]

2024-06-05 Thread via GitHub
planga82 commented on PR #500: URL: https://github.com/apache/datafusion-comet/pull/500#issuecomment-2150802402 I have refactored the code to address all the comments. Thanks for the review! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Gi

Re: [PR] chore: Make ANSI fallback more granular [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove commented on code in PR #509: URL: https://github.com/apache/datafusion-comet/pull/509#discussion_r1628307927 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -712,17 +712,6 @@ class CometSparkSessionExtensions } override de

Re: [I] Support OneRowRelation to Support Scalar Inputs? [datafusion-comet]

2024-06-05 Thread via GitHub
tshauck commented on issue #516: URL: https://github.com/apache/datafusion-comet/issues/516#issuecomment-2150771572 Thanks, that's my understanding as well. Here's the extended explain for reference... ``` spark-sql (default)> EXPLAIN EXTENDED SELECT trim('123 '); 24/06/0

[PR] chore: Simplify code in CometExecIterator and avoid some small overhead [datafusion-comet]

2024-06-05 Thread via GitHub
andygrove opened a new pull request, #522: URL: https://github.com/apache/datafusion-comet/pull/522 ## Which issue does this PR close? N/A ## Rationale for this change In the original code, `getNextBatch` called `executeNative` which would create a `Batch

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-06-05 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2150740158 I suspect this means that the min/max function for intervals is also incorrectly propagating nulls but I tried to make a unit test for intervals and get an error that there is no
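For readers following this thread, here is a minimal sketch of what "total ordering" means for a float min/max. It is not the code from PR #10627; the helper names `total_min` and `total_max` are hypothetical, and the only library call assumed is the standard `f64::total_cmp`.

```rust
/// Minimum of a slice under the IEEE 754 total order (hypothetical helper).
/// Unlike `partial_cmp`, `total_cmp` always yields an ordering, so NaN
/// values cannot make the result depend on the order of the input.
fn total_min(values: &[f64]) -> Option<f64> {
    values.iter().copied().min_by(|a, b| a.total_cmp(b))
}

/// Maximum of a slice under the same total order (hypothetical helper).
/// A NaN with a positive sign bit sorts above +infinity in this order.
fn total_max(values: &[f64]) -> Option<f64> {
    values.iter().copied().max_by(|a, b| a.total_cmp(b))
}
```

With a positive NaN in the input, `total_max` returns that NaN wherever it sits in the slice, while `total_min` still returns the smallest ordinary number. An empty slice gives `None`.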

Re: [PR] fix: use total ordering in the min & max accumulator for floats [datafusion]

2024-06-05 Thread via GitHub
westonpace commented on PR #10627: URL: https://github.com/apache/datafusion/pull/10627#issuecomment-2150738379 > @westonpace what is the status / plan with this PR? It has failing CI tests but is not marked as a draft. Are you still planning on working with it? Do you need help to push it

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
marvinlanhenke commented on PR #10801: URL: https://github.com/apache/datafusion/pull/10801#issuecomment-2150714644 @alamb thanks for the review. I have addressed your comments, PTAL. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
marvinlanhenke commented on code in PR #10801: URL: https://github.com/apache/datafusion/pull/10801#discussion_r1628177847 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -256,6 +259,13 @@ macro_rules! get_statistic { Some(DataTy

[I] Add ability to receive an iterator over the inputs of a LogicalPlan instead of a Vec. [datafusion]

2024-06-05 Thread via GitHub
LorrensP-2158466 opened a new issue, #10808: URL: https://github.com/apache/datafusion/issues/10808 ### Is your feature request related to a problem or challenge? Currently, the only way to get the inputs of a LogicalPlan is to call `inputs()`, which returns a `Vec<&LogicalPlan>`. But
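The request above can be sketched roughly as follows. This is a toy model, not DataFusion's API: the `Plan` enum and the `inputs_iter` method are hypothetical stand-ins, and only the `Vec<&LogicalPlan>` return type of `inputs()` comes from the issue text.

```rust
/// Toy stand-in for a logical plan node (hypothetical, not DataFusion's type).
enum Plan {
    Scan(String),
    Filter(Box<Plan>),
    Join(Box<Plan>, Box<Plan>),
}

impl Plan {
    /// Mirrors the existing style: allocates a Vec on every call.
    fn inputs(&self) -> Vec<&Plan> {
        match self {
            Plan::Scan(_) => vec![],
            Plan::Filter(input) => vec![input.as_ref()],
            Plan::Join(left, right) => vec![left.as_ref(), right.as_ref()],
        }
    }

    /// Allocation-free alternative: yields the same children lazily.
    /// Works here because no node in this toy model has more than two inputs.
    fn inputs_iter(&self) -> impl Iterator<Item = &Plan> + '_ {
        let (first, second) = match self {
            Plan::Scan(_) => (None, None),
            Plan::Filter(input) => (Some(input.as_ref()), None),
            Plan::Join(left, right) => (Some(left.as_ref()), Some(right.as_ref())),
        };
        first.into_iter().chain(second)
    }
}
```

Both methods visit the same children in the same order. The iterator version avoids the heap allocation, but a real plan type with variable-arity nodes would need a different iterator shape than the two-slot chain used here.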

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
marvinlanhenke commented on code in PR #10801: URL: https://github.com/apache/datafusion/pull/10801#discussion_r1628177847 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -256,6 +259,13 @@ macro_rules! get_statistic { Some(DataTy

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
marvinlanhenke commented on code in PR #10801: URL: https://github.com/apache/datafusion/pull/10801#discussion_r1628209250 ## datafusion/core/tests/parquet/mod.rs: ## @@ -925,6 +932,71 @@ fn make_dict_batch() -> RecordBatch { .unwrap() } +fn make_interval_batch(offset: i

Re: [PR] chore: Make ANSI fallback more granular [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura commented on code in PR #509: URL: https://github.com/apache/datafusion-comet/pull/509#discussion_r162818 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -712,17 +712,6 @@ class CometSparkSessionExtensions } over

Re: [I] Support OneRowRelation to Support Scalar Inputs? [datafusion-comet]

2024-06-05 Thread via GitHub
viirya commented on issue #516: URL: https://github.com/apache/datafusion-comet/issues/516#issuecomment-2150642140 `OneRowRelation` is a logical node. It will be planned as a physical node `RDDScanExec` with one single row (empty columns). It should be easy to implement in Comet. -- This

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
marvinlanhenke commented on PR #10801: URL: https://github.com/apache/datafusion/pull/10801#issuecomment-2150625815 > > @alamb PTAL. > > As described in the PR I tried to prepare as much as possible, although statistics are not yet supported. > > Yet another issue: The `IntervalUnit::M

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
marvinlanhenke commented on code in PR #10801: URL: https://github.com/apache/datafusion/pull/10801#discussion_r1628177847 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -256,6 +259,13 @@ macro_rules! get_statistic { Some(DataTy

Re: [I] Support OneRowRelation to Support Scalar Inputs? [datafusion-comet]

2024-06-05 Thread via GitHub
kazuyukitanimura commented on issue #516: URL: https://github.com/apache/datafusion-comet/issues/516#issuecomment-2150607171 @tshauck Thank you for working on this. I need to look into the details but the name `OneRowRelation` sounds like it is reading from a table. What about reading pa

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
alamb commented on PR #10801: URL: https://github.com/apache/datafusion/pull/10801#issuecomment-2150605789 > @alamb PTAL. > > As described in the PR I tried to prepare as much as possible, although statistics are not yet supported. > > Yet another issue: The `IntervalUnit::Mont

Re: [PR] feat: Support Ansi mode in abs function [datafusion-comet]

2024-06-05 Thread via GitHub
parthchandra commented on code in PR #500: URL: https://github.com/apache/datafusion-comet/pull/500#discussion_r1628163758 ## core/src/execution/datafusion/expressions/abs.rs: ## @@ -0,0 +1,87 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contribu

Re: [I] Feedback request for providing configurable UDF functions [datafusion]

2024-06-05 Thread via GitHub
Omega359 commented on issue #10744: URL: https://github.com/apache/datafusion/issues/10744#issuecomment-2150594778 > I have read the context now and understand that this is about `safe` mode or what Spark calls `ANSI` mode. > > Isn't this just a case of adding a new flag to the sessio

Re: [PR] chore: Create initial release process scripts for official ASF source release [datafusion-comet]

2024-06-05 Thread via GitHub
parthchandra commented on code in PR #429: URL: https://github.com/apache/datafusion-comet/pull/429#discussion_r1628161655 ## dev/release/create-tarball.sh: ## @@ -0,0 +1,135 @@ +#!/bin/bash +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor

Re: [PR] feat: Add fuzz testing for arithmetic expressions [datafusion-comet]

2024-06-05 Thread via GitHub
huaxingao commented on PR #519: URL: https://github.com/apache/datafusion-comet/pull/519#issuecomment-2150588332 Thanks @andygrove for the PR! For binary arithmetic expressions, shall we also include bitwise operations such as `BitwiseAnd`, `BitwiseOr`, `BitwiseXor`? -- This is an automate

Re: [PR] Handle EmptyRelation during SQL unparsing [datafusion]

2024-06-05 Thread via GitHub
goldmedal commented on PR #10803: URL: https://github.com/apache/datafusion/pull/10803#issuecomment-2150586926 Thanks @alamb @devinjdangelo I think I will address @alamb's comments in the follow-up PR. -- This is an automated message from the Apache Git Service. To respond to the messa

Re: [PR] Extract Parquet statistics from `Interval` column [datafusion]

2024-06-05 Thread via GitHub
alamb commented on code in PR #10801: URL: https://github.com/apache/datafusion/pull/10801#discussion_r1628143605 ## datafusion/core/tests/parquet/mod.rs: ## @@ -925,6 +932,71 @@ fn make_dict_batch() -> RecordBatch { .unwrap() } +fn make_interval_batch(offset: i32) -> Re
