Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-14 Thread via GitHub
marvinlanhenke commented on code in PR #10852: URL: https://github.com/apache/datafusion/pull/10852#discussion_r1640748152 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -39,102 +40,102 @@ use arrow_array::{ use arrow_schema::{DataType, Field, Schema}; use datafus

Re: [I] Incorrect LEFT JOIN evaluation result on OR conditions [datafusion]

2024-06-14 Thread via GitHub
ozankabak commented on issue #10881: URL: https://github.com/apache/datafusion/issues/10881#issuecomment-2169129789 If we look at the following plan: > EXPLAIN SELECT e.emp_id, e.name, d.department FROM employees e LEFT JOIN department d ON (e.name = 'Alice' OR e.name = 'Bob')

Re: [I] Dropping Spark 3.2 support [datafusion-comet]

2024-06-14 Thread via GitHub
advancedxy commented on issue #565: URL: https://github.com/apache/datafusion-comet/issues/565#issuecomment-2169128582 +1. > Maintaining more than two versions definitely slows down development. Maybe we should define a supporting policy about how and which Spark versions sho

Re: [I] Incorrect LEFT JOIN evaluation result on OR conditions [datafusion]

2024-06-14 Thread via GitHub
ozankabak commented on issue #10881: URL: https://github.com/apache/datafusion/issues/10881#issuecomment-2169117921 Could this be related to a bug in the optimizer (pushing down filter to the left where it shouldn't)? Maybe it has nothing to do with the join impl itself at all. -- This i

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-14 Thread via GitHub
marvinlanhenke commented on code in PR #10852: URL: https://github.com/apache/datafusion/pull/10852#discussion_r1640741230 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -327,6 +429,38 @@ async fn test_two_row_groups_with_all_nulls_in_one() { // row counts

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-14 Thread via GitHub
marvinlanhenke commented on code in PR #10852: URL: https://github.com/apache/datafusion/pull/10852#discussion_r1640740355 ## datafusion/core/tests/parquet/arrow_statistics.rs: ## @@ -39,102 +40,102 @@ use arrow_array::{ use arrow_schema::{DataType, Field, Schema}; use datafus

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-14 Thread via GitHub
marvinlanhenke commented on code in PR #10852: URL: https://github.com/apache/datafusion/pull/10852#discussion_r1640738976 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -771,10 +885,205 @@ impl<'a> StatisticsConverter<'a> { Ok(Arc::new(UIn

Re: [I] Implement equality `=` and inequality `<>` support for `StringView` [datafusion]

2024-06-14 Thread via GitHub
Weijun-H commented on issue #10919: URL: https://github.com/apache/datafusion/issues/10919#issuecomment-2169073038 I am glad to pick this ticket. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [PR] refactor: Generate GroupByHash output in multiple RecordBatches [datafusion]

2024-06-14 Thread via GitHub
github-actions[bot] commented on PR #9818: URL: https://github.com/apache/datafusion/pull/9818#issuecomment-2169032432 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] chore(deps): update bigdecimal requirement from =0.4.1 to =0.4.3 [datafusion]

2024-06-14 Thread via GitHub
github-actions[bot] commented on PR #9476: URL: https://github.com/apache/datafusion/pull/9476#issuecomment-2169032448 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Implement spilling for PartialSortExec [datafusion]

2024-06-14 Thread via GitHub
github-actions[bot] commented on PR #9469: URL: https://github.com/apache/datafusion/pull/9469#issuecomment-2169032463 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] [functions]support current_timestamp [datafusion]

2024-06-14 Thread via GitHub
github-actions[bot] closed pull request #6873: [functions]support current_timestamp URL: https://github.com/apache/datafusion/pull/6873 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific c

Re: [PR] Update benchmarks to handle pip dependencies for user [datafusion]

2024-06-14 Thread via GitHub
github-actions[bot] commented on PR #10070: URL: https://github.com/apache/datafusion/pull/10070#issuecomment-2169032402 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-14 Thread via GitHub
jayzhan211 commented on code in PR #10917: URL: https://github.com/apache/datafusion/pull/10917#discussion_r1640557359 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -496,11 +496,14 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode {

Re: [I] Unclear error message when calling a function with no parameters. [datafusion]

2024-06-14 Thread via GitHub
jayzhan211 closed issue #10915: Unclear error message when calling a function with no parameters. URL: https://github.com/apache/datafusion/issues/10915 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

Re: [PR] test: Enable Spark 4.0 tests [datafusion-comet]

2024-06-14 Thread via GitHub
kazuyukitanimura commented on PR #537: URL: https://github.com/apache/datafusion-comet/pull/537#issuecomment-2169001890 @andygrove just checking in to see if you have more feedback -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

Re: [PR] chore: update some error messages for clarity [datafusion]

2024-06-14 Thread via GitHub
jayzhan211 merged PR #10916: URL: https://github.com/apache/datafusion/pull/10916 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dat

Re: [PR] chore: update some error messages for clarity [datafusion]

2024-06-14 Thread via GitHub
jayzhan211 commented on PR #10916: URL: https://github.com/apache/datafusion/pull/10916#issuecomment-2169001823 Thanks @jeffreyssmith2nd and @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

[PR] fix: Improve error "BroadcastExchange is not supported" [datafusion-comet]

2024-06-14 Thread via GitHub
parthchandra opened a new pull request, #577: URL: https://github.com/apache/datafusion-comet/pull/577 ## Which issue does this PR close? Closes #568 ## Rationale for this change The message is confusing (and incorrect) ## What changes are included in this PR?

Re: [PR] feat: add CliSessionContext trait for cli [datafusion]

2024-06-14 Thread via GitHub
tshauck commented on code in PR #10890: URL: https://github.com/apache/datafusion/pull/10890#discussion_r1640558703 ## datafusion-cli/examples/cli-session-context.rs: ## @@ -0,0 +1,99 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licen

Re: [I] range end index 294912 out of range for slice of length 147456 [datafusion-comet]

2024-06-14 Thread via GitHub
viirya commented on issue #540: URL: https://github.com/apache/datafusion-comet/issues/540#issuecomment-2168968135 I opened an issue at Java Arrow repo and described the root cause: https://github.com/apache/arrow/issues/42156. Fixing it there might wait for a longer release period. I'm th

Re: [I] Support DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY [datafusion-comet]

2024-06-14 Thread via GitHub
parthchandra commented on issue #574: URL: https://github.com/apache/datafusion-comet/issues/574#issuecomment-2168961884 IIRC, the vectorized versions of these encodings in Spark did not improve performance much over the row based implementation in the parquet library -- This is an autom

Re: [PR] Move `Literal` to `physical-expr-common` [datafusion]

2024-06-14 Thread via GitHub
lewiszlw commented on PR #10910: URL: https://github.com/apache/datafusion/pull/10910#issuecomment-2168953432 Merged. This feels great. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] Move `Literal` to `physical-expr-common` [datafusion]

2024-06-14 Thread via GitHub
lewiszlw merged PR #10910: URL: https://github.com/apache/datafusion/pull/10910 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] feat: add nullOnDivideByZero for Covariance [datafusion-comet]

2024-06-14 Thread via GitHub
huaxingao commented on code in PR #564: URL: https://github.com/apache/datafusion-comet/pull/564#discussion_r1640539424 ## core/src/execution/datafusion/expressions/covariance.rs: ## @@ -287,22 +296,24 @@ impl Accumulator for CovarianceAccumulator { } fn evaluate(&mu

Re: [PR] feat: add nullOnDivideByZero for Covariance [datafusion-comet]

2024-06-14 Thread via GitHub
huaxingao commented on code in PR #564: URL: https://github.com/apache/datafusion-comet/pull/564#discussion_r1640540523 ## core/src/execution/datafusion/expressions/covariance.rs: ## @@ -287,22 +296,24 @@ impl Accumulator for CovarianceAccumulator { } fn evaluate(&mu

Re: [I] Use maven-assembly-plugin to set final artifact name [datafusion-comet]

2024-06-14 Thread via GitHub
parthchandra commented on issue #563: URL: https://github.com/apache/datafusion-comet/issues/563#issuecomment-2168936741 This was taken from Spark which has corrected it since -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub a

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-14 Thread via GitHub
alamb commented on PR #10852: URL: https://github.com/apache/datafusion/pull/10852#issuecomment-2168871005 I think once we are happy with this PR we can merge it in and then I'll file tickets for filling out the other types. I feel like we may be able to make the tests a little better

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-14 Thread via GitHub
alamb commented on code in PR #10852: URL: https://github.com/apache/datafusion/pull/10852#discussion_r1640436779 ## datafusion/core/src/datasource/physical_plan/parquet/statistics.rs: ## @@ -771,10 +885,205 @@ impl<'a> StatisticsConverter<'a> { Ok(Arc::new(UInt64Array:

Re: [PR] Initial Extract parquet data page statistics API [datafusion]

2024-06-14 Thread via GitHub
alamb commented on PR #10852: URL: https://github.com/apache/datafusion/pull/10852#issuecomment-2168867919 Ok, I spent some time working on a test that had multiple data pages and have checked it in. @marvinlanhenke any chance you can give this PR another review to see if you t

Re: [I] TPC-H q8 hangs with xxhash64 enabled [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove closed issue #517: TPC-H q8 hangs with xxhash64 enabled URL: https://github.com/apache/datafusion-comet/issues/517 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [I] TPC-H q8 hangs with xxhash64 enabled [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove commented on issue #517: URL: https://github.com/apache/datafusion-comet/issues/517#issuecomment-2168838510 The issue is not related to xxhash64. Closing this and filed https://github.com/apache/datafusion-comet/issues/576 -- This is an automated message from the Apache Git Ser

[I] Spark executor becomes unresponsive when trying to transform query plan for TPC-H q8 [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove opened a new issue, #576: URL: https://github.com/apache/datafusion-comet/issues/576 ### Describe the bug _Note: This was originally reported as an issue with xxhash64 in https://github.com/apache/datafusion-comet/issues/517 but xxhash64 is not really involved._ When

Re: [PR] Update ListingTable to use StatisticsConverter [datafusion]

2024-06-14 Thread via GitHub
alamb commented on code in PR #10924: URL: https://github.com/apache/datafusion/pull/10924#discussion_r1640395427 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -482,73 +404,101 @@ pub async fn statistics_from_parquet_meta( file_metadata.key_value_metad

[PR] Update ListingTable to use StatisticsConverter [datafusion]

2024-06-14 Thread via GitHub
alamb opened a new pull request, #10924: URL: https://github.com/apache/datafusion/pull/10924 ## Which issue does this PR close? Closes https://github.com/apache/datafusion/issues/10923 ## Rationale for this change 1. The statistics code did not handle all types (like str

Re: [PR] WIP: feat: Add support for Spark 3.5 [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove commented on PR #555: URL: https://github.com/apache/datafusion-comet/pull/555#issuecomment-2168814009 I don't think I will have time to work on this, so closing for now -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHu

Re: [PR] WIP: feat: Add support for Spark 3.5 [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove closed pull request #555: WIP: feat: Add support for Spark 3.5 URL: https://github.com/apache/datafusion-comet/pull/555 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[I] Update ListingTable to use `StatisticsConverter` [datafusion]

2024-06-14 Thread via GitHub
alamb opened a new issue, #10923: URL: https://github.com/apache/datafusion/issues/10923 ### Is your feature request related to a problem or challenge? Thanks to https://github.com/apache/datafusion/issues/10453, we now have the very nice `StatisticsConverter` API ([code link](https:

[PR] use ScalarValue::to_pyarrow to convert to python object [datafusion-python]

2024-06-14 Thread via GitHub
Michael-J-Ward opened a new pull request, #731: URL: https://github.com/apache/datafusion-python/pull/731 # Which issue does this PR close? Closes #729 # Rationale for this change `datafusion` already implements converting `ScalarValue`s to python objects. # W

Re: [I] TPC-H q8 hangs with xxhash64 enabled [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove commented on issue #517: URL: https://github.com/apache/datafusion-comet/issues/517#issuecomment-2168740484 Even with https://github.com/apache/datafusion-comet/pull/575, q8 still crashes so perhaps it is not related to xxhash64 support at all. I am investigating still -- This

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-06-14 Thread via GitHub
alamb commented on issue #10453: URL: https://github.com/apache/datafusion/issues/10453#issuecomment-2168730604 This issue is done enough -- I am consolidating the remaining todo items under https://github.com/apache/datafusion/issues/10922 -- This is an automated message from the Apache

Re: [I] [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs [datafusion]

2024-06-14 Thread via GitHub
alamb closed issue #10453: [EPIC] Efficiently and correctly extract parquet statistics into ArrayRefs URL: https://github.com/apache/datafusion/issues/10453 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to

[I] [EPIC] Continued correct and improved extracting Parquet statistics into ArrayRefs [datafusion]

2024-06-14 Thread via GitHub
alamb opened a new issue, #10922: URL: https://github.com/apache/datafusion/issues/10922 ### Is your feature request related to a problem or challenge? I consolidated the content of our previous tickets about better statistics https://github.com/apache/datafusion/issues/10806 and htt

Re: [PR] Simplify Join Partition Rules [datafusion]

2024-06-14 Thread via GitHub
alamb merged PR #10911: URL: https://github.com/apache/datafusion/pull/10911 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [PR] Simplify Join Partition Rules [datafusion]

2024-06-14 Thread via GitHub
alamb commented on PR #10911: URL: https://github.com/apache/datafusion/pull/10911#issuecomment-2168725290 > lgtm thanks @berkaysynnada I'd love to see this table in docs somewhere, it can be in followup PR I agree - thank you @berkaysynnada and getting the table into the docs would

[PR] Implement more efficient version of xxhash64 [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove opened a new pull request, #575: URL: https://github.com/apache/datafusion-comet/pull/575 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/547 ## Rationale for this change ## What changes are included in

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-14 Thread via GitHub
alamb commented on PR #10879: URL: https://github.com/apache/datafusion/pull/10879#issuecomment-2168723309 The CI failure seems unrelated to the code in this PR: https://github.com/apache/datafusion/actions/runs/9510703358?pr=10879 -- This is an automated message from the Apache G

Re: [PR] Support explicit type and name during table creation [datafusion]

2024-06-14 Thread via GitHub
alamb merged PR #10273: URL: https://github.com/apache/datafusion/pull/10273 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

Re: [I] Create `Struct` table with explicit type and name [datafusion]

2024-06-14 Thread via GitHub
alamb closed issue #10207: Create `Struct` table with explicit type and name URL: https://github.com/apache/datafusion/issues/10207 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific commen

Re: [I] [EPIC] Fix safety issues in unsafe code [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove closed issue #507: [EPIC] Fix safety issues in unsafe code URL: https://github.com/apache/datafusion-comet/issues/507 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

Re: [PR] chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove merged PR #546: URL: https://github.com/apache/datafusion-comet/pull/546 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [I] Implement `arrow_cast` support for `StringView` and `BinaryView` [datafusion]

2024-06-14 Thread via GitHub
XiangpengHao commented on issue #10920: URL: https://github.com/apache/datafusion/issues/10920#issuecomment-2168676788 Let me try it, can you assign me? @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL a

[I] use StringViewArray when reading String columns from Parquet [datafusion]

2024-06-14 Thread via GitHub
alamb opened a new issue, #10921: URL: https://github.com/apache/datafusion/issues/10921 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10918 In order to take advantage of the parquet writer generating StringVie

[I] Implement `arrow_cast` support for `StringView` [datafusion]

2024-06-14 Thread via GitHub
alamb opened a new issue, #10920: URL: https://github.com/apache/datafusion/issues/10920 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10918, [`StringViewArray`](https://docs.rs/arrow/latest/arrow/array/type.StringVi

[I] Implement equality `=` and inequality `<>` support for `StringView` [datafusion]

2024-06-14 Thread via GitHub
alamb opened a new issue, #10919: URL: https://github.com/apache/datafusion/issues/10919 ### Is your feature request related to a problem or challenge? Part of https://github.com/apache/datafusion/issues/10918, [`StringViewArray`](https://docs.rs/arrow/latest/arrow/array/type.StringVi

Re: [I] [Epic] Implement support for `StringView` in DataFusion [datafusion]

2024-06-14 Thread via GitHub
alamb commented on issue #10918: URL: https://github.com/apache/datafusion/issues/10918#issuecomment-2168625482 I think we should aim for a first "milestone" of showing improvements for some clickbench queries -- This is an automated message from the Apache Git Service. To respond to the

[I] [Epic] Implement support for `StringView` in DataFusion [datafusion]

2024-06-14 Thread via GitHub
alamb opened a new issue, #10918: URL: https://github.com/apache/datafusion/issues/10918 ### Is your feature request related to a problem or challenge? _No response_ ### Describe the solution you'd like _No response_ ### Describe alternatives you've considered

Re: [I] Data set which is much bigger than RAM [datafusion]

2024-06-14 Thread via GitHub
alamb commented on issue #10897: URL: https://github.com/apache/datafusion/issues/10897#issuecomment-2168571852 > Will it swallow all memory and fail or it will be running in a kind on streaming format? Hi @Smotrov, given your description and code, I would expect this query to run in

Re: [I] CTE in a UNION query can escape its scope [datafusion]

2024-06-14 Thread via GitHub
alamb commented on issue #10914: URL: https://github.com/apache/datafusion/issues/10914#issuecomment-2168566440 😮 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

Re: [I] Depend on Arrow Subcrates [datafusion]

2024-06-14 Thread via GitHub
tmi commented on issue #5725: URL: https://github.com/apache/datafusion/issues/5725#issuecomment-2168560427 Honestly I think this issue should be closed as there won't be that much value -- e.g., `datafusion-common` depends on pyarrow which doesn't exist as a standalone crate so you are stu

Re: [I] Upgrade window UDF api [datafusion-python]

2024-06-14 Thread via GitHub
timsaucer commented on issue #730: URL: https://github.com/apache/datafusion-python/issues/730#issuecomment-2168549814 I’ll be happy to but on vacation so it will be a couple weeks. On Jun 14, 2024, at 1:49 PM, Michael J Ward ***@***.***> wrote: @timsaucer could you take a look at this on

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-06-14 Thread via GitHub
dependabot[bot] closed pull request #707: build(deps): bump object_store from 0.9.1 to 0.10.1 URL: https://github.com/apache/datafusion-python/pull/707 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go t

Re: [PR] build(deps): bump object_store from 0.9.1 to 0.10.1 [datafusion-python]

2024-06-14 Thread via GitHub
dependabot[bot] commented on PR #707: URL: https://github.com/apache/datafusion-python/pull/707#issuecomment-2168541122 Looks like object_store is up-to-date now, so this is no longer needed. -- This is an automated message from the Apache Git Service. To respond to the message, please lo

Re: [PR] Upgrade datafusion 39 [datafusion-python]

2024-06-14 Thread via GitHub
andygrove merged PR #728: URL: https://github.com/apache/datafusion-python/pull/728 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@d

Re: [I] Incorrect conversion of pyarrow interval value to datafusion literal [datafusion-python]

2024-06-14 Thread via GitHub
andygrove closed issue #665: Incorrect conversion of pyarrow interval value to datafusion literal URL: https://github.com/apache/datafusion-python/issues/665 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above t

Re: [I] Add `regexp_replace` example back to docs [datafusion-python]

2024-06-14 Thread via GitHub
andygrove closed issue #677: Add `regexp_replace` example back to docs URL: https://github.com/apache/datafusion-python/issues/677 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment

Re: [I] Incorporate upstream array_slice fixes [datafusion-python]

2024-06-14 Thread via GitHub
andygrove closed issue #670: Incorporate upstream array_slice fixes URL: https://github.com/apache/datafusion-python/issues/670 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

[I] Support DELTA_BINARY_PACKED and DELTA_BYTE_ARRAY [datafusion-comet]

2024-06-14 Thread via GitHub
kazuyukitanimura opened a new issue, #574: URL: https://github.com/apache/datafusion-comet/issues/574 ### What is the problem the feature request solves? There are some tests in Spark 4.0 that use `parquet.writer.version=v2` (`ParquetTypeWideningSuite`). The V2 write writes wi

Re: [I] Keynote presentation for SiMoD workshop at SIGMOD 2024 [datafusion]

2024-06-14 Thread via GitHub
alamb commented on issue #10481: URL: https://github.com/apache/datafusion/issues/10481#issuecomment-2168498658 It's done! I'll try and record this talk too at some point and post it on http://andrew.nerdnetworks.org/ -- This is an automated message from the Apache Git Service. To respond

Re: [I] Keynote presentation for SiMoD workshop at SIGMOD 2024 [datafusion]

2024-06-14 Thread via GitHub
alamb closed issue #10481: Keynote presentation for SiMoD workshop at SIGMOD 2024 URL: https://github.com/apache/datafusion/issues/10481 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific

Re: [I] Upgrade window UDF api [datafusion-python]

2024-06-14 Thread via GitHub
Michael-J-Ward commented on issue #730: URL: https://github.com/apache/datafusion-python/issues/730#issuecomment-2168496625 @timsaucer could you take a look at this once #728 is merged? -- This is an automated message from the Apache Git Service. To respond to the message, please log on t

Re: [I] Avoid stack allocation in xxhash64 [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove commented on issue #547: URL: https://github.com/apache/datafusion-comet/issues/547#issuecomment-2168486571 @advancedxy I am working on re-implementing this now in a simpler way as an experiment ... I will aim to have a PR up by end of day -- This is an automated message from

Re: [I] Unexpected results with group by and random() [datafusion]

2024-06-14 Thread via GitHub
alamb commented on issue #7876: URL: https://github.com/apache/datafusion/issues/7876#issuecomment-2168484280 I think we have fixed this issue in subsequent releases, so closing this ticket. Let's reopen / file a new ticket if we find something still is not working -- This is an automate

Re: [I] Unexpected results with group by and random() [datafusion]

2024-06-14 Thread via GitHub
alamb closed issue #7876: Unexpected results with group by and random() URL: https://github.com/apache/datafusion/issues/7876 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

Re: [PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-14 Thread via GitHub
goldmedal commented on code in PR #10917: URL: https://github.com/apache/datafusion/pull/10917#discussion_r1640141816 ## datafusion/proto/src/physical_plan/mod.rs: ## @@ -496,11 +496,14 @@ impl AsExecutionPlan for protobuf::PhysicalPlanNode {

Re: [PR] chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code [datafusion-comet]

2024-06-14 Thread via GitHub
comphead commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1640135715 ## core/src/execution/sort.rs: ## @@ -159,12 +159,16 @@ where pos += 1; } } else { -unsafe {

Re: [I] Avoid stack allocation in xxhash64 [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove commented on issue #547: URL: https://github.com/apache/datafusion-comet/issues/547#issuecomment-2168451782 Here are benchmark results comparing murmur3 to xxhash64. xxhash64 is 3x slower (not sure if that is the expectation) but what is more interesting is that there is a warnin

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-14 Thread via GitHub
waynexia commented on code in PR #10879: URL: https://github.com/apache/datafusion/pull/10879#discussion_r1635767599 ## datafusion/functions/src/string/contains.rs: ## @@ -0,0 +1,143 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor licens

Re: [I] Improve error "BroadcastExchange is not supported" [datafusion-comet]

2024-06-14 Thread via GitHub
parthchandra commented on issue #568: URL: https://github.com/apache/datafusion-comet/issues/568#issuecomment-2168443359 Yes, I'll take this up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to t

Re: [I] Improve error "BroadcastExchange is not supported" [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove commented on issue #568: URL: https://github.com/apache/datafusion-comet/issues/568#issuecomment-2168438975 @parthchandra Are you interested in working on this one? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

Re: [PR] Add catalog::resolve_table_references [datafusion]

2024-06-14 Thread via GitHub
leoyvens commented on PR #10876: URL: https://github.com/apache/datafusion/pull/10876#issuecomment-2168424302 @jonahgao You're a SQL grammar wizard, I didn't know nested CTEs were even a thing, thanks for pointing out all those edge cases. I've handled and tested them all to the best of my

Re: [PR] feat: add nullOnDivideByZero for Covariance [datafusion-comet]

2024-06-14 Thread via GitHub
viirya commented on code in PR #564: URL: https://github.com/apache/datafusion-comet/pull/564#discussion_r1640056823 ## core/src/execution/datafusion/expressions/covariance.rs: ## @@ -287,22 +296,24 @@ impl Accumulator for CovarianceAccumulator { } fn evaluate(&mut s

Re: [PR] feat: add nullOnDivideByZero for Covariance [datafusion-comet]

2024-06-14 Thread via GitHub
viirya commented on code in PR #564: URL: https://github.com/apache/datafusion-comet/pull/564#discussion_r1640056168 ## core/src/execution/datafusion/expressions/covariance.rs: ## @@ -287,22 +296,24 @@ impl Accumulator for CovarianceAccumulator { } fn evaluate(&mut s

[I] Improve performance of TPC-H q14 [datafusion-comet]

2024-06-14 Thread via GitHub
andygrove opened a new issue, #573: URL: https://github.com/apache/datafusion-comet/issues/573 ### What is the problem the feature request solves? Initial observations: - Query is fully native - Similar to q19, lineitem scan takes longer with Comet - Comet avoids an expens

Re: [PR] Support dictionary data type in array_to_string [datafusion]

2024-06-14 Thread via GitHub
EduardoVega commented on code in PR #10908: URL: https://github.com/apache/datafusion/pull/10908#discussion_r1640045366 ## datafusion/functions-array/src/string.rs: ## @@ -281,6 +281,21 @@ pub(super) fn array_to_string_inner(args: &[ArrayRef]) -> Result { Ok(

[PR] Convert ApproxPercentileCont and ApproxPercentileContWithWeight to UDAF [datafusion]

2024-06-14 Thread via GitHub
goldmedal opened a new pull request, #10917: URL: https://github.com/apache/datafusion/pull/10917 ## Which issue does this PR close? Closes #10870 and convert ApproxPercentileContWithWeight to udaf. ## Rationale for this change ## What changes are included

Re: [PR] chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code [datafusion-comet]

2024-06-14 Thread via GitHub
comphead commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1640031686 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -85,11 +85,16 @@ pub(crate) fn spark_compatible_murmur3_hash>(data: T, seed: u32) // safety: //

Re: [PR] chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code [datafusion-comet]

2024-06-14 Thread via GitHub
comphead commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1640030932 ## core/src/execution/sort.rs: ## @@ -159,12 +159,16 @@ where pos += 1; } } else { -unsafe {

Re: [PR] Minor: use venv in benchmark compare [datafusion]

2024-06-14 Thread via GitHub
comphead merged PR #10894: URL: https://github.com/apache/datafusion/pull/10894 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [I] Update benchmarks README to require rich installation [datafusion]

2024-06-14 Thread via GitHub
comphead closed issue #10022: Update benchmarks README to require rich installation URL: https://github.com/apache/datafusion/issues/10022 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

Re: [PR] Minor: Fix `bench.sh tpch data` [datafusion]

2024-06-14 Thread via GitHub
comphead merged PR #10905: URL: https://github.com/apache/datafusion/pull/10905 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@dataf

Re: [PR] Minor: Fix `bench.sh tpch data` [datafusion]

2024-06-14 Thread via GitHub
comphead commented on code in PR #10905: URL: https://github.com/apache/datafusion/pull/10905#discussion_r1640026180 ## benchmarks/bench.sh: ## @@ -302,7 +302,7 @@ data_tpch() { else echo " creating parquet files using benchmark binary ..." pushd "${SCRIPT
