Re: [I] Dropping Spark 3.2 support [datafusion-comet]

2024-06-13 Thread via GitHub
eejbyfeldt commented on issue #565: URL: https://github.com/apache/datafusion-comet/issues/565#issuecomment-2167342033 +1 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. T

Re: [I] Convert `Grouping` to UDAF [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 commented on issue #10906: URL: https://github.com/apache/datafusion/issues/10906#issuecomment-2167237131 > can I have a try to take it? Sure!

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-13 Thread via GitHub
Weijun-H commented on PR #10879: URL: https://github.com/apache/datafusion/pull/10879#issuecomment-2167185570 Please add an entry here https://github.com/apache/datafusion/blob/1a2a1bf3f96b8ca96d1496d99a0c072d65e1940d/docs/source/user-guide/sql/scalar_functions.md#L645-L646

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-13 Thread via GitHub
Lordworms commented on code in PR #10879: URL: https://github.com/apache/datafusion/pull/10879#discussion_r1639206381 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -1443,6 +1443,19 @@ position(substr in origstr) - **substr**: The pattern string. - **origstr**: The m

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-13 Thread via GitHub
Weijun-H commented on code in PR #10879: URL: https://github.com/apache/datafusion/pull/10879#discussion_r1639203659 ## docs/source/user-guide/sql/scalar_functions.md: ## @@ -1443,6 +1443,19 @@ position(substr in origstr) - **substr**: The pattern string. - **origstr**: The mo

[PR] WIP: Upgrade to Rust 1.79 [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove opened a new pull request, #570: URL: https://github.com/apache/datafusion-comet/pull/570 ## Which issue does this PR close? N/A ## Rationale for this change Rust 1.79 was just released, so we should make sure we can support it. ## What ch

[PR] Move `Literal` to `physical-expr-common` [datafusion]

2024-06-13 Thread via GitHub
lewiszlw opened a new pull request, #10910: URL: https://github.com/apache/datafusion/pull/10910 ## Which issue does this PR close? ## Rationale for this change `StringAgg` depends on `Literal` https://github.com/apache/datafusion/blob/cc60278f50eac33f9c

Re: [PR] perf: Add criterion benchmark for xxhash64 function [datafusion-comet]

2024-06-13 Thread via GitHub
advancedxy commented on code in PR #560: URL: https://github.com/apache/datafusion-comet/pull/560#discussion_r1639161664 ## core/benches/hash.rs: ## @@ -95,6 +96,16 @@ fn criterion_benchmark(c: &mut Criterion) { }); }, ); +group.bench_function(Benc

Re: [PR] Add `advanced_parquet_index.rs` example of index in into parquet files [datafusion]

2024-06-13 Thread via GitHub
Weijun-H commented on code in PR #10701: URL: https://github.com/apache/datafusion/pull/10701#discussion_r1639136992 ## datafusion-examples/examples/advanced_parquet_index.rs: ## @@ -0,0 +1,595 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-13 Thread via GitHub
Lordworms commented on code in PR #10879: URL: https://github.com/apache/datafusion/pull/10879#discussion_r1639143722 ## datafusion/functions/src/string/mod.rs: ## @@ -149,6 +149,9 @@ pub mod expr_fn { ),( uuid, "returns uuid v4 as a string value", +),

Re: [I] Convert `Grouping` to UDAF [datafusion]

2024-06-13 Thread via GitHub
Rachelint commented on issue #10906: URL: https://github.com/apache/datafusion/issues/10906#issuecomment-2167090175 can I have a try to take it?

Re: [PR] [RFC] Register scalars with boxed fn impl [datafusion]

2024-06-13 Thread via GitHub
github-actions[bot] commented on PR #9980: URL: https://github.com/apache/datafusion/pull/9980#issuecomment-2167064293 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or th

Re: [PR] Support dictionary data type in array_to_string [datafusion]

2024-06-13 Thread via GitHub
alamb commented on code in PR #10908: URL: https://github.com/apache/datafusion/pull/10908#discussion_r1639090072 ## datafusion/functions-array/src/string.rs: ## @@ -281,6 +281,21 @@ pub(super) fn array_to_string_inner(args: &[ArrayRef]) -> Result { Ok(arg)

Re: [PR] WIP: Extract parquet data page statistics [datafusion]

2024-06-13 Thread via GitHub
alamb commented on PR #10852: URL: https://github.com/apache/datafusion/pull/10852#issuecomment-2167026675 I plan to sort this PR and follow on tickets out tomorrow

Re: [I] Construction of user-defined table functions (UDTFs) should be async to allow for async schemas [datafusion]

2024-06-13 Thread via GitHub
alamb commented on issue #10889: URL: https://github.com/apache/datafusion/issues/10889#issuecomment-2167026249 I think this proposal sounds like a great idea to me, FWIW Thank you for the suggestion @matthewgapp

Re: [I] Real-time streaming support [datafusion]

2024-06-13 Thread via GitHub
alamb commented on issue #10895: URL: https://github.com/apache/datafusion/issues/10895#issuecomment-2167025143 Hi @maronavenue -- I think several people use DataFusion for this - I think https://docs.rs/datafusion/latest/datafusion/datasource/streaming/struct.StreamingTable.html g

[PR] Docs: clarify when the reader will read from object store when using cached metadata [datafusion]

2024-06-13 Thread via GitHub
alamb opened a new pull request, #10909: URL: https://github.com/apache/datafusion/pull/10909 ## Which issue does this PR close? Part of ## Rationale for this change While working on https://github.com/apache/datafusion/pull/10701 it was quite unclear to me why the

[PR] Support dictionary data type in array_to_string [datafusion]

2024-06-13 Thread via GitHub
EduardoVega opened a new pull request, #10908: URL: https://github.com/apache/datafusion/pull/10908 ## Which issue does this PR close? Closes #10862 ## Rationale for this change Go to issue. ## What changes are included in this PR? D

Re: [PR] Add `advanced_parquet_index.rs` example of index in into parquet files [datafusion]

2024-06-13 Thread via GitHub
alamb commented on code in PR #10701: URL: https://github.com/apache/datafusion/pull/10701#discussion_r1639066998 ## datafusion-examples/examples/advanced_parquet_index.rs: ## @@ -0,0 +1,602 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributo

Re: [I] Comet cannot read decimals with physical type BINARY [datafusion-comet]

2024-06-13 Thread via GitHub
comphead commented on issue #567: URL: https://github.com/apache/datafusion-comet/issues/567#issuecomment-2166986064 Thanks @parthchandra the issue is likely in `org.apache.comet.parquet.TypeUtil.checkParquetType` when deriving the decimal type

Re: [I] Improve performance for grouping by variable length columns (strings) [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 commented on issue #9403: URL: https://github.com/apache/datafusion/issues/9403#issuecomment-2166974881 I will take a look on this first 👀

[I] Convert `BitAnd`, `BitOr`, `BitXor` to UDAF [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 opened a new issue, #10907: URL: https://github.com/apache/datafusion/issues/10907 ### Is your feature request related to a problem or challenge? Similar to other issues in #8708 ### Describe the solution you'd like _No response_ ### Describe alternativ

[I] Convert `Grouping` to UDAF [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 opened a new issue, #10906: URL: https://github.com/apache/datafusion/issues/10906 ### Is your feature request related to a problem or challenge? Similar to other issues in #8708 ### Describe the solution you'd like _No response_ ### Describe alternativ

Re: [I] Convert `Regr` to UDAF [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 closed issue #10883: Convert `Regr` to UDAF URL: https://github.com/apache/datafusion/issues/10883

Re: [PR] Move Regr_* functions to use UDAF [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 merged PR #10898: URL: https://github.com/apache/datafusion/pull/10898

Re: [PR] Move Regr_* functions to use UDAF [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 commented on PR #10898: URL: https://github.com/apache/datafusion/pull/10898#issuecomment-2166953853 Thanks @eejbyfeldt and @alamb

Re: [PR] Remove builtin count [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 merged PR #10893: URL: https://github.com/apache/datafusion/pull/10893

Re: [PR] Remove builtin count [datafusion]

2024-06-13 Thread via GitHub
jayzhan211 commented on PR #10893: URL: https://github.com/apache/datafusion/pull/10893#issuecomment-2166945309 Thanks @alamb

Re: [I] Comet cannot read decimals with physical type BINARY [datafusion-comet]

2024-06-13 Thread via GitHub
parthchandra commented on issue #567: URL: https://github.com/apache/datafusion-comet/issues/567#issuecomment-2166877174 I'll look into this @comphead.

Re: [I] June 2024 ASF Board Report [datafusion]

2024-06-13 Thread via GitHub
alamb commented on issue #10155: URL: https://github.com/apache/datafusion/issues/10155#issuecomment-2166874584 Submitted following report: ``` ## Description: The mission of Apache DataFusion is the creation and maintenance of software related to an extensible query engine

Re: [I] June 2024 ASF Board Report [datafusion]

2024-06-13 Thread via GitHub
alamb closed issue #10155: June 2024 ASF Board Report URL: https://github.com/apache/datafusion/issues/10155

Re: [PR] chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1638984957 ## core/src/execution/sort.rs: ## @@ -159,12 +159,16 @@ where pos += 1; } } else { -unsafe {

[I] Comet cannot read decimals with physical type BINARY [datafusion-comet]

2024-06-13 Thread via GitHub
comphead opened a new issue, #567: URL: https://github.com/apache/datafusion-comet/issues/567 ### Describe the bug The user raised the issue when Comet crashes on ``` Column: [price], Expected: decimal(15,2), Found: BINARY. ``` when reading the parquet file The par
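For context on the error above: the Parquet format allows a DECIMAL value to be stored in a BINARY (byte array) column as the big-endian two's-complement bytes of its unscaled integer. The sketch below shows how such bytes could be widened to an `i128`. It is only an illustration under that assumption; `decimal_from_be_bytes` is a hypothetical helper, not code from Comet or from the issue.

```rust
/// Hypothetical helper (not Comet's reader): interpret `bytes` as the
/// big-endian two's-complement unscaled value of a Parquet DECIMAL.
/// Returns `None` for empty input or values wider than 128 bits.
fn decimal_from_be_bytes(bytes: &[u8]) -> Option<i128> {
    if bytes.is_empty() || bytes.len() > 16 {
        return None;
    }
    // Sign-extend: pad with 0xFF when the top bit is set, 0x00 otherwise.
    let fill = if bytes[0] & 0x80 != 0 { 0xFF } else { 0x00 };
    let mut buf = [fill; 16];
    buf[16 - bytes.len()..].copy_from_slice(bytes);
    Some(i128::from_be_bytes(buf))
}
```

With `decimal(15,2)`, a price of 123.45 has the unscaled value 12345, so the two bytes `0x30 0x39` would decode to 12345 and the scale is applied afterwards.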

Re: [PR] chore: Upgrade to Rust 1.78 and fix UB issues in unsafe code [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on code in PR #546: URL: https://github.com/apache/datafusion-comet/pull/546#discussion_r1638984525 ## core/src/execution/datafusion/spark_hash.rs: ## @@ -85,11 +85,16 @@ pub(crate) fn spark_compatible_murmur3_hash>(data: T, seed: u32) // safety: //

Re: [PR] fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove merged PR #558: URL: https://github.com/apache/datafusion-comet/pull/558

Re: [PR] fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on PR #558: URL: https://github.com/apache/datafusion-comet/pull/558#issuecomment-2166839176 Thanks for the review @viirya

Re: [PR] feat: add nullOnDivideByZero for Covariance [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on code in PR #564: URL: https://github.com/apache/datafusion-comet/pull/564#discussion_r1638975379 ## spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala: ## @@ -1089,36 +1089,148 @@ class CometAggregateSuite extends CometTestBase with Adap

Re: [I] Incorrect LEFT JOIN evaluation result on OR conditions [datafusion]

2024-06-13 Thread via GitHub
viirya commented on issue #10881: URL: https://github.com/apache/datafusion/issues/10881#issuecomment-2166820182 > > SELECT e.emp_id, e.name, d.department FROM employees e LEFT JOIN department d ON (e.name = 'Alice' OR e.name = 'Bob'); Hmm, as the join filter is only on `emplo
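The quoted query has an ON condition that references only the left table. Under standard SQL LEFT JOIN semantics, every left row appears in the output at least once: rows that satisfy the condition pair with every right row, and the rest are padded with NULL. The sketch below models that behaviour with plain vectors. The `Row` type, the `left_join` function and the sample data in the note are assumptions for illustration; this is not DataFusion code and does not reproduce the issue's data.

```rust
#[derive(Debug, Clone, PartialEq)]
struct Row {
    emp_id: u32,
    name: String,
    department: Option<String>, // None models SQL NULL
}

/// Hypothetical model of
/// `employees e LEFT JOIN department d ON (e.name = 'Alice' OR e.name = 'Bob')`.
fn left_join(employees: &[(u32, &str)], departments: &[&str]) -> Vec<Row> {
    let mut out = Vec::new();
    for &(emp_id, name) in employees {
        // The ON condition only looks at the left row.
        let on = name == "Alice" || name == "Bob";
        let mut matched = false;
        if on {
            for d in departments {
                out.push(Row { emp_id, name: name.to_string(), department: Some(d.to_string()) });
                matched = true;
            }
        }
        // A left row with no match is still emitted, padded with NULL.
        if !matched {
            out.push(Row { emp_id, name: name.to_string(), department: None });
        }
    }
    out
}
```

For three employees (Alice, Bob, Carol) and two departments, this model yields five rows: two each for Alice and Bob, and one NULL-padded row for Carol.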

Re: [PR] fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on PR #558: URL: https://github.com/apache/datafusion-comet/pull/558#issuecomment-2166819246 > The newer Rust version requires alignments on `copy_nonoverlapping` call? Just to be clear, this was always the requirement, but Rust 1.78 added debug assertions to catch
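To illustrate the alignment point: `std::ptr::copy_nonoverlapping::<T>` requires both pointers to be aligned for `T`, which a pointer into a raw byte buffer does not guarantee. A safe alternative reads each value with `from_le_bytes`, which has no alignment precondition. The sketch below assumes little-endian 4-byte source values; its body is illustrative and is not the code merged in PR #558.

```rust
/// Illustrative safe decode (assumption, not Comet's implementation):
/// read little-endian i32 values from `src` and narrow each to u16.
/// Decodes as many values as both slices can hold.
fn decode_i32_to_u16(src: &[u8], dst: &mut [u16]) {
    // `chunks_exact(4)` yields 4-byte slices; `from_le_bytes` copies the
    // bytes by value, so `src` may start at any address.
    for (chunk, out) in src.chunks_exact(4).zip(dst.iter_mut()) {
        let v = i32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        *out = v as u16; // truncating cast keeps the low 16 bits
    }
}
```

Because no raw pointers are involved, the function behaves the same when `src` starts at an odd offset inside a larger buffer.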

Re: [PR] fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on code in PR #558: URL: https://github.com/apache/datafusion-comet/pull/558#discussion_r163896 ## core/src/parquet/read/values.rs: ## @@ -438,41 +419,92 @@ impl PlainDictDecoding for BoolType { } } -// Shared implementation for int variants such

Re: [PR] Move Regr_* functions to use UDAF [datafusion]

2024-06-13 Thread via GitHub
alamb commented on PR #10898: URL: https://github.com/apache/datafusion/pull/10898#issuecomment-2166623105 🎉

Re: [PR] fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` [datafusion-comet]

2024-06-13 Thread via GitHub
viirya commented on code in PR #558: URL: https://github.com/apache/datafusion-comet/pull/558#discussion_r1638962935 ## core/src/parquet/read/values.rs: ## @@ -438,41 +419,92 @@ impl PlainDictDecoding for BoolType { } } -// Shared implementation for int variants such as

Re: [PR] chore: Fix most of the scala/java build warnings [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on code in PR #562: URL: https://github.com/apache/datafusion-comet/pull/562#discussion_r1638735671 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -158,28 +158,30 @@ class CometSparkSessionExtensions // data sourc

Re: [PR] fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` [datafusion-comet]

2024-06-13 Thread via GitHub
viirya commented on PR #558: URL: https://github.com/apache/datafusion-comet/pull/558#issuecomment-2166812415 > Parquet decoding when converting between different integral types was using `copy_nonoverlapping` without meeting the precondition that both pointers were properly aligned.

Re: [I] SMJ producing different results than HashJoin when doing a semi join [datafusion]

2024-06-13 Thread via GitHub
edmondop commented on issue #10886: URL: https://github.com/apache/datafusion/issues/10886#issuecomment-2166810812 This is interesting! At least breaking the build was worth it

Re: [PR] fix: Re-implement some Parquet decode methods without `copy_nonoverlapping` [datafusion-comet]

2024-06-13 Thread via GitHub
viirya commented on PR #558: URL: https://github.com/apache/datafusion-comet/pull/558#issuecomment-2166808756 > I am questioning the earlier results now. Latest benchmark comparing safe version of `decode_i32_to_u16` vs unsafe version of `decode_i32_to_i16`. Now seeing 67 ns not 68 ps. Curi

[I] Simpify `PyExpr::python_value` by using `ScalarValue::into_py` [datafusion-python]

2024-06-13 Thread via GitHub
Michael-J-Ward opened a new issue, #729: URL: https://github.com/apache/datafusion-python/issues/729 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Currently, we de-structure every `ScalarValue` variant https://github.com/

Re: [I] range end index 294912 out of range for slice of length 147456 [datafusion-comet]

2024-06-13 Thread via GitHub
viirya commented on issue #540: URL: https://github.com/apache/datafusion-comet/issues/540#issuecomment-2166552950 I've just figured out where the root cause is. I will go to propose a fix to arrow-rs.

Re: [PR] Minor: use venv in benchmark compare [datafusion]

2024-06-13 Thread via GitHub
alamb commented on PR #10894: URL: https://github.com/apache/datafusion/pull/10894#issuecomment-2166625663 I merged up to get a clean CI run

Re: [PR] Add `advanced_parquet_index.rs` example of index in into parquet files [datafusion]

2024-06-13 Thread via GitHub
adriangb commented on code in PR #10701: URL: https://github.com/apache/datafusion/pull/10701#discussion_r1638838000 ## datafusion-examples/examples/advanced_parquet_index.rs: ## @@ -0,0 +1,602 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contrib

Re: [PR] perf: Add criterion benchmark for xxhash64 function [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove merged PR #560: URL: https://github.com/apache/datafusion-comet/pull/560

Re: [I] SMJ producing different results than HashJoin when doing a semi join [datafusion]

2024-06-13 Thread via GitHub
comphead commented on issue #10886: URL: https://github.com/apache/datafusion/issues/10886#issuecomment-2166456220 Found that SMJ left semi sometimes have a duplicated row in the output, I'm trying to make a test case
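As a reference point for the duplicated-row symptom: a left semi join should emit each left row at most once, however many right rows share its key. The sketch below is a minimal hash-based model of that rule over integer keys. `left_semi_join` is a hypothetical function for illustration and is unrelated to DataFusion's `SortMergeJoin` implementation.

```rust
use std::collections::HashSet;

/// Hypothetical reference model: keep a left key if any right key equals it.
/// Each left row is emitted at most once, regardless of right-side duplicates.
fn left_semi_join(left: &[i32], right: &[i32]) -> Vec<i32> {
    let keys: HashSet<i32> = right.iter().copied().collect();
    left.iter().copied().filter(|k| keys.contains(k)).collect()
}
```

A model like this can serve as an oracle in a randomized test: any join output with more rows than the model produces points at duplication.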

Re: [I] Dropping Spark 3.2 support [datafusion-comet]

2024-06-13 Thread via GitHub
kazuyukitanimura commented on issue #565: URL: https://github.com/apache/datafusion-comet/issues/565#issuecomment-2166650699 +1 It might be a good idea to post the link of this issue on ASF slack `#datafusion-comet` channel to notify more folks

Re: [I] Dropping Spark 3.2 support [datafusion-comet]

2024-06-13 Thread via GitHub
huaxingao commented on issue #565: URL: https://github.com/apache/datafusion-comet/issues/565#issuecomment-2166202256 also cc @sunchao @kazuyukitanimura @parthchandra

Re: [PR] Move Regr_* functions to use UDAF [datafusion]

2024-06-13 Thread via GitHub
alamb commented on PR #10898: URL: https://github.com/apache/datafusion/pull/10898#issuecomment-2166139620 The macos failure https://github.com/apache/datafusion/actions/runs/9502875641/job/26191967199?pr=10898 appears to be related to https://github.com/apache/datafusion/pull/10904 (a fla

Re: [PR] CSE shorthand alias [datafusion]

2024-06-13 Thread via GitHub
peter-toth commented on code in PR #10868: URL: https://github.com/apache/datafusion/pull/10868#discussion_r1638701368 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -166,6 +166,15 @@ impl CommonSubexprEliminate { ) -> Result<(Vec>, LogicalPlan)> {

[PR] Minor: Fix `bench.sh tpch data` [datafusion]

2024-06-13 Thread via GitHub
alamb opened a new pull request, #10905: URL: https://github.com/apache/datafusion/pull/10905 ## Which issue does this PR close? Closes #. ## Rationale for this change While reviewing https://github.com/apache/datafusion/pull/10894 I noticed the tpch data generat

Re: [I] Dropping Spark 3.2 support [datafusion-comet]

2024-06-13 Thread via GitHub
parthchandra commented on issue #565: URL: https://github.com/apache/datafusion-comet/issues/565#issuecomment-2166401215 I'm +1 on this too. Maintaining more than two versions definitely slows down development.

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
viirya commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638671764 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1422,6 +1449,38 @@ fn get_filter_column( filter_columns } +fn produce_buffered_null_batch(

Re: [PR] Remove builtin count [datafusion]

2024-06-13 Thread via GitHub
alamb commented on PR #10893: URL: https://github.com/apache/datafusion/pull/10893#issuecomment-2166555899 I took the liberty of merging up from main for this PR to get a clean CI run

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
viirya commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638677200 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1098,49 +1107,52 @@ impl SMJStream { // 2. freezes NULLs joined to dequeued buffered batch t

Re: [I] SMJ producing different results than HashJoin when doing a semi join [datafusion]

2024-06-13 Thread via GitHub
comphead commented on issue #10886: URL: https://github.com/apache/datafusion/issues/10886#issuecomment-2166074359 Filed https://github.com/apache/datafusion/pull/10904 Will run the test in the loop to find out where the issue is

Re: [PR] perf: Add criterion benchmark for xxhash64 function [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on code in PR #560: URL: https://github.com/apache/datafusion-comet/pull/560#discussion_r1638753419 ## core/benches/hash.rs: ## @@ -95,6 +96,16 @@ fn criterion_benchmark(c: &mut Criterion) { }); }, ); +group.bench_function(Bench

Re: [PR] docs: Proposal for source release process [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on code in PR #556: URL: https://github.com/apache/datafusion-comet/pull/556#discussion_r1638701567 ## dev/release/README.md: ## @@ -17,29 +17,34 @@ specific language governing permissions and limitations under the License. --> -# Comet Release Process +#

Re: [PR] Minor: Fix `bench.sh tpch data` [datafusion]

2024-06-13 Thread via GitHub
alamb commented on code in PR #10905: URL: https://github.com/apache/datafusion/pull/10905#discussion_r1638752596 ## benchmarks/bench.sh: ## @@ -302,7 +302,7 @@ data_tpch() { else echo " creating parquet files using benchmark binary ..." pushd "${SCRIPT_DI

Re: [PR] chore: Fix most of the scala/java build warnings [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove merged PR #562: URL: https://github.com/apache/datafusion-comet/pull/562

Re: [PR] Move Regr_* functions to use UDAF [datafusion]

2024-06-13 Thread via GitHub
alamb commented on PR #10898: URL: https://github.com/apache/datafusion/pull/10898#issuecomment-2166554722 Merged up to get the fix for #10904

Re: [PR] CSE shorthand alias [datafusion]

2024-06-13 Thread via GitHub
MohamedAbdeen21 commented on PR #10868: URL: https://github.com/apache/datafusion/pull/10868#issuecomment-2166538137 The failing CI is a simple `clippy` warning, I'd appreciate if it can be fixed before merging. The last thread between me and @peter-toth mentions some possible improv

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
comphead commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638430287 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1313,51 +1354,37 @@ impl SMJStream { streamed_columns

Re: [PR] CSE shorthand alias [datafusion]

2024-06-13 Thread via GitHub
peter-toth commented on code in PR #10868: URL: https://github.com/apache/datafusion/pull/10868#discussion_r1638113803 ## datafusion/optimizer/src/common_subexpr_eliminate.rs: ## @@ -166,6 +166,15 @@ impl CommonSubexprEliminate { ) -> Result<(Vec>, LogicalPlan)> {

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
comphead commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638458531 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1098,49 +1107,52 @@ impl SMJStream { // 2. freezes NULLs joined to dequeued buffered batch

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
comphead commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638432985 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1422,6 +1449,38 @@ fn get_filter_column( filter_columns } +fn produce_buffered_null_batch

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
comphead commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638423685 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1055,7 +1064,7 @@ impl SMJStream { Some(scanning_idx),

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
comphead commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638406261 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1445,9 +1504,13 @@ fn get_buffered_columns( /// `streamed_indices` have the same length as `mask

Re: [I] Dropping Spark 3.2 support [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on issue #565: URL: https://github.com/apache/datafusion-comet/issues/565#issuecomment-2166005516 +1

Re: [PR] Minor: dont panic with bad arguments to round [datafusion]

2024-06-13 Thread via GitHub
tmi commented on code in PR #10899: URL: https://github.com/apache/datafusion/pull/10899#discussion_r1638502083 ## datafusion/functions/src/math/round.rs: ## @@ -128,29 +134,41 @@ pub fn round(args: &[ArrayRef]) -> Result { } )) as ArrayRef)

Re: [I] ci: clippy failed on main [datafusion]

2024-06-13 Thread via GitHub
alamb closed issue #10902: ci: clippy failed on main URL: https://github.com/apache/datafusion/issues/10902

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
comphead commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638422703 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1098,49 +1107,52 @@ impl SMJStream { // 2. freezes NULLs joined to dequeued buffered batch

Re: [PR] Minor: disable flaky fuzz test [datafusion]

2024-06-13 Thread via GitHub
comphead merged PR #10904: URL: https://github.com/apache/datafusion/pull/10904

Re: [PR] [WIP] Add support for BNLJ [datafusion-comet]

2024-06-13 Thread via GitHub
viirya commented on PR #343: URL: https://github.com/apache/datafusion-comet/pull/343#issuecomment-2165935930 No worries. Just want to make sure if you still work on it. If not, we could have someone pick it up.

Re: [PR] Minor: dont panic with bad arguments to round [datafusion]

2024-06-13 Thread via GitHub
tmi commented on code in PR #10899: URL: https://github.com/apache/datafusion/pull/10899#discussion_r1638499463 ## datafusion/functions/src/math/round.rs: ## @@ -128,29 +134,41 @@ pub fn round(args: &[ArrayRef]) -> Result { } )) as ArrayRef)

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
viirya commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638444191 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1098,49 +1107,52 @@ impl SMJStream { // 2. freezes NULLs joined to dequeued buffered batch t

Re: [I] chore: Investigate impact of small batches on performance [datafusion-comet]

2024-06-13 Thread via GitHub
andygrove commented on issue #495: URL: https://github.com/apache/datafusion-comet/issues/495#issuecomment-2166104979 When running TPC-H q16 with DataFusion, there is a significant difference in performance between runs with coalesce batches enabled vs disabled. With `datafusion.exec

[I] ci: clippy failed on main [datafusion]

2024-06-13 Thread via GitHub
jonahgao opened a new issue, #10902: URL: https://github.com/apache/datafusion/issues/10902 ### Describe the bug The release of a new version of Rust likely caused it. https://blog.rust-lang.org/2024/06/13/Rust-1.79.0.html ### To Reproduce Failed job: https://

Re: [I] SMJ producing different results than HashJoin when doing a semi join [datafusion]

2024-06-13 Thread via GitHub
comphead commented on issue #10886: URL: https://github.com/apache/datafusion/issues/10886#issuecomment-2166058101 Thanks @eejbyfeldt and @jonahgao I'll comment the test now and run more tests on top of it to see where the spontaneous bug is. -- This is an automated message from the Apa

Re: [PR] fix: Fix the incorrect null joined rows for SMJ outer join with join filter [datafusion]

2024-06-13 Thread via GitHub
comphead commented on code in PR #10892: URL: https://github.com/apache/datafusion/pull/10892#discussion_r1638437120 ## datafusion/physical-plan/src/joins/sort_merge_join.rs: ## @@ -1098,49 +1107,52 @@ impl SMJStream { // 2. freezes NULLs joined to dequeued buffered batch

Re: [I] SMJ producing different results than HashJoin when doing a semi join [datafusion]

2024-06-13 Thread via GitHub
eejbyfeldt commented on issue #10886: URL: https://github.com/apache/datafusion/issues/10886#issuecomment-2166026294 I think this should be reopened or we should open a separate issue. The test flaked here: https://github.com/apache/datafusion/actions/runs/9501960117/job/26188803965?pr=1089

Re: [I] SMJ producing different results than HashJoin when doing a semi join [datafusion]

2024-06-13 Thread via GitHub
jonahgao commented on issue #10886: URL: https://github.com/apache/datafusion/issues/10886#issuecomment-2166055612 > I think this should be reopened or we should open a separate issue. The test flaked here: https://github.com/apache/datafusion/actions/runs/9501960117/job/26188803965?pr=1089

Re: [PR] ci: fix clippy failures on main [datafusion]

2024-06-13 Thread via GitHub
alamb merged PR #10903: URL: https://github.com/apache/datafusion/pull/10903 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusi

[PR] ci: fix clippy failures on main [datafusion]

2024-06-13 Thread via GitHub
jonahgao opened a new pull request, #10903: URL: https://github.com/apache/datafusion/pull/10903 ## Which issue does this PR close? Closes #10902. ## Rationale for this change ## What changes are included in this PR? ## Are these changes tes

Re: [PR] fix: Return error in pre_timestamp_cast instead of panic [datafusion-comet]

2024-06-13 Thread via GitHub
eejbyfeldt commented on code in PR #543: URL: https://github.com/apache/datafusion-comet/pull/543#discussion_r1638389361 ## core/src/execution/datafusion/expressions/utils.rs: ## @@ -81,12 +81,12 @@ pub fn array_with_timezone( array: ArrayRef, timezone: String, to

Re: [PR] chore: Fix most of the scala/java build warnings [datafusion-comet]

2024-06-13 Thread via GitHub
viirya commented on code in PR #562: URL: https://github.com/apache/datafusion-comet/pull/562#discussion_r1638405445 ## spark/src/main/scala/org/apache/comet/CometSparkSessionExtensions.scala: ## @@ -158,28 +158,30 @@ class CometSparkSessionExtensions // data source V

Re: [PR] Add contains function, and support in datafusion substrait consumer [datafusion]

2024-06-13 Thread via GitHub
alamb commented on code in PR #10879: URL: https://github.com/apache/datafusion/pull/10879#discussion_r1638340942 ## datafusion/functions/src/string/contains.rs: ## @@ -0,0 +1,143 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license a

Re: [PR] feat: Add method to add analyzer rules to SessionContext [datafusion]

2024-06-13 Thread via GitHub
alamb commented on code in PR #10849: URL: https://github.com/apache/datafusion/pull/10849#discussion_r1638488162 ## datafusion/core/tests/user_defined/user_defined_plan.rs: ## @@ -619,3 +627,49 @@ impl RecordBatchStream for TopKReader { self.input.schema() } } +

Re: [I] Apply guarantee rewriter to sql workflow [datafusion]

2024-06-13 Thread via GitHub
alamb commented on issue #10456: URL: https://github.com/apache/datafusion/issues/10456#issuecomment-2165935318 > Improving grouping performance seems interesting! I think it would be awesome -- thank you. How would you like to proceed? I personally think either https://github.com/apa
