Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-08-06 Thread via GitHub


mbutrovich closed issue #1820: Use sha2 implementation from datafusion-spark 
crate
URL: https://github.com/apache/datafusion-comet/issues/1820


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-08-01 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3146120825

   Opened PR https://github.com/apache/datafusion-comet/pull/2063 for using DF 
SHA2.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-07-11 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3063537883

   Should be able to open Comet's PR after 
https://github.com/apache/datafusion-comet/issues/1993 is closed.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-25 Thread via GitHub


andygrove commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3006624446

   > [@andygrove](https://github.com/andygrove) can I backport 
[SHA2-fix](https://github.com/apache/datafusion/pull/16350) to branch-48 of 
datafusion ? I tried updating with datafusion-main branch until my commit, 
however, looks like there are many changes in between and they are causing 
different test failures in Comet.
   
   We plan on releasing Comet 0.9.0 in the next few days, hopefully, and then 
we can have Comet main branch start using a pinned dependency on DataFusion and 
start making updated for the changes that have happened in DF since 48 was 
released.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-24 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-3002947426

   > > [@andygrove](https://github.com/andygrove) can I backport 
[SHA2-fix](https://github.com/apache/datafusion/pull/16350) to branch-48 of 
datafusion ? I tried updating with datafusion-main branch until my commit, 
however, looks like there are many changes in between and they are causing 
different test failures in Comet.
   > 
   > Looks like the test-cases are failing in Comet after updating the 
Datafusion dependency with main-branch due to this 
[commit](https://github.com/apache/datafusion/commit/0c3037404929fc3a3c4fbf6b9b7325d422ce10bd).
 This commit was revert in the branch-48 
[here](https://github.com/apache/datafusion/commit/c76c1f076cca6f1922de8ba86b98c05b6a27e7ac),
 however it wasn't in the main branch. And the failures that I encountered were 
also related to Window functions tests,
   > 
   > For eg, I ran this: `./mvnw test -Dtest=None 
-Dsuites="org.apache.comet.exec.CometExecSuite Windows support"` and it failed 
with subraction underflow,
   > 
   > ```
   >  Windows support *** FAILED *** (7 seconds, 210 milliseconds)
   >   org.apache.spark.SparkException: Job aborted due to stage failure: Task 
0 in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 
(TID 17) (pop-os executor driver): org.apache.comet.CometNativeException: 
attempt to subtract with overflow
   > at 
comet::errors::init::{{closure}}(/home/rishabjoshi/Projects/datafusion-comet/native/core/src/errors.rs:151)
   > at  as 
core::ops::function::Fn>::call(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/alloc/src/boxed.rs:2007)
   > at 
std::panicking::rust_panic_with_hook(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:836)
   > at 
std::panicking::begin_panic_handler::{{closure}}(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:694)
   > at 
std::sys::backtrace::__rust_end_short_backtrace(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/sys/backtrace.rs:168)
   > at 
rust_begin_unwind(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:692)
   > at 
core::panicking::panic_fmt(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:75)
   > at 
core::panicking::panic_const::panic_const_sub_overflow(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:178)
   > at 
datafusion_expr::window_state::WindowAggState::update(/home/rishabjoshi/Projects/datafusion/datafusion/expr/src/window_state.rs:95)
   > at 
datafusion_physical_expr::window::window_expr::AggregateWindowExpr::aggregate_evaluate_stateful(/home/rishabjoshi/Projects/datafusion/datafusion/physical-expr/src/window/window_expr.rs:260)
   > at 
::evaluate_stateful(/home/rishabjoshi/Projects/datafusion/datafusion/physical-expr/src/window/aggregate.rs:148)
   > at 
datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::compute_aggregates(/home/rishabjoshi/Projects/datafusion/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:983)
   > at 
datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::poll_next_inner(/home/rishabjoshi/Projects/datafusion/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:1033)
   > at 
::poll_next(/home/rishabjoshi/Projects/datafusion/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:949)
   > ```
   > 
   > After applying the revert I see the test case passing in main also.
   > 
   > [@andygrove](https://github.com/andygrove) would you prefer if I apply 
this revert in the main also? Looks like applying the revert in main is no 
longer clean because there were many conflicting patches in between now. 
However, if you prefer, rather than revert I might disable that pieces of code 
within some property/feature-flag that will be remain turned off. I can do this 
piece of work as part of this ticket itself.
   
   Ok, this issue is fixed here: 
https://github.com/apache/datafusion/pull/16430 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-20 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2990142584

   > [@andygrove](https://github.com/andygrove) can I backport 
[SHA2-fix](https://github.com/apache/datafusion/pull/16350) to branch-48 of 
datafusion ? I tried updating with datafusion-main branch until my commit, 
however, looks like there are many changes in between and they are causing 
different test failures in Comet.
   
   
   Looks like the test-cases are failing in Comet after updating the Datafusion 
dependency with main-branch  due to this 
[commit](https://github.com/apache/datafusion/commit/0c3037404929fc3a3c4fbf6b9b7325d422ce10bd).
 This commit was revert in the branch-48 
[here](https://github.com/apache/datafusion/commit/c76c1f076cca6f1922de8ba86b98c05b6a27e7ac),
 however it wasn't in the main branch. And the failures that I encountered were 
also related to Window functions tests,
   
   For eg, I ran this: `./mvnw test -Dtest=None 
-Dsuites="org.apache.comet.exec.CometExecSuite Windows support"`
   and it failed with subraction underflow,
   ```
Windows support *** FAILED *** (7 seconds, 210 milliseconds)
 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 
in stage 7.0 failed 1 times, most recent failure: Lost task 0.0 in stage 7.0 
(TID 17) (pop-os executor driver): org.apache.comet.CometNativeException: 
attempt to subtract with overflow
   at 
comet::errors::init::{{closure}}(/home/rishabjoshi/Projects/datafusion-comet/native/core/src/errors.rs:151)
   at  as 
core::ops::function::Fn>::call(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/alloc/src/boxed.rs:2007)
   at 
std::panicking::rust_panic_with_hook(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:836)
   at 
std::panicking::begin_panic_handler::{{closure}}(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:694)
   at 
std::sys::backtrace::__rust_end_short_backtrace(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/sys/backtrace.rs:168)
   at 
rust_begin_unwind(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/std/src/panicking.rs:692)
   at 
core::panicking::panic_fmt(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:75)
   at 
core::panicking::panic_const::panic_const_sub_overflow(/rustc/4d91de4e48198da2e33413efdcd9cd2cc0c46688/library/core/src/panicking.rs:178)
   at 
datafusion_expr::window_state::WindowAggState::update(/home/rishabjoshi/Projects/datafusion/datafusion/expr/src/window_state.rs:95)
   at 
datafusion_physical_expr::window::window_expr::AggregateWindowExpr::aggregate_evaluate_stateful(/home/rishabjoshi/Projects/datafusion/datafusion/physical-expr/src/window/window_expr.rs:260)
   at 
::evaluate_stateful(/home/rishabjoshi/Projects/datafusion/datafusion/physical-expr/src/window/aggregate.rs:148)
   at 
datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::compute_aggregates(/home/rishabjoshi/Projects/datafusion/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:983)
   at 
datafusion_physical_plan::windows::bounded_window_agg_exec::BoundedWindowAggStream::poll_next_inner(/home/rishabjoshi/Projects/datafusion/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:1033)
   at 
::poll_next(/home/rishabjoshi/Projects/datafusion/datafusion/physical-plan/src/windows/bounded_window_agg_exec.rs:949)
   ```
   After applying the revert I see the test case passing in main also.
   
   @andygrove would you prefer if I apply this revert in the main also? Looks 
like applying the revert in main is no longer clean because there were many 
conflicting patches in between now. However, if you prefer, rather than revert 
I might disable that pieces of code within some property/feature-flag that will 
be remain turned off. 
   I can do this piece of work as part of this ticket itself.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-19 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2988838360

   @andygrove can I backport 
[SHA2-fix](https://github.com/apache/datafusion/pull/16350) to branch-48 of 
datafusion ? I tried updating with datafusion-main branch until my commit, 
however, looks like there are many changes in between and they are causing 
different test failures in Comet.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-17 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2982012440

   I should be able to resume working on this later this week. I might have to 
file another ticket to first upgrade the datafusion dependency before making 
Comet changes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-08 Thread via GitHub


andygrove commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2954102662

   Thanks for working on this @rishvin.
   
   You are correct that Spark doesn't support UInt32 because it is JVM-based. 
JVM only has signed ints. The upper-case output seems to be a bug.
   
   I suggest filing a bug report in DataFusion, linking to this issue. Perhaps 
you would be interested in creating a PR in DataFusion to resolve these issues?
   
   cc @shehabgamin
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-07 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2953616585

   Hi @andygrove, I tried the following approach and looks like there is some 
discrepancy in the Datafusion's `SparkSha2` output with Spark.
   
   **This is what I attempted**
   
   Removed the existing ssh(224|256|384|512) and other related codes from 
[comet_scalar_funcs.rs](https://github.com/rishvin/datafusion-comet/blob/1b1d6185ede9175887de1e9ec7f48422c2a64a10/native/spark-expr/src/comet_scalar_funcs.rs#L120)
   
   Registered the `SparkSha2` UDF 
[here](https://github.com/rishvin/datafusion-comet/blob/1b1d6185ede9175887de1e9ec7f48422c2a64a10/native/core/src/execution/planner.rs#L160).
   
   Modified the serialization logic at Spark to roughly like so,
   ```
   case Sha2(left, numBits) =>
   if (!numBits.foldable) {
 withInfo(expr, "non literal numBits is not supported")
 return None
   }
   val childExpr = exprToProtoInternal(left, inputs, binding)
   val bitsExpr = exprToProtoInternal(numBits, inputs, binding)
   scalarFunctionExprToProtoWithReturnType("sha2", StringType, childExpr, 
bitsExpr)
   ```
   
   
   With these changes we hit the following exception - `Unsupported argument 
types for sha2 function`. 
   This is because the `SparkSha2` pattern matching supports `numBits` type to 
be UInt32 but we are sending type Int32 type from spark (Ref: 
[here](https://github.com/apache/datafusion/blob/1daa5ed5cc51546904d45e23cc148601d973942a/datafusion/spark/src/function/hash/sha2.rs#L141)).
 I checked the supported types for Spark which we define 
[here](https://github.com/rishvin/datafusion-comet/blob/1b1d6185ede9175887de1e9ec7f48422c2a64a10/spark/src/main/scala/org/apache/comet/serde/QueryPlanSerde.scala#L90),
 but looks like Spark doesn't support UInt32. 
   
   To test things further, I duplicated the `SparkSha2` into Comet and added 
Int32 into the pattern matching. This time the output response was successfully 
returned but the results did not match with the expected spark result. The hash 
returned by the `SparkSha2` is in uppercase, however, Spark returns SHA2 in 
lowercase. Here is an example of the two,
   Comet returned - 
`2C83E9E8A39D60F7FCD3CFEC29C154260AA069F91CD40C972756F9354C64594E` 
   Spark returned - 
`2c83e9e8a39d60f7fcd3cfec29c154260aa069f91cd40c972756f9354c64594e`.
   
   This happened because the 
[hex_encode](https://github.com/apache/datafusion/blob/1daa5ed5cc51546904d45e23cc148601d973942a/datafusion/spark/src/function/math/hex.rs#L163)
 method used by the `SparkSha2` sets the lowercase=false (second parameter). 
However in the existing Comet implementation lowercase is set to true, 
[here](https://github.com/rishvin/datafusion-comet/blob/1b1d6185ede9175887de1e9ec7f48422c2a64a10/native/spark-expr/src/math_funcs/hex.rs#L56).
 Hence, we are getting the matching results from comet.
   
   To make it work we might have to do following changes to the Datafusion's 
`SparkSha2`,
   - Add support for pattern matching Int32 type.
   - Add support to return hashes in lowercase.
   
   Alternatively, if are not suppose to make these change in Datafusion, we 
might have to create some wrappers over `SparkSha2` in Comet.
   
   Let me know your thoughts on this or I'm missing anything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-05 Thread via GitHub


andygrove commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2945503093

   Thanks @rishvin. I assigned the issue to you. Let me know if you have any 
questions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [I] Use sha2 implementation from datafusion-spark crate [datafusion-comet]

2025-06-05 Thread via GitHub


rishvin commented on issue #1820:
URL: 
https://github.com/apache/datafusion-comet/issues/1820#issuecomment-2945492404

   @andygrove I can work on this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]