Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub


kosiew commented on PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3116309058

   Closing this in favour of 
https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-24 Thread via GitHub


kosiew closed pull request #16681: Enhance `ScalarUDFImpl` Equality Handling 
with Pointer-Based Default and Customizable Logic
URL: https://github.com/apache/datafusion/pull/16681


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-19 Thread via GitHub


findepi commented on PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3092345097

   Posted a solution proposal -- 
https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-18 Thread via GitHub


alamb commented on PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3089418586

   > remove default implementation of equality -- this is a breaking change
   
   As long as we sufficiently document how to fix it (with an example in the 
upgrade guide) I I think that makes sense to me
   
   Probably in DataFusion 50


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-18 Thread via GitHub


findepi commented on PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3088768792

   @alamb  https://github.com/apache/datafusion/pull/16781 aims to solve 
immediate obvious problem - incorrect implementation of equality for stateful 
functions managed in this repo. That PR does not solve #16677.
   
   This PR aims to solve the problem of equality being error-prone, wrong by 
default. I.e. addresses #16677.
   The #16781 PR shows how wide-spread the problem is, giving more motivation 
to fix the default behavior.
   
   My proposal is 
   
   1. land #16781 (or equivalent), i.e. fix the immediate problem
   2. remove default implementation of equality -- this is a breaking change


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-17 Thread via GitHub


alamb commented on PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3085634080

   I think this PR is an alternative to the proposed fix in 
   - https://github.com/apache/datafusion/pull/16781
   
   Is that right? Are you happy with the code in 
https://github.com/apache/datafusion/pull/16781 @kosiew ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-15 Thread via GitHub


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2206960159


##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +700,89 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
 
 /// Return true if this scalar UDF is equal to the other.
 ///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules 
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must 
adhere to the rules of equivalence:
+///
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+///
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures 
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are 
the same. This conservative approach
+/// ensures that different instances of the same function type are not 
mistakenly considered equal.
 ///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be 
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare 
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, 
ScalarUDFImpl, Signature};
+///
+/// #[derive(Debug, PartialEq)]
+/// struct MyUdf {
+///  param: i32,
+///  signature: Signature,
+/// }
 ///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) -> 
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self == other
+/// } else {
+/// false
+/// }
+/// }
+/// fn hash_value(&self) -> u64 {
+/// let mut hasher = DefaultHasher::new();
+/// self.param.hash(&mut hasher);
+/// self.name().hash(&mut hasher);
+/// hasher.finish()
+/// }
+/// }
+/// ```
+///
+/// # Notes
+/// - This method must be consistent with [`Self::hash_value`]. If 
`equals` returns true for two UDFs,
+///   their hash values must also be the same.
+/// - Ensure that the implementation does not panic or cause undefined 
behavior for any input.
 fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
-self.name() == other.name() && self.signature() == other.signature()
+// if the pointers are the same, the UDFs are the same
+if std::ptr::eq(self.as_any(), other.as_any()) {

Review Comment:
   I think we cannot safely use std::ptr::eq(self, other) here because self and 
other are trait objects (&dyn ScalarUDFImpl).
   std::ptr::eq(self, other) would compare the vtable pointers, not the 
underlying data pointers, and may return false even if the trait objects point 
to the same data.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-15 Thread via GitHub


findepi commented on PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3072748613

   "99%" of the cases are stateless[^1]. Would it be possible to provide 
default implementation that works great in the "99%" common case, but still 
works correctly in other cases?
   
   if that's not possible, and so the default behavior is inferior for the 
common case, it needs to be reimplemented in every function. Which means it's 
not really an optional method and so we should remove its default 
implementation.
   
   cc @alamb 
   
   [^1]: the only field in most UDFs is Signature. It's there for convenience. 
Could be as well implemented as a static 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-15 Thread via GitHub


findepi commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2206861910


##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +700,89 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
 
 /// Return true if this scalar UDF is equal to the other.
 ///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules 
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must 
adhere to the rules of equivalence:
+///
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+///
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures 
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are 
the same. This conservative approach
+/// ensures that different instances of the same function type are not 
mistakenly considered equal.
 ///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be 
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare 
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, 
ScalarUDFImpl, Signature};
+///
+/// #[derive(Debug, PartialEq)]
+/// struct MyUdf {
+///  param: i32,
+///  signature: Signature,
+/// }
 ///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) -> 
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self == other
+/// } else {
+/// false
+/// }
+/// }
+/// fn hash_value(&self) -> u64 {
+/// let mut hasher = DefaultHasher::new();
+/// self.param.hash(&mut hasher);
+/// self.name().hash(&mut hasher);
+/// hasher.finish()
+/// }
+/// }
+/// ```
+///
+/// # Notes
+/// - This method must be consistent with [`Self::hash_value`]. If 
`equals` returns true for two UDFs,
+///   their hash values must also be the same.
+/// - Ensure that the implementation does not panic or cause undefined 
behavior for any input.
 fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
-self.name() == other.name() && self.signature() == other.signature()
+// if the pointers are the same, the UDFs are the same
+if std::ptr::eq(self.as_any(), other.as_any()) {

Review Comment:
   Could this be `std::ptr::eq(self, other)` here?
   



##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
 
 /// Return true if this scalar UDF is equal to the other.
 ///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules 
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must 
adhere to the rules of equivalence:
 ///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
 ///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures 
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are 
the same. This conservative

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188857055


##
datafusion/expr/src/test/mod.rs:
##
@@ -16,3 +16,5 @@
 // under the License.
 
 pub mod function_stub;
+#[cfg(test)]
+pub mod udf_equals;

Review Comment:
   Good idea.
   Moved.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188833299


##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
 
 /// Return true if this scalar UDF is equal to the other.
 ///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules 
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must 
adhere to the rules of equivalence:
 ///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
 ///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures 
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are 
the same. This conservative approach
+/// ensures that different instances of the same function type are not 
mistakenly considered equal.
+///
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be 
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare 
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, 
ScalarUDFImpl, Signature};
+/// #[derive(Debug)]
+/// struct MyUdf {
+///  param: i32,
+///  signature: Signature,
+/// }
+///
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) -> 
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self.param == other.param && self.name() == other.name()

Review Comment:
   Amended to use
   derive PartialEq



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188826793


##
docs/source/library-user-guide/upgrading.md:
##
@@ -62,6 +62,36 @@ DataFusionError::SchemaError(
 
 [#16652]: https://github.com/apache/datafusion/issues/16652
 
+ `ScalarUDFImpl::equals` Default Implementation
+
+The default implementation of the `equals` method in the `ScalarUDFImpl` trait 
has been updated. Previously, it compared only the type IDs, names, and 
signatures of UDFs. Now, it assumes UDFs are not equal unless their pointers 
are the same.

Review Comment:
   That's a thoughtful consideration. You raise valid points about 
predictability vs. convenience. Let me outline the tradeoffs:
   
   Current approach (safe default with pointer comparison):
   
   ✅ Prevents silent bugs where ScalarUDFs with different internal state are 
incorrectly considered equal
   ✅ Forces developers to think about equality semantics for their specific UDFs
   ✅ Common expression elimination works correctly out of the box
   ❌ Requires manual action from users who want structural equality
   
   Alternative approach (require explicit PartialEq implementation):
   
   ✅ More explicit and predictable - no hidden behavior changes
   ✅ Leverages Rust's type system for compile-time guarantees
   ✅ Clear contract: if you want equality comparison, implement PartialEq
   ❌ Breaking change requiring immediate action from all ScalarUDF implementors
   ❌ More work for simple ScalarUDFs that don't need custom equality logic
   
   My recommendation: Stick with the current approach for DataFusion 49.0.0, 
but consider the PartialEq approach for a future major version. Here's why:
   
   The current change already enables critical functionality (common expression 
elimination) without breaking compilation
   The upgrade path is clear and documented with examples
   For DataFusion 50.0.0 or 51.0.0, we could introduce a more explicit API that 
requires PartialEq implementation
   This gives users time to adapt while ensuring the optimizer works correctly 
immediately. 
   
   What do you think about this phased approach?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188817382


##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
 
 /// Return true if this scalar UDF is equal to the other.
 ///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules 
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must 
adhere to the rules of equivalence:
 ///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
 ///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures 
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are 
the same. This conservative approach
+/// ensures that different instances of the same function type are not 
mistakenly considered equal.
+///
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be 
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare 
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs, 
ScalarUDFImpl, Signature};
+/// #[derive(Debug)]
+/// struct MyUdf {
+///  param: i32,
+///  signature: Signature,
+/// }
+///
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) -> 
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self.param == other.param && self.name() == other.name()
+/// } else {
+/// false
+/// }
+/// }
+/// fn hash_value(&self) -> u64 {
+/// let mut hasher = DefaultHasher::new();
+/// self.param.hash(&mut hasher);
+/// self.name().hash(&mut hasher);
+/// hasher.finish()
+/// }
+/// }
+/// ```
+///
+/// # Notes
+/// - This method must be consistent with [`Self::hash_value`]. If 
`equals` returns true for two UDFs,
+///   their hash values must also be the same.
+/// - Ensure that the implementation does not panic or cause undefined 
behavior for any input.
 fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
-self.name() == other.name() && self.signature() == other.signature()
+// if the pointers are the same, the UDFs are the same
+if std::ptr::eq(self.as_any(), other.as_any()) {
+return true;
+}
+
+// if the types, names and signatures are the same, we can't know if 
they are the same so we
+// assume they are not.
+// If a UDF has internal state that should be compared, it should 
implement this method
+false

Review Comment:
   amended comment to clarify why we are not doing
   bit-by-bit comparison
   
   ```
   // Alternative approach: we could potentially do bit-by-bit 
comparison if both objects
   // are the same concrete type, but this requires:
   // 1. Both objects to have identical TypeId
   // 2. Careful handling of potential padding bytes in structs
   // 3. The concrete type to be safely comparable via memcmp
   // For now, we use the conservative approach of returning false
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-

Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188804048


##
datafusion/expr/src/test/udf_equals.rs:
##
@@ -0,0 +1,187 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::{
+ColumnarValue, ScalarFunctionArgs, ScalarUDF, ScalarUDFImpl, Signature, 
Volatility,
+};
+use arrow::datatypes::DataType;
+use datafusion_common::{not_impl_err, Result};
+use std::{
+any::Any,
+hash::{Hash, Hasher},
+};
+#[derive(Debug)]
+#[allow(dead_code)]

Review Comment:
   Removed.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]

2025-07-06 Thread via GitHub


kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188802671


##
datafusion/expr/src/test/udf_equals.rs:
##
@@ -0,0 +1,187 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::{
+ColumnarValue, ScalarFunctionArgs, ScalarUDF, ScalarUDFImpl, Signature, 
Volatility,
+};
+use arrow::datatypes::DataType;
+use datafusion_common::{not_impl_err, Result};
+use std::{
+any::Any,
+hash::{Hash, Hasher},
+};
+#[derive(Debug)]
+#[allow(dead_code)]
+struct ParamUdf {
+param: i32,
+signature: Signature,
+}
+
+impl ParamUdf {
+fn new(param: i32) -> Self {
+Self {
+param,
+signature: Signature::exact(vec![DataType::Int32], 
Volatility::Immutable),
+}
+}
+}
+
+impl ScalarUDFImpl for ParamUdf {
+fn as_any(&self) -> &dyn Any {
+self
+}
+fn name(&self) -> &str {
+"param_udf"
+}
+fn signature(&self) -> &Signature {
+&self.signature
+}
+fn return_type(&self, _arg_types: &[DataType]) -> Result {
+Ok(DataType::Int32)
+}
+fn invoke_with_args(&self, _args: ScalarFunctionArgs) -> 
Result {
+not_impl_err!("not used")
+}
+fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+if let Some(other) = other.as_any().downcast_ref::() {
+self.param == other.param && self.type_id() == other.type_id()

Review Comment:
   Good point.
   Amended to implement
`derive(PartialEq) and then self == other`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]