Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3116309058 Closing this in favour of https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew closed pull request #16681: Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic URL: https://github.com/apache/datafusion/pull/16681 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
findepi commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3092345097 Posted a solution proposal -- https://github.com/apache/datafusion/issues/16677#issuecomment-3092338265 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
alamb commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3089418586 > remove default implementation of equality -- this is a breaking change As long as we sufficiently document how to fix it (with an example in the upgrade guide) I I think that makes sense to me Probably in DataFusion 50 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
findepi commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3088768792 @alamb https://github.com/apache/datafusion/pull/16781 aims to solve immediate obvious problem - incorrect implementation of equality for stateful functions managed in this repo. That PR does not solve #16677. This PR aims to solve the problem of equality being error-prone, wrong by default. I.e. addresses #16677. The #16781 PR shows how wide-spread the problem is, giving more motivation to fix the default behavior. My proposal is 1. land #16781 (or equivalent), i.e. fix the immediate problem 2. remove default implementation of equality -- this is a breaking change -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
alamb commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3085634080 I think this PR is an alternative to the proposed fix in - https://github.com/apache/datafusion/pull/16781 Is that right? Are you happy with the code in https://github.com/apache/datafusion/pull/16781 @kosiew ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2206960159
##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +700,89 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
/// Return true if this scalar UDF is equal to the other.
///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must
adhere to the rules of equivalence:
+///
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+///
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are
the same. This conservative approach
+/// ensures that different instances of the same function type are not
mistakenly considered equal.
///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs,
ScalarUDFImpl, Signature};
+///
+/// #[derive(Debug, PartialEq)]
+/// struct MyUdf {
+/// param: i32,
+/// signature: Signature,
+/// }
///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) ->
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self == other
+/// } else {
+/// false
+/// }
+/// }
+/// fn hash_value(&self) -> u64 {
+/// let mut hasher = DefaultHasher::new();
+/// self.param.hash(&mut hasher);
+/// self.name().hash(&mut hasher);
+/// hasher.finish()
+/// }
+/// }
+/// ```
+///
+/// # Notes
+/// - This method must be consistent with [`Self::hash_value`]. If
`equals` returns true for two UDFs,
+/// their hash values must also be the same.
+/// - Ensure that the implementation does not panic or cause undefined
behavior for any input.
fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
-self.name() == other.name() && self.signature() == other.signature()
+// if the pointers are the same, the UDFs are the same
+if std::ptr::eq(self.as_any(), other.as_any()) {
Review Comment:
I think we cannot safely use std::ptr::eq(self, other) here because self and
other are trait objects (&dyn ScalarUDFImpl).
std::ptr::eq(self, other) would compare the vtable pointers, not the
underlying data pointers, and may return false even if the trait objects point
to the same data.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
findepi commented on PR #16681: URL: https://github.com/apache/datafusion/pull/16681#issuecomment-3072748613 "99%" of the cases are stateless[^1]. Would it be possible to provide default implementation that works great in the "99%" common case, but still works correctly in other cases? if that's not possible, and so the default behavior is inferior for the common case, it needs to be reimplemented in every function. Which means it's not really an optional method and so we should remove its default implementation. cc @alamb [^1]: the only field in most UDFs is Signature. It's there for convenience. Could be as well implemented as a static -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
findepi commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2206861910
##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +700,89 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
/// Return true if this scalar UDF is equal to the other.
///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must
adhere to the rules of equivalence:
+///
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+///
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are
the same. This conservative approach
+/// ensures that different instances of the same function type are not
mistakenly considered equal.
///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs,
ScalarUDFImpl, Signature};
+///
+/// #[derive(Debug, PartialEq)]
+/// struct MyUdf {
+/// param: i32,
+/// signature: Signature,
+/// }
///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) ->
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self == other
+/// } else {
+/// false
+/// }
+/// }
+/// fn hash_value(&self) -> u64 {
+/// let mut hasher = DefaultHasher::new();
+/// self.param.hash(&mut hasher);
+/// self.name().hash(&mut hasher);
+/// hasher.finish()
+/// }
+/// }
+/// ```
+///
+/// # Notes
+/// - This method must be consistent with [`Self::hash_value`]. If
`equals` returns true for two UDFs,
+/// their hash values must also be the same.
+/// - Ensure that the implementation does not panic or cause undefined
behavior for any input.
fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
-self.name() == other.name() && self.signature() == other.signature()
+// if the pointers are the same, the UDFs are the same
+if std::ptr::eq(self.as_any(), other.as_any()) {
Review Comment:
Could this be `std::ptr::eq(self, other)` here?
##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
/// Return true if this scalar UDF is equal to the other.
///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must
adhere to the rules of equivalence:
///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are
the same. This conservative
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188857055 ## datafusion/expr/src/test/mod.rs: ## @@ -16,3 +16,5 @@ // under the License. pub mod function_stub; +#[cfg(test)] +pub mod udf_equals; Review Comment: Good idea. Moved. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188833299
##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
/// Return true if this scalar UDF is equal to the other.
///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must
adhere to the rules of equivalence:
///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are
the same. This conservative approach
+/// ensures that different instances of the same function type are not
mistakenly considered equal.
+///
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs,
ScalarUDFImpl, Signature};
+/// #[derive(Debug)]
+/// struct MyUdf {
+/// param: i32,
+/// signature: Signature,
+/// }
+///
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) ->
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self.param == other.param && self.name() == other.name()
Review Comment:
Amended to use
derive PartialEq
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on code in PR #16681: URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188826793 ## docs/source/library-user-guide/upgrading.md: ## @@ -62,6 +62,36 @@ DataFusionError::SchemaError( [#16652]: https://github.com/apache/datafusion/issues/16652 + `ScalarUDFImpl::equals` Default Implementation + +The default implementation of the `equals` method in the `ScalarUDFImpl` trait has been updated. Previously, it compared only the type IDs, names, and signatures of UDFs. Now, it assumes UDFs are not equal unless their pointers are the same. Review Comment: That's a thoughtful consideration. You raise valid points about predictability vs. convenience. Let me outline the tradeoffs: Current approach (safe default with pointer comparison): ✅ Prevents silent bugs where ScalarUDFs with different internal state are incorrectly considered equal ✅ Forces developers to think about equality semantics for their specific UDFs ✅ Common expression elimination works correctly out of the box ❌ Requires manual action from users who want structural equality Alternative approach (require explicit PartialEq implementation): ✅ More explicit and predictable - no hidden behavior changes ✅ Leverages Rust's type system for compile-time guarantees ✅ Clear contract: if you want equality comparison, implement PartialEq ❌ Breaking change requiring immediate action from all ScalarUDF implementors ❌ More work for simple ScalarUDFs that don't need custom equality logic My recommendation: Stick with the current approach for DataFusion 49.0.0, but consider the PartialEq approach for a future major version. Here's why: The current change already enables critical functionality (common expression elimination) without breaking compilation The upgrade path is clear and documented with examples For DataFusion 50.0.0 or 51.0.0, we could introduce a more explicit API that requires PartialEq implementation This gives users time to adapt while ensuring the optimizer works correctly immediately. What do you think about this phased approach? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188817382
##
datafusion/expr/src/udf.rs:
##
@@ -696,16 +696,81 @@ pub trait ScalarUDFImpl: Debug + Send + Sync {
/// Return true if this scalar UDF is equal to the other.
///
-/// Allows customizing the equality of scalar UDFs.
-/// Must be consistent with [`Self::hash_value`] and follow the same rules
as [`Eq`]:
+/// This method allows customizing the equality of scalar UDFs. It must
adhere to the rules of equivalence:
///
-/// - reflexive: `a.equals(a)`;
-/// - symmetric: `a.equals(b)` implies `b.equals(a)`;
-/// - transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
+/// - Reflexive: `a.equals(a)` must return true.
+/// - Symmetric: `a.equals(b)` implies `b.equals(a)`.
+/// - Transitive: `a.equals(b)` and `b.equals(c)` implies `a.equals(c)`.
///
-/// By default, compares [`Self::name`] and [`Self::signature`].
+/// # Default Behavior
+/// By default, this method compares the type IDs, names, and signatures
of the two UDFs. If these match,
+/// the method assumes the UDFs are not equal unless their pointers are
the same. This conservative approach
+/// ensures that different instances of the same function type are not
mistakenly considered equal.
+///
+/// # Custom Implementation
+/// If a UDF has internal state or additional properties that should be
considered for equality, this method
+/// should be overridden. For example, a UDF with parameters might compare
those parameters in addition to
+/// the default checks.
+///
+/// # Example
+/// ```rust
+/// use std::any::Any;
+/// use std::hash::{DefaultHasher, Hash, Hasher};
+/// use arrow::datatypes::DataType;
+/// use datafusion_common::{not_impl_err, Result};
+/// use datafusion_expr::{ColumnarValue, ScalarFunctionArgs,
ScalarUDFImpl, Signature};
+/// #[derive(Debug)]
+/// struct MyUdf {
+/// param: i32,
+/// signature: Signature,
+/// }
+///
+/// impl ScalarUDFImpl for MyUdf {
+///fn as_any(&self) -> &dyn Any {
+///self
+///}
+///fn name(&self) -> &str {
+///"my_udf"
+///}
+///fn signature(&self) -> &Signature {
+///&self.signature
+///}
+///fn return_type(&self, _arg_types: &[DataType]) -> Result {
+///Ok(DataType::Int32)
+///}
+///fn invoke_with_args(&self, _args: ScalarFunctionArgs) ->
Result {
+///not_impl_err!("not used")
+///}
+/// fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+/// if let Some(other) = other.as_any().downcast_ref::() {
+/// self.param == other.param && self.name() == other.name()
+/// } else {
+/// false
+/// }
+/// }
+/// fn hash_value(&self) -> u64 {
+/// let mut hasher = DefaultHasher::new();
+/// self.param.hash(&mut hasher);
+/// self.name().hash(&mut hasher);
+/// hasher.finish()
+/// }
+/// }
+/// ```
+///
+/// # Notes
+/// - This method must be consistent with [`Self::hash_value`]. If
`equals` returns true for two UDFs,
+/// their hash values must also be the same.
+/// - Ensure that the implementation does not panic or cause undefined
behavior for any input.
fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
-self.name() == other.name() && self.signature() == other.signature()
+// if the pointers are the same, the UDFs are the same
+if std::ptr::eq(self.as_any(), other.as_any()) {
+return true;
+}
+
+// if the types, names and signatures are the same, we can't know if
they are the same so we
+// assume they are not.
+// If a UDF has internal state that should be compared, it should
implement this method
+false
Review Comment:
amended comment to clarify why we are not doing
bit-by-bit comparison
```
// Alternative approach: we could potentially do bit-by-bit
comparison if both objects
// are the same concrete type, but this requires:
// 1. Both objects to have identical TypeId
// 2. Careful handling of potential padding bytes in structs
// 3. The concrete type to be safely comparable via memcmp
// For now, we use the conservative approach of returning false
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188804048
##
datafusion/expr/src/test/udf_equals.rs:
##
@@ -0,0 +1,187 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::{
+ColumnarValue, ScalarFunctionArgs, ScalarUDF, ScalarUDFImpl, Signature,
Volatility,
+};
+use arrow::datatypes::DataType;
+use datafusion_common::{not_impl_err, Result};
+use std::{
+any::Any,
+hash::{Hash, Hasher},
+};
+#[derive(Debug)]
+#[allow(dead_code)]
Review Comment:
Removed.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Enhance `ScalarUDFImpl` Equality Handling with Pointer-Based Default and Customizable Logic [datafusion]
kosiew commented on code in PR #16681:
URL: https://github.com/apache/datafusion/pull/16681#discussion_r2188802671
##
datafusion/expr/src/test/udf_equals.rs:
##
@@ -0,0 +1,187 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+use crate::{
+ColumnarValue, ScalarFunctionArgs, ScalarUDF, ScalarUDFImpl, Signature,
Volatility,
+};
+use arrow::datatypes::DataType;
+use datafusion_common::{not_impl_err, Result};
+use std::{
+any::Any,
+hash::{Hash, Hasher},
+};
+#[derive(Debug)]
+#[allow(dead_code)]
+struct ParamUdf {
+param: i32,
+signature: Signature,
+}
+
+impl ParamUdf {
+fn new(param: i32) -> Self {
+Self {
+param,
+signature: Signature::exact(vec![DataType::Int32],
Volatility::Immutable),
+}
+}
+}
+
+impl ScalarUDFImpl for ParamUdf {
+fn as_any(&self) -> &dyn Any {
+self
+}
+fn name(&self) -> &str {
+"param_udf"
+}
+fn signature(&self) -> &Signature {
+&self.signature
+}
+fn return_type(&self, _arg_types: &[DataType]) -> Result {
+Ok(DataType::Int32)
+}
+fn invoke_with_args(&self, _args: ScalarFunctionArgs) ->
Result {
+not_impl_err!("not used")
+}
+fn equals(&self, other: &dyn ScalarUDFImpl) -> bool {
+if let Some(other) = other.as_any().downcast_ref::() {
+self.param == other.param && self.type_id() == other.type_id()
Review Comment:
Good point.
Amended to implement
`derive(PartialEq) and then self == other`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
