Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2667238494 Close by #14440 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 closed pull request #14268: Fix Type Coercion for UDF Arguments URL: https://github.com/apache/datafusion/pull/14268 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944712928
##
datafusion/expr-common/src/signature.rs:
##
@@ -376,7 +383,9 @@ impl TypeSignature {
TypeSignature::Coercible(types) => types
.iter()
.map(|logical_type| match logical_type {
-TypeSignatureClass::Native(l) =>
get_data_types(l.native()),
+TypeSignatureClass::Native(l)
+| TypeSignatureClass::Numeric(l)
+| TypeSignatureClass::Integer(l) =>
get_data_types(l.native()),
Review Comment:
`get_data_types` is used only in `get_possible_types`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944643137
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() &&
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received
{current_type}"
+)
}
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();
+if target_type.is_numeric() &&
logical_type.is_numeric() {
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received
{current_type}"
+)
}
-
+}
+TypeSignatureClass::Integer(native_type) => {
+let target_type = native_type.native();
Review Comment:
`Ok(current_type.to_owned())`
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() &&
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received
{current_type}"
+)
}
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();
+if target_type.is_numeric() &&
logical_type.is_numeric() {
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received
{current_type}"
+)
}
-
+}
+TypeSignatureClass::Integer(native_type) => {
+let target_type = native_type.native();
Review Comment:
`Ok(current_type.to_owned())`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944643767
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() &&
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received
{current_type}"
+)
}
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();
+if target_type.is_numeric() &&
logical_type.is_numeric() {
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received
{current_type}"
+)
}
-
+}
+TypeSignatureClass::Integer(native_type) => {
+let target_type = native_type.native();
Review Comment:
```suggestion
Ok(current_type.to_owned())
```
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() &&
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received
{current_type}"
+)
}
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();
Review Comment:
```suggestion
Ok(current_type.to_owned())
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944640302
##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
Interval,
Duration,
Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),
+Integer(LogicalTypeRef),
Review Comment:
@shehabgamin I found that we might not need `LogicalTypeRef`.
This is designed to accept all the Integer, so if the given type is integer,
we keep it as it is.
If we want specific integer type, then we should use Native instead. Does
this makes sense to you?
Numeric is the same
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944122850
##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
make_scalar_function(ascii, vec![])(args)
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
@jayzhan211 Forsure, let's make sure @alamb is okay with this too before I
go ahead and make the change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942524406
##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
make_scalar_function(ascii, vec![])(args)
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
do we support integer?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942560239
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
Review Comment:
I see
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942559050
##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
make_scalar_function(ascii, vec![])(args)
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
Let's remove it
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942537530
##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
make_scalar_function(ascii, vec![])(args)
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
Up to you and @alamb
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942541549
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
Review Comment:
Only difference is that I condensed the 3 if statements into 1
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942535860
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
Review Comment:
@jayzhan211 Not sure I follow, I reverted back to the original logic as
requested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942535860
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
Review Comment:
@jayzhan211 Not sure I follow, I reverted to the original logic as requested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942530489
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
Review Comment:
This should be `target_type.default_cast_for(current_type)` only
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan-synnada commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942523715
##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
make_scalar_function(ascii, vec![])(args)
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
Do we support integer?
##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
make_scalar_function(ascii, vec![])(args)
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
Do we support integer?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942525348
##
datafusion/functions/src/string/bit_length.rs:
##
@@ -106,6 +108,33 @@ impl ScalarUDFImpl for BitLengthFunc {
}
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
same with this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942524406
##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
make_scalar_function(ascii, vec![])(args)
}
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()
Review Comment:
should we support integer? It is not consistent with Postgres/DuckDB
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
findepi commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2635850123 > I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, thereβs a critical distinction: **System contracts vs. SQL dialect behavior.** > > [...] > > The current approach keeps the door open for both strict and permissive use cases through explicit signature choices. With these changes, users can still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing signatures while still leveraging broader coercion where appropriate. The function coercions are not enough for building a tailored system. Relational operators also do coercions (the set operators: union, intersect, except). In any case, for certain system designs -- those who take on responsibility of implementing their particular SQL dialect behavior before handing over the control over to DF core -- it's desirable to opt out from any coercion logic at all. @shehabgamin @linhr Given the Sail design, you might be interested in https://github.com/apache/datafusion/issues/12723. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942056824
##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
Interval,
Duration,
Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),
Review Comment:
Can we also avoid these `Numeric` and `Integer`? I'm thinking to deprecate
this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942056824
##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
Interval,
Duration,
Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),
Review Comment:
Can we also avoid these `Numeric` and `Integer`? I'm thinking to deprecate
this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942056824
##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
Interval,
Duration,
Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),
Review Comment:
Can we also avoid these 2? I'm thinking to deprecate this
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2635031823 > @jayzhan211 To keep it simple ill just remove AnyNative and use coerce_types so we don't block this PR any longer. We can have a larger discussion and align on goals afterwards! > > CC @alamb @Omega359 Done, this should be good to merge now. `AnyNative` was removed, `Native` was reverted back to original logic, and for the 6 UDFs in the PR `Signature::user_defined` was used. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2634598214 @jayzhan211 To keep it simple ill just remove AnyNative and use coerce_types so we don't block this PR any longer. We can have a larger discussion and align on goals afterwards! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2633097400 > This is the point, I don't quite understand why adding signature for `default_cast_to` logic helps. I thought you want to solve the issues for these datafusion functions, so I don't know what others functions you want to solve and they are only possible if we have `AnyNative`. > > @shehabgamin First of all, what are the issues we are solving? Are these functions in datafusion? > > ``` > ascii > bit_length > contains > ends_with > starts_with > octet_length > ``` > > Are there others issues? Why `AnyNative` helps? @jayzhan211 The regression is that all these functions before DataFusion 43 would coerce: 1. `Binary` -> `String` 2. `Int` -> `String` But now they no longer do. I am trying to find some middle ground here. I am happy to implement any signature for the functions in this PR, but currently, `default_cast_for` is inaccessible for downstream users, adding `AnyNative` makes it accessible. Sail implements hundreds of custom UDFs, so it would be nice to have access to `default_cast_for`. We're not the only downstream project that has a need for for flexible coercion like this (see here: https://github.com/apache/datafusion/issues/14230#issuecomment-2631350057). > Even if there is such case that we really need AnyNative, it seems strange to me to add in TypeSignatureClass but should be another new TypeSignature. I'm not sure that `AnyNative` should be a new `TypeSignature` because `TypeSignature::Coercible` most accurately describes the behavior. > Btw, if your proposed signature doesn't used by any functions in datafusion. You should use TypeSignature::UserDefined it is used for any other customize logic for downstream. The point is that `default_cast_for` is not accessible downstream. Should we just make `default_cast_for` a public function? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2633007543 Btw, if your proposed signature doesn't help for any functions in datafusion. You should use `TypeSignature::UserDefined` it is used for any other customize logic for downstream. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2633000378 > if it helps downstream projects This is the point, I don't quite understand why adding signature for `default_cast_to` logic helps. I thought you want to solve the issues for these datafusion functions, so I don't know what others functions you want to solve and they are only possible if we have `AnyNative`. Even if there is such case that we really need `AnyNative`, it seems to strange to me to add in TypeSignatureClass but should be another new TypeSignature. > IMO CoercibleV2 would be out of scope for this PR I think this is the solution to the issue we have, we need coercible like signature but **not fixed logic exposed to the user** since it makes any changes to it breaking change, while `CoercibleV2` is not the case. @shehabgamin First of all, what are the issues we are solving? Are these functions in datafusion? ``` ascii bit_length contains ends_with starts_with octet_length ``` Are there others issues? Why `AnyNative` helps? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632975463 > As we have discussed, we should avoid using old Coercible signature and also the TypeSignatureClass that is used in Coercible, because any change might impact downstream projects, although if we add new `TypeSignatureClass` like what you did will not, but adding more soon to remove things are not good idea @jayzhan211 I thought you said it was okay to add a new signature if it helps downstream projects. See here: https://github.com/apache/datafusion/pull/14268#issuecomment-2629247219 I reverted `Native` back to the original logic before this PR and then I added `AnyNative`. > I think we can work directly on CoercibleV2 I mentioned for these functions IMO `CoercibleV2` would be out of scope for this PR. It looks like @alamb would like to get this PR sorted out if possible for `45` (see here: https://github.com/apache/datafusion/issues/14008#issuecomment-2631357227) If I am understanding you correctly, you are okay with adding a new signature but not applying them to the UDFs in this PR? Should I apply User-defined coercion as you were mentioning earlier? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632952904 > @jayzhan211 What did I miss here? As we have discussed, we should avoid using old Coercible signature and also the TypeSignatureClass that is used in Coercible. Therefore the change doesn't seem toward the correct path to me -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632940583 > > I may add a new TypeSignatureClass in this pr to expose default_cast_for so that it's accessible by downstream users > > Not sure do we need this π€ since we might not need this in the future, but if this helps downstream project then it is fine to me to add it for now and remove it in the future > We can implement first version for the functions you mentioned, I believe the change makes more sense > I don't quite understand why we are adding more `TypeSignatureClass` here π @jayzhan211 What did I miss here? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632938494 > > @shehabgamin #14440 I come out a flexible version of Signature::CoercibleV2 (temporary name), it can replace `Signature::Coercible`, `Signature::String`, `Signature::Numeric`, `Signature::Exact`, `Signature::Uniform` and `Signature::Comparable`. > > The most difference is that the `implicit coercion` is now defined by the user. This way any change for datafusion doesn't impact downstream projects unlike the existing signature like `Signature::String` or `Signature::Coercible`. > > I probably don't have time to push it forward in the recent days, if you are interested in it you can work on it. > > I think it makes sense to work on this for DataFusion 46! > > > We can implement first version for the functions you mentioned, I believe the change makes more sense > > @jayzhan211 If I am understanding you correctly, Done! I added `AnyNative` and applied it to the functions mentioned. > > CC @alamb @Omega359 I don't quite understand why we are adding more `TypeSignatureClass` here π -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632894156 > @shehabgamin #14440 I come out a flexible version of Signature::CoercibleV2 (temporary name), it can replace `Signature::Coercible`, `Signature::String`, `Signature::Numeric`, `Signature::Exact`, `Signature::Uniform` and `Signature::Comparable`. > > The most difference is that the `implicit coercion` is now defined by the user. This way any change for datafusion doesn't impact downstream projects unlike the existing signature like `Signature::String` or `Signature::Coercible`. > > I probably don't have time to push it forward in the recent days, if you are interested in it you can work on it. I think it makes sense to work on this for DataFusion 46! > We can implement first version for the functions you mentioned, I believe the change makes more sense @jayzhan211 If I am understanding you correctly, Done! I added `AnyNative` and applied it to the functions mentioned. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632259185 I'll work on this tonight @alamb @jayzhan211 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
alamb commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2631352326 FWIW I think @Omega359 hit the same "int no longer automatically coerces to String" in his application too: https://github.com/apache/datafusion/issues/14230#issuecomment-2631350057 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2631043032 @shehabgamin #14440 I come out a flexible version of Signature::CoercibleV2 (temporary name), it can be replace `Signature::String`, `Signature::Numeric`, `Signature::Exact`, `Signature::Uniform` and `Signature::Comparable` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629300546 > I'll get to this sometime in the next few hours. Taking a detour to review and comment on https://github.com/apache/datafusion/pull/14392. @jayzhan211 @alamb, I bit off more than I can chew here. I actually wonβt be able to get to this until Monday night (PT). Hopefully it can still make it in time for the 45 release. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629247219 > > If the approach looks good to you too, we can go ahead! > > Sounds good! This works for now since we'll be working towards a better design in the future. > > I may add a new `TypeSignatureClass` in this pr to expose `default_cast_for` so that it's accessible by downstream users. I'll also update the doc comments to accurately reflect the behavior of `Coercible`. > > I'll get to this sometime in the next few hours. Taking a detour to review and comment on https://github.com/apache/datafusion/pull/14392. > > > We already have user defined coercion for functions, #14296 is for binary op or case expression and so on. > > > > If we agree to use User-defined coercion for functions everywhere, modify coerce types are the easier fix that can go into release quicker. Another better design is to redesign a good method for the function trait, I will also think about the design too. > > I think a redesign is the path forward. As long as we offer flexibility for customization (e.g., users can override the default signature), then it doesn't really matter what the default implemented signature is, because we won't have to worry about conflicting requirements. I guess we will remove signature::Coercible in the future. Your purpose of adding signature that do 'default_cast_for' is just a quick workaround? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629245515 > If the approach looks good to you too, we can go ahead! Sounds good! This works for now since we'll be working towards a better design in the future. I may add a new `TypeSignatureClass` in this pr to expose `default_cast_for` so that it's accessible by downstream users. I'll also update the doc comments to accurately reflect the behavior of `Coercible`. I'll get to this sometime in the next few hours. Taking a detour to review and comment on https://github.com/apache/datafusion/pull/14392. > We already have user defined coercion for functions, #14296 is for binary op or case expression and so on. > > If we agree to use User-defined coercion for functions everywhere, modify coerce types are the easier fix that can go into release quicker. Another better design is to redesign a good method for the function trait, I will also think about the design too. I think a redesign is the path forward. As long as we offer flexibility for customization (e.g., users can override the default signature), then it doesn't really matter what the default implemented signature is, because we won't have to worry about conflicting requirements. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629240637 > ### Topic 1 > > @findepi I had thought about using Signature:: User-defined everywhere before but was rejected for some reason. As the discussion here, I think we need this to minimize the impact of any changes that is required for datafusion builtun function but not for other projects. Any thoughts to add in? > > @alamb @jayzhan211 Are we waiting on @findepi to chime in, or should I just go ahead with this approach? > > ### Topic 2 > > It does seem to me that we keep churning / thrashing on coercion (and introducing regressions like the one this PR fixes along with https://github.com/apache/datafusion/issues/14154 and https://github.com/apache/datafusion/issues/14383 > > > > To me this is a sign we need more work as @jayzhan211 says and I am sure we would appreciate help figuring out what to do / making the code better > > I am happy to help out here! I think that we all want the same thing, but what's going on is that it's hard for all of us to be fully contextualized with everything so we end up going in circles. > > @alamb @jayzhan211 Let's bring the conversation back here https://github.com/apache/datafusion/issues/14296 and try to get this topic resolved for DataFusion 46. I would also like to note that I'm always happy to hop on a call or do some sort of live messaging if that's ever desired. > If the approach looks good to you too, we can go ahead! We already have user defined coercion for functions, #14296 is for binary op or case expression and so on. If we agree to use User-defined coercion for functions everywhere, modify coerce types are the easier fix that can go into release quicker. Another better design is to redesign a good method for the function trait, I will also think about the design too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629239585 ### Topic 1 > @findepi I had thought about using Signature:: User-defined everywhere before but was rejected for some reason. As the discussion here, I think we need this to minimize the impact of any changes that is required for datafusion builtun function but not for other projects. Any thoughts to add in? @alamb @jayzhan211 Are we waiting on @findepi to chime in, or should I just go ahead with this approach? ### Topic 2 > It does seem to me that we keep churning / thrashing on coercion (and introducing regressions like the one this PR fixes along with https://github.com/apache/datafusion/issues/14154 and https://github.com/apache/datafusion/issues/14383 > > To me this is a sign we need more work as @jayzhan211 says and I am sure we would appreciate help figuring out what to do / making the code better I am happy to help out here! I think that we all want the same thing, but what's going on is that it's hard for all of us to be fully contextualized with everything so we end up going in circles. @alamb @jayzhan211 Let's bring the conversation back here https://github.com/apache/datafusion/issues/14296 and try to get this topic resolved for DataFusion 46. I would also like to note that I'm always happy to hop on a call or do some sort of live messaging if that's ever desired. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
alamb commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629035590 Given I feel like we have not yet reached consensus on this issue I made a release branch for 45 without this change - https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232 I expect to make a release candidate tomorrow or monday so if we can get something that is safe to backport to 45 by then I can do so. Otherwise let's just fix it on main and include in version 46 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628939208 > @shehabgamin can we try to use Singatur::User-defined coerce types for those functions, the change is relatively trivial and the impact is smaller than the current one @findepi I had thought about using Signature:: User-defined everywhere before but was rejected for some reason. As the discussion here, I think we need this to minimize the impact of any changes that is required for datafusion builtun function but not for other projects. Any thoughts to add in? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628936121 @shehabgamin can we try to use coerce types for those functions, the change is relatively trivial and less breakage compare to the current one -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628913239 > > So what is the actual regressions to be resolved? I don't remember signature coericible is consistent with the docs before so it is not a regression to be fixed now. If supporting binary for string is something used to work, then we can add such coercion logic to signature:: string > > @jayzhan211 The regressions resolved are all the UDFs listed in the PR description. Every test added to this PR passed on DataFusion `42`, and then every test that tested coercion behavior started failing on DataFusion `43`. > > Signature coericible not being consistent with the documentation is a big problem, this PR fixes that. > > > Is there any way we can fix the regression bug with some sort of smaller patch and then make the changes to coerceable behavior as a follow on PR? > > @alamb The changes to the coercible behavior are already very small and are outlined below. > ### Changes To Coercible Behavior > > **1. We go from this:** > https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L585-L604 > **To this:** > https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L542-L545 > > **2. Before this line:** > https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L640 > **We add this:** > https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L581-L597 > > That's it. > > Just now i've implemented `TypeSignatureClass::Integer` and updated `repeat` to use that. Beyond this, I'm really not sure how to move forward here. The documentation is wrong but we should fix the description in this case, because the Coercible is introduced not because of we need a very flexible coercion logic but the logic like it is now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628900809 > So what is the actual regressions to be resolved? I don't remember signature coericible is consistent with the docs before so it is not a regression to be fixed now. If supporting binary for string is something used to work, then we can add such coercion logic to signature:: string @jayzhan211 The regressions resolved are all the UDFs listed in the PR description. Every test added to this PR passed on DataFusion `42`, and then every test that tested coercion behavior started failing on DataFusion `43`. Signature coericible not being consistent with the documentation is a big problem, this PR fixes that. > Is there any way we can fix the regression bug with some sort of smaller patch and then make the changes to coerceable behavior as a follow on PR? @alamb The changes to the coercible behavior are already very small and are outlined below. ### Changes To Coercible Behavior 1. We go from this: https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L585-L604 To this: https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L542-L545 2. Before this line: https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L640 We add this: https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L581-L597 I've implemented `TypeSignatureClass::Integer`and update repeat to use that. Beyond this, I'm really not sure how to move forward here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628591052 > > > I don't have a strong opinion about how exactly the coercion rules should be changed. > > > > > > > > > > > > It does seem to me that we keep churning / thrashing on coercion (and introducing regressions like the one this PR fixes along with https://github.com/apache/datafusion/issues/14154 and https://github.com/apache/datafusion/issues/14383 > > > > > > > > > > > > To me this is a sign we need more work as @jayzhan211 says and I am sure we would appreciate help figuring out what to do / making the code better > > > > > > > > > > > > That being said, I would like to make the 45.0.0 release candidate soon (like later today or tomorrow) and I worry that making non trivial changes to the coercion logic so close may lead to other unintended consequences > > > > > > > > > > > > Is there any way we can fix the regression bug with some sort of smaller patch and then make the changes to coerceable behavior as a follow on PR? > > > > > > > > > > @alamb I'm not near a computer until later tonight, but just lemme know what you'd like me to do and i'll take care of it when I'm back home! > > So what is the actual regressions to be resolved? I don't remember signature coericible is consistent with the docs before so it is not a regression to be fixed now. If supporting binary for string is something used to work, then we can add such coercion logic to signature:: string But change of signature::string may also cause another impact on others that don't want binary to string coercionπ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628583804 > > I don't have a strong opinion about how exactly the coercion rules should be changed. > > > > > > > > It does seem to me that we keep churning / thrashing on coercion (and introducing regressions like the one this PR fixes along with https://github.com/apache/datafusion/issues/14154 and https://github.com/apache/datafusion/issues/14383 > > > > > > > > To me this is a sign we need more work as @jayzhan211 says and I am sure we would appreciate help figuring out what to do / making the code better > > > > > > > > That being said, I would like to make the 45.0.0 release candidate soon (like later today or tomorrow) and I worry that making non trivial changes to the coercion logic so close may lead to other unintended consequences > > > > > > > > Is there any way we can fix the regression bug with some sort of smaller patch and then make the changes to coerceable behavior as a follow on PR? > > > > > > @alamb I'm not near a computer until later tonight, but just lemme know what you'd like me to do and i'll take care of it when I'm back home! So what is the actual regressions to be resolved? I don't remember signature coericible is consistent with the docs before so it is not a regression to be fixed now. If supporting binary for string is something used to work, then we can add such coercion logic to singature:: sitting -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628446082 > I don't have a strong opinion about how exactly the coercion rules should be changed. > > > > It does seem to me that we keep churning / thrashing on coercion (and introducing regressions like the one this PR fixes along with https://github.com/apache/datafusion/issues/14154 and https://github.com/apache/datafusion/issues/14383 > > > > To me this is a sign we need more work as @jayzhan211 says and I am sure we would appreciate help figuring out what to do / making the code better > > > > That being said, I would like to make the 45.0.0 release candidate soon (like later today or tomorrow) and I worry that making non trivial changes to the coercion logic so close may lead to other unintended consequences > > > > Is there any way we can fix the regression bug with some sort of smaller patch and then make the changes to coerceable behavior as a follow on PR? > > @alamb I'm not near a computer until later tonight, but just lemme know what you'd like me to do and i'll take care of it when I'm back home! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
alamb commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2627866762 I don't have a strong opinion about how exactly the coercion rules should be changed. It does seem to me that we keep churning / thrashing on coercion (and introducing regressions like the one this PR fixes along with https://github.com/apache/datafusion/issues/14154 and https://github.com/apache/datafusion/issues/14383 To me this is a sign we need more work as @jayzhan211 says and I am sure we would appreciate help figuring out what to do / making the code better That being said, I would like to make the 45.0.0 release candidate soon (like later today or tomorrow) and I worry that making non trivial changes to the coercion logic so close may lead to other unintended consequences Is there any way we can fix the regression bug with some sort of smaller patch and then make the changes to coerceable behavior as a follow on PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626728678 > First of all, we need to have a out of box behavior for each function, and at the same time easy to be customizable. The Signature Coercible now becomes problematic, if we follows the system contract, it is not usable internally in datafusion core, but if we adapt it to function in datafusion core, it becomes less flexible and even easy to cause surprised regressions for other projects that rely on the old signature logic. My guess is that it would still be usable internally by some functions in datafusion core. This is because coercion behavior can often be unique to a specific UDF. > The idea I have so far is that how about make Signature a method of trait at all, each function is able to implement it's own signature check, it not only is flexible by default and also any change to it is safely not to impact others. To make coercion less repeated and easy to use we provide many small and well defined utility functions like string coercion or numeric coercion both used internally and public to others. This would be out of scope for this PR, but I think it aligns pretty well with the discussion we were having here: https://github.com/apache/datafusion/issues/14296#issuecomment-2614337663 For the purpose of this PR, what are your thoughts about: 1. Keeping the changes to the coercible behavior. This follows the system contract and makes the signature consistent with the documentation. 2. Using different signatures (or creating new signatures) for the affected UDFs in this PR. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626205413 > > > @jayzhan211 and @shehabgamin what is the status of this PR? It looks to me like there are some unresolved comments > > > It looks like there are some unresolved comments like [#14268 (comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432) from @jayzhan211 that suggest this PR is being too aggressive in coercion > > > I would like to try and get this in for the fhe DataFusion 45 release so it would be nice if we can figure out how to fix the regressions > > > > There are concerns unresolved > > @alamb and @jayzhan211 - Thank you for your patience and thoughtful discussion. Sorry for being quiet here, Iβve spent time reflecting on how to reconcile the competing priorities over the past couple of days. > > Let me clarify the core principles driving this PR. > > This PR aims to restore DataFusion's documented type coercion contracts - specifically the semantics defined for `TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging the existing casting logic in `NativeType::default_cast_for`. > > The current implementation strictly follows the guidance in: https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134 where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type castable to the target `NativeType` through the explicit set of type conversion rules defined in `NativeType::default_cast_for`. > > I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, thereβs a critical distinction: **System contracts vs. SQL dialect behavior.** > > While PostgreSQL/DuckDB-style strictness is desirable for specific use cases, subverting the type systemβs foundational contracts would result in: > > 1. **Lost flexibility:** Users lose the ability to intentionally design UDFs with broader NativeType coercion patterns when needed. The system becomes opinionated in ways that aren't transparent. > 2. **Hidden couplings:** Tightly linking coercion rules to individual function implementations makes behavior unpredictable across optimizer rules. What works for `ascii()` might break `concat()` unexpectedly. > 3. **Eroded system integrity:** Undocumented exceptions to `NativeType::default_cast_for` in `Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion context-dependent and impossible to maintain and reason about holistically. > > The current approach keeps the door open for both strict and permissive use cases through explicit signature choices. With these changes, users can still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing signatures while still leveraging broader coercion where appropriate. Conversely, hardcoding dialect-specific restrictions at the type system level would limit DataFusionβs flexibility and usability as a polyglot execution engine. > > If we bypass these contracts to enforce function-specific rules then we: > > 1. **Break documented behavior:** The system no longer behaves as its APIs and documentation promise, creating confusion. > 2. **Misalign naming and semantics**: A `Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid coercions via `NativeType::default_cast_for` becomes a contradiction in terms. > > I don't feel strongly about what signatures each individual UDF in this PR should implement, but I do feel strongly about maintaining the documented type coercion contracts and system semantics that define how DataFusion's type system is intended to work. The behavior of `TypeSignatureClass::Native` and `NativeType::default_cast_for` is explicitly documented, and their naming clearly signals their purpose. I think you totally point out the challenge we have for function's type coercion, the large difference between each function makes it hard to design a one size fit all solution but if we all in flexibility (with 'coerce_types' we ends up similar and repeated coercion logic everywhere. First of all, we need to have a out of box behavior for each function, and at the same time easy to be customizable. The Signature Coercible now becomes problematic, if we follows the system contract, it is not usable internally in datafusion core, but if we adapt it to function in datafusion core, it becomes less flexible and even easy to cause surprised regressions for other projects that rely on the old signature logic. The idea I have so far is that how about make Signature a method of trait at all, each function is able to implement it's own signature check, it not only is flexible by default and also any change to it is safely not to impact others. To make co
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626079378 > > @jayzhan211 and @shehabgamin what is the status of this PR? It looks to me like there are some unresolved comments > > It looks like there are some unresolved comments like [#14268 (comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432) from @jayzhan211 that suggest this PR is being too aggressive in coercion > > I would like to try and get this in for the fhe DataFusion 45 release so it would be nice if we can figure out how to fix the regressions > > There are concerns unresolved @alamb and @jayzhan211 - Thank you for your patience and thoughtful discussion. Sorry for being quiet here, Iβve spent time reflecting on how to reconcile the competing priorities over the past couple of days. Let me clarify the core principles driving this PR. This PR aims to restore DataFusion's documented type coercion contracts - specifically the semantics defined for `TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging the existing casting logic in `NativeType::default_cast_for`. The current implementation strictly follows the guidance in https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134, where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type castable to the target `NativeType` through the explicit set of type conversion rules defined in `NativeType::default_cast_for`. I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, thereβs a critical distinction: **System contracts vs. SQL dialect behavior.** While PostgreSQL/DuckDB-style strictness is desirable for specific use cases, subverting the type systemβs foundational contracts would result in: 1. **Lost flexibility:** Users lose the ability to intentionally design UDFs with broader NativeType coercion patterns when needed. The system becomes opinionated in ways that aren't transparent. 2. **Hidden couplings:** Tightly linking coercion rules to individual function implementations makes behavior unpredictable across optimizer rules. What works for `ascii()` might break `concat()` unexpectedly. 3. **Eroded system integrity:** Undocumented exceptions to `NativeType::default_cast_for` in `Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion context-dependent and impossible to maintain and reason about holistically. The current approach keeps the door open for both strict and permissive use cases through explicit signature choices. With these changes, users can still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing signatures while still leveraging broader coercion where appropriate. Conversely, hardcoding dialect-specific restrictions at the type system level would limit DataFusionβs flexibility and usability as a polyglot execution engine. If we bypass these contracts to enforce function-specific rules then we: 1. **Break documented behavior:** The system no longer behaves as its APIs and documentation promise, creating confusion. 2. **Misalign naming and semantics**: A `Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid coercions via `NativeType::default_cast_for` becomes a contradiction in terms. I don't feel strongly about what signatures each individual UDF in this PR should implement, but I do feel strongly about maintaining the documented type coercion contracts and system semantics that define how DataFusion's type system is intended to work. The behavior of `TypeSignatureClass::Native` and `NativeType::default_cast_for` is explicitly documented, and their naming clearly signals their purpose. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626045546 > @jayzhan211 and @shehabgamin what is the status of this PR? It looks to me like there are some unresolved comments > > It looks like there are some unresolved comments like https://github.com/apache/datafusion/pull/14268#discussion_r1933063432 from @jayzhan211 that suggest this PR is being too aggressive in coercion > > I would like to try and get this in for the fhe DataFusion 45 release so it would be nice if we can figure out how to fix the regressions There are concerns unresolved -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
alamb commented on PR #14268: URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2625430773 @jayzhan211 and @shehabgamin what is the status of this PR? It looks to me like there are some unresolved comments It looks like there are some unresolved comments like https://github.com/apache/datafusion/pull/14268#discussion_r1933063432 from @jayzhan211 that suggest this PR is being too aggressive in coercion I would like to try and get this in for the fhe DataFusion 45 release so it would be nice if we can figure out how to fix the regressions -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
alamb commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1936187986
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -571,8 +601,10 @@ select repeat('-1.2', arrow_cast(3, 'Int32'));
-1.2-1.2-1.2
-query error DataFusion error: Error during planning: Internal error: Expect
TypeSignatureClass::Native\(LogicalType\(Native\(Int64\), Int64\)\) but
received Float64
+query T
select repeat('-1.2', 3.2);
Review Comment:
In main this query does an internal error
```sql
> select repeat('foo', 3.2);
Error during planning: Internal error: Expect
TypeSignatureClass::Native(LogicalType(Native(Int64), Int64)) but received
Float64.
This was likely caused by a bug in DataFusion's code and we would welcome
that you file an bug report in our issue tracker No function matches the given
name and argument types 'repeat(Utf8, Float64)'. You might need to add explicit
type casts.
Candidate functions:
repeat(TypeSignatureClass::Native(LogicalType(Native(String), String)),
TypeSignatureClass::Native(LogicalType(Native(Int64), Int64)))
```
Avoiding an internal error seems like an improvement to me
I agree in general it would likely be better to help people avoid an error
by rejecting non-integer arguments with a nicer error
Filed
- https://github.com/apache/datafusion/issues/14376
I don't understand the implementation implications however
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('π―a')
128175
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)
Review Comment:
I think for now this is ok in my opinion as there is no spark functions
crate yet π¬ Maybe this is yet another example of how it is needed
- https://github.com/apache/datafusion/issues/5600
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933110734
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
As a rule of thumb, stick to either Postgres or DuckDB for functions, and
avoid overly broad type casting that allows any type to convert to any other.
This can turn invalid queries into valid ones, making errors harder to catch,
especially when tests donβt cover all cases.
Based on that,
1. Cast from Binary to String ππ»
2. Cast from Numeric to String ππ» (At least for the string functions
changed in this PR)
3. Cast from Float to Integer ππ» (At least not for `repeat`)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933110734
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
As a rule of thumb, stay consistent with either Postgres or DuckDB when
using functions. Avoid overly general casting that allows any type to be cast
to any other type, as this can turn invalid queries into valid ones. This makes
it harder to catch errors, especially when tests don't cover all cases
Based on that,
1. Cast from Binary to String ππ»
2. Cast from Numeric to String ππ» (At least for the string functions
changed in this PR)
3. Cast from Float to Integer ππ» (At least not for `repeat`)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933110734
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
The rule of thumb is that try to be consistent with either Postgres and
DuckDB for the functions and don't make the casting too general that cast any
type to any type since this turns invalid query into valid one and it is not
easy to know that given the test is not covered.
Based on that,
1. Cast from Binary to String ππ»
2. Cast from Numeric to String ππ» (At least for the string functions
changed in this PR)
3. Cast from Float to Integer ππ» (At least not for `repeat`)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933063432
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
@shehabgamin
I suggest with revert to the previous change and add binary->string casting
given most of the string function allow this kind of casting, but not
int->string like `ascii(2)`
For Signature::String, we can make it a convenient wrapper of
`Coercible(string, string)` or deprecate it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933056565
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
> `TypeSignature::Coercible` is designed to cast between the same logical
type, but now it casts to any kind of type, I don't think this is ideal.
This is what the doc comments say:
```
/// One or more arguments belonging to the [`TypeSignatureClass`], in order.
///
/// For example, `Coercible(vec![logical_float64()])` accepts
/// arguments like `vec![Int32]` or `vec![Float32]`
/// since i32 and f32 can be cast to f64
///
/// For functions that take no arguments (e.g. `random()`) see
[`TypeSignature::Nullary`].
Coercible(Vec),
```
Link:
https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933055108
##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
impl ContainsFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(2, Volatility::Immutable),
Review Comment:
For `Signature::string` all input args must be `String` or `Dictionary` with
a `String` value (Unchanged in this PR). For `Signature::Coercible(string)`,
the input args must be able to cast to a `String` (dictated via
`default_cast_for`).
##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
impl ContainsFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(2, Volatility::Immutable),
Review Comment:
For `Signature::string(x)` all input args must be `String` or `Dictionary`
with a `String` value (Unchanged in this PR). For
`Signature::Coercible(string)`, the input args must be able to cast to a
`String` (dictated via `default_cast_for`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933060341
##
datafusion/functions/src/string/bit_length.rs:
##
@@ -55,7 +58,10 @@ impl Default for BitLengthFunc {
impl BitLengthFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(1, Volatility::Immutable),
+signature: Signature::coercible(
Review Comment:
See here:
https://github.com/apache/datafusion/pull/14268#discussion_r1933055108
But yes I agree, the documentation should be clearer. I had to dig into the
code to get a solid understanding of the behavior.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933056565
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
> `TypeSignature::Coercible` is designed to cast between the same logical
type, but now it casts to any kind of type, I don't think this is ideal.
This is what the doc comments say:
```
/// One or more arguments belonging to the [`TypeSignatureClass`], in order.
///
/// For example, `Coercible(vec![logical_float64()])` accepts
/// arguments like `vec![Int32]` or `vec![Float32]`
/// since i32 and f32 can be cast to f64
///
/// For functions that take no arguments (e.g. `random()`) see
[`TypeSignature::Nullary`].
Coercible(Vec),
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933055108
##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
impl ContainsFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(2, Volatility::Immutable),
Review Comment:
For `Signature::string` all input args must be `String` or `Dictionary` with
a `String` value. For `Signature::Coercible`, the input args must be able to
cast to a `String` (dictated via `default_cast_for`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933055108
##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
impl ContainsFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(2, Volatility::Immutable),
Review Comment:
For `Signature::string` all input args must be a `String` or Dictionary with
`String` value. For `Signature::Coercible`, the input args must be able to cast
to a `String` (dictated via `default_cast_for`).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933052292
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
For example, in this PR, `SELECT bit_length(12);` now return 16 instead of
error, but I think we should error. Any unexpected type is valid which doesn't
seem correct
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933048065
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
`TypeSignature::Coercible` is designed to cast between the same logical
type, but now it cast any kind of type, I don't think this is ideal.
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
`TypeSignature::Coercible` is designed to cast between the same logical
type, but now it casts to any kind of type, I don't think this is ideal.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933034910
##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
impl ContainsFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(2, Volatility::Immutable),
Review Comment:
Actually I'm quite confuse why we need to change to coercible(string,
string) in this PR, `String(2)` is equivalent to `Coercible(string, string)` in
my mind, so I don't quite follow the change at all
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933031384
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -571,8 +601,10 @@ select repeat('-1.2', arrow_cast(3, 'Int32'));
-1.2-1.2-1.2
-query error DataFusion error: Error during planning: Internal error: Expect
TypeSignatureClass::Native\(LogicalType\(Native\(Int64\), Int64\)\) but
received Float64
+query T
select repeat('-1.2', 3.2);
Review Comment:
I still think non-integer 2nd arg should return error
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933030522
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('π―a')
128175
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)
Review Comment:
Should this belongs to Spark-functions crate instead of datafusion core?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
alamb commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1932866009
##
datafusion/functions/src/string/bit_length.rs:
##
@@ -55,7 +58,10 @@ impl Default for BitLengthFunc {
impl BitLengthFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(1, Volatility::Immutable),
+signature: Signature::coercible(
Review Comment:
as a minor comment this new signature is quite a bit of a mouthful compared
to `Signature::string`
It isn't clear to me from reading the code / comments when one would prefer
to use one over the other (not related to this PR)
##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
impl ContainsFunc {
pub fn new() -> Self {
Self {
-signature: Signature::string(2, Volatility::Immutable),
Review Comment:
@jayzhan211 / @notfilippo should we deprecate `Signature::string` and
direct people to `Signature::coerceable`?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929718209
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('π―a')
128175
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)
Review Comment:
The behavior is consistent with Spark and the tests also come from the Spark
codebase (which we ported to Sail then here lol).
```
spark-sql (default)> SELECT ASCII(2);
50
Time taken: 0.952 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT ASCII('2');
50
Time taken: 0.044 seconds, Fetched 1 row(s)
spark-sql (default)>
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929718209
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('π―a')
128175
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)
Review Comment:
The behavior is consistent with Spark, the tests come from the Spark
codebase (which we ported to Sail then here too lol) as well.
```
spark-sql (default)> SELECT ASCII(2);
50
Time taken: 0.952 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT ASCII('2');
50
Time taken: 0.044 seconds, Fetched 1 row(s)
spark-sql (default)>
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929723820
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
match target_type_class {
TypeSignatureClass::Native(native_type) => {
let target_type = native_type.native();
-if &logical_type == target_type {
Review Comment:
If it's `TypeSignature::Coercible` with a
`TypeSignatureClass::Native(logical_string())`, then yes. Any function that
specifies `TypeSignature::Coercible` with a `TypeSignatureClass::Native` should
coerce according to the behavior implemented in the `default_cast_for` function
for `NativeType` in order to be consistent.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929721662
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -571,8 +601,10 @@ select repeat('-1.2', arrow_cast(3, 'Int32'));
-1.2-1.2-1.2
-query error DataFusion error: Error during planning: Internal error: Expect
TypeSignatureClass::Native\(LogicalType\(Native\(Int64\), Int64\)\) but
received Float64
+query T
select repeat('-1.2', 3.2);
Review Comment:
Maybe the 2nd arg should be `TypeSignatureClass::Integer` instead of
`TypeSignatureClass::Native(logical_int64())`? It's currently unimplemented
with a `TODO`:
https://github.com/apache/datafusion/blob/f77579108d1dc0285636fbfb24507d2bfca66446/datafusion/expr-common/src/signature.rs#L216-L218
Otherwise, it would seem strange to deviate from the behavior specified in
the implementation of `default_cast_for`for `NativeType`.
I do think that we should prioritize being permissive and general-purpose
though as I was discussing here:
https://github.com/apache/datafusion/issues/14296#issuecomment-2614200125
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929718209
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('π―a')
128175
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)
Review Comment:
The behavior is consistent with Spark, the tests come from the Spark
codebase as well.
```
spark-sql (default)> SELECT ASCII(2);
50
Time taken: 0.952 seconds, Fetched 1 row(s)
spark-sql (default)> SELECT ASCII('2');
50
Time taken: 0.044 seconds, Fetched 1 row(s)
spark-sql (default)>
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929710139
##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('π―a')
128175
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)
Review Comment:
In DuckDB, this doesn't work
```
D select ascii(2);
Binder Error: No function matches the given name and argument types
'ascii(INTEGER_LITERAL)'. You might need to add explicit type casts.
Candidate functions:
ascii(VARCHAR) -> INTEGER
LINE 1: select ascii(2);
^
D select ascii('2');
ββ
β ascii('2') β
β int32β
ββ€
β 50 β
ββ
```
So does Postgres
```
postgres=# select ascii(2);
ERROR: function ascii(integer) does not exist
LINE 1: select ascii(2);
^
HINT: No function matches the given name and argument types. You might need
to add explicit type casts.
postgres=# select ascii('2');
ascii
---
50
(1 row)
```
This used to work in DataFusion, but it is not consistent with other
systems. I'm not sure whether we should introduce this behaviour back
##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -2133,4 +2133,77 @@ mod test {
assert_analyzed_plan_eq(Arc::new(TypeCoercion::new()), plan,
expected)?;
Ok(())
}
+
+/// Create a [`PhysicalExpr`] from an [`Expr`] after applying type
coercion.
+fn create_physical_expr_with_type_coercion(
+expr: Expr,
+df_schema: &DFSchema,
+) -> Result> {
+let props = ExecutionProps::default();
+let coerced_expr = expr
+.rewrite(&mut TypeCoercionRewriter::new(df_schema))?
+.data;
+let physical_expr = datafusion_physical_expr::create_physical_expr(
+&coerced_expr,
+df_schema,
+&props,
+)?;
+Ok(physical_expr)
+}
+
+fn evaluate_expr_with_array(
+expr: Expr,
+batch: RecordBatch,
+df_schema: &DFSchema,
+) -> Result {
+let physical_expr = create_physical_expr_with_type_coercion(expr,
df_schema)?;
+match physical_expr.evaluate(&batch)? {
+ColumnarValue::Array(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected array result in evaluate_expr_with_array"
+),
+}
+}
+
+fn evaluate_expr_with_scalar(expr: Expr) -> Result {
+let df_schema = DFSchema::empty();
+let physical_expr = create_physical_expr_with_type_coercion(expr,
&df_schema)?;
+match physical_expr
+.evaluate(&RecordBatch::new_empty(Arc::clone(df_schema.inner(?
+{
+ColumnarValue::Scalar(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected scalar result in evaluate_expr_with_scalar"
+),
+}
+}
+
+#[test]
+fn test_ascii_expr() -> Result<()> {
Review Comment:
We can remove this test if the purpose of the test is covered already in slt
##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -744,6 +706,57 @@ fn get_valid_types(
Ok(valid_types)
}
+/// Validates that all data types are either [`NativeType::String`] or
[`NativeType::Null`].
+/// For [`NativeType::Null`], returns [`DataType::Utf8`] as the default string
type.
+/// Returns error if any type is neither [`NativeType::String`] nor
[`NativeType::Null`].
+fn validate_and_collect_string_types(data_types: &[DataType]) ->
Result> {
+data_types
+.iter()
+.map(|data_type| {
+let logical_type: NativeType = data_type.into();
+match logical_type {
+NativeType::String => Ok(data_type.to_owned()),
+// TODO: Switch to Utf8View if all the string functions
supports Utf8View
+NativeType::Null => Ok(DataType::Utf8),
+_ => plan_err!("The signature expected NativeType::String but
received {logical_type}"),
+}
+})
+.collect()
+}
+
+/// Returns a common string [`DataType`] that both input types can be coerced
to.
+/// Handles [`DataType::Dictionary`] by recursively finding common type of
their value [`DataType`].
+/// Returns error if types cannot be coerced to a common string [`DataType`].
+fn find_common_string_type(lhs_type: &DataType, rhs_type: &DataType) ->
Result {
+match (lhs_type, rhs_type) {
+(DataType::Dictionary(_, lhs), DataType::Dictionary(_, rhs)) => {
+find_common_string_type(lhs, rhs)
+}
+(DataType::Dictionary(_, v), other) | (other, DataType::Dictionary(_,
v)) => {
+find_common_string_type(v, other)
+}
+_ => {
+if let Some(coerced_type) = string_coerc
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929701613
##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -2133,4 +2133,77 @@ mod test {
assert_analyzed_plan_eq(Arc::new(TypeCoercion::new()), plan,
expected)?;
Ok(())
}
+
+/// Create a [`PhysicalExpr`] from an [`Expr`] after applying type
coercion.
+fn create_physical_expr_with_type_coercion(
+expr: Expr,
+df_schema: &DFSchema,
+) -> Result> {
+let props = ExecutionProps::default();
+let coerced_expr = expr
+.rewrite(&mut TypeCoercionRewriter::new(df_schema))?
+.data;
+let physical_expr = datafusion_physical_expr::create_physical_expr(
+&coerced_expr,
+df_schema,
+&props,
+)?;
+Ok(physical_expr)
+}
+
+fn evaluate_expr_with_array(
+expr: Expr,
+batch: RecordBatch,
+df_schema: &DFSchema,
+) -> Result {
+let physical_expr = create_physical_expr_with_type_coercion(expr,
df_schema)?;
+match physical_expr.evaluate(&batch)? {
+ColumnarValue::Array(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected array result in evaluate_expr_with_array"
+),
+}
+}
+
+fn evaluate_expr_with_scalar(expr: Expr) -> Result {
+let df_schema = DFSchema::empty();
+let physical_expr = create_physical_expr_with_type_coercion(expr,
&df_schema)?;
+match physical_expr
+.evaluate(&RecordBatch::new_empty(Arc::clone(df_schema.inner(?
+{
+ColumnarValue::Scalar(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected scalar result in evaluate_expr_with_scalar"
+),
+}
+}
+
+#[test]
+fn test_ascii_expr() -> Result<()> {
Review Comment:
Ending up putting the tests in the `.slt` file, but figured we can still
leave this test here.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]
shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929529897
##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -2133,4 +2133,77 @@ mod test {
assert_analyzed_plan_eq(Arc::new(TypeCoercion::new()), plan,
expected)?;
Ok(())
}
+
+/// Create a [`PhysicalExpr`] from an [`Expr`] after applying type
coercion.
+fn create_physical_expr_with_type_coercion(
+expr: Expr,
+df_schema: &DFSchema,
+) -> Result> {
+let props = ExecutionProps::default();
+let coerced_expr = expr
+.rewrite(&mut TypeCoercionRewriter::new(df_schema))?
+.data;
+let physical_expr = datafusion_physical_expr::create_physical_expr(
+&coerced_expr,
+df_schema,
+&props,
+)?;
+Ok(physical_expr)
+}
+
+fn evaluate_expr_with_array(
+expr: Expr,
+batch: RecordBatch,
+df_schema: &DFSchema,
+) -> Result {
+let physical_expr = create_physical_expr_with_type_coercion(expr,
df_schema)?;
+match physical_expr.evaluate(&batch)? {
+ColumnarValue::Array(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected array result in evaluate_expr_with_array"
+),
+}
+}
+
+fn evaluate_expr_with_scalar(expr: Expr) -> Result {
+let df_schema = DFSchema::empty();
+let physical_expr = create_physical_expr_with_type_coercion(expr,
&df_schema)?;
+match physical_expr
+.evaluate(&RecordBatch::new_empty(Arc::clone(df_schema.inner(?
+{
+ColumnarValue::Scalar(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected scalar result in evaluate_expr_with_scalar"
+),
+}
+}
+
+#[test]
+fn test_ascii_expr() -> Result<()> {
Review Comment:
I would have preferred to place the various UDF tests within their
respective files, but I couldn't due to circular dependencies.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
