Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-18 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2667238494

   Close by #14440 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-18 Thread via GitHub


jayzhan211 closed pull request #14268: Fix Type Coercion for UDF Arguments
URL: https://github.com/apache/datafusion/pull/14268


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-06 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944712928


##
datafusion/expr-common/src/signature.rs:
##
@@ -376,7 +383,9 @@ impl TypeSignature {
 TypeSignature::Coercible(types) => types
 .iter()
 .map(|logical_type| match logical_type {
-TypeSignatureClass::Native(l) => 
get_data_types(l.native()),
+TypeSignatureClass::Native(l)
+| TypeSignatureClass::Numeric(l)
+| TypeSignatureClass::Integer(l) => 
get_data_types(l.native()),

Review Comment:
   `get_data_types` is used only in `get_possible_types`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-06 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944643137


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() && 
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received 
{current_type}"
+)
 }
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();
+if target_type.is_numeric() && 
logical_type.is_numeric() {
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received 
{current_type}"
+)
 }
-
+}
+TypeSignatureClass::Integer(native_type) => {
+let target_type = native_type.native();

Review Comment:
   `Ok(current_type.to_owned())`



##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() && 
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received 
{current_type}"
+)
 }
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();
+if target_type.is_numeric() && 
logical_type.is_numeric() {
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received 
{current_type}"
+)
 }
-
+}
+TypeSignatureClass::Integer(native_type) => {
+let target_type = native_type.native();

Review Comment:
   `Ok(current_type.to_owned())`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-06 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944643767


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() && 
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received 
{current_type}"
+)
 }
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();
+if target_type.is_numeric() && 
logical_type.is_numeric() {
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received 
{current_type}"
+)
 }
-
+}
+TypeSignatureClass::Integer(native_type) => {
+let target_type = native_type.native();

Review Comment:
   ```suggestion
Ok(current_type.to_owned())
   ```



##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null
+|| (target_type.is_numeric() && 
logical_type.is_numeric())
+{
+target_type.default_cast_for(current_type)
+} else {
+internal_err!(
+"Expect {target_type_class} but received 
{current_type}"
+)
 }
-
-if logical_type == NativeType::Null {
-return target_type.default_cast_for(current_type);
+}
+TypeSignatureClass::Numeric(native_type) => {
+let target_type = native_type.native();

Review Comment:
   ```suggestion
   Ok(current_type.to_owned())
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-06 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944640302


##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
 Interval,
 Duration,
 Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),
+Integer(LogicalTypeRef),

Review Comment:
   @shehabgamin I found that we might not need `LogicalTypeRef`.
   This is designed to accept all the Integer, so if the given type is integer, 
we keep it as it is.
   
   If we want specific integer type, then we should use Native instead. Does 
this makes sense to you?
   
   Numeric is the same



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1944122850


##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
 make_scalar_function(ascii, vec![])(args)
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   @jayzhan211 Forsure, let's make sure @alamb is okay with this too before I 
go ahead and make the change.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942524406


##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
 make_scalar_function(ascii, vec![])(args)
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   do we support integer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942560239


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null

Review Comment:
   I see



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942559050


##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
 make_scalar_function(ascii, vec![])(args)
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   Let's remove it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942537530


##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
 make_scalar_function(ascii, vec![])(args)
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   Up to you and @alamb



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942541549


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null

Review Comment:
   Only difference is that I condensed the 3 if statements into 1



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942535860


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null

Review Comment:
   @jayzhan211 Not sure I follow, I reverted back to the original logic as 
requested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942535860


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null

Review Comment:
   @jayzhan211 Not sure I follow, I reverted to the original logic as requested.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942530489


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +541,36 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {
-return target_type.default_cast_for(current_type);
+if &logical_type == target_type
+|| logical_type == NativeType::Null

Review Comment:
   This should be `target_type.default_cast_for(current_type)` only



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


jayzhan-synnada commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942523715


##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
 make_scalar_function(ascii, vec![])(args)
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   Do we support integer?



##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
 make_scalar_function(ascii, vec![])(args)
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   Do we support integer?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942525348


##
datafusion/functions/src/string/bit_length.rs:
##
@@ -106,6 +108,33 @@ impl ScalarUDFImpl for BitLengthFunc {
 }
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   same with this



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-05 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942524406


##
datafusion/functions/src/string/ascii.rs:
##
@@ -93,6 +95,33 @@ impl ScalarUDFImpl for AsciiFunc {
 make_scalar_function(ascii, vec![])(args)
 }
 
+fn coerce_types(&self, arg_types: &[DataType]) -> Result> {
+if arg_types.len() != 1 {
+return plan_err!(
+"The {} function requires 1 argument, but got {}.",
+self.name(),
+arg_types.len()
+);
+}
+
+let arg_type = &arg_types[0];
+let current_native_type: NativeType = arg_type.into();
+let target_native_type = NativeType::String;
+if current_native_type.is_integer()

Review Comment:
   should we support integer? It is not consistent with Postgres/DuckDB



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-04 Thread via GitHub


findepi commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2635850123

   > I fully recognize this creates behavior that diverges from 
PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, there’s a 
critical distinction: **System contracts vs. SQL dialect behavior.**
   > 
   > [...]
   > 
   > The current approach keeps the door open for both strict and permissive 
use cases through explicit signature choices. With these changes, users can 
still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing 
signatures while still leveraging broader coercion where appropriate.
   
   The function coercions are not enough for building a tailored system. 
Relational operators also do coercions (the set operators: union, intersect, 
except).
   
   In any case, for certain system designs -- those who take on responsibility 
of implementing their particular SQL dialect behavior before handing over the 
control over to DF core -- it's desirable to opt out from any coercion logic at 
all. @shehabgamin @linhr Given the Sail design, you might be interested in 
https://github.com/apache/datafusion/issues/12723. 
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-04 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942056824


##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
 Interval,
 Duration,
 Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),

Review Comment:
   Can we also avoid these `Numeric` and `Integer`? I'm thinking to deprecate 
this 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-04 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942056824


##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
 Interval,
 Duration,
 Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),

Review Comment:
   Can we also avoid these `Numeric` and `Integer`? I'm thinking to deprecate 
this 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-04 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1942056824


##
datafusion/expr-common/src/signature.rs:
##
@@ -213,9 +221,8 @@ pub enum TypeSignatureClass {
 Interval,
 Duration,
 Native(LogicalTypeRef),
-// TODO:
-// Numeric
-// Integer
+Numeric(LogicalTypeRef),

Review Comment:
   Can we also avoid these 2? I'm thinking to deprecate this 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-04 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2635031823

   > @jayzhan211 To keep it simple ill just remove AnyNative and use 
coerce_types so we don't block this PR any longer. We can have a larger 
discussion and align on goals afterwards!
   > 
   > CC @alamb @Omega359
   
   Done, this should be good to merge now. `AnyNative` was removed, `Native` 
was reverted back to original logic, and for the 6 UDFs in the PR 
`Signature::user_defined` was used.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-04 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2634598214

   @jayzhan211 To keep it simple ill just remove AnyNative and use coerce_types 
so we don't block this PR any longer. We can have a larger discussion and align 
on goals afterwards!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2633097400

   > This is the point, I don't quite understand why adding signature for 
`default_cast_to` logic helps. I thought you want to solve the issues for these 
datafusion functions, so I don't know what others functions you want to solve 
and they are only possible if we have `AnyNative`.
   >
   > @shehabgamin First of all, what are the issues we are solving? Are these 
functions in datafusion?
   > 
   > ```
   > ascii
   > bit_length
   > contains
   > ends_with
   > starts_with
   > octet_length
   > ```
   > 
   > Are there others issues? Why `AnyNative` helps?
   
   @jayzhan211 The regression is that all these functions before DataFusion 43 
would coerce:
   1. `Binary` -> `String`
   2. `Int` -> `String`
   
   But now they no longer do.
   
   I am trying to find some middle ground here. I am happy to implement any 
signature for the functions in this PR, but currently, `default_cast_for` is 
inaccessible for downstream users, adding `AnyNative` makes it accessible. Sail 
implements hundreds of custom UDFs, so it would be nice to have access to  
`default_cast_for`. We're not the only downstream project that has a need for 
for flexible coercion like this (see here: 
https://github.com/apache/datafusion/issues/14230#issuecomment-2631350057).
   
   > Even if there is such case that we really need AnyNative, it seems strange 
to me to add in TypeSignatureClass but should be another new TypeSignature.
   
   I'm not sure that `AnyNative` should be a new `TypeSignature` because 
`TypeSignature::Coercible` most accurately describes the behavior.
   
   > Btw, if your proposed signature doesn't used by any functions in 
datafusion. You should use TypeSignature::UserDefined it is used for any other 
customize logic for downstream.
   
   The point is that `default_cast_for` is not accessible downstream. Should we 
just make `default_cast_for` a public function?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2633007543

   Btw, if your proposed signature doesn't help for any functions in 
datafusion. You should use `TypeSignature::UserDefined` it is used for any 
other customize logic for downstream.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2633000378

   > if it helps downstream projects
   
   This is the point, I don't quite understand why adding signature for 
`default_cast_to` logic helps. I thought you want to solve the issues for these 
datafusion functions, so I don't know what others functions you want to solve 
and they are only possible if we have `AnyNative`.
   
   Even if there is such case that we really need `AnyNative`, it seems to 
strange to me to add in TypeSignatureClass but should be another new 
TypeSignature.
   
   > IMO CoercibleV2 would be out of scope for this PR
   
   I think this is the solution to the issue we have, we need coercible like 
signature but **not fixed logic exposed to the user** since it makes any 
changes to it breaking change, while `CoercibleV2` is not the case.
   
   @shehabgamin First of all, what are the issues we are solving? Are these 
functions in datafusion? 
   
   ```
   ascii
   bit_length
   contains
   ends_with
   starts_with
   octet_length
   ```
   
   Are there others issues? Why `AnyNative` helps?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632975463

   > As we have discussed, we should avoid using old Coercible signature and 
also the TypeSignatureClass that is used in Coercible, because any change might 
impact downstream projects, although if we add new `TypeSignatureClass` like 
what you did will not, but adding more soon to remove things are not good idea
   
   @jayzhan211 I thought you said it was okay to add a new signature if it 
helps downstream projects. See here:
   https://github.com/apache/datafusion/pull/14268#issuecomment-2629247219
   
   I reverted `Native` back to the original logic before this PR and then I 
added `AnyNative`.
   
   > I think we can work directly on CoercibleV2 I mentioned for these functions
   
   IMO `CoercibleV2` would be out of scope for this PR. It looks like @alamb 
would like to get this PR sorted out if possible for `45` (see here: 
https://github.com/apache/datafusion/issues/14008#issuecomment-2631357227)
   
   If I am understanding you correctly, you are okay with adding a new 
signature but not applying them to the UDFs in this PR? Should I apply 
User-defined coercion as you were mentioning earlier?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632952904

   > @jayzhan211 What did I miss here?
   
   As we have discussed, we should avoid using old Coercible signature and also 
the TypeSignatureClass that is used in Coercible. Therefore the change doesn't 
seem toward the correct path to me
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632940583

   > > I may add a new TypeSignatureClass in this pr to expose default_cast_for 
so that it's accessible by downstream users
   > 
   > Not sure do we need this πŸ€” since we might not need this in the future, but 
if this helps downstream project then it is fine to me to add it for now and 
remove it in the future
   
   > We can implement first version for the functions you mentioned, I believe 
the change makes more sense
   
   > I don't quite understand why we are adding more `TypeSignatureClass` here πŸ˜•
   
   @jayzhan211 What did I miss here?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632938494

   > > @shehabgamin #14440 I come out a flexible version of 
Signature::CoercibleV2 (temporary name), it can replace `Signature::Coercible`, 
`Signature::String`, `Signature::Numeric`, `Signature::Exact`, 
`Signature::Uniform` and `Signature::Comparable`.
   > > The most difference is that the `implicit coercion` is now defined by 
the user. This way any change for datafusion doesn't impact downstream projects 
unlike the existing signature like `Signature::String` or 
`Signature::Coercible`.
   > > I probably don't have time to push it forward in the recent days, if you 
are interested in it you can work on it.
   > 
   > I think it makes sense to work on this for DataFusion 46!
   > 
   > > We can implement first version for the functions you mentioned, I 
believe the change makes more sense
   > 
   > @jayzhan211 If I am understanding you correctly, Done! I added `AnyNative` 
and applied it to the functions mentioned.
   > 
   > CC @alamb @Omega359
   
   I don't quite understand why we are adding more `TypeSignatureClass` here πŸ˜• 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632894156

   > @shehabgamin #14440 I come out a flexible version of 
Signature::CoercibleV2 (temporary name), it can replace `Signature::Coercible`, 
`Signature::String`, `Signature::Numeric`, `Signature::Exact`, 
`Signature::Uniform` and `Signature::Comparable`.
   > 
   > The most difference is that the `implicit coercion` is now defined by the 
user. This way any change for datafusion doesn't impact downstream projects 
unlike the existing signature like `Signature::String` or 
`Signature::Coercible`.
   > 
   > I probably don't have time to push it forward in the recent days, if you 
are interested in it you can work on it.
   
   I think it makes sense to work on this for DataFusion 46!
   
   > We can implement first version for the functions you mentioned, I believe 
the change makes more sense
   
   @jayzhan211 If I am understanding you correctly, Done! I added `AnyNative` 
and applied it to the functions mentioned.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2632259185

   I'll work on this tonight @alamb @jayzhan211 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


alamb commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2631352326

   FWIW I think @Omega359  hit the same "int no longer automatically coerces to 
String" in his application too: 
https://github.com/apache/datafusion/issues/14230#issuecomment-2631350057


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-03 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2631043032

   @shehabgamin #14440 I come out a flexible version of Signature::CoercibleV2 
(temporary name), it can be replace `Signature::String`, `Signature::Numeric`, 
`Signature::Exact`, `Signature::Uniform` and `Signature::Comparable`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-02 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629300546

   > I'll get to this sometime in the next few hours. Taking a detour to review 
and comment on https://github.com/apache/datafusion/pull/14392.
   
   @jayzhan211 @alamb, I bit off more than I can chew here. I actually won’t be 
able to get to this until Monday night (PT). Hopefully it can still make it in 
time for the 45 release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629247219

   > > If the approach looks good to you too, we can go ahead!
   > 
   > Sounds good! This works for now since we'll be working towards a better 
design in the future.
   > 
   > I may add a new `TypeSignatureClass` in this pr to expose 
`default_cast_for` so that it's accessible by downstream users. I'll also 
update the doc comments to accurately reflect the behavior of `Coercible`.
   > 
   > I'll get to this sometime in the next few hours. Taking a detour to review 
and comment on https://github.com/apache/datafusion/pull/14392.
   > 
   > > We already have user defined coercion for functions, #14296 is for 
binary op or case expression and so on.
   > > 
   > > If we agree to use User-defined coercion for functions everywhere, 
modify coerce types are the easier fix that can go into release quicker. 
Another better design is to redesign a good method for the function trait, I 
will also think about the design too.
   > 
   > I think a redesign is the path forward. As long as we offer flexibility 
for customization (e.g., users can override the default signature), then it 
doesn't really matter what the default implemented signature is, because we 
won't have to worry about conflicting requirements.
   
   I guess we will remove signature::Coercible in the future.  Your purpose of 
adding signature that do 'default_cast_for' is just a quick workaround? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629245515

   > If the approach looks good to you too, we can go ahead!
   Sounds good! This works for now since we'll be working towards a better 
design in the future.
   
   I may add a new `TypeSignatureClass` in this pr to expose `default_cast_for` 
so that it's accessible by downstream users. I'll also update the doc comments 
to accurately reflect the behavior of `Coercible`.
   
   I'll get to this sometime in the next few hours. Taking a detour to review 
and comment on https://github.com/apache/datafusion/pull/14392.
   
   > We already have user defined coercion for functions, #14296 is for binary 
op or case expression and so on.
   > 
   > If we agree to use User-defined coercion for functions everywhere, modify 
coerce types are the easier fix that can go into release quicker. Another 
better design is to redesign a good method for the function trait, I will also 
think about the design too.
   
   I think a redesign is the path forward. As long as we offer flexibility for 
customization (e.g., users can override the default signature), then it doesn't 
really matter what the default implemented signature is, because we won't have 
to worry about conflicting requirements.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629240637

   > ### Topic 1
   > > @findepi I had thought about using Signature:: User-defined everywhere 
before but was rejected for some reason. As the discussion here, I think we 
need this to minimize the impact of any changes that is required for datafusion 
builtun function but not for other projects. Any thoughts to add in?
   > 
   > @alamb @jayzhan211 Are we waiting on @findepi to chime in, or should I 
just go ahead with this approach?
   > 
   > ### Topic 2
   > > It does seem to me that we keep churning / thrashing on coercion (and 
introducing regressions like the one this PR fixes along with 
https://github.com/apache/datafusion/issues/14154 and 
https://github.com/apache/datafusion/issues/14383
   > >
   > > To me this is a sign we need more work as @jayzhan211 says and I am sure 
we would appreciate help figuring out what to do / making the code better
   > 
   > I am happy to help out here! I think that we all want the same thing, but 
what's going on is that it's hard for all of us to be fully contextualized with 
everything so we end up going in circles.
   > 
   > @alamb @jayzhan211 Let's bring the conversation back here 
https://github.com/apache/datafusion/issues/14296 and try to get this topic 
resolved for DataFusion 46. I would also like to note that I'm always happy to 
hop on a call or do some sort of live messaging if that's ever desired.
   > 
   
   If the approach looks good to you too, we can go ahead! 
   
   We already have user defined coercion for functions, #14296 is for binary op 
or case expression and so on.
   
   If we agree to use User-defined coercion for functions everywhere, modify 
coerce types are the easier fix that can go into release quicker. Another 
better design is to redesign a good method for the function trait, I will also 
think about the design too.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629239585

   ### Topic 1
   > @findepi I had thought about using Signature:: User-defined everywhere 
before but was rejected for some reason. As the discussion here, I think we 
need this to minimize the impact of any changes that is required for datafusion 
builtun function but not for other projects. Any thoughts to add in?
   
   @alamb @jayzhan211 Are we waiting on @findepi to chime in, or should I just 
go ahead with this approach?
   
   ### Topic 2
   > It does seem to me that we keep churning / thrashing on coercion (and 
introducing regressions like the one this PR fixes along with 
https://github.com/apache/datafusion/issues/14154 and 
https://github.com/apache/datafusion/issues/14383
   >
   > To me this is a sign we need more work as @jayzhan211 says and I am sure 
we would appreciate help figuring out what to do / making the code better
   
   I am happy to help out here! I think that we all want the same thing, but 
what's going on is that it's hard for all of us to be fully contextualized with 
everything so we end up going in circles.
   
   @alamb @jayzhan211 Let's bring the conversation back here 
https://github.com/apache/datafusion/issues/14296 and try to get this topic 
resolved for DataFusion 46. I would also like to note that I'm always happy to 
hop on a call or do some sort of live messaging if that's ever desired.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


alamb commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2629035590

   Given I feel like we have not yet reached consensus on this issue I made a 
release branch for 45 without this change
   - https://github.com/apache/datafusion/issues/14008#issuecomment-2628923232
   
   I expect to make a release candidate tomorrow or monday so if we can get 
something that is safe to backport to 45 by then I can do so. Otherwise let's 
just fix it on main and include in version 46


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628939208

   > @shehabgamin can we try to use Singatur::User-defined coerce types for 
those functions, the change is relatively trivial and the impact is smaller 
than the current one
   
   @findepi I had thought about using Signature:: User-defined everywhere 
before but was rejected for some reason. As the discussion here, I think we 
need this to minimize the impact of any changes that is required for datafusion 
builtun function but not for other projects. Any thoughts to add in?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628936121

   @shehabgamin can we try to use coerce types for those functions, the change 
is relatively trivial and less breakage compare to the current one


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628913239

   > > So what is the actual regressions to be resolved? I don't remember 
signature coericible is consistent with the docs before so it is not a 
regression to be fixed now. If supporting binary for string is something used 
to work, then we can add such coercion logic to signature:: string
   > 
   > @jayzhan211 The regressions resolved are all the UDFs listed in the PR 
description. Every test added to this PR passed on DataFusion `42`, and then 
every test that tested coercion behavior started failing on DataFusion `43`.
   > 
   > Signature coericible not being consistent with the documentation is a big 
problem, this PR fixes that.
   > 
   > > Is there any way we can fix the regression bug with some sort of smaller 
patch and then make the changes to coerceable behavior as a follow on PR?
   > 
   > @alamb The changes to the coercible behavior are already very small and 
are outlined below.
   > ### Changes To Coercible Behavior
   > 
   > **1. We go from this:**
   > 
https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L585-L604
   > **To this:**
   > 
https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L542-L545
   > 
   > **2. Before this line:**
   > 
https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L640
   > **We add this:**
   > 
https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L581-L597
   > 
   > That's it.
   > 
   > Just now i've implemented `TypeSignatureClass::Integer` and updated 
`repeat` to use that. Beyond this, I'm really not sure how to move forward here.
   
   The documentation is wrong but we should fix the description in this case, 
because the Coercible is introduced not because of we need a very flexible 
coercion logic but the logic like it is now.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-02-01 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628900809

   > So what is the actual regressions to be resolved? I don't remember 
signature coericible is consistent with the docs before so it is not a 
regression to be fixed now. If supporting binary for string is something used 
to work, then we can add such coercion logic to signature:: string
   
   @jayzhan211 The regressions resolved are all the UDFs listed in the PR 
description. Every test added to this PR passed on DataFusion `42`, and then 
every test that tested coercion behavior started failing on DataFusion `43`.
   
   Signature coericible not being consistent with the documentation is a big 
problem, this PR fixes that.
   
   > Is there any way we can fix the regression bug with some sort of smaller 
patch and then make the changes to coerceable behavior as a follow on PR?
   
   @alamb The changes to the coercible behavior are already very small and are 
outlined below.
   ### Changes To Coercible Behavior
   
   1. We go from this:
   
https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L585-L604
   To this:
   
https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L542-L545
   
   2. Before this line:
   
https://github.com/apache/datafusion/blob/9d1bfc1bfb4b6fc59c36a391b21d5b4bb7191804/datafusion/expr/src/type_coercion/functions.rs#L640
   We add this:
   
https://github.com/apache/datafusion/blob/d0f3f9adec75bb80ef05dc6aca33f2e0621c9c3b/datafusion/expr/src/type_coercion/functions.rs#L581-L597
   
   I've implemented `TypeSignatureClass::Integer`and update repeat to use that. 
Beyond this, I'm really not sure how to move forward here.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-31 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628591052

   > > > I don't have a strong opinion about how exactly the coercion rules 
should be changed. 
   > > > 
   > > > 
   > > > 
   > > > It does seem to me that we keep churning / thrashing on coercion (and 
introducing regressions like the one this PR fixes along with 
https://github.com/apache/datafusion/issues/14154 and 
https://github.com/apache/datafusion/issues/14383
   > > > 
   > > > 
   > > > 
   > > > To me this is a sign we need more work as @jayzhan211 says and I am 
sure we would appreciate help figuring out what to do / making the code better
   > > > 
   > > > 
   > > > 
   > > > That being said, I would like to make the 45.0.0 release candidate 
soon (like later today or tomorrow) and I worry that making non trivial changes 
to the coercion logic so close may lead to other unintended consequences
   > > > 
   > > > 
   > > > 
   > > > Is there any way we can fix the regression bug with some sort of 
smaller patch and then make the changes to coerceable behavior as a follow on 
PR?  
   > > > 
   > > > 
   > > 
   > > @alamb I'm not near a computer until later tonight, but just lemme know 
what you'd like me to do and i'll take care of it when I'm back home! 
   > 
   > So what is the actual regressions to be resolved? I don't remember 
signature coericible is consistent with the docs before so it is not a 
regression to be fixed now. If supporting binary for string is something used 
to work, then we can add such coercion logic to signature:: string
   
   But change of signature::string may also cause another impact on others that 
don't want binary to string coercionπŸ˜…


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-31 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628583804

   > > I don't have a strong opinion about how exactly the coercion rules 
should be changed. 
   > > 
   > > 
   > > 
   > > It does seem to me that we keep churning / thrashing on coercion (and 
introducing regressions like the one this PR fixes along with 
https://github.com/apache/datafusion/issues/14154 and 
https://github.com/apache/datafusion/issues/14383
   > > 
   > > 
   > > 
   > > To me this is a sign we need more work as @jayzhan211 says and I am sure 
we would appreciate help figuring out what to do / making the code better
   > > 
   > > 
   > > 
   > > That being said, I would like to make the 45.0.0 release candidate soon 
(like later today or tomorrow) and I worry that making non trivial changes to 
the coercion logic so close may lead to other unintended consequences
   > > 
   > > 
   > > 
   > > Is there any way we can fix the regression bug with some sort of smaller 
patch and then make the changes to coerceable behavior as a follow on PR?  
   > > 
   > > 
   > 
   > @alamb I'm not near a computer until later tonight, but just lemme know 
what you'd like me to do and i'll take care of it when I'm back home! 
   
   So what is the actual regressions to be resolved? I don't remember signature 
coericible is consistent with the docs before so it is not a regression to be 
fixed now. If supporting binary for string is something used to work, then we 
can add such coercion logic to singature:: sitting 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-31 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2628446082

   > I don't have a strong opinion about how exactly the coercion rules should 
be changed. 
   > 
   > 
   > 
   > It does seem to me that we keep churning / thrashing on coercion (and 
introducing regressions like the one this PR fixes along with 
https://github.com/apache/datafusion/issues/14154 and 
https://github.com/apache/datafusion/issues/14383
   > 
   > 
   > 
   > To me this is a sign we need more work as @jayzhan211 says and I am sure 
we would appreciate help figuring out what to do / making the code better
   > 
   > 
   > 
   > That being said, I would like to make the 45.0.0 release candidate soon 
(like later today or tomorrow) and I worry that making non trivial changes to 
the coercion logic so close may lead to other unintended consequences
   > 
   > 
   > 
   > Is there any way we can fix the regression bug with some sort of smaller 
patch and then make the changes to coerceable behavior as a follow on PR?  
   > 
   > 
   
   @alamb I'm not near a computer until later tonight, but just lemme know what 
you'd like me to do and i'll take care of it when I'm back home! 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-31 Thread via GitHub


alamb commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2627866762

   I don't have a strong opinion about how exactly the coercion rules should be 
changed. 
   
   It does seem to me that we keep churning / thrashing on coercion (and 
introducing regressions like the one this PR fixes along with 
https://github.com/apache/datafusion/issues/14154 and 
https://github.com/apache/datafusion/issues/14383
   
   To me this is a sign we need more work as @jayzhan211 says and I am sure we 
would appreciate help figuring out what to do / making the code better
   
   That being said, I would like to make the 45.0.0 release candidate soon 
(like later today or tomorrow) and I worry that making non trivial changes to 
the coercion logic so close may lead to other unintended consequences
   
   Is there any way we can fix the regression bug with some sort of smaller 
patch and then make the changes to coerceable behavior as a follow on PR?  
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-31 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626728678

   > First of all, we need to have a out of box behavior for each function, and 
at the same time easy to be customizable. The Signature Coercible now becomes 
problematic, if we follows the system contract, it is not usable internally in 
datafusion core, but if we adapt it to function in datafusion core, it becomes 
less flexible and even easy to cause surprised regressions for other projects 
that rely on the old signature logic.
   
   My guess is that it would still be usable internally by some functions in 
datafusion core. This is because coercion behavior can often be unique to a 
specific UDF.
   
   > The idea I have so far is that how about make Signature a method of trait 
at all, each function is able to implement it's own signature check, it not 
only is flexible by default and also any change to it is safely not to impact 
others. To make coercion less repeated and easy to use we provide many small 
and well defined utility functions like string coercion or numeric coercion 
both used internally and public to others.
   
   This would be out of scope for this PR, but I think it aligns pretty well 
with the discussion we were having here:
   https://github.com/apache/datafusion/issues/14296#issuecomment-2614337663
   
   For the purpose of this PR, what are your thoughts about:
   1. Keeping the changes to the coercible behavior. This follows the system 
contract and makes the signature consistent with the documentation.
   2. Using different signatures (or creating new signatures) for the affected 
UDFs in this PR.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-30 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626205413

   > > > @jayzhan211 and @shehabgamin  what is the status of this PR? It looks 
to me like there are some unresolved comments
   > > > It looks like there are some unresolved comments like [#14268 
(comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432)
 from @jayzhan211 that suggest this PR is being too aggressive in coercion
   > > > I would like to try and get this in for the fhe DataFusion 45 release 
so it would be nice if we can figure out how to fix the regressions
   > > 
   > > There are concerns unresolved
   > 
   > @alamb and @jayzhan211 - Thank you for your patience and thoughtful 
discussion. Sorry for being quiet here, I’ve spent time reflecting on how to 
reconcile the competing priorities over the past couple of days.
   > 
   > Let me clarify the core principles driving this PR.
   > 
   > This PR aims to restore DataFusion's documented type coercion contracts - 
specifically the semantics defined for 
`TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging 
the existing casting logic in `NativeType::default_cast_for`.
   > 
   > The current implementation strictly follows the guidance in: 
https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134
 where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type 
castable to the target `NativeType` through the explicit set of type conversion 
rules defined in `NativeType::default_cast_for`.
   > 
   > I fully recognize this creates behavior that diverges from 
PostgreSQL/DuckDB semantics for the various UDFs in this PR. However, there’s a 
critical distinction: **System contracts vs. SQL dialect behavior.**
   > 
   > While PostgreSQL/DuckDB-style strictness is desirable for specific use 
cases, subverting the type system’s foundational contracts would result in:
   > 
   > 1. **Lost flexibility:** Users lose the ability to intentionally design 
UDFs with broader NativeType coercion patterns when needed. The system becomes 
opinionated in ways that aren't transparent.
   > 2. **Hidden couplings:** Tightly linking coercion rules to individual 
function implementations makes behavior unpredictable across optimizer rules. 
What works for `ascii()` might break `concat()` unexpectedly.
   > 3. **Eroded system integrity:** Undocumented exceptions to 
`NativeType::default_cast_for` in 
`Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion 
context-dependent and impossible to maintain and reason about holistically.
   > 
   > The current approach keeps the door open for both strict and permissive 
use cases through explicit signature choices. With these changes, users can 
still achieve PostgreSQL/DuckDB behavior by defining UDFs with existing 
signatures while still leveraging broader coercion where appropriate. 
Conversely, hardcoding dialect-specific restrictions at the type system level 
would limit DataFusion’s flexibility and usability as a polyglot execution 
engine.
   > 
   > If we bypass these contracts to enforce function-specific rules then we:
   > 
   > 1. **Break documented behavior:** The system no longer behaves as its APIs 
and documentation promise, creating confusion.
   > 2. **Misalign naming and semantics**: A 
`Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid 
coercions via `NativeType::default_cast_for` becomes a contradiction in terms.
   > 
   > I don't feel strongly about what signatures each individual UDF in this PR 
should implement, but I do feel strongly about maintaining the documented type 
coercion contracts and system semantics that define how DataFusion's type 
system is intended to work. The behavior of `TypeSignatureClass::Native` and 
`NativeType::default_cast_for` is explicitly documented, and their naming 
clearly signals their purpose.
   
   I think you totally point out the challenge we have for function's type 
coercion, the large difference between each function makes it hard to design a 
one size fit all solution but if we all in flexibility (with 'coerce_types' we 
ends up similar and repeated coercion logic everywhere.
   
   First of all, we need to have a out of box behavior for each function, and 
at the same time easy to be customizable. The Signature Coercible now becomes 
problematic, if we follows the system contract, it is not usable internally in 
datafusion core, but if we adapt it to function in datafusion core, it becomes 
less flexible and even easy to cause surprised regressions for other projects 
that rely on the old signature logic.
   
   The idea I have so far is that how about make Signature a method of trait at 
all, each function is able to implement it's own signature check, it not only 
is flexible by default and also any change to it is safely not to impact 
others. To make co

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-30 Thread via GitHub


shehabgamin commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626079378

   > > @jayzhan211 and @shehabgamin  what is the status of this PR? It looks to 
me like there are some unresolved comments
   > > It looks like there are some unresolved comments like [#14268 
(comment)](https://github.com/apache/datafusion/pull/14268#discussion_r1933063432)
 from @jayzhan211 that suggest this PR is being too aggressive in coercion
   > > I would like to try and get this in for the fhe DataFusion 45 release so 
it would be nice if we can figure out how to fix the regressions
   > 
   > There are concerns unresolved
   
   @alamb and @jayzhan211 - Thank you for your patience and thoughtful 
discussion. Sorry for being quiet here, I’ve spent time reflecting on how to 
reconcile the competing priorities over the past couple of days.
   Let me clarify the core principles driving this PR.
   
   This PR aims to restore DataFusion's documented type coercion contracts - 
specifically the semantics defined for 
`TypeSignature::Coercible(vec![TypeSignatureClass::Native(...)])` by leveraging 
the existing casting logic in `NativeType::default_cast_for`.
   
   The current implementation strictly follows the guidance in 
https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134,
 where `Coercible(vec![TypeSignatureClass::Native(...)])` accepts any type 
castable to the target `NativeType` through the explicit set of type conversion 
rules defined in `NativeType::default_cast_for`.
   
   I fully recognize this creates behavior that diverges from PostgreSQL/DuckDB 
semantics for the various UDFs in this PR. However, there’s a critical 
distinction: **System contracts vs. SQL dialect behavior.**
   
   While PostgreSQL/DuckDB-style strictness is desirable for specific use 
cases, subverting the type system’s foundational contracts would result in:
   
   1. **Lost flexibility:** Users lose the ability to intentionally design UDFs 
with broader NativeType coercion patterns when needed. The system becomes 
opinionated in ways that aren't transparent.
   2. **Hidden couplings:** Tightly linking coercion rules to individual 
function implementations makes behavior unpredictable across optimizer rules. 
What works for `ascii()` might break `concat()` unexpectedly.
   3. **Eroded system integrity:** Undocumented exceptions to 
`NativeType::default_cast_for` in 
`Coercible(vec![TypeSignatureClass::Native(...)])` would make type coercion 
context-dependent and impossible to maintain and reason about holistically.
   
   The current approach keeps the door open for both strict and permissive use 
cases through explicit signature choices. With these changes, users can still 
achieve PostgreSQL/DuckDB behavior by defining UDFs with existing signatures 
while still leveraging broader coercion where appropriate. Conversely, 
hardcoding dialect-specific restrictions at the type system level would limit 
DataFusion’s flexibility and usability as a polyglot execution engine.
   
   If we bypass these contracts to enforce function-specific rules then we:
   
   1. **Break documented behavior:** The system no longer behaves as its APIs 
and documentation promise, creating confusion.
   2. **Misalign naming and semantics**: A 
`Coercible(vec![TypeSignatureClass::Native(...)])` signature that rejects valid 
coercions via `NativeType::default_cast_for` becomes a contradiction in terms.
   
   I don't feel strongly about what signatures each individual UDF in this PR 
should implement, but I do feel strongly about maintaining the documented type 
coercion contracts and system semantics that define how DataFusion's type 
system is intended to work. The behavior of `TypeSignatureClass::Native` and 
`NativeType::default_cast_for` is explicitly documented, and their naming 
clearly signals their purpose.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-30 Thread via GitHub


jayzhan211 commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2626045546

   > @jayzhan211 and @shehabgamin  what is the status of this PR? It looks to 
me like there are some unresolved comments 
   > 
   > It looks like there are some unresolved comments like 
https://github.com/apache/datafusion/pull/14268#discussion_r1933063432 from 
@jayzhan211 that suggest this PR is being too aggressive in coercion
   > 
   > I would like to try and get this in for the fhe DataFusion 45 release so 
it would be nice if we can figure out how to fix the regressions 
   
   There are concerns unresolved


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-30 Thread via GitHub


alamb commented on PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#issuecomment-2625430773

   @jayzhan211 and @shehabgamin  what is the status of this PR? It looks to me 
like there are some unresolved comments 
   
   It looks like there are some unresolved comments like 
https://github.com/apache/datafusion/pull/14268#discussion_r1933063432 from 
@jayzhan211 that suggest this PR is being too aggressive in coercion
   
   I would like to try and get this in for the fhe DataFusion 45 release so it 
would be nice if we can figure out how to fix the regressions 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-30 Thread via GitHub


alamb commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1936187986


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -571,8 +601,10 @@ select repeat('-1.2', arrow_cast(3, 'Int32'));
 
 -1.2-1.2-1.2
 
-query error DataFusion error: Error during planning: Internal error: Expect 
TypeSignatureClass::Native\(LogicalType\(Native\(Int64\), Int64\)\) but 
received Float64
+query T
 select repeat('-1.2', 3.2);

Review Comment:
   In main this query does an internal error
   ```sql
   > select repeat('foo', 3.2);
   Error during planning: Internal error: Expect 
TypeSignatureClass::Native(LogicalType(Native(Int64), Int64)) but received 
Float64.
   This was likely caused by a bug in DataFusion's code and we would welcome 
that you file an bug report in our issue tracker No function matches the given 
name and argument types 'repeat(Utf8, Float64)'. You might need to add explicit 
type casts.
Candidate functions:
repeat(TypeSignatureClass::Native(LogicalType(Native(String), String)), 
TypeSignatureClass::Native(LogicalType(Native(Int64), Int64)))
   ```
   
   Avoiding an internal error seems like an improvement to me
   
   I agree in general it would likely be better to help people avoid an error 
by rejecting non-integer arguments with a nicer error
   
   
   Filed 
   - https://github.com/apache/datafusion/issues/14376
   
   I don't understand the implementation implications however 



##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('πŸ’―a')
 
 128175
 
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)

Review Comment:
   I think for now this is ok in my opinion as there is no spark functions 
crate yet 😬 Maybe this is yet another example of how it is needed
   
   - https://github.com/apache/datafusion/issues/5600



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933110734


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   As a rule of thumb, stick to either Postgres or DuckDB for functions, and 
avoid overly broad type casting that allows any type to convert to any other. 
This can turn invalid queries into valid ones, making errors harder to catch, 
especially when tests don’t cover all cases.
   
   Based on that,
   1. Cast from Binary to String πŸ‘πŸ» 
   2. Cast from Numeric to String πŸ‘ŽπŸ»  (At least for the string functions 
changed in this PR)
   3. Cast from Float to Integer πŸ‘ŽπŸ» (At least not for `repeat`) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933110734


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   As a rule of thumb, stay consistent with either Postgres or DuckDB when 
using functions. Avoid overly general casting that allows any type to be cast 
to any other type, as this can turn invalid queries into valid ones. This makes 
it harder to catch errors, especially when tests don't cover all cases
   
   Based on that,
   1. Cast from Binary to String πŸ‘πŸ» 
   2. Cast from Numeric to String πŸ‘ŽπŸ»  (At least for the string functions 
changed in this PR)
   3. Cast from Float to Integer πŸ‘ŽπŸ» (At least not for `repeat`) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933110734


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   The rule of thumb is that try to be consistent with either Postgres and 
DuckDB for the functions and don't make the casting too general that cast any 
type to any type since this turns invalid query into valid one and it is not 
easy to know that given the test is not covered.
   
   Based on that,
   1. Cast from Binary to String πŸ‘πŸ» 
   2. Cast from Numeric to String πŸ‘ŽπŸ»  (At least for the string functions 
changed in this PR)
   3. Cast from Float to Integer πŸ‘ŽπŸ» (At least not for `repeat`) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933063432


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   @shehabgamin 
   I suggest with revert to the previous change and add binary->string casting 
given most of the string function allow this kind of casting, but not 
int->string like `ascii(2)`
   
   For Signature::String, we can make it a convenient wrapper of 
`Coercible(string, string)` or deprecate it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933056565


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   > `TypeSignature::Coercible` is designed to cast between the same logical 
type, but now it casts to any kind of type, I don't think this is ideal.
   
   This is what the doc comments say:
   ```
   /// One or more arguments belonging to the [`TypeSignatureClass`], in order.
   ///
   /// For example, `Coercible(vec![logical_float64()])` accepts
   /// arguments like `vec![Int32]` or `vec![Float32]`
   /// since i32 and f32 can be cast to f64
   ///
   /// For functions that take no arguments (e.g. `random()`) see 
[`TypeSignature::Nullary`].
   Coercible(Vec),
   ```
   Link:
   
https://github.com/apache/datafusion/blob/a4917d44c8cc3b08c61383320285d239250dd94e/datafusion/expr-common/src/signature.rs#L127-L134



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933055108


##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
 impl ContainsFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(2, Volatility::Immutable),

Review Comment:
   For `Signature::string` all input args must be `String` or `Dictionary` with 
a `String` value (Unchanged in this PR). For `Signature::Coercible(string)`, 
the input args must be able to cast to a `String` (dictated via 
`default_cast_for`).



##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
 impl ContainsFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(2, Volatility::Immutable),

Review Comment:
   For `Signature::string(x)` all input args must be `String` or `Dictionary` 
with a `String` value (Unchanged in this PR). For 
`Signature::Coercible(string)`, the input args must be able to cast to a 
`String` (dictated via `default_cast_for`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933060341


##
datafusion/functions/src/string/bit_length.rs:
##
@@ -55,7 +58,10 @@ impl Default for BitLengthFunc {
 impl BitLengthFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(1, Volatility::Immutable),
+signature: Signature::coercible(

Review Comment:
   See here:
   https://github.com/apache/datafusion/pull/14268#discussion_r1933055108
   
   But yes I agree, the documentation should be clearer. I had to dig into the 
code to get a solid understanding of the behavior.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933056565


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   > `TypeSignature::Coercible` is designed to cast between the same logical 
type, but now it casts to any kind of type, I don't think this is ideal.
   
   This is what the doc comments say:
   ```
   /// One or more arguments belonging to the [`TypeSignatureClass`], in order.
   ///
   /// For example, `Coercible(vec![logical_float64()])` accepts
   /// arguments like `vec![Int32]` or `vec![Float32]`
   /// since i32 and f32 can be cast to f64
   ///
   /// For functions that take no arguments (e.g. `random()`) see 
[`TypeSignature::Nullary`].
   Coercible(Vec),
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933055108


##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
 impl ContainsFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(2, Volatility::Immutable),

Review Comment:
   For `Signature::string` all input args must be `String` or `Dictionary` with 
a `String` value. For `Signature::Coercible`, the input args must be able to 
cast to a `String` (dictated via `default_cast_for`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933055108


##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
 impl ContainsFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(2, Volatility::Immutable),

Review Comment:
   For `Signature::string` all input args must be a `String` or Dictionary with 
`String` value. For `Signature::Coercible`, the input args must be able to cast 
to a `String` (dictated via `default_cast_for`).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933052292


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   For example, in this PR, `SELECT bit_length(12);` now return 16 instead of 
error, but I think we should error. Any unexpected type is valid which doesn't 
seem correct



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933048065


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   `TypeSignature::Coercible` is designed to cast between the same logical 
type, but now it cast any kind of type, I don't think this is ideal.



##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   `TypeSignature::Coercible` is designed to cast between the same logical 
type, but now it casts to any kind of type, I don't think this is ideal.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933034910


##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
 impl ContainsFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(2, Volatility::Immutable),

Review Comment:
   Actually I'm quite confuse why we need to change to coercible(string, 
string) in this PR, `String(2)` is equivalent to `Coercible(string, string)` in 
my mind, so I don't quite follow the change at all 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933031384


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -571,8 +601,10 @@ select repeat('-1.2', arrow_cast(3, 'Int32'));
 
 -1.2-1.2-1.2
 
-query error DataFusion error: Error during planning: Internal error: Expect 
TypeSignatureClass::Native\(LogicalType\(Native\(Int64\), Int64\)\) but 
received Float64
+query T
 select repeat('-1.2', 3.2);

Review Comment:
   I still think non-integer 2nd arg should return error



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1933030522


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('πŸ’―a')
 
 128175
 
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)

Review Comment:
   Should this belongs to Spark-functions crate instead of datafusion core?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-28 Thread via GitHub


alamb commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1932866009


##
datafusion/functions/src/string/bit_length.rs:
##
@@ -55,7 +58,10 @@ impl Default for BitLengthFunc {
 impl BitLengthFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(1, Volatility::Immutable),
+signature: Signature::coercible(

Review Comment:
   as a minor comment this new signature is quite a bit of a mouthful compared 
to `Signature::string`
   
   It isn't clear to me from reading the code / comments when one would prefer 
to use one over the other (not related to this PR)



##
datafusion/functions/src/string/contains.rs:
##
@@ -59,7 +64,13 @@ impl Default for ContainsFunc {
 impl ContainsFunc {
 pub fn new() -> Self {
 Self {
-signature: Signature::string(2, Volatility::Immutable),

Review Comment:
   @jayzhan211  / @notfilippo  should we deprecate `Signature::string` and 
direct people to `Signature::coerceable`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929718209


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('πŸ’―a')
 
 128175
 
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)

Review Comment:
   The behavior is consistent with Spark and the tests also come from the Spark 
codebase (which we ported to Sail then here lol).
   ```
   spark-sql (default)> SELECT ASCII(2);
   50
   Time taken: 0.952 seconds, Fetched 1 row(s)
   spark-sql (default)> SELECT ASCII('2');
   50
   Time taken: 0.044 seconds, Fetched 1 row(s)
   spark-sql (default)>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929718209


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('πŸ’―a')
 
 128175
 
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)

Review Comment:
   The behavior is consistent with Spark, the tests come from the Spark 
codebase (which we ported to Sail then here too lol) as well.
   ```
   spark-sql (default)> SELECT ASCII(2);
   50
   Time taken: 0.952 seconds, Fetched 1 row(s)
   spark-sql (default)> SELECT ASCII('2');
   50
   Time taken: 0.044 seconds, Fetched 1 row(s)
   spark-sql (default)>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929723820


##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -584,23 +544,7 @@ fn get_valid_types(
 match target_type_class {
 TypeSignatureClass::Native(native_type) => {
 let target_type = native_type.native();
-if &logical_type == target_type {

Review Comment:
   If it's `TypeSignature::Coercible` with a 
`TypeSignatureClass::Native(logical_string())`, then yes. Any function that 
specifies `TypeSignature::Coercible` with a `TypeSignatureClass::Native` should 
coerce according to the behavior implemented in the `default_cast_for` function 
for `NativeType` in order to be consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929721662


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -571,8 +601,10 @@ select repeat('-1.2', arrow_cast(3, 'Int32'));
 
 -1.2-1.2-1.2
 
-query error DataFusion error: Error during planning: Internal error: Expect 
TypeSignatureClass::Native\(LogicalType\(Native\(Int64\), Int64\)\) but 
received Float64
+query T
 select repeat('-1.2', 3.2);

Review Comment:
   Maybe the 2nd arg should be `TypeSignatureClass::Integer` instead of 
`TypeSignatureClass::Native(logical_int64())`? It's currently unimplemented 
with a `TODO`: 
https://github.com/apache/datafusion/blob/f77579108d1dc0285636fbfb24507d2bfca66446/datafusion/expr-common/src/signature.rs#L216-L218
   
   Otherwise, it would seem strange to deviate from the behavior specified in 
the implementation of `default_cast_for`for `NativeType`.
   
   I do think that we should prioritize being permissive and general-purpose 
though as I was discussing here: 
https://github.com/apache/datafusion/issues/14296#issuecomment-2614200125



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929718209


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('πŸ’―a')
 
 128175
 
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)

Review Comment:
   The behavior is consistent with Spark, the tests come from the Spark 
codebase as well.
   ```
   spark-sql (default)> SELECT ASCII(2);
   50
   Time taken: 0.952 seconds, Fetched 1 row(s)
   spark-sql (default)> SELECT ASCII('2');
   50
   Time taken: 0.044 seconds, Fetched 1 row(s)
   spark-sql (default)>
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub


jayzhan211 commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929710139


##
datafusion/sqllogictest/test_files/expr.slt:
##
@@ -344,6 +344,16 @@ SELECT ascii('πŸ’―a')
 
 128175
 
+query I
+SELECT ascii('222')
+
+50
+
+query I
+SELECT ascii(2)

Review Comment:
   In DuckDB, this doesn't work
   ```
   D select ascii(2);
   Binder Error: No function matches the given name and argument types 
'ascii(INTEGER_LITERAL)'. You might need to add explicit type casts.
Candidate functions:
ascii(VARCHAR) -> INTEGER
   
   LINE 1: select ascii(2);
  ^
   D select ascii('2');
   β”Œβ”
   β”‚ ascii('2') β”‚
   β”‚   int32β”‚
   β”œβ”€
   β”‚ 50 β”‚
   β””β”˜
   ```
   
   So does Postgres
   
   ```
   postgres=# select ascii(2);
   ERROR:  function ascii(integer) does not exist
   LINE 1: select ascii(2);
  ^
   HINT:  No function matches the given name and argument types. You might need 
to add explicit type casts.
   postgres=# select ascii('2');
ascii 
   ---
   50
   (1 row)
   ```
   
   This used to work in DataFusion, but it is not consistent with other 
systems. I'm not sure whether we should introduce this behaviour back



##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -2133,4 +2133,77 @@ mod test {
 assert_analyzed_plan_eq(Arc::new(TypeCoercion::new()), plan, 
expected)?;
 Ok(())
 }
+
+/// Create a [`PhysicalExpr`] from an [`Expr`] after applying type 
coercion.
+fn create_physical_expr_with_type_coercion(
+expr: Expr,
+df_schema: &DFSchema,
+) -> Result> {
+let props = ExecutionProps::default();
+let coerced_expr = expr
+.rewrite(&mut TypeCoercionRewriter::new(df_schema))?
+.data;
+let physical_expr = datafusion_physical_expr::create_physical_expr(
+&coerced_expr,
+df_schema,
+&props,
+)?;
+Ok(physical_expr)
+}
+
+fn evaluate_expr_with_array(
+expr: Expr,
+batch: RecordBatch,
+df_schema: &DFSchema,
+) -> Result {
+let physical_expr = create_physical_expr_with_type_coercion(expr, 
df_schema)?;
+match physical_expr.evaluate(&batch)? {
+ColumnarValue::Array(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected array result in evaluate_expr_with_array"
+),
+}
+}
+
+fn evaluate_expr_with_scalar(expr: Expr) -> Result {
+let df_schema = DFSchema::empty();
+let physical_expr = create_physical_expr_with_type_coercion(expr, 
&df_schema)?;
+match physical_expr
+.evaluate(&RecordBatch::new_empty(Arc::clone(df_schema.inner(?
+{
+ColumnarValue::Scalar(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected scalar result in evaluate_expr_with_scalar"
+),
+}
+}
+
+#[test]
+fn test_ascii_expr() -> Result<()> {

Review Comment:
   We can remove this test if the purpose of the test is covered already in slt



##
datafusion/expr/src/type_coercion/functions.rs:
##
@@ -744,6 +706,57 @@ fn get_valid_types(
 Ok(valid_types)
 }
 
+/// Validates that all data types are either [`NativeType::String`] or 
[`NativeType::Null`].
+/// For [`NativeType::Null`], returns [`DataType::Utf8`] as the default string 
type.
+/// Returns error if any type is neither [`NativeType::String`] nor 
[`NativeType::Null`].
+fn validate_and_collect_string_types(data_types: &[DataType]) -> 
Result> {
+data_types
+.iter()
+.map(|data_type| {
+let logical_type: NativeType = data_type.into();
+match logical_type {
+NativeType::String => Ok(data_type.to_owned()),
+// TODO: Switch to Utf8View if all the string functions 
supports Utf8View
+NativeType::Null => Ok(DataType::Utf8),
+_ => plan_err!("The signature expected NativeType::String but 
received {logical_type}"),
+}
+})
+.collect()
+}
+
+/// Returns a common string [`DataType`] that both input types can be coerced 
to.
+/// Handles [`DataType::Dictionary`] by recursively finding common type of 
their value [`DataType`].
+/// Returns error if types cannot be coerced to a common string [`DataType`].
+fn find_common_string_type(lhs_type: &DataType, rhs_type: &DataType) -> 
Result {
+match (lhs_type, rhs_type) {
+(DataType::Dictionary(_, lhs), DataType::Dictionary(_, rhs)) => {
+find_common_string_type(lhs, rhs)
+}
+(DataType::Dictionary(_, v), other) | (other, DataType::Dictionary(_, 
v)) => {
+find_common_string_type(v, other)
+}
+_ => {
+if let Some(coerced_type) = string_coerc

Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-26 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929701613


##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -2133,4 +2133,77 @@ mod test {
 assert_analyzed_plan_eq(Arc::new(TypeCoercion::new()), plan, 
expected)?;
 Ok(())
 }
+
+/// Create a [`PhysicalExpr`] from an [`Expr`] after applying type 
coercion.
+fn create_physical_expr_with_type_coercion(
+expr: Expr,
+df_schema: &DFSchema,
+) -> Result> {
+let props = ExecutionProps::default();
+let coerced_expr = expr
+.rewrite(&mut TypeCoercionRewriter::new(df_schema))?
+.data;
+let physical_expr = datafusion_physical_expr::create_physical_expr(
+&coerced_expr,
+df_schema,
+&props,
+)?;
+Ok(physical_expr)
+}
+
+fn evaluate_expr_with_array(
+expr: Expr,
+batch: RecordBatch,
+df_schema: &DFSchema,
+) -> Result {
+let physical_expr = create_physical_expr_with_type_coercion(expr, 
df_schema)?;
+match physical_expr.evaluate(&batch)? {
+ColumnarValue::Array(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected array result in evaluate_expr_with_array"
+),
+}
+}
+
+fn evaluate_expr_with_scalar(expr: Expr) -> Result {
+let df_schema = DFSchema::empty();
+let physical_expr = create_physical_expr_with_type_coercion(expr, 
&df_schema)?;
+match physical_expr
+.evaluate(&RecordBatch::new_empty(Arc::clone(df_schema.inner(?
+{
+ColumnarValue::Scalar(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected scalar result in evaluate_expr_with_scalar"
+),
+}
+}
+
+#[test]
+fn test_ascii_expr() -> Result<()> {

Review Comment:
   Ending up putting the tests in the `.slt` file, but figured we can still 
leave this test here.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Fix Type Coercion for UDF Arguments [datafusion]

2025-01-25 Thread via GitHub


shehabgamin commented on code in PR #14268:
URL: https://github.com/apache/datafusion/pull/14268#discussion_r1929529897


##
datafusion/optimizer/src/analyzer/type_coercion.rs:
##
@@ -2133,4 +2133,77 @@ mod test {
 assert_analyzed_plan_eq(Arc::new(TypeCoercion::new()), plan, 
expected)?;
 Ok(())
 }
+
+/// Create a [`PhysicalExpr`] from an [`Expr`] after applying type 
coercion.
+fn create_physical_expr_with_type_coercion(
+expr: Expr,
+df_schema: &DFSchema,
+) -> Result> {
+let props = ExecutionProps::default();
+let coerced_expr = expr
+.rewrite(&mut TypeCoercionRewriter::new(df_schema))?
+.data;
+let physical_expr = datafusion_physical_expr::create_physical_expr(
+&coerced_expr,
+df_schema,
+&props,
+)?;
+Ok(physical_expr)
+}
+
+fn evaluate_expr_with_array(
+expr: Expr,
+batch: RecordBatch,
+df_schema: &DFSchema,
+) -> Result {
+let physical_expr = create_physical_expr_with_type_coercion(expr, 
df_schema)?;
+match physical_expr.evaluate(&batch)? {
+ColumnarValue::Array(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected array result in evaluate_expr_with_array"
+),
+}
+}
+
+fn evaluate_expr_with_scalar(expr: Expr) -> Result {
+let df_schema = DFSchema::empty();
+let physical_expr = create_physical_expr_with_type_coercion(expr, 
&df_schema)?;
+match physical_expr
+.evaluate(&RecordBatch::new_empty(Arc::clone(df_schema.inner(?
+{
+ColumnarValue::Scalar(result) => Ok(result),
+_ => datafusion_common::internal_err!(
+"Expected scalar result in evaluate_expr_with_scalar"
+),
+}
+}
+
+#[test]
+fn test_ascii_expr() -> Result<()> {

Review Comment:
   I would have preferred to place the various UDF tests within their 
respective files, but I couldn't due to circular dependencies.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]