Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-08-13 Thread via GitHub


github-actions[bot] closed pull request #15980: Additional placeholder datatype 
inferencing
URL: https://github.com/apache/datafusion/pull/15980


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-08-04 Thread via GitHub


github-actions[bot] commented on PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#issuecomment-3153035994

   Thank you for your contribution. Unfortunately, this pull request is stale 
because it has been open 60 days with no activity. Please remove the stale 
label or comment or this will be closed in 7 days.





Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-06-03 Thread via GitHub


alamb commented on PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#issuecomment-2936183106

   Marking as draft as I think this PR is no longer waiting on feedback and I 
am trying to make it easier to find PRs in need of review. Please mark it as 
ready for review when it is ready for another look.





Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-30 Thread via GitHub


alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2115997415


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   Thank you 🙏 






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-29 Thread via GitHub


kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2114359645


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   Yeah, I agree with you. I got caught up working on some other things, but 
I've been giving this some thought and plan to work on it. We will find 
something that we like.






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-28 Thread via GitHub


alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2112276143


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   This is my last remaining concern -- if we can figure out a way for 
`LIMIT` not to need a special case, that would be great.






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-09 Thread via GitHub


alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2082330048


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   What concerns me is that the code for inferring data types has a special 
case for `LIMIT`, but from what I know `LIMIT` doesn't have any special 
behavior for parameters.
   
   If we need a schema to infer types for `LIMIT`, we could either: 
   1. Use the schema of its `input` plan (though the schema itself would be ignored)
   2. Use `Schema::empty()` perhaps
   
   Does that make sense?






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub


kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080118522


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   My test is actually really unfair, I will modify it. I constructed 
`LogicalPlan` by hand and gave it no opportunity to do any placeholder type 
inference.
   
   `Expr::infer_placeholder_types` takes an `Expr` and a `DFSchema` and it uses 
the schema to infer the datatype from the field type. However, there is no 
schema that will provide that information for `LIMIT`. It's a special case and 
we (I believe) always know what datatype it must be and it need not be 
inferred. I think the case is the same for `offset` as you mention. In fact, it 
looks like we try to coerce it to be Int64, 
https://github.com/apache/datafusion/blob/41e7aed3a943134c40d1b18cb9d424b358b5e5b1/datafusion/optimizer/src/analyzer/type_coercion.rs#L242.
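
   A minimal sketch of the distinction described above, using toy types rather 
than DataFusion's own. `ToyType`, `ToyExpr`, `infer_from_schema`, and 
`infer_for_limit` are hypothetical names invented for illustration and are not 
part of the DataFusion API. The first function needs a schema to learn a 
placeholder's type; the second needs none, because the type is assumed to be 
fixed at Int64.

```rust
use std::collections::HashMap;

/// Toy stand-in for a column data type (not the Arrow `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
}

/// Toy expression: a column reference or a `$n` placeholder.
#[derive(Debug, Clone, PartialEq)]
enum ToyExpr {
    Column(String),
    Placeholder { id: String, data_type: Option<ToyType> },
}

/// Schema-driven inference: a placeholder paired with a column takes that
/// column's type, provided the schema knows the column.
fn infer_from_schema(
    placeholder: &mut ToyExpr,
    other: &ToyExpr,
    schema: &HashMap<String, ToyType>,
) {
    if let (ToyExpr::Placeholder { data_type, .. }, ToyExpr::Column(name)) =
        (placeholder, other)
    {
        if data_type.is_none() {
            *data_type = schema.get(name).cloned();
        }
    }
}

/// LIMIT-style inference (assumption: the type is always Int64).
/// No schema is consulted because none carries this information.
fn infer_for_limit(fetch: &mut ToyExpr) {
    if let ToyExpr::Placeholder { data_type, .. } = fetch {
        if data_type.is_none() {
            *data_type = Some(ToyType::Int64);
        }
    }
}
```

In this toy model, a placeholder compared against a `Utf8` column becomes 
`Utf8`, while a placeholder used as a fetch count becomes `Int64` regardless of 
any schema.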






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub


kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080123467


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1507,6 +1515,9 @@ impl LogicalPlan {
 (_, Some(dt)) => {
 param_types.insert(id.clone(), Some(dt.clone()));
 }
+(Some(Some(_)), None) => {
+// we have already inferred the datatype

Review Comment:
   This happens because we earlier treated LIMIT specially. We added it to the 
`param_types` but when we get to the Placeholder expression here, we see that 
the type was never inferred. If we weren't in an immutable method, we could do 
the actual type inference above instead of directly populating `param_types` 
and let it be appropriately populated here. 
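
   A rough sketch of how such a merge step could behave, written against toy 
types. `ToyType` and `record_placeholder` are hypothetical names, and the 
conflict arm is an assumption added for completeness; only the 
`(_, Some(dt))` and `(Some(Some(_)), None)` arms come from the diff above.

```rust
use std::collections::HashMap;

/// Toy stand-in for a data type (not the Arrow `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
}

/// Records what is known about placeholder `id` after visiting one
/// occurrence of it. `seen` is the type attached to that occurrence, if any.
fn record_placeholder(
    param_types: &mut HashMap<String, Option<ToyType>>,
    id: &str,
    seen: Option<&ToyType>,
) -> Result<(), String> {
    let prev = param_types.get(id).cloned();
    match (prev, seen) {
        // Assumed behavior: two different concrete types are an error.
        (Some(Some(prev)), Some(dt)) if &prev != dt => {
            Err(format!("conflicting types for {id}"))
        }
        // This occurrence carries a type: record it.
        (_, Some(dt)) => {
            param_types.insert(id.to_string(), Some(dt.clone()));
            Ok(())
        }
        // An earlier step already recorded a type: keep it rather than
        // overwriting it with "unknown".
        (Some(Some(_)), None) => Ok(()),
        // Nothing known yet: record the placeholder as untyped.
        _ => {
            param_types.insert(id.to_string(), None);
            Ok(())
        }
    }
}
```

Without the third arm, an untyped occurrence would fall through to the last 
arm and replace the previously recorded type with `None`.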






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub


kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080118522


##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   My test is actually really unfair, I will modify it. I constructed 
`LogicalPlan` by hand and gave it no opportunity to do any placeholder type 
inference. For example, if I modify the test and call 
`LogicalPlan::with_param_values`, it will attempt to infer the datatypes so 
that replacement is valid. I've done this and the type is still not getting 
inferred for the `LIMIT` case. This makes sense, though, given the way it's 
currently designed and the nature of the `LIMIT` clause.
   
   `Expr::infer_placeholder_types` takes an `Expr` and a `DFSchema` and it uses 
the schema to infer the datatype from the field type. However, there is no 
schema that will provide that information for `LIMIT`. It's a special case and 
we (I believe) always know what datatype it must be and it need not be 
inferred. I think the case is the same for `offset` as you mention. In fact, it 
looks like we try to coerce it to be Int64, 
https://github.com/apache/datafusion/blob/41e7aed3a943134c40d1b18cb9d424b358b5e5b1/datafusion/optimizer/src/analyzer/type_coercion.rs#L242.






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-08 Thread via GitHub


kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080056995


##
datafusion/expr/src/expr.rs:
##
@@ -1775,6 +1775,27 @@ impl Expr {
 | Expr::SimilarTo(Like { expr, pattern, .. }) => {
 rewrite_placeholder(pattern.as_mut(), expr.as_ref(), schema)?;
 }
+Expr::InSubquery(InSubquery {
+expr,
+subquery,
+negated: _,
+}) => {
+let subquery_schema = subquery.subquery.schema();
+let fields = subquery_schema.fields();
+
+// only supports subquery with exactly 1 field
+if let [first_field] = &fields[..] {

Review Comment:
   The subquery should only return one column. 
https://github.com/apache/datafusion/blob/main/datafusion/sql/src/expr/subquery.rs#L120
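
   As a small illustration of the slice pattern in the hunk above, here is a 
sketch with toy types. `ToyType` and `in_subquery_placeholder_type` are 
hypothetical names, and returning `None` for any other field count is an 
assumption of this sketch, not a statement about what the PR does in that case.

```rust
/// Toy stand-in for a field's data type (not the Arrow `DataType`).
#[derive(Debug, Clone, PartialEq)]
enum ToyType {
    Int64,
    Utf8,
}

/// Returns the type a placeholder on the left of `IN (subquery)` would take,
/// or `None` when the subquery does not have exactly one output field.
fn in_subquery_placeholder_type(subquery_fields: &[ToyType]) -> Option<ToyType> {
    // `[first_field]` matches only a slice of length one.
    if let [first_field] = subquery_fields {
        Some(first_field.clone())
    } else {
        None
    }
}
```

Both an empty field list and a list with two or more fields fail the pattern, 
so in this sketch the placeholder is simply left untyped for them.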






Re: [PR] Additional placeholder datatype inferencing [datafusion]

2025-05-07 Thread via GitHub


alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2078420969


##
datafusion/expr/src/expr.rs:
##
@@ -1775,6 +1775,27 @@ impl Expr {
 | Expr::SimilarTo(Like { expr, pattern, .. }) => {
 rewrite_placeholder(pattern.as_mut(), expr.as_ref(), schema)?;
 }
+Expr::InSubquery(InSubquery {
+expr,
+subquery,
+negated: _,
+}) => {
+let subquery_schema = subquery.subquery.schema();
+let fields = subquery_schema.fields();
+
+// only supports subquery with exactly 1 field
+if let [first_field] = &fields[..] {

Review Comment:
   What happens if there is more than one field? Will it not rewrite any 
placeholders?



##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   I don't understand why the expression in the LIMIT needs special handling. 
It seems like it should be handled by `LogicalPlan::apply_expressions` here 
https://github.com/apache/datafusion/blob/41e7aed3a943134c40d1b18cb9d424b358b5e5b1/datafusion/expr/src/logical_plan/tree_node.rs#L462-L461
   
   Also this code doesn't seem to handle the `offset` clause, only the `fetch`



##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1507,6 +1515,9 @@ impl LogicalPlan {
 (_, Some(dt)) => {
 param_types.insert(id.clone(), Some(dt.clone()));
 }
+(Some(Some(_)), None) => {
+// we have already inferred the datatype

Review Comment:
   When can this happen? Like what situations would one instance of the 
parameter like `$2` be resolved but another instance would not be 🤔 



##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
 let mut param_types: HashMap<String, Option<DataType>> = HashMap::new();
 
 self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {

Review Comment:
   However, when I removed this special case the tests you have added 
definitely fail 🤔 Something really strange is going on


