Re: [PR] Additional placeholder datatype inferencing [datafusion]
github-actions[bot] closed pull request #15980: Additional placeholder datatype inferencing
URL: https://github.com/apache/datafusion/pull/15980
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Additional placeholder datatype inferencing [datafusion]
github-actions[bot] commented on PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#issuecomment-3153035994
Thank you for your contribution. Unfortunately, this pull request is stale
because it has been open 60 days with no activity. Please remove the stale
label or comment or this will be closed in 7 days.
Re: [PR] Additional placeholder datatype inferencing [datafusion]
alamb commented on PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#issuecomment-2936183106
Marking as draft as I think this PR is no longer waiting on feedback and I am
trying to make it easier to find PRs in need of review. Please mark it as ready
for review when it is ready for another look.
Re: [PR] Additional placeholder datatype inferencing [datafusion]
alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2115997415
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
Thank you 🙏
Re: [PR] Additional placeholder datatype inferencing [datafusion]
kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2114359645
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
Yeah, I agree with you. I got caught up working on some other things, but
I've been giving this some thought and plan to work on it. We will find
something that we like.
Re: [PR] Additional placeholder datatype inferencing [datafusion]
alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2112276143
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
This is my last remaining concern: if we can figure out some way that
`LIMIT` doesn't need a special case, that would be great.
Re: [PR] Additional placeholder datatype inferencing [datafusion]
alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2082330048
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
What concerns me is that the code for inferring data types has a special
case for `LIMIT`, but as far as I know `LIMIT` doesn't have any special
behavior for parameters.
If we need a schema to infer types for `LIMIT` we could either
1. Use its `input` plan (though the whole schema would be ignored)
2. Use `Schema::empty()` perhaps
Does that make sense?
Re: [PR] Additional placeholder datatype inferencing [datafusion]
kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080118522
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
My test is actually really unfair; I will modify it. I constructed the
`LogicalPlan` by hand and gave it no opportunity to do any placeholder type
inference.
`Expr::infer_placeholder_types` takes an `Expr` and a `DFSchema` and uses
the schema to infer the datatype from the field type. However, there is no
schema that will provide that information for `LIMIT`. It's a special case:
we (I believe) always know what datatype it must be, so it need not be
inferred. I think the same holds for `offset`, as you mention. In fact, it
looks like we try to coerce it to Int64:
https://github.com/apache/datafusion/blob/41e7aed3a943134c40d1b18cb9d424b358b5e5b1/datafusion/optimizer/src/analyzer/type_coercion.rs#L242
Re: [PR] Additional placeholder datatype inferencing [datafusion]
kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080123467
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1507,6 +1515,9 @@ impl LogicalPlan {
(_, Some(dt)) => {
param_types.insert(id.clone(),
Some(dt.clone()));
}
+(Some(Some(_)), None) => {
+// we have already inferred the datatype
Review Comment:
This happens because we earlier treated LIMIT specially. We added it to the
`param_types` but when we get to the Placeholder expression here, we see that
the type was never inferred. If we weren't in an immutable method, we could do
the actual type inference above instead of directly populating `param_types`
and let it be appropriately populated here.
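To illustrate the overwrite this arm guards against, here is a minimal,
self-contained sketch. `DataType` and `record_placeholder` are hypothetical
stand-ins, not DataFusion's actual types or API. The second and third arms
mirror the diff hunk above; the conflict check and the fallback arm are
assumptions added so the example is complete.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for the real Arrow data type enum.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

// Record what is known about placeholder `id` after visiting one occurrence.
// `data_type` is the type carried by that occurrence, if any.
fn record_placeholder(
    param_types: &mut HashMap<String, Option<DataType>>,
    id: &str,
    data_type: Option<&DataType>,
) -> Result<(), String> {
    let prev = param_types.get(id);
    match (prev, data_type) {
        // Assumed arm: two typed occurrences must agree.
        (Some(Some(prev)), Some(dt)) => {
            if prev != dt {
                return Err(format!("Conflicting types for {}", id));
            }
        }
        // This occurrence carries a type: store it.
        (_, Some(dt)) => {
            param_types.insert(id.to_string(), Some(dt.clone()));
        }
        // A type was recorded earlier (for example by the LIMIT special
        // case) and this occurrence is untyped: keep the earlier type.
        (Some(Some(_)), None) => {}
        // Assumed fallback: nothing is known yet, so record `None`.
        _ => {
            param_types.insert(id.to_string(), None);
        }
    }
    Ok(())
}
```

Without the `(Some(Some(_)), None)` arm, the untyped occurrence would fall
through to the last arm and replace the recorded type with `None`.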
Re: [PR] Additional placeholder datatype inferencing [datafusion]
kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080118522
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
My test is actually really unfair; I will modify it. I constructed the
`LogicalPlan` by hand and gave it no opportunity to do any placeholder type
inference. For example, if I modify the test and call
`LogicalPlan::with_param_values`, it will attempt to infer the datatypes so
that replacement is valid. I've done this and the type is still not getting
inferred for the `LIMIT` case. This makes sense, though, given the way it's
currently designed and the nature of the `LIMIT` clause.
`Expr::infer_placeholder_types` takes an `Expr` and a `DFSchema` and uses the
schema to infer the datatype from the field type. However, there is no schema
that will provide that information for `LIMIT`. It's a special case: we (I
believe) always know what datatype it must be, so it need not be inferred. I
think the same holds for `offset`, as you mention. In fact, it looks like we
try to coerce it to Int64:
https://github.com/apache/datafusion/blob/41e7aed3a943134c40d1b18cb9d424b358b5e5b1/datafusion/optimizer/src/analyzer/type_coercion.rs#L242
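A minimal sketch of the idea that `LIMIT` and `OFFSET` placeholders need no
schema because their type is fixed. All types here (`DataType`, `Expr`,
`Limit`) and both function names are simplified stand-ins made up for
illustration, not DataFusion's real definitions. The sketch assumes Int64 is
the target type, as the coercion code linked above suggests.

```rust
// Hypothetical, simplified stand-ins for the real plan and expression types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Literal(i64),
    Placeholder { id: String, data_type: Option<DataType> },
}

#[derive(Debug, Clone, PartialEq)]
struct Limit {
    skip: Option<Expr>,  // the OFFSET expression
    fetch: Option<Expr>, // the LIMIT expression
}

// Give an untyped placeholder the fixed Int64 type; leave anything else alone.
fn type_limit_placeholder(expr: Expr) -> Expr {
    match expr {
        Expr::Placeholder { id, data_type: None } => Expr::Placeholder {
            id,
            data_type: Some(DataType::Int64),
        },
        other => other,
    }
}

// Apply the same rule to both `skip` and `fetch`, with no schema involved.
fn type_limit_placeholders(limit: Limit) -> Limit {
    Limit {
        skip: limit.skip.map(type_limit_placeholder),
        fetch: limit.fetch.map(type_limit_placeholder),
    }
}
```

Treating `skip` and `fetch` uniformly also covers the `offset` clause raised
in the review.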
Re: [PR] Additional placeholder datatype inferencing [datafusion]
kczimm commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2080056995
##
datafusion/expr/src/expr.rs:
##
@@ -1775,6 +1775,27 @@ impl Expr {
| Expr::SimilarTo(Like { expr, pattern, .. }) => {
rewrite_placeholder(pattern.as_mut(), expr.as_ref(),
schema)?;
}
+Expr::InSubquery(InSubquery {
+expr,
+subquery,
+negated: _,
+}) => {
+let subquery_schema = subquery.subquery.schema();
+let fields = subquery_schema.fields();
+
+// only supports subquery with exactly 1 field
+if let [first_field] = &fields[..] {
Review Comment:
The subquery should only return one column.
https://github.com/apache/datafusion/blob/main/datafusion/sql/src/expr/subquery.rs#L120
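As a small illustration of how the `if let [first_field] = &fields[..]`
pattern behaves, here is a self-contained sketch. `DataType`, `Field`, and
`single_column_type` are hypothetical stand-ins, not DataFusion's types. The
slice pattern matches only when there is exactly one element, so any other
field count falls through and nothing is inferred.

```rust
// Hypothetical stand-ins for the schema field and its data type.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Utf8,
}

struct Field {
    data_type: DataType,
}

// Return the column type only when the subquery schema has exactly one field.
fn single_column_type(fields: &[Field]) -> Option<&DataType> {
    if let [only] = fields {
        Some(&only.data_type)
    } else {
        // Zero fields or more than one field: no type to infer from.
        None
    }
}
```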
Re: [PR] Additional placeholder datatype inferencing [datafusion]
alamb commented on code in PR #15980:
URL: https://github.com/apache/datafusion/pull/15980#discussion_r2078420969
##
datafusion/expr/src/expr.rs:
##
@@ -1775,6 +1775,27 @@ impl Expr {
| Expr::SimilarTo(Like { expr, pattern, .. }) => {
rewrite_placeholder(pattern.as_mut(), expr.as_ref(),
schema)?;
}
+Expr::InSubquery(InSubquery {
+expr,
+subquery,
+negated: _,
+}) => {
+let subquery_schema = subquery.subquery.schema();
+let fields = subquery_schema.fields();
+
+// only supports subquery with exactly 1 field
+if let [first_field] = &fields[..] {
Review Comment:
What happens if there is more than one field? Will it not rewrite any
placeholders?
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
I don't understand why the expression in the LIMIT needs special handling.
It seems like it should be handled by `LogicalPlan::apply_expressions` here
https://github.com/apache/datafusion/blob/41e7aed3a943134c40d1b18cb9d424b358b5e5b1/datafusion/expr/src/logical_plan/tree_node.rs#L462-L461
Also this code doesn't seem to handle the `offset` clause, only the `fetch`
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1507,6 +1515,9 @@ impl LogicalPlan {
(_, Some(dt)) => {
param_types.insert(id.clone(),
Some(dt.clone()));
}
+(Some(Some(_)), None) => {
+// we have already inferred the datatype
Review Comment:
When can this happen? In what situations would one instance of a parameter
like `$2` be resolved while another instance is not? 🤔
##
datafusion/expr/src/logical_plan/plan.rs:
##
@@ -1494,6 +1494,14 @@ impl LogicalPlan {
let mut param_types: HashMap<String, Option<DataType>> =
HashMap::new();
self.apply_with_subqueries(|plan| {
+if let LogicalPlan::Limit(Limit { fetch: Some(e), .. }) = plan {
Review Comment:
However, when I removed this special case the tests you have added
definitely fail 🤔 Something really strange is going on
