Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
github-actions[bot] closed pull request #16186: feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations URL: https://github.com/apache/datafusion/pull/16186 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
github-actions[bot] commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-3130331771 Thank you for your contribution. Unfortunately, this pull request is stale because it has been open 60 days with no activity. Please remove the stale label or comment or this will be closed in 7 days. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
irenjj commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918752095 > ~we can implement the tree formatter for this node type to visualize it~ tree formatter is used only in datafusion-cli, for sqllogical test, we only use indent explain.🤣 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918703742 i create a temp branch here to combine 2 PRs https://github.com/duongcongtoai/arrow-datafusion/blob/14554-subquery-unnest-framework-fixed-planner/datafusion/sqllogictest/test_files/dependent_join_temp.slt the slt files demonstrate the queryplan rewritten into dependent join, i wonder if there is a way to visualize it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918429534 yep, it should be merged after every point is clear, to reduce review burden -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
irenjj commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917853644 It looks like @duongcongtoai addressed the depth issue in #16016. Maybe this PR can be merged with #16016 to better verify the depth-related problem? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917741777 cc @alamb -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917730231 true, i've just realized it. Looks like a feature branch for us to work on is the way then? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917721651 > Beware that this error is thrown after the planning stage has completed, and it is expected because the current limitation of subquery decorrelation. Oh I was under the impression that it was still in planning stage. Thanks for letting me know. > Looks like we need a new plan to somehow let this Optimizor back-off during the implementation of new rules 🤔 > 2\. We add a if condition inside DependentJoinRewriter, that if the deepest subquery depth is 1, it doesn't change the query plan at all > => This allow the old optimizors to work as expected. We probably should not discriminate in terms of depths, as what works for higher depth should work at depth = 1. Though I remember `Improving Unnesting of Complex Queries` mentioning the case of `Simplistic Unnesting` but I don't think that was limited by depth either. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917717164 or an easiest way is to have a large feature branch :thinking: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917689696 So here are my thoughts (this plan is to split the work in smaller PRs) while avoid breaking things as much as possible: 1. we introduce 3 optimizors, declared in the order below: - DependentJoinRewriter - OldOptimizors (`DecorrelateLateralJoin`, `ScalarSubqueryToJoin`, `DecorrelatePredicateSubquery`) - NewOptimizor 2. We add a if condition inside DependentJoinRewriter, that if the total subquery depth is 1, it doesn't change the query plan at all => This allow the old optimizors to work as expected 3. Develop the newOptimizor -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917657542 > The results are a little inconsistent. __scalar_sq_2."avg(e3.salary)", __scalar_sq_2.dept_id are not valid fields in the above context. Ideally, all the field in e1, e2 and e3 should come up here as they are valid. Beware that this error is thrown after the planning stage has completed, and it is expected because the current limitation of subquery decorrelation. The columns shown in the error is the side-effect of `DecorrelateLateralJoin` optimizor ``` Inner Join: e1.dept_id = __scalar_sq_1.dept_id Filter: CAST(e1.salary AS Decimal128(38, 14)) > __scalar_sq_1.avg(e2.salary) SubqueryAlias: e1 TableScan: employees projection=[employee_name, dept_id, salary] SubqueryAlias: __scalar_sq_1 Projection: avg(e2.salary), e2.dept_id Aggregate: groupBy=[[e2.dept_id]], aggr=[[avg(e2.salary)]] Projection: e2.dept_id, e2.salary Inner Join: Filter: CAST(e2.salary AS Decimal128(38, 14)) > __scalar_sq_2.avg(e3.salary) AND __scalar_sq_2.dept_id = e1.dept_id SubqueryAlias: e2 TableScan: employees projection=[dept_id, salary] SubqueryAlias: __scalar_sq_2 Projection: avg(e3.salary), e3.dept_id Aggregate: groupBy=[[e3.dept_id]], aggr=[[avg(e3.salary)]] SubqueryAlias: e3 TableScan: employees projection=[dept_id, salary] ``` And the error is thrown at this line: ` Inner Join: Filter: CAST(e2.salary AS Decimal128(38, 14)) > __scalar_sq_2.avg(e3.salary) AND __scalar_sq_2.dept_id = e1.dept_id ` Looks like we need a new plan to somehow let this Optimizor back-off during the implementation of new rules :thinking: -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
logan-keede commented on PR #16186: URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917556654 Currently error look like:- ```sql > explain SELECT e1.employee_name, e1.salary FROM employees e1 WHERE e1.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id AND e2.salary > ( SELECT AVG(e3.salary) FROM employees e3 WHERE e3.dept_id = e1.dept_id ) ); Schema error: No field named e1.dept_id. Did you mean 'e3.dept_id'?. > explain SELECT e1.employee_name, e1.salary FROM employees e1 WHERE e1.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id AND e2.salary > ( SELECT AVG(e3.salary) FROM employees e3 WHERE e3.dept_id = e1.depd ) ); Schema error: No field named e1.depd. Valid fields are e3.employee_id, e3.employee_name, e3.dept_id, e3.salary. > ``` After This PR:- ```sql > explain SELECT e1.employee_name, e1.salary FROM employees e1 WHERE e1.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id AND e2.salary > ( SELECT AVG(e3.salary) FROM employees e3 WHERE e3.dept_id = e1.depd ) ); Schema error: No field named e1.depd. Valid fields are e3.employee_id, e3.employee_name, e3.dept_id, e3.salary. > explain SELECT e1.employee_name, e1.salary FROM employees e1 WHERE e1.salary > ( SELECT AVG(e2.salary) FROM employees e2 WHERE e2.dept_id = e1.dept_id AND e2.salary > ( SELECT AVG(e3.salary) FROM employees e3 WHERE e3.dept_id = e1.dept_id ) ); Schema error: No field named e1.dept_id. Valid fields are e2.salary, __scalar_sq_2."avg(e3.salary)", __scalar_sq_2.dept_id. ``` The results are a little inconsistent. `__scalar_sq_2."avg(e3.salary)", __scalar_sq_2.dept_id` are not valid fields in the above context. Ideally, all the field in e1, e2 and e3 should come up here as they are valid. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2107708646
##
datafusion/sql/src/planner.rs:
##
@@ -235,18 +235,27 @@ impl PlannerContext {
}
// Return a reference to the outer query's schema
-pub fn outer_query_schema(&self) -> Option<&DFSchema> {
-self.outer_query_schema.as_ref().map(|s| s.as_ref())
+pub fn outer_query_schema(&self) -> Vec<&DFSchema> {
+self.outer_query_schema
+.iter()
+.map(|sc| sc.as_ref())
+.collect()
}
/// Sets the outer query schema, returning the existing one, if
/// any
-pub fn set_outer_query_schema(
Review Comment:
done, old methods were deprecated
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106485055 ## datafusion/sqllogictest/test_files/subquery.slt: ## @@ -1482,3 +1482,85 @@ logical_plan statement count 0 drop table person; + +# correlated_recursive_scalar_subquery_with_level_3_scalar_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and o_totalprice < ( +select sum(l_extendedprice) as price from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; + +logical_plan +01)Sort: customer.c_custkey ASC NULLS LAST +02)--Projection: customer.c_custkey +03)Inner Join: customer.c_custkey = __scalar_sq_1.o_custkey Filter: CAST(customer.c_acctbal AS Decimal128(25, 2)) < __scalar_sq_1.sum(orders.o_totalprice) +04)--TableScan: customer projection=[c_custkey, c_acctbal] +05)--SubqueryAlias: __scalar_sq_1 +06)Projection: sum(orders.o_totalprice), orders.o_custkey +07)--Aggregate: groupBy=[[orders.o_custkey]], aggr=[[sum(orders.o_totalprice)]] +08)Projection: orders.o_custkey, orders.o_totalprice +09)--Filter: CAST(orders.o_totalprice AS Decimal128(25, 2)) < () +10)Subquery: +11)--Projection: sum(lineitem.l_extendedprice) AS price +12)Aggregate: groupBy=[[]], aggr=[[sum(lineitem.l_extendedprice)]] +13)--Filter: lineitem.l_orderkey = outer_ref(orders.o_orderkey) AND lineitem.l_extendedprice < outer_ref(customer.c_acctbal) +14)TableScan: lineitem, partial_filters=[lineitem.l_orderkey = outer_ref(orders.o_orderkey), lineitem.l_extendedprice < outer_ref(customer.c_acctbal)] +15)TableScan: orders projection=[o_orderkey, o_custkey, o_totalprice] + +# correlated_recursive_scalar_subquery_with_level_3_exists_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and exists ( +select * from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; + +logical_plan +01)Sort: customer.c_custkey ASC NULLS LAST +02)--Projection: customer.c_custkey +03)Inner Join: customer.c_custkey = __scalar_sq_2.o_custkey Filter: CAST(customer.c_acctbal AS Decimal128(25, 2)) < __scalar_sq_2.sum(orders.o_totalprice) +04)--TableScan: customer projection=[c_custkey, c_acctbal] +05)--SubqueryAlias: __scalar_sq_2 +06)Projection: sum(orders.o_totalprice), orders.o_custkey +07)--Aggregate: groupBy=[[orders.o_custkey]], aggr=[[sum(orders.o_totalprice)]] +08)Projection: orders.o_custkey, orders.o_totalprice +09)--LeftSemi Join: orders.o_orderkey = __correlated_sq_1.l_orderkey Filter: __correlated_sq_1.l_extendedprice < customer.c_acctbal +10)TableScan: orders projection=[o_orderkey, o_custkey, o_totalprice] +11)SubqueryAlias: __correlated_sq_1 +12)--TableScan: lineitem projection=[l_orderkey, l_extendedprice] + +# correlated_recursive_scalar_subquery_with_level_3_in_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and o_totalprice in ( +select l_extendedprice as price from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; Review Comment: yep, the generated logical plan is still unexecutable, because all existing decorrelating optimizor cannot decorrelate it -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
irenjj commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106333691 ## datafusion/sqllogictest/test_files/subquery.slt: ## @@ -1482,3 +1482,85 @@ logical_plan statement count 0 drop table person; + +# correlated_recursive_scalar_subquery_with_level_3_scalar_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and o_totalprice < ( +select sum(l_extendedprice) as price from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; + +logical_plan +01)Sort: customer.c_custkey ASC NULLS LAST +02)--Projection: customer.c_custkey +03)Inner Join: customer.c_custkey = __scalar_sq_1.o_custkey Filter: CAST(customer.c_acctbal AS Decimal128(25, 2)) < __scalar_sq_1.sum(orders.o_totalprice) +04)--TableScan: customer projection=[c_custkey, c_acctbal] +05)--SubqueryAlias: __scalar_sq_1 +06)Projection: sum(orders.o_totalprice), orders.o_custkey +07)--Aggregate: groupBy=[[orders.o_custkey]], aggr=[[sum(orders.o_totalprice)]] +08)Projection: orders.o_custkey, orders.o_totalprice +09)--Filter: CAST(orders.o_totalprice AS Decimal128(25, 2)) < () +10)Subquery: +11)--Projection: sum(lineitem.l_extendedprice) AS price +12)Aggregate: groupBy=[[]], aggr=[[sum(lineitem.l_extendedprice)]] +13)--Filter: lineitem.l_orderkey = outer_ref(orders.o_orderkey) AND lineitem.l_extendedprice < outer_ref(customer.c_acctbal) +14)TableScan: lineitem, partial_filters=[lineitem.l_orderkey = outer_ref(orders.o_orderkey), lineitem.l_extendedprice < outer_ref(customer.c_acctbal)] +15)TableScan: orders projection=[o_orderkey, o_custkey, o_totalprice] + +# correlated_recursive_scalar_subquery_with_level_3_exists_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and exists ( +select * from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; + +logical_plan +01)Sort: customer.c_custkey ASC NULLS LAST +02)--Projection: customer.c_custkey +03)Inner Join: customer.c_custkey = __scalar_sq_2.o_custkey Filter: CAST(customer.c_acctbal AS Decimal128(25, 2)) < __scalar_sq_2.sum(orders.o_totalprice) +04)--TableScan: customer projection=[c_custkey, c_acctbal] +05)--SubqueryAlias: __scalar_sq_2 +06)Projection: sum(orders.o_totalprice), orders.o_custkey +07)--Aggregate: groupBy=[[orders.o_custkey]], aggr=[[sum(orders.o_totalprice)]] +08)Projection: orders.o_custkey, orders.o_totalprice +09)--LeftSemi Join: orders.o_orderkey = __correlated_sq_1.l_orderkey Filter: __correlated_sq_1.l_extendedprice < customer.c_acctbal +10)TableScan: orders projection=[o_orderkey, o_custkey, o_totalprice] +11)SubqueryAlias: __correlated_sq_1 +12)--TableScan: lineitem projection=[l_orderkey, l_extendedprice] + +# correlated_recursive_scalar_subquery_with_level_3_in_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and o_totalprice in ( +select l_extendedprice as price from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; Review Comment: Still failed in datafusion-cli, but in sqllogictest it runs well: ``` > explain select c_custkey from customer where c_acctbal < ( select sum(o_totalprice) from orders where o_custkey = c_custkey and o_totalprice in ( select l_extendedprice as price from lineitem where l_orderkey = o_orderkey and l_extendedprice < c_acctbal ) ) order by c_custkey; Schema error: No field named customer.c_acctbal. Valid fields are __correlated_sq_1.l_extendedprice. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
irenjj commented on code in PR #16186: URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106333691 ## datafusion/sqllogictest/test_files/subquery.slt: ## @@ -1482,3 +1482,85 @@ logical_plan statement count 0 drop table person; + +# correlated_recursive_scalar_subquery_with_level_3_scalar_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and o_totalprice < ( +select sum(l_extendedprice) as price from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; + +logical_plan +01)Sort: customer.c_custkey ASC NULLS LAST +02)--Projection: customer.c_custkey +03)Inner Join: customer.c_custkey = __scalar_sq_1.o_custkey Filter: CAST(customer.c_acctbal AS Decimal128(25, 2)) < __scalar_sq_1.sum(orders.o_totalprice) +04)--TableScan: customer projection=[c_custkey, c_acctbal] +05)--SubqueryAlias: __scalar_sq_1 +06)Projection: sum(orders.o_totalprice), orders.o_custkey +07)--Aggregate: groupBy=[[orders.o_custkey]], aggr=[[sum(orders.o_totalprice)]] +08)Projection: orders.o_custkey, orders.o_totalprice +09)--Filter: CAST(orders.o_totalprice AS Decimal128(25, 2)) < () +10)Subquery: +11)--Projection: sum(lineitem.l_extendedprice) AS price +12)Aggregate: groupBy=[[]], aggr=[[sum(lineitem.l_extendedprice)]] +13)--Filter: lineitem.l_orderkey = outer_ref(orders.o_orderkey) AND lineitem.l_extendedprice < outer_ref(customer.c_acctbal) +14)TableScan: lineitem, partial_filters=[lineitem.l_orderkey = outer_ref(orders.o_orderkey), lineitem.l_extendedprice < outer_ref(customer.c_acctbal)] +15)TableScan: orders projection=[o_orderkey, o_custkey, o_totalprice] + +# correlated_recursive_scalar_subquery_with_level_3_exists_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and exists ( +select * from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; + +logical_plan +01)Sort: customer.c_custkey ASC NULLS LAST +02)--Projection: customer.c_custkey +03)Inner Join: customer.c_custkey = __scalar_sq_2.o_custkey Filter: CAST(customer.c_acctbal AS Decimal128(25, 2)) < __scalar_sq_2.sum(orders.o_totalprice) +04)--TableScan: customer projection=[c_custkey, c_acctbal] +05)--SubqueryAlias: __scalar_sq_2 +06)Projection: sum(orders.o_totalprice), orders.o_custkey +07)--Aggregate: groupBy=[[orders.o_custkey]], aggr=[[sum(orders.o_totalprice)]] +08)Projection: orders.o_custkey, orders.o_totalprice +09)--LeftSemi Join: orders.o_orderkey = __correlated_sq_1.l_orderkey Filter: __correlated_sq_1.l_extendedprice < customer.c_acctbal +10)TableScan: orders projection=[o_orderkey, o_custkey, o_totalprice] +11)SubqueryAlias: __correlated_sq_1 +12)--TableScan: lineitem projection=[l_orderkey, l_extendedprice] + +# correlated_recursive_scalar_subquery_with_level_3_in_subquery_referencing_level1_relation +query TT +explain select c_custkey from customer +where c_acctbal < ( +select sum(o_totalprice) from orders +where o_custkey = c_custkey +and o_totalprice in ( +select l_extendedprice as price from lineitem where l_orderkey = o_orderkey +and l_extendedprice < c_acctbal +) +) order by c_custkey; Review Comment: Still failed in datafusion-cli, but int sqllogictest it runs well: ``` > explain select c_custkey from customer where c_acctbal < ( select sum(o_totalprice) from orders where o_custkey = c_custkey and o_totalprice in ( select l_extendedprice as price from lineitem where l_orderkey = o_orderkey and l_extendedprice < c_acctbal ) ) order by c_custkey; Schema error: No field named customer.c_acctbal. Valid fields are __correlated_sq_1.l_extendedprice. ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]
duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106305831
##
datafusion/optimizer/src/push_down_filter.rs:
##
@@ -1089,7 +1089,13 @@ impl OptimizerRule for PushDownFilter {
let (volatile_filters, non_volatile_filters): (Vec<&Expr>,
Vec<&Expr>) =
filter_predicates
.into_iter()
-.partition(|pred| pred.is_volatile());
+// TODO: subquery decorrelation sometimes cannot
decorrelated all the expr
+// (i.e in the case of recursive subquery)
+// this function may accidentally pushdown the
subquery expr as well
+// until then, we have to exclude these exprs here
+.partition(|pred| {
+pred.is_volatile() || has_scalar_subquery(pred)
Review Comment:
when we allow nested subquery, the final plan reaches this optimizor and the
predicate on scalar_subquery can be accidentally push down
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
