Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-08-06 Thread via GitHub


github-actions[bot] closed pull request #16186: feat: Support parsing 
subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations
URL: https://github.com/apache/datafusion/pull/16186


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-07-28 Thread via GitHub


github-actions[bot] commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-3130331771

   Thank you for your contribution. Unfortunately, this pull request is stale 
because it has been open 60 days with no activity. Please remove the stale 
label or comment or this will be closed in 7 days.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-29 Thread via GitHub


irenjj commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918752095

   > ~we can implement the tree formatter for this node type to visualize it~ 
   
   tree formatter is used only in datafusion-cli, for sqllogical test, we only 
use indent explain.🤣


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-29 Thread via GitHub


duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918703742

   i create a temp branch here to combine 2 PRs
   
https://github.com/duongcongtoai/arrow-datafusion/blob/14554-subquery-unnest-framework-fixed-planner/datafusion/sqllogictest/test_files/dependent_join_temp.slt
   
   the slt files demonstrate the queryplan rewritten into dependent join, i 
wonder if there is a way to visualize it


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2918429534

   yep, it should be merged after every point is clear, to reduce review burden


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


irenjj commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917853644

   It looks like @duongcongtoai  addressed the depth issue in #16016. Maybe 
this PR can be merged with #16016 to better verify the depth-related problem?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


logan-keede commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917741777

   cc @alamb 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917730231

   true, i've just realized it. Looks like a feature branch for us to work on 
is the way then?
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


logan-keede commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917721651

   > Beware that this error is thrown after the planning stage has completed, 
and it is expected because the current limitation of subquery decorrelation.
   
   Oh I was under the impression that it was still in planning stage. Thanks 
for letting me know.
   
   > Looks like we need a new plan to somehow let this Optimizor back-off 
during the implementation of new rules 🤔
   
   > 2\. We add a if condition inside DependentJoinRewriter, that if the 
deepest subquery depth is 1, it doesn't change the query plan at all
   > => This allow the old optimizors to work as expected.
   
   We probably should not discriminate in terms of depths, as what works for 
higher depth should work at depth = 1. 
   Though I remember `Improving Unnesting of Complex Queries` mentioning the 
case of `Simplistic Unnesting` but I don't think that was limited by depth 
either.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917717164

   or an easiest way is to have a large feature branch :thinking: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917689696

   So here are my thoughts (this plan is to split the work in smaller PRs) 
while avoid breaking things as much as possible:
   1. we introduce 3 optimizors, declared in the order below:
   - DependentJoinRewriter
   - OldOptimizors (`DecorrelateLateralJoin`, `ScalarSubqueryToJoin`, 
`DecorrelatePredicateSubquery`)
   - NewOptimizor
   2. We add a if condition inside DependentJoinRewriter, that if the total 
subquery depth is 1, it doesn't change the query plan at all
   => This allow the old optimizors to work as expected
   3. Develop the newOptimizor


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


duongcongtoai commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917657542

   > The results are a little inconsistent. __scalar_sq_2."avg(e3.salary)", 
__scalar_sq_2.dept_id are not valid fields in the above context. Ideally, all 
the field in e1, e2 and e3 should come up here as they are valid.
   
   Beware that this error is thrown after the planning stage has completed, and 
it is expected because the current limitation of subquery decorrelation.
   
   The columns shown in the error is the side-effect of 
`DecorrelateLateralJoin` optimizor
   ```
   Inner Join: e1.dept_id = __scalar_sq_1.dept_id Filter: CAST(e1.salary AS 
Decimal128(38, 14)) > __scalar_sq_1.avg(e2.salary)
 SubqueryAlias: e1
   TableScan: employees projection=[employee_name, dept_id, salary]
 SubqueryAlias: __scalar_sq_1
   Projection: avg(e2.salary), e2.dept_id
 Aggregate: groupBy=[[e2.dept_id]], aggr=[[avg(e2.salary)]]
   Projection: e2.dept_id, e2.salary
 Inner Join:  Filter: CAST(e2.salary AS Decimal128(38, 14)) > 
__scalar_sq_2.avg(e3.salary) AND __scalar_sq_2.dept_id = e1.dept_id 
   SubqueryAlias: e2
 TableScan: employees projection=[dept_id, salary]
   SubqueryAlias: __scalar_sq_2
 Projection: avg(e3.salary), e3.dept_id
   Aggregate: groupBy=[[e3.dept_id]], aggr=[[avg(e3.salary)]]
 SubqueryAlias: e3
   TableScan: employees projection=[dept_id, salary]
   ```
   
   And the error is thrown at this line: `  Inner Join:  Filter: 
CAST(e2.salary AS Decimal128(38, 14)) > __scalar_sq_2.avg(e3.salary) AND 
__scalar_sq_2.dept_id = e1.dept_id `
   
   Looks like we need a new plan to somehow let this Optimizor back-off during 
the implementation of new rules :thinking: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-28 Thread via GitHub


logan-keede commented on PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#issuecomment-2917556654

   Currently error look like:-
   ```sql 
   > explain SELECT e1.employee_name, e1.salary
   FROM employees e1
   WHERE e1.salary > (
   SELECT AVG(e2.salary)
   FROM employees e2 
   WHERE e2.dept_id = e1.dept_id
   AND e2.salary > (
   SELECT AVG(e3.salary)
   FROM employees e3
   WHERE e3.dept_id = e1.dept_id
   )
   );
   Schema error: No field named e1.dept_id. Did you mean 'e3.dept_id'?.
   > explain SELECT e1.employee_name, e1.salary
   FROM employees e1
   WHERE e1.salary > (
   SELECT AVG(e2.salary)
   FROM employees e2 
   WHERE e2.dept_id = e1.dept_id
   AND e2.salary > (
   SELECT AVG(e3.salary)
   FROM employees e3
   WHERE e3.dept_id = e1.depd
   )
   );
   Schema error: No field named e1.depd. Valid fields are e3.employee_id, 
e3.employee_name, e3.dept_id, e3.salary.
   >
   ```
   After This PR:-
   ```sql
   > explain SELECT e1.employee_name, e1.salary
   FROM employees e1
   WHERE e1.salary > (
   SELECT AVG(e2.salary)
   FROM employees e2 
   WHERE e2.dept_id = e1.dept_id
   AND e2.salary > (
   SELECT AVG(e3.salary)
   FROM employees e3
   WHERE e3.dept_id = e1.depd
   )
   );
   Schema error: No field named e1.depd. Valid fields are e3.employee_id, 
e3.employee_name, e3.dept_id, e3.salary.
   > explain SELECT e1.employee_name, e1.salary
   FROM employees e1
   WHERE e1.salary > (
   SELECT AVG(e2.salary)
   FROM employees e2 
   WHERE e2.dept_id = e1.dept_id
   AND e2.salary > (
   SELECT AVG(e3.salary)
   FROM employees e3
   WHERE e3.dept_id = e1.dept_id
   )
   );
   Schema error: No field named e1.dept_id. Valid fields are e2.salary, 
__scalar_sq_2."avg(e3.salary)", __scalar_sq_2.dept_id.
   ```
   The results are a little inconsistent.  `__scalar_sq_2."avg(e3.salary)", 
__scalar_sq_2.dept_id` are not valid fields in the above context.
   Ideally, all the field in e1, e2 and e3 should come up here as they are 
valid. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-26 Thread via GitHub


duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2107708646


##
datafusion/sql/src/planner.rs:
##
@@ -235,18 +235,27 @@ impl PlannerContext {
 }
 
 // Return a reference to the outer query's schema
-pub fn outer_query_schema(&self) -> Option<&DFSchema> {
-self.outer_query_schema.as_ref().map(|s| s.as_ref())
+pub fn outer_query_schema(&self) -> Vec<&DFSchema> {
+self.outer_query_schema
+.iter()
+.map(|sc| sc.as_ref())
+.collect()
 }
 
 /// Sets the outer query schema, returning the existing one, if
 /// any
-pub fn set_outer_query_schema(

Review Comment:
   done, old methods were deprecated



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-25 Thread via GitHub


duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106485055


##
datafusion/sqllogictest/test_files/subquery.slt:
##
@@ -1482,3 +1482,85 @@ logical_plan
 
 statement count 0
 drop table person;
+
+# 
correlated_recursive_scalar_subquery_with_level_3_scalar_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and o_totalprice < (
+select sum(l_extendedprice) as price from lineitem where l_orderkey = 
o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;
+
+logical_plan
+01)Sort: customer.c_custkey ASC NULLS LAST
+02)--Projection: customer.c_custkey
+03)Inner Join: customer.c_custkey = __scalar_sq_1.o_custkey Filter: 
CAST(customer.c_acctbal AS Decimal128(25, 2)) < 
__scalar_sq_1.sum(orders.o_totalprice)
+04)--TableScan: customer projection=[c_custkey, c_acctbal]
+05)--SubqueryAlias: __scalar_sq_1
+06)Projection: sum(orders.o_totalprice), orders.o_custkey
+07)--Aggregate: groupBy=[[orders.o_custkey]], 
aggr=[[sum(orders.o_totalprice)]]
+08)Projection: orders.o_custkey, orders.o_totalprice
+09)--Filter: CAST(orders.o_totalprice AS Decimal128(25, 2)) < 
()
+10)Subquery:
+11)--Projection: sum(lineitem.l_extendedprice) AS price
+12)Aggregate: groupBy=[[]], 
aggr=[[sum(lineitem.l_extendedprice)]]
+13)--Filter: lineitem.l_orderkey = 
outer_ref(orders.o_orderkey) AND lineitem.l_extendedprice < 
outer_ref(customer.c_acctbal)
+14)TableScan: lineitem, 
partial_filters=[lineitem.l_orderkey = outer_ref(orders.o_orderkey), 
lineitem.l_extendedprice < outer_ref(customer.c_acctbal)]
+15)TableScan: orders projection=[o_orderkey, o_custkey, 
o_totalprice]
+
+# 
correlated_recursive_scalar_subquery_with_level_3_exists_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and exists (
+select * from lineitem where l_orderkey = o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;
+
+logical_plan
+01)Sort: customer.c_custkey ASC NULLS LAST
+02)--Projection: customer.c_custkey
+03)Inner Join: customer.c_custkey = __scalar_sq_2.o_custkey Filter: 
CAST(customer.c_acctbal AS Decimal128(25, 2)) < 
__scalar_sq_2.sum(orders.o_totalprice)
+04)--TableScan: customer projection=[c_custkey, c_acctbal]
+05)--SubqueryAlias: __scalar_sq_2
+06)Projection: sum(orders.o_totalprice), orders.o_custkey
+07)--Aggregate: groupBy=[[orders.o_custkey]], 
aggr=[[sum(orders.o_totalprice)]]
+08)Projection: orders.o_custkey, orders.o_totalprice
+09)--LeftSemi Join: orders.o_orderkey = 
__correlated_sq_1.l_orderkey Filter: __correlated_sq_1.l_extendedprice < 
customer.c_acctbal
+10)TableScan: orders projection=[o_orderkey, o_custkey, 
o_totalprice]
+11)SubqueryAlias: __correlated_sq_1
+12)--TableScan: lineitem projection=[l_orderkey, 
l_extendedprice]
+
+# 
correlated_recursive_scalar_subquery_with_level_3_in_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and o_totalprice in (
+select l_extendedprice as price from lineitem where l_orderkey = 
o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;

Review Comment:
   yep, the generated logical plan is still unexecutable, because all existing 
decorrelating optimizor cannot decorrelate it



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-25 Thread via GitHub


irenjj commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106333691


##
datafusion/sqllogictest/test_files/subquery.slt:
##
@@ -1482,3 +1482,85 @@ logical_plan
 
 statement count 0
 drop table person;
+
+# 
correlated_recursive_scalar_subquery_with_level_3_scalar_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and o_totalprice < (
+select sum(l_extendedprice) as price from lineitem where l_orderkey = 
o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;
+
+logical_plan
+01)Sort: customer.c_custkey ASC NULLS LAST
+02)--Projection: customer.c_custkey
+03)Inner Join: customer.c_custkey = __scalar_sq_1.o_custkey Filter: 
CAST(customer.c_acctbal AS Decimal128(25, 2)) < 
__scalar_sq_1.sum(orders.o_totalprice)
+04)--TableScan: customer projection=[c_custkey, c_acctbal]
+05)--SubqueryAlias: __scalar_sq_1
+06)Projection: sum(orders.o_totalprice), orders.o_custkey
+07)--Aggregate: groupBy=[[orders.o_custkey]], 
aggr=[[sum(orders.o_totalprice)]]
+08)Projection: orders.o_custkey, orders.o_totalprice
+09)--Filter: CAST(orders.o_totalprice AS Decimal128(25, 2)) < 
()
+10)Subquery:
+11)--Projection: sum(lineitem.l_extendedprice) AS price
+12)Aggregate: groupBy=[[]], 
aggr=[[sum(lineitem.l_extendedprice)]]
+13)--Filter: lineitem.l_orderkey = 
outer_ref(orders.o_orderkey) AND lineitem.l_extendedprice < 
outer_ref(customer.c_acctbal)
+14)TableScan: lineitem, 
partial_filters=[lineitem.l_orderkey = outer_ref(orders.o_orderkey), 
lineitem.l_extendedprice < outer_ref(customer.c_acctbal)]
+15)TableScan: orders projection=[o_orderkey, o_custkey, 
o_totalprice]
+
+# 
correlated_recursive_scalar_subquery_with_level_3_exists_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and exists (
+select * from lineitem where l_orderkey = o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;
+
+logical_plan
+01)Sort: customer.c_custkey ASC NULLS LAST
+02)--Projection: customer.c_custkey
+03)Inner Join: customer.c_custkey = __scalar_sq_2.o_custkey Filter: 
CAST(customer.c_acctbal AS Decimal128(25, 2)) < 
__scalar_sq_2.sum(orders.o_totalprice)
+04)--TableScan: customer projection=[c_custkey, c_acctbal]
+05)--SubqueryAlias: __scalar_sq_2
+06)Projection: sum(orders.o_totalprice), orders.o_custkey
+07)--Aggregate: groupBy=[[orders.o_custkey]], 
aggr=[[sum(orders.o_totalprice)]]
+08)Projection: orders.o_custkey, orders.o_totalprice
+09)--LeftSemi Join: orders.o_orderkey = 
__correlated_sq_1.l_orderkey Filter: __correlated_sq_1.l_extendedprice < 
customer.c_acctbal
+10)TableScan: orders projection=[o_orderkey, o_custkey, 
o_totalprice]
+11)SubqueryAlias: __correlated_sq_1
+12)--TableScan: lineitem projection=[l_orderkey, 
l_extendedprice]
+
+# 
correlated_recursive_scalar_subquery_with_level_3_in_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and o_totalprice in (
+select l_extendedprice as price from lineitem where l_orderkey = 
o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;

Review Comment:
   Still failed in datafusion-cli, but in sqllogictest it runs well:
   ```
   > explain select c_custkey from customer
   where c_acctbal < (
   select sum(o_totalprice) from orders
   where o_custkey = c_custkey
   and o_totalprice in (
   select l_extendedprice as price from lineitem where l_orderkey = 
o_orderkey
   and l_extendedprice < c_acctbal
   )
   ) order by c_custkey;
   Schema error: No field named customer.c_acctbal. Valid fields are 
__correlated_sq_1.l_extendedprice.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-25 Thread via GitHub


irenjj commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106333691


##
datafusion/sqllogictest/test_files/subquery.slt:
##
@@ -1482,3 +1482,85 @@ logical_plan
 
 statement count 0
 drop table person;
+
+# 
correlated_recursive_scalar_subquery_with_level_3_scalar_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and o_totalprice < (
+select sum(l_extendedprice) as price from lineitem where l_orderkey = 
o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;
+
+logical_plan
+01)Sort: customer.c_custkey ASC NULLS LAST
+02)--Projection: customer.c_custkey
+03)Inner Join: customer.c_custkey = __scalar_sq_1.o_custkey Filter: 
CAST(customer.c_acctbal AS Decimal128(25, 2)) < 
__scalar_sq_1.sum(orders.o_totalprice)
+04)--TableScan: customer projection=[c_custkey, c_acctbal]
+05)--SubqueryAlias: __scalar_sq_1
+06)Projection: sum(orders.o_totalprice), orders.o_custkey
+07)--Aggregate: groupBy=[[orders.o_custkey]], 
aggr=[[sum(orders.o_totalprice)]]
+08)Projection: orders.o_custkey, orders.o_totalprice
+09)--Filter: CAST(orders.o_totalprice AS Decimal128(25, 2)) < 
()
+10)Subquery:
+11)--Projection: sum(lineitem.l_extendedprice) AS price
+12)Aggregate: groupBy=[[]], 
aggr=[[sum(lineitem.l_extendedprice)]]
+13)--Filter: lineitem.l_orderkey = 
outer_ref(orders.o_orderkey) AND lineitem.l_extendedprice < 
outer_ref(customer.c_acctbal)
+14)TableScan: lineitem, 
partial_filters=[lineitem.l_orderkey = outer_ref(orders.o_orderkey), 
lineitem.l_extendedprice < outer_ref(customer.c_acctbal)]
+15)TableScan: orders projection=[o_orderkey, o_custkey, 
o_totalprice]
+
+# 
correlated_recursive_scalar_subquery_with_level_3_exists_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and exists (
+select * from lineitem where l_orderkey = o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;
+
+logical_plan
+01)Sort: customer.c_custkey ASC NULLS LAST
+02)--Projection: customer.c_custkey
+03)Inner Join: customer.c_custkey = __scalar_sq_2.o_custkey Filter: 
CAST(customer.c_acctbal AS Decimal128(25, 2)) < 
__scalar_sq_2.sum(orders.o_totalprice)
+04)--TableScan: customer projection=[c_custkey, c_acctbal]
+05)--SubqueryAlias: __scalar_sq_2
+06)Projection: sum(orders.o_totalprice), orders.o_custkey
+07)--Aggregate: groupBy=[[orders.o_custkey]], 
aggr=[[sum(orders.o_totalprice)]]
+08)Projection: orders.o_custkey, orders.o_totalprice
+09)--LeftSemi Join: orders.o_orderkey = 
__correlated_sq_1.l_orderkey Filter: __correlated_sq_1.l_extendedprice < 
customer.c_acctbal
+10)TableScan: orders projection=[o_orderkey, o_custkey, 
o_totalprice]
+11)SubqueryAlias: __correlated_sq_1
+12)--TableScan: lineitem projection=[l_orderkey, 
l_extendedprice]
+
+# 
correlated_recursive_scalar_subquery_with_level_3_in_subquery_referencing_level1_relation
+query TT
+explain select c_custkey from customer
+where c_acctbal < (
+select sum(o_totalprice) from orders
+where o_custkey = c_custkey
+and o_totalprice in (
+select l_extendedprice as price from lineitem where l_orderkey = 
o_orderkey
+and l_extendedprice < c_acctbal
+)
+) order by c_custkey;

Review Comment:
   Still failed in datafusion-cli, but int sqllogictest it runs well:
   ```
   > explain select c_custkey from customer
   where c_acctbal < (
   select sum(o_totalprice) from orders
   where o_custkey = c_custkey
   and o_totalprice in (
   select l_extendedprice as price from lineitem where l_orderkey = 
o_orderkey
   and l_extendedprice < c_acctbal
   )
   ) order by c_custkey;
   Schema error: No field named customer.c_acctbal. Valid fields are 
__correlated_sq_1.l_extendedprice.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] feat: Support parsing subqueries with `OuterReferenceColumn` belongs to non-adjacent outer relations [datafusion]

2025-05-25 Thread via GitHub


duongcongtoai commented on code in PR #16186:
URL: https://github.com/apache/datafusion/pull/16186#discussion_r2106305831


##
datafusion/optimizer/src/push_down_filter.rs:
##
@@ -1089,7 +1089,13 @@ impl OptimizerRule for PushDownFilter {
 let (volatile_filters, non_volatile_filters): (Vec<&Expr>, 
Vec<&Expr>) =
 filter_predicates
 .into_iter()
-.partition(|pred| pred.is_volatile());
+// TODO: subquery decorrelation sometimes cannot 
decorrelated all the expr
+// (i.e in the case of recursive subquery)
+// this function may accidentally pushdown the 
subquery expr as well
+// until then, we have to exclude these exprs here
+.partition(|pred| {
+pred.is_volatile() || has_scalar_subquery(pred)

Review Comment:
   when we allow nested subquery, the final plan reaches this optimizor and the 
predicate on scalar_subquery can be accidentally push down



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]