Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
nuno-faria commented on PR #15212: URL: https://github.com/apache/datafusion/pull/15212#issuecomment-2736367170 Thank you @chenkovsky and @goldmedal for fixing the issue. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
goldmedal merged PR #15212: URL: https://github.com/apache/datafusion/pull/15212 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
goldmedal commented on code in PR #15212: URL: https://github.com/apache/datafusion/pull/15212#discussion_r1998987034 ## .github/workflows/rust.yml: ## @@ -246,6 +246,7 @@ jobs: mc cp -r /source/* localminio/data" - name: Run tests (excluding doctests) env: + RUST_MIN_STACK: 4194304 Review Comment: This test has been modified in the `main` branch. It seems that it never happens stack overflow anymore. ## datafusion/sql/src/unparser/expr.rs: ## @@ -94,6 +94,7 @@ impl Unparser<'_> { Ok(root_expr) } +#[cfg_attr(feature = "recursive_protection", recursive::recursive)] Review Comment: This annotation should be enabled by using the `recursive_protection` feature. I found the CI doesn't enable it. I experimented with it https://github.com/apache/datafusion/pull/15272 (It seems to work well) You can see how I modified the workflow. ## .github/workflows/extended.yml: ## @@ -81,6 +81,7 @@ jobs: - name: Run tests (excluding doctests) env: RUST_BACKTRACE: 1 + RUST_MIN_STACK: 4194304 Review Comment: After enabling `recursive_protection`, it can be removed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
chenkovsky commented on code in PR #15212: URL: https://github.com/apache/datafusion/pull/15212#discussion_r1999031362 ## datafusion/sql/src/unparser/expr.rs: ## @@ -94,6 +94,7 @@ impl Unparser<'_> { Ok(root_expr) } +#[cfg_attr(feature = "recursive_protection", recursive::recursive)] Review Comment: interesting. actually, I tested in my env, without this annotation, stack will be overflowed. but with this annotation, it works. so I haven't checked whether it's enabled in CI. you saved my day. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
goldmedal commented on code in PR #15212: URL: https://github.com/apache/datafusion/pull/15212#discussion_r1995622142 ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -1746,3 +1749,153 @@ fn test_unparse_subquery_alias_with_table_pushdown() -> Result<()> { assert_eq!(sql.to_string(), expected); Ok(()) } + +#[test] +fn test_unparse_left_anti_join() -> Result<()> { +let schema = Schema::new(vec![ +Field::new("c", DataType::Int32, false), +Field::new("d", DataType::Int32, false), +]); + +// LeftAnti Join: t1.c = __correlated_sq_1.c +// TableScan: t1 projection=[c] +// SubqueryAlias: __correlated_sq_1 +// TableScan: t2 projection=[c] + +let table_scan1 = table_scan(Some("t1"), &schema, Some(vec![0, 1]))?.build()?; +let table_scan2 = table_scan(Some("t2"), &schema, Some(vec![0]))?.build()?; +let subquery = subquery_alias(table_scan2, "__correlated_sq_1")?; +let plan = LogicalPlanBuilder::from(table_scan1) +.project(vec![col("t1.d")])? +.join_on( +subquery, +datafusion_expr::JoinType::LeftAnti, +vec![col("t1.c").eq(col("__correlated_sq_1.c"))], +)? +.build()?; + +let unparser = Unparser::new(&UnparserPostgreSqlDialect {}); +let sql = unparser.plan_to_sql(&plan)?; +assert_eq!("SELECT \"t1\".\"d\" FROM \"t1\" WHERE NOT EXISTS (SELECT 1 FROM \"t2\" AS \"__correlated_sq_1\" WHERE (\"t1\".\"c\" = \"__correlated_sq_1\".\"c\"))", sql.to_string()); +Ok(()) +} + +#[test] +fn test_unparse_left_semi_join() -> Result<()> { +let schema = Schema::new(vec![ +Field::new("c", DataType::Int32, false), +Field::new("d", DataType::Int32, false), +]); + +// LeftAnti Join: t1.c = __correlated_sq_1.c +// TableScan: t1 projection=[c] +// SubqueryAlias: __correlated_sq_1 +// TableScan: t2 projection=[c] Review Comment: I think the comment sould be `semi_join`, not `left anti join`. ## datafusion/sql/src/unparser/ast.rs: ## @@ -176,6 +177,41 @@ impl SelectBuilder { self.lateral_views = value; self } + +/// Replaces the selection with a new value. +/// +/// This function is used to replace a specific expression within the selection. +/// Unlike the `selection` method which combines existing and new selections with AND, +/// this method searches for and replaces occurrences of a specific expression. +/// +/// This method is primarily used to modify LEFT MARK JOIN expressions. +/// When processing a LEFT MARK JOIN, we need to replace the placeholder expression +/// with the actual join condition in the selection clause. +/// +/// # Arguments +/// +/// * `existing_expr` - The expression to replace +/// * `value` - The new expression to set as the selection +/// +/// # Returns +/// +/// A mutable reference to self for method chaining Review Comment: ```suggestion ``` It's redundant. ## datafusion/sql/tests/cases/plan_to_sql.rs: ## @@ -1746,3 +1749,153 @@ fn test_unparse_subquery_alias_with_table_pushdown() -> Result<()> { assert_eq!(sql.to_string(), expected); Ok(()) } + +#[test] +fn test_unparse_left_anti_join() -> Result<()> { +let schema = Schema::new(vec![ +Field::new("c", DataType::Int32, false), +Field::new("d", DataType::Int32, false), +]); + +// LeftAnti Join: t1.c = __correlated_sq_1.c +// TableScan: t1 projection=[c] +// SubqueryAlias: __correlated_sq_1 +// TableScan: t2 projection=[c] + +let table_scan1 = table_scan(Some("t1"), &schema, Some(vec![0, 1]))?.build()?; +let table_scan2 = table_scan(Some("t2"), &schema, Some(vec![0]))?.build()?; +let subquery = subquery_alias(table_scan2, "__correlated_sq_1")?; +let plan = LogicalPlanBuilder::from(table_scan1) +.project(vec![col("t1.d")])? +.join_on( +subquery, +datafusion_expr::JoinType::LeftAnti, +vec![col("t1.c").eq(col("__correlated_sq_1.c"))], +)? +.build()?; + +let unparser = Unparser::new(&UnparserPostgreSqlDialect {}); +let sql = unparser.plan_to_sql(&plan)?; +assert_eq!("SELECT \"t1\".\"d\" FROM \"t1\" WHERE NOT EXISTS (SELECT 1 FROM \"t2\" AS \"__correlated_sq_1\" WHERE (\"t1\".\"c\" = \"__correlated_sq_1\".\"c\"))", sql.to_string()); +Ok(()) +} + +#[test] +fn test_unparse_left_semi_join() -> Result<()> { +let schema = Schema::new(vec![ +Field::new("c", DataType::Int32, false), +Field::new("d", DataType::Int32, false), +]); + +// LeftAnti Join: t1.c = __correlated_sq_1.c +// TableScan: t1 projection=[c] +// SubqueryAlias: __correlated_sq_1 +// TableScan: t2 projection=[c] + +let table_scan1 = table_scan(Some("t1"), &schema, Some(vec![0, 1]))?.build()?; +let table_scan2 = table_scan(So
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
chenkovsky commented on PR #15212: URL: https://github.com/apache/datafusion/pull/15212#issuecomment-2725105874 > > for stackoverflow problem. do you have any idea? > > No, I don't have any idea currently 🤔. I'm not sure, but I guess `recursive_protection` is the right direction. the wierd thing is that after recursive_protection was added, on amd64 linux, stack is still overflowed. but on other platform, it's ok. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
goldmedal commented on PR #15212: URL: https://github.com/apache/datafusion/pull/15212#issuecomment-2725013843 > for stackoverflow problem. do you have any idea? No, I don't have any idea currently 🤔. I'm not sure, but I guess `recursive_protection` is the right direction. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
chenkovsky commented on PR #15212: URL: https://github.com/apache/datafusion/pull/15212#issuecomment-2724945557 > Thanks @chenkovsky. I think the results of `left mark join`, `right semi join` and `right anit join` aren't correct (they aren't executable). I left some comments for them. Could you double-check them? thank you for your reply. I'm struggling with stackoverflow problem, so I missed something carelessly. it's my fault. I will fix it asap. for stackoverflow problem. do you have any idea? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
Re: [PR] fix: unparsing left/ right semi/mark join [datafusion]
chenkovsky commented on PR #15212: URL: https://github.com/apache/datafusion/pull/15212#issuecomment-2723305518 stackoverflow again. ಥ_ಥ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org
[PR] fix: unparsing left/ right semi/mark join [datafusion]
chenkovsky opened a new pull request, #15212: URL: https://github.com/apache/datafusion/pull/15212 ## Which issue does this PR close? - Closes #15127. ## Rationale for this change Most SQL dialects don't support specialized join types like LEFT SEMI, LEFT ANTI, LEFT MARK, RIGHT SEMI, or RIGHT ANTI joins. ## What changes are included in this PR? For these cases, this PR rewrites these joins into equivalent EXISTS subqueries. However, an alternative approach would be to generate dialect-specific SQL for different database engines. ## Are these changes tested? unit test ## Are there any user-facing changes? No. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org