[ https://issues.apache.org/jira/browse/SPARK-46446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wenchen Fan reassigned SPARK-46446:
-----------------------------------

    Assignee: Jack Chen

> Correctness bug in correlated subquery with OFFSET
> --------------------------------------------------
>
>                 Key: SPARK-46446
>                 URL: https://issues.apache.org/jira/browse/SPARK-46446
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 4.0.0
>            Reporter: Jack Chen
>            Assignee: Jack Chen
>            Priority: Major
>              Labels: pull-request-available
>
> Subqueries with correlation under LIMIT with OFFSET have a correctness bug,
> introduced recently when support for correlation under OFFSET was enabled but
> not handled correctly. (So we went from unsupported, query throws error
> -> wrong results.)
> It's a bug in all types of correlated subqueries: scalar, lateral, IN, EXISTS.
>
> It's easy to repro with a query like
> {code:sql}
> create table x(x1 int, x2 int);
> insert into x values (1, 1), (2, 2);
> create table y(y1 int, y2 int);
> insert into y values (1, 1), (1, 2), (2, 4);
> select * from x where exists (select * from y where x1 = y1 limit 1 offset 2)
> {code}
> Correct result: empty set, see postgres:
> [https://www.db-fiddle.com/f/dtXNn7hwDnemiCTUhvwgYM/0]
> Spark result: Array([2,2])
>
> The
> [PR|https://github.com/apache/spark/pull/43111/files/324a106611e6d62c31535cfc43863fdaa16e5dda#diff-583171e935b2dc349378063a5841c5b98b30a2d57ac3743a9eccfe7bffcb8f2aR1403]
> where it was introduced added a test for it, but the golden file results for
> the test were actually incorrect and we didn't notice. (The bug was initially
> found by https://github.com/apache/spark/pull/44084)
>
> I'll work on both:
> * Adding support for offset in DecorrelateInnerQuery (the transformation
> rewrites OFFSET into a filter on a row_number window function, similar to
> LIMIT).
> * Adding a feature flag to enable/disable offset in subquery support
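
For anyone who wants to check the expected result locally, here is a minimal sketch that runs the repro tables from the description through SQLite, via Python's standard sqlite3 module, and compares the correlated EXISTS query against a hand-written decorrelated form using a row_number filter, in the spirit of the first bullet. SQLite is only a stand-in reference engine here (the description used Postgres), the helper names exists_correlated and exists_decorrelated are hypothetical, and the rewritten SQL is an illustration of the idea, not the plan that DecorrelateInnerQuery actually produces.

```python
import sqlite3

# In-memory database with the tables and rows from the repro.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table x(x1 int, x2 int);
    insert into x values (1, 1), (2, 2);
    create table y(y1 int, y2 int);
    insert into y values (1, 1), (1, 2), (2, 4);
""")


def exists_correlated(conn, limit, offset):
    """Run the query in its original correlated form (hypothetical helper)."""
    return conn.execute(
        "select * from x where exists "
        "(select * from y where x1 = y1 limit ? offset ?) order by x1",
        (limit, offset),
    ).fetchall()


def exists_decorrelated(conn, limit, offset):
    """Hand-written decorrelated form (illustrative assumption, not Spark's plan).

    Rows of y are numbered within each correlation key y1; LIMIT/OFFSET
    becomes the filter offset < rn <= offset + limit.
    """
    return conn.execute(
        """
        select * from x where x1 in (
            select y1 from (
                select y1, row_number() over (partition by y1) as rn from y
            ) as numbered
            where rn > ? and rn <= ? + ?
        ) order by x1
        """,
        (offset, offset, limit),
    ).fetchall()


# No y1 group has more than two rows, so OFFSET 2 leaves nothing to match.
print(exists_correlated(conn, 1, 2))    # []
print(exists_decorrelated(conn, 1, 2))  # []
```

With OFFSET 1 instead, both forms return only the row (1, 1), since y1 = 1 is the only group with a second row.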