Matthew Weaver created HIVE-4697: ------------------------------------ Summary: Subqueries with IN and NOT IN Key: HIVE-4697 URL: https://issues.apache.org/jira/browse/HIVE-4697 Project: Hive Issue Type: New Feature Components: Query Processor Reporter: Matthew Weaver Assignee: Matthew Weaver
h5. Functional Requirements * Support {{WHERE x IN (<column subquery>);}} ** {{<column subquery>}} returns one column, any number of rows. * Support {{WHERE x NOT IN (<column subquery>)}}; * Support same types of subqueries in {{HAVING}}. ** E.g. {code:sql} SELECT key FROM t1 GROUP BY key HAVING COUNT(value) IN (SELECT p FROM t2); {code} * Correlated subqueries not supported, for now at least ** But still need to check for correlation, and bail if it occurs. ** Correlated subquery: *** A subquery that references a table that appears in a containing query ([MySQL|http://dev.mysql.com/doc/refman/5.7/en/correlated-subqueries.html]), thus requiring subquery evaluation to look outside its scope. *** The subquery depends on the outer query for its values, so the subquery must be executed once for each row of the outer query. Also known as _repeating Subqueries_. h5. Tasks * Rewrite {{IN (<column-subquery>)}} as a {{LEFT SEMI JOIN}}. ** Not ready for public consumption. In particular, no check for correlated terms. ** With test queries. * Add check for correlated terms, return informative error message. * Rewrite {{WHERE NOT IN (<column-subquery>)}} as a {{LEFT OUTER JOIN}}. ** Return rows that don't match the right side * Rewrite subqueries in {{HAVING}}, using {{LEFT SEMI JOIN}} and {{LEFT OUTER JOIN}} as above. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira