Matthew Weaver created HIVE-4697:
------------------------------------

             Summary: Subqueries with IN and NOT IN
                 Key: HIVE-4697
                 URL: https://issues.apache.org/jira/browse/HIVE-4697
             Project: Hive
          Issue Type: New Feature
          Components: Query Processor
            Reporter: Matthew Weaver
            Assignee: Matthew Weaver


h5. Functional Requirements

* Support {{WHERE x IN (<column subquery>)}}.
** {{<column subquery>}} returns one column and any number of rows.
* Support {{WHERE x NOT IN (<column subquery>)}}.
* Support the same types of subqueries in {{HAVING}}.
** E.g.
{code:sql}
SELECT key FROM t1 GROUP BY key
HAVING COUNT(value) IN (SELECT p FROM t2);
{code}
* Correlated subqueries are not supported, at least for now.
** We still need to check for correlation and bail out if it occurs.
** Correlated subquery:
*** A subquery that references a table that appears in a containing query 
([MySQL|http://dev.mysql.com/doc/refman/5.7/en/correlated-subqueries.html]), 
thus requiring subquery evaluation to look outside its scope.
*** The subquery depends on the outer query for its values, so it 
must be executed once for each row of the outer query.  Also known as 
a _repeating subquery_.
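
To make the distinction concrete, here is a minimal sketch that contrasts the two cases. It reuses {{t1}} and {{t2}} from the {{HAVING}} example above; the column {{t2.q}} is hypothetical and is introduced only for illustration.

{code:sql}
-- Uncorrelated: the subquery refers only to t2, so it can be
-- evaluated once, independently of the outer query.
SELECT key FROM t1
WHERE key IN (SELECT p FROM t2);

-- Correlated: the subquery refers to t1.value, a column of the
-- containing query, so it cannot be evaluated on its own.
-- (t2.q is a hypothetical column used only for this example.)
SELECT key FROM t1
WHERE key IN (SELECT p FROM t2 WHERE t2.q = t1.value);
{code}

Under the requirements above, the first query would be accepted and the second would be rejected by the correlation check.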

h5. Tasks
* Rewrite {{WHERE x IN (<column subquery>)}} as a {{LEFT SEMI JOIN}}.
** This first step is not ready for public consumption.  In particular, it has 
no check for correlated terms.
** Include test queries.
* Add a check for correlated terms and return an informative error message.
* Rewrite {{WHERE x NOT IN (<column subquery>)}} as a {{LEFT OUTER JOIN}}.
** Return only the rows that have no match on the right side.
* Rewrite subqueries in {{HAVING}}, using {{LEFT SEMI JOIN}} and 
{{LEFT OUTER JOIN}} as above.
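
The rewrites listed above might look roughly like the following. This is only a sketch of the intended shape, not the actual output of the query processor: the aliases {{sq}} and {{agg}} and the column alias {{cnt}} are made up for illustration. The {{NOT IN}} form also assumes that neither {{t1.key}} nor {{t2.p}} contains {{NULL}}, because standard {{NOT IN}} returns no rows when the subquery yields a {{NULL}}, which the outer-join form does not reproduce.

{code:sql}
-- IN: original query
SELECT t1.key FROM t1
WHERE t1.key IN (SELECT p FROM t2);

-- IN: possible rewrite as a LEFT SEMI JOIN
SELECT t1.key FROM t1
LEFT SEMI JOIN (SELECT p FROM t2) sq ON (t1.key = sq.p);

-- NOT IN: original query
SELECT t1.key FROM t1
WHERE t1.key NOT IN (SELECT p FROM t2);

-- NOT IN: possible rewrite as a LEFT OUTER JOIN that keeps only
-- the rows with no match on the right side.
-- DISTINCT guards against duplicate p values; it is not strictly
-- needed here because unmatched rows appear only once anyway.
SELECT t1.key FROM t1
LEFT OUTER JOIN (SELECT DISTINCT p FROM t2) sq ON (t1.key = sq.p)
WHERE sq.p IS NULL;

-- HAVING: original query
SELECT key FROM t1 GROUP BY key
HAVING COUNT(value) IN (SELECT p FROM t2);

-- HAVING: possible rewrite that aggregates first, then semi-joins
-- on the aggregate value.
SELECT agg.key
FROM (SELECT key, COUNT(value) AS cnt FROM t1 GROUP BY key) agg
LEFT SEMI JOIN (SELECT p FROM t2) sq ON (agg.cnt = sq.p);
{code}

Note that in a {{LEFT SEMI JOIN}} the right-hand side can be referenced only in the {{ON}} clause, which is why the rewritten queries select columns from the left side only.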

