> WHERE p IN (SELECT p FROM t2)


> here we could argue that Hive could optimize this by computing the
> subquery first, and then do the partition pruning, but sadly I don't
> think this optimisation has been implemented yet

It is already implemented:
<https://issues.apache.org/jira/browse/HIVE-7826>

In Hive-1.x, the optimization doesn't kick in when the partition column
has a UDF wrapped around it.

In Hive-2.0, it does apply even if the partition column is wrapped with a
UDF.

Running

"explain rewrite .... where p IN (Select p from t2);"

shows the rewrite that enables dynamic partition pruning (DPP).
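A minimal sketch for checking this yourself, assuming Hive on Tez and two hypothetical tables: t1, partitioned by a STRING column p, and t2, holding the p values to keep. The upper() wrapper in the second query is only an illustration of a UDF-wrapped partition column; going by the Hive-1.x vs Hive-2.0 note above, DPP should apply to it only from Hive-2.0.

```sql
-- Hypothetical tables: t1 is partitioned by p, t2 holds the p values to keep.
-- DPP is a Tez feature; hive.tez.dynamic.partition.pruning is on by default.
SET hive.execution.engine=tez;
SET hive.tez.dynamic.partition.pruning=true;

-- Plain partition column: the IN-subquery is rewritten into a semi-join,
-- which is what makes t1's partitions eligible for pruning at runtime.
EXPLAIN REWRITE
SELECT * FROM t1 WHERE p IN (SELECT p FROM t2);

-- Partition column wrapped in a UDF (upper() is just an example):
-- pruned in Hive-2.0, but not in Hive-1.x.
EXPLAIN REWRITE
SELECT * FROM t1 WHERE upper(p) IN (SELECT p FROM t2);
```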

> Examples of non-deterministic functions are rand() and unix_timestamp(),
> because they are evaluated differently at each row

Yes, that is exactly right. Another case was TO_DATE(), which in Hive-1.x
returned a String and so prevented partitions from being pruned.
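To make the distinction concrete, a sketch using the same hypothetical t1 (partitioned by a STRING column p); the literal partition value is made up. The first predicate depends on rand(), which changes from row to row, so no partition can be ruled out before the scan. The second compares p against a constant, so the non-matching partitions can be dropped at compile time.

```sql
-- Non-deterministic: rand() yields a new value for every row, so the
-- planner cannot tell in advance which partitions of t1 will match.
SELECT * FROM t1 WHERE p = CAST(floor(rand() * 10) AS STRING);

-- Deterministic constant: only the matching partition needs to be read.
SELECT * FROM t1 WHERE p = '2016-01-01';
```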

Cheers,
Gopal