I can repro this on master. I’ll file a bug... From: Stephen Sprague <[email protected]<mailto:[email protected]>> Reply-To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Date: Thursday, August 25, 2016 at 13:34 To: "[email protected]<mailto:[email protected]>" <[email protected]<mailto:[email protected]>> Subject: Re: hive 2.1.0 and "NOT IN ( list )" and column is a partition_key
Hi Gopal, Thank you for this insight. good stuff. The thing is there is no 'foo' for etl_database_source so that filter if anything should be short-circuited to 'true'. ie. double nots. 1. not in 2. and foo not present. it doesn't matter what what i put in that "not in" clause the filter always comes back false if the column is a partition_key of course. thanks for the tip on explain extended.... that's some crazy output so i'm sifting for clues in that now. i hear you though - something in there with the metastore is at play. Cheers, Stephen. On Thu, Aug 25, 2016 at 1:12 PM, Gopal Vijayaraghavan <[email protected]<mailto:[email protected]>> wrote: > anybody run up against this one? hive 2.1.0 + using a "not in" on a >list + the column is a partition key participant. The partition filters are run before the plan is generated. > AND etl_source_database not in ('foo') Is there a 'foo' in etl_source_database? > predicate: false (type: boolean) #### this kills any hope >of the query returning anything. ... > Select Operator ###doesn't even mention a filter This is probably good news, because that's an optimization. PrunedPartitionList getPartitionsFromServer(Table tab, final ExprNodeGenericFuncDesc compactExpr ...) { ... hasUnknownPartitions = Hive.get().getPartitionsByExpr( tab, compactExpr, conf, partitions); } goes into the metastore and evaluates the IN and NOT IN for partitions ahead of time. So, this could mean that the partition pruning evaluation returned no partitions at all (or just exactly matched partitions only, skipping the filter per-row). In 2.x, you might notice it does a bit fancier things there as well, like select count(1) from table where year*10000 + month*100 + day >= 20160101; https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hi ve/ql/optimizer/ppr/PartitionPruner.java#L468 You can try "explain extended" and see which partitions are selected (& validate that the filter removed was applied already). Cheers, Gopal
