awesome!  i'm not crazy after all!

so workaround we devised here for this pattern:

   partition_key not in ('foo')

is:

   not array_contains(array('foo'), partition_key)

thanks,
Stephen.

On Thu, Aug 25, 2016 at 6:31 PM, Sergey Shelukhin <[email protected]>
wrote:

> I can repro this on master. I’ll file a bug...
>
> From: Stephen Sprague <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, August 25, 2016 at 13:34
> To: "[email protected]" <[email protected]>
> Subject: Re: hive 2.1.0 and "NOT IN ( list )" and column is a
> partition_key
>
> Hi Gopal,
> Thank you for this insight.  good stuff.   The thing is there is no 'foo'
> for etl_database_source so that filter if anything should be
> short-circuited to 'true'.  ie. double nots.   1. not in  2. and foo not
> present.
>
> it doesn't matter what what i put in that "not in" clause the filter
> always comes back false if the column is a partition_key of course.
>
> thanks for the tip on explain extended.... that's some crazy output so i'm
> sifting for clues in that now.   i hear you though - something in there
> with the metastore is at play.
>
> Cheers,
> Stephen.
>
> On Thu, Aug 25, 2016 at 1:12 PM, Gopal Vijayaraghavan <[email protected]>
> wrote:
>
>>
>> > anybody run up against this one?  hive 2.1.0 + using a  "not in" on a
>> >list + the column is a partition key participant.
>>
>> The partition filters are run before the plan is generated.
>>
>> >    AND etl_source_database not in ('foo')
>>
>> Is there a 'foo' in etl_source_database?
>>
>> >             predicate: false (type: boolean)  #### this kills any hope
>> >of the query returning anything.
>> ...
>> >  Select Operator        ###doesn't even mention a filter
>>
>> This is probably good news, because that's an optimization.
>>
>> PrunedPartitionList getPartitionsFromServer(Table tab, final
>> ExprNodeGenericFuncDesc compactExpr ...) {
>> ...
>>           hasUnknownPartitions = Hive.get().getPartitionsByExpr(
>>               tab, compactExpr, conf, partitions);
>> }
>>
>>
>> goes into the metastore and evaluates the IN and NOT IN for partitions
>> ahead of time.
>>
>>
>> So, this could mean that the partition pruning evaluation returned no
>> partitions at all (or just exactly matched partitions only, skipping the
>> filter per-row).
>>
>> In 2.x, you might notice it does a bit fancier things there as well, like
>>
>> select count(1) from table where year*10000 + month*100 + day >= 20160101;
>>
>> https://github.com/apache/hive/blob/master/ql/src/java/org/
>> apache/hadoop/hi
>> ve/ql/optimizer/ppr/PartitionPruner.java#L468
>> <https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ppr/PartitionPruner.java#L468>
>>
>>
>> You can try "explain extended" and see which partitions are selected (&
>> validate that the filter removed was applied already).
>>
>> Cheers,
>> Gopal
>>
>>
>>
>>
>>
>

Reply via email to