anybody run up against this one? hive 2.1.0 + using a "not in" on a list
+ the column is a partition key participant.
* using not
query:
explain
SELECT count(*)
FROM bi.fact_email_funnel
WHERE
event_date_key = 20160824
AND etl_source_database *not* in ('foo')
output frag:
Map Operator Tree:
TableScan
alias: fact_email_funnel
Statistics: Num rows: 33723530 Data size: 12154626048 Basic
stats: COMPLETE Column stats: NONE
Filter Operator
*predicate: false (type: boolean) #### this kills any hope
of the query returning anything.*
Statistics: Num rows: 1 Data size: 360 Basic stats: COMPLETE
Column stats: NONE
Group By Operator
aggregations: count()
* not using not
query:
explain
SELECT count(*)
FROM bi.fact_email_funnel
WHERE
event_date_key = 20160824
AND etl_source_database in ('foo')
output frag:
Map Operator Tree:
TableScan
alias: fact_email_funnel
Statistics: Num rows: 33744659 Data size: 14562180096 Basic
stats: COMPLETE Column stats: NONE
Select Operator *###doesn't even mention a filter*
Statistics: Num rows: 33744659 Data size: 14562180096 Basic
stats: COMPLETE Column stats: NONE
Group By Operator
aggregations: count()
* show create table output:
CREATE TABLE `bi.fact_email_funnel`(
`user_id` string,
`email` string,
`ipaddress` string,
`email_type_key` string,
`email_type_id` string,
`email_type_name` string,
`tracking_code` string,
`action_type` string,
`email_guid` string,
`mailing_id` string,
`message_id` string,
`delivered_status` string,
`delivered_error_code` string,
`click_link_id` string,
`click_link` string,
`click_property_id` string,
`platform` string,
`etl_source_id_column` string,
`etl_source_id` string,
`event_datetime` timestamp,
`event_time_key` bigint,
`id_job` bigint,
`load_date` timestamp,
`version` string,
`event_datetime_utc` timestamp,
`num_leads` int COMMENT 'number of leads',
`lead_product_type` string COMMENT 'product lead type <single family,
rental community>',
`property_impression`
array<struct<propertyid:string,propertystatus:string,city:string,statecode:string,order:string>>
COMMENT 'sent property attributes',
`search_impression` map<string,string> COMMENT 'search attributes')
PARTITIONED BY (
`event_date_key` bigint,
`etl_source_database` string,
`etl_source_table` string)