So I have an interesting situation where a select from a view - amazingly i
might add - does the right partition elimination in hive v0.80 but in hive
v0.10 it does not. instead it seems to perform like a mere mortal would.
Let me explain:
1. view definition:
create view v1 as select cast(year*10000 + month*100 + day as int)
as date_key, * from big_table_with_partitions;
2. big_table_with_partitions is partitioned by (year int, month int, day
int, hour int)
3. the query. select * from v1 where date_key=20130429 limit 10;
on hive 0.80 running explain we have this:
{code}
| Path -> Alias:
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=01[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=02[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=03[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=04[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=05[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=06[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=07[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=08[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=09[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=10[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=11[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=12[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=13[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=14[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=15[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=16[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=17[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=18[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=19[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=20[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=21[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=22[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=23[bi.log_server_www_vw:raw_logs.fact_www_access_by_hour]
| Path -> Partition:
| hdfs://
namenode1.sv2.trulia.com:8020/user/hive/warehouse/raw_logs.db/fact_www_access_by_hour/year=2013/month=04/day=29/hour=00
| Partition
| base file name: hour=00
| input format: org.apache.hadoop.mapred.SequenceFileInputFormat
| output format:
org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
| partition values:
| day 29
| hour 00
| month 04
| year 2013
{code}
so hive somehow was able to parse the "date_key=20130429" and figure out
that was partition (year=2013/month=04/day=29).
in hive version 0.10 it does not and lets just say i have several years of
data and after about 30 minutes the explain comes back and every partition
is listed in the output (where above it is only one day.)
So i guess my question is: how was hive in v0.80 able to do that partition
elimination? remember i did not specify year, month nor day which are the
partition keys.
thanks,
Stephen.