Hi folks, I have created a table in the following manner:

CREATE EXTERNAL TABLE IF NOT EXISTS rum_beacon_partition (
  .... list of columns
)
COMMENT 'User Information'
PARTITIONED BY (account_id String, product String, group_id String, year String, month String, day String)
STORED AS PARQUET
LOCATION '/stream2store/nt_tp_collation'

I then ran MSCK REPAIR TABLE to generate the partition information. I believe the partitions were generated correctly -- here is a query and its output:

show table extended like 'rum_beacon_partition' partition(account_id='XXXX', product='rum', group_id='XXXX', year='2017', month='09', day='12')

location: ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=09/day=12

However, when I issue a SQL query, the partition predicates do not appear to limit the files touched:

explain extended select uri from rum_beacon_partition where account_id='XXXX' and product='rum' and group_id='XXXX' and year='2017' and month='09' limit 2

The output seems to indicate that every file is being touched (unless I'm misreading it). It also crashes my filesystem, so I suspect there is some truth to that.
The optimized logical plan looks fine, I think:

== Optimized Logical Plan ==
Limit 2
 Project [uri#16519]
  Filter (((((account_id#16511 = XXXX) && (product#16512 = rum)) && (group_id#16513 = XXXX)) && (year#16514 = 2017)) && (month#16515 = 09))

But in the physical plan a large number of files are touched, across partitions that do not match the predicates (note the month=05 and month=06 paths, even though the query asks for month='09'):

Scan ParquetRelation[
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=16,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=17,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=18,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=19,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=20,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=21,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=22,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=23,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=24,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=25,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=26,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=27,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=28,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=29,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=30,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=05/day=31,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=06/day=01,
ceph://ceph-mon.service.consul:6789/stream2store/nt_tp_collation/account_id=XXXX/product=rum/group_id=XXXX/year=2017/month=06/day=02

I am hoping someone can offer debugging tips / advice on what to look for in the logs. I'm on a fairly old version of Spark (1.5.2), but this seems like something I'm doing wrong.
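To make the expected behavior concrete, partition pruning should in effect select only the directories whose key=value path segments match the query predicates, before any Parquet file is opened. Here is a minimal illustration in plain Python (not Spark's actual implementation; the paths are hypothetical examples following the table's layout):

```python
def partition_values(path):
    """Extract key=value partition segments from a directory path."""
    parts = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            parts[key] = value
    return parts

def prune(paths, predicates):
    """Keep only paths whose partition values satisfy every predicate."""
    return [p for p in paths
            if all(partition_values(p).get(k) == v
                   for k, v in predicates.items())]

# Hypothetical partition directories matching the layout above.
paths = [
    "/stream2store/nt_tp_collation/account_id=A1/product=rum/group_id=G1/year=2017/month=05/day=16",
    "/stream2store/nt_tp_collation/account_id=A1/product=rum/group_id=G1/year=2017/month=09/day=12",
]

# Predicates from the query: only the month=09 directory should survive.
pruned = prune(paths, {"account_id": "A1", "product": "rum",
                       "group_id": "G1", "year": "2017", "month": "09"})
print(pruned)
```

In my case the scan behaves as if no such filtering happens: directories whose month value contradicts the predicate still appear in the file list.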