I ran into an interesting bug. Basically, if your FROM() source is a partitioned table and you use a where clause that prunes, all of the INSERT HERE SELECT * WHERE x=y ignores each specified where clause. This does not occur if the source partition is not specified, but if the source as where partition = 'x' then the where on each individual insert is ignored...
I've included some files here testdata.tsv - Tab delimited data to prove the issue create_tables.hive - Creates a database and tables as well as loads the data from the TSV Test Cases: I created these test case files in a way that there are three types of insert in each case: 1. Load all data from initial statement, 2. Load partial data (use a limiting clause such as where day >= '2013-01-05', and 3 Load NO data from the initial statement (where 1 = 0) These tests are all run on hive 0.9 multi-flat-flat.hive - The source table and the dest tables are not partitioned, the where clauses work as expected: 19 Rows loaded to multi_bug_flat 0 Rows loaded to multi_bug_flat3 15 Rows loaded to multi_bug_flat2 multi-part-part.hive - The source table and the dest tables are partitioned. The where clauses are not honored. 9 Rows loaded to multi_bug_part3 9 Rows loaded to multi_bug_part2 9 Rows loaded to multi_bug_part multi-flat-part.hive - The source table is flat, the dest table is partitioned - The where clauses work as expected: 0 Rows loaded to multi_bug_part3 15 Rows loaded to multi_bug_part2 19 Rows loaded to multi_bug_part multi-part-flat.hive - The source table is partitioned, the dest table is flat - The where clauses are not honored: 9 Rows loaded to multi_bug_flat 9 Rows loaded to multi_bug_flat3 9 Rows loaded to multi_bug_flat2 multi-part-specified.hive - The source and dest are partitioned, but there is no partition pruning statement in the from () this works as expected 0 Rows loaded to multi_bug_part3 15 Rows loaded to multi_bug_part2 19 Rows loaded to multi_bug_part Thoughts?
create_tables.hive
Description: Binary data
multi-flat-flat.hive
Description: Binary data
multi-flat-part.hive
Description: Binary data
multi-part-flat.hive
Description: Binary data
multi-part-part.hive
Description: Binary data
multi-part-specified.hive
Description: Binary data
2013-01-09 00:00:00 | 2013-01-09 23:59:58 | 2013-01-09 |
2013-01-12 00:00:00 | 2013-01-12 23:59:59 | 2013-01-12 |
2013-01-03 00:00:00 | 2013-01-03 23:59:58 | 2013-01-03 |
2013-01-18 00:00:06 | 2013-01-18 23:59:58 | 2013-01-18 |
2013-01-15 00:00:00 | 2013-01-15 23:59:59 | 2013-01-15 |
2013-01-02 00:00:00 | 2013-01-02 23:59:59 | 2013-01-02 |
2013-01-17 00:00:03 | 2013-01-17 23:59:55 | 2013-01-17 |
2013-01-05 00:00:00 | 2013-01-05 23:59:59 | 2013-01-05 |
2013-01-07 00:00:00 | 2013-01-07 23:59:59 | 2013-01-07 |
2013-01-10 00:00:01 | 2013-01-10 23:59:58 | 2013-01-10 |
2013-01-06 00:00:00 | 2013-01-06 23:59:59 | 2013-01-06 |
2013-01-13 00:00:00 | 2013-01-13 23:59:59 | 2013-01-13 |
2013-01-01 00:00:01 | 2013-01-01 23:59:58 | 2013-01-01 |
2013-01-16 00:00:00 | 2013-01-16 23:59:58 | 2013-01-16 |
2013-01-08 00:00:00 | 2013-01-08 23:59:59 | 2013-01-08 |
2013-01-11 00:00:00 | 2013-01-11 23:59:59 | 2013-01-11 |
2013-01-04 00:00:00 | 2013-01-04 23:59:59 | 2013-01-04 |
2013-01-19 00:00:01 | 2013-01-19 23:59:53 | 2013-01-19 |
2013-01-14 00:00:00 | 2013-01-14 23:59:58 | 2013-01-14 |