[ https://issues.apache.org/jira/browse/HIVE-20594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
J.P Feng reassigned HIVE-20594:
-------------------------------

    Assignee: J.P Feng

> insert overwrite may brings duplicated data when hdfs path exists but partition missing in hms
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-20594
>                 URL: https://issues.apache.org/jira/browse/HIVE-20594
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.1
>            Reporter: J.P Feng
>            Assignee: J.P Feng
>            Priority: Major
>         Attachments: HIVE-20594.patch
>
>
> When I run insert overwrite on a partition of a partitioned table whose HDFS
> path exists but whose partition is missing from HMS, I get duplicated data.
>
> sql: insert overwrite table
> hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns partition (month =
> '201808' ) select * from xxx where month = '201808';
>
> 1. There are 10 files under the partition directory of
> hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns:
> month=201808/000001_0
> month=201808/000002_0 ... month=201808/000009_0
> 2. hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns is an external table
> and I drop partition (month=201808), or the partition is dropped in some
> other way that does not remove the data under it.
> 3. Run insert overwrite table
> hive_test.temp_fcs2_inv_trx_settle_intf_out_all_2ns
> partition (month = '201808' ) select * from xxx where month = '201808'.
> If this statement runs 9 map tasks, it may generate 9 files:
> month=201808/000001_0 ~ month=201808/000008_0
>
> After the statement finishes, the file `month=201808/000009_0` may still
> remain in the partition directory, so queries on the partition may return
> duplicated data.
>
>
>

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
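
The three steps in the description can be modelled with a small file-system
sketch. This is a toy model, not Hive code: the helper names `write_outputs`
and `simulate` are hypothetical, the files are numbered from 000000_0 so that
the counts come out to 10 and 9, and the "clear the directory first" branch is
only an assumption about what happens when the partition is still registered
in HMS.

```python
import tempfile
from pathlib import Path


def write_outputs(partition_dir: Path, n_files: int, tag: str) -> None:
    """Write n_files task outputs; only files with the same name get replaced."""
    partition_dir.mkdir(parents=True, exist_ok=True)
    for i in range(n_files):
        (partition_dir / f"{i:06d}_0").write_text(tag)


def simulate(partition_registered: bool) -> dict:
    """Return {file name: tag} left in the partition directory after the overwrite."""
    with tempfile.TemporaryDirectory() as root:
        part = Path(root) / "month=201808"
        # Step 1: ten files already exist under the partition path.
        write_outputs(part, 10, "old")
        # Step 2: the partition is dropped from the metastore but the files
        # stay on disk, so nothing changes in the directory here.
        if partition_registered:
            # Assumed behaviour when the partition is known: old files are
            # removed before the new ones are moved in.
            for f in part.iterdir():
                f.unlink()
        # Step 3: the insert overwrite runs 9 map tasks and writes 9 files.
        write_outputs(part, 9, "new")
        return {f.name: f.read_text() for f in sorted(part.iterdir())}


if __name__ == "__main__":
    stale = [name for name, tag in simulate(False).items() if tag == "old"]
    print(stale)  # ['000009_0']
```

With `partition_registered=False` the directory ends up with nine new files
plus the stale `000009_0`, which matches the leftover file described in the
report; with `True` only the nine new files remain.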