[ https://issues.apache.org/jira/browse/HIVE-24936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Harish JP resolved HIVE-24936.
------------------------------
    Fix Version/s: 4.0.0
 Target Version/s: 4.0.0
       Resolution: Fixed

> Fix file name parsing and copy file move.
> -----------------------------------------
>
>                 Key: HIVE-24936
>                 URL: https://issues.apache.org/jira/browse/HIVE-24936
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>            Reporter: Harish JP
>            Assignee: Harish JP
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The taskId and taskAttemptId are not extracted correctly for copy files
> (e.g. 00001_02_copy_3), and when moving an incompatible copy file, the rename
> utility generates wrong file names. Ex: 00001_02_copy_3 is renamed to
> 00001_02_copy_3_1 if 00001_02_copy_3 already exists; ideally it should be
> renamed to 00001_02_copy_N.
>
> Incompatible files should always be renamed using the current task;
> otherwise they can get deleted if the file name conflicts with another
> task's output file. Ex: if the input file name for a task is 00005_01 and
> the file is incompatible, moving it as-is makes it look like the output of
> task id 5, attempt 1. If that attempt exists, it will try to generate the
> same file, fail, and another attempt will be made. There will then be 2
> files, 00005_01 and 00005_02, and the deduping code will remove 00005_01,
> resulting in data loss. There are other scenarios where the same can happen.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
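To make the naming scheme and the data-loss scenario in the description concrete, here is a minimal Java sketch. Everything in it is hypothetical: the class `CopyFileNames`, its methods, the assumed name shape `<taskId>_<attemptId>[_copy_<n>]`, the choice of the smallest free N when renaming, and the simplified dedup rule (keep only the highest attempt per task id) are illustrative assumptions, not Hive's actual utility code or the committed fix.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating <taskId>_<attemptId>[_copy_<n>] file names. */
public class CopyFileNames {

    // Assumed name shape: digits "_" digits, optionally followed by "_copy_" digits.
    private static final Pattern NAME = Pattern.compile("^(\\d+)_(\\d+)(?:_copy_(\\d+))?$");

    /** Parsed parts of a file name; copyIndex is 0 when there is no _copy_ suffix. */
    public static final class Parsed {
        private final String taskId;
        private final String attemptId;
        private final int copyIndex;

        Parsed(String taskId, String attemptId, int copyIndex) {
            this.taskId = taskId;
            this.attemptId = attemptId;
            this.copyIndex = copyIndex;
        }

        public String taskId() { return taskId; }
        public String attemptId() { return attemptId; }
        public int copyIndex() { return copyIndex; }

        @Override
        public String toString() {
            return "Parsed[taskId=" + taskId + ", attemptId=" + attemptId
                + ", copyIndex=" + copyIndex + "]";
        }
    }

    /** Extracts taskId and attemptId without letting the _copy_N suffix leak into them. */
    public static Parsed parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected file name: " + fileName);
        }
        int copy = m.group(3) == null ? 0 : Integer.parseInt(m.group(3));
        return new Parsed(m.group(1), m.group(2), copy);
    }

    /**
     * Returns fileName if it is free; otherwise <taskId>_<attemptId>_copy_N with the
     * smallest free N >= 1 (an assumed policy), rather than appending a second
     * suffix such as 00001_02_copy_3_1.
     */
    public static String nextFreeName(String fileName, Set<String> existing) {
        if (!existing.contains(fileName)) {
            return fileName;
        }
        Parsed p = parse(fileName);
        String base = p.taskId() + "_" + p.attemptId();
        for (int n = 1; ; n++) {
            String candidate = base + "_copy_" + n;
            if (!existing.contains(candidate)) {
                return candidate;
            }
        }
    }

    /**
     * Simplified, assumed model of attempt de-duplication: for each taskId keep only
     * the file with the highest attemptId. Copy suffixes are not considered here.
     */
    public static List<String> dedupByTask(Collection<String> fileNames) {
        Map<String, String> best = new HashMap<>();
        for (String name : fileNames) {
            Parsed p = parse(name);
            String current = best.get(p.taskId());
            if (current == null
                || Integer.parseInt(parse(current).attemptId()) < Integer.parseInt(p.attemptId())) {
                best.put(p.taskId(), name);
            }
        }
        return new ArrayList<>(best.values());
    }

    public static void main(String[] args) {
        System.out.println(parse("00001_02_copy_3"));
        Set<String> existing = new HashSet<>(Set.of("00001_02_copy_3", "00001_02_copy_1"));
        System.out.println(nextFreeName("00001_02_copy_3", existing));
        // The moved input 00005_01 collides with task 5, attempt 1 and is dropped.
        System.out.println(dedupByTask(List.of("00005_01", "00005_02")));
    }
}
```

Running `main` prints `Parsed[taskId=00001, attemptId=02, copyIndex=3]`, then `00001_02_copy_2`, then `[00005_02]`. The last line shows why an incompatible input file moved under a name like 00005_01 is at risk: under this attempt-based dedup rule it is indistinguishable from a failed attempt's output and gets removed.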