[ 
https://issues.apache.org/jira/browse/HIVE-277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma resolved HIVE-277.
------------------------------------

    Resolution: Duplicate

> Files created by redundant tasks that have been killed are taken into 
> consideration
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-277
>                 URL: https://issues.apache.org/jira/browse/HIVE-277
>             Project: Hadoop Hive
>          Issue Type: Bug
>         Environment: Hive with Hadoop 0.19.0 on a Linux cluster
>            Reporter: Rodrigo Schmidt
>            Priority: Minor
>
> Hadoop starts redundant tasks (mappers, by default) if some particular 
> mappers are taking too long to execute. When one of the redundant tasks 
> finishes, the others are killed. Killed tasks may generate output files 
> (usually empty), and Hive treats them as part of the job output.
> In my case, I'm profiling one of the mappers in an INSERT OVERWRITE TABLE ... 
> SELECT (map-only) query, and the extra time added by the profiler makes 
> Hadoop start a second mapper for the same part of the input. When one of 
> these redundant mappers finishes, the other is killed, and 
> /tmp/hive-xxxx/xxxxxxxxx.10000.insclause-0/ will have the following files:
> _tmp.attempt_XX....XX_XXXX_m_000000_0
> attempt_XX....XX_XXXX_m_000000_0
> attempt_XX....XX_XXXX_m_000000_1
> attempt_XX....XX_XXXX_m_000000_2
> ...
> The first file is empty, but Hive considers it part of the generated 
> output and tries to load it into the destination table, giving the following 
> error message:
> Loading data to table output_table partition {p=p1}
> Failed with exception Cannot load text files into a table stored as 
> SequenceFile.
> FAILED: Execution Error, return code 1 from 
> org.apache.hadoop.hive.ql.exec.MoveTask
> I'm not sure if the files generated by killed tasks will always be empty. If 
> not, this bug might render the data inconsistent.
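
For illustration only, the directory listing described above can be checked with a short script. This is a minimal sketch and not Hive's actual fix: the function name classify_outputs, the regular expression, and the sample file names are all assumptions, based on the "_tmp." prefix and the attempt_..._m_NNNNNN_N naming shown in the report. It separates leftover temporary files from attempt outputs and reports tasks that have more than one attempt file.

```python
import re
from collections import defaultdict

# Assumed naming: attempt_<job-ts>_<job-id>_<m|r>_<task-no>_<attempt-no>
ATTEMPT_RE = re.compile(r"^attempt_(\d+_\d+_[mr]_\d+)_(\d+)$")


def classify_outputs(names):
    """Split a directory listing into temp files and duplicated task outputs.

    Returns (temp_files, duplicated), where duplicated maps a task id to
    the sorted list of attempt files found for it (only tasks with 2+ files).
    """
    # Files left behind by unfinished or killed attempts (assumed prefix).
    temp_files = [n for n in names if n.startswith("_tmp.")]

    # Group finished attempt files by the task they belong to.
    by_task = defaultdict(list)
    for name in names:
        match = ATTEMPT_RE.match(name)
        if match:
            by_task[match.group(1)].append(name)

    duplicated = {t: sorted(f) for t, f in by_task.items() if len(f) > 1}
    return temp_files, duplicated


if __name__ == "__main__":
    # Hypothetical listing; the real names in the report are elided.
    listing = [
        "_tmp.attempt_200901010000_0001_m_000000_0",
        "attempt_200901010000_0001_m_000000_0",
        "attempt_200901010000_0001_m_000000_1",
    ]
    temp, dups = classify_outputs(listing)
    print("temp files:", temp)
    print("tasks with several attempt outputs:", dups)
```

Which attempt file should be kept is deliberately left open here, since the report does not say which attempt succeeded.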

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
