[ 
https://issues.apache.org/jira/browse/HIVE-3587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Krehl updated HIVE-3587:
----------------------------

    Description: 
I'm trying to load a table using an INSERT query (1).  Not all the data is 
making it from the original table into the new table.  The query generates 2 
jobs.  The first job takes about 45 minutes with mapred.mapper.class = 
org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileMergeMapper and the second 
takes ~10 seconds with mapred.mapper.class = 
org.apache.hadoop.hive.ql.exec.ExecMapper.  Within the last 2 minutes of the 
first job a number of IOExceptions are raised (2).  The exceptions are raised 
only in the last mapper task to complete; the other mapper tasks complete 
successfully.  The exceptions indicate that an expected temporary file is 
missing.  The second job completes successfully.  According to the task 
tracker web interface, the jobs run sequentially with no overlap.  
However, the second job spawns a number of tasks which rename the very 
temporary files that are the cause of the failures in the first job (3).

(1) 
https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-InsertingdataintoHiveTablesfromqueries

(2) Example: ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file 
/tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
 File does not exist. Holder DFSClient_NONMAPREDUCE_-672101740_1 does not have 
any open files.

(3) Example: 2012-10-16 15:36:57,605 INFO RCFileMergeMapper: renamed path 
hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_task_tmp.-ext-10000/month=2012-01/_tmp.000011_0
 to 
hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_tmp.-ext-10000/month=2012-01/000011_0
 . File size is 3482

  was:

I'm trying to load a table using an INSERT query [1].  Not all the data is 
making it from the original table into the new table.  The query generates 2 
jobs.  The first job takes about 45 minutes with mapred.mapper.class = 
org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileMergeMapper and the second 
takes ~10 seconds with mapred.mapper.class = 
org.apache.hadoop.hive.ql.exec.ExecMapper.  Within the last 2 minutes of the 
first job a number of IOExceptions are raised [2].  The exceptions are raised 
only in the last mapper task to complete; the other mapper tasks complete 
successfully.  The exceptions indicate that an expected temporary file is 
missing.  The second job completes successfully.  According to the task 
tracker web interface, the jobs run sequentially with no overlap.  
However, the second job spawns a number of tasks which rename the very 
temporary files that are the cause of the failures in the first job [3].

[1] 
https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-InsertingdataintoHiveTablesfromqueries

[2] Example: ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file 
/tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
/tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
 File does not exist. Holder DFSClient_NONMAPREDUCE_-672101740_1 does not have 
any open files.

[3] Example: 2012-10-16 15:36:57,605 INFO RCFileMergeMapper: renamed path 
hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_task_tmp.-ext-10000/month=2012-01/_tmp.000011_0
 to 
hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_tmp.-ext-10000/month=2012-01/000011_0
 . File size is 3482

    
> Lost data during INSERT query
> -----------------------------
>
>                 Key: HIVE-3587
>                 URL: https://issues.apache.org/jira/browse/HIVE-3587
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.9.0
>         Environment: Ubuntu 10.04
> Hadoop MapReduce 0.20.2
> Cloudera 4.1.0
> 3 data/task nodes
>            Reporter: Jim Krehl
>            Priority: Critical
>
> I'm trying to load a table using an INSERT query (1).  Not all the data is 
> making it from the original table into the new table.  The query generates 2 
> jobs.  The first job takes about 45 minutes with mapred.mapper.class = 
> org.apache.hadoop.hive.ql.io.rcfile.merge.RCFileMergeMapper and the second 
> takes ~10 seconds with mapred.mapper.class = 
> org.apache.hadoop.hive.ql.exec.ExecMapper.  Within the last 2 minutes of the 
> first job a number of IOExceptions are raised (2).  The exceptions are raised 
> only in the last mapper task to complete; the other mapper tasks complete 
> successfully.  The exceptions indicate that an expected temporary file is 
> missing.  The second job completes successfully.  According to the task 
> tracker web interface, the jobs run sequentially with no 
> overlap.  However, the second job spawns a number of tasks which rename the 
> very temporary files that are the cause of the failures in the first job (3).
> (1) 
> https://cwiki.apache.org/Hive/languagemanual-dml.html#LanguageManualDML-InsertingdataintoHiveTablesfromqueries
> (2) Example: ERROR org.apache.hadoop.hdfs.DFSClient: Failed to close file 
> /tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
> org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on 
> /tmp/hive-hive/hive_2012-10-15_13-45-21_245_1936216192130095423/_task_tmp.-ext-10002/month=2012-01/_tmp.000000_1
>  File does not exist. Holder DFSClient_NONMAPREDUCE_-672101740_1 does not 
> have any open files.
> (3) Example: 2012-10-16 15:36:57,605 INFO RCFileMergeMapper: renamed path 
> hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_task_tmp.-ext-10000/month=2012-01/_tmp.000011_0
>  to 
> hdfs://analysis-hadoop-master/tmp/hive-hive/hive_2012-10-16_14-48-47_633_7033175453889409541/_tmp.-ext-10000/month=2012-01/000011_0
>  . File size is 3482
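
One way to check the claim that the renamed temporary files are the same ones 
named in the IOExceptions is to cross-reference the two kinds of log lines.  
The following is a minimal sketch, not part of the original report.  It assumes 
each log record sits on a single line (the examples above are wrapped by the 
mail client) and that the two message formats match examples (2) and (3).  The 
function names hdfs_path and renamed_failures are hypothetical.

```python
import re
from urllib.parse import urlparse

# Patterns modelled on examples (2) and (3); real logs may differ.
FAILED = re.compile(r"Failed to close file\s+(\S+)")
RENAMED = re.compile(r"renamed path\s+(\S+)\s+to\s+(\S+)")


def hdfs_path(p):
    """Drop any scheme://host prefix so paths from both messages compare equal."""
    return urlparse(p).path if "://" in p else p


def renamed_failures(lines):
    """Map each path that failed to close to its rename target, if it was renamed."""
    failed = set()
    renamed = {}
    for line in lines:
        m = FAILED.search(line)
        if m:
            failed.add(hdfs_path(m.group(1)))
        m = RENAMED.search(line)
        if m:
            renamed[hdfs_path(m.group(1))] = hdfs_path(m.group(2))
    return {src: renamed[src] for src in failed if src in renamed}
```

Feeding the combined task logs of both jobs to renamed_failures() returns only 
the failing paths that also appear as a rename source.  An empty result would 
suggest the renames and the failures involve different files.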

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
