Harsh J created HIVE-11325:
------------------------------

             Summary: Infinite loop in HiveHFileOutputFormat
                 Key: HIVE-11325
                 URL: https://issues.apache.org/jira/browse/HIVE-11325
             Project: Hive
          Issue Type: Bug
          Components: HBase Handler
    Affects Versions: 1.0.0
            Reporter: Harsh J


No idea why {{hbase_handler_bulk.q}} does not catch this if it's being run 
regularly in Hive builds, but here's the gist of the issue:

The condition at 
https://github.com/apache/hive/blob/master/hbase-handler/src/java/org/apache/hadoop/hive/hbase/HiveHFileOutputFormat.java#L152-L164
 shows that we keep looping until we find a file whose last path 
component (the name) is equal to the column family name.

In execution, however, the iteration becomes an actual infinite loop because 
the file we end up treating as srcDir is actually the region file, whose name 
will never match the family name.
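
A minimal, self-contained sketch of why such a loop cannot terminate, assuming (as the log further down suggests) that listing a regular file returns a single entry for that same file. The class {{FamilyLoopSketch}}, the {{list}} helper, the {{/out}} prefix and the in-memory tree are hypothetical stand-ins, not Hive or Hadoop code, and the starting directory is an assumption. An iteration cap is added only so that the demonstration ends.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FamilyLoopSketch {
    // Hypothetical in-memory stand-in for the file system: directory -> children.
    static final Map<String, List<String>> TREE = new HashMap<>();
    static {
        TREE.put("/out", Collections.singletonList("/out/family"));
        TREE.put("/out/family", Collections.singletonList(
            "/out/family/97112ac1c09548ae87bd85af072d2e8c"));
    }

    // Assumption: listing a regular file yields exactly that file again.
    static List<String> list(String path) {
        List<String> children = TREE.get(path);
        return children != null ? children : Collections.singletonList(path);
    }

    // Last path component, e.g. "family" for "/out/family".
    static String name(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    // Descends while comparing only the entry's own name with the family name.
    // Returns the number of iterations used, or -1 if the cap was reached.
    static int descend(String start, String family, int cap) {
        String srcDir = start;
        for (int i = 1; i <= cap; i++) {
            List<String> files = list(srcDir);
            if (files.size() != 1) {
                throw new IllegalStateException("Expected one entry in " + srcDir);
            }
            srcDir = files.get(0);
            if (name(srcDir).equals(family)) {
                return i;
            }
        }
        return -1; // without the cap this would spin forever
    }

    public static void main(String[] args) {
        // Starting above the family directory: the name check succeeds.
        System.out.println(descend("/out", "family", 1000));
        // Starting at the family directory: only the region file is ever seen.
        System.out.println(descend("/out/family", "family", 1000));
    }
}
```

In this sketch, once {{srcDir}} points at the region file, every further listing returns the same single entry, so the name comparison can never succeed.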

This is an example of the IPC exchange that the listing loop of a task at 
100% progress gets stuck repeating:

{code}
2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
1: Call -> cdh54.vm/172.16.29.132:8020: getListing {src: 
"/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c"
 startAfter: "" needLocation: false}
2015-07-21 10:32:20,662 DEBUG [IPC Parameter Sending Thread #1] 
org.apache.hadoop.ipc.Client: IPC Client (1551465414) connection to 
cdh54.vm/172.16.29.132:8020 from hive sending #510346
2015-07-21 10:32:20,662 DEBUG [IPC Client (1551465414) connection to 
cdh54.vm/172.16.29.132:8020 from hive] org.apache.hadoop.ipc.Client: IPC Client 
(1551465414) connection to cdh54.vm/172.16.29.132:8020 from hive got value 
#510346
2015-07-21 10:32:20,662 DEBUG [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
Call: getListing took 0ms
2015-07-21 10:32:20,662 TRACE [main] org.apache.hadoop.ipc.ProtobufRpcEngine: 
1: Response <- cdh54.vm/172.16.29.132:8020: getListing {dirList { 
partialListing { fileType: IS_FILE path: "" length: 863 permission { perm: 4600 
} owner: "hive" group: "hive" modification_time: 1437454718130 access_time: 
1437454717973 block_replication: 1 blocksize: 134217728 fileId: 33960 
childrenNum: 0 storagePolicy: 0 } remainingEntries: 0 }}
{code}

The path we are getting out of the listing results is 
{{/user/hive/warehouse/hbase_test/_temporary/1/_temporary/attempt_1436935612068_0011_m_000000_0/family/97112ac1c09548ae87bd85af072d2e8c}},
 but instead of checking the path's parent {{family}} we loop infinitely over 
its hashed filename {{97112ac1c09548ae87bd85af072d2e8c}} because it does not 
match {{family}}.
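
To make the difference between the two checks concrete, here is a small sketch that uses {{java.nio.file.Path}} as a stand-in for Hadoop's {{Path}} (an assumption made only to keep the example runnable without Hadoop on the classpath). The class and method names are hypothetical and do not come from the Hive source.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ParentCheckSketch {
    // Compares the entry's own last path component with the family name.
    static boolean nameMatches(Path p, String family) {
        Path name = p.getFileName();
        return name != null && name.toString().equals(family);
    }

    // Compares the last path component of the entry's parent with the family name.
    static boolean parentMatches(Path p, String family) {
        Path parent = p.getParent();
        return parent != null && nameMatches(parent, family);
    }

    public static void main(String[] args) {
        Path regionFile = Paths.get(
            "/user/hive/warehouse/hbase_test/_temporary/1/_temporary/"
            + "attempt_1436935612068_0011_m_000000_0/family/"
            + "97112ac1c09548ae87bd85af072d2e8c");
        System.out.println(nameMatches(regionFile, "family"));
        System.out.println(parentMatches(regionFile, "family"));
    }
}
```

For the path from the listing, the name check fails on the hashed filename while the parent check sees {{family}}.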

It therefore stays in the infinite loop until the MR framework kills it due 
to an idle task timeout (and since the subsequent task attempts then fail 
outright, the job fails).

While checking {{getPath().getParent()}} would resolve that, is the infinite 
loop even necessary, given that we already throw exceptions if there are no 
entries or there is more than one entry?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
