[jira] [Created] (HIVE-17941) Don't Re-Create Running Job Client During Status Checks

BELUGA BEHR (JIRA) Mon, 30 Oct 2017 14:05:18 -0700

BELUGA BEHR created HIVE-17941:
----------------------------------

             Summary: Don't Re-Create Running Job Client During Status Checks
                 Key: HIVE-17941
                 URL: https://issues.apache.org/jira/browse/HIVE-17941
             Project: Hive
          Issue Type: Improvement
          Components: HiveServer2
    Affects Versions: 3.0.0, 2.3.1
            Reporter: BELUGA BEHR



{code:java|title=org.apache.hadoop.hive.ql.exec.mr.HadoopJobExecHelper}
while (!rj.isComplete()) {
  ...
        RunningJob newRj = jc.getJob(rj.getID());
        if (newRj == null) {
          // under exceptional load, hadoop may not be able to look up status
          // of finished jobs (because it has purged them from memory). From
          // hive's perspective - it's equivalent to the job having failed.
          // So raise a meaningful exception
          throw new IOException("Could not find status of job:" + rj.getID());
        } else {
          th.setRunningJob(newRj);
          rj = newRj;
        }
      }
  ...
}
{code}

https://github.com/apache/hive/blob/a9f25c0e7ad3f81a9f00f601947a161516e33f1b/ql/src/java/org/apache/hadoop/hive/ql/exec/mr/HadoopJobExecHelper.java#L295-L306

Every time we loop here for a status update, we rebuild the RunningJob object
to test whether the job information is still loaded in YARN.  Rebuilding this
RunningJob object is not trivial: each call to jc.getJob() constructs a new
JobConf, which re-loads and parses the Job Configuration XML file every time.

{code:java|title=Outdated Stacktrace But Same Idea Holds}
        at java.io.FileInputStream.open(Native Method)
        at java.io.FileInputStream.<init>(FileInputStream.java:120)
        at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1924)
        at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1877)
        at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:1785)
        at org.apache.hadoop.conf.Configuration.get(Configuration.java:712)
        at org.apache.hadoop.mapred.JobConf.checkAndWarnDeprecation(JobConf.java:1951)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:398)
        at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:388)
        at org.apache.hadoop.mapred.JobClient$NetworkedJob.<init>(JobClient.java:174)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:655)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:668)
        at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:282)
        at org.apache.hadoop.hive.ql.exec.HadoopJobExecHelper.progress(HadoopJobExecHelper.java:532)
{code}

Maybe we can use {{isRetired()}} instead for this particular check.  We also
probably need to be better about checking the return value from any of the
{{RunningJob}} methods, since they can fail or go away at any time if YARN
purges the job information.  It seems that the current code was an attempt to
detect a purged job before exercising the {{RunningJob}} object, even though
the object can go bad at any point.

https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/mapred/RunningJob.html
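
A rough sketch of the idea, not a patch.  {{JobHandle}} is a hypothetical stand-in for the two {{RunningJob}} methods involved ({{isComplete()}} and {{isRetired()}}), and {{FakeJob}} and {{pollUntilComplete}} exist only so the example runs without a cluster.  It assumes that {{isRetired()}} reliably reports a purged job, which would need to be verified against YARN before relying on it.

```java
import java.io.IOException;

class StatusPollSketch {

    /** Hypothetical stand-in for the RunningJob methods used in the loop. */
    interface JobHandle {
        String getId();
        boolean isComplete() throws IOException;
        boolean isRetired() throws IOException;
    }

    /**
     * Polls the same handle until the job completes, instead of asking the
     * client for a fresh handle on every iteration. Returns the number of
     * polls that found the job still running.
     */
    static int pollUntilComplete(JobHandle rj, long pullIntervalMs)
            throws IOException, InterruptedException {
        int polls = 0;
        while (!rj.isComplete()) {
            // Assumption: a purged job shows up as "retired" on the existing
            // handle, so no new handle (and no new JobConf) is needed.
            if (rj.isRetired()) {
                throw new IOException("Could not find status of job:" + rj.getId());
            }
            polls++;
            Thread.sleep(pullIntervalMs);
        }
        return polls;
    }

    /** In-memory fake used only to make the sketch runnable. */
    static final class FakeJob implements JobHandle {
        private final String id;
        private final boolean retired;
        private int pollsRemaining;

        FakeJob(String id, int pollsRemaining, boolean retired) {
            this.id = id;
            this.pollsRemaining = pollsRemaining;
            this.retired = retired;
        }

        @Override public String getId() { return id; }

        // Reports "running" for the configured number of calls, then "complete".
        @Override public boolean isComplete() { return pollsRemaining-- <= 0; }

        @Override public boolean isRetired() { return retired; }
    }

    public static void main(String[] args) throws Exception {
        FakeJob job = new FakeJob("job_0001", 3, false);
        int polls = pollUntilComplete(job, 10L);
        System.out.println("polls before completion: " + polls);
    }
}
```

The loop keeps the original error message for the purged-job case, so callers would see the same failure as today, but only one job handle is ever created.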



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
