Chao Sun created HIVE-16984:
-------------------------------

             Summary: HoS: avoid waiting for RemoteSparkJobStatus::getAppID() 
when remote driver died
                 Key: HIVE-16984
                 URL: https://issues.apache.org/jira/browse/HIVE-16984
             Project: Hive
          Issue Type: Bug
          Components: Spark
            Reporter: Chao Sun
            Assignee: Chao Sun


In HoS, after a RemoteDriver is launched, it may fail to initialize a Spark 
context and thus the ApplicationMaster will die eventually. In this case, there 
are two issues related to RemoteSparkJobStatus::getAppID():

1. Currently we call {{getAppID()}} before starting the monitoring job. The 
former waits for {{hive.spark.client.future.timeout}}, and the latter waits 
for {{hive.spark.job.monitor.timeout}}. The error message for the latter 
treats {{hive.spark.job.monitor.timeout}} as the total time spent waiting for 
the job submission. However, this is inaccurate, as it doesn't include 
{{hive.spark.client.future.timeout}}.
2. If the RemoteDriver dies suddenly, we may currently still wait for the full 
timeouts in vain. This could be avoided if we know that the channel between 
the client and the remote driver has closed.
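
To illustrate the second point, here is a minimal sketch of a wait that gives 
up as soon as the channel is known to be closed, instead of sitting out the 
whole timeout. It is not Hive's actual code: the class {{AppIdWaiter}}, the 
method {{waitForAppId}}, and the {{channelActive}} check are hypothetical names 
standing in for whatever liveness signal the client has for the remote driver.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of Hive: waits for an app ID but fails fast
// when the connection to the remote driver is reported as closed.
final class AppIdWaiter {

    private AppIdWaiter() {
    }

    /**
     * Waits for appIdFuture in short slices of pollMs, re-checking
     * channelActive between slices. timeoutMs plays the role of the
     * configured timeout (an assumption made for this sketch).
     */
    static String waitForAppId(CompletableFuture<String> appIdFuture,
                               BooleanSupplier channelActive,
                               long timeoutMs,
                               long pollMs)
            throws InterruptedException, ExecutionException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        long sliceNs = TimeUnit.MILLISECONDS.toNanos(pollMs);
        while (true) {
            // A closed channel with no result yet means the result can
            // never arrive, so stop waiting immediately.
            if (!appIdFuture.isDone() && !channelActive.getAsBoolean()) {
                throw new IllegalStateException(
                    "Channel to remote driver is closed; giving up on app ID");
            }
            long remainingNs = deadline - System.nanoTime();
            if (remainingNs <= 0) {
                throw new TimeoutException(
                    "Timed out after " + timeoutMs + " ms waiting for app ID");
            }
            try {
                return appIdFuture.get(Math.min(sliceNs, remainingNs),
                                       TimeUnit.NANOSECONDS);
            } catch (TimeoutException e) {
                // Slice elapsed without a result; loop to re-check the
                // channel state and the overall deadline.
            }
        }
    }
}
```

A caller would pass the future that yields the app ID together with a check on 
the client-to-driver connection. Polling in slices is only one option; 
completing the future exceptionally from a channel-close callback would avoid 
polling altogether.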



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)