[jira] [Created] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-03 Thread Xu Cang (Jira)
Xu Cang created YARN-10516:
--

 Summary: In HA mode, when one Resource Manager has networking 
issue, getTokenService() should not throw runtime exception
 Key: YARN-10516
 URL: https://issues.apache.org/jira/browse/YARN-10516
 Project: Hadoop YARN
  Issue Type: Improvement
  Components: client
Reporter: Xu Cang


We have observed an issue in the YARN client around this piece of code:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145]

While
{code:java}
services.add(SecurityUtil.buildTokenService(
    yarnConf.getSocketAddr(address, defaultAddr, defaultPort))
    .toString());
{code}
is being called, "yarnConf.getSocketAddr" throws a runtime exception, more 
specifically an UnknownHostException, from here: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466]
This happened while one of the RM hosts had a networking issue and its IP 
could not be resolved.

This runtime exception then propagates all the way up into our application and 
causes MR job submission to fail. 

In my opinion, since we have HA here, the other RMs are still alive and 
available. We should catch this exception in getTokenService() and handle it 
properly. 

I would like to hear your opinion on this. If you agree, I will provide a 
patch. Thank you.
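
A minimal sketch of the kind of handling proposed above, not the attached patch. It uses only the JDK rather than the Hadoop classes, and the class name TokenServiceSketch, the method buildTokenServices(), the host name rm2.example.invalid and port 8032 are all hypothetical. It assumes one token-service string per RM, joined with commas: an RM whose address is unresolved is skipped with a warning, and the call fails only when no RM address can be resolved at all.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

class TokenServiceSketch {

  /**
   * Builds a comma-separated list of "ip:port" entries for the given RM
   * addresses. Unresolved addresses are skipped instead of aborting the
   * whole call; an exception is thrown only if none can be resolved.
   */
  static String buildTokenServices(List<InetSocketAddress> rmAddresses) {
    List<String> services = new ArrayList<>();
    for (InetSocketAddress addr : rmAddresses) {
      if (addr.isUnresolved()) {
        // One RM being unreachable by name should not fail an HA client.
        System.err.println("Skipping unresolved RM address: " + addr);
        continue;
      }
      services.add(addr.getAddress().getHostAddress() + ":" + addr.getPort());
    }
    if (services.isEmpty()) {
      throw new IllegalStateException(
          "None of the configured RM addresses could be resolved");
    }
    return String.join(",", services);
  }

  public static void main(String[] args) {
    List<InetSocketAddress> rms = List.of(
        new InetSocketAddress("127.0.0.1", 8032),
        // Simulates the RM whose host name cannot be resolved.
        InetSocketAddress.createUnresolved("rm2.example.invalid", 8032));
    System.out.println(buildTokenServices(rms)); // prints 127.0.0.1:8032
  }
}
```

Whether to skip silently, log, or still fail when every RM is unresolved is a design choice for the real patch; the sketch only shows that one bad address need not fail the whole submission.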



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-03 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Description: 
We have observed an issue in the YARN client around this piece of code:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145]

While
{code:java}
services.add(SecurityUtil.buildTokenService(
    yarnConf.getSocketAddr(address, defaultAddr, defaultPort))
    .toString());
{code}
is being called, "yarnConf.getSocketAddr" throws a runtime exception, more 
specifically an UnknownHostException, from here: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466]
This happened while one of the RM hosts had a networking issue and its IP 
could not be resolved.

This runtime exception then propagates all the way up to our application and 
causes MR job submission to fail. 

In my opinion, since we have HA here, the other RMs are still alive and 
available. We should catch this exception in getTokenService() and handle it 
properly, instead of failing the whole action. 

I would like to hear your opinion on this. If you agree, I will provide a 
patch. Thank you.



> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
>



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-03 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Description: 
We have observed an issue in the YARN client around this piece of code:

[https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-common/src/main/java/org/apache/hadoop/yarn/client/ClientRMProxy.java#L145]

While
{code:java}
services.add(SecurityUtil.buildTokenService(
    yarnConf.getSocketAddr(address, defaultAddr, defaultPort))
    .toString());
{code}
is being called, buildTokenService() fails and throws a runtime exception, 
more specifically an UnknownHostException, from here: 
[https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/security/SecurityUtil.java#L466]
This happens *while one of the RM hosts is having a networking issue* and its 
IP cannot be resolved.

This runtime exception then propagates all the way up to our application and 
causes MR job submission to fail. 

In my opinion, since we have HA here, the other RMs are still alive and 
available. We should catch this exception in getTokenService() and handle it 
properly, instead of failing the whole action. 

I would like to hear your opinion on this. If you agree, I will provide a 
patch. Thank you.



> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
>



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-09 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Attachment: (was: YARN-10516.001.patch)

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch
>
>



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-09 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Attachment: YARN-10516.001.patch

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch
>
>



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-09 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Attachment: YARN-10516.002.patch

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch
>
>



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-09 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Attachment: YARN-10516.003.patch

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch, 
> YARN-10516.003.patch
>
>



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-09 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Attachment: YARN-10516.004.patch

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch, 
> YARN-10516.003.patch, YARN-10516.004.patch
>
>



[jira] [Updated] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-10 Thread Xu Cang (Jira)


 [ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xu Cang updated YARN-10516:
---
Attachment: YARN-10516.007.patch

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch, 
> YARN-10516.003.patch, YARN-10516.004.patch, YARN-10516.007.patch
>
>



[jira] [Commented] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-10 Thread Xu Cang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247393#comment-17247393
 ] 

Xu Cang commented on YARN-10516:


[~epayne] [~hexiaoqiao] Would you please review this Jira and patch? Thanks!

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch, 
> YARN-10516.003.patch, YARN-10516.004.patch, YARN-10516.007.patch
>
>



[jira] [Comment Edited] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-10 Thread Xu Cang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247393#comment-17247393
 ] 

Xu Cang edited comment on YARN-10516 at 12/10/20, 6:05 PM:
---

[~epayne] [~hexiaoqiao] [~Jim_Brennan] Would you please review this Jira and 
patch? Thanks!


was (Author: xucang):
[~epayne] [~hexiaoqiao] would you please review this Jira and patch? thanks!

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch, 
> YARN-10516.003.patch, YARN-10516.004.patch, YARN-10516.007.patch
>
>



[jira] [Comment Edited] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-10 Thread Xu Cang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247393#comment-17247393
 ] 

Xu Cang edited comment on YARN-10516 at 12/10/20, 11:50 PM:


[~shahrs87] [~epayne] [~hexiaoqiao] [~Jim_Brennan] Would you please review 
this Jira and patch? Thanks!


was (Author: xucang):
[~epayne] [~hexiaoqiao] [~Jim_Brennan]  would you please review this Jira and 
patch? thanks!

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch, 
> YARN-10516.003.patch, YARN-10516.004.patch, YARN-10516.007.patch
>
>



[jira] [Commented] (YARN-10516) In HA mode, when one Resource Manager has networking issue, getTokenService() should not throw runtime exception

2020-12-22 Thread Xu Cang (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-10516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253675#comment-17253675
 ] 

Xu Cang commented on YARN-10516:


[~epayne] [~hexiaoqiao] [~Jim_Brennan]

Hi, I would love to get a review on this. Thank you.

> In HA mode, when one Resource Manager has networking issue, getTokenService() 
> should not throw runtime exception
> 
>
> Key: YARN-10516
> URL: https://issues.apache.org/jira/browse/YARN-10516
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: client
>Reporter: Xu Cang
>Priority: Minor
> Attachments: YARN-10516.001.patch, YARN-10516.002.patch, 
> YARN-10516.003.patch, YARN-10516.004.patch, YARN-10516.007.patch
>
>