[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2024-01-04 Thread Shilun Fan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17802749#comment-17802749
 ] 

Shilun Fan commented on YARN-4971:
--

Bulk update: moved all 3.4.0 non-blocker issues, please move back if it is a 
blocker. Retarget 3.5.0.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-4971.1.patch
>
>
> If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first 
> time the service becomes active binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers the service only binds to one of the ip addresses.
> There is a difference between the services inside the RM: it only seem to 
> happen for the services listening on ports: 8030 and 8032



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2020-09-11 Thread Wangda Tan (Jira)


[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17194378#comment-17194378
 ] 

Wangda Tan commented on YARN-4971:
--

I think we should revisit the patch based on comment from Karthik: 
https://issues.apache.org/jira/browse/YARN-4971?focusedCommentId=15281097=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15281097

I also don't quite understand why the following methods of ClientRMService are 
different: 

One is:
{code:java}
  InetSocketAddress getBindAddress(Configuration conf) {
return conf.getSocketAddr(
YarnConfiguration.RM_BIND_HOST,
YarnConfiguration.RM_ADDRESS,
YarnConfiguration.DEFAULT_RM_ADDRESS,
YarnConfiguration.DEFAULT_RM_PORT);
  } {code}
 

And another one is: 
{code:java}
 clientBindAddress = conf.updateConnectAddr(YarnConfiguration.RM_BIND_HOST,
   YarnConfiguration.RM_ADDRESS,
   
YarnConfiguration.DEFAULT_RM_ADDRESS,
   
server.getListenerAddress());{code}
 

Basically, in serviceInit and serviceStart, how to get RM address is different. 
Is that a potential root cause of the problem? [~wilfreds], [~shuzirra]

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>Priority: Major
> Attachments: YARN-4971.1.patch
>
>
> If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first 
> time the service becomes active binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers the service only binds to one of the ip addresses.
> There is a difference between the services inside the RM: it only seem to 
> happen for the services listening on ports: 8030 and 8032



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2016-05-11 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15281097#comment-15281097
 ] 

Karthik Kambatla commented on YARN-4971:


I must be missing something, but can't figure out why not setting the variable 
helps here. If I understand the code correctly, the individual variables 
{{clientBindAddress}} and {{masterServiceAddress}} are used only in tests and 
the one other place in {{DelegationTokenRenewer}} that Daniel pointed out. 

Both ClientRMService and ApplicationMasterService are part of RMActiveServices. 
On transition to standby, both services are inited again to be started when the 
RM transitions back to active. This code path, in theory at least, shouldn't be 
different from the first time around. 

Am I missing something or misreading the code? 

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-4971.1.patch
>
>
> If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first 
> time the service becomes active binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers the service only binds to one of the ip addresses.
> There is a difference between the services inside the RM: it only seem to 
> happen for the services listening on ports: 8030 and 8032



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: yarn-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-issues-h...@hadoop.apache.org



[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2016-04-26 Thread Daniel Templeton (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258327#comment-15258327
 ] 

Daniel Templeton commented on YARN-4971:


I agree on [~rchiang]'s test suggestion.

I'm surprised the patch doesn't break any tests since it changes the behavior 
of the {{getBindAddress()}} method, which is used for testing.

I suspect the patch may break with security enabled because the 
{{DelegationTokenRenewer.setLocalSecretManagerAndServiceAddr()}} method relies 
on {{ClientRMService.getBindAddress()}}.  Security code generally doesn't like 
non-specific addresses.  It would be good to do a thorough test with Kerberos 
enabled to verify.  I don't know what operation uses 
{{DelegationTokenRenewer.setLocalSecretManagerAndServiceAddr()}}, but it 
shouldn't be too hard to figure out.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-4971.1.patch
>
>
> If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first 
> time the service becomes active binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers the service only binds to one of the ip addresses.
> There is a difference between the services inside the RM: it only seem to 
> happen for the services listening on ports: 8030 and 8032



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2016-04-22 Thread Ray Chiang (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15254507#comment-15254507
 ] 

Ray Chiang commented on YARN-4971:
--

+1 (nonbinding).  The only new test I can think of would be to verify that the 
member variable address stays at 0.0.0.0 if it's initially 0.0.0.0--mainly 
useful as a "spec" for the class behavior.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 2.7.2
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
> Attachments: YARN-4971.1.patch
>
>
> If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first 
> time the service becomes active binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers the service only binds to one of the ip addresses.
> There is a difference between the services inside the RM: it only seem to 
> happen for the services listening on ports: 8030 and 8032



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (YARN-4971) RM fails to re-bind to wildcard IP after failover in multi homed clusters

2016-04-20 Thread Wilfred Spiegelenburg (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-4971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249961#comment-15249961
 ] 

Wilfred Spiegelenburg commented on YARN-4971:
-

During the service init the service bind address is calculated based on all the 
settings and stored as an InetSocketAddress in a local variable. In the startup 
the server is created using that socket address. This all works.
In the {{ApplicationMasterService}} and the {{ClientRMService}} after the 
service start the bind address is updated as part of the config update. This 
update does not happen in the other services. The update is the return value of 
{{updateConnectAddr()}}. This method in its javadoc shows the following text:
bq. The wildcard address is replaced with the local host's address.
Since we store this return value in the bind address we break binding to the 
wildcard address in later cycles.

> RM fails to re-bind to wildcard IP after failover in multi homed clusters
> -
>
> Key: YARN-4971
> URL: https://issues.apache.org/jira/browse/YARN-4971
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: resourcemanager
>Affects Versions: 3.0.0
>Reporter: Wilfred Spiegelenburg
>Assignee: Wilfred Spiegelenburg
>
> If the RM has the {{yarn.resourcemanager.bind-host}} set to 0.0.0.0 the first 
> time the service becomes active binding to the wildcard works as expected. If 
> the service has transitioned from active to standby and then becomes active 
> again after failovers the service only binds to one of the ip addresses.
> There is a difference between the services inside the RM: it only seem to 
> happen for the services listening on ports: 8030 and 8032



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)