[ 
https://issues.apache.org/jira/browse/YARN-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246125#comment-14246125
 ] 

Steve Loughran commented on YARN-2710:
--------------------------------------

Immediate failure trigger is that the retry handler says "Not retrying because 
failovers (5) exceeded maximum allowed (5)"

If you look at the lines immediately above it, the RM is still bootstrapping 
hence not listening for connections.

The test is probably just starting too early. Recommend changing the 
failure/retry policy to to add some backoff & maybe more retries

{code}
014-12-14 11:40:06,693 INFO  [Thread-442] resourcemanager.RMAuditLogger 
(RMAuditLogger.java:logSuccess(165)) - USER=jenkins     
OPERATION=refreshSuperUserGroupsConfiguration   TARGET=AdminService     
RESULT=SUCCESS
2014-12-14 11:40:06,693 INFO  [Thread-442] conf.Configuration 
(Configuration.java:getConfResourceAsInputStream(2240)) - found resource 
core-site.xml at 
file:/home/jenkins/jenkins-slave/workspace/Hadoop-Yarn-trunk-Java8/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-client/target/test-classes/core-site.xml
2014-12-14 11:40:06,694 INFO  [Thread-442] security.Groups 
(Groups.java:refresh(248)) - clearing userToGroupsMap cache
2014-12-14 11:40:06,694 INFO  [Thread-442] resourcemanager.RMAuditLogger 
(RMAuditLogger.java:logSuccess(165)) - USER=jenkins    
OPERATION=refreshUserToGroupsMappings   TARGET=AdminService     RESULT=SUCCESS
2014-12-14 11:40:06,694 INFO  [Thread-442] resourcemanager.RMAuditLogger 
(RMAuditLogger.java:logSuccess(165)) - USER=jenkins    
OPERATION=transitionToActive    TARGET=RMHAProtocolService      RESULT=SUCCESS
2014-12-14 11:40:07,539 INFO  [Thread-443] 
client.ConfiguredRMFailoverProxyProvider 
(ConfiguredRMFailoverProxyProvider.java:performFailover(100)) - Failing over to 
rm2
2014-12-14 11:40:07,541 WARN  [Thread-443] retry.RetryInvocationHandler 
(RetryInvocationHandler.java:invoke(117)) - Exception while invoking class 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers
 over rm2. Not retrying because failovers (5) exceeded maximum allowed (5)
java.net.ConnectException: Call From asf908.gq1.ygridcore.net/67.195.81.152 to 
asf908.gq1.ygridcore.net:28032 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
        at org.apache.hadoop.ipc.Client.call(Client.java:1472)
        at org.apache.hadoop.ipc.Client.call(Client.java:1399)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
        at com.sun.proxy.$Proxy16.getContainers(Unknown Source)
        at 
org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getContainers(ApplicationClientProtocolPBClientImpl.java:410)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
        at com.sun.proxy.$Proxy17.getContainers(Unknown Source)
        at 
org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getContainers(YarnClientImpl.java:654)
        at 
org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetContainersOnHA(TestApplicationClientProtocolOnHA.java:155)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
        at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
        at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
        at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
        at 
org.apache.zookeeper.JUnit4ZKTestRunner$LoggedInvokeMethod.evaluate(JUnit4ZKTestRunner.java:52)
        at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)
Caused by: java.net.ConnectException: Connection refused
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:712)
        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
        at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
        at org.apache.hadoop.ipc.Client.call(Client.java:1438)
        ... 23 more
{code}

> RM HA tests failed intermittently on trunk
> ------------------------------------------
>
>                 Key: YARN-2710
>                 URL: https://issues.apache.org/jira/browse/YARN-2710
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client
>            Reporter: Wangda Tan
>         Attachments: TestResourceTrackerOnHA-output.2.txt, 
> org.apache.hadoop.yarn.client.TestResourceTrackerOnHA-output.txt
>
>
> Failure like, it can be happened in TestApplicationClientProtocolOnHA, 
> TestResourceTrackerOnHA, etc.
> {code}
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA
> testGetApplicationAttemptsOnHA(org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA)
>   Time elapsed: 9.491 sec  <<< ERROR!
> java.net.ConnectException: Call From asf905.gq1.ygridcore.net/67.195.81.149 
> to asf905.gq1.ygridcore.net:28032 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>       at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>       at com.sun.proxy.$Proxy17.getApplicationAttempts(Unknown Source)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationAttempts(ApplicationClientProtocolPBClientImpl.java:372)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:101)
>       at com.sun.proxy.$Proxy18.getApplicationAttempts(Unknown Source)
>       at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationAttempts(YarnClientImpl.java:583)
>       at 
> org.apache.hadoop.yarn.client.TestApplicationClientProtocolOnHA.testGetApplicationAttemptsOnHA(TestApplicationClientProtocolOnHA.java:137)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to