[ 
https://issues.apache.org/jira/browse/STORM-2706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-2706.
----------------------------------------
       Resolution: Fixed
    Fix Version/s: 1.1.2
                   1.2.0

Thanks [~Srdo],

I merged this into branch-1.x and branch-1.1.x.  It didn't apply cleanly to 
1.0.x so I didn't pull it in there.

> Nimbus stuck in exception and does not fail fast
> ------------------------------------------------
>
>                 Key: STORM-2706
>                 URL: https://issues.apache.org/jira/browse/STORM-2706
>             Project: Apache Storm
>          Issue Type: Bug
>    Affects Versions: 1.1.1
>            Reporter: Bijan Fahimi Shemrani
>            Assignee: Stig Rohde Døssing
>              Labels: nimbus, pull-request-available
>             Fix For: 2.0.0, 1.2.0, 1.1.2
>
>          Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> We are seeing a problem in Nimbus that leaves it stuck in a retry-and-fail 
> loop. When I manually restart Nimbus, it works again as expected. 
> However, it would be great if Nimbus shut down in this situation so that our 
> monitoring can restart it automatically. 
> The Nimbus log:
> {noformat}
> 24.8.2017 15:39:19 13:39:19.804 [pool-13-thread-51] ERROR 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - 
> Unexpected throwable while invoking!
> 24.8.2017 15:39:19 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException:
>  KeeperErrorCode = NoNode for /storm/leader-lock
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown 
> Source) ~[?:?]
> 24.8.2017 15:39:19    at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
> 24.8.2017 15:39:19    at java.lang.reflect.Method.invoke(Method.java:498) 
> ~[?:1.8.0_131]
> 24.8.2017 15:39:19    at 
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19    at 
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19    at 
> org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown 
> Source) ~[?:?]
> 24.8.2017 15:39:19    at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
> 24.8.2017 15:39:19    at java.lang.reflect.Method.invoke(Method.java:498) 
> ~[?:1.8.0_131]
> 24.8.2017 15:39:19    at 
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19    at 
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:19    at 
> org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getLeader(nimbus.clj:2412)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3944)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.generated.Nimbus$Processor$getLeader.getResult(Nimbus.java:3928)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:19    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_131]
> 24.8.2017 15:39:19    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_131]
> 24.8.2017 15:39:19    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 24.8.2017 15:39:27 13:39:27.205 [pool-13-thread-52] ERROR 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer - 
> Unexpected throwable while invoking!
> 24.8.2017 15:39:27 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException$NoNodeException:
>  KeeperErrorCode = NoNode for /storm/leader-lock
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1590)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:230)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:219)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.RetryLoop.callWithRetry(RetryLoop.java:109)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:216)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:207)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:40)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getSortedChildren(LockInternals.java:151)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.locks.LockInternals.getParticipantNodes(LockInternals.java:133)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.shade.org.apache.curator.framework.recipes.leader.LeaderLatch.getLeader(LeaderLatch.java:453)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown 
> Source) ~[?:?]
> 24.8.2017 15:39:27    at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
> 24.8.2017 15:39:27    at java.lang.reflect.Method.invoke(Method.java:498) 
> ~[?:1.8.0_131]
> 24.8.2017 15:39:27    at 
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27    at 
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27    at 
> org.apache.storm.zookeeper$zk_leader_elector$reify__1043.getLeader(zookeeper.clj:296)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown 
> Source) ~[?:?]
> 24.8.2017 15:39:27    at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  ~[?:1.8.0_131]
> 24.8.2017 15:39:27    at java.lang.reflect.Method.invoke(Method.java:498) 
> ~[?:1.8.0_131]
> 24.8.2017 15:39:27    at 
> clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27    at 
> clojure.lang.Reflector.invokeNoArgInstanceMember(Reflector.java:313) 
> ~[clojure-1.7.0.jar:?]
> 24.8.2017 15:39:27    at 
> org.apache.storm.daemon.nimbus$get_cluster_info.invoke(nimbus.clj:1544) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.daemon.nimbus$mk_reified_nimbus$reify__10780.getClusterInfo(nimbus.clj:2006)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3920)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.generated.Nimbus$Processor$getClusterInfo.getResult(Nimbus.java:3904)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.thrift.ProcessFunction.process(ProcessFunction.java:39) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.security.auth.SimpleTransportPlugin$SimpleWrapProcessor.process(SimpleTransportPlugin.java:162)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.thrift.server.AbstractNonblockingServer$FrameBuffer.invoke(AbstractNonblockingServer.java:518)
>  ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> org.apache.storm.thrift.server.Invocation.run(Invocation.java:18) 
> ~[storm-core-1.1.1.jar:1.1.1]
> 24.8.2017 15:39:27    at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [?:1.8.0_131]
> 24.8.2017 15:39:27    at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [?:1.8.0_131]
> 24.8.2017 15:39:27    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
> 24.8.2017 15:39:29 13:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping assignments
> 24.8.2017 15:39:29 13:39:29.270 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping cleanup
> 24.8.2017 15:39:39 13:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping assignments
> 24.8.2017 15:39:39 13:39:39.270 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping cleanup
> 24.8.2017 15:39:49 13:39:49.271 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping assignments
> 24.8.2017 15:39:49 13:39:49.272 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping cleanup
> 24.8.2017 15:39:59 13:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping assignments
> 24.8.2017 15:39:59 13:39:59.272 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping cleanup
> 24.8.2017 15:40:09 13:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping assignments
> 24.8.2017 15:40:09 13:40:09.272 [timer] INFO  org.apache.storm.daemon.nimbus - 
> not a leader, skipping cleanup
> 24.8.2017 15:40:13 13:40:13.806 [timer] INFO  
> org.apache.storm.shade.org.apache.curator.framework.imps.CuratorFrameworkImpl 
> - Starting
> 24.8.2017 15:40:13 13:40:13.807 [timer] INFO  
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Initiating client 
> connection, connectString=zookeeper:2181/storm sessionTimeout=20000 
> watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@f90354
> 24.8.2017 15:40:13 13:40:13.808 [timer-SendThread(10.42.174.214:2181)] INFO  
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Opening socket 
> connection to server 10.42.174.214/10.42.174.214:2181. Will not attempt to 
> authenticate using SASL (unknown error)
> 24.8.2017 15:40:13 13:40:13.862 [timer-SendThread(10.42.174.214:2181)] INFO  
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Socket connection 
> established to 10.42.174.214/10.42.174.214:2181, initiating session
> 24.8.2017 15:40:13 13:40:13.865 [timer-SendThread(10.42.174.214:2181)] INFO  
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - Session 
> establishment complete on server 10.42.174.214/10.42.174.214:2181, sessionid 
> = 0x15e14456dc70045, negotiated timeout = 20000
> 24.8.2017 15:40:13 13:40:13.910 [timer] INFO  
> org.apache.storm.shade.org.apache.zookeeper.ZooKeeper - Session: 
> 0x15e14456dc70045 closed
> 24.8.2017 15:40:13 13:40:13.910 [timer-EventThread] INFO  
> org.apache.storm.shade.org.apache.zookeeper.ClientCnxn - EventThread shut down
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
