Are you running nimbus and the Storm supervisors in the background? It looks like you are SSHing into the machines and running ./bin/storm nimbus in the foreground, which gets killed (by the hangup signal) when you exit the SSH session. Make sure you use supervisord (http://supervisord.org/) to run nimbus and the supervisors.
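As a rough starting point, here is a minimal sketch that generates supervisord `[program:...]` sections for the Storm daemons using Python's standard `configparser`. Everything installation-specific in it is an assumption, not something from this thread: the install path `/opt/storm`, the log directory `/var/log/storm`, the `storm` user and the program names (`storm-nimbus`, `storm-ui`, `storm-supervisor`) are hypothetical placeholders. The settings themselves (`command`, `directory`, `user`, `autostart`, `autorestart`, `redirect_stderr`, `stdout_logfile`) are standard supervisord program options.

```python
"""Generate supervisord [program:...] sections for Storm daemons.

Hypothetical sketch: STORM_HOME, LOG_DIR, the 'storm' user and the
program names are assumptions; adjust them to your installation.
"""
import configparser
import io

STORM_HOME = "/opt/storm"      # assumed install location (hypothetical)
LOG_DIR = "/var/log/storm"     # assumed log directory (hypothetical)


def storm_program_config(daemons, user="storm"):
    """Return supervisord INI text with one [program:storm-<daemon>] per daemon."""
    # interpolation=None so that '%' in values is written literally.
    config = configparser.ConfigParser(interpolation=None)
    for daemon in daemons:
        config[f"program:storm-{daemon}"] = {
            # supervisord expects the program to stay in the foreground and
            # keeps it as its own child, so no nohup or trailing '&' here.
            "command": f"{STORM_HOME}/bin/storm {daemon}",
            "directory": STORM_HOME,
            "user": user,
            # Start with supervisord, and restart if the daemon exits.
            "autostart": "true",
            "autorestart": "true",
            # Send stderr to the same file as stdout.
            "redirect_stderr": "true",
            "stdout_logfile": f"{LOG_DIR}/{daemon}.out",
        }
    buf = io.StringIO()
    config.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Node 1 in this cluster runs nimbus and the UI.
    print(storm_program_config(["nimbus", "ui"]))
```

On nodes 2 and 3 you would pass `["supervisor"]` instead. Append the output to a file supervisord reads (for example one pulled in by an `[include]` section in `supervisord.conf`), then run `supervisorctl reread` followed by `supervisorctl update`. The daemons then become children of supervisord rather than of your SSH session, so logging out no longer takes them down, and `autorestart` brings them back if they do exit.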
On Sat, Feb 7, 2015, at 11:04 AM, Vineet Mishra wrote:
>
> Including Subject!
>
> On Sun, Feb 8, 2015 at 12:33 AM, Vineet Mishra
> <clearmido...@gmail.com> wrote:
>> Hi All,
>>
>> I am running a Kafka Storm topology in distributed mode. It runs fine
>> at first: I start the cluster (3 nodes), deploy the Storm topology and
>> leave it to run. Often, however, the whole cluster goes down (nimbus,
>> supervisor, workers), and most of the time this happens when I submit
>> the topology to run and then disconnect my session from the machine.
>>
>> I noticed that one of the worker nodes throws the error:
>>
>> java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>>
>> The config and detailed stack trace are provided below.
>>
>> Node 1 - Nimbus, UI
>> Node 2 - Supervisor, Worker
>> Node 3 - Supervisor, Worker
>>
>> 2015-02-07T23:01:25.884+0530 b.s.d.worker [INFO] Shutting down worker KafkaConsumerTopologyy-1-1423329275 9d98d0b4-1bb4-42e9-9a72-a67b82c64b2c 6703
>> 2015-02-07T23:01:25.884+0530 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-ip-20-0-0-78/20.0.0.78:6703
>> 2015-02-07T23:01:25.885+0530 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-ip-20-0-0-78/20.0.0.78:6703..., timeout: 600000ms, pendings: 0
>> 2015-02-07T23:01:25.886+0530 b.s.d.worker [INFO] Shutting down receive thread
>> 2015-02-07T23:01:25.886+0530 o.a.s.c.r.ExponentialBackoffRetry [WARN] maxRetries too large (300). Pinning to 29
>> 2015-02-07T23:01:25.886+0530 b.s.u.StormBoundedExponentialBackoffRetry [INFO] The baseSleepTimeMs [100] the maxSleepTimeMs [1000] the maxRetries [300]
>> 2015-02-07T23:01:25.887+0530 b.s.m.n.Client [INFO] New Netty Client, connect to localhost, 6703, config: , buffer_size: 5242880
>> 2015-02-07T23:01:25.887+0530 b.s.m.n.Client [INFO] Reconnect started for Netty-Client-localhost/127.0.0.1:6703... [0]
>> 2015-02-07T23:01:25.887+0530 b.s.m.loader [INFO] Shutting down receiving-thread: [KafkaConsumerTopologyy-1-1423329275, 6703]
>> 2015-02-07T23:01:25.893+0530 b.s.m.n.Client [INFO] connection established to a remote host Netty-Client-localhost/127.0.0.1:6703, [id: 0x8f71aaa0, /127.0.0.1:59427 => localhost/127.0.0.1:6703]
>> 2015-02-07T23:01:25.893+0530 b.s.m.n.Client [INFO] Closing Netty Client Netty-Client-localhost/127.0.0.1:6703
>> 2015-02-07T23:01:25.893+0530 b.s.m.n.Client [INFO] Waiting for pending batchs to be sent with Netty-Client-localhost/127.0.0.1:6703..., timeout: 600000ms, pendings: 0
>> 2015-02-07T23:01:25.894+0530 b.s.m.loader [INFO] Waiting for receiving-thread:[KafkaConsumerTopologyy-1-1423329275, 6703] to die
>> 2015-02-07T23:01:25.895+0530 b.s.m.loader [INFO] Shutdown receiving-thread: [KafkaConsumerTopologyy-1-1423329275, 6703]
>> 2015-02-07T23:01:25.895+0530 b.s.d.worker [INFO] Shut down receive thread
>> 2015-02-07T23:01:25.895+0530 b.s.d.worker [INFO] Terminating messaging context
>> 2015-02-07T23:01:25.895+0530 b.s.d.worker [INFO] Shutting down executors
>> 2015-02-07T23:01:25.895+0530 b.s.d.executor [INFO] Shutting down executor KafkaSpout:[3 3]
>> 2015-02-07T23:01:25.896+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.896+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.897+0530 b.s.util [ERROR] Async loop died!
>> java.lang.RuntimeException: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>>     at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:128) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:99) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:80) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.disruptor$consume_loop_STAR_$fn__1460.invoke(disruptor.clj:94) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.util$async_loop$fn__464.invoke(util.clj:463) ~[storm-core-0.9.3.jar:0.9.3]
>>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
>> Caused by: java.lang.RuntimeException: Client is being closed, and does not take requests any more
>>     at backtype.storm.messaging.netty.Client.send(Client.java:185) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.utils.TransferDrainer.send(TransferDrainer.java:54) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730$fn__3731.invoke(worker.clj:330) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.daemon.worker$mk_transfer_tuples_handler$fn__3730.invoke(worker.clj:328) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.disruptor$clojure_handler$reify__1447.onEvent(disruptor.clj:58) ~[storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:125) ~[storm-core-0.9.3.jar:0.9.3]
>>     ... 6 common frames omitted
>> 2015-02-07T23:01:25.900+0530 o.a.z.ZooKeeper [INFO] Session: 0x14b58cc25de0115 closed
>> 2015-02-07T23:01:25.900+0530 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2015-02-07T23:01:25.900+0530 b.s.d.executor [INFO] Shut down executor KafkaSpout:[3 3]
>> 2015-02-07T23:01:25.901+0530 b.s.d.executor [INFO] Shutting down executor KafkaSpout:[5 5]
>> 2015-02-07T23:01:25.901+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.901+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.903+0530 o.a.z.ZooKeeper [INFO] Session: 0x14b58cc25de0117 closed
>> 2015-02-07T23:01:25.903+0530 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2015-02-07T23:01:25.903+0530 b.s.d.executor [INFO] Shut down executor KafkaSpout:[5 5]
>> 2015-02-07T23:01:25.903+0530 b.s.d.executor [INFO] Shutting down executor KafkaSpout:[7 7]
>> 2015-02-07T23:01:25.904+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.904+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.905+0530 o.a.z.ZooKeeper [INFO] Session: 0x14b58cc25de0114 closed
>> 2015-02-07T23:01:25.905+0530 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2015-02-07T23:01:25.906+0530 b.s.d.executor [INFO] Shut down executor KafkaSpout:[7 7]
>> 2015-02-07T23:01:25.906+0530 b.s.d.executor [INFO] Shutting down executor KafkaSpout:[9 9]
>> 2015-02-07T23:01:25.906+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.906+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.906+0530 b.s.util [ERROR] Halting process: ("Async loop died!")
>> java.lang.RuntimeException: ("Async loop died!")
>>     at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.3.jar:0.9.3]
>>     at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
>>     at backtype.storm.disruptor$consume_loop_STAR_$fn__1458.invoke(disruptor.clj:92) [storm-core-0.9.3.jar:0.9.3]
>>     at backtype.storm.util$async_loop$fn__464.invoke(util.clj:473) [storm-core-0.9.3.jar:0.9.3]
>>     at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
>>     at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
>> 2015-02-07T23:01:25.908+0530 o.a.z.ZooKeeper [INFO] Session: 0x14b58cc25de0116 closed
>> 2015-02-07T23:01:25.908+0530 o.a.z.ClientCnxn [INFO] EventThread shut down
>> 2015-02-07T23:01:25.908+0530 b.s.d.executor [INFO] Shut down executor KafkaSpout:[9 9]
>> 2015-02-07T23:01:25.909+0530 b.s.d.executor [INFO] Shutting down executor __acker:[11 11]
>> 2015-02-07T23:01:25.909+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.909+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.909+0530 b.s.d.executor [INFO] Shut down executor __acker:[11 11]
>> 2015-02-07T23:01:25.909+0530 b.s.d.executor [INFO] Shutting down executor __system:[-1 -1]
>> 2015-02-07T23:01:25.910+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.910+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.910+0530 b.s.d.executor [INFO] Shut down executor __system:[-1 -1]
>> 2015-02-07T23:01:25.910+0530 b.s.d.executor [INFO] Shutting down executor FileBolt:[1 1]
>> 2015-02-07T23:01:25.910+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.910+0530 b.s.util [INFO] Async loop interrupted!
>> 2015-02-07T23:01:25.911+0530 b.s.d.executor [INFO] Shut down executor FileBolt:[1 1]
>> 2015-02-07T23:01:25.911+0530 b.s.d.worker [INFO] Shut down executors
>> 2015-02-07T23:01:25.916+0530 b.s.d.worker [INFO] Shutting down transfer thread
>>
>> URGENT CALL!
>>
>> Thanks!
>