RE: worker dies after a few minutes

Eric Ruel Thu, 23 Jul 2015 10:58:53 -0700


Originally, we had multiple topologies, each running in a single worker, with a 
maxSpoutPending of 3, a parallelism hint between 1 and 20 depending on the bolt, 
a batch size of 300 and a maxTask of 90.
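Those original settings can be pictured as a topology configuration. This is a minimal sketch that assumes "maxTask" means `topology.max.task.parallelism` (the message does not name the exact setting) and treats the batch size of 300 as a spout-level value, since Storm has no standard batch-size key. It uses a plain map so it compiles without storm-core. The per-bolt parallelism hints (1 to 20) are not configuration keys at all: in Trident they are passed to `Stream.parallelismHint(int)`.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the original per-topology settings, keyed by Storm configuration names.
// Assumption: "maxTask" is read as topology.max.task.parallelism.
class OriginalSettings {
    // Hypothetical: the batch size belongs to the Trident spout, not to the topology conf.
    static final int SPOUT_BATCH_SIZE = 300;

    static Map<String, Object> topologyConf() {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.workers", 1);               // each topology runs in a single worker
        conf.put("topology.max.spout.pending", 3);     // in Trident: at most 3 batches in flight
        conf.put("topology.max.task.parallelism", 90); // assumed meaning of "maxTask"
        return conf;
    }
}
```

With storm-core 0.9.x on the classpath, `backtype.storm.Config` (itself a `HashMap<String, Object>`) sets the same three keys through `setNumWorkers`, `setMaxSpoutPending` and `setMaxTaskParallelism`.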



I started by reducing the batch size to 2 records and saw that my workers always 
died after about 45 seconds, so I ran many trials, changing the values of all 
the parameters.


I still don't fully understand the difference between the maxTask and the 
parallelism hint.
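On the maxTask versus parallelism hint question, our understanding of Storm 0.9.x is: the parallelism hint sets the number of executors (threads) for a component; the number of tasks (`setNumTasks` on a regular bolt declarer) defaults to the hint and stays fixed for the topology's lifetime; and `topology.max.task.parallelism`, when set, caps each component's task count. Each executor then runs a contiguous range of tasks. The sketch is a simplified model with hypothetical names, not Storm's scheduler code; in particular, it glosses over how Storm handles more executors than tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (hypothetical names, not Storm's code) of hint vs. task count.
class TaskLayout {
    // Tasks for one component: numTasks if given, else the hint; capped by
    // topology.max.task.parallelism when that is set.
    static int effectiveTasks(int parallelismHint, Integer numTasks, Integer maxTaskParallelism) {
        int tasks = (numTasks != null) ? numTasks : parallelismHint;
        return (maxTaskParallelism != null) ? Math.min(tasks, maxTaskParallelism) : tasks;
    }

    // Splits tasks into near-equal contiguous ranges, one per executor.
    // Task numbers are local to the component and start at 1 (real Storm task ids are
    // topology-wide). In this model, executors beyond the task count get no range.
    static List<int[]> assign(int tasks, int executors) {
        List<int[]> ranges = new ArrayList<>();
        if (tasks <= 0 || executors <= 0) {
            return ranges;
        }
        int n = Math.min(tasks, executors);
        int base = tasks / n;
        int extra = tasks % n; // the first `extra` executors get one additional task
        int next = 1;
        for (int i = 0; i < n; i++) {
            int size = base + (i < extra ? 1 : 0);
            ranges.add(new int[] {next, next + size - 1});
            next += size;
        }
        return ranges;
    }
}
```

For example, the Storm documentation's case of a bolt declared with a parallelism hint of 2 and `setNumTasks(4)` corresponds to `assign(4, 2)`: two executors, each running two tasks.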



I only remember that, originally, we preferred to keep the parallelismHint to a 
minimum to avoid the problem reported in 
https://issues.apache.org/jira/browse/STORM-503


Note that we use Trident, so if I count all the bolts listed under the 
"Bolt (All time)" section of the Storm UI, I have 137 bolts (including all the 
merges, joins, projections, ...).
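Because all of those components run inside one worker, per-component counts add up quickly. This back-of-the-envelope sketch is hypothetical: it assumes each component gets as many tasks as its parallelism hint, capped by `topology.max.task.parallelism`. If all 137 components had a hint of 20, the single worker would host 137 × 20 = 2,740 tasks; a cap of 5 would bring that down to 685.

```java
import java.util.Arrays;

// Rough count of tasks hosted by a single worker that runs every component.
// Assumption (ours): tasks per component = hint, capped by topology.max.task.parallelism.
class WorkerTaskCount {
    static int totalTasks(int[] parallelismHints, Integer maxTaskParallelism) {
        int total = 0;
        for (int hint : parallelismHints) {
            total += (maxTaskParallelism != null) ? Math.min(hint, maxTaskParallelism) : hint;
        }
        return total;
    }

    // Helper for the hypothetical uniform case (e.g. 137 components, all at the same hint).
    static int[] uniformHints(int components, int hint) {
        int[] hints = new int[components];
        Arrays.fill(hints, hint);
        return hints;
    }
}
```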



Eric


________________________________
From: Harsha <[email protected]>
Sent: July 23, 2015 11:14
To: [email protected]
Subject: Re: worker dies after a few minutes

Thanks for the update, Eric. Could you describe how you found that a maxTask that 
was too high was causing this issue? We are trying to improve debugging of Storm 
topologies, so this will be helpful for us.
Thanks,
Harsha

On Thu, Jul 23, 2015, at 07:36 AM, Eric Ruel wrote:


Finally, the problem turned out to be a maxTask that was too high.


________________________________

From: Harsha <[email protected]>
Sent: July 22, 2015 10:56
To: [email protected]
Subject: Re: worker dies after a few minutes

What does your topology code look like? Are you throwing any errors from a bolt's 
execute method? It does look like a RuntimeException is happening:
"
Error when processing event
java.lang.RuntimeException:
"
It's up to the user to catch any exception and log it or otherwise handle it, 
instead of throwing it back to the worker JVM.
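A minimal sketch of "catch and log instead of throwing back to the worker JVM". It avoids Storm classes so it compiles on its own; the names are hypothetical. In a real `BaseRichBolt`, the failure branch would typically call `collector.reportError(e)` and `collector.fail(tuple)`, and the success branch `collector.ack(tuple)`. Note that the trace quoted further down originates in the worker's heartbeat timer (`do_executor_heartbeats`), not in a bolt, so this pattern covers user code only.

```java
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper around per-tuple work: exceptions are logged, not rethrown.
class SafeExecute {
    private static final Logger LOG = Logger.getLogger(SafeExecute.class.getName());

    // Returns true if the work succeeded (ack), false if a RuntimeException was caught (fail).
    static <T> boolean execute(T input, Consumer<T> work) {
        try {
            work.accept(input);
            return true;
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Failed to process input: " + input, e);
            return false; // the caller decides whether to fail or ack the tuple
        }
    }
}
```

Only `RuntimeException` is caught here; `Error`s such as `OutOfMemoryError` still propagate, which is usually what you want.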

-Harsha


On Wed, Jul 22, 2015, at 07:43 AM, Eric Ruel wrote:

Hello


The workers in my topology die after 1 or 2 minutes.


I tried changing the heartbeat configuration and running in both cluster and 
local mode, but they always die.
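For reference, the Storm 0.9.x settings most closely tied to worker heartbeats and the ZooKeeper client are sketched here as a plain map keyed by their configuration names (the strings behind `backtype.storm.Config` constants such as `STORM_ZOOKEEPER_SESSION_TIMEOUT`). The values are placeholders chosen for illustration, not necessarily Storm's defaults and not a recommendation; the message does not say which of these were changed. Some are daemon-side settings read from `storm.yaml`, so setting them only in the topology configuration may have no effect.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Heartbeat/ZooKeeper-related Storm 0.9.x settings, keyed by configuration name.
// Values are illustrative placeholders, not tuning advice.
class HeartbeatSettings {
    static Map<String, Object> example() {
        Map<String, Object> conf = new LinkedHashMap<>();
        conf.put("storm.zookeeper.session.timeout", 30000);    // ZooKeeper session timeout (ms)
        conf.put("storm.zookeeper.connection.timeout", 15000); // ZooKeeper connection timeout (ms)
        conf.put("storm.zookeeper.retry.times", 5);            // retries for a failed ZooKeeper operation
        conf.put("storm.zookeeper.retry.interval", 1000);      // pause between retries (ms)
        conf.put("worker.heartbeat.frequency.secs", 1);        // how often a worker heartbeats
        conf.put("supervisor.worker.timeout.secs", 30);        // supervisor restarts a worker silent this long
        conf.put("nimbus.task.timeout.secs", 30);              // nimbus reassigns tasks silent this long
        return conf;
    }
}
```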


Any idea?


10:38:38.019 ERROR backtype.storm.daemon.worker - Error when processing event
java.lang.RuntimeException: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /workerbeats/testeric-1-1437575782/259f61ae-02a5-4a75-be50-68f27054a7b2-1024
    at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44) ~[storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.zookeeper$set_data.invoke(zookeeper.clj:173) ~[storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.cluster$mk_distributed_cluster_state$reify__1919.set_data(cluster.clj:92) ~[storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.cluster$mk_storm_cluster_state$reify__2376.worker_heartbeat_BANG_(cluster.clj:332) ~[storm-core-0.9.6.jar:0.9.6]
    at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) ~[na:na]
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.7.0_71]
    at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
    at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93) ~[clojure-1.5.1.jar:na]
    at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$do_executor_heartbeats.doInvoke(worker.clj:56) ~[storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.RestFn.invoke(RestFn.java:439) ~[clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$fn__3757$exec_fn__1163__auto____3758$fn__3761.invoke(worker.clj:413) ~[storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.timer$schedule_recurring$this__1704.invoke(timer.clj:99) ~[storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.timer$mk_timer$fn__1687$fn__1688.invoke(timer.clj:50) ~[storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.timer$mk_timer$fn__1687.invoke(timer.clj:42) [storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: org.apache.storm.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /workerbeats/testeric-1-1437575782/259f61ae-02a5-4a75-be50-68f27054a7b2-1024
    at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:99) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:260) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:256) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:252) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:239) ~[storm-core-0.9.6.jar:0.9.6]
    at org.apache.storm.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:39) ~[storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.zookeeper$set_data.invoke(zookeeper.clj:172) ~[storm-core-0.9.6.jar:0.9.6]
    ... 15 common frames omitted

10:38:38.023 ERROR backtype.storm.util - Halting process: ("Error when processing an event")
java.lang.RuntimeException: ("Error when processing an event")
    at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325) [storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
    at backtype.storm.daemon.worker$mk_halting_timer$fn__3572.invoke(worker.clj:177) [storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.timer$mk_timer$fn__1687$fn__1688.invoke(timer.clj:68) [storm-core-0.9.6.jar:0.9.6]
    at backtype.storm.timer$mk_timer$fn__1687.invoke(timer.clj:42) [storm-core-0.9.6.jar:0.9.6]
    at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
    at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
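The two log entries read as one sequence: a recurring timer runs `do_executor_heartbeats`, which writes the worker's heartbeat under `/workerbeats/` through Curator (`RetryLoop.callWithRetry` appears in the trace, so the write presumably went through the configured retries first). The ZooKeeper `ConnectionLossException` escapes the timer task, and the timer's error handler, created by `mk_halting_timer`, calls `exit_process` and halts the worker. The following Java sketch models only that control flow; it is not Storm's code and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simplified model of a "halting timer": the first exception from any recurring
// task is handed to an error handler and the timer stops; there is no retry.
class HaltingTimerModel {
    private final List<Runnable> recurring = new ArrayList<>();
    private final Consumer<Throwable> onError;
    private boolean halted = false;

    HaltingTimerModel(Consumer<Throwable> onError) {
        this.onError = onError;
    }

    void scheduleRecurring(Runnable task) {
        recurring.add(task);
    }

    // One timer tick: run every task in order; stop at the first failure.
    void tick() {
        if (halted) {
            return;
        }
        for (Runnable task : recurring) {
            try {
                task.run();
            } catch (RuntimeException e) {
                halted = true;
                onError.accept(e); // in the trace, this role is played by exit_process
                return;
            }
        }
    }

    boolean isHalted() {
        return halted;
    }
}
```

Because the handler halts instead of retrying, a single heartbeat write that fails after the client's own retries is enough to end the worker, which matches the two-entry pattern in the log.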
