It does not work.
What I see in strace is:
[pid 35275]
open("/data/storm/workers/06f5e71e-ea3f-4ffc-93e5-5263dd12920d/pids/35274",
O_RDWR|O_CREAT|O_EXCL, 0666) = -1 ENOENT (No such file or directory)
[pid 35275] write(59, "2015-07-30T14:17:26.297-0400 b.s.d.worker [ERROR] Error
on initialization of server mk-worker\njava.io.IOException: No such file or
directory\n\tat java.io.UnixFileSystem.createFileExclusively(Native Method)
~[na:1.7.0_67]\n\tat java.io.File.createNewFile(File.java:1006)
~[na:1.7.0_67]\n\tat backtype.storm.util$touch.invoke(util.clj:525)
~[storm-core-0.9.6.jar:0.9.6]\n\tat
backtype.storm.daemon.worker$fn__3757$exec_fn__1163__auto____3758.invoke(worker.clj:401)
~[storm-core-0.9.6.jar:0.9.6]\n\tat
clojure.lang.AFn.applyToHelper(AFn.java:185) [clojure-1.5.1.jar:na]\n\tat
clojure.lang.AFn.applyTo(AFn.java:151) [clojure-1.5.1.jar:na]\n\tat
clojure.core$apply.invoke(core.clj:617) ~[clojure-1.5.1.jar:na]\n\tat
backtype.storm.daemon.worker$fn__3757$mk_worker__3813.doInvoke(worker.clj:393)
[storm-core-0.9.6.jar:0.9.6]\n\tat clojure.lang.RestFn.invoke(RestFn.java:512)
[clojure-1.5.1.jar:na]\n\tat
backtype.storm.daemon.worker$_main.invoke(worker.clj:504)
[storm-core-0.9.6.jar:0.9.6]\n\tat clojure.lang.AFn.applyToHelper"..., 1187
<unfinished ...>
[pid 35275] write(59, "2015-07-30T14:17:26.305-0400 b.s.util [ERROR] Halting
process: (\"Error on initialization\")\njava.lang.RuntimeException: (\"Error on
initialization\")\n\tat
backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
[storm-core-0.9.6.jar:0.9.6]\n\tat clojure.lang.RestFn.invoke(RestFn.java:423)
[clojure-1.5.1.jar:na]\n\tat
backtype.storm.daemon.worker$fn__3757$mk_worker__3813.doInvoke(worker.clj:393)
[storm-core-0.9.6.jar:0.9.6]\n\tat clojure.lang.RestFn.invoke(RestFn.java:512)
[clojure-1.5.1.jar:na]\n\tat
backtype.storm.daemon.worker$_main.invoke(worker.clj:504)
[storm-core-0.9.6.jar:0.9.6]\n\tat clojure.lang.AFn.applyToHelper(AFn.java:172)
[clojure-1.5.1.jar:na]\n\tat clojure.lang.AFn.applyTo(AFn.java:151)
[clojure-1.5.1.jar:na]\n\tat backtype.storm.daemon.worker.main(Unknown Source)
[storm-core-0.9.6.jar:0.9.6]\n", 808) = 808
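One possible reading of the trace above: when open() with O_CREAT|O_EXCL returns ENOENT, the file itself cannot be what is missing (it is being created), so it is usually a parent directory of the path that does not exist. The Python sketch below reproduces that behaviour in a temporary directory. The directory layout only imitates the workers/<id>/pids/<pid> path seen in the trace, and the helper names touch_exclusive and try_touch are hypothetical, not Storm code.

```python
import errno
import os
import tempfile


def touch_exclusive(path):
    """Create `path` exclusively, like open(path, O_RDWR|O_CREAT|O_EXCL, 0666)."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_EXCL, 0o666)
    os.close(fd)


def try_touch(path):
    """Return None on success, or the symbolic errno name on failure."""
    try:
        touch_exclusive(path)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout imitating .../workers/<worker-id>/pids/<pid>
    pids_dir = os.path.join(root, "workers", "worker-id", "pids")
    pid_file = os.path.join(pids_dir, "35274")

    missing_parent = try_touch(pid_file)   # parent directory does not exist yet
    os.makedirs(pids_dir)
    after_mkdir = try_touch(pid_file)      # parent exists, creation succeeds
    second_attempt = try_touch(pid_file)   # O_EXCL refuses an existing file

print(missing_parent, after_mkdir, second_attempt)
```

If the same pattern applies on the cluster, checking that the pids directory under the worker's directory exists and is writable by the user running the worker would be a reasonable first step.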
I have a single topology with 1 worker, 368 executors and 386 tasks, using Trident.
Is my problem similar to https://issues.apache.org/jira/browse/STORM-885 ?
Do we have an idea when it will be completed?
________________________________
From: Harsha <[email protected]>
Sent: July 24, 2015 22:11
To: [email protected]
Subject: Re: worker dies after view minutes
You can try increasing supervisor.worker.timeout.secs. At a basic level, your
parallelism depends on the number of CPUs, since you are increasing the number
of threads executing in the VM. You probably want to increase the JVM memory as well.
On Fri, Jul 24, 2015, at 05:58 AM, Eric Ruel wrote:
Is there a limit on the number of bolts/threads we can have within a single worker?
Our topology has almost 140 bolts, including those created by Trident, and if I
increase the parallelism, the worker dies.
Is it caused by the process that checks whether every task is alive taking
too much time to complete the whole loop, or something like that?
Are there any values in storm.yaml I should modify to support that number of
threads?
________________________________
From: Eric Ruel <[email protected]>
Sent: July 23, 2015 13:57
To: [email protected]
Subject: RE: worker dies after view minutes
Originally, we had multiple topologies, single worker, with a maxSpoutPending of
3, a parallelismHint between 1 and 20 depending on the bolt, a batch size of
300 and a maxTask of 90.
I tried reducing the batch size to 2 records and saw that my workers always died
after about 45 seconds, so I made many attempts, changing the values of all
the parameters.
I still don't fully understand the difference between maxTask and the
parallelism hint.
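For readers with the same question, here is a small Python sketch of the relationship as the Storm parallelism documentation describes it: the parallelism hint is the number of executors (threads) for a component, the number of tasks defaults to one per executor, and there are never more executors than tasks. It assumes that "maxTask" refers to topology.max.task.parallelism and treats it as a per-component cap on executors; the real scheduler may differ in detail. The function plan_component and its parameter names are hypothetical and not part of Storm.

```python
def plan_component(parallelism_hint, num_tasks=None, max_task_parallelism=None):
    """Simplified model of one component's executors and tasks.

    parallelism_hint      -> requested number of executors (threads)
    num_tasks             -> number of tasks; defaults to one per executor
    max_task_parallelism  -> assumed per-component cap on executors
    """
    executors = parallelism_hint
    if max_task_parallelism is not None:
        executors = min(executors, max_task_parallelism)
    tasks = num_tasks if num_tasks is not None else executors
    # An executor needs at least one task to run.
    executors = min(executors, tasks)
    # Spread the tasks over the executors as evenly as possible.
    base, extra = divmod(tasks, executors)
    tasks_per_executor = [base + 1 if i < extra else base for i in range(executors)]
    return executors, tasks, tasks_per_executor


print(plan_component(4))                            # hint only
print(plan_component(4, num_tasks=10))              # more tasks than executors
print(plan_component(20, max_task_parallelism=5))   # hint limited by the cap
```

Under this model, raising the hint adds threads to the worker, while raising only the task count makes each existing thread run more tasks.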
I only remember that originally we preferred to reduce the parallelismHint to a
minimum to avoid the problem caused by
https://issues.apache.org/jira/browse/STORM-503
Note that we use Trident, so if I count all the bolts I can see under the section
"Bolt (All time)" in the Storm UI, I have 137 bolts (including all merges, joins,
project...).
Eric
________________________________
From: Harsha <[email protected]>
Sent: July 23, 2015 11:14
To: [email protected]
Subject: Re: worker dies after view minutes
Thanks for the update, Eric. Could you describe how you found that a maxTask that
was too high caused this issue? We are trying to improve debugging of Storm
topologies, so this will be helpful for us.
Thanks,
Harsha
On Thu, Jul 23, 2015, at 07:36 AM, Eric Ruel wrote:
In the end, the problem was caused by a maxTask that was too high.
________________________________
From: Harsha <[email protected]>
Sent: July 22, 2015 10:56
To: [email protected]
Subject: Re: worker dies after view minutes
What does your topology code look like? Are you throwing any errors from a bolt's
execute method? It does look like there is a RuntimeException happening:
"
Error when processing event
java.lang.RuntimeException:
"
It's up to the user to catch any exception and log it or otherwise handle it,
instead of throwing it back to the worker JVM.
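As a rough illustration of that advice, the Python sketch below wraps the per-tuple work in a try/except so that a failure is logged and the tuple is marked as failed rather than the exception propagating to the caller. All names here (safe_execute, process, ack, fail, parse_int) are hypothetical stand-ins and not the Storm API; a real bolt would apply the same pattern inside its own execute method using the collector it was given.

```python
import logging

log = logging.getLogger("bolt")


def safe_execute(process, tup, ack, fail):
    """Run process(tup) and never let an exception escape to the caller."""
    try:
        process(tup)
    except Exception:
        # Log with the stack trace and mark the tuple as failed.
        log.exception("Error processing tuple %r", tup)
        fail(tup)
        return False
    ack(tup)
    return True


acked, failed = [], []


def parse_int(tup):
    int(tup)  # raises ValueError on bad input


safe_execute(parse_int, "42", acked.append, failed.append)
safe_execute(parse_int, "not-a-number", acked.append, failed.append)
print(acked, failed)
```

Whether a failed tuple should be replayed, dropped or routed elsewhere is a design choice that depends on the topology; the sketch only shows the exception being contained.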
-Harsha
On Wed, Jul 22, 2015, at 07:43 AM, Eric Ruel wrote:
Hello,
The workers in my topology die after 1 to 2 minutes.
I tried changing the heartbeat configuration and switching between cluster and
local mode, but they always die.
Any idea?
10:38:38.019 ERROR backtype.storm.daemon.worker - Error when processing event
java.lang.RuntimeException:
org.apache.storm.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/workerbeats/testeric-1-1437575782/259f61ae-02a5-4a75-be50-68f27054a7b2-1024
at backtype.storm.util$wrap_in_runtime.invoke(util.clj:44)
~[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.zookeeper$set_data.invoke(zookeeper.clj:173)
~[storm-core-0.9.6.jar:0.9.6]
at
backtype.storm.cluster$mk_distributed_cluster_state$reify__1919.set_data(cluster.clj:92)
~[storm-core-0.9.6.jar:0.9.6]
at
backtype.storm.cluster$mk_storm_cluster_state$reify__2376.worker_heartbeat_BANG_(cluster.clj:332)
~[storm-core-0.9.6.jar:0.9.6]
at sun.reflect.GeneratedMethodAccessor135.invoke(Unknown Source) ~[na:na]
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
~[na:1.7.0_71]
at java.lang.reflect.Method.invoke(Method.java:606) ~[na:1.7.0_71]
at clojure.lang.Reflector.invokeMatchingMethod(Reflector.java:93)
~[clojure-1.5.1.jar:na]
at clojure.lang.Reflector.invokeInstanceMethod(Reflector.java:28)
~[clojure-1.5.1.jar:na]
at backtype.storm.daemon.worker$do_executor_heartbeats.doInvoke(worker.clj:56)
~[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.RestFn.invoke(RestFn.java:439) ~[clojure-1.5.1.jar:na]
at
backtype.storm.daemon.worker$fn__3757$exec_fn__1163__auto____3758$fn__3761.invoke(worker.clj:413)
~[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.timer$schedule_recurring$this__1704.invoke(timer.clj:99)
~[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.timer$mk_timer$fn__1687$fn__1688.invoke(timer.clj:50)
~[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.timer$mk_timer$fn__1687.invoke(timer.clj:42)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]
Caused by: org.apache.storm.zookeeper.KeeperException$ConnectionLossException:
KeeperErrorCode = ConnectionLoss for
/workerbeats/testeric-1-1437575782/259f61ae-02a5-4a75-be50-68f27054a7b2-1024
at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:99)
~[storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.zookeeper.KeeperException.create(KeeperException.java:51)
~[storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.zookeeper.ZooKeeper.setData(ZooKeeper.java:1270)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:260)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.curator.framework.imps.SetDataBuilderImpl$4.call(SetDataBuilderImpl.java:256)
~[storm-core-0.9.6.jar:0.9.6]
at org.apache.storm.curator.RetryLoop.callWithRetry(RetryLoop.java:107)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.curator.framework.imps.SetDataBuilderImpl.pathInForeground(SetDataBuilderImpl.java:252)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:239)
~[storm-core-0.9.6.jar:0.9.6]
at
org.apache.storm.curator.framework.imps.SetDataBuilderImpl.forPath(SetDataBuilderImpl.java:39)
~[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.zookeeper$set_data.invoke(zookeeper.clj:172)
~[storm-core-0.9.6.jar:0.9.6]
... 15 common frames omitted
10:38:38.023 ERROR backtype.storm.util - Halting process: ("Error when
processing an event")
java.lang.RuntimeException: ("Error when processing an event")
at backtype.storm.util$exit_process_BANG_.doInvoke(util.clj:325)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.RestFn.invoke(RestFn.java:423) [clojure-1.5.1.jar:na]
at
backtype.storm.daemon.worker$mk_halting_timer$fn__3572.invoke(worker.clj:177)
[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.timer$mk_timer$fn__1687$fn__1688.invoke(timer.clj:68)
[storm-core-0.9.6.jar:0.9.6]
at backtype.storm.timer$mk_timer$fn__1687.invoke(timer.clj:42)
[storm-core-0.9.6.jar:0.9.6]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_71]