I upgraded our cluster from Storm 1.0.4 to 1.0.5. The supervisors are fine,
but Nimbus keeps dying every 10s. It dies silently: there are no errors in
the logs or in the JVM stdout. Nimbus exits with status 13.
Logs follow:

...
2017-09-19 09:51:20.200 o.a.s.n.NimbusInfo main [INFO] Overriding nimbus
host to storm.local.hostname -> 172.17.0.3
2017-09-19 09:51:20.311 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl main [INFO]
Starting
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:host.name=85c13f835de1
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:java.version=1.8.0_121
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:java.vendor=Oracle Corporation
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:java.class.path=/opt/apache-storm-1.0.5/lib/objenesis-2.1.jar:/opt/apache-storm-1.0.5/lib/log4j-slf4j-impl-2.8.jar:/opt/apache-storm-1.0.5/lib/kryo-3.0.3.jar:/opt/apache-storm-1.0.5/lib/disruptor-3.3.2.jar:/opt/apache-storm-1.0.5/lib/asm-5.0.3.jar:/opt/apache-storm-1.0.5/lib/log4j-core-2.8.jar:/opt/apache-storm-1.0.5/lib/minlog-1.3.0.jar:/opt/apache-storm-1.0.5/lib/slf4j-api-1.7.21.jar:/opt/apache-storm-1.0.5/lib/reflectasm-1.10.1.jar:/opt/apache-storm-1.0.5/lib/storm-core-1.0.5.jar:/opt/apache-storm-1.0.5/lib/storm-rename-hack-1.0.5.jar:/opt/apache-storm-1.0.5/lib/clojure-1.7.0.jar:/opt/apache-storm-1.0.5/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-1.0.5/lib/servlet-api-2.5.jar:/opt/apache-storm-1.0.5/lib/log4j-api-2.8.jar:/opt/apache-storm-1.0.5/lib/airbrake-java.jar:/opt/apache-storm-1.0.5/conf
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:java.io.tmpdir=/tmp
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:java.compiler=<NA>
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:os.name=Linux
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:os.arch=amd64
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:os.version=4.11.6-3-ARCH
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:user.name=storm
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:user.home=/home/storm
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client
environment:user.dir=/home/storm
2017-09-19 09:51:20.321 o.a.s.s.o.a.z.ZooKeeper main [INFO] Initiating
client connection, connectString=172.17.0.2:2181/storm sessionTimeout=20000
watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@455c1d8c
2017-09-19 09:51:20.357 o.a.s.s.o.a.z.ClientCnxn main-SendThread(
172.17.0.2:2181) [INFO] Opening socket connection to server
172.17.0.2/172.17.0.2:2181. Will not attempt to authenticate using SASL
(unknown error)
2017-09-19 09:51:20.366 o.a.s.b.FileBlobStoreImpl main [INFO] Creating new
blob store based in /home/storm/data/blobs
2017-09-19 09:51:20.393 o.a.s.d.nimbus main [INFO] Using custom scheduler:
tparking.storm.scheduler.StaticScheduler
2017-09-19 09:51:31.406 o.a.s.d.nimbus main [INFO] Starting Nimbus with
conf {"topology.builtin.metrics.bucket.size.secs" 60, "nimbus.childopts"
"-Xmx1024m
...
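
In case it helps anyone reading the excerpt: the timestamps are the only
hint I can see in it. Below is a minimal sketch (plain Python, with a
hypothetical helper name `largest_gap`, not part of Storm) that finds the
longest pause between consecutive timestamped log lines. It assumes every
log entry starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp and treats all
other lines as wrapped continuations.

```python
import re
from datetime import datetime

# Leading timestamp of a log entry, e.g. "2017-09-19 09:51:20.393".
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) ")


def largest_gap(lines):
    """Return (gap_seconds, entry_before, entry_after) for the longest pause.

    `largest_gap` is a hypothetical helper for illustration only.
    Lines without a leading timestamp are skipped as wrapped continuations.
    """
    best = (0.0, None, None)
    prev_ts, prev_line = None, None
    for line in lines:
        match = TS.match(line)
        if not match:
            continue  # continuation of the previous (wrapped) entry
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_ts, prev_line = ts, line
    return best
```

On the excerpt above, the longest pause is roughly 11 seconds, between the
"Using custom scheduler" entry and the "Starting Nimbus with conf" entry.
I don't know whether that pause is related to the exit.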

The thing is that the problem persists even after downgrading back to 1.0.4.
I cleared all the state from disk before both the upgrade and the downgrade:
everything in the Nimbus data dir and all the ZooKeeper state.

Does anyone have an idea about what's going on?

Thanks in advance, Martin
