I updated our cluster from Storm 1.0.4 to 1.0.5. The supervisors are fine, but Nimbus keeps dying every 10 seconds. It dies silently: there are no errors in the logs or in the JVM stdout. Nimbus exits with status 13. Logs follow:

...
2017-09-19 09:51:20.200 o.a.s.n.NimbusInfo main [INFO] Overriding nimbus host to storm.local.hostname -> 172.17.0.3
2017-09-19 09:51:20.311 o.a.s.s.o.a.c.f.i.CuratorFrameworkImpl main [INFO] Starting
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:host.name=85c13f835de1
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.version=1.8.0_121
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.vendor=Oracle Corporation
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.home=/usr/lib/jvm/java-8-openjdk-amd64/jre
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.class.path=/opt/apache-storm-1.0.5/lib/objenesis-2.1.jar:/opt/apache-storm-1.0.5/lib/log4j-slf4j-impl-2.8.jar:/opt/apache-storm-1.0.5/lib/kryo-3.0.3.jar:/opt/apache-storm-1.0.5/lib/disruptor-3.3.2.jar:/opt/apache-storm-1.0.5/lib/asm-5.0.3.jar:/opt/apache-storm-1.0.5/lib/log4j-core-2.8.jar:/opt/apache-storm-1.0.5/lib/minlog-1.3.0.jar:/opt/apache-storm-1.0.5/lib/slf4j-api-1.7.21.jar:/opt/apache-storm-1.0.5/lib/reflectasm-1.10.1.jar:/opt/apache-storm-1.0.5/lib/storm-core-1.0.5.jar:/opt/apache-storm-1.0.5/lib/storm-rename-hack-1.0.5.jar:/opt/apache-storm-1.0.5/lib/clojure-1.7.0.jar:/opt/apache-storm-1.0.5/lib/log4j-over-slf4j-1.6.6.jar:/opt/apache-storm-1.0.5/lib/servlet-api-2.5.jar:/opt/apache-storm-1.0.5/lib/log4j-api-2.8.jar:/opt/apache-storm-1.0.5/lib/airbrake-java.jar:/opt/apache-storm-1.0.5/conf
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.library.path=/usr/local/lib:/opt/local/lib:/usr/lib
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.io.tmpdir=/tmp
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:java.compiler=<NA>
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:os.name=Linux
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:os.arch=amd64
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:os.version=4.11.6-3-ARCH
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:user.name=storm
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:user.home=/home/storm
2017-09-19 09:51:20.320 o.a.s.s.o.a.z.ZooKeeper main [INFO] Client environment:user.dir=/home/storm
2017-09-19 09:51:20.321 o.a.s.s.o.a.z.ZooKeeper main [INFO] Initiating client connection, connectString=172.17.0.2:2181/storm sessionTimeout=20000 watcher=org.apache.storm.shade.org.apache.curator.ConnectionState@455c1d8c
2017-09-19 09:51:20.357 o.a.s.s.o.a.z.ClientCnxn main-SendThread( 172.17.0.2:2181) [INFO] Opening socket connection to server 172.17.0.2/172.17.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
2017-09-19 09:51:20.366 o.a.s.b.FileBlobStoreImpl main [INFO] Creating new blob store based in /home/storm/data/blobs
2017-09-19 09:51:20.393 o.a.s.d.nimbus main [INFO] Using custom scheduler: tparking.storm.scheduler.StaticScheduler
2017-09-19 09:51:31.406 o.a.s.d.nimbus main [INFO] Starting Nimbus with conf {"topology.builtin.metrics.bucket.size.secs" 60, "nimbus.childopts" "-Xmx1024m
...

The thing is that the problem persists even after downgrading back to 1.0.4. I cleared all the state from disk before both the upgrade and the downgrade: everything in the Nimbus data dir and all the ZooKeeper state.

Does anyone have an idea about what's going on?

Thanks in advance,
Martin