Yes, GC might be the issue. I read about a similar case in the archives: http://grokbase.com/t/gg/storm-user/133mdhagd0/worker-dying
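As a quick check on the timeout theory, the heartbeat age can be read off the first supervisor log line quoted below. Here is a minimal Python sketch. It assumes the cluster still uses the Storm default of supervisor.worker.timeout.secs: 30 (I have not seen your storm.yaml), and the function name heartbeat_age_secs is only illustrative.

```python
import re

# Assumption: the Storm default for supervisor.worker.timeout.secs (30)
# has not been overridden in this cluster's storm.yaml.
SUPERVISOR_WORKER_TIMEOUT_SECS = 30

# First supervisor log line from the quoted message, joined onto one line.
LOG_LINE = (
    "2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shutting down and clearing "
    "state for id a8733f89-9f41-4624-9bce-d9f2d8c449ee. Current supervisor time: "
    "1410261997. State: :timed-out, Heartbeat: "
    "#backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1410261964, "
    ':storm-id "popeyeTopology-2-1410260830", :executors #{[4 4] [7 7] [-1 -1] '
    "[1 1]}, :port 6700}"
)


def heartbeat_age_secs(line):
    """Return supervisor time minus last heartbeat time, or None if not found."""
    now = re.search(r"Current supervisor time: (\d+)", line)
    beat = re.search(r":time-secs (\d+)", line)
    if not (now and beat):
        return None
    return int(now.group(1)) - int(beat.group(1))


age = heartbeat_age_secs(LOG_LINE)
print(age, age > SUPERVISOR_WORKER_TIMEOUT_SECS)  # prints: 33 True
```

The last heartbeat was 33 seconds old when the supervisor checked, just over the assumed 30-second limit, which is what a :timed-out state would look like if the worker stalled for about half a minute. Checking the worker's GC log for pauses of that length around 16:56 would be the next step.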

On Tue, Sep 9, 2014 at 5:09 PM, Palak Shah <spala...@gmail.com> wrote:

> I looked at the supervisor and nimbus logs and confirmed that the topology
> is being rebalanced. It could be because of a timeout. Could you help me
> figure out exactly why this timeout occurs and how I can fix it?
>
> I have enabled GC logging for my workers and supervisor. Could GC be the
> reason that the worker is not able to send a heartbeat? I tried increasing
> the heap size allotted to each worker by tweaking the value of
> worker.childopts to have "-Xmx768m". But I did not see any difference in
> the behaviour of my topology. How can I fix this issue?
>
> Here are my supervisor logs -
>
> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shutting down and clearing
> state for id a8733f89-9f41-4624-9bce-d9f2d8c449ee. Current supervisor time:
> 1410261997. State: :timed-out, Heartbeat:
> #backtype.storm.daemon.common.WorkerHeartbeat{:time-secs 1410261964,
> :storm-id "popeyeTopology-2-1410260830", :executors #{[4 4] [7 7] [-1 -1]
> [1 1]}, :port 6700}
> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shutting down
> 9f26d478-1963-425e-a0c2-139712f32b9e:a8733f89-9f41-4624-9bce-d9f2d8c449ee
> 2014-09-09 16:56:37 b.s.util [INFO] Error when trying to kill 28950.
> Process is probably already dead.
> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Shut down
> 9f26d478-1963-425e-a0c2-139712f32b9e:a8733f89-9f41-4624-9bce-d9f2d8c449ee
> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Launching worker with
> assignment #backtype.storm.daemon.supervisor.LocalAssignment{:storm-id
> "popeyeTopology-2-1410260830", :executors ([7 7] [4 4] [1 1])} for this
> supervisor 9f26d478-1963-425e-a0c2-139712f32b9e on port 6700 with id
> 78233b86-b7f3-4590-9709-ba0d71130d7e
> 2014-09-09 16:56:37 b.s.d.supervisor [INFO] Launching worker with command:
> '/usr/lib/jvm/java-7-oracle/bin/java' '-server' '-Xmx768m' '-verbose:gc'
> '-XX:+PrintGCTimeStamps' '-XX:+PrintGCDetails'
> '-Dcom.sun.management.jmxremote' '-Dcom.sun.management.jmxremote.ssl=false'
> '-Dcom.sun.management.jmxremote.authenticate=false'
> '-Dcom.sun.management.jmxremote.port=16700'
> '-Djava.library.path=/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/resources/Linux-amd64:/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/resources:/usr/lib/jvm/java-7-oracle/lib'
> '-Dlogfile.name=worker-6700.log'
> '-Dstorm.home=/home/stormcluster/Storm/apache-storm-0.9.2-incubating'
> '-Dlogback.configurationFile=/home/stormcluster/Storm/apache-storm-0.9.2-incubating/logback/cluster.xml'
> '-Dstorm.id=popeyeTopology-2-1410260830'
> '-Dworker.id=78233b86-b7f3-4590-9709-ba0d71130d7e' '-Dworker.port=6700'
> '-cp'
> '/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/chill-java-0.3.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-exec-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-io-2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/joda-time-2.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-servlet-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/scala-library-2.9.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clout-1.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-core-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/json-simple-1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-client-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/carbonite-1.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/asm-4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpcore-4.3.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-util-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-stacktrace-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/log4j-over-slf4j-1.6.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.2.2.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/compojure-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/zookeeper-3.4.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-devel-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jgrapht-core-0.9.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/guava-13.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/objenesis-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/minlog-1.2.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/core.incubator-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clj-time-0.4.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/storm-core-0.9.2-incubating.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-logging-1.1.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/logback-classic-1.0.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/clojure-1.5.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-jetty-adapter-0.3.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.macro-0.1.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/reflectasm-1.07-shaded.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/math.numeric-tower-0.0.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-fileupload-1.2.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/servlet-api-2.5-20081211.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jetty-6.1.26.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/jline-2.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.cli-0.2.4.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/slf4j-api-1.6.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/hiccup-0.3.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-codec-1.6.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/snakeyaml-1.11.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/zmq.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/netty-3.6.3.Final.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/storm-kafka-0.9.2-incubating.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/ring-core-1.1.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/kafka_2.9.2-0.8.1.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/tools.logging-0.2.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/kryo-2.21.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/disruptor-2.10.1.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/httpclient-4.3.3.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/commons-lang-2.5.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/lib/curator-framework-2.4.0.jar:/home/stormcluster/Storm/apache-storm-0.9.2-incubating/conf:/tmp/stormtmp/supervisor/stormdist/popeyeTopology-2-1410260830/stormjar.jar'
> 'backtype.storm.daemon.worker' 'popeyeTopology-2-1410260830'
> '9f26d478-1963-425e-a0c2-139712f32b9e' '6700'
> '78233b86-b7f3-4590-9709-ba0d71130d7e'
> 2014-09-09 16:56:37 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
> 2014-09-09 16:56:37 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
> 2014-09-09 16:56:38 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
> 2014-09-09 16:56:38 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
> 2014-09-09 16:56:39 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
> 2014-09-09 16:56:39 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
> 2014-09-09 16:56:40 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
> 2014-09-09 16:56:40 b.s.d.supervisor [INFO]
> 78233b86-b7f3-4590-9709-ba0d71130d7e still hasn't started
>
> Thanks,
> Palak
>
> On Mon, Sep 8, 2014 at 6:46 PM, Cyrille Karmann <cyrillekarm...@gmail.com>
> wrote:
>
>> Look at the supervisor logs. There could be a timeout that makes a
>> supervisor force a worker to shut down, triggering a rebalance.
>>
>> 2014-09-08 7:17 GMT-04:00 Palak Shah <spala...@gmail.com>:
>>
>>> I have a topology that uses a Kafka spout to read values from a Kafka
>>> queue.
>>> I used the kafkaSpout that came with storm-0.9.2-incubating.
>>>
>>> I observed that Nimbus rebalances the topology very often. The
>>> topology suddenly shuts down and starts again with the tasks running on
>>> different machines. I wanted to know why Storm Nimbus is rebalancing my
>>> topology, so I observed the Storm throughput, latency, load on bolts and
>>> even system metrics like CPU and memory utilization, but I could not see a
>>> pattern.
>>>
>>> Can someone explain what factors lead to the rebalancing of a
>>> topology in Storm?
>>>
>>> Thanks,
>>> Palak Shah
>>>
>>
>> --
>> Cyrille Karmann
>> +1-514-659-1209
>> cyrillekarm...@gmail.com
>>
>

--
Regards,
Vikas Agarwal
91 – 9928301411

InfoObjects, Inc.
Execution Matters
http://www.infoobjects.com
2041 Mission College Boulevard, #280
Santa Clara, CA 95054
+1 (408) 988-2000 Work
+1 (408) 716-2726 Fax