For anyone who is interested in this, we finally consider this is probably a non stable use of server memory issue, Using -server -xms2G -xmx2G has successfully help us get rid of this issue.
Regards, Yiming _____________________________________________ From: Fang, Yiming [ICG-IT] Sent: Monday, October 20, 2014 11:51 AM To: [email protected] Subject: RE: nimbus detect executor not alive and reassign tasks issue Thanks Harsha for the help. Yes, every tuple is acked once its successfully written to the DB. Below attached nimbus and worker log for a restart scenario triggered recently: Nimbus log: 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[2 2] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[3 3] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[4 4] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[5 5] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[6 6] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[7 7] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[8 8] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[9 9] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[10 10] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[27 27] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[92 92] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[93 93] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[1 1] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[36 39] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[68 71] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[40 43] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[72 75] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[11 14] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[44 47] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[76 79] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[15 18] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[48 51] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[80 83] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[19 22] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[52 55] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[84 87] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[23 26] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[56 59] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[88 91] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[28 31] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[60 63] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[32 35] not alive 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Executor csl-realtime-tp-ap-equity-topology-1-1413267046:[64 67] not alive 2014-10-19 23:01:37 b.s.s.EvenScheduler [INFO] Available slots: (["e18633fb-6613-4afc-bea1-9941084508c5" 6702] ["e18633fb-6613-4afc-bea1-9941084508c5" 6701] ["e18633fb-6613-4afc-bea1-9941084508c5" 6700]) 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Reassigning csl-realtime-tp-ap-equity-topology-1-1413267046 to 1 slots 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 3] [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [27 27] [92 92] [93 93] [1 1] [36 39] [68 71] [40 43] [72 75] [11 14] [44 47] [76 79] [15 18] [48 51] [80 83] [19 22] [52 55] [84 87] [23 26] [56 59] [88 91] [28 31] [60 63] [32 35] [64 67]] 2014-10-19 23:01:37 b.s.d.nimbus [INFO] Setting new assignment for topology id csl-realtime-tp-ap-equity-topology-1-1413267046: #backtype.storm.daemon.common.Assignment{:master-code-dir "/data/PBData/foundation/csl_realtime/data/storm/0.9.2/nimbus/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046", :node->host {"e18633fb-6613-4afc-bea1-9941084508c5" "pfdwlnx1u"}, :executor->node+port {[2 2] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [3 3] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [4 4] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [5 5] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [6 6] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [7 7] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [8 8] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [9 9] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [10 10] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [27 27] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [92 92] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [93 93] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [1 1] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [36 39] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [68 71] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [40 43] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [72 75] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [11 14] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [44 47] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [76 79] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [15 18] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [48 51] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [80 83] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [19 22] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [52 55] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [84 87] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [23 26] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [56 59] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [88 91] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [28 31] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [60 63] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [32 35] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [64 67] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702]}, :executor->start-time-secs {[2 2] 1413774097, [3 3] 1413774097, [4 4] 1413774097, [5 5] 1413774097, [6 6] 1413774097, [7 7] 1413774097, [8 8] 1413774097, [9 9] 1413774097, [10 10] 1413774097, [27 27] 1413774097, [92 92] 1413774097, [93 93] 1413774097, [1 1] 1413774097, [36 39] 1413774097, [68 71] 1413774097, [40 43] 1413774097, [72 75] 1413774097, [11 14] 1413774097, [44 47] 1413774097, [76 79] 1413774097, [15 18] 1413774097, [48 51] 1413774097, [80 83] 1413774097, [19 22] 1413774097, [52 55] 1413774097, [84 87] 1413774097, [23 26] 1413774097, [56 59] 1413774097, [88 91] 1413774097, [28 31] 1413774097, [60 63] 1413774097, [32 35] 1413774097, [64 67] 1413774097}} Worker log 6702: 2014-10-19 22:59:33 s.k.KafkaUtils [INFO] Task [1/1] assigned [Partition{host= serverName:9092, partition=0}] 2014-10-19 22:59:33 s.k.ZkCoordinator [INFO] Task [1/1] Deleted partition managers: [] 2014-10-19 22:59:33 s.k.ZkCoordinator [INFO] Task [1/1] New partition managers: [] 2014-10-19 22:59:33 s.k.ZkCoordinator [INFO] Task [1/1] Finished refreshing 2014-10-19 23:00:33 s.k.ZkCoordinator [INFO] Task [1/1] Refreshing partition manager connections 2014-10-19 23:00:33 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=serverName:9092}} 2014-10-19 23:00:33 s.k.KafkaUtils [INFO] Task [1/1] assigned [Partition{host= serverName:9092, partition=0}] 2014-10-19 23:00:33 s.k.ZkCoordinator [INFO] Task [1/1] Deleted partition managers: [] 2014-10-19 23:00:33 s.k.ZkCoordinator [INFO] Task [1/1] New partition managers: [] 2014-10-19 23:00:33 s.k.ZkCoordinator [INFO] Task [1/1] Finished refreshing At 20:54 the last log for worker 6702 then suddenly at 23:01 we get the reassign tasks logics, seems nothing as exception happened. From the upper worker-6703 log, the worker 6703 finishes the last refresh of Kafka spout 23:00:33 then seems no more active trace for the 6703 port, nimbus begin to reassign tasks on 6702: 2014-10-19 20:54:21 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:host.name=serverName2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.version=1.7.0_512014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.home=/opt/gpf/realtime/jdk/1.7.0_51l64/jre2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.class.path=/opt/gpf/realtime/storm/0.9.2/lib/logback-core-1.0.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/minlog-1.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/logback-classic-1.0.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/slf4j-api-1.6.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/jetty-6.1.26.jar:/opt/gpf/realtime/storm/0.9.2/lib/log4j-over-slf4j-1.6.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/clojure-1.5.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-exec-1.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/asm-4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-io-2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-lang-2.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-codec-1.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/netty-3.2.2.Final.jar:/opt/gpf/realtime/storm/0.9.2/lib/json-simple-1.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/core.incubator-0.1.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/hiccup-0.3.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/httpcore-4.3.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/jgrapht-core-0.9.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/reflectasm-1.07-shaded.jar:/opt/gpf/realtime/storm/0.9.2/lib/guava-13.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/servlet-api-2.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/clj-time-0.4.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/storm-core-0.9.2-incubating.jar:/opt/gpf/realtime/storm/0.9.2/lib/jline-2.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/disruptor-2.10.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/objenesis-1.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/kryo-2.21.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.logging-0.2.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/compojure-1.1.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-jetty-adapter-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/curator-framework-2.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/clj-stacktrace-0.2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-servlet-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/httpclient-4.3.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-devel-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/snakeyaml-1.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/carbonite-1.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/math.numeric-tower-0.0.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/netty-3.6.3.Final.jar:/opt/gpf/realtime/storm/0.9.2/lib/zookeeper-3.4.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-core-1.1.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.macro-0.1.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/clout-1.0.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.cli-0.2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-logging-1.1.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/chill-java-0.3.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-fileupload-1.2.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/curator-client-2.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/jetty-util-6.1.26.jar:/opt/gpf/realtime/storm/0.9.2/lib/joda-time-2.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/servlet-api-2.5-20081211.jar:/opt/gpf/realtime/storm/0.9.2/conf:/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/stormjar.jar2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.library.path=/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/resources/Linux-amd64:/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/resources:/usr/local/lib:/opt/local/lib:/usr/lib2014-10-19 232014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:host.name=serverName2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.version=1.7.0_512014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.vendor=Oracle Corporation2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.home=/opt/gpf/realtime/jdk/1.7.0_51l64/jre2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.class.path=/opt/gpf/realtime/storm/0.9.2/lib/logback-core-1.0.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/minlog-1.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/logback-classic-1.0.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/slf4j-api-1.6.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/jetty-6.1.26.jar:/opt/gpf/realtime/storm/0.9.2/lib/log4j-over-slf4j-1.6.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/clojure-1.5.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-exec-1.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/asm-4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-io-2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-lang-2.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-codec-1.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/netty-3.2.2.Final.jar:/opt/gpf/realtime/storm/0.9.2/lib/json-simple-1.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/core.incubator-0.1.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/hiccup-0.3.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/httpcore-4.3.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/jgrapht-core-0.9.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/reflectasm-1.07-shaded.jar:/opt/gpf/realtime/storm/0.9.2/lib/guava-13.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/servlet-api-2.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/clj-time-0.4.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/storm-core-0.9.2-incubating.jar:/opt/gpf/realtime/storm/0.9.2/lib/jline-2.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/disruptor-2.10.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/objenesis-1.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/kryo-2.21.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.logging-0.2.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/compojure-1.1.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-jetty-adapter-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/curator-framework-2.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/clj-stacktrace-0.2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-servlet-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/httpclient-4.3.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-devel-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/snakeyaml-1.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/carbonite-1.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/math.numeric-tower-0.0.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/netty-3.6.3.Final.jar:/opt/gpf/realtime/storm/0.9.2/lib/zookeeper-3.4.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-core-1.1.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.macro-0.1.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/clout-1.0.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.cli-0.2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-logging-1.1.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/chill-java-0.3.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-fileupload-1.2.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/curator-client-2.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/jetty-util-6.1.26.jar:/opt/gpf/realtime/storm/0.9.2/lib/joda-time-2.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/servlet-api-2.5-20081211.jar:/opt/gpf/realtime/storm/0.9.2/conf:/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/stormjar.jar2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.library.path=/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/resources/Linux-amd64:/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/resources:/usr/local/lib:/opt/local/lib:/usr/lib2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.io.tmpdir=/tmp2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:java.compiler=<NA>2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:os.name=Linux2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:os.arch=amd642014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:os.version=2.6.18-348.25.1.el52014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:user.name=gpfadm2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:user.home=/home/gpfadm2014-10-19 23:01:41 o.a.z.ZooKeeper [INFO] Client environment:user.dir=/opt/gpf/realtime/bin2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:zookeeper.version=3.4.5-1392090, built on 09/30/2012 17:52 GMT2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:host.name=serverName2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.version=1.7.0_512014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.vendor=Oracle Corporation2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.home=/opt/gpf/realtime/jdk/1.7.0_51l64/jre2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.class.path=/opt/gpf/realtime/storm/0.9.2/lib/logback-core-1.0.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/minlog-1.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/logback-classic-1.0.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/slf4j-api-1.6.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/jetty-6.1.26.jar:/opt/gpf/realtime/storm/0.9.2/lib/log4j-over-slf4j-1.6.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/clojure-1.5.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-exec-1.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/asm-4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-io-2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-lang-2.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-codec-1.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/netty-3.2.2.Final.jar:/opt/gpf/realtime/storm/0.9.2/lib/json-simple-1.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/core.incubator-0.1.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/hiccup-0.3.6.jar:/opt/gpf/realtime/storm/0.9.2/lib/httpcore-4.3.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/jgrapht-core-0.9.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/reflectasm-1.07-shaded.jar:/opt/gpf/realtime/storm/0.9.2/lib/guava-13.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/servlet-api-2.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/clj-time-0.4.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/storm-core-0.9.2-incubating.jar:/opt/gpf/realtime/storm/0.9.2/lib/jline-2.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/disruptor-2.10.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/objenesis-1.2.jar:/opt/gpf/realtime/storm/0.9.2/lib/kryo-2.21.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.logging-0.2.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/compojure-1.1.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-jetty-adapter-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/curator-framework-2.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/clj-stacktrace-0.2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-servlet-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/httpclient-4.3.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-devel-0.3.11.jar:/opt/gpf/realtime/storm/0.9.2/lib/snakeyaml-1.11.jar:/opt/gpf /realtime/storm/0.9.2/lib/carbonite-1.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/math.numeric-tower-0.0.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/netty-3.6.3.Final.jar:/opt/gpf/realtime/storm/0.9.2/lib/zookeeper-3.4.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/ring-core-1.1.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.macro-0.1.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/clout-1.0.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/tools.cli-0.2.4.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-logging-1.1.3.jar:/opt/gpf/realtime/storm/0.9.2/lib/chill-java-0.3.5.jar:/opt/gpf/realtime/storm/0.9.2/lib/commons-fileupload-1.2.1.jar:/opt/gpf/realtime/storm/0.9.2/lib/curator-client-2.4.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/jetty-util-6.1.26.jar:/opt/gpf/realtime/storm/0.9.2/lib/joda-time-2.0.jar:/opt/gpf/realtime/storm/0.9.2/lib/servlet-api-2.5-20081211.jar:/opt/gpf/realtime/storm/0.9.2/conf:/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/stormjar.jar 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.library.path=/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/resources/Linux-amd64:/data/PBData/foundation/csl_realtime/data/storm/0.9.2/supervisor/stormdist/csl-realtime-tp-ap-equity-topology-1-1413267046/resources:/usr/local/lib:/opt/local/lib:/usr/lib 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.io.tmpdir=/tmp 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:java.compiler=<NA> 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.name=Linux 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.arch=amd64 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:os.version=2.6.18-348.25.1.el5 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.name=gpfadm 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.home=/home/gpfadm 2014-10-19 23:01:41 o.a.z.s.ZooKeeperServer [INFO] Server environment:user.dir=/opt/gpf/realtime/bin 2014-10-19 23:01:42 b.s.d.worker [INFO] Launching worker for csl-realtime-tp-ap-equity-topology-1-1413267046 on e18633fb-6613-4afc-bea1-9941084508c5:6702 with id 00b7ea91-5e19-452e-8b9e-ce035be31867 and conf {"dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations" false, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "storm.messaging.netty.flush.check.interval.ms" 10, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/data/PBData/foundation/csl_realtime/data/storm/0.9.2", "storm.messaging.netty.buffer_size" 5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "serverName.nam.nsroot.net", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 7181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers" ["serverName.nam.nsroot.net"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" nil, "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx2048m -Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1%ID%", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" nil, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.message.timeout.secs" 30, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=29999", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "topology.multilang.serializer" "backtype.storm.multilang.JsonSerializer", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "topology.worker.receiver.thread.count" 1, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 18087, "nimbus.childopts" "-Xmx2048m -Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=19999", "storm.cluster.mode" "distributed", "topology.max.task.parallelism" nil, "storm.messaging.netty.transfer.batch.size" 262144} 2014-10-19 23:01:42 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2014-10-19 23:01:42 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=serverName.nam.nsroot.net:7181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@79acfa95<mailto:watcher=org.apache.curator.ConnectionState@79acfa95> 2014-10-19 23:01:42 o.a.z.ClientCnxn [INFO] Opening socket connection to server serverName/ip:7181. Will not attempt to authenticate using SASL (unknown error) 2014-10-19 23:01:42 o.a.z.ClientCnxn [INFO] Socket connection established to serverName/ip:7181, initiating session 2014-10-19 23:01:42 o.a.z.ClientCnxn [INFO] Session establishment complete on server serverName/ip:7181, sessionid = 0x1492848710d0021, negotiated timeout = 20000 2014-10-19 23:01:42 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2014-10-19 23:01:42 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered. 2014-10-19 23:01:42 b.s.zookeeper [INFO] Zookeeper state update: :connected:none 2014-10-19 23:01:43 o.a.z.ZooKeeper [INFO] Session: 0x1492848710d0021 closed 2014-10-19 23:01:43 o.a.z.ClientCnxn [INFO] EventThread shut down 2014-10-19 23:01:43 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2014-10-19 23:01:43 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=serverName.nam.nsroot.net:7181/storm sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@7e56f324<mailto:watcher=org.apache.curator.ConnectionState@7e56f324> 2014-10-19 23:01:43 o.a.z.ClientCnxn [INFO] Opening socket connection to server serverName/ip:7181. Will not attempt to authenticate using SASL (unknown error) 2014-10-19 23:01:43 o.a.z.ClientCnxn [INFO] Socket connection established to serverName/ip:7181, initiating session 2014-10-19 23:01:43 o.a.z.ClientCnxn [INFO] Session establishment complete on server serverName/ip:7181, sessionid = 0x1492848710d0022, negotiated timeout = 20000 2014-10-19 23:01:43 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2014-10-19 23:01:43 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered. 2014-10-19 23:01:43 b.s.m.TransportFactory [INFO] Storm peer transport plugin:backtype.storm.messaging.netty.Context 2014-10-19 23:01:44 b.s.d.executor [INFO] Loading executor __acker:[2 2] ... prepare log omitted here 2014-10-19 23:01:44 b.s.d.executor [INFO] Finished loading executor kafkaSpout:[92 92] 2014-10-19 23:01:44 b.s.d.executor [INFO] Opening spout kafkaSpout:(92) 2014-10-19 23:01:44 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2014-10-19 23:01:44 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=serverName.nam.nsroot.net:7181, sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@229f29d<mailto:watcher=org.apache.curator.ConnectionState@229f29d> 2014-10-19 23:01:44 b.s.d.executor [INFO] Loading executor splitBolt:[93 93] 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Opening socket connection to server serverName/ip:7181. Will not attempt to authenticate using SASL (unknown error) 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Socket connection established to serverName/ip:7181, initiating session 2014-10-19 23:01:44 b.s.d.executor [INFO] Loaded executor tasks splitBolt:[93 93] 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Session establishment complete on server serverName/ip:7181, sessionid = 0x1492848710d0023, negotiated timeout = 20000 2014-10-19 23:01:44 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2014-10-19 23:01:44 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered. 2014-10-19 23:01:44 b.s.d.executor [INFO] Finished loading executor splitBolt:[93 93] 2014-10-19 23:01:44 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2014-10-19 23:01:44 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=serverName.nam.nsroot.net:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@613d77af<mailto:watcher=org.apache.curator.ConnectionState@613d77af> 2014-10-19 23:01:44 b.s.d.executor [INFO] Preparing bolt splitBolt:(93) 2014-10-19 23:01:44 b.s.d.executor [INFO] Prepared bolt splitBolt:(93) 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Opening socket connection to server serverName/ip:2181. Will not attempt to authenticate using SASL (unknown error) 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Socket connection established to serverName/ip:2181, initiating session 2014-10-19 23:01:44 b.s.d.executor [INFO] Loading executor __system:[-1 -1] 2014-10-19 23:01:44 o.a.z.ClientCnxnSocket [WARN] Connected to an old server; r-o mode will be unavailable 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Session establishment complete on server serverName/ip:2181, sessionid = 0x14928475c440010, negotiated timeout = 20000 2014-10-19 23:01:44 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2014-10-19 23:01:44 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered. 2014-10-19 23:01:44 b.s.d.executor [INFO] Loaded executor tasks __system:[-1 -1] 2014-10-19 23:01:44 b.s.d.executor [INFO] Finished loading executor __system:[-1 -1] 2014-10-19 23:01:44 b.s.d.executor [INFO] Preparing bolt __system:(-1) 2014-10-19 23:01:44 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=serverName:9092}} 2014-10-19 23:01:44 b.s.d.executor [INFO] Prepared bolt __system:(-1) 2014-10-19 23:01:44 o.a.c.f.i.CuratorFrameworkImpl [INFO] Starting 2014-10-19 23:01:44 o.a.z.ZooKeeper [INFO] Initiating client connection, connectString=serverName.nam.nsroot.net:2181 sessionTimeout=20000 watcher=org.apache.curator.ConnectionState@2a3f0cf8<mailto:watcher=org.apache.curator.ConnectionState@2a3f0cf8> 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Opening socket connection to server serverName/ip:2181. Will not attempt to authenticate using SASL (unknown error) 2014-10-19 23:01:44 b.s.d.executor [INFO] Loading executor __acker:[1 1] 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Socket connection established to serverName/ip:2181, initiating session 2014-10-19 23:01:44 b.s.d.executor [INFO] Loaded executor tasks __acker:[1 1] 2014-10-19 23:01:44 o.a.z.ClientCnxnSocket [WARN] Connected to an old server; r-o mode will be unavailable 2014-10-19 23:01:44 o.a.z.ClientCnxn [INFO] Session establishment complete on server serverName/ip:2181, sessionid = 0x14928475c440011, negotiated timeout = 20000 2014-10-19 23:01:44 b.s.d.executor [INFO] Preparing bolt __acker:(1) 2014-10-19 23:01:44 b.s.d.executor [INFO] Timeouts disabled for executor __acker:[1 1] 2014-10-19 23:01:44 o.a.c.f.s.ConnectionStateManager [INFO] State change: CONNECTED 2014-10-19 23:01:44 b.s.d.executor [INFO] Opened spout kafkaSpout:(92) 2014-10-19 23:01:44 o.a.c.f.s.ConnectionStateManager [WARN] There are no ConnectionStateListeners registered. 2014-10-19 23:01:44 b.s.d.executor [INFO] Finished loading executor __acker:[1 1] 2014-10-19 23:01:44 b.s.d.executor [INFO] Prepared bolt __acker:(1) 2014-10-19 23:01:44 b.s.d.executor [INFO] Activating spout kafkaSpout:(92) 2014-10-19 23:01:44 s.k.ZkCoordinator [INFO] Task [1/1] Refreshing partition manager connections 2014-10-19 23:01:44 s.k.DynamicBrokersReader [INFO] Read partition info from zookeeper: GlobalPartitionInformation{partitionMap={0=serverName:9092}} 2014-10-19 23:01:44 s.k.KafkaUtils [INFO] Task [1/1] assigned [Partition{host=serverName:9092, partition=0}] 2014-10-19 23:01:44 s.k.ZkCoordinator [INFO] Task [1/1] Deleted partition managers: [] 2014-10-19 23:01:44 s.k.ZkCoordinator [INFO] Task [1/1] New partition managers: [Partition{host=serverName:9092, partition=0}] 2014-10-19 23:01:44 b.s.d.executor [INFO] Loading executor eqpbOlapBolt:[36 39] 2014-10-19 23:01:44 b.s.d.executor [INFO] Loaded executor tasks eqpbOlapBolt:[36 39] ... Omit prepare bolt log here 2014-10-19 23:01:44 b.s.d.executor [INFO] Finished loading executor eqpbOlapBolt:[64 67] 2014-10-19 23:01:44 b.s.d.worker [INFO] Launching receive-thread for e18633fb-6613-4afc-bea1-9941084508c5:6702 2014-10-19 23:01:44 b.s.d.executor [INFO] Preparing bolt eqpbOlapBolt:(64 65 66 67) 2014-10-19 23:01:44 b.s.m.n.Server [INFO] Create Netty Server Netty-server-localhost-6702, buffer_size: 5242880, maxWorkers: 1 2014-10-19 23:01:44 b.s.m.loader [INFO] Starting receive-thread: [stormId: csl-realtime-tp-ap-equity-topology-1-1413267046, port: 6702, thread-id: 0 ] 2014-10-19 23:01:44 b.s.d.worker [INFO] Worker has topology config {"storm.id" "csl-realtime-tp-ap-equity-topology-1-1413267046", "dev.zookeeper.path" "/tmp/dev-storm-zookeeper", "topology.tick.tuple.freq.secs" nil, "topology.builtin.metrics.bucket.size.secs" 60, "topology.fall.back.on.java.serialization" true, "topology.max.error.report.per.interval" 5, "zmq.linger.millis" 5000, "topology.skip.missing.kryo.registrations" false, "storm.messaging.netty.client_worker_threads" 1, "ui.childopts" "-Xmx768m", "storm.zookeeper.session.timeout" 20000, "nimbus.reassign" true, "topology.trident.batch.emit.interval.millis" 500, "storm.messaging.netty.flush.check.interval.ms" 10, "nimbus.monitor.freq.secs" 10, "logviewer.childopts" "-Xmx128m", "java.library.path" "/usr/local/lib:/opt/local/lib:/usr/lib", "topology.executor.send.buffer.size" 1024, "storm.local.dir" "/data/PBData/foundation/csl_realtime/data/storm/0.9.2", "storm.messaging.netty.buffer_size" 5242880, "supervisor.worker.start.timeout.secs" 120, "topology.enable.message.timeouts" true, "nimbus.cleanup.inbox.freq.secs" 600, "nimbus.inbox.jar.expiration.secs" 3600, "drpc.worker.threads" 64, "topology.worker.shared.thread.pool.size" 4, "nimbus.host" "serverName.nam.nsroot.net", "storm.messaging.netty.min_wait_ms" 100, "storm.zookeeper.port" 7181, "transactional.zookeeper.port" nil, "topology.executor.receive.buffer.size" 1024, "transactional.zookeeper.servers" nil, "storm.zookeeper.root" "/storm", "storm.zookeeper.retry.intervalceiling.millis" 30000, "supervisor.enable" true, "storm.messaging.netty.server_worker_threads" 1, "storm.zookeeper.servers" ["serverName.nam.nsroot.net"], "transactional.zookeeper.root" "/transactional", "topology.acker.executors" 10, "topology.kryo.decorators" (), "topology.name" "csl-realtime-tp-ap-equity-topology", "topology.transfer.buffer.size" 1024, "topology.worker.childopts" nil, "drpc.queue.size" 128, "worker.childopts" "-Xmx2048m -Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=1%ID%", "supervisor.heartbeat.frequency.secs" 5, "topology.error.throttle.interval.secs" 10, "zmq.hwm" 0, "drpc.port" 3772, "supervisor.monitor.frequency.secs" 3, "drpc.childopts" "-Xmx768m", "topology.receiver.buffer.size" 8, "task.heartbeat.frequency.secs" 3, "topology.tasks" nil, "storm.messaging.netty.max_retries" 30, "topology.spout.wait.strategy" "backtype.storm.spout.SleepSpoutWaitStrategy", "nimbus.thrift.max_buffer_size" 1048576, "topology.max.spout.pending" 5000, "storm.zookeeper.retry.interval" 1000, "topology.sleep.spout.wait.strategy.time.ms" 1, "nimbus.topology.validator" "backtype.storm.nimbus.DefaultTopologyValidator", "supervisor.slots.ports" [6700 6701 6702 6703], "topology.debug" false, "nimbus.task.launch.secs" 120, "nimbus.supervisor.timeout.secs" 60, "topology.kryo.register" nil, "topology.message.timeout.secs" 3600, "task.refresh.poll.secs" 10, "topology.workers" 1, "supervisor.childopts" "-Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=29999", "nimbus.thrift.port" 6627, "topology.stats.sample.rate" 0.05, "worker.heartbeat.frequency.secs" 1, "topology.tuple.serializer" "backtype.storm.serialization.types.ListDelegateSerializer", "topology.disruptor.wait.strategy" "com.lmax.disruptor.BlockingWaitStrategy", "topology.multilang.serializer" "backtype.storm.multilang.JsonSerializer", "nimbus.task.timeout.secs" 30, "storm.zookeeper.connection.timeout" 15000, "topology.kryo.factory" "backtype.storm.serialization.DefaultKryoFactory", "drpc.invocations.port" 3773, "logviewer.port" 8000, "zmq.threads" 1, "storm.zookeeper.retry.times" 5, "topology.worker.receiver.thread.count" 1, "storm.thrift.transport" "backtype.storm.security.auth.SimpleTransportPlugin", "topology.state.synchronization.timeout.secs" 60, "supervisor.worker.timeout.secs" 30, "nimbus.file.copy.expiration.secs" 600, "storm.messaging.transport" "backtype.storm.messaging.netty.Context", "logviewer.appender.name" "A1", "storm.messaging.netty.max_wait_ms" 1000, "drpc.request.timeout.secs" 600, "storm.local.mode.zmq" false, "ui.port" 18087, "nimbus.childopts" "-Xmx2048m -Djava.net.preferIPv4Stack=true -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.port=19999", "storm.cluster.mode" "distributed", "topology.max.task.parallelism" nil, "storm.messaging.netty.transfer.batch.size" 262144} 2014-10-19 23:01:44 b.s.d.worker [INFO] Worker 00b7ea91-5e19-452e-8b9e-ce035be31867 for storm csl-realtime-tp-ap-equity-topology-1-1413267046 on e18633fb-6613-4afc-bea1-9941084508c5:6702 has finished loading 2014-10-19 23:01:44 o.s.b.f.x.XmlBeanDefinitionReader [INFO] Loading XML bean definitions from class path resource [csl_saving_service_context.xml] 2014-10-19 23:01:44 o.s.b.f.x.XmlBeanDefinitionReader [INFO] Loading XML bean definitions from class path resource [spring_mail_context.xml] 2014-10-19 23:01:44 s.k.PartitionManager [INFO] Read partition information from: /storm/partition_0 --> {"topic":"citi.gpf_162842.pfdwh.RT.kafka.topic.equity","partition":0,"topology":{"id ":"9b24e9c0-b9cb-4d7c-8659-2b3ff9ca4a43","name":"csl-realtime-tp-ap-equity-topology"},"broker":{"port":9092,"host":"serverName"},"offset":588012} 2014-10-19 23:01:45 s.k.PartitionManager [INFO] Read last commit offset from zookeeper: 588012; old topology_id: 9b24e9c0-b9cb-4d7c-8659-2b3ff9ca4a43 - new topology_id: 9b24e9c0-b9cb-4d7c-8659-2b3ff9ca4a43 2014-10-19 23:01:45 s.k.PartitionManager [INFO] Starting Kafka serverName:0 from offset 588012 2014-10-19 23:01:45 s.k.ZkCoordinator [INFO] Task [1/1] Finished refreshing 2014-10-19 23:01:45 o.s.b.f.c.PropertyPlaceholderConfigurer [INFO] Loading properties file from class path resource [uat/applicationResourcesStorm.properties] 2014-10-19 23:01:45 o.s.b.f.s.DefaultListableBeanFactory [INFO] Pre-instantiating singletons in org.springframework.beans.factory.support.DefaultListableBeanFactory@36cf2a48<mailto:org.springframework.beans.factory.support.DefaultListableBeanFactory@36cf2a48>: defining beans [org.springframework.beans.factory.config.PropertyPlaceholderConfigurer#0,txManager,org.springframework.aop.config.internalAutoProxyCreator,org.springframework.transaction.annotation.AnnotationTransactionAttributeSource#0,org.springframework.transaction.interceptor.TransactionInterceptor#0,org.springframework.transaction.config.internalTransactionAdvisor,olapDataSource,olapDAO,fetPayloadDAO,fbtPayloadDAO,frtPayloadDAO,insFactPositionSP,insFactEquityPositionSP,insFactFIPositionWithDetailSP,insFactBalanceSP,olapPersitenceService,insFactCommonTransctionProc,insFactEquityProc,insFactClientOrigDetailProc,GPBalanceUtil,EQPBBalanceUtil,EQPBPositionUtil,FIPBBalanceUtil,FIPBPositionUtil,utilManager,mailSender,emailHelper,mailSenderHelper,mailSenderUtil,duplikeyIssueMailList]; root of factory hierarchy 2014-10-19 23:01:45 c.c.g.c.u.e.MailSenderHelperImpl [INFO] mail service is startted 2014-10-19 23:01:45 c.c.g.c.u.e.MailSenderHelperImpl [INFO] ============================================================ 2014-10-19 23:01:45 b.s.d.executor [INFO] Prepared bolt eqpbEnrichBolt:(27) 2014-10-19 23:01:45 b.s.d.executor [INFO] Prepared bolt eqpbOlapBolt:(29 28 31 30) ... Omit prepare bolt log here 2014-10-19 23:01:45 b.s.d.executor [INFO] Prepared bolt eqpbOlapBolt:(59 58 57 56) 2014-10-19 23:02:45 s.k.ZkCoordinator [INFO] Task [1/1] Refreshing partition manager connections Hope this is all needed, Regards, Yiming -----Original Message----- From: Harsha [mailto:[email protected]] Sent: Friday, October 17, 2014 9:56 PM To: [email protected]<mailto:[email protected]> Subject: Re: nimbus detect executor not alive and reassign tasks issue Do you see any errors in worker logs thats causing executor failures. Are you ack the tuple once its successfully written to the database. On Thu, Oct 16, 2014, at 11:15 PM, Fang, Yiming wrote: > Dear All, > > I have a nimbus detect executor not alive and reassign tasks issue: > > We are working on 0.9.2 version of storm + 3.4.6 version for zookeeper. > And it's a single node with 1 instance of storm relies on 1 instance > of zookeeper On the same machine. > > Accidently we get the following issue, about 1-2 hours, the nimbus.log > tells that No executor is alive like the following: > > 2014-10-17 01:30:58 b.s.d.nimbus [INFO] Executor xxx :[2 2] not alive > ... executor not alive code here ommitted > 2014-10-17 01:30:58 b.s.d.nimbus [INFO] Executor xxx :[64 67] not > alive > 2014-10-17 01:30:58 b.s.s.EvenScheduler [INFO] Available slots: > (["e18633fb-6613-4afc-bea1-9941084508c5" 6702] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6701] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6700]) > 2014-10-17 01:30:58 b.s.d.nimbus [INFO] Reassigning xxx to 1 slots > 2014-10-17 01:30:58 b.s.d.nimbus [INFO] Reassign executors: [[2 2] [3 > 3] > [4 4] [5 5] [6 6] [7 7] [8 8] [9 9] [10 10] [27 27] [92 92] [93 93] [1 > 1] > [36 39] [68 71] [40 43] [72 75] [11 14] [44 47] [76 79] [15 18] [48 > 51] > [80 83] [19 22] [52 55] [84 87] [23 26] [56 59] [88 91] [28 31] [60 > 63] > [32 35] [64 67]] > 2014-10-17 01:30:58 b.s.d.nimbus [INFO] Setting new assignment for > topology id xxx: > #backtype.storm.daemon.common.Assignment{:master-code-dir "yyy", > :node->host {"e18633fb-6613-4afc-bea1-9941084508c5" "pfdwlnx1u"}, > :executor->node+port {[2 2] ["e18633fb-6613-4afc-bea1-9941084508c5" > 6702], [3 3] ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [4 4] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [5 5] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [6 6] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [7 7] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [8 8] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [9 9] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [10 10] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [27 27] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [92 92] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [93 93] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [1 1] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [36 39] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [68 71] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [40 43] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [72 75] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [11 14] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [44 47] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [76 79] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [15 18] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [48 51] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [80 83] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [19 22] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [52 55] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [84 87] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [23 26] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [56 59] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [88 91] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [28 31] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [60 63] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [32 35] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702], [64 67] > ["e18633fb-6613-4afc-bea1-9941084508c5" 6702]}, > :executor->start-time-secs {[2 2] 1413523858, [3 3] 1413523858, [4 4] > 1413523858, [5 5] 1413523858, [6 6] 1413523858, [7 7] 1413523858, [8 > 8] 1413523858, [9 9] 1413523858, [10 10] 1413523858, [27 27] > 1413523858, [92 92] 1413523858, [93 93] 1413523858, [1 1] 1413523858, > [36 39] 1413523858, > [68 71] 1413523858, [40 43] 1413523858, [72 75] 1413523858, [11 14] > 1413523858, [44 47] 1413523858, [76 79] 1413523858, [15 18] > 1413523858, > [48 51] 1413523858, [80 83] 1413523858, [19 22] 1413523858, [52 55] > 1413523858, [84 87] 1413523858, [23 26] 1413523858, [56 59] > 1413523858, > [88 91] 1413523858, [28 31] 1413523858, [60 63] 1413523858, [32 35] > 1413523858, [64 67] 1413523858}} > > Then it reassign the tasks to supervisor. The problem is as we use > Kafka Spout to get messages from Kafka server. We observed that > Certain messages have already been processed before the reassign of > tasks (persist into a Data Base) and the reassign of tasks make those > messages been reflowed as I believe that the offset of those already > been treated kafka messages not committed to the zookeeper server. > > So questions are: > 1) I know it's one of the fault tolerant feature provided by storm and > kafka, is this a normal case? Anything wrong with the not alive issue? > 2) If there's potential issue with task not alive, how could I check > in details why the task is not alive? As I believe that the nimbus, > supervisor process never dies in our test case. > > Hope I make things clear if anything not clear let me know. > > Thanks and regards, > Yiming >
