It can a network issue (high latency). Check your network configuration( NIC ) Vladi
On Thu, Nov 13, 2014 at 12:25 AM, Hines, Sohail <[email protected]> wrote: > We have a 2-node Storm cluster in production (v0.9.0.1). The condition > we're noticing is that Storm Complete Latency varies throughout the day > from *300ms to 1**2**00ms*. However during the same period, the sum of > all Bolt Execute Latency times only add up to about ** *50ms ***. The > same is true for the sum of all of the Bolt Process Latency times. See > snapshot of this illustrated below (monospaced font works best), along with > the full storm configuration. > > > We've tried doubling the topology workers and ackers, as well as > increasing tasks/parallelism for all bolts. Lastly we also tried to > increase the topology.executor send/receive/transfer buffer sizes (as > included below). > > > However we're unable to account for most of the topology latency > as reported under Complete Latency. Should the sum of the bolt > latencies closely match the topology's Complete Latency? If so, what > tuning parameters could we change to help achieve this? > > *Topology stats (Storm UI)* > > Window Emitted Transferred Complete latency (ms) > Acked > Failed > 10m 0s 4502820 5912040 *1009.565* > 260600 0 > 3h 0m 0s 71909220 94401060 731.852 > 4466280 0 > 1d 0h 0m 0s 461365780 604827120 486.754 > 31079420 0 > All time 911575300 1194802200 483.995 > 61884080 0 > > *Bolts (All time)* > > Id Executors Tasks Emitted Transferred > Capacity Execute latency Executed Process latency (ms) > writerBolt 12 12 0 0 > 0.479 *1.246* 283204540 1.244 > udsEnhancement 400 400 283242820 566485640 > 0.872 *47.420* 283204720 47.397 > transformBolt 8 8 283168440 283168440 > 0.772 *1.085* 283205380 1.085 > realtimePublisher 8 8 0 0 > 0.533 *0.834* 283204360 0.831 > parserBolt 20 20 283254740 283238820 > 0.745 *8.392* 61884380 8.409 > > *Full Configuration Settings:* > > medio.storm.debug false > medio.storm.events.max_future_allowed_hours 24 > medio.storm.events.max_past_allowed_hours 720 > medio.storm.max_spout_pending 100 > medio.storm.max_task_parallelism 5 > medio.storm.message_timeout_seconds 180 > medio.storm.num_ackers 4 > medio.storm.num_workers 4 > medio.storm.topology_name avalanche-core > medio.topology.connect_kafka_logging_bolt false > medio.topology.emit_mock_kafka_requests false > medio.topology.version 2.4.6 > nimbus.childopts -Xmx1024m > nimbus.cleanup.inbox.freq.secs 600 > nimbus.file.copy.expiration.secs 600 > nimbus.host m3web01-sef.prod.msrch > nimbus.inbox.jar.expiration.secs 3600 > nimbus.monitor.freq.secs 10 > nimbus.reassign true > nimbus.supervisor.timeout.secs 60 > nimbus.task.launch.secs 120 > nimbus.task.timeout.secs 30 > nimbus.thrift.port 6627 > nimbus.topology.validator backtype.storm.nimbus.DefaultTopologyValidator > storm.cluster.mode distributed > storm.id avalanche-core-3-1415647968 > storm.local.dir /workplace/storm/local > storm.local.mode.zmq false > storm.messaging.netty.buffer_size 5242880 > storm.messaging.netty.client_worker_threads 1 > storm.messaging.netty.max_retries 30 > storm.messaging.netty.max_wait_ms 1000 > storm.messaging.netty.min_wait_ms 100 > storm.messaging.netty.server_worker_threads 1 > storm.messaging.transport backtype.storm.messaging.zmq > storm.thrift.transport backtype.storm.security.auth.SimpleTransportPlugin > storm.zookeeper.connection.timeout 15000 > storm.zookeeper.port 2181 > storm.zookeeper.retry.interval 1000 > storm.zookeeper.retry.intervalceiling.millis 30000 > storm.zookeeper.retry.times 5 > storm.zookeeper.root /storm > storm.zookeeper.servers ["zk01-sef" "zk02-sef" "zk03-sef"] > storm.zookeeper.session.timeout 20000 > supervisor.childopts -Xmx256m > supervisor.enable true > supervisor.heartbeat.frequency.secs 5 > supervisor.monitor.frequency.secs 3 > supervisor.slots.ports [6700 6701 6702 6703] > supervisor.worker.start.timeout.secs 120 > supervisor.worker.timeout.secs 30 > task.heartbeat.frequency.secs 3 > task.refresh.poll.secs 10 > topology.acker.executors 4 > topology.builtin.metrics.bucket.size.secs 60 > topology.debug false > topology.disruptor.wait.strategy com.lmax.disruptor.BlockingWaitStrategy > topology.enable.message.timeouts true > topology.error.throttle.interval.secs 10 > topology.executor.receive.buffer.size 16384 > topology.executor.send.buffer.size 16384 > topology.fall.back.on.java.serialization false > topology.kryo.decorators [] > topology.kryo.factory backtype.storm.serialization.DefaultKryoFactory > topology.kryo.register > {"com.medio.services.avalanche.model.impl.RoutableEventImpl" > "com.medio.services.avalanche.storm.common.serializers.RoutableEventSerializer"} > topology.max.error.report.per.interval 5 > topology.max.spout.pending 100 > topology.max.task.parallelism > topology.message.timeout.secs 180 > topology.metrics.consumer.register [{"argument" nil, "parallelism.hint" > 2, "class" > "com.medio.services.tempest.client.storm.OpenTSDBMetricConsumer"}] > topology.name avalanche-core > topology.optimize true > topology.receiver.buffer.size 8 > topology.skip.missing.kryo.registrations false > topology.sleep.spout.wait.strategy.time.ms 1 > topology.spout.wait.strategy backtype.storm.spout.SleepSpoutWaitStrategy > topology.state.synchronization.timeout.secs 60 > topology.stats.sample.rate 0.05 > topology.tasks > topology.tick.tuple.freq.secs > topology.transfer.buffer.size 32 > topology.trident.batch.emit.interval.millis 500 > topology.tuple.serializer > backtype.storm.serialization.types.ListDelegateSerializer > topology.worker.childopts > topology.worker.shared.thread.pool.size 4 > topology.workers 4 > > medio.bolt.event_parser.num_tasks 20 > medio.bolt.event_parser.parallelism_hint 20 > medio.bolt.event_transform.enabled true > medio.bolt.event_transform.num_tasks 8 > medio.bolt.event_transform.parallelism_hint 8 > medio.bolt.event_transform.refresh_interval 60000 > medio.bolt.event_transform.root_directory > /usr/medio/avalanche-storm-transforms/ > medio.bolt.event_writer.num_tasks 12 > medio.bolt.event_writer.parallelism_hint 12 > medio.bolt.event_writer.upload_notification.batch_expire_ms 5000 > medio.bolt.event_writer.upload_notification.batch_size 200 > medio.bolt.event_writer.upload_notification.compression_codec 0 > medio.bolt.event_writer.upload_notification.max_queue_size 5000 > medio.bolt.event_writer.upload_notification.topic avalancheFileUploads > medio.bolt.event_writer.upload_notification.zk_connect > medio.bolt.kafka_logging.num_tasks 1 > medio.bolt.kafka_logging.parallelism_hint 1 > medio.bolt.location.blacklisted_events > medio.bolt.location.enabled false > medio.bolt.location.magellan.host 10.10.32.249:8904 > medio.bolt.location.magellan.timeout_ms 5000 > medio.bolt.location.num_tasks 20 > medio.bolt.location.parallelism_hint 20 > medio.bolt.location.whitelisted_events > medio.bolt.realtime_publisher.enabled true > medio.bolt.realtime_publisher.kafka.batch_expire_ms 5000 > medio.bolt.realtime_publisher.kafka.batch_size 2000 > medio.bolt.realtime_publisher.kafka.compression_codec 1 > medio.bolt.realtime_publisher.kafka.max_queue_size 100000 > medio.bolt.realtime_publisher.kafka.topic realtimeEvents > medio.bolt.realtime_publisher.kafka.zk_connect > zk01-sef,zk02-sef,zk03-sef:2181/kafka/prod_cluster > medio.bolt.realtime_publisher.num_tasks 8 > medio.bolt.realtime_publisher.parallelism_hint 8 > medio.bolt.realtime_publisher.rabbit.enabled false > medio.bolt.realtime_publisher.rabbit.exchange_name realtimeEvents > medio.bolt.realtime_publisher.rabbit.host dlc02-sea > medio.bolt.realtime_publisher.whitelisted_events *:*:* > > medio.bolt.uds_enhancement.ccp.service.root > http://ccp.medio.com/ccp-ws/v1/ > medio.bolt.uds_enhancement.ccpclient.connectionTimeout 3000 > medio.bolt.uds_enhancement.enabled true > medio.bolt.uds_enhancement.insert_medio_id.default_uds_mapping > anon_id->anon-id > : user_id -> customer-id : device_id -> device-id : MSISDN -> msisdn > medio.bolt.uds_enhancement.num_tasks 400 > medio.bolt.uds_enhancement.parallelism_hint 400 > medio.bolt.uds_enhancement.uss.service.root http://sefuss:8422/uss-ws/v1/ > medio.bolt.uds_enhancement.whitelisted_events *:*: > medio.botl.uds_enhancement.ccpclient.root http://ccp.medio.com/ccp-ws/v1/ > > >
