!! Please Help with Storm configuration !!

Hines, Sohail Wed, 12 Nov 2014 14:26:34 -0800

We have a 2-node Storm cluster in production (v0.9.0.1).  The condition we're 
noticing is that Storm Complete Latency varies throughout the day from 300ms to 
1200ms.  However during the same period, the sum of all Bolt Execute Latency 
times only add up to about ** 50ms **.  The same is true for the sum of all of 
the Bolt Process Latency times.  See snapshot of this illustrated below 
(monospaced font works best), along with the full storm configuration.



We've tried doubling the topology workers and ackers, as well as increasing 
tasks/parallelism for all bolts.  Lastly we also tried to increase the 
topology.executor send/receive/transfer buffer sizes (as included below).


However we're unable to account for most of the topology latency as reported 
under Complete Latency.  Should the sum of the bolt latencies closely match the 
topology's Complete Latency?  If so, what tuning parameters could we change to 
help achieve this?

Topology stats (Storm UI)

Window                  Emitted       Transferred   Complete latency (ms) Acked 
    Failed
10m 0s                   4502820      5912040       1009.565              
260600    0
3h 0m 0s                 71909220     94401060      731.852               
4466280   0
1d 0h 0m 0s              461365780    604827120     486.754               
31079420  0
All time                 911575300    1194802200    483.995               
61884080  0

Bolts (All time)

Id                 Executors   Tasks      Emitted      Transferred    Capacity  
Execute latency  Executed      Process latency (ms)
writerBolt         12          12         0            0              0.479     
 1.246           283204540      1.244
udsEnhancement     400         400        283242820    566485640      0.872     
47.420           283204720     47.397
transformBolt      8           8          283168440    283168440      0.772     
 1.085           283205380      1.085
realtimePublisher  8           8          0            0              0.533     
 0.834           283204360      0.831
parserBolt         20          20         283254740    283238820      0.745     
 8.392           61884380       8.409

Full Configuration Settings:

medio.storm.debug false
medio.storm.events.max_future_allowed_hours 24
medio.storm.events.max_past_allowed_hours 720
medio.storm.max_spout_pending 100
medio.storm.max_task_parallelism 5
medio.storm.message_timeout_seconds 180
medio.storm.num_ackers 4
medio.storm.num_workers 4
medio.storm.topology_name avalanche-core
medio.topology.connect_kafka_logging_bolt false
medio.topology.emit_mock_kafka_requests false
medio.topology.version 2.4.6
nimbus.childopts -Xmx1024m
nimbus.cleanup.inbox.freq.secs 600
nimbus.file.copy.expiration.secs 600
nimbus.host m3web01-sef.prod.msrch
nimbus.inbox.jar.expiration.secs 3600
nimbus.monitor.freq.secs 10
nimbus.reassign true
nimbus.supervisor.timeout.secs 60
nimbus.task.launch.secs 120
nimbus.task.timeout.secs 30
nimbus.thrift.port 6627
nimbus.topology.validator backtype.storm.nimbus.DefaultTopologyValidator
storm.cluster.mode distributed
storm.id avalanche-core-3-1415647968
storm.local.dir /workplace/storm/local
storm.local.mode.zmq false
storm.messaging.netty.buffer_size 5242880
storm.messaging.netty.client_worker_threads 1
storm.messaging.netty.max_retries 30
storm.messaging.netty.max_wait_ms 1000
storm.messaging.netty.min_wait_ms 100
storm.messaging.netty.server_worker_threads 1
storm.messaging.transport backtype.storm.messaging.zmq
storm.thrift.transport backtype.storm.security.auth.SimpleTransportPlugin
storm.zookeeper.connection.timeout 15000
storm.zookeeper.port 2181
storm.zookeeper.retry.interval 1000
storm.zookeeper.retry.intervalceiling.millis 30000
storm.zookeeper.retry.times 5
storm.zookeeper.root /storm
storm.zookeeper.servers ["zk01-sef" "zk02-sef" "zk03-sef"]
storm.zookeeper.session.timeout 20000
supervisor.childopts -Xmx256m
supervisor.enable true
supervisor.heartbeat.frequency.secs 5
supervisor.monitor.frequency.secs 3
supervisor.slots.ports [6700 6701 6702 6703]
supervisor.worker.start.timeout.secs 120
supervisor.worker.timeout.secs 30
task.heartbeat.frequency.secs 3
task.refresh.poll.secs 10
topology.acker.executors 4
topology.builtin.metrics.bucket.size.secs 60
topology.debug false
topology.disruptor.wait.strategy com.lmax.disruptor.BlockingWaitStrategy
topology.enable.message.timeouts true
topology.error.throttle.interval.secs 10
topology.executor.receive.buffer.size 16384
topology.executor.send.buffer.size 16384
topology.fall.back.on.java.serialization false
topology.kryo.decorators []
topology.kryo.factory backtype.storm.serialization.DefaultKryoFactory
topology.kryo.register 
{"com.medio.services.avalanche.model.impl.RoutableEventImpl" 
"com.medio.services.avalanche.storm.common.serializers.RoutableEventSerializer"}
topology.max.error.report.per.interval 5
topology.max.spout.pending 100
topology.max.task.parallelism
topology.message.timeout.secs 180
topology.metrics.consumer.register [{"argument" nil, "parallelism.hint" 2, 
"class" "com.medio.services.tempest.client.storm.OpenTSDBMetricConsumer"}]
topology.name avalanche-core
topology.optimize true
topology.receiver.buffer.size 8
topology.skip.missing.kryo.registrations false
topology.sleep.spout.wait.strategy.time.ms 1
topology.spout.wait.strategy backtype.storm.spout.SleepSpoutWaitStrategy
topology.state.synchronization.timeout.secs 60
topology.stats.sample.rate 0.05
topology.tasks
topology.tick.tuple.freq.secs
topology.transfer.buffer.size 32
topology.trident.batch.emit.interval.millis 500
topology.tuple.serializer 
backtype.storm.serialization.types.ListDelegateSerializer
topology.worker.childopts
topology.worker.shared.thread.pool.size 4
topology.workers 4

medio.bolt.event_parser.num_tasks 20
medio.bolt.event_parser.parallelism_hint 20
medio.bolt.event_transform.enabled true
medio.bolt.event_transform.num_tasks 8
medio.bolt.event_transform.parallelism_hint 8
medio.bolt.event_transform.refresh_interval 60000
medio.bolt.event_transform.root_directory /usr/medio/avalanche-storm-transforms/
medio.bolt.event_writer.num_tasks 12
medio.bolt.event_writer.parallelism_hint 12
medio.bolt.event_writer.upload_notification.batch_expire_ms 5000
medio.bolt.event_writer.upload_notification.batch_size 200
medio.bolt.event_writer.upload_notification.compression_codec 0
medio.bolt.event_writer.upload_notification.max_queue_size 5000
medio.bolt.event_writer.upload_notification.topic avalancheFileUploads
medio.bolt.event_writer.upload_notification.zk_connect
medio.bolt.kafka_logging.num_tasks 1
medio.bolt.kafka_logging.parallelism_hint 1
medio.bolt.location.blacklisted_events
medio.bolt.location.enabled false
medio.bolt.location.magellan.host 10.10.32.249:8904
medio.bolt.location.magellan.timeout_ms 5000
medio.bolt.location.num_tasks 20
medio.bolt.location.parallelism_hint 20
medio.bolt.location.whitelisted_events
medio.bolt.realtime_publisher.enabled true
medio.bolt.realtime_publisher.kafka.batch_expire_ms 5000
medio.bolt.realtime_publisher.kafka.batch_size 2000
medio.bolt.realtime_publisher.kafka.compression_codec 1
medio.bolt.realtime_publisher.kafka.max_queue_size 100000
medio.bolt.realtime_publisher.kafka.topic realtimeEvents
medio.bolt.realtime_publisher.kafka.zk_connect 
zk01-sef,zk02-sef,zk03-sef:2181/kafka/prod_cluster
medio.bolt.realtime_publisher.num_tasks 8
medio.bolt.realtime_publisher.parallelism_hint 8
medio.bolt.realtime_publisher.rabbit.enabled false
medio.bolt.realtime_publisher.rabbit.exchange_name realtimeEvents
medio.bolt.realtime_publisher.rabbit.host dlc02-sea
medio.bolt.realtime_publisher.whitelisted_events *:*:*

medio.bolt.uds_enhancement.ccp.service.root http://ccp.medio.com/ccp-ws/v1/
medio.bolt.uds_enhancement.ccpclient.connectionTimeout 3000
medio.bolt.uds_enhancement.enabled true
medio.bolt.uds_enhancement.insert_medio_id.default_uds_mapping anon_id->anon-id 
: user_id -> customer-id : device_id -> device-id : MSISDN -> msisdn
medio.bolt.uds_enhancement.num_tasks 400
medio.bolt.uds_enhancement.parallelism_hint 400
medio.bolt.uds_enhancement.uss.service.root http://sefuss:8422/uss-ws/v1/
medio.bolt.uds_enhancement.whitelisted_events 
*:*:medio.botl.uds_enhancement.ccpclient.root http://ccp.medio.com/ccp-ws/v1/?

!! Please Help with Storm configuration !!

Reply via email to