It could be a network issue (high latency).
Check your network configuration (NIC).
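
One rough way to check the network between the two nodes is to time TCP connects from one supervisor host to a worker port on the other. The following is only a sketch: the function names are made up for illustration, the host name in the usage comment is a placeholder, and connect time only approximates one round trip, so treat the numbers as a hint rather than a measurement of Storm's own transfer latency.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host, port, samples=5, timeout=2.0):
    """Return a list of TCP connect times (in ms) to host:port."""
    results = []
    for _ in range(samples):
        start = time.perf_counter()
        # create_connection completes the TCP handshake, so the elapsed
        # time approximates one network round trip plus connection setup.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        results.append((time.perf_counter() - start) * 1000.0)
    return results


def summarize(latencies_ms):
    """Return (min, median, max) of the measured latencies in ms."""
    return (min(latencies_ms),
            statistics.median(latencies_ms),
            max(latencies_ms))


# Example usage (hypothetical host name; 6700 is one of the ports listed
# under supervisor.slots.ports in the configuration below):
#
#   lo, mid, hi = summarize(tcp_connect_latency_ms("other-storm-node", 6700))
#   print(f"min={lo:.2f} ms  median={mid:.2f} ms  max={hi:.2f} ms")
```

If the median between the two nodes is far above what you see on the loopback interface, the NIC or switch configuration is worth a closer look.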
Vladi

On Thu, Nov 13, 2014 at 12:25 AM, Hines, Sohail <[email protected]>
wrote:

>  We have a 2-node Storm cluster in production (v0.9.0.1).  The condition
> we're noticing is that Storm Complete Latency varies throughout the day
> from *300ms to 1200ms*.  However, during the same period, the sum of
> all Bolt Execute Latency times only adds up to about *50ms*.  The
> same is true for the sum of all of the Bolt Process Latency times.  See
> snapshot of this illustrated below (monospaced font works best), along with
> the full storm configuration.
>
>
>  We've tried doubling the topology workers and ackers, as well as
> increasing tasks/parallelism for all bolts.  Lastly we also tried to
> increase the topology.executor send/receive/transfer buffer sizes (as
> included below).
>
>
>  However we're unable to account for most of the topology latency
> as reported under Complete Latency.  Should the sum of the bolt
> latencies closely match the topology's Complete Latency?  If so, what
> tuning parameters could we change to help achieve this?
>
>  *Topology stats (Storm UI)*
>
> Window        Emitted     Transferred   Complete latency (ms)   Acked      Failed
> 10m 0s        4502820     5912040       *1009.565*              260600     0
> 3h 0m 0s      71909220    94401060      731.852                 4466280    0
> 1d 0h 0m 0s   461365780   604827120     486.754                 31079420   0
> All time      911575300   1194802200    483.995                 61884080   0
>
>  *Bolts (All time)*
>
> Id                 Executors  Tasks  Emitted    Transferred  Capacity  Execute latency (ms)  Executed   Process latency (ms)
> writerBolt         12         12     0          0            0.479     *1.246*               283204540  1.244
> udsEnhancement     400        400    283242820  566485640    0.872     *47.420*              283204720  47.397
> transformBolt      8          8      283168440  283168440    0.772     *1.085*               283205380  1.085
> realtimePublisher  8          8      0          0            0.533     *0.834*               283204360  0.831
> parserBolt         20         20     283254740  283238820    0.745     *8.392*               61884380   8.409
>
>  *Full Configuration Settings:*
>
>  medio.storm.debug false
> medio.storm.events.max_future_allowed_hours 24
> medio.storm.events.max_past_allowed_hours 720
> medio.storm.max_spout_pending 100
> medio.storm.max_task_parallelism 5
> medio.storm.message_timeout_seconds 180
> medio.storm.num_ackers 4
> medio.storm.num_workers 4
> medio.storm.topology_name avalanche-core
> medio.topology.connect_kafka_logging_bolt false
> medio.topology.emit_mock_kafka_requests false
> medio.topology.version 2.4.6
> nimbus.childopts -Xmx1024m
> nimbus.cleanup.inbox.freq.secs 600
> nimbus.file.copy.expiration.secs 600
> nimbus.host m3web01-sef.prod.msrch
> nimbus.inbox.jar.expiration.secs 3600
> nimbus.monitor.freq.secs 10
> nimbus.reassign true
> nimbus.supervisor.timeout.secs 60
> nimbus.task.launch.secs 120
> nimbus.task.timeout.secs 30
> nimbus.thrift.port 6627
> nimbus.topology.validator backtype.storm.nimbus.DefaultTopologyValidator
> storm.cluster.mode distributed
> storm.id avalanche-core-3-1415647968
> storm.local.dir /workplace/storm/local
> storm.local.mode.zmq false
> storm.messaging.netty.buffer_size 5242880
> storm.messaging.netty.client_worker_threads 1
> storm.messaging.netty.max_retries 30
> storm.messaging.netty.max_wait_ms 1000
> storm.messaging.netty.min_wait_ms 100
> storm.messaging.netty.server_worker_threads 1
> storm.messaging.transport backtype.storm.messaging.zmq
> storm.thrift.transport backtype.storm.security.auth.SimpleTransportPlugin
> storm.zookeeper.connection.timeout 15000
> storm.zookeeper.port 2181
> storm.zookeeper.retry.interval 1000
> storm.zookeeper.retry.intervalceiling.millis 30000
> storm.zookeeper.retry.times 5
> storm.zookeeper.root /storm
> storm.zookeeper.servers ["zk01-sef" "zk02-sef" "zk03-sef"]
> storm.zookeeper.session.timeout 20000
> supervisor.childopts -Xmx256m
> supervisor.enable true
> supervisor.heartbeat.frequency.secs 5
> supervisor.monitor.frequency.secs 3
> supervisor.slots.ports [6700 6701 6702 6703]
> supervisor.worker.start.timeout.secs 120
> supervisor.worker.timeout.secs 30
> task.heartbeat.frequency.secs 3
> task.refresh.poll.secs 10
> topology.acker.executors 4
> topology.builtin.metrics.bucket.size.secs 60
> topology.debug false
> topology.disruptor.wait.strategy com.lmax.disruptor.BlockingWaitStrategy
> topology.enable.message.timeouts true
> topology.error.throttle.interval.secs 10
> topology.executor.receive.buffer.size 16384
> topology.executor.send.buffer.size 16384
> topology.fall.back.on.java.serialization false
> topology.kryo.decorators []
> topology.kryo.factory backtype.storm.serialization.DefaultKryoFactory
> topology.kryo.register 
> {"com.medio.services.avalanche.model.impl.RoutableEventImpl"
> "com.medio.services.avalanche.storm.common.serializers.RoutableEventSerializer"}
> topology.max.error.report.per.interval 5
> topology.max.spout.pending 100
> topology.max.task.parallelism
> topology.message.timeout.secs 180
> topology.metrics.consumer.register [{"argument" nil, "parallelism.hint"
> 2, "class"
> "com.medio.services.tempest.client.storm.OpenTSDBMetricConsumer"}]
> topology.name avalanche-core
> topology.optimize true
> topology.receiver.buffer.size 8
> topology.skip.missing.kryo.registrations false
> topology.sleep.spout.wait.strategy.time.ms 1
> topology.spout.wait.strategy backtype.storm.spout.SleepSpoutWaitStrategy
> topology.state.synchronization.timeout.secs 60
> topology.stats.sample.rate 0.05
> topology.tasks
> topology.tick.tuple.freq.secs
> topology.transfer.buffer.size 32
> topology.trident.batch.emit.interval.millis 500
> topology.tuple.serializer
> backtype.storm.serialization.types.ListDelegateSerializer
> topology.worker.childopts
> topology.worker.shared.thread.pool.size 4
> topology.workers 4
>
>   medio.bolt.event_parser.num_tasks 20
> medio.bolt.event_parser.parallelism_hint 20
> medio.bolt.event_transform.enabled true
> medio.bolt.event_transform.num_tasks 8
> medio.bolt.event_transform.parallelism_hint 8
> medio.bolt.event_transform.refresh_interval 60000
> medio.bolt.event_transform.root_directory
> /usr/medio/avalanche-storm-transforms/
> medio.bolt.event_writer.num_tasks 12
> medio.bolt.event_writer.parallelism_hint 12
> medio.bolt.event_writer.upload_notification.batch_expire_ms 5000
> medio.bolt.event_writer.upload_notification.batch_size 200
> medio.bolt.event_writer.upload_notification.compression_codec 0
> medio.bolt.event_writer.upload_notification.max_queue_size 5000
> medio.bolt.event_writer.upload_notification.topic avalancheFileUploads
> medio.bolt.event_writer.upload_notification.zk_connect
> medio.bolt.kafka_logging.num_tasks 1
> medio.bolt.kafka_logging.parallelism_hint 1
> medio.bolt.location.blacklisted_events
> medio.bolt.location.enabled false
> medio.bolt.location.magellan.host 10.10.32.249:8904
> medio.bolt.location.magellan.timeout_ms 5000
> medio.bolt.location.num_tasks 20
> medio.bolt.location.parallelism_hint 20
> medio.bolt.location.whitelisted_events
> medio.bolt.realtime_publisher.enabled true
> medio.bolt.realtime_publisher.kafka.batch_expire_ms 5000
> medio.bolt.realtime_publisher.kafka.batch_size 2000
> medio.bolt.realtime_publisher.kafka.compression_codec 1
> medio.bolt.realtime_publisher.kafka.max_queue_size 100000
> medio.bolt.realtime_publisher.kafka.topic realtimeEvents
> medio.bolt.realtime_publisher.kafka.zk_connect
> zk01-sef,zk02-sef,zk03-sef:2181/kafka/prod_cluster
> medio.bolt.realtime_publisher.num_tasks 8
> medio.bolt.realtime_publisher.parallelism_hint 8
> medio.bolt.realtime_publisher.rabbit.enabled false
> medio.bolt.realtime_publisher.rabbit.exchange_name realtimeEvents
> medio.bolt.realtime_publisher.rabbit.host dlc02-sea
> medio.bolt.realtime_publisher.whitelisted_events *:*:*
>
>  medio.bolt.uds_enhancement.ccp.service.root
> http://ccp.medio.com/ccp-ws/v1/
> medio.bolt.uds_enhancement.ccpclient.connectionTimeout 3000
> medio.bolt.uds_enhancement.enabled true
> medio.bolt.uds_enhancement.insert_medio_id.default_uds_mapping 
> anon_id->anon-id
> : user_id -> customer-id : device_id -> device-id : MSISDN -> msisdn
> medio.bolt.uds_enhancement.num_tasks 400
> medio.bolt.uds_enhancement.parallelism_hint 400
> medio.bolt.uds_enhancement.uss.service.root http://sefuss:8422/uss-ws/v1/
> medio.bolt.uds_enhancement.whitelisted_events *:*:
> medio.botl.uds_enhancement.ccpclient.root http://ccp.medio.com/ccp-ws/v1/
>
>
>
