Latency calculations in Storm are a bit hairy. If you really want them to be accurate (at the cost of performance), you need to set the topology.stats.sample.rate config to 1.0. Otherwise, by default, we randomly sub-sample 5% of the tuples and then scale the numbers up accordingly. The complete latency is calculated on a per-spout basis using the class https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/metric/internal/MultiLatencyStatAndMetric.java. I don't remember exactly how the per-spout values are combined into the final number that appears on the UI, but I think it is just a simple average.

When a tuple is emitted, a timestamp is attached to it: https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutOutputCollectorImpl.java#L128. When the tuple is fully acked or marked as failed, the time taken is calculated and recorded. In older versions of Storm this was done in the spout, to be sure that the same clock was used everywhere. In newer versions of Storm the delta is calculated by the acker, from the time it gets the first message to the time it gets the last message. Without knowing which version of Storm you are using, it is hard to tell why these numbers might be off.
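To make the sampling point concrete, here is a minimal, hypothetical sketch of sub-sampled latency stats in plain Java. It is not Storm's actual MultiLatencyStatAndMetric code; the class SampledLatencyStats and its methods are made up for illustration. The idea it shows: only a random fraction of tuples (sampleRate, 0.05 by default per the text above) is recorded, the sampled count is scaled up by 1/sampleRate, and the average latency is taken over the sampled tuples only. In a real topology the rate is the topology.stats.sample.rate key, which org.apache.storm.Config exposes as the constant Config.TOPOLOGY_STATS_SAMPLE_RATE, e.g. conf.put(Config.TOPOLOGY_STATS_SAMPLE_RATE, 1.0).

```java
import java.util.Random;

// Hypothetical sketch of sub-sampled latency stats; NOT Storm's actual
// MultiLatencyStatAndMetric implementation, only an illustration of the idea.
class SampledLatencyStats {
    private final double sampleRate; // e.g. 0.05 (5% default) or 1.0 (record every tuple)
    private final Random rng;
    private long sampledCount = 0;
    private long sampledLatencySumMs = 0;

    SampledLatencyStats(double sampleRate, long seed) {
        if (sampleRate <= 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in (0, 1]");
        }
        this.sampleRate = sampleRate;
        this.rng = new Random(seed); // seeded so runs are repeatable
    }

    /** Record one completed tuple; only a random fraction is actually kept. */
    void record(long latencyMs) {
        // nextDouble() is in [0, 1), so a rate of 1.0 keeps every tuple.
        if (rng.nextDouble() < sampleRate) {
            sampledCount++;
            sampledLatencySumMs += latencyMs;
        }
    }

    /** Scale the sampled count back up to estimate the true number of tuples. */
    long estimatedCount() {
        return Math.round(sampledCount / sampleRate);
    }

    /** The average needs no scaling: it is estimated from the sampled tuples alone. */
    double averageLatencyMs() {
        return sampledCount == 0 ? 0.0 : (double) sampledLatencySumMs / sampledCount;
    }

    public static void main(String[] args) {
        SampledLatencyStats exact = new SampledLatencyStats(1.0, 42L);
        SampledLatencyStats sampled = new SampledLatencyStats(0.05, 42L);
        for (int i = 0; i < 100_000; i++) {
            long latency = 2 + (i % 3); // hypothetical latencies of 2, 3 or 4 ms
            exact.record(latency);
            sampled.record(latency);
        }
        System.out.printf("exact:   count=%d avg=%.3f ms%n", exact.estimatedCount(), exact.averageLatencyMs());
        System.out.printf("sampled: count=%d avg=%.3f ms%n", sampled.estimatedCount(), sampled.averageLatencyMs());
    }
}
```

With a rate of 1.0 the count and average are exact; with 0.05 the count is an estimate that varies from run to run (here, from seed to seed), which is the accuracy-versus-cost trade-off described above.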
- Bobby

On Tuesday, July 25, 2017, 5:23:33 AM CDT, preethini v <preethin...@gmail.com> wrote:

Also, any hints on how the Storm metrics calculate the "complete latency"?

Thanks,
Preethini

On Tue, Jul 25, 2017 at 9:48 AM, preethini v <preethin...@gmail.com> wrote:

Hi Bobby,

I am running a simple word count topology. I have 2 worker nodes and a nimbus/zookeeper node. The latency between the nodes is < 1 ms. I have synced the clocks of all 3 nodes using NTP. Is this not sufficient?

Thanks,
Preethini

On Mon, Jul 24, 2017 at 5:22 PM, Bobby Evans <ev...@yahoo-inc.com> wrote:

It is really hard to tell without more information. Off the top of my head, it might have something to do with the system time on different hosts. Getting the current time in milliseconds is full of issues, especially with leap seconds etc., but it is even more problematic between machines because the time is not guaranteed to be synced very closely. That would be my first guess. If they are all on the same machine (you are not switching hosts), then my next guess would be a bug in the code somewhere, or a misinterpretation of the results. Do you have a reproducible use case that you can share?

- Bobby

On Monday, July 24, 2017, 10:13:59 AM CDT, preethini v <preethin...@gmail.com> wrote:

Hi,

I measure the latency of a Storm topology in the two ways below, and I see a huge difference in the values.

Approach 1: Attach a start time to every tuple. Note the end time for that tuple in ack(). Calculate the delta between the start and end times. Latency value is ~104 ms.

Approach 2: Use the Storm UI parameter "complete latency" to measure latency. Latency value is ~2-3 ms.

Could someone please explain why there is such a huge difference in the latency calculations? If not on a timestamp basis, how does Storm's internal metrics system calculate the complete latency?

Thanks,
Preethini
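As a rough illustration of how "Approach 1" and an acker-side delta can cover different intervals, here is a hypothetical plain-Java sketch. The class LatencyWindows and its methods are made up; they are not Storm APIs. The caller supplies every timestamp on a single clock, and the example timestamps are invented inputs, not measurements. The spout-side delta runs from emit to the spout's ack() callback, while the acker-side delta (as described for newer Storm versions) runs only from the acker's first message to its last message. This shows that the two numbers need not agree; it does not claim to explain the specific gap reported in this thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not Storm code: all timestamps are passed in by
// the caller (milliseconds on one clock) so the two intervals can be compared.
class LatencyWindows {
    // Spout side: emit time per message id, as in "Approach 1".
    private final Map<Object, Long> emitTimes = new HashMap<>();
    // Acker side: time the acker saw the first message for a tuple tree.
    private final Map<Object, Long> ackerFirstSeen = new HashMap<>();

    void spoutEmit(Object msgId, long nowMs) {
        emitTimes.put(msgId, nowMs);
    }

    void ackerFirstMessage(Object msgId, long nowMs) {
        ackerFirstSeen.putIfAbsent(msgId, nowMs);
    }

    /** Acker-side delta: first message seen to last message seen. */
    long ackerLastMessage(Object msgId, long nowMs) {
        Long first = ackerFirstSeen.remove(msgId);
        if (first == null) throw new IllegalStateException("unknown tree: " + msgId);
        return nowMs - first;
    }

    /** Spout-side delta: emit to the spout's ack() callback. */
    long spoutAck(Object msgId, long nowMs) {
        Long start = emitTimes.remove(msgId);
        if (start == null) throw new IllegalStateException("unknown message: " + msgId);
        return nowMs - start;
    }

    public static void main(String[] args) {
        LatencyWindows w = new LatencyWindows();
        w.spoutEmit("m1", 0);                          // spout emits the tuple
        w.ackerFirstMessage("m1", 20);                 // acker sees its first message later
        long ackerSide = w.ackerLastMessage("m1", 23); // tuple tree completes at the acker
        long spoutSide = w.spoutAck("m1", 50);         // ack() runs on the spout later still
        System.out.println("acker-side: " + ackerSide + " ms, spout-side: " + spoutSide + " ms");
    }
}
```

Anything that happens before the acker receives its first message, or after it sends the final ack back to the spout, shows up only in the spout-side number in this sketch.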