Re: Difference in end-to-end Latency calculations for Storm

Bobby Evans Tue, 25 Jul 2017 07:35:03 -0700

Latency calculations in Storm are a bit hairy.  If you really want them to be
accurate (at the cost of performance), you need to set the
topology.stats.sample.rate config to 1.0.  Otherwise, by default, we
randomly sub-sample 5% of the tuples and then multiply the results up
accordingly.  The complete latency is calculated using the class
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/metric/internal/MultiLatencyStatAndMetric.java
on a per-spout basis.  I don't remember exactly how the per-spout values are
combined into the final number that appears on the UI, but I think it is just
a simple average.
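A minimal Python sketch of what that sampling means for the reported numbers. It assumes a hypothetical `SampledLatencyStat` class, not Storm's real `MultiLatencyStatAndMetric`: each completed tuple is kept with probability equal to the sample rate (0.05 by default, 1.0 to keep everything), the average latency comes from the kept tuples only, and the tuple count is scaled by 1 / rate. The hypothetical `ui_complete_latency` helper encodes the unweighted per-spout average guessed at above, which is not confirmed.

```python
import random


class SampledLatencyStat:
    """Hypothetical stand-in for a sampled per-spout latency metric.

    Each completed tuple is recorded with probability `sample_rate`
    (0.05 mirrors the 5% default). The average uses sampled tuples only;
    the count is "multiplied up" by 1 / sample_rate to estimate the total.
    """

    def __init__(self, sample_rate=0.05, rng=None):
        self.sample_rate = sample_rate
        self.rng = rng if rng is not None else random.Random()
        self.sampled_count = 0
        self.sampled_latency_sum_ms = 0.0

    def record(self, latency_ms):
        # With rate 1.0 every tuple is kept; otherwise most are skipped.
        if self.sample_rate >= 1.0 or self.rng.random() < self.sample_rate:
            self.sampled_count += 1
            self.sampled_latency_sum_ms += latency_ms

    def estimated_count(self):
        # Each sampled tuple stands in for 1 / rate real tuples.
        return self.sampled_count / self.sample_rate

    def average_latency_ms(self):
        if self.sampled_count == 0:
            return 0.0
        return self.sampled_latency_sum_ms / self.sampled_count


def ui_complete_latency(per_spout_stats):
    """Unweighted mean of per-spout averages (an assumption, see above)."""
    averages = [s.average_latency_ms() for s in per_spout_stats]
    return sum(averages) / len(averages) if averages else 0.0
```

If the UI really does use a simple average, a spout that completed 10,000 tuples at 2 ms and another that completed 100 tuples at 10 ms would together report 6 ms, so a slow, low-volume spout can move the displayed figure noticeably.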
When a tuple is emitted, a timestamp is attached to the tuple:
https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutOutputCollectorImpl.java#L128
When the tuple is fully acked or marked as failed, the time taken is
calculated and recorded.
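The spout-side bookkeeping can be sketched in Python as follows. `SpoutLatencyTracker`, `on_emit`, `on_ack` and `on_fail` are hypothetical names, not Storm's API; the sketch only illustrates taking a start time at emit and recording the delta when the tuple is fully acked or fails. It reads a monotonic clock, which is only meaningful because both readings come from the same process.

```python
import time


class SpoutLatencyTracker:
    """Hypothetical spout-side tracker: timestamp on emit, delta on ack/fail."""

    def __init__(self, clock=time.monotonic_ns):
        self.clock = clock      # injectable so the sketch can be tested
        self.pending = {}       # message id -> emit timestamp (ns)
        self.acked_ms = []      # latencies of fully acked tuples
        self.failed_ms = []     # time until failure of failed tuples

    def on_emit(self, msg_id):
        self.pending[msg_id] = self.clock()

    def _finish(self, msg_id, bucket):
        start_ns = self.pending.pop(msg_id, None)
        if start_ns is None:
            return None         # unknown or already completed message id
        delta_ms = (self.clock() - start_ns) / 1_000_000
        bucket.append(delta_ms)
        return delta_ms

    def on_ack(self, msg_id):
        return self._finish(msg_id, self.acked_ms)

    def on_fail(self, msg_id):
        return self._finish(msg_id, self.failed_ms)
```

Keeping acked and failed deltas in separate buckets avoids mixing timeouts into the complete-latency figure.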
In older versions of Storm this was done in the spout, to be sure that the
same clock was used everywhere.  In newer versions of Storm the delta is
calculated by the acker, from the time it gets the first message to the time
it gets the last message.
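To show why an acker-measured number can come out smaller than a spout-side measurement, here is a simplified Python sketch of XOR-based tuple-tree tracking in the style of Storm's acker. `AckerSketch`, its methods and the message shapes are hypothetical simplifications, not Storm's acker implementation. Following the description above, the acker starts its own clock at the first message it sees for a tree and stops at the last one, so time before that first message and the trip back to the spout's ack() are not counted.

```python
import time


class AckerSketch:
    """Hypothetical, simplified XOR acker that times trees with its own clock."""

    def __init__(self, clock=time.monotonic_ns):
        self.clock = clock
        self.trees = {}          # root id -> [xor value, first-seen ns, init seen]
        self.completed_ms = {}   # root id -> acker-measured latency (ms)

    def _touch(self, root_id):
        # The acker's timer starts at the FIRST message seen for this tree.
        if root_id not in self.trees:
            self.trees[root_id] = [0, self.clock(), False]
        return self.trees[root_id]

    def init(self, root_id, edge_id):
        # Sent by the spout for an anchored emit (edge ids assumed non-zero).
        tree = self._touch(root_id)
        tree[0] ^= edge_id
        tree[2] = True
        self._maybe_complete(root_id)

    def ack(self, root_id, xor_of_edges):
        # Sent by a bolt: acked edge id XOR the ids of any new child edges.
        tree = self._touch(root_id)
        tree[0] ^= xor_of_edges
        self._maybe_complete(root_id)

    def _maybe_complete(self, root_id):
        value, first_seen_ns, init_seen = self.trees[root_id]
        if init_seen and value == 0:
            # Timer stops at the LAST message; notifying the spout is excluded.
            self.completed_ms[root_id] = (self.clock() - first_seen_ns) / 1e6
            del self.trees[root_id]
```

For example, if the spout emits at 0 ms, the acker sees the init at 5 ms and the final ack at 7 ms, and the spout's ack() runs at 9 ms, this acker reports 2 ms while a spout-side timer would report 9 ms.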
Without knowing which version of Storm you are using, it is hard to tell why
these numbers might be off.


- Bobby


On Tuesday, July 25, 2017, 5:23:33 AM CDT, preethini v <preethin...@gmail.com> 
wrote:

Also, any hints on how the Storm metrics calculate the "complete latency"?
Thanks,
Preethini
On Tue, Jul 25, 2017 at 9:48 AM, preethini v <preethin...@gmail.com> wrote:

Hi Bobby,
I am running a simple word count topology. I have 2 worker nodes and a
nimbus/zookeeper node. The latency between the nodes is < 1 ms.
I have synced the clocks of all 3 nodes using NTP. Is this not sufficient?
Thanks,
Preethini
On Mon, Jul 24, 2017 at 5:22 PM, Bobby Evans <ev...@yahoo-inc.com> wrote:

It is really hard to tell without more information.  Off the top of my head, it
might have something to do with the system time on different hosts.  Getting
the current time in milliseconds is full of issues, especially with leap
seconds, etc., but it is even more problematic between machines because the
clocks are not guaranteed to be synced very closely.  That would be my first
guess.  If everything runs on the same machine (you are not switching hosts),
then my next guess would be a bug in the code somewhere, or a
misinterpretation of the results.
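A small Python illustration of the cross-host problem, using hypothetical numbers. If the start timestamp is read on one host and the end timestamp on another, the measured delta is the true latency plus the difference between the two clocks' errors, so a few milliseconds of residual skew can swamp a 2-3 ms latency. `measured_delta_ms` and `elapsed_ms_same_host` are hypothetical helpers. On a single host, a monotonic clock such as Python's `time.monotonic_ns()` (or Java's `System.nanoTime()`) is not affected by wall-clock adjustments such as NTP steps.

```python
import time


def measured_delta_ms(true_latency_ms, start_host_offset_ms, end_host_offset_ms):
    """Delta observed when start/end wall-clock readings come from different hosts.

    Each offset is that host's clock error relative to true time, in ms.
    Observed delta = true latency + (end offset - start offset).
    """
    start_reading = 0.0 + start_host_offset_ms
    end_reading = true_latency_ms + end_host_offset_ms
    return end_reading - start_reading


def elapsed_ms_same_host(work):
    """Time `work()` with a monotonic clock; only valid within one host."""
    start_ns = time.monotonic_ns()
    work()
    return (time.monotonic_ns() - start_ns) / 1e6
```

With a true latency of 2 ms, a 5 ms skew in one direction reports 7 ms, and in the other direction reports -3 ms; a negative latency is a telltale sign of clock skew.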
Do you have a reproducible use case that you can share?

- Bobby


On Monday, July 24, 2017, 10:13:59 AM CDT, preethini v <preethin...@gmail.com> 
wrote:

Hi,
I measure the latency of a Storm topology in the following two ways, and I
see a huge difference in the values.
Approach 1: attach a start time to every tuple, note the end time for that
tuple in ack(), and calculate the delta between the start and end times.
Latency value is ~104 ms.
Approach 2: use the Storm UI parameter "complete latency".
Latency value is ~2-3 ms.
Could someone please explain why there is such a huge difference between
these latency calculations? If not on a timestamp basis, how does Storm's
internal metrics system calculate the complete latency?
Thanks,
Preethini
