Hi,

This gives a great explanation. Thank you very much.

I am using Storm version 1.1.0. Like you mention, I shall change the value of
topology.stats.sample.rate and analyze.

Thanks,
Preethini

On Tue, Jul 25, 2017 at 4:34 PM, Bobby Evans <[email protected]> wrote:

> Latency calculations in Storm are a bit hairy. If you really want them to
> be accurate (at the cost of performance), you need to set the
> topology.stats.sample.rate config to 1.0. Otherwise, by default, we will
> randomly sub-sample 5% of the tuples and scale the result up accordingly.
> The complete latency is calculated using the class
>
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/metric/internal/MultiLatencyStatAndMetric.java
>
> on a per-spout basis. I don't remember exactly how they are combined into
> the final number that appears on the UI, but I think it is just a simple
> average. When a tuple is emitted, a timestamp is attached to the tuple:
>
> https://github.com/apache/storm/blob/master/storm-client/src/jvm/org/apache/storm/executor/spout/SpoutOutputCollectorImpl.java#L128
>
> When the tuple is fully acked, or marked as failed, the time taken is
> calculated and recorded.
>
> In older versions of Storm this was done in the spout, to be sure that the
> same clock was used everywhere. In newer versions of Storm the delta is
> calculated by the acker, from the time it gets the first message to the
> time it gets the last message.
>
> Without knowing which version of Storm you are using, it is hard to tell
> why these numbers might be off.
>
> - Bobby
>
>
> On Tuesday, July 25, 2017, 5:23:33 AM CDT, preethini v
> <[email protected]> wrote:
>
> Also,
>
> Any hints on how the Storm metrics calculate the "complete latency"?
>
> Thanks,
> Preethini
>
> On Tue, Jul 25, 2017 at 9:48 AM, preethini v <[email protected]> wrote:
>
> Hi Bobby,
>
> I am running a simple word count topology. I have 2 worker nodes and a
> nimbus/zookeeper node. The latency between the nodes is < 1 ms.
>
> I have synced the clocks of all 3 nodes using NTP. Is this not sufficient?
>
> Thanks,
> Preethini
>
> On Mon, Jul 24, 2017 at 5:22 PM, Bobby Evans <[email protected]> wrote:
>
> It is really hard to tell without more information. Off the top of my
> head, it might have something to do with the system time on different
> hosts. Getting the current time in milliseconds is full of issues,
> especially with leap seconds etc., but it is even more problematic between
> machines because the time is not guaranteed to be synced very closely.
> That would be my first guess. If they are all on the same machine (you are
> not switching hosts), then my next guess would be a bug in the code
> somewhere, or a misinterpretation of the results.
>
> Do you have a reproducible use case that you can share?
>
> - Bobby
>
>
> On Monday, July 24, 2017, 10:13:59 AM CDT, preethini v
> <[email protected]> wrote:
>
> Hi,
>
> I measure the latency of a Storm topology in the two ways below, and I see
> a huge difference in the values.
>
> *Approach 1*: Attach a start time to every tuple. Note the end time for
> that tuple in ack(). Calculate the delta between the start and end times.
>
> Latency value is ~104 ms.
>
> *Approach 2*: Use the Storm UI parameter "complete latency" to measure
> latency.
>
> Latency value is ~2-3 ms.
>
> Could someone please explain why there is such a huge difference in the
> latency calculations?
> If not on a timestamp basis, how does Storm's internal metrics system
> calculate the complete latency?
>
> Thanks,
> Preethini
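[Editor's note] For readers following the thread, here is a minimal sketch of
"Approach 1" from the original question: a spout that remembers when each
tuple was emitted and measures the wall-clock delta when ack() is called. The
class name, message-id bookkeeping, and word list are hypothetical; only the
Storm spout API calls are taken from the library itself.

import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;

// Hypothetical spout illustrating Approach 1: remember when each tuple was
// emitted, then measure the wall-clock delta when ack() (or fail()) arrives.
public class TimedWordSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final Map<String, Long> emitTimes = new ConcurrentHashMap<>();
    private final String[] words = {"the", "quick", "brown", "fox"};
    private int index = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        String msgId = UUID.randomUUID().toString();
        emitTimes.put(msgId, System.currentTimeMillis());
        // Emitting with a message id anchors the tuple, so ack()/fail() is
        // called once the whole tuple tree completes.
        collector.emit(new Values(words[index++ % words.length]), msgId);
    }

    @Override
    public void ack(Object msgId) {
        Long start = emitTimes.remove(msgId);
        if (start != null) {
            long latencyMs = System.currentTimeMillis() - start;
            System.out.println("tuple completed in " + latencyMs + " ms");
        }
    }

    @Override
    public void fail(Object msgId) {
        emitTimes.remove(msgId);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}

As Bobby explains above, the UI's "complete latency" is computed differently
(sub-sampled by default, and in newer versions measured by the acker), so a
gap between this per-tuple number and the UI value is not by itself a bug.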
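[Editor's note] And, tying back to the change mentioned at the top of the
thread, a sketch of forcing the stats sample rate to 1.0 when submitting a
topology. The component and topology names are illustrative, and the spout
reused here is the hypothetical TimedWordSpout from the sketch above; a real
word count topology would also wire in its bolts.

import org.apache.storm.Config;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class SubmitWithFullSampling {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // Illustrative wiring only; split/count bolts omitted.
        builder.setSpout("timed-word-spout", new TimedWordSpout(), 1);

        Config conf = new Config();
        // topology.stats.sample.rate = 1.0: record every tuple instead of the
        // default 5% sub-sample, trading some throughput for accurate UI latencies.
        conf.put(Config.TOPOLOGY_STATS_SAMPLE_RATE, 1.0);

        StormSubmitter.submitTopology("word-count", conf, builder.createTopology());
    }
}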
