Regarding why you need the 5th tuple, it is happening because you are using timestamp fields. The windowing code will receive the first 4 tuples and add them to the same window. Until it receives the 5th tuple, there is no way to tell whether the window is "done", as we might receive more tuples that fall within the window. The 5th tuple acts as a trigger that tells the windowing code that the window with the first 4 tuples is now over, and should be delivered to your bolt.
More specifically, the way it works is that there's a thread running which periodically (every 10 seconds in your case) sets a watermark. The watermark is set to be the timestamp of the newest received tuple, minus the lag. The watermark is then passed on to a trigger policy, which decides how to generate windows. The windows are generated from the watermark backwards, so if e.g. your watermark is 10, your lag is 2 and your interval is 3, it will try to generate windows for 0-2, 2-5, 5-8. Note that any tuples with timestamp 9 and 10 aren't delivered yet, as you've said you expect up to 2 seconds of lag, so it isn't safe to close the window containing them yet. We can't deliver 9 and 10 until we see a tuple with timestamp 10 plus the lag, so 12. See https://github.com/apache/storm/blob/21bb1388414d373572779289edc785c7e5aa52aa/storm-client/src/jvm/org/apache/storm/windowing/WaterMarkEventGenerator.java and https://github.com/apache/storm/blob/925422a5b5ad1c3329a2c2b44db460ae94f70806/storm-client/src/jvm/org/apache/storm/windowing/WatermarkTimeTriggerPolicy.java Regarding why your tuples are getting split, I don't know. Are you maybe running multiple tasks of the windowing bolt? Den tir. 30. jul. 2019 kl. 16.11 skrev Sandeep Singh < tosandeepsi...@gmail.com>: > Sorry for multiple message with same subject as I had to register with > different email address. > To follow up on the thread, can someone explain to me why the tuples with > same timestamp are sometimes sent in two different time windows? And also > why sending an extra 5th tuple is required before storm invokes execute > method? Do I need to set a different value for tumbling window duration or > lag? > Thank you for your help in advance > Sandeep > > On Mon, Jul 29, 2019 at 7:27 PM Sandeep Singh <tosandeepsi...@gmail.com> > wrote: > >> During testing of my topology which uses Storm's Tumbling window, I see >> strange behavior how my stream of tuples are handled and split into >> different time windows. >> >> I am using a Tumbling window with duration and lag set to 10 seconds: >> >> * val *duration = BaseWindowedBolt.Duration.*seconds*(10) >> >> >> >> myBolt.withTumblingWindow(duration).withTimestampField("timestampField").withLag(duration) >> >> >> >> When I send four tuples with timestamp set to same value "now - 1 second" >> (where now = System.*currentTimeMillis*()), I see log messages that >> storm is able to extract the time information from tuples. However bolt's >> *"*execute(inputWindow: TupleWindow)" method never gets invoked. In my >> test I wait for 2 minutes. I do not see any log message about late tuples. >> >> >> >> When I send five tuples, the first four with timestamp "now - 1 second" >> and last one with "now + 1 hour", I see Storm is able to extract all the >> five tuples. However the execute(inputWindow: TupleWindow) method is >> either invoked >> >> a) only once with first four tuple (the behavior I expected) or, >> >> b) twice, first invocation with tuple 1 & 2, second invocation with >> tuple 3 & 4. Since all the four tuples have exactly same timestamp, I don't >> understand why tuples are partitined in different time windows. >> >> Also the bolt's execute method never get's invoked with 5th tuple. >> However, sending 5th tuple (which is well outside the time duration window >> of 10 seconds) ensure that execute method is called once or twice for the >> first four tuples. >> >> >> >