The overflow list is essentially flattened when the flusher drains it into the Disruptor, so again the insertion happens one element at a time.
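In rough pseudocode the drain looks something like the sketch below. This is an illustration only, under simplified assumptions: the names (OverflowFlushSketch, RingBuffer, publishSingle) are invented and it is not the actual DisruptorQueue source.

    import java.util.ArrayList;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Illustrative sketch: the flusher flattens batched-up overflow entries back into
    // single-element inserts on the ring buffer. All names here are invented.
    class OverflowFlushSketch {

        // Each overflow element is a whole batch that could not fit into the Disruptor.
        private final ConcurrentLinkedQueue<ArrayList<Object>> overflow =
                new ConcurrentLinkedQueue<>();

        // Stand-in for the real Disruptor ring buffer API.
        interface RingBuffer {
            void publishSingle(Object tuple);
        }

        // Called periodically by the flusher thread (simplified: assumes the ring has room).
        void flush(RingBuffer ring) {
            ArrayList<Object> batch;
            while ((batch = overflow.poll()) != null) {
                for (Object tuple : batch) {    // flatten: insert one element at a time
                    ring.publishSingle(tuple);  // each tuple takes its own slot
                }
            }
        }
    }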
In your case the overflow seems empty in both readings. The observation is indeed puzzling. Are these metrics for a terminal bolt / intermediate bolt / spout? And are they for the inbound or the outbound DisruptorQ of the spout/bolt?

-roshan

From: Adam Meyerowitz <ameye...@gmail.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Tuesday, June 6, 2017 at 7:12 AM
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: Re: LMAX queue batch size

To make things a little more specific, this is the kind of thing we are seeing. I have chopped out some lines, etc. to make it more readable. Note that during this one-minute interval the read_pos went from 177914 to 178522, a difference of 608, yet we processed 4140 tuples based on the execute-count. Maybe we are just interpreting these stats incorrectly...

2017-06-05 13:50:17.834 EMSMetricsConsumer 10:B_CALC:8 __receive {arrival_rate_secs=9.876286516269882, overflow=0, read_pos=177914, write_pos=177915, sojourn_time_ms=101.25263157894737, capacity=16384, population
2017-06-05 13:51:17.836 EMSMetricsConsumer 10:B_CALC:8 __receive {arrival_rate_secs=10.809687142708658, overflow=0, read_pos=178522, write_pos=178523, sojourn_time_ms=92.50961538461539, capacity=16384, population=1}
2017-06-05 13:51:17.837 EMSMetricsConsumer 10:B_CALC:8 __execute-count {B_MARKET:TRADE_STREAM=4140, B_TIMER:DATE_TICK=0, B_MARKET:OPTX_STREAM=0}

On Mon, Jun 5, 2017 at 9:10 PM, Roshan Naik <ros...@hortonworks.com> wrote:

AM> So the difference in the read and write queue positions will be 100 then, correct?

Only if the diff was 0 prior to the publish. More accurately, the diff between the read and write positions will increase by 100 after the publish.

AM> Empirically what we are seeing would lead us to believe that each queue entry is actually a list of tuples, not a single tuple.

That is not the case with the current code in the master/1.x branches. Not sure if it used to be different previously.

AM> There is a pretty old description of this behavior by Michael Noll at http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/. Very good write-up, but wondering if it still applies.

Michael's otherwise excellent description seems a bit dated now. Not very sure, but that behavior may have been true when he wrote the blog. I give a more detailed and up-to-date diagram/description of the current messaging system here (starting at 36:00): https://www.youtube.com/watch?v=kCRv6iEd7Ow

With the diagram shown there, here is more info on the write path (a rough code sketch follows after this message):

- The incoming writes are buffered into an ArrayList<Object> called 'currentBatch'.
- If currentBatch is full, we try to drain it into the Disruptor (as I described previously: one element at a time, followed by a single publish).
- But if the Disruptor is full, the entire currentBatch list is inserted as *one element* into another (unbounded) overflow list, which is of type ConcurrentLinkedQueue<ArrayList<Object>>. The currentBatch is then cleared to make room for new incoming events.
- Every once in a while a flusher thread comes along and tries to drain any available items in the overflow list into the Disruptor.

Having said that, we are planning to significantly revise this messaging subsystem for 2.0, as explained later in the above video.

-roshan
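Put as code, that write path looks roughly like the sketch below. It is only an illustration under simplified assumptions: the class and method names, the batch size of 100, and the occupancy counter are invented, and the real storm-core code claims Disruptor sequence numbers rather than counting slots.

    import java.util.ArrayList;
    import java.util.concurrent.ConcurrentLinkedQueue;

    // Rough sketch of the batched write path described above; not the real class.
    class BatchedPublishSketch {

        private static final int BATCH_SIZE = 100;   // illustrative batch size
        private static final int CAPACITY = 16384;   // matches the 'capacity' metric above

        private final ArrayList<Object> currentBatch = new ArrayList<>();
        private final ConcurrentLinkedQueue<ArrayList<Object>> overflow =
                new ConcurrentLinkedQueue<>();
        private int used = 0;                         // crude stand-in for ring occupancy

        // 1) Incoming writes are buffered into currentBatch.
        void publish(Object tuple) {
            currentBatch.add(tuple);
            if (currentBatch.size() >= BATCH_SIZE) {
                // 2) Try to drain the full batch into the Disruptor:
                //    one slot per element, then a single publish for the whole range.
                if (!tryDrainIntoDisruptor(currentBatch)) {
                    // 3) Disruptor full: park the ENTIRE batch as one overflow element.
                    overflow.add(new ArrayList<>(currentBatch));
                }
                currentBatch.clear();                 // 4) make room for new incoming events
            }
        }

        // Stand-in for the per-element claim + single publish on the ring buffer.
        private boolean tryDrainIntoDisruptor(ArrayList<Object> batch) {
            if (used + batch.size() > CAPACITY) {
                return false;                         // no room; caller falls back to overflow
            }
            used += batch.size();                     // real code claims sequence numbers here
            return true;
        }
    }

The relevance to the metrics question above is step 2: each tuple occupies its own Disruptor slot, so per the description in this thread the read/write positions should still advance once per tuple even though the publish itself is batched.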