On Sat, 13 Feb 2021 at 16:40, Ted Ross <[email protected]> wrote:
>
> On Fri, Feb 12, 2021 at 1:47 PM Michael Goulish <[email protected]> wrote:
>
> > Well, *this* certainly made a difference!
> > I tried this test:
> >
> >   message size:                  200000 bytes
> >   client-pairs:                  10
> >   sender pause between messages: 10 msec
> >   messages per sender:           10,000
> >   credit window:                 1000
> >
> > Results:
> >
> >                router buffer size
> >              512 bytes      4K bytes
> >            ---------------------------
> >   CPU        517%           102%
> >   Mem        711 MB         59 MB
> >   Latency    26.9 *seconds* 2.486 *msec*
> >
> > So with the large messages and our normal buffer size of 1/2 K, the
> > router just got overwhelmed. What I recorded was average memory usage,
> > but looking at the time sequence I see that its memory kept increasing
> > steadily until the end of the test.
>
> With the large messages, the credit window is not sufficient to protect
> the memory of the router. I think this test needs to use a limited
> session window as well. This will put back-pressure on the senders much
> earlier in the test. With 200 Kbyte messages x 1000 credits x 10 senders,
> there's a theoretical maximum of 2 Gig of proton buffer memory that can
> be consumed before the router core ever moves any data. It's interesting
> that in the 4K-buffer case the router core keeps up with the flow, and
> in the 512-byte case it does not.
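For reference, the arithmetic behind that worst case: 1000 credits x
200 KB x 10 senders = ~2 GB that the senders are permitted to have in
flight before anything is consumed. Link credit counts messages, so it
cannot bound bytes when messages are large; the session incoming window
is the byte-level limit. A minimal sketch of that knob at the Proton C
level (the 1 MB capacity is illustrative, not a value from this test, and
in qdrouterd the window would normally come from listener configuration
rather than a direct call like this):

    /* Sketch: bound per-session buffering with Proton's session-level
     * flow control.  Link credit caps messages in flight; the incoming
     * window caps bytes in flight, which is what actually protects
     * memory when messages are large. */

    #include <proton/session.h>

    void open_limited_session(pn_session_t *ssn)
    {
        /* Advertise at most 1 MB of incoming buffer to the peer.  With
         * 200 KB messages that allows ~5 messages buffered per session,
         * versus the 1000 that link credit alone would permit. */
        pn_session_set_incoming_capacity(ssn, 1024 * 1024);
        pn_session_open(ssn);
    }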
The senders are noted as having a 10 ms delay between sends; how exactly
is that achieved? (One possible mechanism is sketched at the end of this
mail.) Do the receivers receive flat out? Is that 1000-credit window from
the receiver to the router, from the router to the sender, or both?

If the senders are slow compared to the receivers, and at most 200 MB/sec
is actually hitting the router as a result of the governed sends, I'm
somewhat surprised the router would ever accumulate as much data as noted
(that figure was an average; any idea what the peak was?) unless something
odd/interesting starts happening to it at the smaller buffer size after
some time. From the other mail it seems everything plays nicely for over
200 seconds and only then starts to behave differently: the delivery rate
initially matches what the governed senders should produce, meaning there
should be no accumulation, and then it drops noticeably, meaning there
must be accumulation if the senders maintained their rate. There is a
clear disparity in the CPU results between the two tests; was there any
discernible difference in the 512-byte test alone at the point where the
receive throughput appears to drop?

> It appears that increasing the buffer size is a good idea. I don't think
> we've figured out how much the increase should be, however. We should
> look at interim sizes: 1K, 2K, maybe 1.5K and 3K. We want the smallest
> buffer size that gives us acceptable performance. If throughput, CPU,
> and memory use improve sharply with buffer size and then level off,
> let's identify the "knee of the curve" and see what buffer size that
> represents.
>
> > Messages just sat there waiting to get processed, which is maybe why
> > their average latency was *10,000 times longer* than when I used the
> > large buffers.
> >
> > And Nothing Bad Happened in the 4K buffer test. No crash, all messages
> > delivered, normal shutdown.
> >
> > Now I will try a long-duration test to see if it survives that while
> > using the large buffers.
> >
> > If it does survive OK, we need to see what happens with large buffers
> > as message size varies from small to large.
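On the pacing question above: for concreteness, here is one way a Proton C
client could implement a 10 ms pause between sends, using the proactor
timeout to send one message per tick, credit permitting. This is only a
sketch of a plausible mechanism, not the actual test client; all names
here are illustrative.

    #include <stddef.h>
    #include <proton/event.h>
    #include <proton/proactor.h>
    #include <proton/link.h>
    #include <proton/delivery.h>

    #define PAUSE_MS 10

    /* Called from the client's event loop.  One message per 10 ms tick
     * caps each sender at ~100 msgs/sec -- with 200 KB messages that is
     * ~20 MB/sec per sender, or ~200 MB/sec across 10 senders. */
    static void on_event(pn_proactor_t *proactor, pn_event_t *e,
                         pn_link_t *sender, const char *body, size_t len,
                         long *tag)
    {
        switch (pn_event_type(e)) {
        case PN_LINK_FLOW:          /* credit arrived: (re)start the clock */
        case PN_PROACTOR_TIMEOUT:
            if (pn_link_credit(sender) > 0) {
                (*tag)++;   /* monotonically increasing delivery tag */
                pn_delivery(sender, pn_dtag((const char *)tag, sizeof(*tag)));
                pn_link_send(sender, body, len);
                pn_link_advance(sender);
            }
            pn_proactor_set_timeout(proactor, PAUSE_MS);  /* re-arm timer */
            break;
        default:
            break;
        }
    }

One consequence of governing on the sender side like this: such senders
are unlikely ever to exhaust a 1000-credit window, so credit alone gives
little back-pressure, which is why the byte-limited session window
discussed earlier would matter.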
