Hi Chris
My expectation from the backlog is:
1. Connections that can be handled directly will be accepted and work
will begin
2. Connections that cannot be handled will accumulate in the backlog
3. Connections that exceed the backlog will get "connection refused"
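For what it's worth, that expected behaviour can be probed from a plain Java program. The sketch below is mine, not anything from the thread - the class name, the tiny backlog hint of 1 and the 500 ms timeout are arbitrary choices, and what happens beyond the backlog (a timeout while the kernel silently drops the SYN, which is the Linux default, or an outright refusal) is platform dependent:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogDemo {

    // Bind with a deliberately tiny backlog hint and never call accept(),
    // then count how many client connects the kernel absorbs into the
    // accept queue before one fails (times out or is refused).
    static int probeBacklog() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1)) {
            int port = server.getLocalPort();
            int absorbed = 0;
            for (int i = 0; i < 10; i++) {
                Socket s = new Socket();
                try {
                    // Beyond the backlog, this connect either times out
                    // (Linux default: the SYN is dropped and the client
                    // retransmits) or is refused, depending on the OS.
                    s.connect(new InetSocketAddress("127.0.0.1", port), 500);
                    absorbed++; // left open on purpose, to occupy the queue
                } catch (IOException e) {
                    System.out.println("connect " + i + " failed: "
                            + e.getClass().getSimpleName());
                    break;
                }
            }
            return absorbed;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connections absorbed without accept(): "
                + probeBacklog());
    }
}
```

On Linux a backlog hint of 1 typically absorbs a couple of connections (the kernel rounds the hint up); the exact count is not specified anywhere, which is rather the point of the questions above.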
There are caveats, I would imagine. For instance, do the connections in
the backlog have any kind of server-side timeouts associated with them
-- that is, will they ever get discarded from the queue without ever
being handled by the bound process (assuming the bound process doesn't
terminate or anything weird like that)? Do the clients have any timeouts
associated with them?
Does the above *not* happen? On which platform? Is this only with NIO?
I am not a Linux level TCP expert, but what I believe is that the TCP
layer has its timeouts and older connection requests will get discarded
from the queue etc. Typically a client will have a TCP level timeout as
well, i.e. the time it will wait for the other party to accept its SYN
packet. My testing has been primarily on Linux / Ubuntu.
Leaving everything to the TCP backlog makes the end clients see nasty
RSTs when Tomcat is under load instead of connection refused - and could
prevent the client from performing a clean fail-over when one Tomcat
node is overloaded.
So you are eliminating the backlog entirely? Or are you allowing the
backlog to work as "expected"? Does closing and re-opening the socket
clear the existing backlog (which would cancel a number of waiting
though not technically accepted connections, I think), or does it retain
the backlog? Since you are re-binding, I would imagine that the backlog
gets flushed every time there is a "pause".
I am not sure how the backlog would work under different operating
systems and conditions etc. However, the code I've shared shows how a
pure Java program could take better control of the underlying TCP
behavior - as visible to its clients.
What about performance effects of maintaining a connector-wide counter
of "active" connections, plus pausing and resuming the channel -- plus
re-connects by clients that have been dropped from the backlog?
What the UltraESB does by default is to stop accepting new connections
after a threshold is reached (e.g. 4096) and remain paused until the
number of active connections drops back to another threshold (e.g.
3073). Each of these parameters is user configurable, and depends on
the maximum number of connections each node is expected to handle. In
my experience, maintaining connector-wide counts does not cause any
performance problems, and neither do re-connects by clients - as what
is expected in reality is for a hardware load balancer to forward
requests that are "refused" by one node to another node, which
hopefully is not loaded.
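To make the mechanism concrete, here is a minimal sketch of such a connection gate - this is my own illustration, not the actual UltraESB code: a connector-wide counter with a high-water mark that pauses accepting and a low-water mark that resumes it. In a real NIO connector the two commented lines would deregister and re-register interest in OP_ACCEPT with the selector:

```java
/**
 * Sketch of a connector-wide connection gate: stop accepting new
 * connections once the active count reaches a high-water mark, and
 * resume only after it drops back to a low-water mark.
 */
public class ConnectionGate {
    private final int pauseAt;   // e.g. 4096
    private final int resumeAt;  // e.g. 3073
    private int active;          // guarded by "this"
    private volatile boolean accepting = true;

    public ConnectionGate(int pauseAt, int resumeAt) {
        this.pauseAt = pauseAt;
        this.resumeAt = resumeAt;
    }

    /** Called by the acceptor thread when it takes a new connection. */
    public synchronized void onAccepted() {
        if (++active >= pauseAt && accepting) {
            accepting = false;
            // here: deregister OP_ACCEPT, so the TCP backlog fills and
            // clients beyond it see "connection refused" instead of RST
        }
    }

    /** Called when an active connection completes or is closed. */
    public synchronized void onClosed() {
        if (--active <= resumeAt && !accepting) {
            accepting = true;
            // here: re-register OP_ACCEPT with the selector
        }
    }

    public boolean isAccepting() {
        return accepting;
    }
}
```

The gap between the two thresholds gives hysteresis, so the connector does not flap between paused and accepting as individual connections come and go around a single limit.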
Such a fail-over can take place immediately, cleanly and without any
confusion even if the backend service is not idempotent. This
is clearly not the case when a TCP/HTTP connection is accepted and then
met with a hard RST after a part or a full request has been sent to it.
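The distinction is visible on the client side too. A hypothetical probe (the names are mine) showing why a refused connect is the clean case - nothing has been sent yet, so failing over to another node is safe even for non-idempotent requests:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class FailoverProbe {

    /** Returns true when the connect was refused before any bytes were sent. */
    static boolean refusedCleanly(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress("127.0.0.1", port), 500);
            return false; // connected: the request would actually go out
        } catch (ConnectException e) {
            // "connection refused": the node never accepted the connection,
            // so nothing was sent and a retry against another node is safe
            // even for a non-idempotent request
            return true;
        } catch (IOException e) {
            return false; // timeout or reset: retry safety is unknown
        }
    }

    public static void main(String[] args) throws IOException {
        // Find a port that is certainly closed by binding and releasing it.
        int closedPort;
        try (ServerSocket ss = new ServerSocket(0)) {
            closedPort = ss.getLocalPort();
        }
        System.out.println("refused cleanly: " + refusedCleanly(closedPort));
    }
}
```

A reset after part of the request has been written gives the caller no such guarantee, which is exactly the fail-over problem described above.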
I'm concerned that all of your bench tests appear to be done using
telnet with a single acceptable connection. What if you allow 1000
simultaneous connections and test it under some real load so we can see
how such a solution would behave?
Clearly the example I shared was just to illustrate this with a pure
Java program. We usually conduct performance tests over half a dozen
open source ESBs with concurrency levels of 20, 40, 80, 160, 320, 640,
1280 and 2560, and payload sizes of 0.5, 1, 5, 10 and 100K bytes. You
can see some of the scenarios here: http://esbperformance.org. We
privately conduct performance tests beyond 2560, to much higher levels.
We used an HttpComponents-based EchoService as our backend service all
this time,
and it behaved very well with all load levels. However some weeks back
we accepted a contribution which was an async servlet to be deployed on
Tomcat as it was considered more "real world". The issues I noticed
were when running high load levels over this servlet deployed on Tomcat,
especially when the response was being delayed to simulate realistic
behavior.
Although we do not use Tomcat ourselves, our customers do. I am also
not calling this a bug - but pointing it out as an area for possible
improvement. If the Tomcat users, developers and the PMC think this is
worthwhile to pursue, I believe it would be a good enhancement - maybe
even a good GSoC project. As a fellow member of the ASF and a committer on multiple
projects/years, I believed it was my duty to bring this to the attention
of the Tomcat community.
regards
asankha
--
Asankha C. Perera
AdroitLogic, http://adroitlogic.org
http://esbmagic.blogspot.com