Hi Chris
My expectation from the backlog is:
1. Connections that can be handled directly will be accepted and work
will begin
2. Connections that cannot be handled will accumulate in the backlog
3. Connections that exceed the backlog will get "connection refused"
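For what it's worth, that expected behaviour can be probed from a plain Java program. The sketch below is mine, not anything from the thread - the class name, the tiny backlog hint of 1 and the 500 ms timeout are arbitrary choices, and what happens beyond the backlog (a timeout while the kernel silently drops the SYN, which is the Linux default, or an outright refusal) is platform dependent:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BacklogDemo {

    // Bind with a deliberately tiny backlog hint and never call accept(),
    // then count how many client connects the kernel absorbs into the
    // accept queue before one fails (times out or is refused).
    static int probeBacklog() throws IOException {
        try (ServerSocket server = new ServerSocket(0, 1)) {
            int port = server.getLocalPort();
            int absorbed = 0;
            for (int i = 0; i < 10; i++) {
                Socket s = new Socket();
                try {
                    // Beyond the backlog, this connect either times out
                    // (Linux default: the SYN is dropped and the client
                    // retransmits) or is refused, depending on the OS.
                    s.connect(new InetSocketAddress("127.0.0.1", port), 500);
                    absorbed++; // left open on purpose, to occupy the queue
                } catch (IOException e) {
                    System.out.println("connect " + i + " failed: "
                            + e.getClass().getSimpleName());
                    break;
                }
            }
            return absorbed;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connections absorbed without accept(): "
                + probeBacklog());
    }
}
```

On Linux a backlog hint of 1 typically absorbs a couple of connections (the kernel rounds the hint up); the exact count is not specified anywhere, which is rather the point of the questions above.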
There are caveats, I would imagine. For instance, do the connections in
the backlog have any kind of server-side timeouts associated with them
-- that is, will they ever get discarded from the queue without ever
being handled by the bound process (assuming the bound process doesn't
terminate or anything weird like that)? Do the clients have any timeouts
associated with them?
Does the above *not* happen? On which platform? Is this only with NIO?
I am not a Linux level TCP expert, but what I believe is that the TCP
layer has its timeouts and older connection requests will get discarded
from the queue etc. Typically a client will have a TCP level timeout as
well, i.e. the time it will wait for the other party to accept its SYN
packet. My testing has been primarily on Linux / Ubuntu.
Leaving everything to the TCP backlog makes the end clients see nasty
RSTs when Tomcat is under load instead of connection refused - and could
prevent the client from performing a clean fail-over when one Tomcat
node is overloaded.
So you are eliminating the backlog entirely? Or are you allowing the
backlog to work as "expected"? Does closing and re-opening the socket
clear the existing backlog (which would cancel a number of waiting
though not technically accepted connections, I think), or does it retain
the backlog? Since you are re-binding, I would imagine that the backlog
gets flushed every time there is a "pause".
I am not sure how the backlog would work under different operating
systems and conditions etc. However, the code I've shared shows how a
pure Java program could take better control of the underlying TCP
behavior - as visible to its clients.
What about performance effects of maintaining a connector-wide counter
of "active" connections, plus pausing and resuming the channel -- plus
re-connects by clients that have been dropped from the backlog?
What the UltraESB does by default is to stop accepting new connections
after a threshold is reached (e.g. 4096) and remain paused until the
number of active connections drops back to another threshold (e.g.
3073). Each of these parameters is user configurable, and depends on
the maximum number of connections each node is expected to handle. In
my experience, maintaining connector-wide counts does not cause any
performance problems, and neither do re-connects by clients - as what
is expected in reality is for a hardware load balancer to forward
requests that are "refused" by one node to another node, which
hopefully is not loaded.
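To make the mechanism concrete, here is a minimal sketch of such a connection gate - this is my own illustration, not the actual UltraESB code: a connector-wide counter with a high-water mark that pauses accepting and a low-water mark that resumes it. In a real NIO connector the two commented lines would deregister and re-register interest in OP_ACCEPT with the selector:

```java
/**
 * Sketch of a connector-wide connection gate: stop accepting new
 * connections once the active count reaches a high-water mark, and
 * resume only after it drops back to a low-water mark.
 */
public class ConnectionGate {
    private final int pauseAt;   // e.g. 4096
    private final int resumeAt;  // e.g. 3073
    private int active;          // guarded by "this"
    private volatile boolean accepting = true;

    public ConnectionGate(int pauseAt, int resumeAt) {
        this.pauseAt = pauseAt;
        this.resumeAt = resumeAt;
    }

    /** Called by the acceptor thread when it takes a new connection. */
    public synchronized void onAccepted() {
        if (++active >= pauseAt && accepting) {
            accepting = false;
            // here: deregister OP_ACCEPT, so the TCP backlog fills and
            // clients beyond it see "connection refused" instead of RST
        }
    }

    /** Called when an active connection completes or is closed. */
    public synchronized void onClosed() {
        if (--active <= resumeAt && !accepting) {
            accepting = true;
            // here: re-register OP_ACCEPT with the selector
        }
    }

    public boolean isAccepting() {
        return accepting;
    }
}
```

The gap between the two thresholds gives hysteresis, so the connector does not flap between paused and accepting as individual connections come and go around a single limit.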
Such a fail-over can take place immediately, cleanly and without any
confusion even if the backend service is not idempotent. This
is clearly not the case when a TCP/HTTP connection is accepted and then
met with a hard RST after a part or a full request has been sent to it.
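The distinction is visible on the client side too. A hypothetical probe (the names are mine) showing why a refused connect is the clean case - nothing has been sent yet, so failing over to another node is safe even for non-idempotent requests:

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class FailoverProbe {

    /** Returns true when the connect was refused before any bytes were sent. */
    static boolean refusedCleanly(int port) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress("127.0.0.1", port), 500);
            return false; // connected: the request would actually go out
        } catch (ConnectException e) {
            // "connection refused": the node never accepted the connection,
            // so nothing was sent and a retry against another node is safe
            // even for a non-idempotent request
            return true;
        } catch (IOException e) {
            return false; // timeout or reset: retry safety is unknown
        }
    }

    public static void main(String[] args) throws IOException {
        // Find a port that is certainly closed by binding and releasing it.
        int closedPort;
        try (ServerSocket ss = new ServerSocket(0)) {
            closedPort = ss.getLocalPort();
        }
        System.out.println("refused cleanly: " + refusedCleanly(closedPort));
    }
}
```

A reset after part of the request has been written gives the caller no such guarantee, which is exactly the fail-over problem described above.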
I'm concerned that all of your bench tests appear to be done using
telnet with a single acceptable connection. What if you allow 1000
simultaneous connections and test it under some real load so we can see
how such a solution would behave?
Clearly the example I shared was just to illustrate this with a pure
Java program. We usually conduct performance tests over half a dozen
open source ESBs with concurrency levels of 20, 40, 80, 160, 320, 640,
1280 and 2560, and payload sizes of 0.5, 1, 5, 10 and 100K bytes. You
can see some of the scenarios here: http://esbperformance.org. We
privately conduct performance tests beyond 2560, to much higher levels.
We used an HttpComponents-based EchoService as our backend service all
this time,
and it behaved very well with all load levels. However some weeks back
we accepted a contribution which was an async servlet to be deployed on
Tomcat as it was considered more "real world". The issues I noticed
were when running high load levels over this servlet deployed on Tomcat,
especially when the response was being delayed to simulate realistic
behavior.
Although we do not use Tomcat ourselves, our customers do. I am also
not calling this a bug - but pointing it out as an area for possible
improvement. If the Tomcat users, developers and the PMC think this is
worthwhile to pursue, I believe it would be a good enhancement - maybe
even a good GSoC project. As a fellow member of the ASF and a committer on multiple
projects/years, I believed it was my duty to bring this to the attention
of the Tomcat community.
regards
asankha
--
Asankha C. Perera
AdroitLogic, http://adroitlogic.org
http://esbmagic.blogspot.com