I looked at the dumps. Comments inline.

On 18.05.2009 20:33, kvancamp wrote:
> My problem seems to be most similar to this post. We are having intermittent
> problems with the JBoss/Tomcat AJP 1.3 connector hanging. From searching
> the JBoss and Tomcat user forums, other issues that are similar to mine are:
> http://marc.info/?l=tomcat-user&m=116231271819840&w=2
> http://www.nabble.com/Problem-with-AJP-connector-td19657959.html#a19657959
>
> neither of which really seems to offer a solution. Here are my specifics:
>
> We are running JBoss 4.2.2 (which uses Tomcat 6) running on Linux (RedHat
> 5.3) behind an IIS proxy, which is proxying to the JBoss AJP port. I have
> left AJP at its default settings in my server.xml:
> <!-- A AJP 1.3 Connector on port 8009 -->
> <Connector protocol="AJP/1.3" port="8009"
>     address="${jboss.bind.address}"
>     redirectPort="8443" />
> The behavior I’m observing is only occurring about once every 2 weeks,
> making it difficult to reproduce. From the user’s perspective, the site is
> unreachable. The IIS proxy is logging this when the problem occurs:
> [Tue Apr 21 04:13:14.775 2009] [3192:2500] [error] jk_ajp_common.c (1011):
> (adastarNode) can't receive the response message from tomcat, network
> problems or tomcat (172.17.3.240:8009) is down (errno=54)
errno 54 here is Winsock error 10054 (WSAECONNRESET), connection reset
by peer.

> [Tue Apr 21 04:13:14.775 2009] [3192:2500] [error] jk_ajp_common.c (1766):
> (adastarNode) Tomcat is down or refused connection. No response has been
> sent to the client (yet)
> [Tue Apr 21 04:13:14.775 2009] [3192:2500] [info] jk_ajp_common.c (2186):
> (adastarNode) sending request to tomcat failed (recoverable), (attempt=1)
>
> My JBoss instance is not logging any errors during this timeframe. As far
> as how to solve the problem, in one case the server was left like this for
> several hours and seemed to recover on its own, only to hang again a couple
> of hours later; otherwise the only solution that’s worked is to restart
> JBoss.
>
> The main difference I can observe in a thread dump is that the AJP acceptor
> thread, which is normally in a RUNNABLE state, is in a WAITING state when
> the hang occurs:
> "ajp-abeitmpr1.andesatpa.com%2F172.17.3.88-8009-Acceptor-0" daemon prio=10
> tid=0x00002aaad7a70400 nid=0x7dae in Object.wait()
> [0x0000000044240000..0x0000000044240c10]
> java.lang.Thread.State: WAITING (on object monitor)

Good analysis. If you look further down the stack of this thread, you
see it is stuck in org.apache.tomcat.util.net.JIoEndpoint.getWorkerThread().

Furthermore, all your threads available for requests on port 8009 are in
the same state: they are connected to the web server and waiting for the
next request to come in over the existing connection they handle.

In all dumps there were 40 of those, so I assume the pool size of your
AJP connector (port 8009) is at most 40. All of those threads are
connected, and when the next connection came in, the acceptor tried to
get another free thread; none exists, so the acceptor is blocked.

As a consequence, the new connection is not handled, nor is your JBoss
able to accept any more connections.

The root cause is that all threads in your pool are busy waiting for new
requests to come in over existing connections.
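
If you want to redo this count on future dumps without reading every
stack by hand, a small script along the following lines might help. It
is only a sketch under assumptions: it expects a HotSpot-style thread
dump as plain text, it assumes the AJP threads are the ones whose name
starts with "ajp-" (as the acceptor in your dump does), and the function
names are made up for illustration.

```python
import re
from collections import Counter

# A thread entry starts with the quoted thread name, e.g. "ajp-...-8009-1"
THREAD_HEADER = re.compile(r'^"([^"]+)"')
# State line as printed in the dump, e.g. "java.lang.Thread.State: WAITING ..."
THREAD_STATE = re.compile(r'java\.lang\.Thread\.State:\s+([A-Z_]+)')


def split_threads(dump_text):
    """Yield (thread_name, lines) for each thread entry in a dump."""
    name, lines = None, []
    for raw in dump_text.splitlines():
        line = raw.strip()
        header = THREAD_HEADER.match(line)
        if header:
            if name is not None:
                yield name, lines
            name, lines = header.group(1), []
        elif name is not None:
            lines.append(line)
    if name is not None:
        yield name, lines


def summarize_ajp_threads(dump_text, prefix="ajp-", frame="getWorkerThread"):
    """Count thread states for threads named prefix*, and list the
    threads whose stack mentions the given frame (assumed marker)."""
    states = Counter()
    in_frame = []
    for name, lines in split_threads(dump_text):
        if not name.startswith(prefix):
            continue
        for line in lines:
            match = THREAD_STATE.search(line)
            if match:
                states[match.group(1)] += 1
                break
        if any(frame in line for line in lines):
            in_frame.append(name)
    return states, in_frame
```

Calling summarize_ajp_threads(open("dump.txt").read()) on a saved dump
(the file name is just an example) returns the per-state counts and the
names of threads sitting in getWorkerThread(). If the acceptor shows up
in that list, you are most likely looking at the situation described
above.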

So there are two possibilities:

- either you need more threads to handle more connections
- or something is wrong and those threads should have been freed instead.

It is not trivial to tell from here which of the two cases you are in,
but I would say the second is more likely.

> Lately I’ve been trying to also use netstat to look at the problem when a
> hang occurs, but I’m not sure I’ve caught it during a true hang. It appears
> to me that I have a growing number of ESTABLISHED connections prior to the
> hang, plus one CLOSE_WAIT connection:
> [it...@abeitmpr1 log]$ netstat -vatn |grep 8009
> tcp    0  0  172.17.3.88:8009  0.0.0.0:*         LISTEN
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2154  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3690  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2159  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2158  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2144  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3680  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2171  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2170  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1395  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:2935  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4724  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2120  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:2375  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2119  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2118  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2372  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1114  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2143  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1116  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2131  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3923  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:2133  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2132  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2347  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1834  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2093  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1837  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2092  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2348  ESTABLISHED
> tcp  795  0  172.17.3.88:8009  172.17.5.42:2080  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2336  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2086  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2105  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2360  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1592  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2111  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2366  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2099  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2359  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1288  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4610  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2311  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2309  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2308  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4635  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2335  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:2079  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4126  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2334  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2323  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:2835  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2322  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1809  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4884  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3049  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2286  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2285  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1772  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:2529  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2273  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2272  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2277  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2297  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3064  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2294  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2248  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1736  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1224  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1219  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2247  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2246  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2266  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2259  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1233  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2260  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2221  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2220  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1443  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2208  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2214  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2235  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2234  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3002  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3513  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3518  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3260  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4019  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4789  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2184  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:3213  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:1667  ESTABLISHED
> tcp  516  0  172.17.3.88:8009  172.17.5.42:2183  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2182  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:4767  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2207  ESTABLISHED
> tcp    0  0  172.17.3.88:8009  172.17.5.42:2204  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2195  ESTABLISHED
> tcp  514  0  172.17.3.88:8009  172.17.5.42:2196  ESTABLISHED
> tcp    1  0  172.17.3.88:8009  172.17.1.73:4169  CLOSE_WAIT
>
> If anyone has any leads on this problem, or suggestions for things to try,
> it would be appreciated.

I would say you should:

- set connectionTimeout on the AJP connector of JBoss
- ensure you are using a recent version of the IIS plugin (1.2.28)
- read the timeouts documentation page of the plugin and set
  appropriate timeouts
- monitor the use of the AJP threads to find out whether the problem
  builds up slowly, step by step, until all threads are bound, or
  whether it occurs spontaneously

Monitoring the thread use would also give you an idea of what a good
number of AJP pool threads would be in your situation.

Do you have a firewall between IIS and JBoss?

Regards,

Rainer
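
P.S. For the monitoring, sampling your netstat output at regular
intervals and logging a summary may already be enough to see whether the
connection count creeps up or jumps. The following is only a sketch: it
assumes Linux `netstat -vatn` output with the usual column order (Proto,
Recv-Q, Send-Q, Local Address, Foreign Address, State), and the function
name is made up for illustration.

```python
from collections import Counter


def summarize_netstat(output, port=8009):
    """Summarize TCP sockets whose local port is `port`.

    Returns (states, unread), where states counts sockets per TCP state
    and unread counts ESTABLISHED sockets with a non-zero Recv-Q, that
    is, with bytes received but not yet read by the application.
    """
    states = Counter()
    unread = 0
    suffix = ":%d" % port
    for line in output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        recv_q, local, state = fields[1], fields[3], fields[5]
        if not local.endswith(suffix):
            continue
        states[state] += 1
        if state == "ESTABLISHED" and int(recv_q) > 0:
            unread += 1
    return states, unread
```

Feed it the text of a netstat run, for example from a cron job that
saves the output to a file, and log the result with a timestamp. A
steadily growing ESTABLISHED count points to connections that are never
released. Many of the ESTABLISHED lines in your listing show a non-zero
Recv-Q; my guess is that these are connections which have data waiting
but no thread reading it, which would fit the blocked acceptor, but that
is an interpretation you should verify.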