I've spent a few days trying to figure out what's going on, but I give up. I've tried increasing max_packet_size and a few other properties, with no success.
2012/4/30 Agnieszka Allstar <allstar...@gmail.com>

> 2012/4/30 Christopher Schultz <ch...@christopherschultz.net>
>
>> -----BEGIN PGP SIGNED MESSAGE-----
>> Hash: SHA1
>>
>> Kate,
>>
>> On 4/30/12 7:06 AM, Agnieszka Allstar wrote:
>> > Here's my test scenario:
>> > 1. Web service client sends SOAP request to apache server. This
>> > client sends requests in 2 flavors, either it is a generic SOAP
>> > call (A case) or a soap request with some files attached with
>> > MTOM (B case). Web service is capable of handling both types of
>> > requests.
>> > 2. Once the processing starts, I kill this tomcat with "kill -9".
>> >
>> > The results are:
>> > A - when tomcat1 is killed, the request is automatically
>> > transmitted to tomcat2. Client receives correct results. This is OK.
>> > B - when tomcat1 is killed, the request is not transmitted to
>> > tomcat2. Client receives 502 error: bad gateway instead.
>>
>> Are you using POST to send both messages (A case and B case)?
>
> Yes, both are POST.
>
>> What if you send a very *small* attachment via MTOM? I'm wondering
>> what the real difference is, since in both cases you should be sending
>> HTTP POST... the only difference should be larger Content-Length.
>
> Good idea, gotta check the small attachment version and see what happens.
> I've also noticed that requests differ in Content-Type. For simple soap
> request is text/xml and the one with attachments is multipart/related;
> type="application/xop+xml (...).
>
>> I wonder if mod_jk can only failover the current request (with no
>> error to the client) if the request is small enough (or only a small
>> amount has already been transferred to the failing server).
>
> This could be it I'll check this out. I only need to wait few days until
> I'm back in office.

I did some tests and it turns out that sending just a single tiny attachment (1KB) breaks the failover.
>> > [Fri Apr 27 12:19:08 2012] [1376450:1] [error]
>> > service::jk_lb_worker.c (1425): unrecoverable error 502, request
>> > failed. Tomcat failed in the middle of reques [Fri Apr 27 12:19:08
>> > 2012] [1376450:1] [error] service::jk_lb_worker.c (1485): All
>> > tomcat instances failed, no more workers left
>>
>> This looks like you have killed both tomcat instances and mod_jk can
>> contact neither of them. Are you sure you have gotten mod_jk back in
>> communication with both servers between "A case" and "B case" runs? If
>> you haven't, this isn't really a valid test.
>
> I'm pretty sure there was only one tomcat killed but I'll rerun my tests.
> As for initial state, after each test I restart both mod_jk (apache) and
> tomcat servers and make sure they appear as ok in mod_jk.

When I kill Tomcat while it is processing a request in case B, the error log says there are no workers left and mod_jk returns 502. When I do the same in case A, the request is retransmitted to tomcat2 without problems (though this is not indicated in the log). It makes no sense to me.

Case A:

[info] ajp_connection_tcp_get_message::jk_ajp_common.c (1266): (tomcat1) can't receive the response header message from tomcat, tomcat (x.x.x.x:8191) has forced a connection close for socket 19
[error] ajp_get_reply::jk_ajp_common.c (2118): (tomcat1) Tomcat is down or refused connection. No response has been sent to the client (yet)
[info] ajp_service::jk_ajp_common.c (2607): (tomcat1) sending request to tomcat failed (recoverable), (attempt=1)
[info] jk_open_socket::jk_connect.c (626): connect to x.x.x.x:8191 failed (errno=79)
[info] ajp_connect_to_endpoint::jk_ajp_common.c (1008): Failed opening socket to (x.x.x.x:8191) (errno=79)
[error] ajp_send_request::jk_ajp_common.c (1630): (tomcat1) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=79)
[info] ajp_service::jk_ajp_common.c (2607): (tomcat1) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[error] ajp_service::jk_ajp_common.c (2626): (tomcat1) connecting to tomcat failed.
[info] service::jk_lb_worker.c (1400): service failed, worker tomcat1 is in error state

Case B:

[info] ajp_connection_tcp_get_message::jk_ajp_common.c (1266): (tomcat1) can't receive the response header message from tomcat, tomcat (x.x.x.x:8191) has forced a connection close for socket 19
[error] ajp_get_reply::jk_ajp_common.c (2118): (tomcat1) Tomcat is down or refused connection. No response has been sent to the client (yet)
[error] ajp_service::jk_ajp_common.c (2600): (tomcat1) sending request to tomcat failed (unrecoverable), (attempt=1)
[info] service::jk_lb_worker.c (1400): service failed, worker tomcat1 is in error state
[error] service::jk_lb_worker.c (1425): unrecoverable error 502, request failed. Tomcat failed in the middle of request, we can't recover to another instance.
[error] service::jk_lb_worker.c (1485): All tomcat instances failed, no more workers left

Obviously this has something to do with the fact that in case A mod_jk.log marks the failed request as recoverable, whereas in case B it marks it as unrecoverable, but I can't tell what causes the difference even after looking at the mod_jk source. Either way, I have to stop here and implement retries on the client side instead.

Thanks,
Kate
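
For reference, the client-side retry mentioned above could look roughly like the sketch below. It is not taken from the actual client: the function name post_with_retries, the send callable, and the retry and backoff parameters are all hypothetical, and it assumes the web service operation is safe to resend, since Tomcat may already have started processing the first attempt when it was killed.

```python
import time
from typing import Callable, Tuple

# Gateway-style statuses treated as retryable (an assumption of this sketch);
# 502 is the status mod_jk returned to the client in case B.
RETRYABLE_STATUSES = frozenset({502, 503, 504})


def post_with_retries(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 3,
    backoff_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call send() until it returns a non-retryable status or attempts run out.

    send is a hypothetical callable that performs one complete POST
    (rebuilding the MTOM body each time) and returns (status, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        # Return on success, on a non-gateway error, or on the last attempt.
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body
        # Linear backoff gives the balancer time to mark the dead worker.
        sleep(backoff_seconds * attempt)
    raise AssertionError("unreachable")
```

The caller passes a function that performs one full request, so every retry sends the whole message again from the start rather than relying on mod_jk to replay a partially forwarded body.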