Hi Rainer,

I will do my best to provide those things. Here is what looks like the full sequence from our log:

[46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to _ip_:12409 failed (errno=115)
[46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to (_ip_:12409) (errno=115)
[46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621): (_hostname_) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=115)
[46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614): (_hostname_) sending request to tomcat failed (recoverable), because of error during request sending (attempt=1)
[46055:3512666992] [info] jk_open_socket::jk_connect.c (627): connect to _ip_:12409 failed (errno=115)
[46055:3512666992] [info] ajp_connect_to_endpoint::jk_ajp_common.c (992): Failed opening socket to (_ip_:12409) (errno=115)
[46055:3512666992] [error] ajp_send_request::jk_ajp_common.c (1621): (_hostname_) connecting to backend failed. Tomcat is probably not started or is listening on the wrong port (errno=115)
[46055:3512666992] [info] ajp_service::jk_ajp_common.c (2614): (_hostname_) sending request to tomcat failed (recoverable), because of error during request sending (attempt=2)
[46055:3512666992] [error] ajp_service::jk_ajp_common.c (2634): (_hostname_) connecting to tomcat failed.
[46055:3512666992] [info] service::jk_lb_worker.c (1469): service failed, worker _hostname_ is in error state

You can see after this sequence the backend worker is marked as Bad. Here is the config:

JkWorkerProperty worker.list=jkstatus,ajp_app,ajp_app2,ajp_app3,...
JkWorkerProperty worker.jkstatus.type=status
JkWorkerProperty worker.lb_member_template.type=ajp13
JkWorkerProperty worker.lb_member_template.activation=Active
JkWorkerProperty worker.lb_member_template.ping_mode=A
JkWorkerProperty worker.lb_member_template.connection_pool_timeout=600
JkWorkerProperty worker.lb_member_template.socket_keepalive=True
JkWorkerProperty worker.lb_member_template.socket_timeout=30
JkWorkerProperty worker.lb_member_template.socket_connect_timeout=3000
JkWorkerProperty worker.lb_member_template.recover_time=30
JkWorkerProperty worker.lb_member_template.recovery_options=7
JkWorkerProperty worker.lb_worker_template.type=lb
JkWorkerProperty worker.ajp_app.reference=worker.lb_worker_template
JkWorkerProperty worker.ajp_app.balance_workers=_hostname1_ajpport1, _hostname1_ajpport2, ..., _hostname34_ajpport15
JkWorkerProperty worker._hostname_ajpportX.reference=worker.lb_member_template
JkWorkerProperty worker._hostname_ajpportX.host=_hostname_
JkWorkerProperty worker._hostname_ajpportX.port=xxxx

Will this list accept attachments for the other details, such as netstat output and thread dumps?

Thanks,
Max L

On Fri, Mar 4, 2016 at 1:22 PM, Rainer Jung <rainer.j...@kippdata.de> wrote:

> On 04.03.2016 at 20:35, Max Lynch wrote:
>
>> Hi there,
>>
>> We have a very heavily used implementation of modjk 1.2.35 running in
>> Apache 2.2.15 i686 on CentOS 6.7 x86_64. After Apache startup, our system
>> will perform optimally with no errors for about 24 hours, after which we
>> begin to see this message:
>>
>> connecting to backend failed. Tomcat is probably not started or is
>> listening on the wrong port (errno=115)
>>
>
> Errno 115 on RHEL is EINPROGRESS. That means the call didn't finish but
> one could retry it. This indicates we might be able to improve the code,
> but it is also possible that e.g. a configured socket_connect_timeout was
> reached.
> To check, we would need the full mod_jk log lines (and if several
> different log lines show up for one event all of them) including the
> columns with source file name and line number etc.
>
> It would also be very useful to see your configuration. You can remove IP
> addresses, ports, secrets etc. and rename your workers but we should see the
> timeout setting, cping settings and so on.
>
>> Once we start seeing this error for one backend/worker, we begin seeing the
>> same errors for eventually all workers. This problem doesn't go away until
>> we restart Apache. Our setup consists of 720 workers per apache, with
>> multiple apache servers, and each apache server also has several other
>> sites configured with modjk serving other tomcat backends. It should be
>> noted that we do not see the same error with other sites, nor do we have
>> so many workers defined for any other site.
>>
>> We've searched through past mailings to try and find the same issue. The
>> couple times we saw it brought up the error code 110 was also mentioned.
>> It should be noted we do not see the same pattern. errno 110 does show but
>> outside of the window when the problem begins and is at its worst. We
>> believe this issue is not configuration related.
>>
>> We're posting this here to try and gather more data. Our process prohibits
>> an upgrade of any kind without plenty of evidence supporting our position.
>> Hoping that some individuals might have seen this particular issue, or if
>> there is any data on whether this could be a bug. Hopefully we're correct
>> in thinking this issue is not a configuration problem, but we'll help to
>> rule out.
>>
>
> It could also be interesting to capture the output of "netstat -an" during
> the time that the problem happens. And finally the same on the Tomcat side
> as well as a thread dump of the Tomcat JVM.
>
> Once you provide the full log line, I can check, whether the 1.2.41 code
> actually has improvements related to errno 115 or not.
> Knowing the place in
> the code where the error occurs might also give us an idea, what might have
> happened and how to check further (e.g. Tomcat not accepting connections,
> firewall idle connection drop between mod_jk and Tomcat etc. etc.).
>
> Regards,
>
> Rainer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
> For additional commands, e-mail: users-h...@tomcat.apache.org
>
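
For reference on the errno discussed above: EINPROGRESS is what a non-blocking connect() returns while the TCP handshake is still running, so seeing it in a log usually means the caller gave up waiting rather than that the peer refused. The sketch below illustrates that pattern in Python on Linux. It is not mod_jk's code; the function name connect_with_timeout and its return convention are made up for illustration, and how mod_jk itself maps a reached socket_connect_timeout to an errno is an assumption to be checked against its source.

```python
import errno
import select
import socket


def connect_with_timeout(host, port, timeout_s):
    """Hypothetical helper: non-blocking connect with a deadline.

    Returns 0 on success, otherwise an errno value describing the failure.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)
    try:
        rc = sock.connect_ex((host, port))
        if rc == 0:
            return 0  # connected immediately
        if rc != errno.EINPROGRESS:
            return rc  # immediate hard failure, e.g. ECONNREFUSED
        # EINPROGRESS: the handshake is still running. Wait until the
        # socket becomes writable or the deadline passes.
        _, writable, _ = select.select([], [sock], [], timeout_s)
        if not writable:
            # Deadline reached while still "in progress". This sketch
            # reports ETIMEDOUT; a caller that keeps the errno from the
            # initial connect() would report EINPROGRESS here instead.
            return errno.ETIMEDOUT
        # Writable means the attempt finished; SO_ERROR holds the outcome.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        sock.close()
```

Running this against a backend address while the problem is occurring would show whether connections are refused, time out, or succeed, which complements the "netstat -an" output requested above.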