I'm seeing an issue where Apache (2.2.19, mpm-prefork with default prefork
settings, CentOS 4) slows down and eventually becomes non-responsive. I'm not
seeing any obvious errors, but I am seeing a weird symptom. At this point I
don't know whether Apache or Railo is causing it, or whether the symptom is a
side effect or the actual cause.

When I look at /server-status after the server has been running for at least
an hour, I start seeing output like:

Srv   PID   Acc    M  CPU   SS  Req  Conn  Child  Slot  Client    VHost     Request
9-0   7216  0/0/1  _  0.00  19  0    0.0   0.00   0.00  ::1       default   OPTIONS * HTTP/1.0
10-0  7189  0/1/1  _  0.00  3   9    0.0   0.00   0.00  redacted  redacted  GET /data/designs/photos/large/0/45.jpg HTTP/1.1
11-0  7190  0/1/1  _  0.00  3   6    0.0   0.00   0.00  redacted  redacted  GET /data/accessories/photos/large/0/1.jpg HTTP/1.1
12-0  7224  0/0/1  _  0.00  1   0    0.0   0.00   0.00  ::1       default   OPTIONS * HTTP/1.0
13-0  -     0/0/1  .  0.00  11  0    0.0   0.00   0.00  ::1       default   OPTIONS * HTTP/1.0
14-0  7193  1/1/1  K  0.00  1   6    3.1   0.00   0.00  redacted  redacted  GET /blog/wp-content/uploads/S11_055-100x150.jpg HTTP/1.1
15-0  -     0/0/1  .  0.00  10  0    0.0   0.00   0.00  ::1       default   OPTIONS * HTTP/1.0
16-0  -     0/0/1  .  0.00  9   0    0.0   0.00   0.00  ::1       default   OPTIONS * HTTP/1.0
17-0  -     0/0/1  .  0.00  1   0    0.0   0.00   0.00  ::1       default   OPTIONS * HTTP/1.0


According to the Apache docs, the OPTIONS requests are the Apache main process
communicating with its child processes. It sounds like the main reason for
this communication is to kill a child when it's no longer needed, i.e., to
satisfy Min/MaxSpareServers and such. It seems, though, that after receiving
these requests the children become stuck and neither exit nor handle any
subsequent requests.

I've seen the SS value (seconds since the start of the most recent request)
after one of these OPTIONS requests go as high as 17000, i.e. 4-5 hours. It's
a reasonably busy server, so there definitely would have been requests in that
period that the child (or its replacement) should have handled, but didn't.
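
To make the symptom easier to spot, here is a rough sketch of how the
/server-status rows could be scanned for slots whose last request was an
internal "OPTIONS *" and whose SS has grown suspiciously large. The names
parse_row and is_suspect, the 300-second threshold, and the sample rows are
my own assumptions for illustration (only the SS value 17000 comes from what
I observed), and it assumes each row has been joined onto a single line.

```python
from typing import NamedTuple


class Worker(NamedTuple):
    srv: str
    pid: str
    mode: str
    ss: int          # seconds since the start of the most recent request
    client: str
    request: str


def parse_row(line: str) -> Worker:
    """Parse one whitespace-separated /server-status row (hypothetical helper).

    The first 12 columns are single tokens; the request is the remainder.
    Leading/trailing '*' (bold markers left by text conversion) are stripped.
    """
    f = line.split(None, 12)
    return Worker(
        srv=f[0].strip("*"),
        pid=f[1],
        mode=f[3].strip("*"),
        ss=int(f[5]),
        client=f[10],
        request=f[12].strip(),
    )


def is_suspect(w: Worker, threshold_s: int = 300) -> bool:
    """Flag slots last used by an internal OPTIONS request long ago.

    The threshold is an arbitrary assumption, not an Apache default.
    """
    return w.request.startswith("OPTIONS *") and w.ss >= threshold_s


# Illustrative rows only; not copied from the real status page.
SAMPLE = """\
*9-0* 7216 0/0/1 _ 0.00 17000 0 0.0 0.00 0.00 ::1 default OPTIONS * HTTP/1.0
*14-0* 7193 1/1/1 *K* 0.00 1 6 3.1 0.00 0.00 redacted redacted GET /blog/ HTTP/1.1
"""

if __name__ == "__main__":
    for row in SAMPLE.splitlines():
        w = parse_row(row)
        if is_suspect(w):
            print(f"slot {w.srv} pid {w.pid}: idle {w.ss / 3600:.1f} h after {w.request}")
```

Running something like this periodically against the status page would at
least show how quickly the stuck slots accumulate after a restart.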

A server restart fixes the issue, for about half an hour. So does restarting 
EITHER Apache or Railo.

Other details:
* Tried Apache 2.2.17 and 2.2.19 installed from the CentALT repo. Both display
the issue, even with a new httpd.conf
* Railo versions 3.2.3 (stable) and 3.3.0 (preview) both exhibit the issue
* Downgrading from Railo 3.3.0 -> 3.2.3 *seemed* to improve things, as did the
Apache 2.2.17 -> 2.2.19 update, but neither prevented the stuck workers
entirely.
* Tried re-compiling the resin caucho connector for Apache after the 2.2.19 
update.
* Tried playing with Min/MaxSpareServers, MaxRequestsPerChild, etc. Currently 
back on defaults.
* Server runs about 20 dynamic sites in PHP and Railo with PostgreSQL and MySQL
* CPU load is low, even when workers are stuck
* Earlier crashes showed symptoms of memory leaks (like oom-killer killing java)
* The apache/railo user has a virtual memory limit of 3.5G and process limit of 
500 via ulimits (because previously the
whole server was becoming unresponsive including sshd)
* In Railo, some sites store/run components in the Application scope. The
component methods use the 'local' scope for variables and occasionally read in
values from the request scope; this used to crash CFMX6.1.
* Java opts set in resin.conf are:

      <jvm-arg>-Xmn96M</jvm-arg>
      <jvm-arg>-Xms512M</jvm-arg>
      <jvm-arg>-Xmx512M</jvm-arg>
      <jvm-arg>-Xss1024k</jvm-arg>
      <jvm-arg>-Xdebug</jvm-arg>
      <jvm-arg>-Dcom.sun.management.jmxremote</jvm-arg>

* Tried increasing -Xmx to 1024M but issue persists.
* JVM is OpenJDK 1.6
* Resin is version 3.1.9, originally installed with Railo 3.1.2 package.
* Server is a live production server. If possible I'd like to avoid suggestions 
that would take sites offline for any
extended period of time.
* Added # CVE-2011-3192 Workaround to httpd.conf this morning.
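
As a sanity check on the limits listed above, here is a back-of-the-envelope
sketch of how many Java thread stacks could fit under the 3.5G virtual memory
limit with the current -Xmx and -Xss values. It is only an upper bound under
stated assumptions: it ignores PermGen, the code cache, native libraries and
other JVM overhead, and parse_size and max_threads are hypothetical helper
names of my own.

```python
MIB = 1024 * 1024


def parse_size(text: str) -> int:
    """Convert a JVM-style size such as '512M' or '1024k' to bytes."""
    units = {"k": 1024, "m": MIB, "g": 1024 * MIB}
    suffix = text[-1].lower()
    if suffix in units:
        return int(text[:-1]) * units[suffix]
    return int(text)


def max_threads(vm_limit: int, heap_max: int, stack_size: int) -> int:
    """Upper bound on thread stacks that fit beside the heap.

    Assumption: address space = heap + N * stack, nothing else.
    """
    return (vm_limit - heap_max) // stack_size


vm_limit = int(3.5 * 1024 * MIB)   # 3.5G virtual memory ulimit from the list above
heap = parse_size("512M")          # -Xmx512M
stack = parse_size("1024k")        # -Xss1024k

if __name__ == "__main__":
    print(max_threads(vm_limit, heap, stack))                 # with -Xmx512M
    print(max_threads(vm_limit, parse_size("1024M"), stack))  # with -Xmx1024M
```

If this rough estimate is anywhere near right, the 500-process ulimit would
be reached long before the address-space limit. On Linux that limit counts
threads as well as processes for the user, so the Apache children and the
Java threads would be sharing it.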


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
