Daniel, all,

Just wanted to post the solution to this issue. I wanted to wait a significant amount of time to make sure we had this solved. The root cause was the LDAP caching mechanism. I am guessing there is a bug in that code that causes the server to go haywire after some number of items have been cached or looked up. Or perhaps there is a memory leak.
By disabling LDAP caching the server has been stable for 60+ days. The last changes I made to http.conf were these:

< LDAPSharedCacheSize 500000
< LDAPCacheEntries 1024
---
> LDAPSharedCacheSize 0
> LDAPCacheEntries 16
352c352
< LDAPOpCacheEntries 1024
---
> LDAPOpCacheEntries 0
386c386

Hope this helps some other poor souls out there.

MJ

On Thursday, January 29, 2015 6:35 PM, Daniel <dferra...@gmail.com> wrote:

2015-01-30 1:03 GMT+01:00 Mark Jacquet <mark_jacq...@yahoo.com.invalid>:

Problem:
Apache server will stay up for a random amount of time, usually days, but eventually enters a hung state. When hung, the CPU load on the machine gradually climbs and the server stops responding to new requests.

Error logs typically contain lots of these:

[Wed Jan 28 16:06:58.667188 2015] [mpm_event:error] [pid 25336:tid 1] AH00485: scoreboard is full, not at MaxRequestWorkers

I have done a lot of web research on this topic and have found many cases where others have had the same or a similar issue, but no real solutions. It seems very close to this bug report:
https://issues.apache.org/bugzilla/show_bug.cgi?id=53555

Environment:
LDOM (VM)
SunOS myhostname 5.10 Generic_118833-36 sun4v sparc SUNW,Sun-Fire-T200
8G RAM

http Conf:
StartServers            8
MinSpareServers         Not set
MaxSpareServers         Not set
ServerLimit             256
MaxRequestWorkers       100
MaxConnectionsPerChild  1000
KeepAlive               On
TimeOut                 3000
MaxKeepAliveRequests    50
KeepAliveTimeout        2

Current non-hung Score Board:
Server Version: Apache/2.4.10 (Unix)
Server MPM: event
Server Built: Oct 30 2014 16:29:03
Current Time: Wednesday, 28-Jan-2015 10:59:39 PST
Restart Time: Wednesday, 28-Jan-2015 09:49:21 PST
Parent Server Config. Generation: 1
Parent Server MPM Generation: 0
Server uptime: 1 hour 10 minutes 17 seconds
Server load: 0.60 0.46 0.41
Total accesses: 1134 - Total Traffic: 2.2 GB
CPU Usage: u9.07 s16.94 cu609.51 cs69.31 - 16.7% CPU load
.269 requests/sec - 0.5 MB/second - 2.0 MB/request
1 requests currently being processed, 99 idle workers

PID    Connections       Threads     Async connections
       total  accepting  busy  idle  writing  keep-alive  closing
25337  0      yes        1     24    0        0           0
25338  1      yes        0     25    1        0           0
25339  1      yes        0     25    0        0           1
25340  1      yes        0     25    0        0           1
Sum    3                 1     99    1        0           2

Any thoughts/comments on http conf tuning, OS patches, or Apache bug fixes appreciated. This is a production server, so you can imagine, having it go down at random times (usually when I am asleep) is not fun!

Thanks.
MJ

Hello, you have some odd values.

First, you don't specify ThreadsPerChild, which by default is 64. Yet you do specify MaxRequestWorkers, which represents the total number of threads in all child processes together, but you specify a maximum of 256 processes.

By simple math, 256 processes * 64 threads per process would yield 16384 threads in total, yet you are allowing a maximum of just 100, so effectively your server is only capable of starting 1 single process. Thus, every time you restart, having no "spare" processes available, you will get the "scoreboard is full" message.

Consider something more logical like this for starters:

StartServers 1                    <-- starts with 1 process
ServerLimit 5                     <-- 4 more processes available, 5 x 200 max threads = 1000 (as you can see below, math matches MaxRequestWorkers)
MinSpareThreads 25
MaxSpareThreads 100
ThreadsPerChild 200               <-- threads per child process
ThreadLimit 200                   <-- max threads per child process
MaxRequestWorkers 1000            <-- a total of 1000 threads
MaxConnectionsPerChild 10000000

This is an example, adjust to your needs.

--
Daniel Ferradal
IT Specialist

email dferradal@gmail.com
linkedin es.linkedin.com/in/danielferradal
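
For anyone checking their own values, the arithmetic in Daniel's reply can be sketched in a few lines of Python. This is only an illustration, not Apache's own logic: the helper name `mpm_capacity` and its return fields are made up here, and the floor division is an assumption, chosen because it reproduces the "1 single process" conclusion in the reply. The 64 is the ThreadsPerChild default as stated in the reply; the status output in the original post shows 25 threads per process, so that case is included as well.

```python
def mpm_capacity(server_limit, threads_per_child, max_request_workers):
    """Compare the thread ceiling implied by ServerLimit x ThreadsPerChild
    with MaxRequestWorkers. Hypothetical helper, not part of Apache."""
    ceiling = server_limit * threads_per_child
    # Assumption: only whole processes count, so round down,
    # and never exceed ServerLimit.
    usable_processes = min(server_limit,
                           max_request_workers // threads_per_child)
    return {
        "ceiling": ceiling,
        "usable_processes": usable_processes,
        "matches": ceiling == max_request_workers,
    }


# (ServerLimit, ThreadsPerChild, MaxRequestWorkers)
cases = {
    "original, ThreadsPerChild 64 as stated in the reply": (256, 64, 100),
    "original, 25 threads per process as in the status output": (256, 25, 100),
    "suggested values": (5, 200, 1000),
}

for label, args in cases.items():
    print(label, mpm_capacity(*args))
```

With the suggested values the ceiling and MaxRequestWorkers agree (5 x 200 = 1000), which is the point of the reply: pick ServerLimit and ThreadsPerChild so that their product equals MaxRequestWorkers.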