I have a farm of Apache httpd servers proxying to Tomcat with mod_jk, and
Apache processes are getting stuck (shown as the W state in server-status).
I am sending this to the list because the stack traces show httpd stuck in
mod_jk.

httpd is configured for prefork with 512 servers both at start and as the
maximum. When the problem occurs, nearly all 512 processes end up in the W
state and stay there until we restart httpd. The problem occurs more often
when load is high but is not restricted to high load. It started occurring
more often when we increased the servers from 384 to 512. The hosts have
enough memory and do not swap. The issue occurs intermittently and is not
tied to a particular host or Tomcat instance (there are ~150 Tomcat
instances). The JkShmFile is on tmpfs.
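
In case it helps anyone watch for the same symptom, here is a minimal sketch of counting W slots. It assumes the machine-readable mod_status output (server-status?auto) and its "Scoreboard:" line; the function name and the sample text are made up for illustration and were not captured from the affected hosts.

```python
from collections import Counter


def scoreboard_counts(status_text):
    """Count slot states on the 'Scoreboard:' line of server-status?auto output."""
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    # No scoreboard line found: report nothing rather than guess.
    return Counter()


# Hypothetical sample text, for illustration only.
sample = "BusyWorkers: 3\nIdleWorkers: 2\nScoreboard: WW_W_...\n"
counts = scoreboard_counts(sample)
# '.' marks an open slot with no process, so exclude it from the total.
print(counts["W"], "of", sum(counts.values()) - counts["."], "slots in W")
```

Polling this periodically and alerting when the W count approaches the configured maximum would at least flag the condition before every slot is taken.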

Environment: RHEL 5.11, Apache httpd 2.4.10 (prefork), mod_jk 1.2.40,
APR 1.5.1, APR-util 1.5.4

The stuck httpd processes all show the same stack and strace:

pstack:
#0  0x00002b3439026bff in fcntl () from /lib64/libpthread.so.0
#1  0x00002b3440911656 in jk_shm_lock () from /usr/local/apache2/modules/mod_jk.so
#2  0x00002b3440917805 in ajp_maintain () from /usr/local/apache2/modules/mod_jk.so
#3  0x00002b3440906cac in maintain_workers () from /usr/local/apache2/modules/mod_jk.so
#4  0x00002b3440901140 in wc_maintain () from /usr/local/apache2/modules/mod_jk.so
#5  0x00002b34408f40b6 in jk_handler () from /usr/local/apache2/modules/mod_jk.so
#6  0x0000000000448eca in ap_run_handler ()
#7  0x000000000044cc92 in ap_invoke_handler ()
#8  0x000000000045e24f in ap_process_async_request ()
#9  0x000000000045e38f in ap_process_request ()
#10 0x000000000045ab65 in ap_process_http_connection ()
#11 0x00000000004530ba in ap_run_process_connection ()
#12 0x000000000046423a in child_main ()
#13 0x0000000000464544 in make_child ()
#14 0x00000000004649ae in prefork_run ()
#15 0x0000000000430634 in ap_run_mpm ()
#16 0x000000000042ad97 in main ()

strace:
fcntl(19, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0
fcntl(19, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=1}) = 0
time(NULL)                              = 1424711498
fcntl(19, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0
fcntl(19, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=1}) = 0
fcntl(19, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0
fcntl(19, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=1}) = 0
fcntl(19, F_SETLKW, {type=F_WRLCK, whence=SEEK_SET, start=0, len=1}) = 0
fcntl(19, F_SETLKW, {type=F_UNLCK, whence=SEEK_SET, start=0, len=1}) = 0
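
To make the pattern in the strace concrete, here is a rough sketch of several processes contending for a one-byte advisory lock at offset 0 of a shared file. This is not mod_jk's code, only an illustration of the same lock/unlock sequence. It assumes Linux and CPython, where fcntl.lockf() is a wrapper around fcntl() record locking and blocks when LOCK_NB is not given; byte_lock, bump and run_contention are hypothetical names, and a temp file stands in for the JkShmFile.

```python
import fcntl
import os
import struct
import tempfile
from contextlib import contextmanager


@contextmanager
def byte_lock(fd):
    # Blocking exclusive lock on the single byte at offset 0.
    fcntl.lockf(fd, fcntl.LOCK_EX, 1, 0, os.SEEK_SET)
    try:
        yield
    finally:
        # Release the same one-byte range.
        fcntl.lockf(fd, fcntl.LOCK_UN, 1, 0, os.SEEK_SET)


def bump(fd):
    # Read-modify-write of a counter, protected by the byte lock.
    with byte_lock(fd):
        (value,) = struct.unpack("Q", os.pread(fd, 8, 0))
        os.pwrite(fd, struct.pack("Q", value + 1), 0)


def run_contention(workers=4, rounds=200):
    """Fork `workers` children that each bump the counter `rounds` times."""
    fd, path = tempfile.mkstemp()
    try:
        os.pwrite(fd, struct.pack("Q", 0), 0)
        pids = []
        for _ in range(workers):
            pid = os.fork()
            if pid == 0:
                # Child: record locks are per process, so the inherited
                # descriptor still contends with the sibling processes.
                status = 1
                try:
                    for _ in range(rounds):
                        bump(fd)
                    status = 0
                finally:
                    os._exit(status)
            pids.append(pid)
        for pid in pids:
            os.waitpid(pid, 0)
        (total,) = struct.unpack("Q", os.pread(fd, 8, 0))
        return total
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # With working locks, no increment should be lost.
    print(run_contention())
```

Every process serializes on the same byte, so the time each one spends waiting in fcntl grows with the number of contenders. That would be consistent with the problem getting worse after going from 384 to 512 servers, though I have not confirmed that this is the cause.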

Any help tracking this issue down would be appreciated.

Thanks,
Jesse DeFer
