On Solaris 10 u8, HTTPD 2.2.15 occasionally has one child process hang during a graceful restart.

Symptoms:
1. At debug-level logging, the error log shows:
[Wed Jun 23 14:38:21 2010] [debug] worker.c(1083): the listener thread didn't exit

I understand this is not a major issue (https://issues.apache.org/bugzilla/show_bug.cgi?id=9011), but it gives some insight into where execution stalls.

2. pstack of the hung child shows the main thread blocked while joining the worker threads:

-----------------  lwp# 1 / thread# 1  --------------------
 fffffd7fff06cdea lwp_wait (3, fffffd7fffdff964)
 fffffd7fff063eee _thrp_join () + 3e
 fffffd7fff0640cc pthread_join () + 1c
 fffffd7fff27b195 apr_thread_join () + 25
 0000000000470a19 join_workers () + e9
 0000000000470de3 child_main () + 353
 0000000000471137 make_child () + 147
 0000000000471a6e ap_mpm_run () + 8be
 000000000042fd81 main () + 8b1
 000000000042f08c _start () + 6c
-----------------  lwp# 3 / thread# 3  --------------------
 fffffd7fff067527 lwp_park (0, 0, 0)
 fffffd7fff0610b9 cond_wait_queue () + 59
 fffffd7fff061647 _cond_wait () + 57
 fffffd7fff061676 cond_wait () + 26
 fffffd7fff0616b9 pthread_cond_wait () + 9
 0000000000472cc2 ap_queue_pop () + 72
 000000000047032d worker_thread () + 11d
 fffffd7fff06727b _thr_setup () + 5b
 fffffd7fff0674b0 _lwp_start ()
-----------------  lwp# 4 / thread# 4  --------------------
 fffffd7fff067527 lwp_park (0, 0, 0)
 fffffd7fff0610b9 cond_wait_queue () + 59
 fffffd7fff061647 _cond_wait () + 57
 fffffd7fff061676 cond_wait () + 26
 fffffd7fff0616b9 pthread_cond_wait () + 9
 0000000000472cc2 ap_queue_pop () + 72
 000000000047032d worker_thread () + 11d
 fffffd7fff06727b _thr_setup () + 5b
 fffffd7fff0674b0 _lwp_start ()

---SNIP---
...lots more threads in lwp_park(0, 0, 0)...
---SNIP---

-----------------  lwp# 28 / thread# 28  --------------------
 fffffd7fff06ce2a lwp_mutex_timedlock (fffffd7ffeee0000, 0)
 fffffd7fff05fb78 mutex_lock_internal () + 328
 fffffd7fff05ff62 mutex_lock_impl () + 112
 fffffd7fff06002b mutex_lock () + b
 fffffd7fff26e5a5 proc_mutex_proc_pthread_acquire () + 15
 000000000046ff4c listener_thread () + 3bc
 fffffd7fff06727b _thr_setup () + 5b
 fffffd7fff0674b0 _lwp_start ()

It appears that join_workers() is hanging on a call to apr_thread_join(...), in line 1104 of worker.c.
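
To make my reading of the trace concrete, here is a minimal sketch of the pattern I think I'm seeing. It is not httpd code: demo_queue, demo_queue_pop(), demo_queue_term(), demo_worker() and demo_shutdown() are all hypothetical names of mine, and I'm assuming the idle workers only leave ap_queue_pop() once some other thread marks the queue as finished and wakes them. If the thread responsible for that is itself stuck (as the listener seems to be in lwp# 28), the join in the main thread would never return.

```c
#include <assert.h>
#include <pthread.h>

#define DEMO_MAX_WORKERS 16

/* Hypothetical stand-in for the worker MPM's connection queue. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int             nelts;       /* queued connections (always 0 in this demo) */
    int             terminated;  /* set once at shutdown */
} demo_queue;

/* Block until work arrives or the queue is terminated.
 * Returns 0 when an item was taken, -1 on termination. */
static int demo_queue_pop(demo_queue *q)
{
    int rc = -1;

    pthread_mutex_lock(&q->lock);
    while (q->nelts == 0 && !q->terminated) {
        /* Idle workers park here, like the lwp_park frames in the trace. */
        pthread_cond_wait(&q->not_empty, &q->lock);
    }
    if (!q->terminated) {
        q->nelts--;
        rc = 0;
    }
    pthread_mutex_unlock(&q->lock);
    return rc;
}

/* Mark the queue finished and wake every waiting worker. */
static void demo_queue_term(demo_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->terminated = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

static void *demo_worker(void *arg)
{
    demo_queue *q = arg;

    while (demo_queue_pop(q) == 0) {
        /* a real worker would process a connection here */
    }
    return NULL;
}

/* Start nworkers idle workers, terminate the queue, then join them.
 * Returns the number of workers successfully joined. */
int demo_shutdown(int nworkers)
{
    demo_queue q;
    pthread_t tid[DEMO_MAX_WORKERS];
    int i, started = 0, joined = 0;

    assert(nworkers >= 0 && nworkers <= DEMO_MAX_WORKERS);
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    q.nelts = 0;
    q.terminated = 0;

    for (i = 0; i < nworkers; i++) {
        if (pthread_create(&tid[started], NULL, demo_worker, &q) == 0) {
            started++;
        }
    }

    /* If this call is skipped (or never reached), every worker stays in
     * pthread_cond_wait() and the joins below block forever. */
    demo_queue_term(&q);

    for (i = 0; i < started; i++) {
        if (pthread_join(tid[i], NULL) == 0) {
            joined++;
        }
    }

    pthread_cond_destroy(&q.not_empty);
    pthread_mutex_destroy(&q.lock);
    return joined;
}
```

Calling demo_shutdown(4) from a main() returns once all four workers have been joined. Commenting out the demo_queue_term() call leaves the main thread stuck in pthread_join(), which looks like lwp# 1 above. Whether that is really what happens inside httpd is exactly what I'm unsure about.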


HTTPD was compiled with Solaris's default GCC (3.4.3), with the following flags:

CFLAGS="-O3 -m64 -march=athlon64"
LDFLAGS="-R$INSTALL_SSL/lib -L$INSTALL_SSL/lib"
./configure -C \
                --prefix=$INSTALL \
                --enable-mods-shared="deflate expires headers proxy proxy-ajp proxy-balancer proxy-connect proxy-http rewrite ssl usertrack dav status log-config logio" \
                --with-ssl=$INSTALL_SSL \
                --with-mpm=worker \
                --enable-nonportable-atomics


Any thoughts? Is there any other information I can provide to help diagnose this issue?


Many thanks,
Scott Severtson
