Just a follow-up: upgrading to 2.2.21 has cleared this issue, and we are
no longer experiencing zombies.
Thank you!
Jeff


-----Original Message-----
From: Rainer Jung [mailto:rainer.j...@kippdata.de] 
Sent: Saturday, September 10, 2011 3:45 AM
To: users@httpd.apache.org
Cc: Martin, Jeff
Subject: Re: [users@httpd] HELP: apache 2.2.17 creating zombies that are
increasing server load

Hi Jeff,

Thanks for the detailed information.

The observed zombies are threads in Apache child processes. Those
processes (here PID 16042) are actually in the process of shutting down,
either due to a web server restart or due to MPM configuration (like
MaxRequestsPerChild or spare process configuration).

Unfortunately, one of the threads falls into a non-terminating loop during
shutdown, which consumes lots of CPU and prevents the process from
exiting. So the real problem is this looping thread:

> -----------------  lwp# 24 / thread# 24  --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)
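
For anyone reading this later with a long pstack dump, the idea above (ignore the exited threads, look at what is still running) can be sketched in a few lines of Python. This is only a rough sketch, not an official tool: the function names and the embedded sample are made up for illustration, and it assumes raw pstack output without the "> " mail quoting.

```python
import re

# Shortened pstack excerpt taken from the output in this thread
# (long frame lines joined, most frames omitted).
PSTACK_SAMPLE = """\
-----------------  lwp# 1 / thread# 1  --------------------
feccd210 lwp_wait (18, ffbff7d4)
00099f28 join_workers (54, 1b3ee8, 99ab8, 19a810, 0, 1) + ec
-----------------  lwp# 24 / thread# 24  --------------------
ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
-----------------  lwp# 25 / thread# 25  --------------------
ff020634 dummy_worker(), exit value = 0x00000000
        ** zombie (exited, not detached, not yet joined) **
"""

# Matches the per-thread separator line that pstack prints.
HEADER = re.compile(r"^-+\s+lwp#\s+(\d+)\s+/\s+thread#\s+(\d+)\s+-+$")


def split_threads(pstack_text):
    """Return {lwp_id: [non-empty lines of that thread's section]}."""
    threads = {}
    current = None
    for line in pstack_text.splitlines():
        stripped = line.strip()
        match = HEADER.match(stripped)
        if match:
            current = int(match.group(1))
            threads[current] = []
        elif current is not None and stripped:
            threads[current].append(stripped)
    return threads


def live_threads(pstack_text):
    """Return the LWP ids whose section is not marked as a zombie."""
    return sorted(
        lwp
        for lwp, lines in split_threads(pstack_text).items()
        if not any("** zombie" in entry for entry in lines)
    )


def top_frame(pstack_text, lwp):
    """Return the function name of the innermost frame of one LWP."""
    first_line = split_threads(pstack_text)[lwp][0]
    return first_line.split()[1]


if __name__ == "__main__":
    for lwp in live_threads(PSTACK_SAMPLE):
        print(lwp, top_frame(PSTACK_SAMPLE, lwp))
```

On the sample this leaves LWP 1 (the main thread waiting in the join) and LWP 24. A single snapshot cannot prove a loop by itself; what points at LWP 24 is the per-LWP listing in the original post, where httpd/24 has accumulated 3:52:05 of CPU time.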

Problems like that are unfortunately not easy to debug.

Do you use any third-party modules that did not come bundled with
Apache? Your config doesn't indicate it, but I'm asking to double check,
because e.g. "pfiles" lists OpenSSL libs without mod_ssl being loaded in
the config. It might be that you compiled modules into httpd statically.

Any error message in the error_log?

Can you reproduce the problem? Even on a test system?

Although I'm not aware of any directly related fixes, a good first step
might be to switch to 2.2.20 (or 2.2.21, which will likely be released
in a few days) and apr 1.4.5 / apr-util 1.3.12 in order to start debugging
from recent versions.
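
If it helps when checking several instances, here is a small Python sketch that compares the versions reported by "httpd -V" against the ones suggested above. It is only an illustration: the helper names are made up, the baseline is simply the versions named in this mail, and the sample text is copied from the "httpd -V" output in the original post.

```python
import re

# Baseline taken from the suggestion in this mail (an assumption for the
# sketch, not an official minimum).
SUGGESTED = {"httpd": (2, 2, 20), "apr": (1, 4, 5), "apr-util": (1, 3, 12)}

# Excerpt of the `httpd -V` output from the original post.
HTTPD_V_SAMPLE = """\
Server version: Apache/2.2.17 (Unix)
Server built:   Mar 16 2011 16:19:54
Server loaded:  APR 1.4.2, APR-Util 1.3.10
Compiled using: APR 1.4.2, APR-Util 1.3.10
"""


def _version(text):
    """Turn '2.2.17' into (2, 2, 17) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def parse_versions(httpd_v_text):
    """Extract the httpd, APR and APR-Util versions that are loaded."""
    found = {}
    match = re.search(
        r"^Server version:\s+Apache/(\d+(?:\.\d+)+)", httpd_v_text, re.M
    )
    if match:
        found["httpd"] = _version(match.group(1))
    match = re.search(
        r"^Server loaded:\s+APR (\d+(?:\.\d+)+), APR-Util (\d+(?:\.\d+)+)",
        httpd_v_text,
        re.M,
    )
    if match:
        found["apr"] = _version(match.group(1))
        found["apr-util"] = _version(match.group(2))
    return found


def outdated(httpd_v_text, baseline=SUGGESTED):
    """Return the component names that are older than the baseline."""
    found = parse_versions(httpd_v_text)
    return sorted(
        name
        for name, minimum in baseline.items()
        if name in found and found[name] < minimum
    )


if __name__ == "__main__":
    print(outdated(HTTPD_V_SAMPLE))
```

For the instance in question all three components come out as older than the suggested versions. The sketch looks at the "Server loaded" line rather than "Compiled using", since the loaded libraries are the ones actually running.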

Regards,

Rainer

On 07.09.2011 22:59, Martin, Jeff wrote:
> Hello,
> I have a Solaris 10 server running apache 2.2.17 and on a weekly basis
> it's creating zombies and increasing the load to the point where we have
> to restart it every Thursday night. There are 6 apache instances running
> on this box but this is the only one seeing the issue. There have been
> no changes to the box that I or the developers are aware of.
> I've included a lot of output as I'm not sure what will be helpful and
> what won't. Any info or steps to resolve this are most appreciated. TIA.
> Jeff
> 
> bash-3.00# ulimit -a
> core file size        (blocks, -c) unlimited
> data seg size         (kbytes, -d) unlimited
> file size             (blocks, -f) unlimited
> open files                    (-n) 256
> pipe size          (512 bytes, -p) 10
> stack size            (kbytes, -s) 8192
> cpu time             (seconds, -t) unlimited
> max user processes            (-u) 29995
> virtual memory        (kbytes, -v) unlimited
> 
> bash-3.00# netstat -an|grep 172.23.181.34.80|wc -l
>     3438
> 
> bash-3.00# uptime
>   1:43pm  up 343 day(s),  2:59,  2 users,  load average: 4.41, 4.50, 4.39
> 
> SunOS 5.10 Generic_142909-17 sun4v sparc SUNW,SPARC-Enterprise-T5120
> 
> httpd.conf
> ServerRoot "/web/apache2-prod-showcase_second"
> 
> Listen 172.23.181.34:80
> 
> LoadModule headers_module modules/mod_headers.so
> LoadModule rewrite_module modules/mod_rewrite.so
> 
> <IfModule !mpm_netware_module>
> <IfModule !mpm_winnt_module>
> 
> User csdrd
> Group daemon
> 
> </IfModule>
> </IfModule>
> 
> ServerAdmin webmas...@xx.xxxxx.com
> 
> ServerName xx.xxxxx.com
> 
> DocumentRoot "/apps/doc-root"
> 
> ErrorLog "logs/error_log"
> LogLevel warn
> 
> DefaultType text/plain
> 
> # Cache control
> ExpiresActive   On
> ExpiresByType   image/gif       "access plus 1 weeks"
> ExpiresByType   image/jpg       "access plus 1 weeks"
> ExpiresByType   image/jpeg       "access plus 1 weeks"
> ExpiresByType   application/x-shockwave-flash       "access plus 1 weeks"
> ExpiresByType   image/png       "access plus 1 weeks"
> FileETag none
> 
> ProxyRequests Off
> ProxyPreserveHost On
> 
> <Proxy *>
>         Order deny,allow
>         Deny from all
>         Allow from all
> </Proxy>
> 
> ProxyPass /showcase/explore balancer://exploreutc stickysession=JSESSIONID|jsessionid timeout=5 lbmethod=byrequests nofailover=Off
> # Port 8180 service bind
> <Proxy balancer://exploreutc>
>         BalancerMember http://172.22.81.99:8080/utc route=host3
>         BalancerMember http://172.22.81.100:8080/utc route=host4
>         BalancerMember http://172.22.81.99:8180/utc route=host3a
>         BalancerMember http://172.22.81.100:8180/utc route=host4a
> </Proxy>
> 
> <Directory />
>     Options FollowSymLinks
>     AllowOverride None
>     Order deny,allow
>     Deny from all
> </Directory>
> 
> <Directory "/apps/doc-root">
>     Options FollowSymLinks
>     AllowOverride All
>     Order allow,deny
>     Allow from all
> </Directory>
> 
> <Directory "/web/apache2-prod-showcase_second/cgi-bin">
>     AllowOverride None
>     Options None
>     Order allow,deny
>     Allow from all
> </Directory>
> 
> <FilesMatch "^\.ht">
>     Order allow,deny
>     Deny from all
>     Satisfy All
> </FilesMatch>
> 
> <IfModule dir_module>
>     DirectoryIndex index_explore.html
> </IfModule>
> 
> <IfModule log_config_module>
>     LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
>     LogFormat "%h %l %u %t \"%r\" %>s %b" common
>      
>     <IfModule logio_module>
>       # You need to enable mod_logio.c to use %I and %O
>       LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\" %I %O" combinedio
>     </IfModule>
> </IfModule>
> 
> <IfModule alias_module>
>     ScriptAlias /cgi-bin/ "/web/apache2-prod-showcase_second/cgi-bin/"
> </IfModule>
> 
> <IfModule cgid_module>
> </IfModule>
> 
> <IfModule mime_module>
>     TypesConfig conf/mime.types
>     AddType application/x-compress .Z
>     AddType application/x-gzip .gz .tgz
> </IfModule>
> 
> <IfModule ssl_module>
> SSLRandomSeed startup builtin
> SSLRandomSeed connect builtin
> </IfModule>
> 
> bash-3.00# ./httpd -S
> VirtualHost configuration:
> Syntax OK
> 
> bash-3.00# ./httpd -V
> Server version: Apache/2.2.17 (Unix)
> Server built:   Mar 16 2011 16:19:54
> Server's Module Magic Number: 20051115:25
> Server loaded:  APR 1.4.2, APR-Util 1.3.10
> Compiled using: APR 1.4.2, APR-Util 1.3.10
> Architecture:   32-bit
> Server MPM:     Worker
>   threaded:     yes (fixed thread count)
>     forked:     yes (variable process count)
> Server compiled with....
> -D APACHE_MPM_DIR="server/mpm/worker"
> -D APR_HAS_SENDFILE
> -D APR_HAS_MMAP
> -D APR_HAVE_IPV6 (IPv4-mapped addresses enabled)
> -D APR_USE_PROC_PTHREAD_SERIALIZE
> -D APR_USE_PTHREAD_SERIALIZE
> -D SINGLE_LISTEN_UNSERIALIZED_ACCEPT
> -D APR_HAS_OTHER_CHILD
> -D AP_HAVE_RELIABLE_PIPED_LOGS
> -D DYNAMIC_MODULE_LIMIT=128
> -D HTTPD_ROOT="/web/apache2-prod-showcase_second"
> -D SUEXEC_BIN="/web/apache2-prod-showcase_second/bin/suexec"
> -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
> -D DEFAULT_ERRORLOG="logs/error_log"
> -D AP_TYPES_CONFIG_FILE="conf/mime.types"
> -D SERVER_CONFIG_FILE="conf/httpd.conf"
> 
>  bash-3.00# pstack 7619
> 7619:   /web/apache2-prod-showcase_second/bin/httpd -k start
> fecccdbc pollsys  (ffbff868, 0, ffbff8d0, 0)
> fec68590 pselect  (ffbff868, fed34728, fed34728, 0, ffbff8d0, 0) + 1c8
> fec68908 select   (0, 0, 0, 0, ffbff938, 0) + a0
> ff0219b8 apr_sleep (0, f4240, ffbffa4c, 0, eb610, 11176) + 4c
> 0004aadc ap_wait_or_timeout (ffbffa4c, ffbffa48, ffbffad0, eb610, dd000, e0400) + 60
> 0009a764 ap_mpm_run (fead01d8, 1c, 0, 20, 6, 3e) + 218
> 0002fcc8 main     (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start   (0, 0, 0, 0, 0, 0) + 5c
> 
> 16042 csdrd      20M   16M cpu20   50    0   3:52:05 3.1% httpd/24
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/65
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/64
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/63
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/62
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/61
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/60
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/59
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/58
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/57
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/56
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/55
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/54
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/53
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/52
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/51
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/50
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/49
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/48
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/47
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/46
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/45
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/44
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/43
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/42
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/41
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/40
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/39
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/38
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/37
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/36
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/35
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/34
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/33
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/32
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/31
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/30
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/29
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/28
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/27
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/26
> 16042 csdrd      20M   16M zombie   0    -   0:00:00 0.0% httpd/25
> 16042 csdrd      20M   16M sleep   59    0   0:00:00 0.0% httpd/1
> 
> bash-3.00# pstack 16042
> 16042:  /web/apache2-prod-showcase_second/bin/httpd -k start
> -----------------  lwp# 1 / thread# 1  --------------------
> feccd210 lwp_wait (18, ffbff7d4)
> fecc60f0 _thrp_join (18, 0, ffbff83c, 1, ffbff7d4, fed35900) + 34
> ff020778 apr_thread_join (ffbff8bc, 19aef8, 2, 0, 1, c6bf0) + c
> 00099f28 join_workers (54, 1b3ee8, 99ab8, 19a810, 0, 1) + ec
> 0009a27c child_main (7, 98e0c, 0, 0, fed35960, ff172a00) + 270
> 0009a45c make_child (ddc00, 7, 1, e0c00, dd000, e0400) + 128
> 0009ac8c ap_mpm_run (fead0198, 18, 0, 20, 1, 15) + 740
> 0002fcc8 main     (eb610, db400, ddc00, ddc00, e9608, 0) + 76c
> 0002f08c _start   (0, 0, 0, 0, 0, 0) + 5c
> -----------------  lwp# 24 / thread# 24  --------------------
> ff1577dc apr_brigade_cleanup (a5a500, 0, 10c0c, fec6367c, fee58624, a5a4f0) + 18
> ff014ab8 run_cleanups (a39a80, 0, 4, 0, 1, a65b00) + 20
> ff015b94 apr_pool_destroy (a39a70, a35aa0, ff017ddc, 0, de520, 0) + 38
> ff015dec apr_pool_clear (a35a60, a35aa0, a35aa0, 1d5, 0, 19ab58) + 1c
> 00099a2c worker_thread (19aef8, 7, 0, e0400, e0400, 54) + 230
> ff020640 dummy_worker (19aef8, fd47c000, 0, 0, ff020634, 1) + c
> fecc94f0 _lwp_start (0, 0, 0, 0, 0, 0)
> -----------------  lwp# 25 / thread# 25  --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
>         ** zombie (exited, not detached, not yet joined) **
> -----------------  lwp# 26 / thread# 26  --------------------
> ff020634 dummy_worker(), exit value = 0x00000000
>        ** zombie (exited, not detached, not yet joined) **
> <SNIP more of the same.....>
> 
> bash-3.00# pfiles 16042
> 16042:  /web/apache2-prod-showcase_second/bin/httpd -k start
>   Current rlimit: 65536 file descriptors
>    0: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>       O_RDONLY
>       /devices/pseudo/mm@0:null
>    1: S_IFCHR mode:0666 dev:348,0 ino:6815752 uid:0 gid:3 rdev:13,2
>       O_WRONLY|O_CREAT|O_TRUNC
>       /devices/pseudo/mm@0:null
>    2: S_IFREG mode:0644 dev:32,26 ino:110758 uid:0 gid:0 size:570041
>       O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE
>       /web/apache2-prod-showcase_second/logs/error_log
>    4: S_IFDOOR mode:0444 dev:357,0 ino:42 uid:0 gid:0 size:0
>       O_RDONLY|O_LARGEFILE FD_CLOEXEC  door to pid -1
>    5: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>       O_RDWR FD_CLOEXEC
>    6: S_IFIFO mode:0000 dev:346,0 ino:2614440 uid:0 gid:0 size:0
>       O_RDWR FD_CLOEXEC
>    7: S_IFREG mode:0644 dev:32,26 ino:110763 uid:0 gid:0 size:1240942649
>       O_WRONLY|O_APPEND|O_CREAT|O_LARGEFILE FD_CLOEXEC
>       /web/apache2-prod-showcase_second/logs/access_log
>   18: S_IFSOCK mode:0666 dev:355,0 ino:10041 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 50949
>         peername: AF_INET 172.22.81.100  port: 8180
>   24: S_IFSOCK mode:0666 dev:355,0 ino:40577 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 51076
>         peername: AF_INET 172.22.81.99  port: 8180
>   48: S_IFSOCK mode:0666 dev:355,0 ino:41083 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 50927
>         peername: AF_INET 172.22.81.100  port: 8180
>   49: S_IFSOCK mode:0666 dev:355,0 ino:27268 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 51025
>         peername: AF_INET 172.22.81.99  port: 8180
>   51: S_IFSOCK mode:0666 dev:355,0 ino:10997 uid:0 gid:0 size:0
>       O_RDWR|O_NONBLOCK FD_CLOEXEC
>         SOCK_STREAM
>         SO_SNDBUF(49152),SO_RCVBUF(49640),IP_NEXTHOP(0.0.193.232)
>         sockname: AF_INET 172.22.81.122  port: 50900
>         peername: AF_INET 172.22.81.99  port: 8180
>            
> bash-3.00# ldd httpd
>         libssl.so.1.0.0 =>       /usr/local/ssl/lib/libssl.so.1.0.0
>         libcrypto.so.1.0.0 =>    /usr/local/ssl/lib/libcrypto.so.1.0.0
>         libdl.so.1 =>    /lib/libdl.so.1
>         libm.so.2 =>     /lib/libm.so.2
>         libaprutil-1.so.0 =>     /web/apache2-prod-showcase_second/lib/libaprutil-1.so.0
>         libexpat.so.0 =>         /web/apache2-prod-showcase_second/lib/libexpat.so.0
>         libiconv.so.2 =>         /usr/local/lib/libiconv.so.2
>         libapr-1.so.0 =>         /web/apache2-prod-showcase_second/lib/libapr-1.so.0
>         libuuid.so.1 =>  /lib/libuuid.so.1
>         libsendfile.so.1 =>      /lib/libsendfile.so.1
>         librt.so.1 =>    /lib/librt.so.1
>         libsocket.so.1 =>        /lib/libsocket.so.1
>         libnsl.so.1 =>   /lib/libnsl.so.1
>         libpthread.so.1 =>       /lib/libpthread.so.1
>         libc.so.1 =>     /lib/libc.so.1
>         libgcc_s.so.1 =>         /usr/local/lib/libgcc_s.so.1
>         libaio.so.1 =>   /lib/libaio.so.1
>         libmd.so.1 =>    /lib/libmd.so.1
>         libmp.so.2 =>    /lib/libmp.so.2
>         libscf.so.1 =>   /lib/libscf.so.1
>         libdoor.so.1 =>  /lib/libdoor.so.1
>         libuutil.so.1 =>         /lib/libuutil.so.1
>         libgen.so.1 =>   /lib/libgen.so.1
>         /platform/SUNW,SPARC-Enterprise-T5120/lib/libc_psr.so.1
>         /platform/SUNW,SPARC-Enterprise-T5120/lib/libmd_psr.so.1

