Bug#1057126: "AH03490: scoreboard is full" after nightly maintenance

2023-12-14 Thread Peter Krefting

Hi!

Stefan Fritsch:

Some processes are in "stopping" state but are not dying. They accumulate 
until the scoreboard is full.


First you should verify with ps if the processes with the PIDs in the "(old 
gen)" lines still exist. If not, it is a bug in apache itself (probably in 
mpm_event).


They do indeed still exist in the process table. Just looking at the 
command lines doesn't tell me much as they are all apache2 
forks/threads, but see below:


  # apachectl status 2>&1 |awk '/old gen/ { print "tr \"\\0\" \" \" < /proc/" $2 
"/cmdline; echo" }'  | sh
  /usr/sbin/apache2 -k start
  /usr/sbin/apache2 -k start
  /usr/sbin/apache2 -k start
  /usr/sbin/apache2 -k start
  /usr/sbin/apache2 -k start
  /usr/sbin/apache2 -k start

If they do, one needs to find out why they are not dying. This is likely the 
fault of some module.


The server is running Trac with its default configuration, there might 
be something that has changed there?


  # dpkg -l trac
  Desired=Unknown/Install/Remove/Purge/Hold
  | Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Tri
  |/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
  ||/ Name   Version  Architecture Description
  +++-==---===
  ii  trac   1.6-2all  Enhanced wiki and issue tra

The processes do indeed seem to be from Trac, here is one example:

  # ls -l /proc/210869/fd
  total 0
  lr-x-- 1 root root 64 dec 14 09:19 0 -> /dev/null
  l-wx-- 1 root root 64 dec 14 09:19 1 -> /dev/null
  lrwx-- 1 root root 64 dec 14 09:19 10 -> 'anon_inode:[eventpoll]'
  lr-x-- 1 root root 64 dec 14 09:19 11 -> 'pipe:[35832724]'
  l-wx-- 1 root root 64 dec 14 09:19 12 -> 'pipe:[35832724]'
  l-wx-- 1 root root 64 dec 14 09:19 14 -> /srv/trac/log/trac.log
  lrwx-- 1 root root 64 dec 14 09:19 18 -> /srv/trac/db/trac.db
  l-wx-- 1 root root 64 dec 14 09:19 2 -> '/var/log/apache2/error.log.1 
(deleted)'
  lrwx-- 1 root root 64 dec 14 09:19 21 -> /srv/trac/db/trac.db
  lrwx-- 1 root root 64 dec 14 09:19 3 -> 'socket:[35833027]'
  lr-x-- 1 root root 64 dec 14 09:19 5 -> 'pipe:[35832707]'
  l-wx-- 1 root root 64 dec 14 09:19 6 -> 'pipe:[35832707]'
  l-wx-- 1 root root 64 dec 14 09:19 7 -> 
/var/log/apache2/other_vhosts_access.log
  l-wx-- 1 root root 64 dec 14 09:19 8 -> '/var/log/apache2/access.log.1 
(deleted)'
  l-wx-- 1 root root 64 dec 14 09:19 9 -> '/var/log/apache2/access.log.1 
(deleted)'

Should I reassign the bug to Trac instead?

--
\\// Peter - http://www.softwolves.pp.se/



Bug#1057126: "AH03490: scoreboard is full" after nightly maintenance

2023-12-12 Thread Stefan Fritsch

Hi,

Am 30.11.23 um 09:05 schrieb Peter Krefting:

we are experiencing that the Apache httpd locks up, filling the 
error.log with errors after the nightly maintenance (not every night, 
though):


   [Wed Nov 29 00:00:01.922731 2023] [mpm_event:notice] [pid 62346:tid 
139841215223680] AH00489: Apache/2.4.57 (Debian) mod_wsgi/4.9.4 
Python/3.11 configured -- resuming normal operations
   [Wed Nov 29 00:00:01.922790 2023] [core:notice] [pid 62346:tid 
139841215223680] AH00094: Command line: '/usr/sbin/apache2'
   [Wed Nov 29 00:00:03.924683 2023] [mpm_event:error] [pid 62346:tid 
139841215223680] AH03490: scoreboard is full, not at 
MaxRequestWorkers.Increase ServerLimit.
   [Wed Nov 29 00:00:04.925780 2023] [mpm_event:error] [pid 62346:tid 
139841215223680] AH03490: scoreboard is full, not at 
MaxRequestWorkers.Increase ServerLimit.


(etc)

I set up a nightly job to mail me the output of "apachectl status", and 
this is the contents reported at Nov 29 00:00:00, i.e. just before the 
nightly maintenance:


Some processes are in "stopping" state but are not dying. They 
accumulate until the scoreboard is full.


First you should verify with ps if the processes with the PIDs in the 
"(old gen)" lines still exist. If not, it is a bug in apache itself 
(probably in mpm_event).


If they do, one needs to find out why they are not dying. This is likely 
the fault of some module. Third-party modules that do not come with 
apache itself are more likely to be the culprit. Since you are using 
mod_wsgi, that is a likely candidate. Look in the error log for the last 
log lines of the processes with those PIDs. Maybe that gives some hint. 
Or maybe mod_wsgi writes some logs itself, I am not familiar with that one.







   Apache Server Status for localhost (via ::1)

    Server Version: Apache/2.4.57 (Debian) mod_wsgi/4.9.4 Python/3.11
    Server MPM: event
    Server Built: 2023-04-13T03:26:51
  __

    Current Time: Wednesday, 29-Nov-2023 00:00:01 CET
    Restart Time: Tuesday, 21-Nov-2023 08:51:55 CET
    Parent Server Config. Generation: 8
    Parent Server MPM Generation: 7
    Server uptime: 7 days 15 hours 8 minutes 6 seconds
    Server load: 0.01 0.02 0.00
    Total accesses: 34860 - Total Traffic: 717.8 MB - Total Duration:
   2289464

    CPU Usage: u297.08 s30.61 cu.01 cs.07 - .0497% CPU load
    .0529 requests/sec - 1141 B/second - 21.1 kB/request - 65.676
   ms/request

    2 requests currently being processed, 48 idle workers

    Slot PID Stopping Connections Threads Async connections
    total accepting busy idle writing keep-alive closing
    0 62348 yes (old gen) 0 no 0 0 0 0 0
    1 62350 yes (old gen) 0 no 0 0 0 0 0
    2 66497 yes (old gen) 0 no 0 0 0 0 0
    3 66498 yes (old gen) 0 no 0 0 0 0 0
    4 73089 yes (old gen) 0 no 0 0 0 0 0
    5 73090 yes (old gen) 0 no 0 0 0 0 0
    6 79644 yes (old gen) 0 no 0 0 0 0 0
    7 79645 yes (old gen) 0 no 0 0 0 0 0
    8 86126 yes (old gen) 0 no 0 0 0 0 0
    9 86127 yes (old gen) 0 no 0 0 0 0 0
    10 92669 yes (old gen) 0 no 0 0 0 0 0
    11 92670 yes (old gen) 0 no 0 0 0 0 0
    12 99203 yes (old gen) 0 no 0 0 0 0 0
    13 99204 yes (old gen) 0 no 0 0 0 0 0
    14 105761 no 0 yes 0 25 0 0 0
    15 105762 no 0 yes 2 23 0 0 0
    Sum 16 14 0   2 48 0 0 0



.._W
W___

    Scoreboard Key:
    "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
    "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
    "C" Closing connection, "L" Logging, "G" Gracefully finishing,
    "I" Idle cleanup of worker, "." Open slot with no current process
'/usr/bin/lynx -dump http://localhost:80/server-status' failed.
Maybe you need to install a package providing www-browser or you
need to adjust the APACHE_LYNX variable in /etc/apache2/envvars




Cheers,
Stefan



Bug#1057126: "AH03490: scoreboard is full" after nightly maintenance

2023-11-30 Thread Peter Krefting

Package: apache2
Version: 2.4.57-2
Severity: normal

Dear Maintainer,

we are experiencing that the Apache httpd locks up, filling the 
error.log with errors after the nightly maintenance (not every night, 
though):


  [Wed Nov 29 00:00:01.922731 2023] [mpm_event:notice] [pid 62346:tid 
139841215223680] AH00489: Apache/2.4.57 (Debian) mod_wsgi/4.9.4 Python/3.11 
configured -- resuming normal operations
  [Wed Nov 29 00:00:01.922790 2023] [core:notice] [pid 62346:tid 
139841215223680] AH00094: Command line: '/usr/sbin/apache2'
  [Wed Nov 29 00:00:03.924683 2023] [mpm_event:error] [pid 62346:tid 
139841215223680] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase 
ServerLimit.
  [Wed Nov 29 00:00:04.925780 2023] [mpm_event:error] [pid 62346:tid 
139841215223680] AH03490: scoreboard is full, not at MaxRequestWorkers.Increase 
ServerLimit.

(etc)

I set up a nightly job to mail me the output of "apachectl status", 
and this is the contents reported at Nov 29 00:00:00, i.e. just before 
the nightly maintenance:


  Apache Server Status for localhost (via ::1)

   Server Version: Apache/2.4.57 (Debian) mod_wsgi/4.9.4 Python/3.11
   Server MPM: event
   Server Built: 2023-04-13T03:26:51
 __

   Current Time: Wednesday, 29-Nov-2023 00:00:01 CET
   Restart Time: Tuesday, 21-Nov-2023 08:51:55 CET
   Parent Server Config. Generation: 8
   Parent Server MPM Generation: 7
   Server uptime: 7 days 15 hours 8 minutes 6 seconds
   Server load: 0.01 0.02 0.00
   Total accesses: 34860 - Total Traffic: 717.8 MB - Total Duration:
  2289464

   CPU Usage: u297.08 s30.61 cu.01 cs.07 - .0497% CPU load
   .0529 requests/sec - 1141 B/second - 21.1 kB/request - 65.676
  ms/request

   2 requests currently being processed, 48 idle workers

   Slot PID Stopping Connections Threads Async connections
   total accepting busy idle writing keep-alive closing
   0 62348 yes (old gen) 0 no 0 0 0 0 0
   1 62350 yes (old gen) 0 no 0 0 0 0 0
   2 66497 yes (old gen) 0 no 0 0 0 0 0
   3 66498 yes (old gen) 0 no 0 0 0 0 0
   4 73089 yes (old gen) 0 no 0 0 0 0 0
   5 73090 yes (old gen) 0 no 0 0 0 0 0
   6 79644 yes (old gen) 0 no 0 0 0 0 0
   7 79645 yes (old gen) 0 no 0 0 0 0 0
   8 86126 yes (old gen) 0 no 0 0 0 0 0
   9 86127 yes (old gen) 0 no 0 0 0 0 0
   10 92669 yes (old gen) 0 no 0 0 0 0 0
   11 92670 yes (old gen) 0 no 0 0 0 0 0
   12 99203 yes (old gen) 0 no 0 0 0 0 0
   13 99204 yes (old gen) 0 no 0 0 0 0 0
   14 105761 no 0 yes 0 25 0 0 0
   15 105762 no 0 yes 2 23 0 0 0
   Sum 16 14 0   2 48 0 0 0



.._W
W___

   Scoreboard Key:
   "_" Waiting for Connection, "S" Starting up, "R" Reading Request,
   "W" Sending Reply, "K" Keepalive (read), "D" DNS Lookup,
   "C" Closing connection, "L" Logging, "G" Gracefully finishing,
   "I" Idle cleanup of worker, "." Open slot with no current process
'/usr/bin/lynx -dump http://localhost:80/server-status' failed.
Maybe you need to install a package providing www-browser or you
need to adjust the APACHE_LYNX variable in /etc/apache2/envvars


The server is fairly lightly loaded, running Trac 1.6.

While looking for information on the subject, I found this thread on 
Reddit, noting the exact same problem, also being tied to the nightly 
maintenance:


  https://www.reddit.com/r/debian/comments/15stmn7

Weird thing is, it happens right after a reload of apache that happens 
at midnight. So it's not a case of heavy usage (this site is barely 
used at all during the day, never mind at night). In fact there were 
no accesses between the time it reloaded the service and the error 
started.


-- Package-specific info:

-- System Information:
Debian Release: 12.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable-security'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 6.1.0-13-amd64 (SMP w/2 CPU threads; PREEMPT)
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages apache2 depends on:
ii  apache2-bin2.4.57-2
ii  apache2-data   2.4.57-2
ii  apache2-utils  2.4.57-2
ii  init-system-helpers1.65.2
ii  lsb-base   11.6
ii  media-types10.0.0
ii  perl   5.36.0-7
ii  procps 2:4.0.2-3
ii  sysvinit-utils [lsb-base]  3.06-4

Versions of packages apache2 recommends:
ii  ssl-cert  1.1.2

Versions of packages apache2 suggests:
pn  apache2-doc  
pn  apache2-suexec-pristine | apache2-suexec-custom  
ii  lynx [www-browser]   2.9.0dev.12-1

Versions of packages apache2-bin depends on:
ii  libapr1  1.7.2-3
ii  libaprutil1  1.6.3-1
ii