Hi,

Short version:

I use httpd on Windows as a reverse proxy for a microservice system. Some
services communicate over WebSockets (more precisely: SignalR). From time
to time I have to restart the server in order to read a new configuration.
I observe an increasing number of threads blocked by the SignalR
connections. It's a matter of time until the server completely freezes
because no threads are available for other requests.

Details:

I reduced my system as much as possible and ended up with two
microservices, A and B. A has a SignalR hub. Both A and B subscribe to the
events of this hub. Thus, there should be two connections.

Now the experiment:

1. Start the two microservices: They repeatedly try to connect, but fail.
This is expected, because they are configured to connect via the reverse
proxy and httpd is not running yet.
2. Start httpd (Windows Service): As expected, both services establish
their connection, confirmed by the service logs and mod_status showing 2
connections.
3. Restart httpd: In production, I call
    httpd.exe -n "ServiceName" -k restart
   programmatically. For this experiment, I call it from PowerShell. What
happens?
   3a. The parent starts a new child and hands over 2 sockets, see
error.log on Pastebin
   3b. The parent needs to stop the old child. The old child cannot stop
because of the open connections. The old child waits for a grace period of
30s, then it terminates the 2 threads. My services log that their
connection was disconnected and attempt to reconnect. At this moment, 2
more connections appear in mod_status. However, I don't see any socket
handover in error.log.
4. Repeat httpd restart.
   4a. The parent starts a new child and hands over 2 sockets, see
error.log. It's still 2 sockets, although I saw 4 connections in mod_status
in the previous step.
   4b. The parent shuts down the old child. This time, there is no grace
period, but 18(!) threads that failed to exit are terminated, see
error.log. Both services log a disconnect and a reconnect. However, no
additional connections appear in mod_status; the count remains at 4.
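
To record the connection counts from steps 2 to 4 without reading the
mod_status page by hand, the machine-readable "?auto" output of mod_status
could be parsed. The following is only a minimal sketch: the URL
http://localhost/server-status and the function names are assumptions of
mine, not part of my actual setup, and the sample text in the usage note is
made up for illustration.

```python
import urllib.request
from collections import Counter


def parse_status_auto(text):
    """Turn the 'Key: value' lines of mod_status '?auto' output into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def worker_summary(text):
    """Return busy/idle worker counts and a per-state scoreboard tally."""
    fields = parse_status_auto(text)
    scoreboard = fields.get("Scoreboard", "")
    return {
        "busy": int(fields.get("BusyWorkers", 0)),
        "idle": int(fields.get("IdleWorkers", 0)),
        # Scoreboard keys: "W" = sending reply, "G" = gracefully finishing,
        # "_" = waiting for connection, "." = open slot with no process.
        "states": dict(Counter(scoreboard)),
    }


def fetch_status(url="http://localhost/server-status?auto"):
    """Fetch the status text; the default URL is an assumed location."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("ascii", "replace")
```

Calling worker_summary(fetch_status()) after each restart would give one
comparable record per step. With ThreadsPerChild 20, a sample such as
"BusyWorkers: 4" plus a scoreboard of four "W" and sixteen "_" characters
would correspond to the 4 connections I see after the first restart.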

When I repeat the httpd restart, it usually behaves as described in step 4.
The only difference is a changing number of "threads that failed to exit".
But sometimes, additional connections appear in mod_status. I can't
reproduce this on purpose. I suspect a race condition between how fast the
old child is shut down, how fast the new one is started, and when my
services try to reconnect, but I don't know the httpd source code.
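
Since the extra connections only show up occasionally, repeating the restart
in a loop and recording a sample after each round might help to catch the
suspected race. This is a rough sketch under assumptions: the 35 second
settle time is my guess based on the 30s grace period seen in step 3b, and
restart_command, run_experiment and the sample callback are hypothetical
helpers, not existing tooling.

```python
import subprocess
import time


def restart_command(service_name, httpd_exe="httpd.exe"):
    """Build the restart command line used in step 3."""
    return [httpd_exe, "-n", service_name, "-k", "restart"]


def run_experiment(rounds, restart, sample, settle_seconds=35.0,
                   sleep=time.sleep):
    """Restart repeatedly and collect one sample per round.

    restart: callable that triggers the httpd restart
    sample:  callable that returns whatever should be recorded
             (for example a connection count read from mod_status)
    """
    results = []
    for round_number in range(1, rounds + 1):
        restart()
        # Assumed wait so the old child has time to pass its grace period.
        sleep(settle_seconds)
        results.append((round_number, sample()))
    return results


# Possible usage on the server (not run here):
# results = run_experiment(
#     rounds=10,
#     restart=lambda: subprocess.run(restart_command("ServiceName"),
#                                    check=True),
#     sample=lambda: "read mod_status here",
# )
```

Passing restart and sample as callables keeps the loop independent of how
the restart is triggered, so the same loop works from a script or a test.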


To get my job done, I need to know: What can I do to avoid eventually
blocking the server?
Out of curiosity, I would also like to know what exactly happens: how the
SignalR connections are handed over to the next child, and why the first
restart behaves differently from the later restarts.

I appreciate any hint!


Some more information about server and configuration:
Version: 2.4.41
Some config snippets:

# handy for debugging, not in production
ThreadsPerChild 20

RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule "^/my/microservice" "wss://hostname:53728%{REQUEST_URI}" [P]
ProxyPass /my/microservice https://hostname:53728/my/microservice
ProxyPassReverse /my/microservice https://hostname:53728/my/microservice

Link to error.log on Pastebin: https://pastebin.com/7a7B0bLb
