Matthew,

Thanks for the great work on this bug. Sponsored to focal.

I've reviewed the debdiff and made only two minor changes:
1) adjusted the Description: field to conform with the DEP3/deb822 [1,2]
multiline field rules (first line and paragraph separators),
2) trimmed 'and-' from the patch name to keep its changelog line under
80 chars (we could wrap it, but that reads awkwardly.)

The package built fine on all archs w/ focal-proposed enabled.

The test case consistently broke the bind9 version in focal-updates
within ~30 seconds when driven by a powerful client VM
(32 CPUs, 5 tmux tabs running the loop).
The test package consistently survived the same test case.

cheers,
Mauricio

[1] https://dep-team.pages.debian.net/deps/dep3/
[2] https://manpages.debian.org/unstable/dpkg-dev/deb822.5.en.html

** Description changed:

  [Impact]
  
  We are seeing busy Bind9 servers stop accepting TCP connections after a
  period of time. Looking at netstat, named is still listening to port 53
  on all interfaces, but if you send a dig, the connection will just time
  out:
  
  $ dig +tcp ubuntu.com @192.168.122.2
  ;; Connection to 192.168.122.2#53(192.168.122.2) for ubuntu.com failed: timed out.
  
  Symptoms are that the number of tcp connections slowly increases, as
  does the tcp high water mark, which you can see by running the "rndc
  status" command. Eventually, the number of tcp connections will reach
  the tcp connection limit, and named will "break" and no longer accept
  any new tcp connections.
  
  There will also be a number of connections in the conntrack table stuck
  in the ESTABLISHED state, even though they are idle and ready to close,
  and a number of connections stuck in the SYN_SENT state, since new
  connections can no longer complete once the tcp connection limit has
  been reached.
  
  This appears to be caused by a race between deactivating a netmgr handle
  and processing an asynchronous callback for the socket close code, which
  can be triggered when a client sends a broken packet to the server and
  then doesn't close the connection properly.
  
  [Testcase]
  
  You will need two VMs to reproduce this issue.
  
  On the first, install bind9:
  
  $ sudo apt install bind9
  
  Set up a caching resolver by editing /etc/bind/named.conf.options,
  uncommenting the forwarders block, and adding a DNS provider:
  
  forwarders {
          8.8.8.8;
  };
+ 
+ If the DNS provider runs on dnsmasq/libvirt, also set:
+ 
+ dnssec-validation yes;
  
  Next, restart the named service:
  
  $ sudo systemctl restart named.service
  
  Edit /etc/resolv.conf and change the resolver to 127.0.0.1.
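
  After the edit, /etc/resolv.conf should contain a nameserver line
  pointing at localhost, i.e.:

  nameserver 127.0.0.1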
  
  Disable the systemd-resolved service:
  
  $ sudo systemctl stop systemd-resolved.service
  
  Test to make sure resolving ubuntu.com works, using the IP of the NIC:
  
  $ dig +tcp @192.168.122.21 ubuntu.com
  https://paste.ubuntu.com/p/7NQJ6RRJHN/
  
  Now, go to the second VM:
  
  Test to make sure that you can dig the other VM with:
  
  $ dig +tcp @192.168.122.21 ubuntu.com
  
  After that, use tc to intentionally drop some packets, to simulate bad
  clients dropping connections without closing them properly, and see if
  we can trigger the race.
  
  My NIC is enp1s0, and 30% drop should do the trick.
  
  $ sudo tc qdisc add dev enp1s0 root netem loss 30%
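
  When you are done testing, the drop rule can be removed again (assuming
  the same interface name):

  $ sudo tc qdisc del dev enp1s0 root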
  
  Next, open gnome-terminal and paste and run the command below in 10-15
  tabs; the more the better:
  
  $ for run in {1..10000}; do dig +tcp @192.168.122.21 ubuntu.com & done
  
  This parallelizes connections to the bind9 server, to try to get
  above the 150 connection limit.
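
  If you would rather not open the tabs by hand, a rough equivalent from a
  single bash shell is something like the sketch below (untested; same dig
  target as above, adjust the outer count to taste):

  $ for tab in {1..10}; do ( for run in {1..10000}; do dig +tcp @192.168.122.21 ubuntu.com & done ) & done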
  
  Back on the server, watch the tcp high water mark in:
  
  $ sudo rndc status
  ..
  tcp clients: 0/150
  TCP high-water: 10
  ..
  
  $ sudo rndc status
  ..
  tcp clients: 31/150
  TCP high-water: 58
  ..
  
  $ sudo rndc status
  ..
  tcp clients: 56/150
  TCP high-water: 141
  ..
  
  $ sudo rndc status
  ..
  tcp clients: 142/150
  TCP high-water: 150
  ..
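
  To avoid re-running the command by hand, the two counters can also be
  watched continuously, e.g. (assuming the watch utility is installed):

  $ watch -n1 'sudo rndc status | grep -E "tcp clients|TCP high-water"'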
  
  If you can't hit the 150 mark on tcp high water, add more tabs to the
  other VM and keep hitting the DNS server. This will likely make the
  other VM unstable as well, FYI.
  
  Eventually, you will hit the 150 mark. After hitting it a bit longer,
  your bind9 server will be broken.
  
  $ dig +tcp @192.168.122.21 ubuntu.com
  ;; Connection to 192.168.122.21#53(192.168.122.21) for ubuntu.com failed: timed out.
  ;; Connection to 192.168.122.21#53(192.168.122.21) for ubuntu.com failed: timed out.
  
  ; <<>> DiG 9.16.1-Ubuntu <<>> +tcp @192.168.122.21 ubuntu.com
  ; (1 server found)
  ;; global options: +cmd
  ;; connection timed out; no servers could be reached
  
  ;; Connection to 192.168.122.21#53(192.168.122.21) for ubuntu.com failed: timed out.
  
  Run this dig from the bind9 server itself, so the results aren't
  confused by the 30% packet drop on the other VM.
  
  If you install the test package from the below ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/lp1909950-test
  
  You can hit this bind9 server as hard as you like, but it will never
  break. If you stop the thundering herd while it sits at the 150
  connection maximum, the server will correctly tear down tcp connections,
  and you will be able to successfully query the DNS server.
  
  [Where problems could occur]
  
  This patch doesn't really introduce any new code; it re-arranges the
  order of events in existing code.
  
  Before, depending on when a thread was scheduled, we could either
  deactivate the netmgr handle before calling the asynchronous callback
  for the socket close code, or vice versa.
  
  The patch changes this to ensure that the netmgr handle is deactivated
  before the socket close callback is issued.
  
  If a regression were to occur, we would see similar symptoms to this
  bug, with sockets not closing properly and eventually exhausting tcp
  connection limits, which would cause new tcp connections not to be
  accepted.
  
  In this case, a workaround would be to restart the named service when
  the tcp high water mark is nearing the tcp connection limit, and wait
  for a fix to be developed.
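
  As a rough illustration of that workaround, a small shell loop along the
  lines of the sketch below (illustrative only; the 140 threshold and the
  30 second interval are arbitrary choices, not part of the fix) could
  watch the counter and restart named before the limit is hit:

  # Sketch: restart named when "tcp clients" in rndc status nears 150.
  while true; do
      cur=$(sudo rndc status | awk -F'[ /]+' '/tcp clients/ {print $3}')
      if [ "${cur:-0}" -ge 140 ]; then
          sudo systemctl restart named.service
      fi
      sleep 30
  done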
  
  Regardless, only TCP connections would be affected, and UDP would still
  function, meaning at worst a partial outage would occur.
  
  [Other]
  
  This was fixed in bind9 9.16.2 by the below commit:
  
  commit 01c4c3301e55b7d6a935a95ac0829e37fb317a0e
  Author: Witold Kręcicki <w...@isc.org>
  Date: Thu Mar 26 14:25:06 2020 +0100
  Subject: Deactivate the handle before sending the async close callback.
  Link: https://gitlab.isc.org/isc-projects/bind9/-/commit/01c4c3301e55b7d6a935a95ac0829e37fb317a0e
  
  Upstream bug: https://gitlab.isc.org/isc-projects/bind9/-/issues/1700
  
  This commit is already present in Groovy and Hirsute. Only Focal needs
  this patch.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1909950

Title:
  named: TCP connections sometimes never close due to race in socket
  teardown

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/bind9/+bug/1909950/+subscriptions
