Matthew,

Thanks for the great work on this bug. Sponsored to focal.

I've reviewed the debdiff and made only two minor changes:
1) adjusted the Description: field to conform with the DEP3/deb822 [1,2]
multiline field rules (first line and paragraph separators),
2) trimmed 'and-' from the patch name to keep its changelog line under
80 chars (we could wrap it, but that reads awkwardly.)

The package built fine on all archs w/ focal-proposed enabled.

The test case consistently broke the bind9 version in focal-updates
within ~30 seconds when driven by a powerful client VM
(32 CPUs, 5 tmux tabs running the loop).
The test package consistently survived the same test case.

cheers,
Mauricio

[1] https://dep-team.pages.debian.net/deps/dep3/
[2] https://manpages.debian.org/unstable/dpkg-dev/deb822.5.en.html

** Description changed:

  [Impact]
  
  We are seeing busy Bind9 servers stop accepting TCP connections after a
  period of time. Looking at netstat, named is still listening to port 53
  on all interfaces, but if you send a dig, the connection will just time
  out:
  
  $ dig +tcp ubuntu.com @192.168.122.2
  ;; Connection to 192.168.122.2#53(192.168.122.2) for ubuntu.com failed: timed out.
  
  Symptoms are that the number of tcp connections slowly increases, as
  does the tcp high water mark, which you can see by running the "rndc
  status" command. Eventually, the number of tcp connections will reach
  the tcp connection limit, and named will "break" and no longer accept
  any new tcp connections.
  
  There will also be a number of connections in the conntrack table stuck
  in the ESTABLISHED state, even though they are idle and ready to close,
  and a number of connections stuck in the SYN_SENT state, since new
  connections can no longer complete once the tcp connection limit has
  been reached.
  
  This appears to be caused by a race between deactivating a netmgr handle
  and processing an asynchronous callback for the socket close code, which
  can be triggered when a client sends a broken packet to the server and
  then doesn't close the connection properly.
  
  [Testcase]
  
  You will need two VMs to reproduce this issue.
  
  On the first, install bind9:
  
  $ sudo apt install bind9
  
  Set up a caching resolver by editing /etc/bind/named.conf.options,
  uncommenting the forwarders block, and adding a DNS provider:
  
  forwarders {
          8.8.8.8;
  };
+ 
+ If the DNS provider runs on dnsmasq/libvirt, also set:
+ 
+ dnssec-validation yes;
  
  Next, restart the named service:
  
  $ sudo systemctl restart named.service
  
  Edit /etc/resolv.conf and change the resolver to 127.0.0.1.
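
  After the edit, /etc/resolv.conf should contain a nameserver line
  pointing at localhost, i.e.:

  nameserver 127.0.0.1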
  
  Disable the systemd-resolved service:
  
  $ sudo systemctl stop systemd-resolved.service
  
  Test to make sure resolving ubuntu.com works, using the IP of the NIC:
  
  $ dig +tcp @192.168.122.21 ubuntu.com
  https://paste.ubuntu.com/p/7NQJ6RRJHN/
  
  Now, go to the second VM:
  
  Test to make sure that you can dig the other VM with:
  
  $ dig +tcp @192.168.122.21 ubuntu.com
  
  After that, use tc to intentionally drop some packets, to simulate bad
  clients dropping connections without closing them properly, and see if
  we can trigger the race.
  
  My NIC is enp1s0, and 30% drop should do the trick.
  
  $ sudo tc qdisc add dev enp1s0 root netem loss 30%
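
  When you are done testing, the drop rule can be removed again (assuming
  the same interface name):

  $ sudo tc qdisc del dev enp1s0 root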
  
  Next, open gnome-terminal and paste and run the command below in 10-15
  tabs; the more the better:
  
  $ for run in {1..10000}; do dig +tcp @192.168.122.21 ubuntu.com & done
  
  This parallelizes connections to the bind9 server, to try to get
  above the 150 connection limit.
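
  If you would rather not open the tabs by hand, a rough equivalent from a
  single bash shell is something like the sketch below (untested; same dig
  target as above, adjust the outer count to taste):

  $ for tab in {1..10}; do ( for run in {1..10000}; do dig +tcp @192.168.122.21 ubuntu.com & done ) & done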
  
  Back on the server, watch the tcp high water mark in:
  
  $ sudo rndc status
  ..
  tcp clients: 0/150
  TCP high-water: 10
  ..
  
  $ sudo rndc status
  ..
  tcp clients: 31/150
  TCP high-water: 58
  ..
  
  $ sudo rndc status
  ..
  tcp clients: 56/150
  TCP high-water: 141
  ..
  
  $ sudo rndc status
  ..
  tcp clients: 142/150
  TCP high-water: 150
  ..
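
  To avoid re-running the command by hand, the two counters can also be
  watched continuously, e.g. (assuming the watch utility is installed):

  $ watch -n1 'sudo rndc status | grep -E "tcp clients|TCP high-water"'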
  
  If you can't hit the 150 mark on tcp high water, add more tabs to the
  other VM and keep hitting the DNS server. This will likely make the
  other VM unstable as well, FYI.
  
  Eventually, you will hit the 150 mark. After hitting it a bit longer,
  your bind9 server will be broken.
  
  $ dig +tcp @192.168.122.21 ubuntu.com
  ;; Connection to 192.168.122.21#53(192.168.122.21) for ubuntu.com failed: timed out.
  ;; Connection to 192.168.122.21#53(192.168.122.21) for ubuntu.com failed: timed out.
  
  ; <<>> DiG 9.16.1-Ubuntu <<>> +tcp @192.168.122.21 ubuntu.com
  ; (1 server found)
  ;; global options: +cmd
  ;; connection timed out; no servers could be reached
  
  ;; Connection to 192.168.122.21#53(192.168.122.21) for ubuntu.com failed: timed out.
  
  Run this dig from the bind9 server itself, so the results aren't
  confused by the 30% packet drop on the other VM.
  
  If you install the test package from the below ppa:
  
  https://launchpad.net/~mruffell/+archive/ubuntu/lp1909950-test
  
  You can hit this bind9 server as hard as you like, but it will never
  break. If you stop the thundering herd while it sits at the 150
  connection maximum, the server will correctly tear down tcp connections,
  and you will be able to successfully query the DNS server.
  
  [Where problems could occur]
  
  This patch doesn't really introduce any new code; it re-arranges the
  order of events in existing code.
  
  Before, depending on when a thread was scheduled, we could either
  deactivate the netmgr handle before calling the asynchronous callback
  for the socket close code, or vice versa.
  
  The patch changes this to ensure that the netmgr handle is deactivated
  before the socket close callback is issued.
  
  If a regression were to occur, we would see similar symptoms to this
  bug, with sockets not closing properly and eventually exhausting tcp
  connection limits, which would cause new tcp connections not to be
  accepted.
  
  In this case, a workaround would be to restart the named service when
  the tcp high water mark is nearing the tcp connection limit, and wait
  for a fix to be developed.
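
  As a rough illustration of that workaround, a small shell loop along the
  lines of the sketch below (illustrative only; the 140 threshold and the
  30 second interval are arbitrary choices, not part of the fix) could
  watch the counter and restart named before the limit is hit:

  # Sketch: restart named when "tcp clients" in rndc status nears 150.
  while true; do
      cur=$(sudo rndc status | awk -F'[ /]+' '/tcp clients/ {print $3}')
      if [ "${cur:-0}" -ge 140 ]; then
          sudo systemctl restart named.service
      fi
      sleep 30
  done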
  
  Regardless, only TCP connections would be affected, and UDP would still
  function, meaning at worst a partial outage would occur.
  
  [Other]
  
  This was fixed in bind9 9.16.2 by the below commit:
  
  commit 01c4c3301e55b7d6a935a95ac0829e37fb317a0e
  Author: Witold Kręcicki <w...@isc.org>
  Date: Thu Mar 26 14:25:06 2020 +0100
  Subject: Deactivate the handle before sending the async close callback.
  Link: https://gitlab.isc.org/isc-projects/bind9/-/commit/01c4c3301e55b7d6a935a95ac0829e37fb317a0e
  
  Upstream bug: https://gitlab.isc.org/isc-projects/bind9/-/issues/1700
  
  This commit is already present in Groovy and Hirsute. Only Focal needs
  this patch.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1909950

Title:
  named: TCP connections sometimes never close due to race in socket
  teardown

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/bind9/+bug/1909950/+subscriptions
