While the workaround is being prepared to get SRUed to the stable releases, I prepared the dbus packages with the two patches Simon proposed for testing.
https://launchpad.net/~sil2100/+archive/ubuntu/ppa Could anyone that was able to reproduce the original issue install the dbus packages from the above PPA and re-try the tests to see if the issue is reproducible? The following packages have the workaround reverted and the two requested patches applied. I prepared both xenial and zesty packages in the PPA for testing purposes. Thanks! ** Description changed: [Impact] The bug affects multiple users and introduces an user visible delay (~25 seconds) on SSH connections after a large number of sessions have been processed. This has a serious impact on big systems and servers running our software. The currently proposed fix is actually a safe workaround for the bug as proposed by the dbus upstream. The workaround makes uid 0 immune to the pending_fd_timeout limit that kicks in and causes the original issue. [Test Case] - . + lxc launch ubuntu:x test + lxc exec test -- login -f ubuntu + ssh-import-id <whatever> + + Then ran a script as follows (passing in ubuntu@<container-ip>): + + while [ 1 ]; do + (time ssh $1 "echo OK > /dev/null") 2>&1 | grep ^real >> log + done + + Then checking the log file if there are any ssh sessions that are taking + 25+ seconds to complete. + + Multiple instances of the same script can be used at the same time. [Regression Potential] The fix has a rather low regression potential as the workaround is a very small change only affecting one particular case - handling of uid 0. The fix has been tested by multiple users and has been around in zesty for a while, with multiple people involved in reviewing the change. It's also a change that has been proposed by upstream. - [Original Description] I noticed on a system that accepts large numbers of SSH connections that after awhile, SSH sessions were taking ~25 seconds to complete. Looking in /var/log/auth.log, systemd-logind starts failing with the following: Jun 10 23:55:28 test sshd[3666]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0) Jun 10 23:55:28 test systemd-logind[105]: New session c1052 of user ubuntu. Jun 10 23:55:28 test systemd-logind[105]: Failed to abandon session scope: Transport endpoint is not connected Jun 10 23:55:28 test sshd[3666]: pam_systemd(sshd:session): Failed to create session: Message recipient disconnected from message bus without replying I reproduced this in an LXD container by doing something like: lxc launch ubuntu:x test lxc exec test -- login -f ubuntu ssh-import-id <whatever> Then ran a script as follows (passing in ubuntu@<container-ip>): while [ 1 ]; do (time ssh $1 "echo OK > /dev/null") 2>&1 | grep ^real >> log done In my case, after 1052 logins, the 1053rd and thereafter were taking 25+ seconds to complete. Here are some snippets from the log file: $ cat log | grep 0m0 | wc -l 1052 $ cat log | grep 0m25 | wc -l 4 $ tail -5 log real 0m0.222s real 0m25.232s real 0m25.235s real 0m25.236s real 0m25.239s ProblemType: Bug DistroRelease: Ubuntu 16.04 Package: systemd 229-4ubuntu5 ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8 Uname: Linux 4.4.0-22-generic x86_64 ApportVersion: 2.20.1-0ubuntu2 Architecture: amd64 Date: Sat Jun 11 00:09:34 2016 MachineType: Notebook W230SS ProcEnviron: TERM=xterm-256color PATH=(custom, no user) ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-22-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash SourcePackage: systemd SystemdDelta: [EXTENDED] /lib/systemd/system/rc-local.service → /lib/systemd/system/rc-local.service.d/debian.conf [EXTENDED] /lib/systemd/system/systemd-timesyncd.service → /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf 2 overridden configuration files found. UpgradeStatus: No upgrade log present (probably fresh install) dmi.bios.date: 04/15/2014 dmi.bios.vendor: American Megatrends Inc. dmi.bios.version: 4.6.5 dmi.board.asset.tag: Tag 12345 dmi.board.name: W230SS dmi.board.vendor: Notebook dmi.board.version: Not Applicable dmi.chassis.asset.tag: No Asset Tag dmi.chassis.type: 9 dmi.chassis.vendor: Notebook dmi.chassis.version: N/A dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4.6.5:bd04/15/2014:svnNotebook:pnW230SS:pvrNotApplicable:rvnNotebook:rnW230SS:rvrNotApplicable:cvnNotebook:ct9:cvrN/A: dmi.product.name: W230SS dmi.product.version: Not Applicable dmi.sys.vendor: Notebook -- You received this bug notification because you are a member of Ubuntu Bugs, which is subscribed to Ubuntu. https://bugs.launchpad.net/bugs/1591411 Title: systemd-logind must be restarted every ~1000 SSH logins to prevent a ~25 second delay To manage notifications about this bug go to: https://bugs.launchpad.net/dbus/+bug/1591411/+subscriptions -- ubuntu-bugs mailing list ubuntu-bugs@lists.ubuntu.com https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs