While the workaround is being prepared for SRU to the stable releases,
I have built dbus packages with the two patches Simon proposed, for
testing.

https://launchpad.net/~sil2100/+archive/ubuntu/ppa

Could anyone who was able to reproduce the original issue install the
dbus packages from the above PPA and re-run the tests to see whether the
issue is still reproducible? These packages have the workaround
reverted and the two requested patches applied. I prepared both xenial
and zesty packages in the PPA for testing purposes.
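
For anyone re-running the test case from the bug description, here is a
rough sketch of how the resulting log could be summarised instead of
grepping by hand. It assumes the log contains the "real 0m25.232s" lines
produced by the reproduction loop; the count_slow helper and its default
25 second threshold are only illustrative names, not part of any package.

```shell
#!/bin/sh
# count_slow LOG [THRESHOLD_SECONDS]
# Hypothetical helper: counts "real XmY.YYYs" lines (as written by `time`)
# whose wall-clock time is at or above the threshold (default: 25 seconds).
count_slow() {
    log_file=$1
    threshold=${2:-25}
    awk -v limit="$threshold" '
        $1 == "real" {
            # $2 looks like 0m25.232s: minutes before "m", seconds after it
            split($2, parts, "m")
            # awk numeric conversion ignores the trailing "s"
            secs = parts[1] * 60 + parts[2]
            total++
            if (secs >= limit) slow++
        }
        END { printf "%d of %d sessions took >= %ss\n", slow, total, limit }
    ' "$log_file"
}
```

After sourcing the function, "count_slow log" prints a one-line summary;
any non-zero count of slow sessions would suggest the issue is still
present with the test packages.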

Thanks!

** Description changed:

  [Impact]
  
  The bug affects multiple users and introduces a user-visible delay (~25
  seconds) on SSH connections after a large number of sessions have been
  processed. This has a serious impact on big systems and servers running
  our software.
  
  The currently proposed fix is actually a safe workaround for the bug as
  proposed by the dbus upstream. The workaround makes uid 0 immune to the
  pending_fd_timeout limit that kicks in and causes the original issue.
  
  [Test Case]
  
- .
+ lxc launch ubuntu:x test
+ lxc exec test -- login -f ubuntu
+ ssh-import-id <whatever>
+ 
+ Then run a script as follows (passing in ubuntu@<container-ip>):
+ 
+ while [ 1 ]; do
+     (time ssh $1 "echo OK > /dev/null") 2>&1 | grep ^real >> log
+ done
+ 
+ Then check the log file for any ssh sessions that take
+ 25+ seconds to complete.
+ 
+ Multiple instances of the same script can be used at the same time.
  
  [Regression Potential]
  
  The fix has a rather low regression potential as the workaround is a
  very small change only affecting one particular case - handling of uid
  0. The fix has been tested by multiple users and has been around in
  zesty for a while, with multiple people involved in reviewing the
  change. It's also a change that has been proposed by upstream.
- 
  
  [Original Description]
  
  I noticed on a system that accepts large numbers of SSH connections that
  after a while, SSH sessions were taking ~25 seconds to complete.
  
  Looking in /var/log/auth.log, systemd-logind starts failing with the
  following:
  
  Jun 10 23:55:28 test sshd[3666]: pam_unix(sshd:session): session opened for user ubuntu by (uid=0)
  Jun 10 23:55:28 test systemd-logind[105]: New session c1052 of user ubuntu.
  Jun 10 23:55:28 test systemd-logind[105]: Failed to abandon session scope: Transport endpoint is not connected
  Jun 10 23:55:28 test sshd[3666]: pam_systemd(sshd:session): Failed to create session: Message recipient disconnected from message bus without replying
  
  I reproduced this in an LXD container by doing something like:
  
  lxc launch ubuntu:x test
  lxc exec test -- login -f ubuntu
  ssh-import-id <whatever>
  
  Then ran a script as follows (passing in ubuntu@<container-ip>):
  
  while [ 1 ]; do
      (time ssh $1 "echo OK > /dev/null") 2>&1 | grep ^real >> log
  done
  
  In my case, after 1052 logins, the 1053rd and thereafter were taking 25+
  seconds to complete. Here are some snippets from the log file:
  
  $ cat log | grep 0m0 | wc -l
  1052
  
  $ cat log | grep 0m25 | wc -l
  4
  
  $ tail -5 log
  real  0m0.222s
  real  0m25.232s
  real  0m25.235s
  real  0m25.236s
  real  0m25.239s
  
  ProblemType: Bug
  DistroRelease: Ubuntu 16.04
  Package: systemd 229-4ubuntu5
  ProcVersionSignature: Ubuntu 4.4.0-22.40-generic 4.4.8
  Uname: Linux 4.4.0-22-generic x86_64
  ApportVersion: 2.20.1-0ubuntu2
  Architecture: amd64
  Date: Sat Jun 11 00:09:34 2016
  MachineType: Notebook W230SS
  ProcEnviron:
   TERM=xterm-256color
   PATH=(custom, no user)
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-22-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash
  SourcePackage: systemd
  SystemdDelta:
   [EXTENDED]   /lib/systemd/system/rc-local.service → /lib/systemd/system/rc-local.service.d/debian.conf
   [EXTENDED]   /lib/systemd/system/systemd-timesyncd.service → /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf
  
   2 overridden configuration files found.
  UpgradeStatus: No upgrade log present (probably fresh install)
  dmi.bios.date: 04/15/2014
  dmi.bios.vendor: American Megatrends Inc.
  dmi.bios.version: 4.6.5
  dmi.board.asset.tag: Tag 12345
  dmi.board.name: W230SS
  dmi.board.vendor: Notebook
  dmi.board.version: Not Applicable
  dmi.chassis.asset.tag: No Asset Tag
  dmi.chassis.type: 9
  dmi.chassis.vendor: Notebook
  dmi.chassis.version: N/A
  dmi.modalias: dmi:bvnAmericanMegatrendsInc.:bvr4.6.5:bd04/15/2014:svnNotebook:pnW230SS:pvrNotApplicable:rvnNotebook:rnW230SS:rvrNotApplicable:cvnNotebook:ct9:cvrN/A:
  dmi.product.name: W230SS
  dmi.product.version: Not Applicable
  dmi.sys.vendor: Notebook

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1591411

Title:
  systemd-logind must be restarted every ~1000 SSH logins to prevent a
  ~25 second delay

To manage notifications about this bug go to:
https://bugs.launchpad.net/dbus/+bug/1591411/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
