Hi Rex and Snakekick, I have built test packages for Bionic, Focal, Hirsute and Impish, with the below commit included:
commit da55e3e69707de416b7949d08c165c950090bbb6
From: Iker Pedrosa <ipedr...@redhat.com>
Date: Wed, 3 Mar 2021 15:34:49 +0100
Subject: ldap: retry ldap_install_tls() when watchdog interruption
Link: https://github.com/SSSD/sssd/commit/da55e3e69707de416b7949d08c165c950090bbb6

Test packages can be found in the below ppa:

https://launchpad.net/~mruffell/+archive/ubuntu/lp1921494-test

Can you please test the new packages and let me know whether they solve your problem?

Please note, these test packages are NOT SUPPORTED by Canonical and are for TEST PURPOSES ONLY. ONLY install in a dedicated test environment.

Instructions to Install:
1) sudo add-apt-repository ppa:mruffell/lp1921494-test
2) sudo apt update
3) sudo apt upgrade
   # or just install the sssd packages, up to you, there may be many of them.
4) sudo apt-cache policy sssd | grep Installed

For Impish: 2.4.1-2ubuntu4+lp1921494v20211011b1
For Hirsute: 2.4.0-1ubuntu6.1+lp1921494v20211011b1
For Focal: 2.2.3-3ubuntu0.7+lp1921494v20211011b2
For Bionic: 1.16.1-1ubuntu1.8+lp1921494v20211011b1

If the test packages work, I will submit the patches for SRU and we can get this fixed in official packages.

Thanks,
Matthew

** Description changed:

[Impact]

If you enable ad_use_ldaps in your sssd config and have sssd configured to use TLS instead of the regular GSS-SPNEGO or GSSAPI encryption, then on a slow AD server or a busy network the watchdog can time out the call to ldap_install_tls() before it completes. The TLS handshake then fails, and you won't be able to connect to the AD server.
If you set debug_level to 4 or higher, you will see the following in sssd_ldap_server.log:

[set_server_common_status] (0x0100): Marking server 'ad-server.company.com' as 'name resolved'
[be_resolve_server_process] (0x0200): Found address for server ad-server.company.com: [y.y.y.y] TTL 3600
[ad_resolve_callback] (0x0100): Constructed uri 'ldaps://ad-server.company.com'
[ad_resolve_callback] (0x0100): Constructed GC uri 'ldaps://ad-server.company.com'
[sssd_async_socket_init_send] (0x0400): Setting 6 seconds timeout for connecting
[sss_ldap_init_sys_connect_done] (0x0020): ldap_install_tls failed: [Connect error] [(unknown error code)]
[sss_ldap_init_state_destructor] (0x0400): calling ldap_unbind_ext for ldap:[0x55d1149ef6e0] sd:[18]
[sss_ldap_init_state_destructor] (0x0400): closing socket [18]
[sdap_sys_connect_done] (0x0020): sdap_async_connect_call request failed: [5]: Input/output error.
[fo_set_port_status] (0x0100): Marking port 389 of server 'ad-server.company.com' as 'not working'
[fo_set_port_status] (0x0400): Marking port 389 of duplicate server 'ad-server.company.com' as 'not working'

ldapsearch over ldaps works correctly in the same environment:

# openssl s_client -connect company-ad-server.company.com:636
CONNECTED(00000005)

# ldapsearch -v -H ldaps://company-ad-server.company.com:636 -b "dc=company,dc=com" "(sAMAccountName=superduperuser)"
ldap_initialize( ldaps://company-ad-server.company.com:636/??base )
SASL/GSSAPI authentication started
SASL username: superduperu...@company.com
SASL SSF: 0
filter: (sAMAccountName=superduperuser)
requesting: All userApplication attributes
<snip>
# Duperuser\2C Super ADM, Users, Admin, company.com
dn: CN=Duperuser\, Super ADM,OU=Internal,OU=Users,OU=Admin,DC=company,DC=com
<snip>

A workaround is to simply try again: since this is a race condition, you might beat the watchdog on a subsequent retry. Otherwise, disable ad_use_ldaps until a fix is available.
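When trying to reproduce, it can be easier to scan the debug log for the two symptoms shown above (the ldap_install_tls failure and the port being marked 'not working') than to read it by eye. The Python sketch below is a hypothetical helper, not part of sssd; the script name in its usage comment is made up, and its regular expressions are derived only from the log lines quoted in this report, so other sssd versions may need the patterns adjusted. A match is consistent with the watchdog race, but does not by itself prove it.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Patterns based on the sssd debug lines quoted in this report (debug_level >= 4).
# re.search() is used, so timestamp prefixes in front of the "[function]" tag are fine.
TLS_FAIL_RE = re.compile(
    r"\[sss_ldap_init_sys_connect_done\] \(0x[0-9a-f]+\): "
    r"ldap_install_tls failed: \[(?P<reason>[^\]]+)\]"
)
PORT_DOWN_RE = re.compile(
    r"\[fo_set_port_status\] \(0x[0-9a-f]+\): Marking port (?P<port>\d+) of "
    r"(?:duplicate )?server '(?P<server>[^']+)' as 'not working'"
)


def scan_sssd_log(lines: Iterable[str]) -> Tuple[List[str], List[Tuple[str, int]]]:
    """Return (TLS failure reasons, (server, port) pairs marked 'not working')."""
    tls_failures: List[str] = []
    ports_down: List[Tuple[str, int]] = []
    for line in lines:
        m = TLS_FAIL_RE.search(line)
        if m:
            tls_failures.append(m.group("reason"))
            continue
        m = PORT_DOWN_RE.search(line)
        if m:
            ports_down.append((m.group("server"), int(m.group("port"))))
    return tls_failures, ports_down


if __name__ == "__main__":
    # Usage (hypothetical script name): python3 scan_sssd_log.py /var/log/sssd/<log file>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        failures, down = scan_sssd_log(fh)
    print(f"ldap_install_tls failures: {len(failures)} {sorted(set(failures))}")
    for server, port in down:
        print(f"marked not working: {server}:{port}")
```

Run it against the domain log after each `systemctl restart sssd.service`; with the test packages installed, the failure count should stop growing once a retry succeeds.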
[Testcase]

You will need a Windows Server 2019 machine with Active Directory installed and configured, with some users created in Active Directory.

On the Ubuntu client, join the AD domain using realm. You will need to import the AD certificate too. When importing the TLS certificate, you can add it to /etc/ssl/ca-certificates, and edit /etc/ldap/ldap.conf and set:

TLS_CACERT /etc/ssl/certs/ca-certificates.crt

Edit /etc/sssd/sssd.conf and ensure that ldap_tls_cacert is set correctly:

ldap_tls_cacert = /etc/ssl/certs/ca-certificates.crt

and enable:

ad_use_ldaps = True

Then restart sssd with:

$ sudo systemctl restart sssd.service

If you have a slow server or busy network, the watchdog may kill the call to ldap_install_tls() before it completes, and sssd will fail to start. You may need several attempts to reproduce; just keep restarting sssd.service.

+ Test packages are available in the below ppa:
+ https://launchpad.net/~mruffell/+archive/ubuntu/lp1921494-test
+
+ When using the test packages, sssd should start reliably every time.
+
[Where problems could occur]

The changes only affect users who enable ad_use_ldaps, and only those who use TLS. Those using GSS-SPNEGO with ad_use_ldaps would not be affected, nor would those not using ad_use_ldaps at all.

The patch checks for failure of the TLS handshake with the AD server, and adds a retry if the failure was caused by the watchdog killing the call to ldap_install_tls().

This happens very early in sssd service startup, so if a regression were to occur, a system administrator would notice almost immediately and could downgrade the package. In that case, a workaround is to 1) change from TLS to GSS-SPNEGO, or 2) disable ad_use_ldaps.
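To make the retry behaviour described above concrete, here is a small Python model of the idea. It is an illustration only, not sssd's C implementation: `Watchdog` and `install_tls_with_retry` are hypothetical names, `install_tls` stands in for ldap_install_tls(), and the way an interruption is detected (checking whether the watchdog fired while the call was running) is an assumption made for this sketch. Consult the upstream commit linked in this report for the real logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Watchdog:
    """Hypothetical stand-in for the service watchdog.

    Each time the watchdog timer fires, ``ticks`` is incremented.
    """
    ticks: int = 0

    def fire(self) -> None:
        self.ticks += 1


def install_tls_with_retry(install_tls: Callable[[], bool], watchdog: Watchdog) -> bool:
    """Run a TLS handshake, retrying once if the watchdog interrupted it.

    ``install_tls`` stands in for ldap_install_tls(): it returns True on
    success and False on failure. A failure is only treated as a watchdog
    interruption if the watchdog fired while the call was in progress
    (assumed detection strategy). Any other failure, e.g. an untrusted
    certificate, is returned immediately without a retry.
    """
    ticks_before = watchdog.ticks
    ok = install_tls()
    if not ok and watchdog.ticks > ticks_before:
        # The handshake was cut short by the watchdog rather than by a real
        # TLS error, so a second attempt has a fair chance of succeeding.
        ok = install_tls()
    return ok


if __name__ == "__main__":
    wd = Watchdog()
    attempts = []

    def slow_then_fast_handshake() -> bool:
        # Simulate a slow AD server: the first handshake is interrupted by
        # the watchdog, the second one completes in time.
        attempts.append(1)
        if len(attempts) == 1:
            wd.fire()
            return False
        return True

    print(install_tls_with_retry(slow_then_fast_handshake, wd), len(attempts))
```

Restricting the retry to failures that coincide with the watchdog firing keeps genuine TLS errors (wrong CA bundle, bad certificate) failing fast, which matches the narrow scope described under [Where problems could occur].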
[Other info]

This is reported upstream in:
https://github.com/SSSD/sssd/issues/5531

The commit which fixes the issue is:

commit da55e3e69707de416b7949d08c165c950090bbb6
From: Iker Pedrosa <ipedr...@redhat.com>
Date: Wed, 3 Mar 2021 15:34:49 +0100
Subject: ldap: retry ldap_install_tls() when watchdog interruption
Link: https://github.com/SSSD/sssd/commit/da55e3e69707de416b7949d08c165c950090bbb6

This landed in sssd 2.5.0, so Bionic, Focal, Hirsute and Impish all require fixing. The commit can be cherry-picked to Focal, Hirsute and Impish, while Bionic requires a backport with minor context adjustments.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1921494

Title:
  ldap_install_tls occasionally fails due to watchdog timeout when using ad_use_ldaps with tls

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/sssd/+bug/1921494/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs