[Yahoo-eng-team] [Bug 1998789] Re: PooledLDAPHandler.result3 does not release pool connection back when an exception is raised
** Also affects: keystone (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: keystone (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: keystone (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: cloud-archive
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/victoria
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/zed
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/ussuri
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/xena
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/wallaby
   Importance: Undecided
       Status: New

** Also affects: cloud-archive/yoga
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1998789

Title:
  PooledLDAPHandler.result3 does not release pool connection back when
  an exception is raised

Status in Ubuntu Cloud Archive:
  New
Status in Ubuntu Cloud Archive ussuri series:
  New
Status in Ubuntu Cloud Archive victoria series:
  New
Status in Ubuntu Cloud Archive wallaby series:
  New
Status in Ubuntu Cloud Archive xena series:
  New
Status in Ubuntu Cloud Archive yoga series:
  New
Status in Ubuntu Cloud Archive zed series:
  New
Status in OpenStack Identity (keystone):
  Fix Released
Status in keystone package in Ubuntu:
  New
Status in keystone source package in Focal:
  New
Status in keystone source package in Jammy:
  New

Bug description:
  This is a follow-up issue for LP#1896125.

  This problem occurs when LDAP connection pooling is on
  (use_pool=True), page_size > 0, and pool_connection_timeout is shorter
  than the LDAP server's response time. The scenario is as follows:

  - A user tries to log in to a domain that is attached to an LDAP
    backend.
  - The LDAP server does not respond within `pool_connection_timeout`
    seconds, causing the LDAP connection to raise an ldap.TIMEOUT()
    exception.
  - From then on, all subsequent LDAP requests fail with
    ldappool.MaxConnectionReachedError.

  An in-depth analysis explains why it happens:

  - The LDAP query for the user login request is initiated by a
    BaseLdap._ldap_get() call, which grabs a connection with
    self.get_connection() and invokes conn.search_s().
  - conn.search_s() invokes conn._paged_search_s() because page_size
    is > 0.
  - conn._paged_search_s() calls the conn.search_ext()
    (PooledLDAPHandler.search_ext) method.
  - conn.search_ext() initiates an asynchronous LDAP request and
    returns an AsynchronousMessage object, representing the request,
    to _paged_search_s().
  - conn._paged_search_s() tries to obtain the results of the
    asynchronous LDAP request by calling conn.result3()
    (PooledLDAPHandler.result3).
  - conn.result3() calls message.connection.result3().
  - The server cannot respond within pool_connection_timeout seconds.
  - message.connection.result3() raises ldap.TIMEOUT(), so the
    connection release function that follows it, message.clean(), is
    never called.
  - The connection is kept active forever, and subsequent requests
    cannot use it anymore.

  Reproducer:

  - Deploy an LDAP server of your choice.
  - Fill it with enough data that the search takes more than
    `pool_connection_timeout` seconds.
  - Define a keystone domain with the LDAP driver and the following
    options:

    [ldap]
    use_pool = True
    page_size = 100
    pool_connection_timeout = 3
    pool_retry_max = 3
    pool_size = 10

  - Point the domain to the LDAP server.
  - Try to log in to the OpenStack dashboard, or do anything else that
    uses the LDAP user.
  - Observe /var/log/apache2/keystone_error.log; it should contain
    ldap.TIMEOUT() stack traces followed by
    `ldappool.MaxConnectionReachedError` stack traces.

  Known workarounds:

  - Disable LDAP pooling by setting use_pool=False
  - Set page_size to 0

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1998789/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
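
The leak described in the bug report can be shown in miniature without an LDAP server. What follows is a minimal sketch, not keystone or ldappool code: `ToyPool`, `ToyMessage`, `leaky_result3`, `run_leaky_requests`, and the local `MaxConnectionReachedError` class are hypothetical stand-ins, and Python's built-in `TimeoutError` stands in for ldap.TIMEOUT. It assumes only what the report states: the release call sits after the call that can raise.

```python
class MaxConnectionReachedError(Exception):
    """Hypothetical stand-in for ldappool.MaxConnectionReachedError."""


class ToyPool:
    """Toy fixed-size pool that only counts active connections."""

    def __init__(self, size):
        self.size = size
        self.active = 0

    def acquire(self):
        if self.active >= self.size:
            raise MaxConnectionReachedError()
        self.active += 1

    def release(self):
        self.active -= 1


class ToyMessage:
    """Stand-in for the asynchronous message that owns a pooled connection."""

    def __init__(self, pool):
        self.pool = pool
        pool.acquire()

    def fetch(self):
        # Simulate a server slower than pool_connection_timeout.
        raise TimeoutError("simulated ldap.TIMEOUT")

    def clean(self):
        self.pool.release()


def leaky_result3(message):
    # Same shape as the reported bug: clean() is skipped when fetch() raises.
    results = message.fetch()
    message.clean()
    return results


def run_leaky_requests(pool, attempts):
    """Return the name of the exception type seen on each attempt."""
    outcomes = []
    for _ in range(attempts):
        try:
            leaky_result3(ToyMessage(pool))
        except (TimeoutError, MaxConnectionReachedError) as exc:
            outcomes.append(type(exc).__name__)
    return outcomes
```

With a toy pool of size 2, the first two attempts time out and each one leaves a connection marked active; every later attempt is refused by the pool, which matches the order of stack traces the reproducer expects in the log.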
[Yahoo-eng-team] [Bug 1998789] Re: PooledLDAPHandler.result3 does not release pool connection back when an exception is raised
Reviewed:  https://review.opendev.org/c/openstack/keystone/+/866723
Committed: https://opendev.org/openstack/keystone/commit/ff632a81fb09e6d9f3298e494d53eb6df50269cf
Submitter: "Zuul (22348)"
Branch:    master

commit ff632a81fb09e6d9f3298e494d53eb6df50269cf
Author: Mustafa Kemal Gilor
Date:   Mon Dec 5 17:33:47 2022 +0300

    [PooledLDAPHandler] Ensure result3() invokes message.clean()

    result3 does not invoke message.clean() when an exception is thrown
    by `message.connection.result3()` call, causing pool connection
    associated with the message to be marked active forever. This causes
    a denial-of-service on ldappool.

    The fix ensures message.clean() is invoked by wrapping the offending
    call in try-except-finally and putting the message.clean() in
    finally block.

    Closes-Bug: #1998789
    Change-Id: I59ebf0fa77391d49b2349e918fc55f96318c42a6
    Signed-off-by: Mustafa Kemal Gilor

** Changed in: keystone
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Identity (keystone).
https://bugs.launchpad.net/bugs/1998789

Title:
  PooledLDAPHandler.result3 does not release pool connection back when
  an exception is raised

Status in OpenStack Identity (keystone):
  Fix Released

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1998789/+subscriptions
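
The commit message describes the fix as putting message.clean() in a finally block. The pattern can be sketched as below; this is an illustration under stated assumptions, not the committed keystone change. `FakeConnection`, `FakeMessage`, `result3_with_cleanup`, and `demo` are hypothetical names, the built-in `TimeoutError` stands in for ldap.TIMEOUT, and the sketch uses a plain try/finally rather than the try-except-finally the commit mentions.

```python
class FakeConnection:
    """Hypothetical connection whose result3() always times out."""

    def result3(self):
        raise TimeoutError("simulated ldap.TIMEOUT")


class FakeMessage:
    """Hypothetical message; clean() stands for returning the connection."""

    def __init__(self, connection):
        self.connection = connection
        self.released = False

    def clean(self):
        self.released = True


def result3_with_cleanup(message):
    # The finally block runs whether result3() returns or raises,
    # so the connection is always handed back to the pool.
    try:
        return message.connection.result3()
    finally:
        message.clean()


def demo():
    """Return True if the message was cleaned despite the timeout."""
    message = FakeMessage(FakeConnection())
    try:
        result3_with_cleanup(message)
    except TimeoutError:
        pass  # the timeout still propagates to the caller
    return message.released
```

The caller still sees the timeout, so error handling higher up is unchanged; only the release of the pooled connection no longer depends on the call succeeding.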