[Yahoo-eng-team] [Bug 1251123] Re: _update_user_list_with_cas causes significant overhead (when using memcached as token store backend)
** Changed in: keystone
   Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to Keystone.
https://bugs.launchpad.net/bugs/1251123

Title:
  _update_user_list_with_cas causes significant overhead (when using
  memcached as token store backend)

Status in OpenStack Identity (Keystone):
  Fix Released
Status in Keystone havana series:
  Fix Released

Bug description:
  [Problem statement]
  In Havana, when using memcached as the token store backend, we have
  seen a significant performance drop compared with Grizzly.

  [How to reproduce]
  We used a Python script to boot VMs at a rate of 1 VM per second.
  Many VM creations failed, and the keystone-all process's CPU
  utilization was nearly 100%.

  [Analysis]
  When using memcached as the token backend, Keystone stores two types
  of key-value pairs in memcached:

    token_id => token data (associated with a TTL)
    user_id  => a list of ids for the tokens that belong to the user

  When creating a new token, Keystone first adds the (token_id, data)
  pair to memcached, and then updates the (user_id, token_id_list) pair
  in the function _update_user_list_with_cas. What
  _update_user_list_with_cas does is:

  1. retrieve the old list
  2. for each token_id in the old list, retrieve the token data to
     check whether it is expired
  3. discard the expired tokens and add the valid token_ids to a new
     list
  4. append the newly created token's id to the new list as well
  5. use memcached's compare-and-set (CAS) operation to replace the old
     list with the new list

  In practice we have found it is common for a user to have thousands
  of valid tokens at a given moment, so step 2 consumes a lot of time.
  What's worse, the CAS tends to fail and retry under contention, which
  makes this function even less efficient.

  [Proposed fix]
  I'd like to propose a 'lazy cleanup of expired token_ids from the
  user list' solution.
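  The five steps in the analysis above can be sketched roughly as
  follows. This is a simplified, self-contained illustration: the
  FakeMemcache class and all names are stand-ins for the real
  python-memcached client, not Keystone's actual backend code.

```python
class FakeMemcache:
    """In-memory stand-in for a memcached client with CAS support."""

    def __init__(self):
        self._data = {}        # key -> (value, cas_id)
        self._next_cas = 0

    def get(self, key):
        entry = self._data.get(key)
        return entry[0] if entry else None

    def gets(self, key):
        # Like get(), but remembers the CAS id for a later cas() call.
        entry = self._data.get(key)
        if entry is None:
            return None
        self._last_cas = (key, entry[1])
        return entry[0]

    def set(self, key, value):
        self._next_cas += 1
        self._data[key] = (value, self._next_cas)

    def cas(self, key, value):
        # Succeeds only if the key has not changed since gets().
        last_key, cas_id = self._last_cas
        if key != last_key or self._data.get(key, (None, None))[1] != cas_id:
            return False
        self.set(key, value)
        return True


def update_user_list_with_cas(mc, user_id, new_token_id, now):
    """Steps 1-5 from the analysis: rebuild the user's token list."""
    old_list = mc.gets(user_id)                 # 1. retrieve the old list
    if old_list is None:
        mc.set(user_id, [new_token_id])
        return True
    new_list = []
    for token_id in old_list:                   # 2. fetch each token's data
        token = mc.get(token_id)
        if token and token['expires'] > now:    # 3. keep only valid tokens
            new_list.append(token_id)
    new_list.append(new_token_id)               # 4. append the new token's id
    return mc.cas(user_id, new_list)            # 5. CAS-replace the old list
```

  Step 2 is the hot spot: it issues one memcached round trip per stored
  token_id on every token creation, and a CAS failure forces the whole
  loop to run again.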
  The idea is to avoid doing the cleanup EVERY TIME a new token is
  created. We can set a dynamic threshold T for each user, and the
  cleanup job is triggered only when the number of token_ids exceeds T.
  After every cleanup, we check what fraction of the token_ids was
  cleaned up; if that fraction is lower than a pre-specified P, then T
  is increased to T*(1+P) to avoid overly frequent cleanups. In
  addition, every call to the list_tokens function for a given user
  always triggers a cleanup; this is necessary to ensure list_tokens
  returns only valid tokens.

To manage notifications about this bug go to:
https://bugs.launchpad.net/keystone/+bug/1251123/+subscriptions

--
Mailing list: https://launchpad.net/~yahoo-eng-team
Post to     : yahoo-eng-team@lists.launchpad.net
Unsubscribe : https://launchpad.net/~yahoo-eng-team
More help   : https://help.launchpad.net/ListHelp
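The dynamic-threshold scheme in the proposed fix could be sketched as
below. All names here are hypothetical (this is an illustration of the
proposal, not the patch that landed); T and P correspond to the
threshold and ratio described in the proposal.

```python
class LazyTokenList:
    """Sketch of the proposed lazy cleanup of a user's token_id list."""

    def __init__(self, is_expired, initial_threshold=100, min_ratio=0.2):
        self.is_expired = is_expired        # callable(token_id) -> bool
        self.threshold = initial_threshold  # T: list size that triggers cleanup
        self.min_ratio = min_ratio          # P: minimum useful cleanup fraction
        self.token_ids = []

    def _cleanup(self):
        before = len(self.token_ids)
        self.token_ids = [t for t in self.token_ids if not self.is_expired(t)]
        removed = before - len(self.token_ids)
        # If the cleanup reclaimed less than fraction P of the list,
        # grow T to T*(1+P) so we don't rescan the whole list too soon.
        if before and removed / before < self.min_ratio:
            self.threshold = int(self.threshold * (1 + self.min_ratio))

    def add(self, token_id):
        self.token_ids.append(token_id)
        # Lazy: clean up only when the list exceeds the threshold T.
        if len(self.token_ids) > self.threshold:
            self._cleanup()

    def list_tokens(self):
        # list_tokens must always return only valid tokens,
        # so it unconditionally triggers a cleanup.
        self._cleanup()
        return list(self.token_ids)
```

Compared with the per-creation scan in _update_user_list_with_cas, this
amortizes the expensive expiry checks over many token creations, at the
cost of the user list temporarily holding some expired token_ids.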
[Yahoo-eng-team] [Bug 1251123] Re: _update_user_list_with_cas causes significant overhead (when using memcached as token store backend)
** Changed in: keystone/havana
   Status: Fix Committed => Fix Released

Status in OpenStack Identity (Keystone):
  Fix Committed
Status in Keystone havana series:
  Fix Released

(Bug description and list footer identical to the message above.)