Public bug reported:

Observed downstream in a large-scale cluster with constant create/delete
server activity and hundreds of thousands of deleted instances rows.
Currently, we archive deleted rows in batches of max_rows parents plus
their child rows in a single database transaction. Doing it that way
limits how high a value of max_rows can be specified by the caller,
because of the size of the database transaction it can generate. For
example, in a large-scale deployment with hundreds of thousands of
deleted rows and constant server creation and deletion activity, a value
of max_rows=1000 might exceed the database's configured maximum packet
size or time out because of a database deadlock, forcing the operator to
use a much lower max_rows value such as 100 or 50. And when the operator
has, for example, 500,000 deleted instances rows (and millions of
deleted rows total) to archive, being forced to use a max_rows value
several orders of magnitude lower than the number of rows they need to
archive is a poor user experience and also makes it unclear whether
archive progress is actually being made.

** Affects: nova
   Importance: Undecided
   Assignee: melanie witt (melwitt)
   Status: New

** Affects: nova/antelope
   Importance: Undecided
   Status: New

** Affects: nova/wallaby
   Importance: Undecided
   Status: New

** Affects: nova/xena
   Importance: Undecided
   Status: New

** Affects: nova/yoga
   Importance: Undecided
   Status: New

** Affects: nova/zed
   Importance: Undecided
   Status: New

** Tags: db performance
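To make the transaction-size problem concrete, here is a minimal sketch of
the archive pattern described above, written with SQLAlchemy against
simplified stand-in tables (instances, instance_extra, and shadow_*
counterparts). This is not nova's actual implementation; the table layouts
and the archive_batch helper are assumptions chosen only to illustrate how
a batch of max_rows parent rows plus all of their FK-related child rows
ends up inside a single transaction.

# Minimal sketch (not nova's actual code): archive up to max_rows
# soft-deleted "instances" rows plus all of their child rows in one
# database transaction. Table layouts are simplified stand-ins.
from sqlalchemy import (Column, Integer, MetaData, String, Table,
                        create_engine, insert, select)

metadata = MetaData()

instances = Table(
    "instances", metadata,
    Column("id", Integer, primary_key=True),
    Column("uuid", String(36)),
    Column("deleted", Integer, default=0))
instance_extra = Table(
    "instance_extra", metadata,
    Column("id", Integer, primary_key=True),
    Column("instance_uuid", String(36)))
# Shadow tables mirror the live tables and receive the archived rows.
shadow_instances = Table(
    "shadow_instances", metadata,
    Column("id", Integer, primary_key=True),
    Column("uuid", String(36)),
    Column("deleted", Integer))
shadow_instance_extra = Table(
    "shadow_instance_extra", metadata,
    Column("id", Integer, primary_key=True),
    Column("instance_uuid", String(36)))

def archive_batch(engine, max_rows):
    """Archive one batch of deleted instances and their child rows."""
    # One transaction for the whole batch: its size grows with max_rows
    # *and* with the number of child rows per instance.
    with engine.begin() as conn:
        parents = conn.execute(
            select(instances)
            .where(instances.c.deleted != 0)
            .limit(max_rows)).fetchall()
        if not parents:
            return 0
        uuids = [p.uuid for p in parents]

        # Copy and delete the FK-related child rows first.
        children = conn.execute(
            select(instance_extra)
            .where(instance_extra.c.instance_uuid.in_(uuids))).fetchall()
        if children:
            conn.execute(insert(shadow_instance_extra),
                         [dict(c._mapping) for c in children])
            conn.execute(instance_extra.delete().where(
                instance_extra.c.instance_uuid.in_(uuids)))

        # Then copy and delete the parent rows themselves.
        conn.execute(insert(shadow_instances),
                     [dict(p._mapping) for p in parents])
        conn.execute(instances.delete().where(
            instances.c.uuid.in_(uuids)))
        return len(parents) + len(children)

if __name__ == "__main__":
    engine = create_engine("sqlite:///:memory:")
    metadata.create_all(engine)
    with engine.begin() as conn:
        conn.execute(insert(instances), [
            {"uuid": "uuid-1", "deleted": 1},
            {"uuid": "uuid-2", "deleted": 0}])
        conn.execute(insert(instance_extra),
                     [{"instance_uuid": "uuid-1"}])
    # Archives the one deleted instance and its child row: prints 2.
    print(archive_batch(engine, max_rows=1000))

Because everything inside engine.begin() commits as a single unit, the
transaction payload scales with max_rows multiplied by the per-instance
child-row count, which is why values like max_rows=1000 can run into
maximum-packet-size limits or deadlock/lock-wait timeouts, whereas smaller
per-table or per-batch transactions would keep each transaction bounded.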
https://bugs.launchpad.net/bugs/2024258

Title:
  Performance degradation archiving DB with large numbers of FK related
  records

Status in OpenStack Compute (nova): New
Status in OpenStack Compute (nova) antelope series: New
Status in OpenStack Compute (nova) wallaby series: New
Status in OpenStack Compute (nova) xena series: New
Status in OpenStack Compute (nova) yoga series: New
Status in OpenStack Compute (nova) zed series: New