On 09.05.23 20:46, Julien Grall wrote:
Hi Juergen,

On 08/05/2023 12:47, Juergen Gross wrote:
Add the node accounting to the accounting information buffering in
order to avoid having to undo it in case of failure.

This requires to call domain_nbentry_dec() before any changes to the
data base, as it can return an error now.

Signed-off-by: Juergen Gross <jgr...@suse.com>
---
V5:
- add error handling after domain_nbentry_dec() calls (Julien Grall)
---
  tools/xenstore/xenstored_core.c   | 29 +++++++----------------------
  tools/xenstore/xenstored_domain.h |  4 ++--
  2 files changed, 9 insertions(+), 24 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 8392bdec9b..22da434e2a 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1454,7 +1454,6 @@ static void destroy_node_rm(struct connection *conn, struct node *node)
  static int destroy_node(struct connection *conn, struct node *node)
  {
      destroy_node_rm(conn, node);
-    domain_nbentry_dec(conn, get_node_owner(node));
      /*
       * It is not possible to easily revert the changes in a transaction.
@@ -1645,6 +1644,9 @@ static int delnode_sub(const void *ctx, struct connection *conn,
      if (ret > 0)
          return WALK_TREE_SUCCESS_STOP;
+    if (domain_nbentry_dec(conn, get_node_owner(node)))
+        return WALK_TREE_ERROR_STOP;

I think there is a potential issue with the buffering here. In case of failure, the node could have been removed, but the quota would not be properly accounted.

You mean the case where another node has been deleted and due to accounting
buffering the corrected accounting data wouldn't be committed?

This is no problem, as an error returned by delnode_sub() will result in
corrupt() being called, which in turn will correct the accounting data.

Also, I think a comment would be warrant to explain why we are returning WALK_TREE_ERROR_STOP here when...

+
      /* In case of error stop the walk. */
      if (!ret && do_tdb_delete(conn, &key, &node->acc))
          return WALK_TREE_SUCCESS_STOP;

... this is not the case when do_tdb_delete() fails for some reasons.

The main idea was that the remove is working from the leafs towards the root.
In case one entry can't be removed, we should just stop.

OTOH returning WALK_TREE_ERROR_STOP might be cleaner, as this would make sure
that accounting data is repaired afterwards. Even if do_tdb_delete() can't
really fail in normal configurations, as the only failure reasons are:

- the node isn't found (quite unlikely, as we just read it before entering
  delnode_sub()), or
- writing the updated data base failed (in normal configurations it is in
  already allocated memory, so no way to fail that)

I think I'll switch to return WALK_TREE_ERROR_STOP here.


Juergen

Attachment: OpenPGP_0xB0DE9DD628BF132F.asc
Description: OpenPGP public key

Attachment: OpenPGP_signature
Description: OpenPGP digital signature

Reply via email to