On Wed, 2021-05-19 at 11:09 +0200, Juergen Gross wrote:

On 18.05.21 20:11, Julien Grall wrote:

Hi Juergen,

I have started to look at preserving transactions across Live-update in
C Xenstored. So far, I have managed to transfer transactions that read/write
existing nodes.

Now, I am running into trouble transferring new/deleted nodes within a
transaction with the existing migration format.

C Xenstored keeps track of the nodes accessed during the transaction but
not their children (AFAICT for performance reasons).


Not performance reasons, but because there isn't any need for that:

The children are either unchanged (so the non-transaction node records
apply), or they will be among the tracked nodes (so the transaction node
records apply). So in both cases all children should be known.
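As an illustration of that argument, here is a minimal OCaml sketch of how a node resolves inside a transaction. It uses a toy model, not C Xenstored's actual data structures: a flat path-to-value map stands in for the committed store, and an overlay holds only the nodes the transaction touched, with `None` marking a node deleted in the transaction. The names `store`, `overlay` and `lookup` are hypothetical.

```ocaml
module StringMap = Map.Make (String)

(* Toy model (assumption): committed store as a flat path -> value map. *)
type store = string StringMap.t

(* Nodes tracked by the transaction; [None] = deleted within the transaction. *)
type overlay = string option StringMap.t

(* A node as seen inside the transaction: a tracked node wins (transaction
   node record), otherwise the untouched committed node applies
   (non-transaction node record). *)
let lookup (committed : store) (txn : overlay) (path : string) : string option =
  match StringMap.find_opt path txn with
  | Some tracked -> tracked
  | None -> StringMap.find_opt path committed
```

Because every child falls into exactly one of the two cases, nothing beyond the tracked set is needed to reconstruct the transaction's view.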


In case a child has been deleted in the transaction, the stream should
contain a node record for that child with the transaction-id and the
number of permissions being zero: see docs/designs/xenstore-migration.md
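A hedged sketch of that deletion marker. It uses a hypothetical, simplified record rather than the real binary layout defined in docs/designs/xenstore-migration.md: a deleted child is emitted with its transaction id and an empty permission list. The type `node_record`, its fields, and the convention that tx_id 0 means "not in a transaction" are assumptions made for illustration.

```ocaml
(* Hypothetical stand-in for a node record in the migration stream. *)
type node_record = {
  tx_id : int;          (* assumed: 0 = committed tree, not in a transaction *)
  path : string;
  value : string;
  perms : string list;  (* empty list models "number of permissions is zero" *)
}

(* Marker for a child deleted inside transaction [tx_id]. *)
let deleted_record ~tx_id path = { tx_id; path; value = ""; perms = [] }

(* A record denotes a deletion if it belongs to a transaction and has no perms. *)
let is_deletion r = r.tx_id <> 0 && r.perms = []
```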


The problem for oxenstored is that a transaction may hold a snapshot taken in
the past: the root has moved on since, but the snapshot still contains a lot of
nodes that have been deleted in the latest root.

A brute-force approach might be to diff the transaction's state against the
latest root state and dump the delta entries as node additions/deletions in the
migration stream.
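A minimal sketch of that brute-force delta, assuming both the latest root and the transaction's snapshot are flattened into path-to-value maps (oxenstored's real store is a tree; `diff` and `delta` are hypothetical names). Nodes missing from, or different in, the latest root become additions; nodes present only in the latest root become deletions, i.e. the zero-permission records mentioned above.

```ocaml
module StringMap = Map.Make (String)

type delta = Add of string * string | Delete of string

(* Entries to replay on top of [latest] to reproduce the transaction's
   [snapshot] (both flattened to path -> value, an assumption). *)
let diff ~(latest : string StringMap.t) ~(snapshot : string StringMap.t) : delta list =
  (* Nodes deleted or changed in the latest root must be (re)added. *)
  let adds =
    StringMap.fold
      (fun path v acc ->
        match StringMap.find_opt path latest with
        | Some v' when v' = v -> acc
        | _ -> Add (path, v) :: acc)
      snapshot []
  in
  (* Nodes created after the snapshot must be deleted in the transaction. *)
  let deletes =
    StringMap.fold
      (fun path _ acc -> if StringMap.mem path snapshot then acc else Delete path :: acc)
      latest []
  in
  (* Additions first, then deletions, each in ascending path order. *)
  List.rev_append adds (List.rev deletes)
```

Run once per open transaction, each delta is serialised independently, which is where the duplication discussed next comes from.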
This could lead to dumping a lot of duplicate state and an explosion in file
size. For example, if you run 1000 domains (the current maximum supported
limit) and each has one tiny transaction from the past, this will lead to a
1000x amplification of the xenstore size in the dump. In memory this is fine,
because OCaml shares the common tree nodes that are unchanged.
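The in-memory sharing can be illustrated with a toy persistent tree (a hypothetical `node` type and `write` function, not oxenstored's actual store code): a write copies only the nodes along the written path, so every untouched subtree is physically the same object in the old snapshot and the new root. A flat dump has no equivalent of this sharing.

```ocaml
module StringMap = Map.Make (String)

(* Toy persistent tree: each node owns an immutable map of children. *)
type node = { value : string; children : node StringMap.t }

let leaf v = { value = v; children = StringMap.empty }

(* Write [v] at [path] (list of path components) by path copying: only
   nodes on the path are rebuilt, all other subtrees are shared with [t]. *)
let rec write t path v =
  match path with
  | [] -> { t with value = v }
  | name :: rest ->
    let child =
      match StringMap.find_opt name t.children with
      | Some c -> c
      | None -> leaf ""
    in
    { t with children = StringMap.add name (write child rest v) t.children }
```

After `write old_root ["a"] "10"`, the sibling subtree `"b"` in the new root is `==` to the one in `old_root`, so a snapshot held by an old transaction costs only the nodes that have since been copied.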
This should restore the content correctly, but it would have a bad effect on
conflict semantics: the migrated transactions will then all likely conflict at
or near the root, and fail anyway.
Without a live update, by contrast, as long as you do not modify any of the old
state, you get the conflict marker further down the tree and are able to avoid
conflicts most of the time.
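A simplified model of why the conflict marker moves up. It assumes, for illustration only and not as a description of oxenstored's commit code, that a transaction's conflict check is anchored at the deepest common ancestor of every path it touched. Replaying a brute-force delta adds paths from unrelated parts of the tree, which pulls that ancestor up to the root (the empty path). `common_prefix` and `conflict_marker` are hypothetical names.

```ocaml
(* Longest common prefix of two paths given as component lists. *)
let rec common_prefix a b =
  match a, b with
  | x :: a', y :: b' when x = y -> x :: common_prefix a' b'
  | _ -> []

(* Hypothetical conflict marker: deepest ancestor of all touched paths.
   The empty list stands for the root. *)
let conflict_marker = function
  | [] -> invalid_arg "conflict_marker: no paths"
  | p :: ps -> List.fold_left common_prefix p ps
```

With only its own writes under `local/domain/5/data`, a transaction's marker stays at that node; adding a single delta entry elsewhere, such as under `tool`, moves it to the root, so any concurrent commit anywhere would conflict with it.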
I tried implementing this last year:
https://github.com/edwintorok/xen/pull/2/commits/a9f057131b75e1bd2dcb49c795630ab5875b7f76#diff-0f4826471775d78bfc6922c63152e268ef386171ebd985208cb82e21c621e749R288-R365
(please ignore the awful indentation: that code has been rebased with
ignore_all_space between different branches of Xen so many times that the
whitespace correctness has been lost)

I have a fuzzer/unit test for live-update (see xen-devel), but transactions are
currently turned off in it because I couldn't get them to work reliably: it
always found examples where the transaction conflict state was not identical
before and after the update.
If we abort all transactions after migration, as discussed previously, then it
might be possible to get this to work, provided we accept the size explosion as
a possibility and dump the transaction state to /var/tmp rather than /tmp
(which might be a tmpfs that gives you ENOSPC).
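A minimal sketch of the "abort all transactions after migration" option, in a hypothetical model: restored transactions keep their ids, but ending any of them returns EAGAIN, the error on which xenstore clients already retry a transaction. The type and function names are illustrative, and `Unknown_transaction` stands in for whatever error the daemon actually returns for an invalid id.

```ocaml
(* Hypothetical model: only the state that matters when a transaction ends. *)
type txn_state = Live | Aborted_by_live_update

type end_result = Committed | Eagain | Unknown_transaction

(* After restoring from the migration stream, mark every transaction as
   aborted instead of reconstructing its conflict state. *)
let restore_after_live_update (txns : (int * txn_state) list) =
  List.map (fun (id, _) -> (id, Aborted_by_live_update)) txns

(* Ending a transaction: aborted ones fail with EAGAIN so the client retries.
   The real commit and conflict check for live transactions is elided. *)
let end_transaction (txns : (int * txn_state) list) (id : int) : end_result =
  match List.assoc_opt id txns with
  | Some Live -> Committed
  | Some Aborted_by_live_update -> Eagain
  | None -> Unknown_transaction
```

The attraction of this design is that the conflict state never has to survive the update; the cost is one forced retry per transaction that was open at the time.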

Live updates are a fairly niche use case, and I'd like to see the current
variant without transactions proven to work on an actual XSA (likely the next
oxenstored XSA about queue limits, if we find a solution to that), and only
after that deploy live-update support with transactions.
We also completely lack unit tests for transactions (aside from the fuzzer
that I started writing, which only does some very minimal state comparisons),
and we do not have a formal model of how transactions and transaction
conflicts should be handled against which to check whether transactions
behave correctly. A fairly good approximation, though, is to run two
oxenstored instances, one with and one without live-update, and check that
they produce equivalent answers (not necessarily identical: txids can
change). That holds as long as we do not have to change the transaction
semantics or code in any way to support live update.
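The equivalence check could compare the two reply traces modulo transaction ids. A minimal sketch, assuming a hypothetical `reply` record with a numeric txid and an opaque payload: two traces are equivalent if the payloads match pairwise and the txids correspond through a consistent one-to-one renaming, with txid 0 (no transaction) mapping only to itself. Payloads that embed a txid, such as the reply to a transaction start, would need the same renaming and are not handled here.

```ocaml
(* Hypothetical reply model for differential testing of two daemons. *)
type reply = { txid : int; payload : string }

let equivalent (a : reply list) (b : reply list) : bool =
  (* [fwd] maps txids seen in [a] to txids in [b]; [bwd] is its inverse. *)
  let rec go fwd bwd a b =
    match a, b with
    | [], [] -> true
    | x :: a', y :: b' ->
      x.payload = y.payload
      && (x.txid = 0) = (y.txid = 0)
      && (match List.assoc_opt x.txid fwd, List.assoc_opt y.txid bwd with
          | None, None -> go ((x.txid, y.txid) :: fwd) ((y.txid, x.txid) :: bwd) a' b'
          | Some y_id, Some x_id -> y_id = y.txid && x_id = x.txid && go fwd bwd a' b'
          | _ -> false)
    | _ -> false (* traces of different length *)
  in
  go [] [] a b
```

Checking both directions of the mapping rejects a trace where two distinct transactions on one side collapse into one on the other.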

Best regards,
--Edwin




Juergen
