Adding Don Slutz as he requested to be added.

Lars

On 15/05/2015 20:44, "Konrad Rzeszutek Wilk" <konrad.w...@oracle.com> wrote:
>Hey!
>
>During the Xen Hacka^H^H^H^HProject Summit? we chatted about live-patching
>the hypervisor. We sketched out how it could be done, and brainstormed
>some of the problems.
>
>I took that and wrote a design - which is very much RFC. The design is
>laid out in two sections - the format of the ELF payload - and then the
>hypercalls to act on it.
>
>Hypercall preemption has caused a couple of XSAs so I've baked the need
>for that in the design so we hopefully won't have an XSA for this code.
>
>There are two big *TODO* in the design which I had hoped to get done
>before sending this out - however I am going on vacation for two weeks
>so I figured it would be better to send this off for folks to mull now
>than to have it languish.
>
>Please feel free to add more folks on the CC list.
>
>Enjoy!
>
>
># xSplice Design v1 (EXTERNAL RFC v2)
>
>## Rationale
>
>A mechanism is required to binarily patch the running hypervisor with new
>opcodes that have come about primarily due to security updates.
>
>This document describes the design of the API that would allow us to
>upload binary patches to the hypervisor.
>
>## Glossary
>
> * splice - patch in the binary code with new opcodes
> * trampoline - a jump to a new instruction.
> * payload - telemetries of the old code along with binary blob of the new
>   function (if needed).
> * reloc - telemetries contained in the payload to construct proper
>   trampoline.
>
>## Multiple ways to patch
>
>The mechanism needs to be flexible to patch the hypervisor in multiple ways
>and be as simple as possible. The compiled code is contiguous in memory with
>no gaps - so we have no luxury of 'moving' existing code and must either
>insert a trampoline to the new code to be executed - or only modify in-place
>the code if there is sufficient space. The placement of new code has to be
>done by the hypervisor and the virtual address for the new code is allocated
>dynamically.
>
>This implies that the hypervisor must compute the new offsets when splicing
>in the new trampoline code. Where the trampoline is added (inside
>the function we are patching or just the callers?) is also important.
>
>To lessen the amount of code in the hypervisor, the consumer of the API
>is responsible for identifying which mechanism to employ and how many
>locations to patch. Combinations of modifying in-place code, adding
>trampolines, etc. have to be supported. The API should allow reading and
>writing any memory within the hypervisor virtual address space.
>
>We must also have a mechanism to query what has been applied and a mechanism
>to revert it if needed.
>
>We must also have a mechanism to provide: a copy of the old code - so that
>the hypervisor can verify it against the code in memory; the new code;
>the symbol name of the function to be patched; or offset from the symbol;
>or virtual address.
>
>The complications that this design will encounter are explained later
>in this document.
>
>## Patching code
>
>The first mechanism to patch that comes to mind is in-place replacement.
>That is, replace the affected code with new code. Unfortunately x86
>instructions are variable in size, which places limits on how much space we
>have available to replace the instructions.
>
>The second mechanism is replacing the call or jump to the
>old function with the address of the new function.
>
>A third mechanism is to add a jump to the new function at the
>start of the old function.
>
>### Example of trampoline and in-place splicing
>
>As an example we will assume the hypervisor does not have XSA-132 (see
>*domctl/sysctl: don't leak hypervisor stack to toolstacks*
>4ff3449f0e9d175ceb9551d3f2aecb59273f639d) and we would like to binary patch
>the hypervisor with it.
>The original code looks like this:
>
><pre>
>   48 89 e0                  mov    %rsp,%rax
>   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
></pre>
>
>while the new patched hypervisor would be:
>
><pre>
>   48 c7 45 b8 00 00 00 00   movq   $0x0,-0x48(%rbp)
>   48 c7 45 c0 00 00 00 00   movq   $0x0,-0x40(%rbp)
>   48 c7 45 c8 00 00 00 00   movq   $0x0,-0x38(%rbp)
>   48 89 e0                  mov    %rsp,%rax
>   48 25 00 80 ff ff         and    $0xffffffffffff8000,%rax
></pre>
>
>This is inside `arch_do_domctl`. This new change adds 24 extra
>bytes of code (three 8-byte `movq` instructions) which alters all the
>offsets inside the function. To alter these offsets and add the extra
>24 bytes of code we might not have enough space in .text to squeeze this in.
>
>As such we could simplify this problem by only patching the site
>which calls `arch_do_domctl`:
>
><pre>
><do_domctl>:
> e8 4b b1 05 00          callq  ffff82d08015fbb9 <arch_do_domctl>
></pre>
>
>with a new address for where the new `arch_do_domctl` would be (this
>area would be allocated dynamically).
>
>Astute readers will wonder what we need to do if we were to patch
>`do_domctl` - which is not called directly by the hypervisor but on behalf
>of the guests via the `compat_hypercall_table` and `hypercall_table`.
>Patching the offset in `hypercall_table` for `do_domctl`
>(ffff82d080103079 <do_domctl>:)
><pre>
>
> ffff82d08024d490:   79 30
> ffff82d08024d492:   10 80 d0 82 ff ff
>
></pre>
>with the new address where the new `do_domctl` is possible. The other
>place where it is used is in `hvm_hypercall64_table` which would need
>to be patched in a similar way. This would require an in-place splicing
>of the new virtual address of `do_domctl`.
>
>In summary this example patched the callers of the affected function by
> * allocating memory for the new code to live in,
> * changing the virtual address of all the functions which called the old
>   code (computing the new offset, patching the callq with a new callq).
> * changing the function pointer tables with the new virtual address of
>   the function (splicing in the new virtual address). Since this table
>   resides in the .rodata section we would need to temporarily change the
>   page table permissions during this part.
>
>
>However it has severe drawbacks - the safety checks, which have to make sure
>the function is not on the stack, must also check every caller. For some
>patches this could mean that, if there were a sufficiently large number of
>callers, we would never be able to apply the update.
>
>### Example of different trampoline patching.
>
>An alternative mechanism exists where we can insert a trampoline in the
>existing function to be patched to jump directly to the new code. This
>lessens the locations to be patched to one but it puts pressure on the
>CPU branching logic (I-cache, but it is just one unconditional jump).
>
>For this example we will assume that the hypervisor has not been compiled
>with fe2e079f642effb3d24a6e1a7096ef26e691d93e (XSA-125: *pre-fill structures
>for certain HYPERVISOR_xen_version sub-ops*) which mem-sets a structure
>in the `xen_version` hypercall. This function is not called **anywhere** in
>the hypervisor (it is called by the guest) but referenced in the
>`compat_hypercall_table` and `hypercall_table` (and indirectly called
>from that). Patching the offset in `hypercall_table` for the old
>`do_xen_version` (ffff82d080112f9e <do_xen_version>)
>
><pre>
> ffff82d08024b270 <hypercall_table>
> ...
> ffff82d08024b2f8:   9e 2f 11 80 d0 82 ff ff
>
></pre>
>with the new address where the new `do_xen_version` is possible. The other
>place where it is used is in `hvm_hypercall64_table` which would need
>to be patched in a similar way. This would require an in-place splicing
>of the new virtual address of `do_xen_version`.
>
>An alternative solution would be to insert a trampoline in the
>old `do_xen_version` function to directly jump to the new
>`do_xen_version`, replacing:
>
><pre>
> ffff82d080112f9e <do_xen_version>:
> ffff82d080112f9e:   48 c7 c0 da ff ff ff    mov    $0xffffffffffffffda,%rax
> ffff82d080112fa5:   83 ff 09                cmp    $0x9,%edi
> ffff82d080112fa8:   0f 87 24 05 00 00       ja     ffff82d0801134d2 <do_xen_version+0x534>
></pre>
>
>with:
>
><pre>
> ffff82d080112f9e <do_xen_version>:
> ffff82d080112f9e:   e9 XX YY ZZ QQ          jmpq   [new do_xen_version]
></pre>
>
>which would lessen the amount of patching to just one location.
>
>In summary this example patched the affected function to jump to the
>new replacement function which required:
> * allocating memory for the new code to live in,
> * inserting a trampoline with the new offset in the old function to point
>   to the new function.
> * Optionally we can insert in the old function a trampoline jump to a
>   function providing a BUG_ON to catch errant code.
>
>The disadvantage of this is that the unconditional jump incurs a small
>I-cache penalty. However the simplicity of the patching and of the safety
>checks makes this a worthwhile option.
>
>### Security
>
>With this method we can re-write the hypervisor - and as such we **MUST** be
>diligent in only allowing certain guests to perform this operation.
>
>Furthermore with SecureBoot or tboot, we **MUST** also verify the signature
>of the payload to be certain it came from a trusted source.
>
>As such the hypercall **MUST** support an XSM policy to limit which
>guests are allowed to use it. If the system is booted with signature
>checking, the signature checking will be enforced.
>
>## Payload format
>
>The payload **MUST** contain enough data to allow us to apply the update
>and also safely reverse it. As such we **MUST** know:
>
> * What the old code is expected to be. We **MUST** verify it against the
>   runtime code.
> * The locations in memory to be patched. This can be determined dynamically
>   via symbols or via virtual addresses.
> * The new code to be used.
> * Signature to verify the payload.
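
To make the first requirement and the earlier trampoline example concrete, here is a minimal sketch in C. It is not part of the design: the helper names `old_code_matches` and `make_trampoline` are hypothetical, and the sketch only assumes the x86-64 `e9` (jmp rel32) encoding shown above, where the displacement is measured from the end of the 5-byte instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_REL32_OPCODE 0xe9
#define JMP_REL32_SIZE   5      /* 1 opcode byte + 4 displacement bytes. */

/* Hypothetical helper: 1 if the running code matches what the payload expects. */
static int old_code_matches(const uint8_t *runtime, const uint8_t *expected,
                            size_t len)
{
    assert(runtime != NULL && expected != NULL);
    return memcmp(runtime, expected, len) == 0;
}

/*
 * Hypothetical helper: encode "jmp rel32" placed at virtual address 'site'
 * so that it lands on 'target'. Returns 0 on success, or -1 if the target
 * is not reachable with a signed 32-bit displacement.
 */
static int make_trampoline(uint64_t site, uint64_t target,
                           uint8_t out[JMP_REL32_SIZE])
{
    /* Displacement is relative to the address of the next instruction.
     * The unsigned subtraction wraps; the cast assumes two's complement. */
    int64_t disp = (int64_t)(target - (site + JMP_REL32_SIZE));
    uint32_t d;

    assert(out != NULL);
    if (disp < INT32_MIN || disp > INT32_MAX)
        return -1;

    d = (uint32_t)(int32_t)disp;
    out[0] = JMP_REL32_OPCODE;
    out[1] = (uint8_t)(d & 0xff);           /* Little-endian rel32. */
    out[2] = (uint8_t)((d >> 8) & 0xff);
    out[3] = (uint8_t)((d >> 16) & 0xff);
    out[4] = (uint8_t)((d >> 24) & 0xff);
    return 0;
}
```

A caller would first compare the bytes at the patch site with the old code carried in the payload, and only write the five trampoline bytes if they match and `make_trampoline` reports the target as reachable.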
>
>This binary format can be constructed using a custom binary format but
>there are severe disadvantages to it:
>
> * The format might need to be changed and we need a mechanism to
>   accommodate that.
> * It has to be platform agnostic.
> * It has to be easily constructed using existing tools.
>
>As such having the payload in an ELF file is the sensible way. We would be
>carrying the various sets of structures (and data) in the ELF sections under
>different names and with definitions. The prefix for the ELF section name
>would always be: *.xsplice_*
>
>Note that every structure has padding. This is added so that the hypervisor
>can re-use those fields as it sees fit.
>
>There are five groups of *.xsplice_* sections:
>
> * `.xsplice_symbols` and `.xsplice_str`. The array of symbols to be
>   referenced during the update. This can contain the symbols (functions)
>   that will be patched, or the list of symbols (functions) to be checked
>   pre-patching which may not be on the stack.
>
> * `.xsplice_reloc` and `.xsplice_reloc_howto`. The howto for properly
>   constructing trampolines for a patch. We can have multiple locations for
>   which we need to insert a trampoline for a payload and each location
>   might require a different way of handling it. This would naturally
>   reference the `.text` section and its proper offset. The `.xsplice_reloc`
>   is not directly concerned with patches but rather is an ELF relocation -
>   describing the target of a relocation and how that is performed. They are
>   also used where the new code references the running code.
>
> * `.xsplice_sections`. The safety data for the old code and new code.
>   This contains an array of symbols (pointing to `.xsplice_symbols`
>   and `.text`) which are to be used during safety and dependency checking.
>
>
> * `.xsplice_patches`: The description of the new functions to be patched
>   in (size, type, pointer to code, etc.).
>
> * `.xsplice_change`. The structure that ties all of this together and
>   defines the payload.
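
As an illustration of the naming convention, a loader could separate the xSplice sections from ordinary ELF sections by name alone. The sketch below is only that: the helper names are hypothetical, and the list of known names is copied from the bullet list above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define XSPLICE_SECTION_PREFIX ".xsplice_"

/* Section names taken from the list in this design. */
static const char *const xsplice_known_sections[] = {
    ".xsplice_symbols", ".xsplice_str",
    ".xsplice_reloc",   ".xsplice_reloc_howto",
    ".xsplice_sections",
    ".xsplice_patches",
    ".xsplice_change",
};

/* Hypothetical helper: 1 if the section name carries the xSplice prefix. */
static int is_xsplice_section(const char *name)
{
    assert(name != NULL);
    return strncmp(name, XSPLICE_SECTION_PREFIX,
                   strlen(XSPLICE_SECTION_PREFIX)) == 0;
}

/* Hypothetical helper: 1 if the name is one of the sections listed above. */
static int is_known_xsplice_section(const char *name)
{
    size_t i;
    size_t n = sizeof(xsplice_known_sections) / sizeof(xsplice_known_sections[0]);

    assert(name != NULL);
    for (i = 0; i < n; i++)
        if (strcmp(name, xsplice_known_sections[i]) == 0)
            return 1;
    return 0;
}
```

Checking the prefix first and the exact name second would let a loader reject a payload that carries an unrecognised `.xsplice_*` section instead of silently ignoring it.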
>
>Additionally the ELF file would contain:
>
> * `.text` section for the new and old code (function).
> * `.rela.text` relocation data for the `.text` (both new and old).
> * `.rela.xsplice_patches` relocation data for `.xsplice_patches` (such as
>   offset to the `.text`, `.xsplice_symbols`, or `.xsplice_reloc` section).
> * `.bss` section for the new code (function).
> * `.data` and `.data.read_mostly` section for the new and old code
>   (function).
> * `.rodata` section for the new and old code (function).
>
>In short the *.xsplice_* sections represent various structures and the
>ELF provides the mechanism to glue it all together when loaded in memory.
>
>Note that a lot of these ideas are borrowed from kSplice which is
>available at: https://github.com/jirislaby/ksplice
>
>For ELF understanding the best starting point is the OSDev Wiki
>(http://wiki.osdev.org/ELF). Furthermore the ELF specification is
>at http://www.skyfree.org/linux/references/ELF_Format.pdf and
>at Oracle's web site:
>http://docs.oracle.com/cd/E23824_01/html/819-0690/chapter6-46512.html#scrolltoc
>
>### ASCII art of the ELF structures
>
>*TODO*: Include an ASCII art of how the sections are tied together.
>
>### xsplice_symbols
>
>The section contains an array of a structure that outlines the name
>of the symbol to be patched (or checked against). The structure is
>as follows:
>
><pre>
>struct xsplice_symbol {
>    const char *name;  /* The ELF name of the symbol. */
>    const char *label; /* A unique xSplice name for the symbol. */
>    uint8_t pad[16];   /* Must be zero. */
>};
></pre>
>The structures may be in the section in any order and in any amount
>(duplicate entries are permitted).
>
>Both `name` and `label` would be pointing to entries in `.xsplice_str`.
>
>The `label` is used for diagnostic purposes - such as including the
>name and the offset.
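
A small sketch of how the two sections relate once loaded, assuming the structure above. The string table contents, the `name+offset` label format and the helpers `find_symbol` and `pad_is_zero` are illustrative assumptions, not something this design specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct xsplice_symbol {
    const char *name;  /* The ELF name of the symbol. */
    const char *label; /* A unique xSplice name for the symbol. */
    uint8_t pad[16];   /* Must be zero. */
};

/* Stand-in for .xsplice_str: NUL-terminated strings packed back to back. */
static const char xsplice_str[] =
    "do_xen_version\0"        /* offset 0  */
    "do_xen_version+0x0";     /* offset 15 (assumed label format) */

/* Stand-in for .xsplice_symbols: both pointers land inside xsplice_str. */
static const struct xsplice_symbol xsplice_symbols[] = {
    { &xsplice_str[0], &xsplice_str[15], { 0 } },
};

/* Hypothetical lookup: first entry whose ELF name matches (duplicates are
 * permitted, so later entries with the same name are simply not reached). */
static const struct xsplice_symbol *
find_symbol(const struct xsplice_symbol *syms, size_t count, const char *name)
{
    size_t i;

    assert(syms != NULL && name != NULL);
    for (i = 0; i < count; i++)
        if (strcmp(syms[i].name, name) == 0)
            return &syms[i];
    return NULL;
}

/* Hypothetical check that the padding obeys the "must be zero" rule. */
static int pad_is_zero(const struct xsplice_symbol *sym)
{
    size_t i;

    assert(sym != NULL);
    for (i = 0; i < sizeof(sym->pad); i++)
        if (sym->pad[i] != 0)
            return 0;
    return 1;
}
```

In a real payload the pointers would be filled in by ELF relocations rather than by static initialisers; the static arrays here only stand in for the loaded sections.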
>
>### xsplice_reloc and xsplice_reloc_howto
>
>The section contains an array of a structure that outlines the different
>locations (and howto) for which a trampoline is to be inserted.
>
>The howto defines the change in detail. It contains the type,
>whether the relocation is relative, the size of the relocation,
>the bitmask for which parts of the instruction or data are to be replaced,
>the amount the final relocation is shifted by (to drop unwanted data), and
>whether the replacement should be interpreted as a signed value.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_HOWTO_RELOC_INLINE  0 /* Inline replacement. */
>#define XSPLICE_HOWTO_RELOC_PATCH   1 /* Add trampoline. */
>#define XSPLICE_HOWTO_RELOC_DATA    2 /* __DATE__ type change. */
>#define XSPLICE_HOWTO_RELOC_TIME    3 /* __TIME__ type change. */
>#define XSPLICE_HOWTO_BUG           4 /* BUG_ON being replaced.*/
>#define XSPLICE_HOWTO_EXTABLE       5 /* exception_table change. */
>#define XSPLICE_HOWTO_SYMBOL        6 /* change in symbol table. */
>
>#define XSPLICE_HOWTO_FLAG_PC_REL   0x00000001 /* Is PC relative. */
>#define XSPLICE_HOWTO_FLAG_SIGN     0x00000002 /* Should the new value be
>                                                  treated as signed value. */
>
>struct xsplice_reloc_howto {
>    uint32_t type;    /* XSPLICE_HOWTO_* */
>    uint32_t flag;    /* XSPLICE_HOWTO_FLAG_* */
>    uint32_t size;    /* Size, in bytes, of the item to be relocated. */
>    uint32_t r_shift; /* The value the final relocation is shifted right by;
>                         used to drop unwanted data from the relocation. */
>    uint64_t mask;    /* Bitmask for which parts of the instruction or data
>                         are replaced with the relocated value. */
>    uint8_t pad[8];   /* Must be zero. */
>};
>
></pre>
>
>This structure is used in:
>
><pre>
>struct xsplice_reloc {
>    uint64_t addr;                     /* The address of the relocation (if
>                                          known). */
>    struct xsplice_symbol *symbol;     /* Symbol for this relocation. */
>    struct xsplice_reloc_howto *howto; /* Pointer to the above structure. */
>    uint64_t isns_added;  /* ELF addend resulting from quirks of instruction
>                             one of whose operands is the relocation. For
>                             example, this is -4 on x86 pc-relative jumps. */
>    uint64_t isns_target; /* Rest of the ELF addend. This is equal to the
>                             offset against the symbol that the relocation
>                             refers to. */
>    uint8_t pad[8];       /* Must be zero. */
>};
></pre>
>
>### xsplice_sections
>
>The structure defined in this section is used to verify that it is safe
>to update with the new changes. It can contain safety data on the old code
>and what kind of matching we are to expect.
>
>It can also contain safety data on what to check when about to patch.
>That is, whether any of the addresses (either provided or resolved
>when the payload is loaded by referencing the symbols) contain in memory
>what we expect them to.
>
>As such the flags can be or-ed together:
>
><pre>
>#define XSPLICE_SECTION_TEXT            0x00000001 /* Section is in .text */
>#define XSPLICE_SECTION_RODATA          0x00000002 /* Section is in .rodata */
>#define XSPLICE_SECTION_DATA            0x00000004 /* Section is in .data */
>#define XSPLICE_SECTION_STRING          0x00000008 /* Section is in .str */
>#define XSPLICE_SECTION_ALTINSTRUCTIONS 0x00000010 /* Section has
>                                                      .altinstructions. */
>#define XSPLICE_SECTION_TEXT_INPLACE    0x00000200 /* Change is in place. */
>
>#define XSPLICE_SECTION_MATCH_EXACT     0x00000400 /* Must match exactly. */
>#define XSPLICE_SECTION_NO_STACKCHECK   0x00000800 /* Do not check the
>                                                      stack. */
>
>struct xsplice_section {
>    struct xsplice_symbol *symbol; /* The symbol associated with this
>                                      change. */
>    uint64_t address;              /* The address of the section (if known). */
>    uint64_t size;                 /* The size of the section. */
>    uint64_t flags;                /* Various XSPLICE_SECTION_* flags. */
>    uint8_t pad[16];               /* To be zero. */
>};
>
></pre>
>
>### xsplice_patches
>
>Within this section we have an array of a structure defining the new code
>(patch).
>
>This structure consists of a pointer to the new code (which in ELF ends up
>pointing to an offset in the `.text` or `.data` section); the type of patch:
>inline - either text or data, or requesting a trampoline; and the size of
>the patch.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_PATCH_INLINE_TEXT   0
>#define XSPLICE_PATCH_INLINE_DATA   1
>#define XSPLICE_PATCH_RELOC_TEXT    2
>
>struct xsplice_patch {
>    uint32_t type;   /* XSPLICE_PATCH_*. */
>    uint32_t size;   /* Size of patch. */
>    uint64_t addr;   /* The address of the new code (or data). */
>    void *content;   /* The bytes to be installed. */
>    uint8_t pad[16]; /* Must be zero. */
>};
>
></pre>
>
>### xsplice_code
>
>The structure embedded within this section ties it all together.
>It has the name of the patch, and pointers to all the above
>mentioned structures (the start and end addresses).
>
>The structure is as follows:
>
><pre>
>struct xsplice_code {
>    const char *name; /* A sensible name for the patch. Up to 40
>                         characters. */
>    struct xsplice_reloc *relocs, *relocs_end;       /* How to patch it */
>    struct xsplice_section *sections, *sections_end; /* Safety data */
>    struct xsplice_patch *patches, *patches_end;     /* Patch code & data */
>    uint8_t pad[32];  /* Must be zero. */
>};
></pre>
>
>There should only be one such structure in the section.
>
>### Example
>
>*TODO*: Include an objdump of how the ELF would look like for the XSA
>mentioned earlier.
>
>## Signature checking requirements.
>
>The signature checking requires that the layout of the data in memory
>**MUST** be the same for the signature to be verified. This means that the
>payload data layout in ELF format **MUST** match what the hypervisor would
>be expecting such that it can properly do signature verification.
>
>The signature is based on all of the payloads laid out contiguously
>in memory.
>The signature is to be appended at the end of the ELF payload,
>prefixed with the string `~Module signature appended~\n`, followed by
>a signature header, then followed by the signature, key identifier, and
>signer's name.
>
>Specifically the signature header would be:
>
><pre>
>#define PKEY_ALGO_DSA       0
>#define PKEY_ALGO_RSA       1
>
>#define PKEY_ID_PGP         0 /* OpenPGP generated key ID */
>#define PKEY_ID_X509        1 /* X.509 arbitrary subjectKeyIdentifier */
>
>#define HASH_ALGO_MD4          0
>#define HASH_ALGO_MD5          1
>#define HASH_ALGO_SHA1         2
>#define HASH_ALGO_RIPE_MD_160  3
>#define HASH_ALGO_SHA256       4
>#define HASH_ALGO_SHA384       5
>#define HASH_ALGO_SHA512       6
>#define HASH_ALGO_SHA224       7
>#define HASH_ALGO_RIPE_MD_128  8
>#define HASH_ALGO_RIPE_MD_256  9
>#define HASH_ALGO_RIPE_MD_320 10
>#define HASH_ALGO_WP_256      11
>#define HASH_ALGO_WP_384      12
>#define HASH_ALGO_WP_512      13
>#define HASH_ALGO_TGR_128     14
>#define HASH_ALGO_TGR_160     15
>#define HASH_ALGO_TGR_192     16
>
>
>struct elf_payload_signature {
>    u8 algo;        /* Public-key crypto algorithm PKEY_ALGO_*. */
>    u8 hash;        /* Digest algorithm: HASH_ALGO_*. */
>    u8 id_type;     /* Key identifier type PKEY_ID*. */
>    u8 signer_len;  /* Length of signer's name */
>    u8 key_id_len;  /* Length of key identifier */
>    u8 __pad[3];
>    __be32 sig_len; /* Length of signature data */
>};
>
></pre>
>(Note that this has been borrowed from the Linux module signature code.)
>
>
>## Hypercalls
>
>We will employ the sub operations of the system management hypercall
>(sysctl). There are to be four sub-operations:
>
> * upload the payloads.
> * listing of the summaries of uploaded payloads and their state.
> * getting a particular payload summary and its state.
> * command to apply, delete, or revert the payload.
>
>The patching is asynchronous, therefore the caller is responsible
>for verifying that it has been applied properly by retrieving the summary
>of it and verifying that there are no error codes associated with the
>payload.
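
The caller-side obligation described above amounts to a polling loop. The sketch below uses the status values and summary layout this design defines under XEN_SYSCTL_XSPLICE_GET; everything else is an assumption for illustration: the callback type `xsplice_get_fn` stands in for issuing the hypercall, `wait_for_status` and the `wait_result` codes are hypothetical, and `fake_get` merely simulates a hypervisor so the loop can be exercised.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XSPLICE_STATUS_LOADED    0
#define XSPLICE_STATUS_PROGRESS  1
#define XSPLICE_STATUS_CHECKED   2
#define XSPLICE_STATUS_APPLIED   3
#define XSPLICE_STATUS_REVERTED  4
#define XSPLICE_STATUS_IN_ERROR  5

struct xen_sysctl_xsplice_summary {
    char id[40];      /* IN/OUT, name of the patch. */
    uint32_t status;  /* OUT */
    int32_t rc;       /* OUT */
};

/* Hypothetical stand-in for issuing XEN_SYSCTL_XSPLICE_GET; 0 on success. */
typedef int (*xsplice_get_fn)(struct xen_sysctl_xsplice_summary *summary);

enum wait_result {
    WAIT_OK = 0,
    WAIT_HYPERCALL_FAILED,
    WAIT_PAYLOAD_ERROR,
    WAIT_UNEXPECTED_STATE,
    WAIT_TIMEOUT
};

/* Poll until the payload leaves PROGRESS, then compare with 'wanted'. */
static enum wait_result wait_for_status(xsplice_get_fn get, const char *id,
                                        uint32_t wanted, unsigned int max_polls,
                                        int32_t *rc_out)
{
    unsigned int i;

    assert(get != NULL && id != NULL && rc_out != NULL);
    *rc_out = 0;
    for (i = 0; i < max_polls; i++) {
        struct xen_sysctl_xsplice_summary s;

        memset(&s, 0, sizeof(s));            /* status and rc start as zero. */
        strncpy(s.id, id, sizeof(s.id) - 1);

        if (get(&s) != 0)
            return WAIT_HYPERCALL_FAILED;
        if (s.status == XSPLICE_STATUS_PROGRESS)
            continue;                        /* Still being acted on. */

        *rc_out = s.rc;
        if (s.status == XSPLICE_STATUS_IN_ERROR || s.rc != 0)
            return WAIT_PAYLOAD_ERROR;
        return s.status == wanted ? WAIT_OK : WAIT_UNEXPECTED_STATE;
    }
    return WAIT_TIMEOUT;
}

/* Illustration only: reports PROGRESS a few times, then APPLIED. */
static unsigned int fake_polls_in_progress = 2;

static int fake_get(struct xen_sysctl_xsplice_summary *summary)
{
    if (fake_polls_in_progress > 0) {
        fake_polls_in_progress--;
        summary->status = XSPLICE_STATUS_PROGRESS;
    } else {
        summary->status = XSPLICE_STATUS_APPLIED;
    }
    summary->rc = 0;
    return 0;
}
```

A real toolstack would presumably sleep or yield between polls instead of spinning, and would choose `max_polls` according to how long it is prepared to wait.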
>
>We **MUST** make it asynchronous due to the nature of patching: it requires
>every physical CPU to be in lock-step with each other. The patching
>mechanism, while an implementation detail, is not a short operation and as
>such the design **MUST** assume it will be a long-running operation.
>
>Furthermore it is possible to have multiple different payloads for the same
>function. As such a unique id has to be visible to allow proper
>manipulation.
>
>The hypercall is part of the `xen_sysctl`. The top level structure contains
>one uint32_t to determine the sub-operations:
>
><pre>
>struct xen_sysctl_xsplice_op {
>    uint32_t cmd;
>    union {
>          ... see below ...
>        } u;
>};
>
></pre>
>while the rest of the hypercall specific structures are part of this
>structure.
>
>
>### XEN_SYSCTL_XSPLICE_UPLOAD (0)
>
>Upload a payload to the hypervisor. The payload is verified and if there
>are any issues the proper return code will be returned. The payload is
>not applied at this time - that is controlled by
>*XEN_SYSCTL_XSPLICE_ACTION*.
>
>The caller provides:
>
> * `id` unique id.
> * `size` the size of the ELF payload.
> * `payload` the virtual address of where the ELF payload is.
>
>The return value is zero if the payload was successfully uploaded and the
>signature was verified. Otherwise an EXX return value is provided.
>Duplicate `id` are not supported.
>
>The `payload` is the ELF payload as mentioned in the `Payload format`
>section.
>
>The structure is as follows:
>
><pre>
>struct xen_sysctl_xsplice_upload {
>    char id[40];                        /* IN, name of the patch. */
>    uint64_t size;                      /* IN, size of the ELF file. */
>    XEN_GUEST_HANDLE_64(uint8) payload; /* ELF file. */
>};
></pre>
>
>### XEN_SYSCTL_XSPLICE_GET (1)
>
>Retrieve a summary of a specific payload. The caller provides:
>
> * `id` the unique id.
> * `status` *MUST* be set to zero.
> * `rc` *MUST* be set to zero.
>
>The `summary` structure contains a summary of the payload which includes:
>
> * `id` the unique id.
> * `status` - whether it has been:
>   1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
>   2. *XSPLICE_STATUS_PROGRESS* (1) acting on the
>      **XEN_SYSCTL_XSPLICE_ACTION** command.
>   3. *XSPLICE_STATUS_CHECKED* (2) the ELF payload safety checks passed.
>   4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>   5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also
>      reverted.
>   6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult
>      `rc` for details.
> * `rc` - its error state if any.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_STATUS_LOADED    0
>#define XSPLICE_STATUS_PROGRESS  1
>#define XSPLICE_STATUS_CHECKED   2
>#define XSPLICE_STATUS_APPLIED   3
>#define XSPLICE_STATUS_REVERTED  4
>#define XSPLICE_STATUS_IN_ERROR  5
>
>struct xen_sysctl_xsplice_summary {
>    char id[40];      /* IN/OUT, name of the patch. */
>    uint32_t status;  /* OUT */
>    int32_t rc;       /* OUT */
>};
></pre>
>
>### XEN_SYSCTL_XSPLICE_LIST (2)
>
>Retrieve an array of abbreviated summaries of the payloads that are loaded
>in the hypervisor.
>
>The caller provides:
>
> * `idx` index iterator. Initially it *MUST* be zero.
> * `count` the max number of entries to populate.
> * `summary` virtual address of where to write payload summaries.
>
>The hypercall returns zero on success and updates the `idx` (index) iterator
>with the number of payloads returned, `count` to the number of remaining
>payloads, and `summary` with a number of payload summaries.
>
>If the hypercall returns E2BIG the `count` is too big and should be
>lowered.
>
>Note that due to the asynchronous nature of hypercalls the domain might have
>added or removed payloads in the meantime, making this information stale. It
>is the responsibility of the domain to provide proper accounting.
>
>The `summary` structure contains a summary of the payload which includes:
>
> * `id` unique id.
> * `status` - whether it has been:
>   1. *XSPLICE_STATUS_LOADED* (0) has been loaded.
>   2. *XSPLICE_STATUS_PROGRESS* (1) acting on the
>      **XEN_SYSCTL_XSPLICE_ACTION** command.
>   3. *XSPLICE_STATUS_CHECKED* (2) the payload `old` and `addr` match with
>      the hypervisor.
>   4. *XSPLICE_STATUS_APPLIED* (3) loaded, checked, and applied.
>   5. *XSPLICE_STATUS_REVERTED* (4) loaded, checked, applied and then also
>      reverted.
>   6. *XSPLICE_STATUS_IN_ERROR* (5) loaded and in a failed state. Consult
>      `rc` for details.
> * `rc` - its error state if any.
>
>The structure is as follows:
>
><pre>
>struct xen_sysctl_xsplice_list {
>    uint32_t idx;    /* IN/OUT */
>    uint32_t count;  /* IN/OUT */
>    XEN_GUEST_HANDLE_64(xen_sysctl_xsplice_summary) summary; /* OUT */
>};
>
>struct xen_sysctl_xsplice_summary {
>    char id[40];      /* OUT, name of the patch. */
>    uint32_t status;  /* OUT */
>    int32_t rc;       /* OUT */
>};
>
></pre>
>### XEN_SYSCTL_XSPLICE_ACTION (3)
>
>Perform an operation on the payload structure referenced by the `id` field.
>The operation request is asynchronous and the status should be retrieved
>by using either the **XEN_SYSCTL_XSPLICE_GET** or the
>**XEN_SYSCTL_XSPLICE_LIST** hypercall.
>
>The caller provides:
>
> * `id` the unique id.
> * `cmd` the command requested:
>   1. *XSPLICE_ACTION_CHECK* (0) check that the payload will apply properly.
>   2. *XSPLICE_ACTION_UNLOAD* (1) unload the payload.
>   3. *XSPLICE_ACTION_REVERT* (2) revert the payload.
>   4. *XSPLICE_ACTION_APPLY* (3) apply the payload.
>
>
>The return value will be zero unless the provided fields are incorrect.
>
>The structure is as follows:
>
><pre>
>#define XSPLICE_ACTION_CHECK  0
>#define XSPLICE_ACTION_UNLOAD 1
>#define XSPLICE_ACTION_REVERT 2
>#define XSPLICE_ACTION_APPLY  3
>
>struct xen_sysctl_xsplice_action {
>    char id[40];   /* IN, name of the patch. */
>    uint32_t cmd;  /* IN */
>};
>
></pre>
>
>## Sequence of events.
>
>The normal sequence of events is to:
>
> 1. *XEN_SYSCTL_XSPLICE_UPLOAD* to upload the payload. If there are
>    errors *STOP* here.
> 2. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in
>    *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_LOADED* go to
>    the next step.
> 3. *XEN_SYSCTL_XSPLICE_ACTION* with the *XSPLICE_ACTION_CHECK* command to
>    verify that the payload can be successfully applied.
> 4. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in
>    *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_CHECKED* go to
>    the next step.
> 5. *XEN_SYSCTL_XSPLICE_ACTION* with *XSPLICE_ACTION_APPLY* to apply the
>    patch.
> 6. *XEN_SYSCTL_XSPLICE_GET* to check the `->status`. If in
>    *XSPLICE_STATUS_PROGRESS* spin. If in *XSPLICE_STATUS_APPLIED* exit with
>    success.
>
>
>## Addendum
>
>Implementation quirks should not be discussed in a design document.
>
>However these observations can provide aid when developing against this
>document.
>
>
>### Alternative assembler
>
>Alternative assembler is a mechanism to use different instructions depending
>on what the CPU supports. This is done by providing multiple streams of code
>that can be patched in - or if the CPU does not support it - padded with
>`nop` operations. The alternative assembler macros cause the compiler to
>expand the code to place the most generic code in place and emit a special
>ELF .section header to tag this location. During run-time the hypervisor
>can leave the areas alone or patch them with better suited opcodes.
>
>As we might be patching the alternative assembler sections as well - by
>providing new, better suited op-codes or perhaps nops - we need to
>also re-run the alternative assembler patching after we have done our
>patching.
>
>Also when we are doing safety checks the code we are checking might be
>utilizing alternative assembler. As such we should relax our checks to
>accommodate that.
>
>### .rodata sections
>
>The patching might require strings to be updated as well. As such we must
>also be able to patch the strings as needed. This sounds simple - but the
>compiler has a habit of coalescing strings that are the same - which means
>if we alter the strings in-place, other users will be inadvertently affected
>as well.
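
The coalescing hazard can be observed from plain C. In this sketch (the function names are made up for illustration) two unrelated functions return the same literal text. Whether both pointers refer to one copy in `.rodata` is left to the compiler and linker, so the result for `msg_a()` versus `msg_b()` is deliberately not stated here; with merging enabled they commonly compare equal, which is exactly the case where an in-place edit made for one user changes the string seen by the other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Two unrelated users of an identical string literal. */
static const char *msg_a(void) { return "out of memory"; }
static const char *msg_b(void) { return "out of memory"; }

/* 1 if both pointers refer to the same storage, i.e. patching one
 * in place would also change what the other user reads. */
static int shares_storage(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    return a == b;
}
```

Calling `shares_storage(msg_a(), msg_b())` on a given build shows whether the toolchain merged the two literals; the contents always compare equal with `strcmp` either way.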
>
>This is also where pointers to functions live - and we may need to patch
>this as well.
>
>To guard against that we must be prepared to do patching similar to
>trampoline patching or in-line depending on the flavour. If we can
>do in-line patching we would need to:
>
> * alter `.rodata` to be writeable.
> * inline patch.
> * alter `.rodata` to be read-only.
>
>If we are doing trampoline patching we would need to:
>
> * allocate a new memory location for the string.
> * update all locations which use this string to use the offset to the
>   new string.
> * mark the region RO when we are done.
>
>### .bss sections
>
>Patching writable data is not suitable as it is unclear what should be done
>depending on the current state of data. As such it should not be attempted.
>
>
>### Patching code which is in the stack.
>
>We should not patch the code which is on the stack. That can lead
>to corruption.
>
>### Trampoline (e9 opcode)
>
>The e9 opcode used for jmpq uses a 32-bit signed displacement. That means
>we are limited to up to 2GB of virtual address to place the new code
>from the old code. That should not be a problem since the Xen hypervisor has
>a very small footprint.
>
>However if we need to, we can always add two trampolines: one at the 2GB
>limit that jumps to the next trampoline.
>
>### Time rendezvous code instead of stop_machine for patching
>
>The hypervisor's time rendezvous code runs synchronously across all CPUs
>every second. Using stop_machine to patch can stall the time rendezvous
>code and result in an NMI. As such having the patching be done at the tail
>of the rendezvous code should avoid this problem.
>
>### Security
>
>Only the privileged domain should be allowed to do this operation.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel