Author: jhb Date: Tue May 5 00:02:04 2020 New Revision: 360648 URL: https://svnweb.freebsd.org/changeset/base/360648
Log:
  Initial support for bhyve save and restore.

  Save and restore (also known as suspend and resume) permits a snapshot
  to be taken of a guest's state that can later be resumed. In the
  current implementation, bhyve(8) creates a UNIX domain socket that is
  used by bhyvectl(8) to send a request to save a snapshot (and
  optionally exit after the snapshot has been taken). A snapshot
  currently consists of two files: the first holds a copy of guest RAM,
  and the second file holds other guest state such as vCPU register
  values and device model state.

  To resume a guest, bhyve(8) must be started with a matching pair of
  command line arguments to instantiate the same set of device models as
  well as a pointer to the saved snapshot.

  While the current implementation is useful for several use cases, it
  has a few limitations. The file format for saving the guest state is
  tied to the ABI of internal bhyve structures and is not
  self-describing (in that it does not communicate the set of device
  models present in the system). In addition, the state saved for some
  device models closely matches the internal data structures, which might
  prove a challenge for compatibility of snapshot files across a range
  of bhyve versions. The file format also does not currently support
  versioning of individual chunks of state. As a result, the current
  file format is not a fixed binary format and future revisions to save
  and restore will break binary compatibility of snapshot files. The
  goal is to move to a more flexible format that adds versioning,
  etc. and at that point to commit to providing a reasonable level of
  compatibility. As a result, the current implementation is not enabled
  by default. It can be enabled via the WITH_BHYVE_SNAPSHOT=yes option
  for userland builds, and the kernel option BHYVE_SNAPSHOT.
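  As a minimal sketch of the enabling step described above, assuming the
  conventional /etc/src.conf location for userland build options and a
  custom amd64 kernel configuration file (the name MYKERNEL is
  hypothetical), the two settings might look like this:

```
# /etc/src.conf: build bhyve(8) and bhyvectl(8) with snapshot support
WITH_BHYVE_SNAPSHOT=yes

# sys/amd64/conf/MYKERNEL (hypothetical custom kernel config)
options 	BHYVE_SNAPSHOT
```

  Both halves are presumably needed together, since the userland tools
  and the vmm kernel code are each compiled conditionally on their
  respective option.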
Submitted by: Mihai Tiganus, Flavius Anton, Darius Mihai Submitted by: Elena Mihailescu, Mihai Carabas, Sergiu Weisz Relnotes: yes Sponsored by: University Politehnica of Bucharest Sponsored by: Matthew Grooms (student scholarships) Sponsored by: iXsystems Differential Revision: https://reviews.freebsd.org/D19495 Added: head/sys/amd64/include/vmm_snapshot.h (contents, props changed) head/sys/amd64/vmm/vmm_snapshot.c (contents, props changed) head/tools/build/options/WITH_BHYVE_SNAPSHOT (contents, props changed) head/usr.sbin/bhyve/snapshot.c (contents, props changed) head/usr.sbin/bhyve/snapshot.h (contents, props changed) Modified: head/lib/libvmmapi/vmmapi.c head/lib/libvmmapi/vmmapi.h head/share/man/man5/src.conf.5 head/share/mk/src.opts.mk head/sys/amd64/include/vmm.h head/sys/amd64/include/vmm_dev.h head/sys/amd64/vmm/amd/svm.c head/sys/amd64/vmm/amd/svm.h head/sys/amd64/vmm/amd/svm_msr.c head/sys/amd64/vmm/amd/vmcb.c head/sys/amd64/vmm/amd/vmcb.h head/sys/amd64/vmm/intel/vmcs.c head/sys/amd64/vmm/intel/vmcs.h head/sys/amd64/vmm/intel/vmx.c head/sys/amd64/vmm/io/vatpic.c head/sys/amd64/vmm/io/vatpic.h head/sys/amd64/vmm/io/vatpit.c head/sys/amd64/vmm/io/vatpit.h head/sys/amd64/vmm/io/vhpet.c head/sys/amd64/vmm/io/vhpet.h head/sys/amd64/vmm/io/vioapic.c head/sys/amd64/vmm/io/vioapic.h head/sys/amd64/vmm/io/vlapic.c head/sys/amd64/vmm/io/vlapic.h head/sys/amd64/vmm/io/vpmtmr.c head/sys/amd64/vmm/io/vpmtmr.h head/sys/amd64/vmm/io/vrtc.c head/sys/amd64/vmm/io/vrtc.h head/sys/amd64/vmm/vmm.c head/sys/amd64/vmm/vmm_dev.c head/sys/conf/config.mk head/sys/conf/kern.opts.mk head/sys/conf/options.amd64 head/sys/modules/vmm/Makefile head/usr.sbin/bhyve/Makefile head/usr.sbin/bhyve/Makefile.depend head/usr.sbin/bhyve/atkbdc.c head/usr.sbin/bhyve/atkbdc.h head/usr.sbin/bhyve/bhyve.8 head/usr.sbin/bhyve/bhyverun.c head/usr.sbin/bhyve/bhyverun.h head/usr.sbin/bhyve/block_if.c head/usr.sbin/bhyve/block_if.h head/usr.sbin/bhyve/mevent.c head/usr.sbin/bhyve/pci_ahci.c 
head/usr.sbin/bhyve/pci_e82545.c head/usr.sbin/bhyve/pci_emul.c head/usr.sbin/bhyve/pci_emul.h head/usr.sbin/bhyve/pci_fbuf.c head/usr.sbin/bhyve/pci_lpc.c head/usr.sbin/bhyve/pci_virtio_block.c head/usr.sbin/bhyve/pci_virtio_net.c head/usr.sbin/bhyve/pci_xhci.c head/usr.sbin/bhyve/ps2kbd.c head/usr.sbin/bhyve/ps2kbd.h head/usr.sbin/bhyve/ps2mouse.c head/usr.sbin/bhyve/ps2mouse.h head/usr.sbin/bhyve/uart_emul.c head/usr.sbin/bhyve/uart_emul.h head/usr.sbin/bhyve/usb_emul.h head/usr.sbin/bhyve/usb_mouse.c head/usr.sbin/bhyve/virtio.c head/usr.sbin/bhyve/virtio.h head/usr.sbin/bhyvectl/Makefile head/usr.sbin/bhyvectl/bhyvectl.8 head/usr.sbin/bhyvectl/bhyvectl.c Modified: head/lib/libvmmapi/vmmapi.c ============================================================================== --- head/lib/libvmmapi/vmmapi.c Mon May 4 23:53:46 2020 (r360647) +++ head/lib/libvmmapi/vmmapi.c Tue May 5 00:02:04 2020 (r360648) @@ -44,6 +44,7 @@ __FBSDID("$FreeBSD$"); #include <machine/specialreg.h> #include <errno.h> +#include <stdbool.h> #include <stdio.h> #include <stdlib.h> #include <assert.h> @@ -53,8 +54,10 @@ __FBSDID("$FreeBSD$"); #include <libutil.h> +#include <vm/vm.h> #include <machine/vmm.h> #include <machine/vmm_dev.h> +#include <machine/vmm_snapshot.h> #include "vmmapi.h" @@ -238,6 +241,17 @@ vm_mmap_memseg(struct vmctx *ctx, vm_paddr_t gpa, int } int +vm_get_guestmem_from_ctx(struct vmctx *ctx, char **guest_baseaddr, + size_t *lowmem_size, size_t *highmem_size) +{ + + *guest_baseaddr = ctx->baseaddr; + *lowmem_size = ctx->lowmem; + *highmem_size = ctx->highmem; + return (0); +} + +int vm_mmap_getnext(struct vmctx *ctx, vm_paddr_t *gpa, int *segid, vm_ooffset_t *segoff, size_t *len, int *prot, int *flags) { @@ -448,6 +462,34 @@ vm_map_gpa(struct vmctx *ctx, vm_paddr_t gaddr, size_t return (NULL); } +vm_paddr_t +vm_rev_map_gpa(struct vmctx *ctx, void *addr) +{ + vm_paddr_t offaddr; + + offaddr = (char *)addr - ctx->baseaddr; + + if (ctx->lowmem > 0) + if (offaddr >= 0 && 
offaddr <= ctx->lowmem) + return (offaddr); + + if (ctx->highmem > 0) + if (offaddr >= 4*GB && offaddr < 4*GB + ctx->highmem) + return (offaddr); + + return ((vm_paddr_t)-1); +} + +/* TODO: maximum size for vmname */ +int +vm_get_name(struct vmctx *ctx, char *buf, size_t max_len) +{ + + if (strlcpy(buf, ctx->name, max_len) >= max_len) + return (EINVAL); + return (0); +} + size_t vm_get_lowmem_size(struct vmctx *ctx) { @@ -1499,6 +1541,29 @@ vm_restart_instruction(void *arg, int vcpu) struct vmctx *ctx = arg; return (ioctl(ctx->fd, VM_RESTART_INSTRUCTION, &vcpu)); +} + +int +vm_snapshot_req(struct vm_snapshot_meta *meta) +{ + + if (ioctl(meta->ctx->fd, VM_SNAPSHOT_REQ, meta) == -1) { +#ifdef SNAPSHOT_DEBUG + fprintf(stderr, "%s: snapshot failed for %s: %d\r\n", + __func__, meta->dev_name, errno); +#endif + return (-1); + } + return (0); +} + +int +vm_restore_time(struct vmctx *ctx) +{ + int dummy; + + dummy = 0; + return (ioctl(ctx->fd, VM_RESTORE_TIME, &dummy)); } int Modified: head/lib/libvmmapi/vmmapi.h ============================================================================== --- head/lib/libvmmapi/vmmapi.h Mon May 4 23:53:46 2020 (r360647) +++ head/lib/libvmmapi/vmmapi.h Tue May 5 00:02:04 2020 (r360648) @@ -33,6 +33,7 @@ #include <sys/param.h> #include <sys/cpuset.h> +#include <machine/vmm_dev.h> /* * API version for out-of-tree consumers like grub-bhyve for making compile @@ -42,6 +43,7 @@ struct iovec; struct vmctx; +struct vm_snapshot_meta; enum x2apic_state; /* @@ -88,6 +90,10 @@ int vm_get_memseg(struct vmctx *ctx, int ident, size_t */ int vm_mmap_getnext(struct vmctx *ctx, vm_paddr_t *gpa, int *segid, vm_ooffset_t *segoff, size_t *len, int *prot, int *flags); + +int vm_get_guestmem_from_ctx(struct vmctx *ctx, char **guest_baseaddr, + size_t *lowmem_size, size_t *highmem_size); + /* * Create a device memory segment identified by 'segid'. 
* @@ -110,6 +116,8 @@ void vm_destroy(struct vmctx *ctx); int vm_parse_memsize(const char *optarg, size_t *memsize); int vm_setup_memory(struct vmctx *ctx, size_t len, enum vm_mmap_style s); void *vm_map_gpa(struct vmctx *ctx, vm_paddr_t gaddr, size_t len); +/* inverse operation to vm_map_gpa - extract guest address from host pointer */ +vm_paddr_t vm_rev_map_gpa(struct vmctx *ctx, void *addr); int vm_get_gpa_pmap(struct vmctx *, uint64_t gpa, uint64_t *pte, int *num); int vm_gla2gpa(struct vmctx *, int vcpuid, struct vm_guest_paging *paging, uint64_t gla, int prot, uint64_t *gpa, int *fault); @@ -120,6 +128,7 @@ uint32_t vm_get_lowmem_limit(struct vmctx *ctx); void vm_set_lowmem_limit(struct vmctx *ctx, uint32_t limit); void vm_set_memflags(struct vmctx *ctx, int flags); int vm_get_memflags(struct vmctx *ctx); +int vm_get_name(struct vmctx *ctx, char *buffer, size_t max_len); size_t vm_get_lowmem_size(struct vmctx *ctx); size_t vm_get_highmem_size(struct vmctx *ctx); int vm_set_desc(struct vmctx *ctx, int vcpu, int reg, @@ -237,4 +246,24 @@ int vm_setup_freebsd_registers_i386(struct vmctx *vmct uint32_t eip, uint32_t gdtbase, uint32_t esp); void vm_setup_freebsd_gdt(uint64_t *gdtr); + +/* + * Save and restore + */ + +#define MAX_SNAPSHOT_VMNAME 100 + +enum checkpoint_opcodes { + START_CHECKPOINT = 0, + START_SUSPEND = 1, +}; + +struct checkpoint_op { + unsigned int op; + char snapshot_filename[MAX_SNAPSHOT_VMNAME]; +}; + +int vm_snapshot_req(struct vm_snapshot_meta *meta); +int vm_restore_time(struct vmctx *ctx); + #endif /* _VMMAPI_H_ */ Modified: head/share/man/man5/src.conf.5 ============================================================================== --- head/share/man/man5/src.conf.5 Mon May 4 23:53:46 2020 (r360647) +++ head/share/man/man5/src.conf.5 Tue May 5 00:02:04 2020 (r360648) @@ -1,6 +1,6 @@ .\" DO NOT EDIT-- this file is @generated by tools/build/options/makeman. 
.\" $FreeBSD$ -.Dd April 30, 2020 +.Dd May 4, 2020 .Dt SRC.CONF 5 .Os .Sh NAME @@ -168,6 +168,13 @@ is set explicitly) Set to not build or install .Xr bhyve 8 , associated utilities, and examples. +.Pp +This option only affects amd64/amd64. +.It Va WITH_BHYVE_SNAPSHOT +Set to include support for save and restore (snapshots) in +.Xr bhyve 8 +and +.Xr bhyvectl 8 . .Pp This option only affects amd64/amd64. .It Va WITH_BIND_NOW Modified: head/share/mk/src.opts.mk ============================================================================== --- head/share/mk/src.opts.mk Mon May 4 23:53:46 2020 (r360647) +++ head/share/mk/src.opts.mk Tue May 5 00:02:04 2020 (r360648) @@ -200,6 +200,7 @@ __DEFAULT_YES_OPTIONS = \ __DEFAULT_NO_OPTIONS = \ BEARSSL \ + BHYVE_SNAPSHOT \ BSD_GREP \ CLANG_EXTRAS \ DTRACE_TESTS \ Modified: head/sys/amd64/include/vmm.h ============================================================================== --- head/sys/amd64/include/vmm.h Mon May 4 23:53:46 2020 (r360647) +++ head/sys/amd64/include/vmm.h Tue May 5 00:02:04 2020 (r360648) @@ -34,6 +34,8 @@ #include <sys/sdt.h> #include <x86/segments.h> +struct vm_snapshot_meta; + #ifdef _KERNEL SDT_PROVIDER_DECLARE(vmm); #endif @@ -152,6 +154,7 @@ struct vmspace; struct vm_object; struct vm_guest_paging; struct pmap; +enum snapshot_req; struct vm_eventinfo { void *rptr; /* rendezvous cookie */ @@ -180,6 +183,10 @@ typedef struct vmspace * (*vmi_vmspace_alloc)(vm_offse typedef void (*vmi_vmspace_free)(struct vmspace *vmspace); typedef struct vlapic * (*vmi_vlapic_init)(void *vmi, int vcpu); typedef void (*vmi_vlapic_cleanup)(void *vmi, struct vlapic *vlapic); +typedef int (*vmi_snapshot_t)(void *vmi, struct vm_snapshot_meta *meta); +typedef int (*vmi_snapshot_vmcx_t)(void *vmi, struct vm_snapshot_meta *meta, + int vcpu); +typedef int (*vmi_restore_tsc_t)(void *vmi, int vcpuid, uint64_t now); struct vmm_ops { vmm_init_func_t init; /* module wide initialization */ @@ -199,6 +206,11 @@ struct vmm_ops { 
vmi_vmspace_free vmspace_free; vmi_vlapic_init vlapic_init; vmi_vlapic_cleanup vlapic_cleanup; + + /* checkpoint operations */ + vmi_snapshot_t vmsnapshot; + vmi_snapshot_vmcx_t vmcx_snapshot; + vmi_restore_tsc_t vm_restore_tsc; }; extern struct vmm_ops vmm_ops_intel; @@ -272,7 +284,10 @@ void vm_exit_debug(struct vm *vm, int vcpuid, uint64_t void vm_exit_rendezvous(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_astpending(struct vm *vm, int vcpuid, uint64_t rip); void vm_exit_reqidle(struct vm *vm, int vcpuid, uint64_t rip); +int vm_snapshot_req(struct vm *vm, struct vm_snapshot_meta *meta); +int vm_restore_time(struct vm *vm); + #ifdef _SYS__CPUSET_H_ /* * Rendezvous all vcpus specified in 'dest' and execute 'func(arg)'. @@ -408,6 +423,15 @@ int vm_exit_intinfo(struct vm *vm, int vcpuid, uint64_ int vm_entry_intinfo(struct vm *vm, int vcpuid, uint64_t *info); int vm_get_intinfo(struct vm *vm, int vcpuid, uint64_t *info1, uint64_t *info2); + +/* + * Function used to keep track of the guest's TSC offset. The + * offset is used by the virutalization extensions to provide a consistent + * value for the Time Stamp Counter to the guest. + * + * Return value is 0 on success and non-zero on failure. 
+ */ +int vm_set_tsc_offset(struct vm *vm, int vcpu_id, uint64_t offset); enum vm_reg_name vm_segment_name(int seg_encoding); Modified: head/sys/amd64/include/vmm_dev.h ============================================================================== --- head/sys/amd64/include/vmm_dev.h Mon May 4 23:53:46 2020 (r360647) +++ head/sys/amd64/include/vmm_dev.h Tue May 5 00:02:04 2020 (r360648) @@ -31,6 +31,8 @@ #ifndef _VMM_DEV_H_ #define _VMM_DEV_H_ +struct vm_snapshot_meta; + #ifdef _KERNEL void vmmdev_init(void); int vmmdev_cleanup(void); @@ -312,6 +314,11 @@ enum { IOCNUM_RTC_WRITE = 101, IOCNUM_RTC_SETTIME = 102, IOCNUM_RTC_GETTIME = 103, + + /* checkpoint */ + IOCNUM_SNAPSHOT_REQ = 113, + + IOCNUM_RESTORE_TIME = 115 }; #define VM_RUN \ @@ -422,4 +429,8 @@ enum { _IOR('v', IOCNUM_RTC_GETTIME, struct vm_rtc_time) #define VM_RESTART_INSTRUCTION \ _IOW('v', IOCNUM_RESTART_INSTRUCTION, int) +#define VM_SNAPSHOT_REQ \ + _IOWR('v', IOCNUM_SNAPSHOT_REQ, struct vm_snapshot_meta) +#define VM_RESTORE_TIME \ + _IOWR('v', IOCNUM_RESTORE_TIME, int) #endif Added: head/sys/amd64/include/vmm_snapshot.h ============================================================================== --- /dev/null 00:00:00 1970 (empty, because file is newly added) +++ head/sys/amd64/include/vmm_snapshot.h Tue May 5 00:02:04 2020 (r360648) @@ -0,0 +1,156 @@ +/*- + * SPDX-License-Identifier: BSD-2-Clause-FreeBSD + * + * Copyright (c) 2016 Flavius Anton + * Copyright (c) 2016 Mihai Tiganus + * Copyright (c) 2016-2019 Mihai Carabas + * Copyright (c) 2017-2019 Darius Mihai + * Copyright (c) 2017-2019 Elena Mihailescu + * Copyright (c) 2018-2019 Sergiu Weisz + * All rights reserved. + * The bhyve-snapshot feature was developed under sponsorships + * from Matthew Grooms. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. 
Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY NETAPP, INC ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL NETAPP, INC OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. + * + * $FreeBSD$ + */ + +#ifndef _VMM_SNAPSHOT_ +#define _VMM_SNAPSHOT_ + +#include <sys/errno.h> +#include <sys/types.h> +#ifndef _KERNEL +#include <stdbool.h> +#endif + +struct vmctx; + +enum snapshot_req { + STRUCT_VMX, + STRUCT_VIOAPIC, + STRUCT_VM, + STRUCT_VLAPIC, + VM_MEM, + STRUCT_VHPET, + STRUCT_VMCX, + STRUCT_VATPIC, + STRUCT_VATPIT, + STRUCT_VPMTMR, + STRUCT_VRTC, +}; + +struct vm_snapshot_buffer { + /* + * R/O for device-specific functions; + * written by generic snapshot functions. + */ + uint8_t *const buf_start; + const size_t buf_size; + + /* + * R/W for device-specific functions used to keep track of buffer + * current position and remaining size. 
+ */ + uint8_t *buf; + size_t buf_rem; + + /* + * Length of the snapshot is either determined as (buf_size - buf_rem) + * or (buf - buf_start) -- the second variation returns a signed value + * so it may not be appropriate. + * + * Use vm_get_snapshot_size(meta). + */ +}; + +enum vm_snapshot_op { + VM_SNAPSHOT_SAVE, + VM_SNAPSHOT_RESTORE, +}; + +struct vm_snapshot_meta { + struct vmctx *ctx; + void *dev_data; + const char *dev_name; /* identify userspace devices */ + enum snapshot_req dev_req; /* identify kernel structs */ + + struct vm_snapshot_buffer buffer; + + enum vm_snapshot_op op; +}; + + +void vm_snapshot_buf_err(const char *bufname, const enum vm_snapshot_op op); +int vm_snapshot_buf(volatile void *data, size_t data_size, + struct vm_snapshot_meta *meta); +size_t vm_get_snapshot_size(struct vm_snapshot_meta *meta); +int vm_snapshot_guest2host_addr(void **addrp, size_t len, bool restore_null, + struct vm_snapshot_meta *meta); +int vm_snapshot_buf_cmp(volatile void *data, size_t data_size, + struct vm_snapshot_meta *meta); + +#define SNAPSHOT_BUF_OR_LEAVE(DATA, LEN, META, RES, LABEL) \ +do { \ + (RES) = vm_snapshot_buf((DATA), (LEN), (META)); \ + if ((RES) != 0) { \ + vm_snapshot_buf_err(#DATA, (META)->op); \ + goto LABEL; \ + } \ +} while (0) + +#define SNAPSHOT_VAR_OR_LEAVE(DATA, META, RES, LABEL) \ + SNAPSHOT_BUF_OR_LEAVE(&(DATA), sizeof(DATA), (META), (RES), LABEL) + +/* + * Address variables are pointers to guest memory. + * + * When RNULL != 0, do not enforce invalid address checks; instead, make the + * pointer NULL at restore time. 
+ */ +#define SNAPSHOT_GUEST2HOST_ADDR_OR_LEAVE(ADDR, LEN, RNULL, META, RES, LABEL) \ +do { \ + (RES) = vm_snapshot_guest2host_addr((void **)&(ADDR), (LEN), (RNULL), \ + (META)); \ + if ((RES) != 0) { \ + if ((RES) == EFAULT) \ + fprintf(stderr, "%s: invalid address: %s\r\n", \ + __func__, #ADDR); \ + goto LABEL; \ + } \ +} while (0) + +/* compare the value in the meta buffer with the data */ +#define SNAPSHOT_BUF_CMP_OR_LEAVE(DATA, LEN, META, RES, LABEL) \ +do { \ + (RES) = vm_snapshot_buf_cmp((DATA), (LEN), (META)); \ + if ((RES) != 0) { \ + vm_snapshot_buf_err(#DATA, (META)->op); \ + goto LABEL; \ + } \ +} while (0) + +#define SNAPSHOT_VAR_CMP_OR_LEAVE(DATA, META, RES, LABEL) \ + SNAPSHOT_BUF_CMP_OR_LEAVE(&(DATA), sizeof(DATA), (META), (RES), LABEL) + +#endif Modified: head/sys/amd64/vmm/amd/svm.c ============================================================================== --- head/sys/amd64/vmm/amd/svm.c Mon May 4 23:53:46 2020 (r360647) +++ head/sys/amd64/vmm/amd/svm.c Tue May 5 00:02:04 2020 (r360648) @@ -29,6 +29,8 @@ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); +#include "opt_bhyve_snapshot.h" + #include <sys/param.h> #include <sys/systm.h> #include <sys/smp.h> @@ -50,6 +52,7 @@ __FBSDID("$FreeBSD$"); #include <machine/vmm.h> #include <machine/vmm_dev.h> #include <machine/vmm_instruction_emul.h> +#include <machine/vmm_snapshot.h> #include "vmm_lapic.h" #include "vmm_stat.h" @@ -276,6 +279,25 @@ svm_restore(void) svm_enable(NULL); } +#ifdef BHYVE_SNAPSHOT +int +svm_set_tsc_offset(struct svm_softc *sc, int vcpu, uint64_t offset) +{ + int error; + struct vmcb_ctrl *ctrl; + + ctrl = svm_get_vmcb_ctrl(sc, vcpu); + ctrl->tsc_offset = offset; + + svm_set_dirty(sc, vcpu, VMCB_CACHE_I); + VCPU_CTR1(sc->vm, vcpu, "tsc offset changed to %#lx", offset); + + error = vm_set_tsc_offset(sc->vm, vcpu, offset); + + return (error); +} +#endif + /* Pentium compatible MSRs */ #define MSR_PENTIUM_START 0 #define MSR_PENTIUM_END 0x1FFF @@ -2203,7 +2225,37 @@ svm_setreg(void 
*arg, int vcpu, int ident, uint64_t va return (EINVAL); } +#ifdef BHYVE_SNAPSHOT static int +svm_snapshot_reg(void *arg, int vcpu, int ident, + struct vm_snapshot_meta *meta) +{ + int ret; + uint64_t val; + + if (meta->op == VM_SNAPSHOT_SAVE) { + ret = svm_getreg(arg, vcpu, ident, &val); + if (ret != 0) + goto done; + + SNAPSHOT_VAR_OR_LEAVE(val, meta, ret, done); + } else if (meta->op == VM_SNAPSHOT_RESTORE) { + SNAPSHOT_VAR_OR_LEAVE(val, meta, ret, done); + + ret = svm_setreg(arg, vcpu, ident, val); + if (ret != 0) + goto done; + } else { + ret = EINVAL; + goto done; + } + +done: + return (ret); +} +#endif + +static int svm_setcap(void *arg, int vcpu, int type, int val) { struct svm_softc *sc; @@ -2285,6 +2337,306 @@ svm_vlapic_cleanup(void *arg, struct vlapic *vlapic) free(vlapic, M_SVM_VLAPIC); } +#ifdef BHYVE_SNAPSHOT +static int +svm_snapshot_vmi(void *arg, struct vm_snapshot_meta *meta) +{ + /* struct svm_softc is AMD's representation for SVM softc */ + struct svm_softc *sc; + struct svm_vcpu *vcpu; + struct vmcb *vmcb; + uint64_t val; + int i; + int ret; + + sc = arg; + + KASSERT(sc != NULL, ("%s: arg was NULL", __func__)); + + SNAPSHOT_VAR_OR_LEAVE(sc->nptp, meta, ret, done); + + for (i = 0; i < VM_MAXCPU; i++) { + vcpu = &sc->vcpu[i]; + vmcb = &vcpu->vmcb; + + /* VMCB fields for virtual cpu i */ + SNAPSHOT_VAR_OR_LEAVE(vmcb->ctrl.v_tpr, meta, ret, done); + val = vmcb->ctrl.v_tpr; + SNAPSHOT_VAR_OR_LEAVE(val, meta, ret, done); + vmcb->ctrl.v_tpr = val; + + SNAPSHOT_VAR_OR_LEAVE(vmcb->ctrl.asid, meta, ret, done); + val = vmcb->ctrl.np_enable; + SNAPSHOT_VAR_OR_LEAVE(val, meta, ret, done); + vmcb->ctrl.np_enable = val; + + val = vmcb->ctrl.intr_shadow; + SNAPSHOT_VAR_OR_LEAVE(val, meta, ret, done); + vmcb->ctrl.intr_shadow = val; + SNAPSHOT_VAR_OR_LEAVE(vmcb->ctrl.tlb_ctrl, meta, ret, done); + + SNAPSHOT_BUF_OR_LEAVE(vmcb->state.pad1, + sizeof(vmcb->state.pad1), + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.cpl, meta, ret, done); + 
SNAPSHOT_BUF_OR_LEAVE(vmcb->state.pad2, + sizeof(vmcb->state.pad2), + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.efer, meta, ret, done); + SNAPSHOT_BUF_OR_LEAVE(vmcb->state.pad3, + sizeof(vmcb->state.pad3), + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.cr4, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.cr3, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.cr0, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.dr7, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.dr6, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.rflags, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.rip, meta, ret, done); + SNAPSHOT_BUF_OR_LEAVE(vmcb->state.pad4, + sizeof(vmcb->state.pad4), + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.rsp, meta, ret, done); + SNAPSHOT_BUF_OR_LEAVE(vmcb->state.pad5, + sizeof(vmcb->state.pad5), + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.rax, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.star, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.lstar, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.cstar, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.sfmask, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.kernelgsbase, + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.sysenter_cs, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.sysenter_esp, + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.sysenter_eip, + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.cr2, meta, ret, done); + SNAPSHOT_BUF_OR_LEAVE(vmcb->state.pad6, + sizeof(vmcb->state.pad6), + meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.g_pat, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.dbgctl, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.br_from, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.br_to, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.int_from, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vmcb->state.int_to, meta, ret, done); + 
SNAPSHOT_BUF_OR_LEAVE(vmcb->state.pad7, + sizeof(vmcb->state.pad7), + meta, ret, done); + + /* Snapshot swctx for virtual cpu i */ + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_rbp, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_rbx, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_rcx, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_rdx, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_rdi, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_rsi, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r8, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r9, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r10, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r11, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r12, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r13, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r14, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_r15, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_dr0, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_dr1, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_dr2, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.sctx_dr3, meta, ret, done); + + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.host_dr0, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.host_dr1, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.host_dr2, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.host_dr3, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.host_dr6, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.host_dr7, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->swctx.host_debugctl, meta, ret, + done); + + /* Restore other svm_vcpu struct fields */ + + /* Restore NEXTRIP field */ + SNAPSHOT_VAR_OR_LEAVE(vcpu->nextrip, meta, ret, done); + + /* Restore lastcpu field */ + SNAPSHOT_VAR_OR_LEAVE(vcpu->lastcpu, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->dirty, 
meta, ret, done); + + /* Restore EPTGEN field - EPT is Extended Page Tabel */ + SNAPSHOT_VAR_OR_LEAVE(vcpu->eptgen, meta, ret, done); + + SNAPSHOT_VAR_OR_LEAVE(vcpu->asid.gen, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(vcpu->asid.num, meta, ret, done); + + /* Set all caches dirty */ + if (meta->op == VM_SNAPSHOT_RESTORE) { + svm_set_dirty(sc, i, VMCB_CACHE_ASID); + svm_set_dirty(sc, i, VMCB_CACHE_IOPM); + svm_set_dirty(sc, i, VMCB_CACHE_I); + svm_set_dirty(sc, i, VMCB_CACHE_TPR); + svm_set_dirty(sc, i, VMCB_CACHE_CR2); + svm_set_dirty(sc, i, VMCB_CACHE_CR); + svm_set_dirty(sc, i, VMCB_CACHE_DT); + svm_set_dirty(sc, i, VMCB_CACHE_SEG); + svm_set_dirty(sc, i, VMCB_CACHE_NP); + } + } + + if (meta->op == VM_SNAPSHOT_RESTORE) + flush_by_asid(); + +done: + return (ret); +} + +static int +svm_snapshot_vmcx(void *arg, struct vm_snapshot_meta *meta, int vcpu) +{ + struct vmcb *vmcb; + struct svm_softc *sc; + int err, running, hostcpu; + + sc = (struct svm_softc *)arg; + err = 0; + + KASSERT(arg != NULL, ("%s: arg was NULL", __func__)); + vmcb = svm_get_vmcb(sc, vcpu); + + running = vcpu_is_running(sc->vm, vcpu, &hostcpu); + if (running && hostcpu !=curcpu) { + printf("%s: %s%d is running", __func__, vm_name(sc->vm), vcpu); + return (EINVAL); + } + + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_CR0, meta); + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_CR2, meta); + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_CR3, meta); + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_CR4, meta); + + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_DR7, meta); + + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_RAX, meta); + + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_RSP, meta); + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_RIP, meta); + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_RFLAGS, meta); + + /* Guest segments */ + /* ES */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_ES, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_ES, meta); + + /* CS */ + err += 
svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_CS, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_CS, meta); + + /* SS */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_SS, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_SS, meta); + + /* DS */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_DS, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_DS, meta); + + /* FS */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_FS, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_FS, meta); + + /* GS */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_GS, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_GS, meta); + + /* TR */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_TR, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_TR, meta); + + /* LDTR */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_LDTR, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_LDTR, meta); + + /* EFER */ + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_EFER, meta); + + /* IDTR and GDTR */ + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_IDTR, meta); + err += vmcb_snapshot_desc(sc, vcpu, VM_REG_GUEST_GDTR, meta); + + /* Specific AMD registers */ + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_SYSENTER_CS, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_SYSENTER_ESP, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_SYSENTER_EIP, 8), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_NPT_BASE, 8), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_CR_INTERCEPT, 4), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_DR_INTERCEPT, 4), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_EXC_INTERCEPT, 4), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_INST1_INTERCEPT, 4), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_INST2_INTERCEPT, 4), meta); + + err += vmcb_snapshot_any(sc, 
vcpu, + VMCB_ACCESS(VMCB_OFF_TLB_CTRL, 4), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_EXITINFO1, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_EXITINFO2, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_EXITINTINFO, 8), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_VIRQ, 8), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_GUEST_PAT, 8), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_AVIC_BAR, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_AVIC_PAGE, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_AVIC_LT, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_AVIC_PT, 8), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_IO_PERM, 8), meta); + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_MSR_PERM, 8), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_ASID, 4), meta); + + err += vmcb_snapshot_any(sc, vcpu, + VMCB_ACCESS(VMCB_OFF_EXIT_REASON, 8), meta); + + err += svm_snapshot_reg(sc, vcpu, VM_REG_GUEST_INTR_SHADOW, meta); + + return (err); +} + +static int +svm_restore_tsc(void *arg, int vcpu, uint64_t offset) +{ + int err; + + err = svm_set_tsc_offset(arg, vcpu, offset); + + return (err); +} +#endif + struct vmm_ops vmm_ops_amd = { .init = svm_init, .cleanup = svm_cleanup, @@ -2302,4 +2654,9 @@ struct vmm_ops vmm_ops_amd = { .vmspace_free = svm_npt_free, .vlapic_init = svm_vlapic_init, .vlapic_cleanup = svm_vlapic_cleanup, +#ifdef BHYVE_SNAPSHOT + .vmsnapshot = svm_snapshot_vmi, + .vmcx_snapshot = svm_snapshot_vmcx, + .vm_restore_tsc = svm_restore_tsc, +#endif }; Modified: head/sys/amd64/vmm/amd/svm.h ============================================================================== --- head/sys/amd64/vmm/amd/svm.h Mon May 4 23:53:46 2020 (r360647) +++ head/sys/amd64/vmm/amd/svm.h Tue May 5 00:02:04 2020 (r360648) @@ 
-32,6 +32,7 @@ #define _SVM_H_ struct pcpu; +struct svm_softc; /* * Guest register state that is saved outside the VMCB. @@ -66,5 +67,8 @@ struct svm_regctx { }; void svm_launch(uint64_t pa, struct svm_regctx *gctx, struct pcpu *pcpu); +#ifdef BHYVE_SNAPSHOT +int svm_set_tsc_offset(struct svm_softc *sc, int vcpu, uint64_t offset); +#endif #endif /* _SVM_H_ */ Modified: head/sys/amd64/vmm/amd/svm_msr.c ============================================================================== --- head/sys/amd64/vmm/amd/svm_msr.c Mon May 4 23:53:46 2020 (r360647) +++ head/sys/amd64/vmm/amd/svm_msr.c Tue May 5 00:02:04 2020 (r360648) @@ -29,6 +29,8 @@ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); +#include "opt_bhyve_snapshot.h" + #include <sys/param.h> #include <sys/errno.h> #include <sys/systm.h> @@ -162,6 +164,11 @@ svm_wrmsr(struct svm_softc *sc, int vcpu, u_int num, u * Ignore writes to microcode update register. */ break; +#ifdef BHYVE_SNAPSHOT + case MSR_TSC: + error = svm_set_tsc_offset(sc, vcpu, val - rdtsc()); + break; +#endif case MSR_EXTFEATURES: break; default: Modified: head/sys/amd64/vmm/amd/vmcb.c ============================================================================== --- head/sys/amd64/vmm/amd/vmcb.c Mon May 4 23:53:46 2020 (r360647) +++ head/sys/amd64/vmm/amd/vmcb.c Tue May 5 00:02:04 2020 (r360648) @@ -29,12 +29,15 @@ #include <sys/cdefs.h> __FBSDID("$FreeBSD$"); +#include "opt_bhyve_snapshot.h" + #include <sys/param.h> #include <sys/systm.h> #include <machine/segments.h> #include <machine/specialreg.h> #include <machine/vmm.h> +#include <machine/vmm_snapshot.h> #include "vmm_ktr.h" @@ -452,3 +455,106 @@ vmcb_getdesc(void *arg, int vcpu, int reg, struct seg_ return (0); } + +#ifdef BHYVE_SNAPSHOT +int +vmcb_getany(struct svm_softc *sc, int vcpu, int ident, uint64_t *val) +{ + int error = 0; + + if (vcpu < 0 || vcpu >= VM_MAXCPU) { + error = EINVAL; + goto err; + } + + if (ident >= VM_REG_LAST) { + error = EINVAL; + goto err; + } + + error = 
vm_get_register(sc->vm, vcpu, ident, val); + +err: + return (error); +} + +int +vmcb_setany(struct svm_softc *sc, int vcpu, int ident, uint64_t val) +{ + int error = 0; + + if (vcpu < 0 || vcpu >= VM_MAXCPU) { + error = EINVAL; + goto err; + } + + if (ident >= VM_REG_LAST) { + error = EINVAL; + goto err; + } + + error = vm_set_register(sc->vm, vcpu, ident, val); + +err: + return (error); +} + +int +vmcb_snapshot_desc(void *arg, int vcpu, int reg, struct vm_snapshot_meta *meta) +{ + int ret; + struct seg_desc desc; + + if (meta->op == VM_SNAPSHOT_SAVE) { + ret = vmcb_getdesc(arg, vcpu, reg, &desc); + if (ret != 0) + goto done; + + SNAPSHOT_VAR_OR_LEAVE(desc.base, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(desc.limit, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(desc.access, meta, ret, done); + } else if (meta->op == VM_SNAPSHOT_RESTORE) { + SNAPSHOT_VAR_OR_LEAVE(desc.base, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(desc.limit, meta, ret, done); + SNAPSHOT_VAR_OR_LEAVE(desc.access, meta, ret, done); + + ret = vmcb_setdesc(arg, vcpu, reg, &desc); + if (ret != 0) + goto done; + } else { *** DIFF OUTPUT TRUNCATED AT 1000 LINES ***
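The SNAPSHOT_VAR_OR_LEAVE macros declared in vmm_snapshot.h in the diff above let a single routine serve both save and restore, with the direction chosen by meta->op. The diff is truncated before the body of vm_snapshot_buf() appears, so the following is only a userspace sketch of how such a helper might behave. Every toy_ name is hypothetical and none of this is the committed implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; NOT the types from vmm_snapshot.h. */
enum toy_op { TOY_SAVE, TOY_RESTORE };

struct toy_meta {
	uint8_t *buf;		/* current position in the snapshot buffer */
	size_t buf_rem;		/* bytes still available after buf */
	enum toy_op op;		/* direction of the copy */
};

/*
 * Assumed behavior: copy 'len' bytes out of 'data' on save and back into
 * 'data' on restore, then advance the cursor. Returns 0 or an errno value.
 */
int
toy_snapshot_buf(void *data, size_t len, struct toy_meta *meta)
{
	assert(meta != NULL);
	if (meta->buf_rem < len)
		return (E2BIG);
	if (meta->op == TOY_SAVE)
		memcpy(meta->buf, data, len);
	else if (meta->op == TOY_RESTORE)
		memcpy(data, meta->buf, len);
	else
		return (EINVAL);
	meta->buf += len;
	meta->buf_rem -= len;
	return (0);
}

/* Same shape as SNAPSHOT_VAR_OR_LEAVE, minus the error reporting. */
#define TOY_VAR_OR_LEAVE(DATA, META, RES, LABEL)			\
do {									\
	(RES) = toy_snapshot_buf(&(DATA), sizeof(DATA), (META));	\
	if ((RES) != 0)							\
		goto LABEL;						\
} while (0)

/* A made-up device model with two fields of state. */
struct toy_dev {
	uint32_t ctrl;
	uint64_t counter;
};

/* One routine handles both directions; field order defines the format. */
int
toy_dev_snapshot(struct toy_dev *dev, struct toy_meta *meta)
{
	int ret;

	TOY_VAR_OR_LEAVE(dev->ctrl, meta, ret, done);
	TOY_VAR_OR_LEAVE(dev->counter, meta, ret, done);
done:
	return (ret);
}

/* Save 'in' to a scratch buffer, then restore it into 'out'. */
int
toy_roundtrip(struct toy_dev *in, struct toy_dev *out)
{
	uint8_t storage[64];
	struct toy_meta meta = { storage, sizeof(storage), TOY_SAVE };
	int ret;

	ret = toy_dev_snapshot(in, &meta);
	if (ret != 0)
		return (ret);
	meta = (struct toy_meta){ storage, sizeof(storage), TOY_RESTORE };
	return (toy_dev_snapshot(out, &meta));
}
```

Calling toy_roundtrip() from a small main() with a populated toy_dev and a zeroed one should leave both structures equal. Because the field order inside toy_dev_snapshot() is the only description of the layout, the sketch also illustrates why the log calls the current format "not self-describing": reordering or resizing a field silently changes what a saved file means.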