Re: [Xen-devel] [Qemu-devel] [PATCH v3 0/4] QEMU changes to do PVH boot
On 21/01/2019 02:31, no-re...@patchew.org wrote: Patchew URL: https://patchew.org/QEMU/1547554687-12687-1-git-send-email-liam.merw...@oracle.com/ ...> CC dma-helpers.o CC vl.o /tmp/qemu-test/src/block/sheepdog.c: In function 'find_vdi_name': /tmp/qemu-test/src/block/sheepdog.c:1239:5: error: 'strncpy' specified bound 256 equals destination size [-Werror=stringop-truncation] strncpy(buf + SD_MAX_VDI_LEN, tag, SD_MAX_VDI_TAG_LEN); ^~ cc1: all warnings being treated as errors Given the PVH patch series was posted 5 days ago and the following change was committed 3 days ago, I'm assuming this is not related to the PVH changes (which do not touch this file). commit 97b583f46c435aaa40942ca73739d79190776b7f Author: Philippe Mathieu-Daudé Date: Thu Jan 3 09:56:35 2019 +0100 block/sheepdog: Use QEMU_NONSTRING for non NUL-terminated arrays Regards, Liam The full log is available at http://patchew.org/logs/1547554687-12687-1-git-send-email-liam.merw...@oracle.com/testing.docker-mingw@fedora/?type=message. --- Email generated automatically by Patchew [http://patchew.org/]. Please send your feedback to patchew-de...@redhat.com ___ Xen-devel mailing list Xen-devel@lists.xenproject.org https://lists.xenproject.org/mailman/listinfo/xen-devel
[Xen-devel] [PATCH v3 5/5] pvh: load initrd and expose it through fw_cfg
From: Stefano Garzarella When initrd is specified, load and expose it to the guest firmware through fw_cfg. The firmware will fill the hvm_start_info for the kernel. Signed-off-by: Stefano Garzarella Based-on: <1545422632-2-5-git-send-email-liam.merw...@oracle.com> Signed-off-by: Liam Merwick --- hw/i386/pc.c | 38 +- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 6d549950a044..9ed5063de8f8 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -1213,25 +1213,45 @@ static void load_linux(PCMachineState *pcms, */ if (load_elfboot(kernel_filename, kernel_size, header, pvh_start_addr, fw_cfg)) { -struct hvm_modlist_entry ramdisk_mod = { 0 }; - fclose(f); fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1); fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline); -assert(machine->device_memory != NULL); -ramdisk_mod.paddr = machine->device_memory->base; -ramdisk_mod.size = -memory_region_size(&machine->device_memory->mr); - -fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod, - sizeof(ramdisk_mod)); fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header)); fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, header, sizeof(header)); +/* load initrd */ +if (initrd_filename) { +gsize initrd_size; +gchar *initrd_data; +GError *gerr = NULL; + +if (!g_file_get_contents(initrd_filename, &initrd_data, +&initrd_size, &gerr)) { +fprintf(stderr, "qemu: error reading initrd %s: %s\n", +initrd_filename, gerr->message); +exit(1); +} + +initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1; +if (initrd_size >= initrd_max) { +fprintf(stderr, "qemu: initrd is too large, cannot support." 
+"(max: %"PRIu32", need %"PRId64")\n", +initrd_max, (uint64_t)initrd_size); +exit(1); +} + +initrd_addr = (initrd_max - initrd_size) & ~4095; + +fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr); +fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size); +fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, + initrd_size); +} + return; } /* This looks like a multiboot kernel. If it is, let's stop -- 1.8.3.1
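The placement arithmetic in the hunk above (take the top of usable RAM below 4 GiB, subtract the initrd size, round down to a 4 KiB page) can be tried in isolation. What follows is only a minimal sketch: place_initrd() is a hypothetical helper that does not exist in QEMU, and the extra underflow guard is an addition of this sketch, not something the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_MASK_4K (~(uint64_t)4095)

/*
 * Hypothetical helper (not part of the patch): compute the guest physical
 * address at which an initrd of 'initrd_size' bytes would be placed, given
 * the amount of RAM below 4 GiB and the space reserved for ACPI data at
 * the top of that RAM. Returns false if the initrd does not fit.
 */
static bool place_initrd(uint64_t below_4g_mem_size, uint64_t acpi_data_size,
                         uint64_t initrd_size, uint32_t *initrd_addr)
{
    uint64_t initrd_max;

    assert(initrd_addr != NULL);

    /* Guard against underflow; this check is an assumption of the sketch. */
    if (below_4g_mem_size <= acpi_data_size) {
        return false;
    }

    /* Same expression as the patch: highest usable byte for the initrd. */
    initrd_max = below_4g_mem_size - acpi_data_size - 1;
    if (initrd_size >= initrd_max) {
        return false;
    }

    /* Place the image as high as possible, rounded down to a 4 KiB page. */
    *initrd_addr = (uint32_t)((initrd_max - initrd_size) & PAGE_MASK_4K);
    return true;
}
```

With 128 MiB of RAM below 4 GiB, an assumed 0x28000 bytes of ACPI data and a 1 MiB initrd, the helper yields 0x7ED7000, which is page aligned and leaves the whole image below initrd_max.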
[Xen-devel] [PATCH v3 3/5] pvh: Add x86/HVM direct boot ABI header file
From: Liam Merwick The x86/HVM direct boot ABI permits Qemu to be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement. https://xenbits.xen.org/docs/unstable/misc/pvh.html This commit adds the header file that defines the start_info struct that needs to be populated in order to use this ABI. The canonical version of start_info.h is in the Xen codebase. (like QEMU, the Linux kernel uses a copy as well). Signed-off-by: Liam Merwick Reviewed-by: Konrad Rzeszutek Wilk --- include/hw/xen/start_info.h | 146 1 file changed, 146 insertions(+) create mode 100644 include/hw/xen/start_info.h diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h new file mode 100644 index ..348779eb10cd --- /dev/null +++ b/include/hw/xen/start_info.h @@ -0,0 +1,146 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2016, Citrix Systems, Inc. 
+ */ + +#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ +#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ + +/* + * Start of day structure passed to PVH guests and to HVM guests in %ebx. + * + * NOTE: nothing will be loaded at physical address 0, so a 0 value in any + * of the address fields should be treated as not present. + * + * 0 ++ + *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE + *|| ("xEn3" with the 0x80 bit of the "E" set). + * 4 ++ + *| version| Version of this structure. Current version is 1. New + *|| versions are guaranteed to be backwards-compatible. + * 8 ++ + *| flags | SIF_xxx flags. + * 12 ++ + *| nr_modules | Number of modules passed to the kernel. + * 16 ++ + *| modlist_paddr | Physical address of an array of modules + *|| (layout of the structure below). + * 24 ++ + *| cmdline_paddr | Physical address of the command line, + *|| a zero-terminated ASCII string. + * 32 ++ + *| rsdp_paddr | Physical address of the RSDP ACPI data structure. + * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + *|| Zero if there is no memory map being provided. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ + * + * The layout of each entry in the module structure is the following: + * + * 0 ++ + *| paddr | Physical address of the module. + * 8 ++ + *| size | Size of the module in bytes. + * 16 ++ + *| cmdline_paddr | Physical address of the command line, + *|| a zero-terminated ASCII string. + * 24 ++ + *| reserved | + * 32 ++ + * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. E820_TYPE_xxx, for example. 
+ * 20 +| + *| reserved | + * 24 ++ + * + * The address and sizes are always a 64bit little endian unsigned integer. + * + * NB: Xen on x86 will always try to place all the data below the 4GiB + * boundary. + * + * Version numbers
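The three layout tables in the header comment above can be cross-checked mechanically. The following is a minimal sketch, assuming a C11 compiler: the struct and field names follow the labels used in the layout comment, while the compile-time offset checks and the pvh_magic_from_chars() helper are illustrations added here and are not part of the patch. The magic value is derived from the description ("xEn3" with the 0x80 bit of the "E" set), assuming little-endian byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* "xEn3" with bit 0x80 of the 'E' set, read as a little-endian 32-bit word. */
#define XEN_HVM_START_MAGIC_VALUE 0x336ec578u

struct hvm_start_info {
    uint32_t magic;          /* offset  0 */
    uint32_t version;        /* offset  4 */
    uint32_t flags;          /* offset  8 */
    uint32_t nr_modules;     /* offset 12 */
    uint64_t modlist_paddr;  /* offset 16 */
    uint64_t cmdline_paddr;  /* offset 24 */
    uint64_t rsdp_paddr;     /* offset 32 */
    uint64_t memmap_paddr;   /* offset 40, version 1 and newer */
    uint32_t memmap_entries; /* offset 48, version 1 and newer */
    uint32_t reserved;       /* offset 52, version 1 and newer */
};

struct hvm_modlist_entry {
    uint64_t paddr;          /* offset  0 */
    uint64_t size;           /* offset  8 */
    uint64_t cmdline_paddr;  /* offset 16 */
    uint64_t reserved;       /* offset 24 */
};

struct hvm_memmap_table_entry {
    uint64_t addr;           /* offset  0 */
    uint64_t size;           /* offset  8 */
    uint32_t type;           /* offset 16 */
    uint32_t reserved;       /* offset 20 */
};

/* Compile-time checks against the offsets listed in the comment. */
_Static_assert(offsetof(struct hvm_start_info, modlist_paddr) == 16, "modlist");
_Static_assert(offsetof(struct hvm_start_info, rsdp_paddr) == 32, "rsdp");
_Static_assert(offsetof(struct hvm_start_info, memmap_entries) == 48, "memmap");
_Static_assert(sizeof(struct hvm_start_info) == 56, "start_info size");
_Static_assert(sizeof(struct hvm_modlist_entry) == 32, "modlist entry size");
_Static_assert(sizeof(struct hvm_memmap_table_entry) == 24, "memmap entry size");

/* Illustrative helper: rebuild the magic from its four characters. */
static uint32_t pvh_magic_from_chars(void)
{
    return (uint32_t)'x' | ((uint32_t)('E' | 0x80) << 8) |
           ((uint32_t)'n' << 16) | ((uint32_t)'3' << 24);
}
```

If a field were reordered or a reserved member dropped, the static assertions would fail at build time rather than at boot.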
[Xen-devel] [PATCH v3 4/5] pvh: Boot uncompressed kernel using direct boot ABI
These changes (along with corresponding Linux kernel and qboot changes) enable a guest to be booted using the x86/HVM direct boot ABI. This commit adds a load_elfboot() routine to pass the size and location of the kernel entry point to qboot (which will fill in the start_info struct information needed to boot the guest). Having loaded the ELF binary, load_linux() will run qboot which continues the boot. The address for the kernel entry point is read from an ELF Note in the uncompressed kernel binary by a helper routine passed to load_elf(). Co-developed-by: George Kennedy Signed-off-by: George Kennedy Signed-off-by: Liam Merwick --- hw/i386/pc.c | 135 ++ include/elf.h | 10 + 2 files changed, 145 insertions(+) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 73d688f84239..6d549950a044 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -54,6 +54,7 @@ #include "sysemu/qtest.h" #include "kvm_i386.h" #include "hw/xen/xen.h" +#include "hw/xen/start_info.h" #include "ui/qemu-spice.h" #include "exec/memory.h" #include "exec/address-spaces.h" @@ -110,6 +111,9 @@ static struct e820_entry *e820_table; static unsigned e820_entries; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; +/* Physical Address of PVH entry point read from kernel ELF NOTE */ +static size_t pvh_start_addr; + GlobalProperty pc_compat_3_1[] = { { "intel-iommu", "dma-drain", "off" }, { "Opteron_G3" "-" TYPE_X86_CPU, "rdtscp", "off" }, @@ -1060,6 +1064,109 @@ struct setup_data { uint8_t data[0]; } __attribute__((packed)); + +/* + * The entry point into the kernel for PVH boot is different from + * the native entry point. The PVH entry is defined by the x86/HVM + * direct boot ABI and is available in an ELFNOTE in the kernel binary. + * + * This function is passed to load_elf() when it is called from + * load_elfboot() which then additionally checks for an ELF Note of + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to + * parse the PVH entry address from the ELF Note. 
+ * + * Due to trickery in elf_opts.h, load_elf() is actually available as + * load_elf32() or load_elf64() and this routine needs to be able + * to deal with being called as 32 or 64 bit. + * + * The address of the PVH entry point is saved to the 'pvh_start_addr' + * global variable. (although the entry point is 32-bit, the kernel + * binary can be either 32-bit or 64-bit). + */ +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64) +{ +size_t *elf_note_data_addr; + +/* Check if ELF Note header passed in is valid */ +if (arg1 == NULL) { +return 0; +} + +if (is64) { +struct elf64_note *nhdr64 = (struct elf64_note *)arg1; +uint64_t nhdr_size64 = sizeof(struct elf64_note); +uint64_t phdr_align = *(uint64_t *)arg2; +uint64_t nhdr_namesz = nhdr64->n_namesz; + +elf_note_data_addr = +((void *)nhdr64) + nhdr_size64 + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align); +} else { +struct elf32_note *nhdr32 = (struct elf32_note *)arg1; +uint32_t nhdr_size32 = sizeof(struct elf32_note); +uint32_t phdr_align = *(uint32_t *)arg2; +uint32_t nhdr_namesz = nhdr32->n_namesz; + +elf_note_data_addr = +((void *)nhdr32) + nhdr_size32 + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align); +} + +pvh_start_addr = *elf_note_data_addr; + +return pvh_start_addr; +} + +static bool load_elfboot(const char *kernel_filename, + int kernel_file_size, + uint8_t *header, + size_t pvh_xen_start_addr, + FWCfgState *fw_cfg) +{ +uint32_t flags = 0; +uint32_t mh_load_addr = 0; +uint32_t elf_kernel_size = 0; +uint64_t elf_entry; +uint64_t elf_low, elf_high; +int kernel_size; + +if (ldl_p(header) != 0x464c457f) { +return false; /* no elfboot */ +} + +bool elf_is64 = header[EI_CLASS] == ELFCLASS64; +flags = elf_is64 ? 
+((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags; + +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */ +error_report("elfboot unsupported flags = %x", flags); +exit(1); +} + +uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY; +kernel_size = load_elf(kernel_filename, read_pvh_start_addr, + NULL, &elf_note_type, &elf_entry, + &elf_low, &elf_high, 0, I386_ELF_MACHINE, + 0, 0); + +if (kernel_size < 0) { +error_report("Error while loading elf kernel"); +exit(1); +} +mh_load_addr = elf_low; +elf_kernel_size = elf_high - elf_low
[Xen-devel] [PATCH v3 1/5] elf: Add optional function ptr to load_elf() to parse ELF notes
This patch adds an optional function pointer, 'elf_note_fn', to load_elf() which causes load_elf() to additionally parse any ELF program headers of type PT_NOTE and check to see if the ELF Note is of the type specified by the 'translate_opaque' arg. If a matching ELF Note is found then the specified function pointer is called to process the ELF note. Passing a NULL function pointer results in ELF Notes being skipped. The first consumer of this functionality is the PVHboot support which needs to read the XEN_ELFNOTE_PHYS32_ENTRY ELF Note while loading the uncompressed kernel binary in order to discover the boot entry address for the x86/HVM direct boot ABI. Signed-off-by: Liam Merwick --- hw/alpha/dp264.c | 4 ++-- hw/arm/armv7m.c| 3 ++- hw/arm/boot.c | 2 +- hw/core/generic-loader.c | 2 +- hw/core/loader.c | 24 hw/cris/boot.c | 3 ++- hw/hppa/machine.c | 6 +++--- hw/i386/multiboot.c| 2 +- hw/lm32/lm32_boards.c | 6 -- hw/lm32/milkymist.c| 3 ++- hw/m68k/an5206.c | 2 +- hw/m68k/mcf5208.c | 2 +- hw/microblaze/boot.c | 7 --- hw/mips/mips_fulong2e.c| 5 +++-- hw/mips/mips_malta.c | 5 +++-- hw/mips/mips_mipssim.c | 5 +++-- hw/mips/mips_r4k.c | 5 +++-- hw/moxie/moxiesim.c| 2 +- hw/nios2/boot.c| 7 --- hw/openrisc/openrisc_sim.c | 2 +- hw/pci-host/prep.c | 2 +- hw/ppc/e500.c | 3 ++- hw/ppc/mac_newworld.c | 5 +++-- hw/ppc/mac_oldworld.c | 5 +++-- hw/ppc/ppc440_bamboo.c | 2 +- hw/ppc/sam460ex.c | 3 ++- hw/ppc/spapr.c | 7 --- hw/ppc/virtex_ml507.c | 2 +- hw/riscv/sifive_e.c| 2 +- hw/riscv/sifive_u.c| 2 +- hw/riscv/spike.c | 2 +- hw/riscv/virt.c| 2 +- hw/s390x/ipl.c | 9 ++--- hw/sparc/leon3.c | 3 ++- hw/sparc/sun4m.c | 6 -- hw/sparc64/sun4u.c | 4 ++-- hw/tricore/tricore_testboard.c | 2 +- hw/xtensa/sim.c| 12 hw/xtensa/xtfpga.c | 2 +- include/hw/elf_ops.h | 2 ++ include/hw/loader.h| 9 - 41 files changed, 113 insertions(+), 70 deletions(-) diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c index dd62f2a4050c..0347eb897c8a 100644 --- a/hw/alpha/dp264.c +++ b/hw/alpha/dp264.c @@ -114,7 
+114,7 @@ static void clipper_init(MachineState *machine) error_report("no palcode provided"); exit(1); } -size = load_elf(palcode_filename, cpu_alpha_superpage_to_phys, +size = load_elf(palcode_filename, NULL, cpu_alpha_superpage_to_phys, NULL, &palcode_entry, &palcode_low, &palcode_high, 0, EM_ALPHA, 0, 0); if (size < 0) { @@ -133,7 +133,7 @@ static void clipper_init(MachineState *machine) if (kernel_filename) { uint64_t param_offset; -size = load_elf(kernel_filename, cpu_alpha_superpage_to_phys, +size = load_elf(kernel_filename, NULL, cpu_alpha_superpage_to_phys, NULL, &kernel_entry, &kernel_low, &kernel_high, 0, EM_ALPHA, 0, 0); if (size < 0) { diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c index f4446528307f..ae68aadef965 100644 --- a/hw/arm/armv7m.c +++ b/hw/arm/armv7m.c @@ -293,7 +293,8 @@ void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size) as = cpu_get_address_space(cs, asidx); if (kernel_filename) { -image_size = load_elf_as(kernel_filename, NULL, NULL, &entry, &lowaddr, +image_size = load_elf_as(kernel_filename, NULL, NULL, NULL, + &entry, &lowaddr, NULL, big_endian, EM_ARM, 1, 0, as); if (image_size < 0) { image_size = load_image_targphys_as(kernel_filename, 0, diff --git a/hw/arm/boot.c b/hw/arm/boot.c index c7a67af7a97c..9d8746f7613f 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -885,7 +885,7 @@ static int64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry, } } -ret = load_elf_as(info->kernel_filename, NULL, NULL, +ret = load_elf_as(info->kernel_filename, NULL, NULL, NULL, pentry, lowaddr, highaddr, big_endian, elf_machine, 1, data_swab, as); if (ret <= 0) { diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c index fbae05fb3b64..3695dd439cd0 100644 --- a/hw/core/generic-loader.c +++ b/hw/core/generic-loader.c @@ -136,7 +136,7 @@ static void generic_loader_realize(DeviceState *dev, Error **err
[Xen-devel] [PATCH v3 0/4] QEMU changes to do PVH boot
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, QEMU should be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement.

There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD:

   https://xenbits.xen.org/docs/unstable/misc/pvh.html

Details on the Linux changes (in 4.21): https://lkml.org/lkml/2018/12/14/1330

qboot pull request integrated: https://github.com/bonzini/qboot/pull/17

This patch series provides QEMU support to read the ELF header of an uncompressed kernel binary and get the 32-bit PVH kernel entry point from an ELF Note.

In load_linux() a call is made to load_elfboot() to see if the header matches that of an uncompressed kernel binary (ELF) and if so, loads the binary and determines the kernel entry address from an ELF Note in the binary. Then qboot does further initialisation of the guest (e820, etc.) and jumps to the kernel entry address and boots the guest.

changes v1 -> v2
- Based on feedback from Stefan Hajnoczi
- The reading of the PVH entry point is now done in a single pass during elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2 and considerably reworked.
- Patch1 adds a new optional function pointer to parse the ELF note type (the type is passed in via the existing translate_opaque arg - the function already had 11 args so I didn't want to add more than one new arg).
- Patch2 adds a function to elf_ops.h to find an ELF note matching a specific type
- Patch3 just has a line added to the commit message to state that the Xen repo is the canonical location
- Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1 just minor load_elfboot() changes and the addition of a read_pvh_start_addr() helper function for load_elf()

changes v2 -> v3
- Based on feedback from Stefan Hajnoczi
- Fix formatting issues where a few tabs snuck in v2
- Moved code to use ELF Note in load_elf() from Patch1 to Patch2
- In load_elf() set data to NULL after g_free() [now in Patch2 following move]
- Added Patch5 containing changes by Stefano Garzarella to support -initrd

Using the method/scripts documented by the NEMU team at

   https://github.com/intel/nemu/wiki/Measuring-Boot-Latency
   https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html

below are some timings measured (vmlinux and bzImage from the same build)

Time to get to kernel start is almost halved (95ms -> 48ms)

QEMU + qboot + vmlinux (PVH + 4.20-rc4)
 qemu_init_end: 41.550521
 fw_start: 41.667139 (+0.116618)
 fw_do_boot: 47.448495 (+5.781356)
 linux_startup_64: 47.720785 (+0.27229)
 linux_start_kernel: 48.399541 (+0.678756)
 linux_start_user: 296.952056 (+248.552515)

QEMU + qboot + bzImage:
 qemu_init_end: 29.209276
 fw_start: 29.317342 (+0.108066)
 linux_start_boot: 36.679362 (+7.36202)
 linux_startup_64: 94.531349 (+57.851987)
 linux_start_kernel: 94.900913 (+0.369564)
 linux_start_user: 401.060971 (+306.160058)

QEMU + bzImage:
 qemu_init_end: 30.424430
 linux_startup_64: 893.770334 (+863.345904)
 linux_start_kernel: 894.17049 (+0.400156)
 linux_start_user: 1208.679768 (+314.509278)

Liam Merwick (4):
  elf: Add optional function ptr to load_elf() to parse ELF notes
  elf-ops.h: Add get_elf_note_type()
  pvh: Add x86/HVM direct boot ABI header file
  pvh: Boot uncompressed kernel using direct boot ABI

 hw/alpha/dp264.c           |   4 +-
 hw/arm/armv7m.c            |   3 +-
 hw/arm/boot.c              |   2 +-
 hw/core/generic-loader.c   |   2 +-
 hw/core/loader.c           |  24 ---
 hw/cris/boot.c             |   3 +-
 hw/hppa/machine.c          |   6 +-
 hw/i386/multiboot.c        |   2 +-
 hw/i386/pc.c               | 135 +
 hw/lm32/lm32_boards.c      |   6 +-
 hw/lm32/milkymist.c        |   3 +-
 hw/m68k/an5206.c           |   2 +-
 hw/m68k/mcf5208.c          |   2 +-
 hw/microblaze/boot.c       |   7 +-
 hw/mips/mips_fulong2e.c    |   5 +-
 hw/mips/mips_malta.c       |   5 +-
 hw/mips/mips_mipssim.c     |   5 +-
 hw/mips/mips_r4k.c         |   5 +-
 hw/moxie/moxiesim.c        |   2 +-
 hw/nios2/boot.c            |   7 +-
 hw/openrisc/openrisc_sim.c |   2 +-
 hw/pci-host/prep.c         |   2 +-
 hw/ppc/e500.c              |   3 +-
 hw/ppc/mac_newworld.c      |   5 +-
 hw/ppc/mac_oldworld.c      |   5 +-
 hw/ppc/ppc440_bamboo.c     |   2 +-
 hw/ppc/sam460ex.c          |   3 +-
 hw/ppc/spapr.c             |   7 +-
 hw/ppc/virtex_ml507.c      |   2 +-
 hw/riscv/sifive_e.c        |   2 +-
 hw/riscv/sifive_u.c        |   2 +-
 hw/riscv/spike.c           |   2 +-
 hw/riscv/virt.c            |   2 +-
 hw/s390x/ipl.c             |   9 ++-
 hw/sparc/leon3.c
[Xen-devel] [PATCH v3 2/5] elf-ops.h: Add get_elf_note_type()
Introduce a routine which, given a pointer to a range of ELF Notes, searches through them looking for a note matching the type specified and returns a pointer to the matching ELF note. get_elf_note_type() is used by elf_load[32|64]() to find the specified note type required by the 'elf_note_fn' parameter added in the previous commit. Signed-off-by: Liam Merwick --- include/hw/elf_ops.h | 75 1 file changed, 75 insertions(+) diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h index 3438d6f69e8d..690f9238c8cc 100644 --- a/include/hw/elf_ops.h +++ b/include/hw/elf_ops.h @@ -265,6 +265,51 @@ fail: return ret; } +/* + * Given 'nhdr', a pointer to a range of ELF Notes, search through them + * for a note matching type 'elf_note_type' and return a pointer to + * the matching ELF note. + */ +static struct elf_note *glue(get_elf_note_type, SZ)(struct elf_note *nhdr, +elf_word note_size, +elf_word phdr_align, +elf_word elf_note_type) +{ +elf_word nhdr_size = sizeof(struct elf_note); +elf_word elf_note_entry_offset = 0; +elf_word note_type; +elf_word nhdr_namesz; +elf_word nhdr_descsz; + +if (nhdr == NULL) { +return NULL; +} + +note_type = nhdr->n_type; +while (note_type != elf_note_type) { +nhdr_namesz = nhdr->n_namesz; +nhdr_descsz = nhdr->n_descsz; + +elf_note_entry_offset = nhdr_size + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align) + +QEMU_ALIGN_UP(nhdr_descsz, phdr_align); + +/* + * If the offset calculated in this iteration exceeds the + * supplied size, we are done and no matching note was found. 
+ */ +if (elf_note_entry_offset > note_size) { +return NULL; +} + +/* skip to the next ELF Note entry */ +nhdr = (void *)nhdr + elf_note_entry_offset; +note_type = nhdr->n_type; +} + +return nhdr; +} + static int glue(load_elf, SZ)(const char *name, int fd, uint64_t (*elf_note_fn)(void *, void *, bool), uint64_t (*translate_fn)(void *, uint64_t), @@ -497,6 +542,36 @@ static int glue(load_elf, SZ)(const char *name, int fd, high = addr + mem_size; data = NULL; + +} else if (ph->p_type == PT_NOTE && elf_note_fn) { +struct elf_note *nhdr = NULL; + +file_size = ph->p_filesz; /* Size of the range of ELF notes */ +data = g_malloc0(file_size); +if (ph->p_filesz > 0) { +if (lseek(fd, ph->p_offset, SEEK_SET) < 0) { +goto fail; +} +if (read(fd, data, file_size) != file_size) { +goto fail; +} +} + +/* + * Search the ELF notes to find one with a type matching the + * value passed in via 'translate_opaque' + */ +nhdr = (struct elf_note *)data; +assert(translate_opaque != NULL); +nhdr = glue(get_elf_note_type, SZ)(nhdr, file_size, ph->p_align, + *(uint64_t *)translate_opaque); +if (nhdr != NULL) { +bool is64 = +sizeof(struct elf_note) == sizeof(struct elf64_note); +elf_note_fn((void *)nhdr, (void *)&ph->p_align, is64); +} +g_free(data); +data = NULL; } } -- 1.8.3.1
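The note search described in this patch (step over header, aligned name and aligned descriptor until the wanted type is found) can be exercised outside QEMU. What follows is a minimal, standalone sketch under stated assumptions and is not the code from the patch: find_note_desc(), struct note_hdr and ALIGN_UP are hypothetical names, the sketch tracks a cumulative offset and checks bounds before reading each header, and it returns a pointer to the descriptor rather than to the note header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Three 32-bit words, the same shape as an ELF note header. */
struct note_hdr {
    uint32_t n_namesz;  /* length of the name, including its NUL */
    uint32_t n_descsz;  /* length of the descriptor */
    uint32_t n_type;    /* note type, e.g. 18 for XEN_ELFNOTE_PHYS32_ENTRY */
};

#define ALIGN_UP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/*
 * Hypothetical helper: walk the notes in buf[0..len) and return a pointer
 * to the descriptor of the first note whose type equals 'want_type', or
 * NULL if there is none or the data is truncated.
 */
static const void *find_note_desc(const uint8_t *buf, size_t len,
                                  uint32_t align, uint32_t want_type,
                                  uint32_t *descsz)
{
    size_t off = 0;

    if (align == 0) {
        align = 4;  /* assumption: treat a missing alignment as 4 bytes */
    }

    while (len - off >= sizeof(struct note_hdr)) {
        struct note_hdr nh;
        size_t desc_off, next;

        memcpy(&nh, buf + off, sizeof(nh));
        desc_off = off + sizeof(nh) + ALIGN_UP((size_t)nh.n_namesz, align);
        next = desc_off + ALIGN_UP((size_t)nh.n_descsz, align);

        /* Reject notes whose descriptor would run past the buffer. */
        if (desc_off > len || nh.n_descsz > len - desc_off) {
            return NULL;
        }
        if (nh.n_type == want_type) {
            *descsz = nh.n_descsz;
            return buf + desc_off;
        }
        if (next > len) {
            return NULL;
        }
        off = next;  /* skip to the next note */
    }
    return NULL;
}
```

Returning the descriptor directly means a caller interested in the PVH entry point could copy the first 4 bytes out of it without redoing the name-size arithmetic; whether that split of responsibilities would suit load_elf() is a design question this sketch does not try to settle.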
Re: [Xen-devel] [RFC v2 0/4] QEMU changes to do PVH boot
Hi Stefano, On 10/01/2019 15:12, Stefano Garzarella wrote: On Wed, Jan 09, 2019 at 01:18:12PM -0800, Maran Wilson wrote: On 1/9/2019 11:53 AM, Boris Ostrovsky wrote: On 1/9/19 6:53 AM, Stefano Garzarella wrote: Hi Liam, On Tue, Jan 8, 2019 at 3:47 PM Liam Merwick wrote: QEMU sets the hvm_modlist_entry in load_linux() after the call to load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg() But the current PVH patches don't handle initrd (they have start_info.nr_modules == 1). Looking in the linux kernel (arch/x86/platform/pvh/enlighten.c) I saw: /* The first module is always ramdisk. */ if (pvh_start_info.nr_modules) { struct hvm_modlist_entry *modaddr = __va(pvh_start_info.modlist_paddr); pvh_bootparams.hdr.ramdisk_image = modaddr->paddr; pvh_bootparams.hdr.ramdisk_size = modaddr->size; } So, putting start_info.nr_modules = 1, means that the first hvm_modlist_entry should have the ramdisk paddr and size. Is it correct? That's my understanding. I think what's missing, is that we just need Qemu or qboot/seabios to properly populate the pvh_start_info.modlist_paddr with the address (as usable by the guest) of the hvm_modlist_entry which correctly defines the details of the initrd that has already been loaded into memory that is accessible by the guest. -Maran I tried and it works, I modified QEMU to load the initrd and to expose it through fw_cfg, then qboot loads it and set correctly the hvm_modlist_entry. You can find the patch of QEMU at the end of this email and the qboot patch here: https://github.com/stefano-garzarella/qboot/commit/41e1fd765c8419e270fd79d9b3af5d53576e88a8 Do you think can be a good approach? If you want, you can add this patch to your series. Code looks good to me. I'll include it with v3 of my QEMU patches. 
Regards, Liam Thanks, Stefano From d5c0d51768f5a8fb214be6c2bb0cb7e86e9917b7 Mon Sep 17 00:00:00 2001 From: Stefano Garzarella Date: Thu, 10 Jan 2019 15:16:44 +0100 Subject: [PATCH] pvh: load initrd and expose it through fw_cfg When initrd is specified, load and expose it to the guest firmware through fw_cfg. The firmware will fill the hvm_start_info for the kernel. Signed-off-by: Stefano Garzarella Based-on: <1545422632-2-5-git-send-email-liam.merw...@oracle.com> --- hw/i386/pc.c | 38 +- 1 file changed, 29 insertions(+), 9 deletions(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 06bce6a101..f6721f51be 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -986,25 +986,45 @@ static void load_linux(PCMachineState *pcms, */ if (load_elfboot(kernel_filename, kernel_size, header, pvh_start_addr, fw_cfg)) { -struct hvm_modlist_entry ramdisk_mod = { 0 }; - fclose(f); fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1); fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline); -assert(machine->device_memory != NULL); -ramdisk_mod.paddr = machine->device_memory->base; -ramdisk_mod.size = -memory_region_size(&machine->device_memory->mr); - -fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod, - sizeof(ramdisk_mod)); fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header)); fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA, header, sizeof(header)); +/* load initrd */ +if (initrd_filename) { +gsize initrd_size; +gchar *initrd_data; +GError *gerr = NULL; + +if (!g_file_get_contents(initrd_filename, &initrd_data, +&initrd_size, &gerr)) { +fprintf(stderr, "qemu: error reading initrd %s: %s\n", +initrd_filename, gerr->message); +exit(1); +} + +initrd_max = pcms->below_4g_mem_size - pcmc->acpi_data_size - 1; +if (initrd_size >= initrd_max) { +fprintf(stderr, "qemu: initrd is too large, cannot support." 
+"(max: %"PRIu32", need %"PRId64")\n", +initrd_max, (uint64_t)initrd_size); +exit(1); +} + +initrd_addr = (initrd_max - initrd_size) & ~4095; + +fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, initrd_addr); +fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, initrd_size); +fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, initrd_data, + initrd_size); +} + return;
Re: [Xen-devel] [RFC v2 4/4] pvh: Boot uncompressed kernel using direct boot ABI
On 02/01/2019 13:18, Stefan Hajnoczi wrote: On Fri, Dec 21, 2018 at 08:03:52PM +, Liam Merwick wrote: @@ -1336,7 +1470,7 @@ void pc_memory_init(PCMachineState *pcms, int linux_boot, i; MemoryRegion *ram, *option_rom_mr; MemoryRegion *ram_below_4g, *ram_above_4g; -FWCfgState *fw_cfg; +FWCfgState *fw_cfg = NULL; What is the purpose of this change? I've removed this. There is no need for it - it dated from when these changes used the Clear Containers -nofw patches. Regards, Liam
Re: [Xen-devel] [RFC v2 2/4] elf-ops.h: Add get_elf_note_type()
On 02/01/2019 13:12, Stefan Hajnoczi wrote: On Fri, Dec 21, 2018 at 08:03:50PM +, Liam Merwick wrote: +while (note_type != elf_note_type) { +nhdr_namesz = nhdr->n_namesz; +nhdr_descsz = nhdr->n_descsz; + +elf_note_entry_offset = nhdr_size + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align) + +QEMU_ALIGN_UP(nhdr_descsz, phdr_align); + +/* If the offset calculated in this iteration exceeds the +* supplied size, we are done and no matching note was found. +*/ Indentation is off here. QEMU uses 4-space indentation. +if (elf_note_entry_offset > note_size) { +return NULL; +} + +/* skip to the next ELF Note entry */ +nhdr = (void *)nhdr + elf_note_entry_offset; +note_type = nhdr->n_type; +} + +return nhdr; +} + static int glue(load_elf, SZ)(const char *name, int fd, uint64_t (*elf_note_fn)(void *, void *, bool), uint64_t (*translate_fn)(void *, uint64_t), @@ -512,6 +555,13 @@ static int glue(load_elf, SZ)(const char *name, int fd, } } + /* Search the ELF notes to find one with a type matching the +* value passed in via 'translate_opaque' +*/ +nhdr = (struct elf_note *)data; Ah, I see data gets used here! It would be clearer to move loading of data into this patch. Moved. + assert(translate_opaque != NULL); +nhdr = glue(get_elf_note_type, SZ)(nhdr, file_size, ph->p_align, + *(uint64_t *)translate_opaque); Indentation is off in this hunk. QEMU uses 4-space indentation. A few stray tabs had snuck in - I've fixed all those. Regards, Liam
Re: [Xen-devel] [RFC v2 1/4] elf: Add optional function ptr to load_elf() to parse ELF notes
On 02/01/2019 13:06, Stefan Hajnoczi wrote: On Fri, Dec 21, 2018 at 08:03:49PM +, Liam Merwick wrote: diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h index 74679ff8da3a..37d20a3800c1 100644 --- a/include/hw/elf_ops.h +++ b/include/hw/elf_ops.h @@ -266,6 +266,7 @@ fail: } static int glue(load_elf, SZ)(const char *name, int fd, + uint64_t (*elf_note_fn)(void *, void *, bool), uint64_t (*translate_fn)(void *, uint64_t), void *translate_opaque, int must_swab, uint64_t *pentry, @@ -496,8 +497,30 @@ static int glue(load_elf, SZ)(const char *name, int fd, high = addr + mem_size; data = NULL; + +} else if (ph->p_type == PT_NOTE && elf_note_fn) { +struct elf_note *nhdr = NULL; + +file_size = ph->p_filesz; /* Size of the range of ELF notes */ +data = g_malloc0(file_size); +if (ph->p_filesz > 0) { +if (lseek(fd, ph->p_offset, SEEK_SET) < 0) { +goto fail; +} +if (read(fd, data, file_size) != file_size) { +goto fail; +} +} + +if (nhdr != NULL) { +bool is64 = +sizeof(struct elf_note) == sizeof(struct elf64_note); +elf_note_fn((void *)nhdr, (void *)&ph->p_align, is64); How does data get used? Moved (as suggested in comments for next patch) +} +g_free(data); Missing data = NULL to prevent double free later? Added explicit assignment. Regards, Liam
Re: [Xen-devel] [RFC v2 0/4] QEMU changes to do PVH boot
Hi Stefano, [ Catching up on mail after vacation ] On 03/01/2019 17:22, Stefano Garzarella wrote: Hi Liam, Hi Maran, I'm writing the optionrom to do PVH boot also with SeaBIOS. It is almost complete and I'm testing it, but I have some issue with QEMU -initrd parameter. (It works correctly without -initrd and using a kernel with all needed modules compiled statically) Linux boots correctly, but it is not able to find the ramdisk. (I have the same behavior with qboot) Looking at Linux, QEMU, and qboot patches, I understood that the first module pointed by 'modlist_paddr' in the 'hvm_start_info' should be used to pass the ramdisk address and size to the kernel, but I didn't understand who load it in RAM. (I guess QEMU directly or the firmware by fw_cfg interface) Can you give me some suggestions? QEMU sets the hvm_modlist_entry in load_linux() after the call to load_elfboot() and then qboot loads it in boot_pvh_from_fw_cfg() But the current PVH patches don't handle initrd (they have start_info.nr_modules == 1). During (or after) the call to load_elfboot() it looks like we'd need to do something like what load_multiboot() does below (along with the associated initialisation) fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_ADDR, ADDR_MBI); fw_cfg_add_i32(fw_cfg, FW_CFG_INITRD_SIZE, sizeof(bootinfo)); fw_cfg_add_bytes(fw_cfg, FW_CFG_INITRD_DATA, mb_bootinfo_data, sizeof(bootinfo)); I'm checking to see if that has any implications for the kernel side. Regards, Liam On Fri, Dec 21, 2018 at 9:07 PM Liam Merwick wrote: For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, QEMU should be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement. 
[Xen-devel] [RFC v2 3/4] pvh: Add x86/HVM direct boot ABI header file
From: Liam Merwick The x86/HVM direct boot ABI permits Qemu to be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement. https://xenbits.xen.org/docs/unstable/misc/pvh.html This commit adds the header file that defines the start_info struct that needs to be populated in order to use this ABI. The canonical version of start_info.h is in the Xen codebase. (like QEMU, the Linux kernel uses a copy as well). Signed-off-by: Liam Merwick Reviewed-by: Konrad Rzeszutek Wilk --- include/hw/xen/start_info.h | 146 1 file changed, 146 insertions(+) create mode 100644 include/hw/xen/start_info.h diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h new file mode 100644 index ..348779eb10cd --- /dev/null +++ b/include/hw/xen/start_info.h @@ -0,0 +1,146 @@ +/* + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the "Software"), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2016, Citrix Systems, Inc. 
+ */ + +#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ +#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__ + +/* + * Start of day structure passed to PVH guests and to HVM guests in %ebx. + * + * NOTE: nothing will be loaded at physical address 0, so a 0 value in any + * of the address fields should be treated as not present. + * + * 0 ++ + *| magic | Contains the magic value XEN_HVM_START_MAGIC_VALUE + *|| ("xEn3" with the 0x80 bit of the "E" set). + * 4 ++ + *| version| Version of this structure. Current version is 1. New + *|| versions are guaranteed to be backwards-compatible. + * 8 ++ + *| flags | SIF_xxx flags. + * 12 ++ + *| nr_modules | Number of modules passed to the kernel. + * 16 ++ + *| modlist_paddr | Physical address of an array of modules + *|| (layout of the structure below). + * 24 ++ + *| cmdline_paddr | Physical address of the command line, + *|| a zero-terminated ASCII string. + * 32 ++ + *| rsdp_paddr | Physical address of the RSDP ACPI data structure. + * 40 ++ + *| memmap_paddr | Physical address of the (optional) memory map. Only + *|| present in version 1 and newer of the structure. + * 48 ++ + *| memmap_entries | Number of entries in the memory map table. Only + *|| present in version 1 and newer of the structure. + *|| Zero if there is no memory map being provided. + * 52 ++ + *| reserved | Version 1 and newer only. + * 56 ++ + * + * The layout of each entry in the module structure is the following: + * + * 0 ++ + *| paddr | Physical address of the module. + * 8 ++ + *| size | Size of the module in bytes. + * 16 ++ + *| cmdline_paddr | Physical address of the command line, + *|| a zero-terminated ASCII string. + * 24 ++ + *| reserved | + * 32 ++ + * + * The layout of each entry in the memory map table is as follows: + * + * 0 ++ + *| addr | Base address + * 8 ++ + *| size | Size of mapping in bytes + * 16 ++ + *| type | Type of mapping as defined between the hypervisor + *|| and guest it's starting. E820_TYPE_xxx, for example. 
+ * 20 +| + *| reserved | + * 24 ++ + * + * The address and sizes are always a 64bit little endian unsigned integer. + * + * NB: Xen on x86 will always try to place all the data below the 4GiB + * boundary. + * + * Version numbers
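As a reading aid for the layout comment above, the documented offsets can be written down as C structs and checked at compile time. This is only a sketch: the field names follow the diagram, but the sketch_ prefixed type names and the helper sketch_has_memmap() are made up for illustration and are not the contents of start_info.h.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the offsets in the layout diagram; not the real header. */
struct sketch_hvm_start_info {
    uint32_t magic;           /* offset 0  */
    uint32_t version;         /* offset 4  */
    uint32_t flags;           /* offset 8  */
    uint32_t nr_modules;      /* offset 12 */
    uint64_t modlist_paddr;   /* offset 16 */
    uint64_t cmdline_paddr;   /* offset 24 */
    uint64_t rsdp_paddr;      /* offset 32 */
    uint64_t memmap_paddr;    /* offset 40, version 1 and newer */
    uint32_t memmap_entries;  /* offset 48, version 1 and newer */
    uint32_t reserved;        /* offset 52, version 1 and newer */
};

struct sketch_hvm_modlist_entry {
    uint64_t paddr;           /* offset 0  */
    uint64_t size;            /* offset 8  */
    uint64_t cmdline_paddr;   /* offset 16 */
    uint64_t reserved;        /* offset 24 */
};

struct sketch_hvm_memmap_table_entry {
    uint64_t addr;            /* offset 0  */
    uint64_t size;            /* offset 8  */
    uint32_t type;            /* offset 16 */
    uint32_t reserved;        /* offset 20 */
};

/* Compile-time checks against the numbers in the diagram. */
static_assert(offsetof(struct sketch_hvm_start_info, modlist_paddr) == 16,
              "modlist_paddr offset");
static_assert(offsetof(struct sketch_hvm_start_info, memmap_entries) == 48,
              "memmap_entries offset");
static_assert(sizeof(struct sketch_hvm_start_info) == 56, "start_info size");
static_assert(sizeof(struct sketch_hvm_modlist_entry) == 32, "modlist size");
static_assert(sizeof(struct sketch_hvm_memmap_table_entry) == 24,
              "memmap entry size");

/*
 * The memory map fields only exist from version 1 on, and a zero address
 * or zero entry count means "not present".
 */
static bool sketch_has_memmap(const struct sketch_hvm_start_info *si)
{
    return si->version >= 1 && si->memmap_paddr != 0 &&
           si->memmap_entries != 0;
}
```

A consumer would call sketch_has_memmap() before dereferencing memmap_paddr, since the comment above states that nothing is loaded at physical address 0.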
[Xen-devel] [RFC v2 4/4] pvh: Boot uncompressed kernel using direct boot ABI
These changes (along with corresponding Linux kernel and qboot changes) enable a guest to be booted using the x86/HVM direct boot ABI. This commit adds a load_elfboot() routine to pass the size and location of the kernel entry point to qboot (which will fill in the start_info struct information needed to boot the guest). Having loaded the ELF binary, load_linux() will run qboot which continues the boot. The address for the kernel entry point is read from an ELF Note in the uncompressed kernel binary by a helper routine passed to load_elf(). Co-developed-by: George Kennedy Signed-off-by: George Kennedy Signed-off-by: Liam Merwick --- hw/i386/pc.c | 136 +- include/elf.h | 10 + 2 files changed, 145 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index 115bc2825ce4..6d44a14da44d 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -54,6 +54,7 @@ #include "sysemu/qtest.h" #include "kvm_i386.h" #include "hw/xen/xen.h" +#include "hw/xen/start_info.h" #include "ui/qemu-spice.h" #include "exec/memory.h" #include "exec/address-spaces.h" @@ -109,6 +110,9 @@ static struct e820_entry *e820_table; static unsigned e820_entries; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; +/* Physical Address of PVH entry point read from kernel ELF NOTE */ +static size_t pvh_start_addr; + void gsi_handler(void *opaque, int n, int level) { GSIState *s = opaque; @@ -834,6 +838,109 @@ struct setup_data { uint8_t data[0]; } __attribute__((packed)); + +/* + * The entry point into the kernel for PVH boot is different from + * the native entry point. The PVH entry is defined by the x86/HVM + * direct boot ABI and is available in an ELFNOTE in the kernel binary. + * + * This function is passed to load_elf() when it is called from + * load_elfboot() which then additionally checks for an ELF Note of + * type XEN_ELFNOTE_PHYS32_ENTRY and passes it to this function to + * parse the PVH entry address from the ELF Note. 
+ * + * Due to trickery in elf_ops.h, load_elf() is actually available as + * load_elf32() or load_elf64() and this routine needs to be able + * to deal with being called as 32 or 64 bit. + * + * The address of the PVH entry point is saved to the 'pvh_start_addr' + * global variable. (although the entry point is 32-bit, the kernel + * binary can be either 32-bit or 64-bit). + */ +static uint64_t read_pvh_start_addr(void *arg1, void *arg2, bool is64) +{ +size_t *elf_note_data_addr; + +/* Check if ELF Note header passed in is valid */ +if (arg1 == NULL) { +return 0; +} + +if (is64) { +struct elf64_note *nhdr64 = (struct elf64_note *)arg1; +uint64_t nhdr_size64 = sizeof(struct elf64_note); +uint64_t phdr_align = *(uint64_t *)arg2; +uint64_t nhdr_namesz = nhdr64->n_namesz; + +elf_note_data_addr = +((void *)nhdr64) + nhdr_size64 + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align); +} else { +struct elf32_note *nhdr32 = (struct elf32_note *)arg1; +uint32_t nhdr_size32 = sizeof(struct elf32_note); +uint32_t phdr_align = *(uint32_t *)arg2; +uint32_t nhdr_namesz = nhdr32->n_namesz; + +elf_note_data_addr = +((void *)nhdr32) + nhdr_size32 + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align); +} + +pvh_start_addr = *elf_note_data_addr; + +return pvh_start_addr; +} + +static bool load_elfboot(const char *kernel_filename, + int kernel_file_size, + uint8_t *header, + size_t pvh_xen_start_addr, + FWCfgState *fw_cfg) +{ +uint32_t flags = 0; +uint32_t mh_load_addr = 0; +uint32_t elf_kernel_size = 0; +uint64_t elf_entry; +uint64_t elf_low, elf_high; +int kernel_size; + +if (ldl_p(header) != 0x464c457f) { +return false; /* no elfboot */ +} + +bool elf_is64 = header[EI_CLASS] == ELFCLASS64; +flags = elf_is64 ? 
+((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags; + +if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */ +error_report("elfboot unsupported flags = %x", flags); +exit(1); +} + +uint64_t elf_note_type = XEN_ELFNOTE_PHYS32_ENTRY; +kernel_size = load_elf(kernel_filename, read_pvh_start_addr, + NULL, &elf_note_type, &elf_entry, + &elf_low, &elf_high, 0, I386_ELF_MACHINE, + 0, 0); + +if (kernel_size < 0) { +error_report("Error while loading elf kernel"); +exit(1); +} +mh_load_addr = elf_low; +elf_kernel_size = elf_high - elf_low; + +if (pvh_start_addr == 0) { +error_report("Error loading uncompressed kernel without PVH ELF Note"); +
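The header check at the top of load_elfboot() compares the first four bytes against 0x464c457f (the bytes 0x7f 'E' 'L' 'F' read as a little-endian word) and then looks at the EI_CLASS byte. A byte-wise version of the same idea is sketched below; the enum and the function name sketch_elf_classify() are hypothetical and only meant to illustrate the check, not to replace the patch code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum sketch_elf_class {
    SKETCH_NOT_ELF,
    SKETCH_ELF32,
    SKETCH_ELF64,
};

/*
 * Classify a buffer by its ELF identification bytes.
 * Byte 4 is EI_CLASS: 1 means ELFCLASS32, 2 means ELFCLASS64.
 */
static enum sketch_elf_class sketch_elf_classify(const uint8_t *header,
                                                 size_t len)
{
    static const uint8_t magic[4] = { 0x7f, 'E', 'L', 'F' };

    /* Need the 4 magic bytes plus the class byte. */
    if (len < 5 || memcmp(header, magic, sizeof(magic)) != 0) {
        return SKETCH_NOT_ELF;
    }

    switch (header[4]) {
    case 1:
        return SKETCH_ELF32;
    case 2:
        return SKETCH_ELF64;
    default:
        return SKETCH_NOT_ELF;
    }
}
```

Comparing bytes rather than a loaded 32-bit word avoids any dependence on host endianness, which ldl_p() handles separately in the patch.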
[Xen-devel] [RFC v2 2/4] elf-ops.h: Add get_elf_note_type()
Introduce a routine which, given a pointer to a range of ELF Notes, searches through them looking for a note matching the type specified and returns a pointer to the matching ELF note. Signed-off-by: Liam Merwick --- include/hw/elf_ops.h | 50 ++ 1 file changed, 50 insertions(+) diff --git a/include/hw/elf_ops.h b/include/hw/elf_ops.h index 37d20a3800c1..ffbdfbe9c2d8 100644 --- a/include/hw/elf_ops.h +++ b/include/hw/elf_ops.h @@ -265,6 +265,49 @@ fail: return ret; } +/* Given 'nhdr', a pointer to a range of ELF Notes, search through them + * for a note matching type 'elf_note_type' and return a pointer to + * the matching ELF note. + */ +static struct elf_note *glue(get_elf_note_type, SZ)(struct elf_note *nhdr, +elf_word note_size, +elf_word phdr_align, +elf_word elf_note_type) +{ +elf_word nhdr_size = sizeof(struct elf_note); +elf_word elf_note_entry_offset = 0; +elf_word note_type; +elf_word nhdr_namesz; +elf_word nhdr_descsz; + +if (nhdr == NULL) { +return NULL; +} + +note_type = nhdr->n_type; +while (note_type != elf_note_type) { +nhdr_namesz = nhdr->n_namesz; +nhdr_descsz = nhdr->n_descsz; + +elf_note_entry_offset = nhdr_size + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align) + +QEMU_ALIGN_UP(nhdr_descsz, phdr_align); + +/* If the offset calculated in this iteration exceeds the +* supplied size, we are done and no matching note was found. 
+*/ +if (elf_note_entry_offset > note_size) { +return NULL; +} + +/* skip to the next ELF Note entry */ +nhdr = (void *)nhdr + elf_note_entry_offset; +note_type = nhdr->n_type; +} + +return nhdr; +} + static int glue(load_elf, SZ)(const char *name, int fd, uint64_t (*elf_note_fn)(void *, void *, bool), uint64_t (*translate_fn)(void *, uint64_t), @@ -512,6 +555,13 @@ static int glue(load_elf, SZ)(const char *name, int fd, } } + /* Search the ELF notes to find one with a type matching the +* value passed in via 'translate_opaque' +*/ +nhdr = (struct elf_note *)data; + assert(translate_opaque != NULL); +nhdr = glue(get_elf_note_type, SZ)(nhdr, file_size, ph->p_align, + *(uint64_t *)translate_opaque); if (nhdr != NULL) { bool is64 = sizeof(struct elf_note) == sizeof(struct elf64_note); -- 1.8.3.1
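For comparison with the note search in this patch, here is a standalone sketch of walking a buffer of ELF notes while tracking the cumulative offset, so that every header and payload is checked against the end of the buffer before it is read. It is an illustration under assumptions, not the patch's get_elf_note_type(): the names sketch_note_hdr and sketch_find_note() are hypothetical, the note header is taken to be three 32-bit words, and 'align' is assumed to be a small value such as 4 or 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Note header as three 32-bit words: name size, descriptor size, type. */
struct sketch_note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Round v up to a multiple of align (align assumed small, e.g. 4 or 8). */
static size_t sketch_align_up(size_t v, size_t align)
{
    return align > 1 ? (v + align - 1) / align * align : v;
}

/*
 * Return the byte offset of the first note of type 'want' inside
 * buf[0..len), or -1 if there is none or a note runs past the buffer.
 */
static long sketch_find_note(const uint8_t *buf, size_t len, size_t align,
                             uint32_t want)
{
    size_t off = 0;

    /* Invariant: off <= len, so len - off cannot wrap. */
    while (len - off >= sizeof(struct sketch_note_hdr)) {
        struct sketch_note_hdr h;
        size_t remaining, name, desc;

        memcpy(&h, buf + off, sizeof(h));
        remaining = len - off - sizeof(h);

        /* Reject sizes that cannot fit before rounding them up. */
        if (h.n_namesz > remaining || h.n_descsz > remaining) {
            return -1;
        }
        name = sketch_align_up(h.n_namesz, align);
        desc = sketch_align_up(h.n_descsz, align);
        if (name > remaining || desc > remaining - name) {
            return -1;
        }

        if (h.n_type == want) {
            return (long)off;
        }
        off += sizeof(h) + name + desc;
    }
    return -1;
}
```

The design choice here is to return an offset rather than a pointer, which makes it easy for the caller to re-check the range before reading the descriptor.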
Re: [Xen-devel] [Qemu-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
On 11/12/2018 14:57, Liam Merwick wrote: On 11/12/2018 14:01, Stefan Hajnoczi wrote: On Wed, Dec 05, 2018 at 10:37:24PM +, Liam Merwick wrote: From: Liam Merwick The x86/HVM direct boot ABI permits Qemu to be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. https://xenbits.xen.org/docs/unstable/misc/pvh.html This commit adds the header file that defines the start_info struct that needs to be populated in order to use this ABI. Signed-off-by: Maran Wilson Signed-off-by: Liam Merwick Reviewed-by: Konrad Rzeszutek Wilk --- include/hw/xen/start_info.h | 146 1 file changed, 146 insertions(+) create mode 100644 include/hw/xen/start_info.h Does it make sense to bring in Linux include/xen/interface/hvm/start_info.h via QEMU's include/standard-headers/? QEMU has a script in scripts/update-linux-header.sh for syncing Linux headers into include/standard-headers/. This makes it easy to keep Linux header files up-to-date. We basically treat files in include/standard-headers/ as auto-generated. If you define start_info.h yourself without using include/standard-headers/, then it won't be synced with Linux. That does seem better. I will make that change. When attempting to implement this, I found the canonical copy of this header file is actually in Xen and the Linux copy is kept in sync with that. Also, 'make headers_install' doesn't install those Xen headers. Instead I updated the commit comment to mention the canonical copy location. This file isn't expected to change much so I think keeping it in sync in future shouldn't be onerous. Regards, Liam
[Xen-devel] [RFC v2 0/4] QEMU changes to do PVH boot
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, QEMU should be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html Details on the Linux changes (v9 staged for 4.21): https://lkml.org/lkml/2018/12/14/1330 qboot pull request: https://github.com/bonzini/qboot/pull/17 This patch series provides QEMU support to read the ELF header of an uncompressed kernel binary and get the 32-bit PVH kernel entry point from an ELF Note. In load_linux() a call is made to load_elfboot() to see if the header matches that of an uncompressed kernel binary (ELF) and, if so, it loads the binary and determines the kernel entry address from an ELF Note in the binary. Then qboot does further initialisation of the guest (e820, etc.) and jumps to the kernel entry address and boots the guest. changes v1 -> v2 - Based on feedback from Stefan Hajnoczi - The reading of the PVH entry point is now done in a single pass during elf_load() which results in Patch2 in v1 being split into Patches 1&2 in v2 and considerably reworked. - Patch1 adds a new optional function pointer to parse the ELF note type (the type is passed in via the existing translate_opaque arg - the function already had 11 args so I didn't want to add more than one new arg). 
- Patch2 adds a function to elf_ops.h to find an ELF note matching a specific type - Patch3 just has a line added to the commit message to state that the Xen repo is the canonical location - Patch4 (that does the PVH boot) is mainly equivalent to Patch3 in v1 just minor load_elfboot() changes and the addition of a read_pvh_start_addr() helper function for load_elf() Using the method/scripts documented by the NEMU team at https://github.com/intel/nemu/wiki/Measuring-Boot-Latency https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html below are some timings measured (vmlinux and bzImage from the same build) Time to get to kernel start is almost halved (95ms -> 48ms) QEMU + qboot + vmlinux (PVH + 4.20-rc4) qemu_init_end: 41.550521 fw_start: 41.667139 (+0.116618) fw_do_boot: 47.448495 (+5.781356) linux_startup_64: 47.720785 (+0.27229) linux_start_kernel: 48.399541 (+0.678756) linux_start_user: 296.952056 (+248.552515) QEMU + qboot + bzImage: qemu_init_end: 29.209276 fw_start: 29.317342 (+0.108066) linux_start_boot: 36.679362 (+7.36202) linux_startup_64: 94.531349 (+57.851987) linux_start_kernel: 94.900913 (+0.369564) linux_start_user: 401.060971 (+306.160058) QEMU + bzImage: qemu_init_end: 30.424430 linux_startup_64: 893.770334 (+863.345904) linux_start_kernel: 894.17049 (+0.400156) linux_start_user: 1208.679768 (+314.509278) Liam Merwick (4): elf: Add optional function ptr to load_elf() to parse ELF notes elf-ops.h: Add get_elf_note_type() pvh: Add x86/HVM direct boot ABI header file pvh: Boot uncompressed kernel using direct boot ABI hw/alpha/dp264.c | 4 +- hw/arm/armv7m.c| 3 +- hw/arm/boot.c | 2 +- hw/core/generic-loader.c | 2 +- hw/core/loader.c | 24 --- hw/cris/boot.c | 3 +- hw/hppa/machine.c | 6 +- hw/i386/multiboot.c| 2 +- hw/i386/pc.c | 131 +++- hw/lm32/lm32_boards.c | 6 +- hw/lm32/milkymist.c| 3 +- hw/m68k/an5206.c | 2 +- hw/m68k/mcf5208.c | 2 +- hw/microblaze/boot.c | 7 +- hw/mips/mips_fulong2e.c| 5 +- hw/mips/mips_malta.c | 5 +- 
hw/mips/mips_mipssim.c | 5 +- hw/mips/mips_r4k.c | 5 +- hw/moxie/moxiesim.c| 2 +- hw/nios2/boot.c| 7 +- hw/openrisc/openrisc_sim.c | 2 +- hw/pci-host/prep.c | 2 +- hw/ppc/e500.c | 3 +- hw/ppc/mac_newworld.c | 5 +- hw/ppc/mac_oldworld.c | 5 +- hw/ppc/ppc440_bamboo.c | 2 +- hw/ppc/sam460ex.c | 3 +- hw/ppc/spapr.c | 7 +- hw/ppc/virtex_ml507.c | 2 +- hw/riscv/sifive_e.c| 2 +- hw/riscv/sifive_u.c| 2 +- hw/riscv/spike.c | 2 +- hw/riscv/virt.c| 2 +- hw/s390x/ipl.c | 9 ++- hw/sparc/leon3.c | 3 +- hw/sparc/sun4m.c | 6 +- hw/sparc64/sun4u.c | 4 +- hw/tricore/tricore_testboard.c | 2 +- hw/xtensa/sim.c| 12 ++-- hw/xtensa/xtfpga.c | 2 +- include/elf.h | 10 +++ include/hw/elf_ops.h | 72 incl
Re: [Xen-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
Thanks Stefan for the review - comments inline. On 11/12/2018 14:17, Stefan Hajnoczi wrote: On Wed, Dec 05, 2018 at 10:37:25PM +, Liam Merwick wrote: From: Liam Merwick Add support to read the PVH Entry address from an ELF note in the uncompressed kernel binary (as defined by the x86/HVM direct boot ABI). This 32-bit entry point will be used by QEMU to load the kernel in the guest and jump into the kernel entry point. For now, a call to this function is added in pc_memory_init() to read the address - a future patch will use the entry point. Signed-off-by: Liam Merwick --- hw/i386/pc.c | 272 +- include/elf.h | 10 +++ 2 files changed, 281 insertions(+), 1 deletion(-) diff --git a/hw/i386/pc.c b/hw/i386/pc.c index f095725dbab2..056aa46d99b9 100644 --- a/hw/i386/pc.c +++ b/hw/i386/pc.c @@ -109,6 +109,9 @@ static struct e820_entry *e820_table; static unsigned e820_entries; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; +/* Physical Address of PVH entry point read from kernel ELF NOTE */ +static size_t pvh_start_addr; + void gsi_handler(void *opaque, int n, int level) { GSIState *s = opaque; @@ -834,6 +837,267 @@ struct setup_data { uint8_t data[0]; } __attribute__((packed)); +/* + * Search through the ELF Notes for an entry with the given + * ELF Note type + */ +static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64, +size_t elf_note_type) Generic ELF code. Can you put it in hw/core/loader.c? I've added a modified/slimmed down version to include/hw/elf_ops.h (which now handles 32 and 64 bit as you mention below). I've put this in a separate commit. +{ +void *nhdr = NULL; +size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr); +size_t elf_note_entry_sz = 0; +size_t phdr_off; +size_t phdr_align; +size_t phdr_memsz; +size_t nhdr_namesz; +size_t nhdr_descsz; +size_t note_type; The macro tricks used by hw/core/loader.c are nasty, but I think they get the types right. 
Here the Elf64 on 32-bit host case is definitely broken due to using size_t. Perhaps 64-on-32 isn't supported, but getting the types right is worth discussing. + +phdr_off = elf_is64 ? +((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset; +phdr_align = elf_is64 ? +((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align; +phdr_memsz = elf_is64 ? +((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz; + +nhdr = ehdr + phdr_off; The ELF file is untrusted. All inputs must be validated. phdr_off could be an bogus/malicious value. Most of the parsing of the ELF binary goes away due to moving to parse during elf_load() - more info below. +note_type = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type; +nhdr_namesz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz; +nhdr_descsz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz; + +while (note_type != elf_note_type) { +elf_note_entry_sz = nhdr_size + +QEMU_ALIGN_UP(nhdr_namesz, phdr_align) + +QEMU_ALIGN_UP(nhdr_descsz, phdr_align); + +/* + * Verify that we haven't exceeded the end of the ELF Note section. + * If we have, then there is no note of the given type present + * in the ELF Notes. + */ +if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) { +error_report("Note type (0x%lx) not found in ELF Note section", +elf_note_type); +return NULL; +} + +/* skip to the next ELF Note entry */ +nhdr += elf_note_entry_sz; +note_type = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type; +nhdr_namesz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz; +nhdr_descsz = elf_is64 ? +((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz; +} + +return nhdr; +} + +/* + * The entry point into the kernel for PVH boot is different from + * the native entry point. The PVH entry is defined by the x86/HVM + * direct boot ABI and is available in an ELFNOTE in the kernel binary. 
+ * This function reads the ELF headers of the binary specified on the + * command line by -kernel (path contained in 'filename') and discovers + * the PVH entry address from the appropriate ELF Note. + * + * The address of the PVH entry point is saved to the 'pvh_start_addr' + * global variable. The ELF class of the binary is returned via 'elfclass' + * (although the entry point is 32-bit, the kernel binary can be either + * 32-bit or 64-bit). + */ +static bool read_pvh
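The review comment in this thread about untrusted input (an offset such as phdr_off could be bogus or malicious) comes down to checking every offset and size against the file length before using them. A minimal, overflow-safe range check might look like the sketch below; sketch_range_ok() is a hypothetical helper written for this illustration, not something taken from QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Return true if the byte range [off, off + size) lies entirely inside a
 * file of file_len bytes. Written so that off + size is never computed,
 * which avoids wrap-around when the values come from an untrusted header.
 */
static bool sketch_range_ok(uint64_t off, uint64_t size, uint64_t file_len)
{
    return off <= file_len && size <= file_len - off;
}
```

A caller would apply this to the program header's offset and size before turning them into a pointer into the mapped or read file contents.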
[Xen-devel] [RFC v2 1/4] elf: Add optional function ptr to load_elf() to parse ELF notes
This patch adds an optional function pointer, 'elf_note_fn', to load_elf() which causes load_elf() to additionally parse any ELF program headers of type PT_NOTE and check to see if the ELF Note is of the type specified by the 'translate_opaque' arg. If a matching ELF Note is found then the specified function pointer is called to process the ELF note. Passing a NULL function pointer results in ELF Notes being skipped. The first consumer of this functionality is the PVH boot support which needs to read the XEN_ELFNOTE_PHYS32_ENTRY ELF Note while loading the uncompressed kernel binary in order to discover the boot entry address for the x86/HVM direct boot ABI. Signed-off-by: Liam Merwick --- hw/alpha/dp264.c | 4 ++-- hw/arm/armv7m.c| 3 ++- hw/arm/boot.c | 2 +- hw/core/generic-loader.c | 2 +- hw/core/loader.c | 24 hw/cris/boot.c | 3 ++- hw/hppa/machine.c | 6 +++--- hw/i386/multiboot.c| 2 +- hw/lm32/lm32_boards.c | 6 -- hw/lm32/milkymist.c| 3 ++- hw/m68k/an5206.c | 2 +- hw/m68k/mcf5208.c | 2 +- hw/microblaze/boot.c | 7 --- hw/mips/mips_fulong2e.c| 5 +++-- hw/mips/mips_malta.c | 5 +++-- hw/mips/mips_mipssim.c | 5 +++-- hw/mips/mips_r4k.c | 5 +++-- hw/moxie/moxiesim.c| 2 +- hw/nios2/boot.c| 7 --- hw/openrisc/openrisc_sim.c | 2 +- hw/pci-host/prep.c | 2 +- hw/ppc/e500.c | 3 ++- hw/ppc/mac_newworld.c | 5 +++-- hw/ppc/mac_oldworld.c | 5 +++-- hw/ppc/ppc440_bamboo.c | 2 +- hw/ppc/sam460ex.c | 3 ++- hw/ppc/spapr.c | 7 --- hw/ppc/virtex_ml507.c | 2 +- hw/riscv/sifive_e.c| 2 +- hw/riscv/sifive_u.c| 2 +- hw/riscv/spike.c | 2 +- hw/riscv/virt.c| 2 +- hw/s390x/ipl.c | 9 ++--- hw/sparc/leon3.c | 3 ++- hw/sparc/sun4m.c | 6 -- hw/sparc64/sun4u.c | 4 ++-- hw/tricore/tricore_testboard.c | 2 +- hw/xtensa/sim.c| 12 hw/xtensa/xtfpga.c | 2 +- include/hw/elf_ops.h | 23 +++ include/hw/loader.h| 9 - 41 files changed, 134 insertions(+), 70 deletions(-) diff --git a/hw/alpha/dp264.c b/hw/alpha/dp264.c index dd62f2a4050c..0347eb897c8a 100644 --- a/hw/alpha/dp264.c +++ b/hw/alpha/dp264.c @@ -114,7 
+114,7 @@ static void clipper_init(MachineState *machine) error_report("no palcode provided"); exit(1); } -size = load_elf(palcode_filename, cpu_alpha_superpage_to_phys, +size = load_elf(palcode_filename, NULL, cpu_alpha_superpage_to_phys, NULL, &palcode_entry, &palcode_low, &palcode_high, 0, EM_ALPHA, 0, 0); if (size < 0) { @@ -133,7 +133,7 @@ static void clipper_init(MachineState *machine) if (kernel_filename) { uint64_t param_offset; -size = load_elf(kernel_filename, cpu_alpha_superpage_to_phys, +size = load_elf(kernel_filename, NULL, cpu_alpha_superpage_to_phys, NULL, &kernel_entry, &kernel_low, &kernel_high, 0, EM_ALPHA, 0, 0); if (size < 0) { diff --git a/hw/arm/armv7m.c b/hw/arm/armv7m.c index 4bf9131b81e4..a4d528537eb4 100644 --- a/hw/arm/armv7m.c +++ b/hw/arm/armv7m.c @@ -298,7 +298,8 @@ void armv7m_load_kernel(ARMCPU *cpu, const char *kernel_filename, int mem_size) as = cpu_get_address_space(cs, asidx); if (kernel_filename) { -image_size = load_elf_as(kernel_filename, NULL, NULL, &entry, &lowaddr, +image_size = load_elf_as(kernel_filename, NULL, NULL, NULL, + &entry, &lowaddr, NULL, big_endian, EM_ARM, 1, 0, as); if (image_size < 0) { image_size = load_image_targphys_as(kernel_filename, 0, diff --git a/hw/arm/boot.c b/hw/arm/boot.c index 94fce128028c..2b59379be6af 100644 --- a/hw/arm/boot.c +++ b/hw/arm/boot.c @@ -884,7 +884,7 @@ static int64_t arm_load_elf(struct arm_boot_info *info, uint64_t *pentry, } } -ret = load_elf_as(info->kernel_filename, NULL, NULL, +ret = load_elf_as(info->kernel_filename, NULL, NULL, NULL, pentry, lowaddr, highaddr, big_endian, elf_machine, 1, data_swab, as); if (ret <= 0) { diff --git a/hw/core/generic-loader.c b/hw/core/generic-loader.c index fbae05fb3b64..3695dd439cd0 100644 --- a/hw/core/generic-loader.c +++ b/hw/core/generic-loader.c @@ -136,7 +136,7 @@ static void generic_loader_realize(DeviceState *dev, E
Re: [Xen-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
On 11/12/2018 14:01, Stefan Hajnoczi wrote: On Wed, Dec 05, 2018 at 10:37:24PM +, Liam Merwick wrote: From: Liam Merwick The x86/HVM direct boot ABI permits Qemu to be able to boot directly into the uncompressed Linux kernel binary without the need to run firmware. https://xenbits.xen.org/docs/unstable/misc/pvh.html This commit adds the header file that defines the start_info struct that needs to be populated in order to use this ABI. Signed-off-by: Maran Wilson Signed-off-by: Liam Merwick Reviewed-by: Konrad Rzeszutek Wilk --- include/hw/xen/start_info.h | 146 1 file changed, 146 insertions(+) create mode 100644 include/hw/xen/start_info.h Does it make sense to bring in Linux include/xen/interface/hvm/start_info.h via QEMU's include/standard-headers/? QEMU has a script in scripts/update-linux-header.sh for syncing Linux headers into include/standard-headers/. This makes it easy to keep Linux header files up-to-date. We basically treat files in include/standard-headers/ as auto-generated. If you define start_info.h yourself without using include/standard-headers/, then it won't be synced with Linux. That does seem better. I will make that change. On a related note, I'm also trying to fix the mingw compilation errors [1] in this series. I can fix the format issues with PRIx64, etc. but I can't seem to find an include file to provide a declaration of mmap() et al. - has this been resolved before? A pointer to something similar to investigate would be very welcome. Regards, Liam [1] http://patchew.org/logs/1544049446-6359-1-git-send-email-liam.merw...@oracle.com/testing.docker-mingw@fedora/?type=message
[Xen-devel] [RFC 0/3] QEMU changes to do PVH boot
For certain applications it is desirable to rapidly boot a KVM virtual machine. In cases where legacy hardware and software support within the guest is not needed, QEMU should be able to boot directly into the uncompressed Linux kernel binary with minimal firmware involvement. There already exists an ABI to allow this for Xen PVH guests and the ABI is supported by Linux and FreeBSD: https://xenbits.xen.org/docs/unstable/misc/pvh.html Details on the Linux changes: https://lkml.org/lkml/2018/4/16/1002 qboot patches: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=80020 This patch series provides QEMU support to read the ELF header of an uncompressed kernel binary and get the 32-bit PVH kernel entry point from an ELF Note. This is called when initialising the machine state in pc_memory_init(). Later on in load_linux() if the kernel entry address is present, the uncompressed kernel binary (ELF) is loaded and qboot does further initialisation of the guest (e820, etc.) and jumps to the kernel entry address and boots the guest. 
Using the method/scripts documented by the NEMU team at https://github.com/intel/nemu/wiki/Measuring-Boot-Latency https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg00200.html below are some timings measured (vmlinux and bzImage from the same build) Time to get to kernel start is almost halved (95ms -> 48ms) QEMU + qboot + vmlinux (PVH + 4.20-rc4) qemu_init_end: 41.550521 fw_start: 41.667139 (+0.116618) fw_do_boot: 47.448495 (+5.781356) linux_startup_64: 47.720785 (+0.27229) linux_start_kernel: 48.399541 (+0.678756) linux_start_user: 296.952056 (+248.552515) QEMU + qboot + bzImage: qemu_init_end: 29.209276 fw_start: 29.317342 (+0.108066) linux_start_boot: 36.679362 (+7.36202) linux_startup_64: 94.531349 (+57.851987) linux_start_kernel: 94.900913 (+0.369564) linux_start_user: 401.060971 (+306.160058) QEMU + bzImage: qemu_init_end: 30.424430 linux_startup_64: 893.770334 (+863.345904) linux_start_kernel: 894.17049 (+0.400156) linux_start_user: 1208.679768 (+314.509278) Liam Merwick (3): pvh: Add x86/HVM direct boot ABI header file pc: Read PVH entry point from ELF note in kernel binary pvh: Boot uncompressed kernel using direct boot ABI hw/i386/pc.c| 344 +++- include/elf.h | 10 ++ include/hw/xen/start_info.h | 146 +++ 3 files changed, 499 insertions(+), 1 deletion(-) create mode 100644 include/hw/xen/start_info.h -- 1.8.3.1
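The "almost halved" figure can be reproduced from the numbers in the cover letter. The sketch below only does arithmetic on the quoted timestamps (taken to be milliseconds, as in the "95ms -> 48ms" summary); the struct and function names are made up for illustration, and the "after qemu_init_end" column is an extra derived view, not a figure claimed in the original mail.

```c
#include <stddef.h>
#include <stdio.h>

/* Timestamps in ms, copied from the measurements in the cover letter. */
struct sketch_boot_times {
    const char *label;
    double qemu_init_end;
    double linux_start_kernel;
};

static const struct sketch_boot_times sketch_runs[] = {
    { "QEMU + qboot + vmlinux (PVH)", 41.550521, 48.399541 },
    { "QEMU + qboot + bzImage",       29.209276, 94.900913 },
    { "QEMU + bzImage",               30.424430, 894.17049 },
};

/* Time between the end of QEMU init and start_kernel. */
static double sketch_after_init_ms(const struct sketch_boot_times *t)
{
    return t->linux_start_kernel - t->qemu_init_end;
}

/* Ratio of two absolute start_kernel timestamps. */
static double sketch_ratio(const struct sketch_boot_times *a,
                           const struct sketch_boot_times *b)
{
    return a->linux_start_kernel / b->linux_start_kernel;
}

/* Print one summary line per configuration. */
static void sketch_print_summary(void)
{
    size_t n = sizeof(sketch_runs) / sizeof(sketch_runs[0]);

    for (size_t i = 0; i < n; i++) {
        printf("%-30s start_kernel at %9.3f ms, %9.3f ms after init\n",
               sketch_runs[i].label,
               sketch_runs[i].linux_start_kernel,
               sketch_after_init_ms(&sketch_runs[i]));
    }
    printf("PVH / bzImage (both with qboot): %.2f\n",
           sketch_ratio(&sketch_runs[0], &sketch_runs[1]));
}
```

Note that the three runs have different qemu_init_end values, so the absolute timestamps and the time elapsed after QEMU init give different pictures of the speed-up.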
[Xen-devel] [RFC 2/3] pc: Read PVH entry point from ELF note in kernel binary
From: Liam Merwick

Add support to read the PVH Entry address from an ELF note in the
uncompressed kernel binary (as defined by the x86/HVM direct boot ABI).
This 32-bit entry point will be used by QEMU to load the kernel in the
guest and jump into the kernel entry point.

For now, a call to this function is added in pc_memory_init() to read the
address - a future patch will use the entry point.

Signed-off-by: Liam Merwick
---
 hw/i386/pc.c  | 272 +-
 include/elf.h |  10 +++
 2 files changed, 281 insertions(+), 1 deletion(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f095725dbab2..056aa46d99b9 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -109,6 +109,9 @@ static struct e820_entry *e820_table;
 static unsigned e820_entries;
 struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX};
 
+/* Physical Address of PVH entry point read from kernel ELF NOTE */
+static size_t pvh_start_addr;
+
 void gsi_handler(void *opaque, int n, int level)
 {
     GSIState *s = opaque;
@@ -834,6 +837,267 @@ struct setup_data {
     uint8_t data[0];
 } __attribute__((packed));
 
+/*
+ * Search through the ELF Notes for an entry with the given
+ * ELF Note type
+ */
+static void *get_elf_note_type(void *ehdr, void *phdr, bool elf_is64,
+                               size_t elf_note_type)
+{
+    void *nhdr = NULL;
+    size_t nhdr_size = elf_is64 ? sizeof(Elf64_Nhdr) : sizeof(Elf32_Nhdr);
+    size_t elf_note_entry_sz = 0;
+    size_t phdr_off;
+    size_t phdr_align;
+    size_t phdr_memsz;
+    size_t nhdr_namesz;
+    size_t nhdr_descsz;
+    size_t note_type;
+
+    phdr_off = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_offset : ((Elf32_Phdr *)phdr)->p_offset;
+    phdr_align = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_align : ((Elf32_Phdr *)phdr)->p_align;
+    phdr_memsz = elf_is64 ?
+        ((Elf64_Phdr *)phdr)->p_memsz : ((Elf32_Phdr *)phdr)->p_memsz;
+
+    nhdr = ehdr + phdr_off;
+    note_type = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+    nhdr_namesz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+    nhdr_descsz = elf_is64 ?
+        ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+
+    while (note_type != elf_note_type) {
+        elf_note_entry_sz = nhdr_size +
+            QEMU_ALIGN_UP(nhdr_namesz, phdr_align) +
+            QEMU_ALIGN_UP(nhdr_descsz, phdr_align);
+
+        /*
+         * Verify that we haven't exceeded the end of the ELF Note section.
+         * If we have, then there is no note of the given type present
+         * in the ELF Notes.
+         */
+        if (phdr_off + phdr_memsz < ((nhdr - ehdr) + elf_note_entry_sz)) {
+            error_report("Note type (0x%lx) not found in ELF Note section",
+                         elf_note_type);
+            return NULL;
+        }
+
+        /* skip to the next ELF Note entry */
+        nhdr += elf_note_entry_sz;
+        note_type = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_type : ((Elf32_Nhdr *)nhdr)->n_type;
+        nhdr_namesz = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_namesz : ((Elf32_Nhdr *)nhdr)->n_namesz;
+        nhdr_descsz = elf_is64 ?
+            ((Elf64_Nhdr *)nhdr)->n_descsz : ((Elf32_Nhdr *)nhdr)->n_descsz;
+    }
+
+    return nhdr;
+}
+
+/*
+ * The entry point into the kernel for PVH boot is different from
+ * the native entry point. The PVH entry is defined by the x86/HVM
+ * direct boot ABI and is available in an ELFNOTE in the kernel binary.
+ * This function reads the ELF headers of the binary specified on the
+ * command line by -kernel (path contained in 'filename') and discovers
+ * the PVH entry address from the appropriate ELF Note.
+ *
+ * The address of the PVH entry point is saved to the 'pvh_start_addr'
+ * global variable. The ELF class of the binary is returned via 'elfclass'
+ * (although the entry point is 32-bit, the kernel binary can be either
+ * 32-bit or 64-bit).
+ */
+static bool read_pvh_start_addr_elf_note(const char *filename,
+                                         unsigned char *elfclass)
+{
+    void *ehdr = NULL; /* Cast to Elf64_Ehdr or Elf32_Ehdr */
+    void *phdr = NULL; /* Cast to Elf64_Phdr or Elf32_Phdr */
+    void *nhdr = NULL; /* Cast to Elf64_Nhdr or Elf32_Nhdr */
+    struct stat statbuf;
+    size_t ehdr_size;
+    size_t phdr_size;
+    size_t nhdr_size;
+    size_t elf_note_data_addr;
+    /* Ehdr fields */
+    size_t ehdr_poff;
+    /* Phdr fields */
+    size_t phdr_off;
+    size_t phdr_align;
+    size_t phdr_memsz;
+    size_t phdr_type;
+    /* Nhdr fields */
+    size_t nhdr_namesz;
+    size_t nhdr_descsz;
+    bool elf_is64;
+    FILE *file;
+    union {
+        Elf32_Ehdr h32;
+        Elf64_Ehdr h64;
+    } elf_header;
+    Error *err = NULL;
+
+    pvh_start_addr = 0;
+
+    if (filename == NULL) {
+        return false;
+    }
+
+    file = fopen(filename, &quo
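The note walk done by get_elf_note_type() in the patch above can be sketched on its own over a flat byte buffer. This is a minimal illustration, not the patch's code: note_hdr, align_up and find_note are hypothetical names, the note header is assumed to be three 32-bit words (namesz, descsz, type), and, unlike the patch, the sketch checks bounds before reading each header and treats an alignment of 0 as 1.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed note header layout: three 32-bit words. Hypothetical type name. */
struct note_hdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
};

/* Round n up to a multiple of align (align must be non-zero). */
static size_t align_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}

/*
 * Walk the notes in seg[0..seg_len) and return the byte offset of the
 * first note whose type matches, or -1 if there is none.
 */
static ptrdiff_t find_note(const uint8_t *seg, size_t seg_len,
                           size_t align, uint32_t type)
{
    size_t off = 0;

    if (align == 0) {
        align = 1; /* assumption: 0 means "no alignment constraint" */
    }

    while (off + sizeof(struct note_hdr) <= seg_len) {
        struct note_hdr hdr;
        size_t entry_sz;

        memcpy(&hdr, seg + off, sizeof(hdr)); /* avoids unaligned access */
        if (hdr.n_type == type) {
            return (ptrdiff_t)off;
        }

        /* header + padded name + padded descriptor */
        entry_sz = sizeof(hdr) + align_up(hdr.n_namesz, align) +
                   align_up(hdr.n_descsz, align);
        if (entry_sz > seg_len - off) {
            break; /* entry would run past the end of the segment */
        }
        off += entry_sz;
    }

    return -1;
}
```

Returning an offset rather than a pointer keeps the sketch free of void-pointer arithmetic, which the patch relies on as a GNU C extension.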
[Xen-devel] [RFC 1/3] pvh: Add x86/HVM direct boot ABI header file
From: Liam Merwick

The x86/HVM direct boot ABI permits Qemu to be able to boot directly
into the uncompressed Linux kernel binary without the need to run firmware.

  https://xenbits.xen.org/docs/unstable/misc/pvh.html

This commit adds the header file that defines the start_info struct
that needs to be populated in order to use this ABI.

Signed-off-by: Maran Wilson
Signed-off-by: Liam Merwick
Reviewed-by: Konrad Rzeszutek Wilk
---
 include/hw/xen/start_info.h | 146
 1 file changed, 146 insertions(+)
 create mode 100644 include/hw/xen/start_info.h

diff --git a/include/hw/xen/start_info.h b/include/hw/xen/start_info.h
new file mode 100644
index ..348779eb10cd
--- /dev/null
+++ b/include/hw/xen/start_info.h
@@ -0,0 +1,146 @@
+/*
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to
+ * deal in the Software without restriction, including without limitation the
+ * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
+ * sell copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ *
+ * Copyright (c) 2016, Citrix Systems, Inc.
+ */
+
+#ifndef __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+#define __XEN_PUBLIC_ARCH_X86_HVM_START_INFO_H__
+
+/*
+ * Start of day structure passed to PVH guests and to HVM guests in %ebx.
+ *
+ * NOTE: nothing will be loaded at physical address 0, so a 0 value in any
+ * of the address fields should be treated as not present.
+ *
+ *  0 +----------------+
+ *    | magic          | Contains the magic value XEN_HVM_START_MAGIC_VALUE
+ *    |                | ("xEn3" with the 0x80 bit of the "E" set).
+ *  4 +----------------+
+ *    | version        | Version of this structure. Current version is 1. New
+ *    |                | versions are guaranteed to be backwards-compatible.
+ *  8 +----------------+
+ *    | flags          | SIF_xxx flags.
+ * 12 +----------------+
+ *    | nr_modules     | Number of modules passed to the kernel.
+ * 16 +----------------+
+ *    | modlist_paddr  | Physical address of an array of modules
+ *    |                | (layout of the structure below).
+ * 24 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 32 +----------------+
+ *    | rsdp_paddr     | Physical address of the RSDP ACPI data structure.
+ * 40 +----------------+
+ *    | memmap_paddr   | Physical address of the (optional) memory map. Only
+ *    |                | present in version 1 and newer of the structure.
+ * 48 +----------------+
+ *    | memmap_entries | Number of entries in the memory map table. Only
+ *    |                | present in version 1 and newer of the structure.
+ *    |                | Zero if there is no memory map being provided.
+ * 52 +----------------+
+ *    | reserved       | Version 1 and newer only.
+ * 56 +----------------+
+ *
+ * The layout of each entry in the module structure is the following:
+ *
+ *  0 +----------------+
+ *    | paddr          | Physical address of the module.
+ *  8 +----------------+
+ *    | size           | Size of the module in bytes.
+ * 16 +----------------+
+ *    | cmdline_paddr  | Physical address of the command line,
+ *    |                | a zero-terminated ASCII string.
+ * 24 +----------------+
+ *    | reserved       |
+ * 32 +----------------+
+ *
+ * The layout of each entry in the memory map table is as follows:
+ *
+ *  0 +----------------+
+ *    | addr           | Base address
+ *  8 +----------------+
+ *    | size           | Size of mapping in bytes
+ * 16 +----------------+
+ *    | type           | Type of mapping as defined between the hypervisor
+ *    |                | and guest it's starting. E820_TYPE_xxx, for example.
+ * 20 +----------------|
+ *    | reserved       |
+ * 24 +----------------+
+ *
+ * The address and sizes are always a 64bit little endian unsigned integer.
+ *
+ * NB: Xen on x86 will always try to place all the data below the 4GiB
+ * boundary.
+ *
+ * Version numbers of the hvm_start_info structure have evolved like this:
+ *
+ * Version 0:
+ *
+ * Versi
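The offsets in the layout comment above can be checked mechanically. The sketch below declares structs that follow the documented offsets and asserts them at compile time. The struct names carry a _sketch suffix to make clear they are illustrative rather than the definitions from start_info.h (whose body is cut off in this archive), and start_magic() assumes the four bytes "xEn3" are packed least-significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative mirror of the documented start-of-day layout. */
struct hvm_start_info_sketch {
    uint32_t magic;          /* offset  0 */
    uint32_t version;        /* offset  4 */
    uint32_t flags;          /* offset  8 */
    uint32_t nr_modules;     /* offset 12 */
    uint64_t modlist_paddr;  /* offset 16 */
    uint64_t cmdline_paddr;  /* offset 24 */
    uint64_t rsdp_paddr;     /* offset 32 */
    uint64_t memmap_paddr;   /* offset 40 */
    uint32_t memmap_entries; /* offset 48 */
    uint32_t reserved;       /* offset 52 */
};

/* Illustrative mirror of one module list entry. */
struct hvm_modlist_entry_sketch {
    uint64_t paddr;          /* offset  0 */
    uint64_t size;           /* offset  8 */
    uint64_t cmdline_paddr;  /* offset 16 */
    uint64_t reserved;       /* offset 24 */
};

/* Illustrative mirror of one memory map entry. */
struct hvm_memmap_entry_sketch {
    uint64_t addr;           /* offset  0 */
    uint64_t size;           /* offset  8 */
    uint32_t type;           /* offset 16 */
    uint32_t reserved;       /* offset 20 */
};

static_assert(offsetof(struct hvm_start_info_sketch, modlist_paddr) == 16,
              "modlist_paddr offset");
static_assert(offsetof(struct hvm_start_info_sketch, memmap_entries) == 48,
              "memmap_entries offset");
static_assert(sizeof(struct hvm_start_info_sketch) == 56, "start info size");
static_assert(sizeof(struct hvm_modlist_entry_sketch) == 32, "module size");
static_assert(sizeof(struct hvm_memmap_entry_sketch) == 24, "memmap size");

/* "xEn3" with the 0x80 bit of the 'E' set, packed low byte first. */
static uint32_t start_magic(void)
{
    const uint8_t b[4] = { 'x', 'E' | 0x80, 'n', '3' };

    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}
```

Under that byte-order assumption start_magic() evaluates to 0x336ec578.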
[Xen-devel] [RFC 3/3] pvh: Boot uncompressed kernel using direct boot ABI
These changes (along with corresponding qboot and Linux kernel changes)
enable a guest to be booted using the x86/HVM direct boot ABI.

This commit adds a load_elfboot() routine to pass the size and
location of the kernel entry point to qboot (which will fill in
the start_info struct information needed to boot the guest).
Having loaded the ELF binary, load_linux() will run qboot
which continues the boot.

The address for the kernel entry point has already been read
from an ELF Note in the uncompressed kernel binary earlier
in pc_memory_init().

Signed-off-by: George Kennedy
Signed-off-by: Liam Merwick
---
 hw/i386/pc.c | 72
 1 file changed, 72 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 056aa46d99b9..d3012cbd8597 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -54,6 +54,7 @@
 #include "sysemu/qtest.h"
 #include "kvm_i386.h"
 #include "hw/xen/xen.h"
+#include "hw/xen/start_info.h"
 #include "ui/qemu-spice.h"
 #include "exec/memory.h"
 #include "exec/address-spaces.h"
@@ -1098,6 +1099,50 @@ done:
     return pvh_start_addr != 0;
 }
 
+static bool load_elfboot(const char *kernel_filename,
+                         int kernel_file_size,
+                         uint8_t *header,
+                         size_t pvh_xen_start_addr,
+                         FWCfgState *fw_cfg)
+{
+    uint32_t flags = 0;
+    uint32_t mh_load_addr = 0;
+    uint32_t elf_kernel_size = 0;
+    uint64_t elf_entry;
+    uint64_t elf_low, elf_high;
+    int kernel_size;
+
+    if (ldl_p(header) != 0x464c457f) {
+        return false; /* no elfboot */
+    }
+
+    bool elf_is64 = header[EI_CLASS] == ELFCLASS64;
+    flags = elf_is64 ?
+        ((Elf64_Ehdr *)header)->e_flags : ((Elf32_Ehdr *)header)->e_flags;
+
+    if (flags & 0x00010004) { /* LOAD_ELF_HEADER_HAS_ADDR */
+        error_report("elfboot unsupported flags = %x", flags);
+        exit(1);
+    }
+
+    kernel_size = load_elf(kernel_filename, NULL, NULL, &elf_entry,
+                           &elf_low, &elf_high, 0, I386_ELF_MACHINE,
+                           0, 0);
+
+    if (kernel_size < 0) {
+        error_report("Error while loading elf kernel");
+        exit(1);
+    }
+    mh_load_addr = elf_low;
+    elf_kernel_size = elf_high - elf_low;
+
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ENTRY, pvh_xen_start_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, mh_load_addr);
+    fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, elf_kernel_size);
+
+    return true;
+}
+
 static void load_linux(PCMachineState *pcms,
                        FWCfgState *fw_cfg)
 {
@@ -1138,6 +1183,33 @@ static void load_linux(PCMachineState *pcms,
     if (ldl_p(header+0x202) == 0x53726448) {
         protocol = lduw_p(header+0x206);
     } else {
+        /* If the kernel address for using the x86/HVM direct boot ABI has
+         * been saved then proceed with booting the uncompressed kernel */
+        if (pvh_start_addr) {
+            if (load_elfboot(kernel_filename, kernel_size,
+                             header, pvh_start_addr, fw_cfg)) {
+                struct hvm_modlist_entry ramdisk_mod = { 0 };
+
+                fclose(f);
+
+                fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE,
+                    strlen(kernel_cmdline) + 1);
+                fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline);
+
+                assert(machine->device_memory != NULL);
+                ramdisk_mod.paddr = machine->device_memory->base;
+                ramdisk_mod.size =
+                    memory_region_size(&machine->device_memory->mr);
+
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, &ramdisk_mod,
+                                 sizeof(ramdisk_mod));
+                fw_cfg_add_i32(fw_cfg, FW_CFG_SETUP_SIZE, sizeof(header));
+                fw_cfg_add_bytes(fw_cfg, FW_CFG_SETUP_DATA,
+                                 header, sizeof(header));
+
+                return;
+            }
+        }
         /* This looks like a multiboot kernel. If it is, let's stop
            treating it like a Linux kernel. */
         if (load_multiboot(fw_cfg, f, kernel_filename, initrd_filename,
--
1.8.3.1
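The first check in load_elfboot() compares a 32-bit load of the header against 0x464c457f, which is the ELF magic "\x7fELF" read as a little-endian word, and then inspects the EI_CLASS byte. A standalone sketch of those two checks follows. The helpers load_le32, looks_like_elf and elf_is_64bit are hypothetical names, and the sketch assumes a little-endian load, which is what ldl_p() amounts to for an x86 target.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* True if the buffer starts with 0x7f 'E' 'L' 'F' and has a class byte. */
static bool looks_like_elf(const uint8_t *header, size_t len)
{
    return len >= 5 && load_le32(header) == 0x464c457f;
}

/* e_ident[EI_CLASS] is byte 4 of the header; ELFCLASS64 is the value 2. */
static bool elf_is_64bit(const uint8_t *header)
{
    return header[4] == 2;
}
```

Callers would test looks_like_elf() first and consult elf_is_64bit() only when it returns true, mirroring the order used in the patch.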
[Xen-devel] [PATCH] xen/grant-table: Fix incorrect gnttab_dma_free_pages() pr_debug message
If a call to xenmem_reservation_increase() in gnttab_dma_free_pages()
fails it triggers a message "Failed to decrease reservation..." which
should be "Failed to increase reservation..."

Fixes: 9bdc7304f536 ('xen/grant-table: Allow allocating buffers suitable for DMA')
Reported-by: Ross Philipson
Signed-off-by: Liam Merwick
Reviewed-by: Mark Kanda
---
 drivers/xen/grant-table.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index 84575baceebc..97341fa75458 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -914,7 +914,7 @@ int gnttab_dma_free_pages(struct gnttab_dma_alloc_args *args)
 
 	ret = xenmem_reservation_increase(args->nr_pages, args->frames);
 	if (ret != args->nr_pages) {
-		pr_debug("Failed to decrease reservation for DMA buffer\n");
+		pr_debug("Failed to increase reservation for DMA buffer\n");
 		ret = -EFAULT;
 	} else {
 		ret = 0;
--
1.8.3.1