PVH: Support relocatable dom0 kernels

Jason Andryuk Thu, 21 Mar 2024 06:54:45 -0700

On 2024-03-20 10:39, Jan Beulich wrote:

On 19.03.2024 21:58, Jason Andryuk wrote:

--- a/xen/arch/x86/hvm/dom0_build.c
+++ b/xen/arch/x86/hvm/dom0_build.c
@@ -537,6 +537,97 @@ static paddr_t __init find_memory(
      return INVALID_PADDR;
  }

+static bool __init check_load_address(

+    const struct domain *d, const struct elf_binary *elf)
+{
+    paddr_t kernel_start = (paddr_t)elf->dest_base;


As before I think it is illegitimate to cast a pointer to paddr_t. The
variable type wants to remain such, but the cast wants to be to
unsigned long or uintptr_t (or else at least a comment wants adding).


Ok, thanks.  This explains it clearly.

+/* Find an e820 RAM region that fits the kernel at a suitable alignment. */
+static paddr_t __init find_kernel_memory(
+    const struct domain *d, struct elf_binary *elf,
+    const struct elf_dom_parms *parms)
+{
+    paddr_t kernel_size = elf->dest_size;
+    unsigned int i;
+
+    for ( i = 0; i < d->arch.nr_e820; i++ )
+    {
+        paddr_t start = d->arch.e820[i].addr;
+        paddr_t end = d->arch.e820[i].addr + d->arch.e820[i].size;
+        paddr_t kstart, kend;
+
+        if ( d->arch.e820[i].type != E820_RAM ||
+             d->arch.e820[i].size < kernel_size )
+            continue;
+
+        kstart = ROUNDUP(start, parms->phys_align);
+        kstart = max_t(paddr_t, kstart, parms->phys_min);


I'd generally try to avoid max_t(), but I cannot think of any good way
of expressing this without using it.

+        kend = kstart + kernel_size;
+
+        if ( kend > parms->phys_max )


So despite its default phys_max is exclusive aiui. Otherwise with
kend being exclusive too, this would look to be off by one.

Yes, I'll fix the off-by-one. Hmmm, phys_max being 32bit, we want it tobe inclusive to represent the maximum range.

+            return 0;
+
+        if ( kend <= end )
+            return kstart;
+    }
+
+    return 0;
+}
+
+/* Check the kernel load address, and adjust if necessary and possible. */
+static bool __init check_and_adjust_load_address(
+    const struct domain *d, struct elf_binary *elf, struct elf_dom_parms 
*parms)
+{
+    paddr_t reloc_base;
+
+    if ( check_load_address(d, elf) )
+        return true;
+
+    if ( !parms->phys_reloc )
+    {
+        printk("%pd kernel: Address conflict and not relocatable\n", d);
+        return false;


This better wouldn't result in -ENOMEM in the caller. Then again reasons
are logged, so the specific error code perhaos doesn't matter that much.

Failure here is turned into -ENOMEM by pvh_load_kernel(). -ENOMEM isalready returned for later failure with find_memory(), so I thought itwas acceptable. Without this code, elf_load_binary() would failed with-1 and that would be returned. I'll change it to whatever you prefer.

+    }
+
+    reloc_base = find_kernel_memory(d, elf, parms);
+    if ( reloc_base == 0 )


With ! used above please be consistent and do so here, too.


phys_reloc is a bool, and reloc_base is a paddr_t.

+    {
+        printk("%pd kernel: Failed find a load address\n", d);
+        return false;
+    }
+
+    if ( opt_dom0_verbose )
+        printk("%pd kernel: Moving [%p, %p] -> [%"PRIpaddr", %"PRIpaddr"]\n", 
d,
+               elf->dest_base, elf->dest_base + elf->dest_size - 1,
+               reloc_base, reloc_base + elf->dest_size - 1);
+
+    parms->phys_entry = reloc_base +
+                            (parms->phys_entry - (paddr_t)elf->dest_base);
+    elf->dest_base = (char *)reloc_base;


Just as a remark, no request to change anything: We're not dealing with
strings here. Hence char * isn't quite right, just that "dest_base" is
of that type (for whatever reason).


I think the reason is just to be byte addressable.

@@ -225,6 +232,28 @@ elf_errorstatus elf_xen_parse_note(struct elf_binary *elf,
      case XEN_ELFNOTE_PHYS32_ENTRY:
          parms->phys_entry = val;
          break;
+
+    case XEN_ELFNOTE_PHYS32_RELOC:
+        parms->phys_reloc = true;
+
+        if ( descsz >= 4 )
+        {
+            parms->phys_max = elf_note_numeric_array(elf, note, 4, 0);
+            elf_msg(elf, " = max: %#"PRIx32, parms->phys_max);
+        }
+        if ( descsz >= 8 )
+        {
+            parms->phys_min = elf_note_numeric_array(elf, note, 4, 1);
+            elf_msg(elf, " min: %#"PRIx32, parms->phys_min);
+        }
+        if ( descsz >= 12 )
+        {
+            parms->phys_align = elf_note_numeric_array(elf, note, 4, 2);
+            elf_msg(elf, " align: %#"PRIx32, parms->phys_align);
+        }
+
+        elf_msg(elf, "\n");


Imo the newline wants adding outside of the switch(), for everything
set to ELFNOTE_CUSTOM. In fact with that I think ELFNOTE_CUSTOM and
ELFNOTE_NAME could be folded (with what's NAME right now simply
printing nothing in the switch here). Which suggests that I may
better not commit the (slightly adjusted) earlier patch, yet.


Yes, please hold off and I'll re-work both.

@@ -536,6 +565,10 @@ elf_errorstatus elf_xen_parse(struct elf_binary *elf,
      parms->p2m_base = UNSET_ADDR;
      parms->elf_paddr_offset = UNSET_ADDR;
      parms->phys_entry = UNSET_ADDR32;
+    parms->phys_min = 0;
+    parms->phys_max = 0xffffffff;
+    parms->phys_align = 0x200000;


I think this wants to be MB(2) (requiring a pre-patch to centralize MB()
in the tool stack to tools/include/xen-tools/common-macros.h). And I
further think this needs to be an arch-specific constant, even if right
now the note is expected to be present only for x86. Which then also
needs saying ...


Ok, I'll look into this.

--- a/xen/include/public/elfnote.h
+++ b/xen/include/public/elfnote.h
@@ -194,10 +194,26 @@
   */
  #define XEN_ELFNOTE_PHYS32_ENTRY 18

+/*

+ * Physical loading constraints for PVH kernels
+ *
+ * Used to place constraints on the guest physical loading addresses and
+ * alignment for a PVH kernel.
+ *
+ * The presence of this note indicates the kernel supports relocating itself.
+ *
+ * The note may include up to three 32bit values in the following order:
+ *  - a maximum address for the entire image to be loaded below (default
+ *      0xffffffff)
+ *  - a minimum address for the start of the image (default 0)
+ *  - a required start alignment (default 0x200000)
+ */
+#define XEN_ELFNOTE_PHYS32_RELOC 19


... in the comment here. The reasoning behind not (intermediately) falling
back to the alignment in the ELF headers also wants adding to the patch
description (or a code comment, albeit there's no really good place to put
it, imo).

Additionally I think the pieces of the comment want re-ordering. The primary
purpose is indicating relocatability; when not relocating, the constraints
aren't applied in any way.


Yes, this is a good point.

Thanks for the review.

-Jason

Re: [PATCH v3 3/3] x86/PVH: Support relocatable dom0 kernels

Reply via email to