date:20201023

On 22.10.2020 21:18, Elliott Mitchell wrote:
> On Thu, Oct 22, 2020 at 07:44:26PM +0100, Julien Grall wrote:
>> Thank you for the patch. FYI, I tweaked the commit title a bit before
>> committing.
>>
>> The title is now: "xen/arm: acpi: Don't fail it SPCR table is absent".
> 
> Perhaps "xen/arm: acpi: Don't fail on absent SPCR table"?
> 
> What you're suggesting doesn't read well to me.

Perhaps Julien meant "if" instead of "it". i.e. a simple typo?

Jan

Re: [PATCH] xen/arm: ACPI: Remove EXPERT dependancy, default on for ARM64

On 23.10.2020 05:35, Elliott Mitchell wrote:
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -32,13 +32,18 @@ menu "Architecture Features"
>  source "arch/Kconfig"
>  
>  config ACPI
> - bool "ACPI (Advanced Configuration and Power Interface) Support" if EXPERT
> + bool "ACPI (Advanced Configuration and Power Interface) Support"
>   depends on ARM_64
> + default y if ARM_64

The "if" is pointless with the "depends on".

> --- a/xen/arch/arm/acpi/boot.c
> +++ b/xen/arch/arm/acpi/boot.c
> @@ -254,6 +254,15 @@ int __init acpi_boot_table_init(void)
> dt_scan_depth1_nodes, NULL) )
>  goto disable;
>  
> +printk("\n"
> +"*\n"
> +"*WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING*\n"
> +"*   *\n"
> +"* Xen-ARM ACPI support is EXPERIMENTAL.  It is presently (October 2020) *\n"
> +"* recommended you boot your system in device-tree mode if you can.  *\n"
> +"*\n"
> +"\n");
> +

We have an abstraction for such warnings, causing them to appear
later in the boot process and then consistently all in one place
(both increasing, as we believe, the chances of being noticed):
warning_add(). There's a delay that comes with this, so I think
you will also want a command line option that allows silencing
this warning. "acpi=on" or "acpi=force", as available on
x86 and (possibly wrongly right now) not documented as
x86-specific, may be (re-)usable, i.e. to avoid having to
introduce some entirely new option.
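
As a rough illustration of the suggestion above (queue the warning early, print all queued warnings later in one place, and let an explicit command line choice silence it), here is a minimal standalone C sketch. It does not use Xen's warning_add() or its option parsing; queue_warning(), flush_warnings(), acpi_note_experimental() and the treatment of "on"/"force" as an acknowledgement are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_WARNINGS 8

/* Hypothetical stand-ins; none of these names are taken from the Xen tree. */
static const char *pending[MAX_WARNINGS];
static unsigned int nr_pending;

/* Queue a warning instead of printing it right away. */
static void queue_warning(const char *text)
{
    assert(text != NULL);
    if ( nr_pending < MAX_WARNINGS )
        pending[nr_pending++] = text;
}

/* Assumption: an explicit "acpi=on" or "acpi=force" acknowledges the risk. */
static bool acpi_opt_silences_warning(const char *opt)
{
    return opt && (!strcmp(opt, "on") || !strcmp(opt, "force"));
}

/* Called early during boot, roughly where the printk() sits in the patch. */
static void acpi_note_experimental(const char *acpi_opt)
{
    if ( !acpi_opt_silences_warning(acpi_opt) )
        queue_warning("ACPI support is EXPERIMENTAL; "
                      "boot in device-tree mode if you can.");
}

/* Called once, late in boot: emit everything queued, return the count. */
static unsigned int flush_warnings(FILE *out)
{
    unsigned int i, n = nr_pending;

    for ( i = 0; i < n; i++ )
        fprintf(out, "WARNING: %s\n", pending[i]);
    nr_pending = 0;

    return n;
}
```

Calling acpi_note_experimental(NULL) followed by flush_warnings(stdout) prints the warning once; passing "force" instead leaves the queue empty.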

Also a few formal nits: The subject tag should have been [PATCH v2],
there should have been a short revision log outside of the commit
message area, and new patch versions should start their own new
threads rather than being sent in reply to the earlier version's one.

Jan

[PATCH] PCI: drop dead pci_lock_*pdev() declarations

They have no definitions, and hence users, anywhere.

Signed-off-by: Jan Beulich 

--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -155,9 +155,6 @@ bool_t pci_device_detect(u16 seg, u8 bus
 int scan_pci_devices(void);
 enum pdev_type pdev_type(u16 seg, u8 bus, u8 devfn);
 int find_upstream_bridge(u16 seg, u8 *bus, u8 *devfn, u8 *secbus);
-struct pci_dev *pci_lock_pdev(int seg, int bus, int devfn);
-struct pci_dev *pci_lock_domain_pdev(
-struct domain *, int seg, int bus, int devfn);
 
 void setup_hwdom_pci_devices(struct domain *,
 int (*)(u8 devfn, struct pci_dev *));

[PATCH v3 0/7] x86: some assembler macro rework

Parts of this were discussed in the context of Andrew's CET-SS work.
Further parts simply fit the underlying picture. And a few patches
towards the end get attached here simply because of their dependency.
Patch 7 is new.

All patches except for the new ones in principle have acks / R-b-s
which would allow them to go in. However, there is still the controversy
on the naming of the newly introduced header in patch 1 (which
subsequent patches then add to). There hasn't been a name suggestion
which would - imo - truly represent an improvement.

It's also still not really clear to me what - if any - changes to
make to patch 2. As said there I'd be willing to drop some of the
changes made, but not all. Prior discussion hasn't led to a clear
understanding on my part of what's wanted to be kept / dropped. It
could have looked like the entire patch was wanted to go away, but I
don't think I can agree with this. (I could see about moving this to
the end of the series, to unblock what's currently the remainder.)

1: replace __ASM_{CL,ST}AC
2: reduce CET-SS related #ifdef-ary
3: drop ASM_{CL,ST}AC
4: fold indirect_thunk_asm.h into asm-defns.h
5: guard against straight-line speculation past RET
6: limit amount of INT3 in IND_THUNK_*
7: make guarding against straight-line speculation optional

Jan

[PATCH v3 1/7] x86: replace __ASM_{CL,ST}AC

Introduce proper assembler macros instead, enabled only when the
assembler itself doesn't support the insns. To avoid duplicating the
macros for assembly and C files, have them processed into asm-macros.h.
This in turn requires adding a multiple inclusion guard when generating
that header.

No change to generated code.

Signed-off-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 

--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -243,7 +243,10 @@ $(BASEDIR)/include/asm-x86/asm-macros.h:
echo '#if 0' >$@.new
echo '.if 0' >>$@.new
echo '#endif' >>$@.new
+   echo '#ifndef __ASM_MACROS_H__' >>$@.new
+   echo '#define __ASM_MACROS_H__' >>$@.new
echo 'asm ( ".include \"$@\"" );' >>$@.new
+   echo '#endif /* __ASM_MACROS_H__ */' >>$@.new
echo '#if 0' >>$@.new
echo '.endif' >>$@.new
cat $< >>$@.new
--- a/xen/arch/x86/arch.mk
+++ b/xen/arch/x86/arch.mk
@@ -20,6 +20,7 @@ $(call as-option-add,CFLAGS,CC,"rdrand %
 $(call as-option-add,CFLAGS,CC,"rdfsbase %rax",-DHAVE_AS_FSGSBASE)
 $(call as-option-add,CFLAGS,CC,"xsaveopt (%rax)",-DHAVE_AS_XSAVEOPT)
 $(call as-option-add,CFLAGS,CC,"rdseed %eax",-DHAVE_AS_RDSEED)
+$(call as-option-add,CFLAGS,CC,"clac",-DHAVE_AS_CLAC_STAC)
 $(call as-option-add,CFLAGS,CC,"clwb (%rax)",-DHAVE_AS_CLWB)
 $(call as-option-add,CFLAGS,CC,".equ \"x\"$$(comma)1",-DHAVE_AS_QUOTED_SYM)
 $(call as-option-add,CFLAGS,CC,"invpcid (%rax)$$(comma)%rax",-DHAVE_AS_INVPCID)
--- a/xen/arch/x86/asm-macros.c
+++ b/xen/arch/x86/asm-macros.c
@@ -1 +1,2 @@
+#include 
 #include 
--- /dev/null
+++ b/xen/include/asm-x86/asm-defns.h
@@ -0,0 +1,9 @@
+#ifndef HAVE_AS_CLAC_STAC
+.macro clac
+.byte 0x0f, 0x01, 0xca
+.endm
+
+.macro stac
+.byte 0x0f, 0x01, 0xcb
+.endm
+#endif
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -13,10 +13,12 @@
 #include 
 
 #ifdef __ASSEMBLY__
+#include 
 #ifndef CONFIG_INDIRECT_THUNK
 .equ CONFIG_INDIRECT_THUNK, 0
 #endif
 #else
+#include 
 asm ( "\t.equ CONFIG_INDIRECT_THUNK, "
   __stringify(IS_ENABLED(CONFIG_INDIRECT_THUNK)) );
 #endif
@@ -200,34 +202,27 @@ register unsigned long current_stack_poi
 
 #endif
 
-/* "Raw" instruction opcodes */
-#define __ASM_CLAC  ".byte 0x0f,0x01,0xca"
-#define __ASM_STAC  ".byte 0x0f,0x01,0xcb"
-
 #ifdef __ASSEMBLY__
 .macro ASM_STAC
-ALTERNATIVE "", __ASM_STAC, X86_FEATURE_XEN_SMAP
+ALTERNATIVE "", stac, X86_FEATURE_XEN_SMAP
 .endm
 .macro ASM_CLAC
-ALTERNATIVE "", __ASM_CLAC, X86_FEATURE_XEN_SMAP
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 .endm
 #else
 static always_inline void clac(void)
 {
 /* Note: a barrier is implicit in alternative() */
-alternative("", __ASM_CLAC, X86_FEATURE_XEN_SMAP);
+alternative("", "clac", X86_FEATURE_XEN_SMAP);
 }
 
 static always_inline void stac(void)
 {
 /* Note: a barrier is implicit in alternative() */
-alternative("", __ASM_STAC, X86_FEATURE_XEN_SMAP);
+alternative("", "stac", X86_FEATURE_XEN_SMAP);
 }
 #endif
 
-#undef __ASM_STAC
-#undef __ASM_CLAC
-
 #ifdef __ASSEMBLY__
 .macro SAVE_ALL op, compat=0
 .ifeqs "\op", "CLAC"

[PATCH v3 3/7] x86: drop ASM_{CL,ST}AC

Use ALTERNATIVE directly, such that at the use sites it is visible that
alternative code patching is in use. Similarly avoid hiding the fact in
SAVE_ALL.

No change to generated code.

Signed-off-by: Jan Beulich 
Reviewed-by: Andrew Cooper 
---
v2: Further adjust comment in asm_domain_crash_synchronous().

--- a/xen/arch/x86/traps.c
+++ b/xen/arch/x86/traps.c
@@ -2165,9 +2165,8 @@ void activate_debugregs(const struct vcp
 void asm_domain_crash_synchronous(unsigned long addr)
 {
 /*
- * We need clear AC bit here because in entry.S AC is set
- * by ASM_STAC to temporarily allow accesses to user pages
- * which is prevented by SMAP by default.
+ * We need to clear the AC bit here because the exception fixup logic
+ * may leave user accesses enabled.
  *
  * For some code paths, where this function is called, clac()
  * is not needed, but adding clac() here instead of each place
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -12,7 +12,7 @@
 #include 
 
 ENTRY(entry_int82)
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 pushq $0
 movl  $HYPERCALL_VECTOR, 4(%rsp)
 SAVE_ALL compat=1 /* DPL1 gate, restricted to 32bit PV guests only. */
@@ -284,7 +284,7 @@ ENTRY(compat_int80_direct_trap)
 compat_create_bounce_frame:
 ASSERT_INTERRUPTS_ENABLED
 mov   %fs,%edi
-ASM_STAC
+ALTERNATIVE "", stac, X86_FEATURE_XEN_SMAP
 testb $2,UREGS_cs+8(%rsp)
 jz    1f
 /* Push new frame at registered guest-OS stack base. */
@@ -331,7 +331,7 @@ compat_create_bounce_frame:
 movl  TRAPBOUNCE_error_code(%rdx),%eax
 .Lft8:  movl  %eax,%fs:(%rsi)   # ERROR CODE
 1:
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 /* Rewrite our stack frame and return to guest-OS mode. */
 /* IA32 Ref. Vol. 3: TF, VM, RF and NT flags are cleared on trap. */
 andl  $~(X86_EFLAGS_VM|X86_EFLAGS_RF|\
@@ -377,7 +377,7 @@ compat_crash_page_fault_4:
 addl  $4,%esi
 compat_crash_page_fault:
 .Lft14: mov   %edi,%fs
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 movl  %esi,%edi
 call  show_page_walk
 jmp   dom_crash_sync_extable
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -276,7 +276,7 @@ ENTRY(sysenter_entry)
 pushq $0
 pushfq
 GLOBAL(sysenter_eflags_saved)
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 pushq $3 /* ring 3 null cs */
 pushq $0 /* null rip */
 pushq $0
@@ -329,7 +329,7 @@ UNLIKELY_END(sysenter_gpf)
 jmp   .Lbounce_exception
 
 ENTRY(int80_direct_trap)
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 pushq $0
 movl  $0x80, 4(%rsp)
 SAVE_ALL
@@ -448,7 +448,7 @@ __UNLIKELY_END(create_bounce_frame_bad_s
 
 subq  $7*8,%rsi
 movq  UREGS_ss+8(%rsp),%rax
-ASM_STAC
+ALTERNATIVE "", stac, X86_FEATURE_XEN_SMAP
 movq  VCPU_domain(%rbx),%rdi
 STORE_GUEST_STACK(rax,6)# SS
 movq  UREGS_rsp+8(%rsp),%rax
@@ -486,7 +486,7 @@ __UNLIKELY_END(create_bounce_frame_bad_s
 STORE_GUEST_STACK(rax,1)# R11
 movq  UREGS_rcx+8(%rsp),%rax
 STORE_GUEST_STACK(rax,0)# RCX
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 
 #undef STORE_GUEST_STACK
 
@@ -528,11 +528,11 @@ domain_crash_page_fault_2x8:
 domain_crash_page_fault_1x8:
 addq  $8,%rsi
 domain_crash_page_fault_0x8:
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 movq  %rsi,%rdi
 call  show_page_walk
 ENTRY(dom_crash_sync_extable)
-ASM_CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
 # Get out of the guest-save area of the stack.
 GET_STACK_END(ax)
 leaq  STACK_CPUINFO_FIELD(guest_cpu_user_regs)(%rax),%rsp
@@ -590,7 +590,8 @@ UNLIKELY_END(exit_cr3)
 iretq
 
 ENTRY(common_interrupt)
-SAVE_ALL CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
+SAVE_ALL
 
 GET_STACK_END(14)
 
@@ -622,7 +623,8 @@ ENTRY(page_fault)
 movl  $TRAP_page_fault,4(%rsp)
 /* No special register assumptions. */
 GLOBAL(handle_exception)
-SAVE_ALL CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
+SAVE_ALL
 
 GET_STACK_END(14)
 
@@ -827,7 +829,8 @@ ENTRY(entry_CP)
 ENTRY(double_fault)
 movl  $TRAP_double_fault,4(%rsp)
 /* Set AC to reduce chance of further SMAP faults */
-SAVE_ALL STAC
+ALTERNATIVE "", stac, X86_FEATURE_XEN_SMAP
+SAVE_ALL
 
 GET_STACK_END(14)
 
@@ -860,7 +863,8 @@ ENTRY(nmi)
 pushq $0
 movl  $TRAP_nmi,4(%rsp)
 handle_ist_exception:
-SAVE_ALL CLAC
+ALTERNATIVE "", clac, X86_FEATURE_XEN_SMAP
+SAVE_ALL
 
 GET_STACK_END(14)
 
--- a/xen/in

[PATCH v3 2/7] x86: reduce CET-SS related #ifdef-ary

Commit b586a81b7a90 ("x86/CET: Fix build following c/s 43b98e7190") had
to introduce a number of #ifdef-s to make the build work with older tool
chains. Introduce an assembler macro covering for tool chains not
knowing of CET-SS, allowing some conditionals where just SETSSBSY is the
problem to be dropped again.

No change to generated code.

Signed-off-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 
---
Now that I've done this I'm no longer sure which direction is better to
follow: On one hand this introduces dead code (even if just NOPs) into
CET-SS-disabled builds. Otoh this is a step towards breaking the tool
chain version dependency of the feature.

I've also dropped conditionals around bigger chunks of code; while I
think that's preferable, I'm open to undoing those parts.

--- a/xen/arch/x86/boot/x86_64.S
+++ b/xen/arch/x86/boot/x86_64.S
@@ -31,7 +31,6 @@ ENTRY(__high_start)
 jz  .L_bsp
 
 /* APs.  Set up shadow stacks before entering C. */
-#ifdef CONFIG_XEN_SHSTK
 testl   $cpufeat_mask(X86_FEATURE_XEN_SHSTK), \
 CPUINFO_FEATURE_OFFSET(X86_FEATURE_XEN_SHSTK) + boot_cpu_data(%rip)
 je  .L_ap_shstk_done
@@ -55,7 +54,6 @@ ENTRY(__high_start)
 mov $XEN_MINIMAL_CR4 | X86_CR4_CET, %ecx
 mov %rcx, %cr4
 setssbsy
-#endif
 
 .L_ap_shstk_done:
 call    start_secondary
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -668,7 +668,7 @@ static void __init noreturn reinit_bsp_s
 stack_base[0] = stack;
 memguard_guard_stack(stack);
 
-if ( IS_ENABLED(CONFIG_XEN_SHSTK) && cpu_has_xen_shstk )
+if ( cpu_has_xen_shstk )
 {
 wrmsrl(MSR_PL0_SSP,
 (unsigned long)stack + (PRIMARY_SHSTK_SLOT + 1) * PAGE_SIZE - 8);
--- a/xen/arch/x86/x86_64/compat/entry.S
+++ b/xen/arch/x86/x86_64/compat/entry.S
@@ -197,9 +197,7 @@ ENTRY(cr4_pv32_restore)
 
 /* See lstar_enter for entry register state. */
 ENTRY(cstar_enter)
-#ifdef CONFIG_XEN_SHSTK
 ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
-#endif
 /* sti could live here when we don't switch page tables below. */
 CR4_PV32_RESTORE
 movq  8(%rsp),%rax /* Restore %rax. */
--- a/xen/arch/x86/x86_64/entry.S
+++ b/xen/arch/x86/x86_64/entry.S
@@ -236,9 +236,7 @@ iret_exit_to_guest:
  * %ss must be saved into the space left by the trampoline.
  */
 ENTRY(lstar_enter)
-#ifdef CONFIG_XEN_SHSTK
 ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
-#endif
 /* sti could live here when we don't switch page tables below. */
 movq  8(%rsp),%rax /* Restore %rax. */
 movq  $FLAT_KERNEL_SS,8(%rsp)
@@ -272,9 +270,7 @@ ENTRY(lstar_enter)
 jmp   test_all_events
 
 ENTRY(sysenter_entry)
-#ifdef CONFIG_XEN_SHSTK
 ALTERNATIVE "", "setssbsy", X86_FEATURE_XEN_SHSTK
-#endif
 /* sti could live here when we don't switch page tables below. */
 pushq $FLAT_USER_SS
 pushq $0
--- a/xen/include/asm-x86/asm-defns.h
+++ b/xen/include/asm-x86/asm-defns.h
@@ -7,3 +7,9 @@
 .byte 0x0f, 0x01, 0xcb
 .endm
 #endif
+
+#ifndef CONFIG_HAS_AS_CET_SS
+.macro setssbsy
+.byte 0xf3, 0x0f, 0x01, 0xe8
+.endm
+#endif

[PATCH v3 4/7] x86: fold indirect_thunk_asm.h into asm-defns.h

There's little point in having two separate headers both getting
included by asm_defns.h. This in particular reduces the number of
instances of guarding asm(".include ...") suitably in such dual use
headers.

No change to generated code.

Signed-off-by: Jan Beulich 
Reviewed-by: Roger Pau Monné 

--- a/xen/Makefile
+++ b/xen/Makefile
@@ -139,7 +139,7 @@ ifeq ($(TARGET_ARCH),x86)
 t1 = $(call as-insn,$(CC),".L0: .L1: .skip (.L1 - .L0)",,-no-integrated-as)
 
 # Check whether clang asm()-s support .include.
-t2 = $(call as-insn,$(CC) -I$(BASEDIR)/include,".include \"asm-x86/indirect_thunk_asm.h\"",,-no-integrated-as)
+t2 = $(call as-insn,$(CC) -I$(BASEDIR)/include,".include \"asm-x86/asm-defns.h\"",,-no-integrated-as)
 
 # Check whether clang keeps .macro-s between asm()-s:
 # https://bugs.llvm.org/show_bug.cgi?id=36110
--- a/xen/include/asm-x86/asm-defns.h
+++ b/xen/include/asm-x86/asm-defns.h
@@ -13,3 +13,40 @@
 .byte 0xf3, 0x0f, 0x01, 0xe8
 .endm
 #endif
+
+.macro INDIRECT_BRANCH insn:req arg:req
+/*
+ * Create an indirect branch.  insn is one of call/jmp, arg is a single
+ * register.
+ *
+ * With no compiler support, this degrades into a plain indirect call/jmp.
+ * With compiler support, dispatch to the correct __x86_indirect_thunk_*
+ */
+.if CONFIG_INDIRECT_THUNK == 1
+
+$done = 0
+.irp reg, ax, cx, dx, bx, bp, si, di, 8, 9, 10, 11, 12, 13, 14, 15
+.ifeqs "\arg", "%r\reg"
+\insn __x86_indirect_thunk_r\reg
+$done = 1
+   .exitm
+.endif
+.endr
+
+.if $done != 1
+.error "Bad register arg \arg"
+.endif
+
+.else
+\insn *\arg
+.endif
+.endm
+
+/* Convenience wrappers. */
+.macro INDIRECT_CALL arg:req
+INDIRECT_BRANCH call \arg
+.endm
+
+.macro INDIRECT_JMP arg:req
+INDIRECT_BRANCH jmp \arg
+.endm
--- a/xen/include/asm-x86/asm_defns.h
+++ b/xen/include/asm-x86/asm_defns.h
@@ -22,7 +22,6 @@
 asm ( "\t.equ CONFIG_INDIRECT_THUNK, "
   __stringify(IS_ENABLED(CONFIG_INDIRECT_THUNK)) );
 #endif
-#include 
 
 #ifndef __ASSEMBLY__
 void ret_from_intr(void);
--- a/xen/include/asm-x86/indirect_thunk_asm.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/*
- * Trickery to allow this header to be included at the C level, to permit
- * proper dependency tracking in .*.o.d files, while still having it contain
- * assembler only macros.
- */
-#ifndef __ASSEMBLY__
-# if 0
-  .if 0
-# endif
-asm ( "\t.include \"asm/indirect_thunk_asm.h\"" );
-# if 0
-  .endif
-# endif
-#else
-
-.macro INDIRECT_BRANCH insn:req arg:req
-/*
- * Create an indirect branch.  insn is one of call/jmp, arg is a single
- * register.
- *
- * With no compiler support, this degrades into a plain indirect call/jmp.
- * With compiler support, dispatch to the correct __x86_indirect_thunk_*
- */
-.if CONFIG_INDIRECT_THUNK == 1
-
-$done = 0
-.irp reg, ax, cx, dx, bx, bp, si, di, 8, 9, 10, 11, 12, 13, 14, 15
-.ifeqs "\arg", "%r\reg"
-\insn __x86_indirect_thunk_r\reg
-$done = 1
-   .exitm
-.endif
-.endr
-
-.if $done != 1
-.error "Bad register arg \arg"
-.endif
-
-.else
-\insn *\arg
-.endif
-.endm
-
-/* Convenience wrappers. */
-.macro INDIRECT_CALL arg:req
-INDIRECT_BRANCH call \arg
-.endm
-
-.macro INDIRECT_JMP arg:req
-INDIRECT_BRANCH jmp \arg
-.endm
-
-#endif

[PATCH v3 5/7] x86: guard against straight-line speculation past RET

Under certain conditions CPUs can speculate into the instruction stream
past a RET instruction. Guard against this just like 3b7dab93f240
("x86/spec-ctrl: Protect against CALL/JMP straight-line speculation")
did - by inserting an "INT $3" insn. It's merely the mechanics of how to
achieve this that differ: A set of macros gets introduced to post-
process RET insns issued by the compiler (or living in assembly files).
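
To make the resulting byte stream concrete, the following standalone C sketch writes out what a guarded return looks like: 0xc3 for a plain RET or 0xc2 plus a little-endian 16-bit immediate for RET imm16, followed by the single-byte INT3 opcode 0xcc. The helper emit_guarded_ret() is hypothetical and only models the encoding; it is not how the assembler macros in this patch are implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Write the bytes for a near RET followed by INT3 into buf.
 * has_imm selects the "RET imm16" form.  Returns the number of bytes
 * written.  emit_guarded_ret() is a hypothetical helper, not part of Xen.
 */
static size_t emit_guarded_ret(uint8_t *buf, size_t len,
                               int has_imm, uint16_t imm)
{
    size_t n = 0;

    assert(buf != NULL && len >= 4);

    if ( !has_imm )
        buf[n++] = 0xc3;            /* RET */
    else
    {
        buf[n++] = 0xc2;            /* RET imm16 */
        buf[n++] = imm & 0xff;      /* immediate, low byte first */
        buf[n++] = imm >> 8;
    }
    buf[n++] = 0xcc;                /* INT3 halts straight-line speculation */

    return n;
}
```

A plain return therefore occupies two bytes (c3 cc), and "ret $8" occupies four (c2 08 00 cc).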

Unfortunately for clang this requires further features their built-in
assembler doesn't support: We need to be able to override insn mnemonics
produced by the compiler (which may be impossible, if internally
assembly mnemonics never get generated), and we want to use \(text)
escaping / quoting in the auxiliary macro.

Signed-off-by: Jan Beulich 
Acked-by: Roger Pau Monné 
---
TBD: Would be nice to avoid the additions in .init.text, but a query to
 the binutils folks regarding the ability to identify the section
 stuff is in (by Peter Zijlstra over a year ago:
 https://sourceware.org/pipermail/binutils/2019-July/107528.html)
 has been left without helpful replies.
---
v3: Use .byte 0xc[23] instead of the nested macros.
v2: Fix build with newer clang. Use int3 mnemonic. Also override retq.

--- a/xen/Makefile
+++ b/xen/Makefile
@@ -145,7 +145,15 @@ t2 = $(call as-insn,$(CC) -I$(BASEDIR)/i
 # https://bugs.llvm.org/show_bug.cgi?id=36110
 t3 = $(call as-insn,$(CC),".macro FOO;.endm"$(close); asm volatile $(open)".macro FOO;.endm",-no-integrated-as)
 
-CLANG_FLAGS += $(call or,$(t1),$(t2),$(t3))
+# Check whether \(text) escaping in macro bodies is supported.
+t4 = $(call as-insn,$(CC),".macro m ret:req; \\(ret) $$\\ret; .endm; m 8",,-no-integrated-as)
+
+# Check whether macros can override insn mnemonics in inline assembly.
+t5 = $(call as-insn,$(CC),".macro ret; .error; .endm; .macro retq; .error; .endm",-no-integrated-as)
+
+acc1 := $(call or,$(t1),$(t2),$(t3),$(t4))
+
+CLANG_FLAGS += $(call or,$(acc1),$(t5))
 endif
 
 CLANG_FLAGS += -Werror=unknown-warning-option
--- a/xen/include/asm-x86/asm-defns.h
+++ b/xen/include/asm-x86/asm-defns.h
@@ -50,3 +50,19 @@
 .macro INDIRECT_JMP arg:req
 INDIRECT_BRANCH jmp \arg
 .endm
+
+/*
+ * To guard against speculation past RET, insert a breakpoint insn
+ * immediately after them.
+ */
+.macro ret operand:vararg
+retq \operand
+.endm
+.macro retq operand:vararg
+.ifb \operand
+.byte 0xc3
+.else
+.byte 0xc2
+.word \operand
+.endif
+.endm

[PATCH v3 6/7] x86: limit amount of INT3 in IND_THUNK_*

There's no point in having every replacement variant also specify the
INT3 - just have it once in the base macro. When patching, NOPs will get
inserted, which are fine to speculate through (until reaching the INT3).
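
A simplified model of that layout, as a standalone C sketch: the chosen replacement is copied into a fixed-size slot, the remainder is padded with NOPs, and one INT3 shared by all variants follows the slot. patch_thunk() and the single-byte NOP padding are assumptions made for illustration; the real alternatives patching logic is more involved and may pick different NOP encodings.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NOP  0x90
#define INT3 0xcc

/*
 * Fill a thunk slot: copy the chosen replacement, pad the rest with
 * single-byte NOPs, then place one INT3 shared by all variants.
 * slot must hold slot_len + 1 bytes.  patch_thunk() is hypothetical.
 */
static void patch_thunk(uint8_t *slot, size_t slot_len,
                        const uint8_t *repl, size_t repl_len)
{
    assert(slot != NULL && repl != NULL);
    assert(repl_len <= slot_len);

    memcpy(slot, repl, repl_len);
    memset(slot + repl_len, NOP, slot_len - repl_len);
    slot[slot_len] = INT3;
}
```

With a five-byte slot, patching in "jmp *%rax" (ff e0) yields ff e0 90 90 90 cc: speculation running past the JMP only meets NOPs before it reaches the INT3.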

Signed-off-by: Jan Beulich 
Acked-by: Roger Pau Monné 
---
I also wonder whether the LFENCE in IND_THUNK_RETPOLINE couldn't be
replaced by INT3 as well. Of course the effect will be marginal, as the
size of the thunk will still be 16 bytes when including tail padding
resulting from alignment.
---
v3: Add comment.
v2: New.

--- a/xen/arch/x86/indirect-thunk.S
+++ b/xen/arch/x86/indirect-thunk.S
@@ -11,6 +11,9 @@
 
 #include 
 
+/* Don't transform the "ret" further down. */
+.purgem ret
+
 .macro IND_THUNK_RETPOLINE reg:req
 call 2f
 1:
@@ -24,12 +27,10 @@
 .macro IND_THUNK_LFENCE reg:req
 lfence
 jmp *%\reg
-int3 /* Halt straight-line speculation */
 .endm
 
 .macro IND_THUNK_JMP reg:req
 jmp *%\reg
-int3 /* Halt straight-line speculation */
 .endm
 
 /*
@@ -44,6 +45,8 @@ ENTRY(__x86_indirect_thunk_\reg)
 __stringify(IND_THUNK_LFENCE \reg), X86_FEATURE_IND_THUNK_LFENCE, \
 __stringify(IND_THUNK_JMP \reg),X86_FEATURE_IND_THUNK_JMP
 
+int3 /* Halt straight-line speculation */
+
 .size __x86_indirect_thunk_\reg, . - __x86_indirect_thunk_\reg
 .type __x86_indirect_thunk_\reg, @function
 .endm

[PATCH v3 7/7] x86: make guarding against straight-line speculation optional

Put insertion of INT3 behind CONFIG_SPECULATIVE_HARDEN_BRANCH
conditionals.

Signed-off-by: Jan Beulich 
---
v3: New.

--- a/xen/arch/x86/indirect-thunk.S
+++ b/xen/arch/x86/indirect-thunk.S
@@ -11,8 +11,10 @@
 
 #include 
 
+#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
 /* Don't transform the "ret" further down. */
 .purgem ret
+#endif
 
 .macro IND_THUNK_RETPOLINE reg:req
 call 2f
@@ -45,7 +47,9 @@ ENTRY(__x86_indirect_thunk_\reg)
 __stringify(IND_THUNK_LFENCE \reg), X86_FEATURE_IND_THUNK_LFENCE, \
 __stringify(IND_THUNK_JMP \reg),X86_FEATURE_IND_THUNK_JMP
 
+#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
 int3 /* Halt straight-line speculation */
+#endif
 
 .size __x86_indirect_thunk_\reg, . - __x86_indirect_thunk_\reg
 .type __x86_indirect_thunk_\reg, @function
--- a/xen/include/asm-x86/asm-defns.h
+++ b/xen/include/asm-x86/asm-defns.h
@@ -51,6 +51,8 @@
 INDIRECT_BRANCH jmp \arg
 .endm
 
+#ifdef CONFIG_SPECULATIVE_HARDEN_BRANCH
+
 /*
  * To guard against speculation past RET, insert a breakpoint insn
  * immediately after them.
@@ -66,3 +68,5 @@
 .word \operand
 .endif
 .endm
+
+#endif

Re: [PATCH] xen/arm: Remove EXPERT dependancy


Hi Stefano,

On 22/10/2020 22:17, Stefano Stabellini wrote:

On Thu, 22 Oct 2020, Julien Grall wrote:

On 22/10/2020 02:43, Elliott Mitchell wrote:

Linux requires UEFI support to be enabled on ARM64 devices.  While many
ARM64 devices lack ACPI, the writing seems to be on the wall of UEFI/ACPI
potentially taking over.  Some common devices may need ACPI table
support.

Presently I think it is worth removing the dependency on CONFIG_EXPERT.


The idea behind EXPERT is to gate any feature that is not considered to be
stable/complete enough to be used in production.


Yes, and from that point of view I don't think we want to remove EXPERT
from ACPI yet. However, the idea of hiding things behind EXPERT works
very well for new esoteric features, something like memory introspection
or memory overcommit.


Memaccess is not very new ;).


It does not work well for things that are actually
required to boot on the platform.


I am not sure where the problem is. It is easy to select EXPERT from
menuconfig. It also hints to the user that the feature may not fully work.




Typically ACPI systems don't come with device tree at all (RPi4 being an
exception), so users don't really have much of a choice in the matter.


And they typically have IOMMUs.



 From that point of view, it would be better to remove EXPERT from ACPI,
maybe even build ACPI by default, *but* to add a warning at boot saying
something like:

"ACPI support is experimental. Boot using Device Tree if you can."


That would better convey the risks of using ACPI, while at the same time
making it a bit easier for users to boot on their ACPI-only platforms.


Right, I agree that this makes it easier for users to boot Xen on ACPI-only
platforms. However, based on the above, it is easy enough for a developer to
rebuild Xen with ACPI and EXPERT enabled.


So what sort of users are you targeting?

I am sort of okay with removing EXPERT. But I think building ACPI by
default is still wrong because our default .config is meant to be
(security) supported. I don't think ACPI can earn this qualification today.


In order to remove EXPERT, there are a few things that need to be done
(or checked):

1) SUPPORT.MD has a statement about ACPI on Arm
2) DT is favored over ACPI if the two firmware tables are present.





I don't consider the ACPI complete because the parsing of the IORT (used to
discover SMMU and GICv3 ITS) is not there yet.

I vaguely remember some issues on systems using an SMMU (e.g. Thunder-X) because
Dom0 will try to use the IOMMU and this would break PV drivers.


I am not sure why Dom0 using the IOMMU would break PV drivers? Is it
because the pagetable is not properly updated when mapping foreign
pages?


IIRC, yes. This would need to be tested again.

Cheers,

--
Julien Grall

[qemu-mainline test] 156115: regressions - FAIL

flight 156115 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156115/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm               6 xen-build                fail REGR. vs. 152631
 build-amd64                   6 xen-build                fail REGR. vs. 152631
 build-arm64                   6 xen-build                fail REGR. vs. 152631
 build-arm64-xsm               6 xen-build                fail REGR. vs. 152631
 build-i386                    6 xen-build                fail REGR. vs. 152631
 build-i386-xsm                6 xen-build                fail REGR. vs. 152631
 build-armhf                   6 xen-build                fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 build-i386-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-amd64-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim  1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-i386-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-qemuu-rhel6hvm-inte

Re: [PATCH] xen/acpi: Don't fail if SPCR table is absent


Hi Elliott,

On 22/10/2020 20:18, Elliott Mitchell wrote:

On Thu, Oct 22, 2020 at 07:38:26PM +0100, Julien Grall wrote:
On Thu, Oct 22, 2020 at 07:44:26PM +0100, Julien Grall wrote:

Thank you for the patch. FYI, I tweaked the commit title a bit before
committing.

The title is now: "xen/arm: acpi: Don't fail it SPCR table is absent".


Perhaps "xen/arm: acpi: Don't fail on absent SPCR table"?

What you're suggesting doesn't read well to me.


Sorry, I made a typo when writing the title in the e-mail. Here is a direct 
copy from the commit:


"xen/arm: acpi: Don't fail if SPCR table is absent"

This is pretty much your original title with "arm: " added to clarify 
the subsystem modified.


Cheers,

--
Julien Grall

Re: [PATCH v2 0/3] Add Xen CpusAccel

2020-10-23 Thread Paolo Bonzini

On 23/10/20 09:09, Thomas Huth wrote:
> On 22/10/2020 17.29, Paolo Bonzini wrote:
>> On 22/10/20 17:17, Jason Andryuk wrote:
>>> On Tue, Oct 13, 2020 at 1:16 PM Paolo Bonzini  wrote:

 On 13/10/20 16:05, Jason Andryuk wrote:
> Xen was left behind when CpusAccel became mandatory and fails the assert
> in qemu_init_vcpu().  It relied on the same dummy cpu threads as qtest.
> Move the qtest cpu functions to a common location and reuse them for
> Xen.
>
> v2:
>   New patch "accel: Remove _WIN32 ifdef from qtest-cpus.c"
>   Use accel/dummy-cpus.c for filename
>   Put prototype in include/sysemu/cpus.h
>
> Jason Andryuk (3):
>   accel: Remove _WIN32 ifdef from qtest-cpus.c
>   accel: move qtest CpusAccel functions to a common location
>   accel: Add xen CpusAccel using dummy-cpus
>
>  accel/{qtest/qtest-cpus.c => dummy-cpus.c} | 27 --
>  accel/meson.build  |  8 +++
>  accel/qtest/meson.build|  1 -
>  accel/qtest/qtest-cpus.h   | 17 --
>  accel/qtest/qtest.c|  5 +++-
>  accel/xen/xen-all.c|  8 +++
>  include/sysemu/cpus.h  |  3 +++
>  7 files changed, 27 insertions(+), 42 deletions(-)
>  rename accel/{qtest/qtest-cpus.c => dummy-cpus.c} (71%)
>  delete mode 100644 accel/qtest/qtest-cpus.h
>

 Acked-by: Paolo Bonzini 
>>>
>>> Thank you, Paolo.  Also Anthony Acked and Claudio Reviewed patch 3.
>>> How can we get this into the tree?
>>
>> I think Anthony should send a pull request?
> 
> Since Anthony acked patch 3, I think I can also take it through the qtest 
> tree.

No objections, thanks!

Paolo

Re: Xen Coding style and clang-format

2020-10-23 Thread Anastasiia Lukianenko

Hi all,

On Tue, 2020-10-20 at 18:13 +0100, Julien Grall wrote:
> Hi,
> 
> On 19/10/2020 19:07, Stefano Stabellini wrote:
> > On Fri, 16 Oct 2020, Artem Mygaiev wrote:
> > > > -----Original Message-----
> > > > From: Julien Grall 
> > > > Sent: Friday, October 16, 2020 13:24
> > > To: Anastasiia Lukianenko ; 
> > > jbeul...@suse.com; george.dun...@citrix.com
> > > Cc: Artem Mygaiev ; vicooo...@gmail.com; 
> > > xen-devel@lists.xenproject.org; committ...@xenproject.org; 
> > > viktor.mitin...@gmail.com; Volodymyr Babchuk <
> > > volodymyr_babc...@epam.com>
> > > Subject: Re: Xen Coding style and clang-format
> > > 
> > > > Hi,
> > > > 
> > > > On 16/10/2020 10:42, Anastasiia Lukianenko wrote:
> > > > > > Thanks for your advice, which helped me improve the checker. I
> > > > > > understand that there are still some disagreements about the
> > > > > > formatting, but as I said before, the checker cannot be very
> > > > > > flexible and take into account all the author's ideas.
> > > > 
> > > > > I am not sure what you refer to by "author's ideas" here. The
> > > > > checker should follow a coding style (Xen or a modified version):
> > > > >  - Anything not following the coding style should be considered as
> > > > >    invalid.
> > > > >  - Anything not written in the coding style should be left
> > > > >    untouched/uncommented by the checker.
> > > > 
> > > 
> > > Agree
> > > 
> > > > > > I suggest using the checker not as a mandatory check, but as an
> > > > > > indication to the author of possible formatting errors that he
> > > > > > can correct or ignore.
> > > > 
> > > > > I can understand that short term we would want to make it
> > > > > optional so either the coding style or the checker can be tuned.
> > > > > But I don't think this is an ideal situation to be in long term.
> > > > >
> > > > > The goal of the checker is to automatically verify the coding
> > > > > style and get it consistent across Xen. If we make it optional or
> > > > > it is "unreliable", then we lose the two benefits and possibly
> > > > > increase the contributor frustration as the checker would say A
> > > > > but we need B.
> > > > >
> > > > > Therefore, we need to make sure the checker and the coding style
> > > > > match. I don't have any opinions on the approach to achieve that.
> > > 
> > > > Of the list of remaining issues from Anastasiia, looks like only
> > > > items 5 and 6 conform to the official Xen coding style. As for the
> > > > remaining ones, I would like to suggest disabling those that are
> > > > controversial (items 1, 2, 4, 8, 9, 10). Maybe we want to have
> > > > further discussion on refining coding style, we can use these as
> > > > starting point. If we are open to extending style now, I would
> > > > suggest to add rules that seem to be meaningful (items 3, 7) and
> > > > keep them in checker.
> > 
> > Good approach. Yes, I would like to keep 3, 7 in the checker.
> > 
> > > I would also keep 8 and add a small note to the coding style to say
> > > that comments should be aligned where possible.
> 
> +1 for this. Although, I don't mind the coding style used as long as
> we have a checker and the code is consistent :).
> 
> Cheers,
> 
Thank you for the advice :)
Now I'm trying to figure out which option needs to be corrected for
the checker to work correctly. The problem case is wrapping an
assignment onto a new line when the line is longer than the allowed
length:
-status = acpi_get_table(ACPI_SIG_SPCR, 0,
-(struct acpi_table_header **)&spcr);
+status =
+acpi_get_table(ACPI_SIG_SPCR, 0, (struct acpi_table_header **)&spcr);
As it turned out, this case is quite rare, and the rule for wrapping
parameters works correctly in other cases:
-status = acpi_get_table(ACPI_SIG_SPCR, 0, &spcr, ACPI_SIG_SPC, 0, ACPI_SIG_SP, 0);
+status = acpi_get_table(ACPI_SIG_SPCR, 0, &spcr, ACPI_SIG_SPC, 0,
+ACPI_SIG_SP, 0);
Thus the checker does not work correctly in the case when a parameter
starts with a parenthesis. I'm going to ask the clang community
whether this behavior is expected or whether it's a bug.

Regards,
Anastasiia

Re: [PATCH v2 10/14] kernel-doc: public/vcpu.h

On 21.10.2020 02:00, Stefano Stabellini wrote:
> @@ -140,38 +173,74 @@ struct vcpu_register_runstate_memory_area {
>  typedef struct vcpu_register_runstate_memory_area vcpu_register_runstate_memory_area_t;
>  DEFINE_XEN_GUEST_HANDLE(vcpu_register_runstate_memory_area_t);
>  
> -/*
> - * Set or stop a VCPU's periodic timer. Every VCPU has one periodic timer
> - * which can be set via these commands. Periods smaller than one millisecond
> - * may not be supported.
> +/**
> + * DOC: VCPUOP_set_periodic_timer
> + *
> + * Set a VCPU's periodic timer. Every VCPU has one periodic timer which
> + * can be set via this command. Periods smaller than one millisecond may
> + * not be supported.
> + *
> + * @arg == vcpu_set_periodic_timer_t
> + */
> +#define VCPUOP_set_periodic_timer    6
> +/**
> + * DOC: VCPUOP_stop_periodic_timer
> + *
> + * Stop a VCPU's periodic timer.
> + *
> + * @arg == NULL
> + */
> +#define VCPUOP_stop_periodic_timer   7
> +/**
> + * struct vcpu_set_periodic_timer
>   */
> -#define VCPUOP_set_periodic_timer    6 /* arg == vcpu_set_periodic_timer_t */
> -#define VCPUOP_stop_periodic_timer   7 /* arg == NULL */
>  struct vcpu_set_periodic_timer {
>  uint64_t period_ns;
>  };
>  typedef struct vcpu_set_periodic_timer vcpu_set_periodic_timer_t;
>  DEFINE_XEN_GUEST_HANDLE(vcpu_set_periodic_timer_t);
>  
> -/*
> - * Set or stop a VCPU's single-shot timer. Every VCPU has one single-shot
> - * timer which can be set via these commands.
> +/**
> + * DOC: VCPUOP_set_singleshot_timer
> + *
> + * Set a VCPU's single-shot timer. Every VCPU has one single-shot timer
> + * which can be set via this command.
> + *
> + * @arg == vcpu_set_singleshot_timer_t
> + */
> +#define VCPUOP_set_singleshot_timer  8
> +/**
> + * DOC: VCPUOP_stop_singleshot_timer
> + *
> + * Stop a VCPU's single-shot timer.
> + *
> + * arg == NULL

Judging from earlier (and later instances) - @arg?

Jan

Re: [PATCH v2 11/14] kernel-doc: public/version.h

On 21.10.2020 02:00, Stefano Stabellini wrote:
> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -30,19 +30,32 @@
>  
>  #include "xen.h"
>  
> -/* NB. All ops return zero on success, except XENVER_{version,pagesize}
> - * XENVER_{version,pagesize,build_id} */
> +/**
> + * DOC: XENVER_*
> + *
> + * NB. All ops return zero on success, except for:
> + *
> + * - XENVER_{version,pagesize,build_id}
> + */
>  
> -/* arg == NULL; returns major:minor (16:16). */
> +/**
> + * DOC: XENVER_version
> + * @arg == NULL; returns major:minor (16:16).
> + */
>  #define XENVER_version  0
>  
> -/* arg == xen_extraversion_t. */
> +/**
> + * DOC: XENVER_extraversion
> + * @arg == xen_extraversion_t.
> + */
>  #define XENVER_extraversion 1
>  typedef char xen_extraversion_t[16];
>  #define XEN_EXTRAVERSION_LEN (sizeof(xen_extraversion_t))
>  
> -/* arg == xen_compile_info_t. */
>  #define XENVER_compile_info 2
> +/**
> + * struct xen_compile_info - XENVER_compile_info
> + */

At the example of this one - elsewhere I think I've seen you also
use single-line comments starting with /**. I can accept the choice
of multi-line here, but I think I'd like ./CODING_STYLE to then be
amended to allow such in certain cases.

Jan

Re: [PATCH v2 12/14] kernel-doc: public/xen.h

On 21.10.2020 02:00, Stefano Stabellini wrote:
> --- a/xen/include/public/xen.h
> +++ b/xen/include/public/xen.h
> @@ -81,14 +81,62 @@ DEFINE_XEN_GUEST_HANDLE(xen_ulong_t);
>  
>  #endif
>  
> -/*
> - * HYPERCALLS
> - */
> -
> -/* `incontents 100 hcalls List of hypercalls
> - * ` enum hypercall_num { // __HYPERVISOR_* => HYPERVISOR_*()
> +/**
> + * DOC: HYPERCALLS
> + *
> + * List of hypercalls
> + *
> + * - __HYPERVISOR_set_trap_table
> + * - __HYPERVISOR_mmu_update
> + * - __HYPERVISOR_set_gdt
> + * - __HYPERVISOR_stack_switch
> + * - __HYPERVISOR_set_callbacks
> + * - __HYPERVISOR_fpu_taskswitch
> + * - __HYPERVISOR_sched_op_compat
> + * - __HYPERVISOR_platform_op
> + * - __HYPERVISOR_set_debugreg
> + * - __HYPERVISOR_get_debugreg
> + * - __HYPERVISOR_update_descriptor
> + * - __HYPERVISOR_memory_op
> + * - __HYPERVISOR_multicall
> + * - __HYPERVISOR_update_va_mapping
> + * - __HYPERVISOR_set_timer_op
> + * - __HYPERVISOR_event_channel_op_compat
> + * - __HYPERVISOR_xen_version
> + * - __HYPERVISOR_console_io
> + * - __HYPERVISOR_physdev_op_compat
> + * - __HYPERVISOR_grant_table_op
> + * - __HYPERVISOR_vm_assist
> + * - __HYPERVISOR_update_va_mapping_otherdomain
> + * - __HYPERVISOR_iret
> + * - __HYPERVISOR_vcpu_op
> + * - __HYPERVISOR_set_segment_base
> + * - __HYPERVISOR_mmuext_op
> + * - __HYPERVISOR_xsm_op
> + * - __HYPERVISOR_nmi_op
> + * - __HYPERVISOR_sched_op
> + * - __HYPERVISOR_callback_op
> + * - __HYPERVISOR_xenoprof_op
> + * - __HYPERVISOR_event_channel_op
> + * - __HYPERVISOR_physdev_op
> + * - __HYPERVISOR_hvm_op
> + * - __HYPERVISOR_sysctl
> + * - __HYPERVISOR_domctl
> + * - __HYPERVISOR_kexec_op
> + * - __HYPERVISOR_tmem_op
> + * - __HYPERVISOR_argo_op
> + * - __HYPERVISOR_xenpmu_op
> + * - __HYPERVISOR_dm_op
> + * - __HYPERVISOR_hypfs_op
> + * - __HYPERVISOR_arch_0
> + * - __HYPERVISOR_arch_1
> + * - __HYPERVISOR_arch_2
> + * - __HYPERVISOR_arch_3
> + * - __HYPERVISOR_arch_4
> + * - __HYPERVISOR_arch_5
> + * - __HYPERVISOR_arch_6
> + * - __HYPERVISOR_arch_7
>   */
> -
>  #define __HYPERVISOR_set_trap_table        0
>  #define __HYPERVISOR_mmu_update            1
>  #define __HYPERVISOR_set_gdt               2

I find this (and more similar ones below) addition of redundancy
quite unhelpful. Is there really no way at all to avoid such?

> @@ -650,34 +761,40 @@ typedef struct multicall_entry multicall_entry_t;
>  DEFINE_XEN_GUEST_HANDLE(multicall_entry_t);
>  
>  #if __XEN_INTERFACE_VERSION__ < 0x00040400
> -/*
> +/**
> + * DOC: NR_EVENT_CHANNELS
> + *
>   * Event channel endpoints per domain (when using the 2-level ABI):
>   *  1024 if a long is 32 bits; 4096 if a long is 64 bits.
>   */
>  #define NR_EVENT_CHANNELS EVTCHN_2L_NR_CHANNELS
>  #endif
>  
> +/**
> + * struct vcpu_time_info
> + *
> + * Updates to the following values are preceded and followed by an
> + * increment of 'version'. The guest can therefore detect updates by
> + * looking for changes to 'version'. If the least-significant bit of
> + * the version number is set then an update is in progress and the guest
> + * must wait to read a consistent set of values.
> + * The correct way to interact with the version number is similar to
> + * Linux's seqlock: see the implementations of read_seqbegin/read_seqretry.
> + *
> + * Current system time:
> + *   system_time +
> + *   ((((tsc - tsc_timestamp) << tsc_shift) * tsc_to_system_mul) >> 32)
> + * CPU frequency (Hz):
> + *   ((10^9 << 32) / tsc_to_system_mul) >> tsc_shift
> + */
>  struct vcpu_time_info {
> -/*
> - * Updates to the following values are preceded and followed by an
> - * increment of 'version'. The guest can therefore detect updates by
> - * looking for changes to 'version'. If the least-significant bit of
> - * the version number is set then an update is in progress and the guest
> - * must wait to read a consistent set of values.
> - * The correct way to interact with the version number is similar to
> - * Linux's seqlock: see the implementations of read_seqbegin/read_seqretry.
> - */
>  uint32_t version;
>  uint32_t pad0;
> -uint64_t tsc_timestamp;   /* TSC at last update of time vals.  */
> -uint64_t system_time; /* Time, in nanosecs, since boot.*/
> -/*
> - * Current system time:
> - *   system_time +
> - *   ((((tsc - tsc_timestamp) << tsc_shift) * tsc_to_system_mul) >> 32)
> - * CPU frequency (Hz):
> - *   ((10^9 << 32) / tsc_to_system_mul) >> tsc_shift
> - */
> +/** @tsc_timestamp: TSC at last update of time vals. */
> +uint64_t tsc_timestamp;
> +/** @system_time: Time, in nanosecs, since boot. */
> +uint64_t system_time;

At the example of this (there are more below) - why the moving of the
main comment out of the struct, when comments inside the struct are
still used and apparently serving the (doc) purpose? This is even
more seeing that you ...

> @@ -692,18 +809,23 @@ typedef struct vcpu_time_info vcpu_time_info_t;
>  #define XEN_PVCLOCK_TSC_

[PATCH v2 0/8] xen: beginnings of moving library-like code into an archive

In a few cases we link in library-like functions when they're not
actually needed. While we could use Kconfig options for each one
of them, I think the better approach for such generic code is to
build it always (thus making sure a build issue can't be introduced
for these in any however exotic configuration) and then put it into
an archive, for the linker to pick up as needed. The series here
presents a first few tiny steps towards such a goal.

Note that we can't use thin archives yet, due to our tool chain
(binutils) baseline being too low.

The first patch actually isn't directly related to the rest of the
series, except that - to avoid undue redundancy - I ran into the
issue addressed there while (originally) making patch 3 convert to
using $(call if_changed,ld) "on the fly". IOW it's a full
(contextual and functional) prereq to the series.

Further almost immediate steps I'd like to take if the approach
meets no opposition are
- split and move the rest of common/lib.c,
- split and move common/string.c, dropping the need for all the
  __HAVE_ARCH_* (implying possible per-arch archives then need to
  be specified ahead of lib/lib.a on the linker command lines),
- move common/libelf/ and common/libfdt/.

It's really only patch 7 which has changed in v2, but since no
other feedback arrived which would require adjustments, I'm
resending with just this one change.

1: lib: split _ctype[] into its own object, under lib/
2: lib: collect library files in an archive
3: lib: move list sorting code
4: lib: move parse_size_and_unit()
5: lib: move init_constructors()
6: lib: move rbtree code
7: lib: move bsearch code
8: lib: move sort code

Jan

[PATCH v2 1/8] lib: split _ctype[] into its own object, under lib/

This is, besides for tidying, in preparation of then starting to use an
archive rather than an object file for generic library code which
arch-es (or even specific configurations within a single arch) may or
may not need.

Signed-off-by: Jan Beulich 
---
 xen/Makefile |  3 ++-
 xen/Rules.mk |  2 +-
 xen/common/lib.c | 29 -
 xen/lib/Makefile |  1 +
 xen/lib/ctype.c  | 38 ++
 5 files changed, 42 insertions(+), 31 deletions(-)
 create mode 100644 xen/lib/ctype.c

diff --git a/xen/Makefile b/xen/Makefile
index bf0c804d4352..73bdc326c549 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -331,6 +331,7 @@ _clean: delete-unfresh-files
$(MAKE) $(clean) include
$(MAKE) $(clean) common
$(MAKE) $(clean) drivers
+   $(MAKE) $(clean) lib
$(MAKE) $(clean) xsm
$(MAKE) $(clean) crypto
$(MAKE) $(clean) arch/arm
@@ -414,7 +415,7 @@ include/asm-$(TARGET_ARCH)/asm-offsets.h: arch/$(TARGET_ARCH)/asm-offsets.s
  echo ""; \
  echo "#endif") <$< >$@
 
-SUBDIRS = xsm arch/$(TARGET_ARCH) common drivers test
+SUBDIRS = xsm arch/$(TARGET_ARCH) common drivers lib test
 define all_sources
 ( find include/asm-$(TARGET_ARCH) -name '*.h' -print; \
   find include -name 'asm-*' -prune -o -name '*.h' -print; \
diff --git a/xen/Rules.mk b/xen/Rules.mk
index 891c94e6ad00..333e19bec343 100644
--- a/xen/Rules.mk
+++ b/xen/Rules.mk
@@ -36,7 +36,7 @@ TARGET := $(BASEDIR)/xen
 # Note that link order matters!
 ALL_OBJS-y   += $(BASEDIR)/common/built_in.o
 ALL_OBJS-y   += $(BASEDIR)/drivers/built_in.o
-ALL_OBJS-$(CONFIG_X86)   += $(BASEDIR)/lib/built_in.o
+ALL_OBJS-y   += $(BASEDIR)/lib/built_in.o
 ALL_OBJS-y   += $(BASEDIR)/xsm/built_in.o
 ALL_OBJS-y   += $(BASEDIR)/arch/$(TARGET_ARCH)/built_in.o
 ALL_OBJS-$(CONFIG_CRYPTO)   += $(BASEDIR)/crypto/built_in.o
diff --git a/xen/common/lib.c b/xen/common/lib.c
index b2b799da44c5..a224efa8f6e8 100644
--- a/xen/common/lib.c
+++ b/xen/common/lib.c
@@ -1,37 +1,8 @@
-
-#include 
 #include 
 #include 
 #include 
 #include 
 
-/* for ctype.h */
-const unsigned char _ctype[] = {
-_C,_C,_C,_C,_C,_C,_C,_C,/* 0-7 */
-_C,_C|_S,_C|_S,_C|_S,_C|_S,_C|_S,_C,_C, /* 8-15 */
-_C,_C,_C,_C,_C,_C,_C,_C,/* 16-23 */
-_C,_C,_C,_C,_C,_C,_C,_C,/* 24-31 */
-_S|_SP,_P,_P,_P,_P,_P,_P,_P,/* 32-39 */
-_P,_P,_P,_P,_P,_P,_P,_P,/* 40-47 */
-_D,_D,_D,_D,_D,_D,_D,_D,/* 48-55 */
-_D,_D,_P,_P,_P,_P,_P,_P,/* 56-63 */
-_P,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U,  /* 64-71 */
-_U,_U,_U,_U,_U,_U,_U,_U,/* 72-79 */
-_U,_U,_U,_U,_U,_U,_U,_U,/* 80-87 */
-_U,_U,_U,_P,_P,_P,_P,_P,/* 88-95 */
-_P,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L|_X,_L,  /* 96-103 */
-_L,_L,_L,_L,_L,_L,_L,_L,/* 104-111 */
-_L,_L,_L,_L,_L,_L,_L,_L,/* 112-119 */
-_L,_L,_L,_P,_P,_P,_P,_C,/* 120-127 */
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,/* 128-143 */
-0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,/* 144-159 */
-_S|_SP,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,   /* 160-175 */
-_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,_P,   /* 176-191 */
-_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,_U,   /* 192-207 */
-_U,_U,_U,_U,_U,_U,_U,_P,_U,_U,_U,_U,_U,_U,_U,_L,   /* 208-223 */
-_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,_L,   /* 224-239 */
-_L,_L,_L,_L,_L,_L,_L,_P,_L,_L,_L,_L,_L,_L,_L,_L};  /* 240-255 */
-
 /*
  * A couple of 64 bit operations ported from FreeBSD.
  * The code within the '#if BITS_PER_LONG == 32' block below, and no other
diff --git a/xen/lib/Makefile b/xen/lib/Makefile
index 7019ca00e8fd..53b1da025e0d 100644
--- a/xen/lib/Makefile
+++ b/xen/lib/Makefile
@@ -1 +1,2 @@
+obj-y += ctype.o
 obj-$(CONFIG_X86) += x86/
diff --git a/xen/lib/ctype.c b/xen/lib/ctype.c
new file mode 100644
index ..7b233a335fdf
--- /dev/null
+++ b/xen/lib/ctype.c
@@ -0,0 +1,38 @@
+#include 
+
+/* for ctype.h */
+const unsigned char _ctype[] = {
+_C,_C,_C,_C,_C,_C,_C,_C,/* 0-7 */
+_C,_C|_S,_C|_S,_C|_S,_C|_S,_C|_S,_C,_C, /* 8-15 */
+_C,_C,_C,_C,_C,_C,_C,_C,/* 16-23 */
+_C,_C,_C,_C,_C,_C,_C,_C,/* 24-31 */
+_S|_SP,_P,_P,_P,_P,_P,_P,_P,/* 32-39 */
+_P,_P,_P,_P,_P,_P,_P,_P,/* 40-47 */
+_D,_D,_D,_D,_D,_D,_D,_D,/* 48-55 */
+_D,_D,_P,_P,_P,_P,_P,_P,/* 56-63 */
+_P,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U|_X,_U,  /* 64-71 */
+_U,_U,_U,_U,_U,_U,_U,_U,

[PATCH v2 2/8] lib: collect library files in an archive

In order to (subsequently) drop odd things like CONFIG_NEEDS_LIST_SORT
just to avoid bloating binaries when only some arch-es and/or
configurations need generic library routines, combine objects under lib/
into an archive, which the linker then can pick the necessary objects
out of.

Note that we can't use thin archives just yet, until we've raised the
minimum required binutils version suitably.

Signed-off-by: Jan Beulich 
---
 xen/Rules.mk  | 33 +++--
 xen/arch/arm/Makefile |  6 +++---
 xen/arch/x86/Makefile |  8 
 xen/lib/Makefile  |  3 ++-
 4 files changed, 36 insertions(+), 14 deletions(-)

diff --git a/xen/Rules.mk b/xen/Rules.mk
index 333e19bec343..e59c7f213f77 100644
--- a/xen/Rules.mk
+++ b/xen/Rules.mk
@@ -41,12 +41,16 @@ ALL_OBJS-y   += $(BASEDIR)/xsm/built_in.o
 ALL_OBJS-y   += $(BASEDIR)/arch/$(TARGET_ARCH)/built_in.o
 ALL_OBJS-$(CONFIG_CRYPTO)   += $(BASEDIR)/crypto/built_in.o
 
+ALL_LIBS-y   := $(BASEDIR)/lib/lib.a
+
 # Initialise some variables
+lib-y :=
 targets :=
 CFLAGS-y :=
 AFLAGS-y :=
 
 ALL_OBJS := $(ALL_OBJS-y)
+ALL_LIBS := $(ALL_LIBS-y)
 
 SPECIAL_DATA_SECTIONS := rodata $(foreach a,1 2 4 8 16, \
 $(foreach w,1 2 4, \
@@ -60,7 +64,14 @@ include Makefile
 # ---
 
 quiet_cmd_ld = LD  $@
-cmd_ld = $(LD) $(XEN_LDFLAGS) -r -o $@ $(real-prereqs)
+cmd_ld = $(LD) $(XEN_LDFLAGS) -r -o $@ $(filter-out %.a,$(real-prereqs)) \
+   --start-group $(filter %.a,$(real-prereqs)) --end-group
+
+# Archive
+# ---
+
+quiet_cmd_ar = AR  $@
+cmd_ar = rm -f $@; $(AR) cPrs $@ $(real-prereqs)
 
 # Objcopy
 # ---
@@ -86,6 +97,10 @@ obj-y:= $(patsubst %/, %/built_in.o, $(obj-y))
 # tell kbuild to descend
 subdir-obj-y := $(filter %/built_in.o, $(obj-y))
 
+# Libraries are always collected in one lib file.
+# Filter out objects already built-in
+lib-y := $(filter-out $(obj-y), $(sort $(lib-y)))
+
 $(filter %.init.o,$(obj-y) $(obj-bin-y) $(extra-y)): CFLAGS-y += -DINIT_SECTIONS_ONLY
 
 ifeq ($(CONFIG_COVERAGE),y)
@@ -129,19 +144,25 @@ include $(BASEDIR)/arch/$(TARGET_ARCH)/Rules.mk
 c_flags += $(CFLAGS-y)
 a_flags += $(CFLAGS-y) $(AFLAGS-y)
 
-built_in.o: $(obj-y) $(extra-y)
+built_in.o: $(obj-y) $(if $(strip $(lib-y)),lib.a) $(extra-y)
 ifeq ($(obj-y),)
$(CC) $(c_flags) -c -x c /dev/null -o $@
 else
 ifeq ($(CONFIG_LTO),y)
-   $(LD_LTO) -r -o $@ $(filter-out $(extra-y),$^)
+   $(LD_LTO) -r -o $@ $(filter-out lib.a $(extra-y),$^)
 else
-   $(LD) $(XEN_LDFLAGS) -r -o $@ $(filter-out $(extra-y),$^)
+   $(LD) $(XEN_LDFLAGS) -r -o $@ $(filter-out lib.a $(extra-y),$^)
 endif
 endif
 
+lib.a: $(lib-y) FORCE
+   $(call if_changed,ar)
+
 targets += built_in.o
-targets += $(filter-out $(subdir-obj-y), $(obj-y)) $(extra-y)
+ifneq ($(strip $(lib-y)),)
+targets += lib.a
+endif
+targets += $(filter-out $(subdir-obj-y), $(obj-y) $(lib-y)) $(extra-y)
 targets += $(MAKECMDGOALS)
 
 built_in_bin.o: $(obj-bin-y) $(extra-y)
@@ -155,7 +176,7 @@ endif
 PHONY += FORCE
 FORCE:
 
-%/built_in.o: FORCE
+%/built_in.o %/lib.a: FORCE
$(MAKE) -f $(BASEDIR)/Rules.mk -C $* built_in.o
 
 %/built_in_bin.o: FORCE
diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index 296c5e68bbc3..612a83b315c8 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -90,14 +90,14 @@ endif
 
 ifeq ($(CONFIG_LTO),y)
 # Gather all LTO objects together
-prelink_lto.o: $(ALL_OBJS)
-   $(LD_LTO) -r -o $@ $^
+prelink_lto.o: $(ALL_OBJS) $(ALL_LIBS)
+   $(LD_LTO) -r -o $@ $(filter-out %.a,$^) --start-group $(filter %.a,$^) --end-group
 
 # Link it with all the binary objects
 prelink.o: $(patsubst %/built_in.o,%/built_in_bin.o,$(ALL_OBJS)) prelink_lto.o
$(call if_changed,ld)
 else
-prelink.o: $(ALL_OBJS) FORCE
+prelink.o: $(ALL_OBJS) $(ALL_LIBS) FORCE
$(call if_changed,ld)
 endif
 
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 9b368632fb43..8f2180485b2b 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -132,8 +132,8 @@ EFI_OBJS-$(XEN_BUILD_EFI) := efi/relocs-dummy.o
 
 ifeq ($(CONFIG_LTO),y)
 # Gather all LTO objects together
-prelink_lto.o: $(ALL_OBJS)
-   $(LD_LTO) -r -o $@ $^
+prelink_lto.o: $(ALL_OBJS) $(ALL_LIBS)
+   $(LD_LTO) -r -o $@ $(filter-out %.a,$^) --start-group $(filter %.a,$^) --end-group
 
 # Link it with all the binary objects
 prelink.o: $(patsubst %/built_in.o,%/built_in_bin.o,$(ALL_OBJS)) prelink_lto.o 
$(EFI_OBJS-y) FORCE
@@ -142,10 +142,10 @@ prelink.o: $(patsubst 
%/built_in.o,%/built_in_bin.o,$(ALL_OBJS)) prelink_lto.o $
 prelink-efi.o: $(patsubst %/built_in.o,%/built_in_bin.o,$(ALL_OBJS)) 
prelink_lto.o FORCE
$(call if_changed,ld)
 else
-prelink.o: $(ALL_

[PATCH v2 3/8] lib: move list sorting code

Build the source file always, as by putting it into an archive it still
won't be linked into final binaries when not needed. This way possible
build breakage will be easier to notice, and it's more consistent with
us unconditionally building other library kind of code (e.g. sort() or
bsearch()).

While moving the source file, take the opportunity and drop the
pointless EXPORT_SYMBOL().

Signed-off-by: Jan Beulich 
---
 xen/arch/arm/Kconfig| 4 +---
 xen/common/Kconfig  | 3 ---
 xen/common/Makefile | 1 -
 xen/lib/Makefile| 1 +
 xen/{common/list_sort.c => lib/list-sort.c} | 2 --
 5 files changed, 2 insertions(+), 9 deletions(-)
 rename xen/{common/list_sort.c => lib/list-sort.c} (98%)

diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
index 277738826581..cb7e2523b6de 100644
--- a/xen/arch/arm/Kconfig
+++ b/xen/arch/arm/Kconfig
@@ -56,9 +56,7 @@ config HVM
 def_bool y
 
 config NEW_VGIC
-   bool
-   prompt "Use new VGIC implementation"
-   select NEEDS_LIST_SORT
+   bool "Use new VGIC implementation"
---help---
 
This is an alternative implementation of the ARM GIC interrupt
diff --git a/xen/common/Kconfig b/xen/common/Kconfig
index 3e2cf2508899..0661328a99e7 100644
--- a/xen/common/Kconfig
+++ b/xen/common/Kconfig
@@ -66,9 +66,6 @@ config MEM_ACCESS
 config NEEDS_LIBELF
bool
 
-config NEEDS_LIST_SORT
-   bool
-
 menu "Speculative hardening"
 
 config SPECULATIVE_HARDEN_ARRAY
diff --git a/xen/common/Makefile b/xen/common/Makefile
index 083f62acb634..52d3c2aa9384 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -21,7 +21,6 @@ obj-y += keyhandler.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC) += kimage.o
 obj-y += lib.o
-obj-$(CONFIG_NEEDS_LIST_SORT) += list_sort.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o livepatch_elf.o
 obj-$(CONFIG_MEM_ACCESS) += mem_access.o
 obj-y += memory.o
diff --git a/xen/lib/Makefile b/xen/lib/Makefile
index b8814361d63e..764f3624b5f9 100644
--- a/xen/lib/Makefile
+++ b/xen/lib/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_X86) += x86/
 
 lib-y += ctype.o
+lib-y += list-sort.o
diff --git a/xen/common/list_sort.c b/xen/lib/list-sort.c
similarity index 98%
rename from xen/common/list_sort.c
rename to xen/lib/list-sort.c
index af2b2f6519f1..f8d8bbf28178 100644
--- a/xen/common/list_sort.c
+++ b/xen/lib/list-sort.c
@@ -15,7 +15,6 @@
  * this program; If not, see .
  */
 
-#include 
 #include 
 
 #define MAX_LIST_LENGTH_BITS 20
@@ -154,4 +153,3 @@ void list_sort(void *priv, struct list_head *head,
 
merge_and_restore_back_links(priv, cmp, head, part[max_lev], list);
 }
-EXPORT_SYMBOL(list_sort);
-- 
2.22.0

[PATCH v2 5/8] lib: move init_constructors()

... into its own CU, for being unrelated to other things in
common/lib.c. For now it gets compiled into built_in.o rather than
lib.a, as it gets used unconditionally by Arm's as well as x86'es
{,__}start_xen(). But this could be changed in principle, the more that
there typically aren't any constructors anyway. Then again it's just
__init code anyway.

Signed-off-by: Jan Beulich 
---
 xen/common/lib.c | 14 --
 xen/lib/Makefile |  1 +
 xen/lib/ctors.c  | 25 +
 3 files changed, 26 insertions(+), 14 deletions(-)
 create mode 100644 xen/lib/ctors.c

diff --git a/xen/common/lib.c b/xen/common/lib.c
index 6cfa332142a5..f5ca179a0af4 100644
--- a/xen/common/lib.c
+++ b/xen/common/lib.c
@@ -1,6 +1,5 @@
 #include 
 #include 
-#include 
 #include 
 
 /*
@@ -423,19 +422,6 @@ uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 #endif
 }
 
-typedef void (*ctor_func_t)(void);
-extern const ctor_func_t __ctors_start[], __ctors_end[];
-
-void __init init_constructors(void)
-{
-const ctor_func_t *f;
-for ( f = __ctors_start; f < __ctors_end; ++f )
-(*f)();
-
-/* Putting this here seems as good (or bad) as any other place. */
-BUILD_BUG_ON(sizeof(size_t) != sizeof(ssize_t));
-}
-
 /*
  * Local variables:
  * mode: C
diff --git a/xen/lib/Makefile b/xen/lib/Makefile
index 99f857540c99..ba1fb7bcdee2 100644
--- a/xen/lib/Makefile
+++ b/xen/lib/Makefile
@@ -1,3 +1,4 @@
+obj-y += ctors.o
 obj-$(CONFIG_X86) += x86/
 
 lib-y += ctype.o
diff --git a/xen/lib/ctors.c b/xen/lib/ctors.c
new file mode 100644
index ..5bdc591cd50a
--- /dev/null
+++ b/xen/lib/ctors.c
@@ -0,0 +1,25 @@
+#include 
+#include 
+
+typedef void (*ctor_func_t)(void);
+extern const ctor_func_t __ctors_start[], __ctors_end[];
+
+void __init init_constructors(void)
+{
+const ctor_func_t *f;
+for ( f = __ctors_start; f < __ctors_end; ++f )
+(*f)();
+
+/* Putting this here seems as good (or bad) as any other place. */
+BUILD_BUG_ON(sizeof(size_t) != sizeof(ssize_t));
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.22.0

[PATCH v2 4/8] lib: move parse_size_and_unit()

... into its own CU, to build it into an archive.

Signed-off-by: Jan Beulich 
---
 xen/common/lib.c | 39 --
 xen/lib/Makefile |  1 +
 xen/lib/parse-size.c | 50 
 3 files changed, 51 insertions(+), 39 deletions(-)
 create mode 100644 xen/lib/parse-size.c

diff --git a/xen/common/lib.c b/xen/common/lib.c
index a224efa8f6e8..6cfa332142a5 100644
--- a/xen/common/lib.c
+++ b/xen/common/lib.c
@@ -423,45 +423,6 @@ uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
 #endif
 }
 
-unsigned long long parse_size_and_unit(const char *s, const char **ps)
-{
-unsigned long long ret;
-const char *s1;
-
-ret = simple_strtoull(s, &s1, 0);
-
-switch ( *s1 )
-{
-case 'T': case 't':
-ret <<= 10;
-/* fallthrough */
-case 'G': case 'g':
-ret <<= 10;
-/* fallthrough */
-case 'M': case 'm':
-ret <<= 10;
-/* fallthrough */
-case 'K': case 'k':
-ret <<= 10;
-/* fallthrough */
-case 'B': case 'b':
-s1++;
-break;
-case '%':
-if ( ps )
-break;
-/* fallthrough */
-default:
-ret <<= 10; /* default to kB */
-break;
-}
-
-if ( ps != NULL )
-*ps = s1;
-
-return ret;
-}
-
 typedef void (*ctor_func_t)(void);
 extern const ctor_func_t __ctors_start[], __ctors_end[];
 
diff --git a/xen/lib/Makefile b/xen/lib/Makefile
index 764f3624b5f9..99f857540c99 100644
--- a/xen/lib/Makefile
+++ b/xen/lib/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_X86) += x86/
 
 lib-y += ctype.o
 lib-y += list-sort.o
+lib-y += parse-size.o
diff --git a/xen/lib/parse-size.c b/xen/lib/parse-size.c
new file mode 100644
index ..ec980cadfff3
--- /dev/null
+++ b/xen/lib/parse-size.c
@@ -0,0 +1,50 @@
+#include 
+
+unsigned long long parse_size_and_unit(const char *s, const char **ps)
+{
+unsigned long long ret;
+const char *s1;
+
+ret = simple_strtoull(s, &s1, 0);
+
+switch ( *s1 )
+{
+case 'T': case 't':
+ret <<= 10;
+/* fallthrough */
+case 'G': case 'g':
+ret <<= 10;
+/* fallthrough */
+case 'M': case 'm':
+ret <<= 10;
+/* fallthrough */
+case 'K': case 'k':
+ret <<= 10;
+/* fallthrough */
+case 'B': case 'b':
+s1++;
+break;
+case '%':
+if ( ps )
+break;
+/* fallthrough */
+default:
+ret <<= 10; /* default to kB */
+break;
+}
+
+if ( ps != NULL )
+*ps = s1;
+
+return ret;
+}
+
+/*
+ * Local variables:
+ * mode: C
+ * c-file-style: "BSD"
+ * c-basic-offset: 4
+ * tab-width: 4
+ * indent-tabs-mode: nil
+ * End:
+ */
-- 
2.22.0

[PATCH v2 6/8] lib: move rbtree code

Build this code into an archive, which results in not linking it into
x86 final binaries. This saves about 1.5k of dead code.

While moving the source file, take the opportunity and drop the
pointless EXPORT_SYMBOL().

Signed-off-by: Jan Beulich 
---
 xen/common/Makefile  | 1 -
 xen/lib/Makefile | 1 +
 xen/{common => lib}/rbtree.c | 9 +
 3 files changed, 2 insertions(+), 9 deletions(-)
 rename xen/{common => lib}/rbtree.c (98%)

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 52d3c2aa9384..7bb779f780a1 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -33,7 +33,6 @@ obj-y += preempt.o
 obj-y += random.o
 obj-y += rangeset.o
 obj-y += radix-tree.o
-obj-y += rbtree.o
 obj-y += rcupdate.o
 obj-y += rwlock.o
 obj-y += shutdown.o
diff --git a/xen/lib/Makefile b/xen/lib/Makefile
index ba1fb7bcdee2..b469d2dff7b8 100644
--- a/xen/lib/Makefile
+++ b/xen/lib/Makefile
@@ -4,3 +4,4 @@ obj-$(CONFIG_X86) += x86/
 lib-y += ctype.o
 lib-y += list-sort.o
 lib-y += parse-size.o
+lib-y += rbtree.o
diff --git a/xen/common/rbtree.c b/xen/lib/rbtree.c
similarity index 98%
rename from xen/common/rbtree.c
rename to xen/lib/rbtree.c
index 9f5498a89d4e..95e045d52461 100644
--- a/xen/common/rbtree.c
+++ b/xen/lib/rbtree.c
@@ -25,7 +25,7 @@
 #include 
 
 /*
- * red-black trees properties:  http://en.wikipedia.org/wiki/Rbtree 
+ * red-black trees properties:  http://en.wikipedia.org/wiki/Rbtree
  *
  *  1) A node is either red or black
  *  2) The root is black
@@ -223,7 +223,6 @@ void rb_insert_color(struct rb_node *node, struct rb_root *root)
}
}
 }
-EXPORT_SYMBOL(rb_insert_color);
 
 static void __rb_erase_color(struct rb_node *parent, struct rb_root *root)
 {
@@ -467,7 +466,6 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
if (rebalance)
__rb_erase_color(rebalance, root);
 }
-EXPORT_SYMBOL(rb_erase);
 
 /*
  * This function returns the first node (in sort order) of the tree.
@@ -483,7 +481,6 @@ struct rb_node *rb_first(const struct rb_root *root)
n = n->rb_left;
return n;
 }
-EXPORT_SYMBOL(rb_first);
 
 struct rb_node *rb_last(const struct rb_root *root)
 {
@@ -496,7 +493,6 @@ struct rb_node *rb_last(const struct rb_root *root)
n = n->rb_right;
return n;
 }
-EXPORT_SYMBOL(rb_last);
 
 struct rb_node *rb_next(const struct rb_node *node)
 {
@@ -528,7 +524,6 @@ struct rb_node *rb_next(const struct rb_node *node)
 
return parent;
 }
-EXPORT_SYMBOL(rb_next);
 
 struct rb_node *rb_prev(const struct rb_node *node)
 {
@@ -557,7 +552,6 @@ struct rb_node *rb_prev(const struct rb_node *node)
 
return parent;
 }
-EXPORT_SYMBOL(rb_prev);
 
 void rb_replace_node(struct rb_node *victim, struct rb_node *new,
 struct rb_root *root)
@@ -574,4 +568,3 @@ void rb_replace_node(struct rb_node *victim, struct rb_node *new,
/* Copy the pointers/colour from the victim to the replacement */
*new = *victim;
 }
-EXPORT_SYMBOL(rb_replace_node);
-- 
2.22.0

[PATCH v2 8/8] lib: move sort code

Build this code into an archive, partly paralleling bsearch().

Signed-off-by: Jan Beulich 
---
 xen/common/Makefile        | 1 -
 xen/lib/Makefile           | 1 +
 xen/{common => lib}/sort.c | 0
 3 files changed, 1 insertion(+), 1 deletion(-)
 rename xen/{common => lib}/sort.c (100%)

diff --git a/xen/common/Makefile b/xen/common/Makefile
index d8519a2cc163..90c679958965 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -36,7 +36,6 @@ obj-y += rcupdate.o
 obj-y += rwlock.o
 obj-y += shutdown.o
 obj-y += softirq.o
-obj-y += sort.o
 obj-y += smp.o
 obj-y += spinlock.o
 obj-y += stop_machine.o
diff --git a/xen/lib/Makefile b/xen/lib/Makefile
index 122eeb3d327b..33ff322b1655 100644
--- a/xen/lib/Makefile
+++ b/xen/lib/Makefile
@@ -6,3 +6,4 @@ lib-y += ctype.o
 lib-y += list-sort.o
 lib-y += parse-size.o
 lib-y += rbtree.o
+lib-y += sort.o
diff --git a/xen/common/sort.c b/xen/lib/sort.c
similarity index 100%
rename from xen/common/sort.c
rename to xen/lib/sort.c
-- 
2.22.0

[PATCH v2 7/8] lib: move bsearch code

Convert this code to an inline function (backed by an instance in an
archive in case the compiler decides against inlining), which results
in not having it in x86 final binaries. This saves a little bit of dead
code.

Signed-off-by: Jan Beulich 
---
v2: Make the function an extern inline in its header.
---
 xen/common/Makefile        |  1 -
 xen/common/bsearch.c       | 51 --
 xen/include/xen/compiler.h |  1 +
 xen/include/xen/lib.h      | 42 ++-
 xen/lib/Makefile           |  1 +
 xen/lib/bsearch.c          | 13 ++
 6 files changed, 56 insertions(+), 53 deletions(-)
 delete mode 100644 xen/common/bsearch.c
 create mode 100644 xen/lib/bsearch.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 7bb779f780a1..d8519a2cc163 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -1,6 +1,5 @@
 obj-$(CONFIG_ARGO) += argo.o
 obj-y += bitmap.o
-obj-y += bsearch.o
 obj-$(CONFIG_HYPFS_CONFIG) += config_data.o
 obj-$(CONFIG_CORE_PARKING) += core_parking.o
 obj-y += cpu.o
diff --git a/xen/common/bsearch.c b/xen/common/bsearch.c
deleted file mode 100644
index 7090930aab5c..
--- a/xen/common/bsearch.c
+++ /dev/null
@@ -1,51 +0,0 @@
-/*
- * A generic implementation of binary search for the Linux kernel
- *
- * Copyright (C) 2008-2009 Ksplice, Inc.
- * Author: Tim Abbott 
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License as
- * published by the Free Software Foundation; version 2.
- */
-
-#include 
-
-/*
- * bsearch - binary search an array of elements
- * @key: pointer to item being searched for
- * @base: pointer to first element to search
- * @num: number of elements
- * @size: size of each element
- * @cmp: pointer to comparison function
- *
- * This function does a binary search on the given array.  The
- * contents of the array should already be in ascending sorted order
- * under the provided comparison function.
- *
- * Note that the key need not have the same type as the elements in
- * the array, e.g. key could be a string and the comparison function
- * could compare the string with the struct's name field.  However, if
- * the key and elements in the array are of the same type, you can use
- * the same comparison function for both sort() and bsearch().
- */
-void *bsearch(const void *key, const void *base, size_t num, size_t size,
- int (*cmp)(const void *key, const void *elt))
-{
-   size_t start = 0, end = num;
-   int result;
-
-   while (start < end) {
-   size_t mid = start + (end - start) / 2;
-
-   result = cmp(key, base + mid * size);
-   if (result < 0)
-   end = mid;
-   else if (result > 0)
-   start = mid + 1;
-   else
-   return (void *)base + mid * size;
-   }
-
-   return NULL;
-}
diff --git a/xen/include/xen/compiler.h b/xen/include/xen/compiler.h
index c0e0ee9f27be..2b7acdf3b188 100644
--- a/xen/include/xen/compiler.h
+++ b/xen/include/xen/compiler.h
@@ -12,6 +12,7 @@
 
 #define inline        __inline__
 #define always_inline __inline__ __attribute__ ((__always_inline__))
+#define gnu_inline    __inline__ __attribute__ ((__gnu_inline__))
 #define noinline  __attribute__((__noinline__))
 
 #define noreturn  __attribute__((__noreturn__))
diff --git a/xen/include/xen/lib.h b/xen/include/xen/lib.h
index 076bcfb67dbb..940d23755661 100644
--- a/xen/include/xen/lib.h
+++ b/xen/include/xen/lib.h
@@ -192,8 +192,48 @@ void dump_execstate(struct cpu_user_regs *);
 
 void init_constructors(void);
 
+/*
+ * bsearch - binary search an array of elements
+ * @key: pointer to item being searched for
+ * @base: pointer to first element to search
+ * @num: number of elements
+ * @size: size of each element
+ * @cmp: pointer to comparison function
+ *
+ * This function does a binary search on the given array.  The
+ * contents of the array should already be in ascending sorted order
+ * under the provided comparison function.
+ *
+ * Note that the key need not have the same type as the elements in
+ * the array, e.g. key could be a string and the comparison function
+ * could compare the string with the struct's name field.  However, if
+ * the key and elements in the array are of the same type, you can use
+ * the same comparison function for both sort() and bsearch().
+ */
+#ifndef BSEARCH_IMPLEMENTATION
+extern gnu_inline
+#endif
 void *bsearch(const void *key, const void *base, size_t num, size_t size,
-  int (*cmp)(const void *key, const void *elt));
+  int (*cmp)(const void *key, const void *elt))
+{
+size_t start = 0, end = num;
+int result;
+
+while ( start < end )
+{
+size_t mid = start + (end - start) / 2;
+
+result = cmp(key, base + mid * size);
+if ( result < 0 )
+end = mid;
+els
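
(The patch text is cut off at this point in the archive.) As a hedged illustration of the calling convention described in the bsearch() comment above, where the key need not have the same type as the array elements, here is a small sketch against the standard C library's bsearch(), which takes the same arguments. The names struct opt_sketch, opts_sketch, cmp_name_sketch() and find_opt_sketch() are made up for this example and are not part of the patch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical table element: looked up by its name field. */
struct opt_sketch {
    const char *name;
    int value;
};

/* Must be sorted in ascending order under cmp_name_sketch(). */
static const struct opt_sketch opts_sketch[] = {
    { "alpha", 1 },
    { "beta",  2 },
    { "gamma", 3 },
};

/* The key is a plain string, the element is a struct. */
static int cmp_name_sketch(const void *key, const void *elt)
{
    const struct opt_sketch *o = elt;

    return strcmp((const char *)key, o->name);
}

const struct opt_sketch *find_opt_sketch(const char *name)
{
    return bsearch(name, opts_sketch,
                   sizeof(opts_sketch) / sizeof(opts_sketch[0]),
                   sizeof(opts_sketch[0]), cmp_name_sketch);
}
```

A lookup of a name that is not in the table returns NULL, matching the final "return NULL" of the implementation being moved.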

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

2020-10-23 Rahul Singh

Hello,

> On 23 Oct 2020, at 1:02 am, Stefano Stabellini  wrote:
> 
> On Thu, 22 Oct 2020, Julien Grall wrote:
>>>> On 20/10/2020 16:25, Rahul Singh wrote:
>>>>> Add support for ARM architected SMMUv3 implementations. It is based on
>>>>> the Linux SMMUv3 driver.
>>>>> Major differences between the Linux driver are as follows:
>>>>> 1. Only Stage-2 translation is supported as compared to the Linux driver
>>>>>    that supports both Stage-1 and Stage-2 translations.
>>>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>>>    capability to share the page tables with the CPU.
>>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event queue
>>>>>    and priority queue IRQ handling.
>>>>
>>>> Tasklets are not a replacement for threaded IRQ. In particular, they will
>>>> have priority over anything else (IOW nothing will run on the pCPU until
>>>> they are done).
>>>>
>>>> Do you know why Linux is using thread. Is it because of long running
>>>> operations?
>>> 
>>> Yes you are right because of long running operations Linux is using the
>>> threaded IRQs.
>>> 
>>> SMMUv3 reports fault/events bases on memory-based circular buffer queues not
>>> based on the register. As per my understanding, it is time-consuming to
>>> process the memory based queues in interrupt context because of that Linux
>>> is using threaded IRQ to process the faults/events from SMMU.
>>> 
>>> I didn’t find any other solution in XEN in place of tasklet to defer the
>>> work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If
>>> we do all work in interrupt context we will make XEN less responsive.
>> 
>> So we need to make sure that Xen continue to receives interrupts, but we also
>> need to make sure that a vCPU bound to the pCPU is also responsive.
>> 
>>> 
>>> If you know another solution in XEN that will be used to defer the work in
>>> the interrupt please let me know I will try to use that.
>> 
>> One of my work colleagues encountered a similar problem recently. He had a
>> long-running tasklet and wanted it to be broken down into smaller chunks.
>>
>> We decided to use a timer to reschedule the tasklet in the future. This
>> allows the scheduler to run other loads (e.g. vCPU) for some time.
>>
>> This is pretty hackish but I couldn't find a better solution as tasklets have
>> high priority.
>>
>> Maybe the others will have a better idea.
> 
> Julien's suggestion is a good one.
> 
> But I think tasklets can be configured to be called from the idle_loop,
> in which case they are not run in interrupt context?
> 

Yes, you are right: the tasklet will be scheduled from the idle_loop, which is
not interrupt context.

> Still, tasklets run until completion in Xen, which could take too long.
> The code has to voluntarily release control of the execution flow once
> it realizes it has been running for too long. The rescheduling via a
> timer works.
> 
> 
> Now, to brainstorm other possible alternatives, for hypercalls we have
> been using hypercall continuations.  Continuations is a way to break a
> hypercall implementation that takes too long into multiple execution
> chunks. It works by calling into itself again: making the same hypercall
> again with updated arguments, so that the scheduler has a chance to do
> other operations in between, including running other tasklets and
> softirqs.
> 
> That works well because  the source of the work is a guest request,
> specifically a hypercall. However, in the case of the SMMU driver, there
> is no hypercall. The Xen driver has to do work in response to an
> interrupt and the work is not tied to one particular domain.
> 
> So I don't think the hypercall continuation model could work here. The
> timer seems to be the best option.
> 

Yes, I agree with you. As the source of the work is not a guest request in the
case of the SMMU, I think we cannot use the hypercall continuation.

As suggested, I will try to use the timer to schedule the work and will share
the findings.
> 
> 
>>>>> 4. Latest version of the Linux SMMUv3 code implements the commands queue
>>>>>    access functions based on atomic operations implemented in Linux.
>>>>
>>>> Can you provide more details?
>>> 
>>> I tried to port the latest version of the SMMUv3 code than I observed that
>>> in order to port that code I have to also port atomic operation implemented
>>> in Linux to XEN. As latest Linux code uses atomic operation to process the
>>> command queues (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
>>> atomic_fetch_andnot_relaxed()) .
>> 
>> Thank you for the explanation. I think it would be best to import the atomic
>> helpers and use the latest code.
>> 
>> This will ensure that we don't re-introduce bugs and also buy us some time
>> before the Linux and Xen driver diverge again too much.
>> 
>> Stefano, what do you think?
> 
> I think you are right.

Yes, I agree with you to have XEN code in sync with Linux code that's why I 
started with to port the Linux atomic op

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

2020-10-23 Rahul Singh

Hello Julien,

> On 22 Oct 2020, at 9:32 am, Julien Grall  wrote:
> 
> 
> 
> On 21/10/2020 12:25, Rahul Singh wrote:
>> Hello Julien,
> 
> Hi Rahul,
> 
>>> On 20 Oct 2020, at 6:03 pm, Julien Grall  wrote:
>>> 
>>> Hi Rahul,
>>> 
>>> Thank you for the contribution. Lets make sure this attempt to SMMUv3 
>>> support in Xen will be more successful than the other one :).
>> Yes sure.
>>> 
>>> I haven't reviewed the code yet, but I wanted to provide feedback on the 
>>> commit message.
>>> 
>>> On 20/10/2020 16:25, Rahul Singh wrote:
>>>> Add support for ARM architected SMMUv3 implementations. It is based on
>>>> the Linux SMMUv3 driver.
>>>> Major differences between the Linux driver are as follows:
>>>> 1. Only Stage-2 translation is supported as compared to the Linux driver
>>>>    that supports both Stage-1 and Stage-2 translations.
>>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>>    capability to share the page tables with the CPU.
>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event queue
>>>>    and priority queue IRQ handling.
>>> 
>>> Tasklets are not a replacement for threaded IRQ. In particular, they will 
>>> have priority over anything else (IOW nothing will run on the pCPU until 
>>> they are done).
>>> 
>>> Do you know why Linux is using thread. Is it because of long running 
>>> operations?
>> Yes you are right because of long running operations Linux is using the 
>> threaded IRQs.
>> SMMUv3 reports fault/events bases on memory-based circular buffer queues not 
>> based on the register. As per my understanding, it is time-consuming to 
>> process the memory based queues in interrupt context because of that Linux 
>> is using threaded IRQ to process the faults/events from SMMU.
>> I didn’t find any other solution in XEN in place of tasklet to defer the 
>> work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If 
>> we do all work in interrupt context we will make XEN less responsive.
> 
> So we need to make sure that Xen continue to receives interrupts, but we also 
> need to make sure that a vCPU bound to the pCPU is also responsive.

Yes I agree.
> 
>> If you know another solution in XEN that will be used to defer the work in 
>> the interrupt please let me know I will try to use that.
> 
> One of my work colleagues encountered a similar problem recently. He had a
> long-running tasklet and wanted it to be broken down into smaller chunks.
>
> We decided to use a timer to reschedule the tasklet in the future. This
> allows the scheduler to run other loads (e.g. vCPU) for some time.
>
> This is pretty hackish but I couldn't find a better solution as tasklets have
> high priority.
>
> Maybe the others will have a better idea.

Let me try to use the timer and will share my findings.
> 
>>>> 4. Latest version of the Linux SMMUv3 code implements the commands queue
>>>>    access functions based on atomic operations implemented in Linux.
>>> 
>>> Can you provide more details?
>> I tried to port the latest version of the SMMUv3 code than I observed that 
>> in order to port that code I have to also port atomic operation implemented 
>> in Linux to XEN. As latest Linux code uses atomic operation to process the 
>> command queues (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() , 
>> atomic_fetch_andnot_relaxed()) .
> 
> Thank you for the explanation. I think it would be best to import the atomic 
> helpers and use the latest code.
> 
> This will ensure that we don't re-introduce bugs and also buy us some time 
> before the Linux and Xen driver diverge again too much.
> 
> Stefano, what do you think?
> 
>>> 
>>>> Atomic functions used by the commands queue access functions is not
>>>> implemented in XEN therefore we decided to port the earlier version
>>>> of the code. Once the proper atomic operations will be available in XEN
>>>> the driver can be updated.
>>>> Signed-off-by: Rahul Singh
>>>> ---
>>>>  xen/drivers/passthrough/Kconfig   |   10 +
>>>>  xen/drivers/passthrough/arm/Makefile  |1 +
>>>>  xen/drivers/passthrough/arm/smmu-v3.c | 2847 +
>>>>  3 files changed, 2858 insertions(+)
>>> 
>>> This is quite significant patch to review. Is there any way to get it split 
>>> (maybe a verbatim Linux copy + Xen modification)?
>> Yes, I understand this is a quite significant patch to review let me think 
>> to get it split. If it is ok for you to review this patch and provide your 
>> comments then it will great for us.
> I will try to have a look next week.

Thanks in advance ☺️
> 
> Cheers,
> 
> -- 
> Julien Grall

Regards,
Rahul
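
As a rough illustration of the approach discussed in this thread (process a bounded amount of queue work, then hand control back and let a timer re-trigger the handler), here is a minimal plain-C sketch. It is not Xen code and makes several assumptions: struct evtq_sketch, EVTQ_BUDGET_SKETCH and evtq_process_chunk() are hypothetical names, the queue indices simply count upwards instead of wrapping like a real circular buffer, and the timer itself is left to the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event queue: producer/consumer indices, no wrap handling. */
struct evtq_sketch {
    size_t cons;            /* next entry to consume */
    size_t prod;            /* one past the last produced entry */
    unsigned long handled;  /* total entries processed so far */
};

/* Assumed upper bound on entries handled per invocation. */
#define EVTQ_BUDGET_SKETCH 4

/*
 * Handle at most EVTQ_BUDGET_SKETCH entries.  Returns true when entries
 * remain, i.e. when the caller should re-arm its timer so the remaining
 * work runs later and other loads get a chance to run in between.
 */
bool evtq_process_chunk(struct evtq_sketch *q)
{
    unsigned int budget = EVTQ_BUDGET_SKETCH;

    while ( q->cons != q->prod && budget-- )
    {
        /* A real handler would decode and report the event here. */
        q->cons++;
        q->handled++;
    }

    return q->cons != q->prod;
}
```

The caller would invoke this from its deferred-work handler and, on a true return value, schedule itself again after a short delay rather than looping until the queue is empty.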

[xen-unstable-smoke test] 156117: tolerable all pass - PUSHED

flight 156117 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156117/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt     15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm      16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl          15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl          16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  6ca70821b59849ad97c3fadc47e63c1a4af1a78c
baseline version:
 xen  861f0c110976fa8879b7bf63d9478b6be83d4ab6

Last test of basis   156108  2020-10-22 19:02:27 Z    0 days
Testing same since   156117  2020-10-23 09:01:23 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 
  Roger Pau Monné 

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   861f0c1109..6ca70821b5  6ca70821b59849ad97c3fadc47e63c1a4af1a78c -> smoke

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver


Hi Stefano,

On 23/10/2020 01:02, Stefano Stabellini wrote:

> On Thu, 22 Oct 2020, Julien Grall wrote:
>>>> On 20/10/2020 16:25, Rahul Singh wrote:
>>>>> Add support for ARM architected SMMUv3 implementations. It is based on
>>>>> the Linux SMMUv3 driver.
>>>>> Major differences between the Linux driver are as follows:
>>>>> 1. Only Stage-2 translation is supported as compared to the Linux driver
>>>>>    that supports both Stage-1 and Stage-2 translations.
>>>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>>>    capability to share the page tables with the CPU.
>>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event queue
>>>>>    and priority queue IRQ handling.
>>>>
>>>> Tasklets are not a replacement for threaded IRQ. In particular, they will
>>>> have priority over anything else (IOW nothing will run on the pCPU until
>>>> they are done).
>>>>
>>>> Do you know why Linux is using thread. Is it because of long running
>>>> operations?
>>>
>>> Yes you are right because of long running operations Linux is using the
>>> threaded IRQs.
>>>
>>> SMMUv3 reports fault/events bases on memory-based circular buffer queues not
>>> based on the register. As per my understanding, it is time-consuming to
>>> process the memory based queues in interrupt context because of that Linux
>>> is using threaded IRQ to process the faults/events from SMMU.
>>>
>>> I didn’t find any other solution in XEN in place of tasklet to defer the
>>> work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If
>>> we do all work in interrupt context we will make XEN less responsive.
>>
>> So we need to make sure that Xen continue to receives interrupts, but we also
>> need to make sure that a vCPU bound to the pCPU is also responsive.
>>
>>>
>>> If you know another solution in XEN that will be used to defer the work in
>>> the interrupt please let me know I will try to use that.
>>
>> One of my work colleague encountered a similar problem recently. He had a long
>> running tasklet and wanted to be broken down in smaller chunk.
>>
>> We decided to use a timer to reschedule the taslket in the future. This allows
>> the scheduler to run other loads (e.g. vCPU) for some time.
>>
>> This is pretty hackish but I couldn't find a better solution as tasklet have
>> high priority.
>>
>> Maybe the other will have a better idea.
>
> Julien's suggestion is a good one.
>
> But I think tasklets can be configured to be called from the idle_loop,
> in which case they are not run in interrupt context?

Tasklets can either run from the IDLE loop or from a softirq context.

When running from a softirq context, it may happen on return from
receiving an interrupt. However, interrupts will always be enabled.


So I am not sure what concern you are trying to raise here.

Cheers,

--
Julien Grall

Re: [PATCH v2 06/11] x86/hvm: allowing registering EOI callbacks for GSIs

On 30.09.2020 12:41, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/irq.c
> +++ b/xen/arch/x86/hvm/irq.c
> @@ -595,6 +595,66 @@ int hvm_local_events_need_delivery(struct vcpu *v)
>  return !hvm_interrupt_blocked(v, intack);
>  }
>  
> +int hvm_gsi_register_callback(struct domain *d, unsigned int gsi,
> +  struct hvm_gsi_eoi_callback *cb)
> +{
> +if ( gsi >= hvm_domain_irq(d)->nr_gsis )
> +{
> +ASSERT_UNREACHABLE();
> +return -EINVAL;
> +}
> +
> +write_lock(&hvm_domain_irq(d)->gsi_callbacks_lock);
> +list_add(&cb->list, &hvm_domain_irq(d)->gsi_callbacks[gsi]);
> +write_unlock(&hvm_domain_irq(d)->gsi_callbacks_lock);
> +
> +return 0;
> +}
> +
> +void hvm_gsi_unregister_callback(struct domain *d, unsigned int gsi,
> + struct hvm_gsi_eoi_callback *cb)
> +{
> +struct list_head *tmp;

This could be const if you used ...

> +if ( gsi >= hvm_domain_irq(d)->nr_gsis )
> +{
> +ASSERT_UNREACHABLE();
> +return;
> +}
> +
> +write_lock(&hvm_domain_irq(d)->gsi_callbacks_lock);
> +list_for_each ( tmp, &hvm_domain_irq(d)->gsi_callbacks[gsi] )
> +if ( tmp == &cb->list )
> +{
> +list_del(tmp);

... &cb->list here.

> +break;
> +}
> +write_unlock(&hvm_domain_irq(d)->gsi_callbacks_lock);
> +}
> +
> +void hvm_gsi_execute_callbacks(unsigned int gsi, void *data)
> +{
> +struct domain *currd = current->domain;
> +struct hvm_gsi_eoi_callback *cb;
> +
> +read_lock(&hvm_domain_irq(currd)->gsi_callbacks_lock);
> +list_for_each_entry ( cb, &hvm_domain_irq(currd)->gsi_callbacks[gsi],
> +  list )
> +cb->callback(gsi, cb->data ?: data);

Are callback functions in principle permitted to unregister
themselves? If so, you'd need to use list_for_each_entry_safe()
here.

What's the idea of passing cb->data _or_ data?

Finally here and maybe in a few more places latch hvm_domain_irq()
into a local variable?

> +read_unlock(&hvm_domain_irq(currd)->gsi_callbacks_lock);
> +}
> +
> +bool hvm_gsi_has_callbacks(struct domain *d, unsigned int gsi)

I think a function like this would want to have all const inputs,
and it looks to be possible thanks to hvm_domain_irq() yielding
a pointer.

> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -393,6 +393,7 @@ static void eoi_callback(unsigned int vector, void *data)
>  for ( pin = 0; pin < vioapic->nr_pins; pin++ )
>  {
>  union vioapic_redir_entry *ent = &vioapic->redirtbl[pin];
> +unsigned int gsi = vioapic->base_gsi + pin;
>  
>  if ( ent->fields.vector != vector )
>  continue;
> @@ -402,13 +403,17 @@ static void eoi_callback(unsigned int vector, void 
> *data)
>  if ( is_iommu_enabled(d) )
>  {
>  spin_unlock(&d->arch.hvm.irq_lock);
> -hvm_dpci_eoi(vioapic->base_gsi + pin, ent);
> +hvm_dpci_eoi(gsi, ent);
>  spin_lock(&d->arch.hvm.irq_lock);
>  }
>  
> +spin_unlock(&d->arch.hvm.irq_lock);
> +hvm_gsi_execute_callbacks(gsi, ent);
> +spin_lock(&d->arch.hvm.irq_lock);

Iirc on an earlier patch Paul has already expressed concern about such
transient unlocking. At the very least I'd expect the description to
say why this is safe. One particular question would be in how far what
ent points to can't change across this window, disconnecting the uses
of it in the 1st locked section from those in the 2nd one.

> @@ -620,7 +628,7 @@ static int ioapic_load(struct domain *d, 
> hvm_domain_context_t *h)
>   * Add a callback for each possible vector injected by a redirection
>   * entry.
>   */
> -if ( vector < 16 || !ent->fields.remote_irr ||
> +if ( vector < 16 ||
>   (delivery_mode != dest_LowestPrio && delivery_mode != 
> dest_Fixed) )
>  continue;

I'm having trouble identifying what this gets replaced by.

Jan
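
To illustrate the list_for_each_entry_safe() remark above: if a callback may remove itself from the list while the list is being walked, the iterator has to latch the next element before invoking the callback. The following is a self-contained sketch using a minimal singly linked list rather than Xen's list.h; all names ending in _sketch are hypothetical, and locking is deliberately left out (in the patch, execution holds the read lock while unregistering takes the write lock, which is a separate question).

```c
#include <stddef.h>

struct eoi_cb_sketch {
    struct eoi_cb_sketch *next;
    void (*callback)(unsigned int gsi, void *data);
    void *data;
};

struct cb_list_sketch {
    struct eoi_cb_sketch *head;
};

void cb_register_sketch(struct cb_list_sketch *l, struct eoi_cb_sketch *cb)
{
    cb->next = l->head;
    l->head = cb;
}

void cb_unregister_sketch(struct cb_list_sketch *l, struct eoi_cb_sketch *cb)
{
    struct eoi_cb_sketch **pp;

    for ( pp = &l->head; *pp; pp = &(*pp)->next )
        if ( *pp == cb )
        {
            *pp = cb->next;
            cb->next = NULL;    /* a plain walk would stop or go astray here */
            break;
        }
}

void cb_execute_sketch(struct cb_list_sketch *l, unsigned int gsi, void *data)
{
    struct eoi_cb_sketch *cb, *next;

    for ( cb = l->head; cb; cb = next )
    {
        next = cb->next;        /* latch before the callback can unlink cb */
        /* Per-callback data wins; otherwise pass the caller's pointer. */
        cb->callback(gsi, cb->data ? cb->data : data);
    }
}

/* Example callback that unregisters itself the first time it fires. */
struct oneshot_sketch {
    struct cb_list_sketch *list;
    struct eoi_cb_sketch cb;
    unsigned int fired;
};

void oneshot_cb_sketch(unsigned int gsi, void *data)
{
    struct oneshot_sketch *o = data;

    (void)gsi;
    o->fired++;
    cb_unregister_sketch(o->list, &o->cb);
}
```

Without the latched next pointer, the walk would end after the first self-removing callback and the remaining entries would be skipped.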

Re: [PATCH v2 07/11] x86/dpci: move code

On 30.09.2020 12:41, Roger Pau Monne wrote:
> This is code movement in order to simplify further changes.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné 

Acked-by: Jan Beulich 
albeit ...

> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -276,6 +276,92 @@ static struct vcpu *vector_hashing_dest(const struct 
> domain *d,
>  return dest;
>  }
>  
> +static void hvm_pirq_eoi(struct pirq *pirq,
> + const union vioapic_redir_entry *ent)
> +{
> +struct hvm_pirq_dpci *pirq_dpci;
> +
> +if ( !pirq )
> +{
> +ASSERT_UNREACHABLE();
> +return;
> +}
> +
> +pirq_dpci = pirq_dpci(pirq);
> +
> +/*
> + * No need to get vector lock for timer
> + * since interrupt is still not EOIed
> + */
> +if ( --pirq_dpci->pending ||
> + (ent && ent->fields.mask) ||
> + !pt_irq_need_timer(pirq_dpci->flags) )
> +return;
> +
> +stop_timer(&pirq_dpci->timer);
> +pirq_guest_eoi(pirq);
> +}
> +
> +static void __hvm_dpci_eoi(struct domain *d,

... could I talk you into dropping one of the two leading underscores
while moving the thing?

Jan

Re: [PATCH v2 08/11] x86/dpci: switch to use a GSI EOI callback

On 30.09.2020 12:41, Roger Pau Monne wrote:
> --- a/xen/drivers/passthrough/io.c
> +++ b/xen/drivers/passthrough/io.c
> @@ -327,9 +327,10 @@ static void hvm_gsi_eoi(struct domain *d, unsigned int 
> gsi,
>  hvm_pirq_eoi(pirq, ent);
>  }
>  
> -void hvm_dpci_eoi(unsigned int guest_gsi, const union vioapic_redir_entry 
> *ent)
> +static void dpci_eoi(unsigned int guest_gsi, void *data)
>  {
>  struct domain *d = current->domain;
> +const union vioapic_redir_entry *ent = data;
>  const struct hvm_irq_dpci *hvm_irq_dpci;
>  const struct hvm_girq_dpci_mapping *girq;
>  
> @@ -565,7 +566,7 @@ int pt_irq_create_bind(
>  unsigned int link;
>  
>  digl = xmalloc(struct dev_intx_gsi_link);
> -girq = xmalloc(struct hvm_girq_dpci_mapping);
> +girq = xzalloc(struct hvm_girq_dpci_mapping);
>  
>  if ( !digl || !girq )
>  {
> @@ -578,11 +579,22 @@ int pt_irq_create_bind(
>  girq->bus = digl->bus = pt_irq_bind->u.pci.bus;
>  girq->device = digl->device = pt_irq_bind->u.pci.device;
>  girq->intx = digl->intx = pt_irq_bind->u.pci.intx;
> -list_add_tail(&digl->list, &pirq_dpci->digl_list);
> +girq->cb.callback = dpci_eoi;
>  
>  guest_gsi = hvm_pci_intx_gsi(digl->device, digl->intx);
>  link = hvm_pci_intx_link(digl->device, digl->intx);
>  
> +rc = hvm_gsi_register_callback(d, guest_gsi, &girq->cb);

So this is where my question on the earlier patch gets answered:
You utilize passing NULL data to the callback to actually get
passed the IO-APIC redir entry pointer into the callback. This is
perhaps okay in principle if it was half way visible. May I ask
that at the very least instead of switching to xzalloc above you
set ->data to NULL here explicitly, accompanied by a comment on
the effect?

However, I wonder whether it wouldn't be better to have the
callback be passed const union vioapic_redir_entry * right away.
Albeit I haven't looked at the later patches yet, where it may
well be I'd find arguments against.

> @@ -590,8 +602,17 @@ int pt_irq_create_bind(
>  }
>  else
>  {
> +struct hvm_gsi_eoi_callback *cb =
> +xzalloc(struct hvm_gsi_eoi_callback);

I can't seem to be able to spot anywhere that this would get freed
(except on an error path in this function).

>  ASSERT(is_hardware_domain(d));
>  
> +if ( !cb )
> +{
> +spin_unlock(&d->event_lock);
> +return -ENOMEM;
> +}
> +
>  /* MSI_TRANSLATE is not supported for the hardware domain. */
>  if ( pt_irq_bind->irq_type != PT_IRQ_TYPE_PCI ||
>   pirq >= hvm_domain_irq(d)->nr_gsis )
> @@ -601,6 +622,19 @@ int pt_irq_create_bind(
>  return -EINVAL;
>  }

There's an error path here where you don't free cb, and I think
one or two more further down (where you then also may need to
unregister it first).

Jan
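
The error-path concern raised above (an allocated callback structure that is neither freed nor unregistered on some exits) is commonly handled with a single unwind sequence at the end of the function. Below is a minimal userspace sketch of that pattern, assuming calloc()/free() in place of Xen's allocator and a one-slot registry in place of the per-GSI callback list; bind_sketch(), unbind_sketch(), registered_cb_sketch and live_allocs_sketch are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct gsi_cb_sketch {
    void *data;
};

/* Hypothetical one-slot registry and allocation counter (for illustration). */
struct gsi_cb_sketch *registered_cb_sketch;
unsigned int live_allocs_sketch;

int bind_sketch(bool bad_args, bool late_failure)
{
    int rc;
    struct gsi_cb_sketch *cb = calloc(1, sizeof(*cb));

    if ( !cb )
        return -ENOMEM;
    live_allocs_sketch++;

    if ( bad_args )
    {
        rc = -EINVAL;
        goto free_cb;           /* nothing registered yet: only free */
    }

    registered_cb_sketch = cb;  /* stands in for registering the callback */

    if ( late_failure )
    {
        rc = -EBUSY;
        goto unregister_cb;     /* undo the registration before freeing */
    }

    return 0;                   /* success: the registry now owns cb */

 unregister_cb:
    registered_cb_sketch = NULL;
 free_cb:
    free(cb);
    live_allocs_sketch--;
    return rc;
}

/* Counterpart for the success case, so the structure is freed eventually. */
void unbind_sketch(void)
{
    if ( !registered_cb_sketch )
        return;

    free(registered_cb_sketch);
    registered_cb_sketch = NULL;
    live_allocs_sketch--;
}
```

Every exit after the allocation either hands ownership to the registry or passes through free_cb, which is the property the review comments ask for.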

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver





On 23/10/2020 12:35, Rahul Singh wrote:

> Hello,
>
>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini  wrote:
>>
>> On Thu, 22 Oct 2020, Julien Grall wrote:
>>>>> On 20/10/2020 16:25, Rahul Singh wrote:
>>>>>> Add support for ARM architected SMMUv3 implementations. It is based on
>>>>>> the Linux SMMUv3 driver.
>>>>>> Major differences between the Linux driver are as follows:
>>>>>> 1. Only Stage-2 translation is supported as compared to the Linux driver
>>>>>>    that supports both Stage-1 and Stage-2 translations.
>>>>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>>>>    capability to share the page tables with the CPU.
>>>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event queue
>>>>>>    and priority queue IRQ handling.
>>>>>
>>>>> Tasklets are not a replacement for threaded IRQ. In particular, they will
>>>>> have priority over anything else (IOW nothing will run on the pCPU until
>>>>> they are done).
>>>>>
>>>>> Do you know why Linux is using thread. Is it because of long running
>>>>> operations?
>>>>
>>>> Yes you are right because of long running operations Linux is using the
>>>> threaded IRQs.
>>>>
>>>> SMMUv3 reports fault/events bases on memory-based circular buffer queues not
>>>> based on the register. As per my understanding, it is time-consuming to
>>>> process the memory based queues in interrupt context because of that Linux
>>>> is using threaded IRQ to process the faults/events from SMMU.
>>>>
>>>> I didn’t find any other solution in XEN in place of tasklet to defer the
>>>> work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If
>>>> we do all work in interrupt context we will make XEN less responsive.
>>>
>>> So we need to make sure that Xen continue to receives interrupts, but we also
>>> need to make sure that a vCPU bound to the pCPU is also responsive.
>>>
>>>>
>>>> If you know another solution in XEN that will be used to defer the work in
>>>> the interrupt please let me know I will try to use that.
>>>
>>> One of my work colleague encountered a similar problem recently. He had a long
>>> running tasklet and wanted to be broken down in smaller chunk.
>>>
>>> We decided to use a timer to reschedule the taslket in the future. This allows
>>> the scheduler to run other loads (e.g. vCPU) for some time.
>>>
>>> This is pretty hackish but I couldn't find a better solution as tasklet have
>>> high priority.
>>>
>>> Maybe the other will have a better idea.
>>
>> Julien's suggestion is a good one.
>>
>> But I think tasklets can be configured to be called from the idle_loop,
>> in which case they are not run in interrupt context?
>>
>
> Yes you are right tasklet will be scheduled from the idle_loop that is not
> interrupt conext.


This depends on your tasklet. Some will run from the softirq context 
which is usually (for Arm) on the return of an exception.





>>>>>> 4. Latest version of the Linux SMMUv3 code implements the commands queue
>>>>>>    access functions based on atomic operations implemented in Linux.
>>>>>
>>>>> Can you provide more details?
>>>>
>>>> I tried to port the latest version of the SMMUv3 code than I observed that
>>>> in order to port that code I have to also port atomic operation implemented
>>>> in Linux to XEN. As latest Linux code uses atomic operation to process the
>>>> command queues (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
>>>> atomic_fetch_andnot_relaxed()) .
>>>
>>> Thank you for the explanation. I think it would be best to import the atomic
>>> helpers and use the latest code.
>>>
>>> This will ensure that we don't re-introduce bugs and also buy us some time
>>> before the Linux and Xen driver diverge again too much.
>>>
>>> Stefano, what do you think?
>>
>> I think you are right.
>
> Yes, I agree with you to have XEN code in sync with Linux code that's why I
> started with to port the Linux atomic operations to XEN  then I realised that
> it is not straightforward to port atomic operations and it requires lots of
> effort and testing. Therefore I decided to port the code before the atomic
> operation is introduced in Linux.


Hmmm... I would not have expected a lot of effort required to add the 3 
atomics operations above. Are you trying to also port the LSE support at 
the same time?


Cheers,

--
Julien Grall
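
Illustrative sketch (not part of the thread): the idea discussed above, of
breaking a long-running deferred handler into bounded chunks and rescheduling
the remainder later, might look roughly like the following. This is plain C
with no Xen APIs; struct evtq, EVTQ_BUDGET and evtq_process_chunk() are
hypothetical names, and the budget value is an assumption chosen only for
illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical event queue: only producer/consumer counters are modelled. */
struct evtq {
    size_t prod;   /* entries produced so far */
    size_t cons;   /* entries consumed so far */
};

enum { EVTQ_BUDGET = 16 };   /* assumed per-invocation budget */

/*
 * Handle at most EVTQ_BUDGET entries, then stop. Returns true if entries
 * remain; the caller would then arm a timer that re-queues the handler,
 * instead of looping, so other work (e.g. a vCPU) can run in between.
 */
static bool evtq_process_chunk(struct evtq *q, size_t *handled)
{
    size_t n = 0;

    while ( q->cons != q->prod && n < EVTQ_BUDGET )
    {
        q->cons++;   /* stand-in for decoding and handling one event */
        n++;
    }

    *handled = n;
    return q->cons != q->prod;
}
```

With 40 pending entries and a budget of 16, three invocations would be needed
(16, 16 and 8 entries), with the scheduler free to run other work between them.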

Re: [PATCH 0/3] tools: avoid creating symbolic links during make

Juergen Gross writes ("[PATCH 0/3] tools: avoid creating symbolic links during make"):
> The rework of the Xen library build introduced creating some additional
> symbolic links during the build process.
> 
> This series is undoing that by moving all official Xen library headers
> to tools/include and by using include paths and the vpath directive
> when access to some private headers of another directory is needed.

I'm OK with these changes and inclined to give my ack and commit all
three.

I did have one observation: it is rather odd that all the
autogenerated header files are each generated by the relevant
tools/libs/foo/Makefile, but the file is in tools/include/.

This is particularly odd given that tools/include/ has a Makefile of
its own which mostly does install stuff.

Can we at least have a comment in tools/include/Makefile saying that
it is forbidden to add rules which build include files here, and
suggesting to the reader which other Makefiles to read ?

Thanks,
Ian.

Re: [PATCH 0/2] maintainers: correct some entries

Juergen Gross writes ("[PATCH 0/2] maintainers: correct some entries"):
> Fix some paths after reorg of library locations, and drop unreachable
> maintainer.

Thanks, both

Acked-by: Ian Jackson 

and committed.

Ian.

Re: [PATCH v2 0/5] xen: event handling cleanup

2020-10-23 Thread boris . ostrovsky



On 10/22/20 5:49 AM, Juergen Gross wrote:
> Do some cleanups in Xen event handling code.
>
> Changes in V2:
> - addressed comments
>
> Juergen Gross (5):
>   xen: remove no longer used functions
>   xen/events: make struct irq_info private to events_base.c
>   xen/events: only register debug interrupt for 2-level events
>   xen/events: unmask a fifo event channel only if it was masked
>   Documentation: add xen.fifo_events kernel parameter description
>
>  .../admin-guide/kernel-parameters.txt |  7 ++
>  arch/x86/xen/smp.c                    | 19 ++--
>  arch/x86/xen/xen-ops.h                |  2 +
>  drivers/xen/events/events_2l.c        |  7 +-
>  drivers/xen/events/events_base.c      | 94 +--
>  drivers/xen/events/events_fifo.c      |  9 +-
>  drivers/xen/events/events_internal.h  | 70 ++
>  include/xen/events.h                  |  8 --
>  8 files changed, 102 insertions(+), 114 deletions(-)
>

Applied to for-linus-5.10b.


-boris

[qemu-mainline test] 156118: regressions - FAIL

flight 156118 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156118/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm               6 xen-build                fail REGR. vs. 152631
 build-amd64                   6 xen-build                fail REGR. vs. 152631
 build-arm64                   6 xen-build                fail REGR. vs. 152631
 build-arm64-xsm               6 xen-build                fail REGR. vs. 152631
 build-i386                    6 xen-build                fail REGR. vs. 152631
 build-i386-xsm                6 xen-build                fail REGR. vs. 152631
 build-armhf                   6 xen-build                fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow    1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 build-i386-libvirt            1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-amd64-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim    1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-i386-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-qemuu-rhel6hvm-inte

[PATCH v2 2/3] tools/libs/guest: don't use symbolic links for xenctrl headers

Instead of using symbolic links for accessing the xenctrl private
headers use an include path instead.

Signed-off-by: Juergen Gross 
Acked-by: Christian Lindig 
Tested-by: Bertrand Marquis 
---
 tools/libs/guest/Makefile | 9 ++---
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/tools/libs/guest/Makefile b/tools/libs/guest/Makefile
index 5b4ad313cc..1c729040b3 100644
--- a/tools/libs/guest/Makefile
+++ b/tools/libs/guest/Makefile
@@ -6,11 +6,6 @@ ifeq ($(CONFIG_LIBXC_MINIOS),y)
 override CONFIG_MIGRATE := n
 endif
 
-LINK_FILES := xc_private.h xc_core.h xc_core_x86.h xc_core_arm.h xc_bitops.h
-
-$(LINK_FILES):
-   ln -sf $(XEN_ROOT)/tools/libs/ctrl/$(notdir $@) $@
-
 SRCS-y += xg_private.c
 SRCS-y += xg_domain.c
 SRCS-y += xg_suspend.c
@@ -29,6 +24,8 @@ else
 SRCS-y += xg_nomigrate.c
 endif
 
+CFLAGS += -I$(XEN_libxenctrl)
+
 vpath %.c ../../../xen/common/libelf
 CFLAGS += -I../../../xen/common/libelf
 
@@ -111,8 +108,6 @@ $(eval $(genpath-target))
 
 xc_private.h: _paths.h
 
-$(LIB_OBJS) $(PIC_OBJS): $(LINK_FILES)
-
 .PHONY: cleanlocal
 cleanlocal:
rm -f libxenguest.map
-- 
2.26.2

[PATCH v2 1/3] tools/libs: move official headers to common directory

Instead of each library having an own include directory move the
official headers to tools/include instead. This will drop the need to
link those headers to tools/include and there is no need any longer
to have library-specific include paths when building Xen.

While at it remove setting of the unused variable
PKG_CONFIG_CFLAGS_LOCAL in libs/*/Makefile.

Signed-off-by: Juergen Gross 
Acked-by: Christian Lindig 
Tested-by: Bertrand Marquis 
---
 .gitignore|  5 ++--
 stubdom/mini-os.mk|  2 +-
 tools/Rules.mk|  5 ++--
 tools/include/Makefile|  6 
 tools/{libs/vchan => }/include/libxenvchan.h  |  0
 tools/{libs/light => }/include/libxl.h|  0
 tools/{libs/light => }/include/libxl_event.h  |  0
 tools/{libs/light => }/include/libxl_json.h   |  0
 tools/{libs/light => }/include/libxl_utils.h  |  0
 tools/{libs/light => }/include/libxl_uuid.h   |  0
 tools/{libs/util => }/include/libxlutil.h |  0
 tools/{libs/call => }/include/xencall.h   |  0
 tools/{libs/ctrl => }/include/xenctrl.h   |  0
 .../{libs/ctrl => }/include/xenctrl_compat.h  |  0
 .../devicemodel => }/include/xendevicemodel.h |  0
 tools/{libs/evtchn => }/include/xenevtchn.h   |  0
 .../include/xenforeignmemory.h|  0
 tools/{libs/gnttab => }/include/xengnttab.h   |  0
 tools/{libs/guest => }/include/xenguest.h |  0
 tools/{libs/hypfs => }/include/xenhypfs.h |  0
 tools/{libs/stat => }/include/xenstat.h   |  0
 .../compat => include/xenstore-compat}/xs.h   |  0
 .../xenstore-compat}/xs_lib.h |  0
 tools/{libs/store => }/include/xenstore.h |  0
 tools/{xenstore => include}/xenstore_lib.h|  0
 .../{libs/toolcore => }/include/xentoolcore.h |  0
 .../include/xentoolcore_internal.h|  0
 tools/{libs/toollog => }/include/xentoollog.h |  0
 tools/libs/call/Makefile  |  3 --
 tools/libs/ctrl/Makefile  |  3 --
 tools/libs/devicemodel/Makefile   |  3 --
 tools/libs/evtchn/Makefile|  2 --
 tools/libs/foreignmemory/Makefile |  3 --
 tools/libs/gnttab/Makefile|  3 --
 tools/libs/guest/Makefile |  3 --
 tools/libs/hypfs/Makefile |  3 --
 tools/libs/libs.mk| 10 ++-
 tools/libs/light/Makefile | 28 ---
 tools/libs/stat/Makefile  |  2 --
 tools/libs/store/Makefile | 11 +++-
 tools/libs/toolcore/Makefile  |  9 +++---
 tools/libs/toollog/Makefile   |  2 --
 tools/libs/util/Makefile  |  3 --
 tools/libs/vchan/Makefile |  3 --
 tools/ocaml/libs/xentoollog/Makefile  |  2 +-
 tools/ocaml/libs/xentoollog/genlevels.py  |  2 +-
 46 files changed, 36 insertions(+), 77 deletions(-)
 rename tools/{libs/vchan => }/include/libxenvchan.h (100%)
 rename tools/{libs/light => }/include/libxl.h (100%)
 rename tools/{libs/light => }/include/libxl_event.h (100%)
 rename tools/{libs/light => }/include/libxl_json.h (100%)
 rename tools/{libs/light => }/include/libxl_utils.h (100%)
 rename tools/{libs/light => }/include/libxl_uuid.h (100%)
 rename tools/{libs/util => }/include/libxlutil.h (100%)
 rename tools/{libs/call => }/include/xencall.h (100%)
 rename tools/{libs/ctrl => }/include/xenctrl.h (100%)
 rename tools/{libs/ctrl => }/include/xenctrl_compat.h (100%)
 rename tools/{libs/devicemodel => }/include/xendevicemodel.h (100%)
 rename tools/{libs/evtchn => }/include/xenevtchn.h (100%)
 rename tools/{libs/foreignmemory => }/include/xenforeignmemory.h (100%)
 rename tools/{libs/gnttab => }/include/xengnttab.h (100%)
 rename tools/{libs/guest => }/include/xenguest.h (100%)
 rename tools/{libs/hypfs => }/include/xenhypfs.h (100%)
 rename tools/{libs/stat => }/include/xenstat.h (100%)
 rename tools/{libs/store/include/compat => include/xenstore-compat}/xs.h (100%)
 rename tools/{libs/store/include/compat => include/xenstore-compat}/xs_lib.h (100%)
 rename tools/{libs/store => }/include/xenstore.h (100%)
 rename tools/{xenstore => include}/xenstore_lib.h (100%)
 rename tools/{libs/toolcore => }/include/xentoolcore.h (100%)
 rename tools/{libs/toolcore => }/include/xentoolcore_internal.h (100%)
 rename tools/{libs/toollog => }/include/xentoollog.h (100%)

diff --git a/.gitignore b/.gitignore
index f6865c9cd8..b346a2abf6 100644
--- a/.gitignore
+++ b/.gitignore
@@ -143,7 +143,6 @@ tools/libs/light/test_timedereg
 tools/libs/light/test_fdderegrace
 tools/libs/light/tmp.*
 tools/libs/light/xenlight.pc
-tools/libs/light/include/_*.h
 tools/libs/stat/_paths.h
 tools/libs/stat/headers.chk
 tools/libs/stat/libxenstat.map
@@ -153,7 +152,6 @@ tools/libs/store/list.h
 tools/libs/store/utils.h
 tools/libs/store/xenstore.pc
 tools/libs/store/xs_lib.c
-tools/libs/store/in

[PATCH v2 3/3] tools/libs/store: don't use symbolic links for external files

Instead of using symbolic links to include files from xenstored use
the vpath directive and an include path.

Signed-off-by: Juergen Gross 
Acked-by: Christian Lindig 
Tested-by: Bertrand Marquis 
---
 tools/libs/store/Makefile | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/tools/libs/store/Makefile b/tools/libs/store/Makefile
index 930e763de9..bc89b9cd70 100644
--- a/tools/libs/store/Makefile
+++ b/tools/libs/store/Makefile
@@ -21,12 +21,8 @@ CFLAGS += $(CFLAGS_libxentoolcore)
 CFLAGS += -DXEN_LIB_STORED="\"$(XEN_LIB_STORED)\""
 CFLAGS += -DXEN_RUN_STORED="\"$(XEN_RUN_STORED)\""
 
-LINK_FILES = xs_lib.c list.h utils.h
-
-$(LIB_OBJS): $(LINK_FILES)
-
-$(LINK_FILES):
-   ln -sf $(XEN_ROOT)/tools/xenstore/$@ $@
+vpath xs_lib.c $(XEN_ROOT)/tools/xenstore
+CFLAGS += -I $(XEN_ROOT)/tools/xenstore
 
 xs.opic: CFLAGS += -DUSE_PTHREAD
 ifeq ($(CONFIG_Linux),y)
-- 
2.26.2

[PATCH v2 0/3] tools: avoid creating symbolic links during make

The rework of the Xen library build introduced creating some additional
symbolic links during the build process.

This series is undoing that by moving all official Xen library headers
to tools/include and by using include paths and the vpath directive
when access to some private headers of another directory is needed.

Changes in V2:
- added comment to tools/include/Makefile (Ian Jackson)

Juergen Gross (3):
  tools/libs: move official headers to common directory
  tools/libs/guest: don't use symbolic links for xenctrl headers
  tools/libs/store: don't use symbolic links for external files

 .gitignore|  5 ++--
 stubdom/mini-os.mk|  2 +-
 tools/Rules.mk|  5 ++--
 tools/include/Makefile|  6 
 tools/{libs/vchan => }/include/libxenvchan.h  |  0
 tools/{libs/light => }/include/libxl.h|  0
 tools/{libs/light => }/include/libxl_event.h  |  0
 tools/{libs/light => }/include/libxl_json.h   |  0
 tools/{libs/light => }/include/libxl_utils.h  |  0
 tools/{libs/light => }/include/libxl_uuid.h   |  0
 tools/{libs/util => }/include/libxlutil.h |  0
 tools/{libs/call => }/include/xencall.h   |  0
 tools/{libs/ctrl => }/include/xenctrl.h   |  0
 .../{libs/ctrl => }/include/xenctrl_compat.h  |  0
 .../devicemodel => }/include/xendevicemodel.h |  0
 tools/{libs/evtchn => }/include/xenevtchn.h   |  0
 .../include/xenforeignmemory.h|  0
 tools/{libs/gnttab => }/include/xengnttab.h   |  0
 tools/{libs/guest => }/include/xenguest.h |  0
 tools/{libs/hypfs => }/include/xenhypfs.h |  0
 tools/{libs/stat => }/include/xenstat.h   |  0
 .../compat => include/xenstore-compat}/xs.h   |  0
 .../xenstore-compat}/xs_lib.h |  0
 tools/{libs/store => }/include/xenstore.h |  0
 tools/{xenstore => include}/xenstore_lib.h|  0
 .../{libs/toolcore => }/include/xentoolcore.h |  0
 .../include/xentoolcore_internal.h|  0
 tools/{libs/toollog => }/include/xentoollog.h |  0
 tools/libs/call/Makefile  |  3 --
 tools/libs/ctrl/Makefile  |  3 --
 tools/libs/devicemodel/Makefile   |  3 --
 tools/libs/evtchn/Makefile|  2 --
 tools/libs/foreignmemory/Makefile |  3 --
 tools/libs/gnttab/Makefile|  3 --
 tools/libs/guest/Makefile | 12 ++--
 tools/libs/hypfs/Makefile |  3 --
 tools/libs/libs.mk| 10 ++-
 tools/libs/light/Makefile | 28 ---
 tools/libs/stat/Makefile  |  2 --
 tools/libs/store/Makefile | 15 +++---
 tools/libs/toolcore/Makefile  |  9 +++---
 tools/libs/toollog/Makefile   |  2 --
 tools/libs/util/Makefile  |  3 --
 tools/libs/vchan/Makefile |  3 --
 tools/ocaml/libs/xentoollog/Makefile  |  2 +-
 tools/ocaml/libs/xentoollog/genlevels.py  |  2 +-
 46 files changed, 38 insertions(+), 88 deletions(-)
 rename tools/{libs/vchan => }/include/libxenvchan.h (100%)
 rename tools/{libs/light => }/include/libxl.h (100%)
 rename tools/{libs/light => }/include/libxl_event.h (100%)
 rename tools/{libs/light => }/include/libxl_json.h (100%)
 rename tools/{libs/light => }/include/libxl_utils.h (100%)
 rename tools/{libs/light => }/include/libxl_uuid.h (100%)
 rename tools/{libs/util => }/include/libxlutil.h (100%)
 rename tools/{libs/call => }/include/xencall.h (100%)
 rename tools/{libs/ctrl => }/include/xenctrl.h (100%)
 rename tools/{libs/ctrl => }/include/xenctrl_compat.h (100%)
 rename tools/{libs/devicemodel => }/include/xendevicemodel.h (100%)
 rename tools/{libs/evtchn => }/include/xenevtchn.h (100%)
 rename tools/{libs/foreignmemory => }/include/xenforeignmemory.h (100%)
 rename tools/{libs/gnttab => }/include/xengnttab.h (100%)
 rename tools/{libs/guest => }/include/xenguest.h (100%)
 rename tools/{libs/hypfs => }/include/xenhypfs.h (100%)
 rename tools/{libs/stat => }/include/xenstat.h (100%)
 rename tools/{libs/store/include/compat => include/xenstore-compat}/xs.h (100%)
 rename tools/{libs/store/include/compat => include/xenstore-compat}/xs_lib.h (100%)
 rename tools/{libs/store => }/include/xenstore.h (100%)
 rename tools/{xenstore => include}/xenstore_lib.h (100%)
 rename tools/{libs/toolcore => }/include/xentoolcore.h (100%)
 rename tools/{libs/toolcore => }/include/xentoolcore_internal.h (100%)
 rename tools/{libs/toollog => }/include/xentoollog.h (100%)

-- 
2.26.2

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

2020-10-23 Thread Rahul Singh

Hello Julien,

> On 23 Oct 2020, at 2:00 pm, Julien Grall  wrote:
> 
> 
> 
> On 23/10/2020 12:35, Rahul Singh wrote:
>> Hello,
>>> On 23 Oct 2020, at 1:02 am, Stefano Stabellini wrote:
>>> 
>>> On Thu, 22 Oct 2020, Julien Grall wrote:
>>>>>> On 20/10/2020 16:25, Rahul Singh wrote:
>>>>>>> Add support for ARM architected SMMUv3 implementations. It is based on
>>>>>>> the Linux SMMUv3 driver.
>>>>>>> Major differences between the Linux driver are as follows:
>>>>>>> 1. Only Stage-2 translation is supported as compared to the Linux driver
>>>>>>>    that supports both Stage-1 and Stage-2 translations.
>>>>>>> 2. Use P2M  page table instead of creating one as SMMUv3 has the
>>>>>>>    capability to share the page tables with the CPU.
>>>>>>> 3. Tasklets is used in place of threaded IRQ's in Linux for event queue
>>>>>>>    and priority queue IRQ handling.
>>>>>> 
>>>>>> Tasklets are not a replacement for threaded IRQ. In particular, they will
>>>>>> have priority over anything else (IOW nothing will run on the pCPU until
>>>>>> they are done).
>>>>>> 
>>>>>> Do you know why Linux is using thread. Is it because of long running
>>>>>> operations?
>>>>> 
>>>>> Yes you are right because of long running operations Linux is using the
>>>>> threaded IRQs.
>>>>> 
>>>>> SMMUv3 reports fault/events bases on memory-based circular buffer queues not
>>>>> based on the register. As per my understanding, it is time-consuming to
>>>>> process the memory based queues in interrupt context because of that Linux
>>>>> is using threaded IRQ to process the faults/events from SMMU.
>>>>> 
>>>>> I didn’t find any other solution in XEN in place of tasklet to defer the
>>>>> work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If
>>>>> we do all work in interrupt context we will make XEN less responsive.
>>>> 
>>>> So we need to make sure that Xen continue to receives interrupts, but we also
>>>> need to make sure that a vCPU bound to the pCPU is also responsive.
>>>> 
>>>>> 
>>>>> If you know another solution in XEN that will be used to defer the work in
>>>>> the interrupt please let me know I will try to use that.
>>>> 
>>>> One of my work colleague encountered a similar problem recently. He had a long
>>>> running tasklet and wanted to be broken down in smaller chunk.
>>>> 
>>>> We decided to use a timer to reschedule the tasklet in the future. This allows
>>>> the scheduler to run other loads (e.g. vCPU) for some time.
>>>> 
>>>> This is pretty hackish but I couldn't find a better solution as tasklet have
>>>> high priority.
>>>> 
>>>> Maybe the other will have a better idea.
>>> 
>>> Julien's suggestion is a good one.
>>> 
>>> But I think tasklets can be configured to be called from the idle_loop,
>>> in which case they are not run in interrupt context?
>>> 
>> Yes you are right tasklet will be scheduled from the idle_loop that is not
>> interrupt context.
> 
> This depends on your tasklet. Some will run from the softirq context which is 
> usually (for Arm) on the return of an exception.
> 

Thanks for the info. I will check and will get better understanding of the 
tasklet how it will run in XEN.

>>>>>>> 
>>>>>>> 4. Latest version of the Linux SMMUv3 code implements the commands queue
>>>>>>>    access functions based on atomic operations implemented in Linux.
>>>>>> 
>>>>>> Can you provide more details?
>>>>> 
>>>>> I tried to port the latest version of the SMMUv3 code than I observed that
>>>>> in order to port that code I have to also port atomic operation implemented
>>>>> in Linux to XEN. As latest Linux code uses atomic operation to process the
>>>>> command queues (atomic_cond_read_relaxed(),atomic_long_cond_read_relaxed() ,
>>>>> atomic_fetch_andnot_relaxed()) .
>>>> 
>>>> Thank you for the explanation. I think it would be best to import the atomic
>>>> helpers and use the latest code.
>>>> 
>>>> This will ensure that we don't re-introduce bugs and also buy us some time
>>>> before the Linux and Xen driver diverge again too much.
>>>> 
>>>> Stefano, what do you think?
>>> 
>>> I think you are right.
>> Yes, I agree with you to have XEN code in sync with Linux code that's why I 
>> started with to port the Linux atomic operations to XEN  then I realised 
>> that it is not straightforward to port atomic operations and it requires 
>> lots of effort and testing. Therefore I decided to port the code before the 
>> atomic operation is introduced in Linux.
> 
> Hmmm... I would not have expected a lot of effort required to add the 3 
> atomics operations above. Are you trying to also port the LSE support at the 
> same time?

There are other atomic operations used in the SMMUv3 code apart from the 3 
atomic operation I mention. I just mention 3 operation as an example. I tried 
to port at that time but when I start porting I realised that one atomic 
operation depend on another one so I decided not to proceed further.

> 
> Cheers,
> 
> -- 
> Julien Gral

Re: [PATCH v2 1/3] tools/libs: move official headers to common directory

Juergen Gross writes ("[PATCH v2 1/3] tools/libs: move official headers to common directory"):
> Instead of each library having an own include directory move the
> official headers to tools/include instead. This will drop the need to
> link those headers to tools/include and there is no need any longer
> to have library-specific include paths when building Xen.
> 
> While at it remove setting of the unused variable
> PKG_CONFIG_CFLAGS_LOCAL in libs/*/Makefile.
> 
> Signed-off-by: Juergen Gross 
> Acked-by: Christian Lindig 
> Tested-by: Bertrand Marquis 

Thanks, all three

Acked-by: Ian Jackson 

and pushed.

Ian.

Re: [PATCH v2 09/11] x86/vpt: switch interrupt injection model

On 30.09.2020 12:41, Roger Pau Monne wrote:
> Currently vPT relies on timers being assigned to a vCPU and performing
> checks on every return to HVM guest in order to check if an interrupt
> from a vPT timer assigned to the vCPU is currently being injected.
> 
> This model doesn't work properly since the interrupt destination vCPU
> of a vPT timer can be different from the vCPU where the timer is
> currently assigned, in which case the timer would get stuck because it
> never sees the interrupt as being injected.
> 
> Knowing when a vPT interrupt is injected is relevant for the guest
> timer modes where missed vPT interrupts are not discarded and instead
> are accumulated and injected when possible.
> 
> This change aims to modify the logic described above, so that vPT
> doesn't need to check on every return to HVM guest if a vPT interrupt
> is being injected. In order to achieve this the vPT code is modified
> to make use of the new EOI callbacks, so that virtual timers can
> detect when a interrupt has been serviced by the guest by waiting for
> the EOI callback to execute.
> 
> This model also simplifies some of the logic, as when executing the
> timer EOI callback Xen can try to inject another interrupt if the
> timer has interrupts pending for delivery.
> 
> Note that timers are still bound to a vCPU for the time being, this
> relation however doesn't limit the interrupt destination anymore, and
> will be removed by further patches.
> 
> This model has been tested with Windows 7 guests without showing any
> timer delay, even when the guest was limited to have very little CPU
> capacity and pending virtual timer interrupts accumulate.
> 
> Signed-off-by: Roger Pau Monné 
> ---
> Changes since v1:
>  - New in this version.
> ---
> Sorry, this is a big change, but I'm having issues splitting it into
> smaller pieces as the functionality needs to be changed in one go, or
> else timers would be broken.
> 
> If this approach seems sensible I can try to split it up.

If it can't sensibly be split, so be it, I would say. And yes, the
approach does look sensible to me, supported by ...

> ---
>  xen/arch/x86/hvm/svm/intr.c   |   3 -
>  xen/arch/x86/hvm/vmx/intr.c   |  59 --
>  xen/arch/x86/hvm/vpt.c| 326 ++
>  xen/include/asm-x86/hvm/vpt.h |   5 +-
>  4 files changed, 135 insertions(+), 258 deletions(-)

... this diffstat. Good work!

Just a couple of nits, but before giving this my ack I may need to
go through it a 2nd time.

> +/*
> + * The same callback is shared between LAPIC and PIC/IO-APIC based timers, as
> + * we ignore the first parameter that's different between them.
> + */
> +static void eoi_callback(unsigned int unused, void *data)
>  {
> -struct list_head *head = &v->arch.hvm.tm_list;
> -struct periodic_time *pt, *temp, *earliest_pt;
> -uint64_t max_lag;
> -int irq, pt_vector = -1;
> -bool level;
> +struct periodic_time *pt = data;
> +struct vcpu *v;
> +time_cb *cb = NULL;
> +void *cb_priv;
>  
> -pt_vcpu_lock(v);
> +pt_lock(pt);
>  
> -earliest_pt = NULL;
> -max_lag = -1ULL;
> -list_for_each_entry_safe ( pt, temp, head, list )
> +pt_irq_fired(pt->vcpu, pt);
> +if ( pt->pending_intr_nr )
>  {
> -if ( pt->pending_intr_nr )
> +if ( inject_interrupt(pt) )
> +{
> +pt->pending_intr_nr--;
> +cb = pt->cb;
> +cb_priv = pt->priv;
> +v = pt->vcpu;
> +}
> +else
>  {
> -/* RTC code takes care of disabling the timer itself. */
> -if ( (pt->irq != RTC_IRQ || !pt->priv) && pt_irq_masked(pt) &&
> - /* Level interrupts should be asserted even if masked. */
> - !pt->level )
> -{
> -/* suspend timer emulation */
> +/* Masked. */
> +if ( pt->on_list )
>  list_del(&pt->list);
> -pt->on_list = 0;
> -}
> -else
> -{
> -if ( (pt->last_plt_gtime + pt->period) < max_lag )
> -{
> -max_lag = pt->last_plt_gtime + pt->period;
> -earliest_pt = pt;
> -}
> -}
> +pt->on_list = false;
>  }
>  }
>  
> -if ( earliest_pt == NULL )
> -{
> -pt_vcpu_unlock(v);
> -return -1;
> -}
> +pt_unlock(pt);
>  
> -earliest_pt->irq_issued = 1;
> -irq = earliest_pt->irq;
> -level = earliest_pt->level;
> +if ( cb != NULL )
> +cb(v, cb_priv);

Nit: Like done elsewhere, omit the " != NULL"?

> +/* Update time when an interrupt is injected. */
> +if ( mode_is(v->domain, one_missed_tick_pending) ||
> + mode_is(v->domain, no_missed_ticks_pending) )
> +pt->last_plt_gtime = hvm_get_guest_time(v);
> +else
> +pt->last_plt_gtime += pt->period;
>  
> -pt_vcpu_unlock(v);

Re: [PATCH v2 1/3] tools/libs: move official headers to common directory

On 23.10.2020 16:52, Ian Jackson wrote:
> Juergen Gross writes ("[PATCH v2 1/3] tools/libs: move official headers to 
> common directory"):
>> Instead of each library having an own include directory move the
>> official headers to tools/include instead. This will drop the need to
>> link those headers to tools/include and there is no need any longer
>> to have library-specific include paths when building Xen.
>>
>> While at it remove setting of the unused variable
>> PKG_CONFIG_CFLAGS_LOCAL in libs/*/Makefile.
>>
>> Signed-off-by: Juergen Gross 
>> Acked-by: Christian Lindig 
>> Tested-by: Bertrand Marquis 
> 
> Thanks, all three
> 
> Acked-by: Ian Jackson 
> 
> and pushed.

While you're at it, Ian, could you also take a look at
"[PATCH 2/2] tools/libs: fix uninstall rule for header files"
(patch 1 there now obviously is obsolete)?

Thanks, Jan

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

On 23/10/2020 15:27, Rahul Singh wrote:

Hello Julien,

On 23 Oct 2020, at 2:00 pm, Julien Grall wrote:

On 23/10/2020 12:35, Rahul Singh wrote:

Hello,

On 23 Oct 2020, at 1:02 am, Stefano Stabellini wrote:

On Thu, 22 Oct 2020, Julien Grall wrote:

On 20/10/2020 16:25, Rahul Singh wrote:

Add support for ARM architected SMMUv3 implementations. It is based on
the Linux SMMUv3 driver.
Major differences between the Linux driver are as follows:
1. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
2. Use P2M page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
3. Tasklets is used in place of threaded IRQ's in Linux for event queue
and priority queue IRQ handling.

Tasklets are not a replacement for threaded IRQ. In particular, they will
have priority over anything else (IOW nothing will run on the pCPU until
they are done).

Do you know why Linux is using thread. Is it because of long running
operations?

Yes you are right because of long running operations Linux is using the
threaded IRQs.

SMMUv3 reports fault/events bases on memory-based circular buffer queues not
based on the register. As per my understanding, it is time-consuming to
process the memory based queues in interrupt context because of that Linux
is using threaded IRQ to process the faults/events from SMMU.

I didn’t find any other solution in XEN in place of tasklet to defer the
work, that’s why I used tasklet in XEN in replacement of threaded IRQs. If
we do all work in interrupt context we will make XEN less responsive.

So we need to make sure that Xen continue to receives interrupts, but we also
need to make sure that a vCPU bound to the pCPU is also responsive.

If you know another solution in XEN that will be used to defer the work in
the interrupt please let me know I will try to use that.

One of my work colleague encountered a similar problem recently. He had a long
running tasklet and wanted to be broken down in smaller chunk.

We decided to use a timer to reschedule the taslket in the future. This allows
the scheduler to run other loads (e.g. vCPU) for some time.

This is pretty hackish but I couldn't find a better solution as tasklet have
high priority.

Maybe the other will have a better idea.

Julien's suggestion is a good one.

But I think tasklets can be configured to be called from the idle_loop,
in which case they are not run in interrupt context?

Yes you are right tasklet will be scheduled from the idle_loop that is not
interrupt conext.

This depends on your tasklet. Some will run from the softirq context which is
usually (for Arm) on the return of an exception.

Thanks for the info. I will check and will get better understanding of the
tasklet how it will run in XEN.

4. The latest version of the Linux SMMUv3 code implements the command queue
access functions using atomic operations implemented in Linux.

Can you provide more details?

I tried to port the latest version of the SMMUv3 code, then observed that
to port that code I also have to port the atomic operations implemented in
Linux to XEN, as the latest Linux code uses atomic operations to process the
command queues (atomic_cond_read_relaxed(), atomic_long_cond_read_relaxed(),
atomic_fetch_andnot_relaxed()).

Thank you for the explanation. I think it would be best to import the atomic
helpers and use the latest code.

This will ensure that we don't re-introduce bugs and also buy us some time
before the Linux and Xen drivers diverge too much again.

Stefano, what do you think?

I think you are right.

Yes, I agree with you that the XEN code should be in sync with the Linux
code. That's why I started by porting the Linux atomic operations to XEN.
Then I realised that porting the atomic operations is not straightforward
and requires a lot of effort and testing. Therefore I decided to port the
code from before the atomic operations were introduced in Linux.

Hmmm... I would not have expected a lot of effort to be required to add the 3
atomic operations above. Are you trying to also port the LSE support at the
same time?

There are other atomic operations used in the SMMUv3 code apart from the 3
atomic operations I mentioned. I just mentioned those 3 operations as an
example.

Ok. Do you have a list you could share?

Cheers,

--
Julien Grall

Re: [PATCH v2 10/11] x86/vpt: remove vPT timers per-vCPU lists

On 30.09.2020 12:41, Roger Pau Monne wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -1964,7 +1964,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next)
>  vpmu_switch_from(prev);
>  np2m_schedule(NP2M_SCHEDLE_OUT);
>  
> -if ( is_hvm_domain(prevd) && !list_empty(&prev->arch.hvm.tm_list) )
> +if ( is_hvm_domain(prevd) )
>  pt_save_timer(prev);

While most of the function goes away, pt_freeze_time() now will get
called in cases it previously wasn't called - is this benign?

> @@ -195,50 +182,20 @@ static void pt_thaw_time(struct vcpu *v)
>  
>  void pt_save_timer(struct vcpu *v)
>  {
> -struct list_head *head = &v->arch.hvm.tm_list;
> -struct periodic_time *pt;
> -
> -if ( v->pause_flags & VPF_blocked )
> -return;
> -
> -pt_vcpu_lock(v);
> -
> -list_for_each_entry ( pt, head, list )
> -if ( !pt->do_not_freeze )
> -stop_timer(&pt->timer);
>  
>  pt_freeze_time(v);
> -
> -pt_vcpu_unlock(v);
>  }
>  
>  void pt_restore_timer(struct vcpu *v)
>  {
> -struct list_head *head = &v->arch.hvm.tm_list;
> -struct periodic_time *pt;
> -
> -pt_vcpu_lock(v);
> -
> -list_for_each_entry ( pt, head, list )
> -if ( pt->pending_intr_nr == 0 )
> -set_timer(&pt->timer, pt->scheduled);
> -
>  pt_thaw_time(v);
> -
> -pt_vcpu_unlock(v);
>  }

In both functions the single function called also is the only
time it is used anywhere, so I guess the extra layer could be
removed.

> @@ -402,8 +339,7 @@ void create_periodic_time(
>  write_lock(&v->domain->arch.hvm.pl_time->pt_migrate);
>  
>  pt->pending_intr_nr = 0;
> -pt->do_not_freeze = 0;
> -pt->irq_issued = 0;
> +pt->masked = false;

I agree here, but ...

> @@ -479,10 +412,8 @@ void destroy_periodic_time(struct periodic_time *pt)
>  return;
>  
>  pt_lock(pt);
> -if ( pt->on_list )
> -list_del(&pt->list);
> -pt->on_list = 0;
>  pt->pending_intr_nr = 0;
> +pt->masked = false;

... why not "true" here, at the very least for pt_active()'s sake?

> --- a/xen/include/asm-x86/hvm/vpt.h
> +++ b/xen/include/asm-x86/hvm/vpt.h
> @@ -31,13 +31,10 @@
>  typedef void time_cb(struct vcpu *v, void *opaque);
>  
>  struct periodic_time {
> -struct list_head list;
> -bool on_list;
>  bool one_shot;
> -bool do_not_freeze;
> -bool irq_issued;
>  bool warned_timeout_too_short;
>  bool level;
> +bool masked;

"masked" aiui doesn't say anything about the present state of a
timer, but about its state the last time an interrupt was
attempted to be injected. If this is right, a name change
("last_seen_masked" is somewhat longish) might be helpful, but
at the very least I'd like to ask for a comment to this effect.

> @@ -158,7 +153,7 @@ void pt_adjust_global_vcpu_target(struct vcpu *v);
>  void pt_may_unmask_irq(struct domain *d, struct periodic_time *vlapic_pt);
>  
>  /* Is given periodic timer active? */
> -#define pt_active(pt) ((pt)->on_list || (pt)->pending_intr_nr)
> +#define pt_active(pt) !(pt)->masked

This wants parentheses around it. And why does the right side of the
|| go away?

Jan

[PATCH v2 0/7] xen/arm: Unbreak ACPI

From: Julien Grall 

Hi all,

Xen on ARM has been broken for quite a while on ACPI systems. This
series aims to fix it.

This series also introduces support for ACPI 5.1. This allows Xen to
boot on QEMU.

I have only build-tested the x86 side so far.

Cheers,

Julien Grall (7):
  xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()
  xen/arm: acpi: The fixmap area should always be cleared during
failure/unmap
  xen/arm: Check if the platform is not using ACPI before initializing
Dom0less
  xen/arm: Introduce fw_unreserved_regions() and use it
  xen/arm: acpi: add BAD_MADT_GICC_ENTRY() macro
  xen/arm: gic-v2: acpi: Use the correct length for the GICC structure
  xen/arm: acpi: Allow Xen to boot with ACPI 5.1

 xen/arch/arm/acpi/boot.c|  6 +--
 xen/arch/arm/acpi/lib.c | 79 ++---
 xen/arch/arm/gic-v2.c   |  5 ++-
 xen/arch/arm/gic-v3.c   |  2 +-
 xen/arch/arm/kernel.c   |  2 +-
 xen/arch/arm/setup.c| 25 +---
 xen/arch/x86/acpi/lib.c | 18 +
 xen/drivers/acpi/osl.c  | 34 
 xen/include/asm-arm/acpi.h  |  8 
 xen/include/asm-arm/setup.h |  3 +-
 xen/include/xen/acpi.h  |  1 +
 11 files changed, 139 insertions(+), 44 deletions(-)

-- 
2.17.1

[PATCH v2 6/7] xen/arm: gic-v2: acpi: Use the correct length for the GICC structure

From: Julien Grall 

The length of the GICC structure in the MADT ACPI table differs between
version 5.1 and 6.0, although there are no other relevant differences.

Use the BAD_MADT_GICC_ENTRY macro, which was specifically designed to
overcome this issue.

Signed-off-by: Julien Grall 
Signed-off-by: Andre Przywara 
Signed-off-by: Julien Grall 

---
Changes in v2:
- Patch added
---
 xen/arch/arm/acpi/boot.c | 2 +-
 xen/arch/arm/gic-v2.c| 5 +++--
 xen/arch/arm/gic-v3.c| 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/acpi/boot.c b/xen/arch/arm/acpi/boot.c
index 30e4bd1bc5a7..55c3e5cbc834 100644
--- a/xen/arch/arm/acpi/boot.c
+++ b/xen/arch/arm/acpi/boot.c
@@ -131,7 +131,7 @@ acpi_parse_gic_cpu_interface(struct acpi_subtable_header *header,
 struct acpi_madt_generic_interrupt *processor =
container_of(header, struct acpi_madt_generic_interrupt, header);
 
-if ( BAD_MADT_ENTRY(processor, end) )
+if ( BAD_MADT_GICC_ENTRY(processor, end) )
 return -EINVAL;
 
 acpi_table_print_madt_entry(header);
diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 0f747538dbcd..0e5f23201974 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -1136,7 +1136,8 @@ static int gicv2_make_hwdom_madt(const struct domain *d, u32 offset)
 
 host_gicc = container_of(header, struct acpi_madt_generic_interrupt,
  header);
-size = sizeof(struct acpi_madt_generic_interrupt);
+
+size = ACPI_MADT_GICC_LENGTH;
 /* Add Generic Interrupt */
 for ( i = 0; i < d->max_vcpus; i++ )
 {
@@ -1165,7 +1166,7 @@ gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header,
 struct acpi_madt_generic_interrupt *processor =
container_of(header, struct acpi_madt_generic_interrupt, header);
 
-if ( BAD_MADT_ENTRY(processor, end) )
+if ( BAD_MADT_GICC_ENTRY(processor, end) )
 return -EINVAL;
 
 /* Read from APIC table and fill up the GIC variables */
diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index 0f6cbf6224e9..ce202402c0ed 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1558,7 +1558,7 @@ gic_acpi_parse_madt_cpu(struct acpi_subtable_header *header,
 struct acpi_madt_generic_interrupt *processor =
container_of(header, struct acpi_madt_generic_interrupt, header);
 
-if ( BAD_MADT_ENTRY(processor, end) )
+if ( BAD_MADT_GICC_ENTRY(processor, end) )
 return -EINVAL;
 
 /* Read from APIC table and fill up the GIC variables */
-- 
2.17.1

[PATCH v2 7/7] xen/arm: acpi: Allow Xen to boot with ACPI 5.1

From: Julien Grall 

At the moment Xen requires the FADT ACPI table to be at least version
6.0, apparently because of some reliance on other ACPI v6.0 features.

But actually this is overzealous, and Xen now works fine with ACPI v5.1.

Let's relax the version check for the FADT table to allow QEMU to
run the hypervisor with ACPI.

Signed-off-by: Julien Grall 
Signed-off-by: Andre Przywara 
Signed-off-by: Julien Grall 

---
Changes in v2:
- Patch added
---
 xen/arch/arm/acpi/boot.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/acpi/boot.c b/xen/arch/arm/acpi/boot.c
index 55c3e5cbc834..7ea2990cb82c 100644
--- a/xen/arch/arm/acpi/boot.c
+++ b/xen/arch/arm/acpi/boot.c
@@ -181,8 +181,8 @@ static int __init acpi_parse_fadt(struct acpi_table_header *table)
  * we only deal with ACPI 6.0 or newer revision to get GIC and SMP
  * boot protocol configuration data, or we will disable ACPI.
  */
-if ( table->revision > 6
- || (table->revision == 6 && fadt->minor_revision >= 0) )
+if ( table->revision > 5
+ || (table->revision == 5 && fadt->minor_revision >= 1) )
 return 0;
 
 printk("Unsupported FADT revision %d.%d, should be 6.0+, will disable ACPI\n",
-- 
2.17.1
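
The relaxed check in the patch above can also be written as a single
comparison on a packed (major, minor) value. The sketch below is a standalone
illustration and not the code in the patch; fadt_pack_revision() and
fadt_revision_supported() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Pack a major/minor revision pair so that it can be compared as one number. */
static unsigned int fadt_pack_revision(unsigned int major, unsigned int minor)
{
    /* Both FADT revision fields are single bytes. */
    assert(major <= 0xff && minor <= 0xff);

    return (major << 8) | minor;
}

/*
 * Equivalent to the condition in the hunk:
 *   revision > 5 || (revision == 5 && minor_revision >= 1)
 */
static bool fadt_revision_supported(unsigned int major, unsigned int minor)
{
    return fadt_pack_revision(major, minor) >= fadt_pack_revision(5, 1);
}
```

With this formulation, 5.0 is rejected while 5.1, 6.0 and anything newer are
accepted, matching the two-clause condition in the diff.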

[PATCH v2 5/7] xen/arm: acpi: add BAD_MADT_GICC_ENTRY() macro

From: Julien Grall 

Imported from Linux commit b6cfb277378ef831c0fa84bcff5049307294adc6:

The BAD_MADT_ENTRY() macro is designed to work for all of the subtables
of the MADT.  In the ACPI 5.1 version of the spec, the struct for the
GICC subtable (struct acpi_madt_generic_interrupt) is 76 bytes long; in
ACPI 6.0, the struct is 80 bytes long.  But, there is only one definition
in ACPICA for this struct -- and that is the 6.0 version.  Hence, when
BAD_MADT_ENTRY() compares the struct size to the length in the GICC
subtable, it fails if 5.1 structs are in use, and there are systems in
the wild that have them.

This patch adds the BAD_MADT_GICC_ENTRY() that checks the GICC subtable
only, accounting for the difference in specification versions that are
possible.  The BAD_MADT_ENTRY() will continue to work as is for all other
MADT subtables.

This code is being added to an arm64 header file since that is currently
the only architecture using the GICC subtable of the MADT.  As a GIC is
specific to ARM, it is also unlikely the subtable will be used elsewhere.

Fixes: aeb823bbacc2 ("ACPICA: ACPI 6.0: Add changes for FADT table.")
Signed-off-by: Al Stone 
Acked-by: Will Deacon 
Acked-by: "Rafael J. Wysocki" 
[catalin.mari...@arm.com: extra brackets around macro arguments]
Signed-off-by: Catalin Marinas 

Signed-off-by: Julien Grall 
Signed-off-by: Andre Przywara 
Signed-off-by: Julien Grall 

---

Changes in v2:
- Patch added

We may also want to consider importing:

commit 9eb1c92b47c73249465d388eaa394fe436a3b489
Author: Jeremy Linton 
Date:   Tue Nov 27 17:59:12 2018 +

arm64: acpi: Prepare for longer MADTs

The BAD_MADT_GICC_ENTRY check is a little too strict because
it rejects MADT entries that don't match the currently known
lengths. We should remove this restriction to avoid problems
if the table length changes. Future code which might depend on
additional fields should be written to validate those fields
before using them, rather than trying to globally check
known MADT version lengths.

Link: https://lkml.kernel.org/r/20181012192937.3819951-1-jeremy.lin...@arm.com
Signed-off-by: Jeremy Linton 
[lorenzo.pieral...@arm.com: added MADT macro comments]
Signed-off-by: Lorenzo Pieralisi 
Acked-by: Sudeep Holla 
Cc: Will Deacon 
Cc: Catalin Marinas 
Cc: Al Stone 
Cc: "Rafael J. Wysocki" 
Signed-off-by: Will Deacon 
---
 xen/include/asm-arm/acpi.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/xen/include/asm-arm/acpi.h b/xen/include/asm-arm/acpi.h
index 50340281a917..b52ae2d6ef72 100644
--- a/xen/include/asm-arm/acpi.h
+++ b/xen/include/asm-arm/acpi.h
@@ -54,6 +54,14 @@ void acpi_smp_init_cpus(void);
  */
 paddr_t acpi_get_table_offset(struct membank tbl_add[], EFI_MEM_RES index);
 
+/* Macros for consistency checks of the GICC subtable of MADT */
+#define ACPI_MADT_GICC_LENGTH  \
+(acpi_gbl_FADT.header.revision < 6 ? 76 : 80)
+
+#define BAD_MADT_GICC_ENTRY(entry, end)                                 \
+(!(entry) || (unsigned long)(entry) + sizeof(*(entry)) > (end) ||  \
+ (entry)->header.length != ACPI_MADT_GICC_LENGTH)
+
 #ifdef CONFIG_ACPI
 extern bool acpi_disabled;
 /* Basic configuration for ACPI */
-- 
2.17.1
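
The version-dependent length check described in the commit message above can
be tried outside of Xen with a small standalone sketch. The struct and
function names below are hypothetical stand-ins, and the bounds check is
simplified: it uses the expected length, whereas the macro in the patch uses
sizeof(*(entry)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GICC subtable sizes as stated in the commit message. */
#define GICC_LENGTH_ACPI_5_1 76u
#define GICC_LENGTH_ACPI_6_0 80u

/* Simplified stand-in for struct acpi_subtable_header. */
struct subtable_header {
    uint8_t type;
    uint8_t length;
};

/* Expected GICC length for a given FADT major revision. */
static unsigned int gicc_expected_length(unsigned int fadt_revision)
{
    return fadt_revision < 6 ? GICC_LENGTH_ACPI_5_1 : GICC_LENGTH_ACPI_6_0;
}

/* Returns true if the entry is missing, truncated or has the wrong length. */
static bool bad_gicc_entry(const struct subtable_header *entry,
                           const uint8_t *end, unsigned int fadt_revision)
{
    unsigned int expected = gicc_expected_length(fadt_revision);
    const uint8_t *start = (const uint8_t *)entry;

    assert(end != NULL);

    if ( entry == NULL || start > end )
        return true;

    /* The entry must fit entirely before the end of the table. */
    if ( (size_t)(end - start) < expected )
        return true;

    return entry->length != expected;
}
```

A 76-byte entry is accepted when the FADT revision is 5 and rejected when it
is 6, which is the case the plain BAD_MADT_ENTRY() check could not handle.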

[PATCH v2 2/7] xen/arm: acpi: The fixmap area should always be cleared during failure/unmap

From: Julien Grall 

Commit 022387ee1ad3 "xen/arm: mm: Don't open-code Xen PT update in
{set, clear}_fixmap()" enforced that each set_fixmap() should be
paired with a clear_fixmap(). Any failure to follow the model would
result in a platform crash.

Unfortunately, the use of fixmap in the ACPI code was overlooked as it
is calling set_fixmap() but not clear_fixmap().

The function __acpi_os_map_table() is reworked so:
- We know before the mapping whether the fixmap region is big
enough for the mapping.
- It will fail if the fixmap is already in use. This is not a
change of behavior but clarifying the current expectation to avoid
hitting a BUG().

The function __acpi_os_unmap_table() will now call clear_fixmap().

Reported-by: Wei Xu 
Signed-off-by: Julien Grall 

---

The discussion on the original thread [1] suggested to also zap it on
x86. This is technically not necessary today, so it is left alone for
now.

I looked at making the fixmap code common but the indexes are inverted
between Arm and x86.

Changes in v2:
- Clarify the commit message
- Fix the size computation in __acpi_unmap_table()

[1] https://lore.kernel.org/xen-devel/5e26c935.9080...@hisilicon.com/
---
 xen/arch/arm/acpi/lib.c | 73 +++--
 1 file changed, 56 insertions(+), 17 deletions(-)

diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
index fcc186b03399..b755620e67b5 100644
--- a/xen/arch/arm/acpi/lib.c
+++ b/xen/arch/arm/acpi/lib.c
@@ -25,40 +25,79 @@
 #include 
 #include 
 
+static bool fixmap_inuse;
+
 char *__acpi_map_table(paddr_t phys, unsigned long size)
 {
-unsigned long base, offset, mapped_size;
-int idx;
+unsigned long base, offset;
+mfn_t mfn;
+unsigned int idx;
 
 /* No arch specific implementation after early boot */
 if ( system_state >= SYS_STATE_boot )
 return NULL;
 
 offset = phys & (PAGE_SIZE - 1);
-mapped_size = PAGE_SIZE - offset;
-set_fixmap(FIXMAP_ACPI_BEGIN, maddr_to_mfn(phys), PAGE_HYPERVISOR);
-base = FIXMAP_ADDR(FIXMAP_ACPI_BEGIN);
+base = FIXMAP_ADDR(FIXMAP_ACPI_BEGIN) + offset;
+
+/* Check the fixmap is big enough to map the region */
+if ( (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - base) < size )
+return NULL;
+
+/* With the fixmap, we can only map one region at the time */
+if ( fixmap_inuse )
+return NULL;
 
-/* Most cases can be covered by the below. */
+fixmap_inuse = true;
+
+size += offset;
+mfn = maddr_to_mfn(phys);
 idx = FIXMAP_ACPI_BEGIN;
-while ( mapped_size < size )
-{
-if ( ++idx > FIXMAP_ACPI_END )
-return NULL;/* cannot handle this */
-phys += PAGE_SIZE;
-set_fixmap(idx, maddr_to_mfn(phys), PAGE_HYPERVISOR);
-mapped_size += PAGE_SIZE;
-}
 
-return ((char *) base + offset);
+do {
+set_fixmap(idx, mfn, PAGE_HYPERVISOR);
+size -= min(size, (unsigned long)PAGE_SIZE);
+mfn = mfn_add(mfn, 1);
+idx++;
+} while ( size > 0 );
+
+return (char *)base;
 }
 
 bool __acpi_unmap_table(const void *ptr, unsigned long size)
 {
 vaddr_t vaddr = (vaddr_t)ptr;
+unsigned int idx;
+
+/* We are only handling fixmap address in the arch code */
+if ( (vaddr < FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) ||
+ (vaddr >= FIXMAP_ADDR(FIXMAP_ACPI_END)) )
+return false;
+
+/*
+ * __acpi_map_table() will always return a pointer in the first page
+ * for the ACPI fixmap region. The caller is expected to free with
+ * the same address.
+ */
+ASSERT((vaddr & PAGE_MASK) == FIXMAP_ADDR(FIXMAP_ACPI_BEGIN));
+
+/* The region allocated fit in the ACPI fixmap region. */
+ASSERT(size < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE - vaddr));
+ASSERT(fixmap_inuse);
+
+fixmap_inuse = false;
+
+size += vaddr - FIXMAP_ADDR(FIXMAP_ACPI_BEGIN);
+idx = FIXMAP_ACPI_BEGIN;
+
+do
+{
+clear_fixmap(idx);
+size -= min(size, (unsigned long)PAGE_SIZE);
+idx++;
+} while ( size > 0 );
 
-return ((vaddr >= FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) &&
-(vaddr < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE)));
+return true;
 }
 
 /* True to indicate PSCI 0.2+ is implemented */
-- 
2.17.1
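
The bookkeeping in the patch above (size the mapping up front, refuse a second
mapping while one is live, release on unmap) can be modelled without any page
tables. This is a hedged standalone sketch: ACPI_SLOTS, slots_needed(),
map_region() and unmap_region() are hypothetical names, no memory is actually
mapped, and unlike the patch the sketch rejects a zero-sized request.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE  4096ul
#define ACPI_SLOTS 64u          /* hypothetical number of ACPI fixmap slots */

static bool slots_in_use;       /* plays the role of fixmap_inuse */

/* Number of page-sized slots needed to cover [phys, phys + size). */
static unsigned int slots_needed(unsigned long phys, unsigned long size)
{
    unsigned long offset = phys & (PAGE_SIZE - 1);

    return (unsigned int)((offset + size + PAGE_SIZE - 1) / PAGE_SIZE);
}

/* Returns the number of slots that would be mapped, or 0 on failure. */
static unsigned int map_region(unsigned long phys, unsigned long size)
{
    unsigned int n = slots_needed(phys, size);

    /* Check the size before touching anything, then the single-user rule. */
    if ( size == 0 || n > ACPI_SLOTS || slots_in_use )
        return 0;

    slots_in_use = true;

    return n;
}

/* Every successful map_region() must be paired with one unmap_region(). */
static void unmap_region(void)
{
    assert(slots_in_use);
    slots_in_use = false;
}
```

Note how a 32-byte table that straddles a page boundary needs two slots,
which is why the in-page offset has to be added to the size before counting
pages.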

[PATCH v2 4/7] xen/arm: Introduce fw_unreserved_regions() and use it

From: Julien Grall 

Since commit 6e3e77120378 "xen/arm: setup: Relocate the Device-Tree
later on in the boot", the device-tree will not be kept mapped when
using ACPI.

However, a few places are calling dt_unreserved_regions() which expects
a valid DT. This will lead to a crash.

As the DT should not be used for ACPI (other than for detecting the
modules), a new function fw_unreserved_regions() is introduced.

It will behave the same way on DT systems. On ACPI systems, it will
unreserve the whole region.

Take the opportunity to clarify that bootinfo.reserved_mem is only used
when booting using Device-Tree.

Signed-off-by: Julien Grall 

---

Is there any region we should exclude on ACPI?

Changes in v2:
- Add a comment on top of bootinfo.reserved_mem.
---
 xen/arch/arm/kernel.c   |  2 +-
 xen/arch/arm/setup.c| 22 +-
 xen/include/asm-arm/setup.h |  3 ++-
 3 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/xen/arch/arm/kernel.c b/xen/arch/arm/kernel.c
index 032923853f2c..ab78689ed2a6 100644
--- a/xen/arch/arm/kernel.c
+++ b/xen/arch/arm/kernel.c
@@ -307,7 +307,7 @@ static __init int kernel_decompress(struct bootmodule *mod)
  * Free the original kernel, update the pointers to the
  * decompressed kernel
  */
-dt_unreserved_regions(addr, addr + size, init_domheap_pages, 0);
+fw_unreserved_regions(addr, addr + size, init_domheap_pages, 0);
 
 return 0;
 }
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 35e5bee04efa..7fcff9af2a7e 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -196,8 +196,9 @@ static void __init processor_id(void)
 processor_setup();
 }
 
-void __init dt_unreserved_regions(paddr_t s, paddr_t e,
-  void (*cb)(paddr_t, paddr_t), int first)
+static void __init dt_unreserved_regions(paddr_t s, paddr_t e,
+ void (*cb)(paddr_t, paddr_t),
+ int first)
 {
 int i, nr = fdt_num_mem_rsv(device_tree_flattened);
 
@@ -244,6 +245,17 @@ void __init dt_unreserved_regions(paddr_t s, paddr_t e,
 cb(s, e);
 }
 
+void __init fw_unreserved_regions(paddr_t s, paddr_t e,
+  void (*cb)(paddr_t, paddr_t), int first)
+{
+if ( acpi_disabled )
+dt_unreserved_regions(s, e, cb, first);
+else
+cb(s, e);
+}
+
+
+
 struct bootmodule __init *add_boot_module(bootmodule_kind kind,
   paddr_t start, paddr_t size,
   bool domU)
@@ -405,7 +417,7 @@ void __init discard_initial_modules(void)
  !mfn_valid(maddr_to_mfn(e)) )
 continue;
 
-dt_unreserved_regions(s, e, init_domheap_pages, 0);
+fw_unreserved_regions(s, e, init_domheap_pages, 0);
 }
 
 mi->nr_mods = 0;
@@ -712,7 +724,7 @@ static void __init setup_mm(void)
 n = mfn_to_maddr(mfn_add(xenheap_mfn_start, xenheap_pages));
 }
 
-dt_unreserved_regions(s, e, init_boot_pages, 0);
+fw_unreserved_regions(s, e, init_boot_pages, 0);
 
 s = n;
 }
@@ -765,7 +777,7 @@ static void __init setup_mm(void)
 if ( e > bank_end )
 e = bank_end;
 
-dt_unreserved_regions(s, e, init_boot_pages, 0);
+fw_unreserved_regions(s, e, init_boot_pages, 0);
 s = n;
 }
 }
diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index 2f8f24e286ed..28bf622aa196 100644
--- a/xen/include/asm-arm/setup.h
+++ b/xen/include/asm-arm/setup.h
@@ -67,6 +67,7 @@ struct bootcmdlines {
 
 struct bootinfo {
 struct meminfo mem;
+/* The reserved regions are only used when booting using Device-Tree */
 struct meminfo reserved_mem;
 struct bootmodules modules;
 struct bootcmdlines cmdlines;
@@ -96,7 +97,7 @@ int construct_dom0(struct domain *d);
 void create_domUs(void);
 
 void discard_initial_modules(void);
-void dt_unreserved_regions(paddr_t s, paddr_t e,
+void fw_unreserved_regions(paddr_t s, paddr_t e,
void (*cb)(paddr_t, paddr_t), int first);
 
 size_t boot_fdt_info(const void *fdt, paddr_t paddr);
-- 
2.17.1
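
The difference between the two boot modes described in the patch above can be
illustrated with a standalone sketch. It is not the Xen implementation: the
reserved-region list is a plain sorted array instead of the flattened
device-tree, and reserved, count_released(), dt_style_unreserved() and
fw_style_unreserved() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t paddr_t;
typedef void (*region_cb)(paddr_t s, paddr_t e);

/* A reserved range [start, end). */
struct reserved {
    paddr_t start, end;
};

static paddr_t released_bytes;

/* Example callback: count how many bytes were handed back. */
static void count_released(paddr_t s, paddr_t e)
{
    assert(s < e);
    released_bytes += e - s;
}

/*
 * Call cb() on every part of [s, e) that is not reserved.
 * Assumes rsv[] is sorted and non-overlapping.
 */
static void dt_style_unreserved(paddr_t s, paddr_t e,
                                const struct reserved *rsv, unsigned int nr,
                                region_cb cb)
{
    unsigned int i;

    for ( i = 0; i < nr && s < e; i++ )
    {
        if ( rsv[i].end <= s || rsv[i].start >= e )
            continue;               /* no overlap with [s, e) */

        if ( rsv[i].start > s )
            cb(s, rsv[i].start);    /* free the gap before the reservation */

        s = rsv[i].end;             /* skip over the reservation */
    }

    if ( s < e )
        cb(s, e);
}

/* With ACPI there is no reserved list to consult: release everything. */
static void fw_style_unreserved(bool acpi_disabled, paddr_t s, paddr_t e,
                                const struct reserved *rsv, unsigned int nr,
                                region_cb cb)
{
    if ( acpi_disabled )
        dt_style_unreserved(s, e, rsv, nr, cb);
    else
        cb(s, e);
}
```

The open question in the patch notes ("Is there any region we should exclude
on ACPI?") corresponds to the else branch, which currently releases the whole
range unconditionally.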

[PATCH v2 3/7] xen/arm: Check if the platform is not using ACPI before initializing Dom0less

From: Julien Grall 

Dom0less requires a device-tree. However, since commit 6e3e77120378
"xen/arm: setup: Relocate the Device-Tree later on in the boot", the
device-tree will not get unflattened when using ACPI.

This will lead to a crash during boot.

Given the complexity of setting up dom0less with ACPI (for instance how to
assign devices?), we should skip any code related to Dom0less when using
ACPI.

Signed-off-by: Julien Grall 
Tested-by: Rahul Singh 
Reviewed-by: Rahul Singh 
Reviewed-by: Stefano Stabellini 

---
Changes in v2:
- Add Rahul's tested-by and reviewed-by
- Add Stefano's reviewed-by
---
 xen/arch/arm/setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index f16b33fa87a2..35e5bee04efa 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -987,7 +987,8 @@ void __init start_xen(unsigned long boot_phys_offset,
 
 system_state = SYS_STATE_active;
 
-create_domUs();
+if ( acpi_disabled )
+create_domUs();
 
 domain_unpause_by_systemcontroller(dom0);
 
-- 
2.17.1

[PATCH v2 1/7] xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

From: Julien Grall 

The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
while the __acpi_os_{un,}map_memory() are meant to be arch-specific.

Currently, the former are still containing x86 specific code.

To avoid this rather strange split, the generic helpers are reworked so
they are arch-agnostic. This requires the introduction of a new helper
__acpi_os_unmap_memory() that will undo any mapping done by
__acpi_os_map_memory().

Currently, the arch-helper for unmap is basically a no-op so it only
returns whether the mapping was arch specific. But this will change
in the future.

Note that the x86 version of acpi_os_map_memory() was already able to
map the 1MB region. Hence why there is no addition of new code.

Signed-off-by: Julien Grall 
Reviewed-by: Rahul Singh 
Tested-by: Rahul Singh 

---
Changes in v2:
- Constify ptr in __acpi_unmap_table()
- Coding style fixes
- Fix build on arm64
- Use PAGE_OFFSET() rather than open-coding it
- Add Rahul's tested-by and reviewed-by
---
 xen/arch/arm/acpi/lib.c | 12 
 xen/arch/x86/acpi/lib.c | 18 ++
 xen/drivers/acpi/osl.c  | 34 ++
 xen/include/xen/acpi.h  |  1 +
 4 files changed, 49 insertions(+), 16 deletions(-)

diff --git a/xen/arch/arm/acpi/lib.c b/xen/arch/arm/acpi/lib.c
index 4fc6e17322c1..fcc186b03399 100644
--- a/xen/arch/arm/acpi/lib.c
+++ b/xen/arch/arm/acpi/lib.c
@@ -30,6 +30,10 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 unsigned long base, offset, mapped_size;
 int idx;
 
+/* No arch specific implementation after early boot */
+if ( system_state >= SYS_STATE_boot )
+return NULL;
+
 offset = phys & (PAGE_SIZE - 1);
 mapped_size = PAGE_SIZE - offset;
 set_fixmap(FIXMAP_ACPI_BEGIN, maddr_to_mfn(phys), PAGE_HYPERVISOR);
@@ -49,6 +53,14 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
 return ((char *) base + offset);
 }
 
+bool __acpi_unmap_table(const void *ptr, unsigned long size)
+{
+vaddr_t vaddr = (vaddr_t)ptr;
+
+return ((vaddr >= FIXMAP_ADDR(FIXMAP_ACPI_BEGIN)) &&
+(vaddr < (FIXMAP_ADDR(FIXMAP_ACPI_END) + PAGE_SIZE)));
+}
+
 /* True to indicate PSCI 0.2+ is implemented */
 bool __init acpi_psci_present(void)
 {
diff --git a/xen/arch/x86/acpi/lib.c b/xen/arch/x86/acpi/lib.c
index 265b9ad81905..a22414a05c13 100644
--- a/xen/arch/x86/acpi/lib.c
+++ b/xen/arch/x86/acpi/lib.c
@@ -46,6 +46,10 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
if ((phys + size) <= (1 * 1024 * 1024))
return __va(phys);
 
+   /* No further arch specific implementation after early boot */
+   if (system_state >= SYS_STATE_boot)
+   return NULL;
+
offset = phys & (PAGE_SIZE - 1);
mapped_size = PAGE_SIZE - offset;
set_fixmap(FIX_ACPI_END, phys);
@@ -66,6 +70,20 @@ char *__acpi_map_table(paddr_t phys, unsigned long size)
return ((char *) base + offset);
 }
 
+bool __acpi_unmap_table(const void *ptr, unsigned long size)
+{
+   unsigned long vaddr = (unsigned long)ptr;
+
+   if ((vaddr >= DIRECTMAP_VIRT_START) &&
+   (vaddr < DIRECTMAP_VIRT_END)) {
+   ASSERT(!((__pa(ptr) + size - 1) >> 20));
+   return true;
+   }
+
+   return ((vaddr >= __fix_to_virt(FIX_ACPI_END)) &&
+   (vaddr < (__fix_to_virt(FIX_ACPI_BEGIN) + PAGE_SIZE)));
+}
+
 unsigned int acpi_get_processor_id(unsigned int cpu)
 {
unsigned int acpiid, apicid;
diff --git a/xen/drivers/acpi/osl.c b/xen/drivers/acpi/osl.c
index 4c8bb7839eda..389505f78666 100644
--- a/xen/drivers/acpi/osl.c
+++ b/xen/drivers/acpi/osl.c
@@ -92,27 +92,29 @@ acpi_physical_address __init acpi_os_get_root_pointer(void)
 void __iomem *
 acpi_os_map_memory(acpi_physical_address phys, acpi_size size)
 {
-   if (system_state >= SYS_STATE_boot) {
-   mfn_t mfn = _mfn(PFN_DOWN(phys));
-   unsigned int offs = phys & (PAGE_SIZE - 1);
-
-   /* The low first Mb is always mapped on x86. */
-   if (IS_ENABLED(CONFIG_X86) && !((phys + size - 1) >> 20))
-   return __va(phys);
-   return __vmap(&mfn, PFN_UP(offs + size), 1, 1,
- ACPI_MAP_MEM_ATTR, VMAP_DEFAULT) + offs;
-   }
-   return __acpi_map_table(phys, size);
+   void *ptr;
+   mfn_t mfn = _mfn(PFN_DOWN(phys));
+   unsigned int offs = PAGE_OFFSET(phys);
+
+   /* Try the arch specific implementation first */
+   ptr = __acpi_map_table(phys, size);
+   if (ptr)
+   return ptr;
+
+   /* No common implementation for early boot map */
+   if (unlikely(system_state < SYS_STATE_boot))
+   return NULL;
+
+   ptr = __vmap(&mfn, PFN_UP(offs + size), 1, 1,
+ACPI_MAP_MEM_ATTR, VMAP_DEFAULT);
+
+   return !ptr ? NULL : (ptr + offs);
 }
 
 void acpi_os_unmap_mem

Re: [PATCH v2 11/11] x86/vpt: introduce a per-vPT lock

On 30.09.2020 15:30, Roger Pau Monné wrote:
> On Wed, Sep 30, 2020 at 12:41:08PM +0200, Roger Pau Monne wrote:
>> Introduce a per virtual timer lock that replaces the existing per-vCPU
>> and per-domain vPT locks. Since virtual timers are no longer assigned
>> or migrated between vCPUs the locking can be simplified to a
>> in-structure spinlock that protects all the fields.
>>
>> This requires introducing a helper to initialize the spinlock, and
>> that could be used to initialize other virtual timer fields in the
>> future.
>>
>> Signed-off-by: Roger Pau Monné 
> 
> Just realized I had the following uncommitted chunk that should have
> been part of this patch, nothing critical but the tm_lock can now be
> removed.

And then
Reviewed-by: Jan Beulich 

Jan

Re: [PATCH v4 2/4] xen: Introduce HAS_M2P config and use to protect mfn_to_gmfn call


Hi,

On 26/09/2020 14:00, Julien Grall wrote:

Hi Andrew,

On 22/09/2020 19:56, Andrew Cooper wrote:

On 22/09/2020 19:20, Julien Grall wrote:

+
   #endif /* __ASM_DOMAIN_H__ */
     /*
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 5c5e55ebcb76..7564df5e8374 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -136,6 +136,12 @@ struct xen_domctl_getdomaininfo {
   uint64_aligned_t outstanding_pages;
   uint64_aligned_t shr_pages;
   uint64_aligned_t paged_pages;
+#define XEN_INVALID_SHARED_INFO_FRAME (~(uint64_t)0)


We've already got INVALID_GFN as a constant used in the interface.  Let's
not proliferate more.


This was my original approach (see [1]) but this was reworked because:
    1) INVALID_GFN is not technically defined in the ABI. So the
toolstack has to hardcode the value in the check.
    2) The value is different between 32-bit and 64-bit Arm as
INVALID_GFN is defined as an unsigned long.

So providing a new define is the right way to go.


There is nothing special about this field.  It should not have a
dedicated constant, when a general one is the appropriate one to use.

libxl already has LIBXL_INVALID_GFN, which is already used.


Right, but that implies it cannot be used by libxc as this would be a
layer violation.




If this isn't good enough, then the right thing to do is put a proper
INVALID_GFN in the tools interface.


That would be nice but I can see some issues on x86 given that we don't
consistently define a GFN in the interface as a 64-bit value.


So would you still be happy to consider introducing XEN_INVALID_GFN in 
the interface with some caveats?


Gentle ping. @Andrew, are you happy with this approach?

Cheers,

--
Julien Grall

Re: [PATCH v2 1/7] xen/acpi: Rework acpi_os_map_memory() and acpi_os_unmap_memory()

On 23.10.2020 17:41, Julien Grall wrote:
> From: Julien Grall 
> 
> The functions acpi_os_{un,}map_memory() are meant to be arch-agnostic
> while the __acpi_os_{un,}map_memory() are meant to be arch-specific.
> 
> Currently, the former are still containing x86 specific code.
> 
> To avoid this rather strange split, the generic helpers are reworked so
> they are arch-agnostic. This requires the introduction of a new helper
> __acpi_os_unmap_memory() that will undo any mapping done by
> __acpi_os_map_memory().
> 
> Currently, the arch-helper for unmap is basically a no-op so it only
> returns whether the mapping was arch specific. But this will change
> in the future.
> 
> Note that the x86 version of acpi_os_map_memory() was already able to
> map the 1MB region. Hence why there is no addition of new code.
> 
> Signed-off-by: Julien Grall 
> Reviewed-by: Rahul Singh 
> Tested-by: Rahul Singh 

Non-Arm parts
Reviewed-by: Jan Beulich 

Jan

Re: [PATCH] PCI: drop dead pci_lock_*pdev() declarations


Hi Jan,

On 23/10/2020 09:02, Jan Beulich wrote:

They have no definitions, and hence users, anywhere.

Signed-off-by: Jan Beulich 


Acked-by: Julien Grall 

Cheers,



--- a/xen/include/xen/pci.h
+++ b/xen/include/xen/pci.h
@@ -155,9 +155,6 @@ bool_t pci_device_detect(u16 seg, u8 bus
  int scan_pci_devices(void);
  enum pdev_type pdev_type(u16 seg, u8 bus, u8 devfn);
  int find_upstream_bridge(u16 seg, u8 *bus, u8 *devfn, u8 *secbus);
-struct pci_dev *pci_lock_pdev(int seg, int bus, int devfn);
-struct pci_dev *pci_lock_domain_pdev(
-struct domain *, int seg, int bus, int devfn);
  
  void setup_hwdom_pci_devices(struct domain *,

  int (*)(u8 devfn, struct pci_dev *));



--
Julien Grall

Re: [PATCH] SUPPORT: Add linux device model stubdom to Toolstack


Hi Jason,

On 20/10/2020 14:27, Jason Andryuk wrote:

On Tue, May 26, 2020 at 10:13 AM Ian Jackson  wrote:


Jason Andryuk writes ("[PATCH] SUPPORT: Add linux device model stubdom to Toolstack"):

Add qemu-xen linux device model stubdomain to the Toolstack section as a
Tech Preview.


Acked-by: Ian Jackson 


Hi, this never got applied.  It should go to staging and 4.14.


Sorry this fell through the cracks. I have committed it with the 
existing Acks.


Regarding 4.14, this would need to go through a backport request.

Cheers,

--
Julien Grall

[OSSTEST PATCH] host reuse fixes: Properly clear out old static tasks from history

The algorithm for clearing out old lifecycle entries was wrong: it
would delete all entries for non-live tasks.

In practice this would properly remove all the old entries for
non-static tasks, since ownd tasks typically don't release things
until the task ends (and it becomes non-live).  And it wouldn't remove
more than it should do unless some now-not-live task had an allocation
overlapping with us, which is not supposed to be possible if we are
doing a host wipe.  But it would not remove static tasks ever, since
they are always live.

Change to a completely different algorithm:

 * Check that only us (ie, $ttaskid) has (any shares of) this host
   allocated.  There's a function resource_check_allocated_core which
   already does this and since we're conceptually part of Executive
   it is proper for us to call it.  This is just a sanity check.

 * Delete all lifecycle entries predating the first entry made by
   us.  (We could just delete all entries other than ours, but in
   theory maybe some future code could result in a situation where
   someone else could have had another share briefly at some point.)

This removes old junk from the "Tasks that could have affected" in
reports.
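
For illustration only (not part of the patch, which is Perl and SQL): a
minimal C sketch of the second step above, assuming the lifecycle entries
are held in an array ordered oldest first. The struct, its field names and
the function name are hypothetical. The sanity check from the first step is
not modelled, and leaving the history untouched when we have no entry at all
is a choice made for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory model of one lifecycle row. */
struct lifecycle_entry {
    int lcseq;   /* sequence number, increasing with time */
    int taskid;  /* task that made the entry */
};

/*
 * Drop every entry that predates the first entry made by ttaskid.
 * Entries are compacted in place; the number of retained entries is
 * returned. Entries made by other tasks after our first entry are kept.
 */
size_t prune_before_first_own(struct lifecycle_entry *e, size_t n,
                              int ttaskid)
{
    size_t first = 0, kept = 0, i;

    assert(e != NULL || n == 0);

    /* Find the first entry made by us. */
    while (first < n && e[first].taskid != ttaskid)
        first++;

    /* No entry of ours: delete nothing (sketch-specific choice). */
    if (first == n)
        return n;

    /* Shift the surviving entries to the front of the array. */
    for (i = first; i < n; i++)
        e[kept++] = e[i];

    return kept;
}
```

Unlike "delete all entries for non-live tasks", this rule does not depend on
whether a task is live, so entries from static tasks are pruned too.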

Signed-off-by: Ian Jackson 
---
 Osstest/JobDB/Executive.pm | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/Osstest/JobDB/Executive.pm b/Osstest/JobDB/Executive.pm
index 1dcf55ff..097c8d75 100644
--- a/Osstest/JobDB/Executive.pm
+++ b/Osstest/JobDB/Executive.pm
@@ -515,15 +515,19 @@ sub jobdb_host_update_lifecycle_info ($$$) { #method
 
 if ($mode eq 'wiped') {
db_retry($flight, [qw(running)], $dbh_tests,[], sub {
-$dbh_tests->do(<{Others};
+   $dbh_tests->do(<

[PATCH 05/25] libxl: s/detatched/detached in libxl_pci.c

From: Paul Durrant 

Simple spelling correction. Purely cosmetic fix.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 58242b5b94..3936d60a14 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1860,7 +1860,7 @@ static void pci_remove_qmp_query_cb(libxl__egc *egc,
 libxl__ev_qmp *qmp, const libxl__json_object *response, int rc);
 static void pci_remove_timeout(libxl__egc *egc,
 libxl__ev_time *ev, const struct timeval *requested_abs, int rc);
-static void pci_remove_detatched(libxl__egc *egc,
+static void pci_remove_detached(libxl__egc *egc,
 pci_remove_state *prs, int rc);
 static void pci_remove_stubdom_done(libxl__egc *egc,
 libxl__ao_device *aodev);
@@ -1973,7 +1973,7 @@ skip1:
 skip_irq:
 rc = 0;
 out_fail:
-pci_remove_detatched(egc, prs, rc); /* must be last */
+pci_remove_detached(egc, prs, rc); /* must be last */
 }
 
 static void pci_remove_qemu_trad_watch_state_cb(libxl__egc *egc,
@@ -1997,7 +1997,7 @@ static void pci_remove_qemu_trad_watch_state_cb(libxl__egc *egc,
 rc = qemu_pci_remove_xenstore(gc, domid, pci, prs->force);
 
 out:
-pci_remove_detatched(egc, prs, rc);
+pci_remove_detached(egc, prs, rc);
 }
 
 static void pci_remove_qmp_device_del(libxl__egc *egc,
@@ -2023,7 +2023,7 @@ static void pci_remove_qmp_device_del(libxl__egc *egc,
 return;
 
 out:
-pci_remove_detatched(egc, prs, rc);
+pci_remove_detached(egc, prs, rc);
 }
 
 static void pci_remove_qmp_device_del_cb(libxl__egc *egc,
@@ -2046,7 +2046,7 @@ static void pci_remove_qmp_device_del_cb(libxl__egc *egc,
 return;
 
 out:
-pci_remove_detatched(egc, prs, rc);
+pci_remove_detached(egc, prs, rc);
 }
 
 static void pci_remove_qmp_retry_timer_cb(libxl__egc *egc, libxl__ev_time *ev,
@@ -2062,7 +2062,7 @@ static void pci_remove_qmp_retry_timer_cb(libxl__egc *egc, libxl__ev_time *ev,
 return;
 
 out:
-pci_remove_detatched(egc, prs, rc);
+pci_remove_detached(egc, prs, rc);
 }
 
 static void pci_remove_qmp_query_cb(libxl__egc *egc,
@@ -2122,7 +2122,7 @@ static void pci_remove_qmp_query_cb(libxl__egc *egc,
 }
 
 out:
-pci_remove_detatched(egc, prs, rc); /* must be last */
+pci_remove_detached(egc, prs, rc); /* must be last */
 }
 
 static void pci_remove_timeout(libxl__egc *egc, libxl__ev_time *ev,
@@ -2141,12 +2141,12 @@ static void pci_remove_timeout(libxl__egc *egc, libxl__ev_time *ev,
 /* If we timed out, we might still want to keep destroying the device
  * (when force==true), so let the next function decide what to do on
  * error */
-pci_remove_detatched(egc, prs, rc);
+pci_remove_detached(egc, prs, rc);
 }
 
-static void pci_remove_detatched(libxl__egc *egc,
- pci_remove_state *prs,
- int rc)
+static void pci_remove_detached(libxl__egc *egc,
+pci_remove_state *prs,
+int rc)
 {
 STATE_AO_GC(prs->aodev->ao);
 int stubdomid = 0;
-- 
2.11.0

[PATCH 00/25] xl / libxl: named PCI pass-through devices

From: Paul Durrant 

This series adds support for naming devices added to the assignable list
and then using a name (instead of a BDF) for convenience when attaching
a device to a domain.

The first 15 patches are cleanup. The remaining 10 modify documentation
and add the new functionality.

Paul Durrant (25):
  xl / libxl: s/pcidev/pci and remove DEFINE_DEVICE_TYPE_STRUCT_X
  libxl: use LIBXL_DEFINE_DEVICE_LIST for pci devices
  libxl: use LIBXL_DEFINE_DEVICE_LIST for nic devices
  libxl: s/domainid/domid/g in libxl_pci.c
  libxl: s/detatched/detached in libxl_pci.c
  libxl: remove extraneous arguments to do_pci_remove() in libxl_pci.c
  libxl: stop using aodev->device_config in libxl__device_pci_add()...
  libxl: generalise 'driver_path' xenstore access functions in
libxl_pci.c
  libxl: remove unnecessary check from libxl__device_pci_add()
  libxl: remove get_all_assigned_devices() from libxl_pci.c
  libxl: make sure callers of libxl_device_pci_list() free the list
after use
  libxl: add libxl_device_pci_assignable_list_free()...
  libxl: use COMPARE_PCI() macro is_pci_in_array()...
  libxl: add/recover 'rdm_policy' to/from PCI backend in xenstore
  libxl: Make sure devices added by pci-attach are reflected in the
config
  docs/man: extract documentation of PCI_SPEC_STRING from the xl.cfg
manpage...
  docs/man: improve documentation of PCI_SPEC_STRING...
  docs/man: fix xl(1) documentation for 'pci' operations
  libxl: introduce 'libxl_pci_bdf' in the idl...
  libxlu: introduce xlu_pci_parse_spec_string()
  libxl: modify
libxl_device_pci_assignable_add/remove/list/list_free()...
  docs/man: modify xl(1) in preparation for naming of assignable devices
  xl / libxl: support naming of assignable devices
  docs/man: modify xl-pci-configuration(5) to add 'name' field to
PCI_SPEC_STRING
  xl / libxl: support 'xl pci-attach/detach' by name

 docs/man/xl-pci-configuration.5.pod  |  218 +++
 docs/man/xl.1.pod.in |   39 +-
 docs/man/xl.cfg.5.pod.in |   68 +--
 tools/golang/xenlight/helpers.gen.go |   77 ++-
 tools/golang/xenlight/types.gen.go   |8 +-
 tools/include/libxl.h|   67 ++-
 tools/include/libxlutil.h|8 +-
 tools/libs/light/libxl_create.c  |6 +-
 tools/libs/light/libxl_dm.c  |   18 +-
 tools/libs/light/libxl_internal.h|   53 +-
 tools/libs/light/libxl_nic.c |   19 +-
 tools/libs/light/libxl_pci.c | 1072 ++
 tools/libs/light/libxl_types.idl |   19 +-
 tools/libs/util/libxlu_pci.c |  359 ++--
 tools/ocaml/libs/xl/xenlight_stubs.c |   19 +-
 tools/xl/xl_cmdtable.c   |   16 +-
 tools/xl/xl_parse.c  |   30 +-
 tools/xl/xl_pci.c|  164 +++---
 tools/xl/xl_sxp.c|   12 +-
 19 files changed, 1337 insertions(+), 935 deletions(-)
 create mode 100644 docs/man/xl-pci-configuration.5.pod
---
Cc: Anthony PERARD 
Cc: Christian Lindig 
Cc: David Scott 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Nick Rosbrook 
Cc: Wei Liu 
-- 
2.11.0

[PATCH 03/25] libxl: use LIBXL_DEFINE_DEVICE_LIST for nic devices

From: Paul Durrant 

Remove open-coded definitions of libxl_device_nic_list() and
libxl_device_nic_list_free().
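
For illustration only: a self-contained sketch of how a single macro can
generate the list/list_free pair for a device type by token pasting. This is
not the libxl macro; every demo_* name and DEMO_DEFINE_DEVICE_LIST are
hypothetical, and the generic helper here just allocates a zeroed array
rather than reading xenstore.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical device type standing in for libxl_device_nic. */
typedef struct { int devid; } demo_nic;

/* Generic helper shared by all device types in this sketch. */
void *demo_device_list(size_t elem_size, int count, int *num)
{
    void *list = NULL;

    *num = 0;
    if (count > 0 && (list = calloc((size_t)count, elem_size)) != NULL)
        *num = count;
    return list;
}

/* Generate demo_<type>_list() and demo_<type>_list_free() for one type. */
#define DEMO_DEFINE_DEVICE_LIST(type)                                     \
    demo_##type *demo_##type##_list(int count, int *num)                  \
    {                                                                     \
        return (demo_##type *)demo_device_list(sizeof(demo_##type),       \
                                               count, num);               \
    }                                                                     \
    void demo_##type##_list_free(demo_##type *list, int num)              \
    {                                                                     \
        (void)num; /* elements own no resources in this sketch */         \
        free(list);                                                       \
    }

/* One line replaces two open-coded functions. */
DEMO_DEFINE_DEVICE_LIST(nic)
```

With a macro of this shape, each device type needs only the one-line
invocation instead of its own copy of the wrapper functions.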

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 

This patch is slightly tangential. I just happened to notice the inefficiency
while looking at code for various device types.
---
 tools/libs/light/libxl_nic.c | 19 +--
 1 file changed, 1 insertion(+), 18 deletions(-)

diff --git a/tools/libs/light/libxl_nic.c b/tools/libs/light/libxl_nic.c
index 0e5d120ae9..a44058f929 100644
--- a/tools/libs/light/libxl_nic.c
+++ b/tools/libs/light/libxl_nic.c
@@ -403,24 +403,6 @@ static int libxl__nic_from_xenstore(libxl__gc *gc, const char *libxl_path,
 return rc;
 }
 
-libxl_device_nic *libxl_device_nic_list(libxl_ctx *ctx, uint32_t domid, int *num)
-{
-libxl_device_nic *r;
-
-GC_INIT(ctx);
-
-r = libxl__device_list(gc, &libxl__nic_devtype, domid, num);
-
-GC_FREE;
-
-return r;
-}
-
-void libxl_device_nic_list_free(libxl_device_nic* list, int num)
-{
-libxl__device_list_free(&libxl__nic_devtype, list, num);
-}
-
 int libxl_device_nic_getinfo(libxl_ctx *ctx, uint32_t domid,
   const libxl_device_nic *nic,
   libxl_nicinfo *nicinfo)
@@ -527,6 +509,7 @@ LIBXL_DEFINE_DEVID_TO_DEVICE(nic)
 LIBXL_DEFINE_DEVICE_ADD(nic)
 LIBXL_DEFINE_DEVICES_ADD(nic)
 LIBXL_DEFINE_DEVICE_REMOVE(nic)
+LIBXL_DEFINE_DEVICE_LIST(nic)
 
 DEFINE_DEVICE_TYPE_STRUCT(nic, VIF,
 .update_config = libxl_device_nic_update_config,
-- 
2.11.0

[PATCH 07/25] libxl: stop using aodev->device_config in libxl__device_pci_add()...

From: Paul Durrant 

... to hold a pointer to the device.

There is already a 'pci' field in 'pci_add_state' so simply use that from
the start. This also allows the 'pci' (#3) argument to be dropped from
do_pci_add().

NOTE: This patch also changes the type of the 'pci_domid' field in
  'pci_add_state' from 'int' to 'libxl_domid' which is more appropriate
  given what the field is used for.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 97889fda49..b8d8cc6a69 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1055,7 +1055,7 @@ typedef struct pci_add_state {
 libxl__ev_qmp qmp;
 libxl__ev_time timeout;
 libxl_device_pci *pci;
-int pci_domid;
+libxl_domid pci_domid;
 } pci_add_state;
 
 static void pci_add_qemu_trad_watch_state_cb(libxl__egc *egc,
@@ -1072,7 +1072,6 @@ static void pci_add_dm_done(libxl__egc *,
 
 static void do_pci_add(libxl__egc *egc,
libxl_domid domid,
-   libxl_device_pci *pci,
pci_add_state *pas)
 {
 STATE_AO_GC(pas->aodev->ao);
@@ -1082,7 +1081,6 @@ static void do_pci_add(libxl__egc *egc,
 /* init pci_add_state */
 libxl__xswait_init(&pas->xswait);
 libxl__ev_qmp_init(&pas->qmp);
-pas->pci = pci;
 pas->pci_domid = domid;
 libxl__ev_time_init(&pas->timeout);
 
@@ -1544,13 +1542,10 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t domid,
 int stubdomid = 0;
 pci_add_state *pas;
 
-/* Store *pci to be used by callbacks */
-aodev->device_config = pci;
-aodev->device_type = &libxl__pci_devtype;
-
 GCNEW(pas);
 pas->aodev = aodev;
 pas->domid = domid;
+pas->pci = pci;
 pas->starting = starting;
 pas->callback = device_pci_add_stubdom_done;
 
@@ -1604,9 +1599,10 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t domid,
 GCNEW(pci_s);
 libxl_device_pci_init(pci_s);
 libxl_device_pci_copy(CTX, pci_s, pci);
+pas->pci = pci_s;
 pas->callback = device_pci_add_stubdom_wait;
 
-do_pci_add(egc, stubdomid, pci_s, pas); /* must be last */
+do_pci_add(egc, stubdomid, pas); /* must be last */
 return;
 }
 
@@ -1661,9 +1657,8 @@ static void device_pci_add_stubdom_done(libxl__egc *egc,
 int i;
 
 /* Convenience aliases */
-libxl__ao_device *aodev = pas->aodev;
 libxl_domid domid = pas->domid;
-libxl_device_pci *pci = aodev->device_config;
+libxl_device_pci *pci = pas->pci;
 
 if (rc) goto out;
 
@@ -1698,7 +1693,7 @@ static void device_pci_add_stubdom_done(libxl__egc *egc,
 pci->vdevfn = orig_vdev;
 }
 pas->callback = device_pci_add_done;
-do_pci_add(egc, domid, pci, pas); /* must be last */
+do_pci_add(egc, domid, pas); /* must be last */
 return;
 }
 }
@@ -1714,7 +1709,7 @@ static void device_pci_add_done(libxl__egc *egc,
 EGC_GC;
 libxl__ao_device *aodev = pas->aodev;
 libxl_domid domid = pas->domid;
-libxl_device_pci *pci = aodev->device_config;
+libxl_device_pci *pci = pas->pci;
 
 if (rc) {
 LOGD(ERROR, domid,
-- 
2.11.0

[PATCH 02/25] libxl: use LIBXL_DEFINE_DEVICE_LIST for pci devices

From: Paul Durrant 

Remove open coded definition of libxl_device_pci_list().

NOTE: Using the macro also defines libxl_device_pci_list_free() so a prototype
  for it is added. Subsequent patches will make use of it.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/include/libxl.h|  7 +++
 tools/libs/light/libxl_pci.c | 27 ++-
 2 files changed, 9 insertions(+), 25 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index fbe4c81ba5..ee52d3cf7e 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -452,6 +452,12 @@
 #define LIBXL_HAVE_CONFIG_PCIS 1
 
 /*
+ * LIBXL_HAVE_DEVICE_PCI_LIST_FREE indicates that the
+ * libxl_device_pci_list_free() function is defined.
+ */
+#define LIBXL_HAVE_DEVICE_PCI_LIST_FREE 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
@@ -2321,6 +2327,7 @@ int libxl_device_pci_destroy(libxl_ctx *ctx, uint32_t domid,
 
 libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid,
 int *num);
+void libxl_device_pci_list_free(libxl_device_pci* list, int num);
 
 /*
  * Turns the current process into a backend device service daemon
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 2ff1c64a31..515e74fe5a 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -2393,31 +2393,6 @@ static int libxl__device_pci_get_num(libxl__gc *gc, const char *be_path,
 return rc;
 }
 
-libxl_device_pci *libxl_device_pci_list(libxl_ctx *ctx, uint32_t domid, int *num)
-{
-GC_INIT(ctx);
-char *be_path;
-unsigned int n, i;
-libxl_device_pci *pcis = NULL;
-
-*num = 0;
-
-be_path = libxl__domain_device_backend_path(gc, 0, domid, 0,
-LIBXL__DEVICE_KIND_PCI);
-if (libxl__device_pci_get_num(gc, be_path, &n))
-goto out;
-
-pcis = calloc(n, sizeof(libxl_device_pci));
-
-for (i = 0; i < n; i++)
-libxl__device_pci_from_xs_be(gc, be_path, i, pcis + i);
-
-*num = n;
-out:
-GC_FREE;
-return pcis;
-}
-
 void libxl__device_pci_destroy_all(libxl__egc *egc, uint32_t domid,
libxl__multidev *multidev)
 {
@@ -2492,6 +2467,8 @@ static int libxl_device_pci_compare(const libxl_device_pci *d1,
 return COMPARE_PCI(d1, d2);
 }
 
+LIBXL_DEFINE_DEVICE_LIST(pci)
+
 #define libxl__device_pci_update_devid NULL
 
 DEFINE_DEVICE_TYPE_STRUCT(pci, PCI,
-- 
2.11.0

[PATCH 01/25] xl / libxl: s/pcidev/pci and remove DEFINE_DEVICE_TYPE_STRUCT_X

From: Paul Durrant 

The seemingly arbitrary use of 'pci' and 'pcidev' in the code in libxl_pci.c
is confusing and also compromises use of some macros used for other device
types. Indeed it seems that DEFINE_DEVICE_TYPE_STRUCT_X exists solely because
of this duality.

This patch purges use of 'pcidev' from the libxl code, allowing evaluation of
DEFINE_DEVICE_TYPE_STRUCT_X to be replaced with DEFINE_DEVICE_TYPE_STRUCT,
hence allowing removal of the former.

For consistency the xl and libs/util code is also modified, but in this case
it is purely cosmetic.

NOTE: Some of the more gross formatting errors (such as lack of spaces after
  keywords) that came into context have been fixed in libxl_pci.c.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Anthony PERARD 
---
 tools/include/libxl.h |  17 +-
 tools/libs/light/libxl_create.c   |   6 +-
 tools/libs/light/libxl_dm.c   |  18 +-
 tools/libs/light/libxl_internal.h |  45 ++-
 tools/libs/light/libxl_pci.c  | 582 +++---
 tools/libs/light/libxl_types.idl  |   2 +-
 tools/libs/util/libxlu_pci.c  |  36 +--
 tools/xl/xl_parse.c   |  28 +-
 tools/xl/xl_pci.c |  68 ++---
 tools/xl/xl_sxp.c |  12 +-
 10 files changed, 409 insertions(+), 405 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 1ea5b4f446..fbe4c81ba5 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -445,6 +445,13 @@
 #define LIBXL_HAVE_DISK_SAFE_REMOVE 1
 
 /*
+ * LIBXL_HAVE_CONFIG_PCIS indicates that the 'pcidevs' and 'num_pcidevs'
+ * fields in libxl_domain_config have been renamed to 'pcis' and 'num_pcis'
+ * respectively.
+ */
+#define LIBXL_HAVE_CONFIG_PCIS 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
@@ -2300,15 +2307,15 @@ int libxl_device_pvcallsif_destroy(libxl_ctx *ctx, uint32_t domid,
 
 /* PCI Passthrough */
 int libxl_device_pci_add(libxl_ctx *ctx, uint32_t domid,
- libxl_device_pci *pcidev,
+ libxl_device_pci *pci,
  const libxl_asyncop_how *ao_how)
  LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_device_pci_remove(libxl_ctx *ctx, uint32_t domid,
-libxl_device_pci *pcidev,
+libxl_device_pci *pci,
 const libxl_asyncop_how *ao_how)
 LIBXL_EXTERNAL_CALLERS_ONLY;
 int libxl_device_pci_destroy(libxl_ctx *ctx, uint32_t domid,
- libxl_device_pci *pcidev,
+ libxl_device_pci *pci,
  const libxl_asyncop_how *ao_how)
  LIBXL_EXTERNAL_CALLERS_ONLY;
 
@@ -2352,8 +2359,8 @@ int libxl_device_events_handler(libxl_ctx *ctx,
  * added or is not bound, the functions will emit a warning but return
  * SUCCESS.
  */
-int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
-int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pcidev, int rebind);
+int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pci, int rebind);
+int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pci, int rebind);
 libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
 
 /* CPUID handling */
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 321a13e519..1f5052c520 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -1100,7 +1100,7 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
 goto error_out;
 }
 
-bool need_pt = d_config->num_pcidevs || d_config->num_dtdevs;
+bool need_pt = d_config->num_pcis || d_config->num_dtdevs;
 if (c_info->passthrough == LIBXL_PASSTHROUGH_DEFAULT) {
 c_info->passthrough = need_pt
 ? LIBXL_PASSTHROUGH_ENABLED : LIBXL_PASSTHROUGH_DISABLED;
@@ -1141,7 +1141,7 @@ int libxl__domain_config_setdefault(libxl__gc *gc,
  * assignment when PoD is enabled.
  */
 if (d_config->c_info.type != LIBXL_DOMAIN_TYPE_PV &&
-d_config->num_pcidevs && pod_enabled) {
+d_config->num_pcis && pod_enabled) {
 ret = ERROR_INVAL;
 LOGD(ERROR, domid,
  "PCI device assignment for HVM guest failed due to PoD enabled");
@@ -1817,7 +1817,7 @@ const libxl__device_type *device_type_tbl[] = {
 &libxl__vtpm_devtype,
 &libxl__usbctrl_devtype,
 &libxl__usbdev_devtype,
-&libxl__pcidev_devtype,
+&libxl__pci_devtype,
 &libxl__dtdev_devtype,
 &libxl__vdispl_devtype,
 &libxl__vsnd_devtype,
diff --git a/tools/libs/light/libxl_dm.c b/tools/libs/light/libxl_dm.c
index d1ff35dda3..f147a733c8 100644
--- a/tools/libs/light/libxl_dm.c
+++ b/tools/libs/light/libxl_dm.c
@@ -442,7 +442,7 @@ int libxl__domain_devic

[PATCH 06/25] libxl: remove extraneous arguments to do_pci_remove() in libxl_pci.c

From: Paul Durrant 

Both 'domid' and 'pci' are available in 'pci_remove_state' so there is no
need to also pass them as separate arguments.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 3936d60a14..97889fda49 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1867,14 +1867,14 @@ static void pci_remove_stubdom_done(libxl__egc *egc,
 static void pci_remove_done(libxl__egc *egc,
 pci_remove_state *prs, int rc);
 
-static void do_pci_remove(libxl__egc *egc, uint32_t domid,
-  libxl_device_pci *pci, int force,
-  pci_remove_state *prs)
+static void do_pci_remove(libxl__egc *egc, pci_remove_state *prs)
 {
 STATE_AO_GC(prs->aodev->ao);
 libxl_ctx *ctx = libxl__gc_owner(gc);
 libxl_device_pci *assigned;
+uint32_t domid = prs->domid;
 libxl_domain_type type = libxl__domain_type(gc, domid);
+libxl_device_pci *pci = prs->pci;
 int rc, num;
 
 assigned = libxl_device_pci_list(ctx, domid, &num);
@@ -2269,7 +2269,6 @@ static void device_pci_remove_common_next(libxl__egc *egc,
 EGC_GC;
 
 /* Convenience aliases */
-libxl_domid domid = prs->domid;
 libxl_device_pci *const pci = prs->pci;
 libxl__ao_device *const aodev = prs->aodev;
 const unsigned int pfunc_mask = prs->pfunc_mask;
@@ -2287,7 +2286,7 @@ static void device_pci_remove_common_next(libxl__egc *egc,
 } else {
 pci->vdevfn = orig_vdev;
 }
-do_pci_remove(egc, domid, pci, prs->force, prs);
+do_pci_remove(egc, prs);
 return;
 }
 }
-- 
2.11.0

[PATCH 04/25] libxl: s/domainid/domid/g in libxl_pci.c

From: Paul Durrant 

It's pointless having two stack variables to hold exactly the same value.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 43 ---
 1 file changed, 20 insertions(+), 23 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 515e74fe5a..58242b5b94 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1326,8 +1326,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 int irq, i;
 int r;
 uint32_t flag = XEN_DOMCTL_DEV_RDM_RELAXED;
-uint32_t domainid = domid;
-bool isstubdom = libxl_is_stubdom(ctx, domid, &domainid);
+bool isstubdom = libxl_is_stubdom(ctx, domid, &domid);
 
 /* Convenience aliases */
 bool starting = pas->starting;
@@ -1349,7 +1348,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 irq = 0;
 
 if (f == NULL) {
-LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
+LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
 rc = ERROR_FAIL;
 goto out;
 }
@@ -1361,7 +1360,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 if (flags & PCI_BAR_IO) {
 r = xc_domain_ioport_permission(ctx->xch, domid, start, size, 1);
 if (r < 0) {
-LOGED(ERROR, domainid,
+LOGED(ERROR, domid,
   "xc_domain_ioport_permission 0x%llx/0x%llx (error %d)",
   start, size, r);
 fclose(f);
@@ -1372,7 +1371,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 r = xc_domain_iomem_permission(ctx->xch, domid, start>>XC_PAGE_SHIFT,
 (size+(XC_PAGE_SIZE-1))>>XC_PAGE_SHIFT, 1);
 if (r < 0) {
-LOGED(ERROR, domainid,
+LOGED(ERROR, domid,
   "xc_domain_iomem_permission 0x%llx/0x%llx (error %d)",
   start, size, r);
 fclose(f);
@@ -1387,13 +1386,13 @@ static void pci_add_dm_done(libxl__egc *egc,
 pci->bus, pci->dev, pci->func);
 f = fopen(sysfs_path, "r");
 if (f == NULL) {
-LOGED(ERROR, domainid, "Couldn't open %s", sysfs_path);
+LOGED(ERROR, domid, "Couldn't open %s", sysfs_path);
 goto out_no_irq;
 }
 if ((fscanf(f, "%u", &irq) == 1) && irq) {
 r = xc_physdev_map_pirq(ctx->xch, domid, irq, &irq);
 if (r < 0) {
-LOGED(ERROR, domainid, "xc_physdev_map_pirq irq=%d (error=%d)",
+LOGED(ERROR, domid, "xc_physdev_map_pirq irq=%d (error=%d)",
   irq, r);
 fclose(f);
 rc = ERROR_FAIL;
@@ -1401,7 +1400,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 }
 r = xc_domain_irq_permission(ctx->xch, domid, irq, 1);
 if (r < 0) {
-LOGED(ERROR, domainid,
+LOGED(ERROR, domid,
   "xc_domain_irq_permission irq=%d (error=%d)", irq, r);
 fclose(f);
 rc = ERROR_FAIL;
@@ -1414,7 +1413,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 if (pci->permissive) {
 if ( sysfs_write_bdf(gc, SYSFS_PCIBACK_DRIVER"/permissive",
  pci) < 0 ) {
-LOGD(ERROR, domainid, "Setting permissive for device");
+LOGD(ERROR, domid, "Setting permissive for device");
 rc = ERROR_FAIL;
 goto out;
 }
@@ -1425,13 +1424,13 @@ out_no_irq:
 if (pci->rdm_policy == LIBXL_RDM_RESERVE_POLICY_STRICT) {
 flag &= ~XEN_DOMCTL_DEV_RDM_RELAXED;
 } else if (pci->rdm_policy != LIBXL_RDM_RESERVE_POLICY_RELAXED) {
-LOGED(ERROR, domainid, "unknown rdm check flag.");
+LOGED(ERROR, domid, "unknown rdm check flag.");
 rc = ERROR_FAIL;
 goto out;
 }
 r = xc_assign_device(ctx->xch, domid, pci_encode_bdf(pci), flag);
 if (r < 0 && (hvm || errno != ENOSYS)) {
-LOGED(ERROR, domainid, "xc_assign_device failed");
+LOGED(ERROR, domid, "xc_assign_device failed");
 rc = ERROR_FAIL;
 goto out;
 }
@@ -1877,7 +1876,6 @@ static void do_pci_remove(libxl__egc *egc, uint32_t domid,
 libxl_device_pci *assigned;
 libxl_domain_type type = libxl__domain_type(gc, domid);
 int rc, num;
-uint32_t domainid = domid;
 
 assigned = libxl_device_pci_list(ctx, domid, &num);
 if (assigned == NULL) {
@@ -1889,7 +1887,7 @@ static void do_pci_remove(libxl__egc *egc, uint32_t domid,
 rc = ERROR_INVAL;
 if ( !is_pci_in_array(assigned, num, pci->domain,
   pci->bus, pci->dev, pci->func) ) {
-LOGD(ERROR, domainid, "PCI device not attached to this domain");
+LOGD(ERROR, domid, "PCI device not attached to this doma

[PATCH 09/25] libxl: remove unnecessary check from libxl__device_pci_add()

From: Paul Durrant 

The code currently checks explicitly whether the device is already assigned,
but this is actually unnecessary as assigned devices do not form part of
the list returned by libxl_device_pci_assignable_list() and hence the
libxl_pci_assignable() test would have already failed.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 16 +---
 1 file changed, 1 insertion(+), 15 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index f74203100d..0be1b21185 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1535,8 +1535,7 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t domid,
 {
 STATE_AO_GC(aodev->ao);
 libxl_ctx *ctx = libxl__gc_owner(gc);
-libxl_device_pci *assigned;
-int num_assigned, rc;
+int rc;
 int stubdomid = 0;
 pci_add_state *pas;
 
@@ -1575,19 +1574,6 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t domid,
 goto out;
 }
 
-rc = get_all_assigned_devices(gc, &assigned, &num_assigned);
-if ( rc ) {
-LOGD(ERROR, domid,
- "cannot determine if device is assigned, refusing to continue");
-goto out;
-}
-if ( is_pci_in_array(assigned, num_assigned, pci->domain,
- pci->bus, pci->dev, pci->func) ) {
-LOGD(ERROR, domid, "PCI device already attached to a domain");
-rc = ERROR_FAIL;
-goto out;
-}
-
 libxl__device_pci_reset(gc, pci->domain, pci->bus, pci->dev, pci->func);
 
 stubdomid = libxl_get_stubdom_id(ctx, domid);
-- 
2.11.0

[PATCH 08/25] libxl: generalise 'driver_path' xenstore access functions in libxl_pci.c

From: Paul Durrant 

For the purposes of re-binding a device to its previous driver
libxl__device_pci_assignable_add() writes the driver path into xenstore.
This path is then read back in libxl__device_pci_assignable_remove().

The functions that support this writing to and reading from xenstore are
currently dedicated to this purpose and hence the node name 'driver_path'
is hard-coded. This patch generalizes these utility functions and passes
'driver_path' as an argument. Subsequent patches will invoke them to
access other nodes.

NOTE: Because functions will have a broader use (other than storing a
  driver path in lieu of pciback) the base xenstore path is also
  changed from '/libxl/pciback' to '/libxl/pci'.
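
For illustration only: a standalone sketch of the path construction that the
generalised helper performs, using snprintf() into a caller-supplied buffer
instead of GCSPRINTF. The demo_* names are hypothetical, and the BDF format
string below is an assumption made for this sketch; the real definition of
PCI_BDF_XSPATH lives elsewhere in libxl.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_PCI_INFO_PATH  "/libxl/pci"
/* Assumed format for the per-device path component. */
#define DEMO_PCI_BDF_XSPATH "%04x-%02x-%02x-%01x"

/* Hypothetical stand-in for the BDF fields of libxl_device_pci. */
struct demo_pci {
    unsigned int domain, bus, dev, func;
};

/*
 * Build the xenstore path for a device, or for one named node beneath it
 * when node is non-NULL. Returns the snprintf() result, i.e. the length
 * the full path needs (excluding the terminating NUL).
 */
int demo_pci_info_xs_path(char *buf, size_t len,
                          const struct demo_pci *pci, const char *node)
{
    assert(buf != NULL && pci != NULL);

    if (node)
        return snprintf(buf, len,
                        DEMO_PCI_INFO_PATH "/" DEMO_PCI_BDF_XSPATH "/%s",
                        pci->domain, pci->bus, pci->dev, pci->func, node);

    return snprintf(buf, len,
                    DEMO_PCI_INFO_PATH "/" DEMO_PCI_BDF_XSPATH,
                    pci->domain, pci->bus, pci->dev, pci->func);
}
```

Passing "driver_path" as the node gives the per-node path used by the write
and read helpers, while passing NULL gives the device's directory itself.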

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 66 +---
 1 file changed, 32 insertions(+), 34 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index b8d8cc6a69..f74203100d 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -718,48 +718,46 @@ static int pciback_dev_unassign(libxl__gc *gc, libxl_device_pci *pci)
 return 0;
 }
 
-#define PCIBACK_INFO_PATH "/libxl/pciback"
+#define PCI_INFO_PATH "/libxl/pci"
 
-static void pci_assignable_driver_path_write(libxl__gc *gc,
-libxl_device_pci *pci,
-char *driver_path)
+static char *pci_info_xs_path(libxl__gc *gc, libxl_device_pci *pci,
+  const char *node)
 {
-char *path;
+return node ?
+GCSPRINTF(PCI_INFO_PATH"/"PCI_BDF_XSPATH"/%s",
+  pci->domain, pci->bus, pci->dev, pci->func,
+  node) :
+GCSPRINTF(PCI_INFO_PATH"/"PCI_BDF_XSPATH,
+  pci->domain, pci->bus, pci->dev, pci->func);
+}
+
+
+static void pci_info_xs_write(libxl__gc *gc, libxl_device_pci *pci,
+  const char *node, const char *val)
+{
+char *path = pci_info_xs_path(gc, pci, node);
 
-path = GCSPRINTF(PCIBACK_INFO_PATH"/"PCI_BDF_XSPATH"/driver_path",
- pci->domain,
- pci->bus,
- pci->dev,
- pci->func);
-if ( libxl__xs_printf(gc, XBT_NULL, path, "%s", driver_path) < 0 ) {
-LOGE(WARN, "Write of %s to node %s failed.", driver_path, path);
+if ( libxl__xs_printf(gc, XBT_NULL, path, "%s", val) < 0 ) {
+LOGE(WARN, "Write of %s to node %s failed.", val, path);
 }
 }
 
-static char * pci_assignable_driver_path_read(libxl__gc *gc,
-  libxl_device_pci *pci)
+static char *pci_info_xs_read(libxl__gc *gc, libxl_device_pci *pci,
+  const char *node)
 {
-return libxl__xs_read(gc, XBT_NULL,
-  GCSPRINTF(
-   PCIBACK_INFO_PATH "/" PCI_BDF_XSPATH "/driver_path",
-   pci->domain,
-   pci->bus,
-   pci->dev,
-   pci->func));
+char *path = pci_info_xs_path(gc, pci, node);
+
+return libxl__xs_read(gc, XBT_NULL, path);
 }
 
-static void pci_assignable_driver_path_remove(libxl__gc *gc,
-  libxl_device_pci *pci)
+static void pci_info_xs_remove(libxl__gc *gc, libxl_device_pci *pci,
+   const char *node)
 {
+char *path = pci_info_xs_path(gc, pci, node);
 libxl_ctx *ctx = libxl__gc_owner(gc);
 
 /* Remove the xenstore entry */
-xs_rm(ctx->xsh, XBT_NULL,
-  GCSPRINTF(PCIBACK_INFO_PATH "/" PCI_BDF_XSPATH,
-pci->domain,
-pci->bus,
-pci->dev,
-pci->func) );
+xs_rm(ctx->xsh, XBT_NULL, path);
 }
 
 static int libxl__device_pci_assignable_add(libxl__gc *gc,
@@ -805,9 +803,9 @@ static int libxl__device_pci_assignable_add(libxl__gc *gc,
 /* Store driver_path for rebinding to dom0 */
 if ( rebind ) {
 if ( driver_path ) {
-pci_assignable_driver_path_write(gc, pci, driver_path);
+pci_info_xs_write(gc, pci, "driver_path", driver_path);
 } else if ( (driver_path =
- pci_assignable_driver_path_read(gc, pci)) != NULL ) {
+ pci_info_xs_read(gc, pci, "driver_path")) != NULL ) {
 LOG(INFO, PCI_BDF" not bound to a driver, will be rebound to %s",
 dom, bus, dev, func, driver_path);
 } else {
@@ -815,7 +813,7 @@ static int libxl__device_pci_assignable_add(libxl__gc *gc,
 dom, bus, dev, func);
 }
 } else {
-pci_assignable_driver_path_remove(gc, pci);
+pci_info_xs_remove(gc, pci, "driver_path");
 }
 
 if ( pciback_dev_assign(gc, pci) ) {
@@ -865,7 +863,7 @@ static int libxl__devic

Re: [XEN PATCH v1] xen/arm : Add support for SMMUv3 driver

2020-10-23 Thread Stefano Stabellini

On Fri, 23 Oct 2020, Julien Grall wrote:
> On 23/10/2020 01:02, Stefano Stabellini wrote:
> > On Thu, 22 Oct 2020, Julien Grall wrote:
> > > > > On 20/10/2020 16:25, Rahul Singh wrote:
> > > > > > Add support for ARM architected SMMUv3 implementations. It is based
> > > > > > on the Linux SMMUv3 driver.
> > > > > > Major differences from the Linux driver are as follows:
> > > > > > 1. Only Stage-2 translation is supported, whereas the Linux driver
> > > > > >  supports both Stage-1 and Stage-2 translations.
> > > > > > 2. Use the P2M page table instead of creating one, as SMMUv3 has the
> > > > > >  capability to share the page tables with the CPU.
> > > > > > 3. Tasklets are used in place of Linux's threaded IRQs for event
> > > > > >  queue and priority queue IRQ handling.
> > > > > 
> > > > > Tasklets are not a replacement for threaded IRQs. In particular, they
> > > > > will have priority over anything else (IOW nothing will run on the
> > > > > pCPU until they are done).
> > > > > 
> > > > > Do you know why Linux is using threads? Is it because of long-running
> > > > > operations?
> > > > 
> > > > Yes you are right because of long running operations Linux is using the
> > > > threaded IRQs.
> > > > 
> > > > SMMUv3 reports fault/events bases on memory-based circular buffer queues
> > > > not
> > > > based on the register. As per my understanding, it is time-consuming to
> > > > process the memory based queues in interrupt context because of that
> > > > Linux
> > > > is using threaded IRQ to process the faults/events from SMMU.
> > > > 
> > > > I didn’t find any other solution in XEN in place of tasklet to defer the
> > > > work, that’s why I used tasklet in XEN in replacement of threaded IRQs.
> > > > If
> > > > we do all work in interrupt context we will make XEN less responsive.
> > > 
> > > So we need to make sure that Xen continue to receives interrupts, but we
> > > also
> > > need to make sure that a vCPU bound to the pCPU is also responsive.
> > > 
> > > > 
> > > > > If you know another solution in XEN that can be used to defer the work
> > > > > from the interrupt, please let me know and I will try to use that.
> > > 
> > > > One of my work colleagues encountered a similar problem recently. He had a
> > > > long running tasklet and wanted it to be broken down into smaller chunks.
> > > > 
> > > > We decided to use a timer to reschedule the tasklet in the future. This
> > > > allows the scheduler to run other loads (e.g. vCPU) for some time.
> > > > 
> > > > This is pretty hackish but I couldn't find a better solution as tasklets
> > > > have high priority.
> > > > 
> > > > Maybe the others will have a better idea.
> > 
> > Julien's suggestion is a good one.
> > 
> > But I think tasklets can be configured to be called from the idle_loop,
> > in which case they are not run in interrupt context?
> 
> Tasklets can either run from the IDLE loop or from a softirq context.
> 
> When running from a softirq context, it may happen on return from receiving an
> interrupt. However, interrupts will always be enabled.
> 
> So I am not sure what concern you are trying to raise here.

Not raising any concerns :-)

I thought one of the previous statements in this thread implied that
tasklets are run in interrupt context -- I just wanted to go into
details on that point as it is relevant.
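
For reference, the chunking idea discussed above (do a bounded amount of
work per invocation and re-queue the remainder, e.g. from a timer) could be
sketched roughly as follows. This is plain C and not Xen code: the
event_queue structure, QUEUE_SIZE and BATCH_LIMIT are hypothetical names
made up for illustration, and the actual re-queueing mechanism (timer,
tasklet) is deliberately left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE  64u  /* hypothetical ring size, must be a power of two */
#define BATCH_LIMIT 8u   /* hypothetical per-invocation budget */

/* Hypothetical stand-in for a memory-based circular event queue. */
struct event_queue {
    unsigned int prod;     /* producer index, advanced by the device */
    unsigned int cons;     /* consumer index, advanced by the handler */
    unsigned int handled;  /* number of entries processed so far */
};

/*
 * Process at most BATCH_LIMIT entries and then return. The return value
 * tells the caller whether entries remain, i.e. whether the work should be
 * re-queued for later instead of looping here until the queue is empty.
 */
static bool event_queue_process_batch(struct event_queue *q)
{
    unsigned int done;

    assert(q != NULL);

    for (done = 0; done < BATCH_LIMIT && q->cons != q->prod; done++) {
        /* A real handler would decode the entry at q->cons here. */
        q->cons = (q->cons + 1) & (QUEUE_SIZE - 1);
        q->handled++;
    }

    return q->cons != q->prod;
}
```

A caller would invoke event_queue_process_batch() once per deferred-work
run and only arm the next run when it returns true, which gives the
scheduler a chance to run other loads in between.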

[PATCH 15/25] libxl: Make sure devices added by pci-attach are reflected in the config

From: Paul Durrant 

Currently libxl__device_pci_add_xenstore() is broken in that it does not
update the domain's configuration for the first device added (which causes
creation of the overall backend area in xenstore). This can be easily observed
by running 'xl list -l' after adding a single device: the device will be
missing.

This patch fixes the problem and adds a DEBUG log line to allow easy
verification that the domain configuration is being modified.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 68 +++-
 1 file changed, 35 insertions(+), 33 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index c5d73133eb..45685ebec2 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -79,39 +79,18 @@ static void libxl__device_from_pci(libxl__gc *gc, uint32_t 
domid,
 device->kind = LIBXL__DEVICE_KIND_PCI;
 }
 
-static int libxl__create_pci_backend(libxl__gc *gc, uint32_t domid,
- const libxl_device_pci *pci,
- int num)
+static void libxl__create_pci_backend(libxl__gc *gc, uint32_t domid,
+  flexarray_t *front, flexarray_t *back)
 {
-flexarray_t *front = NULL;
-flexarray_t *back = NULL;
-libxl__device device;
-int i;
-
-front = flexarray_make(gc, 16, 1);
-back = flexarray_make(gc, 16, 1);
-
 LOGD(DEBUG, domid, "Creating pci backend");
 
-/* add pci device */
-libxl__device_from_pci(gc, domid, pci, &device);
-
 flexarray_append_pair(back, "frontend-id", GCSPRINTF("%d", domid));
-flexarray_append_pair(back, "online", "1");
+flexarray_append_pair(back, "online", GCSPRINTF("%d", 1));
 flexarray_append_pair(back, "state", GCSPRINTF("%d", 
XenbusStateInitialising));
 flexarray_append_pair(back, "domain", libxl__domid_to_name(gc, domid));
 
-for (i = 0; i < num; i++, pci++)
-libxl_create_pci_backend_device(gc, back, i, pci);
-
-flexarray_append_pair(back, "num_devs", GCSPRINTF("%d", num));
 flexarray_append_pair(front, "backend-id", GCSPRINTF("%d", 0));
 flexarray_append_pair(front, "state", GCSPRINTF("%d", 
XenbusStateInitialising));
-
-return libxl__device_generic_add(gc, XBT_NULL, &device,
- libxl__xs_kvs_of_flexarray(gc, back),
- libxl__xs_kvs_of_flexarray(gc, front),
- NULL);
 }
 
 static int libxl__device_pci_add_xenstore(libxl__gc *gc,
@@ -119,7 +98,7 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
   const libxl_device_pci *pci,
   bool starting)
 {
-flexarray_t *back;
+flexarray_t *front, *back;
 char *num_devs, *be_path;
 int num = 0;
 xs_transaction_t t = XBT_NULL;
@@ -127,16 +106,22 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
 libxl_domain_config d_config;
 libxl__flock *lock = NULL;
 bool is_stubdomain = libxl_is_stubdom(CTX, domid, NULL);
+libxl__device device;
+
+libxl__device_from_pci(gc, domid, pci, &device);
 
 /* Stubdomain doesn't have own config. */
 if (!is_stubdomain)
 libxl_domain_config_init(&d_config);
 
+front = flexarray_make(gc, 16, 1);
+back = flexarray_make(gc, 16, 1);
+
 be_path = libxl__domain_device_backend_path(gc, 0, domid, 0,
 LIBXL__DEVICE_KIND_PCI);
 num_devs = libxl__xs_read(gc, XBT_NULL, GCSPRINTF("%s/num_devs", be_path));
 if (!num_devs)
-return libxl__create_pci_backend(gc, domid, pci, 1);
+libxl__create_pci_backend(gc, domid, front, back);
 
 libxl_domain_type domtype = libxl__domain_type(gc, domid);
 if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
@@ -147,13 +132,11 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
 return ERROR_FAIL;
 }
 
-back = flexarray_make(gc, 16, 1);
-
 LOGD(DEBUG, domid, "Adding new pci device to xenstore");
-num = atoi(num_devs);
+num = num_devs ? atoi(num_devs) : 0;
 libxl_create_pci_backend_device(gc, back, num, pci);
 flexarray_append_pair(back, "num_devs", GCSPRINTF("%d", num + 1));
-if (!starting)
+if (num && !starting)
 flexarray_append_pair(back, "state", GCSPRINTF("%d", 
XenbusStateReconfiguring));
 
 /*
@@ -170,6 +153,7 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
 rc = libxl__get_domain_configuration(gc, domid, &d_config);
 if (rc) goto out;
 
+LOGD(DEBUG, domid, "Adding new pci device to config");
 device_add_domain_config(gc, &d_config, &libxl__pci_devtype,
  pci);
 
@@ -186,7 +170,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
 if (rc) goto out;
 }
 
-li

[PATCH 10/25] libxl: remove get_all_assigned_devices() from libxl_pci.c

From: Paul Durrant 

Use of this function is a very inefficient way to check whether a device
has already been assigned.

This patch adds code that saves the domain id in xenstore at the point of
assignment, and removes it again when the device is de-assigned (or the
domain is destroyed). It is then straightforward to check whether a device
has been assigned by checking whether a device has a saved domain id.

NOTE: To facilitate the xenstore check it is necessary to move the
  pci_info_xs_read() earlier in libxl_pci.c. To keep related functions
  together, the rest of the pci_info_xs_XXX() functions are moved too.
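
As an illustration of the per-device xenstore node described above, the
following sketch builds the path in the same way as the pci_info_xs_path()
helper in the diff below. It is standalone C for illustration only: it
uses snprintf() in place of GCSPRINTF(), and 'struct bdf' is a hypothetical
stand-in for the fields of libxl_device_pci. The two format macros are
copied from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define PCI_INFO_PATH  "/libxl/pci"
#define PCI_BDF_XSPATH "%04x-%02x-%02x-%01x"

/* Hypothetical stand-in for the BDF fields of libxl_device_pci. */
struct bdf {
    unsigned int domain, bus, dev, func;
};

/*
 * Format the xenstore path of a per-device node (e.g. "domid"), or of the
 * device directory itself when node is NULL. Returns what snprintf()
 * returns, i.e. the length the full path needs.
 */
static int pci_info_path(char *buf, size_t len, const struct bdf *b,
                         const char *node)
{
    assert(buf != NULL && b != NULL);

    if (node)
        return snprintf(buf, len, PCI_INFO_PATH "/" PCI_BDF_XSPATH "/%s",
                        b->domain, b->bus, b->dev, b->func, node);

    return snprintf(buf, len, PCI_INFO_PATH "/" PCI_BDF_XSPATH,
                    b->domain, b->bus, b->dev, b->func);
}
```

With this layout, checking whether a device is assigned reduces to a single
read of the "domid" node for that device, rather than a walk over every
domain's backend directory.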

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 149 ---
 1 file changed, 55 insertions(+), 94 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 0be1b21185..879b1b24a0 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -317,50 +317,6 @@ retry_transaction2:
 return 0;
 }
 
-static int get_all_assigned_devices(libxl__gc *gc, libxl_device_pci **list, 
int *num)
-{
-char **domlist;
-unsigned int nd = 0, i;
-
-*list = NULL;
-*num = 0;
-
-domlist = libxl__xs_directory(gc, XBT_NULL, "/local/domain", &nd);
-for(i = 0; i < nd; i++) {
-char *path, *num_devs;
-
-path = GCSPRINTF("/local/domain/0/backend/%s/%s/0/num_devs",
- libxl__device_kind_to_string(LIBXL__DEVICE_KIND_PCI),
- domlist[i]);
-num_devs = libxl__xs_read(gc, XBT_NULL, path);
-if ( num_devs ) {
-int ndev = atoi(num_devs), j;
-char *devpath, *bdf;
-
-for(j = 0; j < ndev; j++) {
-devpath = GCSPRINTF("/local/domain/0/backend/%s/%s/0/dev-%u",
-
libxl__device_kind_to_string(LIBXL__DEVICE_KIND_PCI),
-domlist[i], j);
-bdf = libxl__xs_read(gc, XBT_NULL, devpath);
-if ( bdf ) {
-unsigned dom, bus, dev, func;
-if ( sscanf(bdf, PCI_BDF, &dom, &bus, &dev, &func) != 4 )
-continue;
-
-*list = realloc(*list, sizeof(libxl_device_pci) * ((*num) 
+ 1));
-if (*list == NULL)
-return ERROR_NOMEM;
-pci_struct_fill(*list + *num, dom, bus, dev, func, 0);
-(*num)++;
-}
-}
-}
-}
-libxl__ptr_add(gc, *list);
-
-return 0;
-}
-
 static int is_pci_in_array(libxl_device_pci *assigned, int num_assigned,
int dom, int bus, int dev, int func)
 {
@@ -408,19 +364,58 @@ static int sysfs_write_bdf(libxl__gc *gc, const char * 
sysfs_path,
 return 0;
 }
 
+#define PCI_INFO_PATH "/libxl/pci"
+
+static char *pci_info_xs_path(libxl__gc *gc, libxl_device_pci *pci,
+  const char *node)
+{
+return node ?
+GCSPRINTF(PCI_INFO_PATH"/"PCI_BDF_XSPATH"/%s",
+  pci->domain, pci->bus, pci->dev, pci->func,
+  node) :
+GCSPRINTF(PCI_INFO_PATH"/"PCI_BDF_XSPATH,
+  pci->domain, pci->bus, pci->dev, pci->func);
+}
+
+
+static int pci_info_xs_write(libxl__gc *gc, libxl_device_pci *pci,
+  const char *node, const char *val)
+{
+char *path = pci_info_xs_path(gc, pci, node);
+int rc = libxl__xs_printf(gc, XBT_NULL, path, "%s", val);
+
+if (rc) LOGE(WARN, "Write of %s to node %s failed.", val, path);
+
+return rc;
+}
+
+static char *pci_info_xs_read(libxl__gc *gc, libxl_device_pci *pci,
+  const char *node)
+{
+char *path = pci_info_xs_path(gc, pci, node);
+
+return libxl__xs_read(gc, XBT_NULL, path);
+}
+
+static void pci_info_xs_remove(libxl__gc *gc, libxl_device_pci *pci,
+   const char *node)
+{
+char *path = pci_info_xs_path(gc, pci, node);
+libxl_ctx *ctx = libxl__gc_owner(gc);
+
+/* Remove the xenstore entry */
+xs_rm(ctx->xsh, XBT_NULL, path);
+}
+
 libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num)
 {
 GC_INIT(ctx);
-libxl_device_pci *pcis = NULL, *new, *assigned;
+libxl_device_pci *pcis = NULL, *new;
 struct dirent *de;
 DIR *dir;
-int r, num_assigned;
 
 *num = 0;
 
-r = get_all_assigned_devices(gc, &assigned, &num_assigned);
-if (r) goto out;
-
 dir = opendir(SYSFS_PCIBACK_DRIVER);
 if (NULL == dir) {
 if (errno == ENOENT) {
@@ -436,9 +431,6 @@ libxl_device_pci 
*libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num)
 if (sscanf(de->d_name, PCI_BDF, &dom, &bus, &dev, &func) != 4)
 continue;
 
-if (is_pci_in_array(assigned, num_assigned, dom, bus, dev, func))
-continue;
-
 new = reallo

[PATCH 13/25] libxl: use COMPARE_PCI() macro in is_pci_in_array()...

From: Paul Durrant 

... rather than an open-coded equivalent.

This patch tidies up the is_pci_in_array() function, making it take a single
'libxl_device_pci' argument rather than separate domain, bus, device and
function arguments. The already-available COMPARE_PCI() macro can then be
used and it is also modified to return 'bool' rather than 'int'.

The patch also modifies libxl_pci_assignable() to use is_pci_in_array() rather
than a separate open-coded equivalent, and also modifies it to return a
'bool' rather than an 'int'.

NOTE: The COMPARE_PCI() macro is also fixed to include the 'domain' in its
  comparison, which should always have been the case.
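
To show why the 'domain' comparison matters, here is a small standalone
sketch of the macro and the search loop. 'struct pci_dev' is a hypothetical
stand-in for libxl_device_pci that carries only the four fields being
compared; the macro and loop follow the shape used in the diff below, but
this is an illustration rather than the libxl code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in carrying only the fields that are compared. */
struct pci_dev {
    unsigned int domain, bus, dev, func;
};

/* Two devices match only if all four components, domain included, match. */
#define COMPARE_PCI(a, b) ((a)->domain == (b)->domain && \
                           (a)->bus == (b)->bus &&       \
                           (a)->dev == (b)->dev &&       \
                           (a)->func == (b)->func)

/* Return true if 'pci' matches one of the 'num' entries in 'pcis'. */
static bool is_pci_in_array(const struct pci_dev *pcis, int num,
                            const struct pci_dev *pci)
{
    int i;

    assert(pci != NULL);
    assert(num == 0 || pcis != NULL);

    for (i = 0; i < num; i++) {
        if (COMPARE_PCI(pci, &pcis[i]))
            break;
    }

    return i < num;
}
```

Without the domain term, a device at 0001:36:00.0 would wrongly compare
equal to one at 0000:36:00.0.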

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_internal.h |  7 ---
 tools/libs/light/libxl_pci.c  | 38 +-
 2 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/tools/libs/light/libxl_internal.h 
b/tools/libs/light/libxl_internal.h
index 3e70ff639b..80d7988622 100644
--- a/tools/libs/light/libxl_internal.h
+++ b/tools/libs/light/libxl_internal.h
@@ -4744,9 +4744,10 @@ void libxl__xcinfo2xlinfo(libxl_ctx *ctx,
  * devices have same identifier. */
 #define COMPARE_DEVID(a, b) ((a)->devid == (b)->devid)
 #define COMPARE_DISK(a, b) (!strcmp((a)->vdev, (b)->vdev))
-#define COMPARE_PCI(a, b) ((a)->func == (b)->func &&\
-   (a)->bus == (b)->bus &&  \
-   (a)->dev == (b)->dev)
+#define COMPARE_PCI(a, b) ((a)->domain == (b)->domain && \
+   (a)->bus == (b)->bus &&   \
+   (a)->dev == (b)->dev &&   \
+   (a)->func == (b)->func)
 #define COMPARE_USB(a, b) ((a)->ctrl == (b)->ctrl && \
(a)->port == (b)->port)
 #define COMPARE_USBCTRL(a, b) ((a)->devid == (b)->devid)
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index e858509609..2e8e1c50f1 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -317,24 +317,17 @@ retry_transaction2:
 return 0;
 }
 
-static int is_pci_in_array(libxl_device_pci *assigned, int num_assigned,
-   int dom, int bus, int dev, int func)
+static int is_pci_in_array(libxl_device_pci *pcis, int num,
+   libxl_device_pci *pci)
 {
 int i;
 
-for(i = 0; i < num_assigned; i++) {
-if ( assigned[i].domain != dom )
-continue;
-if ( assigned[i].bus != bus )
-continue;
-if ( assigned[i].dev != dev )
-continue;
-if ( assigned[i].func != func )
-continue;
-return 1;
+for(i = 0; i < num; i++) {
+if (COMPARE_PCI(pci, &pcis[i]))
+break;
 }
 
-return 0;
+return i < num;
 }
 
 /* Write the standard BDF into the sysfs path given by sysfs_path. */
@@ -1467,21 +1460,17 @@ int libxl_device_pci_add(libxl_ctx *ctx, uint32_t domid,
 return AO_INPROGRESS;
 }
 
-static int libxl_pci_assignable(libxl_ctx *ctx, libxl_device_pci *pci)
+static bool libxl_pci_assignable(libxl_ctx *ctx, libxl_device_pci *pci)
 {
 libxl_device_pci *pcis;
-int num, i;
+int num;
+bool assignable;
 
 pcis = libxl_device_pci_assignable_list(ctx, &num);
-for (i = 0; i < num; i++) {
-if (pcis[i].domain == pci->domain &&
-pcis[i].bus == pci->bus &&
-pcis[i].dev == pci->dev &&
-pcis[i].func == pci->func)
-break;
-}
+assignable = is_pci_in_array(pcis, num, pci);
 libxl_device_pci_assignable_list_free(pcis, num);
-return i != num;
+
+return assignable;
 }
 
 static void device_pci_add_stubdom_wait(libxl__egc *egc,
@@ -1829,8 +1818,7 @@ static void do_pci_remove(libxl__egc *egc, 
pci_remove_state *prs)
 goto out_fail;
 }
 
-attached = is_pci_in_array(pcis, num, pci->domain,
-   pci->bus, pci->dev, pci->func);
+attached = is_pci_in_array(pcis, num, pci);
 libxl_device_pci_list_free(pcis, num);
 
 rc = ERROR_INVAL;
-- 
2.11.0

[PATCH 21/25] libxl: modify libxl_device_pci_assignable_add/remove/list/list_free()...

From: Paul Durrant 

... to use 'libxl_pci_bdf' rather than 'libxl_device_pci'.

This patch modifies the API and callers accordingly. It also modifies
several internal functions in libxl_pci.c that support the API to also use
'libxl_pci_bdf'.

NOTE: The OCaml bindings are adjusted to contain the interface change. It
  should therefore not affect compatibility with OCaml-based utilities.
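
As an example of an internal helper that only ever needed the BDF, the
sketch below mirrors the bit packing done by pci_encode_bdf() in the diff
below. 'struct pci_bdf' is a hypothetical stand-in for the generated
libxl_pci_bdf type; only the four fields used here are assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for libxl_pci_bdf. */
struct pci_bdf {
    unsigned int domain, bus, dev, func;
};

/*
 * Pack a BDF into one integer: domain from bit 16 upwards, bus in bits
 * 15..8, device in bits 7..3 and function in bits 2..0.
 */
static unsigned int pci_encode_bdf(const struct pci_bdf *pcibdf)
{
    unsigned int value;

    assert(pcibdf != NULL);

    value = pcibdf->domain << 16;
    value |= (pcibdf->bus & 0xff) << 8;
    value |= (pcibdf->dev & 0x1f) << 3;
    value |= (pcibdf->func & 0x7);

    return value;
}
```

For instance, 0000:36:00.0 encodes to 0x3600 and 0001:36:1f.7 to 0x136ff.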

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Christian Lindig 
Cc: David Scott 
Cc: Anthony PERARD 
---
 tools/include/libxl.h|  15 ++-
 tools/libs/light/libxl_pci.c | 215 +++
 tools/ocaml/libs/xl/xenlight_stubs.c |  15 ++-
 tools/xl/xl_pci.c|  32 +++---
 4 files changed, 157 insertions(+), 120 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 5edacccbd1..5703fdf367 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -470,6 +470,13 @@
 #define LIBXL_HAVE_PCI_BDF 1
 
 /*
+ * LIBXL_HAVE_PCI_ASSIGNABLE_BDF indicates that the
+ * libxl_device_pci_assignable_add/remove/list/list_free() functions all
+ * use the 'libxl_pci_bdf' type rather than 'libxl_device_pci' type.
+ */
+#define LIBXL_HAVE_PCI_ASSIGNABLE_BDF 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
@@ -2378,10 +2385,10 @@ int libxl_device_events_handler(libxl_ctx *ctx,
  * added or is not bound, the functions will emit a warning but return
  * SUCCESS.
  */
-int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pci, int 
rebind);
-int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pci, 
int rebind);
-libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
-void libxl_device_pci_assignable_list_free(libxl_device_pci *list, int num);
+int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_pci_bdf *pcibdf, int 
rebind);
+int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_pci_bdf *pcibdf, 
int rebind);
+libxl_pci_bdf *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
+void libxl_device_pci_assignable_list_free(libxl_pci_bdf *list, int num);
 
 /* CPUID handling */
 int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index fec77dd270..5104f31448 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -25,26 +25,33 @@
 #define PCI_BDF_XSPATH "%04x-%02x-%02x-%01x"
 #define PCI_PT_QDEV_ID "pci-pt-%02x_%02x.%01x"
 
-static unsigned int pci_encode_bdf(libxl_device_pci *pci)
+static unsigned int pci_encode_bdf(libxl_pci_bdf *pcibdf)
 {
 unsigned int value;
 
-value = pci->bdf.domain << 16;
-value |= (pci->bdf.bus & 0xff) << 8;
-value |= (pci->bdf.dev & 0x1f) << 3;
-value |= (pci->bdf.func & 0x7);
+value = pcibdf->domain << 16;
+value |= (pcibdf->bus & 0xff) << 8;
+value |= (pcibdf->dev & 0x1f) << 3;
+value |= (pcibdf->func & 0x7);
 
 return value;
 }
 
+static void pcibdf_struct_fill(libxl_pci_bdf *pcibdf, unsigned int domain,
+   unsigned int bus, unsigned int dev,
+   unsigned int func)
+{
+pcibdf->domain = domain;
+pcibdf->bus = bus;
+pcibdf->dev = dev;
+pcibdf->func = func;
+}
+
 static void pci_struct_fill(libxl_device_pci *pci, unsigned int domain,
 unsigned int bus, unsigned int dev,
 unsigned int func, unsigned int vdevfn)
 {
-pci->bdf.domain = domain;
-pci->bdf.bus = bus;
-pci->bdf.dev = dev;
-pci->bdf.func = func;
+pcibdf_struct_fill(&pci->bdf, domain, bus, dev, func);
 pci->vdevfn = vdevfn;
 }
 
@@ -318,8 +325,8 @@ static int is_pci_in_array(libxl_device_pci *pcis, int num,
 }
 
 /* Write the standard BDF into the sysfs path given by sysfs_path. */
-static int sysfs_write_bdf(libxl__gc *gc, const char * sysfs_path,
-   libxl_device_pci *pci)
+static int sysfs_write_bdf(libxl__gc *gc, const char *sysfs_path,
+   libxl_pci_bdf *pcibdf)
 {
 int rc, fd;
 char *buf;
@@ -330,8 +337,8 @@ static int sysfs_write_bdf(libxl__gc *gc, const char * 
sysfs_path,
 return ERROR_FAIL;
 }
 
-buf = GCSPRINTF(PCI_BDF, pci->bdf.domain, pci->bdf.bus,
-pci->bdf.dev, pci->bdf.func);
+buf = GCSPRINTF(PCI_BDF, pcibdf->domain, pcibdf->bus,
+pcibdf->dev, pcibdf->func);
 rc = write(fd, buf, strlen(buf));
 /* Annoying to have two if's, but we need the errno */
 if (rc < 0)
@@ -346,22 +353,22 @@ static int sysfs_write_bdf(libxl__gc *gc, const char * 
sysfs_path,
 
 #define PCI_INFO_PATH "/libxl/pci"
 
-static char *pci_info_xs_path(libxl__gc *gc, libxl_device_pci *pci,
+static char *pci_info_xs_path(libxl__gc *gc, libxl_pci_bdf *pcibdf,

[PATCH 17/25] docs/man: improve documentation of PCI_SPEC_STRING...

From: Paul Durrant 

... and prepare for adding support for non-positional parsing of 'bdf' and
'vslot' in a subsequent patch.

Also document 'BDF' as a first-class parameter type and fix the documentation
to state that the default value of 'rdm_policy' is actually 'strict', not
'relaxed', as can be seen in libxl__device_pci_setdefault().
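
To illustrate the 'BDF' syntax being documented (an optional domain,
followed by bus, device and function, all in hexadecimal), here is a
minimal standalone parsing sketch. It is not the xl/libxl parser:
'struct pci_bdf' and parse_bdf() are hypothetical names, the range limits
are taken from the field widths used by pci_encode_bdf() in libxl_pci.c,
and the '*' wildcard for the function is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a parsed BDF. */
struct pci_bdf {
    unsigned int domain, bus, dev, func;
};

/* Bus is 8 bits wide, device 5 bits and function 3 bits. */
static bool bdf_in_range(const struct pci_bdf *b)
{
    return b->bus <= 0xff && b->dev <= 0x1f && b->func <= 0x7;
}

/*
 * Parse "domain:bus:dev.func" or the short form "bus:dev.func", in which
 * case the domain defaults to zero. Trailing characters are rejected.
 * On failure the contents of *out are unspecified.
 */
static bool parse_bdf(const char *s, struct pci_bdf *out)
{
    int used = 0;

    assert(s != NULL && out != NULL);

    /* Full form with an explicit domain. */
    if (sscanf(s, "%x:%x:%x.%x%n", &out->domain, &out->bus,
               &out->dev, &out->func, &used) == 4 && s[used] == '\0')
        return bdf_in_range(out);

    /* Short form: the domain is omitted and therefore zero. */
    used = 0;
    out->domain = 0;
    if (sscanf(s, "%x:%x.%x%n", &out->bus, &out->dev,
               &out->func, &used) == 3 && s[used] == '\0')
        return bdf_in_range(out);

    return false;
}
```

Both "36:00.0" and "0000:36:00.0" parse to the same device here, matching
the statement that a zero domain may be omitted.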

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 docs/man/xl-pci-configuration.5.pod | 177 ++--
 1 file changed, 148 insertions(+), 29 deletions(-)

diff --git a/docs/man/xl-pci-configuration.5.pod 
b/docs/man/xl-pci-configuration.5.pod
index 72a27bd95d..4dd73bc498 100644
--- a/docs/man/xl-pci-configuration.5.pod
+++ b/docs/man/xl-pci-configuration.5.pod
@@ -6,32 +6,105 @@ xl-pci-configuration - XL PCI Configuration Syntax
 
 =head1 SYNTAX
 
-This document specifies the format for B which is used by
-the L pci configuration option, and related L commands.
+This document specifies the format for B and B which are
+used by the L pci configuration option, and related L
+commands.
 
-Each B has the form of
-B<[:]BB:DD.F[@VSLOT],KEY=VALUE,KEY=VALUE,...> where:
+A B has the following form:
+
+[:]BB:SS.F
+
+B is the domain number, B is the bus number, B is the device (or
+slot) number, and B is the function number. This is the same scheme as
+used in the output of L for the device in question. By default
+L will omit the domain (B) if it is zero and hence a zero
+value for domain may also be omitted when specifying a B.
+
+Each B has the one of the forms:
+
+=over 4
+
+[[@,][=,]*
+[=,]*
+
+=back
+
+For example, these strings are equivalent:
 
 =over 4
 
-=item B<[:]BB:DD.F>
+36:00.0@20,seize=1
+36:00.0,vslot=20,seize=1
+bdf=36:00.0,vslot=20,seize=1
 
-Identifies the PCI device from the host perspective in the domain
-(B), Bus (B), Device (B) and Function (B) syntax. This is
-the same scheme as used in the output of B for the device in
-question.
+=back
+
+More formally, the string is a series of comma-separated keyword/value
+pairs, flags and positional parameters.  Parameters which are not bare
+keywords and which do not contain "=" symbols are assigned to the
+positional parameters, in the order specified below.  The positional
+parameters may also be specified by name.
+
+Each parameter may be specified at most once, either as a positional
+parameter or a named parameter.  Default values apply if the parameter
+is not specified, or if it is specified with an empty value (whether
+positionally or explicitly).
+
+B: In context of B (see L), parameters other than
+B will be ignored.
+
+=head1 Positional Parameters
+
+=over 4
+
+=item B=I
+
+=over 4
 
-Note: by default B will omit the domain (B) if it
-is zero and it is optional here also. You may specify the function
-(B) as B<*> to indicate all functions.
+=item Description
 
-=item B<@VSLOT>
+This identifies the PCI device from the host perspective.
 
-Specifies the virtual slot where the guest will see this
-device. This is equivalent to the B which the guest sees. In a
-guest B and B are C<:00>.
+In the context of a B you may specify the function (B) as
+B<*> to indicate all functions of a multi-function device.
 
-=item B
+=item Default Value
+
+None. This parameter is mandatory as it identifies the device.
+
+=back
+
+=item B=I
+
+=over 4
+
+=item Description
+
+Specifies the virtual slot (device) number where the guest will see this
+device. For example, running L in a Linux guest where B
+was specified as C<8> would identify the device as C<00:08.0>. Virtual domain
+and bus numbers are always 0.
+
+B This parameter is always parsed as a hexidecimal value.
+
+=item Default Value
+
+None. This parameter is not mandatory. An available B will be selected
+if this parameter is not specified.
+
+=back
+
+=back
+
+=head1 Other Parameters and Flags
+
+=over 4
+
+=item B=I
+
+=over 4
+
+=item Description
 
 By default pciback only allows PV guests to write "known safe" values
 into PCI configuration space, likewise QEMU (both qemu-xen and
@@ -46,33 +119,79 @@ more control over the device, which may have security or 
stability
 implications.  It is recommended to only enable this option for
 trusted VMs under administrator's control.
 
-=item B
+=item Default Value
+
+0
+
+=back
+
+=item B=I
+
+=over 4
+
+=item Description
 
 Specifies that MSI-INTx translation should be turned on for the PCI
 device. When enabled, MSI-INTx translation will always enable MSI on
-the PCI device regardless of whether the guest uses INTx or MSI. Some
-device drivers, such as NVIDIA's, detect an inconsistency and do not
+the PCI device regardless of whether the guest uses INTx or MSI.
+
+=item Default Value
+
+Some device drivers, such as NVIDIA's, detect an inconsistency and do not
 function when this option is enabled. Therefore the default is false (0).
 
-=item B
+=back
+
+=item B=I
+
+=over 4
+
+=item Description
 
-Tells B to automatically attempt to re-a

[PATCH 24/25] docs/man: modify xl-pci-configuration(5) to add 'name' field to PCI_SPEC_STRING

From: Paul Durrant 

Since assignable devices can be named, a subsequent patch will support use
of a PCI_SPEC_STRING containing a 'name' parameter instead of a 'bdf'. In
this case the name will be used to look up the 'bdf' in the list of assignable
(or assigned) devices.
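
The rule documented by this patch (exactly one of 'bdf' and 'name' must
identify the device) can be sketched as a small check. This is only an
illustration of the rule: 'struct pci_spec' and the function name are
hypothetical, and an empty value is treated the same as an absent one, in
line with the manpage's statement about empty values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical holder for the two identifying parameters of a spec. */
struct pci_spec {
    const char *bdf;   /* NULL or "" if not specified */
    const char *name;  /* NULL or "" if not specified */
};

/*
 * A spec identifies a device if it carries a 'bdf' or a 'name', but not
 * both and not neither.
 */
static bool pci_spec_identifies_device(const struct pci_spec *spec)
{
    bool has_bdf, has_name;

    assert(spec != NULL);

    has_bdf = spec->bdf != NULL && spec->bdf[0] != '\0';
    has_name = spec->name != NULL && spec->name[0] != '\0';

    return has_bdf != has_name;
}
```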

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 docs/man/xl-pci-configuration.5.pod | 25 +++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl-pci-configuration.5.pod 
b/docs/man/xl-pci-configuration.5.pod
index 4dd73bc498..db3360307c 100644
--- a/docs/man/xl-pci-configuration.5.pod
+++ b/docs/man/xl-pci-configuration.5.pod
@@ -51,7 +51,7 @@ is not specified, or if it is specified with an empty value 
(whether
 positionally or explicitly).
 
 B: In context of B (see L), parameters other than
-B will be ignored.
+B or B will be ignored.
 
 =head1 Positional Parameters
 
@@ -70,7 +70,11 @@ B<*> to indicate all functions of a multi-function device.
 
 =item Default Value
 
-None. This parameter is mandatory as it identifies the device.
+None. This parameter is mandatory in its positional form. As a non-positional
+parameter it is also mandatory unless a B parameter is present, in
+which case B must not be present since the B will be used to find
+the B in the list of assignable devices. See L for more information
+on naming assignable devices.
 
 =back
 
@@ -194,4 +198,21 @@ B: This overrides the global B option.
 
 =back
 
+=item B=I
+
+=over 4
+
+=item Description
+
+This is the name given when the B was made assignable. See L for
+more information on naming assignable devices.
+
+=item Default Value
+
+None. This parameter must not be present if a B parameter is present.
+If a B parameter is not present then B is mandatory as it is
+required to look up the B in the list of assignable devices.
+
+=back
+
 =back
-- 
2.11.0

[PATCH 16/25] docs/man: extract documentation of PCI_SPEC_STRING from the xl.cfg manpage...

From: Paul Durrant 

... and put it into a new xl-pci-configuration(5) manpage, akin to the
xl-network-configuration(5) and xl-disk-configuration(5) manpages.

This patch moves the content of the section verbatim. A subsequent patch
will improve the documentation, once it is in its new location.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 docs/man/xl-pci-configuration.5.pod | 78 +
 docs/man/xl.cfg.5.pod.in| 68 +---
 2 files changed, 79 insertions(+), 67 deletions(-)
 create mode 100644 docs/man/xl-pci-configuration.5.pod

diff --git a/docs/man/xl-pci-configuration.5.pod 
b/docs/man/xl-pci-configuration.5.pod
new file mode 100644
index 00..72a27bd95d
--- /dev/null
+++ b/docs/man/xl-pci-configuration.5.pod
@@ -0,0 +1,78 @@
+=encoding utf8
+
+=head1 NAME
+
+xl-pci-configuration - XL PCI Configuration Syntax
+
+=head1 SYNTAX
+
+This document specifies the format for B which is used by
+the L pci configuration option, and related L commands.
+
+Each B has the form of
+B<[:]BB:DD.F[@VSLOT],KEY=VALUE,KEY=VALUE,...> where:
+
+=over 4
+
+=item B<[:]BB:DD.F>
+
+Identifies the PCI device from the host perspective in the domain
+(B), Bus (B), Device (B) and Function (B) syntax. This is
+the same scheme as used in the output of B for the device in
+question.
+
+Note: by default B will omit the domain (B) if it
+is zero and it is optional here also. You may specify the function
+(B) as B<*> to indicate all functions.
+
+=item B<@VSLOT>
+
+Specifies the virtual slot where the guest will see this
+device. This is equivalent to the B which the guest sees. In a
+guest B and B are C<:00>.
+
+=item B
+
+By default pciback only allows PV guests to write "known safe" values
+into PCI configuration space, likewise QEMU (both qemu-xen and
+qemu-xen-traditional) imposes the same constraint on HVM guests.
+However, many devices require writes to other areas of the configuration space
+in order to operate properly.  This option tells the backend (pciback or QEMU)
+to allow all writes to the PCI configuration space of this device by this
+domain.
+
+B it gives the guest much
+more control over the device, which may have security or stability
+implications.  It is recommended to only enable this option for
+trusted VMs under administrator's control.
+
+=item B
+
+Specifies that MSI-INTx translation should be turned on for the PCI
+device. When enabled, MSI-INTx translation will always enable MSI on
+the PCI device regardless of whether the guest uses INTx or MSI. Some
+device drivers, such as NVIDIA's, detect an inconsistency and do not
+function when this option is enabled. Therefore the default is false (0).
+
+=item B
+
+Tells B to automatically attempt to re-assign a device to
+pciback if it is not already assigned.
+
+B If you set this option, B will gladly re-assign a critical
+system device, such as a network or a disk controller being used by
+dom0 without confirmation.  Please use with care.
+
+=item B
+
+B<(HVM only)> Specifies that the VM should be able to program the
+D0-D3hot power management states for the PCI device. The default is false (0).
+
+=item B
+
+B<(HVM/x86 only)> This is the same as the policy setting inside the B
+option but just specific to a given device. The default is "relaxed".
+
+Note: this would override global B option.
+
+=back
diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 0532739c1f..b00644e852 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -1101,73 +1101,7 @@ option is valid only when the B option is 
specified.
 =item B
 
 Specifies the host PCI devices to passthrough to this guest.
-Each B has the form of
-B<[:]BB:DD.F[@VSLOT],KEY=VALUE,KEY=VALUE,...> where:
-
-=over 4
-
-=item B<[:]BB:DD.F>
-
-Identifies the PCI device from the host perspective in the domain
-(B), Bus (B), Device (B) and Function (B) syntax. This is
-the same scheme as used in the output of B for the device in
-question.
-
-Note: by default B will omit the domain (B) if it
-is zero and it is optional here also. You may specify the function
-(B) as B<*> to indicate all functions.
-
-=item B<@VSLOT>
-
-Specifies the virtual slot where the guest will see this
-device. This is equivalent to the B which the guest sees. In a
-guest B and B are C<:00>.
-
-=item B
-
-By default pciback only allows PV guests to write "known safe" values
-into PCI configuration space, likewise QEMU (both qemu-xen and
-qemu-xen-traditional) imposes the same constraint on HVM guests.
-However, many devices require writes to other areas of the configuration space
-in order to operate properly.  This option tells the backend (pciback or QEMU)
-to allow all writes to the PCI configuration space of this device by this
-domain.
-
-B it gives the guest much
-more control over the device, which may have security or stability
-implications.  It is recommended to only enable this option

[PATCH 12/25] libxl: add libxl_device_pci_assignable_list_free()...

From: Paul Durrant 

... to be used by callers of libxl_device_pci_assignable_list().

Currently there is no API for callers of libxl_device_pci_assignable_list()
to free the list. The xl function pciassignable_list() calls
libxl_device_pci_dispose() on each element of the returned list, but
libxl_pci_assignable() in libxl_pci.c does not. Neither does the implementation
of libxl_device_pci_assignable_list() call libxl_device_pci_init().

This patch adds the new API function, makes sure it is used everywhere and
also modifies libxl_device_pci_assignable_list() to initialize list
entries rather than just zeroing them.
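
The init/dispose/list-free pairing this patch introduces can be sketched
in isolation as follows. The 'struct device' type and the device_*()
helpers are hypothetical stand-ins for libxl_device_pci and its
libxl_device_pci_init()/libxl_device_pci_dispose() functions; the point is
only that every element is initialized when the list is built and disposed
of before the array itself is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type owning one heap-allocated member. */
struct device {
    char *name;
    unsigned int bus;
};

static void device_init(struct device *d)
{
    memset(d, 0, sizeof(*d));
}

static void device_dispose(struct device *d)
{
    free(d->name);
    d->name = NULL;
}

/* Allocate a list whose elements are all properly initialized. */
static struct device *device_list_make(int num)
{
    struct device *list;
    int i;

    assert(num > 0);

    list = malloc(sizeof(*list) * (size_t)num);
    if (!list)
        return NULL;

    for (i = 0; i < num; i++)
        device_init(&list[i]);

    return list;
}

/* Dispose of every element, then free the array itself. */
static void device_list_free(struct device *list, int num)
{
    int i;

    for (i = 0; i < num; i++)
        device_dispose(&list[i]);

    free(list);
}
```

Callers that only call free() on the array would leak anything owned by
the individual elements, which is the gap the new API function closes.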

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Christian Lindig 
Cc: David Scott 
Cc: Anthony PERARD 
---
 tools/include/libxl.h|  7 +++
 tools/libs/light/libxl_pci.c | 14 --
 tools/ocaml/libs/xl/xenlight_stubs.c |  3 +--
 tools/xl/xl_pci.c|  3 +--
 4 files changed, 21 insertions(+), 6 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index ee52d3cf7e..8225809d94 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -458,6 +458,12 @@
 #define LIBXL_HAVE_DEVICE_PCI_LIST_FREE 1
 
 /*
+ * LIBXL_HAVE_DEVICE_PCI_ASSIGNABLE_LIST_FREE indicates that the
+ * libxl_device_pci_assignable_list_free() function is defined.
+ */
+#define LIBXL_HAVE_DEVICE_PCI_ASSIGNABLE_LIST_FREE 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
@@ -2369,6 +2375,7 @@ int libxl_device_events_handler(libxl_ctx *ctx,
 int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_device_pci *pci, int 
rebind);
 int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_device_pci *pci, 
int rebind);
 libxl_device_pci *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
+void libxl_device_pci_assignable_list_free(libxl_device_pci *list, int num);
 
 /* CPUID handling */
 int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 3162facb37..e858509609 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -438,7 +438,7 @@ libxl_device_pci 
*libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num)
 pcis = new;
 new = pcis + *num;
 
-memset(new, 0, sizeof(*new));
+libxl_device_pci_init(new);
 pci_struct_fill(new, dom, bus, dev, func, 0);
 
 if (pci_info_xs_read(gc, new, "domid")) /* already assigned */
@@ -453,6 +453,16 @@ out:
 return pcis;
 }
 
+void libxl_device_pci_assignable_list_free(libxl_device_pci *list, int num)
+{
+int i;
+
+for (i = 0; i < num; i++)
+libxl_device_pci_dispose(&list[i]);
+
+free(list);
+}
+
 /* Unbind device from its current driver, if any.  If driver_path is non-NULL,
  * store the path to the original driver in it. */
 static int sysfs_dev_unbind(libxl__gc *gc, libxl_device_pci *pci,
@@ -1470,7 +1480,7 @@ static int libxl_pci_assignable(libxl_ctx *ctx, 
libxl_device_pci *pci)
 pcis[i].func == pci->func)
 break;
 }
-free(pcis);
+libxl_device_pci_assignable_list_free(pcis, num);
 return i != num;
 }
 
diff --git a/tools/ocaml/libs/xl/xenlight_stubs.c 
b/tools/ocaml/libs/xl/xenlight_stubs.c
index 1181971da4..352a00134d 100644
--- a/tools/ocaml/libs/xl/xenlight_stubs.c
+++ b/tools/ocaml/libs/xl/xenlight_stubs.c
@@ -894,9 +894,8 @@ value stub_xl_device_pci_assignable_list(value ctx)
Field(list, 1) = temp;
temp = list;
Store_field(list, 0, Val_device_pci(&c_list[i]));
-   libxl_device_pci_dispose(&c_list[i]);
}
-   free(c_list);
+   libxl_device_pci_assignable_list_free(c_list, nb);
 
CAMLreturn(list);
 }
diff --git a/tools/xl/xl_pci.c b/tools/xl/xl_pci.c
index 7c0f102ac7..f71498cbb5 100644
--- a/tools/xl/xl_pci.c
+++ b/tools/xl/xl_pci.c
@@ -164,9 +164,8 @@ static void pciassignable_list(void)
 for (i = 0; i < num; i++) {
 printf("%04x:%02x:%02x.%01x\n",
pcis[i].domain, pcis[i].bus, pcis[i].dev, pcis[i].func);
-libxl_device_pci_dispose(&pcis[i]);
 }
-free(pcis);
+libxl_device_pci_assignable_list_free(pcis, num);
 }
 
 int main_pciassignable_list(int argc, char **argv)
-- 
2.11.0

[PATCH 14/25] libxl: add/recover 'rdm_policy' to/from PCI backend in xenstore

From: Paul Durrant 

Other parameters, such as 'msitranslate' and 'permissive' are dealt with
but 'rdm_policy' appears to have been completely missed.
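
The round trip through a comma-separated option string can be sketched in isolation as follows. This is an assumption-laden illustration, not the libxl code: `demo_opts`, `demo_opts_format()` and `demo_opts_parse()` are hypothetical names, the policy is kept as a plain string rather than an enum, and (unlike the hunk below) every key is expected to be followed by a value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical mirror of the per-device options kept by the backend. */
typedef struct {
    int msitranslate, power_mgmt, permissive;
    char rdm_policy[16];
} demo_opts;

/* Serialise the options into one comma-separated string. */
static int demo_opts_format(const demo_opts *o, char *buf, size_t len)
{
    int n = snprintf(buf, len,
                     "msitranslate=%d,power_mgmt=%d,permissive=%d,rdm_policy=%s",
                     o->msitranslate, o->power_mgmt, o->permissive,
                     o->rdm_policy);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Parse the string back. Unknown keys are skipped together with their
 * value; a key without a value is treated as an error.
 */
static int demo_opts_parse(const char *str, demo_opts *o)
{
    char *copy, *saveptr, *key, *val;
    int rc = 0;

    assert(str && o);

    /* strtok_r() modifies its input, so work on a private copy. */
    copy = malloc(strlen(str) + 1);
    if (!copy)
        return -1;
    strcpy(copy, str);

    for (key = strtok_r(copy, ",=", &saveptr); key;
         key = strtok_r(NULL, ",=", &saveptr)) {
        val = strtok_r(NULL, ",=", &saveptr);
        if (!val) {
            rc = -1;
            break;
        }
        if (!strcmp(key, "msitranslate"))
            o->msitranslate = atoi(val);
        else if (!strcmp(key, "power_mgmt"))
            o->power_mgmt = atoi(val);
        else if (!strcmp(key, "permissive"))
            o->permissive = atoi(val);
        else if (!strcmp(key, "rdm_policy"))
            snprintf(o->rdm_policy, sizeof(o->rdm_policy), "%s", val);
    }

    free(copy);
    return rc;
}
```

Checking the value token for NULL before use is a deliberate difference in this sketch; it avoids passing NULL to atoi() when the string ends on a bare key.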

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 tools/libs/light/libxl_pci.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 2e8e1c50f1..c5d73133eb 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -61,9 +61,9 @@ static void libxl_create_pci_backend_device(libxl__gc *gc,
 flexarray_append_pair(back, GCSPRINTF("vdevfn-%d", num), 
GCSPRINTF("%x", pci->vdevfn));
 flexarray_append(back, GCSPRINTF("opts-%d", num));
 flexarray_append(back,
-  GCSPRINTF("msitranslate=%d,power_mgmt=%d,permissive=%d",
- pci->msitranslate, pci->power_mgmt,
- pci->permissive));
+  
GCSPRINTF("msitranslate=%d,power_mgmt=%d,permissive=%d,rdm_policy=%s",
+pci->msitranslate, pci->power_mgmt,
+pci->permissive, 
libxl_rdm_reserve_policy_to_string(pci->rdm_policy)));
 flexarray_append_pair(back, GCSPRINTF("state-%d", num), GCSPRINTF("%d", 
XenbusStateInitialising));
 }
 
@@ -2310,6 +2310,9 @@ static int libxl__device_pci_from_xs_be(libxl__gc *gc,
 } else if (!strcmp(p, "permissive")) {
 p = strtok_r(NULL, ",=", &saveptr);
 pci->permissive = atoi(p);
+} else if (!strcmp(p, "rdm_policy")) {
+p = strtok_r(NULL, ",=", &saveptr);
+libxl_rdm_reserve_policy_from_string(p, &pci->rdm_policy);
 }
 } while ((p = strtok_r(NULL, ",=", &saveptr)) != NULL);
 }
-- 
2.11.0

[PATCH 25/25] xl / libxl: support 'xl pci-attach/detach' by name

From: Paul Durrant 

This patch adds a 'name' field into the idl for 'libxl_device_pci' and
libxlu_pci_parse_spec_string() is modified to parse the new 'name'
parameter of PCI_SPEC_STRING detailed in the updated documentation in
xl-pci-configuration(5).

If the 'name' field is non-NULL then both libxl_device_pci_add() and
libxl_device_pci_remove() will use it to look up the device BDF in
the list of assignable devices.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Anthony PERARD 
---
 tools/include/libxl.h|  6 
 tools/libs/light/libxl_pci.c | 67 +---
 tools/libs/light/libxl_types.idl |  1 +
 tools/libs/util/libxlu_pci.c |  7 -
 tools/xl/xl_pci.c|  1 +
 5 files changed, 76 insertions(+), 6 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 4025d3a3d4..5b55a20155 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -485,6 +485,12 @@
 #define LIBXL_HAVE_PCI_ASSIGNABLE_NAME 1
 
 /*
+ * LIBXL_HAVE_DEVICE_PCI_NAME indicates that the 'name' field of
+ * libxl_device_pci is defined.
+ */
+#define LIBXL_HAVE_DEVICE_PCI_NAME 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 0f7d655aff..e5d54732c3 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -60,6 +60,10 @@ static void libxl_create_pci_backend_device(libxl__gc *gc,
 int num,
 const libxl_device_pci *pci)
 {
+if (pci->name) {
+flexarray_append(back, GCSPRINTF("name-%d", num));
+flexarray_append(back, GCSPRINTF("%s", pci->name));
+}
 flexarray_append(back, GCSPRINTF("key-%d", num));
 flexarray_append(back, GCSPRINTF(PCI_BDF, pci->bdf.domain, pci->bdf.bus, 
pci->bdf.dev, pci->bdf.func));
 flexarray_append(back, GCSPRINTF("dev-%d", num));
@@ -252,6 +256,7 @@ retry_transaction:
 
 retry_transaction2:
 t = xs_transaction_start(ctx->xsh);
+xs_rm(ctx->xsh, t, GCSPRINTF("%s/name-%d", be_path, i));
 xs_rm(ctx->xsh, t, GCSPRINTF("%s/state-%d", be_path, i));
 xs_rm(ctx->xsh, t, GCSPRINTF("%s/key-%d", be_path, i));
 xs_rm(ctx->xsh, t, GCSPRINTF("%s/dev-%d", be_path, i));
@@ -290,6 +295,12 @@ retry_transaction2:
 xs_write(ctx->xsh, t, GCSPRINTF("%s/vdevfn-%d", be_path, j - 1), 
tmp, strlen(tmp));
 xs_rm(ctx->xsh, t, tmppath);
 }
+tmppath = GCSPRINTF("%s/name-%d", be_path, j);
+tmp = libxl__xs_read(gc, t, tmppath);
+if (tmp) {
+xs_write(ctx->xsh, t, GCSPRINTF("%s/name-%d", be_path, j - 1), 
tmp, strlen(tmp));
+xs_rm(ctx->xsh, t, tmppath);
+}
 }
 if (!xs_transaction_end(ctx->xsh, t, 0))
 if (errno == EAGAIN)
@@ -1586,6 +1597,23 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t 
domid,
 pas->starting = starting;
 pas->callback = device_pci_add_stubdom_done;
 
+if (pci->name) {
+libxl_pci_bdf *pcibdf =
+libxl_device_pci_assignable_name2bdf(CTX, pci->name);
+
+if (!pcibdf) {
+rc = ERROR_FAIL;
+goto out;
+}
+
+LOGD(DETAIL, domid, "'%s' -> %04x:%02x:%02x.%u", pci->name,
+ pcibdf->domain, pcibdf->bus, pcibdf->dev, pcibdf->func);
+
+libxl_pci_bdf_copy(CTX, &pci->bdf, pcibdf);
+libxl_pci_bdf_dispose(pcibdf);
+free(pcibdf);
+}
+
 if (libxl__domain_type(gc, domid) == LIBXL_DOMAIN_TYPE_HVM) {
 rc = xc_test_assign_device(ctx->xch, domid,
pci_encode_bdf(&pci->bdf));
@@ -1734,11 +1762,19 @@ static void device_pci_add_done(libxl__egc *egc,
 libxl_device_pci *pci = &pas->pci;
 
 if (rc) {
-LOGD(ERROR, domid,
- "libxl__device_pci_add  failed for "
- "PCI device %x:%x:%x.%x (rc %d)",
- pci->bdf.domain, pci->bdf.bus, pci->bdf.dev, pci->bdf.func,
- rc);
+if (pci->name) {
+LOGD(ERROR, domid,
+ "libxl__device_pci_add failed for "
+ "PCI device '%s' (rc %d)",
+ pci->name,
+ rc);
+} else {
+LOGD(ERROR, domid,
+ "libxl__device_pci_add failed for "
+ "PCI device %x:%x:%x.%x (rc %d)",
+ pci->bdf.domain, pci->bdf.bus, pci->bdf.dev, pci->bdf.func,
+ rc);
+}
 pci_info_xs_remove(gc, &pci->bdf, "domid");
 }
 libxl_device_pci_dispose(pci);
@@ -2284,6 +2320,23 @@ static void libxl__device_pci_remove_common(libxl__egc 
*egc,
 libxl__ev_time_init(&prs->timeout);
 libxl__ev_time_init(&prs->retry_timer);
 
+if (pci->name) {
+libxl_pci_bdf *pcibdf =
+libxl_device_pci_assignable_name2bdf(CTX, pci->name);
+
+if

[PATCH 20/25] libxlu: introduce xlu_pci_parse_spec_string()

From: Paul Durrant 

This patch largely re-writes the code to parse a PCI_SPEC_STRING and enters
it via the newly introduced function. The new parser also deals with 'bdf'
and 'vslot' as non-positional parameters, as per the documentation in
xl-pci-configuration(5).

The existing xlu_pci_parse_bdf() function remains, but now strictly parses
BDF values. Some existing callers of xlu_pci_parse_bdf() are
modified to call xlu_pci_parse_spec_string() as per the documentation in xl(1).

NOTE: Usage text in xl_cmdtable.c and error messages are also modified
  appropriately.
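
For readers unfamiliar with the BDF notation, a strict parser for the "[DDDD:]BB:DD.F" form might look roughly like the sketch below. It is a simplified illustration, not the parser in this patch: `demo_bdf` and `demo_parse_bdf()` are hypothetical names, the '*' function wildcard is not handled, trailing characters are rejected rather than reported to the caller, and the 16-bit limit on the domain is an assumption.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical plain container for a PCI address. */
typedef struct {
    unsigned int domain, bus, dev, func;
} demo_bdf;

/*
 * Parse "[DDDD:]BB:DD.F" (all fields hexadecimal). The domain is optional
 * and defaults to 0. Returns 0 on success, -1 on malformed input.
 */
static int demo_parse_bdf(const char *str, demo_bdf *out)
{
    unsigned int domain = 0, bus = 0, dev = 0, func = 0;
    int n = 0;

    assert(str && out);

    /* Try the long form first; %n records how much input was consumed. */
    if (sscanf(str, "%x:%x:%x.%x%n", &domain, &bus, &dev, &func, &n) != 4 ||
        str[n] != '\0') {
        /* Fall back to the short form without a domain. */
        domain = 0;
        n = 0;
        if (sscanf(str, "%x:%x.%x%n", &bus, &dev, &func, &n) != 3 ||
            str[n] != '\0')
            return -1;
    }

    /* Range checks: 16-bit domain (assumed), 8-bit bus, 5-bit dev, 3-bit func. */
    if (domain > 0xffff || bus > 0xff || dev > 0x1f || func > 7)
        return -1;

    out->domain = domain;
    out->bus = bus;
    out->dev = dev;
    out->func = func;
    return 0;
}
```

Note that "%x" is lenient (it accepts leading whitespace and a "0x" prefix), so a production parser would likely want tighter checks than this sketch applies.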

Fixes: d25cc3ec93eb ("libxl: workaround gcc 10.2 maybe-uninitialized warning")
Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Anthony PERARD 
---
 tools/include/libxlutil.h|   8 +-
 tools/libs/util/libxlu_pci.c | 354 +++
 tools/xl/xl_cmdtable.c   |   4 +-
 tools/xl/xl_parse.c  |   4 +-
 tools/xl/xl_pci.c|  37 +++--
 5 files changed, 220 insertions(+), 187 deletions(-)

diff --git a/tools/include/libxlutil.h b/tools/include/libxlutil.h
index 92e35c5462..cdd6aab4f8 100644
--- a/tools/include/libxlutil.h
+++ b/tools/include/libxlutil.h
@@ -109,9 +109,15 @@ int xlu_disk_parse(XLU_Config *cfg, int nspecs, const char 
*const *specs,
*/
 
 /*
+ * PCI BDF
+ */
+int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_pci_bdf *bdf, const char *str);
+
+/*
  * PCI specification parsing
  */
-int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_device_pci *pcidev, const char 
*str);
+int xlu_pci_parse_spec_string(XLU_Config *cfg, libxl_device_pci *pci,
+  const char *str);
 
 /*
  * RDM parsing
diff --git a/tools/libs/util/libxlu_pci.c b/tools/libs/util/libxlu_pci.c
index 5c107f2642..a8b6ce5427 100644
--- a/tools/libs/util/libxlu_pci.c
+++ b/tools/libs/util/libxlu_pci.c
@@ -1,5 +1,7 @@
 #define _GNU_SOURCE
 
+#include 
+
 #include "libxlu_internal.h"
 #include "libxlu_disk_l.h"
 #include "libxlu_disk_i.h"
@@ -9,185 +11,213 @@
 #define XLU__PCI_ERR(_c, _x, _a...) \
 if((_c) && (_c)->report) fprintf((_c)->report, _x, ##_a)
 
-static int hex_convert(const char *str, unsigned int *val, unsigned int mask)
+static int parse_bdf(libxl_pci_bdf *bdfp, uint32_t *vfunc_maskp,
+ const char *str, const char **endp)
 {
-unsigned long ret;
-char *end;
-
-ret = strtoul(str, &end, 16);
-if ( end == str || *end != '\0' )
-return -1;
-if ( ret & ~mask )
-return -1;
-*val = (unsigned int)ret & mask;
+const char *ptr = str;
+unsigned int colons = 0;
+unsigned int domain, bus, dev, func;
+int n;
+
+/* Count occurrences of ':' to determine presence/absence of the 'domain' */
+while (isxdigit(*ptr) || *ptr == ':') {
+if (*ptr == ':')
+colons++;
+ptr++;
+}
+
+ptr = str;
+switch (colons) {
+case 1:
+domain = 0;
+if (sscanf(ptr, "%x:%x.%n", &bus, &dev, &n) != 2)
+return ERROR_INVAL;
+break;
+case 2:
+if (sscanf(ptr, "%x:%x:%x.%n", &domain, &bus, &dev, &n) != 3)
+return ERROR_INVAL;
+break;
+default:
+return ERROR_INVAL;
+}
+
+if (domain > 0x || bus > 0xff || dev > 0x1f)
+return ERROR_INVAL;
+
+ptr += n;
+if (*ptr == '*') {
+if (!vfunc_maskp)
+return ERROR_INVAL;
+*vfunc_maskp = LIBXL_PCI_FUNC_ALL;
+func = 0;
+ptr++;
+} else {
+if (sscanf(ptr, "%x%n", &func, &n) != 1)
+return ERROR_INVAL;
+if (func > 7)
+return ERROR_INVAL;
+if (vfunc_maskp)
+*vfunc_maskp = 1;
+ptr += n;
+}
+
+bdfp->domain = domain;
+bdfp->bus = bus;
+bdfp->dev = dev;
+bdfp->func = func;
+
+if (endp)
+*endp = ptr;
+
 return 0;
 }
 
-static int pci_struct_fill(libxl_device_pci *pci, unsigned int domain,
-   unsigned int bus, unsigned int dev,
-   unsigned int func, unsigned int vdevfn)
+static int parse_vslot(uint32_t *vdevfnp, const char *str, const char **endp)
 {
-pci->bdf.domain = domain;
-pci->bdf.bus = bus;
-pci->bdf.dev = dev;
-pci->bdf.func = func;
-pci->vdevfn = vdevfn;
+const char *ptr = str;
+unsigned int val;
+int n;
+
+if (sscanf(ptr, "%x%n", &val, &n) != 1)
+return ERROR_INVAL;
+
+if (val > 0x1f)
+return ERROR_INVAL;
+
+ptr += n;
+
+*vdevfnp = val << 3;
+
+if (endp)
+*endp = ptr;
+
 return 0;
 }
 
-#define STATE_DOMAIN0
-#define STATE_BUS   1
-#define STATE_DEV   2
-#define STATE_FUNC  3
-#define STATE_VSLOT 4
-#define STATE_OPTIONS_K 6
-#define STATE_OPTIONS_V 7
-#define STATE_TERMINAL  8
-#define STATE_TYPE  9
-#define STATE_RDM_STRATEGY  10
-#define STATE_RESERVE_POLICY11
-#define INVALID 0x
-int xlu_pci_parse_bdf(XLU_Config *cfg, libxl_devi

[PATCH 19/25] libxl: introduce 'libxl_pci_bdf' in the idl...

From: Paul Durrant 

... and use in 'libxl_device_pci'

This patch is preparatory work for restricting the type passed to functions
that only require BDF information, rather than passing a 'libxl_device_pci'
structure which is only partially filled. In this patch only the minimal
mechanical changes necessary to deal with the structural changes are made.
Subsequent patches will adjust the code to make better use of the new type.
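
To make the idea of a dedicated BDF type concrete, the sketch below shows a small struct and the conventional way of packing it into a single integer. The names `demo_pci_bdf`, `demo_encode_bdf()` and `demo_decode_bdf()` are hypothetical, and the bit layout (domain in bits 16 and up, bus in bits 8-15, device in bits 3-7, function in bits 0-2) is the usual PCI convention assumed here; this excerpt does not show whether libxl's own helper uses exactly this layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-alone BDF type, separate from any device options. */
typedef struct {
    int domain;
    uint8_t bus, dev, func;
} demo_pci_bdf;

/* Pack domain/bus/dev/func into one 32-bit value (assumed layout). */
static uint32_t demo_encode_bdf(const demo_pci_bdf *bdf)
{
    assert(bdf && bdf->dev < 32 && bdf->func < 8);

    return ((uint32_t)bdf->domain << 16) |
           ((uint32_t)bdf->bus << 8) |
           ((uint32_t)bdf->dev << 3) |
           (uint32_t)bdf->func;
}

/* Inverse of demo_encode_bdf(). */
static void demo_decode_bdf(uint32_t value, demo_pci_bdf *bdf)
{
    assert(bdf);

    bdf->domain = (int)(value >> 16);
    bdf->bus = (uint8_t)((value >> 8) & 0xff);
    bdf->dev = (uint8_t)((value >> 3) & 0x1f);
    bdf->func = (uint8_t)(value & 0x7);
}
```

Functions that only need an address can then take the small type, instead of a larger device structure in which most fields are left unset.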

Signed-off-by: Paul Durrant 
---
Cc: George Dunlap 
Cc: Nick Rosbrook 
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Anthony PERARD 
---
 tools/golang/xenlight/helpers.gen.go |  77 --
 tools/golang/xenlight/types.gen.go   |   8 +-
 tools/include/libxl.h|   6 ++
 tools/libs/light/libxl_dm.c  |   8 +-
 tools/libs/light/libxl_internal.h|   3 +-
 tools/libs/light/libxl_pci.c | 148 +--
 tools/libs/light/libxl_types.idl |  16 ++--
 tools/libs/util/libxlu_pci.c |   8 +-
 tools/xl/xl_pci.c|   6 +-
 tools/xl/xl_sxp.c|   4 +-
 10 files changed, 167 insertions(+), 117 deletions(-)

diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index c8605994e7..b7230f693c 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1999,6 +1999,41 @@ xc.colo_checkpoint_port = 
C.CString(x.ColoCheckpointPort)}
  return nil
  }
 
+// NewPciBdf returns an instance of PciBdf initialized with defaults.
+func NewPciBdf() (*PciBdf, error) {
+var (
+x PciBdf
+xc C.libxl_pci_bdf)
+
+C.libxl_pci_bdf_init(&xc)
+defer C.libxl_pci_bdf_dispose(&xc)
+
+if err := x.fromC(&xc); err != nil {
+return nil, err }
+
+return &x, nil}
+
+func (x *PciBdf) fromC(xc *C.libxl_pci_bdf) error {
+ x.Func = byte(xc._func)
+x.Dev = byte(xc.dev)
+x.Bus = byte(xc.bus)
+x.Domain = int(xc.domain)
+
+ return nil}
+
+func (x *PciBdf) toC(xc *C.libxl_pci_bdf) (err error){defer func(){
+if err != nil{
+C.libxl_pci_bdf_dispose(xc)}
+}()
+
+xc._func = C.uint8_t(x.Func)
+xc.dev = C.uint8_t(x.Dev)
+xc.bus = C.uint8_t(x.Bus)
+xc.domain = C.int(x.Domain)
+
+ return nil
+ }
+
 // NewDevicePci returns an instance of DevicePci initialized with defaults.
 func NewDevicePci() (*DevicePci, error) {
 var (
@@ -2014,10 +2049,9 @@ return nil, err }
 return &x, nil}
 
 func (x *DevicePci) fromC(xc *C.libxl_device_pci) error {
- x.Func = byte(xc._func)
-x.Dev = byte(xc.dev)
-x.Bus = byte(xc.bus)
-x.Domain = int(xc.domain)
+ if err := x.Bdf.fromC(&xc.bdf);err != nil {
+return fmt.Errorf("converting field Bdf: %v", err)
+}
 x.Vdevfn = uint32(xc.vdevfn)
 x.VfuncMask = uint32(xc.vfunc_mask)
 x.Msitranslate = bool(xc.msitranslate)
@@ -2033,10 +2067,9 @@ if err != nil{
 C.libxl_device_pci_dispose(xc)}
 }()
 
-xc._func = C.uint8_t(x.Func)
-xc.dev = C.uint8_t(x.Dev)
-xc.bus = C.uint8_t(x.Bus)
-xc.domain = C.int(x.Domain)
+if err := x.Bdf.toC(&xc.bdf); err != nil {
+return fmt.Errorf("converting field Bdf: %v", err)
+}
 xc.vdevfn = C.uint32_t(x.Vdevfn)
 xc.vfunc_mask = C.uint32_t(x.VfuncMask)
 xc.msitranslate = C.bool(x.Msitranslate)
@@ -2766,13 +2799,13 @@ if err := x.Nics[i].fromC(&v); err != nil {
 return fmt.Errorf("converting field Nics: %v", err) }
 }
 }
-x.Pcidevs = nil
-if n := int(xc.num_pcidevs); n > 0 {
-cPcidevs := (*[1<<28]C.libxl_device_pci)(unsafe.Pointer(xc.pcidevs))[:n:n]
-x.Pcidevs = make([]DevicePci, n)
-for i, v := range cPcidevs {
-if err := x.Pcidevs[i].fromC(&v); err != nil {
-return fmt.Errorf("converting field Pcidevs: %v", err) }
+x.Pcis = nil
+if n := int(xc.num_pcis); n > 0 {
+cPcis := (*[1<<28]C.libxl_device_pci)(unsafe.Pointer(xc.pcis))[:n:n]
+x.Pcis = make([]DevicePci, n)
+for i, v := range cPcis {
+if err := x.Pcis[i].fromC(&v); err != nil {
+return fmt.Errorf("converting field Pcis: %v", err) }
 }
 }
 x.Rdms = nil
@@ -2922,13 +2955,13 @@ return fmt.Errorf("converting field Nics: %v", err)
 }
 }
 }
-if numPcidevs := len(x.Pcidevs); numPcidevs > 0 {
-xc.pcidevs = 
(*C.libxl_device_pci)(C.malloc(C.ulong(numPcidevs)*C.sizeof_libxl_device_pci))
-xc.num_pcidevs = C.int(numPcidevs)
-cPcidevs := 
(*[1<<28]C.libxl_device_pci)(unsafe.Pointer(xc.pcidevs))[:numPcidevs:numPcidevs]
-for i,v := range x.Pcidevs {
-if err := v.toC(&cPcidevs[i]); err != nil {
-return fmt.Errorf("converting field Pcidevs: %v", err)
+if numPcis := len(x.Pcis); numPcis > 0 {
+xc.pcis = 
(*C.libxl_device_pci)(C.malloc(C.ulong(numPcis)*C.sizeof_libxl_device_pci))
+xc.num_pcis = C.int(numPcis)
+cPcis := 
(*[1<<28]C.libxl_device_pci)(unsafe.Pointer(xc.pcis))[:numPcis:numPcis]
+for i,v := range x.Pcis {
+if err := v.toC(&cPcis[i]); err != nil {
+return fmt.Errorf("converting field Pcis: %v", err)
 }
 }
 }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index b4c5df0f2c..bc62ae8ce9 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -707,11 +707,15 @@ ColoCheckpointHost string
 ColoCheckpointPort string
 }

[PATCH 22/25] docs/man: modify xl(1) in preparation for naming of assignable devices

From: Paul Durrant 

A subsequent patch will introduce code to allow a name to be specified to
'xl pci-assignable-add' such that the assignable device may be referred to
by that name in subsequent operations.

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 docs/man/xl.1.pod.in | 19 ---
 1 file changed, 12 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 373a52839d..a45b423d0f 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -1595,19 +1595,23 @@ List virtual network interfaces for a domain.
 
 =over 4
 
-=item B
+=item B [I<-n>]
 
 List all the B of assignable PCI devices. See
-L for more information.
+L for more information. If the -n option is
+specified then any name supplied when the device was made assignable
+will also be displayed.
 
 These are devices in the system which are configured to be
 available for passthrough and are bound to a suitable PCI
 backend driver in domain 0 rather than a real driver.
 
-=item B I
+=item B [I<-n NAME>] I
 
 Make the device at B assignable to guests. See
-L for more information.
+L for more information. If the -n option is
+supplied then the assignable device entry will be named with the
+given B.
 
 This will bind the device to the pciback driver and assign it to the
 "quarantine domain".  If it is already bound to a driver, it will
@@ -1622,10 +1626,11 @@ not to do this on a device critical to domain 0's 
operation, such as
 storage controllers, network interfaces, or GPUs that are currently
 being used.
 
-=item B [I<-r>] I
+=item B [I<-r>] I|I
 
-Make the device at B not assignable to guests. See
-L for more information.
+Make a device non-assignable to guests. The device may be identified
+either by its B or the B supplied when the device was made
+assignable. See L for more information.
 
 This will at least unbind the device from pciback, and
 re-assign it from the "quarantine domain" back to domain 0.  If the -r
-- 
2.11.0

[PATCH 18/25] docs/man: fix xl(1) documentation for 'pci' operations

From: Paul Durrant 

Currently the documentation completely fails to mention the existence of
PCI_SPEC_STRING. This patch tidies things up, specifically clarifying that
'pci-assignable-add/remove' take  arguments whereas 'pci-attach/detach'
take  arguments (which will be enforced in a subsequent
patch).

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
---
 docs/man/xl.1.pod.in | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/docs/man/xl.1.pod.in b/docs/man/xl.1.pod.in
index 5f7d3a7134..373a52839d 100644
--- a/docs/man/xl.1.pod.in
+++ b/docs/man/xl.1.pod.in
@@ -1597,14 +1597,18 @@ List virtual network interfaces for a domain.
 
 =item B
 
-List all the assignable PCI devices.
+List all the B of assignable PCI devices. See
+L for more information.
+
 These are devices in the system which are configured to be
 available for passthrough and are bound to a suitable PCI
 backend driver in domain 0 rather than a real driver.
 
 =item B I
 
-Make the device at PCI Bus/Device/Function BDF assignable to guests.
+Make the device at B assignable to guests. See
+L for more information.
+
 This will bind the device to the pciback driver and assign it to the
 "quarantine domain".  If it is already bound to a driver, it will
 first be unbound, and the original driver stored so that it can be
@@ -1620,8 +1624,10 @@ being used.
 
 =item B [I<-r>] I
 
-Make the device at PCI Bus/Device/Function BDF not assignable to
-guests.  This will at least unbind the device from pciback, and
+Make the device at B not assignable to guests. See
+L for more information.
+
+This will at least unbind the device from pciback, and
 re-assign it from the "quarantine domain" back to domain 0.  If the -r
 option is specified, it will also attempt to re-bind the device to its
 original driver, making it usable by Domain 0 again.  If the device is
@@ -1637,15 +1643,15 @@ As always, this should only be done if you trust the 
guest, or are
 confident that the particular device you're re-assigning to dom0 will
 cancel all in-flight DMA on FLR.
 
-=item B I I
+=item B I I
 
-Hot-plug a new pass-through pci device to the specified domain.
-B is the PCI Bus/Device/Function of the physical device to pass-through.
+Hot-plug a new pass-through pci device to the specified domain. See
+L for more information.
 
-=item B [I] I I
+=item B [I] I I
 
-Hot-unplug a previously assigned pci device from a domain. B is the PCI
-Bus/Device/Function of the physical device to be removed from the guest domain.
+Hot-unplug a pci device that was previously passed through to a domain. See
+L for more information.
 
 B
 
@@ -1660,7 +1666,7 @@ even without guest domain's collaboration.
 
 =item B I
 
-List pass-through pci devices for a domain.
+List the B of pci devices passed through to a domain.
 
 =back
 
-- 
2.11.0

[PATCH 11/25] libxl: make sure callers of libxl_device_pci_list() free the list after use

From: Paul Durrant 

A previous patch introduced libxl_device_pci_list_free() which should be used
by callers of libxl_device_pci_list() to properly dispose of the exported
'libxl_device_pci' types and then free the memory holding them. Whilst all
current callers do ensure the memory is freed, only the code in xl's
pcilist() function actually calls libxl_device_pci_dispose(). As it stands
this laxity does not lead to any memory leaks, but the simple addition of
e.g. a 'string' into the idl definition of 'libxl_device_pci' would lead
to leaks.

This patch makes sure all callers of libxl_device_pci_list() can call
libxl_device_pci_list_free() by keeping copies of 'libxl_device_pci'
structures inline in 'pci_add_state' and 'pci_remove_state' (and also making
sure these are properly disposed at the end of the operations) rather
than keeping pointers to the structures returned by libxl_device_pci_list().

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Anthony PERARD 
---
 tools/libs/light/libxl_pci.c | 68 
 tools/xl/xl_pci.c|  3 +-
 2 files changed, 38 insertions(+), 33 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 879b1b24a0..3162facb37 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -1006,7 +1006,7 @@ typedef struct pci_add_state {
 libxl__xswait_state xswait;
 libxl__ev_qmp qmp;
 libxl__ev_time timeout;
-libxl_device_pci *pci;
+libxl_device_pci pci;
 libxl_domid pci_domid;
 } pci_add_state;
 
@@ -1078,7 +1078,7 @@ static void pci_add_qemu_trad_watch_state_cb(libxl__egc 
*egc,
 
 /* Convenience aliases */
 libxl_domid domid = pas->domid;
-libxl_device_pci *pci = pas->pci;
+libxl_device_pci *pci = &pas->pci;
 
 rc = check_qemu_running(gc, domid, xswa, rc, state);
 if (rc == ERROR_NOT_READY)
@@ -1099,7 +1099,7 @@ static void pci_add_qmp_device_add(libxl__egc *egc, 
pci_add_state *pas)
 
 /* Convenience aliases */
 libxl_domid domid = pas->domid;
-libxl_device_pci *pci = pas->pci;
+libxl_device_pci *pci = &pas->pci;
 libxl__ev_qmp *const qmp = &pas->qmp;
 
 rc = libxl__ev_time_register_rel(ao, &pas->timeout,
@@ -1180,7 +1180,7 @@ static void pci_add_qmp_query_pci_cb(libxl__egc *egc,
 int dev_slot, dev_func;
 
 /* Convenience aliases */
-libxl_device_pci *pci = pas->pci;
+libxl_device_pci *pci = &pas->pci;
 
 if (rc) goto out;
 
@@ -1280,7 +1280,7 @@ static void pci_add_dm_done(libxl__egc *egc,
 
 /* Convenience aliases */
 bool starting = pas->starting;
-libxl_device_pci *pci = pas->pci;
+libxl_device_pci *pci = &pas->pci;
 bool hvm = libxl__domain_type(gc, domid) == LIBXL_DOMAIN_TYPE_HVM;
 
 libxl__ev_qmp_dispose(gc, &pas->qmp);
@@ -1496,7 +1496,10 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t 
domid,
 GCNEW(pas);
 pas->aodev = aodev;
 pas->domid = domid;
-pas->pci = pci;
+
+libxl_device_pci_copy(CTX, &pas->pci, pci);
+pci = &pas->pci;
+
 pas->starting = starting;
 pas->callback = device_pci_add_stubdom_done;
 
@@ -1535,12 +1538,6 @@ void libxl__device_pci_add(libxl__egc *egc, uint32_t 
domid,
 
 stubdomid = libxl_get_stubdom_id(ctx, domid);
 if (stubdomid != 0) {
-libxl_device_pci *pci_s;
-
-GCNEW(pci_s);
-libxl_device_pci_init(pci_s);
-libxl_device_pci_copy(CTX, pci_s, pci);
-pas->pci = pci_s;
 pas->callback = device_pci_add_stubdom_wait;
 
 do_pci_add(egc, stubdomid, pas); /* must be last */
@@ -1599,7 +1596,7 @@ static void device_pci_add_stubdom_done(libxl__egc *egc,
 
 /* Convenience aliases */
 libxl_domid domid = pas->domid;
-libxl_device_pci *pci = pas->pci;
+libxl_device_pci *pci = &pas->pci;
 
 if (rc) goto out;
 
@@ -1650,7 +1647,7 @@ static void device_pci_add_done(libxl__egc *egc,
 EGC_GC;
 libxl__ao_device *aodev = pas->aodev;
 libxl_domid domid = pas->domid;
-libxl_device_pci *pci = pas->pci;
+libxl_device_pci *pci = &pas->pci;
 
 if (rc) {
 LOGD(ERROR, domid,
@@ -1660,6 +1657,7 @@ static void device_pci_add_done(libxl__egc *egc,
  rc);
 pci_info_xs_remove(gc, pci, "domid");
 }
+libxl_device_pci_dispose(pci);
 aodev->rc = rc;
 aodev->callback(egc, aodev);
 }
@@ -1766,7 +1764,7 @@ static int qemu_pci_remove_xenstore(libxl__gc *gc, 
uint32_t domid,
 typedef struct pci_remove_state {
 libxl__ao_device *aodev;
 libxl_domid domid;
-libxl_device_pci *pci;
+libxl_device_pci pci;
 bool force;
 bool hvm;
 unsigned int orig_vdev;
@@ -1808,22 +1806,25 @@ static void do_pci_remove(libxl__egc *egc, 
pci_remove_state *prs)
 {
 STATE_AO_GC(prs->aodev->ao);
 libxl_ctx *ctx = libxl__gc_owner(gc);
-libxl_device_pci *assigned;
+libxl_device_pci *pcis;
+bool attached;
 uint32_t domid = prs->domid;
 libxl

[PATCH 23/25] xl / libxl: support naming of assignable devices

From: Paul Durrant 

This patch modifies libxl_device_pci_assignable_add() to take an optional
'name' argument, which (if supplied) is saved into xenstore and can hence be
used to refer to the now-assignable BDF in subsequent operations. To
facilitate this, a new libxl_device_pci_assignable_name2bdf() function is
added.

The xl code is modified to allow a name to be specified in the
'pci-assignable-add' operation and also allow an option to be specified to
'pci-assignable-list' requesting that names be displayed. The latter is
facilitated by a new libxl_device_pci_assignable_bdf2name() function. Finally
xl 'pci-assignable-remove' is modified so that either a name or BDF can be
supplied. The supplied 'identifier' is first assumed to be a name, but if
libxl_device_pci_assignable_name2bdf() fails to find a matching BDF the
identifier itself will be parsed as a BDF. Names may only include printable
characters and may not include whitespace.
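
The name rules and the "name first, then BDF" fallback can be sketched independently of libxl as follows. Everything here is illustrative: `demo_name_valid()`, `demo_entry` and `demo_resolve()` are hypothetical, the lookup table stands in for xenstore, and the 64-character bound is taken from the hunk quoted further down.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define DEMO_NAME_MAX 64    /* upper bound used by the patch */

/*
 * Return 1 if 'name' is acceptable: at most DEMO_NAME_MAX characters, all
 * printable and none of them whitespace (which is what isgraph() tests).
 * As in the quoted hunk, an empty string passes.
 */
static int demo_name_valid(const char *name)
{
    size_t i, n;

    assert(name);
    n = strlen(name);
    if (n > DEMO_NAME_MAX)
        return 0;

    for (i = 0; i < n; i++)
        if (!isgraph((unsigned char)name[i]))
            return 0;

    return 1;
}

/* Hypothetical stand-in for the name -> BDF records kept in xenstore. */
typedef struct {
    const char *name;
    const char *bdf;
} demo_entry;

/*
 * Resolve an identifier: treat it as a name first; if no entry matches,
 * hand the identifier back so the caller can parse it as a BDF itself.
 */
static const char *demo_resolve(const demo_entry *tab, size_t n,
                                const char *ident)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (tab[i].name && !strcmp(tab[i].name, ident))
            return tab[i].bdf;

    return ident;
}
```

One consequence of trying the name first is that a device named with something that also looks like a BDF would shadow that address; the sketch does nothing to prevent this.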

Signed-off-by: Paul Durrant 
---
Cc: Ian Jackson 
Cc: Wei Liu 
Cc: Christian Lindig 
Cc: David Scott 
Cc: Anthony PERARD 
---
 tools/include/libxl.h| 19 +++-
 tools/libs/light/libxl_pci.c | 86 +---
 tools/ocaml/libs/xl/xenlight_stubs.c |  3 +-
 tools/xl/xl_cmdtable.c   | 12 +++--
 tools/xl/xl_pci.c| 84 ---
 5 files changed, 166 insertions(+), 38 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 5703fdf367..4025d3a3d4 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -477,6 +477,14 @@
 #define LIBXL_HAVE_PCI_ASSIGNABLE_BDF 1
 
 /*
+ * LIBXL_HAVE_PCI_ASSIGNABLE_NAME indicates that the
+ * libxl_device_pci_assignable_add() function takes a 'name' argument
+ * and that the libxl_device_pci_assignable_name2bdf() and
+ * libxl_device_pci_assignable_bdf2name() functions are defined.
+ */
+#define LIBXL_HAVE_PCI_ASSIGNABLE_NAME 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
@@ -2385,11 +2393,18 @@ int libxl_device_events_handler(libxl_ctx *ctx,
  * added or is not bound, the functions will emit a warning but return
  * SUCCESS.
  */
-int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_pci_bdf *pcibdf, int 
rebind);
-int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_pci_bdf *pcibdf, 
int rebind);
+int libxl_device_pci_assignable_add(libxl_ctx *ctx, libxl_pci_bdf *pcibdf,
+const char *name, int rebind);
+int libxl_device_pci_assignable_remove(libxl_ctx *ctx, libxl_pci_bdf *pcibdf,
+   int rebind);
 libxl_pci_bdf *libxl_device_pci_assignable_list(libxl_ctx *ctx, int *num);
 void libxl_device_pci_assignable_list_free(libxl_pci_bdf *list, int num);
 
+libxl_pci_bdf *libxl_device_pci_assignable_name2bdf(libxl_ctx *ctx,
+const char *name);
+char *libxl_device_pci_assignable_bdf2name(libxl_ctx *ctx,
+   libxl_pci_bdf *pcibdf);
+
 /* CPUID handling */
 int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str);
 int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *cpuid,
diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 5104f31448..0f7d655aff 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -713,6 +713,7 @@ static int pciback_dev_unassign(libxl__gc *gc, 
libxl_pci_bdf *pcibdf)
 
 static int libxl__device_pci_assignable_add(libxl__gc *gc,
 libxl_pci_bdf *pcibdf,
+const char *name,
 int rebind)
 {
 libxl_ctx *ctx = libxl__gc_owner(gc);
@@ -721,6 +722,23 @@ static int libxl__device_pci_assignable_add(libxl__gc *gc,
 int rc;
 struct stat st;
 
+/* Sanitise any name that was passed */
+if (name) {
+unsigned int i, n = strlen(name);
+
+if (n > 64) { /* Reasonable upper bound on name length */
+LOG(ERROR, "Name too long");
+return ERROR_FAIL;
+}
+
+for (i = 0; i < n; i++) {
+if (!isgraph(name[i])) {
+LOG(ERROR, "Names may only include printable characters");
+return ERROR_FAIL;
+}
+}
+}
+
 /* Local copy for convenience */
 dom = pcibdf->domain;
 bus = pcibdf->bus;
@@ -741,7 +759,7 @@ static int libxl__device_pci_assignable_add(libxl__gc *gc,
 }
 if ( rc ) {
 LOG(WARN, PCI_BDF" already assigned to pciback", dom, bus, dev, func);
-goto quarantine;
+goto name;
 }
 
 /* Check to see if there's already a driver that we need to unbind from */
@@ -772,7 +790,12 @@ static int libxl__device_pci_assignable_add(libxl__gc *gc,
 return ERROR_FAIL;
 }
 
-quarantine:
+name:
+if (name)
+

Re: [PATCH] xen/arm: Remove EXPERT dependancy

2020-10-23 Thread Stefano Stabellini

On Fri, 23 Oct 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 22/10/2020 22:17, Stefano Stabellini wrote:
> > On Thu, 22 Oct 2020, Julien Grall wrote:
> > > On 22/10/2020 02:43, Elliott Mitchell wrote:
> > > > Linux requires UEFI support to be enabled on ARM64 devices.  While many
> > > > ARM64 devices lack ACPI, the writing seems to be on the wall of
> > > > UEFI/ACPI
> > > > potentially taking over.  Some common devices may need ACPI table
> > > > support.
> > > > 
> > > > Presently I think it is worth removing the dependancy on CONFIG_EXPERT.
> > > 
> > > The idea behind EXPERT is to gate any feature that is not considered to be
> > > stable/complete enough to be used in production.
> > 
> > Yes, and from that point of view I don't think we want to remove EXPERT
> > from ACPI yet. However, the idea of hiding things behind EXPERT works
> > very well for new esoteric features, something like memory introspection
> > or memory overcommit.
> 
> Memaccess is not very new ;).
> 
> > It does not work well for things that are actually
> > required to boot on the platform.
> 
> I am not sure where the problem is. It is easy to select EXPERT from the
> menuconfig. It also hints to the user that the feature may not fully work.
> 
> > 
> > Typically ACPI systems don't come with device tree at all (RPi4 being an
> > exception), so users don't really have much of a choice in the matter.
> 
> And they typically have IOMMUs.
> 
> > 
> > From that point of view, it would be better to remove EXPERT from ACPI,
> > maybe even build ACPI by default, *but* to add a warning at boot saying
> > something like:
> > 
> > "ACPI support is experimental. Boot using Device Tree if you can."
> > 
> > 
> > That would better convey the risks of using ACPI, while at the same time
> > making it a bit easier for users to boot on their ACPI-only platforms.
> 
> Right, I agree that this makes it easier for users to boot Xen on ACPI-only
> platforms. However, based on the above, it is easy enough for a developer to
> rebuild Xen with ACPI and EXPERT enabled.
> 
> So what sort of users are you targeting?

Somebody trying Xen for the first time, they might know how to build it
but they might not know that ACPI is not available by default, and they
might not know that they need to enable EXPERT in order to get the ACPI
option in the menu. It is easy to do once you know it is there,
otherwise one might not know where to look in the menu.


> I am sort of okay to remove EXPERT. 

OK. This would help (even without building it by default) because as you
go and look at the menu the first time, you'll find ACPI among the
options right away.


> But I still think building ACPI by default
> is still wrong because our default .config is meant to be (security)
> supported. I don't think ACPI can earn this qualification today.

Certainly we don't want to imply ACPI is security supported. I was
looking at SUPPORT.md and it only says:

"""
EXPERT and DEBUG Kconfig options are not security supported. Other
Kconfig options are supported, if the related features are marked as
supported in this document.
"""

So technically we could enable ACPI in the build by default as ACPI for
ARM is marked as experimental. However, I can see that it is not a
great idea to enable by default an unsupported option in the kconfig, so
from that point of view it might be best to leave ACPI disabled by
default. Probably the best compromise at this time.



> In order to remove EXPERT, there are a few things that need to be done (or
> checked):
> 1) SUPPORT.md has a statement about ACPI on Arm

### Host ACPI (via Domain 0)

Status, x86 PV: Supported
Status, ARM: Experimental



> 2) DT is favored over ACPI if the two firmware tables are present.

Good idea. xen/arch/arm/acpi/boot.c:acpi_boot_table_init has:

/*
 * Enable ACPI instead of device tree unless
 * - ACPI has been disabled explicitly (acpi=off), or
 * - the device tree is not empty (it has more than just a /chosen node)
 *   and ACPI has not been force enabled (acpi=force)
 */
if ( param_acpi_off)
    goto disable;
if ( !param_acpi_force &&
     device_tree_for_each_node(device_tree_flattened, 0,
                               dt_scan_depth1_nodes, NULL) )
    goto disable;

We should be fine.
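
For reference, the precedence that check implements can be written as a
small user-space C function. This is only an illustrative sketch, not the
Xen code: struct boot_params and use_acpi() are hypothetical names, and
the three flags stand in for param_acpi_off, param_acpi_force and the
result of the device_tree_for_each_node() scan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the inputs consulted by the check above. */
struct boot_params {
    bool acpi_off;      /* "acpi=off" was given on the command line */
    bool acpi_force;    /* "acpi=force" was given on the command line */
    bool dt_populated;  /* device tree has more than just a /chosen node */
};

/*
 * Returns true if the check would let ACPI proceed, false if it would
 * take the "disable" path and boot from the device tree instead.
 */
static bool use_acpi(const struct boot_params *p)
{
    assert(p != NULL);

    if (p->acpi_off)
        return false;                 /* explicit opt-out always wins */
    if (!p->acpi_force && p->dt_populated)
        return false;                 /* a usable DT is preferred */
    return true;                      /* empty DT, or ACPI forced */
}
```

With both firmware tables present and no command line option, the
function returns false, i.e. device tree is favoured, which is the
behaviour asked for in point 2.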

Re: [PATCH] xen/arm: ACPI: Remove EXPERT dependancy, default on for ARM64

2020-10-23 Thread Stefano Stabellini

On Thu, 22 Oct 2020, Elliott Mitchell wrote:
> Linux requires UEFI support to be enabled on ARM64 devices.  While many
> ARM64 devices lack ACPI, the writing seems to be on the wall of UEFI/ACPI
> potentially taking over.  Some common devices may require ACPI table
> support to boot.

Let's not make guesses on the direction of the industry in a commit
message :-)

The following would suffice:

Some common ARM64 devices require ACPI to boot (no device tree is
available).


> For devices which can boot in either mode, continue defaulting to
> device-tree.  Add warnings about using ACPI advising users of present
> situation.
> 
> Signed-off-by: Elliott Mitchell 
> ---
> Okay, hopefully this is okay.  Warning in Kconfig, warning on boot.
> Perhaps "default y if ARM_64" is redundant, yet if someone tries to make
> it possible to boot aarch32 on a ACPI machine...
> 
> I also want a date in the message.  Theory is this won't be there
> forever, so a date is essential.
> ---
>  xen/arch/arm/Kconfig     | 7 ++++++-
>  xen/arch/arm/acpi/boot.c | 9 +++++++++
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
> index 2777388265..29624d03fa 100644
> --- a/xen/arch/arm/Kconfig
> +++ b/xen/arch/arm/Kconfig
> @@ -32,13 +32,18 @@ menu "Architecture Features"
>  source "arch/Kconfig"
>  
>  config ACPI
> - bool "ACPI (Advanced Configuration and Power Interface) Support" if EXPERT
> + bool "ACPI (Advanced Configuration and Power Interface) Support"
>   depends on ARM_64
> + default y if ARM_64

I am not so sure about the "default y" for the reason that the option is
not technically "supported", so it is probably best to take the default
line out. Otherwise we end up with a default "unsupported" kconfig which
is not great.


>   ---help---
>  
> Advanced Configuration and Power Interface (ACPI) support for Xen is
> an alternative to device tree on ARM64.
>  
> +   Note this is presently EXPERIMENTAL.  If a given device has both
> +   device-tree and ACPI support, it is presently (October 2020)
> +   recommended to boot using the device-tree.

Please remove the date from the message. We'll update as needed in the
future. The following works:

 Note this is presently EXPERIMENTAL.  If a given device has both
 device-tree and ACPI support, it is recommended to boot using the
 device-tree.


>  config GICV3
>   bool "GICv3 driver"
>   depends on ARM_64 && !NEW_VGIC
> diff --git a/xen/arch/arm/acpi/boot.c b/xen/arch/arm/acpi/boot.c
> index 30e4bd1bc5..c0e8f85325 100644
> --- a/xen/arch/arm/acpi/boot.c
> +++ b/xen/arch/arm/acpi/boot.c
> @@ -254,6 +254,15 @@ int __init acpi_boot_table_init(void)
> dt_scan_depth1_nodes, NULL) )
>  goto disable;
>  
> +printk("\n"
> +"*\n"
> +"*WARNING WARNING WARNING WARNING WARNING WARNING WARNING WARNING*\n"
> +"*   *\n"
> +"* Xen-ARM ACPI support is EXPERIMENTAL.  It is presently (October 2020) *\n"
> +"* recommended you boot your system in device-tree mode if you can.  *\n"
> +"*\n"
> +"\n");

Please use warning_add and remove the date from the message.
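
To illustrate the idea of queueing the message rather than printing it
immediately, here is a rough user-space sketch. The warning_add() and
warning_print() below are simplified stand-ins written for this example,
not the hypervisor implementation, and acpi_warn_experimental() and the
message wording are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_WARNINGS 20

/* Queue of warning texts; only the pointers are stored. */
static const char *warnings[MAX_WARNINGS];
static unsigned int nr_warnings;

/* Simplified stand-in for warning_add(): remember the text for later. */
static void warning_add(const char *warning)
{
    assert(warning != NULL);
    assert(nr_warnings < MAX_WARNINGS);
    warnings[nr_warnings++] = warning;
}

/*
 * Simplified stand-in for the later point in boot where all queued
 * warnings are shown together. Returns the number of warnings printed.
 */
static unsigned int warning_print(void)
{
    unsigned int i;

    for (i = 0; i < nr_warnings; i++)
        fputs(warnings[i], stdout);

    return nr_warnings;
}

/* Hypothetical message text, without a date as requested above. */
static const char acpi_experimental_warning[] =
    "WARNING: ACPI support on Arm is EXPERIMENTAL.\n"
    "Boot using Device Tree if you can.\n";

/* Hypothetical helper that the ACPI init path could call. */
static void acpi_warn_experimental(void)
{
    warning_add(acpi_experimental_warning);
}
```

The call site only registers the text; everything queued this way is
then printed in one place later on, which makes it harder to miss than
a printk() in the middle of the boot log.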


>  /*
>   * ACPI is disabled at this point. Enable it in order to parse
>   * the ACPI tables.
> -- 
> 2.20.1
> 
> 
> -- 
> (\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
>  \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
>   \_CS\   |  _  -O #include  O-   _  |   /  _/
> 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
> 
>

[qemu-mainline test] 156122: regressions - FAIL

flight 156122 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156122/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-build   fail REGR. vs. 152631
 build-amd64   6 xen-build   fail REGR. vs. 152631
 build-arm64   6 xen-build   fail REGR. vs. 152631
 build-arm64-xsm   6 xen-build   fail REGR. vs. 152631
 build-i386   6 xen-build   fail REGR. vs. 152631
 build-i386-xsm   6 xen-build   fail REGR. vs. 152631
 build-armhf   6 xen-build   fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 build-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-amd64-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim   1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-i386-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-qemuu-rhel6hvm-inte

[xen-unstable-smoke test] 156129: regressions - trouble: blocked/fail

flight 156129 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156129/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64   6 xen-build   fail REGR. vs. 156117
 build-arm64-xsm   6 xen-build   fail REGR. vs. 156117
 build-armhf   6 xen-build   fail REGR. vs. 156117

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a

version targeted for testing:
 xen  4ddd6499d999a7d08cabfda5b0262e473dd5beed
baseline version:
 xen  6ca70821b59849ad97c3fadc47e63c1a4af1a78c

Last test of basis   156117  2020-10-23 09:01:23 Z    0 days
Failing since        156120  2020-10-23 14:01:24 Z    0 days    2 attempts
Testing same since   156129  2020-10-23 18:01:24 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Bertrand Marquis 
  Christian Lindig 
  George Dunlap 
  Ian Jackson 
  Ian Jackson 
  Jan Beulich 
  Jason Andryuk 
  Juergen Gross 
  Wei Liu 

jobs:
 build-arm64-xsm  fail
 build-amd64  fail
 build-armhf  fail
 build-amd64-libvirt  blocked 
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  blocked 
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  blocked 
 test-amd64-amd64-libvirt blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.


commit 4ddd6499d999a7d08cabfda5b0262e473dd5beed
Author: Jason Andryuk 
Date:   Sun May 24 22:55:06 2020 -0400

SUPPORT: Add linux device model stubdom to Toolstack

Add qemu-xen linux device model stubdomain to the Toolstack section as a
Tech Preview.

Signed-off-by: Jason Andryuk 
Acked-by: George Dunlap 
Acked-by: Ian Jackson 

commit 06f0598b41f23c9e4cf7d8c5a05b282de92f3a35
Author: Jan Beulich 
Date:   Fri Oct 23 18:03:18 2020 +0200

x86emul: fix PINSRW and adjust other {,V}PINSR*

The use of simd_packed_int together with no further update to op_bytes
has lead to wrong signaling of #GP(0) for PINSRW without a 16-byte
aligned memory operand. Use simd_none instead and override it after
general decoding with simd_other, like is done for the B/D/Q siblings.

While benign, for consistency also use DstImplicit instead of DstReg
in x86_decode_twobyte().

PINSR{B,D,Q} also had a stray (redundant) get_fpu() invocation, which
gets dropped.

For further consistency also
- use src.bytes instead of op_bytes in relevant memcpy() invocations,
- avoid the pointless updating of op_bytes (all we care about later is
  that the value be less than 16).

Signed-off-by: Jan Beulich 
Acked-by: Andrew Cooper 

commit 9af5e2b31b4e6f3892b4614ecd0a619af5d64d7e
Author: Juergen Gross 
Date:   Mon Oct 19 17:27:54 2020 +0200

tools/libs/store: don't use symbolic links for external files

Instead of using symbolic links to include files from xenstored use
the vpath directive and an include path.

Signed-off-by: Juergen Gross 
Acked-by: Christian Lindig 
Tested-by: Bertrand Marquis 
Acked-by: Ian Jackson 

commit 588756db020e73e6f5e4407bbf78fbd53f15b731
Author: Juergen Gross 
Date:   Mon Oct 19 17:27:54 2020 +0200

tools/libs/guest: don't use symbolic links for xenctrl headers

Instead of using symbolic links for accessing the xenctrl private
headers use an include path instead.

Signed-off-by: Juergen Gross 
Acked-by: Christian Lindig 
Tested-by: Bertrand Marquis 
Acked-by: Ian Jackson 

commit 4664034cdc720a52913bc26358240bb9d3798527
Author: Juergen Gr

[xen-unstable-smoke test] 156133: regressions - trouble: blocked/fail

flight 156133 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156133/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64   6 xen-build   fail REGR. vs. 156117
 build-arm64-xsm   6 xen-build   fail REGR. vs. 156117
 build-armhf   6 xen-build   fail REGR. vs. 156117

Tests which did not succeed, but are not blocking:
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-arm64-arm64-xl-xsm   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a

version targeted for testing:
 xen  4ddd6499d999a7d08cabfda5b0262e473dd5beed
baseline version:
 xen  6ca70821b59849ad97c3fadc47e63c1a4af1a78c

Last test of basis   156117  2020-10-23 09:01:23 Z    0 days
Failing since        156120  2020-10-23 14:01:24 Z    0 days    3 attempts
Testing same since   156129  2020-10-23 18:01:24 Z    0 days    2 attempts


People who touched revisions under test:
  Andrew Cooper 
  Bertrand Marquis 
  Christian Lindig 
  George Dunlap 
  Ian Jackson 
  Ian Jackson 
  Jan Beulich 
  Jason Andryuk 
  Juergen Gross 
  Wei Liu 

jobs:
 build-arm64-xsm  fail
 build-amd64  fail
 build-armhf  fail
 build-amd64-libvirt  blocked 
 test-armhf-armhf-xl  blocked 
 test-arm64-arm64-xl-xsm  blocked 
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  blocked 
 test-amd64-amd64-libvirt blocked 



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Not pushing.



Re: [PATCH v2 0/7] xen/arm: Unbreak ACPI

2020-10-23 Thread Elliott Mitchell

On Fri, Oct 23, 2020 at 04:41:49PM +0100, Julien Grall wrote:
> Xen on ARM has been broken for quite a while on ACPI systems. This
> series aims to fix it.
> 
> This series also introduced support for ACPI 5.1. This allows Xen to
> boot on QEMU.
> 
> I have only build tested the x86 side so far.

On a Tianocore-utilizing Raspberry PI 4B, this series allows successful
boot (some other distinct issues remain).  As such, for the series on an
ARM device:

Tested-by: Elliott Mitchell 


-- 
(\___(\___(\__  --=> 8-) EHM <=--  __/)___/)___/)
 \BS (| ehem+sig...@m5p.com  PGP 87145445 |)   /
  \_CS\   |  _  -O #include  O-   _  |   /  _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445

[qemu-mainline test] 156130: regressions - FAIL

flight 156130 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/156130/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-amd64-xsm   6 xen-build   fail REGR. vs. 152631
 build-amd64   6 xen-build   fail REGR. vs. 152631
 build-arm64   6 xen-build   fail REGR. vs. 152631
 build-arm64-xsm   6 xen-build   fail REGR. vs. 152631
 build-i386   6 xen-build   fail REGR. vs. 152631
 build-i386-xsm   6 xen-build   fail REGR. vs. 152631
 build-armhf   6 xen-build   fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit1   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-i386  1 build-check(1)   blocked  n/a
 test-amd64-i386-freebsd10-amd64  1 build-check(1)   blocked  n/a
 test-amd64-coresched-i386-xl  1 build-check(1)   blocked  n/a
 test-amd64-coresched-amd64-xl  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-shadow   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ws16-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-win7-amd64  1 build-check(1) blocked n/a
 build-arm64-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemuu-debianhvm-i386-xsm  1 build-check(1) blocked n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow  1 build-check(1) blocked n/a
 build-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-amd64-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvshim   1 build-check(1)   blocked  n/a
 test-amd64-amd64-dom0pvh-xl-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-i386-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvhv2-intel  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-pvhv2-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-freebsd11-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-qemuu-freebsd12-amd64  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-credit1   1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-amd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-qemuu-nested-intel  1 build-check(1)  blocked n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-qemuu-rhel6hvm-inte

[xen-unstable-smoke test] 156140: regressions - trouble: blocked/fail