date:20210121

[PATCH] tools/xenstore: fix unsigned < 0 compare in xenstore_control

2021-01-21 Thread Juergen Gross

Commit 7f97193e6aa858df ("tools/xenstore: add live update command to
xenstore-control") introduced testing an unsigned value to be less
than 0. Fix that.

Fixes: 7f97193e6aa858df ("tools/xenstore: add live update command to
xenstore-control")
Reported-by: Andrew Cooper 

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstore_control.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/xenstore/xenstore_control.c 
b/tools/xenstore/xenstore_control.c
index 6031f216c7..0c95cf767c 100644
--- a/tools/xenstore/xenstore_control.c
+++ b/tools/xenstore/xenstore_control.c
@@ -313,8 +313,8 @@ int main(int argc, char **argv)
 struct xs_handle *xsh;
 char *par = NULL;
 char *ret;
-unsigned int p, len = 0;
-int rc = 0;
+unsigned int p;
+int rc = 0, len = 0;
 
 if (argc < 2) {
 fprintf(stderr, "Usage:\n"
-- 
2.26.2

[qemu-mainline test] 158554: regressions - FAIL

2021-01-21 Thread osstest service owner

flight 158554 qemu-mainline real [real]
flight 158567 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/158554/
http://logs.test-lab.xenproject.org/osstest/logs/158567/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 152631
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 152631
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass

version targeted for testing:
 qemuu954b83f13236d21b4116b93a726ea36b5dc2d303
baseline version:
 qemuu1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  154 days
Failing since152659  2020-08-21 14:07:39 Z  153 days  317 attempts
Testing same since   158554  2021-01-21 10:36:20 Z0 days1 attempts


357 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm

[PATCH v3 5/5] xen/x86/efi: Verify dom0 kernel with SHIM_LOCK protocol in efi_multiboot2()

2021-01-21 Thread Bobby Eshleman

From: Daniel Kiper 

This splits out efi_shim_lock() into common code and uses it to verify
the dom0 kernel in efi_multiboot2().

Signed-off-by: Daniel Kiper 
Signed-off-by: Bobby Eshleman 
---
 xen/arch/x86/boot/head.S| 20 ++--
 xen/arch/x86/efi/efi-boot.h |  6 ++
 xen/arch/x86/efi/stub.c |  5 -
 xen/common/efi/boot.c   | 19 +--
 4 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index f2edd182a5..943792eb43 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -244,9 +244,13 @@ __efi64_mb2_start:
 jmp x86_32_switch
 
 .Lefi_multiboot2_proto:
-/* Zero EFI SystemTable and EFI ImageHandle addresses. */
+/*
+ * Zero EFI SystemTable, EFI ImageHandle and
+ * dom0 kernel module struct addresses.
+ */
 xor %esi,%esi
 xor %edi,%edi
+xor %r14d, %r14d
 
 /* Skip Multiboot2 information fixed part. */
 lea (MB2_fixed_sizeof+MULTIBOOT2_TAG_ALIGN-1)(%rbx),%ecx
@@ -284,6 +288,15 @@ __efi64_mb2_start:
 cmove   MB2_efi64_ih(%rcx),%rdi
 je  .Lefi_mb2_next_tag
 
+/* Get Dom0 kernel module struct address from Multiboot2 information. 
*/
+cmpl$MULTIBOOT2_TAG_TYPE_MODULE,MB2_tag_type(%rcx)
+jne .Lefi_mb2_end
+
+test%r14d, %r14d
+cmovz   %ecx, %r14d
+jmp .Lefi_mb2_next_tag
+
+.Lefi_mb2_end:
 /* Is it the end of Multiboot2 information? */
 cmpl$MULTIBOOT2_TAG_TYPE_END,MB2_tag_type(%rcx)
 je  .Lrun_bs
@@ -345,9 +358,12 @@ __efi64_mb2_start:
 /* Keep the stack aligned. Do not pop a single item off it. */
 mov (%rsp),%rdi
 
+mov %r14d, %edx
+
 /*
  * efi_multiboot2() is called according to System V AMD64 ABI:
- *   - IN:  %rdi - EFI ImageHandle, %rsi - EFI SystemTable.
+ *   - IN: %rdi - EFI ImageHandle, %rsi - EFI SystemTable,
+ * %rdx - Dom0 kernel module struct address.
  */
 callefi_multiboot2
 
diff --git a/xen/arch/x86/efi/efi-boot.h b/xen/arch/x86/efi/efi-boot.h
index f694a069c9..0d025ad9a5 100644
--- a/xen/arch/x86/efi/efi-boot.h
+++ b/xen/arch/x86/efi/efi-boot.h
@@ -3,6 +3,8 @@
  * is intended to be included by common/efi/boot.c _only_, and
  * therefore can define arch specific global variables.
  */
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -762,6 +764,10 @@ void __init efi_multiboot2(EFI_HANDLE ImageHandle,
 
 gop = efi_get_gop();
 
+if ( dom0_kernel && dom0_kernel->mod_end > dom0_kernel->mod_start )
+efi_shim_lock((VOID *)(unsigned long)dom0_kernel->mod_start,
+  dom0_kernel->mod_end - dom0_kernel->mod_start);
+
 if ( gop )
 gop_mode = efi_find_gop_mode(gop, 0, 0, 0);
 
diff --git a/xen/arch/x86/efi/stub.c b/xen/arch/x86/efi/stub.c
index 9bd6355ec3..7d459905fa 100644
--- a/xen/arch/x86/efi/stub.c
+++ b/xen/arch/x86/efi/stub.c
@@ -1,7 +1,9 @@
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -29,7 +31,8 @@ asm (
 );
 
 void __init noreturn efi_multiboot2(EFI_HANDLE ImageHandle,
-EFI_SYSTEM_TABLE *SystemTable)
+EFI_SYSTEM_TABLE *SystemTable,
+multiboot2_tag_module_t *dom0_kernel)
 {
 static const CHAR16 __initconst err[] =
 L"Xen does not have EFI code build in!\r\nSystem halted!\r\n";
diff --git a/xen/common/efi/boot.c b/xen/common/efi/boot.c
index 63e289ab85..8ce6715b59 100644
--- a/xen/common/efi/boot.c
+++ b/xen/common/efi/boot.c
@@ -133,6 +133,7 @@ static void efi_console_set_mode(void);
 static EFI_GRAPHICS_OUTPUT_PROTOCOL *efi_get_gop(void);
 static UINTN efi_find_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
UINTN cols, UINTN rows, UINTN depth);
+static void efi_shim_lock(const VOID *Buffer, UINT32 Size);
 static void efi_tables(void);
 static void setup_efi_pci(void);
 static void efi_variables(void);
@@ -830,6 +831,17 @@ static UINTN __init 
efi_find_gop_mode(EFI_GRAPHICS_OUTPUT_PROTOCOL *gop,
 return gop_mode;
 }
 
+static void __init efi_shim_lock(const VOID *Buffer, UINT32 Size)
+{
+static EFI_GUID __initdata shim_lock_guid = SHIM_LOCK_PROTOCOL_GUID;
+EFI_SHIM_LOCK_PROTOCOL *shim_lock;
+EFI_STATUS status;
+
+if ( !EFI_ERROR(efi_bs->LocateProtocol(_lock_guid, NULL, (void 
**)_lock)) &&
+ (status = shim_lock->Verify(Buffer, Size)) != EFI_SUCCESS )
+PrintErrMesg(L"Dom0 kernel image could not be verified", status);
+}
+
 static void __init efi_tables(void)
 {
 unsigned int i;
@@ -1123,13 +1135,11 @@ void EFIAPI __init noreturn
 efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE *SystemTable)
 {
 static EFI_GUID __initdata loaded_image_guid =

[PATCH v3 4/5] xen/x86: add some addresses to the Multiboot2 header

2021-01-21 Thread Bobby Eshleman

From: Daniel Kiper 

In comparison to ELF the PE format is not supported by the Multiboot2
protocol. So, if we wish to load xen.mb.efi using this protocol we have
to add MULTIBOOT2_HEADER_TAG_ADDRESS and MULTIBOOT2_HEADER_TAG_ENTRY_ADDRESS
tags into Multiboot2 header.

Additionally, put MULTIBOOT2_HEADER_TAG_ENTRY_ADDRESS and
MULTIBOOT2_HEADER_TAG_ENTRY_ADDRESS_EFI64 tags close to each
other to make the header more readable.

The Multiboot2 protocol spec can be found at
  https://www.gnu.org/software/grub/manual/multiboot2/

Signed-off-by: Daniel Kiper 
Signed-off-by: Bobby Eshleman 
---
 xen/arch/x86/boot/head.S | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 189d91a872..f2edd182a5 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -94,6 +94,13 @@ multiboot2_header:
 /* Align modules at page boundry. */
 mb2ht_init MB2_HT(MODULE_ALIGN), MB2_HT(REQUIRED)
 
+/* The address tag. */
+mb2ht_init MB2_HT(ADDRESS), MB2_HT(REQUIRED), \
+   sym_offs(multiboot2_header), /* header_addr */ \
+   sym_offs(start), /* load_addr */ \
+   sym_offs(__bss_start),   /* load_end_addr */ \
+   sym_offs(__2M_rwdata_end)/* bss_end_addr */
+
 /* Load address preference. */
 mb2ht_init MB2_HT(RELOCATABLE), MB2_HT(OPTIONAL), \
sym_offs(start), /* Min load address. */ \
@@ -101,6 +108,14 @@ multiboot2_header:
0x20, /* Load address alignment (2 MiB). */ \
MULTIBOOT2_LOAD_PREFERENCE_HIGH
 
+/* Multiboot2 entry point. */
+mb2ht_init MB2_HT(ENTRY_ADDRESS), MB2_HT(REQUIRED), \
+   sym_offs(__start)
+
+/* EFI64 Multiboot2 entry point. */
+mb2ht_init MB2_HT(ENTRY_ADDRESS_EFI64), MB2_HT(OPTIONAL), \
+   sym_offs(__efi64_mb2_start)
+
 /* Console flags tag. */
 mb2ht_init MB2_HT(CONSOLE_FLAGS), MB2_HT(OPTIONAL), \
MULTIBOOT2_CONSOLE_FLAGS_EGA_TEXT_SUPPORTED
@@ -114,10 +129,6 @@ multiboot2_header:
 /* Request that ExitBootServices() not be called. */
 mb2ht_init MB2_HT(EFI_BS), MB2_HT(OPTIONAL)
 
-/* EFI64 Multiboot2 entry point. */
-mb2ht_init MB2_HT(ENTRY_ADDRESS_EFI64), MB2_HT(OPTIONAL), \
-   sym_offs(__efi64_mb2_start)
-
 /* Multiboot2 header end tag. */
 mb2ht_init MB2_HT(END), MB2_HT(REQUIRED)
 .Lmultiboot2_header_end:
-- 
2.30.0

[PATCH v3 2/5] xen/x86: manually build xen.mb.efi binary

2021-01-21 Thread Bobby Eshleman

From: Daniel Kiper 

This patch introduces xen.mb.efi which contains a manually built PE
header.

This allows us to support Xen on UEFI Secure Boot-enabled hosts via
multiboot2.

xen.mb.efi is a single binary that is loadable by a UEFI loader or via
the Multiboot/Multiboot2 protocols.

Signed-off-by: Daniel Kiper 
Signed-off-by: Bobby Eshleman 
---
 xen/Makefile|  20 +++---
 xen/arch/x86/Makefile   |   7 +-
 xen/arch/x86/arch.mk|   2 +
 xen/arch/x86/boot/Makefile  |   1 +
 xen/arch/x86/boot/head.S|   1 +
 xen/arch/x86/boot/pecoff.S  | 123 
 xen/arch/x86/efi/efi-boot.h |  24 ++-
 xen/arch/x86/efi/stub.c |  12 +++-
 xen/arch/x86/xen.lds.S  |  34 ++
 xen/include/xen/efi.h   |   1 +
 10 files changed, 212 insertions(+), 13 deletions(-)
 create mode 100644 xen/arch/x86/boot/pecoff.S

diff --git a/xen/Makefile b/xen/Makefile
index 85f9b763a4..9c689c2095 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -266,29 +266,31 @@ endif
 .PHONY: _build
 _build: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
 
+define install_xen_links
+   $(INSTALL_DATA) $(TARGET)$1 $(D)$(BOOT_DIR)/$(T)-$(XEN_FULLVERSION)$1
+   ln -f -s $(T)-$(XEN_FULLVERSION)$1 
$(D)$(BOOT_DIR)/$(T)-$(XEN_VERSION).$(XEN_SUBVERSION)$1
+   ln -f -s $(T)-$(XEN_FULLVERSION)$1 $(D)$(BOOT_DIR)/$(T)-$(XEN_VERSION)$1
+   ln -f -s $(T)-$(XEN_FULLVERSION)$1 $(D)$(BOOT_DIR)/$(T)$1
+endef
+
 .PHONY: _install
 _install: D=$(DESTDIR)
 _install: T=$(notdir $(TARGET))
 _install: Z=$(CONFIG_XEN_INSTALL_SUFFIX)
 _install: $(TARGET)$(CONFIG_XEN_INSTALL_SUFFIX)
[ -d $(D)$(BOOT_DIR) ] || $(INSTALL_DIR) $(D)$(BOOT_DIR)
-   $(INSTALL_DATA) $(TARGET)$(Z) 
$(D)$(BOOT_DIR)/$(T)-$(XEN_FULLVERSION)$(Z)
-   ln -f -s $(T)-$(XEN_FULLVERSION)$(Z) 
$(D)$(BOOT_DIR)/$(T)-$(XEN_VERSION).$(XEN_SUBVERSION)$(Z)
-   ln -f -s $(T)-$(XEN_FULLVERSION)$(Z) 
$(D)$(BOOT_DIR)/$(T)-$(XEN_VERSION)$(Z)
-   ln -f -s $(T)-$(XEN_FULLVERSION)$(Z) $(D)$(BOOT_DIR)/$(T)$(Z)
+   $(call install_xen_links,$(Z))
+   $(call install_xen_links,.mb.efi)
[ -d "$(D)$(DEBUG_DIR)" ] || $(INSTALL_DIR) $(D)$(DEBUG_DIR)
$(INSTALL_DATA) $(TARGET)-syms 
$(D)$(DEBUG_DIR)/$(T)-syms-$(XEN_FULLVERSION)
$(INSTALL_DATA) $(TARGET)-syms.map 
$(D)$(DEBUG_DIR)/$(T)-syms-$(XEN_FULLVERSION).map
$(INSTALL_DATA) $(KCONFIG_CONFIG) 
$(D)$(BOOT_DIR)/$(T)-$(XEN_FULLVERSION).config
if [ -r $(TARGET).efi -a -n '$(EFI_DIR)' ]; then \
[ -d $(D)$(EFI_DIR) ] || $(INSTALL_DIR) $(D)$(EFI_DIR); \
-   $(INSTALL_DATA) $(TARGET).efi 
$(D)$(EFI_DIR)/$(T)-$(XEN_FULLVERSION).efi; \
if [ -e $(TARGET).efi.map ]; then \
$(INSTALL_DATA) $(TARGET).efi.map 
$(D)$(DEBUG_DIR)/$(T)-$(XEN_FULLVERSION).efi.map; \
fi; \
-   ln -sf $(T)-$(XEN_FULLVERSION).efi 
$(D)$(EFI_DIR)/$(T)-$(XEN_VERSION).$(XEN_SUBVERSION).efi; \
-   ln -sf $(T)-$(XEN_FULLVERSION).efi 
$(D)$(EFI_DIR)/$(T)-$(XEN_VERSION).efi; \
-   ln -sf $(T)-$(XEN_FULLVERSION).efi $(D)$(EFI_DIR)/$(T).efi; \
+   $(call install_xen_links,.efi)) \
if [ -n '$(EFI_MOUNTPOINT)' -a -n '$(EFI_VENDOR)' ]; then \
$(INSTALL_DATA) $(TARGET).efi 
$(D)$(EFI_MOUNTPOINT)/efi/$(EFI_VENDOR)/$(T)-$(XEN_FULLVERSION).efi; \
elif [ "$(D)" = "$(patsubst $(shell cd $(XEN_ROOT) && 
pwd)/%,%,$(D))" ]; then \
@@ -341,7 +343,7 @@ _clean: delete-unfresh-files
$(MAKE) -f $(BASEDIR)/tools/kconfig/Makefile.kconfig ARCH=$(ARCH) 
SRCARCH=$(SRCARCH) clean
find . \( -name "*.o" -o -name ".*.d" -o -name ".*.d2" \
-o -name "*.gcno" -o -name ".*.cmd" \) -exec rm -f {} \;
-   rm -f include/asm $(TARGET) $(TARGET).gz $(TARGET).efi 
$(TARGET).efi.map $(TARGET)-syms $(TARGET)-syms.map *~ core
+   rm -f include/asm $(TARGET) $(TARGET).gz $(TARGET)*.efi 
$(TARGET).efi.map $(TARGET)-syms $(TARGET)-syms.map *~ core
rm -f include/asm-*/asm-offsets.h
rm -f .banner
 
diff --git a/xen/arch/x86/Makefile b/xen/arch/x86/Makefile
index 7769bb40d7..49c61adabf 100644
--- a/xen/arch/x86/Makefile
+++ b/xen/arch/x86/Makefile
@@ -110,7 +110,7 @@ syms-warn-dup-$(CONFIG_SUPPRESS_DUPLICATE_SYMBOL_WARNINGS) 
:=
 syms-warn-dup-$(CONFIG_ENFORCE_UNIQUE_SYMBOLS) := --error-dup
 
 $(TARGET): TMP = $(@D)/.$(@F).elf32
-$(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
+$(TARGET): $(TARGET).mb.efi $(TARGET)-syms $(efi-y) boot/mkelf32
./boot/mkelf32 $(notes_phdrs) $(TARGET)-syms $(TMP) $(XEN_IMG_OFFSET) \
   `$(NM) $(TARGET)-syms | sed -ne 's/^\([^ ]*\) . 
__2M_rwdata_end$$/0x\1/p'`
od -t x4 -N 8192 $(TMP)  | grep 1badb002 > /dev/null || \
@@ -119,6 +119,11 @@ $(TARGET): $(TARGET)-syms $(efi-y) boot/mkelf32
{ echo "No Multiboot2 header found" >&2; false; }
mv $(TMP) $(TARGET)
 
+$(TARGET).mb.efi: $(TARGET)-syms
+

[PATCH v3 3/5] xen/x86: add some addresses to the Multiboot header

2021-01-21 Thread Bobby Eshleman

From: Daniel Kiper 

In comparison to ELF the PE format is not supported by the Multiboot
protocol. So, if we wish to load xen.mb.efi using this protocol we
have to put header_addr, load_addr, load_end_addr, bss_end_addr and
entry_addr data into Multiboot header.

The Multiboot protocol spec can be found at
  https://www.gnu.org/software/grub/manual/multiboot/

Signed-off-by: Daniel Kiper 
Signed-off-by: Bobby Eshleman 
---
 xen/arch/x86/boot/head.S | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/xen/arch/x86/boot/head.S b/xen/arch/x86/boot/head.S
index 2987b4f03a..189d91a872 100644
--- a/xen/arch/x86/boot/head.S
+++ b/xen/arch/x86/boot/head.S
@@ -50,13 +50,24 @@ ENTRY(start)
 .balign 4
 multiboot1_header: /*** MULTIBOOT1 HEADER /
 #define MULTIBOOT_HEADER_FLAGS (MULTIBOOT_HEADER_MODS_ALIGNED | \
-MULTIBOOT_HEADER_WANT_MEMORY)
+MULTIBOOT_HEADER_WANT_MEMORY | \
+MULTIBOOT_HEADER_HAS_ADDR)
 /* Magic number indicating a Multiboot header. */
 .long   MULTIBOOT_HEADER_MAGIC
 /* Flags to bootloader (see Multiboot spec). */
 .long   MULTIBOOT_HEADER_FLAGS
 /* Checksum: must be the negated sum of the first two fields. */
 .long   -(MULTIBOOT_HEADER_MAGIC + MULTIBOOT_HEADER_FLAGS)
+/* header_addr */
+.long   sym_offs(multiboot1_header)
+/* load_addr */
+.long   sym_offs(start)
+/* load_end_addr */
+.long   sym_offs(__bss_start)
+/* bss_end_addr */
+.long   sym_offs(__2M_rwdata_end)
+/* entry_addr */
+.long   sym_offs(__start)
 
 .size multiboot1_header, . - multiboot1_header
 .type multiboot1_header, @object
-- 
2.30.0

[PATCH v3 1/5] xen: add XEN_BUILD_POSIX_TIME

2021-01-21 Thread Bobby Eshleman

From: Daniel Kiper 

POSIX time is required to fill the TimeDateStamp field
in the PE header.

Use SOURCE_DATE_EPOCH if available, otherwise use
`date +%s` (available on both Linux and FreeBSD).

Signed-off-by: Daniel Kiper 
Signed-off-by: Bobby Eshleman 
---
 xen/Makefile | 2 ++
 xen/include/xen/compile.h.in | 1 +
 2 files changed, 3 insertions(+)

diff --git a/xen/Makefile b/xen/Makefile
index 544cc0995d..85f9b763a4 100644
--- a/xen/Makefile
+++ b/xen/Makefile
@@ -11,6 +11,7 @@ export XEN_DOMAIN ?= $(shell ([ -x /bin/dnsdomainname ] 
&& /bin/dnsdomainname) |
 export XEN_BUILD_DATE  ?= $(shell LC_ALL=C date)
 export XEN_BUILD_TIME  ?= $(shell LC_ALL=C date +%T)
 export XEN_BUILD_HOST  ?= $(shell hostname)
+export XEN_BUILD_POSIX_TIME?= $(shell echo $${SOURCE_DATE_EPOCH:-$(shell 
date +%s)})
 
 # Best effort attempt to find a python interpreter, defaulting to Python 3 if
 # available.  Fall back to just `python` if `which` is nowhere to be found.
@@ -386,6 +387,7 @@ delete-unfresh-files:
 include/xen/compile.h: include/xen/compile.h.in .banner
@sed -e 's/@@date@@/$(XEN_BUILD_DATE)/g' \
-e 's/@@time@@/$(XEN_BUILD_TIME)/g' \
+   -e 's/@@posix_time@@/$(XEN_BUILD_POSIX_TIME)/g' \
-e 's/@@whoami@@/$(XEN_WHOAMI)/g' \
-e 's/@@domain@@/$(XEN_DOMAIN)/g' \
-e 's/@@hostname@@/$(XEN_BUILD_HOST)/g' \
diff --git a/xen/include/xen/compile.h.in b/xen/include/xen/compile.h.in
index 440ecb25c1..b2ae6f96ad 100644
--- a/xen/include/xen/compile.h.in
+++ b/xen/include/xen/compile.h.in
@@ -1,5 +1,6 @@
 #define XEN_COMPILE_DATE   "@@date@@"
 #define XEN_COMPILE_TIME   "@@time@@"
+#define XEN_COMPILE_POSIX_TIME @@posix_time@@
 #define XEN_COMPILE_BY "@@whoami@@"
 #define XEN_COMPILE_DOMAIN "@@domain@@"
 #define XEN_COMPILE_HOST   "@@hostname@@"
-- 
2.30.0

[PATCH v3 0/5] Support Secure Boot for multiboot2 Xen

2021-01-21 Thread Bobby Eshleman

This is version 3 for a patch set sent out to the ML in 2018 [1] to
support UEFI Secure Boot for Xen on multiboot2 platforms.

A new binary, xen.mb.efi, is built.  It contains the mb2 header as well
as a hand-crafted PE/COFF header.  The dom0 kernel is verified using the
shim lock protocol.

I followed with v2 feedback and attempted to convert the PE/COFF header
into C instead of ASM.  Unfortunately, this was only possible for the
first part (Legacy) of the PE/COFF header.  The other parts required
addresses only available at link time (such as __2M_rwdata_end,
__pe_SizeOfImage, efi_mb_start address, etc...), which effectively ruled
out C.

The biggest difference between v2 and v3 is that in v3 we do not attempt
to merge xen.mb.efi and xen.efi into a single binary.  Instead, this
will be left to a future patch set, unless requested otherwise.

[1]: https://lists.xen.org/archives/html/xen-devel/2018-06/msg01292.html

Changes in v3:

- add requested comment clarification
- remove unnecessary fake data from PE/COFF head (like linker versions)
- macro-ize and refactor Makefile according to Jan's feedback
- break PE/COFF header into its own file
- shrink the PE/COFF to start 0x40 instead of 0x80 (my tests showed
  this function with no problem, on a live nested vm or using
  objdump/objcopy)
- support SOURCE_EPOCH for posix time
- removed `date` invocation that would break on FreeBSD
- style changes
- And obviously, ported to current HEAD

Daniel Kiper (5):
  xen: add XEN_BUILD_POSIX_TIME
  xen/x86: manually build xen.mb.efi binary
  xen/x86: add some addresses to the Multiboot header
  xen/x86: add some addresses to the Multiboot2 header
  xen/x86/efi: Verify dom0 kernel with SHIM_LOCK protocol in
efi_multiboot2()

 xen/Makefile |  22 ---
 xen/arch/x86/Makefile|   7 +-
 xen/arch/x86/arch.mk |   2 +
 xen/arch/x86/boot/Makefile   |   1 +
 xen/arch/x86/boot/head.S |  53 +--
 xen/arch/x86/boot/pecoff.S   | 123 +++
 xen/arch/x86/efi/efi-boot.h  |  30 -
 xen/arch/x86/efi/stub.c  |  17 -
 xen/arch/x86/xen.lds.S   |  34 ++
 xen/common/efi/boot.c|  19 --
 xen/include/xen/compile.h.in |   1 +
 xen/include/xen/efi.h|   1 +
 12 files changed, 283 insertions(+), 27 deletions(-)
 create mode 100644 xen/arch/x86/boot/pecoff.S

-- 
2.30.0

[xen-unstable-smoke test] 158565: tolerable all pass - PUSHED

2021-01-21 Thread osstest service owner

flight 158565 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158565/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  24114840ea4f82b6958ba0d7ac5e4cec44aafe11
baseline version:
 xen  dbf22970f5df8d20b2a6b7107cb9d977630181a6

Last test of basis   158561  2021-01-21 19:01:40 Z0 days
Testing same since   158565  2021-01-21 22:01:27 Z0 days1 attempts


People who touched revisions under test:
  Michal Orzel 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   dbf22970f5..24114840ea  24114840ea4f82b6958ba0d7ac5e4cec44aafe11 -> smoke

Re: Null scheduler and vwfi native problem

2021-01-21 Thread Dario Faggioli

On Thu, 2021-01-21 at 19:40 +, Julien Grall wrote:
> Hi Dario,
> 
Hi!

> On 21/01/2021 18:32, Dario Faggioli wrote:
> > On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
> > >   
> > > https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
> > > .
> > > 
> > Right. Back then, PCI passthrough was involved, if I remember
> > correctly. Is it the case for you as well?
> 
> PCI passthrough is not yet supported on Arm :). However, the bug was 
> reported with platform device passthrough.
> 
Yeah, well... That! Which indeed is not PCI. Sorry for the terminology
mismatch. :-)

> > Well, I'll think about it. >
> > > Starting the system without "sched=null vwfi=native" does not
> > > result
> > > in
> > > the problem.
> > > 
> > Ok, how about, if you're up for some more testing:
> > 
> >   - booting with "sched=null" but not with "vwfi=native"
> >   - booting with "sched=null vwfi=native" but not doing the IRQ
> >     passthrough that you mentioned above
> > 
> > ?
> 
> I think we can skip the testing as the bug was fully diagnostics back
> then. Unfortunately, I don't think a patch was ever posted.
>
True. But an hackish debug patch was provided and, back then, it
worked.

OTOH, Anders seems to be reporting that such a patch did not work here.
I also continue to think that we're facing the same or a very similar
problem... But I'm curious why applying the patch did not help this
time. And that's why I asked for more testing.

Anyway, it's true that we left the issue pending, so something like
this:

>  From Xen PoV, any pCPU executing guest context can be considered 
> quiescent. So one way to solve the problem would be to mark the pCPU 
> when entering to the guest.
> 
Should be done anyway.

We'll then see if it actually solves this problem too, or if this is
really something else.

Thanks for the summary, BTW. :-)

I'll try to work on a patch.

Regards

> [1] 
> 
> https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part

[ovmf test] 158555: all pass - PUSHED

2021-01-21 Thread osstest service owner

flight 158555 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158555/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 5b4a97bbc39ed8e7eb50038b9cffe2e948e49995
baseline version:
 ovmf 339371ef78eb3a6f2e9848f8b058379de5e87d39

Last test of basis   158546  2021-01-20 20:10:42 Z1 days
Testing same since   158555  2021-01-21 10:39:50 Z0 days1 attempts


People who touched revisions under test:
  Bob Feng 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   339371ef78..5b4a97bbc3  5b4a97bbc39ed8e7eb50038b9cffe2e948e49995 -> 
xen-tested-master

Re: [PATCH v2 1/4] xl: Add support for ignore_msrs option

2021-01-21 Thread Boris Ostrovsky




On 1/21/21 9:56 AM, Wei Liu wrote:
> On Wed, Jan 20, 2021 at 05:49:09PM -0500, Boris Ostrovsky wrote:
>> This option allows guest administrator specify what should happen when
>> guest accesses an MSR which is not explicitly emulated by the hypervisor.
>>
>> Signed-off-by: Boris Ostrovsky 
>> ---
>>  docs/man/xl.cfg.5.pod.in | 20 +++-
>>  tools/libs/light/libxl_types.idl |  7 +++
>>  tools/xl/xl_parse.c  |  7 +++
>>  3 files changed, 33 insertions(+), 1 deletion(-)
> 
> It is mainly missing a #define LIBXL_HAVE_XXX in libxl.h.
> 
> Other than that, this patch looks good to me. If you end up resending
> this series, please fix that issue.
> 
> If other patches are all reviewed, you can provide some text to be
> merged into this patch.
> 
> Wei.
> 


Ah yes, I forgot about this.



diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 3433c950f9aa..f249740daf3f 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1310,6 +1310,13 @@ void libxl_mac_copy(libxl_ctx *ctx, libxl_mac *dst, 
const libxl_mac *src);
  */
 #define LIBXL_HAVE_CREATEINFO_XEND_SUSPEND_EVTCHN_COMPAT
 
+/*
+ *  LIBXL_HAVE_IGNORE_MSRS indicates that the libxl_ignore_msrs field is
+ *  present in libxl_domain_build_info. This field describes hypervisor
+ *  behavior on guest accesses to unimplemented MSRs.
+ */
+#define LIBXL_HAVE_IGNORE_MSRS 1
+
 typedef char **libxl_string_list;
 void libxl_string_list_dispose(libxl_string_list *sl);
 int libxl_string_list_length(const libxl_string_list *sl);

[PATCH v7 10/10] x86/vm_event: Carry Processor Trace buffer offset in vm_event

2021-01-21 Thread Andrew Cooper

From: Tamas K Lengyel 

Add pt_offset field to x86 regs in vm_event. Initialized to ~0 if PT
is not in use.

Signed-off-by: Tamas K Lengyel 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * New
---
 xen/arch/x86/vm_event.c   | 3 +++
 xen/include/public/vm_event.h | 6 ++
 2 files changed, 9 insertions(+)

diff --git a/xen/arch/x86/vm_event.c b/xen/arch/x86/vm_event.c
index 848d69c1b0..09dfc0924e 100644
--- a/xen/arch/x86/vm_event.c
+++ b/xen/arch/x86/vm_event.c
@@ -251,6 +251,9 @@ void vm_event_fill_regs(vm_event_request_t *req)
 
 req->data.regs.x86.shadow_gs = ctxt.shadow_gs;
 req->data.regs.x86.dr6 = ctxt.dr6;
+
+if ( hvm_vmtrace_output_position(curr, >data.regs.x86.pt_offset) != 1 
)
+req->data.regs.x86.pt_offset = ~0;
 #endif
 }
 
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index 141ea024a3..57f34bf902 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -223,6 +223,12 @@ struct vm_event_regs_x86 {
  */
 uint64_t npt_base;
 
+/*
+ * Current offset in the Processor Trace buffer. For Intel Processor Trace
+ * this is MSR_RTIT_OUTPUT_MASK. Set to ~0 if no Processor Trace is active.
+ */
+uint64_t pt_offset;
+
 uint32_t cs_base;
 uint32_t ss_base;
 uint32_t ds_base;
-- 
2.11.0

Re: [PATCH V4 14/24] arm/ioreq: Introduce arch specific bits for IOREQ/DM features

2021-01-21 Thread Stefano Stabellini

On Thu, 21 Jan 2021, Oleksandr wrote:
> On 20.01.21 21:47, Stefano Stabellini wrote:
> > On Wed, 20 Jan 2021, Julien Grall wrote:
> > > Hi Stefano,
> > > 
> > > On 20/01/2021 00:50, Stefano Stabellini wrote:
> > > > On Tue, 19 Jan 2021, Oleksandr wrote:
> > > > > diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> > > > > index 40b9e59..0508bd8 100644
> > > > > --- a/xen/arch/arm/ioreq.c
> > > > > +++ b/xen/arch/arm/ioreq.c
> > > > > @@ -101,12 +101,10 @@ enum io_state try_fwd_ioserv(struct
> > > > > cpu_user_regs
> > > > > *regs,
> > > > > 
> > > > >    bool arch_ioreq_complete_mmio(void)
> > > > >    {
> > > > > -    struct vcpu *v = current;
> > > > >    struct cpu_user_regs *regs = guest_cpu_user_regs();
> > > > >    const union hsr hsr = { .bits = regs->hsr };
> > > > > -    paddr_t addr = v->io.req.addr;
> > > > > 
> > > > > -    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
> > > > > +    if ( handle_ioserv(regs, current) == IO_HANDLED )
> > > > >    {
> > > > >    advance_pc(regs, hsr);
> > > > >    return true;
> > > > Yes, but I think we want to keep the check
> > > > 
> > > >   vio->req.state == STATE_IORESP_READY
> > > > 
> > > > So maybe (uncompiled, untested):
> > > > 
> > > >   if ( v->io.req.state != STATE_IORESP_READY )
> > > >   return false;
> > > Is it possible to reach this function with v->io.req.state !=
> > > STATE_IORESP_READY? If not, then I would suggest to add an
> > > ASSERT_UNREACHABLE() before the return.
> > If I am reading the state machine right it should *not* be possible to
> > get here with v->io.req.state != STATE_IORESP_READY, so yes,
> > ASSERT_UNREACHABLE() would work.
> Agree here. If the assumption is not correct (unlikely), I think I will catch
> this during testing.
> In addition, we can probably drop case STATE_IORESP_READY in try_fwd_ioserv().
> 
> 
> [not tested]

Yes, looks OK

 
> diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
> index 40b9e59..c7ee1a7 100644
> --- a/xen/arch/arm/ioreq.c
> +++ b/xen/arch/arm/ioreq.c
> @@ -71,9 +71,6 @@ enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
>  case STATE_IOREQ_NONE:
>  break;
> 
> -    case STATE_IORESP_READY:
> -    return IO_HANDLED;
> -
>  default:
>  gdprintk(XENLOG_ERR, "wrong state %u\n", vio->req.state);
>  return IO_ABORT;
> @@ -104,9 +101,14 @@ bool arch_ioreq_complete_mmio(void)
>  struct vcpu *v = current;
>  struct cpu_user_regs *regs = guest_cpu_user_regs();
>  const union hsr hsr = { .bits = regs->hsr };
> -    paddr_t addr = v->io.req.addr;
> 
> -    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
> +    if ( v->io.req.state != STATE_IORESP_READY )
> +    {
> +    ASSERT_UNREACHABLE();
> +    return false;
> +    }
> +
> +    if ( handle_ioserv(regs, v) == IO_HANDLED )
>  {
>  advance_pc(regs, hsr);
>  return true;

[PATCH v7 09/10] xen/vmtrace: support for VM forks

2021-01-21 Thread Andrew Cooper

From: Tamas K Lengyel 

Implement vmtrace_reset_pt function. Properly set IPT
state for VM forks.

Signed-off-by: Tamas K Lengyel 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * New
---
 xen/arch/x86/hvm/vmx/vmx.c| 11 +++
 xen/arch/x86/mm/mem_sharing.c |  3 +++
 xen/include/asm-x86/hvm/hvm.h |  9 +
 3 files changed, 23 insertions(+)

diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index d4e7b50b8a..40ae398cf5 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2408,6 +2408,16 @@ static int vmtrace_output_position(struct vcpu *v, 
uint64_t *pos)
 return v->arch.hvm.vmx.ipt_active;
 }
 
+static int vmtrace_reset(struct vcpu *v)
+{
+if ( !v->arch.hvm.vmx.ipt_active )
+return -EINVAL;
+
+v->arch.msrs->rtit.output_offset = 0;
+v->arch.msrs->rtit.status = 0;
+return 0;
+}
+
 static struct hvm_function_table __initdata vmx_function_table = {
 .name = "VMX",
 .cpu_up_prepare   = vmx_cpu_up_prepare,
@@ -2467,6 +2477,7 @@ static struct hvm_function_table __initdata 
vmx_function_table = {
 .vmtrace_output_position = vmtrace_output_position,
 .vmtrace_set_option = vmtrace_set_option,
 .vmtrace_get_option = vmtrace_get_option,
+.vmtrace_reset = vmtrace_reset,
 .tsc_scaling = {
 .max_ratio = VMX_TSC_MULTIPLIER_MAX,
 },
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index adaeab4612..fd07eb8d4d 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1632,6 +1632,8 @@ static int copy_vcpu_settings(struct domain *cd, const 
struct domain *d)
 copy_domain_page(new_vcpu_info_mfn, vcpu_info_mfn);
 }
 
+hvm_vmtrace_reset(cd_vcpu);
+
 /*
  * TODO: to support VMs with PV interfaces copy additional
  * settings here, such as PV timers.
@@ -1782,6 +1784,7 @@ static int fork(struct domain *cd, struct domain *d)
 cd->max_pages = d->max_pages;
 *cd->arch.cpuid = *d->arch.cpuid;
 *cd->arch.msr = *d->arch.msr;
+cd->vmtrace_frames = d->vmtrace_frames;
 cd->parent = d;
 }
 
diff --git a/xen/include/asm-x86/hvm/hvm.h b/xen/include/asm-x86/hvm/hvm.h
index 960ec03917..150746de66 100644
--- a/xen/include/asm-x86/hvm/hvm.h
+++ b/xen/include/asm-x86/hvm/hvm.h
@@ -219,6 +219,7 @@ struct hvm_function_table {
 int (*vmtrace_output_position)(struct vcpu *v, uint64_t *pos);
 int (*vmtrace_set_option)(struct vcpu *v, uint64_t key, uint64_t value);
 int (*vmtrace_get_option)(struct vcpu *v, uint64_t key, uint64_t *value);
+int (*vmtrace_reset)(struct vcpu *v);
 
 /*
  * Parameters and callbacks for hardware-assisted TSC scaling,
@@ -696,6 +697,14 @@ static inline int hvm_vmtrace_get_option(
 return -EOPNOTSUPP;
 }
 
+static inline int hvm_vmtrace_reset(struct vcpu *v)
+{
+if ( hvm_funcs.vmtrace_reset )
+return hvm_funcs.vmtrace_reset(v);
+
+return -EOPNOTSUPP;
+}
+
 /*
  * This must be defined as a macro instead of an inline function,
  * because it uses 'struct vcpu' and 'struct domain' which have
-- 
2.11.0

[PATCH v7 01/10] xen+tools: Introduce XEN_SYSCTL_PHYSCAP_vmtrace

2021-01-21 Thread Andrew Cooper

We're about to introduce support for Intel Processor Trace, but similar
functionality exists in other platforms.

Aspects of vmtrace can reasonably can be common, so start with
XEN_SYSCTL_PHYSCAP_vmtrace and plumb the signal from Xen all the way down into
`xl info`.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Ian Jackson 
CC: Wei Liu 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * New
---
 tools/golang/xenlight/helpers.gen.go | 2 ++
 tools/golang/xenlight/types.gen.go   | 1 +
 tools/include/libxl.h| 7 +++
 tools/libs/light/libxl.c | 2 ++
 tools/libs/light/libxl_types.idl | 1 +
 tools/xl/xl_info.c   | 5 +++--
 xen/common/domain.c  | 2 ++
 xen/common/sysctl.c  | 2 ++
 xen/include/public/sysctl.h  | 1 +
 xen/include/xen/domain.h | 2 ++
 10 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index c8605994e7..62fb98ba30 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -3345,6 +3345,7 @@ x.CapHvmDirectio = bool(xc.cap_hvm_directio)
 x.CapHap = bool(xc.cap_hap)
 x.CapShadow = bool(xc.cap_shadow)
 x.CapIommuHapPtShare = bool(xc.cap_iommu_hap_pt_share)
+x.CapVmtrace = bool(xc.cap_vmtrace)
 
  return nil}
 
@@ -3375,6 +3376,7 @@ xc.cap_hvm_directio = C.bool(x.CapHvmDirectio)
 xc.cap_hap = C.bool(x.CapHap)
 xc.cap_shadow = C.bool(x.CapShadow)
 xc.cap_iommu_hap_pt_share = C.bool(x.CapIommuHapPtShare)
+xc.cap_vmtrace = C.bool(x.CapVmtrace)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index b4c5df0f2c..369da6dd1e 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -998,6 +998,7 @@ CapHvmDirectio bool
 CapHap bool
 CapShadow bool
 CapIommuHapPtShare bool
+CapVmtrace bool
 }
 
 type Connectorinfo struct {
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index 3433c950f9..c4d920f1e5 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -464,6 +464,13 @@
 #define LIBXL_HAVE_DEVICE_PCI_ASSIGNABLE_LIST_FREE 1
 
 /*
+ * LIBXL_HAVE_PHYSINFO_CAP_VMTRACE indicates that libxl_physinfo has a
+ * cap_vmtrace field, which indicates the availability of platform tracing
+ * functionality.
+ */
+#define LIBXL_HAVE_PHYSINFO_CAP_VMTRACE 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libs/light/libxl.c b/tools/libs/light/libxl.c
index d2a87157a2..204eb0be2d 100644
--- a/tools/libs/light/libxl.c
+++ b/tools/libs/light/libxl.c
@@ -402,6 +402,8 @@ int libxl_get_physinfo(libxl_ctx *ctx, libxl_physinfo 
*physinfo)
 !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_shadow);
 physinfo->cap_iommu_hap_pt_share =
 !!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_iommu_hap_pt_share);
+physinfo->cap_vmtrace =
+!!(xcphysinfo.capabilities & XEN_SYSCTL_PHYSCAP_vmtrace);
 
 GC_FREE;
 return 0;
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index 05324736b7..b43d5f1265 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -1050,6 +1050,7 @@ libxl_physinfo = Struct("physinfo", [
 ("cap_hap", bool),
 ("cap_shadow", bool),
 ("cap_iommu_hap_pt_share", bool),
+("cap_vmtrace", bool),
 ], dir=DIR_OUT)
 
 libxl_connectorinfo = Struct("connectorinfo", [
diff --git a/tools/xl/xl_info.c b/tools/xl/xl_info.c
index ca417df8e8..8383e4a6df 100644
--- a/tools/xl/xl_info.c
+++ b/tools/xl/xl_info.c
@@ -210,14 +210,15 @@ static void output_physinfo(void)
  info.hw_cap[4], info.hw_cap[5], info.hw_cap[6], info.hw_cap[7]
 );
 
-maybe_printf("virt_caps  :%s%s%s%s%s%s%s\n",
+maybe_printf("virt_caps  :%s%s%s%s%s%s%s%s\n",
  info.cap_pv ? " pv" : "",
  info.cap_hvm ? " hvm" : "",
  info.cap_hvm && info.cap_hvm_directio ? " hvm_directio" : "",
  info.cap_pv && info.cap_hvm_directio ? " pv_directio" : "",
  info.cap_hap ? " hap" : "",
  info.cap_shadow ? " shadow" : "",
- info.cap_iommu_hap_pt_share ? " iommu_hap_pt_share" : ""
+ info.cap_iommu_hap_pt_share ? " iommu_hap_pt_share" : "",
+ info.cap_vmtrace ? " vmtrace" : ""
 );
 
 vinfo = libxl_get_version_info(ctx);
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 2b461655c3..d1e94d88cf 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -82,6 +82,8 @@ struct vcpu *idle_vcpu[NR_CPUS] __read_mostly;
 
 vcpu_info_t dummy_vcpu_info;
 
+bool __read_mostly vmtrace_available;
+
 static void __domain_finalise_shutdown(struct domain *d)
 {
 struct vcpu *v;
diff --git a/xen/common/sysctl.c b/xen/common/sysctl.c
index ec916424e5..3558641cd9 100644
--- a/xen/common/sysctl.c
+++

[PATCH v7 00/10] Implement support for external IPT monitoring

2021-01-21 Thread Andrew Cooper

This is the next iteration of the external IPT monitoring series, reworked
massively from v6 to fix a whole slew of bugs with the XENMEM_acquire_resource
interface.  It also includes some additional bugfixes to make traced VM's work
when forked.

A branch with all prerequisites is available at:
  
https://xenbits.xen.org/gitweb/?p=people/andrewcoop/xen.git;a=shortlog;h=refs/heads/xen-ipt

This version has undergone basic testing from Michał, Tamas and myself, and
seems to work for the intended usecases.

Andrew Cooper (1):
  xen+tools: Introduce XEN_SYSCTL_PHYSCAP_vmtrace

Michał Leszczyński (7):
  xen/domain: Add vmtrace_frames domain creation parameter
  tools/[lib]xl: Add vmtrace_buf_size parameter
  xen/memory: Add a vmtrace_buf resource type
  x86/vmx: Add Intel Processor Trace support
  xen/domctl: Add XEN_DOMCTL_vmtrace_op
  tools/libxc: Add xc_vmtrace_* functions
  tools/misc: Add xen-vmtrace tool

Tamas K Lengyel (2):
  xen/vmtrace: support for VM forks
  x86/vm_event: Carry Processor Trace buffer offset in vm_event

 docs/man/xl.cfg.5.pod.in|   9 ++
 tools/golang/xenlight/helpers.gen.go|   4 +
 tools/golang/xenlight/types.gen.go  |   2 +
 tools/include/libxl.h   |  14 ++
 tools/include/xenctrl.h |  73 +++
 tools/libs/ctrl/Makefile|   1 +
 tools/libs/ctrl/xc_vmtrace.c| 128 ++
 tools/libs/light/libxl.c|   2 +
 tools/libs/light/libxl_cpuid.c  |   1 +
 tools/libs/light/libxl_create.c |   2 +
 tools/libs/light/libxl_types.idl|   5 +
 tools/misc/.gitignore   |   1 +
 tools/misc/Makefile |   4 +
 tools/misc/xen-cpuid.c  |   2 +-
 tools/misc/xen-vmtrace.c| 154 ++
 tools/xl/xl_info.c  |   5 +-
 tools/xl/xl_parse.c |   4 +
 xen/arch/x86/domain.c   |  23 
 xen/arch/x86/domctl.c   |  55 
 xen/arch/x86/hvm/vmx/vmcs.c |  15 ++-
 xen/arch/x86/hvm/vmx/vmx.c  | 196 +++-
 xen/arch/x86/mm/mem_sharing.c   |   3 +
 xen/arch/x86/vm_event.c |   3 +
 xen/common/domain.c |  58 
 xen/common/memory.c |  27 
 xen/common/sysctl.c |   2 +
 xen/include/asm-x86/cpufeature.h|   1 +
 xen/include/asm-x86/hvm/hvm.h   |  72 ++
 xen/include/asm-x86/hvm/vmx/vmcs.h  |   4 +
 xen/include/asm-x86/msr.h   |  32 +
 xen/include/public/arch-x86/cpufeatureset.h |   1 +
 xen/include/public/domctl.h |  36 +
 xen/include/public/memory.h |   1 +
 xen/include/public/sysctl.h |   1 +
 xen/include/public/vm_event.h   |   6 +
 xen/include/xen/domain.h|   2 +
 xen/include/xen/sched.h |   7 +
 xen/xsm/flask/hooks.c   |   1 +
 38 files changed, 952 insertions(+), 5 deletions(-)
 create mode 100644 tools/libs/ctrl/xc_vmtrace.c
 create mode 100644 tools/misc/xen-vmtrace.c

-- 
2.11.0

[PATCH v7 06/10] xen/domctl: Add XEN_DOMCTL_vmtrace_op

2021-01-21 Thread Andrew Cooper

From: Michał Leszczyński 

Implement an interface to configure and control tracing operations.  Reuse the
existing SETDEBUGGING flask vector rather than inventing a new one.

Userspace using this interface is going to need platform specific knowledge
anyway to interpret the contents of the trace buffer.  While some operations
(e.g. enable/disable) can reasonably be generic, others cannot.  Provide an
explicitly-platform specific pair of get/set operations to reduce API churn as
new options get added/enabled.

For the VMX specific Processor Trace implementation, tolerate reading and
modifying a safe subset of bits in CTL, STATUS and OUTPUT_MASK.  This permits
userspace to control the content which gets logged, but prevents modification
of details such as the position/size of the output buffer.

Signed-off-by: Michał Leszczyński 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * Major chop within the series.
---
 xen/arch/x86/domctl.c |  55 +++
 xen/arch/x86/hvm/vmx/vmx.c| 151 ++
 xen/include/asm-x86/hvm/hvm.h |  63 ++
 xen/include/public/domctl.h   |  35 ++
 xen/xsm/flask/hooks.c |   1 +
 5 files changed, 305 insertions(+)

diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c
index b28cfe9817..aa6dfe8eed 100644
--- a/xen/arch/x86/domctl.c
+++ b/xen/arch/x86/domctl.c
@@ -155,6 +155,55 @@ void arch_get_domain_info(const struct domain *d,
 info->arch_config.emulation_flags = d->arch.emulation_flags;
 }
 
+static int do_vmtrace_op(struct domain *d, struct xen_domctl_vmtrace_op *op,
+ XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl)
+{
+struct vcpu *v;
+int rc;
+
+if ( !d->vmtrace_frames || d == current->domain /* No vcpu_pause() */ )
+return -EINVAL;
+
+ASSERT(is_hvm_domain(d)); /* Restricted by domain creation logic. */
+
+v = domain_vcpu(d, op->vcpu);
+if ( !v )
+return -ENOENT;
+
+vcpu_pause(v);
+switch ( op->cmd )
+{
+case XEN_DOMCTL_vmtrace_enable:
+case XEN_DOMCTL_vmtrace_disable:
+case XEN_DOMCTL_vmtrace_reset_and_enable:
+rc = hvm_vmtrace_control(
+v, op->cmd != XEN_DOMCTL_vmtrace_disable,
+op->cmd == XEN_DOMCTL_vmtrace_reset_and_enable);
+break;
+
+case XEN_DOMCTL_vmtrace_output_position:
+rc = hvm_vmtrace_output_position(v, >value);
+if ( rc >= 0 )
+rc = 0;
+break;
+
+case XEN_DOMCTL_vmtrace_get_option:
+rc = hvm_vmtrace_get_option(v, op->key, >value);
+break;
+
+case XEN_DOMCTL_vmtrace_set_option:
+rc = hvm_vmtrace_set_option(v, op->key, op->value);
+break;
+
+default:
+rc = -EOPNOTSUPP;
+break;
+}
+vcpu_unpause(v);
+
+return rc;
+}
+
 #define MAX_IOPORTS 0x1
 
 long arch_do_domctl(
@@ -1320,6 +1369,12 @@ long arch_do_domctl(
 domain_unpause(d);
 break;
 
+case XEN_DOMCTL_vmtrace_op:
+ret = do_vmtrace_op(d, >u.vmtrace_op, u_domctl);
+if ( !ret )
+copyback = true;
+break;
+
 default:
 ret = iommu_do_domctl(domctl, d, u_domctl);
 break;
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 93121fbf27..d4e7b50b8a 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -2261,6 +2261,153 @@ static bool vmx_get_pending_event(struct vcpu *v, 
struct x86_event *info)
 return true;
 }
 
+static int vmtrace_get_option(struct vcpu *v, uint64_t key, uint64_t *output)
+{
+const struct vcpu_msrs *msrs = v->arch.msrs;
+
+/*
+ * We only let vmtrace agents see and modify a subset of bits in
+ * MSR_RTIT_CTL.  These all pertain to date emitted into the trace
+ * buffer(s).  Must not include controls pertaining to the
+ * structure/position of the trace buffer(s).
+ */
+#define RTIT_CTL_MASK   \
+(RTIT_CTL_TRACE_EN | RTIT_CTL_OS | RTIT_CTL_USR | RTIT_CTL_TSC_EN | \
+ RTIT_CTL_DIS_RETC | RTIT_CTL_BRANCH_EN)
+
+/*
+ * Status bits restricted to the first-gen subset (i.e. no further CPUID
+ * requirements.)
+ */
+#define RTIT_STATUS_MASK \
+(RTIT_STATUS_FILTER_EN | RTIT_STATUS_CONTEXT_EN | RTIT_STATUS_TRIGGER_EN | 
\
+ RTIT_STATUS_ERROR | RTIT_STATUS_STOPPED)
+
+switch ( key )
+{
+case MSR_RTIT_OUTPUT_MASK:
+*output = msrs->rtit.output_mask;
+break;
+
+case MSR_RTIT_CTL:
+*output = msrs->rtit.ctl & RTIT_CTL_MASK;
+break;
+
+case MSR_RTIT_STATUS:
+*output = msrs->rtit.status & RTIT_STATUS_MASK;
+break;
+
+default:
+*output = 0;
+return -EINVAL;
+}
+return 0;
+}
+
+static int vmtrace_set_option(struct vcpu *v, uint64_t key, uint64_t value)
+{
+

[PATCH v7 07/10] tools/libxc: Add xc_vmtrace_* functions

2021-01-21 Thread Andrew Cooper

From: Michał Leszczyński 

Add functions in libxc that use the new XEN_DOMCTL_vmtrace interface.

Signed-off-by: Michał Leszczyński 
Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * Use the name 'vmtrace' consistently.
---
 tools/include/xenctrl.h  |  73 
 tools/libs/ctrl/Makefile |   1 +
 tools/libs/ctrl/xc_vmtrace.c | 128 +++
 3 files changed, 202 insertions(+)
 create mode 100644 tools/libs/ctrl/xc_vmtrace.c

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index 3796425e1e..0efcdae8b4 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1583,6 +1583,79 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t 
mask);
 
 int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask);
 
+/**
+ * Enable vmtrace for given vCPU.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_enable(xc_interface *xch, uint32_t domid, uint32_t vcpu);
+
+/**
+ * Enable vmtrace for given vCPU.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_disable(xc_interface *xch, uint32_t domid, uint32_t vcpu);
+
+/**
+ * Enable vmtrace for a given vCPU, along with resetting status/offset
+ * details.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_reset_and_enable(xc_interface *xch, uint32_t domid,
+uint32_t vcpu);
+
+/**
+ * Get current output position inside the trace buffer.
+ *
+ * Repeated calls will return different values if tracing is enabled.  It is
+ * platform specific what happens when the buffer fills completely.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm pos current output position in bytes
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_output_position(xc_interface *xch, uint32_t domid,
+   uint32_t vcpu, uint64_t *pos);
+
+/**
+ * Get platform specific vmtrace options.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm key platform-specific input
+ * @parm value platform-specific output
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_get_option(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu, uint64_t key, uint64_t *value);
+
+/**
+ * Set platform specific vntvmtrace options.
+ *
+ * @parm xch a handle to an open hypervisor interface
+ * @parm domid domain identifier
+ * @parm vcpu vcpu identifier
+ * @parm key platform-specific input
+ * @parm value platform-specific input
+ * @return 0 on success, -1 on failure
+ */
+int xc_vmtrace_set_option(xc_interface *xch, uint32_t domid,
+  uint32_t vcpu, uint64_t key, uint64_t value);
+
 int xc_domctl(xc_interface *xch, struct xen_domctl *domctl);
 int xc_sysctl(xc_interface *xch, struct xen_sysctl *sysctl);
 
diff --git a/tools/libs/ctrl/Makefile b/tools/libs/ctrl/Makefile
index 4185dc3f22..449c89f440 100644
--- a/tools/libs/ctrl/Makefile
+++ b/tools/libs/ctrl/Makefile
@@ -22,6 +22,7 @@ SRCS-y   += xc_pm.c
 SRCS-y   += xc_cpu_hotplug.c
 SRCS-y   += xc_resume.c
 SRCS-y   += xc_vm_event.c
+SRCS-y   += xc_vmtrace.c
 SRCS-y   += xc_monitor.c
 SRCS-y   += xc_mem_paging.c
 SRCS-y   += xc_mem_access.c
diff --git a/tools/libs/ctrl/xc_vmtrace.c b/tools/libs/ctrl/xc_vmtrace.c
new file mode 100644
index 00..602502367f
--- /dev/null
+++ b/tools/libs/ctrl/xc_vmtrace.c
@@ -0,0 +1,128 @@
+/**
+ * xc_vmtrace.c
+ *
+ * API for manipulating hardware tracing features
+ *
+ * Copyright (c) 2020, Michal Leszczynski
+ *
+ * Copyright 2020 CERT Polska. All rights reserved.
+ * Use is subject to license terms.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation;
+ * version 2.1 of the License.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; If not, see .
+ */
+
+#include "xc_private.h"
+
+int xc_vmtrace_enable(
+xc_interface

[PATCH v7 04/10] xen/memory: Add a vmtrace_buf resource type

2021-01-21 Thread Andrew Cooper

From: Michał Leszczyński 

Allow to map processor trace buffer using acquire_resource().

Signed-off-by: Michał Leszczyński 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * Rebase over changes elsewhere in the series
---
 xen/common/memory.c | 27 +++
 xen/include/public/memory.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index c89923d909..ec6a55172a 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -1068,11 +1068,35 @@ static unsigned int resource_max_frames(const struct 
domain *d,
 case XENMEM_resource_grant_table:
 return gnttab_resource_max_frames(d, id);
 
+case XENMEM_resource_vmtrace_buf:
+return d->vmtrace_frames;
+
 default:
 return arch_resource_max_frames(d, type, id);
 }
 }
 
+static int acquire_vmtrace_buf(
+struct domain *d, unsigned int id, unsigned long frame,
+unsigned int nr_frames, xen_pfn_t mfn_list[])
+{
+const struct vcpu *v = domain_vcpu(d, id);
+unsigned int i;
+mfn_t mfn;
+
+if ( !v || !v->vmtrace.buf ||
+ nr_frames > d->vmtrace_frames ||
+ (frame + nr_frames) > d->vmtrace_frames )
+return -EINVAL;
+
+mfn = page_to_mfn(v->vmtrace.buf);
+
+for ( i = 0; i < nr_frames; i++ )
+mfn_list[i] = mfn_x(mfn) + frame + i;
+
+return nr_frames;
+}
+
 /*
  * Returns -errno on error, or positive in the range [1, nr_frames] on
  * success.  Returning less than nr_frames contitutes a request for a
@@ -1087,6 +,9 @@ static int _acquire_resource(
 case XENMEM_resource_grant_table:
 return gnttab_acquire_resource(d, id, frame, nr_frames, mfn_list);
 
+case XENMEM_resource_vmtrace_buf:
+return acquire_vmtrace_buf(d, id, frame, nr_frames, mfn_list);
+
 default:
 return arch_acquire_resource(d, type, id, frame, nr_frames, mfn_list);
 }
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index c4c47a0b38..b0378bb14b 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -625,6 +625,7 @@ struct xen_mem_acquire_resource {
 
 #define XENMEM_resource_ioreq_server 0
 #define XENMEM_resource_grant_table 1
+#define XENMEM_resource_vmtrace_buf 2
 
 /*
  * IN - a type-specific resource identifier, which must be zero
-- 
2.11.0

[PATCH v7 02/10] xen/domain: Add vmtrace_frames domain creation parameter

2021-01-21 Thread Andrew Cooper

From: Michał Leszczyński 

To use vmtrace, buffers of a suitable size need allocating, and different
tasks will want different sizes.

Add a domain creation parameter, and audit it appropriately in the
{arch_,}sanitise_domain_config() functions.

For now, the x86 specific auditing is tuned to Processor Trace running in
Single Output mode, which requires a single contiguous range of memory.

The size is given an arbitrary limit of 64M which is expected to be enough for
anticipated usecases, but not large enough to get into long-running-hypercall
problems.

Signed-off-by: Michał Leszczyński 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Anthony PERARD 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

When support for later generations of IPT get added, we can in principle start
to use ToTP which is a scatter list of smaller trace regions to use, if we
need to massively up the buffer size available.

v7:
 * Major chop within the series.
 * Use the name 'vmtrace' consistently.
 * Use the (new) common vcpu_teardown() functionality, rather than leaving a
   latent memory leak on ARM.
---
 xen/arch/x86/domain.c   | 23 +++
 xen/common/domain.c | 56 +
 xen/include/public/domctl.h |  1 +
 xen/include/xen/sched.h |  7 ++
 4 files changed, 87 insertions(+)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index b9ba04633e..3f12d68e9e 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -660,6 +660,29 @@ int arch_sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 return -EINVAL;
 }
 
+if ( config->vmtrace_frames )
+{
+unsigned int frames = config->vmtrace_frames;
+
+ASSERT(vmtrace_available); /* Checked by common code. */
+
+/*
+ * For now, vmtrace is restricted to HVM guests, and using a
+ * power-of-2 buffer up to 64M in size.
+ */
+if ( !hvm )
+{
+dprintk(XENLOG_INFO, "vmtrace not supported for PV\n");
+return -EINVAL;
+}
+
+if ( frames > (MB(64) >> PAGE_SHIFT) || (frames & (frames - 1)) )
+{
+dprintk(XENLOG_INFO, "Unsupported vmtrace frames: %u\n", frames);
+return -EINVAL;
+}
+}
+
 return 0;
 }
 
diff --git a/xen/common/domain.c b/xen/common/domain.c
index d1e94d88cf..a844bc7b78 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -132,6 +132,48 @@ static void vcpu_info_reset(struct vcpu *v)
 v->vcpu_info_mfn = INVALID_MFN;
 }
 
+static void vmtrace_free_buffer(struct vcpu *v)
+{
+const struct domain *d = v->domain;
+struct page_info *pg = v->vmtrace.buf;
+unsigned int i;
+
+if ( !pg )
+return;
+
+for ( i = 0; i < d->vmtrace_frames; i++ )
+{
+put_page_alloc_ref([i]);
+put_page_and_type([i]);
+}
+
+v->vmtrace.buf = NULL;
+}
+
+static int vmtrace_alloc_buffer(struct vcpu *v)
+{
+struct domain *d = v->domain;
+struct page_info *pg;
+unsigned int i;
+
+if ( !d->vmtrace_frames )
+return 0;
+
+pg = alloc_domheap_pages(d, get_order_from_pages(d->vmtrace_frames),
+ MEMF_no_refcount);
+if ( !pg )
+return -ENOMEM;
+
+v->vmtrace.buf = pg;
+
+for ( i = 0; i < d->vmtrace_frames; i++ )
+/* Domain can't know about this page yet - something fishy going on. */
+if ( !get_page_and_type([i], d, PGT_writable_page) )
+BUG();
+
+return 0;
+}
+
 /*
  * Release resources held by a vcpu.  There may or may not be live references
  * to the vcpu, and it may or may not be fully constructed.
@@ -140,6 +182,8 @@ static void vcpu_info_reset(struct vcpu *v)
  */
 static int vcpu_teardown(struct vcpu *v)
 {
+vmtrace_free_buffer(v);
+
 return 0;
 }
 
@@ -201,6 +245,9 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int 
vcpu_id)
 if ( sched_init_vcpu(v) != 0 )
 goto fail_wq;
 
+if ( vmtrace_alloc_buffer(v) != 0 )
+goto fail_wq;
+
 if ( arch_vcpu_create(v) != 0 )
 goto fail_sched;
 
@@ -449,6 +496,12 @@ static int sanitise_domain_config(struct 
xen_domctl_createdomain *config)
 }
 }
 
+if ( config->vmtrace_frames && !vmtrace_available )
+{
+dprintk(XENLOG_INFO, "vmtrace requested but not available\n");
+return -EINVAL;
+}
+
 return arch_sanitise_domain_config(config);
 }
 
@@ -474,7 +527,10 @@ struct domain *domain_create(domid_t domid,
 ASSERT(is_system_domain(d) ? config == NULL : config != NULL);
 
 if ( config )
+{
 d->options = config->flags;
+d->vmtrace_frames = config->vmtrace_frames;
+}
 
 /* Sort out our idea of is_control_domain(). */
 d->is_privileged = is_priv;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 666aeb71bf..1585678d50 100644
--- a/xen/include/public/domctl.h
+++

[PATCH v7 03/10] tools/[lib]xl: Add vmtrace_buf_size parameter

2021-01-21 Thread Andrew Cooper

From: Michał Leszczyński 

Allow to specify the size of per-vCPU trace buffer upon
domain creation. This is zero by default (meaning: not enabled).

Signed-off-by: Michał Leszczyński 
Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 
CC: Anthony PERARD 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * Use the name 'vmtrace' consistently.
---
 docs/man/xl.cfg.5.pod.in | 9 +
 tools/golang/xenlight/helpers.gen.go | 2 ++
 tools/golang/xenlight/types.gen.go   | 1 +
 tools/include/libxl.h| 7 +++
 tools/libs/light/libxl_create.c  | 2 ++
 tools/libs/light/libxl_types.idl | 4 
 tools/xl/xl_parse.c  | 4 
 7 files changed, 29 insertions(+)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index c8e017f950..86963298a3 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -681,6 +681,15 @@ Windows).
 
 If this option is not specified then it will default to B.
 
+=item B
+
+Specifies the size of vmtrace buffer that would be allocated for each
+vCPU belonging to this domain.  Disabled (i.e.  B) by
+default.
+
+B: Acceptable values are platform specific.  For Intel Processor
+Trace, this value must be a power of 2 between 4k and 16M.
+
 =back
 
 =head2 Devices
diff --git a/tools/golang/xenlight/helpers.gen.go 
b/tools/golang/xenlight/helpers.gen.go
index 62fb98ba30..1e22a407bf 100644
--- a/tools/golang/xenlight/helpers.gen.go
+++ b/tools/golang/xenlight/helpers.gen.go
@@ -1114,6 +1114,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 x.ArchArm.GicVersion = GicVersion(xc.arch_arm.gic_version)
 x.ArchArm.Vuart = VuartType(xc.arch_arm.vuart)
 x.Altp2M = Altp2MMode(xc.altp2m)
+x.VmtraceBufKb = int(xc.vmtrace_buf_kb)
 
  return nil}
 
@@ -1589,6 +1590,7 @@ return fmt.Errorf("invalid union key '%v'", x.Type)}
 xc.arch_arm.gic_version = C.libxl_gic_version(x.ArchArm.GicVersion)
 xc.arch_arm.vuart = C.libxl_vuart_type(x.ArchArm.Vuart)
 xc.altp2m = C.libxl_altp2m_mode(x.Altp2M)
+xc.vmtrace_buf_kb = C.int(x.VmtraceBufKb)
 
  return nil
  }
diff --git a/tools/golang/xenlight/types.gen.go 
b/tools/golang/xenlight/types.gen.go
index 369da6dd1e..130524dce9 100644
--- a/tools/golang/xenlight/types.gen.go
+++ b/tools/golang/xenlight/types.gen.go
@@ -513,6 +513,7 @@ GicVersion GicVersion
 Vuart VuartType
 }
 Altp2M Altp2MMode
+VmtraceBufKb int
 }
 
 type domainBuildInfoTypeUnion interface {
diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index c4d920f1e5..7606a21219 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -471,6 +471,13 @@
 #define LIBXL_HAVE_PHYSINFO_CAP_VMTRACE 1
 
 /*
+ * LIBXL_HAVE_VMTRACE_BUF_KB indicates that libxl_domain_create_info has a
+ * vmtrace_buf_kb parameter, which allows to enable pre-allocation of
+ * processor tracing buffers of given size.
+ */
+#define LIBXL_HAVE_VMTRACE_BUF_KB 1
+
+/*
  * libxl ABI compatibility
  *
  * The only guarantee which libxl makes regarding ABI compatibility
diff --git a/tools/libs/light/libxl_create.c b/tools/libs/light/libxl_create.c
index 86f4a8369d..d4a9380357 100644
--- a/tools/libs/light/libxl_create.c
+++ b/tools/libs/light/libxl_create.c
@@ -607,6 +607,8 @@ int libxl__domain_make(libxl__gc *gc, libxl_domain_config 
*d_config,
 .max_evtchn_port = b_info->event_channels,
 .max_grant_frames = b_info->max_grant_frames,
 .max_maptrack_frames = b_info->max_maptrack_frames,
+.vmtrace_frames = DIV_ROUNDUP(b_info->vmtrace_buf_kb,
+  XC_PAGE_SIZE >> 10),
 };
 
 if (info->type != LIBXL_DOMAIN_TYPE_PV) {
diff --git a/tools/libs/light/libxl_types.idl b/tools/libs/light/libxl_types.idl
index b43d5f1265..c092c31b5e 100644
--- a/tools/libs/light/libxl_types.idl
+++ b/tools/libs/light/libxl_types.idl
@@ -646,6 +646,10 @@ libxl_domain_build_info = Struct("domain_build_info",[
 # supported by x86 HVM and ARM support is planned.
 ("altp2m", libxl_altp2m_mode),
 
+# Size of preallocated vmtrace trace buffers (in KBYTES).
+# Use zero value to disable this feature.
+("vmtrace_buf_kb", integer),
+
 ], dir=DIR_IN,
copy_deprecated_fn="libxl__domain_build_info_copy_deprecated",
 )
diff --git a/tools/xl/xl_parse.c b/tools/xl/xl_parse.c
index 4ebf39620a..ca99af8d1b 100644
--- a/tools/xl/xl_parse.c
+++ b/tools/xl/xl_parse.c
@@ -1863,6 +1863,10 @@ void parse_config_data(const char *config_source,
 }
 }
 
+if (!xlu_cfg_get_long(config, "vmtrace_buf_kb", , 1) && l) {
+b_info->vmtrace_buf_kb = l;
+}
+
 if (!xlu_cfg_get_list(config, "ioports", , _ioports, 0)) {
 b_info->num_ioports = num_ioports;
 b_info->ioports = calloc(num_ioports, sizeof(*b_info->ioports));
-- 
2.11.0

[PATCH v7 05/10] x86/vmx: Add Intel Processor Trace support

2021-01-21 Thread Andrew Cooper

From: Michał Leszczyński 

Add CPUID/MSR enumeration details for Processor Trace.  For now, we will only
support its use inside VMX operation.  Fill in the vmtrace_available boolean
to activate the newly introduced common infrastructure for allocating trace
buffers.

For now, Processor Trace is going to be operated in Single Output mode behind
the guests back.  Add the MSRs to struct vcpu_msrs, and set up the buffer
limit in vmx_init_pt() as it is fixed for the lifetime of the domain.

Context switch the most of the MSRs in and out of vCPU context switch, but the
main control register needs to reside in the MSR load/save lists.  Explicitly
pull the msrs pointer out into a local variable, because the optimiser cannot
keep it live across the memory clobbers in the MSR accesses.

Signed-off-by: Michał Leszczyński 
Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Roger Pau Monné 
CC: Wei Liu 
CC: Jun Nakajima 
CC: Kevin Tian 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 

v7:
 * Major chop within the series.
 * Move MSRs to vcpu_msrs, which is where they'll definitely want to live when
   we offer PT to VMs for their own use.
---
 tools/libs/light/libxl_cpuid.c  |  1 +
 tools/misc/xen-cpuid.c  |  2 +-
 xen/arch/x86/hvm/vmx/vmcs.c | 15 -
 xen/arch/x86/hvm/vmx/vmx.c  | 34 -
 xen/include/asm-x86/cpufeature.h|  1 +
 xen/include/asm-x86/hvm/vmx/vmcs.h  |  4 
 xen/include/asm-x86/msr.h   | 32 +++
 xen/include/public/arch-x86/cpufeatureset.h |  1 +
 8 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index 259612834e..289c59c742 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -188,6 +188,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list 
*cpuid, const char* str)
 {"avx512-ifma",  0x0007,  0, CPUID_REG_EBX, 21,  1},
 {"clflushopt",   0x0007,  0, CPUID_REG_EBX, 23,  1},
 {"clwb", 0x0007,  0, CPUID_REG_EBX, 24,  1},
+{"proc-trace",   0x0007,  0, CPUID_REG_EBX, 25,  1},
 {"avx512pf", 0x0007,  0, CPUID_REG_EBX, 26,  1},
 {"avx512er", 0x0007,  0, CPUID_REG_EBX, 27,  1},
 {"avx512cd", 0x0007,  0, CPUID_REG_EBX, 28,  1},
diff --git a/tools/misc/xen-cpuid.c b/tools/misc/xen-cpuid.c
index c81aa93055..2d04162d8d 100644
--- a/tools/misc/xen-cpuid.c
+++ b/tools/misc/xen-cpuid.c
@@ -106,7 +106,7 @@ static const char *const str_7b0[32] =
 [18] = "rdseed",   [19] = "adx",
 [20] = "smap", [21] = "avx512-ifma",
 [22] = "pcommit",  [23] = "clflushopt",
-[24] = "clwb", [25] = "pt",
+[24] = "clwb", [25] = "proc-trace",
 [26] = "avx512pf", [27] = "avx512er",
 [28] = "avx512cd", [29] = "sha",
 [30] = "avx512bw", [31] = "avx512vl",
diff --git a/xen/arch/x86/hvm/vmx/vmcs.c b/xen/arch/x86/hvm/vmx/vmcs.c
index 164535f8f0..5576caad8e 100644
--- a/xen/arch/x86/hvm/vmx/vmcs.c
+++ b/xen/arch/x86/hvm/vmx/vmcs.c
@@ -291,6 +291,20 @@ static int vmx_init_vmcs_config(void)
 _vmx_cpu_based_exec_control &=
 ~(CPU_BASED_CR8_LOAD_EXITING | CPU_BASED_CR8_STORE_EXITING);
 
+rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
+
+/* Check whether IPT is supported in VMX operation. */
+if ( !smp_processor_id() )
+vmtrace_available = cpu_has_proc_trace &&
+(_vmx_misc_cap & VMX_MISC_PROC_TRACE);
+else if ( vmtrace_available &&
+  !(_vmx_misc_cap & VMX_MISC_PROC_TRACE) )
+{
+printk("VMX: IPT capabilities differ between CPU%u and CPU0\n",
+   smp_processor_id());
+return -EINVAL;
+}
+
 if ( _vmx_cpu_based_exec_control & CPU_BASED_ACTIVATE_SECONDARY_CONTROLS )
 {
 min = 0;
@@ -305,7 +319,6 @@ static int vmx_init_vmcs_config(void)
SECONDARY_EXEC_ENABLE_VIRT_EXCEPTIONS |
SECONDARY_EXEC_XSAVES |
SECONDARY_EXEC_TSC_SCALING);
-rdmsrl(MSR_IA32_VMX_MISC, _vmx_misc_cap);
 if ( _vmx_misc_cap & VMX_MISC_VMWRITE_ALL )
 opt |= SECONDARY_EXEC_ENABLE_VMCS_SHADOWING;
 if ( opt_vpid_enabled )
diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
index 4120234c15..93121fbf27 100644
--- a/xen/arch/x86/hvm/vmx/vmx.c
+++ b/xen/arch/x86/hvm/vmx/vmx.c
@@ -428,6 +428,20 @@ static void vmx_domain_relinquish_resources(struct domain 
*d)
 vmx_free_vlapic_mapping(d);
 }
 
+static void vmx_init_pt(struct vcpu *v)
+{
+unsigned int frames = v->domain->vmtrace_frames;
+
+if ( !frames )
+return;
+
+/* Checked by domain creation logic. */
+ASSERT(v->vmtrace.buf);
+ASSERT(frames <= (GB(4) >> PAGE_SHIFT) && (frames & (frames - 1)) == 0);
+
+v->arch.msrs->rtit.output_limit = (frames << PAGE_SHIFT) - 1;
+}
+

[PATCH v7 08/10] tools/misc: Add xen-vmtrace tool

2021-01-21 Thread Andrew Cooper

From: Michał Leszczyński 

Add an demonstration tool that uses xc_vmtrace_* calls in order
to manage external IPT monitoring for DomU.

Signed-off-by: Michał Leszczyński 
Signed-off-by: Andrew Cooper 
---
CC: Ian Jackson 
CC: Wei Liu 
CC: Michał Leszczyński 
CC: Tamas K Lengyel 
---
 tools/misc/.gitignore|   1 +
 tools/misc/Makefile  |   4 ++
 tools/misc/xen-vmtrace.c | 154 +++
 3 files changed, 159 insertions(+)
 create mode 100644 tools/misc/xen-vmtrace.c

diff --git a/tools/misc/.gitignore b/tools/misc/.gitignore
index b2c3b21f57..ce6f937d0c 100644
--- a/tools/misc/.gitignore
+++ b/tools/misc/.gitignore
@@ -1,3 +1,4 @@
 xen-access
 xen-memshare
 xen-ucode
+xen-vmtrace
diff --git a/tools/misc/Makefile b/tools/misc/Makefile
index 912c5d4f0e..c5017e6c70 100644
--- a/tools/misc/Makefile
+++ b/tools/misc/Makefile
@@ -25,6 +25,7 @@ INSTALL_SBIN-$(CONFIG_X86) += xen-lowmemd
 INSTALL_SBIN-$(CONFIG_X86) += xen-memshare
 INSTALL_SBIN-$(CONFIG_X86) += xen-mfndump
 INSTALL_SBIN-$(CONFIG_X86) += xen-ucode
+INSTALL_SBIN-$(CONFIG_X86) += xen-vmtrace
 INSTALL_SBIN   += xencov
 INSTALL_SBIN   += xenhypfs
 INSTALL_SBIN   += xenlockprof
@@ -90,6 +91,9 @@ xen-hvmcrash: xen-hvmcrash.o
 xen-memshare: xen-memshare.o
$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
+xen-vmtrace: xen-vmtrace.o
+   $(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) 
$(LDLIBS_libxenforeignmemory) $(APPEND_LDFLAGS)
+
 xenperf: xenperf.o
$(CC) $(LDFLAGS) -o $@ $< $(LDLIBS_libxenctrl) $(APPEND_LDFLAGS)
 
diff --git a/tools/misc/xen-vmtrace.c b/tools/misc/xen-vmtrace.c
new file mode 100644
index 00..47fea871cf
--- /dev/null
+++ b/tools/misc/xen-vmtrace.c
@@ -0,0 +1,154 @@
+/**
+ * tools/vmtrace.c
+ *
+ * Demonstrative tool for collecting Intel Processor Trace data from Xen.
+ *  Could be used to externally monitor a given vCPU in given DomU.
+ *
+ * Copyright (C) 2020 by CERT Polska - NASK PIB
+ *
+ * Authors: Michał Leszczyński, michal.leszczyn...@cert.pl
+ * Date:June, 2020
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; under version 2 of the License.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; If not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#define MSR_RTIT_CTL0x0570
+#define  RTIT_CTL_OS(1 <<  2)
+#define  RTIT_CTL_USR   (1 <<  3)
+#define  RTIT_CTL_BRANCH_EN (1 << 13)
+
+static volatile int interrupted = 0;
+
+static xc_interface *xch;
+static xenforeignmemory_handle *fh;
+
+void int_handler(int signum)
+{
+interrupted = 1;
+}
+
+int main(int argc, char **argv)
+{
+uint32_t domid, vcpu;
+int rc, exit = 1;
+size_t size;
+char *buf = NULL;
+xenforeignmemory_resource_handle *fres = NULL;
+uint64_t last_offset = 0;
+
+if ( signal(SIGINT, int_handler) == SIG_ERR )
+err(1, "Failed to register signal handler\n");
+
+if ( argc != 3 )
+{
+fprintf(stderr, "Usage: %s  \n", argv[0]);
+fprintf(stderr, "It's recommended to redirect thisprogram's output to 
file\n");
+fprintf(stderr, "or to pipe it's output to xxd or other program.\n");
+return 1;
+}
+
+domid = atoi(argv[1]);
+vcpu  = atoi(argv[2]);
+
+xch = xc_interface_open(NULL, NULL, 0);
+fh = xenforeignmemory_open(NULL, 0);
+
+if ( !xch )
+err(1, "xc_interface_open()");
+if ( !fh )
+err(1, "xenforeignmemory_open()");
+
+rc = xenforeignmemory_resource_size(
+fh, domid, XENMEM_resource_vmtrace_buf, vcpu, );
+if ( rc )
+err(1, "xenforeignmemory_resource_size()");
+
+fres = xenforeignmemory_map_resource(
+fh, domid, XENMEM_resource_vmtrace_buf, vcpu,
+0, size >> XC_PAGE_SHIFT, (void **), PROT_READ, 0);
+if ( !fres )
+err(1, "xenforeignmemory_map_resource()");
+
+if ( xc_vmtrace_set_option(
+ xch, domid, vcpu, MSR_RTIT_CTL,
+ RTIT_CTL_BRANCH_EN | RTIT_CTL_USR | RTIT_CTL_OS) )
+{
+perror("xc_vmtrace_set_option()");
+goto out;
+}
+
+if ( xc_vmtrace_enable(xch, domid, vcpu) )
+{
+perror("xc_vmtrace_enable()");
+goto out;
+}
+
+while ( !interrupted )
+{
+xc_dominfo_t

[linux-linus test] 158553: regressions - FAIL

2021-01-21 Thread osstest service owner

flight 158553 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158553/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemut-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-install fail REGR. 
vs. 152332
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-xsm7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-coresched-i386-xl  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-libvirt   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-install fail REGR. vs. 
152332
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-libvirt-xsm   7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-raw7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-pvshim 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-freebsd10-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-freebsd10-i386  7 xen-installfail REGR. vs. 152332
 test-amd64-i386-xl-qemut-debianhvm-i386-xsm 7 xen-install fail REGR. vs. 152332
 test-amd64-i386-xl-shadow 7 xen-install  fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-install   fail REGR. vs. 152332
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-install fail REGR. 
vs. 152332
 test-amd64-amd64-xl-multivcpu 14 guest-start fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 10 xen-install/src_host fail REGR. vs. 152332
 test-amd64-i386-libvirt-pair 11 xen-install/dst_host fail REGR. vs. 152332
 test-amd64-amd64-xl  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvshim   14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-credit2  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-intel 14 guest-start   fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-amd 14 guest-start   fail REGR. vs. 152332
 test-amd64-amd64-xl-shadow   14 guest-start  fail REGR. vs. 152332
 test-arm64-arm64-xl-credit1  10 host-ping-check-xen  fail REGR. vs. 152332
 test-amd64-i386-examine   6 xen-install  fail REGR. vs. 152332
 test-amd64-amd64-xl-pvhv2-amd 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-libvirt-xsm 14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-qemuu-freebsd11-amd64 13 guest-startfail REGR. vs. 152332
 test-arm64-arm64-examine  8 reboot   fail REGR. vs. 152332
 test-amd64-amd64-libvirt 14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-credit1  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-xl-xsm  14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-dom0pvh-xl-intel 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-amd64-pvgrub 12 debian-di-install   fail REGR. vs. 152332
 test-amd64-amd64-qemuu-freebsd12-amd64 13 guest-startfail REGR. vs. 152332
 test-amd64-amd64-libvirt-pair 25 guest-start/debian  fail REGR. vs. 152332
 test-amd64-amd64-i386-pvgrub 12 debian-di-installfail REGR. vs. 152332
 test-amd64-amd64-pair25 guest-start/debian   fail REGR. vs. 152332
 test-amd64-coresched-amd64-xl 14 guest-start fail REGR. vs. 152332
 test-amd64-amd64-qemuu-nested-amd 12 debian-hvm-install  fail REGR. vs. 152332
 test-amd64-amd64-xl-qemut-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 
152332
 test-amd64-amd64-xl-qemut-win7-amd64 12 windows-install  fail REGR. vs. 152332
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 12 debian-hvm-install fail REGR. vs. 
152332
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 12 debian-hvm-install 
fail REGR. vs. 152332
 test-arm64-arm64-xl-thunderx 14 guest-start  fail REGR. vs. 152332
 test-amd64-amd64-qemuu-nested-intel 12 debian-hvm-install fail REGR. vs. 152332

[xen-unstable-smoke test] 158561: tolerable all pass - PUSHED

2021-01-21 Thread osstest service owner

flight 158561 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158561/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  dbf22970f5df8d20b2a6b7107cb9d977630181a6
baseline version:
 xen  9cd5bbf5369d520aacca9a1b141e9a84f62d507d

Last test of basis   158559  2021-01-21 16:00:25 Z0 days
Testing same since   158561  2021-01-21 19:01:40 Z0 days1 attempts


People who touched revisions under test:
  Juergen Gross 
  Julien Grall 
  Julien Grall 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   9cd5bbf536..dbf22970f5  dbf22970f5df8d20b2a6b7107cb9d977630181a6 -> smoke

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Stefano Stabellini

On Thu, 21 Jan 2021, Rahul Singh wrote:
> > On 21 Jan 2021, at 6:43 pm, Julien Grall  wrote:
> > On 21/01/2021 17:10, Rahul Singh wrote:
> >>> On 20 Jan 2021, at 8:31 pm, Stefano Stabellini  
> >>> wrote:
>  -static void __iomem *arm_smmu_ioremap(struct device *dev, 
>  resource_size_t start,
>  -  resource_size_t size)
>  +
>  +static void arm_smmu_free_structures(struct arm_smmu_device *smmu)
>  {
>  -struct resource res = {
>  -.flags = IORESOURCE_MEM,
>  -.start = start,
>  -.end = start + size - 1,
>  -};
>  +if (smmu->cmdq.q.base)
>  +xfree(smmu->cmdq.q.base);
>  +
>  +if (smmu->evtq.q.base)
>  +xfree(smmu->evtq.q.base);
>  
>  -return devm_ioremap_resource(dev, );
>  +if (smmu->priq.q.base)
>  +xfree(smmu->priq.q.base);
>  +
>  +if (smmu->strtab_cfg.strtab)
>  +xfree(smmu->strtab_cfg.strtab);
>  +
>  +if (smmu->strtab_cfg.l1_desc)
>  +xfree(smmu->strtab_cfg.l1_desc);
> >>> 
> >>> From what I can tell we also need to free somewhere
> >>> smmu->strtab_cfg->l1_desc->l2ptr allocated by arm_smmu_init_l2_strtab
> >> "l1_desc->l2ptr" is a pointer to the Level 1 Stream Table Descriptor if 
> >> 2-level Stream Table supported.
> >> If the device is protected by IOMMU, SMMUv3 driver will allocate the  
> >> "l1_desc->l2ptr” when the device is added to the IOMMU via 
> >> arm_smmu_add_device() function and device will be configured in 
> >> bypass/abort mode.
> >> Once we assign the device to the domain(arm_smmu_assign_dev() ) smmuv3 hw 
> >> will be configured correctly to match the StreamID. If there is a failure 
> >> in assigning the device, that case also XEN will not remove the device and 
> >> master device still be in bypass/abort mode.
> > 
> > I am a bit confused with this answer. Wouldn't this mean that we are 
> > "leaking" memory if we fail to assign the device?
> 
> No we are not leaking memory as "l1_desc->l2ptr” is still be valid and is 
> required for correct SMMUv3 opeation if we fail to assign the device, as in 
> that case device will operate in bypass or abort mode. 
> 
> For example if by any chance there is faulty hw behind SMMUv3 then if we fail 
> to assign the device we can configure the device in abort mode in that case 
> SMMUv3 hw will silently report abort to device, without any event recorded.
> 
> We can also configure the device in bypass mode if we fail to assign. Thats 
> why I thought not to free the "l1_desc->l2ptr” if we fail to assign the 
> device.

Basically there are two places where l2ptr should be freed:

1) the error out path (err_free_master) in arm_smmu_add_device
2) arm_smmu_remove_device

However, arm_smmu_remove_device is not actually implementedi at all. In
regards to 1), it would be redundant because the memory allocation of
l2ptr is the last operation that could return an error in
arm_smmu_add_device. In other words, if it fails then the memory is not
allocated.

To avoid future memory-leaks in case we add function calls that could
fail after the arm_smmu_init_l2_strtab() call in arm_smmu_add_device, I
think it would be good to have an xfree(l2ptr) under err_free_master.
But given that it is not necessary today I am OK either way.

Re: IRQ latency measurements in hypervisor

2021-01-21 Thread Julien Grall


Hi Volodymir,

On 20/01/2021 23:03, Volodymyr Babchuk wrote:

Julien Grall writes:

This is very interestingi too. Did you get any spikes with the
period
set to 100us? It would be fantastic if there were none.


3. Huge latency spike during domain creation. I conducted some
  additional tests, including use of PV drivers, but this didn't
  affected the latency in my "real time" domain. But attempt to
  create another domain with relatively large memory size of 2GB led
  to huge spike in latency. Debugging led to this call path:

  XENMEM_populate_physmap -> populate_physmap() ->
  alloc_domheap_pages() -> alloc_heap_pages()-> huge
  "for ( i = 0; i < (1 << order); i++ )" loop.


There are two for loops in alloc_heap_pages() using this syntax. Which
one are your referring to?

I did some tracing with Lautrebach. It pointed to the first loop and
especially to flush_page_to_ram() call if I remember correctly.


Thanks, I am not entirely surprised because we are clean and
invalidating the region line by line and across all the CPUs.

If we are assuming 128 bytes cacheline, we will need to issue 32 cache
instructions per page. This going to involve quite a bit of traffic on
the system.

One possibility would be to defer the cache flush when the domain is
created and use the hypercall XEN_DOMCTL_cacheflush to issue the
flush.


Can we flush caches on first access to a page? What I mean - do not
populate stage 2 tables with allocated memory. Flush caches in abort
handler and then populate them.


We are using a similar approach for implementing set/way but it only 
works as long as you don't assign a device (with or without an IOMMU).


Currently, it is quite uncommon to have device that are able to restart 
a DMA transaction after faulting. So we would not be able to share the 
P2M with IOMMU if we populate the P2M on the first access (a device may 
be the first to access the memory).


Even if we decided to unshare the P2M, there would still be a trust 
problem if the device is non-coherent (i.e. it cannot snoop the cache).


The same can be said if you are not protecting the device with an IOMMU 
(you may have an MPU or something different on the system).


The only case where we could possibly disable the flush is when your 
memory is statically partionned as the guest would always receive the 
same pages.


[...]



Do you have any suggestion how to split it?



Well, it  is quite complex function and I can't tell right away.
At this time I don't quite understand why spin_unlock() is called after
the first (1 << order) loop for instance.


The part after the spin lock will check whether the pages are scrubbed. 
As the content of the pages cannot be touched by someone else (we made 
sure the scrub task removed them from its list). Therefore, the 
operation is fine to be called without the heap_lock held.




Also, this function does many different things for its simple name.


Everything in this function needs to happen before the page can safely 
be handed out to another part of Xen or a guest. AFAICT, the only thing 
that can possibly be split out is the call to flush_page_to_ram().







I think the first step is we need to figure out which part of the
allocation is slow (see my question above). From there, we can figure
out if there is a way to reduce the impact.

I'll do more tracing and will return with more accurate numbers.
But as far as I can see, any loop on 262144 pages will take some time..

.

It really depends on the content of the loop. On any modern
processors, you are very likely not going to notice a loop that update
just a flag.

However, you are likely going to be see an impact if your loop is
going to clean & invalidate the cache for each page.



Totally agree. I used Xen tracing subsystem to do the measurements and I
can confirm that call to flush_page_to_ram() causes most of the impact.


There is the details:


I added number of tracing points to the function:

static struct page_info *alloc_heap_pages(
 unsigned int zone_lo, unsigned int zone_hi,
 unsigned int order, unsigned int memflags,
 struct domain *d)
{
 nodeid_t node;
 unsigned int i, buddy_order, zone, first_dirty;
 unsigned long request = 1UL << order;
 struct page_info *pg;
 bool need_tlbflush = false;
 uint32_t tlbflush_timestamp = 0;
 unsigned int dirty_cnt = 0;

 /* Make sure there are enough bits in memflags for nodeID. */
 BUILD_BUG_ON((_MEMF_bits - _MEMF_node) < (8 * sizeof(nodeid_t)));

 ASSERT(zone_lo <= zone_hi);
 ASSERT(zone_hi < NR_ZONES);

 if ( unlikely(order > MAX_ORDER) )
 return NULL;

 spin_lock(_lock);

 TRACE_1D(TRC_PGALLOC_PT1, order); // <=

 /*
  * Claimed memory is considered unavailable unless the request
  * is made by a domain with sufficient unclaimed pages.
  */
 if ( (outstanding_claims + request > total_avail_pages) &&

[linux-5.4 test] 158552: regressions - FAIL

2021-01-21 Thread osstest service owner

flight 158552 linux-5.4 real [real]
flight 158562 linux-5.4 real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/158552/
http://logs.test-lab.xenproject.org/osstest/logs/158562/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-dom0pvh-xl-intel  8 xen-bootfail REGR. vs. 158387

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 158387
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 158387
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 158387
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 158387
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 158387
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 158387
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 158387
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 158387
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 158387
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 158387
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 158387
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass

version targeted for testing:
 linuxd26b3110041a9fddc6c6e36398f53f7eab8cff82
baseline version:
 linuxa829146c3fdcf6d0b76d9c54556a223820f1f73b

Last test of basis   158387  2021-01-12 19:40:06 Z9 days
Failing since158473  2021-01-17 13:42:20 Z4 days8 attempts
Testing same since   158533  2021-01-20 00:41:04 Z1 days3 attempts


People who

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Rahul Singh

Hello Julien,

> On 21 Jan 2021, at 6:43 pm, Julien Grall  wrote:
> 
> Hi Rahul,
> 
> Please try to trim the e-mail when quoting, otherwise it is quite difficult 
> to find the only couple of answer you wrote.
> 
> On 21/01/2021 17:10, Rahul Singh wrote:
>>> On 20 Jan 2021, at 8:31 pm, Stefano Stabellini  
>>> wrote:
 -static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t 
 start,
 -resource_size_t size)
 +
 +static void arm_smmu_free_structures(struct arm_smmu_device *smmu)
 {
 -  struct resource res = {
 -  .flags = IORESOURCE_MEM,
 -  .start = start,
 -  .end = start + size - 1,
 -  };
 +  if (smmu->cmdq.q.base)
 +  xfree(smmu->cmdq.q.base);
 +
 +  if (smmu->evtq.q.base)
 +  xfree(smmu->evtq.q.base);
 
 -  return devm_ioremap_resource(dev, );
 +  if (smmu->priq.q.base)
 +  xfree(smmu->priq.q.base);
 +
 +  if (smmu->strtab_cfg.strtab)
 +  xfree(smmu->strtab_cfg.strtab);
 +
 +  if (smmu->strtab_cfg.l1_desc)
 +  xfree(smmu->strtab_cfg.l1_desc);
>>> 
>>> From what I can tell we also need to free somewhere
>>> smmu->strtab_cfg->l1_desc->l2ptr allocated by arm_smmu_init_l2_strtab
>> "l1_desc->l2ptr" is a pointer to the Level 1 Stream Table Descriptor if 
>> 2-level Stream Table supported.
>> If the device is protected by IOMMU, SMMUv3 driver will allocate the  
>> "l1_desc->l2ptr” when the device is added to the IOMMU via 
>> arm_smmu_add_device() function and device will be configured in bypass/abort 
>> mode.
>> Once we assign the device to the domain(arm_smmu_assign_dev() ) smmuv3 hw 
>> will be configured correctly to match the StreamID. If there is a failure in 
>> assigning the device, that case also XEN will not remove the device and 
>> master device still be in bypass/abort mode.
> 
> I am a bit confused with this answer. Wouldn't this mean that we are 
> "leaking" memory if we fail to assign the device?

No we are not leaking memory as "l1_desc->l2ptr” is still be valid and is 
required for correct SMMUv3 opeation if we fail to assign the device, as in 
that case device will operate in bypass or abort mode. 

For example if by any chance there is faulty hw behind SMMUv3 then if we fail 
to assign the device we can configure the device in abort mode in that case 
SMMUv3 hw will silently report abort to device, without any event recorded.

We can also configure the device in bypass mode if we fail to assign. Thats why 
I thought not to free the "l1_desc->l2ptr” if we fail to assign the device.

> 
>> As in XEN, there is no function to remove the master device from the IOMMU, 
>> because of that I feel there is no need to free the "l1_desc->l2ptr” in case 
>> of failure also.
> 
> Hmmm... Xen is able to remove device from the IOMMU. The reason this is not 
> implemented yet on Arm is because you can't hot-unplug "platform" device.
> 
> I expect the removal function to be useful for PCI.

Yes I agree for PCI hot plug devices we have to implement the IOMMU remove 
function. If we implement the remove function for PCI hot-plug we can free the 
the "l1_desc->l2ptr” in the remove function for that device. 

Regards,
Rahul
> 
> Cheers,
> 
> -- 
> Julien Grall

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Julien Grall





On 21/01/2021 18:50, Rahul Singh wrote:

Hello Julien,


On 21 Jan 2021, at 6:31 pm, Julien Grall  wrote:

On 21/01/2021 17:18, Rahul Singh wrote:

Hello Oleksandr,


Hi,


On 20 Jan 2021, at 9:33 pm, Oleksandr  wrote:


On 20.01.21 16:52, Rahul Singh wrote:

Hi Rahul


Add support for ARM architected SMMUv3 implementation. It is based on
the Linux SMMUv3 driver.

Driver is currently supported as Tech Preview.

Major differences with regard to Linux driver are as follows:
2. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
3. Use P2M  page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
4. Tasklets are used in place of threaded IRQ's in Linux for event queue
and priority queue IRQ handling.
5. Latest version of the Linux SMMUv3 code implements the commands queue
access functions based on atomic operations implemented in Linux.
Atomic functions used by the commands queue access functions are not
implemented in XEN therefore we decided to port the earlier version
of the code. Atomic operations are introduced to fix the bottleneck
of the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced, which is lock-free
on the fast-path.
Consequence of reverting the patch is that the command queue
insertion will be slow for large systems as spinlock will be used to
serializes accesses from all CPUs to the single queue supported by
the hardware. Once the proper atomic operations will be available in
XEN the driver can be updated.
6. Spin lock is used in place of mutex when attaching a device to the
SMMU, as there is no blocking locks implementation available in XEN.
This might introduce latency in XEN. Need to investigate before
driver is out for tech preview.
7. PCI ATS functionality is not supported, as there is no support
available in XEN to test the functionality. Code is not tested and
compiled. Code is guarded by the flag CONFIG_PCI_ATS.
8. MSI interrupts are not supported as there is no support available in
XEN to request MSI interrupts. Code is not tested and compiled. Code
is guarded by the flag CONFIG_MSI.

Signed-off-by: Rahul Singh 
---
Changes since v2:
- added return statement for readx_poll_timeout function.
- remove iommu_get_dma_cookie and iommu_put_dma_cookie.
- remove struct arm_smmu_xen_device as not required.
- move dt_property_match_string to device_tree.c file.
- replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
- use ARM_SMMU_REG_SZ as size when map memory to XEN.
- remove bypass keyword to make sure when device-tree probe is failed we
   are reporting error and not continuing to configure SMMU in bypass
   mode.
- fixed minor comments.
Changes since v3:
- Fixed typo for CONFIG_MSI
- Added back the mutex code
- Rebase the patch on top of newly added WARN_ON().
- Remove the direct read of register VTCR_EL2.
- Fixed minor comments.
Changes since v4:
- Replace the ffsll() with ffs64() function.
- Add code to free resources when probe failed.


Thank you for addressing, patch looks ok to me, just one small remark below:



+
+static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
+{
+}


We discussed in V4 about adding some code here which all IOMMUs on Arm already 
have, copy it below for the convenience:


  /* Set to false options not supported on ARM. */
  if ( iommu_hwdom_inclusive )
  printk(XENLOG_WARNING
  "map-inclusive dom0-iommu option is not supported on ARM\n");
  iommu_hwdom_inclusive = false;
  if ( iommu_hwdom_reserved == 1 )
  printk(XENLOG_WARNING
  "map-reserved dom0-iommu option is not supported on ARM\n");
  iommu_hwdom_reserved = 0;

  arch_iommu_hwdom_init(d);


Also we discussed about possibility to fold the part of code (which disables 
unsupported options) in arch_iommu_hwdom_init() to avoid duplication as a 
follow-up.
At least, I expected to see arch_iommu_hwdom_init() to be called by 
arm_smmu_iommu_hwdom_init() it current patch... Please note, this is *not* a 
request for change immediately,
I think, driver is functional even without this code (hopefully 
arch_iommu_hwdom_init is empty now, etc).  But, if you happen to do V6 or 
probably it could be done on commit ...


Yes I will send the patch to move the code to arch_iommu_hwdom_init() to avoid 
duplication once SMMUv3 driver will be merged.
I thought anyway I have to remove the code from SMMUv1 and IPMMU I will take 
care of all the IOMMU driver at the same time because of that I didn’t modify 
the SMMUv3 driver.


There are two part in the problem here:
  1) Code duplication
  2) The SMMUv3 not checking the command line and calling 
arch_iommu_hwdom_init(d)

I agree that 1) can be deferred because it is a clean-up. However, I consider 
2) a (latent) bug because it

Re: [PATCH V4 22/24] xen/arm: Add mapcache invalidation handling

2021-01-21 Thread Oleksandr




On 15.01.21 04:11, Stefano Stabellini wrote:

Hi Stefano


On Tue, 12 Jan 2021, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

We need to send mapcache invalidation request to qemu/demu everytime
the page gets removed from a guest.

At the moment, the Arm code doesn't explicitely remove the existing
mapping before inserting the new mapping. Instead, this is done
implicitely by __p2m_set_entry().

So we need to recognize a case when old entry is a RAM page *and*
the new MFN is different in order to set the corresponding flag.
The most suitable place to do this is p2m_free_entry(), there
we can find the correct leaf type. The invalidation request
will be sent in do_trap_hypercall() later on.

Taking into the account the following the do_trap_hypercall()
is the best place to send invalidation request:
  - The only way a guest can modify its P2M on Arm is via an hypercall
  - When sending the invalidation request, the vCPU will be blocked
until all the IOREQ servers have acknowledged the invalidation

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

***
Please note, this patch depends on the following which is
on review:
https://patchwork.kernel.org/patch/11803383/

This patch is on par with x86 code (whether it is buggy or not).
If there is a need to improve/harden something, this can be done on
a follow-up.
***

Changes V1 -> V2:
- new patch, some changes were derived from (+ new explanation):
  xen/ioreq: Make x86's invalidate qemu mapcache handling common
- put setting of the flag into __p2m_set_entry()
- clarify the conditions when the flag should be set
- use domain_has_ioreq_server()
- update do_trap_hypercall() by adding local variable

Changes V2 -> V3:
- update patch description
- move check to p2m_free_entry()
- add a comment
- use "curr" instead of "v" in do_trap_hypercall()

Changes V3 -> V4:
- update patch description
- re-order check in p2m_free_entry() to call domain_has_ioreq_server()
  only if p2m->domain == current->domain
- add a comment in do_trap_hypercall()
---
  xen/arch/arm/p2m.c   | 25 +
  xen/arch/arm/traps.c | 20 +---
  2 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index d41c4fa..26acb95d 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -1,6 +1,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -749,17 +750,25 @@ static void p2m_free_entry(struct p2m_domain *p2m,
  if ( !p2m_is_valid(entry) )
  return;
  
-/* Nothing to do but updating the stats if the entry is a super-page. */

-if ( p2m_is_superpage(entry, level) )
+if ( p2m_is_superpage(entry, level) || (level == 3) )
  {
-p2m->stats.mappings[level]--;
-return;
-}
+#ifdef CONFIG_IOREQ_SERVER
+/*
+ * If this gets called (non-recursively) then either the entry
+ * was replaced by an entry with a different base (valid case) or
+ * the shattering of a superpage was failed (error case).
+ * So, at worst, the spurious mapcache invalidation might be sent.
+ */
+if ( (p2m->domain == current->domain) &&
+  domain_has_ioreq_server(p2m->domain) &&
+  p2m_is_ram(entry.p2m.type) )
+p2m->domain->mapcache_invalidate = true;
+#endif
  
-if ( level == 3 )

-{
  p2m->stats.mappings[level]--;
-p2m_put_l3_page(entry);
+/* Nothing to do if the entry is a super-page. */
+if ( level == 3 )
+p2m_put_l3_page(entry);
  return;
  }
  
diff --git a/xen/arch/arm/traps.c b/xen/arch/arm/traps.c

index 35094d8..1070d1b 100644
--- a/xen/arch/arm/traps.c
+++ b/xen/arch/arm/traps.c
@@ -1443,6 +1443,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, 
register_t *nr,
const union hsr hsr)
  {
  arm_hypercall_fn_t call = NULL;
+struct vcpu *curr = current;
  
  BUILD_BUG_ON(NR_hypercalls < ARRAY_SIZE(arm_hypercall_table) );
  
@@ -1459,7 +1460,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, register_t *nr,

  return;
  }
  
-current->hcall_preempted = false;

+curr->hcall_preempted = false;
  
  perfc_incra(hypercalls, *nr);

  call = arm_hypercall_table[*nr].fn;
@@ -1472,7 +1473,7 @@ static void do_trap_hypercall(struct cpu_user_regs *regs, 
register_t *nr,
  HYPERCALL_RESULT_REG(regs) = call(HYPERCALL_ARGS(regs));
  
  #ifndef NDEBUG

-if ( !current->hcall_preempted )
+if ( !curr->hcall_preempted )
  {
  /* Deliberately corrupt parameter regs used by this hypercall. */
  switch ( arm_hypercall_table[*nr].nr_args ) {
@@ -1489,8 +1490,21 @@ static void

Re: Null scheduler and vwfi native problem

2021-01-21 Thread Julien Grall


Hi Dario,

On 21/01/2021 18:32, Dario Faggioli wrote:

On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:

Hi,
I see a problem with destroy and restart of a domain. Interrupts are
not
available when trying to restart a domain.

The situation seems very similar to the thread "null scheduler bug"
  
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html

.


Right. Back then, PCI passthrough was involved, if I remember
correctly. Is it the case for you as well?


PCI passthrough is not yet supported on Arm :). However, the bug was 
reported with platform device passthrough.


[...]


"xl create" results in:
(XEN) IRQ 210 is already used by domain 1
(XEN) End of domain_destroy function

Then repeated "xl create" looks the same until after a few tries I
also get:
(XEN) Begin of complete_domain_destroy function

After that the next "xl create" creates the domain.


I have also applied the patch from
 
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html

.
This does seem to change the results.


Ah... Really? That's a bit unexpected, TBH.

Well, I'll think about it. >

Starting the system without "sched=null vwfi=native" does not result
in
the problem.


Ok, how about, if you're up for some more testing:

  - booting with "sched=null" but not with "vwfi=native"
  - booting with "sched=null vwfi=native" but not doing the IRQ
passthrough that you mentioned above

?


I think we can skip the testing as the bug was fully diagnostics back 
then. Unfortunately, I don't think a patch was ever posted. The 
interesting bits start at [1]. Let me try to summarize here.


This has nothing to do with device passthrough, but the bug is easier to 
spot as interrupts are only going to be released when then domain is 
fully destroyed (we should really release them during the relinquish 
period...).


The last step of the domain destruction (complete_domain_destroy()) will 
*only* happen when all the CPUs are considered quiescent from the RCU PoV.


As you pointed out on that thread, the RCU implementation in Xen 
requires the pCPU to enter in the hypervisor (via hypercalls, 
interrupts...) time to time.


This assumption doesn't hold anymore when using "sched=null vwfi=native" 
because a vCPU will not exit when it is idling (vwfi=native) and there 
may not be any other source of interrupt on that vCPU.


Therefore the quiescent state will never be reached on the pCPU running 
that vCPU.


From Xen PoV, any pCPU executing guest context can be considered 
quiescent. So one way to solve the problem would be to mark the pCPU 
when entering to the guest.


Cheers,

[1] 
https://lore.kernel.org/xen-devel/acbeae1c-fda1-a079-322a-786d7528e...@arm.com/


--
Julien Grall

Re: Null scheduler and vwfi native problem

2021-01-21 Thread Julien Grall





On 21/01/2021 10:54, Anders Törnqvist wrote:

Hi,


Hi Anders,

Thank you for reporting the bug. I am adding Stefano and Dario as IIRC 
they were going to work on a solution.


Cheers,

I see a problem with destroy and restart of a domain. Interrupts are not 
available when trying to restart a domain.


The situation seems very similar to the thread "null scheduler bug" 
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html.


The target system is a iMX8-based ARM board and Xen is a 4.13.0 version 
built from https://source.codeaurora.org/external/imx/imx-xen.git.


Xen is booted with sched=null vwfi=native.
One physical CPU core is pinned to the domu.
Some interrupts are passed through to the domu.

When destroying the domain with xl destroy etc it does not complain but 
then when trying to restart the domain

again with a "xl create " I get:
(XEN) IRQ 210 is already used by domain 1

"xl list" does not contain the domain.

Repeating the "xl create" command 5-10 times eventually starts the 
domain without complaining about the IRQ.


Inspired from the discussion in the thread above I have put printks in 
the xen/common/domain.c file.
In the function domain_destroy I have a printk("End of domain_destroy 
function\n") in the end.
In the function complete_domain_destroy have a printk("Begin of 
complete_domain_destroy function\n") in the beginning.


With these printouts I get at "xl destroy":
(XEN) End of domain_destroy function

So it seems like the function complete_domain_destroy is not called.

"xl create" results in:
(XEN) IRQ 210 is already used by domain 1
(XEN) End of domain_destroy function

Then repeated "xl create" looks the same until after a few tries I also 
get:

(XEN) Begin of complete_domain_destroy function

After that the next "xl create" creates the domain.


I have also applied the patch from 
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. 
This does seem to change the results.


Starting the system without "sched=null vwfi=native" does not result in 
the problem.


BR
Anders





--
Julien Grall

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Rahul Singh

Hello Julien,

> On 21 Jan 2021, at 6:31 pm, Julien Grall  wrote:
> 
> On 21/01/2021 17:18, Rahul Singh wrote:
>> Hello Oleksandr,
> 
> Hi,
> 
>>> On 20 Jan 2021, at 9:33 pm, Oleksandr  wrote:
>>> 
>>> 
>>> On 20.01.21 16:52, Rahul Singh wrote:
>>> 
>>> Hi Rahul
>>> 
 Add support for ARM architected SMMUv3 implementation. It is based on
 the Linux SMMUv3 driver.

 Driver is currently supported as Tech Preview.

 Major differences with regard to Linux driver are as follows:
 2. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
 3. Use P2M  page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
and priority queue IRQ handling.
 5. Latest version of the Linux SMMUv3 code implements the commands queue
access functions based on atomic operations implemented in Linux.
Atomic functions used by the commands queue access functions are not
implemented in XEN therefore we decided to port the earlier version
of the code. Atomic operations are introduced to fix the bottleneck
of the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced, which is lock-free
on the fast-path.
Consequence of reverting the patch is that the command queue
insertion will be slow for large systems as spinlock will be used to
serializes accesses from all CPUs to the single queue supported by
the hardware. Once the proper atomic operations will be available in
XEN the driver can be updated.
 6. Spin lock is used in place of mutex when attaching a device to the
SMMU, as there is no blocking locks implementation available in XEN.
This might introduce latency in XEN. Need to investigate before
driver is out for tech preview.
 7. PCI ATS functionality is not supported, as there is no support
available in XEN to test the functionality. Code is not tested and
compiled. Code is guarded by the flag CONFIG_PCI_ATS.
 8. MSI interrupts are not supported as there is no support available in
XEN to request MSI interrupts. Code is not tested and compiled. Code
is guarded by the flag CONFIG_MSI.

 Signed-off-by: Rahul Singh 
 ---
 Changes since v2:
 - added return statement for readx_poll_timeout function.
 - remove iommu_get_dma_cookie and iommu_put_dma_cookie.
 - remove struct arm_smmu_xen_device as not required.
 - move dt_property_match_string to device_tree.c file.
 - replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
 - use ARM_SMMU_REG_SZ as size when map memory to XEN.
 - remove bypass keyword to make sure when device-tree probe is failed we
   are reporting error and not continuing to configure SMMU in bypass
   mode.
 - fixed minor comments.
 Changes since v3:
 - Fixed typo for CONFIG_MSI
 - Added back the mutex code
 - Rebase the patch on top of newly added WARN_ON().
 - Remove the direct read of register VTCR_EL2.
 - Fixed minor comments.
 Changes since v4:
 - Replace the ffsll() with ffs64() function.
 - Add code to free resources when probe failed.
>>> 
>>> Thank you for addressing, patch looks ok to me, just one small remark below:
>>> 
>>> 
 +
 +static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
 +{
 +}
>>> 
>>> We discussed in V4 about adding some code here which all IOMMUs on Arm 
>>> already have, copy it below for the convenience:
>>> 
>>> 
>>>  /* Set to false options not supported on ARM. */
>>>  if ( iommu_hwdom_inclusive )
>>>  printk(XENLOG_WARNING
>>>  "map-inclusive dom0-iommu option is not supported on ARM\n");
>>>  iommu_hwdom_inclusive = false;
>>>  if ( iommu_hwdom_reserved == 1 )
>>>  printk(XENLOG_WARNING
>>>  "map-reserved dom0-iommu option is not supported on ARM\n");
>>>  iommu_hwdom_reserved = 0;
>>> 
>>>  arch_iommu_hwdom_init(d);
>>> 
>>> 
>>> Also we discussed about possibility to fold the part of code (which 
>>> disables unsupported options) in arch_iommu_hwdom_init() to avoid 
>>> duplication as a follow-up.
>>> At least, I expected to see arch_iommu_hwdom_init() to be called by 
>>> arm_smmu_iommu_hwdom_init() it current patch... Please note, this is *not* 
>>> a request for change immediately,
>>> I think, driver is functional even without this code (hopefully 
>>> arch_iommu_hwdom_init is empty now, etc).  But, if you happen to do V6 or 
>>> probably it could be done on commit ...
>>> 
>> Yes I will send the patch to move the code to arch_iommu_hwdom_init() to 
>> avoid duplication once SMMUv3 driver will be merged.
>> I thought anyway I have to

Re: [PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Julien Grall


Hi Jan,

On 21/01/2021 10:24, Jan Beulich wrote:

On 21.01.2021 10:56, Julien Grall wrote:

Hi Jan,

On 21/01/2021 09:43, Jan Beulich wrote:

On 21.01.2021 10:30, Michal Orzel wrote:

Fix compilation error when enabling early printk, introduced
by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
```
debug.S: Assembler messages:
debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
```

The fix is to include header  which now contains
definitions for page/size/mask etc.

Signed-off-by: Michal Orzel 


I'm sorry for the breakage, but I wonder how I should have noticed
the issue. In all the Arm .config-s I'm routinely building I can't
even see ...


--- a/xen/include/asm-arm/early_printk.h
+++ b/xen/include/asm-arm/early_printk.h
@@ -10,6 +10,7 @@
   #ifndef __ARM_EARLY_PRINTK_H__
   #define __ARM_EARLY_PRINTK_H__
   
+#include 
   
   #ifdef CONFIG_EARLY_PRINTK


... a respective Kconfig setting, i.e. it's not like I simply
failed to enable it.


EARLY_PRINTK is defined in arch/arm/Kconfig.debug and is selected when
you specify the UART to use.

Assuming you are only build testing, you could add the following for
testing EARLY_PRINTK:

CONFIG_DEBUG=y
CONFIG_EARLY_UART_CHOICE_8250=y
CONFIG_EARLY_UART_8250=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_UART_BASE_ADDRESS=
CONFIG_EARLY_UART_8250_REG_SHIFT=0
CONFIG_EARLY_PRINTK_INC="debug-8250.inc"


Ah yes, this works, thanks. The "optional" on the choice isn't
very helpful I suppose, because when going from an existing
.config one would neither find a setting presently turned off
in that .config, nor will there be a prompt.


Do you have a suggestion how the "choice" can appear in the .config?


I suppose any page-aligned base address is fine to use for my
build-testing-only purposes?


The base address doesn't need to be page-aligned (some HW have multiple 
UARTs in the same 4K region).


Cheers,

--
Julien Grall

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Julien Grall


Hi Rahul,

Please try to trim the e-mail when quoting, otherwise it is quite 
difficult to find the only couple of answer you wrote.


On 21/01/2021 17:10, Rahul Singh wrote:

On 20 Jan 2021, at 8:31 pm, Stefano Stabellini  wrote:

-static void __iomem *arm_smmu_ioremap(struct device *dev, resource_size_t 
start,
- resource_size_t size)
+
+static void arm_smmu_free_structures(struct arm_smmu_device *smmu)
{
-   struct resource res = {
-   .flags = IORESOURCE_MEM,
-   .start = start,
-   .end = start + size - 1,
-   };
+   if (smmu->cmdq.q.base)
+   xfree(smmu->cmdq.q.base);
+
+   if (smmu->evtq.q.base)
+   xfree(smmu->evtq.q.base);

-   return devm_ioremap_resource(dev, );
+   if (smmu->priq.q.base)
+   xfree(smmu->priq.q.base);
+
+   if (smmu->strtab_cfg.strtab)
+   xfree(smmu->strtab_cfg.strtab);
+
+   if (smmu->strtab_cfg.l1_desc)
+   xfree(smmu->strtab_cfg.l1_desc);


 From what I can tell we also need to free somewhere
smmu->strtab_cfg->l1_desc->l2ptr allocated by arm_smmu_init_l2_strtab


"l1_desc->l2ptr" is a pointer to the Level 1 Stream Table Descriptor if 2-level 
Stream Table supported.

If the device is protected by IOMMU, SMMUv3 driver will allocate the  
"l1_desc->l2ptr” when the device is added to the IOMMU via 
arm_smmu_add_device() function and device will be configured in bypass/abort mode.

Once we assign the device to the domain(arm_smmu_assign_dev() ) smmuv3 hw will 
be configured correctly to match the StreamID. If there is a failure in 
assigning the device, that case also XEN will not remove the device and master 
device still be in bypass/abort mode.


I am a bit confused with this answer. Wouldn't this mean that we are 
"leaking" memory if we fail to assign the device?




As in XEN, there is no function to remove the master device from the IOMMU, because of 
that I feel there is no need to free the "l1_desc->l2ptr” in case of failure 
also.


Hmmm... Xen is able to remove device from the IOMMU. The reason this is 
not implemented yet on Arm is because you can't hot-unplug "platform" 
device.


I expect the removal function to be useful for PCI.

Cheers,

--
Julien Grall

Re: [PATCH V4 16/24] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm

2021-01-21 Thread Oleksandr




On 21.01.21 15:57, Jan Beulich wrote:

Hi Jan



On 12.01.2021 22:52, Oleksandr Tyshchenko wrote:

From: Oleksandr Tyshchenko 

This patch implements reference counting of foreign entries in
in set_foreign_p2m_entry() on Arm. This is a mandatory action if
we want to run emulator (IOREQ server) in other than dom0 domain,
as we can't trust it to do the right thing if it is not running
in dom0. So we need to grab a reference on the page to avoid it
disappearing.

It is valid to always pass "p2m_map_foreign_rw" type to
guest_physmap_add_entry() since the current and foreign domains
would be always different. A case when they are equal would be
rejected by rcu_lock_remote_domain_by_id(). Besides the similar
comment in the code put a respective ASSERT() to catch incorrect
usage in future.

It was tested with IOREQ feature to confirm that all the pages given
to this function belong to a domain, so we can use the same approach
as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().

This involves adding an extra parameter for the foreign domain to
set_foreign_p2m_entry() and a helper to indicate whether the arch
supports the reference counting of foreign entries and the restriction
for the hardware domain in the common code can be skipped for it.

Signed-off-by: Oleksandr Tyshchenko 
CC: Julien Grall 
[On Arm only]
Tested-by: Wei Chen 

In principle x86 parts
Reviewed-by: Jan Beulich 


Thanks.



However, being a maintainer of ...


--- a/xen/include/asm-x86/p2m.h
+++ b/xen/include/asm-x86/p2m.h
@@ -382,6 +382,22 @@ struct p2m_domain {
  #endif
  #include 
  
+static inline bool arch_acquire_resource_check(struct domain *d)

+{
+/*
+ * The reference counting of foreign entries in set_foreign_p2m_entry()
+ * is not supported for translated domains on x86.
+ *
+ * FIXME: Until foreign pages inserted into the P2M are properly
+ * reference counted, it is unsafe to allow mapping of
+ * resource pages unless the caller is the hardware domain.
+ */
+if ( paging_mode_translate(d) && !is_hardware_domain(d) )
+return false;
+
+return true;
+}


... this code, I'd like to ask that such constructs be avoided
and this be a single return statement:

 return !paging_mode_translate(d) || is_hardware_domain(d);


ok, looks better.




I also think you may want to consider dropping the initial
"The" from the comment. I'm further unconvinced "foreign
entries" needs saying when set_foreign_p2m_entry() deals with
exclusively such. In the end the original comment moved here
would probably suffice, no need for any more additions than
perhaps a simple "(see set_foreign_p2m_entry())".


ok

--
Regards,

Oleksandr Tyshchenko

Re: Null scheduler and vwfi native problem

2021-01-21 Thread Dario Faggioli

On Thu, 2021-01-21 at 11:54 +0100, Anders Törnqvist wrote:
> Hi,
> 
Hello,

> I see a problem with destroy and restart of a domain. Interrupts are
> not 
> available when trying to restart a domain.
> 
> The situation seems very similar to the thread "null scheduler bug" 
>  
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html
> .
> 
Right. Back then, PCI passthrough was involved, if I remember
correctly. Is it the case for you as well?

> The target system is a iMX8-based ARM board and Xen is a 4.13.0
> version 
> built from https://source.codeaurora.org/external/imx/imx-xen.git.
> 
Mmm, perhaps it's me, but neither going at that url with a browser not
trying to clone it, I do not see anything. What I'm doing wrong?

> Xen is booted with sched=null vwfi=native.
> One physical CPU core is pinned to the domu.
> Some interrupts are passed through to the domu.
> 
Ok, I guess it is involved, since you say "some interrupts are passed
through..."

> When destroying the domain with xl destroy etc it does not complain
> but 
> then when trying to restart the domain
> again with a "xl create " I get:
> (XEN) IRQ 210 is already used by domain 1
> 
> "xl list" does not contain the domain.
> 
> Repeating the "xl create" command 5-10 times eventually starts the 
> domain without complaining about the IRQ.
> 
> Inspired from the discussion in the thread above I have put printks
> in 
> the xen/common/domain.c file.
> In the function domain_destroy I have a printk("End of domain_destroy
> function\n") in the end.
> In the function complete_domain_destroy have a printk("Begin of 
> complete_domain_destroy function\n") in the beginning.
> 
> With these printouts I get at "xl destroy":
> (XEN) End of domain_destroy function
> 
> So it seems like the function complete_domain_destroy is not called.
> 
Ok, thanks for making these tests. It's helpful to have this
information right away.

> "xl create" results in:
> (XEN) IRQ 210 is already used by domain 1
> (XEN) End of domain_destroy function
> 
> Then repeated "xl create" looks the same until after a few tries I
> also get:
> (XEN) Begin of complete_domain_destroy function
> 
> After that the next "xl create" creates the domain.
> 
> 
> I have also applied the patch from 
> 
> https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html
> . 
> This does seem to change the results.
> 
Ah... Really? That's a bit unexpected, TBH.

Well, I'll think about it.

> Starting the system without "sched=null vwfi=native" does not result
> in 
> the problem.
>
Ok, how about, if you're up for some more testing:

 - booting with "sched=null" but not with "vwfi=native"
 - booting with "sched=null vwfi=native" but not doing the IRQ 
   passthrough that you mentioned above

?

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Julien Grall


On 21/01/2021 17:18, Rahul Singh wrote:

Hello Oleksandr,


Hi,


On 20 Jan 2021, at 9:33 pm, Oleksandr  wrote:


On 20.01.21 16:52, Rahul Singh wrote:

Hi Rahul


Add support for ARM architected SMMUv3 implementation. It is based on
the Linux SMMUv3 driver.

Driver is currently supported as Tech Preview.

Major differences with regard to Linux driver are as follows:
2. Only Stage-2 translation is supported as compared to the Linux driver
that supports both Stage-1 and Stage-2 translations.
3. Use P2M  page table instead of creating one as SMMUv3 has the
capability to share the page tables with the CPU.
4. Tasklets are used in place of threaded IRQ's in Linux for event queue
and priority queue IRQ handling.
5. Latest version of the Linux SMMUv3 code implements the commands queue
access functions based on atomic operations implemented in Linux.
Atomic functions used by the commands queue access functions are not
implemented in XEN therefore we decided to port the earlier version
of the code. Atomic operations are introduced to fix the bottleneck
of the SMMU command queue insertion operation. A new algorithm for
inserting commands into the queue is introduced, which is lock-free
on the fast-path.
Consequence of reverting the patch is that the command queue
insertion will be slow for large systems as spinlock will be used to
serializes accesses from all CPUs to the single queue supported by
the hardware. Once the proper atomic operations will be available in
XEN the driver can be updated.
6. Spin lock is used in place of mutex when attaching a device to the
SMMU, as there is no blocking locks implementation available in XEN.
This might introduce latency in XEN. Need to investigate before
driver is out for tech preview.
7. PCI ATS functionality is not supported, as there is no support
available in XEN to test the functionality. Code is not tested and
compiled. Code is guarded by the flag CONFIG_PCI_ATS.
8. MSI interrupts are not supported as there is no support available in
XEN to request MSI interrupts. Code is not tested and compiled. Code
is guarded by the flag CONFIG_MSI.

Signed-off-by: Rahul Singh 
---
Changes since v2:
- added return statement for readx_poll_timeout function.
- remove iommu_get_dma_cookie and iommu_put_dma_cookie.
- remove struct arm_smmu_xen_device as not required.
- move dt_property_match_string to device_tree.c file.
- replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
- use ARM_SMMU_REG_SZ as size when map memory to XEN.
- remove bypass keyword to make sure when device-tree probe is failed we
   are reporting error and not continuing to configure SMMU in bypass
   mode.
- fixed minor comments.
Changes since v3:
- Fixed typo for CONFIG_MSI
- Added back the mutex code
- Rebase the patch on top of newly added WARN_ON().
- Remove the direct read of register VTCR_EL2.
- Fixed minor comments.
Changes since v4:
- Replace the ffsll() with ffs64() function.
- Add code to free resources when probe failed.


Thank you for addressing, patch looks ok to me, just one small remark below:



+
+static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
+{
+}


We discussed in V4 about adding some code here which all IOMMUs on Arm already 
have, copy it below for the convenience:


  /* Set to false options not supported on ARM. */
  if ( iommu_hwdom_inclusive )
  printk(XENLOG_WARNING
  "map-inclusive dom0-iommu option is not supported on ARM\n");
  iommu_hwdom_inclusive = false;
  if ( iommu_hwdom_reserved == 1 )
  printk(XENLOG_WARNING
  "map-reserved dom0-iommu option is not supported on ARM\n");
  iommu_hwdom_reserved = 0;

  arch_iommu_hwdom_init(d);


Also we discussed about possibility to fold the part of code (which disables 
unsupported options) in arch_iommu_hwdom_init() to avoid duplication as a 
follow-up.
At least, I expected to see arch_iommu_hwdom_init() to be called by 
arm_smmu_iommu_hwdom_init() it current patch... Please note, this is *not* a 
request for change immediately,
I think, driver is functional even without this code (hopefully 
arch_iommu_hwdom_init is empty now, etc).  But, if you happen to do V6 or 
probably it could be done on commit ...



Yes I will send the patch to move the code to arch_iommu_hwdom_init() to avoid 
duplication once SMMUv3 driver will be merged.
I thought anyway I have to remove the code from SMMUv1 and IPMMU I will take 
care of all the IOMMU driver at the same time because of that I didn’t modify 
the SMMUv3 driver.


There are two part in the problem here:
  1) Code duplication
  2) The SMMUv3 not checking the command line and calling 
arch_iommu_hwdom_init(d)


I agree that 1) can be deferred because it is a clean-up. However, I 
consider 2) a (latent) bug because it means that we risk unintentionally 
breaking the SMMUv3 driver is we need to add code in

[xen-unstable-smoke test] 158559: tolerable all pass - PUSHED

2021-01-21 Thread osstest service owner

flight 158559 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158559/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  9cd5bbf5369d520aacca9a1b141e9a84f62d507d
baseline version:
 xen  e8adbf680b56a3f4b9600c7bcc04fec1877a6213

Last test of basis   158545  2021-01-20 18:01:27 Z1 days
Testing same since   158559  2021-01-21 16:00:25 Z0 days1 attempts


People who touched revisions under test:
  Paul Durrant 
  Roger Pau Monné 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   e8adbf680b..9cd5bbf536  9cd5bbf5369d520aacca9a1b141e9a84f62d507d -> smoke

Re: [PATCH v2 2/4] x86/vioapic: issue EOI to dpci when switching pin to edge trigger mode

2021-01-21 Thread Roger Pau Monné

On Thu, Jan 21, 2021 at 05:23:04PM +0100, Jan Beulich wrote:
> On 15.01.2021 15:28, Roger Pau Monne wrote:
> > --- a/xen/arch/x86/hvm/vioapic.c
> > +++ b/xen/arch/x86/hvm/vioapic.c
> > @@ -268,6 +268,17 @@ static void vioapic_write_redirent(
> >  
> >  spin_unlock(>arch.hvm.irq_lock);
> >  
> > +if ( ent.fields.trig_mode == VIOAPIC_EDGE_TRIG &&
> > + ent.fields.remote_irr && is_iommu_enabled(d) )
> > +/*
> > + * Since IRR has been cleared and further interrupts can be
> > + * injected also attempt to deassert any virtual line of passed
> > + * through devices using this pin. Switching a pin from level 
> > to
> > + * trigger mode can be used as a way to EOI an interrupt at the
> > + * IO-APIC level.
> > + */
> > +hvm_dpci_eoi(d, gsi);
> > +
> >  if ( is_hardware_domain(d) && unmasked )
> >  {
> >  /*
> 
> I assume in the comment you mean "... from level to edge
> mode ...".

Yes, that's right, I completely missed it, sorry.

> But this isn't reflected in the if() you add -
> you do the same also when the mode doesn't change. Or do
> you build on the assumption that ent.fields.remote_irr can
> only be set if the prior mode was "level" (in which case
> an assertion may be warranted, as I don't think this is
> overly obvious)?

Yes, IRR is only set for level triggered interrupts, so it's indeed
build on the assumption that a pin can only have had IRR set when in
edge mode when it's being switched from level to edge.

I can add an assertion.

> Also, looking at this code, is it correct to trigger an IRQ
> upon the guest writing the upper half of an unmasked RTE
> with remote_irr clear? I'd assume this needs to be strictly
> limited to a 1->0 transition of the mask bit. If other code
> indeed guarantees this in all cases, perhaps another place
> where an assertion would be warranted?

Indeed. I don't think it should be possible for a write to the upper
half to trigger the injection of an interrupt, as having
gsi_assert_count > 0 would imply that either IRR is already set, or
that the pin is masked when processing an upper write.

I can add that a pre-patch if you agree.

In fact we could almost short-circuit the logic after the *pent = ent;
line for upper writes if it wasn't for the call to
vlapic_adjust_i8259_target, the rest of the code there shouldn't
matter for upper writes. And the i8259 target logic that we have is
very dodgy I would say. I have plans to fix it at some point, but
that requires fixing the virtual periodic timers logic first, which I
didn't get around to re-posting.

Thanks, Roger.

Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-21 Thread Robin Murphy


On 2021-01-21 15:48, Rob Herring wrote:

On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy 
wrote:


On 2021-01-20 21:31, Rob Herring wrote:

On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy
 wrote:


On 2021-01-20 16:53, Rob Herring wrote:

On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang
wrote:

Introduce the new compatible string, restricted-dma-pool,
for restricted DMA. One can specify the address and length
of the restricted DMA memory region by restricted-dma-pool
in the device tree.


If this goes into DT, I think we should be able to use
dma-ranges for this purpose instead. Normally, 'dma-ranges'
is for physical bus restrictions, but there's no reason it
can't be used for policy or to express restrictions the
firmware has enabled.


There would still need to be some way to tell SWIOTLB to pick
up the corresponding chunk of memory and to prevent the kernel
from using it for anything else, though.


Don't we already have that problem if dma-ranges had a very
small range? We just get lucky because the restriction is
generally much more RAM than needed.


Not really - if a device has a naturally tiny addressing capability
that doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer
will be allocated then it's unlikely to work well, but that's just
crap system design. Yes, memory pressure in ZONE_DMA{32} is
particularly problematic for such limited devices, but it's
irrelevant to the issue at hand here.


Yesterday's crap system design is today's security feature. Couldn't 
this feature make crap system design work better?


Indeed! Say you bring out your shiny new "Strawberry Flan 4" machine
with all the latest connectivity, but tragically its PCIe can only
address 25% of the RAM. So you decide to support deploying it in two
configurations: one where it runs normally for best performance, and
another "secure" one where it dedicates that quarter of RAM as a 
restricted DMA pool for any PCIe devices - that way, even if that hotel 
projector you plug in turns out to be a rogue Thunderbolt endpoint, it 
can never snarf your private keys off your eMMC out of the page cache.


(Yes, is is the thinnest of strawmen, but it sets the scene for the 
point you raised...)


...which is that in both cases the dma-ranges will still be identical. 
So how is the kernel going to know whether to steal that whole area from 
memblock before anything else can allocate from it, or not?


I don't disagree that even in Claire's original intended case it would 
be semantically correct to describe the hardware-firewalled region with 
dma-ranges. It just turns out not to be necessary, and you're already 
arguing for not adding anything in DT that doesn't need to be.



What we have here is a device that's not allowed to see *kernel*
memory at all. It's been artificially constrained to a particular
region by a TZASC or similar, and the only data which should ever
be placed in that


May have been constrained, but that's entirely optional.

In the optional case where the setup is entirely up to the OS, I
don't think this belongs in the DT at all. Perhaps that should be
solved first.


Yes! Let's definitely consider that case! Say you don't have any 
security or physical limitations but want to use a bounce pool for some 
device anyway because reasons (perhaps copying streaming DMA data to a 
better guaranteed alignment gives an overall performance win). Now the 
*only* relevant thing to communicate to the kernel is to, ahem, reserve 
a large chunk of memory, and use it for this special purpose. Isn't that 
literally what reserved-memory bindings are for?



region is data intended for that device to see. That way if it
tries to go rogue it physically can't start slurping data intended
for other devices or not mapped for DMA at all. The bouncing is an
important part of this - I forget the title off-hand but there was
an interesting paper a few years ago which demonstrated that even
with an IOMMU, streaming DMA of in-place buffers could reveal
enough adjacent data from the same page to mount an attack on the
system. Memory pressure should be immaterial since the size of each
bounce pool carveout will presumably be tuned for the needs of the
given device.

In any case, wouldn't finding all the dma-ranges do this? We're 
already walking the tree to find the max DMA address now.


If all you can see are two "dma-ranges" properties, how do you
propose to tell that one means "this is the extent of what I can
address, please set my masks and dma-range-map accordingly and try
to allocate things where I can reach them" while the other means
"take this output range away from the page allocator and hook it up
as my dedicated bounce pool, because it is Serious Security Time"?
Especially since getting that choice wrong either way would be a
Bad Thing.


Either we have some heuristic based on the size or we add some hint. 
The point is let's build on what we already have for defining DMA 
accessible memory in DT rather than some parallel

Re: [PATCH v4 4/5] xen/cpupool: add scheduling granularity entry to cpupool entries

2021-01-21 Thread Dario Faggioli

On Mon, 2021-01-18 at 12:55 +0100, Juergen Gross wrote:
> Add a "sched-gran" entry to the per-cpupool hypfs directories.
> 
> For now make this entry read-only and let it contain one of the
> strings "cpu", "core" or "socket".
> 
> Signed-off-by: Juergen Gross 
>
Reviewed-by: Dario Faggioli 

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v2 1/4] x86/vioapic: check IRR before attempting to inject interrupt after EOI

2021-01-21 Thread Roger Pau Monné

On Thu, Jan 21, 2021 at 05:03:51PM +0100, Jan Beulich wrote:
> On 15.01.2021 15:28, Roger Pau Monne wrote:
> > In vioapic_update_EOI the irq_lock will be dropped in order to forward
> > the EOI to the dpci handler, so there's a window between clearing IRR
> > and checking if the line is asserted where IRR can change behind our
> > back.
> > 
> > Fix this by checking whether IRR is set before attempting to inject a
> > new interrupt.
> > 
> > Fixes: 06e3f8f2766 ('vt-d: Do dpci eoi outside of irq_lock.')
> > Signed-off-by: Roger Pau Monné 
> 
> It's fine this way, so
> Reviewed-by: Jan Beulich 
> but how about a slightly different change:
> 
> > --- a/xen/arch/x86/hvm/vioapic.c
> > +++ b/xen/arch/x86/hvm/vioapic.c
> > @@ -526,7 +526,7 @@ void vioapic_update_EOI(struct domain *d, u8 vector)
> >  }
> >  
> >  if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
> > - !ent->fields.mask &&
> > + !ent->fields.mask && !ent->fields.remote_irr &&
> >   hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
> >  {
> >  ent->fields.remote_irr = 1;
> 
> The check is only needed if the lock was dropped intermediately,
> which happens only conditionally. So an alternative would seem
> to be
> 
> if ( is_iommu_enabled(d) )
> {
> spin_unlock(>arch.hvm.irq_lock);
> hvm_dpci_eoi(d, vioapic->base_gsi + pin, ent);
> spin_lock(>arch.hvm.irq_lock);
> 
> if ( ent->fields.remote_irr )
> continue;
> }
> 
> in the code immediately above. Thoughts?

IMO that seems more dangerous as if new code is added below that chunk
that wouldn't depend on the value of IRR it might get skipped
unintentionally, as it's possible to oversight that the loop is
short-circuited here.

Thanks, Roger.

Re: [PATCH v4 5/5] xen/cpupool: make per-cpupool sched-gran hypfs node writable

2021-01-21 Thread Dario Faggioli

On Mon, 2021-01-18 at 12:55 +0100, Juergen Gross wrote:
> Make /cpupool//sched-gran in hypfs writable. This will enable per
> cpupool selectable scheduling granularity.
> 
> Writing this node is allowed only with no cpu assigned to the
> cpupool.
> Allowed are values "cpu", "core" and "socket".
> 
> Signed-off-by: Juergen Gross 
>
Reviewed-by: Dario Faggioli 

This holds with Jan's proposed adjustments.

Regards
-- 
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)


signature.asc
Description: This is a digitally signed message part

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Rahul Singh

Hello Oleksandr,

> On 20 Jan 2021, at 9:33 pm, Oleksandr  wrote:
> 
> 
> On 20.01.21 16:52, Rahul Singh wrote:
> 
> Hi Rahul
> 
>> Add support for ARM architected SMMUv3 implementation. It is based on
>> the Linux SMMUv3 driver.
>> 
>> Driver is currently supported as Tech Preview.
>> 
>> Major differences with regard to Linux driver are as follows:
>> 2. Only Stage-2 translation is supported as compared to the Linux driver
>>that supports both Stage-1 and Stage-2 translations.
>> 3. Use P2M  page table instead of creating one as SMMUv3 has the
>>capability to share the page tables with the CPU.
>> 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
>>and priority queue IRQ handling.
>> 5. Latest version of the Linux SMMUv3 code implements the commands queue
>>access functions based on atomic operations implemented in Linux.
>>Atomic functions used by the commands queue access functions are not
>>implemented in XEN therefore we decided to port the earlier version
>>of the code. Atomic operations are introduced to fix the bottleneck
>>of the SMMU command queue insertion operation. A new algorithm for
>>inserting commands into the queue is introduced, which is lock-free
>>on the fast-path.
>>Consequence of reverting the patch is that the command queue
>>insertion will be slow for large systems as spinlock will be used to
>>serializes accesses from all CPUs to the single queue supported by
>>the hardware. Once the proper atomic operations will be available in
>>XEN the driver can be updated.
>> 6. Spin lock is used in place of mutex when attaching a device to the
>>SMMU, as there is no blocking locks implementation available in XEN.
>>This might introduce latency in XEN. Need to investigate before
>>driver is out for tech preview.
>> 7. PCI ATS functionality is not supported, as there is no support
>>available in XEN to test the functionality. Code is not tested and
>>compiled. Code is guarded by the flag CONFIG_PCI_ATS.
>> 8. MSI interrupts are not supported as there is no support available in
>>XEN to request MSI interrupts. Code is not tested and compiled. Code
>>is guarded by the flag CONFIG_MSI.
>> 
>> Signed-off-by: Rahul Singh 
>> ---
>> Changes since v2:
>> - added return statement for readx_poll_timeout function.
>> - remove iommu_get_dma_cookie and iommu_put_dma_cookie.
>> - remove struct arm_smmu_xen_device as not required.
>> - move dt_property_match_string to device_tree.c file.
>> - replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
>> - use ARM_SMMU_REG_SZ as size when map memory to XEN.
>> - remove bypass keyword to make sure when device-tree probe is failed we
>>   are reporting error and not continuing to configure SMMU in bypass
>>   mode.
>> - fixed minor comments.
>> Changes since v3:
>> - Fixed typo for CONFIG_MSI
>> - Added back the mutex code
>> - Rebase the patch on top of newly added WARN_ON().
>> - Remove the direct read of register VTCR_EL2.
>> - Fixed minor comments.
>> Changes since v4:
>> - Replace the ffsll() with ffs64() function.
>> - Add code to free resources when probe failed.
> 
> Thank you for addressing, patch looks ok to me, just one small remark below:
> 
> 
>> +
>> +static void __hwdom_init arm_smmu_iommu_hwdom_init(struct domain *d)
>> +{
>> +}
> 
> We discussed in V4 about adding some code here which all IOMMUs on Arm 
> already have, copy it below for the convenience:
> 
> 
>  /* Set to false options not supported on ARM. */
>  if ( iommu_hwdom_inclusive )
>  printk(XENLOG_WARNING
>  "map-inclusive dom0-iommu option is not supported on ARM\n");
>  iommu_hwdom_inclusive = false;
>  if ( iommu_hwdom_reserved == 1 )
>  printk(XENLOG_WARNING
>  "map-reserved dom0-iommu option is not supported on ARM\n");
>  iommu_hwdom_reserved = 0;
> 
>  arch_iommu_hwdom_init(d);
> 
> 
> Also we discussed about possibility to fold the part of code (which disables 
> unsupported options) in arch_iommu_hwdom_init() to avoid duplication as a 
> follow-up.
> At least, I expected to see arch_iommu_hwdom_init() to be called by 
> arm_smmu_iommu_hwdom_init() it current patch... Please note, this is *not* a 
> request for change immediately,
> I think, driver is functional even without this code (hopefully 
> arch_iommu_hwdom_init is empty now, etc).  But, if you happen to do V6 or 
> probably it could be done on commit ...
> 

Yes I will send the patch to move the code to arch_iommu_hwdom_init() to avoid 
duplication once SMMUv3 driver will be merged.
I thought anyway I have to remove the code from SMMUv1 and IPMMU I will take 
care of all the IOMMU driver at the same time because of that I didn’t modify 
the SMMUv3 driver. Yes if there is a need for v6 I will add the 
arch_iommu_hwdom_init(d) in arm_smmu_iommu_hwdom_init().

Regards,
Rahul

> 
>> +
>> +static void arm_smmu_iommu_xen_domain_teardown(struct

Re: [PATCH v5 10/10] xen/arm: smmuv3: Add support for SMMUv3 driver

2021-01-21 Thread Rahul Singh

Hello Stefano,

> On 20 Jan 2021, at 8:31 pm, Stefano Stabellini  wrote:
> 
> On Wed, 20 Jan 2021, Rahul Singh wrote:
>> Add support for ARM architected SMMUv3 implementation. It is based on
>> the Linux SMMUv3 driver.
>> 
>> Driver is currently supported as Tech Preview.
>> 
>> Major differences with regard to Linux driver are as follows:
>> 2. Only Stage-2 translation is supported as compared to the Linux driver
>>   that supports both Stage-1 and Stage-2 translations.
>> 3. Use P2M  page table instead of creating one as SMMUv3 has the
>>   capability to share the page tables with the CPU.
>> 4. Tasklets are used in place of threaded IRQ's in Linux for event queue
>>   and priority queue IRQ handling.
>> 5. Latest version of the Linux SMMUv3 code implements the commands queue
>>   access functions based on atomic operations implemented in Linux.
>>   Atomic functions used by the commands queue access functions are not
>>   implemented in XEN therefore we decided to port the earlier version
>>   of the code. Atomic operations are introduced to fix the bottleneck
>>   of the SMMU command queue insertion operation. A new algorithm for
>>   inserting commands into the queue is introduced, which is lock-free
>>   on the fast-path.
>>   Consequence of reverting the patch is that the command queue
>>   insertion will be slow for large systems as spinlock will be used to
>>   serializes accesses from all CPUs to the single queue supported by
>>   the hardware. Once the proper atomic operations will be available in
>>   XEN the driver can be updated.
>> 6. Spin lock is used in place of mutex when attaching a device to the
>>   SMMU, as there is no blocking locks implementation available in XEN.
>>   This might introduce latency in XEN. Need to investigate before
>>   driver is out for tech preview.
>> 7. PCI ATS functionality is not supported, as there is no support
>>   available in XEN to test the functionality. Code is not tested and
>>   compiled. Code is guarded by the flag CONFIG_PCI_ATS.
>> 8. MSI interrupts are not supported as there is no support available in
>>   XEN to request MSI interrupts. Code is not tested and compiled. Code
>>   is guarded by the flag CONFIG_MSI.
>> 
>> Signed-off-by: Rahul Singh 
>> ---
>> Changes since v2:
>> - added return statement for readx_poll_timeout function.
>> - remove iommu_get_dma_cookie and iommu_put_dma_cookie.
>> - remove struct arm_smmu_xen_device as not required.
>> - move dt_property_match_string to device_tree.c file.
>> - replace arm_smmu_*_thread to arm_smmu_*_tasklet to avoid confusion.
>> - use ARM_SMMU_REG_SZ as size when map memory to XEN.
>> - remove bypass keyword to make sure when device-tree probe is failed we
>>  are reporting error and not continuing to configure SMMU in bypass
>>  mode.
>> - fixed minor comments.
>> Changes since v3:
>> - Fixed typo for CONFIG_MSI
>> - Added back the mutex code
>> - Rebase the patch on top of newly added WARN_ON().
>> - Remove the direct read of register VTCR_EL2.
>> - Fixed minor comments.
>> Changes since v4:
>> - Replace the ffsll() with ffs64() function.
>> - Add code to free resources when probe failed.
>> ---
>> ---
>> MAINTAINERS   |   6 +
>> SUPPORT.md|   1 +
>> xen/drivers/passthrough/Kconfig   |  11 +
>> xen/drivers/passthrough/arm/Makefile  |   1 +
>> xen/drivers/passthrough/arm/smmu-v3.c | 939 ++
>> 5 files changed, 830 insertions(+), 128 deletions(-)
>> 
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 5079b834c2..14240e8e1e 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -249,6 +249,12 @@ F:  xen/include/asm-arm/
>> F:   xen/include/public/arch-arm/
>> F:   xen/include/public/arch-arm.h
>> 
>> +ARM SMMUv3
>> +M:  Bertrand Marquis 
>> +M:  Rahul Singh 
>> +S:  Supported
>> +F:  xen/drivers/passthrough/arm/smmu-v3.c
>> +
>> Change Log
>> M:   Paul Durrant 
>> R:   Community Manager 
>> diff --git a/SUPPORT.md b/SUPPORT.md
>> index ab02aca5f4..5ee3c8651a 100644
>> --- a/SUPPORT.md
>> +++ b/SUPPORT.md
>> @@ -67,6 +67,7 @@ For the Cortex A57 r0p0 - r1p1, see Errata 832075.
>> Status, Intel VT-d: Supported
>> Status, ARM SMMUv1: Supported, not security supported
>> Status, ARM SMMUv2: Supported, not security supported
>> +Status, ARM SMMUv3: Tech Preview
>> Status, Renesas IPMMU-VMSA: Supported, not security supported
>> 
>> ### ARM/GICv3 ITS
>> diff --git a/xen/drivers/passthrough/Kconfig 
>> b/xen/drivers/passthrough/Kconfig
>> index 0036007ec4..341ba92b30 100644
>> --- a/xen/drivers/passthrough/Kconfig
>> +++ b/xen/drivers/passthrough/Kconfig
>> @@ -13,6 +13,17 @@ config ARM_SMMU
>>Say Y here if your SoC includes an IOMMU device implementing the
>>ARM SMMU architecture.
>> 
>> +config ARM_SMMU_V3
>> +bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support" if EXPERT
>> +depends on ARM_64
>> +---help---
>> + Support for implementations of the ARM System MMU

Re: [PATCH v2 2/4] x86/vioapic: issue EOI to dpci when switching pin to edge trigger mode

2021-01-21 Thread Jan Beulich

On 15.01.2021 15:28, Roger Pau Monne wrote:
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -268,6 +268,17 @@ static void vioapic_write_redirent(
>  
>  spin_unlock(>arch.hvm.irq_lock);
>  
> +if ( ent.fields.trig_mode == VIOAPIC_EDGE_TRIG &&
> + ent.fields.remote_irr && is_iommu_enabled(d) )
> +/*
> + * Since IRR has been cleared and further interrupts can be
> + * injected also attempt to deassert any virtual line of passed
> + * through devices using this pin. Switching a pin from level to
> + * trigger mode can be used as a way to EOI an interrupt at the
> + * IO-APIC level.
> + */
> +hvm_dpci_eoi(d, gsi);
> +
>  if ( is_hardware_domain(d) && unmasked )
>  {
>  /*

I assume in the comment you mean "... from level to edge
mode ...". But this isn't reflected in the if() you add -
you do the same also when the mode doesn't change. Or do
you build on the assumption that ent.fields.remote_irr can
only be set if the prior mode was "level" (in which case
an assertion may be warranted, as I don't think this is
overly obvious)?

Also, looking at this code, is it correct to trigger an IRQ
upon the guest writing the upper half of an unmasked RTE
with remote_irr clear? I'd assume this needs to be strictly
limited to a 1->0 transition of the mask bit. If other code
indeed guarantees this in all cases, perhaps another place
where an assertion would be warranted?

Jan

Re: [PATCH v4 5/5] xen/cpupool: make per-cpupool sched-gran hypfs node writable

2021-01-21 Thread Jürgen Groß


On 21.01.21 16:55, Jan Beulich wrote:

On 18.01.2021 12:55, Juergen Gross wrote:

Make /cpupool//sched-gran in hypfs writable. This will enable per
cpupool selectable scheduling granularity.

Writing this node is allowed only with no cpu assigned to the cpupool.
Allowed are values "cpu", "core" and "socket".

Signed-off-by: Juergen Gross 


Reviewed-by: Jan Beulich 
with two small adjustment requests:


@@ -85,36 +85,43 @@ static int __init sched_select_granularity(const char *str)
  {
  if ( strcmp(sg_name[i].name, str) == 0 )
  {
-opt_sched_granularity = sg_name[i].mode;
+*mode = sg_name[i].mode;
  return 0;
  }
  }
  
  return -EINVAL;

  }
+
+static int __init sched_select_granularity(const char *str)
+{
+return sched_gran_get(str, _sched_granularity);
+}
  custom_param("sched-gran", sched_select_granularity);
+#elif CONFIG_HYPFS


Missing defined().


@@ -1126,17 +1136,55 @@ static unsigned int hypfs_gran_getsize(const struct 
hypfs_entry *entry)
  return strlen(gran) + 1;
  }
  
+static int cpupool_gran_write(struct hypfs_entry_leaf *leaf,

+  XEN_GUEST_HANDLE_PARAM(const_void) uaddr,
+  unsigned int ulen)
+{
+const struct hypfs_dyndir_id *data;
+struct cpupool *cpupool;
+enum sched_gran gran;
+unsigned int sched_gran = 0;
+char name[SCHED_GRAN_NAME_LEN];
+int ret = 0;
+
+if ( ulen > SCHED_GRAN_NAME_LEN )
+return -ENOSPC;
+
+if ( copy_from_guest(name, uaddr, ulen) )
+return -EFAULT;
+
+if ( memchr(name, 0, ulen) == (name + ulen - 1) )
+sched_gran = sched_gran_get(name, ) ?
+ 0 : cpupool_check_granularity(gran);
+if ( sched_gran == 0 )
+return -EINVAL;
+
+data = hypfs_get_dyndata();
+cpupool = data->data;
+ASSERT(cpupool);
+
+if ( !cpumask_empty(cpupool->cpu_valid) )
+ret = -EBUSY;
+else
+{
+cpupool->gran = gran;
+cpupool->sched_gran = sched_gran;
+}


I think this could do with a comment clarifying what lock this
is protected by, as the function itself has no sign of any
locking, not even an assertion that a certain lock is being held.
If you were to suggest some text, this as well as the minor issue
above could likely be taken care of while committing.


cpupool_gran_[read|write]() are both guarded by the cpupool_lock
taken via cpupool_dir_enter().


Juergen


OpenPGP_0xB0DE9DD628BF132F.asc
Description: application/pgp-keys


OpenPGP_signature
Description: OpenPGP digital signature

[xen-unstable test] 158550: tolerable FAIL - PUSHED

2021-01-21 Thread osstest service owner

flight 158550 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158550/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 158542
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 158542
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 158542
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 158542
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 158542
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 158542
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 158542
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 158542
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 158542
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 158542
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 158542
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  e8adbf680b56a3f4b9600c7bcc04fec1877a6213
baseline version:
 xen  3487f4cf8bf5cef47a4c3918c13a502afc9891f6

Last test of basis   158542  2021-01-20 12:59:36 Z1 days
Testing same since   158550  2021-01-21 01:39:13 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Stefano Stabellini 
  Vladimir Murzin 
  Wei Chen 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass

Re: [PATCH v2 1/4] x86/vioapic: check IRR before attempting to inject interrupt after EOI

2021-01-21 Thread Jan Beulich

On 15.01.2021 15:28, Roger Pau Monne wrote:
> In vioapic_update_EOI the irq_lock will be dropped in order to forward
> the EOI to the dpci handler, so there's a window between clearing IRR
> and checking if the line is asserted where IRR can change behind our
> back.
> 
> Fix this by checking whether IRR is set before attempting to inject a
> new interrupt.
> 
> Fixes: 06e3f8f2766 ('vt-d: Do dpci eoi outside of irq_lock.')
> Signed-off-by: Roger Pau Monné 

It's fine this way, so
Reviewed-by: Jan Beulich 
but how about a slightly different change:

> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -526,7 +526,7 @@ void vioapic_update_EOI(struct domain *d, u8 vector)
>  }
>  
>  if ( (ent->fields.trig_mode == VIOAPIC_LEVEL_TRIG) &&
> - !ent->fields.mask &&
> + !ent->fields.mask && !ent->fields.remote_irr &&
>   hvm_irq->gsi_assert_count[vioapic->base_gsi + pin] )
>  {
>  ent->fields.remote_irr = 1;

The check is only needed if the lock was dropped intermediately,
which happens only conditionally. So an alternative would seem
to be

if ( is_iommu_enabled(d) )
{
spin_unlock(>arch.hvm.irq_lock);
hvm_dpci_eoi(d, vioapic->base_gsi + pin, ent);
spin_lock(>arch.hvm.irq_lock);

if ( ent->fields.remote_irr )
continue;
}

in the code immediately above. Thoughts?

Jan

Re: [PATCH v4 5/5] xen/cpupool: make per-cpupool sched-gran hypfs node writable

2021-01-21 Thread Jan Beulich

On 18.01.2021 12:55, Juergen Gross wrote:
> Make /cpupool//sched-gran in hypfs writable. This will enable per
> cpupool selectable scheduling granularity.
> 
> Writing this node is allowed only with no cpu assigned to the cpupool.
> Allowed are values "cpu", "core" and "socket".
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich 
with two small adjustment requests:

> @@ -85,36 +85,43 @@ static int __init sched_select_granularity(const char 
> *str)
>  {
>  if ( strcmp(sg_name[i].name, str) == 0 )
>  {
> -opt_sched_granularity = sg_name[i].mode;
> +*mode = sg_name[i].mode;
>  return 0;
>  }
>  }
>  
>  return -EINVAL;
>  }
> +
> +static int __init sched_select_granularity(const char *str)
> +{
> +return sched_gran_get(str, _sched_granularity);
> +}
>  custom_param("sched-gran", sched_select_granularity);
> +#elif CONFIG_HYPFS

Missing defined().

> @@ -1126,17 +1136,55 @@ static unsigned int hypfs_gran_getsize(const struct 
> hypfs_entry *entry)
>  return strlen(gran) + 1;
>  }
>  
> +static int cpupool_gran_write(struct hypfs_entry_leaf *leaf,
> +  XEN_GUEST_HANDLE_PARAM(const_void) uaddr,
> +  unsigned int ulen)
> +{
> +const struct hypfs_dyndir_id *data;
> +struct cpupool *cpupool;
> +enum sched_gran gran;
> +unsigned int sched_gran = 0;
> +char name[SCHED_GRAN_NAME_LEN];
> +int ret = 0;
> +
> +if ( ulen > SCHED_GRAN_NAME_LEN )
> +return -ENOSPC;
> +
> +if ( copy_from_guest(name, uaddr, ulen) )
> +return -EFAULT;
> +
> +if ( memchr(name, 0, ulen) == (name + ulen - 1) )
> +sched_gran = sched_gran_get(name, ) ?
> + 0 : cpupool_check_granularity(gran);
> +if ( sched_gran == 0 )
> +return -EINVAL;
> +
> +data = hypfs_get_dyndata();
> +cpupool = data->data;
> +ASSERT(cpupool);
> +
> +if ( !cpumask_empty(cpupool->cpu_valid) )
> +ret = -EBUSY;
> +else
> +{
> +cpupool->gran = gran;
> +cpupool->sched_gran = sched_gran;
> +}

I think this could do with a comment clarifying what lock this
is protected by, as the function itself has no sign of any
locking, not even an assertion that a certain lock is being held.
If you were to suggest some text, this as well as the minor issue
above could likely be taken care of while committing.

Jan

Re: [PATCH v4 4/5] xen/cpupool: add scheduling granularity entry to cpupool entries

2021-01-21 Thread Jan Beulich

On 18.01.2021 12:55, Juergen Gross wrote:
> Add a "sched-gran" entry to the per-cpupool hypfs directories.
> 
> For now make this entry read-only and let it contain one of the
> strings "cpu", "core" or "socket".
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich

Re: [PATCH v4 2/5] xen/hypfs: add support for id-based dynamic directories

2021-01-21 Thread Jan Beulich

On 18.01.2021 12:55, Juergen Gross wrote:
> Add some helpers to hypfs.c to support dynamic directories with a
> numerical id as name.
> 
> The dynamic directory is based on a template specified by the user
> allowing to use specific access functions and having a predefined
> set of entries in the directory.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich

Re: [RFC PATCH v3 5/6] dt-bindings: of: Add restricted DMA pool

2021-01-21 Thread Rob Herring

On Wed, Jan 20, 2021 at 7:10 PM Robin Murphy  wrote:
>
> On 2021-01-20 21:31, Rob Herring wrote:
> > On Wed, Jan 20, 2021 at 11:30 AM Robin Murphy  wrote:
> >>
> >> On 2021-01-20 16:53, Rob Herring wrote:
> >>> On Wed, Jan 06, 2021 at 11:41:23AM +0800, Claire Chang wrote:
>  Introduce the new compatible string, restricted-dma-pool, for restricted
>  DMA. One can specify the address and length of the restricted DMA memory
>  region by restricted-dma-pool in the device tree.
> >>>
> >>> If this goes into DT, I think we should be able to use dma-ranges for
> >>> this purpose instead. Normally, 'dma-ranges' is for physical bus
> >>> restrictions, but there's no reason it can't be used for policy or to
> >>> express restrictions the firmware has enabled.
> >>
> >> There would still need to be some way to tell SWIOTLB to pick up the
> >> corresponding chunk of memory and to prevent the kernel from using it
> >> for anything else, though.
> >
> > Don't we already have that problem if dma-ranges had a very small
> > range? We just get lucky because the restriction is generally much
> > more RAM than needed.
>
> Not really - if a device has a naturally tiny addressing capability that
> doesn't even cover ZONE_DMA32 where the regular SWIOTLB buffer will be
> allocated then it's unlikely to work well, but that's just crap system
> design. Yes, memory pressure in ZONE_DMA{32} is particularly problematic
> for such limited devices, but it's irrelevant to the issue at hand here.

Yesterday's crap system design is today's security feature. Couldn't
this feature make crap system design work better?

> What we have here is a device that's not allowed to see *kernel* memory
> at all. It's been artificially constrained to a particular region by a
> TZASC or similar, and the only data which should ever be placed in that

May have been constrained, but that's entirely optional.

In the optional case where the setup is entirely up to the OS, I don't
think this belongs in the DT at all. Perhaps that should be solved
first.

> region is data intended for that device to see. That way if it tries to
> go rogue it physically can't start slurping data intended for other
> devices or not mapped for DMA at all. The bouncing is an important part
> of this - I forget the title off-hand but there was an interesting paper
> a few years ago which demonstrated that even with an IOMMU, streaming
> DMA of in-place buffers could reveal enough adjacent data from the same
> page to mount an attack on the system. Memory pressure should be
> immaterial since the size of each bounce pool carveout will presumably
> be tuned for the needs of the given device.
>
> > In any case, wouldn't finding all the dma-ranges do this? We're
> > already walking the tree to find the max DMA address now.
>
> If all you can see are two "dma-ranges" properties, how do you propose
> to tell that one means "this is the extent of what I can address, please
> set my masks and dma-range-map accordingly and try to allocate things
> where I can reach them" while the other means "take this output range
> away from the page allocator and hook it up as my dedicated bounce pool,
> because it is Serious Security Time"? Especially since getting that
> choice wrong either way would be a Bad Thing.

Either we have some heuristic based on the size or we add some hint.
The point is let's build on what we already have for defining DMA
accessible memory in DT rather than some parallel mechanism.

Rob

Re: [PATCH v4 1/5] xen/hypfs: support dynamic hypfs nodes

2021-01-21 Thread Jan Beulich

On 18.01.2021 12:55, Juergen Gross wrote:
> Add a HYPFS_VARDIR_INIT() macro for initializing such a directory
> statically, taking a struct hypfs_funcs pointer as parameter additional
> to those of HYPFS_DIR_INIT().
> 
> Modify HYPFS_VARSIZE_INIT() to take the function vector pointer as an
> additional parameter as this will be needed for dynamical entries.
> 
> For being able to let the generic hypfs coding continue to work on
> normal struct hypfs_entry entities even for dynamical nodes add some
> infrastructure for allocating a working area for the current hypfs
> request in order to store needed information for traversing the tree.
> This area is anchored in a percpu pointer and can be retrieved by any
> level of the dynamic entries. The normal way to handle allocation and
> freeing is to allocate the data in the enter() callback of a node and
> to free it in the related exit() callback.
> 
> Add a hypfs_add_dyndir() function for adding a dynamic directory
> template to the tree, which is needed for having the correct reference
> to its position in hypfs.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Jan Beulich

[PATCH v2.5 1/5] libxenguest: support zstd compressed kernels

2021-01-21 Thread Jan Beulich

This follows the logic used for other decompression methods utilizing an
external library, albeit here we can't ignore the 32-bit size field
appended to the compressed image - its presence causes decompression to
fail. Leverage the field instead to allocate the output buffer in one
go, i.e. without incrementally realloc()ing.

As far as configure.ac goes, I'm pretty sure there is a better (more
"standard") way of using PKG_CHECK_MODULES(). The construct also gets
put next to the other decompression library checks, albeit I think they
all ought to be x86-specific (e.g. placed in the existing case block a
few lines down).

Note that, where possible, instead of #ifdef-ing xen/*.h inclusions,
they get removed.

Signed-off-by: Jan Beulich 
Acked-by: Wei Liu 
---
Note for committer: As an alternative to patching tools/configure here,
autoconf may want re-running.
---
v2.5: Don't make libzstd a hard dependency. Adjust ./README.
v2: New.

--- a/README
+++ b/README
@@ -84,6 +84,8 @@ disabled at compile time:
 * 16-bit x86 assembler, loader and compiler for qemu-traditional / rombios
   (dev86 rpm or bin86 & bcc debs)
 * Development install of liblzma for rombios
+* Development install of libbz2, liblzma, liblzo2, and libzstd for DomU
+  kernel decompression.
 
 Second, you need to acquire a suitable kernel for use in domain 0. If
 possible you should use a kernel provided by your OS distributor. If
--- a/tools/configure
+++ b/tools/configure
@@ -643,6 +643,8 @@ PTHREAD_CFLAGS
 EXTFS_LIBS
 system_aio
 zlib
+libzstd_LIBS
+libzstd_CFLAGS
 FETCHER
 FTP
 FALSE
@@ -857,6 +859,8 @@ glib_CFLAGS
 glib_LIBS
 pixman_CFLAGS
 pixman_LIBS
+libzstd_CFLAGS
+libzstd_LIBS
 LIBNL3_CFLAGS
 LIBNL3_LIBS
 SYSTEMD_CFLAGS
@@ -1605,6 +1609,10 @@ Some influential environment variables:
   pixman_CFLAGS
   C compiler flags for pixman, overriding pkg-config
   pixman_LIBS linker flags for pixman, overriding pkg-config
+  libzstd_CFLAGS
+  C compiler flags for libzstd, overriding pkg-config
+  libzstd_LIBS
+  linker flags for libzstd, overriding pkg-config
   LIBNL3_CFLAGS
   C compiler flags for LIBNL3, overriding pkg-config
   LIBNL3_LIBS linker flags for LIBNL3, overriding pkg-config
@@ -8744,6 +8752,84 @@ fi
 
 
 
+pkg_failed=no
+{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for libzstd" >&5
+$as_echo_n "checking for libzstd... " >&6; }
+
+if test -n "$libzstd_CFLAGS"; then
+pkg_cv_libzstd_CFLAGS="$libzstd_CFLAGS"
+ elif test -n "$PKG_CONFIG"; then
+if test -n "$PKG_CONFIG" && \
+{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists 
--print-errors \"libzstd\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_libzstd_CFLAGS=`$PKG_CONFIG --cflags "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+pkg_failed=untried
+fi
+if test -n "$libzstd_LIBS"; then
+pkg_cv_libzstd_LIBS="$libzstd_LIBS"
+ elif test -n "$PKG_CONFIG"; then
+if test -n "$PKG_CONFIG" && \
+{ { $as_echo "$as_me:${as_lineno-$LINENO}: \$PKG_CONFIG --exists 
--print-errors \"libzstd\""; } >&5
+  ($PKG_CONFIG --exists --print-errors "libzstd") 2>&5
+  ac_status=$?
+  $as_echo "$as_me:${as_lineno-$LINENO}: \$? = $ac_status" >&5
+  test $ac_status = 0; }; then
+  pkg_cv_libzstd_LIBS=`$PKG_CONFIG --libs "libzstd" 2>/dev/null`
+ test "x$?" != "x0" && pkg_failed=yes
+else
+  pkg_failed=yes
+fi
+ else
+pkg_failed=untried
+fi
+
+
+
+if test $pkg_failed = yes; then
+   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+
+if $PKG_CONFIG --atleast-pkgconfig-version 0.20; then
+_pkg_short_errors_supported=yes
+else
+_pkg_short_errors_supported=no
+fi
+if test $_pkg_short_errors_supported = yes; then
+   libzstd_PKG_ERRORS=`$PKG_CONFIG --short-errors --print-errors 
--cflags --libs "libzstd" 2>&1`
+else
+   libzstd_PKG_ERRORS=`$PKG_CONFIG --print-errors --cflags --libs 
"libzstd" 2>&1`
+fi
+   # Put the nasty error message in config.log where it belongs
+   echo "$libzstd_PKG_ERRORS" >&5
+
+   true
+elif test $pkg_failed = untried; then
+   { $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
+$as_echo "no" >&6; }
+   true
+else
+   libzstd_CFLAGS=$pkg_cv_libzstd_CFLAGS
+   libzstd_LIBS=$pkg_cv_libzstd_LIBS
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: yes" >&5
+$as_echo "yes" >&6; }
+   true
+fi
+if test -z "$libzstd_PKG_ERRORS"
+then
+zlib="$zlib -DHAVE_ZSTD $libzstd_CFLAGS $libzstd_LIBS"
+else
+{ $as_echo "$as_me:${as_lineno-$LINENO}: WARNING: \"disabling zstd 
decompression: $libzstd_PKG_ERRORS\"" >&5
+$as_echo "$as_me: WARNING: \"disabling zstd decompression: 
$libzstd_PKG_ERRORS\"" >&2;}
+fi
+

Re: [PATCH v2 5/5] libxenguest: simplify kernel decompression

2021-01-21 Thread Wei Liu

On Tue, Jan 19, 2021 at 04:17:12PM +0100, Jan Beulich wrote:
> In all cases the kernel build makes available the uncompressed size in
> the final 4 bytes of the bzImage payload. Utilize this to avoid
> repeated realloc()ing of the output buffer.
> 
> As a side effect this also addresses the previous mistaken return of 0
> (success) from xc_try_{bzip2,lzma,xz}_decode() in case
> xc_dom_register_external() would have failed.
> 
> As another side effect this also addresses the first error path of
> _xc_try_lzma_decode() previously bypassing lzma_end().
> 
> Signed-off-by: Jan Beulich 

I think the code changes are correct:

Acked-by: Wei Liu 

But a second pair of eyes would be useful here since this patch is
complex.

Wei.

Re: [PATCH v2 1/5] libxenguest: support zstd compressed kernels

2021-01-21 Thread Wei Liu

On Thu, Jan 21, 2021 at 04:05:39PM +0100, Jan Beulich wrote:
> On 21.01.2021 16:01, Wei Liu wrote:
> > On Tue, Jan 19, 2021 at 04:15:25PM +0100, Jan Beulich wrote:
> >> This follows the logic used for other decompression methods utilizing an
> >> external library, albeit here we can't ignore the 32-bit size field
> >> appended to the compressed image - its presence causes decompression to
> >> fail. Leverage the field instead to allocate the output buffer in one
> >> go, i.e. without incrementally realloc()ing.
> >>
> >> Note that, where possible, instead of #ifdef-ing xen/*.h inclusions,
> >> they get removed.
> >>
> >> Signed-off-by: Jan Beulich 
> > 
> > 
> > Acked-by: Wei Liu 
> 
> Thanks, but I'm about to send v2.5 to address the issue reported
> by Michael Young (adjusting configure{.ac,} only).
> 
> >> ---
> >> Note for committer: As an alternative to patching tools/configure here,
> >> autoconf may want re-running.
> > 
> > Noted. FWIW I use Debian 10 to generate configure. If I don't get around
> > doing it someone who runs the same system should be able to help.
> 
> Well, the version I've been using to re-generate isn't identical
> to yours, but allows easily filtering out and discarding the few
> extra differences, which is why I dared to provide a diff for
> tools/configure in the first place. IOW if I would commit this
> myself, all I would ask that you or someone else using the
> "original" autoconf to generate the committed version to double
> check.

This works too.

Wei.

> 
> Jan

Re: Xen Security Advisory 360 v1 - IRQ vector leak on x86

2021-01-21 Thread Jan Beulich

On 21.01.2021 16:05, Roger Pau Monné wrote:
> On Thu, Jan 21, 2021 at 03:50:55PM +0100, Jan Beulich wrote:
>> On 21.01.2021 15:34, Roger Pau Monné wrote:
>>> On Thu, Jan 21, 2021 at 03:20:12PM +0100, Marek Marczykowski-Górecki wrote:
 On Thu, Jan 21, 2021 at 02:10:48PM +, Xen.org security team wrote:
> Xen Security Advisory XSA-360
>
> IRQ vector leak on x86
>
> ISSUE DESCRIPTION
> =
>
> A x86 HVM guest with PCI pass through devices can force the allocation
> of all IDT vectors on the system by rebooting itself with MSI or MSI-X
> capabilities enabled and entries setup.

 (...)

> MITIGATION
> ==
>
> Not running HVM guests with PCI pass through devices will avoid the
> vulnerability.  Note that even non-malicious guests can trigger this
> vulnerability as part of normal operation.

 Does the 'on_reboot="destroy"' mitigate the issue too? Or on_soft_reset?
>>>
>>> Kind of. Note you will still leak the in use vectors when the guest is
>>> destroyed, but that would prevent the guest from entering a reboot
>>> loop and exhausting all vectors on the system unless the admin starts
>>> it again.
>>>
>>> In that case I think the premise of a guest 'rebooting itself' doesn't
>>> apply anymore, since the guest won't be able to perform such
>>> operation.
>>
>> And how exactly would an admin tell a guest from rebooting for
>> fair reasons from one rebooting for malicious reasons? To me,
>> setting 'on_reboot="destroy"' would imply there's then some
>> other mechanism to restart the guest (possibly with some delay),
>> or else a reboot attempt by this guest would effectively be a
>> DoS to its users.
> 
> Well, I would expect there are deployments or configurations that
> simply don't expect (some) domains to reboot themselves. Ie: for
> example you won't expect driver domains to restart themselves I think,
> and hence you could safely use on_reboot="destroy" in that case to
> mitigate a compromised driver domain from exploiting this
> vulnerability.

Otoh a driver domain may warrant 'oncrash="restart"', to limit
downtime of depending domains. Or, like Xen does by default, a
driver domain may invoke its own restart when crashed.

Jan

Re: Xen Security Advisory 360 v1 - IRQ vector leak on x86

2021-01-21 Thread Roger Pau Monné

On Thu, Jan 21, 2021 at 03:50:55PM +0100, Jan Beulich wrote:
> On 21.01.2021 15:34, Roger Pau Monné wrote:
> > On Thu, Jan 21, 2021 at 03:20:12PM +0100, Marek Marczykowski-Górecki wrote:
> >> On Thu, Jan 21, 2021 at 02:10:48PM +, Xen.org security team wrote:
> >>> Xen Security Advisory XSA-360
> >>>
> >>> IRQ vector leak on x86
> >>>
> >>> ISSUE DESCRIPTION
> >>> =
> >>>
> >>> A x86 HVM guest with PCI pass through devices can force the allocation
> >>> of all IDT vectors on the system by rebooting itself with MSI or MSI-X
> >>> capabilities enabled and entries setup.
> >>
> >> (...)
> >>
> >>> MITIGATION
> >>> ==
> >>>
> >>> Not running HVM guests with PCI pass through devices will avoid the
> >>> vulnerability.  Note that even non-malicious guests can trigger this
> >>> vulnerability as part of normal operation.
> >>
> >> Does the 'on_reboot="destroy"' mitigate the issue too? Or on_soft_reset?
> > 
> > Kind of. Note you will still leak the in use vectors when the guest is
> > destroyed, but that would prevent the guest from entering a reboot
> > loop and exhausting all vectors on the system unless the admin starts
> > it again.
> > 
> > In that case I think the premise of a guest 'rebooting itself' doesn't
> > apply anymore, since the guest won't be able to perform such
> > operation.
> 
> And how exactly would an admin tell a guest from rebooting for
> fair reasons from one rebooting for malicious reasons? To me,
> setting 'on_reboot="destroy"' would imply there's then some
> other mechanism to restart the guest (possibly with some delay),
> or else a reboot attempt by this guest would effectively be a
> DoS to its users.

Well, I would expect there are deployments or configurations that
simply don't expect (some) domains to reboot themselves. Ie: for
example you won't expect driver domains to restart themselves I think,
and hence you could safely use on_reboot="destroy" in that case to
mitigate a compromised driver domain from exploiting this
vulnerability.

Roger.

Re: [PATCH v2 1/5] libxenguest: support zstd compressed kernels

2021-01-21 Thread Jan Beulich

On 21.01.2021 16:01, Wei Liu wrote:
> On Tue, Jan 19, 2021 at 04:15:25PM +0100, Jan Beulich wrote:
>> This follows the logic used for other decompression methods utilizing an
>> external library, albeit here we can't ignore the 32-bit size field
>> appended to the compressed image - its presence causes decompression to
>> fail. Leverage the field instead to allocate the output buffer in one
>> go, i.e. without incrementally realloc()ing.
>>
>> Note that, where possible, instead of #ifdef-ing xen/*.h inclusions,
>> they get removed.
>>
>> Signed-off-by: Jan Beulich 
> 
> 
> Acked-by: Wei Liu 

Thanks, but I'm about to send v2.5 to address the issue reported
by Michael Young (adjusting configure{.ac,} only).

>> ---
>> Note for committer: As an alternative to patching tools/configure here,
>> autoconf may want re-running.
> 
> Noted. FWIW I use Debian 10 to generate configure. If I don't get around
> doing it someone who runs the same system should be able to help.

Well, the version I've been using to re-generate isn't identical
to yours, but allows easily filtering out and discarding the few
extra differences, which is why I dared to provide a diff for
tools/configure in the first place. IOW if I would commit this
myself, all I would ask that you or someone else using the
"original" autoconf to generate the committed version to double
check.

Jan

Re: [PATCH v2 4/5] libxenguest: drop redundant decompression declarations

2021-01-21 Thread Wei Liu

On Tue, Jan 19, 2021 at 04:16:53PM +0100, Jan Beulich wrote:
> The ones in xg_dom_decompress_unsafe.h suffice.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Wei Liu

Re: [PATCH v2 3/5] libxenguest: "standardize" LZO kernel decompression code

2021-01-21 Thread Wei Liu

On Tue, Jan 19, 2021 at 04:16:35PM +0100, Jan Beulich wrote:
> Add a DOMPRINTF() other methods have, indicating success. To facilitate
> this, introduce an "outsize" local variable and update *size as well as
> *blob only once done. The latter then also avoids leaving a pointer to
> freed memory in dom->kernel_blob in case of a decompression error.
> 
> Signed-off-by: Jan Beulich 

Acked-by: Wei Liu

Re: [PATCH v2 2/5] xen/decompress: make helper symbols static

2021-01-21 Thread Wei Liu

On Tue, Jan 19, 2021 at 04:15:53PM +0100, Jan Beulich wrote:
> The individual decompression CUs need to only surface their top level
> functions to other code. Arrange for everything else to be static, to
> make sure no undue uses of that code exist or will appear without
> explicitly noticing. (In some cases this also results in code size
> reduction, but since this is all init-only code this probably doesn't
> matter very much.)
> 
> In the LZO case also take the opportunity and convert u8 where lines
> get touched anyway.
> 
> The downside is that the top level functions will now be non-static
> in stubdom builds of libxenguest, but I think that's acceptable. This
> does require declaring them first, though, as the compiler warns about
> the lack of declarations.
> 
> Signed-off-by: Jan Beulich 
> Acked-by: Andrew Cooper 

Acked-by: Wei Liu

Re: [PATCH v2 1/5] libxenguest: support zstd compressed kernels

2021-01-21 Thread Wei Liu

On Tue, Jan 19, 2021 at 04:15:25PM +0100, Jan Beulich wrote:
> This follows the logic used for other decompression methods utilizing an
> external library, albeit here we can't ignore the 32-bit size field
> appended to the compressed image - its presence causes decompression to
> fail. Leverage the field instead to allocate the output buffer in one
> go, i.e. without incrementally realloc()ing.
> 
> Note that, where possible, instead of #ifdef-ing xen/*.h inclusions,
> they get removed.
> 
> Signed-off-by: Jan Beulich 


Acked-by: Wei Liu 

> ---
> Note for committer: As an alternative to patching tools/configure here,
> autoconf may want re-running.

Noted. FWIW I use Debian 10 to generate configure. If I don't get around
doing it someone who runs the same system should be able to help.

Re: [PATCH v2 4/4] tools/libs: Apply MSR policy to a guest

2021-01-21 Thread Wei Liu

On Wed, Jan 20, 2021 at 05:49:12PM -0500, Boris Ostrovsky wrote:
> When creating a guest, if ignore_msrs option has been specified,
> apply it to guest's MSR policy.
> 
> Signed-off-by: Boris Ostrovsky 

Acked-by: Wei Liu

Re: [PATCH v2 1/4] xl: Add support for ignore_msrs option

2021-01-21 Thread Wei Liu

On Wed, Jan 20, 2021 at 05:49:09PM -0500, Boris Ostrovsky wrote:
> This option allows guest administrator specify what should happen when
> guest accesses an MSR which is not explicitly emulated by the hypervisor.
> 
> Signed-off-by: Boris Ostrovsky 
> ---
>  docs/man/xl.cfg.5.pod.in | 20 +++-
>  tools/libs/light/libxl_types.idl |  7 +++
>  tools/xl/xl_parse.c  |  7 +++
>  3 files changed, 33 insertions(+), 1 deletion(-)

It is mainly missing a #define LIBXL_HAVE_XXX in libxl.h.

Other than that, this patch looks good to me. If you end up resending
this series, please fix that issue.

If other patches are all reviewed, you can provide some text to be
merged into this patch.

Wei.

Re: Xen Security Advisory 360 v1 - IRQ vector leak on x86

2021-01-21 Thread Jan Beulich

On 21.01.2021 15:34, Roger Pau Monné wrote:
> On Thu, Jan 21, 2021 at 03:20:12PM +0100, Marek Marczykowski-Górecki wrote:
>> On Thu, Jan 21, 2021 at 02:10:48PM +, Xen.org security team wrote:
>>> Xen Security Advisory XSA-360
>>>
>>> IRQ vector leak on x86
>>>
>>> ISSUE DESCRIPTION
>>> =
>>>
>>> A x86 HVM guest with PCI pass through devices can force the allocation
>>> of all IDT vectors on the system by rebooting itself with MSI or MSI-X
>>> capabilities enabled and entries setup.
>>
>> (...)
>>
>>> MITIGATION
>>> ==
>>>
>>> Not running HVM guests with PCI pass through devices will avoid the
>>> vulnerability.  Note that even non-malicious guests can trigger this
>>> vulnerability as part of normal operation.
>>
>> Does the 'on_reboot="destroy"' mitigate the issue too? Or on_soft_reset?
> 
> Kind of. Note you will still leak the in use vectors when the guest is
> destroyed, but that would prevent the guest from entering a reboot
> loop and exhausting all vectors on the system unless the admin starts
> it again.
> 
> In that case I think the premise of a guest 'rebooting itself' doesn't
> apply anymore, since the guest won't be able to perform such
> operation.

And how exactly would an admin tell a guest from rebooting for
fair reasons from one rebooting for malicious reasons? To me,
setting 'on_reboot="destroy"' would imply there's then some
other mechanism to restart the guest (possibly with some delay),
or else a reboot attempt by this guest would effectively be a
DoS to its users.

Jan

Re: [PATCH v7 7/7] libxl / libxlu: support 'xl pci-attach/detach' by name

2021-01-21 Thread Wei Liu

On Tue, Jan 05, 2021 at 05:46:42PM +, Paul Durrant wrote:
> From: Paul Durrant 
> 
> This patch modifies libxlu_pci_parse_spec_string() to parse the new 'name'
> parameter of PCI_SPEC_STRING detailed in the updated documention in
> xl-pci-configuration(5) and populate the 'name' field of 'libxl_device_pci'.
> 
> If the 'name' field is non-NULL then both libxl_device_pci_add() and
> libxl_device_pci_remove() will use it to look up the device BDF in
> the list of assignable devices.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Wei Liu

Re: [PATCH v7 5/7] xl: support naming of assignable devices

2021-01-21 Thread Wei Liu

On Tue, Jan 05, 2021 at 05:46:40PM +, Paul Durrant wrote:
> From: Paul Durrant 
> 
> With this patch applied 'xl pci-assignable-add' will take an optional '--name'
> parameter, 'xl pci-assignable-remove' can be passed either a BDF or a name and
> 'xl pci-assignable-list' will take a optional '--show-names' flag which
> determines whether names are displayed in its output.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Wei Liu

Re: [PATCH v7 4/7] libxl: add 'name' field to 'libxl_device_pci' in the IDL...

2021-01-21 Thread Wei Liu

On Tue, Jan 05, 2021 at 05:46:39PM +, Paul Durrant wrote:
> From: Paul Durrant 
> 
> ... and modify libxl_pci_bdf_assignable_add/remove/list() to make use of it.
> 
> libxl_pci_bdf_assignable_add() will store the name of the device in xenstore
> if the field is specified (i.e. non-NULL) and 
> libxl_pci_bdf_assignable_remove()
> will remove devices specified only by name, looking up the BDF as necessary.
> 
> libxl_pci_bdf_assignable_list() will also populate the 'name' field if a name
> was stored by libxl_pci_bdf_assignable_add().
> 
> NOTE: This patch also fixes whitespace in the declaration of 
> 'libxl_device_pci'
>   in the IDL.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Wei Liu

Re: [PATCH v7 3/7] libxl: stop setting 'vdevfn' in pci_struct_fill()

2021-01-21 Thread Wei Liu

On Tue, Jan 05, 2021 at 05:46:38PM +, Paul Durrant wrote:
> From: Paul Durrant 
> 
> There are only two call-sites. One always sets it to 0 (which is unnecessary
> as the structure is already initialized to zero) and the other can simply set
> the 'vdevfn' field directly (after proper structure initialization), avoiding
> the need for a local variable.
> 
> A subsequent patch will also make use of pci_struct_fill() in a context
> where 'vdevfn' may already have been set.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Wei Liu

Re: Xen Security Advisory 360 v1 - IRQ vector leak on x86

2021-01-21 Thread Roger Pau Monné

On Thu, Jan 21, 2021 at 03:20:12PM +0100, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 21, 2021 at 02:10:48PM +, Xen.org security team wrote:
> > Xen Security Advisory XSA-360
> > 
> > IRQ vector leak on x86
> > 
> > ISSUE DESCRIPTION
> > =
> > 
> > A x86 HVM guest with PCI pass through devices can force the allocation
> > of all IDT vectors on the system by rebooting itself with MSI or MSI-X
> > capabilities enabled and entries setup.
> 
> (...)
> 
> > MITIGATION
> > ==
> > 
> > Not running HVM guests with PCI pass through devices will avoid the
> > vulnerability.  Note that even non-malicious guests can trigger this
> > vulnerability as part of normal operation.
> 
> Does the 'on_reboot="destroy"' mitigate the issue too? Or on_soft_reset?

Kind of. Note you will still leak the in use vectors when the guest is
destroyed, but that would prevent the guest from entering a reboot
loop and exhausting all vectors on the system unless the admin starts
it again.

In that case I think the premise of a guest 'rebooting itself' doesn't
apply anymore, since the guest won't be able to perform such
operation.

Roger.

Re: Xen Security Advisory 360 v1 - IRQ vector leak on x86

2021-01-21 Thread Jan Beulich

On 21.01.2021 15:20, Marek Marczykowski-Górecki wrote:
> On Thu, Jan 21, 2021 at 02:10:48PM +, Xen.org security team wrote:
>> MITIGATION
>> ==
>>
>> Not running HVM guests with PCI pass through devices will avoid the
>> vulnerability.  Note that even non-malicious guests can trigger this
>> vulnerability as part of normal operation.
> 
> Does the 'on_reboot="destroy"' mitigate the issue too? Or on_soft_reset?

I don't think so, no.

Jan

Re: Xen Security Advisory 360 v1 - IRQ vector leak on x86

2021-01-21 Thread Marek Marczykowski-Górecki

On Thu, Jan 21, 2021 at 02:10:48PM +, Xen.org security team wrote:
> Xen Security Advisory XSA-360
> 
> IRQ vector leak on x86
> 
> ISSUE DESCRIPTION
> =
> 
> A x86 HVM guest with PCI pass through devices can force the allocation
> of all IDT vectors on the system by rebooting itself with MSI or MSI-X
> capabilities enabled and entries setup.

(...)

> MITIGATION
> ==
> 
> Not running HVM guests with PCI pass through devices will avoid the
> vulnerability.  Note that even non-malicious guests can trigger this
> vulnerability as part of normal operation.

Does the 'on_reboot="destroy"' mitigate the issue too? Or on_soft_reset?

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?


signature.asc
Description: PGP signature

Xen Security Advisory 360 v1 - IRQ vector leak on x86

2021-01-21 Thread Xen . org security team

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

Xen Security Advisory XSA-360

IRQ vector leak on x86

ISSUE DESCRIPTION
=

A x86 HVM guest with PCI pass through devices can force the allocation
of all IDT vectors on the system by rebooting itself with MSI or MSI-X
capabilities enabled and entries setup.

Such reboots will leak any vectors used by the MSI(-X) entries that the
guest might had enabled, and hence will lead to vector exhaustion on the
system, not allowing further PCI pass through devices to work properly.

IMPACT
==

HVM guests with PCI pass through devices can mount a Denial of Service (DoS)
attack affecting the pass through of PCI devices to other guests or the
hardware domain.  In the latter case this would affect the entire host.

VULNERABLE SYSTEMS
==

Xen versions 4.12.3, 4.12.4, and all versions from 4.13.1 onwards are
vulnerable.  Xen version 4.13.0 and all versions up to 4.12.2 are not
affected.

Only x86 systems running HVM guests with PCI pass through devices are
vulnerable.

MITIGATION
==

Not running HVM guests with PCI pass through devices will avoid the
vulnerability.  Note that even non-malicious guests can trigger this
vulnerability as part of normal operation.

RESOLUTION
==

Applying the appropriate attached patch resolves this issue.

Note that patches for released versions are generally prepared to
apply to the stable branches, and may not apply cleanly to the most
recent release tarball.  Downstreams are encouraged to update to the
tip of the stable branch before applying these patches.

xsa360.patch   xen-unstable
xsa360-4.14.patch  Xen 4.14 - 4.12

$ sha256sum xsa360*
c874ad2b9edb0791ac975735306d055b1916f4acbc59e6f1550fbf33223d6106  xsa360.meta
592f3afda63777d31844e0e34d85fbe387a62d59fa7903ee19b22a98fba68894  xsa360.patch
809515011efb781a2a8742e9acfd76412d3920c2d4142bb187588cd36f77383e  
xsa360-4.14.patch
$

CREDITS
===

This issue was discovered by James McCoy, debugged in combination with
Samuel Verschelde of Vates, and recognised as a security issue by Roger
Pau Monné of Citrix.

NOTE REGARDING LACK OF EMBARGO
==

This was reported and debugged publicly, before the security
implications were apparent.
-BEGIN PGP SIGNATURE-

iQFABAEBCAAqFiEEI+MiLBRfRHX6gGCng/4UyVfoK9kFAmAJixQMHHBncEB4ZW4u
b3JnAAoJEIP+FMlX6CvZh4cH/RyA5POGYEJEj4jHUFK+UmT08Bo6igUBMyJSvAJs
T81eb35E2E2I8P35L7q8OOuLIGPWnTXOGPRnwizr2YF7UhmMm/773q5ellShUBgm
SHtYl+btRaAp6gXB1PhgiETN3EH3aRgn89YBAQmg3U4Zb1RUiB2P2x6pVEGjMfBw
Ks3Zj/ElCtfJcBA6xerNNLuqhwamueCEukw5b8eEHnop+y7TuLordpGGMybpQctx
m04lp7zuJDAeshf47wlMQps79Ysx72CaThVKe/9A09z/c2mcR3m+NbieP7PJPggr
n1I6QEaSUuapszkj+lC/L05tiyHdjXkoNAHwtdPr8jKtbKo=
=YdXv
-END PGP SIGNATURE-


xsa360.meta
Description: Binary data


xsa360.patch
Description: Binary data


xsa360-4.14.patch
Description: Binary data

Re: [PATCH V4 21/24] xen/ioreq: Make x86's send_invalidate_req() common

2021-01-21 Thread Jan Beulich

On 18.01.2021 11:31, Paul Durrant wrote:
>> -Original Message-
>> From: Xen-devel  On Behalf Of 
>> Oleksandr Tyshchenko
>> Sent: 12 January 2021 21:52
>> To: xen-devel@lists.xenproject.org
>> Cc: Oleksandr Tyshchenko ; Jan Beulich 
>> ; Andrew
>> Cooper ; Roger Pau Monné ; 
>> Wei Liu ;
>> George Dunlap ; Ian Jackson ; 
>> Julien Grall
>> ; Stefano Stabellini ; Paul Durrant 
>> ; Julien
>> Grall 
>> Subject: [PATCH V4 21/24] xen/ioreq: Make x86's send_invalidate_req() common
>>
>> From: Oleksandr Tyshchenko 
>>
>> As the IOREQ is a common feature now and we also need to
>> invalidate qemu/demu mapcache on Arm when the required condition
>> occurs this patch moves this function to the common code
>> (and remames it to ioreq_signal_mapcache_invalidate).
>> This patch also moves per-domain qemu_mapcache_invalidate
>> variable out of the arch sub-struct (and drops "qemu" prefix).
>>
>> We don't put this variable inside the #ifdef CONFIG_IOREQ_SERVER
>> at the end of struct domain, but in the hole next to the group
>> of 5 bools further up which is more efficient.
>>
>> The subsequent patch will add mapcache invalidation handling on Arm.
>>
>> Signed-off-by: Oleksandr Tyshchenko 
>> CC: Julien Grall 
>> [On Arm only]
>> Tested-by: Wei Chen 
> 
> Reviewed-by: Paul Durrant 

Applicable parts
Acked-by: Jan Beulich 

Jan

Re: [PATCH V4 16/24] xen/mm: Handle properly reference in set_foreign_p2m_entry() on Arm

2021-01-21 Thread Jan Beulich

On 12.01.2021 22:52, Oleksandr Tyshchenko wrote:
> From: Oleksandr Tyshchenko 
> 
> This patch implements reference counting of foreign entries in
> in set_foreign_p2m_entry() on Arm. This is a mandatory action if
> we want to run emulator (IOREQ server) in other than dom0 domain,
> as we can't trust it to do the right thing if it is not running
> in dom0. So we need to grab a reference on the page to avoid it
> disappearing.
> 
> It is valid to always pass "p2m_map_foreign_rw" type to
> guest_physmap_add_entry() since the current and foreign domains
> would be always different. A case when they are equal would be
> rejected by rcu_lock_remote_domain_by_id(). Besides the similar
> comment in the code put a respective ASSERT() to catch incorrect
> usage in future.
> 
> It was tested with IOREQ feature to confirm that all the pages given
> to this function belong to a domain, so we can use the same approach
> as for XENMAPSPACE_gmfn_foreign handling in xenmem_add_to_physmap_one().
> 
> This involves adding an extra parameter for the foreign domain to
> set_foreign_p2m_entry() and a helper to indicate whether the arch
> supports the reference counting of foreign entries and the restriction
> for the hardware domain in the common code can be skipped for it.
> 
> Signed-off-by: Oleksandr Tyshchenko 
> CC: Julien Grall 
> [On Arm only]
> Tested-by: Wei Chen 

In principle x86 parts
Reviewed-by: Jan Beulich 
However, being a maintainer of ...

> --- a/xen/include/asm-x86/p2m.h
> +++ b/xen/include/asm-x86/p2m.h
> @@ -382,6 +382,22 @@ struct p2m_domain {
>  #endif
>  #include 
>  
> +static inline bool arch_acquire_resource_check(struct domain *d)
> +{
> +/*
> + * The reference counting of foreign entries in set_foreign_p2m_entry()
> + * is not supported for translated domains on x86.
> + *
> + * FIXME: Until foreign pages inserted into the P2M are properly
> + * reference counted, it is unsafe to allow mapping of
> + * resource pages unless the caller is the hardware domain.
> + */
> +if ( paging_mode_translate(d) && !is_hardware_domain(d) )
> +return false;
> +
> +return true;
> +}


... this code, I'd like to ask that such constructs be avoided
and this be a single return statement:

return !paging_mode_translate(d) || is_hardware_domain(d);

I also think you may want to consider dropping the initial
"The" from the comment. I'm further unconvinced "foreign
entries" needs saying when set_foreign_p2m_entry() deals with
exclusively such. In the end the original comment moved here
would probably suffice, no need for any more additions than
perhaps a simple "(see set_foreign_p2m_entry())".

Jan

[PATCH v7 0/7] xl / libxl: named PCI pass-through devices

2021-01-21 Thread Nastya Vicodin

Hi, Wei!

Wei, could you please review?

Regards,
Anastasiia Lukianenko

Re: [PATCH v2 0/2] tools/ocaml/libs/xc: domid control

2021-01-21 Thread Christian Lindig

Acked-by: Christian Lindig 

I'm providing some feedback on https://github.com/edwintorok/xen/pull/1


From: Edwin Török 
Sent: 15 January 2021 22:28
To: xen-devel@lists.xenproject.org
Cc: Edwin Torok; Christian Lindig; David Scott; Ian Jackson; Wei Liu
Subject: [PATCH v2 0/2] tools/ocaml/libs/xc: domid control

For debugging/testing purposes we want to be able to control the domid
from the XAPI toolstack too. Xen supports this since a long time.

For convenience here is a tree with all patch series applied:
https://github.com/edwintorok/xen/pull/1


Edwin Török (2):
  tools/ocaml/xenstored: trim txhistory on xenbus reconnect
  tools/ocaml/libs/xc: backward compatible domid control at domain
creation time

 tools/ocaml/libs/xc/xenctrl.ml  | 5 -
 tools/ocaml/libs/xc/xenctrl.mli | 4 ++--
 tools/ocaml/libs/xc/xenctrl_stubs.c | 6 +++---
 tools/ocaml/xenstored/connection.ml | 2 +-
 tools/ocaml/xenstored/history.ml| 4 
 tools/ocaml/xenstored/process.ml| 4 ++--
 6 files changed, 16 insertions(+), 9 deletions(-)

--
2.29.2

Re: [PATCH v4 0/4] tools/ocaml/xenstored: optimizations

2021-01-21 Thread Christian Lindig

Acked-by: Christian Lindig 

I am providing feedback on https://github.com/edwintorok/xen/pull/1.

In general: this is a large patch and therefore difficult to review for 
correctness. However:

* It comes with a lot of testing and was fuzz-tested
* It improves building OCaml xenstore
* It improves functional and performance problems

I welcome this work and think it ends a period of stagnation of this code base. 
This applies not just to this patch but the other patches sent out by Edwin in 
the context of this.



From: Edwin Török 
Sent: 15 January 2021 22:28
To: xen-devel@lists.xenproject.org
Cc: Edwin Torok; Christian Lindig; David Scott; Ian Jackson; Wei Liu
Subject: [PATCH v4 0/4] tools/ocaml/xenstored: optimizations

Various speed optimizations that have already been posted,
but committing them was delayed to avoid conflicts with XSAs.
The XSAs are out, so these are ready to go now.

The switch to Maps may expose bugs in certain xenstored clients,
which previously relied on iteration order of the DIRECTORY response.

In our testing we found one such case, which turned out to be a bug
in a testsuite (it always dropped the 1st xenstore key).

For convenience here is a tree with all patch series applied:
https://github.com/edwintorok/xen/pull/1

Edwin Török (4):
  tools/ocaml/xenstored: replace hand rolled GC with weak GC references
  tools/ocaml/xenstored: backport find_opt/update from 4.06
  tools/ocaml/xenstored: use more efficient node trees
  tools/ocaml/xenstored: use more efficient tries

 tools/ocaml/xenstored/connection.ml  |  3 --
 tools/ocaml/xenstored/connections.ml |  2 +-
 tools/ocaml/xenstored/history.ml | 14 --
 tools/ocaml/xenstored/stdext.ml  | 19 
 tools/ocaml/xenstored/store.ml   | 51 +---
 tools/ocaml/xenstored/symbol.ml  | 70 +++-
 tools/ocaml/xenstored/symbol.mli | 22 +++--
 tools/ocaml/xenstored/trie.ml| 61 +++-
 tools/ocaml/xenstored/trie.mli   | 26 +--
 tools/ocaml/xenstored/xenstored.ml   | 16 +--
 10 files changed, 109 insertions(+), 175 deletions(-)

--
2.29.2

Re: [PATCH V4 09/24] xen/ioreq: Make x86's IOREQ related dm-op handling common

2021-01-21 Thread Oleksandr




On 21.01.21 12:27, Jan Beulich wrote:

Hi Jan


On 21.01.2021 11:23, Oleksandr wrote:

I would like to clarify regarding do_dm_op() which is identical for both
arches and could *probably* be moved to the common code (we can return
common dm.c back to put it there) and make dm_op() global.
Would you/Paul be happy with that change? Or there are some reasons
(which we are not aware of yet) for not doing it this way?

Probably reasonable to do; the only reason not to that I
could see is that then dm_op() has to become non-static.
Thank you for the clarification. So I will make this change for V5 if no 
objections during these days.

I am going to leave your ack, please let me know if you think otherwise.


--
Regards,

Oleksandr Tyshchenko

Null scheduler and vwfi native problem

2021-01-21 Thread Anders Törnqvist


Hi,

I see a problem with destroy and restart of a domain. Interrupts are not 
available when trying to restart a domain.


The situation seems very similar to the thread "null scheduler bug" 
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg01213.html.


The target system is a iMX8-based ARM board and Xen is a 4.13.0 version 
built from https://source.codeaurora.org/external/imx/imx-xen.git.


Xen is booted with sched=null vwfi=native.
One physical CPU core is pinned to the domu.
Some interrupts are passed through to the domu.

When destroying the domain with xl destroy etc it does not complain but 
then when trying to restart the domain

again with a "xl create " I get:
(XEN) IRQ 210 is already used by domain 1

"xl list" does not contain the domain.

Repeating the "xl create" command 5-10 times eventually starts the 
domain without complaining about the IRQ.


Inspired from the discussion in the thread above I have put printks in 
the xen/common/domain.c file.
In the function domain_destroy I have a printk("End of domain_destroy 
function\n") in the end.
In the function complete_domain_destroy have a printk("Begin of 
complete_domain_destroy function\n") in the beginning.


With these printouts I get at "xl destroy":
(XEN) End of domain_destroy function

So it seems like the function complete_domain_destroy is not called.

"xl create" results in:
(XEN) IRQ 210 is already used by domain 1
(XEN) End of domain_destroy function

Then repeated "xl create" looks the same until after a few tries I also get:
(XEN) Begin of complete_domain_destroy function

After that the next "xl create" creates the domain.


I have also applied the patch from 
https://lists.xenproject.org/archives/html/xen-devel/2018-09/msg02469.html. 
This does seem to change the results.


Starting the system without "sched=null vwfi=native" does not result in 
the problem.


BR
Anders

[qemu-mainline test] 158549: regressions - FAIL

2021-01-21 Thread osstest service owner

flight 158549 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/158549/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-libvirt-vhd 19 guest-start/debian.repeat fail REGR. vs. 152631
 test-amd64-amd64-xl-qcow2   21 guest-start/debian.repeat fail REGR. vs. 152631
 test-armhf-armhf-xl-vhd 17 guest-start/debian.repeat fail REGR. vs. 152631

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-libvirt  8 xen-boot   fail pass in 158536

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail in 158536 like 
152631
 test-armhf-armhf-libvirt15 migrate-support-check fail in 158536 never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stopfail like 152631
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 152631
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 152631
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 152631
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 152631
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 152631
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 15 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 16 saverestore-support-checkfail never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass

version targeted for testing:
 qemuu48202c712412c803ddb56365c7bca322aa4e7506
baseline version:
 qemuu1d806cef0e38b5db8347a8e12f214d543204a314

Last test of basis   152631  2020-08-20 09:07:46 Z  154 days
Failing since152659  2020-08-21 14:07:39 Z  152 days  316 attempts
Testing same since   158529  2021-01-19 19:38:11 Z1 days3 attempts


355 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm

Re: [PATCH V4 09/24] xen/ioreq: Make x86's IOREQ related dm-op handling common

2021-01-21 Thread Jan Beulich

On 21.01.2021 11:23, Oleksandr wrote:
> I would like to clarify regarding do_dm_op() which is identical for both 
> arches and could *probably* be moved to the common code (we can return 
> common dm.c back to put it there) and make dm_op() global.
> Would you/Paul be happy with that change? Or there are some reasons 
> (which we are not aware of yet) for not doing it this way?

Probably reasonable to do; the only reason not to that I
could see is that then dm_op() has to become non-static.

Jan

Re: [PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Jan Beulich

On 21.01.2021 10:56, Julien Grall wrote:
> Hi Jan,
> 
> On 21/01/2021 09:43, Jan Beulich wrote:
>> On 21.01.2021 10:30, Michal Orzel wrote:
>>> Fix compilation error when enabling early printk, introduced
>>> by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
>>> ```
>>> debug.S: Assembler messages:
>>> debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
>>> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
>>> debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
>>> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
>>> ```
>>>
>>> The fix is to include header  which now contains
>>> definitions for page/size/mask etc.
>>>
>>> Signed-off-by: Michal Orzel 
>>
>> I'm sorry for the breakage, but I wonder how I should have noticed
>> the issue. In all the Arm .config-s I'm routinely building I can't
>> even see ...
>>
>>> --- a/xen/include/asm-arm/early_printk.h
>>> +++ b/xen/include/asm-arm/early_printk.h
>>> @@ -10,6 +10,7 @@
>>>   #ifndef __ARM_EARLY_PRINTK_H__
>>>   #define __ARM_EARLY_PRINTK_H__
>>>   
>>> +#include 
>>>   
>>>   #ifdef CONFIG_EARLY_PRINTK
>>
>> ... a respective Kconfig setting, i.e. it's not like I simply
>> failed to enable it.
> 
> EARLY_PRINTK is defined in arch/arm/Kconfig.debug and is selected when 
> you specify the UART to use.
> 
> Assuming you are only build testing, you could add the following for 
> testing EARLY_PRINTK:
> 
> CONFIG_DEBUG=y
> CONFIG_EARLY_UART_CHOICE_8250=y
> CONFIG_EARLY_UART_8250=y
> CONFIG_EARLY_PRINTK=y
> CONFIG_EARLY_UART_BASE_ADDRESS=
> CONFIG_EARLY_UART_8250_REG_SHIFT=0
> CONFIG_EARLY_PRINTK_INC="debug-8250.inc"

Ah yes, this works, thanks. The "optional" on the choice isn't
very helpful I suppose, because when going from an existing
.config one would neither find a setting presently turned off
in that .config, nor will there be a prompt.

I suppose any page-aligned base address is fine to use for my
build-testing-only purposes?

Jan

Re: [PATCH V4 09/24] xen/ioreq: Make x86's IOREQ related dm-op handling common

2021-01-21 Thread Oleksandr




On 20.01.21 18:21, Jan Beulich wrote:

Hi Jan


On 12.01.2021 22:52, Oleksandr Tyshchenko wrote:

From: Julien Grall 

As a lot of x86 code can be re-used on Arm later on, this patch
moves the IOREQ related dm-op handling to the common code.

The idea is to have the top level dm-op handling arch-specific
and call into ioreq_server_dm_op() for otherwise unhandled ops.
Pros:
- More natural than doing it other way around (top level dm-op
handling common).
- Leave compat_dm_op() in x86 code.
Cons:
- Code duplication. Both arches have to duplicate do_dm_op(), etc.

Also update XSM code a bit to let dm-op be used on Arm.

This support is going to be used on Arm to be able run device
emulator outside of Xen hypervisor.

Signed-off-by: Julien Grall 
Signed-off-by: Oleksandr Tyshchenko 
[On Arm only]
Tested-by: Wei Chen 

Assuming the moved code is indeed just being moved (which is
quite hard to ascertain by just looking at the diff),


I have checked and will double-check again.



applicable parts
Acked-by: Jan Beulich 


Thanks.

I would like to clarify regarding do_dm_op() which is identical for both 
arches and could *probably* be moved to the common code (we can return 
common dm.c back to put it there) and make dm_op() global.
Would you/Paul be happy with that change? Or there are some reasons 
(which we are not aware of yet) for not doing it this way?


Initial discussion happened in [1] (which, let say, suffers from the 
duplication) and more precise in [2].



[1] 
https://lore.kernel.org/xen-devel/1610488352-18494-15-git-send-email-olekst...@gmail.com/
[2] 
https://lore.kernel.org/xen-devel/alpine.DEB.2.21.2101191620050.14528@sstabellini-ThinkPad-T480s/


--
Regards,

Oleksandr Tyshchenko

Re: [PATCH v6 1/3] xen/arm: add support for run_in_exception_handler()

2021-01-21 Thread Jan Beulich

On 21.01.2021 10:50, Julien Grall wrote:
> Hi Jan,
> 
> On 21/01/2021 07:55, Jan Beulich wrote:
>> On 20.01.2021 19:20, Julien Grall wrote:
>>> On 16/01/2021 10:33, Juergen Gross wrote:
 Add support to run a function in an exception handler for Arm. Do it
 as on x86 via a bug_frame, but pass the function pointer via a
 register (this needs to be done that way, because inline asm support
 for 32-bit Arm lacks the needed functionality to reference an
 arbitrary function via the bugframe).
>>>
>>> I was going to commit the series, but then realized the commit message
>>> and comment needs some tweaking because technically GCC is supporting
>>> 'i' (I managed to get it working with -fno-pie).
>>>
>>> So how about:
>>>
>>> "This is needed to be done that way because GCC complains about invalid
>>> constraint when using a function pointer with "i" and PIE is enabled
>>> (default on most of GCC shipped with distro). Clang happily accepts it,
>>> so it may be a bug in GCC."
>>
>> May I ask why you think it's a bug in gcc? I'd rather consider it
>> a bug in clang. Taking the address of a symbol doesn't yield a
>> constant in PIC or PIE code, aiui.
> 
> I consider it a bug because we were using it as:
> 
> .pushsection .bug.frame
> 2b:
> .long (%0 - 2b)
> 
> So I expect the compiler to be able to find the displacement in both 
> cases as we don't need to know the exact address.
> 
> I think Clang is just clever enough to figure out we want a displacement.

If they take apart the contents of the asm(), this may be possible,
yes. (Did you try with -no-integrated-as, btw?) But taking apart
the asm() is a very risky game a compiler would play, as it may
easily break the programmer's intentions (or still fail to recognize
whether the use is okay, for the containing construct being too
complex).

> Do you have a suggestion of constraint that could resolve the issue?

No. Don't use -fpie (but I guess that's not an option for other
reasons).

Jan

Re: [PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Julien Grall


Hi Jan,

On 21/01/2021 09:43, Jan Beulich wrote:

On 21.01.2021 10:30, Michal Orzel wrote:

Fix compilation error when enabling early printk, introduced
by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
```
debug.S: Assembler messages:
debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
```

The fix is to include header  which now contains
definitions for page/size/mask etc.

Signed-off-by: Michal Orzel 


I'm sorry for the breakage, but I wonder how I should have noticed
the issue. In all the Arm .config-s I'm routinely building I can't
even see ...


--- a/xen/include/asm-arm/early_printk.h
+++ b/xen/include/asm-arm/early_printk.h
@@ -10,6 +10,7 @@
  #ifndef __ARM_EARLY_PRINTK_H__
  #define __ARM_EARLY_PRINTK_H__
  
+#include 
  
  #ifdef CONFIG_EARLY_PRINTK


... a respective Kconfig setting, i.e. it's not like I simply
failed to enable it.


EARLY_PRINTK is defined in arch/arm/Kconfig.debug and is selected when 
you specify the UART to use.


Assuming you are only build testing, you could add the following for 
testing EARLY_PRINTK:


CONFIG_DEBUG=y
CONFIG_EARLY_UART_CHOICE_8250=y
CONFIG_EARLY_UART_8250=y
CONFIG_EARLY_PRINTK=y
CONFIG_EARLY_UART_BASE_ADDRESS=
CONFIG_EARLY_UART_8250_REG_SHIFT=0
CONFIG_EARLY_PRINTK_INC="debug-8250.inc"

Cheers,

--
Julien Grall

Re: [PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Michal Orzel

Hi Jan,

On 21.01.2021 10:43, Jan Beulich wrote:
> On 21.01.2021 10:30, Michal Orzel wrote:
>> Fix compilation error when enabling early printk, introduced
>> by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
>> ```
>> debug.S: Assembler messages:
>> debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
>> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
>> debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
>> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
>> ```
>>
>> The fix is to include header  which now contains
>> definitions for page/size/mask etc.
>>
>> Signed-off-by: Michal Orzel 
> 
> I'm sorry for the breakage, but I wonder how I should have noticed
> the issue. In all the Arm .config-s I'm routinely building I can't
> even see ...
> 
This is not your fault. Nowadays it is becoming harder and harder to try all 
the possible combinations.
>> --- a/xen/include/asm-arm/early_printk.h
>> +++ b/xen/include/asm-arm/early_printk.h
>> @@ -10,6 +10,7 @@
>>  #ifndef __ARM_EARLY_PRINTK_H__
>>  #define __ARM_EARLY_PRINTK_H__
>>  
>> +#include 
>>  
>>  #ifdef CONFIG_EARLY_PRINTK
> 
> ... a respective Kconfig setting, i.e. it's not like I simply
> failed to enable it.
> 
> Jan
> 

Cheers,
Michal

Re: [PATCH V4 14/24] arm/ioreq: Introduce arch specific bits for IOREQ/DM features

2021-01-21 Thread Oleksandr




On 20.01.21 02:23, Stefano Stabellini wrote:

Hi Stefano



On Sun, 17 Jan 2021, Oleksandr wrote:

On 15.01.21 02:55, Stefano Stabellini wrote:

On Tue, 12 Jan 2021, Oleksandr Tyshchenko wrote:

From: Julien Grall 

This patch adds basic IOREQ/DM support on Arm. The subsequent
patches will improve functionality and add remaining bits.

The IOREQ/DM features are supposed to be built with IOREQ_SERVER
option enabled, which is disabled by default on Arm for now.

Please note, the "PIO handling" TODO is expected to left unaddressed
for the current series. It is not an big issue for now while Xen
doesn't have support for vPCI on Arm. On Arm64 they are only used
for PCI IO Bar and we would probably want to expose them to emulator
as PIO access to make a DM completely arch-agnostic. So "PIO handling"
should be implemented when we add support for vPCI.

Signed-off-by: Julien Grall 
Signed-off-by: Oleksandr Tyshchenko 
[On Arm only]
Tested-by: Wei Chen 

---
Please note, this is a split/cleanup/hardening of Julien's PoC:
"Add support for Guest IO forwarding to a device emulator"

Changes RFC -> V1:
 - was split into:
   - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
   - xen/mm: Handle properly reference in set_foreign_p2m_entry() on
Arm
 - update patch description
 - update asm-arm/hvm/ioreq.h according to the newly introduced arch
functions:
   - arch_hvm_destroy_ioreq_server()
   - arch_handle_hvm_io_completion()
 - update arch files to include xen/ioreq.h
 - remove HVMOP plumbing
 - rewrite a logic to handle properly case when hvm_send_ioreq()
returns IO_RETRY
 - add a logic to handle properly handle_hvm_io_completion() return
value
 - rename handle_mmio() to ioreq_handle_complete_mmio()
 - move paging_mark_pfn_dirty() to asm-arm/paging.h
 - remove forward declaration for hvm_ioreq_server in asm-arm/paging.h
 - move try_fwd_ioserv() to ioreq.c, provide stubs if
!CONFIG_IOREQ_SERVER
 - do not remove #ifdef CONFIG_IOREQ_SERVER in memory.c for guarding
xen/ioreq.h
 - use gdprintk in try_fwd_ioserv(), remove unneeded prints
 - update list of #include-s
 - move has_vpci() to asm-arm/domain.h
 - add a comment (TODO) to unimplemented yet handle_pio()
 - remove hvm_mmio_first(last)_byte() and hvm_ioreq_(page/vcpu/server)
structs
   from the arch files, they were already moved to the common code
 - remove set_foreign_p2m_entry() changes, they will be properly
implemented
   in the follow-up patch
 - select IOREQ_SERVER for Arm instead of Arm64 in Kconfig
 - remove x86's realmode and other unneeded stubs from xen/ioreq.h
 - clafify ioreq_t p.df usage in try_fwd_ioserv()
 - set ioreq_t p.count to 1 in try_fwd_ioserv()

Changes V1 -> V2:
 - was split into:
   - arm/ioreq: Introduce arch specific bits for IOREQ/DM features
   - xen/arm: Stick around in leave_hypervisor_to_guest until I/O has
completed
 - update the author of a patch
 - update patch description
 - move a loop in leave_hypervisor_to_guest() to a separate patch
 - set IOREQ_SERVER disabled by default
 - remove already clarified /* XXX */
 - replace BUG() by ASSERT_UNREACHABLE() in handle_pio()
 - remove default case for handling the return value of
try_handle_mmio()
 - remove struct hvm_domain, enum hvm_io_completion, struct
hvm_vcpu_io,
   struct hvm_vcpu from asm-arm/domain.h, these are common materials
now
 - update everything according to the recent changes (IOREQ related
function
   names don't contain "hvm" prefixes/infixes anymore, IOREQ related
fields
   are part of common struct vcpu/domain now, etc)

Changes V2 -> V3:
 - update patch according the "legacy interface" is x86 specific
 - add dummy arch hooks
 - remove dummy paging_mark_pfn_dirty()
 - don’t include  in common ioreq.c
 - don’t include  in arch ioreq.h
 - remove #define ioreq_params(d, i)

Changes V3 -> V4:
 - rebase
 - update patch according to the renaming IO_ -> VIO_ (io_ -> vio_)
   and misc changes to arch hooks
 - update patch according to the IOREQ related dm-op handling changes
 - don't include  from arch header
 - make all arch hooks out-of-line
 - add a comment above IOREQ_STATUS_* #define-s
---
   xen/arch/arm/Makefile   |   2 +
   xen/arch/arm/dm.c   | 122 +++
   xen/arch/arm/domain.c   |   9 ++
   xen/arch/arm/io.c   |  12 ++-
   xen/arch/arm/ioreq.c| 213

   xen/arch/arm/traps.c|  13 +++
   xen/include/asm-arm/domain.h|   3 +
   xen/include/asm-arm/hvm/ioreq.h |  72 ++
   xen/include/asm-arm/mmio.h  |   1 +
   9 files changed, 446 insertions(+), 1 deletion(-)
   create mode 100644 xen/arch/arm/dm.c
   create mode 100644 xen/arch/arm/ioreq.c
   create mode 100644 xen/include/asm-arm/hvm/ioreq.h

Re: [PATCH v6 1/3] xen/arm: add support for run_in_exception_handler()

2021-01-21 Thread Julien Grall


Hi Jan,

On 21/01/2021 07:55, Jan Beulich wrote:

On 20.01.2021 19:20, Julien Grall wrote:

On 16/01/2021 10:33, Juergen Gross wrote:

Add support to run a function in an exception handler for Arm. Do it
as on x86 via a bug_frame, but pass the function pointer via a
register (this needs to be done that way, because inline asm support
for 32-bit Arm lacks the needed functionality to reference an
arbitrary function via the bugframe).


I was going to commit the series, but then realized the commit message
and comment needs some tweaking because technically GCC is supporting
'i' (I managed to get it working with -fno-pie).

So how about:

"This is needed to be done that way because GCC complains about invalid
constraint when using a function pointer with "i" and PIE is enabled
(default on most of GCC shipped with distro). Clang happily accepts it,
so it may be a bug in GCC."


May I ask why you think it's a bug in gcc? I'd rather consider it
a bug in clang. Taking the address of a symbol doesn't yield a
constant in PIC or PIE code, aiui.


I consider it a bug because we were using it as:

.pushsection .bug.frame
2b:
.long (%0 - 2b)

So I expect the compiler to be able to find the displacement in both 
cases as we don't need to know the exact address.


I think Clang is just clever enough to figure out we want a displacement.

Do you have a suggestion of constraint that could resolve the issue?

Cheers,

--
Julien Grall

Re: [PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Bertrand Marquis

Hi Michal,

> On 21 Jan 2021, at 09:30, Michal Orzel  wrote:
> 
> Fix compilation error when enabling early printk, introduced
> by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
> ```
> debug.S: Assembler messages:
> debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
> debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
> ```
> 
> The fix is to include header  which now contains
> definitions for page/size/mask etc.
> 
> Signed-off-by: Michal Orzel 
Reviewed-by: Bertrand Marquis 

Thanks a lot, this is fixing the compilation issues.

Cheers
Bertrand

> ---
> xen/include/asm-arm/early_printk.h | 1 +
> 1 file changed, 1 insertion(+)
> 
> diff --git a/xen/include/asm-arm/early_printk.h 
> b/xen/include/asm-arm/early_printk.h
> index d5485decfa..8dc911cf48 100644
> --- a/xen/include/asm-arm/early_printk.h
> +++ b/xen/include/asm-arm/early_printk.h
> @@ -10,6 +10,7 @@
> #ifndef __ARM_EARLY_PRINTK_H__
> #define __ARM_EARLY_PRINTK_H__
> 
> +#include 
> 
> #ifdef CONFIG_EARLY_PRINTK
> 
> -- 
> 2.29.0
>

Re: [PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Jan Beulich

On 21.01.2021 10:30, Michal Orzel wrote:
> Fix compilation error when enabling early printk, introduced
> by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
> ```
> debug.S: Assembler messages:
> debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
> debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
> x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
> ```
> 
> The fix is to include header  which now contains
> definitions for page/size/mask etc.
> 
> Signed-off-by: Michal Orzel 

I'm sorry for the breakage, but I wonder how I should have noticed
the issue. In all the Arm .config-s I'm routinely building I can't
even see ...

> --- a/xen/include/asm-arm/early_printk.h
> +++ b/xen/include/asm-arm/early_printk.h
> @@ -10,6 +10,7 @@
>  #ifndef __ARM_EARLY_PRINTK_H__
>  #define __ARM_EARLY_PRINTK_H__
>  
> +#include 
>  
>  #ifdef CONFIG_EARLY_PRINTK

... a respective Kconfig setting, i.e. it's not like I simply
failed to enable it.

Jan

Re: [PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Julien Grall


Hi Michal,

On 21/01/2021 09:30, Michal Orzel wrote:

Fix compilation error when enabling early printk, introduced
by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
```
debug.S: Assembler messages:
debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
```

The fix is to include header  which now contains
definitions for page/size/mask etc.



Fixes: aa4b9d1ee653 ("include: don't use asm/page.h from common headers")


Signed-off-by: Michal Orzel 


Reviewed-by: Julien Grall 


---
  xen/include/asm-arm/early_printk.h | 1 +
  1 file changed, 1 insertion(+)

diff --git a/xen/include/asm-arm/early_printk.h 
b/xen/include/asm-arm/early_printk.h
index d5485decfa..8dc911cf48 100644
--- a/xen/include/asm-arm/early_printk.h
+++ b/xen/include/asm-arm/early_printk.h
@@ -10,6 +10,7 @@
  #ifndef __ARM_EARLY_PRINTK_H__
  #define __ARM_EARLY_PRINTK_H__
  
+#include 
  
  #ifdef CONFIG_EARLY_PRINTK
  



Cheers,

--
Julien Grall

[PATCH] xen/arm: Fix compilation error when early printk is enabled

2021-01-21 Thread Michal Orzel

Fix compilation error when enabling early printk, introduced
by commit aa4b9d1ee6538b5cbe218d4d3fcdf9548130a063:
```
debug.S: Assembler messages:
debug.S:31: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
debug.S:38: Error: constant expression expected at operand 2 -- `ldr 
x15,=((0x0040+(0)*PAGE_SIZE)+(0x1c09&~PAGE_MASK))`
```

The fix is to include header  which now contains
definitions for page/size/mask etc.

Signed-off-by: Michal Orzel 
---
 xen/include/asm-arm/early_printk.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/include/asm-arm/early_printk.h 
b/xen/include/asm-arm/early_printk.h
index d5485decfa..8dc911cf48 100644
--- a/xen/include/asm-arm/early_printk.h
+++ b/xen/include/asm-arm/early_printk.h
@@ -10,6 +10,7 @@
 #ifndef __ARM_EARLY_PRINTK_H__
 #define __ARM_EARLY_PRINTK_H__
 
+#include 
 
 #ifdef CONFIG_EARLY_PRINTK
 
-- 
2.29.0

Re: [PATCH V4 14/24] arm/ioreq: Introduce arch specific bits for IOREQ/DM features

2021-01-21 Thread Oleksandr




On 20.01.21 21:47, Stefano Stabellini wrote:

Hi Julien, Stefano



On Wed, 20 Jan 2021, Julien Grall wrote:

Hi Stefano,

On 20/01/2021 00:50, Stefano Stabellini wrote:

On Tue, 19 Jan 2021, Oleksandr wrote:

diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 40b9e59..0508bd8 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -101,12 +101,10 @@ enum io_state try_fwd_ioserv(struct cpu_user_regs
*regs,

   bool arch_ioreq_complete_mmio(void)
   {
-    struct vcpu *v = current;
   struct cpu_user_regs *regs = guest_cpu_user_regs();
   const union hsr hsr = { .bits = regs->hsr };
-    paddr_t addr = v->io.req.addr;

-    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
+    if ( handle_ioserv(regs, current) == IO_HANDLED )
   {
   advance_pc(regs, hsr);
   return true;

Yes, but I think we want to keep the check

  vio->req.state == STATE_IORESP_READY

So maybe (uncompiled, untested):

  if ( v->io.req.state != STATE_IORESP_READY )
  return false;

Is it possible to reach this function with v->io.req.state !=
STATE_IORESP_READY? If not, then I would suggest to add an
ASSERT_UNREACHABLE() before the return.

If I am reading the state machine right it should *not* be possible to
get here with v->io.req.state != STATE_IORESP_READY, so yes,
ASSERT_UNREACHABLE() would work.
Agree here. If the assumption is not correct (unlikely), I think I will 
catch this during testing.
In addition, we can probably drop case STATE_IORESP_READY in 
try_fwd_ioserv().



[not tested]


diff --git a/xen/arch/arm/ioreq.c b/xen/arch/arm/ioreq.c
index 40b9e59..c7ee1a7 100644
--- a/xen/arch/arm/ioreq.c
+++ b/xen/arch/arm/ioreq.c
@@ -71,9 +71,6 @@ enum io_state try_fwd_ioserv(struct cpu_user_regs *regs,
 case STATE_IOREQ_NONE:
 break;

-    case STATE_IORESP_READY:
-    return IO_HANDLED;
-
 default:
 gdprintk(XENLOG_ERR, "wrong state %u\n", vio->req.state);
 return IO_ABORT;
@@ -104,9 +101,14 @@ bool arch_ioreq_complete_mmio(void)
 struct vcpu *v = current;
 struct cpu_user_regs *regs = guest_cpu_user_regs();
 const union hsr hsr = { .bits = regs->hsr };
-    paddr_t addr = v->io.req.addr;

-    if ( try_handle_mmio(regs, hsr, addr) == IO_HANDLED )
+    if ( v->io.req.state != STATE_IORESP_READY )
+    {
+    ASSERT_UNREACHABLE();
+    return false;
+    }
+
+    if ( handle_ioserv(regs, v) == IO_HANDLED )
 {
 advance_pc(regs, hsr);
 return true;


--
Regards,

Oleksandr Tyshchenko

Re: [PATCH v6 1/3] xen/arm: add support for run_in_exception_handler()

2021-01-21 Thread Jan Beulich

On 21.01.2021 09:00, Jürgen Groß wrote:
> On 21.01.21 08:55, Jan Beulich wrote:
>> On 20.01.2021 19:20, Julien Grall wrote:
>>> On 16/01/2021 10:33, Juergen Gross wrote:
 Add support to run a function in an exception handler for Arm. Do it
 as on x86 via a bug_frame, but pass the function pointer via a
 register (this needs to be done that way, because inline asm support
 for 32-bit Arm lacks the needed functionality to reference an
 arbitrary function via the bugframe).
>>>
>>> I was going to commit the series, but then realized the commit message
>>> and comment needs some tweaking because technically GCC is supporting
>>> 'i' (I managed to get it working with -fno-pie).
>>>
>>> So how about:
>>>
>>> "This is needed to be done that way because GCC complains about invalid
>>> constraint when using a function pointer with "i" and PIE is enabled
>>> (default on most of GCC shipped with distro). Clang happily accepts it,
>>> so it may be a bug in GCC."
>>
>> May I ask why you think it's a bug in gcc? I'd rather consider it
>> a bug in clang. Taking the address of a symbol doesn't yield a
>> constant in PIC or PIE code, aiui.
> 
> Maybe we should just not add the suspect of a bug in the compiler to
> either commit message or a comment.
> 
> It could be a bug in gcc, or just a shortcoming (feature combination
> not supported).
> 
> It could be a bug in clang, or clang is clever enough to produce
> correct code for "i" + PIE.

I think the last option can be excluded. There simply is no room
for cleverness. If an insn requires an immediate operand, the
compiler has to find a compile time constant (possibly needing a
relocation, but only PC-relative ones are permitted then). The
compiler may in particular not inspect the insn(s) specified and
try to substitute them.

"i" (with a symbol ref) and PIE (or PIC) simply are incompatible
with one another. One could imagine trickery requiring agreement
between programmer and compiler, where the programmer would be
to specify the relocation to use (and then do the necessary
calculations to yield the actual symbol address), but that's not
applicable here.

I've experimented a little with gcc10 on x86-64. It behaves quite
strangely, accepting some symbol references but not others (see
example below). None of them get accepted with -fpic, and the ones
that do get accepted with -fpie don't result in position
independent code (or my understanding thereof is wrong), or to be
precise in relocations that will have to be carried into the final
executable because of being dependent upon the resulting program's
load address. I'm pondering whether to open a bug ...

Jan

void efn(void);
void(*efp)(void);

void*test(int i) {
void*res;

efn();
efp();

switch(i) {
case 0:
asm("mov %1,%0" : "=r" (res) : "i" (test));
break;
#ifndef __PIE__
case 1:
asm("mov %1,%0" : "=r" (res) : "i" (efn));
break;
#endif
case 2:
asm("mov %1,%0" : "=r" (res) : "i" ());
break;
default:
res = (void*)0;
break;
}

return res;
}

1 2 >

1 - 100 of 103 matches

Mail list logo