Re: [PATCH v4 3/4] xen/ppc: Implement early serial printk on pseries

2023-07-20 Thread Jan Beulich
On 20.07.2023 23:01, Shawn Anastasio wrote:
> On 7/19/23 9:05 AM, Jan Beulich wrote:
>> Before you/we grow more assembly code, may I re-raise a request regarding
>> readability: I think it would be nice if operands started at a fixed column,
>> unless the insn mnemonic is unusually long. Where exactly to draw the line
>> is up to each architecture; on x86 we use 8 positions from the start of the
>> mnemonic.
> 
> There is quite a large variance in mnemonic length on ppc -- many common
> mnemonics only use 2 characters (e.g. ld, mr) while other common ones
> use 6+ (e.g. rldicr, the mtspr family, etc.). Enforcing a column size
> that's too short would make the longer mnemonics look misaligned and out
> of place, but using a longer column length (like 8) that can accommodate
> most common mnemonics adds too much space between short mnemonics and
> their arguments.

Common length is 3 on x86, and as said we use 8.

> That said if you still feel strongly about this then I am not strongly
> opposed to adding an 8-space column alignment.

I certainly think it helps readability a lot. 8 also matches the common
use (fair parts of gas'es testsuite, Linux) of hard tabs.

Jan



Re: [PATCH v2 1/4] build: make cc-option properly deal with unrecognized sub-options

2023-07-20 Thread Jan Beulich
On 19.07.2023 11:43, Jan Beulich wrote:
> In options like -march=, it may be only the sub-option which is
> unrecognized by the compiler. In such an event the error message often
> splits option and argument, typically saying something like "bad value
> '' for ''. Extend the grep invocation accordingly,
> also accounting for Clang to not mention e.g. -march at all when an
> incorrect argument was given for it.
> 
> To keep things halfway readable, re-wrap and re-indent the entire
> construct.
> 
> Signed-off-by: Jan Beulich 
> ---
> In principle -e "$$pat" could now be omitted from the grep invocation,
> since if that matches, both $$opt and $$arg will, too. But I thought I'd
> leave it for completeness.
> ---
> v2: Further relax grep patterns for clang, which doesn't mention -march
> when complaining about an invalid argument to it.

I wonder whether it would be sufficient (and a little less lax) ...

> --- a/Config.mk
> +++ b/Config.mk
> @@ -90,9 +90,14 @@ PYTHON_PREFIX_ARG ?= --prefix="$(prefix)
>  # of which would indicate an "unrecognized command-line option" warning/error.
>  #
>  # Usage: cflags-y += $(call cc-option,$(CC),-march=winchip-c6,-march=i586)
> -cc-option = $(shell if test -z "`echo 'void*p=1;' | \
> -  $(1) $(2) -c -o /dev/null -x c - 2>&1 | grep -- $(2:-Wa$(comma)%=%) -`"; \
> -  then echo "$(2)"; else echo "$(3)"; fi ;)
> +cc-option = $(shell pat='$(2:-Wa$(comma)%=%)'; \
> +opt="$${pat%%=*}" arg="$${pat\#*=}"; \
> +if test -z "`echo 'void*p=1;' | \
> + $(1) $(2) -c -o /dev/null -x c - 2>&1 | \
> + grep -e "$$pat" -e "$$opt" -e "$$arg" -`"; \

... to check for only $$arg here (which will be the same as $$pat when
there's no = in the full option).

In either case there's likely going to be an issue with options taking
very simple (e.g. plain numeric) arguments.

Jan

> +then echo "$(2)"; \
> +else echo "$(3)"; \
> +fi;)
>  
>  # cc-option-add: Add an option to compilation flags, but only if supported.
>  # Usage: $(call cc-option-add CFLAGS,CC,-march=winchip-c6)
> 
> 
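The opt/arg split that the patch performs with shell parameter expansion can be illustrated outside of make. The following is a minimal sketch in C, assuming a hypothetical helper split_option() that is not part of Config.mk; it mirrors ${pat%%=*} and ${pat#*=}, including the case noted above where an option without '=' yields an arg equal to the full pattern.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of Config.mk): split PAT at its first '='
 * the way the shell expansions ${pat%%=*} and ${pat#*=} do.  Without a
 * '=' both expansions leave the string untouched, so OPT and ARG both end
 * up equal to PAT.
 */
static void split_option(const char *pat, char *opt, char *arg, size_t len)
{
    const char *eq = strchr(pat, '=');
    size_t opt_len = eq ? (size_t)(eq - pat) : strlen(pat);

    assert(len > 0);
    if ( opt_len >= len )
        opt_len = len - 1;          /* truncate rather than overflow */
    memcpy(opt, pat, opt_len);
    opt[opt_len] = '\0';

    /* Everything after the first '=', or the whole pattern if there is none. */
    snprintf(arg, len, "%s", eq ? eq + 1 : pat);
}
```

For "-march=winchip-c6" this yields opt "-march" and arg "winchip-c6". For an option such as "-falign-functions=8" the arg is just "8", which is the kind of very simple argument that a grep over compiler diagnostics could match by accident.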




[qemu-mainline test] 181918: tolerable FAIL - PUSHED

2023-07-20 Thread osstest service owner
flight 181918 qemu-mainline real [real]
flight 181942 qemu-mainline real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/181918/
http://logs.test-lab.xenproject.org/osstest/logs/181942/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl-vhd  13 guest-start fail pass in 181942-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail in 181942 never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail in 181942 never pass
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 180691
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 180691
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 180691
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 180691
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 180691
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 180691
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 180691
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 180691
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass

version targeted for testing:
 qemuu 2c27fdc7a626408ee2cf30d791aa0b63027c7404
baseline version:
 qemuu 6972ef1440a9d685482d78672620a7482f2bd09a

Last test of basis   180691  2023-05-17 10:45:22 Z   64 days
Failing since        180699  2023-05-18 07:21:24 Z   63 days  127 attempts
Testing same since   181918  2023-07-20 06:54:14 Z    0 days    1 attempts


People who touched revisions under test:
  Afonso Bordado 
  Akihiko Odaki 
  Akihiro Suda 
  Alex 

Re: [PATCH v2 3/3] [FUTURE] xen/arm: enable vPCI for domUs

2023-07-20 Thread Stewart Hildebrand
On 7/7/23 07:04, Rahul Singh wrote:
> Hi Stewart,
> 
>> On 7 Jul 2023, at 2:47 am, Stewart Hildebrand  
>> wrote:
>>
>> Remove is_hardware_domain check in has_vpci, and select 
>> HAS_VPCI_GUEST_SUPPORT
>> in Kconfig.
>>
>> [1] 
>> https://lists.xenproject.org/archives/html/xen-devel/2023-06/msg00863.html
>>
>> Signed-off-by: Stewart Hildebrand 
>> ---
>> As the tag implies, this patch is not intended to be merged (yet).
>>
>> Note that CONFIG_HAS_VPCI_GUEST_SUPPORT is not currently used in the upstream
>> code base. It will be used by the vPCI series [1]. This patch is intended to 
>> be
>> merged as part of the vPCI series.
>>
>> v1->v2:
>> * new patch
>> ---
>> xen/arch/arm/Kconfig  | 1 +
>> xen/arch/arm/include/asm/domain.h | 2 +-
>> 2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/xen/arch/arm/Kconfig b/xen/arch/arm/Kconfig
>> index 4e0cc421ad48..75dfa2f5a82d 100644
>> --- a/xen/arch/arm/Kconfig
>> +++ b/xen/arch/arm/Kconfig
>> @@ -195,6 +195,7 @@ config PCI_PASSTHROUGH
>> depends on ARM_64
>> select HAS_PCI
>> select HAS_VPCI
>> + select HAS_VPCI_GUEST_SUPPORT
> 
> I tested this series on top of the "SMMU handling for PCIe Passthrough on ARM"
> series on the N1SDP board and observed an SMMUv3 fault.

Thanks for testing this. After a great deal of tinkering, I can reproduce the 
SMMU fault.

(XEN) smmu: /axi/smmu@fd80: Unhandled context fault: fsr=0x402, iova=0xf9030040, fsynr=0x12, cb=0

> To reproduce: enable the Kconfig options PCI_PASSTHROUGH, ARM_SMMU_V3 and HAS_ITS,
> and pass "iommu=on" and "pci_passthrough_enabled=on" on the command line. PCI
> devices then trigger an SMMU fault when they access the ITS doorbell register.
> 
> The fault is observed because there is no upstream support on ARM for vPCI
> MSI/MSI-X handling.
> 
> The Linux kernel writes the ITS doorbell register address (the physical address,
> as the IOMMU is not enabled in the kernel) into PCI config space to set up the
> MSI-X interrupts, but there is no mapping for it in the SMMU page tables, hence
> the fault. To fix this we need to map the ITS doorbell register in the SMMU
> page tables.
> 
> We can fix this by setting up the mapping for the ITS doorbell offset in the
> ITS code.
> 
> diff --git a/xen/arch/arm/vgic-v3-its.c b/xen/arch/arm/vgic-v3-its.c
> index 299b384250..8227a7a74b 100644
> --- a/xen/arch/arm/vgic-v3-its.c
> +++ b/xen/arch/arm/vgic-v3-its.c
> @@ -682,6 +682,18 @@ static int its_handle_mapd(struct virt_its *its, 
> uint64_t *cmdptr)
>   BIT(size, UL), valid);
>  if ( ret && valid )
>  return ret;
> +
> +if ( is_iommu_enabled(its->d) ) {
> +ret = map_mmio_regions(its->d, 
> gaddr_to_gfn(its->doorbell_address),
> +   PFN_UP(ITS_DOORBELL_OFFSET),
> +   maddr_to_mfn(its->doorbell_address));
> +if ( ret < 0 )
> +{
> +printk(XENLOG_ERR "GICv3: Map ITS translation register d%d 
> failed.\n",
> +its->d->domain_id);
> +return ret;
> +}
> +}
>  }

Thank you, this resolves the SMMU fault. If it's okay, I will include this 
patch in the next revision of the SMMU series (I see your Signed-off-by is 
already in the attachment).

> Also, as per Julien's request, I tried to set up the IOMMU for the PCI device
> without "pci_passthrough_enabled=on" and without HAS_VPCI; everything works as
> expected after applying the patches below.
> 
> To test, enable the Kconfig options HAS_PCI, ARM_SMMU_V3 and HAS_ITS and add the
> patches below to make it work.
> 
> • Set the mapping for the ITS doorbell offset in the ITS code when the IOMMU
>   is enabled.
> • Revert the patch that added the support for pci_passthrough_on.
> • Allow MMIO mapping of ECAM space to dom0 when vPCI is not enabled; as of
>   now, MMIO mapping for ECAM is based on pci_passthrough_enabled. We need
>   this patch if we want to avoid enabling HAS_VPCI.
> 
> Please find the attached patches in case you want to test at your end.
> 
> 
> 
> Regards,
> Rahul
> 
>> default n
>> help
>>  This option enables PCI device passthrough
>> diff --git a/xen/arch/arm/include/asm/domain.h 
>> b/xen/arch/arm/include/asm/domain.h
>> index 1a13965a26b8..6e016b00bae1 100644
>> --- a/xen/arch/arm/include/asm/domain.h
>> +++ b/xen/arch/arm/include/asm/domain.h
>> @@ -298,7 +298,7 @@ static inline void arch_vcpu_block(struct vcpu *v) {}
>>
>> #define arch_vm_assist_valid_mask(d) (1UL << 
>> VMASST_TYPE_runstate_update_flag)
>>
>> -#define has_vpci(d) ({ IS_ENABLED(CONFIG_HAS_VPCI) && 
>> is_hardware_domain(d); })
>> +#define has_vpci(d)({ (void)(d); IS_ENABLED(CONFIG_HAS_VPCI); })
>>
>> struct arch_vcpu_io {
>> struct instr_details dabt_instr; /* when the instruction is decoded */
>> --
>> 2.41.0
>>
>>
> 



Re: [PATCH] tools/xenstore: fix get_spec_node()

2023-07-20 Thread Juergen Gross

On 21.07.23 00:45, Julien Grall wrote:

> Hi Juergen,
> 
> On 20/07/2023 16:08, Juergen Gross wrote:
>> In case get_spec_node() is being called for a special node starting
>> with '@' it won't set *canonical_name. This can result in a crash of
>> xenstored due to dereferencing the uninitialized name in
>> fire_watches().
>> 
>> This is no security issue as it requires either a privileged caller or
>> ownership of the special node in question by an unprivileged caller
>> (which is questionable, as this would make the owner privileged in some
>> way).
>> 
>> Fixes: d6bb63924fc2 ("tools/xenstore: introduce dummy nodes for special watch paths")
>> Signed-off-by: Juergen Gross 
>> ---
>>   tools/xenstore/xenstored_core.c | 5 -
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>> 
>> diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
>> index a1d3047e48..790c403904 100644
>> --- a/tools/xenstore/xenstored_core.c
>> +++ b/tools/xenstore/xenstored_core.c
>> @@ -1252,8 +1252,11 @@ static struct node *get_spec_node(struct connection *conn, const void *ctx,
>>     const char *name, char **canonical_name,
>>     unsigned int perm)
>>   {
>> -    if (name[0] == '@')
>> +    if (name[0] == '@') {
>> +    if (canonical_name)
>> +    *canonical_name = (char *)name;
> 
> eww. Let's not continue the bad practice in Xenstored to cast away the const.
> I will have a look to remove the const and you can rebase your patch on top.

I think it should be possible to make canonical_name const. I'll look into that.


Juergen
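
The idea of making canonical_name const can be sketched as follows. This is a simplified stand-in under stated assumptions, not the actual xenstored code: canonicalize_sketch() is a hypothetical name, the real get_spec_node() takes more parameters and returns a struct node *, and regular paths are passed through here instead of being canonicalized.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for get_spec_node().  Declaring the
 * out-parameter as 'const char **' lets the special-node path hand back
 * 'name' unchanged, with no cast dropping the const qualifier.
 */
static int canonicalize_sketch(const char *name, const char **canonical_name)
{
    assert(name != NULL);

    if ( name[0] == '@' )
    {
        if ( canonical_name )
            *canonical_name = name;   /* const char * to const char *: no cast */
        return 1;                     /* special node */
    }

    /* Regular paths would be canonicalized here; the sketch passes them through. */
    if ( canonical_name )
        *canonical_name = name;
    return 0;
}
```

With this signature every caller has to hold the result in a const char *, so the compiler rejects any later attempt to modify the name through that pointer.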





[xen-unstable-smoke test] 181941: tolerable all pass - PUSHED

2023-07-20 Thread osstest service owner
flight 181941 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181941/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass

version targeted for testing:
 xen  1ab2ae1610d99423af5b810829959431e43de12d
baseline version:
 xen  4bf014c6f7d7cc9a9e017cef0eb5ff4bf27526e9

Last test of basis   181923  2023-07-20 09:02:01 Z    0 days
Testing same since   181941  2023-07-20 23:03:34 Z    0 days    1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jens Wiklander 
  Julien Grall 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   4bf014c6f7..1ab2ae1610  1ab2ae1610d99423af5b810829959431e43de12d -> smoke



Re: [PATCH v8 11/13] vpci: add initial support for virtual PCI bus topology

2023-07-20 Thread Volodymyr Babchuk


Hi Jan,

Jan Beulich  writes:

> On 20.07.2023 02:32, Volodymyr Babchuk wrote:
>> --- a/xen/drivers/vpci/vpci.c
>> +++ b/xen/drivers/vpci/vpci.c
>> @@ -46,6 +46,16 @@ void vpci_remove_device(struct pci_dev *pdev)
>>  return;
>>  
>>  spin_lock(&pdev->vpci->lock);
>> +
>> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +if ( pdev->vpci->guest_sbdf.sbdf != ~0 )
>> +{
>> +__clear_bit(pdev->vpci->guest_sbdf.dev,
>> +&pdev->domain->vpci_dev_assigned_map);
>> +pdev->vpci->guest_sbdf.sbdf = ~0;
>> +}
>> +#endif
>
> The lock acquired above is not ...

vpci_remove_device() is called when d->pci_lock is already held.

But I'll move this hunk before spin_lock(&pdev->vpci->lock); we don't
need to hold it while cleaning vpci_dev_assigned_map.

>> @@ -115,6 +129,54 @@ int vpci_add_handlers(struct pci_dev *pdev)
>>  }
>>  
>>  #ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
>> +static int add_virtual_device(struct pci_dev *pdev)
>> +{
>> +struct domain *d = pdev->domain;
>> +pci_sbdf_t sbdf = { 0 };
>> +unsigned long new_dev_number;
>> +
>> +if ( is_hardware_domain(d) )
>> +return 0;
>> +
>> +ASSERT(pcidevs_locked());
>> +
>> +/*
>> + * Each PCI bus supports 32 devices/slots at max or up to 256 when
>> + * there are multi-function ones which are not yet supported.
>> + */
>> +if ( pdev->info.is_extfn )
>> +{
>> +gdprintk(XENLOG_ERR, "%pp: only function 0 passthrough supported\n",
>> + &pdev->sbdf);
>> +return -EOPNOTSUPP;
>> +}
>> +
>> +write_lock(&pdev->domain->pci_lock);
>> +new_dev_number = find_first_zero_bit(d->vpci_dev_assigned_map,
>> + VPCI_MAX_VIRT_DEV);
>> +if ( new_dev_number >= VPCI_MAX_VIRT_DEV )
>> +{
>> +write_unlock(&pdev->domain->pci_lock);
>> +return -ENOSPC;
>> +}
>> +
>> +__set_bit(new_dev_number, &d->vpci_dev_assigned_map);
>
> ... the same as the one held here, so the bitmap still isn't properly
> protected afaics, unless the intention is to continue to rely on
> the global PCI lock (assuming that one's held in both cases, which I
> didn't check it is). Conversely it looks like the vPCI lock isn't
> held here. Both aspects may be intentional, but the locks being
> acquired differing requires suitable code comments imo.

As I stated above, vpci_remove_device() is called when d->pci_lock is
already held.


> I've also briefly looked at patch 1, and I'm afraid that still lacks
> commentary about intended lock nesting. That might be relevant here
> in case locking visible from patch / patch context isn't providing
> the full picture.
>

There is
ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
at the beginning of vpci_remove_device(), which is added by
"vpci: use per-domain PCI lock to protect vpci structure".

I believe it would be more beneficial to review the series from the
beginning.

>> +/*
>> + * Both segment and bus number are 0:
>> + *  - we emulate a single host bridge for the guest, e.g. segment 0
>> + *  - with bus 0 the virtual devices are seen as embedded
>> + *endpoints behind the root complex
>> + *
>> + * TODO: add support for multi-function devices.
>> + */
>> +sbdf.devfn = PCI_DEVFN(new_dev_number, 0);
>> +pdev->vpci->guest_sbdf = sbdf;
>> +write_unlock(&pdev->domain->pci_lock);
>
> With the above I also wonder whether this lock can't (and hence
> should) be dropped a little earlier (right after fiddling with the
> bitmap).

This is a good observation, thanks.

-- 
WBR, Volodymyr
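
The suggestion to drop the lock right after the bitmap update can be illustrated in isolation. This is a minimal user-space sketch, assuming the bitmap is the only shared state that needs the lock; struct dom_sketch, the spinlock helpers, alloc_slot() and free_slot() are all hypothetical names, and a C11 atomic_flag stands in for the per-domain rwlock used in the series.

```c
#include <assert.h>
#include <stdatomic.h>

#define MAX_VIRT_DEV 32u   /* assumed limit, mirroring 32 slots per bus */

/* Hypothetical stand-in for a domain: a tiny spinlock and a slot bitmap. */
struct dom_sketch {
    atomic_flag lock;
    unsigned int assigned_map;   /* bit n set: virtual slot n is taken */
};

static void sketch_lock(struct dom_sketch *d)
{
    while ( atomic_flag_test_and_set_explicit(&d->lock, memory_order_acquire) )
        ;   /* spin until the flag was previously clear */
}

static void sketch_unlock(struct dom_sketch *d)
{
    atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Returns the allocated slot, or -1 when all slots are in use. */
static int alloc_slot(struct dom_sketch *d)
{
    unsigned int slot;

    sketch_lock(d);
    for ( slot = 0; slot < MAX_VIRT_DEV; slot++ )
        if ( !(d->assigned_map & (1u << slot)) )
            break;
    if ( slot < MAX_VIRT_DEV )
        d->assigned_map |= 1u << slot;
    sketch_unlock(d);   /* the bitmap is the only shared state touched here */

    /* Work derived from 'slot' happens after the lock has been dropped. */
    return slot < MAX_VIRT_DEV ? (int)slot : -1;
}

static void free_slot(struct dom_sketch *d, unsigned int slot)
{
    sketch_lock(d);
    d->assigned_map &= ~(1u << slot);
    sketch_unlock(d);
}
```

Whether the real code can release the lock this early depends on what else the lock protects, which is the lock-nesting question raised in the review.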


Re: [PATCH V2 2/2] xen: privcmd: Add support for irqfd

2023-07-20 Thread kernel test robot
Hi Viresh,

kernel test robot noticed the following build errors:

[auto build test ERROR on xen-tip/linux-next]
[also build test ERROR on linus/master v6.5-rc2 next-20230720]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:
https://github.com/intel-lab-lkp/linux/commits/Viresh-Kumar/xen-privcmd-Add-support-for-irqfd/20230720-173905
base:   https://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git linux-next
patch link:
https://lore.kernel.org/r/a25d5f01fe9b4624aa12cab77abd001044ea02d5.1689845210.git.viresh.kumar%40linaro.org
patch subject: [PATCH V2 2/2] xen: privcmd: Add support for irqfd
config: arm64-randconfig-r026-20230720 
(https://download.01.org/0day-ci/archive/20230721/202307210852.ukq5f98v-...@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project.git 
4a5ac14ee968ff0ad5d2cc1ffa0299048db4c88a)
reproduce: 
(https://download.01.org/0day-ci/archive/20230721/202307210852.ukq5f98v-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot 
| Closes: 
https://lore.kernel.org/oe-kbuild-all/202307210852.ukq5f98v-...@intel.com/

All errors (new ones prefixed by >>):

>> drivers/xen/privcmd.c:961:12: error: call to undeclared function 
>> 'eventfd_ctx_fileget'; ISO C99 and later do not support implicit function 
>> declarations [-Wimplicit-function-declaration]
 961 | eventfd = eventfd_ctx_fileget(f.file);
 |   ^
   drivers/xen/privcmd.c:961:12: note: did you mean 'eventfd_ctx_fdget'?
   include/linux/eventfd.h:56:35: note: 'eventfd_ctx_fdget' declared here
  56 | static inline struct eventfd_ctx *eventfd_ctx_fdget(int fd)
 |   ^
>> drivers/xen/privcmd.c:961:10: error: incompatible integer to pointer 
>> conversion assigning to 'struct eventfd_ctx *' from 'int' [-Wint-conversion]
 961 | eventfd = eventfd_ctx_fileget(f.file);
 | ^ ~~~
   2 errors generated.


vim +/eventfd_ctx_fileget +961 drivers/xen/privcmd.c

   936  
   937  static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
   938  {
   939  struct privcmd_kernel_irqfd *kirqfd, *tmp;
   940  struct eventfd_ctx *eventfd;
   941  __poll_t events;
   942  struct fd f;
   943  int ret;
   944  
   945  kirqfd = kzalloc(sizeof(*kirqfd), GFP_KERNEL);
   946  if (!kirqfd)
   947  return -ENOMEM;
   948  
   949  kirqfd->irq = irqfd->irq;
   950  kirqfd->dom = irqfd->dom;
   951  kirqfd->level = irqfd->level;
   952  INIT_LIST_HEAD(&kirqfd->list);
   953  INIT_WORK(&kirqfd->shutdown, irqfd_shutdown);
   954  
   955  f = fdget(irqfd->fd);
   956  if (!f.file) {
   957  ret = -EBADF;
   958  goto error_kfree;
   959  }
   960  
 > 961  eventfd = eventfd_ctx_fileget(f.file);
   962  if (IS_ERR(eventfd)) {
   963  ret = PTR_ERR(eventfd);
   964  goto error_fd_put;
   965  }
   966  
   967  kirqfd->eventfd = eventfd;
   968  
   969  /*
   970   * Install our own custom wake-up handling so we are notified via a
   971   * callback whenever someone signals the underlying eventfd.
   972   */
   973  init_waitqueue_func_entry(&kirqfd->wait, irqfd_wakeup);
   974  init_poll_funcptr(&kirqfd->pt, irqfd_poll_func);
   975  
   976  mutex_lock(_lock);
   977  
   978  list_for_each_entry(tmp, _list, list) {
   979  if (kirqfd->eventfd == tmp->eventfd) {
   980  ret = -EBUSY;
   981  mutex_unlock(_lock);
   982  goto error_eventfd;
   983  }
   984  }
   985  
   986  list_add_tail(>list, _list);
   987  mutex_unlock(_lock);
   988  
   989  /*
   990   * Check if there was an event already pending on the eventfd before we
   991   * registered, and trigger it as if we didn't miss it.
   992   */
   993  events = vfs_poll(f.file, &kirqfd->pt);
   994  if (events & EPOLLIN)
   995  irqfd_inject(kirqfd);
   996  
   997  /*
   998   * Do not drop the file until the kirqfd is fully initialized, otherwise
   999   * we might race against the EPOLLHUP.
  1000   */
  1001  fdput(f);
  1002  return 0;
  1003  
  1004  error_eventfd:
  1005  eventfd_ctx_put(eventfd);
  1006  
  1007  error_fd_put:
  

[PATCH] docs/misra: add Rule 1.1 and 5.6

2023-07-20 Thread Stefano Stabellini
From: Stefano Stabellini 

Rule 1.1 is uncontroversial and we are already following it.

Rule 5.6 has been deemed a good rule to have by the MISRA C group.
However, we do have a significant amount of violations that will take
time to resolve and might require partial deviations in the form of
in-code comments or MISRA C scanners special configurations (ECLAIR).
For new code, we want this rule to apply hence the addition to
docs/misra/rules.rst.

Signed-off-by: Stefano Stabellini 
---
 docs/misra/rules.rst | 12 
 1 file changed, 12 insertions(+)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index 29a777938a..9406ff0d8f 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -82,6 +82,13 @@ maintainers if you want to suggest a change.
  - Summary
  - Notes
 
+   * - `Rule 1.1 
`_
+ - Required
+ - The program shall contain no violations of the standard C syntax
+   and constraints, and shall not exceed the implementation's
+   translation limits
+ -
+
* - `Rule 1.3 
`_
  - Required
  - There shall be no occurrence of undefined or critical unspecified
@@ -156,6 +163,11 @@ maintainers if you want to suggest a change.
headers (xen/include/public/) are allowed to retain longer
identifiers for backward compatibility.
 
+   * - `Rule 5.6 
`_
+ - Required
+ - A typedef name shall be a unique identifier
+ -
+
* - `Rule 6.1 
`_
  - Required
  - Bit-fields shall only be declared with an appropriate type
-- 
2.25.1




Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock

2023-07-20 Thread Volodymyr Babchuk


Hi Jan,

Jan Beulich  writes:

> On 20.07.2023 02:32, Volodymyr Babchuk wrote:
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>>  
>>  if ( devfn == pdev->devfn && pdev->domain != target )
>>  {
>> -list_move(&pdev->domain_list, &target->pdev_list);
>> -pdev->domain = target;
>> +write_lock(&pdev->domain->pci_lock);
>> +list_del(&pdev->domain_list);
>> +write_unlock(&pdev->domain->pci_lock);
>
> As mentioned on an earlier version, perhaps better (cheaper) to use
> "source" here? (Same in VT-d code then.)

Sorry, I saw your comment on the previous version, but forgot to include
this change. It will be done in the next version.

>> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>  ret = 0;
>>  if ( !pdev->domain )
>>  {
>> +write_lock(&hardware_domain->pci_lock);
>>  pdev->domain = hardware_domain;
>>  list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>>  
>> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>  printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>>  list_del(&pdev->domain_list);
>>  pdev->domain = NULL;
>> +write_unlock(&hardware_domain->pci_lock);
>>  goto out;
>
> In addition to Roger's comments about locking scope: In a case like this
> one it would probably also be good to move the printk() out of the locked
> area. It can be slow, after all.
>
> Question is why you have this wide a locked area here in the first place:
> Don't you need to hold the lock just across the two list operations (but
> not in between)?

Strictly speaking, yes: for now we need to hold the lock only when operating
on the list. The next patch will use the same lock to protect the vPCI
(de)allocation, so the locked region will be extended anyway.

I think I'll shrink the locked area in this patch and extend it in the
next one; that is the most logical split.


>> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t seg, uint8_t bus,
>>  
>>  int pci_release_devices(struct domain *d)
>>  {
>> -struct pci_dev *pdev, *tmp;
>> -u8 bus, devfn;
>> -int ret;
>> +int combined_ret;
>> +LIST_HEAD(failed_pdevs);
>>  
>>  pcidevs_lock();
>> -ret = arch_pci_clean_pirqs(d);
>> -if ( ret )
>> +write_lock(&d->pci_lock);
>> +combined_ret = arch_pci_clean_pirqs(d);
>> +if ( combined_ret )
>>  {
>>  pcidevs_unlock();
>> -return ret;
>> +write_unlock(&d->pci_lock);
>> +return combined_ret;
>>  }
>> -list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
>> +
>> +while ( !list_empty(&d->pdev_list) )
>>  {
>> -bus = pdev->bus;
>> -devfn = pdev->devfn;
>> -ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
>> +struct pci_dev *pdev = list_first_entry(&d->pdev_list,
>> +struct pci_dev,
>> +domain_list);
>> +uint16_t seg = pdev->seg;
>> +uint8_t bus = pdev->bus;
>> +uint8_t devfn = pdev->devfn;
>> +int ret;
>> +
>> +write_unlock(&d->pci_lock);
>> +ret = deassign_device(d, seg, bus, devfn);
>> +write_lock(&d->pci_lock);
>> +if ( ret )
>> +{
>> +bool still_present = false;
>> +const struct pci_dev *tmp;
>> +
>> +/*
>> + * We need to check if deassign_device() left our pdev in
>> + * domain's list. As we dropped the lock, we can't be sure
>> + * that list wasn't permutated in some random way, so we
>> + * need to traverse the whole list.
>> + */
>> +for_each_pdev ( d, tmp )
>> +{
>> +if ( tmp == pdev )
>> +{
>> +still_present = true;
>> +break;
>> +}
>> +}
>> +if ( still_present )
>> +list_move(&pdev->domain_list, &failed_pdevs);
>
> In order to retain original ordering on the resulting list, perhaps better
> list_move_tail()?

Yes, thanks.


-- 
WBR, Volodymyr
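
The ordering difference behind the list_move_tail() suggestion can be shown with a small self-contained sketch. The list helpers below are local reimplementations written for this illustration, modelled on the usual list_head API; they are not the Xen headers, and list_init(), list_insert() and list_del_entry() are names chosen here.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the usual list_head API. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h->prev = h;
}

/* Unlink E from whichever list it is currently on. */
static void list_del_entry(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Link E between two adjacent nodes PREV and NEXT. */
static void list_insert(struct list_head *e, struct list_head *prev,
                        struct list_head *next)
{
    e->prev = prev;
    e->next = next;
    prev->next = e;
    next->prev = e;
}

/* Move E to the front of HEAD: repeated calls reverse the original order. */
static void list_move(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_insert(e, head, head->next);
}

/* Move E to the back of HEAD: repeated calls keep the original order. */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
    list_del_entry(e);
    list_insert(e, head->prev, head);
}
```

Draining a list a, b, c from its head with list_move_tail() yields a, b, c on the destination, while list_move() yields c, b, a.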


Re: [PATCH LINUX v5 2/2] xen: add support for initializing xenstore later as HVM domain

2023-07-20 Thread Stefano Stabellini
On Thu, 20 Jul 2023, Petr Mladek wrote:
> On Wed 2023-07-19 18:46:08, Stefano Stabellini wrote:
> > On Wed, 19 Jul 2023, Petr Mladek wrote:
> > > On Fri 2022-05-13 14:19:38, Stefano Stabellini wrote:
> > > > From: Luca Miccio 
> > > > 
> > > > When running as dom0less guest (HVM domain on ARM) the xenstore event
> > > > channel is available at domain creation but the shared xenstore
> > > > interface page only becomes available later on.
> > > > 
> > > > In that case, wait for a notification on the xenstore event channel,
> > > > then complete the xenstore initialization later, when the shared page
> > > > is actually available.
> > > > 
> > > > The xenstore page has a few extra fields. Add them to the shared struct.
> > > > One of the fields is "connection": when the connection is ready, it is
> > > > zero. If the connection is non-zero, wait for a notification.
> > > 
> > > I see the following warning from free_irq() in 6.5-rc2 when running
> > > livepatching selftests. It does not happen after reverting this patch.
> > > 
> > > [  352.168453] livepatch: signaling remaining tasks
> > > [  352.173228] [ cut here ]
> > > [  352.175563] Trying to free already-free IRQ 0
> > > [  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 
> > > free_irq+0xbf/0x350
> > > [  352.179942] Modules linked in: test_klp_livepatch(EK)
> > > [  352.181621] CPU: 1 PID: 88 Comm: xenbus_probe Kdump: loaded Tainted: G 
> > >E K6.5.0-rc2-default+ #535
> > > [  352.184754] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
> > > [  352.188214] RIP: 0010:free_irq+0xbf/0x350
> > > [  352.192211] Code: 7a 08 75 0e e9 36 02 00 00 4c 3b 7b 08 74 5a 48 89 
> > > da 48 8b 5a 18 48 85 db 75 ee 44 89 f6 48 c7 c7 58 b0 8b 86 e8 21 0a f5 
> > > ff <0f> 0b 48 8b 34 24 4c 89 ef e8 53 bb e3 00 
> > > 48 8b 45 40 48 8b 40 78
> > > [  352.200079] RSP: 0018:af0440b4be80 EFLAGS: 00010086
> > > [  352.201465] RAX:  RBX: 99f105116c80 RCX: 
> > > 0003
> > > [  352.203324] RDX: 8003 RSI: 8691d4bc RDI: 
> > > 
> > > [  352.204989] RBP: 99f100052000 R08:  R09: 
> > > c0007fff
> > > [  352.206253] R10: af0440b4bd18 R11: af0440b4bd10 R12: 
> > > 99f1000521e8
> > > [  352.207451] R13: 99f1000520a8 R14:  R15: 
> > > 86f42360
> > > [  352.208787] FS:  () GS:99f15a40() 
> > > knlGS:
> > > [  352.210061] CS:  0010 DS:  ES:  CR0: 80050033
> > > [  352.210815] CR2: 7f8415d56000 CR3: 000105e36003 CR4: 
> > > 00370ee0
> > > [  352.211867] DR0:  DR1:  DR2: 
> > > 
> > > [  352.212912] DR3:  DR6: fffe0ff0 DR7: 
> > > 0400
> > > [  352.213951] Call Trace:
> > > [  352.214390]  
> > > [  352.214717]  ? __warn+0x81/0x170
> > > [  352.215436]  ? free_irq+0xbf/0x350
> > > [  352.215906]  ? report_bug+0x10b/0x200
> > > [  352.216408]  ? prb_read_valid+0x17/0x20
> > > [  352.216926]  ? handle_bug+0x44/0x80
> > > [  352.217409]  ? exc_invalid_op+0x13/0x60
> > > [  352.217932]  ? asm_exc_invalid_op+0x16/0x20
> > > [  352.218497]  ? free_irq+0xbf/0x350
> > > [  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
> > > [  352.219600]  xenbus_probe+0x7a/0x80
> > > [  352.221030]  xenbus_probe_thread+0x76/0xc0
> > > [  352.221416]  ? __pfx_autoremove_wake_function+0x10/0x10
> > > [  352.221882]  kthread+0xfd/0x130
> > > [  352.222191]  ? __pfx_kthread+0x10/0x10
> > > [  352.222544]  ret_from_fork+0x2d/0x50
> > > [  352.222893]  ? __pfx_kthread+0x10/0x10
> > > [  352.223260]  ret_from_fork_asm+0x1b/0x30
> > > [  352.223629] RIP: :0x0
> > > [  352.223931] Code: Unable to access opcode bytes at 0xffd6.
> > > [  352.224488] RSP: : EFLAGS:  ORIG_RAX: 
> > > 
> > > [  352.225044] RAX:  RBX:  RCX: 
> > > 
> > > [  352.225571] RDX:  RSI:  RDI: 
> > > 
> > > [  352.226106] RBP:  R08:  R09: 
> > > 
> > > [  352.226632] R10:  R11:  R12: 
> > > 
> > > [  352.227171] R13:  R14:  R15: 
> > > 
> > > [  352.227710]  
> > > [  352.227917] irq event stamp: 22
> > > [  352.228209] hardirqs last  enabled at (21): [] 
> > > ___slab_alloc+0x68e/0xc80
> > > [  352.228914] hardirqs last disabled at (22): [] 
> > > _raw_spin_lock_irqsave+0x8d/0x90
> > > [  352.229546] softirqs last  enabled at (0): [] 
> > > copy_process+0xaae/0x1fd0
> > > [  352.230079] softirqs last disabled at (0): [<>] 0x0
> > > [  352.230503] ---[ end trace  ]---
> > > 
> > > , where the message 

[libvirt test] 181917: tolerable all pass - PUSHED

2023-07-20 Thread osstest service owner
flight 181917 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181917/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt       16 saverestore-support-check  fail  like 181890
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check  fail  like 181890
 test-armhf-armhf-libvirt-raw   15 saverestore-support-check  fail  like 181890
 test-amd64-amd64-libvirt       15 migrate-support-check      fail  never pass
 test-amd64-amd64-libvirt-xsm   15 migrate-support-check      fail  never pass
 test-amd64-i386-libvirt        15 migrate-support-check      fail  never pass
 test-amd64-i386-libvirt-xsm    15 migrate-support-check      fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt       15 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt       16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt-xsm   15 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt-xsm   16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt       15 migrate-support-check      fail  never pass
 test-amd64-amd64-libvirt-vhd   14 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-check  fail  never pass
 test-amd64-i386-libvirt-raw    14 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt-raw   14 migrate-support-check      fail  never pass
 test-arm64-arm64-libvirt-raw   15 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check      fail  never pass
 test-armhf-armhf-libvirt-raw   14 migrate-support-check      fail  never pass

version targeted for testing:
 libvirt  fa0d5f4ebc0aa178d9dea278914f9149a4c4af54
baseline version:
 libvirt  aece25f66517a327c2a6bde4d06b432d415ed7da

Last test of basis   181890  2023-07-19 04:20:19 Z    1 days
Testing same since   181917  2023-07-20 04:21:51 Z    0 days    1 attempts


People who touched revisions under test:
  Boris Fiuczynski 
  Han Han 
  Jonathon Jongsma 
  Michal Privoznik 

jobs:
 build-amd64-xsm                                     pass
 build-arm64-xsm                                     pass
 build-i386-xsm                                      pass
 build-amd64                                         pass
 build-arm64                                         pass
 build-armhf                                         pass
 build-i386                                          pass
 build-amd64-libvirt                                 pass
 build-arm64-libvirt                                 pass
 build-armhf-libvirt                                 pass
 build-i386-libvirt                                  pass
 build-amd64-pvops                                   pass
 build-arm64-pvops                                   pass
 build-armhf-pvops                                   pass
 build-i386-pvops                                    pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-amd64-libvirt-xsm                        pass
 test-arm64-arm64-libvirt-xsm                        pass
 test-amd64-i386-libvirt-xsm                         pass
 test-amd64-amd64-libvirt                            pass
 test-arm64-arm64-libvirt                            pass
 test-armhf-armhf-libvirt                            pass
 test-amd64-i386-libvirt                             pass
 test-amd64-amd64-libvirt-pair                       pass
 test-amd64-i386-libvirt-pair                        pass
 test-arm64-arm64-libvirt-qcow2                      pass
 test-armhf-armhf-libvirt-qcow2                      pass
 test-arm64-arm64-libvirt-raw                        pass
 test-armhf-armhf-libvirt-raw                        pass
 test-amd64-i386-libvirt-raw                         pass
 test-amd64-amd64-libvirt-vhd                        pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs


Re: [ARM][xencons] PV Console hangs due to illegal ring buffer accesses

2023-07-20 Thread Stefano Stabellini
On Thu, 20 Jul 2023, Julien Grall wrote:
> (+ Juergen)
> 
> On 19/07/2023 17:13, Andrei Cherechesu (OSS) wrote:
> > Hello,
> 
> Hi Andrei,
> 
> > As we're running Xen 4.17 (with platform-related support added) on NXP S32G
> > SoCs (ARMv8), with a custom Linux distribution built through Yocto, and
> > we've set some Xen-based demos up, we encountered some issues which we think
> > might not be related to our hardware. For additional context, the Linux
> > kernel version we're running is 5.15.96-rt (with platform-related support
> > added as well).
> > 
> > The setup to reproduce the problem is fairly simple: after booting a Dom0
> > (can provide configuration details if needed), we're booting a normal PV
> > DomU with PV Networking. Additionally, the VMs have k3s (Lightweight
> > Kubernetes - version v1.25.8+k3s1:
> > https://github.com/k3s-io/k3s/releases/tag/v1.25.8%2Bk3s1) installed in
> > their rootfs'es.
> > 
> > The problem is that the DomU console hangs (no new output is shown, no input
> > can be sent) some time (non-deterministic, sometimes 5 seconds, other times
> > like 15-20 seconds) after we run the `k3s server` command. We have this
> > command running as part of a sysvinit service, and the same behavior can be
> > observed in that case as well. The k3s version we use is the one mentioned
> > in the paragraph above, but this can be reproduced with other versions as
> > > well (i.e., v1.21.11, v1.22.6). If the `k3s server` command is run in the
> > Dom0 VM, everything works fine. Using DomU as an agent node is also working
> > fine, only when it is run as a server the console problem occurs.
> > 
> > Immediately after the serial console hangs, we can still log in on DomU
> > > using SSH, and we can observe the following messages in its dmesg:
> > [   57.905806] xencons: Illegal ring page indices
> 
> Looking at Linux code, this message is printed in a couple of places in the
> xenconsole driver.
> 
> I would assume that this is printed when reading from the buffer (otherwise
> you would not see any message). Can you confirm it?
> 
> Also, can you provide the indices that Linux considers buggy?
> 
> Lastly, it seems like the barriers used are incorrect. It should be the
> virt_*() version rather than a plain mb()/wmb(). I don't think it matters for
> arm64 though (I am assuming you are not running 32-bit).
> 
> > [   59.399620] xenbus: error -5 while reading message
> 
> So this message is coming from the xenbus driver (used to read the xenstore
> ring). This is -EIO, and AFAICT returned when the indices are also incorrect.
> 
> For this driver, I think there is also a TOCTOU because a compiler is free to
> reload intf->rsp_cons after the check. Moving virt_mb() is probably not
> sufficient. You would also want to use ACCESS_ONCE().
> 
> What I find odd is you have two distinct rings (xenconsole and xenbus) with
> similar issues. Above, you said you are using Linux RT. I wonder if this
> plays into the issue because, if I am not mistaken, the two functions would
> now be fully preemptible.
> 
> This could expose some races. For instance, there are some missing
> ACCESS_ONCE() (as mentioned above).
> 
> In particular, Xenstored (I haven't checked xenconsoled) is using += to update
> intf->rsp_cons. There is no guarantee that the update will be atomic.
> 
> Overall, I am not 100% sure what I wrote is related. But that's probably a
> good start of things that can be exacerbated with Linux RT.
> 
> > [   59.399649] xenbus: error -5 while writing message
> 
> This is in xenbus as well. But this time in the write part. The analysis I
> wrote above for the read part can be applied here.

This is really strange. What is also strange is that somehow the indexes
recover after 10-15 seconds. How is that even possible? Let's say there
is memory corruption of some sort, maybe due to missing barriers like
Julien suggested: how can it go back to normal after a while?

I am really confused. I would try with regular Linux instead of Linux RT
and also would try to replace all the barriers in
drivers/tty/hvc/hvc_xen.c with their virt_* version to see if we can
narrow down the problem a bit.
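
To make the TOCTOU concern quoted above concrete, here is a minimal
userspace sketch of "read each index once, then validate the local
copies". Everything in it (struct toy_ring, TOY_RING_SIZE, TOY_READ_ONCE,
toy_ring_snapshot) is a hypothetical stand-in, not code from hvc_xen.c or
the xenbus driver, and the C11 fence only approximates what a virt_*()
barrier would do in the kernel.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_RING_SIZE 1024u /* hypothetical ring size, power of two */

/* Hypothetical stand-in for the index pair of a shared ring page. */
struct toy_ring {
    uint32_t cons;
    uint32_t prod;
};

/* Userspace approximation of a read-once access: one volatile load. */
#define TOY_READ_ONCE(x) (*(const volatile uint32_t *)&(x))

/*
 * Snapshot both indices exactly once, then validate the local copies.
 * Because the check and any later use operate on the same locals, the
 * compiler cannot reload the shared field between check and use.
 */
static bool toy_ring_snapshot(const struct toy_ring *r,
                              uint32_t *cons, uint32_t *prod)
{
    uint32_t c = TOY_READ_ONCE(r->cons);
    uint32_t p = TOY_READ_ONCE(r->prod);

    /* Stand-in for the barrier a kernel driver would place here. */
    atomic_thread_fence(memory_order_acquire);

    /* Unsigned subtraction copes with index wrap-around. */
    if ((uint32_t)(p - c) > TOY_RING_SIZE)
        return false;

    *cons = c;
    *prod = p;
    return true;
}
```

A caller would use only the returned cons/prod values for the copy loop
and never go back to the shared page for them.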


Keep in mind that during PV network operations grants are used, which
involve mapping pages at the backend and changing the MMU/IOMMU
pagetables to introduce the new mapping. After the DMA operation,
typically the page is unmapped and removed from the pagetable.

Is it possible that the pagetable change is causing the problem, and
when the mapping is removed everything goes back to normal?

I don't know how that could happen, but the mapping and unmapping of the
page is something ongoing which could break things then go back to
normal. One thing you could try is to force all DMA operations to go via
swiotlb-xen in Linux:

diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
index 3d826c0b5fee..f78d86f1bb9c 100644
--- a/arch/arm/xen/mm.c
+++ b/arch/arm/xen/mm.c
@@ -112,8 +112,7 @@ bool xen_arch_need_swiotlb(struct device *dev,
 

Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock

2023-07-20 Thread Volodymyr Babchuk

Hi Roger,

Roger Pau Monné  writes:

> On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
>> Add per-domain d->pci_lock that protects access to
>> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
>> that underlying pdev will not disappear under feet. This is a rw-lock,
>> but this patch adds only write_lock()s. There will be read_lock()
>> users in the next patches.
>> 
>> This lock should be taken in write mode every time d->pdev_list is
>> altered. This covers both accesses to d->pdev_list and accesses to
>> pdev->domain_list fields. All write accesses also should be protected
>> by pcidevs_lock() as well. Idea is that any user that wants read
>> access to the list or to the devices stored in the list should use
>> either this new d->pci_lock or old pcidevs_lock(). Usage of either of
>> these two locks will ensure only that pdev of interest will not
>> disappear from under feet and that the pdev still will be assigned to
>> the same domain. Of course, any new users should use pcidevs_lock()
>> when it is appropriate (e.g. when accessing any other state that is
>> protected by the said lock).
>
> I think this needs a note about the ordering:
>
> "In case both the newly introduced per-domain rwlock and the pcidevs
> lock are taken, the latter must be acquired first."

Thanks. Added.
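
For illustration only, the ordering rule can be modelled with stand-in
locks that merely record whether they are held. None of the toy_* names
below exist in Xen; the sketch only shows the intended nesting (pcidevs
lock outside, per-domain lock inside) and asserts if the order is
inverted.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state only; these are NOT Xen's lock types or functions. */
static bool toy_pcidevs_held;      /* models the global pcidevs lock */
static int toy_domain_locks_held;  /* per-domain locks currently held */

struct toy_domain {
    bool pci_lock_held;  /* models d->pci_lock held in write mode */
    int nr_pdevs;        /* models the length of d->pdev_list */
};

static void toy_pcidevs_lock(void)
{
    /* Ordering rule: never take the outer lock with an inner one held. */
    assert(toy_domain_locks_held == 0);
    toy_pcidevs_held = true;
}

static void toy_pcidevs_unlock(void)
{
    toy_pcidevs_held = false;
}

static void toy_pci_write_lock(struct toy_domain *d)
{
    d->pci_lock_held = true;
    toy_domain_locks_held++;
}

static void toy_pci_write_unlock(struct toy_domain *d)
{
    d->pci_lock_held = false;
    toy_domain_locks_held--;
}

/* A list update takes both locks: pcidevs first, then the domain lock. */
static void toy_add_pdev(struct toy_domain *d)
{
    toy_pcidevs_lock();
    toy_pci_write_lock(d);
    d->nr_pdevs++;
    toy_pci_write_unlock(d);
    toy_pcidevs_unlock();
}
```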

>> 
>> Any write access to pdev->domain_list should be protected by both
>> pcidevs_lock() and d->pci_lock in the write mode.
>
> You also protect calls to vpci_remove_device() with the per-domain
> pci_lock it seems, and that will need some explanation as it's not
> obvious.

Well, strictly speaking, it is not required in this patch. But it is
needed in the next one. I can lock only "list_del(>domain_list);"
and then extend the locked area in the next patch. On the other hand, this
patch already protects the vpci_add_handlers() call in pci_add_device() due
to the code layout, so it may be natural to protect vpci_remove_device() as
well. What is your opinion?

>> 
>> Suggested-by: Roger Pau Monné 
>> Suggested-by: Jan Beulich 
>> Signed-off-by: Volodymyr Babchuk 
>> 
>> ---
>> 
>> Changes in v8:
>>  - New patch
>> 
>> Changes in v8 vs RFC:
>>  - Removed all read_locks after discussion with Roger in #xendevel
>>  - pci_release_devices() now returns the first error code
>>  - extended commit message
>>  - added missing lock in pci_remove_device()
>>  - extended locked region in pci_add_device() to protect list_del() calls
>> ---
>>  xen/common/domain.c |  1 +
>>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
>>  xen/drivers/passthrough/pci.c   | 68 +
>>  xen/drivers/passthrough/vtd/iommu.c |  9 ++-
>>  xen/include/xen/sched.h |  1 +
>>  5 files changed, 74 insertions(+), 14 deletions(-)
>> 
>> diff --git a/xen/common/domain.c b/xen/common/domain.c
>> index caaa402637..5d8a8836da 100644
>> --- a/xen/common/domain.c
>> +++ b/xen/common/domain.c
>> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid,
>>  
>>  #ifdef CONFIG_HAS_PCI
>>  INIT_LIST_HEAD(>pdev_list);
>> +rwlock_init(>pci_lock);
>>  #endif
>>  
>>  /* All error paths can depend on the above setup. */
>> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
>> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> index 94e3775506..e2f2e2e950 100644
>> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
>> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>>  
>>  if ( devfn == pdev->devfn && pdev->domain != target )
>>  {
>> -list_move(>domain_list, >pdev_list);
>> -pdev->domain = target;
>
> You seem to have inadvertently dropped the above line? (and so devices
> would keep the previous pdev->domain value)
>

Oops, yes. Thank you. I was testing those patches on an Intel machine, so
the AMD part was left unverified.

>> +write_lock(>domain->pci_lock);
>> +list_del(>domain_list);
>> +write_unlock(>domain->pci_lock);
>> +
>> +write_lock(>pci_lock);
>> +list_add(>domain_list, >pdev_list);
>> +write_unlock(>pci_lock);
>>  }
>>  
>>  /*
>> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
>> index 95846e84f2..5b4632ead2 100644
>> --- a/xen/drivers/passthrough/pci.c
>> +++ b/xen/drivers/passthrough/pci.c
>> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>>  if ( pdev->domain )
>>  return;
>>  pdev->domain = dom_xen;
>> +write_lock(_xen->pci_lock);
>>  list_add(>domain_list, _xen->pdev_list);
>> +write_unlock(_xen->pci_lock);
>>  }
>>  
>>  int __init pci_hide_device(unsigned int seg, unsigned int bus,
>> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>>  ret = 0;
>>  if ( !pdev->domain )
>>  {
>> +write_lock(_domain->pci_lock);
>>  pdev->domain = hardware_domain;
>>  

Re: [PATCH] tools/xenstore: fix get_spec_node()

2023-07-20 Thread Julien Grall

Hi Juergen,

On 20/07/2023 16:08, Juergen Gross wrote:

In case get_spec_node() is being called for a special node starting
with '@' it won't set *canonical_name. This can result in a crash of
xenstored due to dereferencing the uninitialized name in
fire_watches().

This is no security issue as it requires either a privileged caller or
ownership of the special node in question by an unprivileged caller
(which is questionable, as this would make the owner privileged in some
way).

Fixes: d6bb63924fc2 ("tools/xenstore: introduce dummy nodes for special watch paths")
Signed-off-by: Juergen Gross 
---
  tools/xenstore/xenstored_core.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index a1d3047e48..790c403904 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1252,8 +1252,11 @@ static struct node *get_spec_node(struct connection 
*conn, const void *ctx,
  const char *name, char **canonical_name,
  unsigned int perm)
  {
-   if (name[0] == '@')
+   if (name[0] == '@') {
+   if (canonical_name)
+   *canonical_name = (char *)name;


eww. Let's not continue the bad practice in Xenstored to cast away the 
const. I will have a look to remove the const and you can rebase your 
patch on top.
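
As a rough sketch of the problem class (not the xenstored code, and not
necessarily the shape the const cleanup will take), a lookup with an
optional out-parameter has to fill it in on every successful path. One
way to avoid the cast in a toy like this is to make the out-parameter
const-qualified; toy_get_spec_node() and its return convention are
made up for the example.

```c
/*
 * Hypothetical, cut-down lookup; NOT the xenstored function.
 * Returns 0 on success, -1 for a name this toy does not accept.
 */
static int toy_get_spec_node(const char *name, const char **canonical_name)
{
    if (name[0] == '@') {
        /* Special node: already canonical, so report it on this path too. */
        if (canonical_name)
            *canonical_name = name;
        return 0;
    }

    if (name[0] != '/')
        return -1;

    /* Toy shortcut: treat absolute paths as already canonical. */
    if (canonical_name)
        *canonical_name = name;
    return 0;
}
```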


Cheers,

--
Julien Grall



Re: [PATCH 07/10] x86 boot: define paddr_t and add macros for typedefing struct pointers

2023-07-20 Thread Christopher Clark
On Sat, Jul 8, 2023 at 3:24 PM Stefano Stabellini 
wrote:

> On Sat, 1 Jul 2023, Christopher Clark wrote:
> > Pointer fields within structs need to be defined as fixed size types in
> > the x86 boot build environment. Using a typedef for the field type
> > rather than a struct pointer type enables the type definition to
> > be changed in the 32-bit boot build and the main hypervisor build,
> > allowing for a single common structure definition and a common header
> > file.
>
> Sorry for my ignorance, but why?
>
> struct boot_module is not used as part of any ABI, right? It is
> populated by Xen at boot by hand. Why do we need a specific memory
> layout for it?
>

Fair question! In the early x86 boot logic, which runs in 32-bit CPU mode,
struct boot_module is allocated and populated, so the structure needs to be
defined and available to code that is compiled in 32-bit to do that. The
same structures are also accessed later in 64-bit hypervisor logic, and the
memory layout of the structure needs to be the same in both cases, so we
want all the fields to be fixed-width types, and that includes pointers.

These macros help with declaring pointers as always-64-bit-sized struct
fields in a single definition of the struct. They're not strictly necessary
though - providing alternative definitions for typedefs can be used
instead, and I've been looking at doing that since posting this patch.
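
To illustrate the layout point, here is a small stand-alone sketch. The
toy_* names are invented for the example and the struct is a cut-down
imitation, not the real struct boot_info. It stores pointers as 64-bit
integers and checks at compile time that the layout is the one expected;
the explicit pad field is my own addition, there because a 32-bit x86
ABI may align uint64_t to 4 bytes inside a struct.

```c
#include <stddef.h>
#include <stdint.h>

/* Pointers carried as fixed-width integers (hypothetical typedef names). */
typedef uint64_t toy_char_ptr_t;
typedef uint64_t toy_module_ptr_t;

struct toy_boot_info {
    toy_char_ptr_t cmdline;
    uint32_t nr_mods;
    uint32_t pad;            /* explicit, so alignment rules cannot differ */
    toy_module_ptr_t mods;
};

/* Same answers are expected from a 32-bit and a 64-bit compile. */
_Static_assert(sizeof(struct toy_boot_info) == 24, "unexpected size");
_Static_assert(offsetof(struct toy_boot_info, mods) == 16, "unexpected offset");

/* Conversions between real pointers and the stored 64-bit value. */
static inline uint64_t toy_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}

static inline void *toy_u64_to_ptr(uint64_t v)
{
    return (void *)(uintptr_t)v;
}
```

The 32-bit code would only ever store values through toy_ptr_to_u64(),
and the 64-bit code would turn them back into pointers on use.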

Christopher



>
>
>
> > Introduces DEFINE_STRUCT_PTR_TYPE and DEFINE_PTR_TYPE which will
> > generate typedefs with a _ptr_t suffix for pointers to the specified
> > type. This is then used in  for pointers within structs
> > as preparation for using these headers in the x86 boot build.
> >
> > The 32-bit behaviour is obtained by inclusion of "defs.h" first with a
> > check for such an existing definition on the  version.
> >
> > paddr_t is used in  so a definition is added here to
> > the x86 boot environment defs.h header.
> >
> > Signed-off-by: Christopher Clark 
> > Signed-off-by: Daniel P. Smith 
>
>
> > ---
> > Changes since v2: This is two v2 patches merged into one for v3.
> > Changes since v1: New in v2 of series.
> >
> >  xen/arch/x86/boot/defs.h|  9 +
> >  xen/arch/x86/include/asm/bootinfo.h |  4 +++-
> >  xen/include/xen/bootinfo.h  |  9 +
> >  xen/include/xen/types.h | 11 +++
> >  4 files changed, 28 insertions(+), 5 deletions(-)
> >
> > diff --git a/xen/arch/x86/boot/defs.h b/xen/arch/x86/boot/defs.h
> > index f9840044ec..bc0f1b5cf8 100644
> > --- a/xen/arch/x86/boot/defs.h
> > +++ b/xen/arch/x86/boot/defs.h
> > @@ -60,4 +60,13 @@ typedef u64 uint64_t;
> >  #define U16_MAX  ((u16)(~0U))
> >  #define UINT_MAX (~0U)
> >
> > +typedef unsigned long long paddr_t;
> > +
> > +#define DEFINE_STRUCT_PTR_TYPE(struct_name) \
> > +typedef uint64_t struct_name ## _ptr_t;
> > +
> > +#define DEFINE_PTR_TYPE(type) \
> > +typedef uint64_t type ## _ptr_t;
> > +DEFINE_PTR_TYPE(char);
> > +
> >  #endif /* __BOOT_DEFS_H__ */
> > diff --git a/xen/arch/x86/include/asm/bootinfo.h
> b/xen/arch/x86/include/asm/bootinfo.h
> > index 30c27980e0..989fb7a1da 100644
> > --- a/xen/arch/x86/include/asm/bootinfo.h
> > +++ b/xen/arch/x86/include/asm/bootinfo.h
> > @@ -6,6 +6,7 @@ struct arch_bootmodule {
> >  uint32_t flags;
> >  unsigned headroom;
> >  };
> > +DEFINE_STRUCT_PTR_TYPE(arch_bootmodule);
> >
> >  struct arch_boot_info {
> >  uint32_t flags;
> > @@ -14,11 +15,12 @@ struct arch_boot_info {
> >  #define BOOTINFO_FLAG_X86_MEMMAP   1U << 6
> >  #define BOOTINFO_FLAG_X86_LOADERNAME   1U << 9
> >
> > -char *boot_loader_name;
> > +char_ptr_t boot_loader_name;
> >
> >  uint32_t mmap_length;
> >  paddr_t mmap_addr;
> >  };
> > +DEFINE_STRUCT_PTR_TYPE(arch_boot_info);
> >
> >  struct __packed mb_memmap {
> >  uint32_t size;
> > diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
> > index 2f4284a91f..8389da4f72 100644
> > --- a/xen/include/xen/bootinfo.h
> > +++ b/xen/include/xen/bootinfo.h
> > @@ -35,17 +35,18 @@ struct boot_module {
> >  mfn_t mfn;
> >  size_t size;
> >
> > -struct arch_bootmodule *arch;
> > +arch_bootmodule_ptr_t arch;
> >  struct boot_string string;
> >  };
> > +DEFINE_STRUCT_PTR_TYPE(boot_module);
> >
> >  struct boot_info {
> > -char *cmdline;
> > +char_ptr_t cmdline;
> >
> >  unsigned int nr_mods;
> > -struct boot_module *mods;
> > +boot_module_ptr_t mods;
> >
> > -struct arch_boot_info *arch;
> > +arch_boot_info_ptr_t arch;
> >  };
> >
> >  #endif
> > diff --git a/xen/include/xen/types.h b/xen/include/xen/types.h
> > index 6aba80500a..e807ffe255 100644
> > --- a/xen/include/xen/types.h
> > +++ b/xen/include/xen/types.h
> > @@ -71,4 +71,15 @@ typedef bool bool_t;
> >  #define test_and_set_bool(b)   xchg(&(b), true)
> >  #define test_and_clear_bool(b) xchg(&(b), false)
> >
> > +#ifndef DEFINE_STRUCT_PTR_TYPE
> > +#define 

Re: [XEN PATCH] automation: add ECLAIR pipeline

2023-07-20 Thread Marek Marczykowski-Górecki
On Thu, Jul 20, 2023 at 11:20:29PM +0200, Simone Ballarin wrote:
> +# ECLAIR configuration files are maintened by BUGSENG
> +export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no"
> +[ -d ECLAIR_scripts ] || git clone 
> ssh://g...@git.bugseng.com/eclair/scripts/XEN ECLAIR_scripts
> +(cd ECLAIR_scripts; git pull --rebase)

I'd suggest printing the commit id of the scripts repo here, so the
logs will keep that information.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab



Re: [PATCH 04/10] x86 setup: porting dom0 construction logic to boot module structures

2023-07-20 Thread Christopher Clark
On Sat, Jul 8, 2023 at 12:15 PM Stefano Stabellini 
wrote:

> On Sat, 1 Jul 2023, Christopher Clark wrote:
> > Adjust the PV and PVH dom0 construction entry points to take boot module
> > structures as parameters, and add further fields to the boot module
> > structures to plumb the data needed to support this use. Populate these
> > from the multiboot module data.
> >
> > This change removes multiboot from the PV and PVH dom0 construction
> > logic.
> >
> > Introduce and use new inline accessor functions for navigating the boot
> > module structures.
> >
> > The per-boot-module arrays are expanded from singletons to accommodate
> > all modules, up to a static maximum of 64 modules including Xen that can
> > be accepted from a bootloader to match the previous value from the
> > module map check.
> >
> > The field that identifies the type of a boot module (kernel, ramdisk,
> > etc) is introduced to the common boot module structure and declared as a
> > non-enum integer type so that the field has a known size and the
> > structure can be packed in a subsequent patch in the series, and it will
> > then be reconciled with the equivalent Arm boot field type.
> >
> > The command line provided by multiboot for each boot module is added
> > directly to the boot_module structure, which is appropriate for this
> > logic just replacing multiboot.
> >
> > The maximum number of boot modules that a bootloader can provide in
> > addition to the Xen hypervisor is preserved from prior logic with the
> > module_map at 63.
> >
> > Signed-off-by: Christopher Clark 
> > Signed-off-by: Daniel P. Smith 
> >
> > ---
> > Changes since v1: patch is a subset of v1 series patches 2 and 3.
> > - The module_map is kept for now since still in use.
> > - Move the static inline functions into a separate dedicated header.
> > -  and  replace prior inclusion of 
> >   for simpler dependencies.
> >
> >  xen/arch/x86/dom0_build.c |  10 +-
> >  xen/arch/x86/hvm/dom0_build.c |  43 +++---
> >  xen/arch/x86/include/asm/boot.h   |  36 +
> >  xen/arch/x86/include/asm/bootinfo.h   |  24 +++
> >  xen/arch/x86/include/asm/dom0_build.h |  13 +-
> >  xen/arch/x86/include/asm/setup.h  |   4 +-
> >  xen/arch/x86/pv/dom0_build.c  |  32 ++--
> >  xen/arch/x86/setup.c  | 206 +++---
> >  xen/include/xen/bootinfo.h|  27 
> >  9 files changed, 254 insertions(+), 141 deletions(-)
> >
> > diff --git a/xen/arch/x86/dom0_build.c b/xen/arch/x86/dom0_build.c
> > index 9f5300a3ef..42310202a2 100644
> > --- a/xen/arch/x86/dom0_build.c
> > +++ b/xen/arch/x86/dom0_build.c
> > @@ -4,6 +4,7 @@
> >   * Copyright (c) 2002-2005, K A Fraser
> >   */
> >
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -562,9 +563,8 @@ int __init dom0_setup_permissions(struct domain *d)
> >  return rc;
> >  }
> >
> > -int __init construct_dom0(struct domain *d, const module_t *image,
> > -  unsigned long image_headroom, module_t
> *initrd,
> > -  char *cmdline)
> > +int __init construct_dom0(struct domain *d, const struct boot_module
> *image,
> > +struct boot_module *initrd, char *cmdline)
> >  {
> >  int rc;
> >
> > @@ -576,9 +576,9 @@ int __init construct_dom0(struct domain *d, const
> module_t *image,
> >  process_pending_softirqs();
> >
> >  if ( is_hvm_domain(d) )
> > -rc = dom0_construct_pvh(d, image, image_headroom, initrd,
> cmdline);
> > +rc = dom0_construct_pvh(d, image, initrd, cmdline);
> >  else if ( is_pv_domain(d) )
> > -rc = dom0_construct_pv(d, image, image_headroom, initrd,
> cmdline);
> > +rc = dom0_construct_pv(d, image, initrd, cmdline);
> >  else
> >  panic("Cannot construct Dom0. No guest interface available\n");
> >
> > diff --git a/xen/arch/x86/hvm/dom0_build.c
> b/xen/arch/x86/hvm/dom0_build.c
> > index 56fe89632b..c094863bb8 100644
> > --- a/xen/arch/x86/hvm/dom0_build.c
> > +++ b/xen/arch/x86/hvm/dom0_build.c
> > @@ -8,9 +8,9 @@
> >   */
> >
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> > -#include 
> >  #include 
> >  #include 
> >
> > @@ -530,14 +530,13 @@ static paddr_t __init find_memory(
> >  return INVALID_PADDR;
> >  }
> >
> > -static int __init pvh_load_kernel(struct domain *d, const module_t
> *image,
> > -  unsigned long image_headroom,
> > -  module_t *initrd, void *image_base,
> > -  char *cmdline, paddr_t *entry,
> > -  paddr_t *start_info_addr)
> > +static int __init pvh_load_kernel(
> > +struct domain *d, const struct boot_module *image,
> > +struct boot_module *initrd, void *image_base, char *cmdline,
> paddr_t *entry,
> > +paddr_t *start_info_addr)
> >  {
> > -void *image_start = image_base + image_headroom;
> > -unsigned long image_len = image->mod_end;
> > 

Re: [PATCH] tools/xenstore: fix XSA-417 patch

2023-07-20 Thread Julien Grall

Hi Juergen,

On 20/07/2023 16:04, Juergen Gross wrote:

The fix for XSA-417 had a bug: domain_alloc_permrefs() will not return
a negative value in case of an error, but a plain errno value.

Note this is not considered to be a security issue, as the only case
where domain_alloc_permrefs() will return an error is a failed memory
allocation. As a guest should not be able to drive Xenstore out of
memory, this is NOT a problem a guest can trigger at will.

Fixes: ab128218225d ("tools/xenstore: fix checking node permissions")
Signed-off-by: Juergen Gross 


Acked-by: Julien Grall 
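
For readers following along, the class of bug described in the commit
message can be shown with a toy example. The functions below are
hypothetical and are not the xenstored code or the actual patch; they
only show why a "< 0" test misses a callee that reports failure as a
plain (positive) errno value.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical callee: 0 on success, a plain errno value on failure. */
static int toy_alloc_refs(bool simulate_oom)
{
    return simulate_oom ? ENOMEM : 0;
}

/* Caller-side test that assumes negative error codes: misses ENOMEM. */
static bool toy_failed_negative_check(int ret)
{
    return ret < 0;
}

/* Caller-side test matching the callee's convention. */
static bool toy_failed_nonzero_check(int ret)
{
    return ret != 0;
}
```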

Cheers,

--
Julien Grall



Re: [XEN PATCH] automation: add ECLAIR pipeline

2023-07-20 Thread Stefano Stabellini
On Thu, 20 Jul 2023, Simone Ballarin wrote:
> Add two pipelines that analyze an ARM64 and an X86_64 build with the
> ECLAIR static analyzer on the guidelines contained in Set1.
> 
> The tool configuration is kept external to the xen repository for
> practical reasons, it will be included in a subsequent phase.
> 
> All commits on the xen-project/xen:staging branch will be analyzed
> and their artifacts will be stored indefinitely; the integration will
> report differential information with respect to the previous analysis.
> 
> All commits on other branches or repositories will be analyzed and
> only the last ten artifacts will be kept; the integration will report
> differential information with respect to the analysis done on the common
> ancestor with xen-project/xen:staging (if available).
> 
> Currently the pipeline variable ENABLE_ECLAIR_BOT is set to "n".
> Doing so disables the generation of comments with the analysis summary
> on the commit threads. The variable can be set to "y" if a masked
> variable named ECLAIRIT_TOKEN is set with the impersonation token of
> an account with enough privileges to write on all repositories.
> 
> Additionally, any repository should be able to read a masked variable
> named WTOKEN with the token provided by BUGSENG.
> 
> Signed-off-by: Simone Ballarin 

Thanks for the patch!

Patchew automatically picked it up from xen-devel and started a pipeline
here:

https://gitlab.com/xen-project/patchew/xen/-/pipelines/939440592

However the eclair-x86_64 job failed with:

ERROR: Uploading artifacts as "archive" to coordinator... 413 Payload Too Large

Also the eclair-ARM64 job failed but it is not clear to me why.

I think at least initially we should mark the two Eclair jobs with:

  allow_failure: true

until we are sure they work reliably all the time. Otherwise we end up
blocking the whole Xen staging pipeline if we make any mistakes here. We
can remove "allow_failure: true" once we are sure it works well all the
time.


The second thing I noticed is that the build phase didn't start until
the analyze phase was concluded. This is not good because it would
increase the overall time significantly. We need the build/test phases
to start in parallel. To do that you need to add the following change to
this patch:


diff --git a/automation/gitlab-ci/build.yaml b/automation/gitlab-ci/build.yaml
index c401f62d61..f01e2c32bb 100644
--- a/automation/gitlab-ci/build.yaml
+++ b/automation/gitlab-ci/build.yaml
@@ -11,6 +11,7 @@
   - '*.log'
   - '*/*.log'
 when: always
+  needs: []
   except:
 - master
 - smoke



> ---
>  .gitlab-ci.yml|  2 ++
>  automation/gitlab-ci/analyze.yaml | 38 +++
>  automation/scripts/eclair | 26 +
>  3 files changed, 66 insertions(+)
>  create mode 100644 automation/gitlab-ci/analyze.yaml
>  create mode 100755 automation/scripts/eclair
> 
> diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
> index c8bd7519d5..ee5430b8b7 100644
> --- a/.gitlab-ci.yml
> +++ b/.gitlab-ci.yml
> @@ -1,7 +1,9 @@
>  stages:
> +  - analyze
>- build
>- test
>  
>  include:
> +  - 'automation/gitlab-ci/analyze.yaml'
>- 'automation/gitlab-ci/build.yaml'
>- 'automation/gitlab-ci/test.yaml'
> diff --git a/automation/gitlab-ci/analyze.yaml 
> b/automation/gitlab-ci/analyze.yaml
> new file mode 100644
> index 00..be96d96e71
> --- /dev/null
> +++ b/automation/gitlab-ci/analyze.yaml
> @@ -0,0 +1,38 @@
> +.eclair-analysis:
> +  stage: analyze
> +  tags:
> +- eclair-analysis
> +- eclair
> +- misrac

I would only use 1 tag, eclair-analysis or eclair, up to you


> +  variables:
> +ECLAIR_OUTPUT_DIR: "ECLAIR_out"
> +ANALYSIS_KIND: "normal"
> +ECLAIR_REPORT_URL: "saas.eclairit.com"
> +ENABLE_ECLAIR_BOT: "n"
> +AUTOPRBRANCH: "staging"
> +AUTOPRREPOSITORY: "xen-project/xen"
> +  artifacts:
> +when: always
> +paths:
> +  - "${ECLAIR_OUTPUT_DIR}"
> +  - '*.log'
> +reports:
> +  codequality: gl-code-quality-report.json
> +
> +eclair-x86_64:
> +  extends: .eclair-analysis
> +  variables:
> +LOGFILE: "eclair-x86_64.log"
> +VARIANT: "X86_64"
> +RULESET: "Set1"
> +  script:
> +- ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"

allow_failure: true


> +eclair-ARM64:
> +  extends: .eclair-analysis
> +  variables:
> +LOGFILE: "eclair-ARM64.log"
> +VARIANT: "ARM64"
> +RULESET: "Set1"
> +  script:
> +- ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"

allow_failure: true


> diff --git a/automation/scripts/eclair b/automation/scripts/eclair
> new file mode 100755
> index 00..d7f0845aec
> --- /dev/null
> +++ b/automation/scripts/eclair
> @@ -0,0 +1,26 @@
> +#!/bin/bash -eu
> +
> +# ECLAIR configuration files are maintained by BUGSENG
> +export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no"
> +[ -d ECLAIR_scripts ] || git clone 
> ssh://g...@git.bugseng.com/eclair/scripts/XEN 

Re: [XEN PATCH v10 04/24] xen/arm: tee: add a primitive FF-A mediator

2023-07-20 Thread Julien Grall

Hi Bertrand,

On 20/07/2023 11:20, Bertrand Marquis wrote:

Hi Jens,


On 17 Jul 2023, at 09:20, Jens Wiklander  wrote:

Adds a FF-A version 1.1 [1] mediator to communicate with a Secure
Partition in secure world.

This commit brings in only the parts needed to negotiate FF-A version
number with guest and SPMC.

[1] https://developer.arm.com/documentation/den0077/e
Signed-off-by: Jens Wiklander 
---
xen/arch/arm/include/asm/psci.h|   4 +
xen/arch/arm/include/asm/tee/ffa.h |  35 +
xen/arch/arm/tee/Kconfig   |  11 ++
xen/arch/arm/tee/Makefile  |   1 +
xen/arch/arm/tee/ffa.c | 225 +
xen/arch/arm/vsmc.c|  17 ++-
xen/include/public/arch-arm.h  |   1 +
7 files changed, 291 insertions(+), 3 deletions(-)
create mode 100644 xen/arch/arm/include/asm/tee/ffa.h
create mode 100644 xen/arch/arm/tee/ffa.c

diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
index 832f77afff3a..4780972621bb 100644
--- a/xen/arch/arm/include/asm/psci.h
+++ b/xen/arch/arm/include/asm/psci.h
@@ -24,6 +24,10 @@ void call_psci_cpu_off(void);
void call_psci_system_off(void);
void call_psci_system_reset(void);

+/* Range of allocated PSCI function numbers */
+#define PSCI_FNUM_MIN_VALUE _AC(0,U)
+#define PSCI_FNUM_MAX_VALUE _AC(0x1f,U)
+
/* PSCI v0.2 interface */
#define PSCI_0_2_FN32(nr) ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
  ARM_SMCCC_CONV_32,   \
diff --git a/xen/arch/arm/include/asm/tee/ffa.h 
b/xen/arch/arm/include/asm/tee/ffa.h
new file mode 100644
index ..44361a4e78e4
--- /dev/null
+++ b/xen/arch/arm/include/asm/tee/ffa.h
@@ -0,0 +1,35 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * xen/arch/arm/include/asm/tee/ffa.h
+ *
+ * Arm Firmware Framework for ARMv8-A(FFA) mediator
+ *
+ * Copyright (C) 2023  Linaro Limited
+ */
+
+#ifndef __ASM_ARM_TEE_FFA_H__
+#define __ASM_ARM_TEE_FFA_H__
+
+#include 
+#include 


None of the headers aside from xen/config.h will include xen/kconfig.h. The 
former is included everywhere from the compiler command line. So I have 
removed it.



+
+#include 
+#include 
+
+#define FFA_FNUM_MIN_VALUE  _AC(0x60,U)
+#define FFA_FNUM_MAX_VALUE  _AC(0x86,U)
+
+static inline bool is_ffa_fid(uint32_t fid)
+{
+uint32_t fn = fid & ARM_SMCCC_FUNC_MASK;
+
+return fn >= FFA_FNUM_MIN_VALUE && fn <= FFA_FNUM_MAX_VALUE;
+}
+
+#ifdef CONFIG_FFA
+#define FFA_NR_FUNCS12
+#else
+#define FFA_NR_FUNCS0
+#endif
+
+#endif /*__ASM_ARM_TEE_FFA_H__*/
diff --git a/xen/arch/arm/tee/Kconfig b/xen/arch/arm/tee/Kconfig
index 392169b2559d..923f08ba8cb7 100644
--- a/xen/arch/arm/tee/Kconfig
+++ b/xen/arch/arm/tee/Kconfig
@@ -8,3 +8,14 @@ config OPTEE
  virtualization-enabled OP-TEE present. You can learn more
  about virtualization for OP-TEE at
  https://optee.readthedocs.io/architecture/virtualization.html
+
+config FFA
+ bool "Enable FF-A mediator support (UNSUPPORTED)" if UNSUPPORTED
+ default n
+ depends on ARM_64


Even if the tee Makefile is only included if CONFIG_TEE is activated,
the missing dependency on TEE here allows to select FFA without TEE
resulting in a config with FFA activated but not compiled in.

No build error is coming from this, FFA is just not in if selected without TEE.

Should be:

depends on ARM_64 && TEE

I am ok if this is fixed on commit and my R-B kept.


I have fixed it and committed up to patch #9. That said, I think it 
would be best if we had a TEE category that both the OPTEE and FFA 
configs sit under. This will help make the menuconfig clearer and avoid 
the "depends TEE".


Bertrand, Jens, can one of you have a look?

Cheers,

--
Julien Grall



[ovmf test] 181937: all pass - PUSHED

2023-07-20 Thread osstest service owner
flight 181937 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181937/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf c6b512962e92ae54a895bdfd2147abaf2c9e3e22
baseline version:
 ovmf b2de9ec5a759aa4a7ac029cda9079dce077bf856

Last test of basis   181922  2023-07-20 08:11:06 Z    0 days
Testing same since   181937  2023-07-20 20:42:43 Z    0 days    1 attempts


People who touched revisions under test:
  Kun Qin 

jobs:
 build-amd64-xsm                       pass
 build-i386-xsm                        pass
 build-amd64                           pass
 build-i386                            pass
 build-amd64-libvirt                   pass
 build-i386-libvirt                    pass
 build-amd64-pvops                     pass
 build-i386-pvops                      pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64  pass
 test-amd64-i386-xl-qemuu-ovmf-amd64   pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   b2de9ec5a7..c6b512962e  c6b512962e92ae54a895bdfd2147abaf2c9e3e22 -> xen-tested-master



Re: [PATCH 0/8] Make PDX compression optional

2023-07-20 Thread Andrew Cooper
On 20/07/2023 11:00 pm, Julien Grall wrote:
> Hi Alejandro,
>
> Great work!
>
> On 17/07/2023 17:03, Alejandro Vallejo wrote:
>> Currently there's a CONFIG_HAS_PDX Kconfig option, but it's
>> impossible to
>> disable it because the whole codebase performs unconditional
>> compression/decompression operations on addresses. This has the
>> unfortunate side effect that systems without a need for compression
>> still
>> have to pay the performance impact of juggling bits on every pfn<->pdx
>> conversion (this requires reading several global variables). This series
>> attempts to:
> Just as a datapoint. I applied this to a tree with Live-Update
> support. From the basic test I did, this is reducing the downtime by
> 10% :).

I'm not surprised in the slightest.

We've had many cases that prove that compression (of 0 bits, on all x86
systems) is a disaster perf-wise, and it's used in pretty much every
fastpath in Xen.

Look no further than c/s 564d261687c and the 10% improvements in general
PV runtime too, and that was optimising away one single instance in one
single fastpath.

It's also why I'm not entertaining the concept of leaving it active or
selectable on x86.
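
For readers who have not looked at the conversion itself, the bit juggling
being discussed can be sketched roughly as below. This is only an
illustration: the struct pdx_layout type and its field names are made up for
this sketch (Xen keeps the equivalent values in global variables, which is
where the extra memory reads come from), so please do not read it as the
in-tree code.

```c
/*
 * Hypothetical layout descriptor, for illustration only. The three
 * values describe a run of always-zero pfn bits (the "hole").
 */
struct pdx_layout {
    unsigned long bottom_mask; /* pfn bits below the hole, kept in place */
    unsigned long top_mask;    /* pfn bits above the hole */
    unsigned int hole_shift;   /* width of the hole, in bits */
};

/* Squeeze the hole out of a pfn to obtain a dense index. */
static unsigned long pfn_to_pdx(const struct pdx_layout *l, unsigned long pfn)
{
    return (pfn & l->bottom_mask) | ((pfn & l->top_mask) >> l->hole_shift);
}

/* Re-insert the hole to get the pfn back. */
static unsigned long pdx_to_pfn(const struct pdx_layout *l, unsigned long pdx)
{
    return (pdx & l->bottom_mask) | ((pdx << l->hole_shift) & l->top_mask);
}
```

With a zero-bit hole (bottom_mask all ones, top_mask zero, hole_shift zero)
both helpers degenerate to the identity, yet the masks and shift still have
to be loaded and applied on every call unless the whole thing is compiled
out.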

~Andrew



Re: [PATCH 03/10] x86 setup: change bootstrap map to accept new boot module structures

2023-07-20 Thread Christopher Clark
On Thu, Jul 13, 2023 at 11:51 PM Christopher Clark <
christopher.w.cl...@gmail.com> wrote:

>
>
> On Sat, Jul 8, 2023 at 11:47 AM Stefano Stabellini 
> wrote:
>
>> On Sat, 1 Jul 2023, Christopher Clark wrote:
>> > To convert the x86 boot logic from multiboot to boot module structures,
>> > change the bootstrap map function to accept a boot module parameter.
>> >
>> > To allow incremental change from multiboot to boot modules across all
>> > x86 setup logic, provide a temporary inline wrapper that still accepts a
>> > multiboot module parameter and use it where necessary. The wrapper is
>> > placed in a new arch/x86 header  to avoid putting a static
>> > inline function into an existing header that has no such functions
>> > already. This new header will be expanded with additional functions in
>> > subsequent patches in this series.
>> >
>> > No functional change intended.
>> >
>> > Signed-off-by: Christopher Clark 
>> > Signed-off-by: Daniel P. Smith 
>> >
>>
>> [...]
>>
>> > diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
>> > index b72ae31a66..eb93cc3439 100644
>> > --- a/xen/include/xen/bootinfo.h
>> > +++ b/xen/include/xen/bootinfo.h
>> > @@ -10,6 +10,9 @@
>> >  #endif
>> >
>> >  struct boot_module {
>> > +paddr_t start;
>> > +size_t size;
>>
>> I think size should be paddr_t (instead of size_t) to make sure it is
>> the right size on both 64-bit and 32-bit architectures that support
>> 64-bit addresses.
>>
>
> Thanks, that explanation does make sense - ack.
>

I've come back to reconsider this as it doesn't seem right to me to store a
non-address value (which this will always be) in a type explicitly defined
to hold an address: addresses may have architectural alignment requirements
whereas a size value is just a number of bytes so will not. The point of a
size_t value is that size_t is defined to be large enough to hold the size
of any valid object in memory, so I think this was right as-is.
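
To make the two positions concrete, here is a small standalone sketch of the
struct as posted. The paddr_t typedef is an assumption made for this example
(a 64-bit physical address type, as on a 32-bit build with 64-bit physical
addressing), and boot_module_end() is a hypothetical helper, not something
from the series.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: physical addresses are 64 bits wide. */
typedef uint64_t paddr_t;

struct boot_module {
    paddr_t start; /* physical address of the module */
    size_t size;   /* length in bytes: a count, not an address */
};

/*
 * Hypothetical helper: first physical address past the module.
 * The size is widened to paddr_t before the addition so the sum is
 * computed at address width even where size_t is only 32 bits.
 */
static paddr_t boot_module_end(const struct boot_module *bm)
{
    return bm->start + (paddr_t)bm->size;
}
```

The trade-off is visible here: on a build where size_t is 32 bits a single
module cannot be described as 4 GiB or larger, while a module placed above
4 GiB is still fine because only start needs the wider type.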

Christopher



>
> Christopher
>
>
>>
>>
>> >  struct arch_bootmodule *arch;
>> >  };
>>
>


Re: [PATCH 02/10] x86 setup: per-arch bootmodule structure, headroom field

2023-07-20 Thread Christopher Clark
On Sat, Jul 8, 2023 at 12:15 PM Stefano Stabellini 
wrote:

> On Sat, 1 Jul 2023, Christopher Clark wrote:
> > Next step in incremental work towards adding a non-multiboot internal
> > representation of boot modules, converting the fields being accessed for
> > the startup calculations.
> >
> > Add a new array of structs for per-boot-module state, though only
> > allocate space for a single array entry in this change since that is all
> > that is required for functionality modified in this patch: moving the
> > headroom field for the image decompression calculation into a new
> > per-arch boot module struct and then using it in x86 dom0 construction.
> >
> > Introduces a per-arch header for x86 for arch-specific boot module
> > structures, and add a member to the common boot module structure for
> > access to it.
> >
> > No functional change intended.
> >
> > Signed-off-by: Christopher Clark 
> > Signed-off-by: Daniel P. Smith 
>
> [...]
>
>
> > diff --git a/xen/arch/x86/include/asm/bootinfo.h
> b/xen/arch/x86/include/asm/bootinfo.h
> > new file mode 100644
> > index 00..a25054f372
> > --- /dev/null
> > +++ b/xen/arch/x86/include/asm/bootinfo.h
> > @@ -0,0 +1,18 @@
> > +#ifndef __ARCH_X86_BOOTINFO_H__
> > +#define __ARCH_X86_BOOTINFO_H__
> > +
> > +struct arch_bootmodule {
> > +unsigned headroom;
> > +};
> > +
> > +#endif
> > +
> > +/*
> > + * Local variables:
> > + * mode: C
> > + * c-file-style: "BSD"
> > + * c-basic-offset: 4
> > + * tab-width: 4
> > + * indent-tabs-mode: nil
> > + * End:
> > + */
>
> [...]
>
> > diff --git a/xen/include/xen/bootinfo.h b/xen/include/xen/bootinfo.h
> > index 6a7d55d20e..b72ae31a66 100644
> > --- a/xen/include/xen/bootinfo.h
> > +++ b/xen/include/xen/bootinfo.h
> > @@ -3,8 +3,19 @@
> >
> >  #include 
> >
> > +#ifdef CONFIG_X86
> > +#include 
> > +#else
> > +struct arch_bootmodule { };
> > +#endif
> > +
> > +struct boot_module {
> > +struct arch_bootmodule *arch;
> > +};
> > +
> >  struct boot_info {
> >  unsigned int nr_mods;
> > +struct boot_module *mods;
>
> Also here you already made the effort of using the same data structures
> we use on ARM, you might as well use the same names too. Otherwise when
> we try to use them on ARM it will require a rename somewhere.
>

Thanks for the review. We consciously made an effort to derive from the Arm
data structures to ensure that we're able to support the logic that Arm
needs. Arm's bootmodules were a good start as a means for abstraction, and
the design for hyperlaunch was striving to abstract even further to
incorporate x86-isms and hopefully enough foresight for PPC and RISC-V.

Christopher


>
> >  };
> >
> >  #endif
> > --
> > 2.25.1
> >
> >
>


Re: [PATCH 0/8] Make PDX compression optional

2023-07-20 Thread Julien Grall

Hi Alejandro,

Great work!

On 17/07/2023 17:03, Alejandro Vallejo wrote:

> Currently there's a CONFIG_HAS_PDX Kconfig option, but it's impossible to
> disable it because the whole codebase performs unconditional
> compression/decompression operations on addresses. This has the
> unfortunate side effect that systems without a need for compression still
> have to pay the performance impact of juggling bits on every pfn<->pdx
> conversion (this requires reading several global variables). This series
> attempts to:

Just as a datapoint. I applied this to a tree with Live-Update support. 
From the basic test I did, this is reducing the downtime by 10% :).


Cheers,

--
Julien Grall



Re: [PATCH] x86/vRTC: move and tidy convert_hour() and {to,from}_bcd()

2023-07-20 Thread Stefano Stabellini
On Thu, 20 Jul 2023, Jan Beulich wrote:
> This is to avoid the need for forward declarations, which in turn
> addresses a violation of MISRA C:2012 Rule 8.3 ("All declarations of an
> object or function shall use the same names and type qualifiers").
> 
> While doing so,
> - drop inline (leaving the decision to the compiler),
> - add const,
> - add unsigned,
> - correct style.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Stefano Stabellini 


> --- a/xen/arch/x86/hvm/rtc.c
> +++ b/xen/arch/x86/hvm/rtc.c
> @@ -58,8 +58,6 @@ enum rtc_mode {
>  
>  static void rtc_copy_date(RTCState *s);
>  static void rtc_set_time(RTCState *s);
> -static inline int from_bcd(RTCState *s, int a);
> -static inline int convert_hour(RTCState *s, int hour);
>  
>  static void rtc_update_irq(RTCState *s)
>  {
> @@ -246,6 +244,40 @@ static void cf_check rtc_update_timer2(v
>  spin_unlock(&s->lock);
>  }
>  
> +static unsigned int to_bcd(const RTCState *s, unsigned int a)
> +{
> +if ( s->hw.cmos_data[RTC_REG_B] & RTC_DM_BINARY )
> +return a;
> +
> +return ((a / 10) << 4) | (a % 10);
> +}
> +
> +static unsigned int from_bcd(const RTCState *s, unsigned int a)
> +{
> +if ( s->hw.cmos_data[RTC_REG_B] & RTC_DM_BINARY )
> +return a;
> +
> +return ((a >> 4) * 10) + (a & 0x0f);
> +}
> +
> +/*
> + * Hours in 12 hour mode are in 1-12 range, not 0-11. So we need convert it
> + * before use.
> + */
> +static unsigned int convert_hour(const RTCState *s, unsigned int raw)
> +{
> +unsigned int hour = from_bcd(s, raw & 0x7f);
> +
> +if ( !(s->hw.cmos_data[RTC_REG_B] & RTC_24H) )
> +{
> +hour %= 12;
> +if ( raw & 0x80 )
> +hour += 12;
> +}
> +
> +return hour;
> +}
> +
>  /* handle alarm timer */
>  static void alarm_timer_update(RTCState *s)
>  {
> @@ -541,37 +573,6 @@ static int rtc_ioport_write(void *opaque
>  return 1;
>  }
>  
> -static inline int to_bcd(RTCState *s, int a)
> -{
> -if ( s->hw.cmos_data[RTC_REG_B] & RTC_DM_BINARY )
> -return a;
> -else
> -return ((a / 10) << 4) | (a % 10);
> -}
> -
> -static inline int from_bcd(RTCState *s, int a)
> -{
> -if ( s->hw.cmos_data[RTC_REG_B] & RTC_DM_BINARY )
> -return a;
> -else
> -return ((a >> 4) * 10) + (a & 0x0f);
> -}
> -
> -/* Hours in 12 hour mode are in 1-12 range, not 0-11.
> - * So we need convert it before using it*/
> -static inline int convert_hour(RTCState *s, int raw)
> -{
> -int hour = from_bcd(s, raw & 0x7f);
> -
> -if (!(s->hw.cmos_data[RTC_REG_B] & RTC_24H))
> -{
> -hour %= 12;
> -if (raw & 0x80)
> -hour += 12;
> -}
> -return hour;
> -}
> -
>  static void rtc_set_time(RTCState *s)
>  {
>  struct tm *tm = &s->current_tm;
> 
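
For anyone wanting to poke at the moved helpers outside of Xen, a standalone
sketch follows. It mirrors the logic of the patch, but the two bool
parameters are an assumption of this sketch: they stand in for the
RTC_DM_BINARY and RTC_24H bits that the real code reads from
s->hw.cmos_data[RTC_REG_B].

```c
#include <stdbool.h>

/* Binary to BCD, unless the (assumed) binary-mode flag is set. */
static unsigned int to_bcd(bool binary_mode, unsigned int a)
{
    if ( binary_mode )
        return a;

    return ((a / 10) << 4) | (a % 10);
}

/* BCD to binary, unless the (assumed) binary-mode flag is set. */
static unsigned int from_bcd(bool binary_mode, unsigned int a)
{
    if ( binary_mode )
        return a;

    return ((a >> 4) * 10) + (a & 0x0f);
}

/*
 * Hours in 12 hour mode are in the 1-12 range with bit 7 flagging PM,
 * so fold them into 0-23 before use.
 */
static unsigned int convert_hour(bool binary_mode, bool mode_24h,
                                 unsigned int raw)
{
    unsigned int hour = from_bcd(binary_mode, raw & 0x7f);

    if ( !mode_24h )
    {
        hour %= 12;
        if ( raw & 0x80 )
            hour += 12;
    }

    return hour;
}
```

In BCD 12-hour mode a raw value of 0x12 (12 AM) comes out as 0 and 0x92
(12 PM) comes out as 12, which is the 1-12 to 0-23 folding the comment in
the patch refers to.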



Re: [PATCH 01/10] x86 setup: move x86 boot module counting into a new boot_info struct

2023-07-20 Thread Christopher Clark
On Sat, Jul 8, 2023 at 11:30 AM Stefano Stabellini 
wrote:

> On Sat, 1 Jul 2023, Christopher Clark wrote:
> > An initial step towards a non-multiboot internal representation of boot
> > modules for common code, starting with x86 setup and converting the
> > fields that are accessed for the startup calculations.
> >
> > Introduce a new header, <xen/bootinfo.h>, and populate it with a new
> > boot_info structure initially containing a count of the number of boot
> > modules.
> >
> > The naming of the header, structure and fields is intended to respect
> > the boot structures on Arm -- see arm/include/asm/setup.h -- as part of
> > work towards aligning common architecture-neutral boot logic and
> > structures.
>
> Thanks for aligning the two archs. At some point we should also have ARM
> use the common headers.
>
>
> > No functional change intended.
> >
> > Signed-off-by: Christopher Clark 
> > Signed-off-by: Daniel P. Smith 
> >
> > ---
> > Changes since v1: patch is a subset of v1 series patches 2 and 3.
> >
> >  xen/arch/x86/setup.c   | 58 +++---
> >  xen/include/xen/bootinfo.h | 20 +
> >  2 files changed, 55 insertions(+), 23 deletions(-)
> >  create mode 100644 xen/include/xen/bootinfo.h
> >
> > diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
> > index 74e3915a4d..708639b236 100644
> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -1,3 +1,4 @@
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -268,7 +269,16 @@ static int __init cf_check parse_acpi_param(const
> char *s)
> >  custom_param("acpi", parse_acpi_param);
> >
> >  static const module_t *__initdata initial_images;
> > -static unsigned int __initdata nr_initial_images;
> > +static struct boot_info __initdata *boot_info;
>
> Why can't this be not a pointer?
>

In a later patch (10/10 in the same series posted), the boot_info pointer
is passed as an argument to start_xen. On x86 there are currently three
different entry points to this that have different environments which must
all be made to behave the same, and passing the argument as a pointer is a
lowest-common-denominator due to the 32-bit x86 multiboot entry point.
Additionally another entry point will be coming soon for TrenchBoot.

Defining it as a pointer now where this logic is introduced saves having to
do a conversion of all accesses when the later change is made.

I can add a note about this to the commit message.
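
A minimal sketch of the shape I mean, outside of Xen and with the __init
annotations dropped. bootinfo_from_count() is a hypothetical stand-in for
multiboot_to_bootinfo(), and bootinfo_adopt() is a hypothetical stand-in for
an entry point that arrives with its own structure already built.

```c
struct boot_info {
    unsigned int nr_mods;
};

/* All consumers go through this pointer, whichever entry path ran. */
static struct boot_info *boot_info;

/* Hypothetical stand-in for multiboot_to_bootinfo(). */
static void bootinfo_from_count(unsigned int mods_count)
{
    /* Storage is owned by the multiboot path only. */
    static struct boot_info info;

    info.nr_mods = mods_count;
    boot_info = &info;
}

/* Hypothetical: a different entry path hands over its own structure. */
static void bootinfo_adopt(struct boot_info *bi)
{
    boot_info = bi;
}
```

Code such as initial_images_nrpages() only ever dereferences boot_info, so
it does not need touching again when the origin of the structure changes.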



>
>
> > +static void __init multiboot_to_bootinfo(multiboot_info_t *mbi)
> > +{
> > +static struct boot_info __initdata info;
>
> Then we don't need this
>

(see above)

>
>
> > +info.nr_mods = mbi->mods_count;
> > +
> > > +boot_info = &info;
>
> And we could just do:
>
>   boot_info.nr_mods = mbi->mods_count;
>
> ?
>

(see above)



>
>
> > +}
> >
> >  unsigned long __init initial_images_nrpages(nodeid_t node)
> >  {
> > @@ -277,7 +287,7 @@ unsigned long __init initial_images_nrpages(nodeid_t
> node)
> >  unsigned long nr;
> >  unsigned int i;
> >
> > -for ( nr = i = 0; i < nr_initial_images; ++i )
> > +for ( nr = i = 0; i < boot_info->nr_mods; ++i )
> >  {
> >  unsigned long start = initial_images[i].mod_start;
> >  unsigned long end = start + PFN_UP(initial_images[i].mod_end);
> > @@ -293,7 +303,7 @@ void __init discard_initial_images(void)
> >  {
> >  unsigned int i;
> >
> > -for ( i = 0; i < nr_initial_images; ++i )
> > +for ( i = 0; i < boot_info->nr_mods; ++i )
> >  {
> >  uint64_t start = (uint64_t)initial_images[i].mod_start <<
> PAGE_SHIFT;
> >
> > @@ -301,7 +311,7 @@ void __init discard_initial_images(void)
> > start +
> PAGE_ALIGN(initial_images[i].mod_end));
> >  }
> >
> > -nr_initial_images = 0;
> > +boot_info->nr_mods = 0;
> >  initial_images = NULL;
> >  }
> >
> > @@ -1020,6 +1030,8 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
> >  mod = __va(mbi->mods_addr);
> >  }
> >
> > +multiboot_to_bootinfo(mbi);
> > +
> >  loader = (mbi->flags & MBI_LOADERNAME)
> >  ? (char *)__va(mbi->boot_loader_name) : "unknown";
> >
> > @@ -1127,18 +1139,18 @@ void __init noreturn __start_xen(unsigned long
> mbi_p)
> > bootsym(boot_edd_info_nr));
> >
> >  /* Check that we have at least one Multiboot module. */
> > -if ( !(mbi->flags & MBI_MODULES) || (mbi->mods_count == 0) )
> > +if ( !(mbi->flags & MBI_MODULES) || (boot_info->nr_mods == 0) )
> >  panic("dom0 kernel not specified. Check bootloader
> configuration\n");
> >
> >  /* Check that we don't have a silly number of modules. */
> > -if ( mbi->mods_count > sizeof(module_map) * 8 )
> > +if ( boot_info->nr_mods > sizeof(module_map) * 8 )
> >  {
> > -mbi->mods_count = sizeof(module_map) * 8;
> > +boot_info->nr_mods = sizeof(module_map) * 8;
> >  printk("Excessive multiboot modules - using the first %u
> only\n",
> > -   

[XEN PATCH] automation: add ECLAIR pipeline

2023-07-20 Thread Simone Ballarin
Add two pipelines that analyze an ARM64 and an X86_64 build with the
ECLAIR static analyzer on the guidelines contained in Set1.

The tool configuration is kept external to the xen repository for
practical reasons; it will be included in a subsequent phase.

All commits on the xen-project/xen:staging branch will be analyzed
and their artifacts will be stored indefinitely; the integration will
report differential information with respect to the previous analysis.

All commits on other branches or repositories will be analyzed and
only the last ten artifacts will be kept; the integration will report
differential information with respect to the analysis done on the common
ancestor with xen-project/xen:staging (if available).

Currently the pipeline variable ENABLE_ECLAIR_BOT is set to "n".
Doing so disables the generation of comments with the analysis summary
on the commit threads. The variable can be set to "y" if a masked
variable named ECLAIRIT_TOKEN is set with the impersonation token of
an account with enough privileges to write on all repositories.

Additionally, any repository should be able to read a masked variable
named WTOKEN with the token provided by BUGSENG.

Signed-off-by: Simone Ballarin 
---
 .gitlab-ci.yml|  2 ++
 automation/gitlab-ci/analyze.yaml | 38 +++
 automation/scripts/eclair | 26 +
 3 files changed, 66 insertions(+)
 create mode 100644 automation/gitlab-ci/analyze.yaml
 create mode 100755 automation/scripts/eclair

diff --git a/.gitlab-ci.yml b/.gitlab-ci.yml
index c8bd7519d5..ee5430b8b7 100644
--- a/.gitlab-ci.yml
+++ b/.gitlab-ci.yml
@@ -1,7 +1,9 @@
 stages:
+  - analyze
   - build
   - test
 
 include:
+  - 'automation/gitlab-ci/analyze.yaml'
   - 'automation/gitlab-ci/build.yaml'
   - 'automation/gitlab-ci/test.yaml'
diff --git a/automation/gitlab-ci/analyze.yaml b/automation/gitlab-ci/analyze.yaml
new file mode 100644
index 00..be96d96e71
--- /dev/null
+++ b/automation/gitlab-ci/analyze.yaml
@@ -0,0 +1,38 @@
+.eclair-analysis:
+  stage: analyze
+  tags:
+    - eclair-analysis
+    - eclair
+    - misrac
+  variables:
+    ECLAIR_OUTPUT_DIR: "ECLAIR_out"
+    ANALYSIS_KIND: "normal"
+    ECLAIR_REPORT_URL: "saas.eclairit.com"
+    ENABLE_ECLAIR_BOT: "n"
+    AUTOPRBRANCH: "staging"
+    AUTOPRREPOSITORY: "xen-project/xen"
+  artifacts:
+    when: always
+    paths:
+      - "${ECLAIR_OUTPUT_DIR}"
+      - '*.log'
+    reports:
+      codequality: gl-code-quality-report.json
+
+eclair-x86_64:
+  extends: .eclair-analysis
+  variables:
+    LOGFILE: "eclair-x86_64.log"
+    VARIANT: "X86_64"
+    RULESET: "Set1"
+  script:
+    - ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"
+
+eclair-ARM64:
+  extends: .eclair-analysis
+  variables:
+    LOGFILE: "eclair-ARM64.log"
+    VARIANT: "ARM64"
+    RULESET: "Set1"
+  script:
+    - ./automation/scripts/eclair 2>&1 | tee "${LOGFILE}"
diff --git a/automation/scripts/eclair b/automation/scripts/eclair
new file mode 100755
index 00..d7f0845aec
--- /dev/null
+++ b/automation/scripts/eclair
@@ -0,0 +1,26 @@
+#!/bin/bash -eu
+
+# ECLAIR configuration files are maintained by BUGSENG
+export GIT_SSH_COMMAND="ssh -o StrictHostKeyChecking=no"
+[ -d ECLAIR_scripts ] || git clone ssh://g...@git.bugseng.com/eclair/scripts/XEN ECLAIR_scripts
+(cd ECLAIR_scripts; git pull --rebase)
+
+ECLAIR_DIR=ECLAIR_scripts/ECLAIR
+ECLAIR_OUTPUT_DIR=$(realpath "${ECLAIR_OUTPUT_DIR}")
+
+ECLAIR_scripts/prepare.sh "${VARIANT}"
+
+ex=0
+"${ECLAIR_DIR}/analyze.sh" "${VARIANT}" "${RULESET}" || ex=$?
+"${ECLAIR_DIR}/action_log.sh" ANALYSIS_LOG \
+ "ECLAIR analysis log" \
+ "${ECLAIR_OUTPUT_DIR}/ANALYSIS.log" \
+ "${ex}"
+"${ECLAIR_DIR}/action_log.sh" REPORT_LOG \
+ "ECLAIR report log" \
+ "${ECLAIR_OUTPUT_DIR}/REPORT.log" \
+ "${ex}"
+[ "${ex}" = 0 ] || exit "${ex}"
+"${ECLAIR_DIR}/action_push.sh" "${WTOKEN}" "${ECLAIR_OUTPUT_DIR}"
+
+rm -rf "${ECLAIR_OUTPUT_DIR}/.data"
-- 
2.34.1




Re: [PATCH v4 3/4] xen/ppc: Implement early serial printk on pseries

2023-07-20 Thread Shawn Anastasio
On 7/19/23 9:05 AM, Jan Beulich wrote:
> Before you/we grow more assembly code, may I re-raise a request regarding
> readability: I think it would be nice if operands started at a fixed column,
> unless the insn mnemonic is unusually long. Where exactly to draw the line
> is up to each architecture; on x86 we use 8 positions from the start of the
> mnemonic.

There is quite a large variance in mnemonic length on ppc -- many common
mnemonics only use 2 characters (e.g. ld, mr) while other common ones
use 6+ (e.g. rldicr, the mtspr family, etc.). Enforcing a column size
that's too short would make the longer mnemonics look misaligned and out
of place, but using a longer column length (like 8) that can accommodate
most common mnemonics adds too much space between short mnemonics and
their arguments.

That said if you still feel strongly about this then I am not strongly
opposed to adding an 8-space column alignment.

> Jan

Thanks,
Shawn



[qemu-upstream-4.17-testing test] 181914: tolerable FAIL - PUSHED

2023-07-20 Thread osstest service owner
flight 181914 qemu-upstream-4.17-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181914/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-qemuu-rhel6hvm-amd  12 redhat-install  fail in 181865 pass in 181889
 test-amd64-i386-libvirt  20 guest-start/debian.repeat  fail in 181865 pass in 181914
 test-amd64-i386-xl-vhd  22 guest-start.2  fail in 181865 pass in 181914
 test-amd64-i386-xl-vhd  7 xen-install  fail in 181889 pass in 181914
 test-amd64-i386-pair  10 xen-install/src_host  fail in 181889 pass in 181914
 test-armhf-armhf-libvirt-raw  13 guest-start  fail pass in 181865
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-install  fail pass in 181889
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-install  fail pass in 181889
 test-amd64-i386-freebsd10-amd64  7 xen-install  fail pass in 181889

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw  15 saverestore-support-check  fail in 181865 like 175008
 test-armhf-armhf-libvirt-raw  14 migrate-support-check  fail in 181865 never pass
 test-amd64-i386-xl-qemuu-ws16-amd64  19 guest-stop  fail in 181889 like 175008
 test-armhf-armhf-libvirt  16 saverestore-support-check  fail  like 175008
 test-amd64-amd64-xl-qemuu-win7-amd64  19 guest-stop  fail  like 175008
 test-armhf-armhf-libvirt-qcow2  15 saverestore-support-check  fail  like 175008
 test-amd64-i386-xl-qemuu-win7-amd64  19 guest-stop  fail  like 175008
 test-amd64-amd64-xl-qemuu-ws16-amd64  19 guest-stop  fail  like 175008
 test-amd64-amd64-qemuu-nested-amd  20 debian-hvm-install/l1/l2  fail  like 175008
 test-amd64-i386-libvirt-xsm  15 migrate-support-check  fail  never pass
 test-amd64-i386-xl-pvshim  14 guest-start  fail  never pass
 test-amd64-amd64-libvirt-xsm  15 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt  15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx  15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-thunderx  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl  15 migrate-support-check  fail  never pass
 test-arm64-arm64-xl  16 saverestore-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  15 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-xsm  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-multivcpu  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-check  fail  never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm  13 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-rtds  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt  15 migrate-support-check  fail  never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-check  fail  never pass
 test-amd64-amd64-libvirt-vhd  14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw  14 migrate-support-check  fail  never pass
 test-arm64-arm64-libvirt-raw  15 saverestore-support-check  fail  never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-check  fail  never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-check  fail  never pass
 test-armhf-armhf-libvirt-qcow2  14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-check  fail  never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-check

Re: [PATCH 2/8] arm/mm: Document the differences between arm32 and arm64 directmaps

2023-07-20 Thread Julien Grall

Hi Alejandro,

On 17/07/2023 17:03, Alejandro Vallejo wrote:

arm32 merely covers the XENHEAP, whereas arm64 currently covers anything in
the frame table. These comments highlight why arm32 doesn't need to account
for PDX compression in its __va() implementation while arm64 does.

Signed-off-by: Alejandro Vallejo 
---
  xen/arch/arm/include/asm/mm.h | 27 +++
  1 file changed, 27 insertions(+)

diff --git a/xen/arch/arm/include/asm/mm.h b/xen/arch/arm/include/asm/mm.h
index 4262165ce2..1a83f41879 100644
--- a/xen/arch/arm/include/asm/mm.h
+++ b/xen/arch/arm/include/asm/mm.h
@@ -280,6 +280,19 @@ static inline paddr_t __virt_to_maddr(vaddr_t va)
  #define virt_to_maddr(va)   __virt_to_maddr((vaddr_t)(va))
  
  #ifdef CONFIG_ARM_32

+/**
+ * Find the virtual address corresponding to a machine address
+ *
+ * Only memory backing the XENHEAP has a corresponding virtual address to
+ * be found. This is so we can save precious virtual space, as it's in
+ * short supply on arm32. This mapping is not subject to PDX compression
+ * because XENHEAP is known to be physically contiguous and can't hence
+ * jump over the PDX hole. This means we can avoid the roundtrips
+ * converting to/from pdx.
+ *
+ * @param ma Machine address
+ * @return Virtual address mapped to `ma`
+ */
  static inline void *maddr_to_virt(paddr_t ma)
  {
  ASSERT(is_xen_heap_mfn(maddr_to_mfn(ma)));
@@ -287,6 +300,20 @@ static inline void *maddr_to_virt(paddr_t ma)
  return (void *)(unsigned long) ma + XENHEAP_VIRT_START;
  }
  #else
+/**
+ * Find the virtual address corresponding to a machine address
+ *
+ * The directmap covers all conventional memory accessible by the
+ * hypervisor. This means it's subject to PDX compression.
+ *
+ * More specifically to arm64, the directmap mappings start at the first
+ * GiB boundary containing valid RAM. This means there's an extra offset
+ * applied (directmap_base_pdx) on top of the regular PDX compression
+ * logic.


I find this paragraph a bit confusing to read because it leads one to think 
that pdx_to_maddr(directmap_base_pdx) will return a GiB aligned address.


The base PDX corresponds to the start of the first region and the only 
requirement is that it should be page-aligned. However, when mapping in the 
virtual address space we also offset the start to ensure that superpages 
can be used (this is where the GiB alignment is used).


That's why XENHEAP_VIRT_START points to directmap_virt_start rather than 
DIRECTMAP_VIRT_START. I think it would make sense to have the logic 
follow what you suggest as it would remove a memory read. But I would 
understand if you don't want to take on that extra work. :)


So for now, I would suggest removing "GiB boundary containing".
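
To illustrate the distinction, here is a minimal sketch. It is not the Xen implementation: the names and the example addresses are hypothetical. It only shows that a RAM base can be page-aligned without being GiB-aligned, and that the GiB alignment belongs to the start of the mapping, with the base sitting at some offset inside it:

```c
#include <stdint.h>

#define EX_GIB (UINT64_C(1) << 30)

/* Hypothetical helper: round a physical address down to a 1 GiB boundary. */
static inline uint64_t ex_gib_align_down(uint64_t pa)
{
    return pa & ~(EX_GIB - 1);
}

/* Offset of the first RAM byte inside the GiB-aligned mapping. */
static inline uint64_t ex_offset_in_mapping(uint64_t ram_base)
{
    return ram_base - ex_gib_align_down(ram_base);
}
```

With a RAM base of 0x80200000, the mapping would start at 0x80000000 and the base would sit 0x200000 bytes into it, so the base itself is not on a GiB boundary.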

Cheers,

--
Julien Grall



[linux-linus test] 181913: regressions - FAIL

2023-07-20 Thread osstest service owner
flight 181913 linux-linus real [real]
flight 181930 linux-linus real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/181913/
http://logs.test-lab.xenproject.org/osstest/logs/181930/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-arm64-arm64-libvirt-raw 13 guest-start  fail REGR. vs. 180278
 test-arm64-arm64-xl-vhd  13 guest-start  fail REGR. vs. 180278
 test-armhf-armhf-xl-credit1   8 xen-boot fail REGR. vs. 180278

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-win7-amd64  8 xen-boot fail pass in 181930-retest
 test-amd64-amd64-xl-multivcpu 20 guest-localmigrate/x10 fail pass in 181930-retest
 test-amd64-amd64-xl-qemut-debianhvm-i386-xsm 18 guest-localmigrate/x10 fail pass in 181930-retest
 test-amd64-amd64-xl-qemut-debianhvm-amd64 18 guest-localmigrate/x10 fail pass in 181930-retest

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop                fail in 181930 like 180278
 test-armhf-armhf-xl-arndale           8 xen-boot                  fail like 180278
 test-armhf-armhf-libvirt              8 xen-boot                  fail like 180278
 test-armhf-armhf-xl-credit2           8 xen-boot                  fail like 180278
 test-armhf-armhf-xl-vhd               8 xen-boot                  fail like 180278
 test-armhf-armhf-xl                   8 xen-boot                  fail like 180278
 test-armhf-armhf-examine              8 reboot                    fail like 180278
 test-armhf-armhf-xl-multivcpu         8 xen-boot                  fail like 180278
 test-armhf-armhf-libvirt-qcow2        8 xen-boot                  fail like 180278
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop                fail like 180278
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop                fail like 180278
 test-armhf-armhf-libvirt-raw          8 xen-boot                  fail like 180278
 test-armhf-armhf-xl-rtds              8 xen-boot                  fail like 180278
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop                fail like 180278
 test-amd64-amd64-qemuu-nested-amd    20 debian-hvm-install/l1/l2  fail like 180278
 test-amd64-amd64-libvirt             15 migrate-support-check     fail never pass
 test-amd64-amd64-libvirt-xsm         15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm              15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx         15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-xsm              16 saverestore-support-check fail never pass
 test-arm64-arm64-xl                  15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-thunderx         16 saverestore-support-check fail never pass
 test-arm64-arm64-xl                  16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit2          16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1          15 migrate-support-check     fail never pass
 test-arm64-arm64-xl-credit1          16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qcow2       14 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm         15 migrate-support-check     fail never pass
 test-arm64-arm64-libvirt-xsm         16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-raw         14 migrate-support-check     fail never pass

version targeted for testing:
 linux                bfa3037d828050896ae52f6467b6ca2489ae6fb1
baseline version:
 linux                6c538e1adbfc696ac4747fb10d63e704344f763d

Last test of basis   180278  2023-04-16 19:41:46 Z   95 days
Failing since        180281  2023-04-17 06:24:36 Z   94 days  181 attempts
Testing same since   181913  2023-07-19 21:15:31 Z    0 days    1 attempts


3787 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm          pass
 build-arm64-xsm          pass
 build-i386-xsm           pass
 build-amd64              pass
 build-arm64              pass
 build-armhf              pass
 build-i386               pass
 build-amd64-libvirt      pass
 build-arm64-libvirt      pass
 build-armhf-libvirt      pass
 build-i386-libvirt       pass
 build-amd64-pvops        pass
 build-arm64-pvops 

Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 06:03:49PM +0200, Jan Beulich wrote:
> On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> > --- a/xen/drivers/vpci/msi.c
> > +++ b/xen/drivers/vpci/msi.c
> > @@ -190,6 +190,8 @@ static int cf_check init_msi(struct pci_dev *pdev)
> >  uint16_t control;
> >  int ret;
> >  
> > +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));
> 
> I'm afraid I have to ask the opposite question, compared to Roger's:
> Why do you need the lock held for write here (and in init_msix())?
> Neither list of devices nor the pdev->vpci pointer are being
> altered.

This is called from vpci_add_handlers() which will acquire (or
requires being called) with the lock in write mode in order to set
pdev->vpci I would assume.  Strictly speaking however the init
handlers don't require the lock in write mode unless we use such
locking to get exclusive access to all the devices assigned to the
domain BARs array for modify_bars().

Thanks, Roger.



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-20 Thread Jan Beulich
On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> @@ -431,10 +447,23 @@ static void vpci_write_helper(const struct pci_dev 
> *pdev,
>   r->private);
>  }
>  
> +/* Helper function to unlock locks taken by vpci_write in proper order */
> +static void unlock_locks(struct domain *d)
> +{
> +ASSERT(rw_is_locked(&d->pci_lock));
> +
> +if ( is_hardware_domain(d) )
> +{
> +ASSERT(rw_is_locked(&d->pci_lock));

Copy-and-paste mistake? You've asserted this same condition already above.

> +read_unlock(&dom_xen->pci_lock);
> +}
> +read_unlock(&d->pci_lock);
> +}
> +
>  void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>  uint32_t data)
>  {
> -const struct domain *d = current->domain;
> +struct domain *d = current->domain;
>  const struct pci_dev *pdev;
>  const struct vpci_register *r;
>  unsigned int data_offset = 0;
> @@ -447,8 +476,16 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
> unsigned int size,
>  
>  /*
>   * Find the PCI dev matching the address, which for hwdom also requires
> - * consulting DomXEN.  Passthrough everything that's not trapped.
> + * consulting DomXEN. Passthrough everything that's not trapped.
> + * If this is hwdom, we need to hold locks for both domain in case if
> + * modify_bars is called()
>   */
> +read_lock(&d->pci_lock);
> +
> +/* dom_xen->pci_lock always should be taken second to prevent deadlock */
> +if ( is_hardware_domain(d) )
> +read_lock(&dom_xen->pci_lock);

But I wonder anyway - can we perhaps get away without acquiring dom_xen's
lock here? Its list isn't altered anymore post-boot, iirc.

> @@ -498,6 +537,7 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, 
> unsigned int size,
>  ASSERT(data_offset < size);
>  }
>  spin_unlock(&pdev->vpci->lock);
> +unlock_locks(d);

In this context the question arises whether the function wouldn't better
be named more specific to its purpose: It's obvious here that it doesn't
unlock all the locks involved.

Jan



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-20 Thread Jan Beulich
On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> --- a/xen/drivers/vpci/msi.c
> +++ b/xen/drivers/vpci/msi.c
> @@ -190,6 +190,8 @@ static int cf_check init_msi(struct pci_dev *pdev)
>  uint16_t control;
>  int ret;
>  
> +ASSERT(rw_is_write_locked(&pdev->domain->pci_lock));

I'm afraid I have to ask the opposite question, compared to Roger's:
Why do you need the lock held for write here (and in init_msix())?
Neither list of devices nor the pdev->vpci pointer are being altered.

Jan



Re: [PATCH v8 05/13] vpci/header: implement guest BAR register handlers

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 12:32:32AM +, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko 
> 
> Add relevant vpci register handlers when assigning PCI device to a domain
> and remove those when de-assigning. This allows having different
> handlers for different domains, e.g. hwdom and other guests.
> 
> Emulate guest BAR register values: this allows creating a guest view
> of the registers and emulates size and properties probe as it is done
> during PCI device enumeration by the guest.
> 
> All empty, IO and ROM BARs for guests are emulated by returning 0 on
> reads and ignoring writes: this BARs are special with this respect as
> their lower bits have special meaning, so returning default ~0 on read
> may confuse guest OS.
> 
> Memory decoding is initially disabled when used by guests in order to
> prevent the BAR being placed on top of a RAM region.

I'm kind of lost on this last sentence, as I don't see the patch
explicitly disabling PCI_COMMAND_MEMORY from the command register.  Is
that more of an expectation on the initial device state?

Maybe there should be some checking in that case then?
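
As a rough idea of what such a check could look like, here is a minimal sketch. It is not code from the patch: the helper names are hypothetical, and only the bit position (bit 1 of the PCI command register, Memory Space Enable) comes from the PCI specification:

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the PCI command register: Memory Space Enable. */
#define EX_PCI_COMMAND_MEMORY 0x2u

/* Hypothetical check: true if memory decoding is enabled in 'cmd'. */
static inline bool ex_memory_decoding_enabled(uint16_t cmd)
{
    return (cmd & EX_PCI_COMMAND_MEMORY) != 0;
}

/* Hypothetical helper: return 'cmd' with memory decoding turned off. */
static inline uint16_t ex_disable_memory_decoding(uint16_t cmd)
{
    return cmd & (uint16_t)~EX_PCI_COMMAND_MEMORY;
}
```

Whether the right reaction to an enabled bit is an assertion, an error, or an explicit write clearing it is a design decision for the series.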

> 
> Signed-off-by: Oleksandr Andrushchenko 
> ---
> 
> Since v6:
> - unify the writing of the PCI_COMMAND register on the
>   error path into a label
> - do not introduce bar_ignore_access helper and open code
> - s/guest_bar_ignore_read/empty_bar_read
> - update error message in guest_bar_write
> - only setup empty_bar_read for IO if !x86
> Since v5:
> - make sure that the guest set address has the same page offset
>   as the physical address on the host
> - remove guest_rom_{read|write} as those just implement the default
>   behaviour of the registers not being handled
> - adjusted comment for struct vpci.addr field
> - add guest handlers for BARs which are not handled and will otherwise
>   return ~0 on read and ignore writes. The BARs are special with this
>   respect as their lower bits have special meaning, so returning ~0
>   doesn't seem to be right
> Since v4:
> - updated commit message
> - s/guest_addr/guest_reg
> Since v3:
> - squashed two patches: dynamic add/remove handlers and guest BAR
>   handler implementation
> - fix guest BAR read of the high part of a 64bit BAR (Roger)
> - add error handling to vpci_assign_device
> - s/dom%pd/%pd
> - blank line before return
> Since v2:
> - remove unneeded ifdefs for CONFIG_HAS_VPCI_GUEST_SUPPORT as more code
>   has been eliminated from being built on x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - simplify some code3. simplify
>  - use gdprintk + error code instead of gprintk
>  - gate vpci_bar_{add|remove}_handlers with CONFIG_HAS_VPCI_GUEST_SUPPORT,
>so these do not get compiled for x86
>  - removed unneeded is_system_domain check
>  - re-work guest read/write to be much simpler and do more work on write
>than read which is expected to be called more frequently
>  - removed one too obvious comment
> ---
>  xen/drivers/vpci/header.c | 156 +++---
>  xen/include/xen/vpci.h|   3 +
>  2 files changed, 130 insertions(+), 29 deletions(-)
> 
> diff --git a/xen/drivers/vpci/header.c b/xen/drivers/vpci/header.c
> index 2780fcae72..5dc9b5338b 100644
> --- a/xen/drivers/vpci/header.c
> +++ b/xen/drivers/vpci/header.c
> @@ -457,6 +457,71 @@ static void cf_check bar_write(
>  pci_conf_write32(pdev->sbdf, reg, val);
>  }
>  
> +static void cf_check guest_bar_write(const struct pci_dev *pdev,
> + unsigned int reg, uint32_t val, void 
> *data)
> +{
> +struct vpci_bar *bar = data;
> +bool hi = false;
> +uint64_t guest_reg = bar->guest_reg;
> +
> +if ( bar->type == VPCI_BAR_MEM64_HI )
> +{
> +ASSERT(reg > PCI_BASE_ADDRESS_0);
> +bar--;
> +hi = true;
> +}
> +else
> +{
> +val &= PCI_BASE_ADDRESS_MEM_MASK;
> +val |= bar->type == VPCI_BAR_MEM32 ? PCI_BASE_ADDRESS_MEM_TYPE_32
> +   : PCI_BASE_ADDRESS_MEM_TYPE_64;
> +val |= bar->prefetchable ? PCI_BASE_ADDRESS_MEM_PREFETCH : 0;
> +}
> +
> +guest_reg &= ~(0xffffffffull << (hi ? 32 : 0));
> +guest_reg |= (uint64_t)val << (hi ? 32 : 0);
> +
> +guest_reg &= ~(bar->size - 1) | ~PCI_BASE_ADDRESS_MEM_MASK;
> +
> +/*
> + * Make sure that the guest set address has the same page offset
> + * as the physical address on the host or otherwise things won't work as
> + * expected.
> + */
> +if ( (guest_reg & (~PAGE_MASK & PCI_BASE_ADDRESS_MEM_MASK)) !=
> + (bar->addr & ~PAGE_MASK) )
> +{
> +gprintk(XENLOG_WARNING,
> +"%pp: ignored BAR %zu write attempting to change page offset\n",
> +&pdev->sbdf, bar - pdev->vpci->header.bars + hi);
> +return;
> +}
> +
> +bar->guest_reg = guest_reg;
> +}
> +
> +static uint32_t cf_check guest_bar_read(const struct pci_dev *pdev,
> + 

Re: [RFC PATCH 1/4] xen/arm: justify or initialize conditionally uninitialized variables

2023-07-20 Thread Nicola Vetrini



If the value is always initialized in the callee, then there's no 
problem configuring ECLAIR so that it knows that this parameter is 
always written, and therefore any subsequent use in the caller is ok.


Another possibility is stating that a function never reads the 
pointee before writing to it (it may or may not write it, but if it 
doesn't, then the pointee is not read either). The 'strncmp' after 
'fdt_get_path' does get in the way, though, because this property is 
not strong enough to ensure that we can use 'path' after returning 
from the function.


I am not sure I fully understand what you wrote. Can you provide a C 
example?




void f(int *x) {
   if(x) {
 *x = 10;
 int y =*x; // read the pointee after it's initialized
   } else {
 int z; // in this branch the pointee is not read nor written
   }
   // we can say that f never reads *x before (possibly) writing to it.
}


I am having trouble understanding it in the context of fdt_get_path().
Is 'f' meant to be fdt_get_path()?




Yes, exactly. The point is that 'fdt_get_path' never reads
uninitialized addresses from the path array; therefore, if the
strncmp can somehow be incorporated in a function or macro, e.g.


int fdt_compare_path(fdt, node, path, str) {
/* Check that the node is under "/chosen" (first 7 chars of path) */
ret = fdt_get_path(fdt, node, path, sizeof (path));
if ( ret != 0 || strncmp(path, "/chosen", 7) )
return ret;
return 0;
}

called in bootfdt as fdt_compare_path(fdt, node, path, "/chosen");

then 'fdt_compare_path' has the, shall we say, "no read before write" 
property and because path isn't used anywhere else in 
'process_multiboot_node' that is enough to make ECLAIR happy.
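
A self-contained sketch of that "no read before write" shape follows. Everything in it is hypothetical: 'ex_get_path' merely stands in for fdt_get_path() and copies a string, and the return convention of the wrapper (1 for match, 0 for no match, negative on error) is an assumption made for the example, not what bootfdt.c does:

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for fdt_get_path(): on success (0) it fills 'buf'
 * with a NUL-terminated path; on failure it returns a negative value and
 * leaves 'buf' untouched. It never reads 'buf'.
 */
static int ex_get_path(const char *src, char *buf, size_t len)
{
    size_t n = strlen(src);

    if ( n + 1 > len )
        return -1;

    memcpy(buf, src, n + 1);
    return 0;
}

/*
 * Returns 1 if the path starts with 'prefix', 0 if it does not, and a
 * negative value on error. 'buf' is only read after ex_get_path() has
 * successfully written it, so the wrapper as a whole never reads the
 * caller's buffer before writing to it.
 */
static int ex_path_has_prefix(const char *src, char *buf, size_t len,
                              const char *prefix)
{
    int ret = ex_get_path(src, buf, len);

    if ( ret != 0 )
        return ret;

    return strncmp(buf, prefix, strlen(prefix)) == 0;
}
```

The caller can then pass an uninitialized local buffer, and the property to state for the tool applies to the wrapper only.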




This should be probably discussed after deciding on the refactoring 
'dt_property_read_string'


FAOD, I think we should refactor dt_property_read_string(). I am happy 
to write a patch if you want.




That would be perfect, I'll test it when I see it, so that I can give
you feedback on that patch directly.




The analysis here could use some more precision, but the modified 
construct is entirely equivalent.


I agree that they are equivalent. But in general, we don't change the 
style of the construct without explaining why.


In this case, the first step would be to improve Eclair.



The changes needed for this kind of analysis are not trivial: we've
looked into this, but there's no easy way to support this in a timely
manner. I understand that this is an established pattern, but what
would you think of an initializer using designators?


uint64_t cmd[4] = {
 [0] = GITS_CMD_MAPC,
 [1] = 0x00,
 [2] = encode_rdbase(its, cpu, collection_id) | GITS_VALID_BIT,
 [3] = 0x00,
};


The readability is OK here. But this may not be the case here. In
particular...







  cmd[3] = 0x00;
  return its_send_command(its, cmd);
@@ -215,9 +214,7 @@ static int its_send_cmd_mapd(struct host_its *its, uint32_t deviceid,

  }
  cmd[0] = GITS_CMD_MAPD | ((uint64_t)deviceid << 32);
  cmd[1] = size_bits;
-    cmd[2] = itt_addr;
-    if ( valid )
-    cmd[2] |= GITS_VALID_BIT;
+    cmd[2] = itt_addr | (valid ? GITS_VALID_BIT : 0x00);


Same here.


here... I much prefer the existing version.



Well, that if can be kept as well. Like this:

uint64_t cmd[4] = { [0] = ..., [2] = itt_addr, ... };
if(valid)
  cmd[2] |= GITS_VALID_BIT;
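
Spelled out as compilable C, the combination of a designated initializer and the retained 'if' could look like the sketch below. The macro values and the 'ex_' names are placeholders for illustration, not the real GITS_* definitions, and the array is wrapped in a struct only so that the example can return it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder values, not the real GITS_* definitions. */
#define EX_CMD_MAPD  UINT64_C(0x08)
#define EX_VALID_BIT (UINT64_C(1) << 63)

struct ex_cmd {
    uint64_t w[4];
};

static struct ex_cmd ex_build_mapd(uint32_t deviceid, uint64_t size_bits,
                                   uint64_t itt_addr, bool valid)
{
    /* Every element is written at the declaration. */
    struct ex_cmd c = {
        .w = {
            [0] = EX_CMD_MAPD | ((uint64_t)deviceid << 32),
            [1] = size_bits,
            [2] = itt_addr,
            [3] = 0x00,
        },
    };

    /* The conditional update is kept as a separate statement. */
    if ( valid )
        c.w[2] |= EX_VALID_BIT;

    return c;
}
```

Any element not named in such an initializer would be zero-initialized, so there is no path on which an element is read before being written.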



Considering all of the replies above, a first draft of a 
strategy/policy I can think of is having:


- Initializer functions that always write their parameter, so that 
the strongest "pointee always written" property can be stated. This 
causes all further uses to be marked safe.


- Initialize the variable when there exists a known safe value that 
does not alter the semantics of the function. The initialization 
does not need to be at the declaration, but doing so simplifies the 
code.


As I mentioned in private there are two risks with that:
  1. You prevent the compiler from spotting other issues
  2. You may now get a warning from Coverity if it spots that you set a
value that gets overwritten before its first use.


So I think such an approach should be used with parsimony. Instead, we
should look at reworking the code when possible.




Do you think it would help if you look directly at actual cautions to 
spot possible functions that can be refactored?


I have already looked at some. Can we focus on them and see how much it 
helps?




Yes. It would reduce the noise for me too

Regards,

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [XEN PATCH] xen/arm: optee: provide an initialization for struct arm_smccc_res

2023-07-20 Thread Julien Grall

Hi Nicola,

On 20/07/2023 15:29, Nicola Vetrini wrote:

The local variables with type 'struct arm_smccc_res' are initialized
just after the declaration to avoid any possible read usage prior
to any write usage, which would constitute a violation of
MISRA C:2012 Rule 9.1.

This is already prevented by suitable checks in the code,
but the correctness of this approach is difficult to prove and
reason about.


So I looked at the implementation of arm_smccc_smc(). For arm64, it is 
(simplified):


if ( cpus_have_const_cap(ARM_SMCCC_1_1) )
   arm_smccc_1_1_smc(__VA_ARGS__);
else
   arm_smccc_1_0_smc(__VA_ARGS__);

In arm_smccc_1_1_smc(), we will explicitly initialize __res:

if ( ___res )
  *___res = (typeof(*___res)) {r0, r1, r2, r3};


Whereas for arm_smccc_1_0_smc(), we would call an assembly function. I
assume this is the problem?


I think this is similar to the discussion we had on set_interrupts() and 
dt_set_cells(). If so, couldn't we tell ECLAIR that 
__arm_smccc_1_0_smc() will always initialize *res?


Cheers,

--
Julien Grall



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-20 Thread Jan Beulich
On 20.07.2023 13:20, Roger Pau Monné wrote:
> On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
>> @@ -318,14 +323,17 @@ void vpci_dump_msi(void)
>>   * holding the lock.
>>   */

Note the comment here.

>>  printk("unable to print all MSI-X entries: %d\n", rc);
>> -process_pending_softirqs();
>> -continue;
>> +goto pdev_done;
>>  }
>>  }
>>  
>>  spin_unlock(&pdev->vpci->lock);
>> + pdev_done:
>> +read_unlock(&d->pci_lock);
>>  process_pending_softirqs();
>> +read_lock(&d->pci_lock);
> 
> read_trylock().

Plus the same scheme as with the spin lock wants following imo:
vpci_msix_arch_print() returns an error only with (now) both locks
dropped. This then wants reflecting in the comment pointed out
above.

Jan



Re: [RFC PATCH 1/4] xen/arm: justify or initialize conditionally uninitialized variables

2023-07-20 Thread Nicola Vetrini




On 20/07/23 17:39, Julien Grall wrote:

Hi,

The e-mail is getting quite long. Can you trim the unnecessary bits when 
replying?




Ok.


On 20/07/2023 15:23, Nicola Vetrini wrote:
The problem is that _t may be uninitialized, hence assigning its 
address to t could be problematic.


But the value is set right after. IOW, there is no read between. So 
how is this prob


Another way to address this is to initialize _t to a bad value and 
use this variable in the body, then assign to t based on the value 
just before returning.


IMHO, neither solution is ideal. I think we should investigate
whether Eclair can be improved.


[...]



I'll see what can be done about it, I'll reply when I have an answer.



What about this:

-    p2m_type_t _t;
+    p2m_type_t _t = p2m_invalid;
[...]
  t = t ?: &_t;
-    *t = p2m_invalid;
+    *t = _t;


The resulting code is still quite confusing. I am still not quite sure 
why ECLAIR can't understand the construct. Apologies if this was already 
said, but this thread is getting quite long with many different issues. 
So it is a bit difficult to navigate (which is why I suggested to split 
and have a commit message explaining the rationale for each).


Anyway, if we can't improve Eclair, then my preference would be the 
following:


diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index de32a2d638ba..05d65db01b0c 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -496,16 +496,13 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,

  lpae_t entry, *table;
  int rc;
  mfn_t mfn = INVALID_MFN;
-    p2m_type_t _t;
  DECLARE_OFFSETS(offsets, addr);

  ASSERT(p2m_is_locked(p2m));
  BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);

-    /* Allow t to be NULL */
-    t = t ?: &_t;
-
-    *t = p2m_invalid;
+    if ( t )
+    *t = p2m_invalid;

  if ( valid )
  *valid = false;
@@ -549,7 +546,8 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,

  if ( p2m_is_valid(entry) )
  {
-    *t = entry.p2m.type;
+    if ( t )
+    *t = entry.p2m.type;

  if ( a )
  *a = p2m_mem_access_radix_get(p2m, gfn);



Ok, I'll make a separate patch.

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock

2023-07-20 Thread Jan Beulich
On 20.07.2023 02:32, Volodymyr Babchuk wrote:
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>  
>  if ( devfn == pdev->devfn && pdev->domain != target )
>  {
> -list_move(&pdev->domain_list, &target->pdev_list);
> -pdev->domain = target;
> +write_lock(&pdev->domain->pci_lock);
> +list_del(&pdev->domain_list);
> +write_unlock(&pdev->domain->pci_lock);

As mentioned on an earlier version, perhaps better (cheaper) to use
"source" here? (Same in VT-d code then.)

> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  ret = 0;
>  if ( !pdev->domain )
>  {
> +write_lock(&hardware_domain->pci_lock);
>  pdev->domain = hardware_domain;
>  list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>  
> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>  list_del(&pdev->domain_list);
>  pdev->domain = NULL;
> +write_unlock(&hardware_domain->pci_lock);
>  goto out;

In addition to Roger's comments about locking scope: In a case like this
one it would probably also be good to move the printk() out of the locked
area. It can be slow, after all.

Question is why you have this wide a locked area here in the first place:
Don't you need to hold the lock just across the two list operations (but
not in between)?

> @@ -887,26 +895,62 @@ static int deassign_device(struct domain *d, uint16_t 
> seg, uint8_t bus,
>  
>  int pci_release_devices(struct domain *d)
>  {
> -struct pci_dev *pdev, *tmp;
> -u8 bus, devfn;
> -int ret;
> +int combined_ret;
> +LIST_HEAD(failed_pdevs);
>  
>  pcidevs_lock();
> -ret = arch_pci_clean_pirqs(d);
> -if ( ret )
> +write_lock(&d->pci_lock);
> +combined_ret = arch_pci_clean_pirqs(d);
> +if ( combined_ret )
>  {
>  pcidevs_unlock();
> -return ret;
> +write_unlock(&d->pci_lock);
> +return combined_ret;
>  }
> -list_for_each_entry_safe ( pdev, tmp, &d->pdev_list, domain_list )
> +
> +while ( !list_empty(&d->pdev_list) )
>  {
> -bus = pdev->bus;
> -devfn = pdev->devfn;
> -ret = deassign_device(d, pdev->seg, bus, devfn) ?: ret;
> +struct pci_dev *pdev = list_first_entry(&d->pdev_list,
> +struct pci_dev,
> +domain_list);
> +uint16_t seg = pdev->seg;
> +uint8_t bus = pdev->bus;
> +uint8_t devfn = pdev->devfn;
> +int ret;
> +
> +write_unlock(&d->pci_lock);
> +ret = deassign_device(d, seg, bus, devfn);
> +write_lock(&d->pci_lock);
> +if ( ret )
> +{
> +bool still_present = false;
> +const struct pci_dev *tmp;
> +
> +/*
> + * We need to check if deassign_device() left our pdev in
> + * domain's list. As we dropped the lock, we can't be sure
> + * that list wasn't permutated in some random way, so we
> + * need to traverse the whole list.
> + */
> +for_each_pdev ( d, tmp )
> +{
> +if ( tmp == pdev )
> +{
> +still_present = true;
> +break;
> +}
> +}
> +if ( still_present )
> +list_move(&pdev->domain_list, &failed_pdevs);

In order to retain original ordering on the resulting list, perhaps better
list_move_tail()?
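
To make the ordering point concrete, here is a self-contained sketch with a hand-rolled circular list. The 'ex_' helpers are hypothetical and only mimic the usual semantics of list_move() (insert right after the head) and list_move_tail() (insert right before the head); this is not Xen's list.h:

```c
#include <stddef.h>

struct ex_list {
    struct ex_list *prev, *next;
};

static void ex_list_init(struct ex_list *h)
{
    h->prev = h->next = h;
}

static void ex_list_del(struct ex_list *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Insert 'e' right after 'h' (head insertion). */
static void ex_list_add(struct ex_list *e, struct ex_list *h)
{
    e->next = h->next;
    e->prev = h;
    h->next->prev = e;
    h->next = e;
}

/* Insert 'e' right before 'h' (tail insertion). */
static void ex_list_add_tail(struct ex_list *e, struct ex_list *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

/*
 * Repeatedly take the first entry of 'src' and move it to 'dst', either
 * to the tail (order preserved) or to the head (order reversed).
 */
static void ex_drain(struct ex_list *src, struct ex_list *dst, int to_tail)
{
    while ( src->next != src )
    {
        struct ex_list *e = src->next;

        ex_list_del(e);
        if ( to_tail )
            ex_list_add_tail(e, dst);
        else
            ex_list_add(e, dst);
    }
}
```

Draining a list a, b, c from the front with tail insertion yields a, b, c on the destination, whereas head insertion yields c, b, a.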

Jan



Re: [RFC PATCH 1/4] xen/arm: justify or initialize conditionally uninitialized variables

2023-07-20 Thread Julien Grall

Hi,

The e-mail is getting quite long. Can you trim the unnecessary bits when 
replying?


On 20/07/2023 15:23, Nicola Vetrini wrote:
The problem is that _t may be uninitialized, hence assigning its 
address to t could be problematic.


But the value is set right after. IOW, there is no read between. So 
how is this prob


Another way to address this is to initialize _t to a bad value and 
use this variable in the body, then assign to t based on the value 
just before returning.


IMHO, neither solution is ideal. I think we should investigate
whether Eclair can be improved.


[...]



I'll see what can be done about it, I'll reply when I have an answer.



What about this:

-    p2m_type_t _t;
+    p2m_type_t _t = p2m_invalid;
[...]
  t = t ?: &_t;
-    *t = p2m_invalid;
+    *t = _t;


The resulting code is still quite confusing. I am still not quite sure 
why ECLAIR can't understand the construct. Apologies if this was already 
said, but this thread is getting quite long with many different issues. 
So it is a bit difficult to navigate (which is why I suggested to split 
and have a commit message explaining the rationale for each).


Anyway, if we can't improve Eclair, then my preference would be the 
following:


diff --git a/xen/arch/arm/p2m.c b/xen/arch/arm/p2m.c
index de32a2d638ba..05d65db01b0c 100644
--- a/xen/arch/arm/p2m.c
+++ b/xen/arch/arm/p2m.c
@@ -496,16 +496,13 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,
 lpae_t entry, *table;
 int rc;
 mfn_t mfn = INVALID_MFN;
-p2m_type_t _t;
 DECLARE_OFFSETS(offsets, addr);

 ASSERT(p2m_is_locked(p2m));
 BUILD_BUG_ON(THIRD_MASK != PAGE_MASK);

-/* Allow t to be NULL */
-t = t ?: &_t;
-
-*t = p2m_invalid;
+if ( t )
+*t = p2m_invalid;

 if ( valid )
 *valid = false;
@@ -549,7 +546,8 @@ mfn_t p2m_get_entry(struct p2m_domain *p2m, gfn_t gfn,

 if ( p2m_is_valid(entry) )
 {
-*t = entry.p2m.type;
+if ( t )
+*t = entry.p2m.type;

 if ( a )
 *a = p2m_mem_access_radix_get(p2m, gfn);
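
Reduced to a standalone example, the pattern preferred above (check each optional output pointer at the point of use, rather than redirecting a NULL pointer to a local dummy) can be sketched as follows. The types and the lookup itself are hypothetical and only serve to show the control flow:

```c
#include <stdbool.h>
#include <stddef.h>

enum ex_type { EX_INVALID, EX_RAM };

/* Hypothetical lookup: 't' and 'valid' may each be NULL. */
static int ex_lookup(int key, enum ex_type *t, bool *valid)
{
    /* Defaults are written only if the caller asked for them. */
    if ( t )
        *t = EX_INVALID;
    if ( valid )
        *valid = false;

    if ( key < 0 )
        return -1;

    if ( t )
        *t = EX_RAM;
    if ( valid )
        *valid = true;

    return 0;
}
```

No local placeholder variable exists here, so there is nothing left that could be flagged as possibly uninitialized.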

Cheers,

--
Julien Grall



Re: [RFC PATCH 1/4] xen/arm: justify or initialize conditionally uninitialized variables

2023-07-20 Thread Julien Grall

Hi Nicola,

On 20/07/2023 11:14, Nicola Vetrini wrote:



On 17/07/23 15:40, Julien Grall wrote:

Hi Nicola,

On 17/07/2023 13:08, Nicola Vetrini wrote:

On 14/07/23 15:00, Julien Grall wrote:

Hi Nicola,

On 14/07/2023 12:49, Nicola Vetrini wrote:

This patch aims to fix some occurrences of possibly uninitialized
variables, that may be read before being written. This behaviour would
violate MISRA C:2012 Rule 9.1, besides being generally undesirable.

In all the analyzed cases, such accesses were actually safe, but it's
quite difficult to prove so by automatic checking, therefore a safer
route is to change the code so as to avoid the behaviour from occurring,
while preserving the semantics.

To achieve this goal, I adopted the following strategies:


Please let's have at least one patch per strategy. I would also consider
some of the rework separate so they can go in regardless of the
decision for the SAF-*.




- Add a suitably formatted local deviation comment
   (as indicated in 'docs/misra/documenting-violations.rst')
   to exempt the following line from checking.

- Provide an initialization for the variable at the declaration.

- Substitute a goto breaking out of control flow logic with a semantically
   equivalent do { .. } while(0).


As I already mentioned in private, it is unclear to me how you 
decided which strategy to use. I still think we need to define our 
policy before changing the code. Otherwise, it is going to be 
difficult to decide for new code.




The main point of this RFC is doing so. From what I gathered, it's 
not an easy task: sometimes there are no 'safe' values to initialize 
variables to and sometimes there is no easy way to prove that indeed 
something is always initialized or not accessed at all.


But you wrote the code. So you should be able to explain how you took 
the decision between one and the others.


Also, even if this is an RFC, it would have been good to summarize any
discussion that happened in private and, if there were concerns, try to
come up with ideas or at least list the concerns after '---'.




I'll keep this in mind if the need arises in the future.





Signed-off-by: Nicola Vetrini 
---
  docs/misra/safe.json   |  8 +++
  xen/arch/arm/arm64/lib/find_next_bit.c |  1 +
  xen/arch/arm/bootfdt.c |  6 +
  xen/arch/arm/decode.c  |  2 ++
  xen/arch/arm/domain_build.c    | 29 ++
  xen/arch/arm/efi/efi-boot.h    |  6 +++--
  xen/arch/arm/gic-v3-its.c  |  9 ---
  xen/arch/arm/mm.c  |  1 +
  xen/arch/arm/p2m.c | 33 +++---

  9 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/docs/misra/safe.json b/docs/misra/safe.json
index e3c8a1d8eb..244001f5be 100644
--- a/docs/misra/safe.json
+++ b/docs/misra/safe.json
@@ -12,6 +12,14 @@
  },
  {
  "id": "SAF-1-safe",
+    "analyser": {
+    "eclair": "MC3R1.R9.1"
+    },
+    "name": "Rule 9.1: initializer not needed",
+    "text": "The following local variables are possibly 
subject to being read before being written, but code inspection 
ensured that the control flow in the construct where they appear 
ensures that no such event may happen."
I am a bit concerned with such a statement because the code inspection
was done today with the current code. This could change in the future and
invalidate the reasoning.


It is not clear to me if we have any mechanism to prevent that. If
we don't, then I think we need to drastically reduce the number of
times this is used (there are a bit too many for my taste).




Indeed, the purpose of such a deviation is that the sound
overapproximation computed by the tool requires a human to look at
the code and think twice before modifying it (i.e., if ever that code
is touched, the reviewer ought to assess whether that justification
still holds or some other thing should be done about it).


Your assumption is that the reviewer will notice there is an existing
deviation and be able to assess whether it has changed. I view this
assumption as risky in the long term.


Have you investigated improving the automatic tooling?



Well, as discussed elsewhere in the thread, a slightly modified version 
of this deviation comment can list the specific reason why such a thing 
was deviated directly at the declaration or where the caution is, if you 
think this is better.


Example:

// <- SAF-x here
int var;

[...]

// <- or HERE
f();

An alternative approach to justification, partly discussed with Stefano 
in private is a macro that looks like an attribute to signal that the 
variable is intentionally uninitialized. This does not have the benefit 
of a written justification with a proper comment or an entry in the json 
file, but is less intrusive and the justification for all occurrences of 
__uninit w.r.t R9.1 would be included in the static analysis tool 

Re: [XEN PATCH] xen: address MISRA C:2012 Rule 4.1

2023-07-20 Thread Jan Beulich
On 20.07.2023 02:23, Stefano Stabellini wrote:
> On Wed, 19 Jul 2023, Nicola Vetrini wrote:
>> MISRA C:2012 Rule 4.1 has the following headline:
>> "Octal and hexadecimal escape sequences shall be terminated."
>>
>> The string literals modified by this patch contain octal or
>> hexadecimal escape sequences that are neither terminated by the
>> end of the literal, nor by the beginning of another escape sequence.
>>
>> Therefore, such unterminated sequences have been split into a
>> separate literal as a way to comply with the rule and preserve the
>> semantics of the code.
>>
>> No functional changes.
>>
>> Signed-off-by: Nicola Vetrini 
> 
> Reviewed-by: Stefano Stabellini 

In order to get this off the plate
Acked-by: Jan Beulich 
albeit I'm not overly happy with ...

>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -3853,7 +3853,7 @@ void hvm_ud_intercept(struct cpu_user_regs *regs)
>>  cs, &addr) &&
>>   (hvm_copy_from_guest_linear(sig, addr, sizeof(sig),
>>   walk, NULL) == HVMTRANS_okay) &&
>> - (memcmp(sig, "\xf\xbxen", sizeof(sig)) == 0) )
>> + (memcmp(sig, "\xf\xb" "xen", sizeof(sig)) == 0) )

... this. Imo it should never have been a string literal here. But
I'm also not really up to making yet another alternative patch.

Jan
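
A minimal sketch of the two spellings discussed above, assuming the
signature is the five bytes 0f 0b 'x' 'e' 'n'. All names here
(sig_split, sig_bytes, spellings_agree, sig_matches) are hypothetical
and are not taken from hvm.c. It only illustrates that splitting the
literal keeps the bytes unchanged, and what a byte array instead of a
string literal could look like.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; this is not the code in hvm.c. */

/* Rule 4.1 compliant: each hex escape is terminated by the end of its literal. */
static const char sig_split[] = "\xf\xb" "xen";

/* Alternative without escapes: spell out the five bytes explicitly. */
static const uint8_t sig_bytes[5] = { 0x0f, 0x0b, 'x', 'e', 'n' };

/* Five signature bytes plus the terminating NUL of the string literal. */
static_assert(sizeof(sig_split) == sizeof(sig_bytes) + 1,
              "unexpected literal size");

/* True if both spellings describe the same five bytes. */
static bool spellings_agree(void)
{
    return memcmp(sig_split, sig_bytes, sizeof(sig_bytes)) == 0;
}

/* True if buf starts with the five signature bytes. */
static bool sig_matches(const uint8_t *buf)
{
    return memcmp(buf, sig_bytes, sizeof(sig_bytes)) == 0;
}
```

With the array form there is no escape sequence left for the rule to
flag, and sizeof(sig_bytes) no longer counts a hidden terminating NUL.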



[PATCH] tools/xenstore: fix get_spec_node()

2023-07-20 Thread Juergen Gross
In case get_spec_node() is being called for a special node starting
with '@' it won't set *canonical_name. This can result in a crash of
xenstored due to dereferencing the uninitialized name in
fire_watches().

This is no security issue as it requires either a privileged caller or
ownership of the special node in question by an unprivileged caller
(which is questionable, as this would make the owner privileged in some
way).

Fixes: d6bb63924fc2 ("tools/xenstore: introduce dummy nodes for special watch paths")
Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index a1d3047e48..790c403904 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1252,8 +1252,11 @@ static struct node *get_spec_node(struct connection *conn, const void *ctx,
                                   const char *name, char **canonical_name,
                                   unsigned int perm)
 {
-       if (name[0] == '@')
+       if (name[0] == '@') {
+               if (canonical_name)
+                       *canonical_name = (char *)name;
                return get_node(conn, ctx, name, perm);
+       }
 
        return get_node_canonicalized(conn, ctx, name, canonical_name, perm);
 }
-- 
2.35.3
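
The pattern behind this fix can be sketched in isolation: an
out-parameter has to be written on every return path, otherwise the
caller reads whatever happened to be in its local variable. The sketch
below is a simplified model under stated assumptions, not the xenstored
code: lookup_canonicalized(), lookup_spec() and canonical_of() are
hypothetical stand-ins, and the "/local/" prefix is an invented
placeholder for real canonicalization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model of the fix; not the xenstored code. */

static char canon_buf[64];

/* Stand-in for get_node_canonicalized(): always sets *canonical_name. */
static int lookup_canonicalized(const char *name, const char **canonical_name)
{
    snprintf(canon_buf, sizeof(canon_buf), "/local/%s", name);
    if (canonical_name)
        *canonical_name = canon_buf;
    return 0;
}

/* Stand-in for get_spec_node() after the fix. */
static int lookup_spec(const char *name, const char **canonical_name)
{
    if (name[0] == '@') {
        /* The fix: set the out-parameter on the special-node path too. */
        if (canonical_name)
            *canonical_name = name;
        return 0;
    }

    return lookup_canonicalized(name, canonical_name);
}

/* Models a caller that relies on the out-parameter afterwards. */
static const char *canonical_of(const char *name)
{
    const char *canon = NULL;

    lookup_spec(name, &canon);
    assert(canon != NULL); /* would fire on the '@' path without the fix */
    return canon;
}
```

For example, canonical_of("@releaseDomain") yields the name unchanged,
while canonical_of("foo") yields the placeholder-canonicalized
"/local/foo".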




Re: [XEN PATCH v2] x86/mtrr: address violations of MISRA C:2012 Rule 8.3 on parameter types

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 02:33:34PM +0200, Federico Serafini wrote:
> Change parameter types of function declarations to be consistent with
> the ones used in the corresponding definitions,
> thus addressing violations of MISRA C:2012 Rule 8.3 ("All declarations
> of an object or function shall use the same names and type qualifiers").
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Roger Pau Monné 

Thanks, Roger.



[PATCH] tools/xenstore: fix XSA-417 patch

2023-07-20 Thread Juergen Gross
The fix for XSA-417 had a bug: domain_alloc_permrefs() will not return
a negative value in case of an error, but a plain errno value.

Note this is not considered to be a security issue, as the only case
where domain_alloc_permrefs() will return an error is a failed memory
allocation. As a guest should not be able to drive Xenstore out of
memory, this is NOT a problem a guest can trigger at will.

Fixes: ab128218225d ("tools/xenstore: fix checking node permissions")
Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 31a862b715..a1d3047e48 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1784,7 +1784,7 @@ static int do_set_perms(const void *ctx, struct connection *conn,
        if (!xenstore_strings_to_perms(perms.p, perms.num, permstr))
                return errno;
 
-       if (domain_alloc_permrefs(&perms) < 0)
+       if (domain_alloc_permrefs(&perms))
                return ENOMEM;
        if (perms.p[0].perms & XS_PERM_IGNORE)
                return ENOENT;
-- 
2.35.3
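
To illustrate why the old check could never trigger, here is a small
sketch of the return convention described in the commit message: 0 on
success, a plain (positive) errno value on failure. alloc_refs(),
set_perms_buggy() and set_perms_fixed() are hypothetical stand-ins, and
the bool parameter only exists to force the failure path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* errno values are positive, so "< 0" cannot detect them. */
static_assert(ENOMEM > 0, "errno values are expected to be positive");

/* Hypothetical stand-in: 0 on success, a plain errno value on failure. */
static int alloc_refs(bool fail)
{
    return fail ? ENOMEM : 0;
}

/* Old check: silently ignores the failure. */
static int set_perms_buggy(bool fail)
{
    if (alloc_refs(fail) < 0)
        return ENOMEM;
    return 0;
}

/* Fixed check: any non-zero return is an error. */
static int set_perms_fixed(bool fail)
{
    if (alloc_refs(fail))
        return ENOMEM;
    return 0;
}
```

Calling set_perms_buggy(true) returns 0 even though the allocation
failed, whereas set_perms_fixed(true) returns ENOMEM.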




Re: [XEN PATCH v3] x86/HVM: address violations of MISRA C:2012 Rules 8.2 and 8.3

2023-07-20 Thread Jan Beulich
On 20.07.2023 09:53, Federico Serafini wrote:
> Give a name to unnamed parameters thus addressing violations of
> MISRA C:2012 Rule 8.2 ("Function types shall be in prototype form with
> named parameters").
> Keep consistency between parameter names and types used in function
> declarations and the ones used in the corresponding function
> definitions, thus addressing violations of MISRA C:2012 Rule 8.3
> ("All declarations of an object or function shall use the same names
> and type qualifiers").
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





[XEN PATCH] xen/arm: optee: provide an initialization for struct arm_smccc_res

2023-07-20 Thread Nicola Vetrini
The local variables with type 'struct arm_smccc_res' are initialized
just after the declaration to avoid any possible read usage prior
to any write usage, which would constitute a violation of
MISRA C:2012 Rule 9.1.

This is already prevented by suitable checks in the code,
but the correctness of this approach is difficult to prove and
reason about.

Therefore, storing a suitable initial value in those registers
(OPTEE_SMC_RETURN_ENOTAVAIL) will prevent further checks from
assuming the operation performed by the macro 'arm_smccc_smc'
was completed correctly.

Signed-off-by: Nicola Vetrini 
---
I was in doubt about the safe value to put in 'optee_relinquish_resources'
therefore I zero-initialized it.
---
 xen/arch/arm/tee/optee.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/tee/optee.c b/xen/arch/arm/tee/optee.c
index 301d205a36..2c2ae88c28 100644
--- a/xen/arch/arm/tee/optee.c
+++ b/xen/arch/arm/tee/optee.c
@@ -171,6 +171,10 @@ static bool optee_probe(void)
 {
 struct dt_device_node *node;
 struct arm_smccc_res resp;
+resp.a0 = OPTEE_SMC_RETURN_ENOTAVAIL;
+resp.a1 = OPTEE_SMC_RETURN_ENOTAVAIL;
+resp.a2 = OPTEE_SMC_RETURN_ENOTAVAIL;
+resp.a3 = OPTEE_SMC_RETURN_ENOTAVAIL;
 
 /* Check for entry in dtb */
 node = dt_find_compatible_node(NULL, NULL, "linaro,optee-tz");
@@ -229,6 +233,7 @@ static int optee_domain_init(struct domain *d)
 {
 struct arm_smccc_res resp;
 struct optee_domain *ctx;
+resp.a0 = OPTEE_SMC_RETURN_ENOTAVAIL;
 
 ctx = xzalloc(struct optee_domain);
 if ( !ctx )
@@ -640,7 +645,7 @@ static void free_optee_shm_buf_pg_list(struct optee_domain *ctx,
 
 static int optee_relinquish_resources(struct domain *d)
 {
-struct arm_smccc_res resp;
+struct arm_smccc_res resp = {0};
 struct optee_std_call *call, *call_tmp;
 struct shm_rpc *shm_rpc, *shm_rpc_tmp;
 struct optee_shm_buf *optee_shm_buf, *optee_shm_buf_tmp;
@@ -1169,6 +1174,10 @@ static void do_call_with_arg(struct optee_domain *ctx,
  register_t a3, register_t a4, register_t a5)
 {
 struct arm_smccc_res res;
+res.a0 = OPTEE_SMC_RETURN_ENOTAVAIL;
+res.a1 = OPTEE_SMC_RETURN_ENOTAVAIL;
+res.a2 = OPTEE_SMC_RETURN_ENOTAVAIL;
+res.a3 = OPTEE_SMC_RETURN_ENOTAVAIL;
 
 arm_smccc_smc(a0, a1, a2, a3, a4, a5, 0, OPTEE_CLIENT_ID(current->domain),
   &res);
@@ -1608,6 +1617,8 @@ static void handle_exchange_capabilities(struct cpu_user_regs *regs)
 {
 struct arm_smccc_res resp;
 uint32_t caps;
+resp.a0 = OPTEE_SMC_RETURN_ENOTAVAIL;
+resp.a1 = OPTEE_SMC_RETURN_ENOTAVAIL;
 
 /* Filter out unknown guest caps */
 caps = get_user_reg(regs, 1);
@@ -1643,6 +1654,10 @@ static bool optee_handle_call(struct cpu_user_regs *regs)
 {
 struct arm_smccc_res resp;
 struct optee_domain *ctx = current->domain->arch.tee;
+resp.a0 = OPTEE_SMC_RETURN_ENOTAVAIL;
+resp.a1 = OPTEE_SMC_RETURN_ENOTAVAIL;
+resp.a2 = OPTEE_SMC_RETURN_ENOTAVAIL;
+resp.a3 = OPTEE_SMC_RETURN_ENOTAVAIL;
 
 if ( !ctx )
 return false;
-- 
2.34.1
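
As a possible alternative to the member-by-member assignments in the
patch, a designated initializer would give every field its "not
available" value at the declaration itself. This is only a sketch under
assumptions: struct smccc_res, RET_ENOTAVAIL (with a placeholder value)
and the SMCCC_RES_UNAVAILABLE macro are hypothetical stand-ins, not the
Xen definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins: field names mirror the patch, the value is a placeholder. */
struct smccc_res {
    unsigned long a0, a1, a2, a3;
};

#define RET_ENOTAVAIL 0x1UL

/* One initializer covering every field, usable directly at the declaration. */
#define SMCCC_RES_UNAVAILABLE \
    { .a0 = RET_ENOTAVAIL, .a1 = RET_ENOTAVAIL, \
      .a2 = RET_ENOTAVAIL, .a3 = RET_ENOTAVAIL }

/* Models a path where the firmware call is skipped, so res keeps its initial value. */
static struct smccc_res skipped_call(void)
{
    struct smccc_res res = SMCCC_RES_UNAVAILABLE;

    return res;
}

/* A later check then sees a failure instead of indeterminate data. */
static bool looks_successful(struct smccc_res res)
{
    return res.a0 != RET_ENOTAVAIL;
}
```

Whether all four registers or only some of them need the value is the
same question the patch already raises per function; the sketch simply
sets all of them.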




Re: [PATCH v3 3/3] xen/riscv: introduce identity mapping

2023-07-20 Thread Oleksii
On Thu, 2023-07-20 at 16:06 +0200, Jan Beulich wrote:
> On 20.07.2023 15:34, Oleksii wrote:
> > On Thu, 2023-07-20 at 12:29 +0200, Jan Beulich wrote:
> > > On 20.07.2023 10:28, Oleksii wrote:
> > > > On Thu, 2023-07-20 at 07:58 +0200, Jan Beulich wrote:
> > > > > On 19.07.2023 18:35, Oleksii wrote:
> > > > > > Then we will have completely different L0 tables for
> > > > > > identity
> > > > > > mapping
> > > > > > and not identity and the code above will be correct.
> > > > > 
> > > > > As long as Xen won't grow beyond 2Mb total. Considering that
> > > > > at
> > > > > some point you may want to use large page mappings for .text,
> > > > > .data, and .rodata, that alone would grow Xen to 6 Mb (or
> > > > > really
> > > > > 8,
> > > > > assuming .init goes separate as well). That's leaving aside
> > > > > the
> > > > > realistic option of the mere sum of all sections being larger
> > > > > than
> > > > > 2. That said, even Arm64 with ACPI is still quite a bit below
> > > > > 2Mb.
> > > > > x86 is nearing 2.5 though in even a somewhat limited config;
> > > > > allyesconfig may well be beyond that already.
> > > > I am missing something about Xen size. Let's assume that Xen
> > > > will be mapped using only 4k pages (like it is done now). Then
> > > > if Xen grows beyond 2Mb, the only thing that changes is the
> > > > number of page tables, so it is only a question of changing
> > > > PGTBL_INITIAL_COUNT (in case of RISC-V).
> > > 
> > > And the way you do the tearing down of the transient 1:1 mapping.
> > It looks like removing 1:1 mapping will be the same.
> > 
> > Let's assume that the size of Xen is 4 MB and that load and linker
> > ranges don't overlap ( load and linker start address are 2Mb
> > aligned ),
> > and the difference between them isn't bigger than 1 GB. Then one L2
> > page table, one L1 page table and two L0 page tables for identity
> > mapping, and two L0 page tables for non-identity mapping are
> > needed.
> > Then at L1, we will have different indexes for load_start and
> > linker_start. So what will be needed is to clean two L1 page table
> > entries started from some index.
> > 
> > The only issue I see now is that it won't work if the identity
> > mapping crosses a 1 Gb boundary. Then the identity mapping will
> > need two L1 page tables, and the identity mapping will be removed
> > from only one of them.
> > 
> > Do I miss anything else?
> 
> Looks correct to me.
> 
> > Wouldn't it be better to take into account that now?
> 
> Sure, it's generally better to avoid leaving traps for someone to
> fall into later.

Thanks a lot. Then it makes sense to update the algorithm for removing
the identity mapping.

~ Oleksii
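
The arithmetic in the example above (4 MB of Xen, 2 MB aligned start)
can be checked with a short sketch. It assumes an Sv39-style layout
with 4k pages and 512 entries per table, so an L0 entry maps 4 KB, an
L1 entry 2 MB and an L2 entry 1 GB. The helper names and the addresses
used in the usage note are made up for illustration and are not taken
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: 4k pages, 9 index bits per level (512 entries per table). */
#define PAGE_SHIFT        12
#define PT_BITS           9
#define LEVEL_SHIFT(lvl)  (PAGE_SHIFT + PT_BITS * (lvl))
#define LEVEL_SIZE(lvl)   (UINT64_C(1) << LEVEL_SHIFT(lvl))

static_assert(LEVEL_SIZE(1) == (UINT64_C(2) << 20), "an L1 entry maps 2 MB");
static_assert(LEVEL_SIZE(2) == (UINT64_C(1) << 30), "an L2 entry maps 1 GB");

/* Index of addr within the page table at the given level. */
static unsigned int pt_index(unsigned int lvl, uint64_t addr)
{
    return (addr >> LEVEL_SHIFT(lvl)) & ((1u << PT_BITS) - 1);
}

/* Number of L1 entries (2 MB each) covering [start, start + size). */
static uint64_t l1_entries(uint64_t start, uint64_t size)
{
    uint64_t first = start >> LEVEL_SHIFT(1);
    uint64_t last = (start + size - 1) >> LEVEL_SHIFT(1);

    return last - first + 1;
}

/* True if [start, start + size) spans more than one L1 table (1 GB region). */
static bool crosses_1g(uint64_t start, uint64_t size)
{
    return (start >> LEVEL_SHIFT(2)) !=
           ((start + size - 1) >> LEVEL_SHIFT(2));
}
```

For a hypothetical load address of 0x80200000 and a size of 4 MB this
gives two L1 entries to clear, starting at L1 index 1, within a single
L1 table. With a hypothetical load address of 0xBFE00000 the same 4 MB
cross the 1 GB boundary, which is the case where a second L1 table
would be involved.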




Re: [RFC PATCH 1/4] xen/arm: justify or initialize conditionally uninitialized variables

2023-07-20 Thread Nicola Vetrini




On 20/07/23 12:14, Nicola Vetrini wrote:



On 17/07/23 15:40, Julien Grall wrote:

Hi Nicola,

On 17/07/2023 13:08, Nicola Vetrini wrote:

On 14/07/23 15:00, Julien Grall wrote:

Hi Nicola,

On 14/07/2023 12:49, Nicola Vetrini wrote:

This patch aims to fix some occurrences of possibly uninitialized
variables, that may be read before being written. This behaviour would
violate MISRA C:2012 Rule 9.1, besides being generally undesirable.

In all the analyzed cases, such accesses were actually safe, but it's
quite difficult to prove so by automatic checking, therefore a safer
route is to change the code so as to prevent the behaviour from
occurring, while preserving the semantics.

To achieve this goal, I adopted the following strategies:


Please let's have at least one patch per strategy. I would also consider
some of the rework separately so it can go in regardless of the
decision for the SAF-*.




- Add a suitably formatted local deviation comment
   (as indicated in 'docs/misra/documenting-violations.rst')
   to exempt the following line from checking.

- Provide an initialization for the variable at the declaration.

- Substitute a goto breaking out of control flow logic with a
   semantically equivalent do { .. } while(0).
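
The second and third strategies can be sketched side by side. This is
an illustrative example only, with hypothetical functions (scale_goto,
scale_loop, check_equivalence) that do not come from the patch: the
first version uses a goto and leaves the variable unset on one path,
the second initializes it at the declaration and replaces the goto with
a break out of do { .. } while(0).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example functions; not taken from the Xen sources. */

/* goto version: val is only assigned on the success path. */
static int scale_goto(const int *in)
{
    int val;

    if (in == NULL)
        goto fail;

    val = *in * 2;
    return val;

 fail:
    return -1;
}

/* Reworked version: initialized at the declaration, break replaces goto. */
static int scale_loop(const int *in)
{
    int val = -1;

    do {
        if (in == NULL)
            break;

        val = *in * 2;
    } while (0);

    return val;
}

/* Both versions are meant to behave identically. */
static void check_equivalence(void)
{
    int x = 21;

    assert(scale_goto(NULL) == scale_loop(NULL));
    assert(scale_goto(&x) == scale_loop(&x));
}
```

In the reworked version every path to the return statement has a
definite value for val, which is easier for a tool to verify.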


As I already mentioned in private, it is unclear to me how you 
decided which strategy to use. I still think we need to define our 
policy before changing the code. Otherwise, it is going to be 
difficult to decide for new code.




The main point of this RFC is doing so. From what I gathered, it's 
not an easy task: sometimes there are no 'safe' values to initialize 
variables to and sometimes there is no easy way to prove that indeed 
something is always initialized or not accessed at all.


But you wrote the code. So you should be able to explain how you took 
the decision between one and the others.


Also, even if this is an RFC, it would have been good to summarize any
discussion that happened in private and, if there were concerns, try to
come up with ideas or at least list the concerns after '---'.




I'll keep this in mind in case the need arises in the future.





Signed-off-by: Nicola Vetrini 
---
  docs/misra/safe.json   |  8 +++
  xen/arch/arm/arm64/lib/find_next_bit.c |  1 +
  xen/arch/arm/bootfdt.c |  6 +
  xen/arch/arm/decode.c  |  2 ++
  xen/arch/arm/domain_build.c    | 29 ++
  xen/arch/arm/efi/efi-boot.h    |  6 +++--
  xen/arch/arm/gic-v3-its.c  |  9 ---
  xen/arch/arm/mm.c  |  1 +
  xen/arch/arm/p2m.c | 33 
+++---

  9 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/docs/misra/safe.json b/docs/misra/safe.json
index e3c8a1d8eb..244001f5be 100644
--- a/docs/misra/safe.json
+++ b/docs/misra/safe.json
@@ -12,6 +12,14 @@
  },
  {
  "id": "SAF-1-safe",
+    "analyser": {
+    "eclair": "MC3R1.R9.1"
+    },
+    "name": "Rule 9.1: initializer not needed",
+    "text": "The following local variables are possibly 
subject to being read before being written, but code inspection 
ensured that the control flow in the construct where they appear 
ensures that no such event may happen."
I am a bit concerned with such a statement because the code inspection
was done today with the current code. This could change in the future and
invalidate the reasoning.


It is not clear to me if we have any mechanism to prevent that. If
we don't, then I think we need to drastically reduce the number of
times this is used (there are a bit too many for my taste).




Indeed, the purpose of such a deviation is that the sound 
overapproximation computed by the tool requires a human to look at 
the code and think twice before modifying it (i.e., if ever that code 
is touched, the reviewer ought to assess whether that justification 
still holds or some other thing should be done about it).


Your assumption is that the reviewer will notice there is an existing
deviation and be able to assess whether it has changed. I view this
assumption as risky in the long term.


Have you investigated improving the automatic tooling?



Well, as discussed elsewhere in the thread, a slightly modified version 
of this deviation comment can list the specific reason why such a thing 
was deviated directly at the declaration or where the caution is, if you 
think this is better.


Example:

// <- SAF-x here
int var;

[...]

// <- or HERE
f();

An alternative approach to justification, partly discussed with Stefano 
in private is a macro that looks like an attribute to signal that the 
variable is intentionally uninitialized. This does not have the benefit 
of a written justification with a proper comment or an entry in the json 
file, but is less intrusive and the justification for all occurrences of 
__uninit w.r.t R9.1 would be included in the static analysis tool 
configuration, which 

Re: [PATCH v3 3/3] xen/riscv: introduce identity mapping

2023-07-20 Thread Jan Beulich
On 20.07.2023 15:34, Oleksii wrote:
> On Thu, 2023-07-20 at 12:29 +0200, Jan Beulich wrote:
>> On 20.07.2023 10:28, Oleksii wrote:
>>> On Thu, 2023-07-20 at 07:58 +0200, Jan Beulich wrote:
>>>> On 19.07.2023 18:35, Oleksii wrote:
>>>>> Then we will have completely different L0 tables for identity mapping
>>>>> and not identity and the code above will be correct.
>>>>
>>>> As long as Xen won't grow beyond 2Mb total. Considering that at
>>>> some point you may want to use large page mappings for .text,
>>>> .data, and .rodata, that alone would grow Xen to 6 Mb (or really 8,
>>>> assuming .init goes separate as well). That's leaving aside the
>>>> realistic option of the mere sum of all sections being larger than
>>>> 2. That said, even Arm64 with ACPI is still quite a bit below 2Mb.
>>>> x86 is nearing 2.5 though in even a somewhat limited config;
>>>> allyesconfig may well be beyond that already.
>>> I am missing something about Xen size. Let's assume that Xen will be
>>> mapped using only 4k pages (like it is done now). Then if Xen grows
>>> beyond 2Mb, the only thing that changes is the number of page tables,
>>> so it is only a question of changing PGTBL_INITIAL_COUNT (in case of
>>> RISC-V).
>>
>> And the way you do the tearing down of the transient 1:1 mapping.
> It looks like removing 1:1 mapping will be the same.
> 
> Let's assume that the size of Xen is 4 MB and that load and linker
> ranges don't overlap ( load and linker start address are 2Mb aligned ),
> and the difference between them isn't bigger than 1 GB. Then one L2
> page table, one L1 page table and two L0 page tables for identity
> mapping, and two L0 page tables for non-identity mapping are needed.
> Then at L1, we will have different indexes for load_start and
> linker_start. So what will be needed is to clean two L1 page table
> entries started from some index.
> 
> The only issue I see now is that it won't work if the identity
> mapping crosses a 1 Gb boundary. Then the identity mapping will need
> two L1 page tables, and the identity mapping will be removed from only
> one of them.
> 
> Do I miss anything else?

Looks correct to me.

> Wouldn't it be better to take into account that now?

Sure, it's generally better to avoid leaving traps for someone to
fall into later.

Jan



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 03:27:29PM +0200, Jan Beulich wrote:
> On 20.07.2023 13:20, Roger Pau Monné wrote:
> > On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
> >> @@ -447,8 +476,16 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
> >>  
> >>  /*
> >>   * Find the PCI dev matching the address, which for hwdom also requires
> >> - * consulting DomXEN.  Passthrough everything that's not trapped.
> >> + * consulting DomXEN. Passthrough everything that's not trapped.
> >> + * If this is hwdom, we need to hold locks for both domain in case if
> >> + * modify_bars is called()
> > 
> > Typo: the () wants to be at the end of modify_bars().
> > 
> >>   */
> >> +read_lock(&d->pci_lock);
> >> +
> >> +/* dom_xen->pci_lock always should be taken second to prevent deadlock */
> >> +if ( is_hardware_domain(d) )
> >> +read_lock(&dom_xen->pci_lock);
> > 
> > For modify_bars() we also want the locks to be in write mode (at least
> > the hw one), so that the position of the BARs can't be changed while
> > modify_bars() is iterating over them.
> 
> Isn't changing of the BARs happening under the vpci lock?

It is.

> Or else I guess
> I haven't understood the description correctly: My reading so far was
> that it is only the presence (allocation status / pointer validity) that
> is protected by this new lock.

Hm, I see, yes.  I guess it was a previous patch version that also
took care of the modify_bars() issue by taking the lock in exclusive
mode here.

We can always do that later, so forget about that comment (for now).

Thanks, Roger.



[xen-unstable test] 181904: tolerable FAIL - PUSHED

2023-07-20 Thread osstest service owner
flight 181904 xen-unstable real [real]
flight 181925 xen-unstable real-retest [real]
http://logs.test-lab.xenproject.org/osstest/logs/181904/
http://logs.test-lab.xenproject.org/osstest/logs/181925/

Failures :-/ but no regressions.

Tests which are failing intermittently (not blocking):
 test-amd64-i386-xl-qemuu-debianhvm-i386-xsm 7 xen-install fail pass in 181925-retest

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 16 saverestore-support-check fail like 181875
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stop fail like 181875
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 181875
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 181875
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stop fail like 181875
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 181875
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 181875
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stop fail like 181875
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check fail like 181875
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail like 181875
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 181875
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop fail like 181875
 test-amd64-i386-xl-pvshim 14 guest-start fail never pass
 test-amd64-amd64-libvirt 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt-xsm 15 migrate-support-check fail never pass
 test-amd64-i386-libvirt 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 15 migrate-support-check fail never pass
 test-arm64-arm64-xl 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-xsm 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit2 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit2 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-check fail never pass
 test-arm64-arm64-xl-credit1 15 migrate-support-check fail never pass
 test-arm64-arm64-xl-credit1 16 saverestore-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-arndale 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit1 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit1 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl 15 migrate-support-check fail never pass
 test-armhf-armhf-xl 16 saverestore-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check fail never pass
 test-amd64-i386-libvirt-raw 14 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-check fail never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-check fail never pass
 test-arm64-arm64-xl-vhd 14 migrate-support-check fail never pass
 test-arm64-arm64-xl-vhd 15 saverestore-support-check fail never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 14 migrate-support-check fail never pass
 test-armhf-armhf-xl-vhd 15 saverestore-support-check fail never pass
 test-armhf-armhf-xl-credit2 15 migrate-support-check fail never pass
 test-armhf-armhf-xl-credit2 16 saverestore-support-check fail never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-check fail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-check fail never pass

version targeted for testing:
 xen  b1c16800e52743d9afd9af62c810f03af16dd942
baseline version:
 xen  e04cc8a08df3574bd7d5f7860008f1625e28f8b1

Last test of basis   181875  2023-07-18 

Re: [PATCH v3 3/3] xen/riscv: introduce identity mapping

2023-07-20 Thread Oleksii
On Thu, 2023-07-20 at 12:29 +0200, Jan Beulich wrote:
> On 20.07.2023 10:28, Oleksii wrote:
> > On Thu, 2023-07-20 at 07:58 +0200, Jan Beulich wrote:
> > > On 19.07.2023 18:35, Oleksii wrote:
> > > > On Tue, 2023-07-18 at 17:03 +0200, Jan Beulich wrote:
> > > > > > +    unsigned long load_end = LINK_TO_LOAD(_end);
> > > > > > +    unsigned long pt_level_size =
> > > > > > XEN_PT_LEVEL_SIZE(i
> > > > > > -
> > > > > > 1);
> > > > > > +    unsigned long xen_size = ROUNDUP(load_end -
> > > > > > load_start, pt_level_size);
> > > > > > +    unsigned long page_entries_num = xen_size /
> > > > > > pt_level_size;
> > > > > > +
> > > > > > +    while ( page_entries_num-- )
> > > > > > +    pgtbl[index++].pte = 0;
> > > > > > +
> > > > > > +    break;
> > > > > 
> > > > > Unless there's a "not crossing a 2Mb boundary" guarantee
> > > > > somewhere
> > > > > that I've missed, this "break" is still too early afaict.
> > > > If I will add a '2 MB boundary check' for load_start and
> > > > linker_start
> > > > could it be an upstreamable solution?
> > > > 
> > > > Something like:
> > > >     if ( !IS_ALIGNED(load_start, MB(2)) )
> > > >         printk("load_start should be 2Mb aligned\n");
> > > > and
> > > >     ASSERT(IS_ALIGNED(XEN_VIRT_START, MB(2)))
> > > > in xen.lds.S.
> > > 
> > > Arranging for the linked address to be 2Mb-aligned is certainly
> > > reasonable. Whether expecting the load address to also be depends
> > > on whether that can be arranged for (which in turn depends on
> > > boot
> > > loader behavior); it cannot be left to "luck".
> > Maybe I didn't quite understand you here, but if Xen has an
> > alignment
> > check of load address then boot loader has to follow the alignment
> > requirements of Xen. So it doesn't look as 'luck'.
> 
> That depends on (a) the alignment being properly expressed in the
> final binary and (b) the boot loader honoring it. (b) is what you
> double-check above, emitting a printk(), but I'm not sure about (a)
> being sufficiently enforced with just the ASSERT in the linker
> script. Maybe I'm wrong, though.
It should be enough for current purpose but probably I am missing
something.

> 
> > > > Then we will have completely different L0 tables for identity
> > > > mapping
> > > > and not identity and the code above will be correct.
> > > 
> > > As long as Xen won't grow beyond 2Mb total. Considering that at
> > > some point you may want to use large page mappings for .text,
> > > .data, and .rodata, that alone would grow Xen to 6 Mb (or really
> > > 8,
> > > assuming .init goes separate as well). That's leaving aside the
> > > realistic option of the mere sum of all sections being larger
> > > than
> > > 2. That said, even Arm64 with ACPI is still quite a bit below
> > > 2Mb.
> > > x86 is nearing 2.5 though in even a somewhat limited config;
> > > allyesconfig may well be beyond that already.
> > I am missing something about Xen size. Let's assume that Xen will be
> > mapped using only 4k pages (like it is done now). Then if Xen grows
> > beyond 2Mb, the only thing that changes is the number of page tables,
> > so it is only a question of changing PGTBL_INITIAL_COUNT (in case of
> > RISC-V).
> 
> And the way you do the tearing down of the transient 1:1 mapping.
It looks like removing 1:1 mapping will be the same.

Let's assume that the size of Xen is 4 MB and that load and linker
ranges don't overlap ( load and linker start address are 2Mb aligned ),
and the difference between them isn't bigger than 1 GB. Then one L2
page table, one L1 page table and two L0 page tables for identity
mapping, and two L0 page tables for non-identity mapping are needed.
Then at L1, we will have different indexes for load_start and
linker_start. So what will be needed is to clean two L1 page table
entries started from some index.

The only issue I see now is that it won't work if the identity
mapping crosses a 1 Gb boundary. Then the identity mapping will need
two L1 page tables, and the identity mapping will be removed from only
one of them.

Do I miss anything else?
Wouldn't it be better to take into account that now?

> 
> > Could you please explain why Xen will grow to 6/8 MB in case of
> > larger
> > page mappings? In case of larger page mapping fewer tables are
> > needed.
> > For example, if we would like to use 2Mb pages then we will stop at
> > L1
> > page table and write an physical address to L1 page table entry
> > instead
> > of creating new L0 page table.
> 
> When you use 2Mb mappings, then you will want to use separate ones
> for .text, .rodata, and .data (plus perhaps .init), to express the
> differing permissions correctly. Consequently you'll need more
> virtual address space, but - yes - fewer page table pages. And of
> course the 1:1 unmapping logic will again be slightly different.
Thanks for clarification.

~ Oleksii




Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-20 Thread Jan Beulich
On 20.07.2023 13:20, Roger Pau Monné wrote:
> On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
>> @@ -447,8 +476,16 @@ void vpci_write(pci_sbdf_t sbdf, unsigned int reg, unsigned int size,
>>  
>>  /*
>>   * Find the PCI dev matching the address, which for hwdom also requires
>> - * consulting DomXEN.  Passthrough everything that's not trapped.
>> + * consulting DomXEN. Passthrough everything that's not trapped.
>> + * If this is hwdom, we need to hold locks for both domain in case if
>> + * modify_bars is called()
> 
> Typo: the () wants to be at the end of modify_bars().
> 
>>   */
>> +read_lock(&d->pci_lock);
>> +
>> +/* dom_xen->pci_lock always should be taken second to prevent deadlock */
>> +if ( is_hardware_domain(d) )
>> +read_lock(&dom_xen->pci_lock);
> 
> For modify_bars() we also want the locks to be in write mode (at least
> the hw one), so that the position of the BARs can't be changed while
> modify_bars() is iterating over them.

Isn't changing of the BARs happening under the vpci lock? Or else I guess
I haven't understood the description correctly: My reading so far was
that it is only the presence (allocation status / pointer validity) that
is protected by this new lock.

Jan



Re: [PATCH v8 04/13] vpci: add hooks for PCI device assign/de-assign

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko 
> 
> When a PCI device gets assigned/de-assigned some work on vPCI side needs
> to be done for that device. Introduce a pair of hooks so vPCI can handle
> that.
> 
> Signed-off-by: Oleksandr Andrushchenko 
> ---
> Since v8:
> - removed vpci_deassign_device
> Since v6:
> - do not pass struct domain to vpci_{assign|deassign}_device as
>   pdev->domain can be used
> - do not leave the device assigned (pdev->domain == new domain) in case
>   vpci_assign_device fails: try to de-assign and if this also fails, then
>   crash the domain
> Since v5:
> - do not split code into run_vpci_init
> - do not check for is_system_domain in vpci_{de}assign_device
> - do not use vpci_remove_device_handlers_locked and re-allocate
>   pdev->vpci completely
> - make vpci_deassign_device void
> Since v4:
>  - de-assign vPCI from the previous domain on device assignment
>  - do not remove handlers in vpci_assign_device as those must not
>exist at that point
> Since v3:
>  - remove toolstack roll-back description from the commit message
>as error are to be handled with proper cleanup in Xen itself
>  - remove __must_check
>  - remove redundant rc check while assigning devices
>  - fix redundant CONFIG_HAS_VPCI check for CONFIG_HAS_VPCI_GUEST_SUPPORT
>  - use REGISTER_VPCI_INIT machinery to run required steps on device
>init/assign: add run_vpci_init helper
> Since v2:
> - define CONFIG_HAS_VPCI_GUEST_SUPPORT so dead code is not compiled
>   for x86
> Since v1:
>  - constify struct pci_dev where possible
>  - do not open code is_system_domain()
>  - extended the commit message
> ---
>  xen/drivers/Kconfig   |  4 
>  xen/drivers/passthrough/pci.c | 21 +
>  xen/drivers/vpci/vpci.c   | 18 ++
>  xen/include/xen/vpci.h| 15 +++
>  4 files changed, 58 insertions(+)
> 
> diff --git a/xen/drivers/Kconfig b/xen/drivers/Kconfig
> index db94393f47..780490cf8e 100644
> --- a/xen/drivers/Kconfig
> +++ b/xen/drivers/Kconfig
> @@ -15,4 +15,8 @@ source "drivers/video/Kconfig"
>  config HAS_VPCI
>   bool
>  
> +config HAS_VPCI_GUEST_SUPPORT
> + bool
> + depends on HAS_VPCI
> +
>  endmenu
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 6f8692cd9c..265d359704 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -885,6 +885,10 @@ static int deassign_device(struct domain *d, uint16_t 
> seg, uint8_t bus,
>  if ( ret )
>  goto out;
>  
> +write_lock(&pdev->domain->pci_lock);
> +vpci_deassign_device(pdev);
> +write_unlock(&pdev->domain->pci_lock);
> +
>  if ( pdev->domain == hardware_domain  )
>  pdev->quarantine = false;
>  
> @@ -1484,6 +1488,10 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>  if ( pdev->broken && d != hardware_domain && d != dom_io )
>  goto done;
>  
> +write_lock(&pdev->domain->pci_lock);
> +vpci_deassign_device(pdev);
> +write_unlock(&pdev->domain->pci_lock);
> +
>  rc = pdev_msix_assign(d, pdev);
>  if ( rc )
>  goto done;
> @@ -1509,6 +1517,19 @@ static int assign_device(struct domain *d, u16 seg, u8 
> bus, u8 devfn, u32 flag)
>  rc = iommu_call(hd->platform_ops, assign_device, d, devfn,
>  pci_to_dev(pdev), flag);
>  }
> +if ( rc )
> +goto done;
> +
> +devfn = pdev->devfn;
> +write_lock(&pdev->domain->pci_lock);
> +rc = vpci_assign_device(pdev);
> +write_unlock(&pdev->domain->pci_lock);
> +if ( rc && deassign_device(d, seg, bus, devfn) )
> +{
> +printk(XENLOG_ERR "%pd: %pp was left partially assigned\n",
> +   d, _SBDF(seg, bus, devfn));

&pdev->sbdf?  Then you can get rid of the devfn usage above.

> +domain_crash(d);

This seems a bit different from the other error paths in the
function. Isn't it fine to return an error and let the caller handle
the deassign?

Also, if we really need to call deassign_device() we must do so for
all possible phantom devices, see the above loop around
iommu_call(..., assign_device, ...);

> +}
>  
>   done:
>  if ( rc )
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index a6d2cf8660..a97710a806 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -107,6 +107,24 @@ int vpci_add_handlers(struct pci_dev *pdev)
>  
>  return rc;
>  }
> +
> +#ifdef CONFIG_HAS_VPCI_GUEST_SUPPORT
> +/* Notify vPCI that device is assigned to guest. */
> +int vpci_assign_device(struct pci_dev *pdev)
> +{
> +int rc;
> +
> +if ( !has_vpci(pdev->domain) )
> +return 0;
> +
> +rc = vpci_add_handlers(pdev);
> +if ( rc )
> +vpci_deassign_device(pdev);

Why do you need this? When it fails, vpci_add_handlers() will
already call vpci_remove_device(), which is what
vpci_deassign_device() does.

> +
> +

Re: [XEN PATCH] x86/mtrr: address violations of MISRA C:2012 Rule 8.3 on parameter types

2023-07-20 Thread Federico Serafini




On 20/07/23 14:15, Roger Pau Monné wrote:
> On Thu, Jul 20, 2023 at 12:48:36PM +0200, Federico Serafini wrote:
>> +extern uint32_t get_pat_flags(struct vcpu *v, uint32_t gl1e_flags,
>> +  paddr_t gpaddr, paddr_t spaddr,
>> +   
>> uint8_t gmtrr_mtype);
>
> Wrong usage of hard tabs.
>
> Thanks, Roger.


Sorry, some update must have changed the settings of my editor.
Thanks for reporting.

Regards
--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



[XEN PATCH v2] x86/mtrr: address violations of MISRA C:2012 Rule 8.3 on parameter types

2023-07-20 Thread Federico Serafini
Change parameter types of function declarations to be consistent with
the ones used in the corresponding definitions,
thus addressing violations of MISRA C:2012 Rule 8.3 ("All declarations
of an object or function shall use the same names and type qualifiers").

No functional changes.

Signed-off-by: Federico Serafini 
---
Changes in v2:
  - removed unwanted tabs.
---
 xen/arch/x86/include/asm/mtrr.h | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/include/asm/mtrr.h b/xen/arch/x86/include/asm/mtrr.h
index e4f6ca6048..14246e3387 100644
--- a/xen/arch/x86/include/asm/mtrr.h
+++ b/xen/arch/x86/include/asm/mtrr.h
@@ -59,9 +59,10 @@ extern int mtrr_del_page(int reg, unsigned long base, 
unsigned long size);
 extern int mtrr_get_type(const struct mtrr_state *m, paddr_t pa,
  unsigned int order);
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern u32 get_pat_flags(struct vcpu *v, u32 gl1e_flags, paddr_t gpaddr,
-  paddr_t spaddr, uint8_t gmtrr_mtype);
-extern unsigned char pat_type_2_pte_flags(unsigned char pat_type);
+extern uint32_t get_pat_flags(struct vcpu *v, uint32_t gl1e_flags,
+  paddr_t gpaddr, paddr_t spaddr,
+  uint8_t gmtrr_mtype);
+extern uint8_t pat_type_2_pte_flags(uint8_t pat_type);
 extern int hold_mtrr_updates_on_aps;
 extern void mtrr_aps_sync_begin(void);
 extern void mtrr_aps_sync_end(void);
-- 
2.34.1




Re: [XEN PATCH] x86/mtrr: address violations of MISRA C:2012 Rule 8.3 on parameter types

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 12:48:36PM +0200, Federico Serafini wrote:
> Change parameter types of function declarations to be consistent with
> the ones used in the corresponding definitions,
> thus addressing violations of MISRA C:2012 Rule 8.3 ("All declarations
> of an object or function shall use the same names and type qualifiers").
> 
> No functional changes.
> 
> Signed-off-by: Federico Serafini 
> ---
>  xen/arch/x86/include/asm/mtrr.h | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/xen/arch/x86/include/asm/mtrr.h b/xen/arch/x86/include/asm/mtrr.h
> index e4f6ca6048..5d57a596ea 100644
> --- a/xen/arch/x86/include/asm/mtrr.h
> +++ b/xen/arch/x86/include/asm/mtrr.h
> @@ -59,9 +59,10 @@ extern int mtrr_del_page(int reg, unsigned long base, 
> unsigned long size);
>  extern int mtrr_get_type(const struct mtrr_state *m, paddr_t pa,
>   unsigned int order);
>  extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
> -extern u32 get_pat_flags(struct vcpu *v, u32 gl1e_flags, paddr_t gpaddr,
> -  paddr_t spaddr, uint8_t gmtrr_mtype);
> -extern unsigned char pat_type_2_pte_flags(unsigned char pat_type);
> +extern uint32_t get_pat_flags(struct vcpu *v, uint32_t gl1e_flags,
> +  paddr_t gpaddr, paddr_t spaddr,
> + 
> uint8_t gmtrr_mtype);

Wrong usage of hard tabs.

Thanks, Roger.



[QEMU PATCH v4 1/1] virtgpu: do not destroy resources when guest suspend

2023-07-20 Thread Jiqian Chen
After suspending and resuming a guest VM, the guest shows
a black screen and the display can't come back.

This is because when the guest suspends, it calls into
QEMU, which calls virtio_gpu_gl_reset. That function
destroys the resources and resets the renderer that are
used for display. As a result, the guest's screen can't
be restored to the state it had when it was suspended,
and only shows black.

So, this patch adds a new ctrl message
VIRTIO_GPU_CMD_SET_FREEZE_MODE to get notifications from
the guest. While the guest is suspending, it sets the freeze
mode of virtgpu to freeze_S3, which prevents destroying
resources and resetting the renderer when the guest calls
into virtio_gpu_gl_reset. While the guest is resuming, it
sets the freeze mode to unfreeze, and virtio_gpu_gl_reset
then keeps its original behaviour, with no other impact.

Because this implementation needs cooperation from the
guest, a new feature flag VIRTIO_GPU_F_FREEZE_S3 is added,
so that guest and host can negotiate whether freeze_S3 is
supported or not.

Signed-off-by: Jiqian Chen 
---
 hw/display/virtio-gpu-base.c   |  3 ++
 hw/display/virtio-gpu-gl.c | 10 ++-
 hw/display/virtio-gpu-virgl.c  |  7 +
 hw/display/virtio-gpu.c| 55 --
 hw/virtio/virtio.c |  3 ++
 include/hw/virtio/virtio-gpu.h |  6 
 6 files changed, 81 insertions(+), 3 deletions(-)

diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index a29f191aa8..40ae4f9678 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -215,6 +215,9 @@ virtio_gpu_base_get_features(VirtIODevice *vdev, uint64_t 
features,
 if (virtio_gpu_blob_enabled(g->conf)) {
 features |= (1 << VIRTIO_GPU_F_RESOURCE_BLOB);
 }
+if (virtio_gpu_freeze_S3_enabled(g->conf)) {
+features |= (1 << VIRTIO_GPU_F_FREEZE_S3);
+}
 
 return features;
 }
diff --git a/hw/display/virtio-gpu-gl.c b/hw/display/virtio-gpu-gl.c
index e06be60dfb..cb418dae9a 100644
--- a/hw/display/virtio-gpu-gl.c
+++ b/hw/display/virtio-gpu-gl.c
@@ -100,7 +100,15 @@ static void virtio_gpu_gl_reset(VirtIODevice *vdev)
  */
 if (gl->renderer_inited && !gl->renderer_reset) {
 virtio_gpu_virgl_reset_scanout(g);
-gl->renderer_reset = true;
+/*
+ * If guest is suspending, we shouldn't reset renderer,
+ * otherwise, the display can't come back to the time when
+ * it was suspended after guest was resumed.
+ */
+if (!virtio_gpu_freeze_S3_enabled(g->parent_obj.conf) ||
+g->freeze_mode == VIRTIO_GPU_FREEZE_MODE_UNFREEZE) {
+gl->renderer_reset = true;
+}
 }
 }
 
diff --git a/hw/display/virtio-gpu-virgl.c b/hw/display/virtio-gpu-virgl.c
index 73cb92c8d5..fc1971be70 100644
--- a/hw/display/virtio-gpu-virgl.c
+++ b/hw/display/virtio-gpu-virgl.c
@@ -464,6 +464,13 @@ void virtio_gpu_virgl_process_cmd(VirtIOGPU *g,
 case VIRTIO_GPU_CMD_GET_EDID:
 virtio_gpu_get_edid(g, cmd);
 break;
+case VIRTIO_GPU_CMD_SET_FREEZE_MODE:
+if (virtio_gpu_freeze_S3_enabled(g->parent_obj.conf)) {
+virtio_gpu_cmd_set_freeze_mode(g, cmd);
+} else {
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+}
+break;
 default:
 cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
 break;
diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 5e15c79b94..dcf83379a8 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -373,6 +373,16 @@ static void virtio_gpu_resource_create_blob(VirtIOGPU *g,
 QTAILQ_INSERT_HEAD(&g->reslist, res, next);
 }
 
+void virtio_gpu_cmd_set_freeze_mode(VirtIOGPU *g,
+ struct virtio_gpu_ctrl_command *cmd)
+{
+struct virtio_gpu_set_freeze_mode sf;
+
+VIRTIO_GPU_FILL_CMD(sf);
+virtio_gpu_bswap_32(&sf, sizeof(sf));
+g->freeze_mode = sf.freeze_mode;
+}
+
 static void virtio_gpu_disable_scanout(VirtIOGPU *g, int scanout_id)
 {
 struct virtio_gpu_scanout *scanout = &g->parent_obj.scanout[scanout_id];
@@ -986,6 +996,13 @@ void virtio_gpu_simple_process_cmd(VirtIOGPU *g,
 case VIRTIO_GPU_CMD_RESOURCE_DETACH_BACKING:
 virtio_gpu_resource_detach_backing(g, cmd);
 break;
+case VIRTIO_GPU_CMD_SET_FREEZE_MODE:
+if (virtio_gpu_freeze_S3_enabled(g->parent_obj.conf)) {
+virtio_gpu_cmd_set_freeze_mode(g, cmd);
+} else {
+cmd->error = VIRTIO_GPU_RESP_ERR_INVALID_PARAMETER;
+}
+break;
 default:
 cmd->error = VIRTIO_GPU_RESP_ERR_UNSPEC;
 break;
@@ -1344,6 +1361,29 @@ void virtio_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 QTAILQ_INIT(&g->reslist);
 QTAILQ_INIT(&g->cmdq);
 QTAILQ_INIT(&g->fenceq);
+
+g->freeze_mode = VIRTIO_GPU_FREEZE_MODE_UNFREEZE;
+}
+
+static void virtio_gpu_device_unrealize(DeviceState *qdev)
+{
+VirtIOGPU *g = VIRTIO_GPU(qdev);
+struct 

[QEMU PATCH v4 0/1] S3 support

2023-07-20 Thread Jiqian Chen
v4:

Hi all,
Thanks to Gerd Hoffmann for his advice. V4 makes the changes below:
* Use enum for freeze mode, so this can be extended with more
  modes in the future.
* Rename functions and parameters with "_S3" postfix.
And no functional changes.

latest version on kernel side:
https://lore.kernel.org/lkml/20230720115805.8206-1-jiqian.c...@amd.com/T/#t

Best regards,
Jiqian Chen.


v3:
link,
https://lore.kernel.org/qemu-devel/20230719074726.1613088-1-jiqian.c...@amd.com/T/#t

Hi all,
Thanks to Michael S. Tsirkin for his advice. V3 makes the changes below:
* Remove changes to the file include/standard-headers/linux/virtio_gpu.h.
  I am not supposed to edit this file; it will be imported after
  the Linux kernel patches are merged.


v2:
link,
https://lore.kernel.org/qemu-devel/20230630070016.841459-1-jiqian.c...@amd.com/T/#t

Hi all,
Thanks to Marc-André Lureau, Robert Beckett and Gerd Hoffmann for
their advice and guidance. V2 makes the changes below:

* Change VIRTIO_CPU_CMD_STATUS_FREEZING to 0x0400 (<0x1000)
* Add virtio_gpu_device_unrealize to destroy resources to solve
  potential memory leak problem. This also needs hot-plug support.
* Add a new feature flag VIRTIO_GPU_F_FREEZING, so that guest and
  host can negotiate whether freezing is supported or not.

v1:
link,
https://lore.kernel.org/qemu-devel/20230608025655.1674357-1-jiqian.c...@amd.com/

Hi all,
I am working to implement virtgpu S3 function on Xen.

Currently on Xen, if we start a guest with virtgpu enabled, run
"echo mem > /sys/power/state" to suspend the guest, and then run
"sudo xl trigger  s3resume" to resume it, we find that
the guest kernel comes back, but the display doesn't. It just shows a
black screen.

Reading the code, I found that when the guest is suspending,
it calls into Qemu, which calls virtio_gpu_gl_reset. virtio_gpu_gl_reset
destroys all resources and resets the renderer. This makes the display
disappear after the guest resumes.

I think we should keep the resources, or prevent them from being
destroyed, when the guest is suspending. So, I add a new status named
freezing to virtgpu, and add a new ctrl message
VIRTIO_GPU_CMD_STATUS_FREEZING to get notification from the guest. If
freezing is set to true, Qemu knows that the guest is suspending, and it
will not destroy resources or reset the renderer. If freezing is set to
false, Qemu keeps its original behaviour, with no other impact.

With this, the display comes back and applications can continue from
their previous state after the guest resumes.
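
As a rough illustration of the gating described above, here is a minimal, self-contained C model. It is not QEMU code: `gpu_model`, `freeze_mode` and `gpu_model_reset()` are hypothetical names, the enum follows the later enum-based naming rather than the v1 boolean, and the resource counter merely stands in for the real display resources.

```c
#include <stdbool.h>

/* Hypothetical freeze modes; the values are arbitrary for this sketch. */
enum freeze_mode {
    FREEZE_MODE_UNFREEZE,
    FREEZE_MODE_FREEZE_S3,
};

/* Hypothetical device model, not the real VirtIOGPU structure. */
struct gpu_model {
    bool feature_negotiated;   /* stands in for the negotiated feature flag */
    enum freeze_mode mode;     /* last mode reported by the guest */
    unsigned int resources;    /* number of live display resources */
};

/*
 * Model of a device reset. Resources are kept only when the feature
 * was negotiated and the guest announced that it is entering S3.
 * Returns true when the resources were destroyed.
 */
static bool gpu_model_reset(struct gpu_model *g)
{
    if (g->feature_negotiated && g->mode == FREEZE_MODE_FREEZE_S3) {
        return false;          /* guest is suspending: keep everything */
    }

    g->resources = 0;          /* normal reset path */
    return true;
}
```

A reset issued while the model is in FREEZE_MODE_FREEZE_S3 leaves `resources` untouched; any other reset clears it, which mirrors the "original behaviour" mentioned in the text.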

Jiqian Chen (1):
  virtgpu: do not destroy resources when guest suspend

 hw/display/virtio-gpu-base.c   |  3 ++
 hw/display/virtio-gpu-gl.c | 10 ++-
 hw/display/virtio-gpu-virgl.c  |  7 +
 hw/display/virtio-gpu.c| 55 --
 hw/virtio/virtio.c |  3 ++
 include/hw/virtio/virtio-gpu.h |  6 
 6 files changed, 81 insertions(+), 3 deletions(-)

-- 
2.34.1




[LINUX KERNEL PATCH v3 1/1] virtgpu: init vq during resume and notify qemu guest status

2023-07-20 Thread Jiqian Chen
This patch solves two problems:

First, when we suspend a guest VM, it calls into Qemu, which calls
virtio_reset->__virtio_queue_reset; this clears all virtqueue
information of virtgpu on the Qemu end. As a result, after the guest
resumes, it sends ctrl/cursor requests to Qemu through the
virtqueue, but Qemu can't get requests from the virtqueue any more:
in function virtio_queue_notify, vq->vring.desc is NULL.

So, this patch adds freeze and restore functions to the virtgpu driver.
The freeze function flushes all virtqueue work and deletes the
virtqueues. The restore function initializes the virtqueues. After
that, Qemu and the guest can communicate normally.

Second, when we suspend a guest VM, it calls into Qemu, which calls
virtio_reset->virtio_gpu_gl_reset; this destroys the resources and
resets the renderer that are used for display. As a result, after
the guest resumes, the display can't come back and we only see a black
screen.

So, this patch adds a new ctrl message VIRTIO_GPU_CMD_SET_FREEZE_MODE.
While the guest is suspending, we set the freeze mode to freeze_S3 to
notify Qemu that the guest has entered suspend, and Qemu then does not
destroy resources. While the guest is resuming, we set the freeze mode
to unfreeze to notify Qemu that the guest has left suspend, and
Qemu keeps its original behaviour. As a result, the display comes
back and the guest returns to the state it was in when it was
suspended.

Because this implementation needs cooperation from the host Qemu, a
new feature flag VIRTIO_GPU_F_FREEZE_S3 is added, so that guest and
host can negotiate whether freeze_S3 is supported or not.

Signed-off-by: Jiqian Chen 
---
 drivers/gpu/drm/virtio/virtgpu_debugfs.c |  1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c | 39 
 drivers/gpu/drm/virtio/virtgpu_drv.h |  5 +++
 drivers/gpu/drm/virtio/virtgpu_kms.c | 36 --
 drivers/gpu/drm/virtio/virtgpu_vq.c  | 16 ++
 include/uapi/linux/virtio_gpu.h  | 19 
 6 files changed, 107 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/virtio/virtgpu_debugfs.c 
b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
index 853dd9aa397e..c84fd6d7f5f3 100644
--- a/drivers/gpu/drm/virtio/virtgpu_debugfs.c
+++ b/drivers/gpu/drm/virtio/virtgpu_debugfs.c
@@ -55,6 +55,7 @@ static int virtio_gpu_features(struct seq_file *m, void *data)
 
virtio_gpu_add_bool(m, "blob resources", vgdev->has_resource_blob);
virtio_gpu_add_bool(m, "context init", vgdev->has_context_init);
+   virtio_gpu_add_bool(m, "freeze_S3", vgdev->has_freeze_S3);
virtio_gpu_add_int(m, "cap sets", vgdev->num_capsets);
virtio_gpu_add_int(m, "scanouts", vgdev->num_scanouts);
if (vgdev->host_visible_region.len) {
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c 
b/drivers/gpu/drm/virtio/virtgpu_drv.c
index add075681e18..83ad0ac82b94 100644
--- a/drivers/gpu/drm/virtio/virtgpu_drv.c
+++ b/drivers/gpu/drm/virtio/virtgpu_drv.c
@@ -130,6 +130,40 @@ static void virtio_gpu_config_changed(struct virtio_device 
*vdev)
schedule_work(&vgdev->config_changed_work);
 }
 
+#ifdef CONFIG_PM
+static int virtio_gpu_freeze(struct virtio_device *dev)
+{
+   struct drm_device *ddev = dev->priv;
+   struct virtio_gpu_device *vgdev = ddev->dev_private;
+   int ret = 0;
+
+   if (vgdev->has_freeze_S3) {
+   ret = virtio_gpu_cmd_set_freeze_mode(vgdev,
+   VIRTIO_GPU_FREEZE_MODE_FREEZE_S3);
+   }
+   if (!ret) {
+   flush_work(&vgdev->ctrlq.dequeue_work);
+   flush_work(&vgdev->cursorq.dequeue_work);
+   vgdev->vdev->config->del_vqs(vgdev->vdev);
+   }
+   return ret;
+}
+
+static int virtio_gpu_restore(struct virtio_device *dev)
+{
+   struct drm_device *ddev = dev->priv;
+   struct virtio_gpu_device *vgdev = ddev->dev_private;
+   int ret;
+
+   ret = virtio_gpu_init_vqs(dev);
+   if (!ret && vgdev->has_freeze_S3) {
+   ret = virtio_gpu_cmd_set_freeze_mode(vgdev,
+   VIRTIO_GPU_FREEZE_MODE_UNFREEZE);
+   }
+   return ret;
+}
+#endif
+
 static struct virtio_device_id id_table[] = {
{ VIRTIO_ID_GPU, VIRTIO_DEV_ANY_ID },
{ 0 },
@@ -148,6 +182,7 @@ static unsigned int features[] = {
VIRTIO_GPU_F_RESOURCE_UUID,
VIRTIO_GPU_F_RESOURCE_BLOB,
VIRTIO_GPU_F_CONTEXT_INIT,
+   VIRTIO_GPU_F_FREEZE_S3,
 };
 static struct virtio_driver virtio_gpu_driver = {
.feature_table = features,
@@ -156,6 +191,10 @@ static struct virtio_driver virtio_gpu_driver = {
.driver.owner = THIS_MODULE,
.id_table = id_table,
.probe = virtio_gpu_probe,
+#ifdef CONFIG_PM
+   .freeze = virtio_gpu_freeze,
+   .restore = virtio_gpu_restore,
+#endif
.remove = virtio_gpu_remove,
.config_changed = virtio_gpu_config_changed
 };
diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.h 
b/drivers/gpu/drm/virtio/virtgpu_drv.h
index 

[LINUX KERNEL PATCH v3 0/1] add S3 support for virtgpu

2023-07-20 Thread Jiqian Chen
v3:
Hi all,
Thanks to Gerd Hoffmann for his advice. V3 makes the changes below:
* Use enum for freeze mode, so this can be extended with more
  modes in the future.
* Rename functions and parameters with "_S3" postfix.
And no functional changes.

Best regards,
Jiqian Chen.


v2:

Hi all,
Thanks to Marc-André Lureau, Robert Beckett and Gerd Hoffmann for
their advice and guidance. V2 makes the changes below:
* Change VIRTIO_CPU_CMD_STATUS_FREEZING to 0x0400 (<0x1000)
* Add a new feature flag VIRTIO_GPU_F_FREEZING, so that guest and
  host can negotiate whether freezing is supported or not.

V2 of Qemu patch:
https://lore.kernel.org/qemu-devel/20230630070016.841459-1-jiqian.c...@amd.com/T/#t


v1:

link,
https://lore.kernel.org/lkml/20230608063857.1677973-1-jiqian.c...@amd.com/

Hi all,
I am working to implement virtgpu S3 function on Xen.

Currently on Xen, if we start a guest with virtgpu enabled, run
"echo mem > /sys/power/state" to suspend the guest, and then run
"sudo xl trigger  s3resume" to resume it, we find that
the guest kernel comes back, but the display doesn't. It just shows a
black screen.

Investigating this, I found two problems.

First, if we move the mouse on the black screen, the guest kernel still
sends a cursor request to Qemu, but Qemu doesn't respond. This is because
when the guest is suspending, it calls device_suspend, which calls into
Qemu, which in turn calls virtio_reset->__virtio_queue_reset.
__virtio_queue_reset clears all virtqueue information on the Qemu end.
So, after the guest resumes, Qemu can't get messages from the virtqueue.

Second, the display can't come back because when the guest is
suspending, it calls into Qemu, which calls
virtio_reset->virtio_gpu_gl_reset. virtio_gpu_gl_reset destroys all
resources and resets the renderer, which are used for display. So after
the guest resumes, the display can't come back to the state it had when
the guest was suspended.

This patch initializes the virtqueues when the guest is resuming, to
solve the first problem. It also notifies Qemu that the guest is
suspending, to prevent Qemu from destroying resources, which solves the
second problem. With this, I can bring the display back, and everything
continues where it left off after the guest resumes.
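
The order of operations on the guest side can be sketched with a small, self-contained C model. This is only an illustration: `model_freeze()`, `model_restore()` and the `step` trace are hypothetical, and the real driver work (flushing, deleting and re-creating virtqueues) is reduced to recording which step ran.

```c
#include <stddef.h>

/* Hypothetical step names used only to record the call order. */
enum step {
    STEP_NOTIFY_FREEZE,    /* tell the host the guest is entering S3 */
    STEP_FLUSH_QUEUES,     /* drain pending ctrl/cursor work */
    STEP_DELETE_VQS,       /* tear down the virtqueues */
    STEP_INIT_VQS,         /* re-create the virtqueues on resume */
    STEP_NOTIFY_UNFREEZE,  /* tell the host the guest left S3 */
};

struct trace {
    enum step steps[8];
    size_t count;
};

static void record(struct trace *t, enum step s)
{
    if (t->count < sizeof(t->steps) / sizeof(t->steps[0]))
        t->steps[t->count++] = s;
}

/* Suspend: notify first (the queues still work), then tear them down. */
static int model_freeze(struct trace *t, int has_freeze_s3, int notify_result)
{
    int ret = 0;

    if (has_freeze_s3) {
        record(t, STEP_NOTIFY_FREEZE);
        ret = notify_result;
    }
    if (!ret) {
        record(t, STEP_FLUSH_QUEUES);
        record(t, STEP_DELETE_VQS);
    }
    return ret;
}

/* Resume: the queues must exist again before the host can be notified. */
static int model_restore(struct trace *t, int has_freeze_s3, int init_result)
{
    int ret = init_result;

    record(t, STEP_INIT_VQS);
    if (!ret && has_freeze_s3)
        record(t, STEP_NOTIFY_UNFREEZE);
    return ret;
}
```

The point of the ordering is that both notifications travel over the control queue, so the freeze message has to go out before the queues are deleted and the unfreeze message only after they have been re-initialized.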

Modifications on Qemu end:
https://lore.kernel.org/qemu-devel/20230608025655.1674357-2-jiqian.c...@amd.com/

Jiqian Chen (1):
  virtgpu: init vq during resume and notify qemu guest status

 drivers/gpu/drm/virtio/virtgpu_debugfs.c |  1 +
 drivers/gpu/drm/virtio/virtgpu_drv.c | 39 
 drivers/gpu/drm/virtio/virtgpu_drv.h |  5 +++
 drivers/gpu/drm/virtio/virtgpu_kms.c | 36 --
 drivers/gpu/drm/virtio/virtgpu_vq.c  | 16 ++
 include/uapi/linux/virtio_gpu.h  | 19 
 6 files changed, 107 insertions(+), 9 deletions(-)

-- 
2.34.1




Re: [PATCH v8 03/13] vpci: restrict unhandled read/write operations for guests

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko 
> 
> A guest would be able to read and write those registers which are not
> emulated and have no respective vPCI handlers, so it will be possible
> for it to access the hardware directly.
> In order to prevent a guest from reads and writes from/to the unhandled
^ extra 'the'
> registers make sure only hardware domain can access the hardware directly
> and restrict guests from doing so.
> 
> Suggested-by: Roger Pau Monné 
> Signed-off-by: Oleksandr Andrushchenko 

Reviewed-by: Roger Pau Monné 

With the stray change below removed.

> 
> ---
> Since v6:
> - do not use is_hwdom parameter for vpci_{read|write}_hw and use
>   current->domain internally
> - update commit message
> New in v6
> ---
>  xen/drivers/vpci/vpci.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/drivers/vpci/vpci.c b/xen/drivers/vpci/vpci.c
> index f22cbf2112..a6d2cf8660 100644
> --- a/xen/drivers/vpci/vpci.c
> +++ b/xen/drivers/vpci/vpci.c
> @@ -233,6 +233,10 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned 
> int reg,
>  {
>  uint32_t data;
>  
> +/* Guest domains are not allowed to read real hardware. */
> +if ( !is_hardware_domain(current->domain) )
> +return ~(uint32_t)0;
> +
>  switch ( size )
>  {
>  case 4:
> @@ -273,9 +277,13 @@ static uint32_t vpci_read_hw(pci_sbdf_t sbdf, unsigned 
> int reg,
>  return data;
>  }
>  
> -static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg, unsigned int 
> size,
> -  uint32_t data)
> +static void vpci_write_hw(pci_sbdf_t sbdf, unsigned int reg,
> +  unsigned int size, uint32_t data)

Unrelated change?

Thanks, Roger.
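
For readers following along, the effect of the new check can be modelled in a few lines of standalone C. This is a sketch, not the Xen implementation: `guarded_read()` and `fake_hw_read()` are hypothetical, and a boolean replaces the is_hardware_domain(current->domain) test. All-ones is the value a PCI config read returns when no device responds, so an unprivileged caller sees the register as absent.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical type for a raw config-space accessor. */
typedef uint32_t (*hw_read_fn)(unsigned int reg, unsigned int size);

/* Hypothetical hardware stand-in: always yields the same register value. */
static uint32_t fake_hw_read(unsigned int reg, unsigned int size)
{
    uint32_t value = 0x12345678u;

    (void)reg;
    switch ( size )
    {
    case 1:
        return value & 0xffu;
    case 2:
        return value & 0xffffu;
    default:
        return value;
    }
}

/*
 * Only the hardware domain reaches the device; every other caller gets
 * all-ones for registers that have no emulation handler.
 */
static uint32_t guarded_read(bool is_hwdom, hw_read_fn hw_read,
                             unsigned int reg, unsigned int size)
{
    if ( !is_hwdom )
        return ~(uint32_t)0;

    return hw_read(reg, size);
}
```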



[xen-unstable-smoke test] 181923: tolerable all pass - PUSHED

2023-07-20 Thread osstest service owner
flight 181923 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181923/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-check        fail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-check    fail   never pass
 test-armhf-armhf-xl      15 migrate-support-check        fail   never pass
 test-armhf-armhf-xl      16 saverestore-support-check    fail   never pass

version targeted for testing:
 xen  4bf014c6f7d7cc9a9e017cef0eb5ff4bf27526e9
baseline version:
 xen  b1c16800e52743d9afd9af62c810f03af16dd942

Last test of basis   181893  2023-07-19 09:03:36 Z    1 days
Testing same since   181923  2023-07-20 09:02:01 Z    0 days    1 attempts


People who touched revisions under test:
  Federico Serafini 
  Jan Beulich 
  Luca Fancellu 
  Stefano Stabellini 
  Tamas K Lengyel 
  Yang Xu 

jobs:
 build-arm64-xsm                              pass
 build-amd64                                  pass
 build-armhf                                  pass
 build-amd64-libvirt                          pass
 test-armhf-armhf-xl                          pass
 test-arm64-arm64-xl-xsm                      pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-amd64-libvirt                     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   b1c16800e5..4bf014c6f7  4bf014c6f7d7cc9a9e017cef0eb5ff4bf27526e9 -> smoke



Re: [PATCH v8 02/13] vpci: use per-domain PCI lock to protect vpci structure

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
> From: Oleksandr Andrushchenko 
> 
> Use a previously introduced per-domain read/write lock to check
> whether vpci is present, so we are sure there are no accesses to the
> contents of the vpci struct if not. This lock can be used (and in a
> few cases is used right away) so that vpci removal can be performed
> while holding the lock in write mode. Previously such removal could
> race with vpci_read for example.

This I think needs to state the locking order of the per-domain
pci_lock wrt the vpci->lock.  AFAICT that's d->pci_lock first, then
vpci->lock.

> 1. Per-domain's pci_rwlock is used to protect pdev->vpci structure
> from being removed.
> 
> 2. Writing the command register and ROM BAR register may trigger
> modify_bars to run, which in turn may access multiple pdevs while
> checking for the existing BAR's overlap. The overlapping check, if
> done under the read lock, requires vpci->lock to be acquired on both
> devices being compared, which may produce a deadlock. It is not
> possible to upgrade read lock to write lock in such a case. So, in
> order to prevent the deadlock, use d->pci_lock instead. To prevent
> deadlock while locking both hwdom->pci_lock and dom_xen->pci_lock,
> always lock hwdom first.
> 
> All other code, which doesn't lead to pdev->vpci destruction and does
> not access multiple pdevs at the same time, can still use a
> combination of the read lock and pdev->vpci->lock.
> 
> 3. Drop const qualifier where the new rwlock is used and this is
> appropriate.
> 
> 4. Do not call process_pending_softirqs with any locks held. For that
> unlock prior the call and re-acquire the locks after. After
> re-acquiring the lock there is no need to check if pdev->vpci exists:
>  - in apply_map because of the context it is called (no race condition
>possible)
>  - for MSI/MSI-X debug code because it is called at the end of
>pdev->vpci access and no further access to pdev->vpci is made

I assume that's vpci_msix_arch_print(): there are further accesses to
pdev->vpci, but those use the msix local variable, which holds a copy
of the pointer in pdev->vpci->msix, so that last sentence is not true
I'm afraid.

However, the code already tries to cater for the pdev going away, and
hence it's IMO fine.  IOW: your change doesn't make this any better or
worse.

> 
> 5. Introduce pcidevs_trylock, so there is a possibility to try locking
> the pcidev's lock.

I'm confused by this addition, all the more since it's not used
anywhere.  Can you defer the addition until the patch that makes use of it?

> 
> 6. Use d->pci_lock around for_each_pdev and pci_get_pdev_by_domain
> while accessing pdevs in vpci code.
> 
> Suggested-by: Roger Pau Monné 
> Suggested-by: Jan Beulich 
> Signed-off-by: Oleksandr Andrushchenko 
> Signed-off-by: Volodymyr Babchuk 
> 
> ---
> 
> Changes in v8:
>  - changed d->vpci_lock to d->pci_lock
>  - introducing d->pci_lock in a separate patch
>  - extended locked region in vpci_process_pending
>  - removed pcidevs_lock in vpci_dump_msi()
>  - removed some changes as they are not needed with
>the new locking scheme
>  - added handling for hwdom && dom_xen case
> ---
>  xen/arch/x86/hvm/vmsi.c   |  4 +++
>  xen/drivers/passthrough/pci.c |  7 +
>  xen/drivers/vpci/header.c | 18 
>  xen/drivers/vpci/msi.c| 14 --
>  xen/drivers/vpci/msix.c   | 52 ++-
>  xen/drivers/vpci/vpci.c   | 46 +--
>  xen/include/xen/pci.h |  1 +
>  7 files changed, 129 insertions(+), 13 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/vmsi.c b/xen/arch/x86/hvm/vmsi.c
> index 3cd4923060..8c1bd66b9c 100644
> --- a/xen/arch/x86/hvm/vmsi.c
> +++ b/xen/arch/x86/hvm/vmsi.c
> @@ -895,6 +895,8 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  {
>  unsigned int i;
>  
> +ASSERT(rw_is_locked(&msix->pdev->domain->pci_lock));
> +
>  for ( i = 0; i < msix->max_entries; i++ )
>  {
>  const struct vpci_msix_entry *entry = &msix->entries[i];
> @@ -913,7 +915,9 @@ int vpci_msix_arch_print(const struct vpci_msix *msix)
>  struct pci_dev *pdev = msix->pdev;
>  
>  spin_unlock(&msix->pdev->vpci->lock);
> +read_unlock(&pdev->domain->pci_lock);
>  process_pending_softirqs();
> +read_lock(&pdev->domain->pci_lock);

This should be a read_trylock(), much like the spin_trylock() below.

>  /* NB: we assume that pdev cannot go away for an alive domain. */
>  if ( !pdev->vpci || !spin_trylock(&pdev->vpci->lock) )
>  return -EBUSY;
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 5b4632ead2..6f8692cd9c 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -57,6 +57,11 @@ void pcidevs_lock(void)
>  spin_lock_recursive(&_pcidevs_lock);
>  }
>  
> +int pcidevs_trylock(void)
> +{
> +return 

[ovmf test] 181922: all pass - PUSHED

2023-07-20 Thread osstest service owner
flight 181922 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/181922/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf b2de9ec5a759aa4a7ac029cda9079dce077bf856
baseline version:
 ovmf 6510dcf6f71adbe282bff0ba2b236f1d074f819f

Last test of basis   181916  2023-07-20 01:41:49 Z    0 days
Testing same since   181922  2023-07-20 08:11:06 Z    0 days    1 attempts


People who touched revisions under test:
  Sheng Wei 

jobs:
 build-amd64-xsm                         pass
 build-i386-xsm                          pass
 build-amd64                             pass
 build-i386                              pass
 build-amd64-libvirt                     pass
 build-i386-libvirt                      pass
 build-amd64-pvops                       pass
 build-i386-pvops                        pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64    pass
 test-amd64-i386-xl-qemuu-ovmf-amd64     pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   6510dcf6f7..b2de9ec5a7  b2de9ec5a759aa4a7ac029cda9079dce077bf856 -> xen-tested-master



Re: [PATCH] x86/vRTC: move and tidy convert_hour() and {to,from}_bcd()

2023-07-20 Thread Andrew Cooper
On 20/07/2023 8:11 am, Jan Beulich wrote:
> This is to avoid the need for forward declarations, which in turn
> addresses a violation of MISRA C:2012 Rule 8.3 ("All declarations of an
> object or function shall use the same names and type qualifiers").
>
> While doing so,
> - drop inline (leaving the decision to the compiler),
> - add const,
> - add unsigned,
> - correct style.
>
> Signed-off-by: Jan Beulich 

Acked-by: Andrew Cooper 



[XEN PATCH] x86/mtrr: address violations of MISRA C:2012 Rule 8.3 on parameter types

2023-07-20 Thread Federico Serafini
Change parameter types of function declarations to be consistent with
the ones used in the corresponding definitions,
thus addressing violations of MISRA C:2012 Rule 8.3 ("All declarations
of an object or function shall use the same names and type qualifiers").

No functional changes.

Signed-off-by: Federico Serafini 
---
 xen/arch/x86/include/asm/mtrr.h | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/xen/arch/x86/include/asm/mtrr.h b/xen/arch/x86/include/asm/mtrr.h
index e4f6ca6048..5d57a596ea 100644
--- a/xen/arch/x86/include/asm/mtrr.h
+++ b/xen/arch/x86/include/asm/mtrr.h
@@ -59,9 +59,10 @@ extern int mtrr_del_page(int reg, unsigned long base, unsigned long size);
 extern int mtrr_get_type(const struct mtrr_state *m, paddr_t pa,
  unsigned int order);
 extern void mtrr_centaur_report_mcr(int mcr, u32 lo, u32 hi);
-extern u32 get_pat_flags(struct vcpu *v, u32 gl1e_flags, paddr_t gpaddr,
-  paddr_t spaddr, uint8_t gmtrr_mtype);
-extern unsigned char pat_type_2_pte_flags(unsigned char pat_type);
+extern uint32_t get_pat_flags(struct vcpu *v, uint32_t gl1e_flags,
+  paddr_t gpaddr, paddr_t spaddr,
+  uint8_t gmtrr_mtype);
+extern uint8_t pat_type_2_pte_flags(uint8_t pat_type);
 extern int hold_mtrr_updates_on_aps;
 extern void mtrr_aps_sync_begin(void);
 extern void mtrr_aps_sync_end(void);
-- 
2.34.1
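
As an illustration of what Rule 8.3 asks for, here is a minimal sketch that is not taken from the Xen tree: the function name `pat_to_flags` and its table values are hypothetical. The point is only that the declaration and the definition spell the parameter name and the types identically.

```c
#include <stdint.h>

/*
 * Declaration, as it would appear in a header. Rule 8.3 wants the
 * parameter names and types written exactly as in the definition.
 */
uint8_t pat_to_flags(uint8_t pat_type);

/*
 * A declaration such as
 *     unsigned char pat_to_flags(unsigned char type);
 * would typically still compile against the definition below (uint8_t is
 * usually a typedef of unsigned char), but it uses a different spelling
 * of the type and a different parameter name, which the rule flags.
 */

/* Definition: same name, same types, same parameter name. */
uint8_t pat_to_flags(uint8_t pat_type)
{
    /* Hypothetical mapping, for illustration only. */
    static const uint8_t flags[4] = { 0x00, 0x08, 0x10, 0x18 };

    return flags[pat_type & 3u];
}
```

Since the two spellings name compatible types on common platforms, a change of this kind would not be expected to alter generated code, which matches the "No functional changes" note above.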




Re: [ARM][xencons] PV Console hangs due to illegal ring buffer accesses

2023-07-20 Thread Julien Grall

(+ Juergen)

On 19/07/2023 17:13, Andrei Cherechesu (OSS) wrote:

Hello,


Hi Andrei,


As we're running Xen 4.17 (with platform-related support added) on NXP S32G 
SoCs (ARMv8), with a custom Linux distribution built through Yocto, and we've 
set some Xen-based demos up, we encountered some issues which we think might 
not be related to our hardware. For additional context, the Linux kernel 
version we're running is 5.15.96-rt (with platform-related support added as 
well).

The setup to reproduce the problem is fairly simple: after booting a Dom0 (can 
provide configuration details if needed), we're booting a normal PV DomU with 
PV Networking. Additionally, the VMs have k3s (Lightweight Kubernetes - version 
v1.25.8+k3s1: https://github.com/k3s-io/k3s/releases/tag/v1.25.8%2Bk3s1) 
installed in their rootfs'es.

The problem is that the DomU console hangs (no new output is shown, no input 
can be sent) some time (non-deterministic, sometimes 5 seconds, other times 
like 15-20 seconds) after we run the `k3s server` command. We have this command 
running as part of a sysvinit service, and the same behavior can be observed in 
that case as well. The k3s version we use is the one mentioned in the paragraph 
above, but this can be reproduced with other versions as well (i.e., v1.21.11, 
v1.22.6). If the `k3s server` command is run in the Dom0 VM, everything works 
fine. Using DomU as an agent node also works fine; the console problem occurs 
only when it is run as a server.

Immediately after the serial console hangs, we can still log in on DomU using 
SSH, and we can observe the following messages in its dmesg:
[   57.905806] xencons: Illegal ring page indices


Looking at the Linux code, this message is printed in a couple of places in 
the xenconsole driver.


I would assume that this is printed when reading from the buffer 
(otherwise you would not see any message). Can you confirm it?


Also, can you provide the indices that Linux considers buggy?

Lastly, it seems like the barriers used are incorrect. They should be the 
virt_*() versions rather than a plain mb()/wmb(). I don't think it matters 
for arm64 though (I am assuming you are not running 32-bit).



[   59.399620] xenbus: error -5 while reading message


So this message is coming from the xenbus driver (used to read the 
xenstore ring). This is -EIO, and AFAICT returned when the indices are 
also incorrect.


For this driver, I think there is also a TOCTOU because a compiler is 
free to reload intf->rsp_cons after the check. Moving virt_mb() is 
probably not sufficient. You would also want to use ACCESS_ONCE().


What I find odd is you have two distinct rings (xenconsole and xenbus) 
with similar issues. Above, you said you are using Linux RT. I wonder if 
this plays into the issue because, if I am not mistaken, the two 
functions would now be fully preemptible.


This could expose some races. For instance, there are some missing 
ACCESS_ONCE() (as mentioned above).


In particular, Xenstored (I haven't checked xenconsoled) is using += to 
update intf->rsp_cons. There is no guarantee that the update will be atomic.
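
To make the two concerns above concrete (re-reading a shared index after it has been checked, and updating it with a non-atomic "+="), here is a minimal user-space sketch. It is not the Linux or xenstored code: the structure layout, `ring_read()`, and the `LOAD_ONCE`/`STORE_ONCE` macros are hypothetical stand-ins for the real ring interface and for ACCESS_ONCE(), and the C11 fences only approximate what virt_rmb()/virt_mb() would do in a real driver.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_SIZE 1024u                       /* power of two */
#define MASK_IDX(idx) ((idx) & (RING_SIZE - 1u))

/* Force exactly one load/store of a shared 32-bit index. */
#define LOAD_ONCE(x)     (*(volatile const uint32_t *)&(x))
#define STORE_ONCE(x, v) (*(volatile uint32_t *)&(x) = (v))

/* Hypothetical shared-page layout, loosely modelled on a response ring. */
struct ring_intf {
    char rsp[RING_SIZE];
    uint32_t rsp_cons;
    uint32_t rsp_prod;
};

/* Returns the number of bytes copied, or -5 (-EIO) on bad indices. */
static int ring_read(struct ring_intf *intf, char *dst, uint32_t len)
{
    /* Snapshot both indices once; only the local copies are used below. */
    uint32_t cons = LOAD_ONCE(intf->rsp_cons);
    uint32_t prod = LOAD_ONCE(intf->rsp_prod);
    uint32_t avail = prod - cons;
    uint32_t i;

    /* The check and every later use see the same values: no TOCTOU. */
    if (avail > RING_SIZE)
        return -5;

    /* Read the data only after the indices (stand-in for virt_rmb()). */
    atomic_thread_fence(memory_order_acquire);

    if (len > avail)
        len = avail;
    for (i = 0; i < len; i++)
        dst[i] = intf->rsp[MASK_IDX(cons + i)];

    /* Finish reading before handing the slots back to the producer. */
    atomic_thread_fence(memory_order_seq_cst);

    /* Publish the new index with one store rather than a shared "+=". */
    STORE_ONCE(intf->rsp_cons, cons + len);
    return (int)len;
}
```

Whether this pattern is sufficient for the rings discussed here is an open question; the sketch only shows the shape of "snapshot, validate, use the snapshot, publish once".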


Overall, I am not 100% sure what I wrote is related. But that's probably 
a good start of things that can be exacerbated with Linux RT.



[   59.399649] xenbus: error -5 while writing message


This is in xenbus as well. But this time in the write part. The analysis 
I wrote above for the read part can be applied here.


Cheers,

--
Julien Grall



Re: [PATCH v3 3/3] xen/riscv: introduce identity mapping

2023-07-20 Thread Jan Beulich
On 20.07.2023 10:28, Oleksii wrote:
> On Thu, 2023-07-20 at 07:58 +0200, Jan Beulich wrote:
>> On 19.07.2023 18:35, Oleksii wrote:
>>> On Tue, 2023-07-18 at 17:03 +0200, Jan Beulich wrote:
> +    unsigned long load_end = LINK_TO_LOAD(_end);
> +    unsigned long pt_level_size = XEN_PT_LEVEL_SIZE(i - 1);
> +    unsigned long xen_size = ROUNDUP(load_end - load_start, pt_level_size);
> +    unsigned long page_entries_num = xen_size / pt_level_size;
> +
> +    while ( page_entries_num-- )
> +    pgtbl[index++].pte = 0;
> +
> +    break;

 Unless there's a "not crossing a 2Mb boundary" guarantee
 somewhere
 that I've missed, this "break" is still too early afaict.
>>> If I will add a '2 MB boundary check' for load_start and
>>> linker_start
>>> could it be an upstreamable solution?
>>>
>>> Something like:
>>>     if ( !IS_ALIGNED(load_start, MB(2) )
>>> printk("load_start should be 2Mb aligned\n");
>>> and
>>>     ASSERT( !IS_ALIGNED(XEN_VIRT_START, MB(2) )
>>> in xen.lds.S.
>>
>> Arranging for the linked address to be 2Mb-aligned is certainly
>> reasonable. Whether expecting the load address to also be depends
>> on whether that can be arranged for (which in turn depends on boot
>> loader behavior); it cannot be left to "luck".
> Maybe I didn't quite understand you here, but if Xen has an alignment
> check of load address then boot loader has to follow the alignment
> requirements of Xen. So it doesn't look as 'luck'.

That depends on (a) the alignment being properly expressed in the
final binary and (b) the boot loader honoring it. (b) is what you
double-check above, emitting a printk(), but I'm not sure about (a)
being sufficiently enforced with just the ASSERT in the linker
script. Maybe I'm wrong, though.

>>> Then we will have completely different L0 tables for identity
>>> mapping
>>> and not identity and the code above will be correct.
>>
>> As long as Xen won't grow beyond 2Mb total. Considering that at
>> some point you may want to use large page mappings for .text,
>> .data, and .rodata, that alone would grow Xen to 6 Mb (or really 8,
>> assuming .init goes separate as well). That's leaving aside the
>> realistic option of the mere sum of all sections being larger than
>> 2. That said, even Arm64 with ACPI is still quite a bit below 2Mb.
>> x86 is nearing 2.5 though in even a somewhat limited config;
>> allyesconfig may well be beyond that already.
> I am missing something about Xen size. Let's assume that Xen will be
> mapped using only 4k pages ( like it is done now ). Then if Xen will
> be more than 2Mb, the only thing that will change is the number of page
> tables, so it is only a question of changing PGTBL_INITIAL_COUNT ( in
> case of RISC-V).

And the way you do the tearing down of the transient 1:1 mapping.

> Could you please explain why Xen will grow to 6/8 MB in case of larger
> page mappings? In case of larger page mapping fewer tables are needed.
> For example, if we would like to use 2Mb pages then we will stop at L1
> page table and write a physical address to L1 page table entry instead
> of creating new L0 page table.

When you use 2Mb mappings, then you will want to use separate ones
for .text, .rodata, and .data (plus perhaps .init), to express the
differing permissions correctly. Consequently you'll need more
virtual address space, but - yes - fewer page table pages. And of
course the 1:1 unmapping logic will again be slightly different.
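
The arithmetic behind the boundary concern can be sketched as follows. This is a stand-alone illustration, not Xen code: the helper names `entries_needed()` and `fits_in_one_block()` are hypothetical, the macros are simplified re-definitions, and `start` is assumed to be page-aligned.

```c
#include <stdbool.h>
#include <stdint.h>

#define KB(x) ((uint64_t)(x) << 10)
#define MB(x) ((uint64_t)(x) << 20)
#define IS_ALIGNED(val, align) (((val) & ((align) - 1)) == 0)
#define ROUNDUP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Entries of size pt_level_size needed to cover [start, end). */
static uint64_t entries_needed(uint64_t start, uint64_t end,
                               uint64_t pt_level_size)
{
    return ROUNDUP(end - start, pt_level_size) / pt_level_size;
}

/*
 * True if [start, end) lies inside one naturally aligned block of
 * block_size bytes. With 4k pages and 512 entries per table, one L0
 * table maps exactly one such 2Mb block.
 */
static bool fits_in_one_block(uint64_t start, uint64_t end,
                              uint64_t block_size)
{
    return (start / block_size) == ((end - 1) / block_size);
}
```

For instance, an image loaded at 3Mb that is just over 1Mb in size spans two 2Mb blocks and therefore two L0 tables, so a loop that clears entries in a single table would stop too early. A 2Mb-aligned load address avoids this only as long as the image stays below 2Mb.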

Jan



Re: [PATCH v2 2/4] x86: allow Kconfig control over psABI level

2023-07-20 Thread Jan Beulich
On 19.07.2023 14:28, Jan Beulich wrote:
> On 19.07.2023 12:04, Andrew Cooper wrote:
>> On 19/07/2023 10:44 am, Jan Beulich wrote:
>>> --- a/xen/arch/x86/Kconfig
>>> +++ b/xen/arch/x86/Kconfig
>>> @@ -118,6 +118,36 @@ config HVM
>>>  
>>>   If unsure, say Y.
>>>  
>>> +choice
>>> +   prompt "base psABI level"
>>> +   default X86_64_BASELINE
>>> +   help
>>> + The psABI defines 4 levels of ISA extension sets as a coarse granular
>>> + way of identifying advanced functionality that would be uniformly
>>> + available in respectively newer hardware.  While v4 is not really of
>>> + interest for Xen, the others can be selected here, making the
>>> + resulting Xen no longer work on older hardware.  This option won't
>>> + have any effect if the toolchain doesn't support the distinction.
>>> +
>>> + If unsure, stick to the default.
>>> +
>>> +config X86_64_BASELINE
>>> +   bool "baseline"
>>> +
>>> +config X86_64_V2
>>> +   bool "v2"
>>> +   help
>>> + This enables POPCNT and CX16, besides other extensions which are of
>>> + no interest here.
>>> +
>>> +config X86_64_V3
>>> +   bool "v3"
>>> +   help
>>> + This enables BMI, BMI2, LZCNT, MOVBE, and XSAVE, besides other
>>> + extensions which are of no interest here.
>>> +
>>> +endchoice
>>> +
>>>  config XEN_SHSTK
>>> bool "Supervisor Shadow Stacks"
>>> depends on HAS_AS_CET_SS
>>> --- a/xen/arch/x86/arch.mk
>>> +++ b/xen/arch/x86/arch.mk
>>> @@ -36,6 +36,10 @@ CFLAGS += -mno-red-zone -fpic
>>>  # the SSE setup for variadic function calls.
>>>  CFLAGS += -mno-mmx -mno-sse $(call cc-option,$(CC),-mskip-rax-setup)
>>>  
>>> +# Enable the selected baseline ABI, if supported by the compiler.
>>> +CFLAGS-$(CONFIG_X86_64_V2) += $(call cc-option,$(CC),-march=x86-64-v2)
>>> +CFLAGS-$(CONFIG_X86_64_V3) += $(call cc-option,$(CC),-march=x86-64-v3)
>>
>> I know we're having severe disagreements over Kconfig compiler checking,
>> but this patch cannot go in in this form.
>>
>> You're asking the user unconditionally for the psABI level, then
>> ignoring the answer on toolchains which don't understand it.
>>
>> The makefile needs to be unconditional, and the Kconfig options need to
>> depend on suitable toolchain support.  This is the only way we don't get
>> a false statement written into the .config, and embedded in hypfs.
> 
> I was tempted to base this on "x86: convert CET tool chain feature checks
> to mixed Kconfig/Makefile model", but then it likely wouldn't have stood
> a chance to go in either. The technical issues aside that need solving in
> that other patch, I still haven't had any feedback on the conceptual
> aspects. Yet as said in other contexts, without having the conceptual
> side (largely) settled, there's no incentive for me to invest time in
> dealing with the technical issues (which surely are solvable).
> 
> When raising this aspect, did you pay attention to the first of the TBDs
> in the patch? If we were to force build errors (for no real reason, see
> below), we should first try those fallbacks, to limit the possible
> damage. As mentioned there, support for these -march= forms isn't all
> that old.
> 
> As to forcing build errors in the first place, that goes against the
> intentions with the mixed Kconfig / Makefile checking model. There we
> would only issue warnings. Albeit as mentioned in that patch, that's up
> for discussion, and a majority may view things differently than I do.
> Especially here there's no reason to outright fail builds, though:
> .config / hypfs wouldn't really state anything wrong - the binary merely
> wouldn't make use of newer insns despite being permitted to.

In an attempt to fit both your and my expectations, what about another
prereq patch along the lines of the below one, of course then accompanied
by adjustments to this patch (to first try the fallback mentioned, and
then complain - as configured - if that's also not successful)?

Cc-ing other people as well who would be Cc-ed on an eventual proper
submission.
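
For reference, the effect of the -march= setting on generated code can be observed from C through the compiler's predefined feature macros. The sketch below assumes GCC or Clang; `compiled_psabi_level()` and `hweight32_sketch()` are hypothetical names, and the level check looks only at the handful of features named in the Kconfig help text, not at the full psABI definition.

```c
#include <stdint.h>

/* Rough psABI level the compiler was allowed to target (subset check). */
static int compiled_psabi_level(void)
{
#if defined(__POPCNT__) && defined(__BMI__) && defined(__BMI2__) && \
    defined(__LZCNT__) && defined(__MOVBE__)
    return 3;
#elif defined(__POPCNT__)
    return 2;
#else
    return 1;
#endif
}

/* Population count that uses POPCNT only when the build permits it. */
static unsigned int hweight32_sketch(uint32_t x)
{
#ifdef __POPCNT__
    /* With POPCNT enabled the compiler can emit the instruction here. */
    return (unsigned int)__builtin_popcount(x);
#else
    /* Generic fallback: works on any hardware, just slower. */
    x -= (x >> 1) & 0x55555555u;
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (x * 0x01010101u) >> 24;
#endif
}
```

If the toolchain does not understand the requested -march= value and the flag is dropped, the build falls back to the generic path: still correct on all hardware, just without the newer instructions.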

Jan

build: permit Kconfig control over how to deal with unsatisfiable choices

Some options we allow the build admin to select may require new enough
tool chain components to fulfill (partly or entirely). Provide yet
another control to pick what action to take at the end of the build
process - be silent about this, warn, or fail the build.

Signed-off-by: Jan Beulich 
---
This may not be fine grained enough: Optimization settings (like added
by "x86: allow Kconfig control over psABI level") may want dealing with
differently than security relevant ones (like XEN_SHSTK or XEN_IBT).

Whether to do this uniformly at the end of the build is up for
discussion: In the "warn" case we will want the resulting output late,
so it is more likely to be noticed. In the "fail build" case though we
may want the failure to occur early.

--- a/xen/Kconfig
+++ b/xen/Kconfig
@@ -64,6 +64,25 @@ config UNSUPPORTED
  preview features as defined by SUPPORT.md. 

Re: [XEN PATCH v10 04/24] xen/arm: tee: add a primitive FF-A mediator

2023-07-20 Thread Bertrand Marquis
Hi Jens,

> On 17 Jul 2023, at 09:20, Jens Wiklander  wrote:
> 
> Adds a FF-A version 1.1 [1] mediator to communicate with a Secure
> Partition in secure world.
> 
> This commit brings in only the parts needed to negotiate FF-A version
> number with guest and SPMC.
> 
> [1] https://developer.arm.com/documentation/den0077/e
> Signed-off-by: Jens Wiklander 
> ---
> xen/arch/arm/include/asm/psci.h|   4 +
> xen/arch/arm/include/asm/tee/ffa.h |  35 +
> xen/arch/arm/tee/Kconfig   |  11 ++
> xen/arch/arm/tee/Makefile  |   1 +
> xen/arch/arm/tee/ffa.c | 225 +
> xen/arch/arm/vsmc.c|  17 ++-
> xen/include/public/arch-arm.h  |   1 +
> 7 files changed, 291 insertions(+), 3 deletions(-)
> create mode 100644 xen/arch/arm/include/asm/tee/ffa.h
> create mode 100644 xen/arch/arm/tee/ffa.c
> 
> diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
> index 832f77afff3a..4780972621bb 100644
> --- a/xen/arch/arm/include/asm/psci.h
> +++ b/xen/arch/arm/include/asm/psci.h
> @@ -24,6 +24,10 @@ void call_psci_cpu_off(void);
> void call_psci_system_off(void);
> void call_psci_system_reset(void);
> 
> +/* Range of allocated PSCI function numbers */
> +#define PSCI_FNUM_MIN_VALUE _AC(0,U)
> +#define PSCI_FNUM_MAX_VALUE _AC(0x1f,U)
> +
> /* PSCI v0.2 interface */
> #define PSCI_0_2_FN32(nr) ARM_SMCCC_CALL_VAL(ARM_SMCCC_FAST_CALL, \
>  ARM_SMCCC_CONV_32,   \
> diff --git a/xen/arch/arm/include/asm/tee/ffa.h b/xen/arch/arm/include/asm/tee/ffa.h
> new file mode 100644
> index ..44361a4e78e4
> --- /dev/null
> +++ b/xen/arch/arm/include/asm/tee/ffa.h
> @@ -0,0 +1,35 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * xen/arch/arm/include/asm/tee/ffa.h
> + *
> + * Arm Firmware Framework for ARMv8-A(FFA) mediator
> + *
> + * Copyright (C) 2023  Linaro Limited
> + */
> +
> +#ifndef __ASM_ARM_TEE_FFA_H__
> +#define __ASM_ARM_TEE_FFA_H__
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +
> +#define FFA_FNUM_MIN_VALUE  _AC(0x60,U)
> +#define FFA_FNUM_MAX_VALUE  _AC(0x86,U)
> +
> +static inline bool is_ffa_fid(uint32_t fid)
> +{
> +uint32_t fn = fid & ARM_SMCCC_FUNC_MASK;
> +
> +return fn >= FFA_FNUM_MIN_VALUE && fn <= FFA_FNUM_MAX_VALUE;
> +}
> +
> +#ifdef CONFIG_FFA
> +#define FFA_NR_FUNCS    12
> +#else
> +#define FFA_NR_FUNCS    0
> +#endif
> +
> +#endif /*__ASM_ARM_TEE_FFA_H__*/
> diff --git a/xen/arch/arm/tee/Kconfig b/xen/arch/arm/tee/Kconfig
> index 392169b2559d..923f08ba8cb7 100644
> --- a/xen/arch/arm/tee/Kconfig
> +++ b/xen/arch/arm/tee/Kconfig
> @@ -8,3 +8,14 @@ config OPTEE
>  virtualization-enabled OP-TEE present. You can learn more
>  about virtualization for OP-TEE at
>  https://optee.readthedocs.io/architecture/virtualization.html
> +
> +config FFA
> + bool "Enable FF-A mediator support (UNSUPPORTED)" if UNSUPPORTED
> + default n
> + depends on ARM_64

Even if the tee Makefile is only included if CONFIG_TEE is activated,
the missing dependency on TEE here allows to select FFA without TEE
resulting in a config with FFA activated but not compiled in.

This causes no build error; FFA is simply not built in when selected without TEE.

Should be:

depends on ARM_64 && TEE

I am ok if this is fixed on commit and my R-B kept.

Cheers
Bertrand

> + help
> +  This option enables a minimal FF-A mediator. The mediator is
> +  generic as it follows the FF-A specification [1], but it only
> +  implements a small subset of the specification.
> +
> +  [1] https://developer.arm.com/documentation/den0077/latest
> diff --git a/xen/arch/arm/tee/Makefile b/xen/arch/arm/tee/Makefile
> index 982c87968447..58a1015e40e0 100644
> --- a/xen/arch/arm/tee/Makefile
> +++ b/xen/arch/arm/tee/Makefile
> @@ -1,2 +1,3 @@
> +obj-$(CONFIG_FFA) += ffa.o
> obj-y += tee.o
> obj-$(CONFIG_OPTEE) += optee.o
> diff --git a/xen/arch/arm/tee/ffa.c b/xen/arch/arm/tee/ffa.c
> new file mode 100644
> index ..927c4d33a380
> --- /dev/null
> +++ b/xen/arch/arm/tee/ffa.c
> @@ -0,0 +1,225 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * xen/arch/arm/tee/ffa.c
> + *
> + * Arm Firmware Framework for ARMv8-A (FF-A) mediator
> + *
> + * Copyright (C) 2023  Linaro Limited
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* Error codes */
> +#define FFA_RET_OK                  0
> +#define FFA_RET_NOT_SUPPORTED       -1
> +#define FFA_RET_INVALID_PARAMETERS  -2
> +#define FFA_RET_NO_MEMORY           -3
> +#define FFA_RET_BUSY                -4
> +#define FFA_RET_INTERRUPTED         -5
> +#define FFA_RET_DENIED              -6
> +#define FFA_RET_RETRY               -7
> +#define FFA_RET_ABORTED             -8
> 

Re: [PATCH LINUX v5 2/2] xen: add support for initializing xenstore later as HVM domain

2023-07-20 Thread Petr Mladek
On Wed 2023-07-19 18:46:08, Stefano Stabellini wrote:
> On Wed, 19 Jul 2023, Petr Mladek wrote:
> > On Fri 2022-05-13 14:19:38, Stefano Stabellini wrote:
> > > From: Luca Miccio 
> > > 
> > > When running as dom0less guest (HVM domain on ARM) the xenstore event
> > > channel is available at domain creation but the shared xenstore
> > > interface page only becomes available later on.
> > > 
> > > In that case, wait for a notification on the xenstore event channel,
> > > then complete the xenstore initialization later, when the shared page
> > > is actually available.
> > > 
> > > > The xenstore page has a few extra fields. Add them to the shared struct.
> > > > One of the fields is "connection"; when the connection is ready, it is
> > > > zero. If the connection is non-zero, wait for a notification.
> > 
> > I see the following warning from free_irq() in 6.5-rc2 when running
> > livepatching selftests. It does not happen after reverting this patch.
> > 
> > [  352.168453] livepatch: signaling remaining tasks
> > [  352.173228] [ cut here ]
> > [  352.175563] Trying to free already-free IRQ 0
> > [  352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 
> > free_irq+0xbf/0x350
> > [  352.179942] Modules linked in: test_klp_livepatch(EK)
> > [  352.181621] CPU: 1 PID: 88 Comm: xenbus_probe Kdump: loaded Tainted: G   
> >  E K6.5.0-rc2-default+ #535
> > [  352.184754] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
> > [  352.188214] RIP: 0010:free_irq+0xbf/0x350
> > [  352.192211] Code: 7a 08 75 0e e9 36 02 00 00 4c 3b 7b 08 74 5a 48 89 da 
> > 48 8b 5a 18 48 85 db 75 ee 44 89 f6 48 c7 c7 58 b0 8b 86 e8 21 0a f5 ff 
> > <0f> 0b 48 8b 34 24 4c 89 ef e8 53 bb e3 00 
> > 48 8b 45 40 48 8b 40 78
> > [  352.200079] RSP: 0018:af0440b4be80 EFLAGS: 00010086
> > [  352.201465] RAX:  RBX: 99f105116c80 RCX: 
> > 0003
> > [  352.203324] RDX: 8003 RSI: 8691d4bc RDI: 
> > 
> > [  352.204989] RBP: 99f100052000 R08:  R09: 
> > c0007fff
> > [  352.206253] R10: af0440b4bd18 R11: af0440b4bd10 R12: 
> > 99f1000521e8
> > [  352.207451] R13: 99f1000520a8 R14:  R15: 
> > 86f42360
> > [  352.208787] FS:  () GS:99f15a40() 
> > knlGS:
> > [  352.210061] CS:  0010 DS:  ES:  CR0: 80050033
> > [  352.210815] CR2: 7f8415d56000 CR3: 000105e36003 CR4: 
> > 00370ee0
> > [  352.211867] DR0:  DR1:  DR2: 
> > 
> > [  352.212912] DR3:  DR6: fffe0ff0 DR7: 
> > 0400
> > [  352.213951] Call Trace:
> > [  352.214390]  
> > [  352.214717]  ? __warn+0x81/0x170
> > [  352.215436]  ? free_irq+0xbf/0x350
> > [  352.215906]  ? report_bug+0x10b/0x200
> > [  352.216408]  ? prb_read_valid+0x17/0x20
> > [  352.216926]  ? handle_bug+0x44/0x80
> > [  352.217409]  ? exc_invalid_op+0x13/0x60
> > [  352.217932]  ? asm_exc_invalid_op+0x16/0x20
> > [  352.218497]  ? free_irq+0xbf/0x350
> > [  352.218979]  ? __pfx_xenbus_probe_thread+0x10/0x10
> > [  352.219600]  xenbus_probe+0x7a/0x80
> > [  352.221030]  xenbus_probe_thread+0x76/0xc0
> > [  352.221416]  ? __pfx_autoremove_wake_function+0x10/0x10
> > [  352.221882]  kthread+0xfd/0x130
> > [  352.222191]  ? __pfx_kthread+0x10/0x10
> > [  352.222544]  ret_from_fork+0x2d/0x50
> > [  352.222893]  ? __pfx_kthread+0x10/0x10
> > [  352.223260]  ret_from_fork_asm+0x1b/0x30
> > [  352.223629] RIP: :0x0
> > [  352.223931] Code: Unable to access opcode bytes at 0xffd6.
> > [  352.224488] RSP: : EFLAGS:  ORIG_RAX: 
> > 
> > [  352.225044] RAX:  RBX:  RCX: 
> > 
> > [  352.225571] RDX:  RSI:  RDI: 
> > 
> > [  352.226106] RBP:  R08:  R09: 
> > 
> > [  352.226632] R10:  R11:  R12: 
> > 
> > [  352.227171] R13:  R14:  R15: 
> > 
> > [  352.227710]  
> > [  352.227917] irq event stamp: 22
> > [  352.228209] hardirqs last  enabled at (21): [] 
> > ___slab_alloc+0x68e/0xc80
> > [  352.228914] hardirqs last disabled at (22): [] 
> > _raw_spin_lock_irqsave+0x8d/0x90
> > [  352.229546] softirqs last  enabled at (0): [] 
> > copy_process+0xaae/0x1fd0
> > [  352.230079] softirqs last disabled at (0): [<>] 0x0
> > [  352.230503] ---[ end trace  ]---
> > 
> > , where the message "livepatch: signaling remaining tasks" means that
> > it might send fake signals to non-kthread tasks.
> > 
> > The aim is to force userspace tasks to enter and leave kernel space
> > so that they might start using the new patched code. It 

Re: [RFC PATCH 1/4] xen/arm: justify or initialize conditionally uninitialized variables

2023-07-20 Thread Nicola Vetrini




On 17/07/23 15:40, Julien Grall wrote:

Hi Nicola,

On 17/07/2023 13:08, Nicola Vetrini wrote:

On 14/07/23 15:00, Julien Grall wrote:

Hi Nicola,

On 14/07/2023 12:49, Nicola Vetrini wrote:

This patch aims to fix some occurrences of possibly uninitialized
variables, that may be read before being written. This behaviour would
violate MISRA C:2012 Rule 9.1, besides being generally undesirable.

In all the analyzed cases, such accesses were actually safe, but it's
quite difficult to prove so by automatic checking, therefore a safer
route is to change the code so as to avoid the behaviour from occurring,
while preserving the semantics.

To achieve this goal, I adopted the following strategies:


Please let's have at least one patch per strategy. I would also consider 
some of the rework separate so it can go in regardless of the decision 
for the SAF-*.




- Add a suitably formatted local deviation comment
   (as indicated in 'docs/misra/documenting-violations.rst')
   to exempt the following line from checking.

- Provide an initialization for the variable at the declaration.

- Substitute a goto breaking out of control flow logic with a semantically
   equivalent do { .. } while(0).
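
The third strategy can be sketched as follows. This is an illustrative example, not one of the functions touched by the patch: `scaled_goto()` and `scaled_loop()` are hypothetical, and whether a given analyser stops reporting Rule 9.1 on the second form is an assumption that would need checking against the tool.

```c
#include <stddef.h>

/* Original shape: early exits via goto, 'v' declared at function scope. */
static int scaled_goto(const int *p)
{
    int rc = -1;
    int v;              /* unset on the paths that jump straight to 'out' */

    if ( p == NULL )
        goto out;

    v = *p;
    if ( v < 0 )
        goto out;

    rc = v * 2;

 out:
    return rc;
}

/* Same semantics: 'break' leaves the do { } while ( 0 ) block instead. */
static int scaled_loop(const int *p)
{
    int rc = -1;

    do {
        int v;          /* scope now ends where the early exits land */

        if ( p == NULL )
            break;

        v = *p;
        if ( v < 0 )
            break;

        rc = v * 2;
    } while ( 0 );

    return rc;
}
```

Both functions return the same value for every input; the second merely narrows the scope of the variable so that no exit path can observe it unset.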


As I already mentioned in private, it is unclear to me how you 
decided which strategy to use. I still think we need to define our 
policy before changing the code. Otherwise, it is going to be 
difficult to decide for new code.




The main point of this RFC is doing so. From what I gathered, it's not 
an easy task: sometimes there are no 'safe' values to initialize 
variables to and sometimes there is no easy way to prove that indeed 
something is always initialized or not accessed at all.


But you wrote the code. So you should be able to explain how you took 
the decision between one and the others.


Also, even if this is an RFC, it would have been good to summarize any 
discussion that happened in private and, if there were concerns, to try to 
come up with ideas or at least list the concerns after '---'.




I'll keep this in mind if the need arises in the future.





Signed-off-by: Nicola Vetrini 
---
  docs/misra/safe.json   |  8 +++
  xen/arch/arm/arm64/lib/find_next_bit.c |  1 +
  xen/arch/arm/bootfdt.c |  6 +
  xen/arch/arm/decode.c  |  2 ++
  xen/arch/arm/domain_build.c    | 29 ++
  xen/arch/arm/efi/efi-boot.h    |  6 +++--
  xen/arch/arm/gic-v3-its.c  |  9 ---
  xen/arch/arm/mm.c  |  1 +
  xen/arch/arm/p2m.c                     | 33 +++---

  9 files changed, 69 insertions(+), 26 deletions(-)

diff --git a/docs/misra/safe.json b/docs/misra/safe.json
index e3c8a1d8eb..244001f5be 100644
--- a/docs/misra/safe.json
+++ b/docs/misra/safe.json
@@ -12,6 +12,14 @@
  },
  {
  "id": "SAF-1-safe",
+    "analyser": {
+    "eclair": "MC3R1.R9.1"
+    },
+    "name": "Rule 9.1: initializer not needed",
+    "text": "The following local variables are possibly subject to being read before being written, but code inspection ensured that the control flow in the construct where they appear ensures that no such event may happen."
I am a bit concerned with such a statement because the code inspection was 
done today, with the current code. This could change in the future and 
invalidate the reasoning.


It is not clear to me if we have any mechanism to prevent that. If we 
don't, then I think we need to drastically reduce the number of times 
this is used (there are a bit too many for my taste).




Indeed, the purpose of such a deviation is that the sound 
overapproximation computed by the tool requires a human to look at the 
code and think twice before modifying it (i.e., if ever that code is 
touched, the reviewer ought to assess whether that justification still 
holds or some other thing should be done about it).


Your assumption is the reviewer will notice there is an existing 
deviation and be able to assess whether it has changed. I view this assumption 
as risky in the long term.


Have you investigated improving the automatic tooling?



Well, as discussed elsewhere in the thread, a slightly modified version 
of this deviation comment can list the specific reason why such a thing 
was deviated directly at the declaration or where the caution is, if you 
think this is better.


Example:

// <- SAF-x here
int var;

[...]

// <- or HERE
f();

An alternative approach to justification, partly discussed with Stefano 
in private is a macro that looks like an attribute to signal that the 
variable is intentionally uninitialized. This does not have the benefit 
of a written justification with a proper comment or an entry in the json 
file, but is less intrusive and the justification for all occurrences of 
__uninit w.r.t R9.1 would be included in the static analysis tool 
configuration, which would be part of the MISRA compliance 

Re: [PATCH v8 01/13] pci: introduce per-domain PCI rwlock

2023-07-20 Thread Roger Pau Monné
On Thu, Jul 20, 2023 at 12:32:31AM +, Volodymyr Babchuk wrote:
> Add per-domain d->pci_lock that protects access to
> d->pdev_list. Purpose of this lock is to give guarantees to VPCI code
> that underlying pdev will not disappear under feet. This is a rw-lock,
> but this patch adds only write_lock()s. There will be read_lock()
> users in the next patches.
> 
> This lock should be taken in write mode every time d->pdev_list is
> altered. This covers both accesses to d->pdev_list and accesses to
> pdev->domain_list fields. All write accesses also should be protected
> by pcidevs_lock() as well. Idea is that any user that wants read
> access to the list or to the devices stored in the list should use
> either this new d->pci_lock or old pcidevs_lock(). Usage of any of
> this two locks will ensure only that pdev of interest will not
> disappear from under feet and that the pdev still will be assigned to
> the same domain. Of course, any new users should use pcidevs_lock()
> when it is appropriate (e.g. when accessing any other state that is
> protected by the said lock).

I think this needs a note about the ordering:

"In case both the newly introduced per-domain rwlock and the pcidevs
lock are taken, the latter must be acquired first."
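
The ordering rule can be illustrated with a small stand-alone sketch. It does not use Xen's locking primitives: `struct simple_lock` is a toy test-and-set lock built on C11 atomics, standing in for both the recursive pcidevs spinlock and the per-domain rwlock, and `toy_domain`, `toy_add_device()` and `toy_try_add_device()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Toy lock; the real code uses a recursive spinlock and an rwlock. */
struct simple_lock { atomic_flag held; };
#define SIMPLE_LOCK_INIT { ATOMIC_FLAG_INIT }

static bool simple_trylock(struct simple_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

static void simple_lock(struct simple_lock *l)
{
    while ( !simple_trylock(l) )
        ; /* spin */
}

static void simple_unlock(struct simple_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Stand-in for the global pcidevs lock. */
static struct simple_lock global_pcidevs_lock = SIMPLE_LOCK_INIT;

struct toy_domain {
    struct simple_lock pci_lock;   /* stand-in for d->pci_lock */
    int nr_devices;                /* stand-in for d->pdev_list */
};

/* Every path that needs both locks takes them in the same order. */
static void toy_add_device(struct toy_domain *d)
{
    simple_lock(&global_pcidevs_lock);   /* 1st: global lock */
    simple_lock(&d->pci_lock);           /* 2nd: per-domain lock */
    d->nr_devices++;
    simple_unlock(&d->pci_lock);
    simple_unlock(&global_pcidevs_lock);
}

/* Trylock variant: back out completely (think -EBUSY) on contention. */
static bool toy_try_add_device(struct toy_domain *d)
{
    if ( !simple_trylock(&global_pcidevs_lock) )
        return false;

    if ( !simple_trylock(&d->pci_lock) )
    {
        simple_unlock(&global_pcidevs_lock);
        return false;
    }

    d->nr_devices++;
    simple_unlock(&d->pci_lock);
    simple_unlock(&global_pcidevs_lock);
    return true;
}
```

Keeping one fixed order on every path is what rules out the classic AB/BA deadlock between two CPUs, which is why the note belongs in the commit message.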

> 
> Any write access to pdev->domain_list should be protected by both
> pcidevs_lock() and d->pci_lock in the write mode.

You also protect calls to vpci_remove_device() with the per-domain
pci_lock it seems, and that will need some explanation as it's not
obvious.

> 
> Suggested-by: Roger Pau Monné 
> Suggested-by: Jan Beulich 
> Signed-off-by: Volodymyr Babchuk 
> 
> ---
> 
> Changes in v8:
>  - New patch
> 
> Changes in v8 vs RFC:
>  - Removed all read_locks after discussion with Roger in #xendevel
>  - pci_release_devices() now returns the first error code
>  - extended commit message
>  - added missing lock in pci_remove_device()
>  - extended locked region in pci_add_device() to protect list_del() calls
> ---
>  xen/common/domain.c |  1 +
>  xen/drivers/passthrough/amd/pci_amd_iommu.c |  9 ++-
>  xen/drivers/passthrough/pci.c   | 68 +
>  xen/drivers/passthrough/vtd/iommu.c |  9 ++-
>  xen/include/xen/sched.h |  1 +
>  5 files changed, 74 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index caaa402637..5d8a8836da 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -645,6 +645,7 @@ struct domain *domain_create(domid_t domid,
>  
>  #ifdef CONFIG_HAS_PCI
>  INIT_LIST_HEAD(&d->pdev_list);
> +rwlock_init(&d->pci_lock);
>  #endif
>  
>  /* All error paths can depend on the above setup. */
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 94e3775506..e2f2e2e950 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -476,8 +476,13 @@ static int cf_check reassign_device(
>  
>  if ( devfn == pdev->devfn && pdev->domain != target )
>  {
> -list_move(&pdev->domain_list, &target->pdev_list);
> -pdev->domain = target;

You seem to have inadvertently dropped the above line? (and so devices
would keep the previous pdev->domain value)

> +write_lock(&pdev->domain->pci_lock);
> +list_del(&pdev->domain_list);
> +write_unlock(&pdev->domain->pci_lock);
> +
> +write_lock(&target->pci_lock);
> +list_add(&pdev->domain_list, &target->pdev_list);
> +write_unlock(&target->pci_lock);
>  }
>  
>  /*
> diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
> index 95846e84f2..5b4632ead2 100644
> --- a/xen/drivers/passthrough/pci.c
> +++ b/xen/drivers/passthrough/pci.c
> @@ -454,7 +454,9 @@ static void __init _pci_hide_device(struct pci_dev *pdev)
>  if ( pdev->domain )
>  return;
>  pdev->domain = dom_xen;
> +write_lock(&dom_xen->pci_lock);
>  list_add(&pdev->domain_list, &dom_xen->pdev_list);
> +write_unlock(&dom_xen->pci_lock);
>  }
>  
>  int __init pci_hide_device(unsigned int seg, unsigned int bus,
> @@ -747,6 +749,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  ret = 0;
>  if ( !pdev->domain )
>  {
> +write_lock(&hardware_domain->pci_lock);
>  pdev->domain = hardware_domain;
>  list_add(&pdev->domain_list, &hardware_domain->pdev_list);
>  
> @@ -760,6 +763,7 @@ int pci_add_device(u16 seg, u8 bus, u8 devfn,
>  printk(XENLOG_ERR "Setup of vPCI failed: %d\n", ret);
>  list_del(&pdev->domain_list);
>  pdev->domain = NULL;
> +write_unlock(&hardware_domain->pci_lock);

Strictly speaking, this could move one line earlier, as accesses to
pdev->domain are not protected by the d->pci_lock?  Same in other
instances (above and below), as you seem to introduce a pattern to
perform accesses to pdev->domain with the rwlock taken.

>  goto out;
>  }
>  ret = iommu_add_device(pdev);
> @@ -768,8 +772,10 @@ 

Re: [PATCH] xen: privcmd: Add support for irqfd

2023-07-20 Thread Viresh Kumar
On 13-07-23, 14:40, Oleksandr Tyshchenko wrote:
> Viresh, great work!

Thanks Oleksandr.

> Do you perhaps have a corresponding user-space (virtio backend) example
> adapted for that feature (I would like to take a look at it if possible)?

This is taken care of by the xen-vhost-frontend Rust crate in our case
(which was initially designed based on virtio-disk but has deviated a
lot from it now). Here is the commit of interest. The backends remain
unmodified though.

https://github.com/vireshk/xen-vhost-frontend/commit/d79c419f14c1f54240b3147c342894998c274364

I have also updated the commit with the CONFIG_ARM64 change.

-- 
viresh



[PATCH V2 2/2] xen: privcmd: Add support for irqfd

2023-07-20 Thread Viresh Kumar
Xen provides support for injecting interrupts to the guests via the
HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based
device backend implementations, in an inefficient manner currently.

Generally, the Virtio backends are implemented to work with the Eventfd
based mechanism. In order to make such backends work with Xen, another
software layer needs to poll the Eventfds and raise an interrupt to the
guest using the Xen based mechanism. This results in an extra context
switch.

This is not a new problem in Linux though. It is present with other
hypervisors like KVM, etc. as well. The generic solution implemented in
the kernel for them is to provide an IOCTL call to pass the interrupt
details and eventfd, which lets the kernel take care of polling the
eventfd and raising of the interrupt, instead of handling this in user
space (which involves an extra context switch).
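
For illustration, the user-space relay layer described above could look roughly like the following. This is only a minimal sketch using plain eventfd(2) and poll(2); `inject_fn`, `count_injection()` and `relay_eventfd_once()` are hypothetical names, and the callback merely stands in for whatever hypercall wrapper a real backend would use to raise the interrupt.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical callback standing in for "raise the interrupt in the guest". */
typedef void (*inject_fn)(void *opaque);

/* Example callback: just counts how many injections were requested. */
static void count_injection(void *opaque)
{
    ++*(int *)opaque;
}

/*
 * Wait up to timeout_ms for the eventfd to be signalled. On a signal, drain
 * the counter and call inject() once.
 * Returns 1 if an injection happened, 0 on timeout, -1 on error.
 */
static int relay_eventfd_once(int efd, int timeout_ms,
                              inject_fn inject, void *opaque)
{
    struct pollfd pfd = { .fd = efd, .events = POLLIN };
    uint64_t cnt;
    int rc = poll(&pfd, 1, timeout_ms);

    if (rc <= 0)
        return rc;              /* 0: timeout, -1: poll() failed */
    if (!(pfd.revents & POLLIN))
        return -1;
    /* Reading resets the eventfd counter to zero. */
    if (read(efd, &cnt, sizeof(cnt)) != (ssize_t)sizeof(cnt))
        return -1;

    inject(opaque);
    return 1;
}
```

Every signalled event costs a wakeup of this relay before the interrupt can be raised, which is the extra context switch that in-kernel handling avoids.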

This patch adds support for injecting a specific interrupt into the guest
using the eventfd mechanism, avoiding the extra context switch.

Inspired by existing implementations for KVM, etc.

Signed-off-by: Viresh Kumar 
---
V1->V2:
- Improve error handling.
- Remove the unnecessary usage of list_for_each_entry_safe().
- Restrict the use of XEN_DMOP_set_irq_level to only ARM64.

 drivers/xen/privcmd.c  | 276 -
 include/uapi/xen/privcmd.h |  14 ++
 2 files changed, 288 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index e2f580e30a86..0debc5482253 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -9,11 +9,16 @@
 
 #define pr_fmt(fmt) "xen:" KBUILD_MODNAME ": " fmt
 
+#include 
+#include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -833,6 +838,257 @@ static long privcmd_ioctl_mmap_resource(struct file *file,
return rc;
 }
 
+/* Irqfd support */
+static struct workqueue_struct *irqfd_cleanup_wq;
+static DEFINE_MUTEX(irqfds_lock);
+static LIST_HEAD(irqfds_list);
+
+struct privcmd_kernel_irqfd {
+   domid_t dom;
+   u8 level;
+   bool error;
+   u32 irq;
+   struct eventfd_ctx *eventfd;
+   struct work_struct shutdown;
+   wait_queue_entry_t wait;
+   struct list_head list;
+   poll_table pt;
+};
+
+static void irqfd_deactivate(struct privcmd_kernel_irqfd *kirqfd)
+{
+   lockdep_assert_held(&irqfds_lock);
+
+   list_del_init(&kirqfd->list);
+   queue_work(irqfd_cleanup_wq, &kirqfd->shutdown);
+}
+
+static void irqfd_shutdown(struct work_struct *work)
+{
+   struct privcmd_kernel_irqfd *kirqfd =
+   container_of(work, struct privcmd_kernel_irqfd, shutdown);
+   u64 cnt;
+
+   eventfd_ctx_remove_wait_queue(kirqfd->eventfd, &kirqfd->wait, &cnt);
+   eventfd_ctx_put(kirqfd->eventfd);
+   kfree(kirqfd);
+}
+
+static void irqfd_inject(struct privcmd_kernel_irqfd *kirqfd)
+{
+   /* Different architectures support this differently */
+   struct xen_dm_op dm_op = {
+#ifdef CONFIG_ARM64
+   .op = XEN_DMOP_set_irq_level,
+   .u.set_irq_level.irq = kirqfd->irq,
+   .u.set_irq_level.level = kirqfd->level,
+#endif
+   };
+   struct xen_dm_op_buf xbufs = {
+   .size = sizeof(dm_op),
+   };
+   u64 cnt;
+   long rc;
+
+   eventfd_ctx_do_read(kirqfd->eventfd, &cnt);
+   set_xen_guest_handle(xbufs.h, &dm_op);
+
+   xen_preemptible_hcall_begin();
+   rc = HYPERVISOR_dm_op(kirqfd->dom, 1, &xbufs);
+   xen_preemptible_hcall_end();
+
+   /* Don't repeat the error message for consecutive failures */
+   if (rc && !kirqfd->error) {
+   pr_err("Failed to configure irq: %d to level: %d for guest domain: %d\n",
+  kirqfd->irq, kirqfd->level, kirqfd->dom);
+   }
+
+   kirqfd->error = !!rc;
+}
+
+static int
+irqfd_wakeup(wait_queue_entry_t *wait, unsigned int mode, int sync, void *key)
+{
+   struct privcmd_kernel_irqfd *kirqfd =
+   container_of(wait, struct privcmd_kernel_irqfd, wait);
+   __poll_t flags = key_to_poll(key);
+
+   if (flags & EPOLLIN)
+   irqfd_inject(kirqfd);
+
+   if (flags & EPOLLHUP) {
+   mutex_lock(&irqfds_lock);
+   irqfd_deactivate(kirqfd);
+   mutex_unlock(&irqfds_lock);
+   }
+
+   return 0;
+}
+
+static void
+irqfd_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt)
+{
+   struct privcmd_kernel_irqfd *kirqfd =
+   container_of(pt, struct privcmd_kernel_irqfd, pt);
+
+   add_wait_queue_priority(wqh, &kirqfd->wait);
+}
+
+static int privcmd_irqfd_assign(struct privcmd_irqfd *irqfd)
+{
+   struct privcmd_kernel_irqfd *kirqfd, *tmp;
+   struct eventfd_ctx *eventfd;
+   __poll_t events;
+   struct fd f;
+   int ret;
+
+   kirqfd = kzalloc(sizeof(*kirqfd), GFP_KERNEL);
+   if (!kirqfd)
+   return -ENOMEM;
+
+   kirqfd->irq = irqfd->irq;
+   

[PATCH V2 1/2] xen: Update dm_op.h from Xen public header

2023-07-20 Thread Viresh Kumar
Update the definitions in dm_op.h from Xen public header.

Signed-off-by: Viresh Kumar 
---
V1->V2:
- New commit.

 include/xen/interface/hvm/dm_op.h | 445 ++
 1 file changed, 445 insertions(+)

diff --git a/include/xen/interface/hvm/dm_op.h b/include/xen/interface/hvm/dm_op.h
index 08d972f87c7b..bc6948fd1815 100644
--- a/include/xen/interface/hvm/dm_op.h
+++ b/include/xen/interface/hvm/dm_op.h
@@ -6,6 +6,451 @@
 #ifndef __XEN_PUBLIC_HVM_DM_OP_H__
 #define __XEN_PUBLIC_HVM_DM_OP_H__
 
+#include 
+#include 
+
+#ifndef uint64_aligned_t
+#define uint64_aligned_t uint64_t
+#endif
+
+/*
+ * IOREQ Servers
+ *
+ * The interface between an I/O emulator and Xen is called an IOREQ Server.
+ * A domain supports a single 'legacy' IOREQ Server which is instantiated if
+ * parameter...
+ *
+ * HVM_PARAM_IOREQ_PFN is read (to get the gfn containing the synchronous
+ * ioreq structures), or...
+ * HVM_PARAM_BUFIOREQ_PFN is read (to get the gfn containing the buffered
+ * ioreq ring), or...
+ * HVM_PARAM_BUFIOREQ_EVTCHN is read (to get the event channel that Xen uses
+ * to request buffered I/O emulation).
+ *
+ * The following hypercalls facilitate the creation of IOREQ Servers for
+ * 'secondary' emulators which are invoked to implement port I/O, memory, or
+ * PCI config space ranges which they explicitly register.
+ */
+
+typedef uint16_t ioservid_t;
+
+/*
+ * XEN_DMOP_create_ioreq_server: Instantiate a new IOREQ Server for a
+ *   secondary emulator.
+ *
+ * The <id> handed back is unique for target domain. The value of
+ * <handle_bufioreq> should be one of HVM_IOREQSRV_BUFIOREQ_* defined in
+ * hvm_op.h. If the value is HVM_IOREQSRV_BUFIOREQ_OFF then the buffered
+ * ioreq ring will not be allocated and hence all emulation requests to
+ * this server will be synchronous.
+ */
+#define XEN_DMOP_create_ioreq_server 1
+
+struct xen_dm_op_create_ioreq_server {
+/* IN - should server handle buffered ioreqs */
+uint8_t handle_bufioreq;
+uint8_t pad[3];
+/* OUT - server id */
+ioservid_t id;
+};
+
+/*
+ * XEN_DMOP_get_ioreq_server_info: Get all the information necessary to
+ * access IOREQ Server <id>.
+ *
+ * If the IOREQ Server is handling buffered emulation requests, the
+ * emulator needs to bind to event channel <bufioreq_port> to listen for
+ * them. (The event channels used for synchronous emulation requests are
+ * specified in the per-CPU ioreq structures).
+ * In addition, if the XENMEM_acquire_resource memory op cannot be used,
+ * the emulator will need to map the synchronous ioreq structures and
+ * buffered ioreq ring (if it exists) from guest memory. If <flags> does
+ * not contain XEN_DMOP_no_gfns then these pages will be made available and
+ * the frame numbers passed back in gfns <ioreq_gfn> and <bufioreq_gfn>
+ * respectively. (If the IOREQ Server is not handling buffered emulation
+ * only <ioreq_gfn> will be valid).
+ *
+ * NOTE: To access the synchronous ioreq structures and buffered ioreq
+ *   ring, it is preferable to use the XENMEM_acquire_resource memory
+ *   op specifying resource type XENMEM_resource_ioreq_server.
+ */
+#define XEN_DMOP_get_ioreq_server_info 2
+
+struct xen_dm_op_get_ioreq_server_info {
+/* IN - server id */
+ioservid_t id;
+/* IN - flags */
+uint16_t flags;
+
+#define _XEN_DMOP_no_gfns 0
+#define XEN_DMOP_no_gfns (1u << _XEN_DMOP_no_gfns)
+
+/* OUT - buffered ioreq port */
+evtchn_port_t bufioreq_port;
+/* OUT - sync ioreq gfn (see block comment above) */
+uint64_aligned_t ioreq_gfn;
+/* OUT - buffered ioreq gfn (see block comment above)*/
+uint64_aligned_t bufioreq_gfn;
+};
+
+/*
+ * XEN_DMOP_map_io_range_to_ioreq_server: Register an I/O range for
+ *                                        emulation by the client of
+ *                                        IOREQ Server <id>.
+ * XEN_DMOP_unmap_io_range_from_ioreq_server: Deregister an I/O range
+ *                                            previously registered for
+ *                                            emulation by the client of
+ *                                            IOREQ Server <id>.
+ *
+ * There are three types of I/O that can be emulated: port I/O, memory
+ * accesses and PCI config space accesses. The <type> field denotes which
+ * type of range the <start> and <end> (inclusive) fields are specifying.
+ * PCI config space ranges are specified by segment/bus/device/function
+ * values which should be encoded using the DMOP_PCI_SBDF helper macro
+ * below.
+ *
+ * NOTE: unless an emulation request falls entirely within a range mapped
+ * by a secondary emulator, it will not be passed to that emulator.
+ */
+#define XEN_DMOP_map_io_range_to_ioreq_server 3
+#define XEN_DMOP_unmap_io_range_from_ioreq_server 4
+
+struct xen_dm_op_ioreq_server_range {
+/* IN - server id */
+ioservid_t id;
+uint16_t pad;
+/* IN - type of range */
+uint32_t type;
+# define XEN_DMOP_IO_RANGE_PORT   0 /* I/O port range */
+# define XEN_DMOP_IO_RANGE_MEMORY 

Re: [PATCH v2 1/3] xen/arm: pci: introduce PCI_PASSTHROUGH Kconfig option

2023-07-20 Thread Julien Grall

Hi,

On 18/07/2023 18:35, Stewart Hildebrand wrote:

On 7/13/23 14:40, Julien Grall wrote:

Hi Stewart,

On 07/07/2023 02:47, Stewart Hildebrand wrote:

From: Rahul Singh 

Setting CONFIG_PCI_PASSTHROUGH=y will enable PCI passthrough on ARM, even though
the feature is not yet complete in the current upstream codebase. The purpose of
this is to make it easier to enable the necessary configs (HAS_PCI, HAS_VPCI) for
testing and development of PCI passthrough on ARM.

Since PCI passthrough on ARM is still work in progress at this time, make it
depend on EXPERT.


While preparing the patch for committing, I noticed that HAS_PASSTHROUGH
will now allow the user to select one of the IOMMU quarantine options.

There are three of them right now:
   1. none
   2. basic (i.e. faulting)
   3. scratch page

The latter is unlikely to work on Arm because we don't set up the scratch
page. AFAIU, for that, we would need to implement the callback
quarantine_init().

I would expect 1 and 2 to work. That said, I think 1 would behave like 2
because on Arm the device should not be automatically re-assigned to
dom0. I know this is correct for platform devices, but will it be valid
for PCI as well?


In a system with dom0 where the guest is created from the xl toolstack, we rely on
"xl pci-assignable-add". Upon domain destruction, the device automatically gets
assigned to domIO.


Ok. To clarify, does this mean any DMA will fault, the same as for
platform devices?



However, there's nothing preventing a user from attempting to invoke
"xl pci-assignable-remove", which should assign the device back to dom0, but it is
not automatic.


I don't think we want to fully prevent a user from re-assigning a device
to dom0. But we at least want to avoid re-assigning the device to dom0 by
default. After that, a user can reset the device before it gets
re-assigned to dom0.


Cheers,

--
Julien Grall



Re: [PATCH v3 3/3] xen/riscv: introduce identity mapping

2023-07-20 Thread Oleksii
On Thu, 2023-07-20 at 07:58 +0200, Jan Beulich wrote:
> On 19.07.2023 18:35, Oleksii wrote:
> > On Tue, 2023-07-18 at 17:03 +0200, Jan Beulich wrote:
> > > > +    unsigned long load_end = LINK_TO_LOAD(_end);
> > > > +    unsigned long pt_level_size = XEN_PT_LEVEL_SIZE(i
> > > > -
> > > > 1);
> > > > +    unsigned long xen_size = ROUNDUP(load_end -
> > > > load_start, pt_level_size);
> > > > +    unsigned long page_entries_num = xen_size /
> > > > pt_level_size;
> > > > +
> > > > +    while ( page_entries_num-- )
> > > > +    pgtbl[index++].pte = 0;
> > > > +
> > > > +    break;
> > > 
> > > Unless there's a "not crossing a 2Mb boundary" guarantee
> > > somewhere
> > > that I've missed, this "break" is still too early afaict.
> > If I will add a '2 MB boundary check' for load_start and
> > linker_start
> > could it be an upstreamable solution?
> > 
> > Something like:
> >     if ( !IS_ALIGNED(load_start, MB(2)) )
> >         printk("load_start should be 2Mb aligned\n");
> > and
> >     ASSERT(IS_ALIGNED(XEN_VIRT_START, MB(2)))
> > in xen.lds.S.
> 
> Arranging for the linked address to be 2Mb-aligned is certainly
> reasonable. Whether expecting the load address to also be depends
> on whether that can be arranged for (which in turn depends on boot
> loader behavior); it cannot be left to "luck".
Maybe I didn't quite understand you here, but if Xen has an alignment
check on the load address then the boot loader has to follow Xen's
alignment requirements. So it doesn't look like 'luck'.
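
To make the proposed check concrete, here is a minimal stand-alone sketch. MB() and IS_ALIGNED() are named in the discussion above, but the definitions below are my own assumptions for illustration, and load_address_ok() is a hypothetical helper, not code from the series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed definitions; the in-tree macros may be spelled differently. */
#define MB(x)                   ((uint64_t)(x) << 20)
/* True if addr is a multiple of align; align must be a power of two. */
#define IS_ALIGNED(addr, align) (((addr) & ((align) - 1)) == 0)

/* Hypothetical early-boot check: accept only 2Mb-aligned load addresses. */
static bool load_address_ok(uint64_t load_start)
{
    return IS_ALIGNED(load_start, MB(2));
}
```

A boot path could then print a message and stop when load_address_ok() returns false, which turns the alignment into an explicit requirement on the boot loader.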

> 
> > Then we will have completely different L0 tables for identity
> > mapping
> > and not identity and the code above will be correct.
> 
> As long as Xen won't grow beyond 2Mb total. Considering that at
> some point you may want to use large page mappings for .text,
> .data, and .rodata, that alone would grow Xen to 6 Mb (or really 8,
> assuming .init goes separate as well). That's leaving aside the
> realistic option of the mere sum of all sections being larger than
> 2. That said, even Arm64 with ACPI is still quite a bit below 2Mb.
> x86 is nearing 2.5 though in even a somewhat limited config;
> allyesconfig may well be beyond that already.
I am missing something about Xen's size. Let's assume that Xen is
mapped using only 4k pages (as is done now). If Xen then grows beyond
2Mb, the only thing that changes is the number of page tables, so it is
only a question of changing PGTBL_INITIAL_COUNT (in the case of
RISC-V).

Could you please explain why Xen would grow to 6/8 MB with larger page
mappings? With larger page mappings fewer tables are needed. For
example, if we would like to use 2Mb pages then we stop at the L1 page
table and write a physical address into the L1 page table entry instead
of creating a new L0 page table.
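
The arithmetic behind this can be sketched as below. It assumes 512 entries per table (4k tables with 8-byte entries, as in Sv39) and a mapping that starts on a table boundary; leaf_entries() and leaf_tables() are hypothetical helpers for illustration only.

```c
#include <stdint.h>

#define KB(x)         ((uint64_t)(x) << 10)
#define MB(x)         ((uint64_t)(x) << 20)
#define ROUNDUP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Number of leaf entries of size page_size needed to map size bytes. */
static uint64_t leaf_entries(uint64_t size, uint64_t page_size)
{
    return ROUNDUP(size, page_size) / page_size;
}

/* Number of last-level tables, each holding entries_per_table entries. */
static uint64_t leaf_tables(uint64_t size, uint64_t page_size,
                            uint64_t entries_per_table)
{
    uint64_t entries = leaf_entries(size, page_size);

    return (entries + entries_per_table - 1) / entries_per_table;
}
```

Under these assumptions a 3Mb image mapped with 4k pages needs 768 entries spread over two L0 tables, while the same image mapped with 2Mb pages needs only two L1 entries and no L0 table at all.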

> 
> Of course you may legitimately leave dealing with that to the
> future.
Then I'll send a new patch series with updated alignment requirements.

~ Oleksii



[PATCH v3 6/6] libxl: add support for parsing MSR features

2023-07-20 Thread Roger Pau Monne
Introduce support for handling MSR features in
libxl_cpuid_parse_config().  The MSR policies are added to the
libxl_cpuid_policy like the CPUID one, which gets passed to
xc_cpuid_apply_policy().

This allows existing users of libxl to provide MSR related features as
key=value pairs to libxl_cpuid_parse_config() without requiring the
usage of a different API.
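
As a rough illustration of what the per-bit handling amounts to, the stand-alone sketch below keeps one character per MSR bit in a 64-character string, bit 63 first. msr_policy_init() and msr_policy_set() are hypothetical names; only the accepted characters, the 's' to 'k' translation and the error value 3 are taken from the patch.

```c
#include <string.h>

#define MSR_POLICY_BITS 64

/* Fill the buffer with 64 'x' (default) characters plus a terminator. */
static void msr_policy_init(char policy[MSR_POLICY_BITS + 1])
{
    memset(policy, 'x', MSR_POLICY_BITS);
    policy[MSR_POLICY_BITS] = '\0';
}

/*
 * Apply a single-character value to one bit of the policy string.
 * Bit 63 is stored first, bit 0 last. Returns 0 on success and 3 on an
 * invalid value, mirroring the error code used in the patch.
 */
static int msr_policy_set(char policy[MSR_POLICY_BITS + 1], unsigned int bit,
                          const char *val)
{
    if (bit >= MSR_POLICY_BITS || strlen(val) != 1)
        return 3;

    switch (val[0]) {
    case '0': case '1': case 'x': case 'k':
        policy[MSR_POLICY_BITS - 1 - bit] = val[0];
        return 0;

    case 's':
        /* The deprecated 's' is stored as 'k'. */
        policy[MSR_POLICY_BITS - 1 - bit] = 'k';
        return 0;

    default:
        return 3;
    }
}
```

The range check on bit is an addition of this sketch; the patch derives the bit from the feature table instead.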

Signed-off-by: Roger Pau Monné 
Acked-by: Anthony PERARD 
---
Changes since v2:
 - Add some braces.
---
 tools/libs/light/libxl_cpuid.c | 64 +-
 1 file changed, 63 insertions(+), 1 deletion(-)

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index f8b2e45ee681..15fac03a9046 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -157,6 +157,60 @@ static int cpuid_add(libxl_cpuid_policy_list *policy,
 return 0;
 }
 
+static struct xc_msr *msr_find_match(libxl_cpuid_policy_list *pl, uint32_t index)
+{
+unsigned int i = 0;
+libxl_cpuid_policy_list policy = *pl;
+
+if (policy == NULL)
+policy = *pl = calloc(1, sizeof(*policy));
+
+if (policy->msr != NULL) {
+for (i = 0; policy->msr[i].index != XC_MSR_INPUT_UNUSED; i++) {
+if (policy->msr[i].index == index) {
+return &policy->msr[i];
+}
+}
+}
+
+policy->msr = realloc(policy->msr, sizeof(struct xc_msr) * (i + 2));
+policy->msr[i].index = index;
+memset(policy->msr[i].policy, 'x', ARRAY_SIZE(policy->msr[0].policy) - 1);
+policy->msr[i].policy[ARRAY_SIZE(policy->msr[0].policy) - 1] = '\0';
+policy->msr[i + 1].index = XC_MSR_INPUT_UNUSED;
+
+return &policy->msr[i];
+}
+
+static int msr_add(libxl_cpuid_policy_list *policy, uint32_t index, unsigned int bit,
+   const char *val)
+{
+struct xc_msr *entry = msr_find_match(policy, index);
+
+/* Only allow options taking a character for MSRs, no values allowed. */
+if (strlen(val) != 1)
+return 3;
+
+switch (val[0]) {
+case '0':
+case '1':
+case 'x':
+case 'k':
+entry->policy[63 - bit] = val[0];
+break;
+
+case 's':
+/* Translate s -> k as xc_msr doesn't support the deprecated 's'. */
+entry->policy[63 - bit] = 'k';
+break;
+
+default:
+return 3;
+}
+
+return 0;
+}
+
 struct feature_name {
 const char *name;
 unsigned int bit;
@@ -336,7 +390,15 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *policy, const char* str)
 }
 
 case FEAT_MSR:
-return 2;
+{
+unsigned int bit = feat->bit % 32;
+
+if (feature_to_policy[feat->bit / 32].msr.reg == CPUID_REG_EDX)
+bit += 32;
+
+return msr_add(policy, feature_to_policy[feat->bit / 32].msr.index,
+   bit, val);
+}
 }
 
 return 2;
-- 
2.41.0




[PATCH v3 5/6] libxl: use the cpuid feature names from cpufeatureset.h

2023-07-20 Thread Roger Pau Monne
The current implementation in libxl_cpuid_parse_config() requires
keeping a list of cpuid feature bits that should be mostly in sync
with the contents of cpufeatureset.h.

Avoid such duplication by using the automatically generated list of
cpuid features in INIT_FEATURE_NAMES in order to map feature names to
featureset bits, and then translate from featureset bits into cpuid
leaf, subleaf, register tuple.
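
The lookup described here can be sketched in isolation as follows. The three-entry table is purely illustrative (the real list is generated from cpufeatureset.h), and lookup_feature(), feature_word() and feature_bit_in_word() are hypothetical helpers; struct feature_name and search_feature() follow the shape used in the patch.

```c
#include <stdlib.h>
#include <string.h>

struct feature_name {
    const char *name;
    unsigned int bit;
};

/* Illustrative table only; it must be sorted by name for bsearch(). */
static const struct feature_name features[] = {
    { "avx",  1 * 32 + 28 },
    { "fpu",  0 * 32 +  0 },
    { "sse3", 1 * 32 +  0 },
};

static int search_feature(const void *a, const void *b)
{
    const char *key = a;
    const char *feat = ((const struct feature_name *)b)->name;

    return strcmp(key, feat);
}

/* Returns the table entry for name, or NULL if the name is unknown. */
static const struct feature_name *lookup_feature(const char *name)
{
    return bsearch(name, features, sizeof(features) / sizeof(features[0]),
                   sizeof(features[0]), search_feature);
}

/* Split a featureset bit into its 32-bit word and the bit within it. */
static unsigned int feature_word(unsigned int bit)
{
    return bit / 32;
}

static unsigned int feature_bit_in_word(unsigned int bit)
{
    return bit % 32;
}
```

The word index would then select the leaf, subleaf and register tuple from a per-word table, and the remainder gives the bit position inside that register.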

Note that the full contents of the previous cpuid translation table
can't be removed.  That's because some feature names allowed by libxl
are not described in the featuresets, or because naming has diverged
and the previous nomenclature is preserved for compatibility reasons.

Should result in no functional change observed by callers, albeit some
new cpuid features will be available as a result of the change.

While there, constify the cpuid_flags name field.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Anthony PERARD 
---
Changes since v1:
 - const unnamed structure cast.
 - Declare struct feature_name outside the function.
 - Use strcmp.
 - Fix indentation.
 - Add back missing feature name options.
 - Return ERROR_NOMEM if allocation fails.
 - Improve xl.cfg documentation about how to reference the features
   described in the public header.
---
 docs/man/xl.cfg.5.pod.in   |  24 +--
 tools/libs/light/libxl_cpuid.c | 267 -
 tools/xl/xl_parse.c|   3 +
 3 files changed, 107 insertions(+), 187 deletions(-)

diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
index 3979be2a590a..55161856f4c7 100644
--- a/docs/man/xl.cfg.5.pod.in
+++ b/docs/man/xl.cfg.5.pod.in
@@ -2010,24 +2010,16 @@ proccount procpkg stepping
 
 =back
 
-List of keys taking a character:
+List of keys taking a character can be found in the public header file
+L<https://xenbits.xen.org/docs/unstable/hypercall/x86_64/include,public,arch-x86,cpufeatureset.h.html>
 
-=over 4
-
-3dnow 3dnowext 3dnowprefetch abm acpi adx aes altmovcr8 apic arat avx avx2
-avx512-4fmaps avx512-4vnniw avx512bw avx512cd avx512dq avx512er avx512f
-avx512ifma avx512pf avx512vbmi avx512vl bmi1 bmi2 clflushopt clfsh clwb cmov
-cmplegacy cmpxchg16 cmpxchg8 cmt cntxid dca de ds dscpl dtes64 erms est extapic
-f16c ffxsr fma fma4 fpu fsgsbase fxsr hle htt hypervisor ia64 ibs invpcid
-invtsc lahfsahf lm lwp mca mce misalignsse mmx mmxext monitor movbe mpx msr
-mtrr nodeid nx ospke osvw osxsave pae page1gb pat pbe pcid pclmulqdq pdcm
-perfctr_core perfctr_nb pge pku popcnt pse pse36 psn rdrand rdseed rdtscp rtm
-sha skinit smap smep smx ss sse sse2 sse3 sse4.1 sse4.2 sse4_1 sse4_2 sse4a
-ssse3 svm svm_decode svm_lbrv svm_npt svm_nrips svm_pausefilt svm_tscrate
-svm_vmcbclean syscall sysenter tbm tm tm2 topoext tsc tsc-deadline tsc_adjust
-umip vme vmx wdt x2apic xop xsave xtpr
+The feature names described in C should be specified in all
+lowercase letters, and with underscores converted to hyphens.  For example in
+order to reference feature C the string C should be used.
 
-=back
+Note that C is described as an option that takes a value, and that
+takes precedence over the C flag in C.  The feature
+flag must be referenced as C.
 
 =back
 
diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index c62247f9bda7..f8b2e45ee681 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -14,6 +14,8 @@
 
 #include "libxl_internal.h"
 
+#include 
+
 int libxl__cpuid_policy_is_empty(libxl_cpuid_policy_list *pl)
 {
 return !*pl || (!libxl_cpuid_policy_list_length(pl) && !(*pl)->msr);
@@ -60,7 +62,7 @@ void libxl_cpuid_dispose(libxl_cpuid_policy_list *pl)
  * Used for the static structure describing all features.
  */
 struct cpuid_flags {
-char* name;
+const char *name;
 uint32_t leaf;
 uint32_t subleaf;
 int reg;
@@ -153,7 +155,19 @@ static int cpuid_add(libxl_cpuid_policy_list *policy,
 entry->policy[flag->reg - 1] = resstr;
 
 return 0;
+}
+
+struct feature_name {
+const char *name;
+unsigned int bit;
+};
+
+static int search_feature(const void *a, const void *b)
+{
+const char *key = a;
+const char *feat = ((const struct feature_name *)b)->name;
 
+return strcmp(key, feat);
 }
 
 /* parse a single key=value pair and translate it into the libxc
@@ -176,208 +190,42 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *policy, const char* str)
 {"proccount",0x0001, NA, CPUID_REG_EBX, 16,  8},
 {"localapicid",  0x0001, NA, CPUID_REG_EBX, 24,  8},
 
-{"sse3", 0x0001, NA, CPUID_REG_ECX,  0,  1},
-{"pclmulqdq",0x0001, NA, CPUID_REG_ECX,  1,  1},
-{"dtes64",   0x0001, NA, CPUID_REG_ECX,  2,  1},
-{"monitor",  0x0001, NA, CPUID_REG_ECX,  3,  1},
-{"dscpl",0x0001, NA, CPUID_REG_ECX,  4,  1},
-{"vmx",  0x0001, NA, CPUID_REG_ECX,  5,  1},
-{"smx",  0x0001, NA, CPUID_REG_ECX,  6,  

[PATCH v3 4/6] libxl: split logic to parse user provided CPUID features

2023-07-20 Thread Roger Pau Monne
Move the CPUID value parsers out of libxl_cpuid_parse_config() into a
newly created cpuid_add() local helper.  This is in preparation for
also adding MSR feature parsing support.
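
For readers unfamiliar with the string format the helper operates on, here is a small stand-alone sketch of the two core steps: rendering a number as '0'/'1' characters and overlaying it onto a 32-character register string in which bit 31 comes first. bits_to_string() and place_field() are hypothetical names; the real cpuid_add() additionally handles the 'x'/'k'/'s' values and the split family/model fields.

```c
#include <string.h>

/*
 * Write the low `length` bits of num as '0'/'1' characters, most
 * significant bit first. out must have room for length + 1 bytes.
 */
static void bits_to_string(unsigned long num, unsigned int length, char *out)
{
    unsigned int i;

    out[length] = '\0';
    for (i = 0; i < length; i++)
        out[length - 1 - i] = "01"[(num >> i) & 1];
}

/*
 * Overlay a field of `length` characters at bit offset `bit` in a
 * 32-character register string (index 0 holds bit 31).
 */
static void place_field(char reg[33], unsigned int bit, unsigned int length,
                        const char *field)
{
    memcpy(reg + (32 - length) - bit, field, length);
}
```

For example, placing the 4-bit value 5 at bit offset 16 rewrites characters 12 to 15 of the string to "0101" and leaves the rest untouched.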

No functional change intended.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Anthony PERARD 
---
 tools/libs/light/libxl_cpuid.c | 120 +
 1 file changed, 63 insertions(+), 57 deletions(-)

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index 68b797886642..c62247f9bda7 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -96,6 +96,66 @@ static struct xc_xend_cpuid *cpuid_find_match(libxl_cpuid_policy_list *pl,
 return *list + i;
 }
 
+static int cpuid_add(libxl_cpuid_policy_list *policy,
+ const struct cpuid_flags *flag, const char *val)
+{
+struct xc_xend_cpuid *entry = cpuid_find_match(policy, flag->leaf,
+   flag->subleaf);
+unsigned long num;
+char flags[33], *resstr, *endptr;
+unsigned int i;
+
+resstr = entry->policy[flag->reg - 1];
+num = strtoull(val, &endptr, 0);
+flags[flag->length] = 0;
+if (endptr != val) {
+/* if this was a valid number, write the binary form into the string */
+for (i = 0; i < flag->length; i++) {
+flags[flag->length - 1 - i] = "01"[!!(num & (1 << i))];
+}
+} else {
+switch(val[0]) {
+case 'x': case 'k': case 's':
+memset(flags, val[0], flag->length);
+break;
+default:
+return 3;
+}
+}
+
+if (resstr == NULL) {
+resstr = strdup("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");
+}
+
+/* the family and model entry is potentially split up across
+ * two fields in Fn_0001_EAX, so handle them here separately.
+ */
+if (!strcmp(flag->name, "family")) {
+if (num < 16) {
+memcpy(resstr + (32 - 4) - flag->bit, flags + 4, 4);
+memcpy(resstr + (32 - 8) - 20, "00000000", 8);
+} else {
+num -= 15;
+memcpy(resstr + (32 - 4) - flag->bit, "1111", 4);
+for (i = 0; i < 7; i++) {
+flags[7 - i] = "01"[num & 1];
+num >>= 1;
+}
+memcpy(resstr + (32 - 8) - 20, flags, 8);
+}
+} else if (!strcmp(flag->name, "model")) {
+memcpy(resstr + (32 - 4) - 16, flags, 4);
+memcpy(resstr + (32 - 4) - flag->bit, flags + 4, 4);
+} else {
+memcpy(resstr + (32 - flag->length) - flag->bit, flags,
+   flag->length);
+}
+entry->policy[flag->reg - 1] = resstr;
+
+return 0;
+
+}
+
 /* parse a single key=value pair and translate it into the libxc
  * used interface using 32-characters strings for each register.
  * Will overwrite earlier entries and thus can be called multiple
@@ -340,12 +400,8 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *policy, const char* str)
 {NULL, 0, NA, CPUID_REG_INV, 0, 0}
 };
 #undef NA
-char *sep, *val, *endptr;
-int i;
+const char *sep, *val;
 const struct cpuid_flags *flag;
-struct xc_xend_cpuid *entry;
-unsigned long num;
-char flags[33], *resstr;
 
 sep = strchr(str, '=');
 if (sep == NULL) {
@@ -355,60 +411,10 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *policy, const char* str)
 }
 for (flag = cpuid_flags; flag->name != NULL; flag++) {
 if(!strncmp(str, flag->name, sep - str) && flag->name[sep - str] == 0)
-break;
-}
-if (flag->name == NULL) {
-return 2;
-}
-entry = cpuid_find_match(policy, flag->leaf, flag->subleaf);
-resstr = entry->policy[flag->reg - 1];
-num = strtoull(val, &endptr, 0);
-flags[flag->length] = 0;
-if (endptr != val) {
-/* if this was a valid number, write the binary form into the string */
-for (i = 0; i < flag->length; i++) {
-flags[flag->length - 1 - i] = "01"[!!(num & (1 << i))];
-}
-} else {
-switch(val[0]) {
-case 'x': case 'k': case 's':
-memset(flags, val[0], flag->length);
-break;
-default:
-return 3;
-}
-}
-
-if (resstr == NULL) {
-resstr = strdup("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx");
+return cpuid_add(policy, flag, val);
 }
 
-/* the family and model entry is potentially split up across
- * two fields in Fn_0001_EAX, so handle them here separately.
- */
-if (!strncmp(str, "family", sep - str)) {
-if (num < 16) {
-memcpy(resstr + (32 - 4) - flag->bit, flags + 4, 4);
-memcpy(resstr + (32 - 8) - 20, "00000000", 8);
-} else {
-num -= 15;
-memcpy(resstr + (32 - 4) - flag->bit, "1111", 4);
-for (i = 0; i < 7; i++) {
-flags[7 - i] = "01"[num & 1];
-num >>= 1;
-}
-  

[PATCH v3 3/6] libxl: introduce MSR data in libxl_cpuid_policy

2023-07-20 Thread Roger Pau Monne
Add a new array field to libxl_cpuid_policy in order to store the MSR
policies.

Adding the MSR data to the libxl_cpuid_policy_list type is done so
that existing users can seamlessly pass MSR features as part of the
CPUID data, without requiring the introduction of a separate
domain_build_info field and a new set of handler functions.
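
To show the kind of array this adds, below is a minimal sketch of a sentinel-terminated list with find-or-append semantics. struct msr_entry and MSR_UNUSED are hypothetical stand-ins for the libxc type and its terminator value, and unlike a quick prototype the helper checks the realloc() result.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the libxc entry type and its terminator. */
#define MSR_UNUSED 0xffffffffu

struct msr_entry {
    uint32_t index;
    char policy[65];
};

/* Count the entries before the sentinel; a NULL list is empty. */
static size_t msr_list_length(const struct msr_entry *list)
{
    size_t n = 0;

    if (list)
        while (list[n].index != MSR_UNUSED)
            n++;

    return n;
}

/*
 * Return the entry for index, appending a default one if it is missing.
 * Returns NULL on allocation failure and leaves *list untouched.
 */
static struct msr_entry *msr_find_or_add(struct msr_entry **list,
                                         uint32_t index)
{
    size_t i, n = msr_list_length(*list);
    struct msr_entry *grown;

    for (i = 0; i < n; i++)
        if ((*list)[i].index == index)
            return &(*list)[i];

    /* Room for the new entry plus the terminating sentinel. */
    grown = realloc(*list, sizeof(*grown) * (n + 2));
    if (!grown)
        return NULL;
    *list = grown;

    grown[n].index = index;
    memset(grown[n].policy, 'x', sizeof(grown[n].policy) - 1);
    grown[n].policy[sizeof(grown[n].policy) - 1] = '\0';
    grown[n + 1].index = MSR_UNUSED;

    return &grown[n];
}
```

Because the list is terminated by a sentinel rather than carrying a length, a NULL pointer naturally represents "no MSR policy", which is what lets the dispose path call free() unconditionally.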

Signed-off-by: Roger Pau Monné 
---
Changes since v2:
 - Unconditionally call free().
 - Implement the JSON marshaling functions.
---
It would be nice to rename the json output field to 'cpu_policy'
instead of 'cpuid', so that it looks like:

"cpu_policy": {
    "cpuid": [
        {
            "leaf": 7,
            "subleaf": 0,
            "edx": "xx1x"
        },
        {
            "leaf": 1,
            "ebx": "0001"
        }
    ],
    "msr": [
        {
            "index": 266,
            "policy": "xx1xx1x1"
        }
    ]
},

Sadly I have no idea how to do that, and it can be done in a follow-up
change anyway.
---
 tools/libs/light/libxl_cpuid.c| 142 ++
 tools/libs/light/libxl_internal.h |   1 +
 tools/libs/light/libxl_types.idl  |   2 +-
 3 files changed, 128 insertions(+), 17 deletions(-)

diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index 3c8b2a72c0b8..68b797886642 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -16,7 +16,7 @@
 
 int libxl__cpuid_policy_is_empty(libxl_cpuid_policy_list *pl)
 {
-return !libxl_cpuid_policy_list_length(pl);
+return !*pl || (!libxl_cpuid_policy_list_length(pl) && !(*pl)->msr);
 }
 
 void libxl_cpuid_dispose(libxl_cpuid_policy_list *pl)
@@ -40,6 +40,8 @@ void libxl_cpuid_dispose(libxl_cpuid_policy_list *pl)
 free(policy->cpuid);
 }
 
+free(policy->msr);
+
 free(policy);
 *pl = NULL;
 return;
@@ -516,7 +518,8 @@ int libxl__cpuid_legacy(libxl_ctx *ctx, uint32_t domid, bool restore,
 
 r = xc_cpuid_apply_policy(ctx->xch, domid, restore, NULL, 0,
   pae, itsc, nested_virt,
-  info->cpuid ? info->cpuid->cpuid : NULL, NULL);
+  info->cpuid ? info->cpuid->cpuid : NULL,
+  info->cpuid ? info->cpuid->msr : NULL);
 if (r)
 LOGEVD(ERROR, -r, domid, "Failed to apply CPUID policy");
 
@@ -528,16 +531,22 @@ static const char *input_names[2] = { "leaf", "subleaf" };
 static const char *policy_names[4] = { "eax", "ebx", "ecx", "edx" };
 /*
  * Aiming for:
- * [
- * { 'leaf':'val-eax',
- *   'subleaf': 'val-ecx',
- *   'eax': 'filter',
- *   'ebx': 'filter',
- *   'ecx': 'filter',
- *   'edx': 'filter' },
- * { 'leaf':'val-eax', ..., 'eax': 'filter', ... },
- * ... etc ...
- * ]
+ * {   'cpuid': [
+ *  { 'leaf':'val-eax',
+ *'subleaf': 'val-ecx',
+ *'eax': 'filter',
+ *'ebx': 'filter',
+ *'ecx': 'filter',
+ *'edx': 'filter' },
+ *  { 'leaf':'val-eax', ..., 'eax': 'filter', ... },
+ *  ... etc ...
+ * ],
+ * 'msr': [
+ *{ 'index': 'val-index',
+ *  'policy': 'filter', },
+ *  ... etc ...
+ * ],
+ * }
  */
 
 yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand,
@@ -545,9 +554,16 @@ yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand,
 {
 libxl_cpuid_policy_list policy = *pl;
 struct xc_xend_cpuid *cpuid;
+struct xc_msr *msr;
 yajl_gen_status s;
 int i, j;
 
+s = yajl_gen_map_open(hand);
+if (s != yajl_gen_status_ok) goto out;
+
+s = libxl__yajl_gen_asciiz(hand, "cpuid");
+if (s != yajl_gen_status_ok) goto out;
+
 s = yajl_gen_array_open(hand);
 if (s != yajl_gen_status_ok) goto out;
 
@@ -582,6 +598,39 @@ yajl_gen_status libxl_cpuid_policy_list_gen_json(yajl_gen hand,
 
 empty:
 s = yajl_gen_array_close(hand);
+if (s != yajl_gen_status_ok) goto out;
+
+s = libxl__yajl_gen_asciiz(hand, "msr");
+if (s != yajl_gen_status_ok) goto out;
+
+s = yajl_gen_array_open(hand);
+if (s != yajl_gen_status_ok) goto out;
+
+if (!policy || !policy->msr) goto done;
+msr = policy->msr;
+
+for (i = 0; msr[i].index != XC_MSR_INPUT_UNUSED; i++) {
+s = yajl_gen_map_open(hand);
+if (s != yajl_gen_status_ok) goto out;
+
+s = libxl__yajl_gen_asciiz(hand, "index");
+if (s != yajl_gen_status_ok) goto out;
+s = yajl_gen_integer(hand, msr[i].index);
+if (s != yajl_gen_status_ok) goto out;
+s = libxl__yajl_gen_asciiz(hand, "policy");
+if (s != yajl_gen_status_ok) goto out;
+s = yajl_gen_string(hand,
+(const unsigned char *)msr[i].policy, 64);
+  

[PATCH v3 2/6] libxl: change the type of libxl_cpuid_policy_list

2023-07-20 Thread Roger Pau Monne
Currently libxl_cpuid_policy_list is an opaque type to the users of
libxl, and internally it's an array of xc_xend_cpuid objects.

Change the type to instead be a structure that contains one array for
CPUID policies, in preparation for it also holding another array for
MSR policies.

Signed-off-by: Roger Pau Monné 
Reviewed-by: Anthony PERARD 
---
Changes since v2:
 - Add braces in the inner loop.
 - Do not set the policy to NULL.
---
 tools/include/libxl.h |  8 +--
 tools/libs/light/libxl_cpuid.c| 87 ---
 tools/libs/light/libxl_internal.h |  4 ++
 3 files changed, 63 insertions(+), 36 deletions(-)

diff --git a/tools/include/libxl.h b/tools/include/libxl.h
index cac641a7eba2..f3975ecc021f 100644
--- a/tools/include/libxl.h
+++ b/tools/include/libxl.h
@@ -1455,12 +1455,8 @@ typedef struct {
 void libxl_bitmap_init(libxl_bitmap *map);
 void libxl_bitmap_dispose(libxl_bitmap *map);
 
-/*
- * libxl_cpuid_policy is opaque in the libxl ABI.  Users of both libxl and
- * libxc may not make assumptions about xc_xend_cpuid.
- */
-typedef struct xc_xend_cpuid libxl_cpuid_policy;
-typedef libxl_cpuid_policy * libxl_cpuid_policy_list;
+struct libxl__cpu_policy;
+typedef struct libxl__cpu_policy *libxl_cpuid_policy_list;
 void libxl_cpuid_dispose(libxl_cpuid_policy_list *cpuid_list);
 int libxl_cpuid_policy_list_length(const libxl_cpuid_policy_list *l);
 void libxl_cpuid_policy_list_copy(libxl_ctx *ctx,
diff --git a/tools/libs/light/libxl_cpuid.c b/tools/libs/light/libxl_cpuid.c
index c96aeb3bce46..3c8b2a72c0b8 100644
--- a/tools/libs/light/libxl_cpuid.c
+++ b/tools/libs/light/libxl_cpuid.c
@@ -19,22 +19,29 @@ int libxl__cpuid_policy_is_empty(libxl_cpuid_policy_list *pl)
 return !libxl_cpuid_policy_list_length(pl);
 }
 
-void libxl_cpuid_dispose(libxl_cpuid_policy_list *p_cpuid_list)
+void libxl_cpuid_dispose(libxl_cpuid_policy_list *pl)
 {
-int i, j;
-libxl_cpuid_policy_list cpuid_list = *p_cpuid_list;
+libxl_cpuid_policy_list policy = *pl;
 
-if (cpuid_list == NULL)
+if (policy == NULL)
 return;
-for (i = 0; cpuid_list[i].input[0] != XEN_CPUID_INPUT_UNUSED; i++) {
-for (j = 0; j < 4; j++)
-if (cpuid_list[i].policy[j] != NULL) {
-free(cpuid_list[i].policy[j]);
-cpuid_list[i].policy[j] = NULL;
+
+if (policy->cpuid) {
+unsigned int i, j;
+struct xc_xend_cpuid *cpuid_list = policy->cpuid;
+
+for (i = 0; cpuid_list[i].input[0] != XEN_CPUID_INPUT_UNUSED; i++) {
+for (j = 0; j < 4; j++) {
+if (cpuid_list[i].policy[j] != NULL) {
+free(cpuid_list[i].policy[j]);
+}
 }
+}
+free(policy->cpuid);
 }
-free(cpuid_list);
-*p_cpuid_list = NULL;
+
+free(policy);
+*pl = NULL;
 return;
 }
 
@@ -62,11 +69,17 @@ struct cpuid_flags {
 /* go through the dynamic array finding the entry for a specified leaf.
  * if no entry exists, allocate one and return that.
  */
-static libxl_cpuid_policy_list cpuid_find_match(libxl_cpuid_policy_list *list,
-  uint32_t leaf, uint32_t subleaf)
+static struct xc_xend_cpuid *cpuid_find_match(libxl_cpuid_policy_list *pl,
+  uint32_t leaf, uint32_t subleaf)
 {
+libxl_cpuid_policy_list policy = *pl;
+struct xc_xend_cpuid **list;
 int i = 0;
 
+if (policy == NULL)
+policy = *pl = calloc(1, sizeof(*policy));
+
+list = &policy->cpuid;
 if (*list != NULL) {
 for (i = 0; (*list)[i].input[0] != XEN_CPUID_INPUT_UNUSED; i++) {
 if ((*list)[i].input[0] == leaf && (*list)[i].input[1] == subleaf)
@@ -86,7 +99,7 @@ static libxl_cpuid_policy_list cpuid_find_match(libxl_cpuid_policy_list *list,
  * Will overwrite earlier entries and thus can be called multiple
  * times.
  */
-int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
+int libxl_cpuid_parse_config(libxl_cpuid_policy_list *policy, const char* str)
 {
 #define NA XEN_CPUID_INPUT_UNUSED
 static const struct cpuid_flags cpuid_flags[] = {
@@ -345,7 +358,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
 if (flag->name == NULL) {
 return 2;
 }
-entry = cpuid_find_match(cpuid, flag->leaf, flag->subleaf);
+entry = cpuid_find_match(policy, flag->leaf, flag->subleaf);
 resstr = entry->policy[flag->reg - 1];
 num = strtoull(val, , 0);
 flags[flag->length] = 0;
@@ -400,7 +413,7 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list *cpuid, const char* str)
  * the strings for each register were directly exposed to the user.
  * Used for maintaining compatibility with older config files
  */
-int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *cpuid,
+int libxl_cpuid_parse_config_xend(libxl_cpuid_policy_list *policy,
   const char* str)
 {
  

[PATCH v3 1/6] libs/guest: introduce support for setting guest MSRs

2023-07-20 Thread Roger Pau Monne
Like it's done with CPUID, introduce support for passing MSR values to
xc_cpuid_apply_policy().  The chosen format for expressing MSR policy
data matches the current one used for CPUID.  Note that existing
callers of xc_cpuid_apply_policy() can pass NULL as the value for the
newly introduced 'msr' parameter in order to preserve the same
functionality, and in fact that's done in libxl on this patch.

Signed-off-by: Roger Pau Monné 
Acked-by: Anthony PERARD 
---
Changes since v2:
 - Some code adjustment, no functional change.
---
 tools/include/xenctrl.h |  21 +++-
 tools/libs/guest/xg_cpuid_x86.c | 169 +++-
 tools/libs/light/libxl_cpuid.c  |   2 +-
 3 files changed, 188 insertions(+), 4 deletions(-)

diff --git a/tools/include/xenctrl.h b/tools/include/xenctrl.h
index dba33d5d0f39..faec1dd82453 100644
--- a/tools/include/xenctrl.h
+++ b/tools/include/xenctrl.h
@@ -1822,6 +1822,21 @@ struct xc_xend_cpuid {
 char *policy[4];
 };
 
+/*
+ * MSR policy data.
+ *
+ * The format of the policy string is the following:
+ *   '1' -> force to 1
+ *   '0' -> force to 0
+ *   'x' -> we don't care (use default)
+ *   'k' -> pass through host value
+ */
+struct xc_msr {
+uint32_t index;
+char policy[65];
+};
+#define XC_MSR_INPUT_UNUSED 0xu
+
 /*
  * Make adjustments to the CPUID settings for a domain.
  *
@@ -1833,13 +1848,15 @@ struct xc_xend_cpuid {
  * Either pass a full new @featureset (and @nr_features), or adjust individual
  * features (@pae, @itsc, @nested_virt).
  *
- * Then (optionally) apply legacy XEND overrides (@xend) to the result.
+ * Then (optionally) apply legacy XEND CPUID overrides (@xend) or MSR (@msr)
+ * to the result.
  */
 int xc_cpuid_apply_policy(xc_interface *xch,
   uint32_t domid, bool restore,
   const uint32_t *featureset,
   unsigned int nr_features, bool pae, bool itsc,
-  bool nested_virt, const struct xc_xend_cpuid *xend);
+  bool nested_virt, const struct xc_xend_cpuid *xend,
+  const struct xc_msr *msr);
 int xc_mca_op(xc_interface *xch, struct xen_mc *mc);
 int xc_mca_op_inject_v2(xc_interface *xch, unsigned int flags,
 xc_cpumap_t cpumap, unsigned int nr_cpus);
diff --git a/tools/libs/guest/xg_cpuid_x86.c b/tools/libs/guest/xg_cpuid_x86.c
index 5b035223f4f5..f2b1e809011d 100644
--- a/tools/libs/guest/xg_cpuid_x86.c
+++ b/tools/libs/guest/xg_cpuid_x86.c
@@ -423,10 +423,170 @@ static int xc_cpuid_xend_policy(
 return rc;
 }
 
+static int compare_msr(const void *l, const void *r)
+{
+const xen_msr_entry_t *lhs = l;
+const xen_msr_entry_t *rhs = r;
+
+if ( lhs->idx == rhs->idx )
+return 0;
+
+return lhs->idx < rhs->idx ? -1 : 1;
+}
+
+static xen_msr_entry_t *find_msr(
+xen_msr_entry_t *msrs, unsigned int nr_msrs,
+uint32_t index)
+{
+const xen_msr_entry_t key = { .idx = index };
+
+return bsearch(&key, msrs, nr_msrs, sizeof(*msrs), compare_msr);
+}
+
+
+static int xc_msr_policy(xc_interface *xch, domid_t domid,
+ const struct xc_msr *msr)
+{
+int rc;
+bool hvm;
+xc_domaininfo_t di;
+unsigned int nr_leaves, nr_msrs;
+uint32_t err_leaf = -1, err_subleaf = -1, err_msr = -1;
+/*
+ * Three full policies.  The host, default for the domain type,
+ * and domain current.
+ */
+xen_msr_entry_t *host = NULL, *def = NULL, *cur = NULL;
+unsigned int nr_host, nr_def, nr_cur;
+
+if ( (rc = xc_domain_getinfo_single(xch, domid, &di)) < 0 )
+{
+PERROR("Failed to obtain d%d info", domid);
+rc = -errno;
+goto out;
+}
+hvm = di.flags & XEN_DOMINF_hvm_guest;
+
+rc = xc_cpu_policy_get_size(xch, &nr_leaves, &nr_msrs);
+if ( rc )
+{
+PERROR("Failed to obtain policy info size");
+rc = -errno;
+goto out;
+}
+
+if ( (host = calloc(nr_msrs, sizeof(*host))) == NULL ||
+ (def  = calloc(nr_msrs, sizeof(*def)))  == NULL ||
+ (cur  = calloc(nr_msrs, sizeof(*cur)))  == NULL )
+{
+ERROR("Unable to allocate memory for %u CPUID leaves", nr_leaves);
+rc = -ENOMEM;
+goto out;
+}
+
+/* Get the domain's current policy. */
+nr_leaves = 0;
+nr_cur = nr_msrs;
+rc = get_domain_cpu_policy(xch, domid, &nr_leaves, NULL, &nr_cur, cur);
+if ( rc )
+{
+PERROR("Failed to obtain d%d current policy", domid);
+rc = -errno;
+goto out;
+}
+
+/* Get the domain type's default policy. */
+nr_leaves = 0;
+nr_def = nr_msrs;
+rc = get_system_cpu_policy(xch, hvm ? XEN_SYSCTL_cpu_policy_hvm_default
+: XEN_SYSCTL_cpu_policy_pv_default,
+   &nr_leaves, NULL, &nr_def, def);
+if ( rc )
+{
+PERROR("Failed to obtain %s def policy", hvm ? "hvm" : "pv");
+rc = 

[PATCH v3 0/6] lib{xc,xl}: support for guest MSR features

2023-07-20 Thread Roger Pau Monne
Hello,

The following series adds support for handling guest MSR features as
defined in arch-x86/cpufeatureset.h.

The end result is the user being able to use such features with the
xl.cfg(5) cpuid option.  This also involves adding support to all the
underlying layers, so both libxl and libxc also get new functionality in
order to properly parse those.

Thanks, Roger.

Roger Pau Monne (6):
  libs/guest: introduce support for setting guest MSRs
  libxl: change the type of libxl_cpuid_policy_list
  libxl: introduce MSR data in libxl_cpuid_policy
  libxl: split logic to parse user provided CPUID features
  libxl: use the cpuid feature names from cpufeatureset.h
  libxl: add support for parsing MSR features

 docs/man/xl.cfg.5.pod.in  |  24 +-
 tools/include/libxl.h |   8 +-
 tools/include/xenctrl.h   |  21 +-
 tools/libs/guest/xg_cpuid_x86.c   | 169 +++-
 tools/libs/light/libxl_cpuid.c| 662 ++
 tools/libs/light/libxl_internal.h |   5 +
 tools/libs/light/libxl_types.idl  |   2 +-
 tools/xl/xl_parse.c   |   3 +
 8 files changed, 602 insertions(+), 292 deletions(-)

-- 
2.41.0
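
For readers following the series, the policy string format documented for
struct xc_msr in patch 1/6 ('1', '0', 'x', 'k') can be illustrated with a
small sketch. This is not code from the series: apply_msr_policy_sketch() is
a hypothetical helper, and the bit ordering (policy[0] describing bit 63,
i.e. most significant bit first) is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of the series.  Builds a 64-bit MSR value
 * from a 64-character policy string, given the host value and the default
 * value for the domain type.
 *
 * Assumption: policy[0] describes bit 63 (MSB first).
 *
 * Returns 0 on success, -1 if the string is too short or contains a
 * character other than '1', '0', 'x' or 'k'.
 */
static int apply_msr_policy_sketch(const char *policy, uint64_t host,
                                   uint64_t def, uint64_t *out)
{
    uint64_t val = 0;

    for (size_t i = 0; i < 64; i++) {
        uint64_t mask = UINT64_C(1) << (63 - i);

        switch (policy[i]) {
        case '1':               /* force to 1 */
            val |= mask;
            break;
        case '0':               /* force to 0 */
            break;
        case 'x':               /* don't care: use the default value */
            val |= def & mask;
            break;
        case 'k':               /* pass through the host value */
            val |= host & mask;
            break;
        default:                /* includes an early NUL terminator */
            return -1;
        }
    }

    *out = val;
    return 0;
}
```

A string of 64 'x' characters therefore reproduces the default value, and a
string of 64 'k' characters reproduces the host value.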




Re: [PATCH v4 0/8] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 10:13:20AM +0200, Petr Tesařík wrote:
> Fine with me. I removed it after all my testing showed no performance
> impact as long as the size of the initial SWIOTLB is kept at the
> default value (and sufficient for the workload), but it's OK for me if
> dynamic SWIOTLB allocations are off by default.
> 
> OTOH I'd like to make it a boot-time option rather than build-time
> option. Would that be OK for you?

I'd really like the config option to not even build the code.  But
a boot time option sounds very useful in addition to that.



Re: [PATCH v4 8/8] swiotlb: search the software IO TLB only if a device makes use of it

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 10:02:38AM +0200, Petr Tesařík wrote:
> On Thu, 20 Jul 2023 08:47:44 +0200
> Christoph Hellwig  wrote:
> 
> > Any reason this can't just do a list_empty_careful on the list
> > instead of adding yet another field that grows struct device?
> 
> On which list?

dev->dma_io_tlb_mem->pools?

> 
> The dma_io_tlb_pools list only contains transient pools, but a device
> may use bounce buffers from a regular pool.

Oh, true.

> The dma_io_tlb_mem.pools list will always be non-empty, unless the
> system runs without SWIOTLB.
> 
> On a system which does have a SWIOTLB, the flag allows to differentiate
> between devices that actually use bounce buffers and devices that do
> not (e.g. because they do not have any addressing limitations).

Ok.



Re: [PATCH v4 2/8] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 10:01:10 +0200
Christoph Hellwig  wrote:

> On Thu, Jul 20, 2023 at 09:56:09AM +0200, Petr Tesařík wrote:
> > On Thu, 20 Jul 2023 08:38:19 +0200
> > Christoph Hellwig  wrote:
> >   
> > > On Thu, Jul 13, 2023 at 05:23:13PM +0200, Petr Tesarik wrote:  
> > > > From: Petr Tesarik 
> > > > 
> > > > Add some kernel-doc comments and move the existing documentation of struct
> > > > io_tlb_slot to its correct location. The latter was forgotten in commit
> > > > 942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").
> > > > 
> > > > Use the opportunity to give swiotlb_do_find_slots() a more descriptive
> > > > name, which makes it clear how it differs from swiotlb_find_slots().
> > > 
> > > Please keep the swiotlb_ prefix.  Otherwise this looks good to me.  
> > 
> > Will do. Out of curiosity, why does it matter for a static (file-local)
> > function?  
> 
> Because it makes looking at stack traces much easier.

Got it. Thanks!

Petr T



Re: [PATCH v4 0/8] Allow dynamic allocation of software IO TLB bounce buffers

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:52:16 +0200
Christoph Hellwig  wrote:

> Just to add a high-level comment here after I feel like I need a little
> more time to review the guts.
> 
> I'm still pretty concerned about the extra list that needs to be
> consulted in is_swiotlb_buffer, but I can't really think of
> anything better.  Maybe an xarray has better cache characteristics,
> but that one requires even more allocations in the low-level dma map
> path.
> 
> One thing I'd like to see for the next version is to make the
> new growing code a config option at least for now.  It is a pretty
> big change of the existing swiotlb behavior, and I want people to opt
> into it consciously.  Maybe we can drop the option again after a few
> years once everything has settled.

Fine with me. I removed it after all my testing showed no performance
impact as long as the size of the initial SWIOTLB is kept at the
default value (and sufficient for the workload), but it's OK for me if
dynamic SWIOTLB allocations are off by default.

OTOH I'd like to make it a boot-time option rather than build-time
option. Would that be OK for you?

Petr T



Re: [PATCH v4 8/8] swiotlb: search the software IO TLB only if a device makes use of it

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:47:44 +0200
Christoph Hellwig  wrote:

> Any reason this can't just do a list_empty_careful on the list
> instead of adding yet another field that grows struct device?

On which list?

The dma_io_tlb_pools list only contains transient pools, but a device
may use bounce buffers from a regular pool.

The dma_io_tlb_mem.pools list will always be non-empty, unless the
system runs without SWIOTLB.

On a system which does have a SWIOTLB, the flag allows to differentiate
between devices that actually use bounce buffers and devices that do
not (e.g. because they do not have any addressing limitations).

Petr T



Re: [PATCH v4 2/8] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-20 Thread Christoph Hellwig
On Thu, Jul 20, 2023 at 09:56:09AM +0200, Petr Tesařík wrote:
> On Thu, 20 Jul 2023 08:38:19 +0200
> Christoph Hellwig  wrote:
> 
> > On Thu, Jul 13, 2023 at 05:23:13PM +0200, Petr Tesarik wrote:
> > > From: Petr Tesarik 
> > > 
> > > Add some kernel-doc comments and move the existing documentation of struct
> > > io_tlb_slot to its correct location. The latter was forgotten in commit
> > > 942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").
> > > 
> > > Use the opportunity to give swiotlb_do_find_slots() a more descriptive
> > > name, which makes it clear how it differs from swiotlb_find_slots().  
> > 
> > Please keep the swiotlb_ prefix.  Otherwise this looks good to me.
> 
> Will do. Out of curiosity, why does it matter for a static (file-local)
> function?

Because it makes looking at stack traces much easier.



Re: [PATCH v4 2/8] swiotlb: add documentation and rename swiotlb_do_find_slots()

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:38:19 +0200
Christoph Hellwig  wrote:

> On Thu, Jul 13, 2023 at 05:23:13PM +0200, Petr Tesarik wrote:
> > From: Petr Tesarik 
> > 
> > Add some kernel-doc comments and move the existing documentation of struct
> > io_tlb_slot to its correct location. The latter was forgotten in commit
> > 942a8186eb445 ("swiotlb: move struct io_tlb_slot to swiotlb.c").
> > 
> > Use the opportunity to give swiotlb_do_find_slots() a more descriptive
> > name, which makes it clear how it differs from swiotlb_find_slots().  
> 
> Please keep the swiotlb_ prefix.  Otherwise this looks good to me.

Will do. Out of curiosity, why does it matter for a static (file-local)
function?

Petr T



Re: [PATCH v4 1/8] swiotlb: make io_tlb_default_mem local to swiotlb.c

2023-07-20 Thread Petr Tesařík
On Thu, 20 Jul 2023 08:37:44 +0200
Christoph Hellwig  wrote:

> On Thu, Jul 13, 2023 at 05:23:12PM +0200, Petr Tesarik wrote:
> > From: Petr Tesarik 
> > 
> > SWIOTLB implementation details should not be exposed to the rest of the
> > kernel. This will allow to make changes to the implementation without
> > modifying non-swiotlb code.
> > 
> > To avoid breaking existing users, provide helper functions for the few
> > required fields.
> > 
> > As a bonus, using a helper function to initialize struct device allows to
> > get rid of an #ifdef in driver core.
> > 
> > Signed-off-by: Petr Tesarik 
> > ---
> >  arch/arm/xen/mm.c  |  2 +-
> >  arch/mips/pci/pci-octeon.c |  2 +-
> >  arch/x86/kernel/pci-dma.c  |  2 +-
> >  drivers/base/core.c|  4 +---
> >  drivers/xen/swiotlb-xen.c  |  2 +-
> >  include/linux/swiotlb.h| 25 +++-
> >  kernel/dma/swiotlb.c   | 39 +-
> >  7 files changed, 67 insertions(+), 9 deletions(-)
> > 
> > diff --git a/arch/arm/xen/mm.c b/arch/arm/xen/mm.c
> > index 3d826c0b5fee..0f32c14eb786 100644
> > --- a/arch/arm/xen/mm.c
> > +++ b/arch/arm/xen/mm.c
> > @@ -125,7 +125,7 @@ static int __init xen_mm_init(void)
> > return 0;
> >  
> > /* we can work with the default swiotlb */
> > -   if (!io_tlb_default_mem.nslabs) {
> > +   if (!is_swiotlb_allocated()) {
> > rc = swiotlb_init_late(swiotlb_size_or_default(),
> >xen_swiotlb_gfp(), NULL);
> > if (rc < 0)  
> 
> I'd much rather move the already initialized check into
> > swiotlb_init_late, which is a much clearer interface.
> 
> > /* we can work with the default swiotlb */
> > -   if (!io_tlb_default_mem.nslabs) {
> > +   if (!is_swiotlb_allocated()) {
> > int rc = swiotlb_init_late(swiotlb_size_or_default(),
> >GFP_KERNEL, xen_swiotlb_fixup);
> > if (rc < 0)  
> 
> .. and would take care of this one as well.

Oh, you're right! These are the only two places that look at
io_tlb_default_mem.nslabs, and all they need is to avoid double
initialization. Makes perfect sense to move it inside
swiotlb_init_late().

> > +bool is_swiotlb_allocated(void)
> > +{
> > +   return !!io_tlb_default_mem.nslabs;  
> 
> Nit: no need for the !!, we can rely on the implicit promotion to
> bool.  But with the suggestion above the need for this helper
> should go away anyway.

Eh, yes. I initially declared the return type as int and then forgot to
change the return statement. But as you say, the whole function will go
away entirely.

Petr T
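
The two review points above (moving the "already initialized" check into the
late-init function, and dropping the "!!") can be sketched in plain C. This
is a userspace illustration only: struct demo_pool, demo_init_late() and
demo_is_allocated() are hypothetical names, not the kernel's swiotlb code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the default pool; not the kernel structure. */
struct demo_pool {
    unsigned long nslabs;
};

static struct demo_pool demo_default_pool;

/*
 * No "!!" needed: converting a scalar to bool yields true for any
 * non-zero value, so the result is never truncated to the low bits.
 */
static bool demo_is_allocated(void)
{
    return demo_default_pool.nslabs;
}

/*
 * Late initialization that tolerates repeated calls.  Callers no longer
 * need to look at nslabs themselves before calling it.
 */
static int demo_init_late(unsigned long nslabs)
{
    if (demo_default_pool.nslabs)
        return 0;               /* already initialized: nothing to do */

    if (!nslabs)
        return -1;              /* refuse an empty pool */

    demo_default_pool.nslabs = nslabs;
    return 0;
}
```

With the check inside the init function, a caller would simply call
demo_init_late() unconditionally and test the return value.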



[XEN PATCH v3] x86/HVM: address violations of MISRA C:2012 Rules 8.2 and 8.3

2023-07-20 Thread Federico Serafini
Give a name to unnamed parameters thus addressing violations of
MISRA C:2012 Rule 8.2 ("Function types shall be in prototype form with
named parameters").
Keep consistency between parameter names and types used in function
declarations and the ones used in the corresponding function
definitions, thus addressing violations of MISRA C:2012 Rule 8.3
("All declarations of an object or function shall use the same names
and type qualifiers").

No functional changes.

Signed-off-by: Federico Serafini 
---
Changes in v3:
  - removed changes to convert_hour() (Jan took care of it);
  - modified also hvm_set_rdtsc_exiting() declaration;
  - modified also svm_intercept_msr() declaration.
---
Changes in v2:
  - u64 vs uint64_t mismatches are solved in favor of the stdint types;
  - adapted parameter names of nsvm_vcpu_vmexit_event() definition to
the names used in its declaration.
---

 xen/arch/x86/hvm/domain.c   |  2 +-
 xen/arch/x86/hvm/hvm.c  |  6 +++---
 xen/arch/x86/hvm/svm/nestedsvm.c|  8 
 xen/arch/x86/hvm/vioapic.c  |  2 +-
 xen/arch/x86/include/asm/hvm/domain.h   |  2 +-
 xen/arch/x86/include/asm/hvm/hvm.h  | 20 ++--
 xen/arch/x86/include/asm/hvm/irq.h  | 14 +++---
 xen/arch/x86/include/asm/hvm/save.h |  4 ++--
 xen/arch/x86/include/asm/hvm/support.h  |  2 +-
 xen/arch/x86/include/asm/hvm/svm/vmcb.h |  2 +-
 10 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/xen/arch/x86/hvm/domain.c b/xen/arch/x86/hvm/domain.c
index 7692ee24c2..7f6e362a70 100644
--- a/xen/arch/x86/hvm/domain.c
+++ b/xen/arch/x86/hvm/domain.c
@@ -100,7 +100,7 @@ static int check_segment(struct segment_register *reg, enum x86_segment seg)
 }
 
 /* Called by VCPUOP_initialise for HVM guests. */
-int arch_set_info_hvm_guest(struct vcpu *v, const vcpu_hvm_context_t *ctx)
+int arch_set_info_hvm_guest(struct vcpu *v, const struct vcpu_hvm_context *ctx)
 {
 const struct domain *d = v->domain;
 struct cpu_user_regs *uregs = &v->arch.user_regs;
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 57363c2ae1..28d131a202 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -272,7 +272,7 @@ uint8_t hvm_combine_hw_exceptions(uint8_t vec1, uint8_t vec2)
 return X86_EXC_DF;
 }
 
-void hvm_set_rdtsc_exiting(struct domain *d, bool_t enable)
+void hvm_set_rdtsc_exiting(struct domain *d, bool enable)
 {
 struct vcpu *v;
 
@@ -280,7 +280,7 @@ void hvm_set_rdtsc_exiting(struct domain *d, bool_t enable)
 alternative_vcall(hvm_funcs.set_rdtsc_exiting, v, enable);
 }
 
-void hvm_get_guest_pat(struct vcpu *v, u64 *guest_pat)
+void hvm_get_guest_pat(struct vcpu *v, uint64_t *guest_pat)
 {
 if ( !alternative_call(hvm_funcs.get_guest_pat, v, guest_pat) )
 *guest_pat = v->arch.hvm.pat_cr;
@@ -426,7 +426,7 @@ static void hvm_set_guest_tsc_adjust(struct vcpu *v, u64 tsc_adjust)
 update_vcpu_system_time(v);
 }
 
-u64 hvm_get_guest_tsc_fixed(struct vcpu *v, uint64_t at_tsc)
+uint64_t hvm_get_guest_tsc_fixed(struct vcpu *v, uint64_t at_tsc)
 {
 uint64_t tsc;
 
diff --git a/xen/arch/x86/hvm/svm/nestedsvm.c b/xen/arch/x86/hvm/svm/nestedsvm.c
index 5d74863268..a09b6abaae 100644
--- a/xen/arch/x86/hvm/svm/nestedsvm.c
+++ b/xen/arch/x86/hvm/svm/nestedsvm.c
@@ -837,12 +837,12 @@ nsvm_vcpu_vmexit_inject(struct vcpu *v, struct cpu_user_regs *regs,
 }
 
 int cf_check nsvm_vcpu_vmexit_event(
-struct vcpu *v, const struct x86_event *trap)
+struct vcpu *v, const struct x86_event *event)
 {
 ASSERT(vcpu_nestedhvm(v).nv_vvmcx != NULL);
 
-nestedsvm_vmexit_defer(v, VMEXIT_EXCEPTION_DE + trap->vector,
-   trap->error_code, trap->cr2);
+nestedsvm_vmexit_defer(v, VMEXIT_EXCEPTION_DE + event->vector,
+   event->error_code, event->cr2);
 return NESTEDHVM_VMEXIT_DONE;
 }
 
@@ -1538,7 +1538,7 @@ nestedsvm_vcpu_interrupt(struct vcpu *v, const struct hvm_intack intack)
 return NSVM_INTR_NOTINTERCEPTED;
 }
 
-bool_t
+bool
 nestedsvm_gif_isset(struct vcpu *v)
 {
 struct nestedsvm *svm = &vcpu_nestedsvm(v);
diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
index 41e3c4d5e4..4e40d3609a 100644
--- a/xen/arch/x86/hvm/vioapic.c
+++ b/xen/arch/x86/hvm/vioapic.c
@@ -43,7 +43,7 @@
 /* HACK: Route IRQ0 only to VCPU0 to prevent time jumps. */
 #define IRQ0_SPECIAL_ROUTING 1
 
-static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int irq);
+static void vioapic_deliver(struct hvm_vioapic *vioapic, unsigned int pin);
 
 static struct hvm_vioapic *addr_vioapic(const struct domain *d,
 unsigned long addr)
diff --git a/xen/arch/x86/include/asm/hvm/domain.h b/xen/arch/x86/include/asm/hvm/domain.h
index 02c32cf26d..6e53ce4449 100644
--- a/xen/arch/x86/include/asm/hvm/domain.h
+++ b/xen/arch/x86/include/asm/hvm/domain.h
@@ -47,7 +47,7 @@ struct 

Re: [RFC PATCH 3/4] xen/arm: initialize conditionally uninitialized local variables

2023-07-20 Thread Nicola Vetrini




On 19/07/23 16:06, Julien Grall wrote:

Hi,

On 19/07/2023 14:27, Nicola Vetrini wrote:

On 14/07/23 15:21, Julien Grall wrote:

Hi,

On 14/07/2023 12:49, Nicola Vetrini wrote:

This patch aims to fix some occurrences of possibly uninitialized
variables, that may be read before being written. This behaviour would
violate MISRA C:2012 Rule 9.1, besides being generally undesirable.

In all the analyzed cases, such accesses were actually safe, but it's
quite difficult to prove so by automatic checking, therefore a safer
route is to change the code so as to avoid the behaviour from occurring,
while preserving the semantics.

An initialization to a safe value is provided to reach this aim.

Signed-off-by: Nicola Vetrini 
---
Additional input on which values may be 'safe' in each context is
surely welcome, to avoid possibly compromising the correctness of
the function semantics.
---
  xen/arch/arm/cpuerrata.c    |  6 +++---
  xen/arch/arm/domctl.c   |  8 
  xen/arch/arm/gic-v3-lpi.c   | 17 +
  xen/arch/arm/include/asm/p2m.h  | 10 ++
  xen/arch/arm/platforms/xilinx-zynqmp-eemi.c | 10 ++
  xen/arch/arm/psci.c | 10 +-
  xen/drivers/char/pl011.c    |  2 +-
  7 files changed, 30 insertions(+), 33 deletions(-)

diff --git a/xen/arch/arm/cpuerrata.c b/xen/arch/arm/cpuerrata.c
index d0658aedb6..14694c6081 100644
--- a/xen/arch/arm/cpuerrata.c
+++ b/xen/arch/arm/cpuerrata.c
@@ -159,7 +159,7 @@ extern char __mitigate_spectre_bhb_loop_start_32[],
  static int enable_smccc_arch_workaround_1(void *data)
  {
-    struct arm_smccc_res res;
+    struct arm_smccc_res res = {0};


I understand your desire to make Eclair happy. But I am not sure that 
initializing to 0 is the right thing. If the SMCC were not properly 
setting the register, then we most likely don't want to install the 
workaround. Instead, we most likely want to warn.


So you want (int)res.a0 to be negative. We don't care about the other 
fields.




In principle I'm ok with this, but see below.


  const struct arm_cpu_capabilities *entry = data;
  /*
@@ -252,7 +252,7 @@ static int enable_spectre_bhb_workaround(void *data)

  if ( cpus_have_cap(ARM_WORKAROUND_BHB_SMCC_3) )
  {
-    struct arm_smccc_res res;
+    struct arm_smccc_res res = {0};


Same remark here.


  if ( smccc_ver < SMCCC_VERSION(1, 1) )
  goto warn;
@@ -393,7 +393,7 @@ DEFINE_PER_CPU_READ_MOSTLY(register_t, ssbd_callback_required);
  static bool has_ssbd_mitigation(const struct arm_cpu_capabilities *entry)
  {
  {
-    struct arm_smccc_res res;
+    struct arm_smccc_res res = {0};


Here you would want (int)res.a0 to be equal to ARM_SMCCC_NOT_SUPPORTED.


I see that ARM_SMCCC_NOT_SUPPORTED is
#define ARM_SMCCC_NOT_SUPPORTED (-1)

thus an assignment to res.a0 would violate Rule 10.3:
"The value of an expression shall not be assigned to an object with a 
narrower essential type or of a different essential type category."

(signed vs unsigned, and the exception does not apply here).

This rule is not yet under discussion, but I would like to avoid 
knowingly introducing more violations if there's an alternative.


Do the fields of struct arm_smccc_res really need to be unsigned?


Yes, all the fields represent a register. Also, in this context, only 
the first 32-bit of the register should be taken into account.


That's why you will see code using (int)res.a0.
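
The (int)res.a0 idiom discussed here can be shown with a small standalone
sketch. The names demo_smccc_res, DEMO_NOT_SUPPORTED and
demo_not_supported() are hypothetical stand-ins, not the Xen definitions.
The sketch also assumes the usual GCC/Clang behaviour where converting an
out-of-range unsigned value to int wraps modulo 2^32; strictly speaking that
conversion is implementation-defined.

```c
#include <stdbool.h>

/* Hypothetical stand-in: each field models one (unsigned) register. */
struct demo_smccc_res {
    unsigned long a0;
};

/* Hypothetical stand-in for a negative "not supported" return code. */
#define DEMO_NOT_SUPPORTED (-1)

/*
 * Only the low 32 bits of the register carry the return code, so the
 * value is narrowed to int before the comparison.  Comparing the full
 * unsigned long against -1 would instead promote -1 to all-ones, which
 * differs from 0xFFFFFFFF when unsigned long is 64 bits wide.
 */
static bool demo_not_supported(const struct demo_smccc_res *res)
{
    return (int)res->a0 == DEMO_NOT_SUPPORTED;
}
```

A register value of 0xFFFFFFFF is therefore reported as "not supported",
regardless of what the upper half of a 64-bit register contains.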



If the purpose of defining it to -1 is to have the value 
0x, then it could be defined as ~0UL.



  bool required;
  if ( smccc_ver < SMCCC_VERSION(1, 1) )
diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index ad56efb0f5..b38fed72be 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -29,10 +29,10 @@ static int handle_vuart_init(struct domain *d,
   struct xen_domctl_vuart_op *vuart_op)
  {
  int rc;
-    struct vpl011_init_info info;
-
-    info.console_domid = vuart_op->console_domid;
-    info.gfn = _gfn(vuart_op->gfn);
+    struct vpl011_init_info info = {
+    .console_domid = vuart_op->console_domid,
+    .gfn = _gfn(vuart_op->gfn)
+    };


I am not against this change. But I don't quite understand how this 
makes Eclair much happier?


It also zero-initializes the third field:

struct vpl011_init_info {
 domid_t console_domid;
 gfn_t gfn;
 evtchn_port_t evtchn;
};
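
To illustrate the point about the third field: in C, members omitted from a
designated initializer are implicitly zero-initialized. The sketch below
uses a hypothetical struct demo_init_info with plain integer fields in place
of the Xen types (domid_t, gfn_t, evtchn_port_t).

```c
#include <stdint.h>

/* Hypothetical stand-in for struct vpl011_init_info, with plain integers. */
struct demo_init_info {
    uint16_t console_domid;
    uint64_t gfn;
    uint32_t evtchn;
};

static struct demo_init_info demo_make_info(uint16_t domid, uint64_t gfn)
{
    /*
     * evtchn is not named in the initializer, so it is set to 0.  With
     * member-by-member assignment after an uninitialized declaration it
     * would stay indeterminate until written.
     */
    struct demo_init_info info = {
        .console_domid = domid,
        .gfn = gfn,
    };

    return info;
}
```

Reading info.evtchn after this initialization is well defined, which is
presumably what the checker is looking for.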




Also, if this is the desired way, then I think this should be written 
down in the CODING_STYLE.


This is just a matter of style, I can also set the other field 
explicitly, if you prefer.


I am confused. In a previous reply, I thought you said the following 
would also make ECLAIR unhappy:

 > info.console_domid = <...>;
info.gfn = <...>;
info.evtchn = <...>;



If I did (I can't find it right now, but I'll try to dig it up), I'd say 
that I was either wrong or I 
