[Xen-devel] [linux-next test] 128276: regressions - FAIL

2018-10-01 Thread osstest service owner
flight 128276 linux-next real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128276/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-qemuu-debianhvm-amd64  7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 
128170
 test-amd64-amd64-xl-credit2   7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-qemuu-rhel6hvm-intel  7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-qemut-ws16-amd64  7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-freebsd10-i386  7 xen-boot   fail REGR. vs. 128170
 test-amd64-amd64-qemuu-nested-intel  7 xen-boot  fail REGR. vs. 128170
 test-amd64-i386-xl-qemuu-win10-i386  7 xen-boot  fail REGR. vs. 128170
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  7 xen-bootfail REGR. vs. 128170
 test-amd64-amd64-xl-qemuu-ws16-amd64  7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-i386-pvgrub  7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-qemut-win7-amd64  7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-xl-qemut-debianhvm-amd64-xsm  7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 
128170
 test-amd64-i386-xl-raw7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-qemuu-ovmf-amd64  7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 xen-boot  fail REGR. vs. 128170
 test-amd64-i386-freebsd10-amd64  7 xen-boot  fail REGR. vs. 128170
 test-amd64-i386-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. vs. 
128170
 test-amd64-i386-pair 10 xen-boot/src_hostfail REGR. vs. 128170
 test-amd64-i386-pair 11 xen-boot/dst_hostfail REGR. vs. 128170
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 
128170
 test-amd64-i386-libvirt   7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-shadow7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-qemuu-nested-amd  7 xen-bootfail REGR. vs. 128170
 test-amd64-amd64-xl-qemut-win10-i386  7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-pvhv2-amd  7 xen-bootfail REGR. vs. 128170
 test-amd64-i386-xl-qemut-ws16-amd64  7 xen-boot  fail REGR. vs. 128170
 test-amd64-i386-qemut-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 128170
 test-amd64-amd64-xl   7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-xsm   7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-xl-xsm7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 7 xen-boot fail REGR. 
vs. 128170
 test-amd64-amd64-libvirt-pair 10 xen-boot/src_host   fail REGR. vs. 128170
 test-amd64-amd64-libvirt-pair 11 xen-boot/dst_host   fail REGR. vs. 128170
 test-amd64-amd64-xl-pvshim7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-libvirt  7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-xl-qemuu-debianhvm-amd64-shadow 7 xen-boot fail REGR. vs. 
128170
 test-amd64-i386-xl-qemuu-debianhvm-amd64-xsm  7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-xl-qemuu-ws16-amd64  7 xen-boot  fail REGR. vs. 128170
 test-amd64-i386-libvirt-xsm   7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-pair10 xen-boot/src_hostfail REGR. vs. 128170
 test-amd64-amd64-pair11 xen-boot/dst_hostfail REGR. vs. 128170
 test-amd64-amd64-xl-qemut-debianhvm-amd64  7 xen-bootfail REGR. vs. 128170
 test-amd64-i386-xl-qemuu-win7-amd64  7 xen-boot  fail REGR. vs. 128170
 test-amd64-amd64-xl-qemuu-dmrestrict-amd64-dmrestrict 7 xen-boot fail REGR. 
vs. 128170
 test-amd64-i386-qemuu-rhel6hvm-amd  7 xen-boot   fail REGR. vs. 128170
 test-amd64-i386-xl-pvshim 7 xen-boot fail REGR. vs. 128170
 test-amd64-amd64-amd64-pvgrub  7 xen-bootfail REGR. vs. 128170
 test-amd64-amd64-xl-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 128170
 test-amd64-i386-xl-qemut-win7-amd64  7 xen-boot  fail REGR. vs. 128170
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 7 xen-boot fail REGR. vs. 
128170
 test-amd64-amd64-xl-qemuu-win10-i386  7 xen-boot fail REGR. vs. 128170

Tests which are failing intermittently (not blocking):
 test-armhf-armhf-xl   6 xen-installfail pass in 128167
 test-armhf-armhf-xl-cubietruck  7 xen-boot fail pass in 128167

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-credit1   7 xen-bootfail baseline untested
 test-armhf-armhf-xl 13 migrate-support-check fail in 128167 never pass
 test-armhf-armhf-xl 14 saverestore-support-check fail in 128167 never pass
 

[Xen-devel] [xen-unstable-smoke test] 128296: tolerable all pass - PUSHED

2018-10-01 Thread osstest service owner
flight 128296 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128296/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  0bc6a68da585cb5d781f27404943a1dbd393e95f
baseline version:
 xen  3722e563432bdbf0cbd16ace3075cf643186b01e

Last test of basis   128288  2018-10-01 17:00:45 Z0 days
Testing same since   128296  2018-10-01 22:00:33 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Stefano Stabellini 
  Volodymyr Babchuk 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   3722e56343..0bc6a68da5  0bc6a68da585cb5d781f27404943a1dbd393e95f -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [[PATCH v3] 4/4] xen/arm: Replace call_smc with arm_smccc_smc

2018-10-01 Thread Julien Grall

Hi Stefano,

On 10/01/2018 09:52 PM, Stefano Stabellini wrote:

On Mon, 1 Oct 2018, Stefano Stabellini wrote:

On Mon, 1 Oct 2018, Julien Grall wrote:

call_smc is a subset of arm_smccc_smc. Rather than having 2 methods to
do SMCCC call, replace all call to the former by the later.

Signed-off-by: Julien Grall 


Reviewed-by: Stefano Stabellini 

(you might want to make sure the double [[ is removed on commit.)


Hey Julien,

I was going to commit the series, but I am getting a build error:

psci.c: In function 'call_psci_cpu_on':
psci.c:45:40: error: request for member 'a0' in something not a structure or 
union
  #define PSCI_RET(res)   ((int32_t)(res).a0)
 ^
psci.c:54:12: note: in expansion of macro 'PSCI_RET'
  return PSCI_RET(res.a0);
 ^
psci.c:55:1: error: control reaches end of non-void function 
[-Werror=return-type]
  }


I already mentioned this error in reply to the patch [1] and suggested 
it could be fixed on commit...


Cheers,

[1] <9e0160ad-853c-6efa-1a04-15f801ef0...@arm.com>

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/2] xen/arm: vgic-v3: Don't create empty re-distributor regions

2018-10-01 Thread Stefano Stabellini
On Mon, 1 Oct 2018, Julien Grall wrote:
> At the moment, Xen is assuming the hardware domain will have the same
> number of re-distributor regions as the host. However, as the
> number of CPUs or the stride (e.g on GICv4) may be different we end up
> exposing regions which does not contain any re-distributors.
> 
> When booting, Linux will go through all the re-distributor region to
> check whether a property (e.g vPLIs) is available accross all the
> re-distributors. This will result to a data abort on empty regions
> because there are no underlying re-distributor.
> 
> So we need to limit the number of regions exposed to the hardware
> domain. The code reworked to only expose the minimun number of regions
> required by the hardware domain. It is assumed the regions will be
> populated starting from the first one.
> 
> Lastly, rename vgic_v3_rdist_count to reflect the value return by the
> helper.
> 
> Reported-by: Shameerali Kolothum Thodi 
> Signed-off-by: Julien Grall 
> Tested-by: Shameer Kolothum 
> 
> ---
> Changes in v2:
> - Rename vgic_v3_rdist_count to vgic_v3_max_rdist_count
> - Fixup #re-distributors
> - Fix typoes
> - Add Shameer's tested tag
> ---
>  xen/arch/arm/gic-v3.c  | 14 +++---
>  xen/arch/arm/vgic-v3.c | 21 ++---
>  2 files changed, 29 insertions(+), 6 deletions(-)
> 
> diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
> index c98a163ee7..2c1454f425 100644
> --- a/xen/arch/arm/gic-v3.c
> +++ b/xen/arch/arm/gic-v3.c
> @@ -1265,7 +1265,8 @@ static int gicv3_make_hwdom_dt_node(const struct domain 
> *d,
>  if ( res )
>  return res;
>  
> -res = fdt_property_cell(fdt, "#redistributor-regions", 
> gicv3.rdist_count);
> +res = fdt_property_cell(fdt, "#redistributor-regions",
> +d->arch.vgic.nr_regions);
>  if ( res )
>  return res;
>  
> @@ -1274,8 +1275,10 @@ static int gicv3_make_hwdom_dt_node(const struct 
> domain *d,
>   * GIC has two memory regions: Distributor + rdist regions
>   * CPU interface and virtual cpu interfaces accessesed as System 
> registers
>   * So cells are created only for Distributor and rdist regions
> + * The hardware domain may not use all the regions. So only copy
> + * what is necessary.
>   */
> -new_len = new_len * (gicv3.rdist_count + 1);
> +new_len = new_len * (d->arch.vgic.nr_regions + 1);
>  
>  hw_reg = dt_get_property(gic, "reg", );
>  if ( !hw_reg )
> @@ -1466,6 +1469,7 @@ static inline bool gic_dist_supports_dvis(void)
>  }
>  
>  static int gicv3_make_hwdom_madt(const struct domain *d, u32 offset)
> +
>  {

Aside from this spurious change, the patch is OK. Provided you remove
the blank on commit:

Reviewed-by: Stefano Stabellini 


>  struct acpi_subtable_header *header;
>  struct acpi_madt_generic_interrupt *host_gicc, *gicc;
> @@ -1503,7 +1507,11 @@ static int gicv3_make_hwdom_madt(const struct domain 
> *d, u32 offset)
>  
>  /* Add Generic Redistributor */
>  size = sizeof(struct acpi_madt_generic_redistributor);
> -for ( i = 0; i < gicv3.rdist_count; i++ )
> +/*
> + * The hardware domain may not used all the regions. So only copy
> + * what is necessary.
> + */
> +for ( i = 0; i < d->arch.vgic.nr_regions; i++ )
>  {
>  gicr = (struct acpi_madt_generic_redistributor *)(base_ptr + 
> table_len);
>  gicr->header.type = ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR;
> diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
> index df1bab3a35..efe824c6fb 100644
> --- a/xen/arch/arm/vgic-v3.c
> +++ b/xen/arch/arm/vgic-v3.c
> @@ -1640,7 +1640,11 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
>  return 0;
>  }
>  
> -static inline unsigned int vgic_v3_rdist_count(struct domain *d)
> +/*
> + * Return the maximum number possible of re-distributor regions for
> + * a given domain.
> + */
> +static inline unsigned int vgic_v3_max_rdist_count(struct domain *d)
>  {
>  /*
>   * Normally there is only one GICv3 redistributor region.
> @@ -1662,7 +1666,7 @@ static int vgic_v3_real_domain_init(struct domain *d)
>  int rdist_count, i, ret;
>  
>  /* Allocate memory for Re-distributor regions */
> -rdist_count = vgic_v3_rdist_count(d);
> +rdist_count = vgic_v3_max_rdist_count(d);
>  
>  rdist_regions = xzalloc_array(struct vgic_rdist_region, rdist_count);
>  if ( !rdist_regions )
> @@ -1695,8 +1699,19 @@ static int vgic_v3_real_domain_init(struct domain *d)
>  d->arch.vgic.rdist_regions[i].first_cpu = first_cpu;
>  
>  first_cpu += size / GICV3_GICR_SIZE;
> +
> +if ( first_cpu >= d->max_vcpus )
> +break;
>  }
>  
> +/*
> + * The hardware domain may not use all the re-distributors
> + * regions (e.g when the number of vCPUs does not match the
> + * number of pCPUs). Update the number of regions to avoid
> + 

Re: [Xen-devel] [[PATCH v3] 4/4] xen/arm: Replace call_smc with arm_smccc_smc

2018-10-01 Thread Stefano Stabellini
On Mon, 1 Oct 2018, Stefano Stabellini wrote:
> On Mon, 1 Oct 2018, Julien Grall wrote:
> > call_smc is a subset of arm_smccc_smc. Rather than having 2 methods to
> > do SMCCC call, replace all call to the former by the later.
> > 
> > Signed-off-by: Julien Grall 
> 
> Reviewed-by: Stefano Stabellini 
> 
> (you might want to make sure the double [[ is removed on commit.)

Hey Julien,

I was going to commit the series, but I am getting a build error:

psci.c: In function 'call_psci_cpu_on':
psci.c:45:40: error: request for member 'a0' in something not a structure or 
union
 #define PSCI_RET(res)   ((int32_t)(res).a0)
^
psci.c:54:12: note: in expansion of macro 'PSCI_RET'
 return PSCI_RET(res.a0);
^
psci.c:55:1: error: control reaches end of non-void function 
[-Werror=return-type]
 }



> > ---
> > 
> > Changes in v3:
> > - Use PSCI_RET where needed
> > ---
> >  xen/arch/arm/Makefile|  1 -
> >  xen/arch/arm/platforms/exynos5.c |  3 ++-
> >  xen/arch/arm/platforms/seattle.c |  4 ++--
> >  xen/arch/arm/psci.c  | 37 +
> >  xen/arch/arm/smc.S   | 21 -
> >  xen/include/asm-arm/processor.h  |  3 ---
> >  6 files changed, 29 insertions(+), 40 deletions(-)
> >  delete mode 100644 xen/arch/arm/smc.S
> > 
> > diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> > index b9b141dc84..37fa8268b3 100644
> > --- a/xen/arch/arm/Makefile
> > +++ b/xen/arch/arm/Makefile
> > @@ -39,7 +39,6 @@ obj-y += processor.o
> >  obj-y += psci.o
> >  obj-y += setup.o
> >  obj-y += shutdown.o
> > -obj-y += smc.o
> >  obj-y += smp.o
> >  obj-y += smpboot.o
> >  obj-y += sysctl.o
> > diff --git a/xen/arch/arm/platforms/exynos5.c 
> > b/xen/arch/arm/platforms/exynos5.c
> > index c15ecf80f5..e2c0b7b878 100644
> > --- a/xen/arch/arm/platforms/exynos5.c
> > +++ b/xen/arch/arm/platforms/exynos5.c
> > @@ -26,6 +26,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  static bool secure_firmware;
> >  
> > @@ -249,7 +250,7 @@ static int exynos5_cpu_up(int cpu)
> >  iounmap(power);
> >  
> >  if ( secure_firmware )
> > -call_smc(SMC_CMD_CPU1BOOT, cpu, 0, 0);
> > +arm_smccc_smc(SMC_CMD_CPU1BOOT, cpu, NULL);
> >  
> >  return cpu_up_send_sgi(cpu);
> >  }
> > diff --git a/xen/arch/arm/platforms/seattle.c 
> > b/xen/arch/arm/platforms/seattle.c
> > index 893cc17972..64cc1868c2 100644
> > --- a/xen/arch/arm/platforms/seattle.c
> > +++ b/xen/arch/arm/platforms/seattle.c
> > @@ -33,12 +33,12 @@ static const char * const seattle_dt_compat[] 
> > __initconst =
> >   */
> >  static void seattle_system_reset(void)
> >  {
> > -call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
> > +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
> >  }
> >  
> >  static void seattle_system_off(void)
> >  {
> > -call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
> > +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
> >  }
> >  
> >  PLATFORM_START(seattle, "SEATTLE")
> > diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> > index 941eec921b..4ae6504f3e 100644
> > --- a/xen/arch/arm/psci.c
> > +++ b/xen/arch/arm/psci.c
> > @@ -42,42 +42,53 @@ uint32_t smccc_ver;
> >  
> >  static uint32_t psci_cpu_on_nr;
> >  
> > +#define PSCI_RET(res)   ((int32_t)(res).a0)
> > +
> >  int call_psci_cpu_on(int cpu)
> >  {
> > -return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), 
> > __pa(init_secondary), 0);
> > +struct arm_smccc_res res;
> > +
> > +arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), 
> > __pa(init_secondary),
> > +  );
> > +
> > +return PSCI_RET(res.a0);
> >  }
> >  
> >  void call_psci_cpu_off(void)
> >  {
> >  if ( psci_ver > PSCI_VERSION(0, 1) )
> >  {
> > -int errno;
> > +struct arm_smccc_res res;
> >  
> >  /* If successfull the PSCI cpu_off call doesn't return */
> > -errno = call_smc(PSCI_0_2_FN32_CPU_OFF, 0, 0, 0);
> > +arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, );
> >  panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
> > -  errno);
> > +  PSCI_RET(res));
> >  }
> >  }
> >  
> >  void call_psci_system_off(void)
> >  {
> >  if ( psci_ver > PSCI_VERSION(0, 1) )
> > -call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
> > +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
> >  }
> >  
> >  void call_psci_system_reset(void)
> >  {
> >  if ( psci_ver > PSCI_VERSION(0, 1) )
> > -call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
> > +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
> >  }
> >  
> >  static int __init psci_features(uint32_t psci_func_id)
> >  {
> > +struct arm_smccc_res res;
> > +
> >  if ( psci_ver < PSCI_VERSION(1, 0) )
> >  return PSCI_NOT_SUPPORTED;
> >  
> > -return call_smc(PSCI_1_0_FN32_PSCI_FEATURES, psci_func_id, 0, 0);
> > +

Re: [Xen-devel] [[PATCH v3] 4/4] xen/arm: Replace call_smc with arm_smccc_smc

2018-10-01 Thread Stefano Stabellini
On Mon, 1 Oct 2018, Julien Grall wrote:
> call_smc is a subset of arm_smccc_smc. Rather than having 2 methods to
> do SMCCC call, replace all call to the former by the later.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 

(you might want to make sure the double [[ is removed on commit.)

> ---
> 
> Changes in v3:
> - Use PSCI_RET where needed
> ---
>  xen/arch/arm/Makefile|  1 -
>  xen/arch/arm/platforms/exynos5.c |  3 ++-
>  xen/arch/arm/platforms/seattle.c |  4 ++--
>  xen/arch/arm/psci.c  | 37 +
>  xen/arch/arm/smc.S   | 21 -
>  xen/include/asm-arm/processor.h  |  3 ---
>  6 files changed, 29 insertions(+), 40 deletions(-)
>  delete mode 100644 xen/arch/arm/smc.S
> 
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index b9b141dc84..37fa8268b3 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -39,7 +39,6 @@ obj-y += processor.o
>  obj-y += psci.o
>  obj-y += setup.o
>  obj-y += shutdown.o
> -obj-y += smc.o
>  obj-y += smp.o
>  obj-y += smpboot.o
>  obj-y += sysctl.o
> diff --git a/xen/arch/arm/platforms/exynos5.c 
> b/xen/arch/arm/platforms/exynos5.c
> index c15ecf80f5..e2c0b7b878 100644
> --- a/xen/arch/arm/platforms/exynos5.c
> +++ b/xen/arch/arm/platforms/exynos5.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static bool secure_firmware;
>  
> @@ -249,7 +250,7 @@ static int exynos5_cpu_up(int cpu)
>  iounmap(power);
>  
>  if ( secure_firmware )
> -call_smc(SMC_CMD_CPU1BOOT, cpu, 0, 0);
> +arm_smccc_smc(SMC_CMD_CPU1BOOT, cpu, NULL);
>  
>  return cpu_up_send_sgi(cpu);
>  }
> diff --git a/xen/arch/arm/platforms/seattle.c 
> b/xen/arch/arm/platforms/seattle.c
> index 893cc17972..64cc1868c2 100644
> --- a/xen/arch/arm/platforms/seattle.c
> +++ b/xen/arch/arm/platforms/seattle.c
> @@ -33,12 +33,12 @@ static const char * const seattle_dt_compat[] __initconst 
> =
>   */
>  static void seattle_system_reset(void)
>  {
> -call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
> +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
>  }
>  
>  static void seattle_system_off(void)
>  {
> -call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
> +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
>  }
>  
>  PLATFORM_START(seattle, "SEATTLE")
> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> index 941eec921b..4ae6504f3e 100644
> --- a/xen/arch/arm/psci.c
> +++ b/xen/arch/arm/psci.c
> @@ -42,42 +42,53 @@ uint32_t smccc_ver;
>  
>  static uint32_t psci_cpu_on_nr;
>  
> +#define PSCI_RET(res)   ((int32_t)(res).a0)
> +
>  int call_psci_cpu_on(int cpu)
>  {
> -return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), 
> __pa(init_secondary), 0);
> +struct arm_smccc_res res;
> +
> +arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary),
> +  );
> +
> +return PSCI_RET(res.a0);
>  }
>  
>  void call_psci_cpu_off(void)
>  {
>  if ( psci_ver > PSCI_VERSION(0, 1) )
>  {
> -int errno;
> +struct arm_smccc_res res;
>  
>  /* If successfull the PSCI cpu_off call doesn't return */
> -errno = call_smc(PSCI_0_2_FN32_CPU_OFF, 0, 0, 0);
> +arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, );
>  panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
> -  errno);
> +  PSCI_RET(res));
>  }
>  }
>  
>  void call_psci_system_off(void)
>  {
>  if ( psci_ver > PSCI_VERSION(0, 1) )
> -call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
> +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
>  }
>  
>  void call_psci_system_reset(void)
>  {
>  if ( psci_ver > PSCI_VERSION(0, 1) )
> -call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
> +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
>  }
>  
>  static int __init psci_features(uint32_t psci_func_id)
>  {
> +struct arm_smccc_res res;
> +
>  if ( psci_ver < PSCI_VERSION(1, 0) )
>  return PSCI_NOT_SUPPORTED;
>  
> -return call_smc(PSCI_1_0_FN32_PSCI_FEATURES, psci_func_id, 0, 0);
> +arm_smccc_smc(PSCI_1_0_FN32_PSCI_FEATURES, psci_func_id, NULL);
> +
> +return PSCI_RET(res);
>  }
>  
>  static int __init psci_is_smc_method(const struct dt_device_node *psci)
> @@ -112,11 +123,11 @@ static void __init psci_init_smccc(void)
>  
>  if ( psci_features(ARM_SMCCC_VERSION_FID) != PSCI_NOT_SUPPORTED )
>  {
> -uint32_t ret;
> +struct arm_smccc_res res;
>  
> -ret = call_smc(ARM_SMCCC_VERSION_FID, 0, 0, 0);
> -if ( ret != ARM_SMCCC_NOT_SUPPORTED )
> -smccc_ver = ret;
> +arm_smccc_smc(ARM_SMCCC_VERSION_FID, );
> +if ( PSCI_RET(res) != ARM_SMCCC_NOT_SUPPORTED )
> +smccc_ver = PSCI_RET(res);
>  }
>  
>  if ( smccc_ver >= SMCCC_VERSION(1, 1) )
> @@ -165,6 +176,7 @@ static int __init psci_init_0_2(void)
>  { /* 

Re: [Xen-devel] [[PATCH v3] 2/4] xen/arm: cpufeature: Add helper to check constant caps

2018-10-01 Thread Stefano Stabellini
On Mon, 1 Oct 2018, Julien Grall wrote:
> Some capababilities are set right during boot and will never change
> afterwards. At the moment, the function cpu_have_caps will check whether
> the cap is enabled from the memory.
> 
> It is possible to avoid the load from the memory by using an
> ALTERNATIVE. With that the check is just reduced to 1 instruction.
> 
> Signed-off-by: Julien Grall 

Reviewed-by: Stefano Stabellini 

> ---
> 
> This is the static key for the poor. At some point we might want to
> introduce something similar to static key in Xen.
> 
> Changes in v2:
> - Use unlikely
> ---
>  xen/include/asm-arm/cpufeature.h | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/xen/include/asm-arm/cpufeature.h 
> b/xen/include/asm-arm/cpufeature.h
> index 3de6b54301..c6cbc2ec84 100644
> --- a/xen/include/asm-arm/cpufeature.h
> +++ b/xen/include/asm-arm/cpufeature.h
> @@ -63,6 +63,18 @@ static inline bool cpus_have_cap(unsigned int num)
>  return test_bit(num, cpu_hwcaps);
>  }
>  
> +/* System capability check for constant cap */
> +#define cpus_have_const_cap(num) ({ \
> +bool __ret; \
> +\
> +asm volatile (ALTERNATIVE("mov %0, #0", \
> +  "mov %0, #1", \
> +  num)  \
> +  : "=r" (__ret));  \
> +\
> +unlikely(__ret);\
> +})
> +
>  static inline void cpus_set_cap(unsigned int num)
>  {
>  if (num >= ARM_NCAPS)
> -- 
> 2.11.0
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 128288: tolerable all pass - PUSHED

2018-10-01 Thread osstest service owner
flight 128288 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128288/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  3722e563432bdbf0cbd16ace3075cf643186b01e
baseline version:
 xen  5c28a3c6814cc3a054fde133c0f6ef77d80c0412

Last test of basis   128283  2018-10-01 13:00:52 Z0 days
Testing same since   128288  2018-10-01 17:00:45 Z0 days1 attempts


People who touched revisions under test:
  George Dunlap 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   5c28a3c681..3722e56343  3722e563432bdbf0cbd16ace3075cf643186b01e -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/2] xen/arm: vgic-v3: Don't create empty re-distributor regions

2018-10-01 Thread Julien Grall

Hi,

I forgot to remove patch from the previous series before sending the new 
one.


Please ignore that patch. Sorry for the noise.

Cheers,


On 10/01/2018 07:57 PM, Julien Grall wrote:

At the moment, Xen is assuming the hardware domain will have the same
number of re-distributor regions as the host. However, as the
number of CPUs or the stride (e.g on GICv4) may be different we end up
exposing regions which does not contain any re-distributors.

When booting, Linux will go through all the re-distributor region to
check whether a property (e.g vPLIs) is available accross all the
re-distributors. This will result to a data abort on empty regions
because there are no underlying re-distributor.

So we need to limit the number of regions exposed to the hardware
domain. The code reworked to only expose the minimun number of regions
required by the hardware domain. It is assumed the regions will be
populated starting from the first one.

Lastly, rename vgic_v3_rdist_count to reflect the value return by the
helper.

Reported-by: Shameerali Kolothum Thodi 
Signed-off-by: Julien Grall 
Tested-by: Shameer Kolothum 

---
 Changes in v2:
 - Rename vgic_v3_rdist_count to vgic_v3_max_rdist_count
 - Fixup #re-distributors
 - Fix typoes
 - Add Shameer's tested tag
---
  xen/arch/arm/gic-v3.c  | 14 +++---
  xen/arch/arm/vgic-v3.c | 21 ++---
  2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index c98a163ee7..2c1454f425 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1265,7 +1265,8 @@ static int gicv3_make_hwdom_dt_node(const struct domain 
*d,
  if ( res )
  return res;
  
-res = fdt_property_cell(fdt, "#redistributor-regions", gicv3.rdist_count);

+res = fdt_property_cell(fdt, "#redistributor-regions",
+d->arch.vgic.nr_regions);
  if ( res )
  return res;
  
@@ -1274,8 +1275,10 @@ static int gicv3_make_hwdom_dt_node(const struct domain *d,

   * GIC has two memory regions: Distributor + rdist regions
   * CPU interface and virtual cpu interfaces accessesed as System registers
   * So cells are created only for Distributor and rdist regions
+ * The hardware domain may not use all the regions. So only copy
+ * what is necessary.
   */
-new_len = new_len * (gicv3.rdist_count + 1);
+new_len = new_len * (d->arch.vgic.nr_regions + 1);
  
  hw_reg = dt_get_property(gic, "reg", );

  if ( !hw_reg )
@@ -1466,6 +1469,7 @@ static inline bool gic_dist_supports_dvis(void)
  }
  
  static int gicv3_make_hwdom_madt(const struct domain *d, u32 offset)

+
  {
  struct acpi_subtable_header *header;
  struct acpi_madt_generic_interrupt *host_gicc, *gicc;
@@ -1503,7 +1507,11 @@ static int gicv3_make_hwdom_madt(const struct domain *d, 
u32 offset)
  
  /* Add Generic Redistributor */

  size = sizeof(struct acpi_madt_generic_redistributor);
-for ( i = 0; i < gicv3.rdist_count; i++ )
+/*
+ * The hardware domain may not used all the regions. So only copy
+ * what is necessary.
+ */
+for ( i = 0; i < d->arch.vgic.nr_regions; i++ )
  {
  gicr = (struct acpi_madt_generic_redistributor *)(base_ptr + 
table_len);
  gicr->header.type = ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR;
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index df1bab3a35..efe824c6fb 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1640,7 +1640,11 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
  return 0;
  }
  
-static inline unsigned int vgic_v3_rdist_count(struct domain *d)

+/*
+ * Return the maximum number possible of re-distributor regions for
+ * a given domain.
+ */
+static inline unsigned int vgic_v3_max_rdist_count(struct domain *d)
  {
  /*
   * Normally there is only one GICv3 redistributor region.
@@ -1662,7 +1666,7 @@ static int vgic_v3_real_domain_init(struct domain *d)
  int rdist_count, i, ret;
  
  /* Allocate memory for Re-distributor regions */

-rdist_count = vgic_v3_rdist_count(d);
+rdist_count = vgic_v3_max_rdist_count(d);
  
  rdist_regions = xzalloc_array(struct vgic_rdist_region, rdist_count);

  if ( !rdist_regions )
@@ -1695,8 +1699,19 @@ static int vgic_v3_real_domain_init(struct domain *d)
  d->arch.vgic.rdist_regions[i].first_cpu = first_cpu;
  
  first_cpu += size / GICV3_GICR_SIZE;

+
+if ( first_cpu >= d->max_vcpus )
+break;
  }
  
+/*

+ * The hardware domain may not use all the re-distributors
+ * regions (e.g when the number of vCPUs does not match the
+ * number of pCPUs). Update the number of regions to avoid
+ * exposing unused region as they will not get emulated.
+ */
+d->arch.vgic.nr_regions = i + 1;
+
  d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
  

Re: [Xen-devel] [PATCH v2 1/2] xen/arm: vgic-v3: Delay the initialization of the domain information

2018-10-01 Thread Julien Grall

Hi,

I forgot to remove patch from the previous series before sending the new 
one.


Please ignore that patch. Sorry for the noise.

Cheers,

On 10/01/2018 07:57 PM, Julien Grall wrote:

A follow-up patch will require to know the number of vCPUs when
initializating the vGICv3 domain structure. However this information is
not available at domain creation. This is only known once
XEN_DOMCTL_max_vpus is called for that domain.

In order to get the max vCPUs around, delay the domain part of the vGIC
v3 initialization until the first vCPU of the domain is initialized.

Signed-off-by: Julien Grall 
Tested-by: Shameer Kolothum 
Acked-but-disliked-by: Stefano Stabellini 

---

Cc: Andrew Cooper 

This is nasty but I can't find a better way for Xen 4.11 and older. We
still need it for unstable because the number of vCPUs is not known in
arch_domain_init. There are discussion to rework the domain creation a
bit further but I would hope to fix the bug first.

Andrew, I have CCed you to know whether you have a better idea where to
place this call on Xen 4.11 and older.

 Changes in v2:
 - The patch is also needed for the time being on unstable
 - Add Stefano's recently invented tag
 - Add Shameer's tested tag
---
  xen/arch/arm/vgic-v3.c | 29 +++--
  1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 4b42739a52..df1bab3a35 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1573,9 +1573,11 @@ static const struct mmio_handler_ops 
vgic_distr_mmio_handler = {
  .write = vgic_v3_distr_mmio_write,
  };
  
+static int vgic_v3_real_domain_init(struct domain *d);

+
  static int vgic_v3_vcpu_init(struct vcpu *v)
  {
-int i;
+int i, rc;
  paddr_t rdist_base;
  struct vgic_rdist_region *region;
  unsigned int last_cpu;
@@ -1584,6 +1586,19 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
  struct domain *d = v->domain;
  
  /*

+ * This is the earliest place where the number of vCPUs is
+ * known. This is required to initialize correctly the vGIC v3
+ * domain structure. We only to do that when vCPU 0 is
+ * initilialized.
+ */
+if ( v->vcpu_id == 0 )
+{
+rc = vgic_v3_real_domain_init(d);
+if ( rc )
+return rc;
+}
+
+/*
   * Find the region where the re-distributor lives. For this purpose,
   * we look one region ahead as we have only the first CPU in hand.
   */
@@ -1641,7 +1656,7 @@ static inline unsigned int vgic_v3_rdist_count(struct 
domain *d)
 GUEST_GICV3_RDIST_REGIONS;
  }
  
-static int vgic_v3_domain_init(struct domain *d)

+static int vgic_v3_real_domain_init(struct domain *d)
  {
  struct vgic_rdist_region *rdist_regions;
  int rdist_count, i, ret;
@@ -1733,6 +1748,16 @@ static int vgic_v3_domain_init(struct domain *d)
  return 0;
  }
  
+static int vgic_v3_domain_init(struct domain *d)

+{
+/*
+ * The domain initialization for vGIC v3 is delayed until the first vCPU
+ * is created. This because the initialization may require to know the
+ * number of vCPUs that is not known when creating the domain.
+ */
+return 0;
+}
+
  static void vgic_v3_domain_free(struct domain *d)
  {
  vgic_v3_its_free_domain(d);



--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 2/3] tools/libxl: Deprecate PV fields kernel, ramdisk, cmdline

2018-10-01 Thread Julien Grall
The PV fields kernel, ramdisk, cmdline are only there for compatibility
with old toolstack. Instead of manually copying them over to there new
field, use the deprecated_by attribute in the IDL.

Suggested-by: Roger Pau Monné 
Signed-off-by: Julien Grall 

---
Changes in v3:
- Patch added
---
 tools/libxl/libxl_create.c  | 19 ---
 tools/libxl/libxl_types.idl |  6 +++---
 2 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 580320d272..fe97eebdea 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -368,25 +368,6 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 b_info->shadow_memkb = 0;
 if (b_info->u.pv.slack_memkb == LIBXL_MEMKB_DEFAULT)
 b_info->u.pv.slack_memkb = 0;
-
-/* For compatibility, fill in b_info->kernel|ramdisk|cmdline
- * with the value in u.pv, later processing will use
- * b_info->kernel|ramdisk|cmdline only.
- * User with old APIs that passes u.pv.kernel|ramdisk|cmdline
- * is not affected.
- */
-if (!b_info->kernel && b_info->u.pv.kernel) {
-b_info->kernel = b_info->u.pv.kernel;
-b_info->u.pv.kernel = NULL;
-}
-if (!b_info->ramdisk && b_info->u.pv.ramdisk) {
-b_info->ramdisk = b_info->u.pv.ramdisk;
-b_info->u.pv.ramdisk = NULL;
-}
-if (!b_info->cmdline && b_info->u.pv.cmdline) {
-b_info->cmdline = b_info->u.pv.cmdline;
-b_info->u.pv.cmdline = NULL;
-}
 break;
 case LIBXL_DOMAIN_TYPE_PVH:
 libxl_defbool_setdefault(_info->u.pvh.pvshim, false);
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 2cceb8c057..3b8f967651 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -588,12 +588,12 @@ libxl_domain_build_info = Struct("domain_build_info",[
("rdm_mem_boundary_memkb", MemKB),
("mca_caps", uint64),
])),
- ("pv", Struct(None, [("kernel", string),
+ ("pv", Struct(None, [("kernel", string, {'deprecated_by': 
'kernel'}),
   ("slack_memkb", MemKB),
   ("bootloader", string, {'deprecated_by': 
'bootloader'}),
   ("bootloader_args", libxl_string_list, 
{'deprecated_by': 'bootloader_args'}),
-  ("cmdline", string),
-  ("ramdisk", string),
+  ("cmdline", string, {'deprecated_by': 
'cmdline'}),
+  ("ramdisk", string, {'deprecated_by': 
'ramdisk'}),
   ("features", string, {'const': True}),
   # Use host's E820 for PCI passthrough.
   ("e820_host", libxl_defbool),
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/2] xen/arm: vgic-v3: Don't create empty re-distributor regions

2018-10-01 Thread Julien Grall
At the moment, Xen is assuming the hardware domain will have the same
number of re-distributor regions as the host. However, as the
number of CPUs or the stride (e.g on GICv4) may be different we end up
exposing regions which does not contain any re-distributors.

When booting, Linux will go through all the re-distributor region to
check whether a property (e.g vPLIs) is available accross all the
re-distributors. This will result to a data abort on empty regions
because there are no underlying re-distributor.

So we need to limit the number of regions exposed to the hardware
domain. The code reworked to only expose the minimun number of regions
required by the hardware domain. It is assumed the regions will be
populated starting from the first one.

Lastly, rename vgic_v3_rdist_count to reflect the value return by the
helper.

Reported-by: Shameerali Kolothum Thodi 
Signed-off-by: Julien Grall 
Tested-by: Shameer Kolothum 

---
Changes in v2:
- Rename vgic_v3_rdist_count to vgic_v3_max_rdist_count
- Fixup #re-distributors
- Fix typoes
- Add Shameer's tested tag
---
 xen/arch/arm/gic-v3.c  | 14 +++---
 xen/arch/arm/vgic-v3.c | 21 ++---
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index c98a163ee7..2c1454f425 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1265,7 +1265,8 @@ static int gicv3_make_hwdom_dt_node(const struct domain 
*d,
 if ( res )
 return res;
 
-res = fdt_property_cell(fdt, "#redistributor-regions", gicv3.rdist_count);
+res = fdt_property_cell(fdt, "#redistributor-regions",
+d->arch.vgic.nr_regions);
 if ( res )
 return res;
 
@@ -1274,8 +1275,10 @@ static int gicv3_make_hwdom_dt_node(const struct domain 
*d,
  * GIC has two memory regions: Distributor + rdist regions
  * CPU interface and virtual cpu interfaces accessesed as System registers
  * So cells are created only for Distributor and rdist regions
+ * The hardware domain may not use all the regions. So only copy
+ * what is necessary.
  */
-new_len = new_len * (gicv3.rdist_count + 1);
+new_len = new_len * (d->arch.vgic.nr_regions + 1);
 
 hw_reg = dt_get_property(gic, "reg", );
 if ( !hw_reg )
@@ -1466,6 +1469,7 @@ static inline bool gic_dist_supports_dvis(void)
 }
 
 static int gicv3_make_hwdom_madt(const struct domain *d, u32 offset)
+
 {
 struct acpi_subtable_header *header;
 struct acpi_madt_generic_interrupt *host_gicc, *gicc;
@@ -1503,7 +1507,11 @@ static int gicv3_make_hwdom_madt(const struct domain *d, 
u32 offset)
 
 /* Add Generic Redistributor */
 size = sizeof(struct acpi_madt_generic_redistributor);
-for ( i = 0; i < gicv3.rdist_count; i++ )
+/*
+ * The hardware domain may not used all the regions. So only copy
+ * what is necessary.
+ */
+for ( i = 0; i < d->arch.vgic.nr_regions; i++ )
 {
 gicr = (struct acpi_madt_generic_redistributor *)(base_ptr + 
table_len);
 gicr->header.type = ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR;
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index df1bab3a35..efe824c6fb 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1640,7 +1640,11 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
 return 0;
 }
 
-static inline unsigned int vgic_v3_rdist_count(struct domain *d)
+/*
+ * Return the maximum number possible of re-distributor regions for
+ * a given domain.
+ */
+static inline unsigned int vgic_v3_max_rdist_count(struct domain *d)
 {
 /*
  * Normally there is only one GICv3 redistributor region.
@@ -1662,7 +1666,7 @@ static int vgic_v3_real_domain_init(struct domain *d)
 int rdist_count, i, ret;
 
 /* Allocate memory for Re-distributor regions */
-rdist_count = vgic_v3_rdist_count(d);
+rdist_count = vgic_v3_max_rdist_count(d);
 
 rdist_regions = xzalloc_array(struct vgic_rdist_region, rdist_count);
 if ( !rdist_regions )
@@ -1695,8 +1699,19 @@ static int vgic_v3_real_domain_init(struct domain *d)
 d->arch.vgic.rdist_regions[i].first_cpu = first_cpu;
 
 first_cpu += size / GICV3_GICR_SIZE;
+
+if ( first_cpu >= d->max_vcpus )
+break;
 }
 
+/*
+ * The hardware domain may not use all the re-distributors
+ * regions (e.g when the number of vCPUs does not match the
+ * number of pCPUs). Update the number of regions to avoid
+ * exposing unused region as they will not get emulated.
+ */
+d->arch.vgic.nr_regions = i + 1;
+
 d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
 }
 else
@@ -1825,7 +1840,7 @@ int vgic_v3_init(struct domain *d, int *mmio_count)
 }
 
 /* GICD region + number of Redistributors */
-*mmio_count = vgic_v3_rdist_count(d) + 1;
+*mmio_count = vgic_v3_max_rdist_count(d) + 1;
 
  

[Xen-devel] [PATCH v3 1/3] tools/libxl: Rename libxl__arch_domain_build_info_acpi_setdefault to...

2018-10-01 Thread Julien Grall
libxl__arch_domain_build_info_setdefault

A follow-up will require to modify default of multiple fields of
build_info. So rename the function accordingly.

No functional change.

Signed-off-by: Julien Grall 
Reviewed-by: Roger Pau Monné 
Acked-by: Wei Liu 

---

Changes in v3:
- Add Roger's reviewed-by
- Add Wei's acked-by
---
 tools/libxl/libxl_arch.h   | 3 +--
 tools/libxl/libxl_arm.c| 4 ++--
 tools/libxl/libxl_create.c | 2 +-
 tools/libxl/libxl_x86.c| 3 +--
 4 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index c8ccaaf14c..5ab0c95974 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -65,8 +65,7 @@ _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
 _hidden
-void libxl__arch_domain_build_info_acpi_setdefault(
-libxl_domain_build_info *b_info);
+void libxl__arch_domain_build_info_setdefault(libxl_domain_build_info *b_info);
 
 _hidden
 int libxl__arch_extra_memory(libxl__gc *gc,
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index baa0d38e01..699fd9ddc6 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -1110,9 +1110,9 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq, irq);
 }
 
-void libxl__arch_domain_build_info_acpi_setdefault(
-libxl_domain_build_info *b_info)
+void libxl__arch_domain_build_info_setdefault(libxl_domain_build_info *b_info)
 {
+/* ACPI is disabled by default */
 libxl_defbool_setdefault(_info->acpi, false);
 }
 
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index dcfde7787e..580320d272 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -215,7 +215,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 if (!b_info->event_channels)
 b_info->event_channels = 1023;
 
-libxl__arch_domain_build_info_acpi_setdefault(b_info);
+libxl__arch_domain_build_info_setdefault(b_info);
 libxl_defbool_setdefault(_info->dm_restrict, false);
 
 switch (b_info->type) {
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 6f670b03b5..81523a568f 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -613,8 +613,7 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc 
*gc,
 return rc;
 }
 
-void libxl__arch_domain_build_info_acpi_setdefault(
-libxl_domain_build_info *b_info)
+void libxl__arch_domain_build_info_setdefault(libxl_domain_build_info *b_info)
 {
 libxl_defbool_setdefault(_info->acpi, true);
 }
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/2] xen/arm: vgic-v3: Delay the initialization of the domain information

2018-10-01 Thread Julien Grall
A follow-up patch will require to know the number of vCPUs when
initializating the vGICv3 domain structure. However this information is
not available at domain creation. This is only known once
XEN_DOMCTL_max_vpus is called for that domain.

In order to get the max vCPUs around, delay the domain part of the vGIC
v3 initialization until the first vCPU of the domain is initialized.

Signed-off-by: Julien Grall 
Tested-by: Shameer Kolothum 
Acked-but-disliked-by: Stefano Stabellini 

---

Cc: Andrew Cooper 

This is nasty but I can't find a better way for Xen 4.11 and older. We
still need it for unstable because the number of vCPUs is not known in
arch_domain_init. There are discussion to rework the domain creation a
bit further but I would hope to fix the bug first.

Andrew, I have CCed you to know whether you have a better idea where to
place this call on Xen 4.11 and older.

Changes in v2:
- The patch is also needed for the time being on unstable
- Add Stefano's recently invented tag
- Add Shameer's tested tag
---
 xen/arch/arm/vgic-v3.c | 29 +++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 4b42739a52..df1bab3a35 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1573,9 +1573,11 @@ static const struct mmio_handler_ops 
vgic_distr_mmio_handler = {
 .write = vgic_v3_distr_mmio_write,
 };
 
+static int vgic_v3_real_domain_init(struct domain *d);
+
 static int vgic_v3_vcpu_init(struct vcpu *v)
 {
-int i;
+int i, rc;
 paddr_t rdist_base;
 struct vgic_rdist_region *region;
 unsigned int last_cpu;
@@ -1584,6 +1586,19 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
 struct domain *d = v->domain;
 
 /*
+ * This is the earliest place where the number of vCPUs is
+ * known. This is required to initialize correctly the vGIC v3
+ * domain structure. We only to do that when vCPU 0 is
+ * initilialized.
+ */
+if ( v->vcpu_id == 0 )
+{
+rc = vgic_v3_real_domain_init(d);
+if ( rc )
+return rc;
+}
+
+/*
  * Find the region where the re-distributor lives. For this purpose,
  * we look one region ahead as we have only the first CPU in hand.
  */
@@ -1641,7 +1656,7 @@ static inline unsigned int vgic_v3_rdist_count(struct 
domain *d)
GUEST_GICV3_RDIST_REGIONS;
 }
 
-static int vgic_v3_domain_init(struct domain *d)
+static int vgic_v3_real_domain_init(struct domain *d)
 {
 struct vgic_rdist_region *rdist_regions;
 int rdist_count, i, ret;
@@ -1733,6 +1748,16 @@ static int vgic_v3_domain_init(struct domain *d)
 return 0;
 }
 
+static int vgic_v3_domain_init(struct domain *d)
+{
+/*
+ * The domain initialization for vGIC v3 is delayed until the first vCPU
+ * is created. This because the initialization may require to know the
+ * number of vCPUs that is not known when creating the domain.
+ */
+return 0;
+}
+
 static void vgic_v3_domain_free(struct domain *d)
 {
 vgic_v3_its_free_domain(d);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 0/3] tools/libxl: Switch Arm guest type to PVH

2018-10-01 Thread Julien Grall
Hi all,

This small patch series adds switch Arm guest type to PVH in libxl. See
patch #2 for the rationale.

Cheers,

Julien Grall (3):
  tools/libxl: Rename libxl__arch_domain_build_info_acpi_setdefault
to...
  tools/libxl: Deprecate PV fields kernel, ramdisk, cmdline
  tools/libxl: Switch Arm guest type to PVH

 docs/man/xl.cfg.pod.5.in|  5 +++--
 tools/libxl/libxl_arch.h|  4 ++--
 tools/libxl/libxl_arm.c | 28 +---
 tools/libxl/libxl_create.c  | 21 +
 tools/libxl/libxl_types.idl |  6 +++---
 tools/libxl/libxl_x86.c |  4 ++--
 tools/xl/xl_parse.c |  4 
 7 files changed, 40 insertions(+), 32 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v3 3/3] tools/libxl: Switch Arm guest type to PVH

2018-10-01 Thread Julien Grall
Currently, the toolstack is considering Arm guest always PV. However,
they are very similar to PVH because HW virtualization extension are used
and QEMU is not started. So switch Arm guest type to PVH.

To keep compatibility with toolstack creating Arm guest with PV type
(e.g libvirt), libxl will now convert those guests to PVH.

Furthermore, the default type for Arm in xl will now be PVH to allow
smooth transition for user.

Signed-off-by: Julien Grall 

---

This was discussed at Xen Summit and also in various thread on
xen-devel. The latest one was when Andrew sent a patch to deny guest creation
on Arm with XEN_DOMCTL_CDF_hap unset.

I suspect we first implemented Arm guest as PV in libxl because PVH was
non-existent and the type was easier to avoid spawning QEMU. Note that
Linux and Xen are already considering Arm guest as PVH.

Changes in v3:
- Properly reset u.pvh
- Update documentation and print
- Return ERROR_INVAL rather than ERROR_FAIL

Changes in v2:
- Rather than denying PV guest, convert them to PVH
---
 docs/man/xl.cfg.pod.5.in   |  5 +++--
 tools/libxl/libxl_arch.h   |  3 ++-
 tools/libxl/libxl_arm.c| 26 --
 tools/libxl/libxl_create.c |  2 +-
 tools/libxl/libxl_x86.c|  3 ++-
 tools/xl/xl_parse.c|  4 
 6 files changed, 36 insertions(+), 7 deletions(-)

diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
index b72718151b..b1c0be14cd 100644
--- a/docs/man/xl.cfg.pod.5.in
+++ b/docs/man/xl.cfg.pod.5.in
@@ -80,13 +80,14 @@ single host must be unique.
 =item B
 
 Specifies that this is to be a PV domain, suitable for hosting Xen-aware
-guest operating systems. This is the default.
+guest operating systems. This is the default on x86.
 
 =item B
 
 Specifies that this is to be an PVH domain. That is a lightweight HVM-like
 guest without a device model and without many of the emulated devices
-available to HVM guests. Note that this mode requires a PVH aware kernel.
+available to HVM guests. Note that this mode requires a PVH aware kernel on
+x86. This is the default on Arm.
 
 =item B
 
diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index 5ab0c95974..930570ef1e 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -65,7 +65,8 @@ _hidden
 int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
 
 _hidden
-void libxl__arch_domain_build_info_setdefault(libxl_domain_build_info *b_info);
+void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
+  libxl_domain_build_info *b_info);
 
 _hidden
 int libxl__arch_extra_memory(libxl__gc *gc,
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 699fd9ddc6..25dc3defc6 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -953,7 +953,11 @@ int libxl__arch_domain_init_hw_description(libxl__gc *gc,
 int rc;
 uint64_t val;
 
-assert(info->type == LIBXL_DOMAIN_TYPE_PV);
+if (info->type != LIBXL_DOMAIN_TYPE_PVH) {
+LOG(ERROR, "Unsupported Arm guest type %s",
+libxl_domain_type_to_string(info->type));
+return ERROR_INVAL;
+}
 
 /* Set the value of domain param HVM_PARAM_CALLBACK_IRQ. */
 val = MASK_INSR(HVM_PARAM_CALLBACK_TYPE_PPI,
@@ -1110,10 +1114,28 @@ int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t 
domid, int irq)
 return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq, irq);
 }
 
-void libxl__arch_domain_build_info_setdefault(libxl_domain_build_info *b_info)
+void libxl__arch_domain_build_info_setdefault(libxl__gc *gc,
+  libxl_domain_build_info *b_info)
 {
 /* ACPI is disabled by default */
 libxl_defbool_setdefault(_info->acpi, false);
+
+/*
+ * Arm guest are now considered as PVH by the toolstack. To allow
+ * compatibility with previous toolstack, PV guest are automatically
+ * converted to PVH.
+ */
+if (b_info->type != LIBXL_DOMAIN_TYPE_PV)
+return;
+
+LOG(WARN, "Converting PV guest to PVH.");
+LOG(WARN, "Arm guest are now PVH.");
+LOG(WARN, "Please fix your configuration file/toolstack.");
+
+/* Re-initialize type to PVH and all associated fields to defaults. */
+memset(_info->u, '\0', sizeof(b_info->u));
+b_info->type = LIBXL_DOMAIN_TYPE_INVALID;
+libxl_domain_build_info_init_type(b_info, LIBXL_DOMAIN_TYPE_PVH);
 }
 
 /*
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index fe97eebdea..320dbed3c6 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -215,7 +215,7 @@ int libxl__domain_build_info_setdefault(libxl__gc *gc,
 if (!b_info->event_channels)
 b_info->event_channels = 1023;
 
-libxl__arch_domain_build_info_setdefault(b_info);
+libxl__arch_domain_build_info_setdefault(gc, b_info);
 libxl_defbool_setdefault(_info->dm_restrict, false);
 
 switch (b_info->type) {
diff --git 

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Dario Faggioli
On Mon, 2018-10-01 at 16:35 +0100, Wei Liu wrote:
> On Mon, Oct 01, 2018 at 04:19:07PM +0100, George Dunlap wrote:
> > Wait, the migration code reads the scheduler parameters -- even if
> > these
> > have not been explicitly set by the admin -- and sends them along
> > with
> > the migration stream?  And if the remote scheduler is different,
> > the
> > migration fails?
> > 
> > That's not so good. :-)
> 
> But one can argue that the guest is specific configured that way so
> it's
> parameters should be preserved. We normally analyse things on a case
> by
> case basis.
> 
Sure, but then this means that, in the restore path, we should query
the scheduler which is in use on the host.

If it is the same one that we also see in the migration stream, we try
to set the parameters. If it is a different one, we skip the setparam
all together (which is basically what Juergen is suggesting later in
the thread, AFAICT).

I don't think I've ever worked on migration, but I can try to look into
fixing this (tomorrow).

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Dario Faggioli
On Mon, 2018-10-01 at 18:07 +0200, Juergen Gross wrote:
> On 01/10/2018 17:48, George Dunlap wrote:
> > On 10/01/2018 04:40 PM, Andrew Cooper wrote:
> > > On 01/10/18 16:35, Wei Liu wrote:
> > > > > Wait, the migration code reads the scheduler parameters --
> > > > > even if these
> > > > > have not been explicitly set by the admin -- and sends them
> > > > > along with
> > > > > the migration stream?  And if the remote scheduler is
> > > > > different, the
> > > > > migration fails?
> > > > > 
> > > > > That's not so good. :-)
> > > > 
> > > > But one can argue that the guest is specific configured that
> > > > way so it's
> > > > parameters should be preserved. We normally analyse things on a
> > > > case by
> > > > case basis.
> > > 
> > > If there isn't an obvious fix, then the switch of default
> > > scheduler
> > > needs reverting until there is a fix present.  This is currently
> > > blocking master.
> > 
> > Agreed.  I'd argue for ignoring failures to set scheduler
> > parameters on
> > migrate, on the grounds that this will be less risk to the project
> > as a
> > whole than reverting credit2 again.  But either way we should do
> > something quickly.
> 
> We should ignore a mismatch of the scheduler. Failures when setting
> parameters for a matching scheduler should not be ignored IMO.
> 
Indeed! Especially considering that this isn't really related on what
the default scheduler is (despite it being making Credit2 default that
triggers the issue).

In fact, what if:
- user uses Credit1 (default and supported) on host A
- user uses Credit2 (supported) on host B
- user migrates VM
- BOOOM!

So, unless it is intended --and, I'd say, also documented somewhere--
that migrating between hosts which use different schedulers is to be
avoided, this is already a bug, whatever the default scheduler is...

George, let me know if you're working on a fix already, or if I should
do that myself.

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Software Engineer @ SUSE https://www.suse.com/


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [qemu-mainline test] 128274: regressions - FAIL

2018-10-01 Thread osstest service owner
flight 128274 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128274/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-qcow217 guest-localmigrate/x10   fail REGR. vs. 128215

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 128215
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 128215
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 128215
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 128215
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 128215
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass

version targeted for testing:
 qemuu07f426c35eddd79388a23d11cb278600d7e3831d
baseline version:
 qemuu042938f46e1c477419d1931381fdadffaa49d45e

Last test of basis   128215  2018-09-29 02:08:34 Z2 days
Testing same since   128274  2018-10-01 08:36:57 Z0 days1 attempts


People who touched revisions under test:
  Emilio G. Cota 
  Peter Maydell 
  Richard Henderson 
  Roman Kapl 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  

Re: [Xen-devel] [PATCH 2/2] tools/libxl: Switch Arm guest type to PVH

2018-10-01 Thread Julien Grall

Hi Roger,

On 10/01/2018 05:37 PM, Roger Pau Monné wrote:

On Mon, Oct 01, 2018 at 04:33:45PM +0100, Julien Grall wrote:

Hi,

On 10/01/2018 04:27 PM, Roger Pau Monné wrote:

On Tue, Sep 25, 2018 at 08:36:36PM +0100, Julien Grall wrote:

Hi Roger,

Sorry for the late reply.

On 09/03/2018 03:40 PM, Roger Pau Monné wrote:

On Mon, Sep 03, 2018 at 12:15:16PM +0100, Julien Grall wrote:

On 03/09/18 12:09, Julien Grall wrote:



On 23/08/18 08:58, Roger Pau Monné wrote:

On Wed, Aug 22, 2018 at 06:48:05PM +0100, Julien Grall wrote:



+
+    b_info->type = LIBXL_DOMAIN_TYPE_PVH;
+
+    /*
+ * They only fie ld in u.pv that matters on Arm are:
kernel, cmdline,
+ * ramdisk.
+ */
+
+    if (!b_info->kernel && b_info->u.pv.kernel)
+    b_info->kernel = b_info->u.pv.kernel;
+
+    if (!b_info->ramdisk && b_info->u.pv.ramdisk)
+    b_info->ramdisk = b_info->u.pv.ramdisk;
+
+    if (!b_info->cmdline && b_info->u.pv.cmdline)
+    b_info->cmdline = b_info->u.pv.cmdline;
+
+    /* Reset b_info->u.pvh to default values */
+    memset(_info->u.pvh, 0, sizeof(b_info->u.pvh));


I'm afraid that's not correct. The default values for u.pvh are set
by libxl__domain_build_info_setdefault.


I thought that this should be covered by the switch right after
the call of
libxl__arch_domain_build_info_setdefault. Did I miss anything?


Oh right, libxl__arch_domain_build_info_setdefault is called by
libxl__domain_build_info_setdefault.


What I wanted to do here is resetting the union to 0 so you
don't get data
mangled by the pv fields.


Another possible option I think would be to mark those fields as
deprecated in the IDL, and libxl__domain_build_info_copy_deprecated
will take care of copying them to the new place. In fact I think all
guest types should be using the top level kernel, ramdisk and cmdline
fields.


I will have a look at it.



I'm not specially comfortable with changing the guest type in the
middle of libxl__domain_build_info_setdefault, but I also don't have a
much better suggestion apart from using the deprecation helper.


I forgot to answer to this bit. I don't think the deprecation helper will do
all the jobs. There are still PV specific parameters: slack_memkb, features,
e820_host.


Those can be left inside the PV sub-struct and shouldn't be marked as
deprecated.


Those are not necessary for Arm, if you don't zero them then you will not
initialize the PVH structure with default values. How do you suggest to
handle them?


But I guess ARM doesn't use any of those fields (or else they should
be moved to the pvh sub-struct anyway)?


Those fields should not be used by Arm. Looking at the current fields in pv,
they all should be zeroed.

However, I am not sure if we can assume that will always be the case.
If we can assume it, then I think it would just be sufficient to have
b_info->type = LIBXL_DOMAIN_TYPE_PVH in Arm.

Any thoughts?


If we go down this route for ARM I think you should add to xl.cfg that
the `type` option only applies to x86, and that there's only one guest
type on ARM. IMO it's best to not use type if there's only one type
available, so that it can be intruded later on ARM if required.


That's not what we discussed before. The idea is to default Arm guest to PVH
for xl. But "type=" would still be available.

We still have to support other type as other toolstack (e.g libvirt) may use
PV for Arm. So the idea here is libxl will check if it was PV and convert to
PVH.

In an earlier reply, you said that we can use deprecated to copy most of the
options over. However, can we always assume the pv.* fields will always be
zero after the value was copied over? If not, then we would have to memset
them.


Yes, the field is cleared/disposed after the value is copied over, see
libxl_C_type_copy_deprecated in gentypes.py.


That's answer part of my question :). AFAIU, nothing promise that a 
default value will always be 0 (see libxl_devid). I am concerned that 
someone may decide to add such field in either PV or PVH in the future.


Anyway, I think I find a way to address that, I will resend a new 
version later today.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/2] xen/arm: vgic-v3: Delay the initialization of the domain information

2018-10-01 Thread Julien Grall
A follow-up patch will require to know the number of vCPUs when
initializating the vGICv3 domain structure. However this information is
not available at domain creation. This is only known once
XEN_DOMCTL_max_vpus is called for that domain.

In order to get the max vCPUs around, delay the domain part of the vGIC
v3 initialization until the first vCPU of the domain is initialized.

Signed-off-by: Julien Grall 
Tested-by: Shameer Kolothum 
Acked-but-disliked-by: Stefano Stabellini 

---

Cc: Andrew Cooper 

This is nasty but I can't find a better way for Xen 4.11 and older. We
still need it for unstable because the number of vCPUs is not known in
arch_domain_init. There are discussion to rework the domain creation a
bit further but I would hope to fix the bug first.

Andrew, I have CCed you to know whether you have a better idea where to
place this call on Xen 4.11 and older.

Changes in v2:
- The patch is also needed for the time being on unstable
- Add Stefano's recently invented tag
- Add Shameer's tested tag
---
 xen/arch/arm/vgic-v3.c | 29 +++--
 1 file changed, 27 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index 4b42739a52..df1bab3a35 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1573,9 +1573,11 @@ static const struct mmio_handler_ops 
vgic_distr_mmio_handler = {
 .write = vgic_v3_distr_mmio_write,
 };
 
+static int vgic_v3_real_domain_init(struct domain *d);
+
 static int vgic_v3_vcpu_init(struct vcpu *v)
 {
-int i;
+int i, rc;
 paddr_t rdist_base;
 struct vgic_rdist_region *region;
 unsigned int last_cpu;
@@ -1584,6 +1586,19 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
 struct domain *d = v->domain;
 
 /*
+ * This is the earliest place where the number of vCPUs is
+ * known. This is required to initialize correctly the vGIC v3
+ * domain structure. We only to do that when vCPU 0 is
+ * initilialized.
+ */
+if ( v->vcpu_id == 0 )
+{
+rc = vgic_v3_real_domain_init(d);
+if ( rc )
+return rc;
+}
+
+/*
  * Find the region where the re-distributor lives. For this purpose,
  * we look one region ahead as we have only the first CPU in hand.
  */
@@ -1641,7 +1656,7 @@ static inline unsigned int vgic_v3_rdist_count(struct 
domain *d)
GUEST_GICV3_RDIST_REGIONS;
 }
 
-static int vgic_v3_domain_init(struct domain *d)
+static int vgic_v3_real_domain_init(struct domain *d)
 {
 struct vgic_rdist_region *rdist_regions;
 int rdist_count, i, ret;
@@ -1733,6 +1748,16 @@ static int vgic_v3_domain_init(struct domain *d)
 return 0;
 }
 
+static int vgic_v3_domain_init(struct domain *d)
+{
+/*
+ * The domain initialization for vGIC v3 is delayed until the first vCPU
+ * is created. This because the initialization may require to know the
+ * number of vCPUs that is not known when creating the domain.
+ */
+return 0;
+}
+
 static void vgic_v3_domain_free(struct domain *d)
 {
 vgic_v3_its_free_domain(d);
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/2] xen/arm: vgic-v3: Don't create empty re-distributor regions

2018-10-01 Thread Julien Grall
At the moment, Xen is assuming the hardware domain will have the same
number of re-distributor regions as the host. However, as the
number of CPUs or the stride (e.g on GICv4) may be different we end up
exposing regions which does not contain any re-distributors.

When booting, Linux will go through all the re-distributor region to
check whether a property (e.g vPLIs) is available accross all the
re-distributors. This will result to a data abort on empty regions
because there are no underlying re-distributor.

So we need to limit the number of regions exposed to the hardware
domain. The code reworked to only expose the minimun number of regions
required by the hardware domain. It is assumed the regions will be
populated starting from the first one.

Lastly, rename vgic_v3_rdist_count to reflect the value return by the
helper.

Reported-by: Shameerali Kolothum Thodi 
Signed-off-by: Julien Grall 
Tested-by: Shameer Kolothum 

---
Changes in v2:
- Rename vgic_v3_rdist_count to vgic_v3_max_rdist_count
- Fixup #re-distributors
- Fix typoes
- Add Shameer's tested tag
---
 xen/arch/arm/gic-v3.c  | 14 +++---
 xen/arch/arm/vgic-v3.c | 21 ++---
 2 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/xen/arch/arm/gic-v3.c b/xen/arch/arm/gic-v3.c
index c98a163ee7..2c1454f425 100644
--- a/xen/arch/arm/gic-v3.c
+++ b/xen/arch/arm/gic-v3.c
@@ -1265,7 +1265,8 @@ static int gicv3_make_hwdom_dt_node(const struct domain 
*d,
 if ( res )
 return res;
 
-res = fdt_property_cell(fdt, "#redistributor-regions", gicv3.rdist_count);
+res = fdt_property_cell(fdt, "#redistributor-regions",
+d->arch.vgic.nr_regions);
 if ( res )
 return res;
 
@@ -1274,8 +1275,10 @@ static int gicv3_make_hwdom_dt_node(const struct domain 
*d,
  * GIC has two memory regions: Distributor + rdist regions
  * CPU interface and virtual cpu interfaces accessesed as System registers
  * So cells are created only for Distributor and rdist regions
+ * The hardware domain may not use all the regions. So only copy
+ * what is necessary.
  */
-new_len = new_len * (gicv3.rdist_count + 1);
+new_len = new_len * (d->arch.vgic.nr_regions + 1);
 
 hw_reg = dt_get_property(gic, "reg", );
 if ( !hw_reg )
@@ -1466,6 +1469,7 @@ static inline bool gic_dist_supports_dvis(void)
 }
 
 static int gicv3_make_hwdom_madt(const struct domain *d, u32 offset)
+
 {
 struct acpi_subtable_header *header;
 struct acpi_madt_generic_interrupt *host_gicc, *gicc;
@@ -1503,7 +1507,11 @@ static int gicv3_make_hwdom_madt(const struct domain *d, 
u32 offset)
 
 /* Add Generic Redistributor */
 size = sizeof(struct acpi_madt_generic_redistributor);
-for ( i = 0; i < gicv3.rdist_count; i++ )
+/*
+ * The hardware domain may not used all the regions. So only copy
+ * what is necessary.
+ */
+for ( i = 0; i < d->arch.vgic.nr_regions; i++ )
 {
 gicr = (struct acpi_madt_generic_redistributor *)(base_ptr + 
table_len);
 gicr->header.type = ACPI_MADT_TYPE_GENERIC_REDISTRIBUTOR;
diff --git a/xen/arch/arm/vgic-v3.c b/xen/arch/arm/vgic-v3.c
index df1bab3a35..efe824c6fb 100644
--- a/xen/arch/arm/vgic-v3.c
+++ b/xen/arch/arm/vgic-v3.c
@@ -1640,7 +1640,11 @@ static int vgic_v3_vcpu_init(struct vcpu *v)
 return 0;
 }
 
-static inline unsigned int vgic_v3_rdist_count(struct domain *d)
+/*
+ * Return the maximum number possible of re-distributor regions for
+ * a given domain.
+ */
+static inline unsigned int vgic_v3_max_rdist_count(struct domain *d)
 {
 /*
  * Normally there is only one GICv3 redistributor region.
@@ -1662,7 +1666,7 @@ static int vgic_v3_real_domain_init(struct domain *d)
 int rdist_count, i, ret;
 
 /* Allocate memory for Re-distributor regions */
-rdist_count = vgic_v3_rdist_count(d);
+rdist_count = vgic_v3_max_rdist_count(d);
 
 rdist_regions = xzalloc_array(struct vgic_rdist_region, rdist_count);
 if ( !rdist_regions )
@@ -1695,8 +1699,19 @@ static int vgic_v3_real_domain_init(struct domain *d)
 d->arch.vgic.rdist_regions[i].first_cpu = first_cpu;
 
 first_cpu += size / GICV3_GICR_SIZE;
+
+if ( first_cpu >= d->max_vcpus )
+break;
 }
 
+/*
+ * The hardware domain may not use all the re-distributors
+ * regions (e.g when the number of vCPUs does not match the
+ * number of pCPUs). Update the number of regions to avoid
+ * exposing unused region as they will not get emulated.
+ */
+d->arch.vgic.nr_regions = i + 1;
+
 d->arch.vgic.intid_bits = vgic_v3_hw.intid_bits;
 }
 else
@@ -1825,7 +1840,7 @@ int vgic_v3_init(struct domain *d, int *mmio_count)
 }
 
 /* GICD region + number of Redistributors */
-*mmio_count = vgic_v3_rdist_count(d) + 1;
+*mmio_count = vgic_v3_max_rdist_count(d) + 1;
 
  

[Xen-devel] [PATCH v2 0/2] xen/arm: vgic-v3: bug fixes

2018-10-01 Thread Julien Grall
Hi all,

This series is meant to address Dom0 boot failure when running on GICv4
platforms as well as when the number of vCPUs is not equal to the numbers of
pCPUs. They should be backported up Xen 4.8.

Cheers,

Julien Grall (2):
  xen/arm: vgic-v3: Delay the initialization of the domain information
  xen/arm: vgic-v3: Don't create empty re-distributor regions

 xen/arch/arm/gic-v3.c  | 14 +++---
 xen/arch/arm/vgic-v3.c | 50 +-
 2 files changed, 56 insertions(+), 8 deletions(-)

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] tools/libxl: Switch Arm guest type to PVH

2018-10-01 Thread Roger Pau Monné
On Mon, Oct 01, 2018 at 04:33:45PM +0100, Julien Grall wrote:
> Hi,
> 
> On 10/01/2018 04:27 PM, Roger Pau Monné wrote:
> > On Tue, Sep 25, 2018 at 08:36:36PM +0100, Julien Grall wrote:
> > > Hi Roger,
> > > 
> > > Sorry for the late reply.
> > > 
> > > On 09/03/2018 03:40 PM, Roger Pau Monné wrote:
> > > > On Mon, Sep 03, 2018 at 12:15:16PM +0100, Julien Grall wrote:
> > > > > On 03/09/18 12:09, Julien Grall wrote:
> > > > > > 
> > > > > > 
> > > > > > On 23/08/18 08:58, Roger Pau Monné wrote:
> > > > > > > On Wed, Aug 22, 2018 at 06:48:05PM +0100, Julien Grall wrote:
> > > > > > > > > 
> > > > > > > > > > +
> > > > > > > > > > +    b_info->type = LIBXL_DOMAIN_TYPE_PVH;
> > > > > > > > > > +
> > > > > > > > > > +    /*
> > > > > > > > > > + * They only fie ld in u.pv that matters on Arm are:
> > > > > > > > > > kernel, cmdline,
> > > > > > > > > > + * ramdisk.
> > > > > > > > > > + */
> > > > > > > > > > +
> > > > > > > > > > +    if (!b_info->kernel && b_info->u.pv.kernel)
> > > > > > > > > > +    b_info->kernel = b_info->u.pv.kernel;
> > > > > > > > > > +
> > > > > > > > > > +    if (!b_info->ramdisk && b_info->u.pv.ramdisk)
> > > > > > > > > > +    b_info->ramdisk = b_info->u.pv.ramdisk;
> > > > > > > > > > +
> > > > > > > > > > +    if (!b_info->cmdline && b_info->u.pv.cmdline)
> > > > > > > > > > +    b_info->cmdline = b_info->u.pv.cmdline;
> > > > > > > > > > +
> > > > > > > > > > +    /* Reset b_info->u.pvh to default values */
> > > > > > > > > > +    memset(_info->u.pvh, 0, sizeof(b_info->u.pvh));
> > > > > > > > > 
> > > > > > > > > I'm afraid that's not correct. The default values for u.pvh 
> > > > > > > > > are set
> > > > > > > > > by libxl__domain_build_info_setdefault.
> > > > > > > > 
> > > > > > > > I thought that this should be covered by the switch right after
> > > > > > > > the call of
> > > > > > > > libxl__arch_domain_build_info_setdefault. Did I miss anything?
> > > > > > > 
> > > > > > > Oh right, libxl__arch_domain_build_info_setdefault is called by
> > > > > > > libxl__domain_build_info_setdefault.
> > > > > > > 
> > > > > > > > What I wanted to do here is resetting the union to 0 so you
> > > > > > > > don't get data
> > > > > > > > mangled by the pv fields.
> > > > > > > 
> > > > > > > Another possible option I think would be to mark those fields as
> > > > > > > deprecated in the IDL, and 
> > > > > > > libxl__domain_build_info_copy_deprecated
> > > > > > > will take care of copying them to the new place. In fact I think 
> > > > > > > all
> > > > > > > guest types should be using the top level kernel, ramdisk and 
> > > > > > > cmdline
> > > > > > > fields.
> > > > > > 
> > > > > > I will have a look at it.
> > > > > > 
> > > > > > > 
> > > > > > > I'm not specially comfortable with changing the guest type in the
> > > > > > > middle of libxl__domain_build_info_setdefault, but I also don't 
> > > > > > > have a
> > > > > > > much better suggestion apart from using the deprecation helper.
> > > > > 
> > > > > I forgot to answer to this bit. I don't think the deprecation helper 
> > > > > will do
> > > > > all the jobs. There are still PV specific parameters: slack_memkb, 
> > > > > features,
> > > > > e820_host.
> > > > 
> > > > Those can be left inside the PV sub-struct and shouldn't be marked as
> > > > deprecated.
> > > > 
> > > > > Those are not necessary for Arm, if you don't zero them then you will 
> > > > > not
> > > > > initialize the PVH structure with default values. How do you suggest 
> > > > > to
> > > > > handle them?
> > > > 
> > > > But I guess ARM doesn't use any of those fields (or else they should
> > > > be moved to the pvh sub-struct anyway)?
> > > 
> > > Those fields should not be used by Arm. Looking at the current fields in 
> > > pv,
> > > they all should be zeroed.
> > > 
> > > However, I am not sure if we can assume that will always be the case.
> > > If we can assume it, then I think it would just be sufficient to have
> > > b_info->type = LIBXL_DOMAIN_TYPE_PVH in Arm.
> > > 
> > > Any thoughts?
> > 
> > If we go down this route for ARM I think you should add to xl.cfg that
> > the `type` option only applies to x86, and that there's only one guest
> > type on ARM. IMO it's best to not use type if there's only one type
> > available, so that it can be intruded later on ARM if required.
> 
> That's not what we discussed before. The idea is to default Arm guest to PVH
> for xl. But "type=" would still be available.
> 
> We still have to support other type as other toolstack (e.g libvirt) may use
> PV for Arm. So the idea here is libxl will check if it was PV and convert to
> PVH.
> 
> In an earlier reply, you said that we can use deprecated to copy most of the
> options over. However, can we always assume the pv.* fields will always be
> zero after the value was copied over? If not, then we would have to memset
> them.

Yes, the field is cleared/disposed after the value is copied over, see
libxl_C_type_copy_deprecated in 

[Xen-devel] [PATCH v2 2/2] ns16550: enable use of PCI MSI

2018-10-01 Thread Jan Beulich
Which, on x86, requires fiddling with the INTx bit in PCI config space,
since for internally used MSI we can't delegate this to Dom0.

ns16550_init_postirq() also needs (benign) re-ordering of its
operations.

Signed-off-by: Jan Beulich 
---
v2: Re-base.

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -307,7 +307,7 @@ Flag to indicate whether to probe for a
 ACPI indicating none to be there.
 
 ### com1,com2
-> `= 
[/][,[DPS][,[|pci|amt][,[][,[][,[]]`
+> `= 
[/][,[DPS][,[|pci|amt][,[|msi][,[][,[]]`
 
 Both option `com1` and `com2` follow the same format.
 
@@ -328,7 +328,7 @@ Both option `com1` and `com2` follow the
 * `` is an integer which specifies the IO base port for UART
   registers.
 * `` is the IRQ number to use, or `0` to use the UART in poll
-  mode only.
+  mode only, or `msi` to set up a Message Signaled Interrupt.
 * `` is the PCI location of the UART, in
   `:.` notation.
 * `` is the PCI bridge behind which is the UART, in
--- a/xen/arch/x86/msi.c
+++ b/xen/arch/x86/msi.c
@@ -742,6 +742,16 @@ static int msi_capability_init(struct pc
 
 *desc = entry;
 /* Restore the original MSI enabled bits  */
+if ( !hardware_domain )
+{
+/*
+ * ..., except for internal requests (before Dom0 starts), in which
+ * case we rather need to behave "normally", i.e. not follow the split
+ * brain model where Dom0 actually enables MSI (and disables INTx).
+ */
+pci_intx(dev, 0);
+control |= PCI_MSI_FLAGS_ENABLE;
+}
 pci_conf_write16(seg, bus, slot, func, msi_control_reg(pos), control);
 
 return 0;
@@ -1019,6 +1029,18 @@ static int msix_capability_init(struct p
 ++msix->used_entries;
 
 /* Restore MSI-X enabled bits */
+if ( !hardware_domain )
+{
+/*
+ * ..., except for internal requests (before Dom0 starts), in which
+ * case we rather need to behave "normally", i.e. not follow the split
+ * brain model where Dom0 actually enables MSI (and disables INTx).
+ */
+pci_intx(dev, 0);
+control |= PCI_MSIX_FLAGS_ENABLE;
+control &= ~PCI_MSIX_FLAGS_MASKALL;
+maskall = 0;
+}
 msix->host_maskall = maskall;
 pci_conf_write16(seg, bus, slot, func, msix_control_reg(pos), control);
 
@@ -1073,6 +1095,8 @@ static void __pci_disable_msi(struct msi
 
 dev = entry->dev;
 msi_set_enable(dev, 0);
+if ( entry->irq > 0 && !(irq_to_desc(entry->irq)->status & IRQ_GUEST) )
+pci_intx(dev, 1);
 
 BUG_ON(list_empty(>msi_list));
 }
--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -92,6 +92,7 @@ static struct ns16550 {
 u32 bar64;
 u16 cr;
 u8 bar_idx;
+bool_t msi;
 const struct ns16550_config_param *param; /* Points into .init.*! */
 #endif
 } ns16550_com[2] = { { 0 } };
@@ -712,6 +713,16 @@ static void __init ns16550_init_preirq(s
 uart->fifo_size = 16;
 }
 
+static void __init ns16550_init_irq(struct serial_port *port)
+{
+#ifdef CONFIG_HAS_PCI
+struct ns16550 *uart = port->uart;
+
+if ( uart->msi )
+uart->irq = create_irq(0);
+#endif
+}
+
 static void ns16550_setup_postirq(struct ns16550 *uart)
 {
 if ( uart->irq > 0 )
@@ -746,17 +757,6 @@ static void __init ns16550_init_postirq(
 uart->timeout_ms = max_t(
 unsigned int, 1, (bits * uart->fifo_size * 1000) / uart->baud);
 
-if ( uart->irq > 0 )
-{
-uart->irqaction.handler = ns16550_interrupt;
-uart->irqaction.name= "ns16550";
-uart->irqaction.dev_id  = port;
-if ( (rc = setup_irq(uart->irq, 0, >irqaction)) != 0 )
-printk("ERROR: Failed to allocate ns16550 IRQ %d\n", uart->irq);
-}
-
-ns16550_setup_postirq(uart);
-
 #ifdef CONFIG_HAS_PCI
 if ( uart->bar || uart->ps_bdf_enable )
 {
@@ -777,8 +777,65 @@ static void __init ns16550_init_postirq(
 uart->ps_bdf[0], uart->ps_bdf[1],
 uart->ps_bdf[2]);
 }
+
+if ( uart->msi )
+{
+struct msi_info msi = {
+.bus = uart->ps_bdf[0],
+.devfn = PCI_DEVFN(uart->ps_bdf[1], uart->ps_bdf[2]),
+.irq = rc = uart->irq,
+.entry_nr = 1
+};
+
+if ( rc > 0 )
+{
+struct msi_desc *msi_desc = NULL;
+
+pcidevs_lock();
+
+rc = pci_enable_msi(, _desc);
+if ( !rc )
+{
+struct irq_desc *desc = irq_to_desc(msi.irq);
+unsigned long flags;
+
+spin_lock_irqsave(>lock, flags);
+rc = setup_msi_irq(desc, msi_desc);
+spin_unlock_irqrestore(>lock, flags);
+if ( rc )
+pci_disable_msi(msi_desc);
+}
+
+pcidevs_unlock();
+
+   

Re: [Xen-devel] [PATCH] libxl: keep assigned pci devices across domain reboots

2018-10-01 Thread Andreas Kinzler

Fill the from_xenstore libxl_device_type hook for PCI devices so that
libxl_retrieve_domain_configuration can properly retrieve PCI devices
from xenstore.
This fixes disappearing pci devices across domain reboots.
This patch seems to be committed now. Please backport this to Xen 4.10  
stable branch, for upcoming 4.10.3, because original bugreport was about  
Xen 4.10.


Thanks to devs for the patch. I tested the patch and I can confirm that it  
fixes my original problem on Xen 4.10. To use with Xen 4.10 you need a  
small backport patch (attached).


Regards Andreas

backport-4.10.patch
Description: Binary data
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 1/2] console: adjust IRQ initialization

2018-10-01 Thread Jan Beulich
In order for a Xen internal PCI device driver to enable MSI on the
device, we need another hook which the driver can use to create the IRQ
(doing this in the init_preirq hook is too early, since IRQ code hasn't
got initialized at that time yet, and doing it in init_postirq is too
late because at least on x86 smp_intr_init() needs to know the IRQ
number).

On x86 this additionally requires a slight ordering change to IRQ
initialization, to facilitate calling the new hook between basic
initialization and the call path leading to smp_intr_init().

Signed-off-by: Jan Beulich 
---
v2: Re-base.

--- a/xen/arch/x86/i8259.c
+++ b/xen/arch/x86/i8259.c
@@ -341,8 +341,6 @@ void __init init_IRQ(void)
 
 init_8259A(0);
 
-BUG_ON(init_irq_data() < 0);
-
 for (irq = 0; platform_legacy_irq(irq); irq++) {
 struct irq_desc *desc = irq_to_desc(irq);
 
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -677,6 +677,7 @@ void __init noreturn __start_xen(unsigne
 unsigned long nr_pages, raw_max_page, modules_headroom, *module_map;
 int i, j, e820_warn = 0, bytes = 0;
 bool acpi_boot_table_init_done = false, relocated = false;
+int ret;
 struct ns16550_defaults ns16550 = {
 .data_bits = 8,
 .parity= 'n',
@@ -1559,6 +1560,12 @@ void __init noreturn __start_xen(unsigne
 
 x2apic_bsp_setup();
 
+ret = init_irq_data();
+if ( ret < 0 )
+panic("Error %d setting up IRQ data\n", ret);
+
+console_init_irq();
+
 init_IRQ();
 
 module_map = xmalloc_array(unsigned long, BITS_TO_LONGS(mbi->mods_count));
@@ -1661,7 +1668,7 @@ void __init noreturn __start_xen(unsigne
 if ( (park_offline_cpus || num_online_cpus() < max_cpus) &&
  !cpu_online(i) )
 {
-int ret = cpu_up(i);
+ret = cpu_up(i);
 if ( ret != 0 )
 printk("Failed to bring up CPU %u (error %d)\n", i, ret);
 else if ( num_online_cpus() > max_cpus ||
--- a/xen/drivers/char/console.c
+++ b/xen/drivers/char/console.c
@@ -909,6 +909,11 @@ void __init console_init_ring(void)
 printk("Allocated console ring of %u KiB.\n", opt_conring_size >> 10);
 }
 
+void __init console_init_irq(void)
+{
+serial_init_irq();
+}
+
 void __init console_init_postirq(void)
 {
 serial_init_postirq();
--- a/xen/drivers/char/serial.c
+++ b/xen/drivers/char/serial.c
@@ -503,6 +503,15 @@ void __init serial_init_preirq(void)
 com[i].driver->init_preirq([i]);
 }
 
+void __init serial_init_irq(void)
+{
+unsigned int i;
+
+for ( i = 0; i < ARRAY_SIZE(com); i++ )
+if ( com[i].driver && com[i].driver->init_irq )
+com[i].driver->init_irq([i]);
+}
+
 void __init serial_init_postirq(void)
 {
 int i;
--- a/xen/include/xen/console.h
+++ b/xen/include/xen/console.h
@@ -15,6 +15,7 @@ long read_console_ring(struct xen_sysctl
 
 void console_init_preirq(void);
 void console_init_ring(void);
+void console_init_irq(void);
 void console_init_postirq(void);
 void console_endboot(void);
 int console_has(const char *device);
--- a/xen/include/xen/serial.h
+++ b/xen/include/xen/serial.h
@@ -64,6 +64,7 @@ struct serial_port {
 struct uart_driver {
 /* Driver initialisation (pre- and post-IRQ subsystem setup). */
 void (*init_preirq)(struct serial_port *);
+void (*init_irq)(struct serial_port *);
 void (*init_postirq)(struct serial_port *);
 /* Hook to clean up after Xen bootstrap (before domain 0 runs). */
 void (*endboot)(struct serial_port *);
@@ -99,8 +100,9 @@ struct uart_driver {
 #define SERHND_LO   (1<<3) /* Ditto, except that the MSB is cleared.  */
 #define SERHND_COOKED   (1<<4) /* Newline/carriage-return translation?*/
 
-/* Two-stage initialisation (before/after IRQ-subsystem initialisation). */
+/* Three-stage initialisation (before/during/after IRQ-subsystem setup). */
 void serial_init_preirq(void);
+void serial_init_irq(void);
 void serial_init_postirq(void);
 
 /* Clean-up hook before domain 0 runs. */




___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH RFC] mm/memory_hotplug: Introduce memory block types

2018-10-01 Thread Dave Hansen
> How should a policy in user space look like when new memory gets added
> - on s390x? Not onlining paravirtualized memory is very wrong.

Because we're going to balloon it away in a moment anyway?

We have auto-onlining.  Why isn't that being used on s390?


> So the type of memory is very important here to have in user space.
> Relying on checks like "isS390()", "isKVMGuest()" or "isHyperVGuest()"
> to decide whether to online memory and how to online memory is wrong.
> Only some specific memory types (which I call "normal") are to be
> handled by user space.
> 
> For the other ones, we exactly know what to do:
> - standby? don't online

I think you're horribly conflating the software desire for what the stae
should be and the hardware itself.

>> As for the OOM issues, that sounds like something we need to fix by
>> refusing to do (or delaying) hot-add operations once we consume too much
>> ZONE_NORMAL from memmap[]s rather than trying to indirectly tell
>> userspace to hurry thing along.
> 
> That is a moving target and doing that automatically is basically
> impossible.

Nah.  We know how much metadata we've allocated.  We know how much
ZONE_NORMAL we are eating.  We can *easily* add something to
add_memory() that just sleeps until the ratio is not out-of-whack.

> You can add a lot of memory to the movable zone and
> everything is fine. Suddenly a lot of processes are started - boom.
> MOVABLE should only every be used if you expect an unplug. And for
> paravirtualized devices, a "typical" unplug does not exist.

No, it's more complicated than that.  People use MOVABLE, for instance,
to allow more consistent huge page allocations.  It's certainly not just
hot-remove.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Juergen Gross
On 01/10/2018 17:48, George Dunlap wrote:
> On 10/01/2018 04:40 PM, Andrew Cooper wrote:
>> On 01/10/18 16:35, Wei Liu wrote:
>>> On Mon, Oct 01, 2018 at 04:19:07PM +0100, George Dunlap wrote:
 On 10/01/2018 04:17 PM, Wei Liu wrote:
> On Mon, Oct 01, 2018 at 09:10:25AM -0600, Jan Beulich wrote:
> On 01.10.18 at 16:33,  wrote:
>>> On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
>>> On 30.09.18 at 23:59,  wrote:
> flight 128240 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/128240/ 
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail 
> REGR. vs. 
>>> 128084
 At the first glance

 libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 
 1:Getting 
>>> domain sched credit: Invalid argument
 libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 
 1:cannot 
>>> (re-)build domain: -3
 might indicate a problem resulting from the switch to credit2 as the 
 default
 scheduler. But "first glance" here really means what it says - I 
 didn't look
 (yet) at what exactly libxl tries to do there, in the hope that others 
 may
 know without much digging.
>>> I think this is due to toolstack trying to set the same scheduler
>>> parameters for the newly created guest.
>>>
>>> But in this test, the destination host is using a different scheduler
>>> from the source host. Asking for credit scheduler on a credit2 host is
>>> wrong.
>>>
>>> The relevant snippet in guest cfg (JSON) is:
>>>
>>> "sched_params": {
>>> "sched": "credit",
>>> "weight": 256,
>>> "cap": 0
>>> },
>>>
>>> I can't think of a method to fix it off the top of my head though.
>> So is this something that was specified in the original config? Or
>> is it just the current value which gets read and an attempt made
>> to re-install. If there was no explicit setting in the guest config,
>> shouldn't such a "default" setting be retained by not transferring
>> any scheduler specifics during migration?
>>
> No setting in guest cfg. Those values are extracted from the hypervisor.
> I think we may be able to not send default values to the remote end.
 Wait, the migration code reads the scheduler parameters -- even if these
 have not been explicitly set by the admin -- and sends them along with
 the migration stream?  And if the remote scheduler is different, the
 migration fails?

 That's not so good. :-)
>>> But one can argue that the guest is specific configured that way so it's
>>> parameters should be preserved. We normally analyse things on a case by
>>> case basis.
>>
>> If there isn't an obvious fix, then the switch of default scheduler
>> needs reverting until there is a fix present.  This is currently
>> blocking master.
> 
> Agreed.  I'd argue for ignoring failures to set scheduler parameters on
> migrate, on the grounds that this will be less risk to the project as a
> whole than reverting credit2 again.  But either way we should do
> something quickly.

We should ignore a mismatch of the scheduler. Failures when setting
parameters for a matching scheduler should not be ignored IMO.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 0/2] ns16550: enable use of PCI MSI

2018-10-01 Thread Jan Beulich
Patch 2 is no longer RFC, now that I have a device where MSI actually
works (suggesting that it was really the other device's fault that things
didn't work).

1: console: adjust IRQ initialization
2: ns16550: enable use of PCI MSI

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread Wei Liu
On Mon, Oct 01, 2018 at 03:25:36PM +0100, George Dunlap wrote:
> On 10/01/2018 03:00 PM, Jan Beulich wrote:
>  On 01.10.18 at 15:38,  wrote:
> >> On 10/01/2018 12:25 PM, Jan Beulich wrote:
> >>> I think the main concern
> >>> was with the way migration of the new value was implemented. But I
> >>> really have to defer to Andrew for that, irrespective of him not
> >>> having responded (on the list) to prior pings.
> >>
> >> Is Andrew really the only person who knows enough about migration to
> >> give this the thumbs-up?
> > 
> > That's not the point here, at least afaic: He had voiced _some_
> > concern on an earlier version. In such a case it is, I think, only
> > appropriate to wait with committing until there was indication
> > that the concerns were sufficiently addressed (verbally or by
> > adjustments to the code).
> 
> Right -- but it's not your job to make sure the migration stuff is
> properly addressed; it's Wei and Ian's job.  Wei's R-b was a statement
> from him that the code was good; when Andy questioned that, I think it
> was then *Wei's* job to address the question, not yours or Andy's (or
> even Olaf's).  If Wei says, "I've considered Andy's objections and I
> think the patch is fine as-is", then it can be checked in (given a
> reasonable amount of time for Andy to respond); and Wei can own whatever
> consequences there are.

This patch touched more than toolstack code, that's why Jan gave his R-b
in the first place.

The contention is not on the correctness of the code, but on if this
mechanism had unintended consequences. Both Jan and I thought the code
was correct, but we didn't feel comfortable enough to ignore objections.

Sorry Olaf.

Wei.

> 
>  -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 7/8] libxc/pvh: set default MTRR type to write-back

2018-10-01 Thread Andrew Cooper
On 01/10/18 16:52, Roger Pau Monné wrote:
> On Wed, Sep 26, 2018 at 01:29:27PM +0100, Andrew Cooper wrote:
>> On 08/06/18 10:59, Roger Pau Monne wrote:
>>> @@ -1014,6 +1034,30 @@ static int vcpu_hvm(struct xc_dom_image *dom)
>>>  if ( dom->start_info_seg.pfn )
>>>  bsp_ctx.cpu.rbx = dom->start_info_seg.pfn << PAGE_SHIFT;
>>>  
>>> +/* Set the MTRR. */
>>> +bsp_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR);
>>> +bsp_ctx.mtrr_d.instance = 0;
>>> +bsp_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR);
>>> +
>>> +mtrr_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0);
>>> +if ( !mtrr_record )
>>> +{
>>> +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
>>> + "%s: unable to get MTRR save record", __func__);
>>> +goto out;
>>> +}
>>> +
>>> +memcpy(_ctx.mtrr, mtrr_record, sizeof(bsp_ctx.mtrr));
>>> +
>>> +/* TODO: maybe this should be a firmware option instead? */
>>> +if ( !dom->device_model )
>>> +/*
>>> + * Enable MTRR, set default type to WB.
>>> + * TODO: add MMIO areas as UC when passthrough is supported.
>>> + */
>>> +bsp_ctx.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK |
>>> + MTRR_DEF_TYPE_ENABLE;
>> This is buggy.  MTRRs are per-vcpu and expected to match.  This only
>> works by chance in the HVM case because all settings are still 0 at this
>> point.
> Yes, I will look into it ASAP, but I'm not sure if the current
> save/restore trick that's done here will be possible with vcpus that
> are down or I will need to expand the save/restore logic to allow
> setting the state of vcpus that are down.

This gets back into "what is an offline vcpu" argument which happened
recently on the Introspection side.

As far as I'm concerned, once the vcpus have been allocated
XEN_DOMCTL_max_vcpus, they are present and "around".

Xen should be responsible for putting them into the correct INIT state
(which is one of the many bugs I've yet to get fully fixed from the
debug handling side of things), and the domain builder is responsible
for making any changes that firmware would normally expect to make.

Whether the vcpus are scheduled or not is unrelated to the state (and
indeed set-ability) of their registers.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] ns16550/PCI: fix skipping of devices

2018-10-01 Thread Jan Beulich
Selecting between single/multiple BAR mode should happen after checking
whether to skip the present device, or else multi-BAR devices won't be
skipped correctly, due to port_idx getting set to zero in that case.

Signed-off-by: Jan Beulich 

--- a/xen/drivers/char/ns16550.c
+++ b/xen/drivers/char/ns16550.c
@@ -1037,18 +1037,18 @@ pci_uart_config(struct ns16550 *uart, bo
 }
 }
 
-if ( !param->bar0 )
-{
-bar_idx = idx;
-port_idx = 0;
-}
-
 if ( port_idx >= param->max_ports )
 {
 idx -= param->max_ports;
 continue;
 }
 
+if ( !param->bar0 )
+{
+bar_idx = idx;
+port_idx = 0;
+}
+
 uart->io_base = 0;
 bar = pci_conf_read32(0, b, d, f,
   PCI_BASE_ADDRESS_0 + bar_idx*4);





___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] IOREQ server on Arm

2018-10-01 Thread Roger Pau Monné
On Wed, Sep 26, 2018 at 11:32:38AM +0100, Julien Grall wrote:
> Hi,
> 
> On 09/26/2018 10:14 AM, Paul Durrant wrote:
> > > -Original Message-
> > > From: Jan Beulich [mailto:jbeul...@suse.com]
> > > Sent: 26 September 2018 09:09
> > > To: Julien Grall ; Paul Durrant
> > > 
> > > Cc: Andrew Cooper ; Roger Pau Monne
> > > ; Stefano Stabellini ; xen-
> > > devel 
> > > Subject: Re: IOREQ server on Arm
> > > 
> > > > > > On 26.09.18 at 00:39,  wrote:
> > > > Hi Paul,
> > > > 
> > > > I am looking at porting the IOREQ server infrastructure on Arm. I didn't
> > > > need much modification to make it run for Arm. Although, the
> > > > implementation could be simplified over the x86 implementation.
> > > > 
> > > > I noticed some issue while trying to implement the hypercall
> > > > XENMEM_acquire_resource. Per my understanding, all the page mapped via
> > > > that hypercall will use the type p2m_mapping_foreign.
> > > > 
> > > > This will result to trigger the ASSERT(fdom != dom) in get_page_from_gfn
> > > > (asm-arm/p2m.h) because the IOREQ page has been allocated to the
> > > > emulator domain and mapped to it. AFAICT x86 has the same assert in
> > > > p2m_get_page_from_gfn(...).
> > > > 
> > > > IHMO, the ASSERT makes sense because you are only meant to map page
> > > > belonging to other domain with that type.
> > > > 
> > > > So I am wondering whether IOREQ server running in PVH Dom0 has been
> > > > tested? What would be the best course of action to fix the issue?
> > > 
> > > I think the p2m type needs to be chosen based on
> > > XENMEM_rsrc_acq_caller_owned.
> > > 
> > 
> > Yes, that's correct. There is a FIXME clause in acquire_resource so that 
> > that only caller owned resources can be mapped by HVM/PVH domains. Thus the 
> > new call can be used for IOREQ server pages, but not grant frames.
> 
> I don't understand your last sentence. IOREQ is caller owned and therefore
> should work.
> 
> As I said, I don't have any problem with mapping the resource. The problem
> is when unmapping it because the region is set with p2m_mapping_foreign. You
> will reach the ASSERT(fdom != p2m->domain) in p2m_get_page_from_gfn.
> 
> Regardless the reference problem (we support it on Arm). Can you explain how
> this is working on PVH Dom0 today?

IOREQs (QEMU) used to work on a PVH Dom0 in the past, it's been some
time since I've launched a HVM guest now, but it definitely worked
before, like it worked for PVHv1 Dom0.

Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 7/8] libxc/pvh: set default MTRR type to write-back

2018-10-01 Thread Roger Pau Monné
On Wed, Sep 26, 2018 at 01:29:27PM +0100, Andrew Cooper wrote:
> On 08/06/18 10:59, Roger Pau Monne wrote:
> > @@ -1014,6 +1034,30 @@ static int vcpu_hvm(struct xc_dom_image *dom)
> >  if ( dom->start_info_seg.pfn )
> >  bsp_ctx.cpu.rbx = dom->start_info_seg.pfn << PAGE_SHIFT;
> >  
> > +/* Set the MTRR. */
> > +bsp_ctx.mtrr_d.typecode = HVM_SAVE_CODE(MTRR);
> > +bsp_ctx.mtrr_d.instance = 0;
> > +bsp_ctx.mtrr_d.length = HVM_SAVE_LENGTH(MTRR);
> > +
> > +mtrr_record = hvm_get_save_record(full_ctx, HVM_SAVE_CODE(MTRR), 0);
> > +if ( !mtrr_record )
> > +{
> > +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
> > + "%s: unable to get MTRR save record", __func__);
> > +goto out;
> > +}
> > +
> > +memcpy(_ctx.mtrr, mtrr_record, sizeof(bsp_ctx.mtrr));
> > +
> > +/* TODO: maybe this should be a firmware option instead? */
> > +if ( !dom->device_model )
> > +/*
> > + * Enable MTRR, set default type to WB.
> > + * TODO: add MMIO areas as UC when passthrough is supported.
> > + */
> > +bsp_ctx.mtrr.msr_mtrr_def_type = MTRR_TYPE_WRBACK |
> > + MTRR_DEF_TYPE_ENABLE;
> 
> This is buggy.  MTRRs are per-vcpu and expected to match.  This only
> works by chance in the HVM case because all settings are still 0 at this
> point.

Yes, I will look into it ASAP, but I'm not sure if the current
save/restore trick that's done here will be possible with vcpus that
are down or I will need to expand the save/restore logic to allow
setting the state of vcpus that are down.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread George Dunlap
On 10/01/2018 04:40 PM, Andrew Cooper wrote:
> On 01/10/18 16:35, Wei Liu wrote:
>> On Mon, Oct 01, 2018 at 04:19:07PM +0100, George Dunlap wrote:
>>> On 10/01/2018 04:17 PM, Wei Liu wrote:
 On Mon, Oct 01, 2018 at 09:10:25AM -0600, Jan Beulich wrote:
 On 01.10.18 at 16:33,  wrote:
>> On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
>> On 30.09.18 at 23:59,  wrote:
 flight 128240 xen-unstable real [real]
 http://logs.test-lab.xenproject.org/osstest/logs/128240/ 

 Regressions :-(

 Tests which did not succeed and are blocking,
 including tests which could not be run:
  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail 
 REGR. vs. 
>> 128084
>>> At the first glance
>>>
>>> libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 
>>> 1:Getting 
>> domain sched credit: Invalid argument
>>> libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 
>>> 1:cannot 
>> (re-)build domain: -3
>>> might indicate a problem resulting from the switch to credit2 as the 
>>> default
>>> scheduler. But "first glance" here really means what it says - I didn't 
>>> look
>>> (yet) at what exactly libxl tries to do there, in the hope that others 
>>> may
>>> know without much digging.
>> I think this is due to toolstack trying to set the same scheduler
>> parameters for the newly created guest.
>>
>> But in this test, the destination host is using a different scheduler
>> from the source host. Asking for credit scheduler on a credit2 host is
>> wrong.
>>
>> The relevant snippet in guest cfg (JSON) is:
>>
>> "sched_params": {
>> "sched": "credit",
>> "weight": 256,
>> "cap": 0
>> },
>>
>> I can't think of a method to fix it off the top of my head though.
> So is this something that was specified in the original config? Or
> is it just the current value which gets read and an attempt made
> to re-install. If there was no explicit setting in the guest config,
> shouldn't such a "default" setting be retained by not transferring
> any scheduler specifics during migration?
>
 No setting in guest cfg. Those values are extracted from the hypervisor.
 I think we may be able to not send default values to the remote end.
>>> Wait, the migration code reads the scheduler parameters -- even if these
>>> have not been explicitly set by the admin -- and sends them along with
>>> the migration stream?  And if the remote scheduler is different, the
>>> migration fails?
>>>
>>> That's not so good. :-)
>> But one can argue that the guest is specific configured that way so it's
>> parameters should be preserved. We normally analyse things on a case by
>> case basis.
> 
> If there isn't an obvious fix, then the switch of default scheduler
> needs reverting until there is a fix present.  This is currently
> blocking master.

Agreed.  I'd argue for ignoring failures to set scheduler parameters on
migrate, on the grounds that this will be less risk to the project as a
whole than reverting credit2 again.  But either way we should do
something quickly.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/altp2m: propagate ept.ad changes to all active altp2ms

2018-10-01 Thread George Dunlap
On 09/28/2018 05:19 PM, Razvan Cojocaru wrote:
> On 9/28/18 6:55 PM, Jan Beulich wrote:
> On 28.09.18 at 17:25,  wrote:
>>> On 9/28/18 5:52 PM, Jan Beulich wrote:
>>> On 28.09.18 at 13:55,  wrote:
> @@ -1218,34 +1219,67 @@ static void ept_tlb_flush(struct p2m_domain *p2m)
>  ept_sync_domain_mask(p2m, p2m->domain->dirty_cpumask);
>  }
>  
> +static void ept_set_ad_sync(struct p2m_domain *p2m, int value)
> +{
> +struct domain *d = p2m->domain;
> +unsigned int i;
> +
> +if ( likely(!altp2m_active(d)) )
> +{
> +p2m_lock(p2m);
> +p2m->ept.ad = value;
> +p2m_unlock(p2m);
> +
> +return;
> +}

 Why would you want to skip updating the host p2m's flag when
 altp2m is active?
>>>
>>> It's not really skipped if I understand the altp2m code correctly: in
>>> that case the hostp2m is d->arch.altp2m_p2m[0], which is take care of in
>>> the loop below the code you've quoted.
>>
>> p2m_init_altp2m() (and other code in p2m.c) suggests otherwise to me.
> 
> That's interesting, p2m_set_mem_access() is treating altp2m index 0 as
> the hostp2m:
> 
> 360 /*
> 361  * Set access type for a region of gfns.
> 362  * If gfn == INVALID_GFN, sets the default access type.
> 363  */
> 364 long p2m_set_mem_access(struct domain *d, gfn_t gfn, uint32_t nr,
> 365 uint32_t start, uint32_t mask,
> xenmem_access_t access,
> 366 unsigned int altp2m_idx)
> 367 {
> 368 struct p2m_domain *p2m = p2m_get_hostp2m(d), *ap2m = NULL;
> 369 p2m_access_t a;
> 370 unsigned long gfn_l;
> 371 long rc = 0;
> 372
> 373 /* altp2m view 0 is treated as the hostp2m */
> 374 #ifdef CONFIG_HVM
> 375 if ( altp2m_idx )
> 376 {
> 377 if ( altp2m_idx >= MAX_ALTP2M ||
> 378  d->arch.altp2m_eptp[altp2m_idx] == mfn_x(INVALID_MFN) )
> 379 return -EINVAL;
> 380
> 381 ap2m = d->arch.altp2m_p2m[altp2m_idx];
> 382 }
> 383 #else
> 384 ASSERT(!altp2m_idx);
> 385 #endif
> 
> which would seem to imply that either we should be able to treat
> d->arch.altp2m_p2m[0] and hostp2m interchangeably, or we are currently
> wasting an altp2m array slot.

Yes, ISTR that altp2m_idx 0 was specifically meant to be the host p2m
(unmodified).  But there seems to be some confusion over that -- this
seems to say it needs to be "manually" interpreted; some of the
mem_access code seems to think that it can just grab
d->arch.altp2m_p2m[0] and that will affect the hostp2m.  And as you say,
if we do the "manual interpretation", then we're allocating an extra p2m
in slot 0 that we're not using.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Andrew Cooper
On 01/10/18 16:35, Wei Liu wrote:
> On Mon, Oct 01, 2018 at 04:19:07PM +0100, George Dunlap wrote:
>> On 10/01/2018 04:17 PM, Wei Liu wrote:
>>> On Mon, Oct 01, 2018 at 09:10:25AM -0600, Jan Beulich wrote:
>>> On 01.10.18 at 16:33,  wrote:
> On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
> On 30.09.18 at 23:59,  wrote:
>>> flight 128240 xen-unstable real [real]
>>> http://logs.test-lab.xenproject.org/osstest/logs/128240/ 
>>>
>>> Regressions :-(
>>>
>>> Tests which did not succeed and are blocking,
>>> including tests which could not be run:
>>>  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail 
>>> REGR. vs. 
> 128084
>> At the first glance
>>
>> libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 
>> 1:Getting 
> domain sched credit: Invalid argument
>> libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 
>> 1:cannot 
> (re-)build domain: -3
>> might indicate a problem resulting from the switch to credit2 as the 
>> default
>> scheduler. But "first glance" here really means what it says - I didn't 
>> look
>> (yet) at what exactly libxl tries to do there, in the hope that others 
>> may
>> know without much digging.
> I think this is due to toolstack trying to set the same scheduler
> parameters for the newly created guest.
>
> But in this test, the destination host is using a different scheduler
> from the source host. Asking for credit scheduler on a credit2 host is
> wrong.
>
> The relevant snippet in guest cfg (JSON) is:
>
> "sched_params": {
> "sched": "credit",
> "weight": 256,
> "cap": 0
> },
>
> I can't think of a method to fix it off the top of my head though.
 So is this something that was specified in the original config? Or
 is it just the current value which gets read and an attempt made
 to re-install. If there was no explicit setting in the guest config,
 shouldn't such a "default" setting be retained by not transferring
 any scheduler specifics during migration?

>>> No setting in guest cfg. Those values are extracted from the hypervisor.
>>> I think we may be able to not send default values to the remote end.
>> Wait, the migration code reads the scheduler parameters -- even if these
>> have not been explicitly set by the admin -- and sends them along with
>> the migration stream?  And if the remote scheduler is different, the
>> migration fails?
>>
>> That's not so good. :-)
> But one can argue that the guest is specific configured that way so it's
> parameters should be preserved. We normally analyse things on a case by
> case basis.

If there isn't an obvious fix, then the switch of default scheduler
needs reverting until there is a fix present.  This is currently
blocking master.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Wei Liu
On Mon, Oct 01, 2018 at 04:19:07PM +0100, George Dunlap wrote:
> On 10/01/2018 04:17 PM, Wei Liu wrote:
> > On Mon, Oct 01, 2018 at 09:10:25AM -0600, Jan Beulich wrote:
> > On 01.10.18 at 16:33,  wrote:
> >>> On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
> >>> On 30.09.18 at 23:59,  wrote:
> > flight 128240 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/128240/ 
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail 
> > REGR. vs. 
> >>> 128084
> 
>  At the first glance
> 
>  libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 
>  1:Getting 
> >>> domain sched credit: Invalid argument
>  libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 
>  1:cannot 
> >>> (re-)build domain: -3
> 
>  might indicate a problem resulting from the switch to credit2 as the 
>  default
>  scheduler. But "first glance" here really means what it says - I didn't 
>  look
>  (yet) at what exactly libxl tries to do there, in the hope that others 
>  may
>  know without much digging.
> >>>
> >>> I think this is due to toolstack trying to set the same scheduler
> >>> parameters for the newly created guest.
> >>>
> >>> But in this test, the destination host is using a different scheduler
> >>> from the source host. Asking for credit scheduler on a credit2 host is
> >>> wrong.
> >>>
> >>> The relevant snippet in guest cfg (JSON) is:
> >>>
> >>> "sched_params": {
> >>> "sched": "credit",
> >>> "weight": 256,
> >>> "cap": 0
> >>> },
> >>>
> >>> I can't think of a method to fix it off the top of my head though.
> >>
> >> So is this something that was specified in the original config? Or
> >> is it just the current value which gets read and an attempt made
> >> to re-install. If there was no explicit setting in the guest config,
> >> shouldn't such a "default" setting be retained by not transferring
> >> any scheduler specifics during migration?
> >>
> > 
> > No setting in guest cfg. Those values are extracted from the hypervisor.
> > I think we may be able to not send default values to the remote end.
> 
> Wait, the migration code reads the scheduler parameters -- even if these
> have not been explicitly set by the admin -- and sends them along with
> the migration stream?  And if the remote scheduler is different, the
> migration fails?
> 
> That's not so good. :-)

But one can argue that the guest is specific configured that way so it's
parameters should be preserved. We normally analyse things on a case by
case basis.

Wei.

> 
>  -George
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] tools/libxl: Switch Arm guest type to PVH

2018-10-01 Thread Julien Grall

Hi,

On 10/01/2018 04:27 PM, Roger Pau Monné wrote:

On Tue, Sep 25, 2018 at 08:36:36PM +0100, Julien Grall wrote:

Hi Roger,

Sorry for the late reply.

On 09/03/2018 03:40 PM, Roger Pau Monné wrote:

On Mon, Sep 03, 2018 at 12:15:16PM +0100, Julien Grall wrote:

On 03/09/18 12:09, Julien Grall wrote:



On 23/08/18 08:58, Roger Pau Monné wrote:

On Wed, Aug 22, 2018 at 06:48:05PM +0100, Julien Grall wrote:



+
+    b_info->type = LIBXL_DOMAIN_TYPE_PVH;
+
+    /*
+ * They only fie ld in u.pv that matters on Arm are:
kernel, cmdline,
+ * ramdisk.
+ */
+
+    if (!b_info->kernel && b_info->u.pv.kernel)
+    b_info->kernel = b_info->u.pv.kernel;
+
+    if (!b_info->ramdisk && b_info->u.pv.ramdisk)
+    b_info->ramdisk = b_info->u.pv.ramdisk;
+
+    if (!b_info->cmdline && b_info->u.pv.cmdline)
+    b_info->cmdline = b_info->u.pv.cmdline;
+
+    /* Reset b_info->u.pvh to default values */
+    memset(_info->u.pvh, 0, sizeof(b_info->u.pvh));


I'm afraid that's not correct. The default values for u.pvh are set
by libxl__domain_build_info_setdefault.


I thought that this should be covered by the switch right after
the call of
libxl__arch_domain_build_info_setdefault. Did I miss anything?


Oh right, libxl__arch_domain_build_info_setdefault is called by
libxl__domain_build_info_setdefault.


What I wanted to do here is resetting the union to 0 so you
don't get data
mangled by the pv fields.


Another possible option I think would be to mark those fields as
deprecated in the IDL, and libxl__domain_build_info_copy_deprecated
will take care of copying them to the new place. In fact I think all
guest types should be using the top level kernel, ramdisk and cmdline
fields.


I will have a look at it.



I'm not specially comfortable with changing the guest type in the
middle of libxl__domain_build_info_setdefault, but I also don't have a
much better suggestion apart from using the deprecation helper.


I forgot to answer to this bit. I don't think the deprecation helper will do
all the jobs. There are still PV specific parameters: slack_memkb, features,
e820_host.


Those can be left inside the PV sub-struct and shouldn't be marked as
deprecated.


Those are not necessary for Arm, if you don't zero them then you will not
initialize the PVH structure with default values. How do you suggest to
handle them?


But I guess ARM doesn't use any of those fields (or else they should
be moved to the pvh sub-struct anyway)?


Those fields should not be used by Arm. Looking at the current fields in pv,
they all should be zeroed.

However, I am not sure if we can assume that will always be the case.
If we can assume it, then I think it would just be sufficient to have
b_info->type = LIBXL_DOMAIN_TYPE_PVH in Arm.

Any thoughts?


If we go down this route for ARM I think you should add to xl.cfg that
the `type` option only applies to x86, and that there's only one guest
type on ARM. IMO it's best to not use type if there's only one type
available, so that it can be intruded later on ARM if required.


That's not what we discussed before. The idea is to default Arm guest to 
PVH for xl. But "type=" would still be available.


We still have to support other type as other toolstack (e.g libvirt) may 
use PV for Arm. So the idea here is libxl will check if it was PV and 
convert to PVH.


In an earlier reply, you said that we can use deprecated to copy most of 
the options over. However, can we always assume the pv.* fields will 
always be zero after the value was copied over? If not, then we would 
have to memset them.


I asked opinion on the last bits.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] AMD Ping [PATCH] amd/iommu: remove hidden AMD inclusive mappings

2018-10-01 Thread Roger Pau Monné
Can I please get an Ack or otherwise from the AMD maintainers?

Thanks, Roger.

On Fri, Sep 21, 2018 at 05:20:47PM +0200, Roger Pau Monne wrote:
> And just rely on arch_iommu_hwdom_init to setup the correct inclusive
> mappings as it's done for Intel.
> 
> AMD has code in amd_iommu_hwdom_init to setup inclusive mappings up to
> max_pdx, remove this since it's now a duplication of
> arch_iommu_hwdom_init. Note that AMD mapped every page with a valid
> mfn up to max_pdx, arch_iommu_hwdom_init will only do so for memory
> below 4GB, so this is a functional change for AMD.
> 
> Move the default setting of iommu_hwdom_{inclusive/reserved} to
> arch_iommu_hwdom_init since the defaults are now the same for both
> Intel and AMD.
> 
> Reported-by: Paul Durrant 
> Signed-off-by: Roger Pau Monné 
> ---
> Cc: Suravee Suthikulpanit 
> Cc: Brian Woods 
> Cc: Kevin Tian 
> Cc: Jan Beulich 
> Cc: Paul Durrant 
> ---
>  xen/drivers/passthrough/amd/pci_amd_iommu.c | 39 -
>  xen/drivers/passthrough/vtd/iommu.c |  7 
>  xen/drivers/passthrough/x86/iommu.c |  8 -
>  3 files changed, 7 insertions(+), 47 deletions(-)
> 
> diff --git a/xen/drivers/passthrough/amd/pci_amd_iommu.c 
> b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> index 4a633ca940..821fe03df5 100644
> --- a/xen/drivers/passthrough/amd/pci_amd_iommu.c
> +++ b/xen/drivers/passthrough/amd/pci_amd_iommu.c
> @@ -250,50 +250,11 @@ static int amd_iommu_add_device(u8 devfn, struct 
> pci_dev *pdev);
>  
>  static void __hwdom_init amd_iommu_hwdom_init(struct domain *d)
>  {
> -unsigned long i; 
>  const struct amd_iommu *iommu;
>  
> -/* Inclusive IOMMU mappings are disabled by default on AMD hardware. */
> -if ( iommu_hwdom_inclusive == -1 )
> -iommu_hwdom_inclusive = 0;
> -/* Reserved IOMMU mappings are disabled by default on AMD hardware. */
> -if ( iommu_hwdom_reserved == -1 )
> -iommu_hwdom_reserved = 0;
> -
>  if ( allocate_domain_resources(dom_iommu(d)) )
>  BUG();
>  
> -if ( !iommu_hwdom_passthrough && !need_iommu(d) )
> -{
> -int rc = 0;
> -
> -/* Set up 1:1 page table for dom0 */
> -for ( i = 0; i < max_pdx; i++ )
> -{
> -unsigned long pfn = pdx_to_pfn(i);
> -
> -/*
> - * XXX Should we really map all non-RAM (above 4G)? Minimally
> - * a pfn_valid() check would seem desirable here.
> - */
> -if ( mfn_valid(_mfn(pfn)) )
> -{
> -int ret = amd_iommu_map_page(d, pfn, pfn,
> - 
> IOMMUF_readable|IOMMUF_writable);
> -
> -if ( !rc )
> -rc = ret;
> -}
> -
> -if ( !(i & 0xf) )
> -process_pending_softirqs();
> -}
> -
> -if ( rc )
> -AMD_IOMMU_DEBUG("d%d: IOMMU mapping failed: %d\n",
> -d->domain_id, rc);
> -}
> -
>  for_each_amd_iommu ( iommu )
>  if ( iomem_deny_access(d, PFN_DOWN(iommu->mmio_base_phys),
> PFN_DOWN(iommu->mmio_base_phys +
> diff --git a/xen/drivers/passthrough/vtd/iommu.c 
> b/xen/drivers/passthrough/vtd/iommu.c
> index bb422ec58c..cf8a80d7a1 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -1304,13 +1304,6 @@ static void __hwdom_init intel_iommu_hwdom_init(struct 
> domain *d)
>  {
>  struct acpi_drhd_unit *drhd;
>  
> -/* Inclusive mappings are enabled by default on Intel hardware for PV. */
> -if ( iommu_hwdom_inclusive == -1 )
> -iommu_hwdom_inclusive = is_pv_domain(d);
> -/* Reserved IOMMU mappings are enabled by default on Intel hardware. */
> -if ( iommu_hwdom_reserved == -1 )
> -iommu_hwdom_reserved = 1;
> -
>  setup_hwdom_pci_devices(d, setup_hwdom_device);
>  setup_hwdom_rmrr(d);
>  /* Make sure workarounds are applied before enabling the IOMMU(s). */
> diff --git a/xen/drivers/passthrough/x86/iommu.c 
> b/xen/drivers/passthrough/x86/iommu.c
> index b7c8b5be41..2de8822c59 100644
> --- a/xen/drivers/passthrough/x86/iommu.c
> +++ b/xen/drivers/passthrough/x86/iommu.c
> @@ -210,7 +210,13 @@ void __hwdom_init arch_iommu_hwdom_init(struct domain *d)
>  
>  BUG_ON(!is_hardware_domain(d));
>  
> -ASSERT(iommu_hwdom_inclusive != -1 && iommu_hwdom_inclusive != -1);
> +/* Inclusive mappings are enabled by default for PV. */
> +if ( iommu_hwdom_inclusive == -1 )
> +iommu_hwdom_inclusive = is_pv_domain(d);
> +/* Reserved IOMMU mappings are enabled by default. */
> +if ( iommu_hwdom_reserved == -1 )
> +iommu_hwdom_reserved = 1;
> +
>  if ( iommu_hwdom_inclusive && !is_pv_domain(d) )
>  {
>  printk(XENLOG_WARNING
> -- 
> 2.19.0
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 17:17,  wrote:
> On 01/10/18 14:29, Jan Beulich wrote:
> On 01.10.18 at 14:39,  wrote:
>>> On 07/06/18 14:08, Olaf Hering wrote:
 Add an option to control when vTSC emulation will be activated for a
 domU with tsc_mode=default. Without such option each TSC access from
 domU will be emulated, which causes a significant perfomance drop for
 workloads that make use of rdtsc.

 One option to avoid the TSC option is to run domUs with tsc_mode=native.
 This has the drawback that migrating a domU from a "2.3GHz" class host
 to a "2.4GHz" class host may change the rate at wich the TSC counter
 increases, the domU may not be prepared for that.

 With the new option the host admin can decide how a domU should behave
 when it is migrated across systems of the same class. Since there is
 always some jitter when Xen calibrates the cpu_khz value, all hosts of
 the same class will most likely have slightly different values. As a
 result vTSC emulation is unavoidable. Data collected during the incident
 which triggered this change showed a jitter of up to 200 KHz across
 systems of the same class.
>>> Do you have any further details of the systems involved?  If they are
>>> identical systems, they should all have the same real TSC frequency, and
>>> its a known issue that Xen isn't very good at working out the
>>> frequency.  TBH, fixing that would be far better overall.
>> Are you convinced all parts match their nominal frequency without
>> _any_ deviation?
> 
> That is the intent of publishing the numbers, yes.
> 
>> If that was the case, we could indeed use CPUID
>> leaves 0x15 / 0x16 output, if available.
> 
> We very much should be doing this.  There are also model-specific ways
> of getting the same data on older processors.
> 
>> But I very much doubt this.
>> As an example, here's what bare metal Linux says on my newest
>> system:
>>
>> tsc: Detected 2600.000 MHz processor
>> tsc: Refined TSC clocksource calibration: 2591.990 MHz
>>
>> Xen figures:
>>
>> (XEN) Detected 2592.107 MHz processor.
>>
>> And then after another re-boot bare metal Linux again
>>
>> tsc: Refined TSC clocksource calibration: 2592.008 MHz
> 
> What is surprising here?  The calibration loop is not 100% accurate and
> cannot be made to be perfect.
> 
> The fact that Linux and Xen agree is because they basically share the
> same calibration algorithm - not that the processor is really running at
> 2592MHz.

And I'm not claiming it is. I'm merely voicing my doubt that the
processor is running at exactly the announced 2600.000 MHz.
In which case it is simply unknown whether calibrated or
nominal values come closer to the truth.

>  For one, all calibration options will read slow by the amount
> of time it takes an interrupt to propagate through the system fabric,
> and there is basically nothing software can do to account for this.

Well, if you measure under otherwise identical conditions twice
the arrival of instances of the same signal, then its time to travel
through the fabric doesn't matter for the distance in time between
the arrival of the two instances. But of course this is as idealized
as is an assumption that humans would be able to manufacture
clocks ticking at exactly their nominal frequency.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] tools/libxl: Switch Arm guest type to PVH

2018-10-01 Thread Roger Pau Monné
On Tue, Sep 25, 2018 at 08:36:36PM +0100, Julien Grall wrote:
> Hi Roger,
> 
> Sorry for the late reply.
> 
> On 09/03/2018 03:40 PM, Roger Pau Monné wrote:
> > On Mon, Sep 03, 2018 at 12:15:16PM +0100, Julien Grall wrote:
> > > On 03/09/18 12:09, Julien Grall wrote:
> > > > 
> > > > 
> > > > On 23/08/18 08:58, Roger Pau Monné wrote:
> > > > > On Wed, Aug 22, 2018 at 06:48:05PM +0100, Julien Grall wrote:
> > > > > > > 
> > > > > > > > +
> > > > > > > > +    b_info->type = LIBXL_DOMAIN_TYPE_PVH;
> > > > > > > > +
> > > > > > > > +    /*
> > > > > > > > + * They only fie ld in u.pv that matters on Arm are:
> > > > > > > > kernel, cmdline,
> > > > > > > > + * ramdisk.
> > > > > > > > + */
> > > > > > > > +
> > > > > > > > +    if (!b_info->kernel && b_info->u.pv.kernel)
> > > > > > > > +    b_info->kernel = b_info->u.pv.kernel;
> > > > > > > > +
> > > > > > > > +    if (!b_info->ramdisk && b_info->u.pv.ramdisk)
> > > > > > > > +    b_info->ramdisk = b_info->u.pv.ramdisk;
> > > > > > > > +
> > > > > > > > +    if (!b_info->cmdline && b_info->u.pv.cmdline)
> > > > > > > > +    b_info->cmdline = b_info->u.pv.cmdline;
> > > > > > > > +
> > > > > > > > +    /* Reset b_info->u.pvh to default values */
> > > > > > > > +    memset(_info->u.pvh, 0, sizeof(b_info->u.pvh));
> > > > > > > 
> > > > > > > I'm afraid that's not correct. The default values for u.pvh are 
> > > > > > > set
> > > > > > > by libxl__domain_build_info_setdefault.
> > > > > > 
> > > > > > I thought that this should be covered by the switch right after
> > > > > > the call of
> > > > > > libxl__arch_domain_build_info_setdefault. Did I miss anything?
> > > > > 
> > > > > Oh right, libxl__arch_domain_build_info_setdefault is called by
> > > > > libxl__domain_build_info_setdefault.
> > > > > 
> > > > > > What I wanted to do here is resetting the union to 0 so you
> > > > > > don't get data
> > > > > > mangled by the pv fields.
> > > > > 
> > > > > Another possible option I think would be to mark those fields as
> > > > > deprecated in the IDL, and libxl__domain_build_info_copy_deprecated
> > > > > will take care of copying them to the new place. In fact I think all
> > > > > guest types should be using the top level kernel, ramdisk and cmdline
> > > > > fields.
> > > > 
> > > > I will have a look at it.
> > > > 
> > > > > 
> > > > > I'm not specially comfortable with changing the guest type in the
> > > > > middle of libxl__domain_build_info_setdefault, but I also don't have a
> > > > > much better suggestion apart from using the deprecation helper.
> > > 
> > > I forgot to answer to this bit. I don't think the deprecation helper will 
> > > do
> > > all the jobs. There are still PV specific parameters: slack_memkb, 
> > > features,
> > > e820_host.
> > 
> > Those can be left inside the PV sub-struct and shouldn't be marked as
> > deprecated.
> > 
> > > Those are not necessary for Arm, if you don't zero them then you will not
> > > initialize the PVH structure with default values. How do you suggest to
> > > handle them?
> > 
> > But I guess ARM doesn't use any of those fields (or else they should
> > be moved to the pvh sub-struct anyway)?
> 
> Those fields should not be used by Arm. Looking at the current fields in pv,
> they all should be zeroed.
> 
> However, I am not sure if we can assume that will always be the case.
> If we can assume it, then I think it would just be sufficient to have
> b_info->type = LIBXL_DOMAIN_TYPE_PVH in Arm.
> 
> Any thoughts?

If we go down this route for ARM I think you should add to xl.cfg that
the `type` option only applies to x86, and that there's only one guest
type on ARM. IMO it's best to not use type if there's only one type
available, so that it can be intruded later on ARM if required.

Thanks, Roger.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Non-DoD Source] [PATCH 2/2] xen/xsm: Add new SILO mode for XSM

2018-10-01 Thread DeGraaf, Daniel G
> 
> When SILO is enabled, there would be no page-sharing or event notifications
> between unprivileged VMs (no grant tables or event channels).
> 
> Signed-off-by: Xin Li 
> 

Acked-by: Daniel De Graaf 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread George Dunlap
On 10/01/2018 04:17 PM, Wei Liu wrote:
> On Mon, Oct 01, 2018 at 09:10:25AM -0600, Jan Beulich wrote:
> On 01.10.18 at 16:33,  wrote:
>>> On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
>>> On 30.09.18 at 23:59,  wrote:
> flight 128240 xen-unstable real [real]
> http://logs.test-lab.xenproject.org/osstest/logs/128240/ 
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
>  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail 
> REGR. vs. 
>>> 128084

 At the first glance

 libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 1:Getting 
>>> domain sched credit: Invalid argument
 libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 1:cannot 
>>> (re-)build domain: -3

 might indicate a problem resulting from the switch to credit2 as the 
 default
 scheduler. But "first glance" here really means what it says - I didn't 
 look
 (yet) at what exactly libxl tries to do there, in the hope that others may
 know without much digging.
>>>
>>> I think this is due to toolstack trying to set the same scheduler
>>> parameters for the newly created guest.
>>>
>>> But in this test, the destination host is using a different scheduler
>>> from the source host. Asking for credit scheduler on a credit2 host is
>>> wrong.
>>>
>>> The relevant snippet in guest cfg (JSON) is:
>>>
>>> "sched_params": {
>>> "sched": "credit",
>>> "weight": 256,
>>> "cap": 0
>>> },
>>>
>>> I can't think of a method to fix it off the top of my head though.
>>
>> So is this something that was specified in the original config? Or
>> is it just the current value which gets read and an attempt made
>> to re-install. If there was no explicit setting in the guest config,
>> shouldn't such a "default" setting be retained by not transferring
>> any scheduler specifics during migration?
>>
> 
> No setting in guest cfg. Those values are extracted from the hypervisor.
> I think we may be able to not send default values to the remote end.

Wait, the migration code reads the scheduler parameters -- even if these
have not been explicitly set by the admin -- and sends them along with
the migration stream?  And if the remote scheduler is different, the
migration fails?

That's not so good. :-)

 -George


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread Andrew Cooper
On 01/10/18 14:29, Jan Beulich wrote:
 On 01.10.18 at 14:39,  wrote:
>> On 07/06/18 14:08, Olaf Hering wrote:
>>> Add an option to control when vTSC emulation will be activated for a
>>> domU with tsc_mode=default. Without such option each TSC access from
>>> domU will be emulated, which causes a significant perfomance drop for
>>> workloads that make use of rdtsc.
>>>
>>> One option to avoid the TSC option is to run domUs with tsc_mode=native.
>>> This has the drawback that migrating a domU from a "2.3GHz" class host
>>> to a "2.4GHz" class host may change the rate at wich the TSC counter
>>> increases, the domU may not be prepared for that.
>>>
>>> With the new option the host admin can decide how a domU should behave
>>> when it is migrated across systems of the same class. Since there is
>>> always some jitter when Xen calibrates the cpu_khz value, all hosts of
>>> the same class will most likely have slightly different values. As a
>>> result vTSC emulation is unavoidable. Data collected during the incident
>>> which triggered this change showed a jitter of up to 200 KHz across
>>> systems of the same class.
>> Do you have any further details of the systems involved?  If they are
>> identical systems, they should all have the same real TSC frequency, and
>> its a known issue that Xen isn't very good at working out the
>> frequency.  TBH, fixing that would be far better overall.
> Are you convinced all parts match their nominal frequency without
> _any_ deviation?

That is the intent of publishing the numbers, yes.

> If that was the case, we could indeed use CPUID
> leaves 0x15 / 0x16 output, if available.

We very much should be doing this.  There are also model-specific ways
of getting the same data on older processors.

> But I very much doubt this.
> As an example, here's what bare metal Linux says on my newest
> system:
>
> tsc: Detected 2600.000 MHz processor
> tsc: Refined TSC clocksource calibration: 2591.990 MHz
>
> Xen figures:
>
> (XEN) Detected 2592.107 MHz processor.
>
> And then after another re-boot bare metal Linux again
>
> tsc: Refined TSC clocksource calibration: 2592.008 MHz

What is surprising here?  The calibration loop is not 100% accurate and
cannot be made to be perfect.

The fact that Linux and Xen agree is because they basically share the
same calibration algorithm - not that the processor is really running at
2592MHz.  For one, all calibration options will read slow by the amount
of time it takes an interrupt to propagate through the system fabric,
and there is basically nothing software can do to account for this.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Non-DoD Source] [PATCH 1/2] xen/xsm: Introduce new boot parameter xsm

2018-10-01 Thread DeGraaf, Daniel G
> 
> Introduce new boot parameter xsm to choose which xsm module is enabled,
> and set default to dummy.
> 
> Signed-off-by: Xin Li 

Acked-by: Daniel De Graaf 

It might be useful for the commit message to also reference the new Kconfig 
option; thanks for adding it.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Wei Liu
On Mon, Oct 01, 2018 at 09:10:25AM -0600, Jan Beulich wrote:
> >>> On 01.10.18 at 16:33,  wrote:
> > On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
> >> >>> On 30.09.18 at 23:59,  wrote:
> >> > flight 128240 xen-unstable real [real]
> >> > http://logs.test-lab.xenproject.org/osstest/logs/128240/ 
> >> > 
> >> > Regressions :-(
> >> > 
> >> > Tests which did not succeed and are blocking,
> >> > including tests which could not be run:
> >> >  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail 
> >> > REGR. vs. 
> > 128084
> >> 
> >> At the first glance
> >> 
> >> libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 1:Getting 
> > domain sched credit: Invalid argument
> >> libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 1:cannot 
> > (re-)build domain: -3
> >> 
> >> might indicate a problem resulting from the switch to credit2 as the 
> >> default
> >> scheduler. But "first glance" here really means what it says - I didn't 
> >> look
> >> (yet) at what exactly libxl tries to do there, in the hope that others may
> >> know without much digging.
> > 
> > I think this is due to toolstack trying to set the same scheduler
> > parameters for the newly created guest.
> > 
> > But in this test, the destination host is using a different scheduler
> > from the source host. Asking for credit scheduler on a credit2 host is
> > wrong.
> > 
> > The relevant snippet in guest cfg (JSON) is:
> > 
> > "sched_params": {
> > "sched": "credit",
> > "weight": 256,
> > "cap": 0
> > },
> > 
> > I can't think of a method to fix it off the top of my head though.
> 
> So is this something that was specified in the original config? Or
> is it just the current value which gets read and an attempt made
> to re-install. If there was no explicit setting in the guest config,
> shouldn't such a "default" setting be retained by not transferring
> any scheduler specifics during migration?
> 

No setting in guest cfg. Those values are extracted from the hypervisor.
I think we may be able to not send default values to the remote end.

Wei.


> Jan
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 16:28,  wrote:
> On Mon, 2018-10-01 at 14:54 +0100, George Dunlap wrote:
>> Right, the whole point of idle loop scrubbing is that you *don't*
>> syncronously wait for *all* the memory to finish scrubbing before you
>> can use part of it.  So why is this an issue for you guys -- what
>> concrete problem did it cause, that the full amount of memory took 40s
>> to finish scrubbing rather than only 8s?
> 
> There is no issue at the moment. Using idle loop to scrub all the memory
> is just not viable: it doesn't scale. How long does it currently take
> to bootscrub a multi-Tb system? If it takes a few minutes then I fear
> that it might take an hour to idle-scrub it.

But that's fine, at least in theory, as long as no scrubbing will be
skipped when it was actually needed. The system will be slightly
more busy until done with the scrubbing, but that's exactly the
expected price to pay for the cut down on boot time.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 16:40,  wrote:
> On 01/10/18 12:13, Jan Beulich wrote:
> On 01.10.18 at 11:58,  wrote:
>>> After this patch, alloc_heap_pages() is guaranteed to return scrubbed
>>> pages to a caller unless MEMF_no_scrub flag was provided.
>> 
>> I also don't understand the point of this: Xen's internal allocations
>> have no need to come from scrubbed memory. This in particular
>> also puts under question the need to "eagerly scrub all allocated
>> pages during boot" (quoted from an earlier paragraph).
> 
> There are ways to share a Xen's page with a guest. So from a security
> point of view, there is no guarantee that a page allocated with
> alloc_xenheap_pages() will not end up accessible by some guest.

But this is the exception, not the rule, and hence the code enabling
the sharing is responsible for initializing the page suitably.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 16:11,  wrote:
> I think this is the main argument here: what to do about those security
> sensitive use cases? Scrubbing everything unconditionally might be a too
> radical approach. Would inroducing a new cmdline param be appropriate?

Yes, I'm surely fine with this being an optional (and hence command
line driven) mode.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 16:33,  wrote:
> On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
>> >>> On 30.09.18 at 23:59,  wrote:
>> > flight 128240 xen-unstable real [real]
>> > http://logs.test-lab.xenproject.org/osstest/logs/128240/ 
>> > 
>> > Regressions :-(
>> > 
>> > Tests which did not succeed and are blocking,
>> > including tests which could not be run:
>> >  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail 
>> > REGR. vs. 
> 128084
>> 
>> At the first glance
>> 
>> libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 1:Getting 
> domain sched credit: Invalid argument
>> libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 1:cannot 
> (re-)build domain: -3
>> 
>> might indicate a problem resulting from the switch to credit2 as the default
>> scheduler. But "first glance" here really means what it says - I didn't look
>> (yet) at what exactly libxl tries to do there, in the hope that others may
>> know without much digging.
> 
> I think this is due to toolstack trying to set the same scheduler
> parameters for the newly created guest.
> 
> But in this test, the destination host is using a different scheduler
> from the source host. Asking for credit scheduler on a credit2 host is
> wrong.
> 
> The relevant snippet in guest cfg (JSON) is:
> 
> "sched_params": {
> "sched": "credit",
> "weight": 256,
> "cap": 0
> },
> 
> I can't think of a method to fix it off the top of my head though.

So is this something that was specified in the original config? Or
is it just the current value which gets read and an attempt made
to re-install. If there was no explicit setting in the guest config,
shouldn't such a "default" setting be retained by not transferring
any scheduler specifics during migration?

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [xen-unstable-smoke test] 128283: tolerable all pass - PUSHED

2018-10-01 Thread osstest service owner
flight 128283 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128283/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  5c28a3c6814cc3a054fde133c0f6ef77d80c0412
baseline version:
 xen  146bb7e1934afc5056179fffbf8e24c0db415f8c

Last test of basis   128279  2018-10-01 11:00:41 Z0 days
Testing same since   128283  2018-10-01 13:00:52 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Julien Grall 
  Marc Zyngier 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   146bb7e193..5c28a3c681  5c28a3c6814cc3a054fde133c0f6ef77d80c0412 -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Andrew Cooper
On 01/10/18 14:57, Boris Ostrovsky wrote:
> On 10/1/18 9:50 AM, George Dunlap wrote:
>> On 10/01/2018 02:44 PM, Boris Ostrovsky wrote:
>>> On 10/1/18 9:12 AM, Andrew Cooper wrote:
 On 01/10/18 12:13, Jan Beulich wrote:
 On 01.10.18 at 11:58,  wrote:
>> Having the allocator return unscrubbed pages is a potential security
>> concern: some domain can be given pages with memory contents of another
>> domain. This may happen, for example, if a domain voluntarily releases
>> its own memory (ballooning being the easiest way for doing this).
> And we've always said that in this case it's the domain's responsibility
> to scrub the memory of secrets it cares about. Therefore I'm at the
> very least missing some background on this change of expectations.
 You were on the call when this was discussed, along with the synchronous
 scrubbing in destroydomain.

 Put simply, the current behaviour is not good enough for a number of
 security sensitive usecases.

 The main reason however for doing this is the optimisations it enables,
 and in particular, not double scrubbing most of our pages.
>>> OTOH, it introduces double scrubbing for ballooning because (at least
>>> Linux) guests do scrub memory before handing it back to the hypervisor.
>> We could add a Xen feature flag which tells the guest balloon driver
>> whether the hypervisor will scrub pages (and thus it doesn't need to).
> We can, but we are regressing existing guests.

There is no regression.  How Xen handles its free memory is unrelated to
how the guest handles its free memory, and real usecases exist (AMD SEV
most obviously) where there deliberately isn't full trust between the
guest and the hypervisor.

It is suboptimal in the case where Xen and the guest mutually trust each
other, but that isn't the only case which exists.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On 01/10/18 12:13, Jan Beulich wrote:
 On 01.10.18 at 11:58,  wrote:
>> Having the allocator return unscrubbed pages is a potential security
>> concern: some domain can be given pages with memory contents of another
>> domain. This may happen, for example, if a domain voluntarily releases
>> its own memory (ballooning being the easiest way for doing this).
> 
> And we've always said that in this case it's the domain's responsibility
> to scrub the memory of secrets it cares about. Therefore I'm at the
> very least missing some background on this change of expectations.
> 
>> Change the allocator to always scrub the pages given to it by:
>>
>> 1. free_xenheap_pages()
>> 2. free_domheap_pages()
>> 3. online_page()
>> 4. init_heap_pages()
>>
>> Performance testing has shown that on multi-node machines bootscrub
>> vastly outperforms idle-loop scrubbing. So instead of marking all pages
>> dirty initially, introduce bootscrub_done to track the completion of
>> the process and eagerly scrub all allocated pages during boot.
> 
> I'm afraid I'm somewhat lost: There still is active boot time scrubbing,
> or at least I can't see how that might be skipped (other than due to
> "bootscrub=0"). I was actually expecting this to change at some
> point. Am I perhaps simply mis-reading this part of the description?
> 
>> If bootscrub is disabled, then all pages will be marked as dirty right
>> away and scrubbed either in idle-loop or eagerly during allocation.
>>
>> After this patch, alloc_heap_pages() is guaranteed to return scrubbed
>> pages to a caller unless MEMF_no_scrub flag was provided.
> 
> I also don't understand the point of this: Xen's internal allocations
> have no need to come from scrubbed memory. This in particular
> also puts under question the need to "eagerly scrub all allocated
> pages during boot" (quoted from an earlier paragraph).

There are ways to share a Xen's page with a guest. So from a security
point of view, there is no guarantee that a page allocated with
alloc_xenheap_pages() will not end up accessible by some guest.

--
Thanks,
Sergey

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [xen-unstable test] 128240: regressions - FAIL

2018-10-01 Thread Wei Liu
On Mon, Oct 01, 2018 at 03:04:02AM -0600, Jan Beulich wrote:
> >>> On 30.09.18 at 23:59,  wrote:
> > flight 128240 xen-unstable real [real]
> > http://logs.test-lab.xenproject.org/osstest/logs/128240/ 
> > 
> > Regressions :-(
> > 
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> >  test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail REGR. 
> > vs. 128084
> 
> At the first glance
> 
> libxl: error: libxl_sched.c:232:sched_credit_domain_set: Domain 1:Getting 
> domain sched credit: Invalid argument
> libxl: error: libxl_create.c:1275:domcreate_rebuild_done: Domain 1:cannot 
> (re-)build domain: -3
> 
> might indicate a problem resulting from the switch to credit2 as the default
> scheduler. But "first glance" here really means what it says - I didn't look
> (yet) at what exactly libxl tries to do there, in the hope that others may
> know without much digging.

I think this is due to toolstack trying to set the same scheduler
parameters for the newly created guest.

But in this test, the destination host is using a different scheduler
from the source host. Asking for credit scheduler on a credit2 host is
wrong.

The relevant snippet in guest cfg (JSON) is:

"sched_params": {
"sched": "credit",
"weight": 256,
"cap": 0
},

I can't think of a method to fix it off the top of my head though.


Wei.

> 
> Jan
> 
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On Mon, 2018-10-01 at 14:54 +0100, George Dunlap wrote:
> On 10/01/2018 02:44 PM, Sergey Dyasli wrote:
> > On Mon, 2018-10-01 at 07:38 -0600, Jan Beulich wrote:
> > > > > > On 01.10.18 at 15:12,  wrote:
> > > > 
> > > > On 01/10/18 12:13, Jan Beulich wrote:
> > > > > > > > On 01.10.18 at 11:58,  wrote:
> > > > > > 
> > > > > > Having the allocator return unscrubbed pages is a potential security
> > > > > > concern: some domain can be given pages with memory contents of 
> > > > > > another
> > > > > > domain. This may happen, for example, if a domain voluntarily 
> > > > > > releases
> > > > > > its own memory (ballooning being the easiest way for doing this).
> > > > > 
> > > > > And we've always said that in this case it's the domain's 
> > > > > responsibility
> > > > > to scrub the memory of secrets it cares about. Therefore I'm at the
> > > > > very least missing some background on this change of expectations.
> > > > 
> > > > You were on the call when this was discussed, along with the synchronous
> > > > scrubbing in destroydomain.
> > > 
> > > Quite possible, but it has been a while.
> > > 
> > > > Put simply, the current behaviour is not good enough for a number of
> > > > security sensitive usecases.
> > > 
> > > Well, I'm looking forward for Sergey to expand on this in the commit
> > > message.
> > > 
> > > > The main reason however for doing this is the optimisations it enables,
> > > > and in particular, not double scrubbing most of our pages.
> > > 
> > > Well, wait - scrubbing != zeroing (taking into account also what you
> > > say further down).
> > > 
> > > > > > Change the allocator to always scrub the pages given to it by:
> > > > > > 
> > > > > > 1. free_xenheap_pages()
> > > > > > 2. free_domheap_pages()
> > > > > > 3. online_page()
> > > > > > 4. init_heap_pages()
> > > > > > 
> > > > > > Performance testing has shown that on multi-node machines bootscrub
> > > > > > vastly outperforms idle-loop scrubbing. So instead of marking all 
> > > > > > pages
> > > > > > dirty initially, introduce bootscrub_done to track the completion of
> > > > > > the process and eagerly scrub all allocated pages during boot.
> > > > > 
> > > > > I'm afraid I'm somewhat lost: There still is active boot time 
> > > > > scrubbing,
> > > > > or at least I can't see how that might be skipped (other than due to
> > > > > "bootscrub=0"). I was actually expecting this to change at some
> > > > > point. Am I perhaps simply mis-reading this part of the description?
> > > > 
> > > > No.  Sergey tried that, and found a massive perf difference between
> > > > scrubbing in the idle loop and scrubbing at boot.  (1.2s vs 40s iirc)
> > > 
> > > That's not something you can reasonably compare, imo: For one,
> > > it is certainly expected for the background scrubbing to be slower,
> > > simply because of other activity on the system. And then 1.2s
> > > looks awfully small for a multi-Tb system. Yet it is mainly large
> > > systems where the synchronous boot time scrubbing is a problem.
> > 
> > Let me throw in some numbers.
> > 
> > Performance of current idle loop scrubbing is just not good enough:
> > on 8 nodes, 32 CPUs and 512GB RAM machine it takes ~40 seconds to scrub
> > all the memory instead of ~8 seconds for current bootscrub implementation.
> > 
> > This was measured while synchronously waiting for CPUs to scrub all the
> > memory in idle-loop. But scrubbing can happen in background, of course.
> 
> Right, the whole point of idle loop scrubbing is that you *don't*
> syncronously wait for *all* the memory to finish scrubbing before you
> can use part of it.  So why is this an issue for you guys -- what
> concrete problem did it cause, that the full amount of memory took 40s
> to finish scrubbing rather than only 8s?

There is no issue at the moment. Using idle loop to scrub all the memory
is just not viable: it doesn't scale. How long does it currently take
to bootscrub a multi-Tb system? If it takes a few minutes then I fear
that it might take an hour to idle-scrub it.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [freebsd-master test] 128277: all pass - PUSHED

2018-10-01 Thread osstest service owner
flight 128277 freebsd-master real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128277/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 freebsd  04d432fdc0c15f2da76dac4a9a5caf1aeb051ef0
baseline version:
 freebsd  a62d8e729c6db95f0c2e1de618be0b0796c0a97a

Last test of basis   128102  2018-09-26 09:19:09 Z5 days
Failing since128168  2018-09-28 09:19:05 Z3 days2 attempts
Testing same since   128277  2018-10-01 09:19:04 Z0 days1 attempts


People who touched revisions under test:
  0mp <0...@freebsd.org>
  ae 
  allanjude 
  andrew 
  bdrewery 
  brooks 
  bz 
  dim 
  emaste 
  gjb 
  gonzo 
  gordon 
  hselasky 
  imp 
  jhb 
  kib 
  mjg 
  np 
  sef 
  slavash 
  tuexen 
  ygy 

jobs:
 build-amd64-freebsd-againpass
 build-amd64-freebsd  pass
 build-amd64-xen-freebsd  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/freebsd.git
   a62d8e729c6..04d432fdc0c  04d432fdc0c15f2da76dac4a9a5caf1aeb051ef0 -> 
tested/master

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread George Dunlap
On 10/01/2018 03:00 PM, Jan Beulich wrote:
 On 01.10.18 at 15:38,  wrote:
>> On 10/01/2018 12:25 PM, Jan Beulich wrote:
>>> I think the main concern
>>> was with the way migration of the new value was implemented. But I
>>> really have to defer to Andrew for that, irrespective of him not
>>> having responded (on the list) to prior pings.
>>
>> Is Andrew really the only person who knows enough about migration to
>> give this the thumbs-up?
> 
> That's not the point here, at least afaic: He had voiced _some_
> concern on an earlier version. In such a case it is, I think, only
> appropriate to wait with committing until there was indication
> that the concerns were sufficiently addressed (verbally or by
> adjustments to the code).

Right -- but it's not your job to make sure the migration stuff is
properly addressed; it's Wei and Ian's job.  Wei's R-b was a statement
from him that the code was good; when Andy questioned that, I think it
was then *Wei's* job to address the question, not yours or Andy's (or
even Olaf's).  If Wei says, "I've considered Andy's objections and I
think the patch is fine as-is", then it can be checked in (given a
reasonable amount of time for Andy to respond); and Wei can own whatever
consequences there are.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On Mon, 2018-10-01 at 07:38 -0600, Jan Beulich wrote:
> > > > On 01.10.18 at 15:12,  wrote:
> > 
> > On 01/10/18 12:13, Jan Beulich wrote:
> > > > > > On 01.10.18 at 11:58,  wrote:
> > > > 
> > > > Having the allocator return unscrubbed pages is a potential security
> > > > concern: some domain can be given pages with memory contents of another
> > > > domain. This may happen, for example, if a domain voluntarily releases
> > > > its own memory (ballooning being the easiest way for doing this).
> > > 
> > > And we've always said that in this case it's the domain's responsibility
> > > to scrub the memory of secrets it cares about. Therefore I'm at the
> > > very least missing some background on this change of expectations.
> > 
> > You were on the call when this was discussed, along with the synchronous
> > scrubbing in destroydomain.
> 
> Quite possible, but it has been a while.
> 
> > Put simply, the current behaviour is not good enough for a number of
> > security sensitive usecases.
> 
> Well, I'm looking forward for Sergey to expand on this in the commit
> message.

Jan,

I think this is the main argument here: what to do about those security
sensitive use cases? Scrubbing everything unconditionally might be a too
radical approach. Would inroducing a new cmdline param be appropriate?

> 
> > The main reason however for doing this is the optimisations it enables,
> > and in particular, not double scrubbing most of our pages.
> 
> Well, wait - scrubbing != zeroing (taking into account also what you
> say further down).

Andrew,

I'm not yet convinced myself about the value that returning always zeroed
pages from the allocator provides. Most of the pages are given to guests
anyway, and re-zeroing a few pages in the hypervisor doesn't sound
too bad. But maybe I'm just not aware of cases where Xen performs large
allocations and zeroes them afterwards?

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/vsprintf: Introduce %pd formatter for domains

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 15:55,  wrote:
> On 01/10/18 11:25, Jan Beulich wrote:
> On 01.10.18 at 12:23,  wrote:
>>> On 01/10/18 11:14, Jan Beulich wrote:
>>> On 01.10.18 at 12:02,  wrote:
> On 01/10/18 10:08, Jan Beulich wrote:
> On 28.09.18 at 19:22,  wrote:
>>> +static char *print_domain(char *str, char *end, const struct domain *d)
>>> +{
>>> +const char *name = NULL;
>>> +
>>> +/* Some debugging may have an optionally-NULL pointer. */
>>> +if ( unlikely(!d) )
>>> +return string(str, end, "NULL", -1, -1, 0);
>>> +
>>> +if ( str < end )
>>> +*str = 'd';
>>> +
>>> +switch ( d->domain_id )
>>> +{
>>> +case DOMID_IO:   name = "[IO]";   break;
>>> +case DOMID_XEN:  name = "[XEN]";  break;
>>> +case DOMID_COW:  name = "[COW]";  break;
>>> +case DOMID_IDLE: name = "[IDLE]"; break;
>> default: ASSERT_UNREACHABLE();
>>
>> ?
> No - specifically not in this case.
>
> This path is used when printing crash information, and falling back to a
> number is better behaviour than falling into an infinite loop,
> overflowing the primary stack, then taking a #DF (which escalates to
> triple fault on AMD), without printing anything useful.
 Ah, good point. Perhaps worth a brief comment instead of a "default:"
 then?
>>> This incremental diff?
>> LGTM, thanks.
> 
> I've committed this now, but its just occurred to me that we couldn't
> possibly have had a default case, because that is the common case for
> most domains.

Oh, indeed, but still, while the "unreachable" part of my suggestion
was indeed wrong, but it still could have been
ASSERT(d->domain_id < DOMID_FIRST_RESERVED). But I agree this
wouldn't have been overly helpful.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 15:38,  wrote:
> On 10/01/2018 12:25 PM, Jan Beulich wrote:
>> I think the main concern
>> was with the way migration of the new value was implemented. But I
>> really have to defer to Andrew for that, irrespective of him not
>> having responded (on the list) to prior pings.
> 
> Is Andrew really the only person who knows enough about migration to
> give this the thumbs-up?

That's not the point here, at least afaic: He had voiced _some_
concern on an earlier version. In such a case it is, I think, only
appropriate to wait with committing until there was indication
that the concerns were sufficiently addressed (verbally or by
adjustments to the code). It is instead my incomplete
knowledge on the various parts of migration that made it
undesirable to try to judge in his place, after pings lead no-
where.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages]

2018-10-01 Thread Andrew Cooper
On 01/10/18 14:24, Ian Jackson wrote:
>
>>> handle_tsc_info has no code to verify that padding is indeed zero. Due
>>> to the lack of a version field it is impossible to know if the sender
>>> already has the newly introduced vtsc_tolerance field. In the worst
>>> case the receiving domU will get an unemulated TSC.
>> The lack of padding verification is deliberate, for forwards
>> compatibility.  Why does the sending code matter?  One way or another,
>> if the field is 0, the option wasn't present or wasn't configured. 
>> Neither of these situations affect the decision-making that the
>> receiving side needs to perform.
> We are talking here about an unused field that is (supposedly?)
> always sent as zero ?

The spec requires the padding to always be sent as zero, and
verify-stream-v2 does check the padding and object to non-zero values.

libxc's write_tsc_info() has always fully zeroed the structure, and
convert-legacy-stream also behaves the same way.  However, I notice this
change will break migration from pre 4.6 as write_libxc_tsc_info() now
has a mismatched number of parameters to pack and throw an exception.

>
> AIUI:
>
> In the new code the semantics of zero is "do not allow any tolerance".
> The old code handles this correctly.  The issue is with migrations
> from new code to old: the tolerance value will silently ignored.

Migrating from new to old doesn't remotely work, and whether things
explode or not is very context dependent.

I'm not sure its worth caring about this case.  We never have before. 
(Getting backwards migration working for emergency cases is on my TODO
list, but it is dependent on rewriting the hypervisor interfaces for
getting/setting guest state.)

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Boris Ostrovsky
On 10/1/18 9:50 AM, George Dunlap wrote:
> On 10/01/2018 02:44 PM, Boris Ostrovsky wrote:
>> On 10/1/18 9:12 AM, Andrew Cooper wrote:
>>> On 01/10/18 12:13, Jan Beulich wrote:
>>> On 01.10.18 at 11:58,  wrote:
> Having the allocator return unscrubbed pages is a potential security
> concern: some domain can be given pages with memory contents of another
> domain. This may happen, for example, if a domain voluntarily releases
> its own memory (ballooning being the easiest way for doing this).
 And we've always said that in this case it's the domain's responsibility
 to scrub the memory of secrets it cares about. Therefore I'm at the
 very least missing some background on this change of expectations.
>>> You were on the call when this was discussed, along with the synchronous
>>> scrubbing in destroydomain.
>>>
>>> Put simply, the current behaviour is not good enough for a number of
>>> security sensitive usecases.
>>>
>>> The main reason however for doing this is the optimisations it enables,
>>> and in particular, not double scrubbing most of our pages.
>> OTOH, it introduces double scrubbing for ballooning because (at least
>> Linux) guests do scrub memory before handing it back to the hypervisor.
> We could add a Xen feature flag which tells the guest balloon driver
> whether the hypervisor will scrub pages (and thus it doesn't need to).

We can, but we are regressing existing guests.

Can we except ballooned pages from being scrubbed, and instead have a
guest request scrubbing if it has concerns (that Andrew mentioned) about
this?

-boris


>
> Andy, wasn't unconditional scrubbing of guest pages also required by
> some certification or other?
>
>  -George


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Andrew Cooper
On 01/10/18 14:50, George Dunlap wrote:
> On 10/01/2018 02:44 PM, Boris Ostrovsky wrote:
>> On 10/1/18 9:12 AM, Andrew Cooper wrote:
>>> On 01/10/18 12:13, Jan Beulich wrote:
>>> On 01.10.18 at 11:58,  wrote:
> Having the allocator return unscrubbed pages is a potential security
> concern: some domain can be given pages with memory contents of another
> domain. This may happen, for example, if a domain voluntarily releases
> its own memory (ballooning being the easiest way for doing this).
 And we've always said that in this case it's the domain's responsibility
 to scrub the memory of secrets it cares about. Therefore I'm at the
 very least missing some background on this change of expectations.
>>> You were on the call when this was discussed, along with the synchronous
>>> scrubbing in destroydomain.
>>>
>>> Put simply, the current behaviour is not good enough for a number of
>>> security sensitive usecases.
>>>
>>> The main reason however for doing this is the optimisations it enables,
>>> and in particular, not double scrubbing most of our pages.
>> OTOH, it introduces double scrubbing for ballooning because (at least
>> Linux) guests do scrub memory before handing it back to the hypervisor.
> We could add a Xen feature flag which tells the guest balloon driver
> whether the hypervisor will scrub pages (and thus it doesn't need to).
>
> Andy, wasn't unconditional scrubbing of guest pages also required by
> some certification or other?

It was for Common Criteria, and not trusting guests to zero their own
pages is by far the simplest way of ensuring that they can't set up a
sidechannel.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/vsprintf: Introduce %pd formatter for domains

2018-10-01 Thread Andrew Cooper
On 01/10/18 11:25, Jan Beulich wrote:
 On 01.10.18 at 12:23,  wrote:
>> On 01/10/18 11:14, Jan Beulich wrote:
>> On 01.10.18 at 12:02,  wrote:
 On 01/10/18 10:08, Jan Beulich wrote:
 On 28.09.18 at 19:22,  wrote:
>> +static char *print_domain(char *str, char *end, const struct domain *d)
>> +{
>> +const char *name = NULL;
>> +
>> +/* Some debugging may have an optionally-NULL pointer. */
>> +if ( unlikely(!d) )
>> +return string(str, end, "NULL", -1, -1, 0);
>> +
>> +if ( str < end )
>> +*str = 'd';
>> +
>> +switch ( d->domain_id )
>> +{
>> +case DOMID_IO:   name = "[IO]";   break;
>> +case DOMID_XEN:  name = "[XEN]";  break;
>> +case DOMID_COW:  name = "[COW]";  break;
>> +case DOMID_IDLE: name = "[IDLE]"; break;
> default: ASSERT_UNREACHABLE();
>
> ?
 No - specifically not in this case.

 This path is used when printing crash information, and falling back to a
 number is better behaviour than falling into an infinite loop,
 overflowing the primary stack, then taking a #DF (which escalates to
 triple fault on AMD), without printing anything useful.
>>> Ah, good point. Perhaps worth a brief comment instead of a "default:"
>>> then?
>> This incremental diff?
> LGTM, thanks.

I've committed this now, but its just occurred to me that we couldn't
possibly have had a default case, because that is the common case for
most domains.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread George Dunlap
On 10/01/2018 02:44 PM, Sergey Dyasli wrote:
> On Mon, 2018-10-01 at 07:38 -0600, Jan Beulich wrote:
> On 01.10.18 at 15:12,  wrote:
>>>
>>> On 01/10/18 12:13, Jan Beulich wrote:
>>> On 01.10.18 at 11:58,  wrote:
>
> Having the allocator return unscrubbed pages is a potential security
> concern: some domain can be given pages with memory contents of another
> domain. This may happen, for example, if a domain voluntarily releases
> its own memory (ballooning being the easiest way for doing this).

 And we've always said that in this case it's the domain's responsibility
 to scrub the memory of secrets it cares about. Therefore I'm at the
 very least missing some background on this change of expectations.
>>>
>>> You were on the call when this was discussed, along with the synchronous
>>> scrubbing in destroydomain.
>>
>> Quite possible, but it has been a while.
>>
>>> Put simply, the current behaviour is not good enough for a number of
>>> security sensitive usecases.
>>
>> Well, I'm looking forward for Sergey to expand on this in the commit
>> message.
>>
>>> The main reason however for doing this is the optimisations it enables,
>>> and in particular, not double scrubbing most of our pages.
>>
>> Well, wait - scrubbing != zeroing (taking into account also what you
>> say further down).
>>
> Change the allocator to always scrub the pages given to it by:
>
> 1. free_xenheap_pages()
> 2. free_domheap_pages()
> 3. online_page()
> 4. init_heap_pages()
>
> Performance testing has shown that on multi-node machines bootscrub
> vastly outperforms idle-loop scrubbing. So instead of marking all pages
> dirty initially, introduce bootscrub_done to track the completion of
> the process and eagerly scrub all allocated pages during boot.

 I'm afraid I'm somewhat lost: There still is active boot time scrubbing,
 or at least I can't see how that might be skipped (other than due to
 "bootscrub=0"). I was actually expecting this to change at some
 point. Am I perhaps simply mis-reading this part of the description?
>>>
>>> No.  Sergey tried that, and found a massive perf difference between
>>> scrubbing in the idle loop and scrubbing at boot.  (1.2s vs 40s iirc)
>>
>> That's not something you can reasonably compare, imo: For one,
>> it is certainly expected for the background scrubbing to be slower,
>> simply because of other activity on the system. And then 1.2s
>> looks awfully small for a multi-Tb system. Yet it is mainly large
>> systems where the synchronous boot time scrubbing is a problem.
> 
> Let me throw in some numbers.
> 
> Performance of current idle loop scrubbing is just not good enough:
> on 8 nodes, 32 CPUs and 512GB RAM machine it takes ~40 seconds to scrub
> all the memory instead of ~8 seconds for current bootscrub implementation.
> 
> This was measured while synchronously waiting for CPUs to scrub all the
> memory in idle-loop. But scrubbing can happen in background, of course.

Right, the whole point of idle loop scrubbing is that you *don't*
syncronously wait for *all* the memory to finish scrubbing before you
can use part of it.  So why is this an issue for you guys -- what
concrete problem did it cause, that the full amount of memory took 40s
to finish scrubbing rather than only 8s?

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread George Dunlap
On 10/01/2018 02:44 PM, Boris Ostrovsky wrote:
> On 10/1/18 9:12 AM, Andrew Cooper wrote:
>> On 01/10/18 12:13, Jan Beulich wrote:
>> On 01.10.18 at 11:58,  wrote:
 Having the allocator return unscrubbed pages is a potential security
 concern: some domain can be given pages with memory contents of another
 domain. This may happen, for example, if a domain voluntarily releases
 its own memory (ballooning being the easiest way for doing this).
>>> And we've always said that in this case it's the domain's responsibility
>>> to scrub the memory of secrets it cares about. Therefore I'm at the
>>> very least missing some background on this change of expectations.
>> You were on the call when this was discussed, along with the synchronous
>> scrubbing in destroydomain.
>>
>> Put simply, the current behaviour is not good enough for a number of
>> security sensitive usecases.
>>
>> The main reason however for doing this is the optimisations it enables,
>> and in particular, not double scrubbing most of our pages.
> 
> OTOH, it introduces double scrubbing for ballooning because (at least
> Linux) guests do scrub memory before handing it back to the hypervisor.

We could add a Xen feature flag which tells the guest balloon driver
whether the hypervisor will scrub pages (and thus it doesn't need to).

Andy, wasn't unconditional scrubbing of guest pages also required by
some certification or other?

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Boris Ostrovsky
On 10/1/18 9:12 AM, Andrew Cooper wrote:
> On 01/10/18 12:13, Jan Beulich wrote:
> On 01.10.18 at 11:58,  wrote:
>>> Having the allocator return unscrubbed pages is a potential security
>>> concern: some domain can be given pages with memory contents of another
>>> domain. This may happen, for example, if a domain voluntarily releases
>>> its own memory (ballooning being the easiest way for doing this).
>> And we've always said that in this case it's the domain's responsibility
>> to scrub the memory of secrets it cares about. Therefore I'm at the
>> very least missing some background on this change of expectations.
> You were on the call when this was discussed, along with the synchronous
> scrubbing in destroydomain.
>
> Put simply, the current behaviour is not good enough for a number of
> security sensitive usecases.
>
> The main reason however for doing this is the optimisations it enables,
> and in particular, not double scrubbing most of our pages.

OTOH, it introduces double scrubbing for ballooning because (at least
Linux) guests do scrub memory before handing it back to the hypervisor.

-boris


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Sergey Dyasli
On Mon, 2018-10-01 at 07:38 -0600, Jan Beulich wrote:
> > > > On 01.10.18 at 15:12,  wrote:
> > 
> > On 01/10/18 12:13, Jan Beulich wrote:
> > > > > > On 01.10.18 at 11:58,  wrote:
> > > > 
> > > > Having the allocator return unscrubbed pages is a potential security
> > > > concern: some domain can be given pages with memory contents of another
> > > > domain. This may happen, for example, if a domain voluntarily releases
> > > > its own memory (ballooning being the easiest way for doing this).
> > > 
> > > And we've always said that in this case it's the domain's responsibility
> > > to scrub the memory of secrets it cares about. Therefore I'm at the
> > > very least missing some background on this change of expectations.
> > 
> > You were on the call when this was discussed, along with the synchronous
> > scrubbing in destroydomain.
> 
> Quite possible, but it has been a while.
> 
> > Put simply, the current behaviour is not good enough for a number of
> > security sensitive usecases.
> 
> Well, I'm looking forward for Sergey to expand on this in the commit
> message.
> 
> > The main reason however for doing this is the optimisations it enables,
> > and in particular, not double scrubbing most of our pages.
> 
> Well, wait - scrubbing != zeroing (taking into account also what you
> say further down).
> 
> > > > Change the allocator to always scrub the pages given to it by:
> > > > 
> > > > 1. free_xenheap_pages()
> > > > 2. free_domheap_pages()
> > > > 3. online_page()
> > > > 4. init_heap_pages()
> > > > 
> > > > Performance testing has shown that on multi-node machines bootscrub
> > > > vastly outperforms idle-loop scrubbing. So instead of marking all pages
> > > > dirty initially, introduce bootscrub_done to track the completion of
> > > > the process and eagerly scrub all allocated pages during boot.
> > > 
> > > I'm afraid I'm somewhat lost: There still is active boot time scrubbing,
> > > or at least I can't see how that might be skipped (other than due to
> > > "bootscrub=0"). I was actually expecting this to change at some
> > > point. Am I perhaps simply mis-reading this part of the description?
> > 
> > No.  Sergey tried that, and found a massive perf difference between
> > scrubbing in the idle loop and scrubbing at boot.  (1.2s vs 40s iirc)
> 
> That's not something you can reasonably compare, imo: For one,
> it is certainly expected for the background scrubbing to be slower,
> simply because of other activity on the system. And then 1.2s
> looks awfully small for a multi-Tb system. Yet it is mainly large
> systems where the synchronous boot time scrubbing is a problem.

Let me throw in some numbers.

Performance of current idle loop scrubbing is just not good enough:
on 8 nodes, 32 CPUs and 512GB RAM machine it takes ~40 seconds to scrub
all the memory instead of ~8 seconds for current bootscrub implementation.

This was measured while synchronously waiting for CPUs to scrub all the
memory in idle-loop. But scrubbing can happen in background, of course.

-- 
Thanks,
Sergey
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH] x86: deal with firmware setting bogus TSC_ADJUST values

2018-10-01 Thread Jan Beulich
The system Intel have handed me for AVX512 emulator work ("Gigabyte
Technology Co., Ltd. X299 AORUS Gaming 3 Pro/X299 AORUS Gaming 3
Pro-CF, BIOS F3 12/28/2017") would not come up under Xen - it hung in
the middle of Dom0 PCI initialization. As it turned out, Xen's time
management did not work because of the firmware setting (only) the boot
CPU's TSC_ADJUST MSR to a large negative value (on the order of -2^50).

Follow Linux (also shamelessly stealing their comments) in
- clearing the register for the boot CPU (we don't have a need for
  exceptions here yet, as the only exception in Linux is a class of
  systems Xen doesn't work on anyway as far as I'm aware),
- forcing non-negative values uniformly,
- syncing the registers within sockets.
Linux caps at 0x7fff as well, but their comment saying "as those
wreckage the timer as well" does, to me at least, neither really explain
the reason nor the particular value chosen. Hence until someone runs
into such a system (and can then, hopefully, explain things) I think we
should leave out that specific part.

In order to avoid making init_percpu_time() depend on running _before_
set_cpu_sibling_map() (and hence the booting CPU _not_ being accounted
in socket_cpumask[] yet), move that call slightly earlier in
start_secondary().

Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -381,6 +381,8 @@ void start_secondary(void *unused)
 
 smp_callin();
 
+set_cpu_sibling_map(cpu);
+
 init_percpu_time();
 
 setup_secondary_APIC_clock();
@@ -393,7 +395,6 @@ void start_secondary(void *unused)
 
 /* This must be done before setting cpu_online_map */
 spin_debug_enable();
-set_cpu_sibling_map(cpu);
 notify_cpu_starting(cpu);
 
 /*
--- a/xen/arch/x86/time.c
+++ b/xen/arch/x86/time.c
@@ -88,6 +88,9 @@ static bool __read_mostly using_pit;
 /* Boot timestamp, filled in head.S */
 u64 __initdata boot_tsc_stamp;
 
+/* Per-socket TSC_ADJUST values, for secondary cores/threads to sync to. */
+static uint64_t *__read_mostly tsc_adjust;
+
 /*
  * 32-bit division of integer dividend and integer divisor yielding
  * 32-bit fractional quotient.
@@ -1602,6 +1605,57 @@ void init_percpu_time(void)
 /* Initial estimate for TSC rate. */
 t->tsc_scale = per_cpu(cpu_time, 0).tsc_scale;
 
+if ( tsc_adjust )
+{
+unsigned int socket = cpu_to_socket(smp_processor_id());
+int64_t adj;
+
+/* For now we don't want to come here for the BSP. */
+ASSERT(system_state >= SYS_STATE_smp_boot);
+
+rdmsrl(MSR_IA32_TSC_ADJUST, adj);
+
+/*
+ * Check whether this CPU is the first in a package to come up. In
+ * this case do not check the boot value against another package
+ * because the new package might have been physically hotplugged,
+ * where TSC_ADJUST is expected to be different.
+ */
+if ( cpumask_weight(socket_cpumask[socket]) == 1 )
+{
+/*
+ * On the boot CPU we just force the ADJUST value to 0 if it's non-
+ * zero (in early_time_init()). We don't do that on non-boot CPUs
+ * because physical hotplug should have set the ADJUST register to 
a
+ * value > 0, so the TSC is in sync with the already running CPUs.
+ *
+ * But we always force non-negative ADJUST values. Otherwise the 
TSC
+ * deadline timer creates an interrupt storm.
+ */
+if ( adj < 0 )
+{
+printk(XENLOG_WARNING
+   "TSC ADJUST set to -%lx on CPU%u - clearing\n",
+   -adj, smp_processor_id());
+wrmsrl(MSR_IA32_TSC_ADJUST, 0);
+adj = 0;
+}
+tsc_adjust[socket] = adj;
+}
+else if ( adj != tsc_adjust[socket] )
+{
+static bool __read_mostly warned;
+
+if ( !warned )
+{
+warned = true;
+printk(XENLOG_WARNING
+   "Differing TSC ADJUST values within socket(s) - fixing 
all\n");
+}
+wrmsrl(MSR_IA32_TSC_ADJUST, tsc_adjust[socket]);
+}
+}
+
 local_irq_save(flags);
 now = read_platform_stime(NULL);
 tsc = rdtsc_ordered();
@@ -1788,6 +1842,15 @@ int __init init_xen_time(void)
 /* Finish platform timer initialization. */
 try_platform_timer_tail(false);
 
+/*
+ * Setup space to track per-socket TSC_ADJUST values. Don't fiddle with
+ * values if the TSC is not reported as invariant. Ignore allocation
+ * failure here - most systems won't need any adjustment anyway.
+ */
+if ( boot_cpu_has(X86_FEATURE_TSC_ADJUST) &&
+ boot_cpu_has(X86_FEATURE_ITSC) )
+tsc_adjust = xzalloc_array(uint64_t, nr_sockets);
+
 return 0;
 }
 
@@ -1798,6 +1861,19 @@ void __init early_time_init(void)
 struct cpu_time *t = _cpu(cpu_time);

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 15:12,  wrote:
> On 01/10/18 12:13, Jan Beulich wrote:
> On 01.10.18 at 11:58,  wrote:
>>> Having the allocator return unscrubbed pages is a potential security
>>> concern: some domain can be given pages with memory contents of another
>>> domain. This may happen, for example, if a domain voluntarily releases
>>> its own memory (ballooning being the easiest way for doing this).
>> And we've always said that in this case it's the domain's responsibility
>> to scrub the memory of secrets it cares about. Therefore I'm at the
>> very least missing some background on this change of expectations.
> 
> You were on the call when this was discussed, along with the synchronous
> scrubbing in destroydomain.

Quite possible, but it has been a while.

> Put simply, the current behaviour is not good enough for a number of
> security sensitive usecases.

Well, I'm looking forward for Sergey to expand on this in the commit
message.

> The main reason however for doing this is the optimisations it enables,
> and in particular, not double scrubbing most of our pages.

Well, wait - scrubbing != zeroing (taking into account also what you
say further down).

>>> Change the allocator to always scrub the pages given to it by:
>>>
>>> 1. free_xenheap_pages()
>>> 2. free_domheap_pages()
>>> 3. online_page()
>>> 4. init_heap_pages()
>>>
>>> Performance testing has shown that on multi-node machines bootscrub
>>> vastly outperforms idle-loop scrubbing. So instead of marking all pages
>>> dirty initially, introduce bootscrub_done to track the completion of
>>> the process and eagerly scrub all allocated pages during boot.
>> I'm afraid I'm somewhat lost: There still is active boot time scrubbing,
>> or at least I can't see how that might be skipped (other than due to
>> "bootscrub=0"). I was actually expecting this to change at some
>> point. Am I perhaps simply mis-reading this part of the description?
> 
> No.  Sergey tried that, and found a massive perf difference between
> scrubbing in the idle loop and scrubbing at boot.  (1.2s vs 40s iirc)

That's not something you can reasonably compare, imo: For one,
it is certainly expected for the background scrubbing to be slower,
simply because of other activity on the system. And then 1.2s
looks awfully small for a multi-Tb system. Yet it is mainly large
systems where the synchronous boot time scrubbing is a problem.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread George Dunlap
On 10/01/2018 12:25 PM, Jan Beulich wrote:
 On 01.10.18 at 12:52,  wrote:
>> Olaf Hering writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to 
>> avoid TSC emulation"):
>>> Am Thu, 13 Sep 2018 09:39:13 +0200
>>> schrieb Olaf Hering :
 this patch was not applied yet, even after a few "pings".
>>>
>>> No reaction since months.
>>> So scrap that patch, just in case it is still part of someones to-consider 
>> queue.
>>
>> I think it would be worth exploring whether this patch could be
>> applied without an explicit ack from Andrew.
>>
>> I confess I ignored all the previous mails because they all started
>>Andrew,
>> or
>>Andrew, Lars,
>> so I assumed that you didn't want attention from other
>> maintainers/committers.
>>
>> Now that I look at the thread it is difficult for me to see the wood
>> for the trees but I don't see unanswered concerns.
> 
> Problem is - discussion around this was (iirc) happening not only on
> the list, but also on irc (including perhaps private chats). It was for
> that reason that I made my R-b conditional upon Andrew at least
> giving an informal go-ahead (otherwise, together with Wei's R-b,
> the patch could have gone in).
> 
> Besides the question of correctness from the perspective of guests
> (which imo is not a problem as the feature needs to be actively
> enabled, and I think we have no reason to keep admins from
> breaking their guests if they really mean to), 

I agree with this, BTW.

> I think the main concern
> was with the way migration of the new value was implemented. But I
> really have to defer to Andrew for that, irrespective of him not
> having responded (on the list) to prior pings.

Is Andrew really the only person who knows enough about migration to
give this the thumbs-up?  Technically migration is in the toolstack, and
so Wei should have double-checked that before giving his review; and
when a question was raised, Wei (as the relevant maintainer who had
given it an R-b) should either have asserted that the code was indeed
correct, or withdrawn his R-b and given Olaf feedback to allow him to
get it into shape.

(This is all based on the history sketched out by Jan above; if there is
more to it then of course this analysis may not be correct.)

I was only skimming the thread, and intended to weigh in at some point;
but I didn't really understand why it was blocked on Andrew in the first
place.

 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [[PATCH v3] 4/4] xen/arm: Replace call_smc with arm_smccc_smc

2018-10-01 Thread Julien Grall

Hi Andrew,

On 10/01/2018 02:11 PM, Andrew Cooper wrote:

On 01/10/18 13:46, Julien Grall wrote:

call_smc is a subset of arm_smccc_smc. Rather than having 2 methods to
do SMCCC call, replace all call to the former by the later.

Signed-off-by: Julien Grall 

---

 Changes in v3:
 - Use PSCI_RET where needed
---
  xen/arch/arm/Makefile|  1 -
  xen/arch/arm/platforms/exynos5.c |  3 ++-
  xen/arch/arm/platforms/seattle.c |  4 ++--
  xen/arch/arm/psci.c  | 37 +
  xen/arch/arm/smc.S   | 21 -
  xen/include/asm-arm/processor.h  |  3 ---
  6 files changed, 29 insertions(+), 40 deletions(-)
  delete mode 100644 xen/arch/arm/smc.S

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index b9b141dc84..37fa8268b3 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -39,7 +39,6 @@ obj-y += processor.o
  obj-y += psci.o
  obj-y += setup.o
  obj-y += shutdown.o
-obj-y += smc.o
  obj-y += smp.o
  obj-y += smpboot.o
  obj-y += sysctl.o
diff --git a/xen/arch/arm/platforms/exynos5.c b/xen/arch/arm/platforms/exynos5.c
index c15ecf80f5..e2c0b7b878 100644
--- a/xen/arch/arm/platforms/exynos5.c
+++ b/xen/arch/arm/platforms/exynos5.c
@@ -26,6 +26,7 @@
  #include 
  #include 
  #include 
+#include 
  
  static bool secure_firmware;
  
@@ -249,7 +250,7 @@ static int exynos5_cpu_up(int cpu)

  iounmap(power);
  
  if ( secure_firmware )

-call_smc(SMC_CMD_CPU1BOOT, cpu, 0, 0);
+arm_smccc_smc(SMC_CMD_CPU1BOOT, cpu, NULL);
  
  return cpu_up_send_sgi(cpu);

  }
diff --git a/xen/arch/arm/platforms/seattle.c b/xen/arch/arm/platforms/seattle.c
index 893cc17972..64cc1868c2 100644
--- a/xen/arch/arm/platforms/seattle.c
+++ b/xen/arch/arm/platforms/seattle.c
@@ -33,12 +33,12 @@ static const char * const seattle_dt_compat[] __initconst =
   */
  static void seattle_system_reset(void)
  {
-call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
  }
  
  static void seattle_system_off(void)

  {
-call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
  }
  
  PLATFORM_START(seattle, "SEATTLE")

diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 941eec921b..4ae6504f3e 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -42,42 +42,53 @@ uint32_t smccc_ver;
  
  static uint32_t psci_cpu_on_nr;
  
+#define PSCI_RET(res)   ((int32_t)(res).a0)

+
  int call_psci_cpu_on(int cpu)
  {
-return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), 
__pa(init_secondary), 0);
+struct arm_smccc_res res;
+
+arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary),
+  );
+
+return PSCI_RET(res.a0);
  }


Sorry if I'm jumping into the middle of a conversation, but why force
all callers to manually extract the return value when it is a fixed
register?


The SMCCC allows up to 4 return value (a0 ... a3). At the moment, most 
of the caller in Xen only care about one result value. But this will 
change soon (see OP-TEE support in Xen).




Wouldn't it be far easier to do this:

#define arcm_smccc_smc(...) \
 ({  \
 struct arm_smccc_res res;   \
 \
 if ( cpus_have_const_cap(ARM_SMCCC_1_1) )   \
 res = arm_smccc_1_1_smc(__VA_ARGS__);   \
 else\
 res = arm_smccc_1_0_smc(__VA_ARGS__);   \
 \
 (int)res.a0;\


We can't possibly cast here. The interpretation of a0 is different from 
one call to another.



 })

Which also allows the compiler to optimise out the structure if it isn't
read, and also avoids the caller needing to pass a NULL pointer for "I
don't want the result".


While I understand the value, I don't think this would be correct if we 
want to implement an interface with the SMCCC.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 14:39,  wrote:
> On 07/06/18 14:08, Olaf Hering wrote:
>> Add an option to control when vTSC emulation will be activated for a
>> domU with tsc_mode=default. Without such option each TSC access from
>> domU will be emulated, which causes a significant perfomance drop for
>> workloads that make use of rdtsc.
>>
>> One option to avoid the TSC option is to run domUs with tsc_mode=native.
>> This has the drawback that migrating a domU from a "2.3GHz" class host
>> to a "2.4GHz" class host may change the rate at wich the TSC counter
>> increases, the domU may not be prepared for that.
>>
>> With the new option the host admin can decide how a domU should behave
>> when it is migrated across systems of the same class. Since there is
>> always some jitter when Xen calibrates the cpu_khz value, all hosts of
>> the same class will most likely have slightly different values. As a
>> result vTSC emulation is unavoidable. Data collected during the incident
>> which triggered this change showed a jitter of up to 200 KHz across
>> systems of the same class.
> 
> Do you have any further details of the systems involved?  If they are
> identical systems, they should all have the same real TSC frequency, and
> its a known issue that Xen isn't very good at working out the
> frequency.  TBH, fixing that would be far better overall.

Are you convinced all parts match their nominal frequency without
_any_ deviation? If that was the case, we could indeed use CPUID
leaves 0x15 / 0x16 output, if available. But I very much doubt this.
As an example, here's what bare metal Linux says on my newest
system:

tsc: Detected 2600.000 MHz processor
tsc: Refined TSC clocksource calibration: 2591.990 MHz

Xen figures:

(XEN) Detected 2592.107 MHz processor.

And then after another re-boot bare metal Linux again

tsc: Refined TSC clocksource calibration: 2592.008 MHz

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread George Dunlap
On 10/01/2018 02:12 PM, Andrew Cooper wrote:
> On 01/10/18 12:13, Jan Beulich wrote:
> On 01.10.18 at 11:58,  wrote:
>>> Having the allocator return unscrubbed pages is a potential security
>>> concern: some domain can be given pages with memory contents of another
>>> domain. This may happen, for example, if a domain voluntarily releases
>>> its own memory (ballooning being the easiest way for doing this).
>> And we've always said that in this case it's the domain's responsibility
>> to scrub the memory of secrets it cares about. Therefore I'm at the
>> very least missing some background on this change of expectations.
> 
> You were on the call when this was discussed, along with the synchronous
> scrubbing in destroydomain.
> 
> Put simply, the current behaviour is not good enough for a number of
> security sensitive usecases.
> 
> The main reason however for doing this is the optimisations it enables,
> and in particular, not double scrubbing most of our pages.

All of that should have been in the changelog, at least in summary form,
regardless of where else it may have been discussed.

 -George


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation [and 1 more messages]

2018-10-01 Thread Ian Jackson
Jan Beulich writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to 
avoid TSC emulation"):
> Problem is - discussion around this was (iirc) happening not only on
> the list, but also on irc (including perhaps private chats).

Hrm.  Well, if it didn't happen on the list, it didn't happen.  It is
often useful and productive, of course, to thrash something out on
irc, or ping there, or whatever but:

Conclusions from irc (and from in-person conversations, phone calls,
or other kinds of un-minuted discussions) must be transferred to email
as otherwise they are lost (and the effort of having them is often
wasted as the conversation has to be had again).

If properly writing up the conclusion from an irc conversation is too
hard, pasting a transcript into an email seems a bare minimum.

> It was for that reason that I made my R-b conditional upon Andrew at
> least giving an informal go-ahead (otherwise, together with Wei's
> R-b, the patch could have gone in).

Right.  Thanks for the clarification.


Andrew Cooper writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to 
avoid TSC emulation"):
> [lots of stuff]

Thanks for the review.  I hope Olaf will be able to address most of
it, but:

Andrew:
> [Olaf:]
> > Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> > The padding is sent as zero in write_tsc_info to the receving host.
> > The padding is undefined if the changed code runs as receiver.
> 
> I'm not sure what you mean by this final sentence.

Me neither.

> > handle_tsc_info has no code to verify that padding is indeed zero. Due
> > to the lack of a version field it is impossible to know if the sender
> > already has the newly introduced vtsc_tolerance field. In the worst
> > case the receiving domU will get an unemulated TSC.
> 
> The lack of padding verification is deliberate, for forwards
> compatibility.  Why does the sending code matter?  One way or another,
> if the field is 0, the option wasn't present or wasn't configured. 
> Neither of these situations affect the decision-making that the
> receiving side needs to perform.

We are talking here about an unused field that is (supposedly?)
always sent as zero ?

AIUI:

In the new code the semantics of zero is "do not allow any tolerance".
The old code handles this correctly.  The issue is with migrations
from new code to old: the tolerance value will silently ignored.

Presumably this is considered preferable to the alternative, which is
to extend the migration stream in a deliberately incompatible way so
that the migration fails.  I think this is what Olaf is trying to
say ?

> > Signed-off-by: Olaf Hering 
> > Reviewed-by: Wei Liu  (v07/v08)
> > Reviewed-by: Jan Beulich  (v08)
> 
> I'm still -0.5 for this patch.  I can appreciate why you want it, but it
> is a gross hack which only works when you don't skew time more than NTP
> in the guest can cope with.

That surely is why there is a limit to the tolerance.  I've asked Olaf
to try to quantify an appropriate limit.

>   My gut feeling is that there will be other
> more subtle fallout.

That's not particularly convincing to me.  But if you could try to be
more specific I think this could be usefull addressed in the
documentation for the feature ?


Olaf, I think the ball is now in your court.  I hope you can address
Andrew's comments, and maybe mine too.

Thanks,
Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [[PATCH v3] 4/4] xen/arm: Replace call_smc with arm_smccc_smc

2018-10-01 Thread Julien Grall

Hi,

On 10/01/2018 01:46 PM, Julien Grall wrote:

  PLATFORM_START(seattle, "SEATTLE")
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 941eec921b..4ae6504f3e 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -42,42 +42,53 @@ uint32_t smccc_ver;
  
  static uint32_t psci_cpu_on_nr;
  
+#define PSCI_RET(res)   ((int32_t)(res).a0)

+
  int call_psci_cpu_on(int cpu)
  {
-return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), 
__pa(init_secondary), 0);
+struct arm_smccc_res res;
+
+arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary),
+  );
+
+return PSCI_RET(res.a0);


Hmmm this should have been PSCI_RET(res). I guess this could be fixed if 
not other change.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] mm/page_alloc: always scrub pages given to the allocator

2018-10-01 Thread Andrew Cooper
On 01/10/18 12:13, Jan Beulich wrote:
 On 01.10.18 at 11:58,  wrote:
>> Having the allocator return unscrubbed pages is a potential security
>> concern: some domain can be given pages with memory contents of another
>> domain. This may happen, for example, if a domain voluntarily releases
>> its own memory (ballooning being the easiest way for doing this).
> And we've always said that in this case it's the domain's responsibility
> to scrub the memory of secrets it cares about. Therefore I'm at the
> very least missing some background on this change of expectations.

You were on the call when this was discussed, along with the synchronous
scrubbing in destroydomain.

Put simply, the current behaviour is not good enough for a number of
security sensitive usecases.

The main reason however for doing this is the optimisations it enables,
and in particular, not double scrubbing most of our pages.

>
>> Change the allocator to always scrub the pages given to it by:
>>
>> 1. free_xenheap_pages()
>> 2. free_domheap_pages()
>> 3. online_page()
>> 4. init_heap_pages()
>>
>> Performance testing has shown that on multi-node machines bootscrub
>> vastly outperforms idle-loop scrubbing. So instead of marking all pages
>> dirty initially, introduce bootscrub_done to track the completion of
>> the process and eagerly scrub all allocated pages during boot.
> I'm afraid I'm somewhat lost: There still is active boot time scrubbing,
> or at least I can't see how that might be skipped (other than due to
> "bootscrub=0"). I was actually expecting this to change at some
> point. Am I perhaps simply mis-reading this part of the description?

No.  Sergey tried that, and found a massive perf difference between
scrubbing in the idle loop and scrubbing at boot.  (1.2s vs 40s iirc)

Scrubbing at boot has some deliberate optimisations to reduce the
pressure on the heap lock, and I expect that is where the performance
difference lies.  It is an issue which wants looking into irrespective
of other changes.

>
>> If bootscrub is disabled, then all pages will be marked as dirty right
>> away and scrubbed either in idle-loop or eagerly during allocation.
>>
>> After this patch, alloc_heap_pages() is guaranteed to return scrubbed
>> pages to a caller unless MEMF_no_scrub flag was provided.
> I also don't understand the point of this: Xen's internal allocations
> have no need to come from scrubbed memory.

This isn't true.  Almost every caller re-zeroes an allocated page which
is the cause of the double scrubbing in most cases.

By having the allocators guarantee to hand out zeroed pages, we can
avoid the redundant scrubbing.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [[PATCH v3] 4/4] xen/arm: Replace call_smc with arm_smccc_smc

2018-10-01 Thread Andrew Cooper
On 01/10/18 13:46, Julien Grall wrote:
> call_smc is a subset of arm_smccc_smc. Rather than having 2 methods to
> do SMCCC call, replace all call to the former by the later.
>
> Signed-off-by: Julien Grall 
>
> ---
>
> Changes in v3:
> - Use PSCI_RET where needed
> ---
>  xen/arch/arm/Makefile|  1 -
>  xen/arch/arm/platforms/exynos5.c |  3 ++-
>  xen/arch/arm/platforms/seattle.c |  4 ++--
>  xen/arch/arm/psci.c  | 37 +
>  xen/arch/arm/smc.S   | 21 -
>  xen/include/asm-arm/processor.h  |  3 ---
>  6 files changed, 29 insertions(+), 40 deletions(-)
>  delete mode 100644 xen/arch/arm/smc.S
>
> diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
> index b9b141dc84..37fa8268b3 100644
> --- a/xen/arch/arm/Makefile
> +++ b/xen/arch/arm/Makefile
> @@ -39,7 +39,6 @@ obj-y += processor.o
>  obj-y += psci.o
>  obj-y += setup.o
>  obj-y += shutdown.o
> -obj-y += smc.o
>  obj-y += smp.o
>  obj-y += smpboot.o
>  obj-y += sysctl.o
> diff --git a/xen/arch/arm/platforms/exynos5.c 
> b/xen/arch/arm/platforms/exynos5.c
> index c15ecf80f5..e2c0b7b878 100644
> --- a/xen/arch/arm/platforms/exynos5.c
> +++ b/xen/arch/arm/platforms/exynos5.c
> @@ -26,6 +26,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  static bool secure_firmware;
>  
> @@ -249,7 +250,7 @@ static int exynos5_cpu_up(int cpu)
>  iounmap(power);
>  
>  if ( secure_firmware )
> -call_smc(SMC_CMD_CPU1BOOT, cpu, 0, 0);
> +arm_smccc_smc(SMC_CMD_CPU1BOOT, cpu, NULL);
>  
>  return cpu_up_send_sgi(cpu);
>  }
> diff --git a/xen/arch/arm/platforms/seattle.c 
> b/xen/arch/arm/platforms/seattle.c
> index 893cc17972..64cc1868c2 100644
> --- a/xen/arch/arm/platforms/seattle.c
> +++ b/xen/arch/arm/platforms/seattle.c
> @@ -33,12 +33,12 @@ static const char * const seattle_dt_compat[] __initconst 
> =
>   */
>  static void seattle_system_reset(void)
>  {
> -call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
> +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
>  }
>  
>  static void seattle_system_off(void)
>  {
> -call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
> +arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
>  }
>  
>  PLATFORM_START(seattle, "SEATTLE")
> diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
> index 941eec921b..4ae6504f3e 100644
> --- a/xen/arch/arm/psci.c
> +++ b/xen/arch/arm/psci.c
> @@ -42,42 +42,53 @@ uint32_t smccc_ver;
>  
>  static uint32_t psci_cpu_on_nr;
>  
> +#define PSCI_RET(res)   ((int32_t)(res).a0)
> +
>  int call_psci_cpu_on(int cpu)
>  {
> -return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), 
> __pa(init_secondary), 0);
> +struct arm_smccc_res res;
> +
> +arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary),
> +  );
> +
> +return PSCI_RET(res.a0);
>  }

Sorry if I'm jumping into the middle of a conversation, but why force
all callers to manually extract the return value when it is a fixed
register?

Wouldn't it be far easier to do this:

#define arcm_smccc_smc(...) \
({  \
struct arm_smccc_res res;   \
\
if ( cpus_have_const_cap(ARM_SMCCC_1_1) )   \
res = arm_smccc_1_1_smc(__VA_ARGS__);   \
else\
res = arm_smccc_1_0_smc(__VA_ARGS__);   \
\
(int)res.a0;\
})

Which also allows the compiler to optimise out the structure if it isn't
read, and also avoids the caller needing to pass a NULL pointer for "I
don't want the result".

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [[PATCH v3] 1/4] xen/arm: add SMC wrapper that is compatible with SMCCC v1.0

2018-10-01 Thread Julien Grall
From: Volodymyr Babchuk 

Existing SMC wrapper call_smc() allows only 4 parameters and
returns only one value. This is enough for existing
use in PSCI code, but TEE mediator will need a call that is
fully compatible with ARM SMCCC v1.0.

This patch adds a wrapper for both arm32 and arm64. In the case of
arm32, the wrapper is just an alias to the ARM SMCCC v1.1 as the
convention is the same.

CC: "Edgar E. Iglesias" 
Signed-off-by: Volodymyr Babchuk 
[julien: Rework the wrapper to make it closer to SMCC 1.1 wrapper]
Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 

---
Changes in v3:
- Add Stefano's reviewed-by
---
 xen/arch/arm/arm64/Makefile  |  1 +
 xen/arch/arm/arm64/asm-offsets.c |  5 
 xen/arch/arm/arm64/smc.S | 32 +
 xen/include/asm-arm/smccc.h  | 51 +++-
 4 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 xen/arch/arm/arm64/smc.S

diff --git a/xen/arch/arm/arm64/Makefile b/xen/arch/arm/arm64/Makefile
index bb5c610b2a..c4f3a28a0d 100644
--- a/xen/arch/arm/arm64/Makefile
+++ b/xen/arch/arm/arm64/Makefile
@@ -8,6 +8,7 @@ obj-y += domain.o
 obj-y += entry.o
 obj-y += insn.o
 obj-$(CONFIG_LIVEPATCH) += livepatch.o
+obj-y += smc.o
 obj-y += smpboot.o
 obj-y += traps.o
 obj-y += vfp.o
diff --git a/xen/arch/arm/arm64/asm-offsets.c b/xen/arch/arm/arm64/asm-offsets.c
index 62833d8c8b..280ddb55bf 100644
--- a/xen/arch/arm/arm64/asm-offsets.c
+++ b/xen/arch/arm/arm64/asm-offsets.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define DEFINE(_sym, _val) \
 asm volatile ("\n.ascii\"==>#define " #_sym " %0 /* " #_val " */<==\"" \
@@ -51,6 +52,10 @@ void __dummy__(void)
 
BLANK();
OFFSET(INITINFO_stack, struct init_info, stack);
+
+   BLANK();
+   OFFSET(SMCCC_RES_a0, struct arm_smccc_res, a0);
+   OFFSET(SMCCC_RES_a2, struct arm_smccc_res, a2);
 }
 
 /*
diff --git a/xen/arch/arm/arm64/smc.S b/xen/arch/arm/arm64/smc.S
new file mode 100644
index 00..b0752be57e
--- /dev/null
+++ b/xen/arch/arm/arm64/smc.S
@@ -0,0 +1,32 @@
+/*
+ * xen/arch/arm/arm64/smc.S
+ *
+ * Wrapper for Secure Monitors Calls
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+
+/*
+ * void __arm_smccc_1_0_smc(register_t a0, register_t a1, register_t a2,
+ *  register_t a3, register_t a4, register_t a5,
+ *  register_t a6, register_t a7,
+ *  struct arm_smccc_res *res)
+ */
+ENTRY(__arm_smccc_1_0_smc)
+smc #0
+ldr x4, [sp]
+cbz x4, 1f  /* No need to store the result */
+stp x0, x1, [x4, #SMCCC_RES_a0]
+stp x2, x3, [x4, #SMCCC_RES_a2]
+1:
+ret
diff --git a/xen/include/asm-arm/smccc.h b/xen/include/asm-arm/smccc.h
index 648bef28bd..1ed6cbaa48 100644
--- a/xen/include/asm-arm/smccc.h
+++ b/xen/include/asm-arm/smccc.h
@@ -207,7 +207,56 @@ struct arm_smccc_res {
 *___res = (typeof(*___res)){r0, r1, r2, r3};\
 } while ( 0 )
 
-#endif
+/*
+ * The calling convention for arm32 is the same for both SMCCC v1.0 and
+ * v1.1.
+ */
+#ifdef CONFIG_ARM_32
+#define arm_smccc_1_0_smc(...) arm_smccc_1_1_smc(__VA_ARGS__)
+#else
+
+void __arm_smccc_1_0_smc(register_t a0, register_t a1, register_t a2,
+ register_t a3, register_t a4, register_t a5,
+ register_t a6, register_t a7,
+ struct arm_smccc_res *res);
+
+/* Macros to handle variadic parameter for SMCCC v1.0 helper */
+#define __arm_smccc_1_0_smc_7(a0, a1, a2, a3, a4, a5, a6, a7, res)  \
+__arm_smccc_1_0_smc(a0, a1, a2, a3, a4, a5, a6, a7, res)
+
+#define __arm_smccc_1_0_smc_6(a0, a1, a2, a3, a4, a5, a6, res)  \
+__arm_smccc_1_0_smc_7(a0, a1, a2, a3, a4, a5, a6, 0, res)
+
+#define __arm_smccc_1_0_smc_5(a0, a1, a2, a3, a4, a5, res)  \
+__arm_smccc_1_0_smc_6(a0, a1, a2, a3, a4, a5, 0, res)
+
+#define __arm_smccc_1_0_smc_4(a0, a1, a2, a3, a4, res)  \
+__arm_smccc_1_0_smc_5(a0, a1, a2, a3, a4, 0, res)
+
+#define __arm_smccc_1_0_smc_3(a0, a1, a2, a3, res)  \
+__arm_smccc_1_0_smc_4(a0, a1, a2, a3, 0, res)
+
+#define __arm_smccc_1_0_smc_2(a0, a1, a2, res)  \
+__arm_smccc_1_0_smc_3(a0, a1, a2, 0, res)
+
+#define __arm_smccc_1_0_smc_1(a0, a1, res)  \
+__arm_smccc_1_0_smc_2(a0, a1, 0, res)
+
+#define __arm_smccc_1_0_smc_0(a0, res)  \
+__arm_smccc_1_0_smc_1(a0, 0, res)
+
+#define 

[Xen-devel] [xen-unstable-smoke test] 128279: tolerable all pass - PUSHED

2018-10-01 Thread osstest service owner
flight 128279 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128279/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  146bb7e1934afc5056179fffbf8e24c0db415f8c
baseline version:
 xen  edb4724e36256c495a6aa3cf1a12722efe271f9d

Last test of basis   128209  2018-09-28 21:06:31 Z2 days
Testing same since   128279  2018-10-01 11:00:41 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-i386 pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   edb4724e36..146bb7e193  146bb7e1934afc5056179fffbf8e24c0db415f8c -> smoke

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [[PATCH v3] 2/4] xen/arm: cpufeature: Add helper to check constant caps

2018-10-01 Thread Julien Grall
Some capababilities are set right during boot and will never change
afterwards. At the moment, the function cpu_have_caps will check whether
the cap is enabled from the memory.

It is possible to avoid the load from the memory by using an
ALTERNATIVE. With that the check is just reduced to 1 instruction.

Signed-off-by: Julien Grall 

---

This is the static key for the poor. At some point we might want to
introduce something similar to static key in Xen.

Changes in v2:
- Use unlikely
---
 xen/include/asm-arm/cpufeature.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index 3de6b54301..c6cbc2ec84 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -63,6 +63,18 @@ static inline bool cpus_have_cap(unsigned int num)
 return test_bit(num, cpu_hwcaps);
 }
 
+/* System capability check for constant cap */
+#define cpus_have_const_cap(num) ({ \
+bool __ret; \
+\
+asm volatile (ALTERNATIVE("mov %0, #0", \
+  "mov %0, #1", \
+  num)  \
+  : "=r" (__ret));  \
+\
+unlikely(__ret);\
+})
+
 static inline void cpus_set_cap(unsigned int num)
 {
 if (num >= ARM_NCAPS)
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [[PATCH v3] 4/4] xen/arm: Replace call_smc with arm_smccc_smc

2018-10-01 Thread Julien Grall
call_smc is a subset of arm_smccc_smc. Rather than having 2 methods to
do SMCCC call, replace all call to the former by the later.

Signed-off-by: Julien Grall 

---

Changes in v3:
- Use PSCI_RET where needed
---
 xen/arch/arm/Makefile|  1 -
 xen/arch/arm/platforms/exynos5.c |  3 ++-
 xen/arch/arm/platforms/seattle.c |  4 ++--
 xen/arch/arm/psci.c  | 37 +
 xen/arch/arm/smc.S   | 21 -
 xen/include/asm-arm/processor.h  |  3 ---
 6 files changed, 29 insertions(+), 40 deletions(-)
 delete mode 100644 xen/arch/arm/smc.S

diff --git a/xen/arch/arm/Makefile b/xen/arch/arm/Makefile
index b9b141dc84..37fa8268b3 100644
--- a/xen/arch/arm/Makefile
+++ b/xen/arch/arm/Makefile
@@ -39,7 +39,6 @@ obj-y += processor.o
 obj-y += psci.o
 obj-y += setup.o
 obj-y += shutdown.o
-obj-y += smc.o
 obj-y += smp.o
 obj-y += smpboot.o
 obj-y += sysctl.o
diff --git a/xen/arch/arm/platforms/exynos5.c b/xen/arch/arm/platforms/exynos5.c
index c15ecf80f5..e2c0b7b878 100644
--- a/xen/arch/arm/platforms/exynos5.c
+++ b/xen/arch/arm/platforms/exynos5.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 
 static bool secure_firmware;
 
@@ -249,7 +250,7 @@ static int exynos5_cpu_up(int cpu)
 iounmap(power);
 
 if ( secure_firmware )
-call_smc(SMC_CMD_CPU1BOOT, cpu, 0, 0);
+arm_smccc_smc(SMC_CMD_CPU1BOOT, cpu, NULL);
 
 return cpu_up_send_sgi(cpu);
 }
diff --git a/xen/arch/arm/platforms/seattle.c b/xen/arch/arm/platforms/seattle.c
index 893cc17972..64cc1868c2 100644
--- a/xen/arch/arm/platforms/seattle.c
+++ b/xen/arch/arm/platforms/seattle.c
@@ -33,12 +33,12 @@ static const char * const seattle_dt_compat[] __initconst =
  */
 static void seattle_system_reset(void)
 {
-call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
 }
 
 static void seattle_system_off(void)
 {
-call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
 }
 
 PLATFORM_START(seattle, "SEATTLE")
diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 941eec921b..4ae6504f3e 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -42,42 +42,53 @@ uint32_t smccc_ver;
 
 static uint32_t psci_cpu_on_nr;
 
+#define PSCI_RET(res)   ((int32_t)(res).a0)
+
 int call_psci_cpu_on(int cpu)
 {
-return call_smc(psci_cpu_on_nr, cpu_logical_map(cpu), 
__pa(init_secondary), 0);
+struct arm_smccc_res res;
+
+arm_smccc_smc(psci_cpu_on_nr, cpu_logical_map(cpu), __pa(init_secondary),
+  );
+
+return PSCI_RET(res.a0);
 }
 
 void call_psci_cpu_off(void)
 {
 if ( psci_ver > PSCI_VERSION(0, 1) )
 {
-int errno;
+struct arm_smccc_res res;
 
 /* If successfull the PSCI cpu_off call doesn't return */
-errno = call_smc(PSCI_0_2_FN32_CPU_OFF, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_CPU_OFF, );
 panic("PSCI cpu off failed for CPU%d err=%d\n", smp_processor_id(),
-  errno);
+  PSCI_RET(res));
 }
 }
 
 void call_psci_system_off(void)
 {
 if ( psci_ver > PSCI_VERSION(0, 1) )
-call_smc(PSCI_0_2_FN32_SYSTEM_OFF, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_OFF, NULL);
 }
 
 void call_psci_system_reset(void)
 {
 if ( psci_ver > PSCI_VERSION(0, 1) )
-call_smc(PSCI_0_2_FN32_SYSTEM_RESET, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_SYSTEM_RESET, NULL);
 }
 
 static int __init psci_features(uint32_t psci_func_id)
 {
+struct arm_smccc_res res;
+
 if ( psci_ver < PSCI_VERSION(1, 0) )
 return PSCI_NOT_SUPPORTED;
 
-return call_smc(PSCI_1_0_FN32_PSCI_FEATURES, psci_func_id, 0, 0);
+arm_smccc_smc(PSCI_1_0_FN32_PSCI_FEATURES, psci_func_id, NULL);
+
+return PSCI_RET(res);
 }
 
 static int __init psci_is_smc_method(const struct dt_device_node *psci)
@@ -112,11 +123,11 @@ static void __init psci_init_smccc(void)
 
 if ( psci_features(ARM_SMCCC_VERSION_FID) != PSCI_NOT_SUPPORTED )
 {
-uint32_t ret;
+struct arm_smccc_res res;
 
-ret = call_smc(ARM_SMCCC_VERSION_FID, 0, 0, 0);
-if ( ret != ARM_SMCCC_NOT_SUPPORTED )
-smccc_ver = ret;
+arm_smccc_smc(ARM_SMCCC_VERSION_FID, );
+if ( PSCI_RET(res) != ARM_SMCCC_NOT_SUPPORTED )
+smccc_ver = PSCI_RET(res);
 }
 
 if ( smccc_ver >= SMCCC_VERSION(1, 1) )
@@ -165,6 +176,7 @@ static int __init psci_init_0_2(void)
 { /* sentinel */ },
 };
 int ret;
+struct arm_smccc_res res;
 
 if ( acpi_disabled )
 {
@@ -186,7 +198,8 @@ static int __init psci_init_0_2(void)
 }
 }
 
-psci_ver = call_smc(PSCI_0_2_FN32_PSCI_VERSION, 0, 0, 0);
+arm_smccc_smc(PSCI_0_2_FN32_PSCI_VERSION, );
+psci_ver = PSCI_RET(res);
 
 /* For the moment, we only support PSCI 0.2 and PSCI 1.x */
 if ( psci_ver != PSCI_VERSION(0, 2) && PSCI_VERSION_MAJOR(psci_ver) 

[Xen-devel] [[PATCH v3] 0/4] xen/arm: SMCCC fixup and improvement

2018-10-01 Thread Julien Grall
Hi all,

This patch series contains fixup and improvement for the SMCCC subsystem.

Cheers,

Julien Grall (3):
  xen/arm: cpufeature: Add helper to check constant caps
  xen/arm: smccc: Add wrapper to automatically select the calling
convention
  xen/arm: Replace call_smc with arm_smccc_smc

Volodymyr Babchuk (1):
  xen/arm: add SMC wrapper that is compatible with SMCCC v1.0

 xen/arch/arm/Makefile|  1 -
 xen/arch/arm/arm64/Makefile  |  1 +
 xen/arch/arm/arm64/asm-offsets.c |  5 
 xen/arch/arm/arm64/smc.S | 32 +
 xen/arch/arm/platforms/exynos5.c |  3 +-
 xen/arch/arm/platforms/seattle.c |  4 +--
 xen/arch/arm/psci.c  | 41 ++
 xen/arch/arm/smc.S   | 21 --
 xen/include/asm-arm/cpufeature.h | 15 +-
 xen/include/asm-arm/processor.h  |  3 --
 xen/include/asm-arm/smccc.h  | 62 +++-
 11 files changed, 146 insertions(+), 42 deletions(-)
 create mode 100644 xen/arch/arm/arm64/smc.S
 delete mode 100644 xen/arch/arm/smc.S

-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [[PATCH v3] 3/4] xen/arm: smccc: Add wrapper to automatically select the calling convention

2018-10-01 Thread Julien Grall
Signed-off-by: Julien Grall 
Reviewed-by: Stefano Stabellini 
Reviewed-by: Volodymyr Babchuk 

---
Changes in v3:
- Add Stefano's and Volodymyr's reviewed-by

Changes in v2:
- Invert the condition
- Add missing includes
---
 xen/arch/arm/psci.c  |  4 
 xen/include/asm-arm/cpufeature.h |  3 ++-
 xen/include/asm-arm/smccc.h  | 11 +++
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/psci.c b/xen/arch/arm/psci.c
index 3cf5ecf0f3..941eec921b 100644
--- a/xen/arch/arm/psci.c
+++ b/xen/arch/arm/psci.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -118,6 +119,9 @@ static void __init psci_init_smccc(void)
 smccc_ver = ret;
 }
 
+if ( smccc_ver >= SMCCC_VERSION(1, 1) )
+cpus_set_cap(ARM_SMCCC_1_1);
+
 printk(XENLOG_INFO "Using SMC Calling Convention v%u.%u\n",
SMCCC_VERSION_MAJOR(smccc_ver), SMCCC_VERSION_MINOR(smccc_ver));
 }
diff --git a/xen/include/asm-arm/cpufeature.h b/xen/include/asm-arm/cpufeature.h
index c6cbc2ec84..2d82264427 100644
--- a/xen/include/asm-arm/cpufeature.h
+++ b/xen/include/asm-arm/cpufeature.h
@@ -44,8 +44,9 @@
 #define SKIP_CTXT_SWITCH_SERROR_SYNC 6
 #define ARM_HARDEN_BRANCH_PREDICTOR 7
 #define ARM_SSBD 8
+#define ARM_SMCCC_1_1 9
 
-#define ARM_NCAPS   9
+#define ARM_NCAPS   10
 
 #ifndef __ASSEMBLY__
 
diff --git a/xen/include/asm-arm/smccc.h b/xen/include/asm-arm/smccc.h
index 1ed6cbaa48..126399dd70 100644
--- a/xen/include/asm-arm/smccc.h
+++ b/xen/include/asm-arm/smccc.h
@@ -16,6 +16,9 @@
 #ifndef __ASM_ARM_SMCCC_H__
 #define __ASM_ARM_SMCCC_H__
 
+#include 
+#include 
+
 #define SMCCC_VERSION_MAJOR_SHIFT16
 #define SMCCC_VERSION_MINOR_MASK \
 ((1U << SMCCC_VERSION_MAJOR_SHIFT) - 1)
@@ -213,6 +216,7 @@ struct arm_smccc_res {
  */
 #ifdef CONFIG_ARM_32
 #define arm_smccc_1_0_smc(...) arm_smccc_1_1_smc(__VA_ARGS__)
+#define arm_smccc_smc(...) arm_smccc_1_1_smc(__VA_ARGS__)
 #else
 
 void __arm_smccc_1_0_smc(register_t a0, register_t a1, register_t a2,
@@ -254,6 +258,13 @@ void __arm_smccc_1_0_smc(register_t a0, register_t a1, 
register_t a2,
 #define arm_smccc_1_0_smc(...)  \
 __arm_smccc_1_0_smc_count(__count_args(__VA_ARGS__), __VA_ARGS__)
 
+#define arm_smccc_smc(...)  \
+do {\
+if ( cpus_have_const_cap(ARM_SMCCC_1_1) )   \
+arm_smccc_1_1_smc(__VA_ARGS__); \
+else\
+arm_smccc_1_0_smc(__VA_ARGS__); \
+} while ( 0 )
 #endif /* CONFIG_ARM_64 */
 
 #endif /* __ASSEMBLY__ */
-- 
2.11.0


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread Andrew Cooper
On 07/06/18 14:08, Olaf Hering wrote:
> Add an option to control when vTSC emulation will be activated for a
> domU with tsc_mode=default. Without such option each TSC access from
> domU will be emulated, which causes a significant perfomance drop for
> workloads that make use of rdtsc.
>
> One option to avoid the TSC option is to run domUs with tsc_mode=native.
> This has the drawback that migrating a domU from a "2.3GHz" class host
> to a "2.4GHz" class host may change the rate at wich the TSC counter
> increases, the domU may not be prepared for that.
>
> With the new option the host admin can decide how a domU should behave
> when it is migrated across systems of the same class. Since there is
> always some jitter when Xen calibrates the cpu_khz value, all hosts of
> the same class will most likely have slightly different values. As a
> result vTSC emulation is unavoidable. Data collected during the incident
> which triggered this change showed a jitter of up to 200 KHz across
> systems of the same class.

Do you have any further details of the systems involved?  If they are
identical systems, they should all have the same real TSC frequency, and
its a known issue that Xen isn't very good at working out the
frequency.  TBH, fixing that would be far better overall.

>
> Existing padding fields are reused to store vtsc_khz_tolerance as u16.
> The padding is sent as zero in write_tsc_info to the receving host.
> The padding is undefined if the changed code runs as receiver.

I'm not sure what you mean by this final sentence.

> handle_tsc_info has no code to verify that padding is indeed zero. Due
> to the lack of a version field it is impossible to know if the sender
> already has the newly introduced vtsc_tolerance field. In the worst
> case the receiving domU will get an unemulated TSC.

The lack of padding verification is deliberate, for forwards
compatibility.  Why does the sending code matter?  One way or another,
if the field is 0, the option wasn't present or wasn't configured. 
Neither of these situations affect the decision-making that the
receiving side needs to perform.

>
> Signed-off-by: Olaf Hering 
> Reviewed-by: Wei Liu  (v07/v08)
> Reviewed-by: Jan Beulich  (v08)

I'm still -0.5 for this patch.  I can appreciate why you want it, but it
is a gross hack which only works when you don't skew time more than NTP
in the guest can cope with.  My gut feeling is that there will be other
more subtle fallout.

As for the implementation itself, a few trivial comments.

> diff --git a/docs/man/xen-tscmode.pod.7 b/docs/man/xen-tscmode.pod.7
> index 3bbc96f201..122ae36679 100644
> --- a/docs/man/xen-tscmode.pod.7
> +++ b/docs/man/xen-tscmode.pod.7
> @@ -99,6 +99,9 @@ whether or not the VM has been saved/restored/migrated
>  
>  =back
>  
> +If the tsc_mode is set to "default" the decision to emulate TSC can be
> +tweaked further with the "vtsc_tolerance_khz" option.
> +
>  To understand this in more detail, the rest of this document must
>  be read.
>  
> @@ -211,6 +214,19 @@ is emulated.  Note that, though emulated, the "apparent" 
> TSC frequency
>  will be the TSC frequency of the initial physical machine, even after
>  migration.
>  
> +Since the calibration of the TSC frequency may not be 100% accurate, the
> +exact value of the frequency can change even across reboots.

It can change across reboots for other reasons, e.g. firmware settings.

I'd phrase this as "Since the calibration of the TSC frequency isn't
100% accurate, the value measured by Xen can vary across reboots".

>  This means
> +also several otherwise identical systems can have a slightly different
> +TSC frequency. As a result TSC access will be emulated if a domU is
> +migrated from one host to another, identical host. To avoid the
> +performance impact of TSC emulation a certain tolerance of the measured
> +host TSC frequency can be specified with "vtsc_tolerance_khz". If the
> +measured "cpu_khz" value is within the tolerance range, TSC access
> +remains native. Otherwise it will be emulated. This allows to migrate
> +domUs between identical hardware. If the domU will be migrated to a
> +different kind of hardware, say from a "2.3GHz" to a "2.5GHz" system,
> +TSC will be emualted to maintain the TSC frequency expected by the domU.
> +
>  For environments where both TSC-safeness AND highest performance
>  even across migration is a requirement, application code can be specially
>  modified to use an algorithm explicitly designed into Xen for this purpose.
> diff --git a/docs/man/xl.cfg.pod.5.in b/docs/man/xl.cfg.pod.5.in
> index 47d88243b1..995277794f 100644
> --- a/docs/man/xl.cfg.pod.5.in
> +++ b/docs/man/xl.cfg.pod.5.in
> @@ -1898,6 +1898,16 @@ determined in a similar way to that of B TSC 
> mode.
>  
>  Please see B for more information on this option.
>  
> +=item B
> +
> +B<(x86 only, relevant only for tsc_mode=default)>
> +When a domU is started, the CPU frequency of the host is used by the domU for
> +TSC related time 

[Xen-devel] [xen-unstable test] 128267: regressions - FAIL

2018-10-01 Thread osstest service owner
flight 128267 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/128267/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-migrupgrade 22 guest-migrate/src_host/dst_host fail REGR. vs. 
128084
 test-amd64-i386-migrupgrade 22 guest-migrate/src_host/dst_host fail REGR. vs. 
128084

Tests which are failing intermittently (not blocking):
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 16 guest-localmigrate/x10 fail pass 
in 128240

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 128084
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 128084
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 128084
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 128084
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 128084
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 128084
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 128084
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 128084
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 128084
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemuu-win10-i386 10 windows-install fail never pass
 test-amd64-amd64-xl-qemut-win10-i386 10 windows-installfail never pass
 test-amd64-i386-xl-qemut-win10-i386 10 windows-install fail never pass

version targeted for testing:
 xen  edb4724e36256c495a6aa3cf1a12722efe271f9d
baseline version:
 xen  940185b2f6f343251c2b83bd96e599398cea51ec

Last test of basis   128084  2018-09-26 01:51:53 Z5 days
Failing since128118  2018-09-27 00:37:03 Z4 days4 attempts

[Xen-devel] [PATCH v2 4/4] x86: support "pv-l1tf=default"

2018-10-01 Thread Jan Beulich
Just like the otherwise similar "xpti=" allows for, to revert back to
built-in defaults.

Signed-off-by: Jan Beulich 
---
v2: Split out into separate patch.

--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1604,7 +1604,7 @@ certain you don't plan on having PV gues
 turning it off can reduce the attack surface.
 
 ### pv-l1tf (x86)
-> `= List of [ , dom0=, domu= ]`
+> `= List of [ default, , dom0=, domu= ]`
 
 > Default: `false` on believed-unaffected hardware, or in pv-shim mode.
 >  `domu`  on believed-affected hardware.
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -252,7 +252,9 @@ static __init int parse_pv_l1tf(const ch
 break;
 
 default:
-if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
+if ( !strcmp(s, "default") )
+opt_pv_l1tf_hwdom = opt_pv_l1tf_domu = -1;
+else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
 opt_pv_l1tf_hwdom = val;
 else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
 opt_pv_l1tf_domu = val;





___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 3/4] x86: fix "xpti=" and "pv-l1tf=" yet again

2018-10-01 Thread Jan Beulich
While commit 2a3b34ec47 ("x86/spec-ctrl: Yet more fixes for xpti=
parsing") indeed fixed "xpti=dom0", it broke "xpti=no-dom0", in that
this then became equivalent to "xpti=no". In particular, the presence
of "xpti=" alone on the command line means nothing as to which default
is to be overridden; "xpti=no-dom0", for example, ought to have no
effect for DomU-s, as this is distinct from both "xpti=no-dom0,domu"
and "xpti=no-dom0,no-domu".

Signed-off-by: Jan Beulich 
---
v2: Fix copy-and-paste mistake in parse_pv_l1tf(). Split off log message
silencing. Re-base over patches splitting opt_{xpti,pv_l1tf}.

--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -232,12 +232,6 @@ static __init int parse_pv_l1tf(const ch
 const char *ss;
 int val, rc = 0;
 
-/* Inhibit the defaults as an explicit choice has been given. */
-if ( opt_pv_l1tf_hwdom == -1 )
-opt_pv_l1tf_hwdom = 0;
-if ( opt_pv_l1tf_domu == -1 )
-opt_pv_l1tf_domu = 0;
-
 /* Interpret 'pv-l1tf' alone in its positive boolean form. */
 if ( *s == '\0' )
 opt_pv_l1tf_hwdom = opt_pv_l1tf_domu = 1;
@@ -699,12 +693,6 @@ static __init int parse_xpti(const char
 const char *ss;
 int val, rc = 0;
 
-/* Inhibit the defaults as an explicit choice has been given. */
-if ( opt_xpti_hwdom == -1 )
-opt_xpti_hwdom = 0;
-if ( opt_xpti_domu == -1 )
-opt_xpti_domu = 0;
-
 /* Interpret 'xpti' alone in its positive boolean form. */
 if ( *s == '\0' )
 opt_xpti_hwdom = opt_xpti_domu = 1;





___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 2/4] x86: split opt_pv_l1tf

2018-10-01 Thread Jan Beulich
Use separate tracking variables for the hardware domain and DomU-s.

No functional change intended, but adjust the comment in
init_speculation_mitigations() to match prior as well as resulting code.

Signed-off-by: Jan Beulich 
---
v2: New.

--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -143,8 +143,10 @@ static int __init parse_spec_ctrl(const
 if ( opt_smt < 0 )
 opt_smt = 1;
 
-if ( opt_pv_l1tf < 0 )
-opt_pv_l1tf = 0;
+if ( opt_pv_l1tf_hwdom < 0 )
+opt_pv_l1tf_hwdom = 0;
+if ( opt_pv_l1tf_domu < 0 )
+opt_pv_l1tf_domu = 0;
 
 disable_common:
 opt_rsb_pv = false;
@@ -222,7 +224,8 @@ static int __init parse_spec_ctrl(const
 }
 custom_param("spec-ctrl", parse_spec_ctrl);
 
-int8_t __read_mostly opt_pv_l1tf = -1;
+int8_t __read_mostly opt_pv_l1tf_hwdom = -1;
+int8_t __read_mostly opt_pv_l1tf_domu = -1;
 
 static __init int parse_pv_l1tf(const char *s)
 {
@@ -230,12 +233,14 @@ static __init int parse_pv_l1tf(const ch
 int val, rc = 0;
 
 /* Inhibit the defaults as an explicit choice has been given. */
-if ( opt_pv_l1tf == -1 )
-opt_pv_l1tf = 0;
+if ( opt_pv_l1tf_hwdom == -1 )
+opt_pv_l1tf_hwdom = 0;
+if ( opt_pv_l1tf_domu == -1 )
+opt_pv_l1tf_domu = 0;
 
 /* Interpret 'pv-l1tf' alone in its positive boolean form. */
 if ( *s == '\0' )
-opt_pv_l1tf = OPT_PV_L1TF_DOM0 | OPT_PV_L1TF_DOMU;
+opt_pv_l1tf_hwdom = opt_pv_l1tf_domu = 1;
 
 do {
 ss = strchr(s, ',');
@@ -245,20 +250,18 @@ static __init int parse_pv_l1tf(const ch
 switch ( parse_bool(s, ss) )
 {
 case 0:
-opt_pv_l1tf = 0;
+opt_pv_l1tf_hwdom = opt_pv_l1tf_domu = 0;
 break;
 
 case 1:
-opt_pv_l1tf = OPT_PV_L1TF_DOM0 | OPT_PV_L1TF_DOMU;
+opt_pv_l1tf_hwdom = opt_pv_l1tf_domu = 1;
 break;
 
 default:
 if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-opt_pv_l1tf = ((opt_pv_l1tf & ~OPT_PV_L1TF_DOM0) |
-   (val ? OPT_PV_L1TF_DOM0 : 0));
+opt_pv_l1tf_hwdom = val;
 else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-opt_pv_l1tf = ((opt_pv_l1tf & ~OPT_PV_L1TF_DOMU) |
-   (val ? OPT_PV_L1TF_DOMU : 0));
+opt_pv_l1tf_domu = val;
 else if ( *s )
 rc = -EINVAL;
 break;
@@ -321,7 +324,7 @@ static void __init print_details(enum in
opt_l1d_flush ? " L1D_FLUSH" : "");
 
 /* L1TF diagnostics, printed if vulnerable or PV shadowing is in use. */
-if ( cpu_has_bug_l1tf || opt_pv_l1tf )
+if ( cpu_has_bug_l1tf || opt_pv_l1tf_hwdom || opt_pv_l1tf_domu )
 printk("  L1TF: believed%s vulnerable, maxphysaddr L1D %u, CPUID %u"
", Safe address %"PRIx64"\n",
cpu_has_bug_l1tf ? "" : " not",
@@ -356,8 +359,8 @@ static void __init print_details(enum in
xpti_pcid_enabled() ? "" : "out");
 
 printk("  PV L1TF shadowing: Dom0 %s, DomU %s\n",
-   opt_pv_l1tf & OPT_PV_L1TF_DOM0  ? "enabled"  : "disabled",
-   opt_pv_l1tf & OPT_PV_L1TF_DOMU  ? "enabled"  : "disabled");
+   opt_pv_l1tf_hwdom ? "enabled"  : "disabled",
+   opt_pv_l1tf_domu  ? "enabled"  : "disabled");
 #endif
 }
 
@@ -889,18 +892,16 @@ void __init init_speculation_mitigations
 
 /*
  * By default, enable PV domU L1TF mitigations on all L1TF-vulnerable
- * hardware, except when running in shim mode.
+ * hardware, except when running in shim mode, and - at least for the
+ * time being - also excepting the hardware domain.
  *
  * In shim mode, SHADOW is expected to be compiled out, and a malicious
  * guest kernel can only attack the shim Xen, not the host Xen.
  */
-if ( opt_pv_l1tf == -1 )
-{
-if ( pv_shim || !cpu_has_bug_l1tf )
-opt_pv_l1tf = 0;
-else
-opt_pv_l1tf = OPT_PV_L1TF_DOMU;
-}
+if ( opt_pv_l1tf_hwdom == -1 )
+opt_pv_l1tf_hwdom = 0;
+if ( opt_pv_l1tf_domu == -1 )
+opt_pv_l1tf_domu = !pv_shim && cpu_has_bug_l1tf;
 
 /*
  * By default, enable L1D_FLUSH on L1TF-vulnerable hardware, unless
--- a/xen/include/asm-x86/shadow.h
+++ b/xen/include/asm-x86/shadow.h
@@ -224,9 +224,8 @@ void pv_l1tf_tasklet(unsigned long data)
 
 static inline void pv_l1tf_domain_init(struct domain *d)
 {
-d->arch.pv.check_l1tf =
-opt_pv_l1tf & (is_hardware_domain(d)
-   ? OPT_PV_L1TF_DOM0 : OPT_PV_L1TF_DOMU);
+d->arch.pv.check_l1tf = is_hardware_domain(d) ? opt_pv_l1tf_hwdom
+  : opt_pv_l1tf_domu;
 
 #if defined(CONFIG_SHADOW_PAGING) && defined(CONFIG_PV)
 

[Xen-devel] [PATCH v2 1/4] x86: split opt_xpti

2018-10-01 Thread Jan Beulich
Use separate tracking variables for the hardware domain and DomU-s.

No functional change intended.

Signed-off-by: Jan Beulich 
---
v2: New.

--- a/xen/arch/x86/flushtlb.c
+++ b/xen/arch/x86/flushtlb.c
@@ -182,7 +182,7 @@ unsigned int flush_area_local(const void
  */
 invpcid_flush_one(PCID_PV_PRIV, addr);
 invpcid_flush_one(PCID_PV_USER, addr);
-if ( opt_xpti )
+if ( opt_xpti_hwdom || opt_xpti_domu )
 {
 invpcid_flush_one(PCID_PV_PRIV | PCID_PV_XPTI, addr);
 invpcid_flush_one(PCID_PV_USER | PCID_PV_XPTI, addr);
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -253,8 +253,7 @@ int pv_domain_initialise(struct domain *
 /* 64-bit PV guest by default. */
 d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 
-d->arch.pv.xpti = opt_xpti & (is_hardware_domain(d)
-  ? OPT_XPTI_DOM0 : OPT_XPTI_DOMU);
+d->arch.pv.xpti = is_hardware_domain(d) ? opt_xpti_hwdom : opt_xpti_domu;
 
 if ( !is_pv_32bit_domain(d) && use_invpcid && cpu_has_pcid )
 switch ( opt_pcid )
--- a/xen/arch/x86/smpboot.c
+++ b/xen/arch/x86/smpboot.c
@@ -789,7 +789,7 @@ static int setup_cpu_root_pgt(unsigned i
 unsigned int off;
 int rc;
 
-if ( !opt_xpti )
+if ( !opt_xpti_hwdom && !opt_xpti_domu )
 return 0;
 
 rpt = alloc_xen_pagetable();
--- a/xen/arch/x86/spec_ctrl.c
+++ b/xen/arch/x86/spec_ctrl.c
@@ -135,8 +135,10 @@ static int __init parse_spec_ctrl(const
 
 opt_eager_fpu = 0;
 
-if ( opt_xpti < 0 )
-opt_xpti = 0;
+if ( opt_xpti_hwdom < 0 )
+opt_xpti_hwdom = 0;
+if ( opt_xpti_domu < 0 )
+opt_xpti_domu = 0;
 
 if ( opt_smt < 0 )
 opt_smt = 1;
@@ -349,8 +351,8 @@ static void __init print_details(enum in
opt_eager_fpu ? " EAGER_FPU" : "");
 
 printk("  XPTI (64-bit PV only): Dom0 %s, DomU %s (with%s PCID)\n",
-   opt_xpti & OPT_XPTI_DOM0 ? "enabled" : "disabled",
-   opt_xpti & OPT_XPTI_DOMU ? "enabled" : "disabled",
+   opt_xpti_hwdom ? "enabled" : "disabled",
+   opt_xpti_domu  ? "enabled" : "disabled",
xpti_pcid_enabled() ? "" : "out");
 
 printk("  PV L1TF shadowing: Dom0 %s, DomU %s\n",
@@ -665,7 +667,8 @@ static __init void l1tf_calculations(uin
 : (3ul << (paddr_bits - 2;
 }
 
-int8_t __read_mostly opt_xpti = -1;
+int8_t __read_mostly opt_xpti_hwdom = -1;
+int8_t __read_mostly opt_xpti_domu = -1;
 
 static __init void xpti_init_default(uint64_t caps)
 {
@@ -673,9 +676,19 @@ static __init void xpti_init_default(uin
 caps = ARCH_CAPABILITIES_RDCL_NO;
 
 if ( caps & ARCH_CAPABILITIES_RDCL_NO )
-opt_xpti = 0;
+{
+if ( opt_xpti_hwdom < 0 )
+opt_xpti_hwdom = 0;
+if ( opt_xpti_domu < 0 )
+opt_xpti_domu = 0;
+}
 else
-opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+{
+if ( opt_xpti_hwdom < 0 )
+opt_xpti_hwdom = 1;
+if ( opt_xpti_domu < 0 )
+opt_xpti_domu = 1;
+}
 }
 
 static __init int parse_xpti(const char *s)
@@ -684,12 +697,14 @@ static __init int parse_xpti(const char
 int val, rc = 0;
 
 /* Inhibit the defaults as an explicit choice has been given. */
-if ( opt_xpti == -1 )
-opt_xpti = 0;
+if ( opt_xpti_hwdom == -1 )
+opt_xpti_hwdom = 0;
+if ( opt_xpti_domu == -1 )
+opt_xpti_domu = 0;
 
 /* Interpret 'xpti' alone in its positive boolean form. */
 if ( *s == '\0' )
-opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+opt_xpti_hwdom = opt_xpti_domu = 1;
 
 do {
 ss = strchr(s, ',');
@@ -699,22 +714,20 @@ static __init int parse_xpti(const char
 switch ( parse_bool(s, ss) )
 {
 case 0:
-opt_xpti = 0;
+opt_xpti_hwdom = opt_xpti_domu = 0;
 break;
 
 case 1:
-opt_xpti = OPT_XPTI_DOM0 | OPT_XPTI_DOMU;
+opt_xpti_hwdom = opt_xpti_domu = 1;
 break;
 
 default:
 if ( !strcmp(s, "default") )
-opt_xpti = -1;
+opt_xpti_hwdom = opt_xpti_domu = -1;
 else if ( (val = parse_boolean("dom0", s, ss)) >= 0 )
-opt_xpti = (opt_xpti & ~OPT_XPTI_DOM0) |
-   (val ? OPT_XPTI_DOM0 : 0);
+opt_xpti_hwdom = val;
 else if ( (val = parse_boolean("domu", s, ss)) >= 0 )
-opt_xpti = (opt_xpti & ~OPT_XPTI_DOMU) |
-   (val ? OPT_XPTI_DOMU : 0);
+opt_xpti_domu = val;
 else if ( *s )
 rc = -EINVAL;
 break;
@@ -870,8 +883,7 @@ void __init init_speculation_mitigations
 

Re: [Xen-devel] [PATCH V2] x86/altp2m: propagate ept.ad changes to all active altp2ms

2018-10-01 Thread Razvan Cojocaru
On 10/1/18 2:23 PM, George Dunlap wrote:
> On 10/01/2018 12:11 PM, Razvan Cojocaru wrote:
>>
>>
>> On 10/1/18 1:39 PM, Jan Beulich wrote:
>> On 01.10.18 at 11:58,  wrote:
 Changes since V1:
  - Removed unnecessary p2m_lock() in p2m_init_altp2m_ept().
>>>
>>> This was a step in the right direction, but ...
>>>
  static void ept_enable_pml(struct p2m_domain *p2m)
  {
 +struct domain *d = p2m->domain;
 +
  /* Domain must have been paused */
 -ASSERT(atomic_read(>domain->pause_count));
 +ASSERT(atomic_read(>pause_count));
  
  /*
   * No need to return whether vmx_domain_enable_pml has succeeded, as
   * ept_p2m_type_to_flags will do the check, and write protection will 
 be
   * used if PML is not enabled.
   */
 -if ( vmx_domain_enable_pml(p2m->domain) )
 +if ( vmx_domain_enable_pml(d) )
  return;
  
  /* Enable EPT A/D bit for PML */
 -p2m->ept.ad = 1;
 -vmx_domain_update_eptp(p2m->domain);
 +ept_set_ad_sync(p2m, true);
 +
 +vmx_domain_update_eptp(d);
  }
  
  static void ept_disable_pml(struct p2m_domain *p2m)
  {
 +struct domain *d = p2m->domain;
 +
  /* Domain must have been paused */
 -ASSERT(atomic_read(>domain->pause_count));
 +ASSERT(atomic_read(>pause_count));
  
 -vmx_domain_disable_pml(p2m->domain);
 +vmx_domain_disable_pml(d);
  
  /* Disable EPT A/D bit */
 -p2m->ept.ad = 0;
 -vmx_domain_update_eptp(p2m->domain);
 +ept_set_ad_sync(p2m, false);
 +
 +vmx_domain_update_eptp(d);
  }
>>>
>>> These two functions used to be called with the p2m lock held,
>>> while now they aren't anymore. Afaict this introduces a race
>>> where the opposite ept_set_ad_sync() may be called before
>>> an original one was follow by the respective
>>> vmx_domain_update_eptp(), resulting in the A/D enable bit
>>> being set the wrong way round in the end.
>>>
>>> I realize that George did already point out that this is sort of
>>> ugly a situation, but the fixing of the issue here shouldn't
>>> introduce a new race. What's wrong with retaining the
>>> host p2m lock in p2m_{en,dis}able_hardware_log_dirty()?
>>> ept_set_ad_sync() then simply wouldn't acquire/release that
>>> one, but just the altp2m ones.
>>
>> That is fine with be, in fact the whole change has been prompted by
>> George's remark that "there would something a bit funny here about
>> grabbing the main p2m lock in p2m.c, and then grabbing altp2m locks
>> within the function". If, after these comments, he doesn't mind the
>> scenario then I'll do that in V3.
> 
> I think I would rather grab the main p2m locks in
> ept_{enable,disable}_pml().  Wouldn't hurt to have an
> ASSERT(p2m_is_locked_by_me()) in ept_set_ad_sync() as well.

I'll do that. Thanks!


Razvan

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] null scheduler bug

2018-10-01 Thread Julien Grall



On 09/27/2018 06:06 PM, Dario Faggioli wrote:

On Thu, 2018-09-27 at 16:09 +0100, Julien Grall wrote:
Hi,


Hi Dario,


On 09/27/2018 03:32 PM, Dario Faggioli wrote:

On Thu, 2018-09-27 at 15:15 +0200, Milan Boberic wrote:



In one of your e-mail, you wrote:

"Well, our implementation of RCU requires that, from time to time,
the
various physical CPUs of your box become idle, or get an interrupt,
or
go executing inside Xen (for hypercalls, vmexits, etc). In fact, a
CPU
going through Xen is what allow us to tell that it reached a so-
called
'quiescent state', which in turns is necessary for declaring a so-
called 'RCU grace period' over."

I don't quite agree with you on the definition of "quiescent state"
here.


Hehe... I was trying to be both quick and accurate. It's more than
possible that I failed. :-)


To take the domain example, we want to wait until all the CPU has
stopped using the pointer (an hypercall could race put_domain).


I'm not sure what you mean with "an hypercall could race put_domain".


I meant that another CPU get a pointer on the domain until the domain is 
effectively removed from the list by _domain_destroy.



What we want is to wait until all the CPUs that are involved in the
grace period, have gone through rcupdate.c:cpu_quiet(), or have become
idle.


Which is what I meant but in a more convoluted way.



Receiving an interrupt, or experiencing a context switch, or even going
idle, it's "just" how it happens that these CPUs have their chance to
go through cpu_quiet(). It is in this sense that I meant that those
events are used as markers of a quiescent state.

And "wfi=native" (in particular in combination with the null scheduler,
but I guess also with other ones, at least to a certain extent) makes
figuring out the "or have become idle" part tricky. That is the problem
here, isn't it?


That's correct.




That
pointer will not be in-use if the CPU is in kernel-mode/user-mode or
in
the idle loop. Am I correct?


Right.

So, we want that all the CPUs that were in Xen to have either left Xen
at least once or, if they're still there and have never left, that must
be because they've become idle.

And currently we treat all the CPUs that have not told the RCU
subsystem that they're idle (via rcu_idle_enter()) as busy, without
distinguishing between the ones that are busy in Xen from the one which
are busy in guest (kernel or user) mode.


So I am wondering whether we could:
- Mark any CPU in kernel-mode/user-mode quiet


Right. We'd need something like a rcu_guest_enter()/rcu_guest_exit()
(or a rcu_xen_exit()/rcu_xen_enter()), which works for all combination
of arches and guest types.

It looks to me too that this would help in this case, as the vCPU that
stays in guest mode because of wfi=idle would be counted as quiet, and
we won't have to wait for it.


- Raise a RCU_SOFTIRQ in call_rcu?


Mmm... what would be the point of this?


To check if the RCU has work to do. You may call_rcu with all the other 
CPUs quiet. Or do you expect this to be done in rcu_guest_{enter,exit}()?





With that solution, it may even be possible to avoid the timer in
the
idle loop.


Not sure. The timer is there to deal with the case when a CPU which has
a callback queued wants to go idle. It may have quiesced already, but
if there are others which have not, either:
1) we let it go idle, but then the callback will run only when it
wakes up from idle which, without the timer, could be far ahead in
time;
2) we don't let it go idle, but we waste resources;
3) we let it go idle and keep the timer. :-)


Oh right.



But anyway, even if it would not let us get rid of the timer, it seems
like it could be nicer than any other approaches. I accept
help/suggestions about the "let's intercept guest-Xen and Xen-guest
transitions, and track that inside RCU code.


The best place for Arm to have those calls would be in:
	- enter_hypervisor_head(): This is called after exiting the guest and 
before doing any other work
	- leave_hypervisor_head(): This is called just before returning to the 
guest.


I am CCing Stefano work e-mail. When I spoke with him he was interested 
to put some effort towards fixing the bug.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [PATCH v2 0/4] x86: fix "xpti=" and "pv-l1tf=" yet again

2018-10-01 Thread Jan Beulich
The original patch under this title as well as the involved variables
were split up for v2, hopefully addressing the main (yet vague)
review concerns on v1.

1: split opt_xpti
2: split opt_pv_l1tf
3: fix "xpti=" and "pv-l1tf=" yet again
4: support "pv-l1tf=default"

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

[Xen-devel] [distros-debian-sid test] 75332: trouble: blocked/broken

2018-10-01 Thread Platform Team regression test user
flight 75332 distros-debian-sid real [real]
http://osstest.xensource.com/osstest/logs/75332/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-pvopsbroken
 build-i386   broken
 build-amd64-pvopsbroken
 build-armhf  broken
 build-amd64  broken
 build-i386-pvops broken

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-i386-sid-netboot-pygrub  1 build-check(1) blocked n/a
 test-amd64-i386-i386-sid-netboot-pvgrub  1 build-check(1)  blocked n/a
 test-armhf-armhf-armhf-sid-netboot-pygrub  1 build-check(1)blocked n/a
 test-amd64-i386-amd64-sid-netboot-pygrub  1 build-check(1) blocked n/a
 test-amd64-amd64-amd64-sid-netboot-pvgrub  1 build-check(1)blocked n/a
 build-armhf   4 host-install(4)  broken like 75277
 build-armhf-pvops 4 host-install(4)  broken like 75277
 build-i3864 host-install(4)  broken like 75277
 build-i386-pvops  4 host-install(4)  broken like 75277
 build-amd64   4 host-install(4)  broken like 75277
 build-amd64-pvops 4 host-install(4)  broken like 75277

baseline version:
 flight   75277

jobs:
 build-amd64  broken  
 build-armhf  broken  
 build-i386   broken  
 build-amd64-pvopsbroken  
 build-armhf-pvopsbroken  
 build-i386-pvops broken  
 test-amd64-amd64-amd64-sid-netboot-pvgrubblocked 
 test-amd64-i386-i386-sid-netboot-pvgrub  blocked 
 test-amd64-i386-amd64-sid-netboot-pygrub blocked 
 test-armhf-armhf-armhf-sid-netboot-pygrubblocked 
 test-amd64-amd64-i386-sid-netboot-pygrub blocked 



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xensource.com/osstest/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/3] [not-for-unstable] xen/arm: vgic-v3: Delay the initialization of the domain information

2018-10-01 Thread Julien Grall

Hi Andrew,

On 10/01/2018 10:53 AM, Andrew Cooper wrote:

On 01/10/18 10:43, Julien Grall wrote:

Hi,

On 09/29/2018 12:48 AM, Andrew Cooper wrote:

On 29/09/18 00:45, Stefano Stabellini wrote:

On Sat, 29 Sep 2018, Andrew Cooper wrote:

On 28/09/18 21:35, Julien Grall wrote:


On 09/28/2018 12:11 AM, Stefano Stabellini wrote:

On Wed, 26 Sep 2018, Julien Grall wrote:

Hi Stefano,

On 09/25/2018 09:45 PM, Stefano Stabellini wrote:

On Tue, 4 Sep 2018, Andrew Cooper wrote:

On 04/09/18 20:35, Julien Grall wrote:

Hi,

On 09/04/2018 08:21 PM, Julien Grall wrote:

A follow-up patch will require to know the number of vCPUs when
initializating the vGICv3 domain structure. However this
information
is
not available at domain creation. This is only known once
XEN_DOMCTL_max_vpus is called for that domain.

In order to get the max vCPUs around, delay the domain part
of the
vGIC
v3 initialization until the first vCPU of the domain is
initialized.

Signed-off-by: Julien Grall 

---

Cc: Andrew Cooper 

This is nasty but I can't find a better way for Xen 4.11 and
older.
This
is not necessary for unstable as the number of vCPUs is
known at
domain
creation.

Andrew, I have CCed you to know whether you have a better idea
where
to
place this call on Xen 4.11 and older.

I just noticed that d->max_vcpus is initialized after
arch_domain_create. So without this patch on Xen 4.12, it will
not work.

This is getting nastier because arch_domain_init is the one
initialize
the value returned by dom0_max_vcpus. So I am not entirely
sure what
to do here.

The positioning after arch_domain_create() is unfortunate, but I
couldn’t manage better with ARM's current behaviour and Jan's
insistence
that the allocation of d->vcpu was common.  I'd prefer if the
dependency
could be broken and the allocation moved earlier.

One option might be to have an arch_check_domainconfig() (or
similar?)
which is called very early on and can sanity check the values,
including
cross-checking the vgic and max_vcpus settings?  It could even be
responsible for mutating XEN_DOMCTL_CONFIG_GIC_NATIVE into the
correct
real value.

As for your patch here, its a gross hack, but its probably the
best
which can be done.

*Sighs*
If that is what we have to do, it is as ugly as hell, but that
is what
we'll do.

This is the best we can do with the current code base. I think it
would be
worth reworking the code to make it nicer. I will add it in my TODO
list.


My only suggestion to marginally improve it would be instead of:


+    if ( v->vcpu_id == 0 )
+    {
+    rc = vgic_v3_real_domain_init(d);
+    if ( rc )
+    return rc;
+    }

to check on d->arch.vgic.rdist_regions instead:

  if ( d->arch.vgic.rdist_regions == NULL )
  {
     // initialize domain

I would prefer to keep v->vcpu_id == 0 just in case we end up to
re-order the
allocation in the future.

I was suggesting to check on (rdist_regions == NULL) exactly for
potential re-ordering, in case in the future we end up calling
vcpu_vgic_init differently and somehow vcpu_init(vcpu1) is done
before
before vcpu_init(vcpu0). Ideally we would like a way to check that
vgic_v3_real_domain_init has been called before and I thought
rdist_regions == NULL could do just that...

What I meant by re-ordering is we manage to allocate the
re-distributors before the vCPUs are created but still need
vgic_v3_real_domain_init for other purpose.

But vCPU initialization is potentially other issue.

Anyway. both way have drawbacks. Yet I still prefer checking on the
vCPU. It less likely vCPU0 will not be the first one initialized.

With the exception of the idle domain, all vcpus are strictly
allocated
in packed ascending order.  Loads of other stuff will break if that
changed, so I wouldn't worry about it.

Furthermore, there is no obvious reason for this behaviour to ever
change.

OK, let's go with Julien's patch. We need a new tag for this, something
like:

Acked-but-disliked-by: Stefano Stabellini 


Do bear in mind that this patch is only for 4.11 and earlier.  I've
already fixed staging (i.e. 4.12) when it comes to knowing
d->max_vcpus :)

I thought we agreed that patch is necessary for 4.12 as d->max_vcpus
is initialized after arch_domain_init?


Oh right.


I am not planning to do the rework in short term. Did you do more work
on around domain_create recently?


There are multiple related patch series out on xen-devel atm, but I
expect I need to spin a new version of each of them.  I'll see if I have
some time to put towards it.  Are you happy in principle with the
arch_check_domainconfig() plan?


I am happy in principle. If you don't have time to work on it, I will 
try to have a look before Xen 4.12.


Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9] new config option vtsc_tolerance_khz to avoid TSC emulation

2018-10-01 Thread Jan Beulich
>>> On 01.10.18 at 12:52,  wrote:
> Olaf Hering writes ("Re: [PATCH v9] new config option vtsc_tolerance_khz to 
> avoid TSC emulation"):
>> Am Thu, 13 Sep 2018 09:39:13 +0200
>> schrieb Olaf Hering :
>> > this patch was not applied yet, even after a few "pings".
>> 
>> No reaction since months.
>> So scrap that patch, just in case it is still part of someones to-consider 
> queue.
> 
> I think it would be worth exploring whether this patch could be
> applied without an explicit ack from Andrew.
> 
> I confess I ignored all the previous mails because they all started
>Andrew,
> or
>Andrew, Lars,
> so I assumed that you didn't want attention from other
> maintainers/committers.
> 
> Now that I look at the thread it is difficult for me to see the wood
> for the trees but I don't see unanswered concerns.

Problem is - discussion around this was (iirc) happening not only on
the list, but also on irc (including perhaps private chats). It was for
that reason that I made my R-b conditional upon Andrew at least
giving an informal go-ahead (otherwise, together with Wei's R-b,
the patch could have gone in).

Besides the question of correctness from the perspective of guests
(which imo is not a problem as the feature needs to be actively
enabled, and I think we have no reason to keep admins from
breaking their guests if they really mean to), I think the main concern
was with the way migration of the new value was implemented. But I
really have to defer to Andrew for that, irrespective of him not
having responded (on the list) to prior pings.

Jan



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH V2] x86/altp2m: propagate ept.ad changes to all active altp2ms

2018-10-01 Thread George Dunlap
On 10/01/2018 12:11 PM, Razvan Cojocaru wrote:
> 
> 
> On 10/1/18 1:39 PM, Jan Beulich wrote:
> On 01.10.18 at 11:58,  wrote:
>>> Changes since V1:
>>>  - Removed unnecessary p2m_lock() in p2m_init_altp2m_ept().
>>
>> This was a step in the right direction, but ...
>>
>>>  static void ept_enable_pml(struct p2m_domain *p2m)
>>>  {
>>> +struct domain *d = p2m->domain;
>>> +
>>>  /* Domain must have been paused */
>>> -ASSERT(atomic_read(>domain->pause_count));
>>> +ASSERT(atomic_read(>pause_count));
>>>  
>>>  /*
>>>   * No need to return whether vmx_domain_enable_pml has succeeded, as
>>>   * ept_p2m_type_to_flags will do the check, and write protection will 
>>> be
>>>   * used if PML is not enabled.
>>>   */
>>> -if ( vmx_domain_enable_pml(p2m->domain) )
>>> +if ( vmx_domain_enable_pml(d) )
>>>  return;
>>>  
>>>  /* Enable EPT A/D bit for PML */
>>> -p2m->ept.ad = 1;
>>> -vmx_domain_update_eptp(p2m->domain);
>>> +ept_set_ad_sync(p2m, true);
>>> +
>>> +vmx_domain_update_eptp(d);
>>>  }
>>>  
>>>  static void ept_disable_pml(struct p2m_domain *p2m)
>>>  {
>>> +struct domain *d = p2m->domain;
>>> +
>>>  /* Domain must have been paused */
>>> -ASSERT(atomic_read(>domain->pause_count));
>>> +ASSERT(atomic_read(>pause_count));
>>>  
>>> -vmx_domain_disable_pml(p2m->domain);
>>> +vmx_domain_disable_pml(d);
>>>  
>>>  /* Disable EPT A/D bit */
>>> -p2m->ept.ad = 0;
>>> -vmx_domain_update_eptp(p2m->domain);
>>> +ept_set_ad_sync(p2m, false);
>>> +
>>> +vmx_domain_update_eptp(d);
>>>  }
>>
>> These two functions used to be called with the p2m lock held,
>> while now they aren't anymore. Afaict this introduces a race
>> where the opposite ept_set_ad_sync() may be called before
>> an original one was follow by the respective
>> vmx_domain_update_eptp(), resulting in the A/D enable bit
>> being set the wrong way round in the end.
>>
>> I realize that George did already point out that this is sort of
>> ugly a situation, but the fixing of the issue here shouldn't
>> introduce a new race. What's wrong with retaining the
>> host p2m lock in p2m_{en,dis}able_hardware_log_dirty()?
>> ept_set_ad_sync() then simply wouldn't acquire/release that
>> one, but just the altp2m ones.
> 
> That is fine with be, in fact the whole change has been prompted by
> George's remark that "there would something a bit funny here about
> grabbing the main p2m lock in p2m.c, and then grabbing altp2m locks
> within the function". If, after these comments, he doesn't mind the
> scenario then I'll do that in V3.

I think I would rather grab the main p2m locks in
ept_{enable,disable}_pml().  Wouldn't hurt to have an
ASSERT(p2m_is_locked_by_me()) in ept_set_ad_sync() as well.

Thanks,
 -George

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v12 8/9] mm / iommu: include need_iommu() test in iommu_use_hap_pt()

2018-10-01 Thread Julien Grall

Hi Paul,

On 10/01/2018 11:51 AM, Paul Durrant wrote:

-Original Message-
From: Julien Grall [mailto:julien.gr...@arm.com]
Sent: 01 October 2018 11:37
To: Paul Durrant ; xen-devel@lists.xenproject.org
Cc: Jun Nakajima ; George Dunlap
; Jan Beulich ; Andrew Cooper
; Stefano Stabellini 
Subject: Re: [PATCH v12 8/9] mm / iommu: include need_iommu() test in
iommu_use_hap_pt()

Hi Paul,

On 09/27/2018 03:33 PM, Paul Durrant wrote:

The name 'iommu_use_hap_pt' suggests that that P2M table is in use as

the

domain's IOMMU pagetable which, prior to this patch, is not strictly

true

since the macro did not test whether the domain actually has IOMMU
mappings.


Well, I think we assume that iommu_use_hap_pt() will only be called when
need_iommu(d) == 1.


Yes.


At the end, you will still need to check
need_iommu(d) outside because iommu_use_hap_pt() may now return 0 for
domain without IOMMU or for domain not sharing p2m with the IOMMU.

Do you expect an improvement in long-term?



Yes. The ultimate goal is the patch to split up need_iommu() into separate 
macros for 'domain has IOMMU pt' and 'domain needs explicit sync of IOMMU pt' 
and this seems like a logical step. For 'need explicit sync of IOMMU pt' to 
true then 'has IOMMU pt' clearly needs to be true, so I wanted the macro for 
'sharing IOMMU pt' to have the same predicate.


Make senses. It didn't occur to me that was in preparation of the next 
patch.


For the Arm bits:

Acked-by: Julien Grall 

Cheers,

--
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

  1   2   >