Re: [PATCH] x86/msr: Fix XEN_MSR_PAT to build with older binutils

2020-04-29 Thread Jan Beulich
On 29.04.2020 23:39, Andrew Cooper wrote:
> Older binutils complains with:
>   trampoline.S:95: Error: junk `ul&0x' after expression
> 
> Use an assembly-safe constant.
> 
> Signed-off-by: Andrew Cooper 

Acked-by: Jan Beulich 




Re: [PATCH 2/2] xen: credit2: limit the max number of CPUs in a runqueue

2020-04-29 Thread Jan Beulich
On 29.04.2020 19:36, Dario Faggioli wrote:
> @@ -852,14 +862,61 @@ cpu_runqueue_match(const struct csched2_runqueue_data 
> *rqd, unsigned int cpu)
> (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
>  }
>  
> +/* Additional checks, to avoid separating siblings in different runqueues. */
> +static bool
> +cpu_runqueue_smt_match(const struct csched2_runqueue_data *rqd, unsigned int 
> cpu)
> +{
> +unsigned int nr_sibl = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
> +unsigned int rcpu, nr_smts = 0;
> +
> +/*
> + * If we put the CPU in this runqueue, we must be sure that there will
> + * be enough room for accepting its hyperthread sibling(s) here as well.
> + */
> +cpumask_clear(cpumask_scratch_cpu(cpu));
> +for_each_cpu ( rcpu, &rqd->active )
> +{
> +ASSERT(rcpu != cpu);
> +if ( !cpumask_test_cpu(rcpu, cpumask_scratch_cpu(cpu)) )
> +{
> +/*
> + * For each CPU already in the runqueue, account for it and for
> + * its sibling(s), independently from whether such sibling(s) are
> + * in the runqueue already or not.
> + *
> + * Of course, if there are sibling CPUs in the runqueue already,
> + * only count them once.
> + */
> +cpumask_or(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +   per_cpu(cpu_sibling_mask, rcpu));
> +nr_smts += nr_sibl;

This being common code, is it appropriate to assume all CPUs have
the same number of siblings? Even beyond that, iirc the sibling mask
represents the online or parked siblings, but not offline ones. For
the purpose here, don't you rather care about the full set?

What about HT vs AMD Fam15's CUs? Do you want both to be treated
the same here?

Could you also outline the intent of this logic in the
description, so the goal can be matched with what actually gets done?

> +}
> +}
> +/*
> + * We know that neither the CPU, nor any of its sibling are here,
> + * or we wouldn't even have entered the function.
> + */
> +ASSERT(!cpumask_intersects(cpumask_scratch_cpu(cpu),
> +   per_cpu(cpu_sibling_mask, cpu)));
> +
> +/* Try adding CPU and its sibling(s) to the count and check... */
> +nr_smts += nr_sibl;
> +
> +if ( nr_smts <= opt_max_cpus_runqueue )
> +return true;
> +
> +return false;

Fold these into

return nr_smts <= opt_max_cpus_runqueue;

?

> @@ -873,11 +930,44 @@ cpu_add_to_runqueue(struct csched2_private *prv, 
> unsigned int cpu)
>  if ( !rqi_unused && rqd->id > rqi )
>  rqi_unused = true;
>  
> -if ( cpu_runqueue_match(rqd, cpu) )
> +/*
> + * Check whether the CPU should (according to the topology) and also
> + * can (if we there aren't too many already) go in this runqueue.

Nit: Stray "we"?

> + */
> +if ( rqd->refcnt < opt_max_cpus_runqueue &&
> + cpu_runqueue_match(rqd, cpu) )
>  {
> -rqd_valid = true;
> -break;
> +cpumask_t *siblings = per_cpu(cpu_sibling_mask, cpu);
> +
> +dprintk(XENLOG_DEBUG, "CPU %d matches runq %d, cpus={%*pbl} (max 
> %d)\n",
> +cpu, rqd->id, CPUMASK_PR(&rqd->active),
> +opt_max_cpus_runqueue);
> +
> +/*
> + * If we're using core (or socket!) scheduling, or we don't have
> + * hyperthreading, no need to do any further checking.
> + *
> + * If no (to both), but our sibling is already in this runqueue,
> + * then it's also ok for the CPU to stay in this runqueue..

Nit: Stray 2nd full stop?

> + * Otherwise, do some more checks, to better account for SMT.
> + */
> +if ( opt_sched_granularity != SCHED_GRAN_cpu ||
> + cpumask_weight(siblings) <= 1 ||
> + cpumask_intersects(&rqd->active, siblings) )
> +{
> +dprintk(XENLOG_DEBUG, "runq %d selected\n", rqd->id);
> +rqd_valid = rqd;
> +break;
> +}
> +else if ( cpu_runqueue_smt_match(rqd, cpu) )
> +{
> +dprintk(XENLOG_DEBUG, "considering runq %d...\n", rqd->id);
> +rqd_valid = rqd;
> +}
>  }
> + else

Hard tab slipped in.

> @@ -900,6 +990,12 @@ cpu_add_to_runqueue(struct csched2_private *prv, 
> unsigned int cpu)
>  rqd->pick_bias = cpu;
>  rqd->id = rqi;
>  }
> +else
> +rqd = rqd_valid;
> +
> +printk(XENLOG_INFO "CPU %d (sibling={%*pbl}) will go to runqueue %d with 
> {%*pbl}\n",
> +   cpu, CPUMASK_PR(per_cpu(cpu_sibling_mask, cpu)), rqd->id,
> +   CPUMASK_PR(&rqd->active));

Iirc there's one per-CPU printk() already. On large systems this isn't
very nice, so I'd lik

Re: [PATCH 05/12] xen: introduce reserve_heap_pages

2020-04-29 Thread Jan Beulich
On 30.04.2020 00:46, Stefano Stabellini wrote:
> On Fri, 17 Apr 2020, Jan Beulich wrote:
>> On 15.04.2020 03:02, Stefano Stabellini wrote:
>>> Introduce a function named reserve_heap_pages (similar to
>>> alloc_heap_pages) that allocates a requested memory range. Call
>>> __alloc_heap_pages for the implementation.
>>>
>>> Change __alloc_heap_pages so that the original page doesn't get
>>> modified, giving back unneeded memory top to bottom rather than bottom
>>> to top.
>>
>> While it may be less of a problem within a zone, doing so is
>> against our general "return high pages first" policy.
> 
> Is this something you'd be OK with anyway?

As a last resort, maybe. But it really depends on why it needs to be
this way.

> If not, do you have a suggestion on how to do it better? I couldn't find
> a nice way to do it without code duplication, or a big nasty 'if' in the
> middle of the function.

I'd first need to understand the problem to solve.

>>> +pg = maddr_to_page(start);
>>> +node = phys_to_nid(start);
>>> +zone = page_to_zone(pg);
>>> +page_list_del(pg, &heap(node, zone, order));
>>> +
>>> +__alloc_heap_pages(pg, order, memflags, d);
>>
>> I agree with Julien in not seeing how this can be safe / correct.
> 
> I haven't seen any issues so far in my testing -- I imagine it is
> because there aren't many memory allocations after setup_mm() and before
> create_domUs()  (which on ARM is called just before
> domain_unpause_by_systemcontroller at the end of start_xen.)
> 
> 
> I gave a quick look at David's series. Is the idea that I should add a
> patch to do the following:
> 
> - avoiding adding these ranges to xenheap in setup_mm, wait for later
>   (a bit like reserved_mem regions)
> 
> - in construct_domU, add the range to xenheap and reserve it with 
> reserve_heap_pages
> 
> Is that right?

This may be one way, but it may not be the only possible one.
The main thing to arrange for is that there is either a guarantee
for these ranges to be free (which I think you want to check in
any event, rather than risking giving out something that's already
in use elsewhere), or that you skip ranges which are already in use
(potentially altering [decreasing?] what the specific domain gets
allocated).

Jan



[PATCH v3] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Juergen Gross
The XS_INTRODUCE command has two parameters: the mfn (or better: gfn)
of the domain's xenstore ring page and the event channel of the
domain for communicating with Xenstore.

The gfn is not really needed. It is stored in the per-domain struct
in xenstored, and in case of another XS_INTRODUCE for the domain it
is tested to match the original value. If it doesn't match, the
command is aborted with EINVAL; otherwise the event channel to the
domain is recreated.

As XS_INTRODUCE is limited to dom0 and there is no real downside to
recreating the event channel, just omit the test for the gfn to
match and don't return EINVAL for multiple XS_INTRODUCE calls.

Signed-off-by: Juergen Gross 
---
V2:
- remove mfn from struct domain (Julien Grall)
- replace mfn by gfn in comments (Julien Grall)

V3:
- allow multiple XS_INTRODUCE calls (Igor Druzhinin)
---
 tools/xenstore/xenstored_domain.c | 15 ---
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/tools/xenstore/xenstored_domain.c 
b/tools/xenstore/xenstored_domain.c
index 5858185211..06359503f0 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -55,10 +55,6 @@ struct domain
   repeated domain introductions. */
evtchn_port_t remote_port;
 
-   /* The mfn associated with the event channel, used only to validate
-  repeated domain introductions. */
-   unsigned long mfn;
-
/* Domain path in store. */
char *path;
 
@@ -363,13 +359,12 @@ static void domain_conn_reset(struct domain *domain)
domain->interface->rsp_cons = domain->interface->rsp_prod = 0;
 }
 
-/* domid, mfn, evtchn, path */
+/* domid, gfn, evtchn, path */
 int do_introduce(struct connection *conn, struct buffered_data *in)
 {
struct domain *domain;
char *vec[3];
unsigned int domid;
-   unsigned long mfn;
evtchn_port_t port;
int rc;
struct xenstore_domain_interface *interface;
@@ -381,7 +376,7 @@ int do_introduce(struct connection *conn, struct 
buffered_data *in)
return EACCES;
 
domid = atoi(vec[0]);
-   mfn = atol(vec[1]);
+   /* Ignore the gfn, we don't need it. */
port = atoi(vec[2]);
 
/* Sanity check args. */
@@ -402,21 +397,19 @@ int do_introduce(struct connection *conn, struct 
buffered_data *in)
return rc;
}
domain->interface = interface;
-   domain->mfn = mfn;
 
/* Now domain belongs to its connection. */
talloc_steal(domain->conn, domain);
 
fire_watches(NULL, in, "@introduceDomain", false);
-   } else if ((domain->mfn == mfn) && (domain->conn != conn)) {
+   } else {
/* Use XS_INTRODUCE for recreating the xenbus event-channel. */
if (domain->port)
xenevtchn_unbind(xce_handle, domain->port);
rc = xenevtchn_bind_interdomain(xce_handle, domid, port);
domain->port = (rc == -1) ? 0 : rc;
domain->remote_port = port;
-   } else
-   return EINVAL;
+   }
 
domain_conn_reset(domain);
 
-- 
2.16.4




[linux-5.4 test] 149878: tolerable FAIL - PUSHED

2020-04-29 Thread osstest service owner
flight 149878 linux-5.4 real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149878/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-armhf-armhf-xl-rtds 15 guest-stop   fail REGR. vs. 149749

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10  fail blocked in 149749
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stop fail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stop fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 linux aa73bcc376865c23e61dcebd467697b527901be8
baseline version:
 linux 0c418786cb3aa175823f0172d939679df9ab9a54

Last test of basis   149749  2020-04-23 09:09:13 Z6 days
Testing same since   149878  2020-04-29 14:40:04 Z0 days1 attempts


People who touched revisions under test:
  "Eric W. Biederman" 
  "Yan, Zheng" 
  Ahmad Fatoum 
  Alan Stern 
  Alex D

Re: [PATCH 02/12] xen/arm: introduce arch_xen_dom_flags and direct_map

2020-04-29 Thread Stefano Stabellini
On Wed, 15 Apr 2020, Jan Beulich wrote:
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -2527,6 +2527,7 @@ int __init construct_dom0(struct domain *d)
> >  
> >  iommu_hwdom_init(d);
> >  
> > +d->arch.direct_map = true;
> 
> Shouldn't this get set via arch_domain_create() instead?

Yes you are right, this is unnecessary and I can remove it.



Re: [PATCH 01/12] xen: introduce xen_dom_flags

2020-04-29 Thread Stefano Stabellini
On Wed, 15 Apr 2020, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
> > We are passing an extra special boolean flag at domain creation to
> > specify whether we want to the domain to be privileged (i.e. dom0) or
> > not. Another flag will be introduced later in this series.
> > 
> > Introduce a new struct xen_dom_flags and move the privileged flag to it.
> > Other flags will be added to struct xen_dom_flags.
> 
> I'm unsure whether introducing a 2nd structure is worth it here.
> We could as well define some internal-use-only flags for
> struct xen_domctl_createdomain's respective field.

Yep, great idea, I'll do that instead.


> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -529,7 +529,8 @@ static bool emulation_flags_ok(const struct domain *d, 
> > uint32_t emflags)
> >  }
> >  
> >  int arch_domain_create(struct domain *d,
> > -   struct xen_domctl_createdomain *config)
> > +   struct xen_domctl_createdomain *config,
> > +   struct xen_dom_flags *flags)
> 
> const (also elsewhere)?

All of this goes away now, using the exising flag field in
xen_domctl_createdomain, reserving the top 16 bits for internal usage.


> > --- a/xen/arch/x86/setup.c
> > +++ b/xen/arch/x86/setup.c
> > @@ -706,6 +706,7 @@ void __init noreturn __start_xen(unsigned long mbi_p)
> >  .max_maptrack_frames = -1,
> >  };
> >  const char *hypervisor_name;
> > +struct xen_dom_flags flags = { !pv_shim };
> 
> Here and elsewhere please use field designators right away, even if
> there's only a single field now.
> 
> > @@ -363,7 +363,7 @@ struct domain *domain_create(domid_t domid,
> >  ASSERT(is_system_domain(d) ? config == NULL : config != NULL);
> >  
> >  /* Sort out our idea of is_control_domain(). */
> > -d->is_privileged = is_priv;
> > +d->is_privileged =  flags ? flags->is_priv : false;
> 
> Stray double blanks.
> 
> > --- a/xen/common/domctl.c
> > +++ b/xen/common/domctl.c
> > @@ -364,6 +364,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) 
> > u_domctl)
> >  bool_t copyback = 0;
> >  struct xen_domctl curop, *op = &curop;
> >  struct domain *d;
> > +struct xen_dom_flags flags ={ false };
> 
> Missing blank.
> 
> > --- a/xen/include/xen/domain.h
> > +++ b/xen/include/xen/domain.h
> > @@ -63,8 +63,13 @@ void arch_vcpu_destroy(struct vcpu *v);
> >  int map_vcpu_info(struct vcpu *v, unsigned long gfn, unsigned offset);
> >  void unmap_vcpu_info(struct vcpu *v);
> >  
> > +struct xen_dom_flags {
> > +bool is_priv;
> 
> Use a single bit bitfield instead? May even want to consider passing
> this struct by value then.
 



[qemu-mainline test] 149877: tolerable FAIL - PUSHED

2020-04-29 Thread osstest service owner
flight 149877 qemu-mainline real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149877/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-rtds 18 guest-localmigrate/x10   fail  like 149866
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 149866
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 149866
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 149866
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 149866
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 149866
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass

version targeted for testing:
 qemuu a7922a3c81f34f45b1ebc9670a7769edc4c42a43
baseline version:
 qemuu fdd76fecdde1ad444ff4deb7f1c4f7e4a1ef97d6

Last test of basis   149866  2020-04-28 17:07:10 Z1 days
Testing same since   149877  2020-04-29 14:36:19 Z0 days1 attempts


People who touched revisions under test:
  Peter Maydell 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-

[xen-unstable-smoke test] 149884: tolerable all pass - PUSHED

2020-04-29 Thread osstest service owner
flight 149884 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149884/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  8065e1b41688592778de76c731c62f34e71f3129
baseline version:
 xen  4304ff420e51b973ec9eb9dafd64a917dd9c0fb1

Last test of basis   149883  2020-04-29 18:01:04 Z0 days
Testing same since   149884  2020-04-29 21:02:20 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Jan Beulich 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   4304ff420e..8065e1b416  8065e1b41688592778de76c731c62f34e71f3129 -> smoke



Re: [PATCH 04/12] xen: split alloc_heap_pages in two halves for reusability

2020-04-29 Thread Stefano Stabellini
On Fri, 17 Apr 2020, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -911,54 +911,18 @@ static struct page_info *get_free_buddy(unsigned int 
> > zone_lo,
> >  }
> >  }
> >  
> > -/* Allocate 2^@order contiguous pages. */
> > -static struct page_info *alloc_heap_pages(
> > -unsigned int zone_lo, unsigned int zone_hi,
> > -unsigned int order, unsigned int memflags,
> > -struct domain *d)
> > +static void __alloc_heap_pages(struct page_info **pgo,
> > +   unsigned int order,
> > +   unsigned int memflags,
> > +   struct domain *d)
> 
> Along the line of Wei's reply, I'd suggest the name to better reflect
> the difference to alloc_heap_pages() itself. Maybe
> alloc_pages_from_buddy() or alloc_buddy_pages()?
> 
> In addition
> - no double leading underscores please
> - instead of the function returning void and taking
>   struct page_info **pgo, why not have it take and return
>   struct page_info *?
> - add a comment about the non-standard locking behavior

I made all these changes



[RFC] UEFI Secure Boot on Xen Hosts

2020-04-29 Thread Bobby Eshleman
Hey all,

We're looking to develop UEFI Secure Boot support for grub-loaded Xen and
ultimately for XCP-ng (I'm on the XCP-ng team at Vates).

In addition to carrying the chain-of-trust through the entire boot sequence
into dom0, we would also like to build something akin to Linux's Lockdown for
dom0 and its privileged interfaces.

It appears that there are various options and we'd like to discuss them with
the community.

# Option #1: PE/COFF and Shim

Shim installs a verification protocol available to subsequent programs via EFI
boot services.  The protocol is called SHIM_LOCK and it is currently supported
by xen.efi.

Shim requires the payload under verification to be a PE/COFF executable.  In
order to support both shim and maintain the multiboot2 protocol, Daniel Kiper's
patchset [1] (among other things) incorporates the PE/COFF header
into xen.gz and adds dom0 verification via SHIM_LOCK in
efi_multiboot2().

It appears that some work will be needed on top of this patchset, but not
much, as it seems most of the footwork has been done.

AFAIK, the changes needed in GRUB for this approach are already upstream (the
shim_lock module is upstream), and shim would go untouched.

# Option #2: Extended Multiboot2 and Shim

Another approach that could be taken is to embed Xen's signature into a
new multiboot2 header and then modify shim to support it.  This would
arguably be more readable than embedding the PE/COFF header, would add
value to shim, and would fit nicely with the mb2 header code that
already exists in Xen.  The downside being that it would require a shim
fork.  Grub2 would be unchanged.

I'm not familiar with Microsoft's signing process.  I do know they
support template submissions based on shim, and I'm not sure if such a
big change would impact their approval process.

# Option #3: Lean on Grub2's LoadFile2() Verification

Grub2 will provide a LoadFile2() method to subsequent programs that supports
signature verification of arbitrary files.  Linux is moving in the
direction of using LoadFile2() for loading the initrd [2], and Grub2 will
support verifying the signatures of files loaded via LoadFile2().  This is set
for release in GRUB 2.06 sometime in the latter half of 2020.

In Xen, this approach could be used for loading dom0 as well, offering a very
simple verified load interface.

Currently the initrd argument passed from Linux to LoadFile2() is a vendor
media device path GUID [3].

Changes to Xen:
- Xen would need to pass these device paths to Grub
- Xen would be changed to load dom0 with the LoadFile2() interface via boot 
services

Changes to Grub:
- Xen dom0 kernel/initrd device paths would need to be introduced to Grub

Potential Issues:
- How will Xen handle more boot modules than just a dom0 and dom0
  initrd?
- Would each boot module need a unique vendor guid?
- Would this interfere with the DomB proposal?  I suspect not because
  the DomB proposal suggests launching DomUs from an already booted
  DomB, at which point other means could be used.

We'd just like to get the conversation going on this topic before we
dive too far into implementing something.  Are any of these approaches a
hard no for upstreaming?  Do any stand out as best candidates?  Any
feedback / questions / criticisms would be greatly appreciated.

Thanks,
Bobby Eshleman


# References

1. Daniel Kiper's patchset:
https://lists.xenproject.org/archives/html/xen-devel/2018-06/msg01292.html
2. Linux loading initrd with LoadFile2():
https://lkml.org/lkml/2020/2/17/331
3. The vendor media guid for initrd:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/firmware/efi/libstub/efi-stub-helper.c#n311



Re: [PATCH 05/12] xen: introduce reserve_heap_pages

2020-04-29 Thread Stefano Stabellini
On Fri, 17 Apr 2020, Jan Beulich wrote:
> On 15.04.2020 03:02, Stefano Stabellini wrote:
> > Introduce a function named reserve_heap_pages (similar to
> > alloc_heap_pages) that allocates a requested memory range. Call
> > __alloc_heap_pages for the implementation.
> > 
> > Change __alloc_heap_pages so that the original page doesn't get
> > modified, giving back unneeded memory top to bottom rather than bottom
> > to top.
> 
> While it may be less of a problem within a zone, doing so is
> against our general "return high pages first" policy.

Is this something you'd be OK with anyway?

If not, do you have a suggestion on how to do it better? I couldn't find
a nice way to do it without code duplication, or a big nasty 'if' in the
middle of the function.


> > @@ -1073,7 +1073,42 @@ static struct page_info *alloc_heap_pages(
> >  return NULL;
> >  }
> >  
> > -__alloc_heap_pages(&pg, order, memflags, d);
> > +__alloc_heap_pages(pg, order, memflags, d);
> > +return pg;
> > +}
> > +
> > +static struct page_info *reserve_heap_pages(struct domain *d,
> > +paddr_t start,
> > +unsigned int order,
> > +unsigned int memflags)
> > +{
> > +nodeid_t node;
> > +unsigned int zone;
> > +struct page_info *pg;
> > +
> > +if ( unlikely(order > MAX_ORDER) )
> > +return NULL;
> > +
> > +spin_lock(&heap_lock);
> > +
> > +/*
> > + * Claimed memory is considered unavailable unless the request
> > + * is made by a domain with sufficient unclaimed pages.
> > + */
> > +if ( (outstanding_claims + (1UL << order) > total_avail_pages) &&
> > +  ((memflags & MEMF_no_refcount) ||
> > +   !d || d->outstanding_pages < (1UL << order)) )
> > +{
> > +spin_unlock(&heap_lock);
> > +return NULL;
> > +}
> 
> Where would such a claim come from? Given the purpose I'd assume
> the function (as well as reserve_domheap_pages()) can actually be
> __init. With that I'd then also be okay with them getting built
> unconditionally, i.e. even on x86.

Yes, you are right, I'll make the function __init and also remove this
check on claimed memory.


> > +pg = maddr_to_page(start);
> > +node = phys_to_nid(start);
> > +zone = page_to_zone(pg);
> > +page_list_del(pg, &heap(node, zone, order));
> > +
> > +__alloc_heap_pages(pg, order, memflags, d);
> 
> I agree with Julien in not seeing how this can be safe / correct.

I haven't seen any issues so far in my testing -- I imagine it is
because there aren't many memory allocations after setup_mm() and before
create_domUs()  (which on ARM is called just before
domain_unpause_by_systemcontroller at the end of start_xen.)


I gave a quick look at David's series. Is the idea that I should add a
patch to do the following:

- avoiding adding these ranges to xenheap in setup_mm, wait for later
  (a bit like reserved_mem regions)

- in construct_domU, add the range to xenheap and reserve it with 
reserve_heap_pages

Is that right?



Re: [PATCH 11/12] xen/arm: if xen_force don't try to setup the IOMMU

2020-04-29 Thread Stefano Stabellini
On Wed, 15 Apr 2020, Julien Grall wrote:
> Hi Stefano,
> 
> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > If xen_force (which means xen,force-assign-without-iommu was requested)
> > don't try to add the device to the IOMMU. Return early instead.
> 
> 
> Could you explain why this is an issue to call xen_force after
> iommu_add_dt_device()?

There are two issues. I should add info about both of them to the commit
message.


The first issue is that an error returned by iommu_add_dt_device (for
any reason) would cause handle_passthrough_prop to stop and return an
error right away. But actually the IOMMU is not needed for that device
if xen_force is set.

(In fact, one of the reasons why a user might want to set
force-assign-without-iommu is that there are IOMMU issues with a
device.)


The second issue is about the usage of "xen,force-assign-without-iommu":
it would be useful to let the user set "xen,force-assign-without-iommu"
for devices that are described as behind an SMMU in the device tree, but
where the SMMU can't actually be used for some reason. Of course, the
user could always manually edit the device tree to make it look as if
the device is not behind an IOMMU. That would work OK. However, I think
it would be better to avoid making that a requirement.

If we want to allow "xen,force-assign-without-iommu" for a device behind
an SMMU, then we need this patch; otherwise this would happen:

res = iommu_add_dt_device(node); // succeeds
if ( xen_force && !dt_device_is_protected(node) ) // false: the device is protected
    return 0;
return iommu_assign_dt_device(kinfo->d, node); // fails because !is_iommu_enabled(d),
                                               // which is fine, but then
                                               // handle_prop_pfdt returns an error too



All in all, I thought it would make sense to avoid any IOMMU setup and
potential IOMMU errors altogether when xen_force is set, and to return
early.



[PATCH] x86/msr: Fix XEN_MSR_PAT to build with older binutils

2020-04-29 Thread Andrew Cooper
Older binutils complains with:
  trampoline.S:95: Error: junk `ul&0x' after expression

Use an assembly-safe constant.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 
---
 xen/include/asm-x86/processor.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/xen/include/asm-x86/processor.h b/xen/include/asm-x86/processor.h
index ea6e5497f4..8f6f5a97dd 100644
--- a/xen/include/asm-x86/processor.h
+++ b/xen/include/asm-x86/processor.h
@@ -99,7 +99,7 @@
  * Host IA32_CR_PAT value to cover all memory types.  This is not the default
  * MSR_PAT value, and is an ABI with PV guests.
  */
-#define XEN_MSR_PAT 0x050100070406ul
+#define XEN_MSR_PAT _AC(0x050100070406, ULL)
 
 #ifndef __ASSEMBLY__
 
-- 
2.11.0




Re: [PATCH 12/12] xen/arm: call iomem_permit_access for passthrough devices

2020-04-29 Thread Stefano Stabellini
On Wed, 15 Apr 2020, Julien Grall wrote:
> On 15/04/2020 02:02, Stefano Stabellini wrote:
> > iomem_permit_access should be called for MMIO regions of devices
> > assigned to a domain. Currently it is not called for MMIO regions of
> > passthrough devices of Dom0less guests. This patch fixes it.
> 
> You can already have cases today where the MMIO regions are mapped to the
> guest but the guest doesn't have permission on them.
> 
> Can you explain why this is a problem here?

I don't remember the original problem that prompted me to do this
investigation. It came up while I was developing and testing this
series: I noticed the discrepancy compared to the regular (non-dom0less)
device assignment code path
(tools/libxl/libxl_create.c:domcreate_launch_dm).

Now I removed this patch from the series, went back to test it, and it
still works fine. Oops, it is not needed after all :-)


I think it makes sense to remove this patch from this series; I'll do
that.

But doesn't it make sense to give the domU permission if it is going to
get the memory mapped? Admittedly, though, I can't think of anything
that would break because of the lack of the iomem_permit_access call in
this code path.


> > Signed-off-by: Stefano Stabellini 
> > ---
> >   xen/arch/arm/domain_build.c | 11 +++
> >   1 file changed, 11 insertions(+)
> > 
> > diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
> > index d0fc497d07..d3d42eef5d 100644
> > --- a/xen/arch/arm/domain_build.c
> > +++ b/xen/arch/arm/domain_build.c
> > @@ -1846,6 +1846,17 @@ static int __init handle_passthrough_prop(struct
> > kernel_info *kinfo,
> >   return -EINVAL;
> >   }
> >   +res = iomem_permit_access(kinfo->d, paddr_to_pfn(mstart),
> > +  paddr_to_pfn(PAGE_ALIGN(mstart + size -
> > 1)));
> > +if ( res )
> > +{
> > +printk(XENLOG_ERR "Unable to permit to dom%d access to"
> > +   " 0x%"PRIx64" - 0x%"PRIx64"\n",
> > +   kinfo->d->domain_id,
> > +   mstart & PAGE_MASK, PAGE_ALIGN(mstart + size) - 1);
> > +return res;
> > +}
> > +
> >   res = map_regions_p2mt(kinfo->d,
> >  gaddr_to_gfn(gstart),
> >  PFN_DOWN(size),
> > 
> 
> -- 
> Julien Grall
> 



[xen-unstable-smoke test] 149883: tolerable all pass - PUSHED

2020-04-29 Thread osstest service owner
flight 149883 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149883/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  4304ff420e51b973ec9eb9dafd64a917dd9c0fb1
baseline version:
 xen  46d8f69d466a05863737fb81d8c9ef39c3be8b45

Last test of basis   149876  2020-04-29 14:01:03 Z0 days
Testing same since   149883  2020-04-29 18:01:04 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   46d8f69d46..4304ff420e  4304ff420e51b973ec9eb9dafd64a917dd9c0fb1 -> smoke



Re: [PATCH 0/12] direct-map DomUs

2020-04-29 Thread Stefano Stabellini
On Thu, 16 Apr 2020, Julien Grall wrote:
> > Stefano Stabellini (12):
> >xen: introduce xen_dom_flags
> >xen/arm: introduce arch_xen_dom_flags and direct_map
> >xen/arm: introduce 1:1 mapping for domUs
> >xen: split alloc_heap_pages in two halves for reusability
> >xen: introduce reserve_heap_pages
> >xen/arm: reserve 1:1 memory for direct_map domUs
> >xen/arm: new vgic: rename vgic_cpu/dist_base to c/dbase
> >xen/arm: if is_domain_direct_mapped use native addresses for GICv2
> >xen/arm: if is_domain_direct_mapped use native addresses for GICv3
> >xen/arm: if is_domain_direct_mapped use native UART address for 
> > vPL011
> 
> The 3 patches above cover addresses but not interrupts. Why?

Hi Julien,

I take that you are referring to GUEST_VPL011_SPI, right?



[qemu-upstream-unstable test] 149875: tolerable FAIL - PUSHED

2020-04-29 Thread osstest service owner
flight 149875 qemu-upstream-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149875/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 141899
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 141899
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 141899
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 141899
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 qemuu410cc30fdc590417ae730d635bbc70257adf6750
baseline version:
 qemuu933ebad2470a169504799a1d95b8e410bd9847ef

Last test of basis   141899  2019-09-27 10:37:51 Z  215 days
Testing same since   149875  2020-04-29 12:38:30 Z0 days1 attempts


404 people touched revisions under test,
not listing them all

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-ar

[xen-unstable-smoke test] 149876: tolerable all pass - PUSHED

2020-04-29 Thread osstest service owner
flight 149876 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149876/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  46d8f69d466a05863737fb81d8c9ef39c3be8b45
baseline version:
 xen  f9bf746258eb53011f863571c7073037202b6743

Last test of basis   149874  2020-04-29 11:00:59 Z0 days
Testing same since   149876  2020-04-29 14:01:03 Z0 days1 attempts


People who touched revisions under test:
  Juergen Gross 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   f9bf746258..46d8f69d46  46d8f69d466a05863737fb81d8c9ef39c3be8b45 -> smoke



[PATCH 0/2] xen: credit2: limit the number of CPUs per runqueue

2020-04-29 Thread Dario Faggioli
In Credit2, the CPUs are assigned to runqueues according to the host
topology. For instance, if we want per-socket runqueues (which is the
default), the CPUs that are in the same socket will end up in the same
runqueue.

This is generally good for scalability, at least as long as the number
of CPUs that end up in the same runqueue is not too high. In fact, all
these CPUs compete for the same spinlock when making scheduling
decisions and manipulating the scheduler data structures. Therefore, if
there are too many of them, that lock can become a bottleneck.

This has not been an issue so far, but architectures with 128 CPUs per
socket are now available, and it is certainly not ideal to have so many
CPUs in the same runqueue, competing for the same locks, etc.

Let's therefore set a cap to the total number of CPUs that can share a
runqueue. The value is set to 16, by default, but can be changed with
a boot command line parameter.

Note that this is orthogonal to, and independent from, the activity of
introducing new runqueue arrangement mechanisms and criteria. We may
very well want to implement the capability of having, say, per-LLC
runqueues, or per-die (where that is defined) runqueues. That would
work well on current systems, but nothing guarantees that, in the
future, there won't be an architecture with hundreds of CPUs per die or
LLC... and we would be back to where we are now.

Therefore, even if we are actually planning to add some new runqueue
organization criteria, fixing a limit for the number of CPUs that
will ever share a runqueue, whatever the underlying topology is, is
still useful.

Note also that, if the host has hyperthreading (and we are not using
core-scheduling), additional care is taken to avoid splitting CPUs that
are hyperthread siblings among different runqueues.

---
Dario Faggioli (2):
  xen: credit2: factor cpu to runqueue matching in a function
  xen: credit2: limit the max number of CPUs in a runqueue

 xen/common/sched/cpupool.c |2 -
 xen/common/sched/credit2.c |  130 +++-
 xen/common/sched/private.h |2 +
 3 files changed, 119 insertions(+), 15 deletions(-)

--
Dario Faggioli, Ph.D
http://about.me/dario.faggioli
Virtualization Software Engineer
SUSE Labs, SUSE https://www.suse.com/
---
<> (Raistlin Majere)



[PATCH 1/2] xen: credit2: factor cpu to runqueue matching in a function

2020-04-29 Thread Dario Faggioli
Just move the big if() condition in an inline function.

No functional change intended.

Signed-off-by: Dario Faggioli 
---
Cc: George Dunlap 
---
 xen/common/sched/credit2.c |   28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 34f05c3e2a..697c9f917d 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -838,6 +838,20 @@ static inline bool same_core(unsigned int cpua, unsigned 
int cpub)
cpu_to_core(cpua) == cpu_to_core(cpub);
 }
 
+static inline bool
+cpu_runqueue_match(const struct csched2_runqueue_data *rqd, unsigned int cpu)
+{
+unsigned int peer_cpu = rqd->pick_bias;
+
+BUG_ON(cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
+
+/* OPT_RUNQUEUE_CPU will never find an existing runqueue. */
+return opt_runqueue == OPT_RUNQUEUE_ALL ||
+   (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
+   (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, cpu)) 
||
+   (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
+}
+
 static struct csched2_runqueue_data *
 cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
 {
@@ -855,21 +869,11 @@ cpu_add_to_runqueue(struct csched2_private *prv, unsigned 
int cpu)
 rqd_ins = &prv->rql;
 list_for_each_entry ( rqd, &prv->rql, rql )
 {
-unsigned int peer_cpu;
-
 /* Remember first unused queue index. */
 if ( !rqi_unused && rqd->id > rqi )
 rqi_unused = true;
 
-peer_cpu = rqd->pick_bias;
-BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID ||
-   cpu_to_socket(peer_cpu) == XEN_INVALID_SOCKET_ID);
-
-/* OPT_RUNQUEUE_CPU will never find an existing runqueue. */
-if ( opt_runqueue == OPT_RUNQUEUE_ALL ||
- (opt_runqueue == OPT_RUNQUEUE_CORE && same_core(peer_cpu, cpu)) ||
- (opt_runqueue == OPT_RUNQUEUE_SOCKET && same_socket(peer_cpu, 
cpu)) ||
- (opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu)) )
+if ( cpu_runqueue_match(rqd, cpu) )
 {
 rqd_valid = true;
 break;
@@ -3744,6 +3748,8 @@ csched2_alloc_pdata(const struct scheduler *ops, int cpu)
 struct csched2_pcpu *spc;
 struct csched2_runqueue_data *rqd;
 
+BUG_ON(cpu_to_socket(cpu) == XEN_INVALID_SOCKET_ID);
+
 spc = xzalloc(struct csched2_pcpu);
 if ( spc == NULL )
 return ERR_PTR(-ENOMEM);




[PATCH] x86/hap: be more selective with assisted TLB flush

2020-04-29 Thread Roger Pau Monne
When doing an assisted flush on HAP, the purpose of the
on_selected_cpus() call is just to trigger a vmexit on remote CPUs
that are in guest context. Hence just using is_vcpu_dirty_cpu() is too
lax; also check that the vCPU is running.

While there, also pass NULL as the data parameter of
on_selected_cpus(): the dummy handler doesn't consume the data in any
way.

Signed-off-by: Roger Pau Monné 
---
 xen/arch/x86/mm/hap/hap.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/mm/hap/hap.c b/xen/arch/x86/mm/hap/hap.c
index 580d1c2164..0275cdf5c8 100644
--- a/xen/arch/x86/mm/hap/hap.c
+++ b/xen/arch/x86/mm/hap/hap.c
@@ -719,7 +719,7 @@ static bool flush_tlb(bool (*flush_vcpu)(void *ctxt, struct 
vcpu *v),
 hvm_asid_flush_vcpu(v);
 
 cpu = read_atomic(&v->dirty_cpu);
-if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) )
+if ( cpu != this_cpu && is_vcpu_dirty_cpu(cpu) && v->is_running )
 __cpumask_set_cpu(cpu, mask);
 }
 
@@ -729,7 +729,7 @@ static bool flush_tlb(bool (*flush_vcpu)(void *ctxt, struct 
vcpu *v),
  * not currently running will already be flushed when scheduled because of
  * the ASID tickle done in the loop above.
  */
-on_selected_cpus(mask, dummy_flush, mask, 0);
+on_selected_cpus(mask, dummy_flush, NULL, 0);
 
 return true;
 }
-- 
2.26.0




[PATCH 2/2] xen: credit2: limit the max number of CPUs in a runqueue

2020-04-29 Thread Dario Faggioli
In Credit2 CPUs (can) share runqueues, depending on the topology. For
instance, with per-socket runqueues (the default) all the CPUs that are
part of the same socket share a runqueue.

On platforms with a huge number of CPUs per socket, that could be a
problem. An example is AMD EPYC2 servers, where we can have up to 128
CPUs in a socket.

It is of course possible to define other, still topology-based, runqueue
arrangements (e.g., per-LLC, per-DIE, etc). But that may still result in
runqueues with too many CPUs on other/future platforms.

Therefore, let's set a limit to the max number of CPUs that can share a
Credit2 runqueue. The actual value is configurable (at boot time), the
default being 16. If, for instance, there are more than 16 CPUs in a
socket, they'll be split among two (or more) runqueues.

Note: with core scheduling enabled, this parameter sets the max number
of *scheduling resources* that can share a runqueue. Therefore, with
granularity set to core (and assuming 2 threads per core), we will have
at most 16 cores per runqueue, which corresponds to 32 threads. But that
is fine, considering how core scheduling works.

Signed-off-by: Dario Faggioli 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Jan Beulich 
Cc: Juergen Gross 
---
 xen/common/sched/cpupool.c |2 -
 xen/common/sched/credit2.c |  104 ++--
 xen/common/sched/private.h |2 +
 3 files changed, 103 insertions(+), 5 deletions(-)

diff --git a/xen/common/sched/cpupool.c b/xen/common/sched/cpupool.c
index d40345b585..0227457285 100644
--- a/xen/common/sched/cpupool.c
+++ b/xen/common/sched/cpupool.c
@@ -37,7 +37,7 @@ static cpumask_t cpupool_locked_cpus;
 
 static DEFINE_SPINLOCK(cpupool_lock);
 
-static enum sched_gran __read_mostly opt_sched_granularity = SCHED_GRAN_cpu;
+enum sched_gran __read_mostly opt_sched_granularity = SCHED_GRAN_cpu;
 static unsigned int __read_mostly sched_granularity = 1;
 
 #ifdef CONFIG_HAS_SCHED_GRANULARITY
diff --git a/xen/common/sched/credit2.c b/xen/common/sched/credit2.c
index 697c9f917d..abe4d048c8 100644
--- a/xen/common/sched/credit2.c
+++ b/xen/common/sched/credit2.c
@@ -471,6 +471,16 @@ static int __init parse_credit2_runqueue(const char *s)
 }
 custom_param("credit2_runqueue", parse_credit2_runqueue);
 
+/*
+ * How many CPUs will be put, at most, in the same runqueue.
+ * Runqueues are still arranged according to the host topology (and
+ * according to the value of the 'credit2_runqueue' parameter). But
+ * we also have a cap to the number of CPUs that share runqueues.
+ * As soon as we reach the limit, a new runqueue will be created.
+ */
+static unsigned int __read_mostly opt_max_cpus_runqueue = 16;
+integer_param("sched_credit2_max_cpus_runqueue", opt_max_cpus_runqueue);
+
 /*
  * Per-runqueue data
  */
@@ -852,14 +862,61 @@ cpu_runqueue_match(const struct csched2_runqueue_data 
*rqd, unsigned int cpu)
(opt_runqueue == OPT_RUNQUEUE_NODE && same_node(peer_cpu, cpu));
 }
 
+/* Additional checks, to avoid separating siblings in different runqueues. */
+static bool
+cpu_runqueue_smt_match(const struct csched2_runqueue_data *rqd, unsigned int 
cpu)
+{
+unsigned int nr_sibl = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
+unsigned int rcpu, nr_smts = 0;
+
+/*
+ * If we put the CPU in this runqueue, we must be sure that there will
+ * be enough room for accepting its hyperthread sibling(s) here as well.
+ */
+cpumask_clear(cpumask_scratch_cpu(cpu));
+for_each_cpu ( rcpu, &rqd->active )
+{
+ASSERT(rcpu != cpu);
+if ( !cpumask_test_cpu(rcpu, cpumask_scratch_cpu(cpu)) )
+{
+/*
+ * For each CPU already in the runqueue, account for it and for
+ * its sibling(s), independently from whether such sibling(s) are
+ * in the runqueue already or not.
+ *
+ * Of course, if there are sibling CPUs in the runqueue already,
+ * only count them once.
+ */
+cpumask_or(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+   per_cpu(cpu_sibling_mask, rcpu));
+nr_smts += nr_sibl;
+}
+}
+/*
+ * We know that neither the CPU, nor any of its sibling are here,
+ * or we wouldn't even have entered the function.
+ */
+ASSERT(!cpumask_intersects(cpumask_scratch_cpu(cpu),
+   per_cpu(cpu_sibling_mask, cpu)));
+
+/* Try adding CPU and its sibling(s) to the count and check... */
+nr_smts += nr_sibl;
+
+if ( nr_smts <= opt_max_cpus_runqueue )
+return true;
+
+return false;
+}
+
 static struct csched2_runqueue_data *
 cpu_add_to_runqueue(struct csched2_private *prv, unsigned int cpu)
 {
 struct csched2_runqueue_data *rqd, *rqd_new;
+struct csched2_runqueue_data *rqd_valid = NULL;
 struct list_head *rqd_ins;
 unsigned long flags;
 int rqi = 0;
-bool rqi_unused = false, rqd_vali

Re: Xen Compilation Error on Ubuntu 20.04

2020-04-29 Thread Ayush Dosaj
Awesome, thanks!

On Wed, Apr 29, 2020 at 10:55 PM Andrew Cooper 
wrote:

> On 29/04/2020 18:17, Ayush Dosaj wrote:
> > Hi Xen development team,
> >
> > I am Ayush. I compiled Xen Hypervisor from source on Ubuntu 20.04
> > machine running on an intel-i9 CPU.
> > I am getting compilation error due to the following two flags.
> > Error: error: ‘-mindirect-branch’ and ‘-fcf-protection’ are not
> compatible.
> >
> > Complete Error logs can be found at
> https://paste.ubuntu.com/p/xvvyPnhW5c/
> >
> > And when I compiled Xen commenting the two flags in Rules.mk file, it
> > compiles and installs properly but on boot-up i see a blank black screen
> > and i am stuck there.
>
> That is a GCC bug (these options are actually fine in combination).  It
> got fixed earlier today in master, and backported for GCC 9.4
>
> You can work around it by appending -fcf-protection=none to CFLAGS
>
> I wouldn't try editing the logic around -mindirect-branch, as that is
> related to retpoline safety for Spectre v2, and probably relies on the
> build matching the code.
>
> ~Andrew
>


-- 
Ayush Dosaj
VIT Vellore


Re: Xen Compilation Error on Ubuntu 20.04

2020-04-29 Thread Andrew Cooper
On 29/04/2020 18:17, Ayush Dosaj wrote:
> Hi Xen development team,
> 
> I am Ayush. I compiled Xen Hypervisor from source on Ubuntu 20.04
> machine running on an intel-i9 CPU.
> I am getting compilation error due to the following two flags.
> Error: error: ‘-mindirect-branch’ and ‘-fcf-protection’ are not compatible.
> 
> Complete Error logs can be found at https://paste.ubuntu.com/p/xvvyPnhW5c/
> 
> And when I compiled Xen commenting the two flags in Rules.mk file, it
> compiles and installs properly but on boot-up i see a blank black screen
> and i am stuck there.

That is a GCC bug (these options are actually fine in combination). It
got fixed earlier today in master, and backported for GCC 9.4.

You can work around it by appending -fcf-protection=none to CFLAGS.

I wouldn't try editing the logic around -mindirect-branch, as that is
related to retpoline safety for Spectre v2, and probably relies on the
build matching the code.

~Andrew



Xen Compilation Error on Ubuntu 20.04

2020-04-29 Thread Ayush Dosaj
Hi Xen development team,

I am Ayush. I compiled the Xen hypervisor from source on an Ubuntu 20.04
machine running on an Intel i9 CPU.
I am getting a compilation error due to the following two flags.
Error: error: ‘-mindirect-branch’ and ‘-fcf-protection’ are not compatible.

Complete error logs can be found at https://paste.ubuntu.com/p/xvvyPnhW5c/

And when I compile Xen with the two flags commented out in the Rules.mk
file, it compiles and installs properly, but on boot-up I see a blank
black screen and I am stuck there.


Best,
Ayush Dosaj


Re: [PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends

2020-04-29 Thread Wei Liu
On Wed, Apr 29, 2020 at 06:08:01PM +0200, David Hildenbrand wrote:
> We soon want to pass flags - prepare for that.
> 
> This patch is based on a similar patch by Oscar Salvador:
> 
> https://lkml.kernel.org/r/20190625075227.15193-3-osalva...@suse.de
> 
[...]
> ---
>  drivers/hv/hv_balloon.c |  2 +-

> diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
> index 32e3bc0aa665..0194bed1a573 100644
> --- a/drivers/hv/hv_balloon.c
> +++ b/drivers/hv/hv_balloon.c
> @@ -726,7 +726,7 @@ static void hv_mem_hot_add(unsigned long start, unsigned 
> long size,
>  
>   nid = memory_add_physaddr_to_nid(PFN_PHYS(start_pfn));
>   ret = add_memory(nid, PFN_PHYS((start_pfn)),
> - (HA_CHUNK << PAGE_SHIFT));
> + (HA_CHUNK << PAGE_SHIFT), 0);
>  
>   if (ret) {
>   pr_err("hot_add memory failed error is %d\n", ret);

Acked-by: Wei Liu 



[xen-unstable test] 149871: tolerable FAIL - PUSHED

2020-04-29 Thread osstest service owner
flight 149871 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149871/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 16 guest-localmigrate   fail REGR. vs. 149865

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 149865
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 149865
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 149865
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 149865
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 149865
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 149865
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 149865
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 149865
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop fail like 149865
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check 
fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-check fail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-check fail   never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-check fail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-check fail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-check fail  never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-check fail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-check fail never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-check fail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-check fail   never pass
 test-arm64-arm64-xl  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-check fail   never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-check fail   never pass

version targeted for testing:
 xen  4ec07971f1c5a236a0d8c528d806efb6bfd3d1a6
baseline version:
 xen  2add344e6a4ef690871157b87b0e7d52bc6db9e4

Last test of basis   149865  2020-04-28 16:32:15 Z0 days
Testing same since   149871  2020-04-29 05:14:21 Z0 days1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Hongyan Xia 
  Jan Beulich 
  Juergen Gross 
  Wei Liu 

jobs:
 build-amd64-xsm  pass
 

[PATCH v1 3/3] virtio-mem: Add memory with MHP_DRIVER_MANAGED

2020-04-29 Thread David Hildenbrand
We don't want /sys/firmware/memmap entries and we want to indicate
our memory as "System RAM (driver managed)" in /proc/iomem. This is
especially relevant for kexec-tools, which have to be updated to
support dumping virtio-mem memory after this patch. Expected behavior in
kexec-tools:
- Don't use this memory when creating a fixed-up firmware memmap. Works
  now out of the box on x86-64.
- Don't use this memory for placing kexec segments. Works now out of the
  box on x86-64.
- Consider "System RAM (driver managed)" when creating the elfcorehdr
  for kdump. This memory has to be dumped. Needs update of kexec-tools.

With this patch on x86-64:

/proc/iomem:
-0fff : Reserved
1000-0009fbff : System RAM
[...]
fffc- : Reserved
1-13fff : System RAM
14000-147ff : System RAM (driver managed)
34000-347ff : System RAM (driver managed)
34800-34fff : System RAM (driver managed)
[..]
328000-32 : PCI Bus :00

/sys/firmware/memmap:
-0009fc00 (System RAM)
0009fc00-000a (Reserved)
000f-0010 (Reserved)
0010-bffe (System RAM)
bffe-c000 (Reserved)
feffc000-ff00 (Reserved)
fffc-0001 (Reserved)
0001-00014000 (System RAM)

Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Michal Hocko 
Cc: Eric Biederman 
Signed-off-by: David Hildenbrand 
---
 drivers/virtio/virtio_mem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/virtio/virtio_mem.c b/drivers/virtio/virtio_mem.c
index 3101cbf9e59d..6f658d1aeac4 100644
--- a/drivers/virtio/virtio_mem.c
+++ b/drivers/virtio/virtio_mem.c
@@ -421,7 +421,8 @@ static int virtio_mem_mb_add(struct virtio_mem *vm, 
unsigned long mb_id)
nid = memory_add_physaddr_to_nid(addr);
 
dev_dbg(&vm->vdev->dev, "adding memory block: %lu\n", mb_id);
-   return add_memory(nid, addr, memory_block_size_bytes(), 0);
+   return add_memory(nid, addr, memory_block_size_bytes(),
+ MHP_DRIVER_MANAGED);
 }
 
 /*
-- 
2.25.3




[PATCH v1 0/3] mm/memory_hotplug: Make virtio-mem play nicely with kexec-tools

2020-04-29 Thread David Hildenbrand
This series is based on [1]:
[PATCH v2 00/10] virtio-mem: paravirtualized memory
That will hopefully get picked up soon, rebased to -next.

The following patches were reverted from -next [2]:
[PATCH 0/3] kexec/memory_hotplug: Prevent removal and accidental use
As discussed in that thread, they should be reverted from -next already.

In theory, if people agree, we could take the first two patches via the
-mm tree now and the last (virtio-mem) patch via MST's tree once picking up
virtio-mem. No strong feelings.


Memory added by virtio-mem is special and might contain logical holes,
especially after memory unplug, but also when adding memory in
sub-section size. While memory in these holes can usually be read, that
memory should not be touched. virtio-mem managed device memory is never
exposed via any firmware memmap (esp., e820). The device driver will
request to plug memory from the hypervisor and add it to Linux.

On a cold start, all memory is unplugged, and the guest driver will first
request to plug memory from the hypervisor, to then add it to Linux. After
a reboot, all memory will get unplugged (except in rare, special cases). In
case the device driver comes up and detects that some memory is still
plugged after a reboot, it will manually request to unplug all memory from
the hypervisor first - to then request to plug memory from the hypervisor
and add to Linux. This is essentially a defragmentation step, where all
logical holes are removed.

As the device driver is responsible for detecting, adding and managing that
memory, also kexec should treat it like that. It is special. We need a way
to teach kexec-tools to not add that memory to the fixed-up firmware
memmap, to not place kexec images onto this memory, but still allow kdump
to dump it. Add a flag to tell memory hotplug code to
not create /sys/firmware/memmap entries and to indicate it via
"System RAM (driver managed)" in /proc/iomem.

Before this series, kexec_file_load() already did the right thing (for
virtio-mem) by not adding that memory to the fixed-up firmware memmap and
letting the device driver handle it. With this series, also kexec_load() -
which relies on user space to provide a fixed up firmware memmap - does
the right thing with virtio-mem memory.

When the virtio-mem device driver(s) come up, they will request to unplug
all memory from the hypervisor first (esp. defragment), to then request to
plug consecutive memory ranges from the hypervisor, and add them to Linux
- just like on a reboot where we still have memory plugged.

[1] https://lore.kernel.org/r/20200311171422.10484-1-da...@redhat.com/
[2] https://lore.kernel.org/r/20200326180730.4754-1-james.mo...@arm.com

David Hildenbrand (3):
  mm/memory_hotplug: Prepare passing flags to add_memory() and friends
  mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED
  virtio-mem: Add memory with MHP_DRIVER_MANAGED

 arch/powerpc/platforms/powernv/memtrace.c |  2 +-
 .../platforms/pseries/hotplug-memory.c|  2 +-
 drivers/acpi/acpi_memhotplug.c|  2 +-
 drivers/base/memory.c |  2 +-
 drivers/dax/kmem.c|  2 +-
 drivers/hv/hv_balloon.c   |  2 +-
 drivers/s390/char/sclp_cmd.c  |  2 +-
 drivers/virtio/virtio_mem.c   |  3 +-
 drivers/xen/balloon.c |  2 +-
 include/linux/memory_hotplug.h| 15 +++--
 mm/memory_hotplug.c   | 31 +--
 11 files changed, 44 insertions(+), 21 deletions(-)

-- 
2.25.3




[PATCH v1 2/3] mm/memory_hotplug: Introduce MHP_DRIVER_MANAGED

2020-04-29 Thread David Hildenbrand
Some paravirtualized devices that add memory via add_memory() and
friends (esp. virtio-mem) don't want to create entries in
/sys/firmware/memmap/ - primarily to hinder kexec from adding this
memory to the boot memmap of the kexec kernel.

In fact, such memory is never exposed via the firmware (e.g., e820), but
only via the device, so exposing this memory via /sys/firmware/memmap/ is
wrong:
 "kexec needs the raw firmware-provided memory map to setup the
  parameter segment of the kernel that should be booted with
  kexec. Also, the raw memory map is useful for debugging. For
  that reason, /sys/firmware/memmap is an interface that provides
  the raw memory map to userspace." [1]

We want to let user space know that memory which is always detected,
added, and managed via a (device) driver - like memory managed by
virtio-mem - is special. It cannot be used for placing kexec segments
and the (device) driver is responsible for re-adding that
(eventually shrunk/grown/defragmented) memory after a reboot/kexec. It
should e.g., not be added to a fixed up firmware memmap. However, it should
be dumped by kdump.

Also, such memory could behave differently than an ordinary DIMM - e.g.,
memory managed by virtio-mem can have holes inside added memory resource,
which should not be touched, especially for writing.

Let's expose that memory as "System RAM (driver managed)" e.g., via
/proc/iomem.

We don't have to worry about firmware_map_remove() on the removal path.
If there is no entry, it will simply return with -EINVAL.

[1] https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-memmap

Cc: Andrew Morton 
Cc: Michal Hocko 
Cc: Pankaj Gupta 
Cc: Wei Yang 
Cc: Baoquan He 
Cc: Eric Biederman 
Signed-off-by: David Hildenbrand 
---
 include/linux/memory_hotplug.h |  8 
 mm/memory_hotplug.c| 20 
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index bf0e3edb8688..cc538584b39e 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -68,6 +68,14 @@ struct mhp_params {
pgprot_t pgprot;
 };
 
+/* Flags used for add_memory() and friends. */
+
+/*
+ * Don't create entries in /sys/firmware/memmap/ and expose memory as
+ * "System RAM (driver managed)" in e.g., /proc/iomem
+ */
+#define MHP_DRIVER_MANAGED 1
+
 /*
  * Zone resizing functions
  *
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index ebdf6541d074..cfa0721280aa 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -98,11 +98,11 @@ void mem_hotplug_done(void)
 u64 max_mem_size = U64_MAX;
 
 /* add this memory to iomem resource */
-static struct resource *register_memory_resource(u64 start, u64 size)
+static struct resource *register_memory_resource(u64 start, u64 size,
+const char *resource_name)
 {
struct resource *res;
unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
-   char *resource_name = "System RAM";
 
/*
 * Make sure value parsed from 'mem=' only restricts memory adding
@@ -1058,7 +1058,8 @@ int __ref add_memory_resource(int nid, struct resource 
*res,
BUG_ON(ret);
 
/* create new memmap entry */
-   firmware_map_add_hotplug(start, start + size, "System RAM");
+   if (!(flags & MHP_DRIVER_MANAGED))
+   firmware_map_add_hotplug(start, start + size, "System RAM");
 
/* device_online() will take the lock when calling online_pages() */
mem_hotplug_done();
@@ -1081,10 +1082,21 @@ int __ref add_memory_resource(int nid, struct resource 
*res,
 /* requires device_hotplug_lock, see add_memory_resource() */
 int __ref __add_memory(int nid, u64 start, u64 size, unsigned long flags)
 {
+   const char *resource_name = "System RAM";
struct resource *res;
int ret;
 
-   res = register_memory_resource(start, size);
+   /*
+* Indicate that memory managed by a driver is special. It's always
+* detected and added via a driver, should not be given to the kexec
+* kernel for booting when manually crafting the firmware memmap, and
+* no kexec segments should be placed on it. However, kdump should
+* dump this memory.
+*/
+   if (flags & MHP_DRIVER_MANAGED)
+   resource_name = "System RAM (driver managed)";
+
+   res = register_memory_resource(start, size, resource_name);
if (IS_ERR(res))
return PTR_ERR(res);
 
-- 
2.25.3




[PATCH v1 1/3] mm/memory_hotplug: Prepare passing flags to add_memory() and friends

2020-04-29 Thread David Hildenbrand
We soon want to pass flags - prepare for that.

This patch is based on a similar patch by Oscar Salvador:

https://lkml.kernel.org/r/20190625075227.15193-3-osalva...@suse.de

Cc: Michael Ellerman 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: "Rafael J. Wysocki" 
Cc: Len Brown 
Cc: Greg Kroah-Hartman 
Cc: Dan Williams 
Cc: Vishal Verma 
Cc: Dave Jiang 
Cc: "K. Y. Srinivasan" 
Cc: Haiyang Zhang 
Cc: Stephen Hemminger 
Cc: Wei Liu 
Cc: Heiko Carstens 
Cc: Vasily Gorbik 
Cc: Christian Borntraeger 
Cc: "Michael S. Tsirkin" 
Cc: Jason Wang 
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Stefano Stabellini 
Cc: Andrew Morton 
Cc: Thomas Gleixner 
Cc: Pingfan Liu 
Cc: Leonardo Bras 
Cc: Nathan Lynch 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: Baoquan He 
Cc: Wei Yang 
Cc: Pankaj Gupta 
Cc: Eric Biederman 
Cc: linuxppc-...@lists.ozlabs.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-a...@vger.kernel.org
Cc: linux-nvd...@lists.01.org
Cc: linux-hyp...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: virtualizat...@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Signed-off-by: David Hildenbrand 
---
 arch/powerpc/platforms/powernv/memtrace.c   |  2 +-
 arch/powerpc/platforms/pseries/hotplug-memory.c |  2 +-
 drivers/acpi/acpi_memhotplug.c  |  2 +-
 drivers/base/memory.c   |  2 +-
 drivers/dax/kmem.c  |  2 +-
 drivers/hv/hv_balloon.c |  2 +-
 drivers/s390/char/sclp_cmd.c|  2 +-
 drivers/virtio/virtio_mem.c |  2 +-
 drivers/xen/balloon.c   |  2 +-
 include/linux/memory_hotplug.h  |  7 ---
 mm/memory_hotplug.c | 11 ++-
 11 files changed, 19 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/memtrace.c 
b/arch/powerpc/platforms/powernv/memtrace.c
index 13b369d2cc45..a7475d18c671 100644
--- a/arch/powerpc/platforms/powernv/memtrace.c
+++ b/arch/powerpc/platforms/powernv/memtrace.c
@@ -224,7 +224,7 @@ static int memtrace_online(void)
ent->mem = 0;
}
 
-   if (add_memory(ent->nid, ent->start, ent->size)) {
+   if (add_memory(ent->nid, ent->start, ent->size, 0)) {
pr_err("Failed to add trace memory to node %d\n",
ent->nid);
ret += 1;
diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c 
b/arch/powerpc/platforms/pseries/hotplug-memory.c
index 5ace2f9a277e..ae44eba46ca0 100644
--- a/arch/powerpc/platforms/pseries/hotplug-memory.c
+++ b/arch/powerpc/platforms/pseries/hotplug-memory.c
@@ -646,7 +646,7 @@ static int dlpar_add_lmb(struct drmem_lmb *lmb)
block_sz = memory_block_size_bytes();
 
/* Add the memory */
-   rc = __add_memory(lmb->nid, lmb->base_addr, block_sz);
+   rc = __add_memory(lmb->nid, lmb->base_addr, block_sz, 0);
if (rc) {
invalidate_lmb_associativity_index(lmb);
return rc;
diff --git a/drivers/acpi/acpi_memhotplug.c b/drivers/acpi/acpi_memhotplug.c
index e294f44a7850..d91b3584d4b2 100644
--- a/drivers/acpi/acpi_memhotplug.c
+++ b/drivers/acpi/acpi_memhotplug.c
@@ -207,7 +207,7 @@ static int acpi_memory_enable_device(struct 
acpi_memory_device *mem_device)
if (node < 0)
node = memory_add_physaddr_to_nid(info->start_addr);
 
-   result = __add_memory(node, info->start_addr, info->length);
+   result = __add_memory(node, info->start_addr, info->length, 0);
 
/*
 * If the memory block has been used by the kernel, add_memory()
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 2b09b68b9f78..c0ef7d9e310a 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -432,7 +432,7 @@ static ssize_t probe_store(struct device *dev, struct 
device_attribute *attr,
 
nid = memory_add_physaddr_to_nid(phys_addr);
ret = __add_memory(nid, phys_addr,
-  MIN_MEMORY_BLOCK_SIZE * sections_per_block);
+  MIN_MEMORY_BLOCK_SIZE * sections_per_block, 0);
 
if (ret)
goto out;
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index 3d0a7e702c94..e159184e0ba0 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -65,7 +65,7 @@ int dev_dax_kmem_probe(struct device *dev)
new_res->flags = IORESOURCE_SYSTEM_RAM;
new_res->name = dev_name(dev);
 
-   rc = add_memory(numa_node, new_res->start, resource_size(new_res));
+   rc = add_memory(numa_node, new_res->start, resource_size(new_res), 0);
if (rc) {
release_resource(new_res);
kfree(new_res);
diff --git a/drivers/hv/hv_balloon.c b/drivers/hv/hv_balloon.c
index 32e3bc0aa665..0194bed1a573 100644
--- a/drivers/hv/hv_balloon.c
+++ b/drivers/hv/hv_balloon.c
@@ -72

Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Jan Beulich
On 29.04.2020 17:30, Julien Grall wrote:
> Hi Jan,
> 
> On 29/04/2020 16:23, Jan Beulich wrote:
>> On 29.04.2020 17:06, Julien Grall wrote:
>>>
>>>
>>> On 29/04/2020 15:56, Jan Beulich wrote:
 On 29.04.2020 16:14, Julien Grall wrote:
> Hi Jan,
>
> On 29/04/2020 15:05, Jan Beulich wrote:
>> On 29.04.2020 16:01, Julien Grall wrote:
>>> Hi,
>>>
>>> On 22/04/2020 10:20, Jan Beulich wrote:
> Even if it was possible to use the sub-structs defined in the header
> that way, keep in mind that we also wrote:
>
>     /* dummy member to force sizeof(struct 
> xen_pvcalls_request)
>  * to match across archs */
>     struct xen_pvcalls_dummy {
>     uint8_t dummy[56];
>     } dummy;

 This has nothing to do with how a consumer may use the structs.

> And the spec also clarifies that the size of each specific request is
> always 56 bytes.

 Sure, and I didn't mean to imply that a consumer would be allowed
 to break this requirement. Still something like this

 int pvcall_new_socket(struct xen_pvcalls_socket *s) {
     struct xen_pvcalls_request req = {
     .req_id = REQ_ID,
     .cmd = PVCALLS_SOCKET,
     .u.socket = *s,
     };

     return pvcall(&req);
 }

 may break.
>>>
>>> I think I understand your concern now. So yes I agree this would break 
>>> 32-bit consumer.
>>>
>>> As the padding is at the end of the structure, I think a 32-bit 
>>> frontend and 64-bit backend (or vice-versa) should currently work 
>>> without any trouble. The problem would come later if we decide to 
>>> extend a command.
>>
>> Can commands be extended at all, i.e. don't extensions require new
>> commands? The issue I've described has nothing to do with future
>> extending of any of the affected structures.
>
> I think my point wasn't conveyed correctly. The implicit padding is at
> the end of the structure for all the consumers but 32-bit x86. So
> without any modification, I think 32-bit frontend can still communicate
> with 64-bit backend (or vice-versa).

 There's no issue communicating afaics, as for communication
 you wouldn't use the sub-structures, but the single container
 one. The problem is, as described, with possible uses internal
 to one side of the communication.
>>>
>>> I am sorry but I can't figure out how this is an issue. The
>>> problem you described would only happen if you are calling a
>>> 64-bit library from a 32-bit software.
>>
>> Why? The example given doesn't require such.
> 
> Your example is only valid if we change the padding. In my previous
> e-mail I wrote "without modification" (i.e existing code) and
> marking the implicit padding only for 64-bit x86 and Arm. So there
> is no change in the size of the structure and therefore there is no
> issue to recompile as the size would not change.

Oh, sorry, yes. I was misled by "I think 32-bit frontend can still
communicate with 64-bit backend (or vice-versa)", because I never
said there would be such an issue.

>>> Is it even possible?
>>
>> In principle yes, I think. I don't think OSes like Linux allow this,
>> though.
> Do we really have to care about this?

I don't think we do, but this is a moot point anyway.

Jan



Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Julien Grall

Hi Jan,

On 29/04/2020 16:23, Jan Beulich wrote:

On 29.04.2020 17:06, Julien Grall wrote:



On 29/04/2020 15:56, Jan Beulich wrote:

On 29.04.2020 16:14, Julien Grall wrote:

Hi Jan,

On 29/04/2020 15:05, Jan Beulich wrote:

On 29.04.2020 16:01, Julien Grall wrote:

Hi,

On 22/04/2020 10:20, Jan Beulich wrote:

Even if it was possible to use the sub-structs defined in the header
that way, keep in mind that we also wrote:

    /* dummy member to force sizeof(struct xen_pvcalls_request)
     * to match across archs */
    struct xen_pvcalls_dummy {
    uint8_t dummy[56];
    } dummy;


This has nothing to do with how a consumer may use the structs.


And the spec also clarifies that the size of each specific request is
always 56 bytes.


Sure, and I didn't mean to imply that a consumer would be allowed
to break this requirement. Still something like this

int pvcall_new_socket(struct xen_pvcalls_socket *s) {
    struct xen_pvcalls_request req = {
    .req_id = REQ_ID,
    .cmd = PVCALLS_SOCKET,
    .u.socket = *s,
    };

    return pvcall(&req);
}

may break.


I think I understand your concern now. So yes I agree this would break 32-bit 
consumer.

As the padding is at the end of the structure, I think a 32-bit frontend and 
64-bit backend (or vice-versa) should currently work without any trouble. The 
problem would come later if we decide to extend a command.


Can commands be extended at all, i.e. don't extensions require new
commands? The issue I've described has nothing to do with future
extending of any of the affected structures.


I think my point wasn't conveyed correctly. The implicit padding is at
the end of the structure for all the consumers but 32-bit x86. So
without any modification, I think 32-bit frontend can still communicate
with 64-bit backend (or vice-versa).


There's no issue communicating afaics, as for communication
you wouldn't use the sub-structures, but the single container
one. The problem is, as described, with possible uses internal
to one side of the communication.


I am sorry but I can't figure out how this is an issue. The
problem you described would only happen if you are calling a
64-bit library from a 32-bit software.


Why? The example given doesn't require such.


Your example is only valid if we change the padding. In my previous 
e-mail I wrote "without modification" (i.e existing code) and marking 
the implicit padding only for 64-bit x86 and Arm. So there is no change 
in the size of the structure and therefore there is no issue to 
recompile as the size would not change.



Is it even possible?


In principle yes, I think. I don't think OSes like Linux allow this,
though.

Do we really have to care about this?

Cheers,

--
Julien Grall



Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Jan Beulich
On 29.04.2020 17:06, Julien Grall wrote:
> 
> 
> On 29/04/2020 15:56, Jan Beulich wrote:
>> On 29.04.2020 16:14, Julien Grall wrote:
>>> Hi Jan,
>>>
>>> On 29/04/2020 15:05, Jan Beulich wrote:
 On 29.04.2020 16:01, Julien Grall wrote:
> Hi,
>
> On 22/04/2020 10:20, Jan Beulich wrote:
>>> Even if it was possible to use the sub-structs defined in the header
>>> that way, keep in mind that we also wrote:
>>>
>>>    /* dummy member to force sizeof(struct xen_pvcalls_request)
>>>     * to match across archs */
>>>    struct xen_pvcalls_dummy {
>>>    uint8_t dummy[56];
>>>    } dummy;
>>
>> This has nothing to do with how a consumer may use the structs.
>>
>>> And the spec also clarifies that the size of each specific request is
>>> always 56 bytes.
>>
>> Sure, and I didn't mean to imply that a consumer would be allowed
>> to break this requirement. Still something like this
>>
>> int pvcall_new_socket(struct xen_pvcalls_socket *s) {
>>    struct xen_pvcalls_request req = {
>>    .req_id = REQ_ID,
>>    .cmd = PVCALLS_SOCKET,
>>    .u.socket = *s,
>>    };
>>
>>    return pvcall(&req);
>> }
>>
>> may break.
>
> I think I understand your concern now. So yes I agree this would break 
> 32-bit consumer.
>
> As the padding is at the end of the structure, I think a 32-bit frontend 
> and 64-bit backend (or vice-versa) should currently work without any 
> trouble. The problem would come later if we decide to extend a command.

 Can commands be extended at all, i.e. don't extensions require new
 commands? The issue I've described has nothing to do with future
 extending of any of the affected structures.
>>>
>>> I think my point wasn't conveyed correctly. The implicit padding is at
>>> the end of the structure for all the consumers but 32-bit x86. So
>>> without any modification, I think 32-bit frontend can still communicate
>>> with 64-bit backend (or vice-versa).
>>
>> There's no issue communicating afaics, as for communication
>> you wouldn't use the sub-structures, but the single container
>> one. The problem is, as described, with possible uses internal
>> to one side of the communication.
> 
> I am sorry but I can't figure out how this is an issue. The
> problem you described would only happen if you are calling a
> 64-bit library from a 32-bit software.

Why? The example given doesn't require such.

> Is it even possible?

In principle yes, I think. I don't think OSes like Linux allow this,
though.

Jan



Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Julien Grall




On 29/04/2020 15:56, Jan Beulich wrote:

On 29.04.2020 16:14, Julien Grall wrote:

Hi Jan,

On 29/04/2020 15:05, Jan Beulich wrote:

On 29.04.2020 16:01, Julien Grall wrote:

Hi,

On 22/04/2020 10:20, Jan Beulich wrote:

Even if it was possible to use the sub-structs defined in the header
that way, keep in mind that we also wrote:

   /* dummy member to force sizeof(struct xen_pvcalls_request)
    * to match across archs */
   struct xen_pvcalls_dummy {
   uint8_t dummy[56];
   } dummy;


This has nothing to do with how a consumer may use the structs.


And the spec also clarifies that the size of each specific request is
always 56 bytes.


Sure, and I didn't mean to imply that a consumer would be allowed
to break this requirement. Still something like this

int pvcall_new_socket(struct xen_pvcalls_socket *s) {
   struct xen_pvcalls_request req = {
   .req_id = REQ_ID,
   .cmd = PVCALLS_SOCKET,
   .u.socket = *s,
   };

   return pvcall(&req);
}

may break.


I think I understand your concern now. So yes I agree this would break 32-bit 
consumer.

As the padding is at the end of the structure, I think a 32-bit frontend and 
64-bit backend (or vice-versa) should currently work without any trouble. The 
problem would come later if we decide to extend a command.


Can commands be extended at all, i.e. don't extensions require new
commands? The issue I've described has nothing to do with future
extending of any of the affected structures.


I think my point wasn't conveyed correctly. The implicit padding is at
the end of the structure for all the consumers but 32-bit x86. So
without any modification, I think 32-bit frontend can still communicate
with 64-bit backend (or vice-versa).


There's no issue communicating afaics, as for communication
you wouldn't use the sub-structures, but the single container
one. The problem is, as described, with possible uses internal
to one side of the communication.


I am sorry but I can't figure out how this is an issue. The problem you 
described would only happen if you are calling a 64-bit library from a 
32-bit software. Is it even possible?


Cheers,

--
Julien Grall



Re: [PATCH v2 3/5] tools/misc: add xen-domctx to present domain context

2020-04-29 Thread Jan Beulich
On 07.04.2020 19:38, Paul Durrant wrote:
> +int main(int argc, char **argv)
> +{
> +uint32_t domid;
> +unsigned int entry;
> +xc_interface *xch;
> +int rc;
> +
> +if ( argc != 2 || !argv[1] || (rc = atoi(argv[1])) < 0 )
> +{
> +fprintf(stderr, "usage: %s <domid>\n", argv[0]);
> +exit(1);
> +}

Perhaps also allow dumping just a single (vCPU or other) ID?

Jan



Re: [PATCH 6/7] xen/guest_access: Consolidate guest access helpers in xen/guest_access.h

2020-04-29 Thread Julien Grall




On 29/04/2020 15:54, Jan Beulich wrote:

On 29.04.2020 16:13, Julien Grall wrote:

So can you please have another look and explain how the line can be drawn
with just two architectures in place.


There are abstract considerations that can be used to draw the
line, as well as knowledge of people on architectures Xen doesn't
run on, but where one can - with such knowledge - extrapolate how
it would want to be implemented.

>

I don't think the question at this point is where to draw the
line, but whether to have asm-generic/ in the first place.


Well the two come together. You can't add a new directory with no clear
view of how this is going to be used.


At the moment, this would at best result in bikeshedding because
developers may have different opinions on how a new architecture would
be implemented in Xen.


If you have a 3rd architecture then it would be easier to argue whether the
header should be added in asm-generic/ or xen/:
   - asm-generic/ should be used if 2 of the architectures are using 
the same interface

   - xen/ should be if the 3 architectures are using the same interface

Cheers,

--
Julien Grall



Re: [PATCH v2] x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly

2020-04-29 Thread Jan Beulich
On 29.04.2020 15:58, Hongyan Xia wrote:
> From: Wei Liu 
> 
> Also, clean up the initialisation of plXe.
> 
> Signed-off-by: Wei Liu 
> Signed-off-by: Hongyan Xia 

Reviewed-by: Jan Beulich 




Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Jan Beulich
On 29.04.2020 16:14, Julien Grall wrote:
> Hi Jan,
> 
> On 29/04/2020 15:05, Jan Beulich wrote:
>> On 29.04.2020 16:01, Julien Grall wrote:
>>> Hi,
>>>
>>> On 22/04/2020 10:20, Jan Beulich wrote:
> Even if it was possible to use the sub-structs defined in the header
> that way, keep in mind that we also wrote:
>
>   /* dummy member to force sizeof(struct xen_pvcalls_request)
>    * to match across archs */
>   struct xen_pvcalls_dummy {
>   uint8_t dummy[56];
>   } dummy;

 This has nothing to do with how a consumer may use the structs.

> And the spec also clarifies that the size of each specific request is
> always 56 bytes.

 Sure, and I didn't mean to imply that a consumer would be allowed
 to break this requirement. Still something like this

 int pvcall_new_socket(struct xen_pvcalls_socket *s) {
   struct xen_pvcalls_request req = {
   .req_id = REQ_ID,
   .cmd = PVCALLS_SOCKET,
   .u.socket = *s,
   };

   return pvcall(&req);
 }

 may break.
>>>
>>> I think I understand your concern now. So yes I agree this would break 
>>> 32-bit consumer.
>>>
>>> As the padding is at the end of the structure, I think a 32-bit frontend 
>>> and 64-bit backend (or vice-versa) should currently work without any 
>>> trouble. The problem would come later if we decide to extend a command.
>>
>> Can commands be extended at all, i.e. don't extensions require new
>> commands? The issue I've described has nothing to do with future
>> extending of any of the affected structures.
> 
> I think my point wasn't conveyed correctly. The implicit padding is at
> the end of the structure for all the consumers but 32-bit x86. So
> without any modification, I think 32-bit frontend can still communicate
> with 64-bit backend (or vice-versa).

There's no issue communicating afaics, as for communication
you wouldn't use the sub-structures, but the single container
one. The problem is, as described, with possible uses internal
to one side of the communication.

Jan



Re: [PATCH 6/7] xen/guest_access: Consolidate guest access helpers in xen/guest_access.h

2020-04-29 Thread Jan Beulich
On 29.04.2020 16:13, Julien Grall wrote:
> So can you please have another look and explain how the line can be drawn
> with just two architectures in place.

There are abstract considerations that can be used to draw the
line, as well as knowledge of people on architectures Xen doesn't
run on, but where one can - with such knowledge - extrapolate how
it would want to be implemented.

I don't think the question at this point is where to draw the
line, but whether to have asm-generic/ in the first place.

Jan



Re: [PATCH v2 2/5] xen/common/domctl: introduce XEN_DOMCTL_get/setdomaincontext

2020-04-29 Thread Jan Beulich
On 07.04.2020 19:38, Paul Durrant wrote:
> @@ -358,6 +359,113 @@ static struct vnuma_info *vnuma_init(const struct 
> xen_domctl_vnuma *uinfo,
>  return ERR_PTR(ret);
>  }
>  
> +struct domctl_context
> +{
> +void *buffer;
> +size_t len;
> +size_t cur;
> +};
> +
> +static int accumulate_size(void *priv, const void *data, size_t len)
> +{
> +struct domctl_context *c = priv;
> +
> +if ( c->len + len < c->len )
> +return -EOVERFLOW;
> +
> +c->len += len;
> +
> +return 0;
> +}
> +
> +static int save_data(void *priv, const void *data, size_t len)
> +{
> +struct domctl_context *c = priv;
> +
> +if ( c->len - c->cur < len )
> +return -ENOSPC;
> +
> +memcpy(c->buffer + c->cur, data, len);
> +c->cur += len;
> +
> +return 0;
> +}
> +
> +static int getdomaincontext(struct domain *d,
> +struct xen_domctl_getdomaincontext *gdc)
> +{
> +struct domctl_context c = { };

Please can you use ZERO_BLOCK_PTR or some such for the buffer
field, such that erroneous use of the field would not end up
as a (PV-controllable) NULL deref. (Yes, it's a domctl, but
still.) This being common code you also want to get things
right for Arm, irrespective of whether the code will be dead
there for now.

> +int rc;
> +
> +if ( d == current->domain )
> +return -EPERM;
> +
> +if ( guest_handle_is_null(gdc->buffer) ) /* query for buffer size */
> +{
> +if ( gdc->size )
> +return -EINVAL;
> +
> +/* dry run to acquire buffer size */
> +rc = domain_save(d, accumulate_size, &c, true);
> +if ( rc )
> +return rc;
> +
> +gdc->size = c.len;
> +return 0;
> +}
> +
> +c.len = gdc->size;
> +c.buffer = xmalloc_bytes(c.len);

What sizes are we looking at here? It may be better to use vmalloc()
right from the start. If not, I'd like to advocate for using
xmalloc_array() instead of xmalloc_bytes() - see the almost-XSA
commit cf38b4926e2b.

> +if ( !c.buffer )
> +return -ENOMEM;
> +
> +rc = domain_save(d, save_data, &c, false);
> +
> +gdc->size = c.cur;
> +if ( !rc && copy_to_guest(gdc->buffer, c.buffer, gdc->size) )

As to my remark in patch 1 on the size field, applying to this size
field too - copy_to_user{,_hvm}() don't support a 64-bit value (on
x86 at least).

> --- a/xen/include/public/domctl.h
> +++ b/xen/include/public/domctl.h
> @@ -38,7 +38,7 @@
>  #include "hvm/save.h"
>  #include "memory.h"
>  
> -#define XEN_DOMCTL_INTERFACE_VERSION 0x00000012
> +#define XEN_DOMCTL_INTERFACE_VERSION 0x00000013

I don't see you making any change making the interface backwards
incompatible, hence no need for the bump.

> @@ -1129,6 +1129,44 @@ struct xen_domctl_vuart_op {
>   */
>  };
>  
> +/*
> + * Get/Set domain PV context. The same struct xen_domctl_domaincontext
> + * is used for both commands but with slightly different field semantics
> + * as follows:
> + *
> + * XEN_DOMCTL_getdomaincontext
> + * ---
> + *
> + * buffer (IN):   The buffer into which the context data should be
> + *copied, or NULL to query the buffer size that should
> + *be allocated.
> + * size (IN/OUT): If 'buffer' is NULL then the value passed in must be
> + *zero, and the value passed out will be the size of the
> + *buffer to allocate.
> + *If 'buffer' is non-NULL then the value passed in must
> + *be the size of the buffer into which data may be copied.

This leaves open whether the size also gets updated in this latter
case.

> + */
> +struct xen_domctl_getdomaincontext {
> +uint64_t size;

If this is to remain 64-bits (with too large values suitably taken
care of for all cases - see above), uint64_aligned_t please for
consistency, if nothing else.

Jan
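The wraparound test quoted above (`c->len + len < c->len`) relies on unsigned arithmetic being well-defined modulo 2^N: a wrapped sum is strictly smaller than either operand. A minimal stand-alone sketch of the idiom (not Xen code; the helper name is illustrative):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Accumulate 'len' into '*total', failing if the unsigned addition
 * would wrap.  Unsigned overflow wraps modulo 2^N, so a wrapped sum is
 * strictly smaller than either operand -- the same check used by
 * accumulate_size() in the quoted patch.
 */
static int accumulate(size_t *total, size_t len)
{
    if (*total + len < *total)
        return -1;              /* would overflow, mirroring -EOVERFLOW */
    *total += len;
    return 0;
}
```

Adding SIZE_MAX to any non-zero running total trips the check, while sums that land exactly on SIZE_MAX are still accepted.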



Re: [PATCH] x86/CPUID: correct error indicator for max extended leaf

2020-04-29 Thread Andrew Cooper
On 29/04/2020 15:11, Jan Beulich wrote:
> With the max base leaf using 0, this one should be using the extended
> leaf counterpart thereof, rather than some arbitrary extended leaf.
>
> Fixes: 588a966a572e ("libx86: Introduce x86_cpu_policies_are_compatible()")
> Signed-off-by: Jan Beulich 

Reviewed-by: Andrew Cooper 



Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Julien Grall

Hi Jan,

On 29/04/2020 15:05, Jan Beulich wrote:

On 29.04.2020 16:01, Julien Grall wrote:

Hi,

On 22/04/2020 10:20, Jan Beulich wrote:

Even if it was possible to use the sub-structs defined in the header
that way, keep in mind that we also wrote:

  /* dummy member to force sizeof(struct xen_pvcalls_request)
   * to match across archs */
  struct xen_pvcalls_dummy {
  uint8_t dummy[56];
  } dummy;


This has nothing to do with how a consumer may use the structs.


And the spec also clarifies that the size of each specific request is
always 56 bytes.


Sure, and I didn't mean to imply that a consumer would be allowed
to break this requirement. Still something like this

int pvcall_new_socket(struct xen_pvcalls_socket *s) {
  struct xen_pvcalls_request req = {
  .req_id = REQ_ID,
  .cmd = PVCALLS_SOCKET,
  .u.socket = *s,
  };

  return pvcall(&req);
}

may break.


I think I understand your concern now. So yes I agree this would break 32-bit 
consumer.

As the padding is at the end of the structure, I think a 32-bit frontend and 
64-bit backend (or vice-versa) should currently work without any trouble. The 
problem would come later if we decide to extend a command.


Can commands be extended at all, i.e. don't extensions require new
commands? The issue I've described has nothing to do with future
extending of any of the affected structures.


I think my point wasn't conveyed correctly. The implicit padding is at 
the end of the structure for all the consumers but 32-bit x86. So 
without any modification, I think 32-bit frontend can still communicate 
with 64-bit backend (or vice-versa).


Therefore I suggest to rework the documentation and add the implicit 
padding just for all the architectures but 32-bit x86.


Cheers,

--
Julien Grall
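The 32-bit x86 oddity in the thread above comes from the i386 ABI aligning uint64_t on 4 bytes, so a struct that picks up tail padding on 64-bit builds may not on 32-bit x86. A simplified illustration (not the real pvcalls layout; the sizeof assertion assumes a 64-bit build):

```c
#include <assert.h>
#include <stdint.h>

/*
 * On LP64 ABIs uint64_t has 8-byte alignment, so this struct is padded
 * to 16 bytes; on 32-bit x86 (ILP32) uint64_t is only 4-byte aligned
 * and sizeof would be 12 -- the kind of cross-arch divergence the
 * pvcalls spec papers over with its explicit dummy[56] member.
 */
struct demo {
    uint64_t id;
    uint32_t cmd;
    /* implicit tail padding here on 64-bit builds, none on 32-bit x86 */
};
```

This is why documenting the implicit padding explicitly for every architecture except 32-bit x86, as suggested above, keeps the 56-byte wire size unambiguous.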



Re: [PATCH 6/7] xen/guest_access: Consolidate guest access helpers in xen/guest_access.h

2020-04-29 Thread Julien Grall

Hi,

On 29/04/2020 15:07, Jan Beulich wrote:

On 29.04.2020 16:04, Julien Grall wrote:

Gentle ping. Any comments on the direction to take?


Just to avoid misunderstanding - while the mail was sent with me as
the only one on the To list, I don't think I've been meant, as I've
voiced my opinion. I assume you rather meant to ping others to chime
in?


I barely pay attention to CC vs TO. If I am on the list of recipients of 
an e-mail, then I will at least have a glance.


This e-mail is directed to you specifically and also the others. While 
you may have voiced some of your opinion already, you haven't replied 
back how you would decide whether an header should be added in xen or 
asm-generic.


So can you please have another and explain how the line can be drawn 
with just two architectures in place.


Cheers,

--
Julien Grall



[PATCH] x86/CPUID: correct error indicator for max extended leaf

2020-04-29 Thread Jan Beulich
With the max base leaf using 0, this one should be using the extended
leaf counterpart thereof, rather than some arbitrary extended leaf.

Fixes: 588a966a572e ("libx86: Introduce x86_cpu_policies_are_compatible()")
Signed-off-by: Jan Beulich 

--- a/tools/tests/cpu-policy/test-cpu-policy.c
+++ b/tools/tests/cpu-policy/test-cpu-policy.c
@@ -570,7 +570,7 @@ static void test_is_compatible_failure(v
 {
 .name = "Host extd.max_leaf out of range",
 .guest_cpuid.extd.max_leaf = 1,
-.e = { 0x80000008, -1, -1 },
+.e = { 0x80000000, -1, -1 },
 },
 {
 .name = "Host no CPUID faulting, Guest wanted",
--- a/xen/lib/x86/policy.c
+++ b/xen/lib/x86/policy.c
@@ -19,7 +19,7 @@ int x86_cpu_policies_are_compatible(cons
 FAIL_CPUID(0, NA);
 
 if ( guest->cpuid->extd.max_leaf > host->cpuid->extd.max_leaf )
-FAIL_CPUID(0x80000008, NA);
+FAIL_CPUID(0x80000000, NA);
 
 /* TODO: Audit more CPUID data. */
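The rule the patch adjusts can be sketched as follows. The types and names are simplified stand-ins for Xen's cpu_policy structures, showing why the failure indicator should be the first leaf of the extended range (0x80000000), mirroring how the base range reports leaf 0:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct cpu_policy. */
struct policy {
    uint32_t extd_max_leaf;
};

/*
 * A guest policy's maximum extended leaf may not exceed the host's.
 * On failure, report the first leaf of the extended range -- the
 * extended counterpart of leaf 0 -- rather than some arbitrary leaf.
 */
static int check_compatible(const struct policy *host,
                            const struct policy *guest,
                            uint32_t *failed_leaf)
{
    if (guest->extd_max_leaf > host->extd_max_leaf) {
        *failed_leaf = 0x80000000u;
        return -1;
    }
    return 0;
}
```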
 



Re: [PATCH 6/7] xen/guest_access: Consolidate guest access helpers in xen/guest_access.h

2020-04-29 Thread Jan Beulich
On 29.04.2020 16:04, Julien Grall wrote:
> Gentle ping. Any comments on the direction to take?

Just to avoid misunderstanding - while the mail was sent with me as
the only one on the To list, I don't think I've been meant, as I've
voiced my opinion. I assume you rather meant to ping others to chime
in?

Jan

> On 09/04/2020 10:28, Julien Grall wrote:
>>
>>
>> On 09/04/2020 09:06, Jan Beulich wrote:
>>> On 09.04.2020 10:01, Julien Grall wrote:
 Hi,

 On 09/04/2020 07:30, Jan Beulich wrote:
> On 09.04.2020 00:05, Julien Grall wrote:
> I don't see why a new port may not also want
> to go that route instead of the x86/Arm one.
 I could accept that someone would want to reinvent a new ABI
 from scratch for just an hypothetical new arch. However it would
 be quite an effort to reinvent XEN_GUEST_HANDLE(). The chance is
 RISC-V is only going to re-use what Arm did as Arm already did
 with x86.

 I would like to avoid to introduce a new directory asm-generic
 with just one header in it. Maybe you have some other headers in
 mind?
>>>
>>> I recall having wondered a few times whether we shouldn't use this
>>> concept elsewhere. One case iirc was bitops stuff. Looking over
>>> the Linux ones, some atomic and barrier fallback implementations
>>> may also sensibly live there, and there are likely more.
>>
>> In theory it makes sense but, in the current state of Xen, 'asm-generic' 
>> means they are common to Arm and x86. This is basically the same definition 
>> as for the directory 'xen'. So how do you draw a line which files go where?
>>
>> To be honest, I don't think we can draw a line without a 3rd architecture. 
>> So I would recommend to wait until then to create an asm-generic directory.
>>
>> Meanwhile, I still think the consolidation in xen/ is useful as it makes 
>> easier to maintain. It is also going to make easier for RISCv (or a new 
>> arch) as they don't have to worry about duplication (if any).
>>
>> In the event they decide to pursue their own route, then it is not going to 
>> be a massive pain to move part of xen/guest_access.h in 
>> asm-generic/guest_access.h and include the latter from {xen, asm} 
>> /guest_access.h.
> 
> Cheers,
> 




Re: [PATCH 6/7] xen/guest_access: Consolidate guest access helpers in xen/guest_access.h

2020-04-29 Thread Julien Grall

Hi,

Gentle ping. Any comments on the direction to take?

On 09/04/2020 10:28, Julien Grall wrote:



On 09/04/2020 09:06, Jan Beulich wrote:

On 09.04.2020 10:01, Julien Grall wrote:

Hi,

On 09/04/2020 07:30, Jan Beulich wrote:

On 09.04.2020 00:05, Julien Grall wrote:
I don't see why a new port may not also want
to go that route instead of the x86/Arm one.

I could accept that someone would want to reinvent a new ABI
from scratch for just an hypothetical new arch. However it would
be quite an effort to reinvent XEN_GUEST_HANDLE(). The chance is
RISC-V is only going to re-use what Arm did as Arm already did
with x86.

I would like to avoid to introduce a new directory asm-generic
with just one header in it. Maybe you have some other headers in
mind?


I recall having wondered a few times whether we shouldn't use this
concept elsewhere. One case iirc was bitops stuff. Looking over
the Linux ones, some atomic and barrier fallback implementations
may also sensibly live there, and there are likely more.


In theory it makes sense but, in the current state of Xen, 'asm-generic' 
means they are common to Arm and x86. This is basically the same 
definition as for the directory 'xen'. So how do you draw a line which 
files go where?


To be honest, I don't think we can draw a line without a 3rd 
architecture. So I would recommend to wait until then to create an 
asm-generic directory.


Meanwhile, I still think the consolidation in xen/ is useful as it makes 
easier to maintain. It is also going to make easier for RISCv (or a new 
arch) as they don't have to worry about duplication (if any).


In the event they decide to pursue their own route, then it is not going 
to be a massive pain to move part of xen/guest_access.h in 
asm-generic/guest_access.h and include the latter from {xen, asm} 
/guest_access.h.


Cheers,

--
Julien Grall



Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Jan Beulich
On 29.04.2020 16:01, Julien Grall wrote:
> Hi,
> 
> On 22/04/2020 10:20, Jan Beulich wrote:
>>> Even if it was possible to use the sub-structs defined in the header
>>> that way, keep in mind that we also wrote:
>>>
>>>  /* dummy member to force sizeof(struct xen_pvcalls_request)
>>>   * to match across archs */
>>>  struct xen_pvcalls_dummy {
>>>  uint8_t dummy[56];
>>>  } dummy;
>>
>> This has nothing to do with how a consumer may use the structs.
>>
>>> And the spec also clarifies that the size of each specific request is
>>> always 56 bytes.
>>
>> Sure, and I didn't mean to imply that a consumer would be allowed
>> to break this requirement. Still something like this
>>
>> int pvcall_new_socket(struct xen_pvcalls_socket *s) {
>>  struct xen_pvcalls_request req = {
>>  .req_id = REQ_ID,
>>  .cmd = PVCALLS_SOCKET,
>>  .u.socket = *s,
>>  };
>>
>>  return pvcall(&req);
>> }
>>
>> may break.
> 
> I think I understand your concern now. So yes I agree this would break 32-bit 
> consumer.
> 
> As the padding is at the end of the structure, I think a 32-bit frontend and 
> 64-bit backend (or vice-versa) should currently work without any trouble. The 
> problem would come later if we decide to extend a command.

Can commands be extended at all, i.e. don't extensions require new
commands? The issue I've described has nothing to do with future
extending of any of the affected structures.

Jan



Re: [PATCH] pvcalls: Document explicitly the padding for all arches

2020-04-29 Thread Julien Grall

Hi,

On 22/04/2020 10:20, Jan Beulich wrote:

Even if it was possible to use the sub-structs defined in the header
that way, keep in mind that we also wrote:

 /* dummy member to force sizeof(struct xen_pvcalls_request)
  * to match across archs */
 struct xen_pvcalls_dummy {
 uint8_t dummy[56];
 } dummy;


This has nothing to do with how a consumer may use the structs.


And the spec also clarifies that the size of each specific request is
always 56 bytes.


Sure, and I didn't mean to imply that a consumer would be allowed
to break this requirement. Still something like this

int pvcall_new_socket(struct xen_pvcalls_socket *s) {
 struct xen_pvcalls_request req = {
 .req_id = REQ_ID,
 .cmd = PVCALLS_SOCKET,
 .u.socket = *s,
 };

 return pvcall(&req);
}

may break.


I think I understand your concern now. So yes I agree this would break 
32-bit consumer.


As the padding is at the end of the structure, I think a 32-bit frontend 
and 64-bit backend (or vice-versa) should currently work without any 
trouble. The problem would come later if we decide to extend a command.


I will document the padding only for non 32-bit x86 guest and rework the 
documentation.


Cheers,

--
Julien Grall



[PATCH] x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly

2020-04-29 Thread Hongyan Xia
From: Wei Liu 

Also, clean up the initialisation of plXe.

Signed-off-by: Wei Liu 
Signed-off-by: Hongyan Xia 

---
Changed since last revision:
- lift this patch out since others in the series have been merged.
- remove lXt variables and reuse plXe for unmapping.
- clean up plXe initialisation.
- drop a Reviewed-by.
---
 xen/arch/x86/pv/dom0_build.c | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index abfbe5f436..3522eb0114 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -49,18 +49,11 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
 {
 unsigned long count;
 struct page_info *page;
-l4_pgentry_t *pl4e;
-l3_pgentry_t *pl3e;
-l2_pgentry_t *pl2e;
-l1_pgentry_t *pl1e;
-
-pl4e = l4start + l4_table_offset(vpt_start);
-pl3e = l4e_to_l3e(*pl4e);
-pl3e += l3_table_offset(vpt_start);
-pl2e = l3e_to_l2e(*pl3e);
-pl2e += l2_table_offset(vpt_start);
-pl1e = l2e_to_l1e(*pl2e);
-pl1e += l1_table_offset(vpt_start);
+l4_pgentry_t *pl4e = l4start + l4_table_offset(vpt_start);
+l3_pgentry_t *pl3e = map_l3t_from_l4e(*pl4e) + l3_table_offset(vpt_start);
+l2_pgentry_t *pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(vpt_start);
+l1_pgentry_t *pl1e = map_l1t_from_l2e(*pl2e) + l1_table_offset(vpt_start);
+
 for ( count = 0; count < nr_pt_pages; count++ )
 {
 l1e_remove_flags(*pl1e, _PAGE_RW);
@@ -85,12 +78,21 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
 if ( !((unsigned long)++pl2e & (PAGE_SIZE - 1)) )
 {
 if ( !((unsigned long)++pl3e & (PAGE_SIZE - 1)) )
-pl3e = l4e_to_l3e(*++pl4e);
-pl2e = l3e_to_l2e(*pl3e);
+{
+/* Need to unmap the page before the increment. */
+unmap_domain_page(pl3e - 1);
+pl3e = map_l3t_from_l4e(*++pl4e);
+}
+unmap_domain_page(pl2e - 1);
+pl2e = map_l2t_from_l3e(*pl3e);
 }
-pl1e = l2e_to_l1e(*pl2e);
+unmap_domain_page(pl1e - 1);
+pl1e = map_l1t_from_l2e(*pl2e);
 }
 }
+unmap_domain_page(pl1e);
+unmap_domain_page(pl2e);
+unmap_domain_page(pl3e);
 }
 
 static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
-- 
2.17.1




Re: [PATCH v2 1/3] x86/pv: Options to disable and/or compile out 32bit PV support

2020-04-29 Thread Jan Beulich
On 29.04.2020 15:06, Andrew Cooper wrote:
> This is the start of some performance and security-hardening improvements,
> based on the fact that 32bit PV guests are few and far between these days.
> 
> Ring1 is full of architectural corner cases, such as counting as supervisor
> from a paging point of view.  This accounts for a substantial performance hit
> on processors from the last 8 years (adjusting SMEP/SMAP on every privilege
> transition), and the gap is only going to get bigger with new hardware
> features.
> 
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Wei Liu 
> Reviewed-by: Roger Pau Monné 

Acked-by: Jan Beulich 




[xen-unstable-smoke test] 149874: tolerable all pass - PUSHED

2020-04-29 Thread osstest service owner
flight 149874 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149874/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-check fail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-check fail   never pass
 test-armhf-armhf-xl  13 migrate-support-check fail   never pass
 test-armhf-armhf-xl  14 saverestore-support-check fail   never pass

version targeted for testing:
 xen  f9bf746258eb53011f863571c7073037202b6743
baseline version:
 xen  e9aca9470ed86966f9c0fd0db85132ff28d652c4

Last test of basis   149872  2020-04-29 08:02:03 Z 0 days
Testing same since   149874  2020-04-29 11:00:59 Z 0 days 1 attempts


People who touched revisions under test:
  Jan Beulich 
  Wei Liu 
  Wei Liu 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   e9aca9470e..f9bf746258  f9bf746258eb53011f863571c7073037202b6743 -> smoke



Re: [PATCH] x86/S3: Drop {save,restore}_rest_processor_state() completely

2020-04-29 Thread Jan Beulich
On 29.04.2020 15:36, Andrew Cooper wrote:
> On 29/04/2020 14:25, Jan Beulich wrote:
>> On 29.04.2020 13:32, Andrew Cooper wrote:
>>> On 29/04/2020 12:16, Jan Beulich wrote:
 On 29.04.2020 13:09, Andrew Cooper wrote:
> --- a/xen/arch/x86/boot/trampoline.S
> +++ b/xen/arch/x86/boot/trampoline.S
> @@ -91,6 +91,11 @@ trampoline_protmode_entry:
>  and %edi,%edx
>  wrmsr
>  1:
> +/* Set up PAT before enabling paging. */
> +mov $XEN_MSR_PAT & 0xffffffff, %eax
> +mov $XEN_MSR_PAT >> 32, %edx
> +mov $MSR_IA32_CR_PAT, %ecx
> +wrmsr
 Doesn't this also eliminate the need for cpu_init() doing this?
 If you agree with that one also dropped
 Reviewed-by: Jan Beulich 
>>> That doesn't cover the BSP on either the legacy or EFI paths.
>> The legacy path, afaict, uses it:
>>
>> .Lskip_realmode:
>> /* EBX == 0 indicates we are the BP (Boot Processor). */
>> xor %ebx,%ebx
>>
>> /* Jump to the common bootstrap entry point. */
>> jmp trampoline_protmode_entry
> 
> Oh, of course.
> 
>> The xen.efi entry path really should have the change you make
>> mirrored anyway.
> 
> Are you happy for it to go in efi_arch_post_exit_boot()?  We don't
> disable paging, but that is the point where we switch from the EFI
> pagetables to Xen's.

Yes, that's the most "symmetrical" place, I think.

Jan



Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Igor Druzhinin
On 29/04/2020 14:06, Jürgen Groß wrote:
> On 29.04.20 14:49, Igor Druzhinin wrote:
>> On 29/04/2020 13:29, Jürgen Groß wrote:
>>> On 29.04.20 13:04, Igor Druzhinin wrote:
>>
>> Yes, XAPI is doing soft reset differently from libxl. You need to keep 
>> xenstore
>> data to at least keep backends working without the need to reinitialize them
>> before entering kdump kernel. We do the same thing while entering crash 
>> kernel
>> in Windows after BSOD (CC Paul as he recommended this approach).
>> There are other reasons to not reset xenstore data.
>>
>> I considered XS_INTRODUCE functionality to be part of xenstored ABI and we 
>> built
>> a lot of infrastructure on top of that. So I don't think it could be easily 
>> removed now.
> 
> Nobody wants to remove XS_INTRODUCE. It was just questioned there is a
> need to call XS_INTRODUCE multiple times for a domain without a call of
> XS_RELEASE in between.

Sorry, I didn't mean the whole XS_INTRODUCE but that particular bit that
allows it to rebind the channels without calling XS_RELEASE first as the
latter has a lot of implications from toolstack point of view.

> In case there is such a need this means we either should just drop the
> test for the mfn to match and recreate the event-channel (which will do
> no harm IMO), or we need to keep the mfn in the live-update state record
> with the knowledge that it is far from being a good indicator for the
> test whether a domain is known already or not (two HVM domains using a
> similar configuration with the very same kernel will use the same gfn
> probably).
> 
> As only dom0 is allowed to use XS_INTRODUCE and I'm not aware of any
> problems with neither XS_INTRODUCE failing due to illegal calls (no mfn
> match), nor with potentially recreating the event channel for a "wrong"
> domain, I suggest to just allow XS_INTRODUCE to be called as often as
> the toolstack wants to.

If that means keeping the current semantics of XS_INTRODUCE - I'd be quite
happy with that.

Igor



Re: [PATCH 2/3] x86/pv: Short-circuit is_pv_{32,64}bit_domain() in !CONFIG_PV32 builds

2020-04-29 Thread Jan Beulich
On 29.04.2020 15:30, Andrew Cooper wrote:
> On 29/04/2020 14:29, Jan Beulich wrote:
>> On 29.04.2020 15:13, Andrew Cooper wrote:
>>> On 20/04/2020 15:09, Jan Beulich wrote:
 On 17.04.2020 17:50, Andrew Cooper wrote:
> --- a/xen/arch/x86/pv/domain.c
> +++ b/xen/arch/x86/pv/domain.c
> @@ -215,7 +215,7 @@ int switch_compat(struct domain *d)
>  return 0;
>  
>  d->arch.has_32bit_shinfo = 1;
> -d->arch.is_32bit_pv = 1;
> +d->arch.pv.is_32bit = 1;
>  
>  for_each_vcpu( d, v )
>  {
> @@ -235,7 +235,7 @@ int switch_compat(struct domain *d)
>  return 0;
>  
>   undo_and_fail:
> -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
> +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
>  for_each_vcpu( d, v )
>  {
>  free_compat_arg_xlat(v);
> @@ -358,7 +358,7 @@ int pv_domain_initialise(struct domain *d)
>  d->arch.ctxt_switch = &pv_csw;
>  
>  /* 64-bit PV guest by default. */
> -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
> +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
 Switch to true/false while you're touching these?
>>> Yes, but I'm tempted to delete these lines in the final hunk.  Its
>>> writing zeros into a zeroed structures.
>> Oh, yes, agreed.
> 
> Can I take this as an ack then?

Sorry, didn't realize I didn't give one yet with the adjustments
made:
Reviewed-by: Jan Beulich 

Jan



Re: [PATCH] x86/S3: Drop {save,restore}_rest_processor_state() completely

2020-04-29 Thread Andrew Cooper
On 29/04/2020 14:25, Jan Beulich wrote:
> On 29.04.2020 13:32, Andrew Cooper wrote:
>> On 29/04/2020 12:16, Jan Beulich wrote:
>>> On 29.04.2020 13:09, Andrew Cooper wrote:
 --- a/xen/arch/x86/boot/trampoline.S
 +++ b/xen/arch/x86/boot/trampoline.S
 @@ -91,6 +91,11 @@ trampoline_protmode_entry:
  and %edi,%edx
  wrmsr
  1:
 +/* Set up PAT before enabling paging. */
 +mov $XEN_MSR_PAT & 0xffffffff, %eax
 +mov $XEN_MSR_PAT >> 32, %edx
 +mov $MSR_IA32_CR_PAT, %ecx
 +wrmsr
>>> Doesn't this also eliminate the need for cpu_init() doing this?
>>> If you agree with that one also dropped
>>> Reviewed-by: Jan Beulich 
>> That doesn't cover the BSP on either the legacy or EFI paths.
> The legacy path, afaict, uses it:
>
> .Lskip_realmode:
> /* EBX == 0 indicates we are the BP (Boot Processor). */
> xor %ebx,%ebx
>
> /* Jump to the common bootstrap entry point. */
> jmp trampoline_protmode_entry

Oh, of course.

> The xen.efi entry path really should have the change you make
> mirrored anyway.

Are you happy for it to go in efi_arch_post_exit_boot()?  We don't
disable paging, but that is the point where we switch from the EFI
pagetables to Xen's.

~Andrew



Re: [PATCH 5/6] x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly

2020-04-29 Thread Jan Beulich
On 29.04.2020 14:29, Hongyan Xia wrote:
> (Looks like other patches in this series have been merged. Replying to
> this one only.)

Please send as a proper patch, this one came through ...

> From: Wei Liu 
> Date: Tue, 5 Feb 2019 16:32:54 +
> Subject: [PATCH] x86/pv: map and unmap page tables in
> mark_pv_pt_pages_rdonly
> 
> Also, clean up the initialisation of plXe.
> 
> Signed-off-by: Wei Liu 
> Signed-off-by: Hongyan Xia 
> Reviewed-by: Julien Grall 
> ---
>  xen/arch/x86/pv/dom0_build.c | 32 +---
>  1 file changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/xen/arch/x86/pv/dom0_build.c
> b/xen/arch/x86/pv/dom0_build.c
> index abfbe5f436..3522eb0114 100644
> --- a/xen/arch/x86/pv/dom0_build.c
> +++ b/xen/arch/x86/pv/dom0_build.c
> @@ -49,18 +49,11 @@ static __init void mark_pv_pt_pages_rdonly(struct
> domain *d,
>  {
>  unsigned long count;
>  struct page_info *page;
> -l4_pgentry_t *pl4e;
> -l3_pgentry_t *pl3e;
> -l2_pgentry_t *pl2e;
> -l1_pgentry_t *pl1e;
> -
> -pl4e = l4start + l4_table_offset(vpt_start);
> -pl3e = l4e_to_l3e(*pl4e);
> -pl3e += l3_table_offset(vpt_start);
> -pl2e = l3e_to_l2e(*pl3e);
> -pl2e += l2_table_offset(vpt_start);
> -pl1e = l2e_to_l1e(*pl2e);
> -pl1e += l1_table_offset(vpt_start);
> +l4_pgentry_t *pl4e = l4start + l4_table_offset(vpt_start);
> +l3_pgentry_t *pl3e = map_l3t_from_l4e(*pl4e) +
> l3_table_offset(vpt_start);
> +l2_pgentry_t *pl2e = map_l2t_from_l3e(*pl3e) +
> l2_table_offset(vpt_start);
> +l1_pgentry_t *pl1e = map_l1t_from_l2e(*pl2e) +
> l1_table_offset(vpt_start);

... mangled anyway. I also think with the change made you need to
drop the R-b.

Jan



Re: [PATCH 2/3] x86/pv: Short-circuit is_pv_{32,64}bit_domain() in !CONFIG_PV32 builds

2020-04-29 Thread Andrew Cooper
On 29/04/2020 14:29, Jan Beulich wrote:
> On 29.04.2020 15:13, Andrew Cooper wrote:
>> On 20/04/2020 15:09, Jan Beulich wrote:
>>> On 17.04.2020 17:50, Andrew Cooper wrote:
 --- a/xen/arch/x86/pv/domain.c
 +++ b/xen/arch/x86/pv/domain.c
 @@ -215,7 +215,7 @@ int switch_compat(struct domain *d)
  return 0;
  
  d->arch.has_32bit_shinfo = 1;
 -d->arch.is_32bit_pv = 1;
 +d->arch.pv.is_32bit = 1;
  
  for_each_vcpu( d, v )
  {
 @@ -235,7 +235,7 @@ int switch_compat(struct domain *d)
  return 0;
  
   undo_and_fail:
 -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
  for_each_vcpu( d, v )
  {
  free_compat_arg_xlat(v);
 @@ -358,7 +358,7 @@ int pv_domain_initialise(struct domain *d)
  d->arch.ctxt_switch = &pv_csw;
  
  /* 64-bit PV guest by default. */
 -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
 +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
>>> Switch to true/false while you're touching these?
>> Yes, but I'm tempted to delete these lines in the final hunk.  Its
>> writing zeros into a zeroed structures.
> Oh, yes, agreed.

Can I take this as an ack then?

~Andrew



Re: [PATCH 2/3] x86/pv: Short-circuit is_pv_{32,64}bit_domain() in !CONFIG_PV32 builds

2020-04-29 Thread Jan Beulich
On 29.04.2020 15:13, Andrew Cooper wrote:
> On 20/04/2020 15:09, Jan Beulich wrote:
>> On 17.04.2020 17:50, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/pv/domain.c
>>> +++ b/xen/arch/x86/pv/domain.c
>>> @@ -215,7 +215,7 @@ int switch_compat(struct domain *d)
>>>  return 0;
>>>  
>>>  d->arch.has_32bit_shinfo = 1;
>>> -d->arch.is_32bit_pv = 1;
>>> +d->arch.pv.is_32bit = 1;
>>>  
>>>  for_each_vcpu( d, v )
>>>  {
>>> @@ -235,7 +235,7 @@ int switch_compat(struct domain *d)
>>>  return 0;
>>>  
>>>   undo_and_fail:
>>> -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>>> +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
>>>  for_each_vcpu( d, v )
>>>  {
>>>  free_compat_arg_xlat(v);
>>> @@ -358,7 +358,7 @@ int pv_domain_initialise(struct domain *d)
>>>  d->arch.ctxt_switch = &pv_csw;
>>>  
>>>  /* 64-bit PV guest by default. */
>>> -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>>> +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
>> Switch to true/false while you're touching these?
> 
> Yes, but I'm tempted to delete these lines in the final hunk.  Its
> writing zeros into a zeroed structures.

Oh, yes, agreed.

Jan



Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Igor Druzhinin
On 29/04/2020 14:22, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin 
>> Sent: 29 April 2020 14:17
>> To: p...@xen.org; 'Jürgen Groß' ; 'Julien Grall' 
>> ; 'Julien Grall'
>> 
>> Cc: 'xen-devel' ; 'Ian Jackson' 
>> ; 'Wei Liu'
>> ; andrew.coop...@citrix.com
>> Subject: Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in 
>> xenstored
>>
>> On 29/04/2020 13:56, Paul Durrant wrote:
 -Original Message-
 From: Igor Druzhinin 
 Sent: 29 April 2020 13:50
 To: Jürgen Groß ; Julien Grall ; Julien 
 Grall
 
 Cc: xen-devel ; Ian Jackson 
 ; Wei Liu
 ; andrew.coop...@citrix.com; Paul Durrant 
 Subject: Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page 
 in xenstored

 On 29/04/2020 13:29, Jürgen Groß wrote:
>
> Wei, Ian, can you please tell me where I'm wrong?
>
> A soft reset should restart the domain in a clean state. AFAIK libxl is
> handling that by doing kind of in-place save-restore, including calling
> xs_release_domain() and later xs_introduce_domain(). This should result
> in xenstored throwing away all state it has regarding the domain and
> then restarting with a new (internal) domain instance.
>
> Is XAPI doing soft reset differently? Why should there be a need for
> keeping xenstored data across a soft reset?

 Yes, XAPI is doing soft reset differently from libxl. You need to keep 
 xenstore
 data to at least keep backends working without the need to reinitialize 
 them
 before entering kdump kernel. We do the same thing while entering crash 
 kernel
 in Windows after BSOD (CC Paul as he recommended this approach).
>>>
>>> IIRC I recommended not involving Xen or the toolstack in entering the crash 
>>> kernel... they don't
>> need to know. Windows quite happily enters its crash kernel, rebuilds its 
>> Xen interfaces and re-
>> attaches to PV backends without any external intervention.
>>
>> In case of kdump toolstack certainly needs to know as it gets shutdown code 
>> 5 and
>> needs to restart the domain.
>>
> 
> The toolstack needs to restart the domain once the crash has completed, yes.

To clarify, what I meant is that once crash happened (before kdump kernel is 
loaded)
toolstack gets code 5 and then it's supposed to call soft reset and unpause the 
domain
to actually load into crash kernel.

I didn't mean that after crash kernel is finished the domain has to be 
restarted - that's
obvious.

Igor

 



Re: [PATCH] x86/S3: Drop {save,restore}_rest_processor_state() completely

2020-04-29 Thread Jan Beulich
On 29.04.2020 13:32, Andrew Cooper wrote:
> On 29/04/2020 12:16, Jan Beulich wrote:
>> On 29.04.2020 13:09, Andrew Cooper wrote:
>>> --- a/xen/arch/x86/boot/trampoline.S
>>> +++ b/xen/arch/x86/boot/trampoline.S
>>> @@ -91,6 +91,11 @@ trampoline_protmode_entry:
>>>  and %edi,%edx
>>>  wrmsr
>>>  1:
>>> +/* Set up PAT before enabling paging. */
>>> +mov $XEN_MSR_PAT & 0xffffffff, %eax
>>> +mov $XEN_MSR_PAT >> 32, %edx
>>> +mov $MSR_IA32_CR_PAT, %ecx
>>> +wrmsr
>> Doesn't this also eliminate the need for cpu_init() doing this?
>> If you agree with that one also dropped
>> Reviewed-by: Jan Beulich 
> 
> That doesn't cover the BSP on either the legacy or EFI paths.

The legacy path, afaict, uses it:

.Lskip_realmode:
/* EBX == 0 indicates we are the BP (Boot Processor). */
xor %ebx,%ebx

/* Jump to the common bootstrap entry point. */
jmp trampoline_protmode_entry

The xen.efi entry path really should have the change you make
mirrored anyway.

Jan



Re: [PATCH v2 4/4] x86: adjustments to guest handle treatment

2020-04-29 Thread Julien Grall

Hi,

On 22/04/2020 10:32, Jan Beulich wrote:

On 22.04.2020 10:17, Julien Grall wrote:

On 21/04/2020 10:13, Jan Beulich wrote:

First of all avoid excessive conversions. copy_{from,to}_guest(), for
example, work fine with all of XEN_GUEST_HANDLE{,_64,_PARAM}().

Further
- do_physdev_op_compat() didn't use the param form for its parameter,
- {hap,shadow}_track_dirty_vram() wrongly used the param form,
- compat processor Px logic failed to check compatibility of native and
    compat structures not further converted.

As this eliminates all users of guest_handle_from_param() and as there's
no real need to allow for conversions in both directions, drop the
macros as well.


I was kind of expecting both guest_handle_from_param() and
guest_handle_to_param() to be dropped together. May I ask why
you still need guest_handle_to_param()?


There are three (iirc) uses left which I don't really see how
to sensibly replace. Take a look at the different callers of
x86's vcpumask_to_pcpumask(), for example.


Oh, const_guest_handle_from_ptr() is returning a GUEST_HANDLE_PARAM. 
This is a bit odd but fair enough.





--- a/xen/include/xen/acpi.h
+++ b/xen/include/xen/acpi.h
@@ -184,8 +184,8 @@ static inline unsigned int acpi_get_csub
   static inline void acpi_set_csubstate_limit(unsigned int new_limit) { 
return; }
   #endif
   -#ifdef XEN_GUEST_HANDLE_PARAM
-int acpi_set_pdc_bits(u32 acpi_id, XEN_GUEST_HANDLE_PARAM(uint32));
+#ifdef XEN_GUEST_HANDLE
+int acpi_set_pdc_bits(u32 acpi_id, XEN_GUEST_HANDLE(uint32));
   #endif


Do we really need to keep the #ifdef here?


I think so, yes, or else the original one wouldn't have been
needed either. (Consider the header getting included without
any of the public headers having got included first.) Dropping
(if it was possible) this would be an orthogonal change imo.


Fair point.

Cheers,

--
Julien Grall



Re: [ANNOUNCE] Xen 4.14 Development Update

2020-04-29 Thread Jason Andryuk
On Fri, Apr 24, 2020 at 3:40 AM Paul Durrant  wrote:
> *  Linux stub domains (v4)
>   -  Marek Marczykowski-Górecki

In coordination with Marek, I've posted v5.

-Jason



RE: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xensotred

2020-04-29 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin 
> Sent: 29 April 2020 14:17
> To: p...@xen.org; 'Jürgen Groß' ; 'Julien Grall' 
> ; 'Julien Grall'
> 
> Cc: 'xen-devel' ; 'Ian Jackson' 
> ; 'Wei Liu'
> ; andrew.coop...@citrix.com
> Subject: Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in 
> xensotred
> 
> On 29/04/2020 13:56, Paul Durrant wrote:
> >> -Original Message-
> >> From: Igor Druzhinin 
> >> Sent: 29 April 2020 13:50
> >> To: Jürgen Groß ; Julien Grall ; Julien 
> >> Grall
> >> 
> >> Cc: xen-devel ; Ian Jackson 
> >> ; Wei Liu
> >> ; andrew.coop...@citrix.com; Paul Durrant 
> >> Subject: Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page 
> >> in xensotred
> >>
> >> On 29/04/2020 13:29, Jürgen Groß wrote:
> >>>
> >>> Wei, Ian, can you please tell me where I'm wrong?
> >>>
> >>> A soft reset should restart the domain in a clean state. AFAIK libxl is
> >>> handling that by doing kind of in-place save-restore, including calling
> >>> xs_release_domain() and later xs_introduce_domain(). This should result
> >>> in xenstored throwing away all state it has regarding the domain and
> >>> then restarting with a new (internal) domain instance.
> >>>
> >>> Is XAPI doing soft reset differently? Why should there be a need for
> >>> keeping xenstored data across a soft reset?
> >>
> >> Yes, XAPI is doing soft reset differently from libxl. You need to keep 
> >> xenstore
> >> data to at least keep backends working without the need to reinitialize 
> >> them
> >> before entering kdump kernel. We do the same thing while entering crash 
> >> kernel
> >> in Windows after BSOD (CC Paul as he recommended this approach).
> >
> > IIRC I recommended not involving Xen or the toolstack in entering the crash 
> > kernel... they don't
> need to know. Windows quite happily enters its crash kernel, rebuilds its Xen 
> interfaces and re-
> attaches to PV backends without any external intervention.
> 
> In case of kdump toolstack certainly needs to know as it gets shutdown code 5 
> and
> needs to restart the domain.
> 

The toolstack needs to restart the domain once the crash has completed, yes.

> And I'm not completely sure it's possible to enter kdump without calling soft 
> reset
> at all. Even if it's possible I'd be cautious to do it in production for the 
> future.
> 

If it is not possible at the moment then IMO it should be made so; having soft 
reset in the toolstack is a layering violation IMO.

  Paul





Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xensotred

2020-04-29 Thread Igor Druzhinin
On 29/04/2020 13:56, Paul Durrant wrote:
>> -Original Message-
>> From: Igor Druzhinin 
>> Sent: 29 April 2020 13:50
>> To: Jürgen Groß ; Julien Grall ; Julien 
>> Grall
>> 
>> Cc: xen-devel ; Ian Jackson 
>> ; Wei Liu
>> ; andrew.coop...@citrix.com; Paul Durrant 
>> Subject: Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in 
>> xensotred
>>
>> On 29/04/2020 13:29, Jürgen Groß wrote:
>>>
>>> Wei, Ian, can you please tell me where I'm wrong?
>>>
>>> A soft reset should restart the domain in a clean state. AFAIK libxl is
>>> handling that by doing kind of in-place save-restore, including calling
>>> xs_release_domain() and later xs_introduce_domain(). This should result
>>> in xenstored throwing away all state it has regarding the domain and
>>> then restarting with a new (internal) domain instance.
>>>
>>> Is XAPI doing soft reset differently? Why should there be a need for
>>> keeping xenstored data across a soft reset?
>>
>> Yes, XAPI is doing soft reset differently from libxl. You need to keep 
>> xenstore
>> data to at least keep backends working without the need to reinitialize them
>> before entering kdump kernel. We do the same thing while entering crash 
>> kernel
>> in Windows after BSOD (CC Paul as he recommended this approach).
> 
> IIRC I recommended not involving Xen or the toolstack in entering the crash 
> kernel... they don't need to know. Windows quite happily enters its crash 
> kernel, rebuilds its Xen interfaces and re-attaches to PV backends without 
> any external intervention.

In the case of kdump the toolstack certainly needs to know, as it gets
shutdown code 5 and needs to restart the domain.

And I'm not completely sure it's possible to enter kdump without calling
soft reset at all. Even if it's possible, I'd be cautious about doing it
in production in the future.

Igor



Re: [PATCH 2/3] x86/pv: Short-circuit is_pv_{32,64}bit_domain() in !CONFIG_PV32 builds

2020-04-29 Thread Andrew Cooper
On 20/04/2020 15:09, Jan Beulich wrote:
> On 17.04.2020 17:50, Andrew Cooper wrote:
>> --- a/xen/arch/x86/pv/domain.c
>> +++ b/xen/arch/x86/pv/domain.c
>> @@ -215,7 +215,7 @@ int switch_compat(struct domain *d)
>>  return 0;
>>  
>>  d->arch.has_32bit_shinfo = 1;
>> -d->arch.is_32bit_pv = 1;
>> +d->arch.pv.is_32bit = 1;
>>  
>>  for_each_vcpu( d, v )
>>  {
>> @@ -235,7 +235,7 @@ int switch_compat(struct domain *d)
>>  return 0;
>>  
>>   undo_and_fail:
>> -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>> +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
>>  for_each_vcpu( d, v )
>>  {
>>  free_compat_arg_xlat(v);
>> @@ -358,7 +358,7 @@ int pv_domain_initialise(struct domain *d)
>>  d->arch.ctxt_switch = &pv_csw;
>>  
>>  /* 64-bit PV guest by default. */
>> -d->arch.is_32bit_pv = d->arch.has_32bit_shinfo = 0;
>> +d->arch.pv.is_32bit = d->arch.has_32bit_shinfo = 0;
> Switch to true/false while you're touching these?

Yes, but I'm tempted to delete these lines in the final hunk.  It's
writing zeros into an already-zeroed structure.

~Andrew



[PATCH v2 1/3] x86/pv: Options to disable and/or compile out 32bit PV support

2020-04-29 Thread Andrew Cooper
This is the start of some performance and security-hardening improvements,
based on the fact that 32bit PV guests are few and far between these days.

Ring1 is full of architectural corner cases, such as counting as supervisor
from a paging point of view.  This accounts for a substantial performance hit
on processors from the last 8 years (adjusting SMEP/SMAP on every privilege
transition), and the gap is only going to get bigger with new hardware
features.

Signed-off-by: Andrew Cooper 
Reviewed-by: Wei Liu 
Reviewed-by: Roger Pau Monné 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monné 

v2:
 * Fix typos, __init
 * Introduce no_config_param() wrapper
---
 docs/misc/xen-command-line.pandoc | 12 +++-
 xen/arch/x86/Kconfig  | 16 
 xen/arch/x86/pv/domain.c  | 34 ++
 xen/arch/x86/setup.c  |  9 +++--
 xen/include/asm-x86/pv/domain.h   |  6 ++
 xen/include/xen/param.h   |  9 +
 6 files changed, 83 insertions(+), 3 deletions(-)

diff --git a/docs/misc/xen-command-line.pandoc b/docs/misc/xen-command-line.pandoc
index acd0b3d994..ee12b0f53f 100644
--- a/docs/misc/xen-command-line.pandoc
+++ b/docs/misc/xen-command-line.pandoc
@@ -1694,7 +1694,17 @@ The following resources are available:
 CDP, one COS will corespond two CBMs other than one with CAT, due to the
 sum of CBMs is fixed, that means actual `cos_max` in use will automatically
 reduce to half when CDP is enabled.
-   
+
+### pv
= List of [ 32=<bool> ]
+
+Applicability: x86
+
+Controls for aspects of PV guest support.
+
+*   The `32` boolean controls whether 32bit PV guests can be created.  It
+defaults to `true`, and is ignored when `CONFIG_PV32` is compiled out.
+
 ### pv-linear-pt (x86)
> `= <boolean>`
 
diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig
index a69be983d6..96432f1f69 100644
--- a/xen/arch/x86/Kconfig
+++ b/xen/arch/x86/Kconfig
@@ -49,6 +49,22 @@ config PV
 
  If unsure, say Y.
 
+config PV32
+   bool "Support for 32bit PV guests"
+   depends on PV
+   default y
+   ---help---
+ The 32bit PV ABI uses Ring1, an area of the x86 architecture which
+ was deprecated and mostly removed in the AMD64 spec.  As a result,
+ it occasionally conflicts with newer x86 hardware features, causing
+ overheads for Xen to maintain backwards compatibility.
+
+ People may wish to disable 32bit PV guests for attack surface
+ reduction, or performance reasons.  Backwards compatibility can be
+ provided via the PV Shim mechanism.
+
+ If unsure, say Y.
+
 config PV_LINEAR_PT
bool "Support for PV linear pagetables"
depends on PV
diff --git a/xen/arch/x86/pv/domain.c b/xen/arch/x86/pv/domain.c
index 43da5c179f..3579dc063e 100644
--- a/xen/arch/x86/pv/domain.c
+++ b/xen/arch/x86/pv/domain.c
@@ -16,6 +16,38 @@
 #include 
 #include 
 
+#ifdef CONFIG_PV32
+int8_t __read_mostly opt_pv32 = -1;
+#endif
+
+static __init int parse_pv(const char *s)
+{
+const char *ss;
+int val, rc = 0;
+
+do {
+ss = strchr(s, ',');
+if ( !ss )
+ss = strchr(s, '\0');
+
+if ( (val = parse_boolean("32", s, ss)) >= 0 )
+{
+#ifdef CONFIG_PV32
+opt_pv32 = val;
+#else
+no_config_param("PV32", "pv", s, ss);
+#endif
+}
+else
+rc = -EINVAL;
+
+s = ss + 1;
+} while ( *ss );
+
+return rc;
+}
+custom_param("pv", parse_pv);
+
 static __read_mostly enum {
 PCID_OFF,
 PCID_ALL,
@@ -174,6 +206,8 @@ int switch_compat(struct domain *d)
 
 BUILD_BUG_ON(offsetof(struct shared_info, vcpu_info) != 0);
 
+if ( !opt_pv32 )
+return -EOPNOTSUPP;
 if ( is_hvm_domain(d) || domain_tot_pages(d) != 0 )
 return -EACCES;
 if ( is_pv_32bit_domain(d) )
diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index eb56d78c2f..9e9576344c 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -53,6 +53,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* opt_nosmp: If true, secondary processors are ignored. */
 static bool __initdata opt_nosmp;
@@ -1870,8 +1871,12 @@ void arch_get_xen_caps(xen_capabilities_info_t *info)
 {
 snprintf(s, sizeof(s), "xen-%d.%d-x86_64 ", major, minor);
 safe_strcat(*info, s);
-snprintf(s, sizeof(s), "xen-%d.%d-x86_32p ", major, minor);
-safe_strcat(*info, s);
+
+if ( opt_pv32 )
+{
+snprintf(s, sizeof(s), "xen-%d.%d-x86_32p ", major, minor);
+safe_strcat(*info, s);
+}
 }
 if ( hvm_enabled )
 {
diff --git a/xen/include/asm-x86/pv/domain.h b/xen/include/asm-x86/pv/domain.h
index 7a69bfb303..df9716ff26 100644
--- a/xen/include/asm-x86/pv/domain.h
+++ b/xen/include/asm-x86/pv/domain.h
@@ -23,6 +23,12 @@
 
 #include 
 
+#ifdef CONFIG_PV32
+extern int8_t opt_pv32;
+#else
+# define opt_pv32 false
+#endif

Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xensotred

2020-04-29 Thread Jürgen Groß

On 29.04.20 14:49, Igor Druzhinin wrote:

On 29/04/2020 13:29, Jürgen Groß wrote:

On 29.04.20 13:04, Igor Druzhinin wrote:

On 29/04/2020 11:49, Jürgen Groß wrote:

On 29.04.20 12:39, Igor Druzhinin wrote:

On 29/04/2020 10:22, Julien Grall wrote:

Hi Juergen,

On 29/04/2020 06:51, Jürgen Groß wrote:


Recreating the event channel would be fine, but I don't see why it
would ever be needed. And XS_INTRODUCE is called only at domain creation
time today, so there is obviously no need for repeating this call.

Maybe the idea was to do this after sending a XS_RESUME command, which
isn't used anywhere in Xen and is another part of Xenstore which doesn't
make any sense today.


Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port XS_INTRODUCE 
evtchn rebind function from cxenstored" added the exact same behavior in the OCaml 
XenStored last year.

This was introduced 12 years after C XenStored, so surely someone thinks this is 
useful. We should check why this was introduced in OCaml XenStored (I have CCed 
the author of the patch).

If we still think this is not useful, then you should add an explanation in the 
commit message why the two implementations diverge and possibly update the spec.


Thanks for CC, Julien.

We indeed already use this functionality in our toolstack for guest kdump
functions. It's not possible in XAPI to adopt libxl model where almost 
everything
is restarted for a domain entering kdump, so we have to use this message to
rebind xenstore evtchn after soft reset without shutting down backends and
recreating the whole subtree (frontends reconnect fine after that).

We obviously only require it for now to be present in oxenstored only.
Please don't remove this functionality if possible.


If I read handling in libxl correctly, in the soft reset case XS_RELEASE
is issued before doing another XS_INTRODUCE. XS_RELEASE will result in
xenstored throwing away its related struct domain, so XS_INTRODUCE will
be possible again.


 From what I remember it was not possible to keep xenstored data for a domain
and at the same time perform a release-introduce cycle (at least in oxenstored).
It also involved firing @releaseDomain which caused havoc in other parts of
the toolstack.


Wei, Ian, can you please tell me where I'm wrong?

A soft reset should restart the domain in a clean state. AFAIK libxl is
handling that by doing kind of in-place save-restore, including calling
xs_release_domain() and later xs_introduce_domain(). This should result
in xenstored throwing away all state it has regarding the domain and
then restarting with a new (internal) domain instance.

Is XAPI doing soft reset differently? Why should there be a need for
keeping xenstored data across a soft reset?


Yes, XAPI is doing soft reset differently from libxl. You need to keep xenstore
data to at least keep backends working without the need to reinitialize them
before entering kdump kernel. We do the same thing while entering crash kernel
in Windows after BSOD (CC Paul as he recommended this approach).
There are other reasons to not reset xenstore data.

I considered XS_INTRODUCE functionality to be part of xenstored ABI and we built
a lot of infrastructure on top of that. So I don't think it could be easily 
removed now.


Nobody wants to remove XS_INTRODUCE. It was just questioned there is a
need to call XS_INTRODUCE multiple times for a domain without a call of
XS_RELEASE in between.

In case there is such a need this means we either should just drop the
test for the mfn to match and recreate the event-channel (which will do
no harm IMO), or we need to keep the mfn in the live-update state record
with the knowledge that it is far from being a good indicator for the
test whether a domain is known already or not (two HVM domains using a
similar configuration with the very same kernel will use the same gfn
probably).

As only dom0 is allowed to use XS_INTRODUCE and I'm not aware of any
problems with neither XS_INTRODUCE failing due to illegal calls (no mfn
match), nor with potentially recreating the event channel for a "wrong"
domain, I suggest to just allow XS_INTRODUCE to be called as often as
the toolstack wants to.

In the end that was the primary reason to write this patch: to remove
the need to have the mfn in the live-update state record.

Thoughts?


Juergen



RE: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xensotred

2020-04-29 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin 
> Sent: 29 April 2020 13:50
> To: Jürgen Groß ; Julien Grall ; Julien Grall
> 
> Cc: xen-devel ; Ian Jackson 
> ; Wei Liu
> ; andrew.coop...@citrix.com; Paul Durrant 
> Subject: Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in 
> xensotred
> 
> On 29/04/2020 13:29, Jürgen Groß wrote:
> > On 29.04.20 13:04, Igor Druzhinin wrote:
> >> On 29/04/2020 11:49, Jürgen Groß wrote:
> >>> On 29.04.20 12:39, Igor Druzhinin wrote:
>  On 29/04/2020 10:22, Julien Grall wrote:
> > Hi Juergen,
> >
> > On 29/04/2020 06:51, Jürgen Groß wrote:
> >>
> >> Recreating the event channel would be fine, but I don't see why it
> >> would ever be needed. And XS_INTRODUCE is called only at domain 
> >> creation
> >> time today, so there is obviously no need for repeating this call.
> >>
> >> Maybe the idea was to do this after sending a XS_RESUME command, which
> >> isn't used anywhere in Xen and is another part of Xenstore which 
> >> doesn't
> >> make any sense today.
> >
> > Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port 
> > XS_INTRODUCE evtchn
> rebind function from cxenstored" added the exact same behavior in the OCaml 
> XenStored last year.
> >
> > This was introduced 12 years after C XenStored, so surely someone thinks 
> > this is useful. We
> should check why this was introduced in OCaml XenStored (I have CCed the 
> author of the patch).
> >
> > If we still think this is not useful, then you should add an 
> > explanation in the commit message
> why the two implementations diverge and possibly update the spec.
> 
>  Thanks for CC, Julien.
> 
>  We indeed already use this functionality in our toolstack for guest kdump
>  functions. It's not possible in XAPI to adopt libxl model where almost 
>  everything
>  is restarted for a domain entering kdump, so we have to use this message 
>  to
>  rebind xenstore evtchn after soft reset without shutting down backends 
>  and
>  recreating the whole subtree (frontends reconnect fine after that).
> 
>  We obviously only require it for now to be present in oxenstored only.
>  Please don't remove this functionality if possible.
> >>>
> >>> If I read handling in libxl correctly, in the soft reset case XS_RELEASE
> >>> is issued before doing another XS_INTRODUCE. XS_RELEASE will result in
> >>> xenstored throwing away its related struct domain, so XS_INTRODUCE will
> >>> be possible again.
> >>
> >>  From what I remember it was not possible to keep xenstored data for a 
> >> domain
> >> and at the same time perform release-introduce cycle (at least in 
> >> oxenstored).
> >> It also involved firing @releaseDomain which caused havoc in other part of
> >> the toolstack.
> >
> > Wei, Ian, can you please tell me where I'm wrong?
> >
> > A soft reset should restart the domain in a clean state. AFAIK libxl is
> > handling that by doing kind of in-place save-restore, including calling
> > xs_release_domain() and later xs_introduce_domain(). This should result
> > in xenstored throwing away all state it has regarding the domain and
> > then restarting with a new (internal) domain instance.
> >
> > Is XAPI doing soft reset differently? Why should there be a need for
> > keeping xenstored data across a soft reset?
> 
> Yes, XAPI is doing soft reset differently from libxl. You need to keep 
> xenstore
> data to at least keep backends working without the need to reinitialize them
> before entering kdump kernel. We do the same thing while entering crash kernel
> in Windows after BSOD (CC Paul as he recommended this approach).

IIRC I recommended not involving Xen or the toolstack in entering the crash 
kernel... they don't need to know. Windows quite happily enters its crash 
kernel, rebuilds its Xen interfaces and re-attaches to PV backends without any 
external intervention.

  Paul

> There are other reasons to not reset xenstore data.
> 
> I considered XS_INTRODUCE functionality to be part of xenstored ABI and we 
> built
> a lot of infrastructure on top of that. So I don't think it could be easily 
> removed now.
> 
> Igor




Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xensotred

2020-04-29 Thread Igor Druzhinin
On 29/04/2020 13:29, Jürgen Groß wrote:
> On 29.04.20 13:04, Igor Druzhinin wrote:
>> On 29/04/2020 11:49, Jürgen Groß wrote:
>>> On 29.04.20 12:39, Igor Druzhinin wrote:
 On 29/04/2020 10:22, Julien Grall wrote:
> Hi Juergen,
>
> On 29/04/2020 06:51, Jürgen Groß wrote:
>>
>> Recreating the event channel would be fine, but I don't see why it
>> would ever be needed. And XS_INTRODUCE is called only at domain creation
>> time today, so there is obviously no need for repeating this call.
>>
>> Maybe the idea was to do this after sending a XS_RESUME command, which
>> isn't used anywhere in Xen and is another part of Xenstore which doesn't
>> make any sense today.
>
> Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port 
> XS_INTRODUCE evtchn rebind function from cxenstored" added the exact same 
> behavior in the OCaml XenStored last year.
>
> This was introduced 12 years after C XenStored, so surely someone thinks 
> this is useful. We should check why this was introduced in OCaml 
> XenStored (I have CCed the author of the patch).
>
> If we still think this is not useful, then you should add an explanation 
> in the commit message why the two implementations diverge and possibly 
> update the spec.

 Thanks for CC, Julien.

 We indeed already use this functionality in our toolstack for guest kdump
 functions. It's not possible in XAPI to adopt libxl model where almost 
 everything
 is restarted for a domain entering kdump, so we have to use this message to
 rebind xenstore evtchn after soft reset without shutting down backends and
 recreating the whole subtree (frontends reconnect fine after that).

 We obviously only require it for now to be present in oxenstored only.
 Please don't remove this functionality if possible.
>>>
>>> If I read handling in libxl correctly, in the soft reset case XS_RELEASE
>>> is issued before doing another XS_INTRODUCE. XS_RELEASE will result in
>>> xenstored throwing away its related struct domain, so XS_INTRODUCE will
>>> be possible again.
>>
>>  From what I remember it was not possible to keep xenstored data for a domain
>> and at the same time perform release-introduce cycle (at least in 
>> oxenstored).
>> It also involved firing @releaseDomain which caused havoc in other part of
>> the toolstack.
> 
> Wei, Ian, can you please tell me where I'm wrong?
> 
> A soft reset should restart the domain in a clean state. AFAIK libxl is
> handling that by doing kind of in-place save-restore, including calling
> xs_release_domain() and later xs_introduce_domain(). This should result
> in xenstored throwing away all state it has regarding the domain and
> then restarting with a new (internal) domain instance.
> 
> Is XAPI doing soft reset differently? Why should there be a need for
> keeping xenstored data across a soft reset?

Yes, XAPI is doing soft reset differently from libxl. You need to keep xenstore
data to at least keep backends working without the need to reinitialize them 
before entering kdump kernel. We do the same thing while entering crash kernel
in Windows after BSOD (CC Paul as he recommended this approach).
There are other reasons to not reset xenstore data.

I considered XS_INTRODUCE functionality to be part of xenstored ABI and we built
a lot of infrastructure on top of that. So I don't think it could be easily 
removed now.

Igor



Re: [PATCH] mini-os: Avoid segfaults in tc{g,s}etattr

2020-04-29 Thread Andrew Cooper
On 28/04/2020 12:55, Andrew Cooper wrote:
>> Below is what I was preparing to submit as a patch.  So, yes it hacks around
>> it, but it isn't messy.
>>
>> ---
>> Disable fcf-protection to build working binaries
>>
>> Ubuntu gcc-9 enables -fcf-protection by default, which conflicts with
>> -mindirect-branch=extern and prevents building the hypervisor with
>> CONFIG_INDIRECT_THUNK:
>> xmalloc.h:81:1: error: ‘-mindirect-branch’ and ‘-fcf-protection’ are not
>> compatible
>>
>> Stefan Bader also noticed that build32.mk requires -fcf-protection=none
>> or else the hypervisor will not boot.
>> https://bugs.launchpad.net/ubuntu/+source/gcc-9/+bug/1863260  Similarly,
>> rombios reboots almost immediately without -fcf-protection=none.  Both
>> of those can be handled by setting it in EMBEDDED_EXTRA_CFLAGS.
>>
>> CC: Stefan Bader 
>> Signed-off-by: Jason Andryuk 
> Sadly, this isn't really appropriate.  We specifically do want to use
> both -fcf-protection and -mindirect-branch=thunk-extern together, when
> GCC isn't broken.
>
> Overriding -fcf-protection is ok but only when we're certain we've got a
> buggy GCC, so that when this bug is fixed, we can return to sensible
> behaviour.

GCC has been adjusted on master
(https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9be3bb2c0a258fd6a7d3d05d232a21930c757d3c)
and the gcc-9 branch
(https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=a03efb266fcbf4a01285fff871a5bfe5caac4944).
 
This should be fixed for GCC 10 and 9.4

I checked the resulting hypervisor build with both -fcf-protection and
retpolines, and it works fine.

The question now is what to do about all the buggy GCCs out there.  We can
either ignore the problem and it will eventually go away, or spot the
problematic compiler and clobber -fcf-protection.

We also need to see what is wrong with RomBIOS, because that is weird. 
However, we should not be interfering with the HOSTCC settings.

~Andrew



Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xensotred

2020-04-29 Thread Jürgen Groß

On 29.04.20 13:04, Igor Druzhinin wrote:

On 29/04/2020 11:49, Jürgen Groß wrote:

On 29.04.20 12:39, Igor Druzhinin wrote:

On 29/04/2020 10:22, Julien Grall wrote:

Hi Juergen,

On 29/04/2020 06:51, Jürgen Groß wrote:


Recreating the event channel would be fine, but I don't see why it
would ever be needed. And XS_INTRODUCE is called only at domain creation
time today, so there is obviously no need for repeating this call.

Maybe the idea was to do this after sending a XS_RESUME command, which
isn't used anywhere in Xen and is another part of Xenstore which doesn't
make any sense today.


Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port XS_INTRODUCE 
evtchn rebind function from cxenstored" added the exact same behavior in the OCaml 
XenStored last year.

This was introduced 12 years after C XenStored, so surely someone thinks this is 
useful. We should check why this was introduced in OCaml XenStored (I have CCed 
the author of the patch).

If we still think this is not useful, then you should add an explanation in the 
commit message why the two implementations diverge and possibly update the spec.


Thanks for CC, Julien.

We indeed already use this functionality in our toolstack for guest kdump
functions. It's not possible in XAPI to adopt libxl model where almost 
everything
is restarted for a domain entering kdump, so we have to use this message to
rebind xenstore evtchn after soft reset without shutting down backends and
recreating the whole subtree (frontends reconnect fine after that).

We obviously only require it for now to be present in oxenstored only.
Please don't remove this functionality if possible.


If I read handling in libxl correctly, in the soft reset case XS_RELEASE
is issued before doing another XS_INTRODUCE. XS_RELEASE will result in
xenstored throwing away its related struct domain, so XS_INTRODUCE will
be possible again.


 From what I remember it was not possible to keep xenstored data for a domain
and at the same time perform release-introduce cycle (at least in oxenstored).
It also involved firing @releaseDomain which caused havoc in other part of
the toolstack.


Wei, Ian, can you please tell me where I'm wrong?

A soft reset should restart the domain in a clean state. AFAIK libxl is
handling that by doing kind of in-place save-restore, including calling
xs_release_domain() and later xs_introduce_domain(). This should result
in xenstored throwing away all state it has regarding the domain and
then restarting with a new (internal) domain instance.

Is XAPI doing soft reset differently? Why should there be a need for
keeping xenstored data across a soft reset?


Juergen



Re: [PATCH 5/6] x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly

2020-04-29 Thread Hongyan Xia
(Looks like other patches in this series have been merged. Replying to
this one only.)

From: Wei Liu 
Date: Tue, 5 Feb 2019 16:32:54 +
Subject: [PATCH] x86/pv: map and unmap page tables in
mark_pv_pt_pages_rdonly

Also, clean up the initialisation of plXe.

Signed-off-by: Wei Liu 
Signed-off-by: Hongyan Xia 
Reviewed-by: Julien Grall 
---
 xen/arch/x86/pv/dom0_build.c | 32 +---
 1 file changed, 17 insertions(+), 15 deletions(-)

diff --git a/xen/arch/x86/pv/dom0_build.c b/xen/arch/x86/pv/dom0_build.c
index abfbe5f436..3522eb0114 100644
--- a/xen/arch/x86/pv/dom0_build.c
+++ b/xen/arch/x86/pv/dom0_build.c
@@ -49,18 +49,11 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
 {
 unsigned long count;
 struct page_info *page;
-l4_pgentry_t *pl4e;
-l3_pgentry_t *pl3e;
-l2_pgentry_t *pl2e;
-l1_pgentry_t *pl1e;
-
-pl4e = l4start + l4_table_offset(vpt_start);
-pl3e = l4e_to_l3e(*pl4e);
-pl3e += l3_table_offset(vpt_start);
-pl2e = l3e_to_l2e(*pl3e);
-pl2e += l2_table_offset(vpt_start);
-pl1e = l2e_to_l1e(*pl2e);
-pl1e += l1_table_offset(vpt_start);
+l4_pgentry_t *pl4e = l4start + l4_table_offset(vpt_start);
+l3_pgentry_t *pl3e = map_l3t_from_l4e(*pl4e) + l3_table_offset(vpt_start);
+l2_pgentry_t *pl2e = map_l2t_from_l3e(*pl3e) + l2_table_offset(vpt_start);
+l1_pgentry_t *pl1e = map_l1t_from_l2e(*pl2e) + l1_table_offset(vpt_start);
+
 for ( count = 0; count < nr_pt_pages; count++ )
 {
 l1e_remove_flags(*pl1e, _PAGE_RW);
@@ -85,12 +78,21 @@ static __init void mark_pv_pt_pages_rdonly(struct domain *d,
 if ( !((unsigned long)++pl2e & (PAGE_SIZE - 1)) )
 {
 if ( !((unsigned long)++pl3e & (PAGE_SIZE - 1)) )
-pl3e = l4e_to_l3e(*++pl4e);
-pl2e = l3e_to_l2e(*pl3e);
+{
+/* Need to unmap the page before the increment. */
+unmap_domain_page(pl3e - 1);
+pl3e = map_l3t_from_l4e(*++pl4e);
+}
+unmap_domain_page(pl2e - 1);
+pl2e = map_l2t_from_l3e(*pl3e);
 }
-pl1e = l2e_to_l1e(*pl2e);
+unmap_domain_page(pl1e - 1);
+pl1e = map_l1t_from_l2e(*pl2e);
 }
 }
+unmap_domain_page(pl1e);
+unmap_domain_page(pl2e);
+unmap_domain_page(pl3e);
 }
 
static __init void setup_pv_physmap(struct domain *d, unsigned long pgtbl_pfn,
-- 
2.24.1.AMZN




Re: [PATCH] x86/hyperv: stash and use the configured max VP index

2020-04-29 Thread Wei Liu
On Wed, Apr 29, 2020 at 11:41:44AM +0100, Wei Liu wrote:
> The value returned from CPUID is the maximum number for virtual
> processors supported by Hyper-V. It could be larger than the maximum
> number of virtual processors configured.
> 
> Stash the configured number into a variable and use it in calculations.
> 
> Signed-off-by: Wei Liu 
> ---
>  xen/arch/x86/guest/hyperv/hyperv.c  | 4 ++++
>  xen/arch/x86/guest/hyperv/private.h | 1 +
>  xen/arch/x86/guest/hyperv/tlb.c     | 2 +-
>  xen/arch/x86/guest/hyperv/util.c    | 2 +-
>  4 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
> index 91a6782cd986..84221b751453 100644
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -33,6 +33,7 @@ DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
>  DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
>  DEFINE_PER_CPU_READ_MOSTLY(unsigned int, hv_vp_index);
>  
> +unsigned int __read_mostly hv_max_vp_index;
>  static bool __read_mostly hcall_page_ready;
>  
>  static uint64_t generate_guest_id(void)
> @@ -143,6 +144,9 @@ static int setup_hypercall_pcpu_arg(void)
>  rdmsrl(HV_X64_MSR_VP_INDEX, vp_index_msr);
>  this_cpu(hv_vp_index) = vp_index_msr;
>  
> +if ( vp_index_msr > hv_max_vp_index )
> +hv_max_vp_index = vp_index_msr;
> +
>  return 0;
>  }
>  
> diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
> index 354fc7f685a7..fea3e291e944 100644
> --- a/xen/arch/x86/guest/hyperv/private.h
> +++ b/xen/arch/x86/guest/hyperv/private.h
> @@ -28,6 +28,7 @@
>  DECLARE_PER_CPU(void *, hv_input_page);
>  DECLARE_PER_CPU(void *, hv_vp_assist);
>  DECLARE_PER_CPU(unsigned int, hv_vp_index);
> +extern unsigned int hv_max_vp_index;
>  
>  static inline unsigned int hv_vp_index(unsigned int cpu)
>  {
> diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
> index 1d723d6ee679..0a44071481bd 100644
> --- a/xen/arch/x86/guest/hyperv/tlb.c
> +++ b/xen/arch/x86/guest/hyperv/tlb.c
> @@ -166,7 +166,7 @@ int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
>  {
>  unsigned int vpid = hv_vp_index(cpu);
>  
> -if ( vpid >= ms_hyperv.max_vp_index )
> +if ( vpid >= hv_max_vp_index )

I think the >= should be changed to > here.

Wei.

>  {
>  local_irq_restore(irq_flags);
>  return -ENXIO;
> diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
> index bec61c2afd87..2c5f421b7bd9 100644
> --- a/xen/arch/x86/guest/hyperv/util.c
> +++ b/xen/arch/x86/guest/hyperv/util.c
> @@ -33,7 +33,7 @@ int cpumask_to_vpset(struct hv_vpset *vpset,
>  {
>  int nr = 1;
>  unsigned int cpu, vcpu_bank, vcpu_offset;
> -unsigned int max_banks = ms_hyperv.max_vp_index / 64;
> +unsigned int max_banks = hv_max_vp_index / 64;
>  
>  /* Up to 64 banks can be represented by valid_bank_mask */
>  if ( max_banks > 64 )
> -- 
> 2.20.1
> 
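The `>=` vs. `>` point is an index/count distinction: `hv_max_vp_index` holds the highest VP index seen (an index, not a count), so valid indices run from 0 to the maximum inclusive. A small sketch of the corrected check (stand-alone illustration, not the actual Xen code):

```c
#include <assert.h>

/*
 * max_vp_index is the highest valid VP index, so indices
 * 0 .. max_vp_index inclusive are in range.  Using '>=' here
 * would wrongly reject the last virtual processor.
 */
static int vpid_out_of_range(unsigned int vpid, unsigned int max_vp_index)
{
    return vpid > max_vp_index;
}
```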



[linux-linus test] 149868: tolerable FAIL - PUSHED

2020-04-29 Thread osstest service owner
flight 149868 linux-linus real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149868/

Failures :-/ but no regressions.

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-xl-rtds 17 guest-saverestore.2  fail REGR. vs. 149854

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-examine  4 memdisk-try-append   fail  like 149840
 test-amd64-amd64-xl-qemut-win7-amd64 17 guest-stopfail like 149854
 test-armhf-armhf-libvirt 14 saverestore-support-checkfail  like 149854
 test-amd64-amd64-xl-qemuu-win7-amd64 17 guest-stopfail like 149854
 test-amd64-amd64-xl-qemut-ws16-amd64 17 guest-stopfail like 149854
 test-amd64-i386-xl-qemuu-win7-amd64 17 guest-stop fail like 149854
 test-armhf-armhf-libvirt-raw 13 saverestore-support-checkfail  like 149854
 test-armhf-armhf-xl-rtds 16 guest-start/debian.repeatfail  like 149854
 test-amd64-i386-xl-qemut-win7-amd64 17 guest-stop fail like 149854
 test-amd64-amd64-xl-qemuu-ws16-amd64 17 guest-stopfail like 149854
 test-amd64-i386-xl-pvshim12 guest-start  fail   never pass
 test-arm64-arm64-xl-seattle  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-seattle  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  13 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  13 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 11 migrate-support-check fail never pass
 test-amd64-amd64-qemuu-nested-amd 17 debian-hvm-install/l1/l2  fail never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 13 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 14 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  14 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-multivcpu 13 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 14 saverestore-support-checkfail  never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  14 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemuu-ws16-amd64 17 guest-stop  fail never pass
 test-armhf-armhf-libvirt-raw 12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-cubietruck 13 migrate-support-checkfail never pass
 test-armhf-armhf-xl-cubietruck 14 saverestore-support-checkfail never pass
 test-armhf-armhf-xl-vhd  12 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  13 saverestore-support-checkfail   never pass
 test-amd64-i386-xl-qemut-ws16-amd64 17 guest-stop  fail never pass

version targeted for testing:
 linux                96c9a7802af7d500a582d89a8b864584fe878c1b
baseline version:
 linux                51184ae37e0518fd90cb437a2fbc953ae558cd0d

Last test of basis   149854  2020-04-28 10:51:07 Z1 days
Testing same since   149868  2020-04-29 00:40:01 Z0 days1 attempts


People who touched revisions under test

Re: [PATCH] x86/S3: Drop {save,restore}_rest_processor_state() completely

2020-04-29 Thread Andrew Cooper
On 29/04/2020 12:16, Jan Beulich wrote:
> On 29.04.2020 13:09, Andrew Cooper wrote:
>> --- a/xen/arch/x86/boot/trampoline.S
>> +++ b/xen/arch/x86/boot/trampoline.S
>> @@ -91,6 +91,11 @@ trampoline_protmode_entry:
>>  and %edi,%edx
>>  wrmsr
>>  1:
>> +/* Set up PAT before enabling paging. */
>> +mov $XEN_MSR_PAT & 0xffffffff, %eax
>> +mov $XEN_MSR_PAT >> 32, %edx
>> +mov $MSR_IA32_CR_PAT, %ecx
>> +wrmsr
> Doesn't this also eliminate the need for cpu_init() doing this?
> If you agree with that one also dropped
> Reviewed-by: Jan Beulich 

That doesn't cover the BSP on either the legacy or EFI paths.

~Andrew



Re: [PATCH] x86/S3: Drop {save,restore}_rest_processor_state() completely

2020-04-29 Thread Jan Beulich
On 29.04.2020 13:09, Andrew Cooper wrote:
> --- a/xen/arch/x86/boot/trampoline.S
> +++ b/xen/arch/x86/boot/trampoline.S
> @@ -91,6 +91,11 @@ trampoline_protmode_entry:
>  and %edi,%edx
>  wrmsr
>  1:
> +/* Set up PAT before enabling paging. */
> +mov $XEN_MSR_PAT & 0xffffffff, %eax
> +mov $XEN_MSR_PAT >> 32, %edx
> +mov $MSR_IA32_CR_PAT, %ecx
> +wrmsr

Doesn't this also eliminate the need for cpu_init() doing this?
If you agree with that one also dropped
Reviewed-by: Jan Beulich 

Jan



[PATCH] x86/S3: Drop {save,restore}_rest_processor_state() completely

2020-04-29 Thread Andrew Cooper
There is no need to save/restore FS/GS/XCR0 state.  It will be handled
suitably on the context switch away from the idle.

The CR4 restoration in restore_rest_processor_state() was actually fighting
later code in enter_state() which tried to keep CR4.MCE clear until everything
was set up.  Delete the intermediate restoration, and defer final restoration
until after MCE is reconfigured.

Changing PAT should be done before paging is enabled.  Move it into the
trampoline during setup for 64bit.

The only remaining piece of restoration is load_system_tables(), so suspend.c
can be deleted in its entirety.

Signed-off-by: Andrew Cooper 
---
CC: Jan Beulich 
CC: Wei Liu 
CC: Roger Pau Monne 
---
 xen/arch/x86/acpi/Makefile  |  2 +-
 xen/arch/x86/acpi/power.c   |  9 
 xen/arch/x86/acpi/suspend.c | 49 -
 xen/arch/x86/acpi/wakeup_prot.S |  4 +---
 xen/arch/x86/boot/trampoline.S  |  5 +
 5 files changed, 11 insertions(+), 58 deletions(-)
 delete mode 100644 xen/arch/x86/acpi/suspend.c

diff --git a/xen/arch/x86/acpi/Makefile b/xen/arch/x86/acpi/Makefile
index 1b9e625713..041377e2bb 100644
--- a/xen/arch/x86/acpi/Makefile
+++ b/xen/arch/x86/acpi/Makefile
@@ -1,4 +1,4 @@
 obj-y += cpufreq/
 
-obj-y += lib.o power.o suspend.o cpu_idle.o cpuidle_menu.o
+obj-y += lib.o power.o cpu_idle.o cpuidle_menu.o
 obj-bin-y += boot.init.o wakeup_prot.o
diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c
index 6dfd4c7891..0cda362045 100644
--- a/xen/arch/x86/acpi/power.c
+++ b/xen/arch/x86/acpi/power.c
@@ -195,7 +195,6 @@ static int enter_state(u32 state)
 unsigned long flags;
 int error;
 struct cpu_info *ci;
-unsigned long cr4;
 
 if ( (state <= ACPI_STATE_S0) || (state > ACPI_S_STATES_MAX) )
 return -EINVAL;
@@ -270,15 +269,15 @@ static int enter_state(u32 state)
 
 system_state = SYS_STATE_resume;
 
-/* Restore CR4 and EFER from cached values. */
-cr4 = read_cr4();
-write_cr4(cr4 & ~X86_CR4_MCE);
+/* Restore EFER from cached value. */
 write_efer(read_efer());
 
 device_power_up(SAVED_ALL);
 
 mcheck_init(&boot_cpu_data, false);
-write_cr4(cr4);
+
+/* Restore CR4 from cached value, now MCE is set up. */
+write_cr4(read_cr4());
 
 printk(XENLOG_INFO "Finishing wakeup from ACPI S%d state.\n", state);
 
diff --git a/xen/arch/x86/acpi/suspend.c b/xen/arch/x86/acpi/suspend.c
deleted file mode 100644
index 3c1a3cbb34..0000000000
--- a/xen/arch/x86/acpi/suspend.c
+++ /dev/null
@@ -1,49 +0,0 @@
-/*
- * Portions are:
- *  Copyright (c) 2002 Pavel Machek 
- *  Copyright (c) 2001 Patrick Mochel 
- */
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-static unsigned long saved_fs_base, saved_gs_base, saved_kernel_gs_base;
-static uint64_t saved_xcr0;
-
-void save_rest_processor_state(void)
-{
-saved_fs_base = rdfsbase();
-saved_gs_base = rdgsbase();
-rdmsrl(MSR_SHADOW_GS_BASE, saved_kernel_gs_base);
-
-if ( cpu_has_xsave )
-saved_xcr0 = get_xcr0();
-}
-
-
-void restore_rest_processor_state(void)
-{
-load_system_tables();
-
-/* Restore full CR4 (inc MCE) now that the IDT is in place. */
-write_cr4(mmu_cr4_features);
-
-wrfsbase(saved_fs_base);
-wrgsbase(saved_gs_base);
-wrmsrl(MSR_SHADOW_GS_BASE, saved_kernel_gs_base);
-
-if ( cpu_has_xsave && !set_xcr0(saved_xcr0) )
-BUG();
-
-wrmsrl(MSR_IA32_CR_PAT, XEN_MSR_PAT);
-
-mtrr_bp_restore();
-}
diff --git a/xen/arch/x86/acpi/wakeup_prot.S b/xen/arch/x86/acpi/wakeup_prot.S
index 0ce96e26a9..4dba6020a7 100644
--- a/xen/arch/x86/acpi/wakeup_prot.S
+++ b/xen/arch/x86/acpi/wakeup_prot.S
@@ -15,8 +15,6 @@ ENTRY(do_suspend_lowlevel)
 mov %cr0, %rax
 mov %rax, saved_cr0(%rip)
 
-call    save_rest_processor_state
-
 /* enter sleep state physically */
 mov $3, %edi
call    acpi_enter_sleep_state
@@ -51,7 +49,7 @@ ENTRY(s3_resume)
 lretq
 1:
 
-call restore_rest_processor_state
+call    load_system_tables
 
 .Lsuspend_err:
 pop %r15
diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 18c6638924..662e6bdd3c 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -91,6 +91,11 @@ trampoline_protmode_entry:
 and %edi,%edx
 wrmsr
 1:
+/* Set up PAT before enabling paging. */
+mov $XEN_MSR_PAT & 0xffffffff, %eax
+mov $XEN_MSR_PAT >> 32, %edx
+mov $MSR_IA32_CR_PAT, %ecx
+wrmsr
 
 /* Set up EFER (Extended Feature Enable Register). */
movl    $MSR_EFER,%ecx
-- 
2.11.0
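The pair of `mov` instructions added to the trampoline implements the standard WRMSR convention: a 64-bit MSR value is passed split across EDX:EAX. A sketch of that split in C (hypothetical helper, shown only to illustrate the low/high halves the assembly computes from XEN_MSR_PAT):

```c
#include <assert.h>
#include <stdint.h>

/*
 * WRMSR takes the 64-bit value split across EDX:EAX, which is what the
 * "$XEN_MSR_PAT & 0xffffffff" / "$XEN_MSR_PAT >> 32" operands compute.
 */
static void msr_split(uint64_t val, uint32_t *eax, uint32_t *edx)
{
    *eax = (uint32_t)(val & 0xffffffffu);  /* low half  -> %eax */
    *edx = (uint32_t)(val >> 32);          /* high half -> %edx */
}
```

Doing the masking and shifting with assemble-time constants is also what sidesteps the "junk after expression" complaint from older binutils mentioned in the follow-up fix.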




Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-29 Thread Julien Grall

Hi Paul,

On 28/04/2020 16:35, Paul Durrant wrote:

-Original Message-
From: Julien Grall 
Sent: 20 April 2020 18:21
To: Paul Durrant ; xen-devel@lists.xenproject.org
Cc: Paul Durrant ; Andrew Cooper 
; George Dunlap
; Ian Jackson ; Jan Beulich 
;
Stefano Stabellini ; Wei Liu ; Volodymyr 
Babchuk
; Roger Pau Monné 
Subject: Re: [PATCH v2 1/5] xen/common: introduce a new framework for 
save/restore of 'domain' context

Hi Paul,

On 07/04/2020 18:38, Paul Durrant wrote:

To allow enlightened HVM guests (i.e. those that have PV drivers) to be
migrated without their co-operation it will be necessary to transfer 'PV'
state such as event channel state, grant entry state, etc.

Currently there is a framework (entered via the hvm_save/load() functions)
that allows a domain's 'HVM' (architectural) state to be transferred but
'PV' state is also common with pure PV guests and so this framework is not
really suitable.

This patch adds the new public header and low level implementation of a new
common framework, entered via the domain_save/load() functions. Subsequent
patches will introduce other parts of the framework, and code that will
make use of it within the current version of the libxc migration stream.

This patch also marks the HVM-only framework as deprecated in favour of the
new framework.

Signed-off-by: Paul Durrant 
---
Cc: Andrew Cooper 
Cc: George Dunlap 
Cc: Ian Jackson 
Cc: Jan Beulich 
Cc: Julien Grall 
Cc: Stefano Stabellini 
Cc: Wei Liu 
Cc: Volodymyr Babchuk 
Cc: "Roger Pau Monné" 

v2:
   - Allow multi-stage save/load to avoid the need to double-buffer
   - Get rid of the masks and add an 'ignore' flag instead
   - Create copy function union to preserve const save buffer
   - Deprecate HVM-only framework
---
   xen/common/Makefile|   1 +
   xen/common/save.c  | 329 +
   xen/include/public/arch-arm/hvm/save.h |   5 +
   xen/include/public/arch-x86/hvm/save.h |   5 +
   xen/include/public/save.h  |  84 +++
   xen/include/xen/save.h | 152 
   6 files changed, 576 insertions(+)
   create mode 100644 xen/common/save.c
   create mode 100644 xen/include/public/save.h
   create mode 100644 xen/include/xen/save.h

diff --git a/xen/common/Makefile b/xen/common/Makefile
index e8cde65370..90553ba5d7 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -37,6 +37,7 @@ obj-y += radix-tree.o
   obj-y += rbtree.o
   obj-y += rcupdate.o
   obj-y += rwlock.o
+obj-y += save.o
   obj-y += shutdown.o
   obj-y += softirq.o
   obj-y += sort.o
diff --git a/xen/common/save.c b/xen/common/save.c
new file mode 100644
index 00..6cdac3785b
--- /dev/null
+++ b/xen/common/save.c
@@ -0,0 +1,329 @@
+/*
+ * save.c: Save and restore PV guest state common to all domain types.
+ *
+ * Copyright Amazon.com Inc. or its affiliates.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; If not, see .
+ */
+
+#include 
+
+union domain_copy_entry {
+domain_write_entry write;
+domain_read_entry read;
+};
+
+struct domain_context {
+bool log;
+struct domain_save_descriptor desc;
+size_t data_len;


What is data_len?



It’s used for internal accounting.


Can you add a comment explaining it?




+union domain_copy_entry copy;
+void *priv;
+};
+
+static struct {
+const char *name;
+bool per_vcpu;
+domain_save_handler save;
+domain_load_handler load;
+} handlers[DOMAIN_SAVE_CODE_MAX + 1];
+
+void __init domain_register_save_type(unsigned int tc, const char *name,
+  bool per_vcpu,
+  domain_save_handler save,
+  domain_load_handler load)
+{
+BUG_ON(tc > ARRAY_SIZE(handlers));
+
+ASSERT(!handlers[tc].save);
+ASSERT(!handlers[tc].load);
+
+handlers[tc].name = name;
+handlers[tc].per_vcpu = per_vcpu;
+handlers[tc].save = save;
+handlers[tc].load = load;
+}
+
+int domain_save_begin(struct domain_context *c, unsigned int tc,
+  const char *name, const struct vcpu *v, size_t len)
+{
+int rc;
+
+if ( c->log )
+gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
+ (unsigned long)len);


How about using %zu rather than a cast here?



Yes, that would be better.


+
+BUG_ON(tc != c->desc.typecode);
+BUG_ON(v->vcpu_id != c->desc.vcpu_id);


I can't find any

Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Igor Druzhinin
On 29/04/2020 11:49, Jürgen Groß wrote:
> On 29.04.20 12:39, Igor Druzhinin wrote:
>> On 29/04/2020 10:22, Julien Grall wrote:
>>> Hi Juergen,
>>>
>>> On 29/04/2020 06:51, Jürgen Groß wrote:

 Recreating the event channel would be fine, but I don't see why it
 would ever be needed. And XS_INTRODUCE is called only at domain creation
 time today, so there is obviously no need for repeating this call.

 Maybe the idea was to do this after sending a XS_RESUME command, which
 isn't used anywhere in Xen and is another part of Xenstore which doesn't
 make any sense today.
>>>
>>> Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port 
>>> XS_INTRODUCE evtchn rebind function from cxenstored" added the exact same 
>>> behavior in the OCaml XenStored last year.
>>>
>>> This was introduced 12 years after C XenStored, so surely someone think 
>>> this is useful. We should check why this was introduced in OCaml XenStored 
>>> (I have CCed the author of the patch).
>>>
>>> If we still think this is not useful, then you should add an explanation in 
>>> the commit message why the two implementations diverge and possibly update 
>>> the spec.
>>
>> Thanks for CC, Julien.
>>
>> We indeed already use this functionality in our toolstack for guest kdump
>> functions. It's not possible in XAPI to adopt libxl model where almost 
>> everything
>> is restarted for a domain entering kdump, so we have to use this message to
>> rebind xenstore evtchn after soft reset without shutting down backends and
>> recreating the whole subtree (frontends reconnect fine after that).
>>
>> We obviously only require it for now to be present in oxenstored only.
>> Please don't remove this functionality if possible.
> 
> If I read handling in libxl correctly, in the soft reset case XS_RELEASE
> is issued before doing another XS_INTRODUCE. XS_RELEASE will result in
> xenstored throwing away its related struct domain, so XS_INTRODUCE will
> be possible again.

From what I remember it was not possible to keep xenstored data for a domain
and at the same time perform release-introduce cycle (at least in oxenstored).
It also involved firing @releaseDomain which caused havoc in other part of
the toolstack.

Igor



Re: [PATCH 5/6] x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly

2020-04-29 Thread Jan Beulich
On 29.04.2020 11:26, Hongyan Xia wrote:
> But if people do not have a problem with plXe - 1, I will post a new
> revision for that.

I'd prefer that; I could live with the current version, but I'm
not in favor of it.

Jan



Re: [PATCH v2 1/5] xen/common: introduce a new framework for save/restore of 'domain' context

2020-04-29 Thread Jan Beulich
On 07.04.2020 19:38, Paul Durrant wrote:
> +void __init domain_register_save_type(unsigned int tc, const char *name,
> +  bool per_vcpu,
> +  domain_save_handler save,
> +  domain_load_handler load)
> +{
> +BUG_ON(tc > ARRAY_SIZE(handlers));

>=

> +ASSERT(!handlers[tc].save);
> +ASSERT(!handlers[tc].load);
> +
> +handlers[tc].name = name;
> +handlers[tc].per_vcpu = per_vcpu;
> +handlers[tc].save = save;
> +handlers[tc].load = load;
> +}
> +
> +int domain_save_begin(struct domain_context *c, unsigned int tc,
> +  const char *name, const struct vcpu *v, size_t len)

I find it quite odd for a function like this one to take a
struct vcpu * rather than a struct domain *. See also below
comment on the vcpu_id field in the public header.

> +{
> +int rc;
> +
> +if ( c->log )
> +gdprintk(XENLOG_INFO, "%pv save: %s (%lu)\n", v, name,
> + (unsigned long)len);
> +
> +BUG_ON(tc != c->desc.typecode);
> +BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> +
> +ASSERT(!c->data_len);
> +c->data_len = c->desc.length = len;
> +
> +rc = c->copy.write(c->priv, &c->desc, sizeof(c->desc));
> +if ( rc )
> +return rc;
> +
> +c->desc.length = 0;
> +
> +return 0;
> +}
> +
> +int domain_save_data(struct domain_context *c, const void *src, size_t len)
> +{
> +if ( c->desc.length + len > c->data_len )
> +return -ENOSPC;
> +
> +c->desc.length += len;
> +
> +return c->copy.write(c->priv, src, len);
> +}
> +
> +int domain_save_end(struct domain_context *c)

I'm sure there is a reason for the split into three load/save
functions (begin/data/end), but I can't figure it and the
description also doesn't explain it. They're all used together
only afaics, in domain_{save,load}_entry(). Or wait, there are
DOMAIN_{SAVE,LOAD}_BEGIN() macros apparently allowing separate
use of ..._begin(), but then it's still not clear why ..._end()
need to be separate from ..._data().

> +int domain_save(struct domain *d, domain_write_entry write, void *priv,
> +bool dry_run)
> +{
> +struct domain_context c = {
> +.copy.write = write,
> +.priv = priv,
> +.log = !dry_run,
> +};
> +struct domain_save_header h = {

const? Perhaps even static?

> +.magic = DOMAIN_SAVE_MAGIC,
> +.version = DOMAIN_SAVE_VERSION,
> +};
> +struct domain_save_header e;
> +unsigned int i;
> +int rc;
> +
> +ASSERT(d != current->domain);
> +
> +if ( d->is_dying )
> +return -EINVAL;

Could I talk you into using less generic an error code here, e.g.
-ESRCH or -ENXIO? There look to be further uses of EINVAL that
may want replacing.

> +domain_pause(d);
> +
> +c.desc.typecode = DOMAIN_SAVE_CODE(HEADER);
> +
> +rc = DOMAIN_SAVE_ENTRY(HEADER, &c, d->vcpu[0], &h, sizeof(h));
> +if ( rc )
> +goto out;
> +
> +for ( i = 0; i < ARRAY_SIZE(handlers); i++ )
> +{
> +domain_save_handler save = handlers[i].save;
> +
> +if ( !save )
> +continue;
> +
> +memset(&c.desc, 0, sizeof(c.desc));
> +c.desc.typecode = i;
> +
> +if ( handlers[i].per_vcpu )
> +{
> +struct vcpu *v;
> +
> +for_each_vcpu ( d, v )
> +{
> +c.desc.vcpu_id = v->vcpu_id;
> +
> +rc = save(v, &c, dry_run);
> +if ( rc )
> +goto out;
> +}
> +}
> +else
> +{
> +rc = save(d->vcpu[0], &c, dry_run);
> +if ( rc )
> +goto out;
> +}
> +}
> +
> +memset(&c.desc, 0, sizeof(c.desc));
> +c.desc.typecode = DOMAIN_SAVE_CODE(END);
> +
> +rc = DOMAIN_SAVE_ENTRY(END, &c, d->vcpu[0], &e, 0);

By the looks of it you're passing uninitialized e here; it's just
that the struct has no members. It would look less odd if you used
NULL here. Otherwise please don't use literal 0, but sizeof() for
the last parameter.

> +int domain_load_begin(struct domain_context *c, unsigned int tc,
> +  const char *name, const struct vcpu *v, size_t len,
> +  bool exact)
> +{
> +if ( c->log )
> +gdprintk(XENLOG_INFO, "%pv load: %s (%lu)\n", v, name,
> + (unsigned long)len);
> +
> +BUG_ON(tc != c->desc.typecode);
> +BUG_ON(v->vcpu_id != c->desc.vcpu_id);
> +
> +if ( (exact && (len != c->desc.length)) ||
> + (len < c->desc.length) )
> +return -EINVAL;

How about

if ( exact ? len != c->desc.length
   : len < c->desc.length )

? I'm also unsure about the < - don't you mean > instead? Too
little data would be compensated by zero padding, but too
much data can't be dealt with. But maybe I'm getting the sense
of len wrong ...

> +int domain_load(struct domain *d, do
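The exact/inexact length check Jan is querying can be reduced to a small predicate. A stand-alone sketch of the sense he suggests — an exact load must match the record length precisely, while an inexact load only needs a buffer at least as large as the record (short records being zero-padded) — with a hypothetical helper name, not the actual Xen code:

```c
#include <assert.h>
#include <stddef.h>

/*
 * With 'exact', the caller's buffer must match the saved record exactly;
 * otherwise the buffer merely needs to be large enough to hold the
 * record, and any remainder is zero-padded by the caller.
 */
static int load_len_ok(size_t buf_len, size_t rec_len, int exact)
{
    return exact ? buf_len == rec_len : buf_len >= rec_len;
}
```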

Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Jürgen Groß

On 29.04.20 12:39, Igor Druzhinin wrote:

On 29/04/2020 10:22, Julien Grall wrote:

Hi Juergen,

On 29/04/2020 06:51, Jürgen Groß wrote:


Recreating the event channel would be fine, but I don't see why it
would ever be needed. And XS_INTRODUCE is called only at domain creation
time today, so there is obviously no need for repeating this call.

Maybe the idea was to do this after sending a XS_RESUME command, which
isn't used anywhere in Xen and is another part of Xenstore which doesn't
make any sense today.


Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port XS_INTRODUCE 
evtchn rebind function from cxenstored" added the exact same behavior in the OCaml 
XenStored last year.

This was introduced 12 years after C XenStored, so surely someone think this is 
useful. We should check why this was introduced in OCaml XenStored (I have CCed 
the author of the patch).

If we still think this is not useful, then you should add an explanation in the 
commit message why the two implementations diverge and possibly update the spec.


Thanks for CC, Julien.

We indeed already use this functionality in our toolstack for guest kdump
functions. It's not possible in XAPI to adopt libxl model where almost 
everything
is restarted for a domain entering kdump, so we have to use this message to
rebind xenstore evtchn after soft reset without shutting down backends and
recreating the whole subtree (frontends reconnect fine after that).

We obviously only require it for now to be present in oxenstored only.
Please don't remove this functionality if possible.


If I read handling in libxl correctly, in the soft reset case XS_RELEASE
is issued before doing another XS_INTRODUCE. XS_RELEASE will result in
xenstored throwing away its related struct domain, so XS_INTRODUCE will
be possible again.


Juergen



[PATCH] x86/hyperv: stash and use the configured max VP index

2020-04-29 Thread Wei Liu
The value returned from CPUID is the maximum number for virtual
processors supported by Hyper-V. It could be larger than the maximum
number of virtual processors configured.

Stash the configured number into a variable and use it in calculations.

Signed-off-by: Wei Liu 
---
 xen/arch/x86/guest/hyperv/hyperv.c  | 4 ++++
 xen/arch/x86/guest/hyperv/private.h | 1 +
 xen/arch/x86/guest/hyperv/tlb.c     | 2 +-
 xen/arch/x86/guest/hyperv/util.c    | 2 +-
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/xen/arch/x86/guest/hyperv/hyperv.c b/xen/arch/x86/guest/hyperv/hyperv.c
index 91a6782cd986..84221b751453 100644
--- a/xen/arch/x86/guest/hyperv/hyperv.c
+++ b/xen/arch/x86/guest/hyperv/hyperv.c
@@ -33,6 +33,7 @@ DEFINE_PER_CPU_READ_MOSTLY(void *, hv_input_page);
 DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
 DEFINE_PER_CPU_READ_MOSTLY(unsigned int, hv_vp_index);
 
+unsigned int __read_mostly hv_max_vp_index;
 static bool __read_mostly hcall_page_ready;
 
 static uint64_t generate_guest_id(void)
@@ -143,6 +144,9 @@ static int setup_hypercall_pcpu_arg(void)
 rdmsrl(HV_X64_MSR_VP_INDEX, vp_index_msr);
 this_cpu(hv_vp_index) = vp_index_msr;
 
+if ( vp_index_msr > hv_max_vp_index )
+hv_max_vp_index = vp_index_msr;
+
 return 0;
 }
 
diff --git a/xen/arch/x86/guest/hyperv/private.h b/xen/arch/x86/guest/hyperv/private.h
index 354fc7f685a7..fea3e291e944 100644
--- a/xen/arch/x86/guest/hyperv/private.h
+++ b/xen/arch/x86/guest/hyperv/private.h
@@ -28,6 +28,7 @@
 DECLARE_PER_CPU(void *, hv_input_page);
 DECLARE_PER_CPU(void *, hv_vp_assist);
 DECLARE_PER_CPU(unsigned int, hv_vp_index);
+extern unsigned int hv_max_vp_index;
 
 static inline unsigned int hv_vp_index(unsigned int cpu)
 {
diff --git a/xen/arch/x86/guest/hyperv/tlb.c b/xen/arch/x86/guest/hyperv/tlb.c
index 1d723d6ee679..0a44071481bd 100644
--- a/xen/arch/x86/guest/hyperv/tlb.c
+++ b/xen/arch/x86/guest/hyperv/tlb.c
@@ -166,7 +166,7 @@ int hyperv_flush_tlb(const cpumask_t *mask, const void *va,
 {
 unsigned int vpid = hv_vp_index(cpu);
 
-if ( vpid >= ms_hyperv.max_vp_index )
+if ( vpid >= hv_max_vp_index )
 {
 local_irq_restore(irq_flags);
 return -ENXIO;
diff --git a/xen/arch/x86/guest/hyperv/util.c b/xen/arch/x86/guest/hyperv/util.c
index bec61c2afd87..2c5f421b7bd9 100644
--- a/xen/arch/x86/guest/hyperv/util.c
+++ b/xen/arch/x86/guest/hyperv/util.c
@@ -33,7 +33,7 @@ int cpumask_to_vpset(struct hv_vpset *vpset,
 {
 int nr = 1;
 unsigned int cpu, vcpu_bank, vcpu_offset;
-unsigned int max_banks = ms_hyperv.max_vp_index / 64;
+unsigned int max_banks = hv_max_vp_index / 64;
 
 /* Up to 64 banks can be represented by valid_bank_mask */
 if ( max_banks > 64 )
-- 
2.20.1
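The `hv_max_vp_index / 64` computation in cpumask_to_vpset() follows from how a Hyper-V VP set packs processors: each 64-bit bank covers 64 VP indices, so a given vpid lands in bank vpid/64 at bit vpid%64, and the highest bank ever needed is max_vp_index/64. A sketch of that arithmetic (hypothetical helper, illustrative only):

```c
#include <assert.h>

/*
 * Each 64-bit bank of a VP set covers 64 VP indices: a vpid maps to
 * bank vpid/64, bit vpid%64.  Hence max_vp_index/64 bounds the number
 * of banks cpumask_to_vpset() can ever populate.
 */
static void vp_to_bank(unsigned int vpid, unsigned int *bank, unsigned int *bit)
{
    *bank = vpid / 64;
    *bit  = vpid % 64;
}
```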




Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Igor Druzhinin
On 29/04/2020 10:22, Julien Grall wrote:
> Hi Juergen,
> 
> On 29/04/2020 06:51, Jürgen Groß wrote:
>>
>> Recreating the event channel would be fine, but I don't see why it
>> would ever be needed. And XS_INTRODUCE is called only at domain creation
>> time today, so there is obviously no need for repeating this call.
>>
>> Maybe the idea was to do this after sending a XS_RESUME command, which
>> isn't used anywhere in Xen and is another part of Xenstore which doesn't
>> make any sense today.
> 
> Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port 
> XS_INTRODUCE evtchn rebind function from cxenstored" added the exact same 
> behavior in the OCaml XenStored last year.
> 
> This was introduced 12 years after C XenStored, so surely someone think this 
> is useful. We should check why this was introduced in OCaml XenStored (I have 
> CCed the author of the patch).
> 
> If we still think this is not useful, then you should add an explanation in 
> the commit message why the two implementations diverge and possibly update 
> the spec.

Thanks for CC, Julien.

We indeed already use this functionality in our toolstack for guest kdump
functions. It's not possible in XAPI to adopt libxl model where almost 
everything
is restarted for a domain entering kdump, so we have to use this message to
rebind xenstore evtchn after soft reset without shutting down backends and
recreating the whole subtree (frontends reconnect fine after that).

We obviously only require it for now to be present in oxenstored only.
Please don't remove this functionality if possible.

Igor 



[xen-unstable-smoke test] 149872: tolerable all pass - PUSHED

2020-04-29 Thread osstest service owner
flight 149872 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149872/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  13 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  14 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  13 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  14 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  e9aca9470ed86966f9c0fd0db85132ff28d652c4
baseline version:
 xen  4ec07971f1c5a236a0d8c528d806efb6bfd3d1a6

Last test of basis   149863  2020-04-28 16:01:08 Z   0 days
Testing same since   149872  2020-04-29 08:02:03 Z   0 days   1 attempts


People who touched revisions under test:
  Jan Beulich 
  Roger Pau Monné 
  Tim Deegan 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   4ec07971f1..e9aca9470e  e9aca9470ed86966f9c0fd0db85132ff28d652c4 -> smoke



[xen-unstable-coverity test] 149873: all pass - PUSHED

2020-04-29 Thread osstest service owner
flight 149873 xen-unstable-coverity real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149873/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 xen  4ec07971f1c5a236a0d8c528d806efb6bfd3d1a6
baseline version:
 xen  f093b08c47b39da6019421a2b61d40745b3e573b

Last test of basis   149828  2020-04-26 09:18:45 Z   3 days
Testing same since   149873  2020-04-29 09:19:00 Z   0 days   1 attempts


People who touched revisions under test:
  Andrew Cooper 
  Anthony PERARD 
  George Dunlap 
  Hongyan Xia 
  Jan Beulich 
  Juergen Gross 
  Julien Grall 
  Nick Rosbrook 
  Nick Rosbrook 
  Stefano Stabellini 
  Wei Liu 
  Wei Liu 

jobs:
 coverity-amd64   pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   f093b08c47..4ec07971f1  4ec07971f1c5a236a0d8c528d806efb6bfd3d1a6 -> 
coverity-tested/smoke



Re: [PATCH 5/6] x86/pv: map and unmap page tables in mark_pv_pt_pages_rdonly

2020-04-29 Thread Hongyan Xia
On Tue, 2020-04-28 at 16:59 +0100, Hongyan Xia wrote:
> On Tue, 2020-04-28 at 16:55 +0100, Hongyan Xia wrote:
> > On Tue, 2020-04-28 at 17:33 +0200, Jan Beulich wrote:
> > > On 17.04.2020 11:52, Hongyan Xia wrote:
> > > > --- a/xen/arch/x86/pv/dom0_build.c
> > > > +++ b/xen/arch/x86/pv/dom0_build.c
> > > > @@ -50,17 +50,17 @@ static __init void
> > > > mark_pv_pt_pages_rdonly(struct domain *d,
> > > >  unsigned long count;
> > > >  struct page_info *page;
> > > >  l4_pgentry_t *pl4e;
> > > > -l3_pgentry_t *pl3e;
> > > > -l2_pgentry_t *pl2e;
> > > > -l1_pgentry_t *pl1e;
> > > > +l3_pgentry_t *pl3e, *l3t;
> > > > +l2_pgentry_t *pl2e, *l2t;
> > > > +l1_pgentry_t *pl1e, *l1t;
> > > 
> > > I don't quite see why the new local variables get introduced:
> > > unmap_domain_page(), iirc, is quite fine with a non-page-
> > > aligned argument.
> > 
> > You are right, although in this function, where plXe points to may not
> > be the page we want to unmap. When plXe becomes aligned and points to a
> > new page, we actually want to unmap the page before it increments to an
> > aligned value.
> 
> Hmm, we can just unmap(plXe - 1) if my logic is correct, and save 3
> local variables.

Sorry for monologuing, but I still prefer separating plXe and lXt
because it makes it clear what we are unmapping. Unmapping plXe - 1 is
a bit hackish.

But if people do not have a problem with plXe - 1, I will post a new
revision for that.

Hongyan




Re: [PATCH] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Julien Grall

Hi Juergen,

On 29/04/2020 06:51, Jürgen Groß wrote:

On 28.04.20 22:55, Julien Grall wrote:

Hi Juergen,

On Tue, 28 Apr 2020 at 16:53, Juergen Gross  wrote:


The XS_INTRODUCE command has two parameters: the mfn (or better: gfn)
of the domain's xenstore ring page and the event channel of the
domain for communicating with Xenstore.

The gfn is not really needed. It is stored in the per-domain struct
in xenstored and in case of another XS_INTRODUCE for the domain it
is tested to match the original value. If it doesn't match the
command is aborted via EINVAL.

Today there shouldn't be multiple XS_INTRODUCE requests for the same
domain issued, so the mfn/gfn can just be ignored and multiple
XS_INTRODUCE commands can be rejected without testing the mfn/gfn.


So there is a comment in the else part:

/* Use XS_INTRODUCE for recreating the xenbus event-channel. */

 From the commit message this is not entirely clear why we want to
prevent recreating the event-channel. Can you expand it?


Recreating the event channel would be fine, but I don't see why it
would ever be needed. And XS_INTRODUCE is called only at domain creation
time today, so there is obviously no need for repeating this call.

Maybe the idea was to do this after sending a XS_RESUME command, which
isn't used anywhere in Xen and is another part of Xenstore which doesn't
make any sense today.


Commit f6cc37ea8ac71385b60507c034519f304da75f4c "tools/oxenstored: port 
XS_INTRODUCE evtchn rebind function from cxenstored" added the exact 
same behavior in the OCaml XenStored last year.


This was introduced 12 years after C XenStored, so surely someone thinks 
this is useful. We should check why this was introduced in OCaml 
XenStored (I have CCed the author of the patch).


If we still think this is not useful, then you should add an explanation 
in the commit message why the two implementations diverge and possibly 
update the spec.


Cheers,

--
Julien Grall



Re: Cpu on/offlining crash with core scheduling

2020-04-29 Thread Sergey Dyasli
On 29/04/2020 09:09, Jürgen Groß wrote:
> On 27.04.20 15:49, Sergey Dyasli wrote:
>> Hi Juergen,
>>
>> When I'm testing vcpu pinning with something like:
>>
>>   # xl vcpu-pin 0 0 2
>>   # xen-hptool cpu-offline 3
>>
>>   (offline / online CPUs {2,3} if the above is successful)
>>
>> I'm reliably seeing the following crash on the latest staging:
>>
>> (XEN) Watchdog timer detects that CPU1 is stuck!
>> (XEN) [ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]
>> (XEN) CPU:    1
>> (XEN) RIP:    e008:[] 
>> common/sched/core.c#sched_wait_rendezvous_in+0x16c/0x385
>> (XEN) RFLAGS: 0002   CONTEXT: hypervisor
>> (XEN) rax: f001   rbx: 82d0805c9118   rcx: 83085e750301
>> (XEN) rdx: 0001   rsi: 83086499b972   rdi: 83085e7503a6
>> (XEN) rbp: 83085e7dfe28   rsp: 83085e7dfdd8   r8:  830864985440
>> (XEN) r9:  83085e714068   r10: 0014   r11: 0056b6a1aab2
>> (XEN) r12: 83086499e490   r13: 82d0805f26e0   r14: 83085e7503a0
>> (XEN) r15: 0001   cr0: 80050033   cr4: 00362660
>> (XEN) cr3: 000823a8e000   cr2: 6026000f6fc0
>> (XEN) fsb:    gsb: 888138dc   gss: 
>> (XEN) ds: 002b   es: 002b   fs:    gs:    ss: e010   cs: e008
>> (XEN) Xen code around  
>> (common/sched/core.c#sched_wait_rendezvous_in+0x16c/0x385):
>> (XEN)  4c 89 f7 e8 dc a5 fd ff <4b> 8b 44 fd 00 48 8b 04 18 4c 3b 70 10 0f 
>> 85 3f
>> (XEN) Xen stack trace from rsp=83085e7dfdd8:
>> (XEN)    0056b42128a6 83086499ff30 83086498a000 83085e7dfe48
>> (XEN)    00010001 0056b42128a6 83086499e490 
>> (XEN)    0001 0001 83085e7dfe78 82d080252ae8
>> (XEN)    83086498a000 000180230434 83085e7503a0 82d0805ceb00
>> (XEN)     82d0805cea80  82d0805dea80
>> (XEN)    83085e7dfeb0 82d08022c232 0001 82d0805ceb00
>> (XEN)    0001 0001 0001 83085e7dfec0
>> (XEN)    82d08022c2cd 83085e7dfef0 82d08031cae9 83086498a000
>> (XEN)    83086498a000 0001 0001 83085e7dfde8
>> (XEN)    88813021d700 88813021d700  
>> (XEN)    0007 88813021d700 0246 7ff0
>> (XEN)     0001ca00  810013aa
>> (XEN)    8203d210 deadbeefdeadf00d deadbeefdeadf00d 0100
>> (XEN)    810013aa e033 0246 c900400dfeb0
>> (XEN)    e02b   
>> (XEN)     e011 83086498a000 0037e43bd000
>> (XEN)    00362660  800864980002 0601
>> (XEN)    
>> (XEN) Xen call trace:
>> (XEN)    [] R 
>> common/sched/core.c#sched_wait_rendezvous_in+0x16c/0x385
>> (XEN)    [] F common/sched/core.c#sched_slave+0x262/0x31e
>> (XEN)    [] F common/softirq.c#__do_softirq+0x8a/0xbc
>> (XEN)    [] F do_softirq+0x13/0x15
>> (XEN)    [] F arch/x86/domain.c#idle_loop+0x57/0xa7
>> (XEN)
>> (XEN) CPU0 @ e008:82d08022c2b7 (process_pending_softirqs+0x53/0x56)
>> (XEN) CPU4 @ e008:82d08022bc40 
>> (common/rcupdate.c#rcu_process_callbacks+0x22e/0x24b)
>> (XEN) CPU2 @ e008:82d08022c26f (process_pending_softirqs+0xb/0x56)
>> (XEN) CPU7 @ e008:82d08022bc40 
>> (common/rcupdate.c#rcu_process_callbacks+0x22e/0x24b)
>> (XEN) CPU3 @ e008:82d08022bc40 
>> (common/rcupdate.c#rcu_process_callbacks+0x22e/0x24b)
>> (XEN) CPU5 @ e008:82d08022cc34 (_spin_lock+0x4d/0x62)
>> (XEN) CPU6 @ e008:82d08022c264 (process_pending_softirqs+0/0x56)
>> (XEN)
>> (XEN) 
>> (XEN) Panic on CPU 1:
>> (XEN) FATAL TRAP: vector = 2 (nmi)
>> (XEN) [error_code=] , IN INTERRUPT CONTEXT
>> (XEN) 
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN) Executing kexec image on cpu1
>> (XEN) Shot down all CPUs
>>
>>
>> Is this something you can reproduce?
> 
> Yes, I was able to hit this.
> 
> Attached patch is fixing it for me. Could you give it a try?

The patch fixes the immediate issue:

Tested-by: Sergey Dyasli 

Thanks!

However, when running the following script:

while :; do xen-hptool cpu-offline 3; xen-hptool cpu-offline 2; 
xen-hptool cpu-online 3; xen-hptool cpu-online 2; sleep 0.1; done

there was some weirdness with the utility on some invocations:

xen-hptool: symbol lookup error: /lib64/libxenctrl.so.4.14: undefined 
symbol: xc__hypercall_buffer_free
Segmentation fault (core dumped)
xen-hptool: symbol lookup error: /lib64/libxenctrl.so.4.14: undefined 
symbol: xc__hypercall_bounce_post
xen-hptool: relocation error: /lib64/libxenctrl.so.4.14: s

Re: [PATCH] x86/pass-through: avoid double IRQ unbind during domain cleanup

2020-04-29 Thread Roger Pau Monné
On Wed, Apr 29, 2020 at 10:35:24AM +0200, Jan Beulich wrote:
> On 29.04.2020 10:26, Roger Pau Monné wrote:
> > On Wed, Apr 29, 2020 at 09:37:11AM +0200, Jan Beulich wrote:
> >> On 28.04.2020 18:14, Roger Pau Monné wrote:
> >>> On Tue, Apr 28, 2020 at 02:21:48PM +0200, Jan Beulich wrote:
>  XEN_DOMCTL_destroydomain creates a continuation if domain_kill 
>  -ERESTARTs.
>  In that scenario, it is possible to receive multiple _pirq_guest_unbind
>  calls for the same pirq from domain_kill, if the pirq has not yet been
>  removed from the domain's pirq_tree, as:
>    domain_kill()
>  -> domain_relinquish_resources()
>    -> pci_release_devices()
>  -> pci_clean_dpci_irq()
>    -> pirq_guest_unbind()
>  -> __pirq_guest_unbind()
> 
>  Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
>  from the tree being iterated after the first call there. In case such a
>  removed entry still has a softirq outstanding, record it and re-check
>  upon re-invocation.
> 
>  Reported-by: Varad Gautam 
>  Signed-off-by: Jan Beulich 
>  Tested-by: Varad Gautam 
> 
>  --- a/xen/arch/x86/irq.c
>  +++ b/xen/arch/x86/irq.c
>  @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p
>   }
>   
>   if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
>  -BUG();
>  +BUG_ON(!d->is_dying);
> >>>
> >>> I think to keep the previous behavior this should be:
> >>>
> >>> BUG_ON(!is_hvm_domain(d) || !d->is_dying);
> >>>
> >>> Since the pirqs will only be removed elsewhere if the domain is HVM?
> >>
> >> pirq_cleanup_check() is a generic hook, and hence I consider it more
> >> correct to not have it behave differently in this regard for different
> >> types of guests. IOW while it _may_ (didn't check) not be the case
> >> today that this can be called multiple times even for PV guests, I'd
> >> view this as legitimate behavior.
> > 
> > Previous to this patch pirq_cleanup_check couldn't be called multiple
> > times, as it would result in the BUG triggering, that was true for
> > both PV and HVM. Now that the removal of PIRQs from the tree is done
> > elsewhere for HVM when the domain is dying the check needs to be
> > relaxed for HVM at least. I would prefer if it was kept as-is for PV
> > (since there's been no change in behavior for PV that could allow for
> > multiple calls to pirq_cleanup_check), or else a small comment in the
> > commit message would help clarify this is done on purpose.
> 
> I've added
> 
> "Note that pirq_cleanup_check() gets relaxed beyond what's strictly
>  needed here, to avoid introducing an asymmetry there between HVM and PV
>  guests."
> 
> Does this sound suitable to you?

Yes, thanks. With that:

Reviewed-by: Roger Pau Monné 

Roger.



Re: [PATCH] x86/pass-through: avoid double IRQ unbind during domain cleanup

2020-04-29 Thread Jan Beulich
On 29.04.2020 10:26, Roger Pau Monné wrote:
> On Wed, Apr 29, 2020 at 09:37:11AM +0200, Jan Beulich wrote:
>> On 28.04.2020 18:14, Roger Pau Monné wrote:
>>> On Tue, Apr 28, 2020 at 02:21:48PM +0200, Jan Beulich wrote:
 XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTs.
 In that scenario, it is possible to receive multiple _pirq_guest_unbind
 calls for the same pirq from domain_kill, if the pirq has not yet been
 removed from the domain's pirq_tree, as:
   domain_kill()
 -> domain_relinquish_resources()
   -> pci_release_devices()
 -> pci_clean_dpci_irq()
   -> pirq_guest_unbind()
 -> __pirq_guest_unbind()

 Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
 from the tree being iterated after the first call there. In case such a
 removed entry still has a softirq outstanding, record it and re-check
 upon re-invocation.

 Reported-by: Varad Gautam 
 Signed-off-by: Jan Beulich 
 Tested-by: Varad Gautam 

 --- a/xen/arch/x86/irq.c
 +++ b/xen/arch/x86/irq.c
 @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p
  }
  
  if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
 -BUG();
 +BUG_ON(!d->is_dying);
>>>
>>> I think to keep the previous behavior this should be:
>>>
>>> BUG_ON(!is_hvm_domain(d) || !d->is_dying);
>>>
>>> Since the pirqs will only be removed elsewhere if the domain is HVM?
>>
>> pirq_cleanup_check() is a generic hook, and hence I consider it more
>> correct to not have it behave differently in this regard for different
>> types of guests. IOW while it _may_ (didn't check) not be the case
>> today that this can be called multiple times even for PV guests, I'd
>> view this as legitimate behavior.
> 
> Previous to this patch pirq_cleanup_check couldn't be called multiple
> times, as it would result in the BUG triggering, that was true for
> both PV and HVM. Now that the removal of PIRQs from the tree is done
> elsewhere for HVM when the domain is dying the check needs to be
> relaxed for HVM at least. I would prefer if it was kept as-is for PV
> (since there's been no change in behavior for PV that could allow for
> multiple calls to pirq_cleanup_check), or else a small comment in the
> commit message would help clarify this is done on purpose.

I've added

"Note that pirq_cleanup_check() gets relaxed beyond what's strictly
 needed here, to avoid introducing an asymmetry there between HVM and PV
 guests."

Does this sound suitable to you?

Jan



[ovmf test] 149869: all pass - PUSHED

2020-04-29 Thread osstest service owner
flight 149869 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/149869/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf b2034179e8feed9c7d3bc8f9d40a18fd236c5b57
baseline version:
 ovmf 099dfbb29d8bf0a30e397e3f5baf1da437b8f0ba

Last test of basis   149867  2020-04-28 18:09:26 Z   0 days
Testing same since   149869  2020-04-29 03:52:49 Z   0 days   1 attempts


People who touched revisions under test:
  Abner Chang 
  Anthony PERARD 
  Laszlo Ersek 
  Michael Kubacki 
  Sean Brogan 
  Shenglei Zhang 
  Zhang, Shenglei 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   099dfbb29d..b2034179e8  b2034179e8feed9c7d3bc8f9d40a18fd236c5b57 -> 
xen-tested-master



[PATCH v2] tools/xenstore: don't store domU's mfn of ring page in xenstored

2020-04-29 Thread Juergen Gross
The XS_INTRODUCE command has two parameters: the mfn (or better: gfn)
of the domain's xenstore ring page and the event channel of the
domain for communicating with Xenstore.

The gfn is not really needed. It is stored in the per-domain struct
in xenstored and in case of another XS_INTRODUCE for the domain it
is tested to match the original value. If it doesn't match the
command is aborted via EINVAL.

Today multiple XS_INTRODUCE requests are not issued for the same domain,
so the mfn/gfn can just be ignored and multiple XS_INTRODUCE
commands can be rejected without testing the mfn/gfn.

Signed-off-by: Juergen Gross 
Acked-by: Andrew Cooper 
---
V2:
- remove mfn from struct domain (Julien Grall)
- replace mfn by gfn in comments (Julien Grall)
---
 tools/xenstore/xenstored_domain.c | 53 +++
 1 file changed, 20 insertions(+), 33 deletions(-)

diff --git a/tools/xenstore/xenstored_domain.c 
b/tools/xenstore/xenstored_domain.c
index 5858185211..1ca11e5e9e 100644
--- a/tools/xenstore/xenstored_domain.c
+++ b/tools/xenstore/xenstored_domain.c
@@ -55,10 +55,6 @@ struct domain
   repeated domain introductions. */
evtchn_port_t remote_port;
 
-   /* The mfn associated with the event channel, used only to validate
-  repeated domain introductions. */
-   unsigned long mfn;
-
/* Domain path in store. */
char *path;
 
@@ -363,13 +359,12 @@ static void domain_conn_reset(struct domain *domain)
domain->interface->rsp_cons = domain->interface->rsp_prod = 0;
 }
 
-/* domid, mfn, evtchn, path */
+/* domid, gfn, evtchn, path */
 int do_introduce(struct connection *conn, struct buffered_data *in)
 {
struct domain *domain;
char *vec[3];
unsigned int domid;
-   unsigned long mfn;
evtchn_port_t port;
int rc;
struct xenstore_domain_interface *interface;
@@ -381,7 +376,7 @@ int do_introduce(struct connection *conn, struct 
buffered_data *in)
return EACCES;
 
domid = atoi(vec[0]);
-   mfn = atol(vec[1]);
+   /* Ignore the gfn, we don't need it. */
port = atoi(vec[2]);
 
/* Sanity check args. */
@@ -390,34 +385,26 @@ int do_introduce(struct connection *conn, struct 
buffered_data *in)
 
domain = find_domain_by_domid(domid);
 
-   if (domain == NULL) {
-   interface = map_interface(domid);
-   if (!interface)
-   return errno;
-   /* Hang domain off "in" until we're finished. */
-   domain = new_domain(in, domid, port);
-   if (!domain) {
-   rc = errno;
-   unmap_interface(interface);
-   return rc;
-   }
-   domain->interface = interface;
-   domain->mfn = mfn;
-
-   /* Now domain belongs to its connection. */
-   talloc_steal(domain->conn, domain);
-
-   fire_watches(NULL, in, "@introduceDomain", false);
-   } else if ((domain->mfn == mfn) && (domain->conn != conn)) {
-   /* Use XS_INTRODUCE for recreating the xenbus event-channel. */
-   if (domain->port)
-   xenevtchn_unbind(xce_handle, domain->port);
-   rc = xenevtchn_bind_interdomain(xce_handle, domid, port);
-   domain->port = (rc == -1) ? 0 : rc;
-   domain->remote_port = port;
-   } else
+   if (domain)
return EINVAL;
 
+   interface = map_interface(domid);
+   if (!interface)
+   return errno;
+   /* Hang domain off "in" until we're finished. */
+   domain = new_domain(in, domid, port);
+   if (!domain) {
+   rc = errno;
+   unmap_interface(interface);
+   return rc;
+   }
+   domain->interface = interface;
+
+   /* Now domain belongs to its connection. */
+   talloc_steal(domain->conn, domain);
+
+   fire_watches(NULL, in, "@introduceDomain", false);
+
domain_conn_reset(domain);
 
send_ack(conn, XS_INTRODUCE);
-- 
2.16.4




Re: [PATCH] x86/pass-through: avoid double IRQ unbind during domain cleanup

2020-04-29 Thread Roger Pau Monné
On Wed, Apr 29, 2020 at 09:37:11AM +0200, Jan Beulich wrote:
> On 28.04.2020 18:14, Roger Pau Monné wrote:
> > On Tue, Apr 28, 2020 at 02:21:48PM +0200, Jan Beulich wrote:
> >> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTs.
> >> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> >> calls for the same pirq from domain_kill, if the pirq has not yet been
> >> removed from the domain's pirq_tree, as:
> >>   domain_kill()
> >> -> domain_relinquish_resources()
> >>   -> pci_release_devices()
> >> -> pci_clean_dpci_irq()
> >>   -> pirq_guest_unbind()
> >> -> __pirq_guest_unbind()
> >>
> >> Avoid recurring invocations of pirq_guest_unbind() by removing the pIRQ
> >> from the tree being iterated after the first call there. In case such a
> >> removed entry still has a softirq outstanding, record it and re-check
> >> upon re-invocation.
> >>
> >> Reported-by: Varad Gautam 
> >> Signed-off-by: Jan Beulich 
> >> Tested-by: Varad Gautam 
> >>
> >> --- a/xen/arch/x86/irq.c
> >> +++ b/xen/arch/x86/irq.c
> >> @@ -1323,7 +1323,7 @@ void (pirq_cleanup_check)(struct pirq *p
> >>  }
> >>  
> >>  if ( radix_tree_delete(&d->pirq_tree, pirq->pirq) != pirq )
> >> -BUG();
> >> +BUG_ON(!d->is_dying);
> > 
> > I think to keep the previous behavior this should be:
> > 
> > BUG_ON(!is_hvm_domain(d) || !d->is_dying);
> > 
> > Since the pirqs will only be removed elsewhere if the domain is HVM?
> 
> pirq_cleanup_check() is a generic hook, and hence I consider it more
> correct to not have it behave differently in this regard for different
> types of guests. IOW while it _may_ (didn't check) not be the case
> today that this can be called multiple times even for PV guests, I'd
> view this as legitimate behavior.

Previous to this patch pirq_cleanup_check couldn't be called multiple
times, as it would result in the BUG triggering, that was true for
both PV and HVM. Now that the removal of PIRQs from the tree is done
elsewhere for HVM when the domain is dying the check needs to be
relaxed for HVM at least. I would prefer if it was kept as-is for PV
(since there's been no change in behavior for PV that could allow for
multiple calls to pirq_cleanup_check), or else a small comment in the
commit message would help clarify this is done on purpose.

Thanks, Roger.



Re: [PATCH v4] x86: clear RDRAND CPUID bit on AMD family 15h/16h

2020-04-29 Thread Jan Beulich
On 02.04.2020 16:25, Andrew Cooper wrote:
> On 09/03/2020 09:08, Jan Beulich wrote:
>> Inspired by Linux commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24:
>>
>> There have been reports of RDRAND issues after resuming from suspend on
>> some AMD family 15h and family 16h systems. This issue stems from a BIOS
>> not performing the proper steps during resume to ensure RDRAND continues
>> to function properly.
>>
>> Update the CPU initialization to clear the RDRAND CPUID bit for any family
>> 15h and 16h processor that supports RDRAND. If it is known that the family
>> 15h or family 16h system does not have an RDRAND resume issue or that the
>> system will not be placed in suspend, the "cpuid=rdrand" kernel parameter
> 
> I'm not sure what is best to do here.  The type suggests that this is a
> verbatim copy of the Linux commit message, but this tiny detail is Xen
> specific.

It simply didn't seem to make sense to leave the Linux way of
specifying this in here, just to then say further down what the
correct (Xen) way is.

>> ---
>> Still slightly RFC, and still in particular because of the change to
>> parse_xen_cpuid():
> 
> FWIW, that is very similar to XenServer's AVX512 off-by-default bodge
> until the default vs max policy work is ready.
> 
> I don't have a better suggestion right now, but hopefully something
> better might become obvious when we've got more users.  Either way, I'm
> expecting it to turn into a "switch ( mid->bit )" expression in due course.

IOW do you want me to use switch() right away?

>> @@ -646,6 +647,25 @@ static void init_amd(struct cpuinfo_x86
>>  if (acpi_smi_cmd && (acpi_enable_value | acpi_disable_value))
>>  amd_acpi_c1e_quirk = true;
>>  break;
>> +
>> +case 0x15: case 0x16:
>> +/*
>> + * There are too many Fam15/Fam16 systems where upon resume
> 
> "some" systems.
> 
>> + * from S3 firmware fails to re-setup properly functioning
>> + * RDRAND.
> 
> I think this needs another sentence of explanation.
> 
> By the time we can spot the problem, it is too late to take evasive
> action, and there is nothing Xen can do to repair the problem.

Sure, added.

>>   Clear the feature unless force-enabled on the
>> + * command line.
>> + */
>> +if (c == &boot_cpu_data &&
>> +cpu_has(c, X86_FEATURE_RDRAND) &&
>> +!is_forced_cpu_cap(X86_FEATURE_RDRAND)) {
>> +static const char __initconst text[] =
>> +"RDRAND may cease to work on this hardware upon 
>> resume from S3.\n"
>> +"Please choose an explicit cpuid={no-}rdrand 
>> setting.\n";
>> +
>> +setup_clear_cpu_cap(X86_FEATURE_RDRAND);
>> +warning_add(text);
> 
> What do you think to clobbering RDRAND via the CPUMASK registers in this
> case?  We've got full control there, and it would stop PV userspace as well.

Why would such be needed? The host_policy -> pv_max_cpuid_policy
-> recalculate_cpuid_policy() propagation already causes the flag
to get zapped from guest policies once it got cleared here. And
it's the guest policy that controls what gets put in the masking
MSRs for PV guests, isn't it?

>> @@ -498,6 +504,28 @@ void identify_cpu(struct cpuinfo_x86 *c)
>>  printk("\n");
>>  #endif
>>  
>> +/*
>> + * If RDRAND is available, make an attempt to check that it actually
>> + * (still) works.
>> + */
> 
> Do you think it would be helpful to test in the opposite case as well. 
> If we come back from S3 and find that RDRAND does actually work, then it
> is safe to tell the user that they can re-enable.

I'd view this as a nice-to-have that isn't all that obvious how to
actually implement in a sufficiently clean way. For example, we
can't use arch_get_random() in that case. Therefore I'd prefer if
this extra courtesy could be left out of here for now, and - if
indeed deemed useful - be added later.

>> +if (cpu_has(c, X86_FEATURE_RDRAND)) {
>> +unsigned int prev = 0;
>> +
>> +for (i = 0; i < 5; ++i)
>> +{
>> +unsigned int cur = arch_get_random();
>> +
>> +if (prev && cur != prev)
>> +break;
>> +prev = cur;
>> +cpu_relax();
> 
> Why the relax?  We're not polling hammering the memory bus waiting for
> an unknown period of time until something changes.
> 
> We simply need up to 5 random numbers as fast as the RNG can produce
> them (which is actually quite slow.  RDRAND has ~350 cycle latency minimum.)

Dropped; I put it there simply to give the hardware some breathing
room between adjacent requests.

Jan



RE: [PATCH v2 2/2] Improve legacy vbios handling

2020-04-29 Thread Paul Durrant
> -Original Message-
> From: Grzegorz Uriasz 
> Sent: 29 April 2020 04:04
> To: qemu-de...@nongnu.org
> Cc: Grzegorz Uriasz ; marma...@invisiblethingslab.com; 
> ar...@puzio.waw.pl;
> ja...@bartmin.ski; j.nowa...@student.uw.edu.pl; Stefano Stabellini 
> ; Anthony
> Perard ; Paul Durrant ; 
> xen-devel@lists.xenproject.org
> Subject: [PATCH v2 2/2] Improve legacy vbios handling
> 
> The current method of getting the vbios is broken - it just isn't working on
> any device I've tested - the reason for this is explained in the previous
> patch. The vbios is polymorphic and getting a proper unmodified copy is
> often not possible without reverse engineering the firmware. We don't need
> an unmodified copy for most purposes - an unmodified copy is only needed for
> initializing the bios framebuffer, and providing the bios with a corrupted
> copy of the rom won't do any damage as the bios will just ignore the rom.
> 
> After the i915 driver takes over, the vbios is only needed for reading some
> metadata/configuration stuff etc...
> I've tested that not having any kind of vbios in the guest actually works
> fine, but on older generations of IGD there are some slight hiccups. To
> maximize compatibility the best approach is to just copy the results of the
> vbios execution directly to the guest. It turns out the vbios is always
> present at a hardcoded address in a reserved real-mode memory range - all
> we need to do is to memcpy it into the guest.
> 
> The following patch does 2 things:
> 1) When pci_assign_dev_load_option_rom fails to read the vbios from sysfs
> (this works only when the igd is not the boot gpu - this is unlikely to
> happen) it falls back to using /dev/mem to copy the vbios directly to the
> guest.

Why bother with sysfs if it is unlikely to work?

> Using /dev/mem should be fine as there is more xen specific pci code which 
> also relies on /dev/mem.
> 2) When dealing with IGD in the more generic code we skip the allocation of
> the rom resource - the reason for this is to prevent a malicious guest from
> modifying the vbios in the host -> this is needed as someone might try to
> pwn the i915 driver in the host by doing so (attach igd to guest, guest
> modifies vbios, the guest is terminated and the igd is reattached to the
> host, i915 driver in the host uses data from the modified vbios).
> This is also needed to not overwrite the proper shadow copy made before.
> 
> I've tested this patch and it works fine - the guest isn't complaining about
> the missing vbios tables, and the pci config space in the guest looks fine.
> 
> Signed-off-by: Grzegorz Uriasz 
> ---
>  hw/xen/xen_pt.c  |  8 +--
>  hw/xen/xen_pt_graphics.c | 48 +---
>  hw/xen/xen_pt_load_rom.c |  2 +-
>  3 files changed, 52 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
> index b91082cb8b..ffc3559dd4 100644
> --- a/hw/xen/xen_pt.c
> +++ b/hw/xen/xen_pt.c
> @@ -483,8 +483,12 @@ static int 
> xen_pt_register_regions(XenPCIPassthroughState *s, uint16_t *cmd)
> i, r->size, r->base_addr, type);
>  }
> 
> -/* Register expansion ROM address */
> -if (d->rom.base_addr && d->rom.size) {
> +/*
> + * Register expansion ROM address. If we are dealing with a ROM
> + * shadow copy for legacy vga devices then don't bother to map it
> + * as previous code creates a proper shadow copy
> + */
> +if (d->rom.base_addr && d->rom.size && !(is_igd_vga_passthrough(d))) {

You don't need brackets around the function call.

  Paul

>  uint32_t bar_data = 0;
> 
>  /* Re-set BAR reported by OS, otherwise ROM can't be read. */
> diff --git a/hw/xen/xen_pt_graphics.c b/hw/xen/xen_pt_graphics.c
> index a3bc7e3921..fe0ef2685c 100644
> --- a/hw/xen/xen_pt_graphics.c
> +++ b/hw/xen/xen_pt_graphics.c
> @@ -129,7 +129,7 @@ int xen_pt_unregister_vga_regions(XenHostPCIDevice *dev)
>  return 0;
>  }
> 
> -static void *get_vgabios(XenPCIPassthroughState *s, int *size,
> +static void *get_sysfs_vgabios(XenPCIPassthroughState *s, int *size,
> XenHostPCIDevice *dev)
>  {
>  return pci_assign_dev_load_option_rom(&s->dev, size,
> @@ -137,6 +137,45 @@ static void *get_vgabios(XenPCIPassthroughState *s, int *size,
>dev->dev, dev->func);
>  }
> 
> +static void xen_pt_direct_vbios_copy(XenPCIPassthroughState *s, Error **errp)
> +{
> +int fd = -1;
> +void *guest_bios = NULL;
> +void *host_vbios = NULL;
> +/* This is always 32 pages in the real mode reserved region */
> +int bios_size = 32 << XC_PAGE_SHIFT;
> +int vbios_addr = 0xc;
> +
> +fd = open("/dev/mem", O_RDONLY);
> +if (fd == -1) {
> +error_setg(errp, "Can't open /dev/mem: %s", strerror(errno));
> +return;
> +}
> +host_vbios = mmap(NULL, bios_size,
> +PROT

Re: Cpu on/offlining crash with core scheduling

2020-04-29 Thread Jürgen Groß

On 27.04.20 15:49, Sergey Dyasli wrote:

Hi Juergen,

When I'm testing vcpu pinning with something like:

  # xl vcpu-pin 0 0 2
  # xen-hptool cpu-offline 3

  (offline / online CPUs {2,3} if the above is successful)

I'm reliably seeing the following crash on the latest staging:

(XEN) Watchdog timer detects that CPU1 is stuck!
(XEN) [ Xen-4.14-unstable  x86_64  debug=y   Not tainted ]
(XEN) CPU:1
(XEN) RIP:e008:[] 
common/sched/core.c#sched_wait_rendezvous_in+0x16c/0x385
(XEN) RFLAGS: 0002   CONTEXT: hypervisor
(XEN) rax: f001   rbx: 82d0805c9118   rcx: 83085e750301
(XEN) rdx: 0001   rsi: 83086499b972   rdi: 83085e7503a6
(XEN) rbp: 83085e7dfe28   rsp: 83085e7dfdd8   r8:  830864985440
(XEN) r9:  83085e714068   r10: 0014   r11: 0056b6a1aab2
(XEN) r12: 83086499e490   r13: 82d0805f26e0   r14: 83085e7503a0
(XEN) r15: 0001   cr0: 80050033   cr4: 00362660
(XEN) cr3: 000823a8e000   cr2: 6026000f6fc0
(XEN) fsb:    gsb: 888138dc   gss: 
(XEN) ds: 002b   es: 002b   fs:    gs:    ss: e010   cs: e008
(XEN) Xen code around  
(common/sched/core.c#sched_wait_rendezvous_in+0x16c/0x385):
(XEN)  4c 89 f7 e8 dc a5 fd ff <4b> 8b 44 fd 00 48 8b 04 18 4c 3b 70 10 0f 85 3f
(XEN) Xen stack trace from rsp=83085e7dfdd8:
(XEN)0056b42128a6 83086499ff30 83086498a000 83085e7dfe48
(XEN)00010001 0056b42128a6 83086499e490 
(XEN)0001 0001 83085e7dfe78 82d080252ae8
(XEN)83086498a000 000180230434 83085e7503a0 82d0805ceb00
(XEN) 82d0805cea80  82d0805dea80
(XEN)83085e7dfeb0 82d08022c232 0001 82d0805ceb00
(XEN)0001 0001 0001 83085e7dfec0
(XEN)82d08022c2cd 83085e7dfef0 82d08031cae9 83086498a000
(XEN)83086498a000 0001 0001 83085e7dfde8
(XEN)88813021d700 88813021d700  
(XEN)0007 88813021d700 0246 7ff0
(XEN) 0001ca00  810013aa
(XEN)8203d210 deadbeefdeadf00d deadbeefdeadf00d 0100
(XEN)810013aa e033 0246 c900400dfeb0
(XEN)e02b   
(XEN) e011 83086498a000 0037e43bd000
(XEN)00362660  800864980002 0601
(XEN)
(XEN) Xen call trace:
(XEN)[] R 
common/sched/core.c#sched_wait_rendezvous_in+0x16c/0x385
(XEN)[] F common/sched/core.c#sched_slave+0x262/0x31e
(XEN)[] F common/softirq.c#__do_softirq+0x8a/0xbc
(XEN)[] F do_softirq+0x13/0x15
(XEN)[] F arch/x86/domain.c#idle_loop+0x57/0xa7
(XEN)
(XEN) CPU0 @ e008:82d08022c2b7 (process_pending_softirqs+0x53/0x56)
(XEN) CPU4 @ e008:82d08022bc40 
(common/rcupdate.c#rcu_process_callbacks+0x22e/0x24b)
(XEN) CPU2 @ e008:82d08022c26f (process_pending_softirqs+0xb/0x56)
(XEN) CPU7 @ e008:82d08022bc40 
(common/rcupdate.c#rcu_process_callbacks+0x22e/0x24b)
(XEN) CPU3 @ e008:82d08022bc40 
(common/rcupdate.c#rcu_process_callbacks+0x22e/0x24b)
(XEN) CPU5 @ e008:82d08022cc34 (_spin_lock+0x4d/0x62)
(XEN) CPU6 @ e008:82d08022c264 (process_pending_softirqs+0/0x56)
(XEN)
(XEN) 
(XEN) Panic on CPU 1:
(XEN) FATAL TRAP: vector = 2 (nmi)
(XEN) [error_code=] , IN INTERRUPT CONTEXT
(XEN) 
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing kexec image on cpu1
(XEN) Shot down all CPUs


Is this something you can reproduce?


Yes, I was able to hit this.

Attached patch is fixing it for me. Could you give it a try?


Juergen
>From 44b530f4bb9c94ae4e429b83d730237f11410a5f Mon Sep 17 00:00:00 2001
From: Juergen Gross 
Date: Wed, 29 Apr 2020 09:40:46 +0200
Subject: [PATCH] xen/sched: allow rcu work to happen when syncing cpus in core
 scheduling

With RCU barriers moved from tasklets to normal RCU processing, cpu
offlining with core scheduling might deadlock, as cpu synchronization
is required concurrently by both RCU processing and core scheduling.

Fix that by bailing out of the core scheduling synchronization in case
of pending RCU work. Additionally, the RCU softirq is now required to
have a higher priority than the scheduling softirqs in order to do RCU
processing before entering the scheduler again, since bailing out of
the core scheduling synchronization requires raising another softirq,
SCHED_SLAVE, which would otherwise bypass RCU processing again.

Reported-by: Sergey Dyasli 
Signed-off-by: Juergen Gross 
---
 xen/common/sched/core.c   | 10 +++---
 xen/include/xen/softir

Re: [PATCH v2] mem_sharing: map shared_info page to same gfn during fork

2020-04-29 Thread Roger Pau Monné
On Tue, Apr 28, 2020 at 08:29:00AM -0700, Tamas K Lengyel wrote:
> During a VM fork we copy the shared_info page; however, we also need to ensure
> that the page is mapped into the same GFN in the fork as it is in the parent.
> 
> Signed-off-by: Tamas K Lengyel 
> Suggested-by: Roger Pau Monne 

Reviewed-by: Roger Pau Monné 

Thanks!


