Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Christoffer Dall
On Thu, Aug 13, 2015 at 8:41 AM, Jan Beulich  wrote:
 On 12.08.15 at 18:18,  wrote:
>> On 12/08/15 16:58, Jan Beulich wrote:
>> On 12.08.15 at 17:51,  wrote:
 On Wed, Aug 12, 2015 at 5:45 PM, Jan Beulich  wrote:
 On 07.08.15 at 04:11,  wrote:
>> All these tables will be copied to Dom0 memory except that the reused
>> tables(DSDT, SPCR, etc) will be mapped to Dom0.
>
> Which again seems odd - such tables should be considered MMIO
> (despite living in RAM), and hence not be part of Dom0's memory
> assignment.
>
 not sure if this applies to the context of your comment, but we had
 issues before when trying to deal with this data as device memory
 (MMIO), because ARM doesn't allow unaligned accesses etc. on device
 memory, and so Dom0 would crash at memory abort exceptions during
 boot, so this really does have to be mapped as RAM to Dom0.  If you
 meant some internal Xen bookkeeping thing, then just ignore me.
>>>
>>> Hmm, how would native Linux avoid such unaligned accesses then?
>>
>> ACPI table are living on RAM on ARM. So there is no issue with Linux
>> baremetal.
>
> I'm sure our interpretation of the meaning of RAM here differs:
> While from a physical perspective the tables live in RAM too on
> x86, from a memory map perspective they don't. And since iirc
> ACPI implies UEFI, and since UEFI has special memory types
> for such tables (to prevent the OS from using the space as
> normal memory), I can't believe this to be different under ARM.
>
>> The "problem" is from Xen where we are mapping memory region with Device
>> attribute. This is because until now we never had a reason to directly
>> map other thing than MMIO to the domain.
>>
>> This could be fixed by adding new helper in Xen to directly map RAM region.
>
> Which would seem to be the correct route.
>
>> Nonetheless, we still have to copy some table in Xen in order to modify
>> them and/or new one. I have in mind the FADT table to set the hypervisor
>> field and hiding the hypervisor specific data (GIC hyp, timer hyp...) to
>> avoid the kernel thinks there is hyp mode available.
>
> "Have to"? Or rather "would like to"? As said in another reply to
> Shannon, I didn't see any rationale spelled out for this fundamental
> difference to x86.
>
So the suggestion is to just edit the tables in place rather than
copying them and relying on Xen having already read whatever
'original' information it needs before modifying them?

Is there ever a case where the tables live in ROM or shouldn't be
modified for any other reason?

-Christoffer

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 10:01,  wrote:
> On Thu, Aug 13, 2015 at 8:41 AM, Jan Beulich  wrote:
> On 12.08.15 at 18:18,  wrote:
>>> Nonetheless, we still have to copy some table in Xen in order to modify
>>> them and/or new one. I have in mind the FADT table to set the hypervisor
>>> field and hiding the hypervisor specific data (GIC hyp, timer hyp...) to
>>> avoid the kernel thinks there is hyp mode available.
>>
>> "Have to"? Or rather "would like to"? As said in another reply to
>> Shannon, I didn't see any rationale spelled out for this fundamental
>> difference to x86.
>>
> So the suggestion is to just edit the tables in place rather than
> copying them and relying on Xen having already read whatever
> 'original' information it needs before modifying them?

No, the suggestion is to leave the tables alone and teach Dom0
to ignore parts it has no interest in.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 10:01,  wrote:
> On Thu, Aug 13, 2015 at 8:41 AM, Jan Beulich  wrote:
> On 12.08.15 at 18:18,  wrote:
>>> Nonetheless, we still have to copy some table in Xen in order to modify
>>> them and/or new one. I have in mind the FADT table to set the hypervisor
>>> field and hiding the hypervisor specific data (GIC hyp, timer hyp...) to
>>> avoid the kernel thinks there is hyp mode available.
>>
>> "Have to"? Or rather "would like to"? As said in another reply to
>> Shannon, I didn't see any rationale spelled out for this fundamental
>> difference to x86.
>>
> So the suggestion is to just edit the tables in place rather than
> copying them and relying on Xen having already read whatever
> 'original' information it needs before modifying them?

And just to clarify / amend my reply just sent: I also don't see why
you would need to expose the System Table to Dom0. Did anyone
of you look at how we draw the boundary between Xen and Dom0
on x86? If not, I'd suggest doing so. If so, I'd like to ask for
proper reasoning for any item of data exposed to Dom0 beyond
what we currently expose.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 v2] libxl: fix libxl__build_hvm error code return path

2015-08-13 Thread Wei Liu
This one missed libxl__domain_firmware call. I will respin.

Not sure why gcc doesn't complain.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Failure to boot HVM guest with more than 32 VCPUS

2015-08-13 Thread Vitaly Kuznetsov
"Hao, Xudong"  writes:

>> -Original Message-
>> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>> Sent: Wednesday, August 12, 2015 5:40 PM
>> To: Hao, Xudong
>> Cc: xen-devel@lists.xen.org
>> Subject: Re: [Xen-devel] Failure to boot HVM guest with more than 32
>> VCPUS
>> 
>> "Hao, Xudong"  writes:
>> 
>> > Hi,
>> >
>> > In X86_64 platform, we noticed an issue that Xen boot a RHEL6u6 or
>> > Fedora22 guest, when configure the VCPU more than 32, the guest will
>> > fail to boot up.
>> 
>> The issue is well-known for RHEL6.6 (and is fixed in 6.7 and in 6.6.z)
>> but Fedora22 should boot. The log below is from RHEL6.6, can you please
>> provide one from Fedora?
>> 
>
> Vitaly,
>
> Thanks for quick response. Is the fix in guest pv driver?

The fix is in RHEL6 kernel, prior to it there is no support for
vcpu_info outside of shared_info and so only 32 vcpus are supported.

> Will catch Fedora22 log.

Please do, this is probably a different issue.

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH OSSTEST] ts-xen-install: Install netcat-openbsd

2015-08-13 Thread Ian Campbell
On Wed, 2015-08-12 at 17:09 +0100, Ian Campbell wrote:
> This is required by libvirt for live migration (netcat-traditional
> doesn't cut it).
> 
> netcat-openbsd has higher update-alternatives priority, so it will be
> used if installed.
> 
> Signed-off-by: Ian Campbell 
> Cc: Jim Fehlig 

I ran an adhoc test with this applied which resulted in:

2015-08-12 18:07:26 Z executing ssh ... root@10.80.229.144 virsh migrate --live 
debian.guest.osstest xen+ssh://lace-bug
error: unable to connect to 'lace-bug.uk.xensource.com:49152': Invalid argument

(note that the domainname for this osstest instance is "xs.citrite.net",
there is such a domain as "uk.xensource.com" which may have come from DHCP
or DNS or something).

So I switched to using $dho->{Fqdn} instead of just $dho->{Name} and tried
again resulting in:

2015-08-12 21:49:22 Z executing ssh ... root@10.80.229.144 virsh migrate --live 
debian.guest.osstest xen+ssh://lace-bug.xs.citrite.net
error: unable to connect to 'lace-bug.uk.xensource.com:49152': Invalid argument

Which is pretty WTF... Logs for this one are at 
http://xenbits.xen.org/people/ianc/tmp/37828/
http://xenbits.xen.org/people/ianc/tmp/37828/test-amd64-amd64-libvirt-pair/info.html

I'm doing an experiment now with $dho->{Ip} but that seems like it would
just be papering over the issue, whatever it is.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.6 v3] libxl: fix libxl__build_hvm error code return path

2015-08-13 Thread Wei Liu
In 25652f23 ("tools/libxl: detect and avoid conflicts with RDM"), new
code was added to use rc to store libxl function call return value,
which complied to libxl coding style. That patch, however, didn't change
other locations where return value was stored in ret. In the end
libxl__build_hvm could return 0 when it failed.

Change all assignments to ret to assignment to rc and abolish ret.

Signed-off-by: Wei Liu 
---
v3: store return value for libxl__domain_firmware in rc, too.
---
 tools/libxl/libxl_dom.c | 47 +--
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index e1f11a3..1e5bb0b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -912,7 +912,7 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 {
 libxl_ctx *ctx = libxl__gc_owner(gc);
 struct xc_hvm_build_args args = {};
-int ret, rc = ERROR_FAIL;
+int rc;
 uint64_t mmio_start, lowmem_end, highmem_end;
 libxl_domain_build_info *const info = &d_config->b_info;
 
@@ -932,10 +932,13 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 if (max_ram_below_4g < HVM_BELOW_4G_MMIO_START)
 args.mmio_size = info->u.hvm.mmio_hole_memkb << 10;
 }
-if (libxl__domain_firmware(gc, info, &args)) {
+
+rc = libxl__domain_firmware(gc, info, &args);
+if (rc) {
 LOG(ERROR, "initializing domain firmware failed");
 goto out;
 }
+
 if (args.mem_target == 0)
 args.mem_target = args.mem_size;
 if (args.mmio_size == 0)
@@ -963,15 +966,15 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 if (info->num_vnuma_nodes != 0) {
 int i;
 
-ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
-if (ret) {
-LOGEV(ERROR, ret, "hvm build vmemranges failed");
+rc = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
+if (rc) {
+LOGEV(ERROR, rc, "hvm build vmemranges failed");
 goto out;
 }
-ret = libxl__vnuma_config_check(gc, info, state);
-if (ret) goto out;
-ret = set_vnuma_info(gc, domid, info, state);
-if (ret) goto out;
+rc = libxl__vnuma_config_check(gc, info, state);
+if (rc) goto out;
+rc = set_vnuma_info(gc, domid, info, state);
+if (rc) goto out;
 
 args.nr_vmemranges = state->num_vmemranges;
 args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) *
@@ -991,9 +994,9 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 args.vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
 }
 
-ret = xc_hvm_build(ctx->xch, domid, &args);
-if (ret) {
-LOGEV(ERROR, ret, "hvm building failed");
+rc = xc_hvm_build(ctx->xch, domid, &args);
+if (rc) {
+LOGEV(ERROR, rc, "hvm building failed");
 goto out;
 }
 
@@ -1002,22 +1005,22 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
-ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
-   &state->store_mfn, state->console_port,
-   &state->console_mfn, state->store_domid,
-   state->console_domid);
-if (ret) {
-LOGEV(ERROR, ret, "hvm build set params failed");
+rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
+  &state->store_mfn, state->console_port,
+  &state->console_mfn, state->store_domid,
+  state->console_domid);
+if (rc) {
+LOGEV(ERROR, rc, "hvm build set params failed");
 goto out;
 }
 
-ret = hvm_build_set_xs_values(gc, domid, &args);
-if (ret) {
-LOG(ERROR, "hvm build set xenstore values failed (ret=%d)", ret);
+rc = hvm_build_set_xs_values(gc, domid, &args);
+if (rc) {
+LOG(ERROR, "hvm build set xenstore values failed (rc=%d)", rc);
 goto out;
 }
 
-return 0;
+rc = 0;
 out:
 return rc;
 }
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 01/17] VT-d Posted-intterrupt (PI) design

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 03:37,  wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
>> Sent: Wednesday, August 12, 2015 11:35 PM
>> And with those tiny little changes:
>> 
>> Reviewed-by: Konrad Rzeszutek Wilk 
> 
> Thanks a lot for your review on this whole series, Konrad!

Especially with a reply like this, you should have - just like I'm doing
now - trimmed almost all of the previous mail's body from your reply.
There's absolutely no reason to have all readers scroll through
many pages just to find this one line response.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 11/17] vt-d: Add API to update IRTE when VT-d PI is used

2015-08-13 Thread Jan Beulich
>>> On 12.08.15 at 18:23,  wrote:
> On Wed, Aug 12, 2015 at 10:35:32AM +0800, Feng Wu wrote:
>> +GET_IREMAP_ENTRY(ir_ctrl->iremap_maddr, remap_index, iremap_entries, p);
>> +
>> +old_ire = new_ire = *p;
>> +
>> +/* Setup/Update interrupt remapping table entry. */
>> +setup_posted_irte(&new_ire, pi_desc, gvec);
>> +ret = cmpxchg16b(p, &old_ire, &new_ire);
>> +
>> +ASSERT(ret == *(__uint128_t *)&old_ire);
>> +
>> +iommu_flush_cache_entry(p, sizeof(*p));
> 
> The other use sites of iommu_flush_cache_entry are sizeof(struct 
> iremap_entry).
> I think if you are doing this modification - then you should also have an
> patch to modify the other code to be in sync.
> 
> Or just use the sizeof(struct ...).

I asked for this (to avoid proliferation of the bad style); I
certainly wouldn't mind another cleanup patch to deal with the
other use sites, but I certainly don't want to see this go back
to sizeof(struct ...).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 3/3] libxc: fix vNUMA memory allocation

2015-08-13 Thread Dario Faggioli
On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> We should use new_memflags in xc_domain_populate_physmap. That variable
> contains node information.
> 
> Reported-by: Boris Ostrovsky 
> Signed-off-by: Wei Liu 
>
Reviewed-by: Dario Faggioli 

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 1/3] xl: fix vNUMA vdistance parsing

2015-08-13 Thread Dario Faggioli
On Wed, 2015-08-12 at 20:35 +0100, Wei Liu wrote:
> We should parse the output from splitting function, not the original
> string.
> 
> Signed-off-by: Wei Liu 
>
Reviewed-by: Dario Faggioli 

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 12/17] Update IRTE according to guest interrupt config changes

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 03:37,  wrote:
>> From: Konrad Rzeszutek Wilk [mailto:konrad.w...@oracle.com]
>> Sent: Thursday, August 13, 2015 12:43 AM
>> On Wed, Aug 12, 2015 at 10:35:33AM +0800, Feng Wu wrote:
>> > +}
>> > +
>> > +if ( dest_vcpu_num != 0 )
>> > +{
>> > +idx = 0;
>> > +
>> > +for ( i = gvec % dest_vcpu_num; i >= 0; i--)
>> 
>> That loop is not good as it will overflow.
>> 
>> Imagine gvec = 40, dest_vcpu_num = 2
>> On first iteration i = 0, on the next i = -1 (aka 0xfff), and so on.
>> 
> 
> So we should define 'i' as int, right?

No, you should fix the loop. 'i' can't validly hold negative values.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Failure to boot HVM guest with more than 32 VCPUS

2015-08-13 Thread Hao, Xudong


> -Original Message-
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> Sent: Thursday, August 13, 2015 4:20 PM
> To: Hao, Xudong 
> Cc: xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] Failure to boot HVM guest with more than 32 VCPUS
> 
> "Hao, Xudong"  writes:
> 
> >> -Original Message-
> >> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> >> Sent: Wednesday, August 12, 2015 5:40 PM
> >> To: Hao, Xudong
> >> Cc: xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] Failure to boot HVM guest with more than 32
> >> VCPUS
> >>
> >> "Hao, Xudong"  writes:
> >>
> >> > Hi,
> >> >
> >> > In X86_64 platform, we noticed an issue that Xen boot a RHEL6u6 or
> >> > Fedora22 guest, when configure the VCPU more than 32, the guest
> >> > will fail to boot up.
> >>
> >> The issue is well-known for RHEL6.6 (and is fixed in 6.7 and in
> >> 6.6.z) but Fedora22 should boot. The log below is from RHEL6.6, can
> >> you please provide one from Fedora?
> >>
> >
> > Vitaly,
> >
> > Thanks for quick response. Is the fix in guest pv driver?
> 
> The fix is in RHEL6 kernel, prior to it there is no support for vcpu_info 
> outside of
> shared_info and so only 32 vcpus are supported.
> 
> > Will catch Fedora22 log.
> 
> Please do, this is probably a different issue.
> 

Attach the Fedora22 log, a different issue.

> --
>   Vitaly


fedora22.log
Description: fedora22.log
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Xen-users] "xl restore" leaks a file descriptor?

2015-08-13 Thread Ian Campbell
On Wed, 2015-08-12 at 18:12 +0100, Wei Liu wrote:
> On Wed, Aug 12, 2015 at 11:04:25AM +0100, Ian Campbell wrote:
> [...]
> > > > 
> > > > As Andy says I think we want restore_fd in the check, I can't see 
> > > > any
> > > > reason we wouldn't want to close the socket too.
> > > > 
> > > 
> > > Do you mean migrate_fd when you say "socket"?
> > 
> > In the migrate case we do "restore_fd = migrate_fd;", so yes, 
> > indirectly.
> > 
> > 
> > >  I tried that, but that led
> > > to failure because toolstack still needs to get controlling 
> > > information
> > > out of it (the "GO" message).
> > > 
> > > Maybe I close this too early.
> > 
> > Right.
> > 
> 
> I look at the code. Even if we should close that socket, it should not
> happen inside create_domain, because the caller (migrate_receive) needs
> that fd.
> 
> IMO create_domain should only close restore_fd if that fd is opened by
> itself.

That makes sense, yes. The close should probably have an associated comment
since this will be a bit subtle.

Perhaps rather than trying to repeat the conditions which lead to it being
opened we should just do:
int restore_fd_to_close = -1;
...
restore_fd_to_close = restore_fd = open(restore_file, O_RDONLY);
...
if (restore_fd_to_close >= 0) {
close(restore_fd_to_close);
restore_fd_to_close = -1;
}

Strictly speaking we ought to check the return of close too I suppose.

> Whether we should close send_fd and recv_fd in migrate_receive is
> another matter. I don't think we should. They are just stdin and stdout,
> not closing them wouldn't cause us any trouble.

The trouble they cause is holding kernel resources associated with the
socket, not to mention leaving a possible (perhaps unlikely) avenue of
attack from the network to a process which isn't expecting it...

Any we should be redirecting those to /dev/null as part of daemonising as a
matter of course and it looks like do_daemonize does that, so this is
already fine I think.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Ian Campbell
On Fri, 2015-07-24 at 06:37 -0600, Jan Beulich wrote:
> The function shouldn't return NULL after having obtained a reference,
> or else the caller won't know to drop it.

> Also its result shouldn't be ignored - if calling code is certain that
> a page already has a non-zero refcount, it better ASSERT()s so.

How is page_get_owner_and_reference sure the reference count was not zero
on entry? (WRT the call to page_get_owner in page_get_owner_and_reference
itself) Or have I misunderstood the assertion being made here?

Likewise where does the preexisting reference in the grant table case come
from? Perhaps a comment alongside the ASSERT would make it clearer what the
assert was doing to future readers ("/* Reference count must have already
been >=1 due to XXX, so this cannot have failed */") or some such.

> 
> Finally this as well as get_page() and put_page() are required to be
> available on all architectures - move the declarations to xen/mm.h.
> 
> Signed-off-by: Jan Beulich 
> 
> --- a/xen/arch/arm/guestcopy.c
> +++ b/xen/arch/arm/guestcopy.c
> @@ -1,10 +1,8 @@
> -#include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> -
> -#include 
>  #include 
>  
>  static unsigned long raw_copy_to_guest_helper(void *to, const void 
> *from,
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -1170,6 +1170,7 @@ long arch_memory_op(int op, XEN_GUEST_HA
>  struct domain *page_get_owner_and_reference(struct page_info *page)
>  {
>  unsigned long x, y = page->count_info;
> +struct domain *owner;
>  
>  do {
>  x = y;
> @@ -1182,7 +1183,10 @@ struct domain *page_get_owner_and_refere
>  }
>  while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
>  
> -return page_get_owner(page);
> +owner = page_get_owner(page);
> +ASSERT(owner);
> +
> +return owner;
>  }
>  
>  void put_page(struct page_info *page)
> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -2039,6 +2039,7 @@ void put_page(struct page_info *page)
>  struct domain *page_get_owner_and_reference(struct page_info *page)
>  {
>  unsigned long x, y = page->count_info;
> +struct domain *owner;
>  
>  do {
>  x = y;
> @@ -2052,7 +2053,10 @@ struct domain *page_get_owner_and_refere
>  }
>  while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
>  
> -return page_get_owner(page);
> +owner = page_get_owner(page);
> +ASSERT(owner);
> +
> +return owner;
>  }
>  
>  
> --- a/xen/common/grant_table.c
> +++ b/xen/common/grant_table.c
> @@ -2245,7 +2245,8 @@ __acquire_grant_for_copy(
>  {
>  ASSERT(mfn_valid(act->frame));
>  *page = mfn_to_page(act->frame);
> -(void)page_get_owner_and_reference(*page);
> +td = page_get_owner_and_reference(*page);
> +ASSERT(td);
>  }
>  
>  act->pin += readonly ? GNTPIN_hstr_inc : GNTPIN_hstw_inc;
> --- a/xen/include/asm-arm/mm.h
> +++ b/xen/include/asm-arm/mm.h
> @@ -275,10 +275,6 @@ static inline void *page_to_virt(const s
>  return mfn_to_virt(page_to_mfn(pg));
>  }
>  
> -struct domain *page_get_owner_and_reference(struct page_info *page);
> -void put_page(struct page_info *page);
> -int  get_page(struct page_info *page, struct domain *domain);
> -
>  struct page_info *get_page_from_gva(struct domain *d, vaddr_t va,
>  unsigned long flags);
>  
> --- a/xen/include/asm-x86/mm.h
> +++ b/xen/include/asm-x86/mm.h
> @@ -352,9 +352,6 @@ const unsigned long *get_platform_badpag
>  int page_lock(struct page_info *page);
>  void page_unlock(struct page_info *page);
>  
> -struct domain *page_get_owner_and_reference(struct page_info *page);
> -void put_page(struct page_info *page);
> -int  get_page(struct page_info *page, struct domain *domain);
>  void put_page_type(struct page_info *page);
>  int  get_page_type(struct page_info *page, unsigned long type);
>  int  put_page_type_preemptible(struct page_info *page);
> --- a/xen/include/xen/mm.h
> +++ b/xen/include/xen/mm.h
> @@ -45,6 +45,7 @@
>  #ifndef __XEN_MM_H__
>  #define __XEN_MM_H__
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -77,9 +78,12 @@ TYPE_SAFE(unsigned long, pfn);
>  #undef pfn_t
>  #endif
>  
> -struct domain;
>  struct page_info;
>  
> +struct domain *__must_check page_get_owner_and_reference(struct 
> page_info *);
> +void put_page(struct page_info *);
> +int get_page(struct page_info *, struct domain *);
> +
>  /* Boot-time allocator. Turns into generic allocator after bootstrap. */
>  void init_boot_pages(paddr_t ps, paddr_t pe);
>  unsigned long alloc_boot_pages(
> 
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Xen-users] "xl restore" leaks a file descriptor?

2015-08-13 Thread Wei Liu
On Thu, Aug 13, 2015 at 09:39:36AM +0100, Ian Campbell wrote:
> On Wed, 2015-08-12 at 18:12 +0100, Wei Liu wrote:
> > On Wed, Aug 12, 2015 at 11:04:25AM +0100, Ian Campbell wrote:
> > [...]
> > > > > 
> > > > > As Andy says I think we want restore_fd in the check, I can't see 
> > > > > any
> > > > > reason we wouldn't want to close the socket too.
> > > > > 
> > > > 
> > > > Do you mean migrate_fd when you say "socket"?
> > > 
> > > In the migrate case we do "restore_fd = migrate_fd;", so yes, 
> > > indirectly.
> > > 
> > > 
> > > >  I tried that, but that led
> > > > to failure because toolstack still needs to get controlling 
> > > > information
> > > > out of it (the "GO" message).
> > > > 
> > > > Maybe I close this too early.
> > > 
> > > Right.
> > > 
> > 
> > I look at the code. Even if we should close that socket, it should not
> > happen inside create_domain, because the caller (migrate_receive) needs
> > that fd.
> > 
> > IMO create_domain should only close restore_fd if that fd is opened by
> > itself.
> 
> That makes sense, yes. The close should probably have an associated comment
> since this will be a bit subtle.
> 
> Perhaps rather than trying to repeat the conditions which lead to it being
> opened we should just do:
> int restore_fd_to_close = -1;
> ...
> restore_fd_to_close = restore_fd = open(restore_file, O_RDONLY);
> ...
> if (restore_fd_to_close >= 0) {
> close(restore_fd_to_close);
> restore_fd_to_close = -1;
> }
> 
> Strictly speaking we ought to check the return of close too I suppose.
> 

What would we do in case close fails?

> > Whether we should close send_fd and recv_fd in migrate_receive is
> > another matter. I don't think we should. They are just stdin and stdout,
> > not closing them wouldn't cause us any trouble.
> 
> The trouble they cause is holding kernel resources associated with the
> socket, not to mention leaving a possible (perhaps unlikely) avenue of
> attack from the network to a process which isn't expecting it...
> 
> Any we should be redirecting those to /dev/null as part of daemonising as a
> matter of course and it looks like do_daemonize does that, so this is
> already fine I think.
> 

Right.

Wei.

> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [BUG][4.5.1]xl cpupool-create segfault (with config file parameter)

2015-08-13 Thread Dario Faggioli
On Wed, 2015-08-12 at 12:28 +0100, Ian Campbell wrote:
> On Wed, 2015-08-12 at 13:03 +0200, Benjamin Beier wrote:

> > You never get any output apart from "segmentation fault".
> > If you create exactly the same cpupool without using a config file it 
> > works absolutely fine.
> > Tested on multiple systems and it seems to be reproducible for everyone.
> > STrace shows that the command fails right after reading the content of 
> > the config file.
> 
> Please can you capture a backtrace by running it under gdb. Valgrind might
> also have something interesting to say.
> 
Yep, a gdb backtrace would be really helpful.

I've 'played' with cpupools quite a bit in the last few months (during
the whole Xen 4.6 release cycle), and I don't remember seeing anything
like this... :-/

> FWIW this doesn't happen on the current development branch:
> # cat foo
> Testing
> # xl cpupool-create foo
> foo:1: config parsing error near `Testing': lexical error
> Failed to parse config file: Invalid argument
> 
Just tried with the example cpupool config file, from /etc/xen/cpupool,
and it works. I'm also on staging (i.e., 4-6.-rc1):

  root@Zhaman:~# cat /etc/xen/cpupool  
  #
  # Configuration setup for 'xm cpupool-create' or 'xl cpupool-create'.
  # This script sets the parameters used when a cpupool is created using
  # 'xm cpupool-create' or 'xl cpupool-create'.
  # You use a separate script for each cpupool you want to create, or 
  # you can set the parameters for the cpupool on the xm command line.
  #

  # the name of the new cpupool
  name = "Example-Cpupool"

  # the scheduler to use: valid are e.g. credit, credit2 and rtds
  sched = "credit"

  # list of cpus to use
  cpus = ["2", "3"]

  root@Zhaman:~# xl cpupool-create /etc/xen/cpupool
  cpu 2 illegal or not free
  root@Zhaman:~# xl cpupool-cpu-remove Pool-0 2,3
  root@Zhaman:~# xl cpupool-create /etc/xen/cpupool
  Using config file "/etc/xen/cpupool"
  cpupool name:   Example-Cpupool
  scheduler:  credit
  number of cpus: 2
  root@Zhaman:~# xl cpupool-list 
  Name   CPUs   Sched Active   Domain count
  Pool-0  14credit   y  1
  Example-Cpupool  2credit   y  0

> But I don't see any obviously related looking fixes in the commit log.
> 
Since 4.5, there has been changes touching both xl and libxl bits of
cpupools. E.g.:

  commit a86eecbbf5155aa5b4ec02b6c5e41baf1a7f49de
  xl: enable using ranges of pCPUs when creating cpupools

However, as I said, I don't remember facing (and fixing) such error.

> Was 4.5.0 ok?
> 
> Ian.

-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-08-13 Thread Dario Faggioli
On Wed, 2015-08-12 at 13:13 -0400, Konrad Rzeszutek Wilk wrote:
> On Wed, Aug 12, 2015 at 10:35:36AM +0800, Feng Wu wrote:
> > This patch adds the following arch hooks in scheduler:
> > - vmx_pre_ctx_switch_pi():
> > It is called before context switch, we update the posted
> > interrupt descriptor when the vCPU is preempted, go to sleep,
> > or is blocked.
> > 
> > - vmx_post_ctx_switch_pi()
> > It is called after context switch, we update the posted
> > interrupt descriptor when the vCPU is going to run.
> > 
> > - arch_vcpu_wake_prepare()
> > It will be called when waking up the vCPU, we update
> > the posted interrupt descriptor when the vCPU is unblocked.
> > 
> > CC: Keir Fraser 
> > CC: Jan Beulich 
> > CC: Andrew Cooper 
> > CC: Kevin Tian 
> > CC: George Dunlap 
> > CC: Dario Faggioli 
> > Sugguested-by: Dario Faggioli 
> > Signed-off-by: Feng Wu 
>
With Konrad's points addressed:

Reviewed-by: Dario Faggioli 

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 00:41 -0600, Jan Beulich wrote:
> > 
> > Nonetheless, we still have to copy some table in Xen in order to modify
> > them and/or new one. I have in mind the FADT table to set the hypervisor
> > field and hiding the hypervisor specific data (GIC hyp, timer hyp...) to
> > avoid the kernel thinks there is hyp mode available.
> 
> "Have to"? Or rather "would like to"? As said in another reply to
> Shannon, I didn't see any rationale spelled out for this fundamental
> difference to x86.

I think the fundamental difference is that h/w virt features are not
(always) "discoverable" through h/w interfaces on ARM whereas they are
visible in e.g. cpuid on x86 and not described in ACPI (x86 is only just
gaining hardware support for virtualised interrupts so perhaps this will
change? I think it doesn't have any hypervisor dedicated timer sources).

So on ARM the firmware tables contain things like the additional
virtualisation register regions in the interrupt controller description and
the hypervisor timer interrupt in the timer block description. So we would
like to hide these from dom0.

Perhaps instead we should teach dom0 to notice that it was launched in
Kernel mode rather than Hyp mode (which is detectable) and therefore ignore
these unusable resources. Now you mention it that does sound sensible and I
imagine is even already (close to) true for a Linux kernel to handle e.g.
older firmware with newer device tree.

The other reason for modification is to hide the Xen console device (i.e.
one UART) from the dom0 kernel, since it is unusable. How does that work on
x86? Do you just not bother and you expect the admin to arrange the
configuration to work or is there some other trick?

BTW, IIRC x86 does modify at least one ACPI table which is the DMAR (I
think), to hide the IOMMU from the guest? That's another table we would
want to frob on ARM I think (or it's equivalent, which I think is IORT).

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH V6 3/7] libxl: add pvusb API

2015-08-13 Thread Wei Liu
On Tue, Aug 11, 2015 at 08:24:01PM -0600, Chun Yan Liu wrote:
> 
> 
> >>> On 8/11/2015 at 07:27 PM, in message
> <2015082702.gf7...@zion.uk.xensource.com>, Wei Liu 
> wrote: 
> > On Mon, Aug 10, 2015 at 06:35:24PM +0800, Chunyan Liu wrote: 
> > > Add pvusb APIs, including: 
> > >  - attach/detach (create/destroy) virtual usb controller. 
> > >  - attach/detach usb device 
> > >  - list usb controller and usb devices 
> > >  - some other helper functions 
> > >  
> > > Signed-off-by: Chunyan Liu  
> > > Signed-off-by: Simon Cao  
> > >  
> > > --- 
> > > changes: 
> > >   - Address George's comments: 
> > >   * Update libxl_device_usb_getinfo to read ctrl/port only and 
> > > get other information. 
> > >   * Update backend path according to xenstore frontend 'xxx/backend' 
> > > entry instead of using TOOLSTACK_DOMID. 
> > >   * Use 'type' to indicate qemu/pv instead of previous naming 'protocol'. 
> > >   * Add USB 'devtype' union, currently only includes "hostdev" 
> > >  
> >  
> > I will leave this to Ian and George since they had strong opinions on 
> > this. 
> >  
> > I only skimmed this patch. Some comments below. 
> >  
> > [...] 
> > > + 
> > > +int libxl_device_usb_getinfo(libxl_ctx *ctx, uint32_t domid, 
> > > + libxl_device_usb *usb, 
> > > + libxl_usbinfo *usbinfo); 
> > > + 
> > >  /* Network Interfaces */ 
> > >  int libxl_device_nic_add(libxl_ctx *ctx, uint32_t domid, 
> > > libxl_device_nic  
> > *nic, 
> > >   const libxl_asyncop_how *ao_how) 
> > > diff --git a/tools/libxl/libxl_device.c b/tools/libxl/libxl_device.c 
> > > index bee5ed5..935f25b 100644 
> > > --- a/tools/libxl/libxl_device.c 
> > > +++ b/tools/libxl/libxl_device.c 
> > > @@ -676,6 +676,10 @@ void libxl__devices_destroy(libxl__egc *egc,  
> > libxl__devices_remove_state *drs) 
> > >  aodev->action = LIBXL__DEVICE_ACTION_REMOVE; 
> > >  aodev->dev = dev; 
> > >  aodev->force = drs->force; 
> > > +if (dev->backend_kind == LIBXL__DEVICE_KIND_VUSB) { 
> > > +libxl__initiate_device_usbctrl_remove(egc, aodev); 
> > > +continue; 
> > > +} 
> >  
> > Is there a risk that this races with individual device removal? I think 
> > you get away with it because removal of individual device is idempotent? 
> 
> You mean races with other device removal (like 'vbd') ? Yes, it is idempotent.
> Only for 'vusb' (corresponding to USB controller), before removing USB 
> controller
> it will first removing all USB devices under it. 
> 

No. What I mean is, the removal of usbctrl triggers removal of all
assigned usb devices. And then this function initiates removal of
assigned usb devices again. Is this a possible scenario?

> >  
> > >  libxl__initiate_device_remove(egc, aodev); 
> > >  } 
> > >  } 
> > > diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h 
> > > index f98f089..5be3b3a 100644 
> > > --- a/tools/libxl/libxl_internal.h 
> > > +++ b/tools/libxl/libxl_internal.h 
> > > @@ -2553,6 +2553,14 @@ _hidden void libxl__device_vtpm_add(libxl__egc 
> > > *egc,  
> > uint32_t domid, 
> > > libxl_device_vtpm *vtpm, 
> > > libxl__ao_device *aodev); 
> > >   
> > > +_hidden void libxl__device_usbctrl_add(libxl__egc *egc, uint32_t domid, 
> > > +   libxl_device_usbctrl *usbctrl, 
> > > +   libxl__ao_device *aodev); 
> > > + 
> > > +_hidden void libxl__device_usb_add(libxl__egc *egc, uint32_t domid, 
> > > +   libxl_device_usb *usb, 
> > > +   libxl__ao_device *aodev); 
> > > + 
> > >  /* Internal function to connect a vkb device */ 
> > >  _hidden int libxl__device_vkb_add(libxl__gc *gc, uint32_t domid, 
> > >libxl_device_vkb *vkb); 
> > > @@ -2585,6 +2593,13 @@ _hidden void  
> > libxl__wait_device_connection(libxl__egc*, 
> > >  _hidden void libxl__initiate_device_remove(libxl__egc *egc, 
> > > libxl__ao_device *aodev); 
> > >   
> > > +_hidden int libxl__device_from_usbctrl(libxl__gc *gc, uint32_t domid, 
> > [...] 
> > > +void libxl__device_usb_add(libxl__egc *egc, uint32_t domid, 
> > > +   libxl_device_usb *usb, 
> > > +   libxl__ao_device *aodev) 
> > > +{ 
> > > +STATE_AO_GC(aodev->ao); 
> > > +int rc = -1; 
> > > +char *busid = NULL; 
> > > + 
> > > +assert(usb->u.hostdev.hostbus > 0 && usb->u.hostdev.hostaddr > 0); 
> > > + 
> > > +busid = usb_busaddr_to_busid(gc, usb->u.hostdev.hostbus, 
> > > + usb->u.hostdev.hostaddr); 
> > > +if (!busid) { 
> > > +LOG(ERROR, "USB device doesn'

Re: [Xen-devel] [PATCH V6 2/7] libxl_read_file_contents: add new entry to read sysfs file

2015-08-13 Thread Wei Liu
On Tue, Aug 11, 2015 at 08:37:09PM -0600, Chun Yan Liu wrote:
[...]
> > > + 
> > > +if (rs < datalen) { 
> > > +if (ferror(f)) { 
> > >  LOGE(ERROR, "failed to read %s", filename); 
> > > -else if (feof(f)) 
> > > -LOG(ERROR, "%s changed size while we were reading it", 
> > > - filename); 
> > > -else 
> > > +goto xe; 
> > > +} else if (feof(f)) { 
> > > +if (tolerate_shrinking_file) { 
> > > +datalen = rs; 
> > > +} else { 
> > > +LOG(ERROR, "%s shrunk size while we were reading 
> > > it", 
> > > +filename); 
> > > +goto xe; 
> > > +} 
> > > +} else { 
> > >  abort(); 
> > > -goto xe; 
> > > +} 
> >  
> > This is a bit bikeshedding, but you can leave "goto xe" out of two `if' 
> > to reduce patch size. 
> 
> I guess you mean if (ferror(f)) and if (feof(f)) ? We can't leave 'goto xe' 
> outside,
> since in if (feof(f)) && if (tolerate_shrinking_file), it's not error but an 
> expected
> result in sysfs case.   
> 

Oh, right. I missed that tolerate_shrinking_file check. Sorry for the
noise.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 15/17] vmx: Add some scheduler hooks for VT-d posted interrupts

2015-08-13 Thread Wu, Feng


> -Original Message-
> From: Dario Faggioli [mailto:dario.faggi...@citrix.com]
> Sent: Thursday, August 13, 2015 5:06 PM
> To: Wu, Feng
> Cc: Konrad Rzeszutek Wilk; xen-devel@lists.xen.org; Tian, Kevin; Keir Fraser;
> George Dunlap; Andrew Cooper; Jan Beulich
> Subject: Re: [Xen-devel] [PATCH v5 15/17] vmx: Add some scheduler hooks for
> VT-d posted interrupts
> 
> > > Signed-off-by: Feng Wu 
> >
> With Konrad's points addressed:
> 
> Reviewed-by: Dario Faggioli 

Thanks for this!

-Feng

> 
> Regards,
> Dario
> --
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Xen-users] "xl restore" leaks a file descriptor?

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 09:50 +0100, Wei Liu wrote:
> On Thu, Aug 13, 2015 at 09:39:36AM +0100, Ian Campbell wrote:
> > On Wed, 2015-08-12 at 18:12 +0100, Wei Liu wrote:
> > > On Wed, Aug 12, 2015 at 11:04:25AM +0100, Ian Campbell wrote:
> > > [...]
> > > > > > 
> > > > > > As Andy says I think we want restore_fd in the check, I can't 
> > > > > > see 
> > > > > > any
> > > > > > reason we wouldn't want to close the socket too.
> > > > > > 
> > > > > 
> > > > > Do you mean migrate_fd when you say "socket"?
> > > > 
> > > > In the migrate case we do "restore_fd = migrate_fd;", so yes, 
> > > > indirectly.
> > > > 
> > > > 
> > > > >  I tried that, but that led
> > > > > to failure because toolstack still needs to get controlling 
> > > > > information
> > > > > out of it (the "GO" message).
> > > > > 
> > > > > Maybe I close this too early.
> > > > 
> > > > Right.
> > > > 
> > > 
> > > I look at the code. Even if we should close that socket, it should 
> > > not
> > > happen inside create_domain, because the caller (migrate_receive) 
> > > needs
> > > that fd.
> > > 
> > > IMO create_domain should only close restore_fd if that fd is opened 
> > > by
> > > itself.
> > 
> > That makes sense, yes. The close should probably have an associated 
> > comment
> > since this will be a bit subtle.
> > 
> > Perhaps rather than trying to repeat the conditions which lead to it 
> > being
> > opened we should just do:
> > int restore_fd_to_close = -1;
> > ...
> > restore_fd_to_close = restore_fd = open(restore_file, O_RDONLY);
> > ...
> > if (restore_fd_to_close >= 0) {
> > close(restore_fd_to_close);
> > restore_fd_to_close = -1;
> > }
> > 
> > Strictly speaking we ought to check the return of close too I suppose.
> > 
> 
> What would we do in case close fails?

Maybe just log?

http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html says
we can get:

EBADF, which would be an error in our code.
EINTR, ...
EIO, which would be from disk full or a disk dying or something.

We (and most code in general) tends to not worry about close failing too
much, there's an argument for that too...

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Failure to boot HVM guest with more than 32 VCPUS

2015-08-13 Thread Vitaly Kuznetsov
"Hao, Xudong"  writes:

>
> Attach the Fedora22 log, a different issue.
>

Here is the crash:

[ 6399.814801] xen_netfront: Initialising Xen virtual ethernet driver
[ 6399.814838] [drm] Initialized drm 1.1.0 20060810
[ 6399.838500] xen_netfront: can't alloc rx grant refs
[ 6399.838504] net eth0: only created 31 queues
[ 6399.839930] BUG: unable to handle kernel NULL pointer dereference at 
0018
[ 6399.839938] IP: [] netback_changed+0x8eb/0xef0 
[xen_netfront]
[ 6399.839940] PGD 0
[ 6399.839943] Oops:  [#1] SMP
[ 6399.839948] Modules linked in: drm xen_netfront(+) xen_blkfront(+) 
crc32c_intel ata_generic pata_acpi
[ 6399.839955] CPU: 0 PID: 241 Comm: xenwatch Not tainted 4.0.4-301.fc22.x86_64 
#1
[ 6399.839957] Hardware name: Xen HVM domU, BIOS 4.6.0-rc 08/13/2015
[ 6399.839959] task: 880107ffc590 ti: 88010792 task.ti: 
88010792
[ 6399.839963] RIP: 0010:[]  [] 
netback_changed+0x8eb/0xef0 [xen_netfront]
[ 6399.839965] RSP: :880107923d68  EFLAGS: 00010202
[ 6399.839966] RAX:  RBX: 8800e6568000 RCX: 0001
[ 6399.839968] RDX: 000e6568 RSI: 8800e65c40f8 RDI: 3f41
[ 6399.839969] RBP: 880107923df8 R08: c994 R09: 
[ 6399.839971] R10: ea0003995a00 R11: 0010 R12: 8800e65c4000
[ 6399.839972] R13: 8800e65c40f8 R14: 8800e6567000 R15: 00044000
[ 6399.839974] FS:  () GS:88010960() 
knlGS:
[ 6399.839975] CS:  0010 DS:  ES:  CR0: 80050033
[ 6399.839977] CR2: 0018 CR3: 01c0b000 CR4: 001407f0
[ 6399.839980] Stack:
[ 6399.839983]  880107923d78 8800e65c1e04 8800e65c1f21 
8800eb8b1400
[ 6399.839985]  8800e73d8000 8800eb8b1400 002107923dec 
8820
[ 6399.839988]  000107ffc590 0041 8800ebb261f1 
87ee3bd9
[ 6399.839989] Call Trace:
[ 6399.839998]  [] xenbus_otherend_changed+0xad/0x120
[ 6399.840014]  [] ? prepare_to_wait_event+0x87/0x100
[ 6399.840017]  [] ? unregister_xenbus_watch+0x1d0/0x1d0
[ 6399.840021]  [] backend_changed+0x13/0x20
[ 6399.840023]  [] xenwatch_thread+0x8f/0x150
[ 6399.840026]  [] ? wake_atomic_t_function+0x70/0x70
[ 6399.840031]  [] kthread+0xd8/0xf0
[ 6399.840035]  [] ? kthread_worker_fn+0x180/0x180
[ 6399.840041]  [] ret_from_fork+0x58/0x90
[ 6399.840044]  [] ? kthread_worker_fn+0x180/0x180
[ 6399.840056] Code: 48 8b 04 f0 48 83 f8 ff 0f 84 e2 04 00 00 48 89 c6 48 b8 
ff ff ff ff ff ff ff 3f 48 21 c6 e9 ad fd ff ff 49 8b 44 24 20 4c 89 ee <48> 8b 
78 18 e8 fc ee 2c e1 85 c0 0f 88 be fd ff ff 49 8b 44 24
[ 6399.840059] RIP  [] netback_changed+0x8eb/0xef0 
[xen_netfront]
[ 6399.840060]  RSP 
[ 6399.840061] CR2: 0018
[ 6399.840064] ---[ end trace 790f9d91e3f3059b ]---

The main issue (I suppose) is:
[ 6399.838500] xen_netfront: can't alloc rx grant refs

it happens when gnttab_alloc_grant_references() fails. As far as I can
see it can only happen when gnttab_expand() fails. Do you have anything
in your Xen dmesg ('xl dmegs' output on your host)? In case not, can you
try increasing you guest's memory to see if the issue goes away?

-- 
  Vitaly

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 11:05,  wrote:
> The other reason for modification is to hide the Xen console device (i.e.
> one UART) from the dom0 kernel, since it is unusable. How does that work on
> x86? Do you just not bother and you expect the admin to arrange the
> configuration to work or is there some other trick?

When it's I/O port based we simply disallow access to the ports (i.e.
Dom0 reads return all ones, Dom0 writes get dropped). When it's
MMIO based (which iirc became an option only in the not so distant
past) we can't always do that, since there could be other devices
on the same MMIO page. In any event we then pci_hide_devices() or
pci_ro_device() the device (see ns16550_init_postirq()) to limit the
damage Dom0 can do.

> BTW, IIRC x86 does modify at least one ACPI table which is the DMAR (I
> think), to hide the IOMMU from the guest? That's another table we would
> want to frob on ARM I think (or it's equivalent, which I think is IORT).

Eliminating that hack is supposed to be on the VT-d maintainers'
TODO list(s) - Dom0 has no business looking at that table (and its
AMD counterpart already isn't being fiddled with in the same way).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] x86/HVM: hvm_map_guest_frame_rw() adjustments

2015-08-13 Thread Wei Liu
On Wed, Aug 12, 2015 at 08:16:33AM -0600, Jan Beulich wrote:
> ... and its callers.
> 
> While all non-nested users are made fully honor the semantics of that
> type, doing so in the nested case seemed insane (if doable at all,
> considering VMCS shadowing), and hence there the respective operations
> are simply made fail.
> 
> One case not (yet) taken care of is that of a page getting transitioned
> to this type after a mapping got established.
> 
> Signed-off-by: Jan Beulich 

Is this a bug fix? I think so, but the title only says adjustment so I'd
better be sure.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 v3] libxl: fix libxl__build_hvm error code return path

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 09:24 +0100, Wei Liu wrote:
> @@ -1002,22 +1005,22 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t
> domid,
>  goto out;
>  }
>  
> -ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,

At least this one doesn't return ERROR_*, it returns 0 or -1, assigning
such to rc is wrong and ultimately returning it from this function even
worse.

Remind me why we aren't just taking Roger's original patch from <
1438942688-7610-2-git-send-email-roger@citrix.com> ? It seems we are
basically iterating towards it...

> -   &state->store_mfn, state->console_port,
> -   &state->console_mfn, state->store_domid,
> -   state->console_domid);
> -if (ret) {
> -LOGEV(ERROR, ret, "hvm build set params failed");
> +rc = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
> +  &state->store_mfn, state->console_port,
> +  &state->console_mfn, state->store_domid,
> +  state->console_domid);
> +if (rc) {
> +LOGEV(ERROR, rc, "hvm build set params failed");
>  goto out;
>  }
>  
> -ret = hvm_build_set_xs_values(gc, domid, &args);
> -if (ret) {
> -LOG(ERROR, "hvm build set xenstore values failed (ret=%d)", 
> ret);
> +rc = hvm_build_set_xs_values(gc, domid, &args);
> +if (rc) {
> +LOG(ERROR, "hvm build set xenstore values failed (rc=%d)", rc);
>  goto out;
>  }
>  
> -return 0;
> +rc = 0;
>  out:
>  return rc;
>  }

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/hvm: Fix non-onchangeonly CR write events logic

2015-08-13 Thread Wei Liu
On Wed, Aug 12, 2015 at 08:48:05PM +0300, Razvan Cojocaru wrote:
> On 08/12/2015 08:45 PM, Andrew Cooper wrote:
> > On 12/08/15 18:00, Razvan Cojocaru wrote:
> >> hvm_event_crX() already returns a bool_t to tell us whether an
> >> event will be sent out or not, so the extra check that value != old
> >> is not only useless, but also prevents non-onchangeonly events from
> >> being sent.
> >>
> >> Signed-off-by: Razvan Cojocaru 
> > 
> > Reviewed-by: Andrew Cooper 
> > 
> > I presume this is a 4.6 candidate?  If so, you need to CC the release
> > manager as we are into the RC's
> 
> Yes, it is. I've now CCd Wei, thank you for pointing that out!
> And sorry for missing this before.
> 

Release-acked-by: Wei Liu 

> 
> Thanks,
> Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] x86/p2m: clear_identity_p2m_entry() must cope with 'relaxed' RDM mode

2015-08-13 Thread Wei Liu
On Wed, Aug 12, 2015 at 09:05:43AM -0600, Jan Beulich wrote:
> Tearing down a 1:1 mapping that was never established isn't really nice
> (and in fact hits an ASSERT() in p2m_remove_page()). Convert from a
> wrapper macro to a proper function which then can take care of the
> situation.
> 
> Also take the opportunity to remove the 'page_order' parameter of
> clear_identity_p2m_entry(), to make it match set_identity_p2m_entry().
> 

Release-acked-by: Wei Liu 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread David Vrabel
On 12/08/15 20:36, Wei Liu wrote:
> Originally, if user didn't specify maxvcpus= in xl config file, the
> maximum size of vcpu bitmap was always equal to maximum number of pcpus.
> This might not be what user wants.
> 
> Calculate the maximum number of vcpus before allocating vcpu bitmap.
> This requires parsing the same config options twice. Extra a macro to do
> that.
[...]
> +#define PARSE_VNUMA_SPEC(body)

Just when I though libxl/xl's macro craziness couldn't get any worse...

Do you want anyone else to be able to understand this code?

It looks like you can parse the option into a list of (option, value)
tuples and then iterate over this list twice.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] xen: move perform_gunzip to common

2015-08-13 Thread Stefano Stabellini
On Thu, 13 Aug 2015, Jan Beulich wrote:
> >>> On 12.08.15 at 18:15,  wrote:
> > On Wed, 12 Aug 2015, Jan Beulich wrote:
> >> >>> On 12.08.15 at 16:47,  wrote:
> >> > @@ -31,8 +33,15 @@ typedef int decompress_fn(unsigned char *inbuf, 
> >> > unsigned int len,
> >> >   * dependent).
> >> >   */
> >> >  
> >> > -decompress_fn bunzip2, unxz, unlzma, unlzo, unlz4;
> >> > +decompress_fn perform_gunzip, bunzip2, unxz, unlzma, unlzo, unlz4;
> >> >  
> >> >  int decompress(void *inbuf, unsigned int len, void *outbuf);
> >> >  
> >> > +static inline unsigned long output_length(char *image, unsigned long 
> >> > image_len)
> >> 
> >> Neither of the callers gets moved out of bzimage.c - why does this
> >> function need to move?
> > 
> > We'll use it on arm.
> 
> Hmm, the way it is used on x86 makes it quite architecture specific
> (namely because of the assumption that the size is also in said
> place for non-gz compression methods). I'd therefore prefer code
> duplication over code sharing here. 

Actually after seeing the size and quality of the resulting patches, I
am starting to feel the same way.

In terms of code changes, I was thinking that the best result would be
moving the "boilerplate" code from xen/arch/x86/bzimage.c to
xen/common/inflate.c, see below, then the interface would become just
perform_gunzip and gzip_check. But I guess you wouldn't want inflate.c
to be modified, right?

Alternatively we could move it to a new file, let's call it gunzip.h,
that would #include "inflate.c", so:

bzimage.c -- #include --> gunzip.h -- #include --> inflate.c

And again we just leave the perform_gunzip and gzip_check calls in
bzimage.c.  What do you think?

---


diff --git a/xen/arch/x86/bzimage.c b/xen/arch/x86/bzimage.c
index c86c39e..cfd34d6 100644
--- a/xen/arch/x86/bzimage.c
+++ b/xen/arch/x86/bzimage.c
@@ -8,144 +8,8 @@
 #include 
 #include 
 
-#define HEAPORDER 3
-
-static unsigned char *__initdata window;
-#define memptr long
-static memptr __initdata free_mem_ptr;
-static memptr __initdata free_mem_end_ptr;
-
-#define WSIZE   0x8000
-
-static unsigned char *__initdata inbuf;
-static unsigned __initdata insize;
-
-/* Index of next byte to be processed in inbuf: */
-static unsigned __initdata inptr;
-
-/* Bytes in output buffer: */
-static unsigned __initdata outcnt;
-
-#define OF(args)args
-#define STATIC  static
-
-#define memzero(s, n)   memset((s), 0, (n))
-
-typedef unsigned char   uch;
-typedef unsigned short  ush;
-typedef unsigned long   ulg;
-
-#define INIT__init
-#define INITDATA__initdata
-
-#define get_byte()  (inptr < insize ? inbuf[inptr++] : fill_inbuf())
-
-/* Diagnostic functions */
-#ifdef DEBUG
-#  define Assert(cond, msg) do { if (!(cond)) error(msg); } while (0)
-#  define Trace(x)  do { fprintf x; } while (0)
-#  define Tracev(x) do { if (verbose) fprintf x ; } while (0)
-#  define Tracevv(x)do { if (verbose > 1) fprintf x ; } while (0)
-#  define Tracec(c, x)  do { if (verbose && (c)) fprintf x ; } while (0)
-#  define Tracecv(c, x) do { if (verbose > 1 && (c)) fprintf x ; } while (0)
-#else
-#  define Assert(cond, msg)
-#  define Trace(x)
-#  define Tracev(x)
-#  define Tracevv(x)
-#  define Tracec(c, x)
-#  define Tracecv(c, x)
-#endif
-
-static long __initdata bytes_out;
-static void flush_window(void);
-
-static __init void error(char *x)
-{
-panic("%s", x);
-}
-
-static __init int fill_inbuf(void)
-{
-error("ran out of input data");
-return 0;
-}
-
-
 #include "../../common/inflate.c"
 
-static __init void flush_window(void)
-{
-/*
- * The window is equal to the output buffer therefore only need to
- * compute the crc.
- */
-unsigned long c = crc;
-unsigned n;
-unsigned char *in, ch;
-
-in = window;
-for ( n = 0; n < outcnt; n++ )
-{
-ch = *in++;
-c = crc_32_tab[((int)c ^ ch) & 0xff] ^ (c >> 8);
-}
-crc = c;
-
-bytes_out += (unsigned long)outcnt;
-outcnt = 0;
-}
-
-static __init unsigned long output_length(char *image, unsigned long image_len)
-{
-return *(uint32_t *)&image[image_len - 4];
-}
-
-static __init int gzip_check(char *image, unsigned long image_len)
-{
-unsigned char magic0, magic1;
-
-if ( image_len < 2 )
-return 0;
-
-magic0 = (unsigned char)image[0];
-magic1 = (unsigned char)image[1];
-
-return (magic0 == 0x1f) && ((magic1 == 0x8b) || (magic1 == 0x9e));
-}
-
-static __init int perform_gunzip(char *output, char *image, unsigned long 
image_len)
-{
-int rc;
-
-if ( !gzip_check(image, image_len) )
-return 1;
-
-window = (unsigned char *)output;
-
-free_mem_ptr = (unsigned long)alloc_xenheap_pages(HEAPORDER, 0);
-free_mem_end_ptr = free_mem_ptr + (PAGE_SIZE << HEAPORDER);
-
-inbuf = (unsigned char *)image;
-insize = image_len;
-inptr = 0;
-
-makecrc();
-
-if ( gunzip() < 0 )
-{
-rc = -EINVAL;
-}
-else
-{
-rc 

Re: [Xen-devel] [PATCH] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 10:50,  wrote:
> On Fri, 2015-07-24 at 06:37 -0600, Jan Beulich wrote:
>> The function shouldn't return NULL after having obtained a reference,
>> or else the caller won't know to drop it.
> 
>> Also its result shouldn't be ignored - if calling code is certain that
>> a page already has a non-zero refcount, it better ASSERT()s so.
> 
> How is page_get_owner_and_reference sure the reference count was not zero
> on entry? (WRT the call to page_get_owner in page_get_owner_and_reference
> itself) Or have I misunderstood the assertion being made here?

There is still a NULL return path left (when the reference couldn't
be obtained, which includes the case where the reference count
was zero). 

> Likewise where does the preexisting reference in the grant table case come
> from? Perhaps a comment alongside the ASSERT would make it clearer what the
> assert was doing to future readers ("/* Reference count must have already
> been >=1 due to XXX, so this cannot have failed */") or some such.

It is my understanding that - as seen in the patch context - the
page's MFN being read from act->frame implies that (together with
the condition of the respective if() the else it's sitting in, namely the
fact that act->pin is non-zero when we get here). It would seem
odd to state in a comment what surrounding code does (I would
agree to the need of a comment if the requirement was satisfied in
a place far away).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Tim Deegan
At 09:50 +0100 on 13 Aug (1439459451), Ian Campbell wrote:
> On Fri, 2015-07-24 at 06:37 -0600, Jan Beulich wrote:
> > The function shouldn't return NULL after having obtained a reference,
> > or else the caller won't know to drop it.
> 
> > Also its result shouldn't be ignored - if calling code is certain that
> > a page already has a non-zero refcount, it better ASSERT()s so.
> 
> How is page_get_owner_and_reference sure the reference count was not zero
> on entry? (WRT the call to page_get_owner in page_get_owner_and_reference
> itself) Or have I misunderstood the assertion being made here?

That path is fine -- if the count was zero, it will return NULL before
trying to get the owner.  But I'm not convinced that the assertion is
correct -- are there really no pages anywhere that have a recount but
no owner?  What about, e.g. shadow pool pages or other uses of
MEMF_no_owner?  If a guest can cause Xen to try to get_page() one of
those, then we'd hit the new assertion.

How about unlikely(!owner) path that drops the taken ref instead?

It'd be great to add a comment explaining the semantics of the call,
since the uninformed reader might expect it to take a ref in all
cases.

Cheers,

Tim.

> Likewise where does the preexisting reference in the grant table case come
> from? Perhaps a comment alongside the ASSERT would make it clearer what the
> assert was doing to future readers ("/* Reference count must have already
> been >=1 due to XXX, so this cannot have failed */") or some such.
> 
> > Finally this as well as get_page() and put_page() are required to be
> > available on all architectures - move the declarations to xen/mm.h.
> > 
> > Signed-off-by: Jan Beulich 
> > 
> > --- a/xen/arch/arm/guestcopy.c
> > +++ b/xen/arch/arm/guestcopy.c
> > @@ -1,10 +1,8 @@
> > -#include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> > -
> > -#include 
> >  #include 
> >  
> >  static unsigned long raw_copy_to_guest_helper(void *to, const void 
> > *from,
> > --- a/xen/arch/arm/mm.c
> > +++ b/xen/arch/arm/mm.c
> > @@ -1170,6 +1170,7 @@ long arch_memory_op(int op, XEN_GUEST_HA
> >  struct domain *page_get_owner_and_reference(struct page_info *page)
> >  {
> >  unsigned long x, y = page->count_info;
> > +struct domain *owner;
> >  
> >  do {
> >  x = y;
> > @@ -1182,7 +1183,10 @@ struct domain *page_get_owner_and_refere
> >  }
> >  while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
> >  
> > -return page_get_owner(page);
> > +owner = page_get_owner(page);
> > +ASSERT(owner);
> > +
> > +return owner;
> >  }
> >  
> >  void put_page(struct page_info *page)
> > --- a/xen/arch/x86/mm.c
> > +++ b/xen/arch/x86/mm.c
> > @@ -2039,6 +2039,7 @@ void put_page(struct page_info *page)
> >  struct domain *page_get_owner_and_reference(struct page_info *page)
> >  {
> >  unsigned long x, y = page->count_info;
> > +struct domain *owner;
> >  
> >  do {
> >  x = y;
> > @@ -2052,7 +2053,10 @@ struct domain *page_get_owner_and_refere
> >  }
> >  while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
> >  
> > -return page_get_owner(page);
> > +owner = page_get_owner(page);
> > +ASSERT(owner);
> > +
> > +return owner;
> >  }
> >  
> >  
> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -2245,7 +2245,8 @@ __acquire_grant_for_copy(
> >  {
> >  ASSERT(mfn_valid(act->frame));
> >  *page = mfn_to_page(act->frame);
> > -(void)page_get_owner_and_reference(*page);
> > +td = page_get_owner_and_reference(*page);
> > +ASSERT(td);
> >  }
> >  
> >  act->pin += readonly ? GNTPIN_hstr_inc : GNTPIN_hstw_inc;
> > --- a/xen/include/asm-arm/mm.h
> > +++ b/xen/include/asm-arm/mm.h
> > @@ -275,10 +275,6 @@ static inline void *page_to_virt(const s
> >  return mfn_to_virt(page_to_mfn(pg));
> >  }
> >  
> > -struct domain *page_get_owner_and_reference(struct page_info *page);
> > -void put_page(struct page_info *page);
> > -int  get_page(struct page_info *page, struct domain *domain);
> > -
> >  struct page_info *get_page_from_gva(struct domain *d, vaddr_t va,
> >  unsigned long flags);
> >  
> > --- a/xen/include/asm-x86/mm.h
> > +++ b/xen/include/asm-x86/mm.h
> > @@ -352,9 +352,6 @@ const unsigned long *get_platform_badpag
> >  int page_lock(struct page_info *page);
> >  void page_unlock(struct page_info *page);
> >  
> > -struct domain *page_get_owner_and_reference(struct page_info *page);
> > -void put_page(struct page_info *page);
> > -int  get_page(struct page_info *page, struct domain *domain);
> >  void put_page_type(struct page_info *page);
> >  int  get_page_type(struct page_info *page, unsigned long type);
> >  int  put_page_type_preemptible(struct page_info *page);
> > --- a/xen/include/xen/mm.h
> > +++ b/xen/include/xen/mm.h
> > @@ -45,6 +45,7 @@
> >  #ifndef __XEN_MM_H__
> >  #define __XEN_MM_H__
> >  
> >

Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread Dario Faggioli
On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> Originally, if user didn't specify maxvcpus= in xl config file, the
> maximum size of vcpu bitmap was always equal to maximum number of pcpus.
> This might not be what user wants.
> 
So, to understand, the issue here is that the bitmaps may be too bit,
and hence we're wasting memory? Or (since you're mentioning valgrind)
are (without this patch) also leaking stuff in some way?

I'm confused by he "not be what user wants" part, as, IMO, the actual
user --i.e., the person writing the config file and then issuing the `xl
ceate' command-- wouldn't notice any difference, would it? Of course,
that does not mean that we should waste memory... as I said, I'm asking
in order to understand what this patch is actually trying to fix...

About the code...

> diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> index 078acd1..0fcef98 100644
> --- a/tools/libxl/xl_cmdimpl.c
> +++ b/tools/libxl/xl_cmdimpl.c
>

Would it make sense to skip this (the first) step if the user actually
did specify "maxvcpus="? Or, if not, at least... ... ...

> +/* Pass one, get the total number of vcpus */
> +PARSE_VNUMA_SPEC({
> +if (!strcmp("vcpus", option)) {
> +split_string_into_string_list(value, ",", &cpu_spec_list);
> +len = libxl_string_list_length(&cpu_spec_list);
>  
> -vnode_spec = xlu_cfg_get_listitem2(vnuma, i);
> -assert(vnode_spec);
> +for (j = 0; j < len; j++) {
> +parse_range(cpu_spec_list[j], &s, &e);
> +for (; s <= e; s++)
> +max_vcpus++;
> +}
> +libxl_string_list_dispose(&cpu_spec_list);
> +}
> +});
>  
... ... ... Move the check for that here, i.e., avoiding doing the
second pass if there is a mismatch?

> +for (i = 0; i < num_vnuma; i++) {
> +libxl_bitmap_init(&vcpu_parsed[i]);
> +if (libxl_cpu_bitmap_alloc(ctx, &vcpu_parsed[i], max_vcpus)) {
> +fprintf(stderr, "libxl_node_bitmap_alloc failed.\n");
>  exit(1);
>  }
> +}

> +/* Pass two, fill in structs */
> +PARSE_VNUMA_SPEC({
> +if (!strcmp("pnode", option)) {
> +val = parse_ulong(value);
> +if (val >= nr_nodes) {
> +fprintf(stderr,
> +"xl: invalid pnode number: %lu\n", val);

>  /* User has specified maxvcpus= */
>  if (b_info->max_vcpus != 0 &&  b_info->max_vcpus != max_vcpus) {

Regards,
Dario
-- 
<> (Raistlin Majere)
-
Dario Faggioli, Ph.D, http://about.me/dario.faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


signature.asc
Description: This is a digitally signed message part
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] x86/HVM: hvm_map_guest_frame_rw() adjustments

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 11:22,  wrote:
> On Wed, Aug 12, 2015 at 08:16:33AM -0600, Jan Beulich wrote:
>> ... and its callers.
>> 
>> While all non-nested users are made fully honor the semantics of that
>> type, doing so in the nested case seemed insane (if doable at all,
>> considering VMCS shadowing), and hence there the respective operations
>> are simply made fail.
>> 
>> One case not (yet) taken care of is that of a page getting transitioned
>> to this type after a mapping got established.
>> 
>> Signed-off-by: Jan Beulich 
> 
> Is this a bug fix? I think so, but the title only says adjustment so I'd
> better be sure.

Yes, it is (I had hoped that the description would be sufficient to
tell).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] xen/mm: populate_physmap: validate correctly the gfn for direct mapped domain

2015-08-13 Thread Ian Campbell
On Tue, 2015-08-11 at 18:41 +0100, Julien Grall wrote:
> Direct mapped domain has already the memory allocated 1:1, so we are
> directly using the gfn as mfn to map the RAM in the guest.
> 
> While we are validating that the page associated to the first mfn belongs 
> to
> the domain, the subsequent MFN are not validated when the extent_order
> is > 0.
> 
> This may result to map memory region (MMIO, RAM) which doesn't belong to 
> the
> domain.
> 
> Although, only DOM0 on ARM is using a direct memory mapped. So it
> doesn't affect any guest (at least on the upstream version) or even x86.
> 
> Signed-off-by: Julien Grall 
> 
> ---
> Cc: Ian Campbell 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Keir Fraser 
> Cc: Tim Deegan 
> 
> This patch is a candidate for Xen 4.6 and backport to Xen 4.5 (and
> maybe 4.4).
> 
> The hypercall is used for the ballooning code and only DOM0 on ARM
> is a direct mapped domain.
> 
> The current code to validate the GFN passed by the populate physmap
> hypercall has been moved in a for loop in order to validate each
> GFN when the extent order is > 0.
> 
> Without it, direct mapping domain may be able to map memory region
> which doesn't belong to it.
> 
> I think the switch to the for loop is pretty safe and contained. The
> worst thing that could happen is denying mapping more often than we
> use to do. Although, IIRC Linux is mostly mapping page one by one
> (i.e extent order = 0).
> ---
>  xen/common/memory.c | 30 ++
>  1 file changed, 18 insertions(+), 12 deletions(-)
> 
> diff --git a/xen/common/memory.c b/xen/common/memory.c
> index 61bb94c..7cc8af4 100644
> --- a/xen/common/memory.c
> +++ b/xen/common/memory.c
> @@ -126,22 +126,28 @@ static void populate_physmap(struct memop_args *a)
>  if ( is_domain_direct_mapped(d) )
>  {
>  mfn = gpfn;
> -if ( !mfn_valid(mfn) )
> +
> +for ( j = 0; j < (1 << a->extent_order); j++, mfn++ )
>  {
> -gdprintk(XENLOG_INFO, "Invalid mfn 
> %#"PRI_xen_pfn"\n",
> +if ( !mfn_valid(mfn) )
> +{
> +gdprintk(XENLOG_INFO, "Invalid mfn 
> %#"PRI_xen_pfn"\n",
>   mfn);
> -goto out;
> +goto out;
> +}
> +
> +page = mfn_to_page(mfn);
> +if ( !get_page(page, d) )
> +{
> +gdprintk(XENLOG_INFO,
> + "mfn %#"PRI_xen_pfn" doesn't belong to 
> the"
> + " domain\n", mfn);
> +goto out;

This will leak the references on any earlier pages, which will happen all
the time if you are cross a boundary from RAM into e.g. MMIO or whatever.

A variant of get_page which took an order argument and returned either with
0 or 1< +}
> +put_page(page);
>  }
>  
> -page = mfn_to_page(mfn);
> -if ( !get_page(page, d) )
> -{
> -gdprintk(XENLOG_INFO,
> - "mfn %#"PRI_xen_pfn" doesn't belong to the"
> - " domain\n", mfn);
> -goto out;
> -}
> -put_page(page);
> +page = mfn_to_page(gpfn);
>  }
>  else
>  page = alloc_domheap_pages(d, a->extent_order, a
> ->memflags);

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] x86/HVM: hvm_map_guest_frame_rw() adjustments

2015-08-13 Thread Wei Liu
On Thu, Aug 13, 2015 at 03:33:42AM -0600, Jan Beulich wrote:
> >>> On 13.08.15 at 11:22,  wrote:
> > On Wed, Aug 12, 2015 at 08:16:33AM -0600, Jan Beulich wrote:
> >> ... and its callers.
> >> 
> >> While all non-nested users are made fully honor the semantics of that
> >> type, doing so in the nested case seemed insane (if doable at all,
> >> considering VMCS shadowing), and hence there the respective operations
> >> are simply made fail.
> >> 
> >> One case not (yet) taken care of is that of a page getting transitioned
> >> to this type after a mapping got established.
> >> 
> >> Signed-off-by: Jan Beulich 
> > 
> > Is this a bug fix? I think so, but the title only says adjustment so I'd
> > better be sure.
> 
> Yes, it is (I had hoped that the description would be sufficient to
> tell).
> 

Thanks for confirmation.

Release-acked-by: Wei Liu 


> Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 v3] libxl: fix libxl__build_hvm error code return path

2015-08-13 Thread Wei Liu
On Thu, Aug 13, 2015 at 10:22:25AM +0100, Ian Campbell wrote:
> On Thu, 2015-08-13 at 09:24 +0100, Wei Liu wrote:
> > @@ -1002,22 +1005,22 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t
> > domid,
> >  goto out;
> >  }
> >  
> > -ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
> 
> At least this one doesn't return ERROR_*, it returns 0 or -1, assigning
> such to rc is wrong and ultimately returning it from this function even
> worse.
> 
> Remind me why we aren't just taking Roger's original patch from <
> 1438942688-7610-2-git-send-email-roger@citrix.com> ? It seems we are
> basically iterating towards it...
> 

Maybe we should just take that patch, that would save me from doing
another round.

Wei.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [Xen-users] "xl restore" leaks a file descriptor?

2015-08-13 Thread Andrew Cooper
On 13/08/15 10:17, Ian Campbell wrote:
> On Thu, 2015-08-13 at 09:50 +0100, Wei Liu wrote:
>> On Thu, Aug 13, 2015 at 09:39:36AM +0100, Ian Campbell wrote:
>>> On Wed, 2015-08-12 at 18:12 +0100, Wei Liu wrote:
 On Wed, Aug 12, 2015 at 11:04:25AM +0100, Ian Campbell wrote:
 [...]
>>> As Andy says I think we want restore_fd in the check, I can't 
>>> see 
>>> any
>>> reason we wouldn't want to close the socket too.
>>>
>> Do you mean migrate_fd when you say "socket"?
> In the migrate case we do "restore_fd = migrate_fd;", so yes, 
> indirectly.
>
>
>>  I tried that, but that led
>> to failure because toolstack still needs to get controlling 
>> information
>> out of it (the "GO" message).
>>
>> Maybe I close this too early.
> Right.
>
 I look at the code. Even if we should close that socket, it should 
 not
 happen inside create_domain, because the caller (migrate_receive) 
 needs
 that fd.

 IMO create_domain should only close restore_fd if that fd is opened 
 by
 itself.
>>> That makes sense, yes. The close should probably have an associated 
>>> comment
>>> since this will be a bit subtle.
>>>
>>> Perhaps rather than trying to repeat the conditions which lead to it 
>>> being
>>> opened we should just do:
>>> int restore_fd_to_close = -1;
>>> ...
>>> restore_fd_to_close = restore_fd = open(restore_file, O_RDONLY);
>>> ...
>>> if (restore_fd_to_close >= 0) {
>>> close(restore_fd_to_close);
>>> restore_fd_to_close = -1;
>>> }
>>>
>>> Strictly speaking we ought to check the return of close too I suppose.
>>>
>> What would we do in case close fails?
> Maybe just log?
>
> http://pubs.opengroup.org/onlinepubs/9699919799/functions/close.html says
> we can get:
>
> EBADF, which would be an error in our code.
> EINTR, ...
> EIO, which would be from disk full or a disk dying or something.
>
> We (and most code in general) tends to not worry about close failing too
> much, there's an argument for that too...

The return value really shouldn't be ignored.

Logging loudly is about all which can be done, and that is fine, but the
user really does need to be aware that something bad went wrong.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread Ian Campbell
On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> Originally, if user didn't specify maxvcpus= in xl config file, the
> maximum size of vcpu bitmap was always equal to maximum number of pcpus.
> This might not be what user wants.

What are you suggesting they wanted instead? We are only talking about the
bitmap right, and the typical/sensible config will have #vcpus <= #pcpus,
so they will fit even if they "waste" some bits during parsing.

I'm almost inclined to suggest that if a user wants #vcpus > #pcpus they
should have to specify maxvcpus and not rely on the vnuma parsing code
inferring this fact.

IOW maybe this code could just error out (or print a warning) if this
happens? + a doc update.

> Calculate the maximum number of vcpus before allocating vcpu bitmap.
> This requires parsing the same config options twice. Extra a macro to do
> that.

I'm not sure a macro was really the right answer here. e.g. a function
which takes a pointer to a bitmap, if the bitmap is null return the size
which would be needed, otherwise fill in the bitmap, or something. That
could also be split more obviously into a refactoring step and then the
addition of the new checks.

> Signed-off-by: Wei Liu 
> ---
> Valgrind is still clean after this patch applied.
> 
> This patch looks massive because it involves indentation changes, but in
> fact the meat of it is small.

Even with -b it's impossible to read.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] PCI Pass-through in Xen ARM: Draft 4

2015-08-13 Thread Manish Jaggi

  -
 | PCI Pass-through in Xen ARM |
  -
 manish.ja...@caviumnetworks.com
 ---

  Draft-4


 -
 Introduction
 -
 This document describes the design for the PCI passthrough support in Xen
 ARM. The target system is an ARM 64bit SoC with GICv3 and SMMU v2 and PCIe
 devices.

 -
 Revision History
 -
 Changes from Draft-1:
 -
 a) map_mmio hypercall removed from earlier draft
 b) device bar mapping into guest not 1:1
 c) Reserved Area in guest address space for mapping PCI-EP BARs in Stage2.
 d) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).

 Changes from Draft-2:
 -
 a) DomU boot information updated with boot-time device assignment and
 hotplug.
 b) SMMU description added
 c) Mapping between streamID - bdf - deviceID.
 d) assign_device hypercall to include virtual(guest) sbdf.
 Toolstack to generate guest sbdf rather than pciback.

 Changes from Draft-3:
 -
 a) Fixed typos and added more description
 b) NUMA and PCI passthrough description removed for now.
 c) Added example from Ian's Mail

 -
 Index
 -
   (1) Background

   (2) Basic PCI Support in Xen ARM
   (2.1) pci_hostbridge and pci_hostbridge_ops
   (2.2) PHYSDEVOP_HOSTBRIDGE_ADD hypercall
   (2.3) XEN Internal API

   (3) SMMU programming
   (3.1) Additions for PCI Passthrough
   (3.2) Mapping between streamID - deviceID - pci sbdf - requesterID

   (4) Assignment of PCI device
   (4.1) Dom0
   (4.1.1) Stage 2 Mapping of GITS_ITRANSLATER space (4k)
   (4.1.1.1) For Dom0
   (4.1.1.2) For DomU
   (4.1.1.2.1) Hypercall Details: XEN_DOMCTL_get_itranslater_space

   (4.2) DomU
   (4.2.1) Reserved Areas in guest memory space
   (4.2.2) Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
   (4.2.3) Hypercall Modification for bdf mapping notification to xen

   (5) DomU FrontEnd Bus Changes
   (5.1) Change in Linux PCI frontend bus and gicv3-its node binding 
for domU


   (6) Glossary

   (7) References
 -

 1.Background of PCI passthrough
 -
 Passthrough refers to assigning a PCI device to a guest domain (domU) such
 that the guest has full control over the device. The MMIO space / 
interrupts
 are managed by the guest itself, close to how a bare kernel manages a 
device.


 Device's access to guest address space needs to be isolated and protected.
 SMMU (System MMU - IOMMU in ARM) is programmed by xen hypervisor to allow
 device access guest memory for data transfer and sending MSI/X interrupts.
 PCI devices generated message signalled interrupt writes are within guest
 address spaces which are also translated using SMMU.

 For this reason the GITS (ITS address space) Interrupt Translation 
Register

 space is mapped in the guest address space.

 2.Basic PCI Support for ARM
 -
 The APIs to read write from PCI configuration space are based on 
segment:bdf.

 How the sbdf is mapped to a physical address is under the realm of the PCI
 host controller.

 ARM PCI support in Xen, introduces PCI host controller similar to what
 exists in Linux. Host controller drivers registers callbacks, which are
 invoked on matching the compatible property in pci device tree node.

 Note: as pci devices are enumerated the pci node in device tree refers to
 the host controller.

 (TODO: for ACPI unimplemented)

 2.1pci_hostbridge and pci_hostbridge_ops
 -
 The init function in the PCI host driver calls to register hostbridge
 callbacks:

 int pci_hostbridge_register(pci_hostbridge_t *pcihb);

 struct pci_hostbridge_ops {
 u32 (*pci_conf_read)(struct pci_hostbridge*, u32 bus, u32 devfn,
 u32 reg, u32 bytes);
 void (*pci_conf_write)(struct pci_hostbridge*, u32 bus, u32 devfn,
 u32 reg, u32 bytes, u32 val);
 };

 struct pci_hostbridge{
 u32 segno;
 paddr_t cfg_base;
 paddr_t cfg_size;
 struct dt_device_node *dt_node;
 struct pci_hostbridge_ops ops;
 struct list_head list;
 };

 A PCI conf_read function would internally be as follows:
 u32 pcihb_conf_read(u32 seg, u32 bus, u32 devfn,u32 re

Re: [Xen-devel] [PATCH for-4.6] tools: Don't try to update the firmware directory on ARM

2015-08-13 Thread Wei Liu
On Fri, Aug 07, 2015 at 06:27:18PM +0100, Julien Grall wrote:
> The firmware directory is not built at all on ARM. Attempting to update
> it using the target subtree-force-update will fail when try to update
> seabios.
> 
> Signed-off-by: Julien Grall 

Release-acked-by: Wei Liu 

> 
> ---
> Cc: Ian Jackson 
> Cc: Stefano Stabellini 
> Cc: Ian Campbell 
> Cc: Wei Liu 
> 
> I've noticed it while trying to update the QEMU tree used by Xen on
> a platform where iasl is not present (required by seabios in order
> to update it).
> 
> I think this should go in Xen 4.6 and possibly backport to Xen 4.5
> ---
>  tools/Makefile | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tools/Makefile b/tools/Makefile
> index 45cb4b2..2618559 100644
> --- a/tools/Makefile
> +++ b/tools/Makefile
> @@ -305,7 +305,9 @@ endif
>  ifeq ($(CONFIG_QEMU_TRAD),y)
>   $(MAKE) qemu-xen-traditional-dir-force-update
>  endif
> +ifeq ($(CONFIG_X86),y)
>   $(MAKE) -C firmware subtree-force-update
> +endif
>  
>  subtree-force-update-all:
>   $(MAKE) qemu-xen-dir-force-update
> -- 
> 2.1.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Stefano Stabellini
On Thu, 13 Aug 2015, Jan Beulich wrote:
> >>> On 12.08.15 at 18:18,  wrote:
> > On 12/08/15 16:58, Jan Beulich wrote:
> > On 12.08.15 at 17:51,  wrote:
> >>> On Wed, Aug 12, 2015 at 5:45 PM, Jan Beulich  wrote:
> >>> On 07.08.15 at 04:11,  wrote:
> > All these tables will be copied to Dom0 memory except that the reused
> > tables(DSDT, SPCR, etc) will be mapped to Dom0.
> 
>  Which again seems odd - such tables should be considered MMIO
>  (despite living in RAM), and hence not be part of Dom0's memory
>  assignment.
> 
> >>> not sure if this applies to the context of your comment, but we had
> >>> issues before when trying to deal with this data as device memory
> >>> (MMIO), because ARM doesn't allow unaligned accesses etc. on device
> >>> memory, and so Dom0 would crash at memory abort exceptions during
> >>> boot, so this really does have to be mapped as RAM to Dom0.  If you
> >>> meant some internal Xen bookkeeping thing, then just ignore me.
> >> 
> >> Hmm, how would native Linux avoid such unaligned accesses then?
> > 
> > ACPI table are living on RAM on ARM. So there is no issue with Linux
> > baremetal.
> 
> I'm sure our interpretation of the meaning of RAM here differs:
> While from a physical perspective the tables live in RAM too on
> x86, from a memory map perspective they don't. And since iirc
> ACPI implies UEFI, and since UEFI has special memory types
> for such tables (to prevent the OS from using the space as
> normal memory), I can't believe this to be different under ARM.
> 
> > The "problem" is from Xen where we are mapping memory region with Device
> > attribute. This is because until now we never had a reason to directly
> > map other thing than MMIO to the domain.
> > 
> > This could be fixed by adding new helper in Xen to directly map RAM region.
> 
> Which would seem to be the correct route.
> 
> > Nonetheless, we still have to copy some table in Xen in order to modify
> > them and/or new one. I have in mind the FADT table to set the hypervisor
> > field and hiding the hypervisor specific data (GIC hyp, timer hyp...) to
> > avoid the kernel thinks there is hyp mode available.
> 
> "Have to"? Or rather "would like to"? As said in another reply to
> Shannon, I didn't see any rationale spelled out for this fundamental
> difference to x86.

Just because it was done this way before, it doesn't mean that it is the
right way of doing it.

I think is bad design to expose the presence of certain functionalities
in ACPI tables and then expect that the Dom0 kernel won't use them.
ACPI tables should describe only and all the hardware which is exposed
to Dom0. Anything else is a mistake.

For example it is only natural for the kernel to try to use the GIC hyp
functionalities if they are described, while actually they are not
emulated by Xen at all.

I would rather teach Xen to fix the tables now, than later try teach
every possible Dom0 kernel (Linux, FreeBSD, etc) to ignore a set of info
which are wrong but still passed to them anyway. Moreover the list of
things to ignore can change over time. It is just a bad ABI to maintain.

In terms of code we could share in Linux between Xen x86 and Xen ARM
regarding dealing with ACPI info we want to ignore, unfortunately there
isn't much of it, because we are mostly talking about new ARM specific
tables here (GIC, arch_timer, etc).

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] tools/xenstore: Correct use of va_end() after va_copy()

2015-08-13 Thread Wei Liu
On Fri, Aug 07, 2015 at 02:51:59PM +0100, Andrew Cooper wrote:
> C requires that every use of va_copy() is matched with a va_end() call.
> 
> This is especially important for x86_64 as va_{start,copy}() may need to
> allocate memory to generate a va_list containing parameters which were
> previously in registers.
> 
> Signed-off-by: Andrew Cooper 

Release-acked-by: Wei Liu 

> ---
> CC: Ian Campbell 
> CC: Ian Jackson 
> CC: Wei Liu 
> ---
>  tools/xenstore/talloc.c |6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tools/xenstore/talloc.c b/tools/xenstore/talloc.c
> index 54dbd02..d7edcf3 100644
> --- a/tools/xenstore/talloc.c
> +++ b/tools/xenstore/talloc.c
> @@ -1101,13 +1101,16 @@ char *talloc_vasprintf(const void *t, const char 
> *fmt, va_list ap)
>  
>   /* this call looks strange, but it makes it work on older solaris boxes 
> */
>   if ((len = vsnprintf(&c, 1, fmt, ap2)) < 0) {
> + va_end(ap2);
>   return NULL;
>   }
> + va_end(ap2);
>  
>   ret = _talloc(t, len+1);
>   if (ret) {
>   VA_COPY(ap2, ap);
>   vsnprintf(ret, len+1, fmt, ap2);
> + va_end(ap2);
>   talloc_set_name_const(ret, ret);
>   }
>  
> @@ -1161,8 +1164,10 @@ static char *talloc_vasprintf_append(char *s, const 
> char *fmt, va_list ap)
>* the original string. Most current callers of this 
>* function expect it to never return NULL.
>*/
> + va_end(ap2);
>   return s;
>   }
> + va_end(ap2);
>  
>   s = talloc_realloc(NULL, s, char, s_len + len+1);
>   if (!s) return NULL;
> @@ -1170,6 +1175,7 @@ static char *talloc_vasprintf_append(char *s, const 
> char *fmt, va_list ap)
>   VA_COPY(ap2, ap);
>  
>   vsnprintf(s+s_len, len+1, fmt, ap2);
> + va_end(ap2);
>   talloc_set_name_const(s, s);
>  
>   return s;
> -- 
> 1.7.10.4

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 11:31,  wrote:
> At 09:50 +0100 on 13 Aug (1439459451), Ian Campbell wrote:
>> On Fri, 2015-07-24 at 06:37 -0600, Jan Beulich wrote:
>> > The function shouldn't return NULL after having obtained a reference,
>> > or else the caller won't know to drop it.
>> 
>> > Also its result shouldn't be ignored - if calling code is certain that
>> > a page already has a non-zero refcount, it better ASSERT()s so.
>> 
>> How is page_get_owner_and_reference sure the reference count was not zero
>> on entry? (WRT the call to page_get_owner in page_get_owner_and_reference
>> itself) Or have I misunderstood the assertion being made here?
> 
> That path is fine -- if the count was zero, it will return NULL before
> trying to get the owner.  But I'm not convinced that the assertion is
> correct -- are there really no pages anywhere that have a recount but
> no owner?  What about, e.g. shadow pool pages or other uses of
> MEMF_no_owner?  If a guest can cause Xen to try to get_page() one of
> those, then we'd hit the new assertion.
> 
> How about unlikely(!owner) path that drops the taken ref instead?

That would be dead code - a page the ref count of wasn't zero
prior to the function taking an extra ref shouldn't be unowned.
If the new ASSERT() ever triggers we know we have a bug
elsewhere (which would be hidden if we did what you suggest).

> It'd be great to add a comment explaining the semantics of the call,
> since the uninformed reader might expect it to take a ref in all
> cases.

It would seem to me that this ought to be implicit from the
__must_check that the patch adds, as otherwise there would
be no (reasonable, i.e. leaving aside refcount overflow) way
to get back NULL here. But yes, if you think this is _too_
implicit, I could certainly add a comment to the declaration.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread Wei Liu
On Thu, Aug 13, 2015 at 11:32:35AM +0200, Dario Faggioli wrote:
> On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> > Originally, if user didn't specify maxvcpus= in xl config file, the
> > maximum size of vcpu bitmap was always equal to maximum number of pcpus.
> > This might not be what user wants.
> > 
> So, to understand, the issue here is that the bitmaps may be too bit,

Too small.

> and hence we're wasting memory? Or (since you're mentioning valgrind)
> are (without this patch) also leaking stuff in some way?
> 

No. I mention valgrind because I want to be sure the refactoring doesn't
cause memory leak.

> I'm confused by he "not be what user wants" part, as, IMO, the actual
> user --i.e., the person writing the config file and then issuing the `xl
> ceate' command-- wouldn't notice any difference, would it? Of course,
> that does not mean that we should waste memory... as I said, I'm asking
> in order to understand what this patch is actually trying to fix...
> 

The trick here is that xl is "smart" enough to sum up the number of
vcpus needed in vNUMA configuration and check that against maxvcpus (if
specified).

So if users has not specified maxvcpus, b_info->..max_vcpus would be 0,
and the allocated bitmap would have the size of number of pcpus. That
bitmap may be too small for the number of vcpus specified in vNUMA
configuration.

Consider following configuration:

memory = 186368
vcpus = 36
cpus = "nodes:0"
vnuma = [ [ "pnode=0","size=2048","vcpus=0-35","vdistances=10,20,20,20" ],
  [ "pnode=1","size=61440","vdistances=20,10,20,20" ],
  [ "pnode=2","size=61440","vdistances=20,20,10,20" ],
  [ "pnode=3","size=61440","vdistances=20,20,20,10" ] ]
# no maxvcpus=

If you only have, say, 8 pcpus, the parsed configuration is wrong.

> About the code...
> 
> > diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
> > index 078acd1..0fcef98 100644
> > --- a/tools/libxl/xl_cmdimpl.c
> > +++ b/tools/libxl/xl_cmdimpl.c
> >
> 
> Would it make sense to skip this (the first) step if the user actually
> did specify "maxvcpus="? Or, if not, at least... ... ...
> 

No, see above.

> > +/* Pass one, get the total number of vcpus */
> > +PARSE_VNUMA_SPEC({
> > +if (!strcmp("vcpus", option)) {
> > +split_string_into_string_list(value, ",", &cpu_spec_list);
> > +len = libxl_string_list_length(&cpu_spec_list);
> >  
> > -vnode_spec = xlu_cfg_get_listitem2(vnuma, i);
> > -assert(vnode_spec);
> > +for (j = 0; j < len; j++) {
> > +parse_range(cpu_spec_list[j], &s, &e);
> > +for (; s <= e; s++)
> > +max_vcpus++;
> > +}
> > +libxl_string_list_dispose(&cpu_spec_list);
> > +}
> > +});
> >  
> ... ... ... Move the check for that here, i.e., avoiding doing the
> second pass if there is a mismatch?
> 

Yes, this is doable.

Wei.

> > +for (i = 0; i < num_vnuma; i++) {
> > +libxl_bitmap_init(&vcpu_parsed[i]);
> > +if (libxl_cpu_bitmap_alloc(ctx, &vcpu_parsed[i], max_vcpus)) {
> > +fprintf(stderr, "libxl_node_bitmap_alloc failed.\n");
> >  exit(1);
> >  }
> > +}
> 
> > +/* Pass two, fill in structs */
> > +PARSE_VNUMA_SPEC({
> > +if (!strcmp("pnode", option)) {
> > +val = parse_ulong(value);
> > +if (val >= nr_nodes) {
> > +fprintf(stderr,
> > +"xl: invalid pnode number: %lu\n", val);
> 
> >  /* User has specified maxvcpus= */
> >  if (b_info->max_vcpus != 0 &&  b_info->max_vcpus != max_vcpus) {
> 
> Regards,
> Dario
> -- 
> <> (Raistlin Majere)
> -
> Dario Faggioli, Ph.D, http://about.me/dario.faggioli
> Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xs-devel] Trying to bring up stub domain in xen-4.4-xs88306

2015-08-13 Thread Andrew Cooper
On 13/08/15 08:04, Xuehan Xu wrote:
> Hi, everyone.
>
> I'm trying to run a Windows HVM vm with stub domain in xenserver-6.5,
> whose internal xen version is xen-4.4-xs88306. After I started the vm,
> both the vm and its corresponding stub domain crashed. Here is the
> related content in hypervisor.log. The domain ID of the windows vm is
> 1, and the stub domain's id is 2.
>
> [2015-08-12 17:11:24] (d1) [  135.564650] HVM Loader
> [2015-08-12 17:11:24] (d1) [  135.564756] Detected Xen v4.4.1-xs88306
> [2015-08-12 17:11:24] (d1) [  135.564850] Xenbus rings @0xfeffc000,
> event channel 2
> [2015-08-12 17:11:24] (d1) [  135.565048] System requested ROMBIOS
> [2015-08-12 17:11:24] (d1) [  135.565090] CPU speed is 2494 MHz
> [2015-08-12 17:11:24] (d2) [  135.565234] Bootstrapping...
> [2015-08-12 17:11:24] (d2) [  135.565275] Xen Minimal OS!
> [2015-08-12 17:11:24] (d2) [  135.565284]   start_info: 0x585000(VA)
> [2015-08-12 17:11:24] (d2) [  135.565289] nr_pages: 0x2000
> [2015-08-12 17:11:24] (d2) [  135.565293]   shared_inf: 0xbda9e000(MA)
> [2015-08-12 17:11:24] (d2) [  135.565296]  pt_base: 0x588000(VA)
> [2015-08-12 17:11:24] (d2) [  135.565300] nr_pt_frames: 0x7
> [2015-08-12 17:11:24] (d2) [  135.565304] mfn_list: 0x575000(VA)
> [2015-08-12 17:11:24] (d2) [  135.565307]mod_start: 0x0(VA)
> [2015-08-12 17:11:24] (d2) [  135.565311]  mod_len: 0
> [2015-08-12 17:11:24] (d2) [  135.565314]flags: 0x0
> [2015-08-12 17:11:24] (d2) [  135.565318] cmd_line: 
> [2015-08-12 17:11:24] (d2) [  135.565374]   stack:  0x534660-0x554660
> [2015-08-12 17:11:24] (d2) [  135.565381] MM: Init
> [2015-08-12 17:11:24] (d2) [  135.565385]   _text: 0x0(VA)
> [2015-08-12 17:11:24] (d2) [  135.565389]  _etext: 0x1203b2(VA)
> [2015-08-12 17:11:24] (d2) [  135.565393]_erodata: 0x176000(VA)
> [2015-08-12 17:11:24] (d2) [  135.565396]  _edata: 0x17bf88(VA)
> [2015-08-12 17:11:24] (d2) [  135.565399] stack start: 0x534660(VA)
> [2015-08-12 17:11:24] (d2) [  135.565402]_end: 0x574f68(VA)
> [2015-08-12 17:11:24] (d2) [  135.565406]   start_pfn: 592
> [2015-08-12 17:11:24] (d2) [  135.565410] max_pfn: 2000
> [2015-08-12 17:11:24] (d2) [  135.565414] Mapping memory range
> 0x80 - 0x200
> [2015-08-12 17:11:24] (d1) [  135.565525] Relocating guest memory for
> lowmem MMIO space enabled
> [2015-08-12 17:11:24] (d1) [  135.565586] PCI-ISA link 0 routed to IRQ5
> [2015-08-12 17:11:24] (d1) [  135.565648] PCI-ISA link 1 routed to IRQ10
> [2015-08-12 17:11:24] (d1) [  135.565710] PCI-ISA link 2 routed to IRQ11
> [2015-08-12 17:11:24] (d1) [  135.565770] PCI-ISA link 3 routed to IRQ5
> [2015-08-12 17:11:24] (d1) [  135.566884] *** HVMLoader assertion
> '(devfn != PCI_ISA_DEVFN) || ((vendor_id == 0x8086) && 
> [2015-08-12 17:11:24] (d1) [  135.566968] (device_id == 0x7000))'
> failed at pci.c:112
> [2015-08-12 17:11:24] (d1) [  135.567012] *** HVMLoader crashed.
> [2015-08-12 17:11:24] (d2) [  135.568790] setting 0x0-0x176000 readonly
> [2015-08-12 17:11:24] (d2) [  135.568799] skipped 0x1000
> [2015-08-12 17:11:24] (d2) [  135.568980] MM: Initialise page
> allocator for 59e000(59e000)-200(200)
> [2015-08-12 17:11:24] (d2) [  135.568999] MM: done
> [2015-08-12 17:11:24] (d2) [  135.569050] Demand map pfns at
> 2001000-2002001000.
> [2015-08-12 17:11:24] (d2) [  135.569056] Heap resides at
> 2002002000-4002002000.
> [2015-08-12 17:11:24] (d2) [  135.569060] Initialising timer interface
> [2015-08-12 17:11:24] (d2) [  135.569125] Initialising console ... done.
> [2015-08-12 17:11:24] (d2) [  135.569199] gnttab_table mapped at
> 0x2001000.
> [2015-08-12 17:11:24] (d2) [  135.569205] Initialising scheduler
> [2015-08-12 17:11:24] (d2) [  135.569216] Thread "Idle": pointer:
> 0x2002002050, stack: 0x5c
> [2015-08-12 17:11:24] (d2) [  135.569225] Thread "xenstore": pointer:
> 0x2002002800, stack: 0x5d
> [2015-08-12 17:11:24] (d2) [  135.569232] xenbus initialised on irq 1
> mfn 0xbd3f5
> [2015-08-12 17:11:24] (d2) [  135.569247] Thread "shutdown": pointer:
> 0x2002002fb0, stack: 0x5e
> [2015-08-12 17:11:24] (d2) [  135.569254] Dummy main: start_info=0x554760
> [2015-08-12 17:11:24] (d2) [  135.569260] Thread "main": pointer:
> 0x2002003760, stack: 0x5f
> [2015-08-12 17:11:24] (d2) [  135.569295] Thread "pcifront": pointer:
> 0x2002003f50, stack: 0x60
> [2015-08-12 17:11:24] (d2) [  135.569311] pcifront_watches: waiting
> for backend path to appear device/pci/0/backend
> [2015-08-12 17:11:24] (d2) [  135.570783] dom vm is at
> /vm/8b9cf2d6-4f2e-b0b4-36fb-1728f111f6e7
> [2015-08-12 17:11:24] (d2) [  135.571479] 
> NETFRONT for device/vif/0 **
> [2015-08-12 17:11:24] (d2) [  135.571484] 
> [2015-08-12 17:11:24] (d2) [  135.571486] 
> [2015-08-12 17:11:24] (d2) [  135.571518] net TX ring size 256
> [2015-08-12 17:11:24] (d2) [  135.571522] net RX ring size 256
> [2015-08-12 17:11:24] (d2) [  135.575064] backend at
> /local/domai

[Xen-devel] [distros-debian-jessie test] 37829: tolerable FAIL

2015-08-13 Thread Platform Team regression test user
flight 37829 distros-debian-jessie real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37829/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-armhf-jessie-netboot-pygrub 9 debian-di-install fail never 
pass

baseline version:
 flight   37793

jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-jessie-netboot-pvgrub pass
 test-amd64-i386-i386-jessie-netboot-pvgrub   pass
 test-amd64-i386-amd64-jessie-netboot-pygrub  pass
 test-armhf-armhf-armhf-jessie-netboot-pygrub fail
 test-amd64-amd64-i386-jessie-netboot-pygrub  pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread Wei Liu
On Thu, Aug 13, 2015 at 10:42:26AM +0100, Ian Campbell wrote:
> On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> > Originally, if user didn't specify maxvcpus= in xl config file, the
> > maximum size of vcpu bitmap was always equal to maximum number of pcpus.
> > This might not be what user wants.
> 
> What are you suggesting they wanted instead? We are only talking about the
> bitmap right, and the typical/sensible config will have #vcpus <= #pcpus,
> so they will fit even if they "waste" some bits during parsing.

#vcpus > #pcpus, bitmap is too small.

> 
> I'm almost inclined to suggest that if a user wants #vcpus > #pcpus they
> should have to specify maxvcpus and not rely on the vnuma parsing code
> inferring this fact.
> 

I don't think we should prevent people from shooting themselves in the
foot.

> IOW maybe this code could just error out (or print a warning) if this
> happens? + a doc update.
> 

Xl doesn't complain when you set vcpus > pcpus. I don't think vNUMA
should behave differently.

Wei.

> > Calculate the maximum number of vcpus before allocating vcpu bitmap.
> > This requires parsing the same config options twice. Extra a macro to do
> > that.
> 
> I'm not sure a macro was really the right answer here. e.g. a function
> which takes a pointer to a bitmap, if the bitmap is null return the size
> which would be needed, otherwise fill in the bitmap, or something. That
> could also be split more obviously into a refactoring step and then the
> addition of the new checks.
> 
> > Signed-off-by: Wei Liu 
> > ---
> > Valgrind is still clean after this patch applied.
> > 
> > This patch looks massive because it involves indentation changes, but in
> > fact the meat of it is small.
> 
> Even with -b it's impossible to read.
> 
> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] xen/mm: populate_physmap: validate correctly the gfn for direct mapped domain

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 11:33,  wrote:
> On Tue, 2015-08-11 at 18:41 +0100, Julien Grall wrote:
>> --- a/xen/common/memory.c
>> +++ b/xen/common/memory.c
>> @@ -126,22 +126,28 @@ static void populate_physmap(struct memop_args *a)
>>  if ( is_domain_direct_mapped(d) )
>>  {
>>  mfn = gpfn;
>> -if ( !mfn_valid(mfn) )
>> +
>> +for ( j = 0; j < (1 << a->extent_order); j++, mfn++ )
>>  {
>> -gdprintk(XENLOG_INFO, "Invalid mfn %#"PRI_xen_pfn"\n",
>> +if ( !mfn_valid(mfn) )
>> +{
>> +gdprintk(XENLOG_INFO, "Invalid mfn 
>> %#"PRI_xen_pfn"\n",
>>   mfn);
>> -goto out;
>> +goto out;
>> +}
>> +
>> +page = mfn_to_page(mfn);
>> +if ( !get_page(page, d) )
>> +{
>> +gdprintk(XENLOG_INFO,
>> + "mfn %#"PRI_xen_pfn" doesn't belong to the"
>> + " domain\n", mfn);
>> +goto out;
> 
> This will leak the references on any earlier pages, which will happen all
> the time if you are cross a boundary from RAM into e.g. MMIO or whatever.

I don't see why it would - the put_page() (as you say a few lines
down in your reply) follows right afterwards.

> A variant of get_page which took an order argument and returned either with
> 0 or 1< suppose we don't already have one of those, but you could add it.

That would be useful if we wanted to retain the ref, but that's not the
case here.

> Oh, am I wrong and we do get_page then immediately put_page for every page?
> I suppose that is one way to check the current owner, but I think it needs
> a comment at least (it did before TBH, just iterating over many pages makes
> it seem even odder).
> 
> Is that really the best way to ask if a page is owned by a given domain tho
> ugh? Does page_get_owner not suffice?

That could produce wrong results when the page previously had
no reference (and namely, due to the unions used in
struct page_info, was free or in use as a shadow page).

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [xs-devel] Trying to bring up stub domain in xen-4.4-xs88306

2015-08-13 Thread Andrew Cooper
On 13/08/15 10:52, Andrew Cooper wrote:
> On 13/08/15 08:04, Xuehan Xu wrote:
>> Hi, everyone.
>>
>> I'm trying to run a Windows HVM vm with stub domain in xenserver-6.5,
>> whose internal xen version is xen-4.4-xs88306. After I started the
>> vm, both the vm and its corresponding stub domain crashed. Here is
>> the related content in hypervisor.log. The domain ID of the windows
>> vm is 1, and the stub domain's id is 2.
>>
>> [2015-08-12 17:11:24] (d1) [  135.564650] HVM Loader
>> [2015-08-12 17:11:24] (d1) [  135.564756] Detected Xen v4.4.1-xs88306
>> [2015-08-12 17:11:24] (d1) [  135.564850] Xenbus rings @0xfeffc000,
>> event channel 2
>> [2015-08-12 17:11:24] (d1) [  135.565048] System requested ROMBIOS
>> [2015-08-12 17:11:24] (d1) [  135.565090] CPU speed is 2494 MHz
>> [2015-08-12 17:11:24] (d2) [  135.565234] Bootstrapping...
>> [2015-08-12 17:11:24] (d2) [  135.565275] Xen Minimal OS!
>> [2015-08-12 17:11:24] (d2) [  135.565284]   start_info: 0x585000(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565289] nr_pages: 0x2000
>> [2015-08-12 17:11:24] (d2) [  135.565293]   shared_inf: 0xbda9e000(MA)
>> [2015-08-12 17:11:24] (d2) [  135.565296]  pt_base: 0x588000(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565300] nr_pt_frames: 0x7
>> [2015-08-12 17:11:24] (d2) [  135.565304] mfn_list: 0x575000(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565307]mod_start: 0x0(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565311]  mod_len: 0
>> [2015-08-12 17:11:24] (d2) [  135.565314]flags: 0x0
>> [2015-08-12 17:11:24] (d2) [  135.565318] cmd_line: 
>> [2015-08-12 17:11:24] (d2) [  135.565374]   stack:  0x534660-0x554660
>> [2015-08-12 17:11:24] (d2) [  135.565381] MM: Init
>> [2015-08-12 17:11:24] (d2) [  135.565385]   _text: 0x0(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565389]  _etext: 0x1203b2(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565393]_erodata: 0x176000(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565396]  _edata: 0x17bf88(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565399] stack start: 0x534660(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565402]_end: 0x574f68(VA)
>> [2015-08-12 17:11:24] (d2) [  135.565406]   start_pfn: 592
>> [2015-08-12 17:11:24] (d2) [  135.565410] max_pfn: 2000
>> [2015-08-12 17:11:24] (d2) [  135.565414] Mapping memory range
>> 0x80 - 0x200
>> [2015-08-12 17:11:24] (d1) [  135.565525] Relocating guest memory for
>> lowmem MMIO space enabled
>> [2015-08-12 17:11:24] (d1) [  135.565586] PCI-ISA link 0 routed to IRQ5
>> [2015-08-12 17:11:24] (d1) [  135.565648] PCI-ISA link 1 routed to IRQ10
>> [2015-08-12 17:11:24] (d1) [  135.565710] PCI-ISA link 2 routed to IRQ11
>> [2015-08-12 17:11:24] (d1) [  135.565770] PCI-ISA link 3 routed to IRQ5
>> [2015-08-12 17:11:24] (d1) [  135.566884] *** HVMLoader assertion
>> '(devfn != PCI_ISA_DEVFN) || ((vendor_id == 0x8086) && 
>> [2015-08-12 17:11:24] (d1) [  135.566968] (device_id == 0x7000))'
>> failed at pci.c:112
>> [2015-08-12 17:11:24] (d1) [  135.567012] *** HVMLoader crashed.
>> [2015-08-12 17:11:24] (d2) [  135.568790] setting 0x0-0x176000 readonly
>> [2015-08-12 17:11:24] (d2) [  135.568799] skipped 0x1000
>> [2015-08-12 17:11:24] (d2) [  135.568980] MM: Initialise page
>> allocator for 59e000(59e000)-200(200)
>> [2015-08-12 17:11:24] (d2) [  135.568999] MM: done
>> [2015-08-12 17:11:24] (d2) [  135.569050] Demand map pfns at
>> 2001000-2002001000.
>> [2015-08-12 17:11:24] (d2) [  135.569056] Heap resides at
>> 2002002000-4002002000.
>> [2015-08-12 17:11:24] (d2) [  135.569060] Initialising timer interface
>> [2015-08-12 17:11:24] (d2) [  135.569125] Initialising console ... done.
>> [2015-08-12 17:11:24] (d2) [  135.569199] gnttab_table mapped at
>> 0x2001000.
>> [2015-08-12 17:11:24] (d2) [  135.569205] Initialising scheduler
>> [2015-08-12 17:11:24] (d2) [  135.569216] Thread "Idle": pointer:
>> 0x2002002050, stack: 0x5c
>> [2015-08-12 17:11:24] (d2) [  135.569225] Thread "xenstore": pointer:
>> 0x2002002800, stack: 0x5d
>> [2015-08-12 17:11:24] (d2) [  135.569232] xenbus initialised on irq 1
>> mfn 0xbd3f5
>> [2015-08-12 17:11:24] (d2) [  135.569247] Thread "shutdown": pointer:
>> 0x2002002fb0, stack: 0x5e
>> [2015-08-12 17:11:24] (d2) [  135.569254] Dummy main: start_info=0x554760
>> [2015-08-12 17:11:24] (d2) [  135.569260] Thread "main": pointer:
>> 0x2002003760, stack: 0x5f
>> [2015-08-12 17:11:24] (d2) [  135.569295] Thread "pcifront": pointer:
>> 0x2002003f50, stack: 0x60
>> [2015-08-12 17:11:24] (d2) [  135.569311] pcifront_watches: waiting
>> for backend path to appear device/pci/0/backend
>> [2015-08-12 17:11:24] (d2) [  135.570783] dom vm is at
>> /vm/8b9cf2d6-4f2e-b0b4-36fb-1728f111f6e7
>> [2015-08-12 17:11:24] (d2) [  135.571479] 
>> NETFRONT for device/vif/0 **
>> [2015-08-12 17:11:24] (d2) [  135.571484] 
>> [2015-08-12 17:11:24] (d2) [  135.571486] 
>> [2015-08-12 17:11:24] (d2) [  135.571518] net TX ring size 256
>> [201

Re: [Xen-devel] [PATCH 1/2] xen: move perform_gunzip to common

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 11:28,  wrote:
> On Thu, 13 Aug 2015, Jan Beulich wrote:
>> >>> On 12.08.15 at 18:15,  wrote:
>> > On Wed, 12 Aug 2015, Jan Beulich wrote:
>> >> >>> On 12.08.15 at 16:47,  wrote:
>> >> > @@ -31,8 +33,15 @@ typedef int decompress_fn(unsigned char *inbuf, 
>> >> > unsigned int len,
>> >> >   * dependent).
>> >> >   */
>> >> >  
>> >> > -decompress_fn bunzip2, unxz, unlzma, unlzo, unlz4;
>> >> > +decompress_fn perform_gunzip, bunzip2, unxz, unlzma, unlzo, unlz4;
>> >> >  
>> >> >  int decompress(void *inbuf, unsigned int len, void *outbuf);
>> >> >  
>> >> > +static inline unsigned long output_length(char *image, unsigned long 
>> >> > image_len)
>> >> 
>> >> Neither of the callers gets moved out of bzimage.c - why does this
>> >> function need to move?
>> > 
>> > We'll use it on arm.
>> 
>> Hmm, the way it is used on x86 makes it quite architecture specific
>> (namely because of the assumption that the size is also in said
>> place for non-gz compression methods). I'd therefore prefer code
>> duplication over code sharing here. 
> 
> Actually after seeing the size and quality of the resulting patches, I
> am starting to feel the same way.
> 
> In terms of code changes, I was thinking that the best result would be
> moving the "boilerplate" code from xen/arch/x86/bzimage.c to
> xen/common/inflate.c, see below, then the interface would become just
> perform_gunzip and gzip_check. But I guess you wouldn't want inflate.c
> to be modified, right?

Yes, unless really unavoidable.

> Alternatively we could move it to a new file, let's call it gunzip.h,
> that would #include "inflate.c", so:
> 
> bzimage.c -- #include --> gunzip.h -- #include --> inflate.c
> 
> And again we just leave the perform_gunzip and gzip_check calls in
> bzimage.c.  What do you think?

That's an option.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 03:31 -0600, Jan Beulich wrote:
> > 
> > > > On 13.08.15 at 10:50,  wrote:
> > On Fri, 2015-07-24 at 06:37 -0600, Jan Beulich wrote:
> > > The function shouldn't return NULL after having obtained a reference,
> > > or else the caller won't know to drop it.
> > 
> > > Also its result shouldn't be ignored - if calling code is certain 
> > > that
> > > a page already has a non-zero refcount, it better ASSERT()s so.
> > 
> > How is page_get_owner_and_reference sure the reference count was not 
> > zero
> > on entry? (WRT the call to page_get_owner in 
> > page_get_owner_and_reference
> > itself) Or have I misunderstood the assertion being made here?
> 
> There is still a NULL return path left (when the reference couldn't
> be obtained, which includes the case where the reference count
> was zero). 

Thanks, I missed that big comment somehow.

> > Likewise where does the preexisting reference in the grant table case 
> > come
> > from? Perhaps a comment alongside the ASSERT would make it clearer what 
> > the
> > assert was doing to future readers ("/* Reference count must have 
> > already
> > been >=1 due to XXX, so this cannot have failed */") or some such.
> 
> It is my understanding that - as seen in the patch context - the
> page's MFN being read from act->frame implies that (together with
> the condition of the respective if() the else it's sitting in, namely the
> fact that act->pin is non-zero when we get here). It would seem
> odd to state in a comment what surrounding code does (I would
> agree to the need of a comment if the requirement was satisfied in
> a place far away).

The fact that act->frame/act->pin together somehow imply a reference count
(taken elsewhere?) is not all that obvious, at least to me. A comment to
that effect was exactly what I was after.

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 11:43,  wrote:
> On Thu, 13 Aug 2015, Jan Beulich wrote:
>> >>> On 12.08.15 at 18:18,  wrote:
>> > On 12/08/15 16:58, Jan Beulich wrote:
>> > On 12.08.15 at 17:51,  wrote:
>> >>> On Wed, Aug 12, 2015 at 5:45 PM, Jan Beulich  wrote:
>> >>> On 07.08.15 at 04:11,  wrote:
>> > All these tables will be copied to Dom0 memory except that the reused
>> > tables(DSDT, SPCR, etc) will be mapped to Dom0.
>> 
>>  Which again seems odd - such tables should be considered MMIO
>>  (despite living in RAM), and hence not be part of Dom0's memory
>>  assignment.
>> 
>> >>> not sure if this applies to the context of your comment, but we had
>> >>> issues before when trying to deal with this data as device memory
>> >>> (MMIO), because ARM doesn't allow unaligned accesses etc. on device
>> >>> memory, and so Dom0 would crash at memory abort exceptions during
>> >>> boot, so this really does have to be mapped as RAM to Dom0.  If you
>> >>> meant some internal Xen bookkeeping thing, then just ignore me.
>> >> 
>> >> Hmm, how would native Linux avoid such unaligned accesses then?
>> > 
>> > ACPI table are living on RAM on ARM. So there is no issue with Linux
>> > baremetal.
>> 
>> I'm sure our interpretation of the meaning of RAM here differs:
>> While from a physical perspective the tables live in RAM too on
>> x86, from a memory map perspective they don't. And since iirc
>> ACPI implies UEFI, and since UEFI has special memory types
>> for such tables (to prevent the OS from using the space as
>> normal memory), I can't believe this to be different under ARM.
>> 
>> > The "problem" is from Xen where we are mapping memory region with Device
>> > attribute. This is because until now we never had a reason to directly
>> > map other thing than MMIO to the domain.
>> > 
>> > This could be fixed by adding new helper in Xen to directly map RAM region.
>> 
>> Which would seem to be the correct route.
>> 
>> > Nonetheless, we still have to copy some table in Xen in order to modify
>> > them and/or new one. I have in mind the FADT table to set the hypervisor
>> > field and hiding the hypervisor specific data (GIC hyp, timer hyp...) to
>> > avoid the kernel thinks there is hyp mode available.
>> 
>> "Have to"? Or rather "would like to"? As said in another reply to
>> Shannon, I didn't see any rationale spelled out for this fundamental
>> difference to x86.
> 
> Just because it was done this way before, it doesn't mean that it is the
> right way of doing it.

Correct. Which is why I'm not saying outright "no" but asking for
justification.

> I think is bad design to expose the presence of certain functionalities
> in ACPI tables and then expect that the Dom0 kernel won't use them.
> ACPI tables should describe only and all the hardware which is exposed
> to Dom0. Anything else is a mistake.

That depends on the perspective you take: Especially for a PV
Dom0 it is quite reasonable for the kernel to be aware of certain
implications from running under a (or the) hypervisor it knows of.
I agree that the perspective may shift when it comes to HVM-like
Dom0 (albeit in the spectrum I'd rather see ARM near PVH, which
again is supposed to be aware).

> For example it is only natural for the kernel to try to use the GIC hyp
> functionalities if they are described, while actually they are not
> emulated by Xen at all.

See Ian's earlier reply: It can also be considered natural for it to
be aware that when run in EL2 to not use EL1 functionality.

> I would rather teach Xen to fix the tables now, than later try teach
> every possible Dom0 kernel (Linux, FreeBSD, etc) to ignore a set of info
> which are wrong but still passed to them anyway. Moreover the list of
> things to ignore can change over time. It is just a bad ABI to maintain.

I don't think there's a clear advantage to teaching Xen of new
tables to hide over teaching Dom0 of new tables to ignore - it
much depends on release cycles and such which one would be
more beneficial.

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 10:54 +0100, Wei Liu wrote:
> On Thu, Aug 13, 2015 at 10:42:26AM +0100, Ian Campbell wrote:
> > On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> > > Originally, if user didn't specify maxvcpus= in xl config file, the
> > > maximum size of vcpu bitmap was always equal to maximum number of 
> > > pcpus.
> > > This might not be what user wants.
> > 
> > What are you suggesting they wanted instead? We are only talking about 
> > the
> > bitmap right, and the typical/sensible config will have #vcpus <= 
> > #pcpus,
> > so they will fit even if they "waste" some bits during parsing.
> 
> #vcpus > #pcpus, bitmap is too small.

Right. Which is trivial to detect as we go through the parsing and raise an
appropriate error.

> > I'm almost inclined to suggest that if a user wants #vcpus > #pcpus 
> > they
> > should have to specify maxvcpus and not rely on the vnuma parsing code
> > inferring this fact.
> > 
> 
> I don't think we should prevent people from shooting themselves in the
> foot.

If the cost of supporting that is this patch then I disagree. If you can
find a way to do it simply and cleanly then fine, maybe.

In fact I even disagree in general, we can and should provide warnings or
errors for things which we know are bad and which are most likely
unintentional, but provide overrides.

> > IOW maybe this code could just error out (or print a warning) if this
> > happens? + a doc update.
> > 
> 
> Xl doesn't complain when you set vcpus > pcpus. I don't think vNUMA
> should behave differently.

Not always true, e.g. from vcpuset:

   if (max_vcpus > dominfo.vcpu_online && max_vcpus > host_cpu) {
fprintf(stderr, "You are overcommmitting! You have %d physical" \
" CPUs and want %d vCPUs! Aborting, use --ignore-host to" \
" continue\n", host_cpu, max_vcpus);
rc = 1;
}

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 1/2] x86/HVM: hvm_map_guest_frame_rw() adjustments

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 11:35,  wrote:
> On Thu, Aug 13, 2015 at 03:33:42AM -0600, Jan Beulich wrote:
>> >>> On 13.08.15 at 11:22,  wrote:
>> > On Wed, Aug 12, 2015 at 08:16:33AM -0600, Jan Beulich wrote:
>> >> ... and its callers.
>> >> 
>> >> While all non-nested users are made fully honor the semantics of that
>> >> type, doing so in the nested case seemed insane (if doable at all,
>> >> considering VMCS shadowing), and hence there the respective operations
>> >> are simply made fail.
>> >> 
>> >> One case not (yet) taken care of is that of a page getting transitioned
>> >> to this type after a mapping got established.
>> >> 
>> >> Signed-off-by: Jan Beulich 
>> > 
>> > Is this a bug fix? I think so, but the title only says adjustment so I'd
>> > better be sure.
>> 
>> Yes, it is (I had hoped that the description would be sufficient to
>> tell).
> 
> Thanks for confirmation.

And I only now realize that I screwed up the title - what is here
was what 0/2 had; this patch really is supposed to be named
"x86/HVM: honor p2m_ram_ro in hvm_map_guest_frame_rw()"
just like its v1 was. I'm sorry for that.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xl: close restore file fd when we're done with it

2015-08-13 Thread Wei Liu
And log if close fails.

Reported-by: Andrew Armenia 
Signed-off-by: Wei Liu 
---
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Andrew Armenia 

For 4.6, fix leaking fd to avoid holding on to restoring file.

This should also be backported to many versions that uses xl.

---
 tools/libxl/xl_cmdimpl.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 499a05c..c6b0b68 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2627,6 +2627,7 @@ static uint32_t create_domain(struct domain_create 
*dom_info)
 void *config_data = 0;
 int config_len = 0;
 int restore_fd = -1;
+int restore_fd_to_close = -1;
 const libxl_asyncprogress_how *autoconnect_console_how;
 struct save_file_header hdr;
 
@@ -2650,6 +2651,7 @@ static uint32_t create_domain(struct domain_create 
*dom_info)
 fprintf(stderr, "Can't open restore file: %s\n", 
strerror(errno));
 return ERROR_INVAL;
 }
+restore_fd_to_close = restore_fd;
 rc = libxl_fd_set_cloexec(ctx, restore_fd, 1);
 if (rc) return rc;
 }
@@ -2851,6 +2853,13 @@ start:
 
 release_lock();
 
+if (restore_fd_to_close >= 0) {
+if (close(restore_fd_to_close))
+fprintf(stderr, "Failed to close restoring file, fd %d, errno 
%d\n",
+restore_fd_to_close, errno);
+restore_fd_to_close = -1;
+}
+
 if (!paused)
 libxl_domain_unpause(ctx, domid);
 
-- 
2.1.4


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xl: close restore file fd when we're done with it

2015-08-13 Thread Wei Liu
This patch is for 4.6. Forgot to extend the tag in subject line.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 0/2] Refactor ioreq server for better performance.

2015-08-13 Thread Yu Zhang
XenGT leverages ioreq server to track and forward the accesses to
GPU I/O resources, e.g. the PPGTT(per-process graphic translation
tables). Currently, ioreq server uses rangeset to track the BDF/
PIO/MMIO ranges to be emulated. To select an ioreq server, the 
rangeset is searched to see if the I/O range is recorded. However,
traversing the link list inside rangeset could be time consuming
when number of ranges is too high. On HSW platform, number of PPGTTs
for each vGPU could be several hundred. On BDW, this value could
be several thousand.  This patch series refactored rangeset to base
it on red-back tree, so that the searching would be more efficient. 

Besides, this patchset also splits the tracking of MMIO and guest
ram ranges into different rangesets. And to accommodate more ranges,
limitation of the number of ranges in an ioreq server, MAX_NR_IO_RANGES
is changed - future patches might be provided to tune this with other
approaches.


Changes in v4: 
Keep the name HVMOP_IO_RANGE_MEMORY for MMIO resources, and add 
a new one, HVMOP_IO_RANGE_WP_MEM, for write-protected memory.

Changes in v3: 
1> Use a seperate rangeset for guest ram pages in ioreq server;
2> Refactor rangeset, instead of introduce a new data structure.

Changes in v2: 
1> Split the original patch into 2;
2> Take Paul Durrant's comments:
  a> Add a name member in the struct rb_rangeset, and use the 'q' 
debug key to dump the ranges in ioreq server;
  b> Keep original routine names for hvm ioreq server;
  c> Commit message changes - mention that a future patch to change
the maximum ranges inside ioreq server.

Yu Zhang (2):
  Differentiate IO/mem resources tracked by ioreq server
  Refactor rangeset structure for better performance.

 tools/libxc/include/xenctrl.h| 31 +++
 tools/libxc/xc_domain.c  | 55 +++
 xen/arch/x86/hvm/hvm.c   | 26 -
 xen/common/rangeset.c| 82 +---
 xen/include/asm-x86/hvm/domain.h |  4 +-
 xen/include/public/hvm/hvm_op.h  |  1 +
 xen/include/public/hvm/ioreq.h   |  1 +
 7 files changed, 175 insertions(+), 25 deletions(-)

-- 
1.9.1


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server

2015-08-13 Thread Yu Zhang
Currently in ioreq server, guest write-protected ram pages are
tracked in the same rangeset with device mmio resources. Yet
unlike device mmio, which can be in big chunks, the guest write-
protected pages may be discrete ranges with 4K bytes each.

This patch uses a seperate rangeset for the guest ram pages.
And a new ioreq type, IOREQ_TYPE_WP_MEM, is defined.

Note: Previously, a new hypercall or subop was suggested to map
write-protected pages into ioreq server. However, it turned out
handler of this new hypercall would be almost the same with the
existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
already a type parameter in this hypercall. So no new hypercall
defined, only a new type is introduced.

Signed-off-by: Yu Zhang 
---
 tools/libxc/include/xenctrl.h| 31 ++
 tools/libxc/xc_domain.c  | 55 
 xen/arch/x86/hvm/hvm.c   | 26 ++-
 xen/include/asm-x86/hvm/domain.h |  4 +--
 xen/include/public/hvm/hvm_op.h  |  1 +
 xen/include/public/hvm/ioreq.h   |  1 +
 6 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index de3c0ad..53f196d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2010,6 +2010,37 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface 
*xch,
 int is_mmio,
 uint64_t start,
 uint64_t end);
+/**
+ * This function registers a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
+domid_t domid,
+ioservid_t id,
+uint64_t start,
+uint64_t end);
+
+/**
+ * This function deregisters a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch,
+domid_t domid,
+ioservid_t id,
+uint64_t start,
+uint64_t end);
 
 /**
  * This function registers a PCI device for config space emulation.
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 2ee26fb..42aeba9 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1552,6 +1552,61 @@ int xc_hvm_unmap_io_range_from_ioreq_server(xc_interface 
*xch, domid_t domid,
 return rc;
 }
 
+int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch, domid_t domid,
+ioservid_t id, uint64_t start, 
uint64_t end)
+{
+DECLARE_HYPERCALL;
+DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+int rc;
+
+arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+hypercall.op = __HYPERVISOR_hvm_op;
+hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+arg->domid = domid;
+arg->id = id;
+arg->type = HVMOP_IO_RANGE_WP_MEM;
+arg->start = start;
+arg->end = end;
+
+rc = do_xen_hypercall(xch, &hypercall);
+
+xc_hypercall_buffer_free(xch, arg);
+return rc;
+}
+
+int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch, domid_t domid,
+ioservid_t id, uint64_t start, 
uint64_t end)
+{
+DECLARE_HYPERCALL;
+DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+int rc;
+
+arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+hypercall.op = __HYPERVISOR_hvm_op;
+hypercall.arg[0] = HVMOP_unmap_io_range_from_ioreq_server;
+hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+arg->domid = domid;
+arg->id = id;
+arg->type = HVMOP_IO_RANGE_WP_MEM;
+arg->start = start;
+arg->end = end;
+
+rc = do_xen_hypercall(xch, &hypercall);
+
+xc_hypercall_buffer_free(xch, arg);
+return rc;
+
+}
+
 int xc_hvm_map_pcidev_to_ioreq_server(xc_interface *xch, domid_t domid,
   ioservid_t id, uint16_t segment,
   uint8_t bus, uint8_t device,
diff --git a/xen/arch/x86/

[Xen-devel] [PATCH v4 2/2] Refactor rangeset structure for better performance.

2015-08-13 Thread Yu Zhang
This patch refactors struct rangeset to base it on the red-black
tree structure, instead of on the current doubly linked list. By
now, ioreq leverages rangeset to keep track of the IO/memory
resources to be emulated. Yet when number of ranges inside one
ioreq server is very high, traversing a doubly linked list could
be time consuming. With this patch, the time complexity for
searching a rangeset can be improved from O(n) to O(log(n)).
Interfaces of rangeset still remain the same, and no new APIs
introduced.

Signed-off-by: Yu Zhang 
---
 xen/common/rangeset.c | 82 +--
 1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 6c6293c..87b6aab 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -10,11 +10,12 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /* An inclusive range [s,e] and pointer to next range in ascending order. */
 struct range {
-struct list_head list;
+struct rb_node node;
 unsigned long s, e;
 };
 
@@ -24,7 +25,7 @@ struct rangeset {
 struct domain   *domain;
 
 /* Ordered list of ranges contained in this set, and protecting lock. */
-struct list_head range_list;
+struct rb_root   range_tree;
 
 /* Number of ranges that can be allocated */
 long nr_ranges;
@@ -45,41 +46,78 @@ struct rangeset {
 static struct range *find_range(
 struct rangeset *r, unsigned long s)
 {
-struct range *x = NULL, *y;
+struct rb_node *node;
+struct range   *x;
+struct range   *prev = NULL;
 
-list_for_each_entry ( y, &r->range_list, list )
+node = r->range_tree.rb_node;
+while ( node )
 {
-if ( y->s > s )
-break;
-x = y;
+x = container_of(node, struct range, node);
+if ( (s >= x->s) && (s <= x->e) )
+return x;
+if ( s < x->s )
+node = node->rb_left;
+else if ( s > x->s )
+{
+prev = x;
+node = node->rb_right;
+}
 }
 
-return x;
+return prev;
 }
 
 /* Return the lowest range in the set r, or NULL if r is empty. */
 static struct range *first_range(
 struct rangeset *r)
 {
-if ( list_empty(&r->range_list) )
-return NULL;
-return list_entry(r->range_list.next, struct range, list);
+struct rb_node *node;
+
+node = rb_first(&r->range_tree);
+if ( node != NULL )
+return container_of(node, struct range, node);
+
+return NULL;
 }
 
 /* Return range following x in ascending order, or NULL if x is the highest. */
 static struct range *next_range(
 struct rangeset *r, struct range *x)
 {
-if ( x->list.next == &r->range_list )
-return NULL;
-return list_entry(x->list.next, struct range, list);
+struct rb_node *node;
+
+node = rb_next(&x->node);
+if ( node )
+return container_of(node, struct range, node);
+
+return NULL;
 }
 
 /* Insert range y after range x in r. Insert as first range if x is NULL. */
 static void insert_range(
 struct rangeset *r, struct range *x, struct range *y)
 {
-list_add(&y->list, (x != NULL) ? &x->list : &r->range_list);
+struct rb_node **node;
+struct rb_node *parent = NULL;
+
+if ( x == NULL )
+node = &r->range_tree.rb_node;
+else
+{
+node = &x->node.rb_right;
+parent = &x->node;
+}
+
+while ( *node )
+{
+parent = *node;
+node = &parent->rb_left;
+}
+
+/* Add new node and rebalance the red-black tree. */
+rb_link_node(&y->node, parent, node);
+rb_insert_color(&y->node, &r->range_tree);
 }
 
 /* Remove a range from its list and free it. */
@@ -88,7 +126,7 @@ static void destroy_range(
 {
 r->nr_ranges++;
 
-list_del(&x->list);
+rb_erase(&x->node, &r->range_tree);
 xfree(x);
 }
 
@@ -319,7 +357,7 @@ bool_t rangeset_contains_singleton(
 bool_t rangeset_is_empty(
 const struct rangeset *r)
 {
-return ((r == NULL) || list_empty(&r->range_list));
+return ((r == NULL) || RB_EMPTY_ROOT(&r->range_tree));
 }
 
 struct rangeset *rangeset_new(
@@ -332,7 +370,7 @@ struct rangeset *rangeset_new(
 return NULL;
 
 rwlock_init(&r->lock);
-INIT_LIST_HEAD(&r->range_list);
+r->range_tree = RB_ROOT;
 r->nr_ranges = -1;
 
 BUG_ON(flags & ~RANGESETF_prettyprint_hex);
@@ -410,7 +448,7 @@ void rangeset_domain_destroy(
 
 void rangeset_swap(struct rangeset *a, struct rangeset *b)
 {
-LIST_HEAD(tmp);
+struct rb_node* tmp;
 
 if ( a < b )
 {
@@ -423,9 +461,9 @@ void rangeset_swap(struct rangeset *a, struct rangeset *b)
 write_lock(&a->lock);
 }
 
-list_splice_init(&a->range_list, &tmp);
-list_splice_init(&b->range_list, &a->range_list);
-list_splice(&tmp, &b->range_list);
+tmp = a->range_tree.rb_node;
+a->range_tree.rb_node = b->range_tree.rb_node;
+b->range_tree.rb_node = tmp;
 
 write

Re: [Xen-devel] [PATCH] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Tim Deegan
At 03:46 -0600 on 13 Aug (1439437600), Jan Beulich wrote:
> >>> On 13.08.15 at 11:31,  wrote:
> > At 09:50 +0100 on 13 Aug (1439459451), Ian Campbell wrote:
> >> On Fri, 2015-07-24 at 06:37 -0600, Jan Beulich wrote:
> >> > The function shouldn't return NULL after having obtained a reference,
> >> > or else the caller won't know to drop it.
> >> 
> >> > Also its result shouldn't be ignored - if calling code is certain that
> >> > a page already has a non-zero refcount, it better ASSERT()s so.
> >> 
> >> How is page_get_owner_and_reference sure the reference count was not zero
> >> on entry? (WRT the call to page_get_owner in page_get_owner_and_reference
> >> itself) Or have I misunderstood the assertion being made here?
> > 
> > That path is fine -- if the count was zero, it will return NULL before
> > trying to get the owner.  But I'm not convinced that the assertion is
> > correct -- are there really no pages anywhere that have a recount but
> > no owner?  What about, e.g. shadow pool pages or other uses of
> > MEMF_no_owner?  If a guest can cause Xen to try to get_page() one of
> > those, then we'd hit the new assertion.
> > 
> > How about unlikely(!owner) path that drops the taken ref instead?
> 
> That would be dead code - a page the ref count of wasn't zero
> prior to the function taking an extra ref shouldn't be unowned.

Right.  That being the case, Reviewed-by: Tim Deegan .

Cheers,

Tim.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Stefano Stabellini
On Thu, 13 Aug 2015, Jan Beulich wrote:
> >>> On 13.08.15 at 11:43,  wrote:
> > On Thu, 13 Aug 2015, Jan Beulich wrote:
> >> >>> On 12.08.15 at 18:18,  wrote:
> >> > On 12/08/15 16:58, Jan Beulich wrote:
> >> > On 12.08.15 at 17:51,  wrote:
> >> >>> On Wed, Aug 12, 2015 at 5:45 PM, Jan Beulich  wrote:
> >> >>> On 07.08.15 at 04:11,  wrote:
> >> > All these tables will be copied to Dom0 memory except that the reused
> >> > tables(DSDT, SPCR, etc) will be mapped to Dom0.
> >> 
> >>  Which again seems odd - such tables should be considered MMIO
> >>  (despite living in RAM), and hence not be part of Dom0's memory
> >>  assignment.
> >> 
> >> >>> not sure if this applies to the context of your comment, but we had
> >> >>> issues before when trying to deal with this data as device memory
> >> >>> (MMIO), because ARM doesn't allow unaligned accesses etc. on device
> >> >>> memory, and so Dom0 would crash at memory abort exceptions during
> >> >>> boot, so this really does have to be mapped as RAM to Dom0.  If you
> >> >>> meant some internal Xen bookkeeping thing, then just ignore me.
> >> >> 
> >> >> Hmm, how would native Linux avoid such unaligned accesses then?
> >> > 
> >> > ACPI table are living on RAM on ARM. So there is no issue with Linux
> >> > baremetal.
> >> 
> >> I'm sure our interpretation of the meaning of RAM here differs:
> >> While from a physical perspective the tables live in RAM too on
> >> x86, from a memory map perspective they don't. And since iirc
> >> ACPI implies UEFI, and since UEFI has special memory types
> >> for such tables (to prevent the OS from using the space as
> >> normal memory), I can't believe this to be different under ARM.
> >> 
> >> > The "problem" is from Xen where we are mapping memory region with Device
> >> > attribute. This is because until now we never had a reason to directly
> >> > map other thing than MMIO to the domain.
> >> > 
> >> > This could be fixed by adding new helper in Xen to directly map RAM 
> >> > region.
> >> 
> >> Which would seem to be the correct route.
> >> 
> >> > Nonetheless, we still have to copy some table in Xen in order to modify
> >> > them and/or new one. I have in mind the FADT table to set the hypervisor
> >> > field and hiding the hypervisor specific data (GIC hyp, timer hyp...) to
> >> > avoid the kernel thinks there is hyp mode available.
> >> 
> >> "Have to"? Or rather "would like to"? As said in another reply to
> >> Shannon, I didn't see any rationale spelled out for this fundamental
> >> difference to x86.
> > 
> > Just because it was done this way before, it doesn't mean that it is the
> > right way of doing it.
> 
> Correct. Which is why I'm not saying outright "no" but asking for
> justification.
> 
> > I think is bad design to expose the presence of certain functionalities
> > in ACPI tables and then expect that the Dom0 kernel won't use them.
> > ACPI tables should describe only and all the hardware which is exposed
> > to Dom0. Anything else is a mistake.
> 
> That depends on the perspective you take: Especially for a PV
> Dom0 it is quite reasonable for the kernel to be aware of certain
> implications from running under a (or the) hypervisor it knows of.
> I agree that the perspective may shift when it comes to HVM-like
> Dom0 (albeit in the spectrum I'd rather see ARM near PVH, which
> again is supposed to be aware).

PVH has always been the wrong analogy for ARM guests. Guest kernels boot
following native boot paths. They don't require emulation because the
ARM architecture is much more flexible and legacy free compared to x86,
not because we instructed the kernel to ignore certain pieces of
hardware. So they are like PV on HVM without any need for QEMU because
there is nothing to emulate. Maybe the right analogy is HVMLite.


> > For example it is only natural for the kernel to try to use the GIC hyp
> > functionalities if they are described, while actually they are not
> > emulated by Xen at all.
> 
> See Ian's earlier reply: It can also be considered natural for it to
> be aware that when run in EL2 to not use EL1 functionality.

It is not just about the GIC Hyp functionalities.


> > I would rather teach Xen to fix the tables now, than later try teach
> > every possible Dom0 kernel (Linux, FreeBSD, etc) to ignore a set of info
> > which are wrong but still passed to them anyway. Moreover the list of
> > things to ignore can change over time. It is just a bad ABI to maintain.
> 
> I don't think there's a clear advantage to teaching Xen of new
> tables to hide over teaching Dom0 of new tables to ignore - it
> much depends on release cycles and such which one would be
> more beneficial.

There are a couple of very obvious advantages:

- there is one Xen, there are many Dom0 kernels
- if Xen changes ACPI tables, Dom0 won't be surprised by the fact that
actually they don't match the hardware. Keep in mind that we don't have
10 years of PV history on ARM. People still think of this as

Re: [Xen-devel] [PATCH for-4.6] xen/mm: populate_physmap: validate correctly the gfn for direct mapped domain

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 03:54 -0600, Jan Beulich wrote:
> > 
> > > > On 13.08.15 at 11:33,  wrote:
> > On Tue, 2015-08-11 at 18:41 +0100, Julien Grall wrote:
> > > --- a/xen/common/memory.c
> > > +++ b/xen/common/memory.c
> > > @@ -126,22 +126,28 @@ static void populate_physmap(struct memop_args 
> > > *a)
> > >  if ( is_domain_direct_mapped(d) )
> > >  {
> > >  mfn = gpfn;
> > > -if ( !mfn_valid(mfn) )
> > > +
> > > +for ( j = 0; j < (1 << a->extent_order); j++, mfn++ 
> > > )
> > >  {
> > > -gdprintk(XENLOG_INFO, "Invalid mfn 
> > > %#"PRI_xen_pfn"\n",
> > > +if ( !mfn_valid(mfn) )
> > > +{
> > > +gdprintk(XENLOG_INFO, "Invalid mfn 
> > > %#"PRI_xen_pfn"\n",
> > >   mfn);
> > > -goto out;
> > > +goto out;
> > > +}
> > > +
> > > +page = mfn_to_page(mfn);
> > > +if ( !get_page(page, d) )
> > > +{
> > > +gdprintk(XENLOG_INFO,
> > > + "mfn %#"PRI_xen_pfn" doesn't belong 
> > > to the"
> > > + " domain\n", mfn);
> > > +goto out;
> > 
> > This will leak the references on any earlier pages, which will happen 
> > all
> > the time if you are cross a boundary from RAM into e.g. MMIO or 
> > whatever.
> 
> I don't see why it would - the put_page() (as you say a few lines
> down in your reply) follows right afterwards.

Right, sorry I should have delete all this once I realised what was going
on (I wasn't 100% sure I guess).

> > Oh, am I wrong and we do get_page then immediately put_page for every page?
> > I suppose that is one way to check the current owner, but I think it needs
> > a comment at least (it did before TBH, just iterating over many pages makes
> > it seem even odder).
> > 
> > Is that really the best way to ask if a page is owned by a given domain tho
> > ugh? Does page_get_owner not suffice?
> 
> That could produce wrong results when the page previously had
> no reference (and namely, due to the unions used in
> struct page_info, was free or in use as a shadow page).

I see.

Maybe some new helper to encapsulate the get+put to validate the presence
of a current owner would be nice in the future then, but not a blocker for
this patch I guess.

There isn't a race here is there? What if the reference were dropped after
this check but before the guest_physmap_add_page (which takes new
references)? We are implicitly relying on a guarantee made elsewhere that a
direct mapped guest never drops the final ref until it is destroyed (which
can't be happening in this window I think), but it would be obviously safer
against such bugs if we just held the ref until after the p2m was updated.

Another thing for a future/4.7 cleanup I guess.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xl: close restore file fd when we're done with it

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 11:09 +0100, Wei Liu wrote:
> And log if close fails.
> 
> Reported-by: Andrew Armenia 
> Signed-off-by: Wei Liu 
> ---

Acked-by: Ian Campbell 



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server

2015-08-13 Thread Paul Durrant
> -Original Message-
> From: Yu Zhang [mailto:yu.c.zh...@linux.intel.com]
> Sent: 13 August 2015 11:06
> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; 
> Ian
> Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper
> Cc: Kevin Tian; zhiyuan...@intel.com
> Subject: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq
> server
> 
> Currently in ioreq server, guest write-protected ram pages are
> tracked in the same rangeset with device mmio resources. Yet
> unlike device mmio, which can be in big chunks, the guest write-
> protected pages may be discrete ranges with 4K bytes each.

Would the interfaces be better using xen_pfn_t rather than using uint64_t to 
pass addresses round where the bottom 12 bits will always be zero?

  Paul

> 
> This patch uses a seperate rangeset for the guest ram pages.
> And a new ioreq type, IOREQ_TYPE_WP_MEM, is defined.
> 
> Note: Previously, a new hypercall or subop was suggested to map
> write-protected pages into ioreq server. However, it turned out
> handler of this new hypercall would be almost the same with the
> existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
> already a type parameter in this hypercall. So no new hypercall
> defined, only a new type is introduced.
> 
> Signed-off-by: Yu Zhang 
> ---
>  tools/libxc/include/xenctrl.h| 31 ++
>  tools/libxc/xc_domain.c  | 55
> 
>  xen/arch/x86/hvm/hvm.c   | 26 ++-
>  xen/include/asm-x86/hvm/domain.h |  4 +--
>  xen/include/public/hvm/hvm_op.h  |  1 +
>  xen/include/public/hvm/ioreq.h   |  1 +
>  6 files changed, 115 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
> index de3c0ad..53f196d 100644
> --- a/tools/libxc/include/xenctrl.h
> +++ b/tools/libxc/include/xenctrl.h
> @@ -2010,6 +2010,37 @@ int
> xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
>  int is_mmio,
>  uint64_t start,
>  uint64_t end);
> +/**
> + * This function registers a range of write-protected memory for emulation.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id to be serviced
> + * @parm id the IOREQ Server id.
> + * @parm start start of range
> + * @parm end end of range (inclusive).
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
> +domid_t domid,
> +ioservid_t id,
> +uint64_t start,
> +uint64_t end);
> +
> +/**
> + * This function deregisters a range of write-protected memory for
> emulation.
> + *
> + * @parm xch a handle to an open hypervisor interface.
> + * @parm domid the domain id to be serviced
> + * @parm id the IOREQ Server id.
> + * @parm start start of range
> + * @parm end end of range (inclusive).
> + * @return 0 on success, -1 on failure.
> + */
> +int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch,
> +domid_t domid,
> +ioservid_t id,
> +uint64_t start,
> +uint64_t end);
> 
>  /**
>   * This function registers a PCI device for config space emulation.
> diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
> index 2ee26fb..42aeba9 100644
> --- a/tools/libxc/xc_domain.c
> +++ b/tools/libxc/xc_domain.c
> @@ -1552,6 +1552,61 @@ int
> xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t
> domid,
>  return rc;
>  }
> 
> +int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
> domid_t domid,
> +ioservid_t id, uint64_t start, 
> uint64_t end)
> +{
> +DECLARE_HYPERCALL;
> +DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
> +int rc;
> +
> +arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
> +if ( arg == NULL )
> +return -1;
> +
> +hypercall.op = __HYPERVISOR_hvm_op;
> +hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
> +hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
> +
> +arg->domid = domid;
> +arg->id = id;
> +arg->type = HVMOP_IO_RANGE_WP_MEM;
> +arg->start = start;
> +arg->end = end;
> +
> +rc = do_xen_hypercall(xch, &hypercall);
> +
> +xc_hypercall_buffer_free(xch, arg);
> +return rc;
> +}
> +
> +int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch,
> domid_t domid,
> +ioservid_t id, uint64_t start, 
> uint64_t end)
> +{
> +DECLARE_HYPERCALL;
> +DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, ar

Re: [Xen-devel] [PATCH 1/2] xen: move perform_gunzip to common

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 03:57 -0600, Jan Beulich wrote:
> > 
> > > > On 13.08.15 at 11:28,  wrote:
> > On Thu, 13 Aug 2015, Jan Beulich wrote:
> > > > > > On 12.08.15 at 18:15,  wrote:
> > > > On Wed, 12 Aug 2015, Jan Beulich wrote:
> > > > > > > > On 12.08.15 at 16:47,  
> > > > > > > > wrote:
> > > > > > @@ -31,8 +33,15 @@ typedef int decompress_fn(unsigned char 
> > > > > > *inbuf, unsigned int len,
> > > > > >   * dependent).
> > > > > >   */
> > > > > >  
> > > > > > -decompress_fn bunzip2, unxz, unlzma, unlzo, unlz4;
> > > > > > +decompress_fn perform_gunzip, bunzip2, unxz, unlzma, unlzo, 
> > > > > > unlz4;
> > > > > >  
> > > > > >  int decompress(void *inbuf, unsigned int len, void *outbuf);
> > > > > >  
> > > > > > +static inline unsigned long output_length(char *image, 
> > > > > > unsigned long image_len)
> > > > > 
> > > > > Neither of the callers gets moved out of bzimage.c - why does 
> > > > > this
> > > > > function need to move?
> > > > 
> > > > We'll use it on arm.
> > > 
> > > Hmm, the way it is used on x86 makes it quite architecture specific
> > > (namely because of the assumption that the size is also in said
> > > place for non-gz compression methods). I'd therefore prefer code
> > > duplication over code sharing here. 
> > 
> > Actually after seeing the size and quality of the resulting patches, I
> > am starting to feel the same way.
> > 
> > In terms of code changes, I was thinking that the best result would be
> > moving the "boilerplate" code from xen/arch/x86/bzimage.c to
> > xen/common/inflate.c, see below, then the interface would become just
> > perform_gunzip and gzip_check. But I guess you wouldn't want inflate.c
> > to be modified, right?
> 
> Yes, unless really unavoidable.
> 
> > Alternatively we could move it to a new file, let's call it gunzip.h,
> > that would #include "inflate.c", so:
> > 
> > bzimage.c -- #include --> gunzip.h -- #include --> inflate.c
> > 
> > And again we just leave the perform_gunzip and gzip_check calls in
> > bzimage.c.  What do you think?

How about putting perform_gunzip and gzip_check into a new gunzip.c which
includes inflate.c?


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCHv3 01/10] mm: memory hotplug with an existing resource

2015-08-13 Thread David Vrabel
On 30/07/15 18:03, David Vrabel wrote:
> Add add_memory_resource() to add memory using an existing "System RAM"
> resource.  This is useful if the memory region is being located by
> finding a free resource slot with allocate_resource().
> 
> Xen guests will make use of this in their balloon driver to hotplug
> arbitrary amounts of memory in response to toolstack requests.

Ping?  This enables a useful feature for Xen guests.

> Signed-off-by: David Vrabel 
> Cc: Andrew Morton 
> ---
>  include/linux/memory_hotplug.h |  2 ++
>  mm/memory_hotplug.c| 28 +---
>  2 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
> index 6ffa0ac..c76d371 100644
> --- a/include/linux/memory_hotplug.h
> +++ b/include/linux/memory_hotplug.h
> @@ -11,6 +11,7 @@ struct zone;
>  struct pglist_data;
>  struct mem_section;
>  struct memory_block;
> +struct resource;
>  
>  #ifdef CONFIG_MEMORY_HOTPLUG
>  
> @@ -266,6 +267,7 @@ static inline void remove_memory(int nid, u64 start, u64 
> size) {}
>  extern int walk_memory_range(unsigned long start_pfn, unsigned long end_pfn,
>   void *arg, int (*func)(struct memory_block *, void *));
>  extern int add_memory(int nid, u64 start, u64 size);
> +extern int add_memory_resource(int nid, struct resource *resource);
>  extern int zone_for_memory(int nid, u64 start, u64 size, int zone_default);
>  extern int arch_add_memory(int nid, u64 start, u64 size);
>  extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 003dbe4..169770a 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1224,23 +1224,21 @@ int zone_for_memory(int nid, u64 start, u64 size, int 
> zone_default)
>  }
>  
>  /* we are OK calling __meminit stuff here - we have CONFIG_MEMORY_HOTPLUG */
> -int __ref add_memory(int nid, u64 start, u64 size)
> +int __ref add_memory_resource(int nid, struct resource *res)
>  {
> + u64 start, size;
>   pg_data_t *pgdat = NULL;
>   bool new_pgdat;
>   bool new_node;
> - struct resource *res;
>   int ret;
>  
> + start = res->start;
> + size = resource_size(res);
> +
>   ret = check_hotplug_memory_range(start, size);
>   if (ret)
>   return ret;
>  
> - res = register_memory_resource(start, size);
> - ret = -EEXIST;
> - if (!res)
> - return ret;
> -
>   {   /* Stupid hack to suppress address-never-null warning */
>   void *p = NODE_DATA(nid);
>   new_pgdat = !p;
> @@ -1290,6 +1288,22 @@ out:
>   mem_hotplug_done();
>   return ret;
>  }
> +EXPORT_SYMBOL_GPL(add_memory_resource);
> +
> +int __ref add_memory(int nid, u64 start, u64 size)
> +{
> + struct resource *res;
> + int ret;
> +
> + res = register_memory_resource(start, size);
> + if (!res)
> + return -EEXIST;
> +
> + ret = add_memory_resource(nid, res);
> + if (ret < 0)
> + release_memory_resource(res);
> + return ret;
> +}
>  EXPORT_SYMBOL_GPL(add_memory);
>  
>  #ifdef CONFIG_MEMORY_HOTREMOVE
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 12:12,  wrote:
> At 03:46 -0600 on 13 Aug (1439437600), Jan Beulich wrote:
>> >>> On 13.08.15 at 11:31,  wrote:
>> > At 09:50 +0100 on 13 Aug (1439459451), Ian Campbell wrote:
>> >> On Fri, 2015-07-24 at 06:37 -0600, Jan Beulich wrote:
>> >> > The function shouldn't return NULL after having obtained a reference,
>> >> > or else the caller won't know to drop it.
>> >> 
>> >> > Also its result shouldn't be ignored - if calling code is certain that
>> >> > a page already has a non-zero refcount, it better ASSERT()s so.
>> >> 
>> >> How is page_get_owner_and_reference sure the reference count was not zero
>> >> on entry? (WRT the call to page_get_owner in page_get_owner_and_reference
>> >> itself) Or have I misunderstood the assertion being made here?
>> > 
>> > That path is fine -- if the count was zero, it will return NULL before
>> > trying to get the owner.  But I'm not convinced that the assertion is
>> > correct -- are there really no pages anywhere that have a recount but
>> > no owner?  What about, e.g. shadow pool pages or other uses of
>> > MEMF_no_owner?  If a guest can cause Xen to try to get_page() one of
>> > those, then we'd hit the new assertion.
>> > 
>> > How about unlikely(!owner) path that drops the taken ref instead?
>> 
>> That would be dead code - a page the ref count of wasn't zero
>> prior to the function taking an extra ref shouldn't be unowned.
> 
> Right.  That being the case, Reviewed-by: Tim Deegan .

Btw, as I was about to add a comment as you asked for along with
adding the one Ian wants I realized that
page_get_owner_and_reference() is a helper of get_page(), i.e.
their behavior of not always acquiring a reference is the same.
Which - to me - makes adding such a comment pretty pointless.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 11:13 +0100, Stefano Stabellini wrote:
> 
> > > For example it is only natural for the kernel to try to use the GIC 
> > > hyp
> > > functionalities if they are described, while actually they are not
> > > emulated by Xen at all.
> > 
> > See Ian's earlier reply: It can also be considered natural for it to
> > be aware that when run in EL2 to not use EL1 functionality.

NB EL2 == Hyp and EL1 == Kernel, so it's the other way round, FWIW.

> It is not just about the GIC Hyp functionalities.

What else is there which is not subject to this logic? Timers are too, it
even applies to IOMMU's which have both stage1 and stage2 bits.

BTW, I think kernels _already_ need to deal with a lot of this because in
reality nobody modifies the DTB when they use a firmware which launches the
kernel in EL1. IOW I think the kernel is already aware of which resources
can be used by which privilege level.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread Wei Liu
On Thu, Aug 13, 2015 at 11:07:47AM +0100, Ian Campbell wrote:
> On Thu, 2015-08-13 at 10:54 +0100, Wei Liu wrote:
> > On Thu, Aug 13, 2015 at 10:42:26AM +0100, Ian Campbell wrote:
> > > On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> > > > Originally, if user didn't specify maxvcpus= in xl config file, the
> > > > maximum size of vcpu bitmap was always equal to maximum number of 
> > > > pcpus.
> > > > This might not be what user wants.
> > > 
> > > What are you suggesting they wanted instead? We are only talking about 
> > > the
> > > bitmap right, and the typical/sensible config will have #vcpus <= 
> > > #pcpus,
> > > so they will fit even if they "waste" some bits during parsing.
> > 
> > #vcpus > #pcpus, bitmap is too small.
> 
> Right. Which is trivial to detect as we go through the parsing and raise an
> appropriate error.
> 
> > > I'm almost inclined to suggest that if a user wants #vcpus > #pcpus 
> > > they
> > > should have to specify maxvcpus and not rely on the vnuma parsing code
> > > inferring this fact.
> > > 
> > 
> > I don't think we should prevent people from shooting themselves in the
> > foot.
> 
> If the cost of supporting that is this patch then I disagree. If you can
> find a way to do it simply and cleanly then fine, maybe.
> 

I would say let's make xl dumb. If the maximum number of vcpus specified
in vNUMA doesn't match maximum number of vcpus specified in
configuration file, error out. That would maybe lead to a smaller patch.

This was my first implementation of this parser routine but for some
reason we tried to make xl smart (to sum up vcpus in vNUMA
configuration), and now it's causing us trouble.

> In fact I even disagree in general, we can and should provide warnings or
> errors for things which we know are bad and which are most likely
> unintentional, but provide overrides.
> 
> > > IOW maybe this code could just error out (or print a warning) if this
> > > happens? + a doc update.
> > > 
> > 
> > Xl doesn't complain when you set vcpus > pcpus. I don't think vNUMA
> > should behave differently.
> 
> Not always true, e.g. from vcpuset:
> 
>if (max_vcpus > dominfo.vcpu_online && max_vcpus > host_cpu) {
> fprintf(stderr, "You are overcommmitting! You have %d physical" \
> " CPUs and want %d vCPUs! Aborting, use --ignore-host to" 
> \
> " continue\n", host_cpu, max_vcpus);
> rc = 1;
> }
> 

That is for vcpuset command, not xl configuration file parser. There is
nothing preventing user from setting #vcpus > #pcpus and no warning
woudl be given. The parser could use some improvement, but that's 4.7
material.

Wei.

> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] xen: move perform_gunzip to common

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 12:17,  wrote:
> On Thu, 2015-08-13 at 03:57 -0600, Jan Beulich wrote:
>> > 
>> > > > On 13.08.15 at 11:28,  wrote:
>> > On Thu, 13 Aug 2015, Jan Beulich wrote:
>> > > > > > On 12.08.15 at 18:15,  wrote:
>> > > > On Wed, 12 Aug 2015, Jan Beulich wrote:
>> > > > > > > > On 12.08.15 at 16:47,  
>> > > > > > > > wrote:
>> > > > > > @@ -31,8 +33,15 @@ typedef int decompress_fn(unsigned char 
>> > > > > > *inbuf, unsigned int len,
>> > > > > >   * dependent).
>> > > > > >   */
>> > > > > >  
>> > > > > > -decompress_fn bunzip2, unxz, unlzma, unlzo, unlz4;
>> > > > > > +decompress_fn perform_gunzip, bunzip2, unxz, unlzma, unlzo, 
>> > > > > > unlz4;
>> > > > > >  
>> > > > > >  int decompress(void *inbuf, unsigned int len, void *outbuf);
>> > > > > >  
>> > > > > > +static inline unsigned long output_length(char *image, 
>> > > > > > unsigned long image_len)
>> > > > > 
>> > > > > Neither of the callers gets moved out of bzimage.c - why does 
>> > > > > this
>> > > > > function need to move?
>> > > > 
>> > > > We'll use it on arm.
>> > > 
>> > > Hmm, the way it is used on x86 makes it quite architecture specific
>> > > (namely because of the assumption that the size is also in said
>> > > place for non-gz compression methods). I'd therefore prefer code
>> > > duplication over code sharing here. 
>> > 
>> > Actually after seeing the size and quality of the resulting patches, I
>> > am starting to feel the same way.
>> > 
>> > In terms of code changes, I was thinking that the best result would be
>> > moving the "boilerplate" code from xen/arch/x86/bzimage.c to
>> > xen/common/inflate.c, see below, then the interface would become just
>> > perform_gunzip and gzip_check. But I guess you wouldn't want inflate.c
>> > to be modified, right?
>> 
>> Yes, unless really unavoidable.
>> 
>> > Alternatively we could move it to a new file, let's call it gunzip.h,
>> > that would #include "inflate.c", so:
>> > 
>> > bzimage.c -- #include --> gunzip.h -- #include --> inflate.c
>> > 
>> > And again we just leave the perform_gunzip and gzip_check calls in
>> > bzimage.c.  What do you think?
> 
> How about putting perform_gunzip and gzip_check into a new gunzip.c which
> includes inflate.c?

Would seem as good to me, provided specifically the output_length()
helper then can stay where it is.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH OSSTEST] ts-xen-install: Install netcat-openbsd

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 09:21 +0100, Ian Campbell wrote:
> 
> Which is pretty WTF... Logs for this one are at 
> http://xenbits.xen.org/people/ianc/tmp/37828/
> http://xenbits.xen.org/people/ianc/tmp/37828/test-amd64-amd64-libvirt
> -pair/info.html
> 
> I'm doing an experiment now with $dho->{Ip} but that seems like it would
> just be papering over the issue, whatever it is.

http://xenbits.xen.org/people/ianc/tmp/37831/ =>
http://xenbits.xen.org/people/ianc/tmp/37831/test-amd64-amd64-libvirt-pair/21.ts-guest-migrate.log

2015-08-13 09:42:45 Z executing ssh ... root@10.80.229.144 virsh migrate --live 
debian.guest.osstest xen+ssh://10.80.228.77
error: unable to connect to 'lace-bug.uk.xensource.com:49152': Invalid argument

Which, just, WTF?

Ian.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Christoffer Dall
On Thu, Aug 13, 2015 at 11:22:19AM +0100, Ian Campbell wrote:
> On Thu, 2015-08-13 at 11:13 +0100, Stefano Stabellini wrote:
> > 
> > > > For example it is only natural for the kernel to try to use the GIC 
> > > > hyp
> > > > functionalities if they are described, while actually they are not
> > > > emulated by Xen at all.
> > > 
> > > See Ian's earlier reply: It can also be considered natural for it to
> > > be aware that when run in EL2 to not use EL1 functionality.
> 
> NB EL2 == Hyp and EL1 == Kernel, so it's the other way round, FWIW.
> 
> > It is not just about the GIC Hyp functionalities.
> 
> What else is there which is not subject to this logic? Timers are too, it
> even applies to IOMMU's which have both stage1 and stage2 bits.
> 
> BTW, I think kernels _already_ need to deal with a lot of this because in
> reality nobody modifies the DTB when they use a firmware which launches the
> kernel in EL1. IOW I think the kernel is already aware of which resources
> can be used by which privilege level.
> 
Yes, for resources specific to EL2 I believe that is indeed the case
(the GIC driver doesn't look at the hypervisor control register address,
and KVM does not even get that far if you're not booted in EL2, and the
timer only uses the virtual timer if not booted in EL2 - we never
attempt to use the hyp timer until Marc's VHE patches land, but they
also depend on being booted in hyp mode).

However, what about for other resources?  Having code somewhere that
says "hide this random piece of hardware if you're Xen dom0" sounds
awful to me.  I know it's only the serial port right now, but still.

-Christoffer

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] xen/mm: populate_physmap: validate correctly the gfn for direct mapped domain

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 12:14,  wrote:
> There isn't a race here is there? What if the reference were dropped after
> this check but before the guest_physmap_add_page (which takes new
> references)? We are implicitly relying on a guarantee made elsewhere that a
> direct mapped guest never drops the final ref until it is destroyed (which
> can't be happening in this window I think), but it would be obviously safer
> against such bugs if we just held the ref until after the p2m was updated.

Yeah, we depend on this, and what you suggest would indeed be
an improvment.

> Another thing for a future/4.7 cleanup I guess.

Right.

But as bottom line I take it that your objections to the patch going in
soon have been eliminated?

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] tools/libxc: linux: Don't use getpagesize() when unmapping the grants

2015-08-13 Thread Ian Campbell
On Tue, 2015-08-11 at 13:48 +0100, Ian Campbell wrote:
> On Fri, 2015-08-07 at 22:45 +0100, Wei Liu wrote:
> > On Fri, Aug 07, 2015 at 07:53:55PM +0100, Julien Grall wrote:
> > > The grants are based on the Xen granularity (i.e 4KB). While the 
> > > function
> > > to map grants for Linux (linux_gnttab_grant_map) is using the correct
> > > size (XC_PAGE_SIZE), the unmap one (linux_gnttab_munmap) is using
> > > getpagesize().
> > > 
> > > On domain using a page granularity different than Xen (this is the 
> > > case
> > > for AARCH64 guest using 64KB page), the unmap will be called with the
> > > wrong size.
> > > 
> > > Signed-off-by: Julien Grall 
> > > 
> > > ---
> > > Cc: Ian Jackson 
> > > Cc: Stefano Stabellini 
> > > Cc: Ian Campbell 
> > > Cc: Wei Liu 
> > > 
> > 
> > Acked-by: Wei Liu 
> 
> Acked-by: Ian Campbell 
> 
> > I think this is a bug fix and should be applied for 4.6.

I took this as a release ack and applied.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 v2 0/4] Patches for c/oxenstored

2015-08-13 Thread Ian Campbell
On Mon, 2015-08-10 at 09:00 +0100, Wei Liu wrote:
> Wei Liu (4):
>   cxenstored: fix systemd socket activation
>   cxenstored: document a bunch of short options in help string
>   oxenstored: fix systemd socket activation
>   oxenstored: move sd_notify_ready out of main loop

Applied.

> 
>  tools/ocaml/xenstored/systemd.ml  |  2 +-
>  tools/ocaml/xenstored/systemd.mli |  4 +--
>  tools/ocaml/xenstored/systemd_stubs.c |  6 ++--
>  tools/ocaml/xenstored/utils.ml|  2 +-
>  tools/ocaml/xenstored/xenstored.ml|  4 +--
>  tools/xenstore/xenstored_core.c   | 59 +
> --
>  6 files changed, 44 insertions(+), 33 deletions(-)
> 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] tools/xenstore: Correct use of va_end() after va_copy()

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 10:45 +0100, Wei Liu wrote:
> On Fri, Aug 07, 2015 at 02:51:59PM +0100, Andrew Cooper wrote:
> > C requires that every use of va_copy() is matched with a va_end() call.
> > 
> > This is especially important for x86_64 as va_{start,copy}() may need 
> > to
> > allocate memory to generate a va_list containing parameters which were
> > previously in registers.
> > 
> > Signed-off-by: Andrew Cooper 
> 
> Release-acked-by: Wei Liu 

Applied, thanks.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] tools: libxl: Remove unnecessary trailing \n from log messages.

2015-08-13 Thread Ian Campbell
On Wed, 2015-08-12 at 17:42 +0100, Wei Liu wrote:
> On Wed, Aug 12, 2015 at 02:56:01PM +0100, Ian Campbell wrote:
> > Both xl's LOG and the various libxl logging mechanisms automatically
> > include a trailing \n.
> > 
> > Remove all unnecessary \n's from the logs messages with the following
> > semantic patch.
> > 
> > spatch also reindents (I couldn't see how to make it stop). In general
> > it has improved matters but in 1 case it has introduced a long line,
> > this will be fixed in the next patch.
> > 
> > Semantic patch, run as
> > spatch --in-place --no-includes --include-headers \
> > --sp-file libxl-log-nl.spatch \
> > tools/libxl/libxl*.[ch] tools/libxl/xl*.[ch]
> 
> Both patches:
> 
> Release-acked-by: Wei Liu 

Applied, thanks.


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] tools: Don't try to update the firmware directory on ARM

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 10:44 +0100, Wei Liu wrote:
> On Fri, Aug 07, 2015 at 06:27:18PM +0100, Julien Grall wrote:
> > The firmware directory is not built at all on ARM. Attempting to update
> > it using the target subtree-force-update will fail when try to update
> > seabios.
> > 
> > Signed-off-by: Julien Grall 
> 
> Release-acked-by: Wei Liu 

Acked + applied, thanks.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] gitignore: Don't ignore *.rej

2015-08-13 Thread Ian Campbell
On Wed, 2015-08-12 at 16:21 +0100, Ian Jackson wrote:
> Konrad Rzeszutek Wilk writes ("Re: [Xen-devel] [PATCH] gitignore: Don't 
> ignore *.rej"):
> > On Wed, Aug 12, 2015 at 10:07:37AM +0100, Ian Campbell wrote:
> > > These indicate a patch application went wrong, I want to see them in
> > > "git status". This appears to have been imported from .hgignore where
> > > it has been since 2005.
> > > 
> > > Signed-off-by: Ian Campbell 
> > > Cc: Daniel Kiper 
> > 
> > Reviewed-by: Konrad Rzeszutek Wilk 
> 
> Acked-by: Ian Jackson 
> 
> (As an aside, I would normally apply a `usual' kind of trivial
> .gitignore update without waiting for any kind of ack.  But this one
> seems not entirely trivial.)

Right, I thought it might even be controversial, but it seems not. I've
applied.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 2/2] Refactor rangeset structure for better performance.

2015-08-13 Thread Paul Durrant
> -Original Message-
> From: Yu Zhang [mailto:yu.c.zh...@linux.intel.com]
> Sent: 13 August 2015 11:06
> To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; 
> Ian
> Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper
> Cc: Kevin Tian; zhiyuan...@intel.com
> Subject: [PATCH v4 2/2] Refactor rangeset structure for better performance.
> 
> This patch refactors struct rangeset to base it on the red-black
> tree structure, instead of on the current doubly linked list. By
> now, ioreq leverages rangeset to keep track of the IO/memory
> resources to be emulated. Yet when number of ranges inside one
> ioreq server is very high, traversing a doubly linked list could
> be time consuming. With this patch, the time complexity for
> searching a rangeset can be improved from O(n) to O(log(n)).
> Interfaces of rangeset still remain the same, and no new APIs
> introduced.
> 
> Signed-off-by: Yu Zhang 
> ---
>  xen/common/rangeset.c | 82
> +--
>  1 file changed, 60 insertions(+), 22 deletions(-)
> 
> diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
> index 6c6293c..87b6aab 100644
> --- a/xen/common/rangeset.c
> +++ b/xen/common/rangeset.c
> @@ -10,11 +10,12 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  /* An inclusive range [s,e] and pointer to next range in ascending order. */
>  struct range {
> -struct list_head list;
> +struct rb_node node;
>  unsigned long s, e;
>  };
> 
> @@ -24,7 +25,7 @@ struct rangeset {
>  struct domain   *domain;
> 
>  /* Ordered list of ranges contained in this set, and protecting lock. */
> -struct list_head range_list;
> +struct rb_root   range_tree;
> 
>  /* Number of ranges that can be allocated */
>  long nr_ranges;
> @@ -45,41 +46,78 @@ struct rangeset {
>  static struct range *find_range(
>  struct rangeset *r, unsigned long s)
>  {
> -struct range *x = NULL, *y;
> +struct rb_node *node;
> +struct range   *x;
> +struct range   *prev = NULL;
> 
> -list_for_each_entry ( y, &r->range_list, list )
> +node = r->range_tree.rb_node;
> +while ( node )
>  {
> -if ( y->s > s )
> -break;
> -x = y;
> +x = container_of(node, struct range, node);
> +if ( (s >= x->s) && (s <= x->e) )
> +return x;
> +if ( s < x->s )
> +node = node->rb_left;
> +else if ( s > x->s )

Do you need else if there? I think else would do. s either has to be < x->s or 
> x->e to get to this point.

> +{
> +prev = x;
> +node = node->rb_right;
> +}
>  }
> 
> -return x;
> +return prev;

Do you not want to return NULL here? It looks like you'd only get here if there 
was no matching range.

  Paul

>  }
> 
>  /* Return the lowest range in the set r, or NULL if r is empty. */
>  static struct range *first_range(
>  struct rangeset *r)
>  {
> -if ( list_empty(&r->range_list) )
> -return NULL;
> -return list_entry(r->range_list.next, struct range, list);
> +struct rb_node *node;
> +
> +node = rb_first(&r->range_tree);
> +if ( node != NULL )
> +return container_of(node, struct range, node);
> +
> +return NULL;
>  }
> 
>  /* Return range following x in ascending order, or NULL if x is the highest. 
> */
>  static struct range *next_range(
>  struct rangeset *r, struct range *x)
>  {
> -if ( x->list.next == &r->range_list )
> -return NULL;
> -return list_entry(x->list.next, struct range, list);
> +struct rb_node *node;
> +
> +node = rb_next(&x->node);
> +if ( node )
> +return container_of(node, struct range, node);
> +
> +return NULL;
>  }
> 
>  /* Insert range y after range x in r. Insert as first range if x is NULL. */
>  static void insert_range(
>  struct rangeset *r, struct range *x, struct range *y)
>  {
> -list_add(&y->list, (x != NULL) ? &x->list : &r->range_list);
> +struct rb_node **node;
> +struct rb_node *parent = NULL;
> +
> +if ( x == NULL )
> +node = &r->range_tree.rb_node;
> +else
> +{
> +node = &x->node.rb_right;
> +parent = &x->node;
> +}
> +
> +while ( *node )
> +{
> +parent = *node;
> +node = &parent->rb_left;
> +}
> +
> +/* Add new node and rebalance the red-black tree. */
> +rb_link_node(&y->node, parent, node);
> +rb_insert_color(&y->node, &r->range_tree);
>  }
> 
>  /* Remove a range from its list and free it. */
> @@ -88,7 +126,7 @@ static void destroy_range(
>  {
>  r->nr_ranges++;
> 
> -list_del(&x->list);
> +rb_erase(&x->node, &r->range_tree);
>  xfree(x);
>  }
> 
> @@ -319,7 +357,7 @@ bool_t rangeset_contains_singleton(
>  bool_t rangeset_is_empty(
>  const struct rangeset *r)
>  {
> -return ((r == NULL) || list_empty(&r->range_list));
> +return ((r == N

Re: [Xen-devel] [xs-devel] Trying to bring up stub domain in xen-4.4-xs88306

2015-08-13 Thread Wei Liu
On Thu, Aug 13, 2015 at 10:52:14AM +0100, Andrew Cooper wrote:
> On 13/08/15 08:04, Xuehan Xu wrote:
> > Hi, everyone.
> >
> > I'm trying to run a Windows HVM vm with stub domain in xenserver-6.5,
> > whose internal xen version is xen-4.4-xs88306. After I started the vm,
> > both the vm and its corresponding stub domain crashed. Here is the
> > related content in hypervisor.log. The domain ID of the windows vm is
> > 1, and the stub domain's id is 2.
> >
> > [2015-08-12 17:11:24] (d1) [  135.564650] HVM Loader
> > [2015-08-12 17:11:24] (d1) [  135.564756] Detected Xen v4.4.1-xs88306
> > [2015-08-12 17:11:24] (d1) [  135.564850] Xenbus rings @0xfeffc000,
> > event channel 2
> > [2015-08-12 17:11:24] (d1) [  135.565048] System requested ROMBIOS
> > [2015-08-12 17:11:24] (d1) [  135.565090] CPU speed is 2494 MHz
> > [2015-08-12 17:11:24] (d2) [  135.565234] Bootstrapping...
> > [2015-08-12 17:11:24] (d2) [  135.565275] Xen Minimal OS!
> > [2015-08-12 17:11:24] (d2) [  135.565284]   start_info: 0x585000(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565289] nr_pages: 0x2000
> > [2015-08-12 17:11:24] (d2) [  135.565293]   shared_inf: 0xbda9e000(MA)
> > [2015-08-12 17:11:24] (d2) [  135.565296]  pt_base: 0x588000(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565300] nr_pt_frames: 0x7
> > [2015-08-12 17:11:24] (d2) [  135.565304] mfn_list: 0x575000(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565307]mod_start: 0x0(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565311]  mod_len: 0
> > [2015-08-12 17:11:24] (d2) [  135.565314]flags: 0x0
> > [2015-08-12 17:11:24] (d2) [  135.565318] cmd_line: 
> > [2015-08-12 17:11:24] (d2) [  135.565374]   stack:  0x534660-0x554660
> > [2015-08-12 17:11:24] (d2) [  135.565381] MM: Init
> > [2015-08-12 17:11:24] (d2) [  135.565385]   _text: 0x0(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565389]  _etext: 0x1203b2(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565393]_erodata: 0x176000(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565396]  _edata: 0x17bf88(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565399] stack start: 0x534660(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565402]_end: 0x574f68(VA)
> > [2015-08-12 17:11:24] (d2) [  135.565406]   start_pfn: 592
> > [2015-08-12 17:11:24] (d2) [  135.565410] max_pfn: 2000
> > [2015-08-12 17:11:24] (d2) [  135.565414] Mapping memory range
> > 0x80 - 0x200
> > [2015-08-12 17:11:24] (d1) [  135.565525] Relocating guest memory for
> > lowmem MMIO space enabled
> > [2015-08-12 17:11:24] (d1) [  135.565586] PCI-ISA link 0 routed to IRQ5
> > [2015-08-12 17:11:24] (d1) [  135.565648] PCI-ISA link 1 routed to IRQ10
> > [2015-08-12 17:11:24] (d1) [  135.565710] PCI-ISA link 2 routed to IRQ11
> > [2015-08-12 17:11:24] (d1) [  135.565770] PCI-ISA link 3 routed to IRQ5
> > [2015-08-12 17:11:24] (d1) [  135.566884] *** HVMLoader assertion
> > '(devfn != PCI_ISA_DEVFN) || ((vendor_id == 0x8086) && 
> > [2015-08-12 17:11:24] (d1) [  135.566968] (device_id == 0x7000))'
> > failed at pci.c:112
> > [2015-08-12 17:11:24] (d1) [  135.567012] *** HVMLoader crashed.
> > [2015-08-12 17:11:24] (d2) [  135.568790] setting 0x0-0x176000 readonly
> > [2015-08-12 17:11:24] (d2) [  135.568799] skipped 0x1000
> > [2015-08-12 17:11:24] (d2) [  135.568980] MM: Initialise page
> > allocator for 59e000(59e000)-200(200)
> > [2015-08-12 17:11:24] (d2) [  135.568999] MM: done
> > [2015-08-12 17:11:24] (d2) [  135.569050] Demand map pfns at
> > 2001000-2002001000.
> > [2015-08-12 17:11:24] (d2) [  135.569056] Heap resides at
> > 2002002000-4002002000.
> > [2015-08-12 17:11:24] (d2) [  135.569060] Initialising timer interface
> > [2015-08-12 17:11:24] (d2) [  135.569125] Initialising console ... done.
> > [2015-08-12 17:11:24] (d2) [  135.569199] gnttab_table mapped at
> > 0x2001000.
> > [2015-08-12 17:11:24] (d2) [  135.569205] Initialising scheduler
> > [2015-08-12 17:11:24] (d2) [  135.569216] Thread "Idle": pointer:
> > 0x2002002050, stack: 0x5c
> > [2015-08-12 17:11:24] (d2) [  135.569225] Thread "xenstore": pointer:
> > 0x2002002800, stack: 0x5d
> > [2015-08-12 17:11:24] (d2) [  135.569232] xenbus initialised on irq 1
> > mfn 0xbd3f5
> > [2015-08-12 17:11:24] (d2) [  135.569247] Thread "shutdown": pointer:
> > 0x2002002fb0, stack: 0x5e
> > [2015-08-12 17:11:24] (d2) [  135.569254] Dummy main: start_info=0x554760
> > [2015-08-12 17:11:24] (d2) [  135.569260] Thread "main": pointer:
> > 0x2002003760, stack: 0x5f
> > [2015-08-12 17:11:24] (d2) [  135.569295] Thread "pcifront": pointer:
> > 0x2002003f50, stack: 0x60
> > [2015-08-12 17:11:24] (d2) [  135.569311] pcifront_watches: waiting
> > for backend path to appear device/pci/0/backend
> > [2015-08-12 17:11:24] (d2) [  135.570783] dom vm is at
> > /vm/8b9cf2d6-4f2e-b0b4-36fb-1728f111f6e7
> > [2015-08-12 17:11:24] (d2) [  135.571479] 
> > NETFRONT for device/vif/0 **
> > [2015-08-12 17:11:24] (d2) [  135.571484] 
> > [2015-08-

Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Stefano Stabellini
On Thu, 13 Aug 2015, Christoffer Dall wrote:
> On Thu, Aug 13, 2015 at 11:22:19AM +0100, Ian Campbell wrote:
> > On Thu, 2015-08-13 at 11:13 +0100, Stefano Stabellini wrote:
> > > 
> > > > > For example it is only natural for the kernel to try to use the GIC 
> > > > > hyp
> > > > > functionalities if they are described, while actually they are not
> > > > > emulated by Xen at all.
> > > > 
> > > > See Ian's earlier reply: It can also be considered natural for it to
> > > > be aware that when run in EL2 to not use EL1 functionality.
> > 
> > NB EL2 == Hyp and EL1 == Kernel, so it's the other way round, FWIW.
> > 
> > > It is not just about the GIC Hyp functionalities.
> > 
> > What else is there which is not subject to this logic? Timers are too, it
> > even applies to IOMMU's which have both stage1 and stage2 bits.
> > 
> > BTW, I think kernels _already_ need to deal with a lot of this because in
> > reality nobody modifies the DTB when they use a firmware which launches the
> > kernel in EL1. IOW I think the kernel is already aware of which resources
> > can be used by which privilege level.
> > 
> Yes, for resources specific to EL2 I believe that is indeed the case
> (the GIC driver doesn't look at the hypervisor control register address,
> and KVM does not even get that far if you're not booted in EL2, and the
> timer only uses the virtual timer if not booted in EL2 - we never
> attempt to use the hyp timer until Marc's VHE patches land, but they
> also depend on being booted in hyp mode).
>
> However, what about for other resources?  Having code somewhere that
> says "hide this random piece of hardware if you're Xen dom0" sounds
> awful to me.  I know it's only the serial port right now, but still.

I completely agree. Non-EL2 specific resources should defenitely be
removed. I can see that EL2 stuff could be left there, but I think that
removing it now would avoid us trouble in the future.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 12:29 +0200, Christoffer Dall wrote:
> On Thu, Aug 13, 2015 at 11:22:19AM +0100, Ian Campbell wrote:
> > On Thu, 2015-08-13 at 11:13 +0100, Stefano Stabellini wrote:
> > > 
> > > > > For example it is only natural for the kernel to try to use the 
> > > > > GIC 
> > > > > hyp
> > > > > functionalities if they are described, while actually they are 
> > > > > not
> > > > > emulated by Xen at all.
> > > > 
> > > > See Ian's earlier reply: It can also be considered natural for it 
> > > > to
> > > > be aware that when run in EL2 to not use EL1 functionality.
> > 
> > NB EL2 == Hyp and EL1 == Kernel, so it's the other way round, FWIW.
> > 
> > > It is not just about the GIC Hyp functionalities.
> > 
> > What else is there which is not subject to this logic? Timers are too, 
> > it
> > even applies to IOMMU's which have both stage1 and stage2 bits.
> > 
> > BTW, I think kernels _already_ need to deal with a lot of this because 
> > in
> > reality nobody modifies the DTB when they use a firmware which launches 
> > the
> > kernel in EL1. IOW I think the kernel is already aware of which 
> > resources
> > can be used by which privilege level.
> > 
> Yes, for resources specific to EL2 I believe that is indeed the case
> (the GIC driver doesn't look at the hypervisor control register address,
> and KVM does not even get that far if you're not booted in EL2, and the
> timer only uses the virtual timer if not booted in EL2 - we never
> attempt to use the hyp timer until Marc's VHE patches land, but they
> also depend on being booted in hyp mode).

Right, and I think that's almost always going to be the case by virtue of
the architecture. You can't use these resources from EL1 (however you got
there) so you need checks.

> However, what about for other resources?  Having code somewhere that
> says "hide this random piece of hardware if you're Xen dom0" sounds
> awful to me.  I know it's only the serial port right now, but still.

Right, that's the only bit I'm aware of that I'm not sure about, which is
why I was curious how x86 managed it (I've seen but not digested Jan's
answer).

At least this has the virtue of being an extra table, and tagging on a new
one of those has more limited impact on the tables I think.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH OSSTEST] ts-xen-install: Install netcat-openbsd

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 11:27 +0100, Ian Campbell wrote:
> On Thu, 2015-08-13 at 09:21 +0100, Ian Campbell wrote:
> > 
> > Which is pretty WTF... Logs for this one are at 
> > http://xenbits.xen.org/people/ianc/tmp/37828/
> > http://xenbits.xen.org/people/ianc/tmp/37828/test-amd64-amd64-libvirt
> > -pair/info.html
> > 
> > I'm doing an experiment now with $dho->{Ip} but that seems like it 
> > would
> > just be papering over the issue, whatever it is.
> 
> http://xenbits.xen.org/people/ianc/tmp/37831/ =>
> http://xenbits.xen.org/people/ianc/tmp/37831/test-amd64-amd64-libvirt
> -pair/21.ts-guest-migrate.log
> 
> 2015-08-13 09:42:45 Z executing ssh ... root@10.80.229.144 virsh migrate 
> --live debian.guest.osstest xen+ssh://10.80.228.77
> error: unable to connect to 'lace-bug.uk.xensource.com:49152': Invalid 
> argument
> 
> Which, just, WTF?

http://xenbits.xen.org/people/ianc/tmp/37831/test-amd64-amd64-libvirt-pair/moss-bug---var-log-libvirt-libvirtd.log.gz
is the libvirt log on the send (moss-bug == 10.80.229.144) has the first
mention of migrate as:

2015-08-13 09:42:45.982+: 3457: debug : virDomainMigratePerform3Params:5229 
: params["migrate_uri"]=(string)tcp://lace-bug.uk.xensource.com:49152

and no mentions of 10.80.228.77 anywhere, so I suppose whatever is going on
is a client side thing? Lets see if "virsh --debug 0" tells me anything
more.

Ian.



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] PCI Pass-through in Xen ARM: Draft 4

2015-08-13 Thread Julien Grall
Hi Manish,

I left Linaro 4 months ago. Can you please use my email citrix
(julien.gr...@citrix.com).

On 13/08/15 10:42, Manish Jaggi wrote:
>  -
> 
> 
>  1.Background of PCI passthrough
>  -
> 
>  Passthrough refers to assigning a PCI device to a guest domain (domU) such
>  that the guest has full control over the device. The MMIO space /
> interrupts
>  are managed by the guest itself, close to how a bare kernel manages a
> device.
> 
>  Device's access to guest address space needs to be isolated and protected.
>  SMMU (System MMU - IOMMU in ARM) is programmed by xen hypervisor to allow
>  device access guest memory for data transfer and sending MSI/X interrupts.
>  PCI devices generated message signalled interrupt writes are within guest
>  address spaces which are also translated using SMMU.

In all this design you are only speaking about MSI interrupts. But what
about legacy interrupt?

AFAICT, it's possible for Linux to use it either because MSIs are not
supported or because the user asked it.

>  4.2.2Xenstore Update: For each PCI-EP BAR (IPA-PA mapping info).
>  
> 
>  Toolstack also updates the xenstore information for the device
>  (virtualbar:physical bar).This information is read by xen-pciback and
>  returned to the domU-pcifront driver configuration space reads for BAR.
> 
>  Entries created are as follows:
>  /local/domain/0/backend/pci/1/0
>  vdev-N
>  BDF = ""
>  BAR-0-IPA = ""
>  BAR-0-PA = ""
>  BAR-0-SIZE = ""
>  ...
>  BAR-M-IPA = ""
>  BAR-M-PA = ""
>  BAR-M-SIZE = ""
> 
>  Note: If BAR M SIZE is 0, it is not a valid entry.

How would you describe the ROM in xenstore?


regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6 2/3] xl: fix vNUMA vcpus parsing

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 11:28 +0100, Wei Liu wrote:
> On Thu, Aug 13, 2015 at 11:07:47AM +0100, Ian Campbell wrote:
> > On Thu, 2015-08-13 at 10:54 +0100, Wei Liu wrote:
> > > On Thu, Aug 13, 2015 at 10:42:26AM +0100, Ian Campbell wrote:
> > > > On Wed, 2015-08-12 at 20:36 +0100, Wei Liu wrote:
> > > > > Originally, if user didn't specify maxvcpus= in xl config file, 
> > > > > the
> > > > > maximum size of vcpu bitmap was always equal to maximum number of 
> > > > > 
> > > > > pcpus.
> > > > > This might not be what user wants.
> > > > 
> > > > What are you suggesting they wanted instead? We are only talking 
> > > > about 
> > > > the
> > > > bitmap right, and the typical/sensible config will have #vcpus <= 
> > > > #pcpus,
> > > > so they will fit even if they "waste" some bits during parsing.
> > > 
> > > #vcpus > #pcpus, bitmap is too small.
> > 
> > Right. Which is trivial to detect as we go through the parsing and 
> > raise an
> > appropriate error.
> > 
> > > > I'm almost inclined to suggest that if a user wants #vcpus > #pcpus 
> > > > 
> > > > they
> > > > should have to specify maxvcpus and not rely on the vnuma parsing 
> > > > code
> > > > inferring this fact.
> > > > 
> > > 
> > > I don't think we should prevent people from shooting themselves in 
> > > the
> > > foot.
> > 
> > If the cost of supporting that is this patch then I disagree. If you 
> > can
> > find a way to do it simply and cleanly then fine, maybe.
> > 
> 
> I would say let's make xl dumb. If the maximum number of vcpus specified
> in vNUMA doesn't match maximum number of vcpus specified in
> configuration file, error out. That would maybe lead to a smaller patch.
> 
> This was my first implementation of this parser routine but for some
> reason we tried to make xl smart (to sum up vcpus in vNUMA
> configuration), and now it's causing us trouble.

I disagree that xl should be dumb. It is perfectly fine for xl to do
convenient things automatically while still detecting things which are
likely to be incorrect or accidental or not handling all the strange edge
cases automatically (so long as they are possible manually).

Inferring #vcpus from vnuma config is a useful thing to do automatically,
but that doesn't preclude erroring out on edge cases.


> > In fact I even disagree in general, we can and should provide warnings 
> > or
> > errors for things which we know are bad and which are most likely
> > unintentional, but provide overrides.
> > 
> > > > IOW maybe this code could just error out (or print a warning) if 
> > > > this
> > > > happens? + a doc update.
> > > > 
> > > 
> > > Xl doesn't complain when you set vcpus > pcpus. I don't think vNUMA
> > > should behave differently.
> > 
> > Not always true, e.g. from vcpuset:
> > 
> >if (max_vcpus > dominfo.vcpu_online && max_vcpus > host_cpu) {
> > fprintf(stderr, "You are overcommmitting! You have %d 
> > physical" \
> > " CPUs and want %d vCPUs! Aborting, use --ignore
> > -host to" \
> > " continue\n", host_cpu, max_vcpus);
> > rc = 1;
> > }
> > 
> 
> That is for vcpuset command, not xl configuration file parser.


I did say "not always true" rather than "not true".

>  There is
> nothing preventing user from setting #vcpus > #pcpus and no warning
> woudl be given.

A bug IMHO.

>  The parser could use some improvement, but that's 4.7
> material.
> 
> Wei.
> 
> > Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Julien Grall
On 13/08/15 10:20, Jan Beulich wrote:
>> BTW, IIRC x86 does modify at least one ACPI table which is the DMAR (I
>> think), to hide the IOMMU from the guest? That's another table we would
>> want to frob on ARM I think (or it's equivalent, which I think is IORT).
> 
> Eliminating that hack is supposed to be on the VT-d maintainers'
> TODO list(s) - Dom0 has no business looking at that table (and its
> AMD counterpart already isn't being fiddled with in the same way).

ARM SMMU is supporting 2 stage in order to protect the device also by
the domain.

At some point we will expose it to DOM0 and therefore may need to modify
IORT.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.6] xen/mm: populate_physmap: validate correctly the gfn for direct mapped domain

2015-08-13 Thread Ian Campbell
On Thu, 2015-08-13 at 04:29 -0600, Jan Beulich wrote:
> > 
> > > > On 13.08.15 at 12:14,  wrote:
> > There isn't a race here is there? What if the reference were dropped 
> > after
> > this check but before the guest_physmap_add_page (which takes new
> > references)? We are implicitly relying on a guarantee made elsewhere 
> > that a
> > direct mapped guest never drops the final ref until it is destroyed 
> > (which
> > can't be happening in this window I think), but it would be obviously 
> > safer
> > against such bugs if we just held the ref until after the p2m was 
> > updated.
> 
> Yeah, we depend on this, and what you suggest would indeed be
> an improvment.
> 
> > Another thing for a future/4.7 cleanup I guess.
> 
> Right.
> 
> But as bottom line I take it that your objections to the patch going in
> soon have been eliminated?

Correct.

Acked-by: Ian Campbell 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH OSSTEST] ts-xen-install: Install netcat-openbsd

2015-08-13 Thread Ian Jackson
Ian Campbell writes ("[PATCH OSSTEST] ts-xen-install: Install netcat-openbsd"):
> This is required by libvirt for live migration (netcat-traditional
> doesn't cut it).
> 
> netcat-openbsd has higher update-alternatives priority, so it will be
> used if installed.

Acked-by: Ian Jackson 

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 2/2] cambridge: arrange to test each new baseline

2015-08-13 Thread Ian Jackson
Ian Campbell writes ("Re: [Xen-devel] [PATCH v2 2/2] cambridge: arrange to test 
each new baseline"):
> I also considered refactoring this bit:
> 
>   if [ "x$treeurl" != xnone: ]; then
>   treearg=--tree-$tree=$treeurl
>   fi
>   tested_revision=`check_tested $treearg --print-revision=$tree`
>   if [ "x$tested_revision" != x ]; then
>   OLD_REVISION="$tested_revision"
>   fi
> 
> into a helper e.g. select_last_test_revision.

That might be best as it gives it a name.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq server

2015-08-13 Thread Yu, Zhang



On 8/13/2015 6:16 PM, Paul Durrant wrote:

-Original Message-
From: Yu Zhang [mailto:yu.c.zh...@linux.intel.com]
Sent: 13 August 2015 11:06
To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian
Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper
Cc: Kevin Tian; zhiyuan...@intel.com
Subject: [PATCH v4 1/2] Differentiate IO/mem resources tracked by ioreq
server

Currently in ioreq server, guest write-protected ram pages are
tracked in the same rangeset with device mmio resources. Yet
unlike device mmio, which can be in big chunks, the guest write-
protected pages may be discrete ranges with 4K bytes each.


Would the interfaces be better using xen_pfn_t rather than using uint64_t to 
pass addresses round where the bottom 12 bits will always be zero?

   Paul


Thank you, Paul.
Well, I'm not quite sure if the bottom 12 bits of the address are zero.
I had thought these addresses are just guest physical ones causing
the ept violation, and not aligned down to its page address, like the
MMIO. So, if our handler code has already done that, using xen_pfn_t
could be better(my developing machine encountered some hardware
problem, will check the code tomorrow) :)

Yu




This patch uses a seperate rangeset for the guest ram pages.
And a new ioreq type, IOREQ_TYPE_WP_MEM, is defined.

Note: Previously, a new hypercall or subop was suggested to map
write-protected pages into ioreq server. However, it turned out
handler of this new hypercall would be almost the same with the
existing pair - HVMOP_[un]map_io_range_to_ioreq_server, and there's
already a type parameter in this hypercall. So no new hypercall
defined, only a new type is introduced.

Signed-off-by: Yu Zhang 
---
  tools/libxc/include/xenctrl.h| 31 ++
  tools/libxc/xc_domain.c  | 55

  xen/arch/x86/hvm/hvm.c   | 26 ++-
  xen/include/asm-x86/hvm/domain.h |  4 +--
  xen/include/public/hvm/hvm_op.h  |  1 +
  xen/include/public/hvm/ioreq.h   |  1 +
  6 files changed, 115 insertions(+), 3 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index de3c0ad..53f196d 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2010,6 +2010,37 @@ int
xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch,
  int is_mmio,
  uint64_t start,
  uint64_t end);
+/**
+ * This function registers a range of write-protected memory for emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
+domid_t domid,
+ioservid_t id,
+uint64_t start,
+uint64_t end);
+
+/**
+ * This function deregisters a range of write-protected memory for
emulation.
+ *
+ * @parm xch a handle to an open hypervisor interface.
+ * @parm domid the domain id to be serviced
+ * @parm id the IOREQ Server id.
+ * @parm start start of range
+ * @parm end end of range (inclusive).
+ * @return 0 on success, -1 on failure.
+ */
+int xc_hvm_unmap_mem_range_from_ioreq_server(xc_interface *xch,
+domid_t domid,
+ioservid_t id,
+uint64_t start,
+uint64_t end);

  /**
   * This function registers a PCI device for config space emulation.
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index 2ee26fb..42aeba9 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1552,6 +1552,61 @@ int
xc_hvm_unmap_io_range_from_ioreq_server(xc_interface *xch, domid_t
domid,
  return rc;
  }

+int xc_hvm_map_mem_range_to_ioreq_server(xc_interface *xch,
domid_t domid,
+ioservid_t id, uint64_t start, 
uint64_t end)
+{
+DECLARE_HYPERCALL;
+DECLARE_HYPERCALL_BUFFER(xen_hvm_io_range_t, arg);
+int rc;
+
+arg = xc_hypercall_buffer_alloc(xch, arg, sizeof(*arg));
+if ( arg == NULL )
+return -1;
+
+hypercall.op = __HYPERVISOR_hvm_op;
+hypercall.arg[0] = HVMOP_map_io_range_to_ioreq_server;
+hypercall.arg[1] = HYPERCALL_BUFFER_AS_ARG(arg);
+
+arg->domid = domid;
+arg->id = id;
+arg->type = HVMOP_IO_RANGE_WP_MEM;
+arg->start = start;
+arg->end = end;
+
+rc = do_xen_hypercall(xch, &hypercall);
+
+xc_hypercall_buffer_free(xch, arg);
+return rc;
+}
+
+int xc_hvm_unmap_mem_range_from

Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Shannon Zhao


On 2015/8/13 18:29, Christoffer Dall wrote:
> On Thu, Aug 13, 2015 at 11:22:19AM +0100, Ian Campbell wrote:
>> On Thu, 2015-08-13 at 11:13 +0100, Stefano Stabellini wrote:
>>>
> For example it is only natural for the kernel to try to use the GIC 
> hyp
> functionalities if they are described, while actually they are not
> emulated by Xen at all.

 See Ian's earlier reply: It can also be considered natural for it to
 be aware that when run in EL2 to not use EL1 functionality.
>>
>> NB EL2 == Hyp and EL1 == Kernel, so it's the other way round, FWIW.
>>
>>> It is not just about the GIC Hyp functionalities.
>>
>> What else is there which is not subject to this logic? Timers are too, it
>> even applies to IOMMU's which have both stage1 and stage2 bits.
>>
>> BTW, I think kernels _already_ need to deal with a lot of this because in
>> reality nobody modifies the DTB when they use a firmware which launches the
>> kernel in EL1. IOW I think the kernel is already aware of which resources
>> can be used by which privilege level.
>>
> Yes, for resources specific to EL2 I believe that is indeed the case
> (the GIC driver doesn't look at the hypervisor control register address,
> and KVM does not even get that far if you're not booted in EL2, and the
> timer only uses the virtual timer if not booted in EL2 - we never
> attempt to use the hyp timer until Marc's VHE patches land, but they
> also depend on being booted in hyp mode).
> 
> However, what about for other resources?  Having code somewhere that
> says "hide this random piece of hardware if you're Xen dom0" sounds
> awful to me.  I know it's only the serial port right now, but still.
> 
It needs to modify MADT table according to dom0_max_vcpus and modify EFI
MMAP table according to dom0_mem. And modify FADT table to set
hypervisor_id as "XenVMM" to tell Dom0 that it runs on Xen hypervisor.

--
Shannon



___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] add page_get_owner_and_reference() related ASSERT()s

2015-08-13 Thread Jan Beulich
The function shouldn't return NULL after having obtained a reference,
or else the caller won't know to drop it.

Also its result shouldn't be ignored - if calling code is certain that
a page already has a non-zero refcount, it better ASSERT()s so.

Finally this as well as get_page() and put_page() are required to be
available on all architectures - move the declarations to xen/mm.h.

Signed-off-by: Jan Beulich 
Reviewed-by: Tim Deegan 
---
v2: Add comment to grant table change as requested by Ian.

--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -1,10 +1,8 @@
-#include 
 #include 
 #include 
+#include 
 #include 
 #include 
-
-#include 
 #include 
 
 static unsigned long raw_copy_to_guest_helper(void *to, const void *from,
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1170,6 +1170,7 @@ long arch_memory_op(int op, XEN_GUEST_HA
 struct domain *page_get_owner_and_reference(struct page_info *page)
 {
 unsigned long x, y = page->count_info;
+struct domain *owner;
 
 do {
 x = y;
@@ -1182,7 +1183,10 @@ struct domain *page_get_owner_and_refere
 }
 while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
 
-return page_get_owner(page);
+owner = page_get_owner(page);
+ASSERT(owner);
+
+return owner;
 }
 
 void put_page(struct page_info *page)
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -2038,6 +2038,7 @@ void put_page(struct page_info *page)
 struct domain *page_get_owner_and_reference(struct page_info *page)
 {
 unsigned long x, y = page->count_info;
+struct domain *owner;
 
 do {
 x = y;
@@ -2051,7 +2052,10 @@ struct domain *page_get_owner_and_refere
 }
 while ( (y = cmpxchg(&page->count_info, x, x + 1)) != x );
 
-return page_get_owner(page);
+owner = page_get_owner(page);
+ASSERT(owner);
+
+return owner;
 }
 
 
--- a/xen/common/grant_table.c
+++ b/xen/common/grant_table.c
@@ -2244,7 +2244,12 @@ __acquire_grant_for_copy(
 {
 ASSERT(mfn_valid(act->frame));
 *page = mfn_to_page(act->frame);
-(void)page_get_owner_and_reference(*page);
+td = page_get_owner_and_reference(*page);
+/*
+ * act->pin being non-zero should guarantee the page to have a
+ * non-zero refcount and hence a valid owner.
+ */
+ASSERT(td);
 }
 
 act->pin += readonly ? GNTPIN_hstr_inc : GNTPIN_hstw_inc;
--- a/xen/include/asm-arm/mm.h
+++ b/xen/include/asm-arm/mm.h
@@ -275,10 +275,6 @@ static inline void *page_to_virt(const s
 return mfn_to_virt(page_to_mfn(pg));
 }
 
-struct domain *page_get_owner_and_reference(struct page_info *page);
-void put_page(struct page_info *page);
-int  get_page(struct page_info *page, struct domain *domain);
-
 struct page_info *get_page_from_gva(struct domain *d, vaddr_t va,
 unsigned long flags);
 
--- a/xen/include/asm-x86/mm.h
+++ b/xen/include/asm-x86/mm.h
@@ -352,9 +352,6 @@ const unsigned long *get_platform_badpag
 int page_lock(struct page_info *page);
 void page_unlock(struct page_info *page);
 
-struct domain *page_get_owner_and_reference(struct page_info *page);
-void put_page(struct page_info *page);
-int  get_page(struct page_info *page, struct domain *domain);
 void put_page_type(struct page_info *page);
 int  get_page_type(struct page_info *page, unsigned long type);
 int  put_page_type_preemptible(struct page_info *page);
--- a/xen/include/xen/mm.h
+++ b/xen/include/xen/mm.h
@@ -45,6 +45,7 @@
 #ifndef __XEN_MM_H__
 #define __XEN_MM_H__
 
+#include 
 #include 
 #include 
 #include 
@@ -77,9 +78,12 @@ TYPE_SAFE(unsigned long, pfn);
 #undef pfn_t
 #endif
 
-struct domain;
 struct page_info;
 
+void put_page(struct page_info *);
+int get_page(struct page_info *, struct domain *);
+struct domain *__must_check page_get_owner_and_reference(struct page_info *);
+
 /* Boot-time allocator. Turns into generic allocator after bootstrap. */
 void init_boot_pages(paddr_t ps, paddr_t pe);
 unsigned long alloc_boot_pages(


add page_get_owner_and_reference() related ASSERT()s

The function shouldn't return NULL after having obtained a reference,
or else the caller won't know to drop it.

Also its result shouldn't be ignored - if calling code is certain that
a page already has a non-zero refcount, it better ASSERT()s so.

Finally this as well as get_page() and put_page() are required to be
available on all architectures - move the declarations to xen/mm.h.

Signed-off-by: Jan Beulich 
Reviewed-by: Tim Deegan 
---
v2: Add comment to grant table change as requested by Ian.

--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -1,10 +1,8 @@
-#include 
 #include 
 #include 
+#include 
 #include 
 #include 
-
-#include 
 #include 
 
 static unsigned long raw_copy_to_guest_helper(void *to, const void *from,
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -1170,6 +1170,7 @@ long arch_memory_op(int op, XEN_GUEST_HA
 struct domain *page_get_owner_and_reference(struct page_inf

Re: [Xen-devel] [PATCH v4 2/2] Refactor rangeset structure for better performance.

2015-08-13 Thread Yu, Zhang



On 8/13/2015 6:33 PM, Paul Durrant wrote:

-Original Message-
From: Yu Zhang [mailto:yu.c.zh...@linux.intel.com]
Sent: 13 August 2015 11:06
To: xen-devel@lists.xen.org; Paul Durrant; Ian Jackson; Stefano Stabellini; Ian
Campbell; Wei Liu; Keir (Xen.org); jbeul...@suse.com; Andrew Cooper
Cc: Kevin Tian; zhiyuan...@intel.com
Subject: [PATCH v4 2/2] Refactor rangeset structure for better performance.

This patch refactors struct rangeset to base it on the red-black
tree structure, instead of on the current doubly linked list. By
now, ioreq leverages rangeset to keep track of the IO/memory
resources to be emulated. Yet when number of ranges inside one
ioreq server is very high, traversing a doubly linked list could
be time consuming. With this patch, the time complexity for
searching a rangeset can be improved from O(n) to O(log(n)).
Interfaces of rangeset still remain the same, and no new APIs
introduced.

Signed-off-by: Yu Zhang 
---
  xen/common/rangeset.c | 82
+--
  1 file changed, 60 insertions(+), 22 deletions(-)

diff --git a/xen/common/rangeset.c b/xen/common/rangeset.c
index 6c6293c..87b6aab 100644
--- a/xen/common/rangeset.c
+++ b/xen/common/rangeset.c
@@ -10,11 +10,12 @@
  #include 
  #include 
  #include 
+#include 
  #include 

  /* An inclusive range [s,e] and pointer to next range in ascending order. */
  struct range {
-struct list_head list;
+struct rb_node node;
  unsigned long s, e;
  };

@@ -24,7 +25,7 @@ struct rangeset {
  struct domain   *domain;

  /* Ordered list of ranges contained in this set, and protecting lock. */
-struct list_head range_list;
+struct rb_root   range_tree;

  /* Number of ranges that can be allocated */
  long nr_ranges;
@@ -45,41 +46,78 @@ struct rangeset {
  static struct range *find_range(
  struct rangeset *r, unsigned long s)
  {
-struct range *x = NULL, *y;
+struct rb_node *node;
+struct range   *x;
+struct range   *prev = NULL;

-list_for_each_entry ( y, &r->range_list, list )
+node = r->range_tree.rb_node;
+while ( node )
  {
-if ( y->s > s )
-break;
-x = y;
+x = container_of(node, struct range, node);
+if ( (s >= x->s) && (s <= x->e) )
+return x;
+if ( s < x->s )
+node = node->rb_left;
+else if ( s > x->s )


Do you need else if there? I think else would do. s either has to be < x->s or > 
x->e to get to this point.

Thanks. You are right. :)



+{
+prev = x;
+node = node->rb_right;
+}
  }

-return x;
+return prev;


Do you not want to return NULL here? It looks like you'd only get here if there 
was no matching range.

Well, prev can either be NULL(like its initial value), or pointing to
the previous node of the range. It's to keep the original intention of
routine find_range() - to return the range if found, or return the
previous one, or NULL.

B.R.
Yu


   Paul


  }

  /* Return the lowest range in the set r, or NULL if r is empty. */
  static struct range *first_range(
  struct rangeset *r)
  {
-if ( list_empty(&r->range_list) )
-return NULL;
-return list_entry(r->range_list.next, struct range, list);
+struct rb_node *node;
+
+node = rb_first(&r->range_tree);
+if ( node != NULL )
+return container_of(node, struct range, node);
+
+return NULL;
  }

  /* Return range following x in ascending order, or NULL if x is the highest. 
*/
  static struct range *next_range(
  struct rangeset *r, struct range *x)
  {
-if ( x->list.next == &r->range_list )
-return NULL;
-return list_entry(x->list.next, struct range, list);
+struct rb_node *node;
+
+node = rb_next(&x->node);
+if ( node )
+return container_of(node, struct range, node);
+
+return NULL;
  }

  /* Insert range y after range x in r. Insert as first range if x is NULL. */
  static void insert_range(
  struct rangeset *r, struct range *x, struct range *y)
  {
-list_add(&y->list, (x != NULL) ? &x->list : &r->range_list);
+struct rb_node **node;
+struct rb_node *parent = NULL;
+
+if ( x == NULL )
+node = &r->range_tree.rb_node;
+else
+{
+node = &x->node.rb_right;
+parent = &x->node;
+}
+
+while ( *node )
+{
+parent = *node;
+node = &parent->rb_left;
+}
+
+/* Add new node and rebalance the red-black tree. */
+rb_link_node(&y->node, parent, node);
+rb_insert_color(&y->node, &r->range_tree);
  }

  /* Remove a range from its list and free it. */
@@ -88,7 +126,7 @@ static void destroy_range(
  {
  r->nr_ranges++;

-list_del(&x->list);
+rb_erase(&x->node, &r->range_tree);
  xfree(x);
  }

@@ -319,7 +357,7 @@ bool_t rangeset_contains_singleton(
  bool_t rangeset_is_empty(
  const struct rangeset *r)
  {
-return ((r == NULL) || list_e

Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Jan Beulich
>>> On 13.08.15 at 12:48,  wrote:
> On 2015/8/13 18:29, Christoffer Dall wrote:
>> However, what about for other resources?  Having code somewhere that
>> says "hide this random piece of hardware if you're Xen dom0" sounds
>> awful to me.  I know it's only the serial port right now, but still.
>> 
> It needs to modify MADT table according to dom0_max_vcpus and modify EFI
> MMAP table according to dom0_mem. And modify FADT table to set
> hypervisor_id as "XenVMM" to tell Dom0 that it runs on Xen hypervisor.

None of this should be required:

Unaltered MADT may (or should I say will) be needed to drive P- or
C-states.

UEFI memmap shouldn't be exposed to Dom0 at all.

Detecting running on Xen should (hopefully) be possible by other
means for Dom0.

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


[Xen-devel] [PATCH] x86/dmi_scan: only honor end-of-table for 64-bit tables

2015-08-13 Thread Jan Beulich
From: Jean Delvare 

A 32-bit entry point to a DMI table says how many structures the table
contains. The SMBIOS specification explicitly says that end-of-table
markers should be ignored if they are not actually at the end of the
DMI table. So only honor the end-of-table marker for tables accessed
through 64-bit entry points, as they do not specify a structure count.

Signed-off-by: Jean Delvare 
[Linux commit 17cd5bd5391e6e7b363d66335e1bc6760ae969b9]
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/dmi_scan.c
+++ b/xen/arch/x86/dmi_scan.c
@@ -148,10 +148,11 @@ static int __init dmi_table(paddr_t base
data = buf;
 
/*
-*  Stop when we see all the items the table claimed to have
-*  OR we run off the end of the table (also happens)
-*/
- 
+* Stop when we have seen all the items the table claimed to have
+* (SMBIOS < 3.0 only) OR we reach an end-of-table marker (SMBIOS
+* >= 3.0 only) OR we run off the end of the table (should never
+* happen but sometimes does on bogus implementations.)
+*/
while((num < 0 || i < num) && data-buf+sizeof(struct dmi_header)<=len)
{
dm=(struct dmi_header *)data;
@@ -165,8 +166,16 @@ static int __init dmi_table(paddr_t base
data++;
if(data-buftype == DMI_ENTRY_END_OF_TABLE)
-   break;
+   /*
+* 7.45 End-of-Table (Type 127) [SMBIOS reference spec v3.0.0]
+* For tables behind a 64-bit entry point, we have no item
+* count and no exact table length, so stop on end-of-table
+* marker. For tables behind a 32-bit entry point, we have
+* seen OEM structures behind the end-of-table marker on
+* some systems, so don't trust it.
+*/
+   if (num < 0 && dm->type == DMI_ENTRY_END_OF_TABLE)
+   break;
data+=2;
i++;
}



x86/dmi_scan: only honor end-of-table for 64-bit tables

A 32-bit entry point to a DMI table says how many structures the table
contains. The SMBIOS specification explicitly says that end-of-table
markers should be ignored if they are not actually at the end of the
DMI table. So only honor the end-of-table marker for tables accessed
through 64-bit entry points, as they do not specify a structure count.

Signed-off-by: Jean Delvare 
[Linux commit 17cd5bd5391e6e7b363d66335e1bc6760ae969b9]
Signed-off-by: Jan Beulich 

--- a/xen/arch/x86/dmi_scan.c
+++ b/xen/arch/x86/dmi_scan.c
@@ -148,10 +148,11 @@ static int __init dmi_table(paddr_t base
data = buf;
 
/*
-*  Stop when we see all the items the table claimed to have
-*  OR we run off the end of the table (also happens)
-*/
- 
+* Stop when we have seen all the items the table claimed to have
+* (SMBIOS < 3.0 only) OR we reach an end-of-table marker (SMBIOS
+* >= 3.0 only) OR we run off the end of the table (should never
+* happen but sometimes does on bogus implementations.)
+*/
while((num < 0 || i < num) && data-buf+sizeof(struct dmi_header)<=len)
{
dm=(struct dmi_header *)data;
@@ -165,8 +166,16 @@ static int __init dmi_table(paddr_t base
data++;
if(data-buftype == DMI_ENTRY_END_OF_TABLE)
-   break;
+   /*
+* 7.45 End-of-Table (Type 127) [SMBIOS reference spec v3.0.0]
+* For tables behind a 64-bit entry point, we have no item
+* count and no exact table length, so stop on end-of-table
+* marker. For tables behind a 32-bit entry point, we have
+* seen OEM structures behind the end-of-table marker on
+* some systems, so don't trust it.
+*/
+   if (num < 0 && dm->type == DMI_ENTRY_END_OF_TABLE)
+   break;
data+=2;
i++;
}
___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Julien Grall
On 13/08/15 11:54, Jan Beulich wrote:
 On 13.08.15 at 12:48,  wrote:
>> On 2015/8/13 18:29, Christoffer Dall wrote:
>>> However, what about for other resources?  Having code somewhere that
>>> says "hide this random piece of hardware if you're Xen dom0" sounds
>>> awful to me.  I know it's only the serial port right now, but still.
>>>
>> It needs to modify MADT table according to dom0_max_vcpus and modify EFI
>> MMAP table according to dom0_mem. And modify FADT table to set
>> hypervisor_id as "XenVMM" to tell Dom0 that it runs on Xen hypervisor.
> 
> None of this should be required:
> 
> Unaltered MADT may (or should I say will) be needed to drive P- or
> C-states.

We have to alter MADT for different reasons:
- restricting the number of vCPUs
- update the CPU bring up method
- changing the CPUID

Maybe we should pass a backup but we don't support cpufreq driver right
now and it would need quite a look of work to do it (see [1]).

> UEFI memmap shouldn't be exposed to Dom0 at all.

We have to craft a UEFI memmap in order to notify Dom0 where are the RAM
regions.

> Detecting running on Xen should (hopefully) be possible by other
> means for Dom0.

How there is no cpuid like on ARM. So no possibility to know whether you
are running on Xen or another hypervisor...

Modifying the FADT is the only viable solution we have today in order to
tell that it's running on Xen.

Regards,

[1]http://lists.xen.org/archives/html/xen-devel/2014-10/msg02883.html

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Design doc of adding ACPI support for arm64 on Xen - version 2

2015-08-13 Thread Stefano Stabellini
On Thu, 13 Aug 2015, Julien Grall wrote:
> On 13/08/15 11:54, Jan Beulich wrote:
>  On 13.08.15 at 12:48,  wrote:
> >> On 2015/8/13 18:29, Christoffer Dall wrote:
> >>> However, what about for other resources?  Having code somewhere that
> >>> says "hide this random piece of hardware if you're Xen dom0" sounds
> >>> awful to me.  I know it's only the serial port right now, but still.
> >>>
> >> It needs to modify MADT table according to dom0_max_vcpus and modify EFI
> >> MMAP table according to dom0_mem. And modify FADT table to set
> >> hypervisor_id as "XenVMM" to tell Dom0 that it runs on Xen hypervisor.
> > 
> > None of this should be required:
> > 
> > Unaltered MADT may (or should I say will) be needed to drive P- or
> > C-states.
> 
> We have to alter MADT for different reasons:
>   - restricting the number of vCPUs
>   - update the CPU bring up method
>   - changing the CPUID
> 
> Maybe we should pass a backup but we don't support cpufreq driver right
> now and it would need quite a look of work to do it (see [1]).
> 
> > UEFI memmap shouldn't be exposed to Dom0 at all.
> 
> We have to craft a UEFI memmap in order to notify Dom0 where are the RAM
> regions.

Right. Keep in mind that there are no legacy interfaces to get these
info on ARM. All we have is ACPI and EFI.


> > Detecting running on Xen should (hopefully) be possible by other
> > means for Dom0.
> 
> How there is no cpuid like on ARM. So no possibility to know whether you
> are running on Xen or another hypervisor...
> 
> Modifying the FADT is the only viable solution we have today in order to
> tell that it's running on Xen.

We also have the XENV table.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


  1   2   >