date:20150113

On 13/01/15 16:46, Ian Campbell wrote:
 We need to track everything for interrupt assignment to a guest/dom0. So
 if the guest asks for a free vIRQ we can give it directly.
 
 Makes sense.
 
 In that case your 0/4 mail doesn't fully describe the use case for the
 series, since it talks about the dom0 PPI only.

Sorry, I inadvertently skipped this comment. My cover letter was
explaining the current use case, I didn't think to explain the future
use case. I will update the cover letter.

Regards,

-- 
Julien Grall

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m

On 01/13/2015 12:56 AM, Jan Beulich wrote:
 On 12.01.15 at 18:36, edmund.h.wh...@intel.com wrote:
 On 01/12/2015 02:00 AM, Jan Beulich wrote:
 On 10.01.15 at 00:04, edmund.h.wh...@intel.com wrote:
 On 01/09/2015 02:41 PM, Andrew Cooper wrote:
 Having some non-OS part of the guest swap the EPT tables and
 accidentally turn a DMA buffer read-only is not going to end well.


 The agent can certainly do bad things, and at some level you have to assume it
 is sensible enough not to. However, I'm not sure this is fundamentally more
 dangerous than what a privileged domain can do today using the MEMOP...
 operations, and people are already using those for very similar purposes.

 I don't follow - how is what a privileged domain can do related to the
 proposed changes here (which are - via VMFUNC - at least partially
 guest controllable, and that's also the case Andrew mentioned in his
 reply)? I'm having a hard time understanding how a P2M stripped of
 anything that's not plain RAM can be very useful to a guest. IOW
 without such fundamental aspects clarified I don't see a point in
 looking at the individual patches (which btw, according to your
 wording elsewhere, should have been marked RFC).

 In this patch series, none of the new hypercalls are protected by xsm
 policies. Earlier in the process of working on this code, I added such
 a check to all the hypercalls, but then removed them all because it
 dawned on me that I didn't actually understand what I was doing and
 my code only worked because I only ever built the dummy permit everything
 policy.

 Should some version of this patch series be accepted, my hope is that
 someone who does understand xsm policies would put the appropriate checks
 in place, and at that point I maintain that these extra capabilities
 would not be fundamentally more dangerous than existing mechanisms
 available to privileged domains, because policy can prevent the guest
 using vmfunc. That's obviously not true today.
 
 Please simply consult with the XSM maintainer on questions/issues
 like this. Proposing a partial (insecure) patch set isn't appropriate.
 
 The alternate p2m's only contain entries for ram pages with valid mfn's.
 All other page types are still handled in the nested page fault handler
 for the host p2m. Those pages (at least the ones I've encountered) don't
 require the hardware to have a valid EPTE for the page.
 
 I.e. the functionality requiring e.g. p2m_ram_logdirty and
 p2m_mmio_direct is then incompatible with your proposed additions
 (which I think was also already noted by Andrew). That's imo not
 a basis to think about accepting (or even reviewing) the series.

Andrew raised that question, and I answered that pages needing
special handling are compatible with these changes. Unless I
misunderstood him, he accepted that.

If the hardware is never intended to be able to satisfy an access to
a page without generating an EPT violation, then all the hardware
needs is a set of EPT's that guarantee that behaviour. These changes
take advantage of that to avoid copying any of the EPTE's for special
pages into the alternate p2m's. Instead, the nested page fault handler
for the alternate p2m returns a status to indicate that the host p2m
nested page fault handler should handle the violation using the data
in the host p2m.

If the result is that the page becomes ram in the host p2m and the
instruction is restarted, the hardware will generate another violation
and this time the EPTE will be copied.
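
The flow described above can be sketched roughly as follows. This is only a toy model under invented names: `altp2m_fault`, `host_fault`, `struct p2m` and the page-type enum are hypothetical stand-ins, not the functions or types used in the series.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page types; only plain RAM is ever mirrored. */
enum page_type { PT_RAM, PT_LOGDIRTY, PT_MMIO };

struct entry {
    bool valid;             /* hardware can satisfy the access */
    enum page_type type;
};

#define NR_GFNS 8

struct p2m {
    struct entry e[NR_GFNS];
};

enum fault_status { FAULT_HANDLED, FAULT_DEFER_TO_HOST };

/* Alternate-p2m handler: copy the host entry only if it is valid RAM. */
static enum fault_status altp2m_fault(struct p2m *alt, const struct p2m *host,
                                      size_t gfn)
{
    const struct entry *h = &host->e[gfn];

    if (!h->valid || h->type != PT_RAM)
        return FAULT_DEFER_TO_HOST;

    alt->e[gfn] = *h;
    return FAULT_HANDLED;
}

/* Host handler stand-in: in this toy model log-dirty pages become RAM. */
static void host_fault(struct p2m *host, size_t gfn)
{
    if (host->e[gfn].type == PT_LOGDIRTY) {
        host->e[gfn].type = PT_RAM;
        host->e[gfn].valid = true;
    }
}

/* Returns the number of violations taken before the access succeeds,
 * or -1 if the page never becomes RAM (e.g. emulated MMIO). */
static int access_page(struct p2m *alt, struct p2m *host, size_t gfn)
{
    int faults = 0;

    while (!alt->e[gfn].valid) {
        if (faults == 2)
            return -1;
        faults++;
        if (altp2m_fault(alt, host, gfn) == FAULT_DEFER_TO_HOST)
            host_fault(host, gfn);
    }
    return faults;
}
```

In this model a log-dirty page costs two violations (one deferred to the host, one that copies the entry), while a page that never becomes RAM is always handled by the host path.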

This works. I have vram log-dirty working, something that does not work
with the nestedhvm nested EPT code.

Ed



Re: [Xen-devel] [PATCH RFC] make error codes a formal part of the ABI

Ian Campbell writes (Re: [PATCH RFC] make error codes a formal part of the ABI):
 On Tue, 2015-01-13 at 16:21 +, Jan Beulich wrote:
  There's one small block commented with TBD left in the public header.
  This is the main reason for the submission being RFC. While we don't
  currently use these error codes, I'm not sure if we should leave all
  or some of them out for the time being.
 
 I say let's omit any we don't use for now.

Is it possible that anyone is using the existing header file where
these values were defined ?  If so their code might say
  case ELOOP:
which would not compile when they switched to the new header.

I don't know whether this is likely, or a problem.

Ian.


Re: [Xen-devel] [PATCH v3 05/19] libxl: add vmemrange to libxl__domain_build_state

Wei Liu writes ([PATCH v3 05/19] libxl: add vmemrange to libxl__domain_build_state):
 A vnode consists of one or more vmemranges (virtual memory range).  One
 example of multiple vmemranges is that there is a hole in one vnode.

I'm finding this series a bit oddly structured.  This patch, for
example, just introduces some new fields to an internal state struct -
but these fields are not initialised, set, or read.

Ian.


Re: [Xen-devel] [PATCH v3 18/19] libxlutil: nested list support

Wei Liu writes (Re: [PATCH v3 18/19] libxlutil: nested list support):
 On Tue, Jan 13, 2015 at 03:52:48PM +, Ian Jackson wrote:
  This commit message is very brief.  For example, under the heading of
  `Rework internal representation of setting' I would expect a clear
  description of every formulaic change.
 
 Originally the internal representation of setting is (string, string)
 pair, the first string being the name of the setting, second string
 being the value of the setting. Now the internal is changed to (string,
 ConfigValue) pair, where ConfigValue can refer to a string or a list of
 ConfigValue's. Internal functions to deal with setting are changed
 accordingly.
 
 Does the above description make things clearer?

Yes.  Something like that should be in the commit message.  It would
help to refer to the actual type names.  You could say (if true) for
example, internal functions now refer to a ConfigSetting; the public
APIs still talk about ConfigValues or some such.

  Also, I think it would be much easier to review if split up into 3 parts,
  which from the description above ought to be doable without trouble.
 
 OK. I can try to split this patch into three.

If it's difficult for some reason then do get back to me.

  AFAICT from your changes, the API is not backward compatible.  ICBW,
  but if I'm right that's not acceptable I'm afraid, even in libxlu.
 
 The old APIs still have the same semantics as before, so any applications
 linked against those APIs still have the same results returned.

Oh yes.  Sorry, I had misread the patch and read your changes to
libxlu_internal.h as being in libxlutil.h.

   Previous APIs work as before.
  
  That can't be right because you have to at least specify how they deal
  with the additional config file syntax.
 
 No, the old APIs don't deal with new syntax. If applications want to
 support new syntax, they need to use new API.

It's obvious that the old API can't return the new syntax.  The
question is what happens if you try.

 If old APIs try to get value from new syntax, it has no effect.

I don't think "has no effect" can be right.  What is the actual return
value from the function ?  Will it be treated as an error ?  (IMO it
should be, and this should be documented.)
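
As a rough illustration of the behaviour being asked for, here is a sketch of an old-style getter that reports an error when it meets the new list syntax. The names `cfg_value` and `cfg_get_string` are hypothetical and are not the libxlu API; the choice of EINVAL is likewise only an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the (string | list) value described above. */
enum cfg_value_type { CFG_STRING, CFG_LIST };

struct cfg_value {
    enum cfg_value_type type;
    const char *string;          /* valid when type == CFG_STRING */
    struct cfg_value *items;     /* valid when type == CFG_LIST */
    size_t nr_items;
};

/* Old-style getter: only understands plain strings.
 * Returns 0 on success, EINVAL if the setting uses the list syntax.
 * *out is left untouched on error. */
static int cfg_get_string(const struct cfg_value *v, const char **out)
{
    if (v->type != CFG_STRING)
        return EINVAL;
    *out = v->string;
    return 0;
}
```

Returning a distinct error rather than silently doing nothing lets an old caller tell "setting absent" apart from "setting present but in a form I cannot represent".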

Ian.


[Xen-devel] [PULL 2/2] xen-hvm: increase maxmem before calling xc_domain_populate_physmap

2015-01-13 Thread Stefano Stabellini

Increase maxmem before calling xc_domain_populate_physmap_exact to
avoid the risk of running out of guest memory. This way we can also
avoid complex memory calculations in libxl at domain construction
time.

This patch fixes an abort() when assigning more than 4 NICs to a VM.

Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
Signed-off-by: Don Slutz dsl...@verizon.com
---
 xen-hvm.c |   24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/xen-hvm.c b/xen-hvm.c
index 7548794..e2e575b 100644
--- a/xen-hvm.c
+++ b/xen-hvm.c
@@ -90,6 +90,12 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t *shared_page, int vcpu)
 #endif
 
 #define BUFFER_IO_MAX_DELAY  100
+/* Leave some slack so that hvmloader does not complain about lack of
+ * memory at boot time ("Could not allocate order=0 extent").
+ * Once hvmloader is modified to cope with that situation without
+ * printing warning messages, QEMU_SPARE_PAGES can be removed.
+ */
+#define QEMU_SPARE_PAGES 16
 
 typedef struct XenPhysmap {
     hwaddr start_addr;
@@ -244,6 +250,8 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
     unsigned long nr_pfn;
     xen_pfn_t *pfn_list;
     int i;
+    xc_domaininfo_t info;
+    unsigned long free_pages;
 
     if (runstate_check(RUN_STATE_INMIGRATE)) {
         /* RAM already populated in Xen */
@@ -266,6 +274,22 @@ void xen_ram_alloc(ram_addr_t ram_addr, ram_addr_t size, MemoryRegion *mr)
         pfn_list[i] = (ram_addr >> TARGET_PAGE_BITS) + i;
     }
 
+    if ((xc_domain_getinfolist(xen_xc, xen_domid, 1, &info) != 1) ||
+        (info.domain != xen_domid)) {
+        hw_error("xc_domain_getinfolist failed");
+    }
+    free_pages = info.max_pages - info.tot_pages;
+    if (free_pages > QEMU_SPARE_PAGES) {
+        free_pages -= QEMU_SPARE_PAGES;
+    } else {
+        free_pages = 0;
+    }
+    if ((free_pages < nr_pfn) &&
+        (xc_domain_setmaxmem(xen_xc, xen_domid,
+                             ((info.max_pages + nr_pfn - free_pages)
+                              << (XC_PAGE_SHIFT - 10))) < 0)) {
+        hw_error("xc_domain_setmaxmem failed");
+    }
     if (xc_domain_populate_physmap_exact(xen_xc, xen_domid, nr_pfn, 0, 0, pfn_list)) {
         hw_error("xen: failed to populate ram at " RAM_ADDR_FMT, ram_addr);
     }
-- 
1.7.10.4
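
For readers who want to check the arithmetic in the patch above, here is a standalone sketch of the same calculation. The helper name `new_maxmem_kb` is made up, 4 KiB pages are assumed (a shift of 12, hence the shift by 2 to convert pages to KiB), and the caller is assumed to pass max_pages >= tot_pages.

```c
#include <stdint.h>

#define PAGE_SHIFT_4K 12      /* assumption: 4 KiB pages */
#define SPARE_PAGES   16      /* mirrors QEMU_SPARE_PAGES in the patch */

/* Returns the new maxmem in KiB, or 0 if no increase is needed.
 * Assumes max_pages >= tot_pages. */
static uint64_t new_maxmem_kb(uint64_t max_pages, uint64_t tot_pages,
                              uint64_t nr_pfn)
{
    uint64_t free_pages = max_pages - tot_pages;

    /* Keep SPARE_PAGES of slack out of the usable free count. */
    free_pages = free_pages > SPARE_PAGES ? free_pages - SPARE_PAGES : 0;
    if (free_pages >= nr_pfn)
        return 0;

    /* Grow the limit by the shortfall, then convert pages to KiB. */
    return (max_pages + nr_pfn - free_pages) << (PAGE_SHIFT_4K - 10);
}
```

For example, with max_pages = 1000, tot_pages = 990 and a request for 64 pages, the 10 free pages are all treated as slack, so the limit is raised to 1064 pages, i.e. 4256 KiB.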



[Xen-devel] [PULL 1/2] xen-pt: Fix PCI devices re-attach failed

2015-01-13 Thread Stefano Stabellini

From: Liang Li liang.z...@intel.com

Use the 'xl pci-attach $DomU $BDF' command to attach more than
one PCI device to the guest, then detach the devices with
'xl pci-detach $DomU $BDF'. On re-attaching these PCI devices,
an error message like the following is reported:

libxl: error: libxl_qmp.c:287:qmp_handle_error_response: receive
an error message from QMP server: Duplicate ID 'pci-pt-03_10.1'
for device.

If 'address_space_memory' is used as the parameter of
'memory_listener_register', 'xen_pt_region_del' will not be called
when the device is detached if the memory region's name is not
'xen-pci-pt-*'. As a result, the device's related QemuOpts object
is not released properly.

Using the device's address space avoids this issue: 'xen_pt_region_add'
is called as many times on attach as 'xen_pt_region_del' is called on
detach, so every memory region referenced by 'xen_pt_region_add' is
unreferenced by 'xen_pt_region_del' and can be released properly.

Signed-off-by: Liang Li liang.z...@intel.com
Reviewed-by: Paolo Bonzini pbonz...@redhat.com
Reported-by: Longtao Pang longtaox.p...@intel.com
---
 hw/xen/xen_pt.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/xen/xen_pt.c b/hw/xen/xen_pt.c
index c1bf357..f2893b2 100644
--- a/hw/xen/xen_pt.c
+++ b/hw/xen/xen_pt.c
@@ -736,7 +736,7 @@ static int xen_pt_initfn(PCIDevice *d)
     }
 
 out:
-    memory_listener_register(&s->memory_listener, &address_space_memory);
+    memory_listener_register(&s->memory_listener, &s->dev.bus_master_as);
     memory_listener_register(&s->io_listener, &address_space_io);
     XEN_PT_LOG(d,
                "Real physical device %02x:%02x.%d registered successfully!\n",
-- 
1.7.10.4



[Xen-devel] [PATCHv1 0/3 net-next] xen-netfront: refactor making Tx requests

As netfront has evolved to handle different sorts of skbs, the code to
fill a Tx request has been copied and pasted several times.  The series
refactors this and a few other areas.

The first patch is to a Xen header but this can be merged via
net-next.

David



[Xen-devel] [PATCH 2/3] xen-netfront: refactor skb slot counting

A function to count the number of slots an skb needs is more useful
than one that counts the slots needed for only the frags.

Signed-off-by: David Vrabel david.vra...@citrix.com
---
 drivers/net/xen-netfront.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 22bcb4e..6b29b3a 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -521,13 +521,15 @@ static void xennet_make_frags(struct sk_buff *skb, struct netfront_queue *queue,
 }
 
 /*
- * Count how many ring slots are required to send the frags of this
- * skb. Each frag might be a compound page.
+ * Count how many ring slots are required to send this skb. Each frag
+ * might be a compound page.
  */
-static int xennet_count_skb_frag_slots(struct sk_buff *skb)
+static int xennet_count_skb_slots(struct sk_buff *skb)
 {
 	int i, frags = skb_shinfo(skb)->nr_frags;
-	int pages = 0;
+	int pages;
+
+	pages = PFN_UP(offset_in_page(skb->data) + skb_headlen(skb));
 
 	for (i = 0; i < frags; i++) {
 		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
@@ -597,8 +599,7 @@ static int xennet_start_xmit(struct sk_buff *skb, struct net_device *dev)
 		goto drop;
 	}
 
-	slots = DIV_ROUND_UP(offset + len, PAGE_SIZE) +
-		xennet_count_skb_frag_slots(skb);
+	slots = xennet_count_skb_slots(skb);
 	if (unlikely(slots > MAX_SKB_FRAGS + 1)) {
 		net_dbg_ratelimited("xennet: skb rides the rocket: %d slots, %d bytes\n",
 				    slots, skb->len);
-- 
1.7.10.4
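
The counting rule in the patch above can be tried outside the kernel with a small userspace sketch. `pages_spanned`, `struct frag` and `count_slots` are illustrative names only, and 4 KiB pages are assumed; the real code works on struct sk_buff and uses PFN_UP().

```c
#include <stddef.h>

#define PAGE_SIZE_4K 4096u    /* assumption: 4 KiB pages */

/* Pages touched by `len` bytes starting `offset` bytes into a page. */
static unsigned int pages_spanned(unsigned int offset, unsigned int len)
{
    return (offset + len + PAGE_SIZE_4K - 1) / PAGE_SIZE_4K;
}

/* Simplified fragment: byte offset (may exceed one page) and length. */
struct frag {
    unsigned int offset;
    unsigned int len;
};

/* Slots for the linear header area plus every fragment. */
static unsigned int count_slots(unsigned int head_offset, unsigned int head_len,
                                const struct frag *frags, size_t nr_frags)
{
    unsigned int slots = pages_spanned(head_offset, head_len);

    for (size_t i = 0; i < nr_frags; i++) {
        /* Only the offset within the first used page matters. */
        unsigned int off = frags[i].offset % PAGE_SIZE_4K;

        slots += pages_spanned(off, frags[i].len);
    }
    return slots;
}
```

A 200-byte header starting at offset 4000 straddles a page boundary and therefore needs two slots, which is the case the old DIV_ROUND_UP(offset + len, PAGE_SIZE) term handled at the call site.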



Re: [Xen-devel] [PATCH v3 00/24] xen/arm: Add support for non-pci passthrough

On 13/01/15 14:25, Julien Grall wrote:
 This series has been tested on Midway by assigning the secondary network card
 to a guest (see instruction below). I plan to do further testing on other
 boards.

I forgot to mention that the changes have only been build-tested on x86.

Regards,

-- 
Julien Grall


[Xen-devel] [PATCH 0/2] xen/arm: Misc for grant-table

Hi all,

This series contains a couple of changes for the grant-table header.

The first one only removes an unused/misplaced define. The second one
increases the number of grant frames initialized when the domain is created.

Regards,

Julien Grall (2):
  xen/arm: Remove the define INVALID_GFN from arch-arm/grant_table.h
  xen/arm: grant-table: Increased the initial number of grant frame to 4

 xen/include/asm-arm/grant_table.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.1.4



Re: [Xen-devel] [PATCH RFC] make error codes a formal part of the ABI

 On 13.01.15 at 17:57, ian.jack...@eu.citrix.com wrote:
 Ian Campbell writes (Re: [PATCH RFC] make error codes a formal part of the ABI):
 On Tue, 2015-01-13 at 16:21 +, Jan Beulich wrote:
  There's one small block commented with TBD left in the public header.
  This is the main reason for the submission being RFC. While we don't
  currently use these error codes, I'm not sure if we should leave all
  or some of them out for the time being.
 
 I say let's omit any we don't use for now.
 
 Is it possible that anyone is using the existing header file where
 these values were defined ?  If so their code might say
   case ELOOP:
 which would not compile when they switched to the new header.

The existing header is a hypervisor private one. Any code outside
the hypervisor using it imo deserves to get broken.

Jan



Re: [Xen-devel] [PATCH v3 07/19] libxl: x86: factor out e820_host_sanitize

Wei Liu writes ([PATCH v3 07/19] libxl: x86: factor out e820_host_sanitize):
 This function gets the machine E820 map and sanitizes it according to PV
 guest configuration.
 
 This will be used in a later patch. No functional change introduced in
 this patch.

Thanks.  It is easy to see that this is correct.

The way that `rc' is used to contain a libxc (syscall) return value is
contrary to the coding style but it is better not to fix this in the
same patch as the code motion.

Acked-by: Ian Jackson ian.jack...@eu.citrix.com

Ian.


Re: [Xen-devel] [PATCH RFC] make error codes a formal part of the ABI

On Tue, 2015-01-13 at 16:57 +, Ian Jackson wrote:
 Ian Campbell writes (Re: [PATCH RFC] make error codes a formal part of the ABI):
  On Tue, 2015-01-13 at 16:21 +, Jan Beulich wrote:
   There's one small block commented with TBD left in the public header.
   This is the main reason for the submission being RFC. While we don't
   currently use these error codes, I'm not sure if we should leave all
   or some of them out for the time being.
  
  I say let's omit any we don't use for now.
 
 Is it possible that anyone is using the existing header file where
 these values were defined ?

It's not installed or in the regular header paths, so it seems unlikely,
or at least they would have had to jump through some hoops and no doubt
have a big comment about their fragile hack...

   If so their code might say
   case ELOOP:
 which would not compile when they switched to the new header.
 
 I don't know whether this is likely, or a problem.
 
 Ian.




Re: [Xen-devel] [PATCH for-4.6 2/4] xen/arm: vgic: Keep track of vIRQ used by a domain

On Tue, 2015-01-13 at 16:57 +, Julien Grall wrote:
 (CC Jan)

I think you forgot; I added him.

  @@ -49,6 +49,21 @@ int domain_vtimer_init(struct domain *d)
   {
       d->arch.phys_timer_base.offset = NOW();
       d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
  +
  +    /* At this stage vgic_reserve_virq can't fail */
  +    if ( is_hardware_domain(d) )
  +    {
  +        BUG_ON(!vgic_reserve_virq(d, timer_get_irq(TIMER_PHYS_SECURE_PPI)));
  +        BUG_ON(!vgic_reserve_virq(d, timer_get_irq(TIMER_PHYS_NONSECURE_PPI)));
  +        BUG_ON(!vgic_reserve_virq(d, timer_get_irq(TIMER_VIRT_PPI)));
  +    }
  +    else
  +    {
  +        BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_PHYS_S_PPI));
  +        BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_PHYS_NS_PPI));
  +        BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_VIRT_PPI));
 
  Although BUG_ON is not conditional on $debug I think we still should
  avoid side effects in the condition.
 
  I know, but this should never fail as it is called during domain
  construction. If so we may have some other issue later if we decide to
  assign PPI to a guest.
 
  I would prefer to keep the BUG_ON here
  
  I'm not objecting to the BUG_ON itself but to the fact that the
  condition has a side effect. Please use:
  if (!do_something())
      BUG()
  instead to avoid this.
 
 We have other places in the code where BUG_ON has a side effect.

If we do then it is a tiny minority of places, and they are IMHO wrong.
I spotted one in the 600+ results of grepping for BUG_ON.

 IMHO, if (!do_something()) BUG() = BUG_ON.

No, BUG_ON() is a variant of ASSERT(), with the distinction being that
the former is not only included when debug=y. It is as wrong to have a
side-effect in the BUG_ON as it is to have one in an ASSERT.
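
To illustrate the hazard with made-up macros: `MY_ASSERT`, `DEBUG_BUILD`, `reserve_irq` and the two init functions below are illustrative names, not Xen's definitions, and the bitmask only handles IRQ numbers below 32.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a debug-only assertion macro. */
#ifdef DEBUG_BUILD
#define MY_ASSERT(cond) do { if (!(cond)) abort(); } while (0)
#else
#define MY_ASSERT(cond) do { } while (0)   /* condition never evaluated */
#endif

static unsigned int reserved_mask;

/* Reserve `irq` (must be < 32); returns false if it was already taken. */
static bool reserve_irq(unsigned int irq)
{
    if (reserved_mask & (1u << irq))
        return false;
    reserved_mask |= 1u << irq;
    return true;
}

/* Side effect hidden in the assertion: the reservation silently
 * disappears when the macro is compiled out. */
static void init_fragile(void)
{
    MY_ASSERT(reserve_irq(3));
}

/* Side effect as an ordinary statement: it always happens, and the
 * failure check stays separate from the work being done. */
static void init_robust(void)
{
    if (!reserve_irq(4))
        abort();
}
```

A BUG_ON that is always compiled in does not lose the call, but writing it in the second form keeps the habit consistent with ASSERT and makes the required work visible to anyone skimming the function.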

 On the latter you know directly why it's failing, on the former you have
 to look at the code.

If it's important/possible to fail then a log message would be
appropriate.

Ian.



[Xen-devel] [PATCH 3/3] xen-netfront: refactor making Tx requests

Eliminate all the duplicate code for making Tx requests by
consolidating them into a single xennet_make_one_txreq() function.

xennet_make_one_txreq() and xennet_make_txreqs() work with pages and
offsets so it will be easier to make netfront handle highmem frags in
the future.

Signed-off-by: David Vrabel david.vra...@citrix.com
---
 drivers/net/xen-netfront.c |  181 
 1 file changed, 67 insertions(+), 114 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index 6b29b3a..68e0e8f 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -425,99 +425,56 @@ static void xennet_tx_buf_gc(struct netfront_queue *queue)
xennet_maybe_wake_tx(queue);
 }
 
-static void xennet_make_frags(struct sk_buff *skb, struct netfront_queue *queue,
-			      struct xen_netif_tx_request *tx)
-{
-	char *data = skb->data;
-	unsigned long mfn;
-	RING_IDX prod = queue->tx.req_prod_pvt;
-	int frags = skb_shinfo(skb)->nr_frags;
-	unsigned int offset = offset_in_page(data);
-	unsigned int len = skb_headlen(skb);
+static struct xen_netif_tx_request *xennet_make_one_txreq(
+	struct netfront_queue *queue, struct sk_buff *skb,
+	struct page *page, unsigned int offset, unsigned int len)
+{
 	unsigned int id;
+	struct xen_netif_tx_request *tx;
 	grant_ref_t ref;
-	int i;
 
-	/* While the header overlaps a page boundary (including being
-	   larger than a page), split it it into page-sized chunks. */
-	while (len > PAGE_SIZE - offset) {
-		tx->size = PAGE_SIZE - offset;
-		tx->flags |= XEN_NETTXF_more_data;
-		len -= tx->size;
-		data += tx->size;
-		offset = 0;
+	len = min_t(unsigned int, PAGE_SIZE - offset, len);
 
-		id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
-		queue->tx_skbs[id].skb = skb_get(skb);
-		tx = RING_GET_REQUEST(&queue->tx, prod++);
-		tx->id = id;
-		ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
-		BUG_ON((signed short)ref < 0);
+	id = get_id_from_freelist(&queue->tx_skb_freelist, queue->tx_skbs);
+	tx = RING_GET_REQUEST(&queue->tx, queue->tx.req_prod_pvt++);
+	ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
+	BUG_ON((signed short)ref < 0);
 
-		mfn = virt_to_mfn(data);
-		gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
-						mfn, GNTMAP_readonly);
+	gnttab_grant_foreign_access_ref(ref, queue->info->xbdev->otherend_id,
+					page_to_mfn(page), GNTMAP_readonly);
 
-		queue->grant_tx_page[id] = virt_to_page(data);
-		tx->gref = queue->grant_tx_ref[id] = ref;
-		tx->offset = offset;
-		tx->size = len;
-		tx->flags = 0;
-	}
+	queue->tx_skbs[id].skb = skb;
+	queue->grant_tx_page[id] = page;
+	queue->grant_tx_ref[id] = ref;
 
-	/* Grant backend access to each skb fragment page. */
-	for (i = 0; i < frags; i++) {
-		skb_frag_t *frag = skb_shinfo(skb)->frags + i;
-		struct page *page = skb_frag_page(frag);
+	tx->id = id;
+	tx->gref = ref;
+	tx->offset = offset;
+	tx->size = len;
+	tx->flags = 0;
 
-		len = skb_frag_size(frag);
-		offset = frag->page_offset;
+	return tx;
+}
 
-		/* Skip unused frames from start of page */
-		page += offset >> PAGE_SHIFT;
-		offset &= ~PAGE_MASK;
+static struct xen_netif_tx_request *xennet_make_txreqs(
+	struct netfront_queue *queue, struct xen_netif_tx_request *tx,
+	struct sk_buff *skb, struct page *page,
+	unsigned int offset, unsigned int len)
+{
+	/* Skip unused frames from start of page */
+	page += offset >> PAGE_SHIFT;
+	offset &= ~PAGE_MASK;
 
-		while (len > 0) {
-			unsigned long bytes;
-
-			bytes = PAGE_SIZE - offset;
-			if (bytes > len)
-				bytes = len;
-
-			tx->flags |= XEN_NETTXF_more_data;
-
-			id = get_id_from_freelist(&queue->tx_skb_freelist,
-						  queue->tx_skbs);
-			queue->tx_skbs[id].skb = skb_get(skb);
-			tx = RING_GET_REQUEST(&queue->tx, prod++);
-			tx->id = id;
-			ref = gnttab_claim_grant_reference(&queue->gref_tx_head);
-			BUG_ON((signed short)ref < 0);
-
-			mfn = pfn_to_mfn(page_to_pfn(page));
-			gnttab_grant_foreign_access_ref(ref,
-

[Xen-devel] [PATCH 1/3] xen: add page_to_mfn()

pfn_to_mfn(page_to_pfn(p)) is a common use case so add a generic
helper for it.

Signed-off-by: David Vrabel david.vra...@citrix.com
---
 include/xen/page.h |    5 +++++
 1 file changed, 5 insertions(+)

diff --git a/include/xen/page.h b/include/xen/page.h
index 12765b6..c5ed20b 100644
--- a/include/xen/page.h
+++ b/include/xen/page.h
@@ -3,6 +3,11 @@
 
 #include <asm/xen/page.h>
 
+static inline unsigned long page_to_mfn(struct page *page)
+{
+	return pfn_to_mfn(page_to_pfn(page));
+}
+
 struct xen_memory_region {
 	phys_addr_t start;
 	phys_addr_t size;
-- 
1.7.10.4



Re: [Xen-devel] [PATCH v3 03/19] libxc: allocate memory with vNUMA information for PV guest

Wei Liu writes ([PATCH v3 03/19] libxc: allocate memory with vNUMA information for PV guest):
...
 diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
 index 07d7224..c459e77 100644
 --- a/tools/libxc/include/xc_dom.h
 +++ b/tools/libxc/include/xc_dom.h
 @@ -167,6 +167,11 @@ struct xc_dom_image {
...
 +/* vNUMA information */
 +unsigned int *vnode_to_pnode; /* vnode to pnode mapping array */
 +uint64_t *vnode_size; /* vnode size array */

You don't specify the units.  You should probably name the variable
_bytes or _pages or something.

Looking at the algorithm below it seems to be in _mby.  But the domain
size is specified in pages.  So AFAICT if you try to create a domain
which is not a whole number of pages, it is bound to fail !

Perhaps the vnode memory size should be in pages too.

 +unsigned int nr_vnodes;   /* number of elements of above arrays */

Is there some reason to prefer this arrangement with multiple parallel
arrays, to one with a single array of structs ?
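
A sketch of what the two suggestions could look like together: one record per vnode, with the unit in the field name. `struct vnode_info` and both helpers are hypothetical, 4 KiB pages are assumed, and the second helper reads the units remark above as being about totals that are not a whole number of megabytes.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGES_PER_MB 256u          /* assumes 4 KiB pages */

/* Hypothetical layout: one record per vnode, unit spelled out in the name. */
struct vnode_info {
    unsigned int pnode;            /* physical node backing this vnode */
    uint64_t pages;                /* vnode size in pages */
};

/* Total size of all vnodes, in the same unit as the domain size. */
static uint64_t vnodes_total_pages(const struct vnode_info *v, unsigned int nr)
{
    uint64_t total = 0;

    for (unsigned int i = 0; i < nr; i++)
        total += v[i].pages;
    return total;
}

/* With MB-granular vnode sizes, only whole-MB totals can ever be matched. */
static bool mb_sizes_can_match(uint64_t total_pages)
{
    return total_pages % PAGES_PER_MB == 0;
}
```

Keeping the per-vnode data in one struct also means a single length field describes everything, so the arrays cannot drift out of step with each other.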

 +/* Setup dummy vNUMA information if it's not provided. Not
 + * that this is a valid state if libxl doesn't provide any
 + * vNUMA information.
 + *
 + * In this case we setup some dummy value for the convenience
 + * of the allocation code. Note that from the user's PoV the
 + * guest still has no vNUMA configuration.
 + */

This arrangement for defaulting makes it difficult to supply only
partial information - for example, to supply the number of vnodes but
allow the system to make up the details.

I have a similar complaint about the corresponding libxl code.

I think you should decide where you want the defaulting to be, and do
it in a more flexible way in that one place.  Probably, libxl.

Ian.


Re: [Xen-devel] [PATCH for-4.6 2/4] xen/arm: vgic: Keep track of vIRQ used by a domain

On 13/01/15 16:57, Julien Grall wrote:
 (CC Jan)

Forgot to really CC Jan for the bool stuff.

 Hi Ian,
 
 On 13/01/15 16:46, Ian Campbell wrote:
 vgic_reserve_irq returns a boolean:

 Please use true/false then.

 In Xen we have xen/stdbool.h which differs from normal stdbool.h. I'm
 not sure what the rules are for use.
 
 Jan please correct me if I'm wrong, xen/stdbool.h has been introduced
 for the ELF code and should not be used anywhere else.
 
 true/false is defined in xen/stdbool.h together with Bool not bool_t.
 
 0 = not reserved
 1 = reserved

 I don't see why we should return an int in this case, as the caller
 should know how to use it.

 It's slightly more conventional to return error codes, but I guess I
 don't mind much.
 
 Agree, but in this particular case we don't have to know the error code.
 So it's pointless to return it.
 
 @@ -49,6 +49,21 @@ int domain_vtimer_init(struct domain *d)
  {
      d->arch.phys_timer_base.offset = NOW();
      d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
 +
 +    /* At this stage vgic_reserve_virq can't fail */
 +    if ( is_hardware_domain(d) )
 +    {
 +        BUG_ON(!vgic_reserve_virq(d, timer_get_irq(TIMER_PHYS_SECURE_PPI)));
 +        BUG_ON(!vgic_reserve_virq(d, timer_get_irq(TIMER_PHYS_NONSECURE_PPI)));
 +        BUG_ON(!vgic_reserve_virq(d, timer_get_irq(TIMER_VIRT_PPI)));
 +    }
 +    else
 +    {
 +        BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_PHYS_S_PPI));
 +        BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_PHYS_NS_PPI));
 +        BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_VIRT_PPI));

 Although BUG_ON is not conditional on $debug I think we still should
 avoid side effects in the condition.

 I know, but this should never fail as it is called during domain
 construction. If so we may have some other issue later if we decide to
 assign PPI to a guest.

 I would prefer to keep the BUG_ON here

 I'm not objecting to the BUG_ON itself but to the fact that the
 condition has a side effect. Please use:
 if (!do_something())
     BUG()
 instead to avoid this.
 
 We have other places in the code where BUG_ON has a side effect.
 
 IMHO, if (!do_something()) BUG() = BUG_ON.
 
 On the latter you know directly why it's failing, on the former you have
 to look at the code.
 
 Regards,
 


-- 
Julien Grall


Re: [Xen-devel] [PATCH v3 12/19] hvmloader: retrieve vNUMA information from hypervisor

On Tue, Jan 13, 2015 at 04:50:11PM +, Jan Beulich wrote:
  On 13.01.15 at 13:11, wei.l...@citrix.com wrote:
  +void init_vnuma_info(void)
  +{
  +    int rc, retry = 0;
  +    struct xen_vnuma_topology_info vnuma_topo;
  +
  +    vcpu_to_vnode = scratch_alloc(sizeof(uint32_t) * hvm_info->nr_vcpus, 0);
 
 sizeof(*vcpu_to_vnode) please.
 

Done.

  +    rc = -EAGAIN;
  +    while ( rc == -EAGAIN && retry < 10 )
 
 What's the justification for 10 here? A sane tool stack shouldn't alter
 the values while starting the domain.
 

I wasn't sure whether a toolstack would change the values whilst the domain is
running. Now that you have confirmed that a sane toolstack shouldn't do that, I
can just remove this loop.

  +    {
  +        vnuma_topo.domid = DOMID_SELF;
  +        vnuma_topo.pad = 0;
  +        vnuma_topo.nr_vcpus = 0;
  +        vnuma_topo.nr_vnodes = 0;
  +        vnuma_topo.nr_vmemranges = 0;
  +
  +        set_xen_guest_handle(vnuma_topo.vdistance.h, NULL);
  +        set_xen_guest_handle(vnuma_topo.vcpu_to_vnode.h, NULL);
  +        set_xen_guest_handle(vnuma_topo.vmemrange.h, NULL);
  +
  +        rc = hypercall_memory_op(XENMEM_get_vnumainfo, &vnuma_topo);
  +
  +        if ( rc == -EOPNOTSUPP )
  +            return;
  +
  +        if ( rc != -ENOBUFS )
  +            break;
  +
  +        ASSERT(vnuma_topo.nr_vcpus == hvm_info->nr_vcpus);
 
 I also wonder whether we shouldn't make the hypervisor return
 back the (modified) values in the -EAGAIN error case, so that you
 could move above first half of the loop body out of the loop.
 

I don't think the hypercall modifies values in the -EAGAIN case. The first half
of that loop is to prepare hypercall structure so that we can retrieve
the new size.

But since you say no sane toolstack should do that this issue becomes
moot.

Wei.

 Jan


Re: [Xen-devel] [PATCH v3 05/19] libxl: add vmemrange to libxl__domain_build_state

On Tue, Jan 13, 2015 at 05:02:10PM +, Ian Jackson wrote:
 Wei Liu writes ([PATCH v3 05/19] libxl: add vmemrange to libxl__domain_build_state):
  A vnode consists of one or more vmemranges (virtual memory range).  One
  example of multiple vmemranges is that there is a hole in one vnode.
 
 I'm finding this series a bit oddly structured.  This patch, for
 example, just introduces some new fields to an internal state struct -
 but these fields are not initialised, set, or read.
 

The new fields (and other existing fields) are initialised to zero in
initiate_domain_create, which is why they don't need to be explicitly
initialised.

These new fields are accessed in the next patch. I can either explicitly
say so in the commit log or squash this patch with the next one. Which way
do you prefer?

TBH I don't think this patch and the next one should be squashed into one
patch.

Wei.

 Ian.


Re: [Xen-devel] [PATCH for-4.6 2/4] xen/arm: vgic: Keep track of vIRQ used by a domain

On 13/01/15 17:18, Ian Campbell wrote:
 On Tue, 2015-01-13 at 16:57 +, Julien Grall wrote:
 (CC Jan)
 
 I think you forgot; I've added him.
 
 @@ -49,6 +49,21 @@ int domain_vtimer_init(struct domain *d)
  {
  d->arch.phys_timer_base.offset = NOW();
  d->arch.virt_timer_base.offset = READ_SYSREG64(CNTPCT_EL0);
 +
 +/* At this stage vgic_reserve_virq can't fail */
 +if ( is_hardware_domain(d) )
 +{
 +BUG_ON(!vgic_reserve_virq(d, 
 timer_get_irq(TIMER_PHYS_SECURE_PPI)));
 +BUG_ON(!vgic_reserve_virq(d, 
 timer_get_irq(TIMER_PHYS_NONSECURE_PPI)));
 +BUG_ON(!vgic_reserve_virq(d, timer_get_irq(TIMER_VIRT_PPI)));
 +}
 +else
 +{
 +BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_PHYS_S_PPI));
 +BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_PHYS_NS_PPI));
 +BUG_ON(!vgic_reserve_virq(d, GUEST_TIMER_VIRT_PPI));

 Although BUG_ON is not conditional on $debug I think we still should
 avoid side effects in the condition.

 I know, but this should never fail as it is called during domain
 construction. If it does, we may have some other issue later if we decide
 to assign a PPI to a guest.

 I would prefer to keep the BUG_ON here

 I'm not objecting to the BUG_ON itself but to the fact that the
 condition has a side effect. Please use:
 if (!do_something())
 BUG()
 instead to avoid this.

 We have other places in the code where BUG_ON has a side effect.
 
 If we do then it is a tiny minority of places, and they are IMHO wrong.
 I spotted one in the 600+ results of grepping for BUG_ON.

I spotted more. Anyway, I will move to an if (!do_smth()) BUG() form.
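
A minimal sketch of that form, assuming a hypothetical bitmap-based `reserve_virq()` helper and an `abort()`-based `BUG()` stand-in (neither is the actual Xen code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define BUG()    abort()   /* stand-in for Xen's BUG() */
#define NR_VIRQS 32u       /* illustrative size only */

static unsigned long used_virqs;   /* one bit per reserved vIRQ */

/* Hypothetical reservation helper: true on success, false if already taken. */
static bool reserve_virq(unsigned int virq)
{
    assert(virq < NR_VIRQS);
    if (used_virqs & (1UL << virq))
        return false;
    used_virqs |= 1UL << virq;
    return true;
}

/*
 * The call with the side effect is an ordinary statement; the fatal check
 * only consumes its result, so the reservation can never be lost if the
 * checking macro is ever changed or compiled out.
 */
static void reserve_virq_or_bug(unsigned int virq)
{
    if (!reserve_virq(virq))
        BUG();
}
```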

Regards,

-- 
Julien Grall


Re: [Xen-devel] [PATCH v3 03/19] libxc: allocate memory with vNUMA information for PV guest

On Tue, Jan 13, 2015 at 05:05:26PM +, Ian Jackson wrote:
 Wei Liu writes ([PATCH v3 03/19] libxc: allocate memory with vNUMA 
 information for PV guest):
 ...
  diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
  index 07d7224..c459e77 100644
  --- a/tools/libxc/include/xc_dom.h
  +++ b/tools/libxc/include/xc_dom.h
  @@ -167,6 +167,11 @@ struct xc_dom_image {
 ...
  +/* vNUMA information */
  +unsigned int *vnode_to_pnode; /* vnode to pnode mapping array */
  +uint64_t *vnode_size; /* vnode size array */
 
 You don't specify the units.  You should probably name the variable
 _bytes or _pages or something.
 
 Looking at the algorithm below it seems to be in _mby.  But the domain
 size is specified in pages.  So AFAICT if you try to create a domain
 which is not a whole number of pages, it is bound to fail !
 
 Perhaps the vnode memory size should be in pages too.
 

Let's use pages as the unit.

  +unsigned int nr_vnodes;   /* number of elements of above arrays */
 
 Is there some reason to prefer this arrangement with multiple parallel
 arrays, to one with a single array of structs ?
 

No, I don't have a preference. I can pack vnode_to_pnode and
vnode_size(_pages) into a struct.
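
Something along these lines, as a rough sketch only. The struct and field names are hypothetical and not what the final patch has to use; the unit is carried in the field name as suggested.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one record per vnode instead of parallel arrays. */
struct vnode_info {
    unsigned int pnode;   /* physical node backing this vnode */
    uint64_t pages;       /* vnode size, in pages */
};

/* Sum the per-vnode sizes so they can be checked against the domain size. */
static uint64_t vnode_total_pages(const struct vnode_info *v, size_t nr)
{
    uint64_t total = 0;

    assert(v != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++)
        total += v[i].pages;
    return total;
}
```

With sizes kept in pages, the total can be compared directly with the domain's page count without any shifting.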

  +/* Setup dummy vNUMA information if it's not provided. Note
  + * that this is a valid state if libxl doesn't provide any
  + * vNUMA information.
  + *
  + * In this case we setup some dummy value for the convenience
  + * of the allocation code. Note that from the user's PoV the
  + * guest still has no vNUMA configuration.
  + */
 
 This arrangement for defaulting makes it difficult to supply only
 partial information - for example, to supply the number of vnodes but
 allow the system to make up the details.
 
 I have a similar complaint about the corresponding libxl code.
 
 I think you should decide where you want the defaulting to be, and do
 it in a more flexible way in that one place.  Probably, libxl.
 

The defaulting will be in libxl. That's what Dario is working on.

If libxl provides information, these dummy values will have no effect.

Maybe the comment is confusing. I wasn't saying there is defaulting
happening inside libxc. It's only for the convenience of the allocation
code, because it needs to operate on one mapping.

Wei.

 Ian.


Re: [Xen-devel] [PATCH for 2.3 v2 1/1] xen-hvm: increase maxmem before calling xc_domain_populate_physmap

2015-01-13 Thread Stefano Stabellini

On Mon, 12 Jan 2015, Stefano Stabellini wrote:
 On Wed, 3 Dec 2014, Don Slutz wrote:
  From: Stefano Stabellini stefano.stabell...@eu.citrix.com
  
  Increase maxmem before calling xc_domain_populate_physmap_exact to
  avoid the risk of running out of guest memory. This way we can also
  avoid complex memory calculations in libxl at domain construction
  time.
  
  This patch fixes an abort() when assigning more than 4 NICs to a VM.
  
  Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Signed-off-by: Don Slutz dsl...@verizon.com
  ---
  v2: Changes by Don Slutz
Switch from xc_domain_getinfo to xc_domain_getinfolist
Fix error check for xc_domain_getinfolist
Limit increase of maxmem to only do when needed:
  Add QEMU_SPARE_PAGES (How many pages to leave free)
  Add free_pages calculation
  
   xen-hvm.c | 19 +++
   1 file changed, 19 insertions(+)
  
  diff --git a/xen-hvm.c b/xen-hvm.c
  index 7548794..d30e77e 100644
  --- a/xen-hvm.c
  +++ b/xen-hvm.c
  @@ -90,6 +90,7 @@ static inline ioreq_t *xen_vcpu_ioreq(shared_iopage_t 
  *shared_page, int vcpu)
   #endif
   
   #define BUFFER_IO_MAX_DELAY  100
  +#define QEMU_SPARE_PAGES 16
 
 We need a big comment here to explain why we have this parameter and
 when we'll be able to get rid of it.
 
 Other than that the patch is fine.
 
 Thanks!
 

Actually I'll just go ahead and add the comment and commit, if that's OK
with you.

Cheers,

Stefano


Re: [Xen-devel] [PATCH v3 15/19] libxc: allocate memory with vNUMA information for HVM guest

2015-01-13 Thread Konrad Rzeszutek Wilk

On Tue, Jan 13, 2015 at 12:11:43PM +, Wei Liu wrote:
 The algorithm is more or less the same as the one used for PV guest.
 Libxc gets hold of the mapping of vnode to pnode and size of each vnode
 then allocate memory accordingly.

Could you split this patch in two? One part for the adding of the code
and the other for moving the existing code around?


 
 And then the function returns low memory end, high memory end and mmio
 start to caller. Libxl needs those values to construct vmemranges for
 that guest.
 
 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Dario Faggioli dario.faggi...@citrix.com
 Cc: Elena Ufimtseva ufimts...@gmail.com
 ---
 Changes in v3:
 1. Rewrite commit log.
 2. Add a few code comments.
 ---
  tools/libxc/include/xenguest.h |7 ++
  tools/libxc/xc_hvm_build_x86.c |  224 
 ++--
  2 files changed, 151 insertions(+), 80 deletions(-)
 
 diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
 index 40bbac8..d1cbb4e 100644
 --- a/tools/libxc/include/xenguest.h
 +++ b/tools/libxc/include/xenguest.h
 @@ -230,6 +230,13 @@ struct xc_hvm_build_args {
  struct xc_hvm_firmware_module smbios_module;
  /* Whether to use claim hypercall (1 - enable, 0 - disable). */
  int claim_enabled;
 +unsigned int nr_vnodes;/* Number of vnodes */
 +unsigned int *vnode_to_pnode; /* Vnode to pnode mapping */
 +uint64_t *vnode_size;  /* Size of vnodes */
 +/* Out parameters  */
 +uint64_t lowmem_end;
 +uint64_t highmem_end;
 +uint64_t mmio_start;
  };
  
  /**
 diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
 index c81a25b..54d3dc8 100644
 --- a/tools/libxc/xc_hvm_build_x86.c
 +++ b/tools/libxc/xc_hvm_build_x86.c
 @@ -89,7 +89,8 @@ static int modules_init(struct xc_hvm_build_args *args,
  }
  
  static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
 -   uint64_t mmio_start, uint64_t mmio_size)
 +   uint64_t mmio_start, uint64_t mmio_size,
 +   struct xc_hvm_build_args *args)
  {
  struct hvm_info_table *hvm_info = (struct hvm_info_table *)
  (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
 @@ -119,6 +120,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t 
 mem_size,
  hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
  hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
  
 +args->lowmem_end = lowmem_end;
 +args->highmem_end = highmem_end;
 +args->mmio_start = mmio_start;
 +
  /* Finish with the checksum. */
  for ( i = 0, sum = 0; i < hvm_info->length; i++ )
  sum += ((uint8_t *)hvm_info)[i];
 @@ -244,7 +249,7 @@ static int setup_guest(xc_interface *xch,
 char *image, unsigned long image_size)
  {
  xen_pfn_t *page_array = NULL;
 -unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
 +unsigned long i, j, nr_pages = args->mem_size >> PAGE_SHIFT;
  unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
  uint64_t mmio_start = (1ull << 32) - args->mmio_size;
  uint64_t mmio_size = args->mmio_size;
 @@ -258,13 +263,13 @@ static int setup_guest(xc_interface *xch,
  xen_capabilities_info_t caps;
  unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
  stat_1gb_pages = 0;
 -int pod_mode = 0;
 +unsigned int memflags = 0;
  int claim_enabled = args->claim_enabled;
  xen_pfn_t special_array[NR_SPECIAL_PAGES];
  xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
 -
 -if ( nr_pages > target_pages )
 -pod_mode = XENMEMF_populate_on_demand;
 +uint64_t dummy_vnode_size;
 +unsigned int dummy_vnode_to_pnode;
 +uint64_t total;
  
  memset(&elf, 0, sizeof(elf));
  if ( elf_init(&elf, image, image_size) != 0 )
 @@ -276,6 +281,37 @@ static int setup_guest(xc_interface *xch,
  v_start = 0;
  v_end = args->mem_size;
  
 +if ( nr_pages > target_pages )
 +memflags |= XENMEMF_populate_on_demand;
 +
 +if ( args->nr_vnodes == 0 )
 +{
 +/* Build dummy vnode information */
 +args->nr_vnodes = 1;
 +dummy_vnode_to_pnode = XC_VNUMA_NO_NODE;
 +dummy_vnode_size = args->mem_size >> 20;
 +args->vnode_size = &dummy_vnode_size;
 +args->vnode_to_pnode = &dummy_vnode_to_pnode;
 +}
 +else
 +{
 +if ( nr_pages > target_pages )
 +{
 +PERROR("Cannot enable vNUMA and PoD at the same time");
 +goto error_out;
 +}
 +}
 +
 +total = 0;
 +for ( i = 0; i < args->nr_vnodes; i++ )
 +total += (args->vnode_size[i] << 20);
 +if ( total != args->mem_size )
 +{
 +PERROR("Memory size requested by vNUMA (0x%"PRIx64") mismatches memory size configured for domain (0x%"PRIx64")",
 +   total, args->mem_size);
 +goto error_out;
 +

Re: [Xen-devel] [PATCH v3 08/19] libxl: functions to build vmemranges for PV guest

Wei Liu writes ([PATCH v3 08/19] libxl: functions to build vmemranges for PV 
guest):
 Introduce an arch-independent routine to generate one vmemrange per
 vnode. Also introduce arch-dependent routines for different
 architectures because part of the process is arch-specific -- ARM has
 yet to have NUMA support and E820 is x86 only.
 
 For those x86 guests who care about machine E820 map (i.e. with
 e820_host=1), vnode is further split into several vmemranges to
 accommodate memory holes.  A few stubs for libxl_arm.c are created.
...
 +/* Generate one vmemrange for each virtual node. */
 +next = 0;
 +for (i = 0; i < b_info->num_vnuma_nodes; i++) {
 +libxl_vnode_info *p = &b_info->vnuma_nodes[i];
 +
 +v = libxl__realloc(gc, v, sizeof(*v) * (i+1));

Please use GCREALLOC_ARRAY.

 +v[i].start = next;
 +v[i].end = next + (p->mem << 20); /* mem is in MiB */

Why are all these values in different units ?

Also, it would be best if the units were in the field and variable
names.  Then you wouldn't have to write an explanatory comment.

 diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
 index e959e37..2018afc 100644
 --- a/tools/libxl/libxl_x86.c
 +++ b/tools/libxl/libxl_x86.c
 @@ -338,6 +338,80 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc 
 *gc,
...
 +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
 +  uint32_t domid,
 +  libxl_domain_build_info *b_info,
 +  libxl__domain_build_state *state)
 +{
...
 +n = 0; /* E820 counter */

How about putting this information in the variable name rather than
dropping it into a comment ?  Likewise i.

 +while (remaining > 0) {
 +if (n >= nr_e820) {
 +rc = ERROR_FAIL;

ERROR_NOMEM, surely ?

 +if (map[n].size >= remaining) {
 +v[x].start = map[n].addr;
 +v[x].end = map[n].addr + remaining;
 +map[n].addr += remaining;
 +map[n].size -= remaining;
 +remaining = 0;
 +} else {
 +v[x].start = map[n].addr;
 +v[x].end = map[n].addr + map[n].size;
 +remaining -= map[n].size;
 +n++;
 +}

It might be possible to write this more compactly with something like

   use = map[n].size < remaining ? map[n].size : remaining;
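
To illustrate the idea, here is a self-contained sketch built around that one expression. `struct region`, `struct range` and `carve()` are hypothetical stand-ins for the E820 entries, vmemranges and the loop in the patch. Unlike the quoted code, this sketch also advances to the next map entry when the current one is used up exactly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct region { uint64_t addr, size; };   /* stand-in for an E820 entry */
struct range  { uint64_t start, end; };   /* stand-in for a vmemrange */

/*
 * Carve `remaining` bytes out of map[], starting at entry *n.
 * Returns the number of ranges written to out[], or -1 if the map or the
 * output array is exhausted first.
 */
static int carve(struct region *map, size_t nr_map, size_t *n,
                 uint64_t remaining, struct range *out, size_t max_out)
{
    size_t x = 0;

    assert(map != NULL && n != NULL && out != NULL);
    while (remaining > 0) {
        if (*n >= nr_map || x >= max_out)
            return -1;

        uint64_t use = map[*n].size < remaining ? map[*n].size : remaining;

        if (use > 0) {
            out[x].start = map[*n].addr;
            out[x].end = map[*n].addr + use;
            x++;
        }
        map[*n].addr += use;
        map[*n].size -= use;
        remaining -= use;
        if (map[*n].size == 0)
            (*n)++;
    }
    return (int)x;
}
```

Both branches of the original if/else collapse into the same few statements, at the cost of one extra comparison to decide whether to move on to the next entry.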

Ian.


Re: [Xen-devel] [PATCH 02/11] VMX: implement suppress #VE.

On 01/12/2015 09:45 AM, Ed White wrote:
 On 01/12/2015 08:43 AM, Andrew Cooper wrote:
 On 09/01/15 21:26, Ed White wrote:
 In preparation for selectively enabling hardware #VE in a later patch,
 set suppress #VE on all EPTE's on #VE-capable hardware.

 Suppress #VE should always be the default condition for two reasons:
 it is generally not safe to deliver #VE into a guest unless that guest
 has been modified to receive it; and even then for most EPT violations only
 the hypervisor is able to handle the violation.

 Signed-off-by: Ed White edmund.h.wh...@intel.com
 ---
  xen/arch/x86/mm/p2m-ept.c | 34 +-
  xen/include/asm-x86/hvm/vmx/vmx.h |  1 +
  2 files changed, 34 insertions(+), 1 deletion(-)

 diff --git a/xen/arch/x86/mm/p2m-ept.c b/xen/arch/x86/mm/p2m-ept.c
 index eb8b5f9..2b9f07c 100644
 --- a/xen/arch/x86/mm/p2m-ept.c
 +++ b/xen/arch/x86/mm/p2m-ept.c
 @@ -41,7 +41,7 @@
  #define is_epte_superpage(ept_entry) ((ept_entry)->sp)
  static inline bool_t is_epte_valid(ept_entry_t *e)
  {
 -return (e->epte != 0 && e->sa_p2mt != p2m_invalid);
 +return (e->valid != 0 && e->sa_p2mt != p2m_invalid);
  }
  
  /* returns : 0 for success, -errno otherwise */
 @@ -194,6 +194,19 @@ static int ept_set_middle_entry(struct p2m_domain 
 *p2m, ept_entry_t *ept_entry)
  
  ept_entry->r = ept_entry->w = ept_entry->x = 1;
  
 +/* Disable #VE on all entries */ 
 +if ( cpu_has_vmx_virt_exceptions )
 +{
 +ept_entry_t *table = __map_domain_page(pg);
 +
 +for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )

 Style - please declare i in the upper scope, and it should be unsigned.

 +table[i].suppress_ve = 1;
 +
 +unmap_domain_page(table);
 +
 +ept_entry->suppress_ve = 1;
 +}
 +
  return 1;
  }
  
 @@ -243,6 +256,10 @@ static int ept_split_super_page(struct p2m_domain 
 *p2m, ept_entry_t *ept_entry,
  epte->sp = (level > 1);
  epte->mfn += i * trunk;
  epte->snp = (iommu_enabled && iommu_snoop);
 +
 +if ( cpu_has_vmx_virt_exceptions )
 +epte->suppress_ve = 1;
 +
  ASSERT(!epte->rsvd1);
  
  ept_p2m_type_to_flags(epte, epte->sa_p2mt, epte->access);
 @@ -753,6 +770,9 @@ ept_set_entry(struct p2m_domain *p2m, unsigned long 
 gfn, mfn_t mfn,
  ept_p2m_type_to_flags(new_entry, p2mt, p2ma);
  }
  
 +if ( cpu_has_vmx_virt_exceptions )
 +new_entry.suppress_ve = 1;
 +
  rc = atomic_write_ept_entry(ept_entry, new_entry, target);
  if ( unlikely(rc) )
  old_entry.epte = 0;
 @@ -1069,6 +1089,18 @@ int ept_p2m_init(struct p2m_domain *p2m)
  /* set EPT page-walk length, now it's actual walk length - 1, i.e. 3 */
  ept->ept_wl = 3;
  
 +/* Disable #VE on all entries */
 +if ( cpu_has_vmx_virt_exceptions )
 +{
 +ept_entry_t *table =
 +map_domain_page(pagetable_get_pfn(p2m_get_pagetable(p2m)));
 +
 +for ( int i = 0; i < EPT_PAGETABLE_ENTRIES; i++ )
 +table[i].suppress_ve = 1;

 Is it safe setting SVE on an entry which is not known to be a superpage
 or not present?  The manual states that the bit is ignored in this case,
 but I am concerned that, as with SVE, this bit will suddenly gain
 meaning in the future.

 
 It is safe to do this. Never say never, but I am aware of no plans to
 overload this bit, and I would know. Unless you feel strongly about it,
 I would prefer to leave this as-is, since changing it would make the code
 more complex.
 

One point that I should have clarified yesterday: the SDM says the bit is
ignored for a non-terminal present entry; the bit is not ignored for
non-present entries, which is why I have to set all the SVE bits in a new
page -- my lazy EPTE copying algorithm wouldn't work otherwise because all
the zero entries would generate #VE.
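
As a rough illustration of the two points above (SVE set on every entry of a new page, and a validity check that ignores bit 63), here is a simplified sketch. It treats an EPTE as a plain 64-bit value; `ept_table_suppress_ve()` and `epte_is_populated()` are hypothetical names, and the real is_epte_valid() additionally checks the p2m type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EPT_ENTRIES      512                    /* entries in one 4 KiB table */
#define EPTE_SUPPRESS_VE (UINT64_C(1) << 63)    /* bit 63, as in the patch */

/* Set suppress #VE on every entry, including ones that are still zero. */
static void ept_table_suppress_ve(uint64_t *table)
{
    assert(table != NULL);
    for (size_t i = 0; i < EPT_ENTRIES; i++)
        table[i] |= EPTE_SUPPRESS_VE;
}

/* An entry counts as populated only if something other than SVE is set. */
static bool epte_is_populated(uint64_t epte)
{
    return (epte & ~EPTE_SUPPRESS_VE) != 0;
}
```

With this, a freshly initialised table is non-zero in every slot but is still treated as empty, which is what the lazy copy relies on.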

Ed

 +
 +unmap_domain_page(table);
 +}
 +
  if ( !zalloc_cpumask_var(&ept->synced_mask) )
  return -ENOMEM;
  
 diff --git a/xen/include/asm-x86/hvm/vmx/vmx.h 
 b/xen/include/asm-x86/hvm/vmx/vmx.h
 index 8bae195..70fee74 100644
 --- a/xen/include/asm-x86/hvm/vmx/vmx.h
 +++ b/xen/include/asm-x86/hvm/vmx/vmx.h
 @@ -49,6 +49,7 @@ typedef union {
  suppress_ve :   1;  /* bit 63 - suppress #VE */
  };
  u64 epte;
 +u64 valid   :   63; /* entire EPTE except suppress #VE bit */

 I am not sure 'valid' is a sensible name here.  As it is only used in
 is_epte_valid(), might it be better to just use ->epte and a bitmask for
 everything other than the #VE bit?

 
 This seemed more in the style of the code I was changing, but I can do it
 as you suggest.
 
 Ed
 
  } ept_entry_t;
  
  typedef struct {




Re: [Xen-devel] [PATCH v3 00/14] Enable vTPM subsystem on TPM 2.0

2015-01-13 Thread Xu, Quan

 -Original Message-
 From: Daniel De Graaf [mailto:dgde...@tycho.nsa.gov]
 Sent: Tuesday, January 13, 2015 11:54 PM
 To: Xu, Quan; xen-devel@lists.xen.org
 Cc: stefano.stabell...@eu.citrix.com; samuel.thiba...@ens-lyon.org;
 ian.campb...@citrix.com; ian.jack...@eu.citrix.com; jbeul...@suse.com;
 k...@xen.org; t...@xen.org
 Subject: Re: [PATCH v3 00/14] Enable vTPM subsystem on TPM 2.0

 On 01/12/2015 11:06 AM, Xu, Quan wrote:
  Graaf,
  Now there are no more comments for this series of patch.
  Can this series of patch be merged in staging branch? or any other AR, let 
  me
 know.
  If the series of patch are in staging branch, the Community and I can 
  continue
 to develop and enhance it.

 A few remaining comments:

 Patch 6 adds an #if 0 block; is this test code that you meant to remove?

Thanks,
It is just an example of how to bind/unbind. I will remove it in v4 and send
out v4 ASAP.

 Patch 9 (see reply).

I will fix it.

 Are you planning to replace TPM2_Bind with TPM2_Seal in a later series?
 If so, please make a note of this limitation in the documentation for TPM2, 
 since
 using PCRs to seal the data can be an important security feature that users 
 of the
 vtpmmgr rely on.

Yes, I will replace TPM2_Bind with TPM2_Seal in a later series.

 For the other patches in this series (1-5,7-8,10):
 Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov

 With patch #14 documenting the lack of TPM2 sealing, #11-13 are also Acked.

I will fix patch #14 to document the lack of TPM2 sealing in v4.
Thanks again.

Quan

 - Daniel

  Thanks
  Quan

  -Original Message-
  From: Xu, Quan
  Sent: Wednesday, December 31, 2014 1:50 PM
  To: xen-devel@lists.xen.org
  Cc: dgde...@tycho.nsa.gov; stefano.stabell...@eu.citrix.com;
  samuel.thiba...@ens-lyon.org; ian.campb...@citrix.com;
  ian.jack...@eu.citrix.com; jbeul...@suse.com; k...@xen.org;
  t...@xen.org; Xu, Quan
  Subject: [PATCH v3 00/14] Enable vTPM subsystem on TPM 2.0

  ###
  # Happy New Year..#
  ###

  This series of patch enable the virtual Trusted Platform Module
  (vTPM) subsystem for Xen on TPM 2.0.

  Noted, functionality for a virtual guest operating system (a DomU) is
  still TPM 1.2. The main modification is on vtpmmgr-stubdom. The
  challenge is that TPM 2.0 is not backward compatible with TPM 1.2.

  --
  DESIGN OVERVIEW
  --
  The architecture of vTPM subsystem on TPM 2.0 is described below:

  +------------------+
  |    Linux DomU    | ...
  |       |  ^       |
  |       v  |       |
  |   xen-tpmfront   |
  +------------------+
          |  ^
          v  |
  +------------------+
  | mini-os/tpmback  |
  |       |  ^       |
  |       v  |       |
  |  vtpm-stubdom    | ...
  |       |  ^       |
  |       v  |       |
  | mini-os/tpmfront |
  +------------------+
          |  ^
          v  |
  +------------------+
  | mini-os/tpmback  |
  |       |  ^       |
  |       v  |       |
  | vtpmmgr-stubdom  |
  |       |  ^       |
  |       v  |       |
  | mini-os/tpm2_tis |
  +------------------+
          |  ^
          v  |
  +------------------+
  | Hardware TPM 2.0 |
  +------------------+
* Linux DomU: The Linux based guest that wants to use a vTPM. There may be
              more than one of these.

* xen-tpmfront.ko: Linux kernel virtual TPM frontend driver. This driver
                   provides vTPM access to a para-virtualized Linux based
                   DomU.

* mini-os/tpmback: Mini-os TPM backend driver. The Linux frontend driver
                   connects to this backend driver to facilitate
                   communications between the Linux DomU and its vTPM. This
                   driver is also used by vtpmmgr-stubdom to communicate with
                   vtpm-stubdom.

* vtpm-stubdom: A mini-os stub domain that implements a vTPM. There is a
                one to one mapping between running vtpm-stubdom instances and
                logical vtpms on the system. The vTPM Platform Configuration
                Registers (PCRs) are all initialized to zero.

* mini-os/tpmfront: Mini-os TPM frontend driver. The vTPM mini-os domain
                    vtpm-stubdom uses this driver to communicate with
                    vtpmmgr-stubdom. This driver could also be used
                    separately to implement a mini-os domain that wishes to
                    use a vTPM of its own.

* vtpmmgr-stubdom: A mini-os domain that implements the vTPM manager.
                   There is only one vTPM manager and it should be running
                   during the entire lifetime of the machine.  This domain
                   regulates access to the physical TPM on the system and
                   secures the persistent state of each vTPM.

* mini-os/tpm2_tis: Mini-os TPM version 2.0 TPM Interface Specification

Re: [Xen-devel] [Xen-users] [TestDay] minor bug + possible configuration bug 4.5rc4 archlinux

2015-01-13 Thread Olaf Hering

On Tue, Jan 13, Doug McMillan wrote:

 Also quick question if I am understanding my remaining issue [tmpfs: Bad mount
 option context] as described by a previous thread. Until
 the code that generates it changes I need to manually change
 var-lib-xenstored.mount from
...
 Or am I misunderstanding that also??

No, that's fine. The context= is already removed for 4.5.0.

Olaf


Re: [Xen-devel] (v2) Design proposal for RMRR fix

2015-01-13 Thread Tian, Kevin

 From: Jan Beulich [mailto:jbeul...@suse.com]
 Sent: Wednesday, January 14, 2015 12:06 AM
 
  On 13.01.15 at 17:00, george.dun...@eu.citrix.com wrote:
  Another option I was thinking about: Before assigning a device to a
  guest, you have to unplug the device and assign it to pci-back (e.g.,
  with xl pci-assignable-add).  In addition to something like rmrr=host,
  we could add rmrr=assignable, which would add all of the RMRRs of all
  devices currently listed as assignable.  The idea would then be that
  you first make all your devices assignable, then just start your guests,
  and everything you've made assignable will be able to be assigned.
 
 Nice idea indeed, but I'm not sure about its practicability: It may
 not be desirable to make all devices eventually to be handed to a
 guest prior to starting any of the guests it may get handed to. In
 particular there may be reasons why the host needs the device
 while (or until after) creating the guests.
 

and I'm not sure whether there's enough knowledge to judge whether 
a device is assignable since potential conflicts may be detected only
when the guest is launched.

Thanks
Kevin


Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m

 Ed White edmund.h.wh...@intel.com 01/13/15 9:03 PM 
On 01/13/2015 11:01 AM, Andrew Cooper wrote:
 One thing I have noticed while looking at the #VE stuff is that EPT also
 supports A/D tracking, which might be quite a nice optimisation and
 forgo the need for p2m_ram_logdirty, but I think this should be treated
 as an orthogonal item.
 
This is far from my area of expertise, but I believe there is code in Xen
to use EPT D bits in migration.

There once was a patch series, but upon asking on the (performance)
benefits, the submitting engineer stated that there was no measurable
improvement, and hence the series never got applied. Right now PML
is being worked on afaik, which from what I can tell will make it a lot
easier (compared to scanning the whole tree for set D bits) to collect
the modified bitmap when the tool stack asks for it.

Jan



Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m

 Ed White edmund.h.wh...@intel.com 01/13/15 10:32 PM 
On 01/13/2015 12:45 PM, Andrew Cooper wrote:
 On 13/01/15 20:02, Ed White wrote:
 The set of mfn's is the same, but I do allow gfn->mfn mappings to be
 modified under certain circumstances. One use of this is to point the
 same VA to different physical pages (with different access permissions)
 in different p2m's to hide memory changes.
 
 What is the practical use of being able to play paging tricks like this
 behind a VM's back?
 
I'm restricted in how much detail I can go into on a public mailing list,
but imagine that you want a data read to see one thing and an instruction
fetch to see something else.

How would that work? There can only be one P2M in use at a time, and that's
used for both translations. Or are you saying at least one of the two accesses
would be emulated nevertheless?

Jan



Re: [Xen-devel] [CALL-FOR-AGENDA] Monthly Xen.org Technical Call (2015-01-14)

On Wed, 2015-01-07 at 15:32 +, Ian Campbell wrote:
 The first Xen technical call will be at:
 Wed 14 Jan 17:00:00 GMT 2015
 `date -d @1421254800`
 
 See http://lists.xen.org/archives/html/xen-devel/2015-01/msg00414.html
 for more information on the call.

In the absence of any further information from Konrad on the plans for
the retrospective there are no agenda items and therefore no call
tomorrow.

Thanks,
Ian.




Re: [Xen-devel] [PATCH v3 03/19] libxc: allocate memory with vNUMA information for PV guest

2015-01-13 Thread Andrew Cooper

On 13/01/15 12:11, Wei Liu wrote:
 From libxc's point of view, it only needs to know vnode to pnode mapping
 and size of each vnode to allocate memory accordingly. Add these fields
 to xc_dom structure.

 The caller might not pass in vNUMA information. In that case, a dummy
 layout is generated for the convenience of libxc's allocation code. The
 upper layer (libxl etc) still sees the domain has no vNUMA
 configuration.

 Signed-off-by: Wei Liu wei.l...@citrix.com
 Cc: Ian Campbell ian.campb...@citrix.com
 Cc: Ian Jackson ian.jack...@eu.citrix.com
 Cc: Dario Faggioli dario.faggi...@citrix.com
 Cc: Elena Ufimtseva ufimts...@gmail.com
 ---
 Changes in v3:
 1. Rewrite commit log.
 2. Shorten some error messages.
 ---
  tools/libxc/include/xc_dom.h |5 +++
  tools/libxc/xc_dom_x86.c |   79 
 --
  tools/libxc/xc_private.h |2 ++
  3 files changed, 75 insertions(+), 11 deletions(-)

 diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
 index 07d7224..c459e77 100644
 --- a/tools/libxc/include/xc_dom.h
 +++ b/tools/libxc/include/xc_dom.h
 @@ -167,6 +167,11 @@ struct xc_dom_image {
  struct xc_dom_loader *kernel_loader;
  void *private_loader;
  
 +/* vNUMA information */
 +unsigned int *vnode_to_pnode; /* vnode to pnode mapping array */
 +uint64_t *vnode_size; /* vnode size array */

Please make it very clear in the comment here that size is in MB (at
least I presume so, given the shifts by 20).  There are currently no
specified units.

 +unsigned int nr_vnodes;   /* number of elements of above arrays */
 +
  /* kernel loader */
  struct xc_dom_arch *arch_hooks;
  /* allocate up to virt_alloc_end */
 diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
 index bf06fe4..06a7e54 100644
 --- a/tools/libxc/xc_dom_x86.c
 +++ b/tools/libxc/xc_dom_x86.c
 @@ -759,7 +759,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
  int arch_setup_meminit(struct xc_dom_image *dom)
  {
  int rc;
 -xen_pfn_t pfn, allocsz, i, j, mfn;
 +xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
 +int i, j;
  
  rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
  if ( rc )
 @@ -811,18 +812,74 @@ int arch_setup_meminit(struct xc_dom_image *dom)
  /* setup initial p2m */
  for ( pfn = 0; pfn < dom->total_pages; pfn++ )
  dom->p2m_host[pfn] = pfn;
 -
 +
 +/* Setup dummy vNUMA information if it's not provided. Note
 + * that this is a valid state if libxl doesn't provide any
 + * vNUMA information.
 + *
 + * In this case we setup some dummy value for the convenience
 + * of the allocation code. Note that from the user's PoV the
 + * guest still has no vNUMA configuration.
 + */
 +if ( dom->nr_vnodes == 0 )
 +{
 +dom->nr_vnodes = 1;
 +dom->vnode_to_pnode = xc_dom_malloc(dom,
 +sizeof(*dom->vnode_to_pnode));
 +dom->vnode_to_pnode[0] = XC_VNUMA_NO_NODE;
 +dom->vnode_size = xc_dom_malloc(dom, sizeof(*dom->vnode_size));
 +dom->vnode_size[0] = (dom->total_pages << PAGE_SHIFT) >> 20;
 +}
 +
 +total = 0;
 +for ( i = 0; i < dom->nr_vnodes; i++ )
 +total += ((dom->vnode_size[i] << 20) >> PAGE_SHIFT);

Can I suggest a mb_to_pages() helper rather than opencoding this in
several locations.
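
Something like the following, as a minimal sketch. `PAGE_SHIFT` is assumed to be 12 here, and `pages_to_mb()` is an extra hypothetical helper added only to show the rounding.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumption: 4 KiB pages, as on x86 */

/* Convert a size in MiB to a number of pages. */
static inline uint64_t mb_to_pages(uint64_t mb)
{
    assert(mb <= (UINT64_MAX >> 20));   /* the shift below must not overflow */
    return (mb << 20) >> PAGE_SHIFT;
}

/* Convert a page count to whole MiB; any partial MiB is rounded down. */
static inline uint64_t pages_to_mb(uint64_t pages)
{
    assert(pages <= (UINT64_MAX >> PAGE_SHIFT));
    return (pages << PAGE_SHIFT) >> 20;
}
```

Note that the round trip loses pages whenever the page count is not a whole number of MiB, so a total_pages comparison built on MiB sizes fails for such domains.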

 +if ( total != dom->total_pages )
 +{
 +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
 + "%s: vNUMA page count mismatch (0x%"PRIpfn" != 0x%"PRIpfn")\n",
 + __FUNCTION__, total, dom->total_pages);

__func__ please.  It is part of C99 unlike __FUNCTION__ which is a gnuism.

andrewcoop:xen.git$ git grep  __FUNCTION__ | wc -l
230
andrewcoop:xen.git$ git grep  __func__ | wc -l
194

Looks like the codebase is very mixed, but best to err on the side of
the standard.

 +return -EINVAL;
 +}
 +
  /* allocate guest memory */
 -for ( i = rc = allocsz = 0;
 -  (i < dom->total_pages) && !rc;
 -  i += allocsz )
 +pfn_base = 0;
 +for ( i = 0; i < dom->nr_vnodes; i++ )
  {
 -allocsz = dom->total_pages - i;
 -if ( allocsz > 1024*1024 )
 -allocsz = 1024*1024;
 -rc = xc_domain_populate_physmap_exact(
 -dom->xch, dom->guest_domid, allocsz,
 -0, 0, &dom->p2m_host[i]);
 +unsigned int memflags;
 +uint64_t pages;
 +
 +memflags = 0;
 +if ( dom->vnode_to_pnode[i] != XC_VNUMA_NO_NODE )
 +{
 +memflags |= XENMEMF_exact_node(dom->vnode_to_pnode[i]);
 +memflags |= XENMEMF_exact_node_request;
 +}
 +
 +pages = (dom->vnode_size[i] << 20) >> PAGE_SHIFT;
 +
 +for ( j = 0;

Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m

On 01/13/2015 11:01 AM, Andrew Cooper wrote:
 On 09/01/15 21:26, Ed White wrote:
 This set of patches adds support to hvm domains for EPTP switching by
 creating multiple copies of the host p2m (currently limited to 10 copies).

 The primary use of this capability is expected to be in scenarios where
 access to memory needs to be monitored and/or restricted below the level at
 which the guest OS page tables operate. Two examples that were discussed at
 the 2014 Xen developer summit are:

 VM introspection:
 http://www.slideshare.net/xen_com_mgr/zero-footprint-guest-memory-introspection-from-xen

 Secure inter-VM communication:
 http://www.slideshare.net/xen_com_mgr/nakajima-nvf

 Each p2m copy is populated lazily on EPT violations, and only contains
 entries for ram p2m types. Permissions for pages in alternate p2m's can be
 changed in a similar way to the existing memory access interface, and
 gfn->mfn mappings can be changed.

 All this is done through extra HVMOP types.

 The cross-domain HVMOP code has been compile-tested only. Also, the
 cross-domain code is hypervisor-only, the toolstack has not been modified.

 The intra-domain code has been tested. Violation notifications can only be
 received for pages that have been modified (access permissions and/or
 gfn->mfn mapping) intra-domain, and only on VCPU's that have enabled
 notification.

 VMFUNC and #VE will both be emulated on hardware without native support.

 This code is not compatible with nested hvm functionality and will refuse to
 work with nested hvm active. It is also not compatible with migration. It
 should be considered experimental.
 
 Having reviewed most of the series, I believe I now have a feeling for
 what you are trying to achieve, but I would like to discuss some of the
 design implications.
 
 The following is my understanding of the situation.  Please correct me
 if I have made a mistake.
 
 

Thanks for investing the time to do this. Maybe this first couple of days
would have gone more smoothly if something like this was in the cover letter.

With the exception of a couple of minor points, you are spot on.

 Currently, a domain has a single host p2m.  This contains the guest
 physical address mappings, and a combination of p2m types which are used
 by existing components to allow certain actions to happen.  All vcpus
 run with the same host p2m.
 
 A domain may have a number of nested p2ms (currently an arbitrary limit
 of 10).  These are used for nested-virt and are translated by the host
 p2m.  Vcpus in guest mode run under a nested p2m.
 
 This new altp2m infrastructure adds the ability to use a different set
 of tables in the place of the host p2m.  This, in practice, allows for
 different translations, different p2m types, different access permissions. 
 
 One usecase of alternate p2ms is to provide introspection information to
 out-of-guest entities (via the mem_event interface) or to in-guest
 entities (via #VE).
 
 
 Now for some observations and assumptions.
 
 It occurs to me that the altp2m mechanism is generic.  From the look of
 the series, it is mostly implemented in a generic way, which is great. 
 The only Intel specific bits appear to be the ept handling itself,
 'vmfunc' instruction support and #VE injection to in-guest entities. 
 

That was my intention. I don't know enough about the state of AMD
virtualization to know if it can support these patches by emulating
vmfunc and #VE, but that was my target.

 I can't think of any reasonable case where the alternate p2m would want
 mappings different to the host p2m.  That is to say, an altp2m will map
 the same set of mfns to make a guest physical address space, but may
 differ in page permissions and possibly p2m types.
 

The set of mfn's is the same, but I do allow gfn->mfn mappings to be
modified under certain circumstances. One use of this is to point the
same VA to different physical pages (with different access permissions)
in different p2m's to hide memory changes.
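
As an illustration only, the lazy population and per-view remapping described above can be modelled with a toy table in plain C. This is a sketch under stated assumptions, not code from the series: every name here (toy_p2m, toy_altp2m_lookup, toy_altp2m_remap, the 8-entry address space) is hypothetical, and the "EPT violation" is simulated by a lookup that finds an unpopulated entry.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NR_GFNS  8            /* hypothetical tiny guest address space */
#define TOY_INVALID  UINT64_MAX   /* marks an entry that is not yet populated */

enum toy_access { TOY_RWX, TOY_RX, TOY_R };

struct toy_entry {
    uint64_t mfn;
    enum toy_access access;
};

/* Stand-in for a p2m: one entry per guest frame number. */
struct toy_p2m {
    struct toy_entry e[TOY_NR_GFNS];
};

/* A new alternate view starts empty: every entry is unpopulated. */
static void toy_altp2m_init(struct toy_p2m *alt)
{
    for (unsigned int g = 0; g < TOY_NR_GFNS; g++) {
        alt->e[g].mfn = TOY_INVALID;
        alt->e[g].access = TOY_RWX;
    }
}

/* Simulated EPT violation: copy the host entry on first touch. */
static uint64_t toy_altp2m_lookup(struct toy_p2m *alt,
                                  const struct toy_p2m *host,
                                  unsigned int gfn)
{
    assert(gfn < TOY_NR_GFNS);
    if (alt->e[gfn].mfn == TOY_INVALID)
        alt->e[gfn] = host->e[gfn];
    return alt->e[gfn].mfn;
}

/* Remap a gfn in one view only; the host p2m is left untouched. */
static void toy_altp2m_remap(struct toy_p2m *alt, unsigned int gfn,
                             uint64_t new_mfn, enum toy_access access)
{
    assert(gfn < TOY_NR_GFNS);
    alt->e[gfn].mfn = new_mfn;
    alt->e[gfn].access = access;
}
```

In this model the remap target is expected to be an mfn that already backs some gfn in the host table, which matches the statement that the set of mfn's stays the same while individual gfn->mfn mappings may differ per view.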

 Given the above restriction, I believe a lot of the existing features
 can continue to work and coexist.  For generating mem_events, the
 permissions can be altered in the altp2m.  For injecting #VE, the altp2m
 type can change to the new p2m_ram_rw, so long as the host p2m type is
 compatible.  For both, a vmexit can occur.  Xen can do the appropriate
 action and also inject a #VE on its way back into the guest.
 
 One thing I have noticed while looking at the #VE stuff is that EPT also
 supports A/D tracking, which might be quite a nice optimisation and
 forgo the need for p2m_ram_logdirty, but I think this should be treated
 as an orthogonal item.
 

This is far from my area of expertise, but I believe there is code in Xen
to use EPT D bits in migration.

Ed

 When shared ept/iommu is not in use, altp2m can safely be used by vcpus,
 as this will not interfere with the IOMMU permissions.
 
 Furthermore, I can't conceptually think of an

Re: [Xen-devel] [PATCH Linux-2.6.18] scsifront: avoid aquiring same lock twice if ring is full

2015-01-13 Thread Pasi Kärkkäinen

Hi,

On Tue, Jan 13, 2015 at 05:22:58PM +0100, Juergen Gross wrote:
 The locking in scsifront_dev_reset_handler() is obviously wrong. In
 case of a full ring the host lock is acquired twice.
 
 Fixing this issue makes it possible to get rid of the endless for loop
 with an explicit break statement.
 

Is this patch needed in the upstream Linux kernel as well, now that the Xen
PVSCSI drivers are in upstream Linux?


Thanks,

-- Pasi


 Signed-off-by: Juergen Gross jgr...@suse.com
 ---
 
 diff -r 078f1bb69ea5 drivers/xen/scsifront/scsifront.c
 --- a/drivers/xen/scsifront/scsifront.c   Wed Dec 10 10:22:39 2014 +0100
 +++ b/drivers/xen/scsifront/scsifront.c   Tue Jan 13 14:32:33 2015 +0100
 @@ -447,12 +447,10 @@ static int scsifront_dev_reset_handler(s
   uint16_t rqid;
   int err = 0;
  
 - for (;;) {
  #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,12)
 - spin_lock_irq(host->host_lock);
 + spin_lock_irq(host->host_lock);
  #endif
 - if (!RING_FULL(&info->ring))
 - break;
 + while (RING_FULL(&info->ring)) {
   if (err) {
  #if LINUX_VERSION_CODE >= KERNEL_VERSION(2,6,12)
   spin_unlock_irq(host->host_lock);
 
Re: [Xen-devel] [PATCH v2 2/2] x86, arm, platform, xen, kconfig: add xen defconfig helper

2015-01-13 Thread Luis R. Rodriguez

On Mon, Dec 15, 2014 at 02:58:26PM +, Stefano Stabellini wrote:
 On Tue, 9 Dec 2014, Luis R. Rodriguez wrote:
  From: Luis R. Rodriguez mcg...@suse.com
  
  This lets you build a kernel which can support xen dom0
  or xen guests by just using:
  
 make xenconfig
  
  on both x86 and arm64 kernels. This also splits out the
  options which are available currently to be built with x86
  and 'make ARCH=arm64' under a shared config.
  
  Technically xen supports a dom0 kernel and also a guest
  kernel configuration, but upon review with the xen team,
  since we don't have many dom0 options, it's best to just
  combine these two into one.
  
  Cc: Josh Triplett j...@joshtriplett.org
  Cc: Borislav Petkov b...@suse.de
  Cc: Pekka Enberg penb...@kernel.org
  Cc: David Rientjes rient...@google.com
  Cc: Michal Marek mma...@suse.cz
  Cc: Randy Dunlap rdun...@infradead.org
  Cc: penb...@kernel.org
  Cc: levinsasha...@gmail.com
  Cc: mtosa...@redhat.com
  Cc: fengguang...@intel.com
  Cc: David Vrabel david.vra...@citrix.com
  Cc: Ian Campbell ian.campb...@citrix.com
  Cc: Konrad Rzeszutek Wilk konrad.w...@oracle.com
  Cc: xen-de...@lists.xenproject.org
  Reviewed-by: Josh Triplett j...@joshtriplett.org
  Signed-off-by: Luis R. Rodriguez mcg...@suse.com
  ---
   arch/x86/configs/xen.config |  7 +++
   kernel/configs/xen.config   | 30 ++
   scripts/kconfig/Makefile|  5 +
   3 files changed, 42 insertions(+)
   create mode 100644 arch/x86/configs/xen.config
   create mode 100644 kernel/configs/xen.config
  
  diff --git a/arch/x86/configs/xen.config b/arch/x86/configs/xen.config
  new file mode 100644
  index 000..92b8587f
  --- /dev/null
  +++ b/arch/x86/configs/xen.config
  @@ -0,0 +1,7 @@
  +# x86 xen specific config options
  +CONFIG_XEN_PVHVM=y
  +CONFIG_XEN_MAX_DOMAIN_MEMORY=500
  +CONFIG_XEN_SAVE_RESTORE=y
  +# CONFIG_XEN_DEBUG_FS is not set
  +CONFIG_XEN_PVH=y
  +CONFIG_XEN_MCE_LOG=y
  diff --git a/kernel/configs/xen.config b/kernel/configs/xen.config
  new file mode 100644
  index 000..d2ec010
  --- /dev/null
  +++ b/kernel/configs/xen.config
  @@ -0,0 +1,30 @@
  +# generic config
  +CONFIG_XEN=y
  +CONFIG_XEN_DOM0=y
  +CONFIG_PCI_XEN=y
 
 This shouldn't be here

If PCI is not supported on the arch this won't be selected as kconfig would not
allow for it, what would be the issue of keeping it here? What xen instances
would we not want to have this enabled for and can we instead manage that
through Kconfig magic by negating PCI_XEN for it?

  +CONFIG_XEN_PCIDEV_FRONTEND=m
  +CONFIG_XEN_BLKDEV_FRONTEND=m
  +CONFIG_XEN_BLKDEV_BACKEND=m
  +CONFIG_XEN_NETDEV_FRONTEND=m
  +CONFIG_XEN_NETDEV_BACKEND=m
  +CONFIG_INPUT_XEN_KBDDEV_FRONTEND=y
  +CONFIG_HVC_XEN=y
  +CONFIG_HVC_XEN_FRONTEND=y
  +CONFIG_TCG_XEN=m
 
 neither should this

OK!

  +CONFIG_XEN_WDT=m
  +CONFIG_XEN_FBDEV_FRONTEND=y
  +CONFIG_XEN_BALLOON=y
  +CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
  +CONFIG_XEN_SCRUB_PAGES=y
  +CONFIG_XEN_DEV_EVTCHN=m
  +CONFIG_XEN_BACKEND=y
  +CONFIG_XENFS=m
  +CONFIG_XEN_COMPAT_XENFS=y
  +CONFIG_XEN_SYS_HYPERVISOR=y
  +CONFIG_XEN_XENBUS_FRONTEND=y
  +CONFIG_XEN_GNTDEV=m
  +CONFIG_XEN_GRANT_DEV_ALLOC=m
  +CONFIG_SWIOTLB_XEN=y
  +CONFIG_XEN_PCIDEV_BACKEND=m
  +CONFIG_XEN_PRIVCMD=m
  +CONFIG_XEN_ACPI_PROCESSOR=m
 
 and this

OK!

  Luis

[Xen-devel] --enable-xsm ?

2015-01-13 Thread Konrad Rzeszutek Wilk

Hey

I was wondering if there would be any plans for configure.ac
(or the m4 scripts) to have an --enable-xsm which would set
XSM_ENABLE (or FLASK_ENABLE) to true?

Right now by default to build with XSM one has to manually change
the Config.mk ENABLE_XSM option to 'y'.

Thanks.

Re: [Xen-devel] [PATCH 04/11] x86/MM: Improve p2m type checks.

On 01/12/2015 09:48 AM, Andrew Cooper wrote:
 On 09/01/15 21:26, Ed White wrote:
 diff --git a/xen/include/asm-x86/p2m.h b/xen/include/asm-x86/p2m.h
 index 5f7fe71..8193901 100644
 --- a/xen/include/asm-x86/p2m.h
 +++ b/xen/include/asm-x86/p2m.h
 @@ -193,6 +193,9 @@ struct p2m_domain {
   * threaded on in LRU order. */
  struct list_head   np2m_list;
  
 +/* Does this p2m belong to the altp2m code? */
 +bool_t alternate;
 +
  /* Host p2m: Log-dirty ranges registered for the domain. */
  struct rangeset   *logdirty_ranges;
  
 @@ -290,7 +293,9 @@ struct p2m_domain *p2m_get_nestedp2m(struct vcpu *v, 
 uint64_t np2m_base);
   */
  struct p2m_domain *p2m_get_p2m(struct vcpu *v);
  
 -#define p2m_is_nestedp2m(p2m)   ((p2m) != p2m_get_hostp2m((p2m->domain)))
 +#define p2m_is_hostp2m(p2m)   ((p2m) == p2m_get_hostp2m((p2m->domain)))
 +#define p2m_is_altp2m(p2m)((p2m)->alternate)
 +#define p2m_is_nestedp2m(p2m) (!p2m_is_altp2m(p2m) && !p2m_ishostp2m(p2m))
 
 Might this be better expressed as a p2m type, currently of the set
 {host, alt, nested} ?  p2m_is_nestedp2m() is starting to hide some
 moderately complicated calculations.
 

Any suggestions for the name? Unfortunately, p2m_type is already
taken, and I can't think of a good alternative.

Ed
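
For readers following the review suggestion above, here is a minimal sketch of what a {host, alt, nested} classification could look like if it were stored as a field rather than computed. All names (toy_p2m_class, toy_p2m_domain, the toy_p2m_is_* helpers) are hypothetical and chosen only for this illustration; the thread had not agreed on a name at this point, and this is not the code that was proposed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names: the thread had not settled on any at this point. */
enum toy_p2m_class {
    TOY_P2M_HOST,
    TOY_P2M_NESTED,
    TOY_P2M_ALT
};

/* Stand-in for struct p2m_domain, reduced to the one field of interest. */
struct toy_p2m_domain {
    enum toy_p2m_class p2m_class;   /* set once when the p2m is created */
};

static int toy_p2m_is_hostp2m(const struct toy_p2m_domain *p2m)
{
    assert(p2m != NULL);
    return p2m->p2m_class == TOY_P2M_HOST;
}

static int toy_p2m_is_altp2m(const struct toy_p2m_domain *p2m)
{
    assert(p2m != NULL);
    return p2m->p2m_class == TOY_P2M_ALT;
}

static int toy_p2m_is_nestedp2m(const struct toy_p2m_domain *p2m)
{
    assert(p2m != NULL);
    return p2m->p2m_class == TOY_P2M_NESTED;
}
```

With a field like this, each predicate is a single comparison and no longer depends on comparing against the domain's host p2m pointer or on negating the other two predicates.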

[Xen-devel] [PATCH] xen/arm: Blacklist the memory mapped timer (armv7-timer-mem)

Some platforms (such as the VFP Base AEMv8 model) have a memory-mapped
timer. We don't want DOM0 to use this timer rather than the generic ARM
timer, so blacklist it for all platforms.

Signed-off-by: Julien Grall julien.gr...@linaro.org

---
This patch is a candidate for backporting to Xen 4.5 and Xen 4.4.

It may not apply correctly for Xen 4.4.
---
 xen/arch/arm/domain_build.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index bf8dc78..16ce248 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1047,6 +1047,7 @@ static int handle_node(struct domain *d, struct 
kernel_info *kinfo,
 DT_MATCH_COMPATIBLE("arm,psci"),
 DT_MATCH_PATH("/cpus"),
 DT_MATCH_TYPE("memory"),
+DT_MATCH_COMPATIBLE("arm,armv7-timer-mem"),
 { /* sentinel */ },
 };
 static const struct dt_device_match gic_matches[] __initconst =
-- 
2.1.4
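
The effect of adding an entry to a sentinel-terminated match table can be shown with a small stand-alone sketch. It assumes a simplified node with a single compatible string; toy_dt_node, toy_skip_compatible and toy_node_is_blacklisted are hypothetical names for this illustration and are not Xen's device tree API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a device tree node; not Xen's struct dt_device_node. */
struct toy_dt_node {
    const char *compatible;   /* single compatible string, may be NULL */
};

/* Compatible strings that should not be passed through to DOM0. */
static const char *const toy_skip_compatible[] = {
    "arm,psci",
    "arm,armv7-timer-mem",
    NULL /* sentinel */
};

/* Return 1 if the node must be hidden from DOM0, 0 otherwise. */
static int toy_node_is_blacklisted(const struct toy_dt_node *node)
{
    assert(node != NULL);
    if (node->compatible == NULL)
        return 0;
    for (size_t i = 0; toy_skip_compatible[i] != NULL; i++)
        if (strcmp(node->compatible, toy_skip_compatible[i]) == 0)
            return 1;
    return 0;
}
```

A node carrying "arm,armv7-timer-mem" is then skipped, while a node with a compatible string that is not in the table is still passed through.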


Re: [Xen-devel] [PATCH v2 2/2] x86, arm, platform, xen, kconfig: add xen defconfig helper

Hello Luis,

On 13/01/15 19:03, Luis R. Rodriguez wrote:
 diff --git a/kernel/configs/xen.config b/kernel/configs/xen.config
 new file mode 100644
 index 000..d2ec010
 --- /dev/null
 +++ b/kernel/configs/xen.config
 @@ -0,0 +1,30 @@
 +# generic config
 +CONFIG_XEN=y
 +CONFIG_XEN_DOM0=y
 +CONFIG_PCI_XEN=y
 +CONFIG_XEN_PCIDEV_FRONTEND=m
 +CONFIG_XEN_BLKDEV_FRONTEND=m
 +CONFIG_XEN_BLKDEV_BACKEND=m
 +CONFIG_XEN_NETDEV_FRONTEND=m
 +CONFIG_XEN_NETDEV_BACKEND=m
 +CONFIG_INPUT_XEN_KBDDEV_FRONTEND=y
 +CONFIG_HVC_XEN=y
 +CONFIG_HVC_XEN_FRONTEND=y
 +CONFIG_TCG_XEN=m
 +CONFIG_XEN_WDT=m
 +CONFIG_XEN_FBDEV_FRONTEND=y
 +CONFIG_XEN_BALLOON=y
 +CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
 +CONFIG_XEN_SCRUB_PAGES=y
 +CONFIG_XEN_DEV_EVTCHN=m
 +CONFIG_XEN_BACKEND=y
 +CONFIG_XENFS=m
 +CONFIG_XEN_COMPAT_XENFS=y
 +CONFIG_XEN_SYS_HYPERVISOR=y
 +CONFIG_XEN_XENBUS_FRONTEND=y
 +CONFIG_XEN_GNTDEV=m
 +CONFIG_XEN_GRANT_DEV_ALLOC=m
 +CONFIG_SWIOTLB_XEN=y
 +CONFIG_XEN_PCIDEV_BACKEND=m
 +CONFIG_XEN_PRIVCMD=m
 +CONFIG_XEN_ACPI_PROCESSOR=m

 The common fragment config looks good for both ARM32 and ARM64:

 Acked-by: Julien Grall julien.gr...@linaro.org
 
 Can someone apply this? Who should this go through?

Stefano had some comments on this patch. See:

http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg01531.html

Regards,

-- 
Julien Grall

Re: [Xen-devel] [PATCH v3 01/19] xen: dump vNUMA information with debug key u

2015-01-13 Thread Andrew Cooper

On 13/01/15 12:11, Wei Liu wrote:
 @@ -408,6 +413,49 @@ static void dump_numa(unsigned char key)
  
  for_each_online_node ( i )
  printk("Node %u: %u\n", i, page_num_node[i]);
 +
 +if ( !d->vnuma )
 +continue;

Nit - extraneous whitespace.

 +
 +vnuma = d->vnuma;
 +printk(" %u vnodes, %u vcpus:\n", vnuma->nr_vnodes, d->max_vcpus);
 +for ( i = 0; i < vnuma->nr_vnodes; i++ )
 +{
 +err = snprintf(keyhandler_scratch, 12, "%3u",
 +vnuma->vnode_to_pnode[i]);
 +if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
 +strlcpy(keyhandler_scratch, "???", 3);
 +
 +printk("   vnode  %3u - pnode %s\n", i, keyhandler_scratch);
 +for ( j = 0; j < vnuma->nr_vmemranges; j++ )
 +{
 +if ( vnuma->vmemrange[j].nid == i )
 +{
 +printk(" %016"PRIx64" - %016"PRIx64"\n",
 +   vnuma->vmemrange[j].start,
 +   vnuma->vmemrange[j].end);
 +}
 +}
 +
 +printk("   vcpus: ");
 +for ( j = 0, n = 0; j < d->max_vcpus; j++ )
 +{
 +if ( !(j & 0x3f) )
 +process_pending_softirqs();
 +
 +if ( vnuma->vcpu_to_vnode[j] == i )
 +{
 +if ( (n + 1) % 8 == 0 )
 +printk("%3d\n", j);
 +else if ( !(n % 8) && n != 0 )
 +printk("%17d ", j);
 +else
 +printk("%3d ", j);
 +n++;
 +}
 +}

Do you have a sample of what this looks like for a VM with more than 8
vcpus ?

~Andrew

 +}
 +printk(\n);
 +}
  }
  
  rcu_read_unlock(domlist_read_lock);
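
To see what the vcpu list looks like with more than 8 vcpus without booting a hypervisor, the three-way format choice from the hunk above can be replayed in user space. This is only a sketch: toy_format_vcpus and its buffer-based interface are hypothetical, and the leading "vcpus:" label is left out because its exact width is not recoverable from the quoted text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append the vcpu list for one vnode to buf, using the same three-way
 * format choice as the quoted hunk. vcpu_to_vnode[] and the output
 * buffer are stand-ins for the hypervisor data and printk().
 */
static void toy_format_vcpus(char *buf, size_t len,
                             const unsigned int *vcpu_to_vnode,
                             unsigned int max_vcpus, unsigned int vnode)
{
    size_t off = 0;
    unsigned int j, n;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (j = 0, n = 0; j < max_vcpus && off < len; j++) {
        int w;

        if (vcpu_to_vnode[j] != vnode)
            continue;
        if ((n + 1) % 8 == 0)           /* eighth entry: end the row */
            w = snprintf(buf + off, len - off, "%3d\n", (int)j);
        else if (!(n % 8) && n != 0)    /* first entry of a later row */
            w = snprintf(buf + off, len - off, "%17d ", (int)j);
        else
            w = snprintf(buf + off, len - off, "%3d ", (int)j);
        if (w < 0)
            break;
        off += (size_t)w;
        n++;
    }
}
```

For ten vcpus on the same vnode this yields one full row of eight entries ending in a newline, followed by a second row whose first entry is right-aligned in a 17-column field so that it can line up under the first row once the label is printed in front.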



Re: [Xen-devel] [PULL 0/2] Xen tree 2015-01-13

2015-01-13 Thread Peter Maydell

On 13 January 2015 at 18:24, Stefano Stabellini
stefano.stabell...@eu.citrix.com wrote:
 The following changes since commit 7d5ad15d17f26dd4f9ff5f3491828bc34e74f28c:

   Merge remote-tracking branch 'remotes/stefanha/tags/net-pull-request' into 
 staging (2015-01-12 11:13:24 +)

 are available in the git repository at:


   git://xenbits.xen.org/people/sstabellini/qemu-dm.git xen-2015-01-13

 for you to fetch changes up to c1d322e6048796296555dd36fdd102d7fa2f50bf:

   xen-hvm: increase maxmem before calling xc_domain_populate_physmap 
 (2015-01-13 18:05:52 +)

 
 Liang Li (1):
   xen-pt: Fix PCI devices re-attach failed

 Stefano Stabellini (1):
   xen-hvm: increase maxmem before calling xc_domain_populate_physmap

Applied, thanks.

-- PMM

Re: [Xen-devel] Architecture for dom0 integrity measurements.

2015-01-13 Thread Dr. Greg Wettstein

On Jan 12,  3:53pm, Xu, Quan wrote:
} Subject: RE: [Xen-devel] Architecture for dom0 integrity measurements.

 Hi, Dr. G.W. Wettstein

Hi Quan, thanks for taking the time to reply.

 cc Graaf who is vTPM / XSM maintainer.
 Also cc Stefano.

Greetings to everyone else as well.

  -Original Message-
  From: xen-devel-boun...@lists.xen.org
  [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of Dr. Greg Wettstein
  Sent: Saturday, January 10, 2015 10:59 PM
  To: xen-devel@lists.xen.org
  Subject: [Xen-devel] Architecture for dom0 integrity measurements.

  Hi, I hope the weekend is going well for everyone.
 
  We have been watching the discussions on the list over the holiday
  on the refinement and enhancement of the TPM architecture for Xen,
  including support for TPM 2.0.  We are active in measured platform
  development and I wanted to pose what is perhaps a philosophical
  question to everyone.
 
  Our systems boot from a hardware root of trust via TXT and we
  heavily leverage the Linux Integrity Measurement Architecture
  (IMA) for mutual remote attestation.

 Is it based on TBoot / OpenAttestation ?

Yes, we leverage TBOOT to implement the root of trust for our security
supervisor.

We have worked with OAT but our development efforts have been focused
on something we refer to as POSSUM.  We are heavily invested in the
concept of intrinsically linking identity to authentication and
ephemeral key exchange through mutual device attestation.

  Others may disagree but I wouldn't even contemplate delivering an
  integrity certified platform without including all of the dom0
  infrastructure into  the platform measurement status.  We heavily
  leverage the current 4.4.x vTPM implementation for testing and
  development and the documentation states clearly to not integrate
  TPM/TIS support into the dom0 OS.

 Ditto.

Everyone seems to agree on this point.

  The obvious model is to run a software TPM simulator in dom0 and
  have the vTPM I/O domains link to that.  We are heavily invested
  in IBM's software TPM simulator and have been tossing around the
  idea of building up a proof of concept based on that.  I wanted to
  make sure we were not misunderstanding anything with the current
  or proposed architecture before we invest the resources.

 IBM's software TPM simulator, is it libtpms?

 For all I know, the libtpms is a library that targets the
 integration of TPM functionality into hypervisors.  In this mode,
 libtpms is a dynamically linked library, so there is no root of trust.
 If you really want to enable it, I have some suggestions.

It is Ken Goldman's work at IBM and the library name is libtpm.  It is
a library of TPM functionality which is used to implement a TPM
simulator/server.  Trousers talks to the server and for testing and
development we have been able to move our codebase between it and
hardware without modification.

 1. vTPM I/O domains is now needed in this mode, QEMU can implement
 another TPM Backend to link libtpms. Try to refer to
 http://lists.nongnu.org/archive/html/qemu-devel/2013-11/msg00674.html

 2. Enabling seabios for HVM virtual machine.
 Refer to patch 'vTPM: add TPM TCPA and SSDT for HVM virtual machine'
 and https://github.com/virt2x/seabios2

Thanks for the references, we are following up on them.

 We have also been considering whether or not to implement the
 multiple TPM states in the context of the dom0 hardware
 virtualization instance.

 Does it mean initial states from libtpms? Such as
 clear/save, etc. Correct me if I am wrong.

I believe we are talking about the same concept/technology.  The
library initializes its TPM 'state' but the state is not anchored in
hardware.

 Once again not as 'technically secure' but it does cut out a lot of
 complexity with the current model,

 Yes, agree with this point.

Yes, it doesn't take very much work on this technology to appreciate
the reproducibility, flexibility and debuggability of a simulator.

It is not, as I noted above, capable of implementing a hardware root
of trust like a hardware TPM but the same rules apply to it as the
vTPM/vtpmmgr architecture.  If the simulator and its database are
anchored to a hardware root of trust it should be possible to have
its simulation services be trusted by a guest.

We've started work on going through the code and building up a
prototype which is capable of running multiple TPM instances, each of
which could be coupled to a guest domain.  We will see where that
leads us with respect to coupling it via Xen's tpm front/back drivers to
a guest.

  with the added benefit of that infrastructure being covered by a
  hardware rooted IMA state.
 
  Also we are extremely interested in what hardware and motherboards
  with TPM 2.0 support are being used for this development,
  obviously with TXT being a requirement.  It wasn't too long ago we
  were advised directly by Intel that physical hardware was not
  available, perhaps that was a miscommunication.  Given the work

Re: [Xen-devel] [PATCH v3 4/5] tools: code refactoring for MBM

On Tue, Jan 13, 2015 at 01:46:50PM +, Wei Liu wrote:
 On Tue, Jan 13, 2015 at 04:02:12PM +0800, Chao Peng wrote:
  Make some internal routines common so that total/local memory bandwidth
  monitoring in the next patch can make use of them.
  
  Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
 
 Acked-by: Wei Liu wei.l...@citrix.com
 
 Could you please include a short change log in the commit message in
 your later patch submissions, so that reviewers can know what has changed.
 
 The change log can be separated with --- so that it does not appear in
 the commit message in tree.
 
 Thanks Wei. I added the change logs in the cover letter for the whole
 patch series. But your suggestion is valuable as we can add more
 detailed change logs on a per-patch basis. I will take your VNUMA patch as
 an example :)

 Chao
 
[Xen-devel] Question about partitioning shared cache in Xen

2015-01-13 Thread Meng Xu

Hi,

[Goal]
I want to investigate the impact of the shared cache on the
performance of workload in guest domain.
I also want to partition the shared cache via page coloring mechanism
so that guest domains can use different cache colors of shared cache
and will not have interference in the shared cache.

[Motivation: Why do I want to partition the shared cache?]
Because the shared cache is shared among all guest domains (I assume
the machine has multiple cores sharing the same LLC. For example, the
Intel(R) Xeon(R) CPU E5-1650 v2 has 6 physical cores sharing a 12MB L3
cache.), the workload in one domU can interfere with another domU's
memory-intensive workload on the same machine via the shared cache. This
shared-cache interference makes the execution time of the workload in a
domU non-deterministic and much longer. (If we assume the worst case,
the worst-case execution time of the workload will be too
pessimistic.) A stable execution time is very important in real-time
computation, where a real-time program, such as the control program in
an automobile, has to generate its result within a deadline.

I did some quick measurements to show how the shared cache can be used by
a hostile domain to interfere with the execution time of another domain's
workload. I pinned the VCPUs of two domains to different physical cores
and used one domain to pollute the shared cache. The result shows that
shared-cache interference can slow down the other domain's workload by
4x. The whole experiment result can be found at
https://github.com/PennPanda/cis601/blob/master/project/data/boxplot_cache_v2.pdf
(The workload in the figure is a program reading a large array. I ran
the program 100 times and plotted the latency of accessing the array in
a box plot. The first column, named alone−d1v1, is the latency when the
program in dom1 runs alone. The fourth column, d1v1d2v1−pindiffcore, is
the latency when the program in dom1 runs along with another program in
dom2, and these two domains use different cores. dom1 and dom2 each have
1 vcpu with budget equal to period. The scheduler is the credit
scheduler.)

[Idea of how to partition the shared cache]
When a PV guest domain is created, it will call xc_dom_boot_mem_init()
to allocate memory for the domain, which finally calls
xc_domain_populate_physmap_exact() to allocate memory pages from
domheap in Xen.
The idea of partitioning the shared cache is as follows:
1) xl tool change: Add an option in the domain's configuration file which
specifies which cache colors this domain should use. (I have done this,
and when I use xl create --dry-run, I can see the parameters are
parsed into the build information.)
2) hypervisor change: Add another hypercall
xc_domain_populate_physmap_exact_ca() which has one more parameter,
i.e., the cache colors this domain should use. I also need to reserve a
memory pool which sorts the reserved memory pages by cache
color.

When a PV domain is created, I can specify the cache colors it uses.
Then the xl tool will call the xc_domain_populate_physmap_exact_ca()
to only allocate the memory pages with the specified cache colors to
this domain.
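
The colour computation that such an allocator relies on can be sketched as follows. This is an assumption-laden illustration, not part of the attached patch: it uses the textbook model in which a page's colour is its frame number modulo the number of colours, with the number of colours equal to cache size divided by (associativity times page size). The function names and the 64-colour bitmask limit are hypothetical, and caches that hash physical addresses across slices may not follow this simple model.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page colours under the simple set-index model (an assumption). */
static unsigned int toy_nr_colors(uint64_t cache_bytes, unsigned int ways,
                                  unsigned int page_bytes)
{
    assert(ways != 0 && page_bytes != 0);
    return (unsigned int)(cache_bytes / ((uint64_t)ways * page_bytes));
}

/* Colour of a page: consecutive frames cycle through the colours. */
static unsigned int toy_page_color(uint64_t pfn, unsigned int nr_colors)
{
    assert(nr_colors != 0);
    return (unsigned int)(pfn % nr_colors);
}

/*
 * Return 1 if the page may be given to a domain whose permitted colours
 * are the set bits of color_mask (bit c set means colour c is allowed).
 * Limited to 64 colours so the mask fits in one word.
 */
static int toy_color_allowed(uint64_t pfn, unsigned int nr_colors,
                             uint64_t color_mask)
{
    assert(nr_colors != 0 && nr_colors <= 64);
    return (int)((color_mask >> toy_page_color(pfn, nr_colors)) & 1);
}
```

As a worked example with made-up parameters, a 2 MiB, 16-way cache with 4 KiB pages gives 2097152 / (16 * 4096) = 32 colours, so frame 33 has colour 1 and would be handed only to a domain whose mask has bit 1 set.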

[Quick implementation]
I attached my quick implementation patch at the end of this email.

[Issues and Questions]
After I applied the patch to Xen's commit point
36174af3fbeb1b662c0eadbfa193e77f68cc955b and ran it on my machine,
dom0 cannot boot up. :-(
The error message from dom0 is:
[0.00] Kernel panic - not syncing: Failed to get contiguous
memory for DMA from Xen!

[0.00] You either: don't have the permissions, do not have
enough free memory under 4GB, or the hypervisor memory is too
fragmented! (rc:-12)

I tried to print every message in the functions I touched in order to
figure out where it goes wrong, but failed. :-(
The thing I cannot understand is this: my implementation hasn't
reserved any memory pages in the cache-aware memory pool before the
system boots up. Basically, none of the functions I modified is
called before the system boots up. But the system crashes. :-( (The
system can boot up and work perfectly before applying my patch.)

I would really appreciate it if any of you could point out the part I
missed or misunderstood. :-)

Thank you very very much!

Best,

Meng


The full crash message is as follows:

Xen 4.5.0-rc

(XEN) Xen version 4.5.0-rc (root@) (gcc (Ubuntu/Linaro 4.6.3-1ubuntu5)
4.6.3) debug=y Sun Jan 11 11:39:23 EST 2015

(XEN) Latest ChangeSet: Sun Jan 4 22:19:40 2015 -0500 git:962a13f-dirty

(XEN) Bootloader: GRUB 1.99-21ubuntu3.14

(XEN) Command line: placeholder dom0_memory=512M sched=credit
console=tty0 com1=115200n8 console=com1

(XEN) Video information:

(XEN)  VGA is text mode 80x25, font 8x16

(XEN) Disc information:

(XEN)  Found 1 MBR signatures

(XEN)  Found 1 EDD information structures

(XEN) Xen-e820 RAM map:

(XEN)   - 0009fc00 (usable)

(XEN)  0009fc00 - 000a (reserved)

(XEN)  000f -

Re: [Xen-devel] [PATCH v2] [Bugfix] x86/apic: Fix xen IRQ allocation failure caused by commit b81975eade8c

2015-01-13 Thread Thomas Gleixner

On Tue, 13 Jan 2015, Sander Eikelenboom wrote:

 
 Monday, January 12, 2015, 4:01:00 PM, you wrote:
 
  On 12/01/15 13:39, Jiang Liu wrote:
  Commit b81975eade8c (x86, irq: Clean up irqdomain transition code)
  breaks xen IRQ allocation because xen_smp_prepare_cpus() doesn't invoke
  setup_IO_APIC(), so no irqdomains created for IOAPICs and
  mp_map_pin_to_irq() fails at the very beginning.
  
  Enhance xen_smp_prepare_cpus() to call setup_IO_APIC() to initialize
  irqdomain for IOAPICs.
 
  Having Xen call setup_IO_APIC() to initialize the irq domains then having to
  add special cases to it is just wrong.
 
  The bits of init deferred by mp_register_apic() are also deferred to
  two different places which looks odd.
 
  What about something like the following (untested) patch?
 
 Hi David / Gerry,
 
 David's patch (after fixing a few compile issues) fixes the problem.
 
 The power button now works for me on:
 - intel baremetal
 - intel xen
 - amd baremetal (no issues with the override anymore)
 - amd xen   (no freeze issues anymore)

Can someone please send a proper patch with changelog?

Thanks,

tglx

Re: [Xen-devel] [Xen-users] [TestDay] minor bug + possible configuration bug 4.5rc4 archlinux

2015-01-13 Thread Olaf Hering

On Mon, Jan 12, Ian Campbell wrote:

 @devs -- we obviously need to do something about this (too late for 4.5,
 but for 4.6 + backport). Perhaps there is some alternative systemd
 construction which disassociates the actual path from the abstract
 service xenstored dir mounted?

I don't think we can do anything about this systemd brain damage. Either
it gets its Where= from such a line within the file, or it gets its Where=
from the filename. In which case it has to stop looking at a Where=
line.

In any case, it's wrong to use --localstatedir=/tmpfs-mount-point because
that means all mails in the spool subdirectory are in danger. If that's
the mindset of ArchLinux, all we can do is to recommend to stop using it
for any serious task.

Olaf

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m

 On 12.01.15 at 18:36, edmund.h.wh...@intel.com wrote:
 On 01/12/2015 02:00 AM, Jan Beulich wrote:
 On 10.01.15 at 00:04, edmund.h.wh...@intel.com wrote:
 On 01/09/2015 02:41 PM, Andrew Cooper wrote:
 Having some non-OS part of the guest swap the EPT tables and
 accidentally turn a DMA buffer read-only is not going to end well.


 The agent can certainly do bad things, and at some level you have to assume 
 it
 is sensible enough not to. However, I'm not sure this is fundamentally more
 dangerous than what a privileged domain can do today using the MEMOP...
 operations, and people are already using those for very similar purposes.
 
 I don't follow - how is what privileged domain can do related to the
 proposed changes here (which are - via VMFUNC - at least partially
 guest controllable, and that's also the case Andrew mentioned in his
 reply)? I'm having a hard time understanding how a P2M stripped of
 anything that's not plain RAM can be very useful to a guest. IOW
 without such fundamental aspects clarified I don't see a point in
 looking at the individual patches (which btw, according to your
 wording elsewhere, should have been marked RFC).
 
 In this patch series, none of the new hypercalls are protected by xsm
 policies. Earlier in the process of working on this code, I added such
 a check to all the hypercalls, but then removed them all because it
 dawned on me that I didn't actually understand what I was doing and
 my code only worked because I only ever built the dummy permit everything
 policy.
 
 Should some version of this patch series be accepted, my hope is that
 someone who does understand xsm policies would put the appropriate checks
 in place, and at that point I maintain that these extra capabilities
 would not be fundamentally more dangerous than existing mechanisms
 available to privileged domains, because policy can prevent the guest
 using vmfunc. That's obviously not true today.

Please simply consult with the XSM maintainer on questions/issues
like this. Proposing a partial (insecure) patch set isn't appropriate.

 The alternate p2m's only contain entries for ram pages with valid mfn's.
 All other page types are still handled in the nested page fault handler
 for the host p2m. Those pages (at least the ones I've encountered) don't
 require the hardware to have a valid EPTE for the page.

I.e. the functionality requiring e.g. p2m_ram_logdirty and
p2m_mmio_direct is then incompatible with your proposed additions
(which I think was also already noted by Andrew). That's imo not
a basis to think about accepting (or even reviewing) the series.

Jan


[Xen-devel] [PATCH v3 5/5] tools: add total/local memory bandwith monitoring

Add Memory Bandwidth Monitoring (MBM) for VMs. Two types of monitoring
are supported: total and local memory bandwidth monitoring. To use it,
CMT should be enabled in the hypervisor.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 docs/man/xl.pod.1 |9 +
 tools/libxc/include/xenctrl.h |2 +
 tools/libxc/xc_psr.c  |8 
 tools/libxl/libxl.h   |8 
 tools/libxl/libxl_psr.c   |   84 +
 tools/libxl/libxl_types.idl   |2 +
 tools/libxl/xl_cmdimpl.c  |   21 ++-
 tools/libxl/xl_cmdtable.c |4 +-
 8 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 6b89ba8..0370625 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1461,6 +1461,13 @@ is domain level. To monitor a specific domain, just attach the domain id with
 the monitoring service. When the domain doesn't need to be monitored any more,
 detach the domain id from the monitoring service.
 
+Intel Broadwell and later server platforms also offer total/local memory
+bandwidth monitoring. Xen supports per-domain monitoring for these two
+additional monitoring types. Both memory bandwidth monitoring and L3 cache
+occupancy monitoring share the same underlying monitoring service. Once
+a domain is attached to the monitoring service, monitoring data can be shown
+for any of these monitoring types.
+
 =over 4
 
 =item B<psr-cmt-attach> [I<domain-id>]
@@ -1476,6 +1483,8 @@ detach: Detach the platform shared resource monitoring service from a domain.
 Show monitoring data for a certain domain or all domains. Current supported
 monitor types are:
  - cache-occupancy: showing the L3 cache occupancy.
+ - total-mem-bandwidth: showing the total memory bandwidth.
+ - local-mem-bandwidth: showing the local memory bandwidth.
 
 =back
 
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index c6e9e3e..06366b5 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2688,6 +2688,8 @@ int xc_resource_op(xc_interface *xch, uint32_t nr_ops, xc_resource_op_t *ops);
 #if defined(__i386__) || defined(__x86_64__)
 enum xc_psr_cmt_type {
 XC_PSR_CMT_L3_OCCUPANCY,
+XC_PSR_CMT_TOTAL_MEM_BANDWIDTH,
+XC_PSR_CMT_LOCAL_MEM_BANDWIDTH,
 };
 typedef enum xc_psr_cmt_type xc_psr_cmt_type;
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index a9881a4..5858693 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -23,6 +23,8 @@
 #define IA32_CMT_CTR_ERROR_MASK (0x3ull << 62)
 
 #define EVTID_L3_OCCUPANCY 0x1
+#define EVTID_TOTAL_MEM_BANDWIDTH  0x2
+#define EVTID_LOCAL_MEM_BANDWIDTH  0x3
 
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid)
 {
@@ -168,6 +170,12 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
 case XC_PSR_CMT_L3_OCCUPANCY:
 evtid = EVTID_L3_OCCUPANCY;
 break;
+case XC_PSR_CMT_TOTAL_MEM_BANDWIDTH:
+evtid = EVTID_TOTAL_MEM_BANDWIDTH;
+break;
+case XC_PSR_CMT_LOCAL_MEM_BANDWIDTH:
+evtid = EVTID_LOCAL_MEM_BANDWIDTH;
+break;
 default:
 return -1;
 }
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 596d2a0..347ef52 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1462,6 +1462,14 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
   uint32_t domid,
   uint32_t socketid,
   uint32_t *l3_cache_occupancy);
+int libxl_psr_cmt_get_total_mem_bandwidth(libxl_ctx *ctx,
+  uint32_t domid,
+  uint32_t socketid,
+  uint32_t *bandwidth);
+int libxl_psr_cmt_get_local_mem_bandwidth(libxl_ctx *ctx,
+  uint32_t domid,
+  uint32_t socketid,
+  uint32_t *bandwidth);
 #endif
 
 /* misc */
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index c88c421..0c3e4e6 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -18,6 +18,7 @@
 
 
 #define IA32_QM_CTR_ERROR_MASK (0x3ul << 62)
+#define MBM_SAMPLE_RETRY_MAX 4
 
 static void libxl__psr_cmt_log_err_msg(libxl__gc *gc, int err)
 {
@@ -240,6 +241,89 @@ out:
 return rc;
 }
 
+static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc,
+uint32_t domid,
+xc_psr_cmt_type type,
+uint32_t socketid,
+uint32_t *bandwidth)
+{
+uint64_t sample1, sample2;
+uint32_t upscaling_factor;
+int retry_attempts = 0;
+int rc;
+

[Xen-devel] [PATCH v3 4/5] tools: code refactoring for MBM

Make some internal routines common so that total/local memory bandwidth
monitoring in the next patch can make use of them.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 tools/libxl/libxl_psr.c  |   44 -
 tools/libxl/xl_cmdimpl.c |   54 +++---
 2 files changed, 61 insertions(+), 37 deletions(-)

diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 84819e6..c88c421 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -176,20 +176,16 @@ int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
 return rc;
 }
 
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
-  uint32_t domid,
-  uint32_t socketid,
-  uint32_t *l3_cache_occupancy)
+static int libxl__psr_cmt_get_l3_monitoring_data(libxl__gc *gc,
+ uint32_t domid,
+ xc_psr_cmt_type type,
+ uint32_t socketid,
+ uint64_t *data)
 {
-GC_INIT(ctx);
-
 unsigned int rmid;
-uint32_t upscaling_factor;
-uint64_t monitor_data;
 int cpu, rc;
-xc_psr_cmt_type type;
 
-rc = xc_psr_cmt_get_domain_rmid(ctx->xch, domid, &rmid);
+rc = xc_psr_cmt_get_domain_rmid(CTX->xch, domid, &rmid);
 if (rc < 0 || rmid == 0) {
 LOGE(ERROR, "fail to get the domain rmid, "
 "or domain is not attached with platform QoS monitoring service");
@@ -204,14 +200,32 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
 goto out;
 }
 
-type = XC_PSR_CMT_L3_OCCUPANCY;
-rc = xc_psr_cmt_get_data(ctx->xch, rmid, cpu, type, &monitor_data);
+rc = xc_psr_cmt_get_data(CTX->xch, rmid, cpu, type, data);
 if (rc < 0) {
 LOGE(ERROR, "failed to get monitoring data");
 rc = ERROR_FAIL;
-goto out;
 }
 
+out:
+return rc;
+}
+
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+  uint32_t domid,
+  uint32_t socketid,
+  uint32_t *l3_cache_occupancy)
+{
+GC_INIT(ctx);
+uint64_t data;
+uint32_t upscaling_factor;
+int rc;
+
+rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid,
+   XC_PSR_CMT_L3_OCCUPANCY,
+   socketid, &data);
+if (rc < 0)
+goto out;
+
 rc = xc_psr_cmt_get_l3_upscaling_factor(ctx->xch, &upscaling_factor);
 if (rc < 0) {
 LOGE(ERROR, "failed to get L3 upscaling factor");
@@ -219,8 +233,8 @@ int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
 goto out;
 }
 
-*l3_cache_occupancy = upscaling_factor * monitor_data / 1024;
-rc = 0;
+*l3_cache_occupancy = upscaling_factor * data / 1024;
+
 out:
 GC_FREE;
 return rc;
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 24f3c8d..09ca73e 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7846,12 +7846,13 @@ out:
 }
 
 #ifdef LIBXL_HAVE_PSR_CMT
-static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
- uint32_t nr_sockets)
+static void psr_cmt_print_domain_l3_info(libxl_dominfo *dominfo,
+ libxl_psr_cmt_type type,
+ uint32_t nr_sockets)
 {
 char *domain_name;
 uint32_t socketid;
-uint32_t l3_cache_occupancy;
+uint32_t data;
 
 if (!libxl_psr_cmt_domain_attached(ctx, dominfo->domid))
 return;
@@ -7861,15 +7862,21 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
 free(domain_name);
 
 for (socketid = 0; socketid < nr_sockets; socketid++) {
-if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid,
-   &l3_cache_occupancy))
-printf("%13u KB", l3_cache_occupancy);
+switch (type) {
+case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
+if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid,
+   socketid, &data))
+printf("%13u KB", data);
+break;
+default:
+return;
+}
 }
 
 printf("\n");
 }
 
-static int psr_cmt_show_cache_occupancy(uint32_t domid)
+static int psr_cmt_show_l3_info(libxl_psr_cmt_type type, uint32_t domid)
 {
 uint32_t i, socketid, nr_sockets, total_rmid;
 uint32_t l3_cache_size;
@@ -7905,19 +7912,22 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
 printf("%14s %d", "Socket", socketid);
 printf("\n");
 
-/* Total L3 cache size */
-printf("%-46s", "Total L3 Cache Size");
-
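
The conversion at the end of libxl_psr_cmt_get_cache_occupancy() in the
hunks above can be looked at in isolation. The following is only a sketch:
it assumes that the raw counter multiplied by the upscaling factor yields
bytes, so that dividing by 1024 gives KB, and the helper name
l3_occupancy_kb is hypothetical (it does not exist in libxl).

```c
#include <stdint.h>

/*
 * Hypothetical helper mirroring the expression used in the patch:
 *   upscaling_factor * data / 1024
 * The multiplication happens in 64 bits because `data` is uint64_t;
 * the result is then narrowed to the uint32_t used by the libxl API.
 */
static uint32_t l3_occupancy_kb(uint32_t upscaling_factor, uint64_t data)
{
    return (uint32_t)(upscaling_factor * data / 1024);
}
```

With an upscaling factor of 65536 and a raw counter value of 16, for
instance, this reports 1024 KB.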

[Xen-devel] [PATCH v3 1/5] x86: expose CMT L3 event mask to user space

L3 event mask indicates the event types supported in host, including
cache occupancy event as well as local/total memory bandwidth events
for Memory Bandwidth Monitoring(MBM). Expose it so all these events
can be monitored in user space.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Jan Beulich jbeul...@suse.com
---
 xen/arch/x86/sysctl.c   |3 +++
 xen/include/public/sysctl.h |1 +
 2 files changed, 4 insertions(+)

diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c
index 57ad992..611a291 100644
--- a/xen/arch/x86/sysctl.c
+++ b/xen/arch/x86/sysctl.c
@@ -157,6 +157,9 @@ long arch_do_sysctl(
 sysctl->u.psr_cmt_op.u.data = (ret ? 0 : info.size);
 break;
 }
+case XEN_SYSCTL_PSR_CMT_get_l3_event_mask:
+sysctl->u.psr_cmt_op.u.data = psr_cmt->l3.features;
+break;
 default:
 sysctl->u.psr_cmt_op.u.data = 0;
 ret = -ENOSYS;
diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h
index b3713b3..8552dc6 100644
--- a/xen/include/public/sysctl.h
+++ b/xen/include/public/sysctl.h
@@ -641,6 +641,7 @@ DEFINE_XEN_GUEST_HANDLE(xen_sysctl_coverage_op_t);
 /* The L3 cache size is returned in KB unit */
 #define XEN_SYSCTL_PSR_CMT_get_l3_cache_size 2
 #define XEN_SYSCTL_PSR_CMT_enabled   3
+#define XEN_SYSCTL_PSR_CMT_get_l3_event_mask 4
 struct xen_sysctl_psr_cmt_op {
 uint32_t cmd;   /* IN: XEN_SYSCTL_PSR_CMT_* */
 uint32_t flags; /* padding variable, may be extended for future use */
-- 
1.7.9.5
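
As an illustration of how user space might consume the mask exposed by
this patch, the sketch below tests individual bits. It assumes that event
N is advertised by bit (N - 1) of the mask, using the event IDs defined in
tools/libxc/xc_psr.c later in this series (0x1, 0x2, 0x3). The helper
psr_event_supported and the enum are hypothetical and not part of the
series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Event IDs as defined in tools/libxc/xc_psr.c in this series. */
enum psr_event_id {
    EVTID_L3_OCCUPANCY        = 0x1,
    EVTID_TOTAL_MEM_BANDWIDTH = 0x2,
    EVTID_LOCAL_MEM_BANDWIDTH = 0x3,
};

/*
 * Hypothetical helper: report whether the host advertises an event.
 * Assumption: event N is advertised by bit (N - 1) of the L3 event mask.
 */
static bool psr_event_supported(uint32_t event_mask, enum psr_event_id id)
{
    return (event_mask & (UINT32_C(1) << (id - 1))) != 0;
}
```

Under that assumption, a mask of 0x1 would mean a CMT-only host, while 0x7
would mean both MBM events are available as well.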



[Xen-devel] [PATCH v3 3/5] tools: correct coding style for psr

- space: remove space after '(' or before ')' in 'if' condition;
- indentation: align function definition/call arguments;

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
Acked-by: Wei Liu wei.l...@citrix.com
---
 tools/libxc/include/xenctrl.h |8 
 tools/libxc/xc_psr.c  |   10 +-
 tools/libxl/libxl.h   |   11 +++
 tools/libxl/libxl_psr.c   |   11 +++
 tools/libxl/xl_cmdimpl.c  |   11 ++-
 5 files changed, 29 insertions(+), 22 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 96b357c..c6e9e3e 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2693,15 +2693,15 @@ typedef enum xc_psr_cmt_type xc_psr_cmt_type;
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid);
 int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
-uint32_t *rmid);
+   uint32_t *rmid);
 int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
-uint32_t *upscaling_factor);
+   uint32_t *upscaling_factor);
 int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
 uint32_t *l3_cache_size);
-int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
-uint32_t cpu, uint32_t psr_cmt_type, uint64_t *monitor_data);
+int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
+uint32_t psr_cmt_type, uint64_t *monitor_data);
 int xc_psr_cmt_enabled(xc_interface *xch);
 #endif
 
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index ac19fe4..a9881a4 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -47,7 +47,7 @@ int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid)
 }
 
 int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
-uint32_t *rmid)
+   uint32_t *rmid)
 {
 int rc;
 DECLARE_DOMCTL;
@@ -88,7 +88,7 @@ int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid)
 }
 
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
-uint32_t *upscaling_factor)
+   uint32_t *upscaling_factor)
 {
 static int val = 0;
 int rc;
@@ -130,7 +130,7 @@ int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask)
 }
 
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
-  uint32_t *l3_cache_size)
+ uint32_t *l3_cache_size)
 {
 static int val = 0;
 int rc;
@@ -155,8 +155,8 @@ int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
 return rc;
 }
 
-int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
-uint32_t cpu, xc_psr_cmt_type type, uint64_t *monitor_data)
+int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
+xc_psr_cmt_type type, uint64_t *monitor_data)
 {
 xc_resource_op_t op;
 xc_resource_entry_t entries[2];
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index c9a64f9..596d2a0 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1454,11 +1454,14 @@ int libxl_psr_cmt_detach(libxl_ctx *ctx, uint32_t domid);
 int libxl_psr_cmt_domain_attached(libxl_ctx *ctx, uint32_t domid);
 int libxl_psr_cmt_enabled(libxl_ctx *ctx);
 int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
-int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
-uint32_t *l3_cache_size);
+int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx,
+uint32_t socketid,
+uint32_t *l3_cache_size);
 int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask);
-int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
-uint32_t socketid, uint32_t *l3_cache_occupancy);
+int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx,
+  uint32_t domid,
+  uint32_t socketid,
+  uint32_t *l3_cache_occupancy);
 #endif
 
 /* misc */
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 07f2aee..84819e6 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -135,8 +135,9 @@ int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid)
 return rc;
 }
 
-int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
- uint32_t *l3_cache_size)
+int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx,
+uint32_t socketid,
+uint32_t

[Xen-devel] [PATCH v3 0/5] enable Memory Bandwidth Monitoring (MBM) for VMs

Changes from v2:
* Remove the usage of static to cache data in xc;
  NOTE: Other places that already existed before are not touched due to
the needs for API change. Will fix in separate patch if desirable.
* Coding style;

Changes from v1:
* Move event type check from xc to xl;
* Add retry capability for MBM sampling;
* Fix Coding style/docs;

Intel Memory Bandwidth Monitoring (MBM) is a new hardware feature
which builds on the CMT infrastructure to allow monitoring of system
memory bandwidth. Event codes are provided to monitor both total
and local bandwidth, meaning bandwidth over QPI and other external
links can be monitored.

In Xen, MBM is used to monitor memory bandwidth for VMs. Because it
depends on CMT, the software reuses most of the CMT code. In fact,
besides introducing two additional events and some cpuid feature bits,
there are no extra changes compared to cache occupancy monitoring in
CMT. For this reason, CMT must be enabled first to use this feature.

For interface changes, the patch series only introduces a new command
XEN_SYSCTL_PSR_CMT_get_l3_event_mask which exposes MBM feature
capability to user space and introduces two additional options for
xl psr-cmt-show:
total_mem_bandwidth: Show total memory bandwidth
local_mem_bandwidth: Show local memory bandwidth

The usage flow remains the same as with CMT.
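
To make the "retry capability for MBM sampling" mentioned in the v1
changelog more concrete, here is a minimal sketch of how bandwidth could be
derived from two readings of a monotonically increasing counter. It is not
the code from patch 5/5: the names mbm_sample_fn, mbm_bandwidth_kb and
replay_sample are hypothetical, and the treatment of a second reading that
is smaller than the first (assumed to indicate a counter wrap) is an
assumption. Only MBM_SAMPLE_RETRY_MAX and the upscaling-factor/1024
conversion are taken from the series.

```c
#include <stdint.h>

#define MBM_SAMPLE_RETRY_MAX 4  /* same value as in libxl_psr.c, patch 5/5 */

/* Hypothetical sampler: returns 0 and stores the raw counter on success. */
typedef int (*mbm_sample_fn)(void *opaque, uint64_t *sample);

/*
 * Sketch: compute KB transferred between two readings of the counter.
 * If the second reading is smaller than the first (assumed here to mean
 * the counter wrapped), the pair is discarded and sampling is retried,
 * up to MBM_SAMPLE_RETRY_MAX attempts. Returns 0 on success, -1 on error.
 */
static int mbm_bandwidth_kb(mbm_sample_fn sample, void *opaque,
                            uint32_t upscaling_factor, uint64_t *kb)
{
    uint64_t s1, s2;
    int attempt;

    for (attempt = 0; attempt < MBM_SAMPLE_RETRY_MAX; attempt++) {
        if (sample(opaque, &s1) < 0)
            return -1;
        /* A real caller would wait for the sampling interval here. */
        if (sample(opaque, &s2) < 0)
            return -1;
        if (s2 >= s1) {
            *kb = (s2 - s1) * upscaling_factor / 1024;
            return 0;
        }
    }
    return -1; /* every attempt saw the counter go backwards */
}

/* Replays a fixed sequence of readings; stands in for the hardware. */
struct replay {
    const uint64_t *vals;
    unsigned int n, pos;
};

static int replay_sample(void *opaque, uint64_t *sample)
{
    struct replay *r = opaque;

    if (r->pos >= r->n)
        return -1;
    *sample = r->vals[r->pos++];
    return 0;
}
```

Passing the sampler as a callback keeps the arithmetic and retry policy
testable without hardware; the replay sampler is there purely for that
purpose.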

Chao Peng (5):
  x86: expose CMT L3 event mask to user space
  tools: add routine to get CMT L3 event mask
  tools: correct coding style for psr
  tools: code refactoring for MBM
  tools: add total/local memory bandwidth monitoring

 docs/man/xl.pod.1 |9 +++
 tools/libxc/include/xenctrl.h |   11 ++--
 tools/libxc/xc_psr.c  |   35 --
 tools/libxl/libxl.h   |   20 --
 tools/libxl/libxl_psr.c   |  142 +
 tools/libxl/libxl_types.idl   |2 +
 tools/libxl/xl_cmdimpl.c  |   72 +++--
 tools/libxl/xl_cmdtable.c |4 +-
 xen/arch/x86/sysctl.c |3 +
 xen/include/public/sysctl.h   |1 +
 10 files changed, 251 insertions(+), 48 deletions(-)

-- 
1.7.9.5



[Xen-devel] [PATCH v3 2/5] tools: add routine to get CMT L3 event mask

This is the tools side wrapper for XEN_SYSCTL_PSR_CMT_get_l3_event_mask
of XEN_SYSCTL_psr_cmt_op.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 tools/libxc/include/xenctrl.h |1 +
 tools/libxc/xc_psr.c  |   17 +
 tools/libxl/libxl.h   |1 +
 tools/libxl/libxl_psr.c   |   15 +++
 4 files changed, 34 insertions(+)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..96b357c 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2697,6 +2697,7 @@ int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
 int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
 uint32_t *upscaling_factor);
+int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
 uint32_t *l3_cache_size);
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid,
diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index 872e6dc..ac19fe4 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -112,6 +112,23 @@ int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
 return rc;
 }
 
+int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask)
+{
+int rc;
+DECLARE_SYSCTL;
+
+sysctl.cmd = XEN_SYSCTL_psr_cmt_op;
+sysctl.u.psr_cmt_op.cmd =
+XEN_SYSCTL_PSR_CMT_get_l3_event_mask;
+sysctl.u.psr_cmt_op.flags = 0;
+
+rc = xc_sysctl(xch, &sysctl);
+if ( !rc )
+*event_mask = sysctl.u.psr_cmt_op.u.data;
+
+return rc;
+}
+
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
   uint32_t *l3_cache_size)
 {
diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
index 0a123f1..c9a64f9 100644
--- a/tools/libxl/libxl.h
+++ b/tools/libxl/libxl.h
@@ -1456,6 +1456,7 @@ int libxl_psr_cmt_enabled(libxl_ctx *ctx);
 int libxl_psr_cmt_get_total_rmid(libxl_ctx *ctx, uint32_t *total_rmid);
 int libxl_psr_cmt_get_l3_cache_size(libxl_ctx *ctx, uint32_t socketid,
 uint32_t *l3_cache_size);
+int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask);
 int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
 uint32_t socketid, uint32_t *l3_cache_occupancy);
 #endif
diff --git a/tools/libxl/libxl_psr.c b/tools/libxl/libxl_psr.c
index 0437465..07f2aee 100644
--- a/tools/libxl/libxl_psr.c
+++ b/tools/libxl/libxl_psr.c
@@ -160,6 +160,21 @@ out:
 return rc;
 }
 
+int libxl_psr_cmt_get_l3_event_mask(libxl_ctx *ctx, uint32_t *event_mask)
+{
+GC_INIT(ctx);
+int rc;
+
+rc = xc_psr_cmt_get_l3_event_mask(ctx->xch, event_mask);
+if (rc < 0) {
+libxl__psr_cmt_log_err_msg(gc, errno);
+rc = ERROR_FAIL;
+}
+
+GC_FREE;
+return rc;
+}
+
 int libxl_psr_cmt_get_cache_occupancy(libxl_ctx *ctx, uint32_t domid,
 uint32_t socketid, uint32_t *l3_cache_occupancy)
 {
-- 
1.7.9.5



[Xen-devel] [PATCH] xen-time: decreasing the rating of the xen clocksource below that of the tsc clocksource for dom0's

2015-01-13 Thread Imre Palik

From: Palik, Imre im...@amazon.de

In Dom0s, using the TSC clocksource (whenever it is stable enough to
be used) instead of the Xen clocksource should not cause any issues, as
Dom0 VMs are never live-migrated.  The TSC clocksource is somewhat more
efficient than the Xen paravirtualised clocksource, so it should have
a higher rating.

This patch decreases the rating of the Xen clocksource in Dom0s to 275,
which is halfway between the rating of the TSC clocksource (300) and the
hpet clocksource (250).

Cc: Anthony Liguori aligu...@amazon.com
Signed-off-by: Imre Palik im...@amazon.de
---
 arch/x86/xen/time.c |4 
 1 file changed, 4 insertions(+)

diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c
index f473d26..c768726 100644
--- a/arch/x86/xen/time.c
+++ b/arch/x86/xen/time.c
@@ -487,6 +487,10 @@ static void __init xen_time_init(void)
int cpu = smp_processor_id();
struct timespec tp;
 
+   /* As Dom0 is never moved, no penalty on using TSC there */
+   if (xen_initial_domain())
+   xen_clocksource.rating = 275;
+
clocksource_register_hz(&xen_clocksource, NSEC_PER_SEC);
 
if (HYPERVISOR_vcpu_op(VCPUOP_stop_periodic_timer, cpu, NULL) == 0) {
-- 
1.7.9.5
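
The effect of the new rating can be illustrated with a small stand-alone
sketch. It assumes the kernel prefers the registered clocksource with the
highest rating; the struct and the pick_clocksource function below are
hypothetical simplifications, not the kernel's data structures. The
ratings 300 (TSC), 275 (Xen in Dom0 after this patch) and 250 (hpet) are
the ones quoted in the commit message.

```c
#include <stddef.h>

/* Simplified stand-in for a registered clocksource. */
struct clocksource_sketch {
    const char *name;
    int rating;
};

/* Return the entry with the highest rating (first one wins on a tie). */
static const struct clocksource_sketch *
pick_clocksource(const struct clocksource_sketch *cs, size_t n)
{
    const struct clocksource_sketch *best = NULL;
    size_t i;

    for (i = 0; i < n; i++)
        if (best == NULL || cs[i].rating > best->rating)
            best = &cs[i];
    return best;
}
```

With { xen: 275, tsc: 300, hpet: 250 } the TSC is chosen; if the TSC is
not stable enough to be registered, Xen at 275 still ranks above hpet.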



[Xen-devel] [libvirt test] 33382: regressions - FAIL

2015-01-13 Thread xen . org

flight 33382 libvirt real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/33382/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-libvirt   5 libvirt-build fail REGR. vs. 32648
 build-i386-libvirt5 libvirt-build fail REGR. vs. 32648

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass

version targeted for testing:
 libvirt  97fac17c77d9bdfacafff1c5c39b2df3c1530614
baseline version:
 libvirt  2360fe5d24175835d3f5fd1c7e8e6e13addab629


People who touched revisions under test:
  Alexander Burluka aburl...@parallels.com
  Cedric Bosdonnat cbosdon...@suse.com
  Chunyan Liu cy...@suse.com
  Cédric Bosdonnat cbosdon...@suse.com
  Daniel P. Berrange berra...@redhat.com
  Eric Blake ebl...@redhat.com
  Geoff Hickey ghic...@datagravity.com
  Jim Fehlig jfeh...@suse.com
  Jiri Denemark jdene...@redhat.com
  John Ferlan jfer...@redhat.com
  Ján Tomko jto...@redhat.com
  Kiarie Kahurani davidkiar...@gmail.com
  Luyao Huang lhu...@redhat.com
  Michal Privoznik mpriv...@redhat.com
  Nehal J Wani nehaljw.k...@gmail.com
  Pavel Hrdina phrd...@redhat.com
  Peter Krempa pkre...@redhat.com
  Stefan Berger stef...@linux.vnet.ibm.com


jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  fail
 build-i386-libvirt   fail
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt fail
 test-armhf-armhf-libvirt blocked
 test-amd64-i386-libvirt  blocked



sg-report-flight on osstest.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 680 lines long.)


Re: [Xen-devel] Fwd: [OPW PATCH 1/4] tools/xl: Calling _init and _dispose function for libxl types

2015-01-13 Thread Uma Sharma

There was no v2 (v2 was not created properly).
Yes, 1/4 was the cover letter. And 4/4 was not correct.

Thank you for applying the patches.

On Mon, Jan 12, 2015 at 11:29 PM, Ian Campbell ian.campb...@citrix.com wrote:
 On Tue, 2014-10-21 at 18:04 +0100, George Dunlap wrote:

 Just getting back to these after the freeze.

 On Tue, Oct 21, 2014 at 5:34 PM, Uma Sharma uma.sharma...@gmail.com wrote:
  Should I resend the patches then?

 On the xen-devel list, always reply at the bottom, like this.  :-)

 I think normally it wouldn't matter, but since the point of the
 exercise is to get you familiar with the tools, I'd say yes, why don't
 you send them again (maybe using the 'v2' tag).

 Was there a v2 here? If so I seem to have misplaced it.

 As it stands it looks like I have:
 [OPW PATCH 2/4] tools/xl: Call init function for libxl_domain_sched_params
 (AKA 544581bd.847e460a.4ff9.a...@mx.google.com)
 [OPW PATCH 3/4] tools/xl: Call init function for libxl_bitmap
 (AKA 54458271.a28b420a.52e5.a...@mx.google.com)

 Both of which are acked by Wei, I have applied them.

 I don't seem to have the actual 1/4 patch, or was 1/4 just the cover
 letter?

 [OPW PATCH 4/4] tools/xl:Call init and dispose function for libxl_dominfo
 (AKA 544583e4.c8e7420a.6486.b...@mx.google.com) was incorrect, as
 was the followup tools/xl:Making _dispose function simplicity for
 libxl_dominfo. I think the code in that case is correct as is.

 Please let me know if there are any other outstanding patches from the
 OPW application process which I've missed.

 Ian.




-- 
Regards,
Uma Sharma
http://about.me/umasharma


[Xen-devel] [linux-linus test] 33377: regressions - FAIL

2015-01-13 Thread xen . org

flight 33377 linux-linus real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/33377/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-qemuu-rhel6hvm-intel  5 xen-boot  fail REGR. vs. 32879
 test-amd64-i386-xl-credit29 guest-start   fail REGR. vs. 32879

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-freebsd10-i386  7 freebsd-install  fail like 32879
 test-amd64-i386-freebsd10-amd64  7 freebsd-install fail like 32879
 test-amd64-i386-pair17 guest-migrate/src_host/dst_host fail like 32879
 test-amd64-amd64-xl-qemuu-winxpsp3  7 windows-install  fail like 32879

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-amd64-i386-libvirt   9 guest-start  fail   never pass
 test-armhf-armhf-xl  10 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt  9 guest-start  fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stopfail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stopfail never pass

version targeted for testing:
 linuxeaa27f34e91a14cdceed26ed6c6793ec1d186115
baseline version:
 linux9bb29b6b927bcd79cf185ee67bcebfe630f0dea1


People who touched revisions under test:
  John W. Linville linvi...@tuxdriver.com
  Aaron Brown aaron.f.br...@intel.com
  Aaron Plattner aplatt...@nvidia.com
  Alan Stern st...@rowland.harvard.edu
  Alex Deucher alexander.deuc...@amd.com
  Alex Thorlton athorl...@sgi.com
  Alex Williamson alex.william...@redhat.com
  Alexandre Courbot acour...@nvidia.com
  Alexey Khoroshilov khoroshi...@ispras.ru
  Andi Kleen a...@linux.intel.com
  Andreas Oehler andr...@oehler-net.de
  Andrew Jackson andrew.jack...@arm.com
  Andrew Morton a...@linux-foundation.org
  Andy Lutomirski l...@amacapital.net
  Andy Shevchenko andy.shevche...@gmail.com
  Anil Chintalapati (achintal) achin...@cisco.com
  Anil Chintalapati achin...@cisco.com
  Anton Vorontsov anton.voront...@linaro.org
  Antonio Quartulli anto...@meshcoding.com
  Ard Biesheuvel ard.biesheu...@linaro.org
  Arnaldo Carvalho de Melo a...@redhat.com
  Arne Goedeke e...@laramies.com
  Aron Szabo a...@ubit.hu
  Ben Goz ben@amd.com
  Ben Pfaff b...@nicira.com
  Ben Skeggs bske...@redhat.com
  Benjamin Tissoires benjamin.tissoi...@redhat.com
  Bjørn Mork bj...@mork.no
  Bruno Prémont bonb...@linux-vserver.org
  Catalin Marinas catalin.mari...@arm.com
  Chad Dupuis chad.dup...@qlogic.com
  Chris Mason c...@fb.com
  Chris Wilson ch...@chris-wilson.co.uk
  Christian König christian.koe...@amd.com
  Christoph Hellwig h...@lst.de
  Corey Minyard cminy...@mvista.com
  Dan Carpenter dan.carpen...@oracle.com
  Daniel Borkmann dbork...@redhat.com
  Daniel Mack dan...@zonque.org
  Daniel Nicoletti dantt...@gmail.com
  Daniel Thompson daniel.thomp...@linaro.org
  Daniel Walter d.wal...@0x90.at
  Dave Airlie airl...@gmail.com
  Dave Airlie airl...@redhat.com
  David Ahern dsah...@gmail.com
  David Drysdale drysd...@google.com
  David Howells dhowe...@redhat.com
  David Rientjes rient...@google.com
  David S. Miller da...@davemloft.net
  Davidlohr Bueso d...@stgolabs.net
  Doug Anderson diand...@chromium.org
  Fabian Frederick f...@skynet.be
  Fang, Yang A yang.a.f...@intel.com
  Felipe Balbi ba...@ti.com
  Filipe Manana fdman...@suse.com
  Francesco Virlinzi francesco.virli...@st.com
  Giedrius Statkevičius giedrius.statkevic...@gmail.com
  Govindarajulu Varadarajan _gov...@gmx.com
  Grygorii Strashko

Re: [Xen-devel] [PATCH v5 7/9] libxc: introduce soft reset for HVM domains

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
 Add new xc_domain_soft_reset() function which performs so-called 'soft reset'
 for an HVM domain. It is being performed in the following way:
 - Save HVM context and all HVM params;
 - Devour original domain with XEN_DOMCTL_devour;
 - Wait till original domain dies or has no pages left;
 - Restore HVM context, HVM params, seed grant table.

Are any of these operations slow, per the definition under 'Machinery
for asynchronous operations (ao)' in libxl_internal.h? "Wait till
original domain dies" sounds like it might be.

That might have implications for the use of this functionality from
libxl.

 +xc_hvm_param_get(xch, source_dom, HVM_PARAM_IDENT_PT,
 + &hvm_params[HVM_PARAM_IDENT_PT]);

There's quite a risk of the set of HVM parameters retrieved getting out
of sync, either with the hypervisor or with the sets done below.

I don't know if any part of the migration infrastructure (specifically
Andy Cooper's v2 stuff, or some of the underlying hypercalls) could be
reused here to pickle/unpickle the state?

Other possibilities:

A new hypercall pair to get/set all hvm params.

A list of params to save/restore locally here, which would at least
stop the get/set parts getting out of sync, but doesn't help with the
hypervisor getting out of sync (and therefore would not be my preferred
solution).
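
For illustration only, the local-list option could look roughly like the
sketch below. Everything in it is hypothetical: the accessor types, the
copy_params function and the array-backed stand-ins are not libxc APIs,
they merely show how driving both the save and the restore side from a
single array keeps the two from drifting apart.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical accessors standing in for the real get/set calls. */
typedef int (*param_get_fn)(void *dom, unsigned int idx, uint64_t *val);
typedef int (*param_set_fn)(void *dom, unsigned int idx, uint64_t val);

/*
 * Copy every parameter named in idx[] from src to dst. Because both
 * sides walk the same array, a parameter cannot be fetched without
 * also being written back, or the other way round.
 */
static int copy_params(const unsigned int *idx, size_t n,
                       param_get_fn get, void *src,
                       param_set_fn set, void *dst)
{
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t val;

        if (get(src, idx[i], &val) < 0)
            return -1;
        if (set(dst, idx[i], val) < 0)
            return -1;
    }
    return 0;
}

/* Array-backed stand-ins so the sketch can be exercised without Xen. */
#define NR_FAKE_PARAMS 8

static int fake_get(void *dom, unsigned int idx, uint64_t *val)
{
    if (idx >= NR_FAKE_PARAMS)
        return -1;
    *val = ((uint64_t *)dom)[idx];
    return 0;
}

static int fake_set(void *dom, unsigned int idx, uint64_t val)
{
    if (idx >= NR_FAKE_PARAMS)
        return -1;
    ((uint64_t *)dom)[idx] = val;
    return 0;
}
```

As noted above, this does nothing about the list going stale relative to
the hypervisor's own set of parameters.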

Also this function needs to take arch specifics into account.

 +while ( 1 )
 +{
 +sleep(SLEEP_INT);
 +if ( xc_get_tot_pages(xch, source_dom) <= 0 )
 +{
 +DPRINTF("All pages were transferred");
 +break;
 +}
 +}

I think we are going to need to find a better solution than this.

Changing the nature of the hypercall as I suggested in a previous reply
would also remove this, so I'll wait for a verdict on that before
worrying about this bit any further.
 [...]

 +PERROR("Faled to perform soft reset, destroying domain %d",

Failed

Ian.



Re: [Xen-devel] [PATCH 01/10] xen/arm: Implement hip04-d01 platform

2015-01-13 Thread Frediano Ziglio

2015-01-13 11:58 GMT+00:00 Ian Campbell ian.campb...@citrix.com:
 On Mon, 2014-11-03 at 10:11 +, Frediano Ziglio wrote:
 Add this new platform to Xen.
 This platform require specific code to initialize CPUs.

 What is the bootwrapper? Are you running this on real silicon or on an
 emulator? Can the platform be made to do PSCI instead?


Very real. It's actually on my desk and I'm not in the Matrix :-)
It has no PSCI support. That would honestly be great, and since we (as a
company) write the firmware it would be technically doable, but there is
no plan for it. This piece of software is meant to bring the CPU from
Secure mode to Non-secure Hypervisor mode before calling
kernel/hypervisor code, and to provide supervisor calls.

 +np_fab = dt_find_compatible_node(NULL, NULL, "hisilicon,hip04-fabric");

 Please add a reference to the DT bindings document for these values.

 linux/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
 seems related but doesn't talk about most of these fields.


There is documentation in the Linaro kernel tree, see
https://git.linaro.org/kernel/linux-linaro-tracking.git/blob/HEAD:/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt.
I hope it will be merged soon.

Frediano


Re: [Xen-devel] [PATCH v5 8/9] libxl: soft reset support

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
 Supported for HVM guests only.

Is it specifically PVHVM guests, or are unaware HVM guests also
supported? (I think the answer is that an unaware HVM guest has no way
to trigger a soft reset, so maybe it's moot...)

 diff --git a/tools/libxl/libxl.h b/tools/libxl/libxl.h
 index 0a123f1..710dc0e 100644
 --- a/tools/libxl/libxl.h
 +++ b/tools/libxl/libxl.h
 @@ -929,6 +929,12 @@ int static inline libxl_domain_create_restore_0x040200(
  
  #endif
  
 +int libxl_domain_soft_reset(libxl_ctx *ctx, libxl_domain_config *d_config,
 +uint32_t *domid, uint32_t domid_old,
 +const libxl_asyncop_how *ao_how,
 +const libxl_asyncprogress_how *aop_console_how)
 +LIBXL_EXTERNAL_CALLERS_ONLY;
 +
/* A progress report will be made via ao_console_how, of type
 * domain_create_console_available, when the domain's primary
 * console is available and can be connected to.
 diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
 index 1198225..0a840c9 100644
 --- a/tools/libxl/libxl_create.c
 +++ b/tools/libxl/libxl_create.c
 @@ -25,6 +25,8 @@
  #include <xen/hvm/hvm_info_table.h>
  #include <xen/hvm/e820.h>
  
 +#define INVALID_DOMID ~0

Is this completely internal to this file, or are you requiring that it
matches the one in xl_cmdimpl.c (i.e does it cross the library
interface)?
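
As a side note on the definition itself, here is a minimal sketch (not taken from the patch) of how an untyped `~0` sentinel behaves against different domid widths. The helper names are hypothetical, and the sketch assumes a platform where int is wider than 16 bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Untyped sentinel as in the quoted hunk: type int, value -1. */
#define INVALID_DOMID_UNTYPED ~0
/* Typed alternative (an assumption for illustration, not from the patch). */
#define INVALID_DOMID_TYPED (~(uint32_t)0)

/* A 32-bit domid compared after converting the sentinel to uint32_t:
 * -1 becomes 0xffffffff, so an all-ones domid matches. */
static bool u32_matches(uint32_t domid, int sentinel)
{
    return domid == (uint32_t)sentinel;
}

/* A 16-bit domid is promoted to int before the comparison, so 0xffff
 * becomes 65535 and never equals -1. */
static bool u16_matches(uint16_t domid, int sentinel)
{
    return domid == sentinel;
}
```

If the value ever has to agree between the library and its callers, a single typed definition in a shared header avoids depending on these conversion rules.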

 +
 +void libxl__xc_domain_soft_reset(libxl__egc *egc,
 + libxl__domain_create_state *dcs)
 +{
 +STATE_AO_GC(dcs->ao);
 +libxl_ctx *ctx = libxl__gc_owner(gc);
 +const uint32_t domid_soft_reset = dcs->domid_soft_reset;
 +const uint32_t domid = dcs->guest_domid;
 +libxl_domain_config *const d_config = dcs->guest_config;
 +libxl_domain_build_info *const info = &d_config->b_info;
 +uint8_t *buf;
 +uint32_t len;
 +uint32_t console_domid, store_domid;
 +unsigned long store_mfn, console_mfn;
 +int rc;
 +struct libxl__domain_suspend_state *dss;
 +
 +GCNEW(dss);
 +
 +dss->ao = ao;
 +dss->domid = domid_soft_reset;
 +dss->dm_savefile = GCSPRINTF("/var/lib/xen/qemu-save.%d",
 + domid_soft_reset);
 +
 +if (info->type == LIBXL_DOMAIN_TYPE_HVM) {

I thought the alternative (PV) wasn't possible?

 +rc = libxl__domain_suspend_device_model(gc, dss);
 +if (rc) goto out;
 +}
 +
 +console_domid = dcs->build_state.console_domid;
 +store_domid = dcs->build_state.store_domid;
[...]
 +rc = xc_domain_soft_reset(ctx-xch, domid_soft_reset, domid, 
 console_domid,
 +  console_mfn, store_domid, store_mfn);
 +if (rc) goto out;
[..]
 +dcs->build_state.store_mfn = store_mfn;
 +dcs->build_state.console_mfn = console_mfn;

Are you trying to avoid passing dcs->build_state.store_mfn to the xc
function directly for some reason?

 +
 +rc = libxl__toolstack_save(domid_soft_reset, buf, len, dss);
 +if (rc) goto out;
 +
 +rc = libxl__toolstack_restore(domid, buf, len, dcs-shs);
 +if (rc) goto out;
 +out:
 +/*
 + * Now pretend we did normal restore and simply call
 + * libxl__xc_domain_restore_done().
 + */
 +libxl__xc_domain_restore_done(egc, dcs, rc, 0, 0);
 +}
 +
  void libxl__srm_callout_callback_restore_results(unsigned long store_mfn,
unsigned long console_mfn, void *user)
  {
 diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
 index 4a0e2be..10ef652 100644
 --- a/tools/libxl/libxl_types.idl
 +++ b/tools/libxl/libxl_types.idl
 @@ -121,6 +121,8 @@ libxl_action_on_shutdown = 
 Enumeration(action_on_shutdown, [
  
  (5, COREDUMP_DESTROY),
  (6, COREDUMP_RESTART),
 +
 +(7, SOFT_RESET),

I think I mentioned a LIBXL_HAVE #define earlier on. Since they are all
related, I think you can have a single one for the overall feature rather
than one for each new enum value, function, etc. Probably
LIBXL_HAVE_DOMAIN_SOFT_RESET fits best?

 @@ -2519,7 +2538,17 @@ start:
   * restore/migrate-receive it again.
   */
  restoring = 0;
 -}else{
 +} else if (domid_old != INVALID_DOMID) {
 +/* Do soft reset */
 +ret = libxl_domain_soft_reset(ctx, d_config,
 +  domid, domid_old,
 +  0, 0);
 +
 +if ( ret ) {
 +goto error_out;
 +}
 +domid_old = INVALID_DOMID;
 +} else {
  ret = libxl_domain_create_new(ctx, d_config, domid,
0, autoconnect_console_how);
  }
 @@ -2583,6 +2612,8 @@ start:
  event-u.domain_shutdown.shutdown_reason,
  event-u.domain_shutdown.shutdown_reason);
  switch (handle_domain_death(domid, event, d_config)) {
 +case 3:
 +domid_old = domid;

Please comment when falling

[Xen-devel] [PATCH v3 06/24] xen/arm: Map disabled device in DOM0

The check to avoid mapping disabled devices in DOM0 was added in anticipation
of device passthrough. However, a brand new property will be added later to
mark devices which will be passed through. At the same time, remove the memory
type check because those nodes have been blacklisted.

Furthermore, some platforms (such as the OMAP) may try to poke a device even
if its status property is set to disabled.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Andrii Tseglytskyi andrii.tseglyts...@globallogic.com

---

Changes in v3:
- Patch added
- "xen/arm: follow-up to allow DOM0 manage IRQ and MMIO" has
been split into 2 patches [1]
- Drop the check for memory type. Those nodes have been
blacklisted.

[1] https://patches.linaro.org/34669/
---
 xen/arch/arm/domain_build.c| 19 +++
 xen/arch/arm/platforms/omap5.c | 12 
 2 files changed, 3 insertions(+), 28 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index 8f1b48e..f68755f 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1104,22 +1104,9 @@ static int handle_node(struct domain *d, struct kernel_info *kinfo,
 return 0;
 }
 
-/*
- * Some device doesn't need to be mapped in Xen:
- *  - Memory: the guest will see a different view of memory. It will
- *  be allocated later.
- *  - Disabled device: Linux is able to cope with status=disabled
- *  property. Therefore these device doesn't need to be mapped. This
- *  solution can be use later for pass through.
- */
-if ( !dt_device_type_is_equal(node, "memory") &&
- dt_device_is_available(node) )
-{
-res = map_device(d, node);
-
-if ( res )
-return res;
-}
+res = map_device(d, node);
+if ( res)
+return res;
 
 /*
  * The property name is used to have a different name on older FDT
diff --git a/xen/arch/arm/platforms/omap5.c b/xen/arch/arm/platforms/omap5.c
index 9d6e504..e7bf30d 100644
--- a/xen/arch/arm/platforms/omap5.c
+++ b/xen/arch/arm/platforms/omap5.c
@@ -155,17 +155,6 @@ static const char * const dra7_dt_compat[] __initconst =
 NULL
 };
 
-static const struct dt_device_match dra7_blacklist_dev[] __initconst =
-{
-/* OMAP Linux kernel handles devices with status disabled in a
- * weird manner - tries to reset them. While their memory ranges
- * are not mapped, this leads to data aborts, so skip these devices
- * from DT for dom0.
- */
-DT_MATCH_NOT_AVAILABLE(),
-{ /* sentinel */ },
-};
-
 PLATFORM_START(omap5, TI OMAP5)
 .compatible = omap5_dt_compat,
 .init_time = omap5_init_time,
@@ -185,7 +174,6 @@ PLATFORM_START(dra7, TI DRA7)
 
 .dom0_gnttab_start = 0x4b00,
 .dom0_gnttab_size = 0x2,
-.blacklist_dev = dra7_blacklist_dev,
 PLATFORM_END
 
 /*
-- 
2.1.4



[Xen-devel] [PATCH v3 12/24] xen/arm: Release IRQ routed to a domain when it's destroying

Xen has to release the IRQs routed to a domain in order to reuse them later.
Currently only SPIs can be routed to a guest, so we only need to browse the
SPIs of a specific domain.

Furthermore, a guest can crash and leave an IRQ in an incorrect state (i.e. it
has not been EOIed). Xen will have to reset the IRQ in order to be able to
reuse it later.

Introduce 2 new functions to release an IRQ routed to a domain:
- release_guest_irq: upper level to retrieve the IRQ, call the GIC
code and release the action
- gic_remove_irq_from_guest: Check if we can remove the IRQ, and reset
it if necessary
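
The removal policy described above can be summarised as a small decision function. This is only a sketch under stated assumptions: the enum and function names are hypothetical, and the three booleans stand in for d->is_dying, _IRQ_INPROGRESS and _IRQ_DISABLED.

```c
#include <assert.h>
#include <stdbool.h>

/* Outcome of asking to detach an IRQ from a guest (illustrative names). */
enum remove_verdict {
    REMOVE_OK,          /* IRQ is idle and disabled: detach it */
    REMOVE_OK_NEED_EOI, /* domain is dying, guest never EOIed: reset, then detach */
    REMOVE_BUSY         /* IRQ still in flight or still enabled: refuse */
};

static enum remove_verdict
guest_irq_remove_verdict(bool domain_dying, bool in_progress, bool disabled)
{
    /* A dying domain can always lose its IRQs; Xen only has to finish
     * the interrupt on the guest's behalf if it was left in progress. */
    if (domain_dying)
        return in_progress ? REMOVE_OK_NEED_EOI : REMOVE_OK;

    /* For a live domain, deny removal unless the IRQ is idle and disabled. */
    if (in_progress || !disabled)
        return REMOVE_BUSY;

    return REMOVE_OK;
}
```

In the patch, the REMOVE_BUSY case corresponds to returning -EBUSY, and the other two cases fall through to clearing _IRQ_GUEST and detaching the descriptor.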

Signed-off-by: Julien Grall julien.gr...@linaro.org

---
Changes in v3:
- Take the vgic rank lock to protect p->desc
- Correctly check if the IRQ is disabled
- Extend the check on the virq in release_guest_irq
- Use vgic_get_target_vcpu to get the target vCPU
- Remove spurious change

Changes in v2:
- Drop the desc-handler = no_irq_type in release_irq as it's
buggy if the IRQ is routed to Xen
- Add release_guest_irq and gic_remove_guest_irq
---
 xen/arch/arm/gic.c| 46 +
 xen/arch/arm/irq.c| 48 +++
 xen/arch/arm/vgic.c   | 16 
 xen/include/asm-arm/gic.h |  4 
 xen/include/asm-arm/irq.h |  2 ++
 5 files changed, 116 insertions(+)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 240870f..bb298e9 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -162,6 +162,52 @@ out:
 return res;
 }
 
+/* This function only works with SPIs for now */
+int gic_remove_irq_from_guest(struct domain *d, unsigned int virq,
+                              struct irq_desc *desc)
+{
+    struct vcpu *v_target = vgic_get_target_vcpu(d->vcpu[0], virq);
+    struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq);
+    struct pending_irq *p = irq_to_pending(v_target, virq);
+    unsigned long flags;
+
+    ASSERT(spin_is_locked(&desc->lock));
+    ASSERT(test_bit(_IRQ_GUEST, &desc->status));
+    ASSERT(p->desc == desc);
+
+    vgic_lock_rank(v_target, rank, flags);
+
+    /* If the IRQ is removed when the domain is dying, we only need to
+     * EOI the IRQ if it has not been done by the guest
+     */
+    if ( d->is_dying )
+    {
+        desc->handler->shutdown(desc);
+        if ( test_bit(_IRQ_INPROGRESS, &desc->status) )
+            gic_hw_ops->deactivate_irq(desc);
+        clear_bit(_IRQ_INPROGRESS, &desc->status);
+        goto end;
+    }
+
+    /* TODO: Handle eviction from LRs. For now, deny remove if the IRQ
+     * is inflight and not disabled.
+     */
+    if ( test_bit(_IRQ_INPROGRESS, &desc->status) ||
+         !test_bit(_IRQ_DISABLED, &desc->status) )
+        return -EBUSY;
+
+end:
+    clear_bit(_IRQ_GUEST, &desc->status);
+    desc->handler = &no_irq_type;
+
+    p->desc = NULL;
+
+    vgic_unlock_rank(v_target, rank, flags);
+
+
+    return 0;
+}
+
+
 int gic_irq_xlate(const u32 *intspec, unsigned int intsize,
   unsigned int *out_hwirq,
   unsigned int *out_type)
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 0072347..ce5ae1a 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -504,6 +504,54 @@ free_info:
 return retval;
 }
 
+int release_guest_irq(struct domain *d, unsigned int virq)
+{
+    struct irq_desc *desc;
+    struct irq_guest *info;
+    unsigned long flags;
+    struct pending_irq *p;
+    int ret;
+
+    /* Only SPIs are supported */
+    if ( virq < 32 || virq >= vgic_num_irqs(d) )
+        return -EINVAL;
+
+    p = irq_to_pending(d->vcpu[0], virq);
+    if ( !p->desc )
+        return -EINVAL;
+
+    desc = p->desc;
+
+    spin_lock_irqsave(&desc->lock, flags);
+
+    ret = -EINVAL;
+    if ( !test_bit(_IRQ_GUEST, &desc->status) )
+        goto unlock;
+
+    ret = -EINVAL;
+
+    info = irq_get_guest_info(desc);
+    if ( d != info->d )
+        goto unlock;
+
+    ret = gic_remove_irq_from_guest(d, virq, desc);
+
+    spin_unlock_irqrestore(&desc->lock, flags);
+
+    if ( !ret )
+    {
+        release_irq(desc->irq, info);
+        xfree(info);
+    }
+
+    return ret;
+
+unlock:
+    spin_unlock_irqrestore(&desc->lock, flags);
+
+    return ret;
+}
+
 /*
  * pirq event channels. We don't use these on ARM, instead we use the
  * features of the GIC to inject virtualised normal interrupts.
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index fc8a270..4ddfd73 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -133,6 +133,22 @@ void register_vgic_ops(struct domain *d, const struct 
vgic_ops *ops)
 
 void domain_vgic_free(struct domain *d)
 {
+int i;
+int ret;
+
+for ( i = 0; i < (d->arch.vgic.nr_spis); i++ )
+{
+struct pending_irq *p = &d->arch.vgic.pending_irqs[i];
+
+if ( p->desc )
+{
+ret = release_guest_irq(d, p->irq);
+if ( ret )
+dprintk(XENLOG_G_WARNING, d%u: Failed to

[Xen-devel] [PATCH v3 03/24] xen/dts: Allow only IRQ translation that are mapped to main GIC

Xen is only able to handle one GIC controller. Some platforms may contain
other interrupt controllers.

Make sure to only translate IRQs mapped into the GIC handled by Xen.

Signed-off-by: Julien Grall julien.gr...@linaro.org

---

Changes in v3:
- Patch was previously sent in a separate series [1]
- Rework the comment in dt_irq_translate.

Changelog based on the separate series:

Changes in v3:
- Add an ASSERT to check that dt_interrupt_controller is not
NULL.

Changes in v2:
- Fix compilation...

[1] https://patches.linaro.org/33312/
---
 xen/common/device_tree.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c
index f471008..bb9d7ce 100644
--- a/xen/common/device_tree.c
+++ b/xen/common/device_tree.c
@@ -1058,8 +1058,14 @@ int dt_irq_translate(const struct dt_raw_irq *raw,
  struct dt_irq *out_irq)
 {
 ASSERT(dt_irq_xlate != NULL);
+ASSERT(dt_interrupt_controller != NULL);
 
-/* TODO: Retrieve the right irq_xlate. This is only work for the gic */
+/*
+ * TODO: Retrieve the right irq_xlate. This is only works for the primary
+ * interrupt controller.
+ */
+if ( raw->controller != dt_interrupt_controller )
+return -EINVAL;
 
 return dt_irq_xlate(raw->specifier, raw->size,
 &out_irq->irq, &out_irq->type);
-- 
2.1.4



[Xen-devel] [PATCH v3 08/24] xen/arm: Allow virq != irq

Currently, Xen assumes that the virtual IRQ is always the same as the
physical IRQ.

Modify route_irq_to_guest to take the virtual IRQ as a parameter, which lets
Xen assign a different IRQ number. Also store the vIRQ in the desc action to
easily retrieve the IRQ target when we need to inject the interrupt.

As DOM0 will get most of the devices, the vIRQ is equal to the IRQ in that case.

At the same time, modify the behavior of irq_get_domain. The function now
assumes that the irq_desc belongs to an IRQ assigned to a guest.
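
To illustrate the idea of keeping the vIRQ alongside the owning domain, here is a minimal standalone sketch. It is not the Xen code: the trimmed-down struct irq_desc, the `guest` field and the helper names are assumptions made for the example (the patch stores the record in desc->action->dev_id).

```c
#include <assert.h>
#include <stddef.h>

struct domain { int domain_id; };

/* Per-IRQ record: which guest owns the physical IRQ and under which vIRQ. */
struct irq_guest {
    struct domain *d;
    unsigned int virq;
};

/* Trimmed-down descriptor holding only what this sketch needs. */
struct irq_desc {
    unsigned int irq;        /* physical IRQ number */
    struct irq_guest *guest; /* NULL while the IRQ is not routed to a guest */
};

/* Record that physical IRQ desc->irq is seen by domain d as virq. */
static void route_to_guest(struct irq_desc *desc, struct irq_guest *info,
                           struct domain *d, unsigned int virq)
{
    info->d = d;
    info->virq = virq;
    desc->guest = info;
}

/* vIRQ to inject when the physical IRQ fires; the IRQ must be routed. */
static unsigned int virq_to_inject(const struct irq_desc *desc)
{
    assert(desc->guest != NULL);
    return desc->guest->virq;
}

/* Route physical IRQ `irq` as `virq` and report what would be injected. */
static unsigned int demo_route(unsigned int irq, unsigned int virq)
{
    struct domain d = { 1 };
    struct irq_guest info;
    struct irq_desc desc = { irq, NULL };

    route_to_guest(&desc, &info, &d, virq);
    return virq_to_inject(&desc);
}
```

With this layout the interrupt path never needs to assume virq == irq; the DOM0 case is simply the one where both numbers happen to match.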

Signed-off-by: Julien Grall julien.gr...@linaro.org

---
Changes in v3
- Spelling/grammar nits
- Fix compilation on ARM64. Forgot to update route_irq_to_guest
  call for xgene platform.
- Add a word about irq_get_domain behavior change
- More s/irq/virq/ because of the rebasing on the latest staging

Changes in v2:
- Patch added
---
 xen/arch/arm/domain_build.c  |  2 +-
 xen/arch/arm/gic.c   |  5 ++--
 xen/arch/arm/irq.c   | 47 ++--
 xen/arch/arm/platforms/xgene-storm.c |  2 +-
 xen/arch/arm/vgic.c  | 20 +++
 xen/include/asm-arm/gic.h|  3 ++-
 xen/include/asm-arm/irq.h|  4 +--
 xen/include/asm-arm/vgic.h   |  4 +--
 8 files changed, 55 insertions(+), 32 deletions(-)

diff --git a/xen/arch/arm/domain_build.c b/xen/arch/arm/domain_build.c
index b48b5d0..06c1dec 100644
--- a/xen/arch/arm/domain_build.c
+++ b/xen/arch/arm/domain_build.c
@@ -1029,7 +1029,7 @@ static int handle_device(struct domain *d, struct 
dt_device_node *dev)
  * twice the IRQ. This can happen if the IRQ is shared
  */
 vgic_reserve_virq(d, irq);
-res = route_irq_to_guest(d, irq, dt_node_name(dev));
+res = route_irq_to_guest(d, irq, irq, dt_node_name(dev));
 if ( res )
 {
 printk(XENLOG_ERR Unable to route IRQ %u to domain %u\n,
diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index eb0c5d6..15de283 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -126,7 +126,8 @@ void gic_route_irq_to_xen(struct irq_desc *desc, const 
cpumask_t *cpu_mask,
 /* Program the GIC to route an interrupt to a guest
  *   - desc.lock must be held
  */
-void gic_route_irq_to_guest(struct domain *d, struct irq_desc *desc,
+void gic_route_irq_to_guest(struct domain *d, unsigned int virq,
+struct irq_desc *desc,
 const cpumask_t *cpu_mask, unsigned int priority)
 {
 struct pending_irq *p;
@@ -139,7 +140,7 @@ void gic_route_irq_to_guest(struct domain *d, struct 
irq_desc *desc,
 
 /* Use vcpu0 to retrieve the pending_irq struct. Given that we only
  * route SPIs to guests, it doesn't make any difference. */
-p = irq_to_pending(d->vcpu[0], desc->irq);
+p = irq_to_pending(d->vcpu[0], virq);
 p->desc = desc;
 }
 
diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 25ecf1d..830832c 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -31,6 +31,13 @@
 static unsigned int local_irqs_type[NR_LOCAL_IRQS];
 static DEFINE_SPINLOCK(local_irqs_type_lock);
 
+/* Describe an IRQ assigned to a guest */
+struct irq_guest
+{
+struct domain *d;
+unsigned int virq;
+};
+
 static void ack_none(struct irq_desc *irq)
 {
 printk(unexpected IRQ trap at irq %02x\n, irq-irq);
@@ -122,18 +129,20 @@ void __cpuinit init_secondary_IRQ(void)
 BUG_ON(init_local_irq_data()  0);
 }
 
-static inline struct domain *irq_get_domain(struct irq_desc *desc)
+static inline struct irq_guest *irq_get_guest_info(struct irq_desc *desc)
 {
 ASSERT(spin_is_locked(&desc->lock));
-
-if ( !test_bit(_IRQ_GUEST, &desc->status) )
-return dom_xen;
-
+ASSERT(test_bit(_IRQ_GUEST, &desc->status));
 ASSERT(desc->action != NULL);
 
 return desc->action->dev_id;
 }
 
+static inline struct domain *irq_get_domain(struct irq_desc *desc)
+{
+return irq_get_guest_info(desc)->d;
+}
+
 void irq_set_affinity(struct irq_desc *desc, const cpumask_t *cpu_mask)
 {
 if ( desc != NULL )
@@ -197,7 +206,7 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, 
int is_fiq)
 
 if ( test_bit(_IRQ_GUEST, desc-status) )
 {
-struct domain *d = irq_get_domain(desc);
+struct irq_guest *info = irq_get_guest_info(desc);
 
 desc-handler-end(desc);
 
@@ -206,7 +215,7 @@ void do_IRQ(struct cpu_user_regs *regs, unsigned int irq, 
int is_fiq)
 
 /* the irq cannot be a PPI, we only support delivery of SPIs to
  * guests */
-vgic_vcpu_inject_spi(d, irq);
+vgic_vcpu_inject_spi(info-d, info-virq);
 goto out_no_end;
 }
 
@@ -370,19 +379,30 @@ err:
 return rc;
 }
 
-int route_irq_to_guest(struct domain *d, unsigned int irq,
-   const char * devname)
+int route_irq_to_guest(struct domain *d, unsigned int virq,
+

[Xen-devel] [PATCH v3 11/24] xen/arm: Let the toolstack configure the number of SPIs

Each domain may have a different number of IRQs depending on the devices
assigned to it.

Rather than re-using the number of IRQs used by the hardware GIC, let the
toolstack specify the number of SPIs when the domain is created. This
avoids wasting memory.

To calculate the number of SPIs, we assume that any IRQ given via the option
irqs= in xl is mapped 1:1 to the guest.
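
The calculation can be sketched as a standalone function. This is an illustration under assumptions, not the libxl code: the function names and the NR_LOCAL_IRQS constant are local to the example. IRQs 0..31 are per-CPU (SGIs and PPIs), so SPI n corresponds to IRQ n + 32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_LOCAL_IRQS 32u /* SGIs + PPIs: IRQs 0..31 are not SPIs */

/* Highest SPI index + 1 among the listed IRQs; 0 if none is an SPI. */
static uint32_t count_spis(const uint32_t *irqs, size_t num_irqs)
{
    uint32_t nr_spis = 0;

    assert(irqs != NULL || num_irqs == 0);

    for (size_t i = 0; i < num_irqs; i++) {
        if (irqs[i] < NR_LOCAL_IRQS)
            continue; /* per-CPU interrupt, not counted */

        uint32_t spi = irqs[i] - NR_LOCAL_IRQS;
        if (nr_spis <= spi)
            nr_spis = spi + 1;
    }

    return nr_spis;
}

/* The hypervisor side rounds the count up to a multiple of 32. */
static uint32_t round_up_32(uint32_t n)
{
    return (n + 31u) & ~31u;
}
```

For example, irqs = [ 112, 113, 114 ] covers SPIs 80 to 82, so the toolstack would request 83 SPIs and the vGIC would be sized for 96.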

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Wei Liu wei.l...@citrix.com

---
Changes in v3:
- Fix typos
- A separate patch has been created to extend the DOMCTL create domain

Changes in v2:
- Patch added
---
 tools/libxc/xc_domain.c   |  1 +
 tools/libxl/libxl_arm.c   | 19 +++
 xen/arch/arm/domain.c |  7 ++-
 xen/arch/arm/setup.c  |  1 +
 xen/arch/arm/vgic.c   | 10 +-
 xen/include/asm-arm/domain.h  |  2 ++
 xen/include/asm-arm/setup.h   |  1 +
 xen/include/asm-arm/vgic.h|  2 +-
 xen/include/public/arch-arm.h |  2 ++
 9 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index eebc121..eb066cf 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -67,6 +67,7 @@ int xc_domain_create(xc_interface *xch,
 /* No arch-specific configuration for now */
 #elif defined (__arm__) || defined(__aarch64__)
 config.gic_version = XEN_DOMCTL_CONFIG_GIC_DEFAULT;
+config.nr_spis = 0;
 #else
 errno = ENOSYS;
 return -1;
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index cddce6e..53177eb 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -39,6 +39,25 @@ int libxl__arch_domain_prepare_config(libxl__gc *gc,
   libxl_domain_config *d_config,
   xc_domain_configuration_t *xc_config)
 {
+uint32_t nr_spis = 0;
+unsigned int i;
+
+for (i = 0; i < d_config->b_info.num_irqs; i++) {
+int irq = d_config->b_info.irqs[i];
+int spi = irq - 32;
+
+if (irq < 32)
+continue;
+
+if (nr_spis <= spi)
+nr_spis = spi + 1;
+}
+
+LOG(DEBUG, "Configure the domain");
+
+xc_config->nr_spis = nr_spis;
+LOG(DEBUG, " - Allocate %u SPIs", nr_spis);
+
 xc_config->gic_version = XEN_DOMCTL_CONFIG_GIC_DEFAULT;
 
 return 0;
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 2473b10..6e56665 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -560,10 +560,15 @@ int arch_domain_create(struct domain *d, unsigned int 
domcr_flags,
 }
 config-gic_version = gic_version;
 
+/* Sanity check on the number of SPIs */
+rc = -EINVAL;
+if ( config-nr_spis  (gic_number_lines() - 32) )
+goto fail;
+
 if ( (rc = gicv_setup(d)) != 0 )
 goto fail;
 
-if ( (rc = domain_vgic_init(d)) != 0 )
+if ( (rc = domain_vgic_init(d, config-nr_spis)) != 0 )
 goto fail;
 
 if ( (rc = domain_vtimer_init(d)) != 0 )
diff --git a/xen/arch/arm/setup.c b/xen/arch/arm/setup.c
index 18227f6..b28a708 100644
--- a/xen/arch/arm/setup.c
+++ b/xen/arch/arm/setup.c
@@ -815,6 +815,7 @@ void __init start_xen(unsigned long boot_phys_offset,
 /* Create initial domain 0. */
 /* The vGIC for DOM0 is exactly emulated the hardware GIC */
 config.gic_version = XEN_DOMCTL_CONFIG_GIC_DEFAULT;
+config.nr_spis = gic_number_lines() - 32;
 
 dom0 = domain_create(0, 0, 0, config);
 if ( IS_ERR(dom0) || (alloc_dom0_vcpu0(dom0) == NULL) )
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index c915670..fc8a270 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -67,16 +67,16 @@ static void vgic_init_pending_irq(struct pending_irq *p, 
unsigned int virq)
 p-irq = virq;
 }
 
-int domain_vgic_init(struct domain *d)
+int domain_vgic_init(struct domain *d, unsigned int nr_spis)
 {
 int i;
 
 d-arch.vgic.ctlr = 0;
 
-if ( is_hardware_domain(d) )
-d-arch.vgic.nr_spis = gic_number_lines() - 32;
-else
-d-arch.vgic.nr_spis = 0; /* We don't need SPIs for the guest */
+/* The number of SPIs has to be aligned to 32 see
+ * GICD_TYPER.ITLinesNumber definition
+ */
+d-arch.vgic.nr_spis = ROUNDUP(nr_spis, 32);
 
 switch ( gic_hw_version() )
 {
diff --git a/xen/include/asm-arm/domain.h b/xen/include/asm-arm/domain.h
index d302fc9..101b4e9 100644
--- a/xen/include/asm-arm/domain.h
+++ b/xen/include/asm-arm/domain.h
@@ -121,6 +121,8 @@ struct arch_domain
 unsigned int evtchn_irq;
 }  __cacheline_aligned;
 
+#define domain_is_configured(d) ((d)->arch.is_configured)
+
 struct arch_vcpu
 {
 struct {
diff --git a/xen/include/asm-arm/setup.h b/xen/include/asm-arm/setup.h
index ba5a67d..254cc17 100644
--- a/xen/include/asm-arm/setup.h
+++ b/xen/include/asm-arm/setup.h
@@ -54,6 +54,7 @@ void copy_from_paddr(void *dst, paddr_t paddr,

[Xen-devel] [PATCH v3 21/24] tools/(lib)xl: Add partial device tree support for ARM

Let the user pass additional nodes to the guest device tree. For this
purpose, everything in the node /passthrough from the partial device tree will
be copied into the guest device tree.

The node /aliases will also be copied to allow the user to define aliases
which can be used by the guest kernel.

A simple partial device tree will look like:

/dts-v1/;

/ {
    #address-cells = <2>;
    #size-cells = <2>;

    passthrough {
        compatible = "simple-bus";
        ranges;
        #address-cells = <2>;
        #size-cells = <2>;

        /* List of your nodes */
    };
};

Note that:
* The interrupt-parent property will be added by the toolstack in
the root node
* The properties compatible, ranges, #address-cells and #size-cells
in /passthrough are mandatory.
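
Because the partial device tree is supplied by the user, the patch validates its header before copying anything out of it. The bounds checks can be sketched as follows; this is a simplified illustration, and struct fdt_layout and both function names are hypothetical stand-ins for the libfdt header accessors used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Header fields relevant to bounds checking (hypothetical struct). */
struct fdt_layout {
    uint32_t totalsize;
    uint32_t off_struct, size_struct;
    uint32_t off_strings, size_strings;
};

/* True if [off, off + size) does not fit in max bytes. The operands are
 * widened to 64 bits so the sum of two 32-bit fields cannot wrap. */
static bool block_overruns(uint64_t off, uint64_t size, uint32_t max)
{
    return (off + size) > max;
}

/* Check that the blob fits the buffer and both blocks fit the blob. */
static bool fdt_layout_ok(const struct fdt_layout *h, size_t buf_size)
{
    assert(h != NULL);

    if (h->totalsize > buf_size)
        return false;

    return !block_overruns(h->off_struct, h->size_struct, h->totalsize) &&
           !block_overruns(h->off_strings, h->size_strings, h->totalsize);
}
```

The widening matters: with 32-bit arithmetic an offset close to UINT32_MAX plus a small size would wrap around and pass the check.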

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com

---
Changes in v3:
- Patch added
---
 docs/man/xl.cfg.pod.5   |   7 ++
 tools/libxl/libxl_arm.c | 253 
 tools/libxl/libxl_types.idl |   1 +
 tools/libxl/xl_cmdimpl.c|   1 +
 4 files changed, 262 insertions(+)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index e2f91fc..225b782 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -398,6 +398,13 @@ not emulated.
 Specify that this domain is a driver domain. This enables certain
 features needed in order to run a driver domain.
 
+=item B<device_tree="PATH">
+
+Specify a partial device tree (compiled via the Device Tree Compiler).
+Everything under the node /passthrough will be copied into the guest
+device tree. For convenience, the node /aliases is also copied to allow
+the user to define aliases which can be used by the guest kernel.
+
 =back
 
 =head2 Devices
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 53177eb..619458b 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -540,6 +540,238 @@ out:
 }
 }
 
+static bool check_overrun(uint64_t a, uint64_t b, uint32_t max)
+{
+    return ((a + b) > UINT_MAX || (a + b) > max);
+}
+
+/* Only FDT v17 is supported */
+#define FDT_REQUIRED_VERSION    0x11
+
+static int check_partial_fdt(libxl__gc *gc, void *fdt, size_t size)
+{
+int r;
+
+if (size  FDT_V17_SIZE) {
+LOG(ERROR, Partial FDT is too small);
+return ERROR_FAIL;
+}
+
+if (fdt_magic(fdt) != FDT_MAGIC) {
+LOG(ERROR, Partial FDT is not a valid Flat Device Tree);
+return ERROR_FAIL;
+}
+
+if (fdt_version(fdt) != FDT_REQUIRED_VERSION) {
+LOG(ERROR, Partial FDT version not supported. Required 0x%x got 0x%x,
+FDT_REQUIRED_VERSION, fdt_version(fdt));
+return ERROR_FAIL;
+}
+
+r = fdt_check_header(fdt);
+if (r) {
+LOG(ERROR, Failed to check the partial FDT (%d), r);
+return ERROR_FAIL;
+}
+
+/* Check if the *size and off* fields doesn't overrun the totalsize
+ * of the partial FDT.
+ */
+if (fdt_totalsize(fdt)  size) {
+LOG(ERROR, Partial FDT totalsize is too big);
+return ERROR_FAIL;
+}
+
+size = fdt_totalsize(fdt);
+if (fdt_off_dt_struct(fdt)  size ||
+fdt_off_dt_strings(fdt)  size ||
+check_overrun(fdt_off_dt_struct(fdt), fdt_size_dt_struct(fdt), size) ||
+check_overrun(fdt_off_dt_strings(fdt), fdt_size_dt_strings(fdt), 
size)) {
+LOG(ERROR, Failed to validate the header of the partial FDT);
+return ERROR_FAIL;
+}
+
+return 0;
+}
+
+/*
+ * Check if a string stored the strings block section is correctly
+ * nul-terminated.
+ * off_dt_strings and size_dt_strings fields have been validity-check
+ * earlier, so it's safe to use them here.
+ */
+static bool check_string(void *fdt, int nameoffset)
+{
+    const char *str = fdt_string(fdt, nameoffset);
+
+    for (; nameoffset < fdt_size_dt_strings(fdt); nameoffset++, str++) {
+        if (*str == '\0')
+            return true;
+    }
+
+    return false;
+}
+
+static int copy_properties(libxl__gc *gc, void *fdt, void *pfdt,
+   int nodeoff)
+{
+int propoff, nameoff, r;
+const struct fdt_property *prop;
+
+for (propoff = fdt_first_property_offset(pfdt, nodeoff);
+ propoff = 0;
+ propoff = fdt_next_property_offset(pfdt, propoff)) {
+
+if (!(prop = fdt_get_property_by_offset(pfdt, propoff, NULL))) {
+return -FDT_ERR_INTERNAL;
+}
+
+/*
+ * Libfdt doesn't perform any check on the validity of a string
+ * stored in the strings block section. As the property name is
+ * stored there, check it.
+ */
+nameoff = fdt32_to_cpu(prop-nameoff);
+if (!check_string(pfdt, nameoff)) {
+LOG(ERROR, The strings block section of the partial FDT is 
malformed);
+return -FDT_ERR_BADSTRUCTURE;
+}
+
+r =

[Xen-devel] [PATCH v3 00/24] xen/arm: Add support for non-pci passthrough

Hello all,

This is the third version of this patch series to add support for platform
device passthrough on ARM.

Compared to the previous version [1], the automatic mapping of MMIO/IRQ and
the generation of the device tree have been dropped.

Instead the user will have to:
- Manually map MMIO/IRQ
- Describe the device using the new partial device tree support
- Specify the list of devices protected by an IOMMU to assign to the
guest.

While this solution is primitive, it allows us to support more complex
devices in Xen with a little additional work for the user. Attempting to
do it automatically is more difficult because we may not know the dependencies
between devices (for instance a network card and a PHY).

To avoid adding code in DOM0 to manage platform device deassignment, the
user has to add the property xen,passthrough to the device tree node
describing the device. This can easily be done via U-Boot. For instance,
to pass through the second network card of a Midway server to the guest,
the user will have to add the following line to the U-Boot script:

fdt set /soc/ethernet@fff51000 xen,passthrough

This series has been tested on Midway by assigning the secondary network card
to a guest (see instructions below). I plan to do further testing on other
boards.

There are some TODOs, mostly related to XSM, in different patches (see the
commit message or /* TODO: ... */ in the files).

This series is based on my series "Find automatically a PPI for DOM0 event
channel IRQ" [2] and "xen/arm: Resync the SMMU driver with the Linux one" [3].
A working tree can be found here:
git://xenbits.xen.org/julieng/xen-unstable.git branch passthrough-v3

Major changes in v3:
- Rework the approach to pass through a device (xen,passthrough +
  partial device tree).
- Extend the existing hypercalls to assign/deassign devices rather than
adding new ones.
- Merge series [4] and [5] into this series.

Major changes in v2:
 - Drop the patch #1 of the previous version
 - Virtual IRQs are no longer equal to the physical interrupts
 - Move the hypercall to get DT information for privcmd to domctl
 - Split the domain creation in two parts to allow per-guest
 VGIC configuration (such as the number of SPIs).
 - Bunch of typos, commit improvements, function renaming.

For all changes, see each patch.

I believe it's better to have basic support in Xen rather than nothing.
This could be improved later.

Sincerely yours,

[1] http://lists.xen.org/archives/html/xen-devel/2014-07/msg04090.html
[2] http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg01386.html
[3] http://lists.xenproject.org/archives/html/xen-devel/2014-12/msg01612.html
[4] http://lists.xen.org/archives/html/xen-devel/2014-11/msg01672.html
[5] http://lists.xenproject.org/archives/html/xen-devel/2014-07/msg02098.html

=

Instructions to passthrough a non-PCI device

The example will use the secondary network card of the Midway server.

1) Mark the device to let Xen know the device will be used for passthrough.
This is done in the device tree node describing the device by adding the
property xen,passthrough. The command to do it in U-Boot is:

fdt set /soc/ethernet@fff51000 xen,passthrough

2) Create the partial device tree describing the device. The IRQs are mapped
1:1 to the guest (i.e. VIRQ == IRQ). For MMIO, you will have to find a hole in
the guest memory layout (see xen/include/public/arch-arm.h; note that the
layout is not stable and can change between 2 releases of Xen).

/dts-v1/;

/ {
    #address-cells = <2>;
    #size-cells = <2>;

    aliases {
        net = &mac0;
    };

    passthrough {
        compatible = "simple-bus";
        ranges;
        #address-cells = <2>;
        #size-cells = <2>;
        mac0: ethernet@1000 {
            compatible = "calxeda,hb-xgmac";
            reg = <0 0x1000 0 0x1000>;
            interrupts = <0 80 4  0 81 4  0 82 4>;
            /* dma-coherent can't be set because it requires platform
             * specific code for highbank
             */
            /*  dma-coherent; */
        };

        foo {
            my = <&mac0>;
        };
    };
};

3) Compile the partial guest device tree with dtc (Device Tree Compiler).
For our purpose, the compiled file will be called guest-midway.dtb and
placed in /root in DOM0.

4) Add the following options to the guest configuration file:

device_tree = "/root/guest-midway.dtb"
dtdev = [ "/soc/ethernet@fff51000" ]
irqs = [ 112, 113, 114 ]
iomem = [ "0xfff51,1@0x1" ]
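
The irqs values above follow from the interrupts property in the partial device tree. With the usual ARM GIC device tree binding, each interrupt is a (type, number, flags) triplet, where type 0 is an SPI and type 1 is a PPI. A minimal sketch of the conversion follows; the function and enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum gic_dt_type { GIC_DT_SPI = 0, GIC_DT_PPI = 1 };

/* Convert the first two cells of a GIC interrupt specifier to the IRQ
 * number used in the irqs= option. Returns -1 for an unknown type or an
 * out-of-range number. */
static int gic_dt_to_irq(uint32_t type, uint32_t number)
{
    switch (type) {
    case GIC_DT_SPI:
        /* SPIs start at IRQ 32; the binding allows numbers 0..987. */
        return number <= 987 ? (int)number + 32 : -1;
    case GIC_DT_PPI:
        /* PPIs start at IRQ 16; the binding allows numbers 0..15. */
        return number <= 15 ? (int)number + 16 : -1;
    default:
        return -1;
    }
}
```

So the three SPIs 80, 81 and 82 listed in the interrupts property become IRQs 112, 113 and 114 in the guest configuration.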

Cc: manish.ja...@caviumnetworks.com
Cc: suravee.suthikulpa...@amd.com
Cc: andrii.tseglyts...@globallogic.com

Julien Grall (24):
  xen: Extend DOMCTL createdomain to support arch configuration
  xen/arm: Divide GIC initialization in 2 parts
  xen/dts: Allow only IRQ translation that are mapped to main GIC
  xen: guestcopy: Provide an helper to safely copy string from guest

[Xen-devel] [PATCH v3 04/24] xen: guestcopy: Provide an helper to safely copy string from guest

Flask code already provides a helper to copy a string from a guest. In a later
patch, the new DT hypercalls will need a similar function.

To avoid code duplication, move the flask helper (flask_copyin_string) to
common code:
- Rename it to safe_copy_string_from_guest
- Add a comment to explain the extra +1
- Return the buffer directly and use the macros provided by
xen/err.h to return an error code if necessary.
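
For readers unfamiliar with the error-pointer convention, here is a rough
user-space sketch of the idea. The three macros are simplified stand-ins
written for this example (the real ones live in xen/err.h), and copy_string
is a hypothetical analogue of the helper, using memcpy instead of
copy_from_guest.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the xen/err.h macros. Assumption: the top
 * MAX_ERRNO addresses are never valid pointers, so they can carry a
 * negative error code. */
#define MAX_ERRNO 4095
#define ERR_PTR(err) ((void *)(intptr_t)(err))
#define PTR_ERR(ptr) ((long)(intptr_t)(ptr))
#define IS_ERR(ptr)  ((uintptr_t)(ptr) >= (uintptr_t)-MAX_ERRNO)

/* Hypothetical analogue of safe_copy_string_from_guest: returns either
 * a NUL-terminated copy of the string or an encoded error code. */
static void *copy_string(const char *src, size_t size, size_t max_size)
{
    char *tmp;

    if ( size > max_size )
        return ERR_PTR(-ENOENT);

    /* Extra +1 for the terminating NUL */
    tmp = malloc(size + 1);
    if ( !tmp )
        return ERR_PTR(-ENOMEM);

    memcpy(tmp, src, size);
    tmp[size] = 0;

    return tmp;
}
```

The caller tests the result with IS_ERR() and extracts the code with
PTR_ERR(), so a single return value carries both the buffer and the error.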

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Daniel De Graaf dgde...@tycho.nsa.gov
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com
Cc: Keir Fraser k...@xen.org

---
Changes in v3:
- Use macros of xen/err.h to return either the buffer or an
error code
- Reuse size_t instead of unsigned long
- Update comment and commit message

Changes in v2:
- Rename copy_string_from_guest into safe_copy_string_from_guest
- Update commit message and comment in the code
---
 xen/common/Makefile|  1 +
 xen/common/guestcopy.c | 30 +
 xen/include/xen/guest_access.h |  5 +
 xen/xsm/flask/flask_op.c   | 43 ++
 4 files changed, 46 insertions(+), 33 deletions(-)
 create mode 100644 xen/common/guestcopy.c

diff --git a/xen/common/Makefile b/xen/common/Makefile
index 9ce75bb..3da774a 100644
--- a/xen/common/Makefile
+++ b/xen/common/Makefile
@@ -10,6 +10,7 @@ obj-y += event_2l.o
 obj-y += event_channel.o
 obj-y += event_fifo.o
 obj-y += grant_table.o
+obj-y += guestcopy.o
 obj-y += irq.o
 obj-y += kernel.o
 obj-y += keyhandler.o
diff --git a/xen/common/guestcopy.c b/xen/common/guestcopy.c
new file mode 100644
index 000..d974f5c
--- /dev/null
+++ b/xen/common/guestcopy.c
@@ -0,0 +1,30 @@
+#include <xen/config.h>
+#include <xen/lib.h>
+#include <xen/guest_access.h>
+#include <xen/err.h>
+
+/* The function copies a string from the guest and adds a NUL to
+ * make sure the string is correctly terminated.
+ */
+void *safe_copy_string_from_guest(XEN_GUEST_HANDLE(char) u_buf,
+                                  size_t size, size_t max_size)
+{
+    char *tmp;
+
+    if ( size > max_size )
+        return ERR_PTR(-ENOENT);
+
+    /* Add an extra +1 to append \0 */
+    tmp = xmalloc_array(char, size + 1);
+    if ( !tmp )
+        return ERR_PTR(-ENOMEM);
+
+    if ( copy_from_guest(tmp, u_buf, size) )
+    {
+        xfree(tmp);
+        return ERR_PTR(-EFAULT);
+    }
+    tmp[size] = 0;
+
+    return tmp;
+}
diff --git a/xen/include/xen/guest_access.h b/xen/include/xen/guest_access.h
index 373454e..55645e6 100644
--- a/xen/include/xen/guest_access.h
+++ b/xen/include/xen/guest_access.h
@@ -8,6 +8,8 @@
 #define __XEN_GUEST_ACCESS_H__
 
 #include asm/guest_access.h
+#include xen/types.h
+#include public/xen.h
 
 #define copy_to_guest(hnd, ptr, nr) \
 copy_to_guest_offset(hnd, 0, ptr, nr)
@@ -27,4 +29,7 @@
 #define __clear_guest(hnd, nr)  \
 __clear_guest_offset(hnd, 0, nr)
 
+void *safe_copy_string_from_guest(XEN_GUEST_HANDLE(char) u_buf,
+ size_t size, size_t max_size);
+
 #endif /* __XEN_GUEST_ACCESS_H__ */
diff --git a/xen/xsm/flask/flask_op.c b/xen/xsm/flask/flask_op.c
index 7743aac..b14d306 100644
--- a/xen/xsm/flask/flask_op.c
+++ b/xen/xsm/flask/flask_op.c
@@ -12,6 +12,7 @@
 #include xen/event.h
 #include xsm/xsm.h
 #include xen/guest_access.h
+#include xen/err.h
 
 #include public/xsm/flask_op.h
 
@@ -76,29 +77,6 @@ static int domain_has_security(struct domain *d, u32 perms)
 perms, NULL);
 }
 
-static int flask_copyin_string(XEN_GUEST_HANDLE(char) u_buf, char **buf,
-   size_t size, size_t max_size)
-{
-char *tmp;
-
-if ( size  max_size )
-return -ENOENT;
-
-tmp = xmalloc_array(char, size + 1);
-if ( !tmp )
-return -ENOMEM;
-
-if ( copy_from_guest(tmp, u_buf, size) )
-{
-xfree(tmp);
-return -EFAULT;
-}
-tmp[size] = 0;
-
-*buf = tmp;
-return 0;
-}
-
 #endif /* COMPAT */
 
 static int flask_security_user(struct xen_flask_userlist *arg)
@@ -112,9 +90,9 @@ static int flask_security_user(struct xen_flask_userlist 
*arg)
 if ( rv )
 return rv;
 
-    rv = flask_copyin_string(arg->u.user, &user, arg->size, PAGE_SIZE);
-    if ( rv )
-        return rv;
+    user = safe_copy_string_from_guest(arg->u.user, arg->size, PAGE_SIZE);
+    if ( IS_ERR(user) )
+        return PTR_ERR(user);
 
 rv = security_get_user_sids(arg-start_sid, user, sids, nsids);
 if ( rv  0 )
@@ -227,9 +205,9 @@ static int flask_security_context(struct 
xen_flask_sid_context *arg)
 if ( rv )
 return rv;
 
-    rv = flask_copyin_string(arg->context, &buf, arg->size, PAGE_SIZE);
-    if ( rv )
-        return rv;
+    buf = safe_copy_string_from_guest(arg->context, arg->size, PAGE_SIZE);
+    if ( IS_ERR(buf) )
+        return PTR_ERR(buf);

[Xen-devel] [PATCH v3 16/24] xen/passthrough: Introduce iommu_construct

This new function will correctly initialize the IOMMU page table for the
current domain.

Also use it in iommu_assign_dt_device even though the current IOMMU
implementation on ARM shares P2M with the processor.
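
The intent is that callers can invoke the function unconditionally. A
minimal stand-alone model of that behaviour follows, assuming a reduced
struct domain. The shares_p2m and populate_calls fields and the function
names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Reduced stand-in for struct domain */
struct domain {
    bool need_iommu;    /* set once the IOMMU page table is ready */
    bool shares_p2m;    /* hypothetical: IOMMU reuses the CPU page table */
    int populate_calls; /* hypothetical counter, illustration only */
};

/* Stand-in for the arch-specific page table population */
static int populate_page_table(struct domain *d)
{
    d->populate_calls++;
    return 0;
}

/* Model of the construct step: a no-op once the domain is set up */
static int construct(struct domain *d)
{
    int rc;

    if ( d->need_iommu )
        return 0;

    if ( !d->shares_p2m )
    {
        rc = populate_page_table(d);
        if ( rc )
            return rc;
    }

    d->need_iommu = true;

    return 0;
}
```

Calling construct() a second time on the same domain returns 0 without
populating the page table again, which is what lets both the PCI and the
device tree assignment paths share it.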

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Jan Beulich jbeul...@suse.com

---
Changes in v3:
- The ASSERT in iommu_construct was redundant with the if ()
- Remove d->need_iommu = 1 in assign_device as it's already
done by iommu_construct.
- Simplify the code in the caller of iommu_construct

Changes in v2:
- Add missing Signed-off-by
- Rename iommu_buildup to iommu_construct
---
 xen/drivers/passthrough/arm/iommu.c   |  6 ++
 xen/drivers/passthrough/device_tree.c |  4 
 xen/drivers/passthrough/iommu.c   | 19 +++
 xen/drivers/passthrough/pci.c | 15 ---
 xen/include/xen/iommu.h   |  2 ++
 5 files changed, 35 insertions(+), 11 deletions(-)

diff --git a/xen/drivers/passthrough/arm/iommu.c 
b/xen/drivers/passthrough/arm/iommu.c
index 3e9303a..5870aef 100644
--- a/xen/drivers/passthrough/arm/iommu.c
+++ b/xen/drivers/passthrough/arm/iommu.c
@@ -68,3 +68,9 @@ void arch_iommu_domain_destroy(struct domain *d)
 {
 iommu_dt_domain_destroy(d);
 }
+
+int arch_iommu_populate_page_table(struct domain *d)
+{
+/* The IOMMU shares the p2m with the CPU */
+return -ENOSYS;
+}
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 377d41d..88e496e 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -41,6 +41,10 @@ int iommu_assign_dt_device(struct domain *d, struct 
dt_device_node *dev)
 if ( !list_empty(dev-domain_list) )
 goto fail;
 
+rc = iommu_construct(d);
+if ( rc )
+goto fail;
+
 rc = hd-platform_ops-assign_device(d, 0, dt_to_dev(dev));
 
 if ( rc )
diff --git a/xen/drivers/passthrough/iommu.c b/xen/drivers/passthrough/iommu.c
index cc12735..8915244 100644
--- a/xen/drivers/passthrough/iommu.c
+++ b/xen/drivers/passthrough/iommu.c
@@ -187,6 +187,25 @@ void iommu_teardown(struct domain *d)
 tasklet_schedule(iommu_pt_cleanup_tasklet);
 }
 
+int iommu_construct(struct domain *d)
+{
+    int rc = 0;
+
+    if ( need_iommu(d) > 0 )
+        return 0;
+
+    if ( !iommu_use_hap_pt(d) )
+    {
+        rc = arch_iommu_populate_page_table(d);
+        if ( rc )
+            return rc;
+    }
+
+    d->need_iommu = 1;
+
+    return rc;
+}
+
 void iommu_domain_destroy(struct domain *d)
 {
 struct hvm_iommu *hd = domain_hvm_iommu(d);
diff --git a/xen/drivers/passthrough/pci.c b/xen/drivers/passthrough/pci.c
index 43ce5dc..9a47a37 100644
--- a/xen/drivers/passthrough/pci.c
+++ b/xen/drivers/passthrough/pci.c
@@ -1355,18 +1355,11 @@ static int assign_device(struct domain *d, u16 seg, u8 
bus, u8 devfn)
 if ( !spin_trylock(pcidevs_lock) )
 return -ERESTART;
 
-if ( need_iommu(d) = 0 )
+rc = iommu_construct(d);
+if ( rc )
 {
-if ( !iommu_use_hap_pt(d) )
-{
-rc = arch_iommu_populate_page_table(d);
-if ( rc )
-{
-spin_unlock(pcidevs_lock);
-return rc;
-}
-}
-d-need_iommu = 1;
+spin_unlock(pcidevs_lock);
+return rc;
 }
 
 pdev = pci_get_pdev_by_domain(hardware_domain, seg, bus, devfn);
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index d0f99ef..c146ee4 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -65,6 +65,8 @@ int arch_iommu_domain_init(struct domain *d);
 int arch_iommu_populate_page_table(struct domain *d);
 void arch_iommu_check_autotranslated_hwdom(struct domain *d);
 
+int iommu_construct(struct domain *d);
+
 /* Function used internally, use iommu_domain_destroy */
 void iommu_teardown(struct domain *d);
 
-- 
2.1.4



[Xen-devel] [PATCH v3 09/24] xen/arm: route_irq_to_guest: Check validity of the IRQ

Currently Xen only supports routing SPIs to a guest. Add a function
is_assignable_irq to check if we can assign a given IRQ to the guest.

Secondly, make sure the vIRQ is not greater than the number of IRQs handled
by the vGIC and that it is an SPI.

Thirdly, when the IRQ is already assigned to the domain, check that the user
is not asking to use a different vIRQ than the one already bound.

Finally, desc->arch.type, which contains the IRQ type (i.e. level/edge), must
be correctly configured beforehand. The IRQ type won't be configured when:
- the device has been blacklisted for the current platform
- the IRQ has not been described in the device tree

I think we can safely assume that a user will never ask to route
such an IRQ to the guest.

Also, use XENLOG_G_ERR in the error message within the function as it will
be later called from a guest.
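
The two range checks described above can be sketched in isolation as
follows. This is an illustration only: the limits are passed in as plain
parameters instead of coming from gic_number_lines() and vgic_num_irqs(),
and is_valid_virq is a hypothetical name.

```c
#include <stdbool.h>

/* Assumption: IRQs 0-31 (SGIs and PPIs) are per-CPU, SPIs start at 32 */
#define NR_LOCAL_IRQS 32

/* Only SPIs can be routed: irq must lie in [NR_LOCAL_IRQS, nr_lines) */
static bool is_assignable(unsigned int irq, unsigned int nr_lines)
{
    return (irq >= NR_LOCAL_IRQS) && (irq < nr_lines);
}

/* Hypothetical helper: the vIRQ must be a virtual SPI known to the vGIC */
static bool is_valid_virq(unsigned int virq, unsigned int vgic_nr_irqs)
{
    return (virq >= NR_LOCAL_IRQS) && (virq < vgic_nr_irqs);
}
```

For example, with 160 interrupt lines IRQ 112 passes the first check,
while a PPI such as IRQ 27 is rejected.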

Signed-off-by: Julien Grall julien.gr...@linaro.org

---
Changes in v3:
- Fix typo in commit message and comment
- Add a check that the vIRQ is an SPI
- Check if the user is not asking for a different vIRQ when the
IRQ is already assigned to the guest

Changes in v2:
- Rename is_routable_irq into is_assignable_irq
- Check that the IRQ is not greater than the number of IRQs
handled by the GIC
- Move is_assignable_irq in irq.c rather than defining in the
header irq.h
- Retrieve the irq descriptor after checking the validity of the
IRQ
- vgic_num_irqs has been moved in a separate patch
- Fix the irq check against vgic_num_irqs
- Use virq instead of irq for vGIC sanity check
---
 xen/arch/arm/irq.c| 58 +++
 xen/include/asm-arm/irq.h |  2 ++
 2 files changed, 56 insertions(+), 4 deletions(-)

diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c
index 830832c..af408ac 100644
--- a/xen/arch/arm/irq.c
+++ b/xen/arch/arm/irq.c
@@ -379,6 +379,15 @@ err:
 return rc;
 }
 
+bool_t is_assignable_irq(unsigned int irq)
+{
+    /* For now, we can only route SPIs to the guest */
+    return ((irq >= NR_LOCAL_IRQS) && (irq < gic_number_lines()));
+}
+
+/* Route an IRQ to a specific guest.
+ * For now only SPIs are assignabled to the guest.
+ */
 int route_irq_to_guest(struct domain *d, unsigned int virq,
unsigned int irq, const char * devname)
 {
@@ -388,6 +397,29 @@ int route_irq_to_guest(struct domain *d, unsigned int virq,
 unsigned long flags;
 int retval = 0;
 
+    if ( !is_assignable_irq(irq) )
+    {
+        dprintk(XENLOG_G_ERR, "the IRQ%u is not routable\n", irq);
+        return -EINVAL;
+    }
+
+    desc = irq_to_desc(irq);
+
+    if ( virq >= vgic_num_irqs(d) )
+    {
+        dprintk(XENLOG_G_ERR,
+                "the vIRQ number %u is too high for domain %u (max = %u)\n",
+                irq, d->domain_id, vgic_num_irqs(d));
+        return -EINVAL;
+    }
+
+    /* Only routing to virtual SPIs is supported */
+    if ( virq < 32 )
+    {
+        dprintk(XENLOG_G_ERR, "IRQ can only be routed to a virtual SPIs");
+        return -EINVAL;
+    }
+
 action = xmalloc(struct irqaction);
 if ( !action )
 return -ENOMEM;
@@ -408,8 +440,18 @@ int route_irq_to_guest(struct domain *d, unsigned int virq,
 
 spin_lock_irqsave(desc-lock, flags);
 
+if ( desc-arch.type == DT_IRQ_TYPE_INVALID )
+{
+dprintk(XENLOG_G_ERR, IRQ %u has not been configured\n,
+irq);
+retval = -EIO;
+goto out;
+}
+
 /* If the IRQ is already used by someone
- *  - If it's the same domain - Xen doesn't need to update the IRQ desc
+ *  - If it's the same domain - Xen doesn't need to update the IRQ desc.
+ *  For safety check if we are not trying to assign the IRQ to a
+ *  different vIRQ.
  *  - Otherwise - For now, don't allow the IRQ to be shared between
  *  Xen and domains.
  */
@@ -418,13 +460,21 @@ int route_irq_to_guest(struct domain *d, unsigned int 
virq,
 struct domain *ad = irq_get_domain(desc);
 
 if ( test_bit(_IRQ_GUEST, desc-status)  d == ad )
+{
+if ( irq_get_guest_info(desc)-virq != virq )
+{
+dprintk(XENLOG_G_ERR, d%u: IRQ %u is already assigned to vIRQ 
%u\n,
+d-domain_id, irq, irq_get_guest_info(desc)-virq);
+retval = -EPERM;
+}
 goto out;
+}
 
 if ( test_bit(_IRQ_GUEST, desc-status) )
-printk(XENLOG_ERR ERROR: IRQ %u is already used by domain %u\n,
-   irq, ad-domain_id);
+dprintk(XENLOG_G_ERR, IRQ %u is already used by domain %u\n,
+irq, ad-domain_id);
 else
-printk(XENLOG_ERR ERROR: IRQ %u is already used by Xen\n, irq);
+dprintk(XENLOG_G_ERR, IRQ %u is already used by Xen\n, irq);
 retval = -EBUSY;
 goto out;
 }
diff --git

[Xen-devel] [PATCH v3 05/24] xen/arm: vgic: Introduce a function to initialize pending_irq

The structure pending_irq is initialized in the same way in 2 different
places. Introduce vgic_init_pending_irq to avoid code duplication.

Also move the setting of the irq field into this function, as we only need to
initialize it once rather than every time an IRQ is injected into the guest.

Finally, use unsigned int for the irq field to be consistent with the
virq variable.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Acked-by: Stefano Stabellini stefano.stabell...@eu.citrix.com

---
Changes in v3:
- Add Stefano's Acked-by
- The irq field is now unsigned int
- Update commit message to mention the int -> unsigned int
change
- Use unsigned int rather than unsigned

Changes in v2:
- Patch added
---
 xen/arch/arm/gic.c |  2 +-
 xen/arch/arm/vgic.c| 19 ++-
 xen/include/asm-arm/vgic.h |  2 +-
 3 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
index 63147f3..eb0c5d6 100644
--- a/xen/arch/arm/gic.c
+++ b/xen/arch/arm/gic.c
@@ -627,7 +627,7 @@ void gic_dump_info(struct vcpu *v)
 
 list_for_each_entry ( p, v-arch.vgic.inflight_irqs, inflight )
 {
-printk(Inflight irq=%d lr=%u\n, p-irq, p-lr);
+printk(Inflight irq=%u lr=%u\n, p-irq, p-lr);
 }
 
 list_for_each_entry( p, v-arch.vgic.lr_pending, lr_queue )
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c
index 0b24eec..38216f7 100644
--- a/xen/arch/arm/vgic.c
+++ b/xen/arch/arm/vgic.c
@@ -60,6 +60,13 @@ struct vgic_irq_rank *vgic_rank_irq(struct vcpu *v, unsigned 
int irq)
 return vgic_get_rank(v, rank);
 }
 
+static void vgic_init_pending_irq(struct pending_irq *p, unsigned int virq)
+{
+    INIT_LIST_HEAD(&p->inflight);
+    INIT_LIST_HEAD(&p->lr_queue);
+    p->irq = virq;
+}
+
 int domain_vgic_init(struct domain *d)
 {
 int i;
@@ -100,10 +107,8 @@ int domain_vgic_init(struct domain *d)
 return -ENOMEM;
 
 for (i=0; id-arch.vgic.nr_spis; i++)
-{
-INIT_LIST_HEAD(d-arch.vgic.pending_irqs[i].inflight);
-INIT_LIST_HEAD(d-arch.vgic.pending_irqs[i].lr_queue);
-}
+vgic_init_pending_irq(d-arch.vgic.pending_irqs[i], i + 32);
+
 for (i=0; iDOMAIN_NR_RANKS(d); i++)
 spin_lock_init(d-arch.vgic.shared_irqs[i].lock);
 
@@ -147,10 +152,7 @@ int vcpu_vgic_init(struct vcpu *v)
 
 memset(v-arch.vgic.pending_irqs, 0, sizeof(v-arch.vgic.pending_irqs));
 for (i = 0; i  32; i++)
-{
-INIT_LIST_HEAD(v-arch.vgic.pending_irqs[i].inflight);
-INIT_LIST_HEAD(v-arch.vgic.pending_irqs[i].lr_queue);
-}
+vgic_init_pending_irq(v-arch.vgic.pending_irqs[i], i);
 
 INIT_LIST_HEAD(v-arch.vgic.inflight_irqs);
 INIT_LIST_HEAD(v-arch.vgic.lr_pending);
@@ -407,7 +409,6 @@ void vgic_vcpu_inject_irq(struct vcpu *v, unsigned int irq)
 goto out;
 }
 
-n-irq = irq;
 n-priority = priority;
 
 /* the irq is enabled */
diff --git a/xen/include/asm-arm/vgic.h b/xen/include/asm-arm/vgic.h
index 460a2f3..8582d9d 100644
--- a/xen/include/asm-arm/vgic.h
+++ b/xen/include/asm-arm/vgic.h
@@ -67,7 +67,7 @@ struct pending_irq
 #define GIC_IRQ_GUEST_MIGRATING   4
 unsigned long status;
 struct irq_desc *desc; /* only set it the irq corresponds to a physical 
irq */
-int irq;
+unsigned int irq;
 #define GIC_INVALID_LR ~(uint8_t)0
 uint8_t lr;
 uint8_t priority;
-- 
2.1.4



Re: [Xen-devel] [PATCH 01/10] xen/arm: Implement hip04-d01 platform

On Tue, 2015-01-13 at 14:09 +, Frediano Ziglio wrote:
2015-01-13 11:58 GMT+00:00 Ian Campbell ian.campb...@citrix.com:
On Mon, 2014-11-03 at 10:11 +, Frediano Ziglio wrote:
Add this new platform to Xen.
This platform require specific code to initialize CPUs.

What is the bootwrapper? Are you running this on real silicon or on an
emulator? Can the platform be made to do PSCI instead?

Very real. It's actually on my desk and I'm not in Matrix :-)

OK. The choice of bootwrapper as a name is a bit unfortunate, since it
is already used for something else, but oh well.

Has no PSCI support. Would be honestly very great. As we (as company)
write the firmware could be technically doable. There is no plan. This
piece of software is meant to bring the CPU from Secure mode to
Unsecure Hypervisor mode before calling kernel/hypervisor code and
provide supervisor calls.

Sounds a lot like PSCI to me, except non-standard ;-)

+np_fab = dt_find_compatible_node(NULL, NULL,
hisilicon,hip04-fabric);

Please add a reference to the DT bindings document for these values.

linux/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt
seems related but doesn't talk about most of these fields.

There are documentation in the Linaro kernel, see
https://git.linaro.org/kernel/linux-linaro-tracking.git/blob/HEAD:/Documentation/devicetree/bindings/arm/hisilicon/hisilicon.txt.
I hope it will be merged soon.

Thanks, but this doesn't seem to cover many of the properties used by
the code you are adding, e.g. bootwrapper-{size,magic},
relocation-{entry,size} (in fact it suggests they are part of a
boot-method array).

I get the feeling these might be legacy/deprecated. Perhaps we could get
away without supporting such things?

Ian.


Re: [Xen-devel] [PATCH 08/14] xen-netback: use foreign page information from the pages themselves

On 12/01/15 15:43, David Vrabel wrote:
 From: Jenny Herbert jenny.herb...@citrix.com
 
 Use the foreign page flag in netback to get the domid and grant ref
 needed for the grant copy.  This significantly simplifies the netback
 code and makes netback work with foreign pages from other backends
 (e.g., blkback).
 
 This allows blkback to use iSCSI disks provided by domUs running on
 the same host.

Dave,

This depends on several xen changes.  It's been Acked-by: Ian Campbell
ian.campb...@citrix.com

Are you happy for me to merge this via the xen tree in 3.20?

David

 Signed-off-by: Jenny Herbert jennifer.herb...@citrix.com
 Signed-off-by: David Vrabel david.vra...@citrix.com
 ---
  drivers/net/xen-netback/netback.c |  100 
 -
  1 file changed, 9 insertions(+), 91 deletions(-)
 
 diff --git a/drivers/net/xen-netback/netback.c 
 b/drivers/net/xen-netback/netback.c
 index 6441318..ae3ab37 100644
 --- a/drivers/net/xen-netback/netback.c
 +++ b/drivers/net/xen-netback/netback.c
 @@ -314,9 +314,7 @@ static struct xenvif_rx_meta *get_next_rx_buffer(struct 
 xenvif_queue *queue,
  static void xenvif_gop_frag_copy(struct xenvif_queue *queue, struct sk_buff 
 *skb,
struct netrx_pending_operations *npo,
struct page *page, unsigned long size,
 -  unsigned long offset, int *head,
 -  struct xenvif_queue *foreign_queue,
 -  grant_ref_t foreign_gref)
 +  unsigned long offset, int *head)
  {
   struct gnttab_copy *copy_gop;
   struct xenvif_rx_meta *meta;
 @@ -333,6 +331,8 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
 *queue, struct sk_buff *skb
   offset = ~PAGE_MASK;
  
   while (size  0) {
 + struct xen_page_foreign *foreign;
 +
   BUG_ON(offset = PAGE_SIZE);
   BUG_ON(npo-copy_off  MAX_BUFFER_OFFSET);
  
 @@ -361,9 +361,10 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
 *queue, struct sk_buff *skb
   copy_gop-flags = GNTCOPY_dest_gref;
   copy_gop-len = bytes;
  
 - if (foreign_queue) {
 - copy_gop-source.domid = foreign_queue-vif-domid;
 - copy_gop-source.u.ref = foreign_gref;
 + foreign = xen_page_foreign(page);
 + if (foreign) {
 + copy_gop-source.domid = foreign-domid;
 + copy_gop-source.u.ref = foreign-gref;
   copy_gop-flags |= GNTCOPY_source_gref;
   } else {
   copy_gop-source.domid = DOMID_SELF;
 @@ -406,35 +407,6 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
 *queue, struct sk_buff *skb
  }
  
  /*
 - * Find the grant ref for a given frag in a chain of struct ubuf_info's
 - * skb: the skb itself
 - * i: the frag's number
 - * ubuf: a pointer to an element in the chain. It should not be NULL
 - *
 - * Returns a pointer to the element in the chain where the page were found. 
 If
 - * not found, returns NULL.
 - * See the definition of callback_struct in common.h for more details about
 - * the chain.
 - */
 -static const struct ubuf_info *xenvif_find_gref(const struct sk_buff *const 
 skb,
 - const int i,
 - const struct ubuf_info *ubuf)
 -{
 - struct xenvif_queue *foreign_queue = ubuf_to_queue(ubuf);
 -
 - do {
 - u16 pending_idx = ubuf-desc;
 -
 - if (skb_shinfo(skb)-frags[i].page.p ==
 - foreign_queue-mmap_pages[pending_idx])
 - break;
 - ubuf = (struct ubuf_info *) ubuf-ctx;
 - } while (ubuf);
 -
 - return ubuf;
 -}
 -
 -/*
   * Prepare an SKB to be transmitted to the frontend.
   *
   * This function is responsible for allocating grant operations, meta
 @@ -459,8 +431,6 @@ static int xenvif_gop_skb(struct sk_buff *skb,
   int head = 1;
   int old_meta_prod;
   int gso_type;
 - const struct ubuf_info *ubuf = skb_shinfo(skb)-destructor_arg;
 - const struct ubuf_info *const head_ubuf = ubuf;
  
   old_meta_prod = npo-meta_prod;
  
 @@ -507,68 +477,16 @@ static int xenvif_gop_skb(struct sk_buff *skb,
   len = skb_tail_pointer(skb) - data;
  
   xenvif_gop_frag_copy(queue, skb, npo,
 -  virt_to_page(data), len, offset, head,
 -  NULL,
 -  0);
 +  virt_to_page(data), len, offset, head);
   data += len;
   }
  
   for (i = 0; i  nr_frags; i++) {
 - /* This variable also signals whether foreign_gref has a real
 -  * value or not.
 -  */
 - struct xenvif_queue *foreign_queue = NULL;
 - grant_ref_t

Re: [Xen-devel] [PATCH v3 5/5] tools: add total/local memory bandwith monitoring

On Tue, Jan 13, 2015 at 04:02:13PM +0800, Chao Peng wrote:
 Add Memory Bandwidth Monitoring(MBM) for VMs. Two types of monitoring
 are supported: total and local memory bandwidth monitoring. To use it,
 CMT should be enabled in hypervisor.
 
 Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
 ---
  docs/man/xl.pod.1 |9 +
  tools/libxc/include/xenctrl.h |2 +
  tools/libxc/xc_psr.c  |8 
  tools/libxl/libxl.h   |8 
  tools/libxl/libxl_psr.c   |   84 
 +
  tools/libxl/libxl_types.idl   |2 +
  tools/libxl/xl_cmdimpl.c  |   21 ++-
  tools/libxl/xl_cmdtable.c |4 +-
  8 files changed, 136 insertions(+), 2 deletions(-)
 
 diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
 index 6b89ba8..0370625 100644
 --- a/docs/man/xl.pod.1
 +++ b/docs/man/xl.pod.1
 @@ -1461,6 +1461,13 @@ is domain level. To monitor a specific domain, just 
 attach the domain id with
  the monitoring service. When the domain doesn't need to be monitored any 
 more,
  detach the domain id from the monitoring service.
  
 +Intel Broadwell and later server platforms also offer total/local memory
 +bandwidth monitoring. Xen supports per-domain monitoring for these two
 +additional monitoring types. Both memory bandwidth monitoring and L3 cache
 +occupancy monitoring share the same set of underground monitoring service. 
 Once
  ^^^
  underlying?

I'm not native speaker though. I will defer reviewing this paragraph to
a native English speaker.

 +a domain is attached to the monitoring service, monitoring data can be showed
 +for any of these monitoring types.
 +
  =over 4
  
[...]
 +static int libxl__psr_cmt_get_mem_bandwidth(libxl__gc *gc,
 +uint32_t domid,
 +xc_psr_cmt_type type,
 +uint32_t socketid,
 +uint32_t *bandwidth)
 +{
 +uint64_t sample1, sample2;
 +uint32_t upscaling_factor;
 +int retry_attempts = 0;
 +int rc;
 +
 +do {
 +rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
 +   sample1);
 +if (rc  0) {
 +rc = ERROR_FAIL;
 +goto out;
 +}
 +
 +usleep(1);
 +
 +rc = libxl__psr_cmt_get_l3_monitoring_data(gc, domid, type, socketid,
 +   sample2);
 +if (rc  0) {
 +   rc = ERROR_FAIL;
 +   goto out;
 +}
 +
 +if (sample2 = sample1)

If sample2 == sample1 then bandwidth is zero. Is this expected?

 +break;
 +
 +if (retry_attempts  MBM_SAMPLE_RETRY_MAX) {
 +retry_attempts++;
 +} else {
 +LOGE(ERROR, event counter overflowed);
 +rc = ERROR_FAIL;
 +goto out;
 +}
 +
 +} while(1);

Minor nit, should be while (1).

The rest of this patch looks OK to me.

Wei.


Re: [Xen-devel] [PATCH v5 6/9] libxl: add libxl__domain_soft_reset_destroy()

On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
 New libxl__domain_soft_reset_destroy() is an internal-only
 version of libxl_domain_destroy() which follows the same domain
 destroy path with the only difference: xc_domain_destroy() is
 being avoided so the domain is not actually being destroyed.

Rather than duplicating the bulk of libxl_domain_destroy, please make
this libxl__domain_destroy taking a flag and turn libxl_domain_destroy
into a thin wrapper around the new internal version.
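
Something along these lines, as a rough stand-alone sketch of the shape I
mean. All names and the result structure here are hypothetical and only
model the control flow. None of this is actual libxl code.

```c
#include <stdbool.h>

/* Hypothetical record of what the destroy path did */
struct destroy_result {
    bool cleaned_up;    /* device teardown, xenstore cleanup, ... */
    bool domain_killed; /* whether the final destroy call was issued */
};

/* Internal version: a single code path, the flag only skips the last step */
static struct destroy_result internal_destroy(unsigned int domid,
                                              bool soft_reset)
{
    struct destroy_result r = { false, false };

    (void)domid; /* unused in this model */

    r.cleaned_up = true;
    if ( !soft_reset )
        r.domain_killed = true;

    return r;
}

/* Public entry point: thin wrapper with an unchanged signature */
static struct destroy_result public_destroy(unsigned int domid)
{
    return internal_destroy(domid, false);
}
```

That way the public function keeps its signature, and the soft reset case
reuses the same destroy path with the flag set.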

 Add soft_reset flag to libxl__domain_destroy_state structure
 to support the change.
 
 The original libxl_domain_destroy() function could be easily
 modified to support new flag but I'm trying to avoid that as
 it is part of public API.

There are mechanisms which could be used here to rev the API if it was
desirable to expose this flag to the calling toolstack for some reason,
e.g. checkout the uses of LIBXL_API_VERSION in libxl.h.

Ian.



Re: [Xen-devel] [PATCH] libxl: provide xenlight.pc

On Tue, 2015-01-13 at 12:56 +, Wei Liu wrote:
 On Tue, Jan 13, 2015 at 01:19:05PM +0100, Olaf Hering wrote:
  On Tue, Jan 13, Ian Campbell wrote:
  
   On Fri, 2015-01-09 at 14:32 +, Wei Liu wrote:
A pkg-config file for libxl. It also contains two variables
(xenfirmwaredir and libexec_bin) so that tools that are very keen on
knowing the locations of Xen binaries (say, libvirt) can use them to
determine the location of the binaries.

Please rerun autogen.sh after applying this patch.
  
  Forgot to reply to this earlier:
  
  Should there really be another file.in.in.in.in mess? I think the
  major/minor values could be placed into some m4 file so that they can be
  substituted properly by configure.
  
 
 I was two minded when I wrote this path. On one hand I didn't want to
 place a m4 file here, on the other I didn't want to leak library version
 numbers to top level m4 directory. Finally I decided to do the .in.in
 trick.
 
 So if you have an argument for either of these please convince me...
 Or you have other idea about file placement please tell me.

I think the library SONAME belongs in the relevant Makefile, not hidden
in the m4 somewhere. Which I think necessitates .in.in. I think we can
live with that.

Ian.



[Xen-devel] [PATCH v3 20/24] xen/passthrough: Extend XEN_DOMCTL_assign_device to support DT device

TODO: Update the commit message

A device node is described by a path. It will be used to retrieve the
node in the device tree and assign the related device to the domain.

Only devices protected by an IOMMU can be assigned to a guest.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com
Cc: Jan Beulich jbeul...@suse.com

---
Changes in v2:
- Use a different number for XEN_DOMCTL_assign_dt_device
---
 tools/libxc/include/xenctrl.h | 10 
 tools/libxc/xc_domain.c   | 95 --
 xen/drivers/passthrough/device_tree.c | 97 +--
 xen/drivers/passthrough/iommu.c   |  7 +++
 xen/drivers/passthrough/pci.c | 43 +++-
 xen/include/public/domctl.h   | 15 +-
 xen/include/xen/iommu.h   |  3 ++
 7 files changed, 249 insertions(+), 21 deletions(-)

diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index d66571f..db45475 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2055,6 +2055,16 @@ int xc_deassign_device(xc_interface *xch,
  uint32_t domid,
  uint32_t machine_bdf);
 
+int xc_assign_dt_device(xc_interface *xch,
+uint32_t domid,
+char *path);
+int xc_test_assign_dt_device(xc_interface *xch,
+ uint32_t domid,
+ char *path);
+int xc_deassign_dt_device(xc_interface *xch,
+  uint32_t domid,
+  char *path);
+
 int xc_domain_memory_mapping(xc_interface *xch,
  uint32_t domid,
  unsigned long first_gfn,
diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c
index eb066cf..bca3aee 100644
--- a/tools/libxc/xc_domain.c
+++ b/tools/libxc/xc_domain.c
@@ -1637,7 +1637,8 @@ int xc_assign_device(
 
 domctl.cmd = XEN_DOMCTL_assign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
 
 return do_domctl(xch, domctl);
 }
@@ -1686,7 +1687,8 @@ int xc_test_assign_device(
 
 domctl.cmd = XEN_DOMCTL_test_assign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
 
 return do_domctl(xch, domctl);
 }
@@ -1700,11 +1702,96 @@ int xc_deassign_device(
 
 domctl.cmd = XEN_DOMCTL_deassign_device;
 domctl.domain = domid;
-domctl.u.assign_device.machine_sbdf = machine_sbdf;
- 
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
+domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
+
 return do_domctl(xch, domctl);
 }
 
+int xc_assign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+domctl.cmd = XEN_DOMCTL_assign_device;
+domctl.domain = (domid_t)domid;
+
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
+domctl.u.assign_device.u.dt.size = size;
+set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
+
+rc = do_domctl(xch, domctl);
+
+xc_hypercall_bounce_post(xch, path);
+
+return rc;
+}
+
+int xc_test_assign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+domctl.cmd = XEN_DOMCTL_test_assign_device;
+domctl.domain = (domid_t)domid;
+
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
+domctl.u.assign_device.u.dt.size = size;
+set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
+
+rc = do_domctl(xch, domctl);
+
+xc_hypercall_bounce_post(xch, path);
+
+return rc;
+}
+
+int xc_deassign_dt_device(
+xc_interface *xch,
+uint32_t domid,
+char *path)
+{
+int rc;
+size_t size = strlen(path);
+DECLARE_DOMCTL;
+DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN);
+
+if ( xc_hypercall_bounce_pre(xch, path) )
+return -1;
+
+domctl.cmd = XEN_DOMCTL_deassign_device;
+domctl.domain = (domid_t)domid;
+
+domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT;
+domctl.u.assign_device.u.dt.size = size;
+set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path);
+
+rc = do_domctl(xch, domctl);
+
+xc_hypercall_bounce_post(xch, path);
+
+return rc;
+}
+
+
+
+
 int

Re: [Xen-devel] [PATCH v5 4/9] xen: introduce XEN_DOMCTL_devour

2015-01-13 Thread Tim Deegan

At 13:53 + on 13 Jan (1421153637), Ian Campbell wrote:
 On Thu, 2014-12-11 at 14:45 +0100, Vitaly Kuznetsov wrote:
  +gmfn = mfn_to_gmfn(d, mfn);
 
 (I haven't thought about it super hard, but I'm taking it as given that
 this approach to kexec is going to be needed for ARM too, since that
 seems likely)
 
 mfn_to_gmfn is going to be a bit pricey on ARM, we don't have an m2p to
 refer to, I'm not sure what we would do instead, walking the p2m looking
 for mfns surely won't be a good idea!
 
 An alternative approach to this might be to walk the guest p2m (with
 appropriate continuations) and move each domheap page (this would also
 help us preserve super page mappings). It would also have the advantage
 of not needing additional stages in the destroy path and state in struct
 domain etc, since all the action would be constrained to the one
 hypercall.
 
 x86 folks, would that work for your p2m too?

Without having looked at the details, it sounds plausible to me.

Tim.

[Xen-devel] [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets

Always fully coalesce guest Rx packets into the minimum number of ring
slots.  Reducing the number of slots per packet has significant
performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
receive test).
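
To make the slot arithmetic concrete, here is a minimal user-space sketch that counts ring slots for a packet with and without full coalescing. It reproduces the start_new_rx_buffer() heuristic removed by this patch; rx_slots_needed() is a hypothetical helper, and it assumes a 4096-byte page, chunks no larger than a page that each start at offset 0 of their page, and no extra GSO slots.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SIZE         4096UL
#define MAX_BUFFER_OFFSET PAGE_SIZE

/* The heuristic removed by the patch, reproduced for comparison. */
bool start_new_rx_buffer(unsigned long offset, unsigned long size,
                         bool head, bool full_coalesce)
{
    /* Simple case: the current buffer is completely full. */
    if (offset == MAX_BUFFER_OFFSET)
        return true;

    assert(size <= MAX_BUFFER_OFFSET);
    return (offset + size > MAX_BUFFER_OFFSET) && offset && !head &&
           !full_coalesce;
}

/*
 * Hypothetical helper: count the ring slots used to copy 'nr' chunks,
 * each at most PAGE_SIZE bytes and assumed to start at page offset 0.
 */
unsigned int rx_slots_needed(const unsigned long *chunks, size_t nr,
                             bool full_coalesce)
{
    unsigned long copy_off = 0;
    unsigned int slots = 1;          /* the head buffer */
    bool head = true;

    for (size_t i = 0; i < nr; i++) {
        unsigned long size = chunks[i];

        while (size > 0) {
            unsigned long bytes = size;

            if (start_new_rx_buffer(copy_off, bytes, head, full_coalesce)) {
                slots++;
                copy_off = 0;
            }
            /* Never copy past the end of the current buffer. */
            if (copy_off + bytes > MAX_BUFFER_OFFSET)
                bytes = MAX_BUFFER_OFFSET - copy_off;

            copy_off += bytes;
            size -= bytes;
            head = false;            /* the head buffer now holds data */
        }
    }
    return slots;
}
```

With chunks of 2000, 3000 and 3000 bytes, full coalescing splits the second chunk across two buffers (one extra copy op) but needs one slot fewer overall, which is the trade-off described in this message.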

However, this does increase the number of grant ops per packet which
decreases performance with some workloads (intrahost VM to VM)
/unless/ grant copy has been optimized for adjacent ops with the same
source or destination (see grant-table: defer releasing pages
acquired in a grant copy[1]).

Do we need to retain the existing path and make the always coalesce
path conditional on a suitable version of Xen?

[1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html

Signed-off-by: David Vrabel david.vra...@citrix.com
---
 drivers/net/xen-netback/common.h  |1 -
 drivers/net/xen-netback/netback.c |  106 ++---
 2 files changed, 3 insertions(+), 104 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 5f1fda4..589fa25 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -251,7 +251,6 @@ struct xenvif {
 struct xenvif_rx_cb {
unsigned long expires;
int meta_slots_used;
-   bool full_coalesce;
 };
 
 #define XENVIF_RX_CB(skb) ((struct xenvif_rx_cb *)(skb)->cb)
diff --git a/drivers/net/xen-netback/netback.c 
b/drivers/net/xen-netback/netback.c
index 908e65e..568238d 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -233,51 +233,6 @@ static void xenvif_rx_queue_drop_expired(struct 
xenvif_queue *queue)
}
 }
 
-/*
- * Returns true if we should start a new receive buffer instead of
- * adding 'size' bytes to a buffer which currently contains 'offset'
- * bytes.
- */
-static bool start_new_rx_buffer(int offset, unsigned long size, int head,
-   bool full_coalesce)
-{
-   /* simple case: we have completely filled the current buffer. */
-   if (offset == MAX_BUFFER_OFFSET)
-   return true;
-
-   /*
-* complex case: start a fresh buffer if the current frag
-* would overflow the current buffer but only if:
-* (i)   this frag would fit completely in the next buffer
-* and (ii)  there is already some data in the current buffer
-* and (iii) this is not the head buffer.
-* and (iv)  there is no need to fully utilize the buffers
-*
-* Where:
-* - (i) stops us splitting a frag into two copies
-*   unless the frag is too large for a single buffer.
-* - (ii) stops us from leaving a buffer pointlessly empty.
-* - (iii) stops us leaving the first buffer
-*   empty. Strictly speaking this is already covered
-*   by (ii) but is explicitly checked because
-*   netfront relies on the first buffer being
-*   non-empty and can crash otherwise.
-* - (iv) is needed for skbs which can use up more than MAX_SKB_FRAGS
-*   slot
-*
-* This means we will effectively linearise small
-* frags but do not needlessly split large buffers
-* into multiple copies tend to give large frags their
-* own buffers as before.
-*/
-   BUG_ON(size > MAX_BUFFER_OFFSET);
-   if ((offset + size > MAX_BUFFER_OFFSET) && offset && !head &&
-   !full_coalesce)
-   return true;
-
-   return false;
-}
-
 struct netrx_pending_operations {
unsigned copy_prod, copy_cons;
unsigned meta_prod, meta_cons;
@@ -336,24 +291,13 @@ static void xenvif_gop_frag_copy(struct xenvif_queue 
*queue, struct sk_buff *skb
BUG_ON(offset >= PAGE_SIZE);
BUG_ON(npo->copy_off > MAX_BUFFER_OFFSET);
 
-   bytes = PAGE_SIZE - offset;
+   if (npo->copy_off == MAX_BUFFER_OFFSET)
+   meta = get_next_rx_buffer(queue, npo);
 
+   bytes = PAGE_SIZE - offset;
if (bytes > size)
bytes = size;
 
-   if (start_new_rx_buffer(npo->copy_off,
-   bytes,
-   *head,
-   XENVIF_RX_CB(skb)->full_coalesce)) {
-   /*
-* Netfront requires there to be some data in the head
-* buffer.
-*/
-   BUG_ON(*head);
-
-   meta = get_next_rx_buffer(queue, npo);
-   }
-
if (npo->copy_off + bytes > MAX_BUFFER_OFFSET)
bytes = MAX_BUFFER_OFFSET - npo->copy_off;
 
@@ -652,60 +596,16 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
while (xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX)
&& (skb = xenvif_rx_dequeue(queue)) != NULL) {
-   RING_IDX max_slots_needed;

[Xen-devel] [seabios test] 33391: tolerable FAIL - PUSHED

2015-01-13 Thread xen . org

flight 33391 seabios real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/33391/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel              9 guest-start  fail  never pass
 test-amd64-i386-libvirt                    9 guest-start  fail  never pass
 test-amd64-amd64-xl-pvh-amd                9 guest-start  fail  never pass
 test-amd64-amd64-libvirt                   9 guest-start  fail  never pass
 test-amd64-amd64-xl-pcipt-intel            9 guest-start  fail  never pass
 test-amd64-i386-xl-qemut-win7-amd64       14 guest-stop   fail  never pass
 test-amd64-amd64-xl-winxpsp3              14 guest-stop   fail  never pass
 test-amd64-i386-xl-winxpsp3-vcpus1        14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3         14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemut-winxpsp3        14 guest-stop   fail  never pass
 test-amd64-i386-xl-win7-amd64             14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-win7-amd64       14 guest-stop   fail  never pass
 test-amd64-amd64-xl-win7-amd64            14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemuu-win7-amd64      14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemut-win7-amd64      14 guest-stop   fail  never pass
 test-amd64-amd64-xl-qemuu-winxpsp3        14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3         14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  14 guest-stop   fail  never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1  14 guest-stop   fail  never pass
 test-amd64-i386-xl-winxpsp3               14 guest-stop   fail  never pass

version targeted for testing:
 seabios  301dd092c2d04a5d70c94b9d873d810785e94a84
baseline version:
 seabios  60e0e55f212dadd043ab9e39bee05a48013ddd8f


People who touched revisions under test:
  Kevin O'Connor ke...@koconnor.net


jobs:
 build-amd64                                pass
 build-i386                                 pass
 build-amd64-libvirt                        pass
 build-i386-libvirt                         pass
 build-amd64-pvops                          pass
 build-i386-pvops                           pass
 test-amd64-amd64-xl                        pass
 test-amd64-i386-xl                         pass
 test-amd64-amd64-xl-pvh-amd                fail
 test-amd64-i386-rhel6hvm-amd               pass
 test-amd64-i386-qemut-rhel6hvm-amd         pass
 test-amd64-i386-qemuu-rhel6hvm-amd         pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64  pass
 test-amd64-i386-xl-qemut-debianhvm-amd64   pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64   pass
 test-amd64-i386-freebsd10-amd64            pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64       pass
 test-amd64-i386-xl-qemuu-ovmf-amd64        pass
 test-amd64-amd64-xl-qemut-win7-amd64       fail
 test-amd64-i386-xl-qemut-win7-amd64        fail
 test-amd64-amd64-xl-qemuu-win7-amd64       fail
 test-amd64-i386-xl-qemuu-win7-amd64        fail
 test-amd64-amd64-xl-win7-amd64             fail
 test-amd64-i386-xl-win7-amd64              fail
 test-amd64-i386-xl-credit2                 pass
 test-amd64-i386-freebsd10-i386             pass
 test-amd64-amd64-xl-pcipt-intel            fail
 test-amd64-amd64-xl-pvh-intel              fail
 test-amd64-i386-rhel6hvm-intel             pass
 test-amd64-i386-qemut-rhel6hvm-intel       pass
 test-amd64-i386-qemuu-rhel6hvm-intel       pass
 test-amd64-amd64-libvirt                   fail
 test-amd64-i386-libvirt                    fail
 test-amd64-i386-xl-multivcpu               pass
 test-amd64-amd64-pair                      pass
 test-amd64-i386-pair                       pass
 test-amd64-amd64-xl-sedf-pin               pass
 test-amd64-amd64-xl-sedf                   pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1   fail

Re: [Xen-devel] [OSSTEST PATCH] make-flight: reorganize scheduling related test jobs

2015-01-13 Thread Dario Faggioli

On Mon, 2015-01-12 at 16:52 +, Ian Jackson wrote:
 Dario Faggioli writes ([OSSTEST PATCH] make-flight: reorganize scheduling 
 related test jobs):
  Scheduling related tests are ok to run on ARM, so do
  not cut them off. They also do not depend on a
  particular Dom0 architecture.
  
  The net effect is that the following tests are removed:
   test-amd64-i386-xl-credit2
   test-amd64-i386-xl-multivcpu
  
  while the following new ones are created:
   test-amd64-amd64-xl-credit2
   test-amd64-amd64-xl-multivcpu
   test-armhf-armhf-xl-credit2
   test-armhf-armhf-xl-multivcpu
   test-armhf-armhf-xl-sedf
   test-armhf-armhf-xl-sedf-pin
 
 This looks plausible but can you include the output of a diff between
 the two sets of runvars, please ?
 
Not sure I follow.

I will put down here a diff of two invocations of
`./mg-show-flight-runvars standalone', one done before and the other after
the patch... Is that what you meant?

$ diff -Nru runvars.orig runvars.patched 
--- runvars.orig2015-01-13 09:49:17.402478000 +
+++ runvars.patched 2015-01-13 09:49:56.794085000 +
@@ -3,6 +3,7 @@
 test-amd64-amd64-rumpuserxen-amd64all_hostflags   
arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test 
 test-amd64-amd64-xl   all_hostflags   
arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test 
 test-amd64-amd64-xl-credit2   all_hostflags   
arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test 
+test-amd64-amd64-xl-multivcpu all_hostflags   
arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test 
 test-amd64-amd64-xl-pcipt-intel   all_hostflags   
arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test,hvm-intel,pcipassthrough-nic
 test-amd64-amd64-xl-pvh-amd   all_hostflags   
arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test,hvm-amd 
 test-amd64-amd64-xl-pvh-intel all_hostflags   
arch-amd64,arch-xen-amd64,suite-wheezy,purpose-test,hvm-intel   
@@ -29,7 +30,6 @@
 test-amd64-i386-rhel6hvm-intelall_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test,hvm-intel
 test-amd64-i386-rumpuserxen-i386  all_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test  
 test-amd64-i386-xlall_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test  
-test-amd64-i386-xl-multivcpu  all_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test  
 test-amd64-i386-xl-qemut-debianhvm-amd64  all_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test,hvm  
 test-amd64-i386-xl-qemut-win7-amd64   all_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test,hvm  
 test-amd64-i386-xl-qemut-winxpsp3 all_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test,hvm  
@@ -44,6 +44,10 @@
 test-amd64-i386-xl-winxpsp3-vcpus1all_hostflags   
arch-i386,arch-xen-amd64,suite-wheezy,purpose-test,hvm  
 test-armhf-armhf-libvirt  all_hostflags   
arch-armhf,arch-xen-armhf,suite-wheezy,purpose-test 
 test-armhf-armhf-xl   all_hostflags   
arch-armhf,arch-xen-armhf,suite-wheezy,purpose-test 
+test-armhf-armhf-xl-credit2   all_hostflags   
arch-armhf,arch-xen-armhf,suite-wheezy,purpose-test 
+test-armhf-armhf-xl-multivcpu all_hostflags   
arch-armhf,arch-xen-armhf,suite-wheezy,purpose-test 
+test-armhf-armhf-xl-sedf  all_hostflags   
arch-armhf,arch-xen-armhf,suite-wheezy,purpose-test 
+test-armhf-armhf-xl-sedf-pin  all_hostflags   
arch-armhf,arch-xen-armhf,suite-wheezy,purpose-test 
 build-amd64   archamd64
   
 build-amd64-libvirt   archamd64
   
 build-amd64-oldkern   archamd64
   
@@ -62,6 +66,7 @@
 test-amd64-amd64-rumpuserxen-amd64archamd64

[Xen-devel] [PATCH v3 13/24] xen/arm: Implement hypercall PHYSDEVOP_{, un}map_pirq

The physdev sub-hypercalls PHYSDEVOP_{,un}map_pirq allow the toolstack to
assign/deassign a physical IRQ to the guest (via the config option irqs
for xl). The x86 version uses them with PIRQs (IRQs bound to an event
channel). As ARM has no such concept, we can reuse them to bind
a physical IRQ to a virtual IRQ.

For now, we allow only SPIs to be mapped to the guest.
The type MAP_PIRQ_TYPE_GSI is used for this purpose.
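
As a rough illustration of the "only SPIs" rule, the sketch below classifies an interrupt ID using the GICv2-style ID space (SGIs 0-15, PPIs 16-31, SPIs 32-1019). The names classify_irq() and irq_mappable_to_guest() are hypothetical and are not the helpers used in this series (the patch itself relies on is_assignable_irq()).

```c
#include <assert.h>
#include <stdbool.h>

/* GICv2-style interrupt ID ranges. */
#define NR_SGIS     16     /* IDs 0-15: software generated interrupts */
#define FIRST_SPI   32     /* IDs 16-31 are per-CPU PPIs */
#define LAST_SPI  1019     /* IDs 1020-1023 are special */

enum irq_kind { IRQ_SGI, IRQ_PPI, IRQ_SPI, IRQ_INVALID };

/* Hypothetical helper: tell which class an interrupt ID belongs to. */
enum irq_kind classify_irq(int irq)
{
    if (irq < 0 || irq > LAST_SPI)
        return IRQ_INVALID;
    if (irq < NR_SGIS)
        return IRQ_SGI;
    if (irq < FIRST_SPI)
        return IRQ_PPI;
    return IRQ_SPI;
}

/* Hypothetical policy check mirroring "only SPIs may be mapped". */
bool irq_mappable_to_guest(int irq)
{
    return classify_irq(irq) == IRQ_SPI;
}
```

This matches the two numeric checks in the patch below: irq < 16 can never be assigned, and irq >= 32 selects an SPI when allocating the virtual IRQ.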

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Jan Beulich jbeul...@suse.com

---
I'm not sure reusing hypercalls for a different purpose is the best
solution. If x86 plans to have such a concept (i.e. binding a
physical IRQ to a virtual IRQ), we could introduce new hypercalls.
Any thoughts?

TODO: This patch lacks support for vIRQ != IRQ. I plan to
handle it correctly in the next version.

Changes in v3:
- Functions to allocate/release/reserve a VIRQ have been moved
to a separate patch
- Make clear that only MAP_PIRQ_TYPE_GSI is supported for now

Changes in v2:
- Add PHYSDEVOP_unmap_pirq
- Rework commit message
- Add functions to allocate/release a VIRQ
- is_routable_irq has been renamed into is_assignable_irq
---
 xen/arch/arm/physdev.c | 136 -
 1 file changed, 134 insertions(+), 2 deletions(-)

diff --git a/xen/arch/arm/physdev.c b/xen/arch/arm/physdev.c
index 61b4a18..0cf9bbd 100644
--- a/xen/arch/arm/physdev.c
+++ b/xen/arch/arm/physdev.c
@@ -8,13 +8,145 @@
 #include <xen/types.h>
 #include <xen/lib.h>
 #include <xen/errno.h>
+#include <xen/iocap.h>
+#include <xen/guest_access.h>
+#include <xsm/xsm.h>
+#include <asm/current.h>
 #include <asm/hypercall.h>
+#include <public/physdev.h>
 
+static int physdev_map_pirq(domid_t domid, int type, int index, int *pirq_p)
+{
+struct domain *d;
+int ret;
+int irq = index;
+int virq;
+
+d = rcu_lock_domain_by_any_id(domid);
+if ( d == NULL )
+return -ESRCH;
+
+ret = xsm_map_domain_pirq(XSM_TARGET, d);
+if ( ret )
+goto free_domain;
+
+/* For now we only support GSI */
+if ( type != MAP_PIRQ_TYPE_GSI )
+{
+ret = -EINVAL;
+dprintk(XENLOG_G_ERR,
+"dom%u: wrong map_pirq type 0x%x, only MAP_PIRQ_TYPE_GSI is supported.\n",
+d->domain_id, type);
+goto free_domain;
+}
+
+if ( !is_assignable_irq(irq) )
+{
+ret = -EINVAL;
+dprintk(XENLOG_G_ERR, "IRQ%u is not routable to a guest\n", irq);
+goto free_domain;
+}
+
+ret = -EPERM;
+if ( !irq_access_permitted(current->domain, irq) )
+goto free_domain;
+
+if ( *pirq_p < 0 )
+{
+BUG_ON(irq < 16);   /* is_assignable_irq already denies SGIs */
+virq = vgic_allocate_virq(d, (irq >= 32));
+
+ret = -ENOSPC;
+if ( virq < 0 )
+goto free_domain;
+}
+else
+{
+ret = -EBUSY;
+virq = *pirq_p;
+
+if ( !vgic_reserve_virq(d, virq) )
+goto free_domain;
+}
+
+gdprintk(XENLOG_DEBUG, "irq = %u virq = %u\n", irq, virq);
+
+ret = route_irq_to_guest(d, virq, irq, "routed IRQ");
+
+if ( !ret )
+*pirq_p = virq;
+else
+vgic_free_virq(d, virq);
+
+free_domain:
+rcu_unlock_domain(d);
+
+return ret;
+}
+
+int physdev_unmap_pirq(domid_t domid, int pirq)
+{
+struct domain *d;
+int ret;
+
+d = rcu_lock_domain_by_any_id(domid);
+if ( d == NULL )
+return -ESRCH;
+
+ret = xsm_unmap_domain_pirq(XSM_TARGET, d);
+if ( ret )
+goto free_domain;
+
+ret = release_guest_irq(d, pirq);
+if ( ret )
+goto free_domain;
+
+vgic_free_virq(d, pirq);
+
+free_domain:
+rcu_unlock_domain(d);
+
+return ret;
+}
 
 int do_physdev_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
-printk("%s %d cmd=%d: not implemented yet\n", __func__, __LINE__, cmd);
-return -ENOSYS;
+int ret;
+
+switch ( cmd )
+{
+case PHYSDEVOP_map_pirq:
+{
+physdev_map_pirq_t map;
+
+ret = -EFAULT;
+if ( copy_from_guest(&map, arg, 1) != 0 )
+break;
+
+ret = physdev_map_pirq(map.domid, map.type, map.index, &map.pirq);
+
+if ( __copy_to_guest(arg, &map, 1) )
+ret = -EFAULT;
+}
+break;
+
+case PHYSDEVOP_unmap_pirq:
+{
+physdev_unmap_pirq_t unmap;
+
+ret = -EFAULT;
+if ( copy_from_guest(&unmap, arg, 1) != 0 )
+break;
+
+ret = physdev_unmap_pirq(unmap.domid, unmap.pirq);
+}
+break;
+
+default:
+ret = -ENOSYS;
+break;
+}
+
+return ret;
 }
 
 /*
-- 
2.1.4


[Xen-devel] [PATCH v3 01/24] xen: Extend DOMCTL createdomain to support arch configuration

On ARM the virtual GIC may differ between guests (emulated GIC version,
number of SPIs...). This information is already known at domain creation
and can never change.

For now only the gic_version is set. In the long run, there will be more
parameters such as the number of SPIs. All will be required to be set at the
same time.

A new arch-specific structure arch_domainconfig has been created. The x86
one doesn't have any specific configuration, so a dummy structure
(C-spec compliant) has been created to factorize the code in the toolstack.

Some external tools (qemu, xenstore) may need to create a domain. Rather
than asking them to take care of the arch-specific domain configuration, let
the current function (xc_domain_create) choose a default configuration and
introduce a new one (xc_domain_create_config).
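
The split between the two entry points can be sketched as a thin wrapper that fills in a default configuration and then calls the full variant. Everything below uses hypothetical demo_* names and a simplified signature; it is not the libxc API, only an illustration of the pattern under those assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the arch-specific domain configuration. */
struct demo_arch_domainconfig {
    uint8_t gic_version;
};

#define DEMO_GIC_DEFAULT 0   /* assumed meaning: let the hypervisor choose */

static uint8_t demo_last_gic_version = 0xff;   /* nothing created yet */

/* Full variant: the caller provides the arch configuration. */
int demo_domain_create_config(uint32_t *pdomid,
                              const struct demo_arch_domainconfig *config)
{
    assert(pdomid != NULL && config != NULL);

    /* A real implementation would issue the createdomain domctl here. */
    demo_last_gic_version = config->gic_version;
    *pdomid = 1;                               /* placeholder domain id */
    return 0;
}

/* Legacy variant: external tools keep calling this, defaults are chosen. */
int demo_domain_create(uint32_t *pdomid)
{
    struct demo_arch_domainconfig config;

    memset(&config, 0, sizeof(config));
    config.gic_version = DEMO_GIC_DEFAULT;

    return demo_domain_create_config(pdomid, &config);
}

/* Test hook: which GIC version did the last creation request use? */
uint8_t demo_last_gic_version_seen(void)
{
    return demo_last_gic_version;
}
```

The point of the design is that callers such as qemu or xenstore never see the arch structure, while libxl can pass an explicit configuration through the second function.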

This patch also drops the DOMCTL arm_configure_domain previously introduced
in Xen 4.5, as it is no longer needed.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Daniel De Graaf dgde...@tycho.nsa.gov
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com
Cc: Stefano Stabellini stefano.stabell...@citrix.com
Cc: Keir Fraser k...@xen.org
Cc: Jan Beulich jbeul...@suse.com
Cc: Andrew Cooper andrew.coop...@citrix.com
Cc: George Dunlap george.dun...@eu.citrix.com

---
This is a follow-up of 
http://lists.xen.org/archives/html/xen-devel/2014-11/msg00522.html

TODO: What about migration? For now the configuration lives in an internal
libxl structure. We need a way to pass the domain configuration to the
other end.

I'm not sure we should care about this right now, as migration doesn't
yet exist on ARM.

For the xc_domain_create, Stefano S. was looking to drop PV domain
creation support in QEMU. So maybe I could simply extend xc_domain_create
and drop the xc_domain_create_config.

Changes in v3:
- Patch was previously sent in a separate series [1]
- Rename arch_domainconfig to xen_arch_domainconfig
- Drop the typedef
- Pass NULL for DOM0 config on x86
- Drop spurious changes
- Update comment in start_xen in arch/arm/setup.c

[1] https://patches.linaro.org/41083/
---
 tools/flask/policy/policy/modules/xen/xen.if |  2 +-
 tools/libxc/include/xenctrl.h| 14 +
 tools/libxc/xc_domain.c  | 46 
 tools/libxl/libxl_arch.h |  6 
 tools/libxl/libxl_arm.c  | 28 ++---
 tools/libxl/libxl_create.c   | 21 ++---
 tools/libxl/libxl_dm.c   |  3 +-
 tools/libxl/libxl_dom.c  |  2 +-
 tools/libxl/libxl_internal.h |  7 +++--
 tools/libxl/libxl_x86.c  | 10 ++
 xen/arch/arm/domain.c| 28 -
 xen/arch/arm/domctl.c| 34 
 xen/arch/arm/mm.c|  6 ++--
 xen/arch/arm/setup.c |  6 +++-
 xen/arch/x86/domain.c|  3 +-
 xen/arch/x86/mm.c|  6 ++--
 xen/arch/x86/setup.c |  8 +++--
 xen/common/domain.c  |  7 +++--
 xen/common/domctl.c  |  3 +-
 xen/common/schedule.c|  3 +-
 xen/include/public/arch-arm.h|  8 +
 xen/include/public/arch-x86/xen.h|  4 +++
 xen/include/public/domctl.h  | 18 +--
 xen/include/xen/domain.h |  3 +-
 xen/include/xen/sched.h  |  9 --
 xen/xsm/flask/hooks.c|  3 --
 xen/xsm/flask/policy/access_vectors  |  2 --
 27 files changed, 170 insertions(+), 120 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if 
b/tools/flask/policy/policy/modules/xen/xen.if
index 2d32e1c..620d151 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -51,7 +51,7 @@ define(`create_domain_common', `
getaffinity setaffinity setvcpuextstate };
allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim
set_max_evtchn set_vnumainfo get_vnumainfo cacheflush
-   psr_cmt_op configure_domain };
+   psr_cmt_op };
allow $1 $2:security check_context;
allow $1 $2:shadow enable;
allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage 
mmuext_op updatemp };
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 0ad8b8d..d66571f 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -477,18 +477,20 @@ typedef union
 } start_info_any_t;
 #endif
 
+
+typedef struct xen_arch_domainconfig xc_domain_configuration_t;
+int

[Xen-devel] [PATCH v3 19/24] xen/iommu: arm: Wire iommu DOMCTL for ARM

Signed-off-by: Julien Grall julien.gr...@linaro.org
Acked-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
Cc: Jan Beulich jbeul...@suse.com

---
Changes in v3:
- Add Stefano's ack

Changes in v2:
- Don't move the call in common code.
---
 xen/arch/arm/domctl.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/xen/arch/arm/domctl.c b/xen/arch/arm/domctl.c
index 485d3aa..cc4894e 100644
--- a/xen/arch/arm/domctl.c
+++ b/xen/arch/arm/domctl.c
@@ -33,7 +33,16 @@ long arch_do_domctl(struct xen_domctl *domctl, struct domain 
*d,
 return p2m_cache_flush(d, s, e);
 }
 default:
-return subarch_do_domctl(domctl, d, u_domctl);
+{
+int rc;
+
+rc = subarch_do_domctl(domctl, d, u_domctl);
+
+if ( rc == -ENOSYS )
+rc = iommu_do_domctl(domctl, d, u_domctl);
+
+return rc;
+}
 }
 }
 
-- 
2.1.4


[Xen-devel] [PATCH v3 02/24] xen/arm: Divide GIC initialization in 2 parts

Currently the function to translate IRQs from the device tree is set
unconditionally, so that the serial/timer IRQs can be retrieved before the
GIC has been initialized.

This assumes that the xlate function will never change. We may also need to
have the primary interrupt controller very early.

Rework the GIC initialization in 2 parts:
- gic_preinit: Get the interrupt controller device tree node and set
up GIC and xlate callbacks
- gic_init: Initialize the interrupt controller and the boot CPU
interrupts.

The former function will be called just after the IRQ subsystem has been
initialized.
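
The two-phase scheme can be pictured with the small stand-alone sketch below: the first phase only records the device tree node and registers an ops table, and the second phase initializes the controller through that table. All demo_* names are hypothetical and the node is an opaque pointer; this is an illustration of the pattern, not the Xen code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced version of a GIC ops table. */
struct demo_gic_ops {
    int (*init)(void);
};

static const struct demo_gic_ops *demo_ops;   /* registered by preinit */
static const void *demo_node;                 /* DT node saved by preinit */
static int demo_initialized;

static int demo_gicv2_init(void)
{
    /* Phase 2 relies on the node recorded during phase 1. */
    if (demo_node == NULL)
        return -1;
    demo_initialized = 1;
    return 0;
}

static const struct demo_gic_ops demo_gicv2_ops = {
    .init = demo_gicv2_init,
};

/* Phase 1: remember the node and register callbacks, no hardware access. */
void demo_gic_preinit(const void *node)
{
    demo_node = node;
    demo_ops = &demo_gicv2_ops;
}

/* Phase 2: initialize the controller through the registered ops. */
int demo_gic_init(void)
{
    assert(demo_ops != NULL);   /* preinit must have run first */
    return demo_ops->init();
}

int demo_gic_is_initialized(void)
{
    return demo_initialized;
}
```

Between the two calls, code that only needs the callbacks (for example IRQ translation for the serial and timer nodes) can already run.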

Signed-off-by: Julien Grall julien.gr...@linaro.org

---
Changes in v3:
- Patch was previously sent in a separate series [1]
- Reorder the function to avoid forward declaration
- Make gic-v3 driver compliant to the new interface
- Remove spurious field addition in gicv2 structure

Changelog based on the separate series:

Changes in v3:
- Patch added.

[1] https://patches.linaro.org/33313/
---
 xen/arch/arm/gic-v2.c | 70 ++-
 xen/arch/arm/gic-v3.c | 75 ---
 xen/arch/arm/gic.c| 16 --
 xen/arch/arm/setup.c  |  3 +-
 xen/include/asm-arm/gic.h |  8 +
 5 files changed, 100 insertions(+), 72 deletions(-)

diff --git a/xen/arch/arm/gic-v2.c b/xen/arch/arm/gic-v2.c
index 15916c9..016b0fd 100644
--- a/xen/arch/arm/gic-v2.c
+++ b/xen/arch/arm/gic-v2.c
@@ -655,37 +655,10 @@ static hw_irq_controller gicv2_guest_irq_type = {
 .set_affinity = gicv2_irq_set_affinity,
 };
 
-const static struct gic_hw_operations gicv2_ops = {
-.info= &gicv2_info,
-.secondary_init  = gicv2_secondary_cpu_init,
-.save_state  = gicv2_save_state,
-.restore_state   = gicv2_restore_state,
-.dump_state  = gicv2_dump_state,
-.gicv_setup  = gicv2v_setup,
-.gic_host_irq_type   = &gicv2_host_irq_type,
-.gic_guest_irq_type  = &gicv2_guest_irq_type,
-.eoi_irq = gicv2_eoi_irq,
-.deactivate_irq  = gicv2_dir_irq,
-.read_irq= gicv2_read_irq,
-.set_irq_properties  = gicv2_set_irq_properties,
-.send_SGI= gicv2_send_SGI,
-.disable_interface   = gicv2_disable_interface,
-.update_lr   = gicv2_update_lr,
-.update_hcr_status   = gicv2_hcr_status,
-.clear_lr= gicv2_clear_lr,
-.read_lr = gicv2_read_lr,
-.write_lr= gicv2_write_lr,
-.read_vmcr_priority  = gicv2_read_vmcr_priority,
-.read_apr= gicv2_read_apr,
-.make_dt_node= gicv2_make_dt_node,
-};
-
-/* Set up the GIC */
-static int __init gicv2_init(struct dt_device_node *node, const void *data)
+static int __init gicv2_init(void)
 {
 int res;
-
-dt_device_set_used_by(node, DOMID_XEN);
+const struct dt_device_node *node = gicv2_info.node;
 
 res = dt_device_get_address(node, 0, &gicv2.dbase, NULL);
 if ( res || !gicv2.dbase || (gicv2.dbase & ~PAGE_MASK) )
@@ -708,9 +681,6 @@ static int __init gicv2_init(struct dt_device_node *node, 
const void *data)
 panic("GICv2: Cannot find the maintenance IRQ");
 gicv2_info.maintenance_irq = res;
 
-/* Set the GIC as the primary interrupt controller */
-dt_interrupt_controller = node;
-
 /* TODO: Add check on distributor, cpu size */
 
 printk("GICv2 initialization:\n"
@@ -755,8 +725,42 @@ static int __init gicv2_init(struct dt_device_node *node, 
const void *data)
 
 spin_unlock(&gicv2.lock);
 
+return 0;
+}
+
+const static struct gic_hw_operations gicv2_ops = {
+.info= &gicv2_info,
+.init= gicv2_init,
+.secondary_init  = gicv2_secondary_cpu_init,
+.save_state  = gicv2_save_state,
+.restore_state   = gicv2_restore_state,
+.dump_state  = gicv2_dump_state,
+.gicv_setup  = gicv2v_setup,
+.gic_host_irq_type   = &gicv2_host_irq_type,
+.gic_guest_irq_type  = &gicv2_guest_irq_type,
+.eoi_irq = gicv2_eoi_irq,
+.deactivate_irq  = gicv2_dir_irq,
+.read_irq= gicv2_read_irq,
+.set_irq_properties  = gicv2_set_irq_properties,
+.send_SGI= gicv2_send_SGI,
+.disable_interface   = gicv2_disable_interface,
+.update_lr   = gicv2_update_lr,
+.update_hcr_status   = gicv2_hcr_status,
+.clear_lr= gicv2_clear_lr,
+.read_lr = gicv2_read_lr,
+.write_lr= gicv2_write_lr,
+.read_vmcr_priority  = gicv2_read_vmcr_priority,
+.read_apr= gicv2_read_apr,
+.make_dt_node= gicv2_make_dt_node,
+};
+
+/* Set up the GIC */
+static int __init gicv2_preinit(struct dt_device_node *node, const void *data)
+{
 gicv2_info.hw_version = GIC_V2;
+gicv2_info.node = node;
+register_gic_ops(&gicv2_ops);
+

[Xen-devel] [PATCH v3 24/24] xl: Add new option dtdev

The option dtdev will be used to pass through a non-PCI device described
in the device tree to a guest.
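
For illustration, collecting the configured paths into a growable array could look like the sketch below. It is a stand-alone approximation of the parsing loop added to xl_cmdimpl.c: the demo_* types are hypothetical, plain malloc/realloc replace xl's helpers, and rejecting paths that do not start with '/' is an extra assumption based on the "absolute path" wording in the documentation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the libxl device and domain config types. */
struct demo_dtdev {
    char *path;
};

struct demo_config {
    struct demo_dtdev *dtdevs;
    int num_dtdevs;
};

/* Append a copy of 'path'; returns 0 on success, -1 on error. */
int demo_add_dtdev(struct demo_config *cfg, const char *path)
{
    struct demo_dtdev *grown;
    size_t len;
    char *copy;

    /* Assumption: only absolute device tree paths are accepted. */
    if (path == NULL || path[0] != '/')
        return -1;

    len = strlen(path) + 1;
    copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, path, len);

    grown = realloc(cfg->dtdevs, sizeof(*grown) * (cfg->num_dtdevs + 1));
    if (grown == NULL) {
        free(copy);
        return -1;
    }

    grown[cfg->num_dtdevs].path = copy;
    cfg->dtdevs = grown;
    cfg->num_dtdevs++;
    return 0;
}

/* Release every stored path and the array itself. */
void demo_free_dtdevs(struct demo_config *cfg)
{
    for (int i = 0; i < cfg->num_dtdevs; i++)
        free(cfg->dtdevs[i].path);
    free(cfg->dtdevs);
    cfg->dtdevs = NULL;
    cfg->num_dtdevs = 0;
}
```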

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com

---
Changes in v2:
- libxl_device_dt has been renamed to libxl_device_dtdev
- use xrealloc instead of realloc
---
 docs/man/xl.cfg.pod.5|  5 +
 tools/libxl/xl_cmdimpl.c | 21 -
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 225b782..cfd3d5f 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -721,6 +721,11 @@ More information about Xen gfx_passthru feature is 
available
 on the XenVGAPassthrough Lhttp://wiki.xen.org/wiki/XenVGAPassthrough
 wiki page.
 
+=item Bdtdev=[ DTDEV_PATH, DTDEV_PATH, ... ]
+
+Specifies the host device node to passthrough to this guest. Each DTDEV_PATH
+is the absolute path in the device tree.
+
 =item Bioports=[ IOPORT_RANGE, IOPORT_RANGE, ... ]
 
 Allow guest to access specific legacy I/O ports. Each BIOPORT_RANGE
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 31e89e8..80c9df6 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -986,7 +986,7 @@ static void parse_config_data(const char *config_source,
 long l;
 XLU_Config *config;
 XLU_ConfigList *cpus, *vbds, *nics, *pcis, *cvfbs, *cpuids, *vtpms;
-XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian;
+XLU_ConfigList *channels, *ioports, *irqs, *iomem, *viridian, *dtdevs;
 int num_ioports, num_irqs, num_iomem, num_cpus, num_viridian;
 int pci_power_mgmt = 0;
 int pci_msitranslate = 0;
@@ -1746,6 +1746,25 @@ skip_vfb:
 libxl_defbool_set(&b_info->u.pv.e820_host, true);
 }
 
+if (!xlu_cfg_get_list (config, "dtdev", &dtdevs, 0, 0)) {
+d_config->num_dtdevs = 0;
+d_config->dtdevs = NULL;
+for (i = 0; (buf = xlu_cfg_get_listitem(dtdevs, i)) != NULL; i++) {
+libxl_device_dtdev *dtdev;
+
+d_config->dtdevs = (libxl_device_dtdev *) xrealloc(d_config->dtdevs, sizeof (libxl_device_dtdev) * (d_config->num_dtdevs + 1));
+dtdev = d_config->dtdevs + d_config->num_dtdevs;
+libxl_device_dtdev_init(dtdev);
+
+dtdev->path = strdup(buf);
+if (dtdev->path == NULL) {
+fprintf(stderr, "unable to duplicate string for dtdevs\n");
+exit(-1);
+}
+d_config->num_dtdevs++;
+}
+}
+
 switch (xlu_cfg_get_list(config, "cpuid", &cpuids, 0, 1)) {
 case 0:
 {
-- 
2.1.4


[Xen-devel] [PATCH v3 22/24] tools/libxl: arm: Use an higher value for the GIC phandle

The partial device tree may contain phandles. The Device Tree Compiler
tends to allocate phandles starting from 1.

Reserve the ID 65000 for the GIC phandle. I think we can safely assume
that the partial device tree will never contain such an ID.
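
The assumption could also be checked rather than trusted. A minimal sketch, assuming the phandles of the partial device tree have already been collected into an array; demo_phandles_ok() is a hypothetical helper and is not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PHANDLE_GIC 65000u   /* value reserved by this patch */

/*
 * Hypothetical check: returns true if none of the phandles found in the
 * partial device tree collides with the one reserved for the GIC.
 */
bool demo_phandles_ok(const uint32_t *phandles, size_t nr)
{
    for (size_t i = 0; i < nr; i++)
        if (phandles[i] == PHANDLE_GIC)
            return false;
    return true;
}
```

Since DTC numbers phandles from 1 upwards, a collision would require a very large partial tree or a hand-written phandle property.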

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com

---
Changes in v3:
- Patch added
---
 tools/libxl/libxl_arm.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 619458b..dc745fb 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -78,10 +78,11 @@ static struct arch_info {
 {"xen-3.0-aarch64", "arm,armv8-timer", "arm,armv8" },
 };
 
-enum {
-PHANDLE_NONE = 0,
-PHANDLE_GIC,
-};
+/*
+ * The device tree compiler (DTC) allocates phandles from 1
+ * onwards. Reserve a high value for the GIC phandle.
+ */
+#define PHANDLE_GIC (65000)
 
 typedef uint32_t be32;
 typedef be32 gic_interrupt[3];
-- 
2.1.4


[Xen-devel] [PATCH v3 17/24] xen/passthrough: arm: release earlier the DT devices assigned to a guest

The toolstack may not have deassigned every device used by a guest.
Therefore we have to go through the device list and remove them before
asking the IOMMU drivers to release memory for this domain.

This can be done by moving the call to the release function to the point
where we relinquish the resources. The IOMMU part will be destroyed later
when the domain is freed.
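
The ordering can be illustrated with a reduced state machine: device release happens in the first relinquish state, and an error leaves the state untouched so the operation can be retried. The demo_* names and the release_fails knob are hypothetical and only loosely modelled on domain_relinquish_resources().

```c
#include <assert.h>

enum demo_relmem { RELMEM_not_started, RELMEM_xen, RELMEM_done };

struct demo_domain {
    enum demo_relmem relmem;
    int assigned_devices;
    int release_fails;        /* test knob: simulate a deassign failure */
};

/* Hypothetical stand-in for releasing every assigned DT device. */
int demo_release_dt_devices(struct demo_domain *d)
{
    if (d->release_fails)
        return -1;
    d->assigned_devices = 0;
    return 0;
}

int demo_relinquish_resources(struct demo_domain *d)
{
    int ret;

    switch (d->relmem) {
    case RELMEM_not_started:
        /* Devices go first, before any memory is handed back. */
        ret = demo_release_dt_devices(d);
        if (ret)
            return ret;       /* state unchanged, the caller may retry */
        d->relmem = RELMEM_xen;
        /* Fallthrough */
    case RELMEM_xen:
        /* ... memory would be released here ... */
        d->relmem = RELMEM_done;
        /* Fallthrough */
    case RELMEM_done:
        break;
    }
    return 0;
}
```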

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Jan Beulich jbeul...@suse.com

---
Changes in v3:
- Patch added. Supersedes the patch "xen/passthrough: call
arch_iommu_domain_destroy before calling iommu teardown" in
the previous patch series.
---
 xen/arch/arm/domain.c | 4 
 xen/drivers/passthrough/arm/iommu.c   | 1 -
 xen/drivers/passthrough/device_tree.c | 5 -
 xen/include/xen/iommu.h   | 2 +-
 4 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 6e56665..d85748a 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -772,6 +772,10 @@ int domain_relinquish_resources(struct domain *d)
 switch ( d->arch.relmem )
 {
 case RELMEM_not_started:
+ret = iommu_release_dt_devices(d);
+if ( ret )
+return ret;
+
 d->arch.relmem = RELMEM_xen;
 /* Falltrough */
 
diff --git a/xen/drivers/passthrough/arm/iommu.c 
b/xen/drivers/passthrough/arm/iommu.c
index 5870aef..8223a39 100644
--- a/xen/drivers/passthrough/arm/iommu.c
+++ b/xen/drivers/passthrough/arm/iommu.c
@@ -66,7 +66,6 @@ int arch_iommu_domain_init(struct domain *d)
 
 void arch_iommu_domain_destroy(struct domain *d)
 {
-iommu_dt_domain_destroy(d);
 }
 
 int arch_iommu_populate_page_table(struct domain *d)
diff --git a/xen/drivers/passthrough/device_tree.c 
b/xen/drivers/passthrough/device_tree.c
index 88e496e..e7eb34f 100644
--- a/xen/drivers/passthrough/device_tree.c
+++ b/xen/drivers/passthrough/device_tree.c
@@ -97,7 +97,7 @@ int iommu_dt_domain_init(struct domain *d)
 return 0;
 }
 
-void iommu_dt_domain_destroy(struct domain *d)
+int iommu_release_dt_devices(struct domain *d)
 {
 struct hvm_iommu *hd = domain_hvm_iommu(d);
 struct dt_device_node *dev, *_dev;
@@ -109,5 +109,8 @@ void iommu_dt_domain_destroy(struct domain *d)
 if ( rc )
 dprintk(XENLOG_ERR, "Failed to deassign %s in domain %u\n",
 dt_node_full_name(dev), d->domain_id);
+return rc;
 }
+
+return 0;
 }
diff --git a/xen/include/xen/iommu.h b/xen/include/xen/iommu.h
index c146ee4..d03df14 100644
--- a/xen/include/xen/iommu.h
+++ b/xen/include/xen/iommu.h
@@ -117,7 +117,7 @@ void iommu_read_msi_from_ire(struct msi_desc *msi_desc, 
struct msi_msg *msg);
 int iommu_assign_dt_device(struct domain *d, struct dt_device_node *dev);
 int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev);
 int iommu_dt_domain_init(struct domain *d);
-void iommu_dt_domain_destroy(struct domain *d);
+int iommu_release_dt_devices(struct domain *d);
 
 #endif /* HAS_DEVICE_TREE */
 
-- 
2.1.4


Re: [Xen-devel] [RFC PATCHv1 net-next] xen-netback: always fully coalesce guest Rx packets

On Tue, Jan 13, 2015 at 02:05:17PM +, David Vrabel wrote:
 Always fully coalesce guest Rx packets into the minimum number of ring
 slots.  Reducing the number of slots per packet has significant
 performance benefits (e.g., 7.2 Gbit/s to 11 Gbit/s in an off-host
 receive test).
 

Good number.

 However, this does increase the number of grant ops per packet which
 decreases performance with some workloads (intrahost VM to VM)

Do you have figures before and after this change?

 /unless/ grant copy has been optimized for adjacent ops with the same
 source or destination (see grant-table: defer releasing pages
 acquired in a grant copy[1]).
 
 Do we need to retain the existing path and make the always coalesce
 path conditional on a suitable version of Xen?
 

If the new path improves off-host RX on all Xen versions and doesn't
degrade intrahost VM to VM RX that much, I think we should use it
unconditionally.  Is intrahost VM to VM RX important to XenServer?

I don't consider intrahost VM to VM RX a very important use case, at
least not as important as off-host RX. I would expect that in a cloud
environment users would not count on their VMs residing on the same host.
Plus, some cloud providers might deliberately route traffic off-host for
various reasons even if VMs are on the same host.  (Verizon, for one,
mentioned that they do that during last year's Xen Summit, IIRC.)

Others might disagree. Let's wait for other people to chime in.
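
To make the trade-off under discussion concrete, here is a rough sketch
of the arithmetic. It is a simplified model, not the netback code: it
assumes one 4096-byte buffer per ring slot, assumes the non-coalescing
case starts every fragment in a fresh slot, and ignores source page
boundaries. SLOT_SIZE and all function names are made up for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_SIZE 4096u  /* assumption: one page-sized buffer per ring slot */

/* Slots used when every fragment starts in a fresh slot (no coalescing). */
unsigned int slots_uncoalesced(const unsigned int *frag_len, size_t nr_frags)
{
    unsigned int slots = 0;

    assert(frag_len != NULL || nr_frags == 0);
    for (size_t i = 0; i < nr_frags; i++)
        slots += (frag_len[i] + SLOT_SIZE - 1) / SLOT_SIZE;
    return slots;
}

/* Slots used when fragments are packed back to back (full coalescing). */
unsigned int slots_coalesced(const unsigned int *frag_len, size_t nr_frags)
{
    unsigned int total = 0;

    assert(frag_len != NULL || nr_frags == 0);
    for (size_t i = 0; i < nr_frags; i++)
        total += frag_len[i];
    return (total + SLOT_SIZE - 1) / SLOT_SIZE;
}

/*
 * Copy operations needed for full coalescing: a fragment that straddles
 * a slot boundary has to be split into one copy per slot it touches.
 */
unsigned int copy_ops_coalesced(const unsigned int *frag_len, size_t nr_frags)
{
    unsigned int ops = 0;
    unsigned int room = SLOT_SIZE;  /* free bytes left in the current slot */

    assert(frag_len != NULL || nr_frags == 0);
    for (size_t i = 0; i < nr_frags; i++) {
        unsigned int left = frag_len[i];

        while (left > 0) {
            unsigned int chunk = left < room ? left : room;

            ops++;
            left -= chunk;
            room -= chunk;
            if (room == 0)
                room = SLOT_SIZE;  /* move on to the next slot */
        }
    }
    return ops;
}
```

In this model, three 1500-byte fragments need three slots and three
copies without coalescing, but two slots and four copies when fully
coalesced, which is the "fewer slots, more grant ops" effect described
in the commit message.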

 [1] http://lists.xen.org/archives/html/xen-devel/2015-01/msg01118.html
 
 Signed-off-by: David Vrabel david.vra...@citrix.com
 ---
  drivers/net/xen-netback/common.h  |1 -
  drivers/net/xen-netback/netback.c |  106 ++---
  2 files changed, 3 insertions(+), 104 deletions(-)

Love the diffstat!

Wei.


Re: [Xen-devel] [PATCH v2] tools/Rules.mk: Don't optimize debug builds; add macro debugging information

On Tue, 2015-01-13 at 13:52 +0800, Wen Congyang wrote:
 On 12/01/2014 10:21 PM, Euan Harris wrote:
  Tools debug builds are built with optimization level -O1, inherited from
  the CFLAGS definition in StdGNU.mk.   Optimizations confuse the debugger,
  and the comment justifying -O1 in StdGNU.mk should not apply for a
  userspace library.   Disable optimization by appending -O0 to CFLAGS,
  which overrides the -O1 flag specified earlier.
  
  Also specify -g3, to add macro debugging information which allows
  gdb to expand macro invocations.   This is useful as libxl uses many
  non-trivial macros.
  
  Signed-off-by: Euan Harris euan.har...@citrix.com
  
  Changes since v1:
* moved flag override to tools/Rules.mk so it affects all tools
  ---
   tools/Rules.mk |5 +
   1 files changed, 5 insertions(+), 0 deletions(-)
  
  diff --git a/tools/Rules.mk b/tools/Rules.mk
  index 87a56dc..7ef1ce5 100644
  --- a/tools/Rules.mk
  +++ b/tools/Rules.mk
  @@ -54,6 +54,11 @@ CFLAGS_libxenvchan = -I$(XEN_LIBVCHAN)
   LDLIBS_libxenvchan = $(SHLIB_libxenctrl) $(SHLIB_libxenstore) 
  -L$(XEN_LIBVCHAN) -lxenvchan
   SHLIB_libxenvchan  = -Wl,-rpath-link=$(XEN_LIBVCHAN)
   
  +ifeq ($(debug),y)
  +# Disable optimizations and debugging information for macros
  +CFLAGS += -O0 -g3
  +endif
  +
   LIBXL_BLKTAP ?= $(CONFIG_BLKTAP2)
   
   ifeq ($(LIBXL_BLKTAP),y)
  
 
 This patch causes a building error:
 gcc -fno-strict-aliasing -O2 -g -pipe -Wall -Werror=format-security 
 -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong 
 --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic 
 -D_GNU_SOURCE -fPIC -fwrapv -DNDEBUG -O2 -g -pipe -Wall 
 -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -fexceptions 
 -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 
 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -O1 -fno-omit-frame-pointer -m64 
 -g -fno-strict-aliasing -std=gnu99 -Wall -Wstrict-prototypes 
 -Wdeclaration-after-statement -Wno-unused-but-set-variable 
 -Wno-unused-local-typedefs -O0 -g3 -D__XEN_TOOLS__ -MMD -MF .install.d 
 -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -fno-optimize-sibling-calls -fPIC 
 -I../../tools/include -I../../tools/libxc/include -Ixen/lowlevel/xc 
 -I/usr/include/python2.7 -c xen/lowlevel/xc/xc.c -o 
 build/temp.linux-x86_64-2.7/xen/lowlevel/xc/xc.o -fno-strict-aliasing -Werror
 In file included from /usr/include/limits.h:25:0,
  from 
 /usr/lib/gcc/x86_64-redhat-linux/4.9.2/include/limits.h:168,
  from 
 /usr/lib/gcc/x86_64-redhat-linux/4.9.2/include/syslimits.h:7,
  from 
 /usr/lib/gcc/x86_64-redhat-linux/4.9.2/include/limits.h:34,
  from /usr/include/python2.7/Python.h:19,
  from xen/lowlevel/xc/xc.c:7:
 /usr/include/features.h:328:4: error: #warning _FORTIFY_SOURCE requires 
 compiling with optimization (-O) [-Werror=cpp]

Where is _FORTIFY_SOURCE coming from? I don't see it in our tree
anywhere except stubdom/Makefile which is disabling it and the build
worked for me. Perhaps it is coming from your build environment
somewhere? How are you configuring and building Xen?

Maybe what we want to do is only disable optimisations if debug=y AND
-D_FORTIFY_SOURCE is not set? Might involve some autoconf checks to
determine the fortification level in the user provided CFLAGS, which
might be a bit faffsome.
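
As a rough illustration of the kind of check I mean, here is a sketch
of the decision logic in C. This is not a proposed patch, and a real
version would live in autoconf or make rather than C. It assumes plain
substring matching on the flags is good enough and that the last -D or
-U on the command line wins. The function names are invented for the
example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the last occurrence of needle in haystack, or NULL. */
static const char *find_last(const char *haystack, const char *needle)
{
    const char *last = NULL;
    const char *p = haystack;

    while ((p = strstr(p, needle)) != NULL) {
        last = p;
        p++;
    }
    return last;
}

/*
 * Return the fortification level requested by a CFLAGS string, or 0 if
 * none is in effect.  Handles both "-D_FORTIFY_SOURCE=2" and the
 * "-Wp,-D_FORTIFY_SOURCE=2" spelling, since the latter contains the former.
 */
int cflags_fortify_level(const char *cflags)
{
    const char *def, *undef;

    assert(cflags != NULL);
    def = find_last(cflags, "-D_FORTIFY_SOURCE");
    undef = find_last(cflags, "-U_FORTIFY_SOURCE");

    /* Not defined at all, or undefined again later on the command line. */
    if (def == NULL || (undef != NULL && undef > def))
        return 0;

    def += strlen("-D_FORTIFY_SOURCE");
    if (*def != '=')
        return 1;  /* a bare -DNAME defines NAME as 1 */
    return atoi(def + 1);
}

/* Hypothetical policy: only drop to -O0 when fortification is not requested. */
const char *debug_opt_flags(const char *cflags)
{
    return cflags_fortify_level(cflags) > 0 ? "-g3" : "-O0 -g3";
}
```

With the flags from the failing build above, cflags_fortify_level()
reports level 2, so this policy would keep the existing optimisation
level and only add -g3.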

Ian.

  #  warning _FORTIFY_SOURCE requires compiling with optimization (-O)
 ^
 cc1: all warnings being treated as errors
 error: command 'gcc' failed with exit status 1
 
 The following patch can fix this problem:
 
 From d16961971e14f6e50f9a9905449929d5a7c60860 Mon Sep 17 00:00:00 2001
 From: Wen Congyang we...@cn.fujitsu.com
 Date: Tue, 13 Jan 2015 12:05:30 +0800
 Subject: [PATCH] Fix a building error
 
 Commit 1166ecf7 disables optimization. But _FORTIFY_SOURCE
 requires compiling with optimization (-O). Disable _FORTIFY_SOURCE
 by appending -Wp,-U_FORTIFY_SOURCE to CFLAGS.
 
 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 ---
  tools/Rules.mk | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/tools/Rules.mk b/tools/Rules.mk
 index 962a743..8ad1b05 100644
 --- a/tools/Rules.mk
 +++ b/tools/Rules.mk
 @@ -56,7 +56,7 @@ SHLIB_libxenvchan  = -Wl,-rpath-link=$(XEN_LIBVCHAN)
  
  ifeq ($(debug),y)
  # Disable optimizations and enable debugging information for macros
 -CFLAGS += -O0 -g3
 +CFLAGS += -O0 -g3 -Wp,-U_FORTIFY_SOURCE
  endif
  
  LIBXL_BLKTAP ?= $(CONFIG_BLKTAP2)




Re: [Xen-devel] [PATCH v3 1/5] x86: expose CMT L3 event mask to user space

On Tue, Jan 13, 2015 at 10:00:58AM +, Jan Beulich wrote:
  On 13.01.15 at 09:02, chao.p.p...@linux.intel.com wrote:
  L3 event mask indicates the event types supported in host, including
  cache occupancy event as well as local/total memory bandwidth events
  for Memory Bandwidth Monitoring(MBM). Expose it so all these events
  can be monitored in user space.
  
  Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
  Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
  Acked-by: Jan Beulich jbeul...@suse.com
 
 Please don't re-send patches already applied.
 
Oh, sorry! Just noticed it's already in.
Chao
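
For anyone reading along who wants to use the mask from user space, a
minimal sketch of how it could be tested. It assumes the mask follows
the CPUID leaf 0xF, sub-leaf 1, EDX layout (bit 0 cache occupancy,
bit 1 total memory bandwidth, bit 2 local memory bandwidth). The macro
and function names are illustrative only and are not the ones used in
the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit layout, following CPUID leaf 0xF, sub-leaf 1, EDX. */
#define L3_EVT_OCCUPANCY  (UINT32_C(1) << 0)  /* cache occupancy */
#define L3_EVT_TOTAL_BW   (UINT32_C(1) << 1)  /* total memory bandwidth */
#define L3_EVT_LOCAL_BW   (UINT32_C(1) << 2)  /* local memory bandwidth */

/* Return non-zero if the single event bit `event` is set in the mask. */
int l3_event_supported(uint32_t event_mask, uint32_t event)
{
    /* `event` must be exactly one bit. */
    assert(event != 0 && (event & (event - 1)) == 0);
    return (event_mask & event) != 0;
}

/* Return non-zero if any memory bandwidth monitoring event is available. */
int mbm_supported(uint32_t event_mask)
{
    return (event_mask & (L3_EVT_TOTAL_BW | L3_EVT_LOCAL_BW)) != 0;
}
```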
 
 

[Xen-devel] [PATCH v1 1/2] tools: unhook blktap1 from the build and remove all references to it

This was disabled by default in Xen 4.4. Since xend has now been
removed from the tree I don't believe anything is using it.

We need to pass an explicit CONFIG_BLKTAP1=n to qemu-xen-traditional
otherwise it defaults to y and doesn't build.

This patch does all the groundwork; the tools/blktap directory will
be removed in the next (*huge*) patch.

Note that this has no impact on blktap2, which is what libxl supports.
blktap1 was only usable via xend which has already been removed.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
 INSTALL  |1 -
 config/Tools.mk.in   |1 -
 tools/Makefile   |2 +-
 tools/configure  |   29 +
 tools/configure.ac   |4 +-
 tools/hotplug/Linux/Makefile |1 -
 tools/hotplug/Linux/blktap   |   94 --
 tools/hotplug/Linux/xen-backend.rules.in |2 -
 8 files changed, 3 insertions(+), 131 deletions(-)
 delete mode 100644 tools/hotplug/Linux/blktap

diff --git a/INSTALL b/INSTALL
index 71dd0eb..33f65ba 100644
--- a/INSTALL
+++ b/INSTALL
@@ -142,7 +142,6 @@ this detection and the sysv runlevel scripts have to be used.
 
 The old backend drivers are disabled because qdisk is now the default.
 This option can be used to build them anyway.
-  --enable-blktap1
   --enable-blktap2
 
 Build various stubom components, some are only example code. Its usually
diff --git a/config/Tools.mk.in b/config/Tools.mk.in
index 89de5bd..30267fa 100644
--- a/config/Tools.mk.in
+++ b/config/Tools.mk.in
@@ -57,7 +57,6 @@ CONFIG_ROMBIOS  := @rombios@
 CONFIG_SEABIOS  := @seabios@
 CONFIG_QEMU_TRAD:= @qemu_traditional@
 CONFIG_QEMU_XEN := @qemu_xen@
-CONFIG_BLKTAP1  := @blktap1@
 CONFIG_BLKTAP2  := @blktap2@
 CONFIG_QEMUU_EXTRA_ARGS:= @EXTRA_QEMUU_CONFIGURE_ARGS@
 CONFIG_REMUS_NETBUF := @remus_netbuf@
diff --git a/tools/Makefile b/tools/Makefile
index af9798a..1ad7a5d 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -16,7 +16,6 @@ SUBDIRS-y += console
 SUBDIRS-y += xenmon
 SUBDIRS-y += xenstat
 SUBDIRS-$(CONFIG_Linux) += memshr 
-SUBDIRS-$(CONFIG_BLKTAP1) += blktap
 SUBDIRS-$(CONFIG_BLKTAP2) += blktap2
 SUBDIRS-$(CONFIG_NetBSD) += xenbackendd
 SUBDIRS-y += libfsimage
@@ -169,6 +168,7 @@ subdir-all-qemu-xen-traditional-dir: qemu-xen-traditional-dir-find
 subdir-install-qemu-xen-traditional-dir: qemu-xen-traditional-dir-find
set -e; \
$(buildmakevars2shellvars); \
+   export CONFIG_BLKTAP1=n; \
cd qemu-xen-traditional-dir; \
$(QEMU_ROOT)/xen-setup \
--extra-cflags=$(EXTRA_CFLAGS_QEMU_TRADITIONAL) \
diff --git a/tools/configure b/tools/configure
index e971070..4117c83 100755
--- a/tools/configure
+++ b/tools/configure
@@ -700,7 +700,6 @@ rombios
 qemu_traditional
 blktap2
 LINUX_BACKEND_MODULES
-blktap1
 debug
 seabios
 ovmf
@@ -790,7 +789,6 @@ enable_xsmpolicy
 enable_ovmf
 enable_seabios
 enable_debug
-enable_blktap1
 with_linux_backend_modules
 enable_blktap2
 enable_qemu_traditional
@@ -1463,7 +1461,6 @@ Optional Features:
   --enable-ovmf   Enable OVMF (default is DISABLED)
   --disable-seabios   Disable SeaBIOS (default is ENABLED)
   --disable-debug Disable debug build of tools (default is ENABLED)
-  --enable-blktap1Enable blktap1 tools (default is DISABLED)
   --enable-blktap2Enable blktap2, (DEFAULT is on for Linux, otherwise
   off)
   --enable-qemu-traditional
@@ -3991,29 +3988,6 @@ debug=$ax_cv_debug
 
 
 
-# Check whether --enable-blktap1 was given.
-if test ${enable_blktap1+set} = set; then :
-  enableval=$enable_blktap1;
-fi
-
-
-if test x$enable_blktap1 = xno; then :
-
-ax_cv_blktap1=n
-
-elif test x$enable_blktap1 = xyes; then :
-
-ax_cv_blktap1=y
-
-elif test -z $ax_cv_blktap1; then :
-
-ax_cv_blktap1=n
-
-fi
-blktap1=$ax_cv_blktap1
-
-
-
 
 # Check whether --with-linux-backend-modules was given.
 if test ${with_linux_backend_modules+set} = set; then :
@@ -4037,7 +4011,6 @@ usbbk
 pciback
 xen-acpi-processor
 blktap2
-blktap
 
 ;;
 *)
@@ -7935,7 +7908,7 @@ fi
 
 
 
-if test x$enable_blktap1 = xyes || test x$enable_blktap2 = xyes; then :
+if test x$enable_blktap2 = xyes; then :
 
 { $as_echo $as_me:${as_lineno-$LINENO}: checking for io_setup in -laio 5
 $as_echo_n checking for io_setup in -laio...  6; }
diff --git a/tools/configure.ac b/tools/configure.ac
index 1ac63a3..72e2465 100644
--- a/tools/configure.ac
+++ b/tools/configure.ac
@@ -89,7 +89,6 @@ AX_ARG_DEFAULT_ENABLE([xsmpolicy], [Disable XSM policy compilation])
 AX_ARG_DEFAULT_DISABLE([ovmf], [Enable OVMF])
 AX_ARG_DEFAULT_ENABLE([seabios], [Disable SeaBIOS])
 AX_ARG_DEFAULT_ENABLE([debug], [Disable debug build of tools])
-AX_ARG_DEFAULT_DISABLE([blktap1], [Enable blktap1 tools])
 
 AC_ARG_WITH([linux-backend-modules],

[Xen-devel] [PATCH v2 2/2] tools: remove blktap1

Now that it is unhooked we can just remove it.

Signed-off-by: Ian Campbell ian.campb...@citrix.com
---
 .gitignore  |5 -
 .hgignore   |5 -
 tools/blktap/Makefile   |   13 -
 tools/blktap/README |  122 --
 tools/blktap/drivers/Makefile   |   73 --
 tools/blktap/drivers/aes.c  | 1319 ---
 tools/blktap/drivers/aes.h  |   28 -
 tools/blktap/drivers/blk.h  |3 -
 tools/blktap/drivers/blk_linux.c|   42 -
 tools/blktap/drivers/blktapctrl.c   |  937 --
 tools/blktap/drivers/blktapctrl.h   |   36 -
 tools/blktap/drivers/blktapctrl_linux.c |   89 --
 tools/blktap/drivers/block-aio.c|  259 
 tools/blktap/drivers/block-qcow.c   | 1434 -
 tools/blktap/drivers/block-qcow2.c  | 2098 ---
 tools/blktap/drivers/block-ram.c|  295 -
 tools/blktap/drivers/block-sync.c   |  242 
 tools/blktap/drivers/block-vmdk.c   |  428 ---
 tools/blktap/drivers/bswap.h|  178 ---
 tools/blktap/drivers/img2qcow.c |  282 -
 tools/blktap/drivers/qcow-create.c  |  130 --
 tools/blktap/drivers/qcow2raw.c |  348 -
 tools/blktap/drivers/tapaio.c   |  357 --
 tools/blktap/drivers/tapaio.h   |  108 --
 tools/blktap/drivers/tapdisk.c  |  872 -
 tools/blktap/drivers/tapdisk.h  |  259 
 tools/blktap/lib/Makefile   |   60 -
 tools/blktap/lib/blkif.c|  185 ---
 tools/blktap/lib/blktaplib.h|  240 
 tools/blktap/lib/list.h |   59 -
 tools/blktap/lib/xenbus.c   |  617 -
 tools/blktap/lib/xs_api.c   |  360 --
 tools/blktap/lib/xs_api.h   |   50 -
 33 files changed, 11533 deletions(-)
 delete mode 100644 tools/blktap/Makefile
 delete mode 100644 tools/blktap/README
 delete mode 100644 tools/blktap/drivers/Makefile
 delete mode 100644 tools/blktap/drivers/aes.c
 delete mode 100644 tools/blktap/drivers/aes.h
 delete mode 100644 tools/blktap/drivers/blk.h
 delete mode 100644 tools/blktap/drivers/blk_linux.c
 delete mode 100644 tools/blktap/drivers/blktapctrl.c
 delete mode 100644 tools/blktap/drivers/blktapctrl.h
 delete mode 100644 tools/blktap/drivers/blktapctrl_linux.c
 delete mode 100644 tools/blktap/drivers/block-aio.c
 delete mode 100644 tools/blktap/drivers/block-qcow.c
 delete mode 100644 tools/blktap/drivers/block-qcow2.c
 delete mode 100644 tools/blktap/drivers/block-ram.c
 delete mode 100644 tools/blktap/drivers/block-sync.c
 delete mode 100644 tools/blktap/drivers/block-vmdk.c
 delete mode 100644 tools/blktap/drivers/bswap.h
 delete mode 100644 tools/blktap/drivers/img2qcow.c
 delete mode 100644 tools/blktap/drivers/qcow-create.c
 delete mode 100644 tools/blktap/drivers/qcow2raw.c
 delete mode 100644 tools/blktap/drivers/tapaio.c
 delete mode 100644 tools/blktap/drivers/tapaio.h
 delete mode 100644 tools/blktap/drivers/tapdisk.c
 delete mode 100644 tools/blktap/drivers/tapdisk.h
 delete mode 100644 tools/blktap/lib/Makefile
 delete mode 100644 tools/blktap/lib/blkif.c
 delete mode 100644 tools/blktap/lib/blktaplib.h
 delete mode 100644 tools/blktap/lib/list.h
 delete mode 100644 tools/blktap/lib/xenbus.c
 delete mode 100644 tools/blktap/lib/xs_api.c
 delete mode 100644 tools/blktap/lib/xs_api.h

[... actual patch omitted, see
git://xenbits.xen.org/people/ianc/xen.git remove-blktap1-v2 ]
-- 
1.7.10.4



Re: [Xen-devel] [PATCH 00/11] Alternate p2m: support multiple copies of host p2m