[Xen-devel] [PATCH 3/3] xen/blkback: don't use xen_blkif_get() in xen-blkback kthread

2017-05-15 Thread Juergen Gross
There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
of xen-blkback. Thread stopping is synchronous and using the blkif
reference counting in the kthread will avoid to ever let the reference
count drop to zero at the end of an I/O running concurrent to
disconnecting and multiple rings.

Setting ring->xenblkd to NULL after stopping the kthread isn't needed
as the kthread does this already.

Cc: sta...@vger.kernel.org
Reported-by: Glenn Enright 
Signed-off-by: Juergen Gross 
---
 drivers/block/xen-blkback/blkback.c | 3 ---
 drivers/block/xen-blkback/xenbus.c  | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 726c32e35db9..6b14c509f3c7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -609,8 +609,6 @@ int xen_blkif_schedule(void *arg)
unsigned long timeout;
int ret;
 
-   xen_blkif_get(blkif);
-
set_freezable();
while (!kthread_should_stop()) {
if (try_to_freeze())
@@ -665,7 +663,6 @@ int xen_blkif_schedule(void *arg)
print_stats(ring);
 
ring->xenblkd = NULL;
-   xen_blkif_put(blkif);
 
return 0;
 }
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 4d2f57fa35da..1dc0ff5ed912 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -255,7 +255,6 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
if (ring->xenblkd) {
kthread_stop(ring->xenblkd);
wake_up(&ring->shutdown_wq);
-   ring->xenblkd = NULL;
}
 
/* The above kthread_stop() guarantees that at this point we
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 0/3] xen/blkback: several fixes of resource management

2017-05-15 Thread Juergen Gross
Destroying a Xen guest domain while it was doing I/Os via xen-blkback
leaked several resources, including references of the guest's memory
pages.

This patch series addresses those leaks by correcting usage of
reference counts and the sequence when to free which resource.

The series applies on top of commit 2d4456c73a487abe ("block:
xen-blkback: add null check to avoid null pointer dereference") in
Jens Axboe's tree kernel/git/axboe/linux-block.git

Juergen Gross (3):
  xen/blkback: fix disconnect while I/Os in flight
  xen/blkback: don't free be structure too early
  xen/blkback: don't use xen_blkif_get() in xen-blkback kthread

 drivers/block/xen-blkback/blkback.c |  3 ---
 drivers/block/xen-blkback/common.h  |  1 +
 drivers/block/xen-blkback/xenbus.c  | 15 ---
 3 files changed, 9 insertions(+), 10 deletions(-)

-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/3] xen/blkback: fix disconnect while I/Os in flight

2017-05-15 Thread Juergen Gross
Today disconnecting xen-blkback is broken in case there are still
I/Os in flight: xen_blkif_disconnect() will bail out early without
releasing all resources in the hope it will be called again when
the last request has terminated. This, however, won't happen as
xen_blkif_free() won't be called on termination of the last running
request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
calls in xen_blkif_disconnect() didn't happen.

To solve this deadlock xen_blkif_disconnect() and
xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
xen_blkif_get() but use some other way to do their accounting of
resources.

This at once fixes another error in xen_blkif_disconnect(): when it
returned early with -EBUSY for another ring than 0 it would call
xen_blkif_put() again for already handled rings on a subsequent call.
This will lead to inconsistencies in the refcnt handling.

Cc: sta...@vger.kernel.org
Reported-by: Glenn Enright 
Signed-off-by: Juergen Gross 
Tested-by: Steven Haigh 
---
 drivers/block/xen-blkback/common.h | 1 +
 drivers/block/xen-blkback/xenbus.c | 7 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index dea61f6ab8cb..953f38802333 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -281,6 +281,7 @@ struct xen_blkif_ring {
 
wait_queue_head_t   wq;
atomic_tinflight;
+   int active;
/* One thread per blkif ring. */
struct task_struct  *xenblkd;
unsigned intwaiting_reqs;
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 1f3dfaa54d87..e68df9de8858 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -159,7 +159,7 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
init_waitqueue_head(&ring->shutdown_wq);
ring->blkif = blkif;
ring->st_print = jiffies;
-   xen_blkif_get(blkif);
+   ring->active = 1;
}
 
return 0;
@@ -249,6 +249,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
struct xen_blkif_ring *ring = &blkif->rings[r];
unsigned int i = 0;
 
+   if (!ring->active)
+   continue;
+
if (ring->xenblkd) {
kthread_stop(ring->xenblkd);
wake_up(&ring->shutdown_wq);
@@ -296,7 +299,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
BUG_ON(ring->free_pages_num != 0);
BUG_ON(ring->persistent_gnt_c != 0);
WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
-   xen_blkif_put(blkif);
+   ring->active = 0;
}
blkif->nr_ring_pages = 0;
/*
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/3] xen/blkback: don't free be structure too early

2017-05-15 Thread Juergen Gross
The be structure must nor be freed when freeing the blkif structure
isn't done. Otherwise a use-after-free of be when unmapping the ring
used for communicating with the frontend will occur in case of a
late call of xenblk_disconnect() (e.g. due to an I/O still active
when trying to disconnect).

Cc: sta...@vger.kernel.org
Reported-by: Glenn Enright 
Signed-off-by: Juergen Gross 
---
 drivers/block/xen-blkback/xenbus.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index e68df9de8858..4d2f57fa35da 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -315,9 +315,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 
 static void xen_blkif_free(struct xen_blkif *blkif)
 {
-
-   xen_blkif_disconnect(blkif);
+   WARN_ON(xen_blkif_disconnect(blkif));
xen_vbd_free(&blkif->vbd);
+   kfree(blkif->be->mode);
+   kfree(blkif->be);
 
/* Make sure everything is drained before shutting down */
kmem_cache_free(xen_blkif_cachep, blkif);
@@ -514,8 +515,6 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
xen_blkif_put(be->blkif);
}
 
-   kfree(be->mode);
-   kfree(be);
return 0;
 }
 
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend

2017-05-15 Thread Juergen Gross
On 15/05/17 22:35, Stefano Stabellini wrote:
> The pvcalls backend has one ioworker per cpu: the ioworkers are
> implemented as a cpu bound workqueue, and will deal with the actual
> socket and data ring reads/writes.
> 
> ioworkers are global: we only have one set for all the frontends. They
> process requests on their wqs list in order, once they are done with a
> request, they'll remove it from the list. A spinlock is used for
> protecting the list. Each ioworker is bound to a different cpu to
> maximize throughput.
> 
> Signed-off-by: Stefano Stabellini 
> CC: boris.ostrov...@oracle.com
> CC: jgr...@suse.com
> ---
>  drivers/xen/pvcalls-back.c | 64 
> ++
>  1 file changed, 64 insertions(+)
> 
> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
> index 2dbf7d8..46a889a 100644
> --- a/drivers/xen/pvcalls-back.c
> +++ b/drivers/xen/pvcalls-back.c
> @@ -25,6 +25,26 @@
>  #include 
>  #include 
>  
> +struct pvcalls_ioworker {
> + struct work_struct register_work;
> + atomic_t io;
> + struct list_head wqs;
> + spinlock_t lock;
> + int num;
> +};
> +
> +struct pvcalls_back_global {
> + struct pvcalls_ioworker *ioworkers;
> + int nr_ioworkers;
> + struct workqueue_struct *wq;
> + struct list_head privs;
> + struct rw_semaphore privs_lock;
> +} pvcalls_back_global;
> +
> +static void pvcalls_back_ioworker(struct work_struct *work)
> +{
> +}
> +
>  static int pvcalls_back_probe(struct xenbus_device *dev,
> const struct xenbus_device_id *id)
>  {
> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device *xdev,
>   .uevent = pvcalls_back_uevent,
>   .otherend_changed = pvcalls_back_changed,
>  };
> +
> +static int __init pvcalls_back_init(void)
> +{
> + int ret, i, cpu;
> +
> + if (!xen_domain())
> + return -ENODEV;
> +
> + ret = xenbus_register_backend(&pvcalls_back_driver);
> + if (ret < 0)
> + return ret;
> +
> + init_rwsem(&pvcalls_back_global.privs_lock);
> + INIT_LIST_HEAD(&pvcalls_back_global.privs);
> + pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
> + if (!pvcalls_back_global.wq)
> + goto error;
> + pvcalls_back_global.nr_ioworkers = num_online_cpus();

Really? Recently I cam across a system with 640 dom0 cpus. I don't think
we want 640 workers initialized when loading the backend module. I'd
prefer one or a few workers per connected frontend.

> + pvcalls_back_global.ioworkers = kzalloc(
> + sizeof(*pvcalls_back_global.ioworkers) *
> + pvcalls_back_global.nr_ioworkers, GFP_KERNEL);

kcalloc()?

> + if (!pvcalls_back_global.ioworkers)
> + goto error;
> + i = 0;
> + for_each_online_cpu(cpu) {
> + pvcalls_back_global.ioworkers[i].num = i;
> + atomic_set(&pvcalls_back_global.ioworkers[i].io, 1);
> + spin_lock_init(&pvcalls_back_global.ioworkers[i].lock);
> + INIT_LIST_HEAD(&pvcalls_back_global.ioworkers[i].wqs);
> + INIT_WORK(&pvcalls_back_global.ioworkers[i].register_work,
> + pvcalls_back_ioworker);
> + i++;
> + }
> + return 0;
> +
> +error:
> + if (pvcalls_back_global.wq)
> + destroy_workqueue(pvcalls_back_global.wq);
> + xenbus_unregister_driver(&pvcalls_back_driver);
> + kfree(pvcalls_back_global.ioworkers);
> + memset(&pvcalls_back_global, 0, sizeof(pvcalls_back_global));
> + return -ENOMEM;
> +}
> +module_init(pvcalls_back_init);
> 

Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen: cleanup pvh leftovers from pv-only sources

2017-05-16 Thread Juergen Gross
There are some leftovers testing for pvh guest mode in pv-only source
files. Remove them.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/enlighten_pv.c | 15 ++-
 arch/x86/xen/mmu_pv.c   | 98 +
 2 files changed, 40 insertions(+), 73 deletions(-)

diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index 7cd442690f9d..f33eef4ebd12 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -142,9 +142,7 @@ static void __init xen_banner(void)
struct xen_extraversion extra;
HYPERVISOR_xen_version(XENVER_extraversion, &extra);
 
-   pr_info("Booting paravirtualized kernel %son %s\n",
-   xen_feature(XENFEAT_auto_translated_physmap) ?
-   "with PVH extensions " : "", pv_info.name);
+   pr_info("Booting paravirtualized kernel on %s\n", pv_info.name);
printk(KERN_INFO "Xen version: %d.%d%s%s\n",
   version >> 16, version & 0x, extra.extraversion,
   xen_feature(XENFEAT_mmu_pt_update_preserve_ad) ? " 
(preserve-AD)" : "");
@@ -957,15 +955,10 @@ static void xen_write_msr(unsigned int msr, unsigned low, 
unsigned high)
 
 void xen_setup_shared_info(void)
 {
-   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-   set_fixmap(FIX_PARAVIRT_BOOTMAP,
-  xen_start_info->shared_info);
+   set_fixmap(FIX_PARAVIRT_BOOTMAP, xen_start_info->shared_info);
 
-   HYPERVISOR_shared_info =
-   (struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
-   } else
-   HYPERVISOR_shared_info =
-   (struct shared_info *)__va(xen_start_info->shared_info);
+   HYPERVISOR_shared_info =
+   (struct shared_info *)fix_to_virt(FIX_PARAVIRT_BOOTMAP);
 
 #ifndef CONFIG_SMP
/* In UP this is as good a place as any to set up shared info */
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 7397d8b8459d..1f386d7fdf70 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -355,10 +355,8 @@ static pteval_t pte_pfn_to_mfn(pteval_t val)
pteval_t flags = val & PTE_FLAGS_MASK;
unsigned long mfn;
 
-   if (!xen_feature(XENFEAT_auto_translated_physmap))
-   mfn = __pfn_to_mfn(pfn);
-   else
-   mfn = pfn;
+   mfn = __pfn_to_mfn(pfn);
+
/*
 * If there's no mfn for the pfn, then just create an
 * empty non-present pte.  Unfortunately this loses
@@ -647,9 +645,6 @@ static int __xen_pgd_walk(struct mm_struct *mm, pgd_t *pgd,
limit--;
BUG_ON(limit >= FIXADDR_TOP);
 
-   if (xen_feature(XENFEAT_auto_translated_physmap))
-   return 0;
-
/*
 * 64-bit has a great big hole in the middle of the address
 * space, which contains the Xen mappings.  On 32-bit these
@@ -1289,9 +1284,6 @@ static void __init xen_pagetable_cleanhighmap(void)
 
 static void __init xen_pagetable_p2m_setup(void)
 {
-   if (xen_feature(XENFEAT_auto_translated_physmap))
-   return;
-
xen_vmalloc_p2m_tree();
 
 #ifdef CONFIG_X86_64
@@ -1314,8 +1306,7 @@ static void __init xen_pagetable_init(void)
xen_build_mfn_list_list();
 
/* Remap memory freed due to conflicts with E820 map */
-   if (!xen_feature(XENFEAT_auto_translated_physmap))
-   xen_remap_memory();
+   xen_remap_memory();
 
xen_setup_shared_info();
 }
@@ -1925,21 +1916,20 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, 
unsigned long max_pfn)
/* Zap identity mapping */
init_level4_pgt[0] = __pgd(0);
 
-   if (!xen_feature(XENFEAT_auto_translated_physmap)) {
-   /* Pre-constructed entries are in pfn, so convert to mfn */
-   /* L4[272] -> level3_ident_pgt
-* L4[511] -> level3_kernel_pgt */
-   convert_pfn_mfn(init_level4_pgt);
+   /* Pre-constructed entries are in pfn, so convert to mfn */
+   /* L4[272] -> level3_ident_pgt  */
+   /* L4[511] -> level3_kernel_pgt */
+   convert_pfn_mfn(init_level4_pgt);
 
-   /* L3_i[0] -> level2_ident_pgt */
-   convert_pfn_mfn(level3_ident_pgt);
-   /* L3_k[510] -> level2_kernel_pgt
-* L3_k[511] -> level2_fixmap_pgt */
-   convert_pfn_mfn(level3_kernel_pgt);
+   /* L3_i[0] -> level2_ident_pgt */
+   convert_pfn_mfn(level3_ident_pgt);
+   /* L3_k[510] -> level2_kernel_pgt */
+   /* L3_k[511] -> level2_fixmap_pgt */
+   convert_pfn_mfn(level3_kernel_pgt);
+
+   /* L3_k[511][506] -> level1_fixmap_pgt */
+   convert_pfn_mfn(level2_fixmap_pgt);
 
-   /* L3_k[5

Re: [Xen-devel] [PATCH 03/18] xen/pvcalls: initialize the module and register the xenbus backend

2017-05-16 Thread Juergen Gross
On 16/05/17 21:58, Stefano Stabellini wrote:
> On Tue, 16 May 2017, Juergen Gross wrote:
>> On 15/05/17 22:35, Stefano Stabellini wrote:
>>> The pvcalls backend has one ioworker per cpu: the ioworkers are
>>> implemented as a cpu bound workqueue, and will deal with the actual
>>> socket and data ring reads/writes.
>>>
>>> ioworkers are global: we only have one set for all the frontends. They
>>> process requests on their wqs list in order, once they are done with a
>>> request, they'll remove it from the list. A spinlock is used for
>>> protecting the list. Each ioworker is bound to a different cpu to
>>> maximize throughput.
>>>
>>> Signed-off-by: Stefano Stabellini 
>>> CC: boris.ostrov...@oracle.com
>>> CC: jgr...@suse.com
>>> ---
>>>  drivers/xen/pvcalls-back.c | 64 
>>> ++
>>>  1 file changed, 64 insertions(+)
>>>
>>> diff --git a/drivers/xen/pvcalls-back.c b/drivers/xen/pvcalls-back.c
>>> index 2dbf7d8..46a889a 100644
>>> --- a/drivers/xen/pvcalls-back.c
>>> +++ b/drivers/xen/pvcalls-back.c
>>> @@ -25,6 +25,26 @@
>>>  #include 
>>>  #include 
>>>  
>>> +struct pvcalls_ioworker {
>>> +   struct work_struct register_work;
>>> +   atomic_t io;
>>> +   struct list_head wqs;
>>> +   spinlock_t lock;
>>> +   int num;
>>> +};
>>> +
>>> +struct pvcalls_back_global {
>>> +   struct pvcalls_ioworker *ioworkers;
>>> +   int nr_ioworkers;
>>> +   struct workqueue_struct *wq;
>>> +   struct list_head privs;
>>> +   struct rw_semaphore privs_lock;
>>> +} pvcalls_back_global;
>>> +
>>> +static void pvcalls_back_ioworker(struct work_struct *work)
>>> +{
>>> +}
>>> +
>>>  static int pvcalls_back_probe(struct xenbus_device *dev,
>>>   const struct xenbus_device_id *id)
>>>  {
>>> @@ -59,3 +79,47 @@ static int pvcalls_back_uevent(struct xenbus_device 
>>> *xdev,
>>> .uevent = pvcalls_back_uevent,
>>> .otherend_changed = pvcalls_back_changed,
>>>  };
>>> +
>>> +static int __init pvcalls_back_init(void)
>>> +{
>>> +   int ret, i, cpu;
>>> +
>>> +   if (!xen_domain())
>>> +   return -ENODEV;
>>> +
>>> +   ret = xenbus_register_backend(&pvcalls_back_driver);
>>> +   if (ret < 0)
>>> +   return ret;
>>> +
>>> +   init_rwsem(&pvcalls_back_global.privs_lock);
>>> +   INIT_LIST_HEAD(&pvcalls_back_global.privs);
>>> +   pvcalls_back_global.wq = alloc_workqueue("pvcalls_io", 0, 0);
>>> +   if (!pvcalls_back_global.wq)
>>> +   goto error;
>>> +   pvcalls_back_global.nr_ioworkers = num_online_cpus();
>>
>> Really? Recently I cam across a system with 640 dom0 cpus. I don't think
>> we want 640 workers initialized when loading the backend module. I'd
>> prefer one or a few workers per connected frontend.
> 
> I think we want to keep the ioworker allocation to be based on the
> number of vcpus: we do not want more ioworkers than vcpus because it is
> a waste of resources and leads to worse performance.  Also, given that
> they do memcpy's, I also think it is a good idea to bind them to vcpus
> (and pin vcpus to pcpus) to get best performance.

This will cause a lot of pain for the cpu offline case. Please don't try
to work against the hypervisor scheduler by designing a backend based on
a vcpu pin policy. This might result in best performance for your
special workload, but generally it is a bad idea!

> However, you have a point there: we need to handle systems with an
> extremely large number of Dom0 vcpus. I suggest we introduce an
> upper limit for the number of ioworkers. Something like:
> 
> #define MAX_IOWORKERS 64
> nr_ioworkers = min(MAX_IOWORKERS, num_online_cpus())
> 
> MAX_IOWORKERS could be configurable via a command line option.

Later you are assigning each active socket to exactly one ioworker.
Wouldn't it make more sense to allocate the ioworker when doing
the connect? This would avoid the problem of having only a statistical
distribution, possibly with all sockets on the same ioworker.

Basically you are re-inventing the wheel by using an own workqueue
implementation in each ioworker looping through all assigned sockets.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 0/3] xen/blkback: several fixes of resource management

2017-05-18 Thread Juergen Gross
On 18/05/17 16:38, Roger Pau Monné wrote:
> On Tue, May 16, 2017 at 08:23:17AM +0200, Juergen Gross wrote:
>> Destroying a Xen guest domain while it was doing I/Os via xen-blkback
>> leaked several resources, including references of the guest's memory
>> pages.
>>
>> This patch series addresses those leaks by correcting usage of
>> reference counts and the sequence when to free which resource.
>>
>> The series applies on top of commit 2d4456c73a487abe ("block:
>> xen-blkback: add null check to avoid null pointer dereference") in
>> Jens Axboe's tree kernel/git/axboe/linux-block.git
>>
>> Juergen Gross (3):
>>   xen/blkback: fix disconnect while I/Os in flight
>>   xen/blkback: don't free be structure too early
>>   xen/blkback: don't use xen_blkif_get() in xen-blkback kthread
> 
> All 3:
> 
> Acked-by: Roger Pau Monné 
> 
> The comment reported by Dietmar in patch #1 would be nice to fix.

Okay, I'll send V2 soon.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 3/3] xen/blkback: don't use xen_blkif_get() in xen-blkback kthread

2017-05-18 Thread Juergen Gross
There is no need to use xen_blkif_get()/xen_blkif_put() in the kthread
of xen-blkback. Thread stopping is synchronous and using the blkif
reference counting in the kthread will avoid to ever let the reference
count drop to zero at the end of an I/O running concurrent to
disconnecting and multiple rings.

Setting ring->xenblkd to NULL after stopping the kthread isn't needed
as the kthread does this already.

Signed-off-by: Juergen Gross 
Tested-by: Steven Haigh 
Acked-by: Roger Pau Monné 
---
 drivers/block/xen-blkback/blkback.c | 3 ---
 drivers/block/xen-blkback/xenbus.c  | 1 -
 2 files changed, 4 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 726c32e35db9..6b14c509f3c7 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -609,8 +609,6 @@ int xen_blkif_schedule(void *arg)
unsigned long timeout;
int ret;
 
-   xen_blkif_get(blkif);
-
set_freezable();
while (!kthread_should_stop()) {
if (try_to_freeze())
@@ -665,7 +663,6 @@ int xen_blkif_schedule(void *arg)
print_stats(ring);
 
ring->xenblkd = NULL;
-   xen_blkif_put(blkif);
 
return 0;
 }
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 4cdf0490983e..792da683e70d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -255,7 +255,6 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
if (ring->xenblkd) {
kthread_stop(ring->xenblkd);
wake_up(&ring->shutdown_wq);
-   ring->xenblkd = NULL;
}
 
/* The above kthread_stop() guarantees that at this point we
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/3] xen/blkback: fix disconnect while I/Os in flight

2017-05-18 Thread Juergen Gross
Today disconnecting xen-blkback is broken in case there are still
I/Os in flight: xen_blkif_disconnect() will bail out early without
releasing all resources in the hope it will be called again when
the last request has terminated. This, however, won't happen as
xen_blkif_free() won't be called on termination of the last running
request: xen_blkif_put() won't decrement the blkif refcnt to 0 as
xen_blkif_disconnect() didn't finish before thus some xen_blkif_put()
calls in xen_blkif_disconnect() didn't happen.

To solve this deadlock xen_blkif_disconnect() and
xen_blkif_alloc_rings() shouldn't use xen_blkif_put() and
xen_blkif_get() but use some other way to do their accounting of
resources.

This at once fixes another error in xen_blkif_disconnect(): when it
returned early with -EBUSY for another ring than 0 it would call
xen_blkif_put() again for already handled rings on a subsequent call.
This will lead to inconsistencies in the refcnt handling.

Cc: sta...@vger.kernel.org
Signed-off-by: Juergen Gross 
Tested-by: Steven Haigh 
Acked-by: Roger Pau Monné 
---
 drivers/block/xen-blkback/common.h | 1 +
 drivers/block/xen-blkback/xenbus.c | 7 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index dea61f6ab8cb..638597b17a38 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -281,6 +281,7 @@ struct xen_blkif_ring {
 
wait_queue_head_t   wq;
atomic_tinflight;
+   boolactive;
/* One thread per blkif ring. */
struct task_struct  *xenblkd;
unsigned intwaiting_reqs;
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 1f3dfaa54d87..998915174bb8 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -159,7 +159,7 @@ static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
init_waitqueue_head(&ring->shutdown_wq);
ring->blkif = blkif;
ring->st_print = jiffies;
-   xen_blkif_get(blkif);
+   ring->active = true;
}
 
return 0;
@@ -249,6 +249,9 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
struct xen_blkif_ring *ring = &blkif->rings[r];
unsigned int i = 0;
 
+   if (!ring->active)
+   continue;
+
if (ring->xenblkd) {
kthread_stop(ring->xenblkd);
wake_up(&ring->shutdown_wq);
@@ -296,7 +299,7 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
BUG_ON(ring->free_pages_num != 0);
BUG_ON(ring->persistent_gnt_c != 0);
WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));
-   xen_blkif_put(blkif);
+   ring->active = false;
}
blkif->nr_ring_pages = 0;
/*
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 0/3] xen/blkback: several fixes of resource management

2017-05-18 Thread Juergen Gross
Destroying a Xen guest domain while it was doing I/Os via xen-blkback
leaked several resources, including references of the guest's memory
pages.

This patch series addresses those leaks by correcting usage of
reference counts and the sequence when to free which resource.

The series applies on top of commit 2d4456c73a487abe ("block:
xen-blkback: add null check to avoid null pointer dereference") in
Jens Axboe's tree kernel/git/axboe/linux-block.git

V2: changed flag to type bool in patch 1 (Dietmar Hahn)

Juergen Gross (3):
  xen/blkback: fix disconnect while I/Os in flight
  xen/blkback: don't free be structure too early
  xen/blkback: don't use xen_blkif_get() in xen-blkback kthread

 drivers/block/xen-blkback/blkback.c |  3 ---
 drivers/block/xen-blkback/common.h  |  1 +
 drivers/block/xen-blkback/xenbus.c  | 15 ---
 3 files changed, 9 insertions(+), 10 deletions(-)

-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 2/3] xen/blkback: don't free be structure too early

2017-05-18 Thread Juergen Gross
The be structure must nor be freed when freeing the blkif structure
isn't done. Otherwise a use-after-free of be when unmapping the ring
used for communicating with the frontend will occur in case of a
late call of xenblk_disconnect() (e.g. due to an I/O still active
when trying to disconnect).

Signed-off-by: Juergen Gross 
Tested-by: Steven Haigh 
Acked-by: Roger Pau Monné 
---
 drivers/block/xen-blkback/xenbus.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 998915174bb8..4cdf0490983e 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -315,9 +315,10 @@ static int xen_blkif_disconnect(struct xen_blkif *blkif)
 
 static void xen_blkif_free(struct xen_blkif *blkif)
 {
-
-   xen_blkif_disconnect(blkif);
+   WARN_ON(xen_blkif_disconnect(blkif));
xen_vbd_free(&blkif->vbd);
+   kfree(blkif->be->mode);
+   kfree(blkif->be);
 
/* Make sure everything is drained before shutting down */
kmem_cache_free(xen_blkif_cachep, blkif);
@@ -514,8 +515,6 @@ static int xen_blkbk_remove(struct xenbus_device *dev)
xen_blkif_put(be->blkif);
}
 
-   kfree(be->mode);
-   kfree(be);
return 0;
 }
 
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] xen: make xen_flush_tlb_all() static

2017-05-18 Thread Juergen Gross
xen_flush_tlb_all() is used in arch/x86/xen/mmu.c only. Make it static.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 5e375a5e815f..3be06f3caf3c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -42,7 +42,7 @@ xmaddr_t arbitrary_virt_to_machine(void *vaddr)
 }
 EXPORT_SYMBOL_GPL(arbitrary_virt_to_machine);
 
-void xen_flush_tlb_all(void)
+static void xen_flush_tlb_all(void)
 {
struct mmuext_op *op;
struct multicall_space mcs;
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 02/10] paravirt: remove unused function paravirt_disable_iospace()

2017-05-19 Thread Juergen Gross
paravirt_disable_iospace() isn't used anywhere. Remove it.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt_types.h |  2 --
 arch/x86/kernel/paravirt.c| 19 ---
 2 files changed, 21 deletions(-)

diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 7465d6fe336f..7a5de42cb465 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -396,8 +396,6 @@ unsigned paravirt_patch_insns(void *insnbuf, unsigned len,
 unsigned native_patch(u8 type, u16 clobbers, void *ibuf,
  unsigned long addr, unsigned len);
 
-int paravirt_disable_iospace(void);
-
 /*
  * This generates an indirect call based on the operation type number.
  * The type number, computed in PARAVIRT_PATCH, is derived from the
diff --git a/arch/x86/kernel/paravirt.c b/arch/x86/kernel/paravirt.c
index 3586996fc50d..b8b23b3f24c2 100644
--- a/arch/x86/kernel/paravirt.c
+++ b/arch/x86/kernel/paravirt.c
@@ -207,25 +207,6 @@ static u64 native_steal_clock(int cpu)
 extern void native_iret(void);
 extern void native_usergs_sysret64(void);
 
-static struct resource reserve_ioports = {
-   .start = 0,
-   .end = IO_SPACE_LIMIT,
-   .name = "paravirt-ioport",
-   .flags = IORESOURCE_IO | IORESOURCE_BUSY,
-};
-
-/*
- * Reserve the whole legacy IO space to prevent any legacy drivers
- * from wasting time probing for their hardware.  This is a fairly
- * brute-force approach to disabling all non-virtual drivers.
- *
- * Note that this must be called very early to have any effect.
- */
-int paravirt_disable_iospace(void)
-{
-   return request_resource(&ioport_resource, &reserve_ioports);
-}
-
 static DEFINE_PER_CPU(enum paravirt_lazy_mode, paravirt_lazy_mode) = 
PARAVIRT_LAZY_NONE;
 
 static inline void enter_lazy(enum paravirt_lazy_mode mode)
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 07/10] paravirt: split pv_irq_ops for support of PARAVIRT_FULL

2017-05-19 Thread Juergen Gross
Move functions needed for fully paravirtualized guests only into a new
structure pvfull_irq_ops in paravirt_types_full.h, paravirt_full.h and
the associated vector into paravirt_full.c.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/irqflags.h| 44 +++---
 arch/x86/include/asm/paravirt.h| 15 --
 arch/x86/include/asm/paravirt_full.h   | 17 
 arch/x86/include/asm/paravirt_types.h  |  8 +-
 arch/x86/include/asm/paravirt_types_full.h | 10 +++
 arch/x86/kernel/asm-offsets.c  |  2 ++
 arch/x86/kernel/asm-offsets_64.c   |  5 ++--
 arch/x86/kernel/paravirt.c |  6 +---
 arch/x86/kernel/paravirt_full.c|  9 ++
 arch/x86/lguest/boot.c |  2 +-
 arch/x86/xen/irq.c |  3 ++
 11 files changed, 68 insertions(+), 53 deletions(-)

diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index c3319c20127c..2a6d7a675271 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -87,6 +87,26 @@ static inline notrace void arch_local_irq_enable(void)
 }
 
 /*
+ * For spinlocks, etc:
+ */
+static inline notrace unsigned long arch_local_irq_save(void)
+{
+   unsigned long flags = arch_local_save_flags();
+   arch_local_irq_disable();
+   return flags;
+}
+#else
+
+#define ENABLE_INTERRUPTS(x)   sti
+#define DISABLE_INTERRUPTS(x)  cli
+
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_PARAVIRT */
+
+#ifndef CONFIG_PARAVIRT_FULL
+#ifndef __ASSEMBLY__
+
+/*
  * Used in the idle loop; sti takes one instruction cycle
  * to complete:
  */
@@ -104,30 +124,8 @@ static inline __cpuidle void halt(void)
native_halt();
 }
 
-/*
- * For spinlocks, etc:
- */
-static inline notrace unsigned long arch_local_irq_save(void)
-{
-   unsigned long flags = arch_local_save_flags();
-   arch_local_irq_disable();
-   return flags;
-}
 #else
 
-#define ENABLE_INTERRUPTS(x)   sti
-#define DISABLE_INTERRUPTS(x)  cli
-
-#ifdef CONFIG_X86_64
-#define PARAVIRT_ADJUST_EXCEPTION_FRAME/*  */
-#endif
-
-#endif /* __ASSEMBLY__ */
-#endif /* CONFIG_PARAVIRT */
-
-#ifndef CONFIG_PARAVIRT_FULL
-#ifdef __ASSEMBLY__
-
 #ifdef CONFIG_X86_64
 #define SWAPGS swapgs
 /*
@@ -149,6 +147,8 @@ static inline notrace unsigned long 
arch_local_irq_save(void)
swapgs; \
sysretl
 
+#define PARAVIRT_ADJUST_EXCEPTION_FRAME/*  */
+
 #else
 #define INTERRUPT_RETURN   iret
 #define GET_CR0_INTO_EAX   movl %cr0, %eax
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 2287a2465486..f1680e70162b 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -39,16 +39,6 @@ static inline void write_cr3(unsigned long x)
PVOP_VCALL1(pv_mmu_ops.write_cr3, x);
 }
 
-static inline void arch_safe_halt(void)
-{
-   PVOP_VCALL0(pv_irq_ops.safe_halt);
-}
-
-static inline void halt(void)
-{
-   PVOP_VCALL0(pv_irq_ops.halt);
-}
-
 #define get_kernel_rpl()  (pv_info.kernel_rpl)
 
 static inline unsigned long long paravirt_sched_clock(void)
@@ -721,11 +711,6 @@ extern void default_banner(void);
 #define GET_CR2_INTO_RAX   \
call PARA_INDIRECT(pv_mmu_ops+PV_MMU_read_cr2)
 
-#define PARAVIRT_ADJUST_EXCEPTION_FRAME
\
-   PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_adjust_exception_frame), \
- CLBR_NONE,\
- call PARA_INDIRECT(pv_irq_ops+PV_IRQ_adjust_exception_frame))
-
 #endif /* CONFIG_X86_64 */
 
 #endif /* __ASSEMBLY__ */
diff --git a/arch/x86/include/asm/paravirt_full.h 
b/arch/x86/include/asm/paravirt_full.h
index b3cf0960c161..64753ef1d36f 100644
--- a/arch/x86/include/asm/paravirt_full.h
+++ b/arch/x86/include/asm/paravirt_full.h
@@ -231,6 +231,16 @@ static inline void arch_end_context_switch(struct 
task_struct *next)
PVOP_VCALL1(pvfull_cpu_ops.end_context_switch, next);
 }
 
+static inline void arch_safe_halt(void)
+{
+   PVOP_VCALL0(pvfull_irq_ops.safe_halt);
+}
+
+static inline void halt(void)
+{
+   PVOP_VCALL0(pvfull_irq_ops.halt);
+}
+
 #else /* __ASSEMBLY__ */
 
 #define INTERRUPT_RETURN   \
@@ -267,6 +277,13 @@ static inline void arch_end_context_switch(struct 
task_struct *next)
PARA_SITE(PARA_PATCH(pvfull_cpu_ops, PV_CPU_usergs_sysret64),   \
  CLBR_NONE,\
  jmp PARA_INDIRECT(pvfull_cpu_ops+PV_CPU_usergs_sysret64))
+
+#define PARAVIRT_ADJUST_EXCEPTION_FRAME
\
+   PARA_SITE(PARA_PATCH(pvfull_irq_ops, PV_IRQ_adjust_exception_frame), \
+ CLBR_NONE,\
+ call

[Xen-devel] [PATCH 09/10] paravirt: split pv_info for support of PARAVIRT_FULL

2017-05-19 Thread Juergen Gross
Move members needed for fully paravirtualized guests only into a new
structure pvfull_info in paravirt_types_full.h, paravirt_full.h and
the associated vector into paravirt_full.c.

Signed-off-by: Juergen Gross 
---
 arch/x86/boot/compressed/misc.h |  1 +
 arch/x86/include/asm/paravirt.h |  2 --
 arch/x86/include/asm/paravirt_full.h|  2 ++
 arch/x86/include/asm/paravirt_types.h   |  7 ---
 arch/x86/include/asm/paravirt_types_full.h  | 10 ++
 arch/x86/include/asm/pgtable-3level_types.h |  4 ++--
 arch/x86/include/asm/ptrace.h   |  5 +++--
 arch/x86/include/asm/segment.h  |  2 +-
 arch/x86/kernel/paravirt.c  |  9 ++---
 arch/x86/kernel/paravirt_full.c | 10 ++
 arch/x86/lguest/boot.c  |  4 ++--
 arch/x86/xen/enlighten_pv.c | 12 ++--
 12 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch/x86/boot/compressed/misc.h b/arch/x86/boot/compressed/misc.h
index 1c8355eadbd1..007b58f3d985 100644
--- a/arch/x86/boot/compressed/misc.h
+++ b/arch/x86/boot/compressed/misc.h
@@ -9,6 +9,7 @@
  */
 #undef CONFIG_PARAVIRT
 #undef CONFIG_PARAVIRT_SPINLOCKS
+#undef CONFIG_PARAVIRT_FULL
 #undef CONFIG_KASAN
 
 #include 
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 3b9960a5de4a..55e0c1807df2 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -26,8 +26,6 @@ static inline enum paravirt_lazy_mode 
paravirt_get_lazy_mode(void)
 
 #endif
 
-#define get_kernel_rpl()  (pv_info.kernel_rpl)
-
 static inline unsigned long long paravirt_sched_clock(void)
 {
return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
diff --git a/arch/x86/include/asm/paravirt_full.h 
b/arch/x86/include/asm/paravirt_full.h
index 53f2eb436ba3..95d1c21bbef7 100644
--- a/arch/x86/include/asm/paravirt_full.h
+++ b/arch/x86/include/asm/paravirt_full.h
@@ -3,6 +3,8 @@
 
 #ifndef __ASSEMBLY__
 
+#define get_kernel_rpl()   (pvfull_info.kernel_rpl)
+
 static inline void load_sp0(struct tss_struct *tss,
struct thread_struct *thread)
 {
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index b1ac2a5698b4..34753d10ebbc 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -66,13 +66,6 @@ struct paravirt_callee_save {
 
 /* general info */
 struct pv_info {
-   unsigned int kernel_rpl;
-   int shared_kernel_pmd;
-
-#ifdef CONFIG_X86_64
-   u16 extra_user_64bit_cs;  /* __USER_CS if none */
-#endif
-
const char *name;
 };
 
diff --git a/arch/x86/include/asm/paravirt_types_full.h 
b/arch/x86/include/asm/paravirt_types_full.h
index 15d595a5f9d2..b1f91fad5842 100644
--- a/arch/x86/include/asm/paravirt_types_full.h
+++ b/arch/x86/include/asm/paravirt_types_full.h
@@ -1,6 +1,15 @@
 #ifndef _ASM_X86_PARAVIRT_TYPES_FULL_H
 #define _ASM_X86_PARAVIRT_TYPES_FULL_H
 
+struct pvfull_info {
+   unsigned int kernel_rpl;
+   int shared_kernel_pmd;
+
+#ifdef CONFIG_X86_64
+   u16 extra_user_64bit_cs;/* __USER_CS if none */
+#endif
+};
+
 struct pv_lazy_ops {
/* Set deferred update mode, used for batching operations. */
void (*enter)(void);
@@ -193,6 +202,7 @@ struct pvfull_mmu_ops {
   phys_addr_t phys, pgprot_t flags);
 };
 
+extern struct pvfull_info pvfull_info;
 extern struct pvfull_cpu_ops pvfull_cpu_ops;
 extern struct pvfull_irq_ops pvfull_irq_ops;
 extern struct pvfull_mmu_ops pvfull_mmu_ops;
diff --git a/arch/x86/include/asm/pgtable-3level_types.h 
b/arch/x86/include/asm/pgtable-3level_types.h
index b8a4341faafa..fdf132570189 100644
--- a/arch/x86/include/asm/pgtable-3level_types.h
+++ b/arch/x86/include/asm/pgtable-3level_types.h
@@ -19,8 +19,8 @@ typedef union {
 } pte_t;
 #endif /* !__ASSEMBLY__ */
 
-#ifdef CONFIG_PARAVIRT
-#define SHARED_KERNEL_PMD  (pv_info.shared_kernel_pmd)
+#ifdef CONFIG_PARAVIRT_FULL
+#define SHARED_KERNEL_PMD  (pvfull_info.shared_kernel_pmd)
 #else
 #define SHARED_KERNEL_PMD  1
 #endif
diff --git a/arch/x86/include/asm/ptrace.h b/arch/x86/include/asm/ptrace.h
index 2b5d686ea9f3..81d73663c497 100644
--- a/arch/x86/include/asm/ptrace.h
+++ b/arch/x86/include/asm/ptrace.h
@@ -118,7 +118,7 @@ static inline int v8086_mode(struct pt_regs *regs)
 #ifdef CONFIG_X86_64
 static inline bool user_64bit_mode(struct pt_regs *regs)
 {
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_FULL
/*
 * On non-paravirt systems, this is the only long mode CPL 3
 * selector.  We do not allow long mode selectors in the LDT.
@@ -126,7 +126,8 @@ static inline bool user_64bit_mode(struct pt_regs *regs)
return regs->cs == __USER_CS;
 #else
/* Headers are too twisted for this to go in paravirt.h. */
-   return regs->cs == __USER_CS || regs->cs == pv_info.extra_user

[Xen-devel] [PATCH 04/10] xen: remove non-pv test from arch/x86/xen/irq.c

2017-05-19 Thread Juergen Gross
As arch/x86/xen/irq.c is used for pv-guests only, there is no need to
have a test targeting a HVM guest in it. Remove it.

Signed-off-by: Juergen Gross 
---
 arch/x86/xen/irq.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/x86/xen/irq.c b/arch/x86/xen/irq.c
index 33e92955e09d..3b55ae664521 100644
--- a/arch/x86/xen/irq.c
+++ b/arch/x86/xen/irq.c
@@ -130,8 +130,6 @@ static const struct pv_irq_ops xen_irq_ops __initconst = {
 
 void __init xen_init_irq_ops(void)
 {
-   /* For PVH we use default pv_irq_ops settings. */
-   if (!xen_feature(XENFEAT_hvm_callback_vector))
-   pv_irq_ops = xen_irq_ops;
+   pv_irq_ops = xen_irq_ops;
x86_init.irqs.intr_init = xen_init_IRQ;
 }
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 08/10] paravirt: split pv_mmu_ops for support of PARAVIRT_FULL

2017-05-19 Thread Juergen Gross
Move functions needed for fully paravirtualized guests only into a new
structure pvfull_mmu_ops in paravirt_types_full.h, paravirt_full.h and
the associated vector into paravirt_full.c.

.flush_tlb_others is left in pv_mmu_ops as hyperv support will use it
soon.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/fixmap.h  |   2 +-
 arch/x86/include/asm/mmu_context.h |   4 +-
 arch/x86/include/asm/paravirt.h| 442 ++---
 arch/x86/include/asm/paravirt_full.h   | 422 +++
 arch/x86/include/asm/paravirt_types.h  | 117 +---
 arch/x86/include/asm/paravirt_types_full.h | 116 
 arch/x86/include/asm/pgalloc.h |   2 +-
 arch/x86/include/asm/pgtable.h |   8 +-
 arch/x86/include/asm/special_insns.h   |   6 +-
 arch/x86/include/asm/tlbflush.h|   2 +-
 arch/x86/kernel/asm-offsets.c  |   4 +-
 arch/x86/kernel/head_64.S  |   2 +-
 arch/x86/kernel/paravirt.c | 171 ---
 arch/x86/kernel/paravirt_full.c| 176 
 arch/x86/kernel/paravirt_patch_32.c|  12 +-
 arch/x86/kernel/paravirt_patch_64.c|  16 +-
 arch/x86/lguest/boot.c |  36 +--
 arch/x86/xen/enlighten_pv.c|   8 +-
 arch/x86/xen/mmu_pv.c  |  34 +--
 19 files changed, 797 insertions(+), 783 deletions(-)

diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index b65155cc3760..dfef874cb9d6 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -149,7 +149,7 @@ void __native_set_fixmap(enum fixed_addresses idx, pte_t 
pte);
 void native_set_fixmap(enum fixed_addresses idx,
   phys_addr_t phys, pgprot_t flags);
 
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_FULL
 static inline void __set_fixmap(enum fixed_addresses idx,
phys_addr_t phys, pgprot_t flags)
 {
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 68b329d77b3a..b38431024463 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -12,12 +12,12 @@
 #include 
 #include 
 #include 
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_FULL
 static inline void paravirt_activate_mm(struct mm_struct *prev,
struct mm_struct *next)
 {
 }
-#endif /* !CONFIG_PARAVIRT */
+#endif /* !CONFIG_PARAVIRT_FULL */
 
 #ifdef CONFIG_PERF_EVENTS
 extern struct static_key rdpmc_always_available;
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index f1680e70162b..3b9960a5de4a 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -17,28 +17,15 @@
 
 #ifdef CONFIG_PARAVIRT_FULL
 #include 
+#else
+
+static inline enum paravirt_lazy_mode paravirt_get_lazy_mode(void)
+{
+   return PARAVIRT_LAZY_NONE;
+}
+
 #endif
 
-static inline unsigned long read_cr2(void)
-{
-   return PVOP_CALL0(unsigned long, pv_mmu_ops.read_cr2);
-}
-
-static inline void write_cr2(unsigned long x)
-{
-   PVOP_VCALL1(pv_mmu_ops.write_cr2, x);
-}
-
-static inline unsigned long read_cr3(void)
-{
-   return PVOP_CALL0(unsigned long, pv_mmu_ops.read_cr3);
-}
-
-static inline void write_cr3(unsigned long x)
-{
-   PVOP_VCALL1(pv_mmu_ops.write_cr3, x);
-}
-
 #define get_kernel_rpl()  (pv_info.kernel_rpl)
 
 static inline unsigned long long paravirt_sched_clock(void)
@@ -66,36 +53,11 @@ static inline void slow_down_io(void)
 #endif
 }
 
-static inline void paravirt_activate_mm(struct mm_struct *prev,
-   struct mm_struct *next)
-{
-   PVOP_VCALL2(pv_mmu_ops.activate_mm, prev, next);
-}
-
-static inline void paravirt_arch_dup_mmap(struct mm_struct *oldmm,
- struct mm_struct *mm)
-{
-   PVOP_VCALL2(pv_mmu_ops.dup_mmap, oldmm, mm);
-}
-
 static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 {
PVOP_VCALL1(pv_mmu_ops.exit_mmap, mm);
 }
 
-static inline void __flush_tlb(void)
-{
-   PVOP_VCALL0(pv_mmu_ops.flush_tlb_user);
-}
-static inline void __flush_tlb_global(void)
-{
-   PVOP_VCALL0(pv_mmu_ops.flush_tlb_kernel);
-}
-static inline void __flush_tlb_single(unsigned long addr)
-{
-   PVOP_VCALL1(pv_mmu_ops.flush_tlb_single, addr);
-}
-
 static inline void flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm,
unsigned long start,
@@ -104,375 +66,6 @@ static inline void flush_tlb_others(const struct cpumask 
*cpumask,
PVOP_VCALL4(pv_mmu_ops.flush_tlb_others, cpumask, mm, start, end);
 }
 
-static inline int paravirt_pgd_alloc(struct mm_struct *mm)
-{
-   return PVOP_CALL1(int, pv_mmu_ops.pgd_alloc, mm);
-}
-
-static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *pgd

[Xen-devel] [PATCH 05/10] paravirt: add new PARAVIRT_FULL config item

2017-05-19 Thread Juergen Gross
Add a new config item PARAVIRT_FULL. It will be used to guard the
pv_*_ops functions used by fully paravirtualized guests (Xen pv-guests
and lguest) only.

Kernels not meant to support those guest types will be able to use many
operations without paravirt abstraction while still supporting all the
other paravirt features.

For now just add the new Kconfig option and select it for XEN_PV and
LGUEST_GUEST. Add paravirt_full.c, paravirt_full.h and
paravirt_types_full.h which will contain the necessary implementation
parts of the pv guest specific paravirt functions.

Signed-off-by: Juergen Gross 
---
 MAINTAINERS|  2 +-
 arch/x86/Kconfig   |  4 
 arch/x86/include/asm/paravirt.h|  8 
 arch/x86/include/asm/paravirt_full.h   |  4 
 arch/x86/include/asm/paravirt_types.h  |  4 
 arch/x86/include/asm/paravirt_types_full.h |  4 
 arch/x86/kernel/Makefile   |  1 +
 arch/x86/kernel/paravirt_full.c| 16 
 arch/x86/lguest/Kconfig|  1 +
 arch/x86/xen/Kconfig   |  1 +
 10 files changed, 44 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/include/asm/paravirt_full.h
 create mode 100644 arch/x86/include/asm/paravirt_types_full.h
 create mode 100644 arch/x86/kernel/paravirt_full.c

diff --git a/MAINTAINERS b/MAINTAINERS
index f7d568b8f133..8f22d1cd10a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -9644,7 +9644,7 @@ L:virtualizat...@lists.linux-foundation.org
 S: Supported
 F: Documentation/virtual/paravirt_ops.txt
 F: arch/*/kernel/paravirt*
-F: arch/*/include/asm/paravirt.h
+F: arch/*/include/asm/paravirt*.h
 F: include/linux/hypervisor.h
 
 PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index cd18994a9555..4d032ed27ce7 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -738,6 +738,10 @@ config PARAVIRT_SPINLOCKS
 
  If you are unsure how to answer this question, answer Y.
 
+config PARAVIRT_FULL
+   bool
+   depends on PARAVIRT
+
 config QUEUED_LOCK_STAT
bool "Paravirt queued spinlock statistics"
depends on PARAVIRT_SPINLOCKS && DEBUG_FS
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 55fa56fe4e45..419a3b991e72 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -15,6 +15,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_PARAVIRT_FULL
+#include 
+#endif
+
 static inline void load_sp0(struct tss_struct *tss,
 struct thread_struct *thread)
 {
@@ -916,6 +920,10 @@ extern void default_banner(void);
 #define PARA_INDIRECT(addr)*%cs:addr
 #endif
 
+#ifdef CONFIG_PARAVIRT_FULL
+#include 
+#endif
+
 #define INTERRUPT_RETURN   \
PARA_SITE(PARA_PATCH(pv_cpu_ops, PV_CPU_iret), CLBR_NONE,   \
  jmp PARA_INDIRECT(pv_cpu_ops+PV_CPU_iret))
diff --git a/arch/x86/include/asm/paravirt_full.h 
b/arch/x86/include/asm/paravirt_full.h
new file mode 100644
index ..1cabcfff6791
--- /dev/null
+++ b/arch/x86/include/asm/paravirt_full.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_X86_PARAVIRT_FULL_H
+#define _ASM_X86_PARAVIRT_FULL_H
+
+#endif /* _ASM_X86_PARAVIRT_FULL_H */
diff --git a/arch/x86/include/asm/paravirt_types.h 
b/arch/x86/include/asm/paravirt_types.h
index 7a5de42cb465..dbb0e69cd5c6 100644
--- a/arch/x86/include/asm/paravirt_types.h
+++ b/arch/x86/include/asm/paravirt_types.h
@@ -60,6 +60,10 @@ struct paravirt_callee_save {
void *func;
 };
 
+#ifdef CONFIG_PARAVIRT_FULL
+#include 
+#endif
+
 /* general info */
 struct pv_info {
unsigned int kernel_rpl;
diff --git a/arch/x86/include/asm/paravirt_types_full.h 
b/arch/x86/include/asm/paravirt_types_full.h
new file mode 100644
index ..69c048324e70
--- /dev/null
+++ b/arch/x86/include/asm/paravirt_types_full.h
@@ -0,0 +1,4 @@
+#ifndef _ASM_X86_PARAVIRT_TYPES_FULL_H
+#define _ASM_X86_PARAVIRT_TYPES_FULL_H
+
+#endif  /* _ASM_X86_PARAVIRT_TYPES_FULL_H */
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 4b994232cb57..80fe640e9b63 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -107,6 +107,7 @@ obj-$(CONFIG_KVM_GUEST) += kvm.o kvmclock.o
 obj-$(CONFIG_PARAVIRT) += paravirt.o paravirt_patch_$(BITS).o
 obj-$(CONFIG_PARAVIRT_SPINLOCKS)+= paravirt-spinlocks.o
 obj-$(CONFIG_PARAVIRT_CLOCK)   += pvclock.o
+obj-$(CONFIG_PARAVIRT_FULL)+= paravirt_full.o
 obj-$(CONFIG_X86_PMEM_LEGACY_DEVICE) += pmem.o
 
 obj-$(CONFIG_PCSPKR_PLATFORM)  += pcspeaker.o
diff --git a/arch/x86/kernel/paravirt_full.c b/arch/x86/kernel/paravirt_full.c
new file mode 100644
index ..0c7de64129c5
--- /dev/null
+++ b/arch/x86/kernel/paravirt_full.c
@@ -0,0 +1,16 @@
+/*
+Paravirtualization interfaces for fully paravirtualized guests

[Xen-devel] [PATCH 01/10] x86: remove stale prototype from arch/x86/include/asm/pgalloc.h

2017-05-19 Thread Juergen Gross
paravirt_alloc_pmd_clone() doesn't exist anywhere. Remove its
prototype.

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/pgalloc.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/arch/x86/include/asm/pgalloc.h b/arch/x86/include/asm/pgalloc.h
index b2d0cd8288aa..71de65bb1791 100644
--- a/arch/x86/include/asm/pgalloc.h
+++ b/arch/x86/include/asm/pgalloc.h
@@ -14,8 +14,6 @@ static inline int  __paravirt_pgd_alloc(struct mm_struct *mm) 
{ return 0; }
 static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *pgd) {}
 static inline void paravirt_alloc_pte(struct mm_struct *mm, unsigned long pfn) 
{}
 static inline void paravirt_alloc_pmd(struct mm_struct *mm, unsigned long pfn) 
{}
-static inline void paravirt_alloc_pmd_clone(unsigned long pfn, unsigned long 
clonepfn,
-   unsigned long start, unsigned long 
count) {}
 static inline void paravirt_alloc_pud(struct mm_struct *mm, unsigned long pfn) 
{}
 static inline void paravirt_alloc_p4d(struct mm_struct *mm, unsigned long pfn) 
{}
 static inline void paravirt_release_pte(unsigned long pfn) {}
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 06/10] paravirt: split pv_cpu_ops for support of PARAVIRT_FULL

2017-05-19 Thread Juergen Gross
Move functions needed for fully paravirtualized guests only into a new
structure pvfull_cpu_ops in paravirt_types_full.h, paravirt_full.h and
the associated vector into paravirt_full.c.

Signed-off-by: Juergen Gross 
---
 arch/x86/entry/entry_64.S  |   4 +-
 arch/x86/include/asm/debugreg.h|   2 +-
 arch/x86/include/asm/desc.h|   4 +-
 arch/x86/include/asm/irqflags.h|  16 +-
 arch/x86/include/asm/msr.h |   4 +-
 arch/x86/include/asm/paravirt.h| 257 +--
 arch/x86/include/asm/paravirt_full.h   | 269 +
 arch/x86/include/asm/paravirt_types.h  |  78 +
 arch/x86/include/asm/paravirt_types_full.h |  78 +
 arch/x86/include/asm/pgtable.h |   8 +-
 arch/x86/include/asm/processor.h   |   4 +-
 arch/x86/include/asm/special_insns.h   |  27 +--
 arch/x86/kernel/asm-offsets.c  |   9 +-
 arch/x86/kernel/asm-offsets_64.c   |   6 +-
 arch/x86/kernel/cpu/common.c   |   4 +-
 arch/x86/kernel/paravirt.c |  67 +--
 arch/x86/kernel/paravirt_full.c|  66 +++
 arch/x86/kernel/paravirt_patch_32.c|   8 +-
 arch/x86/kernel/paravirt_patch_64.c|  18 +-
 arch/x86/lguest/boot.c |  38 ++--
 arch/x86/xen/enlighten_pv.c|  14 +-
 21 files changed, 522 insertions(+), 459 deletions(-)

diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index cd47214ff402..4e85e9c9a2f8 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -42,12 +42,12 @@
 .code64
 .section .entry.text, "ax"
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_FULL
 ENTRY(native_usergs_sysret64)
swapgs
sysretq
 ENDPROC(native_usergs_sysret64)
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_FULL */
 
 .macro TRACE_IRQS_IRETQ
 #ifdef CONFIG_TRACE_IRQFLAGS
diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h
index 12cb66f6d3a5..6477da0e4869 100644
--- a/arch/x86/include/asm/debugreg.h
+++ b/arch/x86/include/asm/debugreg.h
@@ -7,7 +7,7 @@
 
 DECLARE_PER_CPU(unsigned long, cpu_dr7);
 
-#ifndef CONFIG_PARAVIRT
+#ifndef CONFIG_PARAVIRT_FULL
 /*
  * These special macros can be used to get or set a debugging register
  */
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index d0a21b12dd58..be2037db49a8 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -118,7 +118,7 @@ static inline int desc_empty(const void *ptr)
return !(desc[0] | desc[1]);
 }
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_FULL
 #include 
 #else
 #define load_TR_desc() native_load_tr_desc()
@@ -145,7 +145,7 @@ static inline void paravirt_alloc_ldt(struct desc_struct 
*ldt, unsigned entries)
 static inline void paravirt_free_ldt(struct desc_struct *ldt, unsigned entries)
 {
 }
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_FULL */
 
 #define store_ldt(ldt) asm("sldt %0" : "=m"(ldt))
 
diff --git a/arch/x86/include/asm/irqflags.h b/arch/x86/include/asm/irqflags.h
index ac7692dcfa2e..c3319c20127c 100644
--- a/arch/x86/include/asm/irqflags.h
+++ b/arch/x86/include/asm/irqflags.h
@@ -119,6 +119,16 @@ static inline notrace unsigned long 
arch_local_irq_save(void)
 #define DISABLE_INTERRUPTS(x)  cli
 
 #ifdef CONFIG_X86_64
+#define PARAVIRT_ADJUST_EXCEPTION_FRAME/*  */
+#endif
+
+#endif /* __ASSEMBLY__ */
+#endif /* CONFIG_PARAVIRT */
+
+#ifndef CONFIG_PARAVIRT_FULL
+#ifdef __ASSEMBLY__
+
+#ifdef CONFIG_X86_64
 #define SWAPGS swapgs
 /*
  * Currently paravirt can't handle swapgs nicely when we
@@ -131,8 +141,6 @@ static inline notrace unsigned long 
arch_local_irq_save(void)
  */
 #define SWAPGS_UNSAFE_STACKswapgs
 
-#define PARAVIRT_ADJUST_EXCEPTION_FRAME/*  */
-
 #define INTERRUPT_RETURN   jmp native_iret
 #define USERGS_SYSRET64\
swapgs; \
@@ -143,13 +151,11 @@ static inline notrace unsigned long 
arch_local_irq_save(void)
 
 #else
 #define INTERRUPT_RETURN   iret
-#define ENABLE_INTERRUPTS_SYSEXIT  sti; sysexit
 #define GET_CR0_INTO_EAX   movl %cr0, %eax
 #endif
 
-
 #endif /* __ASSEMBLY__ */
-#endif /* CONFIG_PARAVIRT */
+#endif /* CONFIG_PARAVIRT_FULL */
 
 #ifndef __ASSEMBLY__
 static inline int arch_irqs_disabled_flags(unsigned long flags)
diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
index 898dba2e2e2c..7c715f811590 100644
--- a/arch/x86/include/asm/msr.h
+++ b/arch/x86/include/asm/msr.h
@@ -231,7 +231,7 @@ static inline unsigned long long native_read_pmc(int 
counter)
return EAX_EDX_VAL(val, low, high);
 }
 
-#ifdef CONFIG_PARAVIRT
+#ifdef CONFIG_PARAVIRT_FULL
 #include 
 #else
 #include 
@@ -294,7 +294,7 @@ do { 

[Xen-devel] [PATCH 10/10] paravirt: merge pv_ops_* structures into one

2017-05-19 Thread Juergen Gross
As there are now only very few pvops functions left when
CONFIG_PARAVIRT_FULL isn't set, merge the related structures into
one named "pv_ops".

Signed-off-by: Juergen Gross 
---
 arch/x86/include/asm/paravirt.h   | 32 
 arch/x86/include/asm/paravirt_types.h | 27 ---
 arch/x86/kernel/alternative.c |  4 ++--
 arch/x86/kernel/asm-offsets.c |  6 +++---
 arch/x86/kernel/cpu/vmware.c  |  6 +++---
 arch/x86/kernel/kvm.c |  6 +++---
 arch/x86/kernel/kvmclock.c|  6 +++---
 arch/x86/kernel/paravirt.c| 33 +
 arch/x86/kernel/paravirt_patch_32.c   | 16 
 arch/x86/kernel/paravirt_patch_64.c   | 16 
 arch/x86/kernel/tsc.c |  2 +-
 arch/x86/kernel/vsmp_64.c | 18 +-
 arch/x86/lguest/boot.c| 20 ++--
 arch/x86/xen/enlighten_hvm.c  |  4 ++--
 arch/x86/xen/enlighten_pv.c   | 28 
 arch/x86/xen/irq.c| 12 
 arch/x86/xen/mmu_hvm.c|  2 +-
 arch/x86/xen/mmu_pv.c |  4 ++--
 arch/x86/xen/time.c   | 11 ---
 drivers/xen/time.c|  2 +-
 20 files changed, 101 insertions(+), 154 deletions(-)

diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 55e0c1807df2..0f8194ec64c9 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x86/include/asm/paravirt.h
@@ -28,7 +28,7 @@ static inline enum paravirt_lazy_mode 
paravirt_get_lazy_mode(void)
 
 static inline unsigned long long paravirt_sched_clock(void)
 {
-   return PVOP_CALL0(unsigned long long, pv_time_ops.sched_clock);
+   return PVOP_CALL0(unsigned long long, pv_ops.sched_clock);
 }
 
 struct static_key;
@@ -37,23 +37,23 @@ extern struct static_key paravirt_steal_rq_enabled;
 
 static inline u64 paravirt_steal_clock(int cpu)
 {
-   return PVOP_CALL1(u64, pv_time_ops.steal_clock, cpu);
+   return PVOP_CALL1(u64, pv_ops.steal_clock, cpu);
 }
 
 /* The paravirtualized I/O functions */
 static inline void slow_down_io(void)
 {
-   pv_cpu_ops.io_delay();
+   pv_ops.io_delay();
 #ifdef REALLY_SLOW_IO
-   pv_cpu_ops.io_delay();
-   pv_cpu_ops.io_delay();
-   pv_cpu_ops.io_delay();
+   pv_ops.io_delay();
+   pv_ops.io_delay();
+   pv_ops.io_delay();
 #endif
 }
 
 static inline void paravirt_arch_exit_mmap(struct mm_struct *mm)
 {
-   PVOP_VCALL1(pv_mmu_ops.exit_mmap, mm);
+   PVOP_VCALL1(pv_ops.exit_mmap, mm);
 }
 
 static inline void flush_tlb_others(const struct cpumask *cpumask,
@@ -61,7 +61,7 @@ static inline void flush_tlb_others(const struct cpumask 
*cpumask,
unsigned long start,
unsigned long end)
 {
-   PVOP_VCALL4(pv_mmu_ops.flush_tlb_others, cpumask, mm, start, end);
+   PVOP_VCALL4(pv_ops.flush_tlb_others, cpumask, mm, start, end);
 }
 
 #if defined(CONFIG_SMP) && defined(CONFIG_PARAVIRT_SPINLOCKS)
@@ -173,22 +173,22 @@ static __always_inline bool pv_vcpu_is_preempted(long cpu)
 
 static inline notrace unsigned long arch_local_save_flags(void)
 {
-   return PVOP_CALLEE0(unsigned long, pv_irq_ops.save_fl);
+   return PVOP_CALLEE0(unsigned long, pv_ops.save_fl);
 }
 
 static inline notrace void arch_local_irq_restore(unsigned long f)
 {
-   PVOP_VCALLEE1(pv_irq_ops.restore_fl, f);
+   PVOP_VCALLEE1(pv_ops.restore_fl, f);
 }
 
 static inline notrace void arch_local_irq_disable(void)
 {
-   PVOP_VCALLEE0(pv_irq_ops.irq_disable);
+   PVOP_VCALLEE0(pv_ops.irq_disable);
 }
 
 static inline notrace void arch_local_irq_enable(void)
 {
-   PVOP_VCALLEE0(pv_irq_ops.irq_enable);
+   PVOP_VCALLEE0(pv_ops.irq_enable);
 }
 
 static inline notrace unsigned long arch_local_irq_save(void)
@@ -286,15 +286,15 @@ extern void default_banner(void);
 #endif
 
 #define DISABLE_INTERRUPTS(clobbers)   \
-   PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \
+   PARA_SITE(PARA_PATCH(pv_ops, PV_IRQ_irq_disable), clobbers, \
  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_disable);\
+ call PARA_INDIRECT(pv_ops+PV_IRQ_irq_disable);\
  PV_RESTORE_REGS(clobbers | CLBR_CALLEE_SAVE);)
 
 #define ENABLE_INTERRUPTS(clobbers)\
-   PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_enable), clobbers,  \
+   PARA_SITE(PARA_PATCH(pv_ops, PV_IRQ_irq_enable), clobbers,  \
  PV_SAVE_REGS(clobbers | CLBR_CALLEE_SAVE);\
- call PARA_INDIRECT(pv_irq_ops+PV_IRQ_irq_enable); \
+ call PARA_INDIRECT(p

[Xen-devel] [PATCH 03/10] xen: move interrupt handling for pv guests under CONFIG_XEN_PV umbrella

2017-05-19 Thread Juergen Gross
There is no need to include pv-guest only object files in a kernel not
configured to support those. Move Xen's irq.o, xen-asm*.o and pv parts
of entry_*.o into CONFIG_XEN_PV sections.

Signed-off-by: Juergen Gross 
---
 arch/x86/entry/entry_32.S | 4 +++-
 arch/x86/entry/entry_64.S | 6 --
 arch/x86/xen/Makefile | 8 
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
index 50bc26949e9e..37ae4a7809d9 100644
--- a/arch/x86/entry/entry_32.S
+++ b/arch/x86/entry/entry_32.S
@@ -789,7 +789,7 @@ ENTRY(spurious_interrupt_bug)
jmp common_exception
 END(spurious_interrupt_bug)
 
-#ifdef CONFIG_XEN
+#ifdef CONFIG_XEN_PV
 ENTRY(xen_hypervisor_callback)
pushl   $-1 /* orig_ax = -1 => not a system 
call */
SAVE_ALL
@@ -870,7 +870,9 @@ ENTRY(xen_failsafe_callback)
_ASM_EXTABLE(3b, 8b)
_ASM_EXTABLE(4b, 9b)
 ENDPROC(xen_failsafe_callback)
+#endif /* CONFIG_XEN_PV */
 
+#ifdef CONFIG_XEN
 BUILD_INTERRUPT3(xen_hvm_callback_vector, HYPERVISOR_CALLBACK_VECTOR,
xen_evtchn_do_upcall)
 
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 607d72c4a485..cd47214ff402 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -902,7 +902,7 @@ ENTRY(do_softirq_own_stack)
ret
 END(do_softirq_own_stack)
 
-#ifdef CONFIG_XEN
+#ifdef CONFIG_XEN_PV
 idtentry xen_hypervisor_callback xen_do_hypervisor_callback has_error_code=0
 
 /*
@@ -983,7 +983,9 @@ ENTRY(xen_failsafe_callback)
ENCODE_FRAME_POINTER
jmp error_exit
 END(xen_failsafe_callback)
+#endif /* CONFIG_XEN_PV */
 
+#ifdef CONFIG_XEN
 apicinterrupt3 HYPERVISOR_CALLBACK_VECTOR \
xen_hvm_callback_vector xen_evtchn_do_upcall
 
@@ -998,7 +1000,7 @@ idtentry debug do_debug
has_error_code=0paranoid=1 shift_ist=DEBUG_STACK
 idtentry int3  do_int3 has_error_code=0
paranoid=1 shift_ist=DEBUG_STACK
 idtentry stack_segment do_stack_segmenthas_error_code=1
 
-#ifdef CONFIG_XEN
+#ifdef CONFIG_XEN_PV
 idtentry xen_debug do_debughas_error_code=0
 idtentry xen_int3  do_int3 has_error_code=0
 idtentry xen_stack_segment do_stack_segmenthas_error_code=1
diff --git a/arch/x86/xen/Makefile b/arch/x86/xen/Makefile
index fffb0a16f9e3..5fc463eaafff 100644
--- a/arch/x86/xen/Makefile
+++ b/arch/x86/xen/Makefile
@@ -10,13 +10,13 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_enlighten_pv.o  := $(nostackp)
 CFLAGS_mmu_pv.o:= $(nostackp)
 
-obj-y  := enlighten.o multicalls.o mmu.o irq.o \
-   time.o xen-asm.o xen-asm_$(BITS).o \
+obj-y  := enlighten.o multicalls.o mmu.o time.o \
grant-table.o suspend.o platform-pci-unplug.o
 
 obj-$(CONFIG_XEN_PVHVM)+= enlighten_hvm.o mmu_hvm.o 
suspend_hvm.o
-obj-$(CONFIG_XEN_PV)   += setup.o apic.o pmu.o suspend_pv.o \
-   p2m.o enlighten_pv.o mmu_pv.o
+obj-$(CONFIG_XEN_PV)   += setup.o apic.o pmu.o suspend_pv.o irq.o \
+   p2m.o enlighten_pv.o mmu_pv.o \
+   xen-asm.o xen-asm_$(BITS).o
 obj-$(CONFIG_XEN_PVH)  += enlighten_pvh.o
 
 obj-$(CONFIG_EVENT_TRACING) += trace.o
-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 00/10] paravirt: make amount of paravirtualization configurable

2017-05-19 Thread Juergen Gross
Today paravirtualization is a all-or-nothing game: either a kernel is
compiled with no paravirtualization support at all, or it is supporting
paravirtualized environments like Xen pv-guests or lguest additionally
to some paravirtualized tuning for KVM, Hyperv, VMWare or Xen
HVM-guests.

As support of pv-guests requires quite intrusive pv-hooks (e.g. all
access functions to page table entries, privileged instructions) it is
desirable to enable those hooks only in cases where support of
pv-guests is really desired.

With splitting up of Xen guest support into pv-guest support and
support for HVM-guests it is now possible to do the same for support
of paravirtualization: only if XEN_PV or LGUEST_GUEST are configured
full paravirtualization is required.

This patch series carves out pv-guest support form PARAVIRT by
introducing PARAVIRT_FULL config option selected by XEN_PV and
LGUEST_GUEST config options.

The series has been tested for 32 and 64 bit kernels without
PARAVIRT, with PARAVIRT, and with PARAVIRT + PARAVIRT_FULL configured.


Juergen Gross (10):
  x86: remove stale prototype from arch/x86/include/asm/pgalloc.h
  paravirt: remove unused function paravirt_disable_iospace()
  xen: move interrupt handling for pv guests under CONFIG_XEN_PV
umbrella
  xen: remove non-pv test from arch/x86/xen/irq.c
  paravirt: add new PARAVIRT_FULL config item
  paravirt: split pv_cpu_ops for support of PARAVIRT_FULL
  paravirt: split pv_irq_ops for support of PARAVIRT_FULL
  paravirt: split pv_mmu_ops for support of PARAVIRT_FULL
  paravirt: split pv_info for support of PARAVIRT_FULL
  paravirt: merge pv_ops_* structures into one

 MAINTAINERS |   2 +-
 arch/x86/Kconfig|   4 +
 arch/x86/boot/compressed/misc.h |   1 +
 arch/x86/entry/entry_32.S   |   4 +-
 arch/x86/entry/entry_64.S   |  10 +-
 arch/x86/include/asm/debugreg.h |   2 +-
 arch/x86/include/asm/desc.h |   4 +-
 arch/x86/include/asm/fixmap.h   |   2 +-
 arch/x86/include/asm/irqflags.h |  40 +-
 arch/x86/include/asm/mmu_context.h  |   4 +-
 arch/x86/include/asm/msr.h  |   4 +-
 arch/x86/include/asm/paravirt.h | 738 ++--
 arch/x86/include/asm/paravirt_full.h| 714 +++
 arch/x86/include/asm/paravirt_types.h   | 243 +
 arch/x86/include/asm/paravirt_types_full.h  | 218 
 arch/x86/include/asm/pgalloc.h  |   4 +-
 arch/x86/include/asm/pgtable-3level_types.h |   4 +-
 arch/x86/include/asm/pgtable.h  |   8 +-
 arch/x86/include/asm/processor.h|   4 +-
 arch/x86/include/asm/ptrace.h   |   5 +-
 arch/x86/include/asm/segment.h  |   2 +-
 arch/x86/include/asm/special_insns.h|  25 +-
 arch/x86/include/asm/tlbflush.h |   2 +-
 arch/x86/kernel/Makefile|   1 +
 arch/x86/kernel/alternative.c   |   4 +-
 arch/x86/kernel/asm-offsets.c   |  21 +-
 arch/x86/kernel/asm-offsets_64.c|   9 +-
 arch/x86/kernel/cpu/common.c|   4 +-
 arch/x86/kernel/cpu/vmware.c|   6 +-
 arch/x86/kernel/head_64.S   |   2 +-
 arch/x86/kernel/kvm.c   |   6 +-
 arch/x86/kernel/kvmclock.c  |   6 +-
 arch/x86/kernel/paravirt.c  | 303 +---
 arch/x86/kernel/paravirt_full.c | 277 +++
 arch/x86/kernel/paravirt_patch_32.c |  36 +-
 arch/x86/kernel/paravirt_patch_64.c |  50 +-
 arch/x86/kernel/tsc.c   |   2 +-
 arch/x86/kernel/vsmp_64.c   |  18 +-
 arch/x86/lguest/Kconfig |   1 +
 arch/x86/lguest/boot.c  | 100 ++--
 arch/x86/xen/Kconfig|   1 +
 arch/x86/xen/Makefile   |   8 +-
 arch/x86/xen/enlighten_hvm.c|   4 +-
 arch/x86/xen/enlighten_pv.c |  58 ++-
 arch/x86/xen/irq.c  |  15 +-
 arch/x86/xen/mmu_hvm.c  |   2 +-
 arch/x86/xen/mmu_pv.c   |  34 +-
 arch/x86/xen/time.c |  11 +-
 drivers/xen/time.c  |   2 +-
 49 files changed, 1548 insertions(+), 1477 deletions(-)
 create mode 100644 arch/x86/include/asm/paravirt_full.h
 create mode 100644 arch/x86/include/asm/paravirt_types_full.h
 create mode 100644 arch/x86/kernel/paravirt_full.c

-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [GIT PULL] xen: fixes for 4.12 rc2

2017-05-19 Thread Juergen Gross
Linus,

Please git pull the following tag:

 git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git 
for-linus-4.12b-rc2-tag

xen: fixes for 4.12 rc2

It contains some fixes for the new Xen 9pfs frontend and some minor cleanups.
Thanks.

Juergen

 arch/x86/xen/enlighten_pv.c | 15 ++-
 arch/x86/xen/mmu.c  |  2 +-
 arch/x86/xen/mmu_pv.c   | 98 +
 net/9p/trans_xen.c  |  8 ++--
 4 files changed, 45 insertions(+), 78 deletions(-)

Juergen Gross (2):
  xen: cleanup pvh leftovers from pv-only sources
  xen: make xen_flush_tlb_all() static

Wei Yongjun (2):
  xen/9pfs: fix return value check in xen_9pfs_front_probe()
  xen/9pfs: p9_trans_xen_init and p9_trans_xen_exit can be static

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 0/2] xen: add xen sysfs node

2017-05-22 Thread Juergen Gross
In order to be able to determine the Xen guest type from within the
guest as a user there is currently no stable interface available.

Add a sysfs node for that purpose as the guest type information is
available for the kernel.

While doing this document all the other Xen related sysfs nodes.

Juergen Gross (2):
  doc,xen: document hypervisor sysfs nodes for xen
  xen: add sysfs node for guest type

 Documentation/ABI/testing/sysfs-hypervisor | 144 +
 Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 
 MAINTAINERS|   1 +
 arch/arm/xen/enlighten.c   |   3 +
 arch/x86/xen/enlighten.c   |   3 +
 arch/x86/xen/enlighten_hvm.c   |   6 +-
 arch/x86/xen/enlighten_pv.c|   1 +
 drivers/xen/sys-hypervisor.c   |  17 +++
 include/xen/xen.h  |   2 +
 9 files changed, 175 insertions(+), 25 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-hypervisor
 delete mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu

-- 
2.12.0


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/2] xen: add sysfs node for guest type

2017-05-22 Thread Juergen Gross
Currently there is no reliable user interface inside a Xen guest to
determine its type (e.g. HVM, PV or PVH). Instead of letting user mode
try to determine this by various rather hacky mechanisms (parsing of
boot messages before they are gone, trying to make use of known subtle
differences in behavior of some instructions), add a sysfs node
/sys/hypervisor/guest_type to explicitly deliver this information as
it is known to the kernel.

Signed-off-by: Juergen Gross 
---
 Documentation/ABI/testing/sysfs-hypervisor | 13 +
 arch/arm/xen/enlighten.c   |  3 +++
 arch/x86/xen/enlighten.c   |  3 +++
 arch/x86/xen/enlighten_hvm.c   |  6 --
 arch/x86/xen/enlighten_pv.c|  1 +
 drivers/xen/sys-hypervisor.c   | 17 +
 include/xen/xen.h  |  2 ++
 7 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-hypervisor 
b/Documentation/ABI/testing/sysfs-hypervisor
index 443196f0aa1c..06850f74ebd4 100644
--- a/Documentation/ABI/testing/sysfs-hypervisor
+++ b/Documentation/ABI/testing/sysfs-hypervisor
@@ -19,6 +19,19 @@ Contact: xen-de...@lists.xenproject.org
 Description:
Compiler which was used to build the Xen hypervisor
 
+What:  /sys/hypervisor/guest_type
+Date:  May 2017
+KernelVersion: 4.12
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Type of guest:
+   "native": standard guest type on arm
+   "HVM": fully virtualized guest (x86)
+   "PV": paravirtualized guest (x86)
+   "PVH": fully virtualized guest without legacy emulation (x86)
+   "PVHVM": fully virtualized guest using paravirtualized
+   interfaces (e.g. interrupts, timers) (x86)
+
 What:  /sys/hypervisor/pmu/pmu_mode
 Date:  August 2015
 KernelVersion: 4.3
diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index ba7f4c8f5c3e..d2100f79e458 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -43,6 +43,9 @@ EXPORT_SYMBOL(xen_start_info);
 enum xen_domain_type xen_domain_type = XEN_NATIVE;
 EXPORT_SYMBOL(xen_domain_type);
 
+char *xen_guest_type = "native";
+EXPORT_SYMBOL_GPL(xen_guest_type);
+
 struct shared_info xen_dummy_shared_info;
 struct shared_info *HYPERVISOR_shared_info = (void *)&xen_dummy_shared_info;
 
diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c
index a5ffcbb20cc0..53bd23dfecac 100644
--- a/arch/x86/xen/enlighten.c
+++ b/arch/x86/xen/enlighten.c
@@ -47,6 +47,9 @@ EXPORT_PER_CPU_SYMBOL(xen_vcpu_id);
 enum xen_domain_type xen_domain_type = XEN_NATIVE;
 EXPORT_SYMBOL_GPL(xen_domain_type);
 
+char *xen_guest_type = "HVM";
+EXPORT_SYMBOL_GPL(xen_guest_type);
+
 unsigned long *machine_to_phys_mapping = (void *)MACH2PHYS_VIRT_START;
 EXPORT_SYMBOL(machine_to_phys_mapping);
 unsigned long  machine_to_phys_nr;
diff --git a/arch/x86/xen/enlighten_hvm.c b/arch/x86/xen/enlighten_hvm.c
index a6d014f47e52..ba6dbff8675d 100644
--- a/arch/x86/xen/enlighten_hvm.c
+++ b/arch/x86/xen/enlighten_hvm.c
@@ -68,13 +68,15 @@ static void __init init_hvm_pv_info(void)
xen_domain_type = XEN_HVM_DOMAIN;
 
/* PVH set up hypercall page in xen_prepare_pvh(). */
-   if (xen_pvh_domain())
+   if (xen_pvh_domain()) {
pv_info.name = "Xen PVH";
-   else {
+   xen_guest_type = "PVH";
+   } else {
u64 pfn;
uint32_t msr;
 
pv_info.name = "Xen HVM";
+   xen_guest_type = "PVHVM";
msr = cpuid_ebx(base + 2);
pfn = __pa(hypercall_page);
wrmsr_safe(msr, (u32)pfn, (u32)(pfn >> 32));
diff --git a/arch/x86/xen/enlighten_pv.c b/arch/x86/xen/enlighten_pv.c
index f33eef4ebd12..7ac5e93d96f3 100644
--- a/arch/x86/xen/enlighten_pv.c
+++ b/arch/x86/xen/enlighten_pv.c
@@ -1255,6 +1255,7 @@ asmlinkage __visible void __init xen_start_kernel(void)
return;
 
xen_domain_type = XEN_PV_DOMAIN;
+   xen_guest_type = "PV";
 
xen_setup_features();
 
diff --git a/drivers/xen/sys-hypervisor.c b/drivers/xen/sys-hypervisor.c
index 84106f9c456c..d641e9970d5d 100644
--- a/drivers/xen/sys-hypervisor.c
+++ b/drivers/xen/sys-hypervisor.c
@@ -50,6 +50,18 @@ static int __init xen_sysfs_type_init(void)
return sysfs_create_file(hypervisor_kobj, &type_attr.attr);
 }
 
+static ssize_t guest_type_show(struct hyp_sysfs_attr *attr, char *buffer)
+{
+   return sprintf(buffer, "%s\n", xen_guest_type);
+}
+
+HYPERVISOR_ATTR_RO(guest_type);
+
+static int __init xen_sysfs_guest_type_init(void)
+{
+   return sysfs_create_file(hypervisor_kobj, &guest_type_attr.attr);
+}
+
 /* xen version attributes 

[Xen-devel] [PATCH 1/2] doc, xen: document hypervisor sysfs nodes for xen

2017-05-22 Thread Juergen Gross
Today only a few sysfs nodes under /sys/hypervisor/ are documented
for Xen in Documentation/ABI/testing/sysfs-hypervisor-pmu. Rename
this file to Documentation/ABI/testing/sysfs-hypervisor and add
descriptions of the other nodes.

Signed-off-by: Juergen Gross 
---
 Documentation/ABI/testing/sysfs-hypervisor | 131 +
 Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 -
 MAINTAINERS|   1 +
 3 files changed, 132 insertions(+), 23 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-hypervisor
 delete mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu

diff --git a/Documentation/ABI/testing/sysfs-hypervisor 
b/Documentation/ABI/testing/sysfs-hypervisor
new file mode 100644
index ..443196f0aa1c
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-hypervisor
@@ -0,0 +1,131 @@
+What:  /sys/hypervisor/compilation/compile_date
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Contains the build time stamp of the Xen hypervisor
+
+What:  /sys/hypervisor/compilation/compiled_by
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Contains information who built the Xen hypervisor
+
+What:  /sys/hypervisor/compilation/compiler
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Compiler which was used to build the Xen hypervisor
+
+What:  /sys/hypervisor/pmu/pmu_mode
+Date:  August 2015
+KernelVersion: 4.3
+Contact:   Boris Ostrovsky 
+Description:
+   Describes mode that Xen's performance-monitoring unit (PMU)
+   uses. Accepted values are
+   "off"  -- PMU is disabled
+   "self" -- The guest can profile itself
+   "hv"   -- The guest can profile itself and, if it is
+ privileged (e.g. dom0), the hypervisor
+   "all" --  The guest can profile itself, the hypervisor
+ and all other guests. Only available to
+ privileged guests.
+
+What:   /sys/hypervisor/pmu/pmu_features
+Date:   August 2015
+KernelVersion:  4.3
+Contact:Boris Ostrovsky 
+Description:
+   Describes Xen PMU features (as an integer). A set bit indicates
+   that the corresponding feature is enabled. See
+   include/xen/interface/xenpmu.h for available features
+
+What:  /sys/hypervisor/properties/capabilities
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Space separated list of supported guest system types. Each type
+   is in the format: -.-
+   With:
+   : "xen" -- x86: paravirtualized, arm: standard
+"hvm" -- x86 only: full virtualized
+   : major guest interface version
+   : minor guest interface version
+   :  architecture, e.g.:
+"x86_32": 32 bit x86 guest without PAE
+"x86_32p": 32 bit x86 guest with PAE
+"x86_64": 64 bit x86 guest
+"armv7l": 32 bit arm guest
+"aarch64": 64 bit arm guest
+
+What:  /sys/hypervisor/properties/changeset
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Changeset of the hypervisor (git commit)
+
+What:  /sys/hypervisor/properties/features
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Features the Xen hypervisor supports for the guest as defined
+   in include/xen/interface/features.h printed as a hex value.
+
+What:  /sys/hypervisor/properties/pagesize
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Default page size of the hypervisor printed as a hex value.
+
+What:  /sys/hypervisor/properties/virtual_start
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Virtual address of the hypervisor as a hex value.
+
+What:  /sys/hypervisor/type
+Date:  March 2009
+KernelVersion: 2.6.30
+Contact:   xen-de...@lists.xenproject.org
+Description:
+   Type of hypervisor:
+   "xen": Xen hypervisor
+
+What:  

[Xen-devel] [RFC PATCH] xen: get rid of paravirt op adjust_exception_frame

2017-05-22 Thread Juergen Gross
When running as Xen pv-guest the exception frame on the stack contains
%r11 and %rcx additional to the other data pushed by the processor.

Instead of having a paravirt op being called for each exception type
prepend the Xen specific code to each exception entry. When running as
Xen pv-guest just use the exception entry with prepended instructions,
otherwise use the entry without the Xen specific code.

Signed-off-by: Juergen Gross 
---
I'm sure there are several comments regarding how to hide e.g.
pv_idt_prologue or where to put the definition of the Xen specific
prologue (now just defined in arch/x86/entry/calling.h).

This is the main reason for making this patch RFC right now.

I have tested the kernel to boot successfully native and as Xen dom0
(pv, of course).
---
 arch/x86/entry/calling.h  |  6 ++
 arch/x86/entry/entry_64.S | 17 ++---
 arch/x86/entry/entry_64_compat.S  |  3 +--
 arch/x86/include/asm/desc.h   |  7 +++
 arch/x86/include/asm/paravirt.h   |  5 -
 arch/x86/include/asm/paravirt_types.h |  4 
 arch/x86/kernel/asm-offsets_64.c  |  1 -
 arch/x86/kernel/paravirt.c|  3 ---
 arch/x86/xen/enlighten_pv.c   |  8 ++--
 arch/x86/xen/irq.c|  3 ---
 arch/x86/xen/setup.c  |  3 ++-
 arch/x86/xen/smp_pv.c |  2 +-
 arch/x86/xen/xen-asm_64.S |  6 --
 arch/x86/xen/xen-ops.h|  2 +-
 14 files changed, 26 insertions(+), 44 deletions(-)

diff --git a/arch/x86/entry/calling.h b/arch/x86/entry/calling.h
index 05ed3d393da7..8b315ee49c93 100644
--- a/arch/x86/entry/calling.h
+++ b/arch/x86/entry/calling.h
@@ -227,3 +227,9 @@ For 32-bit we have the following conventions - kernel is 
built with
 .Lafter_call_\@:
 #endif
 .endm
+
+#ifdef CONFIG_XEN_PV
+#define PV_ENTRY(sym) ENTRY(_xen_##sym); pop %rcx; pop %r11; .globl sym; sym:
+#else
+#define PV_ENTRY(sym) ENTRY(sym)
+#endif
diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S
index 607d72c4a485..780ea67fb7ea 100644
--- a/arch/x86/entry/entry_64.S
+++ b/arch/x86/entry/entry_64.S
@@ -740,14 +740,13 @@ apicinterrupt IRQ_WORK_VECTOR 
irq_work_interrupt  smp_irq_work_interrupt
 #define CPU_TSS_IST(x) PER_CPU_VAR(cpu_tss) + (TSS_ist + ((x) - 1) * 8)
 
 .macro idtentry sym do_sym has_error_code:req paranoid=0 shift_ist=-1
-ENTRY(\sym)
+PV_ENTRY(\sym)
/* Sanity check */
.if \shift_ist != -1 && \paranoid == 0
.error "using shift_ist requires paranoid=1"
.endif
 
ASM_CLAC
-   PARAVIRT_ADJUST_EXCEPTION_FRAME
 
.ifeq \has_error_code
pushq   $-1 /* ORIG_RAX: no syscall to 
restart */
@@ -1161,19 +1160,7 @@ ENTRY(error_exit)
 END(error_exit)
 
 /* Runs on exception stack */
-ENTRY(nmi)
-   /*
-* Fix up the exception frame if we're on Xen.
-* PARAVIRT_ADJUST_EXCEPTION_FRAME is guaranteed to push at most
-* one value to the stack on native, so it may clobber the rdx
-* scratch slot, but it won't clobber any of the important
-* slots past it.
-*
-* Xen is a different story, because the Xen frame itself overlaps
-* the "NMI executing" variable.
-*/
-   PARAVIRT_ADJUST_EXCEPTION_FRAME
-
+PV_ENTRY(nmi)
/*
 * We allow breakpoints in NMIs. If a breakpoint occurs, then
 * the iretq it performs will take us out of NMI context.
diff --git a/arch/x86/entry/entry_64_compat.S b/arch/x86/entry/entry_64_compat.S
index e1721dafbcb1..9fd8c8f6004e 100644
--- a/arch/x86/entry/entry_64_compat.S
+++ b/arch/x86/entry/entry_64_compat.S
@@ -290,11 +290,10 @@ END(entry_SYSCALL_compat)
  * edi  arg5
  * ebp  arg6
  */
-ENTRY(entry_INT80_compat)
+PV_ENTRY(entry_INT80_compat)
/*
 * Interrupts are off on entry.
 */
-   PARAVIRT_ADJUST_EXCEPTION_FRAME
ASM_CLAC/* Do this early to minimize exposure */
SWAPGS
 
diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index d0a21b12dd58..ff19be44877a 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -85,9 +85,16 @@ static inline phys_addr_t get_cpu_gdt_paddr(unsigned int cpu)
 
 #ifdef CONFIG_X86_64
 
+#ifdef CONFIG_XEN_PV
+extern unsigned int pv_idt_prologue;
+#else
+#define pv_idt_prologue 0
+#endif
+
 static inline void pack_gate(gate_desc *gate, unsigned type, unsigned long 
func,
 unsigned dpl, unsigned ist, unsigned seg)
 {
+   func -= pv_idt_prologue;
gate->offset_low= PTR_LOW(func);
gate->segment   = __KERNEL_CS;
gate->ist   = ist;
diff --git a/arch/x86/include/asm/paravirt.h b/arch/x86/include/asm/paravirt.h
index 55fa56fe4e45..e0da3d7d6609 100644
--- a/arch/x86/include/asm/paravirt.h
+++ b/arch/x8

Re: [Xen-devel] [PATCH 2/2] xen: add sysfs node for guest type

2017-05-22 Thread Juergen Gross
On 22/05/17 15:33, Andrew Cooper wrote:
> On 22/05/17 09:57, Juergen Gross wrote:
>> Currently there is no reliable user interface inside a Xen guest to
>> determine its type (e.g. HVM, PV or PVH). Instead of letting user mode
>> try to determine this by various rather hacky mechanisms (parsing of
>> boot messages before they are gone, trying to make use of known subtle
>> differences in behavior of some instructions), add a sysfs node
>> /sys/hypervisor/guest_type to explicitly deliver this information as
>> it is known to the kernel.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  Documentation/ABI/testing/sysfs-hypervisor | 13 +
>>  arch/arm/xen/enlighten.c   |  3 +++
>>  arch/x86/xen/enlighten.c   |  3 +++
>>  arch/x86/xen/enlighten_hvm.c   |  6 --
>>  arch/x86/xen/enlighten_pv.c|  1 +
>>  drivers/xen/sys-hypervisor.c   | 17 +
>>  include/xen/xen.h  |  2 ++
>>  7 files changed, 43 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-hypervisor 
>> b/Documentation/ABI/testing/sysfs-hypervisor
>> index 443196f0aa1c..06850f74ebd4 100644
>> --- a/Documentation/ABI/testing/sysfs-hypervisor
>> +++ b/Documentation/ABI/testing/sysfs-hypervisor
>> @@ -19,6 +19,19 @@ Contact:  xen-de...@lists.xenproject.org
>>  Description:
>>  Compiler which was used to build the Xen hypervisor
>>  
>> +What:   /sys/hypervisor/guest_type
>> +Date:   May 2017
>> +KernelVersion:  4.12
>> +Contact:xen-de...@lists.xenproject.org
>> +Description:
>> +Type of guest:
>> +"native": standard guest type on arm
>> +"HVM": fully virtualized guest (x86)
>> +"PV": paravirtualized guest (x86)
>> +"PVH": fully virtualized guest without legacy emulation (x86)
>> +"PVHVM": fully virtualized guest using paravirtualized
>> +interfaces (e.g. interrupts, timers) (x86)
> 
> I'm not sure this is wise split.  PVHVM is a spectrum which changes
> dynamically, especially in the presence of hardware APIC support.
> 
> I'd suggest guest type being straight PV or HVM (being the container
> type), and a list of items (interrupts, timers, legacy emulation) which
> are either using paravirt or native interfaces, or are not used at all.

Dropping PVHVM from this list is okay, but I'd like to keep PVH. Even if
technically it _is_ HVM without legacy emulation, for the user it is
more some kind of a guest type than just an attribute.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] doc, xen: document hypervisor sysfs nodes for xen

2017-05-22 Thread Juergen Gross
On 22/05/17 15:30, Boris Ostrovsky wrote:
> On 05/22/2017 04:56 AM, Juergen Gross wrote:
>> Today only a few sysfs nodes under /sys/hypervisor/ are documented
>> for Xen in Documentation/ABI/testing/sysfs-hypervisor-pmu. Rename
>> this file to Documentation/ABI/testing/sysfs-hypervisor and add
>> descriptions of the other nodes.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  Documentation/ABI/testing/sysfs-hypervisor | 131 
>> +
>>  Documentation/ABI/testing/sysfs-hypervisor-pmu |  23 -
>>  MAINTAINERS|   1 +
>>  3 files changed, 132 insertions(+), 23 deletions(-)
>>  create mode 100644 Documentation/ABI/testing/sysfs-hypervisor
>>  delete mode 100644 Documentation/ABI/testing/sysfs-hypervisor-pmu
>>
>> diff --git a/Documentation/ABI/testing/sysfs-hypervisor 
>> b/Documentation/ABI/testing/sysfs-hypervisor
> 
> I wonder whether at least some of the non-pmu entries should by now be
> considered stable.

Hmm, do you think the pmu entries are not?

I could:

a) move sysfs-hypervisor as posted here to stable
b) leave the pmu entries in testing and just add another doc for
   the non-pmu entries in stable
c) do some split of the non-pmu entries (which to put where?)
d) or let it all in testing

Next question then: where to put the new guest_type of patch 2?


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] xen: add sysfs node for guest type

2017-05-22 Thread Juergen Gross
On 22/05/17 15:45, Jan Beulich wrote:
 On 22.05.17 at 10:57,  wrote:
>> --- a/include/xen/xen.h
>> +++ b/include/xen/xen.h
>> @@ -9,8 +9,10 @@ enum xen_domain_type {
>>  
>>  #ifdef CONFIG_XEN
>>  extern enum xen_domain_type xen_domain_type;
>> +extern char *xen_guest_type;
> 
> const char * ?

Yes.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] xen: add sysfs node for guest type

2017-05-22 Thread Juergen Gross
On 22/05/17 15:17, Boris Ostrovsky wrote:
> 
>> diff --git a/Documentation/ABI/testing/sysfs-hypervisor 
>> b/Documentation/ABI/testing/sysfs-hypervisor
>> index 443196f0aa1c..06850f74ebd4 100644
>> --- a/Documentation/ABI/testing/sysfs-hypervisor
>> +++ b/Documentation/ABI/testing/sysfs-hypervisor
>> @@ -19,6 +19,19 @@ Contact:  xen-de...@lists.xenproject.org
>>  Description:
>>  Compiler which was used to build the Xen hypervisor
>>  
>> +What:   /sys/hypervisor/guest_type
>> +Date:   May 2017
>> +KernelVersion:  4.12
>> +Contact:xen-de...@lists.xenproject.org
>> +Description:
>> +Type of guest:
>> +"native": standard guest type on arm
>> +"HVM": fully virtualized guest (x86)
> 
> Can we ever get this? Xen sysfs node won't load unless we are
> !XEN_NATIVE and then we are either PVH or PVHVM.

When we remove PVHVM as suggested by Andrew, HVM will be kept. :-)


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-next 1/2] xen/x86/alternatives: Do not use sync_core() to serialize I$

2017-05-22 Thread Juergen Gross
On 22/05/17 16:17, Jan Beulich wrote:
 On 22.05.17 at 15:51,  wrote:
>> On 22/05/17 14:38, Jan Beulich wrote:
>> On 19.05.17 at 20:49,  wrote:
 We use sync_core() in the alternatives code to stop speculative
 execution of prefetched instructions because we are potentially changing
 them and don't want to execute stale bytes.

 What it does on most machines is call CPUID which is a serializing
 instruction. And that's expensive.

 However, the instruction cache is serialized when we're on the local CPU
 and are changing the data through the same virtual address.
>>> Do you have the background of this "same virtual address"
>>> constraint?
>>
>> There was a long LKML thread on the subject. 
>> https://lkml.org/lkml/2016/12/3/108 
> 
> Well, interesting reading (and at least part of it was Cc-ed to
> xen-devel iirc), but none of it nor ...
> 
>>> Caches are physically indexed, so I don't see the
>>> connection. Yet if there is one, our stub generation in the
>>> emulator may have an issue.
>>
>> I think https://lkml.org/lkml/2016/12/2/454 is probably the relevant
>> statement.
> 
> ... this one doesn't give any background at all of why the
> virtual address would matter here. Searching the SDM I also can't
> find any statement as to virtual or physical indexing being used
> for any of the caches.

SDM Vol. 3 chapter 11.6 (self modifying code):

A write to a memory location in a code segment that is currently cached
in the processor causes the associated
cache line (or lines) to be invalidated. This check is based on the
physical address of the instruction. In addition,
the P6 family and Pentium processors check whether a write to a code
segment may modify an instruction that has
been prefetched for execution. If the write affects a prefetched
instruction, the prefetch queue is invalidated. This
latter check is based on the linear address of the instruction.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] xen: add sysfs node for guest type

2017-05-22 Thread Juergen Gross
On 22/05/17 17:23, Konrad Rzeszutek Wilk wrote:
> On Mon, May 22, 2017 at 10:57:00AM +0200, Juergen Gross wrote:
>> Currently there is no reliable user interface inside a Xen guest to
>> determine its type (e.g. HVM, PV or PVH). Instead of letting user mode
>> try to determine this by various rather hacky mechanisms (parsing of
>> boot messages before they are gone, trying to make use of known subtle
>> differences in behavior of some instructions), add a sysfs node
>> /sys/hypervisor/guest_type to explicitly deliver this information as
>> it is known to the kernel.
>>
>> Signed-off-by: Juergen Gross 
>> ---
>>  Documentation/ABI/testing/sysfs-hypervisor | 13 +
> 
> You forgot to CC Greg KH who is the maintainer of that file.

Hmm, right.

>>  arch/arm/xen/enlighten.c   |  3 +++
>>  arch/x86/xen/enlighten.c   |  3 +++
>>  arch/x86/xen/enlighten_hvm.c   |  6 --
>>  arch/x86/xen/enlighten_pv.c|  1 +
>>  drivers/xen/sys-hypervisor.c   | 17 +
>>  include/xen/xen.h  |  2 ++
>>  7 files changed, 43 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/ABI/testing/sysfs-hypervisor 
>> b/Documentation/ABI/testing/sysfs-hypervisor
>> index 443196f0aa1c..06850f74ebd4 100644
>> --- a/Documentation/ABI/testing/sysfs-hypervisor
>> +++ b/Documentation/ABI/testing/sysfs-hypervisor
>> @@ -19,6 +19,19 @@ Contact:  xen-de...@lists.xenproject.org
>>  Description:
>>  Compiler which was used to build the Xen hypervisor
>>  
>> +What:   /sys/hypervisor/guest_type
>> +Date:   May 2017
>> +KernelVersion:  4.12
>> +Contact:xen-de...@lists.xenproject.org
>> +Description:
>> +Type of guest:
>> +"native": standard guest type on arm
>> +"HVM": fully virtualized guest (x86)
>> +"PV": paravirtualized guest (x86)
>> +"PVH": fully virtualized guest without legacy emulation (x86)
>> +"PVHVM": fully virtualized guest using paravirtualized
>> +interfaces (e.g. interrupts, timers) (x86)
> 
> What about KVM? Shouldn't that also be here?

/sys/hypervisor is Xen-only (at least up to now).

> And what should be there if say you boot without Xen, what is the correct 
> value
> on x86 (above 'native' says arm)?

The node isn't existing.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/2] xen: add sysfs node for guest type

2017-05-22 Thread Juergen Gross
On 22/05/17 17:37, Jan Beulich wrote:
>>>> On 22.05.17 at 17:28,  wrote:
>> On 22/05/17 17:23, Konrad Rzeszutek Wilk wrote:
>>> On Mon, May 22, 2017 at 10:57:00AM +0200, Juergen Gross wrote:
>>>> --- a/Documentation/ABI/testing/sysfs-hypervisor
>>>> +++ b/Documentation/ABI/testing/sysfs-hypervisor
>>>> @@ -19,6 +19,19 @@ Contact:xen-de...@lists.xenproject.org 
>>>>  Description:
>>>>Compiler which was used to build the Xen hypervisor
>>>>  
>>>> +What: /sys/hypervisor/guest_type
>>>> +Date: May 2017
>>>> +KernelVersion:4.12
>>>> +Contact:  xen-de...@lists.xenproject.org 
>>>> +Description:
>>>> +  Type of guest:
>>>> +  "native": standard guest type on arm
>>>> +  "HVM": fully virtualized guest (x86)
>>>> +  "PV": paravirtualized guest (x86)
>>>> +  "PVH": fully virtualized guest without legacy emulation (x86)
>>>> +  "PVHVM": fully virtualized guest using paravirtualized
>>>> +  interfaces (e.g. interrupts, timers) (x86)
>>>
>>> What about KVM? Shouldn't that also be here?
>>
>> /sys/hypervisor is Xen-only (at least up to now).
> 
> How that? It's being created by drivers/base/hypervisor.c,
> and iirc had been introduced for s390 originally.

Hmm, cscope fooled me. Maybe it should build the data base for all
architectures...

Thanks for the note!


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 00/10] paravirt: make amount of paravirtualization configurable

2017-05-22 Thread Juergen Gross
On 22/05/17 21:42, Boris Ostrovsky wrote:
> 
>>  49 files changed, 1548 insertions(+), 1477 deletions(-)
>>  create mode 100644 arch/x86/include/asm/paravirt_full.h
>>  create mode 100644 arch/x86/include/asm/paravirt_types_full.h
>>  create mode 100644 arch/x86/kernel/paravirt_full.c
> 
> 
> Do you have this in a tree that can be pulled?

https://github.com/jgross1/linux pvops


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 6/8] xen/pvh: Initialize grant table for PVH guests

2016-10-18 Thread Juergen Gross
On 18/10/16 18:40, Boris Ostrovsky wrote:
> On 10/18/2016 12:08 PM, Juergen Gross wrote:
>> On 14/10/16 22:02, Boris Ostrovsky wrote:
>>> On 10/14/2016 03:51 PM, Konrad Rzeszutek Wilk wrote:
>>>> On Fri, Oct 14, 2016 at 03:43:19PM -0400, Boris Ostrovsky wrote:
>>>>> On 10/14/2016 03:19 PM, Konrad Rzeszutek Wilk wrote:
>>>>>> On Fri, Oct 14, 2016 at 02:05:16PM -0400, Boris Ostrovsky wrote:
>>>>>>
>>>>>> Perhaps add in here:
>>>>>>
>>>>>> PVH is like PV in that there are no PCI devices - which HVM
>>>>>> code would piggyback on to find the Xen PCI platform device and
>>>>>> use its MMIO space to stash the grants in.
>>>>>>
>>>>>> For PVH we balloon out memory and stash the grants in there.
>>>>>>
>>>>>> (Which begs the next question - where and when do we balloon out the
>>>>>> normal memory back in?)
>>>>> Are you saying that we should get back memory that we gave to grant 
>>>>> tables?
>>>> Yes.
>>>>
>>>> In pure HVM that area is MMIO - which hvmloader has balloonned out.
>>>>
>>>> The hvmloader then balloons that number of pages back  at the end of
>>>> guest memory (after 4GB).
>>> We don't do this for PV though, do we?
>> Uuh, kind of. We try to allocate granted pages from the ballooned area.
>> See gnttab_alloc_pages().
> 
> 
> I meant that we don't give memory back for PV.

That's right AFAIK.

>> So for PV(H) we don't need to balloon this memory back in as it was
>> never shadowed by a grant.
> 
> 
> Is it *never* or *may or may not be* shadowed? (I assume "shadowed"
> means "used for" here.)

"shadowed" means a pte is being used for a granted page which was
referencing a RAM page before. So the RAM page is unusable as long as
the grant is active.

A page is shadowed by a grant only if there is no ballooning space
available, so ballooning that page out would serve no purpose as we
would have no way to balloon it in at another address.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-usb.c/usbif.h: fix ARM64 build issue

2016-10-19 Thread Juergen Gross
On 19/10/16 10:50, Wei Liu wrote:
> On Tue, Oct 18, 2016 at 12:09:55PM -0700, Stefano Stabellini wrote:
>> Hi all,
>>
>> While I was investigating the recent libxl ARM64 build issue, I found
>> another ARM64 build problem. This one is in QEMU:
>>
>>
>>   CChw/usb/xen-usb.o
>> hw/usb/xen-usb.c: In function ‘usbback_gnttab_map’:
>> hw/usb/xen-usb.c:163:56: error: ‘PAGE_SIZE’ undeclared (first use in this 
>> function)
>>  (unsigned)usbback_req->req.seg[i].length > PAGE_SIZE) {
>> ^
>> hw/usb/xen-usb.c:163:56: note: each undeclared identifier is reported only 
>> once for each function it appears in
>> In file included from hw/usb/xen-usb.c:36:0:
>> hw/usb/xen-usb.c: In function ‘usbback_alloc’:
>> /users/stefanos/xen/tools/../tools/include/xen/io/usbif.h:229:56: error: 
>> ‘PAGE_SIZE’ undeclared (first use in this function)
>>  #define USB_URB_RING_SIZE __CONST_RING_SIZE(usbif_urb, PAGE_SIZE)
>> ^
>> /users/stefanos/xen/tools/../tools/include/xen/io/ring.h:45:23: note: in 
>> definition of macro ‘__RD32’
>>  #define __RD32(_x) (((_x) & 0x) ? __RD16((_x)>>16)<<16 : __RD16(_x))
>>^
>> /users/stefanos/xen/tools/../tools/include/xen/io/usbif.h:229:27: note: in 
>> expansion of macro ‘__CONST_RING_SIZE’
>>  #define USB_URB_RING_SIZE __CONST_RING_SIZE(usbif_urb, PAGE_SIZE)
>>^
>> hw/usb/xen-usb.c:1029:51: note: in expansion of macro ‘USB_URB_RING_SIZE’
>>  max_grants = USBIF_MAX_SEGMENTS_PER_REQUEST * USB_URB_RING_SIZE + 2;
>>^
>> make: *** [hw/usb/xen-usb.o] Error 1
>>
>>
>> It is caused by PAGE_SIZE being utilized in
>> xen/include/public/io/usbif.h, but undefined on ARM64 (in userspace).
>> QEMU includes xen/include/public/io/usbif.h directly, therefore
>> hw/usb/xen-usb.c doesn't build.
>>
>> I don't think we should be using "PAGE_SIZE" in public/io/usbif.h, but
>> that file is present in too many Xen releases to ignore now. So I am
>> going to work around the issue in QEMU.
>>
> 
> Actually it was introduced in 4.7.
> 
>> ---
>>
>> xen-usb.c does not build on ARM64 because it includes xen/io/usbif.h
>> which uses PAGE_SIZE without defining it. The header file has been
>> shipped with too many Xen releases to be able to solve the problem at
>> the source.
>>
>> This patch define PAGE_SIZE as XC_PAGE_SIZE when undefined.
>>
>> Signed-off-by: Stefano Stabellini 
>>
>> diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
>> index 174d715..ca9df87 100644
>> --- a/hw/usb/xen-usb.c
>> +++ b/hw/usb/xen-usb.c
>> @@ -34,6 +34,11 @@
>>  #include "qapi/qmp/qstring.h"
>>  
>>  #include 
>> +
>> +/* xen/io/usbif.h references PAGE_SIZE without defining it */
>> +#ifndef PAGE_SIZE
>> +#define PAGE_SIZE XC_PAGE_SIZE
>> +#endif
>>  #include 
> 
> 
> Can we just change the macro to
> 
>   #define USB_CONN_RING_SIZE(pg_size) __CONST_RING_SIZE(usbif_conn, (pg_size)
> 
> ?

Not easily. On x86 qemu builds just fine and modifying the macro to take
a parameter would break the build.

We could do something like:

#define USB_CONN_RING_SZ(pgsz) __CONST_RING_SIZE(usbif_conn, (pgsz)
#ifdef PAGE_SIZE
#define USB_CONN_RING_SIZE USB_CONN_RING_SZ(PAGE_SIZE)
#endif

And then modify qemu to use USB_CONN_RING_SZ()


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] xen-usb.c/usbif.h: fix ARM64 build issue

2016-10-19 Thread Juergen Gross
On 19/10/16 11:22, Wei Liu wrote:
> On Wed, Oct 19, 2016 at 11:09:21AM +0200, Juergen Gross wrote:
>> On 19/10/16 10:50, Wei Liu wrote:
>>> On Tue, Oct 18, 2016 at 12:09:55PM -0700, Stefano Stabellini wrote:
>>>> Hi all,
>>>>
>>>> While I was investigating the recent libxl ARM64 build issue, I found
>>>> another ARM64 build problem. This one is in QEMU:
>>>>
>>>>
>>>>   CChw/usb/xen-usb.o
>>>> hw/usb/xen-usb.c: In function ‘usbback_gnttab_map’:
>>>> hw/usb/xen-usb.c:163:56: error: ‘PAGE_SIZE’ undeclared (first use in this 
>>>> function)
>>>>  (unsigned)usbback_req->req.seg[i].length > PAGE_SIZE) {
>>>> ^
>>>> hw/usb/xen-usb.c:163:56: note: each undeclared identifier is reported only 
>>>> once for each function it appears in
>>>> In file included from hw/usb/xen-usb.c:36:0:
>>>> hw/usb/xen-usb.c: In function ‘usbback_alloc’:
>>>> /users/stefanos/xen/tools/../tools/include/xen/io/usbif.h:229:56: error: 
>>>> ‘PAGE_SIZE’ undeclared (first use in this function)
>>>>  #define USB_URB_RING_SIZE __CONST_RING_SIZE(usbif_urb, PAGE_SIZE)
>>>> ^
>>>> /users/stefanos/xen/tools/../tools/include/xen/io/ring.h:45:23: note: in 
>>>> definition of macro ‘__RD32’
>>>>  #define __RD32(_x) (((_x) & 0x) ? __RD16((_x)>>16)<<16 : 
>>>> __RD16(_x))
>>>>^
>>>> /users/stefanos/xen/tools/../tools/include/xen/io/usbif.h:229:27: note: in 
>>>> expansion of macro ‘__CONST_RING_SIZE’
>>>>  #define USB_URB_RING_SIZE __CONST_RING_SIZE(usbif_urb, PAGE_SIZE)
>>>>^
>>>> hw/usb/xen-usb.c:1029:51: note: in expansion of macro ‘USB_URB_RING_SIZE’
>>>>  max_grants = USBIF_MAX_SEGMENTS_PER_REQUEST * USB_URB_RING_SIZE + 2;
>>>>^
>>>> make: *** [hw/usb/xen-usb.o] Error 1
>>>>
>>>>
>>>> It is caused by PAGE_SIZE being utilized in
>>>> xen/include/public/io/usbif.h, but undefined on ARM64 (in userspace).
>>>> QEMU includes xen/include/public/io/usbif.h directly, therefore
>>>> hw/usb/xen-usb.c doesn't build.
>>>>
>>>> I don't think we should be using "PAGE_SIZE" in public/io/usbif.h, but
>>>> that file is present in too many Xen releases to ignore now. So I am
>>>> going to work around the issue in QEMU.
>>>>
>>>
>>> Actually it was introduced in 4.7.
>>>
>>>> ---
>>>>
>>>> xen-usb.c does not build on ARM64 because it includes xen/io/usbif.h
>>>> which uses PAGE_SIZE without defining it. The header file has been
>>>> shipped with too many Xen releases to be able to solve the problem at
>>>> the source.
>>>>
>>>> This patch define PAGE_SIZE as XC_PAGE_SIZE when undefined.
>>>>
>>>> Signed-off-by: Stefano Stabellini 
>>>>
>>>> diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
>>>> index 174d715..ca9df87 100644
>>>> --- a/hw/usb/xen-usb.c
>>>> +++ b/hw/usb/xen-usb.c
>>>> @@ -34,6 +34,11 @@
>>>>  #include "qapi/qmp/qstring.h"
>>>>  
>>>>  #include 
>>>> +
>>>> +/* xen/io/usbif.h references PAGE_SIZE without defining it */
>>>> +#ifndef PAGE_SIZE
>>>> +#define PAGE_SIZE XC_PAGE_SIZE
>>>> +#endif
>>>>  #include 
>>>
>>>
>>> Can we just change the macro to
>>>
>>>   #define USB_CONN_RING_SIZE(pg_size) __CONST_RING_SIZE(usbif_conn, 
>>> (pg_size)
>>>
>>> ?
>>
>> Not easily. On x86 qemu builds just fine and modifying the macro to take
>> a parameter would break the build.
>>
>> We could do something like:
>>
>> #define USB_CONN_RING_SZ(pgsz) __CONST_RING_SIZE(usbif_conn, (pgsz)
>> #ifdef PAGE_SIZE
>> #define USB_CONN_RING_SIZE USB_CONN_RING_SZ(PAGE_SIZE)
>> #endif
>>
>> And then modify qemu to use USB_CONN_RING_SZ()
>>
> 
> This would also be problematic I'm afraid.  One's build could break
> randomly depending on the order of header files inclusion. I think it
> would even be harder to debug such bug than the existing one.
> 
> Inspired by Andrew on IRC. I think we can do:
> 
> /* Backward-compatible macro, use _SIZE2 variant in new code instead */
> #define USB_CONN_RING_SIZE __CONST_RING_SIZE(usbif_conn, 4096)
> 
> #define USB_CONN_RING_SIZE2(pgsize) __CONST_RING_SIZE(usbif_conn, (pgsize))
> 
> The assumption is that all existing implementations of the said protocol
> use 4k page size. Juergen, do you think this would work?

In theory, yes. I'd still prefer using PAGE_SIZE if it is defined.

What about:

#ifdef PAGE_SIZE
#define USB_RING_PAGE_SIZE PAGE_SIZE
#else
#define USB_RING_PAGE_SIZE 4096
#endif
#define USB_CONN_RING_SIZE __CONST_RING_SIZE(usbif_conn, USB_RING_PAGE_SIZE)
#define USB_CONN_RING_SIZE2(pgsize) __CONST_RING_SIZE(usbif_conn, (pgsize))


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 0/2] Fix PAGE_SIZE issues in public headers

2016-10-19 Thread Juergen Gross
On 19/10/16 12:46, Wei Liu wrote:
> Reference to PAGE_SIZE slipped in to two public header files. QEMU build on
> ARM64 is broken by this.
> 
> This series tries to fix that. The approach is discussed at [0].
> 
> QEMU will need to update its copy of usbif.h.

There is no QEMU copy of Xen headers. This is the reason why an
imcompatible change is not possible. With the patches there is no
longer an urgent need for qemu to be modified.

> It would be good if we can have this fixed either by this series or some other
> method for 4.8 to avoid releasing problematic headers for another release.

Both patches:

Reviewed-by: Juergen Gross 


Thanks,

Juergen

> 
> Wei.
> 
> [0] 
> 
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Wei Liu 
> Cc: Juergen Gross 
> 
> Wei Liu (2):
>   usbif.h: don't require PAGE_SIZE to be present
>   vscsiif.h: don't require PAGE_SIZE to be present
> 
>  xen/include/public/io/usbif.h   | 24 ++--
>  xen/include/public/io/vscsiif.h | 18 +-
>  2 files changed, 39 insertions(+), 3 deletions(-)
> 
> --
> 2.1.4
> 
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v4 0/5] implement vcpu preempted check

2016-10-19 Thread Juergen Gross
On 19/10/16 12:20, Pan Xinhui wrote:
> change from v3:
>   add x86 vcpu preempted check patch
> change from v2:
>   no code change, fix typos, update some comments
> change from v1:
>   a simplier definition of default vcpu_is_preempted
>   skip mahcine type check on ppc, and add config. remove dedicated macro.
>   add one patch to drop overload of rwsem_spin_on_owner and 
> mutex_spin_on_owner. 
>   add more comments
>   thanks boqun and Peter's suggestion.
> 
> This patch set aims to fix lock holder preemption issues.
> 
> test-case:
> perf record -a perf bench sched messaging -g 400 -p && perf report
> 
> 18.09%  sched-messaging  [kernel.vmlinux]  [k] osq_lock
> 12.28%  sched-messaging  [kernel.vmlinux]  [k] rwsem_spin_on_owner
>  5.27%  sched-messaging  [kernel.vmlinux]  [k] mutex_unlock
>  3.89%  sched-messaging  [kernel.vmlinux]  [k] wait_consider_task
>  3.64%  sched-messaging  [kernel.vmlinux]  [k] _raw_write_lock_irq
>  3.41%  sched-messaging  [kernel.vmlinux]  [k] mutex_spin_on_owner.is
>  2.49%  sched-messaging  [kernel.vmlinux]  [k] system_call
> 
> We introduce interface bool vcpu_is_preempted(int cpu) and use it in some spin
> loops of osq_lock, rwsem_spin_on_owner and mutex_spin_on_owner.
> These spin_on_onwer variant also cause rcu stall before we apply this patch 
> set
> 
> We also have observed some performace improvements.
> 
> PPC test result:
> 
> 1 copy - 0.94%
> 2 copy - 7.17%
> 4 copy - 11.9%
> 8 copy -  3.04%
> 16 copy - 15.11%
> 
> details below:
> Without patch:
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks  2188223.0 KBps  (30.0 s, 
> 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks  1804433.0 KBps  (30.0 s, 
> 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks  1237257.0 KBps  (30.0 s, 
> 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks  1032658.0 KBps  (30.0 s, 
> 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks   768000.0 KBps  (30.1 
> s, 1 samples)
> 
> With patch: 
> 
> 1 copy - File Write 4096 bufsize 8000 maxblocks  2209189.0 KBps  (30.0 s, 
> 1 samples)
> 2 copy - File Write 4096 bufsize 8000 maxblocks  1943816.0 KBps  (30.0 s, 
> 1 samples)
> 4 copy - File Write 4096 bufsize 8000 maxblocks  1405591.0 KBps  (30.0 s, 
> 1 samples)
> 8 copy - File Write 4096 bufsize 8000 maxblocks  1065080.0 KBps  (30.0 s, 
> 1 samples)
> 16 copy - File Write 4096 bufsize 8000 maxblocks   904762.0 KBps  (30.0 
> s, 1 samples)
> 
> X86 test result:
>   test-case   after-patch   before-patch
> Execl Throughput   |18307.9 lps  |11701.6 lps 
> File Copy 1024 bufsize 2000 maxblocks  |  1352407.3 KBps |   790418.9 KBps
> File Copy 256 bufsize 500 maxblocks|   367555.6 KBps |   222867.7 KBps
> File Copy 4096 bufsize 8000 maxblocks  |  3675649.7 KBps |  1780614.4 KBps
> Pipe Throughput| 11872208.7 lps  | 11855628.9 lps 
> Pipe-based Context Switching   |  1495126.5 lps  |  1490533.9 lps 
> Process Creation   |29881.2 lps  |28572.8 lps 
> Shell Scripts (1 concurrent)   |23224.3 lpm  |22607.4 lpm 
> Shell Scripts (8 concurrent)   | 3531.4 lpm  | 3211.9 lpm 
> System Call Overhead   | 10385653.0 lps  | 10419979.0 lps 
> 
> Pan Xinhui (5):
>   kernel/sched: introduce vcpu preempted check interface
>   locking/osq: Drop the overload of osq_lock()
>   kernel/locking: Drop the overload of {mutex,rwsem}_spin_on_owner
>   powerpc/spinlock: support vcpu preempted check
>   x86, kvm: support vcpu preempted check

The attached patch adds Xen support for x86. Please tell me whether you
want to add this patch to your series or if I should post it when your
series has been accepted.

You can add my

Tested-by: Juergen Gross 

for patches 1-3 and 5 (paravirt parts only).


Juergen

> 
>  arch/powerpc/include/asm/spinlock.h   |  8 
>  arch/x86/include/asm/paravirt_types.h |  6 ++
>  arch/x86/include/asm/spinlock.h   |  8 
>  arch/x86/include/uapi/asm/kvm_para.h  |  3 ++-
>  arch/x86/kernel/kvm.c | 11 +++
>  arch/x86/kernel/paravirt.c| 11 +++
>  arch/x86/kvm/x86.c| 12 
>  include/linux/sched.h | 12 
>  kernel/locking/mutex.c    | 15 +--
>  kernel/locking/osq_lock.c | 10 +-
>  kernel/locking/rwsem-xadd.c   | 16 +---
>  11 files changed, 105 insertions(+), 7 deletions(-)
> 

>From c79b86d00a812d6207ef788d453e2d0289ef22a0 Mon Sep 

Re: [Xen-devel] [PATCH v2 0/2] Remove PAGE_SIZE from public headers

2016-10-19 Thread Juergen Gross
On 19/10/16 21:21, stef...@aporeto.com wrote:
> Reference to PAGE_SIZE slipped in to two public header files. QEMU build on
> ARM64 is broken by this. PAGE_SIZE should not be used because it could
> be undefined or it could be defined differently on different operating
> systems.
> 
> Stefano Stabellini (2):
>   usbif.h: replace PAGE_SIZE with USBIF_RING_SIZE
>   vscsiif.h: replace PAGE_SIZE with VSCSIIF_PAGE_SIZE
> 
>  xen/include/public/io/usbif.h   | 5 +++--
>  xen/include/public/io/vscsiif.h | 3 ++-
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 

In principle I'm okay with these changes. But wouldn't it be
better to add

#define XEN_RING_PAGE_SIZE 4096

to xen/include/public/io/ring.h instead of having each interface
file its own definition?


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH RFC 0/2] Fix PAGE_SIZE issues in public headers

2016-10-19 Thread Juergen Gross
On 19/10/16 21:20, Stefano Stabellini wrote:
> On Wed, 19 Oct 2016, Stefano Stabellini wrote:
>> On Wed, 19 Oct 2016, Juergen Gross wrote:
>>> On 19/10/16 12:46, Wei Liu wrote:
>>>> Reference to PAGE_SIZE slipped in to two public header files. QEMU build on
>>>> ARM64 is broken by this.
>>>>
>>>> This series tries to fix that. The approach is discussed at [0].
>>>>
>>>> QEMU will need to update its copy of usbif.h.
>>>
>>> There is no QEMU copy of Xen headers. This is the reason why an
>>> imcompatible change is not possible. With the patches there is no
>>> longer an urgent need for qemu to be modified.
>>
>> Yes, as long as we backport the fixes to the Xen stable trees as soon as
>> possible.
> 
> I spoke too soon. Actually QEMU xen-usb.c references PAGE_SIZE
> directly, which also needs to be fixed. In this case, I think I would
> use XC_PAGE_SIZE, because we are already using it in that file and
> because this way we don't tie the fix with the public headers fixes in
> Xen.
> 
> ---
> 
> xen-usb: do not reference PAGE_SIZE
> 
> PAGE_SIZE is undefined on ARM64. Use XC_PAGE_SIZE instead, which is
> always 4096 even when page granularity is 64K.
> 
> For this to actually work with 64K pages, more changes are required.
> 
> Signed-off-by: Stefano Stabellini 

Reviewed-by: Juergen Gross 

> 
> diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
> index 174d715..de2ebd6 100644
> --- a/hw/usb/xen-usb.c
> +++ b/hw/usb/xen-usb.c
> @@ -160,7 +160,7 @@ static int usbback_gnttab_map(struct usbback_req 
> *usbback_req)
>  
>  for (i = 0; i < nr_segs; i++) {
>  if ((unsigned)usbback_req->req.seg[i].offset +
> -(unsigned)usbback_req->req.seg[i].length > PAGE_SIZE) {
> +(unsigned)usbback_req->req.seg[i].length > XC_PAGE_SIZE) {
>  xen_be_printf(xendev, 0, "segment crosses page boundary\n");
>  return -EINVAL;
>  }
> @@ -183,7 +183,7 @@ static int usbback_gnttab_map(struct usbback_req 
> *usbback_req)
>  
>  for (i = 0; i < usbback_req->nr_buffer_segs; i++) {
>  seg = usbback_req->req.seg + i;
> -addr = usbback_req->buffer + i * PAGE_SIZE + seg->offset;
> +addr = usbback_req->buffer + i * XC_PAGE_SIZE + seg->offset;
>  qemu_iovec_add(&usbback_req->packet.iov, addr, seg->length);
>  }
>  }
> 
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v5 7/9] x86, xen: support vcpu preempted check

2016-10-20 Thread Juergen Gross
Corrected xen-devel mailing list address, added other Xen maintainers

On 20/10/16 23:27, Pan Xinhui wrote:
> From: Juergen Gross 
> 
> Support the vcpu_is_preempted() functionality under Xen. This will
> enhance lock performance on overcommitted hosts (more runnable vcpus
> than physical cpus in the system) as doing busy waits for preempted
> vcpus will hurt system performance far worse than early yielding.
> 
> A quick test (4 vcpus on 1 physical cpu doing a parallel build job
> with "make -j 8") reduced system time by about 5% with this patch.
> 
> Signed-off-by: Juergen Gross 
> Signed-off-by: Pan Xinhui 
> ---
>  arch/x86/xen/spinlock.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c
> index 3d6e006..74756bb 100644
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -114,7 +114,6 @@ void xen_uninit_lock_cpu(int cpu)
>   per_cpu(irq_name, cpu) = NULL;
>  }
>  
> -
>  /*
>   * Our init of PV spinlocks is split in two init functions due to us
>   * using paravirt patching and jump labels patching and having to do
> @@ -137,6 +136,8 @@ void __init xen_init_spinlocks(void)
>   pv_lock_ops.queued_spin_unlock = 
> PV_CALLEE_SAVE(__pv_queued_spin_unlock);
>   pv_lock_ops.wait = xen_qlock_wait;
>   pv_lock_ops.kick = xen_qlock_kick;
> +
> + pv_lock_ops.vcpu_is_preempted = xen_vcpu_stolen;
>  }
>  
>  /*
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] libxl: avoid considering pCPUs outside of the cpupool during NUMA placement

2016-10-21 Thread Juergen Gross
On 21/10/16 12:29, Wei Liu wrote:
> On Fri, Oct 21, 2016 at 11:56:14AM +0200, Dario Faggioli wrote:
>> During NUMA automatic placement, the information
>> of how many vCPUs can run on what NUMA nodes is used,
>> in order to spread the load as evenly as possible.
>>
>> Such information is derived from vCPU hard and soft
>> affinity, but that is not enough. In fact, affinity
>> can be set to be a superset of the pCPUs that belongs
>> to the cpupool in which a domain is but, of course,
>> the domain will never run on pCPUs outside of its
>> cpupool.
>>
>> Take this into account in the placement algorithm.
>>
>> Signed-off-by: Dario Faggioli 
>> Reported-by: George Dunlap 
>> ---
>> Cc: Ian Jackson 
>> Cc: Wei Liu 
>> Cc: George Dunlap 
>> Cc: Juergen Gross 
>> Cc: Anshul Makkar 
>> ---
>> Wei, this is bugfix, so I think it should go in 4.8.
>>
> 
> Yes. I agree.
> 
>> Ian, this is bugfix, so I think it is a backporting candidate.
>>
>> Also, note that this function does not respect the libxl coding style, as far
>> as error handling is concerned. However, given that I'm asking for it to go 
>> in
>> now and to be backported, I've tried to keep the changes to the minimum.
>>
>> I'm up for a follow up patch for 4.9 to make the style compliant.
>>
>> Thanks, Dario
>> ---
>>  tools/libxl/libxl_numa.c |   25 ++---
>>  1 file changed, 22 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
>> index 33289d5..f2a719d 100644
>> --- a/tools/libxl/libxl_numa.c
>> +++ b/tools/libxl/libxl_numa.c
>> @@ -186,9 +186,12 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
>> libxl_cputopology *tinfo,
>>  {
>>  libxl_dominfo *dinfo = NULL;
>>  libxl_bitmap dom_nodemap, nodes_counted;
>> +libxl_cpupoolinfo cpupool_info;
>>  int nr_doms, nr_cpus;
>>  int i, j, k;
>>  
>> +libxl_cpupoolinfo_init(&cpupool_info);
>> +
> 
> Please move this into the loop below, see (*).

Why? libxl_cpupoolinfo_dispose() will clear cpupool_info.

> 
>>  dinfo = libxl_list_domain(CTX, &nr_doms);
>>  if (dinfo == NULL)
>>  return ERROR_FAIL;
>> @@ -205,12 +208,18 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
>> libxl_cputopology *tinfo,
>>  }
>>  
>>  for (i = 0; i < nr_doms; i++) {
>> -libxl_vcpuinfo *vinfo;
>> -int nr_dom_vcpus;
>> +libxl_vcpuinfo *vinfo = NULL;
> 
> This is not necessary because vinfo is written right away.

No, the first "goto next" is before vinfo is being written.

> 
>> +int cpupool, nr_dom_vcpus;
>> +
> 
> (*) here.
> 
>> +cpupool = libxl__domain_cpupool(gc, dinfo[i].domid);
>> +if (cpupool < 0)
>> +goto next;
>> +if (libxl_cpupool_info(CTX, &cpupool_info, cpupool))
>> +goto next;
>>  
>>  vinfo = libxl_list_vcpu(CTX, dinfo[i].domid, &nr_dom_vcpus, 
>> &nr_cpus);
>>  if (vinfo == NULL)
>> -continue;
>> +goto next;
>>  
>>  /* Retrieve the domain's node-affinity map */
>>  libxl_domain_get_nodeaffinity(CTX, dinfo[i].domid, &dom_nodemap);
>> @@ -220,6 +229,12 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
>> libxl_cputopology *tinfo,
>>   * For each vcpu of each domain, it must have both vcpu-affinity
>>   * and node-affinity to (a pcpu belonging to) a certain node to
>>   * cause an increment in the corresponding element of the array.
>> + *
>> + * Note that we also need to check whether the cpu actually
>> + * belongs to the domain's cpupool (the cpupool of the domain
>> + * being checked). In fact, it could be that the vcpu has 
>> affinity
>> + * with cpus in suitable_cpumask, but that are not in its own
>> + * cpupool, and we don't want to consider those!
>>   */
>>  libxl_bitmap_set_none(&nodes_counted);
>>  libxl_for_each_set_bit(k, vinfo[j].cpumap) {
>> @@ -228,6 +243,7 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
>> libxl_cputopology *tinfo,
>>  int node = tinfo[k].node;
>>  
>>  if (libxl_bitmap_test(suitable_cpumap, k) &&
>> +libxl_bitmap_test(&cpupool_

Re: [Xen-devel] [PATCH] libxl: avoid considering pCPUs outside of the cpupool during NUMA placement

2016-10-21 Thread Juergen Gross
On 21/10/16 11:56, Dario Faggioli wrote:
> During NUMA automatic placement, the information
> of how many vCPUs can run on what NUMA nodes is used,
> in order to spread the load as evenly as possible.
> 
> Such information is derived from vCPU hard and soft
> affinity, but that is not enough. In fact, affinity
> can be set to be a superset of the pCPUs that belongs
> to the cpupool in which a domain is but, of course,
> the domain will never run on pCPUs outside of its
> cpupool.
> 
> Take this into account in the placement algorithm.
> 
> Signed-off-by: Dario Faggioli 
> Reported-by: George Dunlap 

Reviewed-by: Juergen Gross 

> ---
> Cc: Ian Jackson 
> Cc: Wei Liu 
> Cc: George Dunlap 
> Cc: Juergen Gross 
> Cc: Anshul Makkar 
> ---
> Wei, this is bugfix, so I think it should go in 4.8.
> 
> Ian, this is bugfix, so I think it is a backporting candidate.
> 
> Also, note that this function does not respect the libxl coding style, as far
> as error handling is concerned. However, given that I'm asking for it to go in
> now and to be backported, I've tried to keep the changes to the minimum.
> 
> I'm up for a follow up patch for 4.9 to make the style compliant.
> 
> Thanks, Dario
> ---
>  tools/libxl/libxl_numa.c |   25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/libxl/libxl_numa.c b/tools/libxl/libxl_numa.c
> index 33289d5..f2a719d 100644
> --- a/tools/libxl/libxl_numa.c
> +++ b/tools/libxl/libxl_numa.c
> @@ -186,9 +186,12 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
> libxl_cputopology *tinfo,
>  {
>  libxl_dominfo *dinfo = NULL;
>  libxl_bitmap dom_nodemap, nodes_counted;
> +libxl_cpupoolinfo cpupool_info;
>  int nr_doms, nr_cpus;
>  int i, j, k;
>  
> +libxl_cpupoolinfo_init(&cpupool_info);
> +
>  dinfo = libxl_list_domain(CTX, &nr_doms);
>  if (dinfo == NULL)
>  return ERROR_FAIL;
> @@ -205,12 +208,18 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
> libxl_cputopology *tinfo,
>  }
>  
>  for (i = 0; i < nr_doms; i++) {
> -libxl_vcpuinfo *vinfo;
> -int nr_dom_vcpus;
> +libxl_vcpuinfo *vinfo = NULL;
> +int cpupool, nr_dom_vcpus;
> +
> +cpupool = libxl__domain_cpupool(gc, dinfo[i].domid);
> +if (cpupool < 0)
> +goto next;
> +if (libxl_cpupool_info(CTX, &cpupool_info, cpupool))
> +goto next;
>  
>  vinfo = libxl_list_vcpu(CTX, dinfo[i].domid, &nr_dom_vcpus, 
> &nr_cpus);
>  if (vinfo == NULL)
> -continue;
> +goto next;
>  
>  /* Retrieve the domain's node-affinity map */
>  libxl_domain_get_nodeaffinity(CTX, dinfo[i].domid, &dom_nodemap);
> @@ -220,6 +229,12 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
> libxl_cputopology *tinfo,
>   * For each vcpu of each domain, it must have both vcpu-affinity
>   * and node-affinity to (a pcpu belonging to) a certain node to
>   * cause an increment in the corresponding element of the array.
> + *
> + * Note that we also need to check whether the cpu actually
> + * belongs to the domain's cpupool (the cpupool of the domain
> + * being checked). In fact, it could be that the vcpu has 
> affinity
> + * with cpus in suitable_cpumask, but that are not in its own
> + * cpupool, and we don't want to consider those!
>   */
>  libxl_bitmap_set_none(&nodes_counted);
>  libxl_for_each_set_bit(k, vinfo[j].cpumap) {
> @@ -228,6 +243,7 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
> libxl_cputopology *tinfo,
>  int node = tinfo[k].node;
>  
>  if (libxl_bitmap_test(suitable_cpumap, k) &&
> +libxl_bitmap_test(&cpupool_info.cpumap, k) &&
>  libxl_bitmap_test(&dom_nodemap, node) &&
>  !libxl_bitmap_test(&nodes_counted, node)) {
>  libxl_bitmap_set(&nodes_counted, node);
> @@ -236,7 +252,10 @@ static int nr_vcpus_on_nodes(libxl__gc *gc, 
> libxl_cputopology *tinfo,
>  }
>  }
>  
> + next:
> +libxl_cpupoolinfo_dispose(&cpupool_info);
>  libxl_vcpuinfo_list_free(vinfo, nr_dom_vcpus);
> +vinfo = NULL;
>  }
>  
>  libxl_bitmap_dispose(&dom_nodemap);
> 
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 3/8] xen/pvh: Import PVH-related Xen public interfaces

2016-10-21 Thread Juergen Gross
On 14/10/16 20:05, Boris Ostrovsky wrote:
> Signed-off-by: Boris Ostrovsky 

Reviewed-by: Juergen Gross 


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 2/5] stop_machine: yield CPU during stop machine

2016-10-21 Thread Juergen Gross
On 21/10/16 14:05, Peter Zijlstra wrote:
> On Fri, Oct 21, 2016 at 01:58:55PM +0200, Christian Borntraeger wrote:
>> stop_machine can take a very long time if the hypervisor does
>> overcommitment for guest CPUs. When waiting for "the one", lets
>> give up our CPU by using the new cpu_relax_yield.
> 
> This seems something that would apply to most other virt stuff. Lets Cc
> a few more lists for that.

Corrected xen-devel mail address.


Juergen

> 
>> Signed-off-by: Christian Borntraeger 
>> ---
>>  kernel/stop_machine.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/kernel/stop_machine.c b/kernel/stop_machine.c
>> index ec9ab2f..1eb8266 100644
>> --- a/kernel/stop_machine.c
>> +++ b/kernel/stop_machine.c
>> @@ -194,7 +194,7 @@ static int multi_cpu_stop(void *data)
>>  /* Simple state machine */
>>  do {
>>  /* Chill out and ensure we re-read multi_stop_state. */
>> -cpu_relax();
>> +cpu_relax_yield();
>>  if (msdata->state != curstate) {
>>  curstate = msdata->state;
>>  switch (curstate) {
>> -- 
>> 2.5.5
>>
> ___
> Virtualization mailing list
> virtualizat...@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] docs: remove wrong statement about bug in xenstore

2016-10-24 Thread Juergen Gross
docs/misc/xenstore.txt states that xenstored will use "0" as a valid
transaction id after 2^32 transactions. This is not true. Remove that
statement.

Signed-off-by: Juergen Gross 
---
 docs/misc/xenstore.txt | 2 --
 1 file changed, 2 deletions(-)

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index c9f4a05..ae1b6a8 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -229,8 +229,6 @@ TRANSACTION_START   |   |
tx_id request header field.  When transaction is started whole
db is copied; reads and writes happen on the copy.
It is not legal to send non-0 tx_id in TRANSACTION_START.
-   Currently xenstored has the bug that after 2^32 transactions
-   it will allocate the transid 0 for an actual transaction.
 
 TRANSACTION_ENDT|
 TRANSACTION_ENDF|
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] docs: remove wrong statement about bug in xenstore

2016-10-24 Thread Juergen Gross
On 24/10/16 13:41, Wei Liu wrote:
> On Mon, Oct 24, 2016 at 01:27:17PM +0200, Juergen Gross wrote:
>> docs/misc/xenstore.txt states that xenstored will use "0" as a valid
>> transaction id after 2^32 transactions. This is not true. Remove that
>> statement.
>>
>> Signed-off-by: Juergen Gross 
> 
> Can you point me to the relevant code snippet? Better still I would like
> to see why it is the case in commit message.

Sure: tools/xenstore/xenstored_transaction.c

do_transaction_start():
...
/* Pick an unused transaction identifier. */
do {
trans->id = conn->next_transaction_id;
exists = transaction_lookup(conn,
conn->next_transaction_id++);
} while (!IS_ERR(exists));

It should be noted here that conn->next_transaction_id is initialized
to be 0. So the error would occur for the first transaction, too.


Juergen

> 
>> ---
>>  docs/misc/xenstore.txt | 2 --
>>  1 file changed, 2 deletions(-)
>>
>> diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
>> index c9f4a05..ae1b6a8 100644
>> --- a/docs/misc/xenstore.txt
>> +++ b/docs/misc/xenstore.txt
>> @@ -229,8 +229,6 @@ TRANSACTION_START|   
>> |
>>  tx_id request header field.  When transaction is started whole
>>  db is copied; reads and writes happen on the copy.
>>  It is not legal to send non-0 tx_id in TRANSACTION_START.
>> -Currently xenstored has the bug that after 2^32 transactions
>> -it will allocate the transid 0 for an actual transaction.
>>  
>>  TRANSACTION_END T|
>>  TRANSACTION_END F|
>> -- 
>> 2.6.6
>>
> 


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] docs: remove wrong statement about bug in xenstore

2016-10-24 Thread Juergen Gross
On 24/10/16 14:06, Andrew Cooper wrote:
> On 24/10/16 12:49, Juergen Gross wrote:
>> On 24/10/16 13:41, Wei Liu wrote:
>>> On Mon, Oct 24, 2016 at 01:27:17PM +0200, Juergen Gross wrote:
>>>> docs/misc/xenstore.txt states that xenstored will use "0" as a valid
>>>> transaction id after 2^32 transactions. This is not true. Remove that
>>>> statement.
>>>>
>>>> Signed-off-by: Juergen Gross 
>>> Can you point me to the relevant code snippet? Better still I would like
>>> to see why it is the case in commit message.
>> Sure: tools/xenstore/xenstored_transaction.c
>>
>> do_transaction_start():
>> ...
>> /* Pick an unused transaction identifier. */
>> do {
>> trans->id = conn->next_transaction_id;
>> exists = transaction_lookup(conn,
>> conn->next_transaction_id++);
>> } while (!IS_ERR(exists));
>>
>> It should be noted here that conn->next_transaction_id is initialized
>> to be 0. So the error would occur for the first transaction, too.
> 
> Cxenstored isn't the only xenstored implementation, and I can't see
> anything in the Ocaml version which mitigates this issue.  Furthermore,
> because Ocaml's int is 31 bits or 63 bits, I suspect a 64bit oxenstored
> will become unusable when the transaction id hits 4 billion and an a
> truncation occurs when writing the id into the ring.  A 32bit oxenstored
> only uses half the available transaction id space, and does wrap around
> to 0.

Okay, so either oxenstored should be corrected by some ocaml capable
developer, or I can send a patch which will limit the bug statement to
oxenstored.

Such a simple to fix problem should not be just mentioned in some text
file, but it should be fixed! Leaving the text unmodified is no option
IMHO.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2] docs: remove wrong statement about bug in xenstore

2016-10-24 Thread Juergen Gross
docs/misc/xenstore.txt states that xenstored will use "0" as a valid
transaction id after 2^32 transactions. This is not true. Only
oxenstored has this bug.

Signed-off-by: Juergen Gross 
---
v2: oxenstored has this bug
---
 docs/misc/xenstore.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index c9f4a05..9a02b9d 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -229,8 +229,10 @@ TRANSACTION_START  |   |
tx_id request header field.  When transaction is started whole
db is copied; reads and writes happen on the copy.
It is not legal to send non-0 tx_id in TRANSACTION_START.
-   Currently xenstored has the bug that after 2^32 transactions
+   Currently oxenstored has the bug that after 2^32 transactions
it will allocate the transid 0 for an actual transaction.
+   xenstored (the implementation written in C) doesn't have this
+   bug.
 
 TRANSACTION_ENDT|
 TRANSACTION_ENDF|
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] docs: remove wrong statement about bug in xenstore

2016-10-24 Thread Juergen Gross
On 24/10/16 15:57, Jan Beulich wrote:
 On 24.10.16 at 15:29,  wrote:
>> --- a/docs/misc/xenstore.txt
>> +++ b/docs/misc/xenstore.txt
>> @@ -229,8 +229,10 @@ TRANSACTION_START   |   
>> |
>>  tx_id request header field.  When transaction is started whole
>>  db is copied; reads and writes happen on the copy.
>>  It is not legal to send non-0 tx_id in TRANSACTION_START.
>> -Currently xenstored has the bug that after 2^32 transactions
>> +Currently oxenstored has the bug that after 2^32 transactions
>>  it will allocate the transid 0 for an actual transaction.
> 
> While I know nothing about OCaml, I read Andrew's earlier reply
> to mean different behavior would result than the one described,
> namely differing between 32-bit and 64-bit builds.

Do we really still support 32 bit dom0? I thought the support has
ended some time ago?


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2] docs: remove wrong statement about bug in xenstore

2016-10-24 Thread Juergen Gross
On 24/10/16 16:47, Wei Liu wrote:
> On Mon, Oct 24, 2016 at 04:43:59PM +0200, Juergen Gross wrote:
>> On 24/10/16 15:57, Jan Beulich wrote:
>>>>>> On 24.10.16 at 15:29,  wrote:
>>>> --- a/docs/misc/xenstore.txt
>>>> +++ b/docs/misc/xenstore.txt
>>>> @@ -229,8 +229,10 @@ TRANSACTION_START |   
>>>> |
>>>>tx_id request header field.  When transaction is started whole
>>>>db is copied; reads and writes happen on the copy.
>>>>It is not legal to send non-0 tx_id in TRANSACTION_START.
>>>> -  Currently xenstored has the bug that after 2^32 transactions
>>>> +  Currently oxenstored has the bug that after 2^32 transactions
>>>>it will allocate the transid 0 for an actual transaction.
>>>
>>> While I know nothing about OCaml, I read Andrew's earlier reply
>>> to mean different behavior would result than the one described,
>>> namely differing between 32-bit and 64-bit builds.
>>
>> Do we really still support 32 bit dom0? I thought the support has
>> ended some time ago?
> 
> Yes, we still support that.
> 
> It is the support of 32 bit Xen that has ended.

Okay, do we need to specify the exact conditions where the bug will
occur, or is v2 of the patch okay, as it is clearly more accurate than
the original text?


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 1/2] xenstore: add support for reading directory with many children

2016-10-24 Thread Juergen Gross
As the payload size for one xenstore wire command is limited to 4096
bytes it is impossible to read the children names of a node with a
large number of children (e.g. /local/domain in case of a host with
more than about 2000 domains). This effectively limits the maximum
number of domains a host can support.

In order to support such long directory outputs add a new wire command
XS_DIRECTORY_PART which will return only some entries in each call and
can be called in a loop to get all entries. For this to work reliably
the loop using XS_DIRECTORY_PART until no further entries are returned
must be in one transaction.

Using XS_DIRECTORY_PART with a valid path will start the transmission
by copying the list of children to a xenstored internal buffer linked
to the transaction. Further calls of XS_DIRECTORY_PART with a path "@"
will advance in the buffer. The end of the output is indicated by an
empty child name. The internal buffer is released when:

- the buffer is exhausted
- XS_DIRECTORY or XS_DIRECTORY_PART with a valid path is called
  (this will allocate a new buffer, of course)
- the transaction is terminated (explicit or implicit termination)

The number of entries returned for each call is implementation
specific. The only guarantee is that no call will exceed the limit of
4096 bytes returned.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_core.c|  4 +++
 tools/xenstore/xenstored_transaction.c | 57 ++
 tools/xenstore/xenstored_transaction.h |  1 +
 xen/include/public/io/xs_wire.h|  1 +
 4 files changed, 63 insertions(+)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 3df977b..9667ce5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -1327,6 +1327,10 @@ static void process_message(struct connection *conn, 
struct buffered_data *in)
do_reset_watches(conn, in);
break;
 
+   case XS_DIRECTORY_PART:
+   send_directory_part(conn, in);
+   break;
+
default:
eprintf("Client unknown operation %i", in->hdr.msg.type);
send_error(conn, ENOSYS);
diff --git a/tools/xenstore/xenstored_transaction.c 
b/tools/xenstore/xenstored_transaction.c
index 34720fa..22a51da 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -79,6 +79,11 @@ struct transaction
 
/* List of changed domains - to record the changed domain entry number 
*/
struct list_head changed_domains;
+
+   /* Temporary buffer for XS_DIRECTORY_PART. */
+   char *dirpart_buf;
+   unsigned buf_off;
+   unsigned buf_len;
 };
 
 extern int quota_max_transaction;
@@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection *conn)
conn->transaction_started = 0;
 }
 
+void send_directory_part(struct connection *conn, struct buffered_data *in)
+{
+   struct transaction *trans = conn->transaction;
+   struct node *node;
+   const char *name = onearg(in);
+   unsigned len;
+
+   if (name == NULL || trans == NULL) {
+   send_error(conn, EINVAL);
+   return;
+   }
+
+   if (name[0] == '@' && name[1] == 0) {
+   if (trans->dirpart_buf == NULL) {
+   send_error(conn, EINVAL);
+   return;
+   }
+   } else {
+   if (trans->dirpart_buf) {
+   talloc_free(trans->dirpart_buf);
+   trans->dirpart_buf = NULL;
+   }
+
+   name = canonicalize(conn, name);
+   node = get_node(conn, in, name, XS_PERM_READ);
+   if (!node) {
+   send_error(conn, errno);
+   return;
+   }
+   trans->dirpart_buf = talloc_array(trans, char,
+ node->childlen + 1);
+   memcpy(trans->dirpart_buf, node->children, node->childlen);
+   trans->dirpart_buf[node->childlen] = 0;
+   trans->buf_off = 0;
+   trans->buf_len = node->childlen + 1;
+   }
+
+   if (trans->buf_len - trans->buf_off > 1024)
+   len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
+   else
+   len = trans->buf_len - trans->buf_off;
+
+   send_reply(conn, XS_DIRECTORY_PART, trans->dirpart_buf + trans->buf_off,
+  len);
+
+   trans->buf_off += len;
+   if (trans->buf_off == trans->buf_len) {
+   talloc_free(trans->dirpart_buf);
+   trans->dirpart_buf = NULL;
+   }
+}
+
 /*
  * Local variables:
  *  c-file-style: "linux"
diff --git a/tools/xenstore/xenstored_transaction.h 
b/tools/xenstore/xenstored_transaction.h
i

[Xen-devel] [PATCH 0/2] xenstore: support reading directory with many children

2016-10-24 Thread Juergen Gross
Reading the children list of a xenstore node with the length of that
list exceeding 4096 bytes is currently not possible. This can be a
problem for a large host with a huge number of domains as Xen tools
will no longer by capable to scan some directories of xenstore (e.g.
/local/domain).

This small patch series adds a new xs wire command to read a directory
in multiple chunks. libxenstore is modified in a compatible way to
show an unmodified result in case xenstored doesn't support the new
command.

The patch set has been verified to work by using the following shell script:

xenstore-write /test "test"

for i in `seq 100 500`
do
xenstore-write /test/entry_with_very_long_name_$i $i
done

xenstore-ls
xenstore-rm /test


Juergen Gross (2):
  xenstore: add support for reading directory with many children
  xenstore: support XS_DIRECTORY_PART in libxenstore

 tools/xenstore/xenstored_core.c|  4 ++
 tools/xenstore/xenstored_transaction.c | 57 ++
 tools/xenstore/xenstored_transaction.h |  1 +
 tools/xenstore/xs.c| 74 ++
 xen/include/public/io/xs_wire.h|  1 +
 5 files changed, 129 insertions(+), 8 deletions(-)

-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 2/2] xenstore: support XS_DIRECTORY_PART in libxenstore

2016-10-24 Thread Juergen Gross
This will enable all users of libxenstore to handle xenstore nodes
with a huge amount of children.

In order to not depend completely on the XS_DIRECTORY_PART
functionality use it only in case of E2BIG returned by XS_DIRECTORY.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xs.c | 74 +++--
 1 file changed, 66 insertions(+), 8 deletions(-)

diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index d1e01ba..e64ded8 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -558,15 +558,10 @@ static bool xs_bool(char *reply)
return true;
 }
 
-char **xs_directory(struct xs_handle *h, xs_transaction_t t,
-   const char *path, unsigned int *num)
+static char **xs_directory_common(char *strings, unsigned int len,
+ unsigned int *num)
 {
-   char *strings, *p, **ret;
-   unsigned int len;
-
-   strings = xs_single(h, t, XS_DIRECTORY, path, &len);
-   if (!strings)
-   return NULL;
+   char *p, **ret;
 
/* Count the strings. */
*num = xs_count_strings(strings, len);
@@ -586,6 +581,69 @@ char **xs_directory(struct xs_handle *h, xs_transaction_t 
t,
return ret;
 }
 
+static char **xs_directory_part(struct xs_handle *h, xs_transaction_t t,
+   const char *path, unsigned int *num)
+{
+   char *strings, *p, **ret;
+   unsigned int len, p_len;
+   int saved_errno;
+
+   if (t == XBT_NULL) {
+   for (;;) {
+   t = xs_transaction_start(h);
+   if (t == XBT_NULL)
+   return NULL;
+
+   ret = xs_directory_part(h, t, path, num);
+   saved_errno = errno;
+
+   if (xs_transaction_end(h, t, false) || ret == NULL) {
+   errno = saved_errno;
+   return ret;
+   }
+
+   free(ret);
+   }
+   }
+
+   strings = xs_single(h, t, XS_DIRECTORY_PART, path, &len);
+   if (!strings) {
+   /* If XS_DIRECTORY_PART is not supported return E2BIG. */
+   if (errno == ENOSYS)
+   errno = E2BIG;
+   return NULL;
+   }
+
+   while (len > 1 && strings[len - 2] != 0) {
+   p = xs_single(h, t, XS_DIRECTORY_PART, "@", &p_len);
+   if (!p) {
+   free_no_errno(strings);
+   return NULL;
+   }
+   strings = realloc(strings, len + p_len);
+   memcpy(strings + len, p, p_len);
+   len += p_len;
+   }
+
+   return xs_directory_common(strings, len - 1, num);
+}
+
+char **xs_directory(struct xs_handle *h, xs_transaction_t t,
+   const char *path, unsigned int *num)
+{
+   char *strings;
+   unsigned int len;
+
+   strings = xs_single(h, t, XS_DIRECTORY, path, &len);
+   if (!strings) {
+   if (errno != E2BIG)
+   return NULL;
+   return xs_directory_part(h, t, path, num);
+   }
+
+   return xs_directory_common(strings, len, num);
+}
+
 /* Get the value of a single file, nul terminated.
  * Returns a malloced value: call free() on it after use.
  * len indicates length in bytes, not including the nul.
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] xenstore: add support for reading directory with many children

2016-10-25 Thread Juergen Gross
On 25/10/16 11:06, Jan Beulich wrote:
 On 25.10.16 at 07:52,  wrote:
>> --- a/tools/xenstore/xenstored_transaction.c
>> +++ b/tools/xenstore/xenstored_transaction.c
>> @@ -79,6 +79,11 @@ struct transaction
>>  
>>  /* List of changed domains - to record the changed domain entry number 
>> */
>>  struct list_head changed_domains;
>> +
>> +/* Temporary buffer for XS_DIRECTORY_PART. */
>> +char *dirpart_buf;
>> +unsigned buf_off;
>> +unsigned buf_len;
>>  };
> 
> The buffer you allocate for this can - by nature of this change - be
> huge, and there's one per transaction. Isn't this causing a resource
> utilization issue?

It will be allocated only when needed and its size is less than that of
the node to be read.

>> @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection 
>> *conn)
>>  conn->transaction_started = 0;
>>  }
>>  
>> +void send_directory_part(struct connection *conn, struct buffered_data *in)
>> +{
>> +struct transaction *trans = conn->transaction;
>> +struct node *node;
>> +const char *name = onearg(in);
>> +unsigned len;
>> +
>> +if (name == NULL || trans == NULL) {
>> +send_error(conn, EINVAL);
>> +return;
>> +}
>> +
>> +if (name[0] == '@' && name[1] == 0) {
>> +if (trans->dirpart_buf == NULL) {
>> +send_error(conn, EINVAL);
>> +return;
>> +}
>> +} else {
>> +if (trans->dirpart_buf) {
>> +talloc_free(trans->dirpart_buf);
>> +trans->dirpart_buf = NULL;
>> +}
>> +
>> +name = canonicalize(conn, name);
>> +node = get_node(conn, in, name, XS_PERM_READ);
>> +if (!node) {
>> +send_error(conn, errno);
>> +return;
>> +}
>> +trans->dirpart_buf = talloc_array(trans, char,
>> +  node->childlen + 1);
> 
> Once again, especially considering the buffer possibly being huge,
> shouldn't you check for allocation failure here?

I followed coding style from xenstored. Modifying this style should be
another patch set IMHO (I'd be fine doing this).

> 
>> +memcpy(trans->dirpart_buf, node->children, node->childlen);
>> +trans->dirpart_buf[node->childlen] = 0;
>> +trans->buf_off = 0;
>> +trans->buf_len = node->childlen + 1;
>> +}
>> +
>> +if (trans->buf_len - trans->buf_off > 1024)
> 
> What has this 1024 been derived from? In particular, why is it not
> (close to) the 4096 limit mentioned in the description?

In theory a single entry could be up to 2048 bytes long. I wanted to
keep the logic simple by not trying to iterate until the limit is
reached. I can change that, of course.

> 
>> +len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
> 
> Looking at construct_node() I'm getting the impression that there
> are embedded nul characters in the ->children array, in which case
> this strlen() would likely chop off values which could still fit, requiring
> needlessly many iterations. And the explicit nul termination after the
> allocation above would then also appear to be unnecessary.

Each children member is a nul terminated string, so I need to keep
the nul bytes in the result. And the final nul byte is the indicator
for the last entry being reached.

> And then - why did you put this function here instead of in
> xenstored_core.c, e.g. next to send_directory()?

It is using the xenstored_transaction.c private struct transaction.
Putting it in xenstored_core.c would have required to move the
struct definition into a header file.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 1/2] xenstore: add support for reading directory with many children

2016-10-25 Thread Juergen Gross
On 25/10/16 15:20, Jan Beulich wrote:
 On 25.10.16 at 13:41,  wrote:
>> On 25/10/16 11:06, Jan Beulich wrote:
>> On 25.10.16 at 07:52,  wrote:
 --- a/tools/xenstore/xenstored_transaction.c
 +++ b/tools/xenstore/xenstored_transaction.c
 @@ -79,6 +79,11 @@ struct transaction
  
/* List of changed domains - to record the changed domain entry number 
 */
struct list_head changed_domains;
 +
 +  /* Temporary buffer for XS_DIRECTORY_PART. */
 +  char *dirpart_buf;
 +  unsigned buf_off;
 +  unsigned buf_len;
  };
>>>
>>> The buffer you allocate for this can - by nature of this change - be
>>> huge, and there's one per transaction. Isn't this causing a resource
>>> utilization issue?
>>
>> It will be allocated only when needed and its size is less than that of
>> the node to be read.
> 
> That's not addressing my concern. A DomU could have multiple
> transactions each with such an operation in progress. And the
> space here doesn't get accounted against any of the quota.

It is accounted against the maximum number of transactions.

Additionally the buffer will be kept only if not all of the data
could be transferred in the first iteration of the read.

> 
 @@ -280,6 +285,58 @@ void conn_delete_all_transactions(struct connection 
 *conn)
conn->transaction_started = 0;
  }
  
 +void send_directory_part(struct connection *conn, struct buffered_data 
 *in)
 +{
 +  struct transaction *trans = conn->transaction;
 +  struct node *node;
 +  const char *name = onearg(in);
 +  unsigned len;
 +
 +  if (name == NULL || trans == NULL) {
 +  send_error(conn, EINVAL);
 +  return;
 +  }
 +
 +  if (name[0] == '@' && name[1] == 0) {
 +  if (trans->dirpart_buf == NULL) {
 +  send_error(conn, EINVAL);
 +  return;
 +  }
 +  } else {
 +  if (trans->dirpart_buf) {
 +  talloc_free(trans->dirpart_buf);
 +  trans->dirpart_buf = NULL;
 +  }
 +
 +  name = canonicalize(conn, name);
 +  node = get_node(conn, in, name, XS_PERM_READ);
 +  if (!node) {
 +  send_error(conn, errno);
 +  return;
 +  }
 +  trans->dirpart_buf = talloc_array(trans, char,
 +node->childlen + 1);
>>>
>>> Once again, especially considering the buffer possibly being huge,
>>> shouldn't you check for allocation failure here?
>>
>> I followed coding style from xenstored. Modifying this style should be
>> another patch set IMHO (I'd be fine doing this).
> 
> Cloning buggy code is never a good idea imo.
> 
 +  memcpy(trans->dirpart_buf, node->children, node->childlen);
 +  trans->dirpart_buf[node->childlen] = 0;
 +  trans->buf_off = 0;
 +  trans->buf_len = node->childlen + 1;
 +  }
 +
 +  if (trans->buf_len - trans->buf_off > 1024)
>>>
>>> What has this 1024 been derived from? In particular, why is it not
>>> (close to) the 4096 limit mentioned in the description?
>>
>> In theory a single entry could be up to 2048 bytes long. I wanted to
>> keep the logic simple by not trying to iterate until the limit is
>> reached. I can change that, of course.
> 
> While I think that would be worthwhile, I don't think this addresses
> my comment: You don't really explain where the 1024 is coming from.
> Instead to me your response looks to be related to the following
> point.

The 1024 is some arbitrary choice. I'll change it with the iteration
towards the allowed maximum added.

> 
 +  len = strlen(trans->dirpart_buf + trans->buf_off) + 1;
>>>
>>> Looking at construct_node() I'm getting the impression that there
>>> are embedded nul characters in the ->children array, in which case
>>> this strlen() would likely chop off values which could still fit, requiring
>>> needlessly many iterations. And the explicit nul termination after the
>>> allocation above would then also appear to be unnecessary.
>>
>> Each children member is a nul terminated string, so I need to keep
>> the nul bytes in the result. And the final nul byte is the indicator
>> for the last entry being reached.
> 
> Right, but this means you transfer only a single entry here. That was
> the point of my original concern.

And only now I realize that you are right. I meant to transfer at
least 1024 bytes but didn't do so. I'll correct this.

> 
>>> And then - why did you put this function here instead of in
>>> xenstored_core.c, e.g. next to send_directory()?
>>
>> It is using the xenstored_transaction.c private struct transaction.
>> Putting it in xenstored_core.c would have required to move the
>> struct definition into a header file.
> 
> Yeah, I've realized this after having sent the reply, but then again
> this als

Re: [Xen-devel] [PATCH 1/2] xenstore: add support for reading directory with many children

2016-10-25 Thread Juergen Gross
On 25/10/16 16:02, Jan Beulich wrote:
 On 25.10.16 at 15:47,  wrote:
>> On 25/10/16 15:20, Jan Beulich wrote:
>> On 25.10.16 at 13:41,  wrote:
 On 25/10/16 11:06, Jan Beulich wrote:
 On 25.10.16 at 07:52,  wrote:
>> --- a/tools/xenstore/xenstored_transaction.c
>> +++ b/tools/xenstore/xenstored_transaction.c
>> @@ -79,6 +79,11 @@ struct transaction
>>  
>>  /* List of changed domains - to record the changed domain entry 
>> number */
>>  struct list_head changed_domains;
>> +
>> +/* Temporary buffer for XS_DIRECTORY_PART. */
>> +char *dirpart_buf;
>> +unsigned buf_off;
>> +unsigned buf_len;
>>  };
>
> The buffer you allocate for this can - by nature of this change - be
> huge, and there's one per transaction. Isn't this causing a resource
> utilization issue?

 It will be allocated only when needed and its size is less than that of
 the node to be read.
>>>
>>> That's not addressing my concern. A DomU could have multiple
>>> transactions each with such an operation in progress. And the
>>> space here doesn't get accounted against any of the quota.
>>
>> It is accounted against the maximum number of transactions.
>>
>> Additionally the buffer will be kept only if not all of the data
>> could be transferred in the first iteration of the read.
> 
> Which nevertheless means an ill behaved guest could (if its quota
> allow) create a node with sufficiently many children and then start
> as many transactions as it can, and do a partial directory listing
> over each of them. Perhaps, considering that you mainly need this
> for Dom0, the new verb could be restricted to the control domain
> for now?

What about driver domains? They might need it, too.

Regarding memory usage: current limits for a guest if no quota have been
specified are:

- 1000 nodes
- max. entry size of 2048 bytes
- 10 transactions

So each node will need max. 4kB (2048 bytes entry, 2048 bytes name),
total node memory will be 4MB for this guest.

Lets assume the domU puts all those nodes in just one directory and
tries to read it in all possible 10 transactions. For each
transaction we'll need:

- 2MB temp buffer (1000 nodes * 2048 bytes name)
- at least 4MB data base copy (each transaction needs a complete
  copy of the data base, which is one of the main reasons I want to
  rewrite the transaction handling)

So an ill-behaved domU can force xenstored to use at least 64MB of
memory. The (minimal) memory amount for n ill-behaved domUs is:

n * (24 MB + n * 40MB)

n=1:  64MB
n=2: 208MB
n=3: 432MB
n=4: 736MB

Without the XS_DIRECTORY_PART support the numbers are rather similar:

n * n * 40MB total memory, so e.g. 640MB for 4 evil domUs.

> 
> And then - why did you put this function here instead of in
> xenstored_core.c, e.g. next to send_directory()?

 It is using the xenstored_transaction.c private struct transaction.
 Putting it in xenstored_core.c would have required to move the
 struct definition into a header file.
>>>
>>> Yeah, I've realized this after having sent the reply, but then again
>>> this also relates to the first question above (whether this should be
>>> a per-transaction item in the first place). In particular I don't think
>>> this being per transaction helps with allowing the guest to (easily)
>>> run multiple such queries in parallel, as commonly simple operations
>>> get run with null transactions.
>>
>> The XS_DIRECTORY_PART command is valid in a transaction only. This is
>> a prerequisite to be able to split the operation into multiple pieces
>> while keeping the consistency.
> 
> But I'm sure other models could be come up with, ideally without
> requiring meaningful amounts of temporary storage. For instance
> nodes could be versioned, and if the version changed between
> iterations, the caller could be signaled to start over. And it looks
> like such a version wouldn't even need to be bumped for child
> additions, but only for child removals.

In case nobody objects to that idea I'll look into it. The node version
might even help for better transaction support.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v3] docs: remove wrong statement about bug in xenstore

2016-10-25 Thread Juergen Gross
docs/misc/xenstore.txt states that xenstored will use "0" as a valid
transaction id after 2^32 transactions. This is not true. Remove that
statement.

Signed-off-by: Juergen Gross 
---
v3: rephrase oxenstored part as suggested by Wei Liu
v2: oxenstored has this bug
---
 docs/misc/xenstore.txt | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
index c9f4a05..a0207f1 100644
--- a/docs/misc/xenstore.txt
+++ b/docs/misc/xenstore.txt
@@ -229,8 +229,10 @@ TRANSACTION_START  |   |
tx_id request header field.  When transaction is started whole
db is copied; reads and writes happen on the copy.
It is not legal to send non-0 tx_id in TRANSACTION_START.
-   Currently xenstored has the bug that after 2^32 transactions
-   it will allocate the transid 0 for an actual transaction.
+   Currently oxenstored has the bug that after the internal
+   representation of transaction id gets overflowed, 0 is returned.
+   xenstored (the implementation written in C) doesn't have this
+   bug.
 
 TRANSACTION_ENDT|
 TRANSACTION_ENDF|
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/7] xenstore: fix add_change_node()

2016-10-27 Thread Juergen Gross
add_change_node() in xenstored is used to add a modified node to the
list of changed nodes of one transaction. It is being called with the
recurse parameter set to true when removing a node in order to get
watches for children of the node fired at transaction end, too.

If, however, the node to be deleted had been modified in the same
transaction the recurse parameter of add_change_node() is lost as
an entry already in the list of the changed nodes won't be entered
again.

Signed-off-by: Juergen Gross 
---
Candidate for backport
---
 tools/xenstore/xenstored_transaction.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/tools/xenstore/xenstored_transaction.c 
b/tools/xenstore/xenstored_transaction.c
index 34720fa..3c0b8f7 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -103,8 +103,11 @@ void add_change_node(struct transaction *trans, const char 
*node, bool recurse)
}
 
list_for_each_entry(i, &trans->changes, list)
-   if (streq(i->node, node))
+   if (streq(i->node, node)) {
+   if (recurse)
+   i->recurse = recurse;
return;
+   }
 
i = talloc(trans, struct changed_node);
i->node = talloc_strdup(i, node);
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 5/7] xenstore: add per-node generation counter

2016-10-27 Thread Juergen Gross
In order to be able to support reading the list of a node's children in
multiple chunks (needed for list sizes > 4096 bytes) without having to
allocate a temporary buffer we need some kind of generation counter for
each node. This will help to recognize a node has changed between
reading two chunks.

As removing a node and reintroducing it must result in different
generation counts each generation value has to be globally unique. This
can be ensured only by using a global 64 bit counter.

For handling of transactions there is already such a counter available,
it just has to be expanded to 64 bits and must be stored in each
modified node.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/include/xenstore_lib.h  |  1 +
 tools/xenstore/xenstored_core.c|  2 ++
 tools/xenstore/xenstored_core.h|  3 +++
 tools/xenstore/xenstored_transaction.c | 13 +
 4 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/tools/xenstore/include/xenstore_lib.h 
b/tools/xenstore/include/xenstore_lib.h
index efdf935..0ffbae9 100644
--- a/tools/xenstore/include/xenstore_lib.h
+++ b/tools/xenstore/include/xenstore_lib.h
@@ -44,6 +44,7 @@ struct xs_permissions
 
 /* Header of the node record in tdb. */
 struct xs_tdb_record_hdr {
+   uint64_t generation;
uint32_t num_perms;
uint32_t datalen;
uint32_t childlen;
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index dfad0d5..95d6d7d 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -442,6 +442,7 @@ static struct node *read_node(struct connection *conn, 
const void *ctx,
 
/* Datalen, childlen, number of permissions */
hdr = (void *)data.dptr;
+   node->generation = hdr->generation;
node->num_perms = hdr->num_perms;
node->datalen = hdr->datalen;
node->childlen = hdr->childlen;
@@ -481,6 +482,7 @@ static bool write_node(struct connection *conn, struct node 
*node)
 
data.dptr = talloc_size(node, data.dsize);
hdr = (void *)data.dptr;
+   hdr->generation = node->generation;
hdr->num_perms = node->num_perms;
hdr->datalen = node->datalen;
hdr->childlen = node->childlen;
diff --git a/tools/xenstore/xenstored_core.h b/tools/xenstore/xenstored_core.h
index ecc614f..089625f 100644
--- a/tools/xenstore/xenstored_core.h
+++ b/tools/xenstore/xenstored_core.h
@@ -109,6 +109,9 @@ struct node {
/* Parent (optional) */
struct node *parent;
 
+   /* Generation count. */
+   uint64_t generation;
+
/* Permissions. */
unsigned int num_perms;
struct xs_permissions *perms;
diff --git a/tools/xenstore/xenstored_transaction.c 
b/tools/xenstore/xenstored_transaction.c
index 39996f1..39e5a49 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -68,7 +68,10 @@ struct transaction
uint32_t id;
 
/* Generation when transaction started. */
-   unsigned int generation;
+   uint64_t generation;
+
+   /* Transaction internal generation. */
+   uint64_t trans_gen;
 
/* TDB to work on, and filename */
TDB_CONTEXT *tdb;
@@ -82,7 +85,7 @@ struct transaction
 };
 
 extern int quota_max_transaction;
-static unsigned int generation;
+static uint64_t generation;
 
 /* Return tdb context to use for this connection. */
 TDB_CONTEXT *tdb_transaction_context(struct transaction *trans)
@@ -99,12 +102,14 @@ void add_change_node(struct connection *conn, struct node 
*node, bool recurse)
 
if (!conn || !conn->transaction) {
/* They're changing the global database. */
-   generation++;
+   node->generation = generation++;
return;
}
 
trans = conn->transaction;
 
+   node->generation = generation + trans->trans_gen++;
+
list_for_each_entry(i, &trans->changes, list)
if (streq(i->node, node->name)) {
if (recurse)
@@ -234,7 +239,7 @@ void do_transaction_end(struct connection *conn, struct 
buffered_data *in)
/* Fire off the watches for everything that changed. */
list_for_each_entry(i, &trans->changes, list)
fire_watches(conn, in, i->node, i->recurse);
-   generation++;
+   generation += trans->trans_gen;
}
send_ack(conn, XS_TRANSACTION_END);
 }
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 6/7] xenstore: add support for reading directory with many children

2016-10-27 Thread Juergen Gross
As the payload size for one xenstore wire command is limited to 4096
bytes it is impossible to read the children names of a node with a
large number of children (e.g. /local/domain in case of a host with
more than about 2000 domains). This effectively limits the maximum
number of domains a host can support.

In order to support such long directory outputs add a new wire command
XS_DIRECTORY_PART which will return only some entries in each call and
can be called in a loop to get all entries.

Input data are the path of the node and the byte offset into the child
list where returned data should start.

Output is the generation count of the node (which will change each time
the node is being modified) and a list of child names starting with
the specified index. The end of the list is indicated by an empty
child name. It is the responsibility of the caller to check for data
consistency by comparing the generation counts of all returned data
sets to be the same for one node.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_core.c | 67 +
 xen/include/public/io/xs_wire.h |  1 +
 2 files changed, 68 insertions(+)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 95d6d7d..ecee4e2 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -16,6 +16,7 @@
 along with this program; If not, see <http://www.gnu.org/licenses/>.
 */
 
+#include 
 #include 
 #include 
 #include 
@@ -812,6 +813,68 @@ static void send_directory(struct connection *conn, struct 
buffered_data *in)
send_reply(conn, XS_DIRECTORY, node->children, node->childlen);
 }
 
+static void send_directory_part(struct connection *conn,
+   struct buffered_data *in)
+{
+   unsigned int off, len, maxlen, genlen;
+   char *name, *child, *data;
+   struct node *node;
+   char gen[24];
+
+   if (xs_count_strings(in->buffer, in->used) != 2) {
+   send_error(conn, EINVAL);
+   return;
+   }
+
+   /* First arg is node name. */
+   name = canonicalize(conn, in->buffer);
+
+   /* Second arg is childlist offset. */
+   off = atoi(in->buffer + strlen(in->buffer) + 1);
+
+   node = get_node(conn, in, name, XS_PERM_READ);
+   if (!node) {
+   send_error(conn, errno);
+   return;
+   }
+
+   genlen = snprintf(gen, sizeof(gen), "%"PRIu64, node->generation) + 1;
+
+   /* Offset behind list: just return a list with an empty string. */
+   if (off >= node->childlen) {
+   len = strlen(gen) + 2;
+   gen[len - 1] = 0;
+   send_reply(conn, XS_DIRECTORY_PART, gen, len);
+   return;
+   }
+
+   len = 0;
+   maxlen = XENSTORE_PAYLOAD_MAX - genlen - 1;
+   child = node->children + off;
+
+   while (len + strlen(child) < maxlen) {
+   len += strlen(child) + 1;
+   child += strlen(child) + 1;
+   if (off + len == node->childlen)
+   break;
+   }
+
+   data = talloc_array(in, char, genlen + len + 1);
+   if (!data) {
+   send_error(conn, ENOMEM);
+   return;
+   }
+
+   strcpy(data, gen);
+   memcpy(data + genlen, node->children + off, len);
+   if (off + len == node->childlen) {
+   len++;
+   data[genlen + len] = 0;
+   }
+
+   send_reply(conn, XS_DIRECTORY_PART, data, genlen + len);
+}
+
 static void do_read(struct connection *conn, struct buffered_data *in)
 {
struct node *node;
@@ -1334,6 +1397,10 @@ static void process_message(struct connection *conn, 
struct buffered_data *in)
do_reset_watches(conn, in);
break;
 
+   case XS_DIRECTORY_PART:
+   send_directory_part(conn, in);
+   break;
+
default:
eprintf("Client unknown operation %i", in->hdr.msg.type);
send_error(conn, ENOSYS);
diff --git a/xen/include/public/io/xs_wire.h b/xen/include/public/io/xs_wire.h
index 0a0cdbc..545f916 100644
--- a/xen/include/public/io/xs_wire.h
+++ b/xen/include/public/io/xs_wire.h
@@ -50,6 +50,7 @@ enum xsd_sockmsg_type
 XS_SET_TARGET,
 XS_RESTRICT,
 XS_RESET_WATCHES,
+XS_DIRECTORY_PART,
 
 XS_INVALID = 0x /* Guaranteed to remain an invalid type */
 };
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 4/7] xenstore: use common tdb record header in xenstore

2016-10-27 Thread Juergen Gross
The layout of the tdb record of xenstored is defined at multiple
places: read_node(), write_node() and in xs_tdb_dump.c

Use a common structure instead.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/include/xenstore_lib.h |  8 
 tools/xenstore/xenstored_core.c   | 27 ++-
 tools/xenstore/xs_tdb_dump.c  | 11 ++-
 3 files changed, 24 insertions(+), 22 deletions(-)

diff --git a/tools/xenstore/include/xenstore_lib.h 
b/tools/xenstore/include/xenstore_lib.h
index 462b7b9..efdf935 100644
--- a/tools/xenstore/include/xenstore_lib.h
+++ b/tools/xenstore/include/xenstore_lib.h
@@ -42,6 +42,14 @@ struct xs_permissions
enum xs_perm_type perms;
 };
 
+/* Header of the node record in tdb. */
+struct xs_tdb_record_hdr {
+   uint32_t num_perms;
+   uint32_t datalen;
+   uint32_t childlen;
+   struct xs_permissions perms[0];
+};
+
 /* Each 10 bits takes ~ 3 digits, plus one, plus one for nul terminator. */
 #define MAX_STRLEN(x) ((sizeof(x) * CHAR_BIT + CHAR_BIT-1) / 10 * 3 + 2)
 
diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 1354387..dfad0d5 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -416,7 +416,7 @@ static struct node *read_node(struct connection *conn, 
const void *ctx,
  const char *name)
 {
TDB_DATA key, data;
-   uint32_t *p;
+   struct xs_tdb_record_hdr *hdr;
struct node *node;
TDB_CONTEXT * context = tdb_context(conn);
 
@@ -441,13 +441,13 @@ static struct node *read_node(struct connection *conn, 
const void *ctx,
talloc_steal(node, data.dptr);
 
/* Datalen, childlen, number of permissions */
-   p = (uint32_t *)data.dptr;
-   node->num_perms = p[0];
-   node->datalen = p[1];
-   node->childlen = p[2];
+   hdr = (void *)data.dptr;
+   node->num_perms = hdr->num_perms;
+   node->datalen = hdr->datalen;
+   node->childlen = hdr->childlen;
 
/* Permissions are struct xs_permissions. */
-   node->perms = (void *)&p[3];
+   node->perms = hdr->perms;
/* Data is binary blob (usually ascii, no nul). */
node->data = node->perms + node->num_perms;
/* Children is strings, nul separated. */
@@ -465,11 +465,12 @@ static bool write_node(struct connection *conn, struct 
node *node)
 
TDB_DATA key, data;
void *p;
+   struct xs_tdb_record_hdr *hdr;
 
key.dptr = (void *)node->name;
key.dsize = strlen(node->name);
 
-   data.dsize = 3*sizeof(uint32_t)
+   data.dsize = sizeof(*hdr)
+ node->num_perms*sizeof(node->perms[0])
+ node->datalen + node->childlen;
 
@@ -479,13 +480,13 @@ static bool write_node(struct connection *conn, struct 
node *node)
add_change_node(conn, node, false);
 
data.dptr = talloc_size(node, data.dsize);
-   ((uint32_t *)data.dptr)[0] = node->num_perms;
-   ((uint32_t *)data.dptr)[1] = node->datalen;
-   ((uint32_t *)data.dptr)[2] = node->childlen;
-   p = data.dptr + 3 * sizeof(uint32_t);
+   hdr = (void *)data.dptr;
+   hdr->num_perms = node->num_perms;
+   hdr->datalen = node->datalen;
+   hdr->childlen = node->childlen;
 
-   memcpy(p, node->perms, node->num_perms*sizeof(node->perms[0]));
-   p += node->num_perms*sizeof(node->perms[0]);
+   memcpy(hdr->perms, node->perms, node->num_perms*sizeof(node->perms[0]));
+   p = hdr->perms + node->num_perms;
memcpy(p, node->data, node->datalen);
p += node->datalen;
memcpy(p, node->children, node->childlen);
diff --git a/tools/xenstore/xs_tdb_dump.c b/tools/xenstore/xs_tdb_dump.c
index 9f636f9..207ed44 100644
--- a/tools/xenstore/xs_tdb_dump.c
+++ b/tools/xenstore/xs_tdb_dump.c
@@ -11,14 +11,7 @@
 #include "talloc.h"
 #include "utils.h"
 
-struct record_hdr {
-   uint32_t num_perms;
-   uint32_t datalen;
-   uint32_t childlen;
-   struct xs_permissions perms[0];
-};
-
-static uint32_t total_size(struct record_hdr *hdr)
+static uint32_t total_size(struct xs_tdb_record_hdr *hdr)
 {
return sizeof(*hdr) + hdr->num_perms * sizeof(struct xs_permissions) 
+ hdr->datalen + hdr->childlen;
@@ -58,7 +51,7 @@ int main(int argc, char *argv[])
key = tdb_firstkey(tdb);
while (key.dptr) {
TDB_DATA data;
-   struct record_hdr *hdr;
+   struct xs_tdb_record_hdr *hdr;
 
data = tdb_fetch(tdb, key);
hdr = (void *)data.dptr;
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 3/7] xenstore: call add_change_node() directly when writing node

2016-10-27 Thread Juergen Gross
Instead of calling add_change_node() at places where write_node() is
called, do that inside write_node().

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_core.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index de1a9b4..1354387 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -456,7 +456,7 @@ static struct node *read_node(struct connection *conn, 
const void *ctx,
return node;
 }
 
-static bool write_node(struct connection *conn, const struct node *node)
+static bool write_node(struct connection *conn, struct node *node)
 {
/*
 * conn will be null when this is called from manual_node.
@@ -476,6 +476,8 @@ static bool write_node(struct connection *conn, const 
struct node *node)
if (domain_is_unprivileged(conn) && data.dsize >= quota_max_entry_size)
goto error;
 
+   add_change_node(conn, node, false);
+
data.dptr = talloc_size(node, data.dsize);
((uint32_t *)data.dptr)[0] = node->num_perms;
((uint32_t *)data.dptr)[1] = node->datalen;
@@ -976,7 +978,6 @@ static void do_write(struct connection *conn, struct 
buffered_data *in)
}
}
 
-   add_change_node(conn, node, false);
fire_watches(conn, in, name, false);
send_ack(conn, XS_WRITE);
 }
@@ -1007,7 +1008,6 @@ static void do_mkdir(struct connection *conn, struct 
buffered_data *in)
send_error(conn, errno);
return;
}
-   add_change_node(conn, node, false);
fire_watches(conn, in, name, false);
}
send_ack(conn, XS_MKDIR);
@@ -1209,7 +1209,6 @@ static void do_set_perms(struct connection *conn, struct 
buffered_data *in)
return;
}
 
-   add_change_node(conn, node, false);
fire_watches(conn, in, name, false);
send_ack(conn, XS_SET_PERMS);
 }
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 0/7] xenstore: support reading directory with many children

2016-10-27 Thread Juergen Gross
Reading the children list of a xenstore node with the length of that
list exceeding 4096 bytes is currently not possible. This can be a
problem for a large host with a huge number of domains as Xen tools
will no longer by capable to scan some directories of xenstore (e.g.
/local/domain).

This patch series adds a new xs wire command to read a directory
in multiple chunks. libxenstore is modified in a compatible way to
show an unmodified result in case xenstored doesn't support the new
command.

The patch set has been verified to work by using the following shell script:

xenstore-write /test "test"

for i in `seq 100 500`
do
xenstore-write /test/entry_with_very_long_name_$i $i
done

xenstore-ls
xenstore-rm /test

Xenstore has been verified to work by starting multiple domain types.
Especially HVM with qemu-stubdom has been tested as this configuration
seems to be rather sensible to concurrent transactions.

Changes in V2:
- complete rework as suggested by Jan Beulich: don't use transactions
  for consistency, but a per-node generation count
- fix a (minor?) problem in transaction code regarding watches (patch 1)

Juergen Gross (7):
  xenstore: fix add_change_node()
  xenstore: modify add_change_node() parameter types
  xenstore: call add_change_node() directly when writing node
  xenstore: use common tdb record header in xenstore
  xenstore: add per-node generation counter
  xenstore: add support for reading directory with many children
  xenstore: support XS_DIRECTORY_PART in libxenstore

 tools/xenstore/include/xenstore_lib.h  |   9 +++
 tools/xenstore/xenstored_core.c| 120 ++---
 tools/xenstore/xenstored_core.h|   3 +
 tools/xenstore/xenstored_transaction.c |  27 +---
 tools/xenstore/xenstored_transaction.h |   4 +-
 tools/xenstore/xs.c|  80 +++---
 tools/xenstore/xs_tdb_dump.c   |  11 +--
 xen/include/public/io/xs_wire.h|   1 +
 8 files changed, 205 insertions(+), 50 deletions(-)

-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 7/7] xenstore: support XS_DIRECTORY_PART in libxenstore

2016-10-27 Thread Juergen Gross
This will enable all users of libxenstore to handle xenstore nodes
with a huge amount of children.

In order to not depend completely on the XS_DIRECTORY_PART
functionality use it only in case of E2BIG returned by XS_DIRECTORY.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xs.c | 80 +++--
 1 file changed, 72 insertions(+), 8 deletions(-)

diff --git a/tools/xenstore/xs.c b/tools/xenstore/xs.c
index d1e01ba..40e3275 100644
--- a/tools/xenstore/xs.c
+++ b/tools/xenstore/xs.c
@@ -558,15 +558,10 @@ static bool xs_bool(char *reply)
return true;
 }
 
-char **xs_directory(struct xs_handle *h, xs_transaction_t t,
-   const char *path, unsigned int *num)
+static char **xs_directory_common(char *strings, unsigned int len,
+ unsigned int *num)
 {
-   char *strings, *p, **ret;
-   unsigned int len;
-
-   strings = xs_single(h, t, XS_DIRECTORY, path, &len);
-   if (!strings)
-   return NULL;
+   char *p, **ret;
 
/* Count the strings. */
*num = xs_count_strings(strings, len);
@@ -586,6 +581,75 @@ char **xs_directory(struct xs_handle *h, xs_transaction_t 
t,
return ret;
 }
 
+static char **xs_directory_part(struct xs_handle *h, xs_transaction_t t,
+   const char *path, unsigned int *num)
+{
+   unsigned int off, result_len;
+   char gen[24], offstr[8];
+   struct iovec iovec[2];
+   char *result = NULL, *strings = NULL;
+
+   gen[0] = 0;
+   iovec[0].iov_base = (void *)path;
+   iovec[0].iov_len = strlen(path) + 1;
+
+   for (off = 0;;) {
+   snprintf(offstr, sizeof(offstr), "%u", off);
+   iovec[1].iov_base = (void *)offstr;
+   iovec[1].iov_len = strlen(offstr) + 1;
+   result = xs_talkv(h, t, XS_DIRECTORY_PART, iovec, 2,
+ &result_len);
+
+   /* If XS_DIRECTORY_PART isn't supported return E2BIG. */
+   if (!result) {
+   if (errno == ENOSYS)
+   errno = E2BIG;
+   return NULL;
+   }
+
+   if (off) {
+   if (strcmp(gen, result)) {
+   free(result);
+   free(strings);
+   strings = NULL;
+   off = 0;
+   continue;
+   }
+   } else
+   strncpy(gen, result, sizeof(gen));
+
+   result_len -= strlen(result) + 1;
+   strings = realloc(strings, off + result_len);
+   memcpy(strings + off, result + strlen(result) + 1, result_len);
+   free(result);
+   off += result_len;
+
+   if (off <= 1 || strings[off - 2] == 0)
+   break;
+   }
+
+   if (off > 1)
+   off--;
+
+   return xs_directory_common(strings, off, num);
+}
+
+char **xs_directory(struct xs_handle *h, xs_transaction_t t,
+   const char *path, unsigned int *num)
+{
+   char *strings;
+   unsigned int len;
+
+   strings = xs_single(h, t, XS_DIRECTORY, path, &len);
+   if (!strings) {
+   if (errno != E2BIG)
+   return NULL;
+   return xs_directory_part(h, t, path, num);
+   }
+
+   return xs_directory_common(strings, len, num);
+}
+
 /* Get the value of a single file, nul terminated.
  * Returns a malloced value: call free() on it after use.
  * len indicates length in bytes, not including the nul.
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 2/7] xenstore: modify add_change_node() parameter types

2016-10-27 Thread Juergen Gross
In order to prepare adding a generation count to each node modify
add_change_node() to take the connection pointer and a node pointer
instead of the transaction pointer and node name as parameters. This
requires moving the call of add_change_node() from do_rm() to
delete_node_single().

While at it correct the comment for the prototype: there is no
longjmp() involved.

Signed-off-by: Juergen Gross 
---
 tools/xenstore/xenstored_core.c| 23 ++-
 tools/xenstore/xenstored_transaction.c | 11 +++
 tools/xenstore/xenstored_transaction.h |  4 ++--
 3 files changed, 23 insertions(+), 15 deletions(-)

diff --git a/tools/xenstore/xenstored_core.c b/tools/xenstore/xenstored_core.c
index 3df977b..de1a9b4 100644
--- a/tools/xenstore/xenstored_core.c
+++ b/tools/xenstore/xenstored_core.c
@@ -822,7 +822,8 @@ static void do_read(struct connection *conn, struct 
buffered_data *in)
send_reply(conn, XS_READ, node->data, node->datalen);
 }
 
-static void delete_node_single(struct connection *conn, struct node *node)
+static void delete_node_single(struct connection *conn, struct node *node,
+  bool changed)
 {
TDB_DATA key;
 
@@ -833,6 +834,10 @@ static void delete_node_single(struct connection *conn, 
struct node *node)
corrupt(conn, "Could not delete '%s'", node->name);
return;
}
+
+   if (changed)
+   add_change_node(conn, node, true);
+
domain_entry_dec(conn, node);
 }
 
@@ -971,7 +976,7 @@ static void do_write(struct connection *conn, struct 
buffered_data *in)
}
}
 
-   add_change_node(conn->transaction, name, false);
+   add_change_node(conn, node, false);
fire_watches(conn, in, name, false);
send_ack(conn, XS_WRITE);
 }
@@ -1002,20 +1007,21 @@ static void do_mkdir(struct connection *conn, struct 
buffered_data *in)
send_error(conn, errno);
return;
}
-   add_change_node(conn->transaction, name, false);
+   add_change_node(conn, node, false);
fire_watches(conn, in, name, false);
}
send_ack(conn, XS_MKDIR);
 }
 
-static void delete_node(struct connection *conn, struct node *node)
+static void delete_node(struct connection *conn, struct node *node,
+   bool changed)
 {
unsigned int i;
 
/* Delete self, then delete children.  If we crash, then the worst
   that can happen is the children will continue to take up space, but
   will otherwise be unreachable. */
-   delete_node_single(conn, node);
+   delete_node_single(conn, node, changed);
 
/* Delete children, too. */
for (i = 0; i < node->childlen; i += strlen(node->children+i) + 1) {
@@ -1025,7 +1031,7 @@ static void delete_node(struct connection *conn, struct 
node *node)
  talloc_asprintf(node, "%s/%s", node->name,
  node->children + i));
if (child) {
-   delete_node(conn, child);
+   delete_node(conn, child, false);
}
else {
trace("delete_node: No child '%s/%s' found!\n",
@@ -1084,7 +1090,7 @@ static int _rm(struct connection *conn, struct node 
*node, const char *name)
return 0;
}
 
-   delete_node(conn, node);
+   delete_node(conn, node, true);
return 1;
 }
 
@@ -1128,7 +1134,6 @@ static void do_rm(struct connection *conn, struct 
buffered_data *in)
}
 
if (_rm(conn, node, name)) {
-   add_change_node(conn->transaction, name, true);
fire_watches(conn, in, name, true);
send_ack(conn, XS_RM);
}
@@ -1204,7 +1209,7 @@ static void do_set_perms(struct connection *conn, struct 
buffered_data *in)
return;
}
 
-   add_change_node(conn->transaction, name, false);
+   add_change_node(conn, node, false);
fire_watches(conn, in, name, false);
send_ack(conn, XS_SET_PERMS);
 }
diff --git a/tools/xenstore/xenstored_transaction.c 
b/tools/xenstore/xenstored_transaction.c
index 3c0b8f7..39996f1 100644
--- a/tools/xenstore/xenstored_transaction.c
+++ b/tools/xenstore/xenstored_transaction.c
@@ -92,25 +92,28 @@ TDB_CONTEXT *tdb_transaction_context(struct transaction 
*trans)
 
 /* Callers get a change node (which can fail) and only commit after they've
  * finished.  This way they don't have to unwind eg. a write. */
-void add_change_node(struct transaction *trans, const char *node, bool recurse)
+void add_change_node(struct connection *conn, struct node *node, bool recurse)
 {
struct changed_node *i;
+   struct transaction *trans;
 
-   if (!trans) {
+ 

Re: [Xen-devel] [PATCH v2 1/7] xenstore: fix add_change_node()

2016-10-27 Thread Juergen Gross
On 27/10/16 12:15, Wei Liu wrote:
> On Thu, Oct 27, 2016 at 11:55:52AM +0200, Juergen Gross wrote:
>> add_change_node() in xenstored is used to add a modified node to the
>> list of changed nodes of one transaction. It is being called with the
>> recurse parameter set to true when removing a node in order to get
>> watches for children of the node fired at transaction end, too.
>>
>> If, however, the node to be deleted had been modified in the same
>> transaction the recurse parameter of add_change_node() is lost as
>> an entry already in the list of the changed nodes won't be entered
>> again.
>>
>> Signed-off-by: Juergen Gross 
> 
> Reviewed-by: Wei Liu 
> 
> I think we should apply this to 4.8, too. I will wait a bit for
> different opinions.
> 
> (I've only looked at this patch in this series because I caught the
> "fix" in subject line)

That's how it was meant to be. I just wanted to get some feedback
about the general idea before continuing the work on xenstored
(handling of memory allocation failures, more effective transaction
handling), as some of this work will be based on the per-node
generation counter I've introduced with this series.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 6/7] xenstore: add support for reading directory with many children

2016-10-27 Thread Juergen Gross
On 27/10/16 15:56, Jan Beulich wrote:
 On 27.10.16 at 11:55,  wrote:
>> +static void send_directory_part(struct connection *conn,
>> +struct buffered_data *in)
>> +{
>> +unsigned int off, len, maxlen, genlen;
>> +char *name, *child, *data;
>> +struct node *node;
>> +char gen[24];
>> +
>> +if (xs_count_strings(in->buffer, in->used) != 2) {
>> +send_error(conn, EINVAL);
>> +return;
>> +}
>> +
>> +/* First arg is node name. */
>> +name = canonicalize(conn, in->buffer);
>> +
>> +/* Second arg is childlist offset. */
>> +off = atoi(in->buffer + strlen(in->buffer) + 1);
> 
> Not being a user land person, I consider this too weak: Imo using
> strtoul() and properly handling conversion errors would be better
> here.

Common practice in xenstored. In case the user is doing silly things
the worst case here is he will receive silly data (either an empty
children list or a list starting in the middle of a child's name).

>> +node = get_node(conn, in, name, XS_PERM_READ);
>> +if (!node) {
>> +send_error(conn, errno);
>> +return;
>> +}
>> +
>> +genlen = snprintf(gen, sizeof(gen), "%"PRIu64, node->generation) + 1;
> 
> Why not use the shorter hex representation here?

Numbers are normally transferred in decimal representation (domid,
transaction id).

>> +/* Offset behind list: just return a list with an empty string. */
>> +if (off >= node->childlen) {
> 
> Is it perhaps worth separating the == and > cases? The former is
> just EOL, while the latter is really an error.

Hmm, yes. I'll modify it.

>> +len = strlen(gen) + 2;
>> +gen[len - 1] = 0;
>> +send_reply(conn, XS_DIRECTORY_PART, gen, len);
> 
> Any reason not to utilize genlen here, instead of the extra strlen()?

No, you are right.

>> +return;
>> +}
>> +
>> +len = 0;
>> +maxlen = XENSTORE_PAYLOAD_MAX - genlen - 1;
>> +child = node->children + off;
>> +
>> +while (len + strlen(child) < maxlen) {
>> +len += strlen(child) + 1;
>> +child += strlen(child) + 1;
>> +if (off + len == node->childlen)
>> +break;
>> +}
>> +
>> +data = talloc_array(in, char, genlen + len + 1);
>> +if (!data) {
>> +send_error(conn, ENOMEM);
>> +return;
>> +}
>> +
>> +strcpy(data, gen);
> 
> memcpy(data, gen, genlen); ?

I don't mind.

> 
>> +memcpy(data + genlen, node->children + off, len);
>> +if (off + len == node->childlen) {
>> +len++;
>> +data[genlen + len] = 0;
>> +}
>> +
>> +send_reply(conn, XS_DIRECTORY_PART, data, genlen + len);
>> +}
> 
> Since you don't return the offset, am I understanding right that the
> remote side is expected to accumulate that value?

Yes.


Thanks for the review,

Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH v2 6/7] xenstore: add support for reading directory with many children

2016-10-27 Thread Juergen Gross
On 27/10/16 17:00, Juergen Gross wrote:
> On 27/10/16 15:56, Jan Beulich wrote:
>>>>> On 27.10.16 at 11:55,  wrote:
>>> +   /* Offset behind list: just return a list with an empty string. */
>>> +   if (off >= node->childlen) {
>>
>> Is it perhaps worth separating the == and > cases? The former is
>> just EOL, while the latter is really an error.
> 
> Hmm, yes. I'll modify it.

On a second thought: this might be the result of a concurrent
child removal. So the caller would have no chance to avoid this
situation, but he would see a different generation count in the
response.

I think I leave it as it is.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Stubdom GMP build failure for gcc 6

2016-10-28 Thread Juergen Gross
On 28/10/16 14:10, Wei Liu wrote:
> Hi all
> 
> There have been a few reports on stubdom build failure with gcc 6
> toolchain. I spent some time yesterday to figure what went wrong. Here
> is what I found.
> 
> When building GMP library, its configure script generates small C
> programs to determine various aspects of the system. Unfortunately the
> build rune for it is incorrect, so the test program ends up consuming
> newlib headers while linking against the host glibc. It's amazing that
> this even worked in the past few years! :-)
> 
> Unfortunately my attempt to fix it by providing LDFLAGS="-nostdlib
> -LXXX" doesn't work. It turns out that there is no crt generated in
> newlib. I'm not sure if that's because the newlib port is incomplete or
> I haven't discovered a way to teach it to generate one.
> 
> So what should we do with this? I'm not sure if I can come up with a
> non-intrusive patch quickly.  GMP is only used by tpm emulator, so for
> the imminent 4.8 release I can write a patch to disable building that.
> 
> Ultimately we need to have a proper solution, because there can be other
> breakages in the future. And I do wish users who need tpm emulator can
> continue to use it. I don't have a clear answer as to how many people
> care about this and how can we fix it.
> 
> Thoughts?

I just tried to verify it is working (or failing) for me. On the machine
I normally did my Xen builds cmake was missing so it never tried to
build libgmp. After installing it I saw the following problem:

at the end of libgmp build:
Libraries have been installed in:
   /home/gross/xen/stubdom/cross-root-x86_64/x86_64-xen-elf/lib64

and when trying to link stubdom-vtpm:
make[2]: Entering directory '/home/gross/xen/extras/mini-os-remote'
ld -r -d -nostdlib
-L/home/gross/xen/stubdom/cross-root-x86_64/x86_64-xen-elf/lib  -m
elf_x86_64 -\( /home/gross/xen/stubdom/vtpm/vtpm.a -T app.lds -\) -ltpm
-ltpm_crypto -lgmp -lpolarssl --undefined main -o
/home/gross/xen/stubdom/mini-os-x86_64-vtpm/mini-os_app.o
ld: cannot find -lgmp

manually adding
"-L/home/gross/xen/stubdom/cross-root-x86_64/x86_64-xen-elf/lib64" lets
the link command succeed.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] stubdom: fix "make distclean" regarding gmp

2016-10-28 Thread Juergen Gross
make distclean tries to remove stubdom/gmp-4.3.2.tar.gz, while the
downloaded file is stubdom/gmp-4.3.2.tar.bz2

Signed-off-by: Juergen Gross 
---
 stubdom/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/stubdom/Makefile b/stubdom/Makefile
index d7a47f0..0252bcc 100644
--- a/stubdom/Makefile
+++ b/stubdom/Makefile
@@ -649,7 +649,7 @@ patchclean: crossclean
 downloadclean: patchclean
rm -f newlib-$(NEWLIB_VERSION).tar.gz
rm -f zlib-$(ZLIB_VERSION).tar.gz
-   rm -f gmp-$(GMP_VERSION).tar.gz
+   rm -f gmp-$(GMP_VERSION).tar.bz2
rm -f tpm_emulator-$(TPMEMU_VERSION).tar.gz
rm -f pciutils-$(LIBPCI_VERSION).tar.bz2
rm -f grub-$(GRUB_VERSION).tar.gz
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH] stubdom: fix stubdom-vtpm build

2016-10-28 Thread Juergen Gross
stubdom-vtpm needs gmp and expects it under
stubdom/cross-root-x86_64/x86_64-xen-elf/lib while gmp seems to install
it under stubdom/cross-root-x86_64/x86_64-xen-elf/lib64

Modify the Makefile to account for this by moving the gmp.a file from
lib64 to lib if necessary.

Signed-off-by: Juergen Gross 
---
 stubdom/Makefile | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/stubdom/Makefile b/stubdom/Makefile
index 0252bcc..4851d03 100644
--- a/stubdom/Makefile
+++ b/stubdom/Makefile
@@ -176,11 +176,13 @@ gmp-$(XEN_TARGET_ARCH): gmp-$(GMP_VERSION).tar.bz2 
$(NEWLIB_STAMPFILE)
touch $@
 
 GMP_STAMPFILE=$(CROSS_ROOT)/$(GNU_TARGET_ARCH)-xen-elf/lib/libgmp.a
+GMP_STAMPFILE64=$(CROSS_ROOT)/$(GNU_TARGET_ARCH)-xen-elf/lib64/libgmp.a
 cross-gmp: $(GMP_STAMPFILE)
 $(GMP_STAMPFILE): gmp-$(XEN_TARGET_ARCH)
( cd $< && \
  $(MAKE) && \
  $(MAKE) DESTDIR= install )
+   if [ ! -f "$(GMP_STAMPFILE)" -a -f "$(GMP_STAMPFILE64)" ]; then mv 
$(GMP_STAMPFILE64) $(GMP_STAMPFILE); fi
 
 #
 # cross-polarssl
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] stubdom: fix stubdom-vtpm build

2016-10-31 Thread Juergen Gross
On 31/10/16 08:55, Jan Beulich wrote:
 On 28.10.16 at 17:10,  wrote:
>> stubdom-vtpm needs gmp and expects it under
>> stubdom/cross-root-x86_64/x86_64-xen-elf/lib while gmp seems to install
>> it under stubdom/cross-root-x86_64/x86_64-xen-elf/lib64
> 
> Are you sure this is universal, rather dependent upon some
> (possibly even host) configuration item? I think that some distros
> have switched to (or have always used) /lib as the main library
> directory for x86-64, while others (like SUSE's) stick to the
> originally mandated /lib64. Over the years I have found quite a
> few projects where, in order to become consistent with SUSE's
> model, I had to override the global default of /lib with /lib64 just
> for x86-64 (but not for e.g. ia64).

No, I don't think it is universal. This is the reason I did the move
of the file only in case it is to be found under lib64 but not under
lib.

I've tested the build to break with plain

./configure

and with

./configure --prefix=/usr --libdir=/usr/lib64

which I use normally to match my SUSE environment.

OTOH looking at Wei's recent patch to modify gmp's configure to
specify cross compilation I now believe the correct answer to my
problem is to add

--libdir=$(CROSS_PREFIX)/$(GNU_TARGET_ARCH)-xen-elf/lib

to the configure statement of gmp. I'll send V2 of the patch.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH for-4.8 v2] stubdom: fix stubdom-vtpm build

2016-10-31 Thread Juergen Gross
stubdom-vtpm needs gmp and expects it under
stubdom/cross-root-x86_64/x86_64-xen-elf/lib while gmp seems to install
it under stubdom/cross-root-x86_64/x86_64-xen-elf/lib64 at least in an
openSUSE environment.

Modify gmp configure parameters to explicitly specify --libdir.

Signed-off-by: Juergen Gross 
---
To be applied on top of Wei's patch to account for cross environment
Backport candidate
---
 stubdom/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/stubdom/Makefile b/stubdom/Makefile
index dd52121..571704d 100644
--- a/stubdom/Makefile
+++ b/stubdom/Makefile
@@ -171,7 +171,7 @@ gmp-$(XEN_TARGET_ARCH): gmp-$(GMP_VERSION).tar.bz2 
$(NEWLIB_STAMPFILE)
rm $@ -rf || :
mv gmp-$(GMP_VERSION) $@
#patch -d $@ -p0 < gmp.patch
-   cd $@; CPPFLAGS="-isystem 
$(CROSS_PREFIX)/$(GNU_TARGET_ARCH)-xen-elf/include $(TARGET_CPPFLAGS)" 
CFLAGS="$(TARGET_CFLAGS)" CC=$(CC) $(GMPEXT) ./configure --disable-shared 
--enable-static --disable-fft --without-readline 
--prefix=$(CROSS_PREFIX)/$(GNU_TARGET_ARCH)-xen-elf --build=`gcc -dumpmachine` 
--host=$(GNU_TARGET_ARCH)-xen-elf
+   cd $@; CPPFLAGS="-isystem 
$(CROSS_PREFIX)/$(GNU_TARGET_ARCH)-xen-elf/include $(TARGET_CPPFLAGS)" 
CFLAGS="$(TARGET_CFLAGS)" CC=$(CC) $(GMPEXT) ./configure --disable-shared 
--enable-static --disable-fft --without-readline 
--prefix=$(CROSS_PREFIX)/$(GNU_TARGET_ARCH)-xen-elf 
--libdir=$(CROSS_PREFIX)/$(GNU_TARGET_ARCH)-xen-elf/lib --build=`gcc 
-dumpmachine` --host=$(GNU_TARGET_ARCH)-xen-elf
sed -i 's/#define HAVE_OBSTACK_VPRINTF 1/\/\/#define 
HAVE_OBSTACK_VPRINTF 1/' $@/config.h
touch $@
 
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH for-4.8] stubdom: make GMP aware that it's being cross-compiled

2016-10-31 Thread Juergen Gross
On 29/10/16 19:22, Wei Liu wrote:
> Append --build and --host flags to GMP's configure script so that it
> knows it is being compiled for another architecture.
> 
> This should fix the issue that GMP doesn't compile with gcc 6, because
> configure script won't try to test the host environment anymore.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Juergen Gross 

Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 11/12] xen: make use of xenbus_read_unsigned() in xen-pciback

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.

Signed-off-by: Juergen Gross 
---
 drivers/xen/xen-pciback/xenbus.c | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/xen/xen-pciback/xenbus.c b/drivers/xen/xen-pciback/xenbus.c
index 5ce878c..3f0aee0 100644
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -362,7 +362,7 @@ static int xen_pcibk_reconfigure(struct xen_pcibk_device 
*pdev)
int err = 0;
int num_devs;
int domain, bus, slot, func;
-   int substate;
+   unsigned int substate;
int i, len;
char state_str[64];
char dev_str[64];
@@ -395,10 +395,8 @@ static int xen_pcibk_reconfigure(struct xen_pcibk_device 
*pdev)
 "configuration");
goto out;
}
-   err = xenbus_scanf(XBT_NIL, pdev->xdev->nodename, state_str,
-  "%d", &substate);
-   if (err != 1)
-   substate = XenbusStateUnknown;
+   substate = xenbus_read_unsigned(pdev->xdev->nodename, state_str,
+   XenbusStateUnknown);
 
switch (substate) {
case XenbusStateInitialising:
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 08/12] xen: make use of xenbus_read_unsigned() in xen-pcifront

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.

Cc: bhelg...@google.com
Cc: linux-...@vger.kernel.org

Signed-off-by: Juergen Gross 
---
 drivers/pci/xen-pcifront.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/xen-pcifront.c b/drivers/pci/xen-pcifront.c
index d6ff5e8..8fc2e95 100644
--- a/drivers/pci/xen-pcifront.c
+++ b/drivers/pci/xen-pcifront.c
@@ -1038,10 +1038,8 @@ static int pcifront_detach_devices(struct 
pcifront_device *pdev)
err = -ENOMEM;
goto out;
}
-   err = xenbus_scanf(XBT_NIL, pdev->xdev->otherend, str, "%d",
-  &state);
-   if (err != 1)
-   state = XenbusStateUnknown;
+   state = xenbus_read_unsigned(pdev->xdev->otherend, str,
+XenbusStateUnknown);
 
if (state != XenbusStateClosing)
continue;
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 04/12] xen: make use of xenbus_read_unsigned() in xen-tpmfront

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of one read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.

Cc: peterhu...@gmx.de
Cc: tp...@selhorst.net
Cc: jarkko.sakki...@linux.intel.com
Cc: jguntho...@obsidianresearch.com
Cc: tpmdd-de...@lists.sourceforge.net

Signed-off-by: Juergen Gross 
---
 drivers/char/tpm/xen-tpmfront.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/drivers/char/tpm/xen-tpmfront.c b/drivers/char/tpm/xen-tpmfront.c
index 62028f4..50072cc 100644
--- a/drivers/char/tpm/xen-tpmfront.c
+++ b/drivers/char/tpm/xen-tpmfront.c
@@ -337,18 +337,14 @@ static int tpmfront_resume(struct xenbus_device *dev)
 static void backend_changed(struct xenbus_device *dev,
enum xenbus_state backend_state)
 {
-   int val;
-
switch (backend_state) {
case XenbusStateInitialised:
case XenbusStateConnected:
if (dev->state == XenbusStateConnected)
break;
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend,
-   "feature-protocol-v2", "%d", &val) < 0)
-   val = 0;
-   if (!val) {
+   if (!xenbus_read_unsigned(dev->otherend, "feature-protocol-v2",
+ 0)) {
xenbus_dev_fatal(dev, -EINVAL,
"vTPM protocol 2 required");
return;
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 12/12] xen: make use of xenbus_read_unsigned() in xenbus

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.

Signed-off-by: Juergen Gross 
---
 drivers/xen/xenbus/xenbus_probe_backend.c | 8 +---
 drivers/xen/xenbus/xenbus_xs.c| 7 +++
 2 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_probe_backend.c 
b/drivers/xen/xenbus/xenbus_probe_backend.c
index 04f7f85..37929df 100644
--- a/drivers/xen/xenbus/xenbus_probe_backend.c
+++ b/drivers/xen/xenbus/xenbus_probe_backend.c
@@ -224,13 +224,7 @@ static int read_frontend_details(struct xenbus_device 
*xendev)
 
 int xenbus_dev_is_online(struct xenbus_device *dev)
 {
-   int rc, val;
-
-   rc = xenbus_scanf(XBT_NIL, dev->nodename, "online", "%d", &val);
-   if (rc != 1)
-   val = 0; /* no online node present */
-
-   return val;
+   return !!xenbus_read_unsigned(dev->nodename, "online", 0);
 }
 EXPORT_SYMBOL_GPL(xenbus_dev_is_online);
 
diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 99dfdfa..6afb993 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -687,7 +687,7 @@ static bool xen_strict_xenbus_quirk(void)
 }
 static void xs_reset_watches(void)
 {
-   int err, supported = 0;
+   int err;
 
if (!xen_hvm_domain() || xen_initial_domain())
return;
@@ -695,9 +695,8 @@ static void xs_reset_watches(void)
if (xen_strict_xenbus_quirk())
return;
 
-   err = xenbus_scanf(XBT_NIL, "control",
-   "platform-feature-xs_reset_watches", "%d", &supported);
-   if (err != 1 || !supported)
+   if (!xenbus_read_unsigned("control",
+ "platform-feature-xs_reset_watches", 0))
return;
 
err = xs_error(xs_single(XBT_NIL, XS_RESET_WATCHES, "", NULL));
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 10/12] xen: make use of xenbus_read_unsigned() in xen-fbfront

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.

Cc: tomi.valkei...@ti.com
Cc: linux-fb...@vger.kernel.org

Signed-off-by: Juergen Gross 
---
 drivers/video/fbdev/xen-fbfront.c | 13 -
 1 file changed, 4 insertions(+), 9 deletions(-)

diff --git a/drivers/video/fbdev/xen-fbfront.c 
b/drivers/video/fbdev/xen-fbfront.c
index 0567d51..d0115a7 100644
--- a/drivers/video/fbdev/xen-fbfront.c
+++ b/drivers/video/fbdev/xen-fbfront.c
@@ -633,7 +633,6 @@ static void xenfb_backend_changed(struct xenbus_device *dev,
  enum xenbus_state backend_state)
 {
struct xenfb_info *info = dev_get_drvdata(&dev->dev);
-   int val;
 
switch (backend_state) {
case XenbusStateInitialising:
@@ -657,16 +656,12 @@ static void xenfb_backend_changed(struct xenbus_device 
*dev,
if (dev->state != XenbusStateConnected)
goto InitWait; /* no InitWait seen yet, fudge it */
 
-   if (xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-"request-update", "%d", &val) < 0)
-   val = 0;
-   if (val)
+   if (xenbus_read_unsigned(info->xbdev->otherend,
+"request-update", 0))
info->update_wanted = 1;
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend,
-"feature-resize", "%d", &val) < 0)
-   val = 0;
-   info->feature_resize = val;
+   info->feature_resize = xenbus_read_unsigned(dev->otherend,
+   "feature-resize", 0);
break;
 
case XenbusStateClosed:
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 05/12] xen: make use of xenbus_read_unsigned() in xen-kbdfront

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of the reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.

Cc: dmitry.torok...@gmail.com
Cc: linux-in...@vger.kernel.org

Signed-off-by: Juergen Gross 
---
 drivers/input/misc/xen-kbdfront.c | 13 +
 1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/input/misc/xen-kbdfront.c 
b/drivers/input/misc/xen-kbdfront.c
index 227fbd2..3900875 100644
--- a/drivers/input/misc/xen-kbdfront.c
+++ b/drivers/input/misc/xen-kbdfront.c
@@ -108,7 +108,8 @@ static irqreturn_t input_handler(int rq, void *dev_id)
 static int xenkbd_probe(struct xenbus_device *dev,
  const struct xenbus_device_id *id)
 {
-   int ret, i, abs;
+   int ret, i;
+   unsigned int abs;
struct xenkbd_info *info;
struct input_dev *kbd, *ptr;
 
@@ -127,8 +128,7 @@ static int xenkbd_probe(struct xenbus_device *dev,
if (!info->page)
goto error_nomem;
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-abs-pointer", "%d", 
&abs) < 0)
-   abs = 0;
+   abs = xenbus_read_unsigned(dev->otherend, "feature-abs-pointer", 0);
if (abs) {
ret = xenbus_write(XBT_NIL, dev->nodename,
   "request-abs-pointer", "1");
@@ -322,11 +322,8 @@ static void xenkbd_backend_changed(struct xenbus_device 
*dev,
 
case XenbusStateInitWait:
 InitWait:
-   ret = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "feature-abs-pointer", "%d", &val);
-   if (ret < 0)
-   val = 0;
-   if (val) {
+   if (xenbus_read_unsigned(info->xbdev->otherend,
+"feature-abs-pointer", 0)) {
ret = xenbus_write(XBT_NIL, info->xbdev->nodename,
   "request-abs-pointer", "1");
if (ret)
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 09/12] xen: make use of xenbus_read_unsigned() in xen-scsifront

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.

Signed-off-by: Juergen Gross 
---
 drivers/scsi/xen-scsifront.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/scsi/xen-scsifront.c b/drivers/scsi/xen-scsifront.c
index 9dc8687..7e817c6 100644
--- a/drivers/scsi/xen-scsifront.c
+++ b/drivers/scsi/xen-scsifront.c
@@ -1060,13 +1060,9 @@ static void scsifront_read_backend_params(struct 
xenbus_device *dev,
  struct vscsifrnt_info *info)
 {
unsigned int sg_grant, nr_segs;
-   int ret;
struct Scsi_Host *host = info->host;
 
-   ret = xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg-grant", "%u",
-  &sg_grant);
-   if (ret != 1)
-   sg_grant = 0;
+   sg_grant = xenbus_read_unsigned(dev->otherend, "feature-sg-grant", 0);
nr_segs = min_t(unsigned int, sg_grant, SG_ALL);
nr_segs = max_t(unsigned int, nr_segs, VSCSIIF_SG_TABLESIZE);
nr_segs = min_t(unsigned int, nr_segs,
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 02/12] xen: make use of xenbus_read_unsigned() in xen-blkback

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of one read from int to unsigned,
but this case has been wrong before: negative values are not allowed
for the modified case.

Cc: konrad.w...@oracle.com
Cc: roger@citrix.com

Signed-off-by: Juergen Gross 
---
 drivers/block/xen-blkback/xenbus.c | 36 ++--
 1 file changed, 14 insertions(+), 22 deletions(-)

diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 3cc6d1d..415e79b 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -533,13 +533,11 @@ static void xen_blkbk_discard(struct xenbus_transaction 
xbt, struct backend_info
struct xenbus_device *dev = be->dev;
struct xen_blkif *blkif = be->blkif;
int err;
-   int state = 0, discard_enable;
+   int state = 0;
struct block_device *bdev = be->blkif->vbd.bdev;
struct request_queue *q = bdev_get_queue(bdev);
 
-   err = xenbus_scanf(XBT_NIL, dev->nodename, "discard-enable", "%d",
-  &discard_enable);
-   if (err == 1 && !discard_enable)
+   if (!xenbus_read_unsigned(dev->nodename, "discard-enable", 1))
return;
 
if (blk_queue_discard(q)) {
@@ -1039,30 +1037,24 @@ static int connect_ring(struct backend_info *be)
xenbus_dev_fatal(dev, err, "unknown fe protocol %s", protocol);
return -ENOSYS;
}
-   err = xenbus_scanf(XBT_NIL, dev->otherend,
-  "feature-persistent", "%u", &pers_grants);
-   if (err <= 0)
-   pers_grants = 0;
-
+   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
+  0);
be->blkif->vbd.feature_gnt_persistent = pers_grants;
be->blkif->vbd.overflow_max_grants = 0;
 
/*
 * Read the number of hardware queues from frontend.
 */
-   err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues",
-  "%u", &requested_num_queues);
-   if (err < 0) {
-   requested_num_queues = 1;
-   } else {
-   if (requested_num_queues > xenblk_max_queues
-   || requested_num_queues == 0) {
-   /* Buggy or malicious guest. */
-   xenbus_dev_fatal(dev, err,
-   "guest requested %u queues, exceeding 
the maximum of %u.",
-   requested_num_queues, 
xenblk_max_queues);
-   return -ENOSYS;
-   }
+   requested_num_queues = xenbus_read_unsigned(dev->otherend,
+   "multi-queue-num-queues",
+   1);
+   if (requested_num_queues > xenblk_max_queues
+   || requested_num_queues == 0) {
+   /* Buggy or malicious guest. */
+   xenbus_dev_fatal(dev, err,
+   "guest requested %u queues, exceeding the 
maximum of %u.",
+   requested_num_queues, xenblk_max_queues);
+   return -ENOSYS;
}
be->blkif->nr_rings = requested_num_queues;
if (xen_blkif_alloc_rings(be->blkif))
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 01/12] xen: introduce xenbus_read_unsigned()

2016-10-31 Thread Juergen Gross
There are multiple instances of code reading an optional unsigned
parameter from Xenstore via xenbus_scanf(). Instead of repeating the
same code over and over add a service function doing the job.

Signed-off-by: Juergen Gross 
---
 drivers/xen/xenbus/xenbus_xs.c | 15 +++
 include/xen/xenbus.h   |  4 
 2 files changed, 19 insertions(+)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index 22f7cd7..99dfdfa 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -559,6 +559,21 @@ int xenbus_scanf(struct xenbus_transaction t,
 }
 EXPORT_SYMBOL_GPL(xenbus_scanf);
 
+/* Read an (optional) unsigned value. */
+unsigned int xenbus_read_unsigned(const char *dir, const char *node,
+ unsigned int default_val)
+{
+   unsigned int val;
+   int ret;
+
+   ret = xenbus_scanf(XBT_NIL, dir, node, "%u", &val);
+   if (ret <= 0)
+   val = default_val;
+
+   return val;
+}
+EXPORT_SYMBOL_GPL(xenbus_read_unsigned);
+
 /* Single printf and write: returns -errno or 0. */
 int xenbus_printf(struct xenbus_transaction t,
  const char *dir, const char *node, const char *fmt, ...)
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 32b944b..271ba62 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -151,6 +151,10 @@ __scanf(4, 5)
 int xenbus_scanf(struct xenbus_transaction t,
 const char *dir, const char *node, const char *fmt, ...);
 
+/* Read an (optional) unsigned value. */
+unsigned int xenbus_read_unsigned(const char *dir, const char *node,
+ unsigned int default_val);
+
 /* Single printf and write: returns -errno or 0. */
 __printf(4, 5)
 int xenbus_printf(struct xenbus_transaction t,
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 07/12] xen: make use of xenbus_read_unsigned() in xen-netfront

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of some reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.

Cc: net...@vger.kernel.org

Signed-off-by: Juergen Gross 
---
 drivers/net/xen-netfront.c | 67 +-
 1 file changed, 18 insertions(+), 49 deletions(-)

diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index e17879d..95d664e 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -1169,43 +1169,23 @@ static netdev_features_t xennet_fix_features(struct 
net_device *dev,
netdev_features_t features)
 {
struct netfront_info *np = netdev_priv(dev);
-   int val;
 
-   if (features & NETIF_F_SG) {
-   if (xenbus_scanf(XBT_NIL, np->xbdev->otherend, "feature-sg",
-"%d", &val) < 0)
-   val = 0;
+   if (features & NETIF_F_SG &&
+   !xenbus_read_unsigned(np->xbdev->otherend, "feature-sg", 0))
+   features &= ~NETIF_F_SG;
 
-   if (!val)
-   features &= ~NETIF_F_SG;
-   }
+   if (features & NETIF_F_IPV6_CSUM &&
+   !xenbus_read_unsigned(np->xbdev->otherend,
+ "feature-ipv6-csum-offload", 0))
+   features &= ~NETIF_F_IPV6_CSUM;
 
-   if (features & NETIF_F_IPV6_CSUM) {
-   if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
-"feature-ipv6-csum-offload", "%d", &val) < 0)
-   val = 0;
+   if (features & NETIF_F_TSO &&
+   !xenbus_read_unsigned(np->xbdev->otherend, "feature-gso-tcpv4", 0))
+   features &= ~NETIF_F_TSO;
 
-   if (!val)
-   features &= ~NETIF_F_IPV6_CSUM;
-   }
-
-   if (features & NETIF_F_TSO) {
-   if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
-"feature-gso-tcpv4", "%d", &val) < 0)
-   val = 0;
-
-   if (!val)
-   features &= ~NETIF_F_TSO;
-   }
-
-   if (features & NETIF_F_TSO6) {
-   if (xenbus_scanf(XBT_NIL, np->xbdev->otherend,
-"feature-gso-tcpv6", "%d", &val) < 0)
-   val = 0;
-
-   if (!val)
-   features &= ~NETIF_F_TSO6;
-   }
+   if (features & NETIF_F_TSO6 &&
+   !xenbus_read_unsigned(np->xbdev->otherend, "feature-gso-tcpv6", 0))
+   features &= ~NETIF_F_TSO6;
 
return features;
 }
@@ -1821,18 +1801,13 @@ static int talk_to_netback(struct xenbus_device *dev,
info->netdev->irq = 0;
 
/* Check if backend supports multiple queues */
-   err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "multi-queue-max-queues", "%u", &max_queues);
-   if (err < 0)
-   max_queues = 1;
+   max_queues = xenbus_read_unsigned(info->xbdev->otherend,
+ "multi-queue-max-queues", 1);
num_queues = min(max_queues, xennet_max_queues);
 
/* Check feature-split-event-channels */
-   err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "feature-split-event-channels", "%u",
-  &feature_split_evtchn);
-   if (err < 0)
-   feature_split_evtchn = 0;
+   feature_split_evtchn = xenbus_read_unsigned(info->xbdev->otherend,
+   "feature-split-event-channels", 0);
 
/* Read mac addr. */
err = xen_net_read_mac(dev, info->netdev->dev_addr);
@@ -1966,16 +1941,10 @@ static int xennet_connect(struct net_device *dev)
struct netfront_info *np = netdev_priv(dev);
unsigned int num_queues = 0;
int err;
-   unsigned int feature_rx_copy;
unsigned int j = 0;
struct netfront_queue *queue = NULL;
 
-   err = xenbus_scanf(XBT_NIL, np->xbdev->otherend,
-  "feature-rx-copy", "%u", &feature_rx_copy);
-   if (err != 1)
-   feature_rx_copy = 0;
-
-   if (!feature_rx_copy) {
+   if (!xenbus_read_unsigned(np->xbdev->otherend, "feature-rx-copy", 0)) {
dev_info(&dev->dev,
 "backend does not support copying receive path\n");
return -ENODEV;
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 03/12] xen: make use of xenbus_read_unsigned() in xen-blkfront

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of some reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.

Cc: konrad.w...@oracle.com
Cc: roger@citrix.com

Signed-off-by: Juergen Gross 
---
 drivers/block/xen-blkfront.c | 81 ++--
 1 file changed, 26 insertions(+), 55 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 9908597..2ee9646 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -1758,17 +1758,13 @@ static int talk_to_blkback(struct xenbus_device *dev,
const char *message = NULL;
struct xenbus_transaction xbt;
int err;
-   unsigned int i, max_page_order = 0;
-   unsigned int ring_page_order = 0;
+   unsigned int i, max_page_order;
+   unsigned int ring_page_order;
 
-   err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "max-ring-page-order", "%u", &max_page_order);
-   if (err != 1)
-   info->nr_ring_pages = 1;
-   else {
-   ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
-   info->nr_ring_pages = 1 << ring_page_order;
-   }
+   max_page_order = xenbus_read_unsigned(info->xbdev->otherend,
+ "max-ring-page-order", 0);
+   ring_page_order = min(xen_blkif_max_ring_order, max_page_order);
+   info->nr_ring_pages = 1 << ring_page_order;
 
for (i = 0; i < info->nr_rings; i++) {
struct blkfront_ring_info *rinfo = &info->rinfo[i];
@@ -1877,18 +1873,14 @@ static int talk_to_blkback(struct xenbus_device *dev,
 
 static int negotiate_mq(struct blkfront_info *info)
 {
-   unsigned int backend_max_queues = 0;
-   int err;
+   unsigned int backend_max_queues;
unsigned int i;
 
BUG_ON(info->nr_rings);
 
/* Check if backend supports multiple queues. */
-   err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "multi-queue-max-queues", "%u", &backend_max_queues);
-   if (err < 0)
-   backend_max_queues = 1;
-
+   backend_max_queues = xenbus_read_unsigned(info->xbdev->otherend,
+ "multi-queue-max-queues", 1);
info->nr_rings = min(backend_max_queues, xen_blkif_max_queues);
/* We need at least one ring. */
if (!info->nr_rings)
@@ -2195,7 +2187,6 @@ static void blkfront_setup_discard(struct blkfront_info 
*info)
int err;
unsigned int discard_granularity;
unsigned int discard_alignment;
-   unsigned int discard_secure;
 
info->feature_discard = 1;
err = xenbus_gather(XBT_NIL, info->xbdev->otherend,
@@ -2206,10 +2197,9 @@ static void blkfront_setup_discard(struct blkfront_info 
*info)
info->discard_granularity = discard_granularity;
info->discard_alignment = discard_alignment;
}
-   err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "discard-secure", "%u", &discard_secure);
-   if (err > 0)
-   info->feature_secdiscard = !!discard_secure;
+   info->feature_secdiscard =
+   !!xenbus_read_unsigned(info->xbdev->otherend, "discard-secure",
+  0);
 }
 
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo)
@@ -2301,16 +2291,11 @@ static int blkfront_setup_indirect(struct 
blkfront_ring_info *rinfo)
  */
 static void blkfront_gather_backend_features(struct blkfront_info *info)
 {
-   int err;
-   int barrier, flush, discard, persistent;
unsigned int indirect_segments;
 
info->feature_flush = 0;
info->feature_fua = 0;
 
-   err = xenbus_scanf(XBT_NIL, info->xbdev->otherend,
-  "feature-barrier", "%d", &barrier);
-
/*
 * If there's no "feature-barrier" defined, then it means
 * we're dealing with a very old backend which writes
@@ -2318,7 +2303,7 @@ static void blkfront_gather_backend_features(struct 
blkfront_info *info)
 *
 * If there are barriers, then we use flush.
 */
-   if (err > 0 && barrier) {
+   if (xenbus_read_unsigned(info->xbdev->otherend, "feature-barrier", 0)) {
info->feature_flush = 1;
info->feature_fua = 1;
}
@@ -2327,35 +2312,23 @@ static void blkfront_gather_backend_features(struct 
blkfront_info *info)
 * And if there is "f

[Xen-devel] [PATCH 00/12] xen: add common function for reading optional value

2016-10-31 Thread Juergen Gross
There are multiple instances of code reading an optional unsigned
parameter from Xenstore via xenbus_scanf(). Instead of repeating the
same code over and over add a service function doing the job and
replace the call of xenbus_scanf() with the call of the new function
where appropriate.

Juergen Gross (12):
  xen: introduce xenbus_read_unsigned()
  xen: make use of xenbus_read_unsigned() in xen-blkback
  xen: make use of xenbus_read_unsigned() in xen-blkfront
  xen: make use of xenbus_read_unsigned() in xen-tpmfront
  xen: make use of xenbus_read_unsigned() in xen-kbdfront
  xen: make use of xenbus_read_unsigned() in xen-netback
  xen: make use of xenbus_read_unsigned() in xen-netfront
  xen: make use of xenbus_read_unsigned() in xen-pcifront
  xen: make use of xenbus_read_unsigned() in xen-scsifront
  xen: make use of xenbus_read_unsigned() in xen-fbfront
  xen: make use of xenbus_read_unsigned() in xen-pciback
  xen: make use of xenbus_read_unsigned() in xenbus

 drivers/block/xen-blkback/xenbus.c| 36 ++
 drivers/block/xen-blkfront.c  | 81 ++-
 drivers/char/tpm/xen-tpmfront.c   |  8 +--
 drivers/input/misc/xen-kbdfront.c | 13 ++---
 drivers/net/xen-netback/xenbus.c  | 50 ++-
 drivers/net/xen-netfront.c| 67 +++--
 drivers/pci/xen-pcifront.c|  6 +--
 drivers/scsi/xen-scsifront.c  |  6 +--
 drivers/video/fbdev/xen-fbfront.c | 13 ++---
 drivers/xen/xen-pciback/xenbus.c  |  8 ++-
 drivers/xen/xenbus/xenbus_probe_backend.c |  8 +--
 drivers/xen/xenbus/xenbus_xs.c| 22 +++--
 include/xen/xenbus.h  |  4 ++
 13 files changed, 112 insertions(+), 210 deletions(-)

Cc: konrad.w...@oracle.com
Cc: roger@citrix.com
Cc: peterhu...@gmx.de
Cc: tp...@selhorst.net
Cc: jarkko.sakki...@linux.intel.com
Cc: jguntho...@obsidianresearch.com
Cc: tpmdd-de...@lists.sourceforge.net
Cc: dmitry.torok...@gmail.com
Cc: linux-in...@vger.kernel.org
Cc: wei.l...@citrix.com
Cc: paul.durr...@citrix.com
Cc: net...@vger.kernel.org
Cc: bhelg...@google.com
Cc: linux-...@vger.kernel.org
Cc: tomi.valkei...@ti.com
Cc: linux-fb...@vger.kernel.org
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH 06/12] xen: make use of xenbus_read_unsigned() in xen-netback

2016-10-31 Thread Juergen Gross
Use xenbus_read_unsigned() instead of xenbus_scanf() when possible.
This requires to change the type of some reads from int to unsigned,
but these cases have been wrong before: negative values are not allowed
for the modified cases.

Cc: wei.l...@citrix.com
Cc: paul.durr...@citrix.com
Cc: net...@vger.kernel.org

Signed-off-by: Juergen Gross 
---
 drivers/net/xen-netback/xenbus.c | 50 +++-
 1 file changed, 14 insertions(+), 36 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 8674e18..7356e00 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -785,12 +785,9 @@ static void xen_mcast_ctrl_changed(struct xenbus_watch 
*watch,
struct xenvif *vif = container_of(watch, struct xenvif,
  mcast_ctrl_watch);
struct xenbus_device *dev = xenvif_to_xenbus_device(vif);
-   int val;
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend,
-"request-multicast-control", "%d", &val) < 0)
-   val = 0;
-   vif->multicast_control = !!val;
+   vif->multicast_control = !!xenbus_read_unsigned(dev->otherend,
+   "request-multicast-control", 0);
 }
 
 static int xen_register_mcast_ctrl_watch(struct xenbus_device *dev,
@@ -934,12 +931,9 @@ static void connect(struct backend_info *be)
/* Check whether the frontend requested multiple queues
 * and read the number requested.
 */
-   err = xenbus_scanf(XBT_NIL, dev->otherend,
-  "multi-queue-num-queues",
-  "%u", &requested_num_queues);
-   if (err < 0) {
-   requested_num_queues = 1; /* Fall back to single queue */
-   } else if (requested_num_queues > xenvif_max_queues) {
+   requested_num_queues = xenbus_read_unsigned(dev->otherend,
+   "multi-queue-num-queues", 1);
+   if (requested_num_queues > xenvif_max_queues) {
/* buggy or malicious guest */
xenbus_dev_fatal(dev, err,
 "guest requested %u queues, exceeding the 
maximum of %u.",
@@ -1134,7 +1128,7 @@ static int read_xenbus_vif_flags(struct backend_info *be)
struct xenvif *vif = be->vif;
struct xenbus_device *dev = be->dev;
unsigned int rx_copy;
-   int err, val;
+   int err;
 
err = xenbus_scanf(XBT_NIL, dev->otherend, "request-rx-copy", "%u",
   &rx_copy);
@@ -1150,10 +1144,7 @@ static int read_xenbus_vif_flags(struct backend_info *be)
if (!rx_copy)
return -EOPNOTSUPP;
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend,
-"feature-rx-notify", "%d", &val) < 0)
-   val = 0;
-   if (!val) {
+   if (!xenbus_read_unsigned(dev->otherend, "feature-rx-notify", 0)) {
/* - Reduce drain timeout to poll more frequently for
 *   Rx requests.
 * - Disable Rx stall detection.
@@ -1162,34 +1153,21 @@ static int read_xenbus_vif_flags(struct backend_info 
*be)
be->vif->stall_timeout = 0;
}
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-sg",
-"%d", &val) < 0)
-   val = 0;
-   vif->can_sg = !!val;
+   vif->can_sg = !!xenbus_read_unsigned(dev->otherend, "feature-sg", 0);
 
vif->gso_mask = 0;
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv4",
-"%d", &val) < 0)
-   val = 0;
-   if (val)
+   if (xenbus_read_unsigned(dev->otherend, "feature-gso-tcpv4", 0))
vif->gso_mask |= GSO_BIT(TCPV4);
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-gso-tcpv6",
-"%d", &val) < 0)
-   val = 0;
-   if (val)
+   if (xenbus_read_unsigned(dev->otherend, "feature-gso-tcpv6", 0))
vif->gso_mask |= GSO_BIT(TCPV6);
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-no-csum-offload",
-"%d", &val) < 0)
-   val = 0;
-   vif->ip_csum = !val;
+   vif->ip_csum = !xenbus_read_unsigned(dev->otherend,
+"feature-no-csum-offload", 0);
 
-   if (xenbus_scanf(XBT_NIL, dev->otherend, "feature-ipv6-csum-offload",
-"%d", &val) < 0)
-   val = 0;
-   vif->ipv6_csum = !!val;
+   vif->ipv6_csum = !!xenbus_read_unsigned(dev->otherend,
+   "feature-ipv6-csum-offload", 0);
 
return 0;
 }
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH 00/12] xen: add common function for reading optional value

2016-10-31 Thread Juergen Gross
On 31/10/16 18:08, David Miller wrote:
> From: Juergen Gross 
> Date: Mon, 31 Oct 2016 17:48:18 +0100
> 
>> There are multiple instances of code reading an optional unsigned
>> parameter from Xenstore via xenbus_scanf(). Instead of repeating the
>> same code over and over add a service function doing the job and
>> replace the call of xenbus_scanf() with the call of the new function
>> where appropriate.
> 
> As this seems to be a series that will go through some tree other
> than mine, I assume the networking bits will be taken care of that
> way.
> 

If accepted I expect this series to go through the Xen tree.


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 2/4] qdev: add function qdev_set_id()

2016-11-02 Thread Juergen Gross
In order to have an easy way to add a new qdev with a specific id
carve out the needed functionality from qdev_device_add() into a new
function qdev_set_id().

Signed-off-by: Juergen Gross 
---
 include/monitor/qdev.h |  1 +
 qdev-monitor.c | 36 
 2 files changed, 21 insertions(+), 16 deletions(-)

diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index 8e504bc..0ff3331 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -12,5 +12,6 @@ void qmp_device_add(QDict *qdict, QObject **ret_data, Error 
**errp);
 
 int qdev_device_help(QemuOpts *opts);
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp);
+void qdev_set_id(DeviceState *dev, const char *id);
 
 #endif
diff --git a/qdev-monitor.c b/qdev-monitor.c
index 4f78ecb..c73410c 100644
--- a/qdev-monitor.c
+++ b/qdev-monitor.c
@@ -539,10 +539,28 @@ static BusState *qbus_find(const char *path, Error **errp)
 return bus;
 }
 
+void qdev_set_id(DeviceState *dev, const char *id)
+{
+if (id) {
+dev->id = id;
+}
+
+if (dev->id) {
+object_property_add_child(qdev_get_peripheral(), dev->id,
+  OBJECT(dev), NULL);
+} else {
+static int anon_count;
+gchar *name = g_strdup_printf("device[%d]", anon_count++);
+object_property_add_child(qdev_get_peripheral_anon(), name,
+  OBJECT(dev), NULL);
+g_free(name);
+}
+}
+
 DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 {
 DeviceClass *dc;
-const char *driver, *path, *id;
+const char *driver, *path;
 DeviceState *dev;
 BusState *bus = NULL;
 Error *err = NULL;
@@ -591,21 +609,7 @@ DeviceState *qdev_device_add(QemuOpts *opts, Error **errp)
 qdev_set_parent_bus(dev, bus);
 }
 
-id = qemu_opts_id(opts);
-if (id) {
-dev->id = id;
-}
-
-if (dev->id) {
-object_property_add_child(qdev_get_peripheral(), dev->id,
-  OBJECT(dev), NULL);
-} else {
-static int anon_count;
-gchar *name = g_strdup_printf("device[%d]", anon_count++);
-object_property_add_child(qdev_get_peripheral_anon(), name,
-  OBJECT(dev), NULL);
-g_free(name);
-}
+qdev_set_id(dev, qemu_opts_id(opts));
 
 /* set properties */
 if (qemu_opt_foreach(opts, set_property, dev, &err)) {
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 0/4] xed: add qdevs for each backend, correct pvUSB

2016-11-02 Thread Juergen Gross
Trying to use pvUSB in a Xen guest with a qemu emulated USB controller
will crash qemu as it tries to attach a pvUSB device to the emulated
controller.

This can be avoided by adding a unique id to each pvUSB controller which
can be used when attaching the pvUSB device. In order to make this
possible the pvUSB controller has to be a hotpluggable qemu device.

This is achieved by adding a qdev for each Xen backend all attached to
a new Xen specific bus.

Changes in V2:
- one qdev for each backend instead of pvUSB only

Juergen Gross (4):
  xen: add an own bus for xen backend devices
  qdev: add function qdev_set_id()
  xen: create qdev for each backend device
  xen: attach pvusb usb bus to backend qdev

 hw/usb/xen-usb.c | 23 +++
 hw/xen/xen_backend.c | 67 +---
 hw/xen/xen_pvdev.c   |  5 +++-
 include/hw/xen/xen_backend.h |  8 ++
 include/hw/xen/xen_pvdev.h   |  1 +
 include/monitor/qdev.h   |  1 +
 qdev-monitor.c   | 36 +---
 7 files changed, 107 insertions(+), 34 deletions(-)

-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 1/4] xen: add an own bus for xen backend devices

2016-11-02 Thread Juergen Gross
Add a bus for Xen backend devices in order to be able to establish a
dedicated device path for pluggable devices.

Signed-off-by: Juergen Gross 
---
 hw/xen/xen_backend.c | 19 ---
 include/hw/xen/xen_backend.h |  4 
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index 41ba5c5..5ad3caa 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -29,14 +29,14 @@
 #include "hw/sysbus.h"
 #include "sysemu/char.h"
 #include "qemu/log.h"
+#include "qapi/error.h"
 #include "hw/xen/xen_backend.h"
 #include "hw/xen/xen_pvdev.h"
 
 #include 
 
-#define TYPE_XENSYSDEV "xensysdev"
-
 DeviceState *xen_sysdev;
+BusState *xen_sysbus;
 
 /* - */
 
@@ -528,6 +528,8 @@ int xen_be_init(void)
 
 xen_sysdev = qdev_create(NULL, TYPE_XENSYSDEV);
 qdev_init_nofail(xen_sysdev);
+xen_sysbus = qbus_create(TYPE_XENSYSBUS, DEVICE(xen_sysdev), "xen-sysbus");
+qbus_set_bus_hotplug_handler(xen_sysbus, &error_abort);
 
 return 0;
 
@@ -586,6 +588,15 @@ int xen_be_bind_evtchn(struct XenDevice *xendev)
 }
 
 
+static const TypeInfo xensysbus_info = {
+.name   = TYPE_XENSYSBUS,
+.parent = TYPE_BUS,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_HOTPLUG_HANDLER },
+{ }
+}
+};
+
 static int xen_sysdev_init(SysBusDevice *dev)
 {
 return 0;
@@ -602,6 +613,7 @@ static void xen_sysdev_class_init(ObjectClass *klass, void 
*data)
 
 k->init = xen_sysdev_init;
 dc->props = xen_sysdev_properties;
+dc->bus_type = TYPE_XENSYSBUS;
 }
 
 static const TypeInfo xensysdev_info = {
@@ -613,7 +625,8 @@ static const TypeInfo xensysdev_info = {
 
 static void xenbe_register_types(void)
 {
+type_register_static(&xensysbus_info);
 type_register_static(&xensysdev_info);
 }
 
-type_init(xenbe_register_types);
+type_init(xenbe_register_types)
diff --git a/include/hw/xen/xen_backend.h b/include/hw/xen/xen_backend.h
index cbda40e..38f730e 100644
--- a/include/hw/xen/xen_backend.h
+++ b/include/hw/xen/xen_backend.h
@@ -6,12 +6,16 @@
 #include "sysemu/sysemu.h"
 #include "net/net.h"
 
+#define TYPE_XENSYSDEV "xen-sysdev"
+#define TYPE_XENSYSBUS "xen-sysbus"
+
 /* variables */
 extern xc_interface *xen_xc;
 extern xenforeignmemory_handle *xen_fmem;
 extern struct xs_handle *xenstore;
 extern const char *xen_protocol;
 extern DeviceState *xen_sysdev;
+extern BusState *xen_sysbus;
 
 int xenstore_mkdir(char *path, int p);
 int xenstore_write_be_str(struct XenDevice *xendev, const char *node, const 
char *val);
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 4/4] xen: attach pvusb usb bus to backend qdev

2016-11-02 Thread Juergen Gross
Attach the usb bus of a new pvusb controller to the qdev associated
with the Xen backend. Any device connected to that controller can now
specify the bus and port directly via its properties.

Signed-off-by: Juergen Gross 
---
 hw/usb/xen-usb.c | 23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/hw/usb/xen-usb.c b/hw/usb/xen-usb.c
index 1b3c2fb..8e676e6 100644
--- a/hw/usb/xen-usb.c
+++ b/hw/usb/xen-usb.c
@@ -712,15 +712,10 @@ static void usbback_portid_detach(struct usbback_info 
*usbif, unsigned port)
 
 static void usbback_portid_remove(struct usbback_info *usbif, unsigned port)
 {
-USBPort *p;
-
 if (!usbif->ports[port - 1].dev) {
 return;
 }
 
-p = &(usbif->ports[port - 1].port);
-snprintf(p->path, sizeof(p->path), "%d", 99);
-
 object_unparent(OBJECT(usbif->ports[port - 1].dev));
 usbif->ports[port - 1].dev = NULL;
 usbback_portid_detach(usbif, port);
@@ -733,10 +728,10 @@ static void usbback_portid_add(struct usbback_info 
*usbif, unsigned port,
 {
 unsigned speed;
 char *portname;
-USBPort *p;
 Error *local_err = NULL;
 QDict *qdict;
 QemuOpts *opts;
+char *tmp;
 
 if (usbif->ports[port - 1].dev) {
 return;
@@ -749,11 +744,16 @@ static void usbback_portid_add(struct usbback_info 
*usbif, unsigned port,
 return;
 }
 portname++;
-p = &(usbif->ports[port - 1].port);
-snprintf(p->path, sizeof(p->path), "%s", portname);
 
 qdict = qdict_new();
 qdict_put(qdict, "driver", qstring_from_str("usb-host"));
+tmp = g_strdup_printf("%s.0", usbif->xendev.qdev.id);
+qdict_put(qdict, "bus", qstring_from_str(tmp));
+g_free(tmp);
+tmp = g_strdup_printf("%s-%u", usbif->xendev.qdev.id, port);
+qdict_put(qdict, "id", qstring_from_str(tmp));
+g_free(tmp);
+qdict_put(qdict, "port", qint_from_int(port));
 qdict_put(qdict, "hostbus", qint_from_int(atoi(busid)));
 qdict_put(qdict, "hostport", qstring_from_str(portname));
 opts = qemu_opts_from_qdict(qemu_find_opts("device"), qdict, &local_err);
@@ -765,7 +765,6 @@ static void usbback_portid_add(struct usbback_info *usbif, 
unsigned port,
 goto err;
 }
 QDECREF(qdict);
-snprintf(p->path, sizeof(p->path), "%d", port);
 speed = usbif->ports[port - 1].dev->speed;
 switch (speed) {
 case USB_SPEED_LOW:
@@ -799,7 +798,6 @@ static void usbback_portid_add(struct usbback_info *usbif, 
unsigned port,
 
 err:
 QDECREF(qdict);
-snprintf(p->path, sizeof(p->path), "%d", 99);
 xen_pv_printf(&usbif->xendev, 0, "device %s could not be opened\n", busid);
 }
 
@@ -1012,13 +1010,13 @@ static void usbback_alloc(struct XenDevice *xendev)
 
 usbif = container_of(xendev, struct usbback_info, xendev);
 
-usb_bus_new(&usbif->bus, sizeof(usbif->bus), &xen_usb_bus_ops, xen_sysdev);
+usb_bus_new(&usbif->bus, sizeof(usbif->bus), &xen_usb_bus_ops,
+DEVICE(&xendev->qdev));
 for (i = 0; i < USBBACK_MAXPORTS; i++) {
 p = &(usbif->ports[i].port);
 usb_register_port(&usbif->bus, p, usbif, i, &xen_usb_port_ops,
   USB_SPEED_MASK_LOW | USB_SPEED_MASK_FULL |
   USB_SPEED_MASK_HIGH);
-snprintf(p->path, sizeof(p->path), "%d", 99);
 }
 
 QTAILQ_INIT(&usbif->req_free_q);
@@ -1066,7 +1064,6 @@ static int usbback_free(struct XenDevice *xendev)
 }
 
 usb_bus_release(&usbif->bus);
-object_unparent(OBJECT(&usbif->bus));
 
 TR_BUS(xendev, "finished\n");
 
-- 
2.6.6


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


[Xen-devel] [PATCH v2 3/4] xen: create qdev for each backend device

2016-11-02 Thread Juergen Gross
Create a qdev plugged to the xen-sysbus for each new backend device.
This device can be used as a parent for all needed devices of that
backend. The id of the new device will be "xen--" with
 being the xen backend type (e.g. "qdisk") and  the xen
backend number of the type under which it is to be found in xenstore.

Signed-off-by: Juergen Gross 
---
 hw/xen/xen_backend.c | 48 +++-
 hw/xen/xen_pvdev.c   |  5 -
 include/hw/xen/xen_backend.h |  4 
 include/hw/xen/xen_pvdev.h   |  1 +
 4 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/hw/xen/xen_backend.c b/hw/xen/xen_backend.c
index 5ad3caa..3cb40b2 100644
--- a/hw/xen/xen_backend.c
+++ b/hw/xen/xen_backend.c
@@ -27,11 +27,13 @@
 
 #include "hw/hw.h"
 #include "hw/sysbus.h"
+#include "hw/boards.h"
 #include "sysemu/char.h"
 #include "qemu/log.h"
 #include "qapi/error.h"
 #include "hw/xen/xen_backend.h"
 #include "hw/xen/xen_pvdev.h"
+#include "monitor/qdev.h"
 
 #include 
 
@@ -121,6 +123,12 @@ static struct XenDevice *xen_be_get_xendev(const char 
*type, int dom, int dev,
 
 /* init new xendev */
 xendev = g_malloc0(ops->size);
+object_initialize(&xendev->qdev, ops->size, TYPE_XENBACKEND);
+qdev_set_parent_bus(&xendev->qdev, xen_sysbus);
+qdev_set_id(&xendev->qdev, g_strdup_printf("xen-%s-%d", type, dev));
+qdev_init_nofail(&xendev->qdev);
+object_unref(OBJECT(&xendev->qdev));
+
 xendev->type  = type;
 xendev->dom   = dom;
 xendev->dev   = dev;
@@ -163,7 +171,6 @@ static struct XenDevice *xen_be_get_xendev(const char 
*type, int dom, int dev,
 return xendev;
 }
 
-
 /*
  * Sync internal data structures on xenstore updates.
  * Node specifies the changed field.  node = NULL means
@@ -541,6 +548,15 @@ err:
 return -1;
 }
 
+static void xen_set_dynamic_sysbus(void)
+{
+Object *machine = qdev_get_machine();
+ObjectClass *oc = object_get_class(machine);
+MachineClass *mc = MACHINE_CLASS(oc);
+
+mc->has_dynamic_sysbus = true;
+}
+
 int xen_be_register(const char *type, struct XenDevOps *ops)
 {
 char path[50];
@@ -562,6 +578,8 @@ int xen_be_register(const char *type, struct XenDevOps *ops)
 
 void xen_be_register_common(void)
 {
+xen_set_dynamic_sysbus();
+
 xen_be_register("console", &xen_console_ops);
 xen_be_register("vkbd", &xen_kbdmouse_ops);
 xen_be_register("qdisk", &xen_blkdev_ops);
@@ -588,9 +606,36 @@ int xen_be_bind_evtchn(struct XenDevice *xendev)
 }
 
 
+static Property xendev_properties[] = {
+DEFINE_PROP_END_OF_LIST(),
+};
+
+static void xendev_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->props = xendev_properties;
+set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+}
+
+static const TypeInfo xendev_type_info = {
+.name  = TYPE_XENBACKEND,
+.parent= TYPE_XENSYSDEV,
+.class_init= xendev_class_init,
+.instance_size = sizeof(struct XenDevice),
+};
+
+static void xen_sysbus_class_init(ObjectClass *klass, void *data)
+{
+HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
+
+hc->unplug = qdev_simple_device_unplug_cb;
+}
+
 static const TypeInfo xensysbus_info = {
 .name   = TYPE_XENSYSBUS,
 .parent = TYPE_BUS,
+.class_init = xen_sysbus_class_init,
 .interfaces = (InterfaceInfo[]) {
 { TYPE_HOTPLUG_HANDLER },
 { }
@@ -627,6 +672,7 @@ static void xenbe_register_types(void)
 {
 type_register_static(&xensysbus_info);
 type_register_static(&xensysdev_info);
+type_register_static(&xendev_type_info);
 }
 
 type_init(xenbe_register_types)
diff --git a/hw/xen/xen_pvdev.c b/hw/xen/xen_pvdev.c
index 405e154..773c278 100644
--- a/hw/xen/xen_pvdev.c
+++ b/hw/xen/xen_pvdev.c
@@ -18,10 +18,12 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/qdev-core.h"
 
 #include "hw/xen/xen_backend.h"
 #include "hw/xen/xen_pvdev.h"
 
+
 /* private */
 static int debug;
 
@@ -307,7 +309,8 @@ void xen_pv_del_xendev(struct XenDevice *xendev)
 }
 
 QTAILQ_REMOVE(&xendevs, xendev, next);
-g_free(xendev);
+
+qdev_unplug(&xendev->qdev, NULL);
 }
 
 void xen_pv_insert_xendev(struct XenDevice *xendev)
diff --git a/include/hw/xen/xen_backend.h b/include/hw/xen/xen_backend.h
index 38f730e..4f4799a 100644
--- a/include/hw/xen/xen_backend.h
+++ b/include/hw/xen/xen_backend.h
@@ -8,6 +8,10 @@
 
 #define TYPE_XENSYSDEV "xen-sysdev"
 #define TYPE_XENSYSBUS "xen-sysbus"
+#define TYPE_XENBACKEND "xen-backend"
+
+#define XENBACKEND_DEVICE(obj) \
+OBJECT_CHECK(XenDevice, (obj), TYPE_XENBACKEND)
 
 /* variables */
 extern xc_interface *xen_xc;
diff --gi

[Xen-devel] Maximum number of vcpus for HVM?

2016-11-02 Thread Juergen Gross
According to

https://wiki.xenproject.org/wiki/Xen_Project_Release_Features

HVM guests can have up to 256 vcpus. The definition in

include/public/hvm/hvm_info_table.h:

/* Maximum we can support with current vLAPIC ID mapping. */
#define HVM_MAX_VCPUS128

and its usage in domain_max_vcpus(d) suggests the limit is 128.
I guess we should update the wiki page?


Juergen

___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


Re: [Xen-devel] Maximum number of vcpus for HVM?

2016-11-02 Thread Juergen Gross
On 02/11/16 10:59, Jan Beulich wrote:
 On 02.11.16 at 10:36,  wrote:
>> According to
>>
>> https://wiki.xenproject.org/wiki/Xen_Project_Release_Features 
>>
>> HVM guests can have up to 256 vcpus. The definition in
>>
>> include/public/hvm/hvm_info_table.h:
>>
>> /* Maximum we can support with current vLAPIC ID mapping. */
>> #define HVM_MAX_VCPUS128
>>
>> and its usage in domain_max_vcpus(d) suggests the limit is 128.
>> I guess we should update the wiki page?
> 
> Yes. I have no idea why it says differently for 4.2+.

Okay, I've updated the wiki.


Juergen


___
Xen-devel mailing list
Xen-devel@lists.xen.org
https://lists.xen.org/xen-devel


<    1   2   3   4   5   6   7   8   9   10   >