[Xen-devel] [RFC v0 1/9] smpboot: Add a separate CPU state when a surviving CPU times out

2015-09-05 Thread Daniel Wagner
The CPU_DEAD_FROZEN state is abused to report to cpu_wait_death() that
the operation timed out. It has nothing to do with the pm freezing
process. Introduce a new state to allow a proper distinction between the
states; this also prepares the code for getting rid of all FROZEN states.

This was introduced in

8038dad7e888581266c76df15d70ca457a3c5910 smpboot: Add common code for notification from dying CPU
2a442c9c6453d3d043dfd89f2e03a1deff8a6f06 x86: Use common outgoing-CPU-notification code
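
For illustration, a minimal userspace model of the report handshake (a
sketch only: the per-CPU atomic is reduced to one variable and the state
values below are placeholders, not the real cpu.h constants):

/* Models cpu_report_death(): CPU_BROKEN is what the surviving CPU sets
 * when cpu_wait_death() times out; the dying CPU then reports
 * CPU_DEAD_TIMEOUT instead of CPU_DEAD, keeping the timeout case
 * distinguishable from the pm-freeze states. */
#include <stdatomic.h>
#include <stdio.h>

enum { CPU_DEAD = 1, CPU_BROKEN = 2, CPU_DEAD_TIMEOUT = 3 };

static atomic_int cpu_hotplug_state = ATOMIC_VAR_INIT(CPU_BROKEN);

static int cpu_report_death_model(void)
{
	int oldstate, newstate;

	do {
		oldstate = atomic_load(&cpu_hotplug_state);
		newstate = (oldstate != CPU_BROKEN) ? CPU_DEAD : CPU_DEAD_TIMEOUT;
	} while (!atomic_compare_exchange_weak(&cpu_hotplug_state,
					       &oldstate, newstate));
	return newstate == CPU_DEAD;	/* false if the report came too late */
}

int main(void)
{
	printf("died in time: %d\n", cpu_report_death_model());
	return 0;
}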

Signed-off-by: Daniel Wagner 
Cc: "H. Peter Anvin" 
Cc: "Paul E. McKenney" 
Cc: Andrew Morton 
Cc: Boris Ostrovsky 
Cc: Chris Metcalf 
Cc: David Vrabel 
Cc: Ingo Molnar 
Cc: John Hubbard 
Cc: Konrad Rzeszutek Wilk 
Cc: Lai Jiangshan 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: x...@kernel.org
Cc: xen-de...@lists.xenproject.org
Cc: linux-ker...@vger.kernel.org
---
 arch/x86/xen/smp.c  | 2 +-
 include/linux/cpu.h | 2 ++
 kernel/smpboot.c    | 4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/xen/smp.c b/arch/x86/xen/smp.c
index 8648438..7a8bc03 100644
--- a/arch/x86/xen/smp.c
+++ b/arch/x86/xen/smp.c
@@ -740,7 +740,7 @@ static int xen_hvm_cpu_up(unsigned int cpu, struct task_struct *tidle)
 * This can happen if CPU was offlined earlier and
 * offlining timed out in common_cpu_die().
 */
-   if (cpu_report_state(cpu) == CPU_DEAD_FROZEN) {
+   if (cpu_report_state(cpu) == CPU_DEAD_TIMEOUT) {
xen_smp_intr_free(cpu);
xen_uninit_lock_cpu(cpu);
}
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 23c30bd..381ea8a 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -101,6 +101,8 @@ enum {
* idle loop. */
 #define CPU_BROKEN 0x000C /* CPU (unsigned)v did not die properly,
* perhaps due to preemption. */
+#define CPU_DEAD_TIMEOUT   0x000D /* CPU (unsigned)v surviving CPU timed
+ out */
 
 /* Used for CPU hotplug events occurring while tasks are frozen due to a suspend
  * operation in progress
diff --git a/kernel/smpboot.c b/kernel/smpboot.c
index 7c434c3..e37efbf 100644
--- a/kernel/smpboot.c
+++ b/kernel/smpboot.c
@@ -505,7 +505,7 @@ update_state:
  * Called by the outgoing CPU to report its successful death.  Return
  * false if this report follows the surviving CPU's timing out.
  *
- * A separate "CPU_DEAD_FROZEN" is used when the surviving CPU
+ * A separate "CPU_DEAD_TIMEOUT" is used when the surviving CPU
  * timed out.  This approach allows architectures to omit calls to
  * cpu_check_up_prepare() and cpu_set_state_online() without defeating
  * the next cpu_wait_death()'s polling loop.
@@ -521,7 +521,7 @@ bool cpu_report_death(void)
if (oldstate != CPU_BROKEN)
newstate = CPU_DEAD;
else
-   newstate = CPU_DEAD_FROZEN;
+   newstate = CPU_DEAD_TIMEOUT;
} while (atomic_cmpxchg(&per_cpu(cpu_hotplug_state, cpu),
oldstate, newstate) != oldstate);
return newstate == CPU_DEAD;
-- 
2.4.3


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] Fwd: [PATCH] x86: Use larger chunks in mtrr_cleanup

2015-09-05 Thread Toshi Kani
On Fri, 2015-09-04 at 03:40 +0200, Luis R. Rodriguez wrote:
> On Thu, Sep 03, 2015 at 06:48:46PM -0600, Toshi Kani wrote:
> > On Fri, 2015-09-04 at 01:54 +0200, Luis R. Rodriguez wrote:
> > > On Thu, Sep 03, 2015 at 05:21:14PM -0600, Toshi Kani wrote:
> > > > On Fri, 2015-09-04 at 00:45 +0200, Luis R. Rodriguez wrote:
> > > > > On Thu, Sep 03, 2015 at 04:25:31PM -0600, Toshi Kani wrote:
> >  :
> > > > > > On Xen,
> > > > > 
> > > > > When Xen is used, the platform firmware may still set up MTRRs even if
> > > > > the hypervisor doesn't set them up, right? So the same issue and
> > > > > question apply here.
> > > > 
> > > > Right, I meant to say Xen guests.
> > > 
> > > Ah, but it's more complicated than that.
> > > 
> > > > In case of the Xen hypervisor,
> > > > mtrr_type_lookup() returns a valid type as it runs on a platform.
> > > 
> > > I am not sure if this happens today. I know MTRR is simply disabled by
> > > the Xen hypervisor on the CPU explicitly; it disables it so that guests
> > > querying the MTRR capabilities see it as disabled.
> > 
> > Oh, I would not let the hypervisor disable MTRRs...
> 
> Commit 586ab6a055376ec3f3e1e8 ("x86/pvh: disable MTRR feature on cpuid for 
> Dom0") by Roger Pau Monné disables MTRR for PVH dom0, so that cpuid returns 
> that MTRR is disabled to guests. 

Oh, I see.  It just clears the capability bit so that the kernel thinks MTRRs
are disabled.  That makes sense.

> Then later on, with commit 47591df50512 ("xen: Support Xen pv-domains
> using PAT") added by Juergen in v3.19, Linux guests can end up booting
> without MTRR but with PAT enabled.

Nice!

> > > Then, since Xen Linux guests cannot speak MTRR through the hypervisor
> > > (for instance, Xen guests cannot ask the Xen hypervisor to do
> > > mtrr_type_lookup() for them), if PCI passthrough is used a guest might
> > > set up or use incorrect info as well.
> > > 
> > > If I understand this correctly then I think we're in a pickle with Xen
> > > unless we add hypervisor support and hypercall support for
> > > mtrr_type_lookup().
> > 
> > I was under the assumption that MTRRs are emulated and disabled on guests.
> 
> Some "special" flavor Linux guests (with non-upstream code) have guest
> MTRR hypercall support, for vanilla Xen and Linux they just never get MTRR
> support. After Juergen's Linux changes though Xen guests can now get
> shiny PAT support. Since MTRR hypercall support is not upstream and MTRR is
> ancient I decided instead of adding MTRR hypercall support upstream to go 
> with converting all drivers to PAT interfaces, with the assumption there 
> would be no issues.
> 
> > Isn't guest physical address virtualized?
> 
> It is; there is a Xen IOTLB and such, but that should ensure dom0 gets
> proper access to devices, and if you use PCI passthrough you want
> the best experience as well.
> 
> > I know other proprietary VMMs on IA64, but know nothing about Xen...  So, 
> > please disregard my comments to Xen. :-)
> 
> No worries, no one knows all the answers; we work together to clear the
> cobwebs off of these odd corners no one cares about :)

Thanks for all the info!  That helps.
-Toshi
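
For reference, the driver conversion Luis describes (dropping explicit MTRR
setup in favour of PAT-backed interfaces) usually follows the pattern below.
This is a sketch: the structure, function names and region are made up, and
only ioremap_wc()/arch_phys_wc_add()/arch_phys_wc_del() are the real APIs.

/* Sketch: write-combined mapping the PAT way. arch_phys_wc_add() is a
 * no-op where PAT is usable and falls back to an MTRR only where it is
 * not, so the same driver code works on both kinds of systems. */
#include <linux/io.h>

struct demo_fb {
	void __iomem *base;
	int wc_cookie;			/* handle from arch_phys_wc_add() */
};

static int demo_fb_map(struct demo_fb *fb, phys_addr_t phys, size_t len)
{
	/* Old style was ioremap_nocache() + mtrr_add(..., MTRR_TYPE_WRCOMB, 1). */
	fb->base = ioremap_wc(phys, len);
	if (!fb->base)
		return -ENOMEM;
	fb->wc_cookie = arch_phys_wc_add(phys, len);
	return 0;
}

static void demo_fb_unmap(struct demo_fb *fb)
{
	arch_phys_wc_del(fb->wc_cookie);
	iounmap(fb->base);
}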



[Xen-devel] [PATCH v3 4/9] xen/blkfront: pseudo support for multi hardware queues/rings

2015-09-05 Thread Bob Liu
Preparation patch for multiple hardware queues/rings; the ring number is
set to 1 by force for now.

* Use 'nr_rings' in the per-device info to identify how many hw queues/rings
  are supported, and a pointer *rinfo to all of its rings (see the allocation
  sketch below).
* Rename 'nr_ring_pages' => 'pages_per_ring' to distinguish it better from
  'nr_rings'.
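
The allocation this implies looks roughly like the following. This is an
assumed sketch, not the patch's exact code (the probe-time hunk is not part
of the quoted diff); only the 'nr_rings' and '*rinfo' fields come from the
patch itself:

	/* Assumed sketch: one blkfront_ring_info per ring, allocated at
	 * probe time; with this patch nr_rings is still forced to 1. */
	dinfo->nr_rings = 1;
	dinfo->rinfo = kcalloc(dinfo->nr_rings, sizeof(*dinfo->rinfo),
			       GFP_KERNEL);
	if (!dinfo->rinfo)
		return -ENOMEM;
	for (i = 0; i < dinfo->nr_rings; i++)
		dinfo->rinfo[i].dinfo = dinfo;	/* per-ring back pointer */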

Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkfront.c |  513 +-
 1 file changed, 308 insertions(+), 205 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index bf416d5..bf45c99 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
+#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
@@ -157,9 +157,10 @@ struct blkfront_dev_info {
unsigned int feature_persistent:1;
unsigned int max_indirect_segments;
int is_ready;
-   unsigned int nr_ring_pages;
+   unsigned int pages_per_ring;
struct blk_mq_tag_set tag_set;
-   struct blkfront_ring_info rinfo;
+   struct blkfront_ring_info *rinfo;
+   unsigned int nr_rings;
 };
 
 static unsigned int nr_minors;
@@ -191,7 +192,7 @@ static DEFINE_SPINLOCK(minor_lock);
((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
 static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
-static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
+static void __blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
 
 static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
@@ -668,7 +669,7 @@ static int blk_mq_init_hctx(struct blk_mq_hw_ctx *hctx, void *data,
 {
struct blkfront_dev_info *dinfo = (struct blkfront_dev_info *)data;
 
-   hctx->driver_data = &dinfo->rinfo;
+   hctx->driver_data = &dinfo->rinfo[index];
return 0;
 }
 
@@ -927,8 +928,8 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 
 static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
 {
-   unsigned int minor, nr_minors;
-   struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+   unsigned int minor, nr_minors, i;
+   struct blkfront_ring_info *rinfo;
 
if (dinfo->rq == NULL)
return;
@@ -936,11 +937,15 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
/* No more blkif_request(). */
blk_mq_stop_hw_queues(dinfo->rq);
 
-   /* No more gnttab callback work. */
-   gnttab_cancel_free_callback(&rinfo->callback);
+   for (i = 0; i < dinfo->nr_rings; i++) {
+   rinfo = &dinfo->rinfo[i];
 
-   /* Flush gnttab callback work. Must be done with no locks held. */
-   flush_work(&rinfo->work);
+   /* No more gnttab callback work. */
+   gnttab_cancel_free_callback(&rinfo->callback);
+
+   /* Flush gnttab callback work. Must be done with no locks held. */
+   flush_work(&rinfo->work);
+   }
 
del_gendisk(dinfo->gd);
 
@@ -977,8 +982,8 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
 {
struct grant *persistent_gnt;
struct grant *n;
-   int i, j, segs;
-   struct blkfront_ring_info *rinfo = &dinfo->rinfo;
+   int i, j, segs, r_index;
+   struct blkfront_ring_info *rinfo;
 
/* Prevent new requests being issued until we fix things up. */
spin_lock_irq(&dinfo->io_lock);
@@ -988,100 +993,103 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
if (dinfo->rq)
blk_mq_stop_hw_queues(dinfo->rq);
 
-   /* Remove all persistent grants */
-   if (!list_empty(&rinfo->grants)) {
-   list_for_each_entry_safe(persistent_gnt, n,
-&rinfo->grants, node) {
-   list_del(&persistent_gnt->node);
-   if (persistent_gnt->gref != GRANT_INVALID_REF) {
-   gnttab_end_foreign_access(persistent_gnt->gref,
- 0, 0UL);
-   rinfo->persistent_gnts_c--;
+   for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
+   rinfo = &dinfo->rinfo[r_index];
+
+   /* Remove all persistent grants */
+   if (!list_empty(&rinfo->grants)) {
+   list_for_each_entry_safe(persistent_gnt, n,
+ 

[Xen-devel] [PATCH v3 1/9] xen-blkfront: convert to blk-mq APIs

2015-09-05 Thread Bob Liu
Note: This patch is based on original work of Arianna's internship for
GNOME's Outreach Program for Women.

Only one hardware queue is used now, so there is no significant
performance change.

The legacy non-mq code is deleted completely, which matches other
drivers like virtio, mtip, and nvme.

Also dropped one unnecessary holding of info->io_lock when calling
blk_mq_stop_hw_queues().

Signed-off-by: Arianna Avanzini 
Signed-off-by: Bob Liu 
Reviewed-by: Christoph Hellwig 
Acked-by: Jens Axboe 
Signed-off-by: David Vrabel 
---
 drivers/block/xen-blkfront.c |  146 +-
 1 file changed, 60 insertions(+), 86 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 7a8a73f..5dd591d 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -37,6 +37,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -148,6 +149,7 @@ struct blkfront_info
unsigned int feature_persistent:1;
unsigned int max_indirect_segments;
int is_ready;
+   struct blk_mq_tag_set tag_set;
 };
 
 static unsigned int nr_minors;
@@ -617,54 +619,41 @@ static inline bool blkif_request_flush_invalid(struct request *req,
 !(info->feature_flush & REQ_FUA)));
 }
 
-/*
- * do_blkif_request
- *  read a block; request is in a request queue
- */
-static void do_blkif_request(struct request_queue *rq)
+static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
+  const struct blk_mq_queue_data *qd)
 {
-   struct blkfront_info *info = NULL;
-   struct request *req;
-   int queued;
-
-   pr_debug("Entered do_blkif_request\n");
-
-   queued = 0;
+   struct blkfront_info *info = qd->rq->rq_disk->private_data;
 
-   while ((req = blk_peek_request(rq)) != NULL) {
-   info = req->rq_disk->private_data;
-
-   if (RING_FULL(&info->ring))
-   goto wait;
+   blk_mq_start_request(qd->rq);
+   spin_lock_irq(&info->io_lock);
+   if (RING_FULL(&info->ring))
+   goto out_busy;
 
-   blk_start_request(req);
+   if (blkif_request_flush_invalid(qd->rq, info))
+   goto out_err;
 
-   if (blkif_request_flush_invalid(req, info)) {
-   __blk_end_request_all(req, -EOPNOTSUPP);
-   continue;
-   }
+   if (blkif_queue_request(qd->rq))
+   goto out_busy;
 
-   pr_debug("do_blk_req %p: cmd %p, sec %lx, "
-"(%u/%u) [%s]\n",
-req, req->cmd, (unsigned long)blk_rq_pos(req),
-blk_rq_cur_sectors(req), blk_rq_sectors(req),
-rq_data_dir(req) ? "write" : "read");
-
-   if (blkif_queue_request(req)) {
-   blk_requeue_request(rq, req);
-wait:
-   /* Avoid pointless unplugs. */
-   blk_stop_queue(rq);
-   break;
-   }
+   flush_requests(info);
+   spin_unlock_irq(&info->io_lock);
+   return BLK_MQ_RQ_QUEUE_OK;
 
-   queued++;
-   }
+out_err:
+   spin_unlock_irq(&info->io_lock);
+   return BLK_MQ_RQ_QUEUE_ERROR;
 
-   if (queued != 0)
-   flush_requests(info);
+out_busy:
+   spin_unlock_irq(&info->io_lock);
+   blk_mq_stop_hw_queue(hctx);
+   return BLK_MQ_RQ_QUEUE_BUSY;
 }
 
+static struct blk_mq_ops blkfront_mq_ops = {
+   .queue_rq = blkif_queue_rq,
+   .map_queue = blk_mq_map_queue,
+};
+
 static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
unsigned int physical_sector_size,
unsigned int segments)
@@ -672,9 +661,22 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
struct request_queue *rq;
struct blkfront_info *info = gd->private_data;
 
-   rq = blk_init_queue(do_blkif_request, &info->io_lock);
-   if (rq == NULL)
+   memset(&info->tag_set, 0, sizeof(info->tag_set));
+   info->tag_set.ops = &blkfront_mq_ops;
+   info->tag_set.nr_hw_queues = 1;
+   info->tag_set.queue_depth =  BLK_RING_SIZE(info);
+   info->tag_set.numa_node = NUMA_NO_NODE;
+   info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
+   info->tag_set.cmd_size = 0;
+   info->tag_set.driver_data = info;
+
+   if (blk_mq_alloc_tag_set(&info->tag_set))
return -1;
+   rq = blk_mq_init_queue(&info->tag_set);
+   if (IS_ERR(rq)) {
+   blk_mq_free_tag_set(&info->tag_set);
+   return -1;
+   }
 
queue_flag_set_unlocked(QUEUE_FLAG_VIRT, rq);
 
@@ -902,19 +904,15 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
 static void xlvbd_release_gendisk(struct blkfront_info *info)
 {
unsigned int minor, nr_minors;
-   unsigned long

[Xen-devel] [PATCH v3 2/9] xen-block: add document for multi hardware queues/rings

2015-09-05 Thread Bob Liu
Document the multiple queues/rings extension of xen-block.

Signed-off-by: Bob Liu 
---
 include/xen/interface/io/blkif.h |   32 
 1 file changed, 32 insertions(+)

diff --git a/include/xen/interface/io/blkif.h b/include/xen/interface/io/blkif.h
index c33e1c4..b453b70 100644
--- a/include/xen/interface/io/blkif.h
+++ b/include/xen/interface/io/blkif.h
@@ -28,6 +28,38 @@ typedef uint16_t blkif_vdev_t;
 typedef uint64_t blkif_sector_t;
 
 /*
+ * Multiple hardware queues/rings:
+ * If supported, the backend will write the key "multi-queue-max-queues" to
+ * the directory for that vbd, and set its value to the maximum supported
+ * number of queues.
+ * Frontends that are aware of this feature and wish to use it can write the
+ * key "multi-queue-num-queues", set to the number they wish to use, which
+ * must be greater than zero, and no more than the value reported by the
+ * backend in "multi-queue-max-queues".
+ *
+ * For frontends requesting just one queue, the usual event-channel and
+ * ring-ref keys are written as before, simplifying the backend processing
+ * to avoid distinguishing between a frontend that doesn't understand the
+ * multi-queue feature, and one that does, but requested only one queue.
+ *
+ * Frontends requesting two or more queues must not write the toplevel
+ * event-channel and ring-ref keys, instead writing those keys under sub-keys
+ * having the name "queue-N" where N is the integer ID of the queue/ring to
+ * which those keys belong. Queues are indexed from zero.
+ * For example, a frontend with two queues must write the following set of
+ * queue-related keys:
+ *
+ * /local/domain/1/device/vbd/0/multi-queue-num-queues = "2"
+ * /local/domain/1/device/vbd/0/queue-0 = ""
+ * /local/domain/1/device/vbd/0/queue-0/ring-ref = ""
+ * /local/domain/1/device/vbd/0/queue-0/event-channel = ""
+ * /local/domain/1/device/vbd/0/queue-1 = ""
+ * /local/domain/1/device/vbd/0/queue-1/ring-ref = ""
+ * /local/domain/1/device/vbd/0/queue-1/event-channel = ""
+ *
+ */
+
+/*
  * REQUEST CODES.
  */
 #define BLKIF_OP_READ  0
-- 
1.7.10.4




[Xen-devel] [PATCH v3 6/9] xen/blkfront: negotiate the number of hw queues/rings with backend

2015-09-05 Thread Bob Liu
The maximum number of hardware queues for xen/blkfront is set by the module
parameter 'max_queues', while the number xen/blkback supports is advertised
through xenstore ("multi-queue-max-queues").

The negotiated number is the smaller of the two, and is written back to
xen/blkback as "multi-queue-num-queues".
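
A sketch of the frontend side of that negotiation (assumed shape of the
probe-time code; only xenbus_scanf(), the 'max_queues' parameter and the
xenstore keys come from the patch itself):

	unsigned int backend_max_queues = 0;

	/* Backend advertises its limit; absence means no multi-queue. */
	err = xenbus_scanf(XBT_NIL, dinfo->xbdev->otherend,
			   "multi-queue-max-queues", "%u", &backend_max_queues);
	if (err < 0)
		backend_max_queues = 1;

	/* The negotiated value is the smaller of the 'max_queues' module
	 * parameter and the backend's limit; it is written back later as
	 * "multi-queue-num-queues" in talk_to_blkback(). */
	dinfo->nr_rings = min(xen_blkif_max_queues, backend_max_queues);
	if (!dinfo->nr_rings)
		dinfo->nr_rings = 1;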

Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkfront.c |  142 --
 1 file changed, 108 insertions(+), 34 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 1cae76b..1aa66c9 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,6 +107,10 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
+static unsigned int xen_blkif_max_queues;
+module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
+MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings per virtual disk");
+
 #define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->pages_per_ring)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
@@ -114,6 +118,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
  * characters are enough. Define to 20 to keep consist with backend.
  */
 #define RINGREF_NAME_LEN (20)
+#define QUEUE_NAME_LEN (12)
 
 /*
  *  Per-ring info.
@@ -687,7 +692,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, u16 sector_size,
 
memset(&dinfo->tag_set, 0, sizeof(dinfo->tag_set));
dinfo->tag_set.ops = &blkfront_mq_ops;
-   dinfo->tag_set.nr_hw_queues = 1;
+   dinfo->tag_set.nr_hw_queues = dinfo->nr_rings;
dinfo->tag_set.queue_depth =  BLK_RING_SIZE(dinfo);
dinfo->tag_set.numa_node = NUMA_NO_NODE;
dinfo->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
@@ -1350,6 +1355,51 @@ fail:
return err;
 }
 
+static int write_per_ring_nodes(struct xenbus_transaction xbt,
+   struct blkfront_ring_info *rinfo, const char *dir)
+{
+   int err, i;
+   const char *message = NULL;
+   struct blkfront_dev_info *dinfo = rinfo->dinfo;
+
+   if (dinfo->pages_per_ring == 1) {
+   err = xenbus_printf(xbt, dir, "ring-ref", "%u", rinfo->ring_ref[0]);
+   if (err) {
+   message = "writing ring-ref";
+   goto abort_transaction;
+   }
+   pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[0]);
+   } else {
+   for (i = 0; i < dinfo->pages_per_ring; i++) {
+   char ring_ref_name[RINGREF_NAME_LEN];
+
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   err = xenbus_printf(xbt, dir, ring_ref_name,
+   "%u", rinfo->ring_ref[i]);
+   if (err) {
+   message = "writing ring-ref";
+   goto abort_transaction;
+   }
+   pr_info("%s: write ring-ref:%d\n", dir, 
rinfo->ring_ref[i]);
+   }
+   }
+
+   err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
+   if (err) {
+   message = "writing event-channel";
+   goto abort_transaction;
+   }
+   pr_info("%s: write event-channel:%d\n", dir, rinfo->evtchn);
+
+   return 0;
+
+abort_transaction:
+   xenbus_transaction_end(xbt, 1);
+   if (message)
+   xenbus_dev_fatal(dinfo->xbdev, err, "%s", message);
+
+   return err;
+}
 
 /* Common code used when first setting up, and when resuming. */
 static int talk_to_blkback(struct xenbus_device *dev,
@@ -1386,45 +1436,51 @@ again:
goto out;
}
 
+   if (dinfo->pages_per_ring > 1) {
+   err = xenbus_printf(xbt, dev->nodename, "ring-page-order", "%u",
+   ring_page_order);
+   if (err) {
+   message = "writing ring-page-order";
+   goto abort_transaction;
+   }
+   }
+
+   /* We already got the number of queues in _probe */
if (dinfo->nr_rings == 1) {
rinfo = &dinfo->rinfo[0];
 
-   if (dinfo->pages_per_ring == 1) {
-   err = xenbus_printf(xbt, dev->nodename,
-   "ring-ref", "%u", 
rinfo->ring_ref[0]);
-   if (err) {
-   message = "writing ring-ref";
-   goto abort_transaction;
-   }
-   } else {
-   err = xenbus_printf(xbt, dev->nodename,
-   "ring-page-o

[Xen-devel] [PATCH v3 3/9] xen/blkfront: separate per ring information out of device info

2015-09-05 Thread Bob Liu
Split the per-ring information out into a new structure, blkfront_ring_info;
also rename blkfront_info to blkfront_dev_info.

A ring is the representation of a hardware queue; every vbd device can
associate with one or more blkfront_ring_info, depending on how many
hardware queues/rings are to be used.

This patch is a preparation for supporting real multiple hardware queues/rings.

Signed-off-by: Arianna Avanzini 
Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkfront.c |  854 ++
 1 file changed, 445 insertions(+), 409 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index 5dd591d..bf416d5 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -107,7 +107,7 @@ static unsigned int xen_blkif_max_ring_order;
 module_param_named(max_ring_page_order, xen_blkif_max_ring_order, int, S_IRUGO);
 MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the shared ring");
 
-#define BLK_RING_SIZE(info) __CONST_RING_SIZE(blkif, PAGE_SIZE * (info)->nr_ring_pages)
+#define BLK_RING_SIZE(dinfo) __CONST_RING_SIZE(blkif, PAGE_SIZE * (dinfo)->nr_ring_pages)
 #define BLK_MAX_RING_SIZE __CONST_RING_SIZE(blkif, PAGE_SIZE * XENBUS_MAX_RING_PAGES)
 /*
  * ring-ref%i i=(-1UL) would take 11 characters + 'ring-ref' is 8, so 19
@@ -116,12 +116,31 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 #define RINGREF_NAME_LEN (20)
 
 /*
+ *  Per-ring info.
+ *  Every blkfront device can associate with one or more blkfront_ring_info,
+ *  depending on how many hardware queues to be used.
+ */
+struct blkfront_ring_info
+{
+   struct blkif_front_ring ring;
+   unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
+   unsigned int evtchn, irq;
+   struct work_struct work;
+   struct gnttab_free_callback callback;
+   struct blk_shadow shadow[BLK_MAX_RING_SIZE];
+   struct list_head grants;
+   struct list_head indirect_pages;
+   unsigned int persistent_gnts_c;
+   unsigned long shadow_free;
+   struct blkfront_dev_info *dinfo;
+};
+
+/*
  * We have one of these per vbd, whether ide, scsi or 'other'.  They
  * hang in private_data off the gendisk structure. We may end up
  * putting all kinds of interesting stuff here :-)
  */
-struct blkfront_info
-{
+struct blkfront_dev_info {
spinlock_t io_lock;
struct mutex mutex;
struct xenbus_device *xbdev;
@@ -129,18 +148,7 @@ struct blkfront_info
int vdevice;
blkif_vdev_t handle;
enum blkif_state connected;
-   int ring_ref[XENBUS_MAX_RING_PAGES];
-   unsigned int nr_ring_pages;
-   struct blkif_front_ring ring;
-   unsigned int evtchn, irq;
struct request_queue *rq;
-   struct work_struct work;
-   struct gnttab_free_callback callback;
-   struct blk_shadow shadow[BLK_MAX_RING_SIZE];
-   struct list_head grants;
-   struct list_head indirect_pages;
-   unsigned int persistent_gnts_c;
-   unsigned long shadow_free;
unsigned int feature_flush;
unsigned int feature_discard:1;
unsigned int feature_secdiscard:1;
@@ -149,7 +157,9 @@ struct blkfront_info
unsigned int feature_persistent:1;
unsigned int max_indirect_segments;
int is_ready;
+   unsigned int nr_ring_pages;
struct blk_mq_tag_set tag_set;
+   struct blkfront_ring_info rinfo;
 };
 
 static unsigned int nr_minors;
@@ -180,32 +190,33 @@ static DEFINE_SPINLOCK(minor_lock);
 #define INDIRECT_GREFS(_segs) \
((_segs + SEGS_PER_INDIRECT_FRAME - 1)/SEGS_PER_INDIRECT_FRAME)
 
-static int blkfront_setup_indirect(struct blkfront_info *info);
-static int blkfront_gather_backend_features(struct blkfront_info *info);
+static int blkfront_setup_indirect(struct blkfront_ring_info *rinfo);
+static int blkfront_gather_backend_features(struct blkfront_dev_info *dinfo);
 
-static int get_id_from_freelist(struct blkfront_info *info)
+static int get_id_from_freelist(struct blkfront_ring_info *rinfo)
 {
-   unsigned long free = info->shadow_free;
-   BUG_ON(free >= BLK_RING_SIZE(info));
-   info->shadow_free = info->shadow[free].req.u.rw.id;
-   info->shadow[free].req.u.rw.id = 0x0fee; /* debug */
+   unsigned long free = rinfo->shadow_free;
+
+   BUG_ON(free >= BLK_RING_SIZE(rinfo->dinfo));
+   rinfo->shadow_free = rinfo->shadow[free].req.u.rw.id;
+   rinfo->shadow[free].req.u.rw.id = 0x0fee; /* debug */
return free;
 }
 
-static int add_id_to_freelist(struct blkfront_info *info,
+static int add_id_to_freelist(struct blkfront_ring_info *rinfo,
   unsigned long id)
 {
-   if (info->shadow[id].req.u.rw.id != id)
+   if (rinfo->shadow[id].req.u.rw.id != id)
return -EINVAL;
-   if (info->shadow[id].request == NULL)
+   if (rinfo->shadow[id].request == NULL)
return -EINVAL;
-   info->shadow[id]

[Xen-devel] [PATCH v3 0/9] xen-block: support multi hardware-queues/rings

2015-09-05 Thread Bob Liu
Note: These patches were based on original work of Arianna's internship for
GNOME's Outreach Program for Women.

The first patch, which just converts the xen-blkfront driver to use the
blk-mq api, has been applied by David.

After using the blk-mq api, a guest has more than one (nr_vcpus) software
request queues associated with each block front. These queues can be mapped
over several rings (hardware queues) to the backend, making it very easy for
us to run multiple threads on the backend for a single virtual disk.

By having different threads issuing requests at the same time, the
performance of the guest can be improved significantly.
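
In blk-mq terms the mapping is driven by nr_hw_queues; condensed from the
tag_set setup added in patch 1 and extended in patch 6:

	/* blk-mq spreads the per-CPU software queues over nr_hw_queues
	 * hardware contexts; each hardware context is backed by one Xen
	 * ring, so nr_hw_queues == nr_rings. */
	dinfo->tag_set.ops = &blkfront_mq_ops;
	dinfo->tag_set.nr_hw_queues = dinfo->nr_rings;	/* 1 before patch 6 */
	dinfo->tag_set.queue_depth = BLK_RING_SIZE(dinfo);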

Test was done based on null_blk driver:
dom0: v4.2-rc8 16vcpus 10GB "modprobe null_blk"
domu: v4.2-rc8 16vcpus 10GB

[test]
rw=read or randread
direct=1
ioengine=libaio
bs=4k
time_based
runtime=30
filename=/dev/xvdb
numjobs=16
iodepth=64
iodepth_batch=64
iodepth_batch_complete=64
group_reporting

Seqread:
        dom0    domU(no_mq)  domU(4 queues)  8 queues  16 queues
iops:   1308k   690k         1380k(+200%)    1238k     1471k

Randread:
        dom0    domU(no_mq)  domU(4 queues)  8 queues  16 queues
iops:   1310k   279k         810k(+200%)     871k      1000k

Already with 4 queues, iops for the domU improve a lot and nearly catch up
with dom0. There were also similarly huge improvements for writes and for
real SSD storage.

---
v3: Rebased to v4.2-rc8

Bob Liu (9):
  xen-blkfront: convert to blk-mq APIs
  xen-block: add document for multi hardware queues/rings
  xen/blkfront: separate per ring information out of device info
  xen/blkfront: pseudo support for multi hardware queues/rings
  xen/blkfront: convert per device io_lock to per ring ring_lock
  xen/blkfront: negotiate the number of hw queues/rings with backend
  xen/blkback: separate ring information out of struct xen_blkif
  xen/blkback: pseudo support for multi hardware queues/rings
  xen/blkback: get number of hardware queues/rings from blkfront

 drivers/block/xen-blkback/blkback.c |  373 +-
 drivers/block/xen-blkback/common.h  |   53 +-
 drivers/block/xen-blkback/xenbus.c  |  376 ++
 drivers/block/xen-blkfront.c| 1343 ---
 include/xen/interface/io/blkif.h|   32 +
 5 files changed, 1278 insertions(+), 899 deletions(-)

-- 
1.7.10.4




[Xen-devel] [PATCH v3 8/9] xen/blkback: pseudo support for multi hardware queues/rings

2015-09-05 Thread Bob Liu
Preparation patch for multiple hardware queues/rings; the ring number is
set to 1 by force for now.

Signed-off-by: Arianna Avanzini 
Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkback/common.h |3 +-
 drivers/block/xen-blkback/xenbus.c |  328 +++-
 2 files changed, 209 insertions(+), 122 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index cc253d4..ba058a0 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -339,7 +339,8 @@ struct xen_blkif {
unsigned long long  st_wr_sect;
unsigned int nr_ring_pages;
/* All rings for this device */
-   struct xen_blkif_ring ring;
+   struct xen_blkif_ring *rings;
+   unsigned int nr_rings;
 };
 
 struct seg_buf {
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 6482ee3..04b8d0d 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -26,6 +26,7 @@
 /* Enlarge the array size in order to fully show blkback name. */
 #define BLKBACK_NAME_LEN (20)
 #define RINGREF_NAME_LEN (20)
+#define RINGREF_NAME_LEN (20)
 
 struct backend_info {
struct xenbus_device*dev;
@@ -84,11 +85,13 @@ static int blkback_name(struct xen_blkif *blkif, char *buf)
 
 static void xen_update_blkif_status(struct xen_blkif *blkif)
 {
-   int err;
+   int err, i;
char name[BLKBACK_NAME_LEN];
+   struct xen_blkif_ring *ring;
+   char per_ring_name[BLKBACK_NAME_LEN + 2];
 
/* Not ready to connect? */
-   if (!blkif->ring.irq || !blkif->vbd.bdev)
+   if (!blkif->rings || !blkif->rings[0].irq || !blkif->vbd.bdev)
return;
 
/* Already connected? */
@@ -113,19 +116,68 @@ static void xen_update_blkif_status(struct xen_blkif *blkif)
}
invalidate_inode_pages2(blkif->vbd.bdev->bd_inode->i_mapping);
 
-   blkif->ring.xenblkd = kthread_run(xen_blkif_schedule, &blkif->ring, "%s", name);
-   if (IS_ERR(blkif->ring.xenblkd)) {
-   err = PTR_ERR(blkif->ring.xenblkd);
-   blkif->ring.xenblkd = NULL;
-   xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
-   return;
+   if (blkif->nr_rings == 1) {
+   blkif->rings[0].xenblkd = kthread_run(xen_blkif_schedule, &blkif->rings[0], "%s", name);
+   if (IS_ERR(blkif->rings[0].xenblkd)) {
+   err = PTR_ERR(blkif->rings[0].xenblkd);
+   blkif->rings[0].xenblkd = NULL;
+   xenbus_dev_error(blkif->be->dev, err, "start xenblkd");
+   return;
+   }
+   } else {
+   for (i = 0; i < blkif->nr_rings; i++) {
+   snprintf(per_ring_name, BLKBACK_NAME_LEN + 2, "%s-%d", name, i);
+   ring = &blkif->rings[i];
+   ring->xenblkd = kthread_run(xen_blkif_schedule, ring, "%s", per_ring_name);
+   if (IS_ERR(ring->xenblkd)) {
+   err = PTR_ERR(ring->xenblkd);
+   ring->xenblkd = NULL;
+   xenbus_dev_error(blkif->be->dev, err,
+   "start %s xenblkd", 
per_ring_name);
+   return;
+   }
+   }
+   }
+}
+
+static int xen_blkif_alloc_rings(struct xen_blkif *blkif)
+{
+   struct xen_blkif_ring *ring;
+   int r;
+
+   blkif->rings = kzalloc(blkif->nr_rings * sizeof(struct xen_blkif_ring), GFP_KERNEL);
+   if (!blkif->rings)
+   return -ENOMEM;
+
+   for (r = 0; r < blkif->nr_rings; r++) {
+   ring = &blkif->rings[r];
+
+   spin_lock_init(&ring->blk_ring_lock);
+   init_waitqueue_head(&ring->wq);
+   ring->st_print = jiffies;
+   ring->persistent_gnts.rb_node = NULL;
+   spin_lock_init(&ring->free_pages_lock);
+   INIT_LIST_HEAD(&ring->free_pages);
+   INIT_LIST_HEAD(&ring->persistent_purge_list);
+   ring->free_pages_num = 0;
+   atomic_set(&ring->persistent_gnt_in_use, 0);
+   atomic_set(&ring->inflight, 0);
+   INIT_WORK(&ring->persistent_purge_work, xen_blkbk_unmap_purged_grants);
+   INIT_LIST_HEAD(&ring->pending_free);
+
+   spin_lock_init(&ring->pending_free_lock);
+   init_waitqueue_head(&ring->pending_free_wq);
+   init_waitqueue_head(&ring->shutdown_wq);
+   ring->blkif = blkif;
+   xen_blkif_get(blkif);
}
+
+   return 0;
 }
 
 static struct xen_blkif *xen_blkif_alloc(domid_t domid)
 {
struct xen_blkif *blkif;
-   struct xen_blkif_ring *ring;
 
BUILD_BUG_ON(MAX_INDIRECT_PAGES > BLKIF_MAX_INDIRECT_PAGES_PER_REQUEST);
 
@@ 

[Xen-devel] [PATCH v3 7/9] xen/blkback: separate ring information out of struct xen_blkif

2015-09-05 Thread Bob Liu
Split the per-ring information out into a new structure, xen_blkif_ring, so
that one vbd device can associate with one or more rings/hardware queues.

This patch is a preparation for supporting multiple hardware queues/rings.

Signed-off-by: Arianna Avanzini 
Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkback/blkback.c |  365 ++-
 drivers/block/xen-blkback/common.h  |   52 +++--
 drivers/block/xen-blkback/xenbus.c  |  130 +++--
 3 files changed, 295 insertions(+), 252 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index 954c002..fd02240 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -113,71 +113,71 @@ module_param(log_stats, int, 0644);
 /* Number of free pages to remove on each call to gnttab_free_pages */
 #define NUM_BATCH_FREE_PAGES 10
 
-static inline int get_free_page(struct xen_blkif *blkif, struct page **page)
+static inline int get_free_page(struct xen_blkif_ring *ring, struct page **page)
 {
unsigned long flags;
 
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
-   if (list_empty(&blkif->free_pages)) {
-   BUG_ON(blkif->free_pages_num != 0);
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   spin_lock_irqsave(&ring->free_pages_lock, flags);
+   if (list_empty(&ring->free_pages)) {
+   BUG_ON(ring->free_pages_num != 0);
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
return gnttab_alloc_pages(1, page);
}
-   BUG_ON(blkif->free_pages_num == 0);
-   page[0] = list_first_entry(&blkif->free_pages, struct page, lru);
+   BUG_ON(ring->free_pages_num == 0);
+   page[0] = list_first_entry(&ring->free_pages, struct page, lru);
list_del(&page[0]->lru);
-   blkif->free_pages_num--;
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   ring->free_pages_num--;
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 
return 0;
 }
 
-static inline void put_free_pages(struct xen_blkif *blkif, struct page **page,
+static inline void put_free_pages(struct xen_blkif_ring *ring, struct page **page,
   int num)
 {
unsigned long flags;
int i;
 
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
+   spin_lock_irqsave(&ring->free_pages_lock, flags);
for (i = 0; i < num; i++)
-   list_add(&page[i]->lru, &blkif->free_pages);
-   blkif->free_pages_num += num;
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   list_add(&page[i]->lru, &ring->free_pages);
+   ring->free_pages_num += num;
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
 }
 
-static inline void shrink_free_pagepool(struct xen_blkif *blkif, int num)
+static inline void shrink_free_pagepool(struct xen_blkif_ring *ring, int num)
 {
/* Remove requested pages in batches of NUM_BATCH_FREE_PAGES */
struct page *page[NUM_BATCH_FREE_PAGES];
unsigned int num_pages = 0;
unsigned long flags;
 
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
-   while (blkif->free_pages_num > num) {
-   BUG_ON(list_empty(&blkif->free_pages));
-   page[num_pages] = list_first_entry(&blkif->free_pages,
+   spin_lock_irqsave(&ring->free_pages_lock, flags);
+   while (ring->free_pages_num > num) {
+   BUG_ON(list_empty(&ring->free_pages));
+   page[num_pages] = list_first_entry(&ring->free_pages,
   struct page, lru);
list_del(&page[num_pages]->lru);
-   blkif->free_pages_num--;
+   ring->free_pages_num--;
if (++num_pages == NUM_BATCH_FREE_PAGES) {
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
gnttab_free_pages(num_pages, page);
-   spin_lock_irqsave(&blkif->free_pages_lock, flags);
+   spin_lock_irqsave(&ring->free_pages_lock, flags);
num_pages = 0;
}
}
-   spin_unlock_irqrestore(&blkif->free_pages_lock, flags);
+   spin_unlock_irqrestore(&ring->free_pages_lock, flags);
if (num_pages != 0)
gnttab_free_pages(num_pages, page);
 }
 
 #define vaddr(page) ((unsigned long)pfn_to_kaddr(page_to_pfn(page)))
 
-static int do_block_io_op(struct xen_blkif *blkif);
-static int dispatch_rw_block_io(struct xen_blkif *blkif,
+static int do_block_io_op(struct xen_blkif_ring *ring);
+static int dispatch_rw_block_io(struct xen_blkif_ring *ring,
struct blkif_request *req,
struct pending_req *pending_req);
-static void make_response(struct xen_blkif 

[Xen-devel] [PATCH v3 5/9] xen/blkfront: convert per device io_lock to per ring ring_lock

2015-09-05 Thread Bob Liu
The per-device io_lock became a coarser-grained lock after multi
queues/rings were introduced; this patch converts it to a fine-grained
per-ring lock.

NOTE: the per-device dev_info structure is no longer protected by any lock.

Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkfront.c |   44 +++---
 1 file changed, 20 insertions(+), 24 deletions(-)

diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
index bf45c99..1cae76b 100644
--- a/drivers/block/xen-blkfront.c
+++ b/drivers/block/xen-blkfront.c
@@ -123,6 +123,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order of pages to be used for the
 struct blkfront_ring_info
 {
struct blkif_front_ring ring;
+   spinlock_t ring_lock;
unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
unsigned int evtchn, irq;
struct work_struct work;
@@ -141,7 +142,6 @@ struct blkfront_ring_info
  * putting all kinds of interesting stuff here :-)
  */
 struct blkfront_dev_info {
-   spinlock_t io_lock;
struct mutex mutex;
struct xenbus_device *xbdev;
struct gendisk *gd;
@@ -637,29 +637,28 @@ static int blkif_queue_rq(struct blk_mq_hw_ctx *hctx,
   const struct blk_mq_queue_data *qd)
 {
	struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)hctx->driver_data;
-   struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
blk_mq_start_request(qd->rq);
-   spin_lock_irq(&dinfo->io_lock);
+   spin_lock_irq(&rinfo->ring_lock);
if (RING_FULL(&rinfo->ring))
goto out_busy;
 
-   if (blkif_request_flush_invalid(qd->rq, dinfo))
+   if (blkif_request_flush_invalid(qd->rq, rinfo->dinfo))
goto out_err;
 
if (blkif_queue_request(qd->rq, rinfo))
goto out_busy;
 
flush_requests(rinfo);
-   spin_unlock_irq(&dinfo->io_lock);
+   spin_unlock_irq(&rinfo->ring_lock);
return BLK_MQ_RQ_QUEUE_OK;
 
 out_err:
-   spin_unlock_irq(&dinfo->io_lock);
+   spin_unlock_irq(&rinfo->ring_lock);
return BLK_MQ_RQ_QUEUE_ERROR;
 
 out_busy:
-   spin_unlock_irq(&dinfo->io_lock);
+   spin_unlock_irq(&rinfo->ring_lock);
blk_mq_stop_hw_queue(hctx);
return BLK_MQ_RQ_QUEUE_BUSY;
 }
@@ -961,7 +960,7 @@ static void xlvbd_release_gendisk(struct blkfront_dev_info *dinfo)
dinfo->gd = NULL;
 }
 
-/* Must be called with io_lock holded */
+/* Must be called with ring_lock held */
 static void kick_pending_request_queues(struct blkfront_ring_info *rinfo)
 {
if (!RING_FULL(&rinfo->ring))
@@ -972,10 +971,10 @@ static void blkif_restart_queue(struct work_struct *work)
 {
	struct blkfront_ring_info *rinfo = container_of(work, struct blkfront_ring_info, work);
 
-   spin_lock_irq(&rinfo->dinfo->io_lock);
+   spin_lock_irq(&rinfo->ring_lock);
if (rinfo->dinfo->connected == BLKIF_STATE_CONNECTED)
kick_pending_request_queues(rinfo);
-   spin_unlock_irq(&rinfo->dinfo->io_lock);
+   spin_unlock_irq(&rinfo->ring_lock);
 }
 
 static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
@@ -986,7 +985,6 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
struct blkfront_ring_info *rinfo;
 
/* Prevent new requests being issued until we fix things up. */
-   spin_lock_irq(&dinfo->io_lock);
dinfo->connected = suspend ?
BLKIF_STATE_SUSPENDED : BLKIF_STATE_DISCONNECTED;
/* No more blkif_request(). */
@@ -996,6 +994,7 @@ static void blkif_free(struct blkfront_dev_info *dinfo, int suspend)
for (r_index = 0; r_index < dinfo->nr_rings; r_index++) {
rinfo = &dinfo->rinfo[r_index];
 
+   spin_lock_irq(&rinfo->ring_lock);
/* Remove all persistent grants */
if (!list_empty(&rinfo->grants)) {
list_for_each_entry_safe(persistent_gnt, n,
@@ -1071,7 +1070,7 @@ free_shadow:
 
/* No more gnttab callback work. */
gnttab_cancel_free_callback(&rinfo->callback);
-   spin_unlock_irq(&dinfo->io_lock);
+   spin_unlock_irq(&rinfo->ring_lock);
 
		/* Flush gnttab callback work. Must be done with no locks held. */
flush_work(&rinfo->work);
@@ -1180,13 +1179,10 @@ static irqreturn_t blkif_interrupt(int irq, void *dev_id)
struct blkfront_ring_info *rinfo = (struct blkfront_ring_info *)dev_id;
struct blkfront_dev_info *dinfo = rinfo->dinfo;
 
-   spin_lock_irqsave(&dinfo->io_lock, flags);
-
-   if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED)) {
-   spin_unlock_irqrestore(&dinfo->io_lock, flags);
+   if (unlikely(dinfo->connected != BLKIF_STATE_CONNECTED))
return IRQ_HANDLED;
-   }
 
+   spin_lock_irqsave(&rinfo->ring_lock, flags);
  again:
rp = rinfo->ring.sring->rsp_prod;

[Xen-devel] [PATCH v3 9/9] xen/blkback: get number of hardware queues/rings from blkfront

2015-09-05 Thread Bob Liu
The backend advertises "multi-queue-max-queues" to the frontend, and then
reads back the final negotiated number of queues/rings from
"multi-queue-num-queues", which is written by blkfront.

Signed-off-by: Bob Liu 
---
 drivers/block/xen-blkback/blkback.c |8 
 drivers/block/xen-blkback/xenbus.c  |   36 ++-
 2 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/drivers/block/xen-blkback/blkback.c b/drivers/block/xen-blkback/blkback.c
index fd02240..b904fe05f0 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -83,6 +83,11 @@ module_param_named(max_persistent_grants, xen_blkif_max_pgrants, int, 0644);
 MODULE_PARM_DESC(max_persistent_grants,
  "Maximum number of grants to map persistently");
 
+unsigned int xenblk_max_queues;
+module_param_named(max_queues, xenblk_max_queues, uint, 0644);
+MODULE_PARM_DESC(max_queues,
+"Maximum number of hardware queues per virtual disk");
+
 /*
  * Maximum order of pages to be used for the shared ring between front and
  * backend, 4KB page granularity is used.
@@ -1458,6 +1463,9 @@ static int __init xen_blkif_init(void)
xen_blkif_max_ring_order = XENBUS_MAX_RING_PAGE_ORDER;
}
 
+   /* Allow as many queues as there are CPUs, by default */
+   xenblk_max_queues = num_online_cpus();
+
rc = xen_blkif_interface_init();
if (rc)
goto failed_init;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 04b8d0d..aa97ea5 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -28,6 +28,8 @@
 #define RINGREF_NAME_LEN (20)
 #define RINGREF_NAME_LEN (20)
 
+extern unsigned int xenblk_max_queues;
+
 struct backend_info {
struct xenbus_device*dev;
struct xen_blkif*blkif;
@@ -191,11 +193,6 @@ static struct xen_blkif *xen_blkif_alloc(domid_t domid)
atomic_set(&blkif->drain, 0);
INIT_WORK(&blkif->free_work, xen_blkif_deferred_free);
 
-   blkif->nr_rings = 1;
-   if (xen_blkif_alloc_rings(blkif)) {
-   kmem_cache_free(xen_blkif_cachep, blkif);
-   return ERR_PTR(-ENOMEM);
-   }
return blkif;
 }
 
@@ -618,6 +615,14 @@ static int xen_blkbk_probe(struct xenbus_device *dev,
goto fail;
}
 
+   /* Multi-queue: write how many queues the backend supports. */
+   err = xenbus_printf(XBT_NIL, dev->nodename,
+   "multi-queue-max-queues", "%u", xenblk_max_queues);
+   if (err) {
+   pr_debug("Error writing multi-queue-num-queues\n");
+   goto fail;
+   }
+
/* setup back pointer */
be->blkif->be = be;
 
@@ -1008,6 +1013,7 @@ static int connect_ring(struct backend_info *be)
char *xspath;
size_t xspathsize;
	const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
+   unsigned int requested_num_queues = 0;
 
pr_debug("%s %s\n", __func__, dev->otherend);
 
@@ -1035,6 +1041,26 @@ static int connect_ring(struct backend_info *be)
be->blkif->vbd.feature_gnt_persistent = pers_grants;
be->blkif->vbd.overflow_max_grants = 0;
 
+   /*
+* Read the number of hardware queues from the frontend.
+*/
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "multi-queue-num-queues", "%u", &requested_num_queues);
+   if (err < 0) {
+   requested_num_queues = 1;
+   } else {
+   if (requested_num_queues > xenblk_max_queues
+   || requested_num_queues == 0) {
+   /* buggy or malicious guest */
+   xenbus_dev_fatal(dev, err,
+   "guest requested %u queues, exceeding the maximum of %u.",
+   requested_num_queues, xenblk_max_queues);
+   return -1;
+   }
+   }
+   be->blkif->nr_rings = requested_num_queues;
+   if (xen_blkif_alloc_rings(be->blkif))
+   return -ENOMEM;
+
pr_info("nr_rings:%d protocol %d (%s) %s\n", be->blkif->nr_rings,
 be->blkif->blk_protocol, protocol,
 pers_grants ? "persistent grants" : "");
-- 
1.7.10.4




[Xen-devel] [xen-unstable test] 61352: trouble: blocked/broken

2015-09-05 Thread osstest service owner
flight 61352 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/61352/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   3 host-install(3) broken REGR. vs. 61059
 build-armhf-xsm   3 host-install(3) broken REGR. vs. 61059
 build-amd64-prev  3 host-install(3) broken REGR. vs. 61059
 build-amd64-pvops 3 host-install(3) broken REGR. vs. 61059
 build-amd64-oldkern   3 host-install(3) broken REGR. vs. 61059
 build-amd64   3 host-install(3) broken REGR. vs. 61059
 build-amd64-xsm   3 host-install(3) broken REGR. vs. 61059
 build-i386-oldkern3 host-install(3) broken REGR. vs. 61059
 build-i386-prev   3 host-install(3) broken REGR. vs. 61059
 build-i386-pvops  3 host-install(3) broken REGR. vs. 61059
 build-i3863 host-install(3) broken REGR. vs. 61059
 build-i386-xsm3 host-install(3) broken REGR. vs. 61059
 build-armhf-pvops 3 host-install(3) broken REGR. vs. 61059

Tests which did not succeed, but are not blocking:
 build-amd64-rumpuserxen   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-xl-pvh-amd   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-intel  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-raw   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 build-i386-rumpuserxen1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-qcow2 1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-xsm1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-stubdom-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-i386-migrupgrade   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64-xsm  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-armhf-armhf-xl-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu

[Xen-devel] [distros-debian-snapshot test] 37897: regressions - FAIL

2015-09-05 Thread Platform Team regression test user
flight 37897 distros-debian-snapshot real [real]
http://osstest.xs.citrite.net/~osstest/testlogs/logs/37897/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-i386-current-netinst-pygrub 7 host-ping-check-xen fail REGR. vs. 37846

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-amd64-weekly-netinst-pygrub 13 guest-saverestore fail like 37846
 test-amd64-amd64-amd64-weekly-netinst-pygrub 13 guest-saverestore fail like 37846

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-amd64-daily-netboot-pvgrub 13 guest-saverestore fail never pass
 test-armhf-armhf-armhf-daily-netboot-pygrub 9 debian-di-install fail never pass

baseline version:
 flight   37846

jobs:
 build-amd64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-amd64-daily-netboot-pvgrub  fail
 test-amd64-i386-i386-daily-netboot-pvgrubpass
 test-amd64-i386-amd64-daily-netboot-pygrub   pass
 test-armhf-armhf-armhf-daily-netboot-pygrub  fail
 test-amd64-amd64-i386-daily-netboot-pygrub   pass
 test-amd64-amd64-amd64-current-netinst-pygrubpass
 test-amd64-i386-amd64-current-netinst-pygrub pass
 test-amd64-amd64-i386-current-netinst-pygrub fail
 test-amd64-i386-i386-current-netinst-pygrub  pass
 test-amd64-amd64-amd64-weekly-netinst-pygrub fail
 test-amd64-i386-amd64-weekly-netinst-pygrub  fail
 test-amd64-amd64-i386-weekly-netinst-pygrub  pass
 test-amd64-i386-i386-weekly-netinst-pygrub   pass



sg-report-flight on osstest.xs.citrite.net
logs: /home/osstest/logs
images: /home/osstest/images

Logs, config files, etc. are available at
http://osstest.xs.citrite.net/~osstest/testlogs/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Push not applicable.




[Xen-devel] [xen-4.5-testing test] 61356: trouble: blocked/broken

2015-09-05 Thread osstest service owner
flight 61356 xen-4.5-testing real [real]
http://logs.test-lab.xenproject.org/osstest/logs/61356/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf   3 host-install(3) broken REGR. vs. 60723
 build-amd64   3 host-install(3) broken REGR. vs. 60723
 build-i3863 host-install(3) broken REGR. vs. 60723
 build-amd64-pvops 3 host-install(3) broken REGR. vs. 60723
 build-i386-pvops  3 host-install(3) broken REGR. vs. 60723
 build-armhf-pvops 3 host-install(3) broken REGR. vs. 60723

Regressions which are regarded as allowable (not blocking):
 build-i386-prev   3 host-install(3)   broken baseline untested
 build-amd64-prev  3 host-install(3)   broken baseline untested

Tests which did not succeed, but are not blocking:
 test-amd64-i386-migrupgrade   1 build-check(1)   blocked  n/a
 test-amd64-amd64-migrupgrade  1 build-check(1)   blocked  n/a
 build-amd64-rumpuserxen   1 build-check(1)   blocked  n/a
 build-i386-rumpuserxen1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-amd   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-pvh-intel  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-qcow2 1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-vhd   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-raw   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-i386-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-debianhvm-amd64  1 build-check(1) blocked n/a
 test-armhf-armhf-xl-rtds  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-amd64-i386-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-pair 1 build-check(1)   blocked  n/a
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qcow2 1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl   1 build-check(1)   blocked  n/a
 test-amd64-amd64-amd64-pvgrub  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-arndale   1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-credit2   1 build-check(1)   blocked  n/a
 test-amd64-i386-qemut-rhel6hvm-amd  1 build-check(1)   blocked n/a
 test-amd64-amd64-xl-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qemuu-ovmf-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-i386-xl-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-multivcpu  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-intel  1 build-check(1) blocked n/a
 test-amd64-amd64-pygrub   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-qemuu-ovmf-amd64  1 build-check(1) blocked n/a
 test-amd64-amd64-xl-qemut-debianhvm-amd64  1 build-check(1)blocked n/a
 test-amd64-amd64-xl-rtds  1 build-check(1)   blocked  n/a
 test-armhf-armhf-xl-cubietruck  1 build-check(1)   blocked  n/a
 test-amd64-i386-qemuu-rhel6hvm-amd  1 build-check(1)

Re: [Xen-devel] [PATCH 0/5] Intel Code/Data Prioritization(CDP) feature enabling

2015-09-05 Thread Chao Peng
On Wed, Sep 02, 2015 at 01:08:33PM +0100, Andrew Cooper wrote:
> On 02/09/15 09:27, He Chen wrote:
> > Hi all,
> >
> > Code/Data Prioritization (CDP) is offered in Intel Broadwell and later server
> > platforms, and is an extension of CAT. CDP enables isolation and separate
> > prioritization of code and data fetches to the L3 cache in a software
> > configurable manner, which can enable workload prioritization and tuning of
> > cache capacity to the characteristics of the workload. CDP extends Cache
> > Allocation Technology (CAT) by providing separate code and data capacity bit
> > masks (CBM) per Class of Service (COS). CDP is used on a per-VM basis in the
> > Xen implementation.
> >
> > For more information about CDP, please refer to the Intel SDM, Volume 3,
> > section 17.16:
> > http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> >
> > This patch series enables CDP feature in Xen based on CAT code, including
> > extending CBM operation functions and introducing new commands to 
> > enable/disable
> > CDP dynamically. For all the changes, please see in each patch.
> >
> > This patchset has been tested on Intel Broadwell server platform.
> >
> > To make this patchset better, any comment or suggestion is welcomed, I would
> > really appreciate it.
> 
> I have taken a look at patches 1-3.  For the most part, it looks good.
> 
> The main point I have is on patch 2, as to whether it is sensible to
> permit enabling/disabling cdp at runtime.  I suggest that it is not
> sensible, and should be a command line parameter instead.
> 
> If this is agreed as ok going forwards, patches 3 through 5 should
> become rather more simple.

I guess it would be OK, and a helpful change.
Let's see the next version.

Chao



[Xen-devel] [libvirt test] 61360: trouble: blocked/broken

2015-09-05 Thread osstest service owner
flight 61360 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/61360/

Failures and problems with tests :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-armhf-xsm   3 host-install(3) broken REGR. vs. 61304
 build-armhf   3 host-install(3) broken REGR. vs. 61304
 build-armhf-pvops 3 host-install(3) broken REGR. vs. 61304
 build-i386-pvops  3 host-install(3) broken REGR. vs. 61304
 build-i386-xsm3 host-install(3) broken REGR. vs. 61304
 build-i3863 host-install(3) broken REGR. vs. 61304
 build-amd64   3 host-install(3) broken REGR. vs. 61304
 build-amd64-pvops 3 host-install(3) broken REGR. vs. 61304
 build-amd64-xsm   3 host-install(3) broken REGR. vs. 61304

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-vhd  1 build-check(1)   blocked  n/a
 build-amd64-libvirt   1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-pair  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-raw   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 1 build-check(1) blocked n/a
 test-amd64-i386-libvirt   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-xsm   1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-vhd  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-xsm  1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-vhd   1 build-check(1)   blocked  n/a
 test-amd64-i386-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-raw  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-qcow2  1 build-check(1)   blocked  n/a
 test-amd64-amd64-libvirt-raw  1 build-check(1)   blocked  n/a
 test-armhf-armhf-libvirt-xsm  1 build-check(1)   blocked  n/a
 build-armhf-libvirt   1 build-check(1)   blocked  n/a
 build-i386-libvirt1 build-check(1)   blocked  n/a

version targeted for testing:
 libvirt  62c677849509728419ba700507be6a8ec38a39f7
baseline version:
 libvirt  5c668a78d85b0d71e6ac8e23f2c605058b44df65

Last test of basis61304  2015-09-02 12:32:03 Z3 days
Failing since 61350  2015-09-04 13:14:40 Z1 days2 attempts
Testing same since61360  2015-09-05 05:09:40 Z0 days1 attempts


People who touched revisions under test:
  Erik Skultety 
  Jim Fehlig 
  Jiri Denemark 
  John Ferlan 
  Laine Stump 
  Martin Kletzander 
  Michal Privoznik 
  Pavel Hrdina 

jobs:
 build-amd64-xsm  broken  
 build-armhf-xsm  broken  
 build-i386-xsm   broken  
 build-amd64  broken  
 build-armhf  broken  
 build-i386   broken  
 build-amd64-libvirt  blocked 
 build-armhf-libvirt  blocked 
 build-i386-libvirt   blocked 
 build-amd64-pvopsbroken  
 build-armhf-pvopsbroken  
 build-i386-pvops broken  
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   blocked 
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmblocked 
 test-amd64-amd64-libvirt-xsm blocked 
 test-armhf-armhf-libvirt-xsm blocked 
 test-amd64-i386-libvirt-xsm  blocked 
 test-amd64-amd64-libvirt blocked 
 test-armhf-armhf-libvirt blocked 
 test-amd64-i386-libvirt  blocked 
 test-amd64-amd64-libvirt-pairblocked 
 test-amd64-i386-libvirt-pair blocked 
 test-amd64-amd64-libvirt-qcow2   blocked 
 test-armhf-armhf-libvirt-qcow2   blocked 
 test-amd6

Re: [Xen-devel] [PATCH v6 04/18] vt-d: VT-d Posted-Interrupts feature detection

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, September 04, 2015 10:31 PM
> To: Wu, Feng
> Cc: Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org
> Subject: Re: [Xen-devel] [PATCH v6 04/18] vt-d: VT-d Posted-Interrupts feature
> detection
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > --- a/xen/drivers/passthrough/vtd/iommu.c
> > +++ b/xen/drivers/passthrough/vtd/iommu.c
> > @@ -2079,6 +2079,9 @@ static int init_vtd_hw(void)
> >  disable_intremap(drhd->iommu);
> >  }
> >
> > +if ( !iommu_intremap )
> > +iommu_intpost = 0;
> 
> Why is this being repeated here? iommu_setup() is already taking
> care of this thanks to the earlier patch.

Here was my original thought: we have two paths to get here, as below:

iommu_setup() -> ... -> init_vtd_hw()
vtd_resume() -> ... -> init_vtd_hw()

I added the logic here to cover the vtd_resume() case. However, after
thinking about it more, it seems I don't need to do this: the clamping
has already been done before suspend, so it should still be in effect
after resume, right?
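
For reference, my understanding of the ordering, as a minimal sketch
(simplified, not the literal code; the clamp is the one added to
iommu_setup() by the earlier patch):

void __init iommu_setup(void)
{
    /* ... probe hardware; this may clear iommu_intremap ... */
    if ( !iommu_intremap )
        iommu_intpost = 0;   /* posting depends on remapping */
    /*
     * init_vtd_hw() runs after this on boot, and vtd_resume()
     * re-enters init_vtd_hw() later with the flags already settled.
     */
}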

> 
> > @@ -2176,6 +2179,14 @@ int __init intel_vtd_setup(void)
> >  if ( iommu_intremap && !ecap_intr_remap(iommu->ecap) )
> >  iommu_intremap = 0;
> >
> > +/*
> > + * We cannot use posted interrupt if X86_FEATURE_CX16 is
> > + * not supported, since we count on this feature to
> > + * atomically update 16-byte IRTE in posted format.
> > + */
> > +if ( !iommu_intremap || !cap_intr_post(iommu->cap)
> || !cpu_has_cx16 )
> > +iommu_intpost = 0;
> 
> You should check iommu_intpost instead of iommu_intremap to
> match the other checks above here.

Here I followed your advice:
http://lists.xenproject.org/archives/html/xen-devel/2015-06/msg01261.html

I think the following code should be ok here though, if you like it, I can 
change back.

+if ( !iommu_intpost || !cap_intr_post(iommu->cap)
+|| !cpu_has_cx16 )
+iommu_intpost = 0;


Thanks,
Feng

> 
> Jan




Re: [Xen-devel] [PATCH v6 06/18] vmx: Add some helper functions for Posted-Interrupts

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, September 04, 2015 10:40 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Tian, Kevin; xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [PATCH v6 06/18] vmx: Add some helper functions for
> Posted-Interrupts
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > @@ -121,11 +122,31 @@ static inline int pi_test_and_clear_on(struct
> pi_desc *pi_desc)
> >  return test_and_clear_bit(POSTED_INTR_ON, &pi_desc->control);
> >  }
> >
> > +static inline int pi_test_on(struct pi_desc *pi_desc)
> > +{
> > +return test_bit(POSTED_INTR_ON, &pi_desc->control);
> > +}
> 
> For this and ...
> 
> > +static inline int pi_test_sn(struct pi_desc *pi_desc)
> > +{
> > +return test_bit(POSTED_INTR_SN, &pi_desc->control);
> > +}
> 
> ... this I wonder whether using the bitfield you defined in the
> previous patch wouldn't allow the compiler more freedom in
> how to carry this out.

I am sorry, I don't quite understand. Do you mean that the bitfield
defined in the previous patch is pointless, or that I should use the
bitfield here? If the latter, I suppose it would look something like
the sketch below.
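
Sketch only (the 'on'/'sn' member names are assumed from the bitfield
in the previous patch):

static inline int pi_test_on(struct pi_desc *pi_desc)
{
    /* A plain load through the bitfield instead of test_bit(), which
     * may give the compiler more freedom. */
    return pi_desc->on;
}

static inline int pi_test_sn(struct pi_desc *pi_desc)
{
    return pi_desc->sn;
}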

Thanks,
Feng

> 
> Jan




Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-09-05 Thread Chen, Tiejun

Sorry for the delayed response; I was on vacation.

On 9/5/2015 5:52 AM, Tamas K Lengyel wrote:

On Fri, Sep 4, 2015 at 2:17 AM, Jan Beulich  wrote:


>>> On 03.09.15 at 21:39,  wrote:
> So this change in 4.6 prevents me from passing through devices that have
> worked previously with VT-d:
>
> (XEN) [VT-D] cannot assign :00:1a.0 with shared RMRR at ae8a9000 for
> Dom30.
> (XEN) [VT-D] cannot assign :00:1d.0 with shared RMRR at ae8a9000 for
> Dom31.
>
> The devices are the USB 2.0 devices on a DQ67SW motherboard:
>
> 00:1a.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset
> Family USB Enhanced Host Controller #2 (rev 04)
> 00:1d.0 USB controller: Intel Corporation 6 Series/C200 Series Chipset
> Family USB Enhanced Host Controller #1 (rev 04)

Please don't top post. And I'm also puzzled by you sending this to
Ian rather than the author.



Hm, I've just hit reply-all to the latest message I've found in the thread.




Technically - Tiejun, should this perhaps be permitted in relaxed
mode, at least until group assignment gets implemented? (Or


I agree. What about this?

diff --git a/xen/drivers/passthrough/vtd/iommu.c 
b/xen/drivers/passthrough/vtd/iommu.c

index 836aed5..038776a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
  PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
+u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
+
 printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+   " Currently its %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? "disallowed" : "risky",
seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-return -EPERM;
+if ( !relaxed )
+return -EPERM;
 }
 }



Tamas, do you actually mean to assign these to _different_
guests, considering the log fragment above?)



No, I actually want to assign them to the same domain. The domain creation
fails with either of those devices specified for passthrough whether they
are to be attached to the same domain or not.



Tamas, could you try this in your case?

Thanks
Tiejun



Re: [Xen-devel] [PATCH v6 07/18] vmx: Initialize VT-d Posted-Interrupts Descriptor

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, September 04, 2015 10:47 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Tian, Kevin; xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [PATCH v6 07/18] vmx: Initialize VT-d Posted-Interrupts 
> Descriptor
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmcs.c
> > +++ b/xen/arch/x86/hvm/vmx/vmcs.c
> > @@ -39,6 +39,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> 
> I can't seem to spot what this is needed for.

It is for 'x2apic_enabled'.

> > @@ -1089,6 +1104,9 @@ static int construct_vmcs(struct vcpu *v)
> >
> >  if ( cpu_has_vmx_posted_intr_processing )
> >  {
> > +if ( iommu_intpost )
> > +pi_desc_init(v);
> 
> If this is going to remain the only call site of the function, I'd suggest
> expanding it here. This is because calling the function from elsewhere
> is, at least at the first glance, unsafe: It doesn't update pi_desc
> atomically, which is in (only apparent?) conflict with the atomic
> modifications helpers added in an earlier patch.
> 
> If further call sites will get added by later patches, clarifying in a
> comment why the non-atomic updates are okay would seem
> necessary; alternatively change them to become atomic.

I will add some comments to this function, along the lines of the draft below.
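
Draft of the comment (sketch; wording to be polished):

/*
 * pi_desc_init() is only called from construct_vmcs(), before the
 * vCPU runs for the first time, so nothing else can race with these
 * non-atomic updates of pi_desc.
 */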

Thanks,
Feng

> 
> Jan




Re: [Xen-devel] [PATCH v6 08/18] vmx: Suppress posting interrupts when 'SN' is set

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, September 04, 2015 10:53 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Tian, Kevin; xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [PATCH v6 08/18] vmx: Suppress posting interrupts when 'SN' is 
> set
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -1701,8 +1701,36 @@ static void vmx_deliver_posted_intr(struct vcpu
> *v, u8 vector)
> >   */
> >  pi_set_on(&v->arch.hvm_vmx.pi_desc);
> >  }
> > -else if ( !pi_test_and_set_on(&v->arch.hvm_vmx.pi_desc) )
> > +else
> >  {
> > +struct pi_desc old, new, prev;
> > +
> > +/* To skip over first check in the loop below. */
> > +prev.control = 0;
> 
> Why don't you just read the field instead of adding the comment?

What do you mean by "read the field"? If you mean seeding 'prev' from the current descriptor contents instead of zeroing it, i.e. something like the sketch below, could you please confirm?
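
/* Sketch of one reading of the suggestion: start from the live
 * descriptor contents, so the first cmpxchg comparison can succeed
 * instead of being forced to fail with 0. */
prev.control = v->arch.hvm_vmx.pi_desc.control;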

Thanks,
Feng



Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-09-05 Thread Tamas K Lengyel
>
> diff --git a/xen/drivers/passthrough/vtd/iommu.c
> b/xen/drivers/passthrough/vtd/iommu.c
> index 836aed5..038776a 100644
> --- a/xen/drivers/passthrough/vtd/iommu.c
> +++ b/xen/drivers/passthrough/vtd/iommu.c
> @@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
>   PCI_DEVFN2(bdf) == devfn &&
>   rmrr->scope.devices_cnt > 1 )
>  {
> +u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
> +
>  printk(XENLOG_G_ERR VTDPREFIX
> -   " cannot assign %04x:%02x:%02x.%u"
> +   " Currently its %s to assign %04x:%02x:%02x.%u"
> " with shared RMRR at %"PRIx64" for Dom%d.\n",
> +   relaxed ? "disallowed" : "risky",
>

This debug message is backwards?


> seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
> rmrr->base_address, d->domain_id);
> -return -EPERM;
> +if ( !relaxed )
> +return -EPERM;
>  }
>  }
>
>
> Tamas, do you actually mean to assign these to _different_
>>> guests, considering the log fragment above?)
>>>
>>>
>> No, I actually want to assign them to the same domain. The domain creation
>> fails with either of those devices specified for passthrough whether they
>> are to be attached to the same domain or not.
>>
>>
> Tamas, could you try this in your case?
>

Took me a while to find the xl config option to set this flag (pci = [
'sbdf, rdm_policy=strict/relaxed' ]) but now it works as expected!

Thanks,
Tamas


Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-09-05 Thread Chen, Tiejun

On 9/6/2015 11:19 AM, Tamas K Lengyel wrote:


diff --git a/xen/drivers/passthrough/vtd/iommu.c
b/xen/drivers/passthrough/vtd/iommu.c
index 836aed5..038776a 100644
--- a/xen/drivers/passthrough/vtd/iommu.c
+++ b/xen/drivers/passthrough/vtd/iommu.c
@@ -2310,12 +2310,16 @@ static int intel_iommu_assign_device(
  PCI_DEVFN2(bdf) == devfn &&
  rmrr->scope.devices_cnt > 1 )
 {
+u32 relaxed = flag & XEN_DOMCTL_DEV_RDM_RELAXED;
+
 printk(XENLOG_G_ERR VTDPREFIX
-   " cannot assign %04x:%02x:%02x.%u"
+   " Currently its %s to assign %04x:%02x:%02x.%u"
" with shared RMRR at %"PRIx64" for Dom%d.\n",
+   relaxed ? "disallowed" : "risky",



This debug message is backwards?


Yeah. It is indeed backwards; it should be: relaxed ? "risky" : "disallowed", i.e. (sketch):
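
/* Corrected ternary (sketch): in relaxed mode we only warn, so the
 * assignment is "risky"; in strict mode it is "disallowed". */
printk(XENLOG_G_ERR VTDPREFIX
       " It is %s to assign %04x:%02x:%02x.%u"
       " with shared RMRR at %"PRIx64" for Dom%d.\n",
       relaxed ? "risky" : "disallowed",
       seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
       rmrr->base_address, d->domain_id);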

But let's wait for Jan's comment before taking the next step.





seg, bus, PCI_SLOT(devfn), PCI_FUNC(devfn),
rmrr->base_address, d->domain_id);
-return -EPERM;
+if ( !relaxed )
+return -EPERM;
 }
 }


Tamas, do you actually mean to assign these to _different_

guests, considering the log fragment above?)



No, I actually want to assign them to the same domain. The domain creation
fails with either of those devices specified for passthrough whether they
are to be attached to the same domain or not.



Tamas, could you try this in your case?



Took me a while to find the xl config option to set this flag (pci = [
'sbdf, rdm_policy=strict/relaxed' ]) but now it works as expected!



I remember 'relaxed' is the default value, so 'rdm_policy' can simply be 
dropped here if you prefer.


Thanks
Tiejun



Re: [Xen-devel] [PATCH 15/16] xen/vtd: prevent from assign the device with shared rmrr

2015-09-05 Thread Tamas K Lengyel
> Took me a while to find the xl config option to set this flag (pci = [
>> 'sbdf, rdm_policy=strict/relaxed' ]) but now it works as expected!
>>
>>
> I remember 'relaxed' is a default value so 'rdm_policy' can't be dropped
> here if you like this.
>

Without specifically setting rdm_policy=relaxed domain creation failed, so
it's definitely not the default.

Tamas


Re: [Xen-devel] [PATCH v6 13/18] Update IRTE according to guest interrupt config changes

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, September 04, 2015 11:59 PM
> To: Wu, Feng
> Cc: xen-devel@lists.xen.org
> Subject: Re: [PATCH v6 13/18] Update IRTE according to guest interrupt config
> changes
> 
> >>> On 25.08.15 at 03:57,  wrote:
> 
> First of all - an empty Cc list on a patch is suspicious.

I did Cc you for this patch. Why do you say "an empty Cc list"?

> 
> > +static struct vcpu *pi_find_dest_vcpu(const struct domain *d, uint32_t
> dest_id,
> > +  bool_t dest_mode, uint8_t
> delivery_mode,
> > +  uint8_t gvec)
> > +{
> > +unsigned int dest_vcpus = 0;
> > +struct vcpu *v, *dest = NULL;
> > +
> > +if ( delivery_mode == dest_LowestPrio )
> > +return vector_hashing_dest(d, dest_id, dest_mode, gvec);
> 
> So at this point delivery_mode == dest_Fixed, right?

It won't be dest_LowestPrio here, so it might be proper to add
"else if ( delivery_mode == dest_Fixed )" for the code below. So the
question is: can delivery modes other than lowest priority and fixed,
such as NMI, SMI, etc., hit here? Any ideas? If they can, the
narrowing could look like the sketch below.
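
/* Sketch only: handle the two modes we understand, and leave anything
 * else (NMI, SMI, ...) to the interrupt-remapping path by returning
 * NULL. */
if ( delivery_mode == dest_LowestPrio )
    return vector_hashing_dest(d, dest_id, dest_mode, gvec);
else if ( delivery_mode != dest_Fixed )
    return NULL;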

> 
> > +for_each_vcpu ( d, v )
> > +{
> > +if ( !vlapic_match_dest(vcpu_vlapic(v), NULL,
> APIC_DEST_NOSHORT,
> > +dest_id, dest_mode) )
> > +continue;
> > +
> > +dest_vcpus++;
> > +dest = v;
> > +}
> > +
> > +/*
> > + * For fixed destination, we only handle single-destination
> > + * interrupts.
> > + */
> > +if ( dest_vcpus == 1 )
> > +return dest;
> 
> Is it thus even possible for the if() condition to be false? 

It can be false if the interrupt has multiple destinations with fixed
delivery mode.

> If it isn't,
> returning early from the loop would seem the better option. And
> even if it is, I would question whether delivering the interrupt to
> the first match isn't going to be better than dropping it.

Here, if we deliver all the interrupts to the first match, only that
vCPU will receive them, even though the irq affinity shows multiple
destinations. I don't think that is correct. Furthermore, wouldn't
there be performance issues if the interrupt frequency is high and the
single matched vCPU cannot handle the interrupts in time? So here, I
just leave these kinds of interrupts to the legacy interrupt remapping
mechanism.

> 
> > @@ -329,11 +433,30 @@ int pt_irq_create_bind(
> >  /* Calculate dest_vcpu_id for MSI-type pirq migration. */
> >  dest = pirq_dpci->gmsi.gflags & VMSI_DEST_ID_MASK;
> >  dest_mode = !!(pirq_dpci->gmsi.gflags & VMSI_DM_MASK);
> > +delivery_mode = (pirq_dpci->gmsi.gflags >>
> GFLAGS_SHIFT_DELIV_MODE) &
> > +VMSI_DELIV_MASK;
> 
> In numbers (gflags >> 12) & 0x7000, which is likely not what you
> want.
> 
> >  dest_vcpu_id = hvm_girq_dest_2_vcpu_id(d, dest, dest_mode);
> >  pirq_dpci->gmsi.dest_vcpu_id = dest_vcpu_id;
> >  spin_unlock(&d->event_lock);
> >  if ( dest_vcpu_id >= 0 )
> >  hvm_migrate_pirqs(d->vcpu[dest_vcpu_id]);
> > +
> > +/* Use interrupt posting if it is supported. */
> > +if ( iommu_intpost )
> > +{
> > +const struct vcpu *vcpu = pi_find_dest_vcpu(d, dest,
> dest_mode,
> > +  delivery_mode,
> pirq_dpci->gmsi.gvec);
> > +
> > +if ( vcpu )
> > +{
> > +rc = pi_update_irte( vcpu, info, pirq_dpci->gmsi.gvec );
> > +if ( unlikely(rc) )
> > +dprintk(XENLOG_G_INFO,
> > +"%pv: failed to update PI IRTE,
> gvec:%02x\n",
> > +vcpu, pirq_dpci->gmsi.gvec);
> 
> Even if only a debug build printk() - aren't you afraid that if this
> ever triggers, it will trigger a lot? And hence be pretty useless?

I think it reaches this debug printk rarely, but at least, when it really 
fails, it
can give people some hints about why we are using interrupt remapping instead
of interrupt posting for the external interrupts.

> 
> > +}
> 
> (Not only) depending on the answer, I'd consider adding an "else
> printk()" here.

So do you mean something like this:

if ( vcpu )
{
    rc = pi_update_irte(vcpu, info, pirq_dpci->gmsi.gvec);
    if ( unlikely(rc) )
        dprintk(XENLOG_G_INFO,
                "%pv: failed to update PI IRTE, gvec:%02x, using interrupt remapping\n",
                vcpu, pirq_dpci->gmsi.gvec);
    else
        dprintk(XENLOG_G_INFO,
                "%pv: succeeded in updating PI IRTE, gvec:%02x, using interrupt posting\n",
                vcpu, pirq_dpci->gmsi.gvec);
}

Thanks,
Feng

> 
> Jan


Re: [Xen-devel] [PATCH v6 11/18] vt-d: Add API to update IRTE when VT-d PI is used

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, September 04, 2015 11:11 PM
> To: Wu, Feng
> Cc: Andrew Cooper; Tian, Kevin; Zhang, Yang Z; xen-devel@lists.xen.org; Keir
> Fraser
> Subject: Re: [PATCH v6 11/18] vt-d: Add API to update IRTE when VT-d PI is 
> used
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > This patch adds an API which is used to update the IRTE
> > for posted-interrupt when guest changes MSI/MSI-X information.
> >
> > CC: Yang Zhang 
> > CC: Kevin Tian 
> > CC: Keir Fraser 
> > CC: Jan Beulich 
> > CC: Andrew Cooper 
> > Signed-off-by: Feng Wu 
> > Acked-by: Kevin Tian 
> > ---
> > v6:
> > - In some error cases, the desc->lock will be unlocked twice, fix it.
> 
> Bug fixes invalidate prior acks.
> 
> > --- a/xen/drivers/passthrough/vtd/intremap.c
> > +++ b/xen/drivers/passthrough/vtd/intremap.c
> > @@ -899,3 +899,110 @@ void iommu_disable_x2apic_IR(void)
> >  for_each_drhd_unit ( drhd )
> >  disable_qinval(drhd->iommu);
> >  }
> > +
> > +static void setup_posted_irte(struct iremap_entry *new_ire,
> > +const struct pi_desc *pi_desc, const uint8_t gvec)
> > +{
> > +new_ire->post.urg = 0;
> > +new_ire->post.vector = gvec;
> > +new_ire->post.pda_l = virt_to_maddr(pi_desc) >> (32 - PDA_LOW_BIT);
> > +new_ire->post.pda_h = virt_to_maddr(pi_desc) >> 32;
> > +
> > +new_ire->post.res_1 = 0;
> > +new_ire->post.res_2 = 0;
> > +new_ire->post.res_3 = 0;
> > +new_ire->post.res_4 = 0;
> > +new_ire->post.res_5 = 0;
> > +
> > +new_ire->post.im = 1;
> > +}
> 
> I think it would be better to just clear out the structure first. This
> may then also allow for no longer naming all of the bitfield res_*
> member, making it even more obvious that they're reserved. Or
> wait - looking at the use below you seem to be updating the
> descriptor. Why would you need to zero out reserved fields in
> that case?

Here we first read the IRTE back from hardware, which may be in
remapped format, and we still need some of the information in it after
converting it to posted format, such as fpd, sid, etc. So I cannot
simply zero the whole entry. Besides that, some fields which are
needed in remapped format are defined as reserved in posted format, so
when starting from a remapped-format entry those reserved fields have
to be cleared one by one. Here I just clear them unconditionally for
all cases; since this is not a frequent operation, I think it is not a
big deal. Roughly, the flow is as in the sketch below.
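
/* Sketch of the flow described above (not the literal patch code;
 * 'iremap_entry' here names the entry read back from hardware): */
struct iremap_entry new_ire = *iremap_entry;  /* keeps fpd, sid, ... */
setup_posted_irte(&new_ire, pi_desc, gvec);   /* zeroes res_*, sets im */
/* ... then write new_ire back to the table and flush the entry ... */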

Thanks,
Feng



Re: [Xen-devel] [PATCH v6 02/18] Add cmpxchg16b support for x86-64

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Jan Beulich [mailto:jbeul...@suse.com]
> Sent: Friday, September 04, 2015 10:23 PM
> To: Wu, Feng
> Cc: Andrew Cooper; xen-devel@lists.xen.org; Keir Fraser
> Subject: Re: [PATCH v6 02/18] Add cmpxchg16b support for x86-64
> 
> >>> On 25.08.15 at 03:57,  wrote:
> > This patch adds cmpxchg16b support for x86-64, so software
> > can perform 128-bit atomic write/read.
> >
> > CC: Keir Fraser 
> > CC: Jan Beulich 
> > CC: Andrew Cooper 
> > Signed-off-by: Feng Wu 
> > ---
> > v6:
> > - Fix a typo
> >
> > v5:
> > - Change back the parameters of __cmpxchg16b() to __uint128_t *
> > - Remove pointless cast for 'ptr'
> > - Remove pointless parentheses
> > - Use A constraint for the output
> 
> There are a few comments I had on v4 which neither have been
> addressed verbally nor have got incorporated into the patch. See
> http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg04694.html

Do you mean I missed the following part in the above link?

" But I think now I recall that it may have been me asking for using
void pointers here: That way one can pass pointers to other types.
But then you would better cast to void * here, and have the inline
function have properly typed pointers. And it should go without
saying that if you cast (or implicitly convert to void), you should
validate that what your caller handed you is at least the size of a
__uint128_t (and for "ptr" perhaps also, as I think Andrew already
pointed out, that it is suitably aligned). "

If that is the case, what about the changes below?

#define cmpxchg16b(ptr, o, n)                        \
    ASSERT(((unsigned long)(ptr) & 0xF) == 0);       \
    ASSERT(sizeof(*(o)) == sizeof(__uint128_t));     \
    ASSERT(sizeof(*(n)) == sizeof(__uint128_t));     \
    __cmpxchg16b((ptr), (void *)(o), (void *)(n))

Thanks,
Feng

> 
> Jan




Re: [Xen-devel] [PATCH v6 02/18] Add cmpxchg16b support for x86-64

2015-09-05 Thread Wu, Feng


> -Original Message-
> From: Wu, Feng
> Sent: Sunday, September 06, 2015 2:07 PM
> To: Jan Beulich
> Cc: Andrew Cooper; xen-devel@lists.xen.org; Keir Fraser; Wu, Feng
> Subject: RE: [PATCH v6 02/18] Add cmpxchg16b support for x86-64
> 
> 
> 
> > -Original Message-
> > From: Jan Beulich [mailto:jbeul...@suse.com]
> > Sent: Friday, September 04, 2015 10:23 PM
> > To: Wu, Feng
> > Cc: Andrew Cooper; xen-devel@lists.xen.org; Keir Fraser
> > Subject: Re: [PATCH v6 02/18] Add cmpxchg16b support for x86-64
> >
> > >>> On 25.08.15 at 03:57,  wrote:
> > > This patch adds cmpxchg16b support for x86-64, so software
> > > can perform 128-bit atomic write/read.
> > >
> > > CC: Keir Fraser 
> > > CC: Jan Beulich 
> > > CC: Andrew Cooper 
> > > Signed-off-by: Feng Wu 
> > > ---
> > > v6:
> > > - Fix a typo
> > >
> > > v5:
> > > - Change back the parameters of __cmpxchg16b() to __uint128_t *
> > > - Remove pointless cast for 'ptr'
> > > - Remove pointless parentheses
> > > - Use A constraint for the output
> >
> > There are a few comments I had on v4 which neither have been
> > addressed verbally nor have got incorporated into the patch. See
> > http://lists.xenproject.org/archives/html/xen-devel/2015-07/msg04694.html
> 
> Do you mean I missed the following part in the above link?
> 
> " But I think now I recall that it may have been me asking for using
> void pointers here: That way one can pass pointers to other types.
> But then you would better cast to void * here, and have the inline
> function have properly typed pointers. And it should go without
> saying that if you cast (or implicitly convert to void), you should
> validate that what your caller handed you is at least the size of a
> __uint128_t (and for "ptr" perhaps also, as I think Andrew already
> pointed out, that it is suitably aligned). "
> 
> If that is the case, what about the changes below?
> 
> #define cmpxchg16b(ptr, o, n)                        \
>     ASSERT(((unsigned long)(ptr) & 0xF) == 0);       \
>     ASSERT(sizeof(*(o)) == sizeof(__uint128_t));     \
>     ASSERT(sizeof(*(n)) == sizeof(__uint128_t));     \
>     __cmpxchg16b((ptr), (void *)(o), (void *)(n))

It seems there is a build error with this change: we cannot add
statements before __cmpxchg16b(), since the macro needs to return a
value to the caller. Any suggestion here? Maybe a GNU statement
expression, as in the sketch below?
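
/* Sketch only, untested: a statement expression keeps the checks
 * ahead of the call while the macro still yields a value. */
#define cmpxchg16b(ptr, o, n) ({                          \
    ASSERT(!((unsigned long)(ptr) & 0xF));                \
    BUILD_BUG_ON(sizeof(*(o)) != sizeof(__uint128_t));    \
    BUILD_BUG_ON(sizeof(*(n)) != sizeof(__uint128_t));    \
    __cmpxchg16b((ptr), (void *)(o), (void *)(n));        \
})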

Thanks,
Feng


> 
> Thanks,
> Feng
> 
> >
> > Jan




Re: [Xen-devel] [PATCH 0/5] Intel Code/Data Prioritization(CDP) feature enabling

2015-09-05 Thread He Chen
On Wed, Sep 02, 2015 at 01:08:33PM +0100, Andrew Cooper wrote:
> On 02/09/15 09:27, He Chen wrote:
> > Hi all,
> >
> > Code/Data Prioritization(CDP) is offered in Intel Broadwell and later server
> > platforms, which is an extension of CAT. CDP enables isolation and separate
> > prioritization of code and data fetches to the L3 cache in a software
> > configurable manner, which can enable workload prioritization and tuning of
> > cache capacity to the characteristics of the workload. CDP extends Cache
> > Allocation Technology (CAT) by providing separate code and data capacity bit
> > masks (CBM) per Class of Service (COS). CDP is used on a per-VM basis in the Xen
> > implementation.
> >
> > For more information about CDP, please refer to the Intel SDM, Volume 3, section 
> > 17.16
> > http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf
> >
> > This patch series enables CDP feature in Xen based on CAT code, including
> > extending CBM operation functions and introducing new commands to 
> > enable/disable
> > CDP dynamically. For all the changes, please see in each patch.
> >
> > This patchset has been tested on Intel Broadwell server platform.
> >
> > To make this patchset better, any comment or suggestion is welcomed, I would
> > really appreciate it.
> 
> I have taken a look at patches 1-3.  For the most part, it looks good.
> 
> The main point I have is on patch 2, as to whether it is sensible to
> permit enabling/disabling cdp at runtime.  I suggest that it is not
> sensible, and should be a command line parameter instead.
> 
> If this is agreed as ok going forwards, patches 3 through 5 should
> become rather more simple.
> 
> ~Andrew

Thanks for your patient review and valuable suggestions.

About permitting enabling/disabling CDP at runtime: I agree with you on
using a command line parameter instead; it really makes the code
simpler and more reliable.

For caution's sake: the hardware does support configuring CDP
dynamically at any point during normal system operation, according to
the Intel SDM (see section 17.16.2), and that is why I wrote patch 2.

Anyway, since there are few cases where CDP needs to change at runtime,
I think it is better to make this a boot-time parameter (see the sketch
below). I will resend a v2 patch soon; thanks again~
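
A minimal sketch of such a parameter (the option name "cdp" is
illustrative only; the final name and shape will be settled in v2):

static bool_t __initdata opt_cdp;
boolean_param("cdp", opt_cdp);

/* The CAT/CDP initialisation path would then check opt_cdp once at
 * boot instead of exposing runtime enable/disable commands. */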
