Re: [Xen-devel] [PATCH v6 07/30] PCI: Pass PCI domain number combined with root bus number
On 2015/3/18 12:26, Manish Jaggi wrote:
> On Tuesday 17 March 2015 07:35 PM, Ian Campbell wrote:
>> On Tue, 2015-03-17 at 10:45 +0530, Manish Jaggi wrote:
>>> On Monday 09 March 2015 08:04 AM, Yijing Wang wrote:
>>>> Now we can pass the PCI domain combined with the bus number in a u32
>>>> argument. Because on arm/arm64 the PCI domain number is assigned by
>>>> pci_bus_assign_domain_nr(), we leave pci_scan_root_bus() and
>>>> pci_create_root_bus() on arm/arm64 unchanged. A new function,
>>>> pci_host_assign_domain_nr(), will be introduced for arm/arm64 to
>>>> assign the domain number in a later patch.
>>> Hi,
>>> I think these changes might not be required. We have made very few
>>> changes in xen-pcifront to support PCI passthrough on arm64.
>>> As per the Xen architecture, for a domU only a single PCI virtual bus
>>> is created and all passthrough devices are attached to it.
>> I guess you are only talking about the changes to xen-pcifront.c?
>> Otherwise you are ignoring the dom0 case, which is exposed to the real
>> set of PCI root complexes; and anyway I'm not sure how "not needed for
>> Xen domU" translates into "not required", since it is clearly required
>> for other systems.
>>
>> Strictly speaking, the Xen pciif protocol does support multiple buses;
>> it's just that the tools, and perhaps kernels, have not yet felt any
>> need to actually make use of that.
>>
>> There doesn't seem to be any harm in updating pcifront to follow this
>> generic API change.
> ok.
>
> One side question: the function pci_host_assign_domain_nr(), which
> would be introduced in a later patch -- does it appear to be doing the
> same binding that we are trying to implement via a pci_host_bridge add
> hypercall?

pci_host_assign_domain_nr() will be called only when
CONFIG_PCI_DOMAINS_GENERIC is enabled; for now that mostly means
arm/arm64.

Thanks!
Yijing.

>> Ian.

--
Thanks!
Yijing

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
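The thread above hinges on packing a PCI segment (domain) and a root bus number into a single u32. A minimal sketch of one plausible layout (the macro names and bit layout here are illustrative assumptions, not necessarily what Yijing's series uses):

```c
#include <stdint.h>

/* Hypothetical helpers: bus number in the low byte, domain above it.
 * Either half can be recovered without passing extra arguments. */
#define PCI_DOMBUS(domain, bus)  ((uint32_t)(((uint32_t)(domain) << 8) | ((bus) & 0xffu)))
#define PCI_DOMBUS_DOMAIN(db)    ((uint16_t)((db) >> 8))
#define PCI_DOMBUS_BUS(db)       ((uint8_t)((db) & 0xffu))
```

With this layout, a dom0 device on segment 1, bus 0x20 would travel as 0x120, which is what makes a single-u32 interface workable for multiple root complexes.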
Re: [Xen-devel] [PATCH v2 3/5] xen: print online pCPUs and free pCPUs when dumping
On 03/17/2015 04:33 PM, Dario Faggioli wrote:

e.g., with `xl debug-key r', like this:

 (XEN) Online Cpus: 0-15
 (XEN) Free Cpus: 8-15

Also, for each cpupool, print the set of pCPUs it contains, like this:

 (XEN) Cpupool 0:
 (XEN) Cpus: 0-7
 (XEN) Scheduler: SMP Credit Scheduler (credit)

Signed-off-by: Dario Faggioli
Acked-by: Juergen Gross
Cc: Juergen Gross
Cc: George Dunlap
Cc: Jan Beulich
Cc: Keir Fraser
---
Changes from v1:
 * _print_cpumap() becomes print_cpumap() (i.e., the leading '_' was
   not particularly useful in this case), as suggested during review
 * changed the output such that (1) we only print the maps, not the
   number of elements, and (2) we avoid printing the free cpus map
   when empty
 * improved the changelog
---
I'm not including any Reviewed-by / Acked-by tag, since the patch
changed.
---
 xen/common/cpupool.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index cd6aab9..812a2f9 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include

 #define for_each_cpupool(ptr)\
@@ -658,6 +659,12 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
     return ret;
 }

+static void print_cpumap(const char *str, const cpumask_t *map)
+{
+    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), map);
+    printk("%s: %s\n", str, keyhandler_scratch);
+}
+
 void dump_runq(unsigned char key)
 {
     unsigned long flags;
@@ -671,12 +678,17 @@ void dump_runq(unsigned char key)
            sched_smt_power_savings? "enabled":"disabled");
     printk("NOW=0x%08X%08X\n", (u32)(now>>32), (u32)now);

+    print_cpumap("Online Cpus", &cpu_online_map);
+    if ( cpumask_weight(&cpupool_free_cpus) )
+        print_cpumap("Free Cpus", &cpupool_free_cpus);
+
     printk("Idle cpupool:\n");
     schedule_dump(NULL);

     for_each_cpupool(c)
     {
         printk("Cpupool %d:\n", (*c)->cpupool_id);
+        print_cpumap("Cpus", (*c)->cpu_valid);
         schedule_dump(*c);
     }
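The print_cpumap() helper above relies on cpulist_scnprintf() to render a cpumask as a range list such as "0-7". A hedged user-space sketch of that formatting, for a single unsigned long worth of CPUs (this is an illustration, not Xen's implementation):

```c
#include <stdio.h>
#include <string.h>

/* Turn a bitmask of CPUs into a "1-2,4" style range list. */
static int cpumask_to_list(unsigned long mask, char *buf, size_t len)
{
    size_t pos = 0;
    int cpu = 0, nbits = 8 * (int)sizeof(mask);

    buf[0] = '\0';
    while (cpu < nbits) {
        if (!(mask & (1UL << cpu))) { cpu++; continue; }
        int start = cpu;                      /* first CPU of a run */
        while (cpu < nbits && (mask & (1UL << cpu)))
            cpu++;                            /* consume the run */
        pos += snprintf(buf + pos, pos < len ? len - pos : 0,
                        "%s%d", pos ? "," : "", start);
        if (cpu - 1 > start)                  /* runs of >= 2 become "a-b" */
            pos += snprintf(buf + pos, pos < len ? len - pos : 0,
                            "-%d", cpu - 1);
    }
    return (int)pos;
}
```

This is why the dump output in the changelog reads "Online Cpus: 0-15" rather than a raw bitmap.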
Re: [Xen-devel] [PATCH v6 07/30] PCI: Pass PCI domain number combined with root bus number
On Tuesday 17 March 2015 07:35 PM, Ian Campbell wrote:
> On Tue, 2015-03-17 at 10:45 +0530, Manish Jaggi wrote:
>> On Monday 09 March 2015 08:04 AM, Yijing Wang wrote:
>>> Now we can pass the PCI domain combined with the bus number in a u32
>>> argument. Because on arm/arm64 the PCI domain number is assigned by
>>> pci_bus_assign_domain_nr(), we leave pci_scan_root_bus() and
>>> pci_create_root_bus() on arm/arm64 unchanged. A new function,
>>> pci_host_assign_domain_nr(), will be introduced for arm/arm64 to
>>> assign the domain number in a later patch.
>> Hi,
>> I think these changes might not be required. We have made very few
>> changes in xen-pcifront to support PCI passthrough on arm64.
>> As per the Xen architecture, for a domU only a single PCI virtual bus
>> is created and all passthrough devices are attached to it.
> I guess you are only talking about the changes to xen-pcifront.c?
> Otherwise you are ignoring the dom0 case, which is exposed to the real
> set of PCI root complexes; and anyway I'm not sure how "not needed for
> Xen domU" translates into "not required", since it is clearly required
> for other systems.
>
> Strictly speaking, the Xen pciif protocol does support multiple buses;
> it's just that the tools, and perhaps kernels, have not yet felt any
> need to actually make use of that.
>
> There doesn't seem to be any harm in updating pcifront to follow this
> generic API change.

ok.

One side question: the function pci_host_assign_domain_nr(), which
would be introduced in a later patch -- does it appear to be doing the
same binding that we are trying to implement via a pci_host_bridge add
hypercall?

> Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Tuesday 17 March 2015 06:01 PM, Jan Beulich wrote:
> On 17.03.15 at 13:06, wrote:
>> On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:
>>> On 17.03.15 at 06:26, wrote:
>>>> In drivers/xen/pci.c, on a BUS_NOTIFY_ADD_DEVICE notification, dom0
>>>> issues a hypercall to inform Xen that a new PCI device has been
>>>> added. If we were to inform Xen about a new PCI bus being added,
>>>> there are two ways:
>>>> a) Issue the hypercall from drivers/pci/probe.c
>>>> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
>>>>    PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find
>>>>    that segment number (s_bdf), it will return an error
>>>>    SEG_NO_NOT_FOUND. After that, the Linux Xen code could issue the
>>>>    PHYSDEVOP_pci_host_bridge_add hypercall.
>>>> I think (b) can be done with minimal code changes. What do you think?
>>> I'm pretty sure (a) would even be refused by the maintainers, unless
>>> there already is a notification being sent. As to (b) -- kernel code
>>> could keep track of which segment/bus pairs it informed Xen about,
>>> and hence wouldn't even need to wait for an error to be returned from
>>> the device-add request (which in your proposal would need to be
>>> re-issued after the host-bridge-add).
>> Have a query on the CFG space address to be passed as a hypercall
>> parameter. of_pci_get_host_bridge_resources() only parses the "ranges"
>> property, not "reg". The "reg" property has the CFG space address,
>> which is usually stored in the private structures of the PCI host
>> controller driver, so a pci_dev's parent pci_bus would not have that
>> info. One way is to add a method to struct pci_ops, but I'm not sure
>> whether that would be accepted.
> I'm afraid I don't understand what you're trying to tell me.

Hi Jan,
I missed this during the initial discussion and found out while coding
that the CFG space address of a PCI host is stored in the "reg"
property, and the of_pci code does not store "reg" in the resources;
only "ranges" is stored. So the pci_bus that is the root bus created in
the probe function of the PCIe controller driver will have the "ranges"
values in its resources, but the "reg" property value (the CFG space
address) only in the driver's private data.

So from drivers/xen/pci.c we can find the root bus (pci_bus) from the
pci_dev (via BUS_NOTIFY) but cannot get the CFG space address. Now
there are two ways:
a) Add a pci_ops method to return the CFG space address.
b) Let the PCI host controller driver invoke a function,
   xen_invoke_hypercall(), providing the bus number and CFG space
   address. xen_invoke_hypercall() would be implemented in
   drivers/xen/pci.c and would issue the PHYSDEVOP_pci_host_bridge_add
   hypercall.

> Jan
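Jan's suggestion under (b) -- that the kernel simply track which segment/bus pairs it has already reported, so device-add never needs to be re-issued -- can be sketched as a tiny dedup table. All names below are hypothetical stand-ins for whatever the Xen glue code would actually keep:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BRIDGES 16

struct reported { uint16_t segment; uint8_t bus; };
static struct reported seen[MAX_BRIDGES];
static int nseen;

/* Returns true the first time a (segment, bus) pair is encountered,
 * i.e. when a PHYSDEVOP_pci_host_bridge_add call would be needed
 * before the device-add hypercall for that root bus. */
static bool host_bridge_needs_report(uint16_t segment, uint8_t bus)
{
    for (int i = 0; i < nseen; i++)
        if (seen[i].segment == segment && seen[i].bus == bus)
            return false;
    if (nseen < MAX_BRIDGES)
        seen[nseen++] = (struct reported){ segment, bus };
    return true;
}
```

With such tracking, the error-driven retry in the original proposal (wait for SEG_NO_NOT_FOUND, then host-bridge-add, then re-issue device-add) collapses to a single lookup before the first hypercall.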
Re: [Xen-devel] [PATCH v2 5/5] xen: sched_rt: print useful affinity info when dumping
2015-03-17 11:33 GMT-04:00 Dario Faggioli :
> In fact, printing the cpupool's CPU online mask for each vCPU is just
> redundant, as that is the same for all the vCPUs of all the domains in
> the same cpupool, while hard affinity is already part of the output of
> dumping domains info.
>
> Instead, print the intersection between hard affinity and online CPUs,
> which is --in case of this scheduler-- the effective affinity always
> used for the vCPUs.
>
> This change also takes the chance to add a scratch cpumask area, to
> avoid having to either put one (more) cpumask_t on the stack, or
> dynamically allocate it within the dumping routine. (The former being
> bad because hypervisor stack size is limited, the latter because
> dynamic allocations can fail, if the hypervisor was built for a large
> enough number of CPUs.)
>
> Such scratch area can be used to kill most of the cpumask{_var}_t
> local variables in other functions in the file, but that is *NOT*
> done in this change.
>
> Finally, convert the file to use keyhandler scratch, instead of open
> coded string buffers.
>
> Signed-off-by: Dario Faggioli
> Cc: George Dunlap
> Cc: Meng Xu
> Cc: Jan Beulich
> Cc: Keir Fraser
> ---
> Changes from v1:
>  * improved changelog;
>  * made a local variable to point to the correct scratch mask, as
>    suggested during review.
> ---

Reviewed-by: Meng Xu

Thanks,

Best,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
Hi Dario,

2015-03-17 10:12 GMT-04:00 Dario Faggioli :
> On Mon, 2015-03-16 at 16:30 -0400, Meng Xu wrote:
>> Hi Dario,
>
> Hey,
>
>> 2015-03-16 13:05 GMT-04:00 Dario Faggioli :
>> >
>> > This change also takes the chance to add a scratch
>> > cpumask, to avoid having to create one more
>> > cpumask_var_t on the stack of the dumping routine.
>>
>> Actually, I have a question about the strength of this design. When
>> we have a machine with many cpus, we will end up with allocating a
>> cpumask for each cpu.
>
> Just FTR, what we will end up allocating is:
>  - an array of *pointers* to cpumasks with as many elements as the
>    number of pCPUs,
>  - a cpumask *only* for the pCPUs subjected to an instance of the RTDS
>    scheduler.
>
> So, for instance, if you have 64 pCPUs, but are using the RTDS
> scheduler only in a cpupool with 2 pCPUs, you'll have an array of 64
> pointers to cpumask_t, but only 2 actual cpumasks.
>
>> Is this better than having a cpumask_var_t on the stack of the
>> dumping routine, since the dumping routine is not in the hot path?
>
> George and Jan replied to this already, I think. Allow me to add just
> a few words:
>
>> > Such scratch area can be used to kill most of the
>> > cpumask_var_t local variables in other functions
>> > in the file, but that is *NOT* done in this change.
>
> This is the point, actually! As said here, this is not only for the
> sake of the dumping routine. In fact, ideally, someone will, in the
> near future, go throughout the whole file and kill most of the
> cpumask_t local variables, and most of the cpumask dynamic
> allocations, in favour of using this scratch area.
>
>> > @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
>> >     if ( prv == NULL )
>> >         return -ENOMEM;
>> >
>> > +    _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
>>
>> Is it better to use xzalloc_array?
>
> Why? IMO, not really. I'm only free()-ing (in rt_free_pdata()) the
> elements of the array that have been previously successfully
> allocated (in rt_alloc_pdata()), so I don't think there is any
> special requirement for all the elements to be NULL right away.

OK. I see.

>> > +    if ( _cpumask_scratch == NULL )
>> > +        return -ENOMEM;
>> > +
>> >     spin_lock_init(&prv->lock);
>> >     INIT_LIST_HEAD(&prv->sdom);
>> >     INIT_LIST_HEAD(&prv->runq);
>> > @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
>> > {
>> >     struct rt_private *prv = rt_priv(ops);
>> >
>> > +    xfree(_cpumask_scratch);
>> >     xfree(prv);
>> > }
>> >
>> > @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>> >     per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>> >     spin_unlock_irqrestore(&prv->lock, flags);
>> >
>> > +    if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>>
>> Is it better to use zalloc_cpumask_var() here?
>
> Nope. It's a scratch area, after all, so one really should not assume
> it to be in a specific state (e.g., no bits set as you're suggesting)
> when using it.

I see the point. Now I got it. :-)

> Thanks and Regards,

Thank you very much for clarification! :-)

Best,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
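The allocation scheme Dario describes -- an array of pointers sized for all pCPUs, with actual masks allocated only for pCPUs attached to the scheduler -- can be sketched in plain C. The names below are stand-ins for `_cpumask_scratch` and the `rt_*_pdata` hooks, not the Xen code:

```c
#include <stdlib.h>

static unsigned long **scratch;   /* plays the role of _cpumask_scratch */

static int scratch_init(int nr_cpu_ids)
{
    /* The Xen patch uses a non-zeroing xmalloc_array(), since only
     * entries that were successfully allocated are ever freed; this
     * sketch zeroes the array for simplicity. */
    scratch = calloc((size_t)nr_cpu_ids, sizeof(*scratch));
    return scratch ? 0 : -1;
}

static int scratch_cpu_attach(int cpu)    /* like rt_alloc_pdata() */
{
    scratch[cpu] = malloc(sizeof(unsigned long));  /* one mask's worth */
    return scratch[cpu] ? 0 : -1;
}

static void scratch_cpu_detach(int cpu)   /* like rt_free_pdata() */
{
    free(scratch[cpu]);
    scratch[cpu] = NULL;
}
```

This mirrors the 64-pCPU example in the thread: sixty-four cheap pointers up front, but a real cpumask only for the two pCPUs actually in the RTDS cpupool.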
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
>>> This change also takes the chance to add a scratch cpumask, to
>>> avoid having to create one more cpumask_var_t on the stack of the
>>> dumping routine.
>>
>> Actually, I have a question about the strength of this design. When
>> we have a machine with many cpus, we will end up with allocating a
>> cpumask for each cpu. Is this better than having a cpumask_var_t on
>> the stack of the dumping routine, since the dumping routine is not in
>> the hot path?
>
> The reason for taking this off the stack is that the hypervisor stack
> is a fairly limited resource -- IIRC it's only 8k (for each cpu). If
> the call stack gets too deep, the hypervisor will triple-fault.
> Keeping really large variables like cpumasks off the stack is key to
> making sure we don't get close to that.

I see. I didn't realize that the hypervisor stack is so limited. That
makes sense.

Thank you very much for clarification! :-)

Best,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
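The stack-budget argument above is easy to quantify: a cpumask is one bit per possible CPU, so a hypervisor built for many CPUs makes each on-stack mask noticeably large. A quick sketch of the arithmetic:

```c
#include <stddef.h>

/* One bit per possible CPU, rounded up to whole bytes. */
static size_t cpumask_bytes(size_t nr_cpus)
{
    return (nr_cpus + 7) / 8;
}
```

A build for 4096 CPUs makes each mask 512 bytes -- over 6% of an 8 KiB stack for a single local variable, so a few nested functions each declaring a cpumask_t add up quickly.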
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 03/17/2015 10:52 PM, Felipe Franciosi wrote:
> Hi Bob,
>
> I've put the hardware back together and am sorting out the software
> for testing. Things are not moving as fast as I wanted due to other
> commitments. I'll keep this thread updated as I progress. Malcolm is
> OOO and I'm trying to get his patches to work on a newer Xen.

Thank you!

> The evaluation will compare:
> 1) bare metal i/o (for baseline)
> 2) tapdisk3 (currently using grant copy, which is what scales best in
>    my experience)
> 3) blkback w/ persistent grants
> 4) blkback w/o persistent grants (I will just comment out the
>    handshake bits in blkback/blkfront)
> 5) blkback w/o persistent grants + Malcolm's grant map patches

I think you need to add the patches from Christoph Egger with title
"[PATCH v5 0/2] gnttab: Improve scaleability" here:
http://lists.xen.org/archives/html/xen-devel/2015-02/msg01188.html

> To my knowledge, blkback (w/ or w/o persistent grants) is always
> faster than user space alternatives (e.g. tapdisk, qemu-qdisk) as
> latency is much lower. However, tapdisk with grant copy has been shown
> to produce (much) better aggregate throughput figures as it avoids any
> issues with grant (un)mapping.
>
> I'm hoping to show that (5) above scales better than (3) and (4) in a
> representative scenario. If it does, I will recommend that we get rid
> of persistent grants in favour of a better and more scalable grant
> (un)mapping implementation.

Right, but even if (5) has better performance, we have to make sure
older hypervisors with a new Linux kernel won't be affected after we
get rid of persistent grants.

--
Regards,
-Bob
Re: [Xen-devel] Any work on sharing of large multi-page segments?
On 3/17/15, Jan Beulich wrote:
> And how would that be significantly different from the batching
> that's already built into the grant table hypercall?

I guess it does do more or less what I want already. I was looking more
at the inner mapping/unmapping functions, rather than the wrappers
around them that implement the actual hypercalls.

What would be a useful addition would be support for granting 2M pages.
That would eliminate any problem with running out of grant table slots.

On 3/17/15, George Dunlap wrote:
> Any deduplication code would run in a process, probably in domain 0,
> and may be somewhat slow; but the actual mechanism of sharing is a
> generic mechanism in the hypervisor which any client can use. Jan is
> suggesting that you might be able to use that interface to
> pro-actively tell Xen about the memory pages shared between your
> various domains.

I wasn't quite sure if it's generic enough to use to implement shared
segments, or if it's specific to deduplication at the hypervisor level.
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
Tuesday, March 17, 2015, 6:44:54 PM, you wrote:

>>>> Additionally I think it should be considered whether the bitmap
>>>> approach of interpreting ->state is the right one, and we don't
>>>> instead want a clean 3-state (idle, sched, run) model.
>>>
>>> Could you elaborate a bit more please? As in three different
>>> unsigned int (or bool_t) that are set depending on what state we
>>> are in?
>>
>> An enum { STATE_IDLE, STATE_SCHED, STATE_RUN }. Especially
>> if my comment above turns out to be wrong, you'd have no real
>> need for the SCHED and RUN flags to be set at the same time.

> I cobbled together what I believe is what you were thinking of.

> As you can see, to preserve the existing functionality -- such as
> being able to schedule N interrupt injections for the N interrupts we
> might get -- I modified '->masked' to be an atomic counter.

> The end result is that we can still live-lock. Unless we:
> - Drop on the floor the injection of N interrupts and just deliver at
>   most one per VMX_EXIT (and not bother with interrupts arriving when
>   we are in the VMX handler).
> - Alter the softirq code slightly, to have a variant which will only
>   iterate once over the pending softirq bits per call. (So save a
>   copy of the bitmap on the stack when entering the softirq handler,
>   and use that. We could also xor it against the current one to catch
>   any non-duplicate bits being set that we should deal with.)

> Here is the compile, but not run-time tested patch.

> From e7d8bcd7c5d32c520554a4ad69c4716246036002 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk
> Date: Tue, 17 Mar 2015 13:31:52 -0400
> Subject: [RFC PATCH] dpci: Switch to tristate instead of bitmap

> *TODO*:
>  - Writeup.
>  - Tests

Done, and unfortunately it doesn't fly ..

Some devices seem to work fine, others don't receive any interrupts
shortly after boot, like:
 40: 3 0 0 0 xen-pirq-ioapic-level cx25821[1]

Don't see any crashes or errors though, so it seems to silently lock
somewhere.
-- Sander > Suggested-by: Jan Beulich > Signed-off-by: Konrad Rzeszutek Wilk > --- > xen/drivers/passthrough/io.c | 140 > --- > xen/include/xen/hvm/irq.h| 4 +- > 2 files changed, 82 insertions(+), 62 deletions(-) > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c > index ae050df..663e104 100644 > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -30,42 +30,28 @@ > static DEFINE_PER_CPU(struct list_head, dpci_list); > > /* > - * These two bit states help to safely schedule, deschedule, and wait until > - * the softirq has finished. > - * > - * The semantics behind these two bits is as follow: > - * - STATE_SCHED - whoever modifies it has to ref-count the domain (->dom). > - * - STATE_RUN - only softirq is allowed to set and clear it. If it has > - * been set hvm_dirq_assist will RUN with a saved value of the > - * 'struct domain' copied from 'pirq_dpci->dom' before STATE_RUN was > set. > - * > - * The usual states are: STATE_SCHED(set) -> STATE_RUN(set) -> > - * STATE_SCHED(unset) -> STATE_RUN(unset). > - * > - * However the states can also diverge such as: STATE_SCHED(set) -> > - * STATE_SCHED(unset) -> STATE_RUN(set) -> STATE_RUN(unset). That means > - * the 'hvm_dirq_assist' never run and that the softirq did not do any > - * ref-counting. > - */ > - > -enum { > -STATE_SCHED, > -STATE_RUN > -}; > - > -/* > * This can be called multiple times, but the softirq is only raised once. > - * That is until the STATE_SCHED state has been cleared. The state can be > - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), > - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before > + * That is until state is in init. The state can be changed by: > + * the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), > + * or by 'pt_pirq_softirq_reset' (which will try to init the state before > * the softirq had a chance to run). 
> */ > static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) > { > unsigned long flags; > > -if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) > +switch ( cmpxchg(&pirq_dpci->state, STATE_INIT, STATE_SCHED) ) > +{ > +case STATE_RUN: > +case STATE_SCHED: > +/* > + * The pirq_dpci->mapping has been incremented to let us know > + * how many we have left to do. > + */ > return; > +case STATE_INIT: > +break; > +} > > get_knownalive_domain(pirq_dpci->dom); > > @@ -85,7 +71,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci > *pirq_dpci) > */ > bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) > { > -if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) ) > +if ( pirq_dpci->state != STATE_INIT ) > return 1; > > /* > @@ -109,22 +95,22 @@ static void p
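The tristate Jan suggested, which the RFC patch implements with Xen's cmpxchg(), can be modelled with C11 atomics. This is a sketch of the state machine only, not the hypervisor code; the "work" and the domain ref-counting are elided:

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { STATE_INIT, STATE_SCHED, STATE_RUN };
static _Atomic int state = STATE_INIT;

/* raise_softirq_for() side: only INIT -> SCHED is allowed, so a second
 * caller while scheduled or running simply returns (in the patch, the
 * pending count in ->masked records the extra request). */
static bool try_schedule(void)
{
    int expected = STATE_INIT;
    return atomic_compare_exchange_strong(&state, &expected, STATE_SCHED);
}

/* dpci_softirq side: claim SCHED -> RUN, do the work, go back to INIT. */
static bool softirq_run(void)
{
    int expected = STATE_SCHED;
    if (!atomic_compare_exchange_strong(&state, &expected, STATE_RUN))
        return false;
    /* ... hvm_dirq_assist() work would happen here ... */
    atomic_store(&state, STATE_INIT);
    return true;
}
```

The appeal over the two-bit scheme is that SCHED and RUN can never be observed set at the same time, so the diverging interleavings described in the removed comment block simply cannot occur.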
[Xen-devel] [PATCH 1/3] checkpolicy: Expand allowed character set in paths
In order to support paths containing spaces or other characters, allow
a quoted string with these characters to be parsed as a path in
addition to the existing unquoted string.

Signed-off-by: Daniel De Graaf
---
 checkpolicy/policy_parse.y | 3 +++
 checkpolicy/policy_scan.l  | 1 +
 2 files changed, 4 insertions(+)

diff --git a/checkpolicy/policy_parse.y b/checkpolicy/policy_parse.y
index 15c8997..e5210bd 100644
--- a/checkpolicy/policy_parse.y
+++ b/checkpolicy/policy_parse.y
@@ -81,6 +81,7 @@ typedef int (* require_func_t)(int pass);
 %type require_decl_def
 %token PATH
+%token QPATH
 %token FILENAME
 %token CLONE
 %token COMMON
@@ -805,6 +806,8 @@ filesystem : FILESYSTEM
 	;
 path	: PATH
 	{ if (insert_id(yytext,0)) return -1; }
+	| QPATH
+	{ yytext[strlen(yytext) - 1] = '\0'; if (insert_id(yytext + 1,0)) return -1; }
 	;
 filename : FILENAME
 	{ yytext[strlen(yytext) - 1] = '\0'; if (insert_id(yytext + 1,0)) return -1; }
diff --git a/checkpolicy/policy_scan.l b/checkpolicy/policy_scan.l
index 648e1d6..6763c38 100644
--- a/checkpolicy/policy_scan.l
+++ b/checkpolicy/policy_scan.l
@@ -240,6 +240,7 @@ HIGH	{ return(HIGH); }
 low |
 LOW	{ return(LOW); }
 "/"({alnum}|[_\.\-/])*	{ return(PATH); }
+\""/"[ !#-~]*\"	{ return(QPATH); }
 \"({alnum}|[_\.\-\+\~\: ])+\"	{ return(FILENAME); }
 {letter}({alnum}|[_\-])*([\.]?({alnum}|[_\-]))*	{ return(IDENTIFIER); }
 {digit}+|0x{hexval}+	{ return(NUMBER); }
--
2.1.0
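The new QPATH flex pattern `\""/"[ !#-~]*\"` accepts a double-quoted string whose body starts with '/' and then contains any printable ASCII except the double quote itself (space and '!' explicitly, '#' through '~' for the rest). A plain-C sketch of the same acceptance test -- the parser action then strips the surrounding quotes before insert_id():

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors what the QPATH lexer rule matches: "<slash><printables>". */
static bool is_qpath(const char *s)
{
    size_t n = strlen(s);
    if (n < 3 || s[0] != '"' || s[1] != '/' || s[n - 1] != '"')
        return false;
    for (size_t i = 2; i < n - 1; i++) {
        char c = s[i];
        /* [ !#-~]: space, '!', then '#'..'~' -- everything printable
         * except the double quote (0x22). */
        if (!(c == ' ' || c == '!' || (c >= '#' && c <= '~')))
            return false;
    }
    return true;
}
```

So `"/media/my disk"` now lexes as a path, while the old unquoted PATH rule continues to cover space-free paths like `/dev/sda`.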
[Xen-devel] [PATCH 3/3] libsepol, checkpolicy: add device tree ocontext nodes to Xen policy
In Xen on ARM, device tree nodes identified by a path (string) need to be labeled by the security policy. Signed-off-by: Daniel De Graaf --- checkpolicy/policy_define.c| 55 + checkpolicy/policy_define.h| 1 + checkpolicy/policy_parse.y | 8 +++- checkpolicy/policy_scan.l | 2 + libsepol/cil/src/cil.c | 17 libsepol/cil/src/cil_binary.c | 29 + libsepol/cil/src/cil_build_ast.c | 66 ++ libsepol/cil/src/cil_build_ast.h | 2 + libsepol/cil/src/cil_copy_ast.c| 24 +++ libsepol/cil/src/cil_flavor.h | 1 + libsepol/cil/src/cil_internal.h| 10 + libsepol/cil/src/cil_post.c| 34 +++ libsepol/cil/src/cil_reset_ast.c | 10 + libsepol/cil/src/cil_resolve_ast.c | 28 + libsepol/cil/src/cil_tree.c| 13 ++ libsepol/cil/src/cil_verify.c | 24 +++ libsepol/include/sepol/policydb/policydb.h | 1 + libsepol/src/expand.c | 7 libsepol/src/policydb.c| 18 +++- libsepol/src/write.c | 14 ++- sepolgen/src/sepolgen/refparser.py | 11 + sepolgen/src/sepolgen/refpolicy.py | 9 22 files changed, 379 insertions(+), 5 deletions(-) diff --git a/checkpolicy/policy_define.c b/checkpolicy/policy_define.c index 66c1ff2..de01f6f 100644 --- a/checkpolicy/policy_define.c +++ b/checkpolicy/policy_define.c @@ -4116,6 +4116,61 @@ bad: return -1; } +int define_devicetree_context() +{ + ocontext_t *newc, *c, *l, *head; + + if (policydbp->target_platform != SEPOL_TARGET_XEN) { + yyerror("devicetreecon not supported for target"); + return -1; + } + + if (pass == 1) { + free(queue_remove(id_queue)); + parse_security_context(NULL); + return 0; + } + + newc = malloc(sizeof(ocontext_t)); + if (!newc) { + yyerror("out of memory"); + return -1; + } + memset(newc, 0, sizeof(ocontext_t)); + + newc->u.name = (char *)queue_remove(id_queue); + if (!newc->u.name) { + free(newc); + return -1; + } + + if (parse_security_context(&newc->context[0])) { + free(newc->u.name); + free(newc); + return -1; + } + + head = policydbp->ocontexts[OCON_XEN_DEVICETREE]; + for (l = NULL, c = head; c; l = c, c = c->next) { + if (strcmp(newc->u.name, 
c->u.name) == 0) { + yyerror2("duplicate devicetree entry for '%s'", newc->u.name); + goto bad; + } + } + + if (l) + l->next = newc; + else + policydbp->ocontexts[OCON_XEN_DEVICETREE] = newc; + + return 0; + +bad: + free(newc->u.name); + free(newc); + return -1; +} + int define_port_context(unsigned int low, unsigned int high) { ocontext_t *newc, *c, *l, *head; diff --git a/checkpolicy/policy_define.h b/checkpolicy/policy_define.h index 14d30e1..a87ced3 100644 --- a/checkpolicy/policy_define.h +++ b/checkpolicy/policy_define.h @@ -49,6 +49,7 @@ int define_pirq_context(unsigned int pirq); int define_iomem_context(uint64_t low, uint64_t high); int define_ioport_context(unsigned long low, unsigned long high); int define_pcidevice_context(unsigned long device); +int define_devicetree_context(void); int define_range_trans(int class_specified); int define_role_allow(void); int define_role_trans(int class_specified); diff --git a/checkpolicy/policy_parse.y b/checkpolicy/policy_parse.y index e3899b9..8b81f04 100644 --- a/checkpolicy/policy_parse.y +++ b/checkpolicy/policy_parse.y @@ -130,7 +130,7 @@ typedef int (* require_func_t)(int pass); %token TARGET %token SAMEUSER %token FSCON PORTCON NETIFCON NODECON -%token PIRQCON IOMEMCON IOPORTCON PCIDEVICECON +%token PIRQCON IOMEMCON IOPORTCON PCIDEVICECON DEVICETREECON %token FSUSEXATTR FSUSETASK FSUSETRANS %token GENFSCON %token U1 U2 U3 R1 R2 R3 T1 T2 T3 L1 L2 H1 H2 @@ -644,7 +644,8 @@ dev_contexts: dev_context_def dev_context_def: pirq_context_def | iomem_context_def | ioport_context_def | - pci_context_def + pci_context_def | + dtree_context_def ; pirq_context_def : PIRQCON number security_context_def {if (define_pirq_context($2)) return -1;} @@ -662,6 +663,9 @@ ioport_context_def : IOPORTCON number security_context_def pci_context_def: PCIDEVICECON number security_context_def
[Xen-devel] [PATCH v3 0/3] Xen/FLASK policy updates for device contexts
In order to support assigning security labels to ARM device tree nodes
in Xen's XSM policy, a new ocontext type is needed in the security
policy. In addition to adding the new ocontext, the existing I/O memory
range ocontext is expanded to 64 bits in order to support hardware with
more than 44 bits of physical address space (32-bit count of 4K pages).

Changes from v2:
 - Clean up printf format strings for 32-bit builds

Changes from v1:
 - Use policy version 30 instead of forking the version numbers for
   Xen; this removes the need for v1's patch 3.
 - Report an error when attempting to use an I/O memory range that
   requires a 64-bit representation with an old policy output version
   that cannot support this
 - Fix a few incorrect references to PCIDEVICECON
 - Reorder patches to clarify the allowed character set of device tree
   paths

[PATCH 1/3] checkpolicy: Expand allowed character set in paths
[PATCH 2/3] libsepol, checkpolicy: widen Xen IOMEM ocontext entries
[PATCH 3/3] libsepol, checkpolicy: add device tree ocontext nodes to
[Xen-devel] [PATCH 2/3] libsepol, checkpolicy: widen Xen IOMEM ocontext entries
This expands IOMEMCON device context entries to 64 bits. This change is required to support static I/O memory range labeling for systems with over 16TB of physical address space. The policy version number change is shared with the next patch. While this makes no changes to SELinux policy, a new SELinux policy compatibility entry was added in order to avoid breaking compilation of an SELinux policy without explicitly specifying the policy version. Signed-off-by: Daniel De Graaf --- checkpolicy/policy_define.c| 11 +- checkpolicy/policy_define.h| 2 +- checkpolicy/policy_parse.y | 9 ++-- libsepol/cil/src/cil_build_ast.c | 32 ++--- libsepol/cil/src/cil_build_ast.h | 1 + libsepol/cil/src/cil_internal.h| 4 ++-- libsepol/cil/src/cil_policy.c | 3 ++- libsepol/cil/src/cil_tree.c| 3 ++- libsepol/include/sepol/policydb/policydb.h | 7 --- libsepol/src/policydb.c| 33 +- libsepol/src/write.c | 32 ++--- policycoreutils/hll/pp/pp.c| 4 ++-- 12 files changed, 109 insertions(+), 32 deletions(-) diff --git a/checkpolicy/policy_define.c b/checkpolicy/policy_define.c index a6c5d65..66c1ff2 100644 --- a/checkpolicy/policy_define.c +++ b/checkpolicy/policy_define.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -3932,7 +3933,7 @@ bad: return -1; } -int define_iomem_context(unsigned long low, unsigned long high) +int define_iomem_context(uint64_t low, uint64_t high) { ocontext_t *newc, *c, *l, *head; char *id; @@ -3960,7 +3961,7 @@ int define_iomem_context(unsigned long low, unsigned long high) newc->u.iomem.high_iomem = high; if (low > high) { - yyerror2("low memory 0x%lx exceeds high memory 0x%lx", low, high); + yyerror2("low memory 0x%"PRIx64" exceeds high memory 0x%"PRIx64"", low, high); free(newc); return -1; } @@ -3972,13 +3973,13 @@ int define_iomem_context(unsigned long low, unsigned long high) head = policydbp->ocontexts[OCON_XEN_IOMEM]; for (l = NULL, c = head; c; l = c, c = c->next) { - uint32_t low2, high2; + uint64_t low2, high2; low2 = 
c->u.iomem.low_iomem; high2 = c->u.iomem.high_iomem; if (low <= high2 && low2 <= high) { - yyerror2("iomemcon entry for 0x%lx-0x%lx overlaps with " - "earlier entry 0x%x-0x%x", low, high, + yyerror2("iomemcon entry for 0x%"PRIx64"-0x%"PRIx64" overlaps with " + "earlier entry 0x%"PRIx64"-0x%"PRIx64"", low, high, low2, high2); goto bad; } diff --git a/checkpolicy/policy_define.h b/checkpolicy/policy_define.h index 4ef0f4f..14d30e1 100644 --- a/checkpolicy/policy_define.h +++ b/checkpolicy/policy_define.h @@ -46,7 +46,7 @@ int define_permissive(void); int define_polcap(void); int define_port_context(unsigned int low, unsigned int high); int define_pirq_context(unsigned int pirq); -int define_iomem_context(unsigned long low, unsigned long high); +int define_iomem_context(uint64_t low, uint64_t high); int define_ioport_context(unsigned long low, unsigned long high); int define_pcidevice_context(unsigned long device); int define_range_trans(int class_specified); diff --git a/checkpolicy/policy_parse.y b/checkpolicy/policy_parse.y index e5210bd..e3899b9 100644 --- a/checkpolicy/policy_parse.y +++ b/checkpolicy/policy_parse.y @@ -67,6 +67,7 @@ typedef int (* require_func_t)(int pass); %union { unsigned int val; + uint64_t val64; uintptr_t valptr; void *ptr; require_func_t require_func; @@ -78,6 +79,7 @@ typedef int (* require_func_t)(int pass); %type role_def roles %type cexpr cexpr_prim op role_mls_op %type ipv4_addr_def number +%type number64 %type require_decl_def %token PATH @@ -647,9 +649,9 @@ dev_context_def : pirq_context_def | pirq_context_def : PIRQCON number security_context_def {if (define_pirq_context($2)) return -1;} ; -iomem_context_def : IOMEMCON number security_context_def +iomem_context_def : IOMEMCON number64 security_context_def {if (define_iomem_context($2,$2)) return -1;} - | IOMEMCON number '-' number security_context_def + | IOMEMCON number64 '-' number64 security_context_def {if (define_iomem_context($2,$4)) return -1;} ; ioport_context_def : 
IOPORTCON number security_context_def @@ -815,6 +817,9 @@ filename
Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
On Mon, 2015-03-16 at 12:56 +, Jan Beulich wrote: > >>> On 16.03.15 at 13:51, wrote: > > On 03/16/2015 12:48 PM, Jan Beulich wrote: > >> Them returning garbage isn't what needs fixing. Instead the code > >> here should use a different condition to check whether this is the > >> boot CPU (e.g. looking at system_state). And that can very well be > >> done directly in this patch. > > > > What do you suggest, then? > > My preferred solution would be, as said, to leverage system_state. > Provided the state to look for is consistent between x86 and ARM. > Would something like this make sense? diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index cfca5a7..2f2aa73 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -1936,12 +1936,8 @@ static void init_pcpu(const struct scheduler *ops, int cpu) } /* Figure out which runqueue to put it in */ -rqi = 0; - -/* Figure out which runqueue to put it in */ -/* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */ -if ( cpu == 0 ) -rqi = 0; +if ( system_state == SYS_STATE_boot ) +rqi = boot_cpu_to_socket(cpu); else rqi = cpu_to_socket(cpu); @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu) static void * csched2_alloc_pdata(const struct scheduler *ops, int cpu) { -/* Check to see if the cpu is online yet */ -/* Note: cpu 0 doesn't get a STARTING callback */ -if ( cpu == 0 || cpu_to_socket(cpu) >= 0 ) +/* + * Actual initialization is deferred to when the pCPU will be + * online, via a STARTING callback. The only exception is + * the boot cpu, which does not get such a notification, and + * hence needs to be taken care of here. 
+ */ +if ( system_state == SYS_STATE_boot ) init_pcpu(ops, cpu); else printk("%s: cpu %d not online yet, deferring initializatgion\n", ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] libxl: Domain destroy: fork
On Tue, Mar 17, 2015 at 09:30:59AM -0600, Jim Fehlig wrote: > From: Ian Jackson > > Call xc_domain_destroy in a subprocess. That allows us to do so > asynchronously, rather than blocking the whole process calling libxl. > > The changes in detail: > > * Provide an libxl__ev_child in libxl__domain_destroy_state, and >initialise it in libxl__domain_destroy. There is no possibility >to `clean up' a libxl__ev_child, but there is no need to clean it up, as >the control flow ensures that we only continue after the child has >exited. > > * Call libxl__ev_child_fork at the right point and put the call to >xc_domain_destroy and associated logging in the child. (The child >opens a new xenctrl handle because we mustn't use the parent's.) > > * Consequently, the success return path from domain_destroy_domid_cb >no longer calls dis->callback. Instead it simply returns. > > * We plumb the errno value through the child's exit status, if it >fits. This means we normally do the logging only in the parent. > > * Incidentally, we fix the bug that we were treating the return value >from xc_domain_destroy as an errno value when in fact it is a >return value from do_domctl (in this case, 0 or -1 setting errno). > > Signed-off-by: Ian Jackson > Reviewed-by: Jim Fehlig > Tested-by: Jim Fehlig Reviewed-by: Wei Liu One nit below. > --- [...] > +ctx->xch = xc_interface_open(ctx->lg,0,0); > +if (!ctx->xch) goto badchild; > + > +rc = xc_domain_destroy(ctx->xch, domid); > +if (rc < 0) goto badchild; > +_exit(0); > + > +badchild: > +if (errno > 0 && errno < 126) { > +_exit(errno); > +} else { > +LOGE(ERROR, > + "xc_domain_destroy failed for %d (with difficult errno value %d)", Indentation is wrong. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Running update-server-info on push
It seems some repos have this via the hooks/post-update.sample having been renamed to hooks/post-update, but a few don't. So I've done: xen@xenbits:~/git$ for i in rumpuser-xen.git mini-os.git libvirt.git ; do > mv -iv $i/hooks/post-update.sample $i/hooks/post-update > done `rumpuser-xen.git/hooks/post-update.sample' -> `rumpuser-xen.git/hooks/post-update' `mini-os.git/hooks/post-update.sample' -> `mini-os.git/hooks/post-update' `libvirt.git/hooks/post-update.sample' -> `libvirt.git/hooks/post-update' xen@xenbits:~/git$ for i in rumpuser-xen.git mini-os.git libvirt.git ; do > ( cd $i && git update-server-info ) > done xen@xenbits:~/git$ I did not investigate people/* or xenclient/*. This will explain the failure of flight 36502 which has yet to be posted. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/3] libxl: In domain death search, start search at first domid we want
On Tue, Mar 17, 2015 at 09:30:57AM -0600, Jim Fehlig wrote: > From: Ian Jackson > > When domain_death_xswatch_callback needed a further call to > xc_domain_getinfolist it would restart it with the last domain it > found rather than the first one it wants. > > If it only wants one it will also only ask for one domain. The result > would then be that it gets the previous domain again (ie, the previous > one to the one it wants), which still doesn't reveal the answer to the > question, and it would therefore loop again. > > It's completely unclear to me why I thought it was a good idea to > start the xc_domain_getinfolist with the last domain previously found > rather than the first one left un-confirmed. The code has been that > way since it was introduced. > > Instead, start each xc_domain_getinfolist at the next domain whose > status we need to check. > > We also need to move the test for !evg into the loop, we now need evg > to compute the arguments to getinfolist. > > Signed-off-by: Ian Jackson > Reported-by: Jim Fehlig > Reviewed-by: Jim Fehlig > Tested-by: Jim Fehlig Acked-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] libxl: Domain destroy: unlock userdata earlier
On Tue, Mar 17, 2015 at 09:30:58AM -0600, Jim Fehlig wrote: > From: Ian Jackson > > Unlock the userdata before we actually call xc_domain_destroy. This > leaves open the possibility that other libxl callers will see the > half-destroyed domain (with no devices, paused), but this is fine. > > Signed-off-by: Ian Jackson > CC: Wei Liu > Reviewed-by: Jim Fehlig > Tested-by: Jim Fehlig Acked-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
> >> Additionally I think it should be considered whether the bitmap > >> approach of interpreting ->state is the right one, and we don't > >> instead want a clean 3-state (idle, sched, run) model. > > > > Could you elaborate a bit more please? As in three different unsigned int > > (or bool_t) that set in what state we are in? > > An enum { STATE_IDLE, STATE_SCHED, STATE_RUN }. Especially > if my comment above turns out to be wrong, you'd have no real > need for the SCHED and RUN flags to be set at the same time. I cobbled together what I believe is what you were thinking of. As you can see, to preserve the existing functionality such as being able to schedule N interrupt injections for the N interrupts we might get - I modified '->masked' to be an atomic counter. The end result is that we can still live-lock. Unless we: - Drop on the floor the injection of N interrupts and just deliver at most one per VMX_EXIT (and not bother with interrupts arriving when we are in the VMX handler). - Alter the softirq code slightly - to have a variant which will only iterate once over the pending softirq bits per call. (so save a copy of the bitmap on the stack when entering the softirq handler - and use that. We could also xor it against the current one to catch any non-duplicate bits being set that we should deal with). Here is the compile-tested, but not run-time tested patch. >From e7d8bcd7c5d32c520554a4ad69c4716246036002 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 17 Mar 2015 13:31:52 -0400 Subject: [RFC PATCH] dpci: Switch to tristate instead of bitmap *TODO*: - Writeup.
- Tests Suggested-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/io.c | 140 --- xen/include/xen/hvm/irq.h| 4 +- 2 files changed, 82 insertions(+), 62 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..663e104 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -30,42 +30,28 @@ static DEFINE_PER_CPU(struct list_head, dpci_list); /* - * These two bit states help to safely schedule, deschedule, and wait until - * the softirq has finished. - * - * The semantics behind these two bits is as follow: - * - STATE_SCHED - whoever modifies it has to ref-count the domain (->dom). - * - STATE_RUN - only softirq is allowed to set and clear it. If it has - * been set hvm_dirq_assist will RUN with a saved value of the - * 'struct domain' copied from 'pirq_dpci->dom' before STATE_RUN was set. - * - * The usual states are: STATE_SCHED(set) -> STATE_RUN(set) -> - * STATE_SCHED(unset) -> STATE_RUN(unset). - * - * However the states can also diverge such as: STATE_SCHED(set) -> - * STATE_SCHED(unset) -> STATE_RUN(set) -> STATE_RUN(unset). That means - * the 'hvm_dirq_assist' never run and that the softirq did not do any - * ref-counting. - */ - -enum { -STATE_SCHED, -STATE_RUN -}; - -/* * This can be called multiple times, but the softirq is only raised once. - * That is until the STATE_SCHED state has been cleared. The state can be - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before + * That is until state is in init. The state can be changed by: + * the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), + * or by 'pt_pirq_softirq_reset' (which will try to init the state before * the softirq had a chance to run). 
*/ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; -if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) +switch ( cmpxchg(&pirq_dpci->state, STATE_INIT, STATE_SCHED) ) +{ +case STATE_RUN: +case STATE_SCHED: +/* + * The pirq_dpci->mapping has been incremented to let us know + * how many we have left to do. + */ return; +case STATE_INIT: +break; +} get_knownalive_domain(pirq_dpci->dom); @@ -85,7 +71,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) */ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) { -if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) ) +if ( pirq_dpci->state != STATE_INIT ) return 1; /* @@ -109,22 +95,22 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) ASSERT(spin_is_locked(&d->event_lock)); -switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) ) +switch ( cmpxchg(&pirq_dpci->state, STATE_SCHED, STATE_INIT) ) { -case (1 << STATE_SCHED): +case STATE_SCHED: /* - * We are going to try to de-schedule the softirq before it goes in - * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'. + * We are going to try to de-schedule the softirq before it goes to + * running state. Whoever moves from
Re: [Xen-devel] [PATCH 2/2] VT-d: extend XSA-59 workaround to XeonE5 v3 (Haswell)
Note that the following Haswell chipsets should also be included in this list: Haswell - 0xc0f, 0xd00, 0xd04, 0xd08, 0xd0f, 0xa00, 0xa08, 0xa0f -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, December 19, 2014 1:42 AM To: xen-devel Cc: Dugger, Donald D; Tian, Kevin; Zhang, Yang Z Subject: [PATCH 2/2] VT-d: extend XSA-59 workaround to XeonE5 v3 (Haswell) Note that the datasheet lacks PCI IDs for Dev 1 Fn 0-1, so their IDs are being added based on what https://pci-ids.ucw.cz/read/PC/8086 says. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -431,6 +431,7 @@ void pci_vtd_quirk(const struct pci_dev * - Potential security issue if malicious guest trigger VT-d faults. */ case 0x0e28: /* Xeon-E5v2 (IvyBridge) */ +case 0x2f28: /* Xeon-E5v3 (Haswell) */ case 0x342e: /* Tylersburg chipset (Nehalem / Westmere systems) */ case 0x3728: /* Xeon C5500/C3500 (JasperForest) */ case 0x3c28: /* Sandybridge */ @@ -443,6 +444,9 @@ void pci_vtd_quirk(const struct pci_dev /* Xeon E5/E7 v2 */ case 0x0e00: /* host bridge */ case 0x0e01: case 0x0e04 ... 0x0e0b: /* root ports */ +/* Xeon E5 v3 */ +case 0x2f00: /* host bridge */ +case 0x2f01 ... 0x2f0b: /* root ports */ /* Tylersburg (EP)/Boxboro (MP) chipsets (NHM-EP/EX, WSM-EP/EX) */ case 0x3400 ... 0x3407: /* host bridges */ case 0x3408 ... 0x3411: case 0x3420 ... 0x3421: /* root ports */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] VT-d: make XSA-59 workaround fully cover XeonE5/E7 v2
Note that the following Nehalem/Westmere chipsets should be included in this list: Nehalem - 0x40, 0x2c01, 0x2c41, 0x313x Westmere - 0x2c70, 0x2d81, 0xd15x -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, December 19, 2014 1:41 AM To: xen-devel Cc: Dugger, Donald D; Tian, Kevin; Zhang, Yang Z Subject: [PATCH 1/2] VT-d: make XSA-59 workaround fully cover XeonE5/E7 v2 So far only the VT-d UR masking was being done for them. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -440,6 +440,9 @@ void pci_vtd_quirk(const struct pci_dev seg, bus, dev, func); break; +/* Xeon E5/E7 v2 */ +case 0x0e00: /* host bridge */ +case 0x0e01: case 0x0e04 ... 0x0e0b: /* root ports */ /* Tylersburg (EP)/Boxboro (MP) chipsets (NHM-EP/EX, WSM-EP/EX) */ case 0x3400 ... 0x3407: /* host bridges */ case 0x3408 ... 0x3411: case 0x3420 ... 0x3421: /* root ports */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] slightly reduce vm_assist code
At 15:55 + on 17 Mar (1426607705), Jan Beulich wrote: > - drop an effectively unused struct pv_vcpu field (x86) > - adjust VM_ASSIST() to prepend VMASST_TYPE_ > > Signed-off-by: Jan Beulich Reviewed-by: Tim Deegan , though I think these would have been better as two separate patches. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
On Tue, Mar 17, 2015 at 04:06:14PM +, Jan Beulich wrote: > >>> On 17.03.15 at 16:38, wrote: > > --- a/xen/drivers/passthrough/io.c > > +++ b/xen/drivers/passthrough/io.c > > @@ -804,7 +804,17 @@ static void dpci_softirq(void) > > d = pirq_dpci->dom; > > smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ > > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > > -BUG(); > > +{ > > +unsigned long flags; > > + > > +/* Put back on the list and retry. */ > > +local_irq_save(flags); > > +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); > > +local_irq_restore(flags); > > + > > +raise_softirq(HVM_DPCI_SOFTIRQ); > > +continue; > > +} > > As just said in another mail - unless there are convincing new > arguments in favor of this (more of a hack than a real fix), I'm > not going to accept it and instead consider reverting the > offending commit. Iirc the latest we had come to looked quite a > bit better than this one. The latest one (please see attached) would cause a deadlock iff, on the CPU where we are running the softirq, a do_IRQ comes in for the exact dpci we are in the process of executing. > > Jan > >From 6b32dccfbe00518d3ca9cd94d19a6e007b2645d9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 17 Mar 2015 09:46:09 -0400 Subject: [PATCH] dpci: when scheduling spin until STATE_RUN or STATE_SCHED has been cleared. There is a race when we clear the STATE_SCHED in the softirq - which allows the 'raise_softirq_for' (on another CPU) to schedule the dpci. Specifically this can happen when the other CPU receives an interrupt, calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same dpci structure). There would be two 'dpci_softirq' running at the same time (on different CPUs) where on one CPU it would be executing hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN) and on the other CPU it is trying to call: if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) BUG(); Since STATE_RUN is already set it would end badly. 
The reason we can get hit with this is when an interrupt affinity is set over multiple CPUs. Potential solutions: a) Instead of the BUG() we can put the dpci back on the per-cpu list to deal with later (when the softirqs are activated again). This putting of the 'dpci' back on the per-cpu list is a spin until the bad condition clears. b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for to detect the 'STATE_RUN' bit being set and schedule the dpci in a safer manner (delay it). The dpci would still not be scheduled when the STATE_SCHED bit was set. c) This patch explores a third option - we will only schedule the dpci when the state is cleared (no STATE_SCHED and no STATE_RUN). We will spin if STATE_RUN is set (as it is in progress and will finish). If STATE_SCHED is set (so it hasn't run yet) we won't try to spin and just exit. This can cause a deadlock if the interrupt comes when we are processing the dpci in the softirq. Interestingly, the old ('tasklet') code used the a) mechanism. If the function assigned to the tasklet was running - the softirq that ran said function (hvm_dirq_assist) would be responsible for putting the tasklet back on the per-cpu list. This would allow a running tasklet and a 'to-be-scheduled' tasklet to exist at the same time. This solution moves this 'to-be-scheduled' job to be done in 'raise_softirq_for' (instead of the 'softirq'). 
Reported-by: Sander Eikelenboom Reported-by: Malcolm Crossley Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/io.c | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..9c30ebb 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -63,10 +63,32 @@ enum { static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; +unsigned long old; -if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) -return; - +/* + * This cmpxchg spins until the state is zero (unused). + */ +for ( ;; ) +{ +old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED); +switch ( old ) +{ +case (1 << STATE_SCHED): +/* + * Whenever STATE_SCHED is set we MUST not schedule it. + */ +return; +case (1 << STATE_RUN) | (1 << STATE_SCHED): +case (1 << STATE_RUN): +/* Getting close to finish. Spin. */ +continue; +} +/* + * If the 'state' is 0 (not in use) we can schedule it. + */ +if ( old == 0 ) +break; +} get_knownalive_domain(pirq_dpci->dom); local_irq_save(flags); -- 2.1.0 ___ Xen
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On 17/03/15 14:29, Wei Liu wrote: I've now successfully built QEMU upstream with rump kernel. However to make it fully functional as a stubdom, there are some missing pieces to be added in. 1. The ability to access QMP socket (a unix socket) from Dom0. That will be used to issue command to QEMU. 2. The ability to access files in Dom0. That will be used to write to / read from QEMU state file. There's a way to map file access to rump kernel hypercalls with a facility called etfs (extra-terrestrial file system). In fact, the current implementation for accessing the Xen block device from the rump kernel is done using etfs (... historical reasons, I'd have to go back 5+ years to explain why it doesn't attach as a regular block device). etfs isn't a file system, e.g. it doesn't allow listing files or removing them, but it does give you complete control of what happens when data is read or written for /some/path. But based on the other posts, sounds like it might be enough for what you need. See: http://man.netbsd.org/cgi-bin/man-cgi?rump_etfs++NetBSD-current 3. The building process requires mini-os headers. That will be used to build libxc (the controlling library). That's not really a problem, though I do want to limit the amount of interface we claim to support with rump kernels. For example, ISTR you mentioned on irc you'd like to use minios wait.h. It would be better to use pthread synchronization instead of minios synchronization. That way, if we do have a need to change the underlying threading in the future, you won't run into trouble. So, we should just determine what is actually needed and expose those bits by default. One of my lessons learned from the existing stubdom stuffs is that I should work with upstream and produce maintainable code. So before I do anything for real I'd better consult the community. My gut feeling is that the first two requirements are not really Xen specific. Let me know what you guys plan and think. Yes, please. 
If there's something silly going on, it's most likely due to: 1) we didn't get that far in our experiments and weren't aware of it 2) we were aware, but some bits were even sillier, taking priority Either way, a real need is a definite reason to expedite fixing. - antti ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN()
At 15:56 + on 17 Mar (1426607770), Jan Beulich wrote: > Most callers have this available already, and the functions don't need > any vcpu specifics. > > Signed-off-by: Jan Beulich Reviewed-by: Tim Deegan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
On Tue, Mar 17, 2015 at 04:01:49PM +, Jan Beulich wrote: > >>> On 17.03.15 at 15:54, wrote: > > On Tue, Mar 17, 2015 at 09:42:21AM +0100, Sander Eikelenboom wrote: > >> I'm still running with this first simple stopgap patch from Konrad, > >> and it has worked fine for me since. > > > > I believe the patch that Sander and Malcom had been running is the best > > candidate. > > That's the one Sander had quoted I suppose? I don't think this is Correct. > any better in terms of live locking, and we went quite some hoops > to get to something that looked more like a fix than a quick > workaround. (If there's nothing we can agree to, we'll have to > revert as we did for 4.5.) The live-locking does get broken (other softirqs get activated which moves things along). Keep in mind that the live-locking scenario exists already in Xen 4.x with the tasklet implementation. > > Jan > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] tools/libxl/libxl_cpuid.c: Fix leak of resstr on error path
On Mon, Mar 16, 2015 at 10:06:17AM +, PRAMOD DEVENDRA wrote: > From: Pramod Devendra > > Signed-off-by: Pramod Devendra > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu Acked-by: Wei Liu > --- > tools/libxl/libxl_cpuid.c |8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c > index b0bdc9d..c66e912 100644 > --- a/tools/libxl/libxl_cpuid.c > +++ b/tools/libxl/libxl_cpuid.c > @@ -223,9 +223,6 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list > *cpuid, const char* str) > } > entry = cpuid_find_match(cpuid, flag->leaf, flag->subleaf); > resstr = entry->policy[flag->reg - 1]; > -if (resstr == NULL) { > -resstr = strdup(""); > -} Minor nit. I would prefer "resstr = " be grouped with the code you moved. No need to resend though. Wei. > num = strtoull(val, &endptr, 0); > flags[flag->length] = 0; > if (endptr != val) { > @@ -242,6 +239,11 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list > *cpuid, const char* str) > return 3; > } > } > + > +if (resstr == NULL) { > +resstr = strdup(""); > +} > + > /* the family and model entry is potentially split up across > * two fields in Fn_0001_EAX, so handle them here separately. > */ > -- > 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxc/xentrace: Replace xc_tbuf_set_cpu_mask with CPU mask with xc_cpumap_t instead of uint32_t
On Tue, Mar 17, 2015 at 01:52:27PM +, George Dunlap wrote: > On 03/13/2015 08:37 PM, Konrad Rzeszutek Wilk wrote: > > +static int parse_cpumask(const char *arg) > > +{ > > +xc_cpumap_t map; > > +uint32_t v, i; > > +int bits = 0; > > + > > +map = malloc(sizeof(uint32_t)); > > +if ( !map ) > > +return -ENOMEM; > > + > > +v = argtol(arg, 0); > > +for ( i = 0; i < sizeof(uint32_t) ; i++ ) > > +map[i] = (v >> (i * 8)) & 0xff; > > + > > +for ( i = 0; v; v >>= 1) > > +bits += v & 1; > > Uum, it looks like this is counting the 1-bits in v, not the total > number of bist. So "0x8000" would finish with bits == 1 ; but we would > this to finish with bits == 16, don't we? Duh! It should be: for ( bits = 0; v; v >>= 1 ) bits ++; And the 'int bits = 0' can now be 'int bits'. See patch: >From aa8a0ddc295161f55531c7f5ac643aadbfe70917 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Fri, 20 Jun 2014 15:34:53 -0400 Subject: [PATCH] libxc/xentrace: Replace xc_tbuf_set_cpu_mask with CPU mask with xc_cpumap_t instead of uint32_t We replace the implementation of xc_tbuf_set_cpu_mask with an xc_cpumap_t instead of a uint32_t. This means we can use an arbitrary bitmap without being limited to the 32-bits as previously we were. Furthermore since there is only one user of xc_tbuf_set_cpu_mask we just replace it and its user in one go. We also add an macro which can be used by both libxc and xentrace. And update the man page to describe this behavior. Signed-off-by: Konrad Rzeszutek Wilk Acked-by: Ian Campbell [libxc pieces] [v2: Fix up the bit mask counting. 
--- tools/libxc/include/xenctrl.h | 7 ++- tools/libxc/xc_tbuf.c | 26 +++ tools/xentrace/xentrace.8 | 3 ++ tools/xentrace/xentrace.c | 106 -- 4 files changed, 116 insertions(+), 26 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index df18292..713e52b 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1534,6 +1534,11 @@ int xc_availheap(xc_interface *xch, int min_width, int max_width, int node, */ /** + * Useful macro for converting byte arrays to bitmaps. + */ +#define XC_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) + +/** * xc_tbuf_enable - enable tracing buffers * * @parm xch a handle to an open hypervisor interface @@ -1574,7 +1579,7 @@ int xc_tbuf_set_size(xc_interface *xch, unsigned long size); */ int xc_tbuf_get_size(xc_interface *xch, unsigned long *size); -int xc_tbuf_set_cpu_mask(xc_interface *xch, uint32_t mask); +int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask, int bits); int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask); diff --git a/tools/libxc/xc_tbuf.c b/tools/libxc/xc_tbuf.c index 8777492..d54da8a 100644 --- a/tools/libxc/xc_tbuf.c +++ b/tools/libxc/xc_tbuf.c @@ -113,15 +113,23 @@ int xc_tbuf_disable(xc_interface *xch) return tbuf_enable(xch, 0); } -int xc_tbuf_set_cpu_mask(xc_interface *xch, uint32_t mask) +int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask, int bits) { DECLARE_SYSCTL; -DECLARE_HYPERCALL_BUFFER(uint8_t, bytemap); +DECLARE_HYPERCALL_BOUNCE(mask, XC_DIV_ROUND_UP(bits, 8), XC_HYPERCALL_BUFFER_BOUNCE_IN); int ret = -1; -uint64_t mask64 = mask; +int local_bits; -bytemap = xc_hypercall_buffer_alloc(xch, bytemap, sizeof(mask64)); -if ( bytemap == NULL ) +if ( bits <= 0 ) +goto out; + +local_bits = xc_get_max_cpus(xch); +if ( bits > local_bits ) +{ +PERROR("Wrong amount of bits supplied: %d > %d!\n", bits, local_bits); +goto out; +} +if ( xc_hypercall_bounce_pre(xch, mask) ) { PERROR("Could not allocate memory for xc_tbuf_set_cpu_mask 
hypercall"); goto out; @@ -131,14 +139,12 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, uint32_t mask) sysctl.interface_version = XEN_SYSCTL_INTERFACE_VERSION; sysctl.u.tbuf_op.cmd = XEN_SYSCTL_TBUFOP_set_cpu_mask; -bitmap_64_to_byte(bytemap, &mask64, sizeof (mask64) * 8); - -set_xen_guest_handle(sysctl.u.tbuf_op.cpu_mask.bitmap, bytemap); -sysctl.u.tbuf_op.cpu_mask.nr_bits = sizeof(bytemap) * 8; +set_xen_guest_handle(sysctl.u.tbuf_op.cpu_mask.bitmap, mask); +sysctl.u.tbuf_op.cpu_mask.nr_bits = bits; ret = do_sysctl(xch, &sysctl); -xc_hypercall_buffer_free(xch, bytemap); +xc_hypercall_bounce_post(xch, mask); out: return ret; diff --git a/tools/xentrace/xentrace.8 b/tools/xentrace/xentrace.8 index ac18e9f..c176a96 100644 --- a/tools/xentrace/xentrace.8 +++ b/tools/xentrace/xentrace.8 @@ -38,6 +38,9 @@ for new data. .TP .B -c, --cpu-mask=c set bitmask of CPUs to trace. It is limited to 32-bits. +If not specified, the cpu-mask of all of the available CPUs will be +constructed. + .TP .B -e, --evt-mask=e set event capture mask. If not specified
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
>>> On 17.03.15 at 16:38, wrote: > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -804,7 +804,17 @@ static void dpci_softirq(void) > d = pirq_dpci->dom; > smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > -BUG(); > +{ > +unsigned long flags; > + > +/* Put back on the list and retry. */ > +local_irq_save(flags); > +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); > +local_irq_restore(flags); > + > +raise_softirq(HVM_DPCI_SOFTIRQ); > +continue; > +} As just said in another mail - unless there are convincing new arguments in favor of this (more of a hack than a real fix), I'm not going to accept it and instead consider reverting the offending commit. Iirc the latest we had come to looked quite a bit better than this one. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 2015-03-17 at 14:29 +, Wei Liu wrote: > 2. The ability to access files in Dom0. That will be used to write to / >read from QEMU state file. This requirement is not as broad as you make it sound. All which is really required is the ability to slurp in or write out a blob of bytes to a service running in a control domain, not actual ability to read/write files in dom0 (which would need careful security consideration!). For the old qemu-traditional stubdom for example this is implemented as a pair of console devices (one r/o for restore + one w/o for save) which are setup by the toolstack at start of day and pre-plumbed into two temporary files. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
Add support for handling PMU interrupts for PV guests. VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush hypercall. This allows the guest to access PMU MSR values that are stored in VPMU context which is shared between hypervisor and domain, thus avoiding traps to hypervisor. Since the interrupt handler may now force VPMU context save (i.e. set VPMU_CONTEXT_SAVE flag) we need to make changes to amd_vpmu_save() which until now expected this flag to be set only when the counters were stopped. Signed-off-by: Boris Ostrovsky Acked-by: Daniel De Graaf --- Changes in v19: * Adjusted for new ops interfaces (passing vcpu vs. vpmu) * Test for domain->max_cpu in choose_hwdom_vcpu() instead of 'domain->vcpu!=NULL' * Replaced '!(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV))' test with 'vpmu_mode == XENPMU_MODE_OFF' in vpmu_rd/wrmsr() (to make more logical diff in patch#13) xen/arch/x86/hvm/svm/vpmu.c | 11 +- xen/arch/x86/hvm/vpmu.c | 211 -- xen/include/public/arch-x86/pmu.h | 6 ++ xen/include/public/pmu.h | 2 + xen/include/xsm/dummy.h | 4 +- xen/xsm/flask/hooks.c | 2 + 6 files changed, 216 insertions(+), 20 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 474d0db..0997901 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -228,17 +228,12 @@ static int amd_vpmu_save(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); unsigned int i; -/* - * Stop the counters. If we came here via vpmu_save_force (i.e. - * when VPMU_CONTEXT_SAVE is set) counters are already stopped. 
- */ +for ( i = 0; i < num_counters; i++ ) +wrmsrl(ctrls[i], 0); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) { vpmu_set(vpmu, VPMU_FROZEN); - -for ( i = 0; i < num_counters; i++ ) -wrmsrl(ctrls[i], 0); - return 0; } diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 26eda34..c287d8b 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -87,31 +87,57 @@ static void __init parse_vpmu_param(char *s) void vpmu_lvtpc_update(uint32_t val) { struct vpmu_struct *vpmu; +struct vcpu *curr; if ( vpmu_mode == XENPMU_MODE_OFF ) return; -vpmu = vcpu_vpmu(current); +curr = current; +vpmu = vcpu_vpmu(curr); vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); -apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); + +/* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */ +if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data || + !(vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) +apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); } int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported) { -struct vpmu_struct *vpmu = vcpu_vpmu(current); +struct vcpu *curr = current; +struct vpmu_struct *vpmu; if ( vpmu_mode == XENPMU_MODE_OFF ) return 0; +vpmu = vcpu_vpmu(curr); if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) -return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported); +{ +int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported); + +/* + * We may have received a PMU interrupt during WRMSR handling + * and since do_wrmsr may load VPMU context we should save + * (and unload) it again. 
+ */ +if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && + (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) +{ +vpmu_set(vpmu, VPMU_CONTEXT_SAVE); +vpmu->arch_vpmu_ops->arch_vpmu_save(curr); +vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); +} +return ret; +} + return 0; } int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) { -struct vpmu_struct *vpmu = vcpu_vpmu(current); +struct vcpu *curr = current; +struct vpmu_struct *vpmu; if ( vpmu_mode == XENPMU_MODE_OFF ) { @@ -119,24 +145,163 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) return 0; } +vpmu = vcpu_vpmu(curr); if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) -return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); +{ +int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); + +if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && + (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) +{ +vpmu_set(vpmu, VPMU_CONTEXT_SAVE); +vpmu->arch_vpmu_ops->arch_vpmu_save(curr); +vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); +} +return ret; +} else *msr_content = 0; return 0; } +static inline struct vcpu *choose_hwdom_vcpu(void) +{ +unsigned idx; + +if
[Xen-devel] [PATCH 0/3] libxl: Fixes from Ian Jackson
This is a small series of libxl patches I received off-list from Ian Jackson. The patches fix a few issues I found when converting the libvirt libxl driver to use a single libxl_ctx. Patch 2 has been modified slightly to address off-list comments from Wei Liu. Ian Jackson (3): libxl: In domain death search, start search at first domid we want libxl: Domain destroy: unlock userdata earlier libxl: Domain destroy: fork tools/libxl/libxl.c | 77 +++- tools/libxl/libxl_internal.h | 1 + 2 files changed, 63 insertions(+), 15 deletions(-) -- 1.8.0.1
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
>>> On 17.03.15 at 15:54, wrote: > On Tue, Mar 17, 2015 at 09:42:21AM +0100, Sander Eikelenboom wrote: >> I'm still running with this first simple stopgap patch from Konrad, >> and it has worked fine for me since. > > I believe the patch that Sander and Malcolm had been running is the best > candidate. That's the one Sander had quoted I suppose? I don't think this is any better in terms of live locking, and we went through quite some hoops to get to something that looked more like a fix than a quick workaround. (If there's nothing we can agree to, we'll have to revert as we did for 4.5.) Jan
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 2015-03-17 at 15:27 +, Wei Liu wrote: > This looks most interesting as it implies we can easily pipe a console > to it. BTW, rather than raw consoles we should probably consider using the channel extension: http://xenbits.xen.org/docs/unstable/misc/channel.txt Ian.
[Xen-devel] [PATCH v2 5/5] xen: sched_rt: print useful affinity info when dumping
In fact, printing the cpupool's CPU online mask for each vCPU is just redundant, as that is the same for all the vCPUs of all the domains in the same cpupool, while hard affinity is already part of the output of dumping domains info. Instead, print the intersection between hard affinity and online CPUs, which is --in case of this scheduler-- the effective affinity always used for the vCPUs. This change also takes the chance to add a scratch cpumask area, to avoid having to either put one (more) cpumask_t on the stack, or dynamically allocate it within the dumping routine. (The former being bad because hypervisor stack size is limited, the latter because dynamic allocations can fail, if the hypervisor was built for a large enough number of CPUs.) Such scratch area can be used to kill most of the cpumasks{_var}_t local variables in other functions in the file, but that is *NOT* done in this change. Finally, convert the file to use keyhandler scratch, instead of open coded string buffers. Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Meng Xu Cc: Jan Beulich Cc: Keir Fraser --- Changes from v1: * improved changelog; * made a local variable to point to the correct scratch mask, as suggested during review. --- xen/common/sched_rt.c | 42 +- 1 file changed, 33 insertions(+), 9 deletions(-) diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index 7c39a9e..ec28956 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -124,6 +124,12 @@ #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4) #define TRC_RTDS_SCHED_TASKLETTRC_SCHED_CLASS_EVT(RTDS, 5) + /* + * Useful to avoid too many cpumask_var_t on the stack. 
+ */ +static cpumask_t **_cpumask_scratch; +#define cpumask_scratch _cpumask_scratch[smp_processor_id()] + /* * Systme-wide private data, include global RunQueue/DepletedQ * Global lock is referenced by schedule_data.schedule_lock from all @@ -218,8 +224,7 @@ __q_elem(struct list_head *elem) static void rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc) { -char cpustr[1024]; -cpumask_t *cpupool_mask; +cpumask_t *cpupool_mask, *mask; ASSERT(svc != NULL); /* idle vcpu */ @@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc) return; } -cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity); +/* + * We can't just use 'cpumask_scratch' because the dumping can + * happen from a pCPU outside of this scheduler's cpupool, and + * hence it's not right to use the pCPU's scratch mask (which + * may even not exist!). On the other hand, it is safe to use + * svc->vcpu->processor's own scratch space, since we hold the + * runqueue lock. 
+ */ +mask = _cpumask_scratch[svc->vcpu->processor]; + +cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool); +cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity); +cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask); printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime")," " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n" - " \t\t onQ=%d runnable=%d cpu_hard_affinity=%s ", + " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n", svc->vcpu->domain->domain_id, svc->vcpu->vcpu_id, svc->vcpu->processor, @@ -243,11 +260,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc) svc->last_start, __vcpu_on_q(svc), vcpu_runnable(svc->vcpu), -cpustr); -memset(cpustr, 0, sizeof(cpustr)); -cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool); -cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_mask); -printk("cpupool=%s\n", cpustr); +svc->flags, +keyhandler_scratch); } static void @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops) if ( prv == NULL ) return -ENOMEM; +_cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids); +if ( _cpumask_scratch == NULL ) +return -ENOMEM; + spin_lock_init(&prv->lock); INIT_LIST_HEAD(&prv->sdom); INIT_LIST_HEAD(&prv->runq); @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops) { struct rt_private *prv = rt_priv(ops); +xfree(_cpumask_scratch); xfree(prv); } @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu) per_cpu(schedule_data, cpu).schedule_lock = &prv->lock; spin_unlock_irqrestore(&prv->lock, flags); +if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) ) +return NULL; + /* 1 indicates alloc. succeed in schedule.c */ return (void *)1; } @@ -462,6 +484,8 @@ rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu) sd->schedule_lock = &sd->_lock; spin_unlock_irqrestore(&prv->lock, flag
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 17 Mar 2015, Anthony PERARD wrote: > On Tue, Mar 17, 2015 at 02:29:07PM +, Wei Liu wrote: > > I've now successfully built QEMU upstream with rump kernel. However to > > make it fully functional as a stubdom, there are some missing pieces to > > be added in. > > > > 1. The ability to access QMP socket (a unix socket) from Dom0. That > >will be used to issue command to QEMU. > > The QMP "socket" does not needs to be a unix socket. It can be any of > those (from qemu --help): > Character device options: > -chardev null,id=id[,mux=on|off] > -chardev > socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds] > [,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (tcp) > -chardev > socket,id=id,path=path[,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] > (unix) > -chardev udp,id=id[,host=host],port=port[,localaddr=localaddr] > [,localport=localport][,ipv4][,ipv6][,mux=on|off] > -chardev msmouse,id=id[,mux=on|off] > -chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]] > [,mux=on|off] > -chardev ringbuf,id=id[,size=size] > -chardev file,id=id,path=path[,mux=on|off] > -chardev pipe,id=id,path=path[,mux=on|off] > -chardev pty,id=id[,mux=on|off] > -chardev stdio,id=id[,mux=on|off][,signal=on|off] > -chardev serial,id=id,path=path[,mux=on|off] > -chardev tty,id=id,path=path[,mux=on|off] > -chardev parallel,id=id,path=path[,mux=on|off] > -chardev parport,id=id,path=path[,mux=on|off] > -chardev spicevmc,id=id,name=name[,debug=debug] > -chardev spiceport,id=id,name=name[,debug=debug] > > > 2. The ability to access files in Dom0. That will be used to write to / > >read from QEMU state file. > > To save a QEMU state (write), we do use a filename. But I guest we could > expand the QMP command (xen-save-devices-state) to use something else, if > it's easier. > > To restore, we provide a file descriptor from libxl to QEMU, with the fd on > the file that contain the state we want to restore. 
But there are a few > other ways to load a state (from qemu.git/docs/migration.txt): > - tcp migration: do the migration using tcp sockets > - unix migration: do the migration using unix sockets > - exec migration: do the migration using the stdin/stdout through a process. > - fd migration: do the migration using a file descriptor that is > passed to QEMU. QEMU doesn't care how this file descriptor is opened. QEMU would definitely be happy if we started using fds instead of files to save/restore the state on Xen.
[Xen-devel] [PATCH v2 0/5] Improving dumping of scheduler related info
Take 2. Some of the patches have been checked-in already, so here's what's remaining: - fix a bug in the RTDS scheduler (patch 1), - improve how the whole process of dumping scheduling info is serialized, by moving all locking code into specific schedulers (patch 2), - print more useful scheduling related information (patches 3, 4 and 5). Git branch here: git://xenbits.xen.org/people/dariof/xen.git rel/sched/dump-v2 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/dump-v2 I think I addressed all the comments raised upon v1. More details in the changelogs of the various patches. Thanks and Regards, Dario --- Dario Faggioli (5): xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains xen: rework locking for dump of scheduler info (debug-key r) xen: print online pCPUs and free pCPUs when dumping xen: sched_credit2: more info when dumping xen: sched_rt: print useful affinity info when dumping xen/common/cpupool.c | 12 + xen/common/sched_credit.c | 42 ++- xen/common/sched_credit2.c | 53 +--- xen/common/sched_rt.c | 59 xen/common/sched_sedf.c| 16 xen/common/schedule.c |5 +--- 6 files changed, 157 insertions(+), 30 deletions(-)
[Xen-devel] [PATCH v2 1/5] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
being serviced by the RTDS scheduler, as that is a legit situation to be in: think, for instance, of a newly created RTDS cpupool, with no domains migrated to it yet. While there: - move the spinlock acquisition up, to effectively protect the domain list and avoid races; - the mask of online pCPUs was being retrieved but then not used anywhere in the function: get rid of that. Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Meng Xu Cc: Jan Beulich Cc: Keir Fraser Reviewed-by: Meng Xu Acked-by: George Dunlap --- Changes from v1: * updated the changelog as requested during review; * fixed coding style, as requested during review; * fixed label indentation, as requested during review. --- xen/common/sched_rt.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index ffc5107..2b0b7c6 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -264,18 +264,17 @@ rt_dump(const struct scheduler *ops) struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter; struct rt_private *prv = rt_priv(ops); struct rt_vcpu *svc; -cpumask_t *online; struct rt_dom *sdom; unsigned long flags; -ASSERT(!list_empty(&prv->sdom)); +spin_lock_irqsave(&prv->lock, flags); + +if ( list_empty(&prv->sdom) ) +goto out; -sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem); -online = cpupool_scheduler_cpumask(sdom->dom->cpupool); runq = rt_runq(ops); depletedq = rt_depletedq(ops); -spin_lock_irqsave(&prv->lock, flags); printk("Global RunQueue info:\n"); list_for_each( iter, runq ) { @@ -303,6 +302,7 @@ rt_dump(const struct scheduler *ops) } } + out: spin_unlock_irqrestore(&prv->lock, flags); }
Re: [Xen-devel] [PATCH 1/3] tools/libxl/libxl_qmp.c: Make sure sun_path is NULL terminated in qmp_open
On Mon, Mar 16, 2015 at 10:05:38AM +, PRAMOD DEVENDRA wrote: > From: Pramod Devendra > > Signed-off-by: Pramod Devendra > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu > --- > tools/libxl/libxl_qmp.c |5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c > index c7324e6..1080162 100644 > --- a/tools/libxl/libxl_qmp.c > +++ b/tools/libxl/libxl_qmp.c > @@ -369,10 +369,13 @@ static int qmp_open(libxl__qmp_handler *qmp, const char > *qmp_socket_path, > ret = libxl_fd_set_cloexec(qmp->ctx, qmp->qmp_fd, 1); > if (ret) return -1; > > +if(sizeof (qmp->addr.sun_path) <= strlen(qmp_socket_path)) > +return -1; > + I know this is not your fault, but the function seems to leak qmp_fd on error path (qmp_fd is not closed). Do you fancy fixing that? Wei. > memset(&qmp->addr, 0, sizeof (qmp->addr)); > qmp->addr.sun_family = AF_UNIX; > strncpy(qmp->addr.sun_path, qmp_socket_path, > -sizeof (qmp->addr.sun_path)); > +sizeof (qmp->addr.sun_path)-1); > > do { > ret = connect(qmp->qmp_fd, (struct sockaddr *) &qmp->addr, > -- > 1.7.10.4
Re: [Xen-devel] [PATCH 3/3] tools/libxc/xc_linux_osdep.c: Don't leak mmap() mapping on map_foreign_bulk() error path
On Mon, Mar 16, 2015 at 10:06:50AM +, PRAMOD DEVENDRA wrote: > From: Pramod Devendra > > Signed-off-by: Pramod Devendra > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu Acked-by: Wei Liu > --- > tools/libxc/xc_linux_osdep.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c > index b6c435a..ce59590 100644 > --- a/tools/libxc/xc_linux_osdep.c > +++ b/tools/libxc/xc_linux_osdep.c > @@ -323,6 +323,7 @@ static void *linux_privcmd_map_foreign_bulk(xc_interface > *xch, xc_osdep_handle h > if ( pfn == MAP_FAILED ) > { > PERROR("xc_map_foreign_bulk: mmap of pfn array failed"); > +(void)munmap(addr, (unsigned long)num << XC_PAGE_SHIFT); > return NULL; > } > } > -- > 1.7.10.4
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, Mar 17, 2015 at 03:15:17PM +, Anthony PERARD wrote: > On Tue, Mar 17, 2015 at 02:29:07PM +, Wei Liu wrote: > > I've now successfully built QEMU upstream with rump kernel. However to > > make it fully functional as a stubdom, there are some missing pieces to > > be added in. > > > > 1. The ability to access QMP socket (a unix socket) from Dom0. That > >will be used to issue command to QEMU. > > The QMP "socket" does not needs to be a unix socket. It can be any of > those (from qemu --help): > Character device options: > -chardev null,id=id[,mux=on|off] > -chardev > socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds] > [,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (tcp) > -chardev > socket,id=id,path=path[,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] > (unix) > -chardev udp,id=id[,host=host],port=port[,localaddr=localaddr] > [,localport=localport][,ipv4][,ipv6][,mux=on|off] > -chardev msmouse,id=id[,mux=on|off] > -chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]] > [,mux=on|off] > -chardev ringbuf,id=id[,size=size] > -chardev file,id=id,path=path[,mux=on|off] > -chardev pipe,id=id,path=path[,mux=on|off] > -chardev pty,id=id[,mux=on|off] > -chardev stdio,id=id[,mux=on|off][,signal=on|off] > -chardev serial,id=id,path=path[,mux=on|off] > -chardev tty,id=id,path=path[,mux=on|off] > -chardev parallel,id=id,path=path[,mux=on|off] > -chardev parport,id=id,path=path[,mux=on|off] > -chardev spicevmc,id=id,name=name[,debug=debug] > -chardev spiceport,id=id,name=name[,debug=debug] > Ha, thanks for the list. My brain was too locked in to the current implementation. So yes, we now have an array of possible transports at our disposal. > > 2. The ability to access files in Dom0. That will be used to write to / > >read from QEMU state file. > > To save a QEMU state (write), we do use a filename. 
But I guess we could > expand the QMP command (xen-save-devices-state) to use something else, if > it's easier. That's also an option. > To restore, we provide a file descriptor from libxl to QEMU, with the fd on > the file that contain the state we want to restore. But there are a few > other ways to load a state (from qemu.git/docs/migration.txt): > - tcp migration: do the migration using tcp sockets > - unix migration: do the migration using unix sockets > - exec migration: do the migration using the stdin/stdout through a process. This looks most interesting as it implies we can easily pipe a console to it. Wei. > - fd migration: do the migration using a file descriptor that is > passed to QEMU. QEMU doesn't care how this file descriptor is opened. > > -- > Anthony PERARD
[Xen-devel] [PATCH 2/3] libxl: Domain destroy: unlock userdata earlier
From: Ian Jackson Unlock the userdata before we actually call xc_domain_destroy. This leaves open the possibility that other libxl callers will see the half-destroyed domain (with no devices, paused), but this is fine. Signed-off-by: Ian Jackson CC: Wei Liu Reviewed-by: Jim Fehlig Tested-by: Jim Fehlig --- Addressed off-list comments from Wei Liu tools/libxl/libxl.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index e7eb863..b6541d4 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1636,7 +1636,7 @@ static void devices_destroy_cb(libxl__egc *egc, uint32_t domid = dis->domid; char *dom_path; char *vm_path; -libxl__domain_userdata_lock *lock = NULL; +libxl__domain_userdata_lock *lock; dom_path = libxl__xs_get_dompath(gc, domid); if (!dom_path) { @@ -1670,6 +1670,8 @@ static void devices_destroy_cb(libxl__egc *egc, } libxl__userdata_destroyall(gc, domid); +libxl__unlock_domain_userdata(lock); + rc = xc_domain_destroy(ctx->xch, domid); if (rc < 0) { LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_destroy failed for %d", domid); @@ -1679,7 +1681,6 @@ static void devices_destroy_cb(libxl__egc *egc, rc = 0; out: -if (lock) libxl__unlock_domain_userdata(lock); dis->callback(egc, dis, rc); return; } -- 1.8.0.1
[Xen-devel] [PATCH v2 2/5] xen: rework locking for dump of scheduler info (debug-key r)
such as it is taken care of by the various schedulers, rather than happening in schedule.c. In fact, it is the schedulers that know better which locks are necessary for the specific dumping operations. While there, fix a few style issues (indentation, trailing whitespace, parentheses and blank line after var declarations) Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Meng Xu Cc: Jan Beulich Cc: Keir Fraser Reviewed-by: Meng Xu --- Changes from v1: * take care of SEDF too, as requested during review; --- As far as tags are concerned, I kept Meng's 'Reviewed-by', as I think this applies mostly to chenges to sched_rt.c. I, OTOH, dropped George's one, to give him the chance to look at changes to sched_sedf.c. --- xen/common/sched_credit.c | 42 -- xen/common/sched_credit2.c | 40 xen/common/sched_rt.c |7 +-- xen/common/sched_sedf.c| 16 xen/common/schedule.c |5 ++--- 5 files changed, 95 insertions(+), 15 deletions(-) diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index bec67ff..953ecb0 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -26,6 +26,23 @@ /* + * Locking: + * - Scheduler-lock (a.k.a. runqueue lock): + * + is per-runqueue, and there is one runqueue per-cpu; + * + serializes all runqueue manipulation operations; + * - Private data lock (a.k.a. private scheduler lock): + * + serializes accesses to the scheduler global state (weight, + *credit, balance_credit, etc); + * + serializes updates to the domains' scheduling parameters. + * + * Ordering is "private lock always comes first": + * + if we need both locks, we must acquire the private + *scheduler lock for first; + * + if we already own a runqueue lock, we must never acquire + *the private scheduler lock. 
+ */ + +/* * Basic constants */ #define CSCHED_DEFAULT_WEIGHT 256 @@ -1750,11 +1767,24 @@ static void csched_dump_pcpu(const struct scheduler *ops, int cpu) { struct list_head *runq, *iter; +struct csched_private *prv = CSCHED_PRIV(ops); struct csched_pcpu *spc; struct csched_vcpu *svc; +spinlock_t *lock = lock; +unsigned long flags; int loop; #define cpustr keyhandler_scratch +/* + * We need both locks: + * - csched_dump_vcpu() wants to access domains' scheduling + * parameters, which are protected by the private scheduler lock; + * - we scan through the runqueue, so we need the proper runqueue + * lock (the one of the runqueue of this cpu). + */ +spin_lock_irqsave(&prv->lock, flags); +lock = pcpu_schedule_lock(cpu); + spc = CSCHED_PCPU(cpu); runq = &spc->runq; @@ -1781,6 +1811,9 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu) csched_dump_vcpu(svc); } } + +pcpu_schedule_unlock(lock, cpu); +spin_unlock_irqrestore(&prv->lock, flags); #undef cpustr } @@ -1792,7 +1825,7 @@ csched_dump(const struct scheduler *ops) int loop; unsigned long flags; -spin_lock_irqsave(&(prv->lock), flags); +spin_lock_irqsave(&prv->lock, flags); #define idlers_buf keyhandler_scratch @@ -1835,15 +1868,20 @@ csched_dump(const struct scheduler *ops) list_for_each( iter_svc, &sdom->active_vcpu ) { struct csched_vcpu *svc; +spinlock_t *lock; + svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem); +lock = vcpu_schedule_lock(svc->vcpu); printk("\t%3d: ", ++loop); csched_dump_vcpu(svc); + +vcpu_schedule_unlock(lock, svc->vcpu); } } #undef idlers_buf -spin_unlock_irqrestore(&(prv->lock), flags); +spin_unlock_irqrestore(&prv->lock, flags); } static int diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index be6859a..ae9b359 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -51,8 +51,6 @@ * credit2 wiki page: * http://wiki.xen.org/wiki/Credit2_Scheduler_Development * TODO: - * + Immediate bug-fixes - * - Do per-runqueue, grab proper 
lock for dump debugkey * + Multiple sockets * - Detect cpu layout and make runqueue map, one per L2 (make_runq_map()) * - Simple load balancer / runqueue assignment @@ -1832,12 +1830,24 @@ csched2_dump_vcpu(struct csched2_vcpu *svc) static void csched2_dump_pcpu(const struct scheduler *ops, int cpu) { +struct csched2_private *prv = CSCHED2_PRIV(ops); struct list_head *runq, *iter; struct csched2_vcpu *svc; +unsigned long flags; +spinlock_t *lock; int loop; char cpustr[100]; -/* FIXME: Do locking properly for access to runqueue structures */ +/* + * We need both locks: + * - csched2_dump_vcpu() wants to access domains' scheduling + * parameters, which are protected by the private scheduler lock; + * - we sc
[Xen-devel] [PATCH 1/3] libxl: In domain death search, start search at first domid we want
From: Ian Jackson When domain_death_xswatch_callback needed a further call to xc_domain_getinfolist it would restart it with the last domain it found rather than the first one it wants. If it only wants one it will also only ask for one domain. The result would then be that it gets the previous domain again (ie, the previous one to the one it wants), which still doesn't reveal the answer to the question, and it would therefore loop again. It's completely unclear to me why I thought it was a good idea to start the xc_domain_getinfolist with the last domain previously found rather than the first one left un-confirmed. The code has been that way since it was introduced. Instead, start each xc_domain_getinfolist at the next domain whose status we need to check. We also need to move the test for !evg into the loop, we now need evg to compute the arguments to getinfolist. Signed-off-by: Ian Jackson Reported-by: Jim Fehlig Reviewed-by: Jim Fehlig Tested-by: Jim Fehlig --- tools/libxl/libxl.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 088786e..e7eb863 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1168,22 +1168,20 @@ static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w, const char *wpath, const char *epath) { EGC_GC; libxl_evgen_domain_death *evg; -uint32_t domid; int rc; CTX_LOCK; evg = LIBXL_TAILQ_FIRST(&CTX->death_list); -if (!evg) goto out; - -domid = evg->domid; for (;;) { +if (!evg) goto out; + int nentries = LIBXL_TAILQ_NEXT(evg, entry) ? 
200 : 1; xc_domaininfo_t domaininfos[nentries]; const xc_domaininfo_t *got = domaininfos, *gotend; -rc = xc_domain_getinfolist(CTX->xch, domid, nentries, domaininfos); +rc = xc_domain_getinfolist(CTX->xch, evg->domid, nentries, domaininfos); if (rc == -1) { LIBXL__EVENT_DISASTER(egc, "xc_domain_getinfolist failed while" " processing @releaseDomain watch event", @@ -1193,8 +1191,10 @@ static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w, gotend = &domaininfos[rc]; LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "[evg=%p:%"PRIu32"]" - " from domid=%"PRIu32" nentries=%d rc=%d", - evg, evg->domid, domid, nentries, rc); + " nentries=%d rc=%d %ld..%ld", + evg, evg->domid, nentries, rc, + rc>0 ? (long)domaininfos[0].domain : 0, + rc>0 ? (long)domaininfos[rc-1].domain : 0); for (;;) { if (!evg) { @@ -1257,7 +1257,6 @@ static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w, } assert(rc); /* rc==0 results in us eating all evgs and quitting */ -domid = gotend[-1].domain; } all_reported: out: -- 1.8.0.1
[Xen-devel] [PATCH v2 4/5] xen: sched_credit2: more info when dumping
more specifically, for each runqueue, print what pCPUs belong to it, which ones are idle and which ones have been tickled. While there, also convert the whole file to use keyhandler_scratch for printing cpumask-s. Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Jan Beulich Cc: Keir Fraser Reviewed-by: George Dunlap --- xen/common/sched_credit2.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index ae9b359..8aa1438 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -25,6 +25,7 @@ #include #include #include +#include #define d2printk(x...) //#define d2printk printk @@ -1836,7 +1837,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu) unsigned long flags; spinlock_t *lock; int loop; -char cpustr[100]; +#define cpustr keyhandler_scratch /* * We need both locks: @@ -1877,6 +1878,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu) spin_unlock(lock); spin_unlock_irqrestore(&prv->lock, flags); +#undef cpustr } static void @@ -1886,6 +1888,7 @@ csched2_dump(const struct scheduler *ops) struct csched2_private *prv = CSCHED2_PRIV(ops); unsigned long flags; int i, loop; +#define cpustr keyhandler_scratch /* We need the private lock as we access global scheduler data * and (below) the list of active domains. 
*/ @@ -1901,17 +1904,24 @@ csched2_dump(const struct scheduler *ops) fraction = prv->rqd[i].avgload * 100 / (1ULL << prv->load_window_shift); +cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].active); printk("Runqueue %d:\n" "\tncpus = %u\n" + "\tcpus = %s\n" "\tmax_weight = %d\n" "\tinstload = %d\n" "\taveload= %3"PRI_stime"\n", i, cpumask_weight(&prv->rqd[i].active), + cpustr, prv->rqd[i].max_weight, prv->rqd[i].load, fraction); +cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].idle); +printk("\tidlers: %s\n", cpustr); +cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].tickled); +printk("\ttickled: %s\n", cpustr); } printk("Domain info:\n"); @@ -1942,6 +1952,7 @@ csched2_dump(const struct scheduler *ops) } spin_unlock_irqrestore(&prv->lock, flags); +#undef cpustr } static void activate_runqueue(struct csched2_private *prv, int rqi)
[Xen-devel] [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN()
Most callers have this available already, and the functions don't need any vcpu specifics. Signed-off-by: Jan Beulich --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -1416,9 +1416,8 @@ do { //shadow-types.h to shadow-private.h // #if GUEST_PAGING_LEVELS == 4 -void sh_install_xen_entries_in_l4(struct vcpu *v, mfn_t gl4mfn, mfn_t sl4mfn) +void sh_install_xen_entries_in_l4(struct domain *d, mfn_t gl4mfn, mfn_t sl4mfn) { -struct domain *d = v->domain; shadow_l4e_t *sl4e; unsigned int slots; @@ -1449,7 +1448,7 @@ void sh_install_xen_entries_in_l4(struct shadow_l4e_from_mfn(sl4mfn, __PAGE_HYPERVISOR); /* Self linear mapping. */ -if ( shadow_mode_translate(v->domain) && !shadow_mode_external(v->domain) ) +if ( shadow_mode_translate(d) && !shadow_mode_external(d) ) { // linear tables may not be used with translated PV guests sl4e[shadow_l4_table_offset(LINEAR_PT_VIRT_START)] = @@ -1470,12 +1469,11 @@ void sh_install_xen_entries_in_l4(struct // place, which means that we need to populate the l2h entry in the l3 // table. 
-static void sh_install_xen_entries_in_l2h(struct vcpu *v, mfn_t sl2hmfn) +static void sh_install_xen_entries_in_l2h(struct domain *d, mfn_t sl2hmfn) { -struct domain *d = v->domain; shadow_l2e_t *sl2e; -if ( !is_pv_32on64_vcpu(v) ) +if ( !is_pv_32on64_domain(d) ) return; sl2e = sh_map_domain_page(sl2hmfn); @@ -1549,11 +1547,13 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf { #if GUEST_PAGING_LEVELS == 4 case SH_type_l4_shadow: -sh_install_xen_entries_in_l4(v, gmfn, smfn); break; +sh_install_xen_entries_in_l4(v->domain, gmfn, smfn); +break; #endif #if GUEST_PAGING_LEVELS >= 3 case SH_type_l2h_shadow: -sh_install_xen_entries_in_l2h(v, smfn); break; +sh_install_xen_entries_in_l2h(v->domain, smfn); +break; #endif default: /* Do nothing */ break; } @@ -1594,7 +1594,7 @@ sh_make_monitor_table(struct vcpu *v) { mfn_t m4mfn; m4mfn = shadow_alloc(d, SH_type_monitor_table, 0); -sh_install_xen_entries_in_l4(v, m4mfn, m4mfn); +sh_install_xen_entries_in_l4(d, m4mfn, m4mfn); /* Remember the level of this table */ mfn_to_page(m4mfn)->shadow_flags = 4; #if SHADOW_PAGING_LEVELS < 4 @@ -1618,7 +1618,7 @@ sh_make_monitor_table(struct vcpu *v) l3e[0] = l3e_from_pfn(mfn_x(m2mfn), __PAGE_HYPERVISOR); sh_unmap_domain_page(l3e); -if ( is_pv_32on64_vcpu(v) ) +if ( is_pv_32on64_domain(d) ) { /* For 32-on-64 PV guests, we need to map the 32-bit Xen * area into its usual VAs in the monitor tables */ @@ -1630,7 +1630,7 @@ sh_make_monitor_table(struct vcpu *v) mfn_to_page(m2mfn)->shadow_flags = 2; l3e = sh_map_domain_page(m3mfn); l3e[3] = l3e_from_pfn(mfn_x(m2mfn), _PAGE_PRESENT); -sh_install_xen_entries_in_l2h(v, m2mfn); +sh_install_xen_entries_in_l2h(d, m2mfn); sh_unmap_domain_page(l3e); } --- a/xen/arch/x86/mm/shadow/private.h +++ b/xen/arch/x86/mm/shadow/private.h @@ -361,7 +361,7 @@ mfn_t shadow_alloc(struct domain *d, void shadow_free(struct domain *d, mfn_t smfn); /* Install the xen mappings in various flavours of shadow */ -void sh_install_xen_entries_in_l4(struct vcpu *v, mfn_t gl4mfn, 
mfn_t sl4mfn); +void sh_install_xen_entries_in_l4(struct domain *, mfn_t gl4mfn, mfn_t sl4mfn); /* Update the shadows in response to a pagetable write from Xen */ int sh_validate_guest_entry(struct vcpu *v, mfn_t gmfn, void *entry, u32 size);
Re: [Xen-devel] [PATCH] tools/libxl: avoid comparing an unsigned int to -1
On Mon, Mar 16, 2015 at 10:12:34AM +, Koushik Chakravarty wrote: > Signed-off-by: Koushik Chakravarty > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu Acked-by: Wei Liu Ian J, this one should be backported to 4.5. > --- > tools/libxl/libxl_json.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c > index 98335b0..346929a 100644 > --- a/tools/libxl/libxl_json.c > +++ b/tools/libxl/libxl_json.c > @@ -1013,7 +1013,7 @@ out: > yajl_gen_status libxl__uint64_gen_json(yajl_gen hand, uint64_t val) > { > char *num; > -unsigned int len; > +int len; > yajl_gen_status s; > > > -- > 1.7.10.4
[Xen-devel] [PATCH 2/3] slightly reduce vm_assist code
- drop an effectively unused struct pv_vcpu field (x86) - adjust VM_ASSIST() to prepend VMASST_TYPE_ Signed-off-by: Jan Beulich --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -901,7 +901,6 @@ int arch_set_info_guest( v->arch.pv_vcpu.event_callback_cs = c(event_callback_cs); v->arch.pv_vcpu.failsafe_callback_cs = c(failsafe_callback_cs); } -v->arch.pv_vcpu.vm_assist = c(vm_assist); /* Only CR0.TS is modifiable by guest or admin. */ v->arch.pv_vcpu.ctrlreg[0] &= X86_CR0_TS; @@ -973,7 +972,7 @@ int arch_set_info_guest( case -ERESTART: break; case 0: -if ( !compat && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && +if ( !compat && !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) { l4_pgentry_t *l4tab = __map_domain_page(cr3_page); @@ -1023,7 +1022,7 @@ int arch_set_info_guest( cr3_page = NULL; break; case 0: -if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) +if ( VM_ASSIST(d, m2p_strict) ) { l4_pgentry_t *l4tab = __map_domain_page(cr3_page); --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -1436,7 +1436,6 @@ void arch_get_info_guest(struct vcpu *v, c(event_callback_cs = v->arch.pv_vcpu.event_callback_cs); c(failsafe_callback_cs = v->arch.pv_vcpu.failsafe_callback_cs); } -c(vm_assist = v->arch.pv_vcpu.vm_assist); /* IOPL privileges are virtualised: merge back into returned eflags. */ BUG_ON((c(user_regs.eflags) & X86_EFLAGS_IOPL) != 0); --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1454,7 +1454,7 @@ static int alloc_l4_table(struct page_in adjust_guest_l4e(pl4e[i], d); } -init_guest_l4_table(pl4e, d, !VM_ASSIST(d, VMASST_TYPE_m2p_strict)); +init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict)); unmap_domain_page(pl4e); return rc > 0 ? 
0 : rc; @@ -2765,7 +2765,7 @@ int new_guest_cr3(unsigned long mfn) invalidate_shadow_ldt(curr, 0); -if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && !paging_mode_refcounts(d) ) +if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) { l4_pgentry_t *l4tab = map_domain_page(mfn); @@ -3135,8 +3135,7 @@ long do_mmuext_op( op.arg1.mfn); break; } -if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) && - !paging_mode_refcounts(d) ) +if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) { l4_pgentry_t *l4tab = map_domain_page(op.arg1.mfn); --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -1436,7 +1436,7 @@ void sh_install_xen_entries_in_l4(struct shadow_l4e_from_mfn(page_to_mfn(d->arch.perdomain_l3_pg), __PAGE_HYPERVISOR); -if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) +if ( !VM_ASSIST(d, m2p_strict) ) sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = shadow_l4e_empty(); /* Shadow linear mapping for 4-level shadows. N.B. for 3-level @@ -3983,11 +3983,11 @@ sh_update_cr3(struct vcpu *v, int do_loc shadow_l4e_t *sl4e = v->arch.paging.shadow.guest_vtable; if ( (v->arch.flags & TF_kernel_mode) && - !VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) + !VM_ASSIST(d, m2p_strict) ) sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]; else if ( !(v->arch.flags & TF_kernel_mode) && - VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) + VM_ASSIST(d, m2p_strict) ) sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = shadow_l4e_empty(); } --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1441,7 +1441,7 @@ static int fixup_page_fault(unsigned lon !(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) && (regs->error_code & PFEC_write_access) ) { -if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables) && +if ( VM_ASSIST(d, writable_pagetables) && /* Do not check if access-protection fault since the page may legitimately be not present in shadow page tables */ (paging_mode_enabled(d) || --- a/xen/common/kernel.c +++ 
b/xen/common/kernel.c @@ -306,7 +306,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDL { case 0: fi.submap = (1U << XENFEAT_memory_op_vnode_supported); -if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) ) +if ( VM_ASSIST(d, pae_extended_cr3) ) fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb); if ( paging_mod
[Xen-devel] [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support
Changes in v19: * Do not allow changing mode to/from OFF/ALL while guests are running. This significantly simplifies code due to the large number of corner cases that I had to deal with. Most of the changes are in patch#5. This also makes patch 4 from last version unnecessary * Defer NMI support (drop patch#14 from last version) * Make patch#15 from last series be patch#1 (vpmu init cleanup) * Other changes are listed per patch Changes in v18: * Return 1 (i.e. "handled") in vpmu_do_interrupt() if PMU_CACHED is set. This is needed since we can get an interrupt while this flag is set on AMD processors when multiple counters are in use (**) (AMD processors don't mask LVTPC when PMC interrupt happens and so there is a window in vpmu_do_interrupt() until it sets the mask bit). Patch #14 * Unload both current and last_vcpu (if different) vpmu and clear this_cpu(last_vcpu) in vpmu_unload_all. Patch #5 * Make major version check for certain xenpmu_ops. Patch #5 * Make xenpmu_op()'s first argument unsigned. Patch #5 * Don't use format specifier for __stringify(). Patch #6 * Don't print generic error in vpmu_init(). Patch #6 * Don't test for VPMU existence in vpmu_initialise(). New patch #15 * Added vpmu_disabled flag to make sure VPMU doesn't get reenabled from dom0 (for example when watchdog is active). Patch #5 * Updated tags on some patches to better reflect latest reviewed status (**) While testing this I discovered that AMD VPMU is quite broken for HVM: when multiple counters are in use linux dom0 often gets unexpected NMIs. This may have something to do with what I mentioned in the first bullet. 
However, this doesn't appear to be related to this patch series (or earlier VPMU patches) --- I can reproduce this all the way back to 4.1 Changes in v17: * Disable VPMU when unknown CPU vendor is detected (patch #2) * Remove unnecessary vendor tests in vendor-specific init routines (patch #14) * Remember first CPU that starts mode change and use it to stop the cycle (patch #13) * If vpmu ops is not present, return 0 as value for VPMU MSR read (as opposed to returning an error as was the case in previous patch.) (patch #18) * Slightly change vpmu_do_msr() logic as a result of this change (patch #20) * stringify VPMU version (patch #14) * Use 'CS > 1' to mark sample as PMU_SAMPLE_USER (patch #19) Changes in v16: * Many changes in VPMU mode patch (#13): * Replaced arguments to some vpmu routines (vcpu -> vpmu). New patch (#12) * Added vpmu_unload vpmu op to completely unload vpmu data (e.g. clear MSR bitmaps). This routine may be called in context switch (vpmu_switch_to()). * Added vmx_write_guest_msr_vcpu() interface to write MSRs of non-current VCPU * Use cpumask_cycle instead of cpumask_next * Dropped timeout error * Adjusted types of mode variables * Don't allow oprofile to allocate its context on MSR access if VPMU context has already been allocated (which may happen when VPMU mode was set to off while the guest was running) * vpmu_initialise() no longer turns off VPMU globally on failure. New patch (#2) * vpmu_do_msr() will return 1 (failure) if vpmu_ops are not set. This is done to prevent PV guests that are not VPMU-enabled from wrongly assuming that they have access to counters (Linux check_hw_exists() will make this assumption) (patch #18) * Add cpl field to shared structure that will be passed for HVM guests' samples (instead of PMU_SAMPLE_USER flag). Add PMU_SAMPLE_PV flag to mark whose sample is passed up. 
(Patches ## 10, 19, 22) Changes in v15: * Rewrote vpmu_force_context_switch() to use continue_hypercall_on_cpu() * Added vpmu_init initcall that will call vendor-specific init routines * Added a lock to vpmu_struct to serialize pmu_init()/pmu_finish() * Use SS instead of CS for DPL (instead of RPL) * Don't take lock for XENPMU_mode_get * Make vpmu_mode/features an unsigned int (from uint64_t) * Adjusted pvh_hypercall64_table[] order * Replaced address range check [XEN_VIRT_START..XEN_VIRT_END] with guest_mode() * A few style cleanups Changes in v14: * Moved struct xen_pmu_regs to pmu.h * Moved CHECK_pmu_* to an earlier patch (when structures are first introduced) * Added PMU_SAMPLE_REAL flag to indicate whether the sample was taken in real mode * Simplified slightly setting rules for xenpmu_data flags * Rewrote vpmu_force_context_switch() to again use continuations. (Returning EAGAIN to user would mean that VPMU mode may get into inconsistent state (across processors) and dealing with that is more complicated than I'd like). * Fixed msraddr_to_bitpos() and converted it into an inline * Replaced address range check in vpmu_do_interrupt() with guest_mode() * No error returns from __initcall * Rebased on top of recent VPMU changes * Various cleanups Changes in v13: * Rearranged data in xenpf_symdata to eliminate a hole (no change in structure size) * Removed unnecessary zeroing of last character in name string during symbol re
[Xen-devel] [PATCH 1/3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Since the new flag gets assigned a value discontiguous to the existing ones (in order to preserve the low bits, as only those are currently accessible to 32-bit guests), this requires a little bit of rework of the VM assist code in general: An architecture specific VM_ASSIST_VALID definition gets introduced (with an optional compat mode counterpart), and compilation of the respective code becomes conditional upon this being defined (ARM doesn't wire these up and hence doesn't need that code). 
Signed-off-by: Jan Beulich --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -339,7 +339,7 @@ static int setup_compat_l4(struct vcpu * l4tab = __map_domain_page(pg); clear_page(l4tab); -init_guest_l4_table(l4tab, v->domain); +init_guest_l4_table(l4tab, v->domain, 1); unmap_domain_page(l4tab); v->arch.guest_table = pagetable_from_page(pg); @@ -971,7 +971,17 @@ int arch_set_info_guest( case -EINTR: rc = -ERESTART; case -ERESTART: +break; case 0: +if ( !compat && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && + !paging_mode_refcounts(d) ) +{ +l4_pgentry_t *l4tab = __map_domain_page(cr3_page); + +l4tab[l4_table_offset(RO_MPT_VIRT_START)] = +idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]; +unmap_domain_page(l4tab); +} break; default: if ( cr3_page == current->arch.old_guest_table ) @@ -1006,7 +1016,16 @@ int arch_set_info_guest( default: if ( cr3_page == current->arch.old_guest_table ) cr3_page = NULL; +break; case 0: +if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) +{ +l4_pgentry_t *l4tab = __map_domain_page(cr3_page); + +l4tab[l4_table_offset(RO_MPT_VIRT_START)] = +l4e_empty(); +unmap_domain_page(l4tab); +} break; } } --- a/xen/arch/x86/domain_build.c +++ b/xen/arch/x86/domain_build.c @@ -1203,7 +1203,7 @@ int __init construct_dom0( l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE; } clear_page(l4tab); -init_guest_l4_table(l4tab, d); +init_guest_l4_table(l4tab, d, 0); v->arch.guest_table = pagetable_from_paddr(__pa(l4start)); if ( is_pv_32on64_domain(d) ) v->arch.guest_table_user = v->arch.guest_table; --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1380,7 +1380,8 @@ static int alloc_l3_table(struct page_in return rc > 0 ? 0 : rc; } -void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d) +void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d, + bool_t zap_ro_mpt) { /* Xen private mappings. 
*/ memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT], @@ -1395,6 +1396,8 @@ void init_guest_l4_table(l4_pgentry_t l4 l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR); l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] = l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR); +if ( zap_ro_mpt || is_pv_32on64_domain(d) || paging_mode_refcounts(d) ) +l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty(); } static int alloc_l4_table(struct page_info *page) @@ -1444,7 +1447,7 @@ static int alloc_l4_table(struct page_in adjust_guest_l4e(pl4e[i], d); } -init_guest_l4_table(pl4e, d); +init_guest_l4_table(pl4e, d, !VM_ASSIST(d, VMASST_TYPE_m2p_strict)); unmap_domain_page(pl4e); return rc > 0 ? 0 : rc; @@ -2755,6 +2758,14 @@ int new_guest_cr3(unsigned long mfn) invalidate_shadow_ldt(curr, 0); +if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && !paging_mode_refcou
[Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
There is a race when we clear the STATE_SCHED in the softirq - which allows the 'raise_softirq_for' (on another CPU or on the one running the softirq) to schedule the dpci. Specifically this can happen when the other CPU receives an interrupt, calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same dpci structure). Note that this could also happen on the same physical CPU, however the explanation for simplicity will assume two CPU actors. There would be two 'dpci_softirq' running at the same time (on different CPUs) where on one CPU it would be executing hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN) and on the other CPU it is trying to call: if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) BUG(); Since STATE_RUN is already set it would end badly. The reason we can hit this is when an interrupt's affinity is set over multiple CPUs. Potential solutions: a) Instead of the BUG() we can put the dpci back on the per-cpu list to deal with later (when the softirqs are activated again). This putting the 'dpci' back on the per-cpu list is a spin until the bad condition clears. b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for to detect for 'STATE_RUN' bit being set and schedule the dpci. The BUG() check in dpci_softirq would be replaced with a spin until 'STATE_RUN' has been cleared. The dpci would still not be scheduled when STATE_SCHED bit was set. c) Only schedule the dpci when the state is cleared (no STATE_SCHED and no STATE_RUN). It would spin if STATE_RUN is set (as it is in progress and will finish). If the STATE_SCHED is set (so hasn't run yet) we won't try to spin and just exit. Down-sides of the solutions: a). Live-lock of the CPU. We could be finishing a dpci, then adding the dpci, exiting, and then processing the dpci once more. And so on. We would eventually stop as the TIMER_SOFTIRQ would be set, which will cause SCHEDULER_SOFTIRQ to be set as well and we would exit this loop. 
Interestingly the old ('tasklet') code used this mechanism. If the function assigned to the tasklet was running - the softirq that ran said function (hvm_dirq_assist) would be responsible for putting the tasklet back on the per-cpu list. This would allow having a running tasklet and a 'to-be-scheduled' tasklet at the same time. b). is similar to a) - instead of re-entering the dpci_softirq we are looping in the softirq waiting for the correct condition to arrive. As it does not allow unwedging ourselves because the other softirqs are not called - it is less preferable. c) can cause a deadlock if the interrupt comes in when we are processing the dpci in the softirq - iff this happens on the same CPU. We would be looping in on raise_softirq waiting for STATE_RUN to be cleared, while the softirq that was to clear it - is preempted by our interrupt handler. As such, this patch - which implements a) is the best candidate for this quagmire. Reported-and-Tested-by: Sander Eikelenboom Reported-and-Tested-by: Malcolm Crossley Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/io.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..9b77334 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -804,7 +804,17 @@ static void dpci_softirq(void) d = pirq_dpci->dom; smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) -BUG(); +{ +unsigned long flags; + +/* Put back on the list and retry. */ +local_irq_save(flags); +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); +local_irq_restore(flags); + +raise_softirq(HVM_DPCI_SOFTIRQ); +continue; +} /* * The one who clears STATE_SCHED MUST refcount the domain. */ -- 2.1.0
[Xen-devel] [PATCH v19 02/14] common/symbols: Export hypervisor symbols to privileged guest
Export Xen's symbols as <address, type, name> triplet via new XENPF_get_symbol hypercall Signed-off-by: Boris Ostrovsky Acked-by: Daniel De Graaf Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/arch/x86/platform_hypercall.c | 28 +++ xen/common/symbols.c| 54 + xen/include/public/platform.h | 19 + xen/include/xen/symbols.h | 3 +++ xen/include/xlat.lst| 1 + xen/xsm/flask/hooks.c | 4 +++ xen/xsm/flask/policy/access_vectors | 2 ++ 7 files changed, 111 insertions(+) diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c index 334d474..7626261 100644 --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -798,6 +799,33 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) } break; +case XENPF_get_symbol: +{ +static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */ +XEN_GUEST_HANDLE(char) nameh; +uint32_t namelen, copylen; + +guest_from_compat_handle(nameh, op->u.symdata.name); + +ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type, + &op->u.symdata.address, name); + +namelen = strlen(name) + 1; + +if ( namelen > op->u.symdata.namelen ) +copylen = op->u.symdata.namelen; +else +copylen = namelen; + +op->u.symdata.namelen = namelen; + +if ( !ret && copy_to_guest(nameh, name, copylen) ) +ret = -EFAULT; +if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) +ret = -EFAULT; +} +break; + default: ret = -ENOSYS; break; diff --git a/xen/common/symbols.c b/xen/common/symbols.c index bc2fde6..2c0942d 100644 --- a/xen/common/symbols.c +++ b/xen/common/symbols.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #ifdef SYMBOLS_ORIGIN extern const unsigned int symbols_offsets[1]; @@ -148,3 +150,55 @@ const char *symbols_lookup(unsigned long addr, *offset = addr - symbols_address(low); return namebuf; } + +/* + * Get symbol type information. 
This is encoded as a single char at the + * beginning of the symbol name. + */ +static char symbols_get_symbol_type(unsigned int off) +{ +/* + * Get just the first code, look it up in the token table, + * and return the first char from this token. + */ +return symbols_token_table[symbols_token_index[symbols_names[off + 1]]]; +} + +int xensyms_read(uint32_t *symnum, char *type, + uint64_t *address, char *name) +{ +/* + * Symbols are most likely accessed sequentially so we remember position + * from previous read. This can help us avoid the extra call to + * get_symbol_offset(). + */ +static uint64_t next_symbol, next_offset; +static DEFINE_SPINLOCK(symbols_mutex); + +if ( *symnum > symbols_num_syms ) +return -ERANGE; +if ( *symnum == symbols_num_syms ) +{ +/* No more symbols */ +name[0] = '\0'; +return 0; +} + +spin_lock(&symbols_mutex); + +if ( *symnum == 0 ) +next_offset = next_symbol = 0; +if ( next_symbol != *symnum ) +/* Non-sequential access */ +next_offset = get_symbol_offset(*symnum); + +*type = symbols_get_symbol_type(next_offset); +next_offset = symbols_expand_symbol(next_offset, name); +*address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN; + +next_symbol = ++*symnum; + +spin_unlock(&symbols_mutex); + +return 0; +} diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h index 82ec84e..1e6a6ce 100644 --- a/xen/include/public/platform.h +++ b/xen/include/public/platform.h @@ -590,6 +590,24 @@ struct xenpf_resource_op { typedef struct xenpf_resource_op xenpf_resource_op_t; DEFINE_XEN_GUEST_HANDLE(xenpf_resource_op_t); +#define XENPF_get_symbol 63 +struct xenpf_symdata { +/* IN/OUT variables */ +uint32_t namelen; /* IN: size of name buffer */ + /* OUT: strlen(name) of hypervisor symbol (may be */ + /* larger than what's been copied to guest) */ +uint32_t symnum; /* IN: Symbol to read*/ + /* OUT: Next available symbol. 
If same as IN then */ + /* we reached the end*/ + +/* OUT variables */ +XEN_GUEST_HANDLE(char) name; +uint64_t address; +char type; +}; +typedef struct xenpf_symdata xenpf_symdata_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t); + /* * ` enum neg_errnoval * ` HYPERVISOR_platform_op(const struct xen_platform_op*); @@ -619,6 +637,7 @@ struct xen_platform_op {
Re: [Xen-devel] [PATCH] tools/libxl: close the logfile_w and null file descriptors in libxl__spawn_qdisk_backend() error path
On Mon, Mar 16, 2015 at 10:09:29AM +, Koushik Chakravarty wrote: > Signed-off-by: Koushik Chakravarty > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu > --- > tools/libxl/libxl_dm.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index cb006df..161401c 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -1508,7 +1508,7 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, > libxl__dm_spawn_state *dmss) > flexarray_t *dm_args; > char **args; > const char *dm; > -int logfile_w, null, rc; > +int logfile_w, null = -1, rc; > uint32_t domid = dmss->guest_domid; > > /* Always use qemu-xen as device model */ > @@ -1534,6 +1534,10 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, > libxl__dm_spawn_state *dmss) > goto error; > } > null = open("/dev/null", O_RDONLY); > +if (null < 0) { > + rc = ERROR_FAIL; > + goto error; > +} > > dmss->guest_config = NULL; > /* > @@ -1568,6 +1572,10 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, > libxl__dm_spawn_state *dmss) > > error: > assert(rc); > +if(logfile_w >= 0) > + close(logfile_w); > +if(null >= 0) > + close(null); Please add space between `if' and `('. Also you can just write if (logfile_w >= 0) close (logfile_w); if (null >= 0) close (null); Wei. > dmss->callback(egc, dmss, rc); > return; > } > -- > 1.7.10.4
[Xen-devel] [Fwd: [PATCH v2 0/5] Improving dumping of scheduler related info]
Forgot to Cc people in the cover letter of the series... Sorry! Forwarded Message From: Dario Faggioli To: Xen-devel Subject: [PATCH v2 0/5] Improving dumping of scheduler related info Date: Tue, 17 Mar 2015 16:32:41 +0100 Mailer: StGit/0.17.1-dirty Message-Id: <20150317152615.9867.48676.stgit@Solace.station> Take 2. Some of the patches have been checked-in already, so here's what's remaining: - fix a bug in the RTDS scheduler (patch 1), - improve how the whole process of dumping scheduling info is serialized, by moving all locking code into specific schedulers (patch 2), - print more useful scheduling related information (patches 3, 4 and 5). Git branch here: git://xenbits.xen.org/people/dariof/xen.git rel/sched/dump-v2 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/dump-v2 I think I addressed all the comments raised upon v1. More details in the changelogs of the various patches. Thanks and Regards, Dario --- Dario Faggioli (5): xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains xen: rework locking for dump of scheduler info (debug-key r) xen: print online pCPUs and free pCPUs when dumping xen: sched_credit2: more info when dumping xen: sched_rt: print useful affinity info when dumping xen/common/cpupool.c | 12 + xen/common/sched_credit.c | 42 ++- xen/common/sched_credit2.c | 53 +--- xen/common/sched_rt.c | 59 xen/common/sched_sedf.c| 16 xen/common/schedule.c |5 +--- 6 files changed, 157 insertions(+), 30 deletions(-)
[Xen-devel] [PATCH v19 06/14] x86/VPMU: Initialize VPMUs with __initcall
Move some VPMU initialization operations into __initcalls to avoid performing the same tests and calculations for each vcpu. Signed-off-by: Boris Ostrovsky Acked-by: Jan Beulich --- xen/arch/x86/hvm/svm/vpmu.c | 106 -- xen/arch/x86/hvm/vmx/vpmu_core2.c | 151 +++--- xen/arch/x86/hvm/vpmu.c | 32 xen/include/asm-x86/hvm/vpmu.h| 2 + 4 files changed, 155 insertions(+), 136 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 481ea7b..b60ca40 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -356,54 +356,6 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) return 1; } -static int amd_vpmu_initialise(struct vcpu *v) -{ -struct xen_pmu_amd_ctxt *ctxt; -struct vpmu_struct *vpmu = vcpu_vpmu(v); -uint8_t family = current_cpu_data.x86; - -if ( counters == NULL ) -{ - switch ( family ) -{ -case 0x15: -num_counters = F15H_NUM_COUNTERS; -counters = AMD_F15H_COUNTERS; -ctrls = AMD_F15H_CTRLS; -k7_counters_mirrored = 1; -break; -case 0x10: -case 0x12: -case 0x14: -case 0x16: -default: -num_counters = F10H_NUM_COUNTERS; -counters = AMD_F10H_COUNTERS; -ctrls = AMD_F10H_CTRLS; -k7_counters_mirrored = 0; -break; -} -} - -ctxt = xzalloc_bytes(sizeof(*ctxt) + - 2 * sizeof(uint64_t) * num_counters); -if ( !ctxt ) -{ -gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " -" PMU feature is unavailable on domain %d vcpu %d.\n", -v->vcpu_id, v->domain->domain_id); -return -ENOMEM; -} - -ctxt->counters = sizeof(*ctxt); -ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters; - -vpmu->context = ctxt; -vpmu->priv_context = NULL; -vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); -return 0; -} - static void amd_vpmu_destroy(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); @@ -474,30 +426,62 @@ struct arch_vpmu_ops amd_vpmu_ops = { int svm_vpmu_initialise(struct vcpu *v) { +struct xen_pmu_amd_ctxt *ctxt; struct vpmu_struct *vpmu = vcpu_vpmu(v); -uint8_t family = current_cpu_data.x86; -int ret = 0; -/* vpmu 
enabled? */ if ( vpmu_mode == XENPMU_MODE_OFF ) return 0; -switch ( family ) +if ( !counters ) +return -EINVAL; + +ctxt = xzalloc_bytes(sizeof(*ctxt) + + 2 * sizeof(uint64_t) * num_counters); +if ( !ctxt ) { +printk(XENLOG_G_WARNING "Insufficient memory for PMU, " + " PMU feature is unavailable on domain %d vcpu %d.\n", + v->vcpu_id, v->domain->domain_id); +return -ENOMEM; +} + +ctxt->counters = sizeof(*ctxt); +ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters; + +vpmu->context = ctxt; +vpmu->priv_context = NULL; + +vpmu->arch_vpmu_ops = &amd_vpmu_ops; + +vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); +return 0; +} + +int __init amd_vpmu_init(void) +{ +switch ( current_cpu_data.x86 ) +{ +case 0x15: +num_counters = F15H_NUM_COUNTERS; +counters = AMD_F15H_COUNTERS; +ctrls = AMD_F15H_CTRLS; +k7_counters_mirrored = 1; +break; case 0x10: case 0x12: case 0x14: -case 0x15: case 0x16: -ret = amd_vpmu_initialise(v); -if ( !ret ) -vpmu->arch_vpmu_ops = &amd_vpmu_ops; -return ret; +num_counters = F10H_NUM_COUNTERS; +counters = AMD_F10H_COUNTERS; +ctrls = AMD_F10H_CTRLS; +k7_counters_mirrored = 0; +break; +default: +printk(XENLOG_WARNING "VPMU: Unsupported CPU family %#x\n", + current_cpu_data.x86); +return -EINVAL; } -printk("VPMU: Initialization failed. 
" - "AMD processor family %d has not " - "been supported\n", family); -return -EINVAL; +return 0; } diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 6280644..17d1b04 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -708,62 +708,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) return 1; } -static int core2_vpmu_initialise(struct vcpu *v) -{ -struct vpmu_struct *vpmu = vcpu_vpmu(v); -u64 msr_content; -static bool_t ds_warned; - -if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) ) -goto func_out; -/* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */ -while ( boot_cpu_has(X86_FEATURE_DS) ) -{ -if ( !boot_cpu_has(X86_FEATURE_DTES64) ) -{ -if ( !ds_warned ) -printk(XENLOG_G_WARNING "CPU does
[Xen-devel] [PATCH 3/3] libxl: Domain destroy: fork
From: Ian Jackson Call xc_domain_destroy in a subprocess. That allows us to do so asynchronously, rather than blocking the whole process calling libxl. The changes in detail: * Provide a libxl__ev_child in libxl__domain_destroy_state, and initialise it in libxl__domain_destroy. There is no possibility to `clean up' a libxl__ev_child, but there is no need to clean it up, as the control flow ensures that we only continue after the child has exited. * Call libxl__ev_child_fork at the right point and put the call to xc_domain_destroy and associated logging in the child. (The child opens a new xenctrl handle because we mustn't use the parent's.) * Consequently, the success return path from domain_destroy_domid_cb no longer calls dis->callback. Instead it simply returns. * We plumb the errno value through the child's exit status, if it fits. This means we normally do the logging only in the parent. * Incidentally, we fix the bug that we were treating the return value from xc_domain_destroy as an errno value, when in fact it is a return value from do_domctl (in this case, 0 or -1 setting errno). 
Signed-off-by: Ian Jackson Reviewed-by: Jim Fehlig Tested-by: Jim Fehlig --- tools/libxl/libxl.c | 57 tools/libxl/libxl_internal.h | 1 + 2 files changed, 53 insertions(+), 5 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index b6541d4..b43db1a 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1481,6 +1481,10 @@ static void domain_destroy_callback(libxl__egc *egc, static void destroy_finish_check(libxl__egc *egc, libxl__domain_destroy_state *dds); +static void domain_destroy_domid_cb(libxl__egc *egc, +libxl__ev_child *destroyer, +pid_t pid, int status); + void libxl__domain_destroy(libxl__egc *egc, libxl__domain_destroy_state *dds) { STATE_AO_GC(dds->ao); @@ -1567,6 +1571,8 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis) char *pid; int rc, dm_present; +libxl__ev_child_init(&dis->destroyer); + rc = libxl_domain_info(ctx, NULL, domid); switch(rc) { case 0: @@ -1672,17 +1678,58 @@ static void devices_destroy_cb(libxl__egc *egc, libxl__unlock_domain_userdata(lock); -rc = xc_domain_destroy(ctx->xch, domid); -if (rc < 0) { -LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_destroy failed for %d", domid); +rc = libxl__ev_child_fork(gc, &dis->destroyer, domain_destroy_domid_cb); +if (rc < 0) goto out; +if (!rc) { /* child */ +ctx->xch = xc_interface_open(ctx->lg,0,0); +if (!ctx->xch) goto badchild; + +rc = xc_domain_destroy(ctx->xch, domid); +if (rc < 0) goto badchild; +_exit(0); + +badchild: +if (errno > 0 && errno < 126) { +_exit(errno); +} else { +LOGE(ERROR, + "xc_domain_destroy failed for %d (with difficult errno value %d)", + domid, errno); +_exit(-1); +} +} +LOG(INFO, "forked pid %ld for destroy of domain %d", (long)rc, domid); + +return; + +out: +dis->callback(egc, dis, rc); +return; +} + +static void domain_destroy_domid_cb(libxl__egc *egc, +libxl__ev_child *destroyer, +pid_t pid, int status) +{ +libxl__destroy_domid_state *dis = CONTAINER_OF(destroyer, *dis, destroyer); +STATE_AO_GC(dis->ao); 
+int rc; + +if (status) { +if (WIFEXITED(status) && WEXITSTATUS(status)<126) { +LOGEV(ERROR, WEXITSTATUS(status), + "xc_domain_destroy failed for %"PRIu32"", dis->domid); +} else { +libxl_report_child_exitstatus(CTX, XTL_ERROR, + "async domain destroy", pid, status); +} rc = ERROR_FAIL; goto out; } rc = 0; -out: + out: dis->callback(egc, dis, rc); -return; } int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 934465a..28d32ef 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2957,6 +2957,7 @@ struct libxl__destroy_domid_state { libxl__domid_destroy_cb *callback; /* private to implementation */ libxl__devices_remove_state drs; +libxl__ev_child destroyer; }; struct libxl__domain_destroy_state { -- 1.8.0.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] OpenStack - Libvirt+Xen CI overview
On 03/10/2015 08:03 AM, Bob Ball wrote: For the last few weeks Anthony and I have been working on creating a CI environment to run against all OpenStack jobs. We're now in a position where we can share the current status, an overview of how it works, and next steps. We actively want to support involvement in this effort from others with an interest in libvirt+Xen's openstack integration. The CI we have set up follows the recommendations made by the OpenStack official infrastructure maintainers, and reproduces a notable portion of the official OpenStack CI environment to run these tests. Namely, this setup uses: - Puppet to deploy the master node - Zuul to watch for code changes uploaded to review.openstack.org - Jenkins job builder to create Jenkins job definitions from a YAML file - Nodepool to automatically create single-use virtual machines in the Rackspace public cloud - Devstack-gate to run Tempest tests in serial More information on Zuul, JJB, Nodepool and devstack-gate is available through http://ci.openstack.org The current status is that we have a zuul instance monitoring for jobs and adding them to the queue of jobs to be run at http://zuul.openstack.xenproject.org/ In the background Nodepool provisions virtual machines into a pool of nodes ready to be used. All ready nodes are automatically added to Jenkins (https://jenkins.openstack.xenproject.org/), and then Zuul+Jenkins will trigger a particular job on a node when one is available. Logs are then uploaded to Rackspace's Cloud Files with sample logs for a passing job at http://logs.openstack.xenproject.org/52/162352/3/silent/dsvm-tempest-xen/da3ff30/index.html I'd like to organise a meeting to walk through the various components of the CI with those who are interested, so this is an initial call to find out who is interested in finding out more! Thanks, Bob ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel I would also love to find out more. 
-- Alvin Starr || voice: (905)513-7688 Netvel Inc. || Cell: (416)806-0133 al...@netvel.net || ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/3] x86: misc changes
The main point of the series is really patch 1 (which after consultation among the security team doesn't appear to represent a security fix); the other two are just cleanup that I found possible/desirable while putting together the first one. 1: x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P 2: slightly reduce vm_assist code 3: x86/shadow: pass domain to sh_install_xen_entries_in_lN() Signed-off-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 03/14] x86/VPMU: Add public xenpmu.h
Add pmu.h header files, move various macros and structures that will be shared between hypervisor and PV guests to it. Move MSR banks out of architectural PMU structures to allow for larger sizes in the future. The banks are allocated immediately after the context and PMU structures store offsets to them. While making these updates, also: * Remove unused vpmu_domain() macro from vpmu.h * Convert msraddr_to_bitpos() into an inline and make it a little faster by realizing that all Intel's PMU-related MSRs are in the lower MSR range. Signed-off-by: Boris Ostrovsky Acked-by: Kevin Tian Acked-by: Jan Beulich Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- Change in v19: * Moved PMU-related structs in xlat.lst to alphabetical order xen/arch/x86/hvm/svm/vpmu.c | 83 +++-- xen/arch/x86/hvm/vmx/vpmu_core2.c| 123 +-- xen/arch/x86/hvm/vpmu.c | 10 +++ xen/arch/x86/oprofile/op_model_ppro.c| 6 +- xen/include/Makefile | 3 +- xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 32 xen/include/asm-x86/hvm/vpmu.h | 16 ++-- xen/include/public/arch-arm.h| 3 + xen/include/public/arch-x86/pmu.h| 91 +++ xen/include/public/pmu.h | 38 ++ xen/include/xlat.lst | 4 + 11 files changed, 275 insertions(+), 134 deletions(-) delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h create mode 100644 xen/include/public/arch-x86/pmu.h create mode 100644 xen/include/public/pmu.h diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 6764070..a8b79df 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -30,10 +30,7 @@ #include #include #include - -#define F10H_NUM_COUNTERS 4 -#define F15H_NUM_COUNTERS 6 -#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS +#include #define MSR_F10H_EVNTSEL_GO_SHIFT 40 #define MSR_F10H_EVNTSEL_EN_SHIFT 22 @@ -49,6 +46,9 @@ static const u32 __read_mostly *counters; static const u32 __read_mostly *ctrls; static bool_t __read_mostly k7_counters_mirrored; +#define F10H_NUM_COUNTERS 4 +#define F15H_NUM_COUNTERS 6 + /* PMU Counter MSRs. 
*/ static const u32 AMD_F10H_COUNTERS[] = { MSR_K7_PERFCTR0, @@ -83,12 +83,14 @@ static const u32 AMD_F15H_CTRLS[] = { MSR_AMD_FAM15H_EVNTSEL5 }; -/* storage for context switching */ -struct amd_vpmu_context { -u64 counters[MAX_NUM_COUNTERS]; -u64 ctrls[MAX_NUM_COUNTERS]; -bool_t msr_bitmap_set; -}; +/* Use private context as a flag for MSR bitmap */ +#define msr_bitmap_on(vpmu)do {\ + (vpmu)->priv_context = (void *)-1L; \ + } while (0) +#define msr_bitmap_off(vpmu) do {\ + (vpmu)->priv_context = NULL;\ + } while (0) +#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL) static inline int get_pmu_reg_type(u32 addr) { @@ -142,7 +144,6 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v) { unsigned int i; struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; for ( i = 0; i < num_counters; i++ ) { @@ -150,14 +151,13 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v) svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE); } -ctxt->msr_bitmap_set = 1; +msr_bitmap_on(vpmu); } static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) { unsigned int i; struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; for ( i = 0; i < num_counters; i++ ) { @@ -165,7 +165,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW); } -ctxt->msr_bitmap_set = 0; +msr_bitmap_off(vpmu); } static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs) @@ -177,19 +177,22 @@ static inline void context_load(struct vcpu *v) { unsigned int i; struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; +struct xen_pmu_amd_ctxt *ctxt = vpmu->context; +uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters); +uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls); for ( i = 0; i < num_counters; i++ ) { -wrmsrl(counters[i], ctxt->counters[i]); -wrmsrl(ctrls[i], ctxt->ctrls[i]); +wrmsrl(counters[i], counter_regs[i]); +wrmsrl(ctrls[i], ctrl_regs[i]); } } 
static void amd_vpmu_load(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; +struct xen_pmu_amd_ctxt *ctxt = vpmu->context; +uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls); vpmu_reset(vpmu, VPMU_FROZEN);
[Xen-devel] [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests
Code for initializing/tearing down PMU for PV guests Signed-off-by: Boris Ostrovsky Acked-by: Kevin Tian Acked-by: Daniel De Graaf --- Changes in v19: * Keep track of PV(H) VPMU count for non-dom0 VPMUs * Move vpmu.xenpmu_data test in pvpmu_init() under lock * Return better error codes in pvpmu_init() tools/flask/policy/policy/modules/xen/xen.te | 4 + xen/arch/x86/domain.c| 2 + xen/arch/x86/hvm/hvm.c | 1 + xen/arch/x86/hvm/svm/svm.c | 4 +- xen/arch/x86/hvm/svm/vpmu.c | 44 ++ xen/arch/x86/hvm/vmx/vmx.c | 4 +- xen/arch/x86/hvm/vmx/vpmu_core2.c| 79 - xen/arch/x86/hvm/vpmu.c | 121 +-- xen/common/event_channel.c | 1 + xen/include/asm-x86/hvm/vpmu.h | 2 + xen/include/public/pmu.h | 2 + xen/include/public/xen.h | 1 + xen/include/xsm/dummy.h | 3 + xen/xsm/flask/hooks.c| 4 + xen/xsm/flask/policy/access_vectors | 2 + 15 files changed, 226 insertions(+), 48 deletions(-) diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index 870ff81..73bbe7b 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -120,6 +120,10 @@ domain_comms(dom0_t, dom0_t) # Allow all domains to use (unprivileged parts of) the tmem hypercall allow domain_type xen_t:xen tmem_op; +# Allow all domains to use PMU (but not to change its settings --- that's what +# pmu_ctrl is for) +allow domain_type xen_t:xen2 pmu_use; + ### # # Domain creation diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 60d9a80..f19087e 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -437,6 +437,8 @@ int vcpu_initialise(struct vcpu *v) vmce_init_vcpu(v); } +spin_lock_init(&v->arch.vpmu.vpmu_lock); + if ( has_hvm_container_domain(d) ) { rc = hvm_vcpu_initialise(v); diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 4734d71..07ad171 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4915,6 +4915,7 @@ static hvm_hypercall_t *const 
pvh_hypercall64_table[NR_hypercalls] = { HYPERCALL(hvm_op), HYPERCALL(sysctl), HYPERCALL(domctl), +HYPERCALL(xenpmu_op), [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation }; diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index b6e77cd..e523d12 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1166,7 +1166,9 @@ static int svm_vcpu_initialise(struct vcpu *v) return rc; } -vpmu_initialise(v); +/* PVH's VPMU is initialized via hypercall */ +if ( is_hvm_vcpu(v) ) +vpmu_initialise(v); svm_guest_osvw_init(v); diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index b60ca40..58a0dc4 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -360,17 +360,19 @@ static void amd_vpmu_destroy(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); -if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) ) -amd_vpmu_unset_msr_bitmap(v); +if ( has_hvm_container_vcpu(v) ) +{ +if ( is_msr_bitmap_on(vpmu) ) +amd_vpmu_unset_msr_bitmap(v); -xfree(vpmu->context); -vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); +if ( is_hvm_vcpu(v) ) +xfree(vpmu->context); -if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) -{ -vpmu_reset(vpmu, VPMU_RUNNING); release_pmu_ownship(PMU_OWNER_HVM); } + +vpmu->context = NULL; +vpmu_clear(vpmu); } /* VPMU part of the 'q' keyhandler */ @@ -435,15 +437,19 @@ int svm_vpmu_initialise(struct vcpu *v) if ( !counters ) return -EINVAL; -ctxt = xzalloc_bytes(sizeof(*ctxt) + - 2 * sizeof(uint64_t) * num_counters); -if ( !ctxt ) +if ( is_hvm_vcpu(v) ) { -printk(XENLOG_G_WARNING "Insufficient memory for PMU, " - " PMU feature is unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); -return -ENOMEM; +ctxt = xzalloc_bytes(sizeof(*ctxt) + + 2 * sizeof(uint64_t) * num_counters); +if ( !ctxt ) +{ +printk(XENLOG_G_WARNING "%pv: Insufficient memory for PMU, " + " PMU feature is unavailable\n", v); +return -ENOMEM; +} } +else +ctxt = 
&v->arch.vpmu.xenpmu_data->pmu.c.amd; ctxt->counters = sizeof(*ctxt); ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters; @@ -482,6 +488,16 @@ int _
[Xen-devel] [PATCH v19 04/14] x86/VPMU: Make vpmu not HVM-specific
vpmu structure will be used for both HVM and PV guests. Move it from hvm_vcpu to arch_vcpu. Signed-off-by: Boris Ostrovsky Acked-by: Jan Beulich Reviewed-by: Kevin Tian Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/include/asm-x86/domain.h | 2 ++ xen/include/asm-x86/hvm/vcpu.h | 3 --- xen/include/asm-x86/hvm/vpmu.h | 5 ++--- 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 9cdffa8..2686a4f 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -434,6 +434,8 @@ struct arch_vcpu void (*ctxt_switch_from) (struct vcpu *); void (*ctxt_switch_to) (struct vcpu *); +struct vpmu_struct vpmu; + /* Virtual Machine Extensions */ union { struct pv_vcpu pv_vcpu; diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h index 3d8f4dc..0faf60d 100644 --- a/xen/include/asm-x86/hvm/vcpu.h +++ b/xen/include/asm-x86/hvm/vcpu.h @@ -151,9 +151,6 @@ struct hvm_vcpu { u32 msr_tsc_aux; u64 msr_tsc_adjust; -/* VPMU */ -struct vpmu_struct vpmu; - union { struct arch_vmx_struct vmx; struct arch_svm_struct svm; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 83eea7e..82bfa0e 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -31,9 +31,8 @@ #define VPMU_BOOT_ENABLED 0x1/* vpmu generally enabled. */ #define VPMU_BOOT_BTS 0x2/* Intel BTS feature wanted. */ -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) -#define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ - arch.hvm_vcpu.vpmu)) +#define vcpu_vpmu(vcpu) (&(vcpu)->arch.vpmu) +#define vpmu_vcpu(vpmu) container_of((vpmu), struct vcpu, arch.vpmu) #define MSR_TYPE_COUNTER0 #define MSR_TYPE_CTRL 1 -- 1.8.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 3/5] xen: print online pCPUs and free pCPUs when dumping
e.g., with `xl debug-key r', like this: (XEN) Online Cpus: 0-15 (XEN) Free Cpus: 8-15 Also, for each cpupool, print the set of pCPUs it contains, like this: (XEN) Cpupool 0: (XEN) Cpus: 0-7 (XEN) Scheduler: SMP Credit Scheduler (credit) Signed-off-by: Dario Faggioli Cc: Juergen Gross Cc: George Dunlap Cc: Jan Beulich Cc: Keir Fraser --- Changes from v1: * _print_cpumap() becomes print_cpumap() (i.e., the leading '_' was not particularly useful in this case), as suggested during review * changed the output such as (1) we only print the maps, not the number of elements, and (2) we avoid printing the free cpus map when empty * improved the changelog --- I'm not including any Reviewed-by / Acked-by tag, since the patch changed. --- xen/common/cpupool.c | 12 1 file changed, 12 insertions(+) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index cd6aab9..812a2f9 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #define for_each_cpupool(ptr)\ @@ -658,6 +659,12 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op) return ret; } +static void print_cpumap(const char *str, const cpumask_t *map) +{ +cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), map); +printk("%s: %s\n", str, keyhandler_scratch); +} + void dump_runq(unsigned char key) { unsigned longflags; @@ -671,12 +678,17 @@ void dump_runq(unsigned char key) sched_smt_power_savings? "enabled":"disabled"); printk("NOW=0x%08X%08X\n", (u32)(now>>32), (u32)now); +print_cpumap("Online Cpus", &cpu_online_map); +if ( cpumask_weight(&cpupool_free_cpus) ) +print_cpumap("Free Cpus", &cpupool_free_cpus); + printk("Idle cpupool:\n"); schedule_dump(NULL); for_each_cpupool(c) { printk("Cpupool %d:\n", (*c)->cpupool_id); +print_cpumap("Cpus", (*c)->cpu_valid); schedule_dump(*c); } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags
Add runtime interface for setting PMU mode and flags. Three main modes are provided: * XENPMU_MODE_OFF: PMU is not virtualized * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts. * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests; dom0 can profile itself and the hypervisor. Note that PMU modes are different from what can be provided at Xen's boot line with the 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF. Any other value, on the other hand, will cause VPMU mode to be set to XENPMU_MODE_SELF during boot. For feature flags only Intel's BTS is currently supported. Mode and flags are set via HYPERVISOR_xenpmu_op hypercall. Signed-off-by: Boris Ostrovsky Acked-by: Daniel De Graaf --- Changes in v19: * Keep track of active vpmu count and allow certain mode changes only when the count is zero * Drop vpmu_unload routines * Revert to using opt_vpmu_enabled * Changes to oprofile code are no longer needed * Changes to vmcs.c are no longer needed * Simplified vpmu_switch_from/to inlines tools/flask/policy/policy/modules/xen/xen.te | 3 + xen/arch/x86/domain.c| 4 +- xen/arch/x86/hvm/svm/vpmu.c | 4 +- xen/arch/x86/hvm/vmx/vpmu_core2.c| 10 +- xen/arch/x86/hvm/vpmu.c | 155 +-- xen/arch/x86/x86_64/compat/entry.S | 4 + xen/arch/x86/x86_64/entry.S | 4 + xen/include/asm-x86/hvm/vpmu.h | 27 +++-- xen/include/public/pmu.h | 45 xen/include/public/xen.h | 1 + xen/include/xen/hypercall.h | 4 + xen/include/xlat.lst | 1 + xen/include/xsm/dummy.h | 15 +++ xen/include/xsm/xsm.h| 6 ++ xen/xsm/dummy.c | 1 + xen/xsm/flask/hooks.c| 18 xen/xsm/flask/policy/access_vectors | 2 + 17 files changed, 279 insertions(+), 25 deletions(-) diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index c0128aa..870ff81 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -68,6 +68,9 @@ allow dom0_t xen_t:xen2 { resource_op psr_cmt_op }; 
+allow dom0_t xen_t:xen2 { +pmu_ctrl +}; allow dom0_t xen_t:mmu memorymap; # Allow dom0 to use these domctls on itself. For domctls acting on other diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 21f0766..60d9a80 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1536,7 +1536,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) if ( is_hvm_vcpu(prev) ) { if (prev != next) -vpmu_save(prev); +vpmu_switch_from(prev); if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) ) pt_save_timer(prev); @@ -1581,7 +1581,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) if (is_hvm_vcpu(next) && (prev != next) ) /* Must be done with interrupts enabled */ -vpmu_load(next); +vpmu_switch_to(next); context_saved(prev); diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index a8b79df..481ea7b 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -472,14 +472,14 @@ struct arch_vpmu_ops amd_vpmu_ops = { .arch_vpmu_dump = amd_vpmu_dump }; -int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +int svm_vpmu_initialise(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); uint8_t family = current_cpu_data.x86; int ret = 0; /* vpmu enabled? 
*/ -if ( !vpmu_flags ) +if ( vpmu_mode == XENPMU_MODE_OFF ) return 0; switch ( family ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index c2405bf..6280644 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -708,13 +708,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) return 1; } -static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +static int core2_vpmu_initialise(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); u64 msr_content; static bool_t ds_warned; -if ( !(vpmu_flags & VPMU_BOOT_BTS) ) +if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) ) goto func_out; /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */ while ( boot_cpu_has(X86_FEATURE_DS) ) @@ -826,7 +826,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = { .do_cpuid = core2_no_vpmu_do_cpuid, }; -int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +int vmx_vpmu_initialise(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); uint8_t family = current_cpu_data.x86; @@ -834,7 +8
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 2015-03-17 at 14:57 +, Wei Liu wrote: > On Tue, Mar 17, 2015 at 02:54:09PM +, Ian Campbell wrote: > > On Tue, 2015-03-17 at 14:29 +, Wei Liu wrote: > > > 2. The ability to access files in Dom0. That will be used to write to / > > >read from QEMU state file. > > > > This requirement is not as broad as you make it sound. > > > > Yes. You're right. > > > All which is really required is the ability to slurp in or write out a > > blob of bytes to a service running in a control domain, not actual > > This is more accurate. It's probably also worth also mentioning that it is a streaming read or write, no need to support seek or such things. > > ability to read/write files in dom0 (which would need careful security > > consideration!). > > > > For the old qemu-traditional stubdom for example this is implemented as > > a pair of console devices (one r/o for restore + one w/o for save) which > > are setup by the toolstack at start of day and pre-plumbed into two > > temporary files. > > > > Unfortunately I don't think that hack in mini-os is upstreamable in rump > kernel. The mini-os implementation is hacky, it is ultimately just a way of implementing open("/dev/hvc1", "r") without actually having to have all of that sort of thing really. But the concept of "open a r/o device and read from it" (or vice versa) doesn't seem to be too bad to me and I expected rumpkernels to have some sort of concept like this somewhere. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode
Add support for privileged PMU mode (XENPMU_MODE_ALL) which allows privileged domain (dom0) profile both itself (and the hypervisor) and the guests. While this mode is on profiling in guests is disabled. Signed-off-by: Boris Ostrovsky --- Changes in v19: * Slightly different mode changing logic in xenpmu_op() since we no longer allow mode changes while VPMUs are active xen/arch/x86/hvm/vpmu.c | 34 +- xen/arch/x86/traps.c | 13 + xen/include/public/pmu.h | 3 +++ 3 files changed, 41 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index beed956..71c5063 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -111,7 +111,9 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, const struct arch_vpmu_ops *ops; int ret = 0; -if ( vpmu_mode == XENPMU_MODE_OFF ) +if ( (vpmu_mode == XENPMU_MODE_OFF) || + ((vpmu_mode & XENPMU_MODE_ALL) && + !is_hardware_domain(current->domain)) ) goto nop; curr = current; @@ -166,8 +168,12 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) struct vcpu *sampled = current, *sampling; struct vpmu_struct *vpmu; -/* dom0 will handle interrupt for special domains (e.g. idle domain) */ -if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED ) +/* + * dom0 will handle interrupt for special domains (e.g. idle domain) or, + * in XENPMU_MODE_ALL, for everyone. 
+ */ +if ( (vpmu_mode & XENPMU_MODE_ALL) || + (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) ) { sampling = choose_hwdom_vcpu(); if ( !sampling ) @@ -177,17 +183,18 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) sampling = sampled; vpmu = vcpu_vpmu(sampling); -if ( !is_hvm_vcpu(sampling) ) +if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) ) { /* PV(H) guest */ const struct cpu_user_regs *cur_regs; uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags; -uint32_t domid = DOMID_SELF; +uint32_t domid; if ( !vpmu->xenpmu_data ) return; if ( is_pvh_vcpu(sampling) && + !(vpmu_mode & XENPMU_MODE_ALL) && !vpmu->arch_vpmu_ops->do_interrupt(regs) ) return; @@ -204,6 +211,11 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) else *flags = PMU_SAMPLE_PV; +if ( sampled == sampling ) +domid = DOMID_SELF; +else +domid = sampled->domain->domain_id; + /* Store appropriate registers in xenpmu_data */ /* FIXME: 32-bit PVH should go here as well */ if ( is_pv_32bit_vcpu(sampling) ) @@ -232,7 +244,8 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) if ( (vpmu_mode & XENPMU_MODE_SELF) ) cur_regs = guest_cpu_user_regs(); -else if ( !guest_mode(regs) && is_hardware_domain(sampling->domain) ) +else if ( !guest_mode(regs) && + is_hardware_domain(sampling->domain) ) { cur_regs = regs; domid = DOMID_XEN; @@ -508,7 +521,8 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params) struct page_info *page; uint64_t gfn = params->val; -if ( vpmu_mode == XENPMU_MODE_OFF ) +if ( (vpmu_mode == XENPMU_MODE_OFF) || + ((vpmu_mode & XENPMU_MODE_ALL) && !is_hardware_domain(d)) ) return -EINVAL; if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) || @@ -627,12 +641,14 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg) { case XENPMU_mode_set: { -if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV)) || +if ( (pmu_params.val & + ~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) || (hweight64(pmu_params.val) > 1) ) return 
-EINVAL; /* 32-bit dom0 can only sample itself. */ -if ( is_pv_32bit_vcpu(current) && (pmu_params.val & XENPMU_MODE_HV) ) +if ( is_pv_32bit_vcpu(current) && + (pmu_params.val & (XENPMU_MODE_HV | XENPMU_MODE_ALL)) ) return -EINVAL; spin_lock(&vpmu_lock); diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 1eb7bb4..8a40deb 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -2653,6 +2653,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) ) { +if ( (vpmu_mode & XENPMU_MODE_ALL) && + !is_hardware_domain(v->domain) ) +break; + if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) ) goto fail;
[Xen-devel] [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
The two routines share most of their logic. Signed-off-by: Boris Ostrovsky --- Changes in v19: * const-ified arch_vpmu_ops in vpmu_do_wrmsr * non-changes: - kept 'current' as a non-initializer to avoid unnecessary initialization in the (common) non-VPMU case - kept 'nop' label since there are multiple dissimilar cases that can cause a non-emulation of VPMU access xen/arch/x86/hvm/vpmu.c| 76 +- xen/include/asm-x86/hvm/vpmu.h | 14 ++-- 2 files changed, 42 insertions(+), 48 deletions(-) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index c287d8b..beed956 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -103,63 +103,47 @@ void vpmu_lvtpc_update(uint32_t val) apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); } -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported) +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, +uint64_t supported, bool_t is_write) { -struct vcpu *curr = current; +struct vcpu *curr; struct vpmu_struct *vpmu; +const struct arch_vpmu_ops *ops; +int ret = 0; if ( vpmu_mode == XENPMU_MODE_OFF ) -return 0; +goto nop; +curr = current; vpmu = vcpu_vpmu(curr); -if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) -{ -int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported); - -/* - * We may have received a PMU interrupt during WRMSR handling - * and since do_wrmsr may load VPMU context we should save - * (and unload) it again. 
- */ -if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && - (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) -{ -vpmu_set(vpmu, VPMU_CONTEXT_SAVE); -vpmu->arch_vpmu_ops->arch_vpmu_save(curr); -vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); -} -return ret; -} - -return 0; -} - -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ -struct vcpu *curr = current; -struct vpmu_struct *vpmu; +ops = vpmu->arch_vpmu_ops; +if ( !ops ) +goto nop; + +if ( is_write && ops->do_wrmsr ) +ret = ops->do_wrmsr(msr, *msr_content, supported); +else if ( !is_write && ops->do_rdmsr ) +ret = ops->do_rdmsr(msr, msr_content); +else +goto nop; -if ( vpmu_mode == XENPMU_MODE_OFF ) +/* + * We may have received a PMU interrupt while handling MSR access + * and since do_wr/rdmsr may load VPMU context we should save + * (and unload) it again. + */ +if ( !is_hvm_vcpu(curr) && + vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) { -*msr_content = 0; -return 0; +vpmu_set(vpmu, VPMU_CONTEXT_SAVE); +ops->arch_vpmu_save(curr); +vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); } -vpmu = vcpu_vpmu(curr); -if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) -{ -int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); +return ret; -if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && - (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) -{ -vpmu_set(vpmu, VPMU_CONTEXT_SAVE); -vpmu->arch_vpmu_ops->arch_vpmu_save(curr); -vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); -} -return ret; -} -else + nop: +if ( !is_write ) *msr_content = 0; return 0; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 642a4b7..63851a7 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -99,8 +99,8 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu, } void vpmu_lvtpc_update(uint32_t val); -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported); -int 
vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, +uint64_t supported, bool_t is_write); void vpmu_do_interrupt(struct cpu_user_regs *regs); void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx); @@ -110,6 +110,16 @@ void vpmu_save(struct vcpu *v); void vpmu_load(struct vcpu *v); void vpmu_dump(struct vcpu *v); +static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, +uint64_t supported) +{ +return vpmu_do_msr(msr, &msr_content, supported, 1); +} +static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ +return vpmu_do_msr(msr, msr_content, 0, 0); +} + extern int acquire_pmu_ownership(int pmu_ownership); extern void release_pmu_ownership(int pmu_ownership); -- 1.8.1.4 __
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, Mar 17, 2015 at 02:29:07PM +, Wei Liu wrote: > I've now successfully built QEMU upstream with rump kernel. However to > make it fully functional as a stubdom, there are some missing pieces to > be added in. > > 1. The ability to access QMP socket (a unix socket) from Dom0. That >will be used to issue commands to QEMU. The QMP "socket" does not need to be a unix socket. It can be any of these (from qemu --help): Character device options:
-chardev null,id=id[,mux=on|off]
-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds] [,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (tcp)
-chardev socket,id=id,path=path[,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (unix)
-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr] [,localport=localport][,ipv4][,ipv6][,mux=on|off]
-chardev msmouse,id=id[,mux=on|off]
-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]] [,mux=on|off]
-chardev ringbuf,id=id[,size=size]
-chardev file,id=id,path=path[,mux=on|off]
-chardev pipe,id=id,path=path[,mux=on|off]
-chardev pty,id=id[,mux=on|off]
-chardev stdio,id=id[,mux=on|off][,signal=on|off]
-chardev serial,id=id,path=path[,mux=on|off]
-chardev tty,id=id,path=path[,mux=on|off]
-chardev parallel,id=id,path=path[,mux=on|off]
-chardev parport,id=id,path=path[,mux=on|off]
-chardev spicevmc,id=id,name=name[,debug=debug]
-chardev spiceport,id=id,name=name[,debug=debug]
> 2. The ability to access files in Dom0. That will be used to write to / >read from QEMU state file. To save a QEMU state (write), we do use a filename. But I guess we could expand the QMP command (xen-save-devices-state) to use something else, if that is easier. To restore, we provide a file descriptor from libxl to QEMU, with the fd open on the file that contains the state we want to restore.
But there are a few other ways to load a state (from qemu.git/docs/migration.txt):
- tcp migration: do the migration using tcp sockets
- unix migration: do the migration using unix sockets
- exec migration: do the migration using the stdin/stdout through a process.
- fd migration: do the migration using a file descriptor that is passed to QEMU. QEMU doesn't care how this file descriptor is opened.
-- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 10/14] x86/VPMU: Add support for PMU register handling on PV guests
Intercept accesses to PMU MSRs and process them in VPMU module. If vpmu ops for VCPU are not initialized (which is the case, for example, for PV guests that are not "VPMU-enlightened") access to MSRs will return failure. Dump VPMU state for all domains (HVM and PV) when requested. Signed-off-by: Boris Ostrovsky Acked-by: Jan Beulich Acked-by: Kevin Tian Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/arch/x86/domain.c | 3 +-- xen/arch/x86/hvm/vmx/vpmu_core2.c | 49 +++-- xen/arch/x86/hvm/vpmu.c | 3 +++ xen/arch/x86/traps.c | 51 +-- 4 files changed, 95 insertions(+), 11 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index c7f8210..a48d824 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -2065,8 +2065,7 @@ void arch_dump_vcpu_info(struct vcpu *v) { paging_dump_vcpu_info(v); -if ( is_hvm_vcpu(v) ) -vpmu_dump(v); +vpmu_dump(v); } void domain_cpuid( diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index d10e3e7..66d7bc0 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -299,12 +300,18 @@ static inline void __core2_vpmu_save(struct vcpu *v) rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]); for ( i = 0; i < arch_pmc_cnt; i++ ) rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter); + +if ( !has_hvm_container_vcpu(v) ) +rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); } static int core2_vpmu_save(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); +if ( !has_hvm_container_vcpu(v) ) +wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) return 0; @@ -342,6 +349,13 @@ static inline void __core2_vpmu_load(struct vcpu *v) wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl); wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area); wrmsrl(MSR_IA32_PEBS_ENABLE, 
core2_vpmu_cxt->pebs_enable); + +if ( !has_hvm_container_vcpu(v) ) +{ +wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); +core2_vpmu_cxt->global_ovf_ctrl = 0; +wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); +} } static void core2_vpmu_load(struct vcpu *v) @@ -442,7 +456,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported) { -u64 global_ctrl; int i, tmp; int type = -1, index = -1; struct vcpu *v = current; @@ -486,7 +499,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, switch ( msr ) { case MSR_CORE_PERF_GLOBAL_OVF_CTRL: +if ( msr_content & ~(0xC000 | + (((1ULL << fixed_pmc_cnt) - 1) << 32) | + ((1ULL << arch_pmc_cnt) - 1)) ) +return 1; core2_vpmu_cxt->global_status &= ~msr_content; +wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); return 0; case MSR_CORE_PERF_GLOBAL_STATUS: gdprintk(XENLOG_INFO, "Can not write readonly MSR: " @@ -514,14 +532,18 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); return 0; case MSR_CORE_PERF_GLOBAL_CTRL: -global_ctrl = msr_content; +core2_vpmu_cxt->global_ctrl = msr_content; break; case MSR_CORE_PERF_FIXED_CTR_CTRL: if ( msr_content & ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) ) return 1; -vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); +if ( has_hvm_container_vcpu(v) ) +vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + &core2_vpmu_cxt->global_ctrl); +else +rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32); if ( msr_content != 0 ) { @@ -546,7 +568,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, if ( msr_content & (~((1ull << 32) - 1)) ) return 1; -vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); +if ( has_hvm_container_vcpu(v) 
) +vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + &core2_vpmu_cxt->global_ctrl); +else +rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); if ( msr_content & (1ULL <
[Xen-devel] [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called
We don't need to try to destroy it, since it cannot already be allocated at the time we try to initialize it. Signed-off-by: Boris Ostrovsky Suggested-by: Andrew Cooper --- Changes in v19: * Removed unnecessary test for VPMU_CONTEXT_ALLOCATED in svm/vpmu.c xen/arch/x86/hvm/svm/vpmu.c | 3 --- xen/arch/x86/hvm/vpmu.c | 5 + 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 64dc167..6764070 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -359,9 +359,6 @@ static int amd_vpmu_initialise(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); uint8_t family = current_cpu_data.x86; -if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) -return 0; - if ( counters == NULL ) { switch ( family ) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 0e6b6c0..c3273ee 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -236,10 +236,7 @@ void vpmu_initialise(struct vcpu *v) if ( is_pvh_vcpu(v) ) return; -if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) -vpmu_destroy(v); -vpmu_clear(vpmu); -vpmu->context = NULL; +ASSERT(!vpmu->flags && !vpmu->context); switch ( vendor ) { -- 1.8.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 09/14] x86/VPMU: When handling MSR accesses, leave fault injection to callers
With this patch return value of 1 of vpmu_do_msr() will now indicate whether an error was encountered during MSR processing (instead of stating that the access was to a VPMU register). As part of this patch we also check for validity of certain MSR accesses right when we determine which register is being written, as opposed to postponing this until later. Signed-off-by: Boris Ostrovsky Acked-by: Kevin Tian Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/arch/x86/hvm/svm/svm.c| 6 ++- xen/arch/x86/hvm/svm/vpmu.c | 6 +-- xen/arch/x86/hvm/vmx/vmx.c| 24 +--- xen/arch/x86/hvm/vmx/vpmu_core2.c | 82 ++- 4 files changed, 55 insertions(+), 63 deletions(-) diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index e523d12..4fe36e9 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1709,7 +1709,8 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) case MSR_AMD_FAM15H_EVNTSEL3: case MSR_AMD_FAM15H_EVNTSEL4: case MSR_AMD_FAM15H_EVNTSEL5: -vpmu_do_rdmsr(msr, msr_content); +if ( vpmu_do_rdmsr(msr, msr_content) ) +goto gpf; break; case MSR_AMD64_DR0_ADDRESS_MASK: @@ -1860,7 +1861,8 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content) case MSR_AMD_FAM15H_EVNTSEL3: case MSR_AMD_FAM15H_EVNTSEL4: case MSR_AMD_FAM15H_EVNTSEL5: -vpmu_do_wrmsr(msr, msr_content, 0); +if ( vpmu_do_wrmsr(msr, msr_content, 0) ) +goto gpf; break; case MSR_IA32_MCx_MISC(4): /* Threshold register */ diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 58a0dc4..474d0db 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -305,7 +305,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) ) { if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) -return 1; +return 0; vpmu_set(vpmu, VPMU_RUNNING); if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) ) @@ -335,7 +335,7 @@ static int 
amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, /* Write to hw counters */ wrmsrl(msr, msr_content); -return 1; +return 0; } static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) @@ -353,7 +353,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) rdmsrl(msr, *msr_content); -return 1; +return 0; } static void amd_vpmu_destroy(struct vcpu *v) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 83b740a..206e50d 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2127,12 +2127,17 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content) *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL | MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; /* Perhaps vpmu will change some bits. */ +/* FALLTHROUGH */ +case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7): +case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3): +case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: +case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: +case MSR_IA32_PEBS_ENABLE: +case MSR_IA32_DS_AREA: if ( vpmu_do_rdmsr(msr, msr_content) ) -goto done; +goto gp_fault; break; default: -if ( vpmu_do_rdmsr(msr, msr_content) ) -break; if ( passive_domain_do_rdmsr(msr, msr_content) ) goto done; switch ( long_mode_do_msr_read(msr, msr_content) ) @@ -2308,7 +2313,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content) if ( msr_content & ~supported ) { /* Perhaps some other bits are supported in vpmu. 
*/ -if ( !vpmu_do_wrmsr(msr, msr_content, supported) ) +if ( vpmu_do_wrmsr(msr, msr_content, supported) ) break; } if ( msr_content & IA32_DEBUGCTLMSR_LBR ) @@ -2336,9 +2341,16 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content) if ( !nvmx_msr_write_intercept(msr, msr_content) ) goto gp_fault; break; +case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7): +case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7): +case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: +case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: +case MSR_IA32_PEBS_ENABLE: +case MSR_IA32_DS_AREA: + if ( vpmu_do_wrmsr(msr, msr_content, 0) ) +goto gp_fault; +break; default: -if ( vpmu_do_wrmsr(msr, msr_content, 0) ) -return X86EMUL_OKAY; if ( passive_domain_do_wrmsr(msr, msr_content) ) return X86EMUL_O
[Xen-devel] [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch
Save VPMU state during context switch for both HVM and PV(H) guests. A subsequent patch ("x86/VPMU: NMI-based VPMU support") will make it possible for vpmu_switch_to() to call vmx_vmcs_try_enter()->vcpu_pause() which needs is_running to be correctly set/cleared. To prepare for that, call context_saved() before vpmu_switch_to() is executed. (Note that while this change could have been delayed until that later patch, the changes are harmless to existing code and so we do it here.) Signed-off-by: Boris Ostrovsky --- Changes in v19: * Adjusted for new vpmu_switch_to/from interface xen/arch/x86/domain.c | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index f19087e..c7f8210 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1533,17 +1533,14 @@ void context_switch(struct vcpu *prev, struct vcpu *next) } if ( prev != next ) -_update_runstate_area(prev); - -if ( is_hvm_vcpu(prev) ) { -if (prev != next) -vpmu_switch_from(prev); - -if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) ) -pt_save_timer(prev); +_update_runstate_area(prev); +vpmu_switch_from(prev); } +if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) ) +pt_save_timer(prev); + local_irq_disable(); set_current(next); @@ -1581,15 +1578,16 @@ void context_switch(struct vcpu *prev, struct vcpu *next) !is_hardware_domain(next->domain)); } -if (is_hvm_vcpu(next) && (prev != next) ) -/* Must be done with interrupts enabled */ -vpmu_switch_to(next); - context_saved(prev); if ( prev != next ) +{ _update_runstate_area(next); +/* Must be done with interrupts enabled */ +vpmu_switch_to(next); +} + /* Ensure that the vcpu has an up-to-date time base. */ update_vcpu_system_time(next); -- 1.8.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, Mar 17, 2015 at 02:54:09PM +, Ian Campbell wrote: > On Tue, 2015-03-17 at 14:29 +, Wei Liu wrote: > > 2. The ability to access files in Dom0. That will be used to write to / > >read from QEMU state file. > > This requirement is not as broad as you make it sound. > Yes. You're right. > All which is really required is the ability to slurp in or write out a > blob of bytes to a service running in a control domain, not actual This is more accurate. > ability to read/write files in dom0 (which would need careful security > consideration!). > > For the old qemu-traditional stubdom for example this is implemented as > a pair of console devices (one r/o for restore + one w/o for save) which > are setup by the toolstack at start of day and pre-plumbed into two > temporary files. > Unfortunately I don't think that hack in mini-os is upstreamable in rump kernel. Wei. > Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-upstream-4.3-testing test] 36494: trouble: pass/preparing
flight 36494 qemu-upstream-4.3-testing running [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36494/ Failures and problems with tests :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-winxpsp3-vcpus1 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-win7-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-qemut-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-ovmf-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-pair 2 hosts-allocate running [st=running!] test-amd64-i386-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-rhel6hvm-intel 2 hosts-allocate running [st=running!] test-amd64-i386-freebsd10-i386 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemut-winxpsp3-vcpus1 2 hosts-allocate running [st=running!] test-amd64-i386-xend-winxpsp3 2 hosts-allocate running [st=running!] test-amd64-i386-pv2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-freebsd10-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-libvirt 2 hosts-allocate running [st=running!] test-amd64-i386-qemuu-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xend-qemut-winxpsp3 2 hosts-allocaterunning [st=running!] test-amd64-i386-xl-qemut-win7-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-xl2 hosts-allocate running [st=running!] test-amd64-i386-qemuu-rhel6hvm-intel 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 2 hosts-allocate running [st=running!] test-amd64-i386-xl-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-qemut-rhel6hvm-intel 2 hosts-allocate running [st=running!] test-amd64-amd64-libvirt 2 hosts-allocate running [st=running!] 
test-amd64-amd64-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf-pin 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf 2 hosts-allocate running [st=running!] test-amd64-amd64-xl 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-pcipt-intel 2 hosts-allocaterunning [st=running!] test-amd64-amd64-pv 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-multivcpu 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-credit2 2 hosts-allocate running [st=running!] test-amd64-amd64-pair 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-winxpsp3 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!] version targeted for testing: qemuuab689a89ec47b2e1c964c57bea7da68f8ddf89fd baseline version: qemuu580b1d06aa3eed3ae9c12b4225a1ea1c192ab119 People who touched revisions under test: Andreas Färber Anthony Liguori Asias He Benoit Canet Benoît Canet Gerd Hoffmann Juan Quintela Kevin Wolf Michael Roth Michael S. Tsirkin Paolo Bonzini Peter Maydell Petr Matousek Stefan Hajnoczi Stefano Stabellini jobs: build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl preparing test-amd64-i386-xl preparing test-amd64-i386-rhel6hvm-amd preparing test-amd64-i386-qemut-rhel6hvm-amd preparing test-amd64-i386-qemuu-rhel6hvm-amd preparing test-amd64-amd64-xl-qemut-debianhvm-amd64preparing test-amd64-i386-xl-qemut-debianh
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
On Tue, Mar 17, 2015 at 09:42:21AM +0100, Sander Eikelenboom wrote: > > Tuesday, March 17, 2015, 9:18:32 AM, you wrote: > > On 16.03.15 at 18:59, wrote: > >> Hence was wondering if it would just be easier to put > >> this patch in (see above) - with the benefit that folks have > >> a faster interrupt passthrough experience and then I work on another > >> variant of this with tristate cmpxchg and ->mapping atomic counter. > > > Considering how long this issue has been pending I think we really > > need to get _something_ in (or revert); if this something is the > > patch in its most recent form, so be it (even if maybe not the > > simplest of all possible variants). So please submit as a proper non- > > RFC patch. > > > Jan > > I'm still running with this first simple stopgap patch from Konrad, > and it has worked fine for me since. I believe the patch that Sander and Malcolm had been running is the best candidate. The other ones I had been fiddling with - such as the one attached here - I cannot make myself comfortable that they will not hit a dead-lock. On Intel hardware the softirq is called from vmx_resume - which means that the whole 'interrupt guest and deliver the event' code happens during the VMEXIT-to-VMENTER window. But that does not preclude another interrupt destined for this same vCPU coming right in as we are progressing through the softirqs - and dead-locking: in the vmx_resume stack we are in hvm_dirq_assist (called from dpci_softirq) and haven't cleared STATE_SCHED, while in the IRQ stack we spin in raise_softirq_for waiting for STATE_SCHED to be cleared. A dead-lock avoidance could be added by saving the CPU of the softirq that is executing the dpci. Then 'raise_softirq_for' could check that and bail out if (smp_processor_id() == dpci_pirq->cpu).
Naturally this means being very careful about _where_ we initialize the 'cpu' to -1, etc. - which brings us back to carefully working out the corner cases and making sure we do the right thing - and that can take time. Re-using the 'dpci' on the per-cpu list does exactly what the older tasklet code was doing. That is: if the function assigned to the tasklet was running, the softirq that ran said function (hvm_dirq_assist) would be responsible for putting the tasklet back on the per-cpu list. This would allow having a running tasklet and a 'to-be-scheduled' tasklet at the same time. And that is what we need. I will post a proper patch and also add Tested-by from Malcolm and Sander on it - as it did fix their test-cases and is unmodified (except an updated comment) from what they tested in 2014. > > I will see if this new one also "works-for-me", somewhere today :-) > > -- > Sander > > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c > index ae050df..d1421b0 100644 > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -804,7 +804,18 @@ static void dpci_softirq(void) > d = pirq_dpci->dom; > smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > -BUG(); > +{ > +unsigned long flags; > + > +/* Put back on the list and retry. */ > +local_irq_save(flags); > +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); > +local_irq_restore(flags); > + > +raise_softirq(HVM_DPCI_SOFTIRQ); > +continue; > +} > + > /* > * The one who clears STATE_SCHED MUST refcount the domain. > */ > >From 6b32dccfbe00518d3ca9cd94d19a6e007b2645d9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 17 Mar 2015 09:46:09 -0400 Subject: [PATCH] dpci: when scheduling spin until STATE_RUN or STATE_SCHED has been cleared. There is a race when we clear the STATE_SCHED in the softirq - which allows the 'raise_softirq_for' (on another CPU) to schedule the dpci.
Specifically this can happen when the other CPU receives an interrupt, calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same dpci structure). There would be two 'dpci_softirq' instances running at the same time (on different CPUs), where one CPU would be executing hvm_dirq_assist (so it had cleared STATE_SCHED and set STATE_RUN) and the other CPU is trying to call: if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) BUG(); Since STATE_RUN is already set it would end badly. The reason we can hit this is when an interrupt's affinity is set over multiple CPUs. Potential solutions: a) Instead of the BUG() we can put the dpci back on the per-cpu list to deal with later (when the softirqs are activated again). Putting the 'dpci' back on the per-cpu list amounts to a spin until the bad condition clears. b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for to detect the 'STATE_RUN' bit being set and schedule the dpci in a safer manner (delay
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
Hi Bob, > -Original Message- > From: Bob Liu [mailto:bob@oracle.com] > Sent: 17 March 2015 07:00 > To: Felipe Franciosi > Cc: Konrad Rzeszutek Wilk; Roger Pau Monne; David Vrabel; xen- > de...@lists.xen.org; linux-ker...@vger.kernel.org; ax...@fb.com; > h...@infradead.org; avanzini.aria...@gmail.com; cheg...@amazon.de > Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new > struct > > Hi Felipe, > > On 03/06/2015 06:30 PM, Felipe Franciosi wrote: > >> -Original Message- > >> From: Bob Liu [mailto:bob@oracle.com] > >> Sent: 05 March 2015 00:47 > >> To: Konrad Rzeszutek Wilk > >> Cc: Roger Pau Monne; Felipe Franciosi; David Vrabel; > >> xen-devel@lists.xen.org; linux-ker...@vger.kernel.org; ax...@fb.com; > >> h...@infradead.org; avanzini.aria...@gmail.com; cheg...@amazon.de > >> Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to > >> an new struct > >> > >> > >> ...snip... > >>> > >>> Meaning you weren't able to do the same test? > >>> > >> > >> I can if there are more details about how to set up this 5 and 10 > >> guests environment and test pattern have been used. > >> Just think it might be save time if somebody still have the similar > >> environment by hand. > >> Roger and Felipe, if you still have the environment could you please > >> have a quick compare about feature-persistent performance with patch > >> [PATCH v5 0/2] > >> gnttab: Improve scaleability? > > > > I've been meaning to do that. I don't have the environment up, but it isn't > > too > hard to put it back together. A bit swamped at the moment, but will try (very > hard) to do it next week. > > > > Do you have gotten any testing result? I've put the hardware back together and am sorting out the software for testing. Things are not moving as fast as I wanted due to other commitments. I'll keep this thread updated as I progress. Malcolm is OOO and I'm trying to get his patches to work on a newer Xen. 
The evaluation will compare: 1) bare metal i/o (for baseline) 2) tapdisk3 (currently using grant copy, which is what scales best in my experience) 3) blkback w/ persistent grants 4) blkback w/o persistent grants (I will just comment out the handshake bits in blkback/blkfront) 5) blkback w/o persistent grants + Malcolm's grant map patches To my knowledge, blkback (w/ or w/o persistent grants) is always faster than user space alternatives (e.g. tapdisk, qemu-qdisk) as latency is much lower. However, tapdisk with grant copy has been shown to produce (much) better aggregate throughput figures as it avoids any issues with grant (un)mapping. I'm hoping to show that (5) above scales better than (3) and (4) in a representative scenario. If it does, I will recommend that we get rid of persistent grants in favour of a better and more scalable grant (un)mapping implementation. Comments welcome. Cheers, F. > > -- > Regards, > -Bob ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-upstream-4.5-testing test] 36492: trouble: pass/preparing
flight 36492 qemu-upstream-4.5-testing running [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36492/ Failures and problems with tests :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-win7-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-qemut-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-freebsd10-amd64 2 hosts-allocaterunning [st=running!] test-amd64-amd64-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-pcipt-intel 2 hosts-allocaterunning [st=running!] test-amd64-i386-libvirt 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-pvh-amd 2 hosts-allocate running [st=running!] test-armhf-armhf-xl-multivcpu 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf-pin 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-credit2 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!] test-armhf-armhf-xl 2 hosts-allocate running [st=running!] test-armhf-armhf-xl-credit2 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-multivcpu 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-ovmf-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-pair 2 hosts-allocate running [st=running!] test-amd64-amd64-libvirt 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-qemuu-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xl-winxpsp3 2 hosts-allocate running [st=running!] test-armhf-armhf-xl-midway2 hosts-allocate running [st=running!] test-amd64-i386-rhel6hvm-amd 2 hosts-allocate running [st=running!] 
 test-armhf-armhf-xl-sedf-pin 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl 2 hosts-allocate running [st=running!]
 test-amd64-i386-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-freebsd10-i386 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-armhf-armhf-libvirt 2 hosts-allocate running [st=running!]
 test-amd64-amd64-pair 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemuu-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-pvh-intel 2 hosts-allocate running [st=running!]
 test-armhf-armhf-xl-sedf 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemut-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!]

version targeted for testing:
 qemuu 0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4
baseline version:
 qemuu 1ebb75b1fee779621b63e84fefa7b07354c43a99

People who touched revisions under test:
 Gerd Hoffmann
 Gonglei
 Juan Quintela
 Michael S. Tsirkin
 Paolo Bonzini
 Peter Maydell
 Petr Matousek
 Stefano Stabellini

jobs:
 build-amd64 pass
 build-armhf pass
 build-i386 pass
 build-amd64-libvirt pass
 build-armhf-libvirt pa
[Xen-devel] [qemu-upstream-4.4-testing test] 36499: trouble: pass/preparing
flight 36499 qemu-upstream-4.4-testing running [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36499/

Failures and problems with tests :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemut-rhel6hvm-amd 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-freebsd10-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-libvirt 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-pair 2 hosts-allocate running [st=running!]
 test-amd64-i386-rhel6hvm-amd 2 hosts-allocate running [st=running!]
 test-amd64-i386-xend-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-freebsd10-i386 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl 2 hosts-allocate running [st=running!]
 test-amd64-i386-xend-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-i386-pv 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemut-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-sedf 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-sedf-pin 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemuu-rhel6hvm-amd 2 hosts-allocate running [st=running!]
 test-amd64-i386-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemuu-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-pcipt-intel 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-credit2 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-multivcpu 2 hosts-allocate running [st=running!]
 test-amd64-amd64-libvirt 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-pv 2 hosts-allocate running [st=running!]
 test-amd64-amd64-pair 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!]

version targeted for testing:
 qemuu d173a0c20d7970c17fa593cf86abc1791a8a4a3a
baseline version:
 qemuu b04df88d41f64fc6b56d193b6e90fb840cedb1d3

People who touched revisions under test:
 Benoit Canet
 Benoît Canet
 Dmitry Fleytman
 Gerd Hoffmann
 Jason Wang
 Jeff Cody
 Juan Quintela
 Kevin Wolf
 Laszlo Ersek
 Michael Roth
 Michael S. Tsirkin
 Peter Maydell
 Petr Matousek
 Stefan Hajnoczi
 Stefano Stabellini

jobs:
 build-amd64-xend pass
 build-i386-xend pass
 build-amd64 pass
 build-i386 pass
 build-amd64-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-xl preparing
 test-amd64-i386-xl preparing
 test-amd64-i386-rhel6hvm-amd preparing
 test-amd64-i386-qemut-rhel6hvm-amd preparing
 test-amd64-i386-qemuu-rhel6hvm-amd
[Xen-devel] Upstream QEMU based stubdom and rump kernel
Hi all,

I'm now working on upstream QEMU stubdom, and rump kernel seems to be a good fit for this purpose.

A bit of background information: a stubdom is a service domain. With a QEMU stubdom we are able to run QEMU device emulation code in a separate domain, so that bugs in QEMU don't affect Dom0 (the controlling domain). Xen currently has a QEMU stubdom, but it's based on our fork of an ancient QEMU (plus some other libraries and mini-os). Eventually we would like to use upstream QEMU in the stubdom.

I've now successfully built upstream QEMU with rump kernel. However, to make it fully functional as a stubdom, there are some missing pieces to be added:

1. The ability to access the QMP socket (a unix socket) from Dom0. That will be used to issue commands to QEMU.
2. The ability to access files in Dom0. That will be used to write to / read from QEMU's state file.
3. The build process requires mini-os headers. Those will be used to build libxc (the controlling library).

(Xen folks, am I missing anything?)

One of the lessons I learned from the existing stubdom work is that I should work with upstream and produce maintainable code. So before I do anything for real I'd better consult the community. My gut feeling is that the first two requirements are not really Xen specific.

Let me know what you guys plan and think.

Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
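Requirement 1 — reaching QEMU's QMP unix socket from Dom0 — is, on the Dom0 side, ordinary AF_UNIX plumbing; the stubdom-specific work is forwarding that stream across the domain boundary. As a minimal sketch of the Dom0 side (the socket path below is hypothetical; each toolstack picks its own):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a QMP (QEMU Machine Protocol) unix socket in Dom0.
 * Returns a connected fd, or -1 on error.  The caller then performs the
 * usual QMP handshake (read greeting, send qmp_capabilities). */
static int qmp_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                       /* path too long for sun_path */
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);                       /* no server listening there */
        return -1;
    }
    return fd;
}
```

With QEMU inside a stubdom there is no Dom0 filesystem path for the socket to live on, which is exactly why Wei lists this as a missing piece: something (a console ring, vchan, or similar) has to stand in for the unix socket transport.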
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
On 03/17/2015 04:20 PM, Jan Beulich wrote:
> On 17.03.15 at 15:07, wrote:
>> Yes, but Andrew's idea (which I think is very neat) is that instead of
>> the trickery I used to do in the original patch (create a specific
>> VMCALL vm_event and compare eax to a magic constant on VMCALL-based
>> VMEXITS, to figure out if all I wanted to do was send out the event),
>> that I should instead have the guest set up rax, rdi and rsi and execute
>> vmcall, which would then be translated to a real hypercall that sends
>> out a vm_event.
>
> If you think about a bare HVM guest OS (i.e. without any PV
> drivers), then of course you should provide such hypercall
> wrappers for code to use instead of open coding it in potentially
> many places.
>
>> In this case, the (HVM) guest does need to concern itself with what
>> registers it should set up for that purpose. I suppose a workaround
>> could be to write the subop in both ebx and rdi, though without any
>> testing I don't know at this point what, if anything, might be broken
>> that way.
>
> Guest code ought to know what mode it runs in. And introspection
> code (in case this is about injection of such code) ought to also
> know which mode the monitored guest is in.

Yes, we'll try to handle this. I was mainly asking because, based on Andrew's suggestion (which only mentioned rdi, not ebx), I wanted to make sure that this is not something that people might prefer to change at the Xen source code level.

Thanks for the clarification,
Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] osstest going offline for a bit due to database server move
On Tue, 2015-03-17 at 10:28 +, Ian Campbell wrote:
> On Mon, 2015-03-16 at 12:41 +, Ian Campbell wrote:
> > We've not yet tracked down the source of the mysterious filer reboots
> > and there was another earlier today, we've fiddled with a few things to
> > see if we can track them down.
> >
> > osstest is doing stuff now, fingers crossed.
>
> There were some more reboots overnight. We've made another config change
> which we hope will resolve things. If not we will look at moving the
> controller VM to another filer tomorrow.
>
> In the meantime in an attempt to try and keep some of the more important
> branches flowing with the limited bandwidth between reboots I've stopped
> a bunch of stuff: [...]

After discussion with Stefano I've also stopped the qemu-upstream stuff for 4.2, 4.3, 4.4 and 4.5. AIUI the tags to be used for the 4.3.x and 4.4.x branches are already in the tested branch and everything after that is targeting the next point release.

$ for i in 4.2 4.3 4.4 4.5 ; do
>   touch qemu-upstream-$i-testing
> done

and killed these flights:

 flight | blessing |          branch           | intended
--------+----------+---------------------------+----------
  36492 | running  | qemu-upstream-4.5-testing | real
  36494 | running  | qemu-upstream-4.3-testing | real
  36499 | running  | qemu-upstream-4.4-testing | real

Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PVH DomU panics on boot on Xen 4.5.0 whereas it was fine on 4.4.1
On 17/03/15 12:54, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 16, 2015 at 11:08:50PM +, Ian Murray wrote:
>> On 16/03/15 14:12, Konrad Rzeszutek Wilk wrote:
>>> On Sun, Mar 15, 2015 at 09:34:16PM +, Ian Murray wrote:
>>>> Hi, I have a domU guest that booted fine under Xen 4.4.1 with pvh=1
>>>> but now fails to boot with it under Xen 4.5.0. Removing pvh=1, i.e.
>>>> booting it as traditional PV, results in it booting fine. The only odd
>>>> thing is that I had to compile with debug=y in Config.mk to avoid a
>>>> compiler warning that was causing compilation to fail outright. I will
>>>> create another mail about that. DomU is Ubuntu 14.10 and Dom0 is 12.04.5
>>>
>>> There were some incompatible changes in Xen 4.5 in regards to
>>> PVH which were then updated in Linux 3.19 (or was it 3.18?)
>>>
>>> I would recommend you rev up to the latest version of Linux.
>>
>> I tried the mainline support kernel for Ubuntu (GNU/Linux
>> 3.19.1-031901-generic x86_64) and it booted fine.
>>
>> Thanks for the assistance and happy to assist if anyone wants to treat
>> it as a regression.
>
> Nah, it is labelled 'experimental' for that exact reason - as we
> realized we made a mistake in Xen 4.4 that we ended up fixing
> in Xen 4.5 - and then fixed it in Linux.

Thanks. I was aware it was experimental, but just wanted to offer the chance to debug if the above behaviour was unexpected.

> Sorry, I thought that it was not widely mentioned and it made your
> day a bit sad. Was there a specific webpage you looked at first for
> help? (Asking so I can at least edit it to mention this).

I don't remember where I looked, tbh, although I would have read the release notes for 4.5 as a matter of course. I checked the PVH wiki entry and that merely refers to the "latest" of Xen and Linux. Perhaps that could be a bit more specific. My day wasn't made sad by this, but my day was a little sadder when I read (if I am reading it right) that AMD support for PVH (although slated for it) does not appear to have made it into 4.5.
:) Thanks for reading.

>> Here is the boot output when booting with pvh=1:

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.16.0-31-generic (buildd@batsu) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #41-Ubuntu SMP Tue Feb 10 15:24:04 UTC 2015 (Ubuntu 3.16.0-31.41-generic 3.16.7-ckt5)
[0.00] Command line: root=UUID=edfcef2a-dcf1-4c77-ad69-22456606702e ro nomodeset xen-fbfront.video=16,1024,768
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] ACPI in unprivileged domain disabled
[0.00] e820: BIOS-provided physical RAM map:
[0.00] Xen: [mem 0x-0x3fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] DMI not present or invalid.
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x4 max_arch_pfn = 0x4
[0.00] Scanning 1 areas for low memory corruption
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00] init_memory_mapping: [mem 0x3fe0-0x3fff]
[0.00] init_memory_mapping: [mem 0x3c00-0x3fdf]
[0.00] init_memory_mapping: [mem 0x0010-0x3bff]
[0.00] RAMDISK: [mem 0x023f6000-0x0589]
[0.00] NUMA turned off
[0.00] Faking a node at [mem 0x-0x3fff]
[0.00] Initmem setup node 0 [mem 0x-0x3fff]
[0.00] NODE_DATA [mem 0x3fffb000-0x3fff]
[0.00] Zone ranges:
[0.00]   DMA [mem 0x1000-0x00ff]
[0.00]   DMA32 [mem 0x0100-0x]
[0.00]   Normal empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node 0: [mem 0x1000-0x0009]
[0.00]   node 0: [mem 0x0010-0x3fff]
[0.00] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[0.00] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[0.00] PM: Registered nosave memory: [mem 0x000a-0x000f]
[0.00] e820: [mem 0x4000-0x] available for PCI devices
[0.00] Booting paravirtualized kernel with PVH extensions on Xen
[0.00] Xen version: 4.5.0
[0.00] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:2 nr_node_ids:1
[0.00] PERCPU: Embedded 28 pages/cpu @88003fc0 s83328 r8192 d23168 u1048576
[0.00] Built 1 zonelists in Node order, mobility grouping on. Total pag
Re: [Xen-devel] OpenStack - Libvirt+Xen CI overview
Bob Ball wrote:
> For the last few weeks Anthony and I have been working on creating a CI
> environment to run against all OpenStack jobs. We're now in a position where
> we can share the current status, an overview of how it works, and next steps.
> We actively want to support involvement in this effort from others with an
> interest in libvirt+Xen's OpenStack integration.
>
> The CI we have set up follows the recommendations made by the official
> OpenStack infrastructure maintainers, and reproduces a notable portion of the
> official OpenStack CI environment to run these tests. Namely, this setup is
> using:
> - Puppet to deploy the master node
> - Zuul to watch for code changes uploaded to review.openstack.org
> - Jenkins job builder to create Jenkins job definitions from a YAML file
> - Nodepool to automatically create single-use virtual machines in the
>   Rackspace public cloud
> - Devstack-gate to run Tempest tests in serial
>
> More information on Zuul, JJB, Nodepool and devstack-gate is available
> through http://ci.openstack.org
>
> The current status is that we have a zuul instance monitoring for jobs and
> adding them to the queue of jobs to be run at
> http://zuul.openstack.xenproject.org/
>
> In the background Nodepool provisions virtual machines into a pool of nodes
> ready to be used. All ready nodes are automatically added to Jenkins
> (https://jenkins.openstack.xenproject.org/), and then Zuul+Jenkins will
> trigger a particular job on a node when one is available.
>
> Logs are then uploaded to Rackspace's Cloud Files, with sample logs for
> a passing job at
> http://logs.openstack.xenproject.org/52/162352/3/silent/dsvm-tempest-xen/da3ff30/index.html

Thanks for the info!

> I'd like to organise a meeting to walk through the various components
> of the CI with those who are interested, so this is an initial call to
> find out who is interested in finding out more!

I'd like to know more.
Regards, Jim ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
>>> On 17.03.15 at 15:07, wrote: > Yes, but Andrew's idea (which I think is very neat) is that instead of > the trickery I used to do in the original patch (create a specific > VMCALL vm_event and compare eax to a magic constant on VMCALL-based > VMEXITS, to figure out if all I wanted to do was send out the event), > that I should instead have the guest set up rax, rdi and rsi and execute > vmcall, which would then be translated to a real hypercall that sends > out a vm_event. If you think about a bare HVM guest OS (i.e. without any PV drivers), then of course you should provide such hypercall wrappers for code to use instead of open coding it in potentially many places. > In this case, the (HVM) guest does need to concern itself with what > registers it should set up for that purpose. I suppose a workaround > could be to write the subop in both ebx and rdi, though without any > testing I don't know at this point what, if anything, might be broken > that way. Guest code ought to know what mode it runs in. And introspection code (in case this is about injection of such code) ought to also know which mode the monitored guest is in. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
On Mon, 2015-03-16 at 16:30 -0400, Meng Xu wrote:
> Hi Dario,
>
Hey,

> 2015-03-16 13:05 GMT-04:00 Dario Faggioli :
> >
> > This change also takes the chance to add a scratch
> > cpumask, to avoid having to create one more
> > cpumask_var_t on the stack of the dumping routine.
>
> Actually, I have a question about the strength of this design. When we
> have a machine with many cpus, we will end up with allocating a
> cpumask for each cpu.
>
Just FTR, what we will end up allocating is:
 - an array of *pointers* to cpumasks with as many elements as the number of pCPUs,
 - a cpumask *only* for the pCPUs subjected to an instance of the RTDS scheduler.

So, for instance, if you have 64 pCPUs, but are using the RTDS scheduler only in a cpupool with 2 pCPUs, you'll have an array of 64 pointers to cpumask_t, but only 2 actual cpumasks.

> Is this better than having a cpumask_var_t on
> the stack of the dumping routine, since the dumping routine is not in
> the hot path?
>
George and Jan replied to this already, I think. Allow me to add just a few words:

> > Such a scratch area can be used to kill most of the
> > cpumask_var_t local variables in other functions
> > in the file, but that is *NOT* done in this change.
>
This is the point, actually! As said here, this is not only for the sake of the dumping routine. In fact, ideally, someone will, in the near future, go throughout the whole file and kill most of the cpumask_t local variables, and most of the cpumask dynamic allocations, in favour of using this scratch area.

> > @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
> >     if ( prv == NULL )
> >         return -ENOMEM;
> >
> > +   _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
>
> Is it better to use xzalloc_array?
>
Why? IMO, not really. I'm only free()-ing (in rt_free_pdata()) the elements of the array that have been previously successfully allocated (in rt_alloc_pdata()), so I don't think there is any special requirement for all the elements to be NULL right away.
> > +   if ( _cpumask_scratch == NULL )
> > +       return -ENOMEM;
> > +
> >     spin_lock_init(&prv->lock);
> >     INIT_LIST_HEAD(&prv->sdom);
> >     INIT_LIST_HEAD(&prv->runq);
> > @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
> > {
> >     struct rt_private *prv = rt_priv(ops);
> >
> > +   xfree(_cpumask_scratch);
> >     xfree(prv);
> > }
> >
> > @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
> >     per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> >     spin_unlock_irqrestore(&prv->lock, flags);
> >
> > +   if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>
> Is it better to use zalloc_cpumask_var() here?
>
Nope. It's a scratch area, after all, so one really should not assume it to be in a specific state (e.g., no bits set as you're suggesting) when using it.

Thanks and Regards,
Dario ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
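Dario's allocation scheme — one pointer array sized for all possible pCPUs, with actual masks allocated only when a pCPU joins an RTDS pool — can be sketched outside Xen like this (plain calloc/free stand in for xmalloc_array, alloc_cpumask_var and friends; all names are illustrative). The pointer array is zeroed here only so untouched slots read as NULL in this standalone sketch; as Dario notes, the Xen patch gets away with a plain xmalloc_array because it only ever frees slots it previously allocated:

```c
#include <stdlib.h>

typedef struct { unsigned long bits[4]; } cpumask_t;  /* stand-in for Xen's cpumask_t */

static cpumask_t **scratch_mask;        /* one pointer per possible pCPU */
static unsigned int nr_cpu_ids = 64;

/* rt_init()-like step: allocate only the pointer array, no masks yet. */
static int scratch_init(void)
{
    scratch_mask = calloc(nr_cpu_ids, sizeof(*scratch_mask));
    return scratch_mask ? 0 : -1;
}

/* rt_alloc_pdata()-like step: allocate a mask only when this pCPU joins. */
static int scratch_alloc_pdata(unsigned int cpu)
{
    scratch_mask[cpu] = calloc(1, sizeof(cpumask_t));
    return scratch_mask[cpu] ? 0 : -1;
}

/* rt_free_pdata()-like step: free only what was actually allocated. */
static void scratch_free_pdata(unsigned int cpu)
{
    free(scratch_mask[cpu]);
    scratch_mask[cpu] = NULL;
}
```

With 64 possible pCPUs and a 2-pCPU pool, this costs 64 pointers but only 2 masks — the point Dario makes above.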
Re: [Xen-devel] [PATCH v6 07/30] PCI: Pass PCI domain number combined with root bus number
On Tue, 2015-03-17 at 10:45 +0530, Manish Jaggi wrote:
> On Monday 09 March 2015 08:04 AM, Yijing Wang wrote:
> > Now we can pass the PCI domain combined with the bus number
> > in a u32 argument. Because in arm/arm64 the PCI domain number
> > is assigned by pci_bus_assign_domain_nr(), we leave
> > pci_scan_root_bus() and pci_create_root_bus() in arm/arm64
> > unchanged. A new function pci_host_assign_domain_nr()
> > will be introduced for arm/arm64 to assign the domain number
> > in a later patch.
> Hi,
> I think these changes might not be required. We have made very few
> changes in xen-pcifront to support PCI passthrough on arm64.
> As per the Xen architecture, for a domU only a single PCI virtual bus is
> created and all passthrough devices are attached to it.

I guess you are only talking about the changes to xen-pcifront.c? Otherwise you are ignoring the dom0 case, which is exposed to the real set of PCI root complexes, and anyway I'm not sure how "not needed for Xen domU" translates into "not required", since it is clearly required for other systems.

Strictly speaking, the Xen pciif protocol does support multiple buses; it's just that the tools, and perhaps kernels, have not yet felt any need to actually make use of that.

There doesn't seem to be any harm in updating pcifront to follow this generic API change.

Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
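The exact u32 encoding isn't shown in this excerpt. As an assumption for illustration only, a natural layout keeps the 16-bit PCI domain (segment) above the 8-bit bus number — the macro names below are invented, not necessarily those used in the v6 series:

```c
#include <stdint.h>

/* Hypothetical packing of a PCI domain (segment, up to 16 bits) and a
 * bus number (8 bits) into a single u32 argument.  The real series may
 * choose different names or a different shift. */
#define PCI_DOMBUS(domain, bus)  ((uint32_t)(((uint32_t)(domain) << 8) | ((bus) & 0xffu)))
#define PCI_DOMAIN(dombus)       ((uint16_t)((dombus) >> 8))   /* recover the domain */
#define PCI_BUSNUM(dombus)       ((uint8_t)((dombus) & 0xffu)) /* recover the bus */
```

Whatever the concrete layout, the property that matters for the API change is that pack/unpack round-trip losslessly, so callers like pci_scan_root_bus() can carry both values through one argument.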
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
On 03/17/2015 03:58 PM, Jan Beulich wrote: On 17.03.15 at 14:50, wrote: >> On 07/11/2014 08:23 PM, Andrew Cooper wrote: >>> From the point of view of your in-guest agent, it would be a vmcall with >>> rax = 34 (hvmop) rdi = $N (send_mem_event subop) rsi = data or pointer >>> to struct containing data, depending on how exactly you implement the >>> hypercall. >>> >>> You would have the bonus of being able to detect errors, e.g. -ENOENT >>> for "mem_event not active", get SVM support for free, and not need magic >>> numbers, or vendor specific terms like "vmcall" finding their way into >>> the Xen public API. >> >> Actually, this only seems to be the case where mode == 8 in >> hvm_do_hypercall() (xen/arch/x86/hvm/hvm.c): >> >> 4987 : hvm_hypercall64_table)[eax](rdi, rsi, rdx, >> r10, r8, r9); >> >> Otherwise (and this seems to be the case with my Xen build), ebx seems >> to be used for the subop: >> >> 5033 regs->_eax = hvm_hypercall32_table[eax](ebx, ecx, edx, esi, >> edi, ebp); >> >> So, ebx needs to be $N (send_mem_event subop), not rdi. Is this intended >> (rdi in one case and ebx in the other)? > > Of course - the ABIs (and hence the use of registers for certain > specific purposes) of ix86 and x86-64 are different. Since there > are hypercall wrappers in both the kernel and the tool stack, you > shouldn't actually need to care about this on the caller side. And > the handler side doesn't deal with specific registers anyway > (outside of hvm_do_hypercall() that is). Yes, but Andrew's idea (which I think is very neat) is that instead of the trickery I used to do in the original patch (create a specific VMCALL vm_event and compare eax to a magic constant on VMCALL-based VMEXITS, to figure out if all I wanted to do was send out the event), that I should instead have the guest set up rax, rdi and rsi and execute vmcall, which would then be translated to a real hypercall that sends out a vm_event. 
In this case, the (HVM) guest does need to concern itself with what registers it should set up for that purpose. I suppose a workaround could be to write the subop in both ebx and rdi, though without any testing I don't know at this point what, if anything, might be broken that way. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
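The mode-dependent register convention being discussed can be written down explicitly. A small host-runnable sketch encoding which register carries the n-th hypercall argument in each guest mode, mirroring the dispatch quoted above from hvm_do_hypercall() (the function name is invented):

```c
#include <stddef.h>
#include <string.h>

/* Argument-register order for Xen hypercalls from an x86 guest:
 * 64-bit guests pass arguments in rdi/rsi/rdx/r10/r8/r9, while
 * 32-bit guests pass them in ebx/ecx/edx/esi/edi/ebp.  So the
 * "send event" subop goes in rdi in long mode but in ebx from a
 * 32-bit guest -- exactly the discrepancy raised in this thread. */
static const char *hcall_arg_reg(int long_mode, unsigned int n)
{
    static const char *const regs64[6] = { "rdi", "rsi", "rdx", "r10", "r8", "r9" };
    static const char *const regs32[6] = { "ebx", "ecx", "edx", "esi", "edi", "ebp" };

    if (n >= 6)
        return NULL;          /* hypercalls take at most six arguments */
    return long_mode ? regs64[n] : regs32[n];
}
```

A guest-side wrapper would consult its own mode (as Jan says, "guest code ought to know what mode it runs in"), load eax/rax with the hypercall number and the arguments into the registers above, and only then execute vmcall (or vmmcall on SVM).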
Re: [Xen-devel] [PATCH RFC V2 3/5] libxl: add pvusb API
Hi Chunyan,

I've found another problem while trying to write a qemu based pvUSB backend.

On 01/19/2015 09:28 AM, Chunyan Liu wrote:
Add pvusb APIs, including:
 - attach/detach (create/destroy) virtual usb controller.
 - attach/detach usb device
 - list assignable usb devices in host
 - some other helper functions

Signed-off-by: Chunyan Liu
Signed-off-by: Simon Cao
---
...
diff --git a/tools/libxl/libxl_usb.c b/tools/libxl/libxl_usb.c
new file mode 100644
index 000..830a846
--- /dev/null
+++ b/tools/libxl/libxl_usb.c
...
+/* xenstore usb data */
+static int libxl__device_usb_add_xenstore(libxl__gc *gc, uint32_t domid,
+                                          libxl_device_usb *usb)
+{
+    libxl_ctx *ctx = CTX;
+    char *be_path;
+    int rc;
+    libxl_domain_config d_config;
+    libxl_device_usb usb_saved;
+    libxl__domain_userdata_lock *lock = NULL;
+
+    libxl_domain_config_init(&d_config);
+    libxl_device_usb_init(&usb_saved);
+    libxl_device_usb_copy(CTX, &usb_saved, usb);
+
+    be_path = libxl__sprintf(gc, "%s/backend/vusb/%d/%d",
+                             libxl__xs_get_dompath(gc, 0), domid, usb->ctrl);
+    if (libxl__wait_for_backend(gc, be_path, "4") < 0) {

Don't do this! That's the reason I had to change my backend driver in order to support assignment of a usb device via config file. Normally the backend will switch to state 4 only after the frontend is started. You can just remove waiting for the backend here. The backend has to check all ports when it is changing its state to 4 ("connected").
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    lock = libxl__lock_domain_userdata(gc, domid);
+    if (!lock) {
+        rc = ERROR_LOCK_FAIL;
+        goto out;
+    }
+
+    rc = libxl__get_domain_configuration(gc, domid, &d_config);
+    if (rc) goto out;
+
+    DEVICE_ADD(usb, usbs, domid, &usb_saved, COMPARE_USB, &d_config);
+
+    rc = libxl__set_domain_configuration(gc, domid, &d_config);
+    if (rc) goto out;
+
+    be_path = libxl__sprintf(gc, "%s/port/%d", be_path, usb->port);
+    LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Adding new usb device to xenstore");
+    if (libxl__xs_write_checked(gc, XBT_NULL, be_path, usb->intf)) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (lock) libxl__unlock_domain_userdata(lock);
+    libxl_device_usb_dispose(&usb_saved);
+    libxl_domain_config_dispose(&d_config);
+    return rc;
+
+}
+
+static int libxl__device_usb_remove_xenstore(libxl__gc *gc, uint32_t domid,
+                                             libxl_device_usb *usb)
+{
+    libxl_ctx *ctx = CTX;
+    char *be_path;
+
+    be_path = libxl__sprintf(gc, "%s/backend/vusb/%d/%d",
+                             libxl__xs_get_dompath(gc, 0), domid, usb->ctrl);
+    if (libxl__wait_for_backend(gc, be_path, "4") < 0)

Remove this one, too.

Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
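For reference, the xenstore layout the patch manipulates: the controller's backend directory is dom0path/backend/vusb/domid/ctrl, and each port is a key under .../port/n holding the assigned interface. A tiny helper reproducing the path construction (snprintf stands in for libxl__sprintf; "/local/domain/0" is the usual dom0 path but is still an assumption here):

```c
#include <stdio.h>
#include <string.h>

/* Build the xenstore path of a vusb port, mirroring the two-step
 * construction in libxl__device_usb_add_xenstore() above. */
static int vusb_port_path(char *buf, size_t len, const char *dom0path,
                          unsigned int domid, unsigned int ctrl,
                          unsigned int port)
{
    return snprintf(buf, len, "%s/backend/vusb/%u/%u/port/%u",
                    dom0path, domid, ctrl, port);
}
```

Juergen's point is then that the toolstack should write the port key unconditionally and let the backend re-scan all port keys whenever it moves itself to state 4, rather than the toolstack blocking until the backend is already connected.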
Re: [Xen-devel] [PATCH v7] sndif: add ABI for Para-virtual sound
On Tue, 2015-03-17 at 13:05 +, Lars Kurth wrote:
> > On 17 Mar 2015, at 11:40, Ian Campbell wrote:
> >
> > On Thu, 2015-03-12 at 18:14 +, Lars Kurth wrote:
> >> Hi, I nearly missed this. Please make sure you forward stuff and change
> >> the headline if you want me to look into things. Otherwise I may miss it.
> >
> > Sure, I'll try and remember.
> >
> > FYI, before Ian J went away he mentioned that he had raised some
> > questions/issues (either on this or a previous version) which had not
> > yet been answered (or maybe not answered to his satisfaction, I'm not
> > sure) but that if those were addressed he would take a look with a view
> > to acking the interface for inclusion in xen.git.
>
> OK. So this means there are some concrete loose ends, which need to be
> followed up on. I also remember that there was a discussion on how we should
> specify protocols, which does not appear to have fully concluded either.
>
> >> Would this work as a way forward?
> >
> > I think the main thing which is missing is some decision as to the
> > point at which we would consider the ABI for a PV protocol fixed, i.e.
> > to be maintained in a backwards compatible manner from then on.
>
> What do we do with new APIs in such situations?

We review them carefully and hope we get them right. We manage to get this right at least some of the time because many of us are familiar with the issues WRT e.g. memory management hypercalls. This is what I was getting at with "people are naturally a bit cautious about creating new ABIs, which must be maintained long term, for types of device with which they are not really familiar." in my initial mail. The "which they are not really familiar" part is pretty key.

It's also (normally) not too hard to add a new hypercall fixing a shortcoming in an existing one while retaining backwards compat, compared with doing that for an I/O protocol (see: netchannel2).
In the I/O case adding extensions also is reasonably well understood and something we manage, but fixing a core issue is much harder (see: the non-uniformity of the blk protocol over different architectures, or the ring space wastage due to various power of two requirements, neither of which can realistically be properly fixed). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
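The "ring space wastage due to power of two requirements" Ian mentions comes from the standard shared-ring macros rounding the slot count down to a power of two. A sketch of the arithmetic (the 64-byte header and 56-byte request size are illustrative numbers, not taken from any real protocol):

```c
/* Largest power of two <= x, as the ring-size machinery effectively does. */
static unsigned int rd_pow2(unsigned int x)
{
    unsigned int p = 1;
    while (p * 2 <= x && p * 2 != 0)  /* stop before overflowing p */
        p *= 2;
    return p;
}

/* Usable request slots in one shared page: raw capacity rounded down. */
static unsigned int ring_slots(unsigned int page, unsigned int hdr,
                               unsigned int entry)
{
    return rd_pow2((page - hdr) / entry);
}
```

With a 4096-byte page, a 64-byte header and 56-byte entries, the page could hold 72 entries, but rounding down to a power of two leaves 64 slots, i.e. (72 - 64) * 56 = 448 bytes of the page that can never carry requests — the kind of baked-in cost that is hard to fix once a protocol's ABI is frozen.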
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
>>> On 17.03.15 at 14:50, wrote: > On 07/11/2014 08:23 PM, Andrew Cooper wrote: >> From the point of view of your in-guest agent, it would be a vmcall with >> rax = 34 (hvmop) rdi = $N (send_mem_event subop) rsi = data or pointer >> to struct containing data, depending on how exactly you implement the >> hypercall. >> >> You would have the bonus of being able to detect errors, e.g. -ENOENT >> for "mem_event not active", get SVM support for free, and not need magic >> numbers, or vendor specific terms like "vmcall" finding their way into >> the Xen public API. > > Actually, this only seems to be the case where mode == 8 in > hvm_do_hypercall() (xen/arch/x86/hvm/hvm.c): > > 4987 : hvm_hypercall64_table)[eax](rdi, rsi, rdx, > r10, r8, r9); > > Otherwise (and this seems to be the case with my Xen build), ebx seems > to be used for the subop: > > 5033 regs->_eax = hvm_hypercall32_table[eax](ebx, ecx, edx, esi, > edi, ebp); > > So, ebx needs to be $N (send_mem_event subop), not rdi. Is this intended > (rdi in one case and ebx in the other)? Of course - the ABIs (and hence the use of registers for certain specific purposes) of ix86 and x86-64 are different. Since there are hypercall wrappers in both the kernel and the tool stack, you shouldn't actually need to care about this on the caller side. And the handler side doesn't deal with specific registers anyway (outside of hvm_do_hypercall() that is). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7] sndif: add ABI for Para-virtual sound
>>> On 17.03.15 at 14:05, wrote:
>> On 17 Mar 2015, at 11:40, Ian Campbell wrote:
>> I think the main thing which is missing is some decision as to the
>> point at which we would consider the ABI for a PV protocol fixed, i.e.
>> to be maintained in a backwards compatible manner from then on.
>
> What do we do with new APIs in such situations? It would appear that there
> is some commonality in how we would handle a protocol and an API. I am
> assuming APIs such as new hypercalls don't immediately become fixed and
> backwards compatible.

New hypercalls become set in stone as soon as they appear in any released version, unless specifically marked as experimental or alike. The situation is quite different for a protocol specification like this: here we talk about something where no code would live in xen.git at all, only the abstract description. Hence its stability can't usefully be tied to any released Xen version.

Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] xen/arm: Add GSER region to ThunderX platform mapping
On Tue, 2015-03-17 at 18:32 +0530, Vijay Kilari wrote:
> Hi Ian,
>
> On Thu, Mar 5, 2015 at 10:40 PM, Ian Campbell wrote:
> > On Thu, 2015-03-05 at 16:46 +, Ian Campbell wrote:
> >> On Wed, 2015-03-04 at 11:36 +0530, vijay.kil...@gmail.com wrote:
> >> > From: Vijaya Kumar K
> >> >
> >> > Add GSER region to thunderx platform specific mappings.
> >> > This region is not mentioned in DT. This is required by
> >> > the PCI driver to detect and configure PCI devices attached.
> >> >
> >> > In future we can remove this mapping, if the PCI driver
> >> > in Dom does not require this.
> >>
> >> How do we know what the PCI driver in dom0 needs? I don't think we can,
> >> so we can in effect never remove this specific mapping, which is a
> >> shame.
> >>
> >> Unless you have some scheme in mind which would allow us to do so?
> >>
> >> IMHO by far the best solution would be to add this device to the DTB so
> >> that it is correctly mapped. I'm not quite sure what that will look like
> >> since the mainline DTB doesn't have the PCI node at all.
> >
> > Looking at a more recent DTB which I have access to it seems like
> > 0x87e09000 is correctly covered by a ranges entry on the PCI
> > controller node.
>
> Where did you find the recent DTB? AFAIK, this region does not fall
> under any PCI controller range.

It was in the tree you guys sent me a little while back,
ThunderX_Release_v0.3.tar.gz IIRC. thunder-88xx-2n.dtsi in that contains
a PCI node "pcie0: pcie0@0x8480," with ranges containing this entry:

    <0x0300 0x87e0 0x 0x87e0 0x 0x01 0x>,

which covers the range from 0x87e0 to 0xe7f, i.e. covering this region
at 0x87e09000.

> > So I think all which is needed is a) to use this updated DTB and b) my
> > series "xen: arm: Parse PCI DT nodes' ranges and interrupt-map" from
> > last October which, as it happens, I've been working on bringing up to
> > date yesterday and today (one more thing to clean up before I repost).
>
> Because it is not covered under any PCI ranges, your patch series
> still does not help.
> In fact, this is a common region for SERDES configuration so it cannot
> bind to any particular PCI controller range.

Even if that turns out to be the case, then surely this region needs to
be defined somehow in the DT, else how could it be discovered?

Ian.
Re: [Xen-devel] [PATCH] libxc/xentrace: Replace xc_tbuf_set_cpu_mask with CPU mask with xc_cpumap_t instead of uint32_t
On 03/13/2015 08:37 PM, Konrad Rzeszutek Wilk wrote:
> +static int parse_cpumask(const char *arg)
> +{
> +    xc_cpumap_t map;
> +    uint32_t v, i;
> +    int bits = 0;
> +
> +    map = malloc(sizeof(uint32_t));
> +    if ( !map )
> +        return -ENOMEM;
> +
> +    v = argtol(arg, 0);
> +    for ( i = 0; i < sizeof(uint32_t); i++ )
> +        map[i] = (v >> (i * 8)) & 0xff;
> +
> +    for ( i = 0; v; v >>= 1 )
> +        bits += v & 1;

Um, it looks like this is counting the 1-bits in v, not the total number
of bits. So "0x8000" would finish with bits == 1, but we would want this
to finish with bits == 16, wouldn't we? Or am I confused?

 -George
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
On Mon, 2015-03-16 at 19:05 +, George Dunlap wrote:
> On 03/16/2015 05:05 PM, Dario Faggioli wrote:
> > @@ -218,7 +224,6 @@ __q_elem(struct list_head *elem)
> >  static void
> >  rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
> >  {
> > -    char cpustr[1024];
> >      cpumask_t *cpupool_mask;
> >
> >      ASSERT(svc != NULL);
> > @@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
> >          return;
> >      }
> >
> > -    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> > +    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> > +    /*
> > +     * We can't just use 'cpumask_scratch' because the dumping can
> > +     * happen from a pCPU outside of this scheduler's cpupool, and
> > +     * hence it's not right to use the pCPU's scratch mask (which
> > +     * may even not exist!). On the other hand, it is safe to use
> > +     * svc->vcpu->processor's own scratch space, since we own the
> > +     * runqueue lock.
>
> Since we *hold* the lock.

Right, thanks.

> > +     */
> > +    cpumask_and(_cpumask_scratch[svc->vcpu->processor], cpupool_mask,
> > +                svc->vcpu->cpu_hard_affinity);
> > +    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
> > +                      _cpumask_scratch[svc->vcpu->processor]);
>
> Just a suggestion, would it be worth making a local variable to avoid
> typing this long thing twice?

It probably would.

> Then you could also put the comment about using the
> svc->vcpu->processor's scratch space above the place where you set the
> local variable, while avoiding breaking up the logic of the cpumask
> operations.

I like this, will do.

Regards,
Dario
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
On 07/11/2014 08:23 PM, Andrew Cooper wrote:
> On 11/07/14 16:43, Razvan Cojocaru wrote:
>> Added support for VMCALL events (the memory introspection library
>> will have the guest trigger VMCALLs, which will then be sent along
>> via the mem_event mechanism).
>>
>> Changes since V1:
>>  - Added a #define and a comment explaining a previous magic
>>    constant.
>>  - Had MEM_EVENT_REASON_VMCALL explicitly not honour
>>    HVMPME_onchangeonly.
>>
>> Signed-off-by: Razvan Cojocaru
>> ---
>>  xen/arch/x86/hvm/hvm.c          |    9 +
>>  xen/arch/x86/hvm/vmx/vmx.c      |   18 +-
>>  xen/include/asm-x86/hvm/hvm.h   |    1 +
>>  xen/include/public/hvm/params.h |    4 +++-
>>  xen/include/public/mem_event.h  |    5 +
>>  5 files changed, 35 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 89a0382..6e86d7c 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -5564,6 +5564,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      case HVM_PARAM_MEMORY_EVENT_INT3:
>>      case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP:
>>      case HVM_PARAM_MEMORY_EVENT_MSR:
>> +    case HVM_PARAM_MEMORY_EVENT_VMCALL:
>>          if ( d == current->domain )
>>          {
>>              rc = -EPERM;
>> @@ -6199,6 +6200,14 @@ void hvm_memory_event_msr(unsigned long msr, unsigned long value)
>>                             value, ~value, 1, msr);
>>  }
>>
>> +void hvm_memory_event_vmcall(unsigned long rip, unsigned long eax)
>> +{
>> +    hvm_memory_event_traps(current->domain->arch.hvm_domain
>> +                             .params[HVM_PARAM_MEMORY_EVENT_VMCALL],
>> +                           MEM_EVENT_REASON_VMCALL,
>> +                           rip, ~rip, 1, eax);
>> +}
>> +
>>  int hvm_memory_event_int3(unsigned long gla)
>>  {
>>      uint32_t pfec = PFEC_page_present;
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 2caa04a..6c63225 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2879,8 +2879,24 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>      case EXIT_REASON_VMCALL:
>>      {
>>          int rc;
>> +        unsigned long eax = regs->eax;
>> +
>>          HVMTRACE_1D(VMMCALL, regs->eax);
>> -        rc = hvm_do_hypercall(regs);
>> +
>> +        /* Don't send a VMCALL mem_event unless something
>> +         * caused the guest's eax register to contain the
>> +         * VMCALL_EVENT_REQUEST constant. */
>> +        if ( regs->eax != VMCALL_EVENT_REQUEST )
>> +        {
>> +            rc = hvm_do_hypercall(regs);
>> +        }
>> +        else
>> +        {
>> +            hvm_memory_event_vmcall(guest_cpu_user_regs()->eip, eax);
>> +            update_guest_eip();
>> +            break;
>> +        }
>
> Thinking more about this, it is really a hypercall pretending not to
> be. It would be better to introduce a real HVMOP_send_mem_event.
>
> From the point of view of your in-guest agent, it would be a vmcall with
> rax = 34 (hvmop), rdi = $N (send_mem_event subop), rsi = data or pointer
> to struct containing data, depending on how exactly you implement the
> hypercall.
>
> You would have the bonus of being able to detect errors, e.g. -ENOENT
> for "mem_event not active", get SVM support for free, and not need magic
> numbers, or vendor specific terms like "vmcall" finding their way into
> the Xen public API.

Actually, this only seems to be the case where mode == 8 in
hvm_do_hypercall() (xen/arch/x86/hvm/hvm.c):

4987 : hvm_hypercall64_table)[eax](rdi, rsi, rdx, r10, r8, r9);

Otherwise (and this seems to be the case with my Xen build), ebx seems
to be used for the subop:

5033 regs->_eax = hvm_hypercall32_table[eax](ebx, ecx, edx, esi, edi, ebp);

So, ebx needs to be $N (send_mem_event subop), not rdi. Is this intended
(rdi in one case and ebx in the other)?

Thanks,
Razvan
Re: [Xen-devel] [PATCH v3 21/24] tools/(lib)xl: Add partial device tree support for ARM
Hi Ian,

Sorry for the late answer.

On 23/02/15 17:22, Ian Campbell wrote:
> On Mon, 2015-02-23 at 17:06 +, Julien Grall wrote:
>> On 23/02/15 11:46, Ian Campbell wrote:
>>> On Tue, 2015-01-13 at 14:25 +, Julien Grall wrote:
>>>> Let the user pass additional nodes to the guest device tree. For this
>>>> purpose, everything in the node /passthrough from the partial device
>>>> tree will be copied into the guest device tree. The node /aliases will
>>>> also be copied to allow the user to define aliases which can be used
>>>> by the guest kernel.
>>>>
>>>> A simple partial device tree will look like:
>>>>
>>>> /dts-v1/;
>>>>
>>>> / {
>>>>     #address-cells = <2>;
>>>>     #size-cells = <2>;
>>>
>>> Are these mandatory/required as implied below, or only the ones inside
>>> the passthrough node (which is what I would expect)?
>>
>> It's to make DTC quiet.
>
> Maybe add /* Keep DTC happy */ to both lines?
>
>>>>     passthrough {
>>>>         compatible = "simple-bus";
>>>>         ranges;
>>>>         #address-cells = <2>;
>>>>         #size-cells = <2>;
>>>>         /* List of your nodes */
>>>>     }
>>>> };
>>>>
>>>> Note that:
>>>>     * The interrupt-parent proporties will be added by the toolstack in
>>>
>>> "properties"
>>>
>>>>       the root node
>>>>     * The properties compatible, ranges, #address-cells and
>>>>       #size-cells in /passthrough are mandatory.
>>>
>>> Does ranges need to be the empty form? I think ranges = would be
>>> illegal?
>>
>> It's not illegal as long as you correctly use it in the inner "reg".
>
> OK. This could be explained in some more complete documentation I think.
> (It's a doc day on Wednesday ;-))
>
>> Also, I admit that the "ranges" is confusing to read.
>>
>>>> Signed-off-by: Julien Grall
>>>> Cc: Ian Jackson
>>>> Cc: Wei Liu
>>>> ---
>>>>     Changes in v3:
>>>>         - Patch added
>>>> ---
>>>>  docs/man/xl.cfg.pod.5       |   7 ++
>>>>  tools/libxl/libxl_arm.c     | 253
>>>>  tools/libxl/libxl_types.idl |   1 +
>>>>  tools/libxl/xl_cmdimpl.c    |   1 +
>>>>  4 files changed, 262 insertions(+)
>>>>
>>>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>>>> index e2f91fc..225b782 100644
>>>> --- a/docs/man/xl.cfg.pod.5
>>>> +++ b/docs/man/xl.cfg.pod.5
>>>> @@ -398,6 +398,13 @@ not emulated.
>>>>  Specify that this domain is a driver domain. This enables certain
>>>>  features needed in order to run a driver domain.
>>>>
>>>> +=item B
>>>> +
>>>> +Specify a partial device tree (compiled via the Device Tree Compiler).
>>>> +Everything under the node "/passthrough" will be copied into the guest
>>>> +device tree. For convenience, the node "/aliases" is also copied to allow
>>>> +the user to defined aliases which can be used by the guest kernel.
>>>> +
>>>>  =back
>>>>
>>>>  =head2 Devices
>>>>
>>>> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
>>>> index 53177eb..619458b 100644
>>>> --- a/tools/libxl/libxl_arm.c
>>>> +++ b/tools/libxl/libxl_arm.c
>>>> @@ -540,6 +540,238 @@ out:
>>>>      }
>>>>  }
>>>>
>>>> +static bool check_overrun(uint64_t a, uint64_t b, uint32_t max)
>>>> +{
>>>> +    return ((a + b) > UINT_MAX || (a + b) > max);
>>>
>>> Both halves here will fail if e.g. a == UINT64_MAX-1 and b == 2, so
>>> e.g. a+b <= UINT_MAX and < max.
>>
>> Oops, right.
>>
>>> To avoid this you should check that a and b are both less than some
>>> fraction of UINT64_MAX before the other checks, which would ensure the
>>> overflow can't happen, perhaps even UINT32_MAX would be acceptable for
>>> this use, depending on the input types involved.
>>
>> max is an uint32_t so a and b should be inferior to UINT32_MAX.
>
> by "inferior to" do you mean less than? Or something to do with type
> promotion/demotion rules?

I meant less than.

>> What about
>>
>> a < UINT_MAX && b < UINT_MAX && (a + b) < UINT_MAX
>
> Isn't that inverted from the sense which the function name requires?
>
> Given the complexity in reasoning about this I think a series of
> individual if and return statements which check each precondition one
> at a time and return failure if necessary would be clearer to read and
> reason about than trying to encode it all in one expression.

Given that we will mark the option unsafe, I'm thinking to drop this
check and some others. This would make the code less complex and avoid
checking half of the FDT.

>>>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>>>> index 1214d2e..5651110 100644
>>>> --- a/tools/libxl/libxl_types.idl
>>>> +++ b/tools/libxl/libxl_types.idl
>>>> @@ -399,6 +399,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>>>>      ("kernel", string),
>>>>      ("cmdline", string),
>>>>      ("ramdisk", string),
>>>> +    ("device_tree", string),
>>>
>>> Needs a #define LIBXL_HAVE... in libxl.h
>>
>> Hmmm why? This
Re: [Xen-devel] [PATCH] flask/policy: fix static device labeling examples
>>> On 17.03.15 at 14:03, wrote:
> (CC Ian and Jan)

This is mostly about tools stuff:

>>  docs/misc/xsm-flask.txt                      | 31 +++
>>  tools/flask/policy/Makefile                  |  3 ++-
>>  tools/flask/policy/policy/device_contexts    | 32 +++
>>  tools/flask/policy/policy/modules/xen/xen.te | 38 +++-
>>  4 files changed, 41 insertions(+), 63 deletions(-)

Hence I don't see why you ping me about it.

Jan
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Tue, Mar 17, 2015 at 10:56:48AM +0530, Manish Jaggi wrote:
>
> On Friday 27 February 2015 10:20 PM, Ian Campbell wrote:
> > On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:
> > > On 27.02.15 at 16:24, wrote:
> > > > On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
> > > > > MMCFG is a Linux config option, not to be confused with
> > > > > PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.
> > > > > I don't think that the way Linux (or FreeBSD) call
> > > > > PHYSDEVOP_pci_mmcfg_reserved is relevant.
> > > > My (possibly flawed) understanding was that pci_mmcfg_reserved was
> > > > intended to propagate the result of dom0 parsing some firmware
> > > > table or other to the hypervisor.
> > > That's not flawed at all.
> > I think that's a first in this thread ;-)
> >
> > > > In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> > > > arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by
> > > > walking over a "struct acpi_table_mcfg" (there also appears to be
> > > > a bunch of processor family derived entries, which I guess are
> > > > "quirks" of some sort).
> > > Right - this parses ACPI tables (plus applies some knowledge about
> > > certain specific systems/chipsets/CPUs) and verifies that the space
> > > needed for the MMCFG region is properly reserved either in E820 or
> > > in the ACPI specified resources (only if so Linux decides to use
> > > MMCFG and consequently also tells Xen that it may use it).
> > Thanks.
> >
> > So I think what I wrote in <1424948710.14641.25.ca...@citrix.com>
> > applies as is to Device Tree based ARM devices, including the need for
> > the PHYSDEVOP_pci_host_bridge_add call.
> >
> > On ACPI based devices we will have the MCFG table, and things follow
> > much as for x86:
> >
> >   * Xen should parse MCFG to discover the PCI host-bridges
> >   * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
> >     the same way as Xen/x86 does.
> >
> > The SBSA, an ARM standard for "servers", mandates various things which
> > we can rely on here because ACPI on ARM requires an SBSA compliant
> > system. So things like odd quirks in PCI controllers or magic setup
> > are spec'd out of our zone of caring (into the firmware I suppose),
> > hence there is nothing like the DT_DEVICE_START stuff to register
> > specific drivers etc.
> >
> > The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI
> > ARM systems (any more than it is on x86). We can decide whether to
> > omit it from dom0 or ignore it from Xen later on.
> >
> > (Manish, this is FYI, I don't expect you to implement ACPI support!)
>
> In drivers/xen/pci.c, on notification BUS_NOTIFY_ADD_DEVICE dom0 issues
> a hypercall to inform Xen that a new PCI device has been added. If we
> were to inform Xen about a new PCI bus that is added, there are 2 ways:
> a) Issue the hypercall from drivers/pci/probe.c
> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
>    PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
>    segment number (s_bdf), it will return an error,
>    SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
>    PHYSDEVOP_pci_host_bridge_add hypercall.

Couldn't the code figure out from 'struct pci_dev' whether the device is
a bridge or a PCI device? And then do the proper hypercall?

Interesting thing you _might_ hit (that I did) was that if you use
'bus=reassign', which re-assigns the bus numbers during scan, Xen gets
very very confused. As in, the bus devices that Xen sees vs the ones
Linux sees are different. Whether you will encounter this depends on
whether the bridge devices and PCI devices end up having a different
bus number from what Xen scanned, and from what Linux has determined.
(As in, Linux has found a bridge device with more PCI devices, so it
reprograms the bridge, which moves all of the other PCI devices "below"
it by X number).

The reason I am bringing it up - it sounds like Xen will have no clue
about some devices - and be told about it by Linux - if for some reason
it has the same bus number as some that Xen already scanned - gah!

> I think (b) can be done with minimal code changes. What do you think?

Less code == better.

> > Ian.
Re: [Xen-devel] [PATCH v7] sndif: add ABI for Para-virtual sound
On Tue, 17 Mar 2015, Ian Campbell wrote:
> On Thu, 2015-03-12 at 18:14 +, Lars Kurth wrote:
> > Hi, I nearly missed this. Please make sure you forward stuff and
> > change the headline if you want me to look into things. Otherwise I
> > may miss it.
>
> Sure, I'll try and remember.
>
> FYI before Ian J went away he mentioned that he had raised some
> questions/issues (either on this or a previous version) which had not
> yet been answered (or maybe not answered to his satisfaction, I'm not
> sure) but that if those were addressed he would take a look with a view
> to acking the interface for inclusion in xen.git.
>
> (I've not looked in the threads for it, so I don't know the exact
> state).
>
> > From my perspective, this is exactly the kind of scenario why we
> > created the embedded / automotive subproject, with an option to store
> > code in repos owned by the project.
> >
> > Given that the primary use-case of these drivers is embedded /
> > automotive, my suggestion would be to:
> > 1.a) Use a repo in the embedded / automotive pv driver subproject to
> >      host the spec - but use a file system structure that matches the
> >      xen tree
> > 1.b) I would assume there would be one back-end and several front-ends
> >      for these drivers and some would eventually appear in trees owned
> >      by the embedded / automotive pv driver subproject
> >
> > In this case, the maintainer responsibility would fall to members of
> > the embedded / automotive pv driver subproject. Once there are several
> > implementations, and enough people with skills to review, we can
> > re-visit where the spec and drivers live.
> >
> > We can have a discussion about criteria of when to move, but I don't
> > think that makes a lot of sense. I think the concerns that need to be
> > addressed are:
> > 2.a) Enough skills to review the code / protocols from different
> >      stake-holders - this should happen with time, once the spec and
> >      code are there. And of course once the embedded / automotive pv
> >      driver subproject graduates, that will also give extra weight to
> >      its maintainers in the wider community
> > 2.b) Of course if there was a strong case that PV sound drivers are
> >      extremely useful for core data centre use-cases, I would probably
> >      suggest another approach
> >
> > Maybe 2.b) needs to be checked with Intel folks - there may be some
> > sound requirement for XenGT
> >
> > Would this work as a way forward?
>
> I think the main thing which is missing is some decision as to the
> point at which we would consider the ABI for a PV protocol fixed, i.e.
> to be maintained in a backwards compatible manner from then on.
>
> That's of particular importance when one end of the pair is implemented
> in external projects (e.g. OS driver frontends). If the interface is
> not declared stable then changes would be allowed which would
> invalidate those drivers.

I think that you are right. Declaring the interface stable or unstable
is far more important than where the code or the spec lives.

If we formally specified within the spec that the ABI is not maintained
for backward compatibility, the bar for acceptance in xen-unstable would
be far lower. Maybe the spec could even be accepted as is if nobody has
any comments?
Re: [Xen-devel] [PATCH 1/6] x86: detect and initialize Intel CAT feature
On Tue, Mar 17, 2015 at 04:11:33PM +0800, Chao Peng wrote:
> On Fri, Mar 13, 2015 at 09:40:13AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 13, 2015 at 06:13:20PM +0800, Chao Peng wrote:
> > > Detect the Intel Cache Allocation Technology (CAT) feature and store
> > > the cpuid information for later use. Currently only L3 cache
> > > allocation is supported. The L3 CAT features may vary among sockets,
> > > so per-socket feature information is stored. The initialization can
> > > happen either at boot time or when CPU(s) is hot plugged after
> > > booting.
> > >
> > > Signed-off-by: Chao Peng
> > > ---
> > >  docs/misc/xen-command-line.markdown |  15 +++-
> > >  xen/arch/x86/psr.c                  | 151 +---
> > >  xen/include/asm-x86/cpufeature.h    |   1 +
> > >  3 files changed, 155 insertions(+), 12 deletions(-)
> > >
> > > +    cat_cpu_init(smp_processor_id());
> >
> > Do 'if ( !cat_cpu_init(..) )'
> >
> > as the CPU might not support this.
> >
> > At which point you should also free the cat_socket_info and
> > not register the cpu notifier.
>
> Even if the booting CPU does not support this, other CPUs may still
> support it. Generally the feature is a per-socket feature, so bailing
> out here is not the intention.

Oooh, and you did mention that in the git commit description and I dived
right into the code - without looking there - sorry for that noise!

Though I am curious - what if none of the sockets support it and the
user does try to enable it on the command line (user error)? Shouldn't
we then figure out that all of the CPUs don't support it, xfree
cat_socket_info, and not register the CPU notifier?

> Except this, all other comments will be addressed by the next version.

Thank you!

> Thanks for your time.
>
> Chao
>
> > > +    register_cpu_notifier(&cpu_nfb);
> > > +}
> > > +