Re: [Xen-devel] [PATCH v6 07/30] PCI: Pass PCI domain number combined with root bus number
On 2015/3/18 12:26, Manish Jaggi wrote:
> On Tuesday 17 March 2015 07:35 PM, Ian Campbell wrote:
>> On Tue, 2015-03-17 at 10:45 +0530, Manish Jaggi wrote:
>>> On Monday 09 March 2015 08:04 AM, Yijing Wang wrote:
>>>> Now we can pass the PCI domain combined with the bus number in a u32
>>>> argument. Because on arm/arm64 the PCI domain number is assigned by
>>>> pci_bus_assign_domain_nr(), we leave pci_scan_root_bus() and
>>>> pci_create_root_bus() on arm/arm64 unchanged. A new function,
>>>> pci_host_assign_domain_nr(), will be introduced for arm/arm64 to
>>>> assign the domain number in a later patch.
>>> Hi,
>>> I think these changes might not be required. We have made very few
>>> changes in xen-pcifront to support PCI passthrough on arm64.
>>> As per the Xen architecture, for a domU only a single PCI virtual bus
>>> is created and all passthrough devices are attached to it.
>> I guess you are only talking about the changes to xen-pcifront.c?
>> Otherwise you are ignoring the dom0 case, which is exposed to the real
>> set of PCI root complexes; and anyway I'm not sure how "not needed for
>> Xen domU" translates into "not required", since it is clearly required
>> for other systems.
>>
>> Strictly speaking, the Xen pciif protocol does support multiple buses;
>> it's just that the tools, and perhaps kernels, have not yet felt any
>> need to actually make use of that.
>>
>> There doesn't seem to be any harm in updating pcifront to follow this
>> generic API change.
> ok.
>
> One side question: the function pci_host_assign_domain_nr(), which
> would be introduced in a later patch -- does it appear to be doing the
> same binding that we are trying to implement via a pci_host_bridge add
> hypercall?

pci_host_assign_domain_nr() will be called only when
CONFIG_PCI_DOMAINS_GENERIC is enabled; for now that mostly means
arm/arm64.

Thanks!
Yijing.

>> Ian.

--
Thanks!
Yijing

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
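The thread above hinges on packing a PCI segment (domain) and a root bus number into a single u32. A minimal sketch of one plausible layout (the macro names and bit layout here are illustrative assumptions, not necessarily what Yijing's series uses):

```c
#include <stdint.h>

/* Hypothetical helpers: bus number in the low byte, domain above it.
 * Either half can be recovered without passing extra arguments. */
#define PCI_DOMBUS(domain, bus)  ((uint32_t)(((uint32_t)(domain) << 8) | ((bus) & 0xffu)))
#define PCI_DOMBUS_DOMAIN(db)    ((uint16_t)((db) >> 8))
#define PCI_DOMBUS_BUS(db)       ((uint8_t)((db) & 0xffu))
```

With this layout, a dom0 device on segment 1, bus 0x20 would travel as 0x120, which is what makes a single-u32 interface workable for multiple root complexes.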
Re: [Xen-devel] [PATCH v2 3/5] xen: print online pCPUs and free pCPUs when dumping
On 03/17/2015 04:33 PM, Dario Faggioli wrote:

e.g., with `xl debug-key r', like this:

 (XEN) Online Cpus: 0-15
 (XEN) Free Cpus: 8-15

Also, for each cpupool, print the set of pCPUs it contains, like this:

 (XEN) Cpupool 0:
 (XEN) Cpus: 0-7
 (XEN) Scheduler: SMP Credit Scheduler (credit)

Signed-off-by: Dario Faggioli
Acked-by: Juergen Gross
Cc: Juergen Gross
Cc: George Dunlap
Cc: Jan Beulich
Cc: Keir Fraser
---
Changes from v1:
 * _print_cpumap() becomes print_cpumap() (i.e., the leading '_' was
   not particularly useful in this case), as suggested during review
 * changed the output such that (1) we only print the maps, not the
   number of elements, and (2) we avoid printing the free cpus map
   when empty
 * improved the changelog
---
I'm not including any Reviewed-by / Acked-by tag, since the patch
changed.
---
 xen/common/cpupool.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c
index cd6aab9..812a2f9 100644
--- a/xen/common/cpupool.c
+++ b/xen/common/cpupool.c
@@ -17,6 +17,7 @@
 #include
 #include
 #include
+#include
 #include

 #define for_each_cpupool(ptr)\
@@ -658,6 +659,12 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op)
     return ret;
 }

+static void print_cpumap(const char *str, const cpumask_t *map)
+{
+    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), map);
+    printk("%s: %s\n", str, keyhandler_scratch);
+}
+
 void dump_runq(unsigned char key)
 {
     unsigned long flags;
@@ -671,12 +678,17 @@ void dump_runq(unsigned char key)
            sched_smt_power_savings? "enabled":"disabled");
     printk("NOW=0x%08X%08X\n", (u32)(now>>32), (u32)now);

+    print_cpumap("Online Cpus", &cpu_online_map);
+    if ( cpumask_weight(&cpupool_free_cpus) )
+        print_cpumap("Free Cpus", &cpupool_free_cpus);
+
     printk("Idle cpupool:\n");
     schedule_dump(NULL);

     for_each_cpupool(c)
     {
         printk("Cpupool %d:\n", (*c)->cpupool_id);
+        print_cpumap("Cpus", (*c)->cpu_valid);
         schedule_dump(*c);
     }
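The print_cpumap() helper above relies on cpulist_scnprintf() to render a cpumask as a range list such as "0-7". A hedged user-space sketch of that formatting, for a single unsigned long worth of CPUs (this is an illustration, not Xen's implementation):

```c
#include <stdio.h>
#include <string.h>

/* Turn a bitmask of CPUs into a "1-2,4" style range list. */
static int cpumask_to_list(unsigned long mask, char *buf, size_t len)
{
    size_t pos = 0;
    int cpu = 0, nbits = 8 * (int)sizeof(mask);

    buf[0] = '\0';
    while (cpu < nbits) {
        if (!(mask & (1UL << cpu))) { cpu++; continue; }
        int start = cpu;                      /* first CPU of a run */
        while (cpu < nbits && (mask & (1UL << cpu)))
            cpu++;                            /* consume the run */
        pos += snprintf(buf + pos, pos < len ? len - pos : 0,
                        "%s%d", pos ? "," : "", start);
        if (cpu - 1 > start)                  /* runs of >= 2 become "a-b" */
            pos += snprintf(buf + pos, pos < len ? len - pos : 0,
                            "-%d", cpu - 1);
    }
    return (int)pos;
}
```

This is why the dump output in the changelog reads "Online Cpus: 0-15" rather than a raw bitmap.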
Re: [Xen-devel] [PATCH v6 07/30] PCI: Pass PCI domain number combined with root bus number
On Tuesday 17 March 2015 07:35 PM, Ian Campbell wrote:
> On Tue, 2015-03-17 at 10:45 +0530, Manish Jaggi wrote:
>> On Monday 09 March 2015 08:04 AM, Yijing Wang wrote:
>>> Now we can pass the PCI domain combined with the bus number in a u32
>>> argument. Because on arm/arm64 the PCI domain number is assigned by
>>> pci_bus_assign_domain_nr(), we leave pci_scan_root_bus() and
>>> pci_create_root_bus() on arm/arm64 unchanged. A new function,
>>> pci_host_assign_domain_nr(), will be introduced for arm/arm64 to
>>> assign the domain number in a later patch.
>> Hi,
>> I think these changes might not be required. We have made very few
>> changes in xen-pcifront to support PCI passthrough on arm64.
>> As per the Xen architecture, for a domU only a single PCI virtual bus
>> is created and all passthrough devices are attached to it.
> I guess you are only talking about the changes to xen-pcifront.c?
> Otherwise you are ignoring the dom0 case, which is exposed to the real
> set of PCI root complexes; and anyway I'm not sure how "not needed for
> Xen domU" translates into "not required", since it is clearly required
> for other systems.
>
> Strictly speaking, the Xen pciif protocol does support multiple buses;
> it's just that the tools, and perhaps kernels, have not yet felt any
> need to actually make use of that.
>
> There doesn't seem to be any harm in updating pcifront to follow this
> generic API change.

ok.

One side question: the function pci_host_assign_domain_nr(), which
would be introduced in a later patch -- does it appear to be doing the
same binding that we are trying to implement via a pci_host_bridge add
hypercall?

> Ian.
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Tuesday 17 March 2015 06:01 PM, Jan Beulich wrote:
> On 17.03.15 at 13:06, wrote:
>> On Tuesday 17 March 2015 12:58 PM, Jan Beulich wrote:
>>> On 17.03.15 at 06:26, wrote:
>>>> In drivers/xen/pci.c, on a BUS_NOTIFY_ADD_DEVICE notification, dom0
>>>> issues a hypercall to inform Xen that a new PCI device has been
>>>> added. If we were to inform Xen about a new PCI bus being added,
>>>> there are two ways:
>>>> a) Issue the hypercall from drivers/pci/probe.c
>>>> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
>>>>    PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find
>>>>    that segment number (s_bdf), it will return an error
>>>>    SEG_NO_NOT_FOUND. After that, the Linux Xen code could issue the
>>>>    PHYSDEVOP_pci_host_bridge_add hypercall.
>>>> I think (b) can be done with minimal code changes. What do you think?
>>> I'm pretty sure (a) would even be refused by the maintainers, unless
>>> there already is a notification being sent. As to (b) -- kernel code
>>> could keep track of which segment/bus pairs it informed Xen about,
>>> and hence wouldn't even need to wait for an error to be returned from
>>> the device-add request (which in your proposal would need to be
>>> re-issued after the host-bridge-add).
>> Have a query on the CFG space address to be passed as a hypercall
>> parameter. of_pci_get_host_bridge_resources() only parses the "ranges"
>> property, not "reg". The "reg" property has the CFG space address,
>> which is usually stored in the private structures of the PCI host
>> controller driver, so a pci_dev's parent pci_bus would not have that
>> info. One way is to add a method to struct pci_ops, but I'm not sure
>> whether that would be accepted.
> I'm afraid I don't understand what you're trying to tell me.

Hi Jan,
I missed this during the initial discussion and found out while coding
that the CFG space address of a PCI host is stored in the "reg"
property, and the of_pci code does not store "reg" in the resources;
only "ranges" is stored. So the pci_bus that is the root bus created in
the probe function of the PCIe controller driver will have the "ranges"
values in its resources, but the "reg" property value (the CFG space
address) only in the driver's private data.

So from drivers/xen/pci.c we can find the root bus (pci_bus) from the
pci_dev (via BUS_NOTIFY) but cannot get the CFG space address. Now
there are two ways:
a) Add a pci_ops method to return the CFG space address.
b) Let the PCI host controller driver invoke a function,
   xen_invoke_hypercall(), providing the bus number and CFG space
   address. xen_invoke_hypercall() would be implemented in
   drivers/xen/pci.c and would issue the PHYSDEVOP_pci_host_bridge_add
   hypercall.

> Jan
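Jan's suggestion under (b) -- that the kernel simply track which segment/bus pairs it has already reported, so device-add never needs to be re-issued -- can be sketched as a tiny dedup table. All names below are hypothetical stand-ins for whatever the Xen glue code would actually keep:

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_BRIDGES 16

struct reported { uint16_t segment; uint8_t bus; };
static struct reported seen[MAX_BRIDGES];
static int nseen;

/* Returns true the first time a (segment, bus) pair is encountered,
 * i.e. when a PHYSDEVOP_pci_host_bridge_add call would be needed
 * before the device-add hypercall for that root bus. */
static bool host_bridge_needs_report(uint16_t segment, uint8_t bus)
{
    for (int i = 0; i < nseen; i++)
        if (seen[i].segment == segment && seen[i].bus == bus)
            return false;
    if (nseen < MAX_BRIDGES)
        seen[nseen++] = (struct reported){ segment, bus };
    return true;
}
```

With such tracking, the error-driven retry in the original proposal (wait for SEG_NO_NOT_FOUND, then host-bridge-add, then re-issue device-add) collapses to a single lookup before the first hypercall.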
Re: [Xen-devel] [PATCH v2 5/5] xen: sched_rt: print useful affinity info when dumping
2015-03-17 11:33 GMT-04:00 Dario Faggioli :
> In fact, printing the cpupool's CPU online mask for each vCPU is just
> redundant, as that is the same for all the vCPUs of all the domains in
> the same cpupool, while hard affinity is already part of the output of
> dumping domains info.
>
> Instead, print the intersection between hard affinity and online CPUs,
> which is --in case of this scheduler-- the effective affinity always
> used for the vCPUs.
>
> This change also takes the chance to add a scratch cpumask area, to
> avoid having to either put one (more) cpumask_t on the stack, or
> dynamically allocate it within the dumping routine. (The former being
> bad because hypervisor stack size is limited, the latter because
> dynamic allocations can fail, if the hypervisor was built for a large
> enough number of CPUs.)
>
> Such scratch area can be used to kill most of the cpumask{_var}_t
> local variables in other functions in the file, but that is *NOT*
> done in this change.
>
> Finally, convert the file to use keyhandler scratch, instead of open
> coded string buffers.
>
> Signed-off-by: Dario Faggioli
> Cc: George Dunlap
> Cc: Meng Xu
> Cc: Jan Beulich
> Cc: Keir Fraser
> ---
> Changes from v1:
>  * improved changelog;
>  * made a local variable to point to the correct scratch mask, as
>    suggested during review.
> ---

Reviewed-by: Meng Xu

Thanks,

Best,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
Hi Dario,

2015-03-17 10:12 GMT-04:00 Dario Faggioli :
> On Mon, 2015-03-16 at 16:30 -0400, Meng Xu wrote:
>> Hi Dario,
>
> Hey,
>
>> 2015-03-16 13:05 GMT-04:00 Dario Faggioli :
>> >
>> > This change also takes the chance to add a scratch
>> > cpumask, to avoid having to create one more
>> > cpumask_var_t on the stack of the dumping routine.
>>
>> Actually, I have a question about the strength of this design. When
>> we have a machine with many cpus, we will end up with allocating a
>> cpumask for each cpu.
>
> Just FTR, what we will end up allocating is:
>  - an array of *pointers* to cpumasks with as many elements as the
>    number of pCPUs,
>  - a cpumask *only* for the pCPUs subjected to an instance of the RTDS
>    scheduler.
>
> So, for instance, if you have 64 pCPUs, but are using the RTDS
> scheduler only in a cpupool with 2 pCPUs, you'll have an array of 64
> pointers to cpumask_t, but only 2 actual cpumasks.
>
>> Is this better than having a cpumask_var_t on the stack of the
>> dumping routine, since the dumping routine is not in the hot path?
>
> George and Jan replied to this already, I think. Allow me to add just
> a few words:
>
>> > Such scratch area can be used to kill most of the
>> > cpumask_var_t local variables in other functions
>> > in the file, but that is *NOT* done in this change.
>
> This is the point, actually! As said here, this is not only for the
> sake of the dumping routine. In fact, ideally, someone will, in the
> near future, go throughout the whole file and kill most of the
> cpumask_t local variables, and most of the cpumask dynamic
> allocations, in favour of using this scratch area.
>
>> > @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
>> >     if ( prv == NULL )
>> >         return -ENOMEM;
>> >
>> > +    _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
>>
>> Is it better to use xzalloc_array?
>
> Why? IMO, not really. I'm only free()-ing (in rt_free_pdata()) the
> elements of the array that have been previously successfully
> allocated (in rt_alloc_pdata()), so I don't think there is any
> special requirement for all the elements to be NULL right away.

OK. I see.

>> > +    if ( _cpumask_scratch == NULL )
>> > +        return -ENOMEM;
>> > +
>> >     spin_lock_init(&prv->lock);
>> >     INIT_LIST_HEAD(&prv->sdom);
>> >     INIT_LIST_HEAD(&prv->runq);
>> > @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
>> > {
>> >     struct rt_private *prv = rt_priv(ops);
>> >
>> > +    xfree(_cpumask_scratch);
>> >     xfree(prv);
>> > }
>> >
>> > @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
>> >     per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
>> >     spin_unlock_irqrestore(&prv->lock, flags);
>> >
>> > +    if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>>
>> Is it better to use zalloc_cpumask_var() here?
>
> Nope. It's a scratch area, after all, so one really should not assume
> it to be in a specific state (e.g., no bits set as you're suggesting)
> when using it.

I see the point. Now I got it. :-)

> Thanks and Regards,

Thank you very much for clarification! :-)

Best,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
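The allocation scheme Dario describes -- an array of pointers sized for all pCPUs, with actual masks allocated only for pCPUs attached to the scheduler -- can be sketched in plain C. The names below are stand-ins for `_cpumask_scratch` and the `rt_*_pdata` hooks, not the Xen code:

```c
#include <stdlib.h>

static unsigned long **scratch;   /* plays the role of _cpumask_scratch */

static int scratch_init(int nr_cpu_ids)
{
    /* The Xen patch uses a non-zeroing xmalloc_array(), since only
     * entries that were successfully allocated are ever freed; this
     * sketch zeroes the array for simplicity. */
    scratch = calloc((size_t)nr_cpu_ids, sizeof(*scratch));
    return scratch ? 0 : -1;
}

static int scratch_cpu_attach(int cpu)    /* like rt_alloc_pdata() */
{
    scratch[cpu] = malloc(sizeof(unsigned long));  /* one mask's worth */
    return scratch[cpu] ? 0 : -1;
}

static void scratch_cpu_detach(int cpu)   /* like rt_free_pdata() */
{
    free(scratch[cpu]);
    scratch[cpu] = NULL;
}
```

This mirrors the 64-pCPU example in the thread: sixty-four cheap pointers up front, but a real cpumask only for the two pCPUs actually in the RTDS cpupool.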
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
>>> This change also takes the chance to add a scratch cpumask, to
>>> avoid having to create one more cpumask_var_t on the stack of the
>>> dumping routine.
>>
>> Actually, I have a question about the strength of this design. When
>> we have a machine with many cpus, we will end up with allocating a
>> cpumask for each cpu. Is this better than having a cpumask_var_t on
>> the stack of the dumping routine, since the dumping routine is not in
>> the hot path?
>
> The reason for taking this off the stack is that the hypervisor stack
> is a fairly limited resource -- IIRC it's only 8k (for each cpu). If
> the call stack gets too deep, the hypervisor will triple-fault.
> Keeping really large variables like cpumasks off the stack is key to
> making sure we don't get close to that.

I see. I didn't realize that the hypervisor stack is so limited. That
makes sense.

Thank you very much for clarification! :-)

Best,
Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
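The stack-budget argument above is easy to quantify: a cpumask is one bit per possible CPU, so a hypervisor built for many CPUs makes each on-stack mask noticeably large. A quick sketch of the arithmetic:

```c
#include <stddef.h>

/* One bit per possible CPU, rounded up to whole bytes. */
static size_t cpumask_bytes(size_t nr_cpus)
{
    return (nr_cpus + 7) / 8;
}
```

A build for 4096 CPUs makes each mask 512 bytes -- over 6% of an 8 KiB stack for a single local variable, so a few nested functions each declaring a cpumask_t add up quickly.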
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
On 03/17/2015 10:52 PM, Felipe Franciosi wrote:
> Hi Bob,
>
> I've put the hardware back together and am sorting out the software
> for testing. Things are not moving as fast as I wanted due to other
> commitments. I'll keep this thread updated as I progress. Malcolm is
> OOO and I'm trying to get his patches to work on a newer Xen.

Thank you!

> The evaluation will compare:
> 1) bare metal i/o (for baseline)
> 2) tapdisk3 (currently using grant copy, which is what scales best in
>    my experience)
> 3) blkback w/ persistent grants
> 4) blkback w/o persistent grants (I will just comment out the
>    handshake bits in blkback/blkfront)
> 5) blkback w/o persistent grants + Malcolm's grant map patches

I think you need to add the patches from Christoph Egger with title
"[PATCH v5 0/2] gnttab: Improve scaleability" here:
http://lists.xen.org/archives/html/xen-devel/2015-02/msg01188.html

> To my knowledge, blkback (w/ or w/o persistent grants) is always
> faster than user space alternatives (e.g. tapdisk, qemu-qdisk) as
> latency is much lower. However, tapdisk with grant copy has been shown
> to produce (much) better aggregate throughput figures as it avoids any
> issues with grant (un)mapping.
>
> I'm hoping to show that (5) above scales better than (3) and (4) in a
> representative scenario. If it does, I will recommend that we get rid
> of persistent grants in favour of a better and more scalable grant
> (un)mapping implementation.

Right, but even if (5) has better performance, we have to make sure
older hypervisors with a new Linux kernel won't be affected after we
get rid of persistent grants.

--
Regards,
-Bob
Re: [Xen-devel] Any work on sharing of large multi-page segments?
On 3/17/15, Jan Beulich wrote:
> And how would that be significantly different from the batching
> that's already built into the grant table hypercall?

I guess it does do more or less what I want already. I was looking more
at the inner mapping/unmapping functions, rather than the wrappers
around them that implement the actual hypercalls.

What would be a useful addition would be support for granting 2M pages.
That would eliminate any problem with running out of grant table slots.

On 3/17/15, George Dunlap wrote:
> Any deduplication code would run in a process, probably in domain 0,
> and may be somewhat slow; but the actual mechanism of sharing is a
> generic mechanism in the hypervisor which any client can use. Jan is
> suggesting that you might be able to use that interface to
> pro-actively tell Xen about the memory pages shared between your
> various domains.

I wasn't quite sure if it's generic enough to use to implement shared
segments, or if it's specific to deduplication at the hypervisor level.
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
Tuesday, March 17, 2015, 6:44:54 PM, you wrote:

>>>> Additionally I think it should be considered whether the bitmap
>>>> approach of interpreting ->state is the right one, and we don't
>>>> instead want a clean 3-state (idle, sched, run) model.
>>>
>>> Could you elaborate a bit more please? As in three different
>>> unsigned int (or bool_t) that are set depending on what state we
>>> are in?
>>
>> An enum { STATE_IDLE, STATE_SCHED, STATE_RUN }. Especially
>> if my comment above turns out to be wrong, you'd have no real
>> need for the SCHED and RUN flags to be set at the same time.

> I cobbled together what I believe is what you were thinking of.

> As you can see, to preserve the existing functionality -- such as
> being able to schedule N interrupt injections for the N interrupts we
> might get -- I modified '->masked' to be an atomic counter.

> The end result is that we can still live-lock. Unless we:
> - Drop on the floor the injection of N interrupts and just deliver at
>   most one per VMX_EXIT (and not bother with interrupts arriving when
>   we are in the VMX handler).
> - Alter the softirq code slightly, to have a variant which will only
>   iterate once over the pending softirq bits per call. (So save a
>   copy of the bitmap on the stack when entering the softirq handler,
>   and use that. We could also xor it against the current one to catch
>   any non-duplicate bits being set that we should deal with.)

> Here is the compile, but not run-time tested patch.

> From e7d8bcd7c5d32c520554a4ad69c4716246036002 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk
> Date: Tue, 17 Mar 2015 13:31:52 -0400
> Subject: [RFC PATCH] dpci: Switch to tristate instead of bitmap

> *TODO*:
>  - Writeup.
>  - Tests

Done, and unfortunately it doesn't fly ..

Some devices seem to work fine, others don't receive any interrupts
shortly after boot, like:
 40: 3 0 0 0 xen-pirq-ioapic-level cx25821[1]

Don't see any crashes or errors though, so it seems to silently lock
somewhere.
-- Sander > Suggested-by: Jan Beulich > Signed-off-by: Konrad Rzeszutek Wilk > --- > xen/drivers/passthrough/io.c | 140 > --- > xen/include/xen/hvm/irq.h| 4 +- > 2 files changed, 82 insertions(+), 62 deletions(-) > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c > index ae050df..663e104 100644 > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -30,42 +30,28 @@ > static DEFINE_PER_CPU(struct list_head, dpci_list); > > /* > - * These two bit states help to safely schedule, deschedule, and wait until > - * the softirq has finished. > - * > - * The semantics behind these two bits is as follow: > - * - STATE_SCHED - whoever modifies it has to ref-count the domain (->dom). > - * - STATE_RUN - only softirq is allowed to set and clear it. If it has > - * been set hvm_dirq_assist will RUN with a saved value of the > - * 'struct domain' copied from 'pirq_dpci->dom' before STATE_RUN was > set. > - * > - * The usual states are: STATE_SCHED(set) -> STATE_RUN(set) -> > - * STATE_SCHED(unset) -> STATE_RUN(unset). > - * > - * However the states can also diverge such as: STATE_SCHED(set) -> > - * STATE_SCHED(unset) -> STATE_RUN(set) -> STATE_RUN(unset). That means > - * the 'hvm_dirq_assist' never run and that the softirq did not do any > - * ref-counting. > - */ > - > -enum { > -STATE_SCHED, > -STATE_RUN > -}; > - > -/* > * This can be called multiple times, but the softirq is only raised once. > - * That is until the STATE_SCHED state has been cleared. The state can be > - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), > - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before > + * That is until state is in init. The state can be changed by: > + * the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), > + * or by 'pt_pirq_softirq_reset' (which will try to init the state before > * the softirq had a chance to run). 
> */ > static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) > { > unsigned long flags; > > -if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) > +switch ( cmpxchg(&pirq_dpci->state, STATE_INIT, STATE_SCHED) ) > +{ > +case STATE_RUN: > +case STATE_SCHED: > +/* > + * The pirq_dpci->mapping has been incremented to let us know > + * how many we have left to do. > + */ > return; > +case STATE_INIT: > +break; > +} > > get_knownalive_domain(pirq_dpci->dom); > > @@ -85,7 +71,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci > *pirq_dpci) > */ > bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) > { > -if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) ) > +if ( pirq_dpci->state != STATE_INIT ) > return 1; > > /* > @@ -109,22 +95,22 @@ static void p
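The tristate Jan suggested, which the RFC patch implements with Xen's cmpxchg(), can be modelled with C11 atomics. This is a sketch of the state machine only, not the hypervisor code; the "work" and the domain ref-counting are elided:

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { STATE_INIT, STATE_SCHED, STATE_RUN };
static _Atomic int state = STATE_INIT;

/* raise_softirq_for() side: only INIT -> SCHED is allowed, so a second
 * caller while scheduled or running simply returns (in the patch, the
 * pending count in ->masked records the extra request). */
static bool try_schedule(void)
{
    int expected = STATE_INIT;
    return atomic_compare_exchange_strong(&state, &expected, STATE_SCHED);
}

/* dpci_softirq side: claim SCHED -> RUN, do the work, go back to INIT. */
static bool softirq_run(void)
{
    int expected = STATE_SCHED;
    if (!atomic_compare_exchange_strong(&state, &expected, STATE_RUN))
        return false;
    /* ... hvm_dirq_assist() work would happen here ... */
    atomic_store(&state, STATE_INIT);
    return true;
}
```

The appeal over the two-bit scheme is that SCHED and RUN can never be observed set at the same time, so the diverging interleavings described in the removed comment block simply cannot occur.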
[Xen-devel] [PATCH 1/3] checkpolicy: Expand allowed character set in paths
In order to support paths containing spaces or other characters, allow
a quoted string with these characters to be parsed as a path in
addition to the existing unquoted string.

Signed-off-by: Daniel De Graaf
---
 checkpolicy/policy_parse.y | 3 +++
 checkpolicy/policy_scan.l  | 1 +
 2 files changed, 4 insertions(+)

diff --git a/checkpolicy/policy_parse.y b/checkpolicy/policy_parse.y
index 15c8997..e5210bd 100644
--- a/checkpolicy/policy_parse.y
+++ b/checkpolicy/policy_parse.y
@@ -81,6 +81,7 @@ typedef int (* require_func_t)(int pass);
 %type require_decl_def
 %token PATH
+%token QPATH
 %token FILENAME
 %token CLONE
 %token COMMON
@@ -805,6 +806,8 @@ filesystem : FILESYSTEM
 	;
 path	: PATH
 	{ if (insert_id(yytext,0)) return -1; }
+	| QPATH
+	{ yytext[strlen(yytext) - 1] = '\0'; if (insert_id(yytext + 1,0)) return -1; }
 	;
 filename : FILENAME
 	{ yytext[strlen(yytext) - 1] = '\0'; if (insert_id(yytext + 1,0)) return -1; }
diff --git a/checkpolicy/policy_scan.l b/checkpolicy/policy_scan.l
index 648e1d6..6763c38 100644
--- a/checkpolicy/policy_scan.l
+++ b/checkpolicy/policy_scan.l
@@ -240,6 +240,7 @@ HIGH	{ return(HIGH); }
 low |
 LOW	{ return(LOW); }
 "/"({alnum}|[_\.\-/])*	{ return(PATH); }
+\""/"[ !#-~]*\"	{ return(QPATH); }
 \"({alnum}|[_\.\-\+\~\: ])+\"	{ return(FILENAME); }
 {letter}({alnum}|[_\-])*([\.]?({alnum}|[_\-]))*	{ return(IDENTIFIER); }
 {digit}+|0x{hexval}+	{ return(NUMBER); }
--
2.1.0
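The new QPATH flex pattern `\""/"[ !#-~]*\"` accepts a double-quoted string whose body starts with '/' and then contains any printable ASCII except the double quote itself (space and '!' explicitly, '#' through '~' for the rest). A plain-C sketch of the same acceptance test -- the parser action then strips the surrounding quotes before insert_id():

```c
#include <stdbool.h>
#include <string.h>

/* Mirrors what the QPATH lexer rule matches: "<slash><printables>". */
static bool is_qpath(const char *s)
{
    size_t n = strlen(s);
    if (n < 3 || s[0] != '"' || s[1] != '/' || s[n - 1] != '"')
        return false;
    for (size_t i = 2; i < n - 1; i++) {
        char c = s[i];
        /* [ !#-~]: space, '!', then '#'..'~' -- everything printable
         * except the double quote (0x22). */
        if (!(c == ' ' || c == '!' || (c >= '#' && c <= '~')))
            return false;
    }
    return true;
}
```

So `"/media/my disk"` now lexes as a path, while the old unquoted PATH rule continues to cover space-free paths like `/dev/sda`.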
[Xen-devel] [PATCH 3/3] libsepol, checkpolicy: add device tree ocontext nodes to Xen policy
In Xen on ARM, device tree nodes identified by a path (string) need to be labeled by the security policy. Signed-off-by: Daniel De Graaf --- checkpolicy/policy_define.c| 55 + checkpolicy/policy_define.h| 1 + checkpolicy/policy_parse.y | 8 +++- checkpolicy/policy_scan.l | 2 + libsepol/cil/src/cil.c | 17 libsepol/cil/src/cil_binary.c | 29 + libsepol/cil/src/cil_build_ast.c | 66 ++ libsepol/cil/src/cil_build_ast.h | 2 + libsepol/cil/src/cil_copy_ast.c| 24 +++ libsepol/cil/src/cil_flavor.h | 1 + libsepol/cil/src/cil_internal.h| 10 + libsepol/cil/src/cil_post.c| 34 +++ libsepol/cil/src/cil_reset_ast.c | 10 + libsepol/cil/src/cil_resolve_ast.c | 28 + libsepol/cil/src/cil_tree.c| 13 ++ libsepol/cil/src/cil_verify.c | 24 +++ libsepol/include/sepol/policydb/policydb.h | 1 + libsepol/src/expand.c | 7 libsepol/src/policydb.c| 18 +++- libsepol/src/write.c | 14 ++- sepolgen/src/sepolgen/refparser.py | 11 + sepolgen/src/sepolgen/refpolicy.py | 9 22 files changed, 379 insertions(+), 5 deletions(-) diff --git a/checkpolicy/policy_define.c b/checkpolicy/policy_define.c index 66c1ff2..de01f6f 100644 --- a/checkpolicy/policy_define.c +++ b/checkpolicy/policy_define.c @@ -4116,6 +4116,61 @@ bad: return -1; } +int define_devicetree_context() +{ + ocontext_t *newc, *c, *l, *head; + + if (policydbp->target_platform != SEPOL_TARGET_XEN) { + yyerror("devicetreecon not supported for target"); + return -1; + } + + if (pass == 1) { + free(queue_remove(id_queue)); + parse_security_context(NULL); + return 0; + } + + newc = malloc(sizeof(ocontext_t)); + if (!newc) { + yyerror("out of memory"); + return -1; + } + memset(newc, 0, sizeof(ocontext_t)); + + newc->u.name = (char *)queue_remove(id_queue); + if (!newc->u.name) { + free(newc); + return -1; + } + + if (parse_security_context(&newc->context[0])) { + free(newc->u.name); + free(newc); + return -1; + } + + head = policydbp->ocontexts[OCON_XEN_DEVICETREE]; + for (l = NULL, c = head; c; l = c, c = c->next) { + if (strcmp(newc->u.name, 
c->u.name) == 0) { + yyerror2("duplicate devicetree entry for '%s'", newc->u.name); + goto bad; + } + } + + if (l) + l->next = newc; + else + policydbp->ocontexts[OCON_XEN_DEVICETREE] = newc; + + return 0; + +bad: + free(newc->u.name); + free(newc); + return -1; +} + int define_port_context(unsigned int low, unsigned int high) { ocontext_t *newc, *c, *l, *head; diff --git a/checkpolicy/policy_define.h b/checkpolicy/policy_define.h index 14d30e1..a87ced3 100644 --- a/checkpolicy/policy_define.h +++ b/checkpolicy/policy_define.h @@ -49,6 +49,7 @@ int define_pirq_context(unsigned int pirq); int define_iomem_context(uint64_t low, uint64_t high); int define_ioport_context(unsigned long low, unsigned long high); int define_pcidevice_context(unsigned long device); +int define_devicetree_context(void); int define_range_trans(int class_specified); int define_role_allow(void); int define_role_trans(int class_specified); diff --git a/checkpolicy/policy_parse.y b/checkpolicy/policy_parse.y index e3899b9..8b81f04 100644 --- a/checkpolicy/policy_parse.y +++ b/checkpolicy/policy_parse.y @@ -130,7 +130,7 @@ typedef int (* require_func_t)(int pass); %token TARGET %token SAMEUSER %token FSCON PORTCON NETIFCON NODECON -%token PIRQCON IOMEMCON IOPORTCON PCIDEVICECON +%token PIRQCON IOMEMCON IOPORTCON PCIDEVICECON DEVICETREECON %token FSUSEXATTR FSUSETASK FSUSETRANS %token GENFSCON %token U1 U2 U3 R1 R2 R3 T1 T2 T3 L1 L2 H1 H2 @@ -644,7 +644,8 @@ dev_contexts: dev_context_def dev_context_def: pirq_context_def | iomem_context_def | ioport_context_def | - pci_context_def + pci_context_def | + dtree_context_def ; pirq_context_def : PIRQCON number security_context_def {if (define_pirq_context($2)) return -1;} @@ -662,6 +663,9 @@ ioport_context_def : IOPORTCON number security_context_def pci_context_def: PCIDEVICECON number security_context_def
[Xen-devel] [PATCH v3 0/3] Xen/FLASK policy updates for device contexts
In order to support assigning security labels to ARM device tree nodes
in Xen's XSM policy, a new ocontext type is needed in the security
policy. In addition to adding the new ocontext, the existing I/O memory
range ocontext is expanded to 64 bits in order to support hardware with
more than 44 bits of physical address space (32-bit count of 4K pages).

Changes from v2:
 - Clean up printf format strings for 32-bit builds

Changes from v1:
 - Use policy version 30 instead of forking the version numbers for
   Xen; this removes the need for v1's patch 3.
 - Report an error when attempting to use an I/O memory range that
   requires a 64-bit representation with an old policy output version
   that cannot support this
 - Fix a few incorrect references to PCIDEVICECON
 - Reorder patches to clarify the allowed character set of device tree
   paths

[PATCH 1/3] checkpolicy: Expand allowed character set in paths
[PATCH 2/3] libsepol, checkpolicy: widen Xen IOMEM ocontext entries
[PATCH 3/3] libsepol, checkpolicy: add device tree ocontext nodes to
[Xen-devel] [PATCH 2/3] libsepol, checkpolicy: widen Xen IOMEM ocontext entries
This expands IOMEMCON device context entries to 64 bits. This change is required to support static I/O memory range labeling for systems with over 16TB of physical address space. The policy version number change is shared with the next patch. While this makes no changes to SELinux policy, a new SELinux policy compatibility entry was added in order to avoid breaking compilation of an SELinux policy without explicitly specifying the policy version. Signed-off-by: Daniel De Graaf --- checkpolicy/policy_define.c| 11 +- checkpolicy/policy_define.h| 2 +- checkpolicy/policy_parse.y | 9 ++-- libsepol/cil/src/cil_build_ast.c | 32 ++--- libsepol/cil/src/cil_build_ast.h | 1 + libsepol/cil/src/cil_internal.h| 4 ++-- libsepol/cil/src/cil_policy.c | 3 ++- libsepol/cil/src/cil_tree.c| 3 ++- libsepol/include/sepol/policydb/policydb.h | 7 --- libsepol/src/policydb.c| 33 +- libsepol/src/write.c | 32 ++--- policycoreutils/hll/pp/pp.c| 4 ++-- 12 files changed, 109 insertions(+), 32 deletions(-) diff --git a/checkpolicy/policy_define.c b/checkpolicy/policy_define.c index a6c5d65..66c1ff2 100644 --- a/checkpolicy/policy_define.c +++ b/checkpolicy/policy_define.c @@ -39,6 +39,7 @@ #include #include #include +#include #include #include @@ -3932,7 +3933,7 @@ bad: return -1; } -int define_iomem_context(unsigned long low, unsigned long high) +int define_iomem_context(uint64_t low, uint64_t high) { ocontext_t *newc, *c, *l, *head; char *id; @@ -3960,7 +3961,7 @@ int define_iomem_context(unsigned long low, unsigned long high) newc->u.iomem.high_iomem = high; if (low > high) { - yyerror2("low memory 0x%lx exceeds high memory 0x%lx", low, high); + yyerror2("low memory 0x%"PRIx64" exceeds high memory 0x%"PRIx64"", low, high); free(newc); return -1; } @@ -3972,13 +3973,13 @@ int define_iomem_context(unsigned long low, unsigned long high) head = policydbp->ocontexts[OCON_XEN_IOMEM]; for (l = NULL, c = head; c; l = c, c = c->next) { - uint32_t low2, high2; + uint64_t low2, high2; low2 = 
c->u.iomem.low_iomem; high2 = c->u.iomem.high_iomem; if (low <= high2 && low2 <= high) { - yyerror2("iomemcon entry for 0x%lx-0x%lx overlaps with " - "earlier entry 0x%x-0x%x", low, high, + yyerror2("iomemcon entry for 0x%"PRIx64"-0x%"PRIx64" overlaps with " + "earlier entry 0x%"PRIx64"-0x%"PRIx64"", low, high, low2, high2); goto bad; } diff --git a/checkpolicy/policy_define.h b/checkpolicy/policy_define.h index 4ef0f4f..14d30e1 100644 --- a/checkpolicy/policy_define.h +++ b/checkpolicy/policy_define.h @@ -46,7 +46,7 @@ int define_permissive(void); int define_polcap(void); int define_port_context(unsigned int low, unsigned int high); int define_pirq_context(unsigned int pirq); -int define_iomem_context(unsigned long low, unsigned long high); +int define_iomem_context(uint64_t low, uint64_t high); int define_ioport_context(unsigned long low, unsigned long high); int define_pcidevice_context(unsigned long device); int define_range_trans(int class_specified); diff --git a/checkpolicy/policy_parse.y b/checkpolicy/policy_parse.y index e5210bd..e3899b9 100644 --- a/checkpolicy/policy_parse.y +++ b/checkpolicy/policy_parse.y @@ -67,6 +67,7 @@ typedef int (* require_func_t)(int pass); %union { unsigned int val; + uint64_t val64; uintptr_t valptr; void *ptr; require_func_t require_func; @@ -78,6 +79,7 @@ typedef int (* require_func_t)(int pass); %type role_def roles %type cexpr cexpr_prim op role_mls_op %type ipv4_addr_def number +%type number64 %type require_decl_def %token PATH @@ -647,9 +649,9 @@ dev_context_def : pirq_context_def | pirq_context_def : PIRQCON number security_context_def {if (define_pirq_context($2)) return -1;} ; -iomem_context_def : IOMEMCON number security_context_def +iomem_context_def : IOMEMCON number64 security_context_def {if (define_iomem_context($2,$2)) return -1;} - | IOMEMCON number '-' number security_context_def + | IOMEMCON number64 '-' number64 security_context_def {if (define_iomem_context($2,$4)) return -1;} ; ioport_context_def : 
IOPORTCON number security_context_def @@ -815,6 +817,9 @@ filename
Re: [Xen-devel] [PATCH v2 2/2] sched_credit2.c: runqueue_per_core code
On Mon, 2015-03-16 at 12:56 +, Jan Beulich wrote: > >>> On 16.03.15 at 13:51, wrote: > > On 03/16/2015 12:48 PM, Jan Beulich wrote: > >> Them returning garbage isn't what needs fixing. Instead the code > >> here should use a different condition to check whether this is the > >> boot CPU (e.g. looking at system_state). And that can very well be > >> done directly in this patch. > > > > What do you suggest, then? > > My preferred solution would be, as said, to leverage system_state. > Provided the state to look for is consistent between x86 and ARM. > Would something like this make sense? diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index cfca5a7..2f2aa73 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -1936,12 +1936,8 @@ static void init_pcpu(const struct scheduler *ops, int cpu) } /* Figure out which runqueue to put it in */ -rqi = 0; - -/* Figure out which runqueue to put it in */ -/* NB: cpu 0 doesn't get a STARTING callback, so we hard-code it to runqueue 0. */ -if ( cpu == 0 ) -rqi = 0; +if ( system_state == SYS_STATE_boot ) +rqi = boot_cpu_to_socket(cpu); else rqi = cpu_to_socket(cpu); @@ -1986,9 +1982,13 @@ static void init_pcpu(const struct scheduler *ops, int cpu) static void * csched2_alloc_pdata(const struct scheduler *ops, int cpu) { -/* Check to see if the cpu is online yet */ -/* Note: cpu 0 doesn't get a STARTING callback */ -if ( cpu == 0 || cpu_to_socket(cpu) >= 0 ) +/* + * Actual initialization is deferred to when the pCPU will be + * online, via a STARTING callback. The only exception is + * the boot cpu, which does not get such a notification, and + * hence needs to be taken care of here. 
+ */ +if ( system_state == SYS_STATE_boot ) init_pcpu(ops, cpu); else printk("%s: cpu %d not online yet, deferring initializatgion\n", ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] libxl: Domain destroy: fork
On Tue, Mar 17, 2015 at 09:30:59AM -0600, Jim Fehlig wrote: > From: Ian Jackson > > Call xc_domain_destroy in a subprocess. That allows us to do so > asynchronously, rather than blocking the whole process calling libxl. > > The changes in detail: > > * Provide an libxl__ev_child in libxl__domain_destroy_state, and >initialise it in libxl__domain_destroy. There is no possibility >to `clean up' a libxl__ev_child, but there is no need to clean it up, as >the control flow ensures that we only continue after the child has >exited. > > * Call libxl__ev_child_fork at the right point and put the call to >xc_domain_destroy and associated logging in the child. (The child >opens a new xenctrl handle because we mustn't use the parent's.) > > * Consequently, the success return path from domain_destroy_domid_cb >no longer calls dis->callback. Instead it simply returns. > > * We plumb the errno value through the child's exit status, if it >fits. This means we normally do the logging only in the parent. > > * Incidentally, we fix the bug that we were treating the return value >from xc_domain_destroy as an errno value when in fact it is a >return value from do_domctl (in this case, 0 or -1 setting errno). > > Signed-off-by: Ian Jackson > Reviewed-by: Jim Fehlig > Tested-by: Jim Fehlig Reviewed-by: Wei Liu One nit below. > --- [...] > +ctx->xch = xc_interface_open(ctx->lg,0,0); > +if (!ctx->xch) goto badchild; > + > +rc = xc_domain_destroy(ctx->xch, domid); > +if (rc < 0) goto badchild; > +_exit(0); > + > +badchild: > +if (errno > 0 && errno < 126) { > +_exit(errno); > +} else { > +LOGE(ERROR, > + "xc_domain_destroy failed for %d (with difficult errno value %d)", Indentation is wrong. Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] Running update-server-info on push
It seems some repos have this via the hooks/post-update.sample having been renamed to hooks/post-update, but a few don't. So I've done: xen@xenbits:~/git$ for i in rumpuser-xen.git mini-os.git libvirt.git ; do > mv -iv $i/hooks/post-update.sample $i/hooks/post-update > done `rumpuser-xen.git/hooks/post-update.sample' -> `rumpuser-xen.git/hooks/post-update' `mini-os.git/hooks/post-update.sample' -> `mini-os.git/hooks/post-update' `libvirt.git/hooks/post-update.sample' -> `libvirt.git/hooks/post-update' xen@xenbits:~/git$ for i in rumpuser-xen.git mini-os.git libvirt.git ; do > ( cd $i && git update-server-info ) > done xen@xenbits:~/git$ I did not investigate people/* or xenclient/*. This will explain the failure of flight 36502 which has yet to be posted. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/3] libxl: In domain death search, start search at first domid we want
On Tue, Mar 17, 2015 at 09:30:57AM -0600, Jim Fehlig wrote: > From: Ian Jackson > > When domain_death_xswatch_callback needed a further call to > xc_domain_getinfolist it would restart it with the last domain it > found rather than the first one it wants. > > If it only wants one it will also only ask for one domain. The result > would then be that it gets the previous domain again (ie, the previous > one to the one it wants), which still doesn't reveal the answer to the > question, and it would therefore loop again. > > It's completely unclear to me why I thought it was a good idea to > start the xc_domain_getinfolist with the last domain previously found > rather than the first one left un-confirmed. The code has been that > way since it was introduced. > > Instead, start each xc_domain_getinfolist at the next domain whose > status we need to check. > > We also need to move the test for !evg into the loop, we now need evg > to compute the arguments to getinfolist. > > Signed-off-by: Ian Jackson > Reported-by: Jim Fehlig > Reviewed-by: Jim Fehlig > Tested-by: Jim Fehlig Acked-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] libxl: Domain destroy: unlock userdata earlier
On Tue, Mar 17, 2015 at 09:30:58AM -0600, Jim Fehlig wrote: > From: Ian Jackson > > Unlock the userdata before we actually call xc_domain_destroy. This > leaves open the possibility that other libxl callers will see the > half-destroyed domain (with no devices, paused), but this is fine. > > Signed-off-by: Ian Jackson > CC: Wei Liu > Reviewed-by: Jim Fehlig > Tested-by: Jim Fehlig Acked-by: Wei Liu ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
> >> Additionally I think it should be considered whether the bitmap > >> approach of interpreting ->state is the right one, and we don't > >> instead want a clean 3-state (idle, sched, run) model. > > > > Could you elaborate a bit more please? As in three different unsigned int > > (or bool_t) that set in what state we are in? > > An enum { STATE_IDLE, STATE_SCHED, STATE_RUN }. Especially > if my comment above turns out to be wrong, you'd have no real > need for the SCHED and RUN flags to be set at the same time. I cobbled together what I believe is what you were thinking of. As you can see, to preserve the existing functionality such as being able to schedule N interrupt injections for the N interrupts we might get - I modified '->masked' to be an atomic counter. The end result is that we can still live-lock. Unless we: - Drop on the floor the injection of N interrupts and just deliver at most one per VMX_EXIT (and not bother with interrupts arriving when we are in the VMX handler). - Alter the softirq code slightly - to have a variant which will only iterate once over the pending softirq bits per call. (so save a copy of the bitmap on the stack when entering the softirq handler - and use that. We could also xor it against the current one to catch any non-duplicate bits being set that we should deal with). Here is the compile-tested, but not run-time tested patch. >From e7d8bcd7c5d32c520554a4ad69c4716246036002 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 17 Mar 2015 13:31:52 -0400 Subject: [RFC PATCH] dpci: Switch to tristate instead of bitmap *TODO*: - Writeup.
- Tests Suggested-by: Jan Beulich Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/io.c | 140 --- xen/include/xen/hvm/irq.h| 4 +- 2 files changed, 82 insertions(+), 62 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..663e104 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -30,42 +30,28 @@ static DEFINE_PER_CPU(struct list_head, dpci_list); /* - * These two bit states help to safely schedule, deschedule, and wait until - * the softirq has finished. - * - * The semantics behind these two bits is as follow: - * - STATE_SCHED - whoever modifies it has to ref-count the domain (->dom). - * - STATE_RUN - only softirq is allowed to set and clear it. If it has - * been set hvm_dirq_assist will RUN with a saved value of the - * 'struct domain' copied from 'pirq_dpci->dom' before STATE_RUN was set. - * - * The usual states are: STATE_SCHED(set) -> STATE_RUN(set) -> - * STATE_SCHED(unset) -> STATE_RUN(unset). - * - * However the states can also diverge such as: STATE_SCHED(set) -> - * STATE_SCHED(unset) -> STATE_RUN(set) -> STATE_RUN(unset). That means - * the 'hvm_dirq_assist' never run and that the softirq did not do any - * ref-counting. - */ - -enum { -STATE_SCHED, -STATE_RUN -}; - -/* * This can be called multiple times, but the softirq is only raised once. - * That is until the STATE_SCHED state has been cleared. The state can be - * cleared by: the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), - * or by 'pt_pirq_softirq_reset' (which will try to clear the state before + * That is until state is in init. The state can be changed by: + * the 'dpci_softirq' (when it has executed 'hvm_dirq_assist'), + * or by 'pt_pirq_softirq_reset' (which will try to init the state before * the softirq had a chance to run). 
*/ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; -if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) +switch ( cmpxchg(&pirq_dpci->state, STATE_INIT, STATE_SCHED) ) +{ +case STATE_RUN: +case STATE_SCHED: +/* + * The pirq_dpci->mapping has been incremented to let us know + * how many we have left to do. + */ return; +case STATE_INIT: +break; +} get_knownalive_domain(pirq_dpci->dom); @@ -85,7 +71,7 @@ static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) */ bool_t pt_pirq_softirq_active(struct hvm_pirq_dpci *pirq_dpci) { -if ( pirq_dpci->state & ((1 << STATE_RUN) | (1 << STATE_SCHED)) ) +if ( pirq_dpci->state != STATE_INIT ) return 1; /* @@ -109,22 +95,22 @@ static void pt_pirq_softirq_reset(struct hvm_pirq_dpci *pirq_dpci) ASSERT(spin_is_locked(&d->event_lock)); -switch ( cmpxchg(&pirq_dpci->state, 1 << STATE_SCHED, 0) ) +switch ( cmpxchg(&pirq_dpci->state, STATE_SCHED, STATE_INIT) ) { -case (1 << STATE_SCHED): +case STATE_SCHED: /* - * We are going to try to de-schedule the softirq before it goes in - * STATE_RUN. Whoever clears STATE_SCHED MUST refcount the 'dom'. + * We are going to try to de-schedule the softirq before it goes to + * running state. Whoever moves from
Re: [Xen-devel] [PATCH 2/2] VT-d: extend XSA-59 workaround to XeonE5 v3 (Haswell)
Note that the following Haswell chipsets should also be included in this list: Haswell - 0xc0f, 0xd00, 0xd04, 0xd08, 0xd0f, 0xa00, 0xa08, 0xa0f -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, December 19, 2014 1:42 AM To: xen-devel Cc: Dugger, Donald D; Tian, Kevin; Zhang, Yang Z Subject: [PATCH 2/2] VT-d: extend XSA-59 workaround to XeonE5 v3 (Haswell) Note that the datasheet lacks PCI IDs for Dev 1 Fn 0-1, so their IDs are being added based on what https://pci-ids.ucw.cz/read/PC/8086 says. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -431,6 +431,7 @@ void pci_vtd_quirk(const struct pci_dev * - Potential security issue if malicious guest trigger VT-d faults. */ case 0x0e28: /* Xeon-E5v2 (IvyBridge) */ +case 0x2f28: /* Xeon-E5v3 (Haswell) */ case 0x342e: /* Tylersburg chipset (Nehalem / Westmere systems) */ case 0x3728: /* Xeon C5500/C3500 (JasperForest) */ case 0x3c28: /* Sandybridge */ @@ -443,6 +444,9 @@ void pci_vtd_quirk(const struct pci_dev /* Xeon E5/E7 v2 */ case 0x0e00: /* host bridge */ case 0x0e01: case 0x0e04 ... 0x0e0b: /* root ports */ +/* Xeon E5 v3 */ +case 0x2f00: /* host bridge */ +case 0x2f01 ... 0x2f0b: /* root ports */ /* Tylersburg (EP)/Boxboro (MP) chipsets (NHM-EP/EX, WSM-EP/EX) */ case 0x3400 ... 0x3407: /* host bridges */ case 0x3408 ... 0x3411: case 0x3420 ... 0x3421: /* root ports */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 1/2] VT-d: make XSA-59 workaround fully cover XeonE5/E7 v2
Note that the following Nehalem/Westmere chipsets should be included in this list: Nehalem - 0x40, 0x2c01, 0x2c41, 0x313x Westmere - 0x2c70, 0x2d81, 0xd15x -- Don Dugger "Censeo Toto nos in Kansa esse decisse." - D. Gale Ph: 303/443-3786 -Original Message- From: Jan Beulich [mailto:jbeul...@suse.com] Sent: Friday, December 19, 2014 1:41 AM To: xen-devel Cc: Dugger, Donald D; Tian, Kevin; Zhang, Yang Z Subject: [PATCH 1/2] VT-d: make XSA-59 workaround fully cover XeonE5/E7 v2 So far only the VT-d UR masking was being done for them. Signed-off-by: Jan Beulich --- a/xen/drivers/passthrough/vtd/quirks.c +++ b/xen/drivers/passthrough/vtd/quirks.c @@ -440,6 +440,9 @@ void pci_vtd_quirk(const struct pci_dev seg, bus, dev, func); break; +/* Xeon E5/E7 v2 */ +case 0x0e00: /* host bridge */ +case 0x0e01: case 0x0e04 ... 0x0e0b: /* root ports */ /* Tylersburg (EP)/Boxboro (MP) chipsets (NHM-EP/EX, WSM-EP/EX) */ case 0x3400 ... 0x3407: /* host bridges */ case 0x3408 ... 0x3411: case 0x3420 ... 0x3421: /* root ports */ ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] slightly reduce vm_assist code
At 15:55 + on 17 Mar (1426607705), Jan Beulich wrote: > - drop an effectively unused struct pv_vcpu field (x86) > - adjust VM_ASSIST() to prepend VMASST_TYPE_ > > Signed-off-by: Jan Beulich Reviewed-by: Tim Deegan , though I think these would have been better as two separate patches. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
On Tue, Mar 17, 2015 at 04:06:14PM +, Jan Beulich wrote: > >>> On 17.03.15 at 16:38, wrote: > > --- a/xen/drivers/passthrough/io.c > > +++ b/xen/drivers/passthrough/io.c > > @@ -804,7 +804,17 @@ static void dpci_softirq(void) > > d = pirq_dpci->dom; > > smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ > > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > > -BUG(); > > +{ > > +unsigned long flags; > > + > > +/* Put back on the list and retry. */ > > +local_irq_save(flags); > > +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); > > +local_irq_restore(flags); > > + > > +raise_softirq(HVM_DPCI_SOFTIRQ); > > +continue; > > +} > > As just said in another mail - unless there are convincing new > arguments in favor of this (more of a hack than a real fix), I'm > not going to accept it and instead consider reverting the > offending commit. Iirc the latest we had come to looked quite a > bit better than this one. The latest one (please see attached) would cause a deadlock iff, on the CPU where we are running the softirq, a do_IRQ comes in for the exact dpci we are in the process of executing. > > Jan > >From 6b32dccfbe00518d3ca9cd94d19a6e007b2645d9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 17 Mar 2015 09:46:09 -0400 Subject: [PATCH] dpci: when scheduling spin until STATE_RUN or STATE_SCHED has been cleared. There is a race when we clear the STATE_SCHED in the softirq - which allows the 'raise_softirq_for' (on another CPU) to schedule the dpci. Specifically this can happen when the other CPU receives an interrupt, calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same dpci structure). There would be two 'dpci_softirq' running at the same time (on different CPUs) where on one CPU it would be executing hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN) and on the other CPU it is trying to call: if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) BUG(); Since STATE_RUN is already set it would end badly. 
The reason we can get hit with this is when an interrupt affinity is set over multiple CPUs. Potential solutions: a) Instead of the BUG() we can put the dpci back on the per-cpu list to deal with later (when the softirqs are activated again). This putting of the 'dpci' back on the per-cpu list is a spin until the bad condition clears. b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for to detect the 'STATE_RUN' bit being set and schedule the dpci in a safer manner (delay it). The dpci would still not be scheduled when the STATE_SCHED bit was set. c) This patch explores a third option - we will only schedule the dpci when the state is cleared (no STATE_SCHED and no STATE_RUN). We will spin if STATE_RUN is set (as it is in progress and will finish). If STATE_SCHED is set (so it hasn't run yet) we won't try to spin and just exit. This can cause a deadlock if the interrupt comes when we are processing the dpci in the softirq. Interestingly, the old ('tasklet') code used the a) mechanism. If the function assigned to the tasklet was running - the softirq that ran said function (hvm_dirq_assist) would be responsible for putting the tasklet back on the per-cpu list. This would allow a running tasklet and a 'to-be-scheduled' tasklet to exist at the same time. This solution moves this 'to-be-scheduled' job to be done in 'raise_softirq_for' (instead of the 'softirq'). 
Reported-by: Sander Eikelenboom Reported-by: Malcolm Crossley Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/io.c | 28 +--- 1 file changed, 25 insertions(+), 3 deletions(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..9c30ebb 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -63,10 +63,32 @@ enum { static void raise_softirq_for(struct hvm_pirq_dpci *pirq_dpci) { unsigned long flags; +unsigned long old; -if ( test_and_set_bit(STATE_SCHED, &pirq_dpci->state) ) -return; - +/* + * This cmpxchg spins until the state is zero (unused). + */ +for ( ;; ) +{ +old = cmpxchg(&pirq_dpci->state, 0, 1 << STATE_SCHED); +switch ( old ) +{ +case (1 << STATE_SCHED): +/* + * Whenever STATE_SCHED is set we MUST not schedule it. + */ +return; +case (1 << STATE_RUN) | (1 << STATE_SCHED): +case (1 << STATE_RUN): +/* Getting close to finish. Spin. */ +continue; +} +/* + * If the 'state' is 0 (not in use) we can schedule it. + */ +if ( old == 0 ) +break; +} get_knownalive_domain(pirq_dpci->dom); local_irq_save(flags); -- 2.1.0 ___ Xen
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On 17/03/15 14:29, Wei Liu wrote: I've now successfully built QEMU upstream with rump kernel. However to make it fully functional as a stubdom, there are some missing pieces to be added in. 1. The ability to access QMP socket (a unix socket) from Dom0. That will be used to issue command to QEMU. 2. The ability to access files in Dom0. That will be used to write to / read from QEMU state file. There's a way to map file access to rump kernel hypercalls with a facility called etfs (extra-terrestrial file system). In fact, the current implementation for accessing the Xen block device from the rump kernel is done using etfs (... historical reasons, I'd have to go back 5+ years to explain why it doesn't attach as a regular block device). etfs isn't a file system, e.g. it doesn't allow listing files or removing them, but it does give you complete control of what happens when data is read or written for /some/path. But based on the other posts, sounds like it might be enough for what you need. See: http://man.netbsd.org/cgi-bin/man-cgi?rump_etfs++NetBSD-current 3. The building process requires mini-os headers. That will be used to build libxc (the controlling library). That's not really a problem, though I do want to limit the amount of interface we claim to support with rump kernels. For example, ISTR you mentioned on irc you'd like to use minios wait.h. It would be better to use pthread synchronization instead of minios synchronization. That way, if we do have a need to change the underlying threading in the future, you won't run into trouble. So, we should just determine what is actually needed and expose those bits by default. One of my lessons learned from the existing stubdom stuffs is that I should work with upstream and produce maintainable code. So before I do anything for real I'd better consult the community. My gut feeling is that the first two requirements are not really Xen specific. Let me know what you guys plan and think. Yes, please. 
If there's something silly going on, it's most likely due to: 1) we didn't get that far in our experiments and weren't aware of it 2) we were aware, but some bits were even sillier, taking priority Either way, a real need is a definite reason to expedite fixing. - antti ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN()
At 15:56 + on 17 Mar (1426607770), Jan Beulich wrote: > Most callers have this available already, and the functions don't need > any vcpu specifics. > > Signed-off-by: Jan Beulich Reviewed-by: Tim Deegan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
On Tue, Mar 17, 2015 at 04:01:49PM +, Jan Beulich wrote: > >>> On 17.03.15 at 15:54, wrote: > > On Tue, Mar 17, 2015 at 09:42:21AM +0100, Sander Eikelenboom wrote: > >> I'm still running with this first simple stopgap patch from Konrad, > >> and it has worked fine for me since. > > > > I believe the patch that Sander and Malcom had been running is the best > > candidate. > > That's the one Sander had quoted I suppose? I don't think this is Correct. > any better in terms of live locking, and we went quite some hoops > to get to something that looked more like a fix than a quick > workaround. (If there's nothing we can agree to, we'll have to > revert as we did for 4.5.) The live-locking does get broken (other softirqs get activated which moves things along). Keep in mind that the live-locking scenario exists already in Xen 4.x with the tasklet implementation. > > Jan > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 2/3] tools/libxl/libxl_cpuid.c: Fix leak of resstr on error path
On Mon, Mar 16, 2015 at 10:06:17AM +, PRAMOD DEVENDRA wrote: > From: Pramod Devendra > > Signed-off-by: Pramod Devendra > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu Acked-by: Wei Liu > --- > tools/libxl/libxl_cpuid.c |8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/tools/libxl/libxl_cpuid.c b/tools/libxl/libxl_cpuid.c > index b0bdc9d..c66e912 100644 > --- a/tools/libxl/libxl_cpuid.c > +++ b/tools/libxl/libxl_cpuid.c > @@ -223,9 +223,6 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list > *cpuid, const char* str) > } > entry = cpuid_find_match(cpuid, flag->leaf, flag->subleaf); > resstr = entry->policy[flag->reg - 1]; > -if (resstr == NULL) { > -resstr = strdup(""); > -} Minor nit. I would prefer "resstr = " be grouped with the code you moved. No need to resend though. Wei. > num = strtoull(val, &endptr, 0); > flags[flag->length] = 0; > if (endptr != val) { > @@ -242,6 +239,11 @@ int libxl_cpuid_parse_config(libxl_cpuid_policy_list > *cpuid, const char* str) > return 3; > } > } > + > +if (resstr == NULL) { > +resstr = strdup(""); > +} > + > /* the family and model entry is potentially split up across > * two fields in Fn_0001_EAX, so handle them here separately. > */ > -- > 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] libxc/xentrace: Replace xc_tbuf_set_cpu_mask with CPU mask with xc_cpumap_t instead of uint32_t
On Tue, Mar 17, 2015 at 01:52:27PM +, George Dunlap wrote: > On 03/13/2015 08:37 PM, Konrad Rzeszutek Wilk wrote: > > +static int parse_cpumask(const char *arg) > > +{ > > +xc_cpumap_t map; > > +uint32_t v, i; > > +int bits = 0; > > + > > +map = malloc(sizeof(uint32_t)); > > +if ( !map ) > > +return -ENOMEM; > > + > > +v = argtol(arg, 0); > > +for ( i = 0; i < sizeof(uint32_t) ; i++ ) > > +map[i] = (v >> (i * 8)) & 0xff; > > + > > +for ( i = 0; v; v >>= 1) > > +bits += v & 1; > > Uum, it looks like this is counting the 1-bits in v, not the total > number of bist. So "0x8000" would finish with bits == 1 ; but we would > this to finish with bits == 16, don't we? Duh! It should be: for ( bits = 0; v; v >>= 1 ) bits ++; And the 'int bits = 0' can now be 'int bits'. See patch: >From aa8a0ddc295161f55531c7f5ac643aadbfe70917 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Fri, 20 Jun 2014 15:34:53 -0400 Subject: [PATCH] libxc/xentrace: Replace xc_tbuf_set_cpu_mask with CPU mask with xc_cpumap_t instead of uint32_t We replace the implementation of xc_tbuf_set_cpu_mask with an xc_cpumap_t instead of a uint32_t. This means we can use an arbitrary bitmap without being limited to the 32-bits as previously we were. Furthermore since there is only one user of xc_tbuf_set_cpu_mask we just replace it and its user in one go. We also add an macro which can be used by both libxc and xentrace. And update the man page to describe this behavior. Signed-off-by: Konrad Rzeszutek Wilk Acked-by: Ian Campbell [libxc pieces] [v2: Fix up the bit mask counting. 
--- tools/libxc/include/xenctrl.h | 7 ++- tools/libxc/xc_tbuf.c | 26 +++ tools/xentrace/xentrace.8 | 3 ++ tools/xentrace/xentrace.c | 106 -- 4 files changed, 116 insertions(+), 26 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index df18292..713e52b 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -1534,6 +1534,11 @@ int xc_availheap(xc_interface *xch, int min_width, int max_width, int node, */ /** + * Useful macro for converting byte arrays to bitmaps. + */ +#define XC_DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) + +/** * xc_tbuf_enable - enable tracing buffers * * @parm xch a handle to an open hypervisor interface @@ -1574,7 +1579,7 @@ int xc_tbuf_set_size(xc_interface *xch, unsigned long size); */ int xc_tbuf_get_size(xc_interface *xch, unsigned long *size); -int xc_tbuf_set_cpu_mask(xc_interface *xch, uint32_t mask); +int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask, int bits); int xc_tbuf_set_evt_mask(xc_interface *xch, uint32_t mask); diff --git a/tools/libxc/xc_tbuf.c b/tools/libxc/xc_tbuf.c index 8777492..d54da8a 100644 --- a/tools/libxc/xc_tbuf.c +++ b/tools/libxc/xc_tbuf.c @@ -113,15 +113,23 @@ int xc_tbuf_disable(xc_interface *xch) return tbuf_enable(xch, 0); } -int xc_tbuf_set_cpu_mask(xc_interface *xch, uint32_t mask) +int xc_tbuf_set_cpu_mask(xc_interface *xch, xc_cpumap_t mask, int bits) { DECLARE_SYSCTL; -DECLARE_HYPERCALL_BUFFER(uint8_t, bytemap); +DECLARE_HYPERCALL_BOUNCE(mask, XC_DIV_ROUND_UP(bits, 8), XC_HYPERCALL_BUFFER_BOUNCE_IN); int ret = -1; -uint64_t mask64 = mask; +int local_bits; -bytemap = xc_hypercall_buffer_alloc(xch, bytemap, sizeof(mask64)); -if ( bytemap == NULL ) +if ( bits <= 0 ) +goto out; + +local_bits = xc_get_max_cpus(xch); +if ( bits > local_bits ) +{ +PERROR("Wrong amount of bits supplied: %d > %d!\n", bits, local_bits); +goto out; +} +if ( xc_hypercall_bounce_pre(xch, mask) ) { PERROR("Could not allocate memory for xc_tbuf_set_cpu_mask 
hypercall"); goto out; @@ -131,14 +139,12 @@ int xc_tbuf_set_cpu_mask(xc_interface *xch, uint32_t mask) sysctl.interface_version = XEN_SYSCTL_INTERFACE_VERSION; sysctl.u.tbuf_op.cmd = XEN_SYSCTL_TBUFOP_set_cpu_mask; -bitmap_64_to_byte(bytemap, &mask64, sizeof (mask64) * 8); - -set_xen_guest_handle(sysctl.u.tbuf_op.cpu_mask.bitmap, bytemap); -sysctl.u.tbuf_op.cpu_mask.nr_bits = sizeof(bytemap) * 8; +set_xen_guest_handle(sysctl.u.tbuf_op.cpu_mask.bitmap, mask); +sysctl.u.tbuf_op.cpu_mask.nr_bits = bits; ret = do_sysctl(xch, &sysctl); -xc_hypercall_buffer_free(xch, bytemap); +xc_hypercall_bounce_post(xch, mask); out: return ret; diff --git a/tools/xentrace/xentrace.8 b/tools/xentrace/xentrace.8 index ac18e9f..c176a96 100644 --- a/tools/xentrace/xentrace.8 +++ b/tools/xentrace/xentrace.8 @@ -38,6 +38,9 @@ for new data. .TP .B -c, --cpu-mask=c set bitmask of CPUs to trace. It is limited to 32-bits. +If not specified, the cpu-mask of all of the available CPUs will be +constructed. + .TP .B -e, --evt-mask=e set event capture mask. If not specified
Re: [Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
>>> On 17.03.15 at 16:38, wrote: > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -804,7 +804,17 @@ static void dpci_softirq(void) > d = pirq_dpci->dom; > smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > -BUG(); > +{ > +unsigned long flags; > + > +/* Put back on the list and retry. */ > +local_irq_save(flags); > +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); > +local_irq_restore(flags); > + > +raise_softirq(HVM_DPCI_SOFTIRQ); > +continue; > +} As just said in another mail - unless there are convincing new arguments in favor of this (more of a hack than a real fix), I'm not going to accept it and instead consider reverting the offending commit. Iirc the latest we had come to looked quite a bit better than this one. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 2015-03-17 at 14:29 +, Wei Liu wrote: > 2. The ability to access files in Dom0. That will be used to write to / >read from QEMU state file. This requirement is not as broad as you make it sound. All which is really required is the ability to slurp in or write out a blob of bytes to a service running in a control domain, not actual ability to read/write files in dom0 (which would need careful security consideration!). For the old qemu-traditional stubdom for example this is implemented as a pair of console devices (one r/o for restore + one w/o for save) which are setup by the toolstack at start of day and pre-plumbed into two temporary files. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 11/14] x86/VPMU: Handle PMU interrupts for PV guests
Add support for handling PMU interrupts for PV guests. VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush hypercall. This allows the guest to access PMU MSR values that are stored in VPMU context which is shared between hypervisor and domain, thus avoiding traps to hypervisor. Since the interrupt handler may now force VPMU context save (i.e. set VPMU_CONTEXT_SAVE flag) we need to make changes to amd_vpmu_save() which until now expected this flag to be set only when the counters were stopped. Signed-off-by: Boris Ostrovsky Acked-by: Daniel De Graaf --- Changes in v19: * Adjusted for new ops interfaces (passing vcpu vs. vpmu) * Test for domain->max_cpu in choose_hwdom_vcpu() instead of 'domain->vcpu!=NULL' * Replaced '!(vpmu_mode & (XENPMU_MODE_SELF | XENPMU_MODE_HV))' test with 'vpmu_mode == XENPMU_MODE_OFF' in vpmu_rd/wrmsr() (to make more logical diff in patch#13) xen/arch/x86/hvm/svm/vpmu.c | 11 +- xen/arch/x86/hvm/vpmu.c | 211 -- xen/include/public/arch-x86/pmu.h | 6 ++ xen/include/public/pmu.h | 2 + xen/include/xsm/dummy.h | 4 +- xen/xsm/flask/hooks.c | 2 + 6 files changed, 216 insertions(+), 20 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 474d0db..0997901 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -228,17 +228,12 @@ static int amd_vpmu_save(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); unsigned int i; -/* - * Stop the counters. If we came here via vpmu_save_force (i.e. - * when VPMU_CONTEXT_SAVE is set) counters are already stopped. 
- */ +for ( i = 0; i < num_counters; i++ ) +wrmsrl(ctrls[i], 0); + if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) ) { vpmu_set(vpmu, VPMU_FROZEN); - -for ( i = 0; i < num_counters; i++ ) -wrmsrl(ctrls[i], 0); - return 0; } diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 26eda34..c287d8b 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -87,31 +87,57 @@ static void __init parse_vpmu_param(char *s) void vpmu_lvtpc_update(uint32_t val) { struct vpmu_struct *vpmu; +struct vcpu *curr; if ( vpmu_mode == XENPMU_MODE_OFF ) return; -vpmu = vcpu_vpmu(current); +curr = current; +vpmu = vcpu_vpmu(curr); vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED); -apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); + +/* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */ +if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data || + !(vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) +apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); } int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported) { -struct vpmu_struct *vpmu = vcpu_vpmu(current); +struct vcpu *curr = current; +struct vpmu_struct *vpmu; if ( vpmu_mode == XENPMU_MODE_OFF ) return 0; +vpmu = vcpu_vpmu(curr); if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) -return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported); +{ +int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported); + +/* + * We may have received a PMU interrupt during WRMSR handling + * and since do_wrmsr may load VPMU context we should save + * (and unload) it again. 
+ */ +if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && + (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) +{ +vpmu_set(vpmu, VPMU_CONTEXT_SAVE); +vpmu->arch_vpmu_ops->arch_vpmu_save(curr); +vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); +} +return ret; +} + return 0; } int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) { -struct vpmu_struct *vpmu = vcpu_vpmu(current); +struct vcpu *curr = current; +struct vpmu_struct *vpmu; if ( vpmu_mode == XENPMU_MODE_OFF ) { @@ -119,24 +145,163 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) return 0; } +vpmu = vcpu_vpmu(curr); if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) -return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); +{ +int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); + +if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && + (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) +{ +vpmu_set(vpmu, VPMU_CONTEXT_SAVE); +vpmu->arch_vpmu_ops->arch_vpmu_save(curr); +vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); +} +return ret; +} else *msr_content = 0; return 0; } +static inline struct vcpu *choose_hwdom_vcpu(void) +{ +unsigned idx; + +if
[Xen-devel] [PATCH 0/3] libxl: Fixes from Ian Jackson
This is a small series of libxl patches I received off-list from Ian Jackson. The patches fix a few issues I found when converting the libvirt libxl driver to use a single libxl_ctx. Patch 2 has been modified slightly to address off-list comments from Wei Liu. Ian Jackson (3): libxl: In domain death search, start search at first domid we want libxl: Domain destroy: unlock userdata earlier libxl: Domain destroy: fork tools/libxl/libxl.c | 77 +++- tools/libxl/libxl_internal.h | 1 + 2 files changed, 63 insertions(+), 15 deletions(-) -- 1.8.0.1
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
>>> On 17.03.15 at 15:54, wrote: > On Tue, Mar 17, 2015 at 09:42:21AM +0100, Sander Eikelenboom wrote: >> I'm still running with this first simple stopgap patch from Konrad, >> and it has worked fine for me since. > > I believe the patch that Sander and Malcolm had been running is the best > candidate. That's the one Sander had quoted I suppose? I don't think this is any better in terms of live locking, and we went through quite some hoops to get to something that looked more like a fix than a quick workaround. (If there's nothing we can agree to, we'll have to revert as we did for 4.5.) Jan
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 2015-03-17 at 15:27 +, Wei Liu wrote: > This looks most interesting as it implies we can easily pipe a console > to it. BTW, rather than raw consoles we should probably consider using the channel extension: http://xenbits.xen.org/docs/unstable/misc/channel.txt Ian.
[Xen-devel] [PATCH v2 5/5] xen: sched_rt: print useful affinity info when dumping
In fact, printing the cpupool's CPU online mask for each vCPU is just redundant, as that is the same for all the vCPUs of all the domains in the same cpupool, while hard affinity is already part of the output of dumping domains info. Instead, print the intersection between hard affinity and online CPUs, which is --in case of this scheduler-- the effective affinity always used for the vCPUs. This change also takes the chance to add a scratch cpumask area, to avoid having to either put one (more) cpumask_t on the stack, or dynamically allocate it within the dumping routine. (The former being bad because hypervisor stack size is limited, the latter because dynamic allocations can fail, if the hypervisor was built for a large enough number of CPUs.) Such scratch area can be used to kill most of the cpumasks{_var}_t local variables in other functions in the file, but that is *NOT* done in this change. Finally, convert the file to use keyhandler scratch, instead of open coded string buffers. Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Meng Xu Cc: Jan Beulich Cc: Keir Fraser --- Changes from v1: * improved changelog; * made a local variable to point to the correct scratch mask, as suggested during review. --- xen/common/sched_rt.c | 42 +- 1 file changed, 33 insertions(+), 9 deletions(-) diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index 7c39a9e..ec28956 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -124,6 +124,12 @@ #define TRC_RTDS_BUDGET_REPLENISH TRC_SCHED_CLASS_EVT(RTDS, 4) #define TRC_RTDS_SCHED_TASKLETTRC_SCHED_CLASS_EVT(RTDS, 5) + /* + * Useful to avoid too many cpumask_var_t on the stack. 
+ */ +static cpumask_t **_cpumask_scratch; +#define cpumask_scratch _cpumask_scratch[smp_processor_id()] + /* * Systme-wide private data, include global RunQueue/DepletedQ * Global lock is referenced by schedule_data.schedule_lock from all @@ -218,8 +224,7 @@ __q_elem(struct list_head *elem) static void rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc) { -char cpustr[1024]; -cpumask_t *cpupool_mask; +cpumask_t *cpupool_mask, *mask; ASSERT(svc != NULL); /* idle vcpu */ @@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc) return; } -cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity); +/* + * We can't just use 'cpumask_scratch' because the dumping can + * happen from a pCPU outside of this scheduler's cpupool, and + * hence it's not right to use the pCPU's scratch mask (which + * may even not exist!). On the other hand, it is safe to use + * svc->vcpu->processor's own scratch space, since we hold the + * runqueue lock. 
+ */ +mask = _cpumask_scratch[svc->vcpu->processor]; + +cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool); +cpumask_and(mask, cpupool_mask, svc->vcpu->cpu_hard_affinity); +cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), mask); printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime")," " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n" - " \t\t onQ=%d runnable=%d cpu_hard_affinity=%s ", + " \t\t onQ=%d runnable=%d flags=%x effective hard_affinity=%s\n", svc->vcpu->domain->domain_id, svc->vcpu->vcpu_id, svc->vcpu->processor, @@ -243,11 +260,8 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc) svc->last_start, __vcpu_on_q(svc), vcpu_runnable(svc->vcpu), -cpustr); -memset(cpustr, 0, sizeof(cpustr)); -cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool); -cpumask_scnprintf(cpustr, sizeof(cpustr), cpupool_mask); -printk("cpupool=%s\n", cpustr); +svc->flags, +keyhandler_scratch); } static void @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops) if ( prv == NULL ) return -ENOMEM; +_cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids); +if ( _cpumask_scratch == NULL ) +return -ENOMEM; + spin_lock_init(&prv->lock); INIT_LIST_HEAD(&prv->sdom); INIT_LIST_HEAD(&prv->runq); @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops) { struct rt_private *prv = rt_priv(ops); +xfree(_cpumask_scratch); xfree(prv); } @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu) per_cpu(schedule_data, cpu).schedule_lock = &prv->lock; spin_unlock_irqrestore(&prv->lock, flags); +if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) ) +return NULL; + /* 1 indicates alloc. succeed in schedule.c */ return (void *)1; } @@ -462,6 +484,8 @@ rt_free_pdata(const struct scheduler *ops, void *pcpu, int cpu) sd->schedule_lock = &sd->_lock; spin_unlock_irqrestore(&prv->lock, flag
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 17 Mar 2015, Anthony PERARD wrote: > On Tue, Mar 17, 2015 at 02:29:07PM +, Wei Liu wrote: > > I've now successfully built QEMU upstream with rump kernel. However to > > make it fully functional as a stubdom, there are some missing pieces to > > be added in. > > > > 1. The ability to access QMP socket (a unix socket) from Dom0. That > >will be used to issue command to QEMU. > > The QMP "socket" does not needs to be a unix socket. It can be any of > those (from qemu --help): > Character device options: > -chardev null,id=id[,mux=on|off] > -chardev > socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds] > [,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (tcp) > -chardev > socket,id=id,path=path[,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] > (unix) > -chardev udp,id=id[,host=host],port=port[,localaddr=localaddr] > [,localport=localport][,ipv4][,ipv6][,mux=on|off] > -chardev msmouse,id=id[,mux=on|off] > -chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]] > [,mux=on|off] > -chardev ringbuf,id=id[,size=size] > -chardev file,id=id,path=path[,mux=on|off] > -chardev pipe,id=id,path=path[,mux=on|off] > -chardev pty,id=id[,mux=on|off] > -chardev stdio,id=id[,mux=on|off][,signal=on|off] > -chardev serial,id=id,path=path[,mux=on|off] > -chardev tty,id=id,path=path[,mux=on|off] > -chardev parallel,id=id,path=path[,mux=on|off] > -chardev parport,id=id,path=path[,mux=on|off] > -chardev spicevmc,id=id,name=name[,debug=debug] > -chardev spiceport,id=id,name=name[,debug=debug] > > > 2. The ability to access files in Dom0. That will be used to write to / > >read from QEMU state file. > > To save a QEMU state (write), we do use a filename. But I guest we could > expand the QMP command (xen-save-devices-state) to use something else, if > it's easier. > > To restore, we provide a file descriptor from libxl to QEMU, with the fd on > the file that contain the state we want to restore. 
But there are a few > other ways to load a state (from qemu.git/docs/migration.txt): > - tcp migration: do the migration using tcp sockets > - unix migration: do the migration using unix sockets > - exec migration: do the migration using the stdin/stdout through a process. > - fd migration: do the migration using a file descriptor that is > passed to QEMU. QEMU doesn't care how this file descriptor is opened. QEMU would definitely be happy if we started using fds instead of files to save/restore the state on Xen.
[Xen-devel] [PATCH v2 0/5] Improving dumping of scheduler related info
Take 2. Some of the patches have been checked-in already, so here's what's remaining: - fix a bug in the RTDS scheduler (patch 1), - improve how the whole process of dumping scheduling info is serialized, by moving all locking code into specific schedulers (patch 2), - print more useful scheduling related information (patches 3, 4 and 5). Git branch here: git://xenbits.xen.org/people/dariof/xen.git rel/sched/dump-v2 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/dump-v2 I think I addressed all the comments raised upon v1. More details in the changelogs of the various patches. Thanks and Regards, Dario --- Dario Faggioli (5): xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains xen: rework locking for dump of scheduler info (debug-key r) xen: print online pCPUs and free pCPUs when dumping xen: sched_credit2: more info when dumping xen: sched_rt: print useful affinity info when dumping xen/common/cpupool.c | 12 + xen/common/sched_credit.c | 42 ++- xen/common/sched_credit2.c | 53 +--- xen/common/sched_rt.c | 59 xen/common/sched_sedf.c| 16 xen/common/schedule.c |5 +--- 6 files changed, 157 insertions(+), 30 deletions(-)
[Xen-devel] [PATCH v2 1/5] xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains
being serviced by the RTDS scheduler, as that is a legit situation to be in: think, for instance, of a newly created RTDS cpupool, with no domains migrated to it yet. While there: - move the spinlock acquisition up, to effectively protect the domain list and avoid races; - the mask of online pCPUs was being retrieved but then not used anywhere in the function: get rid of that. Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Meng Xu Cc: Jan Beulich Cc: Keir Fraser Reviewed-by: Meng Xu Acked-by: George Dunlap --- Changes from v1: * updated the changelog as requested during review; * fixed coding style, as requested during review; * fixed label indentation, as requested during review. --- xen/common/sched_rt.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index ffc5107..2b0b7c6 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -264,18 +264,17 @@ rt_dump(const struct scheduler *ops) struct list_head *iter_sdom, *iter_svc, *runq, *depletedq, *iter; struct rt_private *prv = rt_priv(ops); struct rt_vcpu *svc; -cpumask_t *online; struct rt_dom *sdom; unsigned long flags; -ASSERT(!list_empty(&prv->sdom)); +spin_lock_irqsave(&prv->lock, flags); + +if ( list_empty(&prv->sdom) ) +goto out; -sdom = list_entry(prv->sdom.next, struct rt_dom, sdom_elem); -online = cpupool_scheduler_cpumask(sdom->dom->cpupool); runq = rt_runq(ops); depletedq = rt_depletedq(ops); -spin_lock_irqsave(&prv->lock, flags); printk("Global RunQueue info:\n"); list_for_each( iter, runq ) { @@ -303,6 +302,7 @@ rt_dump(const struct scheduler *ops) } } + out: spin_unlock_irqrestore(&prv->lock, flags); }
Re: [Xen-devel] [PATCH 1/3] tools/libxl/libxl_qmp.c: Make sure sun_path is NULL terminated in qmp_open
On Mon, Mar 16, 2015 at 10:05:38AM +, PRAMOD DEVENDRA wrote: > From: Pramod Devendra > > Signed-off-by: Pramod Devendra > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu > --- > tools/libxl/libxl_qmp.c |5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl_qmp.c b/tools/libxl/libxl_qmp.c > index c7324e6..1080162 100644 > --- a/tools/libxl/libxl_qmp.c > +++ b/tools/libxl/libxl_qmp.c > @@ -369,10 +369,13 @@ static int qmp_open(libxl__qmp_handler *qmp, const char > *qmp_socket_path, > ret = libxl_fd_set_cloexec(qmp->ctx, qmp->qmp_fd, 1); > if (ret) return -1; > > +if(sizeof (qmp->addr.sun_path) <= strlen(qmp_socket_path)) > +return -1; > + I know this is not your fault, but the function seems to leak qmp_fd on error path (qmp_fd is not closed). Do you fancy fixing that? Wei. > memset(&qmp->addr, 0, sizeof (qmp->addr)); > qmp->addr.sun_family = AF_UNIX; > strncpy(qmp->addr.sun_path, qmp_socket_path, > -sizeof (qmp->addr.sun_path)); > +sizeof (qmp->addr.sun_path)-1); > > do { > ret = connect(qmp->qmp_fd, (struct sockaddr *) &qmp->addr, > -- > 1.7.10.4
Re: [Xen-devel] [PATCH 3/3] tools/libxc/xc_linux_osdep.c: Don't leak mmap() mapping on map_foreign_bulk() error path
On Mon, Mar 16, 2015 at 10:06:50AM +, PRAMOD DEVENDRA wrote: > From: Pramod Devendra > > Signed-off-by: Pramod Devendra > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu Acked-by: Wei Liu > --- > tools/libxc/xc_linux_osdep.c |1 + > 1 file changed, 1 insertion(+) > > diff --git a/tools/libxc/xc_linux_osdep.c b/tools/libxc/xc_linux_osdep.c > index b6c435a..ce59590 100644 > --- a/tools/libxc/xc_linux_osdep.c > +++ b/tools/libxc/xc_linux_osdep.c > @@ -323,6 +323,7 @@ static void *linux_privcmd_map_foreign_bulk(xc_interface > *xch, xc_osdep_handle h > if ( pfn == MAP_FAILED ) > { > PERROR("xc_map_foreign_bulk: mmap of pfn array failed"); > +(void)munmap(addr, (unsigned long)num << XC_PAGE_SHIFT); > return NULL; > } > } > -- > 1.7.10.4
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, Mar 17, 2015 at 03:15:17PM +, Anthony PERARD wrote: > On Tue, Mar 17, 2015 at 02:29:07PM +, Wei Liu wrote: > > I've now successfully built QEMU upstream with rump kernel. However to > > make it fully functional as a stubdom, there are some missing pieces to > > be added in. > > > > 1. The ability to access QMP socket (a unix socket) from Dom0. That > >will be used to issue command to QEMU. > > The QMP "socket" does not needs to be a unix socket. It can be any of > those (from qemu --help): > Character device options: > -chardev null,id=id[,mux=on|off] > -chardev > socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds] > [,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (tcp) > -chardev > socket,id=id,path=path[,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] > (unix) > -chardev udp,id=id[,host=host],port=port[,localaddr=localaddr] > [,localport=localport][,ipv4][,ipv6][,mux=on|off] > -chardev msmouse,id=id[,mux=on|off] > -chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]] > [,mux=on|off] > -chardev ringbuf,id=id[,size=size] > -chardev file,id=id,path=path[,mux=on|off] > -chardev pipe,id=id,path=path[,mux=on|off] > -chardev pty,id=id[,mux=on|off] > -chardev stdio,id=id[,mux=on|off][,signal=on|off] > -chardev serial,id=id,path=path[,mux=on|off] > -chardev tty,id=id,path=path[,mux=on|off] > -chardev parallel,id=id,path=path[,mux=on|off] > -chardev parport,id=id,path=path[,mux=on|off] > -chardev spicevmc,id=id,name=name[,debug=debug] > -chardev spiceport,id=id,name=name[,debug=debug] > Ha, thanks for the list. My brain was too locked in to the current implementation. So yes, we now have an array of possible transports at our disposal. > > 2. The ability to access files in Dom0. That will be used to write to / > >read from QEMU state file. > > To save a QEMU state (write), we do use a filename. 
But I guess we could > expand the QMP command (xen-save-devices-state) to use something else, if > it's easier. That's also an option. > To restore, we provide a file descriptor from libxl to QEMU, with the fd on > the file that contain the state we want to restore. But there are a few > other ways to load a state (from qemu.git/docs/migration.txt): > - tcp migration: do the migration using tcp sockets > - unix migration: do the migration using unix sockets > - exec migration: do the migration using the stdin/stdout through a process. This looks most interesting as it implies we can easily pipe a console to it. Wei. > - fd migration: do the migration using a file descriptor that is > passed to QEMU. QEMU doesn't care how this file descriptor is opened. > > -- > Anthony PERARD
[Xen-devel] [PATCH 2/3] libxl: Domain destroy: unlock userdata earlier
From: Ian Jackson Unlock the userdata before we actually call xc_domain_destroy. This leaves open the possibility that other libxl callers will see the half-destroyed domain (with no devices, paused), but this is fine. Signed-off-by: Ian Jackson CC: Wei Liu Reviewed-by: Jim Fehlig Tested-by: Jim Fehlig --- Addressed off-list comments from Wei Liu tools/libxl/libxl.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index e7eb863..b6541d4 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1636,7 +1636,7 @@ static void devices_destroy_cb(libxl__egc *egc, uint32_t domid = dis->domid; char *dom_path; char *vm_path; -libxl__domain_userdata_lock *lock = NULL; +libxl__domain_userdata_lock *lock; dom_path = libxl__xs_get_dompath(gc, domid); if (!dom_path) { @@ -1670,6 +1670,8 @@ static void devices_destroy_cb(libxl__egc *egc, } libxl__userdata_destroyall(gc, domid); +libxl__unlock_domain_userdata(lock); + rc = xc_domain_destroy(ctx->xch, domid); if (rc < 0) { LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_destroy failed for %d", domid); @@ -1679,7 +1681,6 @@ static void devices_destroy_cb(libxl__egc *egc, rc = 0; out: -if (lock) libxl__unlock_domain_userdata(lock); dis->callback(egc, dis, rc); return; } -- 1.8.0.1
[Xen-devel] [PATCH v2 2/5] xen: rework locking for dump of scheduler info (debug-key r)
such as it is taken care of by the various schedulers, rather than happening in schedule.c. In fact, it is the schedulers that know better which locks are necessary for the specific dumping operations. While there, fix a few style issues (indentation, trailing whitespace, parentheses and blank line after var declarations) Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Meng Xu Cc: Jan Beulich Cc: Keir Fraser Reviewed-by: Meng Xu --- Changes from v1: * take care of SEDF too, as requested during review; --- As far as tags are concerned, I kept Meng's 'Reviewed-by', as I think this applies mostly to chenges to sched_rt.c. I, OTOH, dropped George's one, to give him the chance to look at changes to sched_sedf.c. --- xen/common/sched_credit.c | 42 -- xen/common/sched_credit2.c | 40 xen/common/sched_rt.c |7 +-- xen/common/sched_sedf.c| 16 xen/common/schedule.c |5 ++--- 5 files changed, 95 insertions(+), 15 deletions(-) diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index bec67ff..953ecb0 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -26,6 +26,23 @@ /* + * Locking: + * - Scheduler-lock (a.k.a. runqueue lock): + * + is per-runqueue, and there is one runqueue per-cpu; + * + serializes all runqueue manipulation operations; + * - Private data lock (a.k.a. private scheduler lock): + * + serializes accesses to the scheduler global state (weight, + *credit, balance_credit, etc); + * + serializes updates to the domains' scheduling parameters. + * + * Ordering is "private lock always comes first": + * + if we need both locks, we must acquire the private + *scheduler lock for first; + * + if we already own a runqueue lock, we must never acquire + *the private scheduler lock. 
+ */ + +/* * Basic constants */ #define CSCHED_DEFAULT_WEIGHT 256 @@ -1750,11 +1767,24 @@ static void csched_dump_pcpu(const struct scheduler *ops, int cpu) { struct list_head *runq, *iter; +struct csched_private *prv = CSCHED_PRIV(ops); struct csched_pcpu *spc; struct csched_vcpu *svc; +spinlock_t *lock = lock; +unsigned long flags; int loop; #define cpustr keyhandler_scratch +/* + * We need both locks: + * - csched_dump_vcpu() wants to access domains' scheduling + * parameters, which are protected by the private scheduler lock; + * - we scan through the runqueue, so we need the proper runqueue + * lock (the one of the runqueue of this cpu). + */ +spin_lock_irqsave(&prv->lock, flags); +lock = pcpu_schedule_lock(cpu); + spc = CSCHED_PCPU(cpu); runq = &spc->runq; @@ -1781,6 +1811,9 @@ csched_dump_pcpu(const struct scheduler *ops, int cpu) csched_dump_vcpu(svc); } } + +pcpu_schedule_unlock(lock, cpu); +spin_unlock_irqrestore(&prv->lock, flags); #undef cpustr } @@ -1792,7 +1825,7 @@ csched_dump(const struct scheduler *ops) int loop; unsigned long flags; -spin_lock_irqsave(&(prv->lock), flags); +spin_lock_irqsave(&prv->lock, flags); #define idlers_buf keyhandler_scratch @@ -1835,15 +1868,20 @@ csched_dump(const struct scheduler *ops) list_for_each( iter_svc, &sdom->active_vcpu ) { struct csched_vcpu *svc; +spinlock_t *lock; + svc = list_entry(iter_svc, struct csched_vcpu, active_vcpu_elem); +lock = vcpu_schedule_lock(svc->vcpu); printk("\t%3d: ", ++loop); csched_dump_vcpu(svc); + +vcpu_schedule_unlock(lock, svc->vcpu); } } #undef idlers_buf -spin_unlock_irqrestore(&(prv->lock), flags); +spin_unlock_irqrestore(&prv->lock, flags); } static int diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index be6859a..ae9b359 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -51,8 +51,6 @@ * credit2 wiki page: * http://wiki.xen.org/wiki/Credit2_Scheduler_Development * TODO: - * + Immediate bug-fixes - * - Do per-runqueue, grab proper 
lock for dump debugkey * + Multiple sockets * - Detect cpu layout and make runqueue map, one per L2 (make_runq_map()) * - Simple load balancer / runqueue assignment @@ -1832,12 +1830,24 @@ csched2_dump_vcpu(struct csched2_vcpu *svc) static void csched2_dump_pcpu(const struct scheduler *ops, int cpu) { +struct csched2_private *prv = CSCHED2_PRIV(ops); struct list_head *runq, *iter; struct csched2_vcpu *svc; +unsigned long flags; +spinlock_t *lock; int loop; char cpustr[100]; -/* FIXME: Do locking properly for access to runqueue structures */ +/* + * We need both locks: + * - csched2_dump_vcpu() wants to access domains' scheduling + * parameters, which are protected by the private scheduler lock; + * - we sc
[Xen-devel] [PATCH 1/3] libxl: In domain death search, start search at first domid we want
From: Ian Jackson When domain_death_xswatch_callback needed a further call to xc_domain_getinfolist it would restart it with the last domain it found rather than the first one it wants. If it only wants one it will also only ask for one domain. The result would then be that it gets the previous domain again (ie, the previous one to the one it wants), which still doesn't reveal the answer to the question, and it would therefore loop again. It's completely unclear to me why I thought it was a good idea to start the xc_domain_getinfolist with the last domain previously found rather than the first one left un-confirmed. The code has been that way since it was introduced. Instead, start each xc_domain_getinfolist at the next domain whose status we need to check. We also need to move the test for !evg into the loop, we now need evg to compute the arguments to getinfolist. Signed-off-by: Ian Jackson Reported-by: Jim Fehlig Reviewed-by: Jim Fehlig Tested-by: Jim Fehlig --- tools/libxl/libxl.c | 15 +++ 1 file changed, 7 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index 088786e..e7eb863 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1168,22 +1168,20 @@ static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w, const char *wpath, const char *epath) { EGC_GC; libxl_evgen_domain_death *evg; -uint32_t domid; int rc; CTX_LOCK; evg = LIBXL_TAILQ_FIRST(&CTX->death_list); -if (!evg) goto out; - -domid = evg->domid; for (;;) { +if (!evg) goto out; + int nentries = LIBXL_TAILQ_NEXT(evg, entry) ? 
200 : 1; xc_domaininfo_t domaininfos[nentries]; const xc_domaininfo_t *got = domaininfos, *gotend; -rc = xc_domain_getinfolist(CTX->xch, domid, nentries, domaininfos); +rc = xc_domain_getinfolist(CTX->xch, evg->domid, nentries, domaininfos); if (rc == -1) { LIBXL__EVENT_DISASTER(egc, "xc_domain_getinfolist failed while" " processing @releaseDomain watch event", @@ -1193,8 +1191,10 @@ static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w, gotend = &domaininfos[rc]; LIBXL__LOG(CTX, LIBXL__LOG_DEBUG, "[evg=%p:%"PRIu32"]" - " from domid=%"PRIu32" nentries=%d rc=%d", - evg, evg->domid, domid, nentries, rc); + " nentries=%d rc=%d %ld..%ld", + evg, evg->domid, nentries, rc, + rc>0 ? (long)domaininfos[0].domain : 0, + rc>0 ? (long)domaininfos[rc-1].domain : 0); for (;;) { if (!evg) { @@ -1257,7 +1257,6 @@ static void domain_death_xswatch_callback(libxl__egc *egc, libxl__ev_xswatch *w, } assert(rc); /* rc==0 results in us eating all evgs and quitting */ -domid = gotend[-1].domain; } all_reported: out: -- 1.8.0.1
[Xen-devel] [PATCH v2 4/5] xen: sched_credit2: more info when dumping
more specifically, for each runqueue, print what pCPUs belong to it, which ones are idle and which ones have been tickled. While there, also convert the whole file to use keyhandler_scratch for printing cpumask-s. Signed-off-by: Dario Faggioli Cc: George Dunlap Cc: Jan Beulich Cc: Keir Fraser Reviewed-by: George Dunlap --- xen/common/sched_credit2.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index ae9b359..8aa1438 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -25,6 +25,7 @@ #include #include #include +#include #define d2printk(x...) //#define d2printk printk @@ -1836,7 +1837,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu) unsigned long flags; spinlock_t *lock; int loop; -char cpustr[100]; +#define cpustr keyhandler_scratch /* * We need both locks: @@ -1877,6 +1878,7 @@ csched2_dump_pcpu(const struct scheduler *ops, int cpu) spin_unlock(lock); spin_unlock_irqrestore(&prv->lock, flags); +#undef cpustr } static void @@ -1886,6 +1888,7 @@ csched2_dump(const struct scheduler *ops) struct csched2_private *prv = CSCHED2_PRIV(ops); unsigned long flags; int i, loop; +#define cpustr keyhandler_scratch /* We need the private lock as we access global scheduler data * and (below) the list of active domains. 
*/ @@ -1901,17 +1904,24 @@ csched2_dump(const struct scheduler *ops) fraction = prv->rqd[i].avgload * 100 / (1ULL << prv->load_window_shift); +cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].active); printk("Runqueue %d:\n" "\tncpus = %u\n" + "\tcpus = %s\n" "\tmax_weight = %d\n" "\tinstload = %d\n" "\taveload= %3"PRI_stime"\n", i, cpumask_weight(&prv->rqd[i].active), + cpustr, prv->rqd[i].max_weight, prv->rqd[i].load, fraction); +cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].idle); +printk("\tidlers: %s\n", cpustr); +cpumask_scnprintf(cpustr, sizeof(cpustr), &prv->rqd[i].tickled); +printk("\ttickled: %s\n", cpustr); } printk("Domain info:\n"); @@ -1942,6 +1952,7 @@ csched2_dump(const struct scheduler *ops) } spin_unlock_irqrestore(&prv->lock, flags); +#undef cpustr } static void activate_runqueue(struct csched2_private *prv, int rqi)
[Xen-devel] [PATCH 3/3] x86/shadow: pass domain to sh_install_xen_entries_in_lN()
Most callers have this available already, and the functions don't need any vcpu specifics. Signed-off-by: Jan Beulich --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -1416,9 +1416,8 @@ do { //shadow-types.h to shadow-private.h // #if GUEST_PAGING_LEVELS == 4 -void sh_install_xen_entries_in_l4(struct vcpu *v, mfn_t gl4mfn, mfn_t sl4mfn) +void sh_install_xen_entries_in_l4(struct domain *d, mfn_t gl4mfn, mfn_t sl4mfn) { -struct domain *d = v->domain; shadow_l4e_t *sl4e; unsigned int slots; @@ -1449,7 +1448,7 @@ void sh_install_xen_entries_in_l4(struct shadow_l4e_from_mfn(sl4mfn, __PAGE_HYPERVISOR); /* Self linear mapping. */ -if ( shadow_mode_translate(v->domain) && !shadow_mode_external(v->domain) ) +if ( shadow_mode_translate(d) && !shadow_mode_external(d) ) { // linear tables may not be used with translated PV guests sl4e[shadow_l4_table_offset(LINEAR_PT_VIRT_START)] = @@ -1470,12 +1469,11 @@ void sh_install_xen_entries_in_l4(struct // place, which means that we need to populate the l2h entry in the l3 // table. 
-static void sh_install_xen_entries_in_l2h(struct vcpu *v, mfn_t sl2hmfn) +static void sh_install_xen_entries_in_l2h(struct domain *d, mfn_t sl2hmfn) { -struct domain *d = v->domain; shadow_l2e_t *sl2e; -if ( !is_pv_32on64_vcpu(v) ) +if ( !is_pv_32on64_domain(d) ) return; sl2e = sh_map_domain_page(sl2hmfn); @@ -1549,11 +1547,13 @@ sh_make_shadow(struct vcpu *v, mfn_t gmf { #if GUEST_PAGING_LEVELS == 4 case SH_type_l4_shadow: -sh_install_xen_entries_in_l4(v, gmfn, smfn); break; +sh_install_xen_entries_in_l4(v->domain, gmfn, smfn); +break; #endif #if GUEST_PAGING_LEVELS >= 3 case SH_type_l2h_shadow: -sh_install_xen_entries_in_l2h(v, smfn); break; +sh_install_xen_entries_in_l2h(v->domain, smfn); +break; #endif default: /* Do nothing */ break; } @@ -1594,7 +1594,7 @@ sh_make_monitor_table(struct vcpu *v) { mfn_t m4mfn; m4mfn = shadow_alloc(d, SH_type_monitor_table, 0); -sh_install_xen_entries_in_l4(v, m4mfn, m4mfn); +sh_install_xen_entries_in_l4(d, m4mfn, m4mfn); /* Remember the level of this table */ mfn_to_page(m4mfn)->shadow_flags = 4; #if SHADOW_PAGING_LEVELS < 4 @@ -1618,7 +1618,7 @@ sh_make_monitor_table(struct vcpu *v) l3e[0] = l3e_from_pfn(mfn_x(m2mfn), __PAGE_HYPERVISOR); sh_unmap_domain_page(l3e); -if ( is_pv_32on64_vcpu(v) ) +if ( is_pv_32on64_domain(d) ) { /* For 32-on-64 PV guests, we need to map the 32-bit Xen * area into its usual VAs in the monitor tables */ @@ -1630,7 +1630,7 @@ sh_make_monitor_table(struct vcpu *v) mfn_to_page(m2mfn)->shadow_flags = 2; l3e = sh_map_domain_page(m3mfn); l3e[3] = l3e_from_pfn(mfn_x(m2mfn), _PAGE_PRESENT); -sh_install_xen_entries_in_l2h(v, m2mfn); +sh_install_xen_entries_in_l2h(d, m2mfn); sh_unmap_domain_page(l3e); } --- a/xen/arch/x86/mm/shadow/private.h +++ b/xen/arch/x86/mm/shadow/private.h @@ -361,7 +361,7 @@ mfn_t shadow_alloc(struct domain *d, void shadow_free(struct domain *d, mfn_t smfn); /* Install the xen mappings in various flavours of shadow */ -void sh_install_xen_entries_in_l4(struct vcpu *v, mfn_t gl4mfn, 
mfn_t sl4mfn); +void sh_install_xen_entries_in_l4(struct domain *, mfn_t gl4mfn, mfn_t sl4mfn); /* Update the shadows in response to a pagetable write from Xen */ int sh_validate_guest_entry(struct vcpu *v, mfn_t gmfn, void *entry, u32 size);
Re: [Xen-devel] [PATCH] tools/libxl: avoid comparing an unsigned int to -1
On Mon, Mar 16, 2015 at 10:12:34AM +, Koushik Chakravarty wrote: > Signed-off-by: Koushik Chakravarty > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu Acked-by: Wei Liu Ian J, this one should be backported to 4.5. > --- > tools/libxl/libxl_json.c |2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl_json.c b/tools/libxl/libxl_json.c > index 98335b0..346929a 100644 > --- a/tools/libxl/libxl_json.c > +++ b/tools/libxl/libxl_json.c > @@ -1013,7 +1013,7 @@ out: > yajl_gen_status libxl__uint64_gen_json(yajl_gen hand, uint64_t val) > { > char *num; > -unsigned int len; > +int len; > yajl_gen_status s; > > > -- > 1.7.10.4
[Xen-devel] [PATCH 2/3] slightly reduce vm_assist code
- drop an effectively unused struct pv_vcpu field (x86) - adjust VM_ASSIST() to prepend VMASST_TYPE_ Signed-off-by: Jan Beulich --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -901,7 +901,6 @@ int arch_set_info_guest( v->arch.pv_vcpu.event_callback_cs = c(event_callback_cs); v->arch.pv_vcpu.failsafe_callback_cs = c(failsafe_callback_cs); } -v->arch.pv_vcpu.vm_assist = c(vm_assist); /* Only CR0.TS is modifiable by guest or admin. */ v->arch.pv_vcpu.ctrlreg[0] &= X86_CR0_TS; @@ -973,7 +972,7 @@ int arch_set_info_guest( case -ERESTART: break; case 0: -if ( !compat && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && +if ( !compat && !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) { l4_pgentry_t *l4tab = __map_domain_page(cr3_page); @@ -1023,7 +1022,7 @@ int arch_set_info_guest( cr3_page = NULL; break; case 0: -if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) +if ( VM_ASSIST(d, m2p_strict) ) { l4_pgentry_t *l4tab = __map_domain_page(cr3_page); --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -1436,7 +1436,6 @@ void arch_get_info_guest(struct vcpu *v, c(event_callback_cs = v->arch.pv_vcpu.event_callback_cs); c(failsafe_callback_cs = v->arch.pv_vcpu.failsafe_callback_cs); } -c(vm_assist = v->arch.pv_vcpu.vm_assist); /* IOPL privileges are virtualised: merge back into returned eflags. */ BUG_ON((c(user_regs.eflags) & X86_EFLAGS_IOPL) != 0); --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1454,7 +1454,7 @@ static int alloc_l4_table(struct page_in adjust_guest_l4e(pl4e[i], d); } -init_guest_l4_table(pl4e, d, !VM_ASSIST(d, VMASST_TYPE_m2p_strict)); +init_guest_l4_table(pl4e, d, !VM_ASSIST(d, m2p_strict)); unmap_domain_page(pl4e); return rc > 0 ? 
0 : rc; @@ -2765,7 +2765,7 @@ int new_guest_cr3(unsigned long mfn) invalidate_shadow_ldt(curr, 0); -if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && !paging_mode_refcounts(d) ) +if ( !VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) { l4_pgentry_t *l4tab = map_domain_page(mfn); @@ -3135,8 +3135,7 @@ long do_mmuext_op( op.arg1.mfn); break; } -if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) && - !paging_mode_refcounts(d) ) +if ( VM_ASSIST(d, m2p_strict) && !paging_mode_refcounts(d) ) { l4_pgentry_t *l4tab = map_domain_page(op.arg1.mfn); --- a/xen/arch/x86/mm/shadow/multi.c +++ b/xen/arch/x86/mm/shadow/multi.c @@ -1436,7 +1436,7 @@ void sh_install_xen_entries_in_l4(struct shadow_l4e_from_mfn(page_to_mfn(d->arch.perdomain_l3_pg), __PAGE_HYPERVISOR); -if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) +if ( !VM_ASSIST(d, m2p_strict) ) sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = shadow_l4e_empty(); /* Shadow linear mapping for 4-level shadows. N.B. for 3-level @@ -3983,11 +3983,11 @@ sh_update_cr3(struct vcpu *v, int do_loc shadow_l4e_t *sl4e = v->arch.paging.shadow.guest_vtable; if ( (v->arch.flags & TF_kernel_mode) && - !VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) + !VM_ASSIST(d, m2p_strict) ) sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]; else if ( !(v->arch.flags & TF_kernel_mode) && - VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) + VM_ASSIST(d, m2p_strict) ) sl4e[shadow_l4_table_offset(RO_MPT_VIRT_START)] = shadow_l4e_empty(); } --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -1441,7 +1441,7 @@ static int fixup_page_fault(unsigned lon !(regs->error_code & (PFEC_reserved_bit | PFEC_insn_fetch)) && (regs->error_code & PFEC_write_access) ) { -if ( VM_ASSIST(d, VMASST_TYPE_writable_pagetables) && +if ( VM_ASSIST(d, writable_pagetables) && /* Do not check if access-protection fault since the page may legitimately be not present in shadow page tables */ (paging_mode_enabled(d) || --- a/xen/common/kernel.c +++ 
b/xen/common/kernel.c @@ -306,7 +306,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDL { case 0: fi.submap = (1U << XENFEAT_memory_op_vnode_supported); -if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) ) +if ( VM_ASSIST(d, pae_extended_cr3) ) fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb); if ( paging_mod
[Xen-devel] [PATCH v19 00/14] x86/PMU: Xen PMU PV(H) support
Changes in v19: * Do not allow changing mode to/from OFF/ALL while guests are running. This significantly simplifies code due to the large number of corner cases that I had to deal with. Most of the changes are in patch#5. This also makes patch 4 from last version unnecessary * Defer NMI support (drop patch#14 from last version) * Make patch#15 from last series be patch#1 (vpmu init cleanup) * Other changes are listed per patch Changes in v18: * Return 1 (i.e. "handled") in vpmu_do_interrupt() if PMU_CACHED is set. This is needed since we can get an interrupt while this flag is set on AMD processors when multiple counters are in use (**) (AMD processors don't mask LVTPC when PMC interrupt happens and so there is a window in vpmu_do_interrupt() until it sets the mask bit). Patch #14 * Unload both current and last_vcpu (if different) vpmu and clear this_cpu(last_vcpu) in vpmu_unload_all. Patch #5 * Make major version check for certain xenpmu_ops. Patch #5 * Make xenpmu_op()'s first argument unsigned. Patch #5 * Don't use format specifier for __stringify(). Patch #6 * Don't print generic error in vpmu_init(). Patch #6 * Don't test for VPMU existence in vpmu_initialise(). New patch #15 * Added vpmu_disabled flag to make sure VPMU doesn't get reenabled from dom0 (for example when watchdog is active). Patch #5 * Updated tags on some patches to better reflect latest reviewed status (**) While testing this I discovered that AMD VPMU is quite broken for HVM: when multiple counters are in use linux dom0 often gets unexpected NMIs. This may have something to do with what I mentioned in the first bullet. 
However, this doesn't appear to be related to this patch series (or earlier VPMU patches) --- I can reproduce this all the way back to 4.1 Changes in v17: * Disable VPMU when unknown CPU vendor is detected (patch #2) * Remove unnecessary vendor tests in vendor-specific init routines (patch #14) * Remember first CPU that starts mode change and use it to stop the cycle (patch #13) * If vpmu ops is not present, return 0 as value for VPMU MSR read (as opposed to returning an error as was the case in previous patch.) (patch #18) * Slightly change vpmu_do_msr() logic as a result of this change (patch #20) * stringify VPMU version (patch #14) * Use 'CS > 1' to mark sample as PMU_SAMPLE_USER (patch #19) Changes in v16: * Many changes in VPMU mode patch (#13): * Replaced arguments to some vpmu routines (vcpu -> vpmu). New patch (#12) * Added vpmu_unload vpmu op to completely unload vpmu data (e.g. clear MSR bitmaps). This routine may be called in context switch (vpmu_switch_to()). * Added vmx_write_guest_msr_vcpu() interface to write MSRs of non-current VCPU * Use cpumask_cycle instead of cpumask_next * Dropped timeout error * Adjusted types of mode variables * Don't allow oprofile to allocate its context on MSR access if VPMU context has already been allocated (which may happen when VPMU mode was set to off while the guest was running) * vpmu_initialise() no longer turns off VPMU globally on failure. New patch (#2) * vpmu_do_msr() will return 1 (failure) if vpmu_ops are not set. This is done to prevent PV guests that are not VPMU-enabled from wrongly assuming that they have access to counters (Linux check_hw_exists() will make this assumption) (patch #18) * Add cpl field to shared structure that will be passed for HVM guests' samples (instead of PMU_SAMPLE_USER flag). Add PMU_SAMPLE_PV flag to mark whose sample is passed up. 
(Patches ## 10, 19, 22) Changes in v15: * Rewrote vpmu_force_context_switch() to use continue_hypercall_on_cpu() * Added vpmu_init initcall that will call vendor-specific init routines * Added a lock to vpmu_struct to serialize pmu_init()/pmu_finish() * Use SS instead of CS for DPL (instead of RPL) * Don't take lock for XENPMU_mode_get * Make vpmu_mode/features an unsigned int (from uint64_t) * Adjusted pvh_hypercall64_table[] order * Replaced address range check [XEN_VIRT_START..XEN_VIRT_END] with guest_mode() * A few style cleanups Changes in v14: * Moved struct xen_pmu_regs to pmu.h * Moved CHECK_pmu_* to an earlier patch (when structures are first introduced) * Added PMU_SAMPLE_REAL flag to indicate whether the sample was taken in real mode * Simplified slightly setting rules for xenpmu_data flags * Rewrote vpmu_force_context_switch() to again use continuations. (Returning EAGAIN to user would mean that VPMU mode may get into inconsistent state (across processors) and dealing with that is more complicated than I'd like). * Fixed msraddr_to_bitpos() and converted it into an inline * Replaced address range check in vpmu_do_interrupt() with guest_mode() * No error returns from __initcall * Rebased on top of recent VPMU changes * Various cleanups Changes in v13: * Rearranged data in xenpf_symdata to eliminate a hole (no change in structure size) * Removed unnecessary zeroing of last character in name string during symbol re
[Xen-devel] [PATCH 1/3] x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P
Xen L4 entries being uniformly installed into any L4 table and 64-bit PV kernels running in ring 3 means that user mode was able to see the read-only M2P presented by Xen to the guests. While apparently not really representing an exploitable information leak, this still very certainly was never meant to be that way. Building on the fact that these guests already have separate kernel and user mode page tables we can allow guest kernels to tell Xen that they don't want user mode to see this table. We can't, however, do this by default: There is no ABI requirement that kernel and user mode page tables be separate. Therefore introduce a new VM-assist flag allowing the guest to control respective hypervisor behavior: - when not set, L4 tables get created with the respective slot blank, and whenever the L4 table gets used as a kernel one the missing mapping gets inserted, - when set, L4 tables get created with the respective slot initialized as before, and whenever the L4 table gets used as a user one the mapping gets zapped. Since the new flag gets assigned a value discontiguous to the existing ones (in order to preserve the low bits, as only those are currently accessible to 32-bit guests), this requires a little bit of rework of the VM assist code in general: An architecture specific VM_ASSIST_VALID definition gets introduced (with an optional compat mode counterpart), and compilation of the respective code becomes conditional upon this being defined (ARM doesn't wire these up and hence doesn't need that code). 
Signed-off-by: Jan Beulich --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -339,7 +339,7 @@ static int setup_compat_l4(struct vcpu * l4tab = __map_domain_page(pg); clear_page(l4tab); -init_guest_l4_table(l4tab, v->domain); +init_guest_l4_table(l4tab, v->domain, 1); unmap_domain_page(l4tab); v->arch.guest_table = pagetable_from_page(pg); @@ -971,7 +971,17 @@ int arch_set_info_guest( case -EINTR: rc = -ERESTART; case -ERESTART: +break; case 0: +if ( !compat && !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && + !paging_mode_refcounts(d) ) +{ +l4_pgentry_t *l4tab = __map_domain_page(cr3_page); + +l4tab[l4_table_offset(RO_MPT_VIRT_START)] = +idle_pg_table[l4_table_offset(RO_MPT_VIRT_START)]; +unmap_domain_page(l4tab); +} break; default: if ( cr3_page == current->arch.old_guest_table ) @@ -1006,7 +1016,16 @@ int arch_set_info_guest( default: if ( cr3_page == current->arch.old_guest_table ) cr3_page = NULL; +break; case 0: +if ( VM_ASSIST(d, VMASST_TYPE_m2p_strict) ) +{ +l4_pgentry_t *l4tab = __map_domain_page(cr3_page); + +l4tab[l4_table_offset(RO_MPT_VIRT_START)] = +l4e_empty(); +unmap_domain_page(l4tab); +} break; } } --- a/xen/arch/x86/domain_build.c +++ b/xen/arch/x86/domain_build.c @@ -1203,7 +1203,7 @@ int __init construct_dom0( l3start = __va(mpt_alloc); mpt_alloc += PAGE_SIZE; } clear_page(l4tab); -init_guest_l4_table(l4tab, d); +init_guest_l4_table(l4tab, d, 0); v->arch.guest_table = pagetable_from_paddr(__pa(l4start)); if ( is_pv_32on64_domain(d) ) v->arch.guest_table_user = v->arch.guest_table; --- a/xen/arch/x86/mm.c +++ b/xen/arch/x86/mm.c @@ -1380,7 +1380,8 @@ static int alloc_l3_table(struct page_in return rc > 0 ? 0 : rc; } -void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d) +void init_guest_l4_table(l4_pgentry_t l4tab[], const struct domain *d, + bool_t zap_ro_mpt) { /* Xen private mappings. 
*/ memcpy(&l4tab[ROOT_PAGETABLE_FIRST_XEN_SLOT], @@ -1395,6 +1396,8 @@ void init_guest_l4_table(l4_pgentry_t l4 l4e_from_pfn(domain_page_map_to_mfn(l4tab), __PAGE_HYPERVISOR); l4tab[l4_table_offset(PERDOMAIN_VIRT_START)] = l4e_from_page(d->arch.perdomain_l3_pg, __PAGE_HYPERVISOR); +if ( zap_ro_mpt || is_pv_32on64_domain(d) || paging_mode_refcounts(d) ) +l4tab[l4_table_offset(RO_MPT_VIRT_START)] = l4e_empty(); } static int alloc_l4_table(struct page_info *page) @@ -1444,7 +1447,7 @@ static int alloc_l4_table(struct page_in adjust_guest_l4e(pl4e[i], d); } -init_guest_l4_table(pl4e, d); +init_guest_l4_table(pl4e, d, !VM_ASSIST(d, VMASST_TYPE_m2p_strict)); unmap_domain_page(pl4e); return rc > 0 ? 0 : rc; @@ -2755,6 +2758,14 @@ int new_guest_cr3(unsigned long mfn) invalidate_shadow_ldt(curr, 0); +if ( !VM_ASSIST(d, VMASST_TYPE_m2p_strict) && !paging_mode_refcou
[Xen-devel] [PATCH] dpci: Put the dpci back on the list if scheduled from another CPU.
There is a race when we clear the STATE_SCHED in the softirq - which allows the 'raise_softirq_for' (on another CPU or on the one running the softirq) to schedule the dpci. Specifically this can happen when the other CPU receives an interrupt, calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same dpci structure). Note that this could also happen on the same physical CPU, however the explanation for simplicity will assume two CPU actors. There would be two 'dpci_softirq' running at the same time (on different CPUs) where on one CPU it would be executing hvm_dirq_assist (so had cleared STATE_SCHED and set STATE_RUN) and on the other CPU it is trying to call: if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) BUG(); Since STATE_RUN is already set it would end badly. The reason we can hit this is when an interrupt's affinity is set over multiple CPUs. Potential solutions: a) Instead of the BUG() we can put the dpci back on the per-cpu list to deal with later (when the softirqs are activated again). This putting the 'dpci' back on the per-cpu list is a spin until the bad condition clears. b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for to detect for 'STATE_RUN' bit being set and schedule the dpci. The BUG() check in dpci_softirq would be replaced with a spin until 'STATE_RUN' has been cleared. The dpci would still not be scheduled when STATE_SCHED bit was set. c) Only schedule the dpci when the state is cleared (no STATE_SCHED and no STATE_RUN). It would spin if STATE_RUN is set (as it is in progress and will finish). If the STATE_SCHED is set (so hasn't run yet) we won't try to spin and just exit. Down-sides of the solutions: a). Live-lock of the CPU. We could be finishing a dpci, then adding the dpci, exiting, and then processing the dpci once more. And so on. We would eventually stop as the TIMER_SOFTIRQ would be set, which will cause SCHEDULER_SOFTIRQ to be set as well and we would exit this loop. 
Interestingly the old ('tasklet') code used this mechanism. If the function assigned to the tasklet was running - the softirq that ran said function (hvm_dirq_assist) would be responsible for putting the tasklet back on the per-cpu list. This would allow having a running tasklet and a 'to-be-scheduled' tasklet at the same time. b). is similar to a) - instead of re-entering the dpci_softirq we are looping in the softirq waiting for the correct condition to arrive. As it does not allow unwedging ourselves because the other softirqs are not called - it is less preferable. c) can cause a deadlock if the interrupt comes in when we are processing the dpci in the softirq - iff this happens on the same CPU. We would be looping in on raise_softirq waiting for STATE_RUN to be cleared, while the softirq that was to clear it - is preempted by our interrupt handler. As such, this patch - which implements a) is the best candidate for this quagmire. Reported-and-Tested-by: Sander Eikelenboom Reported-and-Tested-by: Malcolm Crossley Signed-off-by: Konrad Rzeszutek Wilk --- xen/drivers/passthrough/io.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c index ae050df..9b77334 100644 --- a/xen/drivers/passthrough/io.c +++ b/xen/drivers/passthrough/io.c @@ -804,7 +804,17 @@ static void dpci_softirq(void) d = pirq_dpci->dom; smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) -BUG(); +{ +unsigned long flags; + +/* Put back on the list and retry. */ +local_irq_save(flags); +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); +local_irq_restore(flags); + +raise_softirq(HVM_DPCI_SOFTIRQ); +continue; +} /* * The one who clears STATE_SCHED MUST refcount the domain. */ -- 2.1.0
[Xen-devel] [PATCH v19 02/14] common/symbols: Export hypervisor symbols to privileged guest
Export Xen's symbols as <address, type, name> triplet via new XENPF_get_symbol hypercall Signed-off-by: Boris Ostrovsky Acked-by: Daniel De Graaf Reviewed-by: Konrad Rzeszutek Wilk Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/arch/x86/platform_hypercall.c | 28 +++ xen/common/symbols.c| 54 + xen/include/public/platform.h | 19 + xen/include/xen/symbols.h | 3 +++ xen/include/xlat.lst| 1 + xen/xsm/flask/hooks.c | 4 +++ xen/xsm/flask/policy/access_vectors | 2 ++ 7 files changed, 111 insertions(+) diff --git a/xen/arch/x86/platform_hypercall.c b/xen/arch/x86/platform_hypercall.c index 334d474..7626261 100644 --- a/xen/arch/x86/platform_hypercall.c +++ b/xen/arch/x86/platform_hypercall.c @@ -23,6 +23,7 @@ #include #include #include +#include #include #include #include @@ -798,6 +799,33 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PARAM(xen_platform_op_t) u_xenpf_op) } break; +case XENPF_get_symbol: +{ +static char name[KSYM_NAME_LEN + 1]; /* protected by xenpf_lock */ +XEN_GUEST_HANDLE(char) nameh; +uint32_t namelen, copylen; + +guest_from_compat_handle(nameh, op->u.symdata.name); + +ret = xensyms_read(&op->u.symdata.symnum, &op->u.symdata.type, + &op->u.symdata.address, name); + +namelen = strlen(name) + 1; + +if ( namelen > op->u.symdata.namelen ) +copylen = op->u.symdata.namelen; +else +copylen = namelen; + +op->u.symdata.namelen = namelen; + +if ( !ret && copy_to_guest(nameh, name, copylen) ) +ret = -EFAULT; +if ( !ret && __copy_field_to_guest(u_xenpf_op, op, u.symdata) ) +ret = -EFAULT; +} +break; + default: ret = -ENOSYS; break; diff --git a/xen/common/symbols.c b/xen/common/symbols.c index bc2fde6..2c0942d 100644 --- a/xen/common/symbols.c +++ b/xen/common/symbols.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #ifdef SYMBOLS_ORIGIN extern const unsigned int symbols_offsets[1]; @@ -148,3 +150,55 @@ const char *symbols_lookup(unsigned long addr, *offset = addr - symbols_address(low); return namebuf; } + +/* + * Get symbol type information. 
This is encoded as a single char at the + * beginning of the symbol name. + */ +static char symbols_get_symbol_type(unsigned int off) +{ +/* + * Get just the first code, look it up in the token table, + * and return the first char from this token. + */ +return symbols_token_table[symbols_token_index[symbols_names[off + 1]]]; +} + +int xensyms_read(uint32_t *symnum, char *type, + uint64_t *address, char *name) +{ +/* + * Symbols are most likely accessed sequentially so we remember position + * from previous read. This can help us avoid the extra call to + * get_symbol_offset(). + */ +static uint64_t next_symbol, next_offset; +static DEFINE_SPINLOCK(symbols_mutex); + +if ( *symnum > symbols_num_syms ) +return -ERANGE; +if ( *symnum == symbols_num_syms ) +{ +/* No more symbols */ +name[0] = '\0'; +return 0; +} + +spin_lock(&symbols_mutex); + +if ( *symnum == 0 ) +next_offset = next_symbol = 0; +if ( next_symbol != *symnum ) +/* Non-sequential access */ +next_offset = get_symbol_offset(*symnum); + +*type = symbols_get_symbol_type(next_offset); +next_offset = symbols_expand_symbol(next_offset, name); +*address = symbols_offsets[*symnum] + SYMBOLS_ORIGIN; + +next_symbol = ++*symnum; + +spin_unlock(&symbols_mutex); + +return 0; +} diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h index 82ec84e..1e6a6ce 100644 --- a/xen/include/public/platform.h +++ b/xen/include/public/platform.h @@ -590,6 +590,24 @@ struct xenpf_resource_op { typedef struct xenpf_resource_op xenpf_resource_op_t; DEFINE_XEN_GUEST_HANDLE(xenpf_resource_op_t); +#define XENPF_get_symbol 63 +struct xenpf_symdata { +/* IN/OUT variables */ +uint32_t namelen; /* IN: size of name buffer */ + /* OUT: strlen(name) of hypervisor symbol (may be */ + /* larger than what's been copied to guest) */ +uint32_t symnum; /* IN: Symbol to read*/ + /* OUT: Next available symbol. 
If same as IN then */ + /* we reached the end*/ + +/* OUT variables */ +XEN_GUEST_HANDLE(char) name; +uint64_t address; +char type; +}; +typedef struct xenpf_symdata xenpf_symdata_t; +DEFINE_XEN_GUEST_HANDLE(xenpf_symdata_t); + /* * ` enum neg_errnoval * ` HYPERVISOR_platform_op(const struct xen_platform_op*); @@ -619,6 +637,7 @@ struct xen_platform_op {
Re: [Xen-devel] [PATCH] tools/libxl: close the logfile_w and null file descriptors in libxl__spawn_qdisk_backend() error path
On Mon, Mar 16, 2015 at 10:09:29AM +, Koushik Chakravarty wrote: > Signed-off-by: Koushik Chakravarty > CC: Ian Jackson > CC: Stefano Stabellini > CC: Ian Campbell > CC: Wei Liu > --- > tools/libxl/libxl_dm.c | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c > index cb006df..161401c 100644 > --- a/tools/libxl/libxl_dm.c > +++ b/tools/libxl/libxl_dm.c > @@ -1508,7 +1508,7 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, > libxl__dm_spawn_state *dmss) > flexarray_t *dm_args; > char **args; > const char *dm; > -int logfile_w, null, rc; > +int logfile_w, null = -1, rc; > uint32_t domid = dmss->guest_domid; > > /* Always use qemu-xen as device model */ > @@ -1534,6 +1534,10 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, > libxl__dm_spawn_state *dmss) > goto error; > } > null = open("/dev/null", O_RDONLY); > +if (null < 0) { > + rc = ERROR_FAIL; > + goto error; > +} > > dmss->guest_config = NULL; > /* > @@ -1568,6 +1572,10 @@ void libxl__spawn_qdisk_backend(libxl__egc *egc, > libxl__dm_spawn_state *dmss) > > error: > assert(rc); > +if(logfile_w >= 0) > + close(logfile_w); > +if(null >= 0) > + close(null); Please add space between `if' and `('. Also you can just write if (logfile_w >= 0) close (logfile_w); if (null >= 0) close (null); Wei. > dmss->callback(egc, dmss, rc); > return; > } > -- > 1.7.10.4
[Xen-devel] [Fwd: [PATCH v2 0/5] Improving dumping of scheduler related info]
Forgot to Cc people in the cover letter of the series... Sorry! Forwarded Message From: Dario Faggioli To: Xen-devel Subject: [PATCH v2 0/5] Improving dumping of scheduler related info Date: Tue, 17 Mar 2015 16:32:41 +0100 Mailer: StGit/0.17.1-dirty Message-Id: <20150317152615.9867.48676.stgit@Solace.station> Take 2. Some of the patches have been checked-in already, so here's what's remaining: - fix a bug in the RTDS scheduler (patch 1), - improve how the whole process of dumping scheduling info is serialized, by moving all locking code into specific schedulers (patch 2), - print more useful scheduling related information (patches 3, 4 and 5). Git branch here: git://xenbits.xen.org/people/dariof/xen.git rel/sched/dump-v2 http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/rel/sched/dump-v2 I think I addressed all the comments raised upon v1. More details in the changelogs of the various patches. Thanks and Regards, Dario --- Dario Faggioli (5): xen: sched_rt: avoid ASSERT()ing on runq dump if there are no domains xen: rework locking for dump of scheduler info (debug-key r) xen: print online pCPUs and free pCPUs when dumping xen: sched_credit2: more info when dumping xen: sched_rt: print useful affinity info when dumping xen/common/cpupool.c | 12 + xen/common/sched_credit.c | 42 ++- xen/common/sched_credit2.c | 53 +--- xen/common/sched_rt.c | 59 xen/common/sched_sedf.c| 16 xen/common/schedule.c |5 +--- 6 files changed, 157 insertions(+), 30 deletions(-)
[Xen-devel] [PATCH v19 06/14] x86/VPMU: Initialize VPMUs with __initcall
Move some VPMU initialization operations into __initcalls to avoid performing the same tests and calculations for each vcpu. Signed-off-by: Boris Ostrovsky Acked-by: Jan Beulich --- xen/arch/x86/hvm/svm/vpmu.c | 106 -- xen/arch/x86/hvm/vmx/vpmu_core2.c | 151 +++--- xen/arch/x86/hvm/vpmu.c | 32 xen/include/asm-x86/hvm/vpmu.h| 2 + 4 files changed, 155 insertions(+), 136 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 481ea7b..b60ca40 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -356,54 +356,6 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) return 1; } -static int amd_vpmu_initialise(struct vcpu *v) -{ -struct xen_pmu_amd_ctxt *ctxt; -struct vpmu_struct *vpmu = vcpu_vpmu(v); -uint8_t family = current_cpu_data.x86; - -if ( counters == NULL ) -{ - switch ( family ) -{ -case 0x15: -num_counters = F15H_NUM_COUNTERS; -counters = AMD_F15H_COUNTERS; -ctrls = AMD_F15H_CTRLS; -k7_counters_mirrored = 1; -break; -case 0x10: -case 0x12: -case 0x14: -case 0x16: -default: -num_counters = F10H_NUM_COUNTERS; -counters = AMD_F10H_COUNTERS; -ctrls = AMD_F10H_CTRLS; -k7_counters_mirrored = 0; -break; -} -} - -ctxt = xzalloc_bytes(sizeof(*ctxt) + - 2 * sizeof(uint64_t) * num_counters); -if ( !ctxt ) -{ -gdprintk(XENLOG_WARNING, "Insufficient memory for PMU, " -" PMU feature is unavailable on domain %d vcpu %d.\n", -v->vcpu_id, v->domain->domain_id); -return -ENOMEM; -} - -ctxt->counters = sizeof(*ctxt); -ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters; - -vpmu->context = ctxt; -vpmu->priv_context = NULL; -vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); -return 0; -} - static void amd_vpmu_destroy(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); @@ -474,30 +426,62 @@ struct arch_vpmu_ops amd_vpmu_ops = { int svm_vpmu_initialise(struct vcpu *v) { +struct xen_pmu_amd_ctxt *ctxt; struct vpmu_struct *vpmu = vcpu_vpmu(v); -uint8_t family = current_cpu_data.x86; -int ret = 0; -/* vpmu 
enabled? */ if ( vpmu_mode == XENPMU_MODE_OFF ) return 0; -switch ( family ) +if ( !counters ) +return -EINVAL; + +ctxt = xzalloc_bytes(sizeof(*ctxt) + + 2 * sizeof(uint64_t) * num_counters); +if ( !ctxt ) { +printk(XENLOG_G_WARNING "Insufficient memory for PMU, " + " PMU feature is unavailable on domain %d vcpu %d.\n", + v->vcpu_id, v->domain->domain_id); +return -ENOMEM; +} + +ctxt->counters = sizeof(*ctxt); +ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters; + +vpmu->context = ctxt; +vpmu->priv_context = NULL; + +vpmu->arch_vpmu_ops = &amd_vpmu_ops; + +vpmu_set(vpmu, VPMU_CONTEXT_ALLOCATED); +return 0; +} + +int __init amd_vpmu_init(void) +{ +switch ( current_cpu_data.x86 ) +{ +case 0x15: +num_counters = F15H_NUM_COUNTERS; +counters = AMD_F15H_COUNTERS; +ctrls = AMD_F15H_CTRLS; +k7_counters_mirrored = 1; +break; case 0x10: case 0x12: case 0x14: -case 0x15: case 0x16: -ret = amd_vpmu_initialise(v); -if ( !ret ) -vpmu->arch_vpmu_ops = &amd_vpmu_ops; -return ret; +num_counters = F10H_NUM_COUNTERS; +counters = AMD_F10H_COUNTERS; +ctrls = AMD_F10H_CTRLS; +k7_counters_mirrored = 0; +break; +default: +printk(XENLOG_WARNING "VPMU: Unsupported CPU family %#x\n", + current_cpu_data.x86); +return -EINVAL; } -printk("VPMU: Initialization failed. 
" - "AMD processor family %d has not " - "been supported\n", family); -return -EINVAL; +return 0; } diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index 6280644..17d1b04 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -708,62 +708,6 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) return 1; } -static int core2_vpmu_initialise(struct vcpu *v) -{ -struct vpmu_struct *vpmu = vcpu_vpmu(v); -u64 msr_content; -static bool_t ds_warned; - -if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) ) -goto func_out; -/* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */ -while ( boot_cpu_has(X86_FEATURE_DS) ) -{ -if ( !boot_cpu_has(X86_FEATURE_DTES64) ) -{ -if ( !ds_warned ) -printk(XENLOG_G_WARNING "CPU does
[Xen-devel] [PATCH 3/3] libxl: Domain destroy: fork
From: Ian Jackson Call xc_domain_destroy in a subprocess. That allows us to do so asynchronously, rather than blocking the whole process calling libxl. The changes in detail: * Provide a libxl__ev_child in libxl__domain_destroy_state, and initialise it in libxl__domain_destroy. There is no possibility to `clean up' a libxl__ev_child, but there is no need to clean it up, as the control flow ensures that we only continue after the child has exited. * Call libxl__ev_child_fork at the right point and put the call to xc_domain_destroy and associated logging in the child. (The child opens a new xenctrl handle because we mustn't use the parent's.) * Consequently, the success return path from domain_destroy_domid_cb no longer calls dis->callback. Instead it simply returns. * We plumb the errno value through the child's exit status, if it fits. This means we normally do the logging only in the parent. * Incidentally, we fix the bug that we were treating the return value from xc_domain_destroy as an errno value, when in fact it is a return value from do_domctl (in this case, 0 or -1 setting errno). 
Signed-off-by: Ian Jackson Reviewed-by: Jim Fehlig Tested-by: Jim Fehlig --- tools/libxl/libxl.c | 57 tools/libxl/libxl_internal.h | 1 + 2 files changed, 53 insertions(+), 5 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index b6541d4..b43db1a 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -1481,6 +1481,10 @@ static void domain_destroy_callback(libxl__egc *egc, static void destroy_finish_check(libxl__egc *egc, libxl__domain_destroy_state *dds); +static void domain_destroy_domid_cb(libxl__egc *egc, +libxl__ev_child *destroyer, +pid_t pid, int status); + void libxl__domain_destroy(libxl__egc *egc, libxl__domain_destroy_state *dds) { STATE_AO_GC(dds->ao); @@ -1567,6 +1571,8 @@ void libxl__destroy_domid(libxl__egc *egc, libxl__destroy_domid_state *dis) char *pid; int rc, dm_present; +libxl__ev_child_init(&dis->destroyer); + rc = libxl_domain_info(ctx, NULL, domid); switch(rc) { case 0: @@ -1672,17 +1678,58 @@ static void devices_destroy_cb(libxl__egc *egc, libxl__unlock_domain_userdata(lock); -rc = xc_domain_destroy(ctx->xch, domid); -if (rc < 0) { -LIBXL__LOG_ERRNOVAL(ctx, LIBXL__LOG_ERROR, rc, "xc_domain_destroy failed for %d", domid); +rc = libxl__ev_child_fork(gc, &dis->destroyer, domain_destroy_domid_cb); +if (rc < 0) goto out; +if (!rc) { /* child */ +ctx->xch = xc_interface_open(ctx->lg,0,0); +if (!ctx->xch) goto badchild; + +rc = xc_domain_destroy(ctx->xch, domid); +if (rc < 0) goto badchild; +_exit(0); + +badchild: +if (errno > 0 && errno < 126) { +_exit(errno); +} else { +LOGE(ERROR, + "xc_domain_destroy failed for %d (with difficult errno value %d)", + domid, errno); +_exit(-1); +} +} +LOG(INFO, "forked pid %ld for destroy of domain %d", (long)rc, domid); + +return; + +out: +dis->callback(egc, dis, rc); +return; +} + +static void domain_destroy_domid_cb(libxl__egc *egc, +libxl__ev_child *destroyer, +pid_t pid, int status) +{ +libxl__destroy_domid_state *dis = CONTAINER_OF(destroyer, *dis, destroyer); +STATE_AO_GC(dis->ao); 
+int rc; + +if (status) { +if (WIFEXITED(status) && WEXITSTATUS(status)<126) { +LOGEV(ERROR, WEXITSTATUS(status), + "xc_domain_destroy failed for %"PRIu32"", dis->domid); +} else { +libxl_report_child_exitstatus(CTX, XTL_ERROR, + "async domain destroy", pid, status); +} rc = ERROR_FAIL; goto out; } rc = 0; -out: + out: dis->callback(egc, dis, rc); -return; } int libxl_console_exec(libxl_ctx *ctx, uint32_t domid, int cons_num, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 934465a..28d32ef 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -2957,6 +2957,7 @@ struct libxl__destroy_domid_state { libxl__domid_destroy_cb *callback; /* private to implementation */ libxl__devices_remove_state drs; +libxl__ev_child destroyer; }; struct libxl__domain_destroy_state { -- 1.8.0.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] OpenStack - Libvirt+Xen CI overview
On 03/10/2015 08:03 AM, Bob Ball wrote: For the last few weeks Anthony and I have been working on creating a CI environment to run against all OpenStack jobs. We're now in a position where we can share the current status, an overview of how it works, and next steps. We actively want to support involvement in this effort from others with an interest in libvirt+Xen's openstack integration. The CI we have set up follows the recommendations made by the OpenStack official infrastructure maintainers, and reproduces a notable portion of the official OpenStack CI environment to run these tests. Namely, this setup uses: - Puppet to deploy the master node - Zuul to watch for code changes uploaded to review.openstack.org - Jenkins job builder to create Jenkins job definitions from a YAML file - Nodepool to automatically create single-use virtual machines in the Rackspace public cloud - Devstack-gate to run Tempest tests in serial More information on Zuul, JJB, Nodepool and devstack-gate is available through http://ci.openstack.org The current status is that we have a zuul instance monitoring for jobs and adding them to the queue of jobs to be run at http://zuul.openstack.xenproject.org/ In the background Nodepool provisions virtual machines into a pool of nodes ready to be used. All ready nodes are automatically added to Jenkins (https://jenkins.openstack.xenproject.org/), and then Zuul+Jenkins will trigger a particular job on a node when one is available. Logs are then uploaded to Rackspace's Cloud Files with sample logs for a passing job at http://logs.openstack.xenproject.org/52/162352/3/silent/dsvm-tempest-xen/da3ff30/index.html I'd like to organise a meeting to walk through the various components of the CI with those who are interested, so this is an initial call to find out who is interested in finding out more! Thanks, Bob ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel I would also love to find out more. 
-- Alvin Starr || voice: (905)513-7688 Netvel Inc. || Cell: (416)806-0133 al...@netvel.net || ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 0/3] x86: misc changes
The main point of the series is really patch 1 (which after consultation among the security team doesn't appear to represent a security fix); the other two are just cleanup that I found possible/desirable while putting together the first one. 1: x86: allow 64-bit PV guest kernels to suppress user mode exposure of M2P 2: slightly reduce vm_assist code 3: x86/shadow: pass domain to sh_install_xen_entries_in_lN() Signed-off-by: Jan Beulich ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 03/14] x86/VPMU: Add public xenpmu.h
Add pmu.h header files, move various macros and structures that will be shared between hypervisor and PV guests to it. Move MSR banks out of architectural PMU structures to allow for larger sizes in the future. The banks are allocated immediately after the context and PMU structures store offsets to them. While making these updates, also: * Remove unused vpmu_domain() macro from vpmu.h * Convert msraddr_to_bitpos() into an inline and make it a little faster by realizing that all Intel's PMU-related MSRs are in the lower MSR range. Signed-off-by: Boris Ostrovsky Acked-by: Kevin Tian Acked-by: Jan Beulich Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- Change in v19: * Moved PMU-related structs in xlat.lst to alphabetical order xen/arch/x86/hvm/svm/vpmu.c | 83 +++-- xen/arch/x86/hvm/vmx/vpmu_core2.c| 123 +-- xen/arch/x86/hvm/vpmu.c | 10 +++ xen/arch/x86/oprofile/op_model_ppro.c| 6 +- xen/include/Makefile | 3 +- xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 32 xen/include/asm-x86/hvm/vpmu.h | 16 ++-- xen/include/public/arch-arm.h| 3 + xen/include/public/arch-x86/pmu.h| 91 +++ xen/include/public/pmu.h | 38 ++ xen/include/xlat.lst | 4 + 11 files changed, 275 insertions(+), 134 deletions(-) delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h create mode 100644 xen/include/public/arch-x86/pmu.h create mode 100644 xen/include/public/pmu.h diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 6764070..a8b79df 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -30,10 +30,7 @@ #include #include #include - -#define F10H_NUM_COUNTERS 4 -#define F15H_NUM_COUNTERS 6 -#define MAX_NUM_COUNTERS F15H_NUM_COUNTERS +#include #define MSR_F10H_EVNTSEL_GO_SHIFT 40 #define MSR_F10H_EVNTSEL_EN_SHIFT 22 @@ -49,6 +46,9 @@ static const u32 __read_mostly *counters; static const u32 __read_mostly *ctrls; static bool_t __read_mostly k7_counters_mirrored; +#define F10H_NUM_COUNTERS 4 +#define F15H_NUM_COUNTERS 6 + /* PMU Counter MSRs. 
*/ static const u32 AMD_F10H_COUNTERS[] = { MSR_K7_PERFCTR0, @@ -83,12 +83,14 @@ static const u32 AMD_F15H_CTRLS[] = { MSR_AMD_FAM15H_EVNTSEL5 }; -/* storage for context switching */ -struct amd_vpmu_context { -u64 counters[MAX_NUM_COUNTERS]; -u64 ctrls[MAX_NUM_COUNTERS]; -bool_t msr_bitmap_set; -}; +/* Use private context as a flag for MSR bitmap */ +#define msr_bitmap_on(vpmu)do {\ + (vpmu)->priv_context = (void *)-1L; \ + } while (0) +#define msr_bitmap_off(vpmu) do {\ + (vpmu)->priv_context = NULL;\ + } while (0) +#define is_msr_bitmap_on(vpmu) ((vpmu)->priv_context != NULL) static inline int get_pmu_reg_type(u32 addr) { @@ -142,7 +144,6 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v) { unsigned int i; struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; for ( i = 0; i < num_counters; i++ ) { @@ -150,14 +151,13 @@ static void amd_vpmu_set_msr_bitmap(struct vcpu *v) svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_WRITE); } -ctxt->msr_bitmap_set = 1; +msr_bitmap_on(vpmu); } static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) { unsigned int i; struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; for ( i = 0; i < num_counters; i++ ) { @@ -165,7 +165,7 @@ static void amd_vpmu_unset_msr_bitmap(struct vcpu *v) svm_intercept_msr(v, ctrls[i], MSR_INTERCEPT_RW); } -ctxt->msr_bitmap_set = 0; +msr_bitmap_off(vpmu); } static int amd_vpmu_do_interrupt(struct cpu_user_regs *regs) @@ -177,19 +177,22 @@ static inline void context_load(struct vcpu *v) { unsigned int i; struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; +struct xen_pmu_amd_ctxt *ctxt = vpmu->context; +uint64_t *counter_regs = vpmu_reg_pointer(ctxt, counters); +uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls); for ( i = 0; i < num_counters; i++ ) { -wrmsrl(counters[i], ctxt->counters[i]); -wrmsrl(ctrls[i], ctxt->ctrls[i]); +wrmsrl(counters[i], counter_regs[i]); +wrmsrl(ctrls[i], ctrl_regs[i]); } } 
static void amd_vpmu_load(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); -struct amd_vpmu_context *ctxt = vpmu->context; +struct xen_pmu_amd_ctxt *ctxt = vpmu->context; +uint64_t *ctrl_regs = vpmu_reg_pointer(ctxt, ctrls); vpmu_reset(vpmu, VPMU_FROZEN);
[Xen-devel] [PATCH v19 07/14] x86/VPMU: Initialize PMU for PV(H) guests
Code for initializing/tearing down PMU for PV guests Signed-off-by: Boris Ostrovsky Acked-by: Kevin Tian Acked-by: Daniel De Graaf --- Changes in v19: * Keep track of PV(H) VPMU count for non-dom0 VPMUs * Move vpmu.xenpmu_data test in pvpmu_init() under lock * Return better error codes in pvpmu_init() tools/flask/policy/policy/modules/xen/xen.te | 4 + xen/arch/x86/domain.c| 2 + xen/arch/x86/hvm/hvm.c | 1 + xen/arch/x86/hvm/svm/svm.c | 4 +- xen/arch/x86/hvm/svm/vpmu.c | 44 ++ xen/arch/x86/hvm/vmx/vmx.c | 4 +- xen/arch/x86/hvm/vmx/vpmu_core2.c| 79 - xen/arch/x86/hvm/vpmu.c | 121 +-- xen/common/event_channel.c | 1 + xen/include/asm-x86/hvm/vpmu.h | 2 + xen/include/public/pmu.h | 2 + xen/include/public/xen.h | 1 + xen/include/xsm/dummy.h | 3 + xen/xsm/flask/hooks.c| 4 + xen/xsm/flask/policy/access_vectors | 2 + 15 files changed, 226 insertions(+), 48 deletions(-) diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index 870ff81..73bbe7b 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -120,6 +120,10 @@ domain_comms(dom0_t, dom0_t) # Allow all domains to use (unprivileged parts of) the tmem hypercall allow domain_type xen_t:xen tmem_op; +# Allow all domains to use PMU (but not to change its settings --- that's what +# pmu_ctrl is for) +allow domain_type xen_t:xen2 pmu_use; + ### # # Domain creation diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 60d9a80..f19087e 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -437,6 +437,8 @@ int vcpu_initialise(struct vcpu *v) vmce_init_vcpu(v); } +spin_lock_init(&v->arch.vpmu.vpmu_lock); + if ( has_hvm_container_domain(d) ) { rc = hvm_vcpu_initialise(v); diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 4734d71..07ad171 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -4915,6 +4915,7 @@ static hvm_hypercall_t *const 
pvh_hypercall64_table[NR_hypercalls] = { HYPERCALL(hvm_op), HYPERCALL(sysctl), HYPERCALL(domctl), +HYPERCALL(xenpmu_op), [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation }; diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index b6e77cd..e523d12 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1166,7 +1166,9 @@ static int svm_vcpu_initialise(struct vcpu *v) return rc; } -vpmu_initialise(v); +/* PVH's VPMU is initialized via hypercall */ +if ( is_hvm_vcpu(v) ) +vpmu_initialise(v); svm_guest_osvw_init(v); diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index b60ca40..58a0dc4 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -360,17 +360,19 @@ static void amd_vpmu_destroy(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); -if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) ) -amd_vpmu_unset_msr_bitmap(v); +if ( has_hvm_container_vcpu(v) ) +{ +if ( is_msr_bitmap_on(vpmu) ) +amd_vpmu_unset_msr_bitmap(v); -xfree(vpmu->context); -vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED); +if ( is_hvm_vcpu(v) ) +xfree(vpmu->context); -if ( vpmu_is_set(vpmu, VPMU_RUNNING) ) -{ -vpmu_reset(vpmu, VPMU_RUNNING); release_pmu_ownship(PMU_OWNER_HVM); } + +vpmu->context = NULL; +vpmu_clear(vpmu); } /* VPMU part of the 'q' keyhandler */ @@ -435,15 +437,19 @@ int svm_vpmu_initialise(struct vcpu *v) if ( !counters ) return -EINVAL; -ctxt = xzalloc_bytes(sizeof(*ctxt) + - 2 * sizeof(uint64_t) * num_counters); -if ( !ctxt ) +if ( is_hvm_vcpu(v) ) { -printk(XENLOG_G_WARNING "Insufficient memory for PMU, " - " PMU feature is unavailable on domain %d vcpu %d.\n", - v->vcpu_id, v->domain->domain_id); -return -ENOMEM; +ctxt = xzalloc_bytes(sizeof(*ctxt) + + 2 * sizeof(uint64_t) * num_counters); +if ( !ctxt ) +{ +printk(XENLOG_G_WARNING "%pv: Insufficient memory for PMU, " + " PMU feature is unavailable\n", v); +return -ENOMEM; +} } +else +ctxt = 
&v->arch.vpmu.xenpmu_data->pmu.c.amd; ctxt->counters = sizeof(*ctxt); ctxt->ctrls = ctxt->counters + sizeof(uint64_t) * num_counters; @@ -482,6 +488,16 @@ int _
[Xen-devel] [PATCH v19 04/14] x86/VPMU: Make vpmu not HVM-specific
vpmu structure will be used for both HVM and PV guests. Move it from hvm_vcpu to arch_vcpu. Signed-off-by: Boris Ostrovsky Acked-by: Jan Beulich Reviewed-by: Kevin Tian Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/include/asm-x86/domain.h | 2 ++ xen/include/asm-x86/hvm/vcpu.h | 3 --- xen/include/asm-x86/hvm/vpmu.h | 5 ++--- 3 files changed, 4 insertions(+), 6 deletions(-) diff --git a/xen/include/asm-x86/domain.h b/xen/include/asm-x86/domain.h index 9cdffa8..2686a4f 100644 --- a/xen/include/asm-x86/domain.h +++ b/xen/include/asm-x86/domain.h @@ -434,6 +434,8 @@ struct arch_vcpu void (*ctxt_switch_from) (struct vcpu *); void (*ctxt_switch_to) (struct vcpu *); +struct vpmu_struct vpmu; + /* Virtual Machine Extensions */ union { struct pv_vcpu pv_vcpu; diff --git a/xen/include/asm-x86/hvm/vcpu.h b/xen/include/asm-x86/hvm/vcpu.h index 3d8f4dc..0faf60d 100644 --- a/xen/include/asm-x86/hvm/vcpu.h +++ b/xen/include/asm-x86/hvm/vcpu.h @@ -151,9 +151,6 @@ struct hvm_vcpu { u32 msr_tsc_aux; u64 msr_tsc_adjust; -/* VPMU */ -struct vpmu_struct vpmu; - union { struct arch_vmx_struct vmx; struct arch_svm_struct svm; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 83eea7e..82bfa0e 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -31,9 +31,8 @@ #define VPMU_BOOT_ENABLED 0x1/* vpmu generally enabled. */ #define VPMU_BOOT_BTS 0x2/* Intel BTS feature wanted. */ -#define vcpu_vpmu(vcpu) (&((vcpu)->arch.hvm_vcpu.vpmu)) -#define vpmu_vcpu(vpmu) (container_of((vpmu), struct vcpu, \ - arch.hvm_vcpu.vpmu)) +#define vcpu_vpmu(vcpu) (&(vcpu)->arch.vpmu) +#define vpmu_vcpu(vpmu) container_of((vpmu), struct vcpu, arch.vpmu) #define MSR_TYPE_COUNTER0 #define MSR_TYPE_CTRL 1 -- 1.8.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v2 3/5] xen: print online pCPUs and free pCPUs when dumping
e.g., with `xl debug-key r', like this: (XEN) Online Cpus: 0-15 (XEN) Free Cpus: 8-15 Also, for each cpupool, print the set of pCPUs it contains, like this: (XEN) Cpupool 0: (XEN) Cpus: 0-7 (XEN) Scheduler: SMP Credit Scheduler (credit) Signed-off-by: Dario Faggioli Cc: Juergen Gross Cc: George Dunlap Cc: Jan Beulich Cc: Keir Fraser --- Changes from v1: * _print_cpumap() becomes print_cpumap() (i.e., the leading '_' was not particularly useful in this case), as suggested during review * changed the output such as (1) we only print the maps, not the number of elements, and (2) we avoid printing the free cpus map when empty * improved the changelog --- I'm not including any Reviewed-by / Acked-by tag, since the patch changed. --- xen/common/cpupool.c | 12 1 file changed, 12 insertions(+) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index cd6aab9..812a2f9 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #define for_each_cpupool(ptr)\ @@ -658,6 +659,12 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op) return ret; } +static void print_cpumap(const char *str, const cpumask_t *map) +{ +cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch), map); +printk("%s: %s\n", str, keyhandler_scratch); +} + void dump_runq(unsigned char key) { unsigned longflags; @@ -671,12 +678,17 @@ void dump_runq(unsigned char key) sched_smt_power_savings? "enabled":"disabled"); printk("NOW=0x%08X%08X\n", (u32)(now>>32), (u32)now); +print_cpumap("Online Cpus", &cpu_online_map); +if ( cpumask_weight(&cpupool_free_cpus) ) +print_cpumap("Free Cpus", &cpupool_free_cpus); + printk("Idle cpupool:\n"); schedule_dump(NULL); for_each_cpupool(c) { printk("Cpupool %d:\n", (*c)->cpupool_id); +print_cpumap("Cpus", (*c)->cpu_valid); schedule_dump(*c); } ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 05/14] x86/VPMU: Interface for setting PMU mode and flags
Add runtime interface for setting PMU mode and flags. Three main modes are provided: * XENPMU_MODE_OFF: PMU is not virtualized * XENPMU_MODE_SELF: Guests can access PMU MSRs and receive PMU interrupts. * XENPMU_MODE_HV: Same as XENPMU_MODE_SELF for non-privileged guests; dom0 can profile itself and the hypervisor. Note that PMU modes are different from what can be provided at Xen's boot line with the 'vpmu' argument. An 'off' (or '0') value is equivalent to XENPMU_MODE_OFF. Any other value, on the other hand, will cause VPMU mode to be set to XENPMU_MODE_SELF during boot. For feature flags only Intel's BTS is currently supported. Mode and flags are set via HYPERVISOR_xenpmu_op hypercall. Signed-off-by: Boris Ostrovsky Acked-by: Daniel De Graaf --- Changes in v19: * Keep track of active vpmu count and allow certain mode changes only when the count is zero * Drop vpmu_unload routines * Revert to using opt_vpmu_enabled * Changes to oprofile code are no longer needed * Changes to vmcs.c are no longer needed * Simplified vpmu_switch_from/to inlines tools/flask/policy/policy/modules/xen/xen.te | 3 + xen/arch/x86/domain.c| 4 +- xen/arch/x86/hvm/svm/vpmu.c | 4 +- xen/arch/x86/hvm/vmx/vpmu_core2.c| 10 +- xen/arch/x86/hvm/vpmu.c | 155 +-- xen/arch/x86/x86_64/compat/entry.S | 4 + xen/arch/x86/x86_64/entry.S | 4 + xen/include/asm-x86/hvm/vpmu.h | 27 +++-- xen/include/public/pmu.h | 45 xen/include/public/xen.h | 1 + xen/include/xen/hypercall.h | 4 + xen/include/xlat.lst | 1 + xen/include/xsm/dummy.h | 15 +++ xen/include/xsm/xsm.h| 6 ++ xen/xsm/dummy.c | 1 + xen/xsm/flask/hooks.c| 18 xen/xsm/flask/policy/access_vectors | 2 + 17 files changed, 279 insertions(+), 25 deletions(-) diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te index c0128aa..870ff81 100644 --- a/tools/flask/policy/policy/modules/xen/xen.te +++ b/tools/flask/policy/policy/modules/xen/xen.te @@ -68,6 +68,9 @@ allow dom0_t xen_t:xen2 { resource_op psr_cmt_op }; 
+allow dom0_t xen_t:xen2 { +pmu_ctrl +}; allow dom0_t xen_t:mmu memorymap; # Allow dom0 to use these domctls on itself. For domctls acting on other diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 21f0766..60d9a80 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1536,7 +1536,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) if ( is_hvm_vcpu(prev) ) { if (prev != next) -vpmu_save(prev); +vpmu_switch_from(prev); if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) ) pt_save_timer(prev); @@ -1581,7 +1581,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) if (is_hvm_vcpu(next) && (prev != next) ) /* Must be done with interrupts enabled */ -vpmu_load(next); +vpmu_switch_to(next); context_saved(prev); diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index a8b79df..481ea7b 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -472,14 +472,14 @@ struct arch_vpmu_ops amd_vpmu_ops = { .arch_vpmu_dump = amd_vpmu_dump }; -int svm_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +int svm_vpmu_initialise(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); uint8_t family = current_cpu_data.x86; int ret = 0; /* vpmu enabled? 
*/ -if ( !vpmu_flags ) +if ( vpmu_mode == XENPMU_MODE_OFF ) return 0; switch ( family ) diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index c2405bf..6280644 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -708,13 +708,13 @@ static int core2_vpmu_do_interrupt(struct cpu_user_regs *regs) return 1; } -static int core2_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +static int core2_vpmu_initialise(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); u64 msr_content; static bool_t ds_warned; -if ( !(vpmu_flags & VPMU_BOOT_BTS) ) +if ( !(vpmu_features & XENPMU_FEATURE_INTEL_BTS) ) goto func_out; /* Check the 'Debug Store' feature in the CPUID.EAX[1]:EDX[21] */ while ( boot_cpu_has(X86_FEATURE_DS) ) @@ -826,7 +826,7 @@ struct arch_vpmu_ops core2_no_vpmu_ops = { .do_cpuid = core2_no_vpmu_do_cpuid, }; -int vmx_vpmu_initialise(struct vcpu *v, unsigned int vpmu_flags) +int vmx_vpmu_initialise(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); uint8_t family = current_cpu_data.x86; @@ -834,7 +8
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, 2015-03-17 at 14:57 +, Wei Liu wrote: > On Tue, Mar 17, 2015 at 02:54:09PM +, Ian Campbell wrote: > > On Tue, 2015-03-17 at 14:29 +, Wei Liu wrote: > > > 2. The ability to access files in Dom0. That will be used to write to / > > >read from QEMU state file. > > > > This requirement is not as broad as you make it sound. > > > > Yes. You're right. > > > All which is really required is the ability to slurp in or write out a > > blob of bytes to a service running in a control domain, not actual > > This is more accurate. It's probably also worth also mentioning that it is a streaming read or write, no need to support seek or such things. > > ability to read/write files in dom0 (which would need careful security > > consideration!). > > > > For the old qemu-traditional stubdom for example this is implemented as > > a pair of console devices (one r/o for restore + one w/o for save) which > > are setup by the toolstack at start of day and pre-plumbed into two > > temporary files. > > > > Unfortunately I don't think that hack in mini-os is upstreamable in rump > kernel. The mini-os implementation is hacky, it is ultimately just a way of implementing open("/dev/hvc1", "r") without actually having to have all of that sort of thing really. But the concept of "open a r/o device and read from it" (or vice versa) doesn't seem to be too bad to me and I expected rumpkernels to have some sort of concept like this somewhere. Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 13/14] x86/VPMU: Add privileged PMU mode
Add support for privileged PMU mode (XENPMU_MODE_ALL) which allows privileged domain (dom0) profile both itself (and the hypervisor) and the guests. While this mode is on profiling in guests is disabled. Signed-off-by: Boris Ostrovsky --- Changes in v19: * Slightly different mode changing logic in xenpmu_op() since we no longer allow mode changes while VPMUs are active xen/arch/x86/hvm/vpmu.c | 34 +- xen/arch/x86/traps.c | 13 + xen/include/public/pmu.h | 3 +++ 3 files changed, 41 insertions(+), 9 deletions(-) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index beed956..71c5063 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -111,7 +111,9 @@ int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, const struct arch_vpmu_ops *ops; int ret = 0; -if ( vpmu_mode == XENPMU_MODE_OFF ) +if ( (vpmu_mode == XENPMU_MODE_OFF) || + ((vpmu_mode & XENPMU_MODE_ALL) && + !is_hardware_domain(current->domain)) ) goto nop; curr = current; @@ -166,8 +168,12 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) struct vcpu *sampled = current, *sampling; struct vpmu_struct *vpmu; -/* dom0 will handle interrupt for special domains (e.g. idle domain) */ -if ( sampled->domain->domain_id >= DOMID_FIRST_RESERVED ) +/* + * dom0 will handle interrupt for special domains (e.g. idle domain) or, + * in XENPMU_MODE_ALL, for everyone. 
+ */ +if ( (vpmu_mode & XENPMU_MODE_ALL) || + (sampled->domain->domain_id >= DOMID_FIRST_RESERVED) ) { sampling = choose_hwdom_vcpu(); if ( !sampling ) @@ -177,17 +183,18 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) sampling = sampled; vpmu = vcpu_vpmu(sampling); -if ( !is_hvm_vcpu(sampling) ) +if ( !is_hvm_vcpu(sampling) || (vpmu_mode & XENPMU_MODE_ALL) ) { /* PV(H) guest */ const struct cpu_user_regs *cur_regs; uint64_t *flags = &vpmu->xenpmu_data->pmu.pmu_flags; -uint32_t domid = DOMID_SELF; +uint32_t domid; if ( !vpmu->xenpmu_data ) return; if ( is_pvh_vcpu(sampling) && + !(vpmu_mode & XENPMU_MODE_ALL) && !vpmu->arch_vpmu_ops->do_interrupt(regs) ) return; @@ -204,6 +211,11 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) else *flags = PMU_SAMPLE_PV; +if ( sampled == sampling ) +domid = DOMID_SELF; +else +domid = sampled->domain->domain_id; + /* Store appropriate registers in xenpmu_data */ /* FIXME: 32-bit PVH should go here as well */ if ( is_pv_32bit_vcpu(sampling) ) @@ -232,7 +244,8 @@ void vpmu_do_interrupt(struct cpu_user_regs *regs) if ( (vpmu_mode & XENPMU_MODE_SELF) ) cur_regs = guest_cpu_user_regs(); -else if ( !guest_mode(regs) && is_hardware_domain(sampling->domain) ) +else if ( !guest_mode(regs) && + is_hardware_domain(sampling->domain) ) { cur_regs = regs; domid = DOMID_XEN; @@ -508,7 +521,8 @@ static int pvpmu_init(struct domain *d, xen_pmu_params_t *params) struct page_info *page; uint64_t gfn = params->val; -if ( vpmu_mode == XENPMU_MODE_OFF ) +if ( (vpmu_mode == XENPMU_MODE_OFF) || + ((vpmu_mode & XENPMU_MODE_ALL) && !is_hardware_domain(d)) ) return -EINVAL; if ( (params->vcpu >= d->max_vcpus) || (d->vcpu == NULL) || @@ -627,12 +641,14 @@ long do_xenpmu_op(unsigned int op, XEN_GUEST_HANDLE_PARAM(xen_pmu_params_t) arg) { case XENPMU_mode_set: { -if ( (pmu_params.val & ~(XENPMU_MODE_SELF | XENPMU_MODE_HV)) || +if ( (pmu_params.val & + ~(XENPMU_MODE_SELF | XENPMU_MODE_HV | XENPMU_MODE_ALL)) || (hweight64(pmu_params.val) > 1) ) return 
-EINVAL; /* 32-bit dom0 can only sample itself. */ -if ( is_pv_32bit_vcpu(current) && (pmu_params.val & XENPMU_MODE_HV) ) +if ( is_pv_32bit_vcpu(current) && + (pmu_params.val & (XENPMU_MODE_HV | XENPMU_MODE_ALL)) ) return -EINVAL; spin_lock(&vpmu_lock); diff --git a/xen/arch/x86/traps.c b/xen/arch/x86/traps.c index 1eb7bb4..8a40deb 100644 --- a/xen/arch/x86/traps.c +++ b/xen/arch/x86/traps.c @@ -2653,6 +2653,10 @@ static int emulate_privileged_op(struct cpu_user_regs *regs) case MSR_AMD_FAM15H_EVNTSEL0...MSR_AMD_FAM15H_PERFCTR5: if ( vpmu_msr || (boot_cpu_data.x86_vendor == X86_VENDOR_AMD) ) { +if ( (vpmu_mode & XENPMU_MODE_ALL) && + !is_hardware_domain(v->domain) ) +break; + if ( vpmu_do_wrmsr(regs->ecx, msr_content, 0) ) goto fail;
[Xen-devel] [PATCH v19 12/14] x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr
The two routines share most of their logic. Signed-off-by: Boris Ostrovsky --- Changes in v19: * const-ified arch_vpmu_ops in vpmu_do_wrmsr * non-changes: - kept 'current' as a non-initializer to avoid unnecessary initialization in the (common) non-VPMU case - kept 'nop' label since there are multiple dissimilar cases that can cause a non-emulation of VPMU access xen/arch/x86/hvm/vpmu.c| 76 +- xen/include/asm-x86/hvm/vpmu.h | 14 ++-- 2 files changed, 42 insertions(+), 48 deletions(-) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index c287d8b..beed956 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -103,63 +103,47 @@ void vpmu_lvtpc_update(uint32_t val) apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc); } -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported) +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, +uint64_t supported, bool_t is_write) { -struct vcpu *curr = current; +struct vcpu *curr; struct vpmu_struct *vpmu; +const struct arch_vpmu_ops *ops; +int ret = 0; if ( vpmu_mode == XENPMU_MODE_OFF ) -return 0; +goto nop; +curr = current; vpmu = vcpu_vpmu(curr); -if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr ) -{ -int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported); - -/* - * We may have received a PMU interrupt during WRMSR handling - * and since do_wrmsr may load VPMU context we should save - * (and unload) it again. 
- */ -if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && - (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) -{ -vpmu_set(vpmu, VPMU_CONTEXT_SAVE); -vpmu->arch_vpmu_ops->arch_vpmu_save(curr); -vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); -} -return ret; -} - -return 0; -} - -int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) -{ -struct vcpu *curr = current; -struct vpmu_struct *vpmu; +ops = vpmu->arch_vpmu_ops; +if ( !ops ) +goto nop; + +if ( is_write && ops->do_wrmsr ) +ret = ops->do_wrmsr(msr, *msr_content, supported); +else if ( !is_write && ops->do_rdmsr ) +ret = ops->do_rdmsr(msr, msr_content); +else +goto nop; -if ( vpmu_mode == XENPMU_MODE_OFF ) +/* + * We may have received a PMU interrupt while handling MSR access + * and since do_wr/rdmsr may load VPMU context we should save + * (and unload) it again. + */ +if ( !is_hvm_vcpu(curr) && + vpmu->xenpmu_data && (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) { -*msr_content = 0; -return 0; +vpmu_set(vpmu, VPMU_CONTEXT_SAVE); +ops->arch_vpmu_save(curr); +vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); } -vpmu = vcpu_vpmu(curr); -if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr ) -{ -int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content); +return ret; -if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data && - (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) ) -{ -vpmu_set(vpmu, VPMU_CONTEXT_SAVE); -vpmu->arch_vpmu_ops->arch_vpmu_save(curr); -vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED); -} -return ret; -} -else + nop: +if ( !is_write ) *msr_content = 0; return 0; diff --git a/xen/include/asm-x86/hvm/vpmu.h b/xen/include/asm-x86/hvm/vpmu.h index 642a4b7..63851a7 100644 --- a/xen/include/asm-x86/hvm/vpmu.h +++ b/xen/include/asm-x86/hvm/vpmu.h @@ -99,8 +99,8 @@ static inline bool_t vpmu_are_all_set(const struct vpmu_struct *vpmu, } void vpmu_lvtpc_update(uint32_t val); -int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported); -int 
vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content); +int vpmu_do_msr(unsigned int msr, uint64_t *msr_content, +uint64_t supported, bool_t is_write); void vpmu_do_interrupt(struct cpu_user_regs *regs); void vpmu_do_cpuid(unsigned int input, unsigned int *eax, unsigned int *ebx, unsigned int *ecx, unsigned int *edx); @@ -110,6 +110,16 @@ void vpmu_save(struct vcpu *v); void vpmu_load(struct vcpu *v); void vpmu_dump(struct vcpu *v); +static inline int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, +uint64_t supported) +{ +return vpmu_do_msr(msr, &msr_content, supported, 1); +} +static inline int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) +{ +return vpmu_do_msr(msr, msr_content, 0, 0); +} + extern int acquire_pmu_ownership(int pmu_ownership); extern void release_pmu_ownership(int pmu_ownership); -- 1.8.1.4 __
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, Mar 17, 2015 at 02:29:07PM +, Wei Liu wrote: > I've now successfully built QEMU upstream with rump kernel. However to > make it fully functional as a stubdom, there are some missing pieces to > be added in. > > 1. The ability to access QMP socket (a unix socket) from Dom0. That >will be used to issue commands to QEMU. The QMP "socket" does not need to be a unix socket. It can be any of these (from qemu --help): Character device options:
-chardev null,id=id[,mux=on|off]
-chardev socket,id=id[,host=host],port=port[,to=to][,ipv4][,ipv6][,nodelay][,reconnect=seconds] [,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (tcp)
-chardev socket,id=id,path=path[,server][,nowait][,telnet][,reconnect=seconds][,mux=on|off] (unix)
-chardev udp,id=id[,host=host],port=port[,localaddr=localaddr] [,localport=localport][,ipv4][,ipv6][,mux=on|off]
-chardev msmouse,id=id[,mux=on|off]
-chardev vc,id=id[[,width=width][,height=height]][[,cols=cols][,rows=rows]] [,mux=on|off]
-chardev ringbuf,id=id[,size=size]
-chardev file,id=id,path=path[,mux=on|off]
-chardev pipe,id=id,path=path[,mux=on|off]
-chardev pty,id=id[,mux=on|off]
-chardev stdio,id=id[,mux=on|off][,signal=on|off]
-chardev serial,id=id,path=path[,mux=on|off]
-chardev tty,id=id,path=path[,mux=on|off]
-chardev parallel,id=id,path=path[,mux=on|off]
-chardev parport,id=id,path=path[,mux=on|off]
-chardev spicevmc,id=id,name=name[,debug=debug]
-chardev spiceport,id=id,name=name[,debug=debug]
> 2. The ability to access files in Dom0. That will be used to write to / >read from QEMU state file. To save a QEMU state (write), we do use a filename. But I guess we could expand the QMP command (xen-save-devices-state) to use something else, if that is easier. To restore, we provide a file descriptor from libxl to QEMU, with the fd open on the file that contains the state we want to restore.
But there are a few other ways to load a state (from qemu.git/docs/migration.txt):
- tcp migration: do the migration using tcp sockets
- unix migration: do the migration using unix sockets
- exec migration: do the migration using the stdin/stdout through a process.
- fd migration: do the migration using a file descriptor that is passed to QEMU. QEMU doesn't care how this file descriptor is opened.
-- Anthony PERARD ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 10/14] x86/VPMU: Add support for PMU register handling on PV guests
Intercept accesses to PMU MSRs and process them in VPMU module. If vpmu ops for VCPU are not initialized (which is the case, for example, for PV guests that are not "VPMU-enlightened") access to MSRs will return failure. Dump VPMU state for all domains (HVM and PV) when requested. Signed-off-by: Boris Ostrovsky Acked-by: Jan Beulich Acked-by: Kevin Tian Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/arch/x86/domain.c | 3 +-- xen/arch/x86/hvm/vmx/vpmu_core2.c | 49 +++-- xen/arch/x86/hvm/vpmu.c | 3 +++ xen/arch/x86/traps.c | 51 +-- 4 files changed, 95 insertions(+), 11 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index c7f8210..a48d824 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -2065,8 +2065,7 @@ void arch_dump_vcpu_info(struct vcpu *v) { paging_dump_vcpu_info(v); -if ( is_hvm_vcpu(v) ) -vpmu_dump(v); +vpmu_dump(v); } void domain_cpuid( diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c index d10e3e7..66d7bc0 100644 --- a/xen/arch/x86/hvm/vmx/vpmu_core2.c +++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c @@ -27,6 +27,7 @@ #include #include #include +#include #include #include #include @@ -299,12 +300,18 @@ static inline void __core2_vpmu_save(struct vcpu *v) rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]); for ( i = 0; i < arch_pmc_cnt; i++ ) rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter); + +if ( !has_hvm_container_vcpu(v) ) +rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status); } static int core2_vpmu_save(struct vcpu *v) { struct vpmu_struct *vpmu = vcpu_vpmu(v); +if ( !has_hvm_container_vcpu(v) ) +wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0); + if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) ) return 0; @@ -342,6 +349,13 @@ static inline void __core2_vpmu_load(struct vcpu *v) wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl); wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area); wrmsrl(MSR_IA32_PEBS_ENABLE, 
core2_vpmu_cxt->pebs_enable); + +if ( !has_hvm_container_vcpu(v) ) +{ +wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl); +core2_vpmu_cxt->global_ovf_ctrl = 0; +wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); +} } static void core2_vpmu_load(struct vcpu *v) @@ -442,7 +456,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index) static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported) { -u64 global_ctrl; int i, tmp; int type = -1, index = -1; struct vcpu *v = current; @@ -486,7 +499,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, switch ( msr ) { case MSR_CORE_PERF_GLOBAL_OVF_CTRL: +if ( msr_content & ~(0xC000 | + (((1ULL << fixed_pmc_cnt) - 1) << 32) | + ((1ULL << arch_pmc_cnt) - 1)) ) +return 1; core2_vpmu_cxt->global_status &= ~msr_content; +wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content); return 0; case MSR_CORE_PERF_GLOBAL_STATUS: gdprintk(XENLOG_INFO, "Can not write readonly MSR: " @@ -514,14 +532,18 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n"); return 0; case MSR_CORE_PERF_GLOBAL_CTRL: -global_ctrl = msr_content; +core2_vpmu_cxt->global_ctrl = msr_content; break; case MSR_CORE_PERF_FIXED_CTR_CTRL: if ( msr_content & ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) ) return 1; -vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); +if ( has_hvm_container_vcpu(v) ) +vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + &core2_vpmu_cxt->global_ctrl); +else +rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32); if ( msr_content != 0 ) { @@ -546,7 +568,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, if ( msr_content & (~((1ull << 32) - 1)) ) return 1; -vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl); +if ( has_hvm_container_vcpu(v) 
) +vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, + &core2_vpmu_cxt->global_ctrl); +else +rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl); if ( msr_content & (1ULL <
[Xen-devel] [PATCH v19 01/14] x86/VPMU: VPMU should not exist when vpmu_initialise() is called
We don't need to try to destroy it, since it cannot already be allocated at the time we try to initialize it. Signed-off-by: Boris Ostrovsky Suggested-by: Andrew Cooper --- Changes in v19: * Removed unnecessary test for VPMU_CONTEXT_ALLOCATED in svm/vpmu.c xen/arch/x86/hvm/svm/vpmu.c | 3 --- xen/arch/x86/hvm/vpmu.c | 5 + 2 files changed, 1 insertion(+), 7 deletions(-) diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 64dc167..6764070 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -359,9 +359,6 @@ static int amd_vpmu_initialise(struct vcpu *v) struct vpmu_struct *vpmu = vcpu_vpmu(v); uint8_t family = current_cpu_data.x86; -if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) -return 0; - if ( counters == NULL ) { switch ( family ) diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c index 0e6b6c0..c3273ee 100644 --- a/xen/arch/x86/hvm/vpmu.c +++ b/xen/arch/x86/hvm/vpmu.c @@ -236,10 +236,7 @@ void vpmu_initialise(struct vcpu *v) if ( is_pvh_vcpu(v) ) return; -if ( vpmu_is_set(vpmu, VPMU_CONTEXT_ALLOCATED) ) -vpmu_destroy(v); -vpmu_clear(vpmu); -vpmu->context = NULL; +ASSERT(!vpmu->flags && !vpmu->context); switch ( vendor ) { -- 1.8.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v19 09/14] x86/VPMU: When handling MSR accesses, leave fault injection to callers
With this patch return value of 1 of vpmu_do_msr() will now indicate whether an error was encountered during MSR processing (instead of stating that the access was to a VPMU register). As part of this patch we also check for validity of certain MSR accesses right when we determine which register is being written, as opposed to postponing this until later. Signed-off-by: Boris Ostrovsky Acked-by: Kevin Tian Reviewed-by: Dietmar Hahn Tested-by: Dietmar Hahn --- xen/arch/x86/hvm/svm/svm.c| 6 ++- xen/arch/x86/hvm/svm/vpmu.c | 6 +-- xen/arch/x86/hvm/vmx/vmx.c| 24 +--- xen/arch/x86/hvm/vmx/vpmu_core2.c | 82 ++- 4 files changed, 55 insertions(+), 63 deletions(-) diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index e523d12..4fe36e9 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1709,7 +1709,8 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) case MSR_AMD_FAM15H_EVNTSEL3: case MSR_AMD_FAM15H_EVNTSEL4: case MSR_AMD_FAM15H_EVNTSEL5: -vpmu_do_rdmsr(msr, msr_content); +if ( vpmu_do_rdmsr(msr, msr_content) ) +goto gpf; break; case MSR_AMD64_DR0_ADDRESS_MASK: @@ -1860,7 +1861,8 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content) case MSR_AMD_FAM15H_EVNTSEL3: case MSR_AMD_FAM15H_EVNTSEL4: case MSR_AMD_FAM15H_EVNTSEL5: -vpmu_do_wrmsr(msr, msr_content, 0); +if ( vpmu_do_wrmsr(msr, msr_content, 0) ) +goto gpf; break; case MSR_IA32_MCx_MISC(4): /* Threshold register */ diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 58a0dc4..474d0db 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -305,7 +305,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, is_pmu_enabled(msr_content) && !vpmu_is_set(vpmu, VPMU_RUNNING) ) { if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) -return 1; +return 0; vpmu_set(vpmu, VPMU_RUNNING); if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) ) @@ -335,7 +335,7 @@ static int 
amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, /* Write to hw counters */ wrmsrl(msr, msr_content); -return 1; +return 0; } static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) @@ -353,7 +353,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) rdmsrl(msr, *msr_content); -return 1; +return 0; } static void amd_vpmu_destroy(struct vcpu *v) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index 83b740a..206e50d 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2127,12 +2127,17 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content) *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL | MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; /* Perhaps vpmu will change some bits. */ +/* FALLTHROUGH */ +case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7): +case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3): +case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: +case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: +case MSR_IA32_PEBS_ENABLE: +case MSR_IA32_DS_AREA: if ( vpmu_do_rdmsr(msr, msr_content) ) -goto done; +goto gp_fault; break; default: -if ( vpmu_do_rdmsr(msr, msr_content) ) -break; if ( passive_domain_do_rdmsr(msr, msr_content) ) goto done; switch ( long_mode_do_msr_read(msr, msr_content) ) @@ -2308,7 +2313,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content) if ( msr_content & ~supported ) { /* Perhaps some other bits are supported in vpmu. 
*/ -if ( !vpmu_do_wrmsr(msr, msr_content, supported) ) +if ( vpmu_do_wrmsr(msr, msr_content, supported) ) break; } if ( msr_content & IA32_DEBUGCTLMSR_LBR ) @@ -2336,9 +2341,16 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content) if ( !nvmx_msr_write_intercept(msr, msr_content) ) goto gp_fault; break; +case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7): +case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7): +case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: +case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: +case MSR_IA32_PEBS_ENABLE: +case MSR_IA32_DS_AREA: + if ( vpmu_do_wrmsr(msr, msr_content, 0) ) +goto gp_fault; +break; default: -if ( vpmu_do_wrmsr(msr, msr_content, 0) ) -return X86EMUL_OKAY; if ( passive_domain_do_wrmsr(msr, msr_content) ) return X86EMUL_O
[Xen-devel] [PATCH v19 08/14] x86/VPMU: Save VPMU state for PV guests during context switch
Save VPMU state during context switch for both HVM and PV(H) guests. A subsequent patch ("x86/VPMU: NMI-based VPMU support") will make it possible for vpmu_switch_to() to call vmx_vmcs_try_enter()->vcpu_pause() which needs is_running to be correctly set/cleared. To prepare for that, call context_saved() before vpmu_switch_to() is executed. (Note that while this change could have been delayed until that later patch, the changes are harmless to existing code and so we do it here.) Signed-off-by: Boris Ostrovsky --- Changes in v19: * Adjusted for new vpmu_switch_to/from interface xen/arch/x86/domain.c | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index f19087e..c7f8210 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1533,17 +1533,14 @@ void context_switch(struct vcpu *prev, struct vcpu *next) } if ( prev != next ) -_update_runstate_area(prev); - -if ( is_hvm_vcpu(prev) ) { -if (prev != next) -vpmu_switch_from(prev); - -if ( !list_empty(&prev->arch.hvm_vcpu.tm_list) ) -pt_save_timer(prev); +_update_runstate_area(prev); +vpmu_switch_from(prev); } +if ( is_hvm_vcpu(prev) && !list_empty(&prev->arch.hvm_vcpu.tm_list) ) +pt_save_timer(prev); + local_irq_disable(); set_current(next); @@ -1581,15 +1578,16 @@ void context_switch(struct vcpu *prev, struct vcpu *next) !is_hardware_domain(next->domain)); } -if (is_hvm_vcpu(next) && (prev != next) ) -/* Must be done with interrupts enabled */ -vpmu_switch_to(next); - context_saved(prev); if ( prev != next ) +{ _update_runstate_area(next); +/* Must be done with interrupts enabled */ +vpmu_switch_to(next); +} + /* Ensure that the vcpu has an up-to-date time base. */ update_vcpu_system_time(next); -- 1.8.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] Upstream QEMU based stubdom and rump kernel
On Tue, Mar 17, 2015 at 02:54:09PM +, Ian Campbell wrote: > On Tue, 2015-03-17 at 14:29 +, Wei Liu wrote: > > 2. The ability to access files in Dom0. That will be used to write to / > >read from QEMU state file. > > This requirement is not as broad as you make it sound. > Yes. You're right. > All which is really required is the ability to slurp in or write out a > blob of bytes to a service running in a control domain, not actual This is more accurate. > ability to read/write files in dom0 (which would need careful security > consideration!). > > For the old qemu-traditional stubdom for example this is implemented as > a pair of console devices (one r/o for restore + one w/o for save) which > are setup by the toolstack at start of day and pre-plumbed into two > temporary files. > Unfortunately I don't think that hack in mini-os is upstreamable in rump kernel. Wei. > Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-upstream-4.3-testing test] 36494: trouble: pass/preparing
flight 36494 qemu-upstream-4.3-testing running [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36494/ Failures and problems with tests :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-winxpsp3-vcpus1 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-win7-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-qemut-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-ovmf-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-pair 2 hosts-allocate running [st=running!] test-amd64-i386-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-rhel6hvm-intel 2 hosts-allocate running [st=running!] test-amd64-i386-freebsd10-i386 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemut-winxpsp3-vcpus1 2 hosts-allocate running [st=running!] test-amd64-i386-xend-winxpsp3 2 hosts-allocate running [st=running!] test-amd64-i386-pv2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-freebsd10-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-libvirt 2 hosts-allocate running [st=running!] test-amd64-i386-qemuu-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xend-qemut-winxpsp3 2 hosts-allocaterunning [st=running!] test-amd64-i386-xl-qemut-win7-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-xl2 hosts-allocate running [st=running!] test-amd64-i386-qemuu-rhel6hvm-intel 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 2 hosts-allocate running [st=running!] test-amd64-i386-xl-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-qemut-rhel6hvm-intel 2 hosts-allocate running [st=running!] test-amd64-amd64-libvirt 2 hosts-allocate running [st=running!] 
test-amd64-amd64-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf-pin 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf 2 hosts-allocate running [st=running!] test-amd64-amd64-xl 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-pcipt-intel 2 hosts-allocaterunning [st=running!] test-amd64-amd64-pv 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-multivcpu 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-credit2 2 hosts-allocate running [st=running!] test-amd64-amd64-pair 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-winxpsp3 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!] version targeted for testing: qemuuab689a89ec47b2e1c964c57bea7da68f8ddf89fd baseline version: qemuu580b1d06aa3eed3ae9c12b4225a1ea1c192ab119 People who touched revisions under test: Andreas Färber Anthony Liguori Asias He Benoit Canet Benoît Canet Gerd Hoffmann Juan Quintela Kevin Wolf Michael Roth Michael S. Tsirkin Paolo Bonzini Peter Maydell Petr Matousek Stefan Hajnoczi Stefano Stabellini jobs: build-amd64 pass build-i386 pass build-amd64-libvirt pass build-i386-libvirt pass build-amd64-pvopspass build-i386-pvops pass test-amd64-amd64-xl preparing test-amd64-i386-xl preparing test-amd64-i386-rhel6hvm-amd preparing test-amd64-i386-qemut-rhel6hvm-amd preparing test-amd64-i386-qemuu-rhel6hvm-amd preparing test-amd64-amd64-xl-qemut-debianhvm-amd64preparing test-amd64-i386-xl-qemut-debianh
Re: [Xen-devel] [RFC PATCH] dpci: Put the dpci back on the list if running on another CPU.
On Tue, Mar 17, 2015 at 09:42:21AM +0100, Sander Eikelenboom wrote: > > Tuesday, March 17, 2015, 9:18:32 AM, you wrote: > > On 16.03.15 at 18:59, wrote: > >> Hence was wondering if it would just be easier to put > >> this patch in (see above) - with the benefit that folks have > >> a faster interrupt passthrough experience and then I work on another > >> variant of this with tristate cmpxchg and ->mapping atomic counter. > > > Considering how long this issue has been pending I think we really > > need to get _something_ in (or revert); if this something is the > > patch in its most recent form, so be it (even if maybe not the > > simplest of all possible variants). So please submit as a proper non- > > RFC patch. > > > Jan > > I'm still running with this first simple stopgap patch from Konrad, > and it has worked fine for me since. I believe the patch that Sander and Malcolm had been running is the best candidate. The other ones I had been fiddling with - such as the one attached here - I cannot make myself comfortable that they will not hit a dead-lock. On Intel hardware the softirq is called from vmx_resume - which means that the whole 'interrupt guest and deliver the event' code happens during the VMEXIT-to-VMENTER window. But that does not preclude another interrupt destined for this same vCPU coming right in as we are progressing through the softirqs - and dead-locking: in the vmx_resume stack we are in hvm_dirq_assist (called from dpci_softirq) and haven't cleared STATE_SCHED, while in the IRQ stack we spin in raise_softirq_for waiting for STATE_SCHED to be cleared. A dead-lock avoidance could be added by saving the CPU of the softirq that is executing the dpci. Then 'raise_softirq_for' could check that and bail out if (smp_processor_id() == dpci_pirq->cpu).
Naturally this means being very careful about _where_ we initialize the 'cpu' to -1, etc. - which brings us back to carefully working out the corner cases and making sure we do the right thing - and that can take time. Re-using the 'dpci' on the per-cpu list does exactly what the older tasklet code was doing. That is: if the function assigned to the tasklet was running, the softirq that ran said function (hvm_dirq_assist) would be responsible for putting the tasklet back on the per-cpu list. This would allow having a running tasklet and a 'to-be-scheduled' tasklet at the same time. And that is what we need. I will post a proper patch and also add Tested-by from Malcolm and Sander on it - as it did fix their test-cases and is unmodified (except an updated comment) from what they tested in 2014. > > I will see if this new one also "works-for-me", somewhere today :-) > > -- > Sander > > > > diff --git a/xen/drivers/passthrough/io.c b/xen/drivers/passthrough/io.c > index ae050df..d1421b0 100644 > --- a/xen/drivers/passthrough/io.c > +++ b/xen/drivers/passthrough/io.c > @@ -804,7 +804,18 @@ static void dpci_softirq(void) > d = pirq_dpci->dom; > smp_mb(); /* 'd' MUST be saved before we set/clear the bits. */ > if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) > -BUG(); > +{ > +unsigned long flags; > + > +/* Put back on the list and retry. */ > +local_irq_save(flags); > +list_add_tail(&pirq_dpci->softirq_list, &this_cpu(dpci_list)); > +local_irq_restore(flags); > + > +raise_softirq(HVM_DPCI_SOFTIRQ); > +continue; > +} > + > /* > * The one who clears STATE_SCHED MUST refcount the domain. > */ > >From 6b32dccfbe00518d3ca9cd94d19a6e007b2645d9 Mon Sep 17 00:00:00 2001 From: Konrad Rzeszutek Wilk Date: Tue, 17 Mar 2015 09:46:09 -0400 Subject: [PATCH] dpci: when scheduling spin until STATE_RUN or STATE_SCHED has been cleared. There is a race when we clear the STATE_SCHED in the softirq - which allows the 'raise_softirq_for' (on another CPU) to schedule the dpci.
Specifically this can happen when the other CPU receives an interrupt, calls 'raise_softirq_for', and puts the dpci on its per-cpu list (same dpci structure). There would be two 'dpci_softirq' instances running at the same time (on different CPUs), where one CPU would be executing hvm_dirq_assist (so it had cleared STATE_SCHED and set STATE_RUN) and the other CPU is trying to call: if ( test_and_set_bit(STATE_RUN, &pirq_dpci->state) ) BUG(); Since STATE_RUN is already set it would end badly. The reason we can hit this is when an interrupt's affinity is set over multiple CPUs. Potential solutions: a) Instead of the BUG() we can put the dpci back on the per-cpu list to deal with later (when the softirqs are activated again). Putting the 'dpci' back on the per-cpu list amounts to a spin until the bad condition clears. b) We could also expand the test-and-set(STATE_SCHED) in raise_softirq_for to detect the 'STATE_RUN' bit being set and schedule the dpci in a safer manner (delay
Re: [Xen-devel] [PATCH 04/10] xen/blkfront: separate ring information to an new struct
Hi Bob, > -Original Message- > From: Bob Liu [mailto:bob@oracle.com] > Sent: 17 March 2015 07:00 > To: Felipe Franciosi > Cc: Konrad Rzeszutek Wilk; Roger Pau Monne; David Vrabel; xen- > de...@lists.xen.org; linux-ker...@vger.kernel.org; ax...@fb.com; > h...@infradead.org; avanzini.aria...@gmail.com; cheg...@amazon.de > Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to an new > struct > > Hi Felipe, > > On 03/06/2015 06:30 PM, Felipe Franciosi wrote: > >> -Original Message- > >> From: Bob Liu [mailto:bob@oracle.com] > >> Sent: 05 March 2015 00:47 > >> To: Konrad Rzeszutek Wilk > >> Cc: Roger Pau Monne; Felipe Franciosi; David Vrabel; > >> xen-devel@lists.xen.org; linux-ker...@vger.kernel.org; ax...@fb.com; > >> h...@infradead.org; avanzini.aria...@gmail.com; cheg...@amazon.de > >> Subject: Re: [PATCH 04/10] xen/blkfront: separate ring information to > >> an new struct > >> > >> > >> ...snip... > >>> > >>> Meaning you weren't able to do the same test? > >>> > >> > >> I can if there are more details about how to set up this 5 and 10 > >> guests environment and test pattern have been used. > >> Just think it might be save time if somebody still have the similar > >> environment by hand. > >> Roger and Felipe, if you still have the environment could you please > >> have a quick compare about feature-persistent performance with patch > >> [PATCH v5 0/2] > >> gnttab: Improve scaleability? > > > > I've been meaning to do that. I don't have the environment up, but it isn't > > too > hard to put it back together. A bit swamped at the moment, but will try (very > hard) to do it next week. > > > > Do you have gotten any testing result? I've put the hardware back together and am sorting out the software for testing. Things are not moving as fast as I wanted due to other commitments. I'll keep this thread updated as I progress. Malcolm is OOO and I'm trying to get his patches to work on a newer Xen. 
The evaluation will compare: 1) bare metal i/o (for baseline) 2) tapdisk3 (currently using grant copy, which is what scales best in my experience) 3) blkback w/ persistent grants 4) blkback w/o persistent grants (I will just comment out the handshake bits in blkback/blkfront) 5) blkback w/o persistent grants + Malcolm's grant map patches To my knowledge, blkback (w/ or w/o persistent grants) is always faster than user space alternatives (e.g. tapdisk, qemu-qdisk) as latency is much lower. However, tapdisk with grant copy has been shown to produce (much) better aggregate throughput figures as it avoids any issues with grant (un)mapping. I'm hoping to show that (5) above scales better than (3) and (4) in a representative scenario. If it does, I will recommend that we get rid of persistent grants in favour of a better and more scalable grant (un)mapping implementation. Comments welcome. Cheers, F. > > -- > Regards, > -Bob ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [qemu-upstream-4.5-testing test] 36492: trouble: pass/preparing
flight 36492 qemu-upstream-4.5-testing running [real] http://www.chiark.greenend.org.uk/~xensrcts/logs/36492/ Failures and problems with tests :-( Tests which did not succeed and are blocking, including tests which could not be run: test-amd64-i386-xl-qemuu-win7-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-qemut-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-freebsd10-amd64 2 hosts-allocaterunning [st=running!] test-amd64-amd64-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-pcipt-intel 2 hosts-allocaterunning [st=running!] test-amd64-i386-libvirt 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-pvh-amd 2 hosts-allocate running [st=running!] test-armhf-armhf-xl-multivcpu 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-sedf-pin 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-credit2 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!] test-armhf-armhf-xl 2 hosts-allocate running [st=running!] test-armhf-armhf-xl-credit2 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-multivcpu 2 hosts-allocate running [st=running!] test-amd64-i386-xl-qemuu-ovmf-amd64 2 hosts-allocaterunning [st=running!] test-amd64-i386-pair 2 hosts-allocate running [st=running!] test-amd64-amd64-libvirt 2 hosts-allocate running [st=running!] test-amd64-amd64-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!] test-amd64-i386-qemuu-rhel6hvm-amd 2 hosts-allocate running [st=running!] test-amd64-i386-xl-winxpsp3 2 hosts-allocate running [st=running!] test-armhf-armhf-xl-midway2 hosts-allocate running [st=running!] test-amd64-i386-rhel6hvm-amd 2 hosts-allocate running [st=running!] 
 test-armhf-armhf-xl-sedf-pin 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl 2 hosts-allocate running [st=running!]
 test-amd64-i386-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-freebsd10-i386 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-armhf-armhf-libvirt 2 hosts-allocate running [st=running!]
 test-amd64-amd64-pair 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemuu-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-pvh-intel 2 hosts-allocate running [st=running!]
 test-armhf-armhf-xl-sedf 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemut-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!]

version targeted for testing:
 qemuu 0b8fb1ec3d666d1eb8bbff56c76c5e6daa2789e4
baseline version:
 qemuu 1ebb75b1fee779621b63e84fefa7b07354c43a99

People who touched revisions under test:
 Gerd Hoffmann
 Gonglei
 Juan Quintela
 Michael S. Tsirkin
 Paolo Bonzini
 Peter Maydell
 Petr Matousek
 Stefano Stabellini

jobs:
 build-amd64 pass
 build-armhf pass
 build-i386 pass
 build-amd64-libvirt pass
 build-armhf-libvirt pa
[Xen-devel] [qemu-upstream-4.4-testing test] 36499: trouble: pass/preparing
flight 36499 qemu-upstream-4.4-testing running [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/36499/

Failures and problems with tests :-(

Tests which did not succeed and are blocking, including tests which could not be run:
 test-amd64-i386-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemut-rhel6hvm-amd 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-freebsd10-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-libvirt 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-pair 2 hosts-allocate running [st=running!]
 test-amd64-i386-rhel6hvm-amd 2 hosts-allocate running [st=running!]
 test-amd64-i386-xend-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-freebsd10-i386 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl 2 hosts-allocate running [st=running!]
 test-amd64-i386-xend-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-i386-pv 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemut-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-winxpsp3-vcpus1 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-sedf 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-sedf-pin 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemuu-rhel6hvm-amd 2 hosts-allocate running [st=running!]
 test-amd64-i386-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-i386-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-i386-qemuu-rhel6hvm-intel 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-ovmf-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-pcipt-intel 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-credit2 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-multivcpu 2 hosts-allocate running [st=running!]
 test-amd64-amd64-libvirt 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemuu-win7-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-winxpsp3 2 hosts-allocate running [st=running!]
 test-amd64-amd64-pv 2 hosts-allocate running [st=running!]
 test-amd64-amd64-pair 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-debianhvm-amd64 2 hosts-allocate running [st=running!]
 test-amd64-amd64-xl-qemut-win7-amd64 2 hosts-allocate running [st=running!]

version targeted for testing:
 qemuu d173a0c20d7970c17fa593cf86abc1791a8a4a3a
baseline version:
 qemuu b04df88d41f64fc6b56d193b6e90fb840cedb1d3

People who touched revisions under test:
 Benoit Canet
 Benoît Canet
 Dmitry Fleytman
 Gerd Hoffmann
 Jason Wang
 Jeff Cody
 Juan Quintela
 Kevin Wolf
 Laszlo Ersek
 Michael Roth
 Michael S. Tsirkin
 Peter Maydell
 Petr Matousek
 Stefan Hajnoczi
 Stefano Stabellini

jobs:
 build-amd64-xend pass
 build-i386-xend pass
 build-amd64 pass
 build-i386 pass
 build-amd64-libvirt pass
 build-i386-libvirt pass
 build-amd64-pvops pass
 build-i386-pvops pass
 test-amd64-amd64-xl preparing
 test-amd64-i386-xl preparing
 test-amd64-i386-rhel6hvm-amd preparing
 test-amd64-i386-qemut-rhel6hvm-amd preparing
 test-amd64-i386-qemuu-rhel6hvm-amd
[Xen-devel] Upstream QEMU based stubdom and rump kernel
Hi all,

I'm now working on upstream QEMU stubdom, and rump kernel seems to be a good fit for this purpose.

A bit of background information: a stubdom is a service domain. With a QEMU stubdom we are able to run QEMU device emulation code in a separate domain, so that bugs in QEMU don't affect Dom0 (the controlling domain). Xen currently has a QEMU stubdom, but it's based on our fork of an ancient QEMU (plus some other libraries and mini-os). Eventually we would like to use upstream QEMU in the stubdom.

I've now successfully built upstream QEMU with rump kernel. However, to make it fully functional as a stubdom, there are some missing pieces to be added:

1. The ability to access the QMP socket (a unix socket) from Dom0. That will be used to issue commands to QEMU.
2. The ability to access files in Dom0. That will be used to write to / read from QEMU's state file.
3. The build process requires mini-os headers. Those will be used to build libxc (the controlling library).

(Xen folks, am I missing anything?)

One of the lessons I learned from the existing stubdom work is that I should work with upstream and produce maintainable code. So before I do anything for real I'd better consult the community. My gut feeling is that the first two requirements are not really Xen specific.

Let me know what you guys plan and think.

Wei. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
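Requirement 1 — reaching QEMU's QMP unix socket from Dom0 — is, on the Dom0 side, ordinary AF_UNIX plumbing; the stubdom-specific work is forwarding that stream across the domain boundary. As a minimal sketch of the Dom0 side (the socket path below is hypothetical; each toolstack picks its own):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Connect to a QMP (QEMU Machine Protocol) unix socket in Dom0.
 * Returns a connected fd, or -1 on error.  The caller then performs the
 * usual QMP handshake (read greeting, send qmp_capabilities). */
static int qmp_connect(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;                       /* path too long for sun_path */
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);                       /* no server listening there */
        return -1;
    }
    return fd;
}
```

With QEMU inside a stubdom there is no Dom0 filesystem path for the socket to live on, which is exactly why Wei lists this as a missing piece: something (a console ring, vchan, or similar) has to stand in for the unix socket transport.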
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
On 03/17/2015 04:20 PM, Jan Beulich wrote:
> On 17.03.15 at 15:07, wrote:
>> Yes, but Andrew's idea (which I think is very neat) is that instead of
>> the trickery I used to do in the original patch (create a specific
>> VMCALL vm_event and compare eax to a magic constant on VMCALL-based
>> VMEXITS, to figure out if all I wanted to do was send out the event),
>> that I should instead have the guest set up rax, rdi and rsi and execute
>> vmcall, which would then be translated to a real hypercall that sends
>> out a vm_event.
>
> If you think about a bare HVM guest OS (i.e. without any PV
> drivers), then of course you should provide such hypercall
> wrappers for code to use instead of open coding it in potentially
> many places.
>
>> In this case, the (HVM) guest does need to concern itself with what
>> registers it should set up for that purpose. I suppose a workaround
>> could be to write the subop in both ebx and rdi, though without any
>> testing I don't know at this point what, if anything, might be broken
>> that way.
>
> Guest code ought to know what mode it runs in. And introspection
> code (in case this is about injection of such code) ought to also
> know which mode the monitored guest is in.

Yes, we'll try to handle this. I was mainly asking because, based on Andrew's suggestion (which only mentioned rdi, not ebx), I wanted to make sure that this is not something that people might prefer to change at the Xen source code level.

Thanks for the clarification,
Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] osstest going offline for a bit due to database server move
On Tue, 2015-03-17 at 10:28 +, Ian Campbell wrote:
> On Mon, 2015-03-16 at 12:41 +, Ian Campbell wrote:
> > We've not yet tracked down the source of the mysterious filer reboots
> > and there was another earlier today, we've fiddled with a few things to
> > see if we can track them down.
> >
> > osstest is doing stuff now, fingers crossed.
>
> There were some more reboots overnight. We've made another config change
> which we hope will resolve things. If not we will look at moving the
> controller VM to another filer tomorrow.
>
> In the meantime in an attempt to try and keep some of the more important
> branches flowing with the limited bandwidth between reboots I've stopped
> a bunch of stuff: [...]

After discussion with Stefano I've also stopped the qemu-upstream stuff for 4.2, 4.3, 4.4 and 4.5. AIUI the tags to be used for the 4.3.x and 4.4.x branches are already in the tested branch and everything after that is targeting the next point release.

$ for i in 4.2 4.3 4.4 4.5 ; do
>   touch qemu-upstream-$i-testing
> done

and killed these flights:

 flight | blessing |          branch           | intended
--------+----------+---------------------------+----------
  36492 | running  | qemu-upstream-4.5-testing | real
  36494 | running  | qemu-upstream-4.3-testing | real
  36499 | running  | qemu-upstream-4.4-testing | real

Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] PVH DomU panics on boot on Xen 4.5.0 whereas it was fine on 4.4.1
On 17/03/15 12:54, Konrad Rzeszutek Wilk wrote:
> On Mon, Mar 16, 2015 at 11:08:50PM +, Ian Murray wrote:
>> On 16/03/15 14:12, Konrad Rzeszutek Wilk wrote:
>>> On Sun, Mar 15, 2015 at 09:34:16PM +, Ian Murray wrote:
>>>> Hi, I have a domU guest that booted fine under Xen 4.4.1 with pvh=1
>>>> but now fails to boot with it under Xen 4.5.0. Removing pvh=1, i.e.
>>>> booting it as traditional PV, results in it booting fine. The only odd
>>>> thing is that I had to compile with debug=y in Config.mk to avoid a
>>>> compiler warning that was causing compilation to fail outright. I will
>>>> create another mail about that. DomU is Ubuntu 14.10 and Dom0 is 12.04.5
>>>
>>> There were some incompatible changes in Xen 4.5 in regards to
>>> PVH which were then updated in Linux 3.19 (or was it 3.18?)
>>>
>>> I would recommend you rev up to the latest version of Linux.
>>
>> I tried the mainline support kernel for Ubuntu (GNU/Linux
>> 3.19.1-031901-generic x86_64) and it booted fine.
>>
>> Thanks for the assistance and happy to assist if anyone wants to treat
>> it as a regression.
>
> Nah, it is labelled 'experimental' for that exact reason - as we
> realized we made a mistake in Xen 4.4 that we ended up fixing
> in Xen 4.5 - and then fixed it in Linux.

Thanks. I was aware it was experimental, but just wanted to offer the chance to debug if the above behaviour was unexpected.

> Sorry, I thought that it was not widely mentioned and it made your
> day a bit sad. Was there a specific webpage you looked at first for
> help? (Asking so I can at least edit it to mention this).

I don't remember where I looked, tbh, although I would have read the release notes for 4.5 as a matter of course. I checked the PVH wiki entry and that merely refers to the "latest" of Xen and Linux. Perhaps that could be a bit more specific. My day wasn't made sad by this, but my day was a little sadder when I read (if I am reading it right) that AMD support for PVH (although slated for it) does not appear to have made it into 4.5.
:) Thanks for reading.

>> Here is the boot output when booting with pvh=1:

[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Initializing cgroup subsys cpuacct
[0.00] Linux version 3.16.0-31-generic (buildd@batsu) (gcc version 4.9.1 (Ubuntu 4.9.1-16ubuntu6) ) #41-Ubuntu SMP Tue Feb 10 15:24:04 UTC 2015 (Ubuntu 3.16.0-31.41-generic 3.16.7-ckt5)
[0.00] Command line: root=UUID=edfcef2a-dcf1-4c77-ad69-22456606702e ro nomodeset xen-fbfront.video=16,1024,768
[0.00] KERNEL supported cpus:
[0.00]   Intel GenuineIntel
[0.00]   AMD AuthenticAMD
[0.00]   Centaur CentaurHauls
[0.00] ACPI in unprivileged domain disabled
[0.00] e820: BIOS-provided physical RAM map:
[0.00] Xen: [mem 0x-0x3fff] usable
[0.00] NX (Execute Disable) protection: active
[0.00] DMI not present or invalid.
[0.00] AGP: No AGP bridge found
[0.00] e820: last_pfn = 0x4 max_arch_pfn = 0x4
[0.00] Scanning 1 areas for low memory corruption
[0.00] init_memory_mapping: [mem 0x-0x000f]
[0.00] init_memory_mapping: [mem 0x3fe0-0x3fff]
[0.00] init_memory_mapping: [mem 0x3c00-0x3fdf]
[0.00] init_memory_mapping: [mem 0x0010-0x3bff]
[0.00] RAMDISK: [mem 0x023f6000-0x0589]
[0.00] NUMA turned off
[0.00] Faking a node at [mem 0x-0x3fff]
[0.00] Initmem setup node 0 [mem 0x-0x3fff]
[0.00] NODE_DATA [mem 0x3fffb000-0x3fff]
[0.00] Zone ranges:
[0.00]   DMA [mem 0x1000-0x00ff]
[0.00]   DMA32 [mem 0x0100-0x]
[0.00]   Normal empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node 0: [mem 0x1000-0x0009]
[0.00]   node 0: [mem 0x0010-0x3fff]
[0.00] SFI: Simple Firmware Interface v0.81 http://simplefirmware.org
[0.00] smpboot: Allowing 2 CPUs, 0 hotplug CPUs
[0.00] PM: Registered nosave memory: [mem 0x000a-0x000f]
[0.00] e820: [mem 0x4000-0x] available for PCI devices
[0.00] Booting paravirtualized kernel with PVH extensions on Xen
[0.00] Xen version: 4.5.0
[0.00] setup_percpu: NR_CPUS:256 nr_cpumask_bits:256 nr_cpu_ids:2 nr_node_ids:1
[0.00] PERCPU: Embedded 28 pages/cpu @88003fc0 s83328 r8192 d23168 u1048576
[0.00] Built 1 zonelists in Node order, mobility grouping on. Total pag
Re: [Xen-devel] OpenStack - Libvirt+Xen CI overview
Bob Ball wrote:
> For the last few weeks Anthony and I have been working on creating a CI
> environment to run against all OpenStack jobs. We're now in a position where
> we can share the current status, an overview of how it works, and next steps.
> We actively want to support involvement in this effort from others with an
> interest in libvirt+Xen's OpenStack integration.
>
> The CI we have set up follows the recommendations made by the official
> OpenStack infrastructure maintainers, and reproduces a notable portion of the
> official OpenStack CI environment to run these tests. Namely, this setup is
> using:
> - Puppet to deploy the master node
> - Zuul to watch for code changes uploaded to review.openstack.org
> - Jenkins job builder to create Jenkins job definitions from a YAML file
> - Nodepool to automatically create single-use virtual machines in the
>   Rackspace public cloud
> - Devstack-gate to run Tempest tests in serial
>
> More information on Zuul, JJB, Nodepool and devstack-gate is available
> through http://ci.openstack.org
>
> The current status is that we have a zuul instance monitoring for jobs and
> adding them to the queue of jobs to be run at
> http://zuul.openstack.xenproject.org/
>
> In the background Nodepool provisions virtual machines into a pool of nodes
> ready to be used. All ready nodes are automatically added to Jenkins
> (https://jenkins.openstack.xenproject.org/), and then Zuul+Jenkins will
> trigger a particular job on a node when one is available.
>
> Logs are then uploaded to Rackspace's Cloud Files, with sample logs for
> a passing job at
> http://logs.openstack.xenproject.org/52/162352/3/silent/dsvm-tempest-xen/da3ff30/index.html

Thanks for the info!

> I'd like to organise a meeting to walk through the various components
> of the CI with those who are interested, so this is an initial call to
> find out who is interested in finding out more!

I'd like to know more.
Regards, Jim ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
>>> On 17.03.15 at 15:07, wrote: > Yes, but Andrew's idea (which I think is very neat) is that instead of > the trickery I used to do in the original patch (create a specific > VMCALL vm_event and compare eax to a magic constant on VMCALL-based > VMEXITS, to figure out if all I wanted to do was send out the event), > that I should instead have the guest set up rax, rdi and rsi and execute > vmcall, which would then be translated to a real hypercall that sends > out a vm_event. If you think about a bare HVM guest OS (i.e. without any PV drivers), then of course you should provide such hypercall wrappers for code to use instead of open coding it in potentially many places. > In this case, the (HVM) guest does need to concern itself with what > registers it should set up for that purpose. I suppose a workaround > could be to write the subop in both ebx and rdi, though without any > testing I don't know at this point what, if anything, might be broken > that way. Guest code ought to know what mode it runs in. And introspection code (in case this is about injection of such code) ought to also know which mode the monitored guest is in. Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
On Mon, 2015-03-16 at 16:30 -0400, Meng Xu wrote:
> Hi Dario,
>
Hey,

> 2015-03-16 13:05 GMT-04:00 Dario Faggioli :
> >
> > This change also takes the chance to add a scratch
> > cpumask, to avoid having to create one more
> > cpumask_var_t on the stack of the dumping routine.
>
> Actually, I have a question about the strength of this design. When we
> have a machine with many cpus, we will end up with allocating a
> cpumask for each cpu.
>
Just FTR, what we will end up allocating is:
 - an array of *pointers* to cpumasks with as many elements as the number of pCPUs,
 - a cpumask *only* for the pCPUs subjected to an instance of the RTDS scheduler.

So, for instance, if you have 64 pCPUs, but are using the RTDS scheduler only in a cpupool with 2 pCPUs, you'll have an array of 64 pointers to cpumask_t, but only 2 actual cpumasks.

> Is this better than having a cpumask_var_t on
> the stack of the dumping routine, since the dumping routine is not in
> the hot path?
>
George and Jan replied to this already, I think. Allow me to add just a few words:

> > Such a scratch area can be used to kill most of the
> > cpumask_var_t local variables in other functions
> > in the file, but that is *NOT* done in this change.
>
This is the point, actually! As said here, this is not only for the sake of the dumping routine. In fact, ideally, someone will, in the near future, go throughout the whole file and kill most of the cpumask_t local variables, and most of the cpumask dynamic allocations, in favour of using this scratch area.

> > @@ -409,6 +423,10 @@ rt_init(struct scheduler *ops)
> >     if ( prv == NULL )
> >         return -ENOMEM;
> >
> > +   _cpumask_scratch = xmalloc_array(cpumask_var_t, nr_cpu_ids);
>
> Is it better to use xzalloc_array?
>
Why? IMO, not really. I'm only free()-ing (in rt_free_pdata()) the elements of the array that have been previously successfully allocated (in rt_alloc_pdata()), so I don't think there is any special requirement for all the elements to be NULL right away.
> > +   if ( _cpumask_scratch == NULL )
> > +       return -ENOMEM;
> > +
> >     spin_lock_init(&prv->lock);
> >     INIT_LIST_HEAD(&prv->sdom);
> >     INIT_LIST_HEAD(&prv->runq);
> > @@ -426,6 +444,7 @@ rt_deinit(const struct scheduler *ops)
> > {
> >     struct rt_private *prv = rt_priv(ops);
> >
> > +   xfree(_cpumask_scratch);
> >     xfree(prv);
> > }
> >
> > @@ -443,6 +462,9 @@ rt_alloc_pdata(const struct scheduler *ops, int cpu)
> >     per_cpu(schedule_data, cpu).schedule_lock = &prv->lock;
> >     spin_unlock_irqrestore(&prv->lock, flags);
> >
> > +   if ( !alloc_cpumask_var(&_cpumask_scratch[cpu]) )
>
> Is it better to use zalloc_cpumask_var() here?
>
Nope. It's a scratch area, after all, so one really should not assume it to be in a specific state (e.g., no bits set as you're suggesting) when using it.

Thanks and Regards,
Dario ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
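Dario's allocation scheme — one pointer array sized for all possible pCPUs, with actual masks allocated only when a pCPU joins an RTDS pool — can be sketched outside Xen like this (plain calloc/free stand in for xmalloc_array, alloc_cpumask_var and friends; all names are illustrative). The pointer array is zeroed here only so untouched slots read as NULL in this standalone sketch; as Dario notes, the Xen patch gets away with a plain xmalloc_array because it only ever frees slots it previously allocated:

```c
#include <stdlib.h>

typedef struct { unsigned long bits[4]; } cpumask_t;  /* stand-in for Xen's cpumask_t */

static cpumask_t **scratch_mask;        /* one pointer per possible pCPU */
static unsigned int nr_cpu_ids = 64;

/* rt_init()-like step: allocate only the pointer array, no masks yet. */
static int scratch_init(void)
{
    scratch_mask = calloc(nr_cpu_ids, sizeof(*scratch_mask));
    return scratch_mask ? 0 : -1;
}

/* rt_alloc_pdata()-like step: allocate a mask only when this pCPU joins. */
static int scratch_alloc_pdata(unsigned int cpu)
{
    scratch_mask[cpu] = calloc(1, sizeof(cpumask_t));
    return scratch_mask[cpu] ? 0 : -1;
}

/* rt_free_pdata()-like step: free only what was actually allocated. */
static void scratch_free_pdata(unsigned int cpu)
{
    free(scratch_mask[cpu]);
    scratch_mask[cpu] = NULL;
}
```

With 64 possible pCPUs and a 2-pCPU pool, this costs 64 pointers but only 2 masks — the point Dario makes above.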
Re: [Xen-devel] [PATCH v6 07/30] PCI: Pass PCI domain number combined with root bus number
On Tue, 2015-03-17 at 10:45 +0530, Manish Jaggi wrote:
> On Monday 09 March 2015 08:04 AM, Yijing Wang wrote:
> > Now we can pass the PCI domain combined with the bus number
> > in a u32 argument. Because in arm/arm64 the PCI domain number
> > is assigned by pci_bus_assign_domain_nr(), we leave
> > pci_scan_root_bus() and pci_create_root_bus() in arm/arm64
> > unchanged. A new function pci_host_assign_domain_nr()
> > will be introduced for arm/arm64 to assign the domain number
> > in a later patch.
> Hi,
> I think these changes might not be required. We have made very few
> changes in xen-pcifront to support PCI passthrough on arm64.
> As per the Xen architecture, for a domU only a single PCI virtual bus is
> created and all passthrough devices are attached to it.

I guess you are only talking about the changes to xen-pcifront.c? Otherwise you are ignoring the dom0 case, which is exposed to the real set of PCI root complexes, and anyway I'm not sure how "not needed for Xen domU" translates into "not required", since it is clearly required for other systems.

Strictly speaking, the Xen pciif protocol does support multiple buses; it's just that the tools, and perhaps kernels, have not yet felt any need to actually make use of that.

There doesn't seem to be any harm in updating pcifront to follow this generic API change.

Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
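The exact u32 encoding isn't shown in this excerpt. As an assumption for illustration only, a natural layout keeps the 16-bit PCI domain (segment) above the 8-bit bus number — the macro names below are invented, not necessarily those used in the v6 series:

```c
#include <stdint.h>

/* Hypothetical packing of a PCI domain (segment, up to 16 bits) and a
 * bus number (8 bits) into a single u32 argument.  The real series may
 * choose different names or a different shift. */
#define PCI_DOMBUS(domain, bus)  ((uint32_t)(((uint32_t)(domain) << 8) | ((bus) & 0xffu)))
#define PCI_DOMAIN(dombus)       ((uint16_t)((dombus) >> 8))   /* recover the domain */
#define PCI_BUSNUM(dombus)       ((uint8_t)((dombus) & 0xffu)) /* recover the bus */
```

Whatever the concrete layout, the property that matters for the API change is that pack/unpack round-trip losslessly, so callers like pci_scan_root_bus() can carry both values through one argument.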
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
On 03/17/2015 03:58 PM, Jan Beulich wrote: On 17.03.15 at 14:50, wrote: >> On 07/11/2014 08:23 PM, Andrew Cooper wrote: >>> From the point of view of your in-guest agent, it would be a vmcall with >>> rax = 34 (hvmop) rdi = $N (send_mem_event subop) rsi = data or pointer >>> to struct containing data, depending on how exactly you implement the >>> hypercall. >>> >>> You would have the bonus of being able to detect errors, e.g. -ENOENT >>> for "mem_event not active", get SVM support for free, and not need magic >>> numbers, or vendor specific terms like "vmcall" finding their way into >>> the Xen public API. >> >> Actually, this only seems to be the case where mode == 8 in >> hvm_do_hypercall() (xen/arch/x86/hvm/hvm.c): >> >> 4987 : hvm_hypercall64_table)[eax](rdi, rsi, rdx, >> r10, r8, r9); >> >> Otherwise (and this seems to be the case with my Xen build), ebx seems >> to be used for the subop: >> >> 5033 regs->_eax = hvm_hypercall32_table[eax](ebx, ecx, edx, esi, >> edi, ebp); >> >> So, ebx needs to be $N (send_mem_event subop), not rdi. Is this intended >> (rdi in one case and ebx in the other)? > > Of course - the ABIs (and hence the use of registers for certain > specific purposes) of ix86 and x86-64 are different. Since there > are hypercall wrappers in both the kernel and the tool stack, you > shouldn't actually need to care about this on the caller side. And > the handler side doesn't deal with specific registers anyway > (outside of hvm_do_hypercall() that is). Yes, but Andrew's idea (which I think is very neat) is that instead of the trickery I used to do in the original patch (create a specific VMCALL vm_event and compare eax to a magic constant on VMCALL-based VMEXITS, to figure out if all I wanted to do was send out the event), that I should instead have the guest set up rax, rdi and rsi and execute vmcall, which would then be translated to a real hypercall that sends out a vm_event. 
In this case, the (HVM) guest does need to concern itself with what registers it should set up for that purpose. I suppose a workaround could be to write the subop in both ebx and rdi, though without any testing I don't know at this point what, if anything, might be broken that way. Thanks, Razvan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
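The mode-dependent register convention being discussed can be written down explicitly. A small host-runnable sketch encoding which register carries the n-th hypercall argument in each guest mode, mirroring the dispatch quoted above from hvm_do_hypercall() (the function name is invented):

```c
#include <stddef.h>
#include <string.h>

/* Argument-register order for Xen hypercalls from an x86 guest:
 * 64-bit guests pass arguments in rdi/rsi/rdx/r10/r8/r9, while
 * 32-bit guests pass them in ebx/ecx/edx/esi/edi/ebp.  So the
 * "send event" subop goes in rdi in long mode but in ebx from a
 * 32-bit guest -- exactly the discrepancy raised in this thread. */
static const char *hcall_arg_reg(int long_mode, unsigned int n)
{
    static const char *const regs64[6] = { "rdi", "rsi", "rdx", "r10", "r8", "r9" };
    static const char *const regs32[6] = { "ebx", "ecx", "edx", "esi", "edi", "ebp" };

    if (n >= 6)
        return NULL;          /* hypercalls take at most six arguments */
    return long_mode ? regs64[n] : regs32[n];
}
```

A guest-side wrapper would consult its own mode (as Jan says, "guest code ought to know what mode it runs in"), load eax/rax with the hypercall number and the arguments into the registers above, and only then execute vmcall (or vmmcall on SVM).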
Re: [Xen-devel] [PATCH RFC V2 3/5] libxl: add pvusb API
Hi Chunyan,

I've found another problem while trying to write a qemu based pvUSB backend.

On 01/19/2015 09:28 AM, Chunyan Liu wrote:
Add pvusb APIs, including:
 - attach/detach (create/destroy) virtual usb controller.
 - attach/detach usb device
 - list assignable usb devices in host
 - some other helper functions

Signed-off-by: Chunyan Liu
Signed-off-by: Simon Cao
---
...
diff --git a/tools/libxl/libxl_usb.c b/tools/libxl/libxl_usb.c
new file mode 100644
index 000..830a846
--- /dev/null
+++ b/tools/libxl/libxl_usb.c
...
+/* xenstore usb data */
+static int libxl__device_usb_add_xenstore(libxl__gc *gc, uint32_t domid,
+                                          libxl_device_usb *usb)
+{
+    libxl_ctx *ctx = CTX;
+    char *be_path;
+    int rc;
+    libxl_domain_config d_config;
+    libxl_device_usb usb_saved;
+    libxl__domain_userdata_lock *lock = NULL;
+
+    libxl_domain_config_init(&d_config);
+    libxl_device_usb_init(&usb_saved);
+    libxl_device_usb_copy(CTX, &usb_saved, usb);
+
+    be_path = libxl__sprintf(gc, "%s/backend/vusb/%d/%d",
+                             libxl__xs_get_dompath(gc, 0), domid, usb->ctrl);
+    if (libxl__wait_for_backend(gc, be_path, "4") < 0) {

Don't do this! That's the reason I had to change my backend driver in order to support assignment of a usb device via config file. Normally the backend will switch to state 4 only after the frontend is started. You can just remove waiting for the backend here. The backend has to check all ports when it is changing its state to 4 ("connected").
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    lock = libxl__lock_domain_userdata(gc, domid);
+    if (!lock) {
+        rc = ERROR_LOCK_FAIL;
+        goto out;
+    }
+
+    rc = libxl__get_domain_configuration(gc, domid, &d_config);
+    if (rc) goto out;
+
+    DEVICE_ADD(usb, usbs, domid, &usb_saved, COMPARE_USB, &d_config);
+
+    rc = libxl__set_domain_configuration(gc, domid, &d_config);
+    if (rc) goto out;
+
+    be_path = libxl__sprintf(gc, "%s/port/%d", be_path, usb->port);
+    LIBXL__LOG(ctx, LIBXL__LOG_DEBUG, "Adding new usb device to xenstore");
+    if (libxl__xs_write_checked(gc, XBT_NULL, be_path, usb->intf)) {
+        rc = ERROR_FAIL;
+        goto out;
+    }
+
+    rc = 0;
+
+out:
+    if (lock) libxl__unlock_domain_userdata(lock);
+    libxl_device_usb_dispose(&usb_saved);
+    libxl_domain_config_dispose(&d_config);
+    return rc;
+
+}
+
+static int libxl__device_usb_remove_xenstore(libxl__gc *gc, uint32_t domid,
+                                             libxl_device_usb *usb)
+{
+    libxl_ctx *ctx = CTX;
+    char *be_path;
+
+    be_path = libxl__sprintf(gc, "%s/backend/vusb/%d/%d",
+                             libxl__xs_get_dompath(gc, 0), domid, usb->ctrl);
+    if (libxl__wait_for_backend(gc, be_path, "4") < 0)

Remove this one, too.

Juergen ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
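For reference, the xenstore layout the patch manipulates: the controller's backend directory is dom0path/backend/vusb/domid/ctrl, and each port is a key under .../port/n holding the assigned interface. A tiny helper reproducing the path construction (snprintf stands in for libxl__sprintf; "/local/domain/0" is the usual dom0 path but is still an assumption here):

```c
#include <stdio.h>
#include <string.h>

/* Build the xenstore path of a vusb port, mirroring the two-step
 * construction in libxl__device_usb_add_xenstore() above. */
static int vusb_port_path(char *buf, size_t len, const char *dom0path,
                          unsigned int domid, unsigned int ctrl,
                          unsigned int port)
{
    return snprintf(buf, len, "%s/backend/vusb/%u/%u/port/%u",
                    dom0path, domid, ctrl, port);
}
```

Juergen's point is then that the toolstack should write the port key unconditionally and let the backend re-scan all port keys whenever it moves itself to state 4, rather than the toolstack blocking until the backend is already connected.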
Re: [Xen-devel] [PATCH v7] sndif: add ABI for Para-virtual sound
On Tue, 2015-03-17 at 13:05 +, Lars Kurth wrote:
> > On 17 Mar 2015, at 11:40, Ian Campbell wrote:
> >
> > On Thu, 2015-03-12 at 18:14 +, Lars Kurth wrote:
> >> Hi, I nearly missed this. Please make sure you forward stuff and change
> >> the headline if you want me to look into things. Otherwise I may miss it.
> >
> > Sure, I'll try and remember.
> >
> > FYI, before Ian J went away he mentioned that he had raised some
> > questions/issues (either on this or a previous version) which had not
> > yet been answered (or maybe not answered to his satisfaction, I'm not
> > sure) but that if those were addressed he would take a look with a view
> > to acking the interface for inclusion in xen.git.
>
> OK. So this means there are some concrete loose ends, which need to be
> followed up on. I also remember that there was a discussion on how we should
> specify protocols, which does not appear to have fully concluded either.
>
> >> Would this work as a way forward?
> >
> > I think the main thing which is missing is some decision as to the
> > point at which we would consider the ABI for a PV protocol fixed, i.e.
> > to be maintained in a backwards compatible manner from then on.
>
> What do we do with new APIs in such situations?

We review them carefully and hope we get them right. We manage to get this right at least some of the time because many of us are familiar with the issues WRT e.g. memory management hypercalls. This is what I was getting at with "people are naturally a bit cautious about creating new ABIs, which must be maintained long term, for types of device with which they are not really familiar." in my initial mail. The "which they are not really familiar" part is pretty key.

It's also (normally) not too hard to add a new hypercall fixing a shortcoming in an existing one while retaining backwards compat, compared with doing that for an I/O protocol (see: netchannel2).
In the I/O case adding extensions also is reasonably well understood and something we manage, but fixing a core issue is much harder (see: the non-uniformity of the blk protocol over different architectures, or the ring space wastage due to various power of two requirements, neither of which can realistically be properly fixed). Ian. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
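The "ring space wastage due to power of two requirements" Ian mentions comes from the standard shared-ring macros rounding the slot count down to a power of two. A sketch of the arithmetic (the 64-byte header and 56-byte request size are illustrative numbers, not taken from any real protocol):

```c
/* Largest power of two <= x, as the ring-size machinery effectively does. */
static unsigned int rd_pow2(unsigned int x)
{
    unsigned int p = 1;
    while (p * 2 <= x && p * 2 != 0)  /* stop before overflowing p */
        p *= 2;
    return p;
}

/* Usable request slots in one shared page: raw capacity rounded down. */
static unsigned int ring_slots(unsigned int page, unsigned int hdr,
                               unsigned int entry)
{
    return rd_pow2((page - hdr) / entry);
}
```

With a 4096-byte page, a 64-byte header and 56-byte entries, the page could hold 72 entries, but rounding down to a power of two leaves 64 slots, i.e. (72 - 64) * 56 = 448 bytes of the page that can never carry requests — the kind of baked-in cost that is hard to fix once a protocol's ABI is frozen.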
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
>>> On 17.03.15 at 14:50, wrote: > On 07/11/2014 08:23 PM, Andrew Cooper wrote: >> From the point of view of your in-guest agent, it would be a vmcall with >> rax = 34 (hvmop) rdi = $N (send_mem_event subop) rsi = data or pointer >> to struct containing data, depending on how exactly you implement the >> hypercall. >> >> You would have the bonus of being able to detect errors, e.g. -ENOENT >> for "mem_event not active", get SVM support for free, and not need magic >> numbers, or vendor specific terms like "vmcall" finding their way into >> the Xen public API. > > Actually, this only seems to be the case where mode == 8 in > hvm_do_hypercall() (xen/arch/x86/hvm/hvm.c): > > 4987 : hvm_hypercall64_table)[eax](rdi, rsi, rdx, > r10, r8, r9); > > Otherwise (and this seems to be the case with my Xen build), ebx seems > to be used for the subop: > > 5033 regs->_eax = hvm_hypercall32_table[eax](ebx, ecx, edx, esi, > edi, ebp); > > So, ebx needs to be $N (send_mem_event subop), not rdi. Is this intended > (rdi in one case and ebx in the other)? Of course - the ABIs (and hence the use of registers for certain specific purposes) of ix86 and x86-64 are different. Since there are hypercall wrappers in both the kernel and the tool stack, you shouldn't actually need to care about this on the caller side. And the handler side doesn't deal with specific registers anyway (outside of hvm_do_hypercall() that is). Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v7] sndif: add ABI for Para-virtual sound
>>> On 17.03.15 at 14:05, wrote:
>> On 17 Mar 2015, at 11:40, Ian Campbell wrote:
>> I think the main thing which is missing is some decision as to the
>> point at which we would consider the ABI for a PV protocol fixed, i.e.
>> to be maintained in a backwards compatible manner from then on.
>
> What do we do with new APIs in such situations? It would appear that there
> is some commonality in how we would handle a protocol and an API. I am
> assuming APIs such as new hypercalls don't immediately become fixed and
> backwards compatible.

New hypercalls become set in stone as soon as they appear in any released version, unless specifically marked as experimental or alike. The situation is quite different for a protocol specification like this: here we talk about something where no code would live in xen.git at all, only the abstract description. Hence its stability can't usefully be tied to any released Xen version.

Jan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2 2/4] xen/arm: Add GSER region to ThunderX platform mapping
On Tue, 2015-03-17 at 18:32 +0530, Vijay Kilari wrote:
> Hi Ian,
>
> On Thu, Mar 5, 2015 at 10:40 PM, Ian Campbell wrote:
> > On Thu, 2015-03-05 at 16:46 +, Ian Campbell wrote:
> >> On Wed, 2015-03-04 at 11:36 +0530, vijay.kil...@gmail.com wrote:
> >> > From: Vijaya Kumar K
> >> >
> >> > Add GSER region to thunderx platform specific mappings.
> >> > This region is not mentioned in DT. This is required by
> >> > the PCI driver to detect and configure PCI devices attached.
> >> >
> >> > In future we can remove this mapping, if the PCI driver
> >> > in Dom does not require this.
> >>
> >> How do we know what the PCI driver in dom0 needs? I don't think we can,
> >> so we can in effect never remove this specific mapping, which is a
> >> shame.
> >>
> >> Unless you have some scheme in mind which would allow us to do so?
> >>
> >> IMHO by far the best solution would be to add this device to the DTB so
> >> that it is correctly mapped. I'm not quite sure what that will look like
> >> since the mainline DTB doesn't have the PCI node at all.
> >
> > Looking at a more recent DTB which I have access to it seems like
> > 0x87e09000 is correctly covered by a ranges entry on the PCI
> > controller node.
>
> Where did you find the recent DTB? AFAIK, this region does not fall
> under any PCI controller range.

It was in the tree you guys sent me a little while back,
ThunderX_Release_v0.3.tar.gz IIRC. thunder-88xx-2n.dtsi in that contains
a PCI node "pcie0: pcie0@0x8480," with ranges containing this entry:

    <0x0300 0x87e0 0x 0x87e0 0x 0x01 0x>,

which covers the range from 0x87e0 to 0xe7f, i.e. covering this region
at 0x87e09000.

> > So I think all which is needed is a) to use this updated DTB and b) my
> > series "xen: arm: Parse PCI DT nodes' ranges and interrupt-map" from
> > last October which, as it happens, I've been working on bringing up to
> > date yesterday and today (one more thing to clean up before I repost).
>
> Because it is not covered under any PCI ranges, your patch series
> still does not help.
> In fact, this is a common region for SERDES configuration so it cannot
> bind to any particular PCI controller range.

Even if that turns out to be the case, then surely this region needs to
be defined somehow in the DT, else how could it be discovered?

Ian.
Re: [Xen-devel] [PATCH] libxc/xentrace: Replace xc_tbuf_set_cpu_mask with CPU mask with xc_cpumap_t instead of uint32_t
On 03/13/2015 08:37 PM, Konrad Rzeszutek Wilk wrote:
> +static int parse_cpumask(const char *arg)
> +{
> +    xc_cpumap_t map;
> +    uint32_t v, i;
> +    int bits = 0;
> +
> +    map = malloc(sizeof(uint32_t));
> +    if ( !map )
> +        return -ENOMEM;
> +
> +    v = argtol(arg, 0);
> +    for ( i = 0; i < sizeof(uint32_t); i++ )
> +        map[i] = (v >> (i * 8)) & 0xff;
> +
> +    for ( i = 0; v; v >>= 1 )
> +        bits += v & 1;

Um, it looks like this is counting the 1-bits in v, not the total number
of bits. So "0x8000" would finish with bits == 1, but we would want this
to finish with bits == 16, wouldn't we? Or am I confused?

 -George
Re: [Xen-devel] [PATCH 7/7] xen: sched_rt: print useful affinity info when dumping
On Mon, 2015-03-16 at 19:05 +, George Dunlap wrote:
> On 03/16/2015 05:05 PM, Dario Faggioli wrote:
> > @@ -218,7 +224,6 @@ __q_elem(struct list_head *elem)
> >  static void
> >  rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
> >  {
> > -    char cpustr[1024];
> >      cpumask_t *cpupool_mask;
> >
> >      ASSERT(svc != NULL);
> > @@ -229,10 +234,22 @@ rt_dump_vcpu(const struct scheduler *ops, const struct rt_vcpu *svc)
> >          return;
> >      }
> >
> > -    cpumask_scnprintf(cpustr, sizeof(cpustr), svc->vcpu->cpu_hard_affinity);
> > +    cpupool_mask = cpupool_scheduler_cpumask(svc->vcpu->domain->cpupool);
> > +    /*
> > +     * We can't just use 'cpumask_scratch' because the dumping can
> > +     * happen from a pCPU outside of this scheduler's cpupool, and
> > +     * hence it's not right to use the pCPU's scratch mask (which
> > +     * may even not exist!). On the other hand, it is safe to use
> > +     * svc->vcpu->processor's own scratch space, since we own the
> > +     * runqueue lock.
>
> Since we *hold* the lock.

Right, thanks.

> > +     */
> > +    cpumask_and(_cpumask_scratch[svc->vcpu->processor], cpupool_mask,
> > +                svc->vcpu->cpu_hard_affinity);
> > +    cpulist_scnprintf(keyhandler_scratch, sizeof(keyhandler_scratch),
> > +                      _cpumask_scratch[svc->vcpu->processor]);
>
> Just a suggestion, would it be worth making a local variable to avoid
> typing this long thing twice?

It probably would.

> Then you could also put the comment about using the
> svc->vcpu->processor's scratch space above the place where you set the
> local variable, while avoiding breaking up the logic of the cpumask
> operations.

I like this, will do.

Regards,
Dario
Re: [Xen-devel] [PATCH RFC V2 4/6] xen: Support for VMCALL mem_events
On 07/11/2014 08:23 PM, Andrew Cooper wrote:
> On 11/07/14 16:43, Razvan Cojocaru wrote:
>> Added support for VMCALL events (the memory introspection library
>> will have the guest trigger VMCALLs, which will then be sent along
>> via the mem_event mechanism).
>>
>> Changes since V1:
>>  - Added a #define and a comment explaining a previous magic
>>    constant.
>>  - Had MEM_EVENT_REASON_VMCALL explicitly not honour
>>    HVMPME_onchangeonly.
>>
>> Signed-off-by: Razvan Cojocaru
>> ---
>>  xen/arch/x86/hvm/hvm.c          |    9 +
>>  xen/arch/x86/hvm/vmx/vmx.c      |   18 +-
>>  xen/include/asm-x86/hvm/hvm.h   |    1 +
>>  xen/include/public/hvm/params.h |    4 +++-
>>  xen/include/public/mem_event.h  |    5 +
>>  5 files changed, 35 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
>> index 89a0382..6e86d7c 100644
>> --- a/xen/arch/x86/hvm/hvm.c
>> +++ b/xen/arch/x86/hvm/hvm.c
>> @@ -5564,6 +5564,7 @@ long do_hvm_op(unsigned long op, XEN_GUEST_HANDLE_PARAM(void) arg)
>>      case HVM_PARAM_MEMORY_EVENT_INT3:
>>      case HVM_PARAM_MEMORY_EVENT_SINGLE_STEP:
>>      case HVM_PARAM_MEMORY_EVENT_MSR:
>> +    case HVM_PARAM_MEMORY_EVENT_VMCALL:
>>          if ( d == current->domain )
>>          {
>>              rc = -EPERM;
>> @@ -6199,6 +6200,14 @@ void hvm_memory_event_msr(unsigned long msr, unsigned long value)
>>                             value, ~value, 1, msr);
>>  }
>>
>> +void hvm_memory_event_vmcall(unsigned long rip, unsigned long eax)
>> +{
>> +    hvm_memory_event_traps(current->domain->arch.hvm_domain
>> +                             .params[HVM_PARAM_MEMORY_EVENT_VMCALL],
>> +                           MEM_EVENT_REASON_VMCALL,
>> +                           rip, ~rip, 1, eax);
>> +}
>> +
>>  int hvm_memory_event_int3(unsigned long gla)
>>  {
>>      uint32_t pfec = PFEC_page_present;
>> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
>> index 2caa04a..6c63225 100644
>> --- a/xen/arch/x86/hvm/vmx/vmx.c
>> +++ b/xen/arch/x86/hvm/vmx/vmx.c
>> @@ -2879,8 +2879,24 @@ void vmx_vmexit_handler(struct cpu_user_regs *regs)
>>      case EXIT_REASON_VMCALL:
>>      {
>>          int rc;
>> +        unsigned long eax = regs->eax;
>> +
>>          HVMTRACE_1D(VMMCALL, regs->eax);
>> -        rc = hvm_do_hypercall(regs);
>> +
>> +        /* Don't send a VMCALL mem_event unless something
>> +         * caused the guest's eax register to contain the
>> +         * VMCALL_EVENT_REQUEST constant. */
>> +        if ( regs->eax != VMCALL_EVENT_REQUEST )
>> +        {
>> +            rc = hvm_do_hypercall(regs);
>> +        }
>> +        else
>> +        {
>> +            hvm_memory_event_vmcall(guest_cpu_user_regs()->eip, eax);
>> +            update_guest_eip();
>> +            break;
>> +        }
>
> Thinking more about this, it is really a hypercall pretending not to
> be. It would be better to introduce a real HVMOP_send_mem_event.
>
> From the point of view of your in-guest agent, it would be a vmcall with
> rax = 34 (hvmop), rdi = $N (send_mem_event subop), rsi = data or pointer
> to struct containing data, depending on how exactly you implement the
> hypercall.
>
> You would have the bonus of being able to detect errors, e.g. -ENOENT
> for "mem_event not active", get SVM support for free, and not need magic
> numbers, or vendor specific terms like "vmcall" finding their way into
> the Xen public API.

Actually, this only seems to be the case where mode == 8 in
hvm_do_hypercall() (xen/arch/x86/hvm/hvm.c):

4987 : hvm_hypercall64_table)[eax](rdi, rsi, rdx, r10, r8, r9);

Otherwise (and this seems to be the case with my Xen build), ebx seems
to be used for the subop:

5033 regs->_eax = hvm_hypercall32_table[eax](ebx, ecx, edx, esi, edi, ebp);

So, ebx needs to be $N (send_mem_event subop), not rdi. Is this intended
(rdi in one case and ebx in the other)?

Thanks,
Razvan
Re: [Xen-devel] [PATCH v3 21/24] tools/(lib)xl: Add partial device tree support for ARM
Hi Ian,

Sorry for the late answer.

On 23/02/15 17:22, Ian Campbell wrote:
> On Mon, 2015-02-23 at 17:06 +, Julien Grall wrote:
>> On 23/02/15 11:46, Ian Campbell wrote:
>>> On Tue, 2015-01-13 at 14:25 +, Julien Grall wrote:
>>>> Let the user pass additional nodes to the guest device tree. For this
>>>> purpose, everything in the node /passthrough from the partial device
>>>> tree will be copied into the guest device tree. The node /aliases will
>>>> also be copied to allow the user to define aliases which can be used
>>>> by the guest kernel.
>>>>
>>>> A simple partial device tree will look like:
>>>>
>>>> /dts-v1/;
>>>>
>>>> / {
>>>>     #address-cells = <2>;
>>>>     #size-cells = <2>;
>>>
>>> Are these mandatory/required as implied below, or only the ones inside
>>> the passthrough node (which is what I would expect)?
>>
>> It's to make DTC quiet.
>
> Maybe add /* Keep DTC happy */ to both lines?
>
>>>>     passthrough {
>>>>         compatible = "simple-bus";
>>>>         ranges;
>>>>         #address-cells = <2>;
>>>>         #size-cells = <2>;
>>>>         /* List of your nodes */
>>>>     }
>>>> };
>>>>
>>>> Note that:
>>>>     * The interrupt-parent proporties will be added by the toolstack in
>>>
>>> "properties"
>>>
>>>>       the root node
>>>>     * The properties compatible, ranges, #address-cells and
>>>>       #size-cells in /passthrough are mandatory.
>>>
>>> Does ranges need to be the empty form? I think ranges = would be
>>> illegal?
>>
>> It's not illegal as long as you correctly use it in the inner "reg".
>
> OK. This could be explained in some more complete documentation I think.
> (It's a doc day on Wednesday ;-))
>
>> Also, I admit that the "ranges" is confusing to read.
>>
>>>> Signed-off-by: Julien Grall
>>>> Cc: Ian Jackson
>>>> Cc: Wei Liu
>>>> ---
>>>>     Changes in v3:
>>>>         - Patch added
>>>> ---
>>>>  docs/man/xl.cfg.pod.5       |   7 ++
>>>>  tools/libxl/libxl_arm.c     | 253
>>>>  tools/libxl/libxl_types.idl |   1 +
>>>>  tools/libxl/xl_cmdimpl.c    |   1 +
>>>>  4 files changed, 262 insertions(+)
>>>>
>>>> diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
>>>> index e2f91fc..225b782 100644
>>>> --- a/docs/man/xl.cfg.pod.5
>>>> +++ b/docs/man/xl.cfg.pod.5
>>>> @@ -398,6 +398,13 @@ not emulated.
>>>>  Specify that this domain is a driver domain. This enables certain
>>>>  features needed in order to run a driver domain.
>>>>
>>>> +=item B
>>>> +
>>>> +Specify a partial device tree (compiled via the Device Tree Compiler).
>>>> +Everything under the node "/passthrough" will be copied into the guest
>>>> +device tree. For convenience, the node "/aliases" is also copied to allow
>>>> +the user to defined aliases which can be used by the guest kernel.
>>>> +
>>>>  =back
>>>>
>>>>  =head2 Devices
>>>>
>>>> diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
>>>> index 53177eb..619458b 100644
>>>> --- a/tools/libxl/libxl_arm.c
>>>> +++ b/tools/libxl/libxl_arm.c
>>>> @@ -540,6 +540,238 @@ out:
>>>>      }
>>>>  }
>>>>
>>>> +static bool check_overrun(uint64_t a, uint64_t b, uint32_t max)
>>>> +{
>>>> +    return ((a + b) > UINT_MAX || (a + b) > max);
>>>
>>> Both halves here will fail if e.g. a == UINT64_MAX-1 and b == 2, so
>>> e.g. a+b <= UINT_MAX and < max.
>>
>> Oops, right.
>>
>>> To avoid this you should check that a and b are both less than some
>>> fraction of UINT64_MAX before the other checks, which would ensure the
>>> overflow can't happen, perhaps even UINT32_MAX would be acceptable for
>>> this use, depending on the input types involved.
>>
>> max is an uint32_t so a and b should be inferior to UINT32_MAX.
>
> by "inferior to" do you mean less than? Or something to do with type
> promotion/demotion rules?

I meant less than.

>> What about
>>
>> a < UINT_MAX && b < UINT_MAX && (a + b) < UINT_MAX
>
> Isn't that inverted from the sense which the function name requires?
>
> Given the complexity in reasoning about this I think a series of
> individual if and return statements which check each precondition one
> at a time and return failure if necessary would be clearer to read and
> reason about than trying to encode it all in one expression.

Given that we will mark the option unsafe, I'm thinking to drop this
check and some others. This would make the code less complex and avoid
checking half of the FDT.

>>>> diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
>>>> index 1214d2e..5651110 100644
>>>> --- a/tools/libxl/libxl_types.idl
>>>> +++ b/tools/libxl/libxl_types.idl
>>>> @@ -399,6 +399,7 @@ libxl_domain_build_info = Struct("domain_build_info",[
>>>>      ("kernel", string),
>>>>      ("cmdline", string),
>>>>      ("ramdisk", string),
>>>> +    ("device_tree", string),
>>>
>>> Needs a #define LIBXL_HAVE... in libxl.h
>>
>> Hmmm why? This
Re: [Xen-devel] [PATCH] flask/policy: fix static device labeling examples
>>> On 17.03.15 at 14:03, wrote:
> (CC Ian and Jan)

This is mostly about tools stuff:

>>  docs/misc/xsm-flask.txt                      | 31 +++
>>  tools/flask/policy/Makefile                  |  3 ++-
>>  tools/flask/policy/policy/device_contexts    | 32 +++
>>  tools/flask/policy/policy/modules/xen/xen.te | 38 +++-
>>  4 files changed, 41 insertions(+), 63 deletions(-)

Hence I don't see why you ping me about it.

Jan
Re: [Xen-devel] RFC: [PATCH 1/3] Enhance platform support for PCI
On Tue, Mar 17, 2015 at 10:56:48AM +0530, Manish Jaggi wrote:
>
> On Friday 27 February 2015 10:20 PM, Ian Campbell wrote:
> > On Fri, 2015-02-27 at 16:35 +, Jan Beulich wrote:
> > > On 27.02.15 at 16:24, wrote:
> > > > On Fri, 2015-02-27 at 14:54 +, Stefano Stabellini wrote:
> > > > > MMCFG is a Linux config option, not to be confused with
> > > > > PHYSDEVOP_pci_mmcfg_reserved that is a Xen hypercall interface.
> > > > > I don't think that the way Linux (or FreeBSD) call
> > > > > PHYSDEVOP_pci_mmcfg_reserved is relevant.
> > > > My (possibly flawed) understanding was that pci_mmcfg_reserved was
> > > > intended to propagate the result of dom0 parsing some firmware
> > > > table or other to the hypervisor.
> > > That's not flawed at all.
> > I think that's a first in this thread ;-)
> >
> > > > In Linux dom0 we call it walking pci_mmcfg_list, which looking at
> > > > arch/x86/pci/mmconfig-shared.c pci_parse_mcfg is populated by
> > > > walking over a "struct acpi_table_mcfg" (there also appears to be
> > > > a bunch of processor family derived entries, which I guess are
> > > > "quirks" of some sort).
> > > Right - this parses ACPI tables (plus applies some knowledge about
> > > certain specific systems/chipsets/CPUs) and verifies that the space
> > > needed for the MMCFG region is properly reserved either in E820 or
> > > in the ACPI specified resources (only if so Linux decides to use
> > > MMCFG and consequently also tells Xen that it may use it).
> > Thanks.
> >
> > So I think what I wrote in <1424948710.14641.25.ca...@citrix.com>
> > applies as is to Device Tree based ARM devices, including the need for
> > the PHYSDEVOP_pci_host_bridge_add call.
> >
> > On ACPI based devices we will have the MCFG table, and things follow
> > much as for x86:
> >
> >   * Xen should parse MCFG to discover the PCI host-bridges
> >   * Dom0 should do likewise and call PHYSDEVOP_pci_mmcfg_reserved in
> >     the same way as Xen/x86 does.
> >
> > The SBSA, an ARM standard for "servers", mandates various things which
> > we can rely on here because ACPI on ARM requires an SBSA compliant
> > system. So things like odd quirks in PCI controllers or magic setup
> > are spec'd out of our zone of caring (into the firmware I suppose),
> > hence there is nothing like the DT_DEVICE_START stuff to register
> > specific drivers etc.
> >
> > The PHYSDEVOP_pci_host_bridge_add call is not AFAICT needed on ACPI
> > ARM systems (any more than it is on x86). We can decide whether to
> > omit it from dom0 or ignore it from Xen later on.
> >
> > (Manish, this is FYI, I don't expect you to implement ACPI support!)
>
> In drivers/xen/pci.c, on notification BUS_NOTIFY_ADD_DEVICE dom0 issues
> a hypercall to inform Xen that a new PCI device has been added. If we
> were to inform Xen about a new PCI bus that is added, there are 2 ways:
> a) Issue the hypercall from drivers/pci/probe.c
> b) When a new device is found (BUS_NOTIFY_ADD_DEVICE), issue the
>    PHYSDEVOP_pci_device_add hypercall to Xen; if Xen does not find that
>    segment number (s_bdf), it will return an error,
>    SEG_NO_NOT_FOUND. After that the Linux Xen code could issue the
>    PHYSDEVOP_pci_host_bridge_add hypercall.

Couldn't the code figure out from 'struct pci_dev' whether the device is
a bridge or a PCI device? And then do the proper hypercall?

Interesting thing you _might_ hit (that I did) was that if you use
'bus=reassign', which re-assigns the bus numbers during scan, Xen gets
very very confused. As in, the bus devices that Xen sees vs the ones
Linux sees are different. Whether you will encounter this depends on
whether the bridge devices and PCI devices end up having a different
bus number from what Xen scanned, and from what Linux has determined.
(As in, Linux has found a bridge device with more PCI devices, so it
reprograms the bridge, which moves all of the other PCI devices "below"
it by X number).

The reason I am bringing it up - it sounds like Xen will have no clue
about some devices - and be told about it by Linux - if for some reason
it has the same bus number as some that Xen already scanned - gah!

> I think (b) can be done with minimal code changes. What do you think?

Less code == better.

> > Ian.
Re: [Xen-devel] [PATCH v7] sndif: add ABI for Para-virtual sound
On Tue, 17 Mar 2015, Ian Campbell wrote:
> On Thu, 2015-03-12 at 18:14 +, Lars Kurth wrote:
> > Hi, I nearly missed this. Please make sure you forward stuff and
> > change the headline if you want me to look into things. Otherwise I
> > may miss it.
>
> Sure, I'll try and remember.
>
> FYI before Ian J went away he mentioned that he had raised some
> questions/issues (either on this or a previous version) which had not
> yet been answered (or maybe not answered to his satisfaction, I'm not
> sure) but that if those were addressed he would take a look with a view
> to acking the interface for inclusion in xen.git.
>
> (I've not looked in the threads for it, so I don't know the exact
> state).
>
> > From my perspective, this is exactly the kind of scenario why we
> > created the embedded / automotive subproject, with an option to store
> > code in repos owned by the project.
> >
> > Given that the primary use-case of these drivers is embedded /
> > automotive, my suggestion would be to:
> > 1.a) Use a repo in the embedded / automotive pv driver subproject to
> >      host the spec - but use a file system structure that matches the
> >      xen tree
> > 1.b) I would assume there would be one back-end and several front-ends
> >      for these drivers and some would eventually appear in trees owned
> >      by the embedded / automotive pv driver subproject
> >
> > In this case, the maintainer responsibility would fall to members of
> > the embedded / automotive pv driver subproject. Once there are several
> > implementations, and enough people with skills to review, we can
> > re-visit where the spec and drivers live.
> >
> > We can have a discussion about criteria of when to move, but I don't
> > think that makes a lot of sense. I think the concerns that need to be
> > addressed are:
> > 2.a) Enough skills to review the code / protocols from different
> >      stake-holders - this should happen with time, once the spec and
> >      code are there. And of course once the embedded / automotive pv
> >      driver subproject graduates, that will also give extra weight to
> >      its maintainers in the wider community
> > 2.b) Of course if there was a strong case that PV sound drivers are
> >      extremely useful for core data centre use-cases, I would probably
> >      suggest another approach
> >
> > Maybe 2.b) needs to be checked with Intel folks - there may be some
> > sound requirement for XenGT
> >
> > Would this work as a way forward?
>
> I think the main thing which is missing is some decision as to the
> point at which we would consider the ABI for a PV protocol fixed, i.e.
> to be maintained in a backwards compatible manner from then on.
>
> That's of particular importance when one end of the pair is implemented
> in external projects (e.g. OS driver frontends). If the interface is
> not declared stable then changes would be allowed which would
> invalidate those drivers.

I think that you are right. Declaring the interface stable or unstable
is far more important than where the code or the spec lives.

If we formally specified within the spec that the ABI is not maintained
for backward compatibility, the bar for acceptance in xen-unstable would
be far lower. Maybe the spec could even be accepted as is if nobody has
any comments?
Re: [Xen-devel] [PATCH 1/6] x86: detect and initialize Intel CAT feature
On Tue, Mar 17, 2015 at 04:11:33PM +0800, Chao Peng wrote:
> On Fri, Mar 13, 2015 at 09:40:13AM -0400, Konrad Rzeszutek Wilk wrote:
> > On Fri, Mar 13, 2015 at 06:13:20PM +0800, Chao Peng wrote:
> > > Detect the Intel Cache Allocation Technology (CAT) feature and store
> > > the cpuid information for later use. Currently only L3 cache
> > > allocation is supported. The L3 CAT features may vary among sockets,
> > > so per-socket feature information is stored. The initialization can
> > > happen either at boot time or when CPU(s) is hot plugged after
> > > booting.
> > >
> > > Signed-off-by: Chao Peng
> > > ---
> > >  docs/misc/xen-command-line.markdown |  15 +++-
> > >  xen/arch/x86/psr.c                  | 151 +---
> > >  xen/include/asm-x86/cpufeature.h    |   1 +
> > >  3 files changed, 155 insertions(+), 12 deletions(-)
> > >
> > > +    cat_cpu_init(smp_processor_id());
> >
> > Do 'if ( !cat_cpu_init(..) )'
> >
> > as the CPU might not support this.
> >
> > At which point you should also free the cat_socket_info and
> > not register the cpu notifier.
>
> Even if the booting CPU does not support this, other CPUs may still
> support it. Generally the feature is a per-socket feature, so bailing
> out here is not the intention.

Oooh, and you did mention that in the git commit description and I dived
right into the code - without looking there - sorry for that noise!

Though I am curious - what if none of the sockets support it and the
user does try to enable it on the command line (user error)? Shouldn't
we then figure out that all of the CPUs don't support it, xfree
cat_socket_info, and not register the CPU notifier?

> Except this, all other comments will be addressed by the next version.

Thank you!

> Thanks for your time.
>
> Chao
>
> > > +    register_cpu_notifier(&cpu_nfb);
> > > +}
> > > +