Re: [Xen-devel] [PATCH] VTd/dmar: Tweak how the DMAR table is clobbered
On 09/04/15 09:51, David Vrabel wrote:
> On 08/04/15 20:44, Andrew Cooper wrote:
>> Instead of clobbering DMAR -> XMAR and back, clobber to RMAD instead.
>> This means that changing the signature does not alter the checksum,
>> which allows the clobbering/unclobbering to be performed atomically and
>> idempotently, which is an advantage on the kexec path which can reenter
>> acpi_dmar_reinstate().
>
> Could RMAD be specified as a real table in the future? Does the
> clobbered name have to start with X to avoid future conflicts?
>
> David

I am not aware of any restrictions imposed by the ACPI spec. Any
clobbered signature is potentially a real table in the future.

This DMAR clobbering was introduced by
83904107a33c9badc34ecdd1f8ca0f9271e5e370 which claims that the dom0 VT-d
driver was capable of playing with the IOMMU(s) while Xen was also using
them.

An alternative approach might be to leave the DMAR table alone and
sprinkle some iomem_deny_access() around to forcibly prevent dom0 from
playing.

~Andrew

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
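The checksum argument in the patch description can be checked with a small standalone sketch. ACPI tables carry a one-byte checksum chosen so that all bytes of the table sum to zero modulo 256. Since "DMAR" and "RMAD" consist of the same four bytes, swapping one for the other leaves that sum unchanged, whereas "XMAR" does not. The toy 16-byte table and the helper names below are illustrative assumptions, not Xen's code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* ACPI rule: the bytes of a valid table sum to 0 modulo 256. */
static uint8_t table_sum(const uint8_t *tbl, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    for ( i = 0; i < len; i++ )
        sum += tbl[i];

    return sum;
}

/* Hypothetical helper: overwrite the 4-byte signature at offset 0. */
static void set_signature(uint8_t *tbl, const char *sig)
{
    assert(strlen(sig) == 4);
    memcpy(tbl, sig, 4);
}

/*
 * Build a toy table with a valid checksum under the "DMAR" signature,
 * then report whether it still sums to zero once the signature has been
 * replaced by new_sig.
 */
static int checksum_survives(const char *new_sig)
{
    uint8_t tbl[16] = { 0 };

    set_signature(tbl, "DMAR");
    tbl[4] = sizeof(tbl);           /* low byte of the length field */
    tbl[12] = 0x5a;                 /* arbitrary payload byte */

    /* The checksum byte lives at offset 9 of the ACPI table header. */
    tbl[9] = (uint8_t)(0 - table_sum(tbl, sizeof(tbl)));

    set_signature(tbl, new_sig);

    return table_sum(tbl, sizeof(tbl)) == 0;
}
```

Calling checksum_survives("RMAD") reports that the table is still valid, while checksum_survives("XMAR") does not, which is why the older scheme had to touch the checksum as well.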
[Xen-devel] [PATCH V9 0/6] xen: Clean-up of mem_event subsystem
This patch series aims to clean up the mem_event subsystem within Xen.
The original use-case for this system was to allow external helper
applications running in privileged domains to control various memory
operations performed by Xen. Amongst these were paging, sharing and
access control. The subsystem has since been extended to also deliver
non-memory related events, namely various HVM debugging events (INT3,
MTF, MOV-TO-CR, MOV-TO-MSR). The structures and naming of related
functions however has not caught up to these new use-cases, thus leaving
many ambiguities in the code. Furthermore, future use-cases envisioned
for this subsystem include PV domains and ARM domains, thus there is a
need to establish a common base to build on.

Each patch in the series has been build-tested on x86 and ARM, both with
and without XSM enabled.

This patch series is also available at:
https://github.com/tklengyel/xen/tree/mem_event_cleanup9

Tamas K Lengyel (6):
  xen: Introduce monitor_op domctl
  xen/vm_event: Deprecate VM_EVENT_FLAG_DUMMY flag
  xen/vm_event: Decouple vm_event and mem_access.
  xen/vm_event: Relocate memop checks
  xen/xsm: Split vm_event_op into three separate labels
  xen/vm_event: Add RESUME option to vm_event_op domctl

 MAINTAINERS                         |    1 +
 tools/libxc/Makefile                |    1 +
 tools/libxc/include/xenctrl.h       |   49 +++--
 tools/libxc/xc_domain.c             |   28 +-
 tools/libxc/xc_mem_access.c         |   56 +--
 tools/libxc/xc_mem_paging.c         |   12 ++-
 tools/libxc/xc_memshr.c             |   15 ++-
 tools/libxc/xc_monitor.c            |  137 +
 tools/libxc/xc_private.h            |    2 +-
 tools/libxc/xc_vm_event.c           |   11 +-
 tools/tests/xen-access/xen-access.c |   40
 xen/arch/x86/Makefile               |    1 +
 xen/arch/x86/hvm/emulate.c          |    2 +-
 xen/arch/x86/hvm/event.c            |   82 ---
 xen/arch/x86/hvm/hvm.c              |   35 +--
 xen/arch/x86/hvm/vmx/vmcs.c         |    7 +-
 xen/arch/x86/hvm/vmx/vmx.c          |    2 +-
 xen/arch/x86/mm/mem_paging.c        |   41 ++--
 xen/arch/x86/mm/mem_sharing.c       |  160 ++---
 xen/arch/x86/mm/p2m.c               |   74 --
 xen/arch/x86/monitor.c              |  196
 xen/arch/x86/x86_64/compat/mm.c     |   26 +
 xen/arch/x86/x86_64/mm.c            |   24 +
 xen/common/Makefile                 |   18 ++--
 xen/common/domctl.c                 |    9 ++
 xen/common/mem_access.c             |   51 ++
 xen/common/vm_event.c               |  181 +
 xen/include/asm-arm/monitor.h       |   35 +++
 xen/include/asm-arm/p2m.h           |   18 +++-
 xen/include/asm-x86/domain.h        |   22 +++-
 xen/include/asm-x86/hvm/domain.h    |    1 -
 xen/include/asm-x86/mem_paging.h    |    2 +-
 xen/include/asm-x86/mem_sharing.h   |    4 +-
 xen/include/asm-x86/monitor.h       |   31 ++
 xen/include/asm-x86/p2m.h           |   37 +--
 xen/include/public/domctl.h         |   80 +++
 xen/include/public/hvm/params.h     |    9 +-
 xen/include/public/memory.h         |   18 ++--
 xen/include/public/vm_event.h       |    3 +-
 xen/include/xen/mem_access.h        |   14 ++-
 xen/include/xen/vm_event.h          |   59 +--
 xen/include/xsm/dummy.h             |   20 +++-
 xen/include/xsm/xsm.h               |   33 +-
 xen/xsm/dummy.c                     |   13 ++-
 xen/xsm/flask/hooks.c               |   64
 xen/xsm/flask/policy/access_vectors |   12 ++-
 46 files changed, 1106 insertions(+), 630 deletions(-)
 create mode 100644 tools/libxc/xc_monitor.c
 create mode 100644 xen/arch/x86/monitor.c
 create mode 100644 xen/include/asm-arm/monitor.h
 create mode 100644 xen/include/asm-x86/monitor.h

--
2.1.4
Re: [Xen-devel] [PATCH v2 1/2] osstest: update FreeBSD guests to 10.1
Roger Pau Monne writes ([PATCH v2 1/2] osstest: update FreeBSD guests to 10.1):
> Update FreeBSD guests in OSSTest to FreeBSD 10.1. The following images
> should be placed in the osstest images folder:

Thanks for the quick turnaround. I have pushed this and 2/2 to osstest
pretest (with my acks) and will keep an eye on it.

Ian.
Re: [Xen-devel] [PATCH 2/6] x86/numa: Correct the extern of cpu_to_node
On 09/04/15 16:00, Tim Deegan wrote:
> At 18:26 +0100 on 07 Apr (1428431176), Andrew Cooper wrote:
>> --- a/xen/include/asm-x86/numa.h
>> +++ b/xen/include/asm-x86/numa.h
>> @@ -9,7 +9,7 @@ extern int srat_rev;
>> -extern unsigned char cpu_to_node[];
>> +extern nodeid_t cpu_to_node[NR_CPUS];
>
> Does the compiler do anything useful with the array size here?

Specifying the size allows ARRAY_SIZE(cpu_to_node) to work in other
translation units. It also allows static analysers to perform bounds
checks, should they wish.

> In particular does it check that it matches the size at the
> definition?

It will complain if they are mismatched.

~Andrew
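The point about ARRAY_SIZE() can be illustrated with a single-file sketch. With an unsized extern declaration the array has an incomplete type and sizeof cannot be applied to it; with the size present, code that only sees the declaration can still compute the element count. The NR_CPUS value and the node_of() helper are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8                       /* illustrative value only */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef unsigned char nodeid_t;

/* What the header would carry: a complete array type, size included. */
extern nodeid_t cpu_to_node[NR_CPUS];

/*
 * Only the declaration above is visible at this point, yet ARRAY_SIZE()
 * works. With "extern nodeid_t cpu_to_node[];" this would not compile.
 */
static nodeid_t node_of(unsigned int cpu)
{
    assert(cpu < ARRAY_SIZE(cpu_to_node));
    return cpu_to_node[cpu];
}

/*
 * The single definition. Because the sized declaration is in scope, a
 * different size here would be rejected as a conflicting type.
 */
nodeid_t cpu_to_node[NR_CPUS];
```

The mismatch diagnostic only fires in files where both the declaration and the definition are visible, which is the usual case when the defining file includes its own header.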
Re: [Xen-devel] [PATCH 1/6] x86/link: Discard the alternatives .discard sections
At 18:26 +0100 on 07 Apr (1428431175), Andrew Cooper wrote:
> This appears to have been missed when porting the alternatives framework
> from Linux, and saves us a section which is otherwise loaded into memory.
>
> Signed-off-by: Andrew Cooper andrew.coop...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org
Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
On Thu, 2015-04-09 at 12:11 +0100, Ian Jackson wrote:
> Prashant Sreedharan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]):
>> On Wed, 2015-04-08 at 14:59 +0100, Ian Jackson wrote:
>>> Ian Jackson writes (Re: tg3 NIC driver bug in 3.14.x under Xen):
>>> The value for dropped increases steadily. This particular box is on
>>> a network with a lot of other stuff, so it will be constantly
>>> receiving broadcasts of various kinds even when I am not trying to
>>> address it directly.
>>
>> Based on the stats, the issue seems to be with the bridge rather than
>> tg3. Do you have any filters enabled on xenbr0 ?
>
> No.
>
> I can try to repro the problem without the bridge, if it would help.

yes please do
Re: [Xen-devel] [PATCH RFC] xen/pvh: use a custom IO bitmap for PVH hardware domains
On 08/04/15 13:57, Roger Pau Monne wrote:
> Since a PVH hardware domain has access to the physical hardware create a
> custom more permissive IO bitmap. The permissions set on the bitmap are
> populated based on the contents of the ioports rangeset.
>
> Also add the IO ports of the serial console used by Xen to the list of
> not accessible IO ports.

Thank you for looking into this. I think it is the correct general
direction, but I do have some questions/thoughts about this area.

I know that the current implementation is that dom0 is whitelisted and
can play with everything, but is this actually the best API?

Conceptually, a better approach would be for dom0 to start with no
permissions, and explicitly request access (After all, PV and PVH
domains are expected to know exactly what they are doing under Xen).
This has an extra advantage in that dom0 can't accidentally grant
permissions for resources it doesn't know about to domUs.

Instead of adding to a growing blacklist in construct_dom0, it might be
better to maintain a global rangeset (or few) for resources which are
used by Xen and not permitted to be used by any other domains. This
would allow the ioports_deny_access()/etc calls to move into the correct
drivers, instead of having to extern things like the uart ports. It is
also far more likely to be kept up to date.

(On that note, we could probably do with an audit of the currently
denied resources. I highly doubt there is a PIT driver which could
function with access to only some of the ports).

In addition, some specific review...
Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Jan Beulich jbeul...@suse.com Cc: Andrew Cooper andrew.coop...@citrix.com Cc: Boris Ostrovsky boris.ostrov...@oracle.com Cc: Suravee Suthikulpanit suravee.suthikulpa...@amd.com Cc: Aravind Gopalakrishnan aravind.gopalakrish...@amd.com Cc: Jun Nakajima jun.nakaj...@intel.com Cc: Eddie Dong eddie.d...@intel.com Cc: Kevin Tian kevin.t...@intel.com --- xen/arch/x86/domain_build.c | 10 ++ xen/arch/x86/hvm/hvm.c | 11 +++ xen/arch/x86/hvm/svm/vmcb.c | 3 ++- xen/arch/x86/hvm/vmx/vmcs.c | 6 -- xen/arch/x86/hvm/vmx/vmx.c | 1 + xen/drivers/char/ns16550.c | 10 ++ xen/include/asm-x86/hvm/domain.h | 2 ++ xen/include/asm-x86/hvm/hvm.h| 1 + xen/include/xen/serial.h | 4 9 files changed, 45 insertions(+), 3 deletions(-) diff --git a/xen/arch/x86/domain_build.c b/xen/arch/x86/domain_build.c index e5c845c..d0365fe 100644 --- a/xen/arch/x86/domain_build.c +++ b/xen/arch/x86/domain_build.c @@ -22,6 +22,7 @@ #include xen/compat.h #include xen/libelf.h #include xen/pfn.h +#include xen/serial.h #include asm/regs.h #include asm/system.h #include asm/io.h @@ -1541,6 +1542,11 @@ int __init construct_dom0( rc |= ioports_deny_access(d, 0x40, 0x43); /* PIT Channel 2 / PC Speaker Control. */ rc |= ioports_deny_access(d, 0x61, 0x61); +/* Serial console. */ +if ( uart_ioport1 != 0 ) +rc |= ioports_deny_access(d, uart_ioport1, uart_ioport1 + 7); +if ( uart_ioport2 != 0 ) +rc |= ioports_deny_access(d, uart_ioport2, uart_ioport2 + 7); /* ACPI PM Timer. */ if ( pmtmr_ioport ) rc |= ioports_deny_access(d, pmtmr_ioport, pmtmr_ioport + 3); @@ -1618,6 +1624,10 @@ int __init construct_dom0( pvh_map_all_iomem(d, nr_pages); pvh_setup_e820(d, nr_pages); + +for ( i = 0; i 0x1; i++ ) +if ( ioports_access_permitted(d, i, i) ) +__clear_bit(i, hvm_hw_io_bitmap); (There is surely a more efficient way of doing this? If there isn't, there probably should be) There is also a boundary issue between VT-x and SVM. For VT-x, the IO bitmap is 2 pages. 
For SVM, it is 2 pages and 3 bits. I suspect the difference is to do with the handling of a 4byte write to port 0x. I think you might need to check i 0x10003 instead. } if ( d-domain_id == hardware_domid ) diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c index 3ff87c6..6de89b2 100644 --- a/xen/arch/x86/hvm/hvm.c +++ b/xen/arch/x86/hvm/hvm.c @@ -82,6 +82,10 @@ struct hvm_function_table hvm_funcs __read_mostly; unsigned long __attribute__ ((__section__ (.bss.page_aligned))) hvm_io_bitmap[3*PAGE_SIZE/BYTES_PER_LONG]; +/* I/O permission bitmap for HVM hardware domain */ +unsigned long __attribute__ ((__section__ (.bss.page_aligned))) +hvm_hw_io_bitmap[3*PAGE_SIZE/BYTES_PER_LONG]; + /* Xen command-line option to enable HAP */ static bool_t __initdata opt_hap_enabled = 1; boolean_param(hap, opt_hap_enabled); @@ -162,6 +166,7 @@ static int __init hvm_enable(void) * often used for I/O delays, but the vmexits simply slow things down). */ memset(hvm_io_bitmap, ~0, sizeof(hvm_io_bitmap)); +memset(hvm_hw_io_bitmap, ~0, sizeof(hvm_hw_io_bitmap)); if (
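The review remark about a more efficient way to populate the bitmap can be sketched as follows: rather than testing every port individually, walk the permitted ranges and clear the corresponding bits a word at a time. This is a user-space toy under stated assumptions. The bitmap, io_bitmap_allow_range() and the other helper names are hypothetical, and nothing here uses Xen's rangeset API or the real VT-x/SVM bitmap layout. A set bit stands for "access is trapped".

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

#define NR_IOPORTS    0x10000u  /* the x86 I/O port space has 65536 ports */
#define BITS_PER_LONG ((unsigned int)(sizeof(unsigned long) * CHAR_BIT))
#define BITMAP_LONGS  (NR_IOPORTS / BITS_PER_LONG)

/* Toy bitmap: bit set means an access to that port is trapped. */
static unsigned long io_bitmap[BITMAP_LONGS];

static void io_bitmap_deny_all(void)
{
    memset(io_bitmap, 0xff, sizeof(io_bitmap));
}

/* Clear bits [start, end] (inclusive), touching each word only once. */
static void io_bitmap_allow_range(unsigned int start, unsigned int end)
{
    unsigned int port = start;

    assert(start <= end && end < NR_IOPORTS);

    while ( port <= end )
    {
        unsigned int bit = port % BITS_PER_LONG;
        unsigned int span = BITS_PER_LONG - bit; /* bits left in word */
        unsigned long mask;

        if ( span > end - port + 1 )
            span = end - port + 1;

        /* A full-word span needs special care to avoid shifting by 64. */
        mask = (span == BITS_PER_LONG) ? ~0UL : ((1UL << span) - 1) << bit;
        io_bitmap[port / BITS_PER_LONG] &= ~mask;

        port += span;
    }
}

static int io_bitmap_trapped(unsigned int port)
{
    assert(port < NR_IOPORTS);
    return (io_bitmap[port / BITS_PER_LONG] >> (port % BITS_PER_LONG)) & 1;
}
```

A caller would start from io_bitmap_deny_all() and then invoke io_bitmap_allow_range() once per permitted range, so the cost scales with the number of ranges and words rather than with the number of ports.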
Re: [Xen-devel] [PATCH 09/10] log-dirty: Refine common code to support PML
At 10:35 +0800 on 27 Mar (1427452553), Kai Huang wrote:
> --- a/xen/arch/x86/mm/paging.c
> +++ b/xen/arch/x86/mm/paging.c
> @@ -411,7 +411,18 @@ static int paging_log_dirty_op(struct domain *d,
>      int i4, i3, i2;
>      if ( !resuming )
> +    {
>          domain_pause(d);
> +
> +        /*
> +         * Only need to flush when not resuming, as domain was paused in
> +         * resuming case therefore it's not possible to have any new dirty
> +         * page.
> +         */
> +        if ( d->arch.paging.log_dirty.flush_cached_dirty )
> +            d->arch.paging.log_dirty.flush_cached_dirty(d);

I think there are too many layers of indirection here. :) How about:
 - don't add a flush_cached_dirty() function to the log_dirty ops.
 - just call p2m_flush_hardware_cached_dirty(d) here.

Would that work OK?

Cheers,

Tim.
[Xen-devel] [PATCH v4 11/12] tools: add tools support for Intel CAT
This is the xc/xl changes to support Intel Cache Allocation Technology(CAT). Two commands are introduced: - xl psr-cat-cbm-set [-s socket] domain cbm Set cache capacity bitmasks(CBM) for a domain. - xl psr-cat-show domain Show Cache Allocation Technology information. Examples: [root@vmm-psr]# xl psr-cat-cbm-set 0 0xff [root@vmm-psr]# xl psr-cat-show Socket ID : 0 L3 Cache: 12288KB Maximum COS : 15 CBM length : 12 Default CBM : 0xfff ID NAME CBM 0 Domain-00xff Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- Changes in v4: * Add example output in commit message. * Make libxl__count_physical_sockets private to libxl_psr.c. * Set errno in several error cases. * Change libxl_psr_cat_get_l3_info to return all sockets information. * Remove unused libxl_domain_info call. Changes in v3: * Add manpage. * libxl_psr_cat_set/get_domain_data = libxl_psr_cat_set/get_cbm. * Move libxl_count_physical_sockets into seperate patch. * Support LIBXL_PSR_TARGET_ALL for libxl_psr_cat_set_cbm. * Clean up the print codes. --- docs/man/xl.pod.1 | 31 tools/libxc/include/xenctrl.h | 15 tools/libxc/xc_psr.c | 76 +++ tools/libxl/libxl.h | 26 +++ tools/libxl/libxl_psr.c | 168 -- tools/libxl/libxl_types.idl | 10 +++ tools/libxl/xl.h | 4 + tools/libxl/xl_cmdimpl.c | 140 +++ tools/libxl/xl_cmdtable.c | 12 +++ 9 files changed, 475 insertions(+), 7 deletions(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index b016272..dfab921 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -1492,6 +1492,37 @@ monitor types are: =back +=head1 CACHE ALLOCATION TECHNOLOGY + +Intel Broadwell and later server platforms offer capabilities to configure and +make use of the Cache Allocation Technology (CAT) mechanisms, which enable more +cache resources (i.e. L3 cache) to be made available for high priority +applications. In Xen implementation, CAT is used to control cache allocation +on VM basis. To enforce cache on a specific domain, just set capacity bitmasks +(CBM) for the domain. 
+ +=over 4 + +=item Bpsr-cat-cbm-set [IOPTIONS] [Idomain-id] [Icbm] + +Set cache capacity bitmasks(CBM) for a domain. + +BOPTIONS + +=over 4 + +=item B-s SOCKET, B--socket=SOCKET + +Specify the socket to process, otherwise all sockets are processed. + +=back + +=item Bpsr-cat-show [Idomain-id] + +Show CAT settings for a certain domain or all domains. + +=back + =head1 TO BE DOCUMENTED We need better documentation for: diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index df18292..1373a46 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2692,6 +2692,12 @@ enum xc_psr_cmt_type { XC_PSR_CMT_LOCAL_MEM_COUNT, }; typedef enum xc_psr_cmt_type xc_psr_cmt_type; + +enum xc_psr_cat_type { +XC_PSR_CAT_L3_CBM = 1, +}; +typedef enum xc_psr_cat_type xc_psr_cat_type; + int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid); int xc_psr_cmt_detach(xc_interface *xch, uint32_t domid); int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid, @@ -2706,6 +2712,15 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu, uint32_t psr_cmt_type, uint64_t *monitor_data, uint64_t *tsc); int xc_psr_cmt_enabled(xc_interface *xch); + +int xc_psr_cat_set_domain_data(xc_interface *xch, uint32_t domid, + xc_psr_cat_type type, uint32_t target, + uint64_t data); +int xc_psr_cat_get_domain_data(xc_interface *xch, uint32_t domid, + xc_psr_cat_type type, uint32_t target, + uint64_t *data); +int xc_psr_cat_get_l3_info(xc_interface *xch, uint32_t socket, + uint32_t *cos_max, uint32_t *cbm_len); #endif #endif /* XENCTRL_H */ diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c index e367a80..d8b3a51 100644 --- a/tools/libxc/xc_psr.c +++ b/tools/libxc/xc_psr.c @@ -248,6 +248,82 @@ int xc_psr_cmt_enabled(xc_interface *xch) return 0; } +int xc_psr_cat_set_domain_data(xc_interface *xch, uint32_t domid, + xc_psr_cat_type type, uint32_t target, + uint64_t data) +{ +DECLARE_DOMCTL; +uint32_t cmd; + +switch ( type ) +{ 
+case XC_PSR_CAT_L3_CBM: +cmd = XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM; +break; +default: +errno = EINVAL; +return -1; +} + +domctl.cmd = XEN_DOMCTL_psr_cat_op; +domctl.domain = (domid_t)domid; +domctl.u.psr_cat_op.cmd = cmd; +domctl.u.psr_cat_op.target = target; +domctl.u.psr_cat_op.data = data; + +return do_domctl(xch, domctl); +} + +int
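As a side note on the capacity bitmasks discussed in this patch: the example output above reports a CBM length of 12 and a default CBM of 0xfff, and Intel's CAT documentation requires the set bits of a CBM to be contiguous. A tool could sanity-check a user-supplied mask before handing it to the hypervisor. The sketch below is one way to do that. cbm_is_valid() is a hypothetical helper and is not part of libxc or libxl.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if cbm is a plausible capacity bitmask for a cache whose CBM
 * length is cbm_len bits: non-zero, no bits beyond cbm_len, and all set
 * bits contiguous.
 */
static int cbm_is_valid(uint64_t cbm, unsigned int cbm_len)
{
    uint64_t low;

    assert(cbm_len >= 1 && cbm_len <= 64);

    if ( cbm == 0 )
        return 0;

    if ( cbm_len < 64 && (cbm >> cbm_len) != 0 )
        return 0;

    /*
     * Adding the lowest set bit to a contiguous run of ones carries all
     * the way through the run, so the sum shares no bits with the mask.
     */
    low = cbm & (~cbm + 1);

    return ((cbm + low) & cbm) == 0;
}
```

With a CBM length of 12, masks such as 0xff, 0xf0 and 0xfff pass, while 0xf0f (a gap in the middle) and 0x1fff (too wide) are rejected.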
[Xen-devel] [PATCH v5 p2 06/19] xen/dts: Provide an helper to get a DT node from a path provided by a guest
From: Julien Grall julien.gr...@linaro.org The maximum size of the copied string has been chosen based on the value use by XSM in similar case. Furthermore, Linux seems to allow path up to 4096 characters. Though this could vary from one OS to another. Signed-off-by: Julien Grall julien.gr...@linaro.org --- Changes in v4: - Drop DEVICE_TREE_MAX_PATHLEN - Bump the value to PAGE_SIZE (i.e 4096). It's used in XSM and this value seems sensible for Linux - Clarify how the maximum size has been chosen Changes in v3: - Use the new prototype of safe_copy_string_from_guest Changes in v2: - guest_copy_string_from_guest has been renamed into safe_copy_string_from_guest --- xen/common/device_tree.c | 18 ++ xen/include/xen/device_tree.h | 14 ++ 2 files changed, 32 insertions(+) diff --git a/xen/common/device_tree.c b/xen/common/device_tree.c index 02cae91..31f169b 100644 --- a/xen/common/device_tree.c +++ b/xen/common/device_tree.c @@ -13,6 +13,7 @@ #include xen/config.h #include xen/types.h #include xen/init.h +#include xen/guest_access.h #include xen/device_tree.h #include xen/kernel.h #include xen/lib.h @@ -23,6 +24,7 @@ #include xen/cpumask.h #include xen/ctype.h #include asm/setup.h +#include xen/err.h const void *device_tree_flattened; dt_irq_xlate_func dt_irq_xlate; @@ -277,6 +279,22 @@ struct dt_device_node *dt_find_node_by_path(const char *path) return np; } +int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen, + struct dt_device_node **node) +{ +char *path; + +path = safe_copy_string_from_guest(u_path, u_plen, PAGE_SIZE); +if ( IS_ERR(path) ) +return PTR_ERR(path); + +*node = dt_find_node_by_path(path); + +xfree(path); + +return (*node == NULL) ? 
-ESRCH : 0; +} + struct dt_device_node *dt_find_node_by_alias(const char *alias) { const struct dt_alias_prop *app; diff --git a/xen/include/xen/device_tree.h b/xen/include/xen/device_tree.h index 57eb3ee..e187780 100644 --- a/xen/include/xen/device_tree.h +++ b/xen/include/xen/device_tree.h @@ -456,6 +456,20 @@ struct dt_device_node *dt_find_node_by_alias(const char *alias); */ struct dt_device_node *dt_find_node_by_path(const char *path); + +/** + * dt_find_node_by_gpath - Same as dt_find_node_by_path but retrieve the + * path from the guest + * + * @u_path: Xen Guest handle to the buffer containing the path + * @u_plen: Length of the buffer + * @node: TODO + * + * Return 0 if succeed otherwise -errno + */ +int dt_find_node_by_gpath(XEN_GUEST_HANDLE(char) u_path, uint32_t u_plen, + struct dt_device_node **node); + /** * dt_get_parent - Get a node's parent if any * @node: Node to get parent -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
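The pattern used by dt_find_node_by_gpath() (copy an untrusted, length-prefixed string with an upper bound, look it up, free the copy, and translate "not found" into an error code) can be shown with a user-space analogue. Everything below is a sketch under stated assumptions: copy_bounded_string() merely stands in for the guest-copy helper, the -ENOBUFS error for an over-long path is an arbitrary choice, and the table of known paths is invented for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PATH_LEN 4096   /* mirrors the PAGE_SIZE bound discussed above */

/*
 * Hypothetical stand-in for the guest-copy helper: on success returns 0
 * and stores a NUL-terminated heap copy in *out, otherwise returns a
 * negative errno value and leaves *out untouched.
 */
static int copy_bounded_string(const char *src, size_t len, size_t max,
                               char **out)
{
    char *buf;

    assert(src != NULL && out != NULL);

    if ( len > max )
        return -ENOBUFS;

    buf = malloc(len + 1);
    if ( buf == NULL )
        return -ENOMEM;

    memcpy(buf, src, len);
    buf[len] = '\0';
    *out = buf;

    return 0;
}

/* Invented node table, standing in for the device tree. */
static const char *const known_paths[] = { "/", "/chosen", "/soc/uart@1000" };

static int find_node_by_untrusted_path(const char *u_path, size_t u_plen,
                                       const char **node)
{
    char *path;
    size_t i;
    int rc = copy_bounded_string(u_path, u_plen, MAX_PATH_LEN, &path);

    if ( rc )
        return rc;

    *node = NULL;
    for ( i = 0; i < sizeof(known_paths) / sizeof(known_paths[0]); i++ )
        if ( strcmp(path, known_paths[i]) == 0 )
            *node = known_paths[i];

    /* The private copy is only needed for the lookup itself. */
    free(path);

    return (*node == NULL) ? -ESRCH : 0;
}
```

The length check happens before any byte of the caller's buffer is read, which is the property that matters when the buffer belongs to a guest.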
Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
Ian Jackson writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]):
> Prashant Sreedharan writes (Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]):
>> yes please do
>
> I will do so.

I did this test:
 - Linux 3.14.21
 - baremetal
 - `iommu=soft swiotlb=force' as suggested by Konrad
 - no bridge
 - manually added arp entries on both ends between target box
   and a server on same network

The results are:

On the test box, `ping 10.80.248.135' and `ping -s 500 10.80.248.135'
generate apparently-good ICMP echo requests which the server replies
to, but they don't seem to be received.

I ran
  tcpdump -pvvs500 -lnieth0 \! ether dst cc:cc:cc:cc:cc:cc and \! \
  ether dst 00:00:00:00:00:00 and \! ether dst 00:00:cc:cc:cc:cc and \
  \! ether dst 00:00:00:00:cc:cc and \! ether dst cc:cc:00:00:00:00
on the test box while pinging it from the server (-s500 and the
default). No relevant packets matched the tcpdump filter.

However, as time goes by more and more packets with apparently random
data in their address fields start turning up so I have to keep adding
more mac addresses to be filtered out.

root@bedbug:~# ethtool -S eth0 | grep -v ': 0$'
NIC statistics:
     rx_octets: 8196868
     rx_ucast_packets: 633
     rx_mcast_packets: 1
     rx_bcast_packets: 123789
     tx_octets: 42854
     tx_ucast_packets: 9
     tx_mcast_packets: 8
     tx_bcast_packets: 603
root@bedbug:~# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:13:72:14:c0:51
          inet addr:10.80.249.102  Bcast:10.80.251.255  Mask:255.255.252.0
          inet6 addr: fe80::213:72ff:fe14:c051/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:124774 errors:0 dropped:88921 overruns:0 frame:0
          TX packets:620 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:8222158 (7.8 MiB)  TX bytes:42854 (41.8 KiB)
          Interrupt:17
root@bedbug:~#

It appears therefore that packets are being corrupted on the receive
path, and the kernel then drops them (as misaddressed).

I also tried under Xen (rather than with baremetal and Konrad's
iommu/swiotlb kernel options), but that seems to be a less effective
repro. Under Xen, without the bridge, I got ~6-8% packet loss,
compared to ~25-30% with the bridge. I didn't investigate that
configuration in detail.

Thanks,
Ian.
[Xen-devel] [PATCH 8/8] Refactor package dependency checking and installation
First, create a new global variable, PKGTYPE. At the moment support deb
and rpm.

Define _check-package-$PKGTYPE which returns true if the package is
installed, false otherwise, and _install-package-$PKGTYPE which will
install a list of packages.

Define check-package(), which will take a list of packages, and check to
see if they're installed. Any missing packages will be added to an array
called missing.

Change _${COMPONENT}_install_dependencies to ${COMPONENT}_check_package.
Have these call check-package.

Don't call _${COMPONENT}_install_dependencies from ${COMPONENT}_build.

Define check-builddeps(). Define an empty missing array. Call
check-package for raisin dependencies (like git and rpmbuild). Then call
for_each_component check_package.

At this point we have an array with all missing packages. If it's empty,
be happy. If it's non-empty, and deps=true, try to install the packages;
otherwise print the missing packages and exit.

Add install-builddeps(), which is basically check-builddeps() with
deps=true by default.

Call check-builddeps from build() to close the loop.

Signed-off-by: George Dunlap george.dun...@eu.citrix.com --- CC: Stefano Stabellini stefano.stabell...@citrix.com --- components/grub | 6 ++-- components/libvirt | 6 ++-- components/xen | 10 +++--- lib/build.sh| 87 +++ lib/common-functions.sh | 89 +++-- 5 files changed, 148 insertions(+), 50 deletions(-) diff --git a/components/grub b/components/grub index a5aa27d..839e001 100644 --- a/components/grub +++ b/components/grub @@ -1,6 +1,6 @@ #!/usr/bin/env bash -function _grub_install_dependencies() { +function grub_check_package() { local DEP_Debian_common=build-essential tar autoconf bison flex local DEP_Debian_x86_32=$DEP_Debian_common local DEP_Debian_x86_64=$DEP_Debian_common libc6-dev-i386 @@ -18,8 +18,8 @@ function _grub_install_dependencies() { echo grub is only supported on x86_32 and x86_64 return fi -echo installing Grub dependencies -eval install_dependencies \$DEP_$DISTRO_$ARCH +echo checking Grub dependencies +eval check-package \$DEP_$DISTRO_$ARCH } diff --git a/components/libvirt b/components/libvirt index 6602dcf..b106970 100644 --- a/components/libvirt +++ b/components/libvirt @@ -1,6 +1,6 @@ #!/usr/bin/env bash -function _libvirt_install_dependencies() { +function libvirt_check_package() { local DEP_Debian_common=build-essential libtool autoconf autopoint \ xsltproc libxml2-utils pkg-config python-dev \ libxml-xpath-perl libyajl-dev libxml2-dev \ @@ -18,8 +18,8 @@ function _libvirt_install_dependencies() { local DEP_Fedora_x86_32=$DEP_Fedora_common local DEP_Fedora_x86_64=$DEP_Fedora_common -echo installing Libvirt dependencies -eval install_dependencies \$DEP_$DISTRO_$ARCH +echo checking Libvirt dependencies +eval check-package \$DEP_$DISTRO_$ARCH } function libvirt_build() { diff --git a/components/xen b/components/xen index 7a9f22d..ce46e3d 100644 --- a/components/xen +++ b/components/xen @@ -1,6 +1,8 @@ #!/usr/bin/env bash -function _xen_install_dependencies() { +function xen_check_package() { +$requireargs DISTRO ARCH + local 
DEP_Debian_common=build-essential python-dev gettext uuid-dev \ libncurses5-dev libyajl-dev libaio-dev pkg-config libglib2.0-dev \ libssl-dev libpixman-1-dev bridge-utils wget @@ -15,13 +17,11 @@ function _xen_install_dependencies() { local DEP_Fedora_x86_32=$DEP_Fedora_common dev86 acpica-tools texinfo local DEP_Fedora_x86_64=$DEP_Fedora_x86_32 glibc-devel.i686 -echo installing Xen dependencies -eval install_dependencies \$DEP_$DISTRO_$ARCH +echo Checking Xen dependencies +eval check-package \$DEP_$DISTRO_$ARCH } function xen_build() { -_xen_install_dependencies - cd $BASEDIR git-checkout $XEN_UPSTREAM_URL $XEN_UPSTREAM_REVISION xen-dir cd xen-dir diff --git a/lib/build.sh b/lib/build.sh index ab1e087..a453874 100755 --- a/lib/build.sh +++ b/lib/build.sh @@ -2,32 +2,72 @@ set -e -_help() { -echo Usage: ./build.sh options command +RAISIN_HELP+=(check-builddep Check to make sure we have all dependencies installed) +function check-builddep() { +local -a missing + +$arg_parse + +default deps false ; $default_post + +$requireargs PKGTYPE DISTRO + +check-package git + +if [[ $DISTRO = Fedora ]] ; then +check-package rpm-build +fi + +for_each_component check_package + +if [[ -n ${missing[@]} ]] ; then + echo Missing packages: ${missing[@]} + if $deps ; then + echo Installing... + install-package ${missing[@]} + else + echo Please install, or run ./raise install-builddep + exit 1 +
Re: [Xen-devel] [PATCH 5/6] x86/smp: Allocate pcpu stacks on their local numa node
At 18:26 +0100 on 07 Apr (1428431179), Andrew Cooper wrote:
> Previously, all pcpu stacks tended to be allocated on node 0.
>
> Signed-off-by: Andrew Cooper andrew.coop...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org
Re: [Xen-devel] domU jiffies not incrementing - timer issue? - Kernel 3.18.10 on Xen 4.5.0
On 31 March 2015 at 17:31, Mark Chambers m...@overnetdata.com wrote:
> On 31 March 2015 at 11:56, Mark Chambers m...@overnetdata.com wrote:
>> It's nested under Hyper-V in the same manner as the problematic
>> install. I was deliberately trying to replicate the issue, but the
>> problem doesn't manifest.
>>
>> Mark
>
> Hi,
>
> I've got it booting. The machine without boot problems reports the use
> of emulated TSC:
>
> (XEN) TSC not marked as either constant or reliable, warp=575 (count=2)
> (XEN) dom109: mode=0,ofs=0x417376aa9c8c,khz=2633032,inc=1,vtsc count: 3576850 kernel, 9534 user
>
> The machine with problems reports no domains having emulated TSC:
>
> (XEN) TSC has constant rate, deep Cstates possible, so not reliable, warp=0 (count=3)
> (XEN) dom23: mode=0,ofs=0x41dc316839ac,khz=2208968,inc=1
> (XEN) No domains have emulated TSC
>
> I have nothing specified in the xl config for tsc_mode. If I set
> tsc_mode='native' and restart the DomU it boots without any problems.
> If I explicitly specify any of the other tsc_mode values it gets stuck
> with jiffies not incrementing as before.
>
> Mark

Hi all,

As Xen is reporting that "TSC has constant rate, deep Cstates possible,
so not reliable" it would seem risky to use native mode on the domU and
I would prefer to use emulated mode.

I added debug code to Xen to help understand why jiffies increment
correctly in a DomU on one system but not at all on another system with
an identical software setup. I don't understand the mechanism for
updating jiffies in a Linux PV guest under Xen but I have been looking
at the x86 time and trap code inside Xen and have gained a little
insight.

System 1 and system 2 have identical software configurations. Both are
running Windows 2012 RC2 Hyper-V, running a VM which contains Xen 4.5.0
running a Linux 3.18.10 dom0 running a PV domU. The biggest difference
is their CPUs.

System 1 has an AMD Athlon(tm) 7750 Dual-Core Processor
System 2 has an Intel(R) Xeon(R) CPU E5520

From what I can deduce, when using emulated TSC Xen should receive lots
of RDTSC traps (opcode 0x31). The system that isn't working doesn't
receive any RDTSC traps. I suspect this may be a bug in Xen or the PV
code in the Linux kernel, i.e.:

System 1 tsc_mode='always_emulate' - Xen receives RDTSC traps
System 1 tsc_mode='native' - Xen doesn't receive RDTSC traps
System 2 tsc_mode='always_emulate' - Xen doesn't receive RDTSC traps and
                                     DomU's jiffies do not increment.
System 2 tsc_mode='native' - works, Xen doesn't receive RDTSC traps

I don't know if this is useful but the tsc lines from cpuid on system 1
report:

  RDTSCP = false
  TscInvariant = false
  MSR based TSC rate control = false

while on system 2:

  IA32_TSC_ADJUST MSR supported = false
  RDTSCP = false
  TscInvariant = false

I'm trying to understand how the jiffies are updated in a PV DomU when
the TSC is emulated. If anyone can point me in the right direction it'd
be much appreciated.

Thanks for your time,

Mark
Re: [Xen-devel] [PATCH] qemu-trad: xenstore: use relative path for device-model node
Wei Liu writes (Re: [PATCH] qemu-trad: xenstore: use relative path for device-model node):
> On Thu, Apr 09, 2015 at 06:46:31PM +0100, Ian Jackson wrote:
>> Right. So that means that this patch needs to go in at the same time
>> as the corresponding libxl change.
>
> I don't follow "go in at the same time". They are in two different
> trees, aren't they?

The commit id of the qemu-trad tree is in Config.mk in xen.git. So it
is possible to update them simultaneously. (Of course not every way
of building and deploying Xen will honour this, but if you don't
honour it you deserve what you get.)

>> And the answer is that unless both libxl and qemu change at the same
>> time, it would be a regression in -unstable ?
>
> It would be a regression because stubdom in -unstable is working now
> with Paul's workaround. So yes, both changes need to go in at the same
> time -- though I don't know how you would do that.

Right. That's what the Config.mk update is for.

So if the libxl patch is otherwise ready, we can commit both at once.
I will commit and push to qemu-trad, update the libxl patch to contain
the Config.mk update as well, and push the result to xen.git.

We normally explain the need to do this in the commit message for the
patches, and cross reference the two commits.

Ian.
Re: [Xen-devel] [PATCH 3/6] x86/smp: Clean up use of memflags in cpu_smpboot_alloc()
On 09/04/15 16:02, Tim Deegan wrote:
> At 18:26 +0100 on 07 Apr (1428431177), Andrew Cooper wrote:
>> Hoist MEMF_node(cpu_to_node(cpu)) to the start of the function, and
>> avoid passing (potentially bogus) memflags if node information is not
>> available.
>>
>> Signed-off-by: Andrew Cooper andrew.coop...@citrix.com
>
> As it happens, MEMF_node(NUMA_NO_NODE) is already == 0.

Only because of a masked overflow. That is why (potentially) is in
brackets.

> I'm not sure if that's by design or not, but this looks robuster. :)

Indeed.

> Reviewed-by: Tim Deegan t...@xen.org

~Andrew
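The "masked overflow" can be made concrete with a small sketch. It assumes MEMF_node() is defined roughly as below (node number plus one, masked to the width of nodeid_t, then shifted into the flags word) and that NUMA_NO_NODE is 0xFF. Both are reproduced from memory of the Xen headers of this period and should be treated as assumptions. MEMF_node_shift and node_memflags() are names made up for the example.

```c
#include <assert.h>

typedef unsigned char nodeid_t;

/* Assumed definitions, modelled on Xen's headers; not copied verbatim. */
#define NUMA_NO_NODE    0xFF
#define MEMF_node_shift 8
#define MEMF_node_mask  ((1U << (8 * sizeof(nodeid_t))) - 1)
#define MEMF_node(n)    ((((n) + 1) & MEMF_node_mask) << MEMF_node_shift)

/*
 * MEMF_node(NUMA_NO_NODE) happens to be 0 only because 0xFF + 1 = 0x100
 * is truncated to 0 by the mask. Spelling out the "no node" case avoids
 * relying on that accident.
 */
static unsigned int node_memflags(unsigned int node)
{
    unsigned int memflags = 0;

    if ( node != NUMA_NO_NODE )
        memflags = MEMF_node(node);

    /* A zero node field means "no preference" in this encoding. */
    assert(node != NUMA_NO_NODE || memflags == 0);

    return memflags;
}
```

With these definitions node 0 encodes as 0x100 and node 1 as 0x200, while NUMA_NO_NODE yields no node hint at all whichever way it is computed.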
[Xen-devel] [PATCH V9 5/6] xen/xsm: Split vm_event_op into three separate labels
The XSM label vm_event_op has been used to control the three memops
controlling mem_access, mem_paging and mem_sharing. While these systems
rely on vm_event, these are not vm_event operations themselves. Thus,
in this patch we introduce three separate labels for each of these memops.

Signed-off-by: Tamas K Lengyel tamas.leng...@zentific.com
Reviewed-by: Andrew Cooper andrew.coop...@citrix.com
Acked-by: Daniel De Graaf dgde...@tycho.nsa.gov
Acked-by: Tim Deegan t...@xen.org
---
 xen/arch/x86/mm/mem_paging.c        |  2 +-
 xen/arch/x86/mm/mem_sharing.c       |  2 +-
 xen/common/mem_access.c             |  2 +-
 xen/include/xsm/dummy.h             | 20 +++++++++++++++++++-
 xen/include/xsm/xsm.h               | 33 ++++++++++++++++++++++++++++++---
 xen/xsm/dummy.c                     | 13 ++++++++++++-
 xen/xsm/flask/hooks.c               | 33 ++++++++++++++++++++++++++++++---
 xen/xsm/flask/policy/access_vectors |  6 ++++++
 8 files changed, 100 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/mm/mem_paging.c b/xen/arch/x86/mm/mem_paging.c
index 17d2319..9ee3aba 100644
--- a/xen/arch/x86/mm/mem_paging.c
+++ b/xen/arch/x86/mm/mem_paging.c
@@ -39,7 +39,7 @@ int mem_paging_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_paging_op_t) arg)
     if ( rc )
         return rc;
 
-    rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_paging_op);
+    rc = xsm_mem_paging(XSM_DM_PRIV, d);
     if ( rc )
         goto out;
 
diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index ff01378..78fb013 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -1311,7 +1311,7 @@ int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
     if ( rc )
         return rc;
 
-    rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_sharing_op);
+    rc = xsm_mem_sharing(XSM_DM_PRIV, d);
     if ( rc )
         goto out;
 
diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
index 511c8c5..aa00513 100644
--- a/xen/common/mem_access.c
+++ b/xen/common/mem_access.c
@@ -48,7 +48,7 @@ int mem_access_memop(unsigned long cmd,
     if ( !p2m_mem_access_sanity_check(d) )
         goto out;
 
-    rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_access_op);
+    rc = xsm_mem_access(XSM_DM_PRIV, d);
     if ( rc )
         goto out;
 
diff --git a/xen/include/xsm/dummy.h b/xen/include/xsm/dummy.h
index 50ee929..16967ed 100644
--- a/xen/include/xsm/dummy.h
+++ b/xen/include/xsm/dummy.h
@@ -519,11 +519,29 @@ static XSM_INLINE int xsm_vm_event_control(XSM_DEFAULT_ARG struct domain *d, int
     return xsm_default_action(action, current->domain, d);
 }
 
-static XSM_INLINE int xsm_vm_event_op(XSM_DEFAULT_ARG struct domain *d, int op)
+#ifdef HAS_MEM_ACCESS
+static XSM_INLINE int xsm_mem_access(XSM_DEFAULT_ARG struct domain *d)
 {
     XSM_ASSERT_ACTION(XSM_DM_PRIV);
     return xsm_default_action(action, current->domain, d);
 }
+#endif
+
+#ifdef HAS_MEM_PAGING
+static XSM_INLINE int xsm_mem_paging(XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_DM_PRIV);
+    return xsm_default_action(action, current->domain, d);
+}
+#endif
+
+#ifdef HAS_MEM_SHARING
+static XSM_INLINE int xsm_mem_sharing(XSM_DEFAULT_ARG struct domain *d)
+{
+    XSM_ASSERT_ACTION(XSM_DM_PRIV);
+    return xsm_default_action(action, current->domain, d);
+}
+#endif
 
 #ifdef CONFIG_X86
 static XSM_INLINE int xsm_do_mca(XSM_DEFAULT_VOID)
diff --git a/xen/include/xsm/xsm.h b/xen/include/xsm/xsm.h
index ca8371c..49f06c9 100644
--- a/xen/include/xsm/xsm.h
+++ b/xen/include/xsm/xsm.h
@@ -142,7 +142,18 @@ struct xsm_operations {
     int (*get_vnumainfo) (struct domain *d);
 
     int (*vm_event_control) (struct domain *d, int mode, int op);
-    int (*vm_event_op) (struct domain *d, int op);
+
+#ifdef HAS_MEM_ACCESS
+    int (*mem_access) (struct domain *d);
+#endif
+
+#ifdef HAS_MEM_PAGING
+    int (*mem_paging) (struct domain *d);
+#endif
+
+#ifdef HAS_MEM_SHARING
+    int (*mem_sharing) (struct domain *d);
+#endif
 
 #ifdef CONFIG_X86
     int (*do_mca) (void);
@@ -546,10 +557,26 @@ static inline int xsm_vm_event_control (xsm_default_t def, struct domain *d, int
     return xsm_ops->vm_event_control(d, mode, op);
 }
 
-static inline int xsm_vm_event_op (xsm_default_t def, struct domain *d, int op)
+#ifdef HAS_MEM_ACCESS
+static inline int xsm_mem_access (xsm_default_t def, struct domain *d)
 {
-    return xsm_ops->vm_event_op(d, op);
+    return xsm_ops->mem_access(d);
 }
+#endif
+
+#ifdef HAS_MEM_PAGING
+static inline int xsm_mem_paging (xsm_default_t def, struct domain *d)
+{
+    return xsm_ops->mem_paging(d);
+}
+#endif
+
+#ifdef HAS_MEM_SHARING
+static inline int xsm_mem_sharing (xsm_default_t def, struct domain *d)
+{
+    return xsm_ops->mem_sharing(d);
+}
+#endif
 
 #ifdef CONFIG_X86
 static inline int xsm_do_mca(xsm_default_t def)
diff --git a/xen/xsm/dummy.c b/xen/xsm/dummy.c
index 6d12d32..3ddb4f6 100644
--- a/xen/xsm/dummy.c
+++ b/xen/xsm/dummy.c
@@ -119,7 +119,18 @@ void xsm_fixup_ops (struct
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
Euan Harris writes (Re: [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations):
> Yes, that would work, but an open loop approach like that can lead to
> frustratingly unreliable tests. I think it would be best to make the
> test aware of the state of the helper - or even in control of it. That
> would allow us to wait for the helper to reach a particular state before
> killing it.

This is less bad than you might think because the helper's progress
messages to libxl are at fairly predictable progress points.

In any case, the helper (in general) runs concurrently with libxl, so
when libxl decides to stop the progress there will often be a race.
(Sometimes the helper has to stop and wait for libxl to confirm.)

Ian.
Re: [Xen-devel] [PATCH 6/6] x86/boot: Ensure the BSS is aligned on an 8 byte boundary
At 16:34 +0100 on 09 Apr (1428597298), Andrew Cooper wrote:
> On 09/04/15 16:15, Tim Deegan wrote:
> > At 18:26 +0100 on 07 Apr (1428431180), Andrew Cooper wrote:
> > > --- a/xen/arch/x86/boot/head.S
> > > +++ b/xen/arch/x86/boot/head.S
> > > @@ -127,7 +127,8 @@ __start:
> > >          mov     $sym_phys(__bss_end),%ecx
> > >          sub     %edi,%ecx
> > >          xor     %eax,%eax
> > > -        rep     stosb
> > > +        shr     $2,%ecx
> > > +        rep     stosl
> >
> > Should this be shr $3 and stosq? You are aligning to 8 bytes in the
> > linker runes.
>
> It is still 32bit code here, so no stosq available.

Fair enough. :)

> I do however happen to know that the impending multiboot2 entry point is
> 64bit and is able to clear the BSS with stosq.

OK.

> > >
> > >          /* Interrogate CPU extended features via CPUID. */
> > >          mov     $0x80000000,%eax
> > > diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> > > index 4699a04..b1926e3 100644
> > > --- a/xen/arch/x86/xen.lds.S
> > > +++ b/xen/arch/x86/xen.lds.S
> > > @@ -163,6 +163,7 @@ SECTIONS
> > >    __init_end = .;
> > >
> > >    .bss : {                     /* BSS */
> > > +       . = ALIGN(8);
> >
> > Here, we're already aligned to STACK_SIZE
>
> So we are - that should be fixed up. That alignment is not relevant to
> .init, but is relevant to .bss

Yeah, I'm not sure whether it's a problem if __init_end != .bss; if
not the alignment could just be moved down a bit.

Cheers,

Tim.
Re: [Xen-devel] [PATCH 6/6] x86/boot: Ensure the BSS is aligned on an 8 byte boundary
On 09/04/15 16:15, Tim Deegan wrote:
> At 18:26 +0100 on 07 Apr (1428431180), Andrew Cooper wrote:
> > --- a/xen/arch/x86/boot/head.S
> > +++ b/xen/arch/x86/boot/head.S
> > @@ -127,7 +127,8 @@ __start:
> >          mov     $sym_phys(__bss_end),%ecx
> >          sub     %edi,%ecx
> >          xor     %eax,%eax
> > -        rep     stosb
> > +        shr     $2,%ecx
> > +        rep     stosl
>
> Should this be shr $3 and stosq? You are aligning to 8 bytes in the
> linker runes.

It is still 32bit code here, so no stosq available.

I do however happen to know that the impending multiboot2 entry point is
64bit and is able to clear the BSS with stosq.

> >
> >          /* Interrogate CPU extended features via CPUID. */
> >          mov     $0x80000000,%eax
> > diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> > index 4699a04..b1926e3 100644
> > --- a/xen/arch/x86/xen.lds.S
> > +++ b/xen/arch/x86/xen.lds.S
> > @@ -163,6 +163,7 @@ SECTIONS
> >    __init_end = .;
> >
> >    .bss : {                     /* BSS */
> > +       . = ALIGN(8);
>
> Here, we're already aligned to STACK_SIZE

So we are - that should be fixed up. That alignment is not relevant to
.init, but is relevant to .bss

> , which the .bss.stack_aligned just below is relying on. So on the one
> hand this new alignment comment is sort-of-harmless, but on the other
> hand it distracts from the larger and more important alignment.

I will see about fixing this up differently, but with the same overall
effect that stosl/stosq can be used.

~Andrew
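Why the alignment matters for the `shr $2,%ecx` / `rep stosl` form can be sketched in C. The function below is a hypothetical model of the loop (the name and return convention are illustrative, not Xen code): it zeroes whole 32-bit words only, so any length that is not a multiple of 4 leaves trailing bytes untouched.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Zero a region one 32-bit word at a time, mirroring "shr $2; rep stosl".
 * Returns the number of trailing bytes left unzeroed because the length
 * is not a multiple of 4 (0 when the region is suitably sized).
 */
static size_t sketch_clear_dwords(uint32_t *start, size_t len_bytes)
{
    size_t words = len_bytes >> 2;      /* shr $2,%ecx */

    for (size_t i = 0; i < words; i++)
        start[i] = 0;                   /* rep stosl */

    return len_bytes & 3u;
}
```

A nonzero return value corresponds to the case the linker-script alignment is meant to rule out.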
Re: [Xen-devel] [Patch V2 02/15] xen: save linear p2m list address in shared info structure
On 09/04/15 07:55, Juergen Gross wrote:
> The virtual address of the linear p2m list should be stored in the
> shared info structure read by the Xen tools to be able to support
> 64 bit pv-domains larger than 512 GB. Additionally the linear p2m
> list interface includes a generation count which is changed prior
> to and after each mapping change of the p2m list. Reading the
> generation count the Xen tools can detect changes of the mappings
> and re-read the p2m list eventually.

Reviewed-by: David Vrabel david.vra...@citrix.com

David
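A reader using such a generation count might look roughly like the C sketch below. The structure layout, the field names and the "odd value means update in progress" convention are assumptions made for illustration; the patch description only says the count changes before and after each mapping change. A real implementation would also need memory barriers around the reads.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the shared data; names are illustrative only. */
struct sketch_p2m_view {
    volatile uint64_t generation;   /* bumped before and after each update */
    const uint64_t *list;           /* linear p2m list */
    size_t nr_entries;
};

/*
 * Copy the list into dst. Returns 1 if the copy is consistent, 0 if an
 * update was in progress or happened meanwhile, so the caller should retry.
 */
static int sketch_p2m_snapshot(const struct sketch_p2m_view *v, uint64_t *dst)
{
    uint64_t before = v->generation;

    if (before & 1)                 /* assumed: odd means update in progress */
        return 0;

    for (size_t i = 0; i < v->nr_entries; i++)
        dst[i] = v->list[i];

    return v->generation == before;
}
```

A caller would loop on `sketch_p2m_snapshot()` until it returns 1.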
Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle
Julien Grall writes ([PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle):
> The partial device tree may contains phandle. The Device Tree Compiler
> tends to allocate the phandle from 1.

I have to say I have no idea what a phandle is...

> Reserve the ID 65000 for the GIC phandle. I think we can safely assume
> that the partial device tree will never contain a such ID.

Do we control the DT compiler ? What if it should change its phandle
allocation algorithm ?

> +/*
> + * The device tree compiler (DTC) is allocating the phandle from 1 to
> + * onwards. Reserve a high value for the GIC phandle.
> + */

FYI this should read
   The device tree compiler (DTC) allocates phandle values from 1 onwards.

Thanks,
Ian.
[Xen-devel] [PATCH v20 08/13] x86/VPMU: When handling MSR accesses, leave fault injection to callers
With this patch return value of 1 of vpmu_do_msr() will now indicate whether an error was encountered during MSR processing (instead of stating that the access was to a VPMU register). As part of this patch we also check for validity of certain MSR accesses right when we determine which register is being written, as opposed to postponing this until later. Signed-off-by: Boris Ostrovsky boris.ostrov...@oracle.com Acked-by: Kevin Tian kevin.t...@intel.com Reviewed-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com Tested-by: Dietmar Hahn dietmar.h...@ts.fujitsu.com --- xen/arch/x86/hvm/svm/svm.c| 6 ++- xen/arch/x86/hvm/svm/vpmu.c | 6 +-- xen/arch/x86/hvm/vmx/vmx.c| 24 +--- xen/arch/x86/hvm/vmx/vpmu_core2.c | 82 ++- 4 files changed, 55 insertions(+), 63 deletions(-) diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index e523d12..4fe36e9 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -1709,7 +1709,8 @@ static int svm_msr_read_intercept(unsigned int msr, uint64_t *msr_content) case MSR_AMD_FAM15H_EVNTSEL3: case MSR_AMD_FAM15H_EVNTSEL4: case MSR_AMD_FAM15H_EVNTSEL5: -vpmu_do_rdmsr(msr, msr_content); +if ( vpmu_do_rdmsr(msr, msr_content) ) +goto gpf; break; case MSR_AMD64_DR0_ADDRESS_MASK: @@ -1860,7 +1861,8 @@ static int svm_msr_write_intercept(unsigned int msr, uint64_t msr_content) case MSR_AMD_FAM15H_EVNTSEL3: case MSR_AMD_FAM15H_EVNTSEL4: case MSR_AMD_FAM15H_EVNTSEL5: -vpmu_do_wrmsr(msr, msr_content, 0); +if ( vpmu_do_wrmsr(msr, msr_content, 0) ) +goto gpf; break; case MSR_IA32_MCx_MISC(4): /* Threshold register */ diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c index 58a0dc4..474d0db 100644 --- a/xen/arch/x86/hvm/svm/vpmu.c +++ b/xen/arch/x86/hvm/svm/vpmu.c @@ -305,7 +305,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, is_pmu_enabled(msr_content) !vpmu_is_set(vpmu, VPMU_RUNNING) ) { if ( !acquire_pmu_ownership(PMU_OWNER_HVM) ) -return 1; +return 0; vpmu_set(vpmu, 
VPMU_RUNNING); if ( has_hvm_container_vcpu(v) is_msr_bitmap_on(vpmu) ) @@ -335,7 +335,7 @@ static int amd_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, /* Write to hw counters */ wrmsrl(msr, msr_content); -return 1; +return 0; } static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) @@ -353,7 +353,7 @@ static int amd_vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content) rdmsrl(msr, *msr_content); -return 1; +return 0; } static void amd_vpmu_destroy(struct vcpu *v) diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c index d71aa07..e31c38d 100644 --- a/xen/arch/x86/hvm/vmx/vmx.c +++ b/xen/arch/x86/hvm/vmx/vmx.c @@ -2133,12 +2133,17 @@ static int vmx_msr_read_intercept(unsigned int msr, uint64_t *msr_content) *msr_content |= MSR_IA32_MISC_ENABLE_BTS_UNAVAIL | MSR_IA32_MISC_ENABLE_PEBS_UNAVAIL; /* Perhaps vpmu will change some bits. */ +/* FALLTHROUGH */ +case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7): +case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(3): +case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: +case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: +case MSR_IA32_PEBS_ENABLE: +case MSR_IA32_DS_AREA: if ( vpmu_do_rdmsr(msr, msr_content) ) -goto done; +goto gp_fault; break; default: -if ( vpmu_do_rdmsr(msr, msr_content) ) -break; if ( passive_domain_do_rdmsr(msr, msr_content) ) goto done; switch ( long_mode_do_msr_read(msr, msr_content) ) @@ -2314,7 +2319,7 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content) if ( msr_content ~supported ) { /* Perhaps some other bits are supported in vpmu. 
*/ -if ( !vpmu_do_wrmsr(msr, msr_content, supported) ) +if ( vpmu_do_wrmsr(msr, msr_content, supported) ) break; } if ( msr_content IA32_DEBUGCTLMSR_LBR ) @@ -2342,9 +2347,16 @@ static int vmx_msr_write_intercept(unsigned int msr, uint64_t msr_content) if ( !nvmx_msr_write_intercept(msr, msr_content) ) goto gp_fault; break; +case MSR_P6_PERFCTR(0)...MSR_P6_PERFCTR(7): +case MSR_P6_EVNTSEL(0)...MSR_P6_EVNTSEL(7): +case MSR_CORE_PERF_FIXED_CTR0...MSR_CORE_PERF_FIXED_CTR2: +case MSR_CORE_PERF_FIXED_CTR_CTRL...MSR_CORE_PERF_GLOBAL_OVF_CTRL: +case MSR_IA32_PEBS_ENABLE: +case MSR_IA32_DS_AREA: + if ( vpmu_do_wrmsr(msr, msr_content, 0) ) +goto gp_fault; +break; default: -if ( vpmu_do_wrmsr(msr, msr_content, 0) ) -return
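The convention this patch describes (the VPMU handler reports an error through its return value, and the caller decides whether to inject a fault) can be sketched as follows. All names, the reserved-bit mask and the result enum are hypothetical and exist only to show the control flow; they are not the Xen functions touched by the diff.

```c
#include <stdint.h>

enum sketch_result { SKETCH_OKAY = 0, SKETCH_GP_FAULT = 1 };

/* Hypothetical PMU handler: returns 0 when handled, nonzero on error. */
static int sketch_pmu_wrmsr(uint32_t msr, uint64_t value,
                            uint64_t reserved_bits)
{
    (void)msr;                      /* a real handler would dispatch on msr */
    return (value & reserved_bits) ? 1 : 0;
}

/* The intercept, not the handler, turns an error into a fault. */
static enum sketch_result sketch_msr_write_intercept(uint32_t msr,
                                                     uint64_t value)
{
    /* Assumed for the sketch: only the low 8 bits are writable. */
    if ( sketch_pmu_wrmsr(msr, value, ~UINT64_C(0xFF)) )
        return SKETCH_GP_FAULT;

    return SKETCH_OKAY;
}
```

Keeping fault injection in the caller means the same handler can serve callers that want to react to the error differently.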
[Xen-devel] [PATCH V9 2/6] xen/vm_event: Deprecate VM_EVENT_FLAG_DUMMY flag
There are no use-cases for this flag.

Signed-off-by: Tamas K Lengyel tamas.leng...@zentific.com
Acked-by: Tim Deegan t...@xen.org
---
 xen/arch/x86/mm/mem_sharing.c | 3 ---
 xen/arch/x86/mm/p2m.c         | 3 ---
 xen/common/mem_access.c       | 3 ---
 xen/include/public/vm_event.h | 1 -
 4 files changed, 10 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4e5477a..e6572af 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -606,9 +606,6 @@ int mem_sharing_sharing_resume(struct domain *d)
             continue;
         }
 
-        if ( rsp.flags & VM_EVENT_FLAG_DUMMY )
-            continue;
-
         /* Validate the vcpu_id in the response. */
         if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
             continue;
diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 1d3356a..4032c62 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1312,9 +1312,6 @@ void p2m_mem_paging_resume(struct domain *d)
             continue;
         }
 
-        if ( rsp.flags & VM_EVENT_FLAG_DUMMY )
-            continue;
-
         /* Validate the vcpu_id in the response. */
         if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
             continue;
diff --git a/xen/common/mem_access.c b/xen/common/mem_access.c
index f925ac7..7ed8a4e 100644
--- a/xen/common/mem_access.c
+++ b/xen/common/mem_access.c
@@ -44,9 +44,6 @@ void mem_access_resume(struct domain *d)
             continue;
         }
 
-        if ( rsp.flags & VM_EVENT_FLAG_DUMMY )
-            continue;
-
         /* Validate the vcpu_id in the response. */
         if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
             continue;
diff --git a/xen/include/public/vm_event.h b/xen/include/public/vm_event.h
index ed9105b..c7426de 100644
--- a/xen/include/public/vm_event.h
+++ b/xen/include/public/vm_event.h
@@ -47,7 +47,6 @@
 #define VM_EVENT_FLAG_VCPU_PAUSED (1 << 0)
 /* Flags to aid debugging mem_event */
 #define VM_EVENT_FLAG_FOREIGN     (1 << 1)
-#define VM_EVENT_FLAG_DUMMY       (1 << 2)
 
 /*
  * Reasons for the vm event request
-- 
2.1.4
[Xen-devel] [PATCH v5 p2 13/19] tools/libxl: Create a per-arch function to map IRQ to a domain
From: Julien Grall julien.gr...@linaro.org

ARM and x86 use a different hypercall to map an IRQ to a domain.

The hypercall to give IRQ permission to the domain has also been moved
to be an x86 specific function as ARM guest won't be able to manage the
IRQ. We may want to support it later.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com
---
    Changes in v5:
        - Use the new function xc_domain_bind_pt_spi_irq
        - Fix typoes

    Changes in v4:
        - Patch added
---
 tools/libxl/libxl_arch.h   |  4 ++++
 tools/libxl/libxl_arm.c    |  5 +++++
 tools/libxl/libxl_create.c |  6 ++----
 tools/libxl/libxl_x86.c    | 13 +++++++++++++
 4 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index cae64c0..77b1f2a 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -39,4 +39,8 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
                                       uint32_t domid,
                                       libxl_domain_build_info *b_info,
                                       libxl__domain_build_state *state);
+
+/* arch specific irq map function */
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq);
+
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 5a5cb3f..aa302fd 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -742,6 +742,11 @@ int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
     return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, info, state);
 }
 
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
+{
+    return xc_domain_bind_pt_spi_irq(CTX->xch, domid, irq);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index e5a343f..15b464e 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -1205,11 +1205,9 @@ static void domcreate_launch_dm(libxl__egc *egc, libxl__multidev *multidev,
 
         LOG(DEBUG, "dom%d irq %d", domid, irq);
 
-        ret = irq >= 0 ? xc_physdev_map_pirq(CTX->xch, domid, irq, irq)
+        ret = irq >= 0 ? libxl__arch_domain_map_irq(gc, domid, irq)
                        : -EOVERFLOW;
-        if (!ret)
-            ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
-        if (ret < 0) {
+        if (ret) {
             LOGE(ERROR, "failed give dom%d access to irq %d", domid, irq);
             ret = ERROR_FAIL;
             goto error_out;
diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 5e9a8d2..ed2bd38 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -424,6 +424,19 @@ out:
     return rc;
 }
 
+int libxl__arch_domain_map_irq(libxl__gc *gc, uint32_t domid, int irq)
+{
+    int ret;
+
+    ret = xc_physdev_map_pirq(CTX->xch, domid, irq, irq);
+    if (ret)
+        return ret;
+
+    ret = xc_domain_irq_permission(CTX->xch, domid, irq, 1);
+
+    return ret;
+}
+
 /*
  * Local variables:
  * mode: C
-- 
2.1.4
[Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle
From: Julien Grall julien.gr...@linaro.org

The partial device tree may contains phandle. The Device Tree Compiler
tends to allocate the phandle from 1.

Reserve the ID 65000 for the GIC phandle. I think we can safely assume
that the partial device tree will never contain a such ID.

Signed-off-by: Julien Grall julien.gr...@linaro.org
Acked-by: Ian Campbell ian.campb...@citrix.com
Cc: Ian Jackson ian.jack...@eu.citrix.com
Cc: Wei Liu wei.l...@citrix.com
---
    To allocate dynamically the phandle, we would need to fill in
    post-hoc (like we do with e.g the initramfs location) the
    #interrupt-parent in /. That would also require some refactoring
    in the code to pass the phandle every time.

    Defer this solution to a follow-up in order as having 65000 would be
    very unlikely.

    Changes in v5:
        - Add Ian's Ack.

    Changes in v3:
        - Patch added
---
 tools/libxl/libxl_arm.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 2ce7e23..cf1379d 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -80,10 +80,11 @@ static struct arch_info {
     {"xen-3.0-aarch64", "arm,armv8-timer", "arm,armv8" },
 };
 
-enum {
-    PHANDLE_NONE = 0,
-    PHANDLE_GIC,
-};
+/*
+ * The device tree compiler (DTC) is allocating the phandle from 1 to
+ * onwards. Reserve a high value for the GIC phandle.
+ */
+#define PHANDLE_GIC (65000)
 
 typedef uint32_t be32;
 typedef be32 gic_interrupt[3];
-- 
2.1.4
[Xen-devel] [PATCH v4 00/12] enable Cache Allocation Technology (CAT) for VMs
Changes in v4:
* Address comments from Andrew and Ian (details in each patch).
* Split COS/CBM management patch into 4 small patches.
* Add documentation xl-psr.markdown.

Changes in v3:
* Address comments from Jan and Ian (details in each patch).
* Add xl sample output in cover letter.

Changes in v2:
* Address comments from Konrad and Jan (details in each patch).
* Move all changes unrelated to CAT into the preparation patches.

This patch series enables the new Cache Allocation Technology (CAT)
feature found in Intel Broadwell and later server platforms. In Xen's
implementation, CAT is used to control cache allocation on a per-VM
basis.

The detailed hardware spec can be found in section 17.15 of the Intel
SDM [1]. The design for Xen can be found at [2].

patch1-2:  preparation.
patch3-11: real work for CAT.
patch12:   xl document for CMT/MBM/CAT.

[1] Intel SDM (http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf)
[2] CAT design for XEN (http://lists.xen.org/archives/html/xen-devel/2014-12/msg01382.html)

Chao Peng (12):
  x86: clean up psr boot parameter parsing
  x86: improve psr scheduling code
  x86: detect and initialize Intel CAT feature
  x86: maintain COS to CBM mapping for each socket
  x86: maintain socket CPU mask for CAT
  x86: add COS information for each domain
  x86: expose CBM length and COS number information
  x86: dynamically get/set CBM for a domain
  x86: add scheduling support for Intel CAT
  xsm: add CAT related xsm policies
  tools: add tools support for Intel CAT
  docs: add xl-psr.markdown

 docs/man/xl.pod.1                            |  38 +++
 docs/misc/xen-command-line.markdown          |  13 +-
 docs/misc/xl-psr.markdown                    | 111 +++
 tools/flask/policy/policy/modules/xen/xen.if |   2 +-
 tools/flask/policy/policy/modules/xen/xen.te |   4 +-
 tools/libxc/include/xenctrl.h                |  15 +
 tools/libxc/xc_psr.c                         |  76 +
 tools/libxl/libxl.h                          |  26 ++
 tools/libxl/libxl_psr.c                      | 168 +-
 tools/libxl/libxl_types.idl                  |  10 +
 tools/libxl/xl.h                             |   4 +
 tools/libxl/xl_cmdimpl.c                     | 140 +
 tools/libxl/xl_cmdtable.c                    |  12 +
 xen/arch/x86/domain.c                        |  13 +-
 xen/arch/x86/domctl.c                        |  18 ++
 xen/arch/x86/psr.c                           | 446 ---
 xen/arch/x86/sysctl.c                        |  18 ++
 xen/include/asm-x86/cpufeature.h             |   1 +
 xen/include/asm-x86/domain.h                 |   5 +-
 xen/include/asm-x86/msr-index.h              |   1 +
 xen/include/asm-x86/psr.h                    |  14 +-
 xen/include/public/domctl.h                  |  12 +
 xen/include/public/sysctl.h                  |  16 +
 xen/xsm/flask/hooks.c                        |   6 +
 xen/xsm/flask/policy/access_vectors          |   6 +
 25 files changed, 1120 insertions(+), 55 deletions(-)
 create mode 100644 docs/misc/xl-psr.markdown

-- 
1.9.1
[Xen-devel] [PATCH v4 05/12] x86: maintain socket CPU mask for CAT
Some CAT resource/registers exist in socket level and they must be
accessed from the CPU of the corresponding socket. It's common to pick
an arbitrary CPU from the socket. To make the picking easy, it's useful
to maintain a reference to the cpu_core_mask which contains all the
siblings of a CPU in the same socket. The reference needs to be
synchronized with the CPU up/down.

Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
---
 xen/arch/x86/psr.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 4aff5f6..7de2504 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -32,6 +32,7 @@ struct psr_cat_socket_info {
     unsigned int cbm_len;
     unsigned int cos_max;
     struct psr_cat_cbm *cos_cbm_map;
+    cpumask_t *socket_cpu_mask;
 };
 
 struct psr_assoc {
@@ -234,6 +235,8 @@ static void cat_cpu_init(unsigned int cpu)
     ASSERT(socket < nr_sockets);
 
     info = cat_socket_info + socket;
+    if ( info->socket_cpu_mask == NULL )
+        info->socket_cpu_mask = per_cpu(cpu_core_mask, cpu);
 
     /* Avoid initializing more than one times for the same socket. */
     if ( test_and_set_bool(info->initialized) )
@@ -274,6 +277,24 @@ static void psr_cpu_init(unsigned int cpu)
     psr_assoc_init(cpu);
 }
 
+static void psr_cpu_fini(unsigned int cpu)
+{
+    unsigned int socket, next;
+    cpumask_t *cpu_mask;
+
+    if ( cat_socket_info )
+    {
+        socket = cpu_to_socket(cpu);
+        cpu_mask = cat_socket_info[socket].socket_cpu_mask;
+
+        if ( (next = cpumask_cycle(cpu, cpu_mask)) == cpu )
+            cat_socket_info[socket].socket_cpu_mask = NULL;
+        else
+            cat_socket_info[socket].socket_cpu_mask =
+                per_cpu(cpu_core_mask, next);
+    }
+}
+
 static int cpu_callback(
     struct notifier_block *nfb, unsigned long action, void *hcpu)
 {
@@ -284,6 +305,9 @@ static int cpu_callback(
     case CPU_STARTING:
         psr_cpu_init(cpu);
         break;
+    case CPU_DYING:
+        psr_cpu_fini(cpu);
+        break;
     }
 
     return NOTIFY_DONE;
-- 
1.9.1
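The hand-over performed when a CPU goes away can be modelled with a plain 64-bit bitmask. This is a simplified sketch under stated assumptions: the patch keeps a pointer to a sibling's cpu_core_mask, whereas the hypothetical helpers below work on a `uint64_t` and return a CPU number, and CPU numbers are assumed to be below 64.

```c
#include <stdint.h>

#define SKETCH_NO_CPU 64u   /* hypothetical "nothing left" marker */

/* Next set bit after 'cpu' in 'mask', wrapping around; may be 'cpu' itself. */
static unsigned int sketch_mask_cycle(unsigned int cpu, uint64_t mask)
{
    for (unsigned int step = 1; step <= 64; step++) {
        unsigned int candidate = (cpu + step) % 64;

        if (mask & (UINT64_C(1) << candidate))
            return candidate;
    }
    return SKETCH_NO_CPU;
}

/* CPU to use as the socket's representative once 'dying' goes offline. */
static unsigned int sketch_next_representative(unsigned int dying,
                                               uint64_t siblings)
{
    unsigned int next = sketch_mask_cycle(dying, siblings);

    /* Cycling back to the dying CPU means it was the last one in the socket. */
    return (next == dying || next == SKETCH_NO_CPU) ? SKETCH_NO_CPU : next;
}
```

Returning the marker plays the role of setting the stored mask pointer to NULL in the patch.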
[Xen-devel] [PATCH v4 08/12] x86: dynamically get/set CBM for a domain
For CAT, COS is maintained in hypervisor only while CBM is exposed to user space directly to allow getting/setting domain's cache capacity. For each specified CBM, hypervisor will either use a existed COS which has the same CBM or allocate a new one if the same CBM is not found. If the allocation fails because of no enough COS available then error is returned. The getting/setting are always operated on a specified socket. For multiple sockets system, the interface may be called several times. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- xen/arch/x86/domctl.c | 18 ++ xen/arch/x86/psr.c | 126 xen/include/asm-x86/msr-index.h | 1 + xen/include/asm-x86/psr.h | 2 + xen/include/public/domctl.h | 12 5 files changed, 159 insertions(+) diff --git a/xen/arch/x86/domctl.c b/xen/arch/x86/domctl.c index d4f6ccf..89a6b33 100644 --- a/xen/arch/x86/domctl.c +++ b/xen/arch/x86/domctl.c @@ -1326,6 +1326,24 @@ long arch_do_domctl( } break; +case XEN_DOMCTL_psr_cat_op: +switch ( domctl-u.psr_cat_op.cmd ) +{ +case XEN_DOMCTL_PSR_CAT_OP_SET_L3_CBM: +ret = psr_set_l3_cbm(d, domctl-u.psr_cat_op.target, + domctl-u.psr_cat_op.data); +break; +case XEN_DOMCTL_PSR_CAT_OP_GET_L3_CBM: +ret = psr_get_l3_cbm(d, domctl-u.psr_cat_op.target, + domctl-u.psr_cat_op.data); +copyback = 1; +break; +default: +ret = -EOPNOTSUPP; +break; +} +break; + default: ret = iommu_do_domctl(domctl, d, u_domctl); break; diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index e390fd9..5247bcd 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -56,6 +56,17 @@ static unsigned int get_socket_count(void) return DIV_ROUND_UP(nr_cpu_ids, cpus_per_socket); } +static unsigned int get_socket_cpu(unsigned int socket) +{ +if ( socket nr_sockets ) +{ +cpumask_t *cpu_mask = cat_socket_info[socket].socket_cpu_mask; +ASSERT(cpu_mask != NULL); +return cpumask_any(cpu_mask); +} +return nr_cpu_ids; +} + static void __init parse_psr_bool(char *s, char *value, char *feature, unsigned int mask) { @@ -252,6 +263,121 @@ 
int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len, return 0; } +int psr_get_l3_cbm(struct domain *d, unsigned int socket, uint64_t *cbm) +{ +unsigned int cos; +struct psr_cat_socket_info *info; +int ret = get_cat_socket_info(socket, info); + +if ( ret ) +return ret; + +cos = d-arch.psr_cos_ids[socket]; +*cbm = info-cos_cbm_map[cos].cbm; +return 0; +} + +static bool_t psr_check_cbm(unsigned int cbm_len, uint64_t cbm) +{ +unsigned int first_bit, zero_bit; + +/* Set bits should only in the range of [0, cbm_len). */ +if ( cbm (~0ull cbm_len) ) +return 0; + +first_bit = find_first_bit(cbm, cbm_len); +zero_bit = find_next_zero_bit(cbm, cbm_len, first_bit); + +/* Set bits should be contiguous. */ +if ( zero_bit cbm_len + find_next_bit(cbm, cbm_len, zero_bit) cbm_len ) +return 0; + +return 1; +} + +struct cos_cbm_info +{ +unsigned int cos; +uint64_t cbm; +}; + +static void do_write_l3_cbm(void *data) +{ +struct cos_cbm_info *info = data; +wrmsrl(MSR_IA32_PSR_L3_MASK(info-cos), info-cbm); +} + +static int write_l3_cbm(unsigned int socket, unsigned int cos, uint64_t cbm) +{ +struct cos_cbm_info info = { .cos = cos, .cbm = cbm }; + +if ( socket == cpu_to_socket(smp_processor_id()) ) +do_write_l3_cbm(info); +else +{ +unsigned int cpu = get_socket_cpu(socket); + +if ( cpu = nr_cpu_ids ) +return -EBADSLT; +on_selected_cpus(cpumask_of(cpu), do_write_l3_cbm, info, 1); +} + +return 0; +} + +int psr_set_l3_cbm(struct domain *d, unsigned int socket, uint64_t cbm) +{ +unsigned int old_cos, cos; +struct psr_cat_cbm *map, *find; +struct psr_cat_socket_info *info; +int ret = get_cat_socket_info(socket, info); + +if ( ret ) +return ret; + +if ( !psr_check_cbm(info-cbm_len, cbm) ) +return -EINVAL; + +old_cos = d-arch.psr_cos_ids[socket]; +map = info-cos_cbm_map; +find = NULL; + +for ( cos = 0; cos = info-cos_max; cos++ ) +{ +/* If still not found, then keep unused one. 
*/ +if ( !find cos != 0 map[cos].ref == 0 ) +find = map + cos; +else if ( map[cos].cbm == cbm ) +{ +if ( unlikely(cos == old_cos) ) +return -EEXIST; +find = map + cos; +break; +} +} + +/* If old cos is referred only by the domain, then use it. */ +if ( !find map[old_cos].ref == 1 ) +find = map +
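The two constraints checked on a capacity bitmask (set bits only within [0, cbm_len), and set bits contiguous) can also be written with plain bit arithmetic. The sketch below is an illustrative alternative, not the patch's implementation; the function name is hypothetical, and rejecting an empty mask is an added assumption rather than something stated in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True if 'cbm' has set bits only in [0, cbm_len) and those bits form a
 * single contiguous run.
 */
static bool sketch_cbm_is_valid(unsigned int cbm_len, uint64_t cbm)
{
    if (cbm_len == 0 || cbm_len > 64)
        return false;

    /* No bits at or above cbm_len (shift by 64 would be undefined). */
    if (cbm_len < 64 && (cbm >> cbm_len) != 0)
        return false;

    if (cbm == 0)
        return false;               /* assumption: empty masks are rejected */

    /*
     * Adding the lowest set bit to a contiguous run carries through the
     * whole run and clears it, so the sum shares no bit with the mask.
     */
    uint64_t lowest = cbm & (~cbm + 1);
    uint64_t sum = cbm + lowest;

    return (sum & cbm) == 0;
}
```

For example, with cbm_len 8 the mask 0x3C passes, while 0x5 (a gap) and 0x100 (out of range) do not.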
[Xen-devel] [PATCH v4 03/12] x86: detect and initialize Intel CAT feature
Detect Intel Cache Allocation Technology (CAT) feature and store the cpuid
information for later use. Currently only L3 cache allocation is supported.
The L3 CAT features may vary among sockets, so per-socket feature information
is stored. The initialization can happen either at boot time or when CPU(s)
are hot plugged after booting.

Signed-off-by: Chao Peng <chao.p.p...@linux.intel.com>
---
Changes in v4:
* check X86_FEATURE_CAT available before doing initialization.
Changes in v3:
* Remove num_sockets boot option, instead calculate it at boot time.
* Name hardcoded CAT cpuid leaf as PSR_CPUID_LEVEL_CAT.
Changes in v2:
* socket_num => num_sockets and fix several documentation issues.
* refactor boot line parameters parsing into standalone patch.
* set opt_num_sockets = NR_CPUS when opt_num_sockets > NR_CPUS.
* replace CPU_ONLINE with CPU_STARTING and integrate that into scheduling
  improvement patch.
* reimplement get_max_socket() with cpu_to_socket();
* cbm is still uint64 as there is a path forward for supporting long masks.
---
 docs/misc/xen-command-line.markdown | 13 +--
 xen/arch/x86/psr.c                  | 68 +++--
 xen/include/asm-x86/cpufeature.h    |  1 +
 xen/include/asm-x86/psr.h           |  3 ++
 4 files changed, 81 insertions(+), 4 deletions(-)

diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index 1dda1f0..9ad8801 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1122,9 +1122,9 @@ This option can be specified more than once (up to 8 times at present).
 > `= <integer>`

 ### psr (Intel)
-> `= List of ( cmt:<boolean> | rmid_max:<integer> )`
+> `= List of ( cmt:<boolean> | rmid_max:<integer> | cat:<boolean> )`

-> Default: `psr=cmt:0,rmid_max:255`
+> Default: `psr=cmt:0,rmid_max:255,cat:0`

 Platform Shared Resource(PSR) Services.  Intel Haswell and later server
 platforms offer information about the sharing of resources.
@@ -1134,6 +1134,11 @@ Monitoring ID(RMID) is used to bind the domain to corresponding shared resource.
 RMID is a hardware-provided layer of abstraction between software and logical
 processors.

+To use the PSR cache allocation service for a certain domain, a capacity
+bitmask (CBM) is used to bind the domain to corresponding shared resource.
+CBM represents cache capacity and indicates the degree of overlap and isolation
+between domains.
+
 The following resources are available:

 * Cache Monitoring Technology (Haswell and later). Information regarding the
@@ -1144,6 +1149,10 @@ The following resources are available:
   total/local memory bandwidth. Follow the same options with Cache Monitoring
   Technology.

+* Cache Allocation Technology (Broadwell and later). Information regarding
+  the cache allocation.
+  * `cat` instructs Xen to enable/disable Cache Allocation Technology.
+
 ### reboot
 > `= t[riple] | k[bd] | a[cpi] | p[ci] | P[ower] | e[fi] | n[o] [, [w]arm | [c]old]`

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 6119c6e..16c37dd 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -19,17 +19,36 @@
 #include <asm/psr.h>

 #define PSR_CMT        (1<<0)
+#define PSR_CAT        (1<<1)
+
+struct psr_cat_socket_info {
+    bool_t initialized;
+    bool_t enabled;
+    unsigned int cbm_len;
+    unsigned int cos_max;
+};

 struct psr_assoc {
     uint64_t val;
 };

 struct psr_cmt *__read_mostly psr_cmt;
+static struct psr_cat_socket_info *__read_mostly cat_socket_info;
+
 static unsigned int __initdata opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
+static unsigned int __read_mostly nr_sockets;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);

+static unsigned int get_socket_count(void)
+{
+    unsigned int cpus_per_socket = boot_cpu_data.x86_max_cores *
+                                   boot_cpu_data.x86_num_siblings;
+
+    return DIV_ROUND_UP(nr_cpu_ids, cpus_per_socket);
+}
+
 static void __init parse_psr_bool(char *s, char *value, char *feature,
                                   unsigned int mask)
 {
@@ -63,6 +82,7 @@ static void __init parse_psr_param(char *s)
             *val_str++ = '\0';

         parse_psr_bool(s, val_str, "cmt", PSR_CMT);
+        parse_psr_bool(s, val_str, "cat", PSR_CAT);

         if ( val_str && !strcmp(s, "rmid_max") )
             opt_rmid_max = simple_strtoul(val_str, NULL, 0);
@@ -194,8 +214,49 @@ void psr_ctxt_switch_to(struct domain *d)
     }
 }

+static void cat_cpu_init(unsigned int cpu)
+{
+    unsigned int eax, ebx, ecx, edx;
+    struct psr_cat_socket_info *info;
+    unsigned int socket;
+    const struct cpuinfo_x86 *c = cpu_data + cpu;
+
+    if ( !cpu_has(c, X86_FEATURE_CAT) )
+        return;
+
+    socket = cpu_to_socket(cpu);
+    ASSERT(socket < nr_sockets);
+
+    info = cat_socket_info + socket;
+
+    /* Avoid initializing more than one times for the same socket. */
+    if ( test_and_set_bool(info->initialized) )
+        return;
+
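As a reading aid for the two ideas in this patch (counting sockets by rounding up, and initializing each socket only once), here is a minimal standalone C sketch. It is not the Xen code: `socket_count`, `socket_init_once` and `struct socket_info` are hypothetical names, and the plain flag check below is not atomic, whereas the patch uses `test_and_set_bool()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Round-up integer division, the same arithmetic DIV_ROUND_UP performs. */
static unsigned int div_round_up(unsigned int n, unsigned int d)
{
    assert(d != 0);
    return (n + d - 1) / d;
}

/*
 * Upper bound on the number of sockets: total CPU ids divided by the
 * number of logical CPUs per socket (cores times siblings), rounded up.
 */
static unsigned int socket_count(unsigned int nr_cpu_ids,
                                 unsigned int max_cores,
                                 unsigned int num_siblings)
{
    return div_round_up(nr_cpu_ids, max_cores * num_siblings);
}

/* Hypothetical per-socket record; only the flag matters for this sketch. */
struct socket_info {
    bool initialized;
};

/*
 * Returns true for the first caller on a socket and false afterwards.
 * Assumption: single-threaded use; real code needs an atomic test-and-set.
 */
static bool socket_init_once(struct socket_info *info)
{
    if ( info->initialized )
        return false;
    info->initialized = true;
    return true;
}
```

With 16 CPU ids, 4 cores and 2 siblings per core this gives 2 sockets; a 17th CPU id pushes the bound to 3, which is why the division rounds up.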
[Xen-devel] [PATCH v4 04/12] x86: maintain COS to CBM mapping for each socket
For each socket, a COS to CBM mapping structure is maintained for each COS.
The mapping is indexed by COS and the value is the corresponding CBM. Different
VMs may use the same CBM, so a reference count is used to indicate whether the
CBM is available.

Signed-off-by: Chao Peng <chao.p.p...@linux.intel.com>
---
 xen/arch/x86/psr.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 16c37dd..4aff5f6 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -21,11 +21,17 @@
 #define PSR_CMT        (1<<0)
 #define PSR_CAT        (1<<1)

+struct psr_cat_cbm {
+    unsigned int ref;
+    uint64_t cbm;
+};
+
 struct psr_cat_socket_info {
     bool_t initialized;
     bool_t enabled;
     unsigned int cbm_len;
     unsigned int cos_max;
+    struct psr_cat_cbm *cos_cbm_map;
 };

 struct psr_assoc {
@@ -240,6 +246,14 @@ static void cat_cpu_init(unsigned int cpu)
     info->cbm_len = (eax & 0x1f) + 1;
     info->cos_max = (edx & 0xffff);

+    info->cos_cbm_map = xzalloc_array(struct psr_cat_cbm,
+                                      info->cos_max + 1UL);
+    if ( !info->cos_cbm_map )
+        return;
+
+    /* cos=0 is reserved as default cbm(all ones). */
+    info->cos_cbm_map[0].cbm = (1ull << info->cbm_len) - 1;
+
     info->enabled = 1;
     printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n",
            socket, info->cos_max, info->cbm_len);
-- 
1.9.1
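To make the reference-counted mapping easier to picture, here is a minimal standalone C sketch of a COS-indexed table. It is an illustration only, not the code from this series: `cat_map_init`, `cat_map_get` and `cat_map_put` are hypothetical helpers, `calloc` stands in for `xzalloc_array`, and the "reuse a matching entry, else claim a free one" policy is an assumption about how such a table would typically be used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One entry per COS: the capacity bitmask and how many users share it. */
struct cbm_entry {
    unsigned int ref;
    uint64_t cbm;
};

struct cat_map {
    unsigned int cbm_len;   /* number of valid bits in a CBM */
    unsigned int cos_max;   /* highest usable COS index */
    struct cbm_entry *map;  /* cos_max + 1 entries, indexed by COS */
};

/* Allocate the table; COS 0 is reserved for the default all-ones CBM. */
static int cat_map_init(struct cat_map *m, unsigned int cbm_len,
                        unsigned int cos_max)
{
    assert(cbm_len >= 1 && cbm_len <= 32);
    m->map = calloc((size_t)cos_max + 1, sizeof(*m->map));
    if ( !m->map )
        return -1;
    m->cbm_len = cbm_len;
    m->cos_max = cos_max;
    m->map[0].cbm = (1ull << cbm_len) - 1;
    return 0;
}

/*
 * Return a COS whose CBM equals cbm, taking a reference on it.
 * An in-use entry with the same mask is shared; otherwise the first
 * entry with a zero reference count is claimed. Returns -1 if full.
 */
static int cat_map_get(struct cat_map *m, uint64_t cbm)
{
    unsigned int cos;
    int free_cos = -1;

    if ( cbm == m->map[0].cbm )
        return 0;  /* default mask: COS 0, not reference counted here */

    for ( cos = 1; cos <= m->cos_max; cos++ )
    {
        if ( m->map[cos].ref && m->map[cos].cbm == cbm )
        {
            m->map[cos].ref++;
            return (int)cos;
        }
        if ( !m->map[cos].ref && free_cos < 0 )
            free_cos = (int)cos;
    }

    if ( free_cos < 0 )
        return -1;

    m->map[free_cos].cbm = cbm;
    m->map[free_cos].ref = 1;
    return free_cos;
}

/* Drop one reference; an entry with ref == 0 becomes reusable. */
static void cat_map_put(struct cat_map *m, unsigned int cos)
{
    if ( cos != 0 && cos <= m->cos_max && m->map[cos].ref )
        m->map[cos].ref--;
}
```

The point of the reference count is visible in `cat_map_get`: two domains asking for the same mask end up on the same COS, and the slot is only recycled once both have released it.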
[Xen-devel] [PATCH 3/8] Move common-functions.sh and git-checkout.sh into lib
"script" implies something which is designed to be run standalone. "lib"
implies that this is going to be sourced from another bash script.

Also change git-checkout to be a function rather than a script

Signed-off-by: George Dunlap <george.dun...@eu.citrix.com>
---
CC: Stefano Stabellini <stefano.stabell...@citrix.com>
---
 components/grub                      |  2 +-
 components/libvirt                   |  2 +-
 components/xen                       |  2 +-
 {scripts => lib}/common-functions.sh |  0
 lib/git-checkout.sh                  | 32 ++++++++++++++++++++++++++++++++
 raise.sh                             |  3 ++-
 scripts/git-checkout.sh              | 30 ------------------------------
 unraise.sh                           |  2 +-
 8 files changed, 38 insertions(+), 35 deletions(-)
 rename {scripts => lib}/common-functions.sh (100%)
 create mode 100755 lib/git-checkout.sh
 delete mode 100755 scripts/git-checkout.sh

diff --git a/components/grub b/components/grub
index 5a42000..a5aa27d 100644
--- a/components/grub
+++ b/components/grub
@@ -29,7 +29,7 @@ function grub_build() {
     cd $BASEDIR
     rm -f memdisk.tar
     tar cf memdisk.tar -C data grub.cfg
-    ./scripts/git-checkout.sh $GRUB_UPSTREAM_URL $GRUB_UPSTREAM_REVISION grub-dir
+    git-checkout $GRUB_UPSTREAM_URL $GRUB_UPSTREAM_REVISION grub-dir
     cd grub-dir
     ./autogen.sh
     ## GRUB32
diff --git a/components/libvirt b/components/libvirt
index e22996e..6602dcf 100644
--- a/components/libvirt
+++ b/components/libvirt
@@ -26,7 +26,7 @@ function libvirt_build() {
     _libvirt_install_dependencies

     cd $BASEDIR
-    ./scripts/git-checkout.sh $LIBVIRT_UPSTREAM_URL $LIBVIRT_UPSTREAM_REVISION libvirt-dir
+    git-checkout $LIBVIRT_UPSTREAM_URL $LIBVIRT_UPSTREAM_REVISION libvirt-dir
     cd libvirt-dir
     CFLAGS="-I$INST_DIR/$PREFIX/include" \
     LDFLAGS="-L$INST_DIR/$PREFIX/lib -Wl,-rpath-link=$INST_DIR/$PREFIX/lib" \
diff --git a/components/xen b/components/xen
index a49a1d1..70b72b0 100644
--- a/components/xen
+++ b/components/xen
@@ -23,7 +23,7 @@ function xen_build() {
     _xen_install_dependencies

     cd $BASEDIR
-    ./scripts/git-checkout.sh $XEN_UPSTREAM_URL $XEN_UPSTREAM_REVISION xen-dir
+    git-checkout $XEN_UPSTREAM_URL $XEN_UPSTREAM_REVISION xen-dir
     cd xen-dir
     ./configure --prefix=$PREFIX
     $MAKE
diff --git a/scripts/common-functions.sh b/lib/common-functions.sh
similarity index 100%
rename from scripts/common-functions.sh
rename to lib/common-functions.sh
diff --git a/lib/git-checkout.sh b/lib/git-checkout.sh
new file mode 100755
index 0000000..2ca8f25
--- /dev/null
+++ b/lib/git-checkout.sh
@@ -0,0 +1,32 @@
+#!/usr/bin/env bash
+
+function git-checkout() {
+    if [[ $# -lt 3 ]]
+    then
+        echo "Usage: $0 <tree> <tag> <dir>"
+        exit 1
+    fi
+
+    TREE=$1
+    TAG=$2
+    DIR=$3
+
+    set -e
+
+    if [[ ! -d $DIR-remote ]]
+    then
+        rm -rf $DIR-remote $DIR-remote.tmp
+        mkdir -p $DIR-remote.tmp; rmdir $DIR-remote.tmp
+        $GIT clone $TREE $DIR-remote.tmp
+        if [[ $TAG ]]
+        then
+            cd $DIR-remote.tmp
+            $GIT branch -D dummy >/dev/null 2>&1 ||:
+            $GIT checkout -b dummy $TAG
+            cd ..
+        fi
+        mv $DIR-remote.tmp $DIR-remote
+    fi
+    rm -f $DIR
+    ln -sf $DIR-remote $DIR
+}
diff --git a/raise.sh b/raise.sh
index 3c8281e..422fbe4 100755
--- a/raise.sh
+++ b/raise.sh
@@ -3,7 +3,8 @@
 set -e

 source config
-source scripts/common-functions.sh
+source lib/common-functions.sh
+source lib/git-checkout.sh

 _help() {
     echo "Usage: ./build.sh <options> <command>"
diff --git a/scripts/git-checkout.sh b/scripts/git-checkout.sh
deleted file mode 100755
index 912bfae..0000000
--- a/scripts/git-checkout.sh
+++ /dev/null
@@ -1,30 +0,0 @@
-#!/usr/bin/env bash
-
-if [[ $# -lt 3 ]]
-then
-    echo "Usage: $0 <tree> <tag> <dir>"
-    exit 1
-fi
-
-TREE=$1
-TAG=$2
-DIR=$3
-
-set -e
-
-if [[ ! -d $DIR-remote ]]
-then
-    rm -rf $DIR-remote $DIR-remote.tmp
-    mkdir -p $DIR-remote.tmp; rmdir $DIR-remote.tmp
-    $GIT clone $TREE $DIR-remote.tmp
-    if [[ $TAG ]]
-    then
-        cd $DIR-remote.tmp
-        $GIT branch -D dummy >/dev/null 2>&1 ||:
-        $GIT checkout -b dummy $TAG
-        cd ..
-    fi
-    mv $DIR-remote.tmp $DIR-remote
-fi
-rm -f $DIR
-ln -sf $DIR-remote $DIR
diff --git a/unraise.sh b/unraise.sh
index 2f08901..50ce310 100755
--- a/unraise.sh
+++ b/unraise.sh
@@ -3,7 +3,7 @@
 set -e

 source config
-source scripts/common-functions.sh
+source lib/common-functions.sh


 # start execution
-- 
1.9.1
[Xen-devel] [PATCH 4/8] Import raise.sh and unraise.sh into library
Make as few changes as possible to begin with, just to separate code
motion from changes.

For now, remove raise.sh and unraise.sh from package creation, until
we can figure out what to do instead.

Signed-off-by: George Dunlap <george.dun...@eu.citrix.com>
---
CC: Stefano Stabellini <stefano.stabell...@citrix.com>
---
 raise.sh => lib/build.sh | 62
 raise                    |  6 +
 scripts/mkdeb            |  3 ++-
 scripts/mkrpm            |  5 ++--
 unraise.sh               | 17 -
 5 files changed, 31 insertions(+), 62 deletions(-)
 rename raise.sh => lib/build.sh (77%)
 delete mode 100755 unraise.sh

diff --git a/raise.sh b/lib/build.sh
similarity index 77%
rename from raise.sh
rename to lib/build.sh
index 422fbe4..ab1e087 100755
--- a/raise.sh
+++ b/lib/build.sh
@@ -2,10 +2,6 @@

 set -e

-source config
-source lib/common-functions.sh
-source lib/git-checkout.sh
-
 _help() {
     echo "Usage: ./build.sh <options> <command>"
     echo "where options are:"
@@ -18,7 +14,9 @@ _help() {
     echo "    configure    Configure the system (requires sudo)"
 }

-_build() {
+build() {
+    $arg_parse
+
     if [[ $YES != y ]]
     then
         echo "Do you want Raisin to automatically install build time dependencies for you? (y/n)"
@@ -50,7 +48,20 @@ _build() {
     build_package xen-system
 }

-_install() {
+unraise() {
+    $arg_parse
+
+    for_each_component clean
+
+    uninstall_package xen-system
+    for_each_component unconfigure
+
+    rm -rf $INST_DIR
+}
+
+install() {
+    $arg_parse
+
     # need single braces for filename matching expansion
     if [ ! -f xen-sytem*rpm ] && [ ! -f xen-system*deb ]
     then
@@ -60,7 +71,9 @@ _install() {
     install_package xen-system
 }

-_configure() {
+configure() {
+    $arg_parse
+
     if [[ $YES != y ]]
     then
         echo "Proceeding we'll make changes to the running system,"
@@ -82,38 +95,3 @@ _configure() {
     for_each_component configure
 }

-# start execution
-common_init
-
-# parameters check
-export VERBOSE=0
-export YES=n
-export NO_DEPS=0
-while [[ $# -gt 1 ]]
-do
-  if [[ $1 = -v || $1 = --verbose ]]
-  then
-    VERBOSE=1
-    shift 1
-  elif [[ $1 = -y || $1 = --yes ]]
-  then
-    YES=y
-    shift 1
-  else
-    _help
-    exit 1
-  fi
-done
-
-case $1 in
-    build | install | configure )
-        COMMAND=$1
-        ;;
-    *)
-        _help
-        exit 1
-        ;;
-esac
-
-_$COMMAND
-
diff --git a/raise b/raise
index 7f3faae..142956d 100755
--- a/raise
+++ b/raise
@@ -10,6 +10,12 @@ fi

 # Then as many as the sub-libraries as you need
 . ${RAISIN_PATH}/core.sh
+. ${RAISIN_PATH}/common-functions.sh
+. ${RAISIN_PATH}/git-checkout.sh
+. ${RAISIN_PATH}/build.sh
+
+# Set up basic functionality
+common_init

 # And do your own thing rather than running commands
 # I suggest defining a main function of your own and running it like this.
diff --git a/scripts/mkdeb b/scripts/mkdeb
index 46ade07..cb2a1b6 100755
--- a/scripts/mkdeb
+++ b/scripts/mkdeb
@@ -35,7 +35,8 @@ mkdir -p deb/opt/raisin
 cp -r data deb/opt/raisin
 cp -r components deb/opt/raisin
 cp -r scripts deb/opt/raisin
-cp config raise.sh unraise.sh deb/opt/raisin
+# FIXME
+#cp config raise.sh unraise.sh deb/opt/raisin


 # Debian doesn't use /usr/lib64 for 64-bit libraries
diff --git a/scripts/mkrpm b/scripts/mkrpm
index c530466..90d9bdc 100755
--- a/scripts/mkrpm
+++ b/scripts/mkrpm
@@ -48,8 +48,9 @@ cp -r $BASEDIR/data \$RPM_BUILD_ROOT/opt/raisin
 cp -r $BASEDIR/components \$RPM_BUILD_ROOT/opt/raisin
 cp -r $BASEDIR/scripts \$RPM_BUILD_ROOT/opt/raisin
 cp $BASEDIR/config \$RPM_BUILD_ROOT/opt/raisin
-cp $BASEDIR/raise.sh \$RPM_BUILD_ROOT/opt/raisin
-cp $BASEDIR/unraise.sh \$RPM_BUILD_ROOT/opt/raisin
+# FIXME
+# cp $BASEDIR/raise.sh \$RPM_BUILD_ROOT/opt/raisin
+# cp $BASEDIR/unraise.sh \$RPM_BUILD_ROOT/opt/raisin

 %clean

diff --git a/unraise.sh b/unraise.sh
deleted file mode 100755
index 50ce310..0000000
--- a/unraise.sh
+++ /dev/null
@@ -1,17 +0,0 @@
-#!/usr/bin/env bash
-
-set -e
-
-source config
-source lib/common-functions.sh
-
-
-# start execution
-common_init
-
-for_each_component clean
-
-uninstall_package xen-system
-for_each_component unconfigure
-
-rm -rf $INST_DIR
-- 
1.9.1
[Xen-devel] [PATCH V9 4/6] xen/vm_event: Relocate memop checks
The memop handler function for paging/sharing responsible for calling XSM
doesn't really have anything to do with vm_event, thus in this patch we
relocate it into mem_paging_memop and mem_sharing_memop. This has already
been the approach in mem_access_memop, so in this patch we just make it
consistent.

Signed-off-by: Tamas K Lengyel <tamas.leng...@zentific.com>
Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
Reviewed-by: Tim Deegan <t...@xen.org>
---
v7: Minor fixes with returning error codes on rcu lock failure
v6: Don't pass superfluous cmd to the memops.
    Unlock rcu's in sharing/paging
    Style fixes
---
 xen/arch/x86/mm/mem_paging.c      |  41 ++---
 xen/arch/x86/mm/mem_sharing.c     | 125 +-
 xen/arch/x86/x86_64/compat/mm.c   |  26 +---
 xen/arch/x86/x86_64/mm.c          |  24 +---
 xen/common/vm_event.c             |  43 -
 xen/include/asm-x86/mem_paging.h  |   2 +-
 xen/include/asm-x86/mem_sharing.h |   3 +-
 xen/include/xen/vm_event.h        |   1 -
 8 files changed, 124 insertions(+), 141 deletions(-)

diff --git a/xen/arch/x86/mm/mem_paging.c b/xen/arch/x86/mm/mem_paging.c
index e63d8c1..17d2319 100644
--- a/xen/arch/x86/mm/mem_paging.c
+++ b/xen/arch/x86/mm/mem_paging.c
@@ -22,27 +22,45 @@


 #include <asm/p2m.h>
-#include <xen/vm_event.h>
+#include <xen/guest_access.h>
+#include <xsm/xsm.h>

-
-int mem_paging_memop(struct domain *d, xen_mem_paging_op_t *mpo)
+int mem_paging_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_paging_op_t) arg)
 {
-    int rc = -ENODEV;
-    if ( unlikely(!d->vm_event->paging.ring_page) )
+    int rc;
+    xen_mem_paging_op_t mpo;
+    struct domain *d;
+    bool_t copyback = 0;
+
+    if ( copy_from_guest(&mpo, arg, 1) )
+        return -EFAULT;
+
+    rc = rcu_lock_live_remote_domain_by_id(mpo.domain, &d);
+    if ( rc )
         return rc;

-    switch( mpo->op )
+    rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_paging_op);
+    if ( rc )
+        goto out;
+
+    rc = -ENODEV;
+    if ( unlikely(!d->vm_event->paging.ring_page) )
+        goto out;
+
+    switch( mpo.op )
     {
     case XENMEM_paging_op_nominate:
-        rc = p2m_mem_paging_nominate(d, mpo->gfn);
+        rc = p2m_mem_paging_nominate(d, mpo.gfn);
         break;

     case XENMEM_paging_op_evict:
-        rc = p2m_mem_paging_evict(d, mpo->gfn);
+        rc = p2m_mem_paging_evict(d, mpo.gfn);
         break;

     case XENMEM_paging_op_prep:
-        rc = p2m_mem_paging_prep(d, mpo->gfn, mpo->buffer);
+        rc = p2m_mem_paging_prep(d, mpo.gfn, mpo.buffer);
+        if ( !rc )
+            copyback = 1;
         break;

     default:
@@ -50,6 +68,11 @@ int mem_paging_memop(struct domain *d, xen_mem_paging_op_t *mpo)
         break;
     }

+    if ( copyback && __copy_to_guest(arg, &mpo, 1) )
+        rc = -EFAULT;
+
+out:
+    rcu_unlock_domain(d);
     return rc;
 }

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index 4959407..ff01378 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -28,6 +28,7 @@
 #include <xen/grant_table.h>
 #include <xen/sched.h>
 #include <xen/rcupdate.h>
+#include <xen/guest_access.h>
 #include <xen/vm_event.h>
 #include <asm/page.h>
 #include <asm/string.h>
@@ -1293,39 +1294,66 @@ int relinquish_shared_pages(struct domain *d)
     return rc;
 }

-int mem_sharing_memop(struct domain *d, xen_mem_sharing_op_t *mec)
+int mem_sharing_memop(XEN_GUEST_HANDLE_PARAM(xen_mem_sharing_op_t) arg)
 {
-    int rc = 0;
+    int rc;
+    xen_mem_sharing_op_t mso;
+    struct domain *d;
+
+    rc = -EFAULT;
+    if ( copy_from_guest(&mso, arg, 1) )
+        return rc;
+
+    if ( mso.op == XENMEM_sharing_op_audit )
+        return mem_sharing_audit();
+
+    rc = rcu_lock_live_remote_domain_by_id(mso.domain, &d);
+    if ( rc )
+        return rc;
+
+    rc = xsm_vm_event_op(XSM_DM_PRIV, d, XENMEM_sharing_op);
+    if ( rc )
+        goto out;

     /* Only HAP is supported */
+    rc = -ENODEV;
     if ( !hap_enabled(d) || !d->arch.hvm_domain.mem_sharing_enabled )
-         return -ENODEV;
+        goto out;

-    switch(mec->op)
+    rc = -ENODEV;
+    if ( unlikely(!d->vm_event->share.ring_page) )
+        goto out;
+
+    switch ( mso.op )
     {
         case XENMEM_sharing_op_nominate_gfn:
         {
-            unsigned long gfn = mec->u.nominate.u.gfn;
+            unsigned long gfn = mso.u.nominate.u.gfn;
             shr_handle_t handle;
+
+            rc = -EINVAL;
             if ( !mem_sharing_enabled(d) )
-                return -EINVAL;
+                goto out;
+
             rc = mem_sharing_nominate_page(d, gfn, 0, &handle);
-            mec->u.nominate.handle = handle;
+            mso.u.nominate.handle = handle;
         }
         break;

         case XENMEM_sharing_op_nominate_gref:
         {
-            grant_ref_t gref = mec->u.nominate.u.grant_ref;
+            grant_ref_t gref = mso.u.nominate.u.grant_ref;
             unsigned long gfn;
             shr_handle_t
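The control flow this patch moves into the memop handlers (copy the argument, take a reference on the target domain, run the permission check, then leave through a single unlock label) can be sketched in standalone C as follows. This is a simplified model rather than the hypervisor code: `lock_domain_by_id`, `unlock_domain`, the `permitted` and `has_ring` fields and the `dom_table` array are all hypothetical stand-ins for `rcu_lock_live_remote_domain_by_id()`, `rcu_unlock_domain()`, the XSM hook and the ring-page test.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum { OP_NOMINATE = 1, OP_EVICT = 2 };

struct paging_op {
    int domain;
    int op;
    unsigned long gfn;
};

/* Hypothetical domain record: a lock depth plus two policy switches. */
struct domain {
    int id;
    int lock_depth;
    int permitted;   /* stands in for the XSM check passing */
    int has_ring;    /* stands in for a ring page being set up */
};

static struct domain dom_table[] = {
    { 1, 0, 1, 1 },
    { 2, 0, 0, 1 },
    { 3, 0, 1, 0 },
};

static int lock_domain_by_id(int id, struct domain **d)
{
    for ( size_t i = 0; i < sizeof(dom_table) / sizeof(dom_table[0]); i++ )
    {
        if ( dom_table[i].id == id )
        {
            dom_table[i].lock_depth++;
            *d = &dom_table[i];
            return 0;
        }
    }
    return -ESRCH;
}

static void unlock_domain(struct domain *d)
{
    assert(d->lock_depth > 0);
    d->lock_depth--;
}

static int paging_memop(const struct paging_op *arg)
{
    struct paging_op op;
    struct domain *d;
    int rc;

    /* A NULL argument models copy_from_guest() failing. */
    if ( arg == NULL )
        return -EFAULT;
    op = *arg;

    rc = lock_domain_by_id(op.domain, &d);
    if ( rc )
        return rc;              /* nothing is locked yet, plain return */

    /* From here on every exit must go through "out" to drop the lock. */
    rc = d->permitted ? 0 : -EPERM;
    if ( rc )
        goto out;

    rc = -ENODEV;
    if ( !d->has_ring )
        goto out;

    switch ( op.op )
    {
    case OP_NOMINATE:
    case OP_EVICT:
        rc = 0;                 /* the real work would happen here */
        break;

    default:
        rc = -ENOSYS;
        break;
    }

 out:
    unlock_domain(d);
    return rc;
}
```

The design point is the split between the early `return` paths, which run before any lock is held, and the `goto out` paths after it, which is what the v6/v7 notes about unlocking and error codes refer to.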
[Xen-devel] [PATCH V9 3/6] xen/vm_event: Decouple vm_event and mem_access.
The vm_event subsystem has been artificially tied to the presence of
mem_access. While mem_access does depend on vm_event, vm_event is an entirely
independent subsystem that can be used for arbitrary function-offloading to
helper apps in domains. This patch removes the dependency that mem_access
needs to be supported in order to enable vm_event.

A new vm_event_resume function is introduced which pulls all responses off
from given ring and delegates handling to appropriate helper functions (if
necessary). By default, vm_event_resume just pulls the response from the ring
and unpauses the corresponding vCPU. This approach reduces code duplication
and presents a single point of entry for the entire vm_event subsystem's
response handling mechanism.

Signed-off-by: Tamas K Lengyel <tamas.leng...@zentific.com>
Acked-by: Daniel De Graaf <dgde...@tycho.nsa.gov>
Acked-by: Tim Deegan <t...@xen.org>
---
v4: Consolidate resume routines into vm_event_resume
    Style fixes
    Sort xen/common/Makefile to be alphabetical
v3: Move ring processing out from mem_access.c to monitor.c in common
---
 xen/arch/x86/mm/mem_sharing.c       | 32 ++---
 xen/arch/x86/mm/p2m.c               | 62
 xen/common/Makefile                 | 18 +-
 xen/common/mem_access.c             | 31 +---
 xen/common/vm_event.c               | 72 +++--
 xen/include/asm-x86/mem_sharing.h   |  1 -
 xen/include/asm-x86/p2m.h           |  2 +-
 xen/include/xen/mem_access.h        | 14 ++--
 xen/include/xen/vm_event.h          | 58 ++
 xen/include/xsm/dummy.h             |  2 --
 xen/include/xsm/xsm.h               |  4 ---
 xen/xsm/dummy.c                     |  2 --
 xen/xsm/flask/hooks.c               | 36 ---
 xen/xsm/flask/policy/access_vectors |  8 ++---
 14 files changed, 128 insertions(+), 214 deletions(-)

diff --git a/xen/arch/x86/mm/mem_sharing.c b/xen/arch/x86/mm/mem_sharing.c
index e6572af..4959407 100644
--- a/xen/arch/x86/mm/mem_sharing.c
+++ b/xen/arch/x86/mm/mem_sharing.c
@@ -591,35 +591,6 @@ unsigned int mem_sharing_get_nr_shared_mfns(void)
     return (unsigned int)atomic_read(&nr_shared_mfns);
 }

-int mem_sharing_sharing_resume(struct domain *d)
-{
-    vm_event_response_t rsp;
-
-    /* Get all requests off the ring */
-    while ( vm_event_get_response(d, &d->vm_event->share, &rsp) )
-    {
-        struct vcpu *v;
-
-        if ( rsp.version != VM_EVENT_INTERFACE_VERSION )
-        {
-            printk(XENLOG_G_WARNING "vm_event interface version mismatch\n");
-            continue;
-        }
-
-        /* Validate the vcpu_id in the response. */
-        if ( (rsp.vcpu_id >= d->max_vcpus) || !d->vcpu[rsp.vcpu_id] )
-            continue;
-
-        v = d->vcpu[rsp.vcpu_id];
-
-        /* Unpause domain/vcpu */
-        if ( rsp.flags & VM_EVENT_FLAG_VCPU_PAUSED )
-            vm_event_vcpu_unpause(v);
-    }
-
-    return 0;
-}
-
 /* Functions that change a page's type and ownership */
 static int page_make_sharable(struct domain *d,
                               struct page_info *page,
@@ -1470,7 +1441,8 @@ int mem_sharing_memop(struct domain *d, xen_mem_sharing_op_t *mec)
         {
             if ( !mem_sharing_enabled(d) )
                 return -EINVAL;
-            rc = mem_sharing_sharing_resume(d);
+
+            vm_event_resume(d, &d->vm_event->share);
         }
         break;

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 4032c62..6403172 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -1279,13 +1279,13 @@ int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
 }

 /**
- * p2m_mem_paging_resume - Resume guest gfn and vcpus
+ * p2m_mem_paging_resume - Resume guest gfn
  * @d: guest domain
- * @gfn: guest page in paging state
+ * @rsp: vm_event response received
+ *
+ * p2m_mem_paging_resume() will forward the p2mt of a gfn to ram_rw. It is
+ * called by the pager.
  *
- * p2m_mem_paging_resume() will forward the p2mt of a gfn to ram_rw and all
- * waiting vcpus will be unpaused again. It is called by the pager.
- *
  * The gfn was previously either evicted and populated, or nominated and
  * populated. If the page was evicted the p2mt will be p2m_ram_paging_in. If
  * the page was just nominated the p2mt will be p2m_ram_paging_in_start because
@@ -1293,51 +1293,33 @@ int p2m_mem_paging_prep(struct domain *d, unsigned long gfn, uint64_t buffer)
  *
  * If the gfn was dropped the vcpu needs to be unpaused.
  */
-void p2m_mem_paging_resume(struct domain *d)
+
+void p2m_mem_paging_resume(struct domain *d, vm_event_response_t *rsp)
 {
     struct p2m_domain *p2m = p2m_get_hostp2m(d);
-    vm_event_response_t rsp;
     p2m_type_t p2mt;
     p2m_access_t a;
     mfn_t mfn;

-    /* Pull all responses off the ring */
-    while( vm_event_get_response(d, &d->vm_event->paging, &rsp) )
+    /* Fix p2m entry if the page was not dropped */
+    if (
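The shape of a single resume routine as the commit message describes it (drain every response from a ring, skip malformed ones, unpause the vCPU named in each valid response) is roughly the following. This is a standalone sketch under simplifying assumptions, not the body of `vm_event_resume()`: the fixed-size ring, `ring_put`/`ring_get`, the `pause_count` field and the constant values are all invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define INTERFACE_VERSION 1u          /* hypothetical value */
#define FLAG_VCPU_PAUSED  (1u << 0)   /* hypothetical value */
#define MAX_VCPUS 4
#define RING_SIZE 8

struct response {
    uint32_t version;
    uint32_t vcpu_id;
    uint32_t flags;
};

struct vcpu {
    unsigned int pause_count;
};

/* Toy single-producer, single-consumer ring with free-running indices. */
struct ring {
    struct response slot[RING_SIZE];
    unsigned int head, tail;
};

struct domain {
    struct vcpu vcpu[MAX_VCPUS];
    unsigned int max_vcpus;
    struct ring ring;
};

static bool ring_put(struct ring *r, struct response rsp)
{
    if ( r->tail - r->head == RING_SIZE )
        return false;
    r->slot[r->tail++ % RING_SIZE] = rsp;
    return true;
}

static bool ring_get(struct ring *r, struct response *rsp)
{
    if ( r->head == r->tail )
        return false;
    *rsp = r->slot[r->head++ % RING_SIZE];
    return true;
}

/* Drain the ring; returns how many vCPUs were unpaused. */
static unsigned int event_resume(struct domain *d)
{
    struct response rsp;
    unsigned int unpaused = 0;

    while ( ring_get(&d->ring, &rsp) )
    {
        /* Drop responses written against a different interface version. */
        if ( rsp.version != INTERFACE_VERSION )
            continue;

        /* Validate the vcpu_id before using it as an index. */
        if ( rsp.vcpu_id >= d->max_vcpus )
            continue;

        /* Type-specific handlers (paging, access, ...) would run here. */

        if ( (rsp.flags & FLAG_VCPU_PAUSED) &&
             d->vcpu[rsp.vcpu_id].pause_count )
        {
            d->vcpu[rsp.vcpu_id].pause_count--;
            unpaused++;
        }
    }

    return unpaused;
}
```

Keeping the validation and unpause logic in one loop is what lets the per-subsystem helpers, such as the reworked `p2m_mem_paging_resume()`, shrink to handling a single response.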
[Xen-devel] [PATCH v20 06/13] x86/VPMU: Initialize PMU for PV(H) guests
Code for initializing/tearing down PMU for PV guests

Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Acked-by: Kevin Tian <kevin.t...@intel.com>
Acked-by: Daniel De Graaf <dgde...@tycho.nsa.gov>
---
Changes in v20:
* Moved page freeing/unmapping from under vpmu_lock in pvpmu_init()/pvpmu_finish():
* Using is_hardware_domain() instead of open-coding
* Added comments to explain how vpmu_count is used.
* Don't test d->vcpu as it is covered by preceding d->max_vcpus check

 tools/flask/policy/policy/modules/xen/xen.te |   4 +
 xen/arch/x86/domain.c                        |   2 +
 xen/arch/x86/hvm/hvm.c                       |   1 +
 xen/arch/x86/hvm/svm/svm.c                   |   4 +-
 xen/arch/x86/hvm/svm/vpmu.c                  |  44 ++---
 xen/arch/x86/hvm/vmx/vmx.c                   |   4 +-
 xen/arch/x86/hvm/vmx/vpmu_core2.c            |  79 +++-
 xen/arch/x86/hvm/vpmu.c                      | 131 ---
 xen/common/event_channel.c                   |   1 +
 xen/include/asm-x86/hvm/vpmu.h               |   2 +
 xen/include/public/pmu.h                     |   2 +
 xen/include/public/xen.h                     |   1 +
 xen/include/xsm/dummy.h                      |   3 +
 xen/xsm/flask/hooks.c                        |   4 +
 xen/xsm/flask/policy/access_vectors          |   2 +
 15 files changed, 232 insertions(+), 52 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index 963ed44..c47369a 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -120,6 +120,10 @@ domain_comms(dom0_t, dom0_t)
 # Allow all domains to use (unprivileged parts of) the tmem hypercall
 allow domain_type xen_t:xen tmem_op;

+# Allow all domains to use PMU (but not to change its settings --- that's what
+# pmu_ctrl is for)
+allow domain_type xen_t:xen2 pmu_use;
+
 ###
 #
 # Domain creation
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 9d5a527..dd10223 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -438,6 +438,8 @@ int vcpu_initialise(struct vcpu *v)
         vmce_init_vcpu(v);
     }

+    spin_lock_init(&v->arch.vpmu.vpmu_lock);
+
     if ( has_hvm_container_domain(d) )
     {
         rc = hvm_vcpu_initialise(v);
diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index 3ff87c6..7fcbb3e 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -4914,6 +4914,7 @@ static hvm_hypercall_t *const pvh_hypercall64_table[NR_hypercalls] = {
     HYPERCALL(hvm_op),
     HYPERCALL(sysctl),
     HYPERCALL(domctl),
+    HYPERCALL(xenpmu_op),
     [ __HYPERVISOR_arch_1 ] = (hvm_hypercall_t *)paging_domctl_continuation
 };

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index b6e77cd..e523d12 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -1166,7 +1166,9 @@ static int svm_vcpu_initialise(struct vcpu *v)
         return rc;
     }

-    vpmu_initialise(v);
+    /* PVH's VPMU is initialized via hypercall */
+    if ( is_hvm_vcpu(v) )
+        vpmu_initialise(v);

     svm_guest_osvw_init(v);

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index b60ca40..58a0dc4 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -360,17 +360,19 @@ static void amd_vpmu_destroy(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);

-    if ( has_hvm_container_vcpu(v) && is_msr_bitmap_on(vpmu) )
-        amd_vpmu_unset_msr_bitmap(v);
+    if ( has_hvm_container_vcpu(v) )
+    {
+        if ( is_msr_bitmap_on(vpmu) )
+            amd_vpmu_unset_msr_bitmap(v);

-    xfree(vpmu->context);
-    vpmu_reset(vpmu, VPMU_CONTEXT_ALLOCATED);
+        if ( is_hvm_vcpu(v) )
+            xfree(vpmu->context);

-    if ( vpmu_is_set(vpmu, VPMU_RUNNING) )
-    {
-        vpmu_reset(vpmu, VPMU_RUNNING);
         release_pmu_ownship(PMU_OWNER_HVM);
     }
+
+    vpmu->context = NULL;
+    vpmu_clear(vpmu);
 }

 /* VPMU part of the 'q' keyhandler */
@@ -435,15 +437,19 @@ int svm_vpmu_initialise(struct vcpu *v)
     if ( !counters )
         return -EINVAL;

-    ctxt = xzalloc_bytes(sizeof(*ctxt) +
-                         2 * sizeof(uint64_t) * num_counters);
-    if ( !ctxt )
+    if ( is_hvm_vcpu(v) )
     {
-        printk(XENLOG_G_WARNING "Insufficient memory for PMU, "
-               "PMU feature is unavailable on domain %d vcpu %d.\n",
-               v->vcpu_id, v->domain->domain_id);
-        return -ENOMEM;
+        ctxt = xzalloc_bytes(sizeof(*ctxt) +
+                             2 * sizeof(uint64_t) * num_counters);
+        if ( !ctxt )
+        {
+            printk(XENLOG_G_WARNING "%pv: Insufficient memory for PMU, "
+                   "PMU feature is unavailable\n", v);
+            return -ENOMEM;
+        }
     }
+    else
+        ctxt =
[Xen-devel] [PATCH v20 10/13] x86/VPMU: Handle PMU interrupts for PV(H) guests
Add support for handling PMU interrupts for PV(H) guests.

VPMU for the interrupted VCPU is unloaded until the guest issues XENPMU_flush
hypercall. This allows the guest to access PMU MSR values that are stored in
VPMU context which is shared between hypervisor and domain, thus avoiding
traps to hypervisor.

Since the interrupt handler may now force VPMU context save (i.e. set
VPMU_CONTEXT_SAVE flag) we need to make changes to amd_vpmu_save() which
until now expected this flag to be set only when the counters were stopped.

Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Acked-by: Daniel De Graaf <dgde...@tycho.nsa.gov>
Acked-by: Jan Beulich <jbeul...@suse.com>
---
* Updated patch title (include PVH guests)
* vpmu_lvtpc_update() initializes curr at definition time
* Drop curr in do_xenpmu_op()'s XENPMU_lvtpc_set case.
* Declared domid as domid_t type

 xen/arch/x86/hvm/svm/vpmu.c       |  11 +-
 xen/arch/x86/hvm/vpmu.c           | 211 +++---
 xen/include/public/arch-x86/pmu.h |   6 ++
 xen/include/public/pmu.h          |   2 +
 xen/include/xsm/dummy.h           |   4 +-
 xen/xsm/flask/hooks.c             |   2 +
 6 files changed, 215 insertions(+), 21 deletions(-)

diff --git a/xen/arch/x86/hvm/svm/vpmu.c b/xen/arch/x86/hvm/svm/vpmu.c
index 474d0db..0997901 100644
--- a/xen/arch/x86/hvm/svm/vpmu.c
+++ b/xen/arch/x86/hvm/svm/vpmu.c
@@ -228,17 +228,12 @@ static int amd_vpmu_save(struct vcpu *v)
     struct vpmu_struct *vpmu = vcpu_vpmu(v);
     unsigned int i;

-    /*
-     * Stop the counters. If we came here via vpmu_save_force (i.e.
-     * when VPMU_CONTEXT_SAVE is set) counters are already stopped.
-     */
+    for ( i = 0; i < num_counters; i++ )
+        wrmsrl(ctrls[i], 0);
+
     if ( !vpmu_is_set(vpmu, VPMU_CONTEXT_SAVE) )
     {
         vpmu_set(vpmu, VPMU_FROZEN);
-
-        for ( i = 0; i < num_counters; i++ )
-            wrmsrl(ctrls[i], 0);
-
         return 0;
     }

diff --git a/xen/arch/x86/hvm/vpmu.c b/xen/arch/x86/hvm/vpmu.c
index 5fbb799..37e612a 100644
--- a/xen/arch/x86/hvm/vpmu.c
+++ b/xen/arch/x86/hvm/vpmu.c
@@ -85,31 +85,56 @@ static void __init parse_vpmu_param(char *s)
 void vpmu_lvtpc_update(uint32_t val)
 {
     struct vpmu_struct *vpmu;
+    struct vcpu *curr = current;

-    if ( vpmu_mode == XENPMU_MODE_OFF )
+    if ( likely(vpmu_mode == XENPMU_MODE_OFF) )
         return;

-    vpmu = vcpu_vpmu(current);
+    vpmu = vcpu_vpmu(curr);

     vpmu->hw_lapic_lvtpc = PMU_APIC_VECTOR | (val & APIC_LVT_MASKED);
-    apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
+
+    /* Postpone APIC updates for PV(H) guests if PMU interrupt is pending */
+    if ( is_hvm_vcpu(curr) || !vpmu->xenpmu_data ||
+         !(vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+        apic_write(APIC_LVTPC, vpmu->hw_lapic_lvtpc);
 }

 int vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content, uint64_t supported)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu;

     if ( vpmu_mode == XENPMU_MODE_OFF )
         return 0;

+    vpmu = vcpu_vpmu(curr);
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_wrmsr )
-        return vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_wrmsr(msr, msr_content, supported);
+
+        /*
+         * We may have received a PMU interrupt during WRMSR handling
+         * and since do_wrmsr may load VPMU context we should save
+         * (and unload) it again.
+         */
+        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
+             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
+
     return 0;
 }

 int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
 {
-    struct vpmu_struct *vpmu = vcpu_vpmu(current);
+    struct vcpu *curr = current;
+    struct vpmu_struct *vpmu;

     if ( vpmu_mode == XENPMU_MODE_OFF )
     {
@@ -117,24 +142,163 @@ int vpmu_do_rdmsr(unsigned int msr, uint64_t *msr_content)
         return 0;
     }

+    vpmu = vcpu_vpmu(curr);
     if ( vpmu->arch_vpmu_ops && vpmu->arch_vpmu_ops->do_rdmsr )
-        return vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+    {
+        int ret = vpmu->arch_vpmu_ops->do_rdmsr(msr, msr_content);
+
+        if ( !is_hvm_vcpu(curr) && vpmu->xenpmu_data &&
+             (vpmu->xenpmu_data->pmu.pmu_flags & PMU_CACHED) )
+        {
+            vpmu_set(vpmu, VPMU_CONTEXT_SAVE);
+            vpmu->arch_vpmu_ops->arch_vpmu_save(curr);
+            vpmu_reset(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED);
+        }
+        return ret;
+    }
     else
         *msr_content = 0;

     return 0;
 }

+static inline struct vcpu *choose_hwdom_vcpu(void)
+{
+    unsigned idx;
+
+    if ( hardware_domain->max_vcpus == 0 )
+        return NULL;
[Xen-devel] [PATCH v20 09/13] x86/VPMU: Add support for PMU register handling on PV guests
Intercept accesses to PMU MSRs and process them in VPMU module. If vpmu ops
for VCPU are not initialized (which is the case, for example, for PV guests that
are not "VPMU-enlightened") access to MSRs will return failure.

Dump VPMU state for all domains (HVM and PV) when requested.

Signed-off-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
Acked-by: Jan Beulich <jbeul...@suse.com>
Acked-by: Kevin Tian <kevin.t...@intel.com>
Reviewed-by: Dietmar Hahn <dietmar.h...@ts.fujitsu.com>
Tested-by: Dietmar Hahn <dietmar.h...@ts.fujitsu.com>
---
 xen/arch/x86/domain.c             |  3 +--
 xen/arch/x86/hvm/vmx/vpmu_core2.c | 49 +++--
 xen/arch/x86/hvm/vpmu.c           |  3 +++
 xen/arch/x86/traps.c              | 51 +--
 4 files changed, 95 insertions(+), 11 deletions(-)

diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 03bcbd3..d9f48a3 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -2066,8 +2066,7 @@ void arch_dump_vcpu_info(struct vcpu *v)
 {
     paging_dump_vcpu_info(v);

-    if ( is_hvm_vcpu(v) )
-        vpmu_dump(v);
+    vpmu_dump(v);
 }

 void domain_cpuid(
diff --git a/xen/arch/x86/hvm/vmx/vpmu_core2.c b/xen/arch/x86/hvm/vmx/vpmu_core2.c
index a00d06c..fc89eb7 100644
--- a/xen/arch/x86/hvm/vmx/vpmu_core2.c
+++ b/xen/arch/x86/hvm/vmx/vpmu_core2.c
@@ -27,6 +27,7 @@
 #include <asm/regs.h>
 #include <asm/types.h>
 #include <asm/apic.h>
+#include <asm/traps.h>
 #include <asm/msr.h>
 #include <asm/msr-index.h>
 #include <asm/hvm/support.h>
@@ -299,12 +300,18 @@ static inline void __core2_vpmu_save(struct vcpu *v)
         rdmsrl(MSR_CORE_PERF_FIXED_CTR0 + i, fixed_counters[i]);
     for ( i = 0; i < arch_pmc_cnt; i++ )
         rdmsrl(MSR_IA32_PERFCTR0 + i, xen_pmu_cntr_pair[i].counter);
+
+    if ( !has_hvm_container_vcpu(v) )
+        rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, core2_vpmu_cxt->global_status);
 }

 static int core2_vpmu_save(struct vcpu *v)
 {
     struct vpmu_struct *vpmu = vcpu_vpmu(v);

+    if ( !has_hvm_container_vcpu(v) )
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, 0);
+
     if ( !vpmu_are_all_set(vpmu, VPMU_CONTEXT_SAVE | VPMU_CONTEXT_LOADED) )
         return 0;

@@ -342,6 +349,13 @@ static inline void __core2_vpmu_load(struct vcpu *v)
     wrmsrl(MSR_CORE_PERF_FIXED_CTR_CTRL, core2_vpmu_cxt->fixed_ctrl);
     wrmsrl(MSR_IA32_DS_AREA, core2_vpmu_cxt->ds_area);
     wrmsrl(MSR_IA32_PEBS_ENABLE, core2_vpmu_cxt->pebs_enable);
+
+    if ( !has_hvm_container_vcpu(v) )
+    {
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, core2_vpmu_cxt->global_ovf_ctrl);
+        core2_vpmu_cxt->global_ovf_ctrl = 0;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
+    }
 }

 static void core2_vpmu_load(struct vcpu *v)
@@ -442,7 +456,6 @@ static int core2_vpmu_msr_common_check(u32 msr_index, int *type, int *index)
 static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
                                uint64_t supported)
 {
-    u64 global_ctrl;
     int i, tmp;
     int type = -1, index = -1;
     struct vcpu *v = current;
@@ -486,7 +499,12 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
     switch ( msr )
     {
     case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        if ( msr_content & ~(0xC000000000000000 |
+                             (((1ULL << fixed_pmc_cnt) - 1) << 32) |
+                             ((1ULL << arch_pmc_cnt) - 1)) )
+            return 1;
         core2_vpmu_cxt->global_status &= ~msr_content;
+        wrmsrl(MSR_CORE_PERF_GLOBAL_OVF_CTRL, msr_content);
         return 0;
     case MSR_CORE_PERF_GLOBAL_STATUS:
         gdprintk(XENLOG_INFO, "Can not write readonly MSR: "
@@ -514,14 +532,18 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
         gdprintk(XENLOG_WARNING, "Guest setting of DTS is ignored.\n");
         return 0;
     case MSR_CORE_PERF_GLOBAL_CTRL:
-        global_ctrl = msr_content;
+        core2_vpmu_cxt->global_ctrl = msr_content;
         break;
     case MSR_CORE_PERF_FIXED_CTR_CTRL:
         if ( msr_content &
              ( ~((1ull << (fixed_pmc_cnt * FIXED_CTR_CTRL_BITS)) - 1)) )
             return 1;

-        vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+        if ( has_hvm_container_vcpu(v) )
+            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
+                               &core2_vpmu_cxt->global_ctrl);
+        else
+            rdmsrl(MSR_CORE_PERF_GLOBAL_CTRL, core2_vpmu_cxt->global_ctrl);
         *enabled_cntrs &= ~(((1ULL << fixed_pmc_cnt) - 1) << 32);
         if ( msr_content != 0 )
         {
@@ -546,7 +568,11 @@ static int core2_vpmu_do_wrmsr(unsigned int msr, uint64_t msr_content,
             if ( msr_content & (~((1ull << 32) - 1)) )
                 return 1;

-            vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL, &global_ctrl);
+            if ( has_hvm_container_vcpu(v) )
+                vmx_read_guest_msr(MSR_CORE_PERF_GLOBAL_CTRL,
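The new check on writes to MSR_CORE_PERF_GLOBAL_OVF_CTRL rejects any value with bits set outside a mask built from the counter counts. A standalone C sketch of that mask arithmetic follows. It is an illustration, not the Xen source: `ovf_ctrl_valid_mask` and `ovf_ctrl_write_ok` are hypothetical names, and treating bits 62 and 63 as the only writable bits above the fixed-counter field is an assumption based on reading the high constant in the hunk as 0xC000000000000000.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Build the set of bits a guest may write:
 *   bits 0 .. arch_pmc_cnt-1         one per general-purpose counter
 *   bits 32 .. 32+fixed_pmc_cnt-1    one per fixed-function counter
 *   bits 62 and 63                   assumed global overflow/condition bits
 */
static uint64_t ovf_ctrl_valid_mask(unsigned int arch_pmc_cnt,
                                    unsigned int fixed_pmc_cnt)
{
    /* Keep both shifts well defined and the two fields disjoint. */
    assert(arch_pmc_cnt < 32 && fixed_pmc_cnt < 30);

    return 0xC000000000000000ULL |
           (((1ULL << fixed_pmc_cnt) - 1) << 32) |
           ((1ULL << arch_pmc_cnt) - 1);
}

/* A write is acceptable only if it touches no bit outside the mask. */
static bool ovf_ctrl_write_ok(uint64_t val, unsigned int arch_pmc_cnt,
                              unsigned int fixed_pmc_cnt)
{
    return (val & ~ovf_ctrl_valid_mask(arch_pmc_cnt, fixed_pmc_cnt)) == 0;
}
```

For a CPU with 4 general-purpose and 3 fixed counters the mask works out to 0xC00000070000000F, so clearing counter 0 and fixed counter 0 in one write is accepted while a write that sets bit 4 or bit 35 is refused.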
[Xen-devel] [PATCH v4 10/12] xsm: add CAT related xsm policies
Add xsm policies for Cache Allocation Technology (CAT) related hypercalls
to restrict the functions' visibility to the control domain only.

Signed-off-by: Chao Peng <chao.p.p...@linux.intel.com>
Acked-by: Daniel De Graaf <dgde...@tycho.nsa.gov>
---
 tools/flask/policy/policy/modules/xen/xen.if | 2 +-
 tools/flask/policy/policy/modules/xen/xen.te | 4 +++-
 xen/xsm/flask/hooks.c                        | 6 ++
 xen/xsm/flask/policy/access_vectors          | 6 ++
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/tools/flask/policy/policy/modules/xen/xen.if b/tools/flask/policy/policy/modules/xen/xen.if
index 2d32e1c..8bb081a 100644
--- a/tools/flask/policy/policy/modules/xen/xen.if
+++ b/tools/flask/policy/policy/modules/xen/xen.if
@@ -51,7 +51,7 @@ define(`create_domain_common', `
             getaffinity setaffinity setvcpuextstate };
     allow $1 $2:domain2 { set_cpuid settsc setscheduler setclaim
             set_max_evtchn set_vnumainfo get_vnumainfo cacheflush
-            psr_cmt_op configure_domain };
+            psr_cmt_op configure_domain psr_cat_op };
     allow $1 $2:security check_context;
     allow $1 $2:shadow enable;
     allow $1 $2:mmu { map_read map_write adjust memorymap physmap pinpage mmuext_op updatemp };
diff --git a/tools/flask/policy/policy/modules/xen/xen.te b/tools/flask/policy/policy/modules/xen/xen.te
index c0128aa..d431aaf 100644
--- a/tools/flask/policy/policy/modules/xen/xen.te
+++ b/tools/flask/policy/policy/modules/xen/xen.te
@@ -67,6 +67,7 @@ allow dom0_t xen_t:xen {
 allow dom0_t xen_t:xen2 {
     resource_op
     psr_cmt_op
+    psr_cat_op
 };
 allow dom0_t xen_t:mmu memorymap;

@@ -80,7 +81,8 @@ allow dom0_t dom0_t:domain {
     getpodtarget setpodtarget set_misc_info set_virq_handler
 };
 allow dom0_t dom0_t:domain2 {
-    set_cpuid gettsc settsc setscheduler set_max_evtchn set_vnumainfo get_vnumainfo psr_cmt_op
+    set_cpuid gettsc settsc setscheduler set_max_evtchn set_vnumainfo
+    get_vnumainfo psr_cmt_op psr_cat_op
 };
 allow dom0_t dom0_t:resource { add remove };

diff --git a/xen/xsm/flask/hooks.c b/xen/xsm/flask/hooks.c
index 05dafed..8964321 100644
--- a/xen/xsm/flask/hooks.c
+++ b/xen/xsm/flask/hooks.c
@@ -729,6 +729,9 @@ static int flask_domctl(struct domain *d, int cmd)
     case XEN_DOMCTL_psr_cmt_op:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__PSR_CMT_OP);

+    case XEN_DOMCTL_psr_cat_op:
+        return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__PSR_CAT_OP);
+
     case XEN_DOMCTL_arm_configure_domain:
         return current_has_perm(d, SECCLASS_DOMAIN2, DOMAIN2__CONFIGURE_DOMAIN);

@@ -790,6 +793,9 @@ static int flask_sysctl(int cmd)
     case XEN_SYSCTL_psr_cmt_op:
         return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
                                     XEN2__PSR_CMT_OP, NULL);
+    case XEN_SYSCTL_psr_cat_op:
+        return avc_current_has_perm(SECINITSID_XEN, SECCLASS_XEN2,
+                                    XEN2__PSR_CAT_OP, NULL);

     default:
         printk("flask_sysctl: Unknown op %d\n", cmd);
diff --git a/xen/xsm/flask/policy/access_vectors b/xen/xsm/flask/policy/access_vectors
index 8f44b9d..8cc1ef3 100644
--- a/xen/xsm/flask/policy/access_vectors
+++ b/xen/xsm/flask/policy/access_vectors
@@ -84,6 +84,9 @@ class xen2
     resource_op
 # XEN_SYSCTL_psr_cmt_op
     psr_cmt_op
+# XEN_SYSCTL_psr_cat_op
+    psr_cat_op
+
 }

 # Classes domain and domain2 consist of operations that a domain performs on
@@ -221,6 +224,9 @@ class domain2
     psr_cmt_op
 # XEN_DOMCTL_configure_domain
     configure_domain
+# XEN_DOMCTL_psr_cat_op
+    psr_cat_op
+
 }

 # Similar to class domain, but primarily contains domctls related to HVM domains
-- 
1.9.1
[Xen-devel] [PATCH v4 12/12] docs: add xl-psr.markdown
Add document to introduce basic concepts and terms in PSR family techonologies and the xl/libxl interfaces. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- docs/man/xl.pod.1 | 7 +++ docs/misc/xl-psr.markdown | 111 ++ 2 files changed, 118 insertions(+) create mode 100644 docs/misc/xl-psr.markdown diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index dfab921..b71d6e6 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -1472,6 +1472,9 @@ occupancy monitoring share the same set of underlying monitoring service. Once a domain is attached to the monitoring service, monitoring data can be showed for any of these monitoring types. +See Lhttp://xenbits.xen.org/docs/unstable/misc/xl-psr.html for more +informations. + =over 4 =item Bpsr-cmt-attach [Idomain-id] @@ -1501,6 +1504,9 @@ applications. In Xen implementation, CAT is used to control cache allocation on VM basis. To enforce cache on a specific domain, just set capacity bitmasks (CBM) for the domain. +See Lhttp://xenbits.xen.org/docs/unstable/misc/xl-psr.html for more +informations. + =over 4 =item Bpsr-cat-cbm-set [IOPTIONS] [Idomain-id] [Icbm] @@ -1546,6 +1552,7 @@ And the following documents on the xen.org website: Lhttp://xenbits.xen.org/docs/unstable/misc/xl-network-configuration.html Lhttp://xenbits.xen.org/docs/unstable/misc/xl-disk-configuration.txt Lhttp://xenbits.xen.org/docs/unstable/misc/xsm-flask.txt +Lhttp://xenbits.xen.org/docs/unstable/misc/xl-psr.html For systems that don't automatically bring CPU online: diff --git a/docs/misc/xl-psr.markdown b/docs/misc/xl-psr.markdown new file mode 100644 index 000..44f6f8c --- /dev/null +++ b/docs/misc/xl-psr.markdown @@ -0,0 +1,111 @@ +# Intel Platform Shared Resource Monitoring/Control in xl/libxl + +This document introduces Intel Platform Shared Resource Monitoring/Control +technologies, their basic concepts and the xl/libxl interfaces. 
+ +## Cache Monitoring Technology (CMT) + +Cache Monitoring Technology (CMT) is a new feature available on Intel Haswell +and later server platforms that allows an OS or Hypervisor/VMM to determine +the usage of cache(currently only L3 cache supported) by applications running +on the platform. A Resource Monitoring ID (RMID) is the abstraction of the +application(s) that will be monitored for its cache usage. The CMT hardware +tracks cache utilization of memory accesses according to the RMID and reports +monitored data via a counter register. + +Detailed information please refer to Intel SDM chapter 17.14. + +In Xen's implementation, each domain in the system can be assigned a RMID +independently, while RMID=0 is reserved for monitoring domains that doesn't +enable CMT service. RMID is opaque for xl/libxl and is only used in +hypervisor. + +### xl interfaces + +A domain is assigned a RMID implicitly by attaching it to CMT service: + +xl psr-cmt-attach domid + +After that, cache usage for the domain can be showed by: + +xl psr-cmt-show cache_occupancy domid + +Once monitoring is not needed any more, the domain can be detached from the +CMT service by: + +xl psr-cmt-detach domid + +The attaching may fail because of no free RMID available. In such case +unused RMID(s) can be freed by detaching corresponding domains from CMT +services. Maximum COS number in the system can also be obtained by: + +xl psr_cmt-show + +## Memory Bandwidth Monitoring (MBM) + +Memory Bandwidth Monitoring(MBM) is a new hardware feature available on Intel +Broadwell and later server platforms which builds on the CMT infrastructure to +allow monitoring of system memory bandwidth. It introduces two new monitoring +event type to monitor system total/local memory bandwidth. The same RMID can +be used to monitor both cache usage and memory bandwidth at the same time. + +Detailed information please refer to Intel SDM chapter 17.14. 
+ +In Xen's implementation, MBM shares the same set of underlying monitoring +service with CMT and can be used to monitor memory bandwidth on domain basis. + +The xl/libxl interface is the same with that of CMT. The difference is the +monitor type is corresponding memory monitoring type(local_mem_bandwidth/ +total_mem_bandwidth) but not cache_occupancy. + +## Cache Allocation Technology (CAT) + +Cache Allocation Technology (CAT) is a new feature available on Intel +Broadwell and later server platforms that allows an OS or Hypervisor/VMM to +partition cache allocation(i.e. L3 cache) based on application priority or +Class of Service(COS). Each COS is configured using capacity bitmasks (CBM) +which represent cache capacity and indicate the degree of overlap and +isolation between classes. System cache resource is divided into numbers of +minimum portions which is then made up into subset for cache partition. Each +portion corresponds to a bit in CBM and the set bit represents the +corresponding cache portion is available. + +Detailed information please refer to Intel SDM
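The CBM description in the document above (one bit per cache portion, overlap and isolation between classes) can be sketched in a few lines of C. This is an illustration only, not code from the series: the helper names `cbm_fits`, `cbm_portions` and `cbm_overlap` are hypothetical, and `cbm_len` stands for whatever CBM length the platform reports.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: true if cbm only uses the low cbm_len bits. */
static int cbm_fits(uint64_t cbm, unsigned int cbm_len)
{
    uint64_t valid;

    assert(cbm_len >= 1 && cbm_len <= 64);
    valid = (cbm_len == 64) ? ~0ULL : ((1ULL << cbm_len) - 1);
    return (cbm & ~valid) == 0;
}

/* Number of cache portions a CBM makes available (one per set bit). */
static unsigned int cbm_portions(uint64_t cbm)
{
    unsigned int n = 0;

    for ( ; cbm; cbm &= cbm - 1 )   /* clear the lowest set bit each round */
        n++;
    return n;
}

/* Portions shared by two classes of service; 0 means they are isolated. */
static uint64_t cbm_overlap(uint64_t a, uint64_t b)
{
    return a & b;
}
```

Two classes whose masks have no common bit are fully isolated from each other; any common bit is a cache portion both may allocate into.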
[Xen-devel] [PATCH v4 02/12] x86: improve psr scheduling code
Switching RMID from previous vcpu to next vcpu only needs to write MSR_IA32_PSR_ASSOC once. Write it with the value of next vcpu is enough, no need to write '0' first. Idle domain has RMID set to 0 and because MSR is already updated lazily, so just switch it as it does. Also move the initialization of per-CPU variable which used for lazy update from context switch to CPU starting. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- Changes in v4: * Move psr_assoc_reg_read/psr_assoc_reg_write into psr_ctxt_switch_to. * Use 0 instead of smp_processor_id() for boot cpu. * add cpu parameter to psr_assoc_init. Changes in v2: * Move initialization for psr_assoc from context switch to CPU_STARTING. --- xen/arch/x86/domain.c | 7 ++--- xen/arch/x86/psr.c| 75 ++- xen/include/asm-x86/psr.h | 3 +- 3 files changed, 59 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 04c1898..695a2eb 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1444,8 +1444,6 @@ static void __context_switch(void) { memcpy(p-arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES); vcpu_save_fpu(p); -if ( psr_cmt_enabled() ) -psr_assoc_rmid(0); p-arch.ctxt_switch_from(p); } @@ -1470,11 +1468,10 @@ static void __context_switch(void) } vcpu_restore_fpu_eager(n); n-arch.ctxt_switch_to(n); - -if ( psr_cmt_enabled() n-domain-arch.psr_rmid 0 ) -psr_assoc_rmid(n-domain-arch.psr_rmid); } +psr_ctxt_switch_to(n-domain); + gdt = !is_pv_32on64_vcpu(n) ? 
per_cpu(gdt_table, cpu) : per_cpu(compat_gdt_table, cpu); if ( need_full_gdt(n) ) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 344de3c..6119c6e 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -22,7 +22,6 @@ struct psr_assoc { uint64_t val; -bool_t initialized; }; struct psr_cmt *__read_mostly psr_cmt; @@ -122,14 +121,6 @@ static void __init init_psr_cmt(unsigned int rmid_max) printk(XENLOG_INFO Cache Monitoring Technology enabled\n); } -static int __init init_psr(void) -{ -if ( (opt_psr PSR_CMT) opt_rmid_max ) -init_psr_cmt(opt_rmid_max); -return 0; -} -__initcall(init_psr); - /* Called with domain lock held, no psr specific lock needed */ int psr_alloc_rmid(struct domain *d) { @@ -175,26 +166,70 @@ void psr_free_rmid(struct domain *d) d-arch.psr_rmid = 0; } -void psr_assoc_rmid(unsigned int rmid) +static inline void psr_assoc_init(unsigned int cpu) +{ +struct psr_assoc *psra = per_cpu(psr_assoc, cpu); + +if ( psr_cmt_enabled() ) +rdmsrl(MSR_IA32_PSR_ASSOC, psra-val); +} + +static inline void psr_assoc_rmid(uint64_t *reg, unsigned int rmid) +{ +*reg = (*reg ~rmid_mask) | (rmid rmid_mask); +} + +void psr_ctxt_switch_to(struct domain *d) { -uint64_t val; -uint64_t new_val; struct psr_assoc *psra = this_cpu(psr_assoc); +uint64_t reg = psra-val; + +if ( psr_cmt_enabled() ) +psr_assoc_rmid(reg, d-arch.psr_rmid); -if ( !psra-initialized ) +if ( reg != psra-val ) { -rdmsrl(MSR_IA32_PSR_ASSOC, psra-val); -psra-initialized = 1; +wrmsrl(MSR_IA32_PSR_ASSOC, reg); +psra-val = reg; } -val = psra-val; +} -new_val = (val ~rmid_mask) | (rmid rmid_mask); -if ( val != new_val ) +static void psr_cpu_init(unsigned int cpu) +{ +psr_assoc_init(cpu); +} + +static int cpu_callback( +struct notifier_block *nfb, unsigned long action, void *hcpu) +{ +unsigned int cpu = (unsigned long)hcpu; + +switch ( action ) { -wrmsrl(MSR_IA32_PSR_ASSOC, new_val); -psra-val = new_val; +case CPU_STARTING: +psr_cpu_init(cpu); +break; } + +return NOTIFY_DONE; +} + +static struct 
notifier_block cpu_nfb = { +.notifier_call = cpu_callback +}; + +static int __init psr_presmp_init(void) +{ +if ( (opt_psr PSR_CMT) opt_rmid_max ) +init_psr_cmt(opt_rmid_max); + +psr_cpu_init(0); +if ( psr_cmt_enabled() ) +register_cpu_notifier(cpu_nfb); + +return 0; } +presmp_initcall(psr_presmp_init); /* * Local variables: diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h index c6076e9..585350c 100644 --- a/xen/include/asm-x86/psr.h +++ b/xen/include/asm-x86/psr.h @@ -46,7 +46,8 @@ static inline bool_t psr_cmt_enabled(void) int psr_alloc_rmid(struct domain *d); void psr_free_rmid(struct domain *d); -void psr_assoc_rmid(unsigned int rmid); + +void psr_ctxt_switch_to(struct domain *d); #endif /* __ASM_PSR_H__ */ -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
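The lazy update that the commit message relies on can be shown outside the hypervisor with a stand-in register. The sketch below is not the patch itself: `struct fake_msr`, `struct assoc_cache`, `assoc_init` and `ctxt_switch_to` are hypothetical names, and the write counter exists only to make skipped writes visible.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the hardware register; counts writes so they can be observed. */
struct fake_msr {
    uint64_t value;
    unsigned int writes;
};

/* Per-CPU cache of the last value written, in the spirit of struct psr_assoc. */
struct assoc_cache {
    uint64_t val;
};

/* Read the register once, when the CPU starts. */
static void assoc_init(struct assoc_cache *c, const struct fake_msr *msr)
{
    c->val = msr->value;
}

/* Switch to a new RMID; touch the register only if the value changes. */
static void ctxt_switch_to(struct assoc_cache *c, struct fake_msr *msr,
                           uint64_t rmid, uint64_t rmid_mask)
{
    uint64_t reg;

    assert(c && msr);
    reg = (c->val & ~rmid_mask) | (rmid & rmid_mask);
    if ( reg != c->val )
    {
        msr->value = reg;
        msr->writes++;
        c->val = reg;
    }
}
```

Switching twice in a row to the same RMID costs one register write, and switching to the idle domain (RMID 0) is just another value change, so no separate "write 0 first" step is needed.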
[Xen-devel] [PATCH v4 07/12] x86: expose CBM length and COS number information
General CAT information such as maximum COS and CBM length are exposed to user space by a SYSCTL hypercall, to help user space to construct the CBM. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- xen/arch/x86/psr.c | 31 +++ xen/arch/x86/sysctl.c | 18 ++ xen/include/asm-x86/psr.h | 3 +++ xen/include/public/sysctl.h | 16 4 files changed, 68 insertions(+) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 51faa70..e390fd9 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -221,6 +221,37 @@ void psr_ctxt_switch_to(struct domain *d) } } +static int get_cat_socket_info(unsigned int socket, + struct psr_cat_socket_info **info) +{ +if ( !cat_socket_info ) +return -ENODEV; + +if ( socket = nr_sockets ) +return -EBADSLT; + +if ( !cat_socket_info[socket].enabled ) +return -ENOENT; + +*info = cat_socket_info + socket; +return 0; +} + +int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len, +uint32_t *cos_max) +{ +struct psr_cat_socket_info *info; +int ret = get_cat_socket_info(socket, info); + +if ( ret ) +return ret; + +*cbm_len = info-cbm_len; +*cos_max = info-cos_max; + +return 0; +} + /* Called with domain lock held, no psr specific lock needed */ static void psr_free_cos(struct domain *d) { diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c index 611a291..8a9e120 100644 --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -171,6 +171,24 @@ long arch_do_sysctl( break; +case XEN_SYSCTL_psr_cat_op: +switch ( sysctl-u.psr_cat_op.cmd ) +{ +case XEN_SYSCTL_PSR_CAT_get_l3_info: +ret = psr_get_cat_l3_info(sysctl-u.psr_cat_op.target, + sysctl-u.psr_cat_op.u.l3_info.cbm_len, + sysctl-u.psr_cat_op.u.l3_info.cos_max); + +if ( !ret __copy_to_guest(u_sysctl, sysctl, 1) ) +ret = -EFAULT; + +break; +default: +ret = -EOPNOTSUPP; +break; +} +break; + default: ret = -ENOSYS; break; diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h index 45392bf..3a8a406 100644 --- a/xen/include/asm-x86/psr.h +++ b/xen/include/asm-x86/psr.h @@ 
-52,6 +52,9 @@ void psr_free_rmid(struct domain *d); void psr_ctxt_switch_to(struct domain *d); +int psr_get_cat_l3_info(unsigned int socket, uint32_t *cbm_len, +uint32_t *cos_max); + int psr_domain_init(struct domain *d); void psr_domain_free(struct domain *d); diff --git a/xen/include/public/sysctl.h b/xen/include/public/sysctl.h index 8552dc6..91d90b8 100644 --- a/xen/include/public/sysctl.h +++ b/xen/include/public/sysctl.h @@ -656,6 +656,20 @@ struct xen_sysctl_psr_cmt_op { typedef struct xen_sysctl_psr_cmt_op xen_sysctl_psr_cmt_op_t; DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cmt_op_t); +#define XEN_SYSCTL_PSR_CAT_get_l3_info 0 +struct xen_sysctl_psr_cat_op { +uint32_t cmd; /* IN: XEN_SYSCTL_PSR_CAT_* */ +uint32_t target;/* IN: socket to be operated on */ +union { +struct { +uint32_t cbm_len; /* OUT: CBM length */ +uint32_t cos_max; /* OUT: Maximum COS */ +} l3_info; +} u; +}; +typedef struct xen_sysctl_psr_cat_op xen_sysctl_psr_cat_op_t; +DEFINE_XEN_GUEST_HANDLE(xen_sysctl_psr_cat_op_t); + struct xen_sysctl { uint32_t cmd; #define XEN_SYSCTL_readconsole1 @@ -678,6 +692,7 @@ struct xen_sysctl { #define XEN_SYSCTL_scheduler_op 19 #define XEN_SYSCTL_coverage_op 20 #define XEN_SYSCTL_psr_cmt_op21 +#define XEN_SYSCTL_psr_cat_op22 uint32_t interface_version; /* XEN_SYSCTL_INTERFACE_VERSION */ union { struct xen_sysctl_readconsole readconsole; @@ -700,6 +715,7 @@ struct xen_sysctl { struct xen_sysctl_scheduler_op scheduler_op; struct xen_sysctl_coverage_op coverage_op; struct xen_sysctl_psr_cmt_oppsr_cmt_op; +struct xen_sysctl_psr_cat_oppsr_cat_op; uint8_t pad[128]; } u; }; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On 04/09/2015 02:23 PM, Peter Zijlstra wrote: On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote: On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote: +#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket)) +static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node) +{ + unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits); + struct pv_hash_bucket *hb, *end; + + if (!hash) + hash = 1; + + init_hash = hash; + hb =pv_lock_hash[hash_align(hash)]; + for (;;) { + for (end = hb + PV_HB_PER_LINE; hb end; hb++) { + if (!cmpxchg(hb-lock, NULL, lock)) { + WRITE_ONCE(hb-node, node); + /* +* We haven't set the _Q_SLOW_VAL yet. So +* the order of writing doesn't matter. +*/ + smp_wmb(); /* matches rmb from pv_hash_find */ + goto done; + } + } + + hash = lfsr(hash, pv_lock_hash_bits, 0); Since pv_lock_hash_bits is a variable, you end up running through that massive if() forest to find the corresponding tap every single time. It cannot compile-time optimize it. Hence: hash = lfsr(hash, pv_taps); (I don't get the bits argument to the lfsr). In any case, like I said before, I think we should try a linear probe sequence first, the lfsr was over engineering from my side. + hb =pv_lock_hash[hash_align(hash)]; So one thing this does -- and one of the reasons I figured I should ditch the LFSR instead of fixing it -- is that you end up scanning each bucket HB_PER_LINE times. I am aware of that when I was trying to add the hash table debug code, but I want to get the code out for review and so hasn't made any change yet. I have just done testing by adding some debug code to check the hashing efficiency. With the kernel build workload, with over 1M calls to pv_hash(), all of them get an empty entry on the first try. Maybe the minimum hash table size of 256 helps. The 'fix' would be to LFSR on cachelines instead of HBs but then you're stuck with the 0-th cacheline. This should not be a big problem. 
I just need to add a check at the end of the for loop that if hash is 0, change it to a certain non-0 value instead of calling lfsr().

As for ditching the lfsr idea, I am fine with that. So there will be 4 entries (1 cacheline) for each hash value. If all the entries are full, we proceed to the next cacheline. Right?

Cheers,
Longman
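The linear probe described in the reply (4 entries per cacheline, move to the next cacheline when all are taken) might look roughly like the sketch below. It is single-threaded and purely illustrative: the patch claims a bucket with cmpxchg(), whereas this uses a plain store, and `HB_PER_LINE`, `HASH_BITS`, `struct bucket` and `probe_insert` are assumptions of the example rather than names from the series.

```c
#include <assert.h>
#include <stddef.h>

#define HB_PER_LINE 4u                  /* assumed: 4 buckets per cacheline */
#define HASH_BITS   4u                  /* illustrative table of 16 buckets */
#define HASH_SIZE   (1u << HASH_BITS)

struct bucket {
	const void *lock;               /* NULL means the bucket is free */
	const void *node;
};

static struct bucket table[HASH_SIZE];

/* Round a hash down to the first bucket of its cacheline. */
static unsigned int hash_align(unsigned int hash)
{
	return hash & ~(HB_PER_LINE - 1);
}

/*
 * Claim a free bucket for lock. Start at the cacheline selected by hash
 * and step to the next cacheline, with wrap-around, when all of its
 * buckets are taken. Returns the bucket index, or -1 if the table is full.
 */
static int probe_insert(unsigned int hash, const void *lock, const void *node)
{
	unsigned int start, line, i;

	assert(lock != NULL);
	start = hash_align(hash & (HASH_SIZE - 1));
	for (line = 0; line < HASH_SIZE / HB_PER_LINE; line++) {
		unsigned int base = (start + line * HB_PER_LINE) & (HASH_SIZE - 1);

		for (i = 0; i < HB_PER_LINE; i++) {
			if (table[base + i].lock == NULL) {
				table[base + i].lock = lock;
				table[base + i].node = node;
				return (int)(base + i);
			}
		}
	}
	return -1;
}
```

Because the probe always starts at a cacheline boundary, every bucket is visited at most once per lookup, which avoids the repeated scanning of the same bucket that the LFSR variant was criticised for.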
[Xen-devel] [PATCH v4 3/3] Revert x86/hvm: wait for at least one ioreq server to be enabled
This reverts commit dd748d128d86996592afafea02e578cc7d4e6d42.

We don't need this workaround anymore since we have fixed the toolstack
interlock problem that affects stubdom.

Signed-off-by: Wei Liu wei.l...@citrix.com
Cc: Paul Durrant paul.durr...@citrix.com
Cc: Jan Beulich jbeul...@suse.com
Acked-by: Ian Campbell ian.campb...@citrix.com
---
 xen/arch/x86/hvm/hvm.c           | 21 -
 xen/include/asm-x86/hvm/domain.h |  1 -
 2 files changed, 22 deletions(-)

diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
index bfde380..8b62296 100644
--- a/xen/arch/x86/hvm/hvm.c
+++ b/xen/arch/x86/hvm/hvm.c
@@ -893,13 +893,6 @@ static void hvm_ioreq_server_enable(struct hvm_ioreq_server *s,

  done:
     spin_unlock(&s->lock);
-
-    /* This check is protected by the domain ioreq server lock. */
-    if ( d->arch.hvm_domain.ioreq_server.waiting )
-    {
-        d->arch.hvm_domain.ioreq_server.waiting = 0;
-        domain_unpause(d);
-    }
 }

 static void hvm_ioreq_server_disable(struct hvm_ioreq_server *s,
@@ -1451,20 +1444,6 @@ int hvm_domain_initialise(struct domain *d)

     spin_lock_init(&d->arch.hvm_domain.ioreq_server.lock);
     INIT_LIST_HEAD(&d->arch.hvm_domain.ioreq_server.list);
-
-    /*
-     * In the case where a stub domain is providing emulation for
-     * the guest, there is no interlock in the toolstack to prevent
-     * the guest from running before the stub domain is ready.
-     * Hence the domain must remain paused until at least one ioreq
-     * server is created and enabled.
-     */
-    if ( !is_pvh_domain(d) )
-    {
-        domain_pause(d);
-        d->arch.hvm_domain.ioreq_server.waiting = 1;
-    }
-
     spin_lock_init(&d->arch.hvm_domain.irq_lock);
     spin_lock_init(&d->arch.hvm_domain.uc_lock);

diff --git a/xen/include/asm-x86/hvm/domain.h b/xen/include/asm-x86/hvm/domain.h
index 0702bf5..2757c7f 100644
--- a/xen/include/asm-x86/hvm/domain.h
+++ b/xen/include/asm-x86/hvm/domain.h
@@ -83,7 +83,6 @@ struct hvm_domain {
     struct {
         spinlock_t       lock;
         ioservid_t       id;
-        bool_t           waiting;
         struct list_head list;
     } ioreq_server;
     struct hvm_ioreq_server *default_ioreq_server;
--
1.9.1
[Xen-devel] [PATCH v2] xen/pci: Try harder to get PXM information for Xen
If the device being added to Xen is not contained in the ACPI table,
walk the PCI device tree to find a parent that is contained in the ACPI
table before finding the PXM information from this device.

Previously, it would try to get a handle for the device, then the
device's bridge, then the physfn. This changes the order so that it
tries to get a handle for the device, then the physfn, then walks up the
PCI device tree.

Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com
---
 drivers/xen/pci.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
index 95ee430..7494dbe 100644
--- a/drivers/xen/pci.c
+++ b/drivers/xen/pci.c
@@ -19,6 +19,7 @@

 #include <linux/pci.h>
 #include <linux/acpi.h>
+#include <linux/pci-acpi.h>
 #include <xen/xen.h>
 #include <xen/interface/physdev.h>
 #include <xen/interface/xen.h>
@@ -67,12 +68,22 @@ static int xen_add_device(struct device *dev)

 #ifdef CONFIG_ACPI
 		handle = ACPI_HANDLE(&pci_dev->dev);
-		if (!handle && pci_dev->bus->bridge)
-			handle = ACPI_HANDLE(pci_dev->bus->bridge);
 #ifdef CONFIG_PCI_IOV
 		if (!handle && pci_dev->is_virtfn)
 			handle = ACPI_HANDLE(physfn->bus->bridge);
 #endif
+		if (!handle) {
+			/*
+			 * This device was not listed in the ACPI name space at
+			 * all. Try to get acpi handle of parent pci bus.
+			 */
+			struct pci_bus *pbus;
+			for (pbus = pci_dev->bus; pbus; pbus = pbus->parent) {
+				handle = acpi_pci_get_bridge_handle(pbus);
+				if (handle)
+					break;
+			}
+		}
 		if (handle) {
 			acpi_status status;

--
2.1.0
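The lookup order introduced by this patch (the device itself first, then each parent bus in turn) can be modelled without any kernel headers. The sketch below is an assumption-laden stand-in, not driver code: `struct bus`, `struct dev` and `find_handle` are hypothetical, an integer replaces the ACPI handle (0 meaning "not listed"), and the physfn step for virtual functions is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_bus. */
struct bus {
	struct bus *parent;   /* NULL at the root bus */
	int handle;           /* 0 if the bridge has no ACPI entry */
};

/* Hypothetical stand-in for struct pci_dev. */
struct dev {
	struct bus *bus;      /* bus the device sits on */
	int handle;           /* 0 if the device has no ACPI entry */
};

/*
 * Return the device's own handle if it has one, otherwise the handle of
 * the nearest ancestor bus that has one, otherwise 0.
 */
static int find_handle(const struct dev *d)
{
	const struct bus *b;

	assert(d != NULL);
	if (d->handle)
		return d->handle;
	for (b = d->bus; b; b = b->parent) {
		if (b->handle)
			return b->handle;
	}
	return 0;
}
```

The loop stops at the first ancestor with a handle, so a device behind several unlisted bridges still ends up with the closest available locality information rather than none.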
Re: [Xen-devel] [PATCH 4/6] x86/link: Introduce and use __bss_end
At 18:26 +0100 on 07 Apr (1428431178), Andrew Cooper wrote:
> No functional change.
>
> Signed-off-by: Andrew Cooper andrew.coop...@citrix.com

Reviewed-by: Tim Deegan t...@xen.org
Re: [Xen-devel] [PATCH V8 00/12] xen: Clean-up of mem_event subsystem
On Thu, Apr 9, 2015 at 1:03 PM, Tim Deegan t...@xen.org wrote:
> Hi,
>
> Sorry for the delay - I have been away.
>
> At 22:06 +0100 on 26 Mar (1427407612), Tamas K Lengyel wrote:
> > Tamas K Lengyel (12):
> >   xen/mem_event: Cleanup of mem_event structures
> >   xen/mem_event: Cleanup mem_event names in rings, functions and domctls
> >   xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup
> >   xen: Rename mem_event to vm_event
> >   tools/tests: Clean-up tools/tests/xen-access
> >   x86/hvm: factor out and rename vm_event related functions
>
> I have applied these six patches.
>
> >   xen: Introduce monitor_op domctl
>
> This one no longer applies cleanly - looks like a conflict with
> a7511905 (xen: Extend DOMCTL createdomain to support arch configuration)
>
> Can you rebase the second half of the series please?

Absolutely. Will be sending it shortly, thanks.

Tamas

> Cheers,
>
> Tim.
Re: [Xen-devel] [Patch V2 11/15] xen: check for initrd conflicting with e820 map
On 09/04/2015 07:55, Juergen Gross wrote:
> Check whether the initrd is placed at a location which is conflicting
> with the target E820 map. If this is the case relocate it to a new
> area unused up to now and compliant to the E820 map.

Reviewed-by: David Vrabel david.vra...@citrix.com

David
Re: [Xen-devel] [PATCH 05/10] VMX: add help functions to support PML
At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
> +void vmx_vcpu_disable_pml(struct vcpu *v)
> +{
> +    ASSERT(vmx_vcpu_pml_enabled(v));
> +

I think this function ought to call vmx_vcpu_flush_pml_buffer() before disabling PML. That way we don't need to worry about losing any information if a guest vcpu is reset or offlined during migration.

Cheers,

Tim.
Re: [Xen-devel] [PATCH 00/10] PML (Paging Modification Logging) support
At 10:24 +0100 on 07 Apr (1428402277), Tim Deegan wrote:
> Hi,
>
> At 16:30 +0800 on 07 Apr (1428424218), Kai Huang wrote:
> > Hi Jan, Tim, other maintainers,
> >
> > Do you have comments? Or should I send out the v2 addressing Andrew's
> > comments, as it's been more than a week since this patch series were
> > sent out?
>
> I'm sorry, I was away last week so I haven't had a chance to review
> these patches. I'll probably be able to look at them on Thursday.

Done. They seem to be in good shape for a first cut! I've commented on the patches where there was anything I think needs improvement.

Cheers,

Tim.
Re: [Xen-devel] [Patch V2 10/15] xen: check pre-allocated page tables for conflict with memory map
On 04/09/2015 02:47 PM, David Vrabel wrote: On 09/04/2015 07:55, Juergen Gross wrote: Check whether the page tables built by the domain builder are at memory addresses which are in conflict with the target memory map. If this is the case just panic instead of running into problems later. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/xen/mmu.c | 19 --- arch/x86/xen/setup.c | 6 ++ arch/x86/xen/xen-ops.h | 1 + 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 1ca5197..41aeb1c 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -116,6 +116,7 @@ static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss; DEFINE_PER_CPU(unsigned long, xen_cr3);/* cr3 stored as physaddr */ DEFINE_PER_CPU(unsigned long, xen_current_cr3);/* actual vcpu cr3 */ +static phys_addr_t xen_pt_base, xen_pt_size; These be __init, but the use of globals in this way is confusing. How else would you want to do it? /* * Just beyond the highest usermode address. STACK_TOP_MAX has a @@ -1998,7 +1999,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn) check_pt_base(pt_base, pt_end, addr[i]); /* Our (by three pages) smaller Xen pagetable that we are using */ - memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE); + xen_pt_base = PFN_PHYS(pt_base); + xen_pt_size = (pt_end - pt_base) * PAGE_SIZE; + memblock_reserve(xen_pt_base, xen_pt_size); Why not provide a xen_memblock_check_and_reserve() call that has the xen_is_e820_reserved() check and the memblock_reserve() call? This may also be useful for patch #9 as well. Uuh, not really. memblock_reserve() for those areas is called much earlier than the e820 map is constructed. Thinking more about it, I even have to modify patch 11 and 13: relocation must be done _after_ doing the memblock_reserve() of all pre-populated areas to avoid relocating to such an area. 
Juergen
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote: +++ b/kernel/locking/qspinlock_paravirt.h @@ -0,0 +1,321 @@ +#ifndef _GEN_PV_LOCK_SLOWPATH +#error do not include this file +#endif + +/* + * Implement paravirt qspinlocks; the general idea is to halt the vcpus instead + * of spinning them. + * + * This relies on the architecture to provide two paravirt hypercalls: + * + * pv_wait(u8 *ptr, u8 val) -- suspends the vcpu if *ptr == val + * pv_kick(cpu) -- wakes a suspended vcpu + * + * Using these we implement __pv_queue_spin_lock_slowpath() and + * __pv_queue_spin_unlock() to replace native_queue_spin_lock_slowpath() and + * native_queue_spin_unlock(). + */ + +#define _Q_SLOW_VAL (3U _Q_LOCKED_OFFSET) + +enum vcpu_state { + vcpu_running = 0, + vcpu_halted, +}; + +struct pv_node { + struct mcs_spinlock mcs; + struct mcs_spinlock __res[3]; + + int cpu; + u8 state; +}; + +/* + * Hash table using open addressing with an LFSR probe sequence. + * + * Since we should not be holding locks from NMI context (very rare indeed) the + * max load factor is 0.75, which is around the point where open addressing + * breaks down. + * + * Instead of probing just the immediate bucket we probe all buckets in the + * same cacheline. + * + * http://en.wikipedia.org/wiki/Hash_table#Open_addressing + * + * Dynamically allocate a hash table big enough to hold at least 4X the + * number of possible cpus in the system. Allocation is done on page + * granularity. So the minimum number of hash buckets should be at least + * 256 to fully utilize a 4k page. + */ +#define LFSR_MIN_BITS8 +#define LFSR_MAX_BITS (2 + NR_CPUS_BITS) +#if LFSR_MAX_BITS LFSR_MIN_BITS +#undef LFSR_MAX_BITS +#define LFSR_MAX_BITSLFSR_MIN_BITS +#endif + +struct pv_hash_bucket { + struct qspinlock *lock; + struct pv_node *node; +}; +#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket)) +#define HB_RESERVED ((struct qspinlock *)1) This is unused. 
+ +static struct pv_hash_bucket *pv_lock_hash; +static unsigned int pv_lock_hash_bits __read_mostly; static unsigned int pv_taps __read_mostly; + +#include linux/hash.h +#include linux/lfsr.h +#include linux/bootmem.h + +/* + * Allocate memory for the PV qspinlock hash buckets + * + * This function should be called from the paravirt spinlock initialization + * routine. + */ +void __init __pv_init_lock_hash(void) +{ + int pv_hash_size = 4 * num_possible_cpus(); + + if (pv_hash_size (1U LFSR_MIN_BITS)) + pv_hash_size = (1U LFSR_MIN_BITS); + /* + * Allocate space from bootmem which should be page-size aligned + * and hence cacheline aligned. + */ + pv_lock_hash = alloc_large_system_hash(PV qspinlock, +sizeof(struct pv_hash_bucket), +pv_hash_size, 0, HASH_EARLY, +pv_lock_hash_bits, NULL, +pv_hash_size, pv_hash_size); pv_taps = lfsr_taps(pv_lock_hash_bits); +} + +static inline u32 hash_align(u32 hash) +{ + return hash ~(PV_HB_PER_LINE - 1); +} + +static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node) +{ + unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits); + struct pv_hash_bucket *hb, *end; + + if (!hash) + hash = 1; + + init_hash = hash; + hb = pv_lock_hash[hash_align(hash)]; + for (;;) { + for (end = hb + PV_HB_PER_LINE; hb end; hb++) { + if (!cmpxchg(hb-lock, NULL, lock)) { + WRITE_ONCE(hb-node, node); + /* + * We haven't set the _Q_SLOW_VAL yet. So + * the order of writing doesn't matter. + */ + smp_wmb(); /* matches rmb from pv_hash_find */ This doesn't make sense. Both sites do -lock first and -node second. No amount of ordering can 'fix' that. I think we can safely remove this wmb and the rmb below, because the required ordering is already provided by setting/observing l-locked == SLOW. + goto done; + } + } + + hash = lfsr(hash, pv_lock_hash_bits, 0); Since pv_lock_hash_bits is a variable, you end up running through that massive if() forest to find the corresponding tap every single time. It cannot compile-time optimize it. 
Hence: hash = lfsr(hash, pv_taps); (I don't get the bits argument to the lfsr). In any case, like I said before, I think we should try a linear probe sequence first, the lfsr was over engineering from my side.
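The point about resolving the taps once at init time, instead of on every probe, can be illustrated with a generic Galois LFSR. This is not the `lfsr()` or `lfsr_taps()` helper from the series, whose implementation is not shown in this thread: `lfsr_step` and `lfsr_period` are hypothetical names, and the 8-bit tap mask 0xB8 used in the usage note is an assumption of the example.

```c
#include <assert.h>
#include <stdint.h>

/* One Galois LFSR step: shift right, XOR in the taps if a 1 fell off. */
static uint32_t lfsr_step(uint32_t state, uint32_t taps)
{
	uint32_t lsb;

	assert(state != 0);     /* 0 is a fixed point and must be avoided */
	lsb = state & 1u;
	state >>= 1;
	if (lsb)
		state ^= taps;
	return state;
}

/* Count the steps needed to return to the starting state. */
static unsigned int lfsr_period(uint32_t start, uint32_t taps)
{
	uint32_t s = start;
	unsigned int n = 0;

	do {
		s = lfsr_step(s, taps);
		n++;
	} while (s != start);
	return n;
}
```

With the tap mask looked up once for the table width (for example 0xB8 for 8 bits, which should cycle through all 255 non-zero states), the per-probe cost is a shift, a test and an XOR, with no per-call selection of taps.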
Re: [Xen-devel] [Patch V2 10/15] xen: check pre-allocated page tables for conflict with memory map
On 09/04/2015 07:55, Juergen Gross wrote:
> Check whether the page tables built by the domain builder are at
> memory addresses which are in conflict with the target memory map.
> If this is the case just panic instead of running into problems
> later.
>
> Signed-off-by: Juergen Gross jgr...@suse.com
> ---
>  arch/x86/xen/mmu.c     | 19 ---
>  arch/x86/xen/setup.c   |  6 ++
>  arch/x86/xen/xen-ops.h |  1 +
>  3 files changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 1ca5197..41aeb1c 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -116,6 +116,7 @@ static pud_t level3_user_vsyscall[PTRS_PER_PUD] __page_aligned_bss;
>  DEFINE_PER_CPU(unsigned long, xen_cr3);	 /* cr3 stored as physaddr */
>  DEFINE_PER_CPU(unsigned long, xen_current_cr3);	 /* actual vcpu cr3 */
>
> +static phys_addr_t xen_pt_base, xen_pt_size;

These should be __init, but the use of globals in this way is confusing.

>  /*
>   * Just beyond the highest usermode address.  STACK_TOP_MAX has a
> @@ -1998,7 +1999,9 @@ void __init xen_setup_kernel_pagetable(pgd_t *pgd, unsigned long max_pfn)
>  	check_pt_base(&pt_base, &pt_end, addr[i]);
>
>  	/* Our (by three pages) smaller Xen pagetable that we are using */
> -	memblock_reserve(PFN_PHYS(pt_base), (pt_end - pt_base) * PAGE_SIZE);
> +	xen_pt_base = PFN_PHYS(pt_base);
> +	xen_pt_size = (pt_end - pt_base) * PAGE_SIZE;
> +	memblock_reserve(xen_pt_base, xen_pt_size);

Why not provide a xen_memblock_check_and_reserve() call that has the xen_is_e820_reserved() check and the memblock_reserve() call? This may also be useful for patch #9 as well.

> +void __init xen_pt_check_e820(void)
> +{
> +	if (xen_chk_e820_reserved(xen_pt_base, xen_pt_size)) {
> +		xen_raw_console_write("Xen hypervisor allocated page table memory conflicts with E820 map\n");
> +		BUG();
> +	}
> +}
> +
>  static unsigned char dummy_mapping[PAGE_SIZE] __page_aligned_bss;

David
[Xen-devel] [PATCH v5 p2 03/19] xen/arm: Release IRQ routed to a domain when it's destroying
From: Julien Grall julien.gr...@linaro.org Xen has to release IRQ routed to a domain in order to reuse later. Currently only SPIs can be routed to the guest so we only need to browse SPIs for a specific domain. Furthermore, a guest can crash and leave the IRQ in an incorrect state (i.e has not been EOIed). Xen will have to reset the IRQ in order to be able to reuse the IRQ later. Introduce 2 new functions for release an IRQ routed to a domain: - release_guest_irq: upper level to retrieve the IRQ, call the GIC code and release the action - gic_remove_guest_irq: Check if we can remove the IRQ, and reset it if necessary Signed-off-by: Julien Grall julien.gr...@linaro.org Acked-by: Ian Campbell ian.campb...@citrix.com --- Changes in v5: - Typoes in the commit message - Add Ian's Ack Changes in v4: - Reorder the code flow - Typoes and coding style - Use the newly helper spi_to_pending Changes in v3: - Take the vgic rank lock to protect p-desc - Correctly check if the IRQ is disabled - Extend the check on the virq in release_guest_irq - Use vgic_get_target_vcpu to get the target vCPU - Remove spurious change Changes in v2: - Drop the desc-handler = no_irq_type in release_irq as it's buggy if the IRQ is routed to Xen - Add release_guest_irq and gic_remove_guest_irq --- xen/arch/arm/gic.c| 45 + xen/arch/arm/irq.c| 46 ++ xen/arch/arm/vgic.c | 16 xen/include/asm-arm/gic.h | 4 xen/include/asm-arm/irq.h | 2 ++ 5 files changed, 113 insertions(+) diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c index 5f34997..f023e4f 100644 --- a/xen/arch/arm/gic.c +++ b/xen/arch/arm/gic.c @@ -163,6 +163,51 @@ out: return res; } +/* This function only works with SPIs for now */ +int gic_remove_irq_from_guest(struct domain *d, unsigned int virq, + struct irq_desc *desc) +{ +struct vcpu *v_target = vgic_get_target_vcpu(d-vcpu[0], virq); +struct vgic_irq_rank *rank = vgic_rank_irq(v_target, virq); +struct pending_irq *p = irq_to_pending(v_target, virq); +unsigned long flags; + 
+ASSERT(spin_is_locked(desc-lock)); +ASSERT(test_bit(_IRQ_GUEST, desc-status)); +ASSERT(p-desc == desc); + +vgic_lock_rank(v_target, rank, flags); + +if ( d-is_dying ) +{ +desc-handler-shutdown(desc); + +/* EOI the IRQ it it has not been done by the guest */ +if ( test_bit(_IRQ_INPROGRESS, desc-status) ) +gic_hw_ops-deactivate_irq(desc); +clear_bit(_IRQ_INPROGRESS, desc-status); +} +else +{ +/* + * TODO: Handle eviction from LRs For now, deny + * remove if the IRQ is inflight or not disabled. + */ +if ( test_bit(_IRQ_INPROGRESS, desc-status) || + !test_bit(_IRQ_DISABLED, desc-status) ) +return -EBUSY; +} + +clear_bit(_IRQ_GUEST, desc-status); +desc-handler = no_irq_type; + +p-desc = NULL; + +vgic_unlock_rank(v_target, rank, flags); + +return 0; +} + int gic_irq_xlate(const u32 *intspec, unsigned int intsize, unsigned int *out_hwirq, unsigned int *out_type) diff --git a/xen/arch/arm/irq.c b/xen/arch/arm/irq.c index b2ddf6b..376c9f2 100644 --- a/xen/arch/arm/irq.c +++ b/xen/arch/arm/irq.c @@ -513,6 +513,52 @@ free_info: return retval; } +int release_guest_irq(struct domain *d, unsigned int virq) +{ +struct irq_desc *desc; +struct irq_guest *info; +unsigned long flags; +struct pending_irq *p; +int ret; + +/* Only SPIs are supported */ +if ( virq NR_LOCAL_IRQS || virq = vgic_num_irqs(d) ) +return -EINVAL; + +p = spi_to_pending(d, virq); +if ( !p-desc ) +return -EINVAL; + +desc = p-desc; + +spin_lock_irqsave(desc-lock, flags); + +ret = -EINVAL; +if ( !test_bit(_IRQ_GUEST, desc-status) ) +goto unlock; + +info = irq_get_guest_info(desc); +ret = -EINVAL; +if ( d != info-d ) +goto unlock; + +ret = gic_remove_irq_from_guest(d, virq, desc); +if ( ret ) +goto unlock; + +spin_unlock_irqrestore(desc-lock, flags); + +release_irq(desc-irq, info); +xfree(info); + +return 0; + +unlock: +spin_unlock_irqrestore(desc-lock, flags); + +return ret; +} + /* * pirq event channels. We don't use these on ARM, instead we use the * features of the GIC to inject virtualised normal interrupts. 
diff --git a/xen/arch/arm/vgic.c b/xen/arch/arm/vgic.c index 55c8927..05c5010 100644 --- a/xen/arch/arm/vgic.c +++ b/xen/arch/arm/vgic.c @@ -135,6 +135,22 @@ void register_vgic_ops(struct domain *d, const struct vgic_ops *ops) void domain_vgic_free(struct domain *d) { +int i; +int ret; + +for ( i = 0; i
[Xen-devel] [PATCH v2 0/2] osstest: update FreeBSD guests and cleanup
The first patch in this series updates FreeBSD guests in OSSTest to use raw images instead of qcow2 (which are no longer provided by upstream). The second patch is a cleanup for ts-freebsd-install which should not change functionality.
[Xen-devel] [PATCH v2 2/2] osstest: clean ts-freebsd-install script
Remove some unused variables from ts-freebsd-install script. Also make the third parameter of target_put_guest_image optional and fix both callers of this function. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Osstest/TestSupport.pm | 4 ++-- ts-freebsd-install | 21 ++--- 2 files changed, 4 insertions(+), 21 deletions(-) diff --git a/Osstest/TestSupport.pm b/Osstest/TestSupport.pm index 8754e22..96942bd 100644 --- a/Osstest/TestSupport.pm +++ b/Osstest/TestSupport.pm @@ -1554,7 +1554,7 @@ END return $cfgpath; } -sub target_put_guest_image ($$$) { +sub target_put_guest_image ($$;$) { my ($ho, $gho, $default) = @_; my $specimage = $r{$gho-{Guest}_image}; $specimage = $default if !defined $specimage; @@ -1574,7 +1574,7 @@ sub more_prepareguest_hvm (;@) { my @disks = phy:$gho-{Lvdev},hda,w; if (!$xopts{NoCdromImage}) { - target_put_guest_image($ho, $gho, undef); + target_put_guest_image($ho, $gho); my $postimage_hook= $xopts{PostImageHook}; $postimage_hook-() if $postimage_hook; diff --git a/ts-freebsd-install b/ts-freebsd-install index 61d2f83..4449fd1 100755 --- a/ts-freebsd-install +++ b/ts-freebsd-install @@ -36,18 +36,6 @@ our $gho; our $mnt= '/root/freebsd_root'; -our $freebsd_version= 10.0-BETA3; - -# Folder where the FreeBSD VM images are stored inside of the host -# -# The naming convention of the stored images is: -# FreeBSD-$freebsd_version-$arch.qcow2.xz -# ie: FreeBSD-10.0-BETA3-amd64.qcow2.xz -# -# Used only if the runvar guest_image is not set. -# -our $freebsd_vm_repo= '/var/images'; - sub prep () { my $authkeys= authorized_keys(); @@ -59,13 +47,8 @@ sub prep () { more_prepareguest_hvm($ho, $gho, $ram_mb, $disk_mb, NoCdromImage = 1); -target_put_guest_image($ho, $gho, - $freebsd_vm_repo/FreeBSD-$freebsd_version-. - (defined($r{$gho-{Guest}_arch}) - # Use amd64 as default arch - ? $r{$gho-{Guest}_arch} : 'amd64'). 
- .qcow2.xz); - +target_put_guest_image($ho, $gho); + my $rootpartition_dev = target_guest_lv_name($ho, $gho-{Name}) . --disk3; target_cmd_root($ho, umount $gho-{Lvdev} ||:); -- 1.9.5 (Apple Git-50.3) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [RFC PATCH 0/7] Intel Cache Monitoring: Current Status and Future Opportunities
Hi Dario,

2015-04-03 22:14 GMT-04:00 Dario Faggioli dario.faggi...@citrix.com:
> Hi Everyone,
>
> This RFC series is the outcome of an investigation I've been doing about
> whether we can take better advantage of features like Intel CMT (and of PSR
> features in general). By "take better advantage of them" I mean, for
> example, use the data obtained from monitoring within the scheduler and/or
> within libxl's automatic NUMA placement algorithm, or similar.
>
> I'm putting here in the cover letter a markdown document I wrote to better
> describe my findings and ideas (sorry if it's a bit long! :-D). You can also
> fetch it at the following links:
>
>  * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.pdf
>  * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.markdown
>
> See the document itself and the changelog of the various patches for
> details.
>
> The series includes one of Chao's patches on top, as I found it convenient
> to build on top of it. The series itself is available here:
>
>   git://xenbits.xen.org/people/dariof/xen.git wip/sched/icachemon
>   http://xenbits.xen.org/gitweb/?p=people/dariof/xen.git;a=shortlog;h=refs/heads/wip/sched/icachemon
>
> Thanks a lot to everyone that will read and reply! :-)
>
> Regards,
> Dario
> ---
>
> # Intel Cache Monitoring: Present and Future
>
> ## About this document
>
> This document represents the result of an investigation on whether it would
> be possible to more extensively exploit the Platform Shared Resource
> Monitoring (PSR) capabilities of recent Intel x86 server chips. Examples of
> such features are the Cache Monitoring Technology (CMT) and the Memory
> Bandwidth Monitoring (MBM).
>
> More specifically, it focuses on Cache Monitoring Technology, support for
> which has recently been introduced in Xen by Intel, trying to figure out
> whether it can be used for high level load balancing, such as libxl
> automatic domain placement, and/or within Xen vCPU scheduler(s).
>
> Note that, although the document only speaks about CMT, most of the
> considerations apply (or can easily be extended) to MBM as well.
>
> The fact that, currently, support is provided for monitoring L3 cache only,
> somewhat limits the benefits of more extensively exploiting such
> technology, which is exactly the purpose here. Nevertheless, some
> improvements are possible already, and if at some point support for
> monitoring other cache layers will be available, this can be the basic
> building block for taking advantage of that too.

If you really want to know the cache usage at the different cache levels, you could use the (4) general-purpose PMCs on each logical core to monitor that. This could bypass the limitation of the current HW, but the concern is that it may affect the other mechanisms in Xen, like perf, which also use the PMCs.

Another thought on CMT: it seems that Intel introduced CMT along with CAT. So I assume they want CMT to be used together with CAT, so that it gives some hint on how to allocate LLC to different guests? For example, if a crazy guest is thrashing the LLC, they can apply CAT to constrain/calm down this crazy guest.

Best,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
Re: [Xen-devel] [PATCH v4 04/12] x86: maintain COS to CBM mapping for each socket
On 09/04/2015 10:18, Chao Peng wrote:
> For each socket, a COS to CBM mapping structure is maintained for each
> COS. The mapping is indexed by COS and the value is the corresponding
> CBM. Different VMs may use the same CBM, a reference count is used to
> indicate if the CBM is available.
>
> Signed-off-by: Chao Peng chao.p.p...@linux.intel.com
> ---
>  xen/arch/x86/psr.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
> index 16c37dd..4aff5f6 100644
> --- a/xen/arch/x86/psr.c
> +++ b/xen/arch/x86/psr.c
> @@ -21,11 +21,17 @@
>  #define PSR_CMT        (1<<0)
>  #define PSR_CAT        (1<<1)
>
> +struct psr_cat_cbm {
> +    unsigned int ref;
> +    uint64_t cbm;
> +};
> +
>  struct psr_cat_socket_info {
>      bool_t initialized;
>      bool_t enabled;
>      unsigned int cbm_len;
>      unsigned int cos_max;
> +    struct psr_cat_cbm *cos_cbm_map;

cos_to_cbm would be more in keeping with Xen style, and IMO easier to read in code.

>  };
>
>  struct psr_assoc {
> @@ -240,6 +246,14 @@ static void cat_cpu_init(unsigned int cpu)
>          info->cbm_len = (eax & 0x1f) + 1;
>          info->cos_max = (edx & 0xffff);

Apologies for missing this in the previous patch, but cos_max should have a command line parameter like rmid_max if a lower limit wants to be enforced.

Otherwise, Reviewed-by: Andrew Cooper andrew.coop...@citrix.com

> +        info->cos_cbm_map = xzalloc_array(struct psr_cat_cbm,
> +                                          info->cos_max + 1UL);
> +        if ( !info->cos_cbm_map )
> +            return;
> +
> +        /* cos=0 is reserved as default cbm(all ones). */
> +        info->cos_cbm_map[0].cbm = (1ull << info->cbm_len) - 1;
> +
>          info->enabled = 1;
>          printk(XENLOG_INFO "CAT: enabled on socket %u, cos_max:%u, cbm_len:%u\n",
>                 socket, info->cos_max, info->cbm_len);
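To make the reference-counted COS to CBM mapping from the patch above a little more concrete, here is a minimal user-space sketch in C. It is not the Xen implementation: the `toy_` names, the lookup policy in `toy_get_cos()` and the use of `calloc()` in place of `xzalloc_array()` are all assumptions made for illustration. Only the struct layout and the "COS 0 holds the all-ones default mask" rule come from the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Mirrors struct psr_cat_cbm from the patch: one entry per COS. */
struct toy_cat_cbm {
    unsigned int ref;   /* number of users of this COS */
    uint64_t cbm;       /* capacity bitmask programmed for this COS */
};

struct toy_cat_socket {
    unsigned int cbm_len;
    unsigned int cos_max;
    struct toy_cat_cbm *cos_to_cbm;   /* indexed by COS, cos_max + 1 entries */
};

/* Allocate the map and set COS 0 to the default all-ones mask. */
static int toy_socket_init(struct toy_cat_socket *s,
                           unsigned int cbm_len, unsigned int cos_max)
{
    assert(cbm_len >= 1 && cbm_len < 64);   /* keep the shift well defined */
    s->cbm_len = cbm_len;
    s->cos_max = cos_max;
    s->cos_to_cbm = calloc((size_t)cos_max + 1, sizeof(*s->cos_to_cbm));
    if (!s->cos_to_cbm)
        return -1;
    s->cos_to_cbm[0].cbm = (1ull << cbm_len) - 1;
    return 0;
}

/*
 * Hypothetical lookup policy: share a COS that already carries this CBM,
 * otherwise claim the first COS whose reference count is zero.
 * Returns the COS, or -1 if every COS is in use with a different CBM.
 */
static int toy_get_cos(struct toy_cat_socket *s, uint64_t cbm)
{
    unsigned int cos, free_cos = 0;   /* 0 means "none": COS 0 is never free */

    if (cbm == s->cos_to_cbm[0].cbm) {
        s->cos_to_cbm[0].ref++;
        return 0;
    }
    for (cos = 1; cos <= s->cos_max; cos++) {
        if (s->cos_to_cbm[cos].ref && s->cos_to_cbm[cos].cbm == cbm) {
            s->cos_to_cbm[cos].ref++;
            return (int)cos;
        }
        if (!s->cos_to_cbm[cos].ref && !free_cos)
            free_cos = cos;
    }
    if (!free_cos)
        return -1;
    s->cos_to_cbm[free_cos].cbm = cbm;
    s->cos_to_cbm[free_cos].ref = 1;
    return (int)free_cos;
}

/* Drop one reference; the COS becomes reusable once ref reaches zero. */
static void toy_put_cos(struct toy_cat_socket *s, unsigned int cos)
{
    if (cos <= s->cos_max && s->cos_to_cbm[cos].ref)
        s->cos_to_cbm[cos].ref--;
}
```

In this sketch two guests asking for the same mask end up on the same COS, and a COS only becomes reusable for a different mask after its last user has dropped it.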
Re: [Xen-devel] [PATCH] VTd/dmar: Tweak how the DMAR table is clobbered
On 08/04/15 20:44, Andrew Cooper wrote:
> Instead of clobbering DMAR -> XMAR and back, clobber to RMAD instead. This
> means that changing the signature does not alter the checksum, which allows
> the clobbering/unclobbering to be performed atomically and idempotently,
> which is an advantage on the kexec path which can reenter
> acpi_dmar_reinstate().

Could RMAD be specified as a real table in the future? Does the clobbered name have to start with X to avoid future conflicts?

David
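The checksum argument in the quoted commit message can be checked with a few lines of C. An ACPI table checksum is the 8-bit sum of all bytes in the table, so any rearrangement of the same four signature bytes (DMAR to RMAD) leaves the sum untouched, whereas replacing a byte (DMAR to XMAR) does not. The sketch below is a stand-alone toy, not Xen code: the 16-byte table contents and the helper names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ACPI-style checksum: the 8-bit sum of every byte in the table. */
static uint8_t byte_sum(const uint8_t *buf, size_t len)
{
    uint8_t sum = 0;
    size_t i;

    assert(buf != NULL);
    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}

/*
 * Overwrite the 4-byte signature at the start of a table. Writing a fixed
 * value is idempotent: doing it twice gives the same result as doing it once.
 */
static void set_sig(uint8_t *table, const char *sig)
{
    memcpy(table, sig, 4);
}

/*
 * Build a toy table whose signature is "DMAR", replace the signature with
 * new_sig, and report whether the byte sum stayed the same.
 */
static int sum_preserved(const char *new_sig)
{
    uint8_t table[16] = { 'D', 'M', 'A', 'R', 0x10, 0, 0, 0, 1, 0x5a };
    uint8_t before = byte_sum(table, sizeof(table));

    set_sig(table, new_sig);
    set_sig(table, new_sig);   /* repeating the write changes nothing */
    return byte_sum(table, sizeof(table)) == before;
}
```

Because the sum is unchanged, the checksum field never has to be rewritten alongside the signature, which is what lets the clobber be a single 4-byte store.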
Re: [Xen-devel] [Patch V2 01/15] xen: sync with xen headers
On 09/04/15 07:55, Juergen Gross wrote:
> Use the newest headers from the xen tree to get some new structure layouts.

Reviewed-by: David Vrabel david.vra...@citrix.com

David
Re: [Xen-devel] [PATCH RFC v2 3/3] xen: rework paging_log_dirty_op to work with hvm guests
At 12:09 +0200 on 07 Apr (1428408556), Roger Pau Monné wrote:
> Hello,
>
> On 03/04/15 at 16:12, Tim Deegan wrote:
> > Hi,
> >
> > At 20:46 +0100 on 02 Apr (1428007593), Andrew Cooper wrote:
> > > On 02/04/15 11:26, Roger Pau Monne wrote:
> > > > When the caller of paging_log_dirty_op is a hvm guest Xen would
> > > > choke when trying to copy the dirty bitmap to the guest because
> > > > the paging lock is already held.
> > >
> > > Are you sure? Presumably you get an mm lock ordering violation,
> > > because paging_log_dirty_op() should take the target domain's paging
> > > lock, rather than your own (which is prohibited by the current check
> > > at the top of paging_domctl()).
> > >
> > > Unfortunately, dropping the paging_lock() here is unsafe, as it will
> > > result in corruption of the logdirty bitmap from non-domain sources
> > > such as HVMOP_modified_memory.
> > >
> > > I will need to find some time with a large pot of coffee and a
> > > whiteboard, but I suspect it might actually be safe to alter the
> > > current mm_lock() enforcement to maintain independent levels for a
> > > source and destination domain.
> >
> > We discussed this in an earlier thread and agreed it would be better
> > to try to do this work in batches rather than add more complexity to
> > the mm locking rules.
> >
> > (I'm AFK this week so I haven't had a chance to review the actual
> > patch yet.)
>
> I don't know about the locking rules or how much complexity permitting
> this kind of access would add to them, but IMHO this patch makes the
> code considerably more complex and possibly error prone, so finding a
> simpler approach seems like a good option to me.

I'm happier with this (relatively contained) complexity than with adding yet more logic to the mm_locks code. I don't think there are any new races introduced here that weren't already present in the -ERESTART case.

Cheers,

Tim.
Re: [Xen-devel] [PATCH 08/10] VMX: disable PML in vmx_vcpu_destroy
At 10:35 +0800 on 27 Mar (1427452552), Kai Huang wrote:
> It's possible domain still remains in log-dirty mode when it is about to be
> destroyed, in which case we should manually disable PML for it.
>
> Signed-off-by: Kai Huang kai.hu...@linux.intel.com
> ---
>  xen/arch/x86/hvm/vmx/vmx.c | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> index fce3aa2..75ac44b 100644
> --- a/xen/arch/x86/hvm/vmx/vmx.c
> +++ b/xen/arch/x86/hvm/vmx/vmx.c
> @@ -153,6 +153,15 @@ static int vmx_vcpu_initialise(struct vcpu *v)
>
>  static void vmx_vcpu_destroy(struct vcpu *v)
>  {
> +    /*
> +     * There are cases that domain still remains in log-dirty mode when it is
> +     * about to be destroyed (ex, user types 'xl destroy dom'), in which case
> +     * we should disable PML manually here. Note that vmx_vcpu_destroy is called
> +     * prior to vmx_domain_destroy so we need to disable PML for each vcpu
> +     * separately here.
> +     */
> +    if ( vmx_vcpu_pml_enabled(v) )
> +        vmx_vcpu_disable_pml(v);

Looking at this and other callers of these enable/disable functions, I think it would be better to make those functions idempotent (i.e. *_{en,dis}able_pml() should just return success if PML is already enabled/disabled). Then you don't need to check in every caller, and there's no risk of a crash if one caller is missing the check.

Cheers,

Tim.
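The review suggestion above (make the enable/disable helpers idempotent so callers need no guard) could look roughly like the following stand-alone C sketch. Everything here is hypothetical: `struct toy_vcpu`, the `hw_writes` counter that stands in for the real VMCS updates, and the `toy_` function names are invented for illustration and are not the Xen API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-vCPU state; hw_writes counts "hardware" updates. */
struct toy_vcpu {
    bool pml_enabled;
    unsigned int hw_writes;
};

/* Idempotent enable: succeeds without side effects if already enabled. */
static int toy_enable_pml(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (v->pml_enabled)
        return 0;
    v->hw_writes++;          /* placeholder for programming the hardware */
    v->pml_enabled = true;
    return 0;
}

/* Idempotent disable: succeeds without side effects if already disabled. */
static int toy_disable_pml(struct toy_vcpu *v)
{
    assert(v != NULL);
    if (!v->pml_enabled)
        return 0;
    v->hw_writes++;          /* placeholder for tearing down the hardware state */
    v->pml_enabled = false;
    return 0;
}

/* The destroy path can now call disable unconditionally. */
static void toy_vcpu_destroy(struct toy_vcpu *v)
{
    toy_disable_pml(v);
}
```

With the state check inside the helper, a caller that forgets the guard can no longer trigger a double disable.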
[Xen-devel] [Patch V2 08/15] xen: find unused contiguous memory area
For being able to relocate pre-allocated data areas like initrd or p2m list it is mandatory to find a contiguous memory area which is not yet in use and doesn't conflict with the memory map we want to be in effect. In case such an area is found reserve it at once as this will be required to be done in any case. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/xen/setup.c | 34 ++ arch/x86/xen/xen-ops.h | 1 + 2 files changed, 35 insertions(+) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 4666adf..606ac2b 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -597,6 +597,40 @@ bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size) } /* + * Find a free area in physical memory not yet reserved and compliant with + * E820 map. + * Used to relocate pre-allocated areas like initrd or p2m list which are in + * conflict with the to be used E820 map. + * In case no area is found, return 0. Otherwise return the physical address + * of the area which is already reserved for convenience. + */ +phys_addr_t __init xen_find_free_area(phys_addr_t size) +{ + unsigned mapcnt; + phys_addr_t addr, start; + struct e820entry *entry = xen_e820_map; + + for (mapcnt = 0; mapcnt xen_e820_map_entries; mapcnt++, entry++) { + if (entry-type != E820_RAM || entry-size size) + continue; + start = entry-addr; + for (addr = start; addr start + size; addr += PAGE_SIZE) { + if (!memblock_is_reserved(addr)) + continue; + start = addr + PAGE_SIZE; + if (start + size entry-addr + entry-size) + break; + } + if (addr = start + size) { + memblock_reserve(start, size); + return start; + } + } + + return 0; +} + +/* * Reserve Xen mfn_list. 
* See comment above struct start_info in xen/interface/xen.h * We tried to make the the memblock_reserve more selective so diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index 56650bb..c7fa0a3 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -43,6 +43,7 @@ bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size); unsigned long __ref xen_chk_extra_mem(unsigned long pfn); void __init xen_inv_extra_mem(void); void __init xen_remap_memory(void); +phys_addr_t __init xen_find_free_area(phys_addr_t size); char * __init xen_memory_setup(void); char * xen_auto_xlated_memory_setup(void); void __init xen_arch_setup(void); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
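The search loop in xen_find_free_area() above can be modelled in user space to see how it behaves. The following C sketch is only an approximation under stated assumptions: a fixed-size array of reserved ranges stands in for memblock, the RAM list contains only usable RAM entries (so there is no E820 type check), and all `toy_` names and `TOY_` constants are invented. The scan itself follows the patch: walk a candidate window page by page, and whenever a reserved page is hit, restart the window just after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE    4096u
#define TOY_MAX_RESERVED 8

struct toy_range {
    uint64_t addr;
    uint64_t size;
};

struct toy_map {
    const struct toy_range *ram;                   /* usable RAM regions */
    size_t nr_ram;
    struct toy_range reserved[TOY_MAX_RESERVED];   /* stand-in for memblock */
    size_t nr_reserved;
};

/* Return 1 if addr falls inside any reserved range. */
static int toy_is_reserved(const struct toy_map *m, uint64_t addr)
{
    size_t i;

    for (i = 0; i < m->nr_reserved; i++)
        if (addr >= m->reserved[i].addr &&
            addr - m->reserved[i].addr < m->reserved[i].size)
            return 1;
    return 0;
}

/*
 * Find `size` bytes of contiguous, unreserved RAM, reserve them and return
 * the start address. Returns 0 if no suitable area exists.
 */
static uint64_t toy_find_free_area(struct toy_map *m, uint64_t size)
{
    size_t i;

    assert(m != NULL && size != 0);
    for (i = 0; i < m->nr_ram; i++) {
        const struct toy_range *e = &m->ram[i];
        uint64_t start, addr;

        if (e->size < size)
            continue;
        start = e->addr;
        for (addr = start; addr < start + size; addr += TOY_PAGE_SIZE) {
            if (!toy_is_reserved(m, addr))
                continue;
            /* Hit a reserved page: restart the window right after it. */
            start = addr + TOY_PAGE_SIZE;
            if (start + size > e->addr + e->size)
                break;          /* window no longer fits in this region */
        }
        if (addr >= start + size && m->nr_reserved < TOY_MAX_RESERVED) {
            m->reserved[m->nr_reserved].addr = start;
            m->reserved[m->nr_reserved].size = size;
            m->nr_reserved++;
            return start;
        }
    }
    return 0;
}
```

As in the patch, a return value of 0 doubles as the "nothing found" indication, so the scheme assumes that physical address 0 is never a valid result.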
[Xen-devel] [Patch V2 06/15] xen: split counting of extra memory pages from remapping
Memory pages in the initial memory setup done by the Xen hypervisor conflicting with the target E820 map are remapped. In order to do this those pages are counted and remapped in xen_set_identity_and_remap(). Split the counting from the remapping operation to be able to setup the needed memory sizes in time but doing the remap operation at a later time. This enables us to simplify the interface to xen_set_identity_and_remap() as the number of remapped and released pages is no longer needed here. Finally move the remapping further down to prepare relocating conflicting memory contents before the memory might be clobbered by xen_set_identity_and_remap(). This requires to not destroy the Xen E820 map when the one for the system is being constructed. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/xen/setup.c | 98 +++- 1 file changed, 58 insertions(+), 40 deletions(-) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index ab6c36e..87251b4 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -223,7 +223,7 @@ static int __init xen_free_mfn(unsigned long mfn) * as a fallback if the remapping fails. 
*/ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn, - unsigned long end_pfn, unsigned long nr_pages, unsigned long *released) + unsigned long end_pfn, unsigned long nr_pages) { unsigned long pfn, end; int ret; @@ -243,7 +243,7 @@ static void __init xen_set_identity_and_release_chunk(unsigned long start_pfn, WARN(ret != 1, Failed to release pfn %lx err=%d\n, pfn, ret); if (ret == 1) { - (*released)++; + xen_released_pages++; if (!__set_phys_to_machine(pfn, INVALID_P2M_ENTRY)) break; } else @@ -359,8 +359,7 @@ static void __init xen_do_set_identity_and_remap_chunk( */ static unsigned long __init xen_set_identity_and_remap_chunk( unsigned long start_pfn, unsigned long end_pfn, unsigned long nr_pages, - unsigned long remap_pfn, unsigned long *released, - unsigned long *remapped) + unsigned long remap_pfn) { unsigned long pfn; unsigned long i = 0; @@ -385,7 +384,7 @@ static unsigned long __init xen_set_identity_and_remap_chunk( if (!remap_range_size) { pr_warning(Unable to find available pfn range, not remapping identity pages\n); xen_set_identity_and_release_chunk(cur_pfn, - cur_pfn + left, nr_pages, released); + cur_pfn + left, nr_pages); break; } /* Adjust size to fit in current e820 RAM region */ @@ -397,7 +396,6 @@ static unsigned long __init xen_set_identity_and_remap_chunk( /* Update variables to reflect new mappings. 
*/ i += size; remap_pfn += size; - *remapped += size; } /* @@ -412,14 +410,11 @@ static unsigned long __init xen_set_identity_and_remap_chunk( return remap_pfn; } -static void __init xen_set_identity_and_remap(unsigned long nr_pages, - unsigned long *released, unsigned long *remapped) +static void __init xen_set_identity_and_remap(unsigned long nr_pages) { phys_addr_t start = 0; unsigned long last_pfn = nr_pages; const struct e820entry *entry = xen_e820_map; - unsigned long num_released = 0; - unsigned long num_remapped = 0; int i; /* @@ -445,16 +440,12 @@ static void __init xen_set_identity_and_remap(unsigned long nr_pages, if (start_pfn end_pfn) last_pfn = xen_set_identity_and_remap_chunk( start_pfn, end_pfn, nr_pages, - last_pfn, num_released, - num_remapped); + last_pfn); start = end; } } - *released = num_released; - *remapped = num_remapped; - - pr_info(Released %ld page(s)\n, num_released); + pr_info(Released %ld page(s)\n, xen_released_pages); } /* @@ -560,6 +551,28 @@ static void __init xen_ignore_unusable(void) } } +static unsigned long __init xen_count_remap_pages(unsigned long max_pfn) +{ + unsigned long extra = 0; + const struct e820entry *entry = xen_e820_map; + int i; + + for (i = 0; i xen_e820_map_entries; i++, entry++) { + unsigned long start_pfn = PFN_DOWN(entry-addr); + unsigned long end_pfn = PFN_UP(entry-addr + entry-size); + + if (start_pfn = max_pfn) +
[Xen-devel] [Patch V2 13/15] xen: move p2m list if conflicting with e820 map
Check whether the hypervisor supplied p2m list is placed at a location which is conflicting with the target E820 map. If this is the case relocate it to a new area unused up to now and compliant to the E820 map. As the p2m list might by huge (up to several GB) and is required to be mapped virtually, set up a temporary mapping for the copied list. For pvh domains just delete the p2m related information from start info instead of reserving the p2m memory, as we don't need it at all. For 32 bit kernels adjust the memblock_reserve() parameters in order to cover the page tables only. This requires to memblock_reserve() the start_info page on it's own. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/xen/mmu.c | 232 ++--- arch/x86/xen/setup.c | 51 +-- arch/x86/xen/xen-ops.h | 3 + 3 files changed, 247 insertions(+), 39 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 41aeb1c..3689fb8 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1094,6 +1094,16 @@ static void xen_exit_mmap(struct mm_struct *mm) static void xen_post_allocator_init(void); +static void __init pin_pagetable_pfn(unsigned cmd, unsigned long pfn) +{ + struct mmuext_op op; + + op.cmd = cmd; + op.arg1.mfn = pfn_to_mfn(pfn); + if (HYPERVISOR_mmuext_op(op, 1, NULL, DOMID_SELF)) + BUG(); +} + #ifdef CONFIG_X86_64 static void __init xen_cleanhighmap(unsigned long vaddr, unsigned long vaddr_end) @@ -1129,10 +1139,12 @@ static void __init xen_free_ro_pages(unsigned long paddr, unsigned long size) memblock_free(paddr, size); } -static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl) +static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl, bool unpin) { unsigned long pa = __pa(pgtbl) PHYSICAL_PAGE_MASK; + if (unpin) + pin_pagetable_pfn(MMUEXT_UNPIN_TABLE, PFN_DOWN(pa)); ClearPagePinned(virt_to_page(__va(pa))); xen_free_ro_pages(pa, PAGE_SIZE); } @@ -1151,7 +1163,9 @@ static void __init xen_cleanmfnmap(unsigned long vaddr) pmd_t *pmd; pte_t *pte; unsigned int i; + bool 
unpin; + unpin = (vaddr == 2 * PGDIR_SIZE); set_pgd(pgd, __pgd(0)); do { pud = pud_page + pud_index(va); @@ -1168,22 +1182,24 @@ static void __init xen_cleanmfnmap(unsigned long vaddr) xen_free_ro_pages(pa, PMD_SIZE); } else if (!pmd_none(*pmd)) { pte = pte_offset_kernel(pmd, va); + set_pmd(pmd, __pmd(0)); for (i = 0; i PTRS_PER_PTE; ++i) { if (pte_none(pte[i])) break; pa = pte_pfn(pte[i]) PAGE_SHIFT; xen_free_ro_pages(pa, PAGE_SIZE); } - xen_cleanmfnmap_free_pgtbl(pte); + xen_cleanmfnmap_free_pgtbl(pte, unpin); } va += PMD_SIZE; if (pmd_index(va)) continue; - xen_cleanmfnmap_free_pgtbl(pmd); + set_pud(pud, __pud(0)); + xen_cleanmfnmap_free_pgtbl(pmd, unpin); } } while (pud_index(va) || pmd_index(va)); - xen_cleanmfnmap_free_pgtbl(pud_page); + xen_cleanmfnmap_free_pgtbl(pud_page, unpin); } static void __init xen_pagetable_p2m_free(void) @@ -1219,6 +1235,12 @@ static void __init xen_pagetable_p2m_free(void) } else { xen_cleanmfnmap(addr); } +} + +static void __init xen_pagetable_cleanhighmap(void) +{ + unsigned long size; + unsigned long addr; /* At this stage, cleanup_highmap has already cleaned __ka space * from _brk_limit way up to the max_pfn_mapped (which is the end of @@ -1251,6 +1273,8 @@ static void __init xen_pagetable_p2m_setup(void) #ifdef CONFIG_X86_64 xen_pagetable_p2m_free(); + + xen_pagetable_cleanhighmap(); #endif /* And revector! Bye bye old array */ xen_start_info-mfn_list = (unsigned long)xen_p2m_addr; @@ -1586,15 +1610,6 @@ static void __init xen_set_pte_init(pte_t *ptep, pte_t pte) native_set_pte(ptep, pte); } -static void __init pin_pagetable_pfn(unsigned cmd, unsigned long pfn) -{ - struct mmuext_op op; - op.cmd = cmd; - op.arg1.mfn = pfn_to_mfn(pfn); - if (HYPERVISOR_mmuext_op(op, 1, NULL, DOMID_SELF)) - BUG(); -} - /* Early in boot, while setting up the initial pagetable, assume everything is pinned. */ static void __init xen_alloc_pte_init(struct mm_struct *mm, unsigned long pfn) @@
[Xen-devel] [Patch V2 01/15] xen: sync with xen headers
Use the newest headers from the xen tree to get some new structure layouts. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/include/asm/xen/interface.h | 96 include/xen/interface/xen.h | 10 ++-- 2 files changed, 93 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/xen/interface.h b/arch/x86/include/asm/xen/interface.h index 3400dba..3b88eea 100644 --- a/arch/x86/include/asm/xen/interface.h +++ b/arch/x86/include/asm/xen/interface.h @@ -3,12 +3,38 @@ * * Guest OS interface to x86 Xen. * - * Copyright (c) 2004, K A Fraser + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to + * deal in the Software without restriction, including without limitation the + * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or + * sell copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + * + * Copyright (c) 2004-2006, K A Fraser */ #ifndef _ASM_X86_XEN_INTERFACE_H #define _ASM_X86_XEN_INTERFACE_H +/* + * XEN_GUEST_HANDLE represents a guest pointer, when passed as a field + * in a struct in memory. + * XEN_GUEST_HANDLE_PARAM represent a guest pointer, when passed as an + * hypercall argument. 
+ * XEN_GUEST_HANDLE_PARAM and XEN_GUEST_HANDLE are the same on X86 but + * they might not be on other architectures. + */ #ifdef __XEN__ #define __DEFINE_GUEST_HANDLE(name, type) \ typedef struct { type *p; } __guest_handle_ ## name @@ -88,13 +114,16 @@ DEFINE_GUEST_HANDLE(xen_ulong_t); * start of the GDT because some stupid OSes export hard-coded selector values * in their ABI. These hard-coded values are always near the start of the GDT, * so Xen places itself out of the way, at the far end of the GDT. + * + * NB The LDT is set using the MMUEXT_SET_LDT op of HYPERVISOR_mmuext_op */ #define FIRST_RESERVED_GDT_PAGE 14 #define FIRST_RESERVED_GDT_BYTE (FIRST_RESERVED_GDT_PAGE * 4096) #define FIRST_RESERVED_GDT_ENTRY (FIRST_RESERVED_GDT_BYTE / 8) /* - * Send an array of these to HYPERVISOR_set_trap_table() + * Send an array of these to HYPERVISOR_set_trap_table(). + * Terminate the array with a sentinel entry, with traps[].address==0. * The privilege level specifies which modes may enter a trap via a software * interrupt. On x86/64, since rings 1 and 2 are unavailable, we allocate * privilege levels as follows: @@ -118,10 +147,41 @@ struct trap_info { DEFINE_GUEST_HANDLE_STRUCT(trap_info); struct arch_shared_info { -unsigned long max_pfn; /* max pfn that appears in table */ -/* Frame containing list of mfns containing list of mfns containing p2m. */ -unsigned long pfn_to_mfn_frame_list_list; -unsigned long nmi_reason; + /* +* Number of valid entries in the p2m table(s) anchored at +* pfn_to_mfn_frame_list_list and/or p2m_vaddr. +*/ + unsigned long max_pfn; + /* +* Frame containing list of mfns containing list of mfns containing p2m. +* A value of 0 indicates it has not yet been set up, ~0 indicates it +* has been set to invalid e.g. due to the p2m being too large for the +* 3-level p2m tree. In this case the linear mapper p2m list anchored +* at p2m_vaddr is to be used. 
+*/ + xen_pfn_t pfn_to_mfn_frame_list_list; + unsigned long nmi_reason; + /* +* Following three fields are valid if p2m_cr3 contains a value +* different from 0. +* p2m_cr3 is the root of the address space where p2m_vaddr is valid. +* p2m_cr3 is in the same format as a cr3 value in the vcpu register +* state and holds the folded machine frame number (via xen_pfn_to_cr3) +* of a L3 or L4 page table. +* p2m_vaddr holds the virtual address of the linear p2m list. All +* entries in the range [0...max_pfn[ are accessible via this pointer. +* p2m_generation will be incremented by the guest before and after each +* change of the mappings of the p2m list. p2m_generation starts at 0 +* and a value with the least significant bit set indicates that a +* mapping update is in progress. This allows guest external
[Xen-devel] [Patch V2 11/15] xen: check for initrd conflicting with e820 map
Check whether the initrd is placed at a location which is conflicting
with the target E820 map. If this is the case relocate it to a new
area unused up to now and compliant to the E820 map.

Signed-off-by: Juergen Gross <jgr...@suse.com>
---
 arch/x86/xen/setup.c | 51 +++
 1 file changed, 51 insertions(+)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 5d0f4e2..6985730 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -632,6 +632,36 @@ phys_addr_t __init xen_find_free_area(phys_addr_t size)
 }
 
 /*
+ * Like memcpy, but with physical addresses for dest and src.
+ */
+static void __init xen_phys_memcpy(phys_addr_t dest, phys_addr_t src,
+				   phys_addr_t n)
+{
+	phys_addr_t dest_off, src_off, dest_len, src_len, len;
+	void *from, *to;
+
+	while (n) {
+		dest_off = dest & ~PAGE_MASK;
+		src_off = src & ~PAGE_MASK;
+		dest_len = n;
+		if (dest_len > (NR_FIX_BTMAPS << PAGE_SHIFT) - dest_off)
+			dest_len = (NR_FIX_BTMAPS << PAGE_SHIFT) - dest_off;
+		src_len = n;
+		if (src_len > (NR_FIX_BTMAPS << PAGE_SHIFT) - src_off)
+			src_len = (NR_FIX_BTMAPS << PAGE_SHIFT) - src_off;
+		len = min(dest_len, src_len);
+		to = early_memremap(dest - dest_off, dest_len + dest_off);
+		from = early_memremap(src - src_off, src_len + src_off);
+		memcpy(to, from, len);
+		early_memunmap(to, dest_len + dest_off);
+		early_memunmap(from, src_len + src_off);
+		n -= len;
+		dest += len;
+		src += len;
+	}
+}
+
+/*
  * Reserve Xen mfn_list.
  * See comment above struct start_info in xen/interface/xen.h
  * We tried to make the the memblock_reserve more selective so
@@ -808,6 +838,27 @@ char * __init xen_memory_setup(void)
 	 */
 	xen_pt_check_e820();
 
+	/* Check for a conflict of the initrd with the target E820 map. */
+	if (xen_chk_e820_reserved(boot_params.hdr.ramdisk_image,
+				  boot_params.hdr.ramdisk_size)) {
+		phys_addr_t new_area, start, size;
+
+		new_area = xen_find_free_area(boot_params.hdr.ramdisk_size);
+		if (!new_area) {
+			xen_raw_console_write("Can't find new memory area for initrd needed due to E820 map conflict\n");
+			BUG();
+		}
+
+		start = boot_params.hdr.ramdisk_image;
+		size = boot_params.hdr.ramdisk_size;
+		xen_phys_memcpy(new_area, start, size);
+		pr_info("initrd moved from [mem %#010llx-%#010llx] to [mem %#010llx-%#010llx]\n",
+			start, start + size, new_area, new_area + size);
+		memblock_free(start, size);
+		boot_params.hdr.ramdisk_image = new_area;
+		boot_params.ext_ramdisk_image = new_area >> 32;
+	}
+
 	xen_reserve_xen_mfnlist();
 
 	/*
-- 
2.1.4
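The chunk-length calculation in xen_phys_memcpy() may be easier to follow in isolation. What follows is only a user-space sketch, not kernel code: a flat byte array stands in for physical memory, WINDOW stands in for NR_FIX_BTMAPS << PAGE_SHIFT (the value 4 pages is an assumption), and chunk_len() and chunked_copy() are hypothetical helpers that do not exist in the patch. It shows how each pass is limited by whichever of source and destination runs out of mapping window first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096u            /* assumed page size */
#define WINDOW  (4u * PAGE_SZ)   /* stand-in for NR_FIX_BTMAPS << PAGE_SHIFT */

/*
 * Largest number of bytes that can be reached through one mapping window
 * when the mapping has to start at the page containing addr.
 * Hypothetical helper, not part of the patch.
 */
static size_t chunk_len(size_t addr, size_t n)
{
    size_t off = addr & (PAGE_SZ - 1);   /* offset of addr inside its page */
    size_t max = WINDOW - off;

    return n > max ? max : n;
}

/*
 * Copy n bytes inside a flat buffer that plays the role of physical memory.
 * Source and destination ranges must not overlap.
 * Returns the number of chunks (mapping rounds) that were needed.
 */
static unsigned int chunked_copy(unsigned char *mem, size_t dest, size_t src,
                                 size_t n)
{
    unsigned int chunks = 0;

    while (n) {
        size_t dest_len = chunk_len(dest, n);
        size_t src_len = chunk_len(src, n);
        size_t len = dest_len < src_len ? dest_len : src_len;

        assert(len > 0);
        memcpy(mem + dest, mem + src, len);
        n -= len;
        dest += len;
        src += len;
        chunks++;
    }

    return chunks;
}
```

With a 16-page buffer, copying 20000 bytes from offset 100 to offset 35768 takes two rounds here: the first is limited by the destination (13384 bytes), the second moves the remaining 6616 bytes.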
[Xen-devel] [Patch V2 04/15] xen: eliminate scalability issues from initial mapping setup
Direct Xen to place the initial P-M table outside of the initial mapping, as otherwise the 1G (implementation) / 2G (theoretical) restriction on the size of the initial mapping limits the amount of memory a domain can be handed initially. As the initial P-M table is copied rather early during boot to domain private memory and it's initial virtual mapping is dropped, the easiest way to avoid virtual address conflicts with other addresses in the kernel is to use a user address area for the virtual address of the initial P-M table. This allows us to just throw away the page tables of the initial mapping after the copy without having to care about address invalidation. It should be noted that this patch won't enable a pv-domain to USE more than 512 GB of RAM. It just enables it to be started with a P-M table covering more memory. This is especially important for being able to boot a Dom0 on a system with more than 512 GB memory. Signed-off-by: Juergen Gross jgr...@suse.com Based-on-patch-by: Jan Beulich jbeul...@suse.com --- arch/x86/xen/mmu.c | 126 arch/x86/xen/setup.c| 67 ++--- arch/x86/xen/xen-head.S | 2 + 3 files changed, 156 insertions(+), 39 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index adca9e2..1ca5197 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -1114,6 +1114,77 @@ static void __init xen_cleanhighmap(unsigned long vaddr, xen_mc_flush(); } +/* + * Make a page range writeable and free it. 
+ */ +static void __init xen_free_ro_pages(unsigned long paddr, unsigned long size) +{ + void *vaddr = __va(paddr); + void *vaddr_end = vaddr + size; + + for (; vaddr vaddr_end; vaddr += PAGE_SIZE) + make_lowmem_page_readwrite(vaddr); + + memblock_free(paddr, size); +} + +static void __init xen_cleanmfnmap_free_pgtbl(void *pgtbl) +{ + unsigned long pa = __pa(pgtbl) PHYSICAL_PAGE_MASK; + + ClearPagePinned(virt_to_page(__va(pa))); + xen_free_ro_pages(pa, PAGE_SIZE); +} + +/* + * Since it is well isolated we can (and since it is perhaps large we should) + * also free the page tables mapping the initial P-M table. + */ +static void __init xen_cleanmfnmap(unsigned long vaddr) +{ + unsigned long va = vaddr PMD_MASK; + unsigned long pa; + pgd_t *pgd = pgd_offset_k(va); + pud_t *pud_page = pud_offset(pgd, 0); + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + unsigned int i; + + set_pgd(pgd, __pgd(0)); + do { + pud = pud_page + pud_index(va); + if (pud_none(*pud)) { + va += PUD_SIZE; + } else if (pud_large(*pud)) { + pa = pud_val(*pud) PHYSICAL_PAGE_MASK; + xen_free_ro_pages(pa, PUD_SIZE); + va += PUD_SIZE; + } else { + pmd = pmd_offset(pud, va); + if (pmd_large(*pmd)) { + pa = pmd_val(*pmd) PHYSICAL_PAGE_MASK; + xen_free_ro_pages(pa, PMD_SIZE); + } else if (!pmd_none(*pmd)) { + pte = pte_offset_kernel(pmd, va); + for (i = 0; i PTRS_PER_PTE; ++i) { + if (pte_none(pte[i])) + break; + pa = pte_pfn(pte[i]) PAGE_SHIFT; + xen_free_ro_pages(pa, PAGE_SIZE); + } + xen_cleanmfnmap_free_pgtbl(pte); + } + va += PMD_SIZE; + if (pmd_index(va)) + continue; + xen_cleanmfnmap_free_pgtbl(pmd); + } + + } while (pud_index(va) || pmd_index(va)); + xen_cleanmfnmap_free_pgtbl(pud_page); +} + static void __init xen_pagetable_p2m_free(void) { unsigned long size; @@ -1128,18 +1199,25 @@ static void __init xen_pagetable_p2m_free(void) /* using __ka address and sticking INVALID_P2M_ENTRY! */ memset((void *)xen_start_info-mfn_list, 0xff, size); - /* We should be in __ka space. 
*/ - BUG_ON(xen_start_info-mfn_list __START_KERNEL_map); addr = xen_start_info-mfn_list; - /* We roundup to the PMD, which means that if anybody at this stage is -* using the __ka address of xen_start_info or xen_start_info-shared_info -* they are in going to crash. Fortunatly we have already revectored -* in xen_setup_kernel_pagetable and in xen_setup_shared_info. */ + /* +* We could be in __ka space. +* We roundup to the PMD, which means that if anybody at this stage is +* using the __ka address of xen_start_info or +*
[Xen-devel] [Patch V2 15/15] xen: remove no longer needed p2m.h
Cleanup by removing arch/x86/xen/p2m.h as it isn't needed any more.

Most definitions in this file are used in p2m.c only. Move those into
p2m.c.

set_phys_range_identity() is already declared in
arch/x86/include/asm/xen/page.h, add __init annotation there.

MAX_REMAP_RANGES isn't used at all, just delete it.

The only define left is P2M_PER_PAGE which is moved to page.h as well.

Signed-off-by: Juergen Gross <jgr...@suse.com>
---
 arch/x86/include/asm/xen/page.h |  6 --
 arch/x86/xen/p2m.c              |  6 +-
 arch/x86/xen/p2m.h              | 15 ---
 arch/x86/xen/setup.c            |  1 -
 4 files changed, 9 insertions(+), 19 deletions(-)
 delete mode 100644 arch/x86/xen/p2m.h

diff --git a/arch/x86/include/asm/xen/page.h b/arch/x86/include/asm/xen/page.h
index 18a11f2..b858592 100644
--- a/arch/x86/include/asm/xen/page.h
+++ b/arch/x86/include/asm/xen/page.h
@@ -35,6 +35,8 @@ typedef struct xpaddr {
 #define FOREIGN_FRAME(m)	((m) | FOREIGN_FRAME_BIT)
 #define IDENTITY_FRAME(m)	((m) | IDENTITY_FRAME_BIT)
 
+#define P2M_PER_PAGE		(PAGE_SIZE / sizeof(unsigned long))
+
 extern unsigned long *machine_to_phys_mapping;
 extern unsigned long  machine_to_phys_nr;
 extern unsigned long *xen_p2m_addr;
@@ -44,8 +46,8 @@ extern unsigned long  xen_max_p2m_pfn;
 extern unsigned long get_phys_to_machine(unsigned long pfn);
 extern bool set_phys_to_machine(unsigned long pfn, unsigned long mfn);
 extern bool __set_phys_to_machine(unsigned long pfn, unsigned long mfn);
-extern unsigned long set_phys_range_identity(unsigned long pfn_s,
-					     unsigned long pfn_e);
+extern unsigned long __init set_phys_range_identity(unsigned long pfn_s,
+						    unsigned long pfn_e);
 
 extern int set_foreign_p2m_mapping(struct gnttab_map_grant_ref *map_ops,
 				   struct gnttab_map_grant_ref *kmap_ops,
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 365a64a..1f63ad2 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -78,10 +78,14 @@
 #include <xen/balloon.h>
 #include <xen/grant_table.h>
 
-#include "p2m.h"
 #include "multicalls.h"
 #include "xen-ops.h"
 
+#define P2M_MID_PER_PAGE	(PAGE_SIZE / sizeof(unsigned long *))
+#define P2M_TOP_PER_PAGE	(PAGE_SIZE / sizeof(unsigned long **))
+
+#define MAX_P2M_PFN	(P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)
+
 #define PMDS_PER_MID_PAGE	(P2M_MID_PER_PAGE / PTRS_PER_PTE)
 
 unsigned long *xen_p2m_addr __read_mostly;
diff --git a/arch/x86/xen/p2m.h b/arch/x86/xen/p2m.h
deleted file mode 100644
index ad8aee2..000
--- a/arch/x86/xen/p2m.h
+++ /dev/null
@@ -1,15 +0,0 @@
-#ifndef _XEN_P2M_H
-#define _XEN_P2M_H
-
-#define P2M_PER_PAGE        (PAGE_SIZE / sizeof(unsigned long))
-#define P2M_MID_PER_PAGE    (PAGE_SIZE / sizeof(unsigned long *))
-#define P2M_TOP_PER_PAGE    (PAGE_SIZE / sizeof(unsigned long **))
-
-#define MAX_P2M_PFN	(P2M_TOP_PER_PAGE * P2M_MID_PER_PAGE * P2M_PER_PAGE)
-
-#define MAX_REMAP_RANGES    10
-
-extern unsigned long __init set_phys_range_identity(unsigned long pfn_s,
-				      unsigned long pfn_e);
-
-#endif  /* _XEN_P2M_H */
diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 13394b1..5561608 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -30,7 +30,6 @@
 #include <xen/hvc-console.h>
 #include "xen-ops.h"
 #include "vdso.h"
-#include "p2m.h"
 #include "mmu.h"
 
 #define GB(x) ((uint64_t)(x) * 1024 * 1024 * 1024)
-- 
2.1.4
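For a sense of scale, the limit expressed by the macros moved in this patch can be worked out by hand. The sketch below is not the kernel's code: it assumes 4 KiB pages and 8-byte longs and pointers (a 64-bit build), and every SKETCH_ name and helper is made up here so that the constants are explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 4 KiB pages, 8-byte longs and pointers. */
#define SKETCH_PAGE_SIZE   4096ULL
#define SKETCH_ENTRY_SIZE  8ULL

/* Same shape as P2M_PER_PAGE, P2M_MID_PER_PAGE and P2M_TOP_PER_PAGE. */
#define SKETCH_P2M_PER_PAGE      (SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE)
#define SKETCH_P2M_MID_PER_PAGE  (SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE)
#define SKETCH_P2M_TOP_PER_PAGE  (SKETCH_PAGE_SIZE / SKETCH_ENTRY_SIZE)

/* Same shape as MAX_P2M_PFN: frames covered by a three-level tree. */
#define SKETCH_MAX_P2M_PFN \
    (SKETCH_P2M_TOP_PER_PAGE * SKETCH_P2M_MID_PER_PAGE * SKETCH_P2M_PER_PAGE)

static_assert(SKETCH_P2M_PER_PAGE == 512, "512 entries fit in one 4 KiB page");

/* Number of page frames the three levels can describe. */
static uint64_t sketch_max_p2m_pfn(void)
{
    return SKETCH_MAX_P2M_PFN;
}

/* The same limit expressed in bytes of guest memory. */
static uint64_t sketch_max_p2m_bytes(void)
{
    return SKETCH_MAX_P2M_PFN * SKETCH_PAGE_SIZE;
}
```

Under these assumptions the tree covers 512 * 512 * 512 = 134217728 frames, i.e. 512 GiB of memory.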
[Xen-devel] [Patch V2 07/15] xen: check memory area against e820 map
Provide a service routine to check a physical memory area against the
E820 map. The routine will return false if the complete area is RAM
according to the E820 map and true otherwise.

Signed-off-by: Juergen Gross <jgr...@suse.com>
---
 arch/x86/xen/setup.c   | 23 +++
 arch/x86/xen/xen-ops.h |  1 +
 2 files changed, 24 insertions(+)

diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
index 87251b4..4666adf 100644
--- a/arch/x86/xen/setup.c
+++ b/arch/x86/xen/setup.c
@@ -573,6 +573,29 @@ static unsigned long __init xen_count_remap_pages(unsigned long max_pfn)
 	return extra;
 }
 
+bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size)
+{
+	struct e820entry *entry;
+	unsigned mapcnt;
+	phys_addr_t end;
+
+	if (!size)
+		return false;
+
+	end = start + size;
+	entry = xen_e820_map;
+
+	for (mapcnt = 0; mapcnt < xen_e820_map_entries; mapcnt++) {
+		if (entry->type == E820_RAM && entry->addr <= start &&
+		    (entry->addr + entry->size) >= end)
+			return false;
+
+		entry++;
+	}
+
+	return true;
+}
+
 /*
  * Reserve Xen mfn_list.
  * See comment above struct start_info in xen/interface/xen.h
diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h
index 9e195c6..56650bb 100644
--- a/arch/x86/xen/xen-ops.h
+++ b/arch/x86/xen/xen-ops.h
@@ -39,6 +39,7 @@ void xen_reserve_top(void);
 
 void xen_mm_pin_all(void);
 void xen_mm_unpin_all(void);
+bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size);
 unsigned long __ref xen_chk_extra_mem(unsigned long pfn);
 void __init xen_inv_extra_mem(void);
 void __init xen_remap_memory(void);
-- 
2.1.4
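The containment test described above can be tried outside the kernel. This is only a sketch under stated assumptions: struct map_entry, the MAP_* type codes and area_conflicts() are hypothetical stand-ins for struct e820entry, E820_RAM and xen_chk_e820_reserved(), and the map is passed in as a parameter rather than read from xen_e820_map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical type codes, standing in for the E820_* constants. */
enum { MAP_RAM = 1, MAP_RESERVED = 2 };

/* Hypothetical stand-in for struct e820entry. */
struct map_entry {
    uint64_t addr;
    uint64_t size;
    int type;
};

/*
 * Return false if [start, start + size) lies completely inside a single
 * RAM entry of the map (or if size is 0), true otherwise.
 */
static bool area_conflicts(const struct map_entry *map, size_t nr_entries,
                           uint64_t start, uint64_t size)
{
    uint64_t end;
    size_t i;

    assert(map != NULL || nr_entries == 0);

    if (!size)
        return false;

    end = start + size;

    for (i = 0; i < nr_entries; i++) {
        if (map[i].type == MAP_RAM && map[i].addr <= start &&
            map[i].addr + map[i].size >= end)
            return false;
    }

    return true;
}
```

Because the test is done per entry, an area that straddles two adjacent RAM entries is reported as conflicting in this sketch, even though every byte of it is RAM.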
Re: [Xen-devel] [PATCH v4 02/12] x86: improve psr scheduling code
On 09/04/2015 10:18, Chao Peng wrote: Switching RMID from previous vcpu to next vcpu only needs to write MSR_IA32_PSR_ASSOC once. Write it with the value of next vcpu is enough, no need to write '0' first. Idle domain has RMID set to 0 and because MSR is already updated lazily, so just switch it as it does. Also move the initialization of per-CPU variable which used for lazy update from context switch to CPU starting. Signed-off-by: Chao Peng chao.p.p...@linux.intel.com --- Changes in v4: * Move psr_assoc_reg_read/psr_assoc_reg_write into psr_ctxt_switch_to. * Use 0 instead of smp_processor_id() for boot cpu. * add cpu parameter to psr_assoc_init. Changes in v2: * Move initialization for psr_assoc from context switch to CPU_STARTING. --- xen/arch/x86/domain.c | 7 ++--- xen/arch/x86/psr.c| 75 ++- xen/include/asm-x86/psr.h | 3 +- 3 files changed, 59 insertions(+), 26 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 04c1898..695a2eb 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1444,8 +1444,6 @@ static void __context_switch(void) { memcpy(p-arch.user_regs, stack_regs, CTXT_SWITCH_STACK_BYTES); vcpu_save_fpu(p); -if ( psr_cmt_enabled() ) -psr_assoc_rmid(0); p-arch.ctxt_switch_from(p); } @@ -1470,11 +1468,10 @@ static void __context_switch(void) } vcpu_restore_fpu_eager(n); n-arch.ctxt_switch_to(n); - -if ( psr_cmt_enabled() n-domain-arch.psr_rmid 0 ) -psr_assoc_rmid(n-domain-arch.psr_rmid); } +psr_ctxt_switch_to(n-domain); + gdt = !is_pv_32on64_vcpu(n) ? 
per_cpu(gdt_table, cpu) : per_cpu(compat_gdt_table, cpu); if ( need_full_gdt(n) ) diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c index 344de3c..6119c6e 100644 --- a/xen/arch/x86/psr.c +++ b/xen/arch/x86/psr.c @@ -22,7 +22,6 @@ struct psr_assoc { uint64_t val; -bool_t initialized; }; struct psr_cmt *__read_mostly psr_cmt; @@ -122,14 +121,6 @@ static void __init init_psr_cmt(unsigned int rmid_max) printk(XENLOG_INFO Cache Monitoring Technology enabled\n); } -static int __init init_psr(void) -{ -if ( (opt_psr PSR_CMT) opt_rmid_max ) -init_psr_cmt(opt_rmid_max); -return 0; -} -__initcall(init_psr); - /* Called with domain lock held, no psr specific lock needed */ int psr_alloc_rmid(struct domain *d) { @@ -175,26 +166,70 @@ void psr_free_rmid(struct domain *d) d-arch.psr_rmid = 0; } -void psr_assoc_rmid(unsigned int rmid) +static inline void psr_assoc_init(unsigned int cpu) +{ +struct psr_assoc *psra = per_cpu(psr_assoc, cpu); + +if ( psr_cmt_enabled() ) +rdmsrl(MSR_IA32_PSR_ASSOC, psra-val); +} On further consideration, this would probably be better as a void function which used this_cpu() rather than per_cpu(). Absolutely nothing good can come of calling it with cpu != smp_processor_id(), so we should avoid that situation arising in the first place. + +static inline void psr_assoc_rmid(uint64_t *reg, unsigned int rmid) +{ +*reg = (*reg ~rmid_mask) | (rmid rmid_mask); +} + +void psr_ctxt_switch_to(struct domain *d) { -uint64_t val; -uint64_t new_val; struct psr_assoc *psra = this_cpu(psr_assoc); +uint64_t reg = psra-val; + +if ( psr_cmt_enabled() ) +psr_assoc_rmid(reg, d-arch.psr_rmid); -if ( !psra-initialized ) +if ( reg != psra-val ) { -rdmsrl(MSR_IA32_PSR_ASSOC, psra-val); -psra-initialized = 1; +wrmsrl(MSR_IA32_PSR_ASSOC, reg); +psra-val = reg; } -val = psra-val; +} -new_val = (val ~rmid_mask) | (rmid rmid_mask); -if ( val != new_val ) +static void psr_cpu_init(unsigned int cpu) +{ +psr_assoc_init(cpu); +} This can also turn into a void helper. 
Otherwise, Reviewed-by: Andrew Cooper andrew.coop...@citrix.com ~Andrew + +static int cpu_callback( +struct notifier_block *nfb, unsigned long action, void *hcpu) +{ +unsigned int cpu = (unsigned long)hcpu; + +switch ( action ) { -wrmsrl(MSR_IA32_PSR_ASSOC, new_val); -psra-val = new_val; +case CPU_STARTING: +psr_cpu_init(cpu); +break; } + +return NOTIFY_DONE; +} + +static struct notifier_block cpu_nfb = { +.notifier_call = cpu_callback +}; + +static int __init psr_presmp_init(void) +{ +if ( (opt_psr PSR_CMT) opt_rmid_max ) +init_psr_cmt(opt_rmid_max); + +psr_cpu_init(0); +if ( psr_cmt_enabled() ) +register_cpu_notifier(cpu_nfb); + +return 0; } +presmp_initcall(psr_presmp_init); /* * Local variables: diff --git a/xen/include/asm-x86/psr.h b/xen/include/asm-x86/psr.h index c6076e9..585350c 100644 ---
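The lazy update in psr_ctxt_switch_to() comes down to caching the last value written and skipping the register write when nothing changes. A minimal user-space sketch of that idea follows. It makes several assumptions: a plain variable stands in for MSR_IA32_PSR_ASSOC, the 8-bit rmid_mask value is made up, there is a single cache instead of a per-CPU one, and none of the *_sketch names exist in Xen.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t fake_msr;                  /* stands in for MSR_IA32_PSR_ASSOC */
static unsigned int msr_writes;            /* counts simulated wrmsrl() calls */
static uint64_t cached_val;                /* stands in for psra->val */
static const uint64_t rmid_mask = 0xff;    /* hypothetical mask width */

/* Simulated register write. */
static void wrmsr_sketch(uint64_t val)
{
    fake_msr = val;
    msr_writes++;
}

/* Read the register once, as the CPU_STARTING path does in the patch. */
static void assoc_init_sketch(void)
{
    cached_val = fake_msr;
}

/* Switch to a new RMID, writing the register only if the value changes. */
static void ctxt_switch_to_sketch(unsigned int rmid)
{
    uint64_t reg = (cached_val & ~rmid_mask) | (rmid & rmid_mask);

    if (reg != cached_val) {
        wrmsr_sketch(reg);
        cached_val = reg;
    }

    /* The cache must mirror the register once initialised. */
    assert(fake_msr == cached_val);
}
```

Switching twice in a row to the same RMID costs one write in this sketch, and switching to RMID 0 (the idle domain case mentioned in the patch description) is just another value rather than a special step.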
Re: [Xen-devel] [RFC PATCH v2 00/29] libxl: Cancelling asynchronous operations
On Tue, Apr 07, 2015 at 06:19:52PM +0100, Ian Jackson wrote: On the contrary, I think many long-running operations, such as suspend and migrations, involve multiple iterations of the libxl event loop. Actual suspend/migrate is done in a helper process; the main process is responsible for progress report handling, coordination, etc. Yes, that would work, but an open loop approach like that can lead to frustratingly unreliable tests. I think it would be best to make the test aware of the state of the helper - or even in control of it. That would allow us to wait for the helper to reach a particular state before killing it. Thanks, Euan ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
Prashant Sreedharan writes ("Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]"):
> On Thu, 2015-04-09 at 12:11 +0100, Ian Jackson wrote:
> > No.  I can try to repro the problem without the bridge, if it would help.
>
> yes please do

I will do so.

FYI, when I came back to this test box just now (after leaving it
since yesterday), it was completely broken:

[89210.340696] DMA: Out of SW-IOMMU space for 1600 bytes at device :03:00.0
[89210.449936] tg3 :03:00.0: swiotlb buffer is full (sz: 1600 bytes)

The root fs block device is also unusable and gives lots of EIO.

This is with 3.14.21, baremetal, with `iommu=soft swiotlb=force'.

Ian.
[Xen-devel] [PATCH v4 01/12] x86: clean up psr boot parameter parsing
Change type of opt_psr from bool to int so more psr features can fit.

Introduce a new routine to parse bool parameter so that both cmt and
future psr features like cat can use it.

Signed-off-by: Chao Peng <chao.p.p...@linux.intel.com>
---
Changes in v4:
* change 'int bit' to 'unsigned int mask'.
* Remove printk that will never be called.
Changes in v3:
* Set off value explicitly if requested.
---
 xen/arch/x86/psr.c | 39 +++
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/xen/arch/x86/psr.c b/xen/arch/x86/psr.c
index 2ef83df..344de3c 100644
--- a/xen/arch/x86/psr.c
+++ b/xen/arch/x86/psr.c
@@ -26,11 +26,30 @@ struct psr_assoc {
 };
 
 struct psr_cmt *__read_mostly psr_cmt;
-static bool_t __initdata opt_psr;
+static unsigned int __initdata opt_psr;
 static unsigned int __initdata opt_rmid_max = 255;
 static uint64_t rmid_mask;
 static DEFINE_PER_CPU(struct psr_assoc, psr_assoc);
 
+static void __init parse_psr_bool(char *s, char *value, char *feature,
+                                  unsigned int mask)
+{
+    if ( !strcmp(s, feature) )
+    {
+        if ( !value )
+            opt_psr |= mask;
+        else
+        {
+            int val_int = parse_bool(value);
+
+            if ( val_int == 0 )
+                opt_psr &= ~mask;
+            else if ( val_int == 1 )
+                opt_psr |= mask;
+        }
+    }
+}
+
 static void __init parse_psr_param(char *s)
 {
     char *ss, *val_str;
@@ -44,21 +63,9 @@ static void __init parse_psr_param(char *s)
         if ( val_str )
             *val_str++ = '\0';
 
-        if ( !strcmp(s, "cmt") )
-        {
-            if ( !val_str )
-                opt_psr |= PSR_CMT;
-            else
-            {
-                int val_int = parse_bool(val_str);
-                if ( val_int == 1 )
-                    opt_psr |= PSR_CMT;
-                else if ( val_int != 0 )
-                    printk("PSR: unknown cmt value: %s - CMT disabled!\n",
-                           val_str);
-            }
-        }
-        else if ( val_str && !strcmp(s, "rmid_max") )
+        parse_psr_bool(s, val_str, "cmt", PSR_CMT);
+
+        if ( val_str && !strcmp(s, "rmid_max") )
             opt_rmid_max = simple_strtoul(val_str, NULL, 0);
 
         s = ss + 1;
-- 
1.9.1
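The behaviour of the new helper (a bare key enables the feature, an explicit boolean sets or clears its bit, anything else leaves the bit alone) can be exercised in user space. The following is a sketch under assumptions, not the Xen code: parse_bool_sketch() is a simplified stand-in for Xen's parse_bool(), the FEAT_* bits are hypothetical, and the option word is passed by pointer instead of being the file-scope opt_psr.

```c
#include <assert.h>
#include <string.h>

#define FEAT_CMT (1u << 0)   /* hypothetical feature bits */
#define FEAT_CAT (1u << 1)

/* Simplified stand-in for parse_bool(): 1 = on, 0 = off, -1 = not a boolean. */
static int parse_bool_sketch(const char *s)
{
    if (!strcmp(s, "1") || !strcmp(s, "yes") || !strcmp(s, "on") ||
        !strcmp(s, "true"))
        return 1;
    if (!strcmp(s, "0") || !strcmp(s, "no") || !strcmp(s, "off") ||
        !strcmp(s, "false"))
        return 0;
    return -1;
}

/*
 * If key names the given feature, update its bit in *opts:
 * no value means "enable", otherwise follow the boolean value.
 */
static void parse_feature_bool(unsigned int *opts, const char *key,
                               const char *value, const char *feature,
                               unsigned int mask)
{
    assert(opts && key && feature);

    if (strcmp(key, feature))
        return;                      /* some other option, not ours */

    if (!value) {
        *opts |= mask;
        return;
    }

    switch (parse_bool_sketch(value)) {
    case 0:
        *opts &= ~mask;              /* explicit "off" clears the bit */
        break;
    case 1:
        *opts |= mask;
        break;
    default:
        break;                       /* unrecognised value: leave unchanged */
    }
}
```

Adding a second feature then only needs one more call with a different name and mask, which seems to be the point of factoring the helper out.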
Re: [Xen-devel] [PATCH] osstest: update FreeBSD guests to 10.1
Roger Pau Monne writes ("[PATCH] osstest: update FreeBSD guests to 10.1"):
> Update FreeBSD guests in OSSTest to FreeBSD 10.1.
>
> The following images should be placed in the osstest images folder:
>
> ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.qcow2.xz
> ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.qcow2.xz

Sadly,

iwj@OSSTEST:~$ wget ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.qcow2.xz
--2015-04-09 14:36:34--  ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.qcow2.xz
           => `FreeBSD-10.1-RELEASE-amd64.qcow2.xz'
Resolving ftp.freebsd.org (ftp.freebsd.org)... 96.47.72.72, 2610:1c1:1:606c::15:0
Connecting to ftp.freebsd.org (ftp.freebsd.org)|96.47.72.72|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest ... done.
==> SIZE FreeBSD-10.1-RELEASE-amd64.qcow2.xz ... done.
==> PASV ... done.    ==> RETR FreeBSD-10.1-RELEASE-amd64.qcow2.xz ...
No such file `FreeBSD-10.1-RELEASE-amd64.qcow2.xz'.

iwj@OSSTEST:~$ wget ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.qcow2.xz
--2015-04-09 14:36:40--  ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.qcow2.xz
           => `FreeBSD-10.1-RELEASE-i386.qcow2.xz'
Resolving ftp.freebsd.org (ftp.freebsd.org)... 96.47.72.72, 2610:1c1:1:606c::15:0
Connecting to ftp.freebsd.org (ftp.freebsd.org)|96.47.72.72|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest ... done.
==> SIZE FreeBSD-10.1-RELEASE-i386.qcow2.xz ... done.
==> PASV ... done.    ==> RETR FreeBSD-10.1-RELEASE-i386.qcow2.xz ...
No such file `FreeBSD-10.1-RELEASE-i386.qcow2.xz'.

iwj@OSSTEST:~$

Ian.
Re: [Xen-devel] [PATCH v15 13/15] pvqspinlock: Only kick CPU at unlock time
On Mon, Apr 06, 2015 at 10:55:48PM -0400, Waiman Long wrote:
> @@ -219,24 +236,30 @@ static void pv_wait_node(struct mcs_spinlock *node)
>  }
>
>  /*
> + * Called after setting next->locked = 1 & lock acquired.
> + * Check if the the CPU has been halted. If so, set the _Q_SLOW_VAL flag
> + * and put an entry into the lock hash table to be waken up at unlock time.
>   */
> -static void pv_kick_node(struct mcs_spinlock *node)
> +static void pv_scan_next(struct qspinlock *lock, struct mcs_spinlock *node)

I'm not too sure about that name change..

>  {
>  	struct pv_node *pn = (struct pv_node *)node;
> +	struct __qspinlock *l = (void *)lock;
>
>  	/*
> +	 * Transition CPU state: halted => hashed
> +	 * Quit if the transition failed.
>  	 */
> +	if (cmpxchg(&pn->state, vcpu_halted, vcpu_hashed) != vcpu_halted)
> +		return;
> +
> +	/*
> +	 * Put the lock into the hash table & set the _Q_SLOW_VAL in the lock.
> +	 * As this is the same CPU that will check the _Q_SLOW_VAL value and
> +	 * the hash table later on at unlock time, no atomic instruction is
> +	 * needed.
> +	 */
> +	WRITE_ONCE(l->locked, _Q_SLOW_VAL);
> +	(void)pv_hash(lock, pn);
>  }

This is broken. The unlock path relies on:

	pv_hash()
	  MB
	l->locked = SLOW

such that when it observes SLOW, it must then also observe a consistent
bucket.

The above can have us do pv_hash_find() _before_ we actually hash the
lock, which will result in us triggering that BUG_ON() in there.
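The ordering requirement spelled out in the reply (hash first, barrier, then set the flag) can be illustrated with C11 atomics. This is only a sketch and swaps in a different technique from the kernel's: release/acquire ordering from <stdatomic.h> replaces cmpxchg()/WRITE_ONCE() and the MB, a one-entry table replaces the real hash, and SLOW_VAL, hash_then_flag() and unlock_find_cpu() are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SLOW_VAL 3   /* hypothetical stand-in for _Q_SLOW_VAL */

/* One-entry "hash table": which CPU is waiting on which lock. */
struct bucket {
    void *lock;
    int cpu;
};

static struct bucket table[1];
static _Atomic unsigned char locked = 1;   /* lock byte, 1 = held */

/*
 * Waiter side: publish the hash entry first, then set the flag.
 * The release store keeps the bucket writes ordered before the flag.
 */
static void hash_then_flag(void *lock, int cpu)
{
    table[0].lock = lock;
    table[0].cpu = cpu;
    atomic_store_explicit(&locked, SLOW_VAL, memory_order_release);
}

/*
 * Unlock side: if the flag was set, the bucket must already be visible.
 * Returns the CPU to kick, or -1 if nobody was hashed.
 */
static int unlock_find_cpu(void *lock)
{
    unsigned char old = atomic_exchange_explicit(&locked, 0,
                                                 memory_order_acquire);

    if (old != SLOW_VAL)
        return -1;

    /* Plays the role of the BUG_ON() in pv_hash_find(). */
    assert(table[0].lock == lock);
    return table[0].cpu;
}
```

If hash_then_flag() stored the flag before filling the bucket, as the quoted hunk does, an unlocker on another CPU could see SLOW_VAL and then fail the lookup, which is the failure described above.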
Re: [Xen-devel] [PATCH 05/10] VMX: add help functions to support PML
Hi,

At 10:35 +0800 on 27 Mar (1427452549), Kai Huang wrote:
> +void vmx_vcpu_flush_pml_buffer(struct vcpu *v)
> +{
> +    uint64_t *pml_buf;
> +    unsigned long pml_idx;
> +
> +    ASSERT(vmx_vcpu_pml_enabled(v));
> +
> +    vmx_vmcs_enter(v);
> +
> +    __vmread(GUEST_PML_INDEX, &pml_idx);
> +
> +    /* Do nothing if PML buffer is empty */
> +    if ( pml_idx == (PML_ENTITY_NUM - 1) )
> +        goto out;
> +
> +    pml_buf = map_domain_page(page_to_mfn(v->arch.hvm_vmx.pml_pg));
> +
> +    /*
> +     * PML index can be either 2^16-1 (buffer is full), or 0~511 (buffer is not
> +     * full), and in latter case PML index always points to next available
> +     * entity.
> +     */
> +    if (pml_idx >= PML_ENTITY_NUM)
> +        pml_idx = 0;
> +    else
> +        pml_idx++;
> +
> +    for ( ; pml_idx < PML_ENTITY_NUM; pml_idx++ )
> +    {
> +        struct p2m_domain *p2m = p2m_get_hostp2m(v->domain);
> +        unsigned long gfn;
> +        mfn_t mfn;
> +        p2m_type_t t;
> +        p2m_access_t a;
> +
> +        gfn = pml_buf[pml_idx] >> PAGE_SHIFT;
> +        mfn = p2m->get_entry(p2m, gfn, &t, &a, 0, NULL);

Please don't call p2m->get_entry() directly -- that interface should
only be used inside the p2m code.

As it happens, I don't think this lookup is correct anyway: the logging
only sees races (which are not interesting) or buggy hardware (which is
not worth the extra lookup to detect).

So you only need this to get 'mfn' to pass to paging_mark_dirty().
That's also buggy, because there's no locking here to make sure
gfn->mfn->gfn ends up in the right place. :(

I think the right thing to do is:

 - split paging_mark_dirty() into paging_mark_gfn_dirty() (the bulk of
   the current function) and a paging_mark_dirty() wrapper that does
   get_gpfn_from_mfn(mfn_x(gmfn)) and calls paging_mark_gfn_dirty().

 - call paging_mark_gfn_dirty() from vmx_vcpu_flush_pml_buffer().

That will avoid _two_ p2m lookups in this function. :)

Cheers,

Tim.
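The index handling in the quoted hunk (empty at 511, full once the index has wrapped, otherwise pointing at the next free slot) can be checked in isolation. This is a user-space sketch only: PML_ENTRIES mirrors the 0~511 range from the quoted comment, a page shift of 12 is assumed, and pml_first_valid() and pml_collect() are hypothetical helpers that simply gather the logged frame numbers instead of marking anything dirty.

```c
#include <assert.h>
#include <stdint.h>

#define PML_ENTRIES       512u   /* per the 0~511 range in the quoted comment */
#define SKETCH_PAGE_SHIFT 12     /* assumed 4 KiB pages */

/*
 * Translate the hardware index into the first valid buffer slot.
 * Returns PML_ENTRIES when the buffer is empty.
 */
static unsigned int pml_first_valid(unsigned long pml_idx)
{
    if (pml_idx == PML_ENTRIES - 1)
        return PML_ENTRIES;              /* nothing logged yet */
    if (pml_idx >= PML_ENTRIES)
        return 0;                        /* index wrapped: buffer is full */
    return (unsigned int)pml_idx + 1;    /* index points at next free slot */
}

/*
 * Copy the guest frame numbers of all logged entries into gfns[].
 * gfns must have room for PML_ENTRIES values. Returns how many were found.
 */
static unsigned int pml_collect(const uint64_t *buf, unsigned long pml_idx,
                                uint64_t *gfns)
{
    unsigned int i, n = 0;

    assert(buf && gfns);

    for (i = pml_first_valid(pml_idx); i < PML_ENTRIES; i++)
        gfns[n++] = buf[i] >> SKETCH_PAGE_SHIFT;

    return n;
}
```

With the split suggested above, the loop body would hand each gfn straight to the gfn-based dirty-marking function, so no mfn needs to be looked up at all.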
Re: [Xen-devel] [OSSTEST Nested PATCH v7 4/6] Add new script to custmize nested test configuration
-----Original Message-----
From: Ian Campbell [mailto:ian.campb...@citrix.com]
Sent: Wednesday, April 01, 2015 4:59 PM
To: Pang, LongtaoX; ian.jack...@eu.citrix.com
Cc: xen-devel@lists.xen.org; wei.l...@citrix.com; Hu, Robert
Subject: Re: [OSSTEST Nested PATCH v7 4/6] Add new script to custmize nested test configuration

On Wed, 2015-04-01 at 08:45 +, Pang, LongtaoX wrote:

As it happens I was rebasing that series this morning but due to other
issues I've not managed to run it yet. Once I've managed to at least
smoke test I'll CC you on the repost.

OK. What's more, the code below is used to start the
'osstest-confirm-booted' script, which confirms whether L1 has fully
booted after a reboot. I think it's necessary here.

+target_cmd_root($l1, <<END);
+wget -O overlay.tar $url
+tar -xf overlay.tar -C /
+rm overlay.tar -f
+update-rc.d osstest-confirm-booted start 99 2 .
+END

In my distro series I also have some patches refactoring the overlay
stuff, which would mean you could reuse that.
http://article.gmane.org/gmane.comp.emulators.xen.devel/224433
I'll CC you on that one too.

I don't think there would be any harm in adding those overlays for all
guests and enabling the initscript, but Ian may disagree or know
something which I don't.

I have modified and updated the v7 patches according to your reply. It
seems that your patches [v4 04,05,06] have not been pushed into the
OSSTest master tree. Should I wait until these patches are pushed, or
release my v8 nested patches to you first? We have been preparing this
for a long time.
Re: [Xen-devel] [PATCH v4 01/12] x86: clean up psr boot parameter parsing
On 09/04/2015 10:18, Chao Peng wrote:
> Change type of opt_psr from bool to int so more psr features can fit.
>
> Introduce a new routine to parse bool parameter so that both cmt and
> future psr features like cat can use it.
>
> Signed-off-by: Chao Peng <chao.p.p...@linux.intel.com>

Reviewed-by: Andrew Cooper <andrew.coop...@citrix.com>
[Xen-devel] [PATCH v5 p2 12/19] xen/passthrough: Extend XEN_DOMCTL_*assign_device to support DT device
From: Julien Grall julien.gr...@linaro.org A device node is described by a path. It will be used to retrieved the node in the device tree and assign the related device to the domain. Only non-PCI protected by an IOMMU can be assigned to a guest. Also document the behavior of XEN_DOMCTL_deassign_device in the public headers which differ between non-PCI and PCI. Signed-off-by: Julien Grall julien.gr...@linaro.org Acked-by: Jan Beulich jbeul...@suse.com Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Wei Liu wei.l...@citrix.com --- Changes in v5: - Fix comment in public/domctl.h - Remove unecessary comment in drivers/passthrough/device_tree.c - Check d-is_dying before assigning a device (consistency with PCI code) - Invert the if in iommu.c in order to avoid extra return - Add Jan's ack for non-ARM part Changes in v4: - Add XSM bits - Return -ENODEV rather than -ENOSYS - Move the if (...) into the ifdef (see iommu.c) - Document the behavior of XEN_DOMCTL_deassign_device - Use PCI_BUS and PCI_DEVFN2 when it's possible - iommu_dt_device_is_assigned now returns 0 when the device is not protected Changes in v2: - Use a different number for XEN_DOMCTL_assign_dt_device --- tools/libxc/include/xenctrl.h | 10 tools/libxc/xc_domain.c | 95 -- xen/drivers/passthrough/device_tree.c | 108 +- xen/drivers/passthrough/iommu.c | 7 ++- xen/drivers/passthrough/pci.c | 47 ++- xen/include/public/domctl.h | 24 +++- xen/include/xen/iommu.h | 3 + 7 files changed, 269 insertions(+), 25 deletions(-) diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index cc78ed6..a26d222 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2057,6 +2057,16 @@ int xc_deassign_device(xc_interface *xch, uint32_t domid, uint32_t machine_sbdf); +int xc_assign_dt_device(xc_interface *xch, +uint32_t domid, +char *path); +int xc_test_assign_dt_device(xc_interface *xch, + uint32_t domid, + char *path); +int xc_deassign_dt_device(xc_interface *xch, + uint32_t domid, + 
char *path); + int xc_domain_memory_mapping(xc_interface *xch, uint32_t domid, unsigned long first_gfn, diff --git a/tools/libxc/xc_domain.c b/tools/libxc/xc_domain.c index 676ec50..a6fcf14 100644 --- a/tools/libxc/xc_domain.c +++ b/tools/libxc/xc_domain.c @@ -1650,7 +1650,8 @@ int xc_assign_device( domctl.cmd = XEN_DOMCTL_assign_device; domctl.domain = domid; -domctl.u.assign_device.machine_sbdf = machine_sbdf; +domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI; +domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf; return do_domctl(xch, domctl); } @@ -1699,7 +1700,8 @@ int xc_test_assign_device( domctl.cmd = XEN_DOMCTL_test_assign_device; domctl.domain = domid; -domctl.u.assign_device.machine_sbdf = machine_sbdf; +domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI; +domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf; return do_domctl(xch, domctl); } @@ -1713,11 +1715,96 @@ int xc_deassign_device( domctl.cmd = XEN_DOMCTL_deassign_device; domctl.domain = domid; -domctl.u.assign_device.machine_sbdf = machine_sbdf; - +domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI; +domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf; + return do_domctl(xch, domctl); } +int xc_assign_dt_device( +xc_interface *xch, +uint32_t domid, +char *path) +{ +int rc; +size_t size = strlen(path); +DECLARE_DOMCTL; +DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN); + +if ( xc_hypercall_bounce_pre(xch, path) ) +return -1; + +domctl.cmd = XEN_DOMCTL_assign_device; +domctl.domain = (domid_t)domid; + +domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT; +domctl.u.assign_device.u.dt.size = size; +set_xen_guest_handle(domctl.u.assign_device.u.dt.path, path); + +rc = do_domctl(xch, domctl); + +xc_hypercall_bounce_post(xch, path); + +return rc; +} + +int xc_test_assign_dt_device( +xc_interface *xch, +uint32_t domid, +char *path) +{ +int rc; +size_t size = strlen(path); +DECLARE_DOMCTL; +DECLARE_HYPERCALL_BOUNCE(path, size, XC_HYPERCALL_BUFFER_BOUNCE_IN); + +if ( 
xc_hypercall_bounce_pre(xch, path) ) +return -1; + +domctl.cmd = XEN_DOMCTL_test_assign_device; +domctl.domain = (domid_t)domid; + +domctl.u.assign_device.dev = XEN_DOMCTL_DEV_DT; +domctl.u.assign_device.u.dt.size = size; +
Re: [Xen-devel] [PATCH 2/6] x86/numa: Correct the extern of cpu_to_node
At 16:05 +0100 on 09 Apr (1428595536), Andrew Cooper wrote:
> On 09/04/15 16:00, Tim Deegan wrote:
> > At 18:26 +0100 on 07 Apr (1428431176), Andrew Cooper wrote:
> >> --- a/xen/include/asm-x86/numa.h
> >> +++ b/xen/include/asm-x86/numa.h
> >> @@ -9,7 +9,7 @@
> >>
> >>  extern int srat_rev;
> >>
> >> -extern unsigned char cpu_to_node[];
> >> +extern nodeid_t cpu_to_node[NR_CPUS];
> > Does the compiler do anything useful with the array size here?
>
> Specifying the size allows ARRAY_SIZE(cpu_to_node) to work in other
> translation units.  It also allows static analysers to perform bounds
> checks, should they wish.
>
> > In particular does it check that it matches the size at the definition?
>
> It will complain if they are mismatched.

Excellent.  In that case,

Reviewed-by: Tim Deegan <t...@xen.org>

Tim.
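The point about ARRAY_SIZE() can be seen in a few lines of plain C. In this sketch the NR_CPUS value and the nodeid_t typedef are placeholders chosen for the example, not Xen's definitions, and nr_tracked_cpus() is a hypothetical user of the declaration. The header-style declaration and the definition are shown in one file for brevity.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 8                                   /* placeholder value */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

typedef unsigned char nodeid_t;                     /* placeholder typedef */

/* What a header would carry: a declaration with an explicit size. */
extern nodeid_t cpu_to_node[NR_CPUS];

/*
 * Because the declaration has a complete array type, sizeof works here
 * even though the definition has not been seen yet. With
 * "extern nodeid_t cpu_to_node[];" this line would not compile.
 */
static_assert(ARRAY_SIZE(cpu_to_node) == NR_CPUS, "size known from extern");

static size_t nr_tracked_cpus(void)
{
    return ARRAY_SIZE(cpu_to_node);
}

/* The definition; a different size here is rejected as a conflicting type. */
nodeid_t cpu_to_node[NR_CPUS];
```

Changing the definition to, say, `nodeid_t cpu_to_node[NR_CPUS + 1];` makes the compiler reject the file, which is the mismatch check asked about above.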
Re: [Xen-devel] [RFC PATCH 0/7] Intel Cache Monitoring: Current Status and Future Opportunities
2015-04-07 9:10 GMT-04:00 Dario Faggioli dario.faggi...@citrix.com: On Tue, 2015-04-07 at 11:27 +0100, Andrew Cooper wrote: On 04/04/2015 03:14, Dario Faggioli wrote: I'm putting here in the cover letter a markdown document I wrote to better describe my findings and ideas (sorry if it's a bit long! :-D). You can also fetch it at the following links: * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.pdf * http://xenbits.xen.org/people/dariof/CMT-in-scheduling.markdown See the document itself and the changelog of the various patches for details. There seem to be several areas of confusion indicated in your document. I see. Sorry for that then. I am unsure whether this is a side effect of the way you have written it, but here are (hopefully) some words of clarification. And thanks for this. :-) PSR CMT works by tagging cache lines with the currently-active RMID. The cache utilisation is a count of the number of lines which are tagged with a specific RMID. MBM on the other hand counts the number of cache line fills and cache line evictions tagged with a specific RMID. Ok. By this nature, the information will never reveal the exact state of play. e.g. a core with RMID A which gets a cache line hit against a line currently tagged with RMID B will not alter any accounting. So, you're saying that the information we get is an approximation of reality, not it's 100% accurate representation. That is no news, IMO. When, inside Credit2, we try to track the average load on each runqueue, that is an approximation. When, in Credit1, we consider a vcpu cache hot if it run recently, that is an approximation. Etc. These approximations happens fully in software, because it is possible, in those cases. PSR provides data and insights on something that, without hardware support, we couldn't possibly hope to know anything about. 
Whether we should think about using such data or not, it depends whether they are represents a (base for a) reasonable enough approximation, or they are just a bunch of pseudo random numbers. It seems to me that you are suggesting the latter to be more likely than the former, i.e., PSR does not provide a good enough approximation for being used from inside Xen and toolstack, is my understanding correct? Furthermore, as alterations of the RMID only occur in __context_switch(), Xen actions such as handling an interrupt will be accounted against the currently active domain (or other future granularity of RMID). Yes, I thought about this. However, this is certainly important for per-domain, or for a (unlikely) future per-vcpu, monitoring, but if you attach an RMID to a pCPU (or groups of pCPU) then that is not really a problem. Actually, it's the correct behavior: running Xen and serving interrupts in a certain core, in that case, *do* need to be accounted! So, considering that both the document and the RFC series are mostly focused on introducing per-pcpu/core/socket monitoring, rather than on per-domain monitoring, and given that the document was becoming quite long, I decided not to add a section about this. max_rmid is a per-socket property. There is no requirement for it to be the same for each socket in a system, although it is likely, given a homogeneous system. I know. Again this was not mentioned for document length reasons, but I planned to ask about this (as I've done that already this morning, as you can see. :-D). In this case, though, it probably was something worth being mentioned, so I will if there will ever be a v2 of the document. 
:-) Mostly, I was curious to learn why that is not reflected in the current implementation, i.e., whether there are any reasons why we should not take advantage of per-socketness of RMIDs, as reported by SDM, as that can greatly help mitigating RMID shortage in the per-CPU/core/socket configuration (in general, actually, but it's per-cpu that I'm interested in). The limit on RMID is based on the size of the accounting table. Did not know in details, but it makes sense. Getting feedback on what should be expected as number of available RMIDs in current and future hardware, from Intel people and from everyone who knows (like you :-D ), was the main purpose of sending this out, so thanks. As far as MSRs themselves go, an extra MSR write in the context switch path is likely to pale into the noise. However, querying the data is an indirect MSR read (write to the event select MSR, read from the data MSR). Furthermore there is no way to atomically read all data at once which means that activity on other cores can interleave with back-to-back reads in the scheduler. All true. And in fact, how and how frequent data should be gathered remains to be decided (as said in the document). I was thinking more to some periodic sampling, rather than to throw handfuls of rdmsr/wrmsr against the code that makes scheduling decisions! :-D Actually,
Re: [Xen-devel] [PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock
On Thu, Apr 09, 2015 at 08:13:27PM +0200, Peter Zijlstra wrote: On Mon, Apr 06, 2015 at 10:55:44PM -0400, Waiman Long wrote: +#define PV_HB_PER_LINE (SMP_CACHE_BYTES / sizeof(struct pv_hash_bucket)) +static struct qspinlock **pv_hash(struct qspinlock *lock, struct pv_node *node) +{ + unsigned long init_hash, hash = hash_ptr(lock, pv_lock_hash_bits); + struct pv_hash_bucket *hb, *end; + + if (!hash) + hash = 1; + + init_hash = hash; + hb = &pv_lock_hash[hash_align(hash)]; + for (;;) { + for (end = hb + PV_HB_PER_LINE; hb < end; hb++) { + if (!cmpxchg(&hb->lock, NULL, lock)) { + WRITE_ONCE(hb->node, node); + /* + * We haven't set the _Q_SLOW_VAL yet. So + * the order of writing doesn't matter. + */ + smp_wmb(); /* matches rmb from pv_hash_find */ + goto done; + } + } + + hash = lfsr(hash, pv_lock_hash_bits, 0); Since pv_lock_hash_bits is a variable, you end up running through that massive if() forest to find the corresponding tap every single time. It cannot compile-time optimize it. Hence: hash = lfsr(hash, pv_taps); (I don't get the bits argument to the lfsr). In any case, like I said before, I think we should try a linear probe sequence first, the lfsr was over engineering from my side. + hb = &pv_lock_hash[hash_align(hash)]; So one thing this does -- and one of the reasons I figured I should ditch the LFSR instead of fixing it -- is that you end up scanning each bucket HB_PER_LINE times. The 'fix' would be to LFSR on cachelines instead of HBs but then you're stuck with the 0-th cacheline. + BUG_ON(hash == init_hash); + } + +done: + return &hb->lock; +}
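To make the "linear probe sequence" suggestion concrete, a minimal userspace sketch follows. It is not the patch's code: the table size, struct bucket and probe_insert() are hypothetical names, C11 atomics stand in for the kernel's cmpxchg(), and the memory-ordering details discussed above (smp_wmb() pairing) are deliberately left out.

```c
#include <stdatomic.h>
#include <stddef.h>

#define NBUCKETS 16u    /* hypothetical table size, must be a power of two */

struct bucket {
    _Atomic(void *) lock;   /* NULL means the slot is free */
    void *node;
};

static struct bucket table[NBUCKETS];

/*
 * Claim the first free bucket at or after 'start', wrapping around once.
 * Each bucket is visited exactly once, which avoids the repeated scanning
 * of buckets described above. Returns NULL if the table is full.
 */
static struct bucket *probe_insert(size_t start, void *lock, void *node)
{
    for (size_t i = 0; i < NBUCKETS; i++) {
        struct bucket *b = &table[(start + i) & (NBUCKETS - 1)];
        void *expected = NULL;

        /* Succeeds only if the slot was still free. */
        if (atomic_compare_exchange_strong(&b->lock, &expected, lock)) {
            b->node = node;
            return b;
        }
    }
    return NULL;
}
```

Starting the probe at a cacheline-aligned index, as hash_align() does in the quoted code, would keep the first few probes within one cacheline; the sketch omits that for brevity.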
[Xen-devel] [PATCH v5 p2 09/19] xen/passthrough: iommu_deassign_device_dt: By default reassign device to nobody
From: Julien Grall julien.gr...@linaro.org Currently, when the device is deassigned from a domain, we directly reassign to DOM0. As the device may not have been correctly reset, this may lead to corruption or expose some part of DOM0 memory. Also, we may have no way to reset some platform devices. If Xen reassigns the device to nobody, it may receive some global/context fault because the transaction has failed (indeed the context has been marked invalid). Unfortunately there is no simple way to quiesce a buggy hardware. I think we could live with that for a first version of platform device passthrough. DOM0 will have to issue an hypercall to assign the device to itself if it wants to use it. Signed-off-by: Julien Grall julien.gr...@linaro.org Acked-by: Stefano Stabellini stefano.stabell...@citrix.com Acked-by: Ian Campbell ian.campb...@citrix.com --- Note: This behavior is documented in a following patch which extend DOMCT_*assign_device to support non-PCI passthrough. Changes in v5: - Add Ian's ack Changes in v4: - Add Stefano's ack Changes in v3: - Use the coding style of the new SMMU drivers Changes in v2: - Fix typoes in the commit message - Update commit message --- xen/drivers/passthrough/arm/smmu.c| 8 +++- xen/drivers/passthrough/device_tree.c | 9 +++-- 2 files changed, 10 insertions(+), 7 deletions(-) diff --git a/xen/drivers/passthrough/arm/smmu.c b/xen/drivers/passthrough/arm/smmu.c index 8a9b58b..65de50b 100644 --- a/xen/drivers/passthrough/arm/smmu.c +++ b/xen/drivers/passthrough/arm/smmu.c @@ -2692,7 +2692,7 @@ static int arm_smmu_reassign_dev(struct domain *s, struct domain *t, int ret = 0; /* Don't allow remapping on other domain than hwdom */ - if (t != hardware_domain) + if (t t != hardware_domain) return -EPERM; if (t == s) @@ -2702,6 +2702,12 @@ static int arm_smmu_reassign_dev(struct domain *s, struct domain *t, if (ret) return ret; + if (t) { + ret = arm_smmu_assign_dev(t, devfn, dev); + if (ret) + return ret; + } + return 0; } diff --git 
a/xen/drivers/passthrough/device_tree.c b/xen/drivers/passthrough/device_tree.c index 05ab274..0ec4103 100644 --- a/xen/drivers/passthrough/device_tree.c +++ b/xen/drivers/passthrough/device_tree.c @@ -80,15 +80,12 @@ int iommu_deassign_dt_device(struct domain *d, struct dt_device_node *dev) spin_lock(dtdevs_lock); -rc = hd-platform_ops-reassign_device(d, hardware_domain, - 0, dt_to_dev(dev)); +rc = hd-platform_ops-reassign_device(d, NULL, 0, dt_to_dev(dev)); if ( rc ) goto fail; -list_del(dev-domain_list); - -dt_device_set_used_by(dev, hardware_domain-domain_id); -list_add(dev-domain_list, domain_hvm_iommu(hardware_domain)-dt_devices); +list_del_init(dev-domain_list); +dt_device_set_used_by(dev, DOMID_IO); fail: spin_unlock(dtdevs_lock); -- 2.1.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] xen/pci: Try harder to get PXM information for Xen
On 08/04/15 18:17, Boris Ostrovsky wrote: On 04/08/2015 12:44 PM, David Vrabel wrote: On 08/04/15 15:01, Boris Ostrovsky wrote: On 04/08/2015 09:39 AM, Ross Lagerwall wrote: If the device being added to Xen is not contained in the ACPI table, walk the PCI device tree to find a parent that is contained in the ACPI table before finding the PXM information from this device. Signed-off-by: Ross Lagerwall ross.lagerw...@citrix.com --- drivers/xen/pci.c | 15 +-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c index 95ee430..6837181 100644 --- a/drivers/xen/pci.c +++ b/drivers/xen/pci.c @@ -19,6 +19,7 @@ #include <linux/pci.h> #include <linux/acpi.h> +#include <linux/pci-acpi.h> #include <xen/xen.h> #include <xen/interface/physdev.h> #include <xen/interface/xen.h> @@ -67,8 +68,18 @@ static int xen_add_device(struct device *dev) #ifdef CONFIG_ACPI handle = ACPI_HANDLE(&pci_dev->dev); -if (!handle && pci_dev->bus->bridge) -handle = ACPI_HANDLE(pci_dev->bus->bridge); +if (!handle) { +/* + * This device was not listed in the ACPI name space at + * all. Try to get acpi handle of parent pci bus. + */ +struct pci_bus *pbus; +for (pbus = pci_dev->bus; pbus; pbus = pbus->parent) { +handle = acpi_pci_get_bridge_handle(pbus); +if (handle) +break; +} +} #ifdef CONFIG_PCI_IOV if (!handle && pci_dev->is_virtfn) handle = ACPI_HANDLE(physfn->bus->bridge); Shouldn't we first look at physfn, before going up the tree? That sounds sensible but should be a separate pre-requisite patch. It's already there: the last two (unchanged) lines above. The added chunk should just move to after those two. OK, I can swap it around. -- Ross Lagerwall
Re: [Xen-devel] [PATCH V8 00/12] xen: Clean-up of mem_event subsystem
Hi, Sorry for the delay - I have been away. At 22:06 +0100 on 26 Mar (1427407612), Tamas K Lengyel wrote: Tamas K Lengyel (12): xen/mem_event: Cleanup of mem_event structures xen/mem_event: Cleanup mem_event names in rings, functions and domctls xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup xen: Rename mem_event to vm_event tools/tests: Clean-up tools/tests/xen-access x86/hvm: factor out and rename vm_event related functions I have applied these six patches. xen: Introduce monitor_op domctl This one no longer applies cleanly - looks like a conflict with a7511905 (xen: Extend DOMCTL createdomain to support arch configuration) Can you rebase the second half of the series please? Cheers, Tim. xen/vm_event: Deprecate VM_EVENT_FLAG_DUMMY flag xen/vm_event: Decouple vm_event and mem_access. xen/vm_event: Relocate memop checks xen/xsm: Split vm_event_op into three separate labels xen/vm_event: Add RESUME option to vm_event_op domctl MAINTAINERS| 6 +- docs/misc/xsm-flask.txt| 2 +- tools/libxc/Makefile | 3 +- tools/libxc/include/xenctrl.h | 59 ++- tools/libxc/xc_domain.c| 28 +- tools/libxc/xc_domain_restore.c| 14 +- tools/libxc/xc_domain_save.c | 4 +- tools/libxc/xc_hvm_build_x86.c | 2 +- tools/libxc/xc_mem_access.c| 56 ++- tools/libxc/xc_mem_paging.c| 80 ++-- tools/libxc/xc_memshr.c| 29 +- tools/libxc/xc_monitor.c | 137 +++ tools/libxc/xc_private.h | 15 +- tools/libxc/{xc_mem_event.c = xc_vm_event.c} | 59 +-- tools/libxc/xg_save_restore.h | 2 +- tools/tests/xen-access/xen-access.c| 264 + tools/xenpaging/pagein.c | 2 +- tools/xenpaging/xenpaging.c| 155 tools/xenpaging/xenpaging.h| 8 +- xen/arch/x86/Makefile | 1 + xen/arch/x86/domain.c | 2 +- xen/arch/x86/domctl.c | 4 +- xen/arch/x86/hvm/Makefile | 3 +- xen/arch/x86/hvm/emulate.c | 8 +- xen/arch/x86/hvm/event.c | 196 ++ xen/arch/x86/hvm/hvm.c | 189 + xen/arch/x86/hvm/vmx/vmcs.c| 11 +- xen/arch/x86/hvm/vmx/vmx.c | 9 +- xen/arch/x86/mm/hap/nested_ept.c | 4 +- xen/arch/x86/mm/hap/nested_hap.c | 4 +- 
xen/arch/x86/mm/mem_paging.c | 61 +-- xen/arch/x86/mm/mem_sharing.c | 180 - xen/arch/x86/mm/p2m-pod.c | 4 +- xen/arch/x86/mm/p2m-pt.c | 4 +- xen/arch/x86/mm/p2m.c | 271 +++-- xen/arch/x86/monitor.c | 195 ++ xen/arch/x86/x86_64/compat/mm.c| 24 +- xen/arch/x86/x86_64/mm.c | 24 +- xen/common/Makefile| 18 +- xen/common/domain.c| 12 +- xen/common/domctl.c| 17 +- xen/common/mem_access.c| 55 +-- xen/common/{mem_event.c = vm_event.c} | 505 + xen/drivers/passthrough/pci.c | 2 +- xen/include/asm-arm/monitor.h | 35 ++ xen/include/asm-arm/p2m.h | 22 +- xen/include/asm-x86/domain.h | 26 +- xen/include/asm-x86/hvm/domain.h | 1 - xen/include/asm-x86/hvm/emulate.h | 2 +- xen/include/asm-x86/hvm/event.h| 40 ++ xen/include/asm-x86/hvm/hvm.h | 11 - xen/include/asm-x86/mem_paging.h | 5 +- xen/include/asm-x86/mem_sharing.h | 4 +- xen/include/asm-x86/monitor.h | 31 ++ xen/include/asm-x86/p2m.h | 41 +- xen/include/public/domctl.h| 113 -- xen/include/public/hvm/params.h| 11 +- xen/include/public/memory.h| 27 +- xen/include/public/{mem_event.h = vm_event.h} | 183 ++--- xen/include/xen/mem_access.h | 18 +- xen/include/xen/p2m-common.h | 4 +- xen/include/xen/sched.h| 28 +- xen/include/xen/{mem_event.h = vm_event.h}| 103 ++--- xen/include/xsm/dummy.h| 22 +- xen/include/xsm/xsm.h | 35 +- xen/xsm/dummy.c| 13 +- xen/xsm/flask/hooks.c | 66 +++-
[Xen-devel] [PATCH 1/3] xen/x86: Infrastructure to create BUG_FRAMES in asm code
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com --- xen/include/asm-x86/bug.h | 48 - 1 file changed, 43 insertions(+), 5 deletions(-) diff --git a/xen/include/asm-x86/bug.h b/xen/include/asm-x86/bug.h index cd862e3..365c6b8 100644 --- a/xen/include/asm-x86/bug.h +++ b/xen/include/asm-x86/bug.h @@ -5,6 +5,13 @@ #define BUG_LINE_LO_WIDTH (31 - BUG_DISP_WIDTH) #define BUG_LINE_HI_WIDTH (31 - BUG_DISP_WIDTH) +#define BUGFRAME_run_fn 0 +#define BUGFRAME_warn 1 +#define BUGFRAME_bug2 +#define BUGFRAME_assert 3 + +#ifndef __ASSEMBLY__ + struct bug_frame { signed int loc_disp:BUG_DISP_WIDTH; unsigned int line_hi:BUG_LINE_HI_WIDTH; @@ -22,11 +29,6 @@ struct bug_frame { ((1 BUG_LINE_LO_WIDTH) - 1))) #define bug_msg(b) ((const char *)(b) + (b)-msg_disp[1]) -#define BUGFRAME_run_fn 0 -#define BUGFRAME_warn 1 -#define BUGFRAME_bug2 -#define BUGFRAME_assert 3 - #define BUG_FRAME(type, line, ptr, second_frame, msg) do { \ BUILD_BUG_ON((line) (BUG_LINE_LO_WIDTH + BUG_LINE_HI_WIDTH)); \ asm volatile ( .Lbug%=: ud2\n \ @@ -66,4 +68,40 @@ struct bug_frame { __stop_bug_frames_2[], __stop_bug_frames_3[]; +#else /* !__ASSEMBLY__ */ + +/* + * Construct a bugframe, suitable for using in assembly code. Should always + * match the C version above. 
One complication is having to stash the strings + * in .rodata (TODO - figure out how to get GAS to elide duplicate file_str's) + */ +.macro BUG_FRAME type, line, file_str, second_frame, msg +92: ud2a + +.pushsection .rodata +94: .asciz \file_str +.popsection + +.pushsection .bug_frames.\type, a, @progbits +93: +.long (92b - 93b) + ((\line BUG_LINE_LO_WIDTH) BUG_DISP_WIDTH) +.long (94b - 93b) + ((\line ((1 BUG_LINE_LO_WIDTH) - 1)) BUG_DISP_WIDTH) + +.if \second_frame + .pushsection .rodata + 95: .asciz \msg + .popsection +.long 0, (95b - 93b) +.endif +.popsection +.endm + +#define WARN() BUG_FRAME BUGFRAME_warn, __LINE__, __FILE__, 0, 0 +#define BUG() BUG_FRAME BUGFRAME_bug, __LINE__, __FILE__, 0, 0 + +#define ASSERT_FAILED(msg) \ + BUG_FRAME BUGFRAME_assert, __LINE__, __FILE__, 1, msg + +#endif /* !__ASSEMBLY__ */ + #endif /* __X86_BUG_H__ */ -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH 2/3] xen/x86: Use real assert frames for ASSERT_INTERRUPTS_{EN, DIS}ABLED
Signed-off-by: Andrew Cooper andrew.coop...@citrix.com CC: Keir Fraser k...@xen.org CC: Jan Beulich jbeul...@suse.com --- xen/include/asm-x86/asm_defns.h | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/xen/include/asm-x86/asm_defns.h b/xen/include/asm-x86/asm_defns.h index 1674c7c..e8a678e 100644 --- a/xen/include/asm-x86/asm_defns.h +++ b/xen/include/asm-x86/asm_defns.h @@ -6,6 +6,7 @@ /* NB. Auto-generated from arch/.../asm-offsets.c */ #include <asm/asm-offsets.h> #endif +#include <asm/bug.h> #include <asm/processor.h> #include <asm/percpu.h> #include <xen/stringify.h> @@ -26,18 +27,24 @@ #endif #ifndef NDEBUG -#define ASSERT_INTERRUPT_STATUS(x) \ -pushf; \ -testb $X86_EFLAGS_IF>>8,1(%rsp);\ -j##x 1f; \ -ud2a; \ -1: addq $8,%rsp; +#define ASSERT_INTERRUPTS_ENABLED \ +pushf; \ +testb $X86_EFLAGS_IF>>8,1(%rsp);\ +jnz 1f; \ +ASSERT_FAILED("INTERRUPTS ENABLED");\ +1: addq $8,%rsp; + +#define ASSERT_INTERRUPTS_DISABLED \ +pushf; \ +testb $X86_EFLAGS_IF>>8,1(%rsp);\ +jz 1f; \ +ASSERT_FAILED("INTERRUPTS DISABLED"); \ +1: addq $8,%rsp; #else -#define ASSERT_INTERRUPT_STATUS(x) +#define ASSERT_INTERRUPTS_ENABLED +#define ASSERT_INTERRUPTS_DISABLED #endif -#define ASSERT_INTERRUPTS_ENABLED ASSERT_INTERRUPT_STATUS(nz) -#define ASSERT_INTERRUPTS_DISABLED ASSERT_INTERRUPT_STATUS(z) /* * This flag is set in an exception frame when registers R12-R15 did not get -- 1.7.10.4
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On Thu, 9 Apr 2015, Eric Dumazet wrote: On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote: Hi all, I found a performance regression when running netperf -t TCP_MAERTS from an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the virtual machine are 30% slower than v3.18. Through bisection I found that the perf regression is caused by the presence of the following commit in the guest kernel: commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 Author: Eric Dumazet eduma...@google.com Date: Sun Dec 7 12:22:18 2014 -0800 tcp: refine TSO autosizing A simple revert would fix the issue. Does anybody have any ideas on what could be the cause of the problem? Suggestions on what to do to fix it? You sent this to lkml while networking discussions are on netdev. This topic had been discussed on netdev multiple times. Sorry, and many thanks for the quick reply! This commit restored original TCP Small Queue behavior, which is the first step to fight bufferbloat. Some network drivers are known to be problematic because of a delayed TX completion. So far this commit did not impact max single flow throughput on 40Gb mlx4 NIC (i.e. line rate is possible). Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it makes a difference? A very big difference: echo 262144 > /proc/sys/net/ipv4/tcp_limit_output_bytes brings us much closer to the original performance; the slowdown is just 8%. echo 1048576 > /proc/sys/net/ipv4/tcp_limit_output_bytes fills the gap entirely, giving the same performance as before "tcp: refine TSO autosizing". What would be the next step here? Should I just document this as an important performance tweaking step for Xen, or is there something else we can do?
[Xen-devel] [PATCH v2 1/2] osstest: update FreeBSD guests to 10.1
Update FreeBSD guests in OSSTest to FreeBSD 10.1. The following images should be placed in the osstest images folder: ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/amd64/Latest/FreeBSD-10.1-RELEASE-amd64.raw.xz ftp://ftp.freebsd.org/pub/FreeBSD/releases/VM-IMAGES/10.1-RELEASE/i386/Latest/FreeBSD-10.1-RELEASE-i386.raw.xz Since new images are in raw format rather than qcow2 remove the runes to convert from qcow2 to raw. Signed-off-by: Roger Pau Monné roger@citrix.com Cc: Ian Jackson ian.jack...@eu.citrix.com --- Changes since v1: - Remove the runes to convert the image from qcow2 to raw. --- make-flight| 2 +- ts-freebsd-install | 7 ++- 2 files changed, 3 insertions(+), 6 deletions(-) diff --git a/make-flight b/make-flight index 8ac3a87..b340d78 100755 --- a/make-flight +++ b/make-flight @@ -150,7 +150,7 @@ do_freebsd_tests () { job_create_test test-$xenarch$kern-$dom0arch$qemuu_suffix-freebsd10-$freebsdarch \ test-freebsd xl $xenarch $dom0arch \ freebsd_arch=$freebsdarch \ - freebsd_image=${FREEBSD_IMAGE_PREFIX-FreeBSD-10.0-RELEASE-}$freebsdarch${FREEBSD_IMAGE_SUFFIX--20140116-r260789.qcow2.xz} \ + freebsd_image=${FREEBSD_IMAGE_PREFIX-FreeBSD-10.1-RELEASE-}$freebsdarch${FREEBSD_IMAGE_SUFFIX-.raw.xz} \ all_hostflags=$most_hostflags done diff --git a/ts-freebsd-install b/ts-freebsd-install index 6c6abbe..61d2f83 100755 --- a/ts-freebsd-install +++ b/ts-freebsd-install @@ -51,8 +51,7 @@ our $freebsd_vm_repo= '/var/images'; sub prep () { my $authkeys= authorized_keys(); -target_install_packages_norec($ho, qw(rsync lvm2 qemu-utils - xz-utils kpartx)); +target_install_packages_norec($ho, qw(rsync lvm2 xz-utils kpartx)); $gho= prepareguest($ho, $gn, $guesthost, 22, $disk_mb + 1, @@ -76,9 +75,7 @@ sub prep () { target_cmd_root($ho, END, 900); set -ex -xz -dkc $rimage $rimagebase.qcow2 -qemu-img convert -f qcow2 $rimagebase.qcow2 -O raw $rimagebase.raw -rm $rimagebase.qcow2 +xz -dkc $rimage $rimagebase.raw dd if=$rimagebase.raw of=$gho-{Lvdev} bs=1M rm 
$rimagebase.raw -- 1.9.5 (Apple Git-50.3) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle
Julien Grall writes (Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle): On 09/04/15 17:17, Ian Jackson wrote: I have to say I have no idea what a phandle is... A phandle is a way to reference another node in the device tree. Any node that can be referenced defines a phandle property with a unique unsigned 32-bit value. Thanks for the explanation. Reserve the ID 65000 for the GIC phandle. I think we can safely assume that the partial device tree will never contain such an ID. Do we control the DT compiler? What if it should change its phandle allocation algorithm? We don't control the DT compiler. But the phandle allocation algorithm is unlikely to change. FWIW, the compiler is very tiny, it's not GCC. Right. I only expect people to use the partial device tree in very specific use cases. A generic use case is not even possible with the current status of non-PCI (i.e. device tree) passthrough. So people control their environment. As I said later in the patch, supporting dynamic allocation will require some rework in the device tree creation for the guest. So I was suggesting this solution as temporary in order not to block the DT passthrough. What would happen if our assumption about the DT compiler were violated? Ian.
Re: [Xen-devel] remove entry in shadow table
Hi, At 18:15 +0200 on 06 Apr (1428344115), HANNAS YAYA Issa wrote: I want to remove the entry for a given page from the shadow page table so that the next time the guest accesses the page there is a page fault. Here is what I try to do: 1. I have a timer which wakes up every 30 seconds and removes the entry from the shadow by calling sh_remove_all_mappings(d->vcpu[0], _mfn(page_to_mfn(page))); here d is the domain and page is the page that I want to remove from the shadow page table. 2. In the function sh_page_fault() I get the gmfn and compare it with the mfn of the page that I removed earlier from the shadow page table. Is this method correct? Yes, though it may be extremely slow if you're doing it for large numbers of mfns, since sh_remove_all_mappings() may have to do a brute-force search of all PTEs for each one. You should probably put your check for the gmfn in _sh_propagate(), rather than sh_page_fault(). That way it will also see things like prefetched mappings. I also get this error: sh error: sh_remove_all_mappings(): can't find all mappings of mfn That usually means that there's a mapping of that frame from another domain. Tim.
Re: [Xen-devel] [PATCH V8 00/12] xen: Clean-up of mem_event subsystem
On Thu, Apr 9, 2015 at 1:07 PM, Tamas Lengyel tamas.leng...@zentific.com wrote: On Thu, Apr 9, 2015 at 1:03 PM, Tim Deegan t...@xen.org wrote: Hi, Sorry for the delay - I have been away. At 22:06 +0100 on 26 Mar (1427407612), Tamas K Lengyel wrote: Tamas K Lengyel (12): xen/mem_event: Cleanup of mem_event structures xen/mem_event: Cleanup mem_event names in rings, functions and domctls xen/mem_paging: Convert mem_event_op to mem_paging_op and cleanup xen: Rename mem_event to vm_event tools/tests: Clean-up tools/tests/xen-access x86/hvm: factor out and rename vm_event related functions I have applied these six patches. xen: Introduce monitor_op domctl This one no longer applies cleanly - looks like a conflict with a7511905 (xen: Extend DOMCTL createdomain to support arch configuration) Can you rebase the second half of the series please? Absolutely. Will be sending it shortly, thanks. Tamas Cheers, Tim. What's the policy on reusing DOMCTL numbers? I see XEN_DOMCTL_arm_configure_domain has been retired in the conflicting patch. Should I just reuse its number for monitor_op? For the most part domctl numbers seem to be contiguous but there are holes (30-32) so I'm not sure. Tamas
Re: [Xen-devel] [PATCH v4 30/33] tools/libxl: arm: Use an higher value for the GIC phandle
Hi Ian, On 31/03/15 12:43, Ian Campbell wrote: On Thu, 2015-03-19 at 19:29 +, Julien Grall wrote: The partial device tree may contain phandles. The Device Tree Compiler tends to allocate the phandles from 1. Reserve the ID 65000 for the GIC phandle. I think we can safely assume that the partial device tree will never contain such an ID. Signed-off-by: Julien Grall julien.gr...@linaro.org Cc: Ian Jackson ian.jack...@eu.citrix.com Cc: Wei Liu wei.l...@citrix.com --- It's not easily possible to track the maximum phandle in the partial device tree. We would need to parse it twice: once to find the maximum phandle, and once to copy the nodes. This is because we have to know the phandle of the GIC when we create the properties of the root. Or you could fill it in post-hoc like we do with e.g. the initramfs location? That would work. I will look into it as a follow-up to this patch series. Anyway, this'll do for now: Acked-by: Ian Campbell ian.ampb...@citrix.com As the phandle is encoded as an unsigned 32-bit value, I could use a higher value. Though, having 65000 phandles is already a lot... TODO: If it's necessary, I can check if the value has been used by another phandle in the device tree. If that's easy enough to add then yes please, but if it is complex then don't bother. I would prefer to postpone this and replace it with a follow-up that allocates the phandle dynamically. Regards, -- Julien Grall
Re: [Xen-devel] tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]
On Thu, 2015-04-09 at 18:25 +0100, Ian Jackson wrote: root@bedbug:~# ethtool -S eth0 | grep -v ': 0$' NIC statistics: rx_octets: 8196868 rx_ucast_packets: 633 rx_mcast_packets: 1 rx_bcast_packets: 123789 tx_octets: 42854 tx_ucast_packets: 9 tx_mcast_packets: 8 tx_bcast_packets: 603 root@bedbug:~# ifconfig eth0 eth0 Link encap:Ethernet HWaddr 00:13:72:14:c0:51 inet addr:10.80.249.102 Bcast:10.80.251.255 Mask:255.255.252.0 inet6 addr: fe80::213:72ff:fe14:c051/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:124774 errors:0 dropped:88921 overruns:0 frame:0 TX packets:620 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:8222158 (7.8 MiB) TX bytes:42854 (41.8 KiB) Interrupt:17 root@bedbug:~# It appears therefore that packets are being corrupted on the receive path, and the kernel then drops them (as misaddressed). thanks for the repo, the RX drop counter is updated at few places in the driver. Please use the attached debug patch and provide the logs From 777363eb77bddd52b9983c0025fed8b4ec151417 Mon Sep 17 00:00:00 2001 From: Prashant Sreedharan prash...@broadcom.com Date: Thu, 9 Apr 2015 10:52:17 -0700 Subject: [stable: 3.14.37]tg3: debug_patch --- drivers/net/ethernet/broadcom/tg3.c | 13 +++-- 1 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c index 8206113..5e2c9d6 100644 --- a/drivers/net/ethernet/broadcom/tg3.c +++ b/drivers/net/ethernet/broadcom/tg3.c @@ -6871,8 +6871,11 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget) skb_size = tg3_alloc_rx_data(tp, tpr, opaque_key, *post_ptr, frag_size); - if (skb_size 0) + if (skb_size 0) { +netdev_err(tp-dev, alloc_rx failure %x %x %x\n, + skb_size, opaque_key, frag_size); goto drop_it; + } pci_unmap_single(tp-pdev, dma_addr, skb_size, PCI_DMA_FROMDEVICE); @@ -6886,6 +6889,8 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget) skb = build_skb(data, frag_size); if (!skb) { 
+netdev_err(tp-dev, build_skb failure %d\n, + frag_size); tg3_frag_free(frag_size != 0, data); goto drop_it_no_recycle; } @@ -6896,8 +6901,10 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget) skb = netdev_alloc_skb(tp-dev, len + TG3_RAW_IP_ALIGN); - if (skb == NULL) + if (skb == NULL) { +netdev_err(tp-dev, alloc_skb fail %d\n, len); goto drop_it_no_recycle; + } skb_reserve(skb, TG3_RAW_IP_ALIGN); pci_dma_sync_single_for_cpu(tp-pdev, dma_addr, len, PCI_DMA_FROMDEVICE); @@ -6925,6 +6932,8 @@ static int tg3_rx(struct tg3_napi *tnapi, int budget) if (len (tp-dev-mtu + ETH_HLEN) skb-protocol != htons(ETH_P_8021Q) skb-protocol != htons(ETH_P_8021AD)) { + netdev_err(tp-dev, Proto %x %x\n, + skb-protocol, len); dev_kfree_skb(skb); goto drop_it_no_recycle; } -- 1.7.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] machine address
At 10:26 +0100 on 31 Mar (1427797566), George Dunlap wrote: On Mon, Mar 30, 2015 at 3:00 PM, HANNAS YAYA Issa issa.hannasy...@enseeiht.fr wrote: Hi When there is a page fault, the handler for the page fault in the hypervisor is do_page_fault in xen/arch/x86/traps.c, right? That's for PV guests. For HVM guests, the page fault causes a VMEXIT, which will be handled in xen/arch/x86/hvm/vmx/vmx.c:vmx_vmexit_handler() (on Intel). In this function I found a method read_cr2() which returns the virtual address of the page that generated the page fault. My question is: is it possible to get the machine address of the page table entry for this virtual address? In general the way you have to do that is to use the virtual address to walk the guest's pagetables (exactly the same way the hardware would do on a TLB miss). For HVM guests (or PV guests in shadow mode) there's already code to do the walk for you in xen/arch/x86/mm/guest_walk.c:guest_walk(). You can see how it's called from the HAP code and the shadow code if you want. I don't immediately see a walker for PV guests. There's one in __page_fault_type(). You can also use the linear pagetable mappings, if you know what you're doing -- see, e.g., guest_map_l1e() which does something very like this. Cheers, Tim.
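As a rough illustration of the first step of such a walk, the sketch below shows only the index arithmetic for 4-level x86-64 paging with 4 KiB pages and 8-byte entries. It is not Xen code: pt_index() and pte_addr() are hypothetical helpers, and a real walk must also read each entry, check the present bit, handle superpages and, for a guest, translate guest frame numbers to machine frames at every level.

```c
#include <stdint.h>

#define PAGE_SHIFT  12                          /* 4 KiB pages */
#define LEVEL_BITS  9                           /* 512 entries per table */
#define LEVEL_MASK  ((1u << LEVEL_BITS) - 1)
#define PTE_SIZE    8                           /* bytes per entry */

/* Index into the table at 'level': 1 = L1 (leaf) ... 4 = L4 (top level). */
static unsigned int pt_index(uint64_t va, unsigned int level)
{
    return (va >> (PAGE_SHIFT + LEVEL_BITS * (level - 1))) & LEVEL_MASK;
}

/* Address of the entry for 'va' inside the table located at 'table_base'. */
static uint64_t pte_addr(uint64_t table_base, uint64_t va, unsigned int level)
{
    return table_base + (uint64_t)pt_index(va, level) * PTE_SIZE;
}
```

Given the machine address of the L1 table that maps the faulting address, pte_addr(l1_base, va, 1) would be the machine address of the entry being asked about; obtaining l1_base is the part that requires the full walk.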
Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle
On 09/04/15 17:52, Ian Jackson wrote: Julien Grall writes (Re: [Xen-devel] [PATCH v5 p2 16/19] tools/libxl: arm: Use an higher value for the GIC phandle): On 09/04/15 17:17, Ian Jackson wrote: I only expect people to use the partial device tree in very specific use cases. A generic use case is not even possible with the current status of non-PCI (i.e. device tree) passthrough. So people control their environment. As I said later in the patch, supporting dynamic allocation will require some rework in the device tree creation for the guest. So I was suggesting this solution as temporary in order not to block the DT passthrough. What would happen if our assumption about the DT compiler were violated? The phandle would be present in 2 different nodes of the DT. FYI, that may also happen if a user uses the same phandle twice in the partial DT. The guest may retrieve the wrong node and warn/crash depending on the implementation. However, it will impact neither Xen nor the toolstack. Regards, -- Julien Grall
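For what it's worth, the collision check mentioned as a TODO elsewhere in this thread could look roughly like the sketch below. Everything here is an assumption for illustration: the constant name GUEST_PHANDLE_GIC, the helper phandle_in_use(), and the idea that the toolstack has already collected the phandle values found in the partial device tree into a plain array (how they are collected is not shown).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value reserved in the patch description; the macro name is hypothetical. */
#define GUEST_PHANDLE_GIC 65000u

/*
 * Return true if 'candidate' already appears among the phandles gathered
 * from the partial device tree.
 */
static bool phandle_in_use(const uint32_t *phandles, size_t n,
                           uint32_t candidate)
{
    for (size_t i = 0; i < n; i++)
        if (phandles[i] == candidate)
            return true;

    return false;
}
```

A caller could refuse to build the guest device tree when phandle_in_use(list, n, GUEST_PHANDLE_GIC) is true, which turns the silent duplicate described above into an early, explicit error.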
Re: [Xen-devel] [Patch V2 07/15] xen: check memory area against e820 map
On 09/04/15 07:55, Juergen Gross wrote: Provide a service routine to check a physical memory area against the E820 map. The routine will return false if the complete area is RAM according to the E820 map and true otherwise. Signed-off-by: Juergen Gross jgr...@suse.com --- arch/x86/xen/setup.c | 23 +++ arch/x86/xen/xen-ops.h | 1 + 2 files changed, 24 insertions(+) diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c index 87251b4..4666adf 100644 --- a/arch/x86/xen/setup.c +++ b/arch/x86/xen/setup.c @@ -573,6 +573,29 @@ static unsigned long __init xen_count_remap_pages(unsigned long max_pfn) return extra; } +bool __init xen_chk_e820_reserved(phys_addr_t start, phys_addr_t size) Can you rename this to xen_is_e820_reserved()? Otherwise, Reviewed-by: David Vrabel david.vra...@citrix.com David
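The body of the routine is not quoted in this message, so the following is only a sketch of the behaviour the commit message describes (false when the whole area is RAM, true otherwise). The struct, the type constants, and the `toy_*` names are assumptions for this example, and the sketch makes the simplifying assumption that the area must fit inside a single RAM entry; adjacent RAM entries are not merged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified E820-style map entry. */
enum toy_e820_type { TOY_E820_RAM = 1, TOY_E820_RESERVED = 2 };

struct toy_e820_entry {
    uint64_t addr;
    uint64_t size;
    enum toy_e820_type type;
};

/*
 * Return false if [start, start + size) lies entirely inside one RAM
 * entry of the map, true otherwise. An empty area is treated as not
 * reserved.
 */
static bool toy_is_e820_reserved(const struct toy_e820_entry *map, size_t nr,
                                 uint64_t start, uint64_t size)
{
    if (size == 0)
        return false;

    uint64_t end = start + size;

    for (size_t i = 0; i < nr; i++) {
        const struct toy_e820_entry *e = &map[i];

        if (e->type == TOY_E820_RAM && e->addr <= start &&
            e->addr + e->size >= end)
            return false;   /* completely covered by RAM */
    }
    return true;            /* not (entirely) RAM */
}
```

Note the inverted sense of the return value: a caller asks "is any part of this area unusable?", so `true` is the failure case.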
Re: [Xen-devel] [PATCH v2] xen/pci: Try harder to get PXM information for Xen
On 09/04/15 08:05, Ross Lagerwall wrote: If the device being added to Xen is not contained in the ACPI table, walk the PCI device tree to find a parent that is contained in the ACPI table before finding the PXM information from this device. Previously, it would try to get a handle for the device, then the device's bridge, then the physfn. This changes the order so that it tries to get a handle for the device, then the physfn, then walks up the PCI device tree. Applied to devel/for-linus-4.1, thanks. David
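The new lookup order described above (device, then physfn, then ancestors) can be sketched as follows. This is a toy model rather than the kernel code: `struct toy_dev`, its fields, and `toy_find_pxm()` are hypothetical, and a device that ACPI does not describe is represented simply by a negative `pxm`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified view of a PCI device for this example. */
struct toy_dev {
    const struct toy_dev *parent;  /* upstream bridge, NULL at the root */
    const struct toy_dev *physfn;  /* physical function of a VF, else NULL */
    int pxm;                       /* proximity domain, or -1 if not in ACPI */
};

/*
 * Look for PXM information in the order the patch description gives:
 * the device itself, then its physfn, then each ancestor in turn.
 * Returns -1 if nothing along the way is described by ACPI.
 */
static int toy_find_pxm(const struct toy_dev *dev)
{
    if (dev->pxm >= 0)
        return dev->pxm;

    if (dev->physfn && dev->physfn->pxm >= 0)
        return dev->physfn->pxm;

    for (const struct toy_dev *p = dev->parent; p; p = p->parent)
        if (p->pxm >= 0)
            return p->pxm;

    return -1;
}
```

Walking all the way to the root, rather than stopping at the immediate bridge, is what lets devices behind several bridges still inherit a proximity domain.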
Re: [Xen-devel] tcp: refine TSO autosizing causes performance regression on Xen
On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote: Hi all, I found a performance regression when running netperf -t TCP_MAERTS from an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the virtual machine are 30% slower than v3.18. Through bisection I found that the perf regression is caused by the presence of the following commit in the guest kernel: commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89 Author: Eric Dumazet eduma...@google.com Date: Sun Dec 7 12:22:18 2014 -0800 tcp: refine TSO autosizing A simple revert would fix the issue. Does anybody have any ideas on what could be the cause of the problem? Suggestions on what to do to fix it? You sent this to lkml while networking discussions are on netdev. This topic has been discussed on netdev multiple times. This commit restored the original TCP Small Queue behavior, which is the first step to fight bufferbloat. Some network drivers are known to be problematic because of a delayed TX completion. So far this commit did not impact max single flow throughput on a 40Gb mlx4 NIC (i.e. line rate is possible). Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it makes a difference?