Version 15 of PV(H) PMU patches. Changes in v15:
* Rewrote vpmu_force_context_switch() to use continue_hypercall_on_cpu() * Added vpmu_init initcall that will call vendor-specific init routines * Added a lock to vpmu_struct to serialize pmu_init()/pmu_finish() * Use SS instead of CS for DPL (instead of RPL) * Don't take lock for XENPMU_mode_get * Make vmpu_mode/features an unsigned int (from uint64_t) * Adjusted pvh_hypercall64_table[] order * Replaced address range check [XEN_VIRT_START..XEN_VIRT_END] with guest_mode() * A few style cleanups Changes in v14: * Moved struct xen_pmu_regs to pmu.h * Moved CHECK_pmu_* to an earlier patch (when structures are first introduced) * Added PMU_SAMPLE_REAL flag to indicate whether the sample was taken in real mode * Simplified slightly setting rules for xenpmu_data flags * Rewrote vpmu_force_context_switch() to again use continuations. (Returning EAGAIN to user would mean that VPMU mode may get into inconsistent state (across processors) and dealing with that is more compicated than I'd like). * Fixed msraddr_to_bitpos() and converted it into an inline * Replaced address range check in vmpu_do_interrupt() with guest_mode() * No error returns from __initcall * Rebased on top of recent VPMU changes * Various cleanups Changes in v13: * Rearranged data in xenpf_symdata to eliminate a hole (no change in structure size) * Removed unnecessary zeroing of last character in name string during symbol read hypercall * Updated comment in access_vectors for pmu_use operation * Compute AMD MSR bank size at runtime * Moved a couple of BUILD_BUG_ON later, to when the structures they are checking are first used * Added ss and eflags to xen_pmu_registers structure * vpmu_force_context_switch() uses per-cpu tasklet pointers. * Moved most of arch-specific VPMU initialization code into an __initcall() to avoid re-calculating/re-checking things more than once (new patch, #12) * Replaced is_*_domain() with is_*_vcpu() * choose_hwdom_vcpu() will now assume that hardware_domain->vcpu[idx] is always there (callers will still verify whether or not that's true) * Fixed a couple of sampled vs. sampling tests in vpmu_do_interrupt() * Pass CS to the guest unchanged, add pmu_flags's flag to indicate whether the sample was for a user or kernel space. Move pmu_flags from xen_pmu_data into xen_pmu_arch * Removed local msr_content variable from vpmu_do_wrmsr() * Re-arranged code in parse_vpmu_param() * Made vpmu_interrupt_type checks test for value, not individual bits * Moved PMU_SOFTIRQ definition into arch-specific header * Moved vpmu*.c files into xen/arch/x86/cpu/ instead of xen/arch/x86/ * For hypervisor samples, report DOMID_XEN to the guest * Changed logic to select which registers to report to the guest (include RIP check to see whether we are in the hypervisor) Changes in v12: * Added XSM support * Made a valifity check before writing MSR_CORE_PERF_GLOBAL_OVF_CTRL * Updated documentation for 'vpmu=nmi' option * Added more text to a bunch of commit messages (per Konrad's request) Changes in v11: * Replaced cpu_user_regs with new xen_pmu_regs (IP, SP, CS) in xen_pmu_arch. - as part of this re-work noticed that CS registers were set in later patch then needed. Moved those changes to appropriate place * Added new VPMU mode (XENPMU_MODE_HV). Now XENPMU_MODE_SELF will only provide dom0 with its own samples only (i.e. no hypervisor data) and XENPMU_MODE_HV will be what XENPMU_MODE_SELF used to be. * Kept vmx_add_guest_msr()/vmx_add_host_load_msr() as wrappers around vmx_add_msr() * Cleaned up VPMU context switch macros (moved 'if(prev!=next)' back to context_switch()) * Dropped hypercall continuation from vpmu_force_context_switch() and replaced it with -EAGAIN error if hypercall_preempt_check() is true after 2ms. * Kept vpmu_do_rdmsr()/vpmu_do_wrmsr as wrapperd for vpmu_do_msr() * Move context switching patch (#13) earlier in the series (for proper bisection support) * Various comment updates and cleanups * Dropped a bunch of Reviewed-by and all Tested-by tags Changes in v10: * Swapped address and name fields of xenpf_symdata (to make it smaller on 32-bit) * Dropped vmx_rm_guest_msr() as it requires refcountig which makes code more complicated. * Cleaned up vlapic_reg_write() * Call vpmu_destroy() for both HVM and PVH VCPUs * Verify that (xen_pmu_data+PMU register bank) fit into a page * Return error codes from arch-specific VPMU init code * Moved VPMU-related context switch logic into inlines * vpmu_force_context_switch() changes: o Avoid greater than page-sized allocations o Prevent another VCPU from starting VPMU sync while the first sync is in progress * Avoid stack leak in do_xenpmu_op() * Checked validity of Intel VPMU MSR values before they are committed * Fixed MSR handling in traps.c (avoid potential accesses to Intel MSRs on AMD) * Fixed VCPU selection in interrupt handler for 32-bit dom0 (sampled => sampling) * Clarified commit messages (patches 2, 13, 18) * Various cleanups Changes in v9: * Restore VPMU context after context_saved() is called in context_switch(). This is needed because vpmu_load() may end up calling vmx_vmcs_try_enter()->vcpu_pause() and that needs is_running to be correctly set/cleared. (patch 18, dropped review acks) * Added patch 2 to properly manage VPMU_CONTEXT_LOADED * Addressed most of Jan's comments. o Keep track of time in vpmu_force_context_switch() to properly break out of a loop when using hypercall continuations o Fixed logic in calling vpmu_do_msr() in emulate_privileged_op() o Cleaned up vpmu_interrupt() wrt vcpu variable names to (hopefully) make it more clear which vcpu we are using o Cleaned up vpmu_do_wrmsr() o Did *not* replace sizeof(uint64_t) with sizeof(variable) in amd_vpmu_initialise(): throughout the code registers are declared as uint64_t and if we are to add a new type (e.g. reg_t) this should be done in a separate patch, unrelated to this series. o Various more minor cleanups and code style fixes Changes in v8: * Cleaned up a bit definitions of struct xenpf_symdata and xen_pmu_params * Added compat checks for vpmu structures * Converted vpmu flag manipulation macros to inline routines * Reimplemented vpmu_unload_all() to avoid long loops * Reworked PMU fault generation and handling (new patch #12) * Added checks for domain->vcpu[] non-NULLness * Added more comments, renamed some routines and macros, code style cleanup Changes in v7: * When reading hypervisor symbols make the caller pass buffer length (as opposed to having this length be part of the API). Make the hypervisor buffer static, make xensyms_read() return zero-length string on end-of-symbols. Make 'type' field of xenpf_symdata a char, drop compat_pf_symdata definition. * Spread PVH support across patches as opposed to lumping it into a separate patch * Rename vpmu_is_set_all() to vpmu_are_all_set() * Split VPMU cleanup patch in two * Use memmove when copying VMX guest and host MSRs * Make padding of xen_arch_pmu's context union a constand that does not depend on arch context size. * Set interface version to 0.1 * Check pointer validity in pvpmu_init/destroy() * Fixed crash in core2_vpmu_dump() * Fixed crash in vmx_add_msr() * Break handling of Intel and AMD MSRs in traps.c into separate cases * Pass full CS selector to guests * Add lock in pvpmu init code to prevent potential race Changes in v6: * Two new patches: o Merge VMX MSR add/remove routines in vmcs.c (patch 5) o Merge VPMU read/write MSR routines in vpmu.c (patch 14) * Check for pending NMI softirq after saving VPMU context to prevent a newly-scheduled guest from overwriting sampled_vcpu written by de-scheduled VPCU. * Keep track of enabled counters on Intel. This was removed in earlier patches and was a mistake. As result of this change struct vpmu will have a pointer to private context data (i.e. data that is not exposed to a PV(H) guest). Use this private pointer on SVM as well for storing MSR bitmap status (it was unnecessarily exposed to PV guests earlier). Dropped Reviewed-by: and Tested-by: tags from patch 4 since it needs to be reviewed agan (core2_vpmu_do_wrmsr() routine, mostly) * Replaced references to dom0 with hardware_domain (and is_control_domain with is_hardware_domain for consistency) * Prevent non-privileged domains from reading PMU MSRs in VPMU_PRIV_MODE * Reverted unnecessary changes in vpmu_initialise()'s switch statement * Fixed comment in vpmu_do_interrupt Changes in v5: * Dropped patch number 2 ("Stop AMD counters when called from vpmu_save_force()") as no longer needed * Added patch number 2 that marks context as loaded before PMU registers are loaded. This prevents situation where a PMU interrupt may occur while context is still viewed as not loaded. (This is really a bug fix for exsiting VPMU code) * Renamed xenpmu.h files to pmu.h * More careful use of is_pv_domain(), is_hvm_domain(, is_pvh_domain and has_hvm_container_domain(). Also explicitly disabled support for PVH until patch 16 to make distinction between usage of the above macros more clear. * Added support for disabling VPMU support during runtime. * Disable VPMUs for non-privileged domains when switching to privileged profiling mode * Added ARM stub for xen_arch_pmu_t * Separated vpmu_mode from vpmu_features * Moved CS register query to make sure we use appropriate query mechanism for various guest types. * LVTPC is now set from value in shared area, not copied from dom0 * Various code and comments cleanup as suggested by Jan. Changes in v4: * Added support for PVH guests: o changes in pvpmu_init() to accommodate both PV and PVH guests, still in patch 10 o more careful use of is_hvm_domain o Additional patch (16) * Moved HVM interrupt handling out of vpmu_do_interrupt() for NMI-safe handling * Fixed dom0's VCPU selection in privileged mode * Added a cast in register copy for 32-bit PV guests cpu_user_regs_t in vpmu_do_interrupt. (don't want to expose compat_cpu_user_regs in a public header) * Renamed public structures by prefixing them with "xen_" * Added an entry for xenpf_symdata in xlat.lst * Fixed pv_cpuid check for vpmu-specific cpuid adjustments * Varios code style fixes * Eliminated anonymous unions * Added more verbiage to NMI patch description Changes in v3: * Moved PMU MSR banks out from architectural context data structures to allow for future expansion without protocol changes * PMU interrupts can be either NMIs or regular vector interrupts (the latter is the default) * Context is now marked as PMU_CACHED by the hypervisor code to avoid certain race conditions with the guest * Fixed races with PV guest in MSR access handlers * More Intel VPMU cleanup * Moved NMI-unsafe code from NMI handler * Dropped changes to vcpu->is_running * Added LVTPC apic handling (cached for PV guests) * Separated privileged profiling mode into a standalone patch * Separated NMI handling into a standalone patch Changes in v2: * Xen symbols are exported as data structure (as opoosed to a set of formatted strings in v1). Even though one symbol per hypercall is returned performance appears to be acceptable: reading whole file from dom0 userland takes on average about twice as long as reading /proc/kallsyms * More cleanup of Intel VPMU code to simplify publicly exported structures * There is an architecture-independent and x86-specific public include files (ARM has a stub) * General cleanup of public include files to make them more presentable (and to make auto doc generation better) * Setting of vcpu->is_running is now done on ARM in schedule_tail as well (making changes to common/schedule.c architecture-independent). Note that this is not tested since I don't have access to ARM hardware. * PCPU ID of interrupted processor is now passed to PV guest The following patch series adds PMU support in Xen for PV(H) guests. There is a companion patchset for Linux kernel. In addition, another set of changes will be provided (later) for userland perf code. This version has following limitations: * For accurate profiling of dom0/Xen dom0 VCPUs should be pinned. * Hypervisor code is only profiled on processors that have running dom0 VCPUs on them. * No backtrace support. A few notes that may help reviewing: * A shared data structure (xenpmu_data_t) between each PV VPCU and hypervisor CPU is used for passing registers' values as well as PMU state at the time of PMU interrupt. * PMU interrupts are taken by hypervisor either as NMIs or regular vector interrupts for both HVM and PV(H). The interrupts are sent as NMIs to HVM guests and as virtual interrupts to PV(H) guests * PV guest's interrupt handler does not read/write PMU MSRs directly. Instead, it accesses xenpmu_data_t and flushes it to HW it before returning. * PMU mode is controlled at runtime via /sys/hypervisor/pmu/pmu/{pmu_mode,pmu_flags} in addition to 'vpmu' boot option (which is preserved for back compatibility). The following modes are provided: * disable: VPMU is off * enable: VPMU is on. Guests can profile themselves, dom0 profiles itself and Xen * priv_enable: dom0 only profiling. dom0 collects samples for everyone. Sampling in guests is suspended. * /proc/xen/xensyms file exports hypervisor's symbols to dom0 (similar to /proc/kallsyms) * VPMU infrastructure is now used for HVM, PV and PVH and therefore has been moved up from hvm subtree Boris Ostrovsky (21): common/symbols: Export hypervisor symbols to privileged guest x86/VPMU: Manage VPMU_CONTEXT_SAVE flag in vpmu_save_force() x86/VPMU: Set MSR bitmaps only for HVM/PVH guests x86/VPMU: Make vpmu macros a bit more efficient intel/VPMU: Clean up Intel VPMU code vmx: Merge MSR management routines x86/VPMU: Handle APIC_LVTPC accesses intel/VPMU: MSR_CORE_PERF_GLOBAL_CTRL should be initialized to zero x86/VPMU: Add public xenpmu.h x86/VPMU: Make vpmu not HVM-specific x86/VPMU: Interface for setting PMU mode and flags x86/VPMU: Initialize VPMUs with __initcall x86/VPMU: Initialize PMU for PV(H) guests x86/VPMU: Save VPMU state for PV guests during context switch x86/VPMU: When handling MSR accesses, leave fault injection to callers x86/VPMU: Add support for PMU register handling on PV guests x86/VPMU: Handle PMU interrupts for PV guests x86/VPMU: Merge vpmu_rdmsr and vpmu_wrmsr x86/VPMU: Add privileged PMU mode x86/VPMU: NMI-based VPMU support x86/VPMU: Move VPMU files up from hvm/ directory docs/misc/xen-command-line.markdown | 8 +- tools/flask/policy/policy/modules/xen/xen.te | 7 + xen/arch/x86/cpu/Makefile | 1 + xen/arch/x86/cpu/vpmu.c | 873 +++++++++++++++++++++ xen/arch/x86/{hvm/svm/vpmu.c => cpu/vpmu_amd.c} | 245 +++--- .../x86/{hvm/vmx/vpmu_core2.c => cpu/vpmu_intel.c} | 780 +++++++++--------- xen/arch/x86/domain.c | 25 +- xen/arch/x86/hvm/Makefile | 1 - xen/arch/x86/hvm/hvm.c | 1 + xen/arch/x86/hvm/svm/Makefile | 1 - xen/arch/x86/hvm/svm/svm.c | 10 +- xen/arch/x86/hvm/vlapic.c | 3 + xen/arch/x86/hvm/vmx/Makefile | 1 - xen/arch/x86/hvm/vmx/vmcs.c | 84 +- xen/arch/x86/hvm/vmx/vmx.c | 28 +- xen/arch/x86/hvm/vpmu.c | 263 ------- xen/arch/x86/oprofile/op_model_ppro.c | 8 +- xen/arch/x86/platform_hypercall.c | 28 + xen/arch/x86/traps.c | 62 +- xen/arch/x86/x86_64/compat/entry.S | 4 + xen/arch/x86/x86_64/entry.S | 4 + xen/common/event_channel.c | 1 + xen/common/symbols.c | 54 ++ xen/include/Makefile | 2 + xen/include/asm-x86/domain.h | 2 + xen/include/asm-x86/hvm/vcpu.h | 3 - xen/include/asm-x86/hvm/vmx/vmcs.h | 18 +- xen/include/asm-x86/hvm/vmx/vpmu_core2.h | 51 -- xen/include/asm-x86/softirq.h | 3 +- xen/include/asm-x86/{hvm => }/vpmu.h | 96 ++- xen/include/public/arch-arm.h | 3 + xen/include/public/arch-x86/pmu.h | 95 +++ xen/include/public/platform.h | 19 + xen/include/public/pmu.h | 90 +++ xen/include/public/xen.h | 2 + xen/include/xen/hypercall.h | 4 + xen/include/xen/symbols.h | 3 + xen/include/xlat.lst | 6 + xen/include/xsm/dummy.h | 20 + xen/include/xsm/xsm.h | 6 + xen/xsm/dummy.c | 1 + xen/xsm/flask/hooks.c | 28 + xen/xsm/flask/policy/access_vectors | 6 + 43 files changed, 2018 insertions(+), 932 deletions(-) create mode 100644 xen/arch/x86/cpu/vpmu.c rename xen/arch/x86/{hvm/svm/vpmu.c => cpu/vpmu_amd.c} (68%) rename xen/arch/x86/{hvm/vmx/vpmu_core2.c => cpu/vpmu_intel.c} (58%) delete mode 100644 xen/arch/x86/hvm/vpmu.c delete mode 100644 xen/include/asm-x86/hvm/vmx/vpmu_core2.h rename xen/include/asm-x86/{hvm => }/vpmu.h (54%) create mode 100644 xen/include/public/arch-x86/pmu.h create mode 100644 xen/include/public/pmu.h -- 1.8.1.4 _______________________________________________ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel