RE: One off question: Hot vertical scaling of a KVM?

2015-08-20 Thread ROZUMNY, VICTOR
Bandan- Thank you for the quick response and for your time. To answer your question, yes, we have a need to dynamically increase the available RAM that a KVM guest has available to it. i.e. change from 16GB to 32GB with minimal or no interruption of service. I have taken a look at Qemu and f

[PATCH 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node()

2015-08-20 Thread Vlastimil Babka
Perform the same debug checks in alloc_pages_node() as are done in __alloc_pages_node(), by making the former function a wrapper of the latter one. In addition to better diagnostics in DEBUG_VM builds for situations which have been already fatal (e.g. out-of-bounds node id), there are two visible

[PATCH 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-08-20 Thread Vlastimil Babka
The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page allocator: do not check NUMA node ID when the caller knows the node is valid") as an optimized variant of alloc_pages_node(), that doesn't fallback to current node for nid == NUMA_NO_NODE. Unfortunately the name of the func

[PATCH 3/3] mm: use numa_mem_id() in alloc_pages_node()

2015-08-20 Thread Vlastimil Babka
alloc_pages_node() might fail when called with NUMA_NO_NODE and __GFP_THISNODE on a CPU belonging to a memoryless node. To make the local-node fallback more robust and prevent such situations, use numa_mem_id(), which was introduced for similar scenarios in the slab context. Suggested-by: Christop
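The wrapper-plus-fallback arrangement described across this three-patch series can be pictured with a small user-space mock. This is only a sketch under stated assumptions: the `mock_` functions and `MAX_NUMNODES_SKETCH` are hypothetical stand-ins, not the kernel's implementations, and the "allocation" simply returns the node it would have used.

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)
#define MAX_NUMNODES_SKETCH 4 /* hypothetical node count for this mock */

/* Hypothetical stand-in for numa_mem_id(): the nearest node that has memory. */
static int mock_numa_mem_id(void)
{
    return 1;
}

/*
 * Stand-in for __alloc_pages_node(): the caller must pass a valid node id.
 * Returns the node that would be used, so the result can be inspected.
 */
static int mock_alloc_pages_node_exact(int nid)
{
    assert(nid >= 0 && nid < MAX_NUMNODES_SKETCH); /* debug check */
    return nid;
}

/*
 * Stand-in for alloc_pages_node(): resolve NUMA_NO_NODE to a node with
 * memory, then delegate so the same checks run on both paths.
 */
static int mock_alloc_pages_node(int nid)
{
    if (nid == NUMA_NO_NODE)
        nid = mock_numa_mem_id();
    return mock_alloc_pages_node_exact(nid);
}
```

Calling `mock_alloc_pages_node(NUMA_NO_NODE)` goes through the fallback, while an explicit node id is passed straight to the checked variant.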

Re: [PATCH 1/3] mm: rename alloc_pages_exact_node to __alloc_pages_node

2015-08-20 Thread Michal Hocko
On Thu 20-08-15 13:43:20, Vlastimil Babka wrote: > The function alloc_pages_exact_node() was introduced in 6484eb3e2a81 ("page > allocator: do not check NUMA node ID when the caller knows the node is valid") > as an optimized variant of alloc_pages_node(), that doesn't fallback to > current > node

Re: [PATCH 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node()

2015-08-20 Thread Michal Hocko
On Thu 20-08-15 13:43:21, Vlastimil Babka wrote: > Perform the same debug checks in alloc_pages_node() as are done in > __alloc_pages_node(), by making the former function a wrapper of the latter > one. > > In addition to better diagnostics in DEBUG_VM builds for situations which > have been alrea

IRQ affinity on Linux guest

2015-08-20 Thread Mihai Neagu
Hello, I'm trying to assign some IRQ affinities to core 0 by setting smp_affinity to 1. This is on a dual-core embedded Linux virtual machine run with KVM. However, ISRs continue to run on both cores. The same technique works well with QEMU with full software emulation. Here is the output of t
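For readers unfamiliar with the value being written: smp_affinity takes a hexadecimal CPU bitmask, so "1" selects core 0 only. A minimal sketch of building that mask follows; the helper names are hypothetical and the code only formats the string, it does not touch /proc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bitmask with only `cpu` set: cpu 0 -> 0x1, cpu 1 -> 0x2. */
static unsigned long affinity_mask_for_cpu(unsigned int cpu)
{
    assert(cpu < 8 * sizeof(unsigned long));
    return 1UL << cpu;
}

/* Format the mask as the hex string written to /proc/irq/N/smp_affinity. */
static int format_affinity(char *buf, size_t len, unsigned long mask)
{
    return snprintf(buf, len, "%lx", mask);
}
```

On a dual-core guest the only single-core masks are 1 (core 0) and 2 (core 1); 3 allows both.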

Re: [PATCH 2/3] mm: unify checks in alloc_pages_node() and __alloc_pages_node()

2015-08-20 Thread Michal Hocko
On Thu 20-08-15 16:14:34, Michal Hocko wrote: > On Thu 20-08-15 13:43:21, Vlastimil Babka wrote: > > Perform the same debug checks in alloc_pages_node() as are done in > > __alloc_pages_node(), by making the former function a wrapper of the latter > > one. > > > > In addition to better diagnostics

[GIT PULL] KVM/ARM pull request for 4.3

2015-08-20 Thread Marc Zyngier
Hi Paolo, This is the KVM/ARM pull request for Linux 4.3. Some rather major things this time around (guest debug, management of interrupt active state, lazy FP save/restore). Thanks! M. The following changes since commit bc0195aad0daa2ad5b0d76cce22b167bc3435590: Linux 4.2-rc2 (2015-0

[PATCH 24/25] arm64: KVM: Optimize arm64 skip 30-50% vfp/simd save/restore on exits

2015-08-20 Thread Marc Zyngier
From: Mario Smarduch This patch only saves and restores FP/SIMD registers on Guest access. To do this cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit. lmbench, hackbench show significant improvements, for 30-50% exits FP/SIMD context is not saved/restored [chazy/maz: fixed

[PATCH 07/25] KVM: arm64: re-factor hyp.S debug register code

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This is a pre-cursor to sharing the code with the guest debug support. This replaces the big macro that fishes data out of a fixed location with a more general helper macro to restore a set of debug registers. It uses macro substitution so it can be re-used for debug control and

[PATCH 18/25] KVM: arm/arm64: vgic: Relax vgic_can_sample_irq for edge IRQs

2015-08-20 Thread Marc Zyngier
We only set the irq_queued flag for level interrupts, meaning that "!vgic_irq_is_queued(vcpu, irq)" is a good enough predicate for all interrupts. This will allow us to inject edge HW interrupts, for which the state ACTIVE+PENDING is not allowed. Reviewed-by: Christoffer Dall Signed-off-by: Marc

[PATCH 21/25] KVM: arm/arm64: vgic: Add vgic_{get,set}_phys_irq_active

2015-08-20 Thread Marc Zyngier
In order to control the active state of an interrupt, introduce a pair of accessors allowing the state to be set/queried. This only affects the logical state, and the HW state will only be applied at world-switch time. Acked-by: Christoffer Dall Signed-off-by: Marc Zyngier --- include/kvm/arm_

[PATCH 19/25] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts

2015-08-20 Thread Marc Zyngier
In order to be able to feed physical interrupts to a guest, we need to be able to establish the virtual-physical mapping between the two worlds. The mappings are kept in a set of RCU lists, indexed by virtual interrupts. Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier --- arch/arm/kv

[PATCH 06/25] KVM: arm64: guest debug, add support for single-step

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This adds support for single-stepping the guest. To do this we need to manipulate the guest's PSTATE.SS and MDSCR_EL1.SS bits to trigger stepping. We take care to preserve MDSCR_EL1 and trap access to it to ensure we don't affect the apparent state of the guest. As we have to en

[PATCH 20/25] KVM: arm/arm64: vgic: Allow HW interrupts to be queued to a guest

2015-08-20 Thread Marc Zyngier
To allow a HW interrupt to be injected into a guest, we lookup the guest virtual interrupt in the irq_phys_map list, and if we have a match, encode both interrupts in the LR. We also mark the interrupt as "active" at the host distributor level. On guest EOI on the virtual interrupt, the host inte

[PATCH 09/25] KVM: arm64: guest debug, HW assisted debug support

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This adds support for userspace to control the HW debug registers for guest debug. In the debug ioctl we copy an IMPDEF registers into a new register set called host_debug_state. We use the recently introduced vcpu parameter debug_ptr to select which register set is copied into

[PATCH 22/25] KVM: arm/arm64: vgic: Prevent userspace injection of a mapped interrupt

2015-08-20 Thread Marc Zyngier
Virtual interrupts mapped to a HW interrupt should only be triggered from inside the kernel. Otherwise, you could end up confusing the kernel (and the GIC's) state machine. Rearrange the injection path so that kvm_vgic_inject_irq is used for non-mapped interrupts, and kvm_vgic_inject_mapped_irq is

[PATCH 25/25] arm: KVM: keep arm vfp/simd exit handling consistent with arm64

2015-08-20 Thread Marc Zyngier
From: Mario Smarduch After enhancing arm64 FP/SIMD exit handling, ARMv7 VFP exit branch is moved to guest trap handling. This allows us to keep exit handling flow between both architectures consistent. Signed-off-by: Mario Smarduch Signed-off-by: Marc Zyngier --- arch/arm/kvm/interrupts.S | 1

[PATCH 23/25] KVM: arm/arm64: timer: Allow the timer to control the active state

2015-08-20 Thread Marc Zyngier
In order to remove the crude hack where we sneak the masked bit into the timer's control register, make use of the phys_irq_map API to control the active state of the interrupt. This causes some limited changes to allow for potential error propagation. Reviewed-by: Christoffer Dall Signed-off-by: M

[PATCH 15/25] arm/arm64: KVM: Move vgic handling to a non-preemptible section

2015-08-20 Thread Marc Zyngier
As we're about to introduce some serious GIC-poking to the vgic code, it is important to make sure that we're going to poke the part of the GIC that belongs to the CPU we're about to run on (otherwise, we'd end up with some unexpected interrupts firing)... Introducing a non-preemptible section in

[PATCH 05/25] KVM: arm64: guest debug, add SW break point support

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This adds support for SW breakpoints inserted by userspace. We do this by trapping all guest software debug exceptions to the hypervisor (MDCR_EL2.TDE). The exit handler sets an exit reason of KVM_EXIT_DEBUG with the kvm_debug_exit_arch structure holding the exception syndrome

[PATCH 17/25] KVM: arm/arm64: vgic: Allow HW irq to be encoded in LR

2015-08-20 Thread Marc Zyngier
Now that struct vgic_lr supports the LR_HW bit and carries a hwirq field, we can encode that information into the list registers. This patch provides implementations for both GICv2 and GICv3. Reviewed-by: Christoffer Dall Signed-off-by: Marc Zyngier --- include/linux/irqchip/arm-gic-v3.h | 3

[PATCH 01/25] KVM: add comments for kvm_debug_exit_arch struct

2015-08-20 Thread Marc Zyngier
From: Alex Bennée Bring into line with the comments for the other structures and their KVM_EXIT_* cases. Also update api.txt to reflect use in kvm_run documentation. Signed-off-by: Alex Bennée Reviewed-by: David Hildenbrand Reviewed-by: Andrew Jones Acked-by: Christoffer Dall Signed-off-by:

[PATCH 11/25] KVM: arm64: add trace points for guest_debug debug

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This includes trace points for: kvm_arch_setup_guest_debug kvm_arch_clear_guest_debug I've also added some generic register setting trace events and also a trace point to dump the array of hardware registers. Acked-by: Christoffer Dall Signed-off-by: Alex Bennée Signed-o

[PATCH 03/25] KVM: arm: guest debug, add stub KVM_SET_GUEST_DEBUG ioctl

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This commit adds a stub function to support the KVM_SET_GUEST_DEBUG ioctl. Any unsupported flag will return -EINVAL. For now, only KVM_GUESTDBG_ENABLE is supported, although it won't have any effects. Signed-off-by: Alex Bennée . Reviewed-by: Christoffer Dall Signed-off-by: Ma

[PATCH 08/25] KVM: arm64: introduce vcpu->arch.debug_ptr

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This introduces a level of indirection for the debug registers. Instead of using the sys_regs[] directly we store registers in a structure in the vcpu. The new kvm_arm_reset_debug_ptr() sets the debug ptr to the guest context. Because we no longer give the sys_regs offset for t

[PATCH 14/25] arm/arm64: KVM: Fix ordering of timer/GIC on guest entry

2015-08-20 Thread Marc Zyngier
As we now inject the timer interrupt when we're about to enter the guest, it makes a lot more sense to make sure this happens before the vgic code queues the pending interrupts. Otherwise, we get the interrupt on the following exit, which is not great for latency (and leads to all kind of bizarre

[PATCH 02/25] KVM: arm64: guest debug, define API headers

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This commit defines the API headers for guest debugging. There are two architecture specific debug structures: - kvm_guest_debug_arch, allows us to pass in HW debug registers - kvm_debug_exit_arch, signals exception and possible faulting address The type of debugging being

[PATCH 12/25] arm64/kvm: Add generic v8 KVM target

2015-08-20 Thread Marc Zyngier
From: "Suzuki K. Poulose" This patch adds a generic ARM v8 KVM target cpu type for use by the new CPUs which eventually end up using the common sys_reg table. For backward compatibility the existing targets have been preserved. Any new target CPU that can be covered by generic v8 sys_reg tables s

[PATCH 04/25] KVM: arm: introduce kvm_arm_init/setup/clear_debug

2015-08-20 Thread Marc Zyngier
From: Alex Bennée This is a precursor for later patches which will need to do more to setup debug state before entering the hyp.S switch code. The existing functionality for setting mdcr_el2 has been moved out of hyp.S and now uses the value kept in vcpu->arch.mdcr_el2. As the assembler used to

[PATCH 10/25] KVM: arm64: enable KVM_CAP_SET_GUEST_DEBUG

2015-08-20 Thread Marc Zyngier
From: Alex Bennée Finally advertise the KVM capability for SET_GUEST_DEBUG. Once arm support is added this check can be moved to the common kvm_vm_ioctl_check_extension() code. Signed-off-by: Alex Bennée Acked-by: Christoffer Dall Signed-off-by: Marc Zyngier --- Documentation/virtual/kvm/api

[PATCH 16/25] KVM: arm/arm64: vgic: Convert struct vgic_lr to use bitfields

2015-08-20 Thread Marc Zyngier
As we're about to cram more information in the vgic_lr structure (HW interrupt number and additional state information), we switch to a layout similar to the HW's: - use bitfields to save space (we don't need more than 10 bits to represent the irq numbers) - source CPU and HW interrupt can share
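The point about 10 bits being enough for interrupt numbers can be illustrated with a small bitfield sketch. The struct below is hypothetical: only the 10-bit irq width comes from the message, while the `state` field and its width are assumptions made for illustration and do not reproduce the real struct vgic_lr layout.

```c
#include <assert.h>

/* Hypothetical list-register sketch; not the kernel's struct vgic_lr. */
struct sketch_lr {
    unsigned int irq   : 10; /* holds 0..1023 */
    unsigned int state : 4;  /* assumed width, for illustration only */
};

/* Build a sketch entry, checking that the values fit their fields. */
static struct sketch_lr sketch_make_lr(unsigned int irq, unsigned int state)
{
    struct sketch_lr lr;

    assert(irq < (1u << 10));
    assert(state < (1u << 4));
    lr.irq = irq;
    lr.state = state;
    return lr;
}
```

Packing both fields into bitfields lets them share a single machine word on common ABIs, which is the space saving the message refers to.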

[PATCH 13/25] arm64: KVM: remove remaining reference to vgic_sr_vectors

2015-08-20 Thread Marc Zyngier
From: Vladimir Murzin Since commit 8a14849 (arm64: KVM: Switch vgic save/restore to alternative_insn) vgic_sr_vectors is not used anymore, so remove remaining leftovers and kill the structure. Signed-off-by: Vladimir Murzin Signed-off-by: Marc Zyngier --- arch/arm64/include/asm/kvm_host.h | 5

Re: [PATCH] kvm:powerpc:Fix return statements for wrapper functions in the file book3s_64_mmu_hv.c

2015-08-20 Thread Paul Mackerras
On Mon, Aug 10, 2015 at 11:27:31AM -0400, Nicholas Krause wrote: > This fixes the wrapper functions kvm_umap_hva_hv and the function > kvm_unmap_hav_range_hv to return the return value of the function > kvm_handle_hva or kvm_handle_hva_range that they are wrapped to > call internally rather than al

[PATCH 1/5] KVM: nVMX: refactor segment checks, make the code more clean and straightforward

2015-08-20 Thread Eugene Korenevsky
Prepare for subsequent changes. Extract calls for segment checking in protected and 64-bit mode. This should be done to avoid overbloating of get_vmx_mem_address() function, even if kvm_queue_exception_e() is called twice. Signed-off-by: Eugene Korenevsky --- arch/x86/kvm/vmx.c | 106 +++

[PATCH 2/5] KVM: nVMX: fix limit check for protected mode

2015-08-20 Thread Eugene Korenevsky
Fix limit checking for all segment types except expand-down data segments. The effective limit is the last address that is allowed to be accessed in the segment. The condition for exceeding the limit should be offset + operand_size - 1 > limit For example, if offset == limit and operand size is o
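The condition quoted in this message can be written as a standalone predicate. This is a sketch only: the function name is hypothetical and it is not the code from arch/x86/kvm/vmx.c; it just evaluates offset + operand_size - 1 > limit with a 64-bit offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when the last byte of the access lies beyond the segment limit.
 * The limit is the last address that may be accessed, hence the "- 1".
 * Offsets close to UINT64_MAX would wrap; that case is not handled here.
 */
static bool sketch_exceeds_limit(uint64_t offset, uint32_t operand_size,
                                 uint32_t limit)
{
    assert(operand_size > 0);
    return offset + operand_size - 1 > limit;
}
```

With this form, a one-byte access at offset == limit is accepted, while a two-byte access at the same offset is rejected.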

[PATCH 3/5] KVM: nVMX: add limit check for expand-down segments

2015-08-20 Thread Eugene Korenevsky
Add limit checking for expand-down data segments. For such segments, the effective limit specifies the last address that is not allowed to be accessed within the segment. I.e. offset <= limit means the limit is exceeded. Signed-off-by: Eugene Korenevsky --- arch/x86/kvm/vmx.c | 5 - 1 file c
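For expand-down segments the comparison is inverted, as the message states. A minimal sketch with a hypothetical function name; the segment's upper bound is not modeled here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Expand-down data segment: the limit is the last address that is NOT
 * accessible, so any offset at or below it violates the limit.
 */
static bool sketch_exceeds_limit_expand_down(uint64_t offset, uint32_t limit)
{
    return offset <= limit;
}
```

Only offsets strictly above the limit pass this check.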

[PATCH 5/5] KVM: nVMX: VMWRITE emulation: remove unnecessary check for compatibility mode

2015-08-20 Thread Eugene Korenevsky
VMWRITE instruction is not valid in compatibility mode. This is checked by nested_vmx_check_permission() function which throws #UD if CS.L=0. The additional check in is_64_bit_mode() for CS.L=0 is useless. We should check only EFER.LMA=1 which is done by is_long_mode(). Signed-off-by: Eugene Koren

[PATCH 4/5] KVM: nVMX: fix limit checking: memory operand size varies for different VMX instructions

2015-08-20 Thread Eugene Korenevsky
When checking limits for VMX opcodes in protected mode, different sizes of memory operands must be taken into account. For VMREAD and VMWRITE instructions, memory operand size is 32 or 64 bits depending on CPU mode. For VMON, VMCLEAR, VMPTRST, VMPTRLD instructions, memory operand size is 64 bits. F
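The operand sizes listed in this message could be captured in a lookup helper along the following lines. The enum and function are hypothetical, cover only the instructions named in the visible part of the message (using its own instruction names), and assume that "depending on CPU mode" means 64 bits in 64-bit mode and 32 bits otherwise.

```c
#include <stdbool.h>

/* Instructions named in the message; anything else is left unhandled. */
enum sketch_vmx_insn {
    SKETCH_VMREAD,
    SKETCH_VMWRITE,
    SKETCH_VMON,
    SKETCH_VMCLEAR,
    SKETCH_VMPTRST,
    SKETCH_VMPTRLD,
    SKETCH_OTHER
};

/* Memory operand size in bytes, or 0 when this sketch does not know it. */
static unsigned int sketch_mem_operand_size(enum sketch_vmx_insn insn,
                                            bool is_64bit_mode)
{
    switch (insn) {
    case SKETCH_VMREAD:
    case SKETCH_VMWRITE:
        return is_64bit_mode ? 8 : 4; /* 64 or 32 bits by CPU mode */
    case SKETCH_VMON:
    case SKETCH_VMCLEAR:
    case SKETCH_VMPTRST:
    case SKETCH_VMPTRLD:
        return 8;                     /* always 64 bits */
    default:
        return 0;                     /* not covered by the visible text */
    }
}
```

The returned size would then feed the operand_size term of the limit check for the instruction being emulated.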

Re: [RFC PATCH 0/5] KVM: x86: exit to user space on unhandled MSR accesses

2015-08-20 Thread Peter Hornyack
On Wed, Aug 19, 2015 at 2:43 PM, Bandan Das wrote: > Peter Hornyack writes: > >> There are numerous MSRs that kvm does not currently handle. On Intel >> platforms we have observed guest VMs accessing some of these MSRs (for >> example, MSR_PLATFORM_INFO) and behaving poorly (to the point of guest

Re: IRQ affinity on Linux guest

2015-08-20 Thread Radim Krčmář
2015-08-20 17:16+0300, Mihai Neagu: > Here is how IRQ affinity is configured on guest at startup, in an init.d > script: > > echo 1 > /proc/irq/default_smp_affinity > for x in /proc/irq/*/smp_affinity; > do > echo 1 > $x > done 2> /dev/null > > The command line for starting the hardware acceler

[PATCH 1/9] KVM: MMU: fix use uninitialized value

2015-08-20 Thread Xiao Guangrong
GCC (gcc version 5.1.1 20150618 (Red Hat 5.1.1-4) (GCC)) complains of this warning: arch/x86/kvm//mmu.c:3332:9: warning: ‘leaf’ may be used uninitialized in this function [-Wmaybe-uninitialized] while (root >= leaf) { ^ arch/x86/kvm//mmu.c:3304:12: note: ‘leaf’ was declared here int

[PATCH 3/9] KVM: x86: add pcommit support

2015-08-20 Thread Xiao Guangrong
Pass the PCOMMIT CPU feature to the guest to enable the PCOMMIT instruction. Currently we do not catch the pcommit instruction for the L1 guest and allow L1 to catch this instruction for L2. The specification is located at: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf Signed-off-by: Xiao G

[PATCH 2/9] KVM: x86: allow guest to use cflushopt anc clwb

2015-08-20 Thread Xiao Guangrong
Pass these CPU features to the guest to enable them in the guest. These are needed by nvdimm drivers. The specification is located at: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf Signed-off-by: Xiao Guangrong --- arch/x86/kvm/cpuid.c | 2 +- 1 file changed, 1 insertion(+), 1 de

[PATCH 6/9] KVM: VMX: simplify invpcid handling in vmx_cpuid_update()

2015-08-20 Thread Xiao Guangrong
If vmx_invpcid_supported() is true, the secondary execution control field must be supported and SECONDARY_EXEC_ENABLE_INVPCID must have already been set in the current vmcs by vmx_secondary_exec_control(). If vmx_invpcid_supported() is false, there is no need to clear SECONDARY_EXEC_ENABLE_INVPCID. Signed-off-by: Xiao

[PATCH 9/9] KVM: VMX: drop rdtscp_enabled field

2015-08-20 Thread Xiao Guangrong
Check cpuid bit instead of it Signed-off-by: Xiao Guangrong --- arch/x86/kvm/cpuid.h | 8 arch/x86/kvm/vmx.c | 19 ++- 2 files changed, 14 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index aed7bfe..d434ee9 100644 --- a/arch

[PATCH 8/9] KVM: VMX: introduce set_clear_2nd_exec_ctrl()

2015-08-20 Thread Xiao Guangrong
It's used to clean up the code Signed-off-by: Xiao Guangrong --- arch/x86/kvm/vmx.c | 42 +++--- 1 file changed, 19 insertions(+), 23 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 4f238b7..58f7b89 100644 --- a/arch/x86/kvm/vmx.c +++

[PATCH 7/9] KVM: VMX: unify SECONDARY_VM_EXEC_CONTROL update

2015-08-20 Thread Xiao Guangrong
Unify the update in vmx_cpuid_update() Signed-off-by: Xiao Guangrong --- arch/x86/kvm/vmx.c | 21 +++-- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 0d68140..4f238b7 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kv

[PATCH 0/9] KVM: x86: enable cflushopt/clwb/pcommit and simplify code

2015-08-20 Thread Xiao Guangrong
This patchset enables the clflushopt, clwb and pcommit instructions for the guest, which are used by NVDIMM. The specification is located at: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf Patch 1 fixes an uninitialized value used in KVM MMU code, patch 2 and patch 3 enable these th

[PATCH 4/9] KVM: VMX: drop rdtscp_enabled check in prepare_vmcs02()

2015-08-20 Thread Xiao Guangrong
SECONDARY_EXEC_RDTSCP set for L2 guest comes from vmcs12 Signed-off-by: Xiao Guangrong --- arch/x86/kvm/vmx.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index b526c61..f7a721e 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vm

[PATCH 5/9] KVM: VMX: simplify rdtscp handling in vmx_cpuid_update()

2015-08-20 Thread Xiao Guangrong
if vmx_rdtscp_supported() is true SECONDARY_EXEC_RDTSCP must have already been set in current vmcs by vmx_secondary_exec_control() Signed-off-by: Xiao Guangrong --- arch/x86/kvm/vmx.c | 17 - 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/

[PATCH] target-i386: enable cflushopt/clwb/pcommit instructions

2015-08-20 Thread Xiao Guangrong
These instructions are used by NVDIMM drivers and the specification is located at: https://software.intel.com/sites/default/files/managed/0d/53/319433-022.pdf Let them be enabled on Broadwell by default. Signed-off-by: Xiao Guangrong --- target-i386/cpu.c | 14 +- target-i386/cpu.h | 3