When the target VCPU of an H_IPI hypercall is not running in the guest, we need to kick the VCPU (wake the VCPU thread) to make it runnable. The real-mode version of the H_IPI hypercall cannot do this because it involves waking a sleeping thread. The hcall therefore returns H_TOO_HARD, which forces a switch back to the host so that the H_IPI call can be completed in virtual mode. This has been found to cause a slowdown for many workloads such as YCSB MongoDB, small message networking, etc.

One solution is to hand off the job of waking the VCPU to a CPU that is running in the host, by sending it a message through the IPI mechanism from the hypercall.

This patch set optimizes the wakeup of the target VCPU by posting a free core already running in the host to do the wakeup, thus avoiding the switch to host and back. It requires maintaining a bitmask of all the available cores in the system to indicate if they are in the host or running in some guest. It also requires the H_IPI hypercall to search for a free host core and send it a new IPI message, PPC_MSG_RM_HOST_ACTION, after stashing away some parameters, like the pointer to the VCPU, for the IPI handler. Locks are avoided by using atomic operations to save core state, to find and reserve a core in the host, etc.

Note that it is possible for a guest to be destroyed and its VCPUs freed before the IPI handler gets to run. This case is handled by ensuring that any pending PPC_MSG_RM_HOST_ACTION IPIs are completed before proceeding with freeing the VCPUs.

Currently, powerpc only supports 4 IPI messages and all 4 are already taken for other purposes. This patch set also increases the number of supported IPI messages to 8. It also provides the code to send an IPI from a hypercall running in real-mode, since the existing cause_ipi functions cannot be executed in real-mode.

A tunable, h_ipi_redirect, is also included in the patch set to disable the feature.

v2:
 * Complete patch set sent to both kvm and linuxppc mailing lists
   to avoid build-breaks.
 * Broke up real mode IPI messaging function into two pieces - one
   to set the message and one to cause the IPI.
   New function icp_native_cause_ipi_rm added to
   arch/powerpc/sysdev/xics/icp-native.c

Suresh Warrier (9):
  powerpc/smp: Support more IPI messages
  powerpc/smp: Add smp_muxed_ipi_set_message
  powerpc/powernv: Add icp_native_cause_ipi_rm
  KVM: PPC: Book3S HV: Host-side RM data structures
  KVM: PPC: Book3S HV: Manage core host state
  KVM: PPC: Book3S HV: kvmppc_host_rm_ops - handle offlining CPUs
  KVM: PPC: Book3S HV: Host side kick VCPU when poked by real-mode KVM
  KVM: PPC: Book3S HV: Send IPI to host core to wake VCPU
  KVM: PPC: Book3S HV: Add tunable to control H_IPI redirection

 arch/powerpc/include/asm/kvm_ppc.h    |  33 +++++++
 arch/powerpc/include/asm/smp.h        |   4 +
 arch/powerpc/include/asm/xics.h       |   1 +
 arch/powerpc/kernel/smp.c             |  28 +++++-
 arch/powerpc/kvm/book3s_hv.c          | 166 ++++++++++++++++++++++++++++++++++
 arch/powerpc/kvm/book3s_hv_builtin.c  |   3 +
 arch/powerpc/kvm/book3s_hv_rm_xics.c  | 120 +++++++++++++++++++++++-
 arch/powerpc/kvm/powerpc.c            |  10 ++
 arch/powerpc/sysdev/xics/icp-native.c |  19 ++++
 9 files changed, 376 insertions(+), 8 deletions(-)

-- 
1.8.3.4
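
For readers who want a feel for the lock-free "find and reserve a core in the host" step mentioned in the description, here is a minimal user-space sketch in C11. It is an illustration under stated assumptions, not the code from the patches: NCORES, the CORE_IN_HOST and CORE_RESERVED flags, the core_state array and all four function names are hypothetical, and the kernel would use its own atomic primitives rather than <stdatomic.h>.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NCORES        8     /* hypothetical number of cores */
#define CORE_IN_HOST  0x1u  /* core is running host code */
#define CORE_RESERVED 0x2u  /* core was claimed for a pending wakeup */

/* Hypothetical per-core state words; zero means "running a guest". */
_Atomic unsigned int core_state[NCORES];

/* A core records that it has returned to the host. */
void core_enter_host(int core)
{
    atomic_store(&core_state[core], CORE_IN_HOST);
}

/*
 * A core tries to leave the host. This fails while the core is
 * reserved, so a pending wakeup request is never lost.
 */
bool core_try_enter_guest(int core)
{
    unsigned int expected = CORE_IN_HOST;

    return atomic_compare_exchange_strong(&core_state[core], &expected, 0u);
}

/*
 * Scan all cores, starting at 'start', and atomically claim the first
 * one that is in the host and not yet reserved. Returns the core
 * number, or -1 if no core is free.
 */
int find_and_reserve_host_core(int start)
{
    for (int i = 0; i < NCORES; i++) {
        int core = (start + i) % NCORES;
        unsigned int expected = CORE_IN_HOST;

        /* Succeeds only if the state is exactly "in host, unreserved". */
        if (atomic_compare_exchange_strong(&core_state[core], &expected,
                                           CORE_IN_HOST | CORE_RESERVED))
            return core;
    }
    return -1;
}

/* The reserved core calls this after it has woken the target VCPU. */
void core_action_done(int core)
{
    atomic_fetch_and(&core_state[core], ~CORE_RESERVED);
}
```

Because the claim is a single compare-and-swap on the per-core word, two hypercalls racing on different CPUs cannot reserve the same core, and no lock is needed. When find_and_reserve_host_core() returns -1, a caller could fall back to returning H_TOO_HARD as before.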