Re: [RFC PATCH 00/12] KVM: MMU: locklessly write-protect

2013-08-29 Thread Gleb Natapov
On Sat, Aug 03, 2013 at 02:09:43PM +0900, Takuya Yoshikawa wrote: > On Tue, 30 Jul 2013 21:01:58 +0800 > Xiao Guangrong wrote: > > > Background > > == > > Currently, when mark memslot dirty logged or get dirty page, we need to > > write-protect large guest memory, it is the heavy work, es

Re: [PATCH 01/12] KVM: MMU: remove unused parameter

2013-08-29 Thread Gleb Natapov
On Tue, Jul 30, 2013 at 09:01:59PM +0800, Xiao Guangrong wrote: > @vcpu in page_fault_can_be_fast() is not used so remove it > > Signed-off-by: Xiao Guangrong Applied this one. Thanks. > --- > arch/x86/kvm/mmu.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/

Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

2013-08-29 Thread Gleb Natapov
On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote: > >>> BTW I do not see > >>> rcu_assign_pointer()/rcu_dereference() in your patches which hints on > >> > >> IIUC, We can not directly use rcu_assign_pointer(), that is something like: > >> p = v to assign a pointer to a pointer. But i

Re: [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread

2013-08-29 Thread Gleb Natapov
On Tue, Jul 30, 2013 at 09:02:08PM +0800, Xiao Guangrong wrote: > It is easy if the handler is in the vcpu context, in that case we can use > walk_shadow_page_lockless_begin() and walk_shadow_page_lockless_end() that > disable interrupt to stop shadow page be freed. But we are on the ioctl > conte

Re: [PATCH 10/12] KVM: MMU: allow locklessly access shadow page table out of vcpu thread

2013-08-29 Thread Xiao Guangrong
On 08/29/2013 05:10 PM, Gleb Natapov wrote: > On Tue, Jul 30, 2013 at 09:02:08PM +0800, Xiao Guangrong wrote: >> It is easy if the handler is in the vcpu context, in that case we can use >> walk_shadow_page_lockless_begin() and walk_shadow_page_lockless_end() that >> disable interrupt to stop shado

Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

2013-08-29 Thread Xiao Guangrong
On 08/29/2013 05:08 PM, Gleb Natapov wrote: > On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote: > BTW I do not see > rcu_assign_pointer()/rcu_dereference() in your patches which hints on IIUC, We can not directly use rcu_assign_pointer(), that is something like:

Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

2013-08-29 Thread Gleb Natapov
On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote: > After more thinking, I still think rcu_assign_pointer() is unneeded when a > entry > is removed. The remove-API does not care the order between unlink the entry > and > the changes to its fields. It is the caller's responsibility:

Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

2013-08-29 Thread Gleb Natapov
On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote: > > As Documentation/RCU/whatisRCU.txt says: > > > > As with rcu_assign_pointer(), an important function of > > rcu_dereference() is to document which pointers are protected by > > RCU, in particular, flagging

[PATCH 2/4] ARM: KVM: vgic: fix GICD_ICFGRn access

2013-08-29 Thread Marc Zyngier
All the code in handle_mmio_cfg_reg() assumes the offset has been shifted right to accomodate for the 2:1 bit compression, but this is only done when getting the register addess. Shift the offset early so the code works mostly unchanged. Reported-by: Zhaobo (Bob, ERC) Signed-off-by: Marc Zyngier

[PATCH 3/4] ARM: KVM: Bugfix: vgic_bytemap_get_reg per cpu regs

2013-08-29 Thread Marc Zyngier
From: Christoffer Dall For bytemaps each IRQ field is 1 byte wide, so we pack 4 irq fields in one word and since there are 32 private (per cpu) irqs, we have 8 private u32 fields on the vgic_bytemap struct. We shift the offset from the base of the register group right by 2, giving us the word in
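The arithmetic in this description can be sketched in a few lines of C. The macro and function names below are hypothetical illustrations, not the identifiers used in virt/kvm/arm/vgic.c, and the truncated message does not show the actual fix; the sketch only restates the layout described: one byte per IRQ, four IRQ fields per 32-bit word, and 32 private IRQs occupying the first 8 words.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
#define NR_PRIVATE_IRQS      32u
#define IRQ_FIELDS_PER_WORD  4u   /* one byte per IRQ, four per u32 */
#define NR_PRIVATE_WORDS     (NR_PRIVATE_IRQS / IRQ_FIELDS_PER_WORD) /* 8 */

/* Convert a byte offset from the base of the register group
 * into a 32-bit word index by shifting right by 2. */
static inline uint32_t bytemap_word_index(uint32_t byte_offset)
{
    return byte_offset >> 2;
}

/* A word belongs to the per-cpu (private) part if its index
 * falls within the first NR_PRIVATE_WORDS words. */
static inline int bytemap_word_is_private(uint32_t byte_offset)
{
    return bytemap_word_index(byte_offset) < NR_PRIVATE_WORDS;
}
```

Under these assumptions, byte offsets 0 through 31 map to the eight private words, and offset 32 is the first shared word.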

[PATCH 4/4] ARM: KVM: vgic: Bump VGIC_NR_IRQS to 256

2013-08-29 Thread Marc Zyngier
From: Christoffer Dall The Versatile Express TC2 board, which we use as our main emulated platform in QEMU, defines 160+32 == 192 interrupts, so limiting the number of interrupts to 128 is not quite going to cut it for real board emulation. Note that this didn't use to be a problem because QEMU

[PATCH 1/4] ARM: KVM: vgic: simplify vgic_get_target_reg

2013-08-29 Thread Marc Zyngier
vgic_get_target_reg is quite complicated, for no good reason. Actually, it is fairly easy to write it in a much more efficient way by using the target CPU array instead of the bitmap. Signed-off-by: Marc Zyngier --- virt/kvm/arm/vgic.c | 12 +++- 1 file changed, 3 insertions(+), 9 deleti

[GIT PULL] ARM: KVM: VGIC fixes for 3.12

2013-08-29 Thread Marc Zyngier
Gleb, Paolo, Please pull the below tag for a few VGIC fixes to be merged in 3.12. Thanks, M. The following changes since commit d8dfad3876e438b759da3c833d62fb8b2267: Linux 3.11-rc7 (2013-08-25 17:43:22 -0700) are available in the git repository at: git://git.kernel.org/pub/sc

Re: [Qemu-devel] [RFC][PATCH 2/6] cpus: release allocated vcpu objects and exit vcpu thread

2013-08-29 Thread chenfan
On Thu, 2013-08-29 at 07:10 +0200, Andreas Färber wrote: > Am 29.08.2013 04:09, schrieb Chen Fan: > > After ACPI get a signal to eject a vcpu, then it will notify > > the vcpu thread of needing to exit, before the vcpu exiting, > > will release the vcpu related objects. > > > > Signed-off-by: Chen

[PULL 00/19] ppc patch queue 2013-08-29

2013-08-29 Thread Alexander Graf
Hi Paolo / Gleb, This is my current patch queue for ppc. Please pull. Changes include: - Book3S HV: CMA based memory allocator for linear memory - A few bug fixes Alex The following changes since commit cc2df20c7c4ce594c3e17e9cc260c330646012c8: KVM: x86: Update symbolic exit codes (20

[PULL 07/17] KVM: PPC: Book3S HV: Allow negative offsets to real-mode hcall handlers

2013-08-29 Thread Alexander Graf
From: Paul Mackerras The table of offsets to real-mode hcall handlers in book3s_hv_rmhandlers.S can contain negative values, if some of the handlers end up before the table in the vmlinux binary. Thus we need to use a sign-extending load to read the values in the table rather than a zero-extendi
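The difference between the two kinds of load can be illustrated in portable C. This is only a sketch: the real table is read in assembly in book3s_hv_rmhandlers.S, and the 32-bit entry width and the function names here are assumptions made for the example.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: each table entry is a 32-bit offset relative to the table. */

/* Sign-extending load: a negative offset stays negative when widened. */
static int64_t load_offset_sign_extended(const void *entry)
{
    int32_t v;
    memcpy(&v, entry, sizeof v);
    return (int64_t)v;
}

/* Zero-extending load: the same bits are read as a large positive value,
 * so a handler placed before the table would be computed at a wrong address. */
static int64_t load_offset_zero_extended(const void *entry)
{
    uint32_t v;
    memcpy(&v, entry, sizeof v);
    return (int64_t)v;
}
```

With an entry holding -16, the first function yields -16 while the second yields 4294967280, which is why handlers located before the table need the sign-extending form.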

[PULL 11/17] powerpc/kvm: Copy the pvr value after memset

2013-08-29 Thread Alexander Graf
From: "Aneesh Kumar K.V" Otherwise we would clear the pvr value Signed-off-by: Aneesh Kumar K.V Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_hv.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c

[PULL 16/17] KVM: PPC: Book3S PR: Make instruction fetch fallback work for system calls

2013-08-29 Thread Alexander Graf
From: Paul Mackerras It turns out that if we exit the guest due to a hcall instruction (sc 1), and the loading of the instruction in the guest exit path fails for any reason, the call to kvmppc_ld() in kvmppc_get_last_inst() fetches the instruction after the hcall instruction rather than the hcal

[PULL 17/17] KVM: PPC: Book3S PR: Rework kvmppc_mmu_book3s_64_xlate()

2013-08-29 Thread Alexander Graf
From: Paul Mackerras This reworks kvmppc_mmu_book3s_64_xlate() to make it check the large page bit in the hashed page table entries (HPTEs) it looks at, and to simplify and streamline the code. The checking of the first dword of each HPTE is now done with a single mask and compare operation, and

[PULL 05/17] powerpc/kvm: Use 256K chunk to track both RMA and hash page table allocation.

2013-08-29 Thread Alexander Graf
From: "Aneesh Kumar K.V" Both RMA and hash page table requests will be a multiple of 256K. We can use a chunk size of 256K to track the free/used 256K chunks in the bitmap. This should help to reduce the bitmap size. Signed-off-by: Aneesh Kumar K.V Acked-by: Paul Mackerras Signed-off-by: Alexand

[PULL 08/17] kvm/ppc: Call trace_hardirqs_on before entry

2013-08-29 Thread Alexander Graf
From: Scott Wood Currently this is only being done on 64-bit. Rather than just move it out of the 64-bit ifdef, move it to kvm_lazy_ee_enable() so that it is consistent with lazy ee state, and so that we don't track more host code as interrupts-enabled than necessary. Rename kvm_lazy_ee_enable(

[PULL 12/17] arch: powerpc: kvm: add signed type cast for comparison

2013-08-29 Thread Alexander Graf
From: Chen Gang 'rmls' is 'unsigned long', and lpcr_rmls() returns a negative number when a failure occurs, so it needs a type cast for the comparison. 'lpid' is 'unsigned long', and kvmppc_alloc_lpid() returns a negative number when a failure occurs, so it needs a type cast for the comparison. Signed-off-by: Chen Gang
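The pitfall described here is general C behaviour and can be shown with a small sketch. The function fake_lookup() is a hypothetical stand-in for a helper such as lpcr_rmls() or kvmppc_alloc_lpid(); it is not the kernel code. The sketch assumes the usual two's-complement conversion from unsigned long back to long, which mainstream compilers provide.

```c
#include <stdbool.h>

/* Hypothetical helper: returns a negative error code on failure. */
static long fake_lookup(bool fail)
{
    return fail ? -1L : 3L;
}

/* The result is stored in an 'unsigned long', so a plain "value < 0"
 * test can never be true. Casting back to a signed type restores
 * the intended error check. */
static bool is_error(unsigned long value)
{
    return (long)value < 0;
}
```

A caller would write something like `unsigned long rmls = fake_lookup(...); if (is_error(rmls)) ...`, keeping the cast in one place.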

[PULL 14/17] KVM: PPC: Book3S: Fix compile error in XICS emulation

2013-08-29 Thread Alexander Graf
From: Paul Mackerras Commit 8e44ddc3f3 ("powerpc/kvm/book3s: Add support for H_IPOLL and H_XIRR_X in XICS emulation") added a call to get_tb() but didn't include the header that defines it, and on some configs this means book3s_xics.c fails to compile: arch/powerpc/kvm/book3s_xics.c: In function

[PULL 06/17] KVM: PPC: Book3S HV: Correct tlbie usage

2013-08-29 Thread Alexander Graf
From: Paul Mackerras This corrects the usage of the tlbie (TLB invalidate entry) instruction in HV KVM. The tlbie instruction changed between PPC970 and POWER7. On the PPC970, the bit to select large vs. small page is in the instruction, not in the RB register value. This changes the code to us

[PULL 09/17] kvm/ppc/booke: Don't call kvm_guest_enter twice

2013-08-29 Thread Alexander Graf
From: Scott Wood kvm_guest_enter() was already called by kvmppc_prepare_to_enter(). Don't call it again. Signed-off-by: Scott Wood Signed-off-by: Alexander Graf --- arch/powerpc/kvm/booke.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.

[PULL 01/17] mm/cma: Move dma contiguous changes into a separate config

2013-08-29 Thread Alexander Graf
From: "Aneesh Kumar K.V" We want to use CMA for allocating hash page table and real mode area for PPC64. Hence move DMA contiguous related changes into a separate config so that ppc64 can enable CMA without requiring DMA contiguous. Acked-by: Michal Nazarewicz Acked-by: Paul Mackerras Signed-o

[PULL 02/17] KVM: PPC: Book3S: Ignore DABR register

2013-08-29 Thread Alexander Graf
We don't emulate breakpoints yet, so just ignore reads and writes to / from DABR. This fixes booting of more recent Linux guest kernels for me. Reported-by: Nello Martuscielli Tested-by: Nello Martuscielli Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3s_emulate.c | 2 ++ 1 file chan

[PULL 03/17] powerpc/kvm: Contiguous memory allocator based hash page table allocation

2013-08-29 Thread Alexander Graf
From: "Aneesh Kumar K.V" The Powerpc architecture uses a hash based page table mechanism for mapping virtual addresses to physical addresses. The architecture requires this hash page table to be physically contiguous. With KVM on Powerpc currently we use early reservation mechanism for allocating guest

[PULL 10/17] KVM: PPC: Book3S PR: Load up SPRG3 register with guest value on guest entry

2013-08-29 Thread Alexander Graf
From: Paul Mackerras Unlike the other general-purpose SPRs, SPRG3 can be read by usermode code, and is used in recent kernels to store the CPU and NUMA node numbers so that they can be read by VDSO functions. Thus we need to load the guest's SPRG3 value into the real SPRG3 register when entering

[PULL 13/17] KVM: PPC: Book3S PR: return appropriate error when allocation fails

2013-08-29 Thread Alexander Graf
From: Thadeu Lima de Souza Cascardo err was overwritten by a previous function call, and checked to be 0. If the following page allocation fails, 0 is going to be returned instead of -ENOMEM. Signed-off-by: Thadeu Lima de Souza Cascardo Signed-off-by: Alexander Graf --- arch/powerpc/kvm/book3
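The bug pattern described here, an error variable left at 0 by an earlier successful call and then returned on a later failure, can be sketched as follows. All names are hypothetical and malloc() stands in for the kernel page allocation; this is not the code from the patch.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical earlier step that succeeds and leaves err == 0. */
static int setup_step(void)
{
    return 0;
}

/* Returns 0 on success or a negative errno value on failure. */
static int create_thing(void **out, int simulate_alloc_failure)
{
    void *page;
    int err;

    err = setup_step();
    if (err)
        goto out;

    page = simulate_alloc_failure ? NULL : malloc(64);
    if (!page) {
        /* err is 0 at this point, so it must be set explicitly;
         * otherwise the caller would see success. */
        err = -ENOMEM;
        goto out;
    }

    *out = page;
    return 0;
out:
    return err;
}
```

Without the explicit assignment before the goto, the failure path would return 0, which is the behaviour the message describes.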

[PULL 04/17] powerpc/kvm: Contiguous memory allocator based RMA allocation

2013-08-29 Thread Alexander Graf
From: "Aneesh Kumar K.V" Older versions of the Power architecture use the Real Mode Offset register and Real Mode Limit Selector for mapping the guest Real Mode Area. The guest RMA should be physically contiguous since we use the range when address translation is not enabled. This patch switches RMA allocation

[PULL 15/17] KVM: PPC: Book3S PR: Don't corrupt guest state when kernel uses VMX

2013-08-29 Thread Alexander Graf
From: Paul Mackerras Currently the code assumes that once we load up guest FP/VSX or VMX state into the CPU, it stays valid in the CPU registers until we explicitly flush it to the thread_struct. However, on POWER7, copy_page() and memcpy() can use VMX. These functions do flush the VMX state to

Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

2013-08-29 Thread Xiao Guangrong
On 08/29/2013 05:51 PM, Gleb Natapov wrote: > On Thu, Aug 29, 2013 at 05:31:42PM +0800, Xiao Guangrong wrote: >>> As Documentation/RCU/whatisRCU.txt says: >>> >>> As with rcu_assign_pointer(), an important function of >>> rcu_dereference() is to document which pointers are protected

Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

2013-08-29 Thread Xiao Guangrong
On 08/29/2013 05:31 PM, Gleb Natapov wrote: > On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote: >> After more thinking, I still think rcu_assign_pointer() is unneeded when a >> entry >> is removed. The remove-API does not care the order between unlink the entry >> and >> the changes

Re: [PATCH 09/12] KVM: MMU: introduce pte-list lockless walker

2013-08-29 Thread Xiao Guangrong
On 08/29/2013 07:33 PM, Xiao Guangrong wrote: > On 08/29/2013 05:31 PM, Gleb Natapov wrote: >> On Thu, Aug 29, 2013 at 02:50:51PM +0800, Xiao Guangrong wrote: >>> After more thinking, I still think rcu_assign_pointer() is unneeded when a >>> entry >>> is removed. The remove-API does not care the o

Re: [PATCH 07/23] KVM: PPC: Book3S PR: Use 64k host pages where possible

2013-08-29 Thread Alexander Graf
On 29.08.2013, at 07:23, Paul Mackerras wrote: > On Thu, Aug 29, 2013 at 01:24:04AM +0200, Alexander Graf wrote: >> >> On 06.08.2013, at 06:19, Paul Mackerras wrote: >> >>> +#ifdef CONFIG_PPC_64K_PAGES >>> + /* >>> +* Mark this as a 64k segment if the host is using >>> +* 64k pages, t

Re: [PATCH 04/23] KVM: PPC: Book3S PR: Keep volatile reg values in vcpu rather than shadow_vcpu

2013-08-29 Thread Alexander Graf
On 29.08.2013, at 07:04, Paul Mackerras wrote: > On Thu, Aug 29, 2013 at 12:00:53AM +0200, Alexander Graf wrote: >> >> On 06.08.2013, at 06:16, Paul Mackerras wrote: >> >>> kvm_start_lightweight: >>> + /* Copy registers into shadow vcpu so we can access them in real mode */ >>> + GET_SHADOW

Re: [PATCH 06/23] KVM: PPC: Book3S PR: Allow guest to use 64k pages

2013-08-29 Thread Alexander Graf
On 29.08.2013, at 07:17, Paul Mackerras wrote: > On Thu, Aug 29, 2013 at 12:56:40AM +0200, Alexander Graf wrote: >> >> On 06.08.2013, at 06:18, Paul Mackerras wrote: >> >>> #ifdef CONFIG_PPC_BOOK3S_64 >>> - /* default to book3s_64 (970fx) */ >>> + /* >>> +* Default to the same as the ho

Re: Is fallback vhost_net to qemu for live migrate available?

2013-08-29 Thread Anthony Liguori
Hi Qin, On Mon, Aug 26, 2013 at 10:32 PM, Qin Chuanyu wrote: > Hi all > > I am participating in a project which try to port vhost_net on Xen。 Neat! > By change the memory copy and notify mechanism ,currently virtio-net with > vhost_net could run on Xen with good performance。 I think the key in

Re: [PATCH 2/4] ARM: KVM: vgic: fix GICD_ICFGRn access

2013-08-29 Thread Christoffer Dall
On Thu, Aug 29, 2013 at 11:08:23AM +0100, Marc Zyngier wrote: > All the code in handle_mmio_cfg_reg() assumes the offset has > been shifted right to accomodate for the 2:1 bit compression, > but this is only done when getting the register addess. address > > Shift the offset early so the code wo

Fwd: [Qemu-devel] Direct guest device access from nested guest

2013-08-29 Thread Aaron Fabbri
Sorry. Resending in plain text. (Gmail). -- Forwarded message -- Has anyone considered a paravirt approach? That is: Guest kernel: Write a new IOMMU API back end which does KVM hypercalls. Exposes VFIO to guest user processes (nested VMs) as usual. Host kernel: KVM does thin

RE: [PATCH 0/2] KVM: PPC: BOOKE: MMU Fixes

2013-08-29 Thread Bhushan Bharat-R65777
Hi Alex, Second patch (kvm: ppc: booke: check range page invalidation progress on page setup) of this patch series fixes a critical issue and we would like that to be part of 2.12. First Patch is not that important but pretty simple. Thanks -Bharat > -Original Message- > From: Bhushan

Re: [PATCH 6/6] vhost_net: remove the max pending check

2013-08-29 Thread Jason Wang
On 08/25/2013 07:53 PM, Michael S. Tsirkin wrote: > On Fri, Aug 23, 2013 at 04:55:49PM +0800, Jason Wang wrote: >> On 08/20/2013 10:48 AM, Jason Wang wrote: >>> On 08/16/2013 06:02 PM, Michael S. Tsirkin wrote: > On Fri, Aug 16, 2013 at 01:16:30PM +0800, Jason Wang wrote: >>> We used to lim

[PATCH 0/2] RFC: KVM: Simple optimization based on Xiao's patch

2013-08-29 Thread Takuya Yoshikawa
I think this patch set answers Gleb's comment. Takuya

[PATCH 2/2] KVM: Stop using extra buffer for copying dirty_bitmap to user-space

2013-08-29 Thread Takuya Yoshikawa
Now that mmu_lock is held only inside kvm_mmu_write_protect_pt_masked(), we can do __put_user() for copying each 64/32 dirty bits to user-space. This eliminates the need to copy the whole bitmap to an extra buffer and the resulting code is much more cache friendly than before. Signed-off-by: Taku
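The idea of copying the bitmap one word at a time, instead of staging it in an extra buffer, can be sketched in user-space C. This is an analogy only: C11 atomics stand in for the kernel's word exchange, a plain store stands in for __put_user(), and the function name and the choice to zero clean destination words are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Harvest dirty bits word by word: atomically take each non-zero word,
 * clear it in the source, and store it straight into the destination.
 * Returns the number of words that contained dirty bits. */
static size_t harvest_dirty_words(_Atomic unsigned long *dirty,
                                  unsigned long *dst, size_t nwords)
{
    size_t harvested = 0;
    size_t i;

    for (i = 0; i < nwords; i++) {
        unsigned long mask = 0;

        /* Skip the exchange for clean words to avoid needless writes. */
        if (atomic_load(&dirty[i]) != 0)
            mask = atomic_exchange(&dirty[i], 0ul);

        dst[i] = mask;          /* stand-in for __put_user() */
        if (mask)
            harvested++;
    }
    return harvested;
}
```

Each word is touched once while it is still hot in the cache, which is the cache-friendliness the message refers to.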

[PATCH 1/2] KVM: Take mmu_lock only while write-protecting pages in get_dirty_log

2013-08-29 Thread Takuya Yoshikawa
Xiao's "KVM: MMU: flush tlb if the spte can be locklessly modified" allows us to release mmu_lock before flushing TLBs. Signed-off-by: Takuya Yoshikawa Cc: Xiao Guangrong --- Xiao can change the remaining mmu_lock to RCU's read-side lock: The grace period will be reasonably limited. arch/x86

[PATCH V2 6/6] vhost_net: correctly limit the max pending buffers

2013-08-29 Thread Jason Wang
As Michael pointed out, we used to limit the max pending DMAs to get better cache utilization. But it was not done correctly, since the check was only done when there were no new buffers submitted from the guest. The guest can easily exceed the limit by continuing to send packets. So this patch moves the check into

[PATCH V2 4/6] vhost_net: determine whether or not to use zerocopy at one time

2013-08-29 Thread Jason Wang
Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if upend_idx != done_idx we still set zcopy_used to true and roll back this choice later. This could be avoided by determining zerocopy once, checking all conditions at one time beforehand. Signed-off-by: Jason Wang --- drivers/

[PATCH V2 2/6] vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()

2013-08-29 Thread Jason Wang
We tend to batch the used adding and signaling in vhost_zerocopy_callback(), which may result in more than 100 used buffers being updated in vhost_zerocopy_signal_used() in some cases. So switch to use vhost_add_used_and_signal_n() to avoid multiple calls to vhost_add_used_and_signal(). Which means muc

[PATCH V2 3/6] vhost: switch to use vhost_add_used_n()

2013-08-29 Thread Jason Wang
Let vhost_add_used() use vhost_add_used_n() to reduce the code duplication. Signed-off-by: Jason Wang --- drivers/vhost/vhost.c | 54 ++-- 1 files changed, 12 insertions(+), 42 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost

[PATCH V2 5/6] vhost_net: poll vhost queue after marking DMA is done

2013-08-29 Thread Jason Wang
We used to poll the vhost queue before marking DMA as done. This is racy if the vhost thread is woken up before DMA is marked as done, which can result in the signal being missed. Fix this by always polling the vhost thread after marking DMA as done. Signed-off-by: Jason Wang --- drivers/vhost/net.c |9 +

[PATCH V2 1/6] vhost_net: make vhost_zerocopy_signal_used() return void

2013-08-29 Thread Jason Wang
None of its callers uses its return value, so let it return void. Signed-off-by: Jason Wang --- drivers/vhost/net.c |5 ++--- 1 files changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 969a859..280ee66 100644 --- a/drivers/vhost/net.c +++ b/d

[PATCH V2 0/6] vhost code cleanup and minor enhancement

2013-08-29 Thread Jason Wang
Hi all: This series tries to unify and simplify the vhost code, especially for zerocopy. With this series, a 5% - 10% improvement in per-cpu throughput was seen during the netperf guest sending test. Please review. Changes from V1: - Fix the zerocopy enabling check by changing the check of upend_idx != d

Re: [Qemu-devel] Direct guest device access from nested guest

2013-08-29 Thread Muli Ben-Yehuda
On Thu, Aug 29, 2013 at 03:55:20PM -0700, Aaron Fabbri wrote: > Has anyone considered a paravirt approach? That is: > > Guest kernel: Write a new IOMMU API back end which does KVM hypercalls. > Exposes VFIO to guest user processes (nested VMs) as usual. > > Host kernel: KVM does things like c

[PATCH RFC] KVM: PPC: Book3S HV: Reserve POWER8 space in get/set_one_reg

2013-08-29 Thread Michael Neuling
Alex, This reserves space in get/set_one_reg ioctl for the extra guest state needed for POWER8. It doesn't implement these at all, it just reserves them so that the ABI is defined now. A few things to note here: - POWER8 has 6 PMCs and an additional 2 SPMCs for the supervisor. Here I'm sto