[PATCH 4/4] KVM: Introduce kvm_unmap_hva_range() for kvm_mmu_notifier_invalidate_range_start()

2012-06-15 Thread Takuya Yoshikawa
this by using kvm_handle_hva_range(). On our x86 host, with a minimum configuration for the guest, the invalidation became 40% faster on average and the worst case was also improved to the same degree. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Alexander Graf ag...@suse.de Cc: Paul

Re: [PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva()

2012-06-15 Thread Takuya Yoshikawa
On Fri, 15 Jun 2012 20:31:44 +0900 Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp wrote: ... diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index d03eb6f..53716dd 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm

[PATCH 2/4] KVM: Introduce hva_to_gfn() for kvm_handle_hva()

2012-06-15 Thread Takuya Yoshikawa
This restricts hva handling in mmu code and makes it easier to extend kvm_handle_hva() so that it can treat a range of addresses later in this patch series. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Alexander Graf ag...@suse.de Cc: Paul Mackerras pau...@samba.org
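
As a rough illustration of what a helper such as hva_to_gfn() has to do, here is a minimal userspace sketch. It assumes 4 KiB pages and a trimmed-down, hypothetical `struct memslot`; the field names echo those of a KVM memory slot, but this is not the code from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

typedef uint64_t gfn_t;

/* Hypothetical, trimmed-down stand-in for a KVM memory slot. */
struct memslot {
	gfn_t base_gfn;          /* first guest frame number in the slot */
	uint64_t npages;         /* slot length in pages */
	uint64_t userspace_addr; /* host virtual address backing base_gfn */
};

/* Translate a host virtual address inside the slot to a guest frame number. */
static gfn_t hva_to_gfn(uint64_t hva, const struct memslot *slot)
{
	uint64_t offset;

	/* The address must fall inside the slot's host mapping. */
	assert(hva >= slot->userspace_addr);
	offset = hva - slot->userspace_addr;
	assert((offset >> PAGE_SHIFT) < slot->npages);

	return slot->base_gfn + (offset >> PAGE_SHIFT);
}
```

Keeping the conversion in one helper means a range-based walker only needs to convert the start and end addresses once per slot, which is presumably what makes the later extension in the series easier.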

Re: [PATCH 2/5] drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function

2012-06-14 Thread Takuya Yoshikawa
On Thu, 14 Jun 2012 18:36:42 +0900 Akinobu Mita akinobu.m...@gmail.com wrote: 1) while I agree with Akinobu and thank him for pointing out a _potential_ alignment problem, this is a separate issue and your existing patch should go in anyway. There are probably other drivers with

Re: [PATCH 2/5] drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function

2012-06-13 Thread Takuya Yoshikawa
On Wed, 13 Jun 2012 18:43:40 +0900 Akinobu Mita akinobu.m...@gmail.com wrote: Should this hash_table be converted from u16 hash_table[32] to DECLARE_BITMAP(hash_table, 16 * 32) to ensure that it is aligned on long-word boundary? I think hash_table is already long-word aligned because it is

Re: [PATCH 2/5] drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function

2012-06-13 Thread Takuya Yoshikawa
On Wed, 13 Jun 2012 22:31:13 +0900 Akinobu Mita akinobu.m...@gmail.com wrote: Should this hash_table be converted from u16 hash_table[32] to DECLARE_BITMAP(hash_table, 16 * 32) to ensure that it is aligned on long-word boundary? I think hash_table is already long-word aligned because it

Re: [PATCH 2/5] drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function

2012-06-13 Thread Takuya Yoshikawa
On Wed, 13 Jun 2012 08:21:18 -0700 Grant Grundler grantgrund...@gmail.com wrote: Should this hash_table be converted from u16 hash_table[32] to DECLARE_BITMAP(hash_table, 16 * 32) to ensure that it is aligned on long-word boundary? I think hash_table is already long-word aligned

Re: [PATCH v6 5/9] KVM: MMU: introduce SPTE_MMU_WRITEABLE bit

2012-06-13 Thread Takuya Yoshikawa
On Wed, 13 Jun 2012 18:39:05 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: /* Return true if the spte is dropped. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush) +static bool +spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)

Re: [PATCH v6 6/9] KVM: MMU: fast path of handling guest page fault

2012-06-13 Thread Takuya Yoshikawa
On Wed, 13 Jun 2012 19:40:02 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: mmu_spte_update is handling several different cases. Please rewrite it, add a comment on top of it (or spread comments on top of each significant code line) with all cases it is handling (also recheck it regarding

Re: [PATCH 1/4] drivers/net/ethernet/sfc: Add efx_ prefix to set_bit_le()

2012-06-12 Thread Takuya Yoshikawa
On Mon, 11 Jun 2012 14:09:15 +0000 Arnd Bergmann a...@arndb.de wrote: On Monday 11 June 2012, Takuya Yoshikawa wrote: /* Set bit in a little-endian bitfield */ -static inline void set_bit_le(unsigned nr, unsigned char *addr) +static inline void efx_set_bit_le(unsigned nr, unsigned char

Re: [PATCH 3/4] bitops: Introduce generic set_bit_le()

2012-06-12 Thread Takuya Yoshikawa
On Mon, 11 Jun 2012 14:10:26 +0000 Arnd Bergmann a...@arndb.de wrote: On Monday 11 June 2012, Takuya Yoshikawa wrote: From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is being used for this missing function

[PATCH 0/5] Introduce generic set_bit_le() -v2

2012-06-12 Thread Takuya Yoshikawa
by Grant - bitops: added clear_bit_le -- suggested by Arnd - powerpc: added the same code Ben Hutchings (1): sfc: Use standard __{clear,set}_bit_le() functions Takuya Yoshikawa (4): drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function bitops: Introduce generic {clear,set

[PATCH 1/5] sfc: Use standard __{clear,set}_bit_le() functions

2012-06-12 Thread Takuya Yoshikawa
From: Ben Hutchings bhutchi...@solarflare.com There are now standard functions for dealing with little-endian bit arrays, so use them instead of our own implementations. Signed-off-by: Ben Hutchings bhutchi...@solarflare.com Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp

[PATCH 2/5] drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function

2012-06-12 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp To introduce generic set_bit_le() later, we remove our own definition and use a proper non-atomic bitops function: __set_bit_le(). Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Acked-by: Grant Grundler grund...@parisc

[PATCH 3/5] bitops: Introduce generic {clear,set}_bit_le()

2012-06-12 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is being used for this missing function. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Acked-by: Arnd Bergmann a...@arndb.de --- include/asm-generic/bitops

[PATCH 4/5] powerpc: bitops: Introduce {clear,set}_bit_le()

2012-06-12 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is being used for this missing function. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Benjamin Herrenschmidt b...@kernel.crashing.org --- arch/powerpc

[PATCH 5/5] KVM: Replace test_and_set_bit_le() in mark_page_dirty_in_slot() with set_bit_le()

2012-06-12 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Now that we have defined generic set_bit_le() we do not need to use test_and_set_bit_le() for atomically setting a bit. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Avi Kivity a...@redhat.com Cc: Marcelo Tosatti mtosa
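
The difference between the two operations can be sketched in userspace as follows. This is an illustration only: the `_sketch` names are hypothetical, the bitmap is byte-addressed as in the driver-local helpers quoted elsewhere in this listing, and unlike the kernel's set_bit_le() and test_and_set_bit_le() these versions are not atomic.

```c
#include <assert.h>
#include <stdbool.h>

/* Set bit nr in a byte-addressed little-endian bitmap (non-atomic sketch). */
static void set_bit_le_sketch(unsigned nr, unsigned char *addr)
{
	assert(addr);
	addr[nr / 8] |= (unsigned char)(1u << (nr % 8));
}

/*
 * Same operation, but also report the bit's previous value.
 * A caller that ignores the return value asks for work it does not need.
 */
static bool test_and_set_bit_le_sketch(unsigned nr, unsigned char *addr)
{
	bool old;

	assert(addr);
	old = (addr[nr / 8] >> (nr % 8)) & 1u;
	set_bit_le_sketch(nr, addr);
	return old;
}
```

A dirty-bitmap writer only needs the first form: it marks the page and never looks at the old value.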

[PATCH 4/4] KVM: Replace test_and_set_bit_le() in mark_page_dirty_in_slot() with set_bit_le()

2012-06-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Now that we have defined generic set_bit_le() we do not need to use test_and_set_bit_le() for atomically setting a bit. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- virt/kvm/kvm_main.c |3 +-- 1 files changed, 1

[PATCH 3/4] bitops: Introduce generic set_bit_le()

2012-06-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is being used for this missing function. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Arnd Bergmann a...@arndb.de --- include/asm-generic/bitops/le.h

[PATCH 2/4] drivers/net/ethernet/dec/tulip: Add tulip_ prefix to set_bit_le()

2012-06-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Needed to introduce generic set_bit_le(). Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Grant Grundler grund...@parisc-linux.org --- drivers/net/ethernet/dec/tulip/de2104x.c|7 +++ drivers/net/ethernet/dec

[PATCH 1/4] drivers/net/ethernet/sfc: Add efx_ prefix to set_bit_le()

2012-06-11 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Needed to introduce generic set_bit_le(). Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Ben Hutchings bhutchi...@solarflare.com --- drivers/net/ethernet/sfc/efx.c|4 ++-- drivers/net/ethernet/sfc/net_driver.h

[PATCH 0/4] Introduce generic set_bit_le()

2012-06-11 Thread Takuya Yoshikawa
KVM is using test_and_set_bit_le() for this missing function; this patch series corrects this usage. As some drivers have their own definitions of set_bit_le(), which seem to be incompatible with the generic one, renaming is also needed. Note: the whole series is against linux-next. Takuya

[PATCH] KVM: MMU: Remove unused parameter from mmu_memory_cache_alloc()

2012-05-29 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Size is not needed to return one from pre-allocated objects. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/x86/kvm/mmu.c | 14 +- 1 files changed, 5 insertions(+), 9 deletions(-) diff --git a/arch/x86

Re: [PATCH v2 2/5] KVM: MMU: Convert remote flushes to kvm_mark_tlb_dirty() and a conditional flush

2012-05-22 Thread Takuya Yoshikawa
On Thu, 17 May 2012 13:24:41 +0300 Avi Kivity a...@redhat.com wrote: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2256f51..a2149d8 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3130,7 +3130,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct

[PATCH 1/2] KVM: Separate out dirty_bitmap allocation code as kvm_kvzalloc()

2012-05-19 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Will be used for lpage_info allocation later. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- virt/kvm/kvm_main.c | 32 ++-- 1 files changed, 22 insertions(+), 10 deletions(-) diff --git

[PATCH 2/2 v2] KVM: Avoid wasting pages for small lpage_info arrays

2012-05-19 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp lpage_info is created for each large level even when the memory slot is not for RAM. This means that when we add one slot for a PCI device, we end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc(). To make things worse
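
To see why a small slot needs very little per-level bookkeeping, the sketch below counts how many large-page entries a slot needs at a given level. It assumes x86-style paging in which each level covers 512 times the range of the one below; the macro and function names are illustrative and are modelled on, not copied from, the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: level 1 is a 4 KiB page and each higher level is 512x larger. */
#define HPAGE_GFN_SHIFT(level) (((level) - 1) * 9)

/* Index of the large page containing gfn, relative to the slot's base. */
static uint64_t gfn_to_index(uint64_t gfn, uint64_t base_gfn, int level)
{
	return (gfn >> HPAGE_GFN_SHIFT(level)) -
	       (base_gfn >> HPAGE_GFN_SHIFT(level));
}

/* Number of per-large-page entries a slot needs at the given level. */
static uint64_t lpage_entries(uint64_t base_gfn, uint64_t npages, int level)
{
	assert(npages > 0 && level >= 1);
	return gfn_to_index(base_gfn + npages - 1, base_gfn, level) + 1;
}
```

Under these assumptions a 16-page slot, such as one backing a small PCI region, needs a single entry per large level, so rounding each array up to a whole page leaves almost all of that page unused.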

Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

2012-05-18 Thread Takuya Yoshikawa
On Fri, 18 May 2012 09:26:05 +0200 Ingo Molnar mi...@kernel.org wrote: I'm not sure we had a usable spin_is_contended() back then, nor was the !PREEMPT case in my mind really. The fact that both spin_needbreak() and spin_is_contended() can be used outside of sched is a bit confusing. For

Re: [PATCH 0/4] KVM: Enable EPT access bit feature

2012-05-16 Thread Takuya Yoshikawa
On Wed, 16 May 2012 12:21:53 +0300 Avi Kivity a...@redhat.com wrote: On 05/16/2012 04:04 AM, Xudong Hao wrote: EPT A/D bits enable VMMs to efficiently implement memory management and page classification algorithms to optimize VM memory operations such as de-fragmentation, paging,

Re: dirty-log-perf can not work

2012-05-15 Thread Takuya Yoshikawa
On Mon, 14 May 2012 15:04:42 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: It shows: # ./api/dirty-log-perf ERROR: ld.so: object '/usr/lib64/freetype-infinality/libfreetype.so.6' from LD_PRELOAD cannot be preloaded: ignored. dirty-log-perf: 1125899907104768 slot pages /

[PATCH] KVM: Avoid wasting pages for small lpage_info arrays

2012-05-10 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp lpage_info is created for each large level even when the memory slot is not for RAM. This means that when we add one slot for a PCI device, we end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc(): this problem will become

Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

2012-05-10 Thread Takuya Yoshikawa
Replaced Ingo's address with kernel.org one, On Thu, 03 May 2012 17:47:30 +0200 Peter Zijlstra pet...@infradead.org wrote: On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote: But as I could not see why spin_needbreak() was differently implemented depending on CONFIG_PREEMPT, I

Re: [PATCH unit-tests] Add async page fault test

2012-05-09 Thread Takuya Yoshikawa
On Wed, 9 May 2012 11:59:17 +0300 Gleb Natapov g...@redhat.com wrote: Hmm, yes if it is file backed it may work. Setting up qemu to use file backed memory is one more complication while running the test though. I haven't checked but I am not sure that MADV_DONTNEED will drop page immediately

Re: [PATCH unit-tests] Add async page fault test

2012-05-09 Thread Takuya Yoshikawa
On Wed, 09 May 2012 16:20:23 +0300 Avi Kivity a...@redhat.com wrote: zap_page_range() actually frees these pages, no? Virtio balloon seems to rely on this. The pages are removed from the user address space. But if they're not anonymous, the pages still live in the page cache. Ah,

[RFC] sched: make callers check lock contention for cond_resched_lock()

2012-05-03 Thread Takuya Yoshikawa
=== From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp While doing kvm development, we found a case in which we wanted to break a critical section on lock contention even without CONFIG_PREEMPT. Although we can do that using spin_is_contended() and cond_resched(), changing cond_resched_lock

Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

2012-05-03 Thread Takuya Yoshikawa
On Thu, 03 May 2012 10:35:27 +0200 Peter Zijlstra pet...@infradead.org wrote: On Thu, 2012-05-03 at 17:12 +0900, Takuya Yoshikawa wrote: Although we can do that using spin_is_contended() and cond_resched(), changing cond_resched_lock() to satisfy such a need is another option. Yeah

Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

2012-05-03 Thread Takuya Yoshikawa
On Thu, 03 May 2012 20:23:18 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: Takuya, i am sorry, please forgive my rudeness! Since my English is so poor that it is easy for me to misunderstand the mail. :( Me too, I am not good at reading/speaking English! No problem. But this

Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

2012-05-03 Thread Takuya Yoshikawa
On Thu, 03 May 2012 14:29:10 +0200 Peter Zijlstra pet...@infradead.org wrote: On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote: Although the real use case is out of this RFC patch, we are now discussing a case in which we may hold a spin_lock for long time, ms order, depending

Re: [RFC] sched: make callers check lock contention for cond_resched_lock()

2012-05-03 Thread Takuya Yoshikawa
On Thu, 03 May 2012 15:47:26 +0300 Avi Kivity a...@redhat.com wrote: On 05/03/2012 03:29 PM, Peter Zijlstra wrote: On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote: Although the real use case is out of this RFC patch, we are now discussing a case in which we may hold a spin_lock

Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging

2012-05-02 Thread Takuya Yoshikawa
On Sat, 28 Apr 2012 19:05:44 +0900 Takuya Yoshikawa takuya.yoshik...@gmail.com wrote: 1. Problem During live migration, if the guest tries to take mmu_lock at the same time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be forced to wait long time

Re: Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging

2012-05-02 Thread Takuya Yoshikawa
On Wed, 02 May 2012 14:33:55 +0300 Avi Kivity a...@redhat.com wrote: = perf top -t ${QEMU_TID} = 51.52% qemu-system-x86_64 [.] memory_region_get_dirty 16.73% qemu-system-x86_64 [.] ram_save_remaining

Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

2012-05-02 Thread Takuya Yoshikawa
On Wed, 02 May 2012 13:39:51 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: Was the problem really mmu_lock contention? Takuya, i am so tired to argue the advantage of lockless write-protect and lockless O(1) dirty-log again and again. You are missing my point. Please do not

[Qemu-devel] Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging

2012-05-02 Thread Takuya Yoshikawa
On Sat, 28 Apr 2012 19:05:44 +0900 Takuya Yoshikawa takuya.yoshik...@gmail.com wrote: 1. Problem During live migration, if the guest tries to take mmu_lock at the same time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be forced to wait long time

Re: [Qemu-devel] Heavy memory_region_get_dirty() -- Re: [PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging

2012-05-02 Thread Takuya Yoshikawa
On Wed, 02 May 2012 14:33:55 +0300 Avi Kivity a...@redhat.com wrote: = perf top -t ${QEMU_TID} = 51.52% qemu-system-x86_64 [.] memory_region_get_dirty 16.73% qemu-system-x86_64 [.] ram_save_remaining

Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-05-01 Thread Takuya Yoshikawa
On Tue, 1 May 2012 00:04:47 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: Looking forward to it! After your work, 8192 in my patch may better be lowered a bit. Why not simply use spin_is_contended again? Are you afraid of GET_DIRTY_LOG starved by pagefaults? No, but not so confident.

Re: [PATCH] kvm: update eax documentation in signature cpuid

2012-05-01 Thread Takuya Yoshikawa
On Tue, 1 May 2012 16:06:11 +0300 Michael S. Tsirkin m...@redhat.com wrote: Note that this value in ebx, ecx and edx corresponds to the string KVMKVMKVM. +The value in eax corresponds to the maximim cpuid function present in this leaf, maximum? Takuya +and will be updated if

[PATCH 0/2 v2] KVM: x86 emulator: Simplify ModRM fetching

2012-04-30 Thread Takuya Yoshikawa
Updated based on Avi's advice. Takuya Yoshikawa (2): KVM: x86 emulator: Move ModRM flags for groups to top level opcode tables KVM: x86 emulator: Avoid pushing back ModRM byte fetched for group decoding arch/x86/kvm/emulate.c | 119 1 files

[PATCH 1/2] KVM: x86 emulator: Move ModRM flags for groups to top level opcode tables

2012-04-30 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Needed for the following patch which simplifies ModRM fetching code. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp --- arch/x86/kvm/emulate.c | 111 1 files changed, 56

[PATCH 2/2 v2] KVM: x86 emulator: Avoid pushing back ModRM byte fetched for group decoding

2012-04-30 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Although ModRM byte is fetched for group decoding, it is soon pushed back to make decode_modrm() fetch it later again. Now that ModRM flag can be found in the top level opcode tables, fetch ModRM byte before group decoding to make the code

Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-30 Thread Takuya Yoshikawa
On Sun, 29 Apr 2012 18:20:37 +0300 Avi Kivity a...@redhat.com wrote: - cond_resched_lock() uses spin_needbreak(), and I am not sure if this still does rescheduling for alleviating mmu_lock contention without CONFIG_PREEMPT. IMO that's a feature. If kernel policy is not

Re: [PATCH 1/2] KVM: x86 emulator: Move ModRM flags for groups to top level opcode tables

2012-04-30 Thread Takuya Yoshikawa
On Mon, 30 Apr 2012 13:31:09 +0300 Avi Kivity a...@redhat.com wrote: static struct opcode group7_rm1[] = { - DI(SrcNone | ModRM | Priv, monitor), - DI(SrcNone | ModRM | Priv, mwait), + DI(SrcNone | Priv, monitor), + DI(SrcNone | Priv, mwait), N, N, N, N, N, N, };

Re: [PATCH v4 06/10] KVM: MMU: fast path of handling guest page fault

2012-04-29 Thread Takuya Yoshikawa
On Fri, 27 Apr 2012 11:52:13 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: Yes but the objective you are aiming for is to read and write sptes without mmu_lock. That is, i am not talking about this patch. Please read carefully the two examples i gave (separated by example)). The real

Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-29 Thread Takuya Yoshikawa
On Sun, 29 Apr 2012 14:27:30 +0300 Avi Kivity a...@redhat.com wrote: + if (need_resched() || spin_is_contended(&kvm->mmu_lock)) { + kvm_flush_remote_tlbs(kvm); Do we really need to flush the TLB here? Suppose we don't. So some pages could

Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-29 Thread Takuya Yoshikawa
On Sun, 29 Apr 2012 15:59:18 +0300 Avi Kivity a...@redhat.com wrote: As we discussed before, we need to add some tricks to de-couple mmu_lock and TLB flush. Ok, let's discuss them (we can apply the patch independently). Do you have something in mind? How about your own idea?

Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-29 Thread Takuya Yoshikawa
On Sun, 29 Apr 2012 17:39:35 +0300 Avi Kivity a...@redhat.com wrote: How about your own idea? Looks pretty good, if I do say so myself. I'll take a shot at implementing it. Looking forward to it! After your work, 8192 in my patch may better be lowered a bit. Thanks, Takuya --

Re: [PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-29 Thread Takuya Yoshikawa
On Sun, 29 Apr 2012 18:00:03 +0300 Avi Kivity a...@redhat.com wrote: After your work, 8192 in my patch may better be lowered a bit. Why not remove it altogether? Just change it to cond_resched_lock(). Two concerns: - too many checks may slow down GET_DIRTY_LOG. -

[PATCH 0/1 v2] KVM: Alleviate mmu_lock contention during dirty logging

2012-04-28 Thread Takuya Yoshikawa
dirty page logging, can cause the same problem. My plan is to replace it with rmap-based protection after this. Thanks, Takuya --- Takuya Yoshikawa (1): KVM: Reduce mmu_lock contention during dirty logging by cond_resched() arch/x86/include/asm/kvm_host.h |6 +++--- arch/x86/kvm

[PATCH 1/1 v2] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-28 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp get_dirty_log() needs to hold mmu_lock during write protecting dirty pages and this can be long when there are many dirty pages to protect. As the guest can get faulted during that time, and of course other mmu works can also happen, this may

Re: [PATCH 4/4] KVM: x86 emulator: Avoid pushing back ModRM byte in decode_opcode()

2012-04-24 Thread Takuya Yoshikawa
On Tue, 24 Apr 2012 17:10:08 +0300 Avi Kivity a...@redhat.com wrote: + if (!modrm_fetched && (ctxt->d & ModRM)) + ctxt->modrm = insn_fetch(u8, ctxt); Instead of adding yet another conditional, how about doing something like if ((c->d & ModRM) || (gtype == Group) || (gtype ==

Re: [PATCH 0/4] KVM: x86 emulator: Split decoder into separate functions

2012-04-24 Thread Takuya Yoshikawa
On Tue, 24 Apr 2012 17:11:54 +0300 Avi Kivity a...@redhat.com wrote: On 04/23/2012 06:31 PM, Takuya Yoshikawa wrote: Takuya Yoshikawa (4): KVM: x86 emulator: Introduce ctxt->op_prefix for 0x66 prefix KVM: x86 emulator: Make prefix decoding a separate function KVM: x86 emulator: Make

[PATCH 0/4] KVM: x86 emulator: Split decoder into separate functions

2012-04-23 Thread Takuya Yoshikawa
Takuya Yoshikawa (4): KVM: x86 emulator: Introduce ctxt->op_prefix for 0x66 prefix KVM: x86 emulator: Make prefix decoding a separate function KVM: x86 emulator: Make opcode decoding a separate function KVM: x86 emulator: Avoid pushing back ModRM byte in decode_opcode() arch/x86/include

[PATCH 1/4] KVM: x86 emulator: Introduce ctxt->op_prefix for 0x66 prefix

2012-04-23 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp This is needed to make prefix decoding a separate function. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Takuya Yoshikawa takuya.yoshik...@gmail.com --- arch/x86/include/asm/kvm_emulate.h |1 + arch/x86/kvm

[PATCH 2/4] KVM: x86 emulator: Make prefix decoding a separate function

2012-04-23 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Instruction decoding can be split into separate parts, and this is the first one which treats the instruction prefixes. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Takuya Yoshikawa takuya.yoshik...@gmail.com --- arch

[PATCH 3/4] KVM: x86 emulator: Make opcode decoding a separate function

2012-04-23 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp This is the second part of the instruction decoding which treats the opcode. Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Cc: Takuya Yoshikawa takuya.yoshik...@gmail.com --- arch/x86/kvm/emulate.c | 66

[PATCH 4/4] KVM: x86 emulator: Avoid pushing back ModRM byte in decode_opcode()

2012-04-23 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp Although ModRM byte is read for group decoding, it is soon pushed back to make decode_modrm() fetch it later. We should consistently read it, only once, in decode_opcode() instead. Signed-off-by: Takuya Yoshikawa yoshikawa.tak

kvm build failure on 32bit host: arch/x86/kvm/emulate.c

2012-04-22 Thread Takuya Yoshikawa
I got the following error: === arch/x86/kvm/emulate.c: Assembler messages: arch/x86/kvm/emulate.c:4122: Error: bad register name `%dil' make[2]: *** [arch/x86/kvm/emulate.o] Error 1 ... === The line indicates: commit cbe2c9d30aa69b0551247ddb0fb450b6e8080ec4 KVM: x86 emulator: MMX support

Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte

2012-04-20 Thread Takuya Yoshikawa
Sorry for the delay. On Wed, 18 Apr 2012 12:01:03 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: I have checked dirty-log-perf myself with this patch [01-07]. GET_DIRTY_LOG for 1GB dirty pages has become more than 15% slower. Thanks for your test! Unbelievable, i will

Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte

2012-04-20 Thread Takuya Yoshikawa
On Wed, 18 Apr 2012 18:03:10 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: By the way, what is the case did you test? ept = 1 ? Yes! I am not worrying about without-EPT/NPT-migration. Takuya -- To unsubscribe from this list: send the line unsubscribe kvm in the body of

Re: [PATCH v3 2/9] KVM: MMU: abstract spte write-protect

2012-04-20 Thread Takuya Yoshikawa
On Fri, 20 Apr 2012 18:33:19 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: It is preferable to remove all large sptes including read-only ones, the current behaviour, than to verify that no read-write transition can occur in fault paths (fault paths which are increasing in number). I

Re: [PATCH v3 5/9] KVM: MMU: introduce SPTE_WRITE_PROTECT bit

2012-04-20 Thread Takuya Yoshikawa
On Fri, 20 Apr 2012 21:55:55 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: More importantly than the particular flush TLB case, the point is every piece of code that reads and writes sptes must now be aware that mmu_lock alone does not guarantee stability. Everything must be audited. In

Re: [PATCH] kvm: don't call mmu_shrinker w/o used_mmu_pages

2012-04-20 Thread Takuya Yoshikawa
On Fri, 20 Apr 2012 16:07:41 -0700 Ying Han ying...@google.com wrote: My understanding of the real pain is the poor implementation of the mmu_shrinker. It iterates all the registered mmu_shrink callbacks for each kvm and only does little work at a time while holding two big locks. I learned

Re: [PATCH] kvm: don't call mmu_shrinker w/o used_mmu_pages

2012-04-20 Thread Takuya Yoshikawa
On Fri, 20 Apr 2012 19:15:24 -0700 Mike Waychison mi...@google.com wrote: In our situation, we simply disable the shrinker altogether. My understanding is that with EPT or NPT, the amount of memory used by these tables is bounded by the size of guest physical memory, whereas with software

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-18 Thread Takuya Yoshikawa
On Tue, 17 Apr 2012 17:56:24 +0300 Avi Kivity a...@redhat.com wrote: For live migration, range-based control may be enough due to the locality of WWS. What's WWS? IIRC it was mentioned in a usenix paper: Writable Working Set. May not be a commonly known concept. Kind of working set, but

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-17 Thread Takuya Yoshikawa
On Tue, 17 Apr 2012 10:51:40 +0300 Avi Kivity a...@redhat.com wrote: That's true with the write protect everything approach we use now. But it's not true with range-based write protection, where you issue GET_DIRTY_LOG on a range of pages and only need to re-write-protect them. (the

Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte

2012-04-17 Thread Takuya Yoshikawa
On Mon, 16 Apr 2012 11:36:25 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: I tested it with kernbench, no regression is found. Because kernbench is not at all good test for this. It is not a problem since the iter and spte should be in the cache. I have checked dirty-log-perf

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-17 Thread Takuya Yoshikawa
On Tue, 17 Apr 2012 15:41:39 +0300 Avi Kivity a...@redhat.com wrote: Since there are many known algorithms to predict hot memory pages, the userspace will be able to tune the frequency of GET_DIRTY_LOG for such parts not to get too many faults repeatedly, if we can restrict the range of

Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect

2012-04-16 Thread Takuya Yoshikawa
On Sun, 15 Apr 2012 14:25:30 +0300 Avi Kivity a...@redhat.com wrote: @@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu, kvm_mmu_pages_init(parent, parents, pages); while (mmu_unsync_walk(parent, pages)) { - int protected = 0; + bool

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-16 Thread Takuya Yoshikawa
On Sun, 15 Apr 2012 12:32:59 +0300 Avi Kivity a...@redhat.com wrote: Just to throw another idea into the mix - we can have write-protect-less dirty logging, too. Instead of write protection, drop the dirty bit, and check it again when reading the dirty log. It might look like we're

Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect

2012-04-16 Thread Takuya Yoshikawa
On Mon, 16 Apr 2012 17:28:32 +0300 Avi Kivity a...@redhat.com wrote: But the real question is whether there is any point in re-writing completely correct C code: there are tons of int like this in the kernel code. __rmap_write_protect() was introduced recently, so if this conversion is

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-13 Thread Takuya Yoshikawa
Xiao, Takuya Yoshikawa takuya.yoshik...@gmail.com wrote: What is your really want to say but i missed? How to improve and what we should pay for that. Note that I am not objecting to O(1) itself. I forgot to say one important thing -- I might give you wrong impression. I am perfectly

Re: [PATCH being tested] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-13 Thread Takuya Yoshikawa
On Thu, 12 Apr 2012 19:56:45 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: Other than potential performance improvement, the worst case scenario of holding mmu_lock for hundreds of milliseconds at the beginning of migration of huge guests must be fixed. Write protection in

Re: [PATCH v2] KVM: Avoid zapping unrelated shadows in __kvm_set_memory_region()

2012-04-13 Thread Takuya Yoshikawa
On Fri, 13 Apr 2012 18:33:39 -0300 Marcelo Tosatti mtosa...@redhat.com wrote: kvm_arch_commit_memory_region(kvm, mem, old, user_alloc); - /* - * If the new memory slot is created, we need to clear all - * mmio sptes. - */ - if (npages && old.base_gfn !=

Re: [PATCH v2] KVM: Avoid zapping unrelated shadows in __kvm_set_memory_region()

2012-04-13 Thread Takuya Yoshikawa
Hi, On Wed, 11 Apr 2012 11:11:07 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: restart: - list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) - if (kvm_mmu_prepare_zap_page(kvm, sp, invalid_list)) - goto restart; + zapped

Re: [PATCH v2 04/16] KVM: MMU: return bool in __rmap_write_protect

2012-04-13 Thread Takuya Yoshikawa
On Fri, 13 Apr 2012 18:11:13 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: The return value of __rmap_write_protect is either 1 or 0, use true/false instead of these ... @@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,

Re: [PATCH v2 03/16] KVM: MMU: properly assert spte on rmap walking path

2012-04-13 Thread Takuya Yoshikawa
On Fri, 13 Apr 2012 18:10:45 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: static u64 *rmap_get_next(struct rmap_iterator *iter) { + u64 *sptep = NULL; + if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { - u64 *sptep; -

Re: [PATCH v2 05/16] KVM: MMU: abstract spte write-protect

2012-04-13 Thread Takuya Yoshikawa
On Fri, 13 Apr 2012 18:11:45 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: +/* Return true if the spte is dropped. */ Return value does not correspond with the function name so it is confusing. People may think that true means write protection has been done. +static bool

Re: [PATCH v2 07/16] KVM: MMU: introduce for_each_pte_list_spte

2012-04-13 Thread Takuya Yoshikawa

Re: [PATCH v2 10/16] KVM: MMU: fask check whether page is writable

2012-04-13 Thread Takuya Yoshikawa
On Fri, 13 Apr 2012 18:14:26 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: Using bit 1 (PTE_LIST_WP_BIT) in rmap store the write-protect status to avoid unnecessary shadow page walking Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com --- arch/x86/kvm/mmu.c |

Re: [PATCH v2 00/16] KVM: MMU: fast page fault

2012-04-13 Thread Takuya Yoshikawa
On Fri, 13 Apr 2012 18:05:29 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: Thanks for Avi and Marcelo's review, i have simplified the whole things in this version: - it only fix the page fault with PFEC.P = 1 && PFEC.W = 0 that means unlock set_spte path can be dropped. - it

[PATCH being tested] KVM: Reduce mmu_lock contention during dirty logging by cond_resched()

2012-04-11 Thread Takuya Yoshikawa
increase if multiple VCPUs try to write to memory. Anyway, this patch is small and seems effective. Takuya === From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp get_dirty_log() needs to hold mmu_lock during write protecting dirty pages and this can be long when there are many dirty

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-11 Thread Takuya Yoshikawa
On Tue, 10 Apr 2012 19:58:44 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: No, i do not really agree with that. We really can get great benefit from O(1) especially if lockless write-protect is introduced for O(1), live migration is very useful for cloud computing

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-11 Thread Takuya Yoshikawa
On Wed, 11 Apr 2012 20:38:57 +0800 Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote: Well, my point is that live migration is so very useful that it is worth to be improved, the description of your also proves this point. What is your really want to say but i missed? How to improve and

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-11 Thread Takuya Yoshikawa
On Wed, 11 Apr 2012 17:21:30 +0300 Avi Kivity a...@redhat.com wrote: Currently the main performance bottleneck for migration is qemu, which is single threaded and generally inefficient. However I am sure that once the qemu bottlenecks will be removed we'll encounter kvm problems,

Re: [PATCH 00/13] KVM: MMU: fast page fault

2012-04-10 Thread Takuya Yoshikawa
On Tue, 10 Apr 2012 13:39:14 +0300 Avi Kivity a...@redhat.com wrote: On 04/09/2012 10:46 PM, Marcelo Tosatti wrote: Perhaps the mmu_lock hold times by get_dirty are a large component here? If that can be alleviated, not only RO-RW faults benefit. Currently the longest holder in normal

[PATCH v2] KVM: Avoid zapping unrelated shadows in __kvm_set_memory_region()

2012-04-10 Thread Takuya Yoshikawa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp We do not need to zap all shadow pages of the guest when we create or destroy a slot in this function. To change this, we make kvm_mmu_zap_all()/kvm_arch_flush_shadow() zap only those which have mappings into a given slot. The way we iterate
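
The selective zap described in the patch summary above (zap only the shadow pages that map into a given slot, rather than all of them) can be sketched with a toy model. The types and names below are hypothetical and heavily simplified: a real shadow page can hold mappings for many guest frames, whereas here each page carries a single `gfn`, and none of this is the patch's actual iteration logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy types; the real structures carry far more state. */
struct toy_shadow_page {
    uint64_t gfn;     /* the one guest frame this toy page maps */
    int zapped;
};

struct toy_memslot {
    uint64_t base_gfn;
    uint64_t npages;
};

/* True if gfn lies in [base_gfn, base_gfn + npages). */
static int toy_gfn_in_slot(const struct toy_memslot *slot, uint64_t gfn)
{
    return gfn >= slot->base_gfn && gfn - slot->base_gfn < slot->npages;
}

/*
 * Zap only the pages that map into the given slot and leave the rest
 * untouched. Returns the number of pages zapped by this call.
 */
static size_t toy_zap_slot(struct toy_shadow_page *pages, size_t n,
                           const struct toy_memslot *slot)
{
    size_t i, zapped = 0;

    assert(pages != NULL && slot != NULL);

    for (i = 0; i < n; i++) {
        if (!pages[i].zapped && toy_gfn_in_slot(slot, pages[i].gfn)) {
            pages[i].zapped = 1;
            zapped++;
        }
    }
    return zapped;
}
```

The point of the model is only the filter: pages outside the slot keep their mappings, so creating or destroying one slot does not force unrelated parts of the guest to fault everything back in.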
