this by using kvm_handle_hva_range().
On our x86 host, with a minimum configuration for the guest, the
invalidation became 40% faster on average and the worst case was also
improved to the same degree.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Alexander Graf ag...@suse.de
Cc: Paul
On Fri, 15 Jun 2012 20:31:44 +0900
Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp wrote:
...
diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
index d03eb6f..53716dd 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
+++ b/arch/powerpc/kvm
This restricts hva handling in mmu code and makes it easier to extend
kvm_handle_hva() so that it can treat a range of addresses later in this
patch series.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Alexander Graf ag...@suse.de
Cc: Paul Mackerras pau...@samba.org
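A minimal sketch of the range interface this series builds toward, modeled on the x86 mmu code of that era; kvm_for_each_memslot() and hva_to_gfn_memslot() are assumed available and error handling is elided, so treat this as an illustration rather than the patch itself:
===
/* Apply a handler to every rmap whose hva falls in [start, end),
 * instead of calling kvm_handle_hva() once per page. */
static int kvm_handle_hva_range(struct kvm *kvm,
				unsigned long start, unsigned long end,
				int (*handler)(struct kvm *kvm,
					       unsigned long *rmapp))
{
	struct kvm_memory_slot *memslot;
	int ret = 0;

	kvm_for_each_memslot(memslot, kvm->memslots) {
		unsigned long hva_start, hva_end;
		gfn_t gfn, gfn_end;

		hva_start = max(start, memslot->userspace_addr);
		hva_end = min(end, memslot->userspace_addr +
				   (memslot->npages << PAGE_SHIFT));
		if (hva_start >= hva_end)
			continue;

		/* one slot lookup covers the whole overlapping gfn range */
		gfn = hva_to_gfn_memslot(hva_start, memslot);
		gfn_end = hva_to_gfn_memslot(hva_end - 1, memslot) + 1;
		for (; gfn < gfn_end; ++gfn)
			ret |= handler(kvm,
				       &memslot->rmap[gfn - memslot->base_gfn]);
	}
	return ret;
}
===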
On Thu, 14 Jun 2012 18:36:42 +0900
Akinobu Mita akinobu.m...@gmail.com wrote:
1) while I agree with Akinobu and thank him for pointing out a
_potential_ alignment problem, this is a separate issue and your
existing patch should go in anyway. There are probably other drivers
with
On Wed, 13 Jun 2012 18:43:40 +0900
Akinobu Mita akinobu.m...@gmail.com wrote:
Should this hash_table be converted from u16 hash_table[32] to
DECLARE_BITMAP(hash_table, 16 * 32) to ensure that it is aligned
on long-word boundary?
I think hash_table is already long-word aligned because it is
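For readers following the thread, a self-contained sketch of the concern: a u16 array only guarantees 2-byte alignment, while the little-endian bitops cast the buffer to unsigned long *. The macros below mirror the kernel's; the struct layouts are made up for illustration.
===
#define BITS_PER_LONG		(8 * sizeof(unsigned long))
#define BITS_TO_LONGS(nr)	(((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)
#define DECLARE_BITMAP(name, bits)	unsigned long name[BITS_TO_LONGS(bits)]

struct setup_frame_a {
	u16 hash_table[32];			/* 512 bits, 2-byte alignment only */
};

struct setup_frame_b {
	DECLARE_BITMAP(hash_table, 16 * 32);	/* 512 bits, long-aligned */
};
===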
On Wed, 13 Jun 2012 22:31:13 +0900
Akinobu Mita akinobu.m...@gmail.com wrote:
Should this hash_table be converted from u16 hash_table[32] to
DECLARE_BITMAP(hash_table, 16 * 32) to ensure that it is aligned
on long-word boundary?
I think hash_table is already long-word aligned because it
On Wed, 13 Jun 2012 08:21:18 -0700
Grant Grundler grantgrund...@gmail.com wrote:
Should this hash_table be converted from u16 hash_table[32] to
DECLARE_BITMAP(hash_table, 16 * 32) to ensure that it is aligned
on long-word boundary?
I think hash_table is already long-word aligned
On Wed, 13 Jun 2012 18:39:05 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
/* Return true if the spte is dropped. */
-static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush)
+static bool
+spte_write_protect(struct kvm *kvm, u64 *sptep, bool *flush, bool pt_protect)
On Wed, 13 Jun 2012 19:40:02 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
mmu_spte_update is handling several different cases. Please rewrite
it, add a comment on top of it (or spread comments on top of each
significant code line) with all cases it is handling (also recheck it
regarding
On Mon, 11 Jun 2012 14:09:15 +
Arnd Bergmann a...@arndb.de wrote:
On Monday 11 June 2012, Takuya Yoshikawa wrote:
/* Set bit in a little-endian bitfield */
-static inline void set_bit_le(unsigned nr, unsigned char *addr)
+static inline void efx_set_bit_le(unsigned nr, unsigned char
On Mon, 11 Jun 2012 14:10:26 +
Arnd Bergmann a...@arndb.de wrote:
On Monday 11 June 2012, Takuya Yoshikawa wrote:
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is
being used for this missing function
by Grant
- bitops: added clear_bit_le -- suggested by Arnd
- powerpc: added the same code
Ben Hutchings (1):
sfc: Use standard __{clear,set}_bit_le() functions
Takuya Yoshikawa (4):
drivers/net/ethernet/dec/tulip: Use standard __set_bit_le() function
bitops: Introduce generic {clear,set
From: Ben Hutchings bhutchi...@solarflare.com
There are now standard functions for dealing with little-endian bit
arrays, so use them instead of our own implementations.
Signed-off-by: Ben Hutchings bhutchi...@solarflare.com
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
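For reference, a sketch of what the standard helpers do, modeled on include/asm-generic/bitops/le.h (the real versions defer to __set_bit()/__clear_bit(); details vary by architecture):
===
/* Bit nr counts from the LSB of byte 0; on big-endian the index is
 * swizzled so the same byte is touched as on little-endian.
 * Non-atomic, like the __-prefixed kernel versions. */
#ifdef __BIG_ENDIAN
#define BITOP_LE_SWIZZLE	((BITS_PER_LONG - 1) & ~0x7)
#else
#define BITOP_LE_SWIZZLE	0
#endif

static inline void __set_bit_le(int nr, void *addr)
{
	unsigned long *p = addr;

	nr ^= BITOP_LE_SWIZZLE;
	p[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static inline void __clear_bit_le(int nr, void *addr)
{
	unsigned long *p = addr;

	nr ^= BITOP_LE_SWIZZLE;
	p[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}
===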
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
To introduce generic set_bit_le() later, we remove our own definition
and use a proper non-atomic bitops function: __set_bit_le().
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Acked-by: Grant Grundler grund...@parisc
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is
being used for this missing function.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Acked-by: Arnd Bergmann a...@arndb.de
---
include/asm-generic/bitops
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is
being used for this missing function.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Benjamin Herrenschmidt b...@kernel.crashing.org
---
arch/powerpc
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Now that we have defined generic set_bit_le() we do not need to use
test_and_set_bit_le() for atomically setting a bit.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Avi Kivity a...@redhat.com
Cc: Marcelo Tosatti mtosa
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Now that we have defined generic set_bit_le() we do not need to use
test_and_set_bit_le() for atomically setting a bit.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
virt/kvm/kvm_main.c |3 +--
1 files changed, 1
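The kvm_main.c change is a one-liner; a hedged reconstruction of the hunk (context lines approximate, from mark_page_dirty_in_slot()):
===
 	if (memslot && memslot->dirty_bitmap) {
 		unsigned long rel_gfn = gfn - memslot->base_gfn;

-		/* TODO: introduce set_bit_le() and use it */
-		test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap);
+		set_bit_le(rel_gfn, memslot->dirty_bitmap);
 	}
===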
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Needed to replace test_and_set_bit_le() in virt/kvm/kvm_main.c which is
being used for this missing function.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Arnd Bergmann a...@arndb.de
---
include/asm-generic/bitops/le.h
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Needed to introduce generic set_bit_le().
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Grant Grundler grund...@parisc-linux.org
---
drivers/net/ethernet/dec/tulip/de2104x.c|7 +++
drivers/net/ethernet/dec
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Needed to introduce generic set_bit_le().
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Ben Hutchings bhutchi...@solarflare.com
---
drivers/net/ethernet/sfc/efx.c|4 ++--
drivers/net/ethernet/sfc/net_driver.h
KVM is using test_and_set_bit_le() for this missing function; this patch
series corrects this usage.
As some drivers have their own definitions of set_bit_le(), which seem
to be incompatible with the generic one, renaming is also needed.
Note: the whole series is against linux-next.
Takuya
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
A size argument is not needed to return one of the pre-allocated objects.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
arch/x86/kvm/mmu.c | 14 +-
1 files changed, 5 insertions(+), 9 deletions(-)
diff --git a/arch/x86
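A sketch of the simplified allocator after the change: every object in the cache was pre-allocated with the same size, so popping one needs no size argument (shape follows arch/x86/kvm/mmu.c, details approximate):
===
static void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc)
{
	void *p;

	BUG_ON(!mc->nobjs);
	p = mc->objects[--mc->nobjs];	/* just pop a pre-allocated object */
	return p;
}
===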
On Thu, 17 May 2012 13:24:41 +0300
Avi Kivity a...@redhat.com wrote:
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2256f51..a2149d8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3130,7 +3130,9 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Will be used for lpage_info allocation later.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
virt/kvm/kvm_main.c | 32 ++--
1 files changed, 22 insertions(+), 10 deletions(-)
diff --git
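A sketch of the helper being added, assuming the usual kzalloc-with-vmalloc-fallback shape so large lpage_info arrays need not be physically contiguous:
===
void *kvm_kvzalloc(unsigned long size)
{
	if (size > PAGE_SIZE)
		return vzalloc(size);	/* large: virtually contiguous only */
	else
		return kzalloc(size, GFP_KERNEL);
}
===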
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
lpage_info is created for each large level even when the memory slot is
not for RAM. This means that when we add one slot for a PCI device, we
end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc().
To make things worse
On Fri, 18 May 2012 09:26:05 +0200
Ingo Molnar mi...@kernel.org wrote:
I'm not sure we had a usable spin_is_contended() back then, nor
was the !PREEMPT case in my mind really.
The fact that both spin_needbreak() and spin_is_contended() can be
used outside of sched is a bit confusing.
For
On Wed, 16 May 2012 12:21:53 +0300
Avi Kivity a...@redhat.com wrote:
On 05/16/2012 04:04 AM, Xudong Hao wrote:
EPT A/D bits enable VMMs to efficiently implement memory management and
page classification algorithms to optimize VM memory operations such as
de-fragmentation, paging,
On Mon, 14 May 2012 15:04:42 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
It shows:
# ./api/dirty-log-perf
ERROR: ld.so: object '/usr/lib64/freetype-infinality/libfreetype.so.6' from
LD_PRELOAD cannot be preloaded: ignored.
dirty-log-perf: 1125899907104768 slot pages /
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
lpage_info is created for each large level even when the memory slot is
not for RAM. This means that when we add one slot for a PCI device, we
end up allocating at least KVM_NR_PAGE_SIZES - 1 pages by vmalloc():
this problem will become
Replaced Ingo's address with kernel.org one,
On Thu, 03 May 2012 17:47:30 +0200
Peter Zijlstra pet...@infradead.org wrote:
On Thu, 2012-05-03 at 22:00 +0900, Takuya Yoshikawa wrote:
But as I could not see why spin_needbreak() was differently
implemented
depending on CONFIG_PREEMPT, I
On Wed, 9 May 2012 11:59:17 +0300
Gleb Natapov g...@redhat.com wrote:
Hmm, yes if it is file backed it may work. Setting up qemu to use file
backed memory is one more complication while running the test though.
I haven't checked but I am not sure that MADV_DONTNEED will drop the page
immediately
On Wed, 09 May 2012 16:20:23 +0300
Avi Kivity a...@redhat.com wrote:
zap_page_range() actually frees these pages, no?
Virtio balloon seems to rely on this.
The pages are removed from the user address space. But if they're not
anonymous, the pages still live in the page cache.
Ah,
===
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
While doing kvm development, we found a case in which we wanted to break
a critical section on lock contention even without CONFIG_PREEMPT.
Although we can do that using spin_is_contended() and cond_resched(),
changing cond_resched_lock
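The pattern under discussion, as it appears in the patch quoted later in the thread; a sketch of breaking a long mmu_lock section, flushing remote TLBs first so the lock can be dropped safely:
===
	if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
		kvm_flush_remote_tlbs(kvm);
		cond_resched_lock(&kvm->mmu_lock);
	}
===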
On Thu, 03 May 2012 10:35:27 +0200
Peter Zijlstra pet...@infradead.org wrote:
On Thu, 2012-05-03 at 17:12 +0900, Takuya Yoshikawa wrote:
Although we can do that using spin_is_contended() and cond_resched(),
changing cond_resched_lock() to satisfy such a need is another option.
Yeah
On Thu, 03 May 2012 20:23:18 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
Takuya, i am sorry, please forgive my rudeness! Since my English is
so poor that it is easy for me to misunderstand the mail. :(
Me too, I am not good at reading/speaking English!
No problem.
But this
On Thu, 03 May 2012 14:29:10 +0200
Peter Zijlstra pet...@infradead.org wrote:
On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
Although the real use case is out of this RFC patch, we are now discussing
a case in which we may hold a spin_lock for long time, ms order, depending
On Thu, 03 May 2012 15:47:26 +0300
Avi Kivity a...@redhat.com wrote:
On 05/03/2012 03:29 PM, Peter Zijlstra wrote:
On Thu, 2012-05-03 at 21:22 +0900, Takuya Yoshikawa wrote:
Although the real use case is out of this RFC patch, we are now discussing
a case in which we may hold a spin_lock
On Sat, 28 Apr 2012 19:05:44 +0900
Takuya Yoshikawa takuya.yoshik...@gmail.com wrote:
1. Problem
During live migration, if the guest tries to take mmu_lock at the same
time as GET_DIRTY_LOG, which is called periodically by QEMU, it may be
forced to wait long time
On Wed, 02 May 2012 14:33:55 +0300
Avi Kivity a...@redhat.com wrote:
=
perf top -t ${QEMU_TID}
=
51.52% qemu-system-x86_64 [.] memory_region_get_dirty
16.73% qemu-system-x86_64 [.] ram_save_remaining
On Wed, 02 May 2012 13:39:51 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
Was the problem really mmu_lock contention?
Takuya, i am so tired to argue the advantage of lockless write-protect
and lockless O(1) dirty-log again and again.
You are missing my point. Please do not
On Tue, 1 May 2012 00:04:47 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
Looking forward to it!
After your work, 8192 in my patch may better be lowered a bit.
Why not simply use spin_is_contended again? Are you afraid of
GET_DIRTY_LOG starved by pagefaults?
No, but not so confident.
On Tue, 1 May 2012 16:06:11 +0300
Michael S. Tsirkin m...@redhat.com wrote:
Note that this value in ebx, ecx and edx corresponds to the string
KVMKVMKVM.
+The value in eax corresponds to the maximim cpuid function present in this
leaf,
maximum?
Takuya
+and will be updated if
Updated based on Avi's advice.
Takuya Yoshikawa (2):
KVM: x86 emulator: Move ModRM flags for groups to top level opcode tables
KVM: x86 emulator: Avoid pushing back ModRM byte fetched for group decoding
arch/x86/kvm/emulate.c | 119
1 files
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Needed for the following patch which simplifies ModRM fetching code.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
---
arch/x86/kvm/emulate.c | 111
1 files changed, 56
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Although the ModRM byte is fetched for group decoding, it is soon pushed
back to make decode_modrm() fetch it later again.
Now that ModRM flag can be found in the top level opcode tables, fetch
ModRM byte before group decoding to make the code
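A hedged sketch of the fetch-once idea; field names follow arch/x86/kvm/emulate.c loosely, insn_fetch() is the emulator's fetch macro, and error paths are elided:
===
	ctxt->b = insn_fetch(u8, ctxt);		/* opcode byte */
	opcode = opcode_table[ctxt->b];
	ctxt->d = opcode.flags;

	/* The ModRM flag now lives in the top level table, so the byte
	 * can be read exactly once, before any group decoding: */
	if (ctxt->d & ModRM)
		ctxt->modrm = insn_fetch(u8, ctxt);
===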
On Sun, 29 Apr 2012 18:20:37 +0300
Avi Kivity a...@redhat.com wrote:
- cond_resched_lock() uses spin_needbreak(), and I am not sure if
this still does rescheduling for alleviating mmu_lock contention
without CONFIG_PREEMPT.
IMO that's a feature. If kernel policy is not
On Mon, 30 Apr 2012 13:31:09 +0300
Avi Kivity a...@redhat.com wrote:
static struct opcode group7_rm1[] = {
- DI(SrcNone | ModRM | Priv, monitor),
- DI(SrcNone | ModRM | Priv, mwait),
+ DI(SrcNone | Priv, monitor),
+ DI(SrcNone | Priv, mwait),
N, N, N, N, N, N,
};
On Fri, 27 Apr 2012 11:52:13 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
Yes but the objective you are aiming for is to read and write sptes
without mmu_lock. That is, i am not talking about this patch.
Please read carefully the two examples i gave (separated by example).
The real
On Sun, 29 Apr 2012 14:27:30 +0300
Avi Kivity a...@redhat.com wrote:
+ if (need_resched() || spin_is_contended(&kvm->mmu_lock)) {
+ kvm_flush_remote_tlbs(kvm);
Do we really need to flush the TLB here?
Suppose we don't. So some pages could
On Sun, 29 Apr 2012 15:59:18 +0300
Avi Kivity a...@redhat.com wrote:
As we discussed before, we need to add some tricks to de-couple mmu_lock and
TLB flush.
Ok, let's discuss them (we can apply the patch independently). Do you
have something in mind?
How about your own idea?
On Sun, 29 Apr 2012 17:39:35 +0300
Avi Kivity a...@redhat.com wrote:
How about your own idea?
Looks pretty good, if I do say so myself. I'll take a shot at
implementing it.
Looking forward to it!
After your work, 8192 in my patch may better be lowered a bit.
Thanks,
Takuya
--
On Sun, 29 Apr 2012 18:00:03 +0300
Avi Kivity a...@redhat.com wrote:
After your work, 8192 in my patch may better be lowered a bit.
Why not remove it altogether? Just change it to cond_resched_lock().
Two concerns:
- too many checks may slow down GET_DIRTY_LOG.
-
dirty page logging, can cause the same problem.
My plan is to replace it with rmap-based protection after this.
Thanks,
Takuya
---
Takuya Yoshikawa (1):
KVM: Reduce mmu_lock contention during dirty logging by cond_resched()
arch/x86/include/asm/kvm_host.h |6 +++---
arch/x86/kvm
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
get_dirty_log() needs to hold mmu_lock during write protecting dirty
pages and this can be long when there are many dirty pages to protect.
As the guest can take page faults during that time, and of course other mmu
work can also happen, this may
On Tue, 24 Apr 2012 17:10:08 +0300
Avi Kivity a...@redhat.com wrote:
+ if (!modrm_fetched && (ctxt->d & ModRM))
+ ctxt->modrm = insn_fetch(u8, ctxt);
Instead of adding yet another conditional, how about doing something like
if ((c->d & ModRM) || (gtype == Group) || (gtype ==
On Tue, 24 Apr 2012 17:11:54 +0300
Avi Kivity a...@redhat.com wrote:
On 04/23/2012 06:31 PM, Takuya Yoshikawa wrote:
Takuya Yoshikawa (4):
KVM: x86 emulator: Introduce ctxt->op_prefix for 0x66 prefix
KVM: x86 emulator: Make prefix decoding a separate function
KVM: x86 emulator: Make
Takuya Yoshikawa (4):
KVM: x86 emulator: Introduce ctxt->op_prefix for 0x66 prefix
KVM: x86 emulator: Make prefix decoding a separate function
KVM: x86 emulator: Make opcode decoding a separate function
KVM: x86 emulator: Avoid pushing back ModRM byte in decode_opcode()
arch/x86/include
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
This is needed to make prefix decoding a separate function.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Takuya Yoshikawa takuya.yoshik...@gmail.com
---
arch/x86/include/asm/kvm_emulate.h |1 +
arch/x86/kvm
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Instruction decoding can be split into separate parts, and this is the
first one which treats the instruction prefixes.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Takuya Yoshikawa takuya.yoshik...@gmail.com
---
arch
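A sketch of what a separate prefix-decoding step can look like; the context fields (b, op_prefix, rep_prefix, lock_prefix) exist in the emulator, but the function body here is illustrative, not the patch itself:
===
static int decode_prefixes(struct x86_emulate_ctxt *ctxt)
{
	u8 b;

	for (;;) {
		b = insn_fetch(u8, ctxt);	/* error paths elided */
		switch (b) {
		case 0x66:			/* operand-size override */
			ctxt->op_prefix = true;
			break;
		case 0xf0:			/* LOCK */
			ctxt->lock_prefix = 1;
			break;
		case 0xf2:			/* REPNE/REPNZ */
		case 0xf3:			/* REP/REPE/REPZ */
			ctxt->rep_prefix = b;
			break;
		default:	/* first non-prefix byte is the opcode */
			ctxt->b = b;
			return X86EMUL_CONTINUE;
		}
	}
}
===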
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
This is the second part of the instruction decoding which treats the
opcode.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Cc: Takuya Yoshikawa takuya.yoshik...@gmail.com
---
arch/x86/kvm/emulate.c | 66
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
Although the ModRM byte is read for group decoding, it is soon pushed back
to make decode_modrm() fetch it later.
We should consistently read it, only once, in decode_opcode() instead.
Signed-off-by: Takuya Yoshikawa yoshikawa.tak
I got the following error:
===
arch/x86/kvm/emulate.c: Assembler messages:
arch/x86/kvm/emulate.c:4122: Error: bad register name `%dil'
make[2]: *** [arch/x86/kvm/emulate.o] Error 1
...
===
The line indicates:
commit cbe2c9d30aa69b0551247ddb0fb450b6e8080ec4
KVM: x86 emulator: MMX support
Sorry for the delay.
On Wed, 18 Apr 2012 12:01:03 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
I have checked dirty-log-perf myself with this patch [01-07].
GET_DIRTY_LOG for 1GB dirty pages has become more than 15% slower.
Thanks for your test!
Unbelievable, i will
On Wed, 18 Apr 2012 18:03:10 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
By the way, what is the case did you test? ept = 1 ?
Yes!
I am not worrying about without-EPT/NPT-migration.
Takuya
On Fri, 20 Apr 2012 18:33:19 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
It is preferable to remove all large sptes including read-only ones, the
current behaviour, than to verify that no read-write transition can
occur in fault paths (fault paths which are increasing in number).
I
On Fri, 20 Apr 2012 21:55:55 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
More importantly than the particular flush TLB case, the point is
every piece of code that reads and writes sptes must now be aware that
mmu_lock alone does not guarantee stability. Everything must be audited.
In
On Fri, 20 Apr 2012 16:07:41 -0700
Ying Han ying...@google.com wrote:
My understanding of the real pain is the poor implementation of the
mmu_shrinker. It iterates all the registered mmu_shrink callbacks for
each kvm and only does little work at a time while holding two big
locks. I learned
On Fri, 20 Apr 2012 19:15:24 -0700
Mike Waychison mi...@google.com wrote:
In our situation, we simply disable the shrinker altogether. My
understanding is that with EPT or NPT, the amount of memory used by
these tables is bounded by the size of guest physical memory, whereas
with software
On Tue, 17 Apr 2012 17:56:24 +0300
Avi Kivity a...@redhat.com wrote:
For live migration, range-based control may be enough due to the locality
of WWS.
What's WWS?
IIRC it was mentioned in a usenix paper: Writable Working Set.
May not be a commonly known concept.
Kind of working set, but
On Tue, 17 Apr 2012 10:51:40 +0300
Avi Kivity a...@redhat.com wrote:
That's true with the write protect everything approach we use now. But
it's not true with range-based write protection, where you issue
GET_DIRTY_LOG on a range of pages and only need to re-write-protect them.
(the
On Mon, 16 Apr 2012 11:36:25 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
I tested it with kernbench, no regression is found.
Because kernbench is not at all a good test for this.
It is not a problem since the iter and spte should be in the cache.
I have checked dirty-log-perf
On Tue, 17 Apr 2012 15:41:39 +0300
Avi Kivity a...@redhat.com wrote:
Since there are many known algorithms to predict hot memory pages,
the userspace will be able to tune the frequency of GET_DIRTY_LOG for such
parts not to get too many faults repeatedly, if we can restrict the range
of
On Sun, 15 Apr 2012 14:25:30 +0300
Avi Kivity a...@redhat.com wrote:
@@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
kvm_mmu_pages_init(parent, &parents, &pages);
while (mmu_unsync_walk(parent, &pages)) {
- int protected = 0;
+ bool
On Sun, 15 Apr 2012 12:32:59 +0300
Avi Kivity a...@redhat.com wrote:
Just to throw another idea into the mix - we can have write-protect-less
dirty logging, too. Instead of write protection, drop the dirty bit,
and check it again when reading the dirty log. It might look like we're
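A hedged sketch of that idea, assuming hardware dirty bits in sptes (e.g. EPT A/D); for_each_spte_in_slot() and spte_test_and_clear_dirty() are made-up names:
===
	/* Leave sptes writable; harvest and clear the hardware dirty bit
	 * at GET_DIRTY_LOG time instead of logging write faults. */
	for_each_spte_in_slot(slot, gfn, sptep) {
		if (spte_test_and_clear_dirty(sptep))
			set_bit_le(gfn - slot->base_gfn, slot->dirty_bitmap);
	}
===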
On Mon, 16 Apr 2012 17:28:32 +0300
Avi Kivity a...@redhat.com wrote:
But the real question is whether there is any point in re-writing completely
correct C code: there are tons of int like this in the kernel code.
__rmap_write_protect() was introduced recently, so if this conversion is
Xiao,
Takuya Yoshikawa takuya.yoshik...@gmail.com wrote:
What is your really want to say but i missed?
How to improve and what we should pay for that.
Note that I am not objecting to O(1) itself.
I forgot to say one important thing -- I might give you wrong impression.
I am perfectly
On Thu, 12 Apr 2012 19:56:45 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
Other than potential performance improvement, the worst case scenario
of holding mmu_lock for hundreds of milliseconds at the beginning
of migration of huge guests must be fixed.
Write protection in
On Fri, 13 Apr 2012 18:33:39 -0300
Marcelo Tosatti mtosa...@redhat.com wrote:
kvm_arch_commit_memory_region(kvm, mem, old, user_alloc);
- /*
- * If the new memory slot is created, we need to clear all
- * mmio sptes.
- */
- if (npages && old.base_gfn !=
Hi,
On Wed, 11 Apr 2012 11:11:07 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
restart:
- list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link)
- if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
- goto restart;
+ zapped
On Fri, 13 Apr 2012 18:11:13 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
The return value of __rmap_write_protect is either 1 or 0, use
true/false instead of these
...
@@ -1689,7 +1690,7 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
On Fri, 13 Apr 2012 18:10:45 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
static u64 *rmap_get_next(struct rmap_iterator *iter)
{
+ u64 *sptep = NULL;
+
if (iter->desc) {
if (iter->pos < PTE_LIST_EXT - 1) {
- u64 *sptep;
-
On Fri, 13 Apr 2012 18:11:45 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
+/* Return true if the spte is dropped. */
Return value does not correspond with the function name so it is confusing.
People may think that true means write protection has been done.
+static bool
On Fri, 13 Apr 2012 18:14:26 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
Using bit 1 (PTE_LIST_WP_BIT) in rmap to store the write-protect status
to avoid unnecessary shadow page walking
Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
---
arch/x86/kvm/mmu.c |
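The trick being reviewed, sketched: rmap heads are pointer-sized words with free low bits, so one bit can cache the write-protected state and let __rmap_write_protect() return early (the exact encoding is in Xiao's patch, not reproduced here):
===
#define PTE_LIST_WP_BIT	(1ul << 1)	/* stashed in the rmap head word */

static bool rmap_write_protected(unsigned long *rmapp)
{
	return *rmapp & PTE_LIST_WP_BIT;	/* skip the spte walk if set */
}
===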
On Fri, 13 Apr 2012 18:05:29 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
Thanks for Avi and Marcelo's review, i have simplified the whole things
in this version:
- it only fix the page fault with PFEC.P = 1 && PFEC.W = 0 that means
unlock set_spte path can be dropped.
- it
increase
if multiple VCPUs try to write to memory.
Anyway, this patch is small and seems effective.
Takuya
===
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
get_dirty_log() needs to hold mmu_lock during write protecting dirty
pages and this can be long when there are many dirty
On Tue, 10 Apr 2012 19:58:44 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
No, i do not really agree with that.
We really can get great benefit from O(1) especially if lockless write-protect
is introduced for O(1), live migration is very useful for cloud computing
On Wed, 11 Apr 2012 20:38:57 +0800
Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com wrote:
Well, my point is that live migration is so very useful that it is worth
to be improved, the description of your also proves this point.
What is your really want to say but i missed?
How to improve and
On Wed, 11 Apr 2012 17:21:30 +0300
Avi Kivity a...@redhat.com wrote:
Currently the main performance bottleneck for migration is qemu, which
is single threaded and generally inefficient. However I am sure that
once the qemu bottlenecks are removed we'll encounter kvm problems,
On Tue, 10 Apr 2012 13:39:14 +0300
Avi Kivity a...@redhat.com wrote:
On 04/09/2012 10:46 PM, Marcelo Tosatti wrote:
Perhaps the mmu_lock hold times by get_dirty are a large component here?
If that can be alleviated, not only RO-RW faults benefit.
Currently the longest holder in normal
From: Takuya Yoshikawa yoshikawa.tak...@oss.ntt.co.jp
We do not need to zap all shadow pages of the guest when we create or
destroy a slot in this function.
To change this, we make kvm_mmu_zap_all()/kvm_arch_flush_shadow()
zap only those which have mappings into a given slot.
The way we iterate
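A hedged sketch of that direction; sp_has_mapping_into_slot() is a hypothetical predicate standing in for whatever test the series uses to tie a shadow page to the slot:
===
	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
		if (!sp_has_mapping_into_slot(sp, slot))
			continue;	/* keep pages unrelated to this slot */
		kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list);
	}
	kvm_mmu_commit_zap_page(kvm, &invalid_list);
===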