On 05/14/2015 10:31 AM, Andrea Arcangeli wrote:
> +static int userfaultfd_wake_function(wait_queue_t *wq, unsigned mode,
> + int wake_flags, void *key)
> +{
> + struct userfaultfd_wake_range *range = key;
> + int ret;
> + struct userfaultfd_wait_queue *u
On 04/23/2015 02:13 PM, Liang Li wrote:
> When compiling kernel on westmere, the performance of eager FPU
> is about 0.4% faster than lazy FPU.
Do you have a theory why this is? Where does the regression come from?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of
On 10/18/2014 07:49 AM, Dominik Dingel wrote:
> On Fri, 17 Oct 2014 15:04:21 -0700
> Dave Hansen wrote:
>> Is there ever a time where the VMAs under an mm have mixed VM_NOZEROPAGE
>> status? Reading the patches, it _looks_ like it might be an all or
>> nothing thing.
>
On 10/17/2014 07:09 AM, Dominik Dingel wrote:
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index cd33ae2..8f09c91 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -113,7 +113,7 @@ extern unsigned int kobjsize(const void *objp);
> #define VM_GROWSDOWN 0x0100 /*
On 07/02/2014 09:50 AM, Andrea Arcangeli wrote:
> The MADV_USERFAULT feature should be generic enough that it can
> provide the userfaults to the Android volatile range feature too, on
> access of reclaimed volatile pages.
Maybe.
I certainly can't keep track of all the versions of the variations
On 05/16/2014 02:01 PM, Paolo Bonzini wrote:
> Yes, of course. Dave, ok to only have it in 3.16?
Sure, it's been broken for a long time, so it's no hurry to get fixed.
From: Dave Hansen
I noticed on some of my systems that page fault tracing doesn't
work:
cd /sys/kernel/debug/tracing
echo 1 > events/exceptions/enable
cat trace;
# nothing shows up
I eventually traced it down to CONFIG_KVM_GUEST. At least in a
KVM VM,
On 01/23/2014 10:55 AM, Dave Hansen wrote:
> On 01/21/2014 08:38 AM, Toralf Förster wrote:
>> Jan 21 17:18:57 n22 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
>> (t=60001 jiffies g=18494 c=18493 q=183951)
>> Jan 21 17:18:57 n22 kernel: sending NMI to all CPUs:
On 01/21/2014 08:38 AM, Toralf Förster wrote:
> Jan 21 17:18:57 n22 kernel: INFO: rcu_sched self-detected stall on CPU { 2}
> (t=60001 jiffies g=18494 c=18493 q=183951)
> Jan 21 17:18:57 n22 kernel: sending NMI to all CPUs:
> Jan 21 17:18:57 n22 kernel: NMI backtrace for cpu 2
> Jan 21 17:18:57 n
I'm causing qemu to spew these emulation failure messages until I kill
it. The guest kernel being run has been hacked up pretty heavily and is
probably either accessing bad physical addresses (above the address
ranges in the e820 table) or trying to DMA to bad addresses.
What I'd really like qemu
On 05/23/2012 01:48 AM, Peter Zijlstra wrote:
> On Wed, 2012-05-23 at 16:34 +0800, Liu ping fan wrote:
>> > so we need to migrate some of vcpus from node-B to node-A, or to
>> > node-C.
> This is absolutely broken, you cannot do that.
>
> A guest task might want to be node affine, it looks at the
On Tue, 2011-09-20 at 16:55 -0300, Marcelo Tosatti wrote:
> > and the wall clock stays behind my host wall clock by the amount of
> > time it took to resume.
>
> This is expected, similar to savevm/loadvm.
That seems like pretty undesirable behavior to me. It's too bad that it
does that with sa
On Wed, 2011-06-29 at 20:21 +1000, Paul Mackerras wrote:
> +struct kvmppc_pginfo {
> + unsigned long pfn;
> + atomic_t refcnt;
> +};
I only see this refcnt inc'd in one spot and never decremented or read.
Is the refcnt just the number of hptes we have for this particular page
at the momen
On Thu, 2010-08-05 at 12:28 +0300, Avi Kivity wrote:
> On 08/04/2010 10:13 AM, Lai Jiangshan wrote:
> > mmu_shrink() should attempt to free @nr_to_scan entries.
>
> This conflicts with Dave's patchset.
>
> Dave, what's going on with those patches? They're starting to smell.
These seem to fix th
On Thu, 2010-08-05 at 12:28 +0300, Avi Kivity wrote:
> On 08/04/2010 10:13 AM, Lai Jiangshan wrote:
> > mmu_shrink() should attempt to free @nr_to_scan entries.
> >
>
> This conflicts with Dave's patchset.
>
> Dave, what's going on with those patches? They're starting to smell.
The hardware and
On Thu, 2010-07-22 at 07:36 +0300, Avi Kivity wrote:
> On 06/22/2010 07:32 PM, Dave Hansen wrote:
> > On Sun, 2010-06-20 at 11:11 +0300, Avi Kivity wrote:
> >>> That changes a few things. I bet all the contention we were seeing was
> >>> just from nr_to_scan=0
On Sun, 2010-06-20 at 11:11 +0300, Avi Kivity wrote:
> > That changes a few things. I bet all the contention we were seeing was
> > just from nr_to_scan=0 calls and not from actual shrink operations.
> > Perhaps we should just stop this set after patch 4.
> >
>
> At the very least, we should
On Wed, 2010-06-16 at 08:25 -0700, Dave Hansen wrote:
> On Wed, 2010-06-16 at 12:24 +0300, Avi Kivity wrote:
> > On 06/15/2010 04:55 PM, Dave Hansen wrote:
> > > In a previous patch, we removed the 'nr_to_scan' tracking.
> > > It was not being used to track th
On Wed, 2010-06-16 at 11:48 +0300, Avi Kivity wrote:
> > +static inline void kvm_mod_used_mmu_pages(struct kvm *kvm, int nr)
> > +{
> > + kvm->arch.n_used_mmu_pages += nr;
> > + kvm_total_used_mmu_pages += nr;
> >
>
> Needs an atomic operation, since there's no global lock here. To av
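The concern quoted above (a global aggregate updated with a plain `+=` and no global lock) can be illustrated with a minimal userspace sketch. It uses C11 atomics only so that it builds outside the kernel; the names `vm_instance`, `total_used_pages`, `mod_used_pages()` and `read_total_used_pages()` are hypothetical stand-ins, and the per-instance lock is an assumption. This is not the fix proposed in the thread, which is cut off in this excerpt.

```c
#include <stdatomic.h>

/* Hypothetical global aggregate, updated without a global lock. */
static atomic_long total_used_pages;

struct vm_instance {
	/* Assumed to be protected by a per-instance lock. */
	long n_used_pages;
};

/* Adjust one instance's count and the global sum by nr (may be negative). */
static void mod_used_pages(struct vm_instance *vm, long nr)
{
	vm->n_used_pages += nr;
	/* Atomic read-modify-write, so concurrent updates are not lost. */
	atomic_fetch_add(&total_used_pages, nr);
}

/* Read the aggregate without taking any lock. */
static long read_total_used_pages(void)
{
	return atomic_load(&total_used_pages);
}
```

A reader that only needs the total can call `read_total_used_pages()` without touching any per-instance state.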
On Wed, 2010-06-16 at 11:25 -0300, Marcelo Tosatti wrote:
> > - if (used_pages > kvm_nr_mmu_pages) {
> > - while (used_pages > kvm_nr_mmu_pages &&
> > + if (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages) {
> > + while (kvm->arch.n_used_mmu_pages > goal_nr_mmu_pages
On Wed, 2010-06-16 at 12:24 +0300, Avi Kivity wrote:
> On 06/15/2010 04:55 PM, Dave Hansen wrote:
> > In a previous patch, we removed the 'nr_to_scan' tracking.
> > It was not being used to track the number of objects
> > scanned, so we stopped using it entirely
On Wed, 2010-06-16 at 11:48 +0300, Avi Kivity wrote:
> On 06/15/2010 04:55 PM, Dave Hansen wrote:
> > +/*
> > + * This value is the sum of all of the kvm instances's
> > + * kvm->arch.n_used_mmu_pages values. We need a global,
> > + * aggregate version
On Wed, 2010-06-16 at 11:38 +0300, Avi Kivity wrote:
> On 06/15/2010 04:55 PM, Dave Hansen wrote:
> > These seem to boot and run fine. I'm running about 40 VMs at
> > once, while doing "echo 3 > /proc/sys/vm/drop_caches", and
> > killing/restarting VMs co
On Tue, 2010-06-15 at 10:07 +0300, Avi Kivity wrote:
> On 06/14/2010 08:58 PM, Dave Hansen wrote:
> > On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote:
> >
> >>> Again, this is useless when ballooning is being used. But, I'm thinking
> >>> of a m
emoves the manipulation of the 'nr_to_scan'
variable. Its use in here was questionable, especially
since it was being decremented even in cases where no
scanning was taking place: when building the counter.
Interestingly enough, removing it here does not affect
the reclaim behavior at all.
Signed
put debate.
Signed-off-by: Dave Hansen
---
linux-2.6.git-dave/arch/x86/kvm/mmu.c | 48 ++
linux-2.6.git-dave/kernel/profile.c |2 +
2 files changed, 40 insertions(+), 10 deletions(-)
diff -puN arch/x86/kvm/mmu.c~optimize_shrinker-3 arch/x86/kvm/mmu.c
---
The comment tells most of the story here. This patch guarantees
that once a user decrements kvm->users_count to 0 that no one
will increment it again.
We'll need this in a moment because we are going to use
kvm->users_count as a more generic refcount.
Signed-off-by: Dave Hansen
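The guarantee described in this patch (once the count reaches 0, nobody may raise it again) is commonly built as an "increment unless zero" operation. The following is a userspace sketch with C11 atomics, under the assumption that this is the intended semantics; `struct object`, `get_unless_zero()` and `put()` are hypothetical names. The kernel provides atomic_inc_not_zero() for this kind of check, but the excerpt does not show whether the patch uses it.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct object {
	atomic_int users;
};

/* Take a reference only if the count has not already dropped to zero. */
static bool get_unless_zero(struct object *obj)
{
	int old = atomic_load(&obj->users);

	while (old != 0) {
		/* On failure, old is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&obj->users, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when this was the last one. */
static bool put(struct object *obj)
{
	return atomic_fetch_sub(&obj->users, 1) == 1;
}
```

A caller that gets `false` from `get_unless_zero()` has to treat the object as already on its way to being freed.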
ount on a 'struct kvm'
2. freeing a kvm mmu page
This would probably be most ideal if we can expose some
of the work done by kvm_mmu_remove_some_alloc_mmu_pages()
as also counting as scanning, but I think we have churned
enough for the moment.
Signed-off-by: Dave Hansen
---
linux-
It can get really hairy, and we've seen lock
spinning in mmu_shrink() be the dominant entry in profiles.
This is guaranteed to optimize at least half of those lock
acquisitions away. It removes the need to take any of the locks
when simply trying to count objects.
Signed-off-by: Dave Hansen
"
values which confused me. It might all tie together.
Anyway, another benefit of storing 'used' instead of 'free' is
that the values are consistent from the moment the structure is
allocated: no negative "used" value.
Signed-off-by: Dave Hansen
---
lin
, which is dead wrong.
It's really the high watermark, so let's give it a name to match:
nr_max_mmu_pages. This change will make the next few patches
much more obvious and easy to read.
Signed-off-by: Dave Hansen
---
linux-2.6.git-dave/arch/x86/include/asm/kvm_host.h |
lable" is a much better description, especially when you
see how it is calculated.
In this patch, we abstract its use into a function. We'll soon
replace the function's contents by calculating the value in a
different way.
Signed-off-by: Dave Hansen
---
linux-2.6.git-dave/arch/x8
This basically takes the loop contents and sticks it in
its own function for readability. Don't pay too much
attention to the use of nr_scanned in here. It's a bit
wonky but it'll change in a minute anyway.
Signed-off-by: Dave Hansen
---
linux-2.6.git-dave/arch/x86/k
This is a big RFC for the moment. These need a bunch more
runtime testing.
--
We've seen contention in the mmu_shrink() function. This patch
set reworks it to hopefully be more scalable to large numbers
of CPUs, as well as large numbers of running VMs.
The patches are ordered with increasing i
On Mon, 2010-06-14 at 19:34 +0300, Avi Kivity wrote:
> > Again, this is useless when ballooning is being used. But, I'm thinking
> > of a more general mechanism to force the system to both have MemFree
> > _and_ be acting as if it is under memory pressure.
> >
>
> If there is no memory pressu
On Mon, 2010-06-14 at 22:28 +0530, Balbir Singh wrote:
> If you've got duplicate pages and you know
> that they are duplicated and can be retrieved at a lower cost, why
> wouldn't we go after them first?
I agree with this in theory. But, the guest lacks the information about
what is truly duplica
On Mon, 2010-06-14 at 18:44 +0300, Avi Kivity wrote:
> On 06/14/2010 06:33 PM, Dave Hansen wrote:
> > At the same time, I see what you're trying to do with this. It really
> > can be an alternative to ballooning if we do it right, since ballooning
> > would probably evic
On Mon, 2010-06-14 at 16:01 +0300, Avi Kivity wrote:
> If we drop unmapped pagecache pages, we need to be sure they can be
> backed by the host, and that depends on the amount of sharing.
You also have to set the host up properly, and continue to maintain
it in a way that finds and eliminates
On Mon, 2010-06-14 at 14:18 +0530, Balbir Singh wrote:
> 1. A slab page will not be freed until the entire page is free (all
> slabs have been kfree'd so to speak). Normal reclaim will definitely
> free this page, but a lot of it depends on how frequently we are
> scanning the LRU list and when thi
On Thu, 2010-06-10 at 19:55 +0530, Balbir Singh wrote:
> > I'm not sure victimizing unmapped cache pages is a good idea.
> > Shouldn't page selection use the LRU for recency information instead
> > of the cost of guest reclaim? Dropping a frequently used unmapped
> > cache page can be more expensi
On Tue, 2010-03-16 at 11:05 +0200, Avi Kivity wrote:
> > Not really. In many cloud environments, there's a set of common
> > images that are instantiated on each node. Usually this is because
> > you're running a horizontally scalable application or because you're
> > supporting an ephemeral s
This time it is kvm_arch_vcpu_ioctl(). Use dynamic
allocations to reduce its stack usage.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
arch/x86/kvm/x86.c | 20 +---
1 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm
Same as the last one, but this time we use kmalloc()
for all of the uses.
Note that the kfree()s take advantage of the fact that
kfree() is OK on NULL.
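The cleanup pattern this note relies on can be sketched in userspace, where free(NULL) is likewise defined to do nothing. Here malloc()/free() stand in for kmalloc()/kfree(), and `do_work()` with its two buffers is a hypothetical example, not code from the patch.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function with two dynamically allocated buffers. */
static int do_work(size_t a_size, size_t b_size)
{
	char *a = NULL;
	char *b = NULL;
	int ret = -1;

	a = malloc(a_size);
	if (!a)
		goto out;
	b = malloc(b_size);
	if (!b)
		goto out;

	memset(a, 0, a_size);
	memset(b, 0, b_size);
	ret = 0;
out:
	/* free(NULL) is a no-op, so one exit path serves every failure. */
	free(b);
	free(a);
	return ret;
}
```

Because every pointer starts out NULL, the single `out:` label needs no per-pointer checks no matter which allocation failed.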
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
virt/kvm/kvm_main.c | 46 --
1 files c
methods for reducing stack usage.
1. dynamically allocate large objects instead of putting
on the stack.
2. Use a union{} member for all of the case variables. This
tricks gcc into combining them all into a single stack
allocation. (There's also a comment on this)
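The second method can be sketched in plain C. The struct names and sizes below are hypothetical stand-ins for the large per-case objects. The point is only that a union is as large as its biggest member, so the cases share one stack slot however the compiler handles separately declared case-local variables.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the large per-case objects. */
struct big_a { char data[512]; };
struct big_b { char data[768]; };

/*
 * One union declared at function scope: its size is that of the
 * largest member, so both cases use the same stack allocation.
 */
static size_t handle_request(int cmd)
{
	union {
		struct big_a a;
		struct big_b b;
	} u;

	switch (cmd) {
	case 0:
		memset(&u.a, 0, sizeof(u.a));
		return sizeof(u.a);
	case 1:
		memset(&u.b, 0, sizeof(u.b));
		return sizeof(u.b);
	default:
		return 0;
	}
}

/* Size of the shared slot, for comparison with 512 + 768. */
static size_t shared_slot_size(void)
{
	union {
		struct big_a a;
		struct big_b b;
	} u;

	return sizeof(u);
}
```

With these sizes the shared slot is 768 bytes, against 1280 bytes if each case kept its own object on the stack.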
Signed-off-by: Dave H
ion to the
arch-specific x86 kvm header.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
arch/x86/kvm/mmu.c | 23 ---
include/asm-x86/kvm_host.h | 10 ++
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kv
On my machine with gcc 3.4, kvm uses ~2k of stack in a few
select functions. This is mostly because gcc fails to
notice that the different case: statements could have their
stack usage combined. It overflows very nicely if interrupts
happen during one of these large uses.
This patch uses two met
ion to the
arch-specific x86 kvm header.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
arch/x86/kvm/mmu.c | 23 ---
include/asm-x86/kvm_host.h | 10 ++
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kv
Same as the last one, but this time we use kmalloc()
for all of the uses.
Note that the kfree()s take advantage of the fact that
kfree() is OK on NULL.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
virt/kvm/kvm_main.c | 48 ++--
1 files c
This time it is kvm_arch_vcpu_ioctl(). Use dynamic
allocations to reduce its stack usage.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
arch/x86/kvm/x86.c | 20 +---
1 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm
On Mon, 2008-07-28 at 13:46 -0500, Anthony Liguori wrote:
> Tricking gcc seems like a bad thing to me. Who knows what crazy thing
> GCC is going to do in the future.
>
> Why not just kmalloc() these things? Is kmalloc really that slow?
In this case, the kmalloc() just looked worse. I think
This time it is kvm_arch_vcpu_ioctl(). Use dynamic
allocations to reduce its stack usage.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
arch/x86/kvm/x86.c | 20 +---
1 files changed, 13 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm
ion to the
arch-specific x86 kvm header.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
arch/x86/kvm/mmu.c | 23 ---
include/asm-x86/kvm_host.h | 10 ++
2 files changed, 18 insertions(+), 15 deletions(-)
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kv
methods for reducing stack usage.
1. dynamically allocate large objects instead of putting
on the stack.
2. Use a union{} member for all of the case variables. This
tricks gcc into combining them all into a single stack
allocation.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
ar
Same as the last one, but this time we use kmalloc()
for all of the uses.
Note that the kfree()s take advantage of the fact that
kfree() is OK on NULL.
Signed-off-by: Dave Hansen <[EMAIL PROTECTED]>
---
virt/kvm/kvm_main.c | 48 ++--
1 files c
On Thu, 2008-07-17 at 08:40 -0700, Roland Dreier wrote:
> > + struct kvm_pv_mmu_op_buffer *buffer =
> > + kmalloc(GFP_KERNEL, sizeof(struct kvm_pv_mmu_op_buffer));
>
> Surely this produces a warning? kmalloc takes (size, flags) -- you have
> them reversed here.
Heh. It actually doe
KVM uses a lot of kernel stack on x86, especially with gcc 3.x. It
likes to overflow it and make my poor machine go boom. This patch takes
the worst stack users and makes them use kmalloc(). It also saves ~30
bytes in kvm_arch_vm_ioctl() by using a union.
I haven't tested this at all yet. Just
On Wed, 2008-07-16 at 23:08 -0700, Roland Dreier wrote:
> > Yes, things like kvm_lapic_state are way too big to be on the
> stack.
>
> I had a quick look at the code, and my worry about dynamic allocation
> would be that handling allocation failure seems like it might get
> tricky. Eg for handli
That also reminds me. kvm somehow has an outdated copy of
anon_inodes.c. It needs to be updated for the r/o bind mount patches to
add a proper mnt_want/drop_write(). Otherwise, you'll run into warnings
about imbalanced mount writer counts. Something like this will do, but
it would be best to ju
A newer gcc (4.2) makes this a wee bit better, but probably still
worrisome.
[EMAIL PROTECTED]:~/src/kvm-userspace-virgin/kernel$ objdump -d *.ko | perl
/home/dave/kernels/linux-2.6.git-t61/scripts/checkstack.pl i386
0x7b33 kvm_arch_vm_ioctl [kvm]: 1164
0x72e8 kvm_arch
On Thu, 2008-07-17 at 08:52 +0300, Avi Kivity wrote:
> Dave Hansen wrote:
> > Avi, how would you like this fixed? I'd be happy to prepare some
> > patches. Do you have a particular approach that you think we should
> > use? Just make the big objects dynamically
On Wed, 2008-07-16 at 14:44 -0700, Dave Hansen wrote:
> On a suggestion of Anthony's, I tried a defconfig kernel.
>
> It is now bombing out on an assertion in the lapic code:
>
> http://sr71.net/~dave/linux/2.6.26-oops1.txt
I think I found it!!!
$ (objdump -d kvm
On a suggestion of Anthony's, I tried a defconfig kernel.
It is now bombing out on an assertion in the lapic code:
http://sr71.net/~dave/linux/2.6.26-oops1.txt
-- Dave
So, just a continuation of what we were talking about before...
I just had a bug triggered on my system because I'm running sparsemem
(it was in show_mem()). I wonder if sparsemem is contributing to the
bug. Does kvm ever do any arithmetic that you can think of with 'struct
page's?
To summarize
On Thu, 2008-06-12 at 16:10 +0300, Avi Kivity wrote:
> Stumped. Please post .config, will try to reproduce.
http://sr71.net/~dave/linux/config-2.6.26-rc6
-- Dave
On Wed, 2008-06-04 at 16:42 +0300, Avi Kivity wrote:
> Dave Hansen wrote:
...
> > After collecting all those, I turned on CONFIG_DEBUG_HIGHMEM and the
> > oopses miraculously stopped. But, the guest hung (for at least 5
> > minutes or so) during windows bootup, pegging my h
On Wed, 2008-06-04 at 21:51 +0200, Andrea Arcangeli wrote:
> > Dave mentioned that SetPageReserved() doesn't necessarily get called
> for
> > zones with bad alignment.
>
> What does 'bad alignment' mean? Buddy was used to require each zone to
> start at 1< before the alignment was wasted). In any
On Mon, 2008-06-02 at 15:30 -0700, Dave Hansen wrote:
> On Thu, 2008-03-27 at 16:59 +0200, Avi Kivity wrote:
> > Dave Hansen wrote:
> > > On Thu, 2008-03-27 at 12:10 +0200, Avi Kivity wrote:
> > >> btw, is this with >= 4GB RAM on the host?
> > >
> >
Oops, sent to the old list. Please reply-all to the LKML version if you
need to reply...
Forwarded Message
From: Dave Hansen <[EMAIL PROTECTED]>
To: Avi Kivity <[EMAIL PROTECTED]>
Cc: [EMAIL PROTECTED] <[EMAIL PROTECTED]>,
kvm-devel <[EMAIL PROTECTED]>,