On Fri, Jun 16, 2017 at 16:31 +1000, Jonathan Matthew wrote:
> Recently I updated the kernel lock profiling stuff I've been working on, since
> it had been rotting a bit since witness was introduced.  Running my diff on a
> KVM VM, I found there was a pretty huge performance impact (10 minutes to
> build a kernel instead of 4), which turned out to be because reading the
> emulated HPET in KVM is slow, and lock profiling involves a lot of extra
> clock reads.  The diff below adds a new TSC-based timecounter implementation
> for KVM and Xen to remedy this.
> 
> KVM and Xen provide frequently-updated views of system time from the host to
> each vcpu in a way that lets the VM get accurate high resolution time without
> much work.  Linux calls this mechanism 'pvclock' so I'm doing the same.
> 
> The pvclock structure gives you a system time (in nanoseconds), the TSC
> reading from when the time was updated, and scaling factors for converting TSC
> values to nanoseconds.  Usually you subtract the TSC reading in the pvclock
> structure from a current reading, convert that to nanoseconds, and add it to
> the system time.  I decided to go the other way in order to keep all the
> available resolution.
> 
> Using pvclock as the timecounter reduces the overhead of lock profiling to
> almost nothing.  Even without the extra clock reads for lock profiling,
> it cuts a few seconds off kernel compile time on a 2 vcpu vm.  I've run it
> for ~12 hours without ntpd and the clock keeps time accurately.
> 
> One wrinkle here is that the KVM pvclock mechanism requires setup on each
> vcpu, so I added a new pvbus function that gets called from cpu_hatch,
> allowing any hypervisor-specific setup to happen there.
> 
> I still need to try this on Xen, but comments at this stage are welcome.
>

Cool!  You've beaten both of us to it :)

Last time I tried uebayashi's pvclock on Xen, it didn't
work for me.  I didn't have time to investigate why, but
it's probably because we need per-cpu readings, which you
do for KVM.  I'll test this on Xen as soon as I get to the
office.

Now regarding the diff.  pvbus_init_vcpu.  Ah yes, please.
It was a chicken-and-egg problem for me: I didn't have
Xen, but wanted a callback from cpu_hatch to set up shared
info pages and events (interrupt delivery) for all CPUs.
So please factor it out and let's get that committed.

I don't know if it's a good idea to depend on Xen's
definition of vcpu_time_info.  I think I have factored
it out into the pvclock_time_info and put it into the
pvclockvar.h or something like that.  And then made Xen
use those definitions instead of its own.  Dunno what's
the best course of action here.
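
To make the idea concrete, here is a rough sketch of what a
hypervisor-neutral definition could look like.  The struct and
field names are hypothetical (this is not the code from either
diff); the field order and sizes follow the pvclock layout that
KVM and Xen share, as far as I know.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shared definition; the names are illustrative only.
 * All fields are naturally aligned, so no packing attribute is needed.
 */
struct pvclock_time_info {
	uint32_t ti_version;		/* odd while the host is updating */
	uint32_t ti_pad0;
	uint64_t ti_tsc_timestamp;	/* TSC value at the last update */
	uint64_t ti_system_time;	/* nanoseconds at the last update */
	uint32_t ti_tsc_to_system_mul;	/* 32-bit fixed-point multiplier */
	int8_t   ti_tsc_shift;		/* shift applied to the TSC delta */
	uint8_t  ti_flags;
	uint8_t  ti_pad[2];
};

/* Catch accidental layout changes at compile time. */
_Static_assert(sizeof(struct pvclock_time_info) == 32,
    "pvclock_time_info must be 32 bytes");
_Static_assert(offsetof(struct pvclock_time_info, ti_system_time) == 16,
    "ti_system_time must be at offset 16");
```

Xen code could then use this type for its per-vcpu time info
instead of carrying its own copy.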

But this brings up another point: where and how to perform
the pvclock initialization and attachment.  In your diff
pvclock_xen_init comes a bit too early: none of the Xen
things are initialized at that point, and the shared info
page isn't allocated yet.

I told Stefan in Munich that perhaps we should have a kvm.c
shim that would prepare and attach pvclock (and maybe
provide some flags and other bells and whistles).

I think we need to call pvclock attachment from Xen code
where it's appropriate, not from pvbus code.  Or do a
config_attach on it.  Why didn't you want to put it in
its own device driver?

It's nice that this version avoids using assembly. Any idea
what was the reason for Linux/FreeBSD code to use it?  Were
they afraid to lose precision maybe?
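
For reference, the usual TSC-to-nanoseconds conversion can be
written in plain C.  The product of a 64-bit delta and a 32-bit
multiplier needs up to 96 bits, so the sketch below splits the
multiply into two 64-bit pieces.  The function names are
hypothetical and this is not the code from the diff, just an
illustration of the arithmetic.

```c
#include <stdint.h>

/*
 * Scale a TSC delta to nanoseconds: ((delta << shift) * mul) >> 32.
 * Assumes |shift| is well below 64, as is the case in practice.
 */
static uint64_t
pvclock_scale_delta(uint64_t delta, uint32_t mul, int8_t shift)
{
	uint64_t hi, lo;

	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;

	/* Each partial product fits in 64 bits. */
	lo = (delta & 0xffffffffULL) * mul;
	hi = (delta >> 32) * mul;

	/* hi is already scaled by 2^32, so only lo needs the shift. */
	return hi + (lo >> 32);
}

/* Nanoseconds now = system time at last update + scaled TSC delta. */
static uint64_t
pvclock_nsec(uint64_t system_time, uint64_t tsc_timestamp,
    uint64_t tsc_now, uint32_t mul, int8_t shift)
{
	return system_time +
	    pvclock_scale_delta(tsc_now - tsc_timestamp, mul, shift);
}
```

With mul = 0x80000000 (one half) and shift = 0, a delta of 1000
ticks scales to 500 ns.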

In any case, good job, let's try to get this in.
