Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-16 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 12:00:56AM +0200, Igor Mammedov wrote: > On Tue, 16 Jun 2015 23:14:20 +0200 > "Michael S. Tsirkin" wrote: > > > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote: > > > since commit > > > 1d4e7e3 kvm: x86: increase user memory slots to 509 > > > > > > it beca

Re: [PATCH 0/5] vhost: support upto 509 memory regions

2015-06-16 Thread Michael S. Tsirkin
On Wed, Jun 17, 2015 at 12:19:15AM +0200, Igor Mammedov wrote: > On Tue, 16 Jun 2015 23:16:07 +0200 > "Michael S. Tsirkin" wrote: > > > On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote: > > > Series extends vhost to support up to 509 memory regions, > > > and adds some vhost:translate

Re: [PATCH v3 08/18] baycom_epp: Replace rdtscl() with native_read_tsc()

2015-06-16 Thread Thomas Sailer
Acked-by: Thomas Sailer On 06/17/2015 02:35 AM, Andy Lutomirski wrote: This is only used if BAYCOM_DEBUG is defined. Cc: walter harms Cc: Ralf Baechle Cc: Thomas Sailer Cc: linux-h...@vger.kernel.org Signed-off-by: Andy Lutomirski --- I'm hoping for an ack for this to go through -tip.

[PATCH v3 01/18] x86/tsc: Inline native_read_tsc and remove __native_read_tsc

2015-06-16 Thread Andy Lutomirski
In cdc7957d1954 ("x86: move native_read_tsc() offline"), native_read_tsc was moved out of line, presumably for some now-obsolete vDSO-related reason. Undo it. The entire rdtsc, shl, or sequence is only 11 bytes, and calls via rdtscl and similar helpers were already inlined. Signed-off-by: Andy L

[PATCH v3 08/18] baycom_epp: Replace rdtscl() with native_read_tsc()

2015-06-16 Thread Andy Lutomirski
This is only used if BAYCOM_DEBUG is defined. Cc: walter harms Cc: Ralf Baechle Cc: Thomas Sailer Cc: linux-h...@vger.kernel.org Signed-off-by: Andy Lutomirski --- I'm hoping for an ack for this to go through -tip. drivers/net/hamradio/baycom_epp.c | 2 +- 1 file changed, 1 insertion(+), 1

[PATCH v3 04/18] x86/tsc: Replace rdtscll with native_read_tsc

2015-06-16 Thread Andy Lutomirski
Now that the read_tsc paravirt hook is gone, rdtscll() is just a wrapper around native_read_tsc(). Unwrap it. Signed-off-by: Andy Lutomirski --- arch/x86/boot/compressed/aslr.c | 2 +- arch/x86/include/asm/msr.h | 3 --- arch/x86/include/asm/tsc.h

[PATCH v3 07/18] x86/cpu/amd: Use the full 64-bit TSC to detect the 2.6.2 bug

2015-06-16 Thread Andy Lutomirski
This code is timing 100k indirect calls, so the added overhead of counting the number of cycles elapsed as a 64-bit number should be insignificant. Drop the optimization of using a 32-bit count. Signed-off-by: Andy Lutomirski --- arch/x86/kernel/cpu/amd.c | 6 +++--- 1 file changed, 3 insertion

[PATCH v3 09/18] staging/lirc_serial: Remove TSC-based timing

2015-06-16 Thread Andy Lutomirski
It wasn't compiled in by default. I suspect that the driver was and still is broken, though -- it's calling udelay with a parameter that's derived from loops_per_jiffy. Cc: Jarod Wilson Cc: de...@driverdev.osuosl.org Cc: Greg Kroah-Hartman Signed-off-by: Andy Lutomirski --- drivers/staging/me

[PATCH v3 05/18] x86/tsc: Remove the rdtscp and rdtscpll macros

2015-06-16 Thread Andy Lutomirski
They have no users. Leave native_read_tscp, which seems potentially useful despite also having no callers. Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/msr.h | 9 - 1 file changed, 9 deletions(-) diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h index 7273

[PATCH v3 13/18] x86/tsc: Rename native_read_tsc() to rdtsc()

2015-06-16 Thread Andy Lutomirski
Now that there is no paravirt TSC, the "native" is inappropriate. The function does RDTSC, so give it the obvious name: rdtsc() Suggested-by: Borislav Petkov Signed-off-by: Andy Lutomirski --- arch/x86/boot/compressed/aslr.c | 2 +- arch/x86/entry/vdso/vclock_gettime.c

[PATCH v3 12/18] x86/tsc: Remove rdtscl()

2015-06-16 Thread Andy Lutomirski
It has no more callers, and it was never a very sensible interface to begin with. Users of the TSC should either read all 64 bits or explicitly throw out the high bits. Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/msr.h | 3 --- 1 file changed, 3 deletions(-) diff --git a/arch/x86/i

[PATCH v3 10/18] input/joystick/analog: Switch from rdtscl() to native_read_tsc()

2015-06-16 Thread Andy Lutomirski
This timing code is hideous, and this doesn't help. It gets rid of one of the last users of rdtscl, though. Acked-by: Dmitry Torokhov Cc: linux-in...@vger.kernel.org Signed-off-by: Andy Lutomirski --- drivers/input/joystick/analog.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) di

[PATCH v3 11/18] drivers/input/gameport: Replace rdtscl() with native_read_tsc()

2015-06-16 Thread Andy Lutomirski
It's unclear to me why this code exists in the first place. Acked-by: Dmitry Torokhov Cc: linux-in...@vger.kernel.org Signed-off-by: Andy Lutomirski --- drivers/input/gameport/gameport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/input/gameport/gameport.c b

[PATCH v3 16/18] x86/tsc: In read_tsc, use rdtsc_ordered() instead of get_cycles()

2015-06-16 Thread Andy Lutomirski
There are two logical changes here. First, this removes a check for cpu_has_tsc. That check is unnecessary, as we don't register the TSC as a clocksource on systems that have no TSC. Second, it adds a barrier, thus preventing observable non-monotonicity. I suspect that the missing barrier was n

[PATCH v3 15/18] x86/tsc: Use rdtsc_ordered() in check_tsc_warp() and drop extra barriers

2015-06-16 Thread Andy Lutomirski
Using get_cycles was unnecessary: check_tsc_warp() is not called on TSC-less systems. Replace rdtsc_barrier(); get_cycles() with rdtsc_ordered(). While we're at it, make the somewhat more dangerous change of removing barrier_before_rdtsc after RDTSC in the TSC warp check code. This should be oka

[PATCH v3 18/18] x86/tsc: Remove rdtsc_barrier()

2015-06-16 Thread Andy Lutomirski
All callers have been converted to rdtsc_ordered(). Signed-off-by: Andy Lutomirski --- arch/x86/include/asm/barrier.h | 11 --- arch/x86/um/asm/barrier.h | 13 - 2 files changed, 24 deletions(-) diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.

[PATCH v3 17/18] x86/kvm/tsc: Drop extra barrier and use rdtsc_ordered in kvmclock

2015-06-16 Thread Andy Lutomirski
__pvclock_read_cycles had an unnecessary barrier. Get rid of that barrier and clean up the code by just using rdtsc_ordered(). Cc: Paolo Bonzini Cc: Radim Krcmar Cc: Marcelo Tosatti Cc: kvm@vger.kernel.org Signed-off-by: Andy Lutomirski --- I'm hoping to get an ack for this to go in through

[PATCH v3 14/18] x86: Add rdtsc_ordered() and use it in trivial call sites

2015-06-16 Thread Andy Lutomirski
rdtsc_barrier(); rdtsc() is an unnecessary mouthful and requires more thought than should be necessary. Add an rdtsc_ordered() helper and replace the trivial call sites with it. This should not change generated code. The duplication of the fence asm is temporary. Signed-off-by: Andy Lutomirski

[PATCH v3 06/18] x86/tsc: Use the full 64-bit tsc in tsc_delay

2015-06-16 Thread Andy Lutomirski
As a very minor optimization, tsc_delay was only using the low 32 bits of the TSC. It's a delay function, so just use the whole thing. Signed-off-by: Andy Lutomirski --- arch/x86/lib/delay.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/lib/delay.c b/arch/
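
To illustrate the trade-off described above, here is a minimal user-space sketch (not the kernel's tsc_delay code, which is not shown in this snippet). The helper names `elapsed64` and `elapsed32` are hypothetical. It shows that a delta computed from only the low 32 bits of a cycle counter is right only while the real interval stays below 2^32 cycles, whereas the 64-bit delta has no such limit.

```c
#include <stdint.h>

/* Elapsed cycles using the full 64-bit counter values. */
static uint64_t elapsed64(uint64_t start, uint64_t now)
{
    return now - start;
}

/*
 * Elapsed cycles using only the low 32 bits of each value.
 * Unsigned subtraction wraps modulo 2^32, so the result is correct
 * only if fewer than 2^32 cycles really passed.
 */
static uint32_t elapsed32(uint64_t start, uint64_t now)
{
    return (uint32_t)((uint32_t)now - (uint32_t)start);
}
```

For short delays both helpers agree, even across a wrap of the low word. Once the interval exceeds 2^32 cycles the 32-bit version silently under-reports it.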

[PATCH v3 03/18] x86/tsc/paravirt: Remove the read_tsc and read_tscp paravirt hooks

2015-06-16 Thread Andy Lutomirski
We've had read_tsc and read_tscp paravirt hooks since the very beginning of paravirt, i.e., d3561b7fa0fb ("[PATCH] paravirt: header and stubs for paravirtualisation"). AFAICT the only paravirt guest implementation that ever replaced these calls was vmware, and it's gone. Arguably even vmware shou

[PATCH v3 02/18] x86/msr/kvm: Remove vget_cycles()

2015-06-16 Thread Andy Lutomirski
The only caller was kvm's read_tsc. The only difference between vget_cycles and native_read_tsc was that vget_cycles returned zero instead of crashing on TSC-less systems. KVM already checks vclock_mode before calling that function, so the extra check is unnecessary. (Off-topic, but the whole

[PATCH v3 00/18] x86/tsc: Clean up rdtsc helpers

2015-06-16 Thread Andy Lutomirski
My sincere apologies for the spam. I sent an unholy mixture of the real patch set and an old poorly split-up patch set, and the result is incomprehensible. Here's what I meant to send. After some recent threads about rdtsc barriers, I remembered that our RDTSC wrappers are a big mess. Let's

Re: [PATCH 0/5] vhost: support upto 509 memory regions

2015-06-16 Thread Igor Mammedov
On Tue, 16 Jun 2015 23:16:07 +0200 "Michael S. Tsirkin" wrote: > On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote: > > Series extends vhost to support up to 509 memory regions, > > and adds some vhost:translate_desc() performance improvements > > so it won't regress when memslots are i

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-16 Thread Igor Mammedov
On Tue, 16 Jun 2015 23:14:20 +0200 "Michael S. Tsirkin" wrote: > On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote: > > since commit > > 1d4e7e3 kvm: x86: increase user memory slots to 509 > > > > it became possible to use a bigger amount of memory > > slots, which is used by memory

[PATCH v2 2/2] arm: KVM: keep arm vfp/simd exit handling consistent with arm64

2015-06-16 Thread Mario Smarduch
After enhancing arm64 FP/SIMD exit handling, the FP/SIMD exit branch is moved to guest trap handling. This keeps the exit handling flow consistent between both architectures. Signed-off-by: Mario Smarduch --- arch/arm/kvm/interrupts.S | 12 +++- 1 file changed, 7 insertions(+), 5 deletions(

[PATCH v2 1/2] arm64: KVM: Optimize arm64 fp/simd save/restore

2015-06-16 Thread Mario Smarduch
This patch only saves and restores FP/SIMD registers on Guest access. To do this, the cptr_el2 FP/SIMD trap is set on Guest entry and later checked on exit. lmbench and hackbench show significant improvements: for 30-50% of exits the FP/SIMD context is not saved/restored. Signed-off-by: Mario Smarduch --- arch/

[PATCH v2 0/2] arm/arm64: KVM: Optimize arm64 fp/simd, saves 30-50% on exits

2015-06-16 Thread Mario Smarduch
Currently we save/restore fp/simd on each exit. The first patch optimizes arm64 save/restore so that we only do so on Guest access. hackbench and several lmbench tests show anywhere from 30% to above 50% optimization achieved. In the second patch the 32-bit handler is updated to keep exit handling consistent with 6

Re: [PATCH 0/5] vhost: support upto 509 memory regions

2015-06-16 Thread Michael S. Tsirkin
On Tue, Jun 16, 2015 at 06:33:34PM +0200, Igor Mammedov wrote: > Series extends vhost to support up to 509 memory regions, > and adds some vhost:translate_desc() performance improvements > so it won't regress when memslots are increased to 509. > > It fixes running VM crashing during memory hotplug

Re: [PATCH 3/5] vhost: support upto 509 memory regions

2015-06-16 Thread Michael S. Tsirkin
On Tue, Jun 16, 2015 at 06:33:37PM +0200, Igor Mammedov wrote: > since commit > 1d4e7e3 kvm: x86: increase user memory slots to 509 > > it became possible to use a bigger amount of memory > slots, which is used by memory hotplug for > registering hotplugged memory. > However QEMU crashes if it's

Re: [PATCH 1/5] vhost: use binary search instead of linear in find_region()

2015-06-16 Thread Igor Mammedov
On Tue, 16 Jun 2015 23:07:24 +0200 "Michael S. Tsirkin" wrote: > On Tue, Jun 16, 2015 at 06:33:35PM +0200, Igor Mammedov wrote: > > For default region layouts performance stays the same > > as linear search i.e. it takes around 210ns average for > > translate_desc() that inlines find_region(). >

Re: [PATCH 5/5] vhost: translate_desc: optimization for desc.len < region size

2015-06-16 Thread Michael S. Tsirkin
On Tue, Jun 16, 2015 at 06:33:39PM +0200, Igor Mammedov wrote: > when translating descriptors they are typically less than > memory region that holds them and translated into 1 iov > enty, entry > so it's not nessesary to check remaining length > twice and calculate used length and next address >

Re: [PATCH 1/5] vhost: use binary search instead of linear in find_region()

2015-06-16 Thread Michael S. Tsirkin
On Tue, Jun 16, 2015 at 06:33:35PM +0200, Igor Mammedov wrote: > For default region layouts performance stays the same > as linear search i.e. it takes around 210ns average for > translate_desc() that inlines find_region(). > > But it scales better with larger amount of regions, > 235ns BS vs 300n

Re: [PATCH 0/5] kvmtool: Misc fixes

2015-06-16 Thread Will Deacon
On Mon, Jun 15, 2015 at 12:49:41PM +0100, Andreas Herrmann wrote: > Following some patches to fix misc issues found when testing the > standalone kvmtool version. > > Please apply. All applied, apart from the ioeventfd patch which I'm not sure about. Will

Re: [PATCH 4/5] kvmtool: Save datamatch as little endian in {add,del}_event

2015-06-16 Thread Will Deacon
On Mon, Jun 15, 2015 at 12:49:45PM +0100, Andreas Herrmann wrote: > W/o dedicated endianness it's impossible to reliably find a match > e.g. in kernel/virt/kvm/eventfd.c ioeventfd_in_range. Hmm, but shouldn't this be the endianness of the guest, rather than just forcing things to little-endian? Wi

Re: [PATCH] kvmtool: don't use PCI config space IRQ line field

2015-06-16 Thread Will Deacon
On Mon, Jun 15, 2015 at 11:45:38AM +0100, Andre Przywara wrote: > On 06/05/2015 05:41 PM, Will Deacon wrote: > > On Thu, Jun 04, 2015 at 04:20:45PM +0100, Andre Przywara wrote: > >> In PCI config space there is an interrupt line field (offset 0x3f), > >> which is used to initially communicate the I

Re: [PATCH v2 01/11] KVM: arm: plug guest debug exploit

2015-06-16 Thread Will Deacon
On Sun, Jun 14, 2015 at 05:13:05PM +0100, zichao wrote: > I and marc are talking about how to plug the guest debug exploit in an > easier way. > > I remembered that you mentioned disabling monitor mode had proven to be > extremely fragile in practice on 32-bit ARM SoCs, what if I save/restore > th

[PATCH 1/5] vhost: use binary search instead of linear in find_region()

2015-06-16 Thread Igor Mammedov
For default region layouts performance stays the same as linear search i.e. it takes around 210ns average for translate_desc() that inlines find_region(). But it scales better with larger amount of regions, 235ns BS vs 300ns LS with 55 memory regions and it will be about the same values when allow
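
The lookup this patch describes can be sketched in user space as follows. This is only an illustration under stated assumptions, not the code from the patch: the `struct mem_region` layout and the `find_region` signature are hypothetical simplifications, and the regions are assumed to be sorted by guest physical address and non-overlapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region descriptor; not the kernel's struct. */
struct mem_region {
    uint64_t guest_phys_addr; /* first guest address covered */
    uint64_t memory_size;     /* length of the region in bytes */
    uint64_t userspace_addr;  /* matching host virtual address */
};

/*
 * Binary search over regions sorted by guest_phys_addr (assumed
 * non-overlapping). Returns the region containing addr, or NULL.
 */
static const struct mem_region *
find_region(const struct mem_region *regions, size_t n, uint64_t addr)
{
    size_t lo = 0, hi = n;

    assert(regions != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        const struct mem_region *r = &regions[mid];

        if (addr < r->guest_phys_addr)
            hi = mid;               /* addr lies left of this region */
        else if (addr - r->guest_phys_addr >= r->memory_size)
            lo = mid + 1;           /* addr lies right of this region */
        else
            return r;               /* addr falls inside this region */
    }
    return NULL;
}
```

With N regions this needs about log2(N) comparisons instead of up to N, which is consistent with the patch's claim that it scales better as the region count grows.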

[PATCH 4/5] vhost: add per VQ memory region caching

2015-06-16 Thread Igor Mammedov
that brings down translate_desc() cost to around 210ns if accessed descriptors are from the same memory region. Signed-off-by: Igor Mammedov --- that's what netperf/iperf workloads were during testing. --- drivers/vhost/vhost.c | 16 +--- drivers/vhost/vhost.h | 1 + 2 files changed
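
The caching idea can be sketched as below. This is a user-space illustration under assumptions, not the patch itself: `struct region`, `struct vq_state` and `lookup_cached` are hypothetical names, and a plain linear scan stands in for whatever full lookup the real code uses. Each queue remembers the last region that matched and tries it first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified region: [start, start + size). */
struct region {
    uint64_t start;
    uint64_t size;
};

/* Hypothetical per-virtqueue state holding the lookup cache. */
struct vq_state {
    const struct region *cached; /* last region that matched, or NULL */
    unsigned long slow_lookups;  /* number of fallbacks to the full search */
};

static int region_contains(const struct region *r, uint64_t addr)
{
    return addr >= r->start && addr - r->start < r->size;
}

/* Try the cached region first; otherwise search and refresh the cache. */
static const struct region *
lookup_cached(struct vq_state *vq, const struct region *regions,
              size_t n, uint64_t addr)
{
    assert(vq != NULL);
    if (vq->cached && region_contains(vq->cached, addr))
        return vq->cached;          /* fast path: same region as last time */

    vq->slow_lookups++;
    for (size_t i = 0; i < n; i++) {
        if (region_contains(&regions[i], addr)) {
            vq->cached = &regions[i];
            return vq->cached;
        }
    }
    return NULL;                    /* miss: the cache is left unchanged */
}
```

A real implementation would also have to drop the cached pointer whenever the region table is replaced. That invalidation step is omitted here.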

[PATCH 3/5] vhost: support upto 509 memory regions

2015-06-16 Thread Igor Mammedov
since commit 1d4e7e3 kvm: x86: increase user memory slots to 509 it became possible to use a bigger amount of memory slots, which is used by memory hotplug for registering hotplugged memory. However QEMU crashes if it's used with more than ~60 pc-dimm devices and vhost-net since host kernel in mo

[PATCH 0/5] vhost: support upto 509 memory regions

2015-06-16 Thread Igor Mammedov
Series extends vhost to support up to 509 memory regions, and adds some vhost:translate_desc() performance improvements so it won't regress when memslots are increased to 509. It fixes running VM crashing during memory hotplug due to vhost refusing to accept more than 64 memory regions. It's only h

[PATCH 5/5] vhost: translate_desc: optimization for desc.len < region size

2015-06-16 Thread Igor Mammedov
when translating descriptors they are typically less than memory region that holds them and translated into 1 iov entry, so it's not necessary to check remaining length twice and calculate used length and next address in such cases. so replace a remaining length and 'size' increment branches with a

[PATCH 2/5] vhost: extend memory regions allocation to vmalloc

2015-06-16 Thread Igor Mammedov
with a large number of memory regions we could end up with high order allocations, and kmalloc could fail if the host is under memory pressure. Considering that the memory regions array is used on the hot path, try harder to allocate using kmalloc and, if it fails, resort to vmalloc. It's still better than just fail

Re: [PATCH V5 0/4] Consolidated KVM vPMU support for x86

2015-06-16 Thread Joerg Roedel
On Fri, Jun 12, 2015 at 01:34:52AM -0400, Wei Huang wrote: > This patchset is directly applicable on kvm.git/queue. > > V5: > * Remove the get_pmu_ops from sub_arch; instead define pmu dispatcher > in kvm_x86_ops->pmu_ops. The dispatcher is initialized in sub-arch. > The PMU interface f

Re: [PATCH V5 2/4] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-06-16 Thread Joerg Roedel
On Fri, Jun 12, 2015 at 01:34:54AM -0400, Wei Huang wrote: > This patch splits existing vPMU code into a common vPMU interface (pmc.c) > and Intel specific vPMU code (pmu_intel.c) using the following steps: > > - Part of architectural vPMU code is extracted and moved to pmu_intel.c > file. They

Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-06-16 Thread Thomas Gleixner
On Tue, 16 Jun 2015, Juergen Gross wrote: > AFAIK there are no outstanding questions for more than one month now. > I'd appreciate some feedback or accepting these patches. They are against dead code, which will be gone soon. We switched over to queued locks. Thanks, tglx

Re: [PATCH 0/6] x86: reduce paravirtualized spinlock overhead

2015-06-16 Thread Juergen Gross
AFAIK there are no outstanding questions for more than one month now. I'd appreciate some feedback or accepting these patches. Juergen On 04/30/2015 12:53 PM, Juergen Gross wrote: Paravirtualized spinlocks produce some overhead even if the kernel is running on bare metal. The main reason are t

Re: [PATCH 2/2] arm64: KVM: Add VCPU support for Qualcomm Technologies Kryo ARMv8 CPU

2015-06-16 Thread Marc Zyngier
In the future, please add the KVM/ARM maintainers on Cc. On 12/06/15 22:57, Timur Tabi wrote: > From: Shanker Donthineni > > This patch enables assignment of 32/64bit guest VCPU to > Qualcomm Technologies ARMv8 CPU. Added KVM_ARM_TARGET_QCOM_KRYO > to the KVM target list and modified vm_target_c

Re: [PATCH 5/7] userfaultfd: switch to exclusive wakeup for blocking reads

2015-06-16 Thread Andrea Arcangeli
On Mon, Jun 15, 2015 at 08:41:24PM -1000, Linus Torvalds wrote: > On Mon, Jun 15, 2015 at 12:19 PM, Andrea Arcangeli > wrote: > > > > Yes, it would leave the other blocked, how is it different from having > > just 1 reader and it gets killed? > > Either is completely wrong. But the read() code c

[GIT PULL] KVM fixes for 4.1-rc8

2015-06-16 Thread Marcelo Tosatti
Linus, Please pull from git://git.kernel.org/pub/scm/virt/kvm/kvm.git master To receive the following KVM bug fix, which restores APIC migration functionality. Radim Krčmář (1): KVM: x86: fix lapic.timer_mode on restore arch/x86/kvm/lapic.c | 26 -- 1 file c

[PATCH 2/2] KVM: fix checkpatch.pl errors in kvm/coalesced_mmio.h

2015-06-16 Thread Kevin Mulvey
Tabs rather than spaces Signed-off-by: Kevin Mulvey --- virt/kvm/coalesced_mmio.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/virt/kvm/coalesced_mmio.h b/virt/kvm/coalesced_mmio.h index b280c20..5cbf190 100644 --- a/virt/kvm/coalesced_mmio.h +++ b/virt/kvm/coalesced_m

[PATCH 1/2] KVM: fix checkpatch.pl errors in kvm/async_pf.h

2015-06-16 Thread Kevin Mulvey
fix brace spacing Signed-off-by: Kevin Mulvey --- virt/kvm/async_pf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/virt/kvm/async_pf.h b/virt/kvm/async_pf.h index e7ef6447..ec4cfa2 100644 --- a/virt/kvm/async_pf.h +++ b/virt/kvm/async_pf.h @@ -29,8 +29,8 @@ void kvm_as

Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts

2015-06-16 Thread Eric Auger
On 06/16/2015 10:28 AM, Marc Zyngier wrote: > Hi Eric, > > On 15/06/15 16:44, Eric Auger wrote: >> Hi Marc, >> On 06/08/2015 07:04 PM, Marc Zyngier wrote: >>> In order to be able to feed physical interrupts to a guest, we need >>> to be able to establish the virtual-physical mapping between the tw

Re: [PATCH] arm64: KVM: Optimize arm64 guest exit VFP/SIMD register save/restore

2015-06-16 Thread Marc Zyngier
On 16/06/15 04:04, Mario Smarduch wrote: > On 06/15/2015 11:20 AM, Marc Zyngier wrote: >> On 15/06/15 19:04, Mario Smarduch wrote: >>> On 06/15/2015 03:00 AM, Marc Zyngier wrote: Hi Mario, > [ ... ] On 13/06/15 23:20, Mario Smarduch wrote: > Currently VFP/SIMD registers are

Re: [PATCH 06/10] KVM: arm/arm64: vgic: Allow dynamic mapping of physical/virtual interrupts

2015-06-16 Thread Marc Zyngier
Hi Eric, On 15/06/15 16:44, Eric Auger wrote: > Hi Marc, > On 06/08/2015 07:04 PM, Marc Zyngier wrote: >> In order to be able to feed physical interrupts to a guest, we need >> to be able to establish the virtual-physical mapping between the two >> worlds. >> >> The mapping is kept in a rbtree, in

[PATCH] KVM: Avoid warning "user requested TSC rate below hardware speed" when create VM.

2015-06-16 Thread Lan Tianyu
KVM populates max_tsc_khz with tsc_khz at the arch init stage on constant-tsc machines and creates VMs with max_tsc_khz as the tsc rate. However, tsc_khz may be changed while the tsc clocksource driver refines its calibration. This causes the VM to be created with a slow tsc and produces the following warning. To fix the is