Re: SVM: vmload/vmsave-free VM exits?

2015-04-06 Thread Jan Kiszka
On 2015-04-07 08:29, Valentine Sinitsyn wrote: > On 07.04.2015 11:23, Jan Kiszka wrote: >> On 2015-04-07 08:19, Valentine Sinitsyn wrote: >>> On 07.04.2015 11:13, Jan Kiszka wrote: >>>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more towards 600 if they are colder (added some usleep to each loop in the test).

Re: SVM: vmload/vmsave-free VM exits?

2015-04-06 Thread Valentine Sinitsyn
On 07.04.2015 11:23, Jan Kiszka wrote: > On 2015-04-07 08:19, Valentine Sinitsyn wrote: >> On 07.04.2015 11:13, Jan Kiszka wrote: >>> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more towards 600 if they are colder (added some usleep to each loop in the test). >> Great, thanks. Could you post absolute numbers, i.e. how long do A and B take on your CPU?

Re: SVM: vmload/vmsave-free VM exits?

2015-04-06 Thread Jan Kiszka
On 2015-04-07 08:19, Valentine Sinitsyn wrote: > On 07.04.2015 11:13, Jan Kiszka wrote: >> It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more towards 600 if they are colder (added some usleep to each loop in the test). > Great, thanks. Could you post absolute numbers, i.e. how long do A and B take on your CPU?

Re: SVM: vmload/vmsave-free VM exits?

2015-04-06 Thread Valentine Sinitsyn
On 07.04.2015 11:13, Jan Kiszka wrote: > It is, at least 160 cycles with hot caches on an AMD A6-5200 APU, more towards 600 if they are colder (added some usleep to each loop in the test). Great, thanks. Could you post absolute numbers, i.e. how long do A and B take on your CPU? A is around 1910
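
The numbers being traded here are cycle deltas around a vmload/vmsave pair. For context, a measurement skeleton could look like the following; this is a hypothetical sketch, not code from the thread: it assumes CPL0 with EFER.SVME enabled and a valid VMCB physical address in vmcb_pa (the implicit rAX operand of vmload/vmsave).

#include <stdint.h>

/* Hypothetical micro-benchmark skeleton (illustrative only). */
static inline uint64_t rdtsc_serialized(void)
{
    uint32_t lo, hi;
    /* cpuid is a serializing instruction, so nothing from the measured
     * region can be reordered across the rdtsc that follows it */
    asm volatile("xor %%eax, %%eax\n\tcpuid\n\trdtsc"
                 : "=a"(lo), "=d"(hi) : : "rbx", "rcx");
    return ((uint64_t)hi << 32) | lo;
}

static uint64_t time_vmload_vmsave(uint64_t vmcb_pa)
{
    uint64_t t0 = rdtsc_serialized();

    asm volatile("vmload" : : "a"(vmcb_pa) : "memory");
    asm volatile("vmsave" : : "a"(vmcb_pa) : "memory");

    return rdtsc_serialized() - t0;
}

Subtracting the cost of an empty rdtsc_serialized() pair isolates the vmload/vmsave overhead, and inserting a usleep() into each loop iteration, as mentioned above, reproduces the hot- versus cold-cache split.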

Re: SVM: vmload/vmsave-free VM exits?

2015-04-06 Thread Jan Kiszka
On 2015-04-07 08:10, Valentine Sinitsyn wrote: > Hi Jan, > > On 07.04.2015 10:43, Jan Kiszka wrote: >> On 2015-04-05 19:12, Valentine Sinitsyn wrote: >>> Hi Jan, >>> >>> On 05.04.2015 13:31, Jan Kiszka wrote: >>>> studying the VM exit logic of Jailhouse, I was wondering when AMD's vmload/vmsave can be avoided.

Re: SVM: vmload/vmsave-free VM exits?

2015-04-06 Thread Valentine Sinitsyn
Hi Jan, On 07.04.2015 10:43, Jan Kiszka wrote: > On 2015-04-05 19:12, Valentine Sinitsyn wrote: >> Hi Jan, >> >> On 05.04.2015 13:31, Jan Kiszka wrote: >>> studying the VM exit logic of Jailhouse, I was wondering when AMD's vmload/vmsave can be avoided. Jailhouse as well as KVM currently use these instructions unconditionally. However, I think

Re: SVM: vmload/vmsave-free VM exits?

2015-04-06 Thread Jan Kiszka
On 2015-04-05 19:12, Valentine Sinitsyn wrote: > Hi Jan, > > On 05.04.2015 13:31, Jan Kiszka wrote: >> studying the VM exit logic of Jailhouse, I was wondering when AMD's vmload/vmsave can be avoided. Jailhouse as well as KVM currently use these instructions unconditionally. However, I think

[PATCH v15 01/15] qspinlock: A simple generic 4-byte queue spinlock

2015-04-06 Thread Waiman Long
This patch introduces a new generic queue spinlock implementation that can serve as an alternative to the default ticket spinlock. Compared with the ticket spinlock, this queue spinlock should be almost as fair. It has about the same speed in the single-thread case, and it can be much
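
To make the per-patch summaries below easier to follow, here is an illustrative sketch of how a 4-byte queue spinlock word can be laid out. Field placement follows the descriptions in this series; the constants and helper are assumptions for the sketch, not the patch's exact code.

#include <stdint.h>

/* Sketch of a 32-bit queue spinlock word:
 *
 *   bits  0-7  : locked byte
 *   bit   8    : pending
 *   bits  9-15 : unused
 *   bits 16-17 : tail index (which per-CPU MCS node: task, softirq,
 *                hardirq or NMI nesting level)
 *   bits 18-31 : tail CPU number + 1 (0 means no CPU is queued)
 */
#define _Q_LOCKED_VAL   (1U << 0)
#define _Q_PENDING_VAL  (1U << 8)

static inline uint32_t encode_tail(int cpu, int idx)
{
    /* cpu + 1 so that "no tail" is distinguishable from CPU 0 */
    return ((uint32_t)(cpu + 1) << 18) | ((uint32_t)idx << 16);
}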

[PATCH v15 00/15] qspinlock: a 4-byte queue spinlock with PV support

2015-04-06 Thread Waiman Long
v14->v15: - Incorporate PeterZ's v15 qspinlock patch and improve upon the PV qspinlock code by dynamically allocating the hash table as well as some other performance optimizations. - Simplified the Xen PV qspinlock code as suggested by David Vrabel. - Add benchmarking data for the 3.19 kernel

[PATCH v15 08/15] lfsr: a simple binary Galois linear feedback shift register

2015-04-06 Thread Waiman Long
This patch is based on the code sent out by Peter Zijlstra as part of his queue spinlock patch to provide a hashing function with open addressing. The lfsr() function can be used to return a sequence of numbers that cycle through all 2^n - 1 bit patterns of a given bit width n, except the value 0.
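
For illustration, a minimal binary Galois LFSR of the kind described; this is a generic sketch, not the patch's lfsr() (the tap mask 0xB8 is a standard maximal-length choice for n = 8).

#include <stdint.h>

/* 8-bit Galois LFSR step. 0xB8 encodes the maximal polynomial
 * x^8 + x^6 + x^5 + x^4 + 1, so the state walks through all 2^8 - 1
 * nonzero patterns; 0 is a fixed point and is never visited. */
static inline uint8_t lfsr_step(uint8_t x)
{
    uint8_t lsb = x & 1;

    x >>= 1;
    if (lsb)
        x ^= 0xB8;  /* apply the feedback taps */
    return x;
}

Seeded with any nonzero value, repeated calls visit all 255 nonzero states before repeating, which is what makes such a sequence usable as a cheap probe order for open addressing.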

[PATCH v15 04/15] qspinlock: Extract out code snippets for the next patch

2015-04-06 Thread Waiman Long
This is a preparatory patch that extracts out the following two code snippets to prepare for the next performance optimization patch: 1) the logic for the exchange of new and previous tail code words into a new xchg_tail() function, and 2) the logic for clearing the pending bit and setting the locked bit.
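
A sketch of what the extracted xchg_tail() logic amounts to, using compiler builtins rather than kernel atomics; it operates on the lock word sketched earlier, and 'tail' is assumed to be pre-shifted into bits 16-31 (e.g. by encode_tail() above).

#include <stdint.h>
#include <stdbool.h>

#define TAIL_MASK (0xFFFFu << 16)

/* Atomically publish our tail code and return the previous one, so the
 * caller knows which MCS node to queue behind. */
static inline uint32_t xchg_tail(uint32_t *lock, uint32_t tail)
{
    uint32_t old = __atomic_load_n(lock, __ATOMIC_RELAXED);
    uint32_t new;

    do {
        new = (old & ~TAIL_MASK) | tail;
    } while (!__atomic_compare_exchange_n(lock, &old, new, false,
                                          __ATOMIC_ACQ_REL,
                                          __ATOMIC_RELAXED));
    return old & TAIL_MASK;  /* 0 means the queue was empty */
}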

[PATCH v15 03/15] qspinlock: Add pending bit

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) Because the qspinlock needs to touch a second cacheline (the per-cpu mcs_nodes[]), add a pending bit and allow a single in-word spinner before we punt to the second cacheline. It is possible to observe the pending bit without the locked bit when the last owner has just
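
An illustrative sketch of the pending-bit fast path with compiler builtins (little-endian layout assumed; not the patch's code). The spin loop below waits out exactly the window described above, where pending can be observed without locked.

#include <stdint.h>
#include <stdbool.h>

#define LOCKED_VAL   (1u << 0)
#define PENDING_VAL  (1u << 8)

/* If the lock is held but otherwise uncontended, become the single
 * in-word spinner instead of touching the mcs_nodes[] cacheline. */
static bool pending_fastpath(uint32_t *lock)
{
    uint32_t old = LOCKED_VAL;  /* expect: locked, no pending, no tail */

    if (!__atomic_compare_exchange_n(lock, &old, LOCKED_VAL | PENDING_VAL,
                                     false, __ATOMIC_ACQUIRE,
                                     __ATOMIC_RELAXED))
        return false;           /* more contention: fall back to the queue */

    /* wait for the current owner to release the locked byte */
    while (__atomic_load_n(lock, __ATOMIC_ACQUIRE) & LOCKED_VAL)
        ;

    /* clear pending, set locked: write only the low halfword so a newly
     * arriving queuer's tail code is not clobbered */
    __atomic_store_n((uint16_t *)lock, (uint16_t)LOCKED_VAL,
                     __ATOMIC_RELAXED);
    return true;
}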

[PATCH v15 05/15] qspinlock: Optimize for smaller NR_CPUS

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) When we allow for a max NR_CPUS < 2^14, we can optimize the pending wait-acquire and the xchg_tail() operations. By growing the pending bit to a byte, we reduce the tail to 16 bits. This means we can use xchg16 for the tail part and do away with all the repeated cmpxchg
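
With the tail confined to the upper halfword, the cmpxchg loop sketched earlier collapses into a single 16-bit exchange. A hypothetical little-endian sketch:

#include <stdint.h>

/* Exchange only the tail halfword; the locked and pending bytes in the
 * low halfword are untouched, so no retry loop is needed. */
static inline uint32_t xchg_tail16(uint32_t *lock, uint32_t tail)
{
    uint16_t *tailp = (uint16_t *)lock + 1;  /* upper halfword (LE) */

    return (uint32_t)__atomic_exchange_n(tailp, (uint16_t)(tail >> 16),
                                         __ATOMIC_ACQ_REL) << 16;
}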

[PATCH v15 09/15] pvqspinlock: Implement simple paravirt support for the qspinlock

2015-04-06 Thread Waiman Long
Provide a separate (second) version of the spin_lock_slowpath for paravirt along with a special unlock path. The second slowpath is generated by adding a few pv hooks to the normal slowpath, but where those will compile away for the native case, they expand into special wait/wake code for the pv v
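
The "compile away" trick is the usual stub pattern; a sketch under the assumption of a CONFIG_PARAVIRT_SPINLOCKS switch (hook names follow the series, signatures are illustrative):

struct mcs_spinlock;

#ifdef CONFIG_PARAVIRT_SPINLOCKS
/* real implementations: pv_wait_node() parks the vCPU, pv_kick_node()
 * wakes the next waiter */
extern void pv_init_node(struct mcs_spinlock *node);
extern void pv_wait_node(struct mcs_spinlock *node);
extern void pv_kick_node(struct mcs_spinlock *node);
#else
/* native case: empty stubs the compiler removes from the slowpath */
static inline void pv_init_node(struct mcs_spinlock *node) { }
static inline void pv_wait_node(struct mcs_spinlock *node) { }
static inline void pv_kick_node(struct mcs_spinlock *node) { }
#endif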

[PATCH v15 07/15] qspinlock: Revert to test-and-set on hypervisors

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) When we detect a hypervisor (!paravirt, see qspinlock paravirt support patches), revert to a simple test-and-set lock to avoid the horrors of queue preemption. Signed-off-by: Peter Zijlstra (Intel) Signed-off-by: Waiman Long --- arch/x86/include/asm/qspinlock.h |
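
The fallback itself is just a classic test-and-set lock; a generic sketch with compiler builtins:

#include <stdint.h>

/* Test-and-set lock with a read-only inner spin to limit cacheline
 * ping-pong between waiters. */
static inline void tas_lock(uint32_t *lock)
{
    while (__atomic_exchange_n(lock, 1u, __ATOMIC_ACQUIRE)) {
        while (__atomic_load_n(lock, __ATOMIC_RELAXED))
            ;  /* wait until the lock looks free, then retry the xchg */
    }
}

static inline void tas_unlock(uint32_t *lock)
{
    __atomic_store_n(lock, 0u, __ATOMIC_RELEASE);
}

The design point is that an unfair lock is preferable here: with a queue, a preempted vCPU waiter blocks every vCPU queued behind it, whereas test-and-set lets any currently running vCPU take the lock.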

[PATCH v15 13/15] pvqspinlock: Only kick CPU at unlock time

2015-04-06 Thread Waiman Long
Before this patch, a CPU may have been kicked twice before getting the lock: once before it becomes queue head and once before it gets the lock. All this CPU kicking and halting (VMEXIT) can be expensive and can slow down system performance, especially in an overcommitted guest. This patch adds a new

[PATCH v15 15/15] pvqspinlock: Add debug code to check for PV lock hash sanity

2015-04-06 Thread Waiman Long
The current code for PV lock hash table processing will panic the system if pv_hash_find() can't find the desired hash bucket. However, there is no check to see if there is more than one entry for a given lock, which should never happen. This patch adds a pv_hash_check_duplicate() function to do this check

[PATCH v15 10/15] pvqspinlock: Implement the paravirt qspinlock for x86

2015-04-06 Thread Waiman Long
From: Peter Zijlstra (Intel) We use the regular paravirt call patching to switch between: native_queue_spin_lock_slowpath() / __pv_queue_spin_lock_slowpath(), and native_queue_spin_unlock() / __pv_queue_spin_unlock(). We use a callee-saved call for the unlock function, which reduces the

[PATCH v15 12/15] pvqspinlock, x86: Enable PV qspinlock for Xen

2015-04-06 Thread Waiman Long
This patch adds the necessary Xen-specific code to allow Xen to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long --- arch/x86/xen/spinlock.c | 63 --- kernel/Kconfig.locks | 2 +- 2 files changed

[PATCH v15 06/15] qspinlock: Use a simple write to grab the lock

2015-04-06 Thread Waiman Long
Currently, atomic_cmpxchg() is used to get the lock. However, this is not really necessary if there is more than one task in the queue and the queue head doesn't need to reset the tail code. For that case, a simple write to set the lock bit is enough as the queue head will be the only one eligible to get the lock
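
A sketch of the resulting lock grab (illustrative, little-endian layout, compiler builtins):

#include <stdint.h>

/* "Simple write" lock grab: the locked byte is the low byte of the lock
 * word. Only the queue head may do this, and only once it knows another
 * CPU is queued behind it; the tail code then cannot point back at us,
 * so no cmpxchg is needed to clear it. */
static inline void set_locked(uint32_t *lock)
{
    /* ordering relies on the acquire loads the queue head performed
     * while spinning; the store itself can be relaxed */
    __atomic_store_n((uint8_t *)lock, 1u, __ATOMIC_RELAXED);
}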

[PATCH v15 11/15] pvqspinlock, x86: Enable PV qspinlock for KVM

2015-04-06 Thread Waiman Long
This patch adds the necessary KVM-specific code to allow KVM to support the CPU halting and kicking operations needed by the queue spinlock PV code. Signed-off-by: Waiman Long --- arch/x86/kernel/kvm.c | 43 +++ kernel/Kconfig.locks | 2 +- 2 files changed

[PATCH v15 02/15] qspinlock, x86: Enable x86-64 to use queue spinlock

2015-04-06 Thread Waiman Long
This patch makes the necessary changes at the x86 architecture-specific layer to enable the use of queue spinlock for x86-64. As x86-32 machines are typically not multi-socket, the benefit of queue spinlock may not be apparent, so it is not enabled there. Currently, there is some incompatibility

[PATCH v15 14/15] pvqspinlock: Improve slowpath performance by avoiding cmpxchg

2015-04-06 Thread Waiman Long
In the pv_scan_next() function, the slow cmpxchg atomic operation is performed even if the other CPU is not even close to being halted. This extra cmpxchg can harm slowpath performance. This patch introduces the new mayhalt flag to indicate if the other spinning CPU is close to being halted or not

Re: [PATCH] x86: vdso: fix pvclock races with task migration

2015-04-06 Thread Paolo Bonzini
On 06/04/2015 22:07, Andy Lutomirski wrote: > On 04/02/2015 11:59 AM, Andy Lutomirski wrote: >> On Thu, Apr 2, 2015 at 11:44 AM, Radim Krčmář wrote: >>> If we were migrated right after __getcpu, but before reading the migration_count, we wouldn't notice that we read TSC of a different VCPU, nor that KVM's bug made pvti invalid, as only migration_count on source VCPU is increased.

[3.13.y-ckt stable] Patch "KVM: MIPS: Fix trace event to save PC directly" has been added to staging queue

2015-04-06 Thread Kamal Mostafa
This is a note to let you know that I have just added a patch titled KVM: MIPS: Fix trace event to save PC directly to the linux-3.13.y-queue branch of the 3.13.y-ckt extended stable tree which can be found at: http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-

Re: [PATCH] x86: vdso: fix pvclock races with task migration

2015-04-06 Thread Marcelo Tosatti
On Thu, Apr 02, 2015 at 08:44:23PM +0200, Radim Krčmář wrote: > If we were migrated right after __getcpu, but before reading the > migration_count, we wouldn't notice that we read TSC of a different > VCPU, nor that KVM's bug made pvti invalid, as only migration_count > on source VCPU is increased.

Re: [PATCH] x86: vdso: fix pvclock races with task migration

2015-04-06 Thread Andy Lutomirski
On 04/02/2015 11:59 AM, Andy Lutomirski wrote: > On Thu, Apr 2, 2015 at 11:44 AM, Radim Krčmář wrote: >> If we were migrated right after __getcpu, but before reading the migration_count, we wouldn't notice that we read TSC of a different VCPU, nor that KVM's bug made pvti invalid, as only migration_count on source VCPU is increased.
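
For readers following along, an illustrative reconstruction of the kind of read loop under discussion; every helper and field name here is an assumption for the sketch, not the actual vDSO code. The point of the fix being debated is to re-check both the pvti version and the current CPU after the reads, so a migration between any two steps forces a retry.

#include <stdint.h>

struct pvti {
    uint32_t version;          /* odd while the host is updating */
    uint64_t tsc_timestamp;
    uint64_t system_time;
    uint32_t tsc_to_system_mul;
    int8_t   tsc_shift;
};

extern unsigned getcpu(void);                /* vgetcpu-style helper */
extern struct pvti *pvti_of(unsigned cpu);   /* per-vCPU pvti page */
extern uint64_t scale_delta(uint64_t d, uint32_t mul, int8_t shift);

static inline uint64_t rdtsc(void)
{
    uint32_t lo, hi;
    asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

uint64_t pvclock_read(void)
{
    struct pvti *p;
    unsigned cpu, cpu_after;
    uint32_t version;
    uint64_t ns;

    do {
        cpu = getcpu();
        p = pvti_of(cpu);
        version = p->version;
        asm volatile("" ::: "memory");   /* keep the reads in order */
        ns = p->system_time +
             scale_delta(rdtsc() - p->tsc_timestamp,
                         p->tsc_to_system_mul, p->tsc_shift);
        asm volatile("" ::: "memory");
        cpu_after = getcpu();
        /* retry if the host updated pvti mid-read (odd or changed
         * version) or if we migrated to another vCPU meanwhile */
    } while ((version & 1) || p->version != version || cpu != cpu_after);

    return ns;
}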