Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15
On 03/16/2015 06:46 PM, Peter Zijlstra wrote: Hi Waiman, As promised; here is the paravirt stuff I did during the trip to BOS last week. All the !paravirt patches are more or less the same as before (the only real change is the copyright lines in the first patch). The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more convoluted and I've no real way to test that but it should be stright fwd to make work. I ran this using the virtme tool (thanks Andy) on my laptop with a 4x overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and it both booted and survived a hackbench run (perf bench sched messaging -g 20 -l 5000). So while the paravirt code isn't the most optimal code ever conceived it does work. Also, the paravirt patching includes replacing the call with "movb $0, %arg1" for the native case, which should greatly reduce the cost of having CONFIG_PARAVIRT_SPINLOCKS enabled on actual hardware. I feel that if someone were to do a Xen patch we can go ahead and merge this stuff (finally!). These patches do not implement the paravirt spinlock debug stats currently implemented (separately) by KVM and Xen, but that should not be too hard to do on top and in the 'generic' code -- no reason to duplicate all that. Of course; once this lands people can look at improving the paravirt nonsense. last time I had reported some hangs in kvm case, and I can confirm that the current set of patches works fine. Feel free to add Tested-by: Raghavendra K T #kvm pv As far as performance is concerned (with my 16core +ht machine having 16vcpu guests [ even w/ , w/o the lfsr hash patchset ]), I do not see any significant observations to report, though I understand that we could see much more benefit with large number of vcpus because of possible reduction in cache bouncing. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 9/9] qspinlock, x86, kvm: Implement KVM support for paravirt qspinlock
On 03/20/2015 02:38 AM, Waiman Long wrote: On 03/19/2015 06:01 AM, Peter Zijlstra wrote: [...] You are probably right. The initial apply_paravirt() was done before the SMP boot. Subsequent ones were at kernel module load time. I put a counter in the __native_queue_spin_unlock() and it registered 26949 unlock calls in a 16-cpu guest before it got patched out. because even printks take lock.. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock
On 02/24/2015 08:50 PM, Greg KH wrote: On Tue, Feb 24, 2015 at 03:47:37PM +0100, Ingo Molnar wrote: * Greg KH wrote: On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T Reviewed-by: Oleg Nesterov Acked-by: David Vrabel --- arch/x86/include/asm/spinlock.h | 94 - arch/x86/kernel/kvm.c | 7 ++- arch/x86/xen/spinlock.c | 7 ++- 3 files changed, 58 insertions(+), 50 deletions(-) Changes for stable: - Don't replace the ACCESS_ONCE to READ_ONCE which would cause horraneous Compiler warnings (Linus, David Vbriel, PeterZ, Ingo) What is the git commit id of this in Linus's tree? What stable tree(s) do you want this applied to? It's: d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock You'll also need this fix from Linus to avoid (harmless) build warnings: dd36929720f4 kernel: make READ_ONCE() valid on const arguments Great. But what stable kernel trees should it be backported to? Just 3.19? Or anything older? My patch was intended only for 3.19. Though paravirt changes have gone in 3.12, the problem manifested clearly after some of the completion related changes. but I leave that decision to experts here. (I 'll send necessary changes if patch is needed for older versions because it may not apply cleanly). ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock
On 02/24/2015 08:17 PM, Ingo Molnar wrote: * Greg KH wrote: On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T Reviewed-by: Oleg Nesterov Acked-by: David Vrabel --- arch/x86/include/asm/spinlock.h | 94 - arch/x86/kernel/kvm.c | 7 ++- arch/x86/xen/spinlock.c | 7 ++- 3 files changed, 58 insertions(+), 50 deletions(-) Changes for stable: - Don't replace the ACCESS_ONCE to READ_ONCE which would cause horraneous Compiler warnings (Linus, David Vbriel, PeterZ, Ingo) What is the git commit id of this in Linus's tree? What stable tree(s) do you want this applied to? It's: d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock Yes, This is the original patch. Please note I have taken out the READ_ONCE changes from the original patch to avoid build warnings mentioned below. (Those READ_ONCE changes were cosmetic and was not present in the previous versions) You'll also need this fix from Linus to avoid (harmless) build warnings: dd36929720f4 kernel: make READ_ONCE() valid on const arguments So this may not be absolutely necessary with the current patch. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
On 02/16/2015 10:17 PM, David Vrabel wrote: On 15/02/15 17:30, Raghavendra K T wrote: --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -41,7 +41,7 @@ static u8 zero_stats; static inline void check_zero(void) { u8 ret; - u8 old = ACCESS_ONCE(zero_stats); + u8 old = READ_ONCE(zero_stats); if (unlikely(old)) { ret = cmpxchg(&zero_stats, old, 0); /* This ensures only one fellow resets the stat */ @@ -112,6 +112,7 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want) struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting); int cpu = smp_processor_id(); u64 start; + __ticket_t head; unsigned long flags; /* If kicker interrupts not initialized yet, just spin */ @@ -159,11 +160,15 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want) */ __ticket_enter_slowpath(lock); + /* make sure enter_slowpath, which is atomic does not cross the read */ + smp_mb__after_atomic(); + /* * check again make sure it didn't become free while * we weren't looking */ - if (ACCESS_ONCE(lock->tickets.head) == want) { + head = READ_ONCE(lock->tickets.head); + if (__tickets_equal(head, want)) { add_stats(TAKEN_SLOW_PICKUP, 1); goto out; } @@ -204,8 +209,8 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next) const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu); /* Make sure we read lock before want */ - if (ACCESS_ONCE(w->lock) == lock && - ACCESS_ONCE(w->want) == next) { + if (READ_ONCE(w->lock) == lock && + READ_ONCE(w->want) == next) { add_stats(RELEASED_SLOW_KICKED, 1); xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR); break; Acked-by: David Vrabel Although some of the ACCESS_ONCE to READ_ONCE changes are cosmetic and are perhaps best left out of a patch destined for stable. Thanks. Yes, will send out a separate patch for -stable without READ_ONCE changes once this patches goes in. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
On 02/15/2015 09:47 PM, Oleg Nesterov wrote: Well, I regret I mentioned the lack of barrier after enter_slowpath ;) On 02/15, Raghavendra K T wrote: @@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct static_key *key); static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) { - set_bit(0, (volatile unsigned long *)&lock->tickets.tail); + set_bit(0, (volatile unsigned long *)&lock->tickets.head); + barrier(); } Because this barrier() looks really confusing. Firsty, it is equally unneeded on x86. At the same time, it can not help. We need a memory barrier() between set_bit(SLOWPATH) and READ_ONCE(head) to avoid the race with spin_unlock(). So I think you should replace it with smp_mb__after_atomic() or remove it. I resent the patch the above change. Other than that I believe this version is correct. So I won't insist, this is cosmetic after all. Thanks Oleg. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
* Raghavendra K T [2015-02-15 11:25:44]: Resending the V5 with smp_mb__after_atomic() change without bumping up revision ---8<--- >From 0b9ecde30e3bf5b5b24009fd2ac5fc7ac4b81158 Mon Sep 17 00:00:00 2001 From: Raghavendra K T Date: Fri, 6 Feb 2015 16:44:11 +0530 Subject: [PATCH RESEND V5] x86 spinlock: Fix memory corruption on completing completions Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- arch/x86/include/asm/spinlock.h | 95 - arch/x86/kernel/kvm.c | 10 +++-- arch/x86/xen/spinlock.c | 10 +++-- 3 files changed, 59 insertions(+), 56 deletions(-) potential TODO: * The whole patch be splitted into, 1. move slowpath flag 2. fix memory corruption in completion problem ?? Changes since V4: - one more usage of tickets_equal() (Oleg) - head > tail situation can lead to false contended check (Oleg) - smp_mb__after_atomic() added after slowptha_enter (Oleg) Changes since V3: - Detailed changelog (PeterZ) - Replace ACCESS_ONCE with READ_ONCE (oleg) - Add xen changes (Oleg) - Correct break logic in unlock_wait() (Oleg) Changes since V2: - Move the slowpath flag to head, this enables xadd usage in unlock code and inturn we can get rid of read/write after unlock (Oleg) - usage of ticket_equals (Oleg) Changes since V1: - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: Jeremy). - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy) - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) base = 3_19_rc7 3_19_rc7_spinfix_v3 +---+---+---++---+ kernbench (Time taken in sec lower is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 54.2300 3.0652 54.3008 4.0366-0.13056 2x 90.1883 5.5509 90.1650 6.4336 0.02583 +---+---+---++---+ +---+---+---++---+ dbench (Throughput higher is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 7029.9188 2.5952 6905.0712 4.4737-1.77595 2x 3254.207514.8291 3233.713726.8784-0.62976 +---+---+---++---+ (here is the result I got from the patches, I believe there may be some small overhead from xadd etc, but overall looks fine but a thorough test may be needed) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 625660f..cf87de3 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arc
Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
On 02/15/2015 11:25 AM, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- Sasha, Hope this addresses invalid read concern you had with latest xadd based implementation. (Think we need to test without Oleg's PATCH] sched/completion: completion_done() should serialize with complete() reported by Paul.) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- arch/x86/include/asm/spinlock.h | 95 - arch/x86/kernel/kvm.c | 10 +++-- arch/x86/xen/spinlock.c | 10 +++-- 3 files changed, 59 insertions(+), 56 deletions(-) potential TODO: * The whole patch be splitted into, 1. move slowpath flag 2. fix memory corruption in completion problem ?? Changes since V4: - one more usage of tickets_equal() (Oleg) - head > tail situation can lead to false contended check (Oleg) Changes since V3: - Detailed changelog (PeterZ) - Replace ACCESS_ONCE with READ_ONCE (oleg) - Add xen changes (Oleg) - Correct break logic in unlock_wait() (Oleg) Changes since V2: - Move the slowpath flag to head, this enables xadd usage in unlock code and inturn we can get rid of read/write after unlock (Oleg) - usage of ticket_equals (Oleg) Changes since V1: - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: Jeremy). - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy) - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) base = 3_19_rc7 3_19_rc7_spinfix_v3 +---+---+---++---+ kernbench (Time taken in sec lower is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 54.2300 3.0652 54.3008 4.0366-0.13056 2x 90.1883 5.5509 90.1650 6.4336 0.02583 +---+---+---++---+ +---+---+---++---+ dbench (Throughput higher is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 7029.9188 2.5952 6905.0712 4.4737-1.77595 2x 3254.207514.8291 3233.713726.8784-0.62976 +---+---+---++---+ (here is the result I got from the patches, I believe there may be some small overhead from xadd etc, but overall looks fine but a thorough test may be needed) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 625660f..4413315 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct static_key *key); static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) { - set_bit(0, (volatile unsigned long *)&lock->tickets.tail); + set_bit(0, (volatile unsigned long *)&lock->tickets.head); + barrier(); } #else /* !CONFIG_PARAVIRT_SPINLOCKS */ @@ -60,10 +61,30 @@ static inline v
Re: [Xen-devel] [PATCH V4] x86 spinlock: Fix memory corruption on completing completions
On 02/13/2015 09:02 PM, Oleg Nesterov wrote: On 02/13, Raghavendra K T wrote: @@ -164,7 +161,7 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock) { struct __raw_tickets tmp = READ_ONCE(lock->tickets); - return tmp.tail != tmp.head; + return tmp.tail != (tmp.head & ~TICKET_SLOWPATH_FLAG); } Well, this can probably use __tickets_equal() too. But this is cosmetic. That looks good. Added. It seems that arch_spin_is_contended() should be fixed with this change, (__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC can be true because of TICKET_SLOWPATH_FLAG in .head, even if it is actually unlocked. Done. Hmm! it was because I was still under impression that slowpath bit is in tail. You are right, situation could lead to positive max and may report false contention. And the "(__ticket_t)" typecast looks unnecessary, it only adds more confusuin, but this is cosmetic too. Done. @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want) * check again make sure it didn't become free while * we weren't looking. */ - if (ACCESS_ONCE(lock->tickets.head) == want) { + head = READ_ONCE(lock->tickets.head); + if (__tickets_equal(head, want)) { add_stats(TAKEN_SLOW_PICKUP, 1); goto out; This is off-topic, but with or without this change perhaps it makes sense to add smp_mb__after_atomic(). It is nop on x86, just to make this code more understandable for those (for me ;) who can never remember even the x86 rules. Hope you meant it for add_stat. yes smp_mb__after_atomic() would be harmless barrier() in x86. Did not add this V5 as yoiu though but this made me look at slowpath_enter code and added an explicit barrier() there :). ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V4] x86 spinlock: Fix memory corruption on completing completions
Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg): Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we also avoid the need to access both the head and tail tickets on unlock. 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus): We further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] [PeterZ: Detailed changelog] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- arch/x86/include/asm/spinlock.h | 91 - arch/x86/kernel/kvm.c | 10 +++-- arch/x86/xen/spinlock.c | 10 +++-- 3 files changed, 56 insertions(+), 55 deletions(-) potential TODO: * The whole patch be splitted into, 1. move slowpath flag 2. fix memory corruption in completion problem ?? Changes since V3: - Detailed changelog (PeterZ) - Replace ACCESS_ONCE with READ_ONCE (oleg) - Add xen changes (Oleg) - Correct break logic in unlock_wait() (Oleg) Changes since V2: - Move the slowpath flag to head, this enables xadd usage in unlock code and inturn we can get rid of read/write after unlock (Oleg) - usage of ticket_equals (Oleg) Changes since V1: - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: Jeremy). - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy) - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison. Note: The current implementation is still based on avoid writing after unlock. we could still have potential invalid memory read. (Sasha) Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) base = 3_19_rc7 3_19_rc7_spinfix_v3 +---+---+---++---+ kernbench (Time taken in sec lower is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 54.2300 3.0652 54.3008 4.0366-0.13056 2x 90.1883 5.5509 90.1650 6.4336 0.02583 +---+---+---++---+ +---+---+---++---+ dbench (Throughput higher is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 7029.9188 2.5952 6905.0712 4.4737-1.77595 2x 3254.207514.8291 3233.713726.8784-0.62976 +---+---+---++---+ (here is the result I got from the patches, I believe there may be some small overhead from xadd etc, but overall looks fine but a thorough test may be needed) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 625660f..646a1a3 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct static_key *key); static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) { - set_bit(0, (volatile unsigned long *)&lock->tickets.tail); + set_bit(0, (volatile unsigned long *)&lock->tickets.head); } #else /* !CONFIG_PARAVIRT_SPINLOCKS */ @@ -60,10 +60,30 @@ static inline void
Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions
On 02/12/2015 08:30 PM, Peter Zijlstra wrote: On Thu, Feb 12, 2015 at 05:17:27PM +0530, Raghavendra K T wrote: [...] Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg). 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus). Maybe spend a few more words explaining these things; something like: Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, as long as we clear it again on the first (try)lock -- this removes the write after unlock. Nit: I 'll reword a bit here since slowpath flag would result in unnecessary kick but otherwise harmless IMO. something like: Unlocked locks don't care about the slowpath flag; therefore we can keep it set after the last unlock, and clear it again on the first (try)lock. -- this removes the write after unlock. note that keeping slowpath flag would result in unnecessary kicks. By moving the slowpath flag from the tail to the head ticket we avoid the need to access both the head and tail tickets on unlock. We can further avoid the need for a read-after-release by using xadd; the prev head value will include the slowpath flag and indicate if we need to do PV kicking of suspended spinners -- on modern chips xadd isn't (much) more expensive than an add + load. Its 'obvious' now, but maybe not so much after we've all not looked at this for a few months. Absolutely correct. Thanks Peter for the detailed and very helpful writeup. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions
On 02/12/2015 07:32 PM, Oleg Nesterov wrote: Damn, sorry for noise, forgot to mention... On 02/12, Raghavendra K T wrote: +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, + __ticket_t head) +{ + if (head & TICKET_SLOWPATH_FLAG) { + arch_spinlock_t old, new; + + old.tickets.head = head; + new.tickets.head = head & ~TICKET_SLOWPATH_FLAG; + old.tickets.tail = new.tickets.head + TICKET_LOCK_INC; + new.tickets.tail = old.tickets.tail; + + /* try to clear slowpath flag when there are no contenders */ + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); + } +} ... +clear_slowpath: + if (TICKET_SLOWPATH_FLAG) + __ticket_check_and_clear_slowpath(lock, inc.head); I think you can remove this "if (TICKET_SLOWPATH_FLAG)" check. If it is defined as zero, gcc should optimize out the code under "if (head & 0)". right, the above if ( ) is unnecesary, though we would have same code at the end, getting rid of that makes code more clean. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions
On 02/12/2015 07:20 PM, Oleg Nesterov wrote: On 02/12, Raghavendra K T wrote: @@ -191,8 +189,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t *lock) * We need to check "unlocked" in a loop, tmp.head == head * can be false positive because of overflow. */ - if (tmp.head == (tmp.tail & ~TICKET_SLOWPATH_FLAG) || - tmp.head != head) + if (__tickets_equal(tmp.head, tmp.tail) || tmp.head != head) break; Ah, it seems that "tmp.head != head" should be turned into !__tickets_equal(), no? Suppose that TICKET_SLOWPATH_FLAG is set after the first ACCESS_ONCE(head), then tmp.head != head will be true before the first unlock we are waiting for. Good catch. othewise we would wrongly break out even when somebody does halt wait. And perhaps you can turn these ACCESS_ONCE into READ_ONCE as well. Yes again :) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions
On 02/12/2015 07:07 PM, Oleg Nesterov wrote: On 02/12, Raghavendra K T wrote: @@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want) * check again make sure it didn't become free while * we weren't looking. */ - if (ACCESS_ONCE(lock->tickets.head) == want) { + head = ACCESS_ONCE(lock->tickets.head); + if (__tickets_equal(head, want)) { add_stats(TAKEN_SLOW_PICKUP, 1); While at it, perhaps it makes sense to s/ACCESS_ONCE/READ_ONCE/ but this is cosmetic. yes.. will do. We also need to change another user of enter_slow_path, xen_lock_spinning() in arch/x86/xen/spinlock.c. Had in mind but forgot before sending. will update resend. Other than that looks correct at first glance... but this is up to maintainers. Thanks ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions
Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. So this patch implements the fix with: 1. Moving slowpath flag to head (Oleg). 2. use xadd to avoid read/write after unlock that checks the need for unlock_kick (Linus). Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) benchmark overcommit %improve kernbench 1x -0.13 kernbench 2x0.02 dbench 1x -1.77 dbench 2x -0.63 [Jeremy: hinted missing TICKET_LOCK_INC for kick] [Oleg: Moving slowpath flag to head, ticket_equals idea] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- arch/x86/include/asm/spinlock.h | 87 - arch/x86/kernel/kvm.c | 4 +- 2 files changed, 45 insertions(+), 46 deletions(-) potential TODO: * The whole patch be splitted into, 1. move slowpath flag 2. fix memory corruption in completion problem ?? * May be we could directly pass inc for __ticket_check_and_clear_slowpath but I hope current code is more readable. Changes since V2: - Move the slowpath flag to head, this enables xadd usage in unlock code and inturn we can get rid of read/write after unlock (Oleg) - usage of ticket_equals (Oleg) Changes since V1: - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: Jeremy). - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy) - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison. Note: The current implementation is still based on avoid writing after unlock. we could still have potential invalid memory read. (Sasha) Result: setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest) base = 3_19_rc7 3_19_rc7_spinfix_v3 +---+---+---++---+ kernbench (Time taken in sec lower is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 54.2300 3.0652 54.3008 4.0366-0.13056 2x 90.1883 5.5509 90.1650 6.4336 0.02583 +---+---+---++---+ +---+---+---++---+ dbench (Throughput higher is better) +---+---+---++---+ base %stdevpatched %stdev %improve +---+---+---++---+ 1x 7029.9188 2.5952 6905.0712 4.4737-1.77595 2x 3254.207514.8291 3233.713726.8784-0.62976 +---+---+---++---+ (here is the result I got from the patches, I believe there may be some small overhead from xadd etc, but overall looks fine but a thorough test may be needed) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 625660f..9697b45 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct static_key *key); static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) { - set_bit(0, (volatile unsigned long *)&lock->tickets.tail); + set_bit(0, (volatile unsigned long *)&lock->tickets.head); } #else /* !CONFIG_PARAVIRT_SPINLOCKS */ @@ -60,10 +60,30 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock, } #endif /* CONFIG_PARAVIRT_SPINLOCKS */ +static inline int __tickets_equal(__ticket_t one, __ticket_t two) +{ + return !((one ^ two) & ~TICKET_SLOWPATH_FLAG); +} + +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, + __ticket_t head) +{ + if (head & TICKET_SLOWPATH_FLAG) { + arch_spinlock_t old, new; + + old.tickets.head = head; + new.tickets.head = head & ~TICKET_SLOWPATH_FLAG; + old.tickets.tail = new.tickets.head + TICKET_LOCK_INC; + new.tickets.tail = old.tickets.tail; + + /* try to clear slowpath flag when there are no contender
Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/11/2015 11:08 PM, Oleg Nesterov wrote: On 02/11, Raghavendra K T wrote: On 02/10/2015 06:56 PM, Oleg Nesterov wrote: In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg the whole .head_tail. Plus obviously more boring changes. This needs a separate patch even _if_ this can work. Correct, but apart from this, before doing xadd in unlock, we would have to make sure lsb bit is cleared so that we can live with 1 bit overflow to tail which is unused. now either or both of head,tail lsb bit may be set after unlock. Sorry, can't understand... could you spell? If TICKET_SLOWPATH_FLAG lives in .head arch_spin_unlock() could simply do head = xadd(&lock->tickets.head, TICKET_LOCK_INC); if (head & TICKET_SLOWPATH_FLAG) __ticket_unlock_kick(head); so it can't overflow to .tail? You are right. I totally forgot we can get rid of tail operations :) And we we do this, probably it makes sense to add something like bool tickets_equal(__ticket_t one, __ticket_t two) { return (one ^ two) & ~TICKET_SLOWPATH_FLAG; } Very nice idea. I was tired of ~TICKET_SLOWPATH_FLAG usage all over in the current (complex :)) implementation. These two suggestions helps alot. and change kvm_lock_spinning() to use tickets_equal(tickets.head, want), plus it can have more users in asm/spinlock.h. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10/2015 06:56 PM, Oleg Nesterov wrote: On 02/10, Raghavendra K T wrote: On 02/10/2015 06:23 AM, Linus Torvalds wrote: add_smp(&lock->tickets.head, TICKET_LOCK_INC); if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. into something like val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT); if (unlikely(val & TICKET_SLOWPATH_FLAG)) ... would be the right thing to do. Somebody should just check that I got that shift right, and that the tail is in the high bytes (head really needs to be high to work, if it's in the low byte(s) the xadd would overflow from head into tail which would be wrong). Unfortunately xadd could result in head overflow as tail is high. The other option was repeated cmpxchg which is bad I believe. Any suggestions? Stupid question... what if we simply move SLOWPATH from .tail to .head? In this case arch_spin_unlock() could do xadd(tickets.head) and check the result It is a good idea. Trying this now. In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg the whole .head_tail. Plus obviously more boring changes. This needs a separate patch even _if_ this can work. Correct, but apart from this, before doing xadd in unlock, we would have to make sure lsb bit is cleared so that we can live with 1 bit overflow to tail which is unused. now either or both of head,tail lsb bit may be set after unlock. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/10/2015 06:23 AM, Linus Torvalds wrote: On Mon, Feb 9, 2015 at 4:02 AM, Peter Zijlstra wrote: On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote: So we have 3 choices, 1. xadd 2. continue with current approach. 3. a read before unlock and also after that. For the truly paranoid we have probe_kernel_address(), suppose the lock was in module space and the module just got unloaded under us. That's much too expensive. The xadd shouldn't be noticeably more expensive than the current "add_smp()". Yes, "lock xadd" used to be several cycles slower than just "lock add" on some early cores, but I think these days it's down to a single-cycle difference, which is not really different from doing a separate load after the add. The real problem with xadd used to be that we always had to do magic special-casing for i386, but that's one of the reasons we dropped support for original 80386. So I think Raghavendra's last version (which hopefully fixes the lockup problem that Sasha reported) together with changing that V2 did pass the stress, but getting confirmation Sasha would help. add_smp(&lock->tickets.head, TICKET_LOCK_INC); if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) .. into something like val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT); if (unlikely(val & TICKET_SLOWPATH_FLAG)) ... would be the right thing to do. Somebody should just check that I got that shift right, and that the tail is in the high bytes (head really needs to be high to work, if it's in the low byte(s) the xadd would overflow from head into tail which would be wrong). Unfortunately xadd could result in head overflow as tail is high. The other option was repeated cmpxchg which is bad I believe. Any suggestions? ( I have the V3 with Oleg's suggestion and performance numbers but without this getting resolved, It will be one unnecessary iteration). How about getting rid off SLOW_PATH_FLAG in spinlock (i.e. use it only as hint for paravirt), but do unlock_kick whenever we see that (tail-head) > TICKET_LOCK_INC?. (but this also may need cmpxchg in loop in unlock but we will be able to get rid of clear slowpath logic) Only problem is we may do unnecessary kicks even in 1x load. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V2] x86 spinlock: Fix memory corruption on completing completions
Ccing Davidlohr, (sorry that I got confused with similar address in cc list). On 02/09/2015 08:44 PM, Oleg Nesterov wrote: On 02/09, Raghavendra K T wrote: +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ + arch_spinlock_t old, new; + __ticket_t diff; + + old.tickets = READ_ONCE(lock->tickets); + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head; + + /* try to clear slowpath flag when there are no contenders */ + if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) && + (diff == TICKET_LOCK_INC)) { + new = old; + new.tickets.tail &= ~TICKET_SLOWPATH_FLAG; + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); + } +} Can't we simplify it? We own .head, and we already know it. We only need to clear TICKET_SLOWPATH_FLAG in .tail atomically? IOW, static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock, __ticket_t head) { __ticket_t old_tail, new_tail; new_tail = head + TICKET_LOCK_INC; old_tail = new_tail | TICKET_SLOWPATH_FLAG; if (READ_ONCE(lock->tickets.tail) == old_tail) cmpxchg(&lock->tickets.tail, old_tail, new_tail); } Plus - __ticket_check_and_clear_slowpath(lock); + __ticket_check_and_clear_slowpath(lock, inc.tail); Or I missed something? Thanks.. Perfect, 'll update with this change. (Jeremy had hinted similar). And I think it would be better to avoid ifdef(CONFIG_PARAVIRT_SPINLOCKS), ww can just do if (TICKET_SLOWPATH_FLAG) __ticket_check_and_clear_slowpath(); Okay. While at it, I think current arch_spin_unlock() has similar structure and wanted to clean it up. considering we define TICKET_SLOWPATH_FLAG 0 or 1, I think compiler would be smart enough to generate appropriate code and we could avoid #ifdef. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH V2] x86 spinlock: Fix memory corruption on completing completions
Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. However it brings additional case to be handled, viz., slowpath still could be set when somebody does arch_trylock. Handle that too by ignoring slowpath flag during lock availability check. [Jeremy: hinted missing TICKET_LOCK_INC for kick] Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- arch/x86/include/asm/spinlock.h | 70 - 1 file changed, 34 insertions(+), 36 deletions(-) Changes since V1: Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: Jeremy). Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy) clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison. Note: The current implementation is still based on avoid writing after unlock. we could still have potential invalid memory read. (Sasha) Patch has passed stress testing. diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 625660f..7fc50d7 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) set_bit(0, (volatile unsigned long *)&lock->tickets.tail); } +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ + arch_spinlock_t old, new; + __ticket_t diff; + + old.tickets = READ_ONCE(lock->tickets); + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head; + + /* try to clear slowpath flag when there are no contenders */ + if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) && + (diff == TICKET_LOCK_INC)) { + new = old; + new.tickets.tail &= ~TICKET_SLOWPATH_FLAG; + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); + } +} + #else /* !CONFIG_PARAVIRT_SPINLOCKS */ static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock, __ticket_t ticket) @@ -59,11 +76,15 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock, { } +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ +} + #endif /* CONFIG_PARAVIRT_SPINLOCKS */ static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) { - return lock.tickets.head == lock.tickets.tail; + return lock.tickets.head == (lock.tickets.tail & ~TICKET_SLOWPATH_FLAG); } /* @@ -98,7 +119,9 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock) } while (--count); __ticket_lock_spinning(lock, inc.tail); } -out: barrier(); /* make sure nothing creeps before the lock is taken */ + __ticket_check_and_clear_slowpath(lock); +out: + barrier(); /* make sure nothing creeps before the lock is taken */ } static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) @@ -115,47 +138,22 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail; } -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock, - arch_spinlock_t old) -{ - arch_spinlock_t new; - - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS); - - /* Perform the unlock on the "before" copy */ - old.tickets.head += TICKET_LOCK_INC; - - /* Clear the slowpath flag */ - new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT); - - /* -* If the lock is uncontended, clear the flag - use cmpxchg in -* case it changes behind our back though. -*/ - if (new.tickets.head != new.tickets.tail || - cmpxchg(&lock->head_tail, old.head_tail, - new.head_tail) != old.head_tail) { - /* -* Lock still has someone queued for it, so wake up an -* appropriate waiter. -*/ - __ticket_unlock_kick(lock, old.tic
Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/09/2015 05:32 PM, Peter Zijlstra wrote: On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote: So we have 3 choices, 1. xadd 2. continue with current approach. 3. a read before unlock and also after that. For the truly paranoid we have probe_kernel_address(), suppose the lock was in module space and the module just got unloaded under us. Thanks.. Good idea, How costly is it? atleast we could do probe_kernel_address() and check the value of slowpath flag if people as us to address invalid read problem. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/09/2015 02:44 AM, Jeremy Fitzhardinge wrote: On 02/06/2015 06:49 AM, Raghavendra K T wrote: [...] Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. Yep, that seems like a sound approach. Current approach seem to be working now. (though we could not avoid read). Related question: Do you think we could avoid SLOWPATH_FLAG itself by checking head and tail difference. or is it costly because it may result in unnecessary unlock_kicks? However it brings additional case to be handled, viz., slowpath still could be set when somebody does arch_trylock. Handle that too by ignoring slowpath flag during lock availability check. Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- arch/x86/include/asm/spinlock.h | 70 - 1 file changed, 34 insertions(+), 36 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 625660f..0829f86 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) set_bit(0, (volatile unsigned long *)&lock->tickets.tail); } +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ + arch_spinlock_t old, new; + __ticket_t diff; + + old.tickets = READ_ONCE(lock->tickets); Couldn't the caller pass in the lock state that it read rather than re-reading it? Yes we could. do you mean we could pass additional read value apart from lock, (because lock will be anyway needed for cmpxchg). +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ +} + #endif /* CONFIG_PARAVIRT_SPINLOCKS */ static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock) register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC }; inc = xadd(&lock->tickets, inc); - if (likely(inc.head == inc.tail)) + if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG))) good point, we can get rid of this as well. The intent of this conditional was to be the quickest possible path when taking a fastpath lock, with the code below being used for all slowpath locks (free or taken). So I don't think masking out SLOWPATH_FLAG is necessary here. goto out; inc.tail &= ~TICKET_SLOWPATH_FLAG; @@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock) } while (--count); __ticket_lock_spinning(lock, inc.tail); } -out: barrier(); /* make sure nothing creeps before the lock is taken */ +out: + __ticket_check_and_clear_slowpath(lock); + + barrier(); /* make sure nothing creeps before the lock is taken */ Which means that if "goto out" path is only ever used for fastpath locks, you can limit calling __ticket_check_and_clear_slowpath() to the slowpath case. Yes, I ll move that call up. } static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) @@ -115,47 +139,21 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail; } -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock, - arch_spinlock_t old) -{ - arch_spinlock_t new; - - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS); - - /* Perform the unlock on the "before" copy */ - old.tickets.head += TICKET_LOCK_INC; NB (see below) Thanks for pointing, this solved the hang issue. I missed this exact addition. - - /* Clear the slowpath flag */ - new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT); - - /* -* If the lock is uncontended, clear the flag - use cmpxchg in -* case it changes behind our back though. -*/ - if (new.tickets.head != new.tickets.tail || - cmpxchg(&lock->head_tail, old.head_tail, - new.head_tail) != old.head_tail) { - /* -* Lock still has someone queued for it, so wake up an -* appropriate waiter. -*/ - __ticket_unlock_kick(lock, old.tickets.head); - } -} - static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { if (TICKET_SLOWPATH_FLAG && - static_key_false(¶virt_ticketlocks_enabled)) { - arch_spinlock_t prev; + static_key_false(¶virt_ticketlocks_enabled)) { + __ticket_t prev_head; - prev = *lock; + prev_head = lock->tickets.head;
Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/07/2015 12:27 AM, Sasha Levin wrote: On 02/06/2015 09:49 AM, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. However it brings additional case to be handled, viz., slowpath still could be set when somebody does arch_trylock. Handle that too by ignoring slowpath flag during lock availability check. Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T With this patch, my VMs lock up quickly after boot with: Tried to reproduce the hang myself, and there seems to be still a barrier (or logic I miss). Looking closely below, unlock_kick got missed though we see that SLOWPATH_FLAG is still set: /me goes back to look closely (gdb) bt #0 native_halt () at ./arch/x86/include/asm/irqflags.h:55 #1 0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116 #2 kvm_lock_spinning (lock=0x88023ffe8240, want=52504) at arch/x86/kernel/kvm.c:786 #3 0x81037251 in __raw_callee_save_kvm_lock_spinning () #4 0x88023fc0edb0 in ?? () #5 0x in ?? () (gdb) p *(arch_spinlock_t *)0x88023ffe8240 $1 = {{head_tail = 3441806612, tickets = {head = 52500, tail = 52517}}} (gdb) t 2 [Switching to thread 2 (Thread 2)] #0 native_halt () at ./arch/x86/include/asm/irqflags.h:55 55 } (gdb) bt #0 native_halt () at ./arch/x86/include/asm/irqflags.h:55 #1 0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116 #2 kvm_lock_spinning (lock=0x88023ffe8240, want=52502) at arch/x86/kernel/kvm.c:786 #3 0x81037251 in __raw_callee_save_kvm_lock_spinning () #4 0x0246 in irq_stack_union () #5 0x00080750 in ?? () #6 0x0002 in ?? () #7 0x0004 in irq_stack_union () #8 0xcd16 in nmi_print_seq () Cannot access memory at address 0xbfc0 (gdb) t 3 [Switching to thread 3 (Thread 3)] #0 native_halt () at ./arch/x86/include/asm/irqflags.h:55 55 } (gdb) bt #0 native_halt () at ./arch/x86/include/asm/irqflags.h:55 #1 0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116 #2 kvm_lock_spinning (lock=0x88023ffe8240, want=52512) at arch/x86/kernel/kvm.c:786 #3 0x81037251 in __raw_callee_save_kvm_lock_spinning () #4 0x88023fc8edb0 in ?? () #5 0x in ?? () [...] //other threads with similar output (gdb) t 8 [Switching to thread 8 (Thread 8)] #0 native_halt () at ./arch/x86/include/asm/irqflags.h:55 55 } (gdb) bt #0 native_halt () at ./arch/x86/include/asm/irqflags.h:55 #1 0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116 #2 kvm_lock_spinning (lock=0x88023ffe8240, want=52500) at arch/x86/kernel/kvm.c:786 #3 0x81037251 in __raw_callee_save_kvm_lock_spinning () #4 0x88023fdcedb0 in ?? () #5 0x in ?? () ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
On 02/06/2015 09:55 PM, Linus Torvalds wrote: On Fri, Feb 6, 2015 at 6:49 AM, Raghavendra K T wrote: Paravirt spinlock clears slowpath flag after doing unlock. [ fix edited out ] So I'm not going to be applying this for 3.19, because it's much too late and the patch is too scary. Plus the bug probably effectively never shows up in real life (it is probably easy to trigger the speculative *read* but probably never the actual speculative write after dropping the lock last). Understood and agreed. This will need a lot of testing by the paravirt people - both performance and correctness. So *maybe* for 3.20, but maybe for even later, and then marked for stable, of course. Are there any good paravirt stress-tests that people could run for extended times? I have been running several benchmarks (kern, sys, hack, ebizzy etc in in 1x,2x scenarios. I run them for performance test as well. (In the current patch I did not get kvm hang in normal run, But overcommit reproduced it). ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions
Paravirt spinlock clears slowpath flag after doing unlock. As explained by Linus currently it does: prev = *lock; add_smp(&lock->tickets.head, TICKET_LOCK_INC); /* add_smp() is a full mb() */ if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG)) __ticket_unlock_slowpath(lock, prev); which is *exactly* the kind of things you cannot do with spinlocks, because after you've done the "add_smp()" and released the spinlock for the fast-path, you can't access the spinlock any more. Exactly because a fast-path lock might come in, and release the whole data structure. Linus suggested that we should not do any writes to lock after unlock(), and we can move slowpath clearing to fastpath lock. However it brings additional case to be handled, viz., slowpath still could be set when somebody does arch_trylock. Handle that too by ignoring slowpath flag during lock availability check. Reported-by: Sasha Levin Suggested-by: Linus Torvalds Signed-off-by: Raghavendra K T --- arch/x86/include/asm/spinlock.h | 70 - 1 file changed, 34 insertions(+), 36 deletions(-) diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h index 625660f..0829f86 100644 --- a/arch/x86/include/asm/spinlock.h +++ b/arch/x86/include/asm/spinlock.h @@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock) set_bit(0, (volatile unsigned long *)&lock->tickets.tail); } +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ + arch_spinlock_t old, new; + __ticket_t diff; + + old.tickets = READ_ONCE(lock->tickets); + diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head; + + /* try to clear slowpath flag when there are no contenders */ + if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) && + (diff == TICKET_LOCK_INC)) { + new = old; + new.tickets.tail &= ~TICKET_SLOWPATH_FLAG; + cmpxchg(&lock->head_tail, old.head_tail, new.head_tail); + } +} + #else /* !CONFIG_PARAVIRT_SPINLOCKS */ static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock, __ticket_t ticket) @@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock, { } +static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock) +{ +} + #endif /* CONFIG_PARAVIRT_SPINLOCKS */ static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock) @@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock) register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC }; inc = xadd(&lock->tickets, inc); - if (likely(inc.head == inc.tail)) + if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG))) goto out; inc.tail &= ~TICKET_SLOWPATH_FLAG; @@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock) } while (--count); __ticket_lock_spinning(lock, inc.tail); } -out: barrier(); /* make sure nothing creeps before the lock is taken */ +out: + __ticket_check_and_clear_slowpath(lock); + + barrier(); /* make sure nothing creeps before the lock is taken */ } static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) @@ -115,47 +139,21 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock) return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail; } -static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock, - arch_spinlock_t old) -{ - arch_spinlock_t new; - - BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS); - - /* Perform the unlock on the "before" copy */ - old.tickets.head += TICKET_LOCK_INC; - - /* Clear the slowpath flag */ - new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT); - - /* -* If the lock is uncontended, clear the flag - use cmpxchg in -* case it changes behind our back though. -*/ - if (new.tickets.head != new.tickets.tail || - cmpxchg(&lock->head_tail, old.head_tail, - new.head_tail) != old.head_tail) { - /* -* Lock still has someone queued for it, so wake up an -* appropriate waiter. -*/ - __ticket_unlock_kick(lock, old.tickets.head); - } -} - static __always_inline void arch_spin_unlock(arch_spinlock_t *lock) { if (TICKET_SLOWPATH_FLAG && - static_key_false(¶virt_ticketlocks_enabled)) { -
Re: [Xen-devel] [PATCH v14 08/11] qspinlock, x86: Rename paravirt_ticketlocks_enabled
On 01/21/2015 01:42 AM, Waiman Long wrote: This patch renames the paravirt_ticketlocks_enabled static key to a more generic paravirt_spinlocks_enabled name. Signed-off-by: Waiman Long Signed-off-by: Peter Zijlstra --- Reviewed-by: Raghavendra K T ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel