Re: [Xen-devel] [PATCH 0/9] qspinlock stuff -v15

2015-03-26 Thread Raghavendra K T

On 03/16/2015 06:46 PM, Peter Zijlstra wrote:

Hi Waiman,

As promised; here is the paravirt stuff I did during the trip to BOS last week.

All the !paravirt patches are more or less the same as before (the only real
change is the copyright lines in the first patch).

The paravirt stuff is 'simple' and KVM only -- the Xen code was a little more
convoluted and I've no real way to test that, but it should be straightforward
to make work.

I ran this using the virtme tool (thanks Andy) on my laptop with a 4x
overcommit on vcpus (16 vcpus as compared to the 4 my laptop actually has) and
it both booted and survived a hackbench run (perf bench sched messaging -g 20
-l 5000).

So while the paravirt code isn't the most optimal code ever conceived, it does
work.

Also, the paravirt patching includes replacing the call with "movb $0, %arg1"
for the native case, which should greatly reduce the cost of having
CONFIG_PARAVIRT_SPINLOCKS enabled on actual hardware.
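
(As an illustration of why that matters, here is a minimal user-space model of
what the native unlock amounts to -- the struct and function names below are
made up for the sketch, this is not the kernel code:

	#include <stdatomic.h>
	#include <stdint.h>

	struct qspinlock_model {              /* stand-in for struct qspinlock */
		_Atomic uint8_t locked;
	};

	static inline void native_unlock_model(struct qspinlock_model *l)
	{
		/* a single byte store with release ordering; on x86 this is
		 * just "movb $0, (%rdi)", which is what the pv call site gets
		 * patched into on bare metal */
		atomic_store_explicit(&l->locked, 0, memory_order_release);
	}

so once the call is patched out, the unlock costs a single store.)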

I feel that if someone were to do a Xen patch we can go ahead and merge this
stuff (finally!).

These patches do not implement the paravirt spinlock debug stats currently
implemented (separately) by KVM and Xen, but that should not be too hard to do
on top and in the 'generic' code -- no reason to duplicate all that.

Of course; once this lands people can look at improving the paravirt nonsense.



Last time I had reported some hangs in the KVM case, and I can confirm that
the current set of patches works fine.

Feel free to add
Tested-by: Raghavendra K T  #kvm pv

As far as performance is concerned (on my 16-core +HT machine with 16-vcpu
guests, both with and without the lfsr hash patchset), I do not have any
significant observations to report, though I understand that we could see
much more benefit with a larger number of vcpus because of a possible
reduction in cache bouncing.









Re: [Xen-devel] [PATCH 9/9] qspinlock, x86, kvm: Implement KVM support for paravirt qspinlock

2015-03-20 Thread Raghavendra K T

On 03/20/2015 02:38 AM, Waiman Long wrote:

On 03/19/2015 06:01 AM, Peter Zijlstra wrote:

[...]

You are probably right. The initial apply_paravirt() was done before the
SMP boot. Subsequent ones were at kernel module load time. I put a
counter in the __native_queue_spin_unlock() and it registered 26949
unlock calls in a 16-cpu guest before it got patched out.


Because even printks take the lock.
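
(For reference, the kind of counter being described is roughly this -- a
sketch of an assumed debug hack, not the actual patch:

	static atomic_t native_unlock_calls = ATOMIC_INIT(0);

	static void __native_queue_spin_unlock(struct qspinlock *lock)
	{
		atomic_inc(&native_unlock_calls); /* dumped later, e.g. via debugfs */
		smp_store_release((u8 *)lock, 0); /* the normal native unlock */
	}
)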




Re: [Xen-devel] [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T

On 02/24/2015 08:50 PM, Greg KH wrote:

On Tue, Feb 24, 2015 at 03:47:37PM +0100, Ingo Molnar wrote:


* Greg KH  wrote:


On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark   overcommit   %improve
  kernbench   1x            -0.13
  kernbench   2x             0.02
  dbench      1x            -1.77
  dbench      2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
Reviewed-by: Oleg Nesterov 
Acked-by: David Vrabel 
---
  arch/x86/include/asm/spinlock.h | 94 -
  arch/x86/kernel/kvm.c   |  7 ++-
  arch/x86/xen/spinlock.c |  7 ++-
  3 files changed, 58 insertions(+), 50 deletions(-)

Changes for stable:
   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
     compiler warnings (Linus, David Vrabel, PeterZ, Ingo)


What is the git commit id of this in Linus's tree?  What
stable tree(s) do you want this applied to?


It's:

  d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock

You'll also need this fix from Linus to avoid (harmless)
build warnings:

  dd36929720f4 kernel: make READ_ONCE() valid on const arguments


Great.  But what stable kernel trees should it be backported to?  Just
3.19?  Or anything older?


My patch was intended only for 3.19.

Though the paravirt changes went in with 3.12, the problem manifested
clearly after some of the completion-related changes, but I leave that
decision to the experts here. (I'll send the necessary changes if the patch
is needed for older versions, because it may not apply cleanly.)





Re: [Xen-devel] [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T

On 02/24/2015 08:17 PM, Ingo Molnar wrote:


* Greg KH  wrote:


On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark   overcommit   %improve
  kernbench   1x            -0.13
  kernbench   2x             0.02
  dbench      1x            -1.77
  dbench      2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
Reviewed-by: Oleg Nesterov 
Acked-by: David Vrabel 
---
  arch/x86/include/asm/spinlock.h | 94 -
  arch/x86/kernel/kvm.c   |  7 ++-
  arch/x86/xen/spinlock.c |  7 ++-
  3 files changed, 58 insertions(+), 50 deletions(-)

Changes for stable:
   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
     compiler warnings (Linus, David Vrabel, PeterZ, Ingo)


What is the git commit id of this in Linus's tree?  What
stable tree(s) do you want this applied to?


It's:

  d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock


Yes, this is the original patch. Please note that I have taken out the
READ_ONCE changes from the original patch to avoid the build warnings
mentioned below.
(Those READ_ONCE changes were cosmetic and were not present in the
previous versions.)



You'll also need this fix from Linus to avoid (harmless)
build warnings:

  dd36929720f4 kernel: make READ_ONCE() valid on const arguments


So this may not be absolutely necessary with the current patch.




Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-17 Thread Raghavendra K T

On 02/16/2015 10:17 PM, David Vrabel wrote:

On 15/02/15 17:30, Raghavendra K T wrote:

--- a/arch/x86/xen/spinlock.c
+++ b/arch/x86/xen/spinlock.c
@@ -41,7 +41,7 @@ static u8 zero_stats;
  static inline void check_zero(void)
  {
u8 ret;
-   u8 old = ACCESS_ONCE(zero_stats);
+   u8 old = READ_ONCE(zero_stats);
if (unlikely(old)) {
ret = cmpxchg(&zero_stats, old, 0);
/* This ensures only one fellow resets the stat */
@@ -112,6 +112,7 @@ __visible void xen_lock_spinning(struct arch_spinlock 
*lock, __ticket_t want)
struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting);
int cpu = smp_processor_id();
u64 start;
+   __ticket_t head;
unsigned long flags;

/* If kicker interrupts not initialized yet, just spin */
@@ -159,11 +160,15 @@ __visible void xen_lock_spinning(struct arch_spinlock 
*lock, __ticket_t want)
 */
__ticket_enter_slowpath(lock);

+   /* make sure enter_slowpath, which is atomic does not cross the read */
+   smp_mb__after_atomic();
+
/*
 * check again make sure it didn't become free while
 * we weren't looking
 */
-   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   head = READ_ONCE(lock->tickets.head);
+   if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;
}
@@ -204,8 +209,8 @@ static void xen_unlock_kick(struct arch_spinlock *lock, 
__ticket_t next)
const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);

/* Make sure we read lock before want */
-   if (ACCESS_ONCE(w->lock) == lock &&
-   ACCESS_ONCE(w->want) == next) {
+   if (READ_ONCE(w->lock) == lock &&
+   READ_ONCE(w->want) == next) {
add_stats(RELEASED_SLOW_KICKED, 1);
xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
break;


Acked-by: David Vrabel 

Although some of the ACCESS_ONCE to READ_ONCE changes are cosmetic and
are perhaps best left out of a patch destined for stable.



Thanks.
Yes, I will send out a separate patch for -stable without the READ_ONCE
changes once this patch goes in.








Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Raghavendra K T

On 02/15/2015 09:47 PM, Oleg Nesterov wrote:

Well, I regret I mentioned the lack of barrier after enter_slowpath ;)

On 02/15, Raghavendra K T wrote:


@@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct 
static_key *key);

  static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
  {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
+   barrier();
  }


Because this barrier() looks really confusing.

Firstly, it is equally unneeded on x86. At the same time, it cannot help.
We need a memory barrier() between set_bit(SLOWPATH) and READ_ONCE(head)
to avoid the race with spin_unlock().

So I think you should replace it with smp_mb__after_atomic() or remove it.



I resent the patch with the above change.
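
(So the slow path in the resent version now reads, roughly:

	__ticket_enter_slowpath(lock);

	/* make sure enter_slowpath, which is atomic, does not cross the read */
	smp_mb__after_atomic();

	/*
	 * check again to make sure it didn't become free while
	 * we weren't looking
	 */
	head = READ_ONCE(lock->tickets.head);
	if (__tickets_equal(head, want)) {
		add_stats(TAKEN_SLOW_PICKUP, 1);
		goto out;
	}
)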



Other than that I believe this version is correct. So I won't insist, this
is cosmetic after all.


Thanks Oleg.




Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-15 Thread Raghavendra K T
* Raghavendra K T  [2015-02-15 11:25:44]:

Resending V5 with the smp_mb__after_atomic() change, without bumping up the
revision

---8<---

From 0b9ecde30e3bf5b5b24009fd2ac5fc7ac4b81158 Mon Sep 17 00:00:00 2001
From: Raghavendra K T 
Date: Fri, 6 Feb 2015 16:44:11 +0530
Subject: [PATCH RESEND V5] x86 spinlock: Fix memory corruption on completing
 completions

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark   overcommit   %improve
  kernbench   1x            -0.13
  kernbench   2x             0.02
  dbench      1x            -1.77
  dbench      2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h | 95 -
 arch/x86/kernel/kvm.c   | 10 +++--
 arch/x86/xen/spinlock.c | 10 +++--
 3 files changed, 59 insertions(+), 56 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem??

Changes since V4:
  - one more usage of tickets_equal() (Oleg)
  - head > tail situation can lead to false contended check (Oleg)
  - smp_mb__after_atomic() added after slowpath_enter (Oleg)

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7, patched = 3_19_rc7_spinfix_v3

kernbench (time taken in sec, lower is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x |   54.2300 |  3.0652 |   54.3008 |  4.0366 |  -0.13056 |
| 2x |   90.1883 |  5.5509 |   90.1650 |  6.4336 |   0.02583 |
+----+-----------+---------+-----------+---------+-----------+

dbench (throughput, higher is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x | 7029.9188 |  2.5952 | 6905.0712 |  4.4737 |  -1.77595 |
| 2x | 3254.2075 | 14.8291 | 3233.7137 | 26.8784 |  -0.62976 |
+----+-----------+---------+-----------+---------+-----------+

 (These are the results I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine;
 a more thorough test may be needed.)


diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..cf87de3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arc

Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-14 Thread Raghavendra K T

On 02/15/2015 11:25 AM, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark   overcommit   %improve
  kernbench   1x            -0.13
  kernbench   2x             0.02
  dbench      1x            -1.77
  dbench      2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---


Sasha, I hope this addresses the invalid-read concern you had with the latest
xadd-based implementation.

(I think we need to test without Oleg's patch "sched/completion:
completion_done() should serialize with complete()" for the issue reported by Paul.)





[Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-14 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark   overcommit   %improve
  kernbench   1x            -0.13
  kernbench   2x             0.02
  dbench      1x            -1.77
  dbench      2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h | 95 -
 arch/x86/kernel/kvm.c   | 10 +++--
 arch/x86/xen/spinlock.c | 10 +++--
 3 files changed, 59 insertions(+), 56 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem??

Changes since V4:
  - one more usage of tickets_equal() (Oleg)
  - head > tail situation can lead to false contended check (Oleg)

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7, patched = 3_19_rc7_spinfix_v3

kernbench (time taken in sec, lower is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x |   54.2300 |  3.0652 |   54.3008 |  4.0366 |  -0.13056 |
| 2x |   90.1883 |  5.5509 |   90.1650 |  6.4336 |   0.02583 |
+----+-----------+---------+-----------+---------+-----------+

dbench (throughput, higher is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x | 7029.9188 |  2.5952 | 6905.0712 |  4.4737 |  -1.77595 |
| 2x | 3254.2075 | 14.8291 | 3233.7137 | 26.8784 |  -0.62976 |
+----+-----------+---------+-----------+---------+-----------+

 (These are the results I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine;
 a more thorough test may be needed.)


diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..4413315 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,8 @@ static __always_inline bool static_key_false(struct 
static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
+   barrier();
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +61,30 @@ static inline v

Re: [Xen-devel] [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-14 Thread Raghavendra K T

On 02/13/2015 09:02 PM, Oleg Nesterov wrote:

On 02/13, Raghavendra K T wrote:


@@ -164,7 +161,7 @@ static inline int arch_spin_is_locked(arch_spinlock_t *lock)
  {
struct __raw_tickets tmp = READ_ONCE(lock->tickets);

-   return tmp.tail != tmp.head;
+   return tmp.tail != (tmp.head & ~TICKET_SLOWPATH_FLAG);
  }


Well, this can probably use __tickets_equal() too. But this is cosmetic.


That looks good. Added.


It seems that arch_spin_is_contended() should be fixed with this change,

(__ticket_t)(tmp.tail - tmp.head) > TICKET_LOCK_INC

can be true because of TICKET_SLOWPATH_FLAG in .head, even if it is actually
unlocked.



Done.
Hmm, that was because I was still under the impression that the slowpath bit
is in the tail. You are right, the situation could wrap to a huge positive
value and may report false contention.

And the "(__ticket_t)" typecast looks unnecessary, it only adds more

confusion, but this is cosmetic too.



Done.


@@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock 
*lock, __ticket_t want)
 * check again make sure it didn't become free while
 * we weren't looking.
 */
-   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   head = READ_ONCE(lock->tickets.head);
+   if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);
goto out;


This is off-topic, but with or without this change perhaps it makes sense
to add smp_mb__after_atomic(). It is nop on x86, just to make this code
more understandable for those (for me ;) who can never remember even the
x86 rules.



I hope you meant it for add_stats. Yes, smp_mb__after_atomic() would be a
harmless barrier() on x86. I did not add it in this V5 as you suggested, but
it made me look at the slowpath_enter code and add an explicit barrier()
there :).






[Xen-devel] [PATCH V4] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark   overcommit   %improve
  kernbench   1x            -0.13
  kernbench   2x             0.02
  dbench      1x            -1.77
  dbench      2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h | 91 -
 arch/x86/kernel/kvm.c   | 10 +++--
 arch/x86/xen/spinlock.c | 10 +++--
 3 files changed, 56 insertions(+), 55 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem??

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after unlock;
  we could still have a potential invalid memory read. (Sasha)

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
base = 3_19_rc7, patched = 3_19_rc7_spinfix_v3

kernbench (time taken in sec, lower is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x |   54.2300 |  3.0652 |   54.3008 |  4.0366 |  -0.13056 |
| 2x |   90.1883 |  5.5509 |   90.1650 |  6.4336 |   0.02583 |
+----+-----------+---------+-----------+---------+-----------+

dbench (throughput, higher is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x | 7029.9188 |  2.5952 | 6905.0712 |  4.4737 |  -1.77595 |
| 2x | 3254.2075 | 14.8291 | 3233.7137 | 26.8784 |  -0.62976 |
+----+-----------+---------+-----------+---------+-----------+

 (These are the results I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine;
 a more thorough test may be needed.)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..646a1a3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct 
static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline void

Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 08:30 PM, Peter Zijlstra wrote:

On Thu, Feb 12, 2015 at 05:17:27PM +0530, Raghavendra K T wrote:

[...]


Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg).
2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus).


Maybe spend a few more words explaining these things; something like:

Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, as long as we clear it again on the first
(try)lock -- this removes the write after unlock.


Nit: I'll reword a bit here, since keeping the slowpath flag would result in
an unnecessary kick but is otherwise harmless IMO.
Something like:
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag 
would result in unnecessary kicks.



By moving the slowpath flag from the tail to the head ticket we avoid
the need to access both the head and tail tickets on unlock.

We can further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

It's 'obvious' now, but maybe not so much after we've all not looked at
this for a few months.



Absolutely correct. Thanks Peter for the detailed and very helpful
writeup.




Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:32 PM, Oleg Nesterov wrote:

Damn, sorry for noise, forgot to mention...

On 02/12, Raghavendra K T wrote:


+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
+   __ticket_t head)
+{
+   if (head & TICKET_SLOWPATH_FLAG) {
+   arch_spinlock_t old, new;
+
+   old.tickets.head = head;
+   new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
+   old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
+   new.tickets.tail = old.tickets.tail;
+
+   /* try to clear slowpath flag when there are no contenders */
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}


...


+clear_slowpath:
+   if (TICKET_SLOWPATH_FLAG)
+   __ticket_check_and_clear_slowpath(lock, inc.head);


I think you can remove this "if (TICKET_SLOWPATH_FLAG)" check. If it is
defined as zero, gcc should optimize out the code under "if (head & 0)".



Right, the above if () is unnecessary; though we would end up with the same
code, getting rid of it makes the code cleaner.




Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:20 PM, Oleg Nesterov wrote:

On 02/12, Raghavendra K T wrote:


@@ -191,8 +189,7 @@ static inline void arch_spin_unlock_wait(arch_spinlock_t 
*lock)
 * We need to check "unlocked" in a loop, tmp.head == head
 * can be false positive because of overflow.
 */
-   if (tmp.head == (tmp.tail & ~TICKET_SLOWPATH_FLAG) ||
-   tmp.head != head)
+   if (__tickets_equal(tmp.head, tmp.tail) || tmp.head != head)
break;


Ah, it seems that "tmp.head != head" should be turned into
!__tickets_equal(), no?

Suppose that TICKET_SLOWPATH_FLAG is set after the first ACCESS_ONCE(head),
then tmp.head != head will be true before the first unlock we are waiting
for.


Good catch. Otherwise we would wrongly break out even when somebody is
doing a halt wait.
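
So the check would become something like (sketch):

	if (__tickets_equal(tmp.head, tmp.tail) ||
	    !__tickets_equal(tmp.head, head))
		break;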



And perhaps you can turn these ACCESS_ONCE into READ_ONCE as well.


Yes again :)




Re: [Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T

On 02/12/2015 07:07 PM, Oleg Nesterov wrote:

On 02/12, Raghavendra K T wrote:


@@ -772,7 +773,8 @@ __visible void kvm_lock_spinning(struct arch_spinlock 
*lock, __ticket_t want)
 * check again make sure it didn't become free while
 * we weren't looking.
 */
-   if (ACCESS_ONCE(lock->tickets.head) == want) {
+   head = ACCESS_ONCE(lock->tickets.head);
+   if (__tickets_equal(head, want)) {
add_stats(TAKEN_SLOW_PICKUP, 1);


While at it, perhaps it makes sense to s/ACCESS_ONCE/READ_ONCE/ but this
is cosmetic.


yes.. will do.



We also need to change another user of enter_slow_path, xen_lock_spinning()
in arch/x86/xen/spinlock.c.



I had it in mind but forgot before sending. Will update and resend.


Other than that looks correct at first glance... but this is up to
maintainers.



Thanks




[Xen-devel] [PATCH V3] x86 spinlock: Fix memory corruption on completing completions

2015-02-12 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg).
2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus).

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
  benchmark   overcommit   %improve
  kernbench   1x            -0.13
  kernbench   2x             0.02
  dbench      1x            -1.77
  dbench      2x            -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h | 87 -
 arch/x86/kernel/kvm.c   |  4 +-
 2 files changed, 45 insertions(+), 46 deletions(-)

potential TODO:
 * The whole patch could be split into: 1. move slowpath flag,
   2. fix memory corruption in completion problem??
 * Maybe we could directly pass inc to __ticket_check_and_clear_slowpath,
   but I hope the current code is more readable.

Changes since V2:
  - Move the slowpath flag to head; this enables xadd usage in the unlock code
    and in turn we can get rid of the read/write after unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after unlock;
  we could still have a potential invalid memory read. (Sasha)

 Result: 
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)

base = 3_19_rc7, patched = 3_19_rc7_spinfix_v3

kernbench (time taken in sec, lower is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x |   54.2300 |  3.0652 |   54.3008 |  4.0366 |  -0.13056 |
| 2x |   90.1883 |  5.5509 |   90.1650 |  6.4336 |   0.02583 |
+----+-----------+---------+-----------+---------+-----------+

dbench (throughput, higher is better)
+----+-----------+---------+-----------+---------+-----------+
|    |      base |  %stdev |   patched |  %stdev |  %improve |
+----+-----------+---------+-----------+---------+-----------+
| 1x | 7029.9188 |  2.5952 | 6905.0712 |  4.4737 |  -1.77595 |
| 2x | 3254.2075 | 14.8291 | 3233.7137 | 26.8784 |  -0.62976 |
+----+-----------+---------+-----------+---------+-----------+

 (These are the results I got from the patches. I believe there may
 be some small overhead from xadd etc., but overall it looks fine;
 a more thorough test may be needed.)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..9697b45 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct 
static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
*lock,
 }
 
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
+static inline int  __tickets_equal(__ticket_t one, __ticket_t two)
+{
+   return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
+}
+
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock,
+   __ticket_t head)
+{
+   if (head & TICKET_SLOWPATH_FLAG) {
+   arch_spinlock_t old, new;
+
+   old.tickets.head = head;
+   new.tickets.head = head & ~TICKET_SLOWPATH_FLAG;
+   old.tickets.tail = new.tickets.head + TICKET_LOCK_INC;
+   new.tickets.tail = old.tickets.tail;
+
+   /* try to clear slowpath flag when there are no contender

Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Raghavendra K T

On 02/11/2015 11:08 PM, Oleg Nesterov wrote:

On 02/11, Raghavendra K T wrote:


On 02/10/2015 06:56 PM, Oleg Nesterov wrote:


In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
the whole .head_tail. Plus obviously more boring changes. This needs a
separate patch even _if_ this can work.


Correct, but apart from this, before doing the xadd in unlock we would have
to make sure the lsb bit is cleared, so that we can live with a 1-bit
overflow into the tail, which is unused. Right now either or both of the
head and tail lsb bits may be set after unlock.


Sorry, can't understand... could you spell?

If TICKET_SLOWPATH_FLAG lives in .head arch_spin_unlock() could simply do

head = xadd(&lock->tickets.head, TICKET_LOCK_INC);

if (head & TICKET_SLOWPATH_FLAG)
__ticket_unlock_kick(head);

so it can't overflow to .tail?



You are right.
I totally forgot we can get rid of tail operations :)
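
(Just to convince myself that the previous head carries the flag, a quick
user-space model of that unlock -- simplified types and made-up names, not
the kernel code:

	#include <stdatomic.h>
	#include <stdint.h>
	#include <stdio.h>

	#define TICKET_LOCK_INC       2   /* tickets advance in steps of 2 ... */
	#define TICKET_SLOWPATH_FLAG  1   /* ... so bit 0 is free for the flag */

	struct tickets_model {
		_Atomic uint16_t head;    /* slowpath flag now lives here */
		uint16_t tail;
	};

	static void unlock_model(struct tickets_model *t)
	{
		/* xadd releases the lock and hands back the previous head,
		 * flag included -- no read of the lock after the release */
		uint16_t prev = atomic_fetch_add(&t->head, TICKET_LOCK_INC);

		if (prev & TICKET_SLOWPATH_FLAG)
			printf("would __ticket_unlock_kick() here\n");
	}
)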



And if we do this, it probably makes sense to add something like

bool tickets_equal(__ticket_t one, __ticket_t two)
{
return (one ^ two) & ~TICKET_SLOWPATH_FLAG;
}



Very nice idea. I was tired of the ~TICKET_SLOWPATH_FLAG usage all over the
current (complex :)) implementation. These two suggestions help a lot.
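
(I will invert that test so the helper reads as a true equality check,
something like:

	static inline int __tickets_equal(__ticket_t one, __ticket_t two)
	{
		return !((one ^ two) & ~TICKET_SLOWPATH_FLAG);
	}
)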


and change kvm_lock_spinning() to use tickets_equal(tickets.head, want), plus
it can have more users in asm/spinlock.h.





Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-11 Thread Raghavendra K T

On 02/10/2015 06:56 PM, Oleg Nesterov wrote:

On 02/10, Raghavendra K T wrote:


On 02/10/2015 06:23 AM, Linus Torvalds wrote:


  add_smp(&lock->tickets.head, TICKET_LOCK_INC);
  if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..

into something like

  val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
  if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...

would be the right thing to do. Somebody should just check that I got
that shift right, and that the tail is in the high bytes (head really
needs to be high to work, if it's in the low byte(s) the xadd would
overflow from head into tail which would be wrong).


Unfortunately xadd could result in head overflow as tail is high.

The other option was repeated cmpxchg which is bad I believe.
Any suggestions?


Stupid question... what if we simply move SLOWPATH from .tail to .head?
In this case arch_spin_unlock() could do xadd(tickets.head) and check
the result


It is a good idea. Trying this now.


In this case __ticket_check_and_clear_slowpath() really needs to cmpxchg
the whole .head_tail. Plus obviously more boring changes. This needs a
separate patch even _if_ this can work.


Correct, but apart from this, before doing the xadd in unlock we would have
to make sure the lsb bit is cleared, so that we can live with a 1-bit
overflow into the tail, which is unused. Right now either or both of the
head and tail lsb bits may be set after unlock.




Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-10 Thread Raghavendra K T

On 02/10/2015 06:23 AM, Linus Torvalds wrote:

On Mon, Feb 9, 2015 at 4:02 AM, Peter Zijlstra  wrote:

On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote:

So we have 3 choices,
1. xadd
2. continue with current approach.
3. a read before unlock and also after that.


For the truly paranoid we have probe_kernel_address(), suppose the lock
was in module space and the module just got unloaded under us.


That's much too expensive.

The xadd shouldn't be noticeably more expensive than the current
"add_smp()". Yes, "lock xadd" used to be several cycles slower than
just "lock add" on some early cores, but I think these days it's down
to a single-cycle difference, which is not really different from doing
a separate load after the add.

The real problem with xadd used to be that we always had to do magic
special-casing for i386, but that's one of the reasons we dropped
support for original 80386.

So I think Raghavendra's last version (which hopefully fixes the
lockup problem that Sasha reported) together with changing that


V2 did pass the stress test, but getting confirmation from Sasha would help.



 add_smp(&lock->tickets.head, TICKET_LOCK_INC);
 if (READ_ONCE(lock->tickets.tail) & TICKET_SLOWPATH_FLAG) ..

into something like

 val = xadd((&lock->ticket.head_tail, TICKET_LOCK_INC << TICKET_SHIFT);
 if (unlikely(val & TICKET_SLOWPATH_FLAG)) ...

would be the right thing to do. Somebody should just check that I got
that shift right, and that the tail is in the high bytes (head really
needs to be high to work, if it's in the low byte(s) the xadd would
overflow from head into tail which would be wrong).


Unfortunately xadd could result in head overflowing into tail, as tail is in
the high bytes.

The other option is a repeated cmpxchg, which is bad I believe.
Any suggestions?

(I have the V3 with Oleg's suggestion and the performance numbers, but
without this getting resolved it will be one more unnecessary iteration.)

How about getting rid of SLOWPATH_FLAG in the spinlock (i.e. use it only
as a hint for paravirt), but doing unlock_kick whenever we see that
(tail - head) > TICKET_LOCK_INC? (This may also need a cmpxchg loop in
unlock, but we would be able to get rid of the clear-slowpath logic.)

The only problem is that we may do unnecessary kicks even under 1x load.











Re: [Xen-devel] [PATCH V2] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T

Ccing Davidlohr (sorry, I got confused by a similar address in the cc
list).

On 02/09/2015 08:44 PM, Oleg Nesterov wrote:

On 02/09, Raghavendra K T wrote:


+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);
+   diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
+
+   /* try to clear slowpath flag when there are no contenders */
+   if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
+   (diff == TICKET_LOCK_INC)) {
+   new = old;
+   new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}


Can't we simplify it? We own .head, and we already know it. We only need
to clear TICKET_SLOWPATH_FLAG in .tail atomically?

IOW,

static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t 
*lock, __ticket_t head)
{
__ticket_t old_tail, new_tail;

new_tail = head + TICKET_LOCK_INC;
old_tail = new_tail | TICKET_SLOWPATH_FLAG;

if (READ_ONCE(lock->tickets.tail) == old_tail)
cmpxchg(&lock->tickets.tail, old_tail, new_tail);
}

Plus

-   __ticket_check_and_clear_slowpath(lock);
+   __ticket_check_and_clear_slowpath(lock, inc.tail);

Or I missed something?


Thanks. Perfect, I'll update with this change. (Jeremy had hinted at
something similar.)



And I think it would be better to avoid ifdef(CONFIG_PARAVIRT_SPINLOCKS);
we can just do

if (TICKET_SLOWPATH_FLAG)
__ticket_check_and_clear_slowpath();



Okay.
While at it, I think the current arch_spin_unlock() has a similar structure
and I wanted to clean it up. Considering we define TICKET_SLOWPATH_FLAG as
0 or 1, I think the compiler would be smart enough to generate appropriate
code and we could avoid the #ifdef.
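
i.e. relying on the compile-time constant the same way arch_spin_unlock()
already does (sketch only, body elided):

	if (TICKET_SLOWPATH_FLAG &&
	    static_key_false(&paravirt_ticketlocks_enabled)) {
		/* pv unlock slowpath; the whole block is compiled
		 * out when TICKET_SLOWPATH_FLAG is 0 */
	}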




[Xen-devel] [PATCH V2] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.

[Jeremy: hinted missing TICKET_LOCK_INC for kick]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h | 70 -
 1 file changed, 34 insertions(+), 36 deletions(-)


Changes since V1:
  
  Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.
 Note: The current implementation is still based on avoiding writes after unlock;
  we could still have a potential invalid memory read. (Sasha)

 Patch has passed stress testing.

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..7fc50d7 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t 
*lock)
set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);
+   diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
+
+   /* try to clear slowpath flag when there are no contenders */
+   if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
+   (diff == TICKET_LOCK_INC)) {
+   new = old;
+   new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}
+
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
 static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
__ticket_t ticket)
@@ -59,11 +76,15 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
*lock,
 {
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+}
+
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
 {
-   return lock.tickets.head == lock.tickets.tail;
+   return lock.tickets.head == (lock.tickets.tail & ~TICKET_SLOWPATH_FLAG);
 }
 
 /*
@@ -98,7 +119,9 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
*lock)
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-out:   barrier();  /* make sure nothing creeps before the lock is taken */
+   __ticket_check_and_clear_slowpath(lock);
+out:
+   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -115,47 +138,22 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
 }
 
-static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
-   arch_spinlock_t old)
-{
-   arch_spinlock_t new;
-
-   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
-
-   /* Perform the unlock on the "before" copy */
-   old.tickets.head += TICKET_LOCK_INC;
-
-   /* Clear the slowpath flag */
-   new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
-
-   /*
-* If the lock is uncontended, clear the flag - use cmpxchg in
-* case it changes behind our back though.
-*/
-   if (new.tickets.head != new.tickets.tail ||
-   cmpxchg(&lock->head_tail, old.head_tail,
-   new.head_tail) != old.head_tail) {
-   /*
-* Lock still has someone queued for it, so wake up an
-* appropriate waiter.
-*/
-   __ticket_unlock_kick(lock, old.tic

Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T

On 02/09/2015 05:32 PM, Peter Zijlstra wrote:

On Mon, Feb 09, 2015 at 03:04:22PM +0530, Raghavendra K T wrote:

So we have 3 choices,
1. xadd
2. continue with current approach.
3. a read before unlock and also after that.


For the truly paranoid we have probe_kernel_address(), suppose the lock
was in module space and the module just got unloaded under us.



Thanks, good idea. How costly is it?
At least we could do probe_kernel_address() and check the value of the
slowpath flag if people ask us to address the invalid-read problem.




Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-09 Thread Raghavendra K T

On 02/09/2015 02:44 AM, Jeremy Fitzhardinge wrote:

On 02/06/2015 06:49 AM, Raghavendra K T wrote:

[...]



Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.


Yep, that seems like a sound approach.


The current approach seems to be working now (though we could not avoid the
read). Related question: do you think we could avoid SLOWPATH_FLAG itself by
checking the head/tail difference, or is that costly because it may result
in unnecessary unlock_kicks?


However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
  arch/x86/include/asm/spinlock.h | 70 -
  1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..0829f86 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t 
*lock)
set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
  }

+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);


Couldn't the caller pass in the lock state that it read rather than
re-reading it?



Yes, we could. Do you mean we could pass in the value we already read, in
addition to the lock (because the lock will be needed for the cmpxchg anyway)?




+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+}
+
  #endif /* CONFIG_PARAVIRT_SPINLOCKS */

  static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
@@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
*lock)
register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };

inc = xadd(&lock->tickets, inc);
-   if (likely(inc.head == inc.tail))
+   if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))




Good point, we can get rid of this as well.


The intent of this conditional was to be the quickest possible path when
taking a fastpath lock, with the code below being used for all slowpath
locks (free or taken). So I don't think masking out SLOWPATH_FLAG is
necessary here.


goto out;

inc.tail &= ~TICKET_SLOWPATH_FLAG;
@@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t 
*lock)
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-out:   barrier();  /* make sure nothing creeps before the lock is taken */
+out:
+   __ticket_check_and_clear_slowpath(lock);
+
+   barrier();  /* make sure nothing creeps before the lock is taken */


Which means that if "goto out" path is only ever used for fastpath
locks, you can limit calling __ticket_check_and_clear_slowpath() to the
slowpath case.



Yes, I'll move that call up.


  }

  static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -115,47 +139,21 @@ static __always_inline int 
arch_spin_trylock(arch_spinlock_t *lock)
return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == 
old.head_tail;
  }

-static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
-   arch_spinlock_t old)
-{
-   arch_spinlock_t new;
-
-   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
-
-   /* Perform the unlock on the "before" copy */
-   old.tickets.head += TICKET_LOCK_INC;


NB (see below)


Thanks for pointing this out; it solved the hang issue. I had
missed this exact addition.




-
-   /* Clear the slowpath flag */
-   new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
-
-   /*
-* If the lock is uncontended, clear the flag - use cmpxchg in
-* case it changes behind our back though.
-*/
-   if (new.tickets.head != new.tickets.tail ||
-   cmpxchg(&lock->head_tail, old.head_tail,
-   new.head_tail) != old.head_tail) {
-   /*
-* Lock still has someone queued for it, so wake up an
-* appropriate waiter.
-*/
-   __ticket_unlock_kick(lock, old.tickets.head);
-   }
-}
-
  static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
  {
if (TICKET_SLOWPATH_FLAG &&
-   static_key_false(¶virt_ticketlocks_enabled)) {
-   arch_spinlock_t prev;
+   static_key_false(¶virt_ticketlocks_enabled)) {
+   __ticket_t prev_head;

-   prev = *lock;
+   prev_head = lock->tickets.head;

Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Raghavendra K T

On 02/07/2015 12:27 AM, Sasha Levin wrote:

On 02/06/2015 09:49 AM, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
 prev = *lock;
 add_smp(&lock->tickets.head, TICKET_LOCK_INC);

 /* add_smp() is a full mb() */

 if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
 __ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 


With this patch, my VMs lock up quickly after boot with:


Tried to reproduce the hang myself, and there still seems to be a barrier
(or some logic) that I am missing.

Looking closely below, the unlock_kick got missed even though we can see
that SLOWPATH_FLAG is still set (the dump is decoded in the small sketch
after the backtraces):

/me goes back to look closely

(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52504) at arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fc0edb0 in ?? ()
#5  0x in ?? ()

(gdb) p *(arch_spinlock_t *)0x88023ffe8240
$1 = {{head_tail = 3441806612, tickets = {head = 52500, tail = 52517}}}
(gdb) t 2
[Switching to thread 2 (Thread 2)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52502) at arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x0246 in irq_stack_union ()
#5  0x00080750 in ?? ()
#6  0x0002 in ?? ()
#7  0x0004 in irq_stack_union ()
#8  0xcd16 in nmi_print_seq ()
Cannot access memory at address 0xbfc0
(gdb) t 3
[Switching to thread 3 (Thread 3)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52512) at arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fc8edb0 in ?? ()
#5  0x in ?? ()

[...] //other threads with similar output

(gdb) t 8
[Switching to thread 8 (Thread 8)]
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
55  }
(gdb) bt
#0  native_halt () at ./arch/x86/include/asm/irqflags.h:55
#1  0x81037c27 in halt () at ./arch/x86/include/asm/paravirt.h:116
#2  kvm_lock_spinning (lock=0x88023ffe8240, want=52500) at arch/x86/kernel/kvm.c:786

#3  0x81037251 in __raw_callee_save_kvm_lock_spinning ()
#4  0x88023fdcedb0 in ?? ()
#5  0x in ?? ()
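
For reference, a tiny throwaway decode of the dumped head_tail value
(illustration only, assuming the 16-bit ticket layout with
TICKET_SLOWPATH_FLAG == 1 and TICKET_LOCK_INC == 2 used by the paravirt
config):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t head_tail = 3441806612u;	/* from the gdb dump above */
	uint16_t head = head_tail & 0xffff;	/* 52500 */
	uint16_t tail = head_tail >> 16;	/* 52517 */

	printf("head=%u tail=%u slowpath=%u\n",
	       (unsigned)head, (unsigned)(tail & ~1u), (unsigned)(tail & 1u));
	/*
	 * head == 52500 is exactly the ticket thread 8 is halted on
	 * (want=52500), so the lock is free for that waiter but the wake-up
	 * kick never arrived, consistent with a missed unlock_kick while
	 * SLOWPATH_FLAG (the low bit of tail) is still set.
	 */
	return 0;
}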




Re: [Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-08 Thread Raghavendra K T

On 02/06/2015 09:55 PM, Linus Torvalds wrote:

On Fri, Feb 6, 2015 at 6:49 AM, Raghavendra K T wrote:

Paravirt spinlock clears slowpath flag after doing unlock.

[ fix edited out ]

So I'm not going to be applying this for 3.19, because it's much too
late and the patch is too scary. Plus the bug probably effectively
never shows up in real life (it is probably easy to trigger the
speculative *read* but probably never the actual speculative write
after dropping the lock last).



Understood and agreed.


This will need a lot of testing by the paravirt people - both
performance and correctness. So *maybe* for 3.20, but maybe for even
later, and then marked for stable, of course.

Are there any good paravirt stress-tests that people could run for
extended times?



I have been running several benchmarks (kern, sys, hack, ebizzy, etc.) in
1x and 2x overcommit scenarios, for performance testing as well.
(With the current patch I did not get a KVM hang in a normal run, but
overcommit reproduced it.)




[Xen-devel] [PATCH] x86 spinlock: Fix memory corruption on completing completions

2015-02-06 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);


which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

However it brings additional case to be handled, viz., slowpath still
could be set when somebody does arch_trylock. Handle that too by ignoring
slowpath flag during lock availability check.
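
As an illustration of that trylock case (a sketch only; the flag is assumed
to sit in the low bit of tail as elsewhere in this file, and the
corresponding hunk is not fully visible in the diff below):

static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
{
	arch_spinlock_t old, new;

	old.tickets = READ_ONCE(lock->tickets);
	/* The lock is free when head catches up with tail, flag ignored. */
	if (old.tickets.head != (old.tickets.tail & ~TICKET_SLOWPATH_FLAG))
		return 0;

	new.head_tail = old.head_tail + (TICKET_LOCK_INC << TICKET_SHIFT);
	/* We become the sole owner, so a stale flag can be dropped here too. */
	new.head_tail &= ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);

	/* cmpxchg is a full barrier, so nothing can move before it */
	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
}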

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
---
 arch/x86/include/asm/spinlock.h | 70 -
 1 file changed, 34 insertions(+), 36 deletions(-)

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..0829f86 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -49,6 +49,23 @@ static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+   arch_spinlock_t old, new;
+   __ticket_t diff;
+
+   old.tickets = READ_ONCE(lock->tickets);
+   diff = (old.tickets.tail & ~TICKET_SLOWPATH_FLAG) - old.tickets.head;
+
+   /* try to clear slowpath flag when there are no contenders */
+   if ((old.tickets.tail & TICKET_SLOWPATH_FLAG) &&
+   (diff == TICKET_LOCK_INC)) {
+   new = old;
+   new.tickets.tail &= ~TICKET_SLOWPATH_FLAG;
+   cmpxchg(&lock->head_tail, old.head_tail, new.head_tail);
+   }
+}
+
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
 static __always_inline void __ticket_lock_spinning(arch_spinlock_t *lock,
__ticket_t ticket)
@@ -59,6 +76,10 @@ static inline void __ticket_unlock_kick(arch_spinlock_t *lock,
 {
 }
 
+static inline void __ticket_check_and_clear_slowpath(arch_spinlock_t *lock)
+{
+}
+
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
 
 static __always_inline int arch_spin_value_unlocked(arch_spinlock_t lock)
@@ -84,7 +105,7 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
register struct __raw_tickets inc = { .tail = TICKET_LOCK_INC };
 
inc = xadd(&lock->tickets, inc);
-   if (likely(inc.head == inc.tail))
+   if (likely(inc.head == (inc.tail & ~TICKET_SLOWPATH_FLAG)))
goto out;
 
inc.tail &= ~TICKET_SLOWPATH_FLAG;
@@ -98,7 +119,10 @@ static __always_inline void arch_spin_lock(arch_spinlock_t *lock)
} while (--count);
__ticket_lock_spinning(lock, inc.tail);
}
-out:   barrier();  /* make sure nothing creeps before the lock is taken */
+out:
+   __ticket_check_and_clear_slowpath(lock);
+
+   barrier();  /* make sure nothing creeps before the lock is taken */
 }
 
 static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
@@ -115,47 +139,21 @@ static __always_inline int arch_spin_trylock(arch_spinlock_t *lock)
	return cmpxchg(&lock->head_tail, old.head_tail, new.head_tail) == old.head_tail;
 }
 
-static inline void __ticket_unlock_slowpath(arch_spinlock_t *lock,
-   arch_spinlock_t old)
-{
-   arch_spinlock_t new;
-
-   BUILD_BUG_ON(((__ticket_t)NR_CPUS) != NR_CPUS);
-
-   /* Perform the unlock on the "before" copy */
-   old.tickets.head += TICKET_LOCK_INC;
-
-   /* Clear the slowpath flag */
-   new.head_tail = old.head_tail & ~(TICKET_SLOWPATH_FLAG << TICKET_SHIFT);
-
-   /*
-* If the lock is uncontended, clear the flag - use cmpxchg in
-* case it changes behind our back though.
-*/
-   if (new.tickets.head != new.tickets.tail ||
-   cmpxchg(&lock->head_tail, old.head_tail,
-   new.head_tail) != old.head_tail) {
-   /*
-* Lock still has someone queued for it, so wake up an
-* appropriate waiter.
-*/
-   __ticket_unlock_kick(lock, old.tickets.head);
-   }
-}
-
 static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
 {
if (TICKET_SLOWPATH_FLAG &&
-   static_key_false(&paravirt_ticketlocks_enabled)) {
-

Re: [Xen-devel] [PATCH v14 08/11] qspinlock, x86: Rename paravirt_ticketlocks_enabled

2015-01-21 Thread Raghavendra K T

On 01/21/2015 01:42 AM, Waiman Long wrote:

This patch renames the paravirt_ticketlocks_enabled static key to a
more generic paravirt_spinlocks_enabled name.

Signed-off-by: Waiman Long 
Signed-off-by: Peter Zijlstra 
---

Reviewed-by: Raghavendra K T 

