On Sat, 2012-12-29 at 02:27 -0800, Michel Lespinasse wrote:
> On Wed, Dec 26, 2012 at 11:10 AM, Eric Dumazet wrote:
> > I did some tests with your patches with following configuration :
> >
> > tc qdisc add dev eth0 root htb r2q 1000 default 3
> > (to force a contention on qdisc lock, even with a multi queue net
> > device)
On Thu, 2013-01-03 at 11:45 -0500, Steven Rostedt wrote:
> On Thu, 2013-01-03 at 08:10 -0800, Eric Dumazet wrote:
>
> > > But then would the problem even exist? If the lock is on its own cache
> > > line, it shouldn't cause a performance issue if other CPUs are spinning
> > > on it. Would it?
> >
On Thu, 2013-01-03 at 08:10 -0800, Eric Dumazet wrote:
> > But then would the problem even exist? If the lock is on its own cache
> > line, it shouldn't cause a performance issue if other CPUs are spinning
> > on it. Would it?
>
> Not sure I understand the question.
>
I'll explain my question b
On Thu, 2013-01-03 at 10:32 -0500, Steven Rostedt wrote:
> On Thu, 2013-01-03 at 05:35 -0800, Eric Dumazet wrote:
> > On Thu, 2013-01-03 at 08:24 -0500, Steven Rostedt wrote:
> > > On Thu, 2013-01-03 at 09:05 +, Jan Beulich wrote:
> > >
> > > > > How much bus traffic do monitor/mwait cause behind the scenes?
On Thu, 2013-01-03 at 05:35 -0800, Eric Dumazet wrote:
> On Thu, 2013-01-03 at 08:24 -0500, Steven Rostedt wrote:
> > On Thu, 2013-01-03 at 09:05 +, Jan Beulich wrote:
> >
> > > > How much bus traffic do monitor/mwait cause behind the scenes?
> > >
> > > I would suppose that this just snoops the bus for writes, but the
> > > amount of bus traffic involved in this isn't explicitly documented.
On Thu, 2013-01-03 at 08:24 -0500, Steven Rostedt wrote:
> On Thu, 2013-01-03 at 09:05 +, Jan Beulich wrote:
>
> > > How much bus traffic do monitor/mwait cause behind the scenes?
> >
> > I would suppose that this just snoops the bus for writes, but the
> > amount of bus traffic involved in this isn't explicitly documented.
On Thu, 2013-01-03 at 09:05 +, Jan Beulich wrote:
> > How much bus traffic do monitor/mwait cause behind the scenes?
>
> I would suppose that this just snoops the bus for writes, but the
> amount of bus traffic involved in this isn't explicitly documented.
>
> One downside of course is that
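[Editor's note: for orientation, a minimal hypothetical sketch of what an mwait-based wait on a ticket lock could look like, assuming the kernel's __monitor()/__mwait() helpers (their exact home header has moved around over time). This is illustration only, not something proposed in the thread; the snoop traffic needed to wake the waiter is exactly the cost being asked about above.]

#include <linux/compiler.h>
#include <asm/processor.h>
#include <asm/spinlock_types.h>

/*
 * Illustration only: arm the monitor on the cache line holding the
 * ticket head, then mwait until some CPU writes to that line.
 */
static inline void mwait_wait_for_ticket(arch_spinlock_t *lock,
					 __ticket_t ticket)
{
	while (ACCESS_ONCE(lock->tickets.head) != ticket) {
		__monitor(&lock->tickets.head, 0, 0);
		/* re-check so we do not sleep on a value that already changed */
		if (ACCESS_ONCE(lock->tickets.head) != ticket)
			__mwait(0, 0);
	}
}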
>>> On 27.12.12 at 20:09, Rik van Riel wrote:
> On 12/27/2012 01:41 PM, Jan Beulich wrote:
> Rik van Riel 12/27/12 4:01 PM >>>
>>> On 12/27/2012 09:27 AM, Eric Dumazet wrote:
So the hash sounds good to me, because the hash key could mix both lock
address and caller IP ( __builtin_return_address(1) in
ticket_spin_lock_wait())
On Wed, Dec 26, 2012 at 11:10 AM, Eric Dumazet wrote:
> I did some tests with your patches with following configuration :
>
> tc qdisc add dev eth0 root htb r2q 1000 default 3
> (to force a contention on qdisc lock, even with a multi queue net
> device)
>
> and 24 concurrent "netperf -t UDP_STREAM
On Thu, 2012-12-27 at 14:31 -0500, Rik van Riel wrote:
> to use a bigger/smaller one.
>
> I guess we want a larger value.
>
> With your hashed lock approach, we can get away with
> larger values - they will not penalize other locks
> the same way a single value per cpu might have.
Then, we absol
On 12/27/2012 01:49 PM, Eric Dumazet wrote:
On Thu, 2012-12-27 at 09:35 -0500, Rik van Riel wrote:
The lock acquisition time depends on the holder of the lock,
and what the CPUs ahead of us in line will do with the lock,
not on the caller IP of the spinner.
That would be true only for general cases.
On 12/27/2012 01:41 PM, Jan Beulich wrote:
Rik van Riel 12/27/12 4:01 PM >>>
On 12/27/2012 09:27 AM, Eric Dumazet wrote:
So the hash sounds good to me, because the hash key could mix both lock
address and caller IP ( __builtin_return_address(1) in
ticket_spin_lock_wait())
The lock acquisition time depends on the holder of the lock,
and what the CPUs ahead of us in line will do with the lock,
not on the caller IP of the spinner.
On Thu, 2012-12-27 at 09:35 -0500, Rik van Riel wrote:
>
> The lock acquisition time depends on the holder of the lock,
> and what the CPUs ahead of us in line will do with the lock,
> not on the caller IP of the spinner.
>
That would be true only for general cases.
In network land, we do have
>>> Rik van Riel 12/27/12 4:01 PM >>>
>On 12/27/2012 09:27 AM, Eric Dumazet wrote:
>> So the hash sounds good to me, because the hash key could mix both lock
>> address and caller IP ( __builtin_return_address(1) in
>> ticket_spin_lock_wait())
>
>The lock acquisition time depends on the holder of the lock,
>and what the CPUs ahead of us in line will do with the lock,
>not on the caller IP of the spinner.
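[Editor's note: to make the quoted suggestion concrete, the hash key could be built roughly as below. This is a hypothetical sketch: spinlock_delay_hash() and the XOR mixing are assumptions; only __builtin_return_address(1) and the 16-slot table come from the thread.]

#include <linux/hash.h>

#define DELAY_HASH_SHIFT 4	/* 16 per-cpu delay slots, as in Eric's patch */

/*
 * Sketch: mix the lock address with the spinner's call site so two
 * different locks (or two distinct users of one lock) rarely share,
 * and fight over, the same tuned delay slot.
 */
static inline u32 spinlock_delay_hash(void *lock, void *caller_ip)
{
	unsigned long key = (unsigned long)lock ^ (unsigned long)caller_ip;

	return hash_long(key, DELAY_HASH_SHIFT);
}

/* called as: spinlock_delay_hash(lock, __builtin_return_address(1)) */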
On 12/27/2012 09:27 AM, Eric Dumazet wrote:
On Wed, 2012-12-26 at 22:07 -0800, Michel Lespinasse wrote:
If we go with per-spinlock tunings, I feel we'll most likely want to
add an associative cache in order to avoid the 1/16 chance (~6%) of
getting 595Mbit/s instead of 982Mbit/s when there is a hash collision.
On Wed, 2012-12-26 at 22:07 -0800, Michel Lespinasse wrote:
> If we go with per-spinlock tunings, I feel we'll most likely want to
> add an associative cache in order to avoid the 1/16 chance (~6%) of
> getting 595Mbit/s instead of 982Mbit/s when there is a hash collision.
>
> I would still prefe
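[Editor's note: one way to picture the associative-cache point above is to tag each per-cpu slot with the lock it was tuned for, so a collision falls back to the default delay instead of two hot locks corrupting each other's value. The sketch below is a simplified, direct-mapped variant (a real associative cache would check several ways per set); struct delay_entry and lookup_spinlock_delay() are made up for illustration, and the MIN_SPINLOCK_DELAY value is assumed.]

#include <linux/hash.h>
#include <linux/percpu.h>

#define MIN_SPINLOCK_DELAY	1	/* assumed, as in Rik's patch */
#define DELAY_HASH_SHIFT	4
#define DELAY_SLOTS		(1 << DELAY_HASH_SHIFT)

struct delay_entry {
	void	*lock;		/* tag: which lock this slot was tuned for */
	int	delay;
};

static DEFINE_PER_CPU(struct delay_entry [DELAY_SLOTS], spinlock_delay_cache);

static int lookup_spinlock_delay(void *lock)
{
	u32 slot = hash_ptr(lock, DELAY_HASH_SHIFT);
	struct delay_entry *e = &__get_cpu_var(spinlock_delay_cache)[slot];

	/* tag mismatch means a collision: retune from the default instead */
	return e->lock == lock ? e->delay : MIN_SPINLOCK_DELAY;
}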
On Wed, Dec 26, 2012 at 11:51 AM, Rik van Riel wrote:
> On 12/26/2012 02:10 PM, Eric Dumazet wrote:
>> We might try to use a hash on lock address, and an array of 16 different
>> delays so that different spinlocks have a chance of not sharing the same
>> delay.
>>
>> With following patch, I get 982 Mbit/s
On 12/26/2012 02:10 PM, Eric Dumazet wrote:
I did some tests with your patches with following configuration :
tc qdisc add dev eth0 root htb r2q 1000 default 3
(to force a contention on qdisc lock, even with a multi queue net
device)
and 24 concurrent "netperf -t UDP_STREAM -H other_machine --
On Wed, 2012-12-26 at 11:10 -0800, Eric Dumazet wrote:
> +#define DELAY_HASH_SHIFT 4
> +DEFINE_PER_CPU(int [1 << DELAY_HASH_SHIFT], spinlock_delay) = {
> + MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
> + MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
> + MIN_SPINLOCK_DELAY, MIN_SPINLOCK_DELAY,
>
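[Editor's note: the hunk above only shows the 16-entry table being defined; the lookup side presumably picks a slot by hashing the lock address, along the lines of this sketch (not the actual patch; hash_ptr() is from <linux/hash.h>).]

#include <linux/hash.h>

/* sketch of the lookup side for the per-cpu table quoted above */
static inline int *spinlock_delay_slot(arch_spinlock_t *lock)
{
	u32 slot = hash_ptr(lock, DELAY_HASH_SHIFT);

	return &__get_cpu_var(spinlock_delay)[slot];
}

[The wait loop would then read and update *spinlock_delay_slot(lock) instead of a single per-cpu value.]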
On Fri, 2012-12-21 at 22:50 -0500, Rik van Riel wrote:
> I will try to run this test on a really large SMP system
> in the lab during the break.
>
> Ideally, the auto-tuning will keep the delay value large
> enough that performance will stay flat even when there are
> 100 CPUs contending over the same lock.
On 12/21/2012 10:33 PM, Steven Rostedt wrote:
On Fri, Dec 21, 2012 at 06:56:13PM -0500, Rik van Riel wrote:
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index 4e44840..e44c56f 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -113,19 +113,62 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
On 12/21/2012 10:29 PM, Eric Dumazet wrote:
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
+ int *delay_ptr = &per_cpu(spinlock_delay, smp_processor_id());
+ int delay = *delay_ptr;
int delay = __this_cpu_read(spinlock_delay);
}
+ *delay_ptr = delay;
__this_cpu_write(spinlock_delay, delay);
On Fri, Dec 21, 2012 at 06:56:13PM -0500, Rik van Riel wrote:
> diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
> index 4e44840..e44c56f 100644
> --- a/arch/x86/kernel/smp.c
> +++ b/arch/x86/kernel/smp.c
> @@ -113,19 +113,62 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
> static bo
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
> + int *delay_ptr = &per_cpu(spinlock_delay, smp_processor_id());
> + int delay = *delay_ptr;
int delay = __this_cpu_read(spinlock_delay);
> }
> + *delay_ptr = delay;
__this_cpu_write(spinlock_delay, delay);
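[Editor's note on the review comment above: per_cpu(var, smp_processor_id()) has to fetch the CPU number and then index the per-cpu offset table, while the __this_cpu_*() accessors compile to a single segment-prefixed memory access on x86. A sketch of the suggested shape, with the reasoning as comments:]

	/* suggested form: one segment-relative load, no smp_processor_id() */
	int delay = __this_cpu_read(spinlock_delay);

	/* ... spin, tuning "delay" ... */

	/* and one segment-relative store on the way out */
	__this_cpu_write(spinlock_delay, delay);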
On 12/21/2012 07:18 PM, Eric Dumazet wrote:
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
Argh, the first one had a typo in it that did not influence
performance with fewer threads running, but that made things
worse with more than a dozen threads...
+
+ /*
+
On 12/21/2012 07:48 PM, Eric Dumazet wrote:
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
Argh, the first one had a typo in it that did not influence
performance with fewer threads running, but that made things
worse with more than a dozen threads...
Please let me know if you can break these patches.
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
> Argh, the first one had a typo in it that did not influence
> performance with fewer threads running, but that made things
> worse with more than a dozen threads...
>
> Please let me know if you can break these patches.
> ---8<---
> Subject: x86,smp: auto tune spinlock backoff delay factor
On Fri, 2012-12-21 at 18:56 -0500, Rik van Riel wrote:
> Argh, the first one had a typo in it that did not influence
> performance with fewer threads running, but that made things
> worse with more than a dozen threads...
> +
> + /*
> + * The lock is still busy, the delay
Argh, the first one had a typo in it that did not influence
performance with fewer threads running, but that made things
worse with more than a dozen threads...
Please let me know if you can break these patches.
---8<---
Subject: x86,smp: auto tune spinlock backoff delay factor
Many spinlocks are
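[Editor's note: the patch description is cut off above. For orientation, here is a hypothetical sketch of the general auto-tuning idea discussed in this thread; it is illustrative code, not Rik's actual patch, and the function signature, constants and exact tuning steps are assumptions.]

#include <linux/compiler.h>
#include <linux/percpu.h>
#include <asm/processor.h>
#include <asm/spinlock_types.h>

#define MIN_SPINLOCK_DELAY	1	/* assumed values */
#define MAX_SPINLOCK_DELAY	16000

static DEFINE_PER_CPU(int, spinlock_delay) = MIN_SPINLOCK_DELAY;

/*
 * Sketch only: wait for our ticket, pausing roughly "delay" iterations
 * per waiter still ahead of us.  Bump the per-cpu delay up when the
 * lock is still busy after the estimated wait (we were checking too
 * often) and decay it slowly so it can shrink again.
 */
void ticket_spin_lock_wait(arch_spinlock_t *lock, struct __raw_tickets inc)
{
	int delay = __this_cpu_read(spinlock_delay);

	for (;;) {
		__ticket_t head = ACCESS_ONCE(lock->tickets.head);
		int loops;

		if (head == inc.tail)
			break;				/* our turn */

		/* pause roughly one estimated hold time per waiter ahead */
		loops = delay * (__ticket_t)(inc.tail - head);
		while (loops--)
			cpu_relax();

		/* still not our turn: the estimate was too short, back off more */
		if (ACCESS_ONCE(lock->tickets.head) != inc.tail &&
		    delay < MAX_SPINLOCK_DELAY)
			delay++;
	}

	/* decay slowly so a burst of contention does not pin the value high */
	if (delay > MIN_SPINLOCK_DELAY)
		delay--;

	__this_cpu_write(spinlock_delay, delay);
}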