"Tom Lane" <[EMAIL PROTECTED]> wrote
>
> Not really --- that patch was intended to ensure that LWLocks don't
> unnecessarily cross two cache lines. It doesn't ensure that two
> different LWLocks aren't sharing a cache line. You could do that
> by increasing LWLOCK_PADDED_SIZE to the cache line size.
"Qingqing Zhou" <[EMAIL PROTECTED]> writes:
>>> One thing we tried in February was padding out the statically defined
>>> locks with dummy lock definitions in the enum.
> Has this been done? See the LWLOCK_PADDED_SIZE macro in code.
Not really --- that patch was intended to ensure that LWLocks don't unnecessarily cross two cache lines.
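
For readers following the thread, here is a minimal sketch of the padding mechanism being referred to. The field list, the slock_t stand-in, and the 128-byte figure are illustrative assumptions, not the actual definitions in lwlock.h.

    typedef unsigned char slock_t;      /* stand-in for the platform spinlock type */

    typedef struct LWLock
    {
        slock_t     mutex;              /* per-lock spinlock */
        int         state;              /* ... shared/exclusive counts, wait queue ... */
    } LWLock;

    #define LWLOCK_PADDED_SIZE 128      /* assumed target: one full cache line */

    /* Pad every lock out to LWLOCK_PADDED_SIZE so no two locks share a line. */
    typedef union LWLockPadded
    {
        LWLock      lock;
        char        pad[LWLOCK_PADDED_SIZE];
    } LWLockPadded;

Raising LWLOCK_PADDED_SIZE to the cache line size, as suggested above, makes sizeof(LWLockPadded) exactly one line.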
Qingqing Zhou wrote:
>
> "Bruce Momjian" wrote
> >
> > Added to TODO list.
> >
> > > One thing we tried in February was padding out the statically defined
> > > locks with dummy lock definitions in the enum. This has the effect of
> > > ensuring that the most contested locks are very definitely in their own cache line
"Bruce Momjian" wrote
>
> Added to TODO list.
>
> > One thing we tried in February was padding out the statically defined
> > locks with dummy lock definitions in the enum. This has the effect of
> > ensuring that the most contested locks are very definitely in their own
> > cache line and not shared
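
A rough illustration of the "dummy lock definitions in the enum" idea mentioned here, assuming roughly 24-byte locks and 128-byte cache lines; the lock and padding names are made up for the example.

    typedef enum LWLockId
    {
        BufMappingLock,
        BufMappingPad1, BufMappingPad2, BufMappingPad3, BufMappingPad4, BufMappingPad5,
        LockMgrLock,
        LockMgrPad1, LockMgrPad2, LockMgrPad3, LockMgrPad4, LockMgrPad5,
        WALInsertLock,
        WALInsertPad1, WALInsertPad2, WALInsertPad3, WALInsertPad4, WALInsertPad5,
        /* ... remaining fixed lock IDs ... */
        NumFixedLWLocks
    } LWLockId;

The unused entries simply occupy slots in the lock array so that each heavily contested lock ends up alone on its cache line.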
Added to TODO list.
---
Simon Riggs wrote:
> On Wed, 2005-09-14 at 13:32 -0400, Tom Lane wrote:
> > I wrote:
> > > Another thought came to mind: maybe the current data layout for LWLocks
> > > is bad. Right now, the spinlock that protects each LWLock data struct
On Thu, 03 Nov 2005 18:29:09 +
Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Thu, 2005-11-03 at 08:03 -0800, Mark Wong wrote:
> > On Tue, 01 Nov 2005 07:32:32 +
> > Simon Riggs <[EMAIL PROTECTED]> wrote:
> > > Concerned about the awful checkpointing. Can you bump wal_buffers to
> > > 8192 just to make sure?
On Thu, 2005-11-03 at 08:03 -0800, Mark Wong wrote:
> On Tue, 01 Nov 2005 07:32:32 +
> Simon Riggs <[EMAIL PROTECTED]> wrote:
> > Concerned about the awful checkpointing. Can you bump wal_buffers to
> > 8192 just to make sure? That's way too high, but just to prove it.
> >
> > We need to reduce
On Tue, 01 Nov 2005 07:32:32 +
Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Mon, 2005-10-31 at 16:10 -0800, Mark Wong wrote:
> > On Thu, 20 Oct 2005 23:03:47 +0100
> > Simon Riggs <[EMAIL PROTECTED]> wrote:
> >
> > > On Wed, 2005-10-19 at 14:07 -0700, Mark Wong wrote:
> > > > >
> > > > > This
On Mon, 2005-10-31 at 16:10 -0800, Mark Wong wrote:
> On Thu, 20 Oct 2005 23:03:47 +0100
> Simon Riggs <[EMAIL PROTECTED]> wrote:
>
> > On Wed, 2005-10-19 at 14:07 -0700, Mark Wong wrote:
> > > >
> > > > This isn't exactly elegant coding, but it provides a useful improvement
> > > > on an 8-way SMP box when run on 8.0 base.
On Thu, 20 Oct 2005 23:03:47 +0100
Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Wed, 2005-10-19 at 14:07 -0700, Mark Wong wrote:
> > >
> > > This isn't exactly elegant coding, but it provides a useful improvement
> > > on an 8-way SMP box when run on 8.0 base. OK, let's be brutal: this looks
> > >
On Wed, 2005-10-19 at 14:07 -0700, Mark Wong wrote:
> >
> > This isn't exactly elegant coding, but it provides a useful improvement
> > on an 8-way SMP box when run on 8.0 base. OK, let's be brutal: this looks
> > pretty darn stupid. But it does follow the CPU optimization handbook
> > advice and I
On Wed, Oct 12, 2005 at 03:07:23PM -0400, Emil Briggs wrote:
> > where the number of padding locks is determined by how many lock
> > structures fit within a 128 byte cache line.
> >
> > This isn't exactly elegant coding, but it provides a useful improvement
> > on an 8-way SMP box when run on 8.0
On Wed, 12 Oct 2005 18:44:50 +0100
Simon Riggs <[EMAIL PROTECTED]> wrote:
> On Wed, 2005-09-14 at 13:32 -0400, Tom Lane wrote:
> > I wrote:
> > > Another thought came to mind: maybe the current data layout for LWLocks
> > > is bad. Right now, the spinlock that protects each LWLock data struct
> >
> where the number of padding locks is determined by how many lock
> structures fit within a 128 byte cache line.
>
> This isn't exactly elegant coding, but it provides a useful improvement
> on an 8-way SMP box when run on 8.0 base. OK, let's be brutal: this looks
> pretty darn stupid. But it does follow the CPU optimization handbook advice
On Wed, 2005-09-14 at 13:32 -0400, Tom Lane wrote:
> I wrote:
> > Another thought came to mind: maybe the current data layout for LWLocks
> > is bad. Right now, the spinlock that protects each LWLock data struct
> > is itself part of the struct, and since the structs aren't large (circa
> > 20 bytes), the whole thing is usually all in the same cache line.
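
To make the layout problem concrete, a sketch of the situation being described, with approximate sizes and abbreviated field names (the real struct is in lwlock.c):

    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned char slock_t;

    typedef struct LWLock
    {
        slock_t     mutex;              /* spinlock protecting the rest of the struct */
        bool        releaseOK;
        char        exclusive;          /* 0 or 1 exclusive holder */
        int         shared;             /* number of shared holders */
        void       *head;               /* wait-queue head */
        void       *tail;               /* wait-queue tail */
    } LWLock;                           /* roughly 24 bytes on a 64-bit build */

    int
    main(void)
    {
        /* A packed array therefore puts two or more locks in one 64-byte line. */
        printf("sizeof(LWLock) = %zu, locks per 64-byte line = %zu\n",
               sizeof(LWLock), (size_t) 64 / sizeof(LWLock));
        return 0;
    }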
Is there a TODO here?
---
Tom Lane wrote:
> The test case I just posted shows that our spinlock code, which we had
> thought largely done, is once again becoming a performance bottleneck.
> It's time to resurrect some of the ideas we kicked around in early 2002,
The fact that cmpb isn't helping proves that getting the cache line in a
read-only fashion does *not* do enough to protect the xchgb in this way.
But maybe another locking instruction would. Comments?
regards, tom lane
Hi, Tom!
Could you possibly help with the following link:
'Replay: Unkn
I have removed this TODO item:
* Research use of sched_yield() for spinlock acquisition failure
---
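
For context, the removed item referred to a spin loop along these lines: spin a bounded number of times, then call sched_yield() rather than sleeping. This is only a sketch (the spin budget and the GCC __sync builtin are assumptions, not the s_lock.c code); the surrounding discussion explains why it behaves badly on Linux 2.6, where sched_yield() may never run the lock holder.

    #include <sched.h>

    static void
    spin_lock_yield(volatile int *lock)
    {
        int spins = 0;

        while (__sync_lock_test_and_set(lock, 1) != 0)  /* test-and-set */
        {
            if (++spins >= 100)     /* arbitrary spin budget for the sketch */
            {
                sched_yield();      /* give up the timeslice and retry later */
                spins = 0;
            }
        }
    }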
Tom Lane wrote:
> Greg Stark <[EMAIL PROTECTED]> writes:
> > Marko Kreen writes:
> > (I speculate that it's set up to only yield the processor to other processes already affiliated to that processor.
On Sun, 18 Sep 2005, Jim C. Nasby wrote:
> On Sat, Sep 17, 2005 at 01:40:28AM -0400, Tom Lane wrote:
> > Gavin Sherry <[EMAIL PROTECTED]> writes:
> > > On Sat, 17 Sep 2005, Tom Lane wrote:
> > >> It'd be real interesting to see comparable numbers from some non-Linux
> > >> kernels, particularly commercial systems like Solaris.
On Sat, Sep 17, 2005 at 01:40:28AM -0400, Tom Lane wrote:
> Gavin Sherry <[EMAIL PROTECTED]> writes:
> > On Sat, 17 Sep 2005, Tom Lane wrote:
> >> It'd be real interesting to see comparable numbers from some non-Linux
> >> kernels, particularly commercial systems like Solaris.
>
> > Did you see the Solaris results I posted?
Gavin Sherry <[EMAIL PROTECTED]> writes:
> On Sat, 17 Sep 2005, Tom Lane wrote:
>> It'd be real interesting to see comparable numbers from some non-Linux
>> kernels, particularly commercial systems like Solaris.
> Did you see the Solaris results I posted?
Are you speaking of
http://archives.postg
On Sat, 17 Sep 2005, Tom Lane wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > * Greg Stark ([EMAIL PROTECTED]) wrote:
> >> However I was under the impression that 2.6 had moved beyond that problem.
> >> It would be very interesting to know if 2.6 still suffers from this.
>
> > The tests on the em64t at my place were using 2.6.12.
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Greg Stark ([EMAIL PROTECTED]) wrote:
>> However I was under the impression that 2.6 had moved beyond that problem.
>> It would be very interesting to know if 2.6 still suffers from this.
> The tests on the em64t at my place were using 2.6.12. I had thought 2.6 was better about this too
* Greg Stark ([EMAIL PROTECTED]) wrote:
> However I was under the impression that 2.6 had moved beyond that problem.
> It would be very interesting to know if 2.6 still suffers from this.
The tests on the em64t at my place were using 2.6.12. I had thought 2.6
was better about this too, but I don'
Josh Berkus writes:
> Tom,
>
> > What I think this means is that the kernel is scheduling the 2 processes
> > onto 2 processors chosen-at-random, without awareness of whether those
> > two processors are on the same chip (in the Xeon case) or have closer
> > NUMA affinity (in the Opteron case).
Tom,
> What I think this means is that the kernel is scheduling the 2 processes
> onto 2 processors chosen-at-random, without awareness of whether those
> two processors are on the same chip (in the Xeon case) or have closer
> NUMA affinity (in the Opteron case).
That would be consistent with my
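
One way to test the placement theory is to pin the two backends explicitly and compare same-chip versus cross-chip timings. A Linux-only sketch using sched_setaffinity() (the helper name is made up; taskset(1) does the same thing from the shell):

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Pin the calling process to a single CPU so the scheduler cannot move it. */
    static int
    pin_to_cpu(int cpu)
    {
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        if (sched_setaffinity(getpid(), sizeof(mask), &mask) != 0)
        {
            perror("sched_setaffinity");
            return -1;
        }
        return 0;
    }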
I thought I'd throw SPARC into the equation (SPARC IIIi, in a dual SunFire
v250):
vanilla HEAD from ~1 week ago:
bash-3.00$ for i in 1 2 4; do time ./nrun.sh $i; done
real    1m49.037s
user    0m0.008s
sys     0m0.016s
real    2m3.482s
user    0m0.014s
sys     0m0.026s
real    3m54.215s
user
Gregory Maxwell <[EMAIL PROTECTED]> writes:
> If I had to guess I might say that the 64byte alignment is removing
> much of the unneeded line bouncing in the the two process case but is
> at the same time creating more risk of bouncing caused by aliasing.
It's an idea ... not sure if it's right or
On 9/15/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Yesterday's CVS tip:
> 1 32s 2 46s 4 88s 8 168s
> plus no-cmpb and spindelay2:
> 1 32s 2 48s 4 100s 8 177s
> plus just-committed code to pad LWLock to 32:
> 1 33s 2 50s 4 98s 8 179s
> alter to pad to 64:
>
Gavin Sherry <[EMAIL PROTECTED]> writes:
> Interesting. On Xeon (2 phys, 4 log), with LWLock padded to 64 bytes and
> the cmpb/jump removed I get:
> [ 1 55s 2 69s 4 128s ]
> This compares to the following, which is unpadded but has cmpb/jump
> removed but is otherwise vanilla:
> 1: 55: 2: 1
Gregory Maxwell <[EMAIL PROTECTED]> writes:
> might be useful to align the structure so it always crosses two lines
> and measure the performance difference.. the delta could be basically
> attributed to the cache line bouncing since even one additional bounce
> would overwhelm the other performanc
On Thu, 15 Sep 2005, Tom Lane wrote:
> Gavin Sherry <[EMAIL PROTECTED]> writes:
> > What about padding the LWLock to 64 bytes on these architectures. Both P4
> > and Opteron have 64 byte cache lines, IIRC. This would ensure that a
> > cacheline doesn't hold two LWLocks.
>
> I tried that first, actually, but it was a net loss. I guess enlarging
Gavin Sherry <[EMAIL PROTECTED]> writes:
> What about padding the LWLock to 64 bytes on these architectures. Both P4
> and Opteron have 64 byte cache lines, IIRC. This would ensure that a
> cacheline doesn't hold two LWLocks.
I tried that first, actually, but it was a net loss. I guess enlarging
> Tom Lane wrote
> I'm going to go ahead and make that change, since it doesn't
> seem likely
> to have any downside. It might be interesting to look into forcing
> proper alignment of the shared buffer headers as well.
Just catching up on your mails - all of that sounds good so far.
Everything
On Thu, 15 Sep 2005, Tom Lane wrote:
> One thing that did seem to help a little bit was padding the LWLocks
> to 32 bytes (by default they are 24 bytes each on x86_64) and ensuring
> the array starts on a 32-byte boundary. This ensures that we won't have
> any LWLocks crossing cache lines --- con
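
The "starts on a 32-byte boundary" part is just pointer arithmetic: over-allocate by the alignment and round the start address up. A sketch with malloc standing in for the shared-memory allocator (the original raw pointer would have to be kept around if the block were ever freed):

    #include <stdint.h>
    #include <stdlib.h>

    #define LWLOCK_ALIGN 32

    static void *
    alloc_aligned_lwlock_array(size_t nbytes)
    {
        char       *raw = malloc(nbytes + LWLOCK_ALIGN - 1);
        uintptr_t   p;

        if (raw == NULL)
            return NULL;
        p = ((uintptr_t) raw + LWLOCK_ALIGN - 1) & ~((uintptr_t) (LWLOCK_ALIGN - 1));
        return (void *) p;      /* first LWLOCK_ALIGN-aligned address in the block */
    }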
I wrote:
> I guess what this means is that there's no real problem with losing the
> cache line while manipulating the LWLock, which is what the patch was
> intended to prevent. Instead, we're paying for swapping two cache lines
> (the spinlock and the LWLock) across processors instead of just one
Tom Lane wrote:
I guess what this means is that there's no real problem with losing the
cache line while manipulating the LWLock, which is what the patch was
intended to prevent. Instead, we're paying for swapping two cache lines
(the spinlock and the LWLock) across processors instead of just one
Tom Lane wrote:
I wrote:
We could ameliorate this if there were a way to acquire ownership of the
cache line without necessarily winning the spinlock. I'm imagining
that we insert a "dummy" locked instruction just ahead of the xchgb,
which touches the spinlock in such a way as to not change its state.
I wrote:
> Another thought came to mind: maybe the current data layout for LWLocks
> is bad. Right now, the spinlock that protects each LWLock data struct
> is itself part of the struct, and since the structs aren't large (circa
> 20 bytes), the whole thing is usually all in the same cache line.
I wrote:
> We could ameliorate this if there were a way to acquire ownership of the
> cache line without necessarily winning the spinlock. I'm imagining
> that we insert a "dummy" locked instruction just ahead of the xchgb,
> which touches the spinlock in such a way as to not change its state.
I
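
What a "dummy locked instruction just ahead of the xchgb" could look like, sketched as GCC inline assembly: a lock-prefixed add of zero performs a locked read-modify-write that pulls the line in exclusively without changing the byte. This is purely illustrative; it is not the code that was committed.

    /* x86 sketch: claim the cache line with a locked no-op, then do the real TAS. */
    static inline int
    tas_with_dummy_lock(volatile unsigned char *lock)
    {
        unsigned char res = 1;

        __asm__ __volatile__(
            "lock; addb $0,%1\n\t"      /* dummy locked op: fetch line exclusively */
            "xchgb %0,%1"               /* real test-and-set */
            : "+q" (res), "+m" (*lock)
            :
            : "memory", "cc");
        return (int) res;               /* nonzero means the lock was already held */
    }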
Tom, et al.,
Updated, with full recompiles between everything and the new
modification:
N, runtime:
Tip:       1 31s  2 37s  4 86s  8 159s
no-cmpb:   1 32s  2 43s  4 83s  8 168s
spin:      1 32s  2 51s  4 84s  8 160s
spin+mod:  1 32s
On Tue, Sep 13, 2005 at 08:14:47PM -0400, Stephen Frost wrote:
> I suppose another option would be to provide separate packages... Could
> this be done as a shared library so it's more 'plug-and-play' to switch
> between the two? I dunno, just trying to think about how to deal with
> this without
Tom Lane wrote:
"Michael Paesold" <[EMAIL PROTECTED]> writes:
To have other data, I have retested the patches on a single-cpu Intel P4
3GHz w/ HT (i.e. 2 virtual cpus), no EM64T. Comparing to the 2,4 dual-Xeon
results it's clear that this is in reality only one cpu. While the runtime
for N=1
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
> > I'm starting to think that we might have to succumb to having a compile
> > option "optimize for multiprocessor" or "optimize for single processor".
> > It's pretty hard to see how we'd alter a data structure decision like this on the fly.
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Tom Lane
> Sent: 13 September 2005 23:03
> To: Marko Kreen
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Spinlocks, yet again: analysis and
> proposed patch
Tom Lane wrote:
But the cmpb instruction in the 8.0 version of TAS would have done that,
and I think we've already established that the cmpb is a loss on most
machines (except maybe single-physical-CPU Xeons).
Note that this was a regular Pentium 4 system, not a Xeon.
Best Regards,
Michael Paesold
On Tue, 13 Sep 2005 Tom Lane wrote :
> "Min Xu (Hsu)" <[EMAIL PROTECTED]> writes:
> > ... As you said, however, experimental results
> > shows fetching read-only lines didn't help, which led me wonder whether the
> > second scenario your described was really happening.
>
> I don't know --- we haven't tried it.
"Min Xu (Hsu)" <[EMAIL PROTECTED]> writes:
> ... As you said, however, experimental results
> shows fetching read-only lines didn't help, which led me wonder whether the
> second scenario your described was really happening.
I don't know --- we haven't tried it. I do intend to work up some
patches
On Tue, 13 Sep 2005 Tom Lane wrote :
> "Min Xu (Hsu)" <[EMAIL PROTECTED]> writes:
> > ...If this were the case, perhaps first fetch the spin lock with read-only
> > permission should have helped.
>
> But the cmpb instruction in the 8.0 version of TAS would have done that,
> and I think we've already established that the cmpb is a loss on most machines
Stephen Frost <[EMAIL PROTECTED]> writes:
> I suspect distributors would go for the multi-cpu setup (especially if
> a uniprocessor build is *broken* for multiprocessor) and then in a
> lot of cases you end up not actually getting any benefit. I'm afraid
> you'd also end up having to tell a lot of
"Min Xu (Hsu)" <[EMAIL PROTECTED]> writes:
> ...If this were the case, perhaps first fetch the spin lock with read-only
> permission should have helped.
But the cmpb instruction in the 8.0 version of TAS would have done that,
and I think we've already established that the cmpb is a loss on most
machines (except maybe single-physical-CPU Xeons).
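
The cmpb being discussed is the read-only pre-check in the x86 TAS macro, i.e. "test and test-and-set". A C-level sketch of the two variants (the real code is assembly in s_lock.h; the GCC builtin is used here only for illustration):

    /* With the pre-check: peek at the lock non-atomically and skip the
     * expensive locked exchange while it looks busy. */
    static inline int
    tas_with_precheck(volatile unsigned char *lock)
    {
        if (*lock != 0)                             /* the "cmpb": read-only peek */
            return 1;                               /* looked busy, don't even try */
        return __sync_lock_test_and_set(lock, 1);   /* atomic exchange */
    }

    /* Without it (the proposed change): always go straight to the exchange. */
    static inline int
    tas_plain(volatile unsigned char *lock)
    {
        return __sync_lock_test_and_set(lock, 1);
    }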
* Gavin Sherry ([EMAIL PROTECTED]) wrote:
> It does make it painful for distribution/package maintainers but I think
> the potential benefits for single/multi-CPU architectures are high. It
> means that our lock intrinsic on uniprocessors can just be a lock/delay
> loop without any spinning -- which
On Tue, 13 Sep 2005, Stephen Frost wrote:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
> > I'm starting to think that we might have to succumb to having a compile
> > option "optimize for multiprocessor" or "optimize for single processor".
> > It's pretty hard to see how we'd alter a data structure decision like this on the fly.
* Tom Lane ([EMAIL PROTECTED]) wrote:
> I'm starting to think that we might have to succumb to having a compile
> option "optimize for multiprocessor" or "optimize for single processor".
> It's pretty hard to see how we'd alter a data structure decision like
> this on the fly.
I'd really hate to s
On Tue, 13 Sep 2005 Tom Lane wrote :
> I wrote:
> > We could ameliorate this if there were a way to acquire ownership of the
> > cache line without necessarily winning the spinlock.
>
> Another thought came to mind: maybe the current data layout for LWLocks
> is bad. Right now, the spinlock that protects each LWLock data struct
I wrote:
> We could ameliorate this if there were a way to acquire ownership of the
> cache line without necessarily winning the spinlock.
Another thought came to mind: maybe the current data layout for LWLocks
is bad. Right now, the spinlock that protects each LWLock data struct
is itself part of the struct
Marko Kreen writes:
> Hmm. I guess this could be separated into 2 cases:
> 1. Light load - both lock owner and lock requester wont get
>scheduled while busy (owner in critical section, waiter
>spinning.)
> 2. Big load - either or both of them gets scheduled while busy.
>(waiter is sc
On Tue, Sep 13, 2005 at 10:10:13AM -0400, Tom Lane wrote:
> Marko Kreen writes:
> > On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
> >> However, given that we are only expecting
> >> the spinlock to be held for a couple dozen instructions, using the
> >> kernel futex mechanism is huge overkill --- the in-kernel overhead
"Michael Paesold" <[EMAIL PROTECTED]> writes:
> To have other data, I have retested the patches on a single-cpu Intel P4
> 3GHz w/ HT (i.e. 2 virtual cpus), no EM64T. Comparing to the 2,4 dual-Xeon
> results it's clear that this is in reality only one cpu. While the runtime
> for N=1 is better t
* Tom Lane ([EMAIL PROTECTED]) wrote:
> I'm feeling even more disenchanted with sched_yield now that Marko
> pointed out that the behavior was changed recently. Here we have a
To be fair, I'm not entirely sure 'recently' is quite the right word.
It sounds like it changed during the 2.5 developmen
Douglas McNaught <[EMAIL PROTECTED]> writes:
> Greg Stark <[EMAIL PROTECTED]> writes:
>> What Tom found was that some processes are never scheduled when sched_yield
>> is
>> called. There's no reason that should be happening.
> Yeah, that would probably be a bug...
I suspect the kernel hackers m
On Tue, 13 Sep 2005 12:21:45 -0400
Douglas McNaught <[EMAIL PROTECTED]> wrote:
> Josh Berkus writes:
>
> > Tom, All:
> >
> >> It seems to me what you've found is an outright bug in the linux scheduler.
> >> Perhaps posting it to linux-kernel would be worthwhile.
> >
> > For people using this on Linux 2.6, which scheduler are you using?
Greg Stark <[EMAIL PROTECTED]> writes:
> What Tom found was that some processes are never scheduled when sched_yield is
> called. There's no reason that should be happening.
Yeah, that would probably be a bug...
-Doug
Douglas McNaught <[EMAIL PROTECTED]> writes:
> > It seems to me what you've found is an outright bug in the linux scheduler.
> > Perhaps posting it to linux-kernel would be worthwhile.
>
> People have complained on l-k several times about the 2.6
> sched_yield() behavior; the response has basical
Greg Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>
>> No; that page still says specifically "So a process calling
>> sched_yield() now must wait until all other runnable processes in the
>> system have used up their time slices before it will get the processor
>> again
Josh Berkus writes:
> Tom, All:
>
>> It seems to me what you've found is an outright bug in the linux scheduler.
>> Perhaps posting it to linux-kernel would be worthwhile.
>
> For people using this on Linux 2.6, which scheduler are you using? Deadline
> is the recommended one for databases, and does offer significant (+5-8%)
Tom, All:
> It seems to me what you've found is an outright bug in the linux scheduler.
> Perhaps posting it to linux-kernel would be worthwhile.
For people using this on Linux 2.6, which scheduler are you using? Deadline
is the recommended one for databases, and does offer significant (+5-8%)
Tom Lane <[EMAIL PROTECTED]> writes:
> > On contented case you'll want task switch anyway, so the futex
> > managing should not matter.
>
> No, we DON'T want a task switch. That's the entire point: in a
> multiprocessor, it's a good bet that the spinlock is held by a task
> running on another processor
Greg Stark <[EMAIL PROTECTED]> writes:
> Marko Kreen writes:
> (I speculate that it's set up to only yield the processor to other
> processes already affiliated to that processor. In any case, it
> is definitely capable of getting through 1 yields without
> running the guy who's holding the spinlock.)
Marko Kreen writes:
> On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
> > The second reason that the futex patch is helping is that when
> > a spinlock delay does occur, it allows the delaying process to be
> > awoken almost immediately, rather than delaying 10 msec or more
> > as the existing code does.
Marko Kreen writes:
> On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
>> However, given that we are only expecting
>> the spinlock to be held for a couple dozen instructions, using the
>> kernel futex mechanism is huge overkill --- the in-kernel overhead
>> to manage the futex state is a
Marko Kreen writes:
> > (I speculate that it's set up to only yield the processor to other
> > processes already affiliated to that processor. In any case, it
> > is definitely capable of getting through 1 yields without
> > running the guy who's holding the spinlock.)
Maybe it should try s
I wrote:
I'll do tomorrow morning (CEST, i.e. in about 11 hours).
These are the tests with the change:
if ((--spins % MAX_SPINS_PER_DELAY) == 0)
to
if (--spins == 0)
I have called the resulting patch (spin-delay + this change) spin-delay-2.
again with only slock-no-cmpb applied
1: 55s 4
On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
> The second reason that the futex patch is helping is that when
> a spinlock delay does occur, it allows the delaying process to be
> awoken almost immediately, rather than delaying 10 msec or more
> as the existing code does. However, given that we are only expecting the spinlock to be held for a couple dozen instructions
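
For readers unfamiliar with the futex mechanism mentioned here, a minimal Linux sketch of the wait/wake pair: the delayed process sleeps in the kernel on the lock word and is woken as soon as the holder releases, instead of sleeping a fixed 10 msec or more. This is only an illustration of the primitive, not the patch under discussion.

    #include <linux/futex.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Sleep only if *addr still holds the expected (locked) value. */
    static long
    futex_wait(int *addr, int expected)
    {
        return syscall(SYS_futex, addr, FUTEX_WAIT, expected, NULL, NULL, 0);
    }

    /* Wake one process sleeping on addr; called by the lock releaser. */
    static long
    futex_wake_one(int *addr)
    {
        return syscall(SYS_futex, addr, FUTEX_WAKE, 1, NULL, NULL, 0);
    }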
I wrote:
> ... and see how the patch does that way?
BTW, please do look at "vmstat 1" while running the test case
corresponding to your number of processors. It's hard to tell from the
runtime alone whether the patch is fully accomplishing its goal of
reducing wasted cycles. If you see user CPU
Tom Lane wrote:
I probably should have broken down the spindelay patch into multiple
components. But it's only a small change --- could you try simplifying
the patched line
if ((--spins % MAX_SPINS_PER_DELAY) == 0)
to
if (--spins == 0)
and see how the patch does that way?
I'll do tomorrow morning (CEST, i.e. in about 11 hours).
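
For context, a sketch of the kind of retry loop the quoted one-line change lives in, using the simplified countdown form being tested here. The constant, the sleep interval, and the GCC builtin are placeholders; the real loop is in s_lock.c.

    #include <unistd.h>

    #define MAX_SPINS_PER_DELAY 100     /* assumed value, for illustration only */

    static void
    s_lock_sketch(volatile unsigned char *lock)
    {
        int spins = MAX_SPINS_PER_DELAY;

        while (__sync_lock_test_and_set(lock, 1) != 0)
        {
            if (--spins == 0)           /* spin budget exhausted */
            {
                usleep(10000);          /* back off ~10 ms before spinning again */
                spins = MAX_SPINS_PER_DELAY;
            }
        }
    }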
"Michael Paesold" <[EMAIL PROTECTED]> writes:
> It seems to me the slock-no-cmpb is a win in any case. The spin-delay patch
> does not really help much on this machine. That seems to match Stephen
> Frost's results with EM64T, if I read them correctly.
Yeah, it's interesting that you both see sl
Tom Lane wrote:
Comments and testing invited.
I have tested the patches on a Dual Xeon 2,4 GHz w/ HT (no EM64T).
(Configured with
"CFLAGS='-O2 -mcpu=pentium4 -march=pentium4' --enable-casserts").
The results were pretty stable (around .2 seconds). I would not trust the
numbers for N=2, linux
Tom Lane wrote:
I attach two proposed patches: the first removes the cmpb/jne from
the x86 TAS assembly code, and the second one does the s_lock changes
enumerated as points #2, #3, #4. The first one in particular needs
more testing to see if it hurts performance on any non-Opteron x86
chips.
"Qingqing Zhou" <[EMAIL PROTECTED]> writes:
> Some changes are based on tests and heuristics, so can we make them into the
> configure script so the choice could be made automatically?
It's a bit premature to propose that, when we don't yet know if the
suggested changes are a win or loss under an
* Stephen Frost ([EMAIL PROTECTED]) wrote:
> > Thanks. If you've got the time, could you try the two patches
> > separately and see what you get?
>
> Sure.
[...]
Just to be clear- this was from a completely default 'make install'
using the Debian configure options from 8.0.3 (which aren't that
Tom Lane <[EMAIL PROTECTED]> writes:
> > Something else to consider is the OS you're using. I've been
> > told that Linux isn't that good in NUMA and FreeBSD might be
> > better.
>
> It's hard to see how the OS could affect behavior at the level of
> processor cache operations --- unless they d
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > * Tom Lane ([EMAIL PROTECTED]) wrote:
> >> Er, which (or both) of the two patches did you apply here?
>
> > Applied both, sorry that wasn't clear.
>
> Thanks. If you've got the time, could you try the two patches separately and see what you get?
"Tom Lane" <[EMAIL PROTECTED]> wrote
>
> My proposal therefore is to do #2, #3, and #4, and to modify the TAS
> assembly code at least on x86_64. Together, these changes bring
> my test case on a 4-way Opteron from
>
Some changes are based on tests and heuristics, so can we make them into the
configure script so the choice could be made automatically?
Stephen Frost <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> Er, which (or both) of the two patches did you apply here?
> Applied both, sorry that wasn't clear.
Thanks. If you've got the time, could you try the two patches
separately and see what you get?
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Stephen Frost <[EMAIL PROTECTED]> writes:
> > em64t, 2 proc + 2 HT, 3.4ghz, 4G, 2.6.12:
>
> > N, runtime: 1 31s 2 47s 4 86s 8 159s
>
> > N, runtime: 1 32s 2 53s 4 90s 8 169s
>
> Er, which (or both) of the two patches did you apply here?
Applied both, sorry that wasn't clear.
Stephen Frost <[EMAIL PROTECTED]> writes:
> em64t, 2 proc + 2 HT, 3.4ghz, 4G, 2.6.12:
> N, runtime: 1 31s 2 47s 4 86s 8 159s
> N, runtime: 1 32s 2 53s 4 90s 8 169s
Er, which (or both) of the two patches did you apply here?
regards, tom lane
* Tom Lane ([EMAIL PROTECTED]) wrote:
> My proposal therefore is to do #2, #3, and #4, and to modify the TAS
> assembly code at least on x86_64. Together, these changes bring
> my test case on a 4-way Opteron from
>
> N, runtime: 1 36s 2 61s 4 105s 8 198s
em64t, 2 proc + 2 HT, 3.4ghz, 4G, 2.6.12:
* Tom Lane ([EMAIL PROTECTED]) wrote:
> Kurt Roeckx <[EMAIL PROTECTED]> writes:
> > On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
> >> I kinda suspect that the cmpb test is a no-op or loss on all
> >> Intelish processors:
>
> > I think an important question is whether this is for x86_64
Kurt Roeckx <[EMAIL PROTECTED]> writes:
> On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
>> I kinda suspect that the cmpb test is a no-op or loss on all
>> Intelish processors:
> I think an important question is whether this is for x86_64 in
> general, or Opteron-specific. It could be t
On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote:
>
> I kinda suspect that the cmpb test is a no-op or loss on all
> Intelish processors: it can only be a win if you expect a lot
> of contention for the spin lock, but in percentage terms we still
> have a very low conflict rate, so in most
The test case I just posted shows that our spinlock code, which we had
thought largely done, is once again becoming a performance bottleneck.
It's time to resurrect some of the ideas we kicked around in early
2002, and didn't pursue because we decided spinlocks weren't our major
performance problem