Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Marko Kreen
On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote: The second reason that the futex patch is helping is that when a spinlock delay does occur, it allows the delaying process to be awoken almost immediately, rather than delaying 10 msec or more as the existing code does. However, given

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Michael Paesold
I wrote: I'll do tomorrow morning (CEST, i.e. in about 11 hours). These are the tests with the change: if ((--spins % MAX_SPINS_PER_DELAY) == 0) to if (--spins == 0) I have called the resulting patch (spin-delay + this change) spin-delay-2. again with only slock-no-cmpb applied 1: 55s
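The difference between the two conditions quoted above can be illustrated with a small counter model. This is only a sketch of the arithmetic, not the actual spin loop from the patch: the value of MAX_SPINS_PER_DELAY and the starting counts are hypothetical, since the listing shows neither. With the modulo test the condition fires every time the counter passes a multiple of the constant; with the plain zero test it fires once, when the counter runs out.

```python
# Hypothetical value, chosen only for illustration; the real constant is
# defined in the patch, which this listing does not show.
MAX_SPINS_PER_DELAY = 100


def fires_with_modulo(start_spins: int) -> int:
    """Count how often (--spins % MAX_SPINS_PER_DELAY) == 0 is true
    while counting down from start_spins to zero."""
    spins = start_spins
    fired = 0
    while spins > 0:
        spins -= 1  # models the C pre-decrement --spins
        if spins % MAX_SPINS_PER_DELAY == 0:
            fired += 1
    return fired


def fires_with_zero_check(start_spins: int) -> int:
    """Count how often --spins == 0 is true over the same countdown."""
    spins = start_spins
    fired = 0
    while spins > 0:
        spins -= 1
        if spins == 0:
            fired += 1
    return fired
```

Counting down from a hypothetical start of 250, the modulo form fires at 200, 100 and 0, while the zero check fires only at 0. The model stops at zero on purpose, because C and Python disagree on the sign of the remainder for negative operands.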

[HACKERS] Hard drive failure leads to corrupt db

2005-09-13 Thread Brusser, Michael
Our customer reported a problem resulting from the hard drive failure. Database server would not start, generating this message: PANIC: The database cluster was initialized with LC_COLLATE 'en_US.ISO8859-1', which is not recognized by setlocale(). It looks like you need to initdb. They are

Re: [HACKERS] Hard drive failure leads to corrupt db

2005-09-13 Thread Brusser, Michael
It just occurred to me: perhaps we don't have database corruption; instead, after replacement of the boot drive, the locale on the host changed from en_US.ISO8859-1 to 'C'. Still, I am not sure what to do. Is changing the locale back to en_US.ISO8859-1 the right thing to do now? Mike. -Original

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Greg Stark
Marko Kreen marko@l-t.ee writes: (I speculate that it's set up to only yield the processor to other processes already affiliated to that processor. In any case, it is definitely capable of getting through 1 yields without running the guy who's holding the spinlock.) Maybe it should

Re: [HACKERS] Hard drive failure leads to corrupt db

2005-09-13 Thread Peter Eisentraut
Brusser, Michael wrote: Our customer reported a problem resulting from the hard drive failure. Database server would not start, generating this message: PANIC: The database cluster was initialized with LC_COLLATE 'en_US.ISO8859-1', which is not recognized by setlocale(). The issue is

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Marko Kreen marko@l-t.ee writes: On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote: However, given that we are only expecting the spinlock to be held for a couple dozen instructions, using the kernel futex mechanism is huge overkill --- the in-kernel overhead to manage the futex state

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Greg Stark
Marko Kreen marko@l-t.ee writes: On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote: The second reason that the futex patch is helping is that when a spinlock delay does occur, it allows the delaying process to be awoken almost immediately, rather than delaying 10 msec or more as

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes: Marko Kreen marko@l-t.ee writes: (I speculate that it's set up to only yield the processor to other processes already affiliated to that processor. In any case, it is definitely capable of getting through 1 yields without running the guy who's holding

[HACKERS] bug #1702: nested composite types in plpgsql

2005-09-13 Thread Roman Neuhauser
Hello, I'm getting this error with the code below (on 8.0.3, like the other guy in #1702). Is this a hard problem to fix? Looking at src/pl/plpgsql/src/pl_exec.c for the first time, is it a problem of make_tuple_from_row() not accounting for nested composite types? test=# SELECT takes_ct2parts(1,

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: In the contended case you'll want a task switch anyway, so the futex management should not matter. No, we DON'T want a task switch. That's the entire point: in a multiprocessor, it's a good bet that the spinlock is held by a task running on another processor,

Re: [HACKERS] Hard drive failure leads to corrupt db

2005-09-13 Thread Martijn van Oosterhout
Your problem is that your database was initialised with locale 'en_US.ISO8859-1' but your system no longer recognises it. You need to create the locale somehow. On Linux it's /etc/locale.gen but you should probably search the locale manpage for how to do it on Solaris. Changing the locale
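Whether the host still recognises a given locale name can be checked without starting the server. The following is a minimal sketch using Python's standard locale module, which wraps the same setlocale() call named in the PANIC message; the helper name collation_locale_available is made up for this example and is not part of PostgreSQL.

```python
import locale


def collation_locale_available(name: str) -> bool:
    """Return True if setlocale() accepts `name` for LC_COLLATE on this host."""
    # Calling setlocale() with no second argument only queries the setting.
    saved = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        # Raised when the C library does not know the locale name.
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_COLLATE, saved)
```

Calling collation_locale_available("en_US.ISO8859-1") on the affected host should show whether the locale definition went missing with the replaced drive; the answer depends entirely on which locales are installed there.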

Re: [HACKERS] Hard drive failure leads to corrupt db

2005-09-13 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes: Brusser, Michael wrote: Our customer reported a problem resulting from the hard drive failure. Database server would not start, generating this message: PANIC: The database cluster was initialized with LC_COLLATE 'en_US.ISO8859-1', which is not

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Josh Berkus
Tom, All: It seems to me what you've found is an outright bug in the linux scheduler. Perhaps posting it to linux-kernel would be worthwhile. For people using this on Linux 2.6, which scheduler are you using? Deadline is the recommended one for databases, and does offer significant (+5-8%)

Re: [HACKERS] bug #1702: nested composite types in plpgsql

2005-09-13 Thread Tom Lane
Roman Neuhauser [EMAIL PROTECTED] writes: Looking at src/pl/plpgsql/src/pl_exec.c for the first time, is it a problem of make_tuple_from_row() not accounting for nested composite types? Looks that way. I've committed a fix to HEAD. I'm not sure how hard it'd be to fix 8.0.

Re: [HACKERS] Materialized Views in PostgreSQL

2005-09-13 Thread Josh Berkus
Dann, http://research.csc.ncsu.edu/selftune/ I think someone is working on merging this stuff back into the PostgreSQL core.  Not sure what the current status is. I was in theory working on testing and documentation. Others (better coders than me) are welcome to take it over, though. --

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Douglas McNaught
Josh Berkus josh@agliodbs.com writes: Tom, All: It seems to me what you've found is an outright bug in the linux scheduler. Perhaps posting it to linux-kernel would be worthwhile. For people using this on Linux 2.6, which scheduler are you using? Deadline is the recommended one for

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Douglas McNaught
Greg Stark [EMAIL PROTECTED] writes: Tom Lane [EMAIL PROTECTED] writes: No; that page still says specifically So a process calling sched_yield() now must wait until all other runnable processes in the system have used up their time slices before it will get the processor again. I can prove

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Greg Stark
Douglas McNaught [EMAIL PROTECTED] writes: It seems to me what you've found is an outright bug in the linux scheduler. Perhaps posting it to linux-kernel would be worthwhile. People have complained on l-k several times about the 2.6 sched_yield() behavior; the response has basically been

[HACKERS] postgresql CVS callgraph data from dbt2

2005-09-13 Thread Mark Wong
Hi everyone, For those of you watching the daily results generated from STP (http://developer.osdl.org/markw/postgrescvs/dbt2/) I have callgraph data from oprofile collected starting from the Sept 9 results. Here is an example of what it looks like:

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Douglas McNaught
Greg Stark [EMAIL PROTECTED] writes: What Tom found was that some processes are never scheduled when sched_yield is called. There's no reason that should be happening. Yeah, that would probably be a bug... -Doug

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Mark Wong
On Tue, 13 Sep 2005 12:21:45 -0400 Douglas McNaught [EMAIL PROTECTED] wrote: Josh Berkus josh@agliodbs.com writes: Tom, All: It seems to me what you've found is an outright bug in the linux scheduler. Perhaps posting it to linux-kernel would be worthwhile. For people using this on

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Douglas McNaught [EMAIL PROTECTED] writes: Greg Stark [EMAIL PROTECTED] writes: What Tom found was that some processes are never scheduled when sched_yield is called. There's no reason that should be happening. Yeah, that would probably be a bug... I suspect the kernel hackers might

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote: I'm feeling even more disenchanted with sched_yield now that Marko pointed out that the behavior was changed recently. Here we have a To be fair, I'm not entirely sure 'recently' is quite the right word. It sounds like it changed during the 2.5 development

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Michael Paesold [EMAIL PROTECTED] writes: To have other data, I have retested the patches on a single-cpu Intel P4 3GHz w/ HT (i.e. 2 virtual cpus), no EM64T. Comparing to the 2,4 dual-Xeon results it's clear that this is in reality only one cpu. While the runtime for N=1 is better than the

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Marko Kreen
On Tue, Sep 13, 2005 at 10:10:13AM -0400, Tom Lane wrote: Marko Kreen marko@l-t.ee writes: On Sun, Sep 11, 2005 at 05:59:49PM -0400, Tom Lane wrote: However, given that we are only expecting the spinlock to be held for a couple dozen instructions, using the kernel futex mechanism is huge

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Marko Kreen marko@l-t.ee writes: Hmm. I guess this could be separated into 2 cases: 1. Light load - both lock owner and lock requester won't get scheduled while busy (owner in critical section, waiter spinning.) 2. Big load - either or both of them get scheduled while busy.

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
I wrote: We could ameliorate this if there were a way to acquire ownership of the cache line without necessarily winning the spinlock. Another thought came to mind: maybe the current data layout for LWLocks is bad. Right now, the spinlock that protects each LWLock data struct is itself part of

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Min Xu (Hsu)
On Tue, 13 Sep 2005 Tom Lane wrote : I wrote: We could ameliorate this if there were a way to acquire ownership of the cache line without necessarily winning the spinlock. Another thought came to mind: maybe the current data layout for LWLocks is bad. Right now, the spinlock that

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote: I'm starting to think that we might have to succumb to having a compile option optimize for multiprocessor or optimize for single processor. It's pretty hard to see how we'd alter a data structure decision like this on the fly. I'd really hate to see this

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Gavin Sherry
On Tue, 13 Sep 2005, Stephen Frost wrote: * Tom Lane ([EMAIL PROTECTED]) wrote: I'm starting to think that we might have to succumb to having a compile option optimize for multiprocessor or optimize for single processor. It's pretty hard to see how we'd alter a data structure decision like

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Stephen Frost
* Gavin Sherry ([EMAIL PROTECTED]) wrote: It does make it painful for distribution/package maintainers but I think the potential benefits for single/multi-CPU architectures are high. It means that our lock intrinsic on uniprocessors can just be a lock/delay loop without any spinning -- which

[HACKERS] About method of PostgreSQL's Optimizer

2005-09-13 Thread Pryscila B Guttoski
Hello all! In my master's course, I'm studying PostgreSQL's optimizer. I don't know if anyone on this list has participated in the development of PostgreSQL's optimizer, but maybe someone can help me with this question. PostgreSQL generates all possible plans for executing the query (using an

[HACKERS] 8.1 system info / admin functions

2005-09-13 Thread Neil Conway
Two minor gripes about these new functions: (1) I think pg_total_relation_size() is a bit more concise and clear than pg_complete_relation_size(). (2) pg_cancel_backend(), pg_reload_conf(), and pg_rotate_logfile() all return an int indicating success (1) or failure (0). Why shouldn't these

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Min Xu (Hsu) [EMAIL PROTECTED] writes: ...If this were the case, perhaps first fetching the spin lock with read-only permission should have helped. But the cmpb instruction in the 8.0 version of TAS would have done that, and I think we've already established that the cmpb is a loss on most

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes: I suspect distributors would go for the multi-cpu setup (especially if a uniprocessor build is *broken* for multiprocessor) and then in a lot of cases you end up not actually getting any benefit. I'm afraid you'd also end up having to tell a lot of

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Min Xu (Hsu)
On Tue, 13 Sep 2005 Tom Lane wrote: Min Xu (Hsu) [EMAIL PROTECTED] writes: ...If this were the case, perhaps first fetching the spin lock with read-only permission should have helped. But the cmpb instruction in the 8.0 version of TAS would have done that, and I think we've already

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Tom Lane
Min Xu (Hsu) [EMAIL PROTECTED] writes: ... As you said, however, experimental results show that fetching read-only lines didn't help, which led me to wonder whether the second scenario you described was really happening. I don't know --- we haven't tried it. I do intend to work up some patches for

Re: [HACKERS] About method of PostgreSQL's Optimizer

2005-09-13 Thread Jonah H. Harris
Pryscila, While I haven't been too involved in the open source PostgreSQL optimizer, I have done some work on it and optimizers in other database systems. Based on my work, it is my opinion that PostgreSQL, as-well-as other databases which use a cost-based optimizer, prefer a breadth-first

Re: [HACKERS] 8.1 system info / admin functions

2005-09-13 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: Two minor gripes about these new functions: (1) I think pg_total_relation_size() is a bit more concise and clear than pg_complete_relation_size(). (2) pg_cancel_backend(), pg_reload_conf(), and pg_rotate_logfile() all return an int indicating success

Re: [HACKERS] 8.1 system info / admin functions

2005-09-13 Thread Neil Conway
Tom Lane wrote: If we weren't already forcing an initdb for beta2, I'd say it's a bit late to be complaining ... but we can fix them for free right now, so why not? Ok, I'll take a look. While we're on the subject, the units used by pg_size_pretty() are incorrect, at least according to the

Re: [HACKERS] 8.1 system info / admin functions

2005-09-13 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: While we're on the subject, the units used by pg_size_pretty() are incorrect, at least according to the IEC: for example, MB is strictly-speaking one million bytes, not 1024^2 bytes. 1024^2 bytes is 1 MiB (similarly for KiB, GiB, and TiB). I'll take a
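The unit distinction raised here is easy to check numerically. The sketch below only illustrates the arithmetic of the two conventions (decimal MB versus binary MiB); it does not reproduce how pg_size_pretty() is implemented, and the function names are made up for the example.

```python
# Decimal (SI) and binary (IEC) definitions of a "megabyte".
BYTES_PER_MB = 1000 ** 2   # 1 MB  = 1,000,000 bytes
BYTES_PER_MIB = 1024 ** 2  # 1 MiB = 1,048,576 bytes


def bytes_to_mb(num_bytes: int) -> float:
    """Convert a byte count to decimal megabytes (MB)."""
    return num_bytes / BYTES_PER_MB


def bytes_to_mib(num_bytes: int) -> float:
    """Convert a byte count to binary mebibytes (MiB)."""
    return num_bytes / BYTES_PER_MIB
```

For 1024^2 bytes, bytes_to_mib() gives exactly 1.0 while bytes_to_mb() gives about 1.05, so the two conventions differ by roughly 5% at this scale and the gap widens with each larger prefix.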

Re: [HACKERS] Spinlocks, yet again: analysis and proposed patches

2005-09-13 Thread Min Xu (Hsu)
On Tue, 13 Sep 2005 Tom Lane wrote: Min Xu (Hsu) [EMAIL PROTECTED] writes: ... As you said, however, experimental results show that fetching read-only lines didn't help, which led me to wonder whether the second scenario you described was really happening. I don't know --- we haven't tried

[HACKERS] VACUUM VERBOSE 8.1dev

2005-09-13 Thread Joshua D. Drake
Hello, It seems the new VACUUM VERBOSE output is not quite as helpful as 8.0. In 8.0 I get a nice output at the end like this: INFO: free space map: 1377 relations, 22478 pages stored; 44112 total pages needed DETAIL: Allocated FSM size: 10 relations + 200 pages = 17702 kB shared

Re: [HACKERS] VACUUM VERBOSE 8.1dev

2005-09-13 Thread Joshua D. Drake
INFO: free space map: 1377 relations, 22478 pages stored; 44112 total pages needed DETAIL: Allocated FSM size: 10 relations + 200 pages = 17702 kB shared memory. AFAIR, in both 8.0 and 8.1 that info appears only at the end of a database-wide vacuum. Your example is a

Re: [HACKERS] VACUUM VERBOSE 8.1dev

2005-09-13 Thread Tom Lane
Joshua D. Drake [EMAIL PROTECTED] writes: It seems the new VACUUM VERBOSE output is not quite as helpful as 8.0. In 8.0 I get a nice output at the end like this: INFO: free space map: 1377 relations, 22478 pages stored; 44112 total pages needed DETAIL: Allocated FSM size: 10 relations

Re: [HACKERS] 8.1 system info / admin functions

2005-09-13 Thread Neil Conway
Tom Lane wrote: [ itch... ] The IEC may think they get to define what's correct, but I don't think that squares with common usage. The only people who think MB is measured in decimal are disk-manufacturer marketroids. Well, just them and the IEEE :) While common usage has been to use kB to