Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2006-06-14 Thread Bruce Momjian
Added to TODO: * Consider increasing internal areas when shared buffers is increased http://archives.postgresql.org/pgsql-hackers/2005-10/msg01419.php --- Alvaro Herrera wrote: > Jim C. Nasby wro

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-03 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > Seriously, I am wondering about the performance hit of always checking > debug_assertions. > http://archives.postgresql.org/pgsql-hackers/2005-08/msg00389.php > indicates that even with debug_assertions=false, --enable-cassert has a > big performance imp

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-03 Thread Jim C. Nasby
On Thu, Nov 03, 2005 at 11:11:42AM -0500, Tom Lane wrote: > Perhaps rather than an all-or-nothing debug_assertions GUC variable, > what we want is something that turns on or off "expensive" assertion > checks at runtime. This could include MEMORY_CONTEXT_CHECKING and > anything else where the actu

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-03 Thread Tom Lane
Alvaro Herrera <[EMAIL PROTECTED]> writes: > May I propose to make Assert() yield only WARNINGs, That is a horrid idea --- for one thing, it means that Asserts inside the elog machinery itself would be instant infinite recursion, and even elsewhere you'd have to think a bit about whether it's ok t

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-03 Thread Alvaro Herrera
Jim C. Nasby wrote: > BTW, is MEMORY_CONTEXT_CHECKING that expensive? It seems like it > shouldn't be, but I'm only guessing at what exactly it does... Yes, because not only it checks marker bytes at the end of palloc chunks and similar stuff, but it also overwrites whole contexts with 0x7f when

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-02 Thread Jim C. Nasby
On Wed, Nov 02, 2005 at 02:04:09PM -0500, Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > BTW, that's a reversal from what I was originally arguing for, which was > > due to the performance penalty associated with --enable-cassert. My > > client is now running with Tom's suggestion

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-02 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > BTW, that's a reversal from what I was originally arguing for, which was > due to the performance penalty associated with --enable-cassert. My > client is now running with Tom's suggestion of commenting out > CLOBBER_FREED_MEMORY and MEMORY_CONTEXT_CHECK

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-02 Thread Jim C. Nasby
On Wed, Nov 02, 2005 at 07:03:57AM -0500, Greg Stark wrote: > > I would bet that ninety percent of the Asserts in the existing code are on > > conditions that could represent, at worst, corruption of backend-local or > > even transaction-local data structures. Taking down the entire database > > cl

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-02 Thread Greg Stark
Tom Lane <[EMAIL PROTECTED]> writes: > Greg Stark <[EMAIL PROTECTED]> writes: > > I happen to think that except for the rare assertion that has major > > performance impact all the assertions should be on in production builds. The > > goal of assertions is to catch corruption quickly and that's s

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-01 Thread Tom Lane
Greg Stark <[EMAIL PROTECTED]> writes: > I happen to think that except for the rare assertion that has major > performance impact all the assertions should be on in production builds. The > goal of assertions is to catch corruption quickly and that's something that's > just as important in producti

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-11-01 Thread Greg Stark
Bruce Momjian writes: > Jim C. Nasby wrote: > > > My assumption is that the asserts that are currently in place fall into > > one of two categories: either they check for something that if false > > could result in data corruption in the heap, or they check for something > > that shouldn't happe

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > Doesn't clog use the same code? Yeah, but all three of your examples were referencing pg_subtrans, as proven by the stack traces and the contents of the shared control block. Even though the bug seems completely clog.c's fault, this is not a coincidenc

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Tom Lane
I wrote: > Even though the bug seems completely clog.c's fault, s/clog.c/slru.c/ of course :-( regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Jim C. Nasby
On Tue, Nov 01, 2005 at 11:23:55AM -0300, Alvaro Herrera wrote: > Tom Lane wrote: > > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > > AFAIK they're not using subtransactions at all, but I'll check. > > > > Well, yeah, they are ... else you'd never have seen this failure. > > Maybe it's in plpgsq

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Jim C. Nasby
On Tue, Nov 01, 2005 at 11:23:55AM -0300, Alvaro Herrera wrote: > Tom Lane wrote: > > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > > AFAIK they're not using subtransactions at all, but I'll check. > > > > Well, yeah, they are ... else you'd never have seen this failure. > > Maybe it's in plpgsq

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-11-01 Thread Alvaro Herrera
Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > AFAIK they're not using subtransactions at all, but I'll check. > > Well, yeah, they are ... else you'd never have seen this failure. Maybe it's in plpgsql EXCEPTION clauses. -- Alvaro Herrera Valdivia, Chile ICBM: S 39ยบ 4

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > AFAIK they're not using subtransactions at all, but I'll check. Well, yeah, they are ... else you'd never have seen this failure. regards, tom lane ---(end of broadcast)--- TIP 9:

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
On Mon, Oct 31, 2005 at 09:02:59PM -0300, Alvaro Herrera wrote: > Jim C. Nasby wrote: > > Now that I've got a little better idea of what this code does, I've > > noticed something interesting... this issue is happening on an 8-way > > machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Alvaro Herrera
Jim C. Nasby wrote: > Now that I've got a little better idea of what this code does, I've > noticed something interesting... this issue is happening on an 8-way > machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this > greatly increase the odds of buffer conflicts? Bug aside, would

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
Now that I've got a little better idea of what this code does, I've noticed something interesting... this issue is happening on an 8-way machine, and NUM_SLRU_BUFFERS is currently defined at 8. Doesn't this greatly increase the odds of buffer conflicts? Bug aside, would it be better to set NUM_SLRU

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-10-31 Thread Gregory Maxwell
On 10/31/05, Jim C. Nasby <[EMAIL PROTECTED]> wrote: > On Mon, Oct 31, 2005 at 01:34:17PM -0500, Bruce Momjian wrote: > > There is no way if the system has some incorrect value whether that > > would later corrupt the data or not. Anything the system does that it > > shouldn't do is a potential co

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-10-31 Thread Jim C. Nasby
On Mon, Oct 31, 2005 at 01:34:17PM -0500, Bruce Momjian wrote: > There is no way if the system has some incorrect value whether that > would later corrupt the data or not. Anything the system does that it > shouldn't do is a potential corruption problem. But is it safe to say that there are areas

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-10-31 Thread Bruce Momjian
Jim C. Nasby wrote: > On Mon, Oct 31, 2005 at 01:01:14PM -0500, Bruce Momjian wrote: > > > This incident has made me wonder if it's worth creating two classes of > > > assertions. The (hopefully more common) set of assertions would be for > > > things that shouldn't happen, but if go un-caught won'

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-10-31 Thread Jim C. Nasby
On Mon, Oct 31, 2005 at 01:01:14PM -0500, Bruce Momjian wrote: > > This incident has made me wonder if it's worth creating two classes of > > assertions. The (hopefully more common) set of assertions would be for > > things that shouldn't happen, but if go un-caught won't result in heap > > corrupt

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-10-31 Thread Bruce Momjian
Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: > >> This won't do as a permanent patch, because it isn't guaranteed to fix > >> the problem on machines that don't strongly order writes, but it should > >> work on Opterons,

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
On Mon, Oct 31, 2005 at 01:05:06PM -0500, Tom Lane wrote: > "Jim C. Nasby" <[EMAIL PROTECTED]> writes: > > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: > >> This won't do as a permanent patch, because it isn't guaranteed to fix > >> the problem on machines that don't strongly order wri

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Tom Lane
"Jim C. Nasby" <[EMAIL PROTECTED]> writes: > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: >> This won't do as a permanent patch, because it isn't guaranteed to fix >> the problem on machines that don't strongly order writes, but it should >> work on Opterons, at least well enough to co

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags

2005-10-31 Thread Bruce Momjian
Jim C. Nasby wrote: > On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: > > I'd like Jim to test this theory by seeing if it helps to reverse the > > order of the if-test elements at lines 294/295, ie make it look like > > > > if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PR

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
Sorry, two more things... Will increasing shared_buffers make this less likely to occur? Or is this just something that's likely to happen when there are things like seqscans that are putting buffers near the front of the LRU? (The 8.0.3 buffer manager does something like that, right?) Is this so

Re: slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-31 Thread Jim C. Nasby
On Sun, Oct 30, 2005 at 06:17:53PM -0500, Tom Lane wrote: > I'd like Jim to test this theory by seeing if it helps to reverse the > order of the if-test elements at lines 294/295, ie make it look like > > if (shared->page_status[slotno] != SLRU_PAGE_READ_IN_PROGRESS || > shared

slru.c race condition (was Re: [HACKERS] TRAP: FailedAssertion("!((itemid)->lp_flags & 0x01)",)

2005-10-30 Thread Tom Lane
OK, I think I see it. The problem is that the code in slru.c is careful about not modifying state when it doesn't hold the proper lock, but not so careful about not *inspecting* state without the proper lock. In particular consider these lines in SimpleLruReadPage (line numbers are as in CVS tip)