Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-10-15 Thread Richard Neill
Dear Tom, Thanks for this, and sorry for not replying earlier. We finally obtained a window to deploy this patch on the real (rather busy!) production system as of last Saturday evening. The good news is that the patch has now been in place for 5 days, and, despite some very high loading, it

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-10-15 Thread Tom Lane
Richard Neill writes:
> The good news is that the patch has now been in place for 5 days, and,
> despite some very high loading, it has survived without a single crash.
> I'd venture to say that this issue is now fixed.
Great, thanks for the followup.
regards, tom lane

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-25 Thread Tom Lane
I wrote:
> Interestingly, the bug can no longer be reproduced in CVS HEAD, because
> pg_database no longer has a trigger. We had better fix it anyway of
> course, since future hash collisions are unpredictable. I'm wondering
> though whether to bother back-patching further than 8.4. Thoughts?
I

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-25 Thread Tom Lane
I wrote:
> I'll get you a real fix as soon as I can, but might not be till
> tomorrow.
The attached patch (against 8.4.x) fixes the problem as far as I can tell. Please test.
regards, tom lane
Index: src/backend/utils/cache/relcache.c

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-25 Thread Tom Lane
Heikki Linnakangas writes:
> Tom Lane wrote:
>> 2. By chance, a shared-cache-inval flush comes through while it's doing
>> that, causing all non-open, non-nailed relcache entries to be discarded.
>> Including, in particular, the one that is "next" according to the
>> hash_seq_search's status.
> I
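
The hazard described in step 2 can be illustrated with a toy chained hash table. This is only a sketch under stated assumptions: `Table`, `SeqStatus`, `seq_search`, `table_discard_all` and `count_stale_visits` are hypothetical names invented here, not PostgreSQL's dynahash or relcache code. The one property it borrows is that the sequential scan remembers the entry it will return *next*, so discarding entries in the middle of the loop leaves the scan holding a pointer to a dead entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NBUCKETS 4
#define NENTRIES 8

/* Hypothetical stand-in for a cache entry; not PostgreSQL's RelationData. */
typedef struct Entry {
    int key;
    bool live;              /* cleared when the entry is discarded */
    struct Entry *link;     /* next entry in the same bucket chain */
} Entry;

typedef struct Table {
    Entry *buckets[NBUCKETS];
    Entry pool[NENTRIES];   /* fixed storage, so stale pointers stay inspectable */
    int used;
} Table;

typedef struct SeqStatus {
    Table *tab;
    int next_bucket;
    Entry *next_entry;      /* the entry the scan will return NEXT */
} SeqStatus;

static void table_insert(Table *t, int key)
{
    assert(t->used < NENTRIES);
    Entry *e = &t->pool[t->used++];
    e->key = key;
    e->live = true;
    e->link = t->buckets[key % NBUCKETS];   /* push onto the head of the chain */
    t->buckets[key % NBUCKETS] = e;
}

/* Simulates a cache flush: every entry is discarded behind the scan's back. */
static void table_discard_all(Table *t)
{
    for (int i = 0; i < t->used; i++)
        t->pool[i].live = false;
    for (int b = 0; b < NBUCKETS; b++)
        t->buckets[b] = NULL;
}

/* Returns the next entry of the scan, or NULL when the scan is finished. */
static Entry *seq_search(SeqStatus *s)
{
    while (s->next_entry == NULL) {
        if (s->next_bucket >= NBUCKETS)
            return NULL;
        s->next_entry = s->tab->buckets[s->next_bucket++];
    }
    Entry *e = s->next_entry;
    s->next_entry = e->link;    /* remembered across the caller's loop body */
    return e;
}

/* Counts how many discarded entries the scan hands back to its caller. */
int count_stale_visits(bool flush_midscan)
{
    Table t = {0};
    for (int k = 0; k < NENTRIES; k++)
        table_insert(&t, k);

    SeqStatus s = { &t, 0, NULL };
    int visited = 0, stale = 0;
    Entry *e;
    while ((e = seq_search(&s)) != NULL) {
        if (!e->live)
            stale++;                    /* the scan returned a dead entry */
        if (flush_midscan && ++visited == 1)
            table_discard_all(&t);      /* flush arrives inside the loop body */
    }
    return stale;
}
```

Without a flush the scan never sees a discarded entry. With a flush after the first visit, the remembered "next" pointer is returned even though that entry is already gone. The toy keeps entries in a fixed pool so the stale entry can be inspected safely; with really freed memory the same access would be undefined behaviour, which is consistent with a segfault that only shows up occasionally.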

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Heikki Linnakangas
Tom Lane wrote:
> 2. By chance, a shared-cache-inval flush comes through while it's doing
> that, causing all non-open, non-nailed relcache entries to be discarded.
> Including, in particular, the one that is "next" according to the
> hash_seq_search's status.
I thought we have catchup interrupts

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Michael Brown
Tom Lane said:
> "Michael Brown" writes:
>> I have put in place a temporary workaround on the production system,
>> which is to insert a
>
>> // Pretend that the cache is always invalid
>> fprintf ( stderr, "*** bypassing cache ***\n" );
>> goto read_failed;
>
> I don't think this w

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Michael Brown
Tom Lane said:
> I shall go and do some further investigation, but at least it's now
> clear where to look. Thanks for the report, and for being so helpful in
> providing information!
Thank you! I have put in place a temporary workaround on the production system, which is to insert a
//

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Tom Lane
I wrote:
> But: the question at this point is why we've never seen such a report
> before 8.4. If this theory is correct, it's been broken for a *long*
> time. I can think of a couple of possible explanations:
> A: the problem can only manifest if this loop has work to do for
> a relcache entry

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Tom Lane
"Michael Brown" writes:
> If temporary table drops count towards this, then yes.
Yeah, they do.
> I could fairly easily change this procedure to truncate rather than drop
> the temporary table, if that would lessen the exposure to the problem.
> Would that be likely to help?
Very probably. It

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Tom Lane
"Michael Brown" writes:
> I have put in place a temporary workaround on the production system, which
> is to insert a
> // Pretend that the cache is always invalid
> fprintf ( stderr, "*** bypassing cache ***\n" );
> goto read_failed;
I don't think this will actually help --- i

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Tom Lane
Michael Brown writes:
>> ... (If you have a spare machine with the same OS and
>> the same postgres executables, maybe you could put the core file on that
>> and let me ssh in to have a look?)
[ ssh details ]
Thanks for letting me poke around. What I found out is that the hash_seq_search loop

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Michael Brown
On Thursday 24 September 2009 23:02:15 Michael Brown wrote:
> > I think this must mean that corrupt data is being read from the relcache
> > init file. The reason a restart fixes it is probably that restart
> > forcibly removes the old init file, which is good for recovery but not
> > so good for

Re: [BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-24 Thread Tom Lane
Richard Neill writes:
> I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and
> we've found that this is still happening repeatedly in 8.4.1.
Oh dear. I just got an off-list report that seems to point to the same kind of thing.
> The backtrace points to line 2654 in relcache

[BUGS] Postgresql 8.4.1 segfault, backtrace

2009-09-23 Thread Richard Neill
Dear All, I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and we've found that this is still happening repeatedly in 8.4.1. We're in a bit of a bind, as this is a production system, and we get segfaults every few hours. [It's a testament to how good the postgres crash r