Dear Tom,
Thanks for this, and sorry for not replying earlier. We finally
obtained a window to deploy this patch on the real (rather busy!)
production system last Saturday evening.
The good news is that the patch has now been in place for 5 days, and,
despite some very high loading, it has survived without a single crash.
I'd venture to say that this issue is now fixed.
Richard Neill writes:
> The good news is that the patch has now been in place for 5 days, and,
> despite some very high loading, it has survived without a single crash.
> I'd venture to say that this issue is now fixed.
Great, thanks for the followup.
regards, tom lane
I wrote:
> Interestingly, the bug can no longer be reproduced in CVS HEAD, because
> pg_database no longer has a trigger. We had better fix it anyway of
> course, since future hash collisions are unpredictable. I'm wondering
> though whether to bother back-patching further than 8.4. Thoughts?
I
I wrote:
> I'll get you a real fix as soon as I can, but might not be till
> tomorrow.
The attached patch (against 8.4.x) fixes the problem as far as I can
tell. Please test.
regards, tom lane
Index: src/backend/utils/cache/relcache.c
Heikki Linnakangas writes:
> Tom Lane wrote:
>> 2. By chance, a shared-cache-inval flush comes through while it's doing
>> that, causing all non-open, non-nailed relcache entries to be discarded.
>> Including, in particular, the one that is "next" according to the
>> hash_seq_search's status.
> I
Tom Lane wrote:
> 2. By chance, a shared-cache-inval flush comes through while it's doing
> that, causing all non-open, non-nailed relcache entries to be discarded.
> Including, in particular, the one that is "next" according to the
> hash_seq_search's status.
I thought we have catchup interrupts
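To illustrate the failure mode described in the quoted paragraph, here is a
minimal standalone C sketch. It is not PostgreSQL source: the Entry, SeqScan,
scan_next and flush_entry names are invented for the example. The scan saves a
pointer to the entry it will visit next; when a simulated invalidation flush
frees exactly that entry, resuming the scan walks through freed memory, which
is the crash pattern described for hash_seq_search over relcache entries.

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct Entry
    {
        int           key;
        struct Entry *next;
    } Entry;

    typedef struct SeqScan
    {
        Entry *curr;                /* the entry the scan will return next */
    } SeqScan;

    static Entry *bucket = NULL;    /* one bucket chain keeps the sketch small */

    static Entry *
    scan_next(SeqScan *scan)
    {
        Entry *e = scan->curr;

        if (e != NULL)
            scan->curr = e->next;   /* remember where to resume later */
        return e;
    }

    /* Simulates an inval flush discarding a non-open, non-nailed entry. */
    static void
    flush_entry(Entry *victim)
    {
        Entry **link;

        for (link = &bucket; *link != NULL; link = &(*link)->next)
        {
            if (*link == victim)
            {
                *link = victim->next;
                free(victim);       /* the scan's saved pointer now dangles */
                return;
            }
        }
    }

    int
    main(void)
    {
        /* build a three-entry chain: 1 -> 2 -> 3 */
        for (int key = 3; key >= 1; key--)
        {
            Entry *e = malloc(sizeof(Entry));

            e->key = key;
            e->next = bucket;
            bucket = e;
        }

        SeqScan scan = { bucket };

        printf("visited %d\n", scan_next(&scan)->key);  /* visits 1, saves 2 */

        flush_entry(scan.curr);     /* flush removes the saved "next" entry */

        /* resuming the scan now reads freed memory: undefined behavior */
        printf("visited %d (stale)\n", scan_next(&scan)->key);

        return 0;
    }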
Tom Lane said:
> "Michael Brown" writes:
>> I have put in place a temporary workaround on the production system,
>> which is to insert a
>
>> // Pretend that the cache is always invalid
>> fprintf ( stderr, "*** bypassing cache ***\n" );
>> goto read_failed;
>
> I don't think this w
Tom Lane said:
> I shall go and do some further investigation, but at least it's now
> clear where to look. Thanks for the report, and for being so helpful in
> providing information!
Thank you!
I have put in place a temporary workaround on the production system, which
is to insert a
// Pretend that the cache is always invalid
fprintf ( stderr, "*** bypassing cache ***\n" );
goto read_failed;
I wrote:
> But: the question at this point is why we've never seen such a report
> before 8.4. If this theory is correct, it's been broken for a *long*
> time. I can think of a couple of possible explanations:
> A: the problem can only manifest if this loop has work to do for
> a relcache entry
"Michael Brown" writes:
> If temporary table drops count towards this, then yes.
Yeah, they do.
> I could fairly easily change this procedure to truncate rather than drop
> the temporary table, if that would lessen the exposure to the problem.
> Would that be likely to help?
Very probably. It
"Michael Brown" writes:
> I have put in place a temporary workaround on the production system, which
> is to insert a
> // Pretend that the cache is always invalid
> fprintf ( stderr, "*** bypassing cache ***\n" );
> goto read_failed;
I don't think this will actually help --- i
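For context on where that three-line hack sits, below is a schematic sketch of
the load-from-init-file pattern it short-circuits. This is an assumed shape
only, not the actual load_relcache_init_file code from relcache.c: on any read
problem the loader jumps to a read_failed label, discards the partial result,
and returns false so the caller rebuilds the cache from the system catalogs.
The workaround simply forces that path unconditionally.

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    static bool
    load_cache_init_file(const char *path)
    {
        FILE *fp = fopen(path, "rb");
        int   nentries;

        if (fp == NULL)
            return false;           /* no init file: rebuild from catalogs */

        if (fread(&nentries, sizeof(nentries), 1, fp) != 1)
            goto read_failed;

        for (int i = 0; i < nentries; i++)
        {
            char buf[256];

            if (fread(buf, sizeof(buf), 1, fp) != 1)
                goto read_failed;   /* truncated or corrupt entry */
            /* ... install the entry into the in-memory cache ... */
        }

        fclose(fp);
        return true;

    read_failed:
        /* the temporary workaround above forces this path every time */
        fclose(fp);
        return false;               /* caller rebuilds the cache the slow way */
    }

    int
    main(void)
    {
        printf("loaded from init file: %s\n",
               load_cache_init_file("pg_internal.init") ? "yes" : "no");
        return 0;
    }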
Michael Brown writes:
>> ... (If you have a spare machine with the same OS and
>> the same postgres executables, maybe you could put the core file on that
>> and let me ssh in to have a look?)
[ ssh details ]
Thanks for letting me poke around. What I found out is that the
hash_seq_search loop
On Thursday 24 September 2009 23:02:15 Michael Brown wrote:
> > I think this must mean that corrupt data is being read from the relcache
> > init file. The reason a restart fixes it is probably that restart
> > forcibly removes the old init file, which is good for recovery but not
> > so good for
Richard Neill writes:
> I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and
> we've found that this is still happening repeatedly in 8.4.1.
Oh dear. I just got an off-list report that seems to point to the same
kind of thing.
> The backtrace points to line 2654 in relcache
Dear All,
I've just upgraded from 8.4.0 to 8.4.1 because of a segfault in 8.4, and
we've found that this is still happening repeatedly in 8.4.1. We're in a
bit of a bind, as this is a production system, and we get segfaults
every few hours.
[It's a testament to how good the postgres crash r