Re: [HACKERS] catalog corruption bug

2006-01-09 Thread Jeremy Drake
On Mon, 9 Jan 2006, Tom Lane wrote: > Does your application drop these temp tables explicitly, or leave them > to be dropped automatically during commit? It might be interesting to > see whether changing that makes any difference. I drop them explicitly at the end of the function. > I'm also cu

Re: [HACKERS] catalog corruption bug

2006-01-09 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > I ran without that function you made, and it got the error, but not a > crash. I stuck an Assert(false) right before the ereport for that > particular error, and I did end up with a core there, but I don't see > anything out of the ordinary (what little I

Re: [HACKERS] catalog corruption bug

2006-01-09 Thread Jeremy Drake
On Sun, 8 Jan 2006, Tom Lane wrote: > Yeah, that's not very surprising. Running the forced-cache-resets > function will definitely expose that catcache bug pretty quickly. > You'd need to apply the patches I put in yesterday to have a system > that has any chance of withstanding that treatment fo

Re: [HACKERS] catalog corruption bug

2006-01-08 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > On Sat, 7 Jan 2006, Tom Lane wrote: >> A bit of a leap in the dark, but: maybe the triggering event for this >> situation is not a "VACUUM pg_amop" but a global cache reset due to >> sinval message buffer overrun. > I tried that function you sent, while r

Re: [HACKERS] catalog corruption bug

2006-01-07 Thread Jeremy Drake
On Sat, 7 Jan 2006, Tom Lane wrote: > Jeremy Drake <[EMAIL PROTECTED]> writes: > > On Sat, 7 Jan 2006, Tom Lane wrote: > >> I'll go fix CatCacheRemoveCList, but I think this is not the bug > >> we're looking for. > > A bit of a leap in the dark, but: maybe the triggering event for this > situation

Re: [HACKERS] catalog corruption bug

2006-01-07 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > On Sat, 7 Jan 2006, Tom Lane wrote: >> I'll go fix CatCacheRemoveCList, but I think this is not the bug >> we're looking for. > Incidentally, one of my processes did get that error at the same time. > All of the other processes had an error > DBD::Pg::st

Re: [HACKERS] catalog corruption bug

2006-01-07 Thread Jeremy Drake
On Sat, 7 Jan 2006, Tom Lane wrote: > Jeremy Drake <[EMAIL PROTECTED]> writes: > > Am I correct in interpreting this as the hash opclass for Oid? > > However, AFAICS the only consequence of this bug is to trigger > that Assert failure if you've got Asserts enabled. Dead catcache > entries aren't

Re: [HACKERS] catalog corruption bug

2006-01-07 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > Am I correct in interpreting this as the hash opclass for Oid? No, it's the AMOPOPID catalog cache (containing rows from pg_amop indexed by amopopr/amopclaid). After digging around for a bit I noticed that catalog caches get flushed if someone vacuums th

Re: [HACKERS] catalog corruption bug

2006-01-07 Thread Jeremy Drake
On Sat, 7 Jan 2006, Tom Lane wrote: > Fascinating --- that's not anywhere near where I thought your problem > was. Which cache is this tuple in? (Print *ct->my_cache) $2 = { id = 3, cc_next = 0x2aac1048, cc_relname = 0x2ab19df8 "pg_amop", cc_reloid = 2602, cc_indexoid = 2654,

Re: [HACKERS] catalog corruption bug

2006-01-07 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > I got an assert to fail. I'm not entirely sure if this is helpful, but I > managed to get a core dump with --enable-debug and --enable-cassert (with > optimizations still on). Let me know if there is anything else that would > be useful to get out of thi

Re: [HACKERS] catalog corruption bug

2006-01-07 Thread Jeremy Drake
On Fri, 6 Jan 2006, Tom Lane wrote: > OK, this must be a different issue then. I think we have seen reports > like this one before, but not been able to reproduce it. > > Could you rebuild with Asserts enabled and see if any asserts trigger? I got an assert to fail. I'm not entirely sure if thi

Re: [HACKERS] catalog corruption bug

2006-01-06 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: >>> DBD::Pg::st execute failed: ERROR: duplicate key violates unique >>> constraint "pg_type_typname_nsp_index" >> >> Hm, did you REINDEX things beforehand? This could be leftover corruption... > Yes. I ran that VACUUM FULL ANALYZE VERBOSE which I email

Re: [HACKERS] catalog corruption bug

2006-01-06 Thread Jeremy Drake
On Fri, 6 Jan 2006, Tom Lane wrote: > Jeremy Drake <[EMAIL PROTECTED]> writes: > > Well, I applied that patch that you sent me the link to (the bufmgr.c > > one), and rebuilt (PORTDIR_OVERLAY is cool...) > > > I ran my nine processes which hammer things overnight, and in the > > morning one of the

Re: [HACKERS] catalog corruption bug

2006-01-06 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > Well, I applied that patch that you sent me the link to (the bufmgr.c > one), and rebuilt (PORTDIR_OVERLAY is cool...) > I ran my nine processes which hammer things overnight, and in the > morning one of them was dead. > DBD::Pg::st execute failed: ERROR

Re: [HACKERS] catalog corruption bug

2006-01-06 Thread Jeremy Drake
On Thu, 5 Jan 2006, Tom Lane wrote: > The ReadBuffer bug I just fixed could result in disappearance of catalog > rows, so this observation is consistent with the theory that that's > what's biting you. It's not proof though... Well, I applied that patch that you sent me the link to (the bufmgr.c

Re: [HACKERS] catalog corruption bug

2006-01-05 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > Here is some additional information that I have managed to gather today > regarding this. It is not really what causes it, so much as what does > not. > ... > Similar for pg_type, there being 248 index row versions vs 244 row > versions in the table. The

Re: [HACKERS] catalog corruption bug

2006-01-05 Thread Jeremy Drake
Here is some additional information that I have managed to gather today regarding this. It is not really what causes it, so much as what does not. I removed all plperl from the loading processes. I did a VACUUM FULL ANALYZE, and then I reindexed everything in the database (including starting the

Re: [HACKERS] catalog corruption bug

2006-01-05 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: >>> We have encountered a very nasty but apparently rare bug which appears to >>> result in catalog corruption. I've been fooling around with this report today. In several hours of trying, I've been able to get one Assert failure from running Jeremy's exam

Re: [HACKERS] catalog corruption bug

2006-01-04 Thread Jeremy Drake
On Wed, 21 Dec 2005, Tom Lane wrote: > Jeremy Drake <[EMAIL PROTECTED]> writes: > > We have encountered a very nasty but apparently rare bug which appears to > > result in catalog corruption. > > How much of this can you reproduce on 8.1.1? We've fixed a few issues > already. We did not see this

Re: [HACKERS] catalog corruption bug

2005-12-21 Thread Tom Lane
Jeremy Drake <[EMAIL PROTECTED]> writes: > We have encountered a very nasty but apparently rare bug which appears to > result in catalog corruption. How much of this can you reproduce on 8.1.1? We've fixed a few issues already. > This was built from the gentoo ebuild version 8.1.0 I'd be even m

[HACKERS] catalog corruption bug

2005-12-21 Thread Jeremy Drake
We have encountered a very nasty but apparently rare bug which appears to result in catalog corruption. I have not been able to pin down an exact sequence of events that causes this problem; it appears to be a race condition of some sort. This is what I have been able to figure out so far. * It