Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-06-10 Thread Jeff Frost
On May 26, 2012, at 9:17 AM, Tom Lane wrote:
> Would you guys please try this in the problem databases:
>
> select a.ctid, c.relname
> from pg_attribute a join pg_class c on a.attrelid=c.oid
> where c.relnamespace=11 and c.relkind in ('r','i')
> order by 1 desc;
>
> If you see any block numbers

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-28 Thread Tom Lane
Greg Sabino Mullane writes: > On Sun, May 27, 2012 at 05:44:15PM -0700, Jeff Frost wrote: >> On May 27, 2012, at 12:53 PM, Tom Lane wrote: >>> occurring, they'd take long enough to expose the process to sinval >>> overrun even with not-very-high DDL rates. >> As it turns out, there are quite a fe

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-28 Thread Greg Sabino Mullane
On Sun, May 27, 2012 at 05:44:15PM -0700, Jeff Frost wrote: > On May 27, 2012, at 12:53 PM, Tom Lane wrote: > > occurring, they'd take long enough to expose the process to sinval > > overrun even with not-very-high DDL rates. > As it turns out, there are quite a few temporary tables created. For t

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-27 Thread Jeff Frost
On May 27, 2012, at 12:53 PM, Tom Lane wrote: > Another thing that can be deduced from those stack traces is that sinval > resets were happening. For example, in the third message linked above, > the heapscan is being done to load up a relcache entry for relation 2601 > (pg_am). This would be un

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-27 Thread Tom Lane
I've been continuing to poke at this business of relcache-related startup stalls, and have come to some new conclusions. One is that it no longer seems at all likely that the pg_attribute rows for system catalogs aren't at the front of pg_attribute, because the commands that might be used to updat

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Greg Sabino Mullane
On Sat, May 26, 2012 at 01:25:29PM -0400, Tom Lane wrote: > Greg Sabino Mullane writes: > > On Sat, May 26, 2012 at 12:17:04PM -0400, Tom Lane wrote: > >> If you see any block numbers above about 20 then maybe the triggering > >> condition is a row relocation after all. > > > Highest was 13. > >

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Tom Lane
Greg Sabino Mullane writes: > On Sat, May 26, 2012 at 12:17:04PM -0400, Tom Lane wrote: >> If you see any block numbers above about 20 then maybe the triggering >> condition is a row relocation after all. > Highest was 13. Hm ... but wait, you said you'd done a VACUUM FULL on the catalogs. So it

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Greg Sabino Mullane
On Sat, May 26, 2012 at 12:17:04PM -0400, Tom Lane wrote: > If you see any block numbers above about 20 then maybe the triggering > condition is a row relocation after all. Highest was 13.
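The check discussed in this exchange (look at the ctid values returned by the catalog query and see whether any block number exceeds about 20) can be sketched as follows. This is only an illustration: the helper names and the sample ctid values are hypothetical, and the threshold of 20 is the rough figure quoted above, not an exact limit.

```python
import re

# A ctid is printed by PostgreSQL as "(block,offset)", e.g. "(13,4)".
CTID_RE = re.compile(r"^\((\d+),(\d+)\)$")


def parse_ctid(text):
    """Split a ctid string into (block_number, offset_number)."""
    match = CTID_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a ctid: {text!r}")
    return int(match.group(1)), int(match.group(2))


def highest_block(ctids):
    """Return the largest block number among the given ctid strings."""
    return max(parse_ctid(c)[0] for c in ctids)


def suggests_row_relocation(ctids, threshold=20):
    """True if any row sits in a block above the rough threshold.

    The default of 20 is the approximate figure from the thread,
    used here as an assumption rather than a hard rule.
    """
    return highest_block(ctids) > threshold


# Hypothetical query output; only the maximum of 13 matches the report above.
sample = ["(13,4)", "(2,17)", "(0,1)"]
print(highest_block(sample), suggests_row_relocation(sample))
```

With a highest block of 13, as reported, the check comes back negative, which is consistent with the reply that row relocation did not look like the trigger.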

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Tom Lane
Greg Sabino Mullane writes: > On Fri, May 25, 2012 at 07:02:42PM -0400, Tom Lane wrote: >> pg_attribute just enough smaller to avoid the scenario. Not sure about >> Greg's case, but he should be able to tell us the size of pg_attribute >> and his shared_buffers setting ... > pg_attribute around

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-26 Thread Greg Sabino Mullane
On Fri, May 25, 2012 at 07:02:42PM -0400, Tom Lane wrote: > However, the remaining processes trying to > compute new init files would still have to complete the process, so I'd > expect there to be a diminishing effect --- the ones that were stalling > shouldn't all release exactly together. Unles

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Jeff Frost
On May 25, 2012, at 7:12 PM, Tom Lane wrote: > Jeff Frost writes: >> In our customer's case, the size of pg_attribute was a little less than 1/4 >> of shared_buffers, so might not be the syncscan? > > Could you go back and double check that? If the shared_buffers setting > were 7GB not 8GB,

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Tom Lane
Jeff Frost writes: > In our customer's case, the size of pg_attribute was a little less than 1/4 > of shared_buffers, so might not be the syncscan? Could you go back and double check that? If the shared_buffers setting were 7GB not 8GB, that would fall right between the pg_attribute sizes you p
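The arithmetic behind the 7GB versus 8GB question can be sketched as below, assuming (as the exchange implies) that the relevant cutoff is one quarter of shared_buffers. The function names and the 1.9 GiB relation size are hypothetical placeholders, since the actual pg_attribute sizes are not shown here.

```python
GIB = 1024 ** 3


def quarter_of_shared_buffers(shared_buffers_bytes):
    """Cutoff assumed from the thread: one quarter of shared_buffers."""
    return shared_buffers_bytes // 4


def exceeds_cutoff(relation_bytes, shared_buffers_bytes):
    """True if the relation is larger than a quarter of shared_buffers."""
    return relation_bytes > quarter_of_shared_buffers(shared_buffers_bytes)


# Hypothetical pg_attribute size chosen to fall between the two cutoffs.
pg_attribute_bytes = int(1.9 * GIB)

for shared_buffers_gib in (7, 8):
    over = exceeds_cutoff(pg_attribute_bytes, shared_buffers_gib * GIB)
    print(f"shared_buffers={shared_buffers_gib}GB -> over cutoff: {over}")
```

Under these assumed numbers the cutoff is 1.75 GiB at 7GB and 2 GiB at 8GB, so a relation just under 2 GiB lands on opposite sides depending on which setting is actually in effect, which is why the setting is worth double checking.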

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Jeff Frost
On May 25, 2012, at 4:02 PM, Tom Lane wrote: > Greg Sabino Mullane writes: >>> Yeah, this is proof that what it was doing is the same as what we saw in >>> Jeff's backtrace, ie loading up the system catalog relcache entries the >>> hard way via seqscans on the core catalogs. So the question to

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Tom Lane
Greg Sabino Mullane writes: >> Yeah, this is proof that what it was doing is the same as what we saw in >> Jeff's backtrace, ie loading up the system catalog relcache entries the >> hard way via seqscans on the core catalogs. So the question to be >> answered is why that's suddenly a big performa

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-25 Thread Greg Sabino Mullane
> Yeah, this is proof that what it was doing is the same as what we saw in > Jeff's backtrace, ie loading up the system catalog relcache entries the > hard way via seqscans on the core catalogs. So the question to be > answered is why that's suddenly a big performance bottleneck. It's not > a che

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Tom Lane
Greg Sabino Mullane writes:
> Oh, almost forgot: reading your reply to the old thread reminded me of
> something I saw in one of the straces right as it "woke up" and left
> the startup state to do some work. Here's a summary:
> 12:18:39 semop(4390981, 0x7fff66c4ec10, 1) = 0
> 12:18:39 semop(43

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
> I think there are probably two independent issues here. The missing > index entries are clearly bad but it's not clear that they had anything > to do with the startup stall. On further log digging, I think you are correct, as those index warnings go back many days before the startup problems a

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
On Thu, May 24, 2012 at 03:54:54PM -0400, Tom Lane wrote: > Did you check I/O activity? I looked again at Jeff Frost's report and > now think that what he saw was probably a lot of seqscans on bloated > system catalogs, cf > http://archives.postgresql.org/message-id/28484.1337887...@sss.pgh.pa.us

Re: [HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Tom Lane
Greg Sabino Mullane writes: > Yesterday I had a client that experienced a sudden high load on > one of their servers (8.3.5 - yes, I know. Those of you with > clients will understand). When I checked, almost all connections > were in a "startup" state, very similar to this thread: > http://pos

[HACKERS] Backends stalled in 'startup' state: index corruption

2012-05-24 Thread Greg Sabino Mullane
Yesterday I had a client that experienced a sudden high load on one of their servers (8.3.5 - yes, I know. Those of you with clients will understand). When I checked, almost all connections were in a "startup" state, very similar to this thread: http://postgresql.1045698.n5.nabble.com/9-1-3-bac