Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Tom Lane
Gordon Shannon writes: > In any case, I will cease and desist from ALTER SET TABLESPACE for a while! Here's the applied patch, if you are interested in testing it. regards, tom lane Index: src/backend/access/heap/heapam.c

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
Sounds like you're on it. Just wanted to share one additional piece, in case it helps. Just before the ALTER INDEX SET TABLESPACE was issued, there were some writes to the table in question inside a serializable transaction. The transaction committed at 11:11:58 EDT, and consisted of, among a cou

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Tom Lane
Gordon Shannon writes: > [ corruption on a standby slave after an ALTER SET TABLESPACE operation ] Found it, I think. ATExecSetTableSpace transfers the copied data to the slave by means of XLOG_HEAP_NEWPAGE WAL records. The replay function for this (heap_xlog_newpage) is failing to pay any atte

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
On Sun, May 2, 2010 at 12:52 PM, Tom Lane wrote: > Gordon Shannon writes: > > Bingo. Yes it is reasonable. It was 25 seconds between my altering the > > index in question and the server crash. > > Sounds like we have a smoking gun. Could you show all your non-default > postgresql.conf setting

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Tom Lane
Gordon Shannon writes: > Bingo. Yes it is reasonable. It was 25 seconds between my altering the > index in question and the server crash. Sounds like we have a smoking gun. Could you show all your non-default postgresql.conf settings on the master? I'm wondering about full_page_writes in part
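When gathering the "non-default postgresql.conf settings" Tom asks for, one quick approximation is to list every setting that is explicitly uncommented in the file. The sketch below is a hypothetical helper (`explicit_settings` is not part of PostgreSQL) and it assumes a simple postgresql.conf: it does not follow `include` directives, and an explicitly set value may still equal the built-in default. On a running server, `SELECT name, setting FROM pg_settings WHERE source <> 'default';` gives the authoritative answer.

```python
import re

# Setting name, optional '=', then the raw value (postgresql.conf allows "name value" or "name = value").
_SETTING = re.compile(r"^([A-Za-z_][A-Za-z0-9_.]*)\s*=?\s*(.*)$")


def _strip_comment(line: str) -> str:
    """Drop a trailing '#' comment, ignoring '#' inside single-quoted values."""
    in_quote = False
    for i, ch in enumerate(line):
        if ch == "'":
            in_quote = not in_quote
        elif ch == "#" and not in_quote:
            return line[:i]
    return line


def explicit_settings(conf_text: str) -> dict:
    """Return {name: value} for settings explicitly set in postgresql.conf-style text.

    Hypothetical helper: later assignments override earlier ones, names are
    lowercased (setting names are case-insensitive), quotes are removed.
    """
    settings = {}
    for raw in conf_text.splitlines():
        line = _strip_comment(raw).strip()
        if not line:
            continue  # blank or comment-only line
        m = _SETTING.match(line)
        if not m:
            continue
        name, value = m.group(1).lower(), m.group(2).strip()
        if len(value) >= 2 and value.startswith("'") and value.endswith("'"):
            value = value[1:-1].replace("''", "'")  # unquote, undo doubled quotes
        settings[name] = value
    return settings


sample = """
# full_page_writes = on
full_page_writes = off   # trailing comment
archive_command = 'cp %p /archive/%f'
"""
print(explicit_settings(sample))
```

For the sample above, only `full_page_writes` and `archive_command` are reported; the commented-out line is ignored.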

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
On Sun, May 2, 2010 at 12:10 PM, Tom Lane wrote: > No, this would be a pg_database row with that OID. But it looks like > you found the relevant index anyway. > > Yup, realized that on second reading. > > These commands worked fine on the master, yet this seems suspiciously > > relevant. > > >

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Tom Lane
Gordon Shannon writes: > Interesting. There is no pg_class entry for 22362. No, this would be a pg_database row with that OID. But it looks like you found the relevant index anyway. > ... There is, however, an > entry for that filenode. It's an index I created Sat AM, about 6AM. > ... > - This
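Tom's correction turns on how a relation is identified on disk. As far as we know, WAL redo messages in 8.x name a relation as tablespace/database/relfilenode (for example `rel 1663/16384/16385`), so the middle number is a pg_database OID, and the last number is a pg_class.relfilenode, which need not equal the relation's pg_class OID (it changes when, for example, ALTER ... SET TABLESPACE rewrites the relation). The sketch below splits such a triple and builds the catalog lookups for each part. The helper names and the relfilenode 12345 are hypothetical; only 22362 comes from the thread.

```python
from typing import List, NamedTuple


class RelFileNode(NamedTuple):
    spc_node: int  # pg_tablespace.oid (1663 is pg_default)
    db_node: int   # pg_database.oid
    rel_node: int  # pg_class.relfilenode, not necessarily pg_class.oid


def parse_rel_triple(text: str) -> RelFileNode:
    """Parse 'spc/db/rel' as printed in WAL redo messages (hypothetical helper)."""
    parts = text.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected spc/db/rel, got {text!r}")
    spc, db, rel = (int(p) for p in parts)
    return RelFileNode(spc, db, rel)


def lookup_hints(node: RelFileNode) -> List[str]:
    """SQL to identify each component; run the pg_class query in the database named by db_node."""
    return [
        f"SELECT spcname FROM pg_tablespace WHERE oid = {node.spc_node};",
        f"SELECT datname FROM pg_database WHERE oid = {node.db_node};",
        f"SELECT relname FROM pg_class WHERE relfilenode = {node.rel_node};",
    ]


node = parse_rel_triple("1663/22362/12345")  # 12345 is a made-up relfilenode
for query in lookup_hints(node):
    print(query)
```

Searching pg_class by OID for the database number (as in the original message) finds nothing, because that number lives in pg_database.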

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
On Sun, May 2, 2010 at 11:02 AM, Tom Lane wrote: > > > Hmm ... AFAICS the only way to get that message when the incoming TID's > offsetNumber is only 2 is for the index page to be completely empty > (not zeroes, else PageAddItem's sanity check would have triggered, > but valid and empty). What t
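Tom's deduction relies on how an item is placed at an explicit offset on a slotted page. As we understand PostgreSQL's PageAddItem, line-pointer offsets are 1-based and an explicit offset may be at most maxoff + 1, while an all-zeroes page fails the header sanity check before that test is reached. The toy model below (hypothetical class, not PostgreSQL code) reproduces just those two rules, which is why offset 2 can only be rejected this way on a page that is valid but empty.

```python
class CorruptedPage(Exception):
    """Stands in for the header sanity-check failure on an all-zeroes page."""


class ToyPage:
    """Toy slotted page: models only the item count and the zeroed-header check."""

    def __init__(self, items=(), zeroed=False):
        self.items = list(items)
        self.zeroed = zeroed

    def add_item(self, item, offset):
        if offset < 1:
            raise ValueError("toy model only handles explicit 1-based offsets")
        if self.zeroed:
            # A zeroed page has pd_lower = pd_upper = 0, so the sanity check trips first.
            raise CorruptedPage("corrupted page pointers")
        maxoff = len(self.items)   # highest offset currently in use (0 if empty)
        limit = maxoff + 1         # may insert anywhere up to one past the end
        if offset > limit:
            return None            # rejected, like returning InvalidOffsetNumber
        self.items.insert(offset - 1, item)
        return offset


empty = ToyPage()
print(empty.add_item("tuple", 2))  # rejected: an empty page only accepts offset 1
```

With one item already on the page, offset 2 is accepted, which is why Tom concludes the page must have been valid but completely empty.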

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Tom Lane
Gordon Shannon writes: > I just ran into the same problem. Both servers are running 8.4.3, and > the standby server had been running for 2 days, processing many thousands of > logs successfully. Here's my error: > 4158 2010-05-02 11:12:09 EDT [26445]LOG: restored log file > "0001

Re: [GENERAL] Tracking down log segment corruption

2010-05-02 Thread Gordon Shannon
I just ran into the same problem. Both servers are running 8.4.3, and the standby server had been running for 2 days, processing many thousands of logs successfully. Here's my error: 4158 2010-05-02 11:12:09 EDT [26445]LOG: restored log file "00013C7700C3" from archive 4158
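When matching "restored log file" lines against the archive, it helps to decode the segment name. PostgreSQL names each WAL segment file with 24 uppercase hex digits: timeline ID, log file number, and segment number, eight digits each. The name quoted in the preview above appears shortened, so the sketch uses a made-up name; the helper functions are hypothetical.

```python
import re
from typing import NamedTuple

_WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})$")


class WalSegment(NamedTuple):
    timeline: int
    log: int
    seg: int


def parse_wal_name(name: str) -> WalSegment:
    """Split a 24-hex-digit WAL segment file name into its three fields."""
    m = _WAL_NAME.match(name)
    if not m:
        raise ValueError(f"not a WAL segment name: {name!r}")
    return WalSegment(*(int(g, 16) for g in m.groups()))


def format_wal_name(s: WalSegment) -> str:
    """Inverse of parse_wal_name."""
    return f"{s.timeline:08X}{s.log:08X}{s.seg:08X}"


seg = parse_wal_name("000000010000000A0000003F")  # hypothetical segment name
print(seg)  # WalSegment(timeline=1, log=10, seg=63)
```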

[GENERAL] Tracking down log segment corruption

2008-12-21 Thread Charles Duffy
Howdy, all. I have a log-shipping replication environment (based on PostgreSQL 8.3.4) using pg_lesslog+LZOP for compression of archived segments (kept around long-term for possible use doing PITR). The slave came out of synchronization recently, restoring a series of segments and then failing