[ADMIN] IO Timeout

2005-03-10 Thread Alex Turner
I have a question about IO timeouts: We are using the 3ware Escalade 9500S series of cards, and we had a drive failure this morning. Apparently the card waits 30 seconds for the drive to respond, and if it doesn't, it puts the drive in a fail state. Postgres, it seems, didn't wait 30 seconds befo

Re: [ADMIN] IO Timeout

2005-03-10 Thread Tom Lane
Alex Turner <[EMAIL PROTECTED]> writes: > I have a question about IO timeouts: > We are using the 3ware Escalade 9500S series of cards, and we had a > drive failure this morning. Apparently the card waits 30 seconds for > the drive to respond, and if it doesn't, it puts the drive in a fail > stat

Re: [ADMIN] IO Timeout

2005-03-10 Thread Alex Turner
Well - I am sort of trying to piece together exactly what happened. Here's what I know. Around 02:52 I get messages in my syslog stating that there were problems writing to a controller channel: Mar 10 02:52:29 tsunami kernel: 3w-9xxx: scsi1: WARNING: (0x06:0x002C): Unit #1: Command (0x28) timed o

Re: [ADMIN] IO Timeout

2005-03-10 Thread Tom Lane
Alex Turner <[EMAIL PROTECTED]> writes: > Well - I am sort of trying to piece together exactly what happened. > Here's what I know. > Around 02:52 I get messages in my syslog stating that there were > problems writing to a controller channel: > [ various hardware errors snipped ] > At around 07:30

Re: [ADMIN] IO Timeout

2005-03-11 Thread Alex Turner
Thanks very much, Tom, for your input - The guys at AMCC are suggesting that the firmware on the controller card crashed, causing the card to basically stop IO operations. This would explain why Postgres could not recover and re-read WAL, because /dev/sdc and sdd were inaccessible at that time. I th