Re: [BUGS] unable to fail over to warm standby server

2010-02-09 Thread Heikki Linnakangas
Tom Lane wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jan 29, 2010 at 3:32 PM, Heikki Linnakangas That only affects the error message and is harmless otherwise, but I thought I'd mention it. I'll fix it, unless someone wants to argue that it's more useful to print the raw return

Re: [BUGS] unable to fail over to warm standby server

2010-02-09 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Yes. Please see the existing code in the postmaster that prints subprocess exit codes, and duplicate it (or perhaps refactor so you can avoid code duplication; though translatability of messages might limit what

Re: [BUGS] unable to fail over to warm standby server

2010-02-09 Thread Alvaro Herrera
Heikki Linnakangas escribió: Here's what I came up with. Translatability indeed makes it pretty hard, I ended up copy-pasting. Looks sane to me too; msgmerge segfaults though so I couldn't test. Two minor comments: + /*-- + translator: %s is a noun phrase

Re: [BUGS] unable to fail over to warm standby server

2010-02-09 Thread Fujii Masao
On Wed, Feb 10, 2010 at 4:47 AM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Here's what I came up with. Translatability indeed makes it pretty hard, I ended up copy-pasting. BTW, I don't think I'm going to bother or risk back-patching this. It was harmless, and for forensic

Re: [BUGS] unable to fail over to warm standby server

2010-02-05 Thread Heikki Linnakangas
Mason Hale wrote: Given that this situation did NOT actually cause corruption, rather the error message was mangled such that it suggested corruption, I offer this revised suggestion for update to the documentation: Important note: It is critical the trigger file be created with permissions

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Fujii Masao
On Fri, Jan 29, 2010 at 11:49 PM, Mason Hale ma...@onespot.com wrote: While I did not remove the trigger file, I did rename recovery.conf to recovery.conf.old. That file contained the recovery_command configuration that identified the trigger file. So that rename should have eliminated the

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Mason Hale
On Fri, Jan 29, 2010 at 12:03 AM, Mason Hale ma...@onespot.com wrote: Of course the best solution is to avoid this issue entirely. Something as easy to miss as file permissions should not cause data corruption, especially in the process meant to fail over from a crashing primary database.

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Mason Hale
Hello Fujii -- Thanks for the clarification. It's clear my understanding of the recovery process is lacking. My naive assumption was that Postgres would recover using whatever files were available and if it had run out of files it would stop there and come up. And that if recovery.conf were

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Robert Haas
On Fri, Jan 29, 2010 at 11:02 AM, Fujii Masao masao.fu...@gmail.com wrote: You seem to focus on the above trouble. I think that this happened because recovery.conf was deleted and restore_command was not given. In fact, the WAL file (e.g., pg_xlog/00023C8200A3) required for recovery

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Heikki Linnakangas
Robert Haas wrote: On Fri, Jan 29, 2010 at 11:02 AM, Fujii Masao masao.fu...@gmail.com wrote: You seem to focus on the above trouble. I think that this happened because recovery.conf was deleted and restore_command was not given. In fact, the WAL file (e.g., pg_xlog/00023C8200A3)

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Mason Hale
If the sysadmin had left the recovery.conf and removed the trigger file, pg_standby in restore_command would have restored all WAL files required for recovery, and recovery would advance well. That may be true, but it certainly seems unfortunate that we don't handle this case a bit

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Heikki Linnakangas
Actually, I think there's a tiny harmless bug in the server too. When it prints the error message: 2010-01-18 21:08:31 UTC ()FATAL: could not restore file 00023C8200D8 from archive: return code 65280 That return code is not the return code that came from the restore_command. I.e., if
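
The point about the mangled return code can be illustrated with a short sketch. The thread says the server printed the raw return value of system() rather than the exit code of the restore_command. On typical Linux and BSD systems that raw value is a wait status with the exit code in the high byte, so it has to be decoded before it is meaningful. The helper name `describe_wait_status` below is hypothetical and is not taken from the PostgreSQL source; the sketch is Unix-only.

```python
import os


def describe_wait_status(status):
    """Turn a raw wait status (as returned by system()) into readable text.

    Hypothetical helper for illustration only; Unix-only, because it relies
    on the POSIX wait-status macros exposed by the os module.
    """
    if os.WIFEXITED(status):
        # Normal exit: the exit code is packed inside the wait status.
        return f"exited with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        # The child process was killed by a signal.
        return f"terminated by signal {os.WTERMSIG(status)}"
    return f"unrecognized status {status}"


# 65280 is the value shown in the FATAL message quoted in the thread.
print(describe_wait_status(65280))
```

Under those assumptions 65280 is 255 * 256, so the value in the log decodes to an ordinary exit code of 255 rather than an arbitrary large number.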

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Robert Haas
On Fri, Jan 29, 2010 at 3:32 PM, Heikki Linnakangas heikki.linnakan...@enterprisedb.com wrote: Actually, I think there's a tiny harmless bug in the server too. When it prints the error message: 2010-01-18 21:08:31 UTC ()FATAL:  could not restore file 00023C8200D8 from archive:

Re: [BUGS] unable to fail over to warm standby server

2010-01-29 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Jan 29, 2010 at 3:32 PM, Heikki Linnakangas That only affects the error message and is harmless otherwise, but I thought I'd mention it. I'll fix it, unless someone wants to argue that it's more useful to print the raw return value of system(),

Re: [BUGS] unable to fail over to warm standby server

2010-01-28 Thread Heikki Linnakangas
Mason Hale wrote: ERROR: could not remove /tmp/pgsql.trigger.5432: Operation not permitted trigger file found ERROR: could not remove /tmp/pgsql.trigger.5432: Operation not permitted This file was not looked until after the attempt to recover was aborted. Clearly the permissions on

Re: [BUGS] unable to fail over to warm standby server

2010-01-28 Thread Mason Hale
Hello Heikki -- Thank you for investigating this issue and clearing up this mystery. I do not believe it is obvious that the postgres process needs to be able to remove the trigger file. My naive assumption was that the trigger file was merely a flag to signal that recovery mode needed to be
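
Since the thread turns on the server process being unable to remove the trigger file, a pre-flight check along these lines might catch the problem before a failover. This is only a sketch under stated assumptions: the function name `removable_by_current_user` is hypothetical, it checks the user running the script (so it would need to run as the postgres OS user), and it models just the usual Unix rules. Those rules are that removing a file requires write and search permission on the containing directory, and in a sticky-bit directory such as /tmp the caller must additionally own the file or the directory, or be root. The thread excerpt does not say which of these rules was violated in the reported case.

```python
import os
import stat


def removable_by_current_user(path):
    """Best-effort guess at whether the current user could unlink `path`.

    Hypothetical helper for illustration; it ignores ACLs, immutable flags,
    and other platform-specific restrictions.
    """
    if not os.path.lexists(path):
        return False
    parent = os.path.dirname(os.path.abspath(path))
    # Unlinking needs write and search permission on the directory itself.
    if not os.access(parent, os.W_OK | os.X_OK):
        return False
    dir_st = os.stat(parent)
    if dir_st.st_mode & stat.S_ISVTX:
        # Sticky directory: only root, the file owner, or the directory
        # owner may remove the entry.
        uid = os.getuid()
        return uid == 0 or uid in (os.lstat(path).st_uid, dir_st.st_uid)
    return True


if __name__ == "__main__":
    # Trigger file path taken from the error message quoted in the thread.
    print(removable_by_current_user("/tmp/pgsql.trigger.5432"))
```

A file created in /tmp by a different, unprivileged user would fail the sticky-directory test here even if its mode bits look permissive.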

Re: [BUGS] unable to fail over to warm standby server

2010-01-28 Thread Fujii Masao
On Fri, Jan 29, 2010 at 12:03 AM, Mason Hale ma...@onespot.com wrote: Of course the best solution is to avoid this issue entirely. Something as easy to miss as file permissions should not cause data corruption, especially in the process meant to fail over from a crashing primary database. I

[BUGS] unable to fail over to warm standby server

2010-01-27 Thread Mason Hale
Hello -- We are using PostgreSQL 8.3.8 with a Warm Standby (PITR) setup. Recently we experienced a failure on our primary database server and when we attempted to fail over to the standby server it would not come up. This configuration has been tested previously (we've successfully transferred