Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-30 Thread Fujii Masao
On Wed, Mar 31, 2010 at 1:28 AM, Heikki Linnakangas wrote: > Fujii Masao wrote: >>> * Small code changes to handling of failedSources, inspired by your >>> comment. No change in functionality. >>> >>> This is also available in my git repository at >>> git://git.postgresql.org/git/users/heikki/post

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-30 Thread Heikki Linnakangas
Fujii Masao wrote: >> * Small code changes to handling of failedSources, inspired by your >> comment. No change in functionality. >> >> This is also available in my git repository at >> git://git.postgresql.org/git/users/heikki/postgres.git, branch "xlogchanges" > > I looked at the patch and was not

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Fujii Masao
On Thu, Mar 25, 2010 at 9:55 PM, Heikki Linnakangas wrote: > * Fix the bug of a spurious PANIC in archive recovery, if the WAL ends > in the middle of a WAL record that continues over a WAL segment boundary. > > * If a corrupt WAL record is found in archive or streamed from master in > standby mod

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 12:26 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: > > > >> PANIC seems like the appropriate solution for now. > > > > It definitely is not. Think some more. > > Well, what happens now in previous ver

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 12:15 +0200, Heikki Linnakangas wrote: > (cc'ing docs list) > > Simon Riggs wrote: > > The lack of docs begins to show a lack of coherent high-level design > > here. > > Yeah, I think you're right. It's becoming hard to keep track of how it's > supposed to behave. Thank you

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Robert Haas
On Thu, Mar 25, 2010 at 8:55 AM, Heikki Linnakangas wrote: > * If a corrupt WAL record is found in archive or streamed from master in > standby mode, throw WARNING instead of PANIC, and keep trying. In > archive recovery (ie. standby_mode=off) it's still a PANIC. We can make > it a WARNING too, wh
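
The rule quoted above (WARNING and keep trying in standby mode, PANIC in plain archive recovery) can be sketched as below. This is only an illustration of the stated behaviour, not the patch itself: `SketchLevel`, its two values, and `corrupt_record_level` are hypothetical names invented for this sketch, and the function covers only corrupt records found in the archive or in the stream.

```c
/* Hypothetical severity levels standing in for the server's WARNING and
 * PANIC; the names and values are assumptions made for this sketch. */
typedef enum
{
    SKETCH_WARNING,
    SKETCH_PANIC
} SketchLevel;

/*
 * Severity for a corrupt WAL record found in the archive or streamed from
 * the master, as described in the quoted message: in standby mode report a
 * WARNING so recovery can keep retrying, otherwise (standby_mode = off)
 * keep the PANIC.
 */
SketchLevel
corrupt_record_level(int standby_mode)
{
    return standby_mode ? SKETCH_WARNING : SKETCH_PANIC;
}
```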

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Fujii Masao wrote: > On second thought, the following lines seem to be necessary just after > calling XLogPageRead() since it reads new WAL file from another source. > >> if (readSource == XLOG_FROM_STREAM || readSource == XLOG_FROM_ARCHIVE) >> emode = PANIC; >> else >>

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Fujii Masao wrote: >> sources &= ~failedSources; >> failedSources |= readSource; > > The above lines in XLogPageRead() seem not to be required in normal > recovery case (i.e., standby_mode = off). So how about the attached > patch? > > *** 9050,9056 next_record_is_invalid: > --- 9047,9056 ---
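
As a rough illustration of the two quoted lines, the sketch below treats the WAL sources as bit flags and applies the same two statements in the same order. The flag names follow the identifiers quoted in this thread, but the numeric values, `XLOG_FROM_PG_XLOG`, and the helper `next_sources` are assumptions made for the example.

```c
/* Bit flags for WAL sources. The numeric values are assumptions for this
 * sketch, as is the XLOG_FROM_PG_XLOG name. */
#define XLOG_FROM_ARCHIVE  (1 << 0)
#define XLOG_FROM_PG_XLOG  (1 << 1)
#define XLOG_FROM_STREAM   (1 << 2)

/*
 * Hypothetical helper wrapping the two quoted statements. It first drops
 * every source that failed on an earlier attempt, then records the source
 * that has just failed, so that source is excluded from the next call on.
 */
int
next_sources(int sources, int *failedSources, int readSource)
{
    sources &= ~(*failedSources);   /* mask out sources that failed before */
    *failedSources |= readSource;   /* remember the source that failed now */
    return sources;
}
```

Because the mask is applied before the new failure is recorded, a source that has just failed is still present in the returned set and only disappears on the following call.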

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Heikki Linnakangas wrote: > Simon Riggs wrote: >> On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: >> >>> PANIC seems like the appropriate solution for now. >> It definitely is not. Think some more. > > Well, what happens now in previous versions with pg_standby et al is > that the sta

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Simon Riggs wrote: > On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: > >> PANIC seems like the appropriate solution for now. > > It definitely is not. Think some more. Well, what happens now in previous versions with pg_standby et al is that the standby starts up. That doesn't seem

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
(cc'ing docs list) Simon Riggs wrote: > The lack of docs begins to show a lack of coherent high-level design > here. Yeah, I think you're right. It's becoming hard to keep track of how it's supposed to behave. > By now, I've forgotten what this thread was even about. The major > design decision

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Simon Riggs wrote: > On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote: >> And if the trigger file is >> found, I think that the startup process should emit a FATAL, i.e., the >> server should exit immediately, to prevent the server from becoming the >> primary in a half-finished state. > > Pl

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 10:11 +0200, Heikki Linnakangas wrote: > PANIC seems like the appropriate solution for now. It definitely is not. Think some more. -- Simon Riggs www.2ndQuadrant.com

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote: > On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs wrote: > > PANICing won't change the situation, so it just destroys server > > availability. If we had 1 master and 42 slaves then this behaviour would > > take down almost the whole server farm at

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Heikki Linnakangas
Tom Lane wrote: > Fujii Masao writes: >> OK. How about making the startup process emit WARNING, stop WAL replay and >> wait for the presence of trigger file, when an invalid record is found? >> Which keeps the server up for readonly queries. And if the trigger file is >> found, I think that the st

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-25 Thread Simon Riggs
On Thu, 2010-03-25 at 11:08 +0900, Fujii Masao wrote: > On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs wrote: > > PANICing won't change the situation, so it just destroys server > > availability. If we had 1 master and 42 slaves then this behaviour would > > take down almost the whole server farm at

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Tom Lane
Fujii Masao writes: > OK. How about making the startup process emit WARNING, stop WAL replay and > wait for the presence of trigger file, when an invalid record is found? > Which keeps the server up for readonly queries. And if the trigger file is > found, I think that the startup process should e

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao
On Thu, Mar 25, 2010 at 8:23 AM, Simon Riggs wrote: > PANICing won't change the situation, so it just destroys server > availability. If we had 1 master and 42 slaves then this behaviour would > take down almost the whole server farm at once. Very uncool. > > You might have reason to prevent the s

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Simon Riggs
On Wed, 2010-03-24 at 14:31 +0200, Heikki Linnakangas wrote: > Fujii Masao wrote: > > But in the current (v8.4 or before) behavior, recovery ends normally > > when an invalid record is found in an archived WAL file. Otherwise, > > the server would never be able to start normal processing when there

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao
On Wed, Mar 24, 2010 at 10:20 PM, Fujii Masao wrote: >> Thanks. That's easily fixable (applies over the previous patch): >> >> --- a/src/backend/access/transam/xlog.c >> +++ b/src/backend/access/transam/xlog.c >> @@ -3773,7 +3773,7 @@ retry: >>                pagelsn.xrecoff = 0; >>            } >

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Fujii Masao
On Wed, Mar 24, 2010 at 9:31 PM, Heikki Linnakangas wrote: > Hmm, true, this changes behavior over previous releases. I tend to think > that it's always an error if there's a corrupt file in the archive, > though, and PANIC is appropriate. If the administrator wants to start up > the database anyw

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-24 Thread Heikki Linnakangas
Fujii Masao wrote: > But in the current (v8.4 or before) behavior, recovery ends normally > when an invalid record is found in an archived WAL file. Otherwise, > the server would never be able to start normal processing when there > is a corrupted archived file for some reasons. So, that invalid re

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-23 Thread Fujii Masao
Sorry for the delay. On Fri, Mar 19, 2010 at 8:37 PM, Heikki Linnakangas wrote: > Here's a patch I've been playing with. Thanks! I'm reading the patch. > The idea is that in standby mode, > the server keeps trying to make progress in the recovery by: > > a) restoring files from archive > b) rep

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas
Alvaro Herrera wrote: > Heikki Linnakangas escribió: > >> When recovery reaches an invalid WAL record, typically caused by a >> half-written WAL file, it closes the file and moves to the next source. >> If an error is found in a file restored from archive or in a portion >> just streamed from mast

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Alvaro Herrera
Heikki Linnakangas escribió: > When recovery reaches an invalid WAL record, typically caused by a > half-written WAL file, it closes the file and moves to the next source. > If an error is found in a file restored from archive or in a portion > just streamed from master, however, a PANIC is thrown

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas
Tom Lane wrote: > Heikki Linnakangas writes: >> Simon Riggs wrote: >>> We might also have written half a file many times. The files in pg_xlog >>> are suspect whereas the files in the archive are not. If we have both we >>> should prefer the archive. > >> Yep. > > Really? That will result in a

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Tom Lane
Heikki Linnakangas writes: > Simon Riggs wrote: >> We might also have written half a file many times. The files in pg_xlog >> are suspect whereas the files in the archive are not. If we have both we >> should prefer the archive. > Yep. Really? That will result in a change in the longstanding be

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Heikki Linnakangas
Simon Riggs wrote: > On Thu, 2010-03-18 at 23:27 +0900, Fujii Masao wrote: > >> I agree that this is a bigger problem. Since the standby always starts >> walreceiver before replaying any WAL files in pg_xlog, walreceiver tries >> to receive the WAL files following the REDO starting point even if t

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-19 Thread Simon Riggs
On Thu, 2010-03-18 at 23:27 +0900, Fujii Masao wrote: > I agree that this is a bigger problem. Since the standby always starts > walreceiver before replaying any WAL files in pg_xlog, walreceiver tries > to receive the WAL files following the REDO starting point even if they > have already been in

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-18 Thread Fujii Masao
On Wed, Mar 17, 2010 at 7:35 PM, Heikki Linnakangas wrote: > Fujii Masao wrote: >> I found another missing feature in new file-based log shipping (i.e., >> standby_mode is enabled and 'cp' is used as restore_command). >> >> After the trigger file is found, the startup process with pg_standby >> tr

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-17 Thread Simon Riggs
On Wed, 2010-03-17 at 12:35 +0200, Heikki Linnakangas wrote: > Looking into this, I realized that we have a bigger problem... A lot of this would be easier if you do the docs first, then work through the problems. The new system is more complex, since it has two modes rather than one and also mul

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-03-17 Thread Heikki Linnakangas
Fujii Masao wrote: > I found another missing feature in new file-based log shipping (i.e., > standby_mode is enabled and 'cp' is used as restore_command). > > After the trigger file is found, the startup process with pg_standby > tries to replay all of the WAL files in both pg_xlog and the archive

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-18 Thread Fujii Masao
On Fri, Feb 12, 2010 at 2:29 AM, Heikki Linnakangas wrote: > So the only major feature we're missing is the ability to clean up old > files. I found another missing feature in new file-based log shipping (i.e., standby_mode is enabled and 'cp' is used as restore_command). After the trigger file

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-14 Thread Fujii Masao
On Sat, Feb 13, 2010 at 1:10 AM, Heikki Linnakangas wrote: > Are you thinking of a scenario where remove_command gets stuck, and > prevents bgwriter from performing restartpoints while it's stuck? Yes. If there is the archive in the remote server and the network outage happens, remove_command mig

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Greg Stark
So I don't like having the server doing the cleanup because it doesn't necessarily have the whole picture. It doesn't know if it is the only replica reading these log files or if the site policy is to keep them for disaster recovery purposes. I like having this as an return val command though.

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Dimitri Fontaine
Simon Riggs writes: > Attached patch implements pg_standby for use as an > archive_cleanup_command, reusing existing code with new -a option. > > Happy to add the archive_cleanup_command into main server as well, if > you like. Won't take long. Would it be possible to have the server do the clean

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Heikki Linnakangas
Fujii Masao wrote: > On Fri, Feb 12, 2010 at 10:10 PM, Heikki Linnakangas > wrote: >>> So I suggest that you have a new action that gets called after every >>> checkpoint to clear down the archive. It will remove all files from the >>> archive prior to %r. We can implement that as a sequence of un

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Fujii Masao
On Fri, Feb 12, 2010 at 10:10 PM, Heikki Linnakangas wrote: >> So I suggest that you have a new action that gets called after every >> checkpoint to clear down the archive. It will remove all files from the >> archive prior to %r. We can implement that as a sequence of unlink()s >> from within the

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Simon Riggs
On Fri, 2010-02-12 at 12:54 +, Simon Riggs wrote: > So I suggest that you have a new action that gets called after every > checkpoint to clear down the archive. It will remove all files from the > archive prior to %r. We can implement that as a sequence of unlink()s > from within the server, o

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Heikki Linnakangas
Simon Riggs wrote: > In 8.4 it is pg_standby that was responsible for clearing down the > archive, which is why I suggested using pg_standby for that again. I > agree that will not work. The important thing is not pg_standby but that > we have a valid mechanism for clearing down the archive. Good

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-12 Thread Simon Riggs
On Fri, 2010-02-12 at 14:38 +0900, Fujii Masao wrote: > On Thu, Feb 11, 2010 at 11:22 PM, Heikki Linnakangas > wrote: > > Simon Riggs wrote: > >> Might it not be simpler to add a parameter onto pg_standby? > >> We send %s to tell pg_standby the standby_mode of the server which is > >> calling it s

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: > On Thu, 2010-02-11 at 13:08 -0500, Tom Lane wrote: >> Heikki Linnakangas writes: >>> -1. it isn't necessary for PITR. It's a new requirement for >>> standby_mode='on', unless we add the file size check into the backend. I >>> think we should add the file size check to the back

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Fujii Masao
On Thu, Feb 11, 2010 at 11:22 PM, Heikki Linnakangas wrote: > Simon Riggs wrote: >> Might it not be simpler to add a parameter onto pg_standby? >> We send %s to tell pg_standby the standby_mode of the server which is >> calling it so it can decide how to act in each case. > > That would work too,

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 19:29 +0200, Heikki Linnakangas wrote: > Aidan Van Dyk wrote: > > * Heikki Linnakangas [100211 09:17]: > > > >> Yeah, if you're careful about that, then this change isn't required. But > >> pg_standby protects against that, so I think it'd be reasonable to have > >> the same

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Garick Hamlin
On Thu, Feb 11, 2010 at 01:22:44PM -0500, Kevin Grittner wrote: > Heikki Linnakangas wrote: > > > I think 'rsync' has the same problem. > > There is a switch you can use to create the problem under rsync, but > by default rsync copies to a temporary file name and moves the > completed file to

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Kevin Grittner
Heikki Linnakangas wrote: > I think 'rsync' has the same problem. There is a switch you can use to create the problem under rsync, but by default rsync copies to a temporary file name and moves the completed file to the target name. -Kevin
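
The copy-to-a-temporary-name-then-rename pattern described above can be sketched in plain C as follows. This is not rsync's code: `copy_then_rename` and the `.tmp` suffix are hypothetical choices for the example, and the sketch assumes the temporary file and the target are on the same filesystem, where POSIX `rename()` replaces the target atomically. It does not call `fsync()`, so it makes no durability promise.

```c
#include <stdio.h>

/*
 * Hypothetical helper: copy src to "<dst>.tmp", then rename it to dst.
 * A reader looking only at dst never sees a half-written file, because
 * dst appears only after the copy has completed.
 * Returns 0 on success, -1 on any failure.
 */
int
copy_then_rename(const char *src, const char *dst)
{
    char    tmp[4096];
    char    buf[8192];
    size_t  n;
    FILE   *in;
    FILE   *out;
    int     ok = 1;
    int     len;

    /* Build the temporary name; fail if it does not fit in the buffer. */
    len = snprintf(tmp, sizeof(tmp), "%s.tmp", dst);
    if (len < 0 || (size_t) len >= sizeof(tmp))
        return -1;

    in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    out = fopen(tmp, "wb");
    if (out == NULL)
    {
        fclose(in);
        return -1;
    }

    /* Copy in chunks, stopping at the first short write. */
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
    {
        if (fwrite(buf, 1, n, out) != n)
        {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)
        ok = 0;

    /* Publish the file under its final name only if the copy succeeded. */
    if (!ok || rename(tmp, dst) != 0)
    {
        remove(tmp);
        return -1;
    }
    return 0;
}
```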

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 13:08 -0500, Tom Lane wrote: > Heikki Linnakangas writes: > > -1. it isn't necessary for PITR. It's a new requirement for > > standby_mode='on', unless we add the file size check into the backend. I > > think we should add the file size check to the backend instead and save >

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: > * Heikki Linnakangas [100211 12:04]: > >>> But it can be a problem - without the last WAL (or at least enough of >>> it) the master switched and archived, you have no guarantee of having >>> being consistent again (I'm thinking specifically of recovering from a >>> fresh ba

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Tom Lane
Heikki Linnakangas writes: > -1. it isn't necessary for PITR. It's a new requirement for > standby_mode='on', unless we add the file size check into the backend. I > think we should add the file size check to the backend instead and save > admins the headache. I think the file size check needs to

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk
* Heikki Linnakangas [100211 12:04]: > > But it can be a problem - without the last WAL (or at least enough of > > it) the master switched and archived, you have no guarantee of having > > being consistent again (I'm thinking specifically of recovering from a > > fresh backup) > > You have to wa

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: > * Heikki Linnakangas [100211 09:17]: > >> Yeah, if you're careful about that, then this change isn't required. But >> pg_standby protects against that, so I think it'd be reasonable to have >> the same level of protection built-in. It's not a lot of code. > > This 1 check

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: > * Heikki Linnakangas [100211 09:17]: > >> If the file is just being copied to the archive when restore_command >> ('cp', say) is launched, it will copy a half file. That's not a problem >> for PITR, because PITR will end at the end of valid WAL anyway, but >> returning a ha

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Euler Taveira de Oliveira
Simon Riggs escreveu: > It would mean that pg_standby would act appropriately according to the > setting of standby_mode. So you wouldn't need multiple examples of use, > it would all just work whatever the setting of standby_mode. Nice simple > entry in the docs. > +1. I like the %s idea. IMHO fi

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: > On Thu, 2010-02-11 at 16:22 +0200, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> Might it not be simpler to add a parameter onto pg_standby? >>> We send %s to tell pg_standby the standby_mode of the server which is >>> calling it so it can decide how to act in each case.

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Greg Smith
Heikki Linnakangas wrote: Simon Riggs wrote: Might it not be simpler to add a parameter onto pg_standby? We send %s to tell pg_standby the standby_mode of the server which is calling it so it can decide how to act in each case. That would work too, but it doesn't seem any simpler to me

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk
* Heikki Linnakangas [100211 09:17]: > If the file is just being copied to the archive when restore_command > ('cp', say) is launched, it will copy a half file. That's not a problem > for PITR, because PITR will end at the end of valid WAL anyway, but > returning a half WAL file in standby mode i

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 16:22 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > Might it not be simpler to add a parameter onto pg_standby? > > We send %s to tell pg_standby the standby_mode of the server which is > > calling it so it can decide how to act in each case. > > That would work t

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: > Might it not be simpler to add a parameter onto pg_standby? > We send %s to tell pg_standby the standby_mode of the server which is > calling it so it can decide how to act in each case. That would work too, but it doesn't seem any simpler to me. On the contrary. -- Heikki

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Aidan Van Dyk wrote: > But colour me confused, I'm still not understanding why this is any > different than with normal PITR recovery. > > So even with a plain "cp" in your recovery command instead of a > sleep+copy (a la pg_standby, or PITR tools, or all the home-grown > solutions out there), I'm

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 15:55 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > One question then: how do we ensure that the archive does not grow too > > big? pg_standby cleans down the archive using %R. That function appears > > to not exist anymore. > > You can still use %R. Of course, p

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Aidan Van Dyk
* Heikki Linnakangas [100211 08:29]: > To support a restore_command that does the sleeping itself, like > pg_standby, would require a major rearchitecting of the retry logic. And > I don't see why that'd be desirable anyway. It's easier for the admin to > set up using simple commands like 'cp' or

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: > One question then: how do we ensure that the archive does not grow too > big? pg_standby cleans down the archive using %R. That function appears > to not exist anymore. You can still use %R. Of course, plain 'cp' won't know what to do with it, so a script will then be require

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 14:41 +0100, Dimitri Fontaine wrote: > Simon Riggs writes: > > If you were running pg_standby as the restore_command then this error > > wouldn't happen. So you need to explain why running pg_standby cannot > > solve your problem and why we must fix it by replicating code tha

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 15:28 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > If you were running pg_standby as the restore_command then this error > > wouldn't happen. So you need to explain why running pg_standby cannot > > solve your problem and why we must fix it by replicating code tha

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Dimitri Fontaine
Simon Riggs writes: > If you were running pg_standby as the restore_command then this error > wouldn't happen. So you need to explain why running pg_standby cannot > solve your problem and why we must fix it by replicating code that has > previously existed elsewhere. Let me try. pg_standby will

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: > If you were running pg_standby as the restore_command then this error > wouldn't happen. So you need to explain why running pg_standby cannot > solve your problem and why we must fix it by replicating code that has > previously existed elsewhere. pg_standby cannot be used with

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 14:44 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote: > >> Simon Riggs wrote: > >>> On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: > Hmm, so after running restore_command, check the file

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: > On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: Hmm, so after running restore_command, check the file size and if it's too short, treat it the same as if restore_comman

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Thu, 2010-02-11 at 14:22 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: > >> Hmm, so after running restore_command, check the file size and if it's > >> too short, treat it the same as if restore_command returned non-zero? >

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Heikki Linnakangas
Simon Riggs wrote: > On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: >> Hmm, so after running restore_command, check the file size and if it's >> too short, treat it the same as if restore_command returned non-zero? >> And it will be retried on the next iteration. Works for me, though

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-11 Thread Simon Riggs
On Wed, 2010-02-10 at 09:32 +0200, Heikki Linnakangas wrote: > Fujii Masao wrote: > > As I pointed out previously, the standby might restore a partially-filled > > WAL file that is being archived by the primary, and cause a FATAL error. > > And this happened in my box when I was testing the SR. > >

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Heikki Linnakangas
Aidan Van Dyk wrote: > * Heikki Linnakangas [100210 02:33]: > >> Hmm, so after running restore_command, check the file size and if it's >> too short, treat it the same as if restore_command returned non-zero? >> And it will be retried on the next iteration. Works for me, though OTOH >> it will t

Re: [HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Aidan Van Dyk
* Heikki Linnakangas [100210 02:33]: > Hmm, so after running restore_command, check the file size and if it's > too short, treat it the same as if restore_command returned non-zero? > And it will be retried on the next iteration. Works for me, though OTOH > it will then fail to complain about a
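
The proposal quoted above (after running restore_command, treat a too-short file the same as a non-zero exit status, so it is retried on the next iteration) might look roughly like this. It is a sketch of the idea only, not the committed code: `restore_result_usable`, `file_size_or_minus_one` and `SKETCH_XLOG_SEG_SIZE` are hypothetical names, and the 16 MB value is the default WAL segment size, assumed here for illustration.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Assumed segment size for this sketch: 16 MB, the default WAL segment
 * size. */
#define SKETCH_XLOG_SEG_SIZE (16 * 1024 * 1024)

/*
 * Hypothetical check: the restored file is usable only if restore_command
 * exited with status 0 and the file is not shorter than a full segment.
 * A short file is handled exactly like a failed command, so the caller
 * simply retries later.
 */
bool
restore_result_usable(int rc, long long actual_size, long long expected_size)
{
    if (rc != 0)
        return false;               /* command itself reported failure */
    return actual_size >= expected_size;    /* too short counts as failure */
}

/* Size of the file at path in bytes, or -1 if it cannot be examined. */
long long
file_size_or_minus_one(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    return (long long) st.st_size;
}
```

A caller would pass the exit status of restore_command together with `file_size_or_minus_one(path)` and `SKETCH_XLOG_SEG_SIZE`; a missing file yields -1 and is therefore rejected as well.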

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-10 Thread Fujii Masao
On Wed, Feb 10, 2010 at 4:32 PM, Heikki Linnakangas wrote: > Hmm, so after running restore_command, check the file size and if it's > too short, treat it the same as if restore_command returned non-zero? Yes, only in standby mode case. OTOH I think that normal archive recovery should treat it as

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-09 Thread Heikki Linnakangas
Fujii Masao wrote: > As I pointed out previously, the standby might restore a partially-filled > WAL file that is being archived by the primary, and cause a FATAL error. > And this happened in my box when I was testing the SR. > > sby [20088] FATAL: archive file "00010087" has >

[HACKERS] Re: [COMMITTERS] pgsql: Make standby server continuously retry restoring the next WAL

2010-02-09 Thread Fujii Masao
On Thu, Jan 28, 2010 at 12:27 AM, Heikki Linnakangas wrote: > Log Message: > --- > Make standby server continuously retry restoring the next WAL segment with > restore_command, if the connection to the primary server is lost. This > ensures that the standby can recover automatically, if th