Re: [HACKERS] Point in Time Recovery

2004-07-30 Thread Zeugswetter Andreas SB SD
I was wondering about this point - might it not be just as reasonable for the copied file to *be* an exact image of pg_control? Then a very simple variant of pg_controldata (or maybe even just adding switches to pg_controldata itself) would enable the relevant info to be extracted

Re: [HACKERS] Point in Time Recovery

2004-07-30 Thread Bruce Momjian
Zeugswetter Andreas SB SD wrote: I was wondering about this point - might it not be just as reasonable for the copied file to *be* an exact image of pg_control? Then a very simple variant of pg_controldata (or maybe even just adding switches to pg_controldata itself) would enable

Re: [HACKERS] Point in Time Recovery

2004-07-30 Thread Tom Lane
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: If you use a readable file you will also need a feature for restore (or a tool) to create an appropriate pg_control file, or are you intending to still require that pg_control be the first file backed up. No, the entire point of this

Re: [HACKERS] Point in Time Recovery

2004-07-30 Thread Mark Kirkwood
Ok - that is a much better way of doing it! regards Mark Tom Lane wrote: Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: If you use a readable file you will also need a feature for restore (or a tool) to create an appropriate pg_control file, or are you intending to still require that

Re: [HACKERS] Point in Time Recovery

2004-07-29 Thread Mark Kirkwood
I was wondering about this point - might it not be just as reasonable for the copied file to *be* an exact image of pg_control? Then a very simple variant of pg_controldata (or maybe even just adding switches to pg_controldata itself) would enable the relevant info to be extracted P.s : would

Re: [HACKERS] Point in Time Recovery

2004-07-29 Thread Bruce Momjian
Mark Kirkwood wrote: I was wondering about this point - might it not be just as reasonable for the copied file to *be* an exact image of pg_control? Then a very simple variant of pg_controldata (or maybe even just adding switches to pg_controldata itself) would enable the relevant info to

Re: [HACKERS] Point in Time Recovery

2004-07-29 Thread markir
Quoting Bruce Momjian [EMAIL PROTECTED]: Mark Kirkwood wrote: I was wondering about this point - might it not be just as reasonable for the copied file to *be* an exact image of pg_control? Then a very simple variant of pg_controldata (or maybe even just adding switches to

Re: [HACKERS] Point in Time Recovery

2004-07-28 Thread Bruce Momjian
We need someone to code two backend functions to complete PITR. The function would be called at start/stop of backup of the data directory. The functions would be checked during restore to make sure the requested xid is not between the start/stop xids of the backup. They would also contain

Re: [HACKERS] Point in Time Recovery

2004-07-28 Thread Bruce Momjian
Oh, here is something else we need to add --- a GUC to control whether pg_xlog is clean on recovery start. --- Tom Lane wrote: Bruce and I had another phone chat about the problems that can ensue if you restore a tar

Re: [ADMIN] [HACKERS] Point in Time Recovery

2004-07-28 Thread Bruce Momjian
[ Sorry, sent to hackers now.] Here is another open PITR issue that I think will have to be addressed in 7.6. If you do a critical transaction, but do nothing else for eight hours, that critical transaction hasn't been archived yet. It is still sitting in pg_xlog until the WAL file fills. I

Re: [PATCHES] [HACKERS] Point in Time Recovery

2004-07-27 Thread markw
On 26 Jul, To: [EMAIL PROTECTED] wrote: Sorry I wasn't clearer. I think I have a better idea about what's going on now. With the archiving enabled, it looks like the database is able to complete 1 transaction per database connection, but doesn't complete any subsequent transactions. I'm not

Re: [HACKERS] Point in Time Recovery

2004-07-20 Thread Zeugswetter Andreas SB SD
Hang on, are you supposed to MOVE or COPY away WAL segments? Copy. pg will delete them once they are archived. Copy. pg will recycle them once they are archived. Andreas

Re: [HACKERS] Point in Time Recovery

2004-07-20 Thread Bruce Momjian
Simon Riggs wrote: On Sat, 2004-07-17 at 00:57, Bruce Momjian wrote: OK, I think I have some solid ideas and reasons for them. Sorry for taking so long to reply... First, I think we need server-side functions to call when we start/stop the backup. The advantage of these server-side

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Simon Riggs
On Sat, 2004-07-17 at 00:57, Bruce Momjian wrote: OK, I think I have some solid ideas and reasons for them. Sorry for taking so long to reply... First, I think we need server-side functions to call when we start/stop the backup. The advantage of these server-side functions is that they

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Tom Lane
Bruce and I had another phone chat about the problems that can ensue if you restore a tar backup that contains old (incompletely filled) versions of WAL segment files. While the current code will ignore them during the recovery-from-archive run, leaving them laying around seems awfully dangerous.

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Christopher Kings-Lynne
I've got a PITR set up here that's happily scp'ing WAL files across to another machine. However, the NIC in the machine is currently stuffed, so it gets like 50k/s :) What happens in general if you are generating WAL file bytes faster always than they can be copied off? Also, does the

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes: I've got a PITR set up here that's happily scp'ing WAL files across to another machine. However, the NIC in the machine is currently stuffed, so it gets like 50k/s :) What happens in general if you are generating WAL file bytes faster

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Christopher Kings-Lynne
If you keep falling further and further behind, eventually your pg_xlog directory will fill the space available on its disk, and I think at that point PG will panic and shut down because it can't create any more xlog segments. Hang on, are you supposed to MOVE or COPY away WAL segments? Chris
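The failure mode described above comes down to simple arithmetic: if WAL is generated faster than it can be copied off, the backlog in pg_xlog grows at the difference between the two rates. A rough sketch follows; the function name and parameters are hypothetical, and only the 50 kB/s copy rate comes from the thread (the other numbers are invented for illustration). It assumes both rates stay constant and that no segment is recycled until it has been archived.

```python
def seconds_until_pg_xlog_full(free_bytes, wal_rate, archive_rate):
    """Estimate how long until un-archived WAL fills the remaining disk space.

    free_bytes   -- space still available on the pg_xlog filesystem
    wal_rate     -- bytes of WAL generated per second (assumed constant)
    archive_rate -- bytes per second the archiver can copy away
    Returns None when archiving keeps up and the backlog never grows.
    """
    backlog_growth = wal_rate - archive_rate
    if backlog_growth <= 0:
        return None
    return free_bytes / backlog_growth


if __name__ == "__main__":
    # Illustration only: 1 GiB free, WAL at 200 kB/s (made up), scp at 50 kB/s.
    eta = seconds_until_pg_xlog_full(1024 ** 3, 200_000, 50_000)
    print(f"pg_xlog would fill in roughly {eta / 3600:.1f} hours")
```

A real server's WAL rate is bursty, so this is at best a planning estimate, not a prediction.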

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes: If you keep falling further and further behind, eventually your pg_xlog directory will fill the space available on its disk, and I think at that point PG will panic and shut down because it can't create any more xlog segments. Hang on, are

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Bruce Momjian
Christopher Kings-Lynne wrote: If you keep falling further and further behind, eventually your pg_xlog directory will fill the space available on its disk, and I think at that point PG will panic and shut down because it can't create any more xlog segments. Hang on, are you supposed to

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Christopher Kings-Lynne
Hang on, are you supposed to MOVE or COPY away WAL segments? COPY. The checkpoint code will then delete or recycle the segment file, as appropriate. So what happens if you just move it? Postgres breaks? Chris

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes: Hang on, are you supposed to MOVE or COPY away WAL segments? COPY. The checkpoint code will then delete or recycle the segment file, as appropriate. So what happens if you just move it? Postgres breaks? I don't think so, but it seems like

Re: [HACKERS] Point in Time Recovery

2004-07-19 Thread Christopher Kings-Lynne
I don't think so, but it seems like a much less robust way to do things. What happens if you have a failure partway through? For instance archive machine dies and loses recent data right after you've rm'd the source file. The recommended COPY procedure at least provides some breathing room
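The copy-rather-than-move advice can be sketched as a small archiving helper. This is an illustration under stated assumptions, not the archiver from the patch: `archive_segment` and the `.partial` suffix are hypothetical names, and the helper deliberately never deletes the source file, leaving removal or recycling to the server.

```python
import filecmp
import os
import shutil


def archive_segment(segment_path, archive_dir):
    """Copy one WAL segment into archive_dir without touching the source.

    Returns True only after the copy has been verified byte for byte.
    """
    os.makedirs(archive_dir, exist_ok=True)
    final_path = os.path.join(archive_dir, os.path.basename(segment_path))
    temp_path = final_path + ".partial"  # hypothetical naming convention

    # Copy under a temporary name so a half-written file is never
    # mistaken for a complete archived segment.
    shutil.copyfile(segment_path, temp_path)

    # shallow=False forces a comparison of file contents, not just metadata.
    if not filecmp.cmp(segment_path, temp_path, shallow=False):
        os.remove(temp_path)
        return False

    os.replace(temp_path, final_path)
    return True
```

Because the source stays in place until the copy is verified, a failure partway through leaves the original segment available for another attempt.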

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Simon Riggs
On Fri, 2004-07-16 at 04:49, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2004-07-16 at 00:01, Alvaro Herrera wrote: My manpage for signal(2) says that you shouldn't assign SIG_IGN to SIGCHLD, according to POSIX. So - I should be setting this to SIG_DFL and that's good

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Zeugswetter Andreas SB SD
I'm aiming for the minimum feature set - which means we do need to take care over whether that set is insufficient and also to pull any part that doesn't stand up to close scrutiny over the next few days. As you can see, we are still chewing on NT. What PITR features are missing? I

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Tom Lane
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: We only need to tell people to backup pg_control first. The rest was only intended to enforce 1. that pg_control is the first file backed up 2. the dba uses a large enough PIT (or xid) for restore Right, but I think Bruce's point is that

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Bruce Momjian
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: We only need to tell people to backup pg_control first. The rest was only intended to enforce 1. that pg_control is the first file backed up 2. the dba uses a large enough PIT (or xid) for restore Right, but I think Bruce's point is

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Also, when you are in recovery mode, how do you get out of recovery mode, meaning if you have a power failure, how do you prevent the system from doing another recovery? Do you remove the recovery.conf file? I do not care for the idea of a recovery.conf

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Zeugswetter Andreas SB SD
then on restore once all the files are restored move the pg_control.backup to its original name. That gives us the checkpoint wal/offset but how do we get the start/stop information. Is that not required? The checkpoint wal/offset is in pg_control, that is sufficient start information.

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Tom Lane
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. You have to start at a checkpoint record when replaying the log, so if no checkpoint occurred between starting to archive WAL and starting

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Zeugswetter Andreas SB SD
Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after archiving starts. Imho turning on xlog archiving should issue such a

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Simon Riggs
On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes, you need one checkpoint after

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Simon Riggs
On Fri, 2004-07-16 at 15:27, Bruce Momjian wrote: Also, when you are in recovery mode, how do you get out of recovery mode, meaning if you have a power failure, how do you prevent the system from doing another recovery? Do you remove the recovery.conf file? That was the whole point of the

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Simon Riggs
On Fri, 2004-07-16 at 16:25, Zeugswetter Andreas SB SD wrote: I think the filename 'recovery.conf' is misleading, since it is not a static configuration file, but a command file for one recovery. How about 'recovery.command' then 'recovery.inprogress', and on recovery completion it should

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Bruce Momjian
Simon Riggs wrote: On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not connect 'archiving' with the backed up xlogs :-( So yes,

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Simon Riggs
On Fri, 2004-07-16 at 19:30, Bruce Momjian wrote: Simon Riggs wrote: On Fri, 2004-07-16 at 16:58, Zeugswetter Andreas SB SD wrote: Do we need a checkpoint after the archiving starts but before the backup begins? No. Actually yes. Sorry, I did incorrectly not

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Simon Riggs
On Fri, 2004-07-16 at 16:47, Tom Lane wrote: As far as the business about copying pg_control first goes: there is another way to think about it, which is to copy pg_control to another place that will be included in your backup. For example the standard backup procedure could be 1.

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Bruce Momjian
OK, I think I have some solid ideas and reasons for them. First, I think we need server-side functions to call when we start/stop the backup. The advantage of these server-side functions is that they will do the required work of recording the pg_control values and creating needed files with

Re: [HACKERS] Point in Time Recovery

2004-07-16 Thread Bruce Momjian
Let me address your concerns about PITR getting into 7.5. I think a few people spoke last week expressing concern about our release process and wanting to take drastic action. However, looking at the release status report I am about to post, you will see we are on track for an August 1 beta.

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Thu, 2004-07-15 at 02:43, Mark Kirkwood wrote: I noticed that compiling with 5_1 patch applied fails due to XLOG_archive_dir being removed from xlog.c , but src/backend/commands/tablecmds.c still uses it. I did the following to tablecmds.c : 5408c5408 extern char

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Thu, 2004-07-15 at 03:02, Bruce Momjian wrote: I talked to Tom on the phone today and I think we have a procedure for doing backup/restore in a fairly foolproof way. As outlined below, we need to record the start/stop and checkpoint WAL file names and offsets, and somehow pass those

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Mark Kirkwood
I tried what I thought was a straightforward scenario, and seem to have broken it :-( Here is the little tale 1) initdb 2) set archive_mode and archive_dest in postgresql.conf 3) startup 4) create database called 'test' 5) connect to 'test' and type 'checkpoint' 6) backup PGDATA using 'tar

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Zeugswetter Andreas SB SD
Other db's have commands for: start/end external backup I see that the analogy to external backup was not good, since you are correct that dba's would expect that to stop all writes, so they can safely split their mirror or some such. Usually the expected time from start until end external

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread HISADAMasaki
Dear Simon, I've just tested pitr_v5_2.patch and got an error message during archiving process as follows. -- begin LOG: archive command=cp /usr/local/pgsql/data/pg_xlog/ /tmp,return code=-1 -- end The command called in system(3) works, but it returns -1. system(3) can not

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Zeugswetter Andreas SB SD
Sorry for the stupid question, but how do I get this patch if I do not receive the patches mails ? The web interface html'ifies it, thus making it unusable. Thanks Andreas

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Bruce Momjian
Simon Riggs wrote: On Thu, 2004-07-15 at 03:02, Bruce Momjian wrote: I talked to Tom on the phone today and I think we have a procedure for doing backup/restore in a fairly foolproof way. As outlined below, we need to record the start/stop and checkpoint WAL file names and offsets,

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Bruce Momjian
Simon Riggs wrote: On Wed, 2004-07-14 at 10:57, Zeugswetter Andreas SB SD wrote: The recovery mechanism doesn't rely upon you knowing 1 or 3. The recovery reads pg_control (from the backup) and then attempts to de-archive the appropriate xlog segment file and then starts rollforward

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Thu, 2004-07-15 at 10:47, Mark Kirkwood wrote: I tried what I thought was a straightforward scenario, and seem to have broken it :-( Here is the little tale 1) initdb 2) set archive_mode and archive_dest in postgresql.conf 3) startup 4) create database called 'test' 5) connect to

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Thu, 2004-07-15 at 15:57, Bruce Momjian wrote: We will get there --- it just seems dark at this time. Thanks for that. My comments were heartfelt, but not useful right now. I'm badly overdrawn already on my time budget, though that is my concern alone. There is more to do than I have time

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Devrim GUNDUZ
Simon, On Thu, 15 Jul 2004, Simon Riggs wrote: We will get there --- it just seems dark at this time. Thanks for that. My comments were heartfelt, but not useful right now. I'm badly overdrawn already on my time budget, though that is my

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Thu, 2004-07-15 at 13:16, HISADAMasaki wrote: Dear Simon, I've just tested pitr_v5_2.patch and got an error message during archiving process as follows. -- begin LOG: archive command=cp /usr/local/pgsql/data/pg_xlog/ /tmp,return code=-1 -- end The command called

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Thu, 2004-07-15 at 23:18, Devrim GUNDUZ wrote: Thanks for the vote of confidence, on or off list. too many people spend a lot of money for proprietary databases, just for some missing features in PostgreSQL Agreed - PITR isn't aimed at existing users of PostgreSQL. If you use it

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Alvaro Herrera
On Thu, Jul 15, 2004 at 11:44:02PM +0100, Simon Riggs wrote: On Thu, 2004-07-15 at 13:16, HISADAMasaki wrote: -- line 236 --- - pgsignal(SIGCHLD, SIG_IGN); -- line 236 --- + pgsignal(SIGCHLD, SIG_DFL); I'm not sure I understand why it's returned -1, though I'll take you

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Bruce Momjian
Simon Riggs wrote: On Thu, 2004-07-15 at 23:18, Devrim GUNDUZ wrote: Thanks for the vote of confidence, on or off list. too many people spend a lot of money for proprietary databases, just for some missing features in PostgreSQL Agreed - PITR isn't aimed at existing users of

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Mark Kirkwood
Simon Riggs wrote: First, thanks for sticking with it to test this. I've not received such a message myself - this is interesting. Is it possible to copy that directory to one side and re-run the test? Add another parameter in postgresql.conf called archive_debug = true Does it happen identically

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Bruce Momjian
Simon Riggs wrote: On Thu, 2004-07-15 at 15:57, Bruce Momjian wrote: We will get there --- it just seems dark at this time. Thanks for that. My comments were heartfelt, but not useful right now. I'm badly overdrawn already on my time budget, though that is my concern alone. There is

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Glen Parker
Simon Riggs wrote: On Thu, 2004-07-15 at 23:18, Devrim GUNDUZ wrote: Thanks for the vote of confidence, on or off list. too many people spend a lot of money for proprietary databases, just for some missing features in PostgreSQL Agreed - PITR isn't aimed at existing users

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Mark Kirkwood
Couldn't agree more. Maybe we should have made more noise :-) Glen Parker wrote: Simon Riggs wrote: On Thu, 2004-07-15 at 23:18, Devrim GUNDUZ wrote: Thanks for the vote of confidence, on or off list. too many people spend a lot of money for proprietary databases, just for

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Mark Kirkwood
Simon Riggs wrote: So far: I've tried to re-create the problem as exactly as I can, but it works for me. This is clearly an important case to chase down. I assume that this is the very first time you tried recovery? Second and subsequent recoveries using the same set have a potential loophole,

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Fri, 2004-07-16 at 00:01, Alvaro Herrera wrote: On Thu, Jul 15, 2004 at 11:44:02PM +0100, Simon Riggs wrote: On Thu, 2004-07-15 at 13:16, HISADAMasaki wrote: -- line 236 --- - pgsignal(SIGCHLD, SIG_IGN); -- line 236 --- + pgsignal(SIGCHLD, SIG_DFL); I'm not sure I

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Simon Riggs
On Fri, 2004-07-16 at 00:46, Mark Kirkwood wrote: By way of contrast, using the *same* procedure (1-11), but generating 2 logs worth of INSERTS/UPDATES using 10 concurrent process *works fine* - e.g : Great...at least we have shown that something works (or can work) and have begun to

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: On Fri, 2004-07-16 at 00:01, Alvaro Herrera wrote: My manpage for signal(2) says that you shouldn't assign SIG_IGN to SIGCHLD, according to POSIX. So - I should be setting this to SIG_DFL and that's good for everyone? Yeah, we learned the same lesson in
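The SIGCHLD issue discussed in this sub-thread can be reproduced outside the backend. The sketch below is a stand-alone Python illustration, not the patch code: `run_command` is a hypothetical helper that runs a shell command through `os.system()` (which wraps the C library's system(3) on Unix) with SIGCHLD set to a chosen disposition. On many Unix systems the SIG_IGN case reports -1 even though the command itself ran, because the child is reaped before its status can be collected; the exact result is platform-dependent, so none is asserted here.

```python
import os
import signal


def run_command(command, sigchld_disposition):
    """Run command via os.system() with SIGCHLD temporarily set as requested."""
    previous = signal.signal(signal.SIGCHLD, sigchld_disposition)
    try:
        return os.system(command)
    finally:
        # Restore whatever disposition was in effect before the call.
        if previous is not None:
            signal.signal(signal.SIGCHLD, previous)


if __name__ == "__main__":
    # "true" is a standard Unix command that exits with status 0.
    print("SIG_DFL:", run_command("true", signal.SIG_DFL))
    print("SIG_IGN:", run_command("true", signal.SIG_IGN))
```

With the default disposition the return value reflects the command's own exit status, which is what an archiver needs in order to tell a successful copy from a failed one.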

Re: [HACKERS] Point in Time Recovery

2004-07-15 Thread Christopher Kings-Lynne
Thanks for that. My comments were heartfelt, but not useful right now. Hi Simon, I'm sorry if I gave the impression that I thought your work wasn't worthwhile, it is :( I'm badly overdrawn already on my time budget, though that is my concern alone. There is more to do than I have time for.

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread Simon Riggs
On Wed, 2004-07-14 at 03:31, Christopher Kings-Lynne wrote: Can you give us some suggestions of what kind of stuff to test? Is there a way we can artificially kill the backend in all sorts of nasty spots to see if recovery works? Does kill -9 simulate a 'power off'? I was hoping some

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread Zeugswetter Andreas SB SD
The recovery mechanism doesn't rely upon you knowing 1 or 3. The recovery reads pg_control (from the backup) and then attempts to de-archive the appropriate xlog segment file and then starts rollforward Unfortunately this only works if pg_control was the first file to be backed up (or by

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I've not done power off tests, yet. They need to be done just to check...actually you don't need to do this to test PITR... I agree, power off is not really the point here. What we need to check into is (a) the mechanics of archiving WAL segments and (b)

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread markw
On 14 Jul, Simon Riggs wrote: PITR Patch v5_1 just posted has Point in Time Recovery working. Still some rough edges, but we really need some testers now to give this a try and let me know what you think. Klaus Naumann and Mark Wong are the only [non-committers] to have tried to run

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread Simon Riggs
On Wed, 2004-07-14 at 16:55, [EMAIL PROTECTED] wrote: On 14 Jul, Simon Riggs wrote: PITR Patch v5_1 just posted has Point in Time Recovery working. Still some rough edges, but we really need some testers now to give this a try and let me know what you think. Klaus Naumann and

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread Simon Riggs
On Wed, 2004-07-14 at 10:57, Zeugswetter Andreas SB SD wrote: The recovery mechanism doesn't rely upon you knowing 1 or 3. The recovery reads pg_control (from the backup) and then attempts to de-archive the appropriate xlog segment file and then starts rollforward Unfortunately this

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread Mark Kirkwood
I noticed that compiling with 5_1 patch applied fails due to XLOG_archive_dir being removed from xlog.c , but src/backend/commands/tablecmds.c still uses it. I did the following to tablecmds.c : 5408c5408 extern char XLOG_archive_dir[]; --- extern char

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread SAKATA Tetsuo
Hi, folks. My colleagues and I are planning to test PITR after the 7.5 beta release. Now we are designing test items, but some specifications are not clear enough (to us). For example, we are not clear which resource manager order to store log records. - some access method (like B-tree) require to log

Re: [HACKERS] Point in Time Recovery

2004-07-14 Thread Bruce Momjian
I talked to Tom on the phone today and I think we have a procedure for doing backup/restore in a fairly foolproof way. As outlined below, we need to record the start/stop and checkpoint WAL file names and offsets, and somehow pass those on to restore. I think any system that requires users

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
On Tue, 2004-07-06 at 22:40, Simon Riggs wrote: On Mon, 2004-07-05 at 22:46, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: - when we stop, keep reading records until EOF, just don't apply them. When we write a checkpoint at end of recovery, the unapplied transactions are

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Zeugswetter Andreas SB SD
The starting a new timeline thought works for xlogs, but not for clogs. No matter how far you go into the future, there is a small (yet vanishing) possibility that there is a yet undiscovered committed transaction in the future. (Because transactions are ordered in the clog because xids are

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
On Tue, 2004-07-13 at 13:18, Zeugswetter Andreas SB SD wrote: The starting a new timeline thought works for xlogs, but not for clogs. No matter how far you go into the future, there is a small (yet vanishing) possibility that there is a yet undiscovered committed transaction in the future.

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: Please tell me that we can ignore the state of the clog, We can. The reason that keeping track of timelines is interesting for xlog is simply to take pity on the poor DBA who needs to distinguish the various archived xlog files he's got laying about, and so

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
On Tue, 2004-07-13 at 15:29, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Please tell me that we can ignore the state of the clog, We can. In general, you are of course correct. The reason that keeping track of timelines is interesting for xlog is simply to take pity on the

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: I'm getting carried away with the improbable, but this is the rather strange, but possible scenario I foresee: A sequence of times... 1. We start archiving xlogs 2. We take a checkpoint 3. we commit an important transaction 4. We take a backup 5. We

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
On Tue, 2004-07-13 at 22:19, Tom Lane wrote: To have a consistent recovery at all, you must replay the log starting from a checkpoint before the backup began and extending to the time that the backup finished. You only get to decide where to stop after that point. So the situation is: -

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes: So the situation is: - You must only stop recovery at a point in time (in the logs) after the backup had completed. Right. No way to enforce that currently, apart from procedurally. Not exactly frequent, so I think I just document that and move on, eh?
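The rule stated above (recovery may only stop at a point after the backup had completed) is the kind of check that could be enforced mechanically once the backup's end time is recorded. A minimal sketch, assuming the end time is available as a timestamp; `check_recovery_target` and its arguments are hypothetical names, not part of the patch.

```python
from datetime import datetime


def check_recovery_target(backup_end, target):
    """Return target if it is a usable stop point, else raise ValueError.

    backup_end -- when the base backup finished (assumed to be recorded somewhere)
    target     -- the point in time the DBA asked recovery to stop at
    """
    if target < backup_end:
        raise ValueError(
            f"recovery target {target} precedes end of backup {backup_end}; "
            "the restored files would not be consistent"
        )
    return target


if __name__ == "__main__":
    # Example timestamps are invented for illustration.
    end = datetime(2004, 7, 13, 22, 30)
    print(check_recovery_target(end, datetime(2004, 7, 13, 23, 0)))
```

A comparison on WAL positions or xids would follow the same shape; timestamps are used here only because they are the simplest to show.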

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
On Tue, 2004-07-13 at 23:42, Bruce Momjian wrote: Simon Riggs wrote: On Tue, 2004-07-13 at 22:19, Tom Lane wrote: To have a consistent recovery at all, you must replay the log starting from a checkpoint before the backup began and extending to the time that the backup finished. You

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Bruce Momjian
Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: So the situation is: - You must only stop recovery at a point in time (in the logs) after the backup had completed. Right. No way to enforce that currently, apart from procedurally. Not exactly frequent, so I think I just

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: OK, but procedurally, how do you correlate the start/stop time of the tar backup with the WAL numeric file names? Ideally the procedure for making a backup would go something like: 1. Inquire of the server its current time and the WAL position of the

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
On Wed, 2004-07-14 at 00:28, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: OK, but procedurally, how do you correlate the start/stop time of the tar backup with the WAL numeric file names? Ideally the procedure for making a backup would go something like: 1. Inquire of the

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
PITR Patch v5_1 just posted has Point in Time Recovery working. Still some rough edges, but we really need some testers now to give this a try and let me know what you think. Klaus Naumann and Mark Wong are the only [non-committers] to have tried to run the code (and let me know about it),

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Bruce Momjian
Simon Riggs wrote: On Tue, 2004-07-13 at 22:19, Tom Lane wrote: To have a consistent recovery at all, you must replay the log starting from a checkpoint before the backup began and extending to the time that the backup finished. You only get to decide where to stop after that point.

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Simon Riggs
On Wed, 2004-07-14 at 00:01, Bruce Momjian wrote: Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: So the situation is: - You must only stop recovery at a point in time (in the logs) after the backup had completed. Right. No way to enforce that currently, apart from

Re: [HACKERS] Point in Time Recovery

2004-07-13 Thread Christopher Kings-Lynne
Can you give us some suggestions of what kind of stuff to test? Is there a way we can artificially kill the backend in all sorts of nasty spots to see if recovery works? Does kill -9 simulate a 'power off'? Chris Simon Riggs wrote: PITR Patch v5_1 just posted has Point in Time Recovery

Re: [HACKERS] Point in Time Recovery

2004-07-10 Thread Jan Wieck
On 7/6/2004 3:58 PM, Simon Riggs wrote: On Tue, 2004-07-06 at 08:38, Zeugswetter Andreas SB SD wrote: - by time - but the time stamp on each xlog record only specifies to the second, which could easily be 10 or more commits (we hope) Should we use a different datatype than time_t for the

Re: [HACKERS] Point in Time Recovery

2004-07-10 Thread Simon Riggs
On Sat, 2004-07-10 at 15:17, Jan Wieck wrote: On 7/6/2004 3:58 PM, Simon Riggs wrote: On Tue, 2004-07-06 at 08:38, Zeugswetter Andreas SB SD wrote: - by time - but the time stamp on each xlog record only specifies to the second, which could easily be 10 or more commits (we hope)

Re: [HACKERS] Point in Time Recovery

2004-07-09 Thread spock
On Tue, 6 Jul 2004, Zeugswetter Andreas SB SD wrote: Should we use a different datatype than time_t for the commit timestamp, one that offers more fine grained differentiation between checkpoints? Imho seconds is really sufficient. If you know a more precise position you will probably know

Re: [HACKERS] Point in Time Recovery

2004-07-09 Thread spock
On Thu, 8 Jul 2004, Simon Riggs wrote: We don't need to mention timelines in the docs, nor do we need to alter pg_controldata to display it...just a comment in the code to explain why we add a large number to the LogId after each recovery completes. I'd disagree on that. Knowing what exactly

Re: [HACKERS] Point in Time Recovery

2004-07-08 Thread Simon Riggs
On Thu, 2004-07-08 at 07:57, [EMAIL PROTECTED] wrote: On Thu, 8 Jul 2004, Simon Riggs wrote: We don't need to mention timelines in the docs, nor do we need to alter pg_controldata to display it...just a comment in the code to explain why we add a large number to the LogId after each

Re: [HACKERS] Point in Time Recovery

2004-07-07 Thread Zeugswetter Andreas SB SD
Well, Tom does seem to have something with regard to StartUpIds. I feel it is easier to force a new timeline by adding a very large number to the LogId IF, and only if, we have performed an archive recovery. That way, we do not change at all the behaviour of the system for people that choose

Re: [HACKERS] Point in Time Recovery

2004-07-07 Thread Simon Riggs
On Wed, 2004-07-07 at 14:17, Zeugswetter Andreas SB SD wrote: Well, Tom does seem to have something with regard to StartUpIds. I feel it is easier to force a new timeline by adding a very large number to the LogId IF, and only if, we have performed an archive recovery. That way, we do not

Re: [HACKERS] Point in Time Recovery

2004-07-06 Thread Zeugswetter Andreas SB SD
- by time - but the time stamp on each xlog record only specifies to the second, which could easily be 10 or more commits (we hope) Should we use a different datatype than time_t for the commit timestamp, one that offers more fine grained differentiation between checkpoints? Imho
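The granularity concern can be made concrete with a toy model. In the sketch below each commit record carries a whole-second timestamp, as the message describes, and recovery "by time" replays every commit stamped at or before the requested second. The record layout, the inclusive boundary, and the name `commits_replayed` are all assumptions made for illustration; the point is only that commits sharing a second cannot be separated by a time target.

```python
def commits_replayed(commits, stop_second):
    """Return the xids replayed when recovery stops 'by time'.

    commits     -- list of (xid, commit_second) tuples in log order
    stop_second -- requested stop time, whole seconds
    Replay halts at the first record stamped later than stop_second.
    """
    replayed = []
    for xid, commit_second in commits:
        if commit_second > stop_second:
            break
        replayed.append(xid)
    return replayed


if __name__ == "__main__":
    # Three commits (xids 2, 3, 4) land in the same second.
    log = [(1, 99), (2, 100), (3, 100), (4, 100), (5, 101)]
    print(commits_replayed(log, 99))
    print(commits_replayed(log, 100))
```

In this model a stop time of 99 excludes all three same-second commits and a stop time of 100 includes all three; stopping between them would need a finer timestamp or a target expressed as an xid.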

Re: [HACKERS] Point in Time Recovery

2004-07-06 Thread Simon Riggs
On Tue, 2004-07-06 at 08:38, Zeugswetter Andreas SB SD wrote: - by time - but the time stamp on each xlog record only specifies to the second, which could easily be 10 or more commits (we hope) Should we use a different datatype than time_t for the commit timestamp, one that offers

Re: [HACKERS] Point in Time Recovery

2004-07-06 Thread Simon Riggs
On Mon, 2004-07-05 at 22:46, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: - when we stop, keep reading records until EOF, just don't apply them. When we write a checkpoint at end of recovery, the unapplied transactions are buried alive, never to return. - stop where we stop,

Re: [HACKERS] Point in Time Recovery

2004-07-06 Thread Simon Riggs
On Tue, 2004-07-06 at 20:00, Richard Huxton wrote: Simon Riggs wrote: On Mon, 2004-07-05 at 22:46, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: Should we use a different datatype than time_t for the commit timestamp, one that offers more fine grained differentiation between

[HACKERS] Point in Time Recovery

2004-07-05 Thread Simon Riggs
Taking advantage of the freeze bubble allowed us... there are some last minute features to add. Summarising earlier thoughts, with some detailed digging and design from myself in the last few days - we're now in a position to add Point-in-Time Recovery, on top of what's been achieved. The target for
