[ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Jeff Frost
I've run into a problem with a PITR setup at a client. The problem is that whenever the CIFS NAS device that we're mounting at /mnt/pgbackup has problems, it seems that the current client connection gets blocked and this eventually builds up to a "sorry, too many clients already" error. I'm w

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Tom Arthurs
What might be more bullet proof would be to make the archive command copy the file to an intermediate local directory, then have a daemon/cron job that wakes up once a minute or so, check for new files, then copy them to the network mount. You may want to use something like lofs to make sure t
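
The staging approach suggested above could be sketched roughly as follows. This is only an illustration, not anything posted in the thread: the function names `stage_wal` and `ship_staged`, the `.tmp` suffix, and the directory layout are all assumptions, and a real setup would also want fsync calls and logging.

```python
import os
import shutil


def stage_wal(src_path, staging_dir):
    """Copy one WAL segment into a local staging directory.

    Intended to be the only work done by archive_command, so it never
    touches the network mount. Returns the staged path.
    """
    name = os.path.basename(src_path)
    final = os.path.join(staging_dir, name)
    if os.path.exists(final):
        # Refuse to overwrite a segment that is already staged.
        raise FileExistsError(final)
    tmp = final + ".tmp"
    shutil.copyfile(src_path, tmp)
    os.rename(tmp, final)  # rename so the shipper never sees a partial file
    return final


def ship_staged(staging_dir, remote_dir):
    """Move staged segments to the network mount, oldest name first.

    Meant to run from cron or a small daemon. Stops at the first error
    so the remaining segments are retried on the next run.
    """
    shipped = []
    for name in sorted(os.listdir(staging_dir)):
        if name.endswith(".tmp"):
            continue  # still being written by stage_wal
        src = os.path.join(staging_dir, name)
        tmp = os.path.join(remote_dir, name + ".tmp")
        try:
            shutil.copyfile(src, tmp)
            os.rename(tmp, os.path.join(remote_dir, name))
        except OSError:
            break  # mount is unhealthy; keep the local copy
        os.remove(src)  # delete locally only after the remote copy succeeded
        shipped.append(name)
    return shipped
```

A small wrapper script would call `stage_wal` with the path that archive_command passes as `%p`. If the mount hangs rather than returning an error, only the shipper blocks; the local staging step is unaffected.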

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > I've run into a problem with a PITR setup at a client. The problem is that > whenever the CIFS NAS device that we're mounting at /mnt/pgbackup has > problems, it seems that the current client connection gets blocked and this > eventually builds up to a "

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Jeff Frost
On Mon, 15 May 2006, Tom Lane wrote: No, I can't see what the connection should be there. It's supposed to be designed so that the archive command can take its sweet old time and nothing happens except that a backlog of WAL files builds up in pg_xlog. That's what I thought, but that doesn't s

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > On Mon, 15 May 2006, Tom Lane wrote: >> No, I can't see what the connection should be there. It's supposed to >> be designed so that the archive command can take its sweet old time and >> nothing happens except that a backlog of WAL files builds up in pg_xl

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Jeff Frost
On Mon, 15 May 2006, Tom Lane wrote: That's what I thought, but that doesn't seem to be what I'm observing. Of course the NAS device only gets wedged about once every month or two, so it's difficult to reproduce. If it's really a PG bug, it should be trivial to reproduce: put a long sleep in

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Simon Riggs
On Mon, 2006-05-15 at 09:28 -0700, Jeff Frost wrote: > I've run into a problem with a PITR setup at a client. The problem is that > whenever the CIFS NAS device that we're mounting at /mnt/pgbackup has > problems What kind of problems? > , it seems that the current client connection gets block

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Jeff Frost
On Mon, 15 May 2006, Simon Riggs wrote: On Mon, 2006-05-15 at 09:28 -0700, Jeff Frost wrote: I've run into a problem with a PITR setup at a client. The problem is that whenever the CIFS NAS device that we're mounting at /mnt/pgbackup has problems What kind of problems? It becomes unwritabl

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Scott Marlowe
On Mon, 2006-05-15 at 16:29, Jeff Frost wrote: > On Mon, 15 May 2006, Simon Riggs wrote: > > > On Mon, 2006-05-15 at 09:28 -0700, Jeff Frost wrote: > >> I've run into a problem with a PITR setup at a client. The problem is that > >> whenever the CIFS NAS device that we're mounting at /mnt/pgbacku

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-15 Thread Simon Riggs
On Mon, 2006-05-15 at 14:29 -0700, Jeff Frost wrote: > On Mon, 15 May 2006, Simon Riggs wrote: > > > On Mon, 2006-05-15 at 09:28 -0700, Jeff Frost wrote: > >> I've run into a problem with a PITR setup at a client. The problem is that > >> whenever the CIFS NAS device that we're mounting at /mnt/p

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-16 Thread Simon Riggs
On Mon, 2006-05-15 at 16:58 -0700, Jeff Frost wrote: > The log is below. Note that the problem began around 2 a.m. around the time > the complaint about checkpoint segments happens. After a bit of research it > appears that the checkpoint complaint happens when our db maintenance job > kicks o

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-16 Thread Jeff Frost
On Tue, 16 May 2006, Simon Riggs wrote: I don't see much evidence for a connection between archiver and these issues. The problems start after autovacuum of "vb_web" at 02:08. That seems much more likely to have something to do with client connections than the archiver - which is really nothing

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-16 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > On Tue, 16 May 2006, Simon Riggs wrote: >> Whatever happened between 02:08 and 02:14 seems important. > I have the logs and after reviewing /var/log/messages for that time period, > there is no other activity besides postgres. I have a lurking feeling tha

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-16 Thread Jeff Frost
On Wed, 17 May 2006, Tom Lane wrote: I have a lurking feeling that the still-hypothetical connection between archiver and foreground operations might come into operation at pg_clog page boundaries (which require emitting XLOG events) --- that is, every 32K transactions something special happens
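
The "every 32K transactions" figure can be checked with a little arithmetic. The sketch below assumes the default 8192-byte page size, which the thread does not state, and uses the 3,000 transactions per minute that Jeff mentions later in the thread; the constant and function names are made up for illustration.

```python
BLOCK_SIZE_BYTES = 8192   # assumed default page size; not stated in the thread
STATUS_BITS_PER_XACT = 2  # pg_clog stores two status bits per transaction

# Number of transactions whose status fits on one pg_clog page.
XACTS_PER_CLOG_PAGE = BLOCK_SIZE_BYTES * 8 // STATUS_BITS_PER_XACT


def minutes_between_page_boundaries(xacts_per_minute):
    """How often a new pg_clog page is started at a steady transaction rate."""
    return XACTS_PER_CLOG_PAGE / xacts_per_minute


print(XACTS_PER_CLOG_PAGE)                               # 32768
print(round(minutes_between_page_boundaries(3000), 1))   # 10.9
```

At that rate a page boundary would come up roughly every eleven minutes, so if the hypothesis held, a stall would be expected to show up fairly soon after the archiver got stuck.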

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Simon Riggs
On Wed, 2006-05-17 at 00:36 -0400, Tom Lane wrote: > Jeff Frost <[EMAIL PROTECTED]> writes: > > On Tue, 16 May 2006, Simon Riggs wrote: > >> Whatever happened between 02:08 and 02:14 seems important. > > > I have the logs and after reviewing /var/log/messages for that time period, > > there is no

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes: > You'll have to explain a little more. I checked the archives... I was thinking of http://archives.postgresql.org/pgsql-hackers/2004-01/msg00530.php full explanation here: http://archives.postgresql.org/pgsql-hackers/2004-01/msg00606.php > The "lurking fee

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Simon Riggs
On Wed, 2006-05-17 at 10:01 -0400, Tom Lane wrote: > Simon Riggs <[EMAIL PROTECTED]> writes: > > You'll have to explain a little more. I checked the archives... > > I was thinking of > http://archives.postgresql.org/pgsql-hackers/2004-01/msg00530.php > full explanation here: > http://archives.post

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Jeff Frost
On Wed, 17 May 2006, Tom Lane wrote: The "lurking feeling" scenario above might or might not be an issue here, but I can't see how the archiver could be involved at all. Well, I don't see it either; at this point we're waiting on Jeff to provide some harder evidence ... Was the 3,000 transac

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > Was the 3,000 transactions per minute helpful? What other evidence should I > be looking for? Did you try generating a test case using a long sleep() as a replacement for the archive_command script? If there is a PG bug here it shouldn't be that hard to
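
One way to build the test case suggested above is a stand-in archive script that simply stalls before doing its work. The following is a minimal sketch under that assumption; `slow_archive` and its parameters are hypothetical names, and the injectable `sleep` argument exists only so the function can be exercised without actually waiting.

```python
import shutil
import time


def slow_archive(wal_path, dest_path, delay_seconds, sleep=time.sleep):
    """Stand-in for an archive_command script that takes a long time.

    Sleeps first to simulate a stalled archive destination, then copies
    the segment. Returns 0, the exit status archive_command must report
    for a successfully archived segment.
    """
    sleep(delay_seconds)
    shutil.copyfile(wal_path, dest_path)
    return 0
```

With a delay of several minutes per segment and a write-heavy load such as pgbench running, WAL files should pile up in pg_xlog; the question under test is whether client connections stall as well.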

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Jeff Frost
On Wed, 17 May 2006, Tom Lane wrote: Did you try generating a test case using a long sleep() as a replacement for the archive_command script? If there is a PG bug here it shouldn't be that hard to expose it in a simple test case. I'm up to my armpits in other stuff and don't have time to try i

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Jeff Frost
On Wed, 17 May 2006, Jeff Frost wrote: On Wed, 17 May 2006, Tom Lane wrote: Did you try generating a test case using a long sleep() as a replacement for the archive_command script? If there is a PG bug here it shouldn't be that hard to expose it in a simple test case. I'm up to my armpits in

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > I seem to get a lot of these: > May 17 21:34:04 discord postgres[20573]: [5-1] WARNING: could not rename > file > "pg_xlog/archive_status/00010001.ready" to > May 17 21:34:04 discord postgres[20573]: [5-2] > "pg_xlog/archive_status/00

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Jeff Frost
On Thu, 18 May 2006, Tom Lane wrote: Jeff Frost <[EMAIL PROTECTED]> writes: I seem to get a lot of these: May 17 21:34:04 discord postgres[20573]: [5-1] WARNING: could not rename file "pg_xlog/archive_status/00010001.ready" to May 17 21:34:04 discord postgres[20573]: [5-2] "p

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-17 Thread Jeff Frost
On Wed, 17 May 2006, Jeff Frost wrote: And in the window where I started postgres via pg_ctl, I had this: cat: pg_xlog/0001000E: No such file or directory cat: pg_xlog/0001000E: No such file or directory Hrmmm...my pgbench died with an integer out of range erro

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-18 Thread Simon Riggs
On Wed, 2006-05-17 at 22:45 -0700, Jeff Frost wrote: > On Thu, 18 May 2006, Tom Lane wrote: > > > Jeff Frost <[EMAIL PROTECTED]> writes: > >> I seem to get a lot of these: > > > >> May 17 21:34:04 discord postgres[20573]: [5-1] WARNING: could not rename > >> file > >> "pg_xlog/archive_status/

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-18 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > Hrmmm...my pgbench died with an integer out of range error: That's normal, if you run it long enough without re-creating the tables. It keeps adding small values to the balances, and eventually they overflow. (Possibly someone should fix it so that the del

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-18 Thread Jeff Frost
On Thu, 18 May 2006, Simon Riggs wrote: Seems so. Can you post the full test, plus full execution log. [You don't need to "cat" you could just do "ls" instead FWIW] Are you doing *anything* with pg_xlog directory or below? I understand your saying No to that question and pg_xlog has not been

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Thu, 2006-05-18 at 10:08 -0700, Jeff Frost wrote: > May 18 08:00:18 discord postgres[20228]: [129-1] LOG: archived transaction > log file "0001007F" > May 18 08:00:41 discord postgres[20573]: [254-1] LOG: archived transaction > log file "0001007F" > May 18 08

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost
On Fri, 19 May 2006, Tom Lane wrote: Jeff Frost <[EMAIL PROTECTED]> writes: Do you think the postmaster on 5432 is trying to archive the other postmaster's WAL files somehow? Not as long as they aren't in the same data directory ;-). What Simon was wondering about was whether an archiver pro

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost
On Fri, 19 May 2006, Tom Lane wrote: Well, there's our smoking gun. IIRC, all the failures you showed us are consistent with race conditions caused by multiple archiver processes all trying to do the same tasks concurrently. Do you frequently stop and restart the postmaster? Because I don't s

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > Hurray! Unfortunately, the postmaster on the original troubled server almost > never gets restarted, and in fact has only one archiver process running > right now. Drat! Well, the fact that there's only one archiver *now* doesn't mean there wasn't m

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Tom Lane
I wrote: > Well, the fact that there's only one archiver *now* doesn't mean there > wasn't more than one when the problem happened. The orphaned archiver > would eventually quit. But, actually, never mind: we have explained the failures you were seeing in the test setup, but a multiple-active-arch

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 12:20 -0400, Tom Lane wrote: > I wrote: > > Well, the fact that there's only one archiver *now* doesn't mean there > > wasn't more than one when the problem happened. The orphaned archiver > > would eventually quit. > > But, actually, never mind: we have explained the failure

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost
On Fri, 19 May 2006, Tom Lane wrote: Well, the fact that there's only one archiver *now* doesn't mean there wasn't more than one when the problem happened. The orphaned archiver would eventually quit. Do you have logs that would let you check when the production postmaster was restarted? I l

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Jeff Frost
On Fri, 19 May 2006, Tom Lane wrote: What I'd suggest is resuming the test after making sure you've killed off any old archivers, and seeing if you can make any progress on reproducing the original problem. We definitely need a multiple-archiver interlock, but I think that must be unrelated to

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-19 Thread Simon Riggs
On Fri, 2006-05-19 at 09:36 -0700, Jeff Frost wrote: > On Fri, 19 May 2006, Tom Lane wrote: > > > What I'd suggest is resuming the test after making sure you've killed > > off any old archivers, and seeing if you can make any progress on > > reproducing the original problem. We definitely need a

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-21 Thread Jeff Frost
On Fri, 19 May 2006, Simon Riggs wrote: Now I can run my same pg_bench, or do you guys have any other suggestions on attempting to reproduce the problem? No. We're back on track to try to reproduce the original error. I've been futzing with trying to reproduce the original problem for a few

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-21 Thread Simon Riggs
On Sun, 2006-05-21 at 14:16 -0700, Jeff Frost wrote: > On Fri, 19 May 2006, Simon Riggs wrote: > > >> Now I can run my same pg_bench, or do you guys > >> have any other suggestions on attempting to reproduce the problem? > > > > No. We're back on track to try to reproduce the original error. > >

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-21 Thread Jeff Frost
On Sun, 21 May 2006, Simon Riggs wrote: I've been futzing with trying to reproduce the original problem for a few days and so far postgres seems to be just fine with a long delay on archiving, so now I'm rather at a loss. In fact, I currently have 1,234 xlog files in pg_xlog, but the archiver i

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-22 Thread Jeff Frost
On Sun, 21 May 2006, Jeff Frost wrote: So the chances of the original problem being archiver related are receding... This is possible, but I guess I should try and reproduce the actual problem with the same archive_command script and a CIFS mount just to see what happens. Perhaps the real r

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-22 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > I tried both pulling the plug on the CIFS server and unsharing the CIFS > share, > but pgbench continued completely unconcerned. I guess the failure mode of > the > NAS device in the customer colo must be something different that I don't yet > know how

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-22 Thread Jeff Frost
On Tue, 23 May 2006, Tom Lane wrote: I'm still thinking that the simplest explanation is that $PGDATA/pg_clog/ is on the NAS device. Please double-check the file locations. I know that seems like an excellent candidate, but it really isn't, I swear. In fact, you almost had me convinced the l

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-23 Thread Simon Riggs
On Fri, 2006-05-19 at 08:53 -0700, Jeff Frost wrote: > On Fri, 19 May 2006, Tom Lane wrote: > > > Jeff Frost <[EMAIL PROTECTED]> writes: > >> Do you think the postmaster on 5432 is trying to archive the other > >> postmaster's WAL files somehow? > > > > Not as long as they aren't in the same data

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-23 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > Do you think the postmaster on 5432 is trying to archive the other > postmaster's WAL files somehow? Not as long as they aren't in the same data directory ;-). What Simon was wondering about was whether an archiver process had somehow been left over from

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-23 Thread Simon Riggs
On Fri, 2006-05-19 at 08:23 -0700, Jeff Frost wrote: > On Fri, 19 May 2006, Simon Riggs wrote: > > > On Thu, 2006-05-18 at 10:08 -0700, Jeff Frost wrote: > > > >> May 18 08:00:18 discord postgres[20228]: [129-1] LOG: archived > >> transaction log file "0001007F" > >> May 18 08:00

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-23 Thread Jeff Frost
On Fri, 19 May 2006, Simon Riggs wrote: On Thu, 2006-05-18 at 10:08 -0700, Jeff Frost wrote: May 18 08:00:18 discord postgres[20228]: [129-1] LOG: archived transaction log file "0001007F" May 18 08:00:41 discord postgres[20573]: [254-1] LOG: archived transaction log file "0

Re: [ADMIN] does wal archiving block the current client connection?

2006-05-23 Thread Tom Lane
Jeff Frost <[EMAIL PROTECTED]> writes: > Well now, will you look at this: > postgres 20228 1 0 May17 ? 00:00:00 postgres: archiver process > postgres 20573 1 0 May17 ? 00:00:00 postgres: archiver process > postgres 23817 23810 0 May17 pts/11 00:00:00 postgres: archiver p
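
A check for leftover archivers like the ones in the listing above could look something like this. It is a rough sketch that assumes `ps -ef` style output (UID, PID, PPID, C, STIME, TTY, TIME, CMD) and treats an archiver whose parent PID is 1 as orphaned; the function names are hypothetical, and the sample text is only modeled on the quoted listing (the third command string is cut off in the message and is assumed here to match the other two).

```python
def find_archivers(ps_output):
    """Return (pid, ppid) for every line that looks like an archiver process."""
    found = []
    for line in ps_output.splitlines():
        # Columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        fields = line.split(None, 7)
        if len(fields) < 8 or "archiver process" not in fields[7]:
            continue
        found.append((int(fields[1]), int(fields[2])))
    return found


def orphaned(archivers):
    """PIDs of archivers reparented to init (PPID 1), i.e. no live postmaster."""
    return [pid for pid, ppid in archivers if ppid == 1]


# Illustrative sample modeled on the listing quoted in the thread.
SAMPLE = """\
UID        PID  PPID  C STIME TTY          TIME CMD
postgres 20228     1  0 May17 ?        00:00:00 postgres: archiver process
postgres 20573     1  0 May17 ?        00:00:00 postgres: archiver process
postgres 23817 23810  0 May17 pts/11   00:00:00 postgres: archiver process
"""

print(orphaned(find_archivers(SAMPLE)))  # [20228, 20573]
```

On systems where orphaned processes are reparented to something other than PID 1, the PPID test would need adjusting; comparing against the PID of the running postmaster is probably the more robust check.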