Re: [HACKERS] Proposal for 9.1: WAL streaming from WAL buffers

2010-07-08 Fujii Masao
On Thu, Jul 8, 2010 at 7:55 AM, Robert Haas wrote: >> What was the final decision on behavior if fsync=off? > > I'm not sure we made any decision, per se, but if you use fsync=off in > combination with SR and experience an unexpected crash-and-reboot on > the master, you will be sad. True. But, w

2010-07-07 marcin mank
> Having said that, I do think we urgently need some high-level design > discussion on how sync rep is actually going to handle this issue > (perhaps on a new thread).  If we can't resolve this issue, sync rep > is going to be really slow; but there are no easy solutions to this > problem in sight,

2010-07-07 Robert Haas
On Wed, Jul 7, 2010 at 6:44 PM, Josh Berkus wrote: > On 7/6/10 4:44 PM, Robert Haas wrote: >> To recap the previous discussion on this thread, we ended up changing >> the behavior of 9.0 so that it only sends WAL which has been written >> to the OS *and flushed*, because sending unflushed WAL to t

2010-07-07 Josh Berkus
On 7/6/10 4:44 PM, Robert Haas wrote: > To recap the previous discussion on this thread, we ended up changing > the behavior of 9.0 so that it only sends WAL which has been written > to the OS *and flushed*, because sending unflushed WAL to the standby > is unsafe. The standby can get ahead of the

2010-07-07 Dimitri Fontaine
Tom Lane writes: > Dimitri Fontaine writes: >> Stop me if I'm all wrong already, but I thought we said that we should >> handle this case by decoupling what we can send to the standby and what >> it can apply. > > What's the point of that? It won't make the standby apply any faster. True, but it

2010-07-07 Tom Lane
Dimitri Fontaine writes: > Stop me if I'm all wrong already, but I thought we said that we should > handle this case by decoupling what we can send to the standby and what > it can apply. What's the point of that? It won't make the standby apply any faster. What it will do is make the protocol mo

2010-07-07 Robert Haas
On Wed, Jul 7, 2010 at 4:40 AM, Dimitri Fontaine wrote: > Stop me if I'm all wrong already, but I thought we said that we should > handle this case by decoupling what we can send to the standby and what > it can apply. We could do this by sending the current WAL fsync'ed > position on the master in
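
A minimal sketch of the decoupling Dimitri describes, assuming the master attaches its current fsync'd WAL position to every chunk it streams. The class and field names are hypothetical (they are not walreceiver's real structures) and LSNs are plain integers: the standby may receive WAL early, but replay never passes the master's reported flush position.

```python
from dataclasses import dataclass


@dataclass
class StandbyState:
    """Hypothetical standby-side bookkeeping; LSNs are modelled as plain ints."""

    received_lsn: int = 0      # end of the WAL received from the master so far
    master_flush_lsn: int = 0  # latest fsync'd position the master has reported
    applied_lsn: int = 0       # how far recovery has replayed

    def on_wal_message(self, end_lsn: int, master_flush_lsn: int) -> None:
        # Each message carries WAL up to end_lsn plus the master's flush position.
        self.received_lsn = max(self.received_lsn, end_lsn)
        self.master_flush_lsn = max(self.master_flush_lsn, master_flush_lsn)

    def apply_limit(self) -> int:
        # Replay only WAL that has arrived AND is durable on the master.
        return min(self.received_lsn, self.master_flush_lsn)

    def replay(self) -> int:
        # Advance recovery as far as the limit allows and report the new position.
        self.applied_lsn = max(self.applied_lsn, self.apply_limit())
        return self.applied_lsn


standby = StandbyState()
standby.on_wal_message(end_lsn=300, master_flush_lsn=100)
print(standby.replay())  # 100: LSN 100..300 is held back until the master fsyncs it
standby.on_wal_message(end_lsn=300, master_flush_lsn=300)
print(standby.replay())  # 300
```

As noted elsewhere in the thread, held-back WAL must not become replayable after a standby restart either; this sketch models only the in-memory limit.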

2010-07-07 Dimitri Fontaine
Robert Haas writes: > If it's unsafe to send written but unflushed WAL to the standby, then > for the same reasons we can't send unwritten WAL either. [...] > Having said that, I do think we urgently need some high-level design > discussion on how sync rep is actually going to handle this issue S

2010-07-06 Robert Haas
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: > In 9.0, walsender reads WAL always from the disk and sends it to the standby. > That is, we cannot send WAL until it has been written (and flushed) to the > disk. > This degrades the performance of synchronous replication very much since a > t
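
A simplified sketch of the proposal quoted above: walsender first tries to copy the requested range out of an in-memory window of recent WAL and falls back to reading WAL files only when that range has already been evicted. `WalBuffers`, `walsender_read` and `disk_read` are hypothetical names; real WAL buffers are organized in pages rather than bytes, and the sketch deliberately ignores the question, debated in this thread, of whether unflushed WAL is safe to send.

```python
from typing import Callable, Optional


class WalBuffers:
    """Hypothetical fixed-size in-memory window over the most recent WAL bytes."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.data = bytearray()
        self.start_lsn = 0  # LSN of self.data[0]

    @property
    def end_lsn(self) -> int:
        return self.start_lsn + len(self.data)

    def insert(self, record: bytes) -> None:
        self.data += record
        overflow = len(self.data) - self.capacity
        if overflow > 0:
            # The oldest bytes are evicted; from now on they exist only on disk.
            del self.data[:overflow]
            self.start_lsn += overflow

    def read(self, start: int, end: int) -> Optional[bytes]:
        """Return WAL bytes [start, end) if they are all still buffered, else None."""
        if self.start_lsn <= start <= end <= self.end_lsn:
            return bytes(self.data[start - self.start_lsn:end - self.start_lsn])
        return None


def walsender_read(buffers: WalBuffers,
                   disk_read: Callable[[int, int], bytes],
                   start: int, end: int) -> bytes:
    # Prefer the in-memory copy; otherwise read the WAL files from disk.
    chunk = buffers.read(start, end)
    return chunk if chunk is not None else disk_read(start, end)


wal_on_disk = b"ABCDEFGHIJ"       # stand-in for the WAL segment files
buffers = WalBuffers(capacity=4)
buffers.insert(wal_on_disk)       # only "GHIJ" (LSN 6..10) stays in memory
print(walsender_read(buffers, lambda s, e: wal_on_disk[s:e], 7, 9))  # b'HI' from memory
print(walsender_read(buffers, lambda s, e: wal_on_disk[s:e], 2, 5))  # b'CDE' from disk
```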

2010-07-01 Greg Stark
On Wed, Jun 30, 2010 at 12:37 PM, Robert Haas wrote: > One thought that occurred to me is that if the master and standby were > more tightly coupled, you could recover after a crash by making the > one with the further-advanced WAL position the master, and the other > one the standby.  That would

2010-06-30 Robert Haas
On Wed, Jun 30, 2010 at 5:36 AM, Fujii Masao wrote: >> Before we get too busy frobnicating this gonkulator, I'd like to see a >> little more discussion of what kind of performance people are >> expecting from sync rep.  Sounds to me like the best we can expect >> here is, on every commit: (a) wait

2010-06-30 Fujii Masao
On Wed, Jun 30, 2010 at 11:26 AM, Robert Haas wrote: > Maybe. As Heikki pointed out upthread, the standby can't even write > the WAL back to the OS until it's been fsync'd on the master > without risking the problem under discussion. If we change the startup process so that it doesn't go ahea

2010-06-29 Robert Haas
On Tue, Jun 29, 2010 at 10:06 PM, Bruce Momjian wrote: > Simon Riggs wrote: >> On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote: >> >> > The problem is not that the master streams non-fsync'd WAL, but that the >> > standby can replay that. So I'm thinking that we can send non-fsync'd WAL >> >

2010-06-29 Bruce Momjian
Simon Riggs wrote: > On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote: > > > The problem is not that the master streams non-fsync'd WAL, but that the > > standby can replay that. So I'm thinking that we can send non-fsync'd WAL > > safely if the standby makes the recovery wait until the master

2010-06-21 Simon Riggs
On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote: > The problem is not that the master streams non-fsync'd WAL, but that the > standby can replay that. So I'm thinking that we can send non-fsync'd WAL > safely if the standby makes the recovery wait until the master has fsync'd > WAL. That is,

2010-06-21 Greg Stark
On Mon, Jun 21, 2010 at 10:40 AM, Heikki Linnakangas wrote: > I guess, but you have to be very careful to correctly refrain from applying > the WAL. For example, a naive implementation might write the WAL to disk in > walreceiver immediately, but refrain from telling the startup process about > it

2010-06-21 Heikki Linnakangas
On 21/06/10 12:08, Fujii Masao wrote: On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote: In 9.0, I think we can fix this problem by (1) only streaming WAL that has been fsync'd and (2) PANIC-ing if the problem occurs anyway. But in 9.1, with sync rep and the performance demands that entails,

2010-06-21 Fujii Masao
On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote: > On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote: >>> I wonder if it would be possible to jigger things so that we send the >>> WAL to the standby as soon as it is generated, but somehow arrange >>> things so that the standby knows the last

2010-06-15 Robert Haas
On Tue, Jun 15, 2010 at 8:09 PM, Josh Berkus wrote: > >> I have yet to convince myself of how likely this is to occur.  I tried >> to reproduce this issue by crashing the database, but I think in 9.0 >> you need an actual operating system crash to cause this problem, and I >> haven't yet set up an

2010-06-15 Josh Berkus
On 6/15/10 5:09 PM, Josh Berkus wrote: >> > In 9.0, I think we can fix this problem by (1) only streaming WAL that >> > has been fsync'd and > > I don't think this is the best solution; it would be a noticeable > performance penalty on replication. Actually, there's an even bigger reason not to

2010-06-15 Josh Berkus
> I have yet to convince myself of how likely this is to occur. I tried > to reproduce this issue by crashing the database, but I think in 9.0 > you need an actual operating system crash to cause this problem, and I > haven't yet set up an environment in which I can repeatedly crash the > OS. I

2010-06-15 Robert Haas
On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote: >> I wonder if it would be possible to jigger things so that we send the >> WAL to the standby as soon as it is generated, but somehow arrange >> things so that the standby knows the last location that the master has >> fsync'd and never applies

2010-06-15 Josh Berkus
> I wonder if it would be possible to jigger things so that we send the > WAL to the standby as soon as it is generated, but somehow arrange > things so that the standby knows the last location that the master has > fsync'd and never applies beyond that point. I can't think of any way which would

2010-06-15 Florian Pflug
On Jun 15, 2010, at 10:45 , Fujii Masao wrote: > A transaction commit would need to wait for local fsync and replication > in a serial manner, in synchronous replication. IOW, walsender cannot > send the commit record until it's fsync'd in XLogWrite(). Hm, but since 9.0 won't do synchronous replic

2010-06-15 Robert Haas
On Tue, Jun 15, 2010 at 12:46 AM, Fujii Masao wrote: > On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote: >> On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote: >>> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: Maybe.  That sounds like a pretty enormous foot-gun to me, considering

2010-06-15 Fujii Masao
On Tue, Jun 15, 2010 at 2:16 PM, Heikki Linnakangas wrote: > On 15/06/10 07:47, Fujii Masao wrote: >> >> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane  wrote: >>> >>> Fujii Masao  writes: Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH, xlogctl->LogwrtResult.Write i

2010-06-14 Heikki Linnakangas
On 15/06/10 07:47, Fujii Masao wrote: On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote: Fujii Masao writes: Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH, xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync. Wrong. LogwrtResult.Write tracks how far
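
The distinction Heikki draws corresponds to the two fields of PostgreSQL's `XLogwrtResult` struct in xlog.c: `Write` (last byte + 1 handed to the kernel) and `Flush` (last byte + 1 fsync'd). The Python model below only illustrates that pair and the two possible streaming limits; `send_limit` and its `only_flushed` flag are hypothetical, not walsender code.

```python
from dataclasses import dataclass


@dataclass
class XLogwrtResult:
    """Illustrative model of the Write/Flush pair; LSNs are plain ints."""

    write: int = 0  # last byte + 1 handed to the kernel with write()
    flush: int = 0  # last byte + 1 known to be durable after fsync()

    def after_write(self, upto: int) -> None:
        self.write = max(self.write, upto)

    def after_fsync(self) -> None:
        # An fsync covers everything written so far, so flush catches up to write.
        self.flush = self.write


def send_limit(result: XLogwrtResult, only_flushed: bool) -> int:
    # only_flushed=True streams only fsync'd WAL;
    # False streams anything already written to the kernel.
    return result.flush if only_flushed else result.write


r = XLogwrtResult()
r.after_write(200)
print(send_limit(r, only_flushed=False), send_limit(r, only_flushed=True))  # 200 0
r.after_fsync()
print(send_limit(r, only_flushed=True))  # 200
```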

2010-06-14 Fujii Masao
On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote: > Fujii Masao writes: >> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: >>> Well, we're already not waiting for fsync, which is the slowest part. > >> No, currently walsender waits for fsync. > > No, you're mistaken. > >> Walsender tries to se

2010-06-14 Fujii Masao
On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote: > On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote: >> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: >>> Maybe.  That sounds like a pretty enormous foot-gun to me, considering >>> that we have no way of recovering from the situation wh

2010-06-14 Tom Lane
Fujii Masao writes: > On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: >> Well, we're already not waiting for fsync, which is the slowest part. > No, currently walsender waits for fsync. No, you're mistaken. > Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH, > xlogctl->Log

2010-06-14 Robert Haas
On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote: > On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: >> Maybe.  That sounds like a pretty enormous foot-gun to me, considering >> that we have no way of recovering from the situation where the standby >> gets ahead of the master. > > No, we can

2010-06-14 Fujii Masao
On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote: > Maybe.  That sounds like a pretty enormous foot-gun to me, considering > that we have no way of recovering from the situation where the standby > gets ahead of the master. No, we can do that by reconstructing the standby from the backup. And,

2010-06-14 Simon Riggs
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote: > No, currently walsender waits for fsync. > ... > But that change would cause the problem that Robert pointed out. > http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php Presumably this means that if synchronous_commit = off on p

2010-06-14 Simon Riggs
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote: > On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: > > Stefan Kaltenbrunner writes: > >> hmm not sure that is what fujii tried to say - I think his point was > >> that in the original case we would have serialized all the operations > >> (fir

2010-06-14 Robert Haas
On Mon, Jun 14, 2010 at 4:14 AM, Fujii Masao wrote: > On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote: >> I think the failover case might be OK.  But if the master crashes and >> restarts, the slave might be left thinking its xlog position is ahead >> of the xlog position on the master. > > R

2010-06-14 Fujii Masao
On Sat, Jun 12, 2010 at 12:15 AM, Stefan Kaltenbrunner wrote: > hmm ok - but assuming sync rep we would end up with something like the > following (hypothetically assuming each operation takes 1 time unit): > > originally: > > write 1 > sync 1 > network 1 > write 1 > sync 1 > > total: 5 > > whereas
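
The arithmetic behind the serialized total can be written out as below, with every step costing one time unit as in the quoted example. The overlapped figure rests on an assumption made here, not on a number from the truncated message: WAL is shipped as soon as it is generated, so the master's write+sync runs concurrently with network transfer plus the standby's write+sync, and the commit waits for the slower path. Acknowledgement round trips are ignored.

```python
def serialized_latency(master_write: float, master_sync: float, network: float,
                       standby_write: float, standby_sync: float) -> float:
    # Each step starts only after the previous one finishes.
    return master_write + master_sync + network + standby_write + standby_sync


def overlapped_latency(master_write: float, master_sync: float, network: float,
                       standby_write: float, standby_sync: float) -> float:
    # Assumption: the master's local write+sync runs in parallel with shipping
    # the WAL and the standby's write+sync; the commit waits for the slower path.
    local = master_write + master_sync
    remote = network + standby_write + standby_sync
    return max(local, remote)


steps = (1, 1, 1, 1, 1)  # one time unit per operation
print(serialized_latency(*steps))  # 5
print(overlapped_latency(*steps))  # 3
```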

2010-06-14 Fujii Masao
On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote: > Stefan Kaltenbrunner writes: >> hmm not sure that is what fujii tried to say - I think his point was >> that in the original case we would have serialized all the operations >> (first write+sync on the master, network afterwards and write+sync o

2010-06-14 Fujii Masao
On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote: > I think the failover case might be OK.  But if the master crashes and > restarts, the slave might be left thinking its xlog position is ahead > of the xlog position on the master. Right. Unless we perform a failover in this case, the standby

2010-06-13 Greg Smith
Florian Pflug wrote: glibc defines O_DSYNC as an alias for O_SYNC and warrants that with "Most Linux filesystems don't actually implement the POSIX O_SYNC semantics, which require all metadata updates of a write to be on disk on returning to userspace, but only the O_DSYNC semantics, which requ

2010-06-12 Heikki Linnakangas
On 12/06/10 01:16, Josh Berkus wrote: Well, we're already not waiting for fsync, which is the slowest part. If there's a performance problem, it may be because FADVISE_DONTNEED disables kernel buffering so that we're forced to actually read the data back from disk before sending it on down the

2010-06-12 Florian Pflug
On Jun 12, 2010, at 3:10 , Josh Berkus wrote: >> Hm, but then Robert's failure case is real, and streaming replication might >> break due to an OS-level crash of the master. Or am I missing something? > > 1) Master goes out > 2) "floating" transaction applied to standby. > 3) Standby goes out > 4

2010-06-11 Josh Berkus
> Hm, but then Robert's failure case is real, and streaming replication might > break due to an OS-level crash of the master. Or am I missing something? Well, in the failover case this isn't a problem, it's a benefit: the standby gets a transaction which you would have lost off the master. Howev

2010-06-11 Florian Pflug
On Jun 11, 2010, at 16:31 , Tom Lane wrote: > Fujii Masao writes: >> In 9.0, walsender reads WAL always from the disk and sends it to the standby. >> That is, we cannot send WAL until it has been written (and flushed) to the >> disk. > > I believe the above statement to be incorrect: walsender d

2010-06-11 Josh Berkus
> Well, we're already not waiting for fsync, which is the slowest part. > If there's a performance problem, it may be because FADVISE_DONTNEED > disables kernel buffering so that we're forced to actually read the data > back from disk before sending it on down the wire. Well, that's fairly direct
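
For reference, the hint under discussion is the POSIX `posix_fadvise()` call with `POSIX_FADV_DONTNEED`, which tells the kernel that the cached pages of a file are no longer needed. A minimal Python sketch of writing, fsyncing and then issuing that hint follows; the function name and file handling are hypothetical and do not reproduce when or whether PostgreSQL itself makes the call. `os.posix_fadvise` is not available on every platform, hence the `hasattr` guard.

```python
import os


def write_and_drop_cache(path: str, data: bytes) -> bool:
    """Write data, fsync it, then hint that its cached pages can be dropped.

    Returns True if the POSIX_FADV_DONTNEED hint was issued on this platform.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:                 # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                # data is durable before the cache is dropped
        if hasattr(os, "posix_fadvise"):
            # Only advice: the kernel may evict these pages, so a later reader
            # (such as a process streaming this WAL) may have to go back to disk.
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
            return True
        return False
    finally:
        os.close(fd)
```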

2010-06-11 Stefan Kaltenbrunner
On 06/11/2010 04:47 PM, Tom Lane wrote: Stefan Kaltenbrunner writes: hmm not sure that is what fujii tried to say - I think his point was that in the original case we would have serialized all the operations (first write+sync on the master, network afterwards and write+sync on the slave) and no

2010-06-11 Tom Lane
Stefan Kaltenbrunner writes: > hmm not sure that is what fujii tried to say - I think his point was > that in the original case we would have serialized all the operations > (first write+sync on the master, network afterwards and write+sync on > the slave) and now we could try parallelizing by

2010-06-11 Stefan Kaltenbrunner
On 06/11/2010 04:31 PM, Tom Lane wrote: Fujii Masao writes: In 9.0, walsender reads WAL always from the disk and sends it to the standby. That is, we cannot send WAL until it has been written (and flushed) to the disk. I believe the above statement to be incorrect: walsender does *not* wait f

2010-06-11 Tom Lane
Fujii Masao writes: > In 9.0, walsender reads WAL always from the disk and sends it to the standby. > That is, we cannot send WAL until it has been written (and flushed) to the > disk. I believe the above statement to be incorrect: walsender does *not* wait for an fsync to occur. I agree with t

2010-06-11 Robert Haas
On Fri, Jun 11, 2010 at 9:57 AM, Fujii Masao wrote: > On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote: >> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: >>> Thought? Comment? Objection? >> >> What happens if the WAL is streamed to the standby and then the master >> crashes without writi

2010-06-11 Fujii Masao
On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote: > On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: >> Thought? Comment? Objection? > > What happens if the WAL is streamed to the standby and then the master > crashes without writing that WAL to disk? What are you concerned about? I think

2010-06-11 Robert Haas
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote: > Thought? Comment? Objection? What happens if the WAL is streamed to the standby and then the master crashes without writing that WAL to disk? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
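
The failure Robert asks about can be modeled in a few lines: if the master streams WAL that it has written but not yet fsync'd and then suffers an operating-system crash, it restarts from its last flushed position while the standby already holds WAL beyond that point. The `Master` class and `standby_is_ahead` check below are hypothetical illustrations with LSNs as plain integers, not PostgreSQL code.

```python
from dataclasses import dataclass


@dataclass
class Master:
    """Hypothetical master state; LSNs are plain ints."""

    write_lsn: int = 0  # WAL handed to the kernel (and possibly already streamed)
    flush_lsn: int = 0  # WAL known to be on disk

    def os_crash(self) -> None:
        # Worst case for an OS-level crash: everything written but never
        # fsync'd is lost.
        self.write_lsn = self.flush_lsn


def standby_is_ahead(master: Master, standby_received_lsn: int) -> bool:
    # After restart the master generates new WAL from flush_lsn onward, so any
    # standby WAL beyond that point describes history the master no longer has.
    return standby_received_lsn > master.flush_lsn


master = Master(write_lsn=500, flush_lsn=300)
received = master.write_lsn   # the unflushed WAL was streamed before the crash
master.os_crash()
print(standby_is_ahead(master, received))  # True: LSN 300..500 exists only on the standby
```

Fujii's answer in this thread is to rebuild the standby from a backup in that case; 9.0 instead ended up streaming only flushed WAL.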