On Thu, Jul 8, 2010 at 7:55 AM, Robert Haas wrote:
>> What was the final decision on behavior if fsync=off?
>
> I'm not sure we made any decision, per se, but if you use fsync=off in
> combination with SR and experience an unexpected crash-and-reboot on
> the master, you will be sad.
True. But, w
> Having said that, I do think we urgently need some high-level design
> discussion on how sync rep is actually going to handle this issue
> (perhaps on a new thread). If we can't resolve this issue, sync rep
> is going to be really slow; but there are no easy solutions to this
> problem in sight,
On Wed, Jul 7, 2010 at 6:44 PM, Josh Berkus wrote:
> On 7/6/10 4:44 PM, Robert Haas wrote:
>> To recap the previous discussion on this thread, we ended up changing
>> the behavior of 9.0 so that it only sends WAL which has been written
>> to the OS *and flushed*, because sending unflushed WAL to t
On 7/6/10 4:44 PM, Robert Haas wrote:
> To recap the previous discussion on this thread, we ended up changing
> the behavior of 9.0 so that it only sends WAL which has been written
> to the OS *and flushed*, because sending unflushed WAL to the standby
> is unsafe. The standby can get ahead of the
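The hazard in this recap can be sketched with a toy model. This is not PostgreSQL code: the names `MasterWal`, `send_limit`, `crash` and `standby_ahead_after_crash` are invented for illustration, WAL positions are plain integers, and the model assumes an OS crash discards everything that was written but not yet flushed.

```python
class MasterWal:
    """Toy model of a master's WAL positions (hypothetical names, integer offsets)."""

    def __init__(self):
        self.write_pos = 0   # WAL handed to the OS
        self.flush_pos = 0   # WAL known to be on stable storage

    def write(self, nbytes):
        self.write_pos += nbytes

    def flush(self):
        self.flush_pos = self.write_pos

    def send_limit(self, flushed_only):
        # How far a sender may stream under each policy.
        return self.flush_pos if flushed_only else self.write_pos

    def crash(self):
        # Assumption: an OS crash loses everything not yet flushed.
        self.write_pos = self.flush_pos


def standby_ahead_after_crash(flushed_only):
    master = MasterWal()
    master.write(100)
    master.flush()
    master.write(50)                      # written but not flushed
    standby_pos = master.send_limit(flushed_only)
    master.crash()
    # True means the standby holds WAL the restarted master no longer has.
    return standby_pos > master.write_pos
```

Under these assumptions, streaming up to the write position leaves the standby ahead of the restarted master, while streaming only flushed WAL does not.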
Tom Lane writes:
> Dimitri Fontaine writes:
>> Stop me if I'm all wrong already, but I thought we said that we should
>> handle this case by decoupling what we can send to the standby and what
>> it can apply.
>
> What's the point of that? It won't make the standby apply any faster.
True, but it
Dimitri Fontaine writes:
> Stop me if I'm all wrong already, but I thought we said that we should
> handle this case by decoupling what we can send to the standby and what
> it can apply.
What's the point of that? It won't make the standby apply any faster.
What it will do is make the protocol mo
On Wed, Jul 7, 2010 at 4:40 AM, Dimitri Fontaine wrote:
> Stop me if I'm all wrong already, but I thought we said that we should
> handle this case by decoupling what we can send to the standby and what
> it can apply. We could do this by sending the current WAL fsync'ed
> position on the master in
Robert Haas writes:
> If it's unsafe to send written but unflushed WAL to the standby, then
> for the same reasons we can't send unwritten WAL either.
[...]
> Having said that, I do think we urgently need some high-level design
> discussion on how sync rep is actually going to handle this issue
S
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the
> disk.
> This degrades the performance of synchronous replication very much since a
> t
On Wed, Jun 30, 2010 at 12:37 PM, Robert Haas wrote:
> One thought that occurred to me is that if the master and standby were
> more tightly coupled, you could recover after a crash by making the
> one with the further-advanced WAL position the master, and the other
> one the standby. That would
On Wed, Jun 30, 2010 at 5:36 AM, Fujii Masao wrote:
>> Before we get too busy frobnicating this gonkulator, I'd like to see a
>> little more discussion of what kind of performance people are
>> expecting from sync rep. Sounds to me like the best we can expect
>> here is, on every commit: (a) wait
On Wed, Jun 30, 2010 at 11:26 AM, Robert Haas wrote:
> Maybe. As Heikki pointed out upthread, the standby can't even write
> the WAL back to the OS until it's been fsync'd on the master
> without risking the problem under discussion.
If we change the startup process so that it doesn't go ahea
On Tue, Jun 29, 2010 at 10:06 PM, Bruce Momjian wrote:
> Simon Riggs wrote:
>> On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote:
>>
>> > The problem is not that the master streams non-fsync'd WAL, but that the
>> > standby can replay that. So I'm thinking that we can send non-fsync'd WAL
>> >
Simon Riggs wrote:
> On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote:
>
> > The problem is not that the master streams non-fsync'd WAL, but that the
> > standby can replay that. So I'm thinking that we can send non-fsync'd WAL
> > safely if the standby makes the recovery wait until the master
On Mon, 2010-06-21 at 18:08 +0900, Fujii Masao wrote:
> The problem is not that the master streams non-fsync'd WAL, but that the
> standby can replay that. So I'm thinking that we can send non-fsync'd WAL
> safely if the standby makes the recovery wait until the master has fsync'd
> WAL. That is,
On Mon, Jun 21, 2010 at 10:40 AM, Heikki Linnakangas
wrote:
> I guess, but you have to be very careful to correctly refrain from applying
> the WAL. For example, a naive implementation might write the WAL to disk in
> walreceiver immediately, but refrain from telling the startup process about
> it
On 21/06/10 12:08, Fujii Masao wrote:
> On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote:
>> In 9.0, I think we can fix this problem by (1) only streaming WAL that
>> has been fsync'd and (2) PANIC-ing if the problem occurs anyway. But
>> in 9.1, with sync rep and the performance demands that entails,
On Wed, Jun 16, 2010 at 5:06 AM, Robert Haas wrote:
> On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote:
>>> I wonder if it would be possible to jigger things so that we send the
>>> WAL to the standby as soon as it is generated, but somehow arrange
>>> things so that the standby knows the last
On Tue, Jun 15, 2010 at 8:09 PM, Josh Berkus wrote:
>
>> I have yet to convince myself of how likely this is to occur. I tried
>> to reproduce this issue by crashing the database, but I think in 9.0
>> you need an actual operating system crash to cause this problem, and I
>> haven't yet set up an
On 6/15/10 5:09 PM, Josh Berkus wrote:
>> > In 9.0, I think we can fix this problem by (1) only streaming WAL that
>> > has been fsync'd and
>
> I don't think this is the best solution; it would be a noticeable
> performance penalty on replication.
Actually, there's an even bigger reason not to
> I have yet to convince myself of how likely this is to occur. I tried
> to reproduce this issue by crashing the database, but I think in 9.0
> you need an actual operating system crash to cause this problem, and I
> haven't yet set up an environment in which I can repeatedly crash the
> OS. I
On Tue, Jun 15, 2010 at 3:57 PM, Josh Berkus wrote:
>> I wonder if it would be possible to jigger things so that we send the
>> WAL to the standby as soon as it is generated, but somehow arrange
>> things so that the standby knows the last location that the master has
>> fsync'd and never applies
> I wonder if it would be possible to jigger things so that we send the
> WAL to the standby as soon as it is generated, but somehow arrange
> things so that the standby knows the last location that the master has
> fsync'd and never applies beyond that point.
I can't think of any way which would
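The idea quoted above (ship WAL as soon as it is generated, but never apply past the master's last fsync'd location) can be sketched as follows. This is only an illustration under assumptions: the `Standby` class, its `on_message` and `apply_available` methods, and the notion that each message carries the master's flush position are all hypothetical, and positions are plain integers. It deliberately ignores where received-but-unapplied WAL is held on the standby.

```python
class Standby:
    """Toy standby: receives WAL eagerly, applies only up to the master's flush point."""

    def __init__(self):
        self.received = 0        # end of WAL received so far
        self.applied = 0         # end of WAL replayed so far
        self.master_flushed = 0  # last flush position reported by the master

    def on_message(self, wal_end, master_flush_pos):
        # Assumption: every message reports how far the master has fsync'd.
        self.received = max(self.received, wal_end)
        self.master_flushed = max(self.master_flushed, master_flush_pos)

    def apply_available(self):
        # Replay may not pass what the master has made durable.
        limit = min(self.received, self.master_flushed)
        if limit > self.applied:
            self.applied = limit
        return self.applied
```

For example, after a message with `wal_end=150` and `master_flush_pos=100`, replay stops at 100 until a later message reports that the master has flushed up to 150.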
On Jun 15, 2010, at 10:45 , Fujii Masao wrote:
> A transaction commit would need to wait for local fsync and replication
> in a serial manner, in synchronous replication. IOW, walsender cannot
> send the commit record until it's fsync'd in XLogWrite().
Hm, but since 9.0 won't do synchronous replic
On Tue, Jun 15, 2010 at 12:46 AM, Fujii Masao wrote:
> On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote:
>> On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote:
>>> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
>>>> Maybe. That sounds like a pretty enormous foot-gun to me, considering
On Tue, Jun 15, 2010 at 2:16 PM, Heikki Linnakangas
wrote:
> On 15/06/10 07:47, Fujii Masao wrote:
>>
>> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote:
>>>
>>> Fujii Masao writes:
>>>> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
>>>> xlogctl->LogwrtResult.Write i
On 15/06/10 07:47, Fujii Masao wrote:
> On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote:
>> Fujii Masao writes:
>>> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
>>> xlogctl->LogwrtResult.Write is updated after XLogWrite() performs fsync.
>> Wrong. LogwrtResult.Write tracks how far
On Tue, Jun 15, 2010 at 12:02 AM, Tom Lane wrote:
> Fujii Masao writes:
>> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
>>> Well, we're already not waiting for fsync, which is the slowest part.
>
>> No, currently walsender waits for fsync.
>
> No, you're mistaken.
>
>> Walsender tries to se
On Mon, Jun 14, 2010 at 10:13 PM, Robert Haas wrote:
> On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote:
>> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
>>> Maybe. That sounds like a pretty enormous foot-gun to me, considering
>>> that we have no way of recovering from the situation wh
Fujii Masao writes:
> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
>> Well, we're already not waiting for fsync, which is the slowest part.
> No, currently walsender waits for fsync.
No, you're mistaken.
> Walsender tries to send WAL up to xlogctl->LogwrtResult.Write. OTOH,
> xlogctl->Log
On Mon, Jun 14, 2010 at 8:41 AM, Fujii Masao wrote:
> On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
>> Maybe. That sounds like a pretty enormous foot-gun to me, considering
>> that we have no way of recovering from the situation where the standby
>> gets ahead of the master.
>
> No, we can
On Mon, Jun 14, 2010 at 8:10 PM, Robert Haas wrote:
> Maybe. That sounds like a pretty enormous foot-gun to me, considering
> that we have no way of recovering from the situation where the standby
> gets ahead of the master.
No, we can do that by reconstructing the standby from the backup.
And,
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote:
> No, currently walsender waits for fsync.
> ...
> But that change would cause the problem that Robert pointed out.
> http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php
Presumably this means that if synchronous_commit = off on p
On Mon, 2010-06-14 at 17:39 +0900, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
> > Stefan Kaltenbrunner writes:
> >> hmm not sure that is what fujii tried to say - I think his point was
> >> that in the original case we would have serialized all the operations
> >> (fir
On Mon, Jun 14, 2010 at 4:14 AM, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote:
>> I think the failover case might be OK. But if the master crashes and
>> restarts, the slave might be left thinking its xlog position is ahead
>> of the xlog position on the master.
>
> R
On Sat, Jun 12, 2010 at 12:15 AM, Stefan Kaltenbrunner
wrote:
> hmm ok - but assuming sync rep we would end up with something like the
> following (hypothetically assuming each operation takes 1 time unit):
>
> originally:
>
> write 1
> sync 1
> network 1
> write 1
> sync 1
>
> total: 5
>
> whereas
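The serialized total of 5 above can be checked with a few lines of arithmetic. The comparison figures after "whereas" are cut off in this listing and are not reproduced here; the `overlapped_commit` function below is one possible overlap chosen purely for illustration (master sync running concurrently with the network transfer and the standby's write and sync), and both function names are hypothetical.

```python
def serialized_commit(write=1, sync=1, network=1):
    # master write, master sync, network, standby write, standby sync
    return write + sync + network + write + sync


def overlapped_commit(write=1, sync=1, network=1):
    # Assumption: once the master has written, its sync runs concurrently
    # with the network transfer plus the standby's write and sync.
    master_side = sync
    standby_side = network + write + sync
    return write + max(master_side, standby_side)
```

With every operation costing 1 time unit, the serialized path costs 5 and this particular overlap costs 4; other ways of overlapping the steps would give different totals.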
On Fri, Jun 11, 2010 at 11:47 PM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> hmm not sure that is what fujii tried to say - I think his point was
>> that in the original case we would have serialized all the operations
>> (first write+sync on the master, network afterwards and write+sync o
On Fri, Jun 11, 2010 at 11:24 PM, Robert Haas wrote:
> I think the failover case might be OK. But if the master crashes and
> restarts, the slave might be left thinking its xlog position is ahead
> of the xlog position on the master.
Right. Unless we perform a failover in this case, the standby
Florian Pflug wrote:
glibc defines O_DSYNC as an alias for O_SYNC and warrants that with
"Most Linux filesystems don't actually implement the POSIX O_SYNC semantics, which
require all metadata updates of a write to be on disk on returning to userspace, but only
the O_DSYNC semantics, which requ
On 12/06/10 01:16, Josh Berkus wrote:
>> Well, we're already not waiting for fsync, which is the slowest part.
>> If there's a performance problem, it may be because FADVISE_DONTNEED
>> disables kernel buffering so that we're forced to actually read the data
>> back from disk before sending it on down the
On Jun 12, 2010, at 3:10 , Josh Berkus wrote:
>> Hm, but then Robert's failure case is real, and streaming replication might
>> break due to an OS-level crash of the master. Or am I missing something?
>
> 1) Master goes out
> 2) "floating" transaction applied to standby.
> 3) Standby goes out
> 4
> Hm, but then Robert's failure case is real, and streaming replication might
> break due to an OS-level crash of the master. Or am I missing something?
Well, in the failover case this isn't a problem, it's a benefit: the
standby gets a transaction which you would have lost off the master.
Howev
On Jun 11, 2010, at 16:31 , Tom Lane wrote:
> Fujii Masao writes:
>> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
>> That is, we cannot send WAL until it has been written (and flushed) to the
>> disk.
>
> I believe the above statement to be incorrect: walsender d
> Well, we're already not waiting for fsync, which is the slowest part.
> If there's a performance problem, it may be because FADVISE_DONTNEED
> disables kernel buffering so that we're forced to actually read the data
> back from disk before sending it on down the wire.
Well, that's fairly direct
On 06/11/2010 04:47 PM, Tom Lane wrote:
> Stefan Kaltenbrunner writes:
>> hmm not sure that is what fujii tried to say - I think his point was
>> that in the original case we would have serialized all the operations
>> (first write+sync on the master, network afterwards and write+sync on
>> the slave) and no
Stefan Kaltenbrunner writes:
> hmm not sure that is what fujii tried to say - I think his point was
> that in the original case we would have serialized all the operations
> (first write+sync on the master, network afterwards and write+sync on
> the slave) and now we could try parallelizing by
On 06/11/2010 04:31 PM, Tom Lane wrote:
> Fujii Masao writes:
>> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
>> That is, we cannot send WAL until it has been written (and flushed) to the disk.
> I believe the above statement to be incorrect: walsender does *not* wait
> f
Fujii Masao writes:
> In 9.0, walsender reads WAL always from the disk and sends it to the standby.
> That is, we cannot send WAL until it has been written (and flushed) to the
> disk.
I believe the above statement to be incorrect: walsender does *not* wait
for an fsync to occur.
I agree with t
On Fri, Jun 11, 2010 at 9:57 AM, Fujii Masao wrote:
> On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote:
>> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
>>> Thought? Comment? Objection?
>>
>> What happens if the WAL is streamed to the standby and then the master
>> crashes without writi
On Fri, Jun 11, 2010 at 10:22 PM, Robert Haas wrote:
> On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
>> Thought? Comment? Objection?
>
> What happens if the WAL is streamed to the standby and then the master
> crashes without writing that WAL to disk?
What are you concerned about?
I think
On Fri, Jun 11, 2010 at 9:14 AM, Fujii Masao wrote:
> Thought? Comment? Objection?
What happens if the WAL is streamed to the standby and then the master
crashes without writing that WAL to disk?
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company