Fujii Masao writes:
> On Thu, Jun 17, 2010 at 5:26 AM, Robert Haas wrote:
>> The real problem here is that we're sending records to the slave which
>> might cease to exist on the master if it unexpectedly reboots. I
>> believe that what we need to do is make sure that the master only
>> sends WA
On Thu, Jun 17, 2010 at 09:20, Fujii Masao wrote:
> On Thu, Jun 17, 2010 at 4:02 PM, Rafael Martinez
> wrote:
>> I tested this yesterday and I could not get any reaction from the wal
>> receiver even after using minimal values compared to the default values.
>>
>> The default values in Linux fo
On Thu, Jun 17, 2010 at 4:02 PM, Rafael Martinez
wrote:
> I tested this yesterday and I could not get any reaction from the wal
> receiver even after using minimal values compared to the default values.
>
> The default values in Linux for tcp_keepalive_time, tcp_keepalive_intvl
> and tcp_keepali
Heikki Linnakangas wrote:
>
> We're not talking about a timeout for promoting standby to master. The
> problem is that the standby doesn't notice that from the master's point
> of view, the connection has been broken. Whether it's because of a
> netw
On 17/06/10 02:40, Greg Stark wrote:
> On Thu, Jun 17, 2010 at 12:16 AM, Kevin Grittner
> wrote:
>> Greg Stark wrote:
>>> TCP keepalives are for detecting broken network connections
>> Yeah. That seems like what we have here. If you shoot the OS in
>> the head, the network connection is broken rather ab
On Thu, Jun 17, 2010 at 5:26 AM, Robert Haas wrote:
> On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus wrote:
>>> The first problem I noticed is that the slave never seems to realize
>>> that the master has gone away. Every time I crashed the master, I had
>>> to kill the wal receiver process on the
On Thu, Jun 17, 2010 at 12:16 AM, Kevin Grittner
wrote:
> Greg Stark wrote:
>
>> TCP keepalives are for detecting broken network connections
>
> Yeah. That seems like what we have here. If you shoot the OS in
> the head, the network connection is broken rather abruptly, without
> the normal pac
On Thu, Jun 17, 2010 at 12:22 AM, Kevin Grittner
wrote:
> "Kevin Grittner" wrote:
>
>> It sounds like it behaves just fine except for not detecting a
>> broken connection.
>
> Of course I meant in terms of the slave's attempts at retrieving
> more WAL, not in terms of it applying a second time li
"Kevin Grittner" wrote:
> It sounds like it behaves just fine except for not detecting a
> broken connection.
Of course I meant in terms of the slave's attempts at retrieving
more WAL, not in terms of it applying a second time line. TCP
keepalive timeouts don't help with that part of it, just
Greg Stark wrote:
> TCP keepalives are for detecting broken network connections
Yeah. That seems like what we have here. If you shoot the OS in
the head, the network connection is broken rather abruptly, without
the normal packets exchanged to close the TCP connection. It sounds
like it beh
On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane wrote:
> Robert Haas writes:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away. Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> othe
The real problem here is that we're sending records to the slave which
might cease to exist on the master if it unexpectedly reboots. I
believe that what we need to do is make sure that the master only
sends WAL it has already fsync'd
How about this :
- pg records somewhere the xlog position
On 6/16/10 1:26 PM, Robert Haas wrote:
> Similarly with synchronous_commit=off, I believe
> that the next checkpoint will still fsync WAL, but the lag might be
> long.
That's not a showstopper. Just tell people that having synch_commit=off
on the master might increase the lag to the slave, and le
Robert Haas writes:
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away. Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at le
Robert Haas wrote:
>
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away. Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise i
On Wed, Jun 16, 2010 at 22:26, Robert Haas wrote:
>>> and this just
>>> makes it more likely. After the most recent crash, the master thought
>>> pg_current_xlog_location() was 1/86CD4000; the slave thought
>>> pg_last_xlog_receive_location() was 1/8733C000. After reconnecting to
>>> the master,
Robert Haas wrote:
> Kevin Grittner wrote:
>> Robert Haas wrote:
>>> So, obviously at this point my slave database is corrupted
>>> beyond repair due to nothing more than an unexpected crash on
>>> the master.
>>
>> Certainly that's true for resuming replication. From your
>> description it sou
On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus wrote:
>> The first problem I noticed is that the slave never seems to realize
>> that the master has gone away. Every time I crashed the master, I had
>> to kill the wal receiver process on the slave to get it to reconnect;
>> otherwise it just sat th
On Wed, Jun 16, 2010 at 4:00 PM, Kevin Grittner
wrote:
> Robert Haas wrote:
>> So, obviously at this point my slave database is corrupted beyond
>> repair due to nothing more than an unexpected crash on the master.
>
> Certainly that's true for resuming replication. From your
> description it so
> The first problem I noticed is that the slave never seems to realize
> that the master has gone away. Every time I crashed the master, I had
> to kill the wal receiver process on the slave to get it to reconnect;
> otherwise it just sat there waiting, either forever or at least for
> longer tha
Stefan Kaltenbrunner wrote:
> well this is likely caused by the OS not noticing that the
> connections went away (linux has really long timeouts here) -
> maybe we should unconditionally enable keepalive on systems that
> support that for replication connections (if that is possible in
> the cur
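For readers unfamiliar with the keepalive suggestion above, here is a minimal sketch of enabling TCP keepalive on a client socket. It is not taken from the thread and is not how the wal receiver does it; the function name `enable_keepalive` and the default values are hypothetical, and the three per-socket timing options are platform-dependent, so the sketch sets them only where Python's `socket` module exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keepalive for sock (hypothetical helper, illustrative values).

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is reported dead
    """
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # The per-socket timing options are not available everywhere;
    # set each one only if this platform's socket module defines it.
    for name, value in (
        ("TCP_KEEPIDLE", idle),
        ("TCP_KEEPINTVL", interval),
        ("TCP_KEEPCNT", count),
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

Without the per-socket overrides, a connection with only `SO_KEEPALIVE` set falls back to the system-wide defaults (on Linux, the `tcp_keepalive_time`, `tcp_keepalive_intvl` and `tcp_keepalive_probes` sysctls), which is why a dead peer can go unnoticed for a long time.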
Robert Haas wrote:
> I don't know what to do about this
This probably is out of the question for 9.0 based on scale of
change, and maybe forever based on the impact of WAL volume, but --
if we logged "before" images along with the "after", we could undo
the work of the "over-eager" transaction
On 06/16/2010 09:47 PM, Robert Haas wrote:
On Mon, Jun 14, 2010 at 7:55 AM, Simon Riggs wrote:
But that change would cause the problem that Robert pointed out.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php
Presumably this means that if synchronous_commit = off on primary t
Robert Haas wrote:
> So, obviously at this point my slave database is corrupted beyond
> repair due to nothing more than an unexpected crash on the master.
Certainly that's true for resuming replication. From your
description it sounds as though the slave would be usable for
purposes of takin
On Wed, 2010-06-16 at 15:47 -0400, Robert Haas wrote:
> So, obviously at this point my slave database is corrupted beyond
> repair due to nothing more than an unexpected crash on the master.
> That's bad. What is worse is that the system only detected the
> corruption because the slave had crosse