On 17/06/10 02:40, Greg Stark wrote:
On Thu, Jun 17, 2010 at 12:16 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Greg Stark gsst...@mit.edu wrote:
TCP keepalives are for detecting broken network connections
Yeah. That seems like what we have here. If you shoot the OS in
the head,
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Heikki Linnakangas wrote:
We're not talking about a timeout for promoting standby to master. The
problem is that the standby doesn't notice that from the master's point
of view, the connection has been broken. Whether it's because of a
network
On Thu, Jun 17, 2010 at 4:02 PM, Rafael Martinez
r.m.guerr...@usit.uio.no wrote:
I tested this yesterday and I could not get any reaction from the wal
receiver even after using minimal values compared to the default values.
The default values in linux for tcp_keepalive_time,
On Thu, Jun 17, 2010 at 09:20, Fujii Masao masao.fu...@gmail.com wrote:
On Thu, Jun 17, 2010 at 4:02 PM, Rafael Martinez
r.m.guerr...@usit.uio.no wrote:
I tested this yesterday and I could not get any reaction from the wal
receiver even after using minimal values compared to the default values
Fujii Masao masao.fu...@gmail.com writes:
On Thu, Jun 17, 2010 at 5:26 AM, Robert Haas robertmh...@gmail.com wrote:
The real problem here is that we're sending records to the slave which
might cease to exist on the master if it unexpectedly reboots. I
believe that what we need to do is make
On Mon, Jun 14, 2010 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
But that change would cause the problem that Robert pointed out.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php
Presumably this means that if synchronous_commit = off on primary that
SR in 9.0 will no
On Wed, 2010-06-16 at 15:47 -0400, Robert Haas wrote:
So, obviously at this point my slave database is corrupted beyond
repair due to nothing more than an unexpected crash on the master.
That's bad. What is worse is that the system only detected the
corruption because the slave had crossed
Robert Haas robertmh...@gmail.com wrote:
So, obviously at this point my slave database is corrupted beyond
repair due to nothing more than an unexpected crash on the master.
Certainly that's true for resuming replication. From your
description it sounds as though the slave would be usable
On 06/16/2010 09:47 PM, Robert Haas wrote:
On Mon, Jun 14, 2010 at 7:55 AM, Simon Riggs si...@2ndquadrant.com wrote:
But that change would cause the problem that Robert pointed out.
http://archives.postgresql.org/pgsql-hackers/2010-06/msg00670.php
Presumably this means that if
Robert Haas robertmh...@gmail.com wrote:
I don't know what to do about this
This probably is out of the question for 9.0 based on scale of
change, and maybe forever based on the impact of WAL volume, but --
if we logged before images along with the after, we could undo
the work of the
Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote:
well this is likely caused by the OS not noticing that the
connections went away (linux has really long timeouts here) -
maybe we should unconditionally enable keepalive on systems that
support that for replication connections (if that is
The first problem I noticed is that the slave never seems to realize
that the master has gone away. Every time I crashed the master, I had
to kill the wal receiver process on the slave to get it to reconnect;
otherwise it just sat there waiting, either forever or at least for
longer than I
On Wed, Jun 16, 2010 at 4:00 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Robert Haas robertmh...@gmail.com wrote:
So, obviously at this point my slave database is corrupted beyond
repair due to nothing more than an unexpected crash on the master.
Certainly that's true for resuming
On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus j...@agliodbs.com wrote:
The first problem I noticed is that the slave never seems to realize
that the master has gone away. Every time I crashed the master, I had
to kill the wal receiver process on the slave to get it to reconnect;
otherwise it
Robert Haas robertmh...@gmail.com wrote:
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
Robert Haas robertmh...@gmail.com wrote:
So, obviously at this point my slave database is corrupted
beyond repair due to nothing more than an unexpected crash on
the master.
Certainly that's true for
On Wed, Jun 16, 2010 at 22:26, Robert Haas robertmh...@gmail.com wrote:
and this just
makes it more likely. After the most recent crash, the master thought
pg_current_xlog_location() was 1/86CD4000; the slave thought
pg_last_xlog_receive_location() was 1/8733C000. After reconnecting to
the
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Robert Haas wrote:
The first problem I noticed is that the slave never seems to realize
that the master has gone away. Every time I crashed the master, I had
to kill the wal receiver process on the slave to get it to reconnect;
otherwise it
Robert Haas robertmh...@gmail.com writes:
The first problem I noticed is that the slave never seems to realize
that the master has gone away. Every time I crashed the master, I had
to kill the wal receiver process on the slave to get it to reconnect;
otherwise it just sat there waiting,
On 6/16/10 1:26 PM, Robert Haas wrote:
Similarly with synchronous_commit=off, I believe
that the next checkpoint will still fsync WAL, but the lag might be
long.
That's not a showstopper. Just tell people that having synch_commit=off
on the master might increase the lag to the slave, and
The real problem here is that we're sending records to the slave which
might cease to exist on the master if it unexpectedly reboots. I
believe that what we need to do is make sure that the master only
sends WAL it has already fsync'd
How about this:
- pg records somewhere the xlog
On Wed, Jun 16, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
The first problem I noticed is that the slave never seems to realize
that the master has gone away. Every time I crashed the master, I had
to kill the wal receiver process on the
Greg Stark gsst...@mit.edu wrote:
TCP keepalives are for detecting broken network connections
Yeah. That seems like what we have here. If you shoot the OS in
the head, the network connection is broken rather abruptly, without
the normal packets exchanged to close the TCP connection. It
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
It sounds like it behaves just fine except for not detecting a
broken connection.
Of course I meant in terms of the slave's attempts at retrieving
more WAL, not in terms of it applying a second time line. TCP
keepalive timeouts don't help
On Thu, Jun 17, 2010 at 12:22 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
It sounds like it behaves just fine except for not detecting a
broken connection.
Of course I meant in terms of the slave's attempts at retrieving
more WAL,
On Thu, Jun 17, 2010 at 12:16 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
Greg Stark gsst...@mit.edu wrote:
TCP keepalives are for detecting broken network connections
Yeah. That seems like what we have here. If you shoot the OS in
the head, the network connection is broken
On Thu, Jun 17, 2010 at 5:26 AM, Robert Haas robertmh...@gmail.com wrote:
On Wed, Jun 16, 2010 at 4:14 PM, Josh Berkus j...@agliodbs.com wrote:
The first problem I noticed is that the slave never seems to realize
that the master has gone away. Every time I crashed the master, I had
to kill
26 matches