Peter Geoghegan <[email protected]> writes:
> What sort of risk am I assuming specifically to replication by turning
> off synchronous_commit on the slaves but not on the master?
The risk is that a subscriber could report back that it has committed
changes, only for those changes to be swept away by a failure (e.g. -
a power outage that loses a little bit of recent work).
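To make the failure mode concrete: with synchronous_commit = off,
COMMIT returns before the WAL record is flushed to disk, so an
OS-level crash can silently lose transactions the server already
acknowledged. A minimal sketch of the per-session behaviour, using
psycopg2 (the DSN and table are mine, purely illustrative):

  import psycopg2

  # Hypothetical DSN; any scratch database will do.
  conn = psycopg2.connect("dbname=scratch")
  cur = conn.cursor()

  # Per-session equivalent of the subscriber-side setting in
  # question; postgresql.conf would set it server-wide.
  cur.execute("SET synchronous_commit = off")

  cur.execute("CREATE TABLE IF NOT EXISTS t (v text)")
  cur.execute("INSERT INTO t VALUES ('acknowledged, not yet durable')")
  conn.commit()   # returns before the WAL fsync; a power failure
                  # right here can lose the row despite the COMMIT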
I see the edge case, and it's regrettably unpleasant.
Consider...
- Node #2 claims to have committed up to transaction T5, but its
  WAL really only has records up to T3.
- Node #1, the "master", has received the report that #2 is up to
  date through T5.
- Node #2 experiences a failure (e.g. - power outage).
There are now two possible outcomes, one OK and one not so OK...
1. OK
   Node #2 gets restarted, replays WAL, knows it only has data up
   to T3, and heads back to node #1 asking for transactions T4
   onwards.
   No problem.
2. Not so OK :-(
   Before node #2 gets back up, node #1 runs an iteration of the
   cleanup thread, which trims out all the log data up to T5,
   because the other nodes confirmed up to that point.
   Node #2 gets restarted, replays WAL, knows it only has data up
   to T3, and heads back to node #1 asking for transactions T4 and
   T5.
   Oops. Node #1 just trimmed those out.
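The timeline is mechanical enough to sketch in a few lines of
Python; this is a toy model of mine (the names are invented, not
Slony's), but it replays the exact race:

  # Toy timeline of the race; none of these names come from Slony.
  confirmed = {"node2": 5}        # node #1's view: #2 confirmed T5
  durable_on_node2 = 3            # what #2 really has after WAL replay
  log_on_node1 = [1, 2, 3, 4, 5]  # sl_log-ish entries T1..T5 on #1

  def run_cleanup_thread():
      # Node #1 trims everything all subscribers have confirmed.
      global log_on_node1
      horizon = min(confirmed.values())
      log_on_node1 = [t for t in log_on_node1 if t > horizon]

  run_cleanup_thread()            # fires while node #2 is still down

  # Node #2 restarts and asks node #1 for T4 onwards:
  needed = range(durable_on_node2 + 1, 6)
  missing = [t for t in needed if t not in log_on_node1]
  print(missing)                  # [4, 5] -- node #1 trimmed those out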
The race condition here is easy to exercise - you just need to suppress
the restart of node #2 for a while, long enough for node #1 to run the
cleanup thread.
You can mitigate the problem somewhat by setting the parameter
"cleanup_interval" to a larger value. (I'm assuming Slony-I version
2.0 here.) Unfortunately, any time the outage of node #2 can exceed
that interval, the risk of losing log data inexorably returns.
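In the slon runtime configuration file, that would be a line along
these lines; I'm quoting the syntax from memory, so check it against
the 2.0 docs (I believe the default is 10 minutes):

  cleanup_interval = '10 minutes'  # raise to widen the catch-up window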
--
let name="cbbrowne" and tld="ca.afilias.info" in String.concat "@" [name;tld];;
Christopher Browne
"Bother," said Pooh, "Eeyore, ready two photon torpedoes and lock
phasers on the Heffalump, Piglet, meet me in transporter room three"