On Tue, 9 Oct 2007, David Carter wrote:
I've never faced a split-brain situation that involved more than two or
three messages (the outstanding log on an old master system).
I suppose it was predictable that a week after writing this I faced my
first serious split brain (3000 messages
On Sat, 13 Oct 2007, Bron Gondwana wrote:
Apart from a couple of short-lived command-line utilities, it looks like
the only use of signal() is a bunch of 'signal(SIGPIPE, SIG_IGN);' calls
scattered through just about everything.
Most of the interesting signal handling is done with sigaction
already.
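For reference, a minimal sketch of what 'signal(SIGPIPE, SIG_IGN);' looks like when written with sigaction() instead. The helper name ignore_sigpipe() is hypothetical and is not a Cyrus function. With SIGPIPE ignored, a write to a socket whose peer has gone away fails with EPIPE instead of killing the process.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical helper: sigaction() equivalent of signal(SIGPIPE, SIG_IGN).
 * Returns 0 on success, -1 on failure with errno set by sigaction(). */
int ignore_sigpipe(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = SIG_IGN;   /* discard SIGPIPE entirely */
    sigemptyset(&sa.sa_mask);  /* no extra signals blocked */
    sa.sa_flags = 0;
    return sigaction(SIGPIPE, &sa, NULL);
}
```

The historical differences between signal() implementations (whether a handler is reset after delivery, whether interrupted calls restart) mostly matter for real handlers; for SIG_IGN the main gain is consistency with the code that already uses sigaction.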
This would seem to be a significant advantage of running sync_client
outside master.
When I shut down master, sync_client continues to process the outstanding
log. I can then use sync_shutdown_file when it has finished and is idle.
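A minimal sketch of the "wait until idle, then drop the latch" step, assuming that an absent or empty pending log file means sync_client has nothing left to do. Both function names and the paths are hypothetical; sync_shutdown_file is the imapd.conf option, and this code only creates whatever file that option points at. A real check would also have to account for any work files sync_client is still processing, which this sketch ignores.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: returns 1 if the pending log at log_path is missing or
 * empty, 0 if it still holds entries, -1 on an unexpected stat() error. */
int sync_log_drained(const char *log_path)
{
    struct stat st;

    if (stat(log_path, &st) == -1)
        return errno == ENOENT ? 1 : -1;
    return st.st_size == 0 ? 1 : 0;
}

/* Hypothetical: create the shutdown latch file only once the log has
 * drained. Returns 1 if the latch was created, 0 if the log still has
 * entries, -1 on error. */
int request_sync_shutdown(const char *log_path, const char *latch_path)
{
    int drained = sync_log_drained(log_path);
    int fd;

    if (drained != 1)
        return drained;
    fd = open(latch_path, O_WRONLY | O_CREAT, 0644);
    if (fd == -1)
        return -1;
    close(fd);
    return 1;
}
```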
We do something similar.
But it means you have to develop
On Wed, 10 Oct 2007, Rob Mueller wrote:
I think the problem at the moment is that the process you really want is:
1. Stop new imap/pop/lmtp/sieve/etc connections
2. Finish and close existing connections cleanly but as quickly as possible
3. Finish running any sync log files
4. Fully shut down
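The ordering above can be sketched as a small state machine. This is a hypothetical illustration, not how master is structured: the phase names and next_phase() are invented for this example, and the caller is assumed to supply the open connection count, whether sync log entries remain, and a grace deadline for step 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical names for the four phases in the list above. */
enum shutdown_phase {
    PHASE_STOP_ACCEPTING,    /* 1. stop new imap/pop/lmtp/sieve connections */
    PHASE_DRAIN_CONNECTIONS, /* 2. let existing connections finish */
    PHASE_DRAIN_SYNC_LOG,    /* 3. run any remaining sync log files */
    PHASE_EXIT               /* 4. fully shut down */
};

/* Pure transition function: given the current phase and a snapshot of
 * the system, return the next phase. Phase 2 ends when no connections
 * remain or the grace deadline has passed (the caller then closes the
 * stragglers). Phase 3 ends only once the sync log is empty. */
enum shutdown_phase next_phase(enum shutdown_phase cur,
                               int open_connections,
                               bool sync_log_pending,
                               time_t now, time_t deadline)
{
    switch (cur) {
    case PHASE_STOP_ACCEPTING:
        return PHASE_DRAIN_CONNECTIONS;
    case PHASE_DRAIN_CONNECTIONS:
        if (open_connections == 0 || now >= deadline)
            return PHASE_DRAIN_SYNC_LOG;
        return PHASE_DRAIN_CONNECTIONS;
    case PHASE_DRAIN_SYNC_LOG:
        return sync_log_pending ? PHASE_DRAIN_SYNC_LOG : PHASE_EXIT;
    case PHASE_EXIT:
    default:
        return PHASE_EXIT;
    }
}
```

Keeping the transition pure makes the ordering easy to test. If the deadline forces step 2 to end with connections still open, the caller closes them before entering step 3, so anything they logged is still replicated before exit.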
On Fri, Oct 12, 2007 at 10:29:53AM -0400, Carson Gaspar wrote:
David Carter wrote:
I'm still a little bothered about signal handling and EINTR. I did some
experiments after our last chat about signals. In practice disk IO system
calls seem to be reasonably safe against EINTR on both Linux
Or is the problem that you have something like:
write to file 1
write to file 2
And if the first returns EINTR but is ignored, and then it writes the
complete data to the second, things are in an inconsistent state?
This is my concern.
Doing an ack 'write\(' reveals a scary mix of write,
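The usual defence against EINTR on a single descriptor is a loop that retries interrupted calls and carries on after short writes. A minimal sketch, using a hypothetical helper name write_full():

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to fd. Retries when
 * write() is interrupted by a signal before writing anything (EINTR)
 * and continues after short writes. Returns len on success, -1 on any
 * other error with errno left as set by write(). */
ssize_t write_full(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = write(fd, p + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted: nothing written, retry */
            return -1;          /* real error: report it to the caller */
        }
        done += (size_t)n;      /* short write: send the remainder */
    }
    return (ssize_t)done;
}
```

This only makes each individual write complete; it does nothing by itself for the two-file case above. If the write to file 1 fails with a real error, the caller still has to stop before writing file 2, and keeping two files mutually consistent needs something more, such as ordering the writes so a crash between them is recoverable, or writing a temporary file and rename()ing it into place.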
On Mon, 8 Oct 2007, Rudy Gevaert wrote:
Note, we are running 2.3.7, I'm going to upgrade when 2.3.10 is out.
We have replication in place, but daren't use it. If I have a method to
check if the replica is in sync then I'll dare to do a fail over.
I do this using -v -v to sync_client, which
On Mon, 8 Oct 2007, Bron Gondwana wrote:
We already run a sync_server on our masters as well because we use it
for user moves:
It generally takes about 15 seconds for the critical-path part, and it
doesn't matter how long the initial sync takes.
As do we. In fact when I first showed the
On Thu, 4 Oct 2007, Bron Gondwana wrote:
a) MUST never lose a message that's been accepted for
delivery except in the case of total drive failure.
b) MUST have a standard way to integrity check and
repair a replica-pair after a system crash.
A replica system is automatically repaired to
c) MUST have a clean process to soft-failover to the
replica machine, making sure that all replication
events from the ex-master have been synchronised.
Something more than sync_shutdown_file plus automatic retries on
recent work files?
I think the problem at the moment is that the
Hello,
I agree with Bron. However I do think some parts are more important
than others. I'll try to explain my point of view.
Note, we are running 2.3.7, I'm going to upgrade when 2.3.10 is out. We
have replication in place, but daren't use it. If I have a method to
check if the replica
On Mon, Oct 08, 2007 at 10:03:31AM +0200, Rudy Gevaert wrote:
For me points a, e and f are most important, but the others are also
important.
Bron Gondwana wrote:
So I'd like to start a dialogue on the topic of making Cyrus
replication robust across failures with the following goals:
a)
Hi,
As I've mentioned on the mailing list, we have had to put
quite a lot of infrastructure around Cyrus to make
replication robust in all cases.
While the core replication protocol seems pretty stable now,
and with GUID stuff it will be easier to do integrity checks,
it's still very much not a
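With per-message GUIDs, one building block of an integrity check is a set difference between master and replica. A minimal sketch, assuming each side can already produce a sorted, duplicate-free list of GUID strings for a mailbox (how those lists are fetched is out of scope); diff_guids() is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: compare two sorted, duplicate-free lists of message
 * GUIDs and count those present on only one side. If the missing_*
 * arrays are non-NULL, the missing GUIDs are stored there; the caller
 * sizes missing_on_replica to n_master and missing_on_master to
 * n_replica entries. */
void diff_guids(const char *const *master, size_t n_master,
                const char *const *replica, size_t n_replica,
                const char **missing_on_replica, size_t *n_missing_replica,
                const char **missing_on_master, size_t *n_missing_master)
{
    size_t i = 0, j = 0;

    *n_missing_replica = 0;
    *n_missing_master = 0;
    while (i < n_master || j < n_replica) {
        int cmp;
        if (i == n_master)
            cmp = 1;            /* only replica entries left */
        else if (j == n_replica)
            cmp = -1;           /* only master entries left */
        else
            cmp = strcmp(master[i], replica[j]);

        if (cmp == 0) {         /* present on both sides */
            i++;
            j++;
        } else if (cmp < 0) {   /* on master, missing on replica */
            if (missing_on_replica)
                missing_on_replica[*n_missing_replica] = master[i];
            (*n_missing_replica)++;
            i++;
        } else {                /* on replica, missing on master */
            if (missing_on_master)
                missing_on_master[*n_missing_master] = replica[j];
            (*n_missing_master)++;
            j++;
        }
    }
}
```

A single merge pass over the sorted lists is O(n + m). Messages missing on the replica are candidates for re-sync; messages that exist only on the replica are the interesting split-brain case and need a decision rather than an automatic copy.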