Tom Lane wrote:
> Greg Smith writes:
> > I don't see this as needing any implementation any more complicated than
> > the usual way such timeouts are handled. Note how long you've been
> > trying to reach the standby. Default to -1 for forever. And if you hit
> the timeout, mark the standby as degraded
On Wed, Oct 13, 2010 at 10:18 PM, Greg Stark wrote:
> On Tue, Oct 12, 2010 at 11:50 PM, Robert Haas wrote:
>> There's another problem here we should think about, too. Suppose you
>> have a master and two standbys. The master dies. You promote one of
>> the standbys, which turns out to be behind the other.
On Thu, Oct 14, 2010 at 11:18 AM, Greg Stark wrote:
> Why don't the usual protections kick in here? The new record read from
> the location the xlog reader is expecting to find it has to have a
> valid CRC and a correct back pointer to the previous record.
Yep. In most cases, those protections se
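Those protections are easy to picture. Here is an illustrative Python model of the two checks (a sketch only, not PostgreSQL's actual xlog reader; the `Record` layout is invented): each record carries a CRC over its payload and a back pointer to where the previous record started, and replay stops at the first record that fails either test.

```python
import zlib
from dataclasses import dataclass

@dataclass
class Record:
    prev_ptr: int   # byte offset where the previous record started
    payload: bytes
    crc: int        # CRC-32 of the payload (real WAL covers headers too)

def read_record(log, offset, expected_prev):
    """Return the record at `offset` only if it passes both sanity
    checks; None means 'end of valid WAL', which is how recovery
    treats a bad CRC or a back pointer that doesn't line up."""
    rec = log.get(offset)
    if rec is None:
        return None
    if rec.crc != zlib.crc32(rec.payload):
        return None   # torn write or garbage
    if rec.prev_ptr != expected_prev:
        return None   # record from a diverged history
    return rec
```

A stale record left at the expected location by a diverged standby would normally trip one of these two checks, which is what the question above is getting at.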
On Wed, Oct 13, 2010 at 5:22 AM, Fujii Masao wrote:
> On Wed, Oct 13, 2010 at 3:50 PM, Robert Haas wrote:
>> There's another problem here we should think about, too. Suppose you
>> have a master and two standbys. The master dies. You promote one of
>> the standbys, which turns out to be behind
On Tue, Oct 12, 2010 at 11:50 PM, Robert Haas wrote:
> There's another problem here we should think about, too. Suppose you
> have a master and two standbys. The master dies. You promote one of
> the standbys, which turns out to be behind the other. You then
> repoint the other standby at the
On Wed, Oct 13, 2010 at 3:50 PM, Robert Haas wrote:
> There's another problem here we should think about, too. Suppose you
> have a master and two standbys. The master dies. You promote one of
> the standbys, which turns out to be behind the other. You then
> repoint the other standby at the o
On Wed, Oct 13, 2010 at 3:43 PM, Heikki Linnakangas
wrote:
> On 13.10.2010 08:21, Fujii Masao wrote:
>>
>> On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas
>> wrote:
>>>
>>> It shouldn't be too hard to fix. Walsender needs to be able to read WAL
>>> from
>>> preceding timelines, like recovery does, and walreceiver needs to write the
>>> incoming WAL to the right file.
On 10/13/2010 06:43 AM, Fujii Masao wrote:
> Unfortunately even enough standbys don't increase write-availability
> unless you choose wait-forever. Because, after promoting one of
> standbys to new master, you must keep all the transactions waiting
> until at least one standby has connected to and
On Wed, Oct 13, 2010 at 2:43 AM, Heikki Linnakangas
wrote:
> On 13.10.2010 08:21, Fujii Masao wrote:
>>
>> On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas
>> wrote:
>>>
>>> It shouldn't be too hard to fix. Walsender needs to be able to read WAL
>>> from
>>> preceding timelines, like recovery does, and walreceiver needs to write the
>>> incoming WAL to the right file.
On 13.10.2010 08:21, Fujii Masao wrote:
On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas
wrote:
It shouldn't be too hard to fix. Walsender needs to be able to read WAL from
preceding timelines, like recovery does, and walreceiver needs to write the
incoming WAL to the right file.
And walsender
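The timeline lookup walsender would need can be sketched as follows (illustrative Python; the pair format and function name are assumptions, not the actual patch):

```python
def timeline_of_lsn(lsn, history):
    """Decide which timeline's WAL a position belongs to, the way a
    recovery-style reader walks the timeline history. `history` is a
    list of (timeline_id, end_lsn) pairs, oldest first; end_lsn is the
    switch point where that timeline was abandoned, and the current
    timeline carries None."""
    for tli, end_lsn in history:
        if end_lsn is None or lsn < end_lsn:
            return tli
    raise ValueError("position beyond known timeline history")
```

With that lookup, walsender can keep streaming across a promotion instead of erroring out when the requested position predates the current timeline.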
On Sat, Oct 9, 2010 at 4:31 AM, Heikki Linnakangas
wrote:
>> Yes. But if there is no unsent WAL when the master goes down,
>> we can start new standby without new backup by copying the
>> timeline history file from new master to new standby and
>> setting recovery_target_timeline to 'latest'.
>
>
On Sat, Oct 9, 2010 at 1:41 AM, Josh Berkus wrote:
>
>> And, I'd like to know whether the master waits forever because of the
>> standby failure in other solutions such as Oracle DataGuard, MySQL
>> semi-synchronous replication.
>
> MySQL used to be fond of simply failing silently. Not sure what
On Sat, Oct 9, 2010 at 12:12 AM, Markus Wanner wrote:
> On 10/08/2010 04:48 PM, Fujii Masao wrote:
>> I believe many systems require write-availability.
>
> Sure. Make sure you have enough standbys to fail over to.
Unfortunately even enough standbys don't increase write-availability
unless you choose wait-forever.
Greg,
to me it looks like we have very similar goals, but start from different
preconditions. I absolutely agree with you given the preconditions you
named.
On 10/08/2010 10:04 PM, Greg Smith wrote:
> How is that a new problem? It's already possible to end up with a
> standby pair that has suffered
On Fri, 2010-10-08 at 16:34 -0400, Greg Smith wrote:
> Tom Lane wrote:
> > How are you going to "mark the standby as degraded"? The
> > standby can't keep that information, because it's not even connected
> > when the master makes the decision.
>
> From a high level, I'm assuming only that the master has a list in
> memory of the standby system(s) it believes are
On Fri, 2010-10-08 at 17:06 +0200, Markus Wanner wrote:
> Well, full cluster outages are infrequent, but sadly cannot be avoided
> entirely. (Murphy's laughing). IMO we should be prepared to deal with
> those.
I've described how I propose to deal with those. I'm not waving away
these issues, just
Tom Lane wrote:
How are you going to "mark the standby as degraded"? The
standby can't keep that information, because it's not even connected
when the master makes the decision.
From a high level, I'm assuming only that the master has a list in
memory of the standby system(s) it believes are
Markus Wanner wrote:
..and how do you make sure you are not marking your second standby as
degraded just because it's currently lagging? Effectively degrading the
utterly needed one, because your first standby has just bitten the dust?
People are going to monitor the standby lag. If it gets
On 08.10.2010 17:26, Fujii Masao wrote:
On Fri, Oct 8, 2010 at 5:10 PM, Heikki Linnakangas
wrote:
Do we really need that?
Yes. But if there is no unsent WAL when the master goes down,
we can start new standby without new backup by copying the
timeline history file from new master to new standby and
setting recovery_target_timeline to 'latest'.
On 10/8/10, Fujii Masao wrote:
> On Fri, Oct 8, 2010 at 5:10 PM, Heikki Linnakangas
> wrote:
>> Do we really need that?
>
> Yes. But if there is no unsent WAL when the master goes down,
> we can start new standby without new backup by copying the
> timeline history file from new master to new standby and
> setting recovery_target_timeline to 'latest'.
And, I'd like to know whether the master waits forever because of the
standby failure in other solutions such as Oracle DataGuard, MySQL
semi-synchronous replication.
MySQL used to be fond of simply failing silently. Not sure what 5.4
does, or Oracle. In any case MySQL's replication has al
On 10/08/2010 04:48 PM, Fujii Masao wrote:
> I believe many systems require write-availability.
Sure. Make sure you have enough standbys to fail over to.
(I think there are even more situations where read-availability is much
more important, though).
>> Start with 0 (i.e. replication off), then
On 10/08/2010 04:47 PM, Simon Riggs wrote:
> Yes, I really want to avoid such issues and likely complexities we get
> into trying to solve them. In reality they should not be common because
> it only happens if the sysadmin has not configured sufficient number of
> redundant standbys.
Well, full cluster outages are infrequent, but sadly cannot be avoided
entirely.
On Fri, 2010-10-08 at 23:55 +0900, Fujii Masao wrote:
> On Fri, Oct 8, 2010 at 6:00 PM, Simon Riggs wrote:
> > From the perspective of an observer, randomly selecting a standby for
> > load balancing purposes: No, they are not guaranteed to see the "latest"
> > answer, nor even can they find out
On Fri, Oct 8, 2010 at 6:00 PM, Simon Riggs wrote:
> From the perspective of an observer, randomly selecting a standby for
> load balancing purposes: No, they are not guaranteed to see the "latest"
> answer, nor even can they find out whether what they are seeing is the
> latest answer.
To guarantee
On Fri, Oct 8, 2010 at 5:16 PM, Markus Wanner wrote:
> On 10/08/2010 05:41 AM, Fujii Masao wrote:
>> But, even with quorum commit, if you choose wait-forever option,
>> failover would decrease availability. Right after the failover,
>> no standby has connected to new master, so if quorum >= 1, all
On Fri, 2010-10-08 at 10:11 -0400, Tom Lane wrote:
> 1. a unique identifier for each standby (not just role names that
> multiple standbys might share);
That is difficult because each standby is identical. If a standby goes
down, people can regenerate a new standby by taking a copy from another
standby
On 10/08/2010 04:38 PM, Tom Lane wrote:
> Markus Wanner writes:
>> IIUC you seem to assume that the master node keeps its master role. But
>> users who value availability a lot certainly want automatic fail-over,
>
> Huh? Surely loss of the slaves shouldn't force a failover. Maybe the
> slaves
Markus Wanner writes:
> On 10/08/2010 04:11 PM, Tom Lane wrote:
>> Actually, #2 seems rather difficult even if you want it. Presumably
>> you'd like to keep that state in reliable storage, so it survives master
>> crashes. But how you gonna commit a change to that state, if you just
>> lost every standby
On 10/08/2010 12:05 PM, Dimitri Fontaine wrote:
> Markus Wanner writes:
>> ..and a whole lot of manual work, that's prone to error for something
>> that could easily be automated
>
> So, the master just crashed, first standby is dead and second ain't in
> sync. What's the easy and automated way o
Tom Lane writes:
> Well, actually, that's *considerably* more complicated than just a
> timeout. How are you going to "mark the standby as degraded"? The
> standby can't keep that information, because it's not even connected
> when the master makes the decision. ISTM that this requires
>
> 1. a unique identifier for each standby (not just role names that
> multiple standbys might share);
On 10/08/2010 04:11 PM, Tom Lane wrote:
> Actually, #2 seems rather difficult even if you want it. Presumably
> you'd like to keep that state in reliable storage, so it survives master
> crashes. But how you gonna commit a change to that state, if you just
> lost every standby (suppose master's e
On Fri, Oct 8, 2010 at 5:10 PM, Heikki Linnakangas
wrote:
> Do we really need that?
Yes. But if there is no unsent WAL when the master goes down,
we can start new standby without new backup by copying the
timeline history file from new master to new standby and
setting recovery_target_timeline to 'latest'.
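Spelled out, that recipe looks roughly like this on the new standby (9.0-era recovery.conf syntax; the host name is illustrative). First copy the timeline history file (e.g. 00000002.history) from the new master's pg_xlog into the standby's pg_xlog, then point recovery at the new master:

```
# recovery.conf on the re-pointed standby (sketch)
standby_mode = 'on'
primary_conninfo = 'host=new-master port=5432'
# follow the timeline the new master switched to at promotion
recovery_target_timeline = 'latest'
```

Without the copied history file, the standby cannot recognize the new master's timeline switch and recovery stops at the old timeline's end.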
Greg Smith writes:
> I don't see this as needing any implementation any more complicated than
> the usual way such timeouts are handled. Note how long you've been
> trying to reach the standby. Default to -1 for forever. And if you hit
> the timeout, mark the standby as degraded and force th
On Fri, Oct 8, 2010 at 5:07 PM, Markus Wanner wrote:
> On 10/08/2010 04:01 AM, Fujii Masao wrote:
>> Really? I don't think that ko-count=0 means "wait-forever".
>
> Telling from the documentation, I'd also say it doesn't wait forever by
> default. However, please note that there are different parameters
Markus Wanner writes:
> ..and a whole lot of manual work, that's prone to error for something
> that could easily be automated
So, the master just crashed, first standby is dead and second ain't in
sync. What's the easy and automated way out? Sorry, I need a hand here.
--
Dimitri Fontaine
http:
On 10/08/2010 11:41 AM, Dimitri Fontaine wrote:
> Same old story. Either you're able to try and fix the master so that you
> don't lose any data and don't even have to check for that, or you take a
> risk and start from a non synced standby. It's all availability against
> durability again.
..and
Markus Wanner writes:
> ..and how do you make sure you are not marking your second standby as
> degraded just because it's currently lagging?
Well, in sync rep, a standby that's not able to stay under the timeout
is degraded. Full stop. The presence of the timeout (or its value not
being -1) means
On 10/08/2010 11:00 AM, Simon Riggs wrote:
> From the perspective of an observer, randomly selecting a standby for
> load balancing purposes: No, they are not guaranteed to see the "latest"
> answer, nor even can they find out whether what they are seeing is the
> latest answer.
I completely agree
On 10/08/2010 01:44 AM, Greg Smith wrote:
> They'll use Sync Rep to maximize
> the odds a system failure doesn't cause any transaction loss. They'll
> use good quality hardware on the master so it's unlikely to fail.
.."unlikely to fail"?
Ehm.. is that you speaking, Greg? ;-)
> But
> when the d
On Fri, 2010-10-08 at 11:27 +0300, Heikki Linnakangas wrote:
> On 08.10.2010 11:25, Simon Riggs wrote:
> > On Fri, 2010-10-08 at 10:56 +0300, Heikki Linnakangas wrote:
> >>>
> >>> Or what kind of customers do you think really need a no-lag solution for
>>> read-only queries? In the LAN case, the lag of async rep is negligible
On 10/08/2010 09:56 AM, Heikki Linnakangas wrote:
> Imagine a web application that's mostly read-only, but a
> user can modify his own personal details like name and address, for
> example. Imagine that the user changes his street address and clicks
> 'save', causing an UPDATE, and the next query f
On 10/08/2010 10:27 AM, Heikki Linnakangas wrote:
> Synchronous replication in the 'replay' mode is supposed to guarantee
> exactly that, no?
The master may lag behind, so it's not strictly speaking the same data.
Regards
Markus Wanner
--
Sent via pgsql-hackers mailing list (pgsql-hackers@pos
On 08.10.2010 11:25, Simon Riggs wrote:
On Fri, 2010-10-08 at 10:56 +0300, Heikki Linnakangas wrote:
Or what kind of customers do you think really need a no-lag solution for
read-only queries? In the LAN case, the lag of async rep is negligible
and in the WAN case the latencies of sync rep are prohibitive.
On Fri, 2010-10-08 at 10:56 +0300, Heikki Linnakangas wrote:
> >
> > Or what kind of customers do you think really need a no-lag solution for
> > read-only queries? In the LAN case, the lag of async rep is negligible
> > and in the WAN case the latencies of sync rep are prohibitive.
>
> There is a
On 08.10.2010 01:25, Simon Riggs wrote:
On Thu, 2010-10-07 at 13:44 -0400, Aidan Van Dyk wrote:
To get "non-stale" responses, you can only query those k=3 servers.
But you've shot yourself in the foot because you don't know which
3/10 those will be. The other 7 *are* stale (by definition). They
talk about picking the "caught up" slave when the master fails
On 10/08/2010 05:41 AM, Fujii Masao wrote:
> But, even with quorum commit, if you choose wait-forever option,
> failover would decrease availability. Right after the failover,
> no standby has connected to new master, so if quorum >= 1, all
> the transactions must wait for a while.
That's a point,
On Fri, 2010-10-08 at 09:52 +0200, Markus Wanner wrote:
> One addendum: a timeout increases availability at the cost of
> increased danger of data loss and higher complexity. Don't use it,
> just increase (N - k) instead.
Completely agree.
--
Simon Riggs www.2ndQuadrant.com
Postgre
On 08.10.2010 06:41, Fujii Masao wrote:
On Thu, Oct 7, 2010 at 3:01 AM, Markus Wanner wrote:
Of course, it doesn't make sense to wait-forever on *every* standby that
ever gets added. Quorum commit is required, yes (and that's what this
thread is about, IIRC). But with quorum commit, adding a st
On 10/08/2010 04:01 AM, Fujii Masao wrote:
> Really? I don't think that ko-count=0 means "wait-forever".
Telling from the documentation, I'd also say it doesn't wait forever by
default. However, please note that there are different parameters for
the initial wait for connection during boot up (wfc
On 07.10.2010 21:38, Markus Wanner wrote:
On 10/07/2010 03:19 PM, Dimitri Fontaine wrote:
I think you're all into durability, and that's good. The extra cost is
service downtime
It's just *reduced* availability. That doesn't necessarily mean
downtime, if you combine cleverly with async replica
Simon,
On 10/08/2010 12:25 AM, Simon Riggs wrote:
> Asking for k > 1 does *not* mean those servers are time synchronised.
Yes, it's technically impossible to create a fully synchronized cluster
(on the basis of shared-nothing nodes we are aiming for, that is). There
always is some kind of "lag" o
On 10/08/2010 12:30 AM, Simon Riggs wrote:
> I do, but it's not a parameter. The k = 1 behaviour is hardcoded and
> considerably simplifies the design. Moving to k > 1 is additional work,
> slows things down and seems likely to be fragile.
Perfect! So I'm all in favor of committing that, but leavin
Greg Smith writes:
[…]
> I don't see this as needing any implementation any more complicated than the
> usual way such timeouts are handled. Note how long you've been trying to
> reach the standby. Default to -1 for forever. And if you hit the timeout,
> mark the standby as degraded and force t
On Fri, Oct 8, 2010 at 8:44 AM, Greg Smith wrote:
> Additional code? Yes. Foot-gun? Yes. Timeout should be disabled by
> default so that you get wait forever unless you ask for something different?
> Probably. Unneeded? This is where we don't agree anymore. The example
> that Josh Berkus j
On Thu, 2010-10-07 at 19:44 -0400, Greg Smith wrote:
> I don't see this as needing any implementation any more complicated than
> the usual way such timeouts are handled. Note how long you've been
> trying to reach the standby. Default to -1 for forever. And if you hit
> the timeout, mark the standby as degraded
On Thu, Oct 7, 2010 at 3:01 AM, Markus Wanner wrote:
> Of course, it doesn't make sense to wait-forever on *every* standby that
> ever gets added. Quorum commit is required, yes (and that's what this
> thread is about, IIRC). But with quorum commit, adding a standby only
> improves availability, b
On Thu, Oct 7, 2010 at 5:01 AM, Simon Riggs wrote:
> You seem willing to trade anything for that guarantee. I seek a more
> pragmatic approach that balances availability and risk.
>
> Those views are different, but not inconsistent. Oracle manages to offer
> multiple options and so can we.
+1
Re
On Wed, Oct 6, 2010 at 9:22 PM, Dimitri Fontaine wrote:
> From my experience operating londiste, those states would be:
>
> 1. base-backup — self explaining
> 2. catch-up — getting the WAL to catch up after base backup
> 3. wanna-sync — don't yet have all the WAL to get in sync
> 4. do-
On Thu, Oct 7, 2010 at 10:24 PM, Fujii Masao wrote:
> On Wed, Oct 6, 2010 at 6:00 PM, Heikki Linnakangas
> wrote:
>> In general, salvaging the WAL that was not sent to the standby yet is
>> outright impossible. You can't achieve zero data loss with asynchronous
>> replication at all.
>
> No. That
On Wed, Oct 6, 2010 at 6:00 PM, Heikki Linnakangas
wrote:
> In general, salvaging the WAL that was not sent to the standby yet is
> outright impossible. You can't achieve zero data loss with asynchronous
> replication at all.
No. That depends on the type of failure. Unless the disk in the master
On Wed, Oct 6, 2010 at 6:11 PM, Markus Wanner wrote:
> Yeah, sounds more likely. Then I'm surprised that I didn't find any
> warning that the Protocol C definitely reduces availability (with the
> ko-count=0 default, that is).
Really? I don't think that ko-count=0 means "wait-forever". IIRC,
when
Markus Wanner wrote:
So far I've been under the impression that Simon already has the code
for quorum_commit k = 1.
What I'm opposing to is the timeout "feature", which I consider to be
additional code, unneeded complexity and foot-gun.
Additional code? Yes. Foot-gun? Yes. Timeout should be disabled by
default so that you get wait forever unless you ask for something different?
All,
> Establishing an affinity between a session and one of the database
> servers will only help if the traffic is strictly read-only.
I think this thread has drifted very far away from anything we're going
to do for 9.1. And seems to have little to do with synchronous replication.
Synch rep
On Thu, 2010-10-07 at 19:50 +0200, Markus Wanner wrote:
> So far I've been under the impression that Simon already has the code
> for quorum_commit k = 1.
I do, but it's not a parameter. The k = 1 behaviour is hardcoded and
considerably simplifies the design. Moving to k > 1 is additional work,
sl
On Thu, 2010-10-07 at 13:44 -0400, Aidan Van Dyk wrote:
> To get "non-stale" responses, you can only query those k=3 servers.
> But you've shot yourself in the foot because you don't know which
> 3/10 those will be. The other 7 *are* stale (by definition). They
> talk about picking the "caught up" slave when the master fails
Robert Haas wrote:
> Establishing an affinity between a session and one of the database
> servers will only help if the traffic is strictly read-only.
Thanks; I now see your point.
In our environment, that's pretty common. Our most heavily used web
app (the one for which we have, at times,
Markus Wanner writes:
> I don't buy that. The risk calculation gets a lot simpler and obvious
> with strict guarantees.
Ok, I'm lost in the use cases and analysis.
I still don't understand why you want to consider the system already
synchronous when it's not, whatever is the guarantee you're as
On Thu, Oct 7, 2010 at 2:31 PM, Kevin Grittner
wrote:
> Robert Haas wrote:
>> Kevin Grittner wrote:
>
>>> With web applications, at least, you often don't care that the
>>> data read is absolutely up-to-date, as long as the point in time
>>> doesn't jump around from one request to the next. Whe
On 10/07/2010 07:44 PM, Aidan Van Dyk wrote:
> The only case I see a "race to quorum" type of k < N being useful is
> if you're just trying to duplicate data everywhere, but not actually
> querying any of the replicas. I can see that "all queries go to the
> master, but the chances are pretty high
On 10/07/2010 03:19 PM, Dimitri Fontaine wrote:
> I think you're all into durability, and that's good. The extra cost is
> service downtime
It's just *reduced* availability. That doesn't necessarily mean
downtime, if you combine cleverly with async replication.
> if that's not what you're after:
Robert Haas wrote:
> Kevin Grittner wrote:
>> With web applications, at least, you often don't care that the
>> data read is absolutely up-to-date, as long as the point in time
>> doesn't jump around from one request to the next. When we have
>> used load balancing between multiple database se
On Thu, Oct 7, 2010 at 2:10 PM, Kevin Grittner
wrote:
> Aidan Van Dyk wrote:
>
>> To get "non-stale" responses, you can only query those k=3
>> servers. But you've shot yourself in the foot because you don't
>> know which 3/10 those will be. The other 7 *are* stale (by
>> definition). They tal
Aidan Van Dyk wrote:
> To get "non-stale" responses, you can only query those k=3
> servers. But you've shot yourself in the foot because you don't
> know which 3/10 those will be. The other 7 *are* stale (by
> definition). They talk about picking the "caught up" slave when
> the master fails
> But as a practical matter, I'm afraid the true cost of the better
> guarantee you're suggesting here is additional code complexity that will
> likely cause this feature to miss 9.1 altogether. As far as I'm
> concerned, this whole diversion into the topic of quorum commit is only
> consuming re
On 10/07/2010 06:41 PM, Greg Smith wrote:
> The cost of hardware capable of running a database server is a large
> multiple of what you can build an alerting machine for.
You realize you don't need lots of disks nor RAM for a box that only
ACKs? A box with two SAS disks and a BBU isn't that expens
> If you want "synchronous replication" because you want "query
> availabilty" while making sure you're not getting "stale" queries from
> all your slaves, than using your k < N (k = 3 and N - 10) situation is
> screwing your self.
Correct. If that is your reason for synch standby, then you shoul
On Thu, Oct 7, 2010 at 1:22 PM, Josh Berkus wrote:
> So if you have k = 3 and N = 10, then you can have 10 standbys and only
> 3 of them need to ack any specific commit for the master to proceed. As
> long as (a) you retain at least one of the 3 which ack'd, and (b) you
> have some way of determi
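The k-of-N rule described here reduces to a simple wait loop (an illustrative sketch only, not the proposed wire protocol or the patch's syncrep queue):

```python
def wait_for_quorum(k, acks):
    """Release the commit as soon as any k distinct standbys have
    acknowledged, per the k-of-N quorum rule. `acks` is the stream of
    standby names in arrival order. Returns (released, acked_set)."""
    acked = set()
    for name in acks:
        acked.add(name)
        if len(acked) >= k:
            return True, acked    # quorum met: commit proceeds
    return False, acked           # still waiting
```

With k = 1 this degenerates to the hardcoded first-acknowledgement-releases-waiters behaviour discussed elsewhere in the thread.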
On 10/7/10 6:41 AM, Aidan Van Dyk wrote:
> I'm really confused with all these k < N scenarios I see bandied
> about, because all it really amounts to is "I only want *one*
> synchronous replication, and a bunch of asynchronous replications".
> And a bit of chance thrown in the mix to hope the "sync
Markus Wanner wrote:
I think that's a pretty special case, because the "good alerting system"
is at least as expensive as another server that just persistently stores
and ACKs incoming WAL.
The cost of hardware capable of running a database server is a large
multiple of what you can build a
Aidan Van Dyk writes:
> *shrug* The joining standby is still asynchronous at this point.
> It's not synchronous replication. It's just another ^k of the N
> slaves serving stale data ;-)
Agreed *here*, but if you read the threads again, you'll see that's not
at all what's been talked about befo
On Thu, Oct 7, 2010 at 10:08 AM, Dimitri Fontaine
wrote:
> Aidan Van Dyk writes:
>> Sure, but that lagged standby is already asynchronous, not
>> synchronous. If it was synchronous, it would have slowed the master
>> down enough it would not be lagged.
>
> Agreed, except in the case of a joinin
Aidan Van Dyk writes:
> Sure, but that lagged standby is already asynchronous, not
> synchronous. If it was synchronous, it would have slowed the master
> down enough it would not be lagged.
Agreed, except in the case of a joining standby. But you're saying it
better than I do:
> Yes, I believ
On Thu, Oct 7, 2010 at 6:32 AM, Dimitri Fontaine wrote:
> Or if the standby is lagging and the master wal_keep_segments is not
> sized big enough. Is that a catastrophic loss of the standby too?
Sure, but that lagged standby is already asynchronous, not
synchronous. If it was synchronous, it w
Markus Wanner writes:
> Why does one ever want the guarantee that sync replication gives to only
> hold true up to one failure, if a better guarantee doesn't cost anything
> extra? (Note that a "good alerting system" is impossible to achieve with
> only two servers. You need a third device anyway)
Salut Dimitri,
On 10/07/2010 12:32 PM, Dimitri Fontaine wrote:
> Another one is to say that I want sync rep when the standby is
> available, but I don't have the budget for more. So I prefer a good
> alerting system and low-budget-no-guarantee when the standby is down,
> that's my risk evaluation.
On Thu, Oct 7, 2010 at 3:30 AM, Simon Riggs wrote:
> Yes, lets get k = 1 first.
>
> With k = 1 the number of standbys is not limited, so we can still have
> very robust and highly available architectures. So we mean
> "first-acknowledgement-releases-waiters".
+1. I like the design Greg Smith pro
On 10/07/2010 01:08 PM, Simon Riggs wrote:
> Adding timeout is very little code. We can take that out of the patch if
> that's an objection.
Okay. If you take it out, we are at the wait-forever option, right?
If not, I definitely don't understand how you envision things to happen.
I've been askin
On Thu, 2010-10-07 at 11:46 +0200, Markus Wanner wrote:
> On 10/06/2010 10:01 PM, Simon Riggs wrote:
> > The code to implement your desired option is
> > more complex and really should come later.
>
> I'm sorry, but I think of that exactly the opposite way.
I see why you say that. Dimitri's sugg
Heikki Linnakangas writes:
> Either that, or you configure your system for asynchronous replication
> first, and flip the switch to synchronous only after the standby has caught
> up. Setting up the first standby happens only once when you initially set up
> the system, or if you're recovering fro
On 07.10.2010 12:52, Dimitri Fontaine wrote:
Markus Wanner writes:
I'm just saying that this should be an option, not the only choice.
I'm sorry, I just don't see the use case for a mode that drops
guarantees when they are most needed. People who don't need those
guarantees should definitely
Markus Wanner writes:
>> I'm just saying that this should be an option, not the only choice.
>
> I'm sorry, I just don't see the use case for a mode that drops
> guarantees when they are most needed. People who don't need those
> guarantees should definitely go for async replication instead.
We'r
On 10/06/2010 10:01 PM, Simon Riggs wrote:
> The code to implement your desired option is
> more complex and really should come later.
I'm sorry, but I think of that exactly the opposite way. The timeout for
automatic continuation after waiting for a standby is the addition. The
wait state of the
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote:
> (2), (3) Degradation: (Jeff) these two cases make sense only if we
> give
> DBAs the tools they need to monitor which standbys are falling behind,
> and to drop and replace those standbys. Otherwise we risk giving DBAs
> false confidence that
On Wed, 2010-10-06 at 10:57 -0700, Josh Berkus wrote:
> I also strongly believe that we should get single-standby
> functionality committed and tested *first*, before working further on
> multi-standby.
Yes, lets get k = 1 first.
With k = 1 the number of standbys is not limited, so we can still
On Wed, 2010-10-06 at 18:04 +0300, Heikki Linnakangas wrote:
> The key is whether you are guaranteed to have zero data loss or not.
We agree that is an important question.
You seem willing to trade anything for that guarantee. I seek a more
pragmatic approach that balances availability and risk.
> Seems reasonable, but what is a CAP database?
Database based around the CAP theorem[1]. Cassandra, Dynamo,
Hypertable, etc.
For us, the equation is: CAD, as in Consistency, Availability,
Durability. Pick any two, at best. But it's a very similar bag of
issues as the ones CAP addresses.
[1]
On 10/06/2010 09:04 PM, Dimitri Fontaine wrote:
> Ok so I think we're agreeing here: what I said amounts to propose that
> the code does work this way when the quorum is such setup, and/or is
> able to reject any non-read-only transaction (those that needs a real
> XID) until your standby is fully
On 06.10.2010 20:57, Josh Berkus wrote:
While it's nice to dismiss case (1) as an edge-case, consider the
likelyhood of someone running PostgreSQL with fsync=off on cloud
hosting. In that case, having k = N = 5 does not seem like an
unreasonable arrangement if you want to ensure durability via
r
Markus Wanner writes:
> There's no point in time I
> ever mind if a standby is a "candidate" or not. Either I want to
> synchronously replicate to X standbys, or not.
Ok so I think we're agreeing here: what I said amounts to propose that
the code does work this way when the quorum is such setup,