[ https://issues.apache.org/jira/browse/SLING-4640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974545#comment-14974545 ]
Stefan Egli edited comment on SLING-4640 at 10/26/15 5:02 PM:
--------------------------------------------------------------

This would require help from the underlying repository. Without knowing how large the read/write-delay of the local instance is, there's no way of dealing with this. However, if such a read/write-delay were known and periodically checked, the instance could put itself into a {{TOPOLOGY_CHANGING}} state as soon as the read/write-delay is larger than the configured heartbeat timeout (or actually a little before that, to account for glitches).

Removing the fix-version here for now, as this requires non-standard repository support.

> Possibility of duplicate leaders w/discovery.impl on eventually consistent repo
> -------------------------------------------------------------------------------
>
>                 Key: SLING-4640
>                 URL: https://issues.apache.org/jira/browse/SLING-4640
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.1.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>
> Note: This is a fork of SLING-3432 based on a [comment|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14495936&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14495936].
>
> So here is that comment again:
>
> Note that [the above|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494] does not solve the problem where the underlying repository is eventually consistent and the configured heartbeat is too low to catch all possible delays (that such an eventually consistent repository might produce under load). Consider the following:
> # a cluster consisting of 3 nodes: A, B and C; A is the leader
> # writes from B and C are fast and can be read quickly by all 3 nodes
> # writes from A, though, are slow (i.e. A behaves asymmetrically: slow writes but fast reads)
> # at some point writes from A become slower than the configured heartbeat timeout: at this point B and C find out about this and vote on a new clusterView consisting only of B and C, and (let's say) B becomes leader.
> #* meanwhile at A, however, A is still happy: it sees the heartbeats of B and C in time and would not start a new voting.
> # at some later point (with a *certain read delay*) A sees that B and C have declared a new {{/establishedViews}}; at this point it would (according to the new rule above) immediately send a TOPOLOGY_CHANGING and things would be 'ok' again.
> #* *but* until it does send this event, *between 4. and 5. we have two leaders: A and B*! -> thus we could still see the issues reported here in SLING-3432 during that small timeframe (which is basically the amount of time it takes for the new established view declared by B and C to be read by A).
> #* at a later time, when e.g. the delays in the repository have come down, A would rejoin the cluster, but it must *not become leader* again, as the leader is B and must stay stable.
> This IMHO highlights the problem that using an eventually consistent repository (that has no guaranteed maximum delay) is *not* free of pseudo-network-partitions and duplicate leaders under load.
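The mitigation proposed in the edited comment (periodically check the local read/write-delay and enter {{TOPOLOGY_CHANGING}} slightly before it reaches the heartbeat timeout) can be sketched in Java as below. This is a hypothetical illustration, not discovery.impl code: {{ReadDelayGuard}}, {{LocalState}}, {{evaluate}} and the 30s/5s values are made up, and it assumes the repository can report the instance's current read and write delay, which, as the comment notes, standard repositories do not. It shows only the decision made on each periodic check, not the scheduling or the event dispatch.

```java
import java.time.Duration;

/**
 * Hypothetical sketch (not part of discovery.impl): decides whether the local
 * instance should declare TOPOLOGY_CHANGING based on how delayed its reads and
 * writes currently are, compared with the configured heartbeat timeout.
 */
public class ReadDelayGuard {

    enum LocalState { STABLE, TOPOLOGY_CHANGING }

    private final Duration heartbeatTimeout;
    private final Duration safetyMargin;

    ReadDelayGuard(Duration heartbeatTimeout, Duration safetyMargin) {
        if (safetyMargin.isNegative() || safetyMargin.compareTo(heartbeatTimeout) >= 0) {
            throw new IllegalArgumentException("margin must be in [0, heartbeatTimeout)");
        }
        this.heartbeatTimeout = heartbeatTimeout;
        this.safetyMargin = safetyMargin;
    }

    /** Switch "a little bit before" the timeout to absorb glitches. */
    Duration threshold() {
        return heartbeatTimeout.minus(safetyMargin);
    }

    /**
     * Assumes the repository can report the current read and write delay of this
     * instance; standard repositories do not offer this.
     */
    LocalState evaluate(Duration readDelay, Duration writeDelay) {
        // Use the worse of the two delays: slow writes get this instance dropped by
        // its peers, slow reads hide the new view those peers have established.
        Duration worst = readDelay.compareTo(writeDelay) >= 0 ? readDelay : writeDelay;
        return worst.compareTo(threshold()) >= 0
                ? LocalState.TOPOLOGY_CHANGING
                : LocalState.STABLE;
    }

    public static void main(String[] args) {
        // Illustrative values: 30s heartbeat timeout, switch 5s early.
        ReadDelayGuard guard = new ReadDelayGuard(Duration.ofSeconds(30), Duration.ofSeconds(5));
        Duration fastReads = Duration.ofSeconds(1);
        // Instance A from the scenario: reads stay fast while writes slow down.
        long[] writeDelaysSeconds = {2, 12, 26, 40};
        for (long w : writeDelaysSeconds) {
            LocalState state = guard.evaluate(fastReads, Duration.ofSeconds(w));
            System.out.println("write delay " + w + "s -> " + state);
        }
    }
}
```

With a 25s threshold, the loop reports {{STABLE}} for write delays of 2s and 12s and {{TOPOLOGY_CHANGING}} for 26s and 40s. Since A's reads stay fast in the scenario, a check on the read delay alone would never trigger for A; that is why this sketch takes the larger of the two delays, in line with the "read/write-delay" wording of the edited comment.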
>
> Note that what is described here is not fixed by SLING-4627.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)