[ https://issues.apache.org/jira/browse/SLING-4627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Egli closed SLING-4627.
------------------------------

> TOPOLOGY_CHANGED in an eventually consistent repository
> -------------------------------------------------------
>
>                 Key: SLING-4627
>                 URL: https://issues.apache.org/jira/browse/SLING-4627
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Critical
>             Fix For: Discovery Commons 1.0.0, Discovery Oak 1.0.0
>
>         Attachments: SLING-4627.patch, SLING-4627.patch
>
>
> This is a parent ticket describing the +coordination effort needed for properly sending TOPOLOGY_CHANGED when running on top of an eventually consistent repository+. These findings are independent of the implementation details used inside the discovery implementation, so they apply to discovery.impl, discovery.etcd/.zookeeper/.oak etc. Tickets to implement this for specific implementations are best created separately (e.g. as sub-tasks or related issues). Also note that this assumes immediately sending TOPOLOGY_CHANGING as described [in SLING-3432|https://issues.apache.org/jira/browse/SLING-3432?focusedCommentId=14492494&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14492494].
>
> h5. The spectrum of possible TOPOLOGY_CHANGED events includes the following scenarios:
>
> || scenario || classification || action ||
> | A. change is completely outside of the local cluster | (/) uncritical | changes outside the cluster are considered uncritical for this exercise. |
> | B. a new instance joins the local cluster; by contract this new instance is not the leader (the leader must be stable \[0\]) | (/) uncritical | a join of an instance is uncritical because it merely joins the cluster and thus has no 'backlog' of changes that might still be propagating through the (eventually consistent) repository. |
> | C. a non-leader *leaves* the local cluster | (x) *critical* | changes written by the leaving instance might not yet be *seen* by all survivors (i.e. discovery can be faster than the repository), and this must be assured before sending out TOPOLOGY_CHANGED. The leaving instance could have written changes that are *topology dependent*, so those changes must first be settled in the repository before continuing with a *new topology*. |
> | D. the leader *leaves* the local cluster (and thus a new leader is elected) | (x)(x) *very critical* | same as C, except more critical due to the fact that the leader left. |
> | E. -the leader of the local cluster changes (without leaving)- this is not supported by contract (the leader must be stable \[0\]) | (/) -irrelevant- | |
>
> So both C and D are about an instance leaving, and as mentioned above the survivors must assure they have read all changes of the leaving instance. There are two parts to this:
> * the leaver could have pending writes that are not yet in MongoDB: I don't think this is the case. The only thing that could remain is an uncommitted branch, and that would be rolled back afaik.
> ** An exception to this is a partition, where the leaver didn't actually crash but is still hooked to the repository. *For this I'm not sure how it can be solved* yet.
> * the survivors could, however, not yet have read all changes (pending in the background read). One way to make sure they did is to have each surviving instance write a (pseudo-) sync token to the repository. Once all survivors have seen this sync token of all other survivors, the assumption is that all pending changes have been "flushed" through the eventually consistent repository and that it is safe to send out a TOPOLOGY_CHANGED event.
> * this sync token must be *conflict free* and could be e.g. {{/var/discovery/oak/clusterInstances/<slingId>/syncTokens/<newViewId>}}, where {{newViewId}} is defined by whatever discovery mechanism is used
> * a special case is when only one instance remains. It can then not wait for any other survivor to send a sync token, so sync tokens would not work in that case. All it could then possibly do is wait for a certain time (which should be larger than any expected background-read duration)
>
> [~mreutegg], [~chetanm] can you pls confirm/comment on the above "flush/sync token" approach? Thx!
> /cc [~marett]
>
> \[0\] - see [getLeader() in ClusterView|https://github.com/apache/sling/blob/trunk/bundles/extensions/discovery/api/src/main/java/org/apache/sling/discovery/ClusterView.java]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
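For illustration, the "flush/sync token" approach discussed in the ticket could be sketched roughly as below. This is a minimal model, not actual Sling discovery API: the repository is replaced by a plain in-memory map, and the names ({{SyncTokenSketch}}, {{writeSyncToken}}, {{canSendTopologyChanged}}) are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the sync-token flush: each survivor writes a conflict-free
 * token for the new view, and TOPOLOGY_CHANGED is deferred until every
 * survivor's token is visible locally.
 */
public class SyncTokenSketch {

    // Stands in for token nodes such as
    // /var/discovery/oak/clusterInstances/<slingId>/syncTokens/<newViewId>:
    // maps slingId -> newViewId of the last sync token that instance wrote.
    private final Map<String, String> syncTokens = new ConcurrentHashMap<>();

    /** A surviving instance writes its sync token for the new view. */
    public void writeSyncToken(String slingId, String newViewId) {
        syncTokens.put(slingId, newViewId);
    }

    /**
     * TOPOLOGY_CHANGED may only be sent once every survivor's token for
     * newViewId is visible, i.e. once all pending changes are assumed to
     * have been flushed through the eventually consistent repository.
     * (The single-survivor special case is not modeled here; per the
     * ticket it would fall back to a timeout larger than any expected
     * background-read duration.)
     */
    public boolean canSendTopologyChanged(Set<String> survivors, String newViewId) {
        for (String slingId : survivors) {
            if (!newViewId.equals(syncTokens.get(slingId))) {
                return false; // at least one survivor has not flushed yet
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SyncTokenSketch repo = new SyncTokenSketch();
        Set<String> survivors = Set.of("instance-1", "instance-2");

        repo.writeSyncToken("instance-1", "view-42");
        System.out.println(repo.canSendTopologyChanged(survivors, "view-42")); // prints false

        repo.writeSyncToken("instance-2", "view-42");
        System.out.println(repo.canSendTopologyChanged(survivors, "view-42")); // prints true
    }
}
```

In a real implementation the map writes would be repository writes subject to the same eventual consistency, which is exactly why seeing everyone's token implies the earlier changes have propagated.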