[ 
https://issues.apache.org/jira/browse/FLINK-32010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17719968#comment-17719968
 ] 

David Morávek edited comment on FLINK-32010 at 5/6/23 6:20 AM:
---------------------------------------------------------------

master: 026d7ccfe1d6f4cfa26c9038dd05403c889d2e0d

release-1.16: 6aa84630a39fbe33209487fbeac412dc98439b46

release-1.17: 2799038b964e88129545bf4e6a5128c03e3d2f2b


was (Author: davidmoravek):
master: 026d7ccfe1d6f4cfa26c9038dd05403c889d2e0d

> KubernetesLeaderRetrievalDriver always waits for lease update to resolve 
> leadership
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-32010
>                 URL: https://issues.apache.org/jira/browse/FLINK-32010
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / Kubernetes, Runtime / Coordination
>    Affects Versions: 1.17.0, 1.16.1, 1.18.0
>            Reporter: David Morávek
>            Assignee: David Morávek
>            Priority: Major
>              Labels: pull-request-available
>
> The k8s-based leader retrieval is based on ConfigMap watching. The config map 
> lifecycle (from the consumer point of view) is handled as a series of events 
> with the following types:
>  * ADDED -> the first time the consumer has seen the CM
>  * UPDATED -> any further changes to the CM
>  * DELETED -> ... you get the idea
> The implementation assumes that the ElectionDriver (the one that creates the 
> CM) and the ElectionRetriever are started simultaneously, and it therefore 
> ignores ADDED events, because the CM is always created empty and is updated 
> with the leadership information later on.
> This assumption is incorrect in the following cases (I might be missing some, 
> but that's not important, the goal is to illustrate the problem):
>  * A TM joins the cluster later, after the leaders are already established, 
> and needs to discover the RM / JM
>  * The RM tries to discover the JM when 
> MultipleComponentLeaderElectionDriver is used
> This leads, for example, to higher job submission latency: a submission can 
> be held back unnecessarily for up to the lease retry period [1].
> [1] Configured by _high-availability.kubernetes.leader-election.retry-period_
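
To illustrate the behaviour described above, here is a minimal sketch of a retriever that treats ADDED the same as UPDATED, so that a ConfigMap which already carries leader information is resolved on first sight rather than on the next lease update. All names in it (LeaderRetrievalSketch, EventType, LEADER_ADDRESS_KEY, the "address" key) are hypothetical and chosen for illustration only; this is not Flink's KubernetesLeaderRetrievalDriver nor the fabric8 client API, and it is not claimed to match the committed fix.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch: none of these names are Flink or fabric8 APIs.
final class LeaderRetrievalSketch {

    // Event types as listed in the issue description.
    enum EventType { ADDED, UPDATED, DELETED }

    // Assumed ConfigMap key that holds the leader address.
    static final String LEADER_ADDRESS_KEY = "address";

    private Optional<String> knownLeader = Optional.empty();

    // Handles one watch event together with the ConfigMap data it carries.
    void onEvent(EventType type, Map<String, String> configMapData) {
        switch (type) {
            case ADDED:
                // A late-starting consumer may first see a CM that already
                // contains leader information, so ADDED must not be skipped.
            case UPDATED:
                knownLeader = Optional.ofNullable(configMapData.get(LEADER_ADDRESS_KEY))
                        .filter(address -> !address.isEmpty());
                break;
            case DELETED:
                knownLeader = Optional.empty();
                break;
        }
    }

    // Returns the leader address resolved so far, if any.
    Optional<String> knownLeader() {
        return knownLeader;
    }
}
```

With this handling, a consumer that starts after the leader is established learns the address from the initial ADDED event, while an empty ConfigMap still resolves to "no leader" until an UPDATED event arrives.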



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
