Hi Enrique,
thanks for reaching out to the community. I'm not 100% sure what problem
you're facing. The log messages you shared could simply mean that the Flink
cluster is behaving normally: it had some outages and the HA functionality
kicked in.

The behavior you're seeing, with the leaders for the different actors (i.e.
RestServer, Dispatcher, ResourceManager) being located on different hosts,
is also fine and does not indicate that something is going wrong.
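
To illustrate, here is a minimal Python sketch, assuming each component keeps
its leader address in its own ConfigMap as your message describes. The helper
names are hypothetical and not part of Flink; the addresses are the ones from
your example.

```python
from collections import defaultdict

# Leader addresses as reported in the quoted message (one ConfigMap per component).
leader_addresses = {
    "RestServer": "jm-0.jm-statefulset.ns.svc:8081",
    "Dispatcher": "jm-1.jm-statefulset.ns.svc:8081",
    "ResourceManager": "jm-0.jm-statefulset.ns.svc:8081",
}

# The three JobManager pods from the quoted example.
known_job_managers = {
    "jm-0.jm-statefulset.ns.svc:8081",
    "jm-1.jm-statefulset.ns.svc:8081",
    "jm-2.jm-statefulset.ns.svc:8081",
}


def group_by_leader(addresses):
    """Group component names by the JobManager address that currently leads them."""
    grouped = defaultdict(list)
    for component, address in addresses.items():
        grouped[address].append(component)
    return {address: sorted(components) for address, components in grouped.items()}


def all_leaders_known(addresses, job_managers):
    """Hypothetical sanity check: every recorded leader is one of the known JMs."""
    return all(address in job_managers for address in addresses.values())


if __name__ == "__main__":
    for address, components in group_by_leader(leader_addresses).items():
        print(address, "leads", ", ".join(components))
    print("all leaders known:", all_leaders_known(leader_addresses, known_job_managers))
```

A split like this, with jm-0 leading two components and jm-1 leading the
third, only says that the components elected their leaders separately. It
does not by itself point to a fault.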

It might help to share the entire logs with us if you need assistance in
investigating your issue.

Best,
Matthias

On Thu, May 27, 2021 at 12:42 PM Enrique <enriquela...@gmail.com> wrote:

> To add to my post, instead of using the pod IP for the
> `jobmanager.rpc.address` configuration, we start each JM pod with the
> fully qualified name `--host <pod-name>.<stateful-set-name>.ns.svc:8081`,
> and this address gets persisted to the ConfigMaps. In some scenarios, the
> leader address in the ConfigMaps might differ.
>
> For example, let's assume I have 3 JMs:
>
> jm-0.jm-statefulset.ns.svc:8081 <-- Leader
> jm-1.jm-statefulset.ns.svc:8081
> jm-2.jm-statefulset.ns.svc:8081
>
> I have seen the ConfigMaps in the following state:
>
> RestServer ConfigMap Address: jm-0.jm-statefulset.ns.svc:8081
> Dispatcher ConfigMap Address: jm-1.jm-statefulset.ns.svc:8081
> ResourceManager ConfigMap Address: jm-0.jm-statefulset.ns.svc:8081
>
> Is this the correct behaviour?
>
> I have then seen the TM pods fail to connect due to:
>
> ```
> java.util.concurrent.CompletionException:
> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing token not set: Ignoring message
> RemoteFencedMessage(b870874c1c590d593178811f052a42c9, RemoteRpcInvocation(registerTaskExecutor(TaskExecutorRegistration, Time)))
> sent to akka.tcp://fl...@jm-1.jm-statefulset.ns.svc:6123/user/rpc/resourcemanager_0
> because the fencing token is null.
> ```
>
> This is explained by Till in this comment:
>
> https://issues.apache.org/jira/browse/FLINK-18367?focusedCommentId=17141070&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17141070
>
> Has anyone else seen this?
>
> Thanks!
>
> Enrique
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
