EDGAR CHERNICK created CAMEL-15903:
--------------------------------------

             Summary: Master component does not retry endpoint startup on failure
                 Key: CAMEL-15903
                 URL: https://issues.apache.org/jira/browse/CAMEL-15903
             Project: Camel
          Issue Type: Bug
            Reporter: EDGAR CHERNICK


The cluster view implementations have a listener attribute where the master 
component hooks itself to receive leadership change events. 

When an app instance becomes leader, the cluster view first marks that instance 
as leader and then fires the leadership changed event. The event invokes the 
master component's event handler, which starts the delegated consumer and 
endpoint.

The issue happens when the delegated consumer or endpoint fails to start. The 
exception thrown by them propagates up the stack; however, it does not affect 
the leadership, i.e., once the app instance becomes leader it stays leader 
even if the delegated components fail to start.
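
The failure mode described above can be reproduced with a small model. This is 
only a sketch: NaiveClusterView and its methods are hypothetical names invented 
for illustration and are not Camel's actual API. It assumes the view sets its 
leader flag before calling the listener and never fires the event again while 
the flag is set.

```java
import java.util.function.Consumer;

// Hypothetical, simplified model of a cluster view; not Camel's actual API.
class NaiveClusterView {
    private boolean leader;
    private final Consumer<Boolean> listener;

    NaiveClusterView(Consumer<Boolean> listener) {
        this.listener = listener;
    }

    // Called on every scheduled check while this instance holds leadership.
    void onLeadershipAcquired() {
        if (leader) {
            return; // no state change, so no event is fired
        }
        leader = true;         // state is updated before the listener runs
        listener.accept(true); // an exception here leaves the flag set
    }

    boolean isLeader() {
        return leader;
    }
}

class NaiveClusterViewDemo {
    // Returns how many times the listener was invoked across three checks.
    static int run() {
        int[] attempts = {0};
        NaiveClusterView view = new NaiveClusterView(isLeader -> {
            attempts[0]++;
            throw new IllegalStateException("delegated consumer failed to start");
        });
        for (int i = 0; i < 3; i++) {
            try {
                view.onLeadershipAcquired();
            } catch (RuntimeException e) {
                // The exception propagates, but the view still reports leadership.
            }
        }
        return attempts[0];
    }

    public static void main(String[] args) {
        System.out.println("start attempts: " + run());
    }
}
```

In this model the listener is invoked on the first check only, so a startup 
failure is never retried.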

Both KubernetesClusterView and FileLockClusterView have this issue.

KubernetesClusterView uses KubernetesLeadershipController to run the leadership 
check at an interval. When it acquires the leadership, it updates the ConfigMap 
with that information and calls the TimedLeaderNotifier refreshLeadership method 
to check whether the leadership has changed. The first issue is that it marks 
itself as leader before firing the leadership changed event. The second issue is 
that the event is fired in a separate thread, so when the delegated components 
fail to start, the exception "dies" together with the thread. When the next 
scheduled leadership check runs, the app instance is already the leader, so no 
leadership changed event is fired and the delegated components never start.
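
One possible direction, shown here purely as a sketch, is to catch the failure 
inside the notifier thread and clear the "already notified" state so that the 
next scheduled check fires the event again. RetryingNotifier and its members 
are hypothetical names; this is not the TimedLeaderNotifier implementation and 
not necessarily the fix Camel should adopt.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical notifier; names are illustrative and not taken from Camel.
class RetryingNotifier {
    private final AtomicBoolean notifiedAsLeader = new AtomicBoolean(false);
    private final ExecutorService executor = Executors.newSingleThreadExecutor();
    private final Runnable startDelegate;

    RetryingNotifier(Runnable startDelegate) {
        this.startDelegate = startDelegate;
    }

    // Invoked by each scheduled check while this instance holds leadership.
    CompletableFuture<Void> refreshLeadership() {
        if (!notifiedAsLeader.compareAndSet(false, true)) {
            return CompletableFuture.completedFuture(null); // already notified
        }
        return CompletableFuture.runAsync(() -> {
            try {
                startDelegate.run();
            } catch (RuntimeException e) {
                // Handle the failure in the worker thread instead of losing it,
                // then clear the flag so the next check fires the event again.
                notifiedAsLeader.set(false);
            }
        }, executor);
    }

    void shutdown() {
        executor.shutdown();
    }
}

class RetryingNotifierDemo {
    // The delegate fails twice, then starts; returns the number of attempts.
    static int run() {
        AtomicInteger attempts = new AtomicInteger();
        RetryingNotifier notifier = new RetryingNotifier(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new IllegalStateException("endpoint failed to start");
            }
        });
        try {
            for (int check = 0; check < 5; check++) {
                notifier.refreshLeadership().join(); // wait to keep the demo deterministic
            }
        } finally {
            notifier.shutdown();
        }
        return attempts.get();
    }

    public static void main(String[] args) {
        System.out.println("start attempts: " + run());
    }
}
```

Here the delegate is retried on each check until it starts, after which later 
checks are no-ops. A real fix would also need to decide whether repeated 
failures should give up the leadership itself.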

FileLockClusterView has a similar issue: it acquires the file lock before 
firing the event, and if the event processing fails it does not roll back the 
leader selection.
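
A rollback for the file lock case could look roughly like the following sketch, 
which releases the lock when the listener fails so that this or another 
instance can try again. RollbackFileLockLeader is a hypothetical helper built 
on the standard java.nio FileChannel and FileLock classes; it does not reflect 
how FileLockClusterView is actually written.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating the rollback idea; not Camel code.
class RollbackFileLockLeader {
    private final Path lockFile;
    private FileChannel channel;
    private FileLock lock;

    RollbackFileLockLeader(Path lockFile) {
        this.lockFile = lockFile;
    }

    // Returns true only if the lock is held AND the listener started cleanly.
    boolean tryBecomeLeader(Runnable onLeadershipTaken) throws IOException {
        if (lock != null) {
            return true; // already leader
        }
        channel = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        lock = channel.tryLock();
        if (lock == null) { // lock is held by another process
            release();
            return false;
        }
        try {
            onLeadershipTaken.run();
            return true;
        } catch (RuntimeException e) {
            release(); // roll back the leader selection on startup failure
            return false;
        }
    }

    void release() throws IOException {
        if (lock != null) {
            lock.release();
            lock = null;
        }
        if (channel != null) {
            channel.close();
            channel = null;
        }
    }

    boolean isLeader() {
        return lock != null;
    }
}

class RollbackFileLockLeaderDemo {
    // First attempt fails in the listener, second succeeds.
    static boolean[] run() {
        try {
            Path file = Files.createTempFile("leader", ".lock");
            RollbackFileLockLeader leader = new RollbackFileLockLeader(file);
            try {
                boolean first = leader.tryBecomeLeader(() -> {
                    throw new IllegalStateException("endpoint failed to start");
                });
                boolean second = leader.tryBecomeLeader(() -> { });
                return new boolean[] {first, second};
            } finally {
                leader.release();
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("first=" + result[0] + ", second=" + result[1]);
    }
}
```

Because the lock is released after the failed start, the second attempt can 
acquire it again and fire the listener a second time.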

Other cluster view implementations might have the same issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
