Edoardo Comar created KAFKA-16931:
-------------------------------------

             Summary: A transient REST failure to forward fenceZombie request 
leaves Connect Task in FAILED state
                 Key: KAFKA-16931
                 URL: https://issues.apache.org/jira/browse/KAFKA-16931
             Project: Kafka
          Issue Type: Bug
          Components: connect
            Reporter: Edoardo Comar


When Kafka Connect runs in exactly_once mode, a task restart will fence 
possible zombie tasks.

This is achieved by forwarding the request to the leader worker using the REST 
protocol.

At scale, in distributed mode, an HTTPS request may occasionally fail because 
of a networking glitch, a reconfiguration, etc.

Currently there is no attempt to retry the REST request, so the task is left in 
a FAILED state and requires an external restart (with the REST API).

Would this issue require a small KIP to introduce configuration entries to 
limit the number of retries, backoff times, etc.?
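
To make the question concrete, here is a minimal sketch of a bounded retry with 
exponential backoff around a forwarded request. It is not Connect's actual code: 
the class name, the method name and the parameters maxRetries, initialBackoffMs 
and maxBackoffMs are all hypothetical, and IOException stands in for whatever 
exception a transient REST failure would raise.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Kafka Connect.
class FenceRetrySketch {

    /**
     * Runs the request, retrying only on IOException (used here as a stand-in
     * for a transient REST failure). Makes at most maxRetries + 1 attempts and
     * doubles the wait between attempts, capped at maxBackoffMs.
     */
    static <T> T callWithRetries(Callable<T> request,
                                 int maxRetries,
                                 long initialBackoffMs,
                                 long maxBackoffMs) throws Exception {
        long backoffMs = initialBackoffMs;
        for (int attempt = 0; ; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    // Retries exhausted: surface the failure to the caller.
                    throw e;
                }
                Thread.sleep(backoffMs);
                backoffMs = Math.min(backoffMs * 2, maxBackoffMs);
            }
        }
    }
}
```

With something along these lines, the three parameters would map to the 
configuration entries mentioned above. Any other exception is rethrown 
immediately, so only failures classified as transient are retried.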

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
