We have stumbled upon an issue on a running cluster with multiple
source/sink connectors:

   1. One of our connectors was a JDBC sink connector connected to an SQL
   Server database (using the Oracle JDBC driver).
   2. It turns out that the DB instance had a problem causing all queries
   to be stuck forever, which in turn made the start method of the connector
   hang forever.
   3. After some time, the entire Kafka Connect cluster became unavailable and
   the REST API stopped responding, returning {"error_code":500,"message":"Request
   timed out"} for most requests.
   4. Pausing (just before the deletion of the consumer group) or deleting
   the problematic connector allowed the cluster to run normally again.

We could reproduce the same issue by adding Thread.sleep(300000) in the
start method or in the put method of the ConnectorTask.

Is there any wiki page or documentation that describes how to handle this
issue? My approach would be to throw a timeout exception after waiting for a
configured period, so that the connector fails fast instead of hanging.
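
To make the fail-fast idea concrete, here is a minimal sketch in plain Java of
what I have in mind. TimeoutGuard and callWithTimeout are hypothetical names of
my own, not part of the Kafka Connect API. The blocking call runs on a helper
thread and the caller gives up after a deadline:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a Kafka Connect class): bounds how long a
// blocking call may run before the caller gives up.
final class TimeoutGuard {

    private TimeoutGuard() {
    }

    static <T> T callWithTimeout(Callable<T> work, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        // Daemon thread, so a call that never returns cannot keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "timeout-guard");
            t.setDaemon(true);
            return t;
        });
        Future<T> future = executor.submit(work);
        try {
            // Wait at most `timeout`; throws TimeoutException if the work is still running.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Best effort only: this interrupts the worker, but a call that
            // ignores interruption will keep running in the background.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }
}
```

In a task, the body of start or put would be wrapped in callWithTimeout, and a
TimeoutException would be rethrown as a ConnectException so that the task is
marked failed instead of hanging. One caveat: a JDBC call that is stuck in
socket I/O may not react to the interrupt, so the helper thread can stay
blocked even though the caller has already failed. Setting a query or
connection timeout on the driver itself, where the driver supports it, would
complement this.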

-- 
Thanks & Regards,
Hemanth
