Ravindranath Kakarla created KAFKA-14952:
--------------------------------------------

             Summary: Publish metrics when source connector fails to poll data
                 Key: KAFKA-14952
                 URL: https://issues.apache.org/jira/browse/KAFKA-14952
             Project: Kafka
          Issue Type: Improvement
          Components: KafkaConnect
    Affects Versions: 3.3.2
            Reporter: Ravindranath Kakarla


Currently, there is no metric in Kafka Connect to track when a source connector 
fails to poll data from the source. Such a metric would let operators and 
developers visualize, monitor, and alert on poll failures.

Existing metrics like `kafka_producer_producer_metrics_record_error_total` and 
`kafka_connect_task_error_metrics_total_record_failures` cover only failures 
when producing data to the Kafka cluster, not cases where the source task fails 
with a retryable exception or a ConnectException.

Polling from the source can fail because the source system is unavailable or 
because of errors in the Connect configuration. Currently, such failures cannot 
be monitored directly through metrics; operators have to rely on log diving, 
which is inconsistent with how other failures are monitored.

I propose adding new metrics to Kafka Connect, "source-record-poll-error-total" 
and "source-record-poll-error-rate" that can be used to monitor failures during 
polling.

`source-record-poll-error-total` - The total number of times a source connector 
failed to poll data from the source. This will include both retryable and 
non-retryable exceptions.

`source-record-poll-error-rate` - The rate of the above failures per unit of 
time.

These metrics would be tracked at the connector level and could be exposed 
through JMX along with the other metrics.
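
As a rough illustration of the intended semantics of the two metrics, the 
following stand-alone sketch keeps a cumulative count and derives a per-second 
rate from it. The class `PollErrorMetrics` and its method names are 
hypothetical and use only the JDK. The rate here is simplified to "errors per 
second since start"; an actual implementation would presumably register 
sensors through Kafka's existing metrics classes and use their windowing.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the proposed metrics; not part of Kafka Connect. */
final class PollErrorMetrics {
    private final AtomicLong total = new AtomicLong();
    private final long startNanos;

    /** startNanos is the point in time from which the rate is measured. */
    PollErrorMetrics(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Counts one failed poll, whether retryable or not. */
    void recordPollError() {
        total.incrementAndGet();
    }

    /** Corresponds to source-record-poll-error-total. */
    long pollErrorTotal() {
        return total.get();
    }

    /** Corresponds to source-record-poll-error-rate, in errors per second. */
    double pollErrorRate(long nowNanos) {
        long elapsedNanos = nowNanos - startNanos;
        if (elapsedNanos <= 0) {
            return 0.0; // no time has elapsed, so no meaningful rate yet
        }
        return total.get() * 1_000_000_000.0 / elapsedNanos;
    }
}
```

Passing the clock value in explicitly keeps the sketch deterministic; a real 
metric would read the time itself.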

I am willing to submit a PR if this looks good. Sample implementation code is 
below:

 
{code:java}
// AbstractWorkerSourceTask.java

protected List<SourceRecord> poll() throws InterruptedException {
    try {
        return task.poll();
    } catch (RetriableException | org.apache.kafka.common.errors.RetriableException e) {
        log.warn("{} failed to poll records from SourceTask. Will retry operation.", this, e);

        // Proposed: count the retryable poll failure.
        sourceTaskMetricsGroup.recordPollError();

        // Do nothing. Let the framework poll whenever it's ready.
        return null;
    } catch (Throwable e) {
        // Proposed: count the non-retryable poll failure before rethrowing.
        sourceTaskMetricsGroup.recordPollError();

        throw e;
    }
}
{code}
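
To make the control flow of the proposed change easier to try out in 
isolation, here is a self-contained sketch that mirrors it with stand-in 
types. `DemoRetriableException`, `DemoSourceTask`, and `PollErrorDemo` are 
hypothetical names that replace the Connect classes, and a plain counter 
stands in for the proposed `recordPollError()`.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Stand-in for Connect's RetriableException; for illustration only. */
class DemoRetriableException extends RuntimeException {
    DemoRetriableException(String message) {
        super(message);
    }
}

/** Stand-in for SourceTask, reduced to the one method the sketch needs. */
interface DemoSourceTask {
    List<String> poll() throws InterruptedException;
}

/** Mirrors the control flow of the proposed poll() change. */
final class PollErrorDemo {
    private final DemoSourceTask task;
    private final AtomicLong pollErrors = new AtomicLong();

    PollErrorDemo(DemoSourceTask task) {
        this.task = task;
    }

    List<String> poll() throws InterruptedException {
        try {
            return task.poll();
        } catch (DemoRetriableException e) {
            // Retryable failure: count it and let the caller poll again later.
            pollErrors.incrementAndGet();
            return null;
        } catch (Throwable e) {
            // Any other failure: count it, then rethrow unchanged.
            pollErrors.incrementAndGet();
            throw e;
        }
    }

    long pollErrorTotal() {
        return pollErrors.get();
    }
}
```

In this shape the catch-all branch also counts an `InterruptedException` 
thrown by the task. Whether that should count as a poll error is a question 
the PR would need to settle.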

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
