Hi,

We are facing an issue in our application where Kafka Consumer Retries are
failing whereas a restart of the application is making the Kafka Consumers
work as expected again.

Kafka Server version is 2.5.0 - confluent 5.5.0
Kafka Client Version is 2.4.1 -
{"component":"org.apache.kafka.common.utils.AppInfoParser$AppInfo","message":"Kafka
version: 2.4.1","method":"<init>"}

Occasionally(every 24 hours), we have observed that the Kafka consumption
rate went down(NOT 0) and the following logs were observed: Generally, the
consumption rate across all consumers is 1k records/sec. When this issue
occurred, the consumption rate dropped to < 100 records/sec

org.apache.kafka.common.errors.DisconnectException: null

{"time":"2021-07-07T22:13:37,385","severity":"INFO","component":"org.apache.kafka.clients.FetchSessionHandler","message":"[Consumer
clientId=consumer-XXXXXX-3, groupId=XXXXXX] Error sending fetch request
(sessionId=405798138, epoch=5808) to node 8: {}.","method":"handleError"}

org.apache.kafka.common.errors.TimeoutException: Failed

{"time":"2021-07-07T22:26:41,379","severity":"INFO","component":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator","message":"[Consumer
clientId=consumer-XXXXXX-3, groupId=XXXXXX] Group coordinator x.x.x.x:9092
<http://100.98.40.16:9092/> (id: 2147483623 rack: null) is unavailable or
invalid, will attempt rediscovery","method":"markCoordinatorUnknown"}

{"time":"2021-07-07T22:27:10,465","severity":"INFO","component":"org.apache.kafka.clients.consumer.internals.AbstractCoordinator$FindCoordinatorResponseHandler","message":"[Consumer
clientId=consumer-XXXXXX-3, groupId=XXXXXX] Discovered group coordinator x
.x.x.x:9092 <http://100.98.40.16:9092/> (id: 2147483623 rack:
null)","method":"onSuccess"}

The consumers retried for more than an hour but the above logs are observed
again.
The consumers started pulling data after a manual restart of the
application.

No WARN or ERROR logs were observed in Kafka or Zookeeper during this
period.

Our observation from this incident is that Kafka Consumer retries could not
resolve the issue but a manual restart of the application does.

Has anyone faced this issue before? Any pointers are appreciated.

Regards,
Rahul

Reply via email to