Hi, we have some issues with a job using the flink-sql-connector-kafka (Flink 1.15.0, standalone cluster). If a broker is restarted, e.g. for maintenance (replication-factor=2), the taskmanagers executing the job constantly log errors on each checkpoint creation:
    Failed to commit consumer offsets for checkpoint 50659
    org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.RetriableCommitFailedException: Offset commit failed with a retriable exception. You should retry committing the latest consumed offsets.
    Caused by: org.apache.flink.kafka.shaded.org.apache.kafka.common.errors.CoordinatorNotAvailableException: The coordinator is not available.

AFAICT the error itself is produced by the underlying Kafka consumer. Unfortunately, we cannot reproduce the error on our test system. From my understanding this error might occur once, but follow-up checkpoints / Kafka commits should succeed again. Currently my only way of "fixing" the issue is to restart the taskmanagers.

Is there perhaps some Kafka consumer setting that would help to circumvent this?

Kind regards,
Christian

Mapp Digital Germany GmbH, Dachauer Str. 63, 80335 München. Registered with the District Court München, HRB 226181. Managing Directors: Christopher Frasier, Steve Warren.
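For context, our source is defined via SQL DDL, where any option prefixed with 'properties.' is forwarded to the underlying Kafka consumer. A minimal sketch of the kind of client-side tuning we could try (table name, topic, servers, and the chosen values are placeholders, not something we have verified fixes this):

```sql
-- Hypothetical table definition; names and values are placeholders.
-- Options prefixed with 'properties.' are passed through to the Kafka
-- consumer, so backoff/timeout tuning could be attempted here while the
-- group coordinator moves during a broker restart:
CREATE TABLE events (
  id STRING,
  payload STRING
) WITH (
  'connector' = 'kafka',
  'topic' = 'events',
  'properties.bootstrap.servers' = 'broker1:9092,broker2:9092',
  'properties.group.id' = 'my-job',
  'properties.retry.backoff.ms' = '1000',         -- assumption: value untested
  'properties.default.api.timeout.ms' = '120000', -- assumption: value untested
  'scan.startup.mode' = 'group-offsets',
  'format' = 'json'
);
```

Would tuning settings like these be expected to help, or is the stuck-commit behavior independent of the client configuration?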