Hi Alexander,

Thanks for the reply. We use AT_LEAST_ONCE delivery semantics in this 
application. Do you think this might still be related?

Best regards,
Christian


From: Alexander Fedulov <alexan...@ververica.com>
Date: Monday, 13 June 2022 at 13:06
To: "user@flink.apache.org" <user@flink.apache.org>
Cc: Christian Lorenz <christian.lor...@mapp.com>
Subject: Re: Kafka Consumer commit error

Hi Christian,

you should check if the exceptions that you see after the broker is back from 
maintenance are the same as the ones you posted here. If you are using 
EXACTLY_ONCE, it could be that the later errors are caused by Kafka purging 
transactions that Flink attempts to commit [1].
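The interplay described in [1] comes down to a timeout race between the sink and the broker. A minimal sketch of the properties involved, assuming a KafkaSink configured for EXACTLY_ONCE (property names are the stock Kafka ones; values are purely illustrative):

```properties
# Producer-side (passed to the Flink Kafka sink): how long the broker
# keeps an open transaction before aborting it. Must not exceed the
# broker's transaction.max.timeout.ms. Value is illustrative.
transaction.timeout.ms=900000

# Broker-side (server.properties): upper bound for client transaction
# timeouts, and the point after which pending transactions are aborted.
# If a maintenance window outlasts this, Flink's pre-committed
# transactions are purged and the later commit attempts fail.
transaction.max.timeout.ms=3600000
```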

Best,
Alexander Fedulov

[1] 
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/connectors/datastream/kafka/#fault-tolerance

On Mon, Jun 13, 2022 at 12:04 PM Martijn Visser 
<martijnvis...@apache.org> wrote:
Hi Christian,

I would expect that after the broker comes back up and recovers completely, 
these error messages would disappear automagically. It should not require a 
restart (only time). Flink doesn't rely on Kafka's committed offsets for 
fault tolerance.

Best regards,

Martijn

On Wed, Jun 8, 2022 at 15:49 Christian Lorenz 
<christian.lor...@mapp.com> wrote:
Hi,

we have some issues with a job using the flink-sql-connector-kafka (Flink 
1.15.0/standalone cluster). If one broker is restarted, e.g. for maintenance 
(replication-factor=2), the taskmanagers executing the job constantly log 
errors on each checkpoint creation:

Failed to commit consumer offsets for checkpoint 50659
org.apache.flink.kafka.shaded.org.apache.kafka.clients.consumer.RetriableCommitFailedException:
 Offset commit failed with a retriable exception. You should retry committing 
the latest consumed offsets.
Caused by: 
org.apache.flink.kafka.shaded.org.apache.kafka.common.errors.CoordinatorNotAvailableException:
 The coordinator is not available.

AFAICT the error itself is produced by the underlying Kafka consumer. 
Unfortunately this error cannot be reproduced on our test system.
From my understanding this error might occur once, but follow-up checkpoints / 
Kafka commits should be fine again.
Currently my only way of “fixing” the issue is to restart the taskmanagers.

Is there maybe some kafka consumer setting which would help to circumvent this?
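A sketch of consumer-side properties that could be tried, assuming the Flink 1.15 KafkaSource option key `commit.offsets.on.checkpoint` and stock Kafka client settings (values are illustrative, untested against this cluster):

```properties
# Flink KafkaSource option: offsets committed at checkpoints serve only
# monitoring/lag reporting, not fault tolerance, so committing can be
# disabled entirely to silence these errors.
commit.offsets.on.checkpoint=false

# Stock Kafka consumer settings that affect coordinator rediscovery
# after a broker restart (values illustrative):
retry.backoff.ms=1000
request.timeout.ms=60000
```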

Kind regards,
Christian
Mapp Digital Germany GmbH with registered offices at Dachauer, Str. 63, 80335 
München.
Registered with the District Court München HRB 226181
Managing Directors: Frasier, Christopher & Warren, Steve
This e-mail is from Mapp Digital and its international legal entities and may 
contain information that is confidential or proprietary.
If you are not the intended recipient, do not read, copy or distribute the 
e-mail or any attachments. Instead, please notify the sender and delete the 
e-mail and any attachments.
Please consider the environment before printing. Thank you.
