Hello,

Our Kafka Streams application is logging the following warning every few
seconds (on each of our 3 brokers, and on each of the 2 instances of the
Streams application):


[Producer clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-StreamThread-1-1_24-producer, transactionalId=event-rule-engine-1_24] Resetting sequence number of batch with current sequence 1 for partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24 to 0



Followed by:



[Producer clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-StreamThread-1-1_24-producer, transactionalId=event-rule-engine-1_24] Got error produce response with correlation id 5902 on topic-partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID

The brokers are showing errors that look related:


Error processing append operation on partition 
event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24 
(kafka.server.ReplicaManager)

org.apache.kafka.common.errors.UnknownProducerIdException: Found no record of 
producerId=72 on the broker. It is possible that the last message with the 
producerId=72 has been removed due to hitting the retention limit.



We would expect the UNKNOWN_PRODUCER_ID error to occur once: after a retry, the
record would be written to the partition and the producer ID would be known
again. However, the error keeps occurring every few seconds. This is roughly
the rate at which records arrive on the input topic partitions, so it seems to
occur for (nearly) every input record.



The JIRA issue https://issues.apache.org/jira/browse/KAFKA-7190 looks related,
except that it mentions ‘little traffic’, and I am not sure whether one message
every few seconds counts as little traffic. Matthias mentions in the issue that
a workaround seems to be to increase the topic configs `segment.bytes`,
`segment.index.bytes`, and `segment.ms` for the corresponding repartition
topics. We tried manually overriding these configs for one of the affected
topics, using the values from the linked pull request
(https://github.com/apache/kafka/pull/6511), but the errors did not disappear.
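
For reference, here is a minimal sketch of how the same three overrides could
be supplied through the Streams configuration instead of being set by hand on
a topic. It relies on the `topic.` prefix (the key that
`StreamsConfig.topicPrefix(...)` builds) that Kafka Streams passes on to the
internal topics it creates. The class name, the helper method, and the numeric
values are placeholders for illustration; the values are not taken from the
pull request. As far as I understand, the prefix only takes effect when
Streams creates an internal topic, so repartition topics that already exist
would still need a manual change.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka. It only uses java.util.Properties,
// so it runs without the Kafka client libraries on the classpath.
class RepartitionTopicOverrides {

    // Same string as StreamsConfig.TOPIC_PREFIX in Kafka Streams.
    static final String TOPIC_PREFIX = "topic.";

    // Returns a copy of 'base' with the three segment-related topic overrides added.
    static Properties withSegmentOverrides(Properties base,
                                           long segmentBytes,
                                           long segmentIndexBytes,
                                           long segmentMs) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(TOPIC_PREFIX + "segment.bytes", Long.toString(segmentBytes));
        props.setProperty(TOPIC_PREFIX + "segment.index.bytes", Long.toString(segmentIndexBytes));
        props.setProperty(TOPIC_PREFIX + "segment.ms", Long.toString(segmentMs));
        return props;
    }

    public static void main(String[] args) {
        Properties base = new Properties();
        base.setProperty("application.id", "event-rule-engine");

        // Placeholder values (50 MB, 10 MB, 10 minutes); adjust to your own needs.
        Properties props = withSegmentOverrides(
                base, 50L * 1024 * 1024, 10L * 1024 * 1024, 600_000L);

        // The resulting Properties would be handed to the KafkaStreams constructor.
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Whether the overrides actually reached a topic can then be checked by
describing the topic configuration on the broker.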



Could anyone help us to figure out what is happening here, and why the proposed 
fix for the above JIRA issue is not working in this case?



Best,



Pieter
