Hello,

Our Kafka Streams applications are showing the following warning every few seconds (on each of our 3 brokers, and on each of the 2 instances of the Streams application):

[Producer clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-StreamThread-1-1_24-producer, transactionalId=event-rule-engine-1_24] Resetting sequence number of batch with current sequence 1 for partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24 to 0

Followed by:

[Producer clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-StreamThread-1-1_24-producer, transactionalId=event-rule-engine-1_24] Got error produce response with correlation id 5902 on topic-partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID

The brokers are showing errors that look related:

Error processing append operation on partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.UnknownProducerIdException: Found no record of producerId=72 on the broker. It is possible that the last message with the producerId=72 has been removed due to hitting the retention limit.

We would expect the UNKNOWN_PRODUCER_ID error to occur once: after a retry, the record would be published to the partition and the producer ID would be known to the broker. However, the error keeps occurring every few seconds. This is roughly the rate at which records are produced to the input topic partitions, so it seems to occur for (nearly) every input record.

The following JIRA issue looks related: https://issues.apache.org/jira/browse/KAFKA-7190. However, that issue mentions ‘little traffic’, and I am not sure whether one message every few seconds counts as little traffic. Matthias mentions in the issue that a workaround seems to be to increase the topic configs `segment.bytes`, `segment.index.bytes`, and `segment.ms` for the corresponding repartition topics.
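
For reference, here is a rough sketch of how such overrides could be supplied through the Streams configuration instead of per topic. It rests on several assumptions: the class and the helper `withSegmentOverrides` are hypothetical names, the values passed in are placeholders rather than the ones from the pull request, and it relies on the `topic.` prefix that Kafka Streams uses for default configs of the internal topics it creates. As far as I understand, these defaults are only applied when an internal topic is created, so already existing repartition topics would not pick them up.

```java
import java.util.Properties;

// Hypothetical helper, not part of Kafka: builds Streams properties that carry
// topic-level overrides for internal (repartition/changelog) topics.
final class RepartitionTopicOverrides {

    // Same value as StreamsConfig.TOPIC_PREFIX; spelled out here so the
    // sketch compiles without the Kafka Streams dependency.
    static final String TOPIC_PREFIX = "topic.";

    private RepartitionTopicOverrides() {
    }

    // Returns a copy of 'base' with the three segment settings added.
    // The caller chooses the values; none are recommended here.
    static Properties withSegmentOverrides(Properties base,
                                           long segmentBytes,
                                           long segmentIndexBytes,
                                           long segmentMs) {
        Properties props = new Properties();
        props.putAll(base);
        props.setProperty(TOPIC_PREFIX + "segment.bytes", Long.toString(segmentBytes));
        props.setProperty(TOPIC_PREFIX + "segment.index.bytes", Long.toString(segmentIndexBytes));
        props.setProperty(TOPIC_PREFIX + "segment.ms", Long.toString(segmentMs));
        return props;
    }
}
```

The resulting `Properties` object would then be passed to the `KafkaStreams` constructor together with the usual application settings.
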
We’ve tried manually overriding these configs for a relevant topic, using the values from the linked pull request (https://github.com/apache/kafka/pull/6511), but this did not make the errors disappear.

Could anyone help us figure out what is happening here, and why the proposed fix for the above JIRA issue is not working in this case?

Best,
Pieter