Hi Guozhang,

An additional finding: it seems to happen only on Kafka Streams repartition
topics. We haven't seen this happen for any other topics so far.
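One way to check this observation systematically is to count the UNKNOWN_PRODUCER_ID retries per topic-partition in the Streams application logs. Below is a minimal sketch. It assumes the producer writes one warning per line in the format quoted in the original mail. The function name and the log path in the usage line are hypothetical.

```shell
#!/bin/sh
# count_unknown_pid: hypothetical helper. Reads Streams application log lines
# on stdin and prints "<topic-partition> <count>" for every partition that got
# an UNKNOWN_PRODUCER_ID produce error, most frequent first.
# Assumes each error is logged on one line in the form
#   "... on topic-partition <topic>-<partition>, retrying (...). Error: UNKNOWN_PRODUCER_ID"
count_unknown_pid() {
  grep 'Error: UNKNOWN_PRODUCER_ID' \
    | sed -n 's/.* on topic-partition \([^,]*\),.*/\1/p' \
    | sort | uniq -c | sort -rn \
    | awk '{ print $2, $1 }'
}
```

Usage: `count_unknown_pid < /path/to/streams-app.log`. If every name printed ends in `-repartition-<n>`, that supports the finding above.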

Best,

Pieter

-----Original message-----
From: Pieter Hameete <pieter.hame...@blockbax.com>
Sent: Wednesday, 5 June 2019 11:23
To: users@kafka.apache.org
Subject: RE: Repeating UNKNOWN_PRODUCER_ID errors for Kafka streams applications

Hi Guozhang,

Thanks for your reply! I noticed my original mail went out twice by accident;
sorry about that.

We currently have only a small number of distinct keys, so indeed not all
partitions are 'actively used'. The strange thing, though, is that the errors
occur for partitions that actively receive records every few seconds. I checked
this by consuming the specific partitions with kafkacat. I also noticed that for
each received record the partition offset was 2 higher than that of the previous
record, instead of the expected 1. Could that be due to the producers retrying
(see the warning logs in my original mail)?
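The offset pattern can be quantified by listing the distances between consecutive offsets of one partition. Below is a minimal sketch. The helper name and the broker address `localhost:9092` in the usage line are assumptions. One detail that may be relevant: the logs show a `transactionalId`, so the producers are transactional. With a transactional producer, every committed transaction also writes a commit marker into the partition. That marker occupies an offset but is never returned to consumers. If each transaction contains a single record, the result would be a gap of 2 even without any retries.

```shell
#!/bin/sh
# offset_gaps: hypothetical helper. Reads one offset per line on stdin and
# prints "<gap> <count>": how often each distance between consecutive offsets occurs.
offset_gaps() {
  awk 'NR > 1 { print $1 - prev } { prev = $1 }' \
    | sort -n | uniq -c \
    | awk '{ print $2, $1 }'
}
```

Usage, printing only the offset of each message from the start of partition 24 and exiting at the end of the partition:
`kafkacat -C -b localhost:9092 -t event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition -p 24 -o beginning -e -f '%o\n' | offset_gaps`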

I overrode the configs of the repartition topics as follows, running the
commands on one of the brokers.

The values are taken from your KIP-443:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-443%3A+Return+to+default+segment.ms+and+segment.index.bytes+in+Streams+repartition+topics

kafka-topics --zookeeper localhost:2181 --alter --topic event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition --config segment.index.bytes=10485760

kafka-topics --zookeeper localhost:2181 --alter --topic event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition --config segment.bytes=52428800

kafka-topics --zookeeper localhost:2181 --alter --topic event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition --config segment.ms=604800000

kafka-topics --zookeeper localhost:2181 --alter --topic event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition --config retention.ms=-1

Verifying afterwards:

kafka-topics --zookeeper localhost:2181 --describe --topic event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition

Topic:event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition PartitionCount:32 ReplicationFactor:3 Configs:segment.bytes=52428800,retention.ms=-1,segment.index.bytes=10485760,segment.ms=604800000,cleanup.policy=delete
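The commands above override a single repartition topic. To test the workaround on every repartition topic of the application, one option is `kafka-configs --alter --add-config`, which accepts several comma-separated overrides in one call. Below is a minimal dry-run sketch. It assumes ZooKeeper at `localhost:2181` and that all repartition topics of this application start with `event-rule-engine-` and end with `-repartition`. The helper name is hypothetical. The helper only prints the commands, so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one kafka-configs command per repartition
# topic of the "event-rule-engine" application, using the KIP-443 values above.
ZK="localhost:2181"   # assumption: same ZooKeeper address as the commands above
OVERRIDES="segment.bytes=52428800,segment.index.bytes=10485760,segment.ms=604800000,retention.ms=-1"

repartition_override_cmds() {
  # stdin: one topic name per line, e.g. the output of `kafka-topics --list`
  grep -E '^event-rule-engine-.*-repartition$' | while read -r topic; do
    echo "kafka-configs --zookeeper $ZK --alter --entity-type topics --entity-name $topic --add-config $OVERRIDES"
  done
}
```

Usage: `kafka-topics --zookeeper localhost:2181 --list | repartition_override_cmds`. After reviewing the output, append `| sh` to actually run the commands.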

Is there anything that seems off to you? Or is there something else I can
investigate further? We'd really like to nail this issue down, especially
because the cause seems different from the 'low traffic' cause in JIRA issue
KAFKA-7190: the partitions for which the errors are thrown are receiving data.

Best,

Pieter

-----Original message-----
From: Guozhang Wang <wangg...@gmail.com>
Sent: Wednesday, 5 June 2019 02:23
To: users@kafka.apache.org
Subject: Re: Repeating UNKNOWN_PRODUCER_ID errors for Kafka streams applications

Hello Pieter,

If you only have one record every few seconds, that may be too little traffic
given that you have at least 25 partitions (I saw you have an
xxx-repartition-24 partition). This means a single partition may not see any
records for a long time, and in that case you may need to override these
configs to very large values. Alternatively, reducing num.partitions may also
help by increasing the traffic per partition.

Also, could you show me how you overrode the configs in the repartition
topics?


Guozhang

On Tue, Jun 4, 2019 at 2:10 AM Pieter Hameete <pieter.hame...@blockbax.com> wrote:

> Hello,
>
> Our Kafka Streams applications are showing the following warning every
> few seconds (on each of our 3 brokers, and on each of the 2 instances
> of the Streams application):
>
> [Producer clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-StreamThread-1-1_24-producer, transactionalId=event-rule-engine-1_24] Resetting sequence number of batch with current sequence 1 for partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24 to 0
>
> Followed by:
>
> [Producer clientId=event-rule-engine-dd71ae9b-523c-425d-a7c0-c62993315b30-StreamThread-1-1_24-producer, transactionalId=event-rule-engine-1_24] Got error produce response with correlation id 5902 on topic-partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
>
> The brokers are showing errors that look related:
>
> Error processing append operation on partition event-rule-engine-KSTREAM-REDUCE-STATE-STORE-0000000015-repartition-24 (kafka.server.ReplicaManager)
>
> org.apache.kafka.common.errors.UnknownProducerIdException: Found no record of producerId=72 on the broker. It is possible that the last message with the producerId=72 has been removed due to hitting the retention limit.
>
> We would expect the UNKNOWN_PRODUCER_ID error to occur once: after a
> retry, the record would be published to the partition and the producer
> ID would be known. However, this error keeps occurring every few
> seconds, roughly the rate at which records are produced to the input
> topic partitions, so it seems to occur for (nearly) every input record.
>
> The following JIRA issue, https://issues.apache.org/jira/browse/KAFKA-7190,
> looks related, except that it mentions ‘little traffic’, and I am not
> sure whether one message every few seconds counts as little traffic.
> Matthias mentions in the issue that a workaround seems to be to
> increase the topic configs `segment.bytes`, `segment.index.bytes`, and
> `segment.ms` for the corresponding repartition topics. We’ve tried
> manually overriding these configs for a relevant topic to the values
> in the linked pull request (https://github.com/apache/kafka/pull/6511),
> but the errors did not disappear.
>
> Could anyone help us figure out what is happening here, and why the
> proposed fix for the above JIRA issue is not working in this case?
>
> Best,
>
> Pieter

--
-- Guozhang
