zwangbo created KAFKA-7848:
------------------------------

             Summary: Idempotence producer keep retry on 
OutOfOrderSequenceException
                 Key: KAFKA-7848
                 URL: https://issues.apache.org/jira/browse/KAFKA-7848
             Project: Kafka
          Issue Type: Bug
          Components: clients, core
    Affects Versions: 1.1.0
         Environment: CentOS Linux release 7.2.1511 (Core)
            Reporter: zwangbo


We increase our cluster capacity from 50 brokers to 80 brokers.

We do a broker partition reassign while producers is sending message. After 
finished we found a small number of producer in a infinite retry on 
OutOfOrderSequenceException. It's recover when we restart problem producer(ask 
for a new PID).

 

We found problem partition error log in broker server.log like:

ERROR [ReplicaManager broker=79] Error processing append operation on partition 
xxx1-36 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order 
sequence number for producerId 152125: 133262 (incoming seq. number), 133374 
(current end sequence number)

ERROR [ReplicaManager broker=79] Error processing append operation on partition 
xxx-76 (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order 
sequence number for producerId 140981: 834530 (incoming seq. number), 834543 
(current end sequence number)

 

Strange things is the incoming seq. number is smaller than borker current end 
sequence number. Before this exception problem partition has do a leader 
election.

[17:08:20,706] INFO [Partition xxx-76 broker=79] xxx-76 starts at Leader Epoch 
2 from offset 217709710. Previous Leader Epoch was: 1 (kafka.cluster.Partition)

[17:08:20,715] INFO [Partition xxx-76 broker=79] xxx-76 starts at Leader Epoch 
6 from offset 217709710. Previous Leader Epoch was: 2 (kafka.cluster.Partition)

And in producer side, it has NETWORK_EXCEPTION before into 
OutOfOrderSequenceException. So we think maybe some message send success to 
broker, but not response to producer. After partition leader change producer 
retry those old message always reject by broker because of the 
OutOfOrderSequenceException.

 

Our primary producer config:

enable.idempotence = true

retries = Integer.MAX_VALUE

acks = all

max.in.flight.requests.per.connection = 5

compression.type = lz4

metadata.max.age.ms = 300000

 

Topic config:

min.insync.replicas = 2

4 replicas each partition

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to