Hi all,

We have been seeing this issue intermittently, so it is difficult to give
step-by-step instructions to reproduce it. I have been studying the code of
Sender.java (org.apache.kafka.clients.producer.internals.Sender), but haven't
been able to find a possible bug.

Our setup is a 3-node Kafka cluster.

Here are some relevant logs:

2018-03-28 09:50:54,290 ERROR [kafka-producer-network-thread | producer-1]
o.a.k.c.producer.internals.Sender:301 - [Producer clientId=producer-1] The
broker returned org.apache.kafka.common.errors.UnknownProducerIdException:
This exception is raised by the broker if it could not locate the producer
metadata associated with the producerId in question. This could happen if,
for instance, the producer's records were deleted because their retention
time had elapsed. Once the last records of the producerId are removed, the
producer's metadata is removed from the broker, and future appends by the
producer will return this exception. for topic-partition pipeline-0 at
offset -1. This indicates data loss on the broker, and should be
investigated.

2018-03-28 09:51:13,394 WARN [kafka-producer-network-thread | producer-1]
o.a.k.c.producer.internals.Sender:251 - [Producer clientId=producer-1] Got
error produce response with correlation id 1000 on topic-partition
pipeline-3, retrying (2147483459 attempts left). Error:
OUT_OF_ORDER_SEQUENCE_NUMBER

2018-03-28 10:48:33,365 WARN [kafka-producer-network-thread | producer-1]
o.a.k.c.producer.internals.Sender:251 - [Producer clientId=producer-1] Got
error produce response with correlation id 34893 on topic-partition
pipeline-3, retrying (2147449585 attempts left). Error:
OUT_OF_ORDER_SEQUENCE_NUMBER

[2018-03-28 09:50:54,421] ERROR [ReplicaManager broker=1001] Error
processing append operation on partition pipeline-3
(kafka.server.ReplicaManager)
org.apache.kafka.common.errors.OutOfOrderSequenceException: Out of order
sequence number for producerId 5102: 2 (incoming seq. number), 7 (current
end sequence number)
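
As a rough sanity check on the two WARN lines above, the "attempts left"
counters can be turned into a retry rate. This is only a sketch: it assumes
the counter starts at Integer.MAX_VALUE (the retries value we see with
idempotence enabled) and that both lines belong to the same stuck batch,
neither of which the logs confirm. Under those assumptions, 188 attempts had
been used at 09:51:13 and 34062 at 10:48:33, which is a little under 10
retries per second and would be roughly consistent with the default
retry.backoff.ms of 100 ms. The class and method names are made up for
illustration.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.Locale;

public class RetryCounterCheck {

    // Attempts consumed so far, ASSUMING the counter started at Integer.MAX_VALUE.
    public static long attemptsUsed(long attemptsLeft) {
        return Integer.MAX_VALUE - attemptsLeft;
    }

    // Seconds between two HH:mm:ss timestamps on the same day.
    public static long elapsedSeconds(String from, String to) {
        return Duration.between(LocalTime.parse(from), LocalTime.parse(to)).getSeconds();
    }

    public static void main(String[] args) {
        long usedFirst = attemptsUsed(2147483459L);   // WARN logged at 09:51:13
        long usedSecond = attemptsUsed(2147449585L);  // WARN logged at 10:48:33
        long seconds = elapsedSeconds("09:51:13", "10:48:33");

        // Average retries per second between the two log lines.
        double perSecond = (double) (usedSecond - usedFirst) / seconds;

        System.out.println("attempts used at first WARN:  " + usedFirst);
        System.out.println("attempts used at second WARN: " + usedSecond);
        System.out.println(String.format(Locale.ROOT,
                "retries per second over %d s: %.2f", seconds, perSecond));
    }
}
```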


1. We have an Admin API that deletes and recreates topics (and loads them).
When we delete a topic, a new producerId gets created, but the same producer
instance is still used to write messages. (This might be a problem, but we
don't know for sure.)

2. We don't always get stuck in these INT_MAX retries (retries is INT_MAX
because we have enabled idempotence). Many times it stops after 30 seconds,
as expected, and sets a new producerId. (But sometimes that timeout exception
doesn't get triggered.)

2018-03-29 10:16:54,826 INFO [kafka-producer-network-thread | producer-1]
o.a.k.c.p.i.TransactionManager:346 - [Producer clientId=producer-1]
ProducerId set to -1 with epoch -1
2018-03-29 10:16:54,827 INFO [kafka-producer-network-thread | producer-1]
o.a.k.c.p.i.TransactionManager:346 - [Producer clientId=producer-1]
ProducerId set to 9002 with epoch 0
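
For context, the producer settings that matter for the behaviour described in
the two points above look roughly like the sketch below. The values are the
ones I believe apply when idempotence is enabled, not a copy of our actual
configuration; in particular, I am assuming the "30 seconds" is the default
request.timeout.ms. The class name and the broker address are placeholders,
and serializer settings are left out for brevity.

```java
import java.util.Properties;

public class IdempotentProducerConfig {

    // Builds the subset of producer properties relevant to idempotent retries.
    public static Properties build(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Idempotence makes the broker track a sequence number per producerId.
        props.setProperty("enable.idempotence", "true");
        // Idempotence requires acks=all.
        props.setProperty("acks", "all");
        // With idempotence enabled, retries defaults to Integer.MAX_VALUE.
        props.setProperty("retries", String.valueOf(Integer.MAX_VALUE));
        // Idempotence allows at most 5 in-flight requests per connection.
        props.setProperty("max.in.flight.requests.per.connection", "5");
        // ASSUMPTION: this is the 30-second timeout referred to above.
        props.setProperty("request.timeout.ms", "30000");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address, not our cluster.
        build("localhost:9092").forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```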

---
We are looking to eliminate this non-deterministic behaviour by handling
OUT_OF_ORDER_SEQUENCE_NUMBER in a better way (maybe by re-instantiating the
producer, though we are not sure that would solve anything, as Kafka has ways
to reset the producerId after a timeout).
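
To make the "re-instantiate the producer" idea concrete, here is a minimal
sketch of what I have in mind. It is deliberately generic and has no Kafka
dependency: RecreatingHolder, its factory and its isFatal predicate are all
hypothetical names. In our case the factory would build a new KafkaProducer
and the predicate would check for
org.apache.kafka.common.errors.OutOfOrderSequenceException. Whether this
actually fixes the stuck retries is exactly what we are unsure about, and
closing a producer from inside its own send callback may need extra care.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

public class RecreatingHolder<T extends AutoCloseable> {
    private final Supplier<T> factory;          // builds a fresh instance
    private final Predicate<Throwable> isFatal; // decides which errors trigger a rebuild
    private T current;
    private int generation = 0;                 // number of rebuilds so far

    public RecreatingHolder(Supplier<T> factory, Predicate<Throwable> isFatal) {
        this.factory = factory;
        this.isFatal = isFatal;
        this.current = factory.get();
    }

    public synchronized T get() {
        return current;
    }

    public synchronized int generation() {
        return generation;
    }

    // Intended to be called with the error from a send callback.
    // Returns true if the instance was closed and replaced.
    public synchronized boolean onError(Throwable error) {
        if (error == null || !isFatal.test(error)) {
            return false; // leave non-fatal errors to the normal retry path
        }
        try {
            current.close();
        } catch (Exception e) {
            // Best-effort close; the old instance is discarded either way.
        }
        current = factory.get();
        generation++;
        return true;
    }

    public static void main(String[] args) {
        // Stand-in resource and error type, used here instead of a real producer.
        RecreatingHolder<AutoCloseable> holder = new RecreatingHolder<AutoCloseable>(
                () -> () -> { },
                e -> e instanceof IllegalStateException);

        holder.onError(new RuntimeException("treated as non-fatal"));
        System.out.println("generation after non-fatal error: " + holder.generation());

        holder.onError(new IllegalStateException("treated as fatal"));
        System.out.println("generation after fatal error: " + holder.generation());
    }
}
```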

Any ideas or comments on why this happens despite the default timeout of
30 seconds?

Please let me know if you need more information to understand the problem
we are facing.

Regards,
Saheb