Hi Mickael,

This looks to be the same as KAFKA-4669. In theory, this should never
happen and it's unclear when/how it can happen. Not sure if someone has
investigated it in more detail.

Ismael

On Mon, Mar 6, 2017 at 5:15 PM, Mickael Maison <mickael.mai...@gmail.com>
wrote:

> Hi,
>
> In one of our clusters, some of our clients occasionally see this
> exception:
> java.lang.IllegalStateException: Correlation id for response (4564)
> does not match request (4562)
> at org.apache.kafka.clients.NetworkClient.correlate(
> NetworkClient.java:486)
> at org.apache.kafka.clients.NetworkClient.parseResponse(
> NetworkClient.java:381)
> at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
> NetworkClient.java:449)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
> at java.lang.Thread.run(Unknown Source)
>
> We've also seen it from consumer poll() and commit()
>
> Usually the response's correlation id is off by just 1 or 2 (like
> above) but we've also seen it off by a few hundreds:
> java.lang.IllegalStateException: Correlation id for response (742)
> does not match request (174)
>     at org.apache.kafka.clients.NetworkClient.correlate(
> NetworkClient.java:486)
>     at org.apache.kafka.clients.NetworkClient.parseResponse(
> NetworkClient.java:381)
>     at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(
> NetworkClient.java:449)
>     at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269)
>     at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.
> clientPoll(ConsumerNetworkClient.java:360)
>     at org.apache.kafka.clients.consumer.internals.
> ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
>     at org.apache.kafka.clients.consumer.internals.
> ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
>     at org.apache.kafka.clients.consumer.internals.
> ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
>     at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.
> commitOffsetsSync(ConsumerCoordinator.java:426)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.
> commitSync(KafkaConsumer.java:1059)
>     at org.apache.kafka.clients.consumer.KafkaConsumer.
> commitSync(KafkaConsumer.java:1027)
>
> When this happens, all subsequent responses are also shifted:
> java.lang.IllegalStateException: Correlation id for response (743)
> does not match request (742)
> java.lang.IllegalStateException: Correlation id for response (744)
> does not match request (743)
> java.lang.IllegalStateException: Correlation id for response (745)
> does not match request (744)
> java.lang.IllegalStateException: Correlation id for response (746)
> does not match request (745)
>  ...
> It's easy to discard and recreate the consumer instance to recover
> however we can't do that with the producer as it occurs in the Sender
> thread.
>
> Our cluster and our clients are running Kafka 0.10.0.1.
> Under which circumstances would such an error happen ?
> Even with logging set to TRACE, we can't spot anything suspicious
> shortly before the issue. Is there any data we should try to capture
> when this happens ?
>
> Thanks!
>

Reply via email to