It was in our test environment, and nothing was running when the incident
occurred.
In the server log we see many entries like:
[2017-07-11 11:52:15,330] WARN Attempting to send response via channel for
which there is no open connection, connection id 0 (kafka.network.Processor)
But the timestamps don't match the client-side loop, so I don't know
whether it's correlated or not.
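One way to check the correlation is to bucket the client's "Marking the coordinator ... dead" lines per minute and compare the buckets against the broker WARN timestamps. A minimal sketch, with made-up sample lines standing in for the real application log (the `/tmp/app.log` path is just for illustration):

```shell
# Count "Marking the coordinator ... dead" events per minute.
# The sample input below stands in for the real application log.
cat > /tmp/app.log <<'EOF'
11.07.2017 06:47:08,905 INFO [...AbstractCoordinator:631] Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for group ABC
11.07.2017 06:47:09,274 INFO [...AbstractCoordinator:631] Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for group ABC
11.07.2017 06:48:10,820 INFO [...AbstractCoordinator:631] Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for group ABC
EOF
# Keep only the date + hour:minute (first 16 chars of the timestamp),
# then count occurrences per minute.
grep 'Marking the coordinator' /tmp/app.log | cut -d',' -f1 | cut -c1-16 | sort | uniq -c
```

If the per-minute spikes line up with the broker WARNs, that would point at a shared network or broker event rather than a pure client-side issue.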

On Tue, Jul 11, 2017 at 1:08 PM, John Yost <hokiege...@gmail.com> wrote:

> Hi Pierre,
>
> Do your brokers remain responsive? In other words, do you see any other
> symptoms such as decreased write or read throughput which may indicate long
> GC pauses or possibly heavy load on your zookeeper cluster as evidenced by
> any SocketTimeoutExceptions on the Kafka and/or Zookeeper sides?
>
> --John
>
> On Tue, Jul 11, 2017 at 6:15 AM, Pierre Coquentin <
> pierre.coquen...@gmail.com> wrote:
>
> > Hi,
> >
> > We are using Kafka 0.10.2 with 2 brokers and 2 application nodes, each
> > running 6 consumers (all in one group). Recently we experienced
> > simultaneous disconnections of both nodes, followed by infinite retries
> > to connect to the coordinator. For now, restarting the nodes solves the
> > problem, but it recurs a few hours later.
> > In the application log we see a lot of:
> > 11.07.2017 06:47:08,905 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:631]
> > Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for
> > group ABC
> > 11.07.2017 06:47:09,007 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:586]
> > Discovered coordinator kafka-2:9092 (id: 2147483646 rack: null) for group
> > ABC.
> > 11.07.2017 06:47:09,008 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:420]
> > (Re-)joining group ABC
> > 11.07.2017 06:47:09,274 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:631]
> > Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for
> > group ABC
> > 11.07.2017 06:47:09,375 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:586]
> > Discovered coordinator kafka-2:9092 (id: 2147483646 rack: null) for group
> > ABC.
> > 11.07.2017 06:47:09,375 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:420]
> > (Re-)joining group ABC
> > 11.07.2017 06:47:10,820 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:631]
> > Marking the coordinator kafka-2:9092 (id: 2147483646 rack: null) dead for
> > group ABC
> > 11.07.2017 06:47:10,921 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:586]
> > Discovered coordinator kafka-2:9092 (id: 2147483646 rack: null) for group
> > ABC.
> > 11.07.2017 06:47:10,922 INFO
> > [org.apache.kafka.clients.consumer.internals.AbstractCoordinator:420]
> > (Re-)joining group ABC
> >
> > There is nothing in the broker logs.
> > We have no problem contacting the coordinator from either node. Could
> > periodic network instability be leading to these infinite retries?
> > Could this problem be related to
> > https://issues.apache.org/jira/browse/KAFKA-5464 ?
> >
> > Here is the Streams configuration (most options are defaults):
> >         application.id = ABC
> >         application.server =
> >         bootstrap.servers = [kafka-1:9092, kafka-2:9092]
> >         buffered.records.per.partition = 1000
> >         cache.max.bytes.buffering = 10485760
> >         client.id =
> >         commit.interval.ms = 30000
> >         connections.max.idle.ms = 540000
> >         key.serde = class
> > org.apache.kafka.common.serialization.Serdes$StringSerde
> >         metadata.max.age.ms = 300000
> >         num.standby.replicas = 0
> >         num.stream.threads = 6
> >         partition.grouper = class
> > org.apache.kafka.streams.processor.DefaultPartitionGrouper
> >         poll.ms = 100
> >         receive.buffer.bytes = 32768
> >         reconnect.backoff.ms = 50
> >         replication.factor = 1
> >         request.timeout.ms = 40000
> >         retry.backoff.ms = 100
> >         rocksdb.config.setter = null
> >         security.protocol = PLAINTEXT
> >         send.buffer.bytes = 131072
> >         state.cleanup.delay.ms = 60000
> >         state.dir = null
> >         timestamp.extractor = class
> > org.apache.kafka.streams.processor.FailOnInvalidTimestamp
> >         value.serde = class com.sigfox.kafka.serde.AvroStreamRecordSerde
> >         windowstore.changelog.additional.retention.ms = 86400000
> >         zookeeper.connect =
> > zookeeper-1:2181,zookeeper-2:2181,zookeeper-3:2181
> >
> >
> > Any thoughts?
> > Regards,
> >
> > Pierre
> >
>
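The mark-dead / rediscover / rejoin loop quoted above usually means the coordinator still answers the group-discovery request, but the subsequent heartbeat or join request to it fails, so loosening the client-side timeouts sometimes breaks the loop. A minimal sketch of such overrides, using plain property keys so it stands alone (the values are illustrative guesses, not tested recommendations; if I remember right, Streams forwards unrecognized keys through to the embedded consumer):

```java
import java.util.Properties;

public class CoordinatorTimeoutOverrides {

    // Hypothetical consumer-level overrides to give the coordinator more
    // headroom before the client declares it dead and rejoins.
    public static Properties overrides() {
        Properties p = new Properties();
        p.setProperty("session.timeout.ms", "30000");    // tolerate longer pauses before a rebalance
        p.setProperty("heartbeat.interval.ms", "10000"); // keep well under session.timeout.ms
        p.setProperty("request.timeout.ms", "60000");    // allow slow coordinator responses
        return p;
    }

    public static void main(String[] args) {
        // Print the overrides so they can be eyeballed against the
        // effective config the Streams app logs at startup.
        overrides().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

These would be merged into the same Properties object the StreamsConfig is built from; the startup config dump (like the one quoted above) is the place to confirm they actually took effect.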
