How do I check for GC pausing?
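One way (a sketch, not something prescribed in this thread): the JVM exposes cumulative GC counts and accumulated collection time through GarbageCollectorMXBean, so sampling these periodically from inside the consumer process will show whether long stop-the-world pauses line up with the rebalances. Starting the JVM with -verbose:gc (or -Xloggc:<file> on HotSpot) gives the same information in a log file.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Prints cumulative collection counts and accumulated pause time for each
// garbage collector in the current JVM. Sample this periodically (or diff
// two samples) to spot long GC pauses that could stall the poll loop.
public class GcPauseCheck {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d, totalCollectionTimeMs=%d%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

A large jump in totalCollectionTimeMs between two samples taken just before a rebalance would point at GC as the cause of the missed heartbeats.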

On Wed, Aug 22, 2018 at 4:12 PM Steve Tian <steve.cs.t...@gmail.com> wrote:

> Did you observe any GC pausing?
>
> On Wed, Aug 22, 2018, 6:38 PM Shantanu Deshmukh <shantanu...@gmail.com>
> wrote:
>
> > Hi Steve,
> >
> > The application is just sending mails. Every record is just an email
> > request with very basic business logic. Generally it doesn't take more
> > than 200 ms to process a single mail. Currently it is averaging 70-80 ms.
> >
> > On Wed, Aug 22, 2018 at 3:06 PM Steve Tian <steve.cs.t...@gmail.com>
> > wrote:
> >
> > > How long did it take to process 50 `ConsumerRecord`s?
> > >
> > > On Wed, Aug 22, 2018, 5:16 PM Shantanu Deshmukh <shantanu...@gmail.com>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We have Kafka 0.10.0.1 running on a 3-broker cluster. We have an
> > > > application which consumes from a topic having 10 partitions. 10
> > > > consumers are spawned from this process; they all belong to one
> > > > consumer group.
> > > >
> > > > What we have observed is that we very frequently see messages like
> > > > these in the consumer logs:
> > > >
> > > > [2018-08-21 11:12:46] :: WARN  :: ConsumerCoordinator:554 - Auto offset commit failed for group otp-email-consumer: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> > > > [2018-08-21 11:12:46] :: INFO  :: ConsumerCoordinator:333 - Revoking previously assigned partitions [otp-email-1, otp-email-0, otp-email-3, otp-email-2] for group otp-email-consumer
> > > > [2018-08-21 11:12:46] :: INFO  :: AbstractCoordinator:381 - (Re-)joining group otp-email-consumer
> > > > [2018-08-21 11:12:46] :: INFO  :: AbstractCoordinator:600 - *Marking the coordinator x.x.x.x:9092 (id: 2147483646 rack: null) dead for group otp-email-consumer*
> > > > [2018-08-21 11:12:46] :: INFO  :: AbstractCoordinator:600 - *Marking the coordinator x.x.x.x:9092 (id: 2147483646 rack: null) dead for group otp-email-consumer*
> > > > [2018-08-21 11:12:46] :: INFO  :: AbstractCoordinator$GroupCoordinatorResponseHandler:555 - Discovered coordinator 10.189.179.117:9092 (id: 2147483646 rack: null) for group otp-email-consumer.
> > > > [2018-08-21 11:12:46] :: INFO  :: AbstractCoordinator:381 - (Re-)joining group otp-email-consumer
> > > >
> > > > After this, the group enters a rebalancing phase, and it takes about
> > > > 5-10 minutes before it starts consuming messages again.
> > > > What does this message mean? According to our monitoring tools, the
> > > > broker never actually goes down, so how can it be declared dead?
> > > > Please help; I have been stuck on this issue for 2 months now.
> > > >
> > > > Here's our consumer configuration:
> > > > auto.commit.interval.ms = 3000
> > > > auto.offset.reset = latest
> > > > bootstrap.servers = [x.x.x.x:9092, x.x.x.x:9092, x.x.x.x:9092]
> > > > check.crcs = true
> > > > client.id =
> > > > connections.max.idle.ms = 540000
> > > > enable.auto.commit = true
> > > > exclude.internal.topics = true
> > > > fetch.max.bytes = 52428800
> > > > fetch.max.wait.ms = 500
> > > > fetch.min.bytes = 1
> > > > group.id = otp-notifications-consumer
> > > > heartbeat.interval.ms = 3000
> > > > interceptor.classes = null
> > > > key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
> > > > max.partition.fetch.bytes = 1048576
> > > > max.poll.interval.ms = 300000
> > > > max.poll.records = 50
> > > > metadata.max.age.ms = 300000
> > > > metric.reporters = []
> > > > metrics.num.samples = 2
> > > > metrics.sample.window.ms = 30000
> > > > partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
> > > > receive.buffer.bytes = 65536
> > > > reconnect.backoff.ms = 50
> > > > request.timeout.ms = 305000
> > > > retry.backoff.ms = 100
> > > > sasl.kerberos.kinit.cmd = /usr/bin/kinit
> > > > sasl.kerberos.min.time.before.relogin = 60000
> > > > sasl.kerberos.service.name = null
> > > > sasl.kerberos.ticket.renew.jitter = 0.05
> > > > sasl.kerberos.ticket.renew.window.factor = 0.8
> > > > sasl.mechanism = GSSAPI
> > > > security.protocol = SSL
> > > > send.buffer.bytes = 131072
> > > > session.timeout.ms = 300000
> > > > ssl.cipher.suites = null
> > > > ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> > > > ssl.endpoint.identification.algorithm = null
> > > > ssl.key.password = null
> > > > ssl.keymanager.algorithm = SunX509
> > > > ssl.keystore.location = null
> > > > ssl.keystore.password = null
> > > > ssl.keystore.type = JKS
> > > > ssl.protocol = TLS
> > > > ssl.provider = null
> > > > ssl.secure.random.implementation = null
> > > > ssl.trustmanager.algorithm = PKIX
> > > > ssl.truststore.location = /x/x/client.truststore.jks
> > > > ssl.truststore.password = [hidden]
> > > > ssl.truststore.type = JKS
> > > > value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
> > > >
> > >
> >
>
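For context on the configuration and timings quoted in the thread: with max.poll.records = 50 and a worst case of about 200 ms per mail, the poll loop's worst-case batch time is roughly 10 seconds, far below the 300-second max.poll.interval.ms. That arithmetic is why the discussion turned to GC pauses rather than slow message processing. A quick sanity check (the per-record time is the figure quoted above, not measured here):

```java
// Worst-case time spent between poll() calls versus the configured budget.
public class PollBudget {
    public static void main(String[] args) {
        int maxPollRecords = 50;          // max.poll.records from the config above
        long worstPerRecordMs = 200;      // worst-case per-mail time quoted in the thread
        long maxPollIntervalMs = 300_000; // max.poll.interval.ms from the config above

        long worstBatchMs = maxPollRecords * worstPerRecordMs;
        System.out.printf("Worst-case batch: %d ms of a %d ms budget (%.1f%%)%n",
                worstBatchMs, maxPollIntervalMs,
                100.0 * worstBatchMs / maxPollIntervalMs);
    }
}
```

At roughly 3% of the budget, the processing time alone cannot explain a missed max.poll.interval.ms, which leaves external stalls such as GC pauses as the likelier suspect.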
