Javier,

Got it. The proposal from Stack Overflow should work; the drawback is that
you need one more full-fledged consumer instance to do it.
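
For reference, a minimal sketch of that approach. It assumes the probe is
handed a `Callable` that wraps something like
`consumer.listTopics(Duration.ofSeconds(5))` on a dedicated consumer
(KafkaConsumer is not thread-safe, so it should not be shared with the
polling thread). The class name, the threshold and the explicit timestamps
are illustrative, not part of any Kafka API:

```java
import java.util.concurrent.Callable;

/**
 * Tracks the last time a metadata call succeeded and reports "unhealthy"
 * once failures have lasted longer than a threshold.
 * Hypothetical helper: only the wrapped call touches Kafka.
 */
final class ConnectivityProbe {
    private final Callable<?> metadataCall;   // e.g. () -> consumer.listTopics(timeout)
    private final long maxFailureMillis;      // tolerated time without a successful call
    private long lastSuccessMillis;

    ConnectivityProbe(Callable<?> metadataCall, long maxFailureMillis, long startMillis) {
        this.metadataCall = metadataCall;
        this.maxFailureMillis = maxFailureMillis;
        this.lastSuccessMillis = startMillis;
    }

    /** Runs the call once; a normal return counts as "cluster reachable". */
    synchronized void check(long nowMillis) {
        try {
            metadataCall.call();
            lastSuccessMillis = nowMillis;
        } catch (Exception e) {
            // Any exception (for example a timeout) is treated as a failed probe.
            if (e instanceof InterruptedException) {
                Thread.currentThread().interrupt();
            }
        }
    }

    /** What the HTTP healthcheck endpoint would return. */
    synchronized boolean isHealthy(long nowMillis) {
        return nowMillis - lastSuccessMillis <= maxFailureMillis;
    }
}
```

One would call `check(System.currentTimeMillis())` from a scheduled task at
the healthcheck period and have the HTTP endpoint return `isHealthy(...)`.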

If you'd like to go a bit deeper, you can actually turn on DEBUG level
logging on the `o.a.k.clients.NetworkClient` class which would print the
following upon node disconnects:

```
Node {} disconnected.
```

Then you can have a very simple grep program that looks for this line and
fires healthcheck actions whenever the count exceeds a limit within a
sliding window.
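
A minimal sketch of such a sliding-window check, written in Java so it
could also run inside the consumer process instead of as an external grep.
The class name, window size and limit are assumptions for illustration; the
only thing taken from Kafka is the log line format shown above:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Pattern;

/** Counts "Node ... disconnected." log lines within a sliding time window. */
final class DisconnectWindow {
    // Matches the DEBUG line above once the {} placeholder holds a node id.
    private static final Pattern DISCONNECT =
            Pattern.compile("Node \\S+ disconnected\\.");

    private final Deque<Long> timestamps = new ArrayDeque<>();
    private final long windowMillis;
    private final int limit;

    DisconnectWindow(long windowMillis, int limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    /** Feeds one log line; returns true if it was a disconnect event. */
    synchronized boolean offer(String logLine, long nowMillis) {
        if (!DISCONNECT.matcher(logLine).find()) {
            return false;
        }
        timestamps.addLast(nowMillis);
        evict(nowMillis);
        return true;
    }

    /** True when more than `limit` disconnects fall inside the window. */
    synchronized boolean isUnhealthy(long nowMillis) {
        evict(nowMillis);
        return timestamps.size() > limit;
    }

    // Drops events that are older than the window.
    private void evict(long nowMillis) {
        while (!timestamps.isEmpty()
                && timestamps.peekFirst() <= nowMillis - windowMillis) {
            timestamps.removeFirst();
        }
    }
}
```

Each log line would be passed to `offer(line, System.currentTimeMillis())`,
and the healthcheck would report unhealthy while `isUnhealthy(...)` is true.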


Guozhang


On Thu, Feb 21, 2019 at 11:23 PM Javier Arias Losada <
javier.ari...@gmail.com> wrote:

> Thank you for your responses!
>
> Guozhang, what you propose seems like a very good way to monitor the
> health of consumers externally; with this combination of metrics (offset
> advance + bytes-in/out) one can deduce when a consumer is not working.
>
> What we are trying to accomplish is detect this very same situation, but
> from inside the consumer process. The reason is our consumer is running as
> a container task in AWS-ECS; and we have an HTTP healthcheck in the process
> so that whenever the process returns 'unhealthy', the cluster scheduler
> stops that instance.
>
> So our idea is to find the best way to realize from inside the consumer
> that we lost connection to the broker so that we can mark the instance as
> unhealthy.
>
> We found a way to do it on Stack Overflow: have a consumer periodically
> make a listTopics(timeout) call; whenever you lose the connection to the
> cluster, this raises an exception. What do you think? Are there any
> drawbacks with this approach other than one extra consumer? Is it better to
> reuse the same consumer, or create a new consumer every time? It would run
> about every minute, which is the healthcheck period in our cluster.
>
> Again, thanks.
>
>
>
> On Wed, Feb 20, 2019 at 18:54, Guozhang Wang (<wangg...@gmail.com>)
> wrote:
>
> > Hello Javier,
> >
> > Matthias is right, it is a known issue, not only in Streams but also in
> > the underlying producer / consumer clients.
> >
> > For your own healthcheck monitoring, I'd suggest considering the
> > following alternatives:
> >
> > 1) Monitor the consumer offsets, and alert when they have not changed
> > for a long time.
> >
> > 2) Obviously not all occurrences of 1) above are caused by a lost
> > connection, so in addition you can also monitor the embedded consumer /
> > producer's bytes-in / bytes-out rate, and alert when it drops to zero
> > for some time.
> >
> > Combining 1) with 2): when both happen, it usually indicates a lost
> > connection.
> >
> >
> > Guozhang
> >
> >
> > On Wed, Feb 20, 2019 at 9:39 AM Matthias J. Sax <matth...@confluent.io>
> > wrote:
> >
> > > It's a known issue: https://issues.apache.org/jira/browse/KAFKA-6520
> > >
> > >
> > > On 2/20/19 3:25 AM, Javier Arias Losada wrote:
> > > > Hello Kafka users,
> > > >
> > > > working on a Kafka-Streams stateless application; we want to
> > > > implement some healthchecks so that whenever the connection to Kafka
> > > > is lost for more than a threshold, we mark the instance as
> > > > unhealthy, so that our cluster manager (could be K8S or AWS-ECS)
> > > > kills that instance and starts a new one.
> > > >
> > > > We have noticed that when the consumer is running and the connection
> > > > is lost, it tries to reconnect and emits some logs, but we didn't
> > > > find a way to programmatically check or subscribe to the connection
> > > > status.
> > > >
> > > > Am I missing something?
> > > > Is this an intended feature? Why?
> > > > What are the best practices for healthchecking Kafka-Streams
> > > > applications?
> > > >
> > > > I also found that with a plain Kafka consumer, no exception is raised
> > > > on lost connectivity... how could we somehow check the connection
> > > > status? How are other people solving this issue?
> > > >
> > > > Thank you very much.
> > > >
> > >
> > >
> >
> > --
> > -- Guozhang
> >
>


-- 
-- Guozhang
