Hello Siva,

To better understand your situation, I'd need to ask a few more questions:

1) What triggers your REBALANCING event?

2) Does your application contain any state stores? If yes, how are they
configured (persistent or in-memory, is changelog logging enabled, etc.)?

3) What is your commit interval, as configured via "commit.interval.ms"?


To get better insight into what is happening, you can:

1) Set a StateRestoreListener via KafkaStreams#setGlobalStateRestoreListener
(details can be found here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-167%3A+Add+interface+for+the+state+store+restoration+process)
to see how much data is being restored while tasks resume.

2) Monitor the state store restoration metrics (
https://kafka.apache.org/documentation/#kafka_streams_store_monitoring),
such as "restore-latency-avg" and "restore-rate".

3) Look in your log4j output for the "partition revocation took" and
"partition assignment took" entries, and check the time difference
between them.
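
For the StateRestoreListener suggestion, a minimal sketch of such a listener
could look like the following. The class name LoggingRestoreListener, the
restoredRecords() helper and the message format are only illustrative
assumptions; the three callbacks are the ones defined by the
StateRestoreListener interface from KIP-167. Since the listener is shared
by all stream threads, the sketch uses thread-safe collections.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

// Hypothetical listener: prints restoration progress per changelog partition.
class LoggingRestoreListener implements StateRestoreListener {

    // Wall-clock start time of each in-flight restoration, keyed by changelog partition.
    private final ConcurrentHashMap<TopicPartition, Long> startMs = new ConcurrentHashMap<>();

    // Running total of restored records across all stores (illustrative helper state).
    private final AtomicLong restoredRecords = new AtomicLong();

    @Override
    public void onRestoreStart(TopicPartition partition, String storeName,
                               long startingOffset, long endingOffset) {
        startMs.put(partition, System.currentTimeMillis());
        System.out.printf("restore start: store=%s partition=%s offsets=%d..%d%n",
                storeName, partition, startingOffset, endingOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition partition, String storeName,
                                long batchEndOffset, long numRestored) {
        // Invoked once per restored batch, so this line can be noisy for large stores.
        System.out.printf("restore batch: store=%s partition=%s batchEnd=%d records=%d%n",
                storeName, partition, batchEndOffset, numRestored);
    }

    @Override
    public void onRestoreEnd(TopicPartition partition, String storeName, long totalRestored) {
        Long started = startMs.remove(partition);
        // -1 marks the unexpected case of an end callback without a matching start.
        long elapsedMs = (started == null) ? -1L : System.currentTimeMillis() - started;
        restoredRecords.addAndGet(totalRestored);
        System.out.printf("restore end: store=%s partition=%s records=%d elapsedMs=%d%n",
                storeName, partition, totalRestored, elapsedMs);
    }

    // Total number of records reported as restored so far.
    long restoredRecords() {
        return restoredRecords.get();
    }
}
```

You would register it with
streams.setGlobalStateRestoreListener(new LoggingRestoreListener()) before
calling streams.start(). If the restored record counts are large, that
would suggest the time spent in REBALANCING goes into rebuilding the state
stores from their changelog topics.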


Guozhang



On Sun, Jul 29, 2018 at 10:37 AM, Siva Ram <sivaraman...@gmail.com> wrote:

>  Hi,
>
> Kafka version 1.0.0 (can't upgrade to another version yet due to legacy
> dependency)
>
> The stream application uses the low-level Processor API and maintains
> state.  A topic is set up with 30 partitions, and I have split the work
> across 2 stream application instances consuming the same topic, each with
> 15 threads.  The application starts fine and runs well until REBALANCING
> occurs.  When it does, the application takes a long time to move back to
> RUNNING status by itself.  During this time no exception and no
> additional logging occurs in the application.
>
> 1) Could this behavior be due to an issue on Kafka broker OR is this
> related to the stream application?
>
> 2) What logging can we increase to get additional insight into what causes
> this paused state for a significant period of time (this is impacting the
> throughput)?
>
> FYI, we have set the REQUEST TIMEOUT to max integer value to avoid
> timeout.  In the event we have a single application with 30 threads, I
> don't see this long pause, but that means we need to increase the number of
> threads and memory, which is vertical scaling and not feasible for handling
> a topic with significant volume.
>
> *Instance 1:*
>
> 2018-07-29 01:45:43 INFO  StreamStateListener22 - Stream application moved
> from RUNNING to REBALANCING
> 2018-07-29 02:15:59 INFO  StreamStateListener22 - Stream application moved
> from REBALANCING to RUNNING
>
> 2018-07-29 05:19:18 INFO  StreamStateListener22 - Stream application moved
> from RUNNING to REBALANCING
> 2018-07-29 05:54:00 INFO  StreamStateListener22 - Stream application moved
> from REBALANCING to RUNNING
>
> *Instance 2:*
>
> 2018-07-29 01:45:58 INFO  StreamStateListener22 - Stream application moved
> from RUNNING to REBALANCING
> 2018-07-29 02:41:22 INFO  StreamStateListener22 - Stream application moved
> from REBALANCING to RUNNING
>
> 2018-07-29 05:19:33 INFO  StreamStateListener22 - Stream application moved
> from RUNNING to REBALANCING
> 2018-07-29 05:54:14 INFO  StreamStateListener22 - Stream application moved
> from REBALANCING to RUNNING
>
>
> Thanks,
> Siva
>



-- 
-- Guozhang
