I have a Kafka Streams application in which a stream does a
transformation. The transformation can take as little as 10-30 seconds
or up to about 15 minutes to run. The stream works just fine until a
rebalance happens in the middle of a long processing step. Rebalancing
sometimes takes a long time, and sometimes Kafka starts a new rebalance
soon after the previous one completes; this pattern usually continues
for a while.
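
For context, the topology is essentially a stream with one expensive
transformation step. A minimal sketch of the shape I mean is below; the
topic names, application id, and expensiveTransform() are placeholders
for illustration, not my real code:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class LongTransformApp {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "long-transform-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, String>stream("input-topic")
               .mapValues(value -> expensiveTransform(value))  // 10-30s, sometimes ~15 minutes
               .to("output-topic");

        new KafkaStreams(builder.build(), props).start();
    }

    private static String expensiveTransform(String value) {
        // Placeholder for the long-running transformation.
        return value;
    }
}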

Reading the docs for the Kafka consumer, I see this gem:

> For use cases where message processing time varies unpredictably, neither of 
> these options may be sufficient. The recommended way to handle these cases is 
> to move message processing to another thread, which allows the consumer to 
> continue calling poll while the processor is still working. Some care must be 
> taken to ensure that committed offsets do not get ahead of the actual 
> position. Typically, you must disable automatic commits and manually commit 
> processed offsets for records only after the thread has finished handling 
> them (depending on the delivery semantics you need). Note also that you will 
> need to pause the partition so that no new records are received from poll 
> until after thread has finished handling those previously returned.

Ok awesome, that sounds easy enough and seems to be exactly what I
need. Doing the processing in a separate thread would make the
rebalance a snap.
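
If I were using the plain consumer, I gather the pattern would look
roughly like the sketch below: pause the partitions after poll() hands
back a batch, run the heavy work on a worker thread while the main loop
keeps polling as a heartbeat, and only commit and resume once the
worker is done. The topic/group names, the single-threaded executor,
and the process() helper are placeholders, and it glosses over
rebalance edge cases:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PausedProcessingConsumer {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "long-transform-group");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");  // manual commits only
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                  "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("input-topic"));
        ExecutorService executor = Executors.newSingleThreadExecutor();

        Future<?> inFlight = null;

        while (true) {
            // Keep calling poll() so the consumer stays in the group; while the
            // partitions are paused this is effectively just a heartbeat.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));

            if (!records.isEmpty()) {
                // Hand the batch to the worker thread and pause the assignment so
                // the next poll() does not return more records while we are busy.
                consumer.pause(consumer.assignment());
                final ConsumerRecords<String, String> batch = records;
                inFlight = executor.submit(() -> { process(batch); });
            }

            if (inFlight != null && inFlight.isDone()) {
                // Commit only after the worker has finished, so committed offsets
                // never get ahead of what has actually been processed.
                consumer.commitSync();
                consumer.resume(consumer.assignment());
                inFlight = null;
            }
        }
    }

    // Placeholder for the long-running transformation (10-30s, up to ~15 min).
    private static void process(ConsumerRecords<String, String> batch) {
        // ... do the expensive work, write results, etc. ...
    }
}

Because poll() keeps getting called between batches, the consumer stays
in the group without needing an enormous max.poll.interval.ms.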

However, is there a way to do this, or something similar, with Kafka
Streams? I would like to use the Streams abstraction rather than the
consumer/producer APIs directly.

Regards,
Raman
