I have a situation in which a stream does a transformation that can take as little as 10-30 seconds or up to about 15 minutes to run. The stream works just fine until a rebalance happens in the middle of long processing. Rebalancing sometimes takes a long time, and sometimes Kafka starts a new rebalance soon after the previous one completes; this pattern usually continues for some time.
Reading the docs for the Kafka consumer, I see this gem:

> For use cases where message processing time varies unpredictably, neither of
> these options may be sufficient. The recommended way to handle these cases is
> to move message processing to another thread, which allows the consumer to
> continue calling poll while the processor is still working. Some care must be
> taken to ensure that committed offsets do not get ahead of the actual
> position. Typically, you must disable automatic commits and manually commit
> processed offsets for records only after the thread has finished handling
> them (depending on the delivery semantics you need). Note also that you will
> need to pause the partition so that no new records are received from poll
> until after thread has finished handling those previously returned.

Ok awesome, that sounds easy enough and seems to be exactly what I need. Doing the processing in a separate thread would make the rebalance a snap.

However, is there a way to do this, or something similar, with streams? I would like to use the streams abstraction rather than the consumer/producer APIs directly.

Regards,
Raman
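P.S. To make the quoted pattern concrete, below is a rough sketch of how I understand the plain-consumer version would look. It is only an illustration, not my actual code: the topic name, group id, and the processBatch helper are made up, and a real version would also need a ConsumerRebalanceListener so that pause state and pending commits survive a rebalance. This is the part I would rather not reimplement myself, hence the question about doing it with streams.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Properties;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class PausedProcessingSketch {

        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "long-processing-group");   // made-up group id
            props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // commit manually, only after processing
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringDeserializer");
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                      "org.apache.kafka.common.serialization.StringDeserializer");

            ExecutorService worker = Executors.newSingleThreadExecutor();

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("input-topic"));     // made-up topic

                Future<Map<TopicPartition, OffsetAndMetadata>> inFlight = null;

                while (true) {
                    // keep calling poll so the consumer stays in the group; while paused it returns nothing new
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));

                    if (inFlight == null && !records.isEmpty()) {
                        // hand the batch to a worker thread and pause the assigned partitions
                        consumer.pause(consumer.assignment());
                        inFlight = worker.submit(() -> processBatch(records));
                    }

                    if (inFlight != null && inFlight.isDone()) {
                        // commit only offsets the worker actually finished, then resume fetching
                        consumer.commitSync(inFlight.get());
                        consumer.resume(consumer.assignment());
                        inFlight = null;
                    }
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            } finally {
                worker.shutdown();
            }
        }

        // stand-in for the real transformation (anywhere from ~10 seconds to ~15 minutes per batch);
        // returns the offsets to commit for each partition once the work is done
        private static Map<TopicPartition, OffsetAndMetadata> processBatch(ConsumerRecords<String, String> records) {
            Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
            records.forEach(record -> {
                // ... expensive work here ...
                offsets.put(new TopicPartition(record.topic(), record.partition()),
                            new OffsetAndMetadata(record.offset() + 1));
            });
            return offsets;
        }
    }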
