Hi, I’m trying to “enrich” a stream of events (i.e. roughly speaking read messages from one topic, make a query to an external system, write to another topic). The problem is – external system can handle lots of calls, but has a high latency. I can easily do the enrichment if I use one thread per message, but that's kinda' wasteful (I'd need a lot of partitions/ tasks to have reasonable throughput).... so, I'm thinking about using multiplexed IO. Like, have a processor that just registers the "input message" in a local state store, without committing the task; and then, the punctuator can look at the registered "input messages", start the IO for all of them, forward the results for the completed IO tasks to subsequent processors, and commit progress. The state store can be non-replicated, since I can reprocess messages in case of failure (and I don't really mind duplication of messages on the output topic)
The question is... would that work? I'm concerned about rebalancing (when one worker is added/removed), and specifically: - If I understand correctly the code, during partition rebalance the tasks may be suspended, and the suspend() method will actually commit the offsets (the last task where process() was completed, regardless whether it invoked commit() or not). That'd be bad, since it means that on rebalance, I might end up skipping records (I'm not concerned about duplication, within reasonable bounds; but I am concerned about skipping records) - Not sure what happens to the state during rebalancing (if I disable the changelog - i.e. I make all the state local). Is all state lost? Thanks, Virgil.