Hey all, I’m currently in the process of designing a system around Kafka and I’m wondering what the recommended way to manage topics is. Each event stream we have needs to be isolated from the others: a failure in one stream should not block any other event stream from processing (by failure, we mean a downstream failure that would require us to replay the messages).
So my first thought was to create a topic per event stream. This lets a larger event stream be partitioned for added parallelism while keeping the default number of partitions as low as possible. It also satisfies the isolation requirement: a topic can keep failing and we’ll continue replaying its messages without affecting any of the other topics.

However, we’ve read that it’s not recommended to let your data model dictate the number of partitions or topics in Kafka, and we’re unsure this approach holds up if we need to triple our event streams. We’re currently looking at 10,000 event streams (i.e., 10,000 topics), but we don’t want to be spinning up additional brokers just so we can add more event streams, especially if the load on each one is reasonable.

Another option we were looking into was to not isolate at the topic/partition level, but instead to keep a set of pending offsets persisted somewhere (seemingly what Twitter Heron or Storm does, except they don’t appear to persist the pending offsets). Thoughts?
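To make the second option concrete, here’s a rough sketch of the pending-offset bookkeeping we have in mind. This is plain Python with no broker involved; `PendingOffsets` and its methods are hypothetical names for illustration, not an existing Kafka API. The idea is that the consumer only ever commits up to the oldest offset that is still in flight, so a failed message holds the commit watermark back and gets replayed after a restart, while later messages on the same partition can still be attempted.

```python
class PendingOffsets:
    """Tracks in-flight (pending) offsets per (topic, partition).

    Hypothetical sketch: a real system would persist this state
    (e.g. in a database) so that a restart can resume replaying
    messages whose downstream processing failed.
    """

    def __init__(self):
        # (topic, partition) -> set of offsets fetched but not yet acked
        self._pending = {}

    def start(self, topic, partition, offset):
        """Record that processing of this offset has begun."""
        self._pending.setdefault((topic, partition), set()).add(offset)

    def ack(self, topic, partition, offset):
        """Record that this offset was processed successfully downstream."""
        self._pending.get((topic, partition), set()).discard(offset)

    def committable(self, topic, partition, next_fetch_offset):
        """Highest offset it is safe to commit for this partition.

        If nothing is pending, everything fetched so far is done, so the
        next fetch position is committable. Otherwise, committing past
        the oldest pending offset would lose a failed message on restart.
        """
        pending = self._pending.get((topic, partition))
        if not pending:
            return next_fetch_offset
        return min(pending)
```

The trade-off versus topic-per-stream isolation is that a single stuck message only blocks the commit watermark for its own partition, at the cost of tracking (and persisting) per-message state.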