Hey all, I’m currently in the process of designing a system around Kafka and I’m wondering what the recommended way to manage topics is. Each event stream we have needs to be isolated from the others: a failure in one stream should not block any other event stream from processing (by failure, we mean a downstream failure that would require us to replay the messages).
So my first thought was to create a topic per event stream. This lets a larger event stream be partitioned for added parallelism while keeping the default number of partitions as low as possible. It also satisfies the isolation requirement: a topic can keep failing and we’ll continue replaying its messages without affecting any of the other topics.

However, we’ve read that it’s not recommended to let your data model dictate the number of partitions or topics in Kafka, and we’re unsure this approach holds up if we need to triple our event streams. We’re currently looking at 10,000 event streams (i.e., 10,000 topics), but we don’t want to be spinning up additional brokers just so we can add more event streams, especially if the load on each one is reasonable.

Another option we were looking into was to not isolate at the topic/partition level, but instead to keep a set of pending offsets persisted somewhere (seemingly what Twitter Heron or Storm does, except they don’t appear to persist the pending offsets). Thoughts?
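To make the second option concrete, here’s a rough sketch of the pending-offset bookkeeping we have in mind. This is plain Python with no broker involved; `PendingOffsets` and its methods are hypothetical names for illustration, not an existing Kafka API. The idea is that the consumer only ever commits up to the oldest offset that is still in flight, so a failed message holds the commit watermark back and gets replayed after a restart, while later messages on the same partition can still be attempted.

```python
class PendingOffsets:
    """Tracks in-flight (pending) offsets per (topic, partition).

    Hypothetical sketch: a real system would persist this state
    (e.g. in a database) so that a restart can resume replaying
    messages whose downstream processing failed.
    """

    def __init__(self):
        # (topic, partition) -> set of offsets fetched but not yet acked
        self._pending = {}

    def start(self, topic, partition, offset):
        """Record that processing of this offset has begun."""
        self._pending.setdefault((topic, partition), set()).add(offset)

    def ack(self, topic, partition, offset):
        """Record that this offset was processed successfully downstream."""
        self._pending.get((topic, partition), set()).discard(offset)

    def committable(self, topic, partition, next_fetch_offset):
        """Highest offset it is safe to commit for this partition.

        If nothing is pending, everything fetched so far is done, so the
        next fetch position is committable. Otherwise, committing past
        the oldest pending offset would lose a failed message on restart.
        """
        pending = self._pending.get((topic, partition))
        if not pending:
            return next_fetch_offset
        return min(pending)
```

The trade-off versus topic-per-stream isolation is that a single stuck message only blocks the commit watermark for its own partition, at the cost of tracking (and persisting) per-message state.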