I’ve been reading the Kafka docs and one thing that I’m having trouble 
understanding is how partitions affect sequential disk IO. One of the reasons 
Kafka is so fast is that you can do lots of sequential IO with read-ahead cache 
and all of that goodness. However, if your broker is responsible for, say, 20 
partitions, then won’t the disk be seeking to 20 different spots for its writes 
and reads? I thought that letting the OS decide when to fsync might make this 
less of an issue, but it still seems like it could be a problem.
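
To make the concern concrete, here is a toy model of what I have in mind. It is 
not Kafka code: count_seeks, the round-robin traffic and the flush sizes are all 
made up for illustration, and it assumes each partition is its own append-only 
file and that a flush writes each partition's pending data contiguously. It only 
counts how often the disk would have to move to a different partition's file.

```python
def count_seeks(writes, flush_every):
    """Count moves between partition files under a simplified flush model.

    writes: sequence of partition ids, one per appended message.
    flush_every: number of buffered messages written out per flush
                 (a stand-in for the OS batching dirty pages).
    """
    seeks = 0
    last = None  # partition file the disk was last positioned on
    for start in range(0, len(writes), flush_every):
        batch = writes[start:start + flush_every]
        # Assumption: within one flush, each partition's data goes out
        # as a single contiguous write.
        for partition in sorted(set(batch)):
            if partition != last:
                seeks += 1
                last = partition
    return seeks


partitions = 20
writes = [i % partitions for i in range(2000)]  # round-robin producer traffic

print(count_seeks(writes, flush_every=1))     # every message flushed on its own
print(count_seeks(writes, flush_every=500))   # writes batched before flushing
print(count_seeks([0] * 2000, flush_every=1)) # single partition
```

In this model, flushing every message costs one seek per message with 20 
partitions, batching brings that down to at most one seek per partition per 
flush, and a single partition never seeks at all. Whether real disks and the 
page cache behave anything like this is exactly what I am unsure about.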

In our particular situation, we are going to have 6 brokers, 3 in each DC, with 
MirrorMaker replication from the secondary DC to the primary DC. We aren’t 
likely to need to add more nodes for a while, so would it be faster to have 1 
partition per node rather than, say, 3-4 per node, to minimise seek times on disk?

Are my assumptions correct, or is this not an issue in practice? There are some 
nice things about having more partitions, like rebalancing more evenly if we 
lose a broker, but we don’t want to make things significantly slower to get 
that benefit.

Thanks, Daniel.
