You'll want to account for the number of disks per node. Normally, a broker's partitions are spread across multiple disks, so each disk only has to serve a fraction of them. More importantly, the OS file cache reduces the amount of seeking, provided that you are reading mostly sequentially and your consumers are keeping up.
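As a rough illustration of why disks per node matter, here is a simplified Python model of how a broker's partitions end up spread over its disks. It assumes each entry in the broker's `log.dirs` setting sits on its own physical disk, and that each new partition is placed in the directory that currently holds the fewest partitions (the placement rule described for `log.dirs` in the Kafka broker configuration docs). The directory paths and the `place_partitions` helper are hypothetical, and the model ignores replicas, partition sizes, and traffic skew.

```python
from collections import Counter


def place_partitions(num_partitions: int, log_dirs: list[str]) -> Counter:
    """Simplified model: put each new partition in the directory with the fewest partitions."""
    counts = Counter({d: 0 for d in log_dirs})
    for _ in range(num_partitions):
        # min() returns the first directory among ties, giving a deterministic tie-break
        target = min(log_dirs, key=lambda d: counts[d])
        counts[target] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical broker with 20 partitions and one log directory per physical disk
    dirs = ["/disk1/kafka-logs", "/disk2/kafka-logs", "/disk3/kafka-logs", "/disk4/kafka-logs"]
    placement = place_partitions(20, dirs)
    for d in dirs:
        print(f"{d}: {placement[d]} partitions")
```

Under these assumptions, 20 partitions over 4 disks leaves each disk juggling 5 write positions rather than 20, and sequential reads served from the page cache don't touch the disk at all.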
On 6/24/14 3:58 AM, "Daniel Compton" <d...@danielcompton.net> wrote:

>I've been reading the Kafka docs and one thing that I'm having trouble
>understanding is how partitions affect sequential disk IO. One of the
>reasons Kafka is so fast is that you can do lots of sequential IO with
>read-ahead cache and all of that goodness. However, if your broker is
>responsible for say 20 partitions, then won't the disk be seeking to 20
>different spots for its writes and reads? I thought that maybe letting
>the OS handle fsync would make this less of an issue but it still seems
>like it could be a problem.
>
>In our particular situation, we are going to have 6 brokers, 3 in each
>DC, with mirror maker replication from the secondary DC to the primary
>DC. We aren't likely to need to add more nodes for a while so would it be
>faster to have 1 partition/node than say 3-4/node to minimise the seek
>times on disk?
>
>Are my assumptions correct or is this not an issue in practice? There are
>some nice things about having more partitions like rebalancing more
>evenly if we lose a broker but we don't want to make things significantly
>slower to get this.
>
>Thanks,
>Daniel.