We have many production clusters with three topics in the 1-3 MB range and the rest in the multi-KB to sub-KB range. We do use gzip compression, implemented at the broker rather than the producer level. The clusters don't usually break a sweat. We use MirrorMaker to aggregate these topics to a large central cluster, and the message sizes aren't an issue there, either.
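For reference, the broker-side pieces of that setup look roughly like the following. This is a sketch, not our actual settings: compression.type=gzip matches what I described above, while the size limits and their values are illustrative assumptions (max.message.bytes is the topic-level override of the broker's message.max.bytes):

    # server.properties -- broker recompresses with gzip regardless of producer settings
    compression.type=gzip
    # broker-wide message size limit; default is ~1 MB, raised to cover 1-3 MB messages
    message.max.bytes=4194304
    # replication fetch size, typically sized to cover the largest message
    replica.fetch.max.bytes=4194304

    # per-topic override for the large-message topics
    max.message.bytes=4194304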
The main problem we have with these large topics, especially one of them that is high throughput, is that after a cluster has been up a while and partitions have moved around due to hardware failures (we use cruise-control), the topic tends to become less balanced across the brokers' storage, and some brokers/disks fill up faster than others.

-- Peter

> On Mar 13, 2019, at 6:28 PM, 1095193...@qq.com wrote:
>
> We have a use case where we rarely want to produce data to Kafka with a
> max size of 2 MB (that is, the message size will vary based on user
> operations).
>
> Will producing 2 MB messages have any impact, or do we need to split
> each message into smaller chunks, such as 100 KB, and produce those?
>
> If we produce in small chunks, it will increase response time for the
> user. We have also tried producing 2 MB messages to Kafka and we
> don't see much latency there.
>
> Either way, if we split the data and produce it, there is no impact on
> disk usage. But will broker performance degrade because of this?
>
> Our broker configuration is:
>
> RAM: 125.6 GB, Disk: 2.9 TB, Processors: 40
>
> Thanks,
> Hemnath K B.
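If it helps with the original question: producing ~2 MB messages mostly comes down to raising the size-related limits on the client as well as the broker/topic ones sketched above. A minimal Java sketch, with illustrative values and a made-up topic name ("user-events"); max.request.size and buffer.memory are standard Kafka producer configs:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.ByteArraySerializer;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class LargeMessageProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // hypothetical address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", ByteArraySerializer.class.getName());
            // default max.request.size is 1 MB; raise it above the largest message
            props.put("max.request.size", "4194304");
            // give the producer room to buffer a few large batches
            props.put("buffer.memory", "67108864");

            try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
                byte[] payload = new byte[2 * 1024 * 1024]; // ~2 MB message
                producer.send(new ProducerRecord<>("user-events", "key", payload));
                producer.flush();
            }
        }
    }

Depending on client version, consumers may also want max.partition.fetch.bytes raised to handle large messages efficiently (newer brokers will still return an oversized first batch so fetches don't stall).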