We have many production clusters with three topics whose messages fall in 
the 1-3 MB range and the rest in the multi-KB to sub-KB range. We do use 
gzip compression, implemented at the broker rather than the producer level. 
The clusters don’t usually break a sweat. We use MirrorMaker to aggregate 
these topics to a large central cluster, and the message sizes aren’t an 
issue there, either.
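
For reference, the producer side of that kind of setup looks roughly like 
the sketch below. It is a minimal illustration rather than our exact code: 
the broker address, topic name, and the 4 MB cap are placeholders, and the 
topic or broker still needs compression.type=gzip plus a max.message.bytes 
large enough for the biggest messages.

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class LargeMessageProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        // The producer default max.request.size is 1 MB, so raise it to fit 1-3 MB messages.
        props.put(ProducerConfig.MAX_REQUEST_SIZE_CONFIG, 4 * 1024 * 1024);
        // compression.type is deliberately left unset on the producer; the
        // topic/broker is configured with compression.type=gzip instead.

        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = new byte[2 * 1024 * 1024]; // a ~2 MB message
            producer.send(new ProducerRecord<>("large-events", "key-1", payload));
        }
    }
}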

The main problem we have with these large topics, especially one of them 
that is high throughput, is that after a cluster has been up for a while 
and partitions have moved around due to hardware failures (we use 
cruise-control), the topic tends to become less evenly balanced across 
broker storage, and some brokers/disks fill up faster than others.
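
When that happens, one way to see how cruise-control would redistribute 
the load is a dry-run rebalance against its REST API, weighted toward even 
disk usage. A minimal sketch, assuming cruise-control is reachable at the 
host and port below (both placeholders) and that DiskUsageDistributionGoal 
is enabled in your deployment:

import java.net.HttpURLConnection;
import java.net.URL;

public class DiskRebalanceDryRun {
    public static void main(String[] args) throws Exception {
        // dryrun=true only returns the proposed moves; it does not start
        // moving any partitions.
        URL url = new URL("http://cruise-control.example.com:9090/kafkacruisecontrol/rebalance"
                + "?dryrun=true&goals=DiskUsageDistributionGoal");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        System.out.println("Cruise Control responded: HTTP " + conn.getResponseCode());
        conn.disconnect();
    }
}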

—
Peter

> On Mar 13, 2019, at 6:28 PM, 1095193...@qq.com wrote:
> 
> We have a use case where we occasionally want to produce messages to
> Kafka of up to 2 MB (message size varies with user operations).
> 
> Will producing 2 MB messages have any impact, or should we split each
> message into smaller chunks, such as 100 KB, before producing?
> 
> Splitting into small chunks would increase response time for the user.
> We have also tested producing 2 MB messages to Kafka and don't see much
> latency there.
> 
> Splitting the data before producing wouldn't change the disk usage
> either way, but would broker performance degrade due to the larger
> messages?
> 
> Our broker configuration is:
> 
> RAM: 125.6 GB, Disk: 2.9 TB, Processors: 40
> 
> Thanks,
> Hemnath K B.
