Thanks,

A couple of things:
- I’d recommend moving to 0.10.2 (latest release) if you can since several 
improvements were made in the last two releases that make rebalancing and 
performance better.

- When running in high-latency environments (on AWS at least; we haven’t 
tried Google Cloud), one setting we have found useful for performance is 
the socket send and receive buffer size for the producer and consumer in 
Streams. We’d recommend setting both to 1MB like this (where “props” is your 
own properties object when you start Streams):

// the socket buffer needs to be large, especially when running in AWS with
// high latency. if running locally the default is fine.
props.put(ProducerConfig.SEND_BUFFER_CONFIG, 1024 * 1024);
props.put(ConsumerConfig.RECEIVE_BUFFER_CONFIG, 1024 * 1024);

Make sure the OS allows the larger socket size too.
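
One rough way to check this is a small standalone Java sketch like the one below. It is not part of Kafka, and the class and method names are made up for illustration. It requests a 1MB buffer on a plain unconnected socket and prints the size the OS reports back, on the assumption that Kafka client sockets on the same machine are subject to the same OS limits.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Hypothetical helper, not a Kafka API: probes the socket buffer sizes
// the OS actually grants when 1MB is requested.
public class SocketBufferCheck {

    // Requests the given send buffer size on an unconnected socket and
    // returns the size the OS reports back.
    static int effectiveSendBuffer(int requestedBytes) {
        try (Socket socket = new Socket()) {
            socket.setSendBufferSize(requestedBytes);
            return socket.getSendBufferSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Same check for the receive buffer.
    static int effectiveReceiveBuffer(int requestedBytes) {
        try (Socket socket = new Socket()) {
            socket.setReceiveBufferSize(requestedBytes);
            return socket.getReceiveBufferSize();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int requested = 1024 * 1024; // same 1MB as in the props above
        System.out.println("requested: " + requested);
        System.out.println("send:      " + effectiveSendBuffer(requested));
        System.out.println("receive:   " + effectiveReceiveBuffer(requested));
    }
}
```

If the reported values come back well below the request, the OS cap is probably in effect. On Linux the relevant limits are the net.core.wmem_max and net.core.rmem_max sysctls, and the kernel typically reports back double the value it was given, so a number larger than 1MB is not a problem.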

Thanks
Eno

> On Mar 13, 2017, at 9:21 AM, Mahendra Kariya <mahendra.kar...@go-jek.com> 
> wrote:
> 
> Hi Eno,
> 
> Please find my answers inline.
> 
> 
> We are in the process of documenting capacity planning for streams, stay 
> tuned.
> 
> This would be great! Looking forward to it.
> 
> Could you send some more info on your problem? What Kafka version are you 
> using?
> 
> We are using Kafka 0.10.0.0.
>  
> Are the VMs on the same or different hosts?
> 
> The VMs are on Google Cloud. Two of them are in asia-east1-a and one is in 
> asia-east1-c. All three are n1-standard-4 Ubuntu instances.
>  
> Also what exactly do you mean by “the lag keeps fluctuating”, what metric are 
> you looking at?
> 
> We are looking at Kafka Manager for the time being. By fluctuating, I mean 
> the lag is a few thousand at one moment; we refresh a second later and it is 
> a few lakh; we refresh again and it is back to a few thousand. I understand 
> this may not be very accurate. We will soon have more accurate data once we 
> start pushing the consumer lag metric to Datadog.
> 
> On a separate note, the difference between lags on different partitions 
> is far too high. I have attached a tab-separated file that shows the 
> consumer lag (from Kafka Manager) for the first 50 partitions. As is 
> clear, the lag on partition 2 is 530 while the lag on partition 18 is 23K. 
> Note that the same VM is pulling data from both partitions.
> 
> 
> 
> 
> <KafkaLags.tsv>
