I agree that the accumulator timeout should be independent of the other two 
you mentioned. We at LinkedIn have come up with a solution and I'll create a 
KIP for it soon. In essence, we want a batch.expiry.ms configuration that 
specifies the accumulator timeout directly, separately from 
request.timeout.ms. Proliferation of request.timeout.ms "up" the stack has 
been painful. There are a number of nuances, which will be discussed in the 
KIP. Stay tuned. 
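To give a rough idea, a configuration along these lines is what we have in mind. This is only a sketch: the name batch.expiry.ms comes from the proposal above, but its exact semantics and defaults will be defined in the KIP, and the values here are illustrative placeholders.

```java
import java.util.Properties;

public class BatchExpiryConfigSketch {
    public static Properties producerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092"); // placeholder address
        // Existing knob: bounds the network RTT and server replication wait.
        props.put("request.timeout.ms", "30000");
        // Proposed knob (semantics to be finalized in the KIP): how long a
        // batch may sit in the accumulator before being expired, decoupled
        // from request.timeout.ms.
        props.put("batch.expiry.ms", "3600000"); // e.g. one hour
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps().getProperty("batch.expiry.ms"));
    }
}
```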

-Sumant


> On Nov 8, 2016, at 8:23 AM, Lukasz Druminski 
> <lukasz.drumin...@allegrogroup.com> wrote:
> 
> Hi,
> 
> We are running kafka-producer 0.8.2 in production, configured with
> retries set to Integer.MAX_VALUE and buffer.memory set to 1GB.
> This setup protects us against unavailability of all brokers for
> around one hour (given our production traffic).
> For example, when all brokers in a single DC/zone go down, the
> producer buffers all incoming messages in its accumulator until it is
> full. When the brokers become available again, the producer sends all
> the buffered messages to Kafka. This gives us time to recover and we
> don't lose any messages.
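For reference, the setup described above corresponds to producer properties along these lines (broker addresses are placeholders; only the two settings stated above are taken from the original):

```java
import java.util.Properties;

public class BufferingProducerConfig {
    public static Properties props() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "broker1:9092,broker2:9092"); // placeholders
        // Retry effectively forever so sends are not failed while brokers
        // are down.
        p.put("retries", String.valueOf(Integer.MAX_VALUE));
        // 1 GB accumulator: buffers roughly an hour of traffic at the
        // stated production rate.
        p.put("buffer.memory", String.valueOf(1024L * 1024L * 1024L));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(props().getProperty("buffer.memory"));
    }
}
```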
> 
> Now we would like to migrate to the newest kafka-producer, 0.10.1, but
> we have trouble preserving the behaviour described above because of
> changes introduced to the producer library:
> 
> - proposal about adding request timeout to NetworkClient
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-19+-+Add+a+request+timeout+to+NetworkClient
> - producer record can stay in RecordAccumulator forever if leader is not
> available https://issues.apache.org/jira/browse/KAFKA-1788
> - add a request timeout to NetworkClient
> https://issues.apache.org/jira/browse/KAFKA-2120
> 
> These changes introduce the request.timeout.ms parameter, which is used for:
> 
> 1. the actual network round-trip time
> 2. the server-side replication time
> 3. the new mechanism for aborting expired batches
> 
> When brokers are unavailable for longer than request.timeout.ms, the
> producer starts dropping batches from the accumulator, invoking each
> callback with a TimeoutException whose message reads:
> 
>  "Batch containing " + recordCount + " record(s) expired due to timeout
> while requesting metadata from brokers for " + topicPartition
> 
> As possible workarounds to protect against unavailability of all
> brokers with the newest kafka-producer:
> 
> - I could increase request.timeout.ms to one hour, so batches would
> only be dropped after that time, but such a value is unreasonable for
> (1) and (2)
> - I could catch the TimeoutException and send the corresponding
> message again, but then I have no guarantee that there will be free
> space in the accumulator
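The second workaround can be sketched as follows. Kafka classes are replaced by minimal local stand-ins so the idea is self-contained: java.util.concurrent.TimeoutException stands in for Kafka's org.apache.kafka.common.errors.TimeoutException, and onCompletion mirrors the shape of the producer Callback. As noted above, with a real producer there is no guarantee the accumulator has free space when the resend happens.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.TimeoutException;

// Simplified stand-in for the resend-on-expiry workaround: when the send
// callback reports a TimeoutException, re-enqueue the record for another
// attempt.
public class ResendOnTimeout {
    private final Queue<String> retryQueue = new ArrayDeque<>();

    // Mirrors Callback.onCompletion(metadata, exception): exception is
    // null when the send succeeded.
    public void onCompletion(String record, Exception exception) {
        if (exception instanceof TimeoutException) {
            // In a real producer this resend may itself block or fail if
            // the accumulator is still full.
            retryQueue.add(record);
        }
    }

    public int pendingRetries() {
        return retryQueue.size();
    }

    public static void main(String[] args) {
        ResendOnTimeout handler = new ResendOnTimeout();
        handler.onCompletion("msg-1", new TimeoutException("Batch expired"));
        handler.onCompletion("msg-2", null); // success, nothing to retry
        System.out.println(handler.pendingRetries()); // prints 1
    }
}
```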
> 
> In my opinion the timeout for (3) should be independent of (1) and
> (2), or dropping expired batches should be an optional feature.
> What do you think about this issue? Do you have any suggestions or a
> solution for this use case?
> 
> Best regards,
> Luke Druminski
