We are seeing some strange behavior from our brokers after we had to change
the log retention policy yesterday. A large spike in producer data over a
short period brought the brokers very close to their maximum disk space.
Normally our retention policy of 6-7 days works fine, but since our
consumers were caught up, we switched the retention policy from hour-based
to size-based and capped the size at a safe number (half of our max disk
space; normal usage is around 30%). After the restart, we started seeing
multiple producer-side failures: the FailedSends metric shows almost 10%
failures, and FailedProduceRequestsPerSec on the broker side is non-zero.
The traces from one of the brokers looked like this:

[KafkaApi-8] Produce request with correlation id 2050686 from client xxx on
partition [TOPIC_NAME,18] failed due to Partition [TOPIC_NAME,18] doesn't
exist on 8 (kafka.server.KafkaApis)
[KafkaApi-8] Produce request with correlation id 2102325 from client xxx
on partition [TOPIC_NAME,28] failed due to Partition [TOPIC_NAME,28]
doesn't exist on 8 (kafka.server.KafkaApis)

We checked and confirmed that those partitions are present on that broker.
Any help is appreciated. Also, is there a recommended way to purge log data
quickly from the brokers?
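
For reference, the retention change we applied was along these lines (a
sketch of the server.properties edit; the property names are Kafka's broker
configs, but the byte value below is illustrative, not our exact number):

```
# Before: time-based retention (~7 days)
#log.retention.hours=168

# After: size-based retention. Note that log.retention.bytes applies per
# partition log, not per broker, so it has to be sized accordingly.
log.retention.bytes=1073741824
```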

Thanks,
Sadhan
