Hi all,
I have a problem with my kafka-streams (2.1.1) application. Sorry for being
vague, but I couldn't find more information than the following:
Most of the time my services run just fine, but sometimes (I cannot
put my finger on a precise trigger) the .sst files of more or less random
services stop getting cleaned up. The number just keeps growing
until I restart the specific service or hit the open-file limit of my server.
Services that use more state stores seem to be affected more often.
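
To make the growth visible per store, something like the following could be run periodically against the state directory. This is only a minimal sketch using plain java.nio; the class and method names are made up for illustration, and it assumes the usual layout in which each store's RocksDB files live in their own directory somewhere below state.dir.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper, not part of Kafka Streams or RocksDB.
final class SstFileCounter {
    private SstFileCounter() {}

    /**
     * Walks stateDir recursively and counts the *.sst files found in each
     * directory. The result maps a directory to its number of .sst files.
     * Note: files may disappear while walking (e.g. during compaction),
     * in which case Files.walk can throw an UncheckedIOException.
     */
    static Map<Path, Long> countSstFilesPerDirectory(Path stateDir) throws IOException {
        Map<Path, Long> counts = new TreeMap<>();
        // The stream holds open directory handles, so close it when done.
        try (Stream<Path> paths = Files.walk(stateDir)) {
            paths.filter(Files::isRegularFile)
                 .filter(p -> p.getFileName().toString().endsWith(".sst"))
                 .forEach(p -> counts.merge(p.getParent(), 1L, Long::sum));
        }
        return counts;
    }
}
```

Logging the result every few minutes should show which task directories keep accumulating files after an event, and which ones recover.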

What I have observed is that there is always an "event" before this
happens. Yesterday, for example, we had to shut down one of our brokers, and the
consumers logged:
Received invalid metadata error in produce request on partition 
my-store-changelog-16 due to 
org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition.. Going to request metadata update now

Although 6 instances of that service logged this message, only 2 of them 
started piling up .sst files. All 6 kept working.


A few days ago the affected services logged the following message before the
file descriptor count started rising:
Failed to commit stream task 0_17 since it got migrated to another thread 
already. Closing it as zombie before triggering a new rebalance.
Detected task 0_17 that got migrated to another thread. This implies that this 
thread missed a rebalance and dropped out of the consumer group. Will try to 
rejoin the consumer group. Below is the detailed description of the task: …
[REBALANCING]


I already checked https://github.com/facebook/rocksdb/wiki/Delete-Stale-Files
and looked for leaking iterators in our code, but couldn't find any. Besides,
if we had a resource leak, I would expect the problem to occur all the time.
I found this old commit
https://github.com/apache/kafka/commit/2b431b551252a65113cb720b102a2f3e8b301099
and thought it looked a lot like my problem. Could there be a rare case of a
resource/iterator leak when a producer has to update its metadata?
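
For reference, this is the pattern I was checking our code against. It is a self-contained sketch, not our actual code: StoreIterator is a hypothetical stand-in for Kafka Streams' KeyValueIterator (which is an Iterator that is also Closeable), and the OPEN counter only exists to make a leak visible. An iterator that is never closed keeps RocksDB from deleting the files it references.

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a store iterator that must be closed after use.
interface StoreIterator<T> extends Iterator<T>, AutoCloseable {
    @Override
    void close(); // narrowed: no checked exception
}

final class IteratorLeakDemo {
    // Number of iterators currently open; should return to 0 after each use.
    static final AtomicInteger OPEN = new AtomicInteger();

    private IteratorLeakDemo() {}

    // Simulates store.all() / store.range(...): every call opens a resource.
    static StoreIterator<String> open(List<String> data) {
        OPEN.incrementAndGet();
        Iterator<String> it = data.iterator();
        return new StoreIterator<String>() {
            private boolean closed;
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return it.next(); }
            @Override public void close() {
                if (!closed) { closed = true; OPEN.decrementAndGet(); }
            }
        };
    }

    // try-with-resources closes the iterator on normal completion,
    // on the early return, and when an exception is thrown.
    static String findFirstWithPrefix(List<String> data, String prefix) {
        try (StoreIterator<String> iter = open(data)) {
            while (iter.hasNext()) {
                String value = iter.next();
                if (value.startsWith(prefix)) {
                    return value;
                }
            }
            return null;
        }
    }
}
```

The places to look for are early returns and exceptions between opening and closing an iterator, since a manual close() at the end of a method is skipped on those paths.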

I hope someone might have an idea where I could start looking,
Ole
