We have a cluster of 3 brokers running Kafka 0.9.0.1. A week ago, we decided to 
change log.retention.hours from 10 days to 2 days. We restarted the cluster and 
everything was fine. But one broker accumulated more and more data every day, 
and two days later it crashed with "too many open files". lsof reported 7,400 
open files. We raised the limit to 10,000 and it crashed again. So we removed 
all the data from the data directory and restarted; after a few minutes the 
cluster was OK. But now, 6 hours later, the two healthy brokers each hold 72 GB 
while the other broker holds 90 GB. lsof -p xxx returns 1,030 and the count is 
growing continuously. I am sure that by tomorrow morning we will have another 
crash.
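For reference, the change amounted to something like this in server.properties 
(the exact lines are a sketch of the 10 days → 2 days adjustment; everything 
else was left at its previous value):

```properties
# before: 10 days of retention (240 hours)
#log.retention.hours=240
# after: 2 days
log.retention.hours=48
```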

In the server.log of the broken broker:

[2017-11-29 17:28:51,360] INFO Rolled new log segment for 
'__consumer_offsets-27' in 1 ms. (kafka.log.Log)
[2017-11-29 17:31:28,836] INFO Rolled new log segment for 
'__consumer_offsets-8' in 1 ms. (kafka.log.Log)
[2017-11-29 17:35:22,100] INFO Rolled new log segment for 
'__consumer_offsets-12' in 1 ms. (kafka.log.Log)
[2017-11-29 17:37:55,984] INFO Rolled new log segment for 
'__consumer_offsets-11' in 1 ms. (kafka.log.Log)
[2017-11-29 17:38:30,600] INFO [Group Metadata Manager on Broker 2]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-11-29 17:39:55,836] INFO Rolled new log segment for 
'__consumer_offsets-16' in 1 ms. (kafka.log.Log)
[2017-11-29 17:43:38,300] INFO Rolled new log segment for 
'__consumer_offsets-48' in 1 ms. (kafka.log.Log)
[2017-11-29 17:44:21,110] INFO Rolled new log segment for 
'__consumer_offsets-36' in 1 ms. (kafka.log.Log)
[2017-11-29 17:48:30,600] INFO [Group Metadata Manager on Broker 2]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

And at the same time, on a healthy broker:

[2017-11-29 17:44:46,704] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-48/00000000002686063378.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:44:47,341] INFO Deleting segment 2687254936 from log 
__consumer_offsets-48. (kafka.log.Log)
[2017-11-29 17:44:47,376] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-48/00000000002687254936.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:32,991] INFO Deleting segment 0 from log 
__consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:32,991] INFO Deleting segment 1769617973 from log 
__consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:32,993] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-36/00000000000000000000.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:32,993] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-36/00000000001769617973.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:33,593] INFO Deleting segment 1770704579 from log 
__consumer_offsets-36. (kafka.log.Log)
[2017-11-29 17:45:33,627] INFO Deleting index 
/pfic/kafka/data/kafka_data/__consumer_offsets-36/00000000001770704579.index.deleted
 (kafka.log.OffsetIndex)
[2017-11-29 17:45:58,394] INFO [Group Metadata Manager on Broker 0]: Removed 0 
expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)

So, the broken broker never deletes a segment. Of course, all three brokers 
have the same configuration.
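To see which topic-partitions are holding the descriptors open, we group the 
lsof output by parent directory. This is a small sketch (the broker PID 12345 
in the usage comment is illustrative; replace it with the real one):

```shell
# count_fds_per_partition: reads `lsof` output on stdin and prints the
# number of open files per parent directory (i.e. per topic-partition
# log directory), largest count first.
count_fds_per_partition() {
  awk '{ p = $NF; sub(/\/[^\/]*$/, "", p); n[p]++ }
       END { for (d in n) print n[d], d }' \
    | sort -rn
}

# Usage against a live broker (PID 12345 is illustrative):
#   lsof -p 12345 | count_fds_per_partition | head
```

On the broken broker this should show the partitions whose segment count keeps 
growing, which is what we expect if old segments are never deleted.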
What is happening?
Thanks for your advice,


Jean-Max REYMOND
BPCE Infogérance & Technologies

------------------------------------------------------------------------------
The integrity of this message cannot be guaranteed on the Internet. BPCE-IT 
cannot therefore be considered responsible for the contents. If you are not the 
intended recipient of this message, then please delete it and notify the sender.
------------------------------------------------------------------------------
