[ https://issues.apache.org/jira/browse/KAFKA-6630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Manikumar resolved KAFKA-6630.
------------------------------
    Resolution: Fixed
 Fix Version/s: 1.2.0

> Speed up the processing of TopicDeletionStopReplicaResponseReceived events on the controller
> --------------------------------------------------------------------------------------------
>
> Key: KAFKA-6630
> URL: https://issues.apache.org/jira/browse/KAFKA-6630
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Reporter: Lucas Wang
> Assignee: Lucas Wang
> Priority: Minor
> Fix For: 1.2.0
>
>
> Problem Statement:
> We found that in a large cluster with many partition replicas, it takes a
> long time to successfully delete a topic.
>
> Root cause:
> Further analysis shows that for a topic with N replicas, the controller
> receives all N StopReplicaResponses from the brokers within a short time;
> however, handling the N TopicDeletionStopReplicaResponseReceived events
> sequentially, one by one, takes a long time.
> Specifically, handling every single TopicDeletionStopReplicaResponseReceived
> event triggers the following call chain:
> TopicDeletionStopReplicaResponseReceived.process calls
> TopicDeletionManager.completeReplicaDeletion, which calls
> TopicDeletionManager.resumeDeletions, which calls several inefficient
> functions.
> The inefficient functions called inside TopicDeletionManager.resumeDeletions
> include:
> ReplicaStateMachine.areAllReplicasForTopicDeleted
> ReplicaStateMachine.isAtLeastOneReplicaInDeletionStartedState
> ReplicaStateMachine.replicasInState
> Each of these 3 functions iterates through all the replicas in the cluster
> and filters for the replicas that belong to the topic. In a large cluster
> with many replicas, these functions can be quite slow.
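The following Python sketch models why each of those calls is expensive. It is not Kafka code (the controller is written in Scala), and every name here (`ReplicaStateSketch`, `replicas_in_state_scan`, `replicas_in_state_indexed`, the `ReplicaState` values) is hypothetical. The scan method mirrors the behavior described in the root cause: it visits every replica in the cluster to find those of one topic. The indexed variant keeps a secondary topic-to-replicas map so that a lookup only visits the topic's own replicas; this is one possible optimization, assumed for illustration, and not necessarily the change that resolved this issue.

```python
from collections import defaultdict
from enum import Enum, auto
from typing import NamedTuple


class ReplicaState(Enum):
    # Hypothetical, simplified subset of replica states.
    ONLINE = auto()
    DELETION_STARTED = auto()
    DELETION_SUCCESSFUL = auto()


class Replica(NamedTuple):
    topic: str
    partition: int
    broker_id: int


class ReplicaStateSketch:
    def __init__(self) -> None:
        # One entry per replica in the whole cluster.
        self.states: dict[Replica, ReplicaState] = {}
        # Secondary index (assumed optimization): topic -> that topic's replicas.
        self.by_topic: dict[str, set[Replica]] = defaultdict(set)

    def put(self, replica: Replica, state: ReplicaState) -> None:
        self.states[replica] = state
        self.by_topic[replica.topic].add(replica)

    def replicas_in_state_scan(self, topic: str, state: ReplicaState) -> set[Replica]:
        # Cost grows with ALL replicas in the cluster: every entry is examined.
        return {r for r, s in self.states.items() if r.topic == topic and s == state}

    def replicas_in_state_indexed(self, topic: str, state: ReplicaState) -> set[Replica]:
        # Cost grows only with the topic's own replicas.
        return {r for r in self.by_topic.get(topic, ()) if self.states[r] == state}

    def all_replicas_deleted(self, topic: str) -> bool:
        # Indexed analogue of an "are all replicas for the topic deleted" check.
        return all(self.states[r] == ReplicaState.DELETION_SUCCESSFUL
                   for r in self.by_topic.get(topic, ()))

    def any_replica_deletion_started(self, topic: str) -> bool:
        # Indexed analogue of an "at least one replica in deletion-started" check.
        return any(self.states[r] == ReplicaState.DELETION_STARTED
                   for r in self.by_topic.get(topic, ()))


sm = ReplicaStateSketch()
# 1000 unrelated topics, each with 3 partitions x 3 replicas (9,000 replicas).
for t in range(1000):
    for p in range(3):
        for b in range(3):
            sm.put(Replica(f"topic-{t}", p, b), ReplicaState.ONLINE)
# The topic being deleted: 2 partitions x 3 replicas = 6 replicas.
for p in range(2):
    for b in range(3):
        sm.put(Replica("doomed", p, b), ReplicaState.DELETION_STARTED)

print(len(sm.replicas_in_state_scan("doomed", ReplicaState.DELETION_STARTED)))     # 6, after examining 9,006 entries
print(len(sm.replicas_in_state_indexed("doomed", ReplicaState.DELETION_STARTED)))  # 6, after examining 6 entries
```

Both methods return the same set; the difference is how many entries each one touches, which is what dominates when such a check runs once per queued event.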
>
> Total deletion time for a topic becomes long in the single-threaded
> controller processing model:
> Since the controller must process the queued
> TopicDeletionStopReplicaResponseReceived events sequentially, if the time
> cost to process one event is t, the total time to process the events for all
> N replicas of a topic is N * t.
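The N * t estimate can be made slightly more concrete with a rough cost model. The symbols below are assumptions introduced for illustration, not figures from the issue: R is the total number of replicas in the cluster, c is the cost of examining one replica, and k is the number of full scans performed while handling one event (at least one per inefficient function listed above). If each scan visits all R replicas, while an indexed lookup would visit only the topic's own N replicas, then:

```latex
\begin{aligned}
t  &\approx k\,c\,R, & T_{\text{scan}}  &= N\,t  \approx k\,c\,N R,\\
t' &\approx k\,c\,N, & T_{\text{index}} &= N\,t' \approx k\,c\,N^{2}.
\end{aligned}
```

For example, with the illustrative values N = 6 and R = 100,000, each scanning function examines about N R = 600,000 replicas over the whole deletion, versus N^2 = 36 under the indexed model. Under these assumptions, the total deletion time grows with the size of the whole cluster rather than with the size of the topic being deleted.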