[ 
https://issues.apache.org/jira/browse/KAFKA-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416778#comment-16416778
 ] 

Ashish Surana edited comment on KAFKA-6642 at 3/30/18 10:05 AM:
----------------------------------------------------------------

The current task assignor is sticky, and it can be made rack-aware with a few 
changes, so that copies of the same task (active and standby replicas) are 
placed on different racks as far as possible.

Approach
 # RACK_ID can be added as a setting in StreamsConfig and must be passed when 
starting the kafka-streams application. All processes with the same RACK_ID 
are considered to be in the same rack.
 # No changes to the partition-to-task assignment.
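
As a rough illustration of the first point, the sketch below reads a rack id from the properties an application is started with. The property name "rack.id" and the class and method names are assumptions made for this example; the proposal only names the setting RACK_ID and does not fix how it is spelled in StreamsConfig.

```java
import java.util.Optional;
import java.util.Properties;

public class RackIdConfigSketch {

    // Hypothetical property name; the proposal only calls the setting RACK_ID.
    static final String RACK_ID_CONFIG = "rack.id";

    /** Returns the configured rack id, or empty if none was passed. */
    static Optional<String> rackId(Properties props) {
        String value = props.getProperty(RACK_ID_CONFIG);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("application.id", "my-streams-app");
        props.setProperty(RACK_ID_CONFIG, "rack-1");

        System.out.println(rackId(props));            // Optional[rack-1]
        System.out.println(rackId(new Properties())); // Optional.empty
    }
}
```

Returning an Optional keeps "no RACK_ID passed" distinct from any real rack name, so the caller decides how to treat such instances.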

 

Assignment of tasks to instances:
 # We assign each active task to the instance where the same task was 
previously running as active.
 # Active tasks that could not be assigned in the first step are assigned to 
an instance where the same task was previously running as standby.
 # Active tasks that still could not be assigned are assigned to instances in 
round-robin fashion, starting from the least-loaded instance.
 # The three steps above are the same as in the StickyTaskAssignor: there is 
only one active task for any task_id, so no extra rack-aware logic is required 
when assigning active tasks.
 # Next we assign the standby tasks, placing them on instances in racks other 
than the one running the active task. If we run out of racks, we assign the 
remaining standby tasks in an already used rack, but on different instances.
 # This makes the assignment rack-aware on a best-effort basis only and 
guarantees nothing, because some racks might have no capacity left, or there 
might be more replicas than racks.
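
The standby step can be sketched roughly as below. This is only an illustration under stated assumptions, not the StickyTaskAssignor code: the class and method names are hypothetical, the active assignment is taken as given input, and picking the least-loaded eligible instance for standbys is an assumption (the proposal mentions least-loaded only for active tasks).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class RackAwareStandbySketch {

    /**
     * rackOf:      instance id -> rack id (iteration order breaks load ties)
     * activeOwner: task id -> instance id that runs the active task
     * Returns task id -> instances chosen for its standby tasks.
     */
    static Map<String, List<String>> assignStandbys(Map<String, String> rackOf,
                                                    Map<String, String> activeOwner,
                                                    int numStandbys) {
        // Load per instance, seeded with the active tasks already placed.
        Map<String, Integer> load = new HashMap<>();
        for (String instance : rackOf.keySet()) load.put(instance, 0);
        for (String owner : activeOwner.values()) load.merge(owner, 1, Integer::sum);

        Map<String, List<String>> standbys = new TreeMap<>();
        for (Map.Entry<String, String> e : new TreeMap<>(activeOwner).entrySet()) {
            Set<String> usedInstances = new HashSet<>();
            Set<String> usedRacks = new HashSet<>();
            usedInstances.add(e.getValue());
            usedRacks.add(rackOf.get(e.getValue()));

            List<String> chosen = new ArrayList<>();
            for (int n = 0; n < numStandbys; n++) {
                // Prefer a rack that holds no copy of this task yet.
                String pick = leastLoaded(rackOf, load, usedInstances, usedRacks);
                if (pick == null) {
                    // Out of racks: fall back to any other instance.
                    pick = leastLoaded(rackOf, load, usedInstances, new HashSet<>());
                }
                if (pick == null) break; // no instance left: best effort only
                load.merge(pick, 1, Integer::sum);
                usedInstances.add(pick);
                usedRacks.add(rackOf.get(pick));
                chosen.add(pick);
            }
            standbys.put(e.getKey(), chosen);
        }
        return standbys;
    }

    private static String leastLoaded(Map<String, String> rackOf,
                                      Map<String, Integer> load,
                                      Set<String> excludedInstances,
                                      Set<String> excludedRacks) {
        String best = null;
        for (String instance : rackOf.keySet()) {
            if (excludedInstances.contains(instance)) continue;
            if (excludedRacks.contains(rackOf.get(instance))) continue;
            if (best == null || load.get(instance) < load.get(best)) best = instance;
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, String> rackOf = new LinkedHashMap<>();
        rackOf.put("A", "r1");
        rackOf.put("B", "r1");
        rackOf.put("C", "r2");
        Map<String, String> active = new HashMap<>();
        active.put("0_0", "A");
        active.put("0_1", "C");
        System.out.println(assignStandbys(rackOf, active, 1)); // {0_0=[C], 0_1=[B]}
    }
}
```

With two standbys requested for a task whose active copy is on A, the sketch first picks C (the only other rack) and then falls back to B in the active task's rack, which matches the "same rack but different instance" fallback described above.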

Note: Here we are making the current StickyTaskAssignor rack-aware without 
changing its logic drastically.

Scenario#1
----
When RACK_ID is not passed to any of the stream instances.

In this case, assignment happens exactly as it does today in the 
StickyTaskAssignor. All instances for which RACK_ID is not passed are 
considered to be part of a single default rack.

 

Scenario#2
----
When RACK_ID is passed to all the stream instances.

In this case, every instance belongs to one of the given racks, and assignment 
is rack-aware as per the approach above.

 

Scenario#3
----
When RACK_ID is passed to some stream instances but not to all.

In this case, all the instances with a RACK_ID belong to the racks provided. 
All the instances for which RACK_ID was not passed are considered to be part 
of a single default rack.
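
The three scenarios reduce to one grouping rule, sketched below. The class and method names and the literal rack name "default-rack" are hypothetical; the proposal only says that instances without a RACK_ID share a single default rack.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RackGroupingSketch {

    // Hypothetical name for the rack shared by instances without a RACK_ID.
    static final String DEFAULT_RACK = "default-rack";

    /** Groups instance ids by rack; a null or blank rack id maps to DEFAULT_RACK. */
    static Map<String, List<String>> groupByRack(Map<String, String> rackIdByInstance) {
        Map<String, List<String>> racks = new TreeMap<>();
        for (Map.Entry<String, String> e : rackIdByInstance.entrySet()) {
            String rack = e.getValue();
            if (rack == null || rack.trim().isEmpty()) {
                rack = DEFAULT_RACK;
            }
            racks.computeIfAbsent(rack, r -> new ArrayList<>()).add(e.getKey());
        }
        return racks;
    }

    public static void main(String[] args) {
        // Scenario#3: some instances pass RACK_ID, others do not.
        Map<String, String> instances = new LinkedHashMap<>();
        instances.put("A", "rack-1");
        instances.put("B", null);
        instances.put("C", "rack-2");
        instances.put("D", null);
        System.out.println(groupByRack(instances));
        // {default-rack=[B, D], rack-1=[A], rack-2=[C]}
    }
}
```

When no instance passes a RACK_ID (Scenario#1), every instance lands in the single default rack, so the "other rack" preference never applies and the assignment behaves as it does today.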

 

Please let us know what you think about this approach.


was (Author: asurana):
The current task assignor is sticky, and it can be made rack-aware with a few 
changes, so that copies of the same task (active and standby replicas) are 
placed on different racks as far as possible.

Approach
 # RACK_ID can be added as a setting in StreamsConfig and must be passed when 
starting the kafka-streams application. All processes with the same RACK_ID 
are considered to be in the same rack.
 # No changes to the partition-to-task assignment.

 

Assignment of tasks to instances:
 # We assign each active task to the instance where the same task was 
previously running as active.
 # Active tasks that could not be assigned in the first step are assigned to 
an instance where the same task was previously running as standby.
 # Active tasks that still could not be assigned are assigned to instances in 
round-robin fashion, starting from the least-loaded instance.
 # The three steps above are the same as in the StickyTaskAssignor: there is 
only one active task for any task_id, so no extra rack-aware logic is required 
when assigning active tasks.
 # Next we assign the standby tasks, placing them on instances in racks other 
than the one running the active task. If we run out of racks, we assign the 
remaining standby tasks in an already used rack, but on different instances.
 # This makes the assignment rack-aware on a best-effort basis only and 
guarantees nothing, because some racks might have no capacity left, or there 
might be more replicas than racks.

Note: Here we are making the current StickyTaskAssignor rack-aware without 
changing its logic drastically. For example, the current assignor is sticky 
only for active tasks; standby task assignment is not sticky, because it does 
not look at where the task was previously assigned.

Scenario#1
----
When RACK_ID is not passed to any of the stream instances.

In this case, assignment happens exactly as it does today in the 
StickyTaskAssignor. All instances for which RACK_ID is not passed are 
considered to be part of a single default rack.

 

Scenario#2
----
When RACK_ID is passed to all the stream instances.

In this case, every instance belongs to one of the given racks, and assignment 
is rack-aware as per the approach above.

 

Scenario#3
----
When RACK_ID is passed to some stream instances but not to all.

In this case, all the instances with a RACK_ID belong to the racks provided. 
All the instances for which RACK_ID was not passed are considered to be part 
of a single default rack.

 

Please let us know what you think about this approach.

> Rack aware task assignment in kafka streams
> -------------------------------------------
>
>                 Key: KAFKA-6642
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6642
>             Project: Kafka
>          Issue Type: New Feature
>          Components: streams
>            Reporter: Ashish Surana
>            Priority: Major
>
> We have rack-aware replica assignment in the Kafka broker ([KIP-36 Rack aware 
> replica 
> assignment|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]).
> This request is to have a similar feature for Kafka Streams applications. 
> Standby task (standby replica) assignment in Kafka Streams is currently not 
> rack-aware, and this request is to make it rack-aware for better availability.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
