[ 
https://issues.apache.org/jira/browse/STORM-2539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Joseph Evans resolved STORM-2539.
----------------------------------------
    Resolution: Duplicate

[~siwoon.son],

There is a pull request up for something like this on STORM-2686, so I am 
marking this as a duplicate.

> Locality aware grouping, which is a new grouping method considering locality
> ----------------------------------------------------------------------------
>
>                 Key: STORM-2539
>                 URL: https://issues.apache.org/jira/browse/STORM-2539
>             Project: Apache Storm
>          Issue Type: New Feature
>          Components: storm-core
>            Reporter: Siwoon Son
>         Attachments: receiver-imbalance.png, sender-imbalance.png
>
>
> I’d like to propose a new grouping method. This method solves problems that 
> can occur with existing _Shuffle grouping_ method and _Local-or-shuffle_ 
> method.
> I was motivated by the following a question.
> bq. "Would not it be more efficient to transfer tuples to nearby nodes?"
> Among the Storm's grouping methods, _Shuffle grouping_ is a method of 
> randomly selecting a task of the next node. In this method, all nodes can 
> receive data evenly, but the same amount of data is transferred to a 
> relatively distant node, which can cause a high delay. To solve this problem, 
> Storm provides Local-or-shuffle grouping considering locality.
> _Local-or-shuffle grouping_ can minimize the delay by internally processing 
> data in the process if the task receiving the data is in the same process. 
> Local-or-shuffle grouping, however, considers *only locality*, which may 
> leads to two load balancing problems: +the sender imbalance+ and +the 
> receiver imbalance+. First, the sender imbalance problem is a load balancing 
> problem in which traffic is concentrated on a specific task because the 
> number of senders’ tasks and the number of nodes are not equal. Next, the 
> receiver imbalance problem is a load balancing problem in which traffic is 
> concentrated on a specific object because the number of receivers' tasks and 
> the number of nodes are not equal. If these problems occur, the tasks of 
> receivers will perform different amounts of work, resulting in performance 
> degradation and processing delays.
> |!sender-imbalance.png|width=400!|!receiver-imbalance.png|width=400!|
> |(a) Example of sender imbalance problem.|(b) Example of receiver imbalance 
> problem.|
> So, I propose locality aware grouping which can solve the load balancing 
> problem while considering the locality. Locality aware grouping is a method 
> of periodically calculating the ping response time between nodes and 
> transmitting more tuples probabilistically to nodes with low response time. I 
> implemented the proposed Locality aware grouping at 
> [https://github.com/dke-knu/i2am/tree/master/i2am-app/locality-aware-grouping].
>  LocalityAwareGrouping.java is a class that implements locality aware 
> grouping. LocalityGroupingTestTopology.java, TupleGeneratorSpout.java, and 
> PerformanceLoggingBolt.java are topology classes for testing this. 
> LocalityAwareGrouping$prepare() method reads the network information of each 
> node from the Zookeeper and activates the thread. This thread periodically 
> calculates the ping response time of each node. 
> LocalityAwareGrouping$chooseTasks() method selects a task by a higher 
> probability for the nodes with lower network response times.
> But the implementation is straightforward. To calculate the ping between 
> nodes, the network information of the nodes that tasks are performing is 
> needed. I got this information from Zookeeper using the Zookeeper$getData() 
> method. At this time, in order to create a Zookeeper object, I had no choice 
> but to receive the connection information of the Zookeeper from the user.
> Please let me know, if there is a way to get the network information of each 
> node without requiring additional parameters from the user and if you have 
> any additional comments.
> Thank you for reading.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to