Thank you for the clarification and for pointing me to the JIRA case:

"So in the above pull request, I implemented a solution in the way similar to watershed algorithm. It would firstly pick the slots from least used host, until that host uses the same number of slots as the second least used slots. Then it evenly picks slots from the 2 least used hosts until reaches the 3rd one. Iterating this way, we can get the best balanced assignment."
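The quoted "watershed" idea can be illustrated with a small sketch. This is not the code from the pull request; it is a greedy formulation that should produce the same kind of balanced result: each new worker goes to the host that currently has the fewest used slots and still has a free one. The function name `water_fill` and the dictionary-based inputs are assumptions made for illustration only.

```python
import heapq

def water_fill(used, free, num_workers):
    """Hypothetical helper: pick slots so used-slot counts level out.

    used: dict host -> number of slots already in use
    free: dict host -> number of free slots
    Returns dict host -> number of new slots picked on that host.
    """
    # Min-heap keyed on current used-slot count; only hosts with free slots.
    heap = [(used[h], h) for h in used if free.get(h, 0) > 0]
    heapq.heapify(heap)
    remaining = dict(free)
    picked = {h: 0 for h in used}
    for _ in range(num_workers):
        if not heap:
            break  # cluster has no free slots left
        count, host = heapq.heappop(heap)
        picked[host] += 1
        remaining[host] -= 1
        if remaining[host] > 0:
            # Host is now one slot "fuller"; it competes again at the new level.
            heapq.heappush(heap, (count + 1, host))
    return picked

# Host C is empty, B has 1 used slot, A has 3 used slots.
print(water_fill({"A": 3, "B": 1, "C": 0}, {"A": 1, "B": 3, "C": 4}, 4))
```

In this example the four new workers fill C and B up toward A's level and A receives none, which matches the "fill the lowest host first" behaviour the quote describes.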
This makes sense now. So mostly it's evenly spread across supervisors.

From: Grant Overby (groverby) [mailto:grove...@cisco.com]
Sent: Friday, March 13, 2015 11:31 AM
To: user@storm.apache.org
Subject: Re: configuring topology.workers

My understanding of the DefaultScheduler isn't complete, so take this with a grain of salt. As of 0.9.3, if I understand correctly:

The list of available slots is sorted first by the number of free slots on the same supervisor as the given slot, then by port number. If two or more slots tie on both criteria, the order between those slots is random. Slots are then consumed in this order.

This gives round-robin-style behavior. I say "style" because it deviates from a basic round-robin algorithm when topologies have been killed, freeing up other slots. Preference is given to these slots to keep supervisor slot use balanced. So if you already have topologies deployed and then kill some, such that Supervisor A has 1 free slot and Supervisor B has 3 free slots, and then deploy a topology with two workers, both will go to Supervisor B.

This isn't the usual case. You can generally expect the topology to be spread evenly, or at least somewhat evenly, over the supervisors. The primary goal is to balance the used slots between supervisors, with a secondary goal of spreading a topology across supervisors.

This may help: https://issues.apache.org/jira/browse/STORM-132

Grant Overby
Software Engineer
Cisco.com <http://www.cisco.com/>
grove...@cisco.com
Mobile: 865 724 4910

Think before you print.

This email may contain confidential and privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive for the recipient), please contact the sender by reply email and delete all copies of this message.
Please click here <http://www.cisco.com/web/about/doing_business/legal/cri/index.html> for Company Registration Information.

From: Srividhya Shanmugam <srividhyashanmu...@fico.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Thursday, March 12, 2015 at 8:49 PM
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: RE: configuring topology.workers

Thank you. To be more specific, I looked at the DefaultScheduler source. It gets the list of available slots as node+port combinations. So if one node has two slots and the next node has two more slots, will the workers for the topology be assigned to two slots on one node, or to one slot on each node? Appreciate your help.

From: Grant Overby (groverby) [mailto:grove...@cisco.com]
Sent: Thursday, March 12, 2015 6:38 PM
To: user@storm.apache.org
Subject: Re: configuring topology.workers

Topologies are assigned to supervisors in a round robin fashion by the default scheduler. You can provide other schedulers:
http://xumingming.sinaapp.com/885/twitter-storm-how-to-develop-a-pluggable-scheduler/
From: Srividhya Shanmugam <srividhyashanmu...@fico.com>
Reply-To: "user@storm.apache.org" <user@storm.apache.org>
Date: Thursday, March 12, 2015 at 5:26 PM
To: "user@storm.apache.org" <user@storm.apache.org>
Subject: configuring topology.workers

All,

I am trying to understand how setting topology.workers affects the distribution of work for a given topology. Say a Storm cluster has 2 supervisor nodes, and both nodes are configured with supervisor.slots.ports: 6700, 6701, 6702, 6703. If topology.workers is set to 2, will Storm run both worker processes on one node, or one worker process on each node? How does Storm determine this?

Thanks much,
Srividhya

This email and any files transmitted with it are confidential, proprietary and intended solely for the individual or entity to whom they are addressed. If you have received this email in error please delete it immediately.
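The slot ordering described in the thread for 0.9.3 can be sketched against the cluster from the original question. This is only a model of the description given in the replies (sort by free slots on the slot's supervisor, then by port, ties in random order), not the DefaultScheduler source; the names `order_free_slots` and `assign_workers` are hypothetical, and the descending sort on free-slot count is an assumption inferred from the "both will go to Supervisor B" example.

```python
import random
from collections import Counter

def order_free_slots(free_slots, rng=None):
    """free_slots: list of (supervisor, port) tuples for unused slots."""
    rng = rng or random.Random()
    free_per_supervisor = Counter(sup for sup, _ in free_slots)
    shuffled = list(free_slots)
    rng.shuffle(shuffled)  # gives a random order among slots that tie fully
    # Stable sort: supervisors with more free slots first, then lower port.
    return sorted(shuffled, key=lambda s: (-free_per_supervisor[s[0]], s[1]))

def assign_workers(free_slots, num_workers, rng=None):
    """Consume slots from the front of the ordered list."""
    return order_free_slots(free_slots, rng)[:num_workers]

ports = [6700, 6701, 6702, 6703]

# Fresh cluster from the question: 2 supervisors, 4 free slots each.
fresh = [(sup, p) for sup in ("A", "B") for p in ports]
print(assign_workers(fresh, 2))   # one slot on each supervisor, port 6700

# After some topologies were killed: A has 1 free slot, B has 3.
uneven = [("A", 6703)] + [("B", p) for p in (6701, 6702, 6703)]
print(assign_workers(uneven, 2))  # both slots land on supervisor B
```

Under this model, topology.workers set to 2 on the fresh cluster yields one worker per node, because both supervisors tie on free-slot count and the two port-6700 slots sort first. The uneven case reproduces the exception described in the replies.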