1. If you are consuming data from Kafka or any other receiver-based source, you can start 1-2 receivers per worker (assuming you have at least 4 cores per worker).
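A minimal sketch of point 1, assuming the receiver-based KafkaUtils.createStream API from Spark 1.x; the ZooKeeper quorum, consumer group, and topic name are placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("multi-receiver")
val ssc = new StreamingContext(conf, Seconds(10))

// One receiver per stream; each receiver occupies a core on some worker,
// so several streams spread the ingestion load across the cluster.
val numReceivers = 4
val streams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, "zkhost:2181", "my-group", Map("my-topic" -> 1))
}

// Union the per-receiver streams into a single DStream for processing.
val unified = ssc.union(streams)
unified.count().print()

ssc.start()
ssc.awaitTermination()
```

Keep numReceivers below the total cores available, since each receiver permanently holds one core that is then unavailable for batch processing.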
2. If you have a single receiver, or the source is a fileStream, then you can distribute the data across machines by doing a repartition.

Thanks
Best Regards

On Thu, Mar 19, 2015 at 11:32 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
> I am trying to understand how to load balance the incoming data to
> multiple spark streaming workers. Could somebody help me understand how I
> can distribute my incoming data from various sources such that incoming
> data is going to multiple spark streaming nodes? Is it done by spark client
> with help of spark master similar to hadoop client asking namenodes for the
> list of datanodes?
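A minimal sketch of the repartition approach from point 2, assuming a socket source as the single receiver; the hostname, port, and partition count are illustrative placeholders:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("repartition-example")
val ssc = new StreamingContext(conf, Seconds(10))

// A single receiver lands all incoming blocks on one executor...
val lines = ssc.socketTextStream("somehost", 9999)

// ...so repartition each batch to spread the records across the cluster
// before the expensive per-record work runs.
val balanced = lines.repartition(8)
balanced.flatMap(_.split(" ")).count().print()

ssc.start()
ssc.awaitTermination()
```

A reasonable starting point for the partition count is 2-3 times the total number of cores in the cluster, so every worker gets a share of each batch.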