Correction - group on partition id

On Apr 19, 2016 6:33 AM, "Navin Ipe" <navin....@searchlighthealth.com> wrote:
> I've seen this:
> http://storm.apache.org/releases/0.10.0/Understanding-the-parallelism-of-a-Storm-topology.html
> but it doesn't explain how workers coordinate with each other, so
> requesting a bit of clarity.
>
> I'm considering a situation where I have 2 million rows in MySQL or
> MongoDB.
>
> 1. I want to use a Spout to read the first 1000 rows and send the
> processed output to a Bolt. This happens in Worker1.
> 2. I want a different instance of the same Spout class to read the next
> 1000 rows in parallel with the Spout of 1, then send the processed
> output to an instance of the same Bolt used in 1. This happens in
> Worker2.
> 3. Same as 1 and 2, but it happens in Worker3.
> 4. I might set up 10 workers like this.
> 5. When all the Bolts in the workers are finished, they send their
> outputs to a single Bolt in Worker 11.
> 6. The Bolt in Worker 11 writes the processed value to a new MySQL table.
>
> *My confusion here is in how to make the database iterations happen
> batch by batch, in parallel*. Obviously the database connection would
> have to be made in some static class outside the workers, but if workers
> are started with just "conf.setNumWorkers(2);", then how do I tell the
> workers to iterate different rows of the database? Assuming that the
> workers are running in different machines.
>
> --
> Regards,
> Navin
>
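To expand on the "group on partition id" correction: each parallel spout task can claim a disjoint slice of the table by partitioning on the primary key, rather than by coordinating through a shared static class. Below is a minimal sketch of the partitioning arithmetic. The table name `source_rows` and the numeric key column `id` are hypothetical; in a real spout, `taskIndex` would come from `TopologyContext.getThisTaskIndex()` and `numTasks` from `context.getComponentTasks(context.getThisComponentId()).size()`, so no out-of-band coordination is needed even when tasks run on different machines.

```java
// Sketch: how each parallel spout task can read a disjoint partition of rows,
// batch by batch. Table/column names are hypothetical placeholders.
public class PartitionedQuery {

    // Each task reads only the rows whose id maps to its own partition
    // (id % numTasks == taskIndex), paging forward from the last id it saw.
    static String buildQuery(int taskIndex, int numTasks,
                             long lastSeenId, int batchSize) {
        return "SELECT * FROM source_rows"
             + " WHERE id % " + numTasks + " = " + taskIndex
             + " AND id > " + lastSeenId
             + " ORDER BY id LIMIT " + batchSize;
    }

    public static void main(String[] args) {
        // e.g. task 2 of 10, first batch of 1000 rows from its partition:
        System.out.println(buildQuery(2, 10, 0, 1000));
    }
}
```

Because the partition is derived from the task's own index, all ten spout instances issue non-overlapping queries without ever talking to each other; the per-task `lastSeenId` cursor replaces a shared row counter.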