True, but that's what I wanted to confirm by mentioning spouts S1 and S2. Will S1 and S2 each use their own hash-mod-n function, or is it a common function decided by Storm? (If anyone could point me to where this is handled in the Storm source code, I could try finding it myself too.)
On Thu, Aug 11, 2016 at 2:36 PM, Gireesh Ramji <gireeshra...@yahoo.com> wrote:

> It does not matter who hashes it; as long as they all use the same hash
> function, it will go to the same bolt.
>
> ------------------------------
> *From:* Navin Ipe <navin....@searchlighthealth.com>
> *To:* user@storm.apache.org
> *Sent:* Thursday, August 11, 2016 4:56 PM
> *Subject:* Re: How long until fields grouping gets overwhelmed with data?
>
> If the hash is dynamically computed and is stateless, then that brings up
> one more question.
>
> Let's say there are two spout classes S1 and S2. I create 10 tasks of S1
> and 10 tasks of S2.
> There are 10 tasks of a bolt B.
>
> S1 and S2 are fieldsGrouped with B.
>
> I receive data x in S1 and another data x in S2.
>
> If S1's emit of x goes to task1 of B, then will S2's emit of x also go to
> task1 of B?
>
> *Basically the question is:* Is the hash value decided by the spout or by
> Storm? Because if it is decided by the spout, then S1's emit of x can go to
> task 1 but S2's emit of x might go to some other task of the bolt, and that
> won't serve the purpose of someone who wants all x'es to go to one bolt.
>
> On Wed, Aug 10, 2016 at 8:58 PM, Navin Ipe <navin.ipe@searchlighthealth.com> wrote:
>
> Oh, that's good to know. I assume it works like this:
> https://en.wikipedia.org/wiki/Hash_function#Hashing_uniformly_distributed_data
>
> On Wed, Aug 10, 2016 at 6:23 PM, Nathan Leung <ncle...@gmail.com> wrote:
>
> It's based on a modulo of a hash of the field. The fields grouping is
> stateless.
>
> On Aug 10, 2016 8:18 AM, "Navin Ipe" <navin.ipe@searchlighthealth.com> wrote:
>
> Hi,
>
> For spouts to be able to continuously send a fields grouped tuple to the
> same bolt, it would have to store a key-value map something like this,
> right?
>
> field1023 ---> Bolt1
> field1343 ---> Bolt3
> field1629 ---> Bolt5
> field1726 ---> Bolt1
> field1481 ---> Bolt3
>
> So if my topology runs for a very long time and the spout generates many
> unique field values, won't this key-value map run out of memory eventually?
>
> Or is there a failsafe or a map limit that Storm has to handle this
> without crashing?
>
> If memory problems could happen, what would be an alternative way to solve
> this problem where many unique fields could get generated over time?
>
> --
> Regards,
> Navin
>
> --
> Regards,
> Navin
>
> --
> Regards,
> Navin

--
Regards,
Navin
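The "modulo of a hash of the field" answer in this thread can be illustrated with a minimal Java sketch. It is not Storm's source code: the class `FieldsGroupingSketch`, the method `chooseTask`, and the choice of `List.hashCode()` plus `Math.floorMod` are hypothetical, picked only to show the idea. The sketch makes two points from the thread. First, the target task is recomputed from the field values on every emit, so no field-to-bolt map is stored and memory does not grow with the number of unique values. Second, any sender that applies the same function with the same task count picks the same task, so S1's x and S2's x end up together.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of stateless fields-grouping routing.
 * Not Storm's actual implementation: it only shows "task = hash(grouping fields) mod numTasks".
 */
public class FieldsGroupingSketch {

    /**
     * Picks a target task index from the grouping-field values. Keeps no per-key state.
     * Assumes the values have value-based hashCode() (e.g. String, Integer), so equal
     * values hash the same in every sender.
     */
    static int chooseTask(List<?> groupingValues, int numTasks) {
        // List.hashCode() is specified to combine element hash codes deterministically.
        int hash = groupingValues.hashCode();
        // floorMod keeps the index in [0, numTasks) even when the hash is negative.
        return Math.floorMod(hash, numTasks);
    }

    public static void main(String[] args) {
        int numBoltTasks = 10; // "10 tasks of a bolt B" from the thread

        // Two different senders (S1 and S2) emit the same value x.
        int fromS1 = chooseTask(Arrays.asList("x"), numBoltTasks);
        int fromS2 = chooseTask(Arrays.asList("x"), numBoltTasks);
        // prints: S1's x -> task 1, S2's x -> task 1
        System.out.println("S1's x -> task " + fromS1 + ", S2's x -> task " + fromS2);

        // Route a million distinct values: only a fixed-size counter array is kept,
        // because the routing itself stores nothing per key.
        int[] tuplesPerTask = new int[numBoltTasks];
        for (int i = 0; i < 1_000_000; i++) {
            tuplesPerTask[chooseTask(Arrays.asList("field" + i), numBoltTasks)]++;
        }
        System.out.println("Tuples per task: " + Arrays.toString(tuplesPerTask));
    }
}
```

In this sketch both emits of `x` go to task 1, because `Arrays.asList("x").hashCode()` is 31 + 120 = 151 and 151 mod 10 is 1.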