I think these are the appropriate code pointers:

Original Clojure-based storm-core:
https://github.com/apache/storm/blob/v0.9.6/storm-core/src/clj/backtype/storm/daemon/executor.clj#L36-L39

New Java-based storm-core:
https://github.com/apache/storm/blob/3b1ab3d8a7da7ed35adc448d24f1f1ccb6c5ff27/storm-core/src/jvm/org/apache/storm/daemon/GrouperFactory.java#L157-L161

On Thu, Aug 11, 2016 at 2:57 AM, Navin Ipe <navin....@searchlighthealth.com> wrote:

> True, but that's what I wanted to confirm by mentioning spout S1 and S2.
> Will S1 and S2 use their own n mod hash functions or is it a common
> function decided by Storm? (If anyone could offer a pointer on where I
> could find this in the Storm source code, I could try finding it myself too)
>
> On Thu, Aug 11, 2016 at 2:36 PM, Gireesh Ramji <gireeshra...@yahoo.com>
> wrote:
>
>> It does not matter who hashes it as long as they all use the same hash
>> function it will go to the same bolt
>>
>> ------------------------------
>> *From:* Navin Ipe <navin....@searchlighthealth.com>
>> *To:* user@storm.apache.org
>> *Sent:* Thursday, August 11, 2016 4:56 PM
>> *Subject:* Re: How long until fields grouping gets overwhelmed with data?
>>
>> If the hash is dynamically computed and is stateless, then that brings up
>> one more question.
>>
>> Let's say there are two spout classes S1 and S2. I create 10 tasks of S1
>> and 10 tasks of S2.
>> There are 10 tasks of a bolt B.
>>
>> S1 and S2 are fieldsGrouped with B.
>>
>> I receive data x in S1 and another data x in S2.
>>
>> If S1's emit of x goes to task1 of B, then will S2's emit of x also go to
>> task1 of B?
>>
>> *Basically the question is: *Is the hash value decided by the Spout or
>> by Storm? Because if it is decided by the spout, then S1's emit of x can go
>> to task 1 but S2's emit of x might go to some other task of the bolt, and
>> that won't serve the purpose of someone who wants all x'es to go to one
>> bolt.
>>
>> On Wed, Aug 10, 2016 at 8:58 PM, Navin Ipe <navin.ipe@searchlighthealth.com> wrote:
>>
>> Oh that's good to know.
>> I assume it works like this:
>> https://en.wikipedia.org/wiki/Hash_function#Hashing_uniformly_distributed_data
>>
>> On Wed, Aug 10, 2016 at 6:23 PM, Nathan Leung <ncle...@gmail.com> wrote:
>>
>> It's based on a modulo of a hash of the field. The fields grouping is
>> stateless.
>>
>> On Aug 10, 2016 8:18 AM, "Navin Ipe" <navin.ipe@searchlighthealth.com> wrote:
>>
>> Hi,
>>
>> For spouts to be able to continuously send a fields grouped tuple to the
>> same bolt, it would have to store a key value map something like this,
>> right?
>>
>> field1023 ---> Bolt1
>> field1343 ---> Bolt3
>> field1629 ---> Bolt5
>> field1726 ---> Bolt1
>> field1481 ---> Bolt3
>>
>> So if my topology runs for a very long time and the spout generates many
>> unique field values, won't this key value map run out of memory eventually?
>>
>> OR is there a failsafe or a map limit that Storm has to handle this
>> without crashing?
>>
>> If memory problems could happen, what would be an alternative way to
>> solve this problem where many unique fields could get generated over time?
>>
>> --
>> Regards,
>> Navin
>>
>> --
>> Regards,
>> Navin
>>
>> --
>> Regards,
>> Navin
>
> --
> Regards,
> Navin
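
For anyone reading this thread later, the stateless routing described above (a hash of the grouping field, taken modulo the number of bolt tasks) can be sketched roughly as follows. This is only an illustration, not Storm's actual code: the function name `choose_task_index`, the constant `NUM_BOLT_TASKS`, and the use of CRC32 as the hash are all assumptions made for the example. The code pointers at the top of the thread show what Storm really does.

```python
import zlib


def choose_task_index(key_fields, num_tasks):
    """Map the grouping-field values to a task index in [0, num_tasks).

    Hypothetical helper: CRC32 over the repr of the values is a stand-in
    hash chosen because it is deterministic across processes. Storm uses
    its own hash of the selected field values.
    """
    digest = zlib.crc32(repr(tuple(key_fields)).encode("utf-8"))
    return digest % num_tasks


NUM_BOLT_TASKS = 10  # assumed: 10 tasks of bolt B, as in the question

# Two independent emitters (think spouts S1 and S2) seeing the same value x.
# Neither keeps any state; both simply apply the same function.
from_s1 = choose_task_index(["x"], NUM_BOLT_TASKS)
from_s2 = choose_task_index(["x"], NUM_BOLT_TASKS)
print(from_s1 == from_s2)  # True: same function, same input, same task

# Many unique keys can be routed without storing a key -> task map,
# so memory use does not grow with the number of distinct field values.
indices = {choose_task_index([f"field{i}"], NUM_BOLT_TASKS) for i in range(100_000)}
print(indices <= set(range(NUM_BOLT_TASKS)))  # True: every index is a valid task
```

The point of the sketch is that the target task is recomputed from the field value on every emit, so there is nothing to store per key, and any emitter applying the same function to the same value picks the same task.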