Thanks a lot for such a timely response. So, even if each bolt task resides in a different worker (a different server in our use case), the messages go to all 32 tasks, right?
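The shuffle-grouping behaviour described in Dima's reply can be illustrated with a small model. This is a minimal sketch under stated assumptions: `ShuffleGroupingSketch` is a hypothetical class, not Storm's own implementation, and the layout of 32 bolt task IDs split evenly over 4 workers is assumed from the question. The point it shows is that the choice is made over the full list of consumer task IDs, so the worker a task lives in never enters the decision, and each full pass over the shuffled list gives every task exactly one tuple.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical, simplified model of shuffle grouping (not Storm's own class).
 * It picks from the full list of consumer task IDs; worker placement is not an input.
 */
public class ShuffleGroupingSketch {
    private final List<Integer> targetTasks;
    private final Random random;
    private int next = 0;

    public ShuffleGroupingSketch(List<Integer> targetTasks, long seed) {
        this.targetTasks = new ArrayList<>(targetTasks);
        this.random = new Random(seed);
        Collections.shuffle(this.targetTasks, random);
    }

    /** Returns the task ID that the next tuple is sent to. */
    public int chooseTask() {
        if (next == targetTasks.size()) {
            // Every task has received one tuple in this pass; reshuffle for the next pass.
            Collections.shuffle(targetTasks, random);
            next = 0;
        }
        return targetTasks.get(next++);
    }

    public static void main(String[] args) {
        // Assumed layout: bolt task IDs 1..32, 8 per worker across 4 workers.
        List<Integer> tasks = new ArrayList<>();
        for (int id = 1; id <= 32; id++) {
            tasks.add(id);
        }
        ShuffleGroupingSketch grouping = new ShuffleGroupingSketch(tasks, 42L);

        int[] perWorker = new int[4];
        for (int i = 0; i < 320; i++) {
            int task = grouping.chooseTask();
            perWorker[(task - 1) / 8]++; // map task ID to its (assumed) worker
        }
        System.out.println(Arrays.toString(perWorker));
    }
}
```

Running `main` prints `[80, 80, 80, 80]`: 320 tuples are exactly 10 passes over the 32 tasks, so every task receives 10 tuples and every worker receives 80, regardless of which worker the emitting spout task runs in.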
Also, this leads me to another question (I think the answer is yes). Given that fields grouping guarantees that messages with the same field value go to the same task, does "the same task" mean the same task across all workers, or only within the same worker? For example, let's say there are two Kafka partitions (0 and 1), spout tasks s1 and s2, and bolt tasks b1, b2, b3 and b4 distributed across two workers, w1 and w2. So it looks like:

w1: partition_0 -> s1 -> b1 & b2
w2: partition_1 -> s2 -> b3 & b4

When two messages with the same field value, m1 and m2, are produced to Kafka partitions 0 and 1 respectively, do both m1 and m2 go to the same bolt task, say b3? Or does each get sent to the same bolt within its own worker (say b1 in w1 and b3 in w2)?

Simply put, does fields grouping group messages across the whole topology, or only within a single worker?

Thanks,
Baek

*Seungtack Baek | Precocity, LLC*
Tel/Direct: (972) 378-1030 | Mobile: (214) 477-5715
*seungtackb...@precocityllc.com <seungtackb...@precocityllc.com>* | www.precocityllc.com

This is the end of this message.

--

On Mon, Jun 8, 2015 at 12:17 AM, Dima Dragan <dima.dra...@belleron.net> wrote:

> Hi, Seungtack!
>
> Distribution of messages depends only on the grouping. In the case of
> "shuffle grouping", tuples are randomly distributed across all of the bolt's
> tasks in such a way that each task is guaranteed to get an equal number of
> tuples.
>
> Best regards,
> Dmytro Dragan
>
> On Jun 8, 2015 07:12, "Seungtack Baek" <seungtackb...@precocityllc.com>
> wrote:
>
>> Hi,
>>
>> I have read in the documentation that if you have more spout tasks than
>> Kafka partitions, the excess tasks will remain idle for the entire lifecycle
>> of the topology.
>>
>> Now, let's consider 4 spout tasks, 32 bolt tasks (of one class) in 4
>> workers (on 4 nodes) and 2 partitions in Kafka. Then 2 of the spout tasks
>> will each be assigned one Kafka partition and the other 2 will remain idle.
>> However, does that mean that only the bolts within the same worker will get
>> the messages (assuming shuffle grouping)? Or do the messages get emitted
>> to whatever bolt tasks are available, regardless of which worker they are in?
>>
>> Thanks,
>> Baek
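The two routing steps asked about in this thread (spout tasks picking up Kafka partitions, and fields grouping picking a bolt task) can be sketched as a small model. This is a minimal sketch under loudly stated assumptions: `RoutingSketch`, `partitionsForSpoutTask` and `fieldsGroupingTarget` are hypothetical names, the partition assignment is assumed to be round-robin (partition p goes to spout task p mod the number of spout tasks), and fields grouping is modelled as a hash of the grouping field values modulo the number of consumer tasks, indexed into the topology-wide, sorted list of task IDs. Storm's exact hash function may differ; what the model illustrates is that the emitting worker and the source partition are not inputs to the choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model; not Storm's or storm-kafka's actual code. */
public class RoutingSketch {

    /** Assumed round-robin: partition p is read by spout task (p mod numSpoutTasks). */
    public static List<Integer> partitionsForSpoutTask(int taskIndex, int numSpoutTasks, int numPartitions) {
        List<Integer> assigned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += numSpoutTasks) {
            assigned.add(p);
        }
        return assigned; // an empty list means this spout task stays idle
    }

    /**
     * Assumed fields grouping: hash(grouping values) mod (number of consumer tasks),
     * used as an index into the topology-wide list of consumer task IDs.
     */
    public static int fieldsGroupingTarget(List<?> groupingValues, List<Integer> allBoltTasks) {
        int index = Math.floorMod(groupingValues.hashCode(), allBoltTasks.size());
        return allBoltTasks.get(index);
    }

    public static void main(String[] args) {
        // 4 spout tasks and 2 Kafka partitions: tasks 0 and 1 get one partition each.
        for (int t = 0; t < 4; t++) {
            System.out.println("spout task " + t + " -> partitions " + partitionsForSpoutTask(t, 4, 2));
        }

        // Bolt tasks b1..b4 (IDs 1..4); w1 hosts b1, b2 and w2 hosts b3, b4.
        List<Integer> boltTasks = Arrays.asList(1, 2, 3, 4);
        // m1 is read by s1 (w1, partition 0) and m2 by s2 (w2, partition 1),
        // but neither fact is passed in: only the field value matters.
        int targetForM1 = fieldsGroupingTarget(Arrays.asList("user-42"), boltTasks);
        int targetForM2 = fieldsGroupingTarget(Arrays.asList("user-42"), boltTasks);
        System.out.println("m1 -> b" + targetForM1 + ", m2 -> b" + targetForM2);
    }
}
```

In this model spout tasks 2 and 3 print empty partition lists (idle), and m1 and m2 always land on the same bolt task, whichever worker emitted them.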