Thanks a lot for such a timely response.

So, even if each bolt task resides in a different worker (a different server
in our use case), the messages are distributed across all 32 tasks, right?

This leads me to another question (I think the answer is yes).
Given that fields grouping guarantees that messages with the same field value
go to the same task, does "the same task" mean the same task across all
workers, or only within the same worker?

For example, let's say we have two Kafka partitions (0 and 1), spout tasks s1
and s2, and bolt tasks b1, b2, b3 and b4, distributed across two workers w1
and w2. So it looks like:
w1
 - partition_0 -> s1 -> b1 & b2
w2
 - partition_1 -> s2 -> b3 & b4

When two messages with the same field value, m1 and m2, are produced to Kafka
partitions 0 and 1, respectively, do both m1 and m2 go to the same bolt task,
say b3? Or does each get sent to a fixed bolt task within its own worker (say
b1 in w1 and b3 in w2)?

Simply put, does fields grouping group messages across the whole topology, or
only within a single worker?
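
To make the question concrete, here is a minimal Java sketch of the behaviour
I would expect if fields grouping is topology-wide. It is not Storm's actual
implementation: the class name, the task list, and the hash-modulo rule are
all assumptions for illustration. The point is only that the target task is
picked from the field value alone, not from the spout or worker that emitted
the tuple.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Storm's actual code: models fields grouping as
// "hash of the grouping field value modulo the number of consumer tasks".
class FieldsGroupingSketch {

    // All tasks of the consuming bolt, topology-wide
    // (b1 and b2 run on w1; b3 and b4 run on w2).
    static final List<String> BOLT_TASKS = Arrays.asList("b1", "b2", "b3", "b4");

    // Picks a target task from the field value alone; the emitting spout
    // and its worker are deliberately not inputs.
    static String chooseTask(Object fieldValue) {
        int hash = Arrays.hashCode(new Object[] {fieldValue});
        int index = Math.floorMod(hash, BOLT_TASKS.size());
        return BOLT_TASKS.get(index);
    }

    public static void main(String[] args) {
        String m1Target = chooseTask("user-42"); // m1, read from partition 0 by s1 on w1
        String m2Target = chooseTask("user-42"); // m2, read from partition 1 by s2 on w2
        System.out.println("m1 -> " + m1Target + ", m2 -> " + m2Target);
    }
}
```

Under this assumed model, m1 and m2 always land on the same one of the four
bolt tasks, whichever worker hosts it. If grouping were instead done per
worker, the task list would have to differ between w1 and w2.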

Thanks,
Baek





*Seungtack Baek | Precocity, LLC*

Tel/Direct: (972) 378-1030 | Mobile: (214) 477-5715

*seungtackb...@precocityllc.com <seungtackb...@precocityllc.com>* |
www.precocityllc.com


This is the end of this message.

--

On Mon, Jun 8, 2015 at 12:17 AM, Dima Dragan <dima.dra...@belleron.net>
wrote:

> Hi, Seungtack!
>
> The distribution of messages depends only on the grouping. In the case of
> "shuffle grouping", tuples are randomly distributed across all of the
> bolt's tasks in such a way that each bolt task is guaranteed to get an
> equal number of tuples.
>
> Best regards,
> Dmytro Dragan
> On Jun 8, 2015 07:12, "Seungtack Baek" <seungtackb...@precocityllc.com>
> wrote:
>
>> Hi,
>>
>> I have read in the documentation that if you have more spout tasks than
>> Kafka partitions, the excess tasks will remain idle for the entire
>> lifecycle of the topology.
>>
>> Now, let's consider 4 spout tasks and 32 bolt tasks (of one class) in 4
>> workers (on 4 nodes), and 2 partitions in Kafka. Then 2 spout tasks will
>> each be assigned one Kafka partition and the other 2 will remain idle.
>> However, does that mean that only the bolts within the same worker will get
>> the messages (assuming shuffle grouping)? Or do the messages get emitted
>> to whatever bolt task is available, regardless of the worker?
>>
>> Thanks,
>> Baek
>>
>>
>> On Sun, Jun 7, 2015 at 10:12 PM, Seungtack Baek <
>> seungtackb...@precocityllc.com> wrote:
>>
>>
>>
