Re: Question about OpaqueTridentKafkaSpout

Abhishek Bhattacharjee Wed, 14 May 2014 17:51:06 -0700

I understand that. But the first post in the thread describes a batch with
tuples t1, t2, ... from partitions p1, p2, ....
I think it is not possible for tuples from different partitions to form a
single batch.



On Wed, May 7, 2014 at 8:23 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:

> It all depends on the nature of the spout.
>
> With a transactional spout, batches are always the same, even if replayed.
>
> With an opaque spout, batches can change. But you have the guarantee that
> a tuple will only ever be processed successfully in a single batch. If a
> tuple fails in one batch, it could succeed in another.
>
> -Taylor
>
> On May 6, 2014, at 8:19 PM, Ashok Gupta <gupta.ashok2...@gmail.com> wrote:
>
> I think it can. That is where the coordinator comes into the picture.
> The coordinator defines the parameters of a batch, and the emitters do the
> job of emitting the sub-portions of the batch.
>
> On Mon, May 5, 2014 at 12:50 PM, Abhishek Bhattacharjee <
> abhishek.bhattacharje...@gmail.com> wrote:
>
>> Are you sure that a batch can consist of tuples from different partitions?
>> I am just asking because I am not sure. If it can, then your question seems
>> valid; otherwise it no longer applies :-)
>>
>>
>> On Fri, May 2, 2014 at 7:42 AM, Ashok Gupta <gupta.ashok2...@gmail.com> wrote:
>>
>>>
>>> Hi,
>>>
>>>  I have a theoretical question about the guarantees that
>>> OpaqueTridentKafkaSpout provides. I would like to use an example to
>>> illustrate my question.
>>>
>>>  Suppose a batch with txId 10 has tuples t1, t2, t3, t4, which come from
>>> Kafka partitions p1, p2, p3, p4 respectively. When this batch is played
>>> for the very first time, processing fails; however, the commit for tuple
>>> t3 happens in the database, while it does not happen for tuples t1, t2,
>>> t4. Since the batch failed, the metadata in ZooKeeper is not expected to
>>> be updated, i.e. the offsets for p1, p2, p3, p4 will not be treated as
>>> committed. The batch is expected to be replayed; however, suppose that
>>> before it is replayed, Kafka partition p3 goes down. What happens now? I
>>> understand that another batch with the same transaction id containing t1,
>>> t2, t4 may be replayed, but since p3 is down, t3 won't be replayed. And
>>> since t3 is not replayed, even if the replayed batch succeeds, the offset
>>> for p3 doesn't get updated in ZooKeeper. That is all fine as far as fault
>>> tolerance and opaque behavior are concerned.
>>>
>>>  My concern is about what happens when partition p3 is back up and the
>>> spout starts reading data from the last offset it committed successfully.
>>> Tuple t3 will be read from partition p3 again, and it is certain to be in
>>> a batch with some txId > 10 (say 19), so it will be applied to the state
>>> again. This appears to violate exactly-once semantics.
>>>
>>>  Is the concern genuine, or am I missing something?
>>> Regards
>>> --
>>> Ashok Gupta,
>>> (+1) 361-522-2172
>>> San Jose, CA
>>>
>>
>>
>>
>> --
>> *Abhishek Bhattacharjee*
>> *Pune Institute of Computer Technology*
>>
>
>
>
> --
> Ashok Gupta,
> (+1) 361-522-2172
> San Jose, CA
>
>
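
To make the disagreement about batch composition concrete, here is a toy Java
model of the coordinator/emitter split as Ashok describes it, replaying the
scenario from the first post. It is only a sketch under stated assumptions:
all class and method names are hypothetical (this is not the storm-kafka API),
each reachable partition contributes a single message per batch, and the
`committed` map merely stands in for the ZooKeeper offset metadata.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, NOT the storm-kafka API.
// A batch is identified by a txid (chosen by the "coordinator") and is assembled
// from one sub-batch per reachable partition (the "emitters"). Per-partition offsets
// stand in for the ZooKeeper metadata and advance only when the whole batch succeeds.
public class BatchModel {
    private final Map<String, List<String>> partitions = new LinkedHashMap<>();
    private final Map<String, Integer> committed = new LinkedHashMap<>();

    void addPartition(String name, List<String> messages) {
        partitions.put(name, messages);
        committed.put(name, 0); // nothing committed yet
    }

    // Simplification: each reachable partition contributes the single message
    // at its last committed offset; partitions in `down` contribute nothing.
    Map<String, String> emitBatch(Set<String> down) {
        Map<String, String> batch = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> p : partitions.entrySet()) {
            int offset = committed.get(p.getKey());
            if (!down.contains(p.getKey()) && offset < p.getValue().size()) {
                batch.put(p.getKey(), p.getValue().get(offset));
            }
        }
        return batch;
    }

    // Called only when a batch succeeds: advance each contributing partition.
    void commit(Map<String, String> batch) {
        for (String partition : batch.keySet()) {
            committed.merge(partition, 1, Integer::sum);
        }
    }

    // Replays the scenario from the first post; returns the tuples in each batch.
    static List<List<String>> runScenario() {
        BatchModel m = new BatchModel();
        m.addPartition("p1", List.of("t1"));
        m.addPartition("p2", List.of("t2"));
        m.addPartition("p3", List.of("t3"));
        m.addPartition("p4", List.of("t4"));

        Map<String, String> attempt1 = m.emitBatch(Set.of());   // txid 10: fails, no commit
        Map<String, String> replay = m.emitBatch(Set.of("p3")); // txid 10 replayed, p3 down
        m.commit(replay);                                       // the replay succeeds
        Map<String, String> later = m.emitBatch(Set.of());      // later txid, p3 back up

        List<List<String>> result = new ArrayList<>();
        for (Map<String, String> b : List.of(attempt1, replay, later)) {
            result.add(new ArrayList<>(b.values()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // [[t1, t2, t3, t4], [t1, t2, t4], [t3]]
    }
}
```

The model reproduces only the offset bookkeeping described in the first post
(t3 is read again from p3's last committed offset); it says nothing about how
the state layer treats t3's earlier partial write.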

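Taylor's point that opaque batches can change on replay is why the Trident
documentation pairs opaque spouts with opaque state, which stores the previous
value next to the current value and the txid that last wrote it. Below is a
simplified single-counter sketch of that update rule; the class name is
hypothetical and this is not Trident's own implementation. It only shows how a
replayed txid with different contents overwrites rather than accumulates; it
does not by itself settle the scenario in the first post.

```java
// Simplified sketch (hypothetical class, not Trident's own type) of the
// opaque-state bookkeeping: current value, previous value, and the txid that
// last updated the value.
public class OpaqueCounter {
    private Long lastTxid = null; // txid of the batch that last updated this value
    private long prev = 0;        // value before the batch identified by lastTxid
    private long curr = 0;        // value including the batch identified by lastTxid

    void apply(long txid, long delta) {
        if (lastTxid != null && lastTxid == txid) {
            // Same txid again: the batch may have changed on replay, so
            // recompute from the value as it was before that txid.
            curr = prev + delta;
        } else {
            // New txid: the old current value becomes the baseline.
            prev = curr;
            curr = curr + delta;
            lastTxid = txid;
        }
    }

    long value() {
        return curr;
    }

    public static void main(String[] args) {
        OpaqueCounter c = new OpaqueCounter();
        c.apply(10, 4); // first attempt of txid 10 counts 4 tuples
        c.apply(10, 3); // txid 10 replayed with only 3 tuples
        c.apply(11, 1); // next batch
        System.out.println(c.value()); // 4
    }
}
```

With a transactional spout the replayed batch is identical, so a simpler rule
(skip the update when the stored txid matches) would suffice; the extra `prev`
field is what copes with a batch whose contents differ between attempts.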

-- 
*Abhishek Bhattacharjee*
*Pune Institute of Computer Technology*
