I understand that. But if you check the first post in the thread, it supposes that a batch has tuples t1, t2, ... from partitions p1, p2, .... I think it is not possible for tuples from different partitions to form a single batch.
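To make the disagreement concrete, here is a toy Java model of the coordinator/emitter split described in the 6 May reply quoted below. It assumes, as that reply does, that one transaction id covers slices emitted by several emitter tasks, each reading only its own partitions. `Coordinator`, `Emitter`, `emitSlice`, and `buildBatch` are hypothetical names, not Storm's API, and the model leaves out offsets, failures, and replays.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT Storm's API: every class and method name here is hypothetical.
// A coordinator only fixes the transaction id of a batch; each emitter task
// contributes the part of that batch read from the partitions it owns.
public class PartitionedBatchModel {

    // A tuple id together with the partition it was read from.
    static final class Tagged {
        final String tuple;
        final String partition;

        Tagged(String tuple, String partition) {
            this.tuple = tuple;
            this.partition = partition;
        }

        @Override
        public String toString() {
            return tuple + "@" + partition;
        }
    }

    // Coordinator: decides the next transaction id, never the tuples themselves.
    static final class Coordinator {
        private long nextTxid;

        Coordinator(long firstTxid) {
            this.nextTxid = firstTxid;
        }

        long beginBatch() {
            return nextTxid++;
        }
    }

    // Emitter task: owns some partitions, each a queue of not-yet-emitted tuples.
    static final class Emitter {
        private final Map<String, List<String>> partitions = new LinkedHashMap<>();

        Emitter own(String partition, String... tuples) {
            partitions.put(partition, new ArrayList<>(List.of(tuples)));
            return this;
        }

        // This task's slice of the current batch: the next pending tuple of every owned partition.
        List<Tagged> emitSlice() {
            List<Tagged> slice = new ArrayList<>();
            for (Map.Entry<String, List<String>> e : partitions.entrySet()) {
                if (!e.getValue().isEmpty()) {
                    slice.add(new Tagged(e.getValue().remove(0), e.getKey()));
                }
            }
            return slice;
        }
    }

    // One batch is the union of the slices emitted by all emitter tasks.
    static List<Tagged> buildBatch(List<Emitter> emitters) {
        List<Tagged> batch = new ArrayList<>();
        for (Emitter emitter : emitters) {
            batch.addAll(emitter.emitSlice());
        }
        return batch;
    }

    public static void main(String[] args) {
        Coordinator coordinator = new Coordinator(10);
        List<Emitter> emitters = List.of(
                new Emitter().own("p1", "t1").own("p2", "t2"),
                new Emitter().own("p3", "t3").own("p4", "t4"));
        long txid = coordinator.beginBatch();
        System.out.println("batch " + txid + ": " + buildBatch(emitters));
    }
}
```

Running `main` prints `batch 10: [t1@p1, t2@p2, t3@p3, t4@p4]`. The sketch only illustrates the claim made in the 6 May reply; it is not taken from Storm's source.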
On Wed, May 7, 2014 at 8:23 AM, P. Taylor Goetz <ptgo...@gmail.com> wrote:
> It all depends on the nature of the spout.
>
> With a transactional spout, batches are always the same, even if replayed.
>
> With an opaque spout, batches can change. But you have the guarantee that
> a tuple will only ever be processed successfully in a single batch. If a
> tuple fails in one batch, it could succeed in another.
>
> -Taylor
>
> On May 6, 2014, at 8:19 PM, Ashok Gupta <gupta.ashok2...@gmail.com> wrote:
>
>> I think it can. That is where the coordinator comes into the picture. The
>> coordinator defines the parameters of a batch, and the emitters do the job
>> of emitting the sub-portions of the batch.
>>
>> On Mon, May 5, 2014 at 12:50 PM, Abhishek Bhattacharjee <abhishek.bhattacharje...@gmail.com> wrote:
>>
>>> Are you sure that a batch can consist of tuples from different partitions?
>>> I am just asking because I am not sure. If it can, then your question seems
>>> valid; otherwise it no longer applies :-)
>>>
>>> On Fri, May 2, 2014 at 7:42 AM, Ashok Gupta <gupta.ashok2...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a theoretical question about the guarantees that
>>>> OpaqueTridentKafkaSpout provides. I would like to use an example to
>>>> illustrate it.
>>>>
>>>> Suppose a batch with txId 10 has tuples t1, t2, t3, t4, which come from
>>>> Kafka partitions p1, p2, p3, p4 respectively. The first time this batch is
>>>> played, processing fails, but the commit for tuple t3 reaches the database
>>>> while the commits for t1, t2, and t4 do not. Since the batch failed, the
>>>> metadata in ZooKeeper is not expected to be updated, i.e. the offsets for
>>>> p1, p2, p3, p4 will not be treated as committed. The batch is expected to
>>>> be replayed; however, suppose that Kafka partition p3 goes down before the
>>>> replay. What happens now?
>>>>
>>>> I understand that another batch with the same transaction id, containing
>>>> t1, t2, t4, may be replayed; however, since p3 is down, t3 won't be
>>>> replayed. Because t3 is not replayed, even if the batch succeeds on replay,
>>>> the offsets for p3 don't get updated in ZooKeeper. That is all fine as far
>>>> as fault tolerance and opaque behavior are concerned.
>>>>
>>>> My concern is what happens when partition p3 is back up and the spout
>>>> starts reading from the last offset it committed successfully. Tuple t3
>>>> will be read from p3 again, and it will certainly land in a batch with
>>>> some txId > 10 (say 19), so it will be applied to the state a second time.
>>>> This appears to violate exactly-once semantics.
>>>>
>>>> Is the concern genuine, or am I missing something?
>>>>
>>>> Regards
>>>> --
>>>> Ashok Gupta,
>>>> (+1) 361-522-2172
>>>> San Jose, CA
>>>
>>> --
>>> *Abhishek Bhattacharjee*
>>> *Pune Institute of Computer Technology*
>>
>> --
>> Ashok Gupta,
>> (+1) 361-522-2172
>> San Jose, CA

--
*Abhishek Bhattacharjee*
*Pune Institute of Computer Technology*
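The scenario in the original 2 May question can be traced through a simplified model of opaque state, in which each key stores the txid of the batch that last wrote it together with the value before (`prev`) and after (`curr`) that batch. Trident's opaque state is built on this txid/prev/curr idea, but the class below is a hypothetical sketch, not Storm's implementation. It assumes each tuple increments a counter keyed by its own id, and it ignores spout metadata entirely, so it shows only the arithmetic behind the concern, not how Trident actually resolves it.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of opaque-transactional state (not Storm's own class).
// Each key stores the txid of the batch that last wrote it, plus the value before
// (prev) and after (curr) that batch. Spout metadata and offsets are not modeled.
public class OpaqueReplayModel {

    // Immutable snapshot of one key's opaque state.
    static final class Cell {
        final long txid;
        final long prev;
        final long curr;

        Cell(long txid, long prev, long curr) {
            this.txid = txid;
            this.prev = prev;
            this.curr = curr;
        }
    }

    private final Map<String, Cell> state = new HashMap<>();

    // Applies batch `txid`'s aggregated contribution `delta` to `key`
    // (assumption: at most one call per key per batch, as after a per-batch aggregation).
    void apply(long txid, String key, long delta) {
        Cell old = state.getOrDefault(key, new Cell(-1, 0, 0));
        if (old.txid == txid) {
            // Same batch written before (a replay): rebuild from `prev`,
            // discarding whatever the earlier attempt stored in `curr`.
            state.put(key, new Cell(txid, old.prev, old.prev + delta));
        } else if (old.txid < txid) {
            // A later batch: the stored `curr` is taken to be the committed value.
            state.put(key, new Cell(txid, old.curr, old.curr + delta));
        } else {
            throw new IllegalStateException(
                    "batch " + txid + " is behind the state's batch " + old.txid);
        }
    }

    // Current value of `key` (0 if it has never been written).
    long get(String key) {
        Cell cell = state.get(key);
        return cell == null ? 0 : cell.curr;
    }

    public static void main(String[] args) {
        OpaqueReplayModel model = new OpaqueReplayModel();
        // Batch 10, first attempt: only t3's update reaches the store before the failure.
        model.apply(10, "t3", 1);
        // Batch 10 replayed while p3 is down: t1, t2 and t4 are written; t3's key is untouched.
        model.apply(10, "t1", 1);
        model.apply(10, "t2", 1);
        model.apply(10, "t4", 1);
        // p3 recovers; t3 is read again from the last committed offset, inside batch 19.
        model.apply(19, "t3", 1);
        System.out.println("t3 -> " + model.get("t3"));
    }
}
```

Under these assumptions `main` prints `t3 -> 2`: the replay of batch 10 never rewrites t3's key, so the value left by the failed attempt is treated as committed when batch 19 arrives, which is the double application the question describes. Had the replay written t3's key, the `old.txid == txid` branch would have rebuilt it from `prev`. The model says nothing about how the real spout's metadata handles a partition that is down during a replay.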