Awesome! I do not know whether one exists (or whether you would have time to explain it here), but would you have a link that describes the design (or the code) behind this, especially with multiple ackers?
In any case, many thanks for the explanation.

On Wed, May 18, 2016 at 11:45 AM, Arun Mahadevan <ar...@apache.org> wrote:

> Hi Olivier,
>
> > Are you talking about the $checkpoint spout or MySpout (with the
> > offset)?
>
> I was referring to the user spout (MySpout in this case).
>
> > Does it mean all the emitted tuples are acked only when the
> > $checkpoint.txId event is acked (and so $checkpoint.txId acts as a
> > barrier)? Which means when tuples are acked (in MySpout), I am sure a
> > state has been checkpointed.
>
> Yes, that is right. So when tuples are acked in MySpout, you can move your
> offsets.
>
> > Does it mean my checkpoint interval must be lower than the tuple timeout
> > (TOPOLOGY_MESSAGE_TIMEOUT)?
>
> Right, if you change the defaults it should be lower than the message
> timeout. The default checkpoint interval is 1s and the message timeout is 30s.
>
> Thanks,
> Arun
>
>
> From: Olivier Mallassi
> Reply-To: "user@storm.apache.org"
> Date: Wednesday, May 18, 2016 at 12:57 AM
> To: "user@storm.apache.org"
> Subject: Re: State Checkpointing & spout state
>
> Hi Arun,
>
> Thank you for your answer.
> I may be able to deal with "at least once" with idempotency and a stateful
> bolt (I still need to look at it in detail), but being able to checkpoint the
> state of the spout would be really helpful ;)
>
> Anyway, I may have missed something in the doc, but I just need to clarify
> your phrase "It checkpoints the states of all the bolts and once that’s
> successful, the tuples emitted by the spout are acked".
>
> Are you talking about the $checkpoint spout or MySpout (with the offset)?
> Does it mean all the emitted tuples are acked only when the
> $checkpoint.txId event is acked (and so $checkpoint.txId acts as a barrier)?
> Which means when tuples are acked (in MySpout), I am sure a state has been
> checkpointed.
> Does it mean my checkpoint interval must be lower than the tuple timeout
> (TOPOLOGY_MESSAGE_TIMEOUT)?
>
> Many thanks for your help.
>
> Olivier.
>
> On Tue, May 17, 2016 at 2:12 PM, Arun Mahadevan <ar...@apache.org> wrote:
>
>> Hi Olivier,
>>
>> The state checkpointing currently does not checkpoint the state of the
>> spout. It checkpoints the states of all the bolts and once that’s
>> successful, the tuples emitted by the spout are acked. So currently it
>> provides an at-least-once guarantee.
>>
>> In the ack method of the spout, you can update your offsets.
>>
>> In the future we will extend state checkpointing to checkpoint the state of
>> the spout.
>>
>> Thanks,
>> Arun
>>
>>
>> From: Olivier Mallassi
>> Reply-To: "user@storm.apache.org"
>> Date: Tuesday, May 17, 2016 at 5:29 PM
>> To: "user@storm.apache.org"
>> Subject: State Checkpointing & spout state
>>
>> Hello,
>>
>> I would need to use the state checkpointing for recovery (btw, very
>> useful feature). I am facing an issue regarding how to checkpoint the state
>> of my spout (not the checkpoint spout) as part of the "transaction".
>>
>> My spout is reading from Kafka (or equivalent) and so keeps an offset of
>> the last read events.
>> It keeps track of
>> - the last read offset
>> - the emitted and acknowledged events (with their associated offsets)
>> - the emitted and unacked events (so they can be replayed)
>>
>> With state checkpointing, the bolt states will be kept, but how can I keep
>> the state of the source? How can I ensure the spout replays events from
>> the offset that matches the checkpoint (or txid)?
>> Are there any guarantees in Storm that the acks are received in the order
>> they are sent?
>>
>> Cheers.
>>
>> olivier.
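
The advice in the thread ("in the ack method of the spout, you can update your offsets") could be sketched roughly as below. This is only an illustration under assumptions: `OffsetTracker` and its method names are hypothetical and not part of the Storm API, the offset itself is assumed to be used as the tuple's message id, and the sketch does not assume that acks arrive in emit order, since the thread leaves that question open. The offset it reports as safe to persist is the lowest one still waiting for an ack.

```java
import java.util.TreeSet;

/**
 * Hypothetical helper (not a Storm class): tracks emitted-but-unacked
 * offsets so a spout can decide which offset is safe to persist.
 */
class OffsetTracker {
    // Offsets emitted but not yet acked, kept sorted so the lowest is cheap to find.
    private final TreeSet<Long> pending = new TreeSet<>();
    // Offset that follows the highest offset emitted so far.
    private long nextOffset;

    OffsetTracker(long startOffset) {
        this.nextOffset = startOffset;
    }

    /** Assumed to be called from nextTuple() right after emitting the event at this offset. */
    void emitted(long offset) {
        pending.add(offset);
        nextOffset = Math.max(nextOffset, offset + 1);
    }

    /** Assumed to be called from the spout's ack method with the offset used as message id. */
    void acked(long offset) {
        pending.remove(offset);
    }

    /**
     * Every offset strictly below the returned value has been acked.
     * A failed or timed-out offset stays pending, so it keeps holding
     * this value back until it is replayed and acked.
     */
    long committableOffset() {
        return pending.isEmpty() ? nextOffset : pending.first();
    }
}
```

For example, after emitting offsets 100, 101 and 102 and receiving an ack for 101 only, `committableOffset()` still returns 100; it advances to 102 once 100 is acked, and to 103 once 102 is acked as well. Restarting from the persisted value can replay events that were already processed, which is consistent with the at-least-once guarantee described above.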