Hi Thomas, I have a question, so when we are using *KafkaSinglePortExactlyOnceOutputOperator* to write results into maprstream topic will it be able to read messgaes from the previous operator ?
Thanks Jaspal On Thu, Oct 6, 2016 at 6:28 PM, Thomas Weise <[email protected]> wrote: > For recovery you need to set the window data manager like so: > > https://github.com/DataTorrent/examples/blob/ > master/tutorials/exactly-once/src/main/java/com/example/ > myapexapp/Application.java#L33 > > That will also apply to stateful restart of the entire application > (relaunch from previous instance's checkpointed state). > > For cold restart, you would need to consider the property you mention and > decide what is applicable to your use case. > > Thomas > > > On Thu, Oct 6, 2016 at 4:16 PM, Jaspal Singh <[email protected]> > wrote: > >> Ok now I get it. Thanks for the nice explaination !! >> >> One more thing, so you mentioned about checkpointing the offset ranges >> to replay in same order from kafka. >> >> Is there any property we need to configure to do that? like initialOffset >> set to APPLICATION_OR_LATEST. >> >> >> Thanks >> Jaspal >> >> >> On Thursday, October 6, 2016, Thomas Weise <[email protected]> >> wrote: >> >>> What you want is the effect of exactly-once output (that's why we call >>> it also end-to-end exactly-once). There is no such thing as exactly-once >>> processing in a distributed system. In this case it would be rather >>> "produce exactly-once. Upstream operators, on failure, will recover to >>> checkpointed state and re-process the stream from there. This is >>> at-least-once, the default behavior. Because in the input operator you have >>> configured to replay in the same order from Kafka (this is done by >>> checkpointing the offset ranges), the computation in the DAG is idempotent >>> and the output operator can discard the results that were already published >>> instead of producing duplicates. >>> >>> On Thu, Oct 6, 2016 at 3:57 PM, Jaspal Singh <[email protected] >>> > wrote: >>> >>>> I think this is something called a customized operator implementation >>>> that is taking care of exactly once processing at output. >>>> >>>> What if any previous operators fail ? How we can make sure they also >>>> recover using EXACTLY_ONCE processing mode ? >>>> >>>> >>>> Thanks >>>> Jaspal >>>> >>>> >>>> On Thursday, October 6, 2016, Thomas Weise <[email protected]> >>>> wrote: >>>> >>>>> In that case please have a look at: >>>>> >>>>> https://github.com/apache/apex-malhar/blob/master/kafka/src/ >>>>> main/java/org/apache/apex/malhar/kafka/KafkaSinglePortExactl >>>>> yOnceOutputOperator.java >>>>> >>>>> The operator will ensure that messages are not duplicated, under the >>>>> stated assumptions. >>>>> >>>>> >>>>> On Thu, Oct 6, 2016 at 3:37 PM, Jaspal Singh < >>>>> [email protected]> wrote: >>>>> >>>>>> Hi Thomas, >>>>>> >>>>>> In our case we are writing the results back to maprstreams topic based >>>>>> on some validations. >>>>>> >>>>>> >>>>>> Thanks >>>>>> Jaspal >>>>>> >>>>>> >>>>>> On Thursday, October 6, 2016, Thomas Weise <[email protected]> wrote: >>>>>> >>>>>>> Hi, >>>>>>> >>>>>>> which operators in your application are writing to external systems? >>>>>>> >>>>>>> When you look at the example from the blog ( >>>>>>> https://github.com/DataTorrent/examples/tree/master/tutoria >>>>>>> ls/exactly-once), there is Kafka input, which is configured to be >>>>>>> idempotent. The results are written to JDBC. That operator by itself >>>>>>> supports exactly-once through transactions (in conjunction with >>>>>>> idempotent >>>>>>> input), hence there is no need to configure the processing mode at all. >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> >>> >
