The KafkaSinglePortExactlyOnceOutputOperator takes whatever the previous operator outputs and writes it to Kafka.
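
To illustrate the idea of discarding already-published results after a replay, here is a minimal sketch. It is not the operator's actual code and uses no Apex or Kafka APIs; `Topic`, `DedupWriter` and `runWithFailure` are made-up names, and the "topic" is just an in-memory list. It assumes the upstream replay is idempotent, i.e. the same tuples arrive again after recovery.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ExactlyOnceSketch {

    /** Stand-in for a Kafka topic: an append-only list of messages. */
    static final class Topic {
        final List<String> messages = new ArrayList<>();
    }

    /** Hypothetical writer that skips tuples already in the topic after a replay. */
    static final class DedupWriter {
        private final Topic topic;
        // Messages found in the topic past the checkpointed position, with counts.
        private final Map<String, Integer> alreadyPublished = new HashMap<>();

        DedupWriter(Topic topic, int checkpointedSize) {
            this.topic = topic;
            // On (re)start, read back everything written since the last checkpoint.
            for (String m : topic.messages.subList(checkpointedSize, topic.messages.size())) {
                alreadyPublished.merge(m, 1, Integer::sum);
            }
        }

        void write(String tuple) {
            Integer pending = alreadyPublished.get(tuple);
            if (pending != null) {
                // Tuple was published before the failure: consume one match and skip it.
                if (pending == 1) {
                    alreadyPublished.remove(tuple);
                } else {
                    alreadyPublished.put(tuple, pending - 1);
                }
                return;
            }
            topic.messages.add(tuple);
        }
    }

    /** Writes part of a window, "crashes", then replays the whole window. */
    public static List<String> runWithFailure() {
        Topic topic = new Topic();
        List<String> window = Arrays.asList("a", "b", "b", "c");
        int checkpoint = topic.messages.size(); // position at the last checkpoint

        DedupWriter first = new DedupWriter(topic, checkpoint);
        first.write(window.get(0));
        first.write(window.get(1));
        // Simulated failure here: "a" and "b" are already in the topic.

        DedupWriter recovered = new DedupWriter(topic, checkpoint);
        for (String tuple : window) {
            recovered.write(tuple); // upstream replays the same tuples
        }
        return topic.messages;
    }

    public static void main(String[] args) {
        System.out.println(runWithFailure()); // prints [a, b, b, c]
    }
}
```

Counting matches instead of keeping a plain set lets legitimate repeats within a window (the second "b") still be written exactly once each.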

> On Oct 7, 2016, at 07:59, Jaspal Singh <jaspal.singh1...@gmail.com> wrote:
>
> Hi Thomas,
>
> I have a question: when we are using
> KafkaSinglePortExactlyOnceOutputOperator to write results into a maprstream
> topic, will it be able to read messages from the previous operator?
>
> Thanks
> Jaspal
>
>> On Thu, Oct 6, 2016 at 6:28 PM, Thomas Weise <t...@apache.org> wrote:
>> For recovery you need to set the window data manager like so:
>>
>> https://github.com/DataTorrent/examples/blob/master/tutorials/exactly-once/src/main/java/com/example/myapexapp/Application.java#L33
>>
>> That will also apply to stateful restart of the entire application (relaunch
>> from previous instance's checkpointed state).
>>
>> For cold restart, you would need to consider the property you mention and
>> decide what is applicable to your use case.
>>
>> Thomas
>>
>>> On Thu, Oct 6, 2016 at 4:16 PM, Jaspal Singh <jaspal.singh1...@gmail.com>
>>> wrote:
>>> Ok, now I get it. Thanks for the nice explanation!
>>>
>>> One more thing: you mentioned checkpointing the offset ranges to
>>> replay in the same order from Kafka.
>>>
>>> Is there any property we need to configure to do that, like initialOffset
>>> set to APPLICATION_OR_LATEST?
>>>
>>> Thanks
>>> Jaspal
>>>
>>>> On Thursday, October 6, 2016, Thomas Weise <thomas.we...@gmail.com> wrote:
>>>> What you want is the effect of exactly-once output (that's why we also
>>>> call it end-to-end exactly-once). There is no such thing as exactly-once
>>>> processing in a distributed system. In this case it would rather be
>>>> "produce exactly-once". Upstream operators, on failure, will recover to
>>>> checkpointed state and re-process the stream from there. This is
>>>> at-least-once, the default behavior. Because you have configured the
>>>> input operator to replay in the same order from Kafka (this is done by
>>>> checkpointing the offset ranges), the computation in the DAG is idempotent
>>>> and the output operator can discard the results that were already
>>>> published instead of producing duplicates.
>>>>
>>>>> On Thu, Oct 6, 2016 at 3:57 PM, Jaspal Singh <jaspal.singh1...@gmail.com>
>>>>> wrote:
>>>>> I think this is a customized operator implementation
>>>>> that takes care of exactly-once processing at the output.
>>>>>
>>>>> What if any previous operators fail? How can we make sure they also
>>>>> recover using EXACTLY_ONCE processing mode?
>>>>>
>>>>> Thanks
>>>>> Jaspal
>>>>>
>>>>>> On Thursday, October 6, 2016, Thomas Weise <thomas.we...@gmail.com>
>>>>>> wrote:
>>>>>> In that case please have a look at:
>>>>>>
>>>>>> https://github.com/apache/apex-malhar/blob/master/kafka/src/main/java/org/apache/apex/malhar/kafka/KafkaSinglePortExactlyOnceOutputOperator.java
>>>>>>
>>>>>> The operator will ensure that messages are not duplicated, under the
>>>>>> stated assumptions.
>>>>>>
>>>>>>> On Thu, Oct 6, 2016 at 3:37 PM, Jaspal Singh
>>>>>>> <jaspal.singh1...@gmail.com> wrote:
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> In our case we are writing the results back to a maprstreams topic based
>>>>>>> on some validations.
>>>>>>>
>>>>>>> Thanks
>>>>>>> Jaspal
>>>>>>>
>>>>>>>> On Thursday, October 6, 2016, Thomas Weise <t...@apache.org> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Which operators in your application are writing to external systems?
>>>>>>>>
>>>>>>>> When you look at the example from the blog
>>>>>>>> (https://github.com/DataTorrent/examples/tree/master/tutorials/exactly-once),
>>>>>>>> there is Kafka input, which is configured to be idempotent. The
>>>>>>>> results are written to JDBC. That operator by itself supports
>>>>>>>> exactly-once through transactions (in conjunction with idempotent
>>>>>>>> input), hence there is no need to configure the processing mode at all.
>>>>>>>>
>>>>>>>> Thomas