Hi Thomas,

I have a question, so when we are using
*KafkaSinglePortExactlyOnceOutputOperator* to write results into maprstream
topic will it be able to read messgaes from the previous operator ?


Thanks
Jaspal

On Thu, Oct 6, 2016 at 6:28 PM, Thomas Weise <[email protected]> wrote:

> For recovery you need to set the window data manager like so:
>
> https://github.com/DataTorrent/examples/blob/
> master/tutorials/exactly-once/src/main/java/com/example/
> myapexapp/Application.java#L33
>
> That will also apply to stateful restart of the entire application
> (relaunch from previous instance's checkpointed state).
>
> For cold restart, you would need to consider the property you mention and
> decide what is applicable to your use case.
>
> Thomas
>
>
> On Thu, Oct 6, 2016 at 4:16 PM, Jaspal Singh <[email protected]>
> wrote:
>
>> Ok now I get it. Thanks for the nice explaination !!
>>
>> One more thing, so you mentioned about checkpointing the offset ranges
>> to replay in same order from kafka.
>>
>> Is there any property we need to configure to do that? like initialOffset
>> set to APPLICATION_OR_LATEST.
>>
>>
>> Thanks
>> Jaspal
>>
>>
>> On Thursday, October 6, 2016, Thomas Weise <[email protected]>
>> wrote:
>>
>>> What you want is the effect of exactly-once output (that's why we call
>>> it also end-to-end exactly-once). There is no such thing as exactly-once
>>> processing in a distributed system. In this case it would be rather
>>> "produce exactly-once. Upstream operators, on failure, will recover to
>>> checkpointed state and re-process the stream from there. This is
>>> at-least-once, the default behavior. Because in the input operator you have
>>> configured to replay in the same order from Kafka (this is done by
>>> checkpointing the offset ranges), the computation in the DAG is idempotent
>>> and the output operator can discard the results that were already published
>>> instead of producing duplicates.
>>>
>>> On Thu, Oct 6, 2016 at 3:57 PM, Jaspal Singh <[email protected]
>>> > wrote:
>>>
>>>> I think this is something called a customized operator implementation
>>>> that is taking care of exactly once processing at output.
>>>>
>>>> What if any previous operators fail ? How we can make sure they also
>>>> recover using EXACTLY_ONCE processing mode ?
>>>>
>>>>
>>>> Thanks
>>>> Jaspal
>>>>
>>>>
>>>> On Thursday, October 6, 2016, Thomas Weise <[email protected]>
>>>> wrote:
>>>>
>>>>> In that case please have a look at:
>>>>>
>>>>> https://github.com/apache/apex-malhar/blob/master/kafka/src/
>>>>> main/java/org/apache/apex/malhar/kafka/KafkaSinglePortExactl
>>>>> yOnceOutputOperator.java
>>>>>
>>>>> The operator will ensure that messages are not duplicated, under the
>>>>> stated assumptions.
>>>>>
>>>>>
>>>>> On Thu, Oct 6, 2016 at 3:37 PM, Jaspal Singh <
>>>>> [email protected]> wrote:
>>>>>
>>>>>> Hi Thomas,
>>>>>>
>>>>>> In our case we are writing the results back to maprstreams topic based
>>>>>> on some validations.
>>>>>>
>>>>>>
>>>>>> Thanks
>>>>>> Jaspal
>>>>>>
>>>>>>
>>>>>> On Thursday, October 6, 2016, Thomas Weise <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> which operators in your application are writing to external systems?
>>>>>>>
>>>>>>> When you look at the example from the blog (
>>>>>>> https://github.com/DataTorrent/examples/tree/master/tutoria
>>>>>>> ls/exactly-once), there is Kafka input, which is configured to be
>>>>>>> idempotent. The results are written to JDBC. That operator by itself
>>>>>>> supports exactly-once through transactions (in conjunction with 
>>>>>>> idempotent
>>>>>>> input), hence there is no need to configure the processing mode at all.
>>>>>>>
>>>>>>> Thomas
>>>>>>>
>>>>>>>
>>>
>

Reply via email to