The default checkpoint interval is 30s, and the interval between failing
aggregators is approximately 10s? In that case, no state will ever get
checkpointed, and the operator is reset to its initial state on every failure.
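
To illustrate the timing argument, here is a minimal, self-contained
simulation. It is not Apex code: the class name CheckpointSim, the 500 ms
window length, and the assumption that the checkpoint clock restarts every
time the operator is redeployed are all illustrative assumptions, used only
to show why a failure interval shorter than the checkpoint interval can
leave nothing to recover from.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointSim {
    /** Outcome of one simulated run. */
    public static final class Result {
        public final int checkpoints;    // number of checkpoints taken
        public final int finalStateSize; // list size held by the operator at the end

        Result(int checkpoints, int finalStateSize) {
            this.checkpoints = checkpoints;
            this.finalStateSize = finalStateSize;
        }
    }

    /**
     * Simulates an aggregator that appends one number per window, is
     * checkpointed every checkpointIntervalMs after (re)deployment, and
     * fails failureIntervalMs after each (re)deployment.
     */
    public static Result run(long checkpointIntervalMs, long failureIntervalMs, long totalMs) {
        final long windowMs = 500;                       // assumed window length
        List<Double> state = new ArrayList<>();          // live operator state
        List<Double> lastCheckpoint = new ArrayList<>(); // initial (empty) state
        int checkpoints = 0;
        long sinceDeploy = 0;                            // time since last (re)deploy

        for (long t = windowMs; t <= totalMs; t += windowMs) {
            state.add(state.size() + 1.0);               // aggregate one more number
            sinceDeploy += windowMs;

            if (sinceDeploy % checkpointIntervalMs == 0) {
                lastCheckpoint = new ArrayList<>(state); // save a copy of the state
                checkpoints++;
            }
            if (sinceDeploy >= failureIntervalMs) {
                state = new ArrayList<>(lastCheckpoint); // recover from last checkpoint
                sinceDeploy = 0;                         // operator is redeployed
            }
        }
        return new Result(checkpoints, state.size());
    }

    public static void main(String[] args) {
        Result slow = run(30_000, 9_000, 60_000); // checkpoint interval > failure interval
        Result fast = run(5_000, 9_000, 60_000);  // checkpoint interval < failure interval
        System.out.println("30s checkpoints: " + slow.checkpoints + ", state size " + slow.finalStateSize);
        System.out.println("5s checkpoints:  " + fast.checkpoints + ", state size " + fast.finalStateSize);
    }
}
```

In this model the 30s run never reaches a checkpoint, so every recovery
starts from the empty list, while the 5s run keeps growing across failures.
In Apex itself, the checkpoint frequency is, as far as I know, governed by
the CHECKPOINT_WINDOW_COUNT operator attribute, so lowering it for the
aggregator would be the thing to try.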

Thomas

--
sent from mobile

On Mon, Jun 18, 2018, 12:45 PM Mateusz Zakarczemny <m.zakarcze...@gmail.com>
wrote:

> Hi Pramod,
> I removed transient, but the result is the same:
> https://github.com/Matzz/apex-example/blob/master/src/main/java/aptest/Aggregator.java
>
> Creating aggregator 2018-06-18T10:42:50.582
> Failing aggregator! 2018-06-18T10:42:50.707
> Creating FileOutput 2018-06-18T10:42:50.848
> [1.0]
> [1.0, 2.0]
> [1.0, 2.0, 3.0]
> [1.0, 2.0, 3.0, 4.0]
> [1.0, 2.0, 3.0, 4.0, 5.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
> Creating aggregator 2018-06-18T10:42:59.683
> Failing aggregator! 2018-06-18T10:42:59.794
> Creating FileOutput 2018-06-18T10:42:59.926
> Creating aggregator 2018-06-18T10:43:08.810
> Failing aggregator! 2018-06-18T10:43:08.918
> Creating FileOutput 2018-06-18T10:43:08.988
> [1.0]
> [1.0, 2.0]
> [1.0, 2.0, 3.0]
> Creating FileOutput 2018-06-18T10:43:18.059
> Creating aggregator 2018-06-18T10:43:18.142
> Failing aggregator! 2018-06-18T10:43:18.227
> [1.0]
> [1.0, 2.0]
> [1.0, 2.0, 3.0]
> [1.0, 2.0, 3.0, 4.0]
> Creating FileOutput 2018-06-18T10:43:27.130
> Creating aggregator 2018-06-18T10:43:27.135
> Failing aggregator! 2018-06-18T10:43:27.228
> [1.0]
> [1.0, 2.0]
> [1.0, 2.0, 3.0]
> [1.0, 2.0, 3.0, 4.0]
> [1.0, 2.0, 3.0, 4.0, 5.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
> [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
>
>
>
> On Mon, Jun 18, 2018 at 00:16, Pramod Immaneni <pramod.imman...@gmail.com>
> wrote:
>
>> Hi Mateusz,
>>
>> It is because you have defined the list as transient in the Aggregator.
>> Transient fields are not serialized, so they are not included when the
>> checkpoint is created.
>>
>> Thanks
>> On Sun, Jun 17, 2018 at 2:35 PM Mateusz Zakarczemny <
>> m.zakarcze...@gmail.com> wrote:
>>
>>> Hi all,
>>> I created a simple app to test Apex fault tolerance. It is built from
>>> three main operators:
>>> - sequence generator - an operator which generates increasing numbers, one
>>> per time window
>>> - aggregator - just adds each incoming number to a list and emits the whole
>>> list downstream
>>> - file output - an operator which writes incoming messages to a file
>>> To make it faulty, the aggregator operator throws an exception for 10% of
>>> messages. Source code is here: https://github.com/Matzz/apex-example
>>>
>>> I'm running it on the sandbox Docker image. I thought that even if the
>>> aggregator operator is faulty, the application would checkpoint its state,
>>> so over time the output list should get longer and longer.
>>> Unfortunately, it looks like on each failure the app is resetting its state
>>> to the beginning. Sample output:
>>>
>>> *tail -f -n 100 /tmp/stream.out *
>>>
>>> *Creating FileOutput 2018-06-16T22:07:01.033*
>>> *Creating aggreagator 2018-06-16T22:07:01.040*
>>> *Creating FileOutput 2018-06-16T22:07:01.041*
>>> *Creating FileOutput 2018-06-16T22:07:02.719*
>>> *Creating aggreagator 2018-06-16T22:07:02.722*
>>> *Creating FileOutput 2018-06-16T22:07:02.723*
>>> *Creating FileOutput 2018-06-16T22:08:48.178*
>>> *Creating aggreagator 2018-06-16T22:08:48.185*
>>> *Creating FileOutput 2018-06-16T22:08:48.186*
>>> *Creating FileOutput 2018-06-16T22:08:49.847*
>>> *Creating aggreagator 2018-06-16T22:08:49.850*
>>> *Creating FileOutput 2018-06-16T22:08:49.852*
>>> *Creating FileOutput 2018-06-16T22:08:56.736*
>>> *Creating aggreagator 2018-06-16T22:08:56.740*
>>> *Creating FileOutput 2018-06-16T22:08:56.743*
>>> *Creating FileOutput 2018-06-16T22:08:57.899*
>>> *Creating aggreagator 2018-06-16T22:08:57.899*
>>> *Creating FileOutput 2018-06-16T22:08:57.899*
>>> *Creating FileOutput 2018-06-16T22:09:10.951*
>>> *Creating FileOutput 2018-06-16T22:09:10.986*
>>> *Creating aggreagator 2018-06-16T22:09:11.001*
>>> *Failing sequence generator!2018-06-16T22:09:11.029*
>>> *Creating FileOutput 2018-06-16T22:09:19.484*
>>> *Creating FileOutput 2018-06-16T22:09:19.506*
>>> *Creating aggreagator 2018-06-16T22:09:19.518*
>>> *Failing sequence generator!2018-06-16T22:09:19.542*
>>> *Creating FileOutput 2018-06-16T22:09:28.646*
>>> *Creating FileOutput 2018-06-16T22:09:28.668*
>>> *Creating aggreagator 2018-06-16T22:09:28.680*
>>> *Failing sequence generator!2018-06-16T22:09:28.704*
>>> *[1.0]*
>>> *Creating FileOutput 2018-06-16T22:09:37.864*
>>> *Creating FileOutput 2018-06-16T22:09:37.886*
>>> *Creating aggreagator 2018-06-16T22:09:37.897*
>>> *Failing sequence generator!2018-06-16T22:09:37.924*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]*
>>> *Creating FileOutput 2018-06-16T22:09:46.921*
>>> *Creating FileOutput 2018-06-16T22:09:46.944*
>>> *Creating aggreagator 2018-06-16T22:09:46.956*
>>> *Failing sequence generator!2018-06-16T22:09:46.980*
>>> *[1.0, 2.0, 3.0, 4.0]*
>>> *[1.0, 2.0, 3.0, 4.0]*
>>> *[1.0, 2.0, 3.0, 4.0]*
>>> *[1.0, 2.0, 3.0, 4.0]*
>>> *Creating FileOutput 2018-06-16T22:09:56.049*
>>> *Creating FileOutput 2018-06-16T22:09:56.070*
>>> *Creating aggreagator 2018-06-16T22:09:56.081*
>>> *Failing sequence generator!2018-06-16T22:09:56.112*
>>> *[1.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]*
>>> *[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]*
>>> *Creating FileOutput 2018-06-16T22:10:05.213*
>>> *Creating FileOutput 2018-06-16T22:10:05.232*
>>> *Creating aggreagator 2018-06-16T22:10:05.241*
>>> *Failing sequence generator!2018-06-16T22:10:05.266*
>>> *[1.0, 2.0]*
>>> *[1.0, 2.0]*
>>>
>>>
>>>
>>> Could I ask for an explanation of what I'm doing wrong?
>>>
>>> Regards,
>>> Mateusz Zakarczemny
>>>
>>>
