I have a couple of questions related to this:

1. We store state per key (RocksDB backend). Currently, the state size
is ~1.5 GB. Checkpointing sometimes takes ~10-20 seconds. Is it
possible that checkpointing is affecting timer execution?
2. Does checkpointing cause Flink to stop consuming data streams
(say from Kafka)? We have observed that when the timers are delayed,
there is also a delay in picking up messages from Kafka.
3. Are there any metrics exposed by Flink that could help us
understand better where the delay is coming from? Is there a metric
for knowing about contention between `processElement` and `onTimer`?
4. Is there a plan to move from ScheduledThreadPoolExecutor to
timing wheels for timeouts?
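
For reference, the kind of delay I am asking about in question 3 can be
reproduced outside Flink. The following is only a sketch in plain Java under
my own assumptions: a plain `Object` monitor stands in for the synchronized
block, `Thread.sleep` stands in for `processElement` work, and the names
`TimerLagSketch` and `measureTimerLagMillis` are made up for illustration.
None of this is Flink code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class TimerLagSketch {

    /**
     * Holds a shared lock for holdMillis (standing in for processElement work)
     * while a timer due after timerDelayMillis needs the same lock (standing
     * in for onTimer). Returns how late the callback ran, in milliseconds.
     */
    public static long measureTimerLagMillis(long holdMillis, long timerDelayMillis)
            throws InterruptedException {
        final Object lock = new Object();              // hypothetical shared lock
        final AtomicLong lagNanos = new AtomicLong(-1);
        final CountDownLatch fired = new CountDownLatch(1);
        ScheduledExecutorService timers = Executors.newSingleThreadScheduledExecutor();
        try {
            synchronized (lock) {
                final long dueAt = System.nanoTime()
                        + TimeUnit.MILLISECONDS.toNanos(timerDelayMillis);
                timers.schedule(() -> {
                    // The callback blocks here until the "element" thread
                    // releases the lock, so it runs later than scheduled.
                    synchronized (lock) {
                        lagNanos.set(System.nanoTime() - dueAt);
                    }
                    fired.countDown();
                }, timerDelayMillis, TimeUnit.MILLISECONDS);

                Thread.sleep(holdMillis);              // simulated work under the lock
            }
            fired.await();                             // wait for the callback to finish
            return TimeUnit.NANOSECONDS.toMillis(lagNanos.get());
        } finally {
            timers.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("timer lag (ms): " + measureTimerLagMillis(300, 50));
    }
}
```

With a 300 ms hold and a timer due after 50 ms, the callback should run
roughly 250 ms late. That difference between the scheduled and the actual
firing time is what I would like to see as a metric.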

If there is any other information that you need, please let me know.

On Tue, Sep 19, 2017 at 10:37 PM, Narendra Joshi <narendr...@gmail.com> wrote:
> The number of timers is about 400 per second. We have observed that onTimer
> calls are delayed only when the number of scheduled timers starts increasing
> from a minimum. It would be great if you can share pointers to code I can
> look at to understand it better. :)
>
> Narendra Joshi
>
> On 14 Sep 2017 16:04, "Aljoscha Krettek" <aljos...@apache.org> wrote:
>>
>> Hi,
>>
>> Yes, execution of these methods is protected by a synchronized block. This
>> is not a fair lock so incoming data might starve timer callbacks. What is
>> the number of timers we are talking about here?
>>
>> Best,
>> Aljoscha
>>
>> > On 11. Sep 2017, at 19:38, Chesnay Schepler <c.schep...@web.de> wrote:
>> >
>> > It is true that onTimer and processElement are never called at the same
>> > time.
>> >
>> > I'm not entirely sure whether there is any prioritization/fairness
>> > between these methods
>> > (if not, it could be that onTimer is starved), looping in Aljoscha who
>> > hopefully knows more
>> > about this.
>> >
>> > On 10.09.2017 09:31, Narendra Joshi wrote:
>> >> Hi,
>> >>
>> >> We are using Flink as a timer scheduler and delay in timer execution is
>> >> a huge problem for us. What we have experienced is that as the number
>> >> of timers we register increases, the timers start getting delayed (by more
>> >> than 5 seconds). Can anyone point us in the right direction to figure
>> >> out what might be happening?
>> >>
>> >> I have been told that `onTimer` and `processElement` are called with a
>> >> mutually exclusive lock. Could this locking be the reason this is
>> >> happening? Neither function does any IO, so neither should take
>> >> 5 seconds.
>> >>
>> >> Is it possible that calls to `processElement` starve `onTimer` calls?
>> >>
>> >>
>> >> --
>> >> Narendra Joshi
>> >>
>> >
>>
>
