Some more input:

Right now, you can also use the `ProcessFunction` [1] available in Flink 1.2 to 
simulate state TTL.
The `ProcessFunction` should allow you to keep device state and simulate the 
online / offline detection by registering processing timers. In the `onTimer` 
callback, you can emit the “offline” marker event downstream, and in the 
`processElement` method, you can emit the “online” marker event if the case is 
the device has sent an event after it was determined to be offline.

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/stream/process_function.html

On March 6, 2017 at 9:40:28 PM, Tzu-Li (Gordon) Tai (tzuli...@apache.org) wrote:

Hi Bruno!

The Flink CEP library also seems like an option you can look into to see if it 
can easily realize what you have in mind.

Basically, the pattern you are detecting is a timeout of 5 minutes after the 
last event. Once that pattern is detected, you emit a “device offline” event 
downstream.
With this, you can also extend the pattern output stream to detect whether a 
device has became online again.

Here are some materials for you to take a look at Flink CEP:
1. http://flink.apache.org/news/2016/04/06/cep-monitoring.html
2. 
https://www.slideshare.net/FlinkForward/fabian-huesketill-rohrmann-declarative-stream-processing-with-streamsql-and-cep?qid=3c13eb7d-ed39-4eae-9b74-a6c94e8b08a3&v=&b=&from_search=4

The CEP parts in the slides in 2. also provides some good examples of timeout 
detection using CEP.

Hope this helps!

Cheers,
Gordon

On March 4, 2017 at 1:27:51 AM, Bruno Aranda (bara...@apache.org) wrote:

Hi all,

We are trying to write an online/offline detector for devices that keep 
streaming data through Flink. We know how often roughly to expect events from 
those devices and we want to be able to detect when any of them stops (goes 
offline) or starts again (comes back online) sending events through the 
pipeline. For instance, if 5 minutes have passed since the last event of a 
device, we would fire an event to indicate that the device is offline.

The data from the devices comes through Kafka, with their own event time. The 
devices events are in order in the partitions and each devices goes to a 
specific partition, so in theory, we should not have out of order when looking 
at one partition.

We are assuming a good way to do this is by using sliding windows that are big 
enough, so we can see the relevant gap before/after the events for each 
specific device. 

We were wondering if there are other ideas on how to solve this.

Many thanks!

Bruno

Reply via email to