Hi,

I have few questions regarding event time windowing. My scenario is devices
from various timezones will send messages with timestamp and I need to
create a window per device for 10 seconds. The messages will mostly arrive
in order.

Here is my sample code to perform windowing and aggregating the messages
after the window to further process it.

streamEnv.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

FlinkKafkaConsumer010 consumer = new FlinkKafkaConsumer010("STREAM1",
                    new DeserializationSchema(),
                    kafkaConsumerProperties);

DataStream<Message> msgStream = streamEnv
.addSource(consumer)
.assignTimestampsAndWatermarks(new TimestampExtractor(Time.of(100,
TimeUnit.MILLISECONDS))); // TimestampExtractor implements
BoundedOutOfOrdernessTimestampExtractor

KeyedStream<Message, String> keyByStream = msgStream.keyBy(new
CustomKeySelector());

WindowedStream<Message, String, TimeWindow> windowedStream =

keyByStream.window(TumblingEventTimeWindows.of(org.apache.flink.streaming.api.windowing.time.Time.seconds(10)));

SingleOutputStreamOperator<Message> aggregatedStream =
windowedStream.apply(new AggregateEntries());

My questions are

- In the above code, data gets passed till the window function but even
after window time the data is not received in the apply function. Do I have
to supply a custom evictor or trigger?

- Since the data is being received from multiple timezones and each device
will have some time difference, would it be ok to assign the timestamp as
that of received timestamp in the message at source (kafka). Will there be
any issues with this?

- Are there any limitations on the number of time windows that can be
created at any given time? In my scenario if there are 1 million devices
there will be 1 million tumbling windows.

Thanks,
Govind

Reply via email to