Re: Aggregation based on Timestamp

Tzu-Li (Gordon) Tai Fri, 11 Aug 2017 04:09:49 -0700

Hi,

Yes, this is definitely doable in Flink, and should be very straightforward.


Basically, what you would do is define a FlinkKafkaConsumer source for your 
Kafka topic [1], following that a keyBy operation on the hostname [2], and then 
a 1-minute time window aggregation [3]. At the end of your pipeline would be a 
InfluxDB sink. There isn’t one out of the box, but it should be fairly easy to 
implement.
If you want deterministic results based on event-time processing, that is also 
possible [4].

Just throwing you links to get started here :) Let us know if you have more 
problems getting started.

Cheers,
Gordon

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/connectors/kafka.html
[2] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/datastream_api.html#datastream-transformations
[3] https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html
[4] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/event_time.html

On 10 August 2017 at 8:52:25 PM, Madhukar Thota (madhukar.th...@gmail.com) 
wrote:

Hi

We have use case where we have thousands of Telegraf agents sending data to 
kafka( some of them are sending 10s interval, 15s interval and 30s interval). 
We would like to aggregate the incoming data to 1 minuter interval based on the 
hostname as key before we write into influxdb. Is it possible to do this type 
of usecase with Flink? if so any sample to get started?

sample data ( influxdb line protocal) coming from Kafka

weather,location=us-midwest,season=summer temperature=82 1465839830100400200

-Madhu

Re: Aggregation based on Timestamp

Reply via email to