Re: Flink pipeline;

Leonard Xu Wed, 06 May 2020 18:59:16 -0700

Hi Aissa

Looks like your requirements is to enrich a real stream data(from kafka) with 
dimension data(your case will like: {sensor_id, equipment_id, workshop_id, 
factory_id} ), you can achieve your purpose by Flink DataStream API or just use 
FLINK SQL. I think use pure SQL will be esaier if your dimension data is in 
DB(like mysql, postgresql, hbase which can be treated as temporal table in 
flink),   and you can use first join a temporal table[1] to enrich your real 
stream and 
then write the aggregation sql [2] to finish your application.



Best,
Leonard Xu
[1] 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#join-with-a-temporal-table
 
<https://ci.apache.org/projects/flink/flink-docs-master/dev/table/streaming/joins.html#join-with-a-temporal-table>
[2] 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#aggregations
 
<https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sql/queries.html#aggregations>

> 在 2020年5月6日，21:39，hemant singh <hemant2...@gmail.com> 写道：
> 
> You will have to enrich the data coming in for eg- { "equipment-id" : 
> "1-234", "sensor-id" : "1-vcy", ..... } . Since you will most likely have a 
> keyedstream based on equipment-id+sensor-id or equipment-id, you can have a 
> control stream with data about equipment to workshop/factory mapping 
> something like this - { "equipment-id" : "1-234", "workshop-id" : 
> "1-234","factory-id" : "1-vcy", ..... } and then you can use CoProcess 
> function to join these two streams to have the enriched stream. Once you have 
> the enriched stream you can do aggregations at level you want to.
> You can refer here [1] or [2] for some sample and reference.
> 
> [1] https://www.youtube.com/watch?v=cJS18iKLUIY 
> <https://www.youtube.com/watch?v=cJS18iKLUIY>
> [2] https://training.ververica.com/exercises/eventTimeJoin.html 
> <https://training.ververica.com/exercises/eventTimeJoin.html>
> 
> Hemant
> 
> On Wed, May 6, 2020 at 3:10 AM Aissa Elaffani <aissaelaff...@gmail.com 
> <mailto:aissaelaff...@gmail.com>> wrote:
> Hello Guys,
> I am new to the real-time streaming field, and I am trying to build a BIG 
> DATA architecture for processing real-time streaming. I have some sensors 
> that generate data in json format, they are sent to Apache kafka cluster then 
> i want to consume them with Apache flinkin ordre to do some aggregation. The 
> probleme is that the data coming from kafka contains " the sensor ID , the 
> equipement ID in wiche it is installed, and the status of the equipment..", 
> knowing that the each sensor is installed in an equipement, and the 
> equipement is linked to an workshop that it self linked to factory. So i need 
> an other data source for the workshop and factories, because i want to do 
> aggregation on factories, and the data sent by the sensors contains just the 
> sensorIDand the equipementID...
> Guys I am new to the this field, and i am stuck in this. Can someone please 
> help me to achieve my goal, and explain to me how can i do that. And how can 
> i do this complexed aggregation??And if there is any optmisation to do? Sorry 
> for disturbing you !!!
> AISSA
>

Re: Flink pipeline;

Reply via email to