Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread hemant singh
> …aggregate all events of a given timestamp for a device, it is a matter of keying by device id and applying a custom window. There is no need for joins.
>
> On Tue, May 12, 2020 at 9:05 PM hemant singh wrote:
>
>> Hello Flink Users,
>>
>> Any views on this question of m…
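
A minimal sketch of the approach suggested above (key by device id, then apply a short event-time window instead of joining streams), written against the Flink 1.10-era DataStream API. The DeviceEvent POJO, the sample values, and the one-second window size are all assumptions for illustration:

    import org.apache.flink.streaming.api.TimeCharacteristic;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class DeviceAggregationSketch {

        // Hypothetical event shape: one metric value per event, stamped by the device.
        public static class DeviceEvent {
            public String deviceId;
            public long timestamp;
            public double value;
            public DeviceEvent() {}
            public DeviceEvent(String deviceId, long timestamp, double value) {
                this.deviceId = deviceId; this.timestamp = timestamp; this.value = value;
            }
        }

        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

            DataStream<DeviceEvent> events = env
                .fromElements(
                    new DeviceEvent("device-1", 1_000L, 0.42),
                    new DeviceEvent("device-1", 1_000L, 0.58))
                .assignTimestampsAndWatermarks(
                    new BoundedOutOfOrdernessTimestampExtractor<DeviceEvent>(Time.seconds(5)) {
                        @Override
                        public long extractTimestamp(DeviceEvent e) { return e.timestamp; }
                    });

            events
                .keyBy(e -> e.deviceId)                               // all events of a device together
                .window(TumblingEventTimeWindows.of(Time.seconds(1))) // the "custom window" from the reply
                .reduce((a, b) -> new DeviceEvent(a.deviceId, a.timestamp, a.value + b.value))
                .print();

            env.execute("device-aggregation-sketch");
        }
    }

Because every event of a device lands on the same key, all its metrics for a timestamp arrive at one operator instance and no stream-to-stream join is needed.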

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread hemant singh
Hello Flink Users,

Any views on this question of mine?

Thanks,
Hemant

On Tue, May 12, 2020 at 7:00 PM hemant singh wrote:
> Hello Roman,
>
> Thanks for your response.
>
> I think the partitioning you described (event type + protocol type) is subject to data skew. Including…

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread hemant singh
>> …consume do you mean the downstream system?
> Yes.
>
> Regards,
> Roman
>
> On Mon, May 11, 2020 at 11:30 PM hemant singh wrote:
>
>> Hello Roman,
>>
>> PFB my response -
>>
>> As I understand, each protocol has a distinct set of event types…

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
>> …(event_type + data_protocol).
> Here you are talking about the source (to the Flink job), right?
>
> Can you also share how you are going to consume this data?
>
> Regards,
> Roman
>
> On Mon, May 11, 2020 at 8:57 PM hemant singh wrote:
>
>> Hi,
>>
>> I hav…

Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
Hi,

I have different events from a device, each of which constitutes a different metric for the same device. Each of these events is produced by the device at intervals ranging from a few milliseconds to a minute.

Event1(Device1) -> Stream1 -> Metric 1
Event2(Device1) -> Stream2 -> Metric 2
...

Re: Flink pipeline;

2020-05-06 Thread hemant singh
You will have to enrich the data coming in, e.g.

{ "equipment-id" : "1-234", "sensor-id" : "1-vcy", ... }

Since you will most likely have a keyed stream based on equipment-id+sensor-id or equipment-id, you can have a control stream with data about the equipment-to-workshop/factory mapping…
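
A hedged sketch of that enrichment using Flink's broadcast state pattern: the mapping is broadcast as a control stream and joined against the keyed sensor stream. The tuple shapes, sample values, and state-descriptor name are invented for illustration:

    import org.apache.flink.api.common.state.MapStateDescriptor;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.streaming.api.datastream.BroadcastStream;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
    import org.apache.flink.util.Collector;

    public class EnrichmentSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // (equipment-id, sensor-id, reading) -- hypothetical sample data
            DataStream<Tuple3<String, String, Double>> readings =
                env.fromElements(Tuple3.of("1-234", "1-vcy", 0.7));

            // Control stream: equipment-id -> workshop mapping
            DataStream<Tuple2<String, String>> mapping =
                env.fromElements(Tuple2.of("1-234", "workshop-A"));

            MapStateDescriptor<String, String> mappingDesc = new MapStateDescriptor<>(
                "equipment-to-workshop",
                BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

            BroadcastStream<Tuple2<String, String>> control = mapping.broadcast(mappingDesc);

            readings
                .keyBy(r -> r.f0 + "|" + r.f1) // equipment-id + sensor-id, as in the post
                .connect(control)
                .process(new KeyedBroadcastProcessFunction<
                        String, Tuple3<String, String, Double>, Tuple2<String, String>, String>() {
                    @Override
                    public void processElement(Tuple3<String, String, Double> r,
                            ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        // null until the mapping for this equipment has been broadcast
                        String workshop = ctx.getBroadcastState(mappingDesc).get(r.f0);
                        out.collect(r.f0 + "/" + r.f1 + " @ " + workshop + ": " + r.f2);
                    }
                    @Override
                    public void processBroadcastElement(Tuple2<String, String> update,
                            Context ctx, Collector<String> out) throws Exception {
                        ctx.getBroadcastState(mappingDesc).put(update.f0, update.f1);
                    }
                })
                .print();

            env.execute("enrichment-sketch");
        }
    }

Note that broadcast state gives no ordering guarantee between the two streams, so readings arriving before their mapping see a null workshop; buffering those in keyed state is a common refinement.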

Re: [PROPOSAL] Contribute training materials to Apache Flink

2020-04-09 Thread hemant singh
Hello David,

This is a nice move. I'm pretty sure that the more resources there are in one place, the better it is for reference, especially for starters.

Thanks,
Hemant

On Thu, Apr 9, 2020 at 11:40 PM David Anderson wrote:
> Dear Flink Community!
>
> For some time now Ververica has been hosting some…

Re: Flink Schema Validation - Kafka

2020-03-18 Thread hemant singh
> …use a custom ProcessFunction to parse the string to JSON. In your custom function, you can handle the error more flexibly (like outputting erroneous records through a side output).
>
> I hope this helps!
>
> Best,
> Robert
>
> On Wed, Mar 18, 2020 at 11:48 AM hemant singh…
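
Robert's suggestion, sketched out. Jackson is assumed as the JSON parser, and the sample records and tag name are invented; only ProcessFunction, OutputTag, and side outputs come from the reply itself:

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    public class JsonValidationSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Anonymous subclass so Flink can capture the side output's type.
            final OutputTag<String> invalid = new OutputTag<String>("invalid-json") {};

            SingleOutputStreamOperator<JsonNode> parsed = env
                .fromElements("{\"device\":\"d-1\",\"cpu\":0.93}", "not json at all")
                .process(new ProcessFunction<String, JsonNode>() {
                    private transient ObjectMapper mapper;
                    @Override
                    public void open(Configuration parameters) {
                        mapper = new ObjectMapper();
                    }
                    @Override
                    public void processElement(String value, Context ctx, Collector<JsonNode> out) {
                        try {
                            out.collect(mapper.readTree(value));
                        } catch (Exception e) {
                            ctx.output(invalid, value); // erroneous record -> side output
                        }
                    }
                });

            parsed.print();                         // well-formed JSON
            DataStream<String> errors = parsed.getSideOutput(invalid);
            errors.print();                         // could go to a dead-letter topic instead

            env.execute("json-validation-sketch");
        }
    }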

Flink Schema Validation - Kafka

2020-03-18 Thread hemant singh
Hi Users,

Is there a way I can do schema validation on read from Kafka in a Flink job? I have a pipeline like this:

Kafka Topic Raw (json data) -> Kafka Topic Avro (avro data) -> Kafka Topic Transformed (avro data) -> Sink

While reading from the Raw topic I wanted to validate the schema so that in…
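
One hedged way to do that validation step, given that the Raw topic holds JSON which must match the Avro schema of the next topic: decode each record against the Avro schema and drop (or divert) records that fail. The schema, its fields, and the class name below are made up; a real pipeline would likely load the schema from a registry:

    import java.io.IOException;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.util.Collector;

    public class AvroSchemaValidator extends RichFlatMapFunction<String, String> {

        // Hypothetical schema; in practice this would come from a schema registry.
        private static final String SCHEMA_JSON =
            "{\"type\":\"record\",\"name\":\"Metric\",\"fields\":["
          + "{\"name\":\"deviceId\",\"type\":\"string\"},"
          + "{\"name\":\"value\",\"type\":\"double\"}]}";

        private transient Schema schema;
        private transient GenericDatumReader<GenericRecord> reader;

        @Override
        public void open(Configuration parameters) {
            schema = new Schema.Parser().parse(SCHEMA_JSON);
            reader = new GenericDatumReader<>(schema);
        }

        @Override
        public void flatMap(String json, Collector<String> out) {
            try {
                // Avro's JSON decoding is strict (e.g. unions need type-keyed encoding),
                // so this rejects anything that would not convert to the Avro topic.
                reader.read(null, DecoderFactory.get().jsonDecoder(schema, json));
                out.collect(json); // conforms; forward to the Avro-producing stage
            } catch (IOException | org.apache.avro.AvroRuntimeException e) {
                // does not conform; drop here, or route to a dead-letter topic/side output
            }
        }
    }

This would be applied right after the Kafka source, e.g. `rawStream.flatMap(new AvroSchemaValidator())`, with `rawStream` being the DataStream<String> read from the Raw topic.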

Re: Timeseries aggregation with many IoT devices off of one Kafka topic.

2020-02-24 Thread hemant singh
Hello,

I am also working on something similar. Below is the pipeline design I have; sharing it in case it is helpful.

topic -> keyed stream on device-id -> window operation -> sink

You can PM me for further details.

Thanks,
Hemant

On Tue, Feb 25, 2020 at 1:54 AM Marco Villalobos wrote:
> I…

Re: Flink Kafka connector consume from a single kafka partition

2020-02-19 Thread hemant singh
> …exactly one partition per device-id, you could even go with `DataStreamUtils#reinterpretAsKeyedStream` to avoid any shuffling.
>
> Let me know if I misunderstood your use case or if you have further questions.
>
> Best,
>
> Arvid
>
> On Wed, Feb 19, 2020 at 8:39 AM hemant singh wro…
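
A hedged sketch of that reinterpretation idea; the topic name, broker address, and record format (a "deviceId,metric" string) are assumptions. The documented caveat applies: this is only safe when the incoming partitioning matches exactly what Flink's own keyBy would produce, which in practice needs one key per Kafka partition and a one-to-one source-to-partition mapping:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.DataStreamUtils;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class ReinterpretSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092"); // assumption

            DataStream<String> events = env.addSource(
                new FlinkKafkaConsumer<>("device-metrics", new SimpleStringSchema(), props));

            // Records are assumed to look like "deviceId,metric" with exactly one
            // device id per Kafka partition; reinterpret skips the keyBy shuffle.
            KeyedStream<String, String> byDevice = DataStreamUtils.reinterpretAsKeyedStream(
                events, value -> value.split(",", 2)[0]);

            byDevice.print();
            env.execute("reinterpret-sketch");
        }
    }

If the pre-partitioning does not hold, keyed state and window results become undefined, so a plain keyBy is the safer default.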

Flink Kafka connector consume from a single kafka partition

2020-02-18 Thread hemant singh
Hello Flink Users,

I have a use case where I am processing metrics from different types of sources (one source will have multiple devices), and for aggregations as well as for building alerts the order of messages is important. To maintain customer data segregation I plan to have a single topic for each customer…
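
Since Kafka only guarantees ordering within a partition, one common approach on the producer side (a sketch; the topic name and payload are invented) is to key each record by device id, so Kafka's default partitioner keeps every device on one partition of the customer's topic:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class DeviceKeyedProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // assumption
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Key = device id: Kafka hashes the key, so all records of one device
                // land on the same partition and stay ordered relative to each other.
                producer.send(new ProducerRecord<>(
                    "customer-42-metrics",            // one topic per customer (per the post)
                    "device-1",                       // record key
                    "{\"metric\":\"cpu\",\"value\":0.42}"));
            }
        }
    }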

CEP with changing threshold

2020-02-12 Thread hemant singh
Hello Flink Users,

I have a requirement to generate alerts for metrics, for example: if CPU utilization spikes, i.e. *cpu_utilization > threshold* (>90%), n number of times in x minutes, then generate an alert. For this I am using the CEP module. However, one of the requirements is for different…
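
A minimal sketch of the fixed-threshold version of this rule with Flink CEP (assuming the flink-cep dependency; device ids, readings, and the n=3 / x=5 values are invented):

    import java.util.List;
    import java.util.Map;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.cep.CEP;
    import org.apache.flink.cep.PatternSelectFunction;
    import org.apache.flink.cep.pattern.Pattern;
    import org.apache.flink.cep.pattern.conditions.SimpleCondition;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.windowing.time.Time;

    public class CpuAlertSketch {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // (device-id, cpu utilization) -- hypothetical sample readings
            DataStream<Tuple2<String, Double>> readings = env.fromElements(
                Tuple2.of("device-1", 0.95), Tuple2.of("device-1", 0.97), Tuple2.of("device-1", 0.99));

            Pattern<Tuple2<String, Double>, ?> highCpu =
                Pattern.<Tuple2<String, Double>>begin("high")
                    .where(new SimpleCondition<Tuple2<String, Double>>() {
                        @Override
                        public boolean filter(Tuple2<String, Double> r) {
                            return r.f1 > 0.9;   // threshold is fixed at compile time here
                        }
                    })
                    .timesOrMore(3)              // "n number of times"
                    .within(Time.minutes(5));    // "in x minutes"

            CEP.pattern(readings.keyBy(r -> r.f0), highCpu)
                .select(new PatternSelectFunction<Tuple2<String, Double>, String>() {
                    @Override
                    public String select(Map<String, List<Tuple2<String, Double>>> match) {
                        return "CPU alert, " + match.get("high").size() + " high samples";
                    }
                })
                .print();

            env.execute("cpu-alert-sketch");
        }
    }

For the changing-threshold part of the question, a compiled pattern like this cannot read a per-device threshold at runtime; the usual workaround is a KeyedProcessFunction (or a broadcast control stream) that keeps the current threshold in state, since dynamic CEP patterns were not supported in Flink at the time.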

Re: Flink Dynamodb as sink

2020-02-03 Thread hemant singh
Thanks, I'll check it out.

On Tue, Feb 4, 2020 at 12:30 PM 容祖儿 wrote:
> You can customize a sink function (implement SinkFunction); that's not so hard.
>
> Regards.
>
> On Tue, Feb 4, 2020 at 2:38 PM hemant singh wrote:
>
>> Hi All,
>>
>> I am using dy…
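
Since no official DynamoDB connector existed at the time, a custom sink along the lines 容祖儿 suggests might look like the sketch below (AWS SDK v1; the table name and item shape are assumed, and region/credentials come from the environment):

    import java.util.Map;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
    import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
    import com.amazonaws.services.dynamodbv2.model.AttributeValue;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    public class DynamoDbSink extends RichSinkFunction<Map<String, AttributeValue>> {

        private final String tableName;
        private transient AmazonDynamoDB client;

        public DynamoDbSink(String tableName) {
            this.tableName = tableName;
        }

        @Override
        public void open(Configuration parameters) {
            // Region and credentials are picked up from the environment/instance profile.
            client = AmazonDynamoDBClientBuilder.standard().build();
        }

        @Override
        public void invoke(Map<String, AttributeValue> item, Context context) {
            client.putItem(tableName, item); // one PutItem per record; batch for throughput
        }

        @Override
        public void close() {
            if (client != null) {
                client.shutdown();
            }
        }
    }

Wired in with `stream.addSink(new DynamoDbSink("metrics-table"))`. A plain sink like this gives at-least-once delivery at best under failures, so idempotent writes (a deterministic primary key) are advisable.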

Flink Dynamodb as sink

2020-02-03 Thread hemant singh
Hi All,

I am using DynamoDB as a sink for one of my metrics pipelines. I wanted to understand whether there are any existing connectors. I searched and could not find one. If none exists, has anyone implemented one? Any hints in that direction will help a lot.

Thanks,
Hemant