Re: Need suggestion on Flink-Kafka stream processing design

2020-05-13 Thread Arvid Heise
Hi Hemant, what you described is an aggregation. You aggregate 15 small records into one large record. The concept of aggregation goes beyond pure numeric operations; for example, forming one large string with CONCAT is also a type of aggregation. In your case, I'd still follow my general outline

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread hemant singh
Hi Arvid, I don't want to aggregate all events, rather I want to create a record for a device combining data from multiple events. Each of this event gives me a metric for a device, so for example if I want one record for device-id=1 the metric will look like metric1, metric2, metric3 where m

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread Arvid Heise
Hi Hemant, In general, you want to keep all data coming from one device in one Kafka partition, such that the timestamps of that device are monotonically increasing. Thus, when processing data from one device, you have ensured that no out-of-order events with respect to this device happen. If you

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread hemant singh
Hello Flink Users, Any views on this question of mine. Thanks, Hemant On Tue, May 12, 2020 at 7:00 PM hemant singh wrote: > Hello Roman, > > Thanks for your response. > > I think partitioning you described (event type + protocol type) is subject > to data skew. Including a device ID should sol

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread hemant singh
Hello Roman, Thanks for your response. I think partitioning you described (event type + protocol type) is subject to data skew. Including a device ID should solve this problem. Also, including "protocol_type" into the key and having topic per protocol_type seems redundant. Each protocol is in sin

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-12 Thread Khachatryan Roman
Hello Hemant, Thanks for your reply. I think partitioning you described (event type + protocol type) is subject to data skew. Including a device ID should solve this problem. Also, including "protocol_type" into the key and having topic per protocol_type seems redundant. Furthermore, do you have

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
Hello Roman, PFB my response - As I understand, each protocol has a distinct set of event types (where event type == metrics type); and a distinct set of devices. Is this correct? Yes, correct. distinct events and devices. Each device emits these event. > Based on data protocol I have 4-5 topics

Re: Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread Khachatryan Roman
Hi Hemant, As I understand, each protocol has a distinct set of event types (where event type == metrics type); and a distinct set of devices. Is this correct? > Based on data protocol I have 4-5 topics. Currently the data for a single event is being pushed to a partition of the kafka topic(produ

Need suggestion on Flink-Kafka stream processing design

2020-05-11 Thread hemant singh
Hi, I have different events from a device which constitutes different metrics for same device. Each of these event is produced by the device in interval of few milli seconds to a minute. Event1(Device1) -> Stream1 -> Metric 1 Event2 (Device1) -> Stream2 -> Metric 2 ... .. ... Even