Implications on incremental checkpoint duration based on RocksDBStateBackend, Number of Slots per TM

2020-06-17 Thread Sameer W
Hi, The number of RocksDB databases the Flink creates is equal to the number of operator states multiplied by the number of slots. Assuming a parallelism of 100 for a job which is executed on 100 TM's with 1 slot per TM as opposed to 10 TM's with 10 slots per TM I have noticed that the former

Re: Processing Message after emitting to Sink

2020-04-23 Thread Sameer W
One idea that comes to my mind is to convert ProcessFunction1 with a CoProcessFunction[1]. The processElement1() function can send to side-output and process and maintain the business function message as State without emitting it. Then as Arvid mentioned processElement2() can listen on the side

Re: Best way to link static data to event data?

2019-09-27 Thread Sameer W
Connected Streams is one option. But may be an overkill in your scenario if your CSV does not refresh. If your CSV is small enough (number of records wise), you could parse it and load it into an object (serializable) and pass it to the constructor of the operator where you will be streaming the

Re: Apache Flink - Question about dynamically changing window end time at run time

2019-04-24 Thread Sameer W
Global Windows is fine for this use case. I have used the same strategy. You just define custom evictors and triggers and you are all good. Windows are managed by keys, so as such as long as events are evicted from the window, that counts towards reclaiming memory for the key+window combination.

JAXB Classloading errors when using PMML Library (AWS EMR Flink 1.4.2)

2018-09-03 Thread Sameer W
Hi, I am using PMML dependency as below to execute ML models at prediction time within a Flink Map operator org.jpmml pmml-evaluator 1.4.3 javax.xml.bind jaxb-api org.glassfish.jaxb jaxb-runtime guava com.google.guava Environment is EMR, OpenJDK 1.8 and Flink 1.4.2.

Do Flink metrics survive a shutdown?

2018-05-09 Thread Sameer W
I want to use Flink metrics API to store user defined metrics (counters). I instantiate the MetricsGroup in the open() function of the RichMapFunction and increment the counters which are created within the metrics group. If the job restarts on failure, will the counters get restored from state?

Re: Feasability Question: Distributed FlinkCEP

2016-10-20 Thread Sameer W
Could you not do separate followedBy and then perform a join on the resulting alert stream. Pattern p1= followedBy(/*1st*/) Pattern p2= followedBy(/*1st*/) DataStream alertStream1 = CEP.pattern(keyedDs, p1) DataStream alertStream2 = CEP.pattern(keyedDs, p2) Then just join the two alertStream's

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread Sameer W
t; the rules/patterns that use not just the current event attributes, but also > past events (e.g. followedBy) are much harder to make them dynamic without > some help from Flink that implements the CEP operators. > > - LF > > > > > -- > *From:* Sam

Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread Sameer W
I have used a JavaScript engine in my CEP to evaluate my patterns. Each event is a list of named attributes (HashMap like). And event is attached to a list of rules expressed as JavaScript code (See example below with one rule but I can match as many rules). The rules are distributed over a

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread Sameer W
> Till > > On Tue, Oct 11, 2016 at 4:50 PM, Sameer W <sam...@axiomine.com> wrote: > >> Hi, >> >> If you know that the events are arriving in order and a consistent lag, >> why not just increment the watermark time every time the >> getCurrentWatermark(

Re: CEP and slightly out of order elements

2016-10-11 Thread Sameer W
Cheers, > Till > > On Tue, Oct 11, 2016 at 1:51 PM, Sameer W <sam...@axiomine.com> wrote: > >> Hi, >> >> If using CEP with event-time I have events which can be slightly out of >> order and I want to sort them by timestamp within their time-windows before >

Re: Listening to timed-out patterns in Flink CEP

2016-10-11 Thread Sameer W
Hi, If you know that the events are arriving in order and a consistent lag, why not just increment the watermark time every time the getCurrentWatermark() method is invoked based on the autoWatermarkInterval (or less to be conservative). You can check if the watermark has changed since the

CEP and slightly out of order elements

2016-10-11 Thread Sameer W
Hi, If using CEP with event-time I have events which can be slightly out of order and I want to sort them by timestamp within their time-windows before applying CEP- For example, if using 5 second windows and I use the following ds2 = ds.keyBy.window(TumblingWindow(10 seconds).apply(/*Sort by

Side Inputs vs. Connected Streams

2016-10-03 Thread Sameer W
Hi, I read the Side Inputs design document. How does it compare to using ConnectedStreams with respect to handling the ordering of streams transparently? One of the challenges I have with ConnectedStreams is

Re: Accessing state in connected streams

2016-08-27 Thread Sameer W
Ok sorry about that :-). I misunderstood as I am not familiar with Scala code. Just curious though how are you passing two MapFunction's to the flatMap function on the connected stream. The interface of ConnectedStream requires just one CoMap function-

Re: Accessing state in connected streams

2016-08-27 Thread Sameer W
There is no guarantee about the order in which each stream elements arrive in a connected streams. You have to check if the elements have arrived from Stream A before using the information to process elements from Stream B. Otherwise you have to buffer elements from stream B and check if there are

Re: Threading Model for Kinesis

2016-08-23 Thread Sameer W
e will still have 2 partitions. If the > 2 partitions belong to the same broker, the source instance will have only > 1 consuming threads; otherwise if the 2 partitions belong to different > brokers, the source instance will have 2 consuming threads. > > Regards, > Gordon > >

Re: Threading Model for Kinesis

2016-08-23 Thread Sameer W
, the source > instance will only create 1 thread to consume all of them. > > You are correct that currently the Kafka consumer does not handle > repartitioning transparently like the Kinesis connector, but we’re working > on this :) > > Regards, > Gordon > > On Aug

Re: Default timestamps for Event Time when no Watermark Assigner used?

2016-08-23 Thread Sameer W
y > assign timestamps, the Kinesis server-side timestamp (the time which > Kinesis received the record) is attached to the record as default, not > Flink’s ingestion time. > > Does this answer your question? > > Regards, > Gordon > > > On August 23, 2016 at 6:

Re: Threading Model for Kinesis

2016-08-23 Thread Sameer W
) > > Regards, > Gordon > > On August 23, 2016 at 6:50:31 PM, Sameer W (sam...@axiomine.com) wrote: > > Hi, > > The documentation says that there will be one thread per shard. If I my > streaming job runs with a parallelism of 10 and there are 20 shards, are > more thr

Threading Model for Kinesis

2016-08-23 Thread Sameer W
Hi, The documentation says that there will be one thread per shard. If I my streaming job runs with a parallelism of 10 and there are 20 shards, are more threads going to be launched within a task slot running a source function to consume the additional shards or will one source function

Default timestamps for Event Time when no Watermark Assigner used?

2016-08-23 Thread Sameer W
Hi, If you do not explicitly assign timestamps and watermarks when using Event Time, does it automatically default to using Ingestion Time? I was reading the Kinesis integration section and came across the note below and which raised the above question. I saw another place where you explicitly

Re: counting elements in datastream

2016-08-18 Thread Sameer W
Use Count windows and keep emitting results say every 1000 elements and do a sum. Or do without windows something like this which has the disadvantage that it emits a new updated result for each new element (not a good thing if your volume is high)-

Re: Multiple Partitions (Source Functions) -> Event Time -> Watermarks -> Trigger

2016-08-12 Thread Sameer W
e > you emitted a Watermark. If you haven't emitted a Watermark for some > time, you can kick off a timeout and emit a Watermark. > > Cheers, > Max > > On Thu, Aug 11, 2016 at 1:05 AM, Sameer W <sam...@axiomine.com> wrote: > > Sorry for replying to my own messages but t

Re: Does Flink DataStreams using combiners?

2016-08-11 Thread Sameer W
Sorry I mean streaming cannot use combiners (repeated below) --- Streaming cannot use combiners. The aggregations happen on the trigger. The elements being aggregated are only known after the trigger delivers the elements to the evaluation function. Since windows can overlap and even

Re: Multiple Partitions (Source Functions) -> Event Time -> Watermarks -> Trigger

2016-08-10 Thread Sameer W
PM, Sameer W <sam...@axiomine.com> wrote: > And this is happening in my local environment. As soon as I set the > parallelism to 1 it all works fine. > > Sameer > > On Wed, Aug 10, 2016 at 3:11 PM, Sameer W <sam...@axiomine.com> wrote: > >> Hi, >> &

Re: Flink : CEP processing

2016-08-10 Thread Sameer W
recovery. If that is > the case, then there are events in the CEP might be in two snapshots. > > Mans > > > On Tuesday, August 9, 2016 1:15 PM, Sameer W <sam...@axiomine.com> wrote: > > > In one of the earlier thread Till explained this to me ( > http://apache-flink-user-

Within interval for CEP - Wall Clock based or Event Timestamp based?

2016-08-10 Thread Sameer W
Hi, I am using EventTime but when the records get into the CEP PatternStream does the WITHIN interval refer to the wall clock time or the timestamps embedded in the event stream? If I provide WITHIN(Time.Seconds(5)) and in processing time I am getting events with timestamps in the range of 10

Re: Multiple Partitions (Source Functions) -> Event Time -> Watermarks -> Trigger

2016-08-10 Thread Sameer W
And this is happening in my local environment. As soon as I set the parallelism to 1 it all works fine. Sameer On Wed, Aug 10, 2016 at 3:11 PM, Sameer W <sam...@axiomine.com> wrote: > Hi, > > I am noticing this behavior with Event Time processing- > > I have a Kafka top

Multiple Partitions (Source Functions) -> Event Time -> Watermarks -> Trigger

2016-08-10 Thread Sameer W
Hi, I am noticing this behavior with Event Time processing- I have a Kafka topic with 10 partitions. Each Event Source sends data to any one of the partitions. Say I have only 1 event source active at this moment, which means only one partition is receiving data. None of my windows will fire

Connected Streams - Controlling Order of arrival on the two streams

2016-08-09 Thread Sameer W
Hi, I am using connected streams to send rules coded as JavaScript functions on one stream and event data on another stream. They are both keyed by the device id. The rules are cached in the co-map operation until another rule arrives to override existing rule. Is there a way to ensure that the

Re: Flink : CEP processing

2016-08-09 Thread Sameer W
Sameer. > > So does that mean that if the events keys are not same we cannot use the > CEP pattern match ? What if events are coming from different sources and > need to be correlated ? > > Mans > > > On Tuesday, August 9, 2016 9:40 AM, Sameer W <sam...@axiomine.co

Re: Flink : CEP processing

2016-08-09 Thread Sameer W
Hi, You will need to use keyBy operation first to get all the events you need monitored in a pattern on the same node. Only then can you apply Pattern because it depends on the order of the events (first, next, followed by). I even had to make sure that the events were correctly sorted by

Re: Having a single copy of an object read in a RichMapFunction

2016-08-04 Thread Sameer W
Theodore, Broadcast variables do that when using the DataSet API - http://data-artisans.com/how-to-factorize-a-700-gb-matrix-with-apache-flink/ See the following lines in the article- To support the above presented algorithm efficiently we had to improve Flink’s broadcasting mechanism since it

Re: CEP and Within Clause

2016-08-02 Thread Sameer W
rrives in the 11th minute, then it depends whether the second > T=31 arrived sometime between the 1st and 11th minute. If that's the case, > then you should also see a second matching. > > Cheers, > Till > > On Tue, Aug 2, 2016 at 10:20 PM, Sameer W <sam...@axiomine.com> wro

Re: CEP and Within Clause

2016-08-02 Thread Sameer W
gt; Till > > > > On Tue, Aug 2, 2016 at 5:12 AM, Aljoscha Krettek <aljos...@apache.org> > wrote: > >> +Till, looping him in directly, he probably missed this because he was >> away for a while. >> >> >> >> On Tue, 26 Jul 2016 at 18:21 Sameer W <

CEP and Within Clause

2016-07-26 Thread Sameer W
Hi, It looks like the WithIn clause of CEP uses Tumbling Windows. I could get it to use Sliding windows by using an upstream pipeline which uses Sliding Windows and produces repeating elements (in each sliding window) and applying a Watermark assigner on the resulting stream with elements

Re: Question about Checkpoint Storage (RocksDB)

2016-07-26 Thread Sameer W
(RocksDB?) when a threshold size is reached? Thanks, Sameer On Tue, Jul 26, 2016 at 7:29 AM, Ufuk Celebi <u...@apache.org> wrote: > On Mon, Jul 25, 2016 at 8:50 PM, Sameer W <sam...@axiomine.com> wrote: > > The question is, if using really long windows (in hours) if the stat

Question about Checkpoint Storage (RocksDB)

2016-07-25 Thread Sameer W
Hi, My understanding about the RocksDB state backend is as follows: When using a RocksDB state backend, it the checkpoints are backed up locally (to the TaskManager) using the backup feature of RocksDB by taking snapshots from RocksDB which are consistent read-only views on the RockDB database.

Re: Processing windows in event time order

2016-07-21 Thread Sameer W
the elements with a timestamp lower than the watermark have been sent as > well. > > On Thu, 21 Jul 2016 at 13:10 Sameer W <sam...@axiomine.com> wrote: > >> Thanks, Aljoscha, >> >> This what I am seeing when I use Ascending timestamps as watermarks- >>

Re: Processing windows in event time order

2016-07-21 Thread Sameer W
sent as > well. > > On Thu, 21 Jul 2016 at 13:10 Sameer W <sam...@axiomine.com> wrote: > >> Thanks, Aljoscha, >> >> This what I am seeing when I use Ascending timestamps as watermarks- >> >> Consider a window if 1-5 seconds >> Stream 1- Send

Re: Aggregate events in time window

2016-07-19 Thread Sameer W
How about using EventTime windows with watermark assignment and bounded delays. That way you allow more than 5 minutes (bounded delay) for your request and responses to arrive. Do you have a way to assign timestamp to the responses based on the request timestamp (does the response contain the

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Sameer W
rectly. > > Does it mean I create a jar package and then run it via eclipse? > > If not, could you point me to some resources? > > Thanks > Biplob > > > Sameer W wrote > > From Eclipse it creates a local environment and runs in the IDE. When the > > program

Re: Can't access Flink Dashboard at 8081, running Flink program using Eclipse

2016-07-19 Thread Sameer W
>From Eclipse it creates a local environment and runs in the IDE. When the program finishes so does the Flink execution instance. I have never tried accessing the console when the program is running but one the program is finished there is nothing to connect to. If you need to access the