Joining two streams of different priorities

2019-03-06 Thread Aggarwal, Ajay
My main input stream (inputStream1) gets processed using a pipeline that looks like the one below: inputStream1.keyBy("some-key").window(TumblingEventTimeWindows.of(Time.seconds(Properties.WINDOW_SIZE)))
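Below is a minimal, self-contained sketch of that pipeline shape (keyBy followed by a tumbling event-time window), written against the 1.x DataStream API. The source elements, the 30-second window size, and the sum() window function are stand-ins, not values from the original job.

```java
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.AscendingTimestampExtractor;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class WindowedPipelineSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        int windowSizeSeconds = 30; // stand-in for Properties.WINDOW_SIZE

        // Stand-in source: (key, event-time-millis) pairs instead of the real inputStream1.
        DataStream<Tuple2<String, Long>> inputStream1 = env
                .fromElements(Tuple2.of("key-a", 1_000L), Tuple2.of("key-a", 2_000L), Tuple2.of("key-b", 3_000L))
                .assignTimestampsAndWatermarks(new AscendingTimestampExtractor<Tuple2<String, Long>>() {
                    @Override
                    public long extractAscendingTimestamp(Tuple2<String, Long> element) {
                        return element.f1;
                    }
                });

        inputStream1
                .keyBy(0) // the original job uses a field-expression key: keyBy("some-key")
                .window(TumblingEventTimeWindows.of(Time.seconds(windowSizeSeconds)))
                .sum(1)   // stand-in for whatever per-key, per-window logic the job applies
                .print();

        env.execute("windowed-pipeline-sketch");
    }
}
```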

Re: Broadcast state with WindowedStream

2019-03-06 Thread Aggarwal, Ajay
Still looking for ideas as to how I can use broadcast state in my use case. From: "Aggarwal, Ajay" Date: Monday, March 4, 2019 at 4:52 PM To: "user@flink.apache.org" Subject: Re: Broadcast state with WindowedStream It sort of makes sense that broadcast st

Re: Checkpoint recovery and state external to flink

2019-03-05 Thread Aggarwal, Ajay
Hi Yun, This is good information. Thank you. However, it looks like it only applies to SinkFunction. Any thoughts for when intermediate operators are also interacting with external systems? Thanks. Ajay From: Yun Tang Date: Tuesday, March 5, 2019 at 4:04 AM To: "Aggarwal, Ajay"

Re: Broadcast state with WindowedStream

2019-03-04 Thread Aggarwal, Ajay
It sort of makes sense that broadcast state is not available with WindowedStream. But if I need some dynamic global state in MyProcessWindowFunction, what are my options? Ajay From: "Aggarwal, Ajay" Date: Monday, March 4, 2019 at 4:36 PM To: "user@flink.apache.org" Subj
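One possible workaround, sketched below as an assumption rather than the thread's conclusion: apply the broadcast state in a KeyedBroadcastProcessFunction that enriches each event before the window, and then run the windowing on the enriched stream. The "key|payload" event format, the config descriptor, and the enrich() helper are all hypothetical.

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.functions.co.KeyedBroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class BroadcastEnrichmentSketch {

    // Hypothetical broadcast state: one config string per key.
    static final MapStateDescriptor<String, String> CONFIG_DESCRIPTOR = new MapStateDescriptor<>(
            "config", BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

    // keyedEvents is assumed to be keyed by the "key" part of a "key|payload" string.
    public static DataStream<String> enrich(KeyedStream<String, String> keyedEvents,
                                            DataStream<String> configUpdates) {
        BroadcastStream<String> broadcastConfig = configUpdates.broadcast(CONFIG_DESCRIPTOR);

        return keyedEvents
                .connect(broadcastConfig)
                .process(new KeyedBroadcastProcessFunction<String, String, String, String>() {
                    @Override
                    public void processElement(String event, ReadOnlyContext ctx, Collector<String> out) throws Exception {
                        String key = event.split("\\|", 2)[0];
                        String config = ctx.getBroadcastState(CONFIG_DESCRIPTOR).get(key);
                        // Tag the event with the current config; the window function
                        // downstream then sees the "global state" it needs.
                        out.collect(event + "|" + (config == null ? "default" : config));
                    }

                    @Override
                    public void processBroadcastElement(String update, Context ctx, Collector<String> out) throws Exception {
                        String[] kv = update.split("=", 2); // hypothetical "key=value" update format
                        ctx.getBroadcastState(CONFIG_DESCRIPTOR).put(kv[0], kv[1]);
                    }
                });
        // The returned stream can then be keyed and windowed exactly as in the original job.
    }
}
```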

Broadcast state with WindowedStream

2019-03-04 Thread Aggarwal, Ajay
Is it possible to use broadcast state with windowing? My job looks like the one below: inputStream.keyBy("some-key").window(TumblingEventTimeWindows.of(Time.seconds(Properties.WINDOW_SIZE)))

Checkpoint recovery and state external to flink

2019-03-04 Thread Aggarwal, Ajay
What happens when the flink job interacts with a user-managed database and hence has some state outside of flink? In these situations, when a flink job is recovered from the last successful checkpoint, this external state will not be in sync with the recovered flink state. In most cases it will be
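A hedged sketch of one common pattern for this problem (an assumption on my part, not necessarily what the thread settled on): buffer external writes in Flink state and only push them to the external system once the enclosing checkpoint completes, so the external system never runs ahead of a checkpoint Flink might roll back to. The ExternalClient call is hypothetical, and the buffer handling is simplified (it flushes everything buffered so far, so the external write should be idempotent).

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.CheckpointListener;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;

public class BufferedExternalWriter extends ProcessFunction<String, String>
        implements CheckpointedFunction, CheckpointListener {

    private transient ListState<String> pendingWrites;      // checkpointed copy of the buffer
    private final List<String> buffer = new ArrayList<>();  // in-memory working buffer

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) {
        buffer.add(value);   // defer the external write instead of doing it inline
        out.collect(value);  // keep the main pipeline flowing
    }

    @Override
    public void snapshotState(FunctionSnapshotContext context) throws Exception {
        pendingWrites.update(buffer); // pending writes become part of the checkpoint
    }

    @Override
    public void initializeState(FunctionInitializationContext context) throws Exception {
        pendingWrites = context.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("pending-writes", String.class));
        buffer.clear();
        for (String v : pendingWrites.get()) {
            buffer.add(v); // on recovery, writes that never reached the external system are replayed
        }
    }

    @Override
    public void notifyCheckpointComplete(long checkpointId) {
        // Only now touch the external system; the write should be idempotent because
        // this notification is not itself delivered with exactly-once guarantees.
        // ExternalClient.writeAll(buffer);  // hypothetical external call
        buffer.clear();
    }
}
```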

KeyBy distribution across taskslots

2019-02-27 Thread Aggarwal, Ajay
I couldn’t find reference to it anywhere in the docs, so I thought I would ask here. When I use the KeyBy operator, say KeyBy(“customerId”), and some keys (i.e. customers) are much noisier than others, is there a way to ensure that too many noisy customers do not land on the same task slot? In
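A hedged sketch of one common mitigation for key skew, offered as an assumption rather than something from the thread: salt the key so a single noisy customerId fans out over several subtasks, pre-aggregate per (customerId, salt), then re-key by customerId alone to combine the partial results. The field layout, the salt range, and the 10-second windows are all hypothetical, and the second stage's window boundaries are only approximate.

```java
import java.util.concurrent.ThreadLocalRandom;

import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class SaltedKeySketch {

    private static final int SALT_BUCKETS = 8; // hypothetical fan-out per customer

    // events are (customerId, salt, count); the salt field starts out unused
    public static DataStream<Tuple3<String, Integer, Long>> countPerCustomer(
            DataStream<Tuple3<String, Integer, Long>> events) {
        return events
                // assign a random salt so one hot customerId spreads over SALT_BUCKETS subtasks
                .map(e -> Tuple3.of(e.f0, ThreadLocalRandom.current().nextInt(SALT_BUCKETS), e.f2))
                .returns(events.getType())
                .keyBy(0, 1) // stage 1: key by (customerId, salt)
                .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
                .sum(2)      // partial count per salted key and window
                .keyBy(0)    // stage 2: key by customerId only; far fewer records arrive here
                .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
                .sum(2);     // combine the partials into the per-customer count
    }
}
```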

Re: Impact of occasional big pauses in stream processing

2019-02-14 Thread Aggarwal, Ajay
other jobs as well? Is a task slot shared by multiple jobs? If not, my understanding is that this should not impact other flink jobs. Is that correct? Thanks. Ajay From: Andrey Zagrebin Date: Thursday, February 14, 2019 at 5:09 AM To: Rong Rong Cc: "Aggarwal, Ajay" , "user@f

Impact of occasional big pauses in stream processing

2019-02-13 Thread Aggarwal, Ajay
I was wondering what the impact is if one of the stream operator functions occasionally takes too long to process an event. Given the following simple flink job: inputStream.keyBy(“tenantId”).process(new MyKeyedProcessFunction()), if occasionally
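For concreteness, here is a minimal sketch of that job shape, with comments on where a long pause bites. MyKeyedProcessFunction is only a placeholder name taken from the thread; the slow path is simulated with a sleep and the event type is a stand-in.

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class SlowOperatorSketch {

    static class MyKeyedProcessFunction extends KeyedProcessFunction<String, Tuple2<String, String>, String> {
        @Override
        public void processElement(Tuple2<String, String> event, Context ctx, Collector<String> out) throws Exception {
            if ("slow-tenant".equals(event.f0)) {
                // While this thread is blocked, the whole subtask is blocked:
                // every other key mapped to this subtask waits, input buffers fill up,
                // backpressure propagates upstream, and checkpoint barriers are delayed.
                Thread.sleep(5_000);
            }
            out.collect(event.f0 + ":" + event.f1);
        }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Tuple2<String, String>> inputStream =
                env.fromElements(Tuple2.of("tenant-a", "e1"), Tuple2.of("slow-tenant", "e2"));

        inputStream
                .keyBy(new KeySelector<Tuple2<String, String>, String>() {
                    @Override
                    public String getKey(Tuple2<String, String> e) {
                        return e.f0; // keyBy("tenantId") in the original job
                    }
                })
                .process(new MyKeyedProcessFunction())
                .print();

        env.execute("slow-operator-sketch");
    }
}
```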

Re: stream of large objects

2019-02-12 Thread Aggarwal, Ajay
Thanks Konstantin. And when serialization of events does become an issue because of size (say 100s of MBs or GBs), how does it manifest itself? Is it mostly latency or something else? Ajay From: Konstantin Knauf Date: Tuesday, February 12, 2019 at 3:41 AM To: "Aggarwal, Ajay" C

Re: stream of large objects

2019-02-11 Thread Aggarwal, Ajay
in a Keyed context, so sharing all of these across all downstream tasks does not seem efficient. From: Chesnay Schepler Date: Sunday, February 10, 2019 at 4:57 AM To: "Aggarwal, Ajay" , "user@flink.apache.org" Subject: Re: stream of large objects

Re: stream of large objects

2019-02-08 Thread Aggarwal, Ajay
Date: Friday, February 8, 2019 at 8:45 AM To: "Aggarwal, Ajay" , "user@flink.apache.org" Subject: Re: stream of large objects Whether a LargeMessage is serialized depends on how the job is structured. For example, if you were to only apply map/filter functions after the aggr

stream of large objects

2019-02-07 Thread Aggarwal, Ajay
In my use case my source stream contains small messages, but as part of flink processing I will be aggregating them into large messages, and further processing will happen on these large messages. The structure of this large message will be something like this: Class LargeMessage {
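The thread's LargeMessage definition is cut off above, so the class below is a purely hypothetical stand-in just to make the discussion concrete. The point it illustrates is that the aggregate carries the full list of small input messages, so it can get big, and it is (de)serialized whenever it crosses a network shuffle (keyBy, rebalance) or is written into state.

```java
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the LargeMessage type discussed in this thread.
public class LargeMessage implements Serializable {
    private static final long serialVersionUID = 1L;

    private String largeMessageId;                           // hypothetical: id used to assemble fragments
    private List<String> smallMessages = new ArrayList<>();  // hypothetical payload; this is what grows

    public LargeMessage() {} // public no-arg constructor plus getters/setters keep it a Flink POJO

    public String getLargeMessageId() { return largeMessageId; }
    public void setLargeMessageId(String largeMessageId) { this.largeMessageId = largeMessageId; }

    public List<String> getSmallMessages() { return smallMessages; }
    public void setSmallMessages(List<String> smallMessages) { this.smallMessages = smallMessages; }

    public void append(String smallMessage) {
        smallMessages.add(smallMessage);
    }
}
```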

late element and expired state

2019-02-05 Thread Aggarwal, Ajay
Hello, I have some questions regarding best practices for dealing with ever-expanding state with KeyBy(). In my input stream I will continue to see new keys, and I am using keyed state. How do I keep the total state within limits? After reading the flink documentation and some blogs I am planning to
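Since the question is about bounding keyed state when new keys keep arriving, here is a hedged sketch of state TTL (available since Flink 1.6), which lets expired per-key entries be dropped. The one-hour retention, the state name, and the counter logic are assumptions for illustration, not values from the thread.

```java
import org.apache.flink.api.common.state.StateTtlConfig;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class TtlStateSketch extends KeyedProcessFunction<String, String, String> {

    private transient ValueState<Long> perKeyCounter;

    @Override
    public void open(Configuration parameters) {
        StateTtlConfig ttlConfig = StateTtlConfig
                .newBuilder(Time.hours(1))                                 // hypothetical retention
                .setUpdateType(StateTtlConfig.UpdateType.OnCreateAndWrite) // refresh TTL on writes
                .setStateVisibility(StateTtlConfig.StateVisibility.NeverReturnExpired)
                .build();

        ValueStateDescriptor<Long> descriptor = new ValueStateDescriptor<>("per-key-counter", Long.class);
        descriptor.enableTimeToLive(ttlConfig); // expired entries are dropped instead of accumulating forever

        perKeyCounter = getRuntimeContext().getState(descriptor);
    }

    @Override
    public void processElement(String value, Context ctx, Collector<String> out) throws Exception {
        Long count = perKeyCounter.value();
        long updated = (count == null ? 0L : count) + 1;
        perKeyCounter.update(updated);
        out.collect(ctx.getCurrentKey() + " -> " + updated);
    }
}
```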

Re: Reverse of KeyBy

2019-02-04 Thread Aggarwal, Ajay
Thanks Fabian for the explanation. Let me do some more reading so what you said can sink in a little more. From: Fabian Hueske Date: Monday, February 4, 2019 at 10:22 AM To: "Aggarwal, Ajay" Cc: Congxian Qiu , "user@flink.apache.org" Subject: Re: Reverse of KeyBy Hi, Su

Re: Reverse of KeyBy

2019-02-04 Thread Aggarwal, Ajay
first KeyBy into a single stream? Ajay From: Fabian Hueske Date: Monday, February 4, 2019 at 9:17 AM To: "Aggarwal, Ajay" Cc: Congxian Qiu , "user@flink.apache.org" Subject: Re: Reverse of KeyBy Hi, Calling keyBy twice will not work, because the second call overrides t

Re: Reverse of KeyBy

2019-02-04 Thread Aggarwal, Ajay
, 2019 at 6:40 AM To: "Aggarwal, Ajay" Cc: "user@flink.apache.org" Subject: Re: Reverse of KeyBy Hi Aggarwal, how about keyBy(LargeMessageID) first, then assemble these fragments back into LargeMessages, then keyBy(MyKey)? Best, Congxian Aggarwal, Ajay mailto:ajay.aggar...@ne

Reverse of KeyBy

2019-02-01 Thread Aggarwal, Ajay
I am new to Flink. I am trying to figure out if there is an operator that provides the reverse functionality of KeyBy. Using KeyBy, you can split a stream into disjoint partitions. Is there a way to bring those partitions back into a single stream? Let me explain using my use case below. My
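As a hedged illustration of the answers later in this thread (assemble by one key, then re-key by another): the result of a keyed operation is an ordinary DataStream again, so there is no explicit "reverse" operator; you simply apply the next keyBy(), or rebalance()/global() if you literally need records spread out or collected into one partition. Field names and the reduce placeholders below are hypothetical.

```java
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.datastream.DataStream;

public class ReKeySketch {

    // records are (largeMessageId, myKey, payloadFragment)
    public static DataStream<Tuple3<String, String, String>> reassembleThenRekey(
            DataStream<Tuple3<String, String, String>> fragments) {

        DataStream<Tuple3<String, String, String>> assembled = fragments
                .keyBy(new KeySelector<Tuple3<String, String, String>, String>() {
                    @Override
                    public String getKey(Tuple3<String, String, String> f) {
                        return f.f0; // first keyBy: group fragments of the same large message
                    }
                })
                // stand-in for the operator that assembles fragments into one record;
                // its output is a plain DataStream, no longer "keyed"
                .reduce((a, b) -> Tuple3.of(a.f0, a.f1, a.f2 + b.f2));

        return assembled
                .keyBy(new KeySelector<Tuple3<String, String, String>, String>() {
                    @Override
                    public String getKey(Tuple3<String, String, String> r) {
                        return r.f1; // second keyBy: continue by the business key
                    }
                })
                // placeholder for the real downstream keyed processing; alternatively
                // assembled.rebalance() or assembled.global() if one stream/partition is wanted
                .reduce((a, b) -> Tuple3.of(a.f0, a.f1, a.f2 + "," + b.f2));
    }
}
```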