Clear global state

2023-02-02 Thread Dario Heinisch
Hey all, Is it somehow possible to hook into all states in a current Job and clear them all at once? Currently the way I do it is just to stop the job and then restarting it. Was wonderding if there is a way where I can do it without restarting the job. I know about adding TTL to states but

Re: Notify on 0 events in a Tumbling Event Time Window

2022-05-09 Thread Dario Heinisch
It depends on the user case,  in Shilpa's use case it is about users so the user ids are probably know beforehand. https://dpaste.org/cRe3G <= This is an example with out an window but essentially Shilpa you would be reregistering the timers every time they fire. You would also have to ingest

Re: Controlling group partitioning with DataStream

2022-03-04 Thread Dario Heinisch
Hi, I think you are looking for this answer from David: https://stackoverflow.com/questions/69799181/flink-streaming-do-the-events-get-distributed-to-each-task-slots-separately-acc I think then you could technically create your partitioner - though little bit cubersome - by mapping your exist

Re: Tumbling window apply will not "fire"

2022-01-31 Thread Dario Heinisch
just want to do tumbling window over the "processing time" I.e: count whatever I have in the last 5 minutes. On Mon, 31 Jan 2022 at 17:09, Dario Heinisch wrote: Hi John This is because you are using event time (TumblingEventTimeWinodws) but you do not have a event tim

Re: Tumbling window apply will not "fire"

2022-01-31 Thread Dario Heinisch
Hi John This is because you are using event time (TumblingEventTimeWinodws) but you do not have a event time watermark strategy. It is also why I opened: https://issues.apache.org/jira/browse/FLINK-24623 because I feel like Flink should be throwing an exception in that case on startup. Take

Flink test late elements

2022-01-28 Thread Dario Heinisch
Hey there, Hope everyone is well! I have a question: ``` StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();     env.setParallelism(1);     DataStream dataStream = env.addSource(new CustomSource());     OutputTag outputTag = new OutputTag("late"

Re: [EXTERNAL] Re: Periodic Job Failure

2021-12-16 Thread Dario Heinisch
A shot in the dark but could it be this: https://mux.com/blog/5-years-of-flink-at-mux/ ? > The JVM will cache DNS entries forever by default. This is undesirable in Kubernetes deployments where there’s an expectation that DNS entries can and do change frequently as pod deployments move between

Re: to join or not to join, that is the question...

2021-11-05 Thread Dario Heinisch
Union creates a new stream containing all elements of the unioned streams: https://nightlies.apache.org/flink/flink-docs-release-1.14/docs/dev/datastream/operators/overview/#union On 05.11.21 14:25, Marco Villalobos wrote: Can two different streams flow to the same operator (an operator with t

Sliding window with filtering

2021-08-10 Thread Dario Heinisch
Hey there, So I have a stream of data, let the stream be a_1, a_2, a_3, a_4, a_5. Now I would like to have a sliding window which slides by 1 second and takes the data of 1 second. But here comes the caveat: - I only want to keep the data in the window that does not have matching elements.

Re: Delay data elements in pipeline by X minutes

2021-07-19 Thread Dario Heinisch
t approach, then adding a random key to create a keyed stream should work in all cases, right?  Jan On 7/18/21 3:52 PM, Dario Heinisch wrote: Hey Kiran, Yeah was thinking of another solution, so I have one posgresql sink & one kafka sink. So I can just process the data in real time a

Re: Delay data elements in pipeline by X minutes

2021-07-18 Thread Dario Heinisch
8.07.21 15:29, Kiran Japannavar wrote: Hi Dario, Did you explore other options? If your use case (apart from delaying sink writes) can be solved via spark streaming. Then maybe spark streaming with a micro-batch of 15 mins would help. On Sat, Jul 17, 2021 at 10:17 PM Dario Heinisch

Delay data elements in pipeline by X minutes

2021-07-17 Thread Dario Heinisch
Hey there, Hope all is well! I would like to delay the time by 15minutes before my data arrives at my sinks: stream() .map() [] . .print() I tried implementing my own ProcessFunction where TimeStamper is a custom Interface: public abstract class Timestamper {     public abstract long