Hi all, I am not sure when I should use multiple jobs versus a single job containing all the sources and sinks. The following is my code.
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    .......

    // create a Kafka source
    val srcstream = env.addSource(consumer)

    srcstream
      .keyBy(0)
      .window(ProcessingTimeSessionWindows.withGap(Time.days(14)))
      .reduce ...
      .map ...
      .addSink ...

    srcstream
      .keyBy(0)
      .window(ProcessingTimeSessionWindows.withGap(Time.days(28)))
      .reduce ...
      .map ...
      .addSink ...

    env.execute("Job1")

My questions:

1. srcstream is a very high-volume stream, and the session gaps are 2 weeks and 4 weeks. Is the window size a problem? In this case I think it is not, because I am using reduce, which stores only one aggregated value per key and window. Is that right?

2. I have two output operations, one with a 2-week session gap and the other with a 4-week gap. Are they executed in parallel or in sequence?

3. When I have multiple output operations like this, should I break them into two different jobs?

4. Can I run multiple jobs on the same cluster?

Thanks
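In case splitting is the answer, this is roughly the two-job alternative I have in mind. This is only a sketch: the consumer setup, reduce/map logic, and sinks are elided as in my code above, and the job/class names are just illustrative. Each job would read the same Kafka topic (presumably with its own consumer group) and run one of the two session-window pipelines independently.

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    // Job 1: only the 2-week session-gap pipeline
    object TwoWeekJob {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        // same Kafka source as before (consumer config elided)
        val srcstream = env.addSource(consumer)
        srcstream
          .keyBy(0)
          .window(ProcessingTimeSessionWindows.withGap(Time.days(14)))
          .reduce ...
          .map ...
          .addSink ...
        env.execute("TwoWeekJob")
      }
    }

    // Job 2: identical, but with Time.days(28) and its own sink,
    // submitted to the cluster as a separate job.

My worry with this approach is that the source stream is consumed twice from Kafka, which is part of why I am asking whether one job with two output operations is preferable.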