Hi all,

I am not sure when I should go for multiple jobs and when I should have
one job containing all the sources and sinks. Here is my code:

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    // ... (environment and Kafka consumer configuration elided) ...

    // create a Kafka source
    val srcstream = env.addSource(consumer)

    // branch 1: 2-week session gap
    srcstream
      .keyBy(0)
      .window(ProcessingTimeSessionWindows.withGap(Time.days(14)))
      .reduce(...)
      .map(...)
      .addSink(...)

    // branch 2: 4-week session gap
    srcstream
      .keyBy(0)
      .window(ProcessingTimeSessionWindows.withGap(Time.days(28)))
      .reduce(...)
      .map(...)
      .addSink(...)

    env.execute("Job1")

My questions:

1. srcstream is a very high volume stream, and the session gaps are 2
weeks and 4 weeks. Is the amount of window state a problem? I think it
is not, because I am using an incremental reduce, which stores only one
value per key and window instead of buffering every raw event (see the
first sketch below the questions). Is that right?

2. I have two output operations, one with the 2-week window and the
other with the 4-week window. Are they executed in parallel or in
sequence?

3. When I have multiple output operations like this, should I break
them into two different jobs (see the second sketch below)?

4. Can I run multiple jobs on the same cluster?
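To make question 1 concrete, here is the kind of incremental reduce I
mean. This is only a minimal runnable sketch: the (key, count) tuples
and the fromElements source are stand-ins for my real Kafka records and
consumer.

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    object ReduceSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment

        // fromElements stands in for the Kafka source in this sketch
        env.fromElements(("a", 1L), ("a", 2L), ("b", 5L))
          .keyBy(_._1)
          .window(ProcessingTimeSessionWindows.withGap(Time.days(14)))
          // the reduce function is applied incrementally as elements
          // arrive, so the window state holds one running (key, sum)
          // per key instead of buffering every raw event
          .reduce((a, b) => (a._1, a._2 + b._2))
          .print()

        env.execute("reduce-sketch")
      }
    }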
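And for question 3, this is roughly how I imagine the split would look:
each pipeline becomes its own main class and its own job submission.
Again just a sketch with the same stand-in source; in reality each job
would rebuild the Kafka consumer (presumably with its own consumer
group).

    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.ProcessingTimeSessionWindows
    import org.apache.flink.streaming.api.windowing.time.Time

    object TwoWeekJob {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.fromElements(("a", 1L), ("a", 2L))
          .keyBy(_._1)
          .window(ProcessingTimeSessionWindows.withGap(Time.days(14)))
          .reduce((a, b) => (a._1, a._2 + b._2))
          .print()
        env.execute("TwoWeekJob")   // submitted as its own job
      }
    }

    object FourWeekJob {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        env.fromElements(("a", 1L), ("a", 2L))
          .keyBy(_._1)
          .window(ProcessingTimeSessionWindows.withGap(Time.days(28)))
          .reduce((a, b) => (a._1, a._2 + b._2))
          .print()
        env.execute("FourWeekJob")  // a second, independent job
      }
    }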

Thanks
