Hi Zach!

Yes, changing parallelism is pretty high up the priority list. The good
news is that "scaling in" is the simpler part of changing the parallelism,
and we are pushing to get that in soon.


Until then, there is only a pretty ugly trick that you can do right now to
"rescale" the state:

  1) savepoint with high parallelism

  2) run an intermediate job that has the state twice, in two operators:
once with high parallelism, once with low. Emit the state from the first
operator and write it in the second. The first operator gets the operator
ID of the initial high-parallelism state, so the savepoint restores into
it (see the sketch after this list).

  3) Take a savepoint of the intermediate job, then run the low-parallelism
job from it; its stateful operator needs the ID of the second
(low-parallelism) operator in the intermediate job.


Greetings,
Stephan


On Thu, Feb 18, 2016 at 9:24 AM, Ufuk Celebi <u...@apache.org> wrote:

> Hey Zach!
>
> Sounds like a great use case.
>
> On Wed, Feb 17, 2016 at 3:16 PM, Zach Cox <zcox...@gmail.com> wrote:
> > However, the savepoint docs state that the job parallelism cannot be
> changed
> > over time [1]. Does this mean we need to use the same, fixed
> parallelism=n
> > during reprocessing and going forward? Are there any tricks or
> workarounds
> > we could use to still make changes to parallelism and take advantage of
> > savepoints?
>
> Yes, currently you have to keep the parallelism fixed. Dynamic scale
> in and out of programs will have very high priority after the 1.0
> release [1]. Unfortunately, I'm not aware of any workarounds to
> overcome this at the moment.
>
> – Ufuk
>
> [1] https://flink.apache.org/news/2015/12/18/a-year-in-review.html (at
> the end of the post there is a road map for 2016)
>