I am looking at the Flink Management REST API and, as far as I can see,
there are two ways to rescale a running topology:

1. Stop the topology with a savepoint and then restart it from that
savepoint with the new parallelism; or
2. Use the /jobs/:jobid/rescaling
<https://ci.apache.org/projects/flink/flink-docs-release-1.8/monitoring/rest_api.html#jobs-jobid-rescaling>
endpoint.
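
To make the two paths concrete, here is a minimal sketch of the REST requests involved, as I read the 1.8 REST API docs: a PATCH with a mandatory `parallelism` query parameter for the rescaling endpoint, a GET on the returned trigger id to poll it, and a POST to `/jobs/:jobid/savepoints` with `cancel-job` set for the savepoint path. The class name, method names, and the base URL are hypothetical, the JSON body is built naively without escaping, and the requests are only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) the REST requests for the two rescaling paths. */
final class FlinkRescaleRequests {

    // Path 2, step 1: PATCH /jobs/:jobid/rescaling?parallelism=N
    // The response is expected to carry a trigger id for later polling.
    static HttpRequest triggerRescaling(String baseUrl, String jobId, int parallelism) {
        URI uri = URI.create(baseUrl + "/jobs/" + jobId + "/rescaling?parallelism=" + parallelism);
        return HttpRequest.newBuilder(uri)
                .method("PATCH", HttpRequest.BodyPublishers.noBody())
                .build();
    }

    // Path 2, step 2: GET /jobs/:jobid/rescaling/:triggerid to poll the async operation.
    static HttpRequest rescalingStatus(String baseUrl, String jobId, String triggerId) {
        URI uri = URI.create(baseUrl + "/jobs/" + jobId + "/rescaling/" + triggerId);
        return HttpRequest.newBuilder(uri).GET().build();
    }

    // Path 1, step 1: POST /jobs/:jobid/savepoints, asking Flink to cancel the job
    // once the savepoint completes. targetDirectory is NOT JSON-escaped here.
    static HttpRequest savepointAndCancel(String baseUrl, String jobId, String targetDirectory) {
        String body = "{\"target-directory\":\"" + targetDirectory + "\",\"cancel-job\":true}";
        URI uri = URI.create(baseUrl + "/jobs/" + jobId + "/savepoints");
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```

Each request can be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. For path 1 the job still has to be resubmitted from the resulting savepoint with the new parallelism, for example with `flink run -s <savepointPath> -p <parallelism>`.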

The first one seems to work just fine.

The second one blows up every time I try to use it. I get errors like
this:

https://gist.github.com/stephenc/0bbc08391ddce5a781242900e4b33a5d#file-log-txt

The above was for the topology
https://gist.github.com/stephenc/0bbc08391ddce5a781242900e4b33a5d#file-streamingjob-java
running with options:

    --source parallel

Things are even worse with --source iterator, as that has no checkpoint
state to recover from.

Right now I am trying to discover which preconditions must be met in order
to call the rescaling endpoint safely and actually have it work. I should
note that so far I have not managed to get it to work at all!

One of the things we are trying to do is add some automation to enable
scale-up / down as we see surges in processing load. We want a system that
responds to those situations automatically for low deltas and pages an
on-call engineer for persistent excess load. In that regard, I'd like to
know what the automation should check to decide whether it can rescale via
the dedicated endpoint or whether it should use the reliable (but
presumably slower) path of stop with savepoint & start from savepoint.
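
A rough sketch of the decision logic described above, to show where such a check would plug in. Everything here is an assumption made for illustration: the class and enum names, the relative-delta threshold, and the persistence window are hypothetical, and the precondition check is just a boolean supplied by the caller, because which checks belong in it is exactly the open question.

```java
import java.time.Duration;

/** Hypothetical policy: auto-rescale small deltas, page a human otherwise. */
final class ScalingPolicy {
    enum Action { NONE, AUTO_RESCALE, PAGE_ON_CALL }
    enum Path { RESCALING_ENDPOINT, SAVEPOINT_RESTART }

    private final double maxAutoDelta;         // largest relative change handled automatically
    private final Duration maxExcessDuration;  // beyond this, excess load counts as persistent

    ScalingPolicy(double maxAutoDelta, Duration maxExcessDuration) {
        this.maxAutoDelta = maxAutoDelta;
        this.maxExcessDuration = maxExcessDuration;
    }

    // currentParallelism must be positive.
    Action decide(int currentParallelism, int desiredParallelism, Duration excessLoadDuration) {
        if (desiredParallelism == currentParallelism) {
            return Action.NONE;
        }
        if (excessLoadDuration.compareTo(maxExcessDuration) > 0) {
            return Action.PAGE_ON_CALL; // persistent excess load goes to a human
        }
        double delta = Math.abs(desiredParallelism - currentParallelism) / (double) currentParallelism;
        return delta <= maxAutoDelta ? Action.AUTO_RESCALE : Action.PAGE_ON_CALL;
    }

    // Use the dedicated endpoint only when the (still unknown) preconditions hold;
    // otherwise fall back to stop with savepoint and start from savepoint.
    static Path choosePath(boolean rescalingPreconditionsMet) {
        return rescalingPreconditionsMet ? Path.RESCALING_ENDPOINT : Path.SAVEPOINT_RESTART;
    }
}
```

With `new ScalingPolicy(0.5, Duration.ofMinutes(10))`, going from parallelism 4 to 6 after one minute of excess load would be handled automatically, while going from 4 to 8, or any change after 30 minutes of excess load, would page the on-call engineer.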

The job I have been using,
https://gist.github.com/stephenc/0bbc08391ddce5a781242900e4b33a5d#file-streamingjob-java
, is just a quick job to let me test the automation on a local cluster. It
is designed to output a strictly increasing sequence of numbers without
missing any, optionally double them, and then print them out. The different
sources are me experimenting with different types of operator to see what
kinds of topology can work with the rescaling endpoint.
Thanks in advance
