Awesome, thanks! Is there a way to run the new YARN job only on the new hardware? Or would the two jobs have to run on intersecting hardware and be switched on/off, meaning we'd need to keep a buffer of resources free for the orchestration?
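
From what I've read, YARN node labels might give us exactly that, though
I haven't tried them on EMR yet, so treat this as a sketch (the label
name and hostname below are made up):

    # Label the freshly added nodes so only the new job lands on them
    yarn rmadmin -addToClusterNodeLabels "standby(exclusive=true)"
    yarn rmadmin -replaceLabelsOnNode "ip-10-0-0-42.ec2.internal=standby"

    # Then point the new Flink job at that label (flink-conf.yaml)
    yarn.application.node-label: standby

If that works, the standby job's containers should be scheduled only on
the labeled nodes, so the two jobs wouldn't compete for hardware. I
believe the queue also needs to be granted access to the label in the
capacity scheduler config, and I'm not sure how EMR's managed scaling
interacts with labels, so this needs verification.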
Also, good point on recovery. I'll spend some time looking into this. Thanks!

On Wed, Nov 11, 2020 at 11:53 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hey Rex,
>
> the second approach (spinning up a standby job and then doing a handover)
> sounds more promising to implement, without rewriting half of the Flink
> codebase ;)
> What you need is a tool that orchestrates creating a savepoint, starting
> a second job from the savepoint, and then communicating with a custom
> sink implementation that can be switched on/off in the two jobs.
> With that approach, you should have almost no downtime, just increased
> resource requirements during such a handover.
>
> What you need to consider as well is that this handover process only
> works for scheduled maintenance. If you have a system failure, you'll
> have downtime until the last checkpoint is restored.
> If you are trying to reduce the potential downtime overall, I would
> rather recommend optimizing the checkpoint restore time, as this covers
> both scheduled maintenance and system failures.
>
> Best,
> Robert
>
> On Wed, Nov 11, 2020 at 8:56 PM Rex Fenley <r...@remind101.com> wrote:
>
>> Another thought, would it be possible to
>> * Spin up new core or task nodes.
>> * Run a new copy of the same job on these new nodes from a savepoint.
>> * Have the new job *not* write to the sink until the other job is torn
>> down?
>>
>> This would allow us to be eventually consistent and keep writes going
>> through without downtime. As long as whatever is buffering the sink
>> doesn't run out of space, it should work just fine.
>>
>> We're hoping to achieve consistency in less than 30s, ideally.
>>
>> Again though, if we could get savepoints to restore in less than 30s,
>> that would likely be sufficient for our purposes.
>>
>> Thanks!
>>
>> On Wed, Nov 11, 2020 at 11:42 AM Rex Fenley <r...@remind101.com> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to find a solution for autoscaling our Flink EMR cluster
>>> with zero downtime, using RocksDB as state storage and S3 as the
>>> backing store.
>>>
>>> My current thoughts are like so:
>>> * Scaling an operator dynamically would require all keyed state to be
>>> available to the set of subtasks for that operator, therefore the
>>> subtasks in that set must be reading from and writing to the same
>>> RocksDB in order to scale in and out.
>>> * Since subtasks can run on different core nodes, is it possible to
>>> have different core nodes read/write to the same RocksDB?
>>> * When is it safe to scale an operator in or out? Only right after a
>>> checkpoint, possibly?
>>>
>>> If the above is not possible, then we'll have to use savepoints, which
>>> means some downtime. Therefore:
>>> * Scaling out during high traffic is arguably more important to react
>>> to quickly than scaling in during low traffic. Is it possible to add
>>> more core nodes to EMR without disturbing a job? If so, then maybe we
>>> can orchestrate running a new job on the new nodes and then loading a
>>> savepoint from a currently running job.
>>>
>>> Lastly:
>>> * Savepoints of ~70 GiB take on the order of minutes to tens of
>>> minutes for us to restore from. Is there any way to speed up
>>> restoration?
>>>
>>> Thanks!
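
For the record, here's roughly the kind of switchable sink I'm picturing
for the handover Robert described. Untested, names are mine, and the
flag lookup is stubbed out with a shared file (in practice it could poll
S3, ZooKeeper, a DB, etc.):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    /**
     * Forwards records to the real sink only while an external flag says
     * this job ("blue" or "green") is the active one. The handover tool
     * flips the flag once the standby job has caught up.
     */
    public abstract class SwitchableSink<T> extends RichSinkFunction<T> {

        private final String jobLabel;   // which job this instance belongs to
        private final String flagPath;   // placeholder for a shared flag location
        private transient boolean enabled;
        private transient long lastCheckMillis;

        protected SwitchableSink(String jobLabel, String flagPath) {
            this.jobLabel = jobLabel;
            this.flagPath = flagPath;
        }

        @Override
        public void open(Configuration parameters) {
            enabled = readFlag();
            lastCheckMillis = System.currentTimeMillis();
        }

        @Override
        public void invoke(T value, Context context) throws Exception {
            // Re-check the flag at most once per second so the handover
            // tool can switch sinks without restarting either job.
            long now = System.currentTimeMillis();
            if (now - lastCheckMillis > 1000L) {
                enabled = readFlag();
                lastCheckMillis = now;
            }
            if (enabled) {
                writeRecord(value);
            }
            // else: drop the record; the other job is writing it.
        }

        /** The actual write, e.g. to Elasticsearch or JDBC. */
        protected abstract void writeRecord(T value) throws Exception;

        private boolean readFlag() {
            // Stub: read the active job's label from a shared file.
            try {
                String active = new String(
                        Files.readAllBytes(Paths.get(flagPath)),
                        StandardCharsets.UTF_8).trim();
                return jobLabel.equals(active);
            } catch (IOException e) {
                return false; // fail closed: never double-write
            }
        }
    }

One gap I can see: dropping records on the inactive side only gives us
eventual consistency because our sink writes are idempotent upserts.
Happy to hear if anyone has done this differently.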
--

Rex Fenley | Software Engineer - Mobile and Backend

Remind.com <https://www.remind.com/> | BLOG <http://blog.remind.com/> |
FOLLOW US <https://twitter.com/remindhq> | LIKE US <https://www.facebook.com/remindhq>