Hi,
it seems that YARN has a feature for targeting specific hardware:
https://hadoop.apache.org/docs/r3.1.0/hadoop-yarn/hadoop-yarn-site/PlacementConstraints.html
In any case, you'll need enough spare resources for some time to be able to
run your job twice for this kind of "zero downtime handover".
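
If you go that route, one way to pin the Flink YARN application to specific
machines is YARN node labels (a simpler mechanism than the placement
constraints API linked above). As a rough sketch -- assuming your cluster
admin has already enabled node labels and tagged the new nodes with a
hypothetical label "flink-new" -- you could point Flink at them via

    yarn.application.node-label: flink-new

in flink-conf.yaml (or as a -D dynamic property when starting the YARN
session). Double-check this against the YARN/EMR and Flink docs for your
versions.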

On Thu, Nov 12, 2020 at 6:10 PM Rex Fenley <r...@remind101.com> wrote:

> Awesome, thanks! Is there a way to run the new YARN job only on the new
> hardware? Or would the two jobs have to run on intersecting hardware and
> then be switched on/off, which means we'll need a buffer of resources for
> our orchestration?
>
> Also, good point on recovery. I'll spend some time looking into this.
>
> Thanks
>
>
> On Wed, Nov 11, 2020 at 11:53 PM Robert Metzger <rmetz...@apache.org>
> wrote:
>
>> Hey Rex,
>>
>> The second approach (spinning up a standby job and then doing a handover)
>> sounds more promising to implement without rewriting half of the Flink
>> codebase ;)
>> What you need is a tool that orchestrates creating a savepoint, starting
>> a second job from the savepoint, and then communicating with a custom sink
>> implementation that can be switched on/off in the two jobs.
>> With that approach, you should have almost no downtime, just increased
>> resource requirements during such a handover.
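>>
>> Just to illustrate the idea (this is only a sketch -- the "SinkGate" flag
>> source below is hypothetical; it could be backed by a file, ZooKeeper, a
>> config service, etc.), such a switchable sink could look roughly like this
>> in Java:
>>
>>     import org.apache.flink.configuration.Configuration;
>>     import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
>>
>>     /** Hypothetical switch the orchestration tool flips during handover. */
>>     interface SinkGate extends java.io.Serializable {
>>         boolean isActive();
>>     }
>>
>>     /** Wraps the real sink and only forwards records while the gate is on. */
>>     public class GatedSink<T> extends RichSinkFunction<T> {
>>
>>         private final SinkGate gate;
>>         private final RichSinkFunction<T> delegate; // the real sink
>>
>>         public GatedSink(SinkGate gate, RichSinkFunction<T> delegate) {
>>             this.gate = gate;
>>             this.delegate = delegate;
>>         }
>>
>>         @Override
>>         public void open(Configuration parameters) throws Exception {
>>             delegate.setRuntimeContext(getRuntimeContext());
>>             delegate.open(parameters);
>>         }
>>
>>         @Override
>>         public void invoke(T value, Context context) throws Exception {
>>             if (gate.isActive()) {
>>                 delegate.invoke(value, context); // active job: write through
>>             }
>>             // standby job: drop (or buffer) the record until switched on
>>         }
>>
>>         @Override
>>         public void close() throws Exception {
>>             delegate.close();
>>         }
>>     }
>>
>> The orchestration tool would then take a savepoint of the running job,
>> start the new job from it with its gate off, and, once the new job has
>> caught up, flip the new job's gate on and the old one's off (or simply
>> cancel the old job).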
>>
>> What you need to consider as well is that this handover process only
>> works for scheduled maintenance. If you have a system failure, you'll have
>> downtime until the last checkpoint is restored.
>> If you are trying to reduce the potential downtime overall, I would
>> rather recommend optimizing the checkpoint restore time, as this can cover
>> both scheduled maintenance and system failures.
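>> (If the failure-recovery path is the main concern, one setting worth
>> looking at -- assuming you are on a reasonably recent Flink version -- is
>> task-local recovery:
>>
>>     state.backend.local-recovery: true
>>
>> in flink-conf.yaml. It keeps a local copy of the checkpointed state, so a
>> recovering task doesn't have to re-download everything from S3. It doesn't
>> help when starting a fresh job from a savepoint, though.)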
>>
>> Best,
>> Robert
>>
>> On Wed, Nov 11, 2020 at 8:56 PM Rex Fenley <r...@remind101.com> wrote:
>>
>>> Another thought: would it be possible to
>>> * Spin up new core or task nodes,
>>> * Run a new copy of the same job on these new nodes from a savepoint, and
>>> * Have the new job *not* write to the sink until the other job is torn
>>> down?
>>>
>>> This would allow us to stay eventually consistent and keep writes going
>>> through without downtime. As long as whatever is buffering for the sink
>>> doesn't run out of space, it should work just fine.
>>>
>>> We're hoping to achieve consistency in less than 30s, ideally.
>>>
>>> Again, though, if we could get savepoints to restore in less than 30s,
>>> that would likely be sufficient for our purposes.
>>>
>>> Thanks!
>>>
>>> On Wed, Nov 11, 2020 at 11:42 AM Rex Fenley <r...@remind101.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I'm trying to find a solution for auto-scaling our Flink EMR cluster
>>>> with zero downtime, using RocksDB for state storage and S3 as the
>>>> backing store.
>>>>
>>>> My current thoughts are as follows:
>>>> * Scaling an operator dynamically would require all keyed state to be
>>>> available to the set of subtasks for that operator; therefore, that set
>>>> of subtasks must be reading from and writing to the same RocksDB. I.e.,
>>>> to scale subtasks in that set in and out, they need to read from the
>>>> same RocksDB instance.
>>>> * Since subtasks can run on different core nodes, is it possible to
>>>> have different core nodes read from/write to the same RocksDB?
>>>> * When is it safe to scale an operator in or out? Possibly only right
>>>> after a checkpoint?
>>>>
>>>> If the above is not possible, then we'll have to use savepoints, which
>>>> means some downtime. Therefore:
>>>> * Reacting quickly to scale out during high traffic is arguably more
>>>> important than scaling in during low traffic. Is it possible to add
>>>> more core nodes to EMR without disturbing a job? If so, then maybe we
>>>> can orchestrate running a new job on the new nodes and then loading a
>>>> savepoint from a currently running job.
>>>>
>>>> Lastly:
>>>> * Savepoints for ~70 GiB of data take on the order of minutes to tens
>>>> of minutes for us to restore from; is there any way to speed up
>>>> restoration?
>>>>
>>>> Thanks!
>>>>
>>>> --
>>>>
>>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>>
>>>>
>>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>>> <https://www.facebook.com/remindhq>
>>>>
>>>
>>>
>>>
>>
>
>
