Awesome, thanks! Is there a way to run the new YARN job only on the new
hardware? Or would the two jobs have to run on intersecting hardware and
then be switched on/off, which means we'd need a buffer of resources for
our orchestration?
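
One thing I want to check on our side: if our EMR/Hadoop version supports
YARN node labels, maybe we could pin the new job to the new nodes with
something like the following (the label name is made up and I haven't
verified the exact flags):

  # label the new nodes' partition, then target it from the Flink job
  yarn rmadmin -addToClusterNodeLabels "new-hw(exclusive=false)"
  flink run -m yarn-cluster -yD yarn.application.node-label=new-hw ...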

Also, good point on recovery. I'll spend some time looking into this.

Thanks


On Wed, Nov 11, 2020 at 11:53 PM Robert Metzger <rmetz...@apache.org> wrote:

> Hey Rex,
>
> the second approach (spinning up a standby job and then doing a handover)
> sounds more promising to implement without rewriting half of the Flink
> codebase ;)
> What you need is a tool that orchestrates creating a savepoint, starting a
> second job from the savepoint and then communicating with a custom sink
> implementation that can be switched on/off in the two jobs.
> With that approach, you should have almost no downtime, just increased
> resource requirements during such a handover.
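>
> To make the sink switch concrete, here is a rough Java sketch: a thin
> wrapper around the real sink that only forwards records while a flag is
> set. The SwitchFlag source is made up; you'd back it with ZooKeeper, a
> small config service, or similar:
>
> import org.apache.flink.configuration.Configuration;
> import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
> import org.apache.flink.streaming.api.functions.sink.SinkFunction;
>
> public class SwitchableSink<T> extends RichSinkFunction<T> {
>
>     /** Hypothetical flag source, e.g. backed by ZooKeeper. */
>     public interface SwitchFlag extends java.io.Serializable {
>         boolean isEnabled();
>     }
>
>     private final SinkFunction<T> delegate; // the real sink
>     private final SwitchFlag flag;
>     private transient volatile boolean enabled;
>
>     public SwitchableSink(SinkFunction<T> delegate, SwitchFlag flag) {
>         this.delegate = delegate;
>         this.flag = flag;
>     }
>
>     @Override
>     public void open(Configuration parameters) throws Exception {
>         // In practice, re-poll the flag from a timer or background thread.
>         enabled = flag.isEnabled();
>         // NB: if the delegate is a Rich function, you'd also have to
>         // forward open()/close() to it.
>     }
>
>     @Override
>     public void invoke(T value, Context context) throws Exception {
>         if (enabled) {
>             delegate.invoke(value, context);
>         }
>         // else: drop or buffer, depending on the consistency you need
>     }
> }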
>
> What you need to consider as well is that this handover process only works
> for scheduled maintenance. If you have a system failure, you'll have
> downtime until the last checkpoint is restored.
> If you are trying to reduce the potential downtime overall, I would rather
> recommend optimizing the checkpoint restore time, as this can cover both
> scheduled maintenance and system failures.
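>
> If you go down the restore-time route, two settings that often help
> (please verify them against the docs for your Flink version), e.g. in
> flink-conf.yaml:
>
> # keep state copies on local disk so failover doesn't re-download from S3
> # (helps recovery from task failures, not savepoint restores)
> state.backend.local-recovery: true
> # download state from S3 with multiple threads on restore (default: 1)
> state.backend.rocksdb.checkpoint.transfer.thread.num: 4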
>
> Best,
> Robert
>
> On Wed, Nov 11, 2020 at 8:56 PM Rex Fenley <r...@remind101.com> wrote:
>
>> Another thought, would it be possible to
>> * Spin up new core or task nodes.
>> * Run a new copy of the same job on these new nodes from a savepoint.
>> * Have the new job *not* write to the sink until the other job is torn
>> down?
>>
>> This would allow us to be eventually consistent and maintain writes going
>> through without downtime. As long as whatever is buffering the sink
>> doesn't run out of space, it should work just fine.
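>>
>> Back-of-envelope with made-up numbers: at 10,000 records/s of ~1 KiB each,
>> a 30s handover buffers roughly 10,000 * 1 KiB * 30 ≈ 300 MiB, which seems
>> manageable.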
>>
>> We're hoping to achieve consistency in less than 30s ideally.
>>
>> Again though, if we could get savepoints to restore in less than 30s, that
>> would likely be sufficient for our purposes.
>>
>> Thanks!
>>
>> On Wed, Nov 11, 2020 at 11:42 AM Rex Fenley <r...@remind101.com> wrote:
>>
>>> Hello,
>>>
>>> I'm trying to find a solution for auto scaling our Flink EMR cluster
>>> with 0 downtime using RocksDB as state storage and S3 backing store.
>>>
>>> My current thoughts are like so:
>>> * Scaling an operator dynamically would require all keyed state to be
>>> available to the set of subtasks for that operator, therefore those
>>> subtasks must be reading from and writing to the same RocksDB. I.e., to
>>> scale that set of subtasks in and out, they all need to read from the
>>> same RocksDB (see the note after these bullets).
>>> * Since subtasks can run on different core nodes, is it possible to have
>>> different core nodes read/write to the same RocksDB?
>>> * When's the safe point to scale in and out an operator? Only right
>>> after a checkpoint possibly?
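>>>
>>> A note on the first bullet, as I understand it: Flink shards keyed state
>>> into key groups whose count is fixed by max parallelism, so rescaling
>>> across restores only works up to that bound, e.g.:
>>>
>>> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
>>>
>>> // In the job's main():
>>> StreamExecutionEnvironment env =
>>>     StreamExecutionEnvironment.getExecutionEnvironment();
>>> // Upper bound on future scale-out of keyed operators; effectively fixed
>>> // once state has been written with it.
>>> env.setMaxParallelism(128);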
>>>
>>> If the above is not possible then we'll have to use save points which
>>> means some downtime, therefore:
>>> * Reacting quickly when scaling out during high traffic is arguably more
>>> important than scaling in during low traffic. Is it possible to add more
>>> core nodes to EMR without disturbing a job? If so, then maybe we can
>>> orchestrate running a new job on new nodes and then loading a savepoint
>>> from a currently running job.
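>>>
>>> E.g., something along these lines to grow an instance group (the group id
>>> is a placeholder and I'd want to verify the exact flags):
>>>
>>> aws emr modify-instance-groups \
>>>   --instance-groups InstanceGroupId=ig-XXXXXXXX,InstanceCount=8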
>>>
>>> Lastly
>>> * Savepoints for ~70 GiB of state take on the order of minutes to tens of
>>> minutes for us to restore from. Is there any way to speed up restoration?
>>>
>>> Thanks!
>>>
>>
>>
>>
>

-- 

Rex Fenley  |  Software Engineer - Mobile and Backend


Remind.com <https://www.remind.com/>  |  BLOG <http://blog.remind.com/>  |
FOLLOW US <https://twitter.com/remindhq>  |  LIKE US <https://www.facebook.com/remindhq>
