Signal =
- exit status from the service
- reason code from Mesos, if the task was killed by Mesos (e.g. a revocable
  core revoked during oversubscription)

Yes, I am aware of coordinated updates, which allow this logic to be placed
outside Aurora. How does rollback work in that case? Perhaps I should just
disable auto-rollback and put the rollback logic into this external system
as well.
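
To make the idea concrete, here is a rough sketch of the decision such an
external system might make from the two signals above. It is only an
illustration: TaskFailure, is_systemic, decide and the "proceed" / "pause" /
"rollback" labels are hypothetical names, not Aurora or Mesos APIs, and the set
of Mesos reason codes treated as systemic is an assumed, non-exhaustive example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumption: Mesos reason codes we choose to treat as infrastructure-caused
# rather than caused by the new code. Illustrative, not exhaustive.
SYSTEMIC_MESOS_REASONS = frozenset({
    "REASON_CONTAINER_PREEMPTED",  # e.g. revocable resources taken back
    "REASON_SLAVE_REMOVED",
})


@dataclass(frozen=True)
class TaskFailure:
    """Hypothetical record of one failed task during an update."""
    exit_status: Optional[int]   # exit status from the service, if it exited
    mesos_reason: Optional[str]  # reason code, if Mesos killed the task


def is_systemic(failure: TaskFailure) -> bool:
    """True if the failure looks unrelated to the code being rolled out."""
    return failure.mesos_reason in SYSTEMIC_MESOS_REASONS


def decide(failures: Iterable[TaskFailure], max_update_failures: int) -> str:
    """Return "rollback", "pause" or "proceed" for the current update."""
    failures = list(failures)
    attributable = [f for f in failures if not is_systemic(f)]
    if len(attributable) > max_update_failures:
        # Enough failures that cannot be blamed on the cluster: roll back.
        return "rollback"
    if len(attributable) < len(failures):
        # Some systemic failures seen: hold the update instead of rolling back.
        return "pause"
    return "proceed"
```

With something like this, only failures that cannot be blamed on the cluster
would count toward the rollback threshold, and "pause" could simply mean that
the external system stops acknowledging the update until things settle.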

On Wed, Nov 1, 2017 at 8:39 AM, Bill Farner <wfar...@apache.org> wrote:

>> Can Aurora distinguish between failures caused by the upgrade itself or
>> other transient systemic issues
>
>
> There isn't any signal I know of that would allow Aurora to independently
> determine the cause of task failures in a generic way.
>
> Two options come to mind:
> 1. Human intervention - aurora update pause from the CLI
> 2. Configure jobs to use JobUpdateSettings.blockIfNoPulsesAfterMs
> <https://github.com/apache/aurora/blob/d106b4ecc9537b8e844c4edc2210b9fe1853ccc4/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L708-L714>,
> and set up an in-house service to invoke pulseJobUpdate()
> <https://github.com/apache/aurora/blob/d106b4ecc9537b8e844c4edc2210b9fe1853ccc4/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L1134-L1139>.
> This opts the job update into requiring periodic positive acknowledgement
> from an external system that it is safe to proceed.  You could use this,
> for example, to automatically gate an update while a service has alerts
> firing.
>
>
>
> On Tue, Oct 31, 2017 at 1:14 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote:
>
>> Folks,
>> Sometimes upgrades in our cluster start failing due to transient outages
>> of dependencies or other reasons unrelated to the new code being pushed out.
>> Aurora hits its failure threshold and starts an automatic rollback, which may
>> make a bad condition worse (e.g. if the outage was related to load, a rollback
>> will increase load). Can Aurora distinguish between failures caused by the
>> upgrade itself and other transient systemic issues (using e.g. reason code)?
>> If not, does this make sense as a new feature?
>>
>> Mohit.
>>
>>
>
