Signal =
- exit status from the service
- reason code from Mesos, if the task was killed by Mesos (e.g. a revocable core revoked during oversubscription)

Yes, I am aware of co-ordinated updates, which allow this logic to be placed outside Aurora. How does rollback work in that case? Perhaps I should just disable auto-rollback in that case and put the rollback logic into this external system as well.

On Wed, Nov 1, 2017 at 8:39 AM, Bill Farner <wfar...@apache.org> wrote:

>> Can Aurora distinguish between failures caused by the upgrade itself or
>> other transient systemic issues
>
> There isn't any signal I know of that would allow Aurora to independently
> determine the cause of task failures in a generic way.
>
> Two options come to mind:
> 1. Human intervention - aurora update pause from the CLI
> 2. Configure jobs to use JobUpdateSettings.blockIfNoPulsesAfterMs
> <https://github.com/apache/aurora/blob/d106b4ecc9537b8e844c4edc2210b9fe1853ccc4/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L708-L714>,
> and set up an in-house service to invoke pulseJobUpdate()
> <https://github.com/apache/aurora/blob/d106b4ecc9537b8e844c4edc2210b9fe1853ccc4/api/src/main/thrift/org/apache/aurora/gen/api.thrift#L1134-L1139>.
> This opts the job update into requiring periodic positive acknowledgement
> from an external system that it is safe to proceed. You could use this,
> for example, to automatically gate an update while a service has alerts
> firing.
>
> On Tue, Oct 31, 2017 at 1:14 PM, Mohit Jaggi <mohit.ja...@uber.com> wrote:
>
>> Folks,
>> Sometimes in our cluster upgrades start failing due to transient outages
>> of dependencies or reasons unrelated to the new code being pushed out.
>> Aurora hits its failure threshold and starts automatic rollback which may
>> make a bad condition worse (e.g. if the outage was related to load rollback
>> will increase load). Can Aurora distinguish between failures caused by the
>> upgrade itself or other transient systemic issues (using e.g. reason code)?
>> If not does this make sense as a new feature?
>>
>> Mohit.
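
For anyone wiring up the second option, here is a rough sketch of what the in-house pulsing service could look like. It is only an illustration under assumptions: `alerts_firing` and `send_pulse` are hypothetical callables that you would supply yourself (the first backed by your monitoring system, the second wrapping a real pulseJobUpdate() call through whatever scheduler client you use), and `run_pulse_gate` is an invented name, not part of Aurora.

```python
import time
from typing import Callable


def run_pulse_gate(
    alerts_firing: Callable[[], bool],   # hypothetical: True while the service has alerts firing
    send_pulse: Callable[[], None],      # hypothetical: wraps a pulseJobUpdate() call for one update
    interval_s: float,
    max_ticks: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send one pulse per tick while no alerts are firing.

    Pulses are withheld on any tick where alerts are firing, so an update
    configured to require pulses would stop receiving acknowledgements.
    Returns the number of pulses sent.
    """
    pulses = 0
    for tick in range(max_ticks):
        if not alerts_firing():
            send_pulse()
            pulses += 1
        # Do not sleep after the final tick.
        if tick < max_ticks - 1:
            sleep(interval_s)
    return pulses
```

The pulse interval would need to be comfortably shorter than the job's blockIfNoPulsesAfterMs value, otherwise a healthy update could block between pulses. A real service would presumably loop until the update finishes rather than for a fixed `max_ticks`; the bound is here only to keep the sketch easy to exercise.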