Note to deployers: when syncing certain config changes (e.g. adding a new
variable) that touch both InitialiseSettings and CommonSettings, you will
now need to use sync-dir wmf-config, because individual sync-file runs will
likely fail if the intermediate state throws notices/errors.

(It was a good idea to do this before, but it'll be more strongly enforced.)

On Jul 25, 2016 12:35, "Tyler Cipriani" <> wrote:

> tl;dr: Scap will deploy to canary servers and check for error-log spikes
> in the next version (to be released Soon™).
>
> In light of recent incidents[0] that have caused outages accompanied by
> large, easily detectable error-rate spikes, a patch has recently landed in
> Scap[1] that will:
>    1. Push changes to a set of canary servers[2] before syncing to proxy
>       servers
>    2. Wait a configurable length of time (currently 20 seconds[3]) for any
>       errors to have time to make themselves known
>    3. Query Logstash (using a script written by Gabriel Wicke[4]) to
>       determine if the error rate has increased over a configurable
>       threshold (currently 10-fold[5])
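
For anyone who wants the logic of steps 1-3 spelled out, here is a minimal
Python sketch. It is not Scap's actual code: the function names, the callables
passed in (`push`, `query_error_rate`), and the handling of a zero baseline are
all assumptions made for illustration. Only the 20-second wait and the 10-fold
threshold come from the quoted message.

```python
import time
from typing import Callable, Sequence


def error_rate_spiked(before: float, after: float, threshold: float = 10.0) -> bool:
    """Return True if `after` is more than `threshold` times `before`."""
    # Assumption: clamp the baseline to 1.0 so a quiet log (zero errors)
    # does not make every single new error look like an infinite spike.
    baseline = max(before, 1.0)
    return after > baseline * threshold


def canary_deploy(
    canaries: Sequence[str],
    push: Callable[[Sequence[str]], None],
    query_error_rate: Callable[[Sequence[str]], float],
    wait_seconds: float = 20.0,
    threshold: float = 10.0,
    force: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Push to canaries and report whether it looks safe to continue.

    `push` and `query_error_rate` are hypothetical hooks standing in for
    the real sync step and the Logstash query.
    """
    if force:
        # Mirrors the idea of a --force flag: push and skip the check entirely.
        push(canaries)
        return True

    before = query_error_rate(canaries)  # baseline before the change
    push(canaries)                       # step 1: canaries first
    sleep(wait_seconds)                  # step 2: let errors surface
    after = query_error_rate(canaries)   # step 3: compare error rates
    return not error_rate_spiked(before, after, threshold)
```

A caller would continue to the remaining servers only when `canary_deploy`
returns True. Passing `sleep` as a parameter is just a convenience so the wait
can be stubbed out in tests.
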
>
> Big thanks to the folks who helped in this effort: Gabriel Wicke, Filippo
> Giunchedi, and Giuseppe Lavagetto, as well as Bryan Davis and Erik
> Bernhardson (for their mad Logstash skillz)!
>
> Note that when expedience is required (we're in the middle of an outage
> and who cares what Logstash has to say), the `--force` flag can be added
> to skip canary checks altogether, e.g.:
> `scap sync-file --force wmf-config/InitialiseSettings 'Panic!!'`
>
> The RelEng team's eventual goal is still to move MediaWiki deployments to
> the more robust and resilient Scap3 deployment framework. There is some
> high-priority work that has to happen before the Scap3 move. In the
> interim, we are taking steps (like this one) to respond to incidents and
> keep deployments safe.
>
> Hopefully, this work and the error-rate alert work from Ori last week[6]
> will allow everyone to be more conscientious and more keenly aware of
> deployments that cause large aberrations in the rate of errors.
>
> <3,
> Your Friendly Neighborhood Release Engineering Team
>
> [0].
> is the recent example I could find, but there have been others.
> [1].
> [2].
> [3].
> [4].
> [5].
> [6].
> _______________________________________________
> Ops mailing list
Wikitech-l mailing list
