Hi folks,

After speaking to a few people, I'd like to check in on the WMF deployment
train schedule overall, and see if there are ways to optimize it.

(Note: Below, I refer to "test wikis" vs. "production wikis",
generously including mediawiki.org as a test wiki. I realize that our "test
wikis", with the exception of Labs wikis, run on the production cluster.)

== Current practice ==

* On Thursdays we increase the release counter, and deploy the latest
release to test wikis and the previous one to Wikipedias.

* On Mondays we deploy the latest release to non-Wikipedias.
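
To make the timeline concrete, here is a rough sketch of when a single release reaches each group of wikis under the current practice. This is only an illustration, not an actual WMF tool: the names `ROLLOUT_OFFSETS` and `rollout_dates` and the group labels are made up for this example, and the offsets are simply read off the two bullets above.

```python
from datetime import date, timedelta

# Days after the Thursday on which a release is cut (hypothetical table,
# derived from the "Current practice" bullets; not real deployment config).
ROLLOUT_OFFSETS = {
    "test wikis": 0,      # Thursday: latest release goes to test wikis
    "non-Wikipedias": 4,  # the following Monday
    "Wikipedias": 7,      # next Thursday, by then it is the "previous" release
}


def rollout_dates(cut_thursday):
    """Return a dict mapping each wiki group to the date it gets the release."""
    if cut_thursday.weekday() != 3:  # Monday is 0, so Thursday is 3
        raise ValueError("releases are cut on Thursdays")
    return {
        group: cut_thursday + timedelta(days=offset)
        for group, offset in ROLLOUT_OFFSETS.items()
    }


if __name__ == "__main__":
    # Example with an arbitrary Thursday.
    for group, day in rollout_dates(date(2024, 1, 4)).items():
        print(f"{group}: {day:%A %Y-%m-%d}")
```

Under these assumptions, a release reaches its first production wikis four calendar days after it lands on the test wikis, and only part of Thursday plus Friday of that gap are working days.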

== Problems with this approach ==

* We only have bits of Thursday and all of Friday to resolve issues that
are surfaced in the test wikis prior to the Monday rollout to the first
production wikis.

* Having two stages of release also increases the cognitive load on
developers in understanding when their code hits production wikis, which
arguably increases the risk of negative impact of a deploy going unnoticed.

== Advantages of this approach ==

* Commons serves just about enough traffic to sometimes act as a useful
canary for performance/scaling issues that will later appear in production.

* Developers have some post-deployment time to fix issues highly specific
to the non-Wikipedia wikis (e.g. extensions & gadgets only deployed there)
rather than being distracted by firefighting on Wikipedia.

== Some options ==

Option A: Change nothing. I've not heard from enough folks to see if the
problems above are widely perceived to _be_ problems. If the consensus is
that current practice, for now, is the best possible approach, obviously we
should stick with it.

Option B: No Monday deploy. This would mean we'd have to improve our
testing process to catch issues affecting the non-Wikipedia wikis before
they hit production. I personally think getting rid of the Monday deploy
could create some _desirable_ pain that would act as a forcing function to
improve pre-release test practices, rather than using production wikis to
test.

At the same time, we'd have a full week to work out the kinks we find in
testing before they hit any production wiki, and could have a more
systematic process of backing out changes if needed prior to deployment.

Option C: Shift Monday deploys to Tuesday. This would at least give us an
additional work day to fix issues that have occurred in testing before they
hit prod. I personally don't think this goes far enough, but it might be a
useful tweak to make if option B seems too problematic.

Are there other ways to optimize / issues I'm missing or misrepresenting
above?

Thanks,
Erik
-- 
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
