Hi folks,

After speaking to a few folks, I'd like to check in on the WMF deployment train schedule overall, and see if there are ways to optimize it.

(Note: In the below I refer to "test wikis" vs. "production wikis", generously including mediawiki.org as a test wiki. I realize that our "test wikis", with the exception of Labs wikis, run on the production cluster.)

== Current practice ==

* On Thursdays we increase the release counter, and deploy the latest release to test wikis and the previous one to Wikipedias.
* On Mondays we deploy the latest release to non-Wikipedias.

== Problems with this approach ==

* We only have bits of Thursday and all of Friday to resolve issues that are surfaced in the test wikis prior to the Monday rollout to the first production wikis.
* Having two stages of release also increases the cognitive load on developers in understanding when their code hits production wikis, which arguably increases the risk of negative impact of a deploy going unnoticed.

== Advantages of this approach ==

* Commons serves just about enough traffic to sometimes act as a useful canary for performance/scaling issues that will later appear in production.
* Developers have some post-deployment time to fix issues highly specific to the non-Wikipedia wikis (e.g. extensions & gadgets only deployed there) rather than being distracted by firefighting on Wikipedia.

== Some options ==

Option A: Change nothing. I've not heard from enough folks to see if the problems above are widely perceived to _be_ problems. If the consensus is that current practice, for now, is the best possible approach, obviously we should stick with it.

Option B: No Monday deploy. This would mean we'd have to improve our testing process to catch issues affecting the non-Wikipedia wikis before they hit production. I personally think getting rid of the Monday deploy could create some _desirable_ pain that would act as a forcing function to improve pre-release test practices, rather than using production wikis to test.
At the same time, we'd have a full week to work out the kinks we find in testing before they hit any production wiki, and could have a more systematic process of backing out changes if needed prior to deployment.

Option C: Shift Monday deploys to Tuesday. This would at least give us an additional work day to fix issues that have occurred in testing before they hit prod. I personally don't think this goes far enough, but it might be a useful tweak to make if Option B seems too problematic.

Are there other ways to optimize / issues I'm missing or misrepresenting above?

Thanks,
Erik

--
Erik Möller
VP of Engineering and Product Development, Wikimedia Foundation

_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l