This is so rad - congratulations indeed to everyone who's been working on this!

On Thu, Apr 21, 2016 at 8:44 AM, Toby Negrin <tneg...@wikimedia.org> wrote:

> Congrats Mark and everyone else involved. This is a big step for
> reliability and performance of the sites and a difficult technical task to
> say the least.
>
> Well done!
>
> -Toby
>
> On Thu, Apr 21, 2016 at 8:37 AM, Mark Bergsma <m...@wikimedia.org> wrote:
>
>> We've just completed the switch back, and all services are running from
>> our main data center eqiad (Ashburn) again.
>>
>> The process went very smooth this time around. In the past two days
>> leading up to this, we've been able to either fix or work around the most
>> important issues we encountered on Tuesday. This meant that we had no real
>> setbacks or unanticipated delays today, and therefore were able to complete
>> the most time pressing and user-impacting part (during which MediaWiki is
>> read-only) in 20 minutes, down from ~45 minutes two days ago.
>>
>> However, we'll be doing this again in the future, and until then we'll
>> work on improving and further automating this process to get it down to
>> hopefully much lower levels of impact and duration.
>>
>> Please let us know if you see any issues which may be caused by the
>> switch-over(s).
>>
>> Thanks much to everyone involved!
>>
>> Mark
>>
>> On Thu, Apr 21, 2016 at 3:53 PM, Mark Bergsma <m...@wikimedia.org> wrote:
>>
>>> Hi everyone,
>>>
>>> After we've been successfully serving our sites from our backup
>>> data-center codfw (Dallas) for the past two days, we're now starting our
>>> switch back to eqiad (Ashburn) as planned[1].
>>>
>>> We've already moved cache traffic back to eqiad, and within the next
>>> minutes, we'll disable editing by going read-only for approximately 30
>>> minutes - hopefully a bit faster than 2 days ago.
>>>
>>> [1] http://blog.wikimedia.org/2016/04/11/wikimedia-failover-test/
>>>
>>> On Tue, Apr 19, 2016 at 6:00 PM, Mark Bergsma <m...@wikimedia.org>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Today the data center switch-over commenced as planned, and has just
>>>> fully completed successfully. We are now serving our sites from codfw
>>>> (Dallas, Texas) for the next 2 days if all stays well.
>>>>
>>>> We switched the wikis to read-only (editing disabled) at 14:02 UTC, and
>>>> went back read-write at 14:48 UTC - a little longer than planned. While
>>>> edits were possible then, unfortunately at that time Special:Recent Changes
>>>> (and related change feeds) were not yet working due to an unexpected
>>>> configuration problem with our Redis servers until 15:10 UTC, when we found
>>>> and fixed the issue. The site has stayed up and available for readers
>>>> throughout the entire migration.
>>>>
>>>> Overall the procedure was a success with few problems along the way.
>>>> However we've also carefully kept track of any issues and delays we
>>>> encountered for evaluation to improve and speed up the procedure, and
>>>> reducing impact to our users - some of which will already be implemented
>>>> for our switch back on Thursday.
>>>>
>>>> We're still expecting to find (possibly subtle) issues today, and would
>>>> like everyone who notices anything to use the following channels to report
>>>> them:
>>>>
>>>> 1. File a Phabricator issue with project #codfw-rollout
>>>> 2. Report issues on IRC: Freenode channel #wikimedia-tech (if urgent)
>>>> 3. Send an e-mail to the Operations list: o...@lists.wikimedia.org
>>>>
>>>> We're not done yet, but thanks to all who have helped so far. :-)
>>>>
>>>> Mark
>>>>
>>>
>>> --
>>> Mark Bergsma <m...@wikimedia.org>
>>> Lead Operations Architect
>>> Director of Technical Operations
>>> Wikimedia Foundation
>>>
>>
>> _______________________________________________
>> Ops mailing list
>> o...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/ops
>>
>

--
Arthur Richards
Team Practices Manager
[[User:Awjrichards]]
IRC: awjr
+1-415-839-6885 x6687
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l