Re: [Wikitech-l] [WikimediaMobile] Fwd: Deployment postmortem

2013-10-31 Thread Matthew Walker
With all my prep work completed ahead of time, I can get a CentralNotice LD
out to both production branches in about 15 minutes (waiting on the Jenkins
merge is the longest bit of that). I watch both the fatal and exception
logs whilst doing it and then quickly run through the patches to make sure
it's all working.

I've felt pressured in the LD to get stuff out and myself out of the way
when there have been more than two people in it -- which does correlate
with my 15 minute estimate for the fastest I feel I can safely deploy.

~Matt Walker
Wikimedia Foundation
Fundraising Technology Team


On Thu, Oct 31, 2013 at 7:53 AM, Adam Baso ab...@wikimedia.org wrote:

 Everyone, I apologize for the bug.

 I'll look for ways to guard better against this risk in the future, which
 will be important as we look to expand coverage of Wikipedia Zero to sister
 projects and the desktop form factor.

 Thanks to everyone for resolving the issue so quickly. You guys rule.

 And Roan, thanks for not flipping over my desk, despite the bug making RL
 go haywire on Wikidata AND holding up your lightning deployment. It's true
 - you are a gentleman and a scholar.

 -Adam


 On Wed, Oct 30, 2013 at 5:57 PM, Yuri Astrakhan
 yastrak...@wikimedia.org wrote:

 == Background ==
 ZeroRatedMobileAccess has always depended on MobileFrontend and used it
 liberally, including calls to its classes. However, those calls were made in
 hooks invoked by MF, so Zero simply stopped working in the absence of MF.
 This changed in [1], where Zero started using a ResourceLoader module
 from MF.

 == What happened ==
 At 23:02 UTC, after deploying Zero extension updates, fatalmonitor was
 flooded with:

  -- Fatal error: Class 'MFResourceLoaderModule' not found in
  /usr/local/apache/common-local/php-1.23wmf1/includes/resourceloader/ResourceLoader.php
  on line 408

 The issue was tracked down to Wikidata having MobileFrontend disabled
 while ZeroRatedMobileAccess was enabled. It didn't impact page views
 directly; however, all load.php calls that requested the startup module
 caused fatals because ResourceLoader attempted to instantiate the
 MFResourceLoaderModule class and couldn't find it. As a consequence, people
 might have seen pages without styles or scripts.

 A number of people (MaxSem, Reedy, Roan, Greg, and possibly others)
 gave great assistance in tracking down the issue and rapidly disabling the
 ZeroRatedMobileAccess extension on Wikidata. Furthermore, a mobile
 configuration change [2] will add a guard against calling
 ZeroRatedMobileAccess.php unless it's explicitly within the context of MF.

 Thank you to everyone!!!

 == Timeline ==
 All times in UTC

 * 22:48 Zero 1.22wmf22 deployed, no errors
 * 23:02 Zero 1.23wmf1 deployed, first errors appear - initially unnoticed
 * 23:08 A small MobileFrontend change deployed
 * 23:09 Errors noticed, initially linked with MobileFrontend push
 * 23:17 Max reverts his MobileFrontend changes, errors don't go away
 * 23:22 Problem narrowed down
 * 23:27 Fix deployed

 == Recommendations ==
 * Allow a bit more time between deployments and observe fatalmonitor
 before and after
 * Ensure the Zero extension checks that the Mobile extension is loaded before
 enabling itself if it relies on MFResourceLoaderModule.
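
The second recommendation can be sketched roughly as below. This is a
language-neutral illustration in Python, not MediaWiki code: WIKI_CONFIG,
REQUIRES, extensions_to_load and "examplewiki" are hypothetical names, and the
"wikidata" entry only mirrors the situation described above (Zero enabled,
MobileFrontend disabled).

```python
# Hypothetical per-wiki configuration: which extensions are switched on.
WIKI_CONFIG = {
    "examplewiki": {"MobileFrontend", "ZeroRatedMobileAccess"},
    "wikidata": {"ZeroRatedMobileAccess"},  # MobileFrontend disabled
}

# Hypothetical dependency table: extension -> extensions it needs.
REQUIRES = {"ZeroRatedMobileAccess": {"MobileFrontend"}}


def extensions_to_load(wiki):
    """Return the enabled extensions whose direct dependencies are also enabled."""
    enabled = WIKI_CONFIG[wiki]
    loadable = set()
    for ext in enabled:
        missing = REQUIRES.get(ext, set()) - enabled
        if missing:
            # Skip the extension rather than let it register modules
            # whose classes cannot be found later.
            continue
        loadable.add(ext)
    return loadable
```

With this check, extensions_to_load("wikidata") leaves Zero out instead of
letting it register a module whose class is missing. The sketch only looks at
direct dependencies; chains of dependencies would need a repeated pass.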

 --
 [1] https://gerrit.wikimedia.org/r/#/c/83133
 [2] https://gerrit.wikimedia.org/r/#/c/92811
 ___
 Wikitech-l mailing list
 Wikitech-l@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/wikitech-l



 ___
 Mobile-l mailing list
 mobil...@lists.wikimedia.org
 https://lists.wikimedia.org/mailman/listinfo/mobile-l



Re: [Wikitech-l] [WikimediaMobile] Fwd: Deployment postmortem

2013-10-31 Thread Greg Grossmeier
On 2013-10-31 at 10:08:05 -0700, Matthew Walker wrote:
 I've felt pressured in the LD to get stuff out and myself out of the way
 when there have been more than two people in it -- which does correlate
 with my 15 minute estimate for the fastest I feel I can safely deploy.

Thanks for that perspective, Matt.

I think that is a reasonable cutoff (15 minutes per LD participant,
thus 2 per LD window) that will still allow people to use the LDs but
also keep our sanity (and site) safe.

edited LD page:
https://wikitech.wikimedia.org/wiki/Lightning_deployments

Greg

-- 
| Greg Grossmeier            GPG: B2FA 27B1 F7EB D327 6B8E |
| identi.ca: @greg                A18D 1138 8E47 FAC8 1C7D |
