📘 Read this post on Phabricator at
https://phabricator.wikimedia.org/phame/live/1/post/129/
-------

How’d we do in our pursuit of operational excellence last month? Read on to
find out!

- Month in numbers.
- Current problems.
- Highlighted stories.

## 📊 *Month in numbers*

* 4 documented incidents in November 2018. [1]
* 42 Wikimedia-prod-error tasks closed in November 2018. [2]
* 36 Wikimedia-prod-error tasks created in November 2018. [3]
* 165 currently open Wikimedia-prod-error tasks (as of 12 December 2018).

Terminology:
* An *Exception* (or fatal) causes user actions to be prevented. For
example, a page would display “Exception: Unable to render page” instead of
the article content.
* An *Error* (or non-fatal, or warning) can produce page views that
complete without any sign of a problem, but that may show corrupt,
incorrect, or incomplete information. Examples: an article displays the
code word “null” instead of the actual content; a user looking for
Vegetables is taken to an article about Vegetarians; a user receives a
notification that says “*You have (null) new messages.*”
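
The distinction can be sketched in a few lines of Python. This is only an illustration of the two terms, not MediaWiki code; `RenderError`, `render_page`, and `unread_banner` are hypothetical names invented for this example.

```python
class RenderError(Exception):
    """Hypothetical fatal error: the page cannot be rendered at all."""


def render_page(content):
    # Exception (fatal): the user action is prevented entirely.
    if content is None:
        raise RenderError("Unable to render page")
    return content


def unread_banner(count):
    # Error (non-fatal): the request succeeds, but a missing value leaks
    # into the output. Python formats None as "None", much like the stray
    # "(null)" in the notification example above.
    return f"You have {count} new messages."
```

Calling `render_page(None)` stops the request with an exception, while `unread_banner(None)` quietly returns a sentence containing the wrong value.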

With that behind us... Let’s celebrate this month’s highlights!

## *️⃣ *DB exception at wikitech.wikimedia.org*

Quiddity reported that he was unable to disable a spam account, due to a
fatal exception. Andre Klapper used the Exception ID to find the stack
trace in the logs. The trace revealed that a table was missing in
Wikitech’s database.

The MediaWiki software was recently expanded with a “Partial blocking”
ability. [4] This involved introducing a new database table that stores
block metadata differently. This software update was deployed to Wikitech,
but this new table was not created.

@Marostegui (Database administrator) quickly applied the schema patches
that create the missing table. Thanks Manuel, Andre, and Quiddity. Teamwork!
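
A check of the following kind could flag a missing table before a user runs into it. This is a minimal sketch under stated assumptions: it uses SQLite from the Python standard library as a stand-in for the production database, and the table names `blocks` and `block_restrictions` are hypothetical, not the actual MediaWiki schema.

```python
import sqlite3


def missing_tables(conn, required):
    """Return the required table names that do not exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    existing = {name for (name,) in rows}
    return sorted(set(required) - existing)


# Simulate a wiki where the software was updated but the new table
# (hypothetical name) was never created.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks (id INTEGER PRIMARY KEY)")

print(missing_tables(conn, ["blocks", "block_restrictions"]))
```

A non-empty result means the code expects a table that the database does not have yet.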

– https://phabricator.wikimedia.org/T209674

## *️⃣ *Big-page Deletion Unleashed!*

It had been known for years [5] that users were unable to delete or restore
pages with more than a few hundred revisions. Attempts to do so could fail
with a fatal “DBTransactionSizeError” exception. This error indicates that
the change is too big or too slow. Such changes risk replication lag, and
may impact the stability of the infrastructure.

The database structure used by MediaWiki for page archives dates back to
2003 (over 15 years ago). I'll spare you the details, but it depends on
database interactions that are inherently slow when applied to systems as
big as Wikipedia! RFC T20493 intends to modernise this structure for the
long-term.

Then along came @BPirkle. Bill joined the WMF Core platform team earlier
this year. He took on the challenge of making page deletion work for any
size page, today.

Previously, page deletion happened in a single step. This simple approach
had the benefit of either succeeding in its entirety, or safely rolling
back like nothing happened. It also meant that the database protected us
against conflicting changes. In August, Bill started a two-month effort
that carefully split the logic for “delete a page” into smaller steps that
are each safe and quick. It now uses our JobQueue to schedule and run these
steps, without the user having to wait for them.
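
The general idea of splitting one large operation into small queued steps can be sketched as follows. This is not the actual MediaWiki implementation: the in-memory `deque` stands in for the JobQueue, sets of revision IDs stand in for database tables, and `BATCH_SIZE`, `delete_page_in_batches`, and `run_jobs` are hypothetical names.

```python
from collections import deque

BATCH_SIZE = 3  # hypothetical value; a real system would tune this


def delete_page_in_batches(revision_ids, queue, batch_size=BATCH_SIZE):
    """Enqueue one small job per batch instead of one large transaction."""
    for start in range(0, len(revision_ids), batch_size):
        queue.append(revision_ids[start:start + batch_size])


def run_jobs(queue, live, archive):
    """Run queued jobs one at a time; each job touches at most one batch."""
    steps = 0
    while queue:
        batch = queue.popleft()
        for rev_id in batch:
            live.discard(rev_id)  # remove from the "live" revisions
            archive.add(rev_id)   # keep a copy so the page can be restored
        steps += 1
    return steps


live = set(range(10))  # a page with ten revisions
archive = set()
queue = deque()

delete_page_in_batches(sorted(live), queue)
steps = run_jobs(queue, live, archive)
```

With ten revisions and a batch size of three, the work is spread over four short jobs. The trade-off is the one described above: no single step is large, but the operation as a whole is no longer one all-or-nothing transaction.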

–  https://phabricator.wikimedia.org/T198176 /
https://gerrit.wikimedia.org/r/456035

## 📉 *Current problems*

Take a look at the workboard and look for tasks that might need your help.
The workboard lists known issues, grouped by the week in which they were
first observed.

→  https://phabricator.wikimedia.org/tag/wikimedia-production-error/

I’d like to draw attention to a subset of PHP fatal errors. Specifically,
those that are publicly exposed (i.e. don’t require elevated user rights)
and return an HTTP 500 status code.

* CentralNotice: Some Special:CentralNoticeBanners urls fatal. –
https://phabricator.wikimedia.org/T149240
* Flow: Unable to view certain talk pages due to workflow
InvalidDataException. – https://phabricator.wikimedia.org/T70526
* JsonConfig: Unable to diff certain “.map” pages on Commons. –
https://phabricator.wikimedia.org/T203063
* MediaWiki (Parser): Parse API exposes fatal content model error. –
https://phabricator.wikimedia.org/T206253
* MediaWiki (Special-pages): Special:DoubleRedirects unavailable on ttwiki.
– https://phabricator.wikimedia.org/T204800
* MobileFrontend: Some Special:MobileDiff urls fatal. –
https://phabricator.wikimedia.org/T156293
* ProofreadPage: Unable to edit certain pages on Wikisource. –
https://phabricator.wikimedia.org/T176196
* Translate: Some Special:Translate urls fatal. –
https://phabricator.wikimedia.org/T204833
* Wikibase: Clicking “undo” for some revisions fatals with a
PatcherException. – https://phabricator.wikimedia.org/T97146

Public user requests resulting in fatals can cause (and have caused) alerts
to fire that notify SRE of wikis potentially being less available or down.

💡*ProTip*: Cross-reference one workboard with another via “Open Tasks” >
“Advanced Filter” and enter Tag(s) to apply as a filter.

## 🎉 *Thank you*

Thank you to everyone who helped by reporting or investigating problems in
Wikimedia production, and by implementing or reviewing their solutions.
This includes tstarling, thiemowmde, thcipriani, Tgr, Steinsplitter, Quiddity,
pmiazga, Nikerabbit, Mvolz, Lucas_Werkmeister_WMDE, kostajh, jrbs, JJMC89,
Jdforrester-WMF, hashar, Gilles, Daimona, Ciencia_Al_Poder, Catrope,
BPirkle, Barkeep49, Anomie, and Aklapper.

Thanks!

Until next time,

– Timo Tijhof

-------

Footnotes:

[1] Incidents. –
https://wikitech.wikimedia.org/wiki/Special:AllPages?from=Incident+documentation%2F20181101&to=Incident+documentation%2F20181131&namespace=0

[2] Tasks closed. –
https://phabricator.wikimedia.org/maniphest/query/.PkyGL4Rz_4i/#R

[3] Tasks opened. –
https://phabricator.wikimedia.org/maniphest/query/WsqbAxlHPLwk/#R

[4] Partial blocks. –
https://meta.wikimedia.org/wiki/Community_health_initiative/Per-user_page,_namespace,_and_upload_blocking

[5] Bug report about page deletion, 2007. –
https://phabricator.wikimedia.org/T13402
_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l