[Wikitech-l] MediaWiki core schema drifts with production databases

2022-08-22 Thread Amir Sarabadani
Hello,

After many years of work, I'm happy to announce a milestone in addressing
one of our major areas of tech debt in database infrastructure: we have
eliminated all schema drifts between MediaWiki core and production.

It all started six years ago, when users on English Wikipedia reported that
checking the history of some pages was quite slow, seemingly at random. More
in-depth analysis showed that the revision table of English Wikipedia was
missing an important index on some of the replicas. An audit of the schema of
the revision table then revealed much bigger drifts on that wiki. You can read
more in its ticket: T132416


Lack of schema parity between expectation and reality is quite dangerous.
Code that forces an index, assuming it exists in production under the same
name, causes a fatal error every time the query runs. Writing to a field that
doesn’t exist fails in the same way. Such changes easily pass tests and work
well in our test setups (such as the beta cluster), only to cause an outage in
production.
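
To make that failure mode concrete, here is a minimal Python sketch of the
kind of query that breaks. It is only an illustration: the connection details
and the page_timestamp index name are placeholders, and pymysql is used as a
generic MariaDB client rather than MediaWiki’s actual database layer.

import pymysql

# Connect to a single replica (placeholder connection details).
conn = pymysql.connect(host="db-replica.example", user="wikiuser",
                       password="secret", database="enwiki")

# A query that forces an index by name, assuming every host has it.
# If a drifted replica is missing the index, MariaDB rejects the whole
# statement ("Key 'page_timestamp' doesn't exist in table 'revision'"),
# so this code path fails on every request instead of merely being slow.
sql = """
    SELECT rev_id, rev_timestamp
    FROM revision FORCE INDEX (page_timestamp)
    WHERE rev_page = %s
    ORDER BY rev_timestamp DESC
    LIMIT 50
"""

try:
    with conn.cursor() as cur:
        cur.execute(sql, (12345,))
        rows = cur.fetchall()
except pymysql.MySQLError as exc:
    # On a host with the drift, this fires for 100% of requests.
    print("query failed:", exc)
finally:
    conn.close()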

If a single table on a single wiki had this many drifts, auditing all tables
on all wikis became vitally important. We have around 1,000 wikis and around
200 database hosts (each hosting on average around 100 wikis), and each wiki
has around 130 tables (half of them MediaWiki core tables); each table can
have multiple drifts. That puts the audit surface on the order of
200 × 100 × 130, roughly 2.6 million table copies.

We slowly started looking for and addressing schema drifts five years ago,
and later automated the discovery by utilizing the abstract schema (before
that, the tool had to parse SQL); that uncovered an overwhelming number of
drifts. You can look at the history of the work in T104459.
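
For readers curious how such automated discovery can work in principle, here
is a minimal Python sketch (emphatically not the actual WMF tooling) that
diffs an abstract-schema-style JSON file against what a live MariaDB host
reports in information_schema. The JSON shape assumed here, a list of tables
each carrying "columns" and "indexes" entries, roughly mirrors MediaWiki’s
sql/tables.json, and all connection details are placeholders.

import json
import pymysql

def expected_schema(path):
    # Abstract schema: a JSON list of tables, each with "name", "columns"
    # and "indexes" (roughly the shape of MediaWiki's sql/tables.json).
    with open(path) as f:
        return {table["name"]: table for table in json.load(f)}

def actual_schema(conn, db, table):
    # Ask the server itself which columns and indexes the table really has.
    with conn.cursor() as cur:
        cur.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_schema = %s AND table_name = %s", (db, table))
        columns = {row[0] for row in cur.fetchall()}
        cur.execute(
            "SELECT DISTINCT index_name FROM information_schema.statistics "
            "WHERE table_schema = %s AND table_name = %s", (db, table))
        indexes = {row[0] for row in cur.fetchall()}
    return columns, indexes

def report_drifts(schema_path, host, db):
    conn = pymysql.connect(host=host, user="inspector",
                           password="secret", database=db)
    try:
        for name, table in expected_schema(schema_path).items():
            want_cols = {c["name"] for c in table.get("columns", [])}
            want_idxs = {i["name"] for i in table.get("indexes", [])}
            have_cols, have_idxs = actual_schema(conn, db, name)
            for col in sorted(want_cols - have_cols):
                print(f"{host} {db}.{name}: missing column {col}")
            for idx in sorted(want_idxs - have_idxs):
                print(f"{host} {db}.{name}: missing index {idx}")
    finally:
        conn.close()

# Example: report_drifts("tables.json", "db-replica.example", "enwiki")

A real checker also has to compare column types, nullability, defaults and
index uniqueness, flag extra structures that should not exist, and then repeat
all of that for every table on every replica of every wiki, which is where the
scale described above comes from.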

Around fifty tickets addressing the drifts have been completed; they are
collected in T312538. I suggest checking some of them to see the scale of the
work done. Each of these tickets took days to months of work to finish. A
large number of the drifts also existed on primary databases, requiring a
primary switchover and read-only time for one or more wikis. Each drift was
different: in some cases the code, not production, needed to change, so each
one required a thorough investigation.

Why do such drifts happen? The most common reason was a schema change that
happened in code but was never requested to be applied in production. For
example, a schema change in code in 2007 left every wiki created before that
date with a different schema than wikis created after it. In 2015 we
introduced processes and tooling to make sure this doesn’t happen anymore, but
we still needed to address the pre-existing drifts. The second most common
reason was a host not receiving a schema change for various reasons (for
example, being out of rotation while the schema change was being applied, a
shortcoming of the manual process). By automating most of the operational work
of schema changes, we have reduced the chance of such drifts happening as
well.
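
To illustrate the kind of operational automation meant here, the sketch below
applies a schema change replica by replica so that no host can be silently
skipped. It is a simplified sketch under assumed names (the depool/repool
hooks, the host list and the ALTER statement are all placeholders), not the
automation actually used in production.

import time
import pymysql
import pymysql.cursors

# Placeholder DDL and host list; in practice these come from the schema
# change request and the section's replica inventory.
ALTER = "ALTER TABLE revision ADD INDEX page_timestamp (rev_page, rev_timestamp)"
REPLICAS = ["db-replica1.example", "db-replica2.example"]

def depool(host):
    # Placeholder: remove the host from the load balancer rotation.
    print(f"depooling {host}")

def repool(host):
    # Placeholder: put the host back into rotation.
    print(f"repooling {host}")

def replication_lag(conn):
    # Seconds_Behind_Master from SHOW SLAVE STATUS (0 or None when caught up).
    with conn.cursor(pymysql.cursors.DictCursor) as cur:
        cur.execute("SHOW SLAVE STATUS")
        row = cur.fetchone()
        return (row or {}).get("Seconds_Behind_Master") or 0

for host in REPLICAS:
    depool(host)  # never alter a host that is still serving traffic
    conn = pymysql.connect(host=host, user="admin",
                           password="secret", database="enwiki")
    try:
        with conn.cursor() as cur:
            cur.execute(ALTER)      # apply the change and record it per host
        while replication_lag(conn) > 0:
            time.sleep(10)          # wait until the replica catches up again
    finally:
        conn.close()
        repool(host)                # only then return it to rotation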

After finishing core, we now need to look at WMF-deployed extensions,
starting with FlaggedRevs, which, despite being deployed to only 50 wikis and
having only 8 tables, has around 7,000 drifts. Thankfully, most other
extensions are in a healthier state.

I would like to personally thank Manuel Arostegui and Jaime Crespo for their
monumental dedication to fixing these issues over the past years. Also a big
thank you to several of our amazing developers, Umherirrender, James Forrester
and Sam Reed, who helped with reporting, dug through the history of MediaWiki
to figure out why these drifts happened, and helped build the reporting tools.

Best
-- 
*Amir Sarabadani (he/him)*
Staff Database Architect
Wikimedia Foundation 
___
Wikitech-l mailing list -- wikitech-l@lists.wikimedia.org
To unsubscribe send an email to wikitech-l-le...@lists.wikimedia.org
https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/

[Wikitech-l] Re: MediaWiki core schema drifts with production databases

2022-08-22 Thread Sammy Tarling
Wow! That is amazing news — great work all around!


-- 
* Sammy Tarling *(she/her)

Software Engineer

Wikimedia Foundation 

[Wikitech-l] Re: MediaWiki core schema drifts with production databases

2022-08-22 Thread Martin Domdey
Thank you very much for the great job!

If the tl_namespace and tl_title columns are no longer part of the
templatelinks table, please don't forget to update
https://www.mediawiki.org/w/index.php?title=Manual:Database_layout/diagram&action=render
and https://www.mediawiki.org/wiki/Manual:Templatelinks_table

All the best,
Martin ...





[Wikitech-l] Re: MediaWiki core schema drifts with production databases

2022-08-22 Thread William Avery
Well done. This will save a ton of trouble down the road.

The number associated with FlaggedRevs is 'interesting', and is not
inconsistent with my experiences as an editor.


[Wikitech-l] Re: Wikimania Hackathon: Thanks for coming!

2022-08-22 Thread Haley Lepp
Hi again everyone!
Thank you to those of you who attended the Wikimania Hackathon. If you
haven't yet given feedback, please consider sharing your thoughts and ideas.

My very best,
Haley

On Sun, Aug 14, 2022 at 8:36 PM Haley Lepp  wrote:

> Dear Wikimania Hackathon Attendees,
>
> It was a pleasure spending the weekend hacking with you! I hope you had as
> much fun as I did playing with WikiData games, learning about clever ways
> to improve medical data on wikis, exploring Wikimedia Cloud Services, and
> engaging with all the other cool projects and sessions.
>
> If you missed the final showcase, it was recorded! You can watch it on
> YouTube. 
>
> The playlist of recorded sessions is also available:
> https://www.youtube.com/playlist?list=PL3soxkz-0_y6xOBqgWuZXv4ZrGJTghXIA
>
> There’s also information about all the projects on the Wikimania wiki. If
> you didn’t get the chance to present, you can still add your project there.
> 
>
> Finally, please consider giving feedback on the event! If you have ideas,
> things you liked, or things you’d like to see changed, this is your chance
> to share that so we can improve in the future:
> https://etherpad.wikimedia.org/p/2022WikimaniaHackathon-FB
>
> My best regards,
>
> Haley and the Developer Advocacy Team
>
>
> --
> *Haley Lepp*
> Technical Community Program Manager
> Wikimedia Foundation 
>