I hadn’t been following this work, and this summary is amazing. Thank you so much!
On Sat, Oct 10, 2020 at 21:14 Amir Sarabadani <ladsgr...@gmail.com> wrote:

> Hello,
> It has been a while since I gave an update on the state of abstracting
> schema and schema changes in MediaWiki
> <https://phabricator.wikimedia.org/T191231>. So here's a really long one.
>
> So far, around half of the MediaWiki core tables have been migrated to
> abstract schema (plus lots of extensions like Wikibase, Babel, Linter,
> BetaFeatures, etc.). Special thanks to Tgr for reviewing most of the
> patches, and to Sam Reed and James Forrester for doing the extensions.
>
> With the growing number of schemas being abstracted, this is going to
> affect your development if you work on schema and schema changes in core
> or any of the extensions. If you do, please read Manual:Schema changes
> <https://www.mediawiki.org/wiki/Manual:Schema_changes> on mediawiki.org.
>
> You might think that abstraction is just migrating SQL to JSON, but it's
> much more: we are making the database schema of MediaWiki much more
> consistent, and in the process addressing several long-standing issues
> like T164898 <https://phabricator.wikimedia.org/T164898> and T42626
> <https://phabricator.wikimedia.org/T42626>.
>
> *Improvement aspects*
>
> The first aspect is drift between different DBMSes. The Sqlite schema is
> produced by regex replacement (this code
> <https://github.com/wikimedia/mediawiki/blob/c477bcf2c5c482d3189ec3579c5dee444eb06f7d/includes/libs/rdbms/database/DatabaseSqlite.php#L898>),
> which is less than great, but at least it comes from one place. For
> Postgres, the schema has drifted from MySQL/Sqlite so drastically that
> fixing it has so far required 76 schema changes, addressing issues
> ranging from missing indexes to missing primary keys, AUTO_INCREMENT
> where it shouldn't be, missing DEFAULT values, drifting data types, and
> much more. You can follow the Postgres fixes here
> <https://phabricator.wikimedia.org/T164898>.
>
> The second aspect is inconsistency in the schema itself.
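(For anyone who, like me, hadn't seen the abstract schema yet: below is a
sketch of what an abstracted table definition looks like. The table and
column names here are invented for illustration, and the type names follow
my reading of the format described in Manual:Schema_changes, i.e. Doctrine
DBAL types plus MediaWiki's custom ones such as mwtimestamp. A config
fragment, not runnable code:)

```json
{
	"name": "example_table",
	"columns": [
		{ "name": "ex_id", "type": "integer",
			"options": { "autoincrement": true, "notnull": true, "unsigned": true } },
		{ "name": "ex_title", "type": "binary",
			"options": { "notnull": true, "length": 255 } },
		{ "name": "ex_timestamp", "type": "mwtimestamp",
			"options": { "notnull": true } }
	],
	"indexes": [
		{ "name": "ex_title", "columns": [ "ex_title" ], "unique": true }
	],
	"pk": [ "ex_id" ]
}
```

The point being that one such file, rather than per-DBMS .sql files, is what
the MySQL, Sqlite, and Postgres schemas all get generated from.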
> How do we
> model strings? VARCHAR? VARBINARY()? VARCHAR() BINARY? (All three are
> different things.) You'd be surprised how inconsistent our MySQL schema
> is. So far, we are migrating all VARCHAR() BINARY fields to VARBINARY()
> (ten schema changes so far).
>
> Another inconsistency is timestamps. In MySQL, around half of them are
> BINARY(14) and the other half VARBINARY(14) (while in Postgres all are
> TIMESTAMPTZ); there is even a ticket
> <https://phabricator.wikimedia.org/T42626> about it. It makes sense to
> migrate all of them to BINARY(14), but not all timestamps are 14
> characters: expiry fields, for example, accept "infinity" as a value,
> and it's a valid timestamp in Postgres ¯\_(ツ)_/¯ When you turn an
> expiry field into BINARY(14), "infinity" gets right-padded with 0x00
> bytes to 14 bytes, and as a result MediaWiki no longer recognizes it as
> infinity (the padded value != "infinity"). There are several ways
> forward for handling expiry fields; you can follow the discussion in
> this gerrit patch
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/631936>.
>
> Another fun aspect: booleans. MySQL doesn't have a boolean type; it
> translates BOOL to TINYINT(1). Other DBMSes don't have TINYINT, but
> they do have SMALLINT and BOOL (and we mostly use SMALLINT already), so
> we decided to go with SMALLINT for these cases. That is different from
> what Doctrine DBAL does (it uses BOOL), so we introduced our own custom
> type for booleans.
>
> Last but not least: ENUMs. MySQL and Postgres support them but Sqlite
> doesn't, and Doctrine DBAL doesn't support ENUM at all (as it's an
> anti-pattern), while core has eight fields that are ENUMs. There's an
> RFC to discourage using ENUM in general; feel free to comment on it.
> <https://phabricator.wikimedia.org/T119173>
>
> A miscellaneous note: the directories that hold the archive of SQL
> patches for schema changes are exploding (some of the SQL patches are
> even orphaned, but we can't find them because there are so many).
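A quick aside on the expiry issue above, because it bit me when I first read
it: the pitfall is easy to reproduce outside a database. A minimal Python
sketch, assuming MySQL's documented behavior of right-padding BINARY(N)
values with 0x00 bytes (the function name is mine, not MediaWiki's):

```python
def store_as_binary14(value: str) -> bytes:
    """Mimic what a BINARY(14) column does to a short value:
    right-pad with 0x00 bytes up to 14 bytes."""
    return value.encode("ascii").ljust(14, b"\x00")

stored = store_as_binary14("infinity")

# The padded value no longer compares equal to the literal "infinity",
# which is exactly why MediaWiki stops recognizing the magic value.
assert stored != b"infinity"
assert stored.rstrip(b"\x00") == b"infinity"
```

VARBINARY(14) doesn't pad, which is why the mixed BINARY/VARBINARY situation
matters for more than just tidiness.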
> So I started an RFC
> to clean that mess up: Drop support for database upgrade older than two
> LTS releases <https://phabricator.wikimedia.org/T259771>.
>
> *What's next?*
>
> - We continue to migrate more tables; hopefully we will get two thirds
> of them done by the end of the year (fingers crossed). You can follow
> the progress in its ticket <https://phabricator.wikimedia.org/T230428>.
> - We will support abstract schema changes really soon, like in a couple
> of weeks. Basically, you create a JSON file containing before and after
> snapshots of a table, and then a maintenance script produces the needed
> SQL patches for the different schemas. This will increase developer
> productivity drastically, since 1) schema change SQL files become more
> reliable, more consistent, and less prone to errors like adding the
> column to the wrong table in some DBMSes
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/328377/22/maintenance/mssql/archives/patch-user_groups-ug_expiry.sql#4>,
> and 2) you don't need to know Postgres or Sqlite peculiarities to make
> patches against them. The reason you need to provide the whole table for
> adding, say, an index is that Sqlite doesn't support all forms of ALTER
> TABLE; in some cases you have to create a temporary table, move the data
> around, and then rename and drop, producing beautiful SQL patches like
> this
> <https://gerrit.wikimedia.org/r/c/mediawiki/core/+/630341/1/maintenance/sqlite/archives/patch-querycachetwo-qcc_title-varbinary.sql>.
> - We are working on improving the script that reports drifts between
> core and our production databases. I have already made it work with
> abstract schemas as well, and I will continue working on it to report
> even smaller differences like field size, type, etc., which is now much
> easier thanks to the abstract schema. Slowly we will migrate that script
> to production (as part of the SRE scripts) and do automated reports and
> automated drift fixes (on small wikis).
> You can follow the work in this ticket
> <https://phabricator.wikimedia.org/T104459>. So far this script has been
> run manually, but it has found many thousands of drifts across the
> cluster, and all of them are fixed thanks to our amazing DBAs (look at
> the ticket).
>
> *How can I help?*
>
> Glad you asked! You can follow the abstract-schema
> <https://gerrit.wikimedia.org/r/q/hashtag:%22abstract-schema%22+(status:open%20OR%20status:merged)>
> hashtag in gerrit and review patches, or you can make them yourself (get
> familiar with the process using the documentation
> <https://www.mediawiki.org/wiki/Manual:Schema_changes>). If you maintain
> an extension, feel free to migrate its table(s) (and track it in this
> ticket <https://phabricator.wikimedia.org/T259374>). If you use Postgres
> for MediaWiki, please help us test the improvements for Postgres.
>
> Thanks for reading this long email!
>
> Best
>
> --
> Amir (he/him)
>
> _______________________________________________
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
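P.S. For anyone curious what the upcoming abstract schema change workflow
looks like mechanically, here is a toy Python sketch of the core idea of
diffing a before/after snapshot of a table. This is my own illustration,
not MediaWiki's actual generator; real snapshots are full table definitions
like the ones in tables.json, and the column names below are just examples:

```python
def diff_columns(before: dict, after: dict):
    """Given two {column_name: type} snapshots of a table, report what
    was added, removed, and changed -- the information a generator needs
    in order to emit per-DBMS ALTER (or table-rebuild) patches."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {k: (before[k], after[k])
               for k in before.keys() & after.keys()
               if before[k] != after[k]}
    return added, removed, changed

before = {"qcc_title": "varchar(255) binary", "qcc_value": "int"}
after = {"qcc_title": "varbinary(255)", "qcc_value": "int"}
added, removed, changed = diff_columns(before, after)
# changed maps qcc_title to its old and new type; added/removed are empty
```

The appeal of the snapshot approach is exactly that the Sqlite
temporary-table dance mentioned above becomes the generator's problem
instead of the patch author's.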