As most of us already know, replag on enwiki has been going up and up
since about 30 June. As it says on status.toolserver.org, "Hight replag
because of inserting of many SHA1-hashes."  (Note to DaB.: the first
word should be spelled "High".)

I asked DaB. on IRC how long this might go on, and he replied one to two
weeks.  However, I've since done some independent investigation that
suggests that his estimate might be a little low.

It turns out that there are three large blocks of consecutive entries in
the revision database that need to be populated with SHA1 hashes. 
Apparently there are three processes running in parallel on the WMF
servers that are filling in each of these blocks from the bottom, by
numerical order of rev_id.  Knowing this, we can estimate how many
revisions still need to be populated at any given point; and, by taking
such estimates at various points in time, we can estimate how long the
process will take.  (Needless to say, this is only an estimate since the
rate at which database changes are processed on the toolserver side is
variable; also, the blocks of rev_ids are not actually consecutive due
to deletions, but we can assume for our purposes that the deleted
revisions are distributed uniformly throughout the database.)
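For anyone who wants to repeat the count, here is a rough Python sketch
of the method just described.  The Block type, the function name, and
all of the rev_id values are made-up placeholders for illustration, not
figures measured on thyme; it also assumes you have already found, for
each block, the highest rev_id whose hash has been filled in.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One contiguous range of rev_ids being filled from the bottom (hypothetical type)."""
    first_rev_id: int  # lowest rev_id in the block
    last_rev_id: int   # highest rev_id in the block
    frontier: int      # highest rev_id in the block that already has a hash


def remaining_revisions(blocks):
    """Approximate number of revisions still lacking a hash.

    Counts rev_ids rather than rows, so deleted revisions inflate the
    figure slightly; that is the uniform-deletion assumption in action.
    """
    return sum(max(0, b.last_rev_id - b.frontier) for b in blocks)


# Placeholder values only, chosen to make the arithmetic easy to follow.
blocks = [
    Block(1, 100_000_000, 40_000_000),
    Block(100_000_001, 200_000_000, 130_000_000),
    Block(200_000_001, 300_000_000, 300_000_000),  # this block is finished
]

print(remaining_revisions(blocks))
```

Running the same count at several points in time gives the series of
estimates needed to work out a rate.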

It further turns out that it is only possible to compute this estimate
for sql-s1-user (thyme), because the enwiki_p view on sql-s1-rr
(rosemary) does not have the rev_sha1 field at all (!).  It appears that
the server on rosemary is receiving millions of database updates each
day from WMF and throwing them in the bit bucket.

Anyway, based on four observations spaced at 6-hour intervals, it
appears that thyme is populating about 353,000 revisions per hour, or
8.5 million per day.  A simple trendline analysis shows that, at this
rate, completing the 230,000,000 remaining unpopulated revisions will
take about 27 more days (estimated completion Aug 6 at 17:48 UTC).
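The trendline arithmetic can be sketched in Python as follows.  Only the
353,000-per-hour rate and the 230,000,000 remaining figure come from the
numbers above; the four observation values are placeholders constructed
to fall at exactly that rate, and fit_line is just an ordinary
least-squares helper written for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hours since the first observation, and unpopulated revisions remaining.
# Placeholder series: drops by 353,000 per hour and ends at 230,000,000.
hours = [0, 6, 12, 18]
remaining = [236_354_000, 234_236_000, 232_118_000, 230_000_000]

slope, _ = fit_line(hours, remaining)
rate_per_hour = -slope              # revisions populated per hour
rate_per_day = rate_per_hour * 24   # about 8.5 million per day
days_left = remaining[-1] / rate_per_day

print(f"{rate_per_hour:,.0f} per hour, {days_left:.1f} days left")
```

With real observations the points will not fall exactly on a line, and
the fitted slope will shift as the toolserver's processing rate varies.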

Anyone who relies on the enwiki_p database should expect a prolonged
period of degraded service and steadily increasing replag.
-- 
  Russell Blau
  russb...@imapmail.org


_______________________________________________
Toolserver-l mailing list (Toolserver-l@lists.wikimedia.org)
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette
