https://bugzilla.wikimedia.org/show_bug.cgi?id=34156

Web browser: ---
Bug #: 34156
Summary: SiteStatsInit::refresh() triggered inappropriately, caused downtime
Product: MediaWiki
Version: 1.18
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: Database
AssignedTo: wikibugs-l@lists.wikimedia.org
ReportedBy: tstarl...@wikimedia.org
Classification: Unclassified

On pl.wikipedia.org, from 05:58:45 onwards, SiteStatsInit::refresh() began to be called several times per second. It is not yet known why SiteStats::isSane() returned false in the first place.

The binlog shows that the refresh() queries were often executed in autocommit mode, meaning that the DELETE query was committed before the INSERT query began. This would have caused isSane() to return false until the INSERT of the new row was committed, leading to a flood of attempted refreshes.

Eventually, at around 07:10, a flood of SELECT COUNT(*) queries overloaded all s2 slaves, which in turn overloaded the Apache pool and caused site-wide downtime. SiteStatsInit was disabled and all related queries were killed. When the dust settled, the site_stats row was missing and had to be recovered from binlogs.

I suggest removing the isSane() checks from loadAndLazyInit() and running a refresh only from maintenance scripts or the web-based upgrade. SiteStats::load() should be able to tolerate a missing site_stats row, and the accessor functions should return false without emitting a PHP warning. Additionally, the refresh should be done with REPLACE instead of DELETE and INSERT.

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
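The race described in this bug can be reproduced in miniature. The following is a minimal sketch rather than MediaWiki code. It assumes SQLite in place of MySQL and uses a hypothetical, simplified site_stats table with one row and two counters (the real schema has more columns). The function names (connect, load_stats, get_edits, refresh_delete_insert, refresh_replace) are illustrative and are not part of MediaWiki's API. With autocommit enabled, a second connection that reads between the DELETE and the INSERT finds no row at all, which is the state that made isSane() fail. A single REPLACE statement leaves no such gap. load_stats() and get_edits() follow the suggested behaviour of tolerating a missing row and returning false instead of raising an error.

```python
import os
import sqlite3
import tempfile

# Hypothetical, heavily simplified stand-in for the site_stats table:
# a single row keyed by ss_row_id = 1 with two counters. SQLite is used
# here only because it ships with Python; the wiki itself ran MySQL.
SCHEMA = (
    "CREATE TABLE site_stats ("
    " ss_row_id INTEGER PRIMARY KEY,"
    " ss_total_edits INTEGER,"
    " ss_good_articles INTEGER)"
)


def connect(path):
    # isolation_level=None: no implicit BEGIN, so each statement is
    # committed as soon as it finishes (autocommit mode).
    return sqlite3.connect(path, isolation_level=None)


def load_stats(conn):
    """Return the stats row as a dict, or None if the row is missing."""
    # fetchall() drains the cursor so no read lock is left open.
    rows = conn.execute(
        "SELECT ss_total_edits, ss_good_articles"
        " FROM site_stats WHERE ss_row_id = 1"
    ).fetchall()
    if not rows:
        return None  # tolerate a missing row instead of refreshing
    edits, articles = rows[0]
    return {"edits": edits, "articles": articles}


def get_edits(stats):
    # Accessor returns False, without raising, when the row is missing.
    return False if stats is None else stats["edits"]


def refresh_delete_insert(writer, reader, edits, articles):
    """DELETE then INSERT in autocommit mode.

    Returns what a concurrent reader sees between the two statements.
    """
    writer.execute("DELETE FROM site_stats WHERE ss_row_id = 1")
    seen_in_gap = load_stats(reader)  # the DELETE is already committed
    writer.execute(
        "INSERT INTO site_stats VALUES (1, ?, ?)", (edits, articles)
    )
    return seen_in_gap


def refresh_replace(writer, edits, articles):
    """One REPLACE statement: readers see either the old or the new row."""
    writer.execute(
        "REPLACE INTO site_stats VALUES (1, ?, ?)", (edits, articles)
    )


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stats.db")
        writer, reader = connect(path), connect(path)
        try:
            writer.execute(SCHEMA)
            writer.execute("INSERT INTO site_stats VALUES (1, 100, 10)")
            # DELETE + INSERT: the reader in the gap finds no row.
            gap = refresh_delete_insert(writer, reader, 101, 10)
            # REPLACE: there is no intermediate state to observe.
            refresh_replace(writer, 102, 11)
            after = load_stats(reader)
        finally:
            writer.close()
            reader.close()
    return gap, after


if __name__ == "__main__":
    gap, after = demo()
    print(gap, get_edits(gap), after)
```

The sketch does not model the SELECT COUNT(*) recount queries that actually overloaded the s2 slaves; it only shows the missing-row window. Wrapping the DELETE and INSERT in one explicit transaction would be another way to hide that window from other connections, but the report proposes REPLACE.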