https://bugzilla.wikimedia.org/show_bug.cgi?id=56840
Web browser: ---
Bug ID: 56840
Summary: Special:Allpages is too slow!
Product: MediaWiki
Version: unspecified
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: Special pages
Assignee: wikibugs-l@lists.wikimedia.org
Reporter: canan...@wikimedia.org
Classification: Unclassified
Mobile Platform: ---

The following query was pulled from ishmael:

/* SpecialAllpages::showToplevel */
select page_title from `page`
where page_namespace = ? and page_is_redirect = ?
and (page_title >= ?) and (page_title >= ?)
order by page_title limit ?

Real sample:

SELECT /* SpecialAllpages::showToplevel 108.35.125.176 */ page_title
FROM `page`
WHERE page_namespace = '0'
AND page_is_redirect = '0'
AND (page_title >= '1887')
AND (page_title >= 'Centennial_Trail_State_Park')
ORDER BY page_title ASC
LIMIT 86786,2

Hundreds of concurrent instances of this have been seen. Each touched ~9M rows, took over 100s, and had problematic clauses like that amazing LIMIT. They are not duplicates: each has different page_title values and limit offsets (yet always a limit count of 2). Furthermore, batches of this query have been increasing in frequency.

The query itself comes from:

http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L225

including that wonderful 'limit ...,2'. It has this wonderful comment (at http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L175):

# TODO: Either make this *much* faster or cache the title index points
# in the querycache table.

...so that should probably be done. My best guess is that some third-party crawler is trying to obtain the titles of all articles by paging through Special:Allpages --- and, as we found out the hard way, generating that list is O(N^3), where N is how many pages into the list you are.
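For context on why the LIMIT offset is the painful part: a standard alternative (not what the MediaWiki code above does) is keyset pagination, where each request filters on the last title already seen instead of asking the engine to skip tens of thousands of rows. A minimal sketch, using SQLite and a toy stand-in for the `page` table (schema and data here are invented for illustration):

```python
import sqlite3

# Toy stand-in for the MediaWiki `page` table, heavily simplified.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_namespace INT, page_is_redirect INT, page_title TEXT)"
)
conn.executemany(
    "INSERT INTO page VALUES (0, 0, ?)",
    [(f"Title_{i:05d}",) for i in range(1000)],
)
conn.execute("CREATE INDEX name_title ON page (page_namespace, page_title)")

def offset_chunk(offset, n):
    # The problematic pattern: the engine must still walk `offset` rows
    # in index order before returning anything.
    return conn.execute(
        "SELECT page_title FROM page "
        "WHERE page_namespace = 0 AND page_is_redirect = 0 "
        "ORDER BY page_title LIMIT ? OFFSET ?",
        (n, offset),
    ).fetchall()

def keyset_chunk(after_title, n):
    # Keyset pagination: seek directly to the last-seen title via the
    # (page_namespace, page_title) index, then read n rows forward.
    return conn.execute(
        "SELECT page_title FROM page "
        "WHERE page_namespace = 0 AND page_is_redirect = 0 "
        "AND page_title > ? ORDER BY page_title LIMIT ?",
        (after_title, n),
    ).fetchall()

# Both return the same rows; only the access pattern differs.
print(offset_chunk(500, 2))                # [('Title_00500',), ('Title_00501',)]
print(keyset_chunk("Title_00499", 2))      # [('Title_00500',), ('Title_00501',)]
```

The catch, and presumably why Special:Allpages uses offsets at all, is that keyset pagination alone cannot jump straight to "page 43393 of the list"; that is exactly what the cached index points discussed below would provide.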
If we "cache the title index points" we can get the complexity down to O(N^2), which is still bad, but an improvement. To make each page request (amortized) O(1) we need to generate the index points for the entire list of articles in a single traversal of all the articles (possibly done in chunks, possibly done by a background task).
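The single-traversal idea can be sketched as follows (a hypothetical illustration, not MediaWiki code; the function name and chunking are made up). One pass over the sorted title list records the first title of every chunk; serving any page of Special:Allpages then becomes a list lookup plus one small range query, amortized O(1):

```python
def build_index_points(titles, chunk_size):
    """One traversal of the sorted title list, recording the title that
    starts each chunk. points[k] is the first title on "page" k, so a
    request for page k needs no OFFSET: it can seek to points[k] and
    read chunk_size rows forward."""
    points = []
    for i, title in enumerate(titles):
        if i % chunk_size == 0:
            points.append(title)
    return points

titles = [f"Title_{i:03d}" for i in range(10)]
print(build_index_points(titles, 4))  # ['Title_000', 'Title_004', 'Title_008']
```

In the real system the traversal would itself be chunked (or run as a background job, as suggested above) and the points stored in something like the querycache table the TODO comment mentions, refreshed as pages are created and deleted.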