https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

       Web browser: ---
            Bug ID: 56840
           Summary: Special:Allpages is too slow!
           Product: MediaWiki
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: Special pages
          Assignee: wikibugs-l@lists.wikimedia.org
          Reporter: canan...@wikimedia.org
    Classification: Unclassified
   Mobile Platform: ---

The following query was pulled from ishmael:

/* SpecialAllpages::showToplevel */ select page_title from `page` where
page_namespace = ? and page_is_redirect = ? and (page_title >= ?) and
(page_title >= ?) order by page_title limit ?

Real sample:

SELECT /* SpecialAllpages::showToplevel 108.35.125.176 */  page_title  FROM
`page`   WHERE page_namespace = '0' AND page_is_redirect = '0' AND (page_title
>= '1887') AND (page_title >= 'Centennial_Trail_State_Park')  ORDER BY
page_title ASC LIMIT 86786,2

Hundreds of concurrent instances of this query have been seen. Each touched
~9M rows, took over 100 seconds, and had problematic clauses like that amazing
LIMIT. They are not duplicates: each has different page_title values and LIMIT
offsets (though always a LIMIT count of 2).
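
For context, here is the same lookup with the cost spelled out (the
annotations are mine, a sketch rather than measured EXPLAIN output):

-- With LIMIT 86786,2 the server walks the (page_namespace, page_title)
-- index from the start point, reads and discards 86786 rows, and only
-- then returns the 2 rows asked for, so each lookup costs O(offset) rows.
SELECT page_title
FROM page
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND page_title >= '1887'
  AND page_title >= 'Centennial_Trail_State_Park'
ORDER BY page_title
LIMIT 86786, 2;  -- scans offset + 2 index entries, returns 2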

Furthermore, batches of this query have been increasing in frequency.

The query itself comes from:
http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L225
including that wonderful 'limit ...,2'.

It has this wonderful comment (at
http://git.wikimedia.org/blob/mediawiki%2Fcore.git/ceba5987b0d36f134ffcd9e4f704d1608fe88b79/includes%2Fspecials%2FSpecialAllpages.php#L175):

# TODO: Either make this *much* faster or cache the title index points
# in the querycache table.

...so that should probably be done.
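
For concreteness, here is a hedged sketch of what "cache the title index
points in the querycache table" could look like (qc_type, qc_value,
qc_namespace and qc_title are querycache's actual columns; the
'Allpages-indexpoints' type name and the sample rows are placeholders, not
anything that exists today):

-- Precompute one boundary title per section and store it, keyed by the row
-- offset it corresponds to. Population could be done by the single-pass
-- traversal sketched at the end of this report.
INSERT INTO querycache (qc_type, qc_value, qc_namespace, qc_title)
VALUES ('Allpages-indexpoints', 0,   0, '1887'),
       ('Allpages-indexpoints', 345, 0, 'Centennial_Trail_State_Park');

-- showToplevel() would then read the precomputed boundaries instead of
-- re-deriving them with LIMIT offset,2 on every request:
SELECT qc_title
FROM querycache
WHERE qc_type = 'Allpages-indexpoints'
  AND qc_namespace = 0
ORDER BY qc_value;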

My best guess is that some third-party crawler is trying to obtain the titles
of all articles by paging through Special:Allpages, and, as we found out the
hard way, generating that list is O(N^3), where N is how many pages into the
list you are.

If we "cach[ing] the title index points" we can get the complexity down to
O(N^2).  Which is still bad, but an improvement.

To make the page request (amortized) O(1), we need to generate the index
points for the entire list in a single traversal of all the articles
(possibly done in chunks, possibly by a background task).
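
A sketch of that traversal, assuming chunks of 345 titles (the per-section
size Special:Allpages displays; the exact number is illustrative) and a
:last_title placeholder carried between chunks:

-- One chunk of the background traversal. Repeat, feeding the last title of
-- each chunk back in as :last_title, until fewer than 345 rows come back.
-- The last title of every chunk becomes the next cached index point, so the
-- whole article list is read exactly once.
SELECT page_title
FROM page
WHERE page_namespace = 0
  AND page_is_redirect = 0
  AND page_title > :last_title
ORDER BY page_title
LIMIT 345;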
