[Bug 56840] Special:Allpages is too slow!

2014-05-11 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Bartosz DziewoƄski  changed:

   What|Removed |Added

   See Also||https://bugzilla.wikimedia.
   ||org/show_bug.cgi?id=65159

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #21 from Gabriel Wicke  ---
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #15)
> > > (and the offset grows the further you go).
> > 
> > Are we sure that this is the case? My reading of the code suggests that it
> > remains constant per wiki.
> 
> Re-reading, you're right.  Still O(N^3) to fetch all titles from a wiki with
> N
> titles because the offset, although constant, is O(N).  O(N) 'pages' * O(N)
> 'queries per page' * O(N) 'rows touched per query due to the offset'.

O(N/100) (titles per section) times 100 (sections) is still O(N), so O(N^2)
total. Note that the number of sections is constant, not O(N).

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #20 from Nik Everett  ---
(In reply to comment #19)
> (Hopefully $dbr->estimateRowCount() executes in O(1) time!)

For MySQL, PostgreSQL, and MSSQL it is O(1) because it uses an explain.  Looks
like Sqllite and Oracle do it with a COUNT(*) though.  I know Oracle hides its
explain output behind obtuse views that you may or may not have access to.  I'm
not sure about Sqllite.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-14 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #19 from C. Scott Ananian  ---
(In reply to comment #18)
> (In reply to comment #15)
> > (and the offset grows the further you go).
> 
> Are we sure that this is the case? My reading of the code suggests that it
> remains constant per wiki.

Re-reading, you're right.  Still O(N^3) to fetch all titles from a wiki with N
titles because the offset, although constant, is O(N).  O(N) 'pages' * O(N)
'queries per page' * O(N) 'rows touched per query due to the offset'.

(Hopefully $dbr->estimateRowCount() executes in O(1) time!)

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #18 from Gabriel Wicke  ---
(In reply to comment #15)
> I think it's actually the OFFSET clause (in this case, the first element of
> the
> LIMIT tuple) which is slowing things down.  There isn't an effective index
> that
> can quickly give you the "86786th article after
> Centennial_Trail_State_Park". 
> So instead it's sequentially scanning through 86,786 entries 

Yeah, that's 1/100 of 8.6 million pages. Very close to the nine million rows
from the original report.

> (and the offset grows the further you go).

Are we sure that this is the case? My reading of the code suggests that it
remains constant per wiki.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #17 from Tim Starling  ---
(In reply to comment #16)
> I'm curious how this seemingly suddenly became a problem. Was this is a
> MariaDB
> performance regression?

See the original report:

> My best guess is that some third-party crawler is trying to obtain the titles
> of all articles by paging through Special:Allpages --- and, as we found out
> the
> hard way --- generating the list that was is O(N^3) where N is how many pages
> into the list you are.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #16 from MZMcBride  ---
I'm curious how this seemingly suddenly became a problem. Was this is a MariaDB
performance regression?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #15 from C. Scott Ananian  ---
I think it's actually the OFFSET clause (in this case, the first element of the
LIMIT tuple) which is slowing things down.  There isn't an effective index that
can quickly give you the "86786th article after Centennial_Trail_State_Park". 
So instead it's sequentially scanning through 86,786 entries (and the offset
grows the further you go).

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Gabriel Wicke  changed:

   What|Removed |Added

 CC||gwi...@wikimedia.org

--- Comment #14 from Gabriel Wicke  ---
Out of curiosity, do we actually know why the section query seems to touch all
pages? From what I can tell the offset is always <# of pages in wiki> / 100,
and the second >= page_title clause is incremented on each iteration. Maybe the
MySQL optimizer is not clever enough to figure out that it can drop the first
page_title clause:

(page_title >= '1887') AND (page_title >= 'Centennial_Trail_State_Park')

If that is the case, then changing the query to only use the second clause
should bring the complexity down to O(N) per page for O(N^2) total.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Andre Klapper  changed:

   What|Removed |Added

   Priority|High|Highest
 Status|PATCH_TO_REVIEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #13 from Andre Klapper  ---
Don't see any remaining open patches here. Reclosing.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-13 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Andre Klapper  changed:

   What|Removed |Added

   Priority|Highest |High

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #12 from Gerrit Notification Bot  ---
Change 95085 merged by jenkins-bot:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95085

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #11 from Gerrit Notification Bot  ---
Change 95084 merged by jenkins-bot:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95084

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #10 from Gerrit Notification Bot  ---
Change 95085 had a related patch set uploaded by Ori.livneh:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95085

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #9 from Gerrit Notification Bot  ---
Change 95084 had a related patch set uploaded by Ori.livneh:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/95084

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Gerrit Notification Bot  changed:

   What|Removed |Added

 Status|RESOLVED|PATCH_TO_REVIEW
 Resolution|FIXED   |---

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #8 from Ori Livneh  ---
Tim's patch resolves the immediate operational issue. I moved the feature
discussion to wikitech-l:
.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Ori Livneh  changed:

   What|Removed |Added

 Status|PATCH_TO_REVIEW |RESOLVED
 Resolution|--- |FIXED

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-12 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #7 from Gerrit Notification Bot  ---
Change 94690 merged by jenkins-bot:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/94690

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-10 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #6 from Gerrit Notification Bot  ---
Change 94690 had a related patch set uploaded by Tim Starling:
In Special:AllPages, limit the size of hierarchical lists

https://gerrit.wikimedia.org/r/94690

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-10 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Gerrit Notification Bot  changed:

   What|Removed |Added

 Status|NEW |PATCH_TO_REVIEW

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-10 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Tim Starling  changed:

   What|Removed |Added

 CC||tstarl...@wikimedia.org

--- Comment #5 from Tim Starling  ---
The hierarchical contents list feature is a cute nostalgic reference to the
idea of volumes of paper encyclopedias, but it is difficult to implement
efficiently. I think it's doubtful that anyone derives a significant benefit
from it on a wiki the size of en.wikipedia.org. We could just replace it with a
simple AlphabeticPager, like the one on Special:ListUsers.

That is, unless someone has an idea of how to make this truly efficient, like
O(1) in the total number of articles on the wiki? Surely we would set the bar
that high if we were introducing the feature today.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-10 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #4 from Sean Pringle  ---
After the outage mentioned in comment #1, there are temporary pt-kill jobs
running on S1 slaves to snipe this query if it runs over 30s or appears in
batches. This is to keep the slaves alive.

If people as well as crawlers get affected, that's why.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-09 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Nik Everett  changed:

   What|Removed |Added

 CC||neverett+bugzilla@wikimedia
   ||.org

--- Comment #3 from Nik Everett  ---
You can probably make the query a lot faster by removing the offset and adding
a WHERE page_title > $last_title$.  It changes the feature slightly but
iterating using something unique that you are sorted on and have indexed is
generally pretty quick and prevents the duplicates you can get with OFFSET
clauses.  I think this is worth doing even if we don't want stuff crawling the
end point because it closes up a vector of attack.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-09 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #2 from MZMcBride  ---
(In reply to comment #0)
> My best guess is that some third-party crawler is trying to obtain the titles
> of all articles by paging through Special:Allpages --- and, as we found out
> the hard way --- generating the list that was is O(N^3) where N is how many
> pages into the list you are.

No crawler or similarly automated process should be hitting
index.php?title=Special:AllPages. MediaWiki has a robust Web API at
api.php?action=query&list=allpages.

If a particular crawler or two are disrupting production, blocking those
crawlers by IP or User-Agent would be the best temporary fix. Though obviously
we'll need to evaluate and address the underlying query issue as well.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-09 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

--- Comment #1 from Faidon Liambotis  ---
Urgency highest, as this is actively producing problems on production right now
and, in fact, the above query was an outlier in yesterday's incident
investigation.

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-09 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Faidon Liambotis  changed:

   What|Removed |Added

   Priority|Unprioritized   |Highest
 CC||fai...@wikimedia.org

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 56840] Special:Allpages is too slow!

2013-11-09 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=56840

Sam Reed (reedy)  changed:

   What|Removed |Added

   Keywords||performance

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l