Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Thomas Dalton
2009/6/26 Brion Vibber br...@wikimedia.org:
 Tim Starling wrote:
 It's quite a complex feature. If you have a server that deadlocks or
 is otherwise extremely slow, then it will block rendering for all
 other attempts, meaning that the article can not be viewed at all.
 That scenario could even lead to site-wide downtime, since threads
 waiting for the locks could consume all available apache threads, or
 all available DB connections.

 It's a reasonable idea, but implementing it would require a careful
 design, and possibly some other concepts like per-article thread count
 limits.

 *nod* We should definitely ponder the issue since it comes up
 intermittently but regularly with big news events like this. At the
 least, if we can have some automatic threshold that temporarily disables
 or reduces hits on stampeded pages, that'd be spiffy...

Of course, the fact that everyone's first port of call after hearing
such news is to check the Wikipedia page is a fantastic thing, so it
would be really unfortunate if we have to stop people doing that.
Would it be possible, perhaps, to direct all requests for a certain
page through one server so the rest can continue to serve the rest of
the site unaffected? Or perhaps excessively popular pages could be
rendered (for anons) as part of the editing process, rather than the
viewing process, since that would mean each version of the article is
rendered only once (for anons) and would just slow down editing
slightly (presumably by a fraction of a second), which we can live
with. There must be something we can do that allows people to continue
viewing the page wherever possible.
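
Just to make the idea concrete, here is a minimal sketch of the "parse once at
save time" approach, assuming the PHP Memcached extension; saveRevisionText()
and parseWikitext() are hypothetical stand-ins, not actual MediaWiki functions:

<?php
// Sketch: render once at save time so anonymous views are always cache hits.
// saveRevisionText() and parseWikitext() are hypothetical helpers.

function saveAndPrerender( Memcached $cache, $title, $wikitext ) {
	$revId = saveRevisionText( $title, $wikitext ); // write the new revision to the DB

	// Pay the parse cost once, in the editor's request...
	$html = parseWikitext( $wikitext );

	// ...so every subsequent anonymous view is a plain cache hit.
	$cache->set( "pcache:$title:$revId", $html, 86400 );

	return $revId;
}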



Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Aryeh Gregor
On Fri, Jun 26, 2009 at 6:33 AM, Thomas Dalton thomas.dal...@gmail.com wrote:
 Of course, the fact that everyone's first port of call after hearing
 such news is to check the Wikipedia page is a fantastic thing, so it
 would be really unfortunate if we have to stop people doing that.

He didn't say we'd shut down views for the article, just that we'd shut
down reparsing or cache invalidation or something.  This is the live
hack that was applied yesterday:

Index: includes/parser/ParserCache.php
===================================================================
--- includes/parser/ParserCache.php	(revision 52359)
+++ includes/parser/ParserCache.php	(working copy)
@@ -63,6 +63,7 @@
 		if ( is_object( $value ) ) {
 			wfDebug( "Found.\n" );
 			# Delete if article has changed since the cache was made
+			if( $article->mTitle->getPrefixedText() != 'Michael Jackson' ) { // temp hack!
 			$canCache = $article->checkTouched();
 			$cacheTime = $value->getCacheTime();
 			$touched = $article->mTouched;
@@ -82,6 +83,7 @@
 				}
 				wfIncrStats( 'pcache_hit' );
 			}
+			}// temp hack!
 		} else {
 			wfDebug( "Parser cache miss.\n" );
 			wfIncrStats( 'pcache_miss_absent' );

It just meant that people were seeing outdated versions of the article.

 Would it be possible, perhaps, to direct all requests for a certain
 page through one server so the rest can continue to serve the rest of
 the site unaffected?

Every page view involves a number of servers, and they're not all
interchangeable, so this doesn't make a lot of sense.

 Or perhaps excessively popular pages could be
 rendered (for anons) as part of the editing process, rather than the
 viewing process, since that would mean each version of the article is
 rendered only once (for anons) and would just slow down editing
 slightly (presumably by a fraction of a second), which we can live
 with.

You think that parsing a large page takes a fraction of a second?  Try
twenty or thirty seconds.

But this sounds like a good idea.  If a process is already parsing the
page, why don't we just have other processes display an old cached
version of the page instead of waiting or trying to reparse
themselves?  The worst that would happen is some users would get old
views for a couple of minutes.
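
A rough sketch of that approach, assuming the PHP Memcached extension;
parseArticle() and the cache keys are hypothetical stand-ins, not actual
MediaWiki code:

<?php
// Serve a stale copy while exactly one process reparses.
// parseArticle() is a hypothetical helper; key names are made up.

function getArticleHtml( Memcached $cache, $title, $latestRevId ) {
	$freshKey = "pcache:$title:$latestRevId"; // keyed by revision, so always current
	$staleKey = "pcache:$title:last-good";    // last successfully rendered HTML
	$lockKey  = "pcache:$title:parse-lock";

	$html = $cache->get( $freshKey );
	if ( $html !== false ) {
		return $html; // normal cache hit
	}

	// add() is atomic: only one process wins the lock. The 120 s expiry means
	// a crashed parser can't block the article forever.
	if ( $cache->add( $lockKey, getmypid(), 120 ) ) {
		$html = parseArticle( $title, $latestRevId ); // the expensive part
		$cache->set( $freshKey, $html, 3600 );
		$cache->set( $staleKey, $html, 86400 );
		$cache->delete( $lockKey );
		return $html;
	}

	// Someone else is already parsing: show the last good copy if there is one.
	$stale = $cache->get( $staleKey );
	if ( $stale !== false ) {
		return $stale; // possibly a couple of minutes out of date
	}

	// Brand-new page with no old copy: nothing to serve, so parse it ourselves.
	return parseArticle( $title, $latestRevId );
}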



Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Roan Kattouw
2009/6/26 Aryeh Gregor simetrical+wikil...@gmail.com:
 But this sounds like a good idea.  If a process is already parsing the
 page, why don't we just have other processes display an old cached
 version of the page instead of waiting or trying to reparse
 themselves?  The worst that would happen is some users would get old
 views for a couple of minutes.

This is a very good idea, and it sounds much better than having those
other processes wait for the first process to finish parsing. It would
also reduce the severity of the deadlocks that occur when a process
gets stuck on a parse or dies in the middle of it: rather than
deadlocking, the other processes would just display stale versions
instead of wasting time. If we design these parser cache locks to
expire after a few minutes or so, it should work just fine.

Roan Kattouw (Catrope)



Re: [Wikitech-l] Current events-related overloads

2009-06-26 Thread Domas Mituzas

 This is a very good idea, and sounds much better than having those

The major problem with all dirty caching is that we have more than one
caching layer, and of course, things abort.

The fact that people would be shown dirty versions instead of the proper
article leads to a situation where, in the case of vandal fighting etc.,
people will see stale versions instead of waiting a few seconds and
getting the real one.

In theory, the update flow could look like this:

1. Set "I'm working on this" in a parallelism coordinator or lock manager
2. Do all database transactions & commit
3. Parse
4. Set the memcached object
5. Invalidate squid objects

Now, whether we parse, block or serve stale could be dynamic, e.g. if
we detect more than x parallel parses we fall back to blocking for a few
seconds; once we detect more than y blocked threads on the task, or the
block expires and there's no fresh content yet (or there's a new
copy..) - then stale stuff can be served.
In a perfect world this asks for specialized software :)
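
A toy sketch of that decision, with hypothetical per-title counters kept in
memcached and made-up thresholds x and y (none of this is existing MediaWiki
code):

<?php
// Decide whether a request should parse, block, or serve stale, based on
// per-title counters. Callers would increment/decrement the counters around
// the actual parse or wait; keys and thresholds here are hypothetical.

function renderDecision( Memcached $cache, $title, $x = 3, $y = 20 ) {
	$parsing = (int)$cache->get( "parsing:$title" ); // parses currently in flight
	$blocked = (int)$cache->get( "blocked:$title" ); // threads already waiting

	if ( $parsing < $x ) {
		return 'parse';       // little contention: just parse it
	}
	if ( $blocked < $y ) {
		return 'block';       // pile-up starting: wait a few seconds for a result
	}
	return 'serve-stale';     // too many waiters (or the lock expired): dirty copy
}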

Do note, for the past quite a few years we have done lots and lots of
work to avoid serving stale content. I would not see dirty serving as
something we should be proud of ;-)

Domas



Re: [Wikitech-l] Current events-related overloads

2009-06-25 Thread Aryeh Gregor
On Thu, Jun 25, 2009 at 8:14 PM, Domas Mituzas midom.li...@gmail.com wrote:
 The problem is quite simple: lots of people (like, a million pageviews
 on an article in an hour) caused a cache stampede (all pageviews
 between invalidation and re-rendering needed parsing), and as the MJ
 article is quite cite-heavy (and the cite problems were outlined in 
 http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/41547
  ;) the reparsing was very, very painful on our application cluster -
 all apache children eventually ended up doing lots of parsing work and
 consuming connection slots to pretty much everything :)

So if multiple page views are trying to view the same uncached page at
the same time with the same settings, the later ones should all block on
the first one's reparsing instead of doing it themselves.  It should
provide faster service for big articles too, even ignoring load, since
the earlier parse will be done before you could finish yours anyway.

That seems pretty easy to do.  You'd have some delays if everything
waited on a process that died or something, of course.
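
As a sketch only, the coalescing could look roughly like this, assuming the
PHP Memcached extension and a hypothetical parseArticle() helper:

<?php
// Coalesce concurrent parses: one process parses, the rest poll the cache.
// parseArticle() is a hypothetical helper; key names are made up.

function getHtmlCoalesced( Memcached $cache, $title, $revId ) {
	$key  = "pcache:$title:$revId";
	$lock = "pcache:$title:parse-lock";

	while ( true ) {
		$html = $cache->get( $key );
		if ( $html !== false ) {
			return $html;
		}
		// add() is atomic, so only one process becomes the parser. The 60 s
		// expiry limits the "everything waits on a dead process" problem.
		if ( $cache->add( $lock, getmypid(), 60 ) ) {
			$html = parseArticle( $title, $revId );
			$cache->set( $key, $html, 3600 );
			$cache->delete( $lock );
			return $html;
		}
		usleep( 500000 ); // someone else is parsing: wait 0.5 s, then re-check
	}
}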


Re: [Wikitech-l] Current events-related overloads

2009-06-25 Thread Tim Starling
Aryeh Gregor wrote:
 On Thu, Jun 25, 2009 at 8:14 PM, Domas Mituzas midom.li...@gmail.com wrote:
 The problem is quite simple: lots of people (like, a million pageviews
 on an article in an hour) caused a cache stampede (all pageviews
 between invalidation and re-rendering needed parsing), and as the MJ
 article is quite cite-heavy (and the cite problems were outlined in 
 http://article.gmane.org/gmane.science.linguistics.wikipedia.technical/41547
  ;) the reparsing was very, very painful on our application cluster -
 all apache children eventually ended up doing lots of parsing work and
 consuming connection slots to pretty much everything :)
 
 So if multiple page views are trying to view the same uncached page at
 the same time with the same settings, the later ones should all block on
 the first one's reparsing instead of doing it themselves.  It should
 provide faster service for big articles too, even ignoring load, since
 the earlier parse will be done before you could finish yours anyway.

It's quite a complex feature. If you have a server that deadlocks or
is otherwise extremely slow, then it will block rendering for all
other attempts, meaning that the article can not be viewed at all.
That scenario could even lead to site-wide downtime, since threads
waiting for the locks could consume all available apache threads, or
all available DB connections.

It's a reasonable idea, but implementing it would require a careful
design, and possibly some other concepts like per-article thread count
limits.
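
For illustration, a per-article thread count limit might look roughly like
this, assuming the PHP Memcached extension; parseArticle() and getStaleHtml()
are hypothetical helpers, not existing MediaWiki code:

<?php
// Cap how many Apache children may parse the same article at once.
// parseArticle() and getStaleHtml() are hypothetical helpers.

function renderWithThreadCap( Memcached $cache, $title, $revId, $maxThreads = 5 ) {
	$counter = "render-threads:$title";
	$cache->add( $counter, 0, 300 ); // create the counter if it doesn't exist yet

	if ( $cache->increment( $counter ) > $maxThreads ) {
		// Enough processes are already parsing this article; don't tie up
		// another thread (or DB connection), serve whatever old copy we have.
		$cache->decrement( $counter );
		return getStaleHtml( $cache, $title );
	}

	$html = parseArticle( $title, $revId );
	$cache->set( "pcache:$title:$revId", $html, 3600 );

	// If the parse dies before this point the slot leaks, but the 300 s expiry
	// on the counter means it cannot stay wedged forever.
	$cache->decrement( $counter );

	return $html;
}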

-- Tim Starling




Re: [Wikitech-l] Current events-related overloads

2009-06-25 Thread Brion Vibber
Tim Starling wrote:
 It's quite a complex feature. If you have a server that deadlocks or
 is otherwise extremely slow, then it will block rendering for all
 other attempts, meaning that the article can not be viewed at all.
 That scenario could even lead to site-wide downtime, since threads
 waiting for the locks could consume all available apache threads, or
 all available DB connections.
 
 It's a reasonable idea, but implementing it would require a careful
 design, and possibly some other concepts like per-article thread count
 limits.

*nod* We should definitely ponder the issue since it comes up
intermittently but regularly with big news events like this. At the
least, if we can have some automatic threshold that temporarily disables
or reduces hits on stampeded pages, that'd be spiffy...

-- brion
