Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-09-27 Thread jidanni
> "AG" == Aryeh Gregor  writes:

AG> Facebook...

Speaking of which, I hear they compile their PHP for extra speed. Anyway,
http://www.useit.com/alertbox/response-times.html mentions the pain of
reading slow sites.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-13 Thread Aryeh Gregor
On Fri, Aug 13, 2010 at 6:55 AM, Tei  wrote:
> I am not disappointed.  The wiki model makes it hard, because
> everything can be modified, because the whole thing is gigantic and
> has inertia, and because it needs to support a list of languages that
> would make the United Nations look timid.

Actually, wikis are much easier to optimize than most other classes of
apps.  The pages change only rarely compared to something like
Facebook or Google, which really has to regenerate every single page
customized to the viewer.  That's why we get by with so little money
compared to real organizations.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-13 Thread Tei
On 12 August 2010 00:01, Domas Mituzas  wrote:
...
>
> I'm sorry to disappoint you, but none of the issues you wrote down here are
> new.
> If after reading any books or posts you think we have deficiencies, mostly it
> is because of one of two reasons: either we were lazy and didn't implement
> something, or it is something we need in order to maintain the wiki model.
>

I am not disappointed.  The wiki model makes it hard, because
everything can be modified, because the whole thing is gigantic and
has inertia, and because it needs to support a list of languages that
would make the United Nations look timid.  And I
know you guys are an awesome bunch.  And lots of eyes have been put on the
problems.

This makes MediaWiki an ideal scenario for thinking about techniques to
make the web faster.

Here's a cookie: a really nice plugin for Firebug to check page speed.
http://code.google.com/p/page-speed/


-- 
--
ℱin del ℳensaje.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-11 Thread Platonides
Aryeh Gregor wrote:
> On Wed, Aug 11, 2010 at 5:55 PM, Roan Kattouw  wrote:
>> Is that a server-side redirect, or is it done in JS? In the latter
>> case, it taking long would make sense, and would actually be slowed
>> down by moving all 

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-11 Thread Aryeh Gregor
On Wed, Aug 11, 2010 at 5:55 PM, Roan Kattouw  wrote:
> Is that a server-side redirect, or is it done in JS? In the latter
> case, it taking long would make sense, and would actually be slowed
> down by moving all 

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-11 Thread Domas Mituzas
Hi!

<3 enthusiasm :)

> 1)
> "http://en.wikipedia.org" is not a website; it is a redirection to this:
> http://en.wikipedia.org/wiki/Main_Page
> Can't "http://en.wikipedia.org/wiki/Main_Page" be served from
> "http://en.wikipedia.org"?

Our major entrance is usually not via the main page, so this would be a niche
optimization that does not really matter that much (well, ~2% of article views
go to the main page, and only 15% of those are loading http://en.wikipedia.org/,
and... :)

> 2)
> The CSS loads fine.  \o/

No, they don't, at least not on first pageview. 

> Probably the combining effort will save time anyway.

Yes. We have way too many separate css assets. 

> A bunch of JS files, and they load one after another, sequentially. This is
> worse than a C program reading a file from disk byte by
> byte!

Actually, if a program reads byte by byte, the whole file is already in the OS
cache, so it is not that expensive ;-)
And yes, we know that we have a bit too many JS files loaded, and there's work 
to fix that (Roan wrote about that). 

> Combining will probably save a lot. Or using a strategy to force the
> browser to download these files concurrently and execute them in order.

:-) Thanks for stating obvious. 

> 
> 5)
> There are a lot of image files. Does the page really need that many? Spriting?

It is a PITA to sprite (not sprint) community-uploaded images, and again, that
would work only for the front page, which is not our main target. The skin should
of course be sprited.

> Total: 13.63 seconds.

Quite a slow connection you've got there. I get 1s rendering times with
cross-Atlantic trips (and much better times if I get served by European caches
:)

> You guys want to make this faster with cache optimization. But maybe
> the problem is not bandwidth, but latency. Latency accumulates even with
> HEAD requests that result in 302s.   All the 302s in the world will not
> make the page feel smooth if they already accumulate into 3+ second
> territory.   ...Or am I wrong?

You are. First of all, skin assets are not doing IMS (If-Modified-Since)
requests; they are all cached.
We force browsers to do IMS on page views so that browsers pick up edits
(it is a wiki).
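
For readers unfamiliar with IMS, here is a minimal Python sketch of that
revalidation round trip, using the third-party requests library (the library
choice and the URL are illustrative assumptions, not part of this exchange).
An unchanged page answers 304 with an empty body, so only latency is paid:

import requests

url = "http://en.wikipedia.org/wiki/Main_Page"

first = requests.get(url)
last_modified = first.headers.get("Last-Modified")

if last_modified:
    # Revalidate: send the timestamp back.  If nothing changed, the server
    # answers 304 Not Modified with no body; otherwise 200 with fresh HTML.
    second = requests.get(url, headers={"If-Modified-Since": last_modified})
    print(second.status_code)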

> It is probably a much better idea to read that book than my post

I'm sorry to disappoint you, but none of the issues you wrote down here are
new.
If after reading any books or posts you think we have deficiencies, mostly it
is because of one of two reasons: either we were lazy and didn't implement
something, or it is something we need in order to maintain the wiki model.

Though of course, while it is all fresh and has scarred you for life, we've been
doing this for life. ;-)

Domas
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-11 Thread Roan Kattouw
2010/8/11 Aryeh Gregor :
> I've noticed that when browsing from my phone, the redirect to m. is a
> noticeable delay, sometimes a second or more.  We don't serve many
> redirects other than that, though, AFAIK.
>
Is that a server-side redirect, or is it done in JS? In the latter
case, it taking long would make sense, and would actually be slowed
down by moving all 

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-11 Thread Aryeh Gregor
On Wed, Aug 11, 2010 at 12:51 PM, Tei  wrote:
> Reading this book has scarred me for life.  There are things that are
> worse than I thought.  JS forces everything single-threaded (even stopping
> the download of new resources!)... while it downloads... and while it
> executes.

In newer browsers this is no longer the case.  They can fetch other
resources while a script is loading.  They can't continue rendering
until the script finishes executing, but this isn't such a big issue,
since scripts usually don't do much work at the point of inclusion.
(As Roan says, work is underway to improve this, but I thought I'd
point out that it's not quite as bad as you say.)

> There are a lot of image files. Does the page really need that many? Spriting?
>
> Total: 13.63 seconds.

Some usability stuff is sprited, I think.  Overall, though, spriting
is a pain in the neck, and we don't load enough images that it's
necessarily worth it to sprite too aggressively.  Image loads don't
block page layout, so it's not a huge deal.  I think script
optimization is much more important right now.

> You guys want to make this faster with cache optimization. But maybe
> the problem is not bandwidth, but latency. Latency accumulates even with
> HEAD requests that result in 302s.   All the 302s in the world will not
> make the page feel smooth if they already accumulate into 3+ second
> territory.   ...Or am I wrong?

I've noticed that when browsing from my phone, the redirect to m. is a
noticeable delay, sometimes a second or more.  We don't serve many
redirects other than that, though, AFAIK.

> It is probably a much better idea to read that book than my post

I've read Steve Souders' High Performance Web Sites, which is probably
pretty similar in content.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-11 Thread Roan Kattouw
2010/8/11 Tei :
> 1)
> "http://en.wikipedia.org" is not a website; it is a redirection to this:
> http://en.wikipedia.org/wiki/Main_Page
> Can't "http://en.wikipedia.org/wiki/Main_Page" be served from
> "http://en.wikipedia.org"?
>
> Wait.. this will break relative links on the front page, but.. these
> are absolute!  <a href="…" title="Wikipedia">Wikipedia</a>
>
That would get complicated with Squid cache AFAIK. One redirect (which
is also a 301 Moved Permanently, which means clients may cache the
redirect destination) isn't that bad, right?
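
That redirect is easy to observe; a small Python sketch with the requests
library (illustrative only -- the status codes served today may differ from
the 301 mentioned above):

import requests

resp = requests.get("http://en.wikipedia.org/", allow_redirects=True)

for hop in resp.history:              # each intermediate redirect response
    print(hop.status_code, hop.headers.get("Location"))
print(resp.status_code, resp.url)     # the final 200 and its URL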

> 4)
> A bunch of JS files, and they load one after another, sequentially. This is
> worse than a C program reading a file from disk byte by
> byte!
> Combining will probably save a lot. Or using a strategy to force the
> browser to download these files concurrently and execute them in order.
>
I'll quote my own post from this thread:
>> The resourceloader branch contains work in progress on aggressively
>> combining and minifying JavaScript and CSS. The mapping of one
>> resource = one file will be preserved, but the mapping of one resource
>> = one REQUEST will die: it'll be possible, and encouraged, to obtain
>> multiple resources in one request.
We're aware of this problem, or we wouldn't be spending paid
developers' time on this resource loader project.

> You guys want to make this faster with cache optimization. But maybe
> the problem is not bandwidth, but latency. Latency accumulates even with
> HEAD requests that result in 302s.   All the 302s in the world will not
> make the page feel smooth if they already accumulate into 3+ second
> territory.   ...Or am I wrong?
>
I'm assuming you mean 304 (Not Modified)? 302 (Found) means the same
as 301 except it's not cacheable.

We're not intending to do many requests resulting in 304s; we're
intending to reduce the number of requests made and to keep the
long client-side cache expiry times (Cache-Control:
max-age=<large number>) that we already use.
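
A rough sketch of that caching policy in Python, standard library only (the
handler, the file contents and the 30-day max-age are invented for
illustration, not the production setup): one combined script is served with a
long Cache-Control lifetime, so repeat views need neither a re-download nor a
304 round trip.

from http.server import BaseHTTPRequestHandler, HTTPServer

COMBINED_JS = b"/* imagine all site scripts concatenated and minified here */"

class LongExpiryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/javascript")
        # A month of client-side caching; change the URL when the content changes.
        self.send_header("Cache-Control", "public, max-age=2592000")
        self.send_header("Content-Length", str(len(COMBINED_JS)))
        self.end_headers()
        self.wfile.write(COMBINED_JS)

if __name__ == "__main__":
    HTTPServer(("localhost", 8000), LongExpiryHandler).serve_forever()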

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-11 Thread Tei
On 2 August 2010 15:24, Roan Kattouw  wrote:
> 2010/8/2 Tei :
>> Maybe a theme can get the individual icons that the theme use, and
>> combine it all in a single png file.
>>
> This technique is called spriting, and the single combined image file
> is called a sprite. We've done this with e.g. the enhanced toolbar
> buttons, but it doesn't work in all cases.
>
>> Maybe the idea that one resource = one file must die on the 2011 internet :-/
>>
> The resourceloader branch contains work in progress on aggressively
> combining and minifying JavaScript and CSS. The mapping of one
> resource = one file will be preserved, but the mapping of one resource
> = one REQUEST will die: it'll be possible, and encouraged, to obtain
> multiple resources in one request.
>



A friend recommended an excellent book to me (yes, books are still
useful in this digital age).  It is called "Even Faster Web Sites".
Everyone should make their company buy this book.  It is excellent.

Reading this book has scarred me for life.  There are things that are
worse than I thought.  JS forces everything single-threaded (even stopping
the download of new resources!)... while it downloads... and while it
executes.   How about this: 90% of the code is not needed at onload, but
is loaded before onload anyway.  It is probably a much better idea to read
that book than my post (that's a good line, I will end my email with
it).

Some comments on Wikipedia speed:


1)
"http://en.wikipedia.org" is not a website; it is a redirection to this:
http://en.wikipedia.org/wiki/Main_Page
Can't "http://en.wikipedia.org/wiki/Main_Page" be served from
"http://en.wikipedia.org"?

Wait.. this will break relative links on the front page, but.. these
are absolute!  <a href="…" title="Wikipedia">Wikipedia</a>

2)
The CSS loads fine.  \o/
Probably the combining effort will save time anyway.

3)
Probably the CSS rules can be optimized for speed )-:
Probably not.

4)
A bunch of JS files, and they load one after another, sequentially. This is
worse than a C program reading a file from disk byte by
byte!
Combining will probably save a lot. Or using a strategy to force the
browser to download these files concurrently and execute them in order.

5)
There are a lot of image files. Does the page really need that many? Spriting?

Total: 13.63 seconds.


You guys want to make this faster with cache optimization. But maybe
the problem is not bandwidth, but latency. Latency accumulates even with
HEAD requests that result in 302s.   All the 302s in the world will not
make the page feel smooth if they already accumulate into 3+ second
territory.   ...Or am I wrong?

It is probably a much better idea to read that book than my post

-- 
--
ℱin del ℳensaje.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-03 Thread Happy-melon

"Oldak Quill"  wrote in message 
news:aanlktik8sqmaetwvg8eta+ca49i08rfbrmvicsms+...@mail.gmail.com...
> On 2 August 2010 12:13, Oldak Quill  wrote:
>> On 28 July 2010 20:13,   wrote:
>>> Seems to me playing the role of the average dumb user, that
>>> en.wikipedia.org is one of the rather slow websites of the many websites
>>> I browse.
>>>
>>> No matter what browser, it takes more seconds from the time I click on a
>>> link to the time when the first bytes of the HTTP response start flowing
>>> back to me.
>>>
>>> Seems facebook is more zippy.
>>>
>>> Maybe Mediawiki is not "optimized".
>>
>>
>> For what it's worth, Alexa.com lists the average load time of the
>> websites they catalogue. I'm not sure what the metrics they use are,
>> and I would guess they hit the squid cache and are in the United
>> States.
>>
>> Alexa.com lists the following average load times as of now:
>>
>> wikipedia.org: Fast (1.016 Seconds), 74% of sites are slower.
>> facebook.com: Average (1.663 Seconds), 50% of sites are slower.
>
>
> An addendum to the above message:
>
> According to the Alexa.com help page "Average Load Times: Speed
> Statistics" (http://www.alexa.com/help/viewtopic.php?f=6&t=1042):
> "The Average Load Time ... [is] based on load times experienced by
> Alexa users, and measured by the Alexa Toolbar, during their regular
> web browsing."
>
> So although US browsers might be overrepresented in this sample (I'm
> just guessing, I have no figures to support this statement), the Alexa
> sample should include many non-US browsers, assuming that the figure
> reported by Alexa.com is reflective of its userbase.
>
And the average Alexa toolbar user is logged in to facebook and using it to 
see what their friends were up to last night, with masses of personalised 
content; while the average Alexa toolbar user on Wikipedia is a reader seeing 
the same page as everyone else.  We definitely have the theoretical advantage.

--HM
 



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-03 Thread Aryeh Gregor
On Mon, Aug 2, 2010 at 8:32 PM, Lars Aronsson  wrote:
> Couldn't you just tag every internal link with
> a separate class for the length of the target article,
> and then use different personal CSS to set the
> threshold? The generated page would be the same
> for all users:
>
> <a class="…" href="…">My Article</a>

Until the page changes length.  That would force all articles that
link to it to be reparsed, unless we use some way to insert the
correct page lengths into a parsed page before serving it to the user.
 In which case we don't really need to do this, we can just insert the
stub class on the correct pages using the same mechanism.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-03 Thread Platonides
Lars Aronsson wrote:
> On 08/01/2010 10:55 PM, Aryeh Gregor wrote:
>> One easy hack to reduce this problem is just to only provide a few
>> options for stub threshold, as we do with thumbnail size.  Although
>> this is only useful if we cache pages with nonzero stub threshold . .
>> . why don't we do that?  Too much fragmentation due to the excessive
>> range of options?
> 
> Couldn't you just tag every internal link with
> a separate class for the length of the target article,
> and then use different personal CSS to set the
> threshold? The generated page would be the same
> for all users:
> 
> <a class="…" href="…">My Article</a>

That would be workable, e.g. one class for articles smaller than 50
bytes, another for 100, 200, 250, 300, 400, 500, 600, 700, 800, 1000,
2000, 2500, 5000, 1 if it weren't for having to update all those
classes whenever the page changes.

It would work to add it as a separate stylesheet for stubs, though.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-03 Thread John Vandenberg
On Tue, Aug 3, 2010 at 8:55 PM, K. Peachey  wrote:
> Would something like what is shown below get it even further down?
>
> a { color: blue }
> a.1_byte_article, a.2_byte_article, a.3_byte_article,
> ...

Using an abbreviation like "ba" would also help.

Limiting the user pref to intervals of 10 bytes would also help.

Also, as this piece of CSS is being dynamically generated, it only
needs to include the variations that occur in the body of the
article's HTML.

Or the CSS can be generated by JS on the client side, which is what
Aryeh has been suggesting all along (I think).
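
A sketch of that generate-only-what-occurs idea, in Python for brevity (class
names follow the N_byte_article pattern from this thread; the threshold and
lengths are made-up examples -- note that real CSS class selectors cannot
start with a digit, so a prefix would be needed in practice):

def stub_css(lengths_in_page, threshold):
    # One grouped rule covering only the lengths below the threshold that
    # actually occur in the page, rounded down to 10-byte intervals.
    buckets = sorted({(n // 10) * 10 for n in lengths_in_page if n < threshold})
    if not buckets:
        return "a { color: blue }"
    selectors = ", ".join(f"a.{b}_byte_article" for b in buckets)
    return f"a {{ color: blue }}\n{selectors} {{ color: red }}"

print(stub_css([42, 97, 105, 8000], threshold=100))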

btw, I thought Domas was kidding.  I got a chuckle out of it, at least.

--
John Vandenberg

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-03 Thread K. Peachey
Would something like what is shown below get it even further down?

a { color: blue }
a.1_byte_article, a.2_byte_article, a.3_byte_article,
a.4_byte_article, a.5_byte_article, a.6_byte_article,
a.7_byte_article, a.8_byte_article, a.9_byte_article,
a.10_byte_article,a.11_byte_article, a.12_byte_article,
a.13_byte_article, a.14_byte_article, a.15_byte_article,
a.16_byte_article, a.17_byte_article, a.18_byte_article,
a.19_byte_article, a.20_byte_article, a.21_byte_article,
a.22_byte_article, a.23_byte_article, a.24_byte_article,
a.25_byte_article, a.26_byte_article, a.27_byte_article,
a.28_byte_article, a.29_byte_article, a.30_byte_article,
a.31_byte_article, a.32_byte_article, a.33_byte_article,
a.34_byte_article, a.35_byte_article, a.36_byte_article,
a.37_byte_article, a.38_byte_article, a.39_byte_article,
a.40_byte_article, a.41_byte_article, a.42_byte_article,
a.43_byte_article, a.44_byte_article, a.45_byte_article,
a.46_byte_article, a.47_byte_article, a.48_byte_article,
a.49_byte_article, a.50_byte_article, a.51_byte_article,
a.52_byte_article, a.53_byte_article, a.54_byte_article,
a.55_byte_article, a.56_byte_article, a.57_byte_article,
a.58_byte_article, a.59_byte_article, a.60_byte_article,
a.61_byte_article, a.62_byte_article, a.63_byte_article,
a.64_byte_article, a.65_byte_article, a.66_byte_article,
a.67_byte_article, a.68_byte_article, a.69_byte_article,
a.70_byte_article, a.71_byte_article, a.72_byte_article,
a.73_byte_article, a.74_byte_article, a.75_byte_article,
a.76_byte_article, a.77_byte_article, a.78_byte_article,
a.79_byte_article, a.80_byte_article, a.81_byte_article,
a.82_byte_article, a.83_byte_article, a.84_byte_article,
a.85_byte_article, a.86_byte_article, a.87_byte_article,
a.88_byte_article, a.89_byte_article, a.90_byte_article,
a.91_byte_article, a.92_byte_article, a.93_byte_article,
a.94_byte_article, a.95_byte_article { color: red }

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-03 Thread Liangent
On 8/3/10, Lars Aronsson  wrote:
> Couldn't you just tag every internal link with
> a separate class for the length of the target article,
> and then use different personal CSS to set the
> threshold? The generated page would be the same
> for all users:

So if a page is changed, all pages linking to it need to be parsed
again. Will this cost even more?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-03 Thread Domas Mituzas
Hi!

> Couldn't you just tag every internal link with
> a separate class for the length of the target article,

Great idea! How come no one ever came up with this? I even have a stylesheet
ready; here it is (do note, even if it looks big as text, gzip gets it down to 10%,
so we can support this kind of granularity even up to a megabyte :)

Domas

a { color: blue }
a.1_byte_article { color: red; }
a.2_byte_article { color: red; }
a.3_byte_article { color: red; }
a.4_byte_article { color: red; }
a.5_byte_article { color: red; }
a.6_byte_article { color: red; }
a.7_byte_article { color: red; }
a.8_byte_article { color: red; }
a.9_byte_article { color: red; }
a.10_byte_article { color: red; }
a.11_byte_article { color: red; }
a.12_byte_article { color: red; }
a.13_byte_article { color: red; }
a.14_byte_article { color: red; }
a.15_byte_article { color: red; }
a.16_byte_article { color: red; }
a.17_byte_article { color: red; }
a.18_byte_article { color: red; }
a.19_byte_article { color: red; }
a.20_byte_article { color: red; }
a.21_byte_article { color: red; }
a.22_byte_article { color: red; }
a.23_byte_article { color: red; }
a.24_byte_article { color: red; }
a.25_byte_article { color: red; }
a.26_byte_article { color: red; }
a.27_byte_article { color: red; }
a.28_byte_article { color: red; }
a.29_byte_article { color: red; }
a.30_byte_article { color: red; }
a.31_byte_article { color: red; }
a.32_byte_article { color: red; }
a.33_byte_article { color: red; }
a.34_byte_article { color: red; }
a.35_byte_article { color: red; }
a.36_byte_article { color: red; }
a.37_byte_article { color: red; }
a.38_byte_article { color: red; }
a.39_byte_article { color: red; }
a.40_byte_article { color: red; }
a.41_byte_article { color: red; }
a.42_byte_article { color: red; }
a.43_byte_article { color: red; }
a.44_byte_article { color: red; }
a.45_byte_article { color: red; }
a.46_byte_article { color: red; }
a.47_byte_article { color: red; }
a.48_byte_article { color: red; }
a.49_byte_article { color: red; }
a.50_byte_article { color: red; }
a.51_byte_article { color: red; }
a.52_byte_article { color: red; }
a.53_byte_article { color: red; }
a.54_byte_article { color: red; }
a.55_byte_article { color: red; }
a.56_byte_article { color: red; }
a.57_byte_article { color: red; }
a.58_byte_article { color: red; }
a.59_byte_article { color: red; }
a.60_byte_article { color: red; }
a.61_byte_article { color: red; }
a.62_byte_article { color: red; }
a.63_byte_article { color: red; }
a.64_byte_article { color: red; }
a.65_byte_article { color: red; }
a.66_byte_article { color: red; }
a.67_byte_article { color: red; }
a.68_byte_article { color: red; }
a.69_byte_article { color: red; }
a.70_byte_article { color: red; }
a.71_byte_article { color: red; }
a.72_byte_article { color: red; }
a.73_byte_article { color: red; }
a.74_byte_article { color: red; }
a.75_byte_article { color: red; }
a.76_byte_article { color: red; }
a.77_byte_article { color: red; }
a.78_byte_article { color: red; }
a.79_byte_article { color: red; }
a.80_byte_article { color: red; }
a.81_byte_article { color: red; }
a.82_byte_article { color: red; }
a.83_byte_article { color: red; }
a.84_byte_article { color: red; }
a.85_byte_article { color: red; }
a.86_byte_article { color: red; }
a.87_byte_article { color: red; }
a.88_byte_article { color: red; }
a.89_byte_article { color: red; }
a.90_byte_article { color: red; }
a.91_byte_article { color: red; }
a.92_byte_article { color: red; }
a.93_byte_article { color: red; }
a.94_byte_article { color: red; }
a.95_byte_article { color: red; }
a.96_byte_article { color: red; }
a.97_byte_article { color: red; }
a.98_byte_article { color: red; }
a.99_byte_article { color: red; }
a.100_byte_article { color: red; }
a.101_byte_article { color: red; }
a.102_byte_article { color: red; }
a.103_byte_article { color: red; }
a.104_byte_article { color: red; }
a.105_byte_article { color: red; }
a.106_byte_article { color: red; }
a.107_byte_article { color: red; }
a.108_byte_article { color: red; }
a.109_byte_article { color: red; }
a.110_byte_article { color: red; }
a.111_byte_article { color: red; }
a.112_byte_article { color: red; }
a.113_byte_article { color: red; }
a.114_byte_article { color: red; }
a.115_byte_article { color: red; }
a.116_byte_article { color: red; }
a.117_byte_article { color: red; }
a.118_byte_article { color: red; }
a.119_byte_article { color: red; }
a.120_byte_article { color: red; }
a.121_byte_article { color: red; }
a.122_byte_article { color: red; }
a.123_byte_article { color: red; }
a.124_byte_article { color: red; }
a.125_byte_article { color: red; }
a.126_byte_article { color: red; }
a.127_byte_article { color: red; }
a.128_byte_article { color: red; }
a.129_byte_article { color: red; }
a.130_byte_article { color: red; }
a.131_byte_article { color: red; }
a.132_byte_article { color: red; }
a.133_byte_article { color: red; }
a.134_byte_article { color: red; }
a.135_byte_article { color: red; }
a.136_byte_article

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Lars Aronsson
On 08/01/2010 10:55 PM, Aryeh Gregor wrote:
> One easy hack to reduce this problem is just to only provide a few
> options for stub threshold, as we do with thumbnail size.  Although
> this is only useful if we cache pages with nonzero stub threshold . .
> . why don't we do that?  Too much fragmentation due to the excessive
> range of options?

Couldn't you just tag every internal link with
a separate class for the length of the target article,
and then use different personal CSS to set the
threshold? The generated page would be the same
for all users:

<a class="…" href="…">My Article</a>



-- 
   Lars Aronsson (l...@aronsson.se)
   Aronsson Datateknik - http://aronsson.se



___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Aryeh Gregor
On Mon, Aug 2, 2010 at 10:50 AM, John Vandenberg  wrote:
> Does that approach gain much over HTTP pipelining?

Yes, because browsers don't use HTTP pipelining in practice: transparent
proxies at ISPs cause sites to break if they do, and there's no reliable
way to detect them.  Opera does pipelining and blacklists bad ISPs or
something, I think.  See:

https://bugzilla.mozilla.org/show_bug.cgi?id=264354

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Tei
On 2 August 2010 15:24, Roan Kattouw  wrote:
...
>> Maybe the idea that one resource = one file must die on the 2011 internet :-/
>>
> The resourceloader branch contains work in progress on aggressively
> combining and minifying JavaScript and CSS. The mapping of one
> resource = one file will be preserved, but the mapping of one resource
> = one REQUEST will die: it'll be possible, and encouraged, to obtain
> multiple resources in one request.
>

:-O

That is an awesome solution, considering the complexity of the real-world
problems. Elegant, and as a side effect it will probably remove some bloat.

-- 
--
ℱin del ℳensaje.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread John Vandenberg
On Mon, Aug 2, 2010 at 11:24 PM, Roan Kattouw  wrote:
> The resourceloader branch contains work in progress on aggressively
> combining and minifying JavaScript and CSS. The mapping of one
> resource = one file will be preserved, but the mapping of one resource
> = one REQUEST will die: it'll be possible, and encouraged, to obtain
> multiple resources in one request.

Does that approach gain much over HTTP pipelining?

--
John Vandenberg

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Roan Kattouw
2010/8/2 Tei :
> Maybe a theme can take the individual icons that the theme uses and
> combine them all into a single PNG file.
>
This technique is called spriting, and the single combined image file
is called a sprite. We've done this with e.g. the enhanced toolbar
buttons, but it doesn't work in all cases.

> Maybe the idea that one resource = one file must die on the 2011 internet :-/
>
The resourceloader branch contains work in progress on aggressively
combining and minifying JavaScript and CSS. The mapping of one
resource = one file will be preserved, but the mapping of one resource
= one REQUEST will die: it'll be possible, and encouraged, to obtain
multiple resources in one request.
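
Not the actual ResourceLoader code, just a toy Python sketch of that
one-request-many-resources mapping with a crude whitespace-only minifier
(the module names and contents are invented):

import re

RESOURCES = {                      # hypothetical module registry
    "jquery": "function dollar(sel) {   return document.querySelector(sel); }",
    "collapsible": "var open  =  true ;",
}

def minify(js):
    # Extremely naive: collapse whitespace.  Real minifiers do far more.
    return re.sub(r"\s+", " ", js).strip()

def load(module_names):
    # One request returns several registered resources, combined and minified.
    return "\n".join(minify(RESOURCES[name]) for name in module_names)

print(load(["jquery", "collapsible"]))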

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Tei
On 2 August 2010 13:25, Domas Mituzas  wrote:
>> The first load of the homepage can be slow:
>> http://zerror.com/unorganized/wika/lader1.png
>> http://en.wikipedia.org/wiki/Main_Page
>> (I need a bigger monitor; the escalator doesn't fit on my screen)
>
> well, no wonder that first page load is sluggish, with 12 style sheets and
> 12 javascript files - there's plenty of low-hanging fruit there.
>

Maybe a theme can take the individual icons that the theme uses and
combine them all into a single PNG file.

Updating any icon in that theme would mean regenerating the whole
combined image (I heard ImageMagick has a tool just for that, so it can
be scripted). This seems to be what sites like www.google.com and
www.gmail.com are doing.

I say theme because I suppose all the other images live in a wiki world
and can't be combined that way.
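
A sketch of that scripted spriting step in Python with Pillow (ImageMagick's
montage tool would do the same job); the icon names, sizes and colours are
stand-ins, not real skin assets:

from PIL import Image

icons = {"edit": (200, 0, 0, 255), "watch": (0, 150, 0, 255), "history": (0, 0, 200, 255)}
size = 16                                  # assume 16x16 icons

sprite = Image.new("RGBA", (size * len(icons), size))
for i, colour in enumerate(icons.values()):
    sprite.paste(Image.new("RGBA", (size, size), colour), (i * size, 0))
sprite.save("sprite.png")

# Each icon becomes a background-position offset into the single sprite file.
for i, name in enumerate(icons):
    print(f".icon-{name} {{ background: url(sprite.png) -{i * size}px 0; }}")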

Maybe the idea that one resource = one file must die on the 2011 internet :-/

-- 
--
ℱin del ℳensaje.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Domas Mituzas
> The first load of the homepage can be slow:
> http://zerror.com/unorganized/wika/lader1.png
> http://en.wikipedia.org/wiki/Main_Page
> (I need a bigger monitor; the escalator doesn't fit on my screen)

well, no wonder that first page load is sluggish, with 12 style sheets and 12
javascript files - there's plenty of low-hanging fruit there.

Domas
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l



Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Oldak Quill
On 2 August 2010 12:13, Oldak Quill  wrote:
> On 28 July 2010 20:13,   wrote:
>> Seems to me playing the role of the average dumb user, that
>> en.wikipedia.org is one of the rather slow websites of the many websites
>> I browse.
>>
>> No matter what browser, it takes more seconds from the time I click on a
>> link to the time when the first bytes of the HTTP response start flowing
>> back to me.
>>
>> Seems facebook is more zippy.
>>
>> Maybe Mediawiki is not "optimized".
>
>
> For what it's worth, Alexa.com lists the average load time of the
> websites they catalogue. I'm not sure what the metrics they use are,
> and I would guess they hit the squid cache and are in the United
> States.
>
> Alexa.com lists the following average load times as of now:
>
> wikipedia.org: Fast (1.016 Seconds), 74% of sites are slower.
> facebook.com: Average (1.663 Seconds), 50% of sites are slower.


An addendum to the above message:

According to the Alexa.com help page "Average Load Times: Speed
Statistics" (http://www.alexa.com/help/viewtopic.php?f=6&t=1042):
"The Average Load Time ... [is] based on load times experienced by
Alexa users, and measured by the Alexa Toolbar, during their regular
web browsing."

So although US browsers might be overrepresented in this sample (I'm
just guessing, I have no figures to support this statement), the Alexa
sample should include many non-US browsers, assuming that the figure
reported by Alexa.com is reflective of its userbase.

-- 
Oldak Quill (oldakqu...@gmail.com)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Oldak Quill
On 28 July 2010 20:13,   wrote:
> Seems to me playing the role of the average dumb user, that
> en.wikipedia.org is one of the rather slow websites of the many websites
> I browse.
>
> No matter what browser, it takes more seconds from the time I click on a
> link to the time when the first bytes of the HTTP response start flowing
> back to me.
>
> Seems facebook is more zippy.
>
> Maybe Mediawiki is not "optimized".


For what it's worth, Alexa.com lists the average load time of the
websites they catalogue. I'm not sure what the metrics they use are,
and I would guess they hit the squid cache and are in the United
States.

Alexa.com lists the following average load times as of now:

wikipedia.org: Fast (1.016 Seconds), 74% of sites are slower.
facebook.com: Average (1.663 Seconds), 50% of sites are slower.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Tei
On 28 July 2010 21:13,   wrote:
> Seems to me playing the role of the average dumb user, that
> en.wikipedia.org is one of the rather slow websites of the many websites
> I browse.
>
> No matter what browser, it takes more seconds from the time I click on a
> link to the time when the first bytes of the HTTP response start flowing
> back to me.
>
> Seems facebook is more zippy.

It seems fast here: 130ms.

The first load of the homepage can be slow:
http://zerror.com/unorganized/wika/lader1.png
http://en.wikipedia.org/wiki/Main_Page
(I need a bigger monitor; the escalator doesn't fit on my screen)



-- 
--
ℱin del ℳensaje.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Domas Mituzas
> That's what he did. Read the query.

;-) that's what happens when email gets ahead of coffee.

Domas

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Andrew Garrett
On Mon, Aug 2, 2010 at 5:35 PM, Domas Mituzas  wrote:
> Hi!
>
>> I.e., only about a quarter of users have been ported to
>> user_properties.  Why wasn't a conversion script run here?
>
>> In theory, if all properties are at defaults, the user shouldn't be there. The
>> actual check should be against the blob field.

That's what he did. Read the query.

-- 
Andrew Garrett
http://werdn.us/

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-02 Thread Domas Mituzas
Hi!

> I.e., only about a quarter of users have been ported to
> user_properties.  Why wasn't a conversion script run here?

In theory, if all properties are at defaults, the user shouldn't be there. The
actual check should be against the blob field.

Domas
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Aryeh Gregor
On Sun, Aug 1, 2010 at 6:24 PM, Aryeh Gregor
 wrote:
> That won't work, because it won't count users whose settings are all
> default.  However, we can tell who's switched because user_options
> will be empty.

SELECT COUNT(*) FROM user WHERE user_options = ''; SELECT COUNT(*) FROM user;
+--+
| COUNT(*) |
+--+
|  3491404 |
+--+
1 row in set (10 min 20.11 sec)

+--+
| COUNT(*) |
+--+
| 12822573 |
+--+
1 row in set (7 min 47.87 sec)

I.e., only about a quarter of users have been ported to
user_properties.  Why wasn't a conversion script run here?

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Roan Kattouw
2010/8/1 Platonides :
> I think that the condition should have been the inverse (users with
> recent edits, not users who don't have old edits)
Oops. I thought I had reversed the condition correctly, but as you
point out I hadn't. I'll run the corrected queries tomorrow.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Aryeh Gregor
On Sun, Aug 1, 2010 at 5:03 PM, Roan Kattouw  wrote:
> I don't know. Cursory inspection seems to indicate user_properties is
> relatively complete, but comprehensive count queries are too slow for
> me to dare run them on the cluster. Maybe you could run something
> along the lines of SELECT COUNT(DISTINCT up_user) FROM
> user_properties; on the toolserver and compare it with SELECT COUNT(*)
> FROM user;

That won't work, because it won't count users whose settings are all
default.  However, we can tell who's switched because user_options
will be empty.

On Sun, Aug 1, 2010 at 5:48 PM, Platonides  wrote:
> Note that we do offer several options, not only the free-text field. I
> think that the underlying problem is that when changing an article from
> 98 bytes to 102, we would need to invalidate all pages linking to it for
> stubthresholds of 100 bytes.

Aha, that must be it.  Any stub threshold would require extra page
invalidation, which we don't do because it would be pointlessly
expensive.  Postprocessing would fix the problem.

> Since the pages are reparsed, custom values are not a problem now.
> I think that to cache for the stubthresholds, we would need to cache
> just before the replaceLinkHolders() and perform the replacement at the
> user request.

Yep.  Or parse further, but leave markers lingering in the output
somehow.  We don't need to cache the actual wikitext, either way.  We
just need to cache at some point after all the heavy lifting has been
done, and everything that's left can be done in a couple of
milliseconds.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Platonides
Roan Kattouw wrote:
> 2010/8/1 Platonides:
>> Aryeh, can you do some statistics about the frequency of the different
>> stub thresholds? Perhaps restricted to people which edited this year, to
>> discard unused accounts.
>>
> He can't, but I can.  I ran a couple of queries and put the result at
> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold
> 
> Roan Kattouw (Catrope)

Thanks, Roan.
I think that the condition should have been the inverse (users with
recent edits, not users who don't have old edits), but anyway it shows
that with a few (8-10) values we could please almost everyone.

Also, it shows that people don't understand how to disable it. The tail
has many extremely large values, which can only mean "don't treat stubs
differently".


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Platonides
Roan Kattouw wrote:
>> One easy hack to reduce this problem is just to only provide a few
>> options for stub threshold, as we do with thumbnail size.  Although
>> this is only useful if we cache pages with nonzero stub threshold . .
>> . why don't we do that?  Too much fragmentation due to the excessive
>> range of options?
> Maybe; but the fact that the field is present but set to 0 in the
> parser cache key is very weird. SVN blame should probably be able to
> tell who did this and hopefully why.
> 
> Roan Kattouw (Catrope)

Look at Article::getParserOutput() and how it explicitly checks that
$wgUser->getOption( 'stubthreshold' ) is 0 before enabling the
parser cache.
*There are several other entry points to the ParserCache in Article;
it's a bit mixed.


Note that we do offer several options, not only the free-text field. I
think that the underlying problem is that when changing an article from
98 bytes to 102, we would need to invalidate all pages linking to it for
stubthresholds of 100 bytes.

Since the pages are reparsed, custom values are not a problem now.
I think that to cache across stub thresholds, we would need to cache
just before replaceLinkHolders() and perform the replacement at
request time.
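
A toy sketch of that replace-at-request-time idea: the cached output keeps a
placeholder per internal link carrying the target's size, and a cheap
per-request pass turns each one into a normal or stub link for the reader's
threshold. The marker syntax is invented; this is not how
replaceLinkHolders() actually works.

import re

# What would sit in the parser cache: links carry their target's byte size.
CACHED_HTML = 'See <!--LINK|Foo|87--> and <!--LINK|Bar|5210--> for details.'

def render_for_user(cached_html, stub_threshold):
    def repl(m):
        title, size = m.group(1), int(m.group(2))
        cls = ' class="stub"' if size < stub_threshold else ""
        return f'<a href="/wiki/{title}"{cls}>{title}</a>'
    return re.sub(r"<!--LINK\|([^|]+)\|(\d+)-->", repl, cached_html)

print(render_for_user(CACHED_HTML, stub_threshold=0))    # nothing marked
print(render_for_user(CACHED_HTML, stub_threshold=100))  # Foo marked as a stub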


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Roan Kattouw
2010/8/1 Aryeh Gregor :
> On Sun, Aug 1, 2010 at 4:43 PM, Roan Kattouw  wrote:
>> He can't, but I can.  I ran a couple of queries and put the result at
>> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold
>
> I can too -- I'm a toolserver root, so I have read-only access to
> pretty much the whole database (minus some omitted
> databases/tables/columns, mainly IP addresses and maybe private
> wikis).
Ah yes, I forgot about that. I was assuming you'd need access to the
live DB for this.

> But no need, since you already did it.  :)  The data isn't
> complete because not all users have been ported to user_properties,
> right?
>
I don't know. Cursory inspection seems to indicate user_properties is
relatively complete, but comprehensive count queries are too slow for
me to dare run them on the cluster. Maybe you could run something
along the lines of SELECT COUNT(DISTINCT up_user) FROM
user_properties; on the toolserver and compare it with SELECT COUNT(*)
FROM user;

> One easy hack to reduce this problem is just to only provide a few
> options for stub threshold, as we do with thumbnail size.  Although
> this is only useful if we cache pages with nonzero stub threshold . .
> . why don't we do that?  Too much fragmentation due to the excessive
> range of options?
Maybe; but the fact that the field is present but set to 0 in the
parser cache key is very weird. SVN blame should probably be able to
tell who did this and hopefully why.

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Aryeh Gregor
On Sun, Aug 1, 2010 at 4:43 PM, Roan Kattouw  wrote:
> He can't, but I can.  I ran a couple of queries and put the result at
> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold

I can too -- I'm a toolserver root, so I have read-only access to
pretty much the whole database (minus some omitted
databases/tables/columns, mainly IP addresses and maybe private
wikis).  But no need, since you already did it.  :)  The data isn't
complete because not all users have been ported to user_properties,
right?

One easy hack to reduce this problem is just to only provide a few
options for stub threshold, as we do with thumbnail size.  Although
this is only useful if we cache pages with nonzero stub threshold . .
. why don't we do that?  Too much fragmentation due to the excessive
range of options?
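
A sketch of that few-options hack (the bucket values are invented; the point
is only that free-text thresholds collapse onto a handful of cacheable
variants):

ALLOWED_THRESHOLDS = [0, 50, 100, 500, 1000, 2000, 5000, 10000]

def snap_threshold(requested):
    # Snap an arbitrary preference onto the small fixed menu, so the parser
    # cache only ever needs a handful of variants per page.
    return min(ALLOWED_THRESHOLDS, key=lambda t: abs(t - requested))

print(snap_threshold(357))   # 500
print(snap_threshold(123))   # 100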

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Chad
On Sun, Aug 1, 2010 at 1:43 PM, Roan Kattouw  wrote:
> 2010/8/1 Platonides :
>> Aryeh, can you do some statistics about the frequency of the different
>> stub thresholds? Perhaps restricted to people who edited this year, to
>> discard unused accounts.
>>
> He can't, but I can.  I ran a couple of queries and put the result at
> http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold
>

Isn't stub threshold a *reading* preference? It wouldn't be
unreasonable to assume that someone could have that
preference set and not regularly edit.

Also doesn't take into account people who haven't changed
their preferences in a long time (and thus aren't in user_props
yet)

-Chad

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Roan Kattouw
2010/8/1 Platonides :
> Aryeh, can you do some statistics about the frequency of the different
> stub thresholds? Perhaps restricted to people who edited this year, to
> discard unused accounts.
>
He can't, but I can.  I ran a couple of queries and put the result at
http://www.mediawiki.org/wiki/User:Catrope/Stub_threshold

Roan Kattouw (Catrope)

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Platonides
Aryeh Gregor wrote:
> Look, this is just not a useful solution, period.  It would be
> extremely ineffective.  If you extended the permitted staleness level
> so much that it would be moderately effective, it would be useless,
> because you'd be seeing hours- or days-old articles.  On the other
> hand, for a comparable amount of effort you could implement a solution
> that actually is effective, like adding an extra postprocessing stage.

Yes, I have some ideas on how to improve it.


> On Fri, Jul 30, 2010 at 1:32 PM, John Vandenberg  wrote:
> Someone who sets their stub threshold to 357 is their own performance enemy.

In fact, setting the stub threshold to anything other than 0 disables the
parser cache. You can only hit it when it is set to 0.

Aryeh, can you do some statistics about the frequency of the different
stub thresholds? Perhaps restricted to people who edited this year, to
discard unused accounts.


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-08-01 Thread Aryeh Gregor
On Fri, Jul 30, 2010 at 1:32 PM, John Vandenberg  wrote:
> So you're telling my theoretical logged-in-reader to use default
> prefs, or log out, when the reason they are a logged-in-reader is so
> they can control their preferences..!

Yep.  You want features, you often pay a performance penalty.  In this
case the performance penalty should be reducible, or at least clearly
marked, but that's a general rule anyway.

> Surely there are a few common 'preference sets' which large numbers of
> readers use?

Changing any parser-related preference will kill page load times.

> There are plenty of pages which change more than once per minute,

No pages change once per minute on average.  That would be 1440 edits
per day, or more than 500,000 per year.  Only one page on enwiki
(WP:AIAV) has more than 500,000 edits *total*, let alone per year.
There were only 18 edits to WP:ANI between 17:00 and 18:00 today, just
for example, which is less than one edit every three minutes.  There
are some times when a particular page changes many times in a minute
-- like when a major event occurs and everyone rushes to update an
article -- but these are rare and don't last long.

You also seem to be missing how many different possible parser cache
keys there are.  It's not like there are only five or ten possible
versions.  As I said before -- if you change your parser-related
settings around a bunch, you will probably rarely or never hit parser
cache except when you yourself viewed the page since it last changed.
There are too many possible permutations of settings here.

> however I'd expect a much higher threshold, variable based on the
> volume of page activity, or some other mechanism to determine whether
> the cached version is acceptably stale for the logged-in-reader.
>
> There is no infrastructure required for extra stale entries.  If the
> viewer is happy to accept the slightly stale revision for their chosen
> prefs, serve it.  If not, reparse.

Look, this is just not a useful solution, period.  It would be
extremely ineffective.  If you extended the permitted staleness level
so much that it would be moderately effective, it would be useless,
because you'd be seeing hours- or days-old articles.  On the other
hand, for a comparable amount of effort you could implement a solution
that actually is effective, like adding an extra postprocessing stage.

On Fri, Jul 30, 2010 at 8:22 PM,   wrote:
> Hmmm, maybe they're there amongst the "!"s below.
> $ lynx --source http://en.wikipedia.org/wiki/Main_Page | grep parser
> Expensive parser function count: 44/500
> <!-- Saved in parser cache with key enwiki:pcache:idhash:… and timestamp … -->

Yes.  That key is generated by the following line in
includes/parser/ParserCache.php:

$key = wfMemcKey( 'pcache', 'idhash',
"{$pageid}-{$renderkey}!{$hash}{$edit}{$printable}" );

The relevant bit of that, for us, is $hash, which is generated by
getPageRenderingHash() in includes/User.php:

// stubthreshold is only included below for completeness,
// it will always be 0 when this function is called by parsercache.

$confstr = $this->getOption( 'math' );
$confstr .= '!' . $this->getOption( 'stubthreshold' );
if ( $wgUseDynamicDates ) {
$confstr .= '!' . $this->getDatePreference();
}
$confstr .= '!' . ( $this->getOption( 'numberheadings' ) ? '1' : '' );
$confstr .= '!' . $wgLang->getCode();
$confstr .= '!' . $this->getOption( 'thumbsize' );
// add in language specific options, if any
$extra = $wgContLang->getExtraHashOptions();
$confstr .= $extra;

So anonymous users on enwiki have math=3, stubthreshold=0 (although
the comment indicates this is irrelevant somehow), date preferences =
'default', numberheadings = 1, language = 'en', thumbsize = 4.
Changing any of those from the default will make you miss the parser
cache on enwiki.
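
To make the any-change-misses-the-cache point concrete, a small Python sketch
that mirrors the shape of that hash (option names and ordering copied loosely
from the snippet above; not the real implementation, and the page id is made
up):

DEFAULTS = {"math": "3", "stubthreshold": "0", "dateformat": "default",
            "numberheadings": "1", "language": "en", "thumbsize": "4"}

def rendering_key(page_id, opts):
    # The rendering options are joined into one string, so changing any
    # single preference yields a different parser cache key.
    confstr = "!".join(opts[k] for k in ("math", "stubthreshold", "dateformat",
                                         "numberheadings", "language", "thumbsize"))
    return f"pcache:idhash:{page_id}-0!{confstr}"

print(rendering_key(12345, DEFAULTS))
print(rendering_key(12345, {**DEFAULTS, "thumbsize": "2"}))  # a different key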

On Sat, Jul 31, 2010 at 12:58 PM, Daniel Kinzler  wrote:
> This is a few years old, but I guess it's still relevant:
>  I experimented a bit
> with ways to do all the per-user preference stuff on the client side, with 
> XSLT.

XSLT seems a bit baroque.  If the goal is to use script to avoid cache
misses, why not just use plain old JavaScript?  A lot more people know
it, it supports progressive rendering (does XSLT?), and it's much
better supported.  In particular, your approach of serving something
other than HTML and relying on XSLT support to transform it will
seriously confuse text browsers, search engines, etc.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-31 Thread Daniel Kinzler
Aryeh Gregor schrieb:
> As soon as you're logged in, you're missing Squid cache, because we
> have to add your name to the top, attach your user CSS/JS, etc.  You
> can't be served the same HTML as an anonymous user.  If you want to be
> served the same HTML as an anonymous user, log out.

This is a few years old, but I guess it's still relevant:
 I experimented a bit
with ways to do all the per-user preference stuff on the client side, with XSLT.

-- daniel

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-30 Thread jidanni
> "AG" == Aryeh Gregor  writes:

AG> Fortunately, the major slowdown is parser cache misses, not Squid
AG> cache misses.  To avoid parser cache misses, just make sure you don't
AG> change parser-affecting preferences to non-default values.  (We don't
AG> say which these are, of course . . .)

Hmmm, maybe they're there amongst the "!"s below.
$ lynx --source http://en.wikipedia.org/wiki/Main_Page | grep parser
Expensive parser function count: 44/500
<!-- Saved in parser cache with key enwiki:pcache:idhash:… and timestamp … -->

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-30 Thread John Vandenberg
On Sat, Jul 31, 2010 at 1:45 AM, Aryeh Gregor
 wrote:
> On Fri, Jul 30, 2010 at 4:49 AM, John Vandenberg  wrote:
>> Could we add a logged-in-reader mode, for people who are infrequent
>> contributors but wish to be logged in for the prefs.
>
> ...
>
> Fortunately, the major slowdown is parser cache misses, not Squid
> cache misses.  To avoid parser cache misses, just make sure you don't
> change parser-affecting preferences to non-default values.  (We don't
> say which these are, of course . . .)

So you're telling my theoretical logged-in-reader to use default
prefs, or log out, when the reason they are a logged-in-reader is so
they can control their preferences..!

>> They could be served a slightly old cached version of the page when
>> one is available for their prefs.  e.g. if the cached version is less
>> than a minute old.
>
> That would make no difference.  If you've fiddled with your
> preferences nontrivially, there's a good chance that not a single
> other user has the exact same preferences, so you'll only hit the
> parser cache if you yourself have viewed the page recently.  For
> instance, if you set your stub threshold to 357 bytes, you'll never
> hit anyone else's cache (unless someone else has that exact stub
> threshold).  Even if you just fiddle with on/off options, there are
> several, and the number of combinations is exponential.

Someone who sets their stub threshold to 357 is their own performance enemy.

Surely there are a few common 'preference sets' which large numbers of
readers use?

How many people only look at the front page in the morning, and jump
to a few pages from there..?

> Moreover, practically no page changes anywhere close to once per
> minute.  If the threshold is set that low, you'll essentially never
> get extra parser cache hits.  On the other hand, extra infrastructure
> will be needed to keep around stale parser cache entries, so it's a
> clear overall loss.

There are plenty of pages which change more than once per minute;
however, I'd expect a much higher threshold, variable based on the
volume of page activity, or some other mechanism to determine whether
the cached version is acceptably stale for the logged-in reader.

There is no infrastructure required for extra stale entries.  If the
viewer is happy to accept the slightly stale revision for their chosen
prefs, serve it.  If not, reparse.
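
A sketch of that trade-off in Python (the reparse stand-in, the key shape and
the staleness window are invented): serve the per-prefs cached copy if it is
acceptably stale for this reader, otherwise reparse.

import time

cache = {}  # (page, prefs) -> (rendered_html, stored_at)

def expensive_parse(page, prefs):
    return f"<html><!-- {page} rendered for {prefs} --></html>"

def get_page(page, prefs, max_staleness):
    key = (page, prefs)
    if key in cache:
        html, stored_at = cache[key]
        if time.time() - stored_at <= max_staleness:
            return html                      # acceptably stale: no reparse
    html = expensive_parse(page, prefs)      # miss or too stale: reparse
    cache[key] = (html, time.time())
    return html

print(get_page("Main_Page", "thumbsize=2", max_staleness=60))
print(get_page("Main_Page", "thumbsize=2", max_staleness=60))  # cached copy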

>> The down side is that if they see an error, it may already be fixed.
>> OTOH, if the page is being revised frequently, the same is likely to
>> happen anyway.  The text could be stale before it hits the wire due to
>> parsing delay.
>
> However, in that case everyone will see the new contents at more or
> less the same time -- it won't be inconsistent.

Not on frequently changing pages.  Many edits can occur while I am
pulling the page down the wire.  I then need to read the page to find
this error.

--
John Vandenberg

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-30 Thread Aryeh Gregor
On Fri, Jul 30, 2010 at 2:42 AM, Alex Brollo  wrote:
> Yes, but I presume that a big advantage could come from having a
> simplified, unique, JS-free version of the pages online, completely devoid
> of "user preferences", to avoid any need to parse a page again when it is
> loaded by different users with different preference profiles.

This is exactly what we have when you're logged out.  The request goes
to a Squid, and it serves a static cached file, no dynamic bits (if
it's already cached).  When you log in, it can't be static, because we
display your name in the upper right, etc.

On Fri, Jul 30, 2010 at 4:49 AM, John Vandenberg  wrote:
> Could we add a logged-in-reader mode, for people who are infrequent
> contributors but wish to be logged in for the prefs.

As soon as you're logged in, you're missing Squid cache, because we
have to add your name to the top, attach your user CSS/JS, etc.  You
can't be served the same HTML as an anonymous user.  If you want to be
served the same HTML as an anonymous user, log out.

Fortunately, the major slowdown is parser cache misses, not Squid
cache misses.  To avoid parser cache misses, just make sure you don't
change parser-affecting preferences to non-default values.  (We don't
say which these are, of course . . .)

> They could be served a slightly old cached version of the page when
> one is available for their prefs.  e.g. if the cached version is less
> than a minute old.

That would make no difference.  If you've fiddled with your
preferences nontrivially, there's a good chance that not a single
other user has the exact same preferences, so you'll only hit the
parser cache if you yourself have viewed the page recently.  For
instance, if you set your stub threshold to 357 bytes, you'll never
hit anyone else's cache (unless someone else has that exact stub
threshold).  Even if you just fiddle with on/off options, there are
several, and the number of combinations is exponential.
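
To make the fragmentation concrete, here is a toy sketch -- not the real
MediaWiki parser cache code, and with invented option names and defaults --
of how a preference-derived cache key behaves.  Any single non-default
value yields a key that nobody else shares:

<?php
// Toy parser-cache key: the title plus a hash of the user's
// parser-affecting options.  Option names and defaults are made up.
function toyParserCacheKey($title, array $prefs) {
    $defaults = array('stubthreshold' => 0, 'thumbsize' => 2, 'math' => 0);
    $opts = array_merge($defaults, $prefs);
    ksort($opts);
    return 'pcache:' . md5($title . '|' . serialize($opts));
}

// Default prefs share one entry; a 357-byte stub threshold gets its own.
echo toyParserCacheKey('Main_Page', array()), "\n";
echo toyParserCacheKey('Main_Page', array('stubthreshold' => 357)), "\n";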

Moreover, practically no page changes anywhere close to once per
minute.  If the threshold is set that low, you'll essentially never
get extra parser cache hits.  On the other hand, extra infrastructure
will be needed to keep around stale parser cache entries, so it's a
clear overall loss.

> The down side is that if they see an error, it may already be fixed.
> OTOH, if the page is being revised frequently, the same is likely to
> happen anyway.  The text could be stale before it hits the wire due to
> parsing delay.

However, in that case everyone will see the new contents at more or
less the same time -- it won't be inconsistent.

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-30 Thread Strainu
On Fri, Jul 30, 2010 at 11:49 AM, John Vandenberg  wrote:
>
> Could we add a logged-in-reader mode, for people who are infrequent
> contributors but wish to be logged in for the prefs.
>
> They could be served a slightly old cached version of the page when
> one is available for their prefs.  e.g. if the cached version is less
> than a minute old.
> The down side is that if they see an error, it may already be fixed.
> OTOH, if the page is being revised frequently, the same is likely to
> happen anyway.  The text could be stale before it hits the wire due to
> parsing delay.

That could work on the top 3-5 Wikipedias by number of visitors; for
the rest you are most likely to serve VERY old versions (or just
re-parse the page if you set a low threshold).

Strainu

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-30 Thread John Vandenberg
On Fri, Jul 30, 2010 at 6:23 AM, Aryeh Gregor
 wrote:
> On Thu, Jul 29, 2010 at 4:07 PM, Strainu  wrote:
>> Could you please elaborate on that? Thanks.
>
> When pages are parsed, the parsed version is cached, since parsing can
> take a long time (sometimes > 10 s).  Some preferences change how
> pages are parsed, so different copies need to be stored based on those
> preferences.  If these settings are all default for you, you'll be
> using the same parser cache copies as anonymous users, so you're
> extremely likely to get a parser cache hit.  If any of them is
> non-default, you'll only get a parser cache hit if someone with your
> exact parser-related preferences viewed the page since it was last
> changed; otherwise it will have to reparse the page just for you,
> which will take a long time.
>
> This is probably a bad thing.

Could we add a logged-in-reader mode, for people who are infrequent
contributors but wish to be logged in for the prefs.

They could be served a slightly old cached version of the page when
one is available for their prefs.  e.g. if the cached version is less
than a minute old.
The down side is that if they see an error, it may already be fixed.
OTOH, if the page is being revised frequently, the same is likely to
happen anyway.  The text could be stale before it hits the wire due to
parsing delay.

For pending changes, the pref 'Always show the latest accepted
revision (if there is one) of a page by default' could be enabled by
default.  Was there any discussion about the default setting for this
pref?

--
John Vandenberg

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Alex Brollo
2010/7/30 Daniel Friesen 

>
> That's pretty much the purpose of the caching servers.
>

Yes, but I presume that a big advantage could come from having a
simplified, unique, JS-free version of the pages online, completely devoid
of "user preferences", to avoid any need to parse a page again when it is
loaded by different users with different preference profiles. Nevertheless,
I say again: it's only a layman's idea.

-- 
Alex
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Daniel Friesen
Alex Brollo wrote:
> 2010/7/30 Platonides 
>
>   
>> We have a couple of options, {$edit} and {$printable}, which in fact do the
>> same thing (remove the section edit links), so they could be merged.
>> Additionally, the non-editsection version can be retrieved from the
>> editsectioned one with a preg_replace.
>> So yes, I think it can be simplified without even affecting the poor
>> CSS-less users.
>>
>>
>> 
> Perhaps you're suggesting the same thing I'm about to suggest... My idea is
> to have online a static, very fast version of any page, which could be the
> default version for unlogged users; very similar to the CD static version of
> the wiki projects, only adding some trick to switch to the normal, editable,
> complete, customizable (but slow) version. Obviously there would be only one
> version of any page, with no need to parse it again according to user
> preferences.
>
> It doesn't matter if such an idea is completely foolish; I'm far from an
> expert!
>
> Alex
>   
That's pretty much the purpose of the caching servers.

-- 
~Daniel Friesen (Dantman, Nadir-Seen-Fire) [http://daniel.friesen.name]


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Alex Brollo
2010/7/30 Platonides 

>
>
> We have a couple of options, {$edit} and {$printable}, which in fact do the
> same thing (remove the section edit links), so they could be merged.
> Additionally, the non-editsection version can be retrieved from the
> editsectioned one with a preg_replace.
> So yes, I think it can be simplified without even affecting the poor
> CSS-less users.
>
>
Perhaps you're suggesting the same thing I'm about to suggest... My idea is
to have online a static, very fast version of any page, which could be the
default version for unlogged users; very similar to the CD static version of
the wiki projects, only adding some trick to switch to the normal, editable,
complete, customizable (but slow) version. Obviously there would be only one
version of any page, with no need to parse it again according to user
preferences.

It doesn't matter if such an idea is completely foolish; I'm far from an
expert!

Alex
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Platonides
Domas Mituzas wrote:
>> This is probably a bad thing.  I'd think that most of the settings
>> that fragment the parser cache should be implementable in a
>> post-processing stage, which should be more than fast enough to run on
>> parser cache hits as well as misses.  But we don't have such a thing.
> 
> some of which can even be done with CSS/JS, I guess.
> I'm all for simplifying whatever processing the backend has to do :-)
> 
> Domas

We have a couple of options, {$edit} and {$printable}, which in fact do the
same thing (remove the section edit links), so they could be merged.
Additionally, the non-editsection version can be retrieved from the
editsectioned one with a preg_replace.
So yes, I think it can be simplified without even affecting the poor
CSS-less users.
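
Something like this, as a rough sketch -- the markup pattern below is a
simplification of what the skin actually emits:

<?php
// Strip section edit links from the editsectioned (cached) HTML
// instead of caching a second copy without them.
function stripEditSections($html) {
    return preg_replace('!<span class="editsection">.*?</span>!s', '', $html);
}

$html = '<h2><span class="editsection">[<a href="...">edit</a>]</span>'
      . ' <span class="mw-headline">History</span></h2>';
echo stripEditSections($html), "\n";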


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Domas Mituzas
> This is probably a bad thing.  I'd think that most of the settings
> that fragment the parser cache should be implementable in a
> post-processing stage, which should be more than fast enough to run on
> parser cache hits as well as misses.  But we don't have such a thing.

some of which can even be done with CSS/JS, I guess.
I'm all for simplifying whatever processing the backend has to do :-)

Domas

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Aryeh Gregor
On Thu, Jul 29, 2010 at 4:07 PM, Strainu  wrote:
> Could you please elaborate on that? Thanks.

When pages are parsed, the parsed version is cached, since parsing can
take a long time (sometimes > 10 s).  Some preferences change how
pages are parsed, so different copies need to be stored based on those
preferences.  If these settings are all default for you, you'll be
using the same parser cache copies as anonymous users, so you're
extremely likely to get a parser cache hit.  If any of them is
non-default, you'll only get a parser cache hit if someone with your
exact parser-related preferences viewed the page since it was last
changed; otherwise it will have to reparse the page just for you,
which will take a long time.

This is probably a bad thing.  I'd think that most of the settings
that fragment the parser cache should be implementable in a
post-processing stage, which should be more than fast enough to run on
parser cache hits as well as misses.  But we don't have such a thing.
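
A toy version of such a post-processing stage, just to show the shape of
the idea -- the data-size attribute and the preference handling here are
invented for the example, not something the parser emits today:

<?php
// Cache one canonical parse per page and rewrite it cheaply per user,
// instead of keeping a parser cache entry per preference combination.
function postProcess($canonicalHtml, array $prefs) {
    $threshold = isset($prefs['stubthreshold']) ? (int)$prefs['stubthreshold'] : 0;
    if ($threshold <= 0) {
        return $canonicalHtml;
    }
    // Pretend the canonical parse tagged internal links with the target
    // page's size, so marking stubs is a cheap regex pass, not a reparse.
    return preg_replace_callback(
        '!<a ([^>]*)data-size="(\d+)"([^>]*)>!',
        function ($m) use ($threshold) {
            $class = ((int)$m[2] < $threshold) ? ' class="stub"' : '';
            return '<a ' . $m[1] . $m[3] . $class . '>';
        },
        $canonicalHtml
    );
}

$cached = '<p><a href="/wiki/Foo" data-size="210">Foo</a></p>';
echo postProcess($cached, array('stubthreshold' => 357)), "\n";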

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Domas Mituzas
> Could you please elaborate on that? Thanks.

we don't have large blinking red lights for when people deviate from the
defaults in their parser-affecting settings - that makes them miss the
parser cache, and each pageview is slow.
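
The red light could be as simple as diffing a user's options against the
defaults for the few parser-affecting keys and warning in the preferences
UI -- the key list and function below are made up for illustration:

<?php
// Return the parser-affecting preferences this user has changed;
// a non-empty result means their pageviews fragment the parser cache.
function nonDefaultParserPrefs(array $userOptions, array $defaults) {
    $parserKeys = array('stubthreshold', 'thumbsize', 'math', 'numberheadings');
    $changed = array();
    foreach ($parserKeys as $key) {
        if (isset($userOptions[$key]) && $userOptions[$key] != $defaults[$key]) {
            $changed[] = $key;
        }
    }
    return $changed;
}

$defaults = array('stubthreshold' => 0, 'thumbsize' => 2,
                  'math' => 0, 'numberheadings' => 0);
$user     = array('stubthreshold' => 357, 'thumbsize' => 2,
                  'math' => 0, 'numberheadings' => 1);
print_r(nonDefaultParserPrefs($user, $defaults));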

Domas
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-29 Thread Strainu
> And if
> you're logged in, I'm betting we're much less optimized -- certainly
> if you have unusual parser preferences (which I'm sure you do), so you
> miss the parser cache regularly.

Could you please elaborate on that? Thanks.

Andrei

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] wikipedia is one of the slower sites on the web

2010-07-28 Thread Aryeh Gregor
On Wed, Jul 28, 2010 at 3:13 PM,   wrote:
> Seems to me playing the role of the average dumb user, that
> en.wikipedia.org is one of the rather slow websites of the many websites
> I browse.
>
> No matter what browser, it takes more seconds from the time I click on a
> link to the time when the first bytes of the HTTP response start flowing
> back to me.
>
> Seems facebook is more zippy.
>
> Maybe Mediawiki is not "optimized".

Is this logged in or not?  If you're not logged in, you should be
hitting Squid cache most of the time, and we should be about as fast
as anyone with similar RTT.  But you might easily be much farther from the
nearest Wikipedia server than from the nearest Facebook server.  And if
you're logged in, I'm betting we're much less optimized -- certainly
if you have unusual parser preferences (which I'm sure you do), so you
miss the parser cache regularly.
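
In other words, the request path looks roughly like this -- an
illustrative sketch only, with invented function names, not MediaWiki's
actual API:

<?php
// Why logged-in views are slower: a session cookie bypasses the static
// (Squid) cache, and an unusual set of parser options then means the
// application can't find a matching parser cache entry either.
function describeRequest($hasSessionCookie, $hasMatchingParserCacheEntry) {
    if (!$hasSessionCookie) {
        return 'served from Squid: static cached HTML, fastest path';
    }
    if ($hasMatchingParserCacheEntry) {
        return 'rendered by the app from the parser cache: a bit slower';
    }
    return 'full reparse just for this user: can take many seconds';
}

echo describeRequest(false, true), "\n";
echo describeRequest(true, true), "\n";
echo describeRequest(true, false), "\n";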

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l