Re: [Wikitech-l] speed of Vector in en.wikipedia

2010-05-21 Thread William Pietri
On 05/20/2010 09:15 AM, Amir E. Aharoni wrote:
 On various Vector feedback pages as well as on OTRS many people report that
 since the switch to Vector it takes significantly more time for Wikipedia
 pages to load.[...]

 Are there any more precise measurements?




I don't know how useful it is, but recently I helped a client build some 
JS-based, in-browser page load performance monitoring. It tracks various 
rendering events of a chosen percentage of pageviews. The only 
server-side code processes web server logs in batch, so it is pretty low 
impact, and works with cached pages. It's been served in a few hundred 
million pageviews with no obvious problems yet.
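
A minimal sketch of what the batch log-processing side of such a setup might look like. It is not the client's actual code: the beacon path `/perf.gif`, the `page` and `t_render` query parameters, and the access-log layout are all assumptions made for illustration.

```python
import re
from statistics import median
from urllib.parse import parse_qs, urlsplit

# Hypothetical beacon: in-page JS requests /perf.gif?page=...&t_render=<ms>
# for a sampled fraction of pageviews; the web server only logs the request.
REQUEST_RE = re.compile(r'"GET (?P<url>\S+) HTTP/[\d.]+"')


def render_times(log_lines, beacon_path="/perf.gif"):
    """Yield (page, render_ms) pairs from access-log lines that hit the beacon."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        url = urlsplit(match.group("url"))
        if url.path != beacon_path:
            continue
        params = parse_qs(url.query)
        try:
            yield params["page"][0], int(params["t_render"][0])
        except (KeyError, ValueError):
            continue  # skip malformed beacons rather than abort the batch


def median_by_page(log_lines):
    """Return {page: median render time in ms} for one batch of log lines."""
    samples = {}
    for page, ms in render_times(log_lines):
        samples.setdefault(page, []).append(ms)
    return {page: median(values) for page, values in samples.items()}
```

Because the beacon is an ordinary logged request, the application servers do no extra work per pageview, and pages served from cache are measured the same way as any other.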

I think most of the server-side code is pretty particular to their 
needs, but if somebody wants it for Wikipedia, I'm sure they'd be 
willing to give up the client-side stuff and my rough-and-ready initial 
pass at the log parsing, which is in Ruby. If that's useful, let me know 
off-list and I'll ask 'em for permission.

William


___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l


Re: [Wikitech-l] speed of Vector in en.wikipedia

2010-05-21 Thread William Pietri
On 05/21/2010 03:39 PM, Roan Kattouw wrote:
 I'm not sure we need this. I don't see a reason why one of us
 usability developers can't just load pages, find out whether they're
 slower, and where the slowness is. If the slowness is present for
 everyone (many different people reporting slowness seems to indicate
 that, and I have no reason to believe otherwise) you don't need to
 jump through hoops to gather data from random users when you can
 easily reproduce it yourself.


I'm not sure you need it either; I just thought I'd offer what I had on 
hand. But the reason we built it was to quantify the problem so we could 
narrow down and prioritize issues.

There are a lot of variables in browser performance, and we found it 
frustrating to try to simulate various user conditions (OS versions, 
browser versions, physical location, bandwidth) and get solid, 
statistically valid measurements that we thought correlated well with 
what people were actually experiencing. After enough futzing with that, 
it was a relief to get some actual numbers rolling in automatically.

Last I heard they were going to set it up to graph aggregate client-side 
performance over time, so that they could easily see if their normal 
feature changes had unexpected browser performance impact. They want to 
solve the problems before users complain. So few of them do, especially 
about something subtle like performance.
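
A rough sketch of the "aggregate over time" idea, assuming the measurements have already been reduced to (day, load time in milliseconds) pairs. The function names and the 20% threshold are hypothetical choices, not anything the client is known to use.

```python
from statistics import median


def daily_medians(samples):
    """samples: iterable of (day, load_ms); returns [(day, median_ms)] sorted by day."""
    by_day = {}
    for day, ms in samples:
        by_day.setdefault(day, []).append(ms)
    return [(day, median(values)) for day, values in sorted(by_day.items())]


def regressions(medians, threshold=1.2):
    """Return the days whose median exceeds the previous day's by more than threshold x."""
    flagged = []
    for (_, previous), (day, current) in zip(medians, medians[1:]):
        if current > previous * threshold:
            flagged.append(day)
    return flagged
```

Medians are used rather than means so that a handful of very slow clients does not dominate the daily figure.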

William




Re: [Wikitech-l] hiphop! :)

2010-02-28 Thread William Pietri
On 02/28/2010 01:33 PM, Domas Mituzas wrote:

 not 10x. I did concurrent benchmarks for API requests (e.g. opensearch) on 
 modern boxes, and saw:

 HipHop: Requests per second:    1975.39 [#/sec] (mean)
 Zend:   Requests per second:    371.29 [#/sec] (mean)

 these numbers seriously kick ass. I still can't believe I observe 2000 
 mediawiki requests/s from a single box ;-)


Bravo! That's fantastic. Thanks for both the work and the testing.
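
For scale, the two figures quoted above work out to a bit over a fivefold throughput gain, which fits the "not 10x" remark. The arithmetic, using only the numbers from the quoted benchmark:

```python
hiphop_rps = 1975.39  # HipHop, requests per second (mean), from the quoted benchmark
zend_rps = 371.29     # Zend, requests per second (mean), from the quoted benchmark

speedup = hiphop_rps / zend_rps
print(f"HipHop handled about {speedup:.1f}x the requests per second of Zend")
```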

William



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 05:54 PM, Domas Mituzas wrote:
 Hi!

 from now on specific per-bot/per-software/per-client User-Agent header is 
 mandatory for contacting Wikimedia sites.
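
For client authors, the requirement quoted above amounts to sending a descriptive User-Agent with every request. A minimal sketch using Python's standard library; the tool name, version, and contact details are hypothetical placeholders, and the URL is only an illustrative API call.

```python
from urllib.request import Request

# Hypothetical tool name, version, and contact address; substitute your own.
USER_AGENT = "ExampleWikiTool/0.1 (https://example.org/tool; tool-admin@example.org)"


def build_request(url):
    """Build a request that carries a client-specific User-Agent header."""
    return Request(url, headers={"User-Agent": USER_AGENT})


request = build_request(
    "https://en.wikipedia.org/w/api.php?action=opensearch&search=Vector"
)
# Pass `request` to urllib.request.urlopen() to actually send it.
```

Including a contact address in the string gives site operators a way to reach the client's author instead of simply blocking the traffic.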


Two questions:

Was there some urgent production impact that required doing this with no 
notice?

Was any impact analysis done on this? Given Wikipedia's mission, we 
can't be as casual about rejecting traffic as a commercial site would 
be. If a commercial site accidentally gets rid of some third-world 
traffic running behind a shoddy ISP, it's no loss; nobody wants to 
advertise to them anyhow. But for us, those are the people who gain the 
most from being able to reach us.

William



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 07:25 PM, Domas Mituzas wrote:
 Was there some urgent production impact that required doing this with no
 notice?
  
 Actually we have had a User-Agent header requirement for ages; it just failed to do 
 what it had to do for a while. Consider this to be a bugfix.


Ok. I'm going to take that as no. In the future, I think it would be 
better to let people know in advance about non-urgent changes that may 
break things for them.


 Was any impact analysis done on this?
  
 Yup!


Would you care to share the results with us?

In the future, I'd suggest giving basic info like that as part of an 
announcement.

 Actually, at the moment this mostly affects crap sites that hot-load data 
 from us to display spamvertisements on hacked sites on the internet.


That's another good thing to share as part of a change announcement: 
motivation for the change.

 I don't know where your 'shoddy ISP' speculation fits in.


Last I looked, there were a lot of poorly maintained proxies out there, 
some of which mangle headers. It seemed reasonable to me that some of 
those are on low-rent ISPs in poor countries. If you have already done 
the work to prove that no legitimate users anywhere in the world are 
impacted by this change, then perhaps you could save us further 
discussion and just explain that?

Thanks,

William





Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 06:50 PM, Steve Summit wrote:
 You're trying to detect / guard against malicious behavior using
 *User-Agent*??  Good grief.  Have fun with the whack-a-mole game, then.



Yes, a simple restriction like this tends to create smarter villains 
rather than less villainy. Filtering on an obvious, easy-to-change 
characteristic also destroys a useful source of information on who the 
bad people are, making future abuse prevention efforts harder.

William



Re: [Wikitech-l] User-Agent:

2010-02-15 Thread William Pietri
On 02/15/2010 07:55 PM, Domas Mituzas wrote:
 Yes, a simple restriction like this tends to create smarter villains
 rather than less villainy. Filtering on an obvious, easy-to-change
 characteristic also destroys a useful source of information on who the
 bad people are, making future abuse prevention efforts harder.
  

 Thanks for insights.  But no.

 We don't use the UA as a first step of analysis; it was a helpful tertiary tool that 
 put these people into the ignorant or malicious category.
 If they had spoofed their UAs, we'd block the IPs and inform upstreams, treating it as 
 fully malicious behavior.
 If they had a nice UA, we might have attempted to contact them or isolated 
 their workload until the issue was fixed ;-)


I am saying that going forward you have eliminated WMF's ability to use 
a tertiary tool that you agree was helpful.

Having spent a lot of time dealing with abuse early in the Web's 
history, I wouldn't have done it that way. But it's not really my 
problem and you don't appear to be looking for input, so godspeed.

William



Re: [Wikitech-l] search ranking

2010-01-10 Thread William Pietri
On 01/10/2010 06:12 PM, Gregory Maxwell wrote:
 If anyone feels adventurous:

 http://www.joachims.org/publications/joachims_02c.pdf

 http://www.cs.cornell.edu/People/tj/svm_light/svm_rank.html


Ooh, that looks fun. If I wanted to investigate, I'd start here, yes?

http://svn.wikimedia.org/svnroot/mediawiki/branches/lucene-search-2.1/

Is the click data available, too?

Thanks,

William



Re: [Wikitech-l] [Foundation-l] Boing Boing applauds stats.grok.se!

2010-01-08 Thread William Pietri
On 01/08/2010 09:02 AM, David Gerard wrote:
 http://www.boingboing.net/2010/01/07/wikibumps.html


And the poster, who is a Boing Boing guest editor, is one of our own, an 
English Wikipedia contributor since 2004:

http://en.wikipedia.org/wiki/User:Jokestress

William




Re: [Wikitech-l] downloading wikipedia database dumps

2010-01-07 Thread William Pietri
On 01/07/2010 01:40 AM, Jamie Morken wrote:
 I have a suggestion for wikipedia!!  I think that the database dumps
 including the image files should be made available by a wikipedia
 bittorrent tracker so that people would be able to download the
 wikipedia backups including the images (which currently they can't do)
 and also so that wikipedia's bandwidth costs would be reduced. [...]


Is the bandwidth used really a big problem? Bandwidth is pretty cheap 
these days, and given Wikipedia's total draw, I suspect the occasional 
dump download isn't much of a problem.

Bittorrent's real strength is when a lot of people want to download the 
same thing at once. E.g., when a new Ubuntu release comes out. Since 
Bittorrent requires all downloaders to be uploaders, it turns the flood 
of users into a benefit. But unless somebody has stats otherwise, I'd 
guess that isn't the problem here.

William



Re: [Wikitech-l] Flagged revs on en:wp?

2010-01-05 Thread William Pietri
On 12/13/2009 03:28 PM, William Pietri wrote:
 That seems like a fine thing to do. I've already promised to post an
 update here once we have a clearer picture; I'll write up a more
 general-audience version for the blog, too.


As promised, here's an update:

http://techblog.wikimedia.org/2010/01/flagged-revisions-your-questions-answered/

Feel free to drop me a line with questions or comments.

William



Re: [Wikitech-l] Failed to Download any pages-articles.xml.bz2 file

2009-12-16 Thread William Pietri
On 12/15/2009 10:32 AM, Rob Giberson wrote:
 I failed several times and ended up downloading exactly the same number of
 bytes:  1,465,454KB (1.46GB), even for different versions of that file.

 Is it possible that the files themselves are corrupted? Anyone successfully
 downloaded one of these big files recently?


Just to confirm that this works fine, I started up an EC2 instance and 
successfully downloaded this file:

http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

I ended up with 5.4 GB of data, and an MD5 checksum of 
802c79045801bfb5f77fab3170af8efc. Just to be sure the file was 
uncorrupted, I ran it through bzcat and got a checksum of 
5dd71caf9ed0b3387b351783325b788a.
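
The same two checks can be scripted. This is a sketch of one way to do it with Python's standard library, not the exact commands used above; the function names are made up for illustration.

```python
import bz2
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """MD5 of the file exactly as downloaded (the compressed bytes)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_bz2_contents(path, chunk_size=1 << 20):
    """MD5 of the decompressed stream; a damaged archive raises OSError or EOFError."""
    digest = hashlib.md5()
    with bz2.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Reading in fixed-size chunks keeps memory use flat even for a multi-gigabyte dump. Decompressing the whole file is the stronger test, since a truncated download fails partway through instead of producing a checksum.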

William



Re: [Wikitech-l] Flagged revs on en:wp?

2009-12-13 Thread William Pietri
On 12/13/2009 05:59 AM, David Gerard wrote:
 Would it be possible for someone to write a techblog post about what's
 in the way of flagged revs on en:wp? Quite a few people are wondering
 what's up and why someone hasn't pulled the switch.


That seems like a fine thing to do. I've already promised to post an 
update here once we have a clearer picture; I'll write up a more 
general-audience version for the blog, too.

William



Re: [Wikitech-l] Flagged revs on en:wp?

2009-12-10 Thread William Pietri
On 12/10/2009 01:14 PM, David Gerard wrote:
 Do we have any time frame for flagged revisions on en:wp?


Especially on a project where previous expectations haven't been met, 
I'm very reluctant to give dates without reasonably good data to back 
them up. However, we're making good progress toward having a date.

Aaron is finishing exams this week, and will be joining us in San 
Francisco next week. While he's here, we'll be meeting with the 
usability folks to come up with final-ish designs that improve 
FlaggedRevs usability and fit it into the upcoming usability work. That 
in hand, we'll make a list of all the remaining work and put some 
relative estimates on it. From there we can build a release plan. As we 
make progress along the plan, I can measure completed work and project 
an actual release date.
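
The projection step described above can be sketched as simple velocity arithmetic. The function name and the point values in the usage note are hypothetical; nothing here reflects actual FlaggedRevs estimates.

```python
import math
from datetime import date, timedelta


def project_release(remaining_points, completed_per_week, today):
    """Project a release date from relative estimates and measured weekly progress.

    remaining_points: sum of the relative estimates for unfinished work
    completed_per_week: points actually finished in each past week
    """
    if not completed_per_week or sum(completed_per_week) == 0:
        return None  # no measured progress yet, so no honest projection
    velocity = sum(completed_per_week) / len(completed_per_week)
    weeks_left = math.ceil(remaining_points / velocity)
    return today + timedelta(weeks=weeks_left)
```

For example, with 30 points of work remaining and past weeks of 4, 6, and 5 points completed, `project_release(30, [4, 6, 5], date(2009, 12, 14))` projects six more weeks of work. With no completed weeks the function returns `None`, which matches the reluctance to give dates without data.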

Without data, the best answer I can give is "soon-ish, I hope." I expect 
that the usability improvements won't be massive amounts of work. There 
are performance concerns we should look at. There may be a show-stopper 
bug that I want to either eliminate or prove nonexistent. Because the 
next version should address a lot of the labs feedback, I'd like to do 
at least one more labs release. We'd like to add some more statistics, 
so we can have clear indicators of the effect that FlaggedRevs has on 
en:wp. The major wildcard is patrolled revisions, which are part of the 
current proposal; I'll learn more next week, but my hazy understanding 
is that they will require a substantial amount of new work in the core.

I'll post an update here at the end of next week or early the following 
week.

William




Re: [Wikitech-l] Flagged revs on en:wp?

2009-12-10 Thread William Pietri
On 12/10/2009 07:21 PM, Steve Bennett wrote:
 On Fri, Dec 11, 2009 at 2:18 PM, William Pietri will...@scissor.com wrote:

 Because the
 next version should address a lot of the labs feedback, I'd like to do
 at least one more labs release.
  
 Yes, please! I had some pretty big concerns about the last version...



Great! We did too. To make sure that they're covered, have you posted 
them anywhere? E.g., here:

http://flaggedrevs.labs.wikimedia.org/wiki/Talk:Main_Page


If it looks like the upcoming work will be reasonably quick, then 
hopefully one push to labs will be enough. If it will take a while, then 
I'd like to release to labs at least monthly, so that progress is more 
transparent. And either way, although I'd like to ship to en:wp as soon 
as possible, I'd rather discover and resolve as many problems as 
possible in the labs context. No sense having millions of people tell us 
something dozens can.

Thanks,

William





Re: [Wikitech-l] Production machine config

2009-12-09 Thread William Pietri

Someone recently suggested this to me:

 In terms of configuration this lists most of what's public:
 http://noc.wikimedia.org/conf/


I just went back to look at these again, and I see that link is now a 
404. I take it the configs have moved. Does anybody know the best place 
to get that now?

Also, when new boxes are added, how do we build them?


Thanks,

William





[Wikitech-l] Production machine config

2009-11-24 Thread William Pietri
Hi, all.

The FlaggedRevs lab site mysteriously got better, which makes me 
nervous. To counteract that, I'd like to put together some basic smoke 
and load tests. My plan is to do that on Amazon's cloud infrastructure. 
A few questions:

   1. Does anybody have an AMI (basically, a virtual machine config)
      that's already like WMF production servers?
   2. If not, is there something like a Puppet config that I can use to
      help build the AMI?
   3. What's the best way for me to find out details on production
      server configs? I see the info in Special:Version, but I'm
      wondering a) where to find those packages, and b) if there are
      other requirements (e.g., particular kernel versions, supporting
      software, etc.)?


Once I get the AMI working, I'll post details here so others can easily 
spin up pseudo-production boxes.
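
As a starting point, a basic smoke test might look like the sketch below. It covers only the smoke-test half (does each page load with a 200?), not load testing, and the function names are hypothetical.

```python
from urllib.request import urlopen


def fetch_status(url, timeout=10):
    """Return the HTTP status code for a GET of `url`."""
    with urlopen(url, timeout=timeout) as response:
        return response.status


def smoke_test(urls, fetch=fetch_status):
    """Return a list of (url, problem) for every page that fails to load cleanly."""
    failures = []
    for url in urls:
        try:
            status = fetch(url)
        except OSError as error:  # URLError and HTTPError are both OSError subclasses
            failures.append((url, str(error)))
            continue
        if status != 200:
            failures.append((url, f"unexpected status {status}"))
    return failures
```

The `fetch` parameter exists so the check can be exercised against a stub without touching the network; in real use it would be called with a short list of representative labs-site URLs and an empty result would mean the site is up.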

William


Re: [Wikitech-l] Production machine config

2009-11-24 Thread William Pietri
Tim Starling wrote:
 I can only assume by this post that you're completely out of the loop
 on FlaggedRevs and have received no support from Wikimedia tech staff.
   

Naturally, I would appreciate any assistance at getting more in the 
loop. I've certainly tried, but you (or anybody) should feel free to 
send me suggestions off list. However, Erik and Aaron have been very 
helpful, so any fault here would surely be mine.

 There's no need for performance testing since we've been running it on
 our second-biggest wiki for 18 months and the hardware requirements
 there can be extrapolated to the English Wikipedia.
   

That's a reasonable notion. However, given that the labs site was 
basically broken for a week for no reason that's apparent yet, an 
alternative notion is that there is something about the labs code or the 
labs config that blows up, possibly due to use.

If you have an easier way of proving that the new code and config works 
just as well as the old code and German config do, I'm all ears. 
Automated smoke and load tests were just the best solution that I came up 
with on my own.


 Domas tells me that that extrapolation leads to some concern that we
 might approach our hardware limits on the s1 cluster. The sums need to
 be done and the hardware ordered, if necessary.

 I think we need to discuss this off list, preferably in real time with
 Aaron and Domas.
   

I would love that. I'll contact you off list about arranging that.


William



Re: [Wikitech-l] Labs site down?

2009-11-19 Thread William Pietri
Steve Bennett wrote:
 Hmm, maybe it's time big complicated intrusive features like this were
 designed by BA*, rather than by committee.
   


I think it's possible to get the best of both worlds. The community has 
a lot of expertise in what's going on, what they need, and what the 
risks are. But professional software designers have a much bigger 
library of potential solutions, plus a variety of skills and hard-won 
experience. Not just in designing the software, but planning the 
rollout, managing the risks, and working with developers.

Most software is either internal business software, where users are 
obliged to put up with almost anything, or consumer-oriented, where 
users are mostly uninvolved and fickle. The product management methods 
in either of those spaces probably wouldn't work well here: Wikipedians, 
as volunteers, can't be ordered around. They also aren't just consumers; 
they're a community, one that wants to engage deeply. But I think we can 
take tools from both and figure out something that works here.

Are there historical examples of WMF development projects that have gone 
particularly well? I'd love to look at them in detail.

William



[Wikitech-l] Labs site down?

2009-11-18 Thread William Pietri

Hi! It appears the labs site for FlaggedRevs is down:

http://flaggedrevs.labs.wikimedia.org/w/index.php?title=Hurricane_Vince_(2005)

It was also down last Friday:

https://bugzilla.wikimedia.org/show_bug.cgi?id=21500

And I figure it has been sad for the 5 days in between.


Does anybody know if this is intentional? And if not, what's the best 
way for me to help get it back up again?

Thanks,

William



Re: [Wikitech-l] Labs site down?

2009-11-18 Thread William Pietri
Jake Wartenberg wrote:
 As a side note, what exactly is holding up implementation of this on enwiki?

   

That's an excellent question, and one I'm hopefully getting close to 
answering. As background, Erik recently brought me on for a few hours a 
week to move things along.

Part of the problem is definitely the size and complexity of the 
proposed feature set. There are also a number of outstanding questions 
about the best user interface, including issues with naming, clarity of 
the conceptual model, ease of use, and impact on various audiences, 
especially including readers. Plus, the extension is now in production 
use on 24 WMF projects, with nearly as many different configurations.

Those are all solvable problems, but resource constraints have kept us 
from progressing as fast as we'd like. Happily, the usability folks -- 
especially Howie Fung -- have had a little time to lend a hand with the 
UI issues, and I think we'll shortly have a proposal for something we 
can get into production soon. That would include enough measurement that 
we can see what the impact is, and provide for quick follow-ups if there 
are issues with the first release.

Once I have more solid info, I'll definitely mention it here. But in the 
meantime, the labs site being down is definitely holding us up some, so 
I'd love to find out how I can fix that.

Thanks,

William





[Wikitech-l] Viewing site configs

2009-10-19 Thread William Pietri
Hi! I'm working on the FlaggedRevs stuff, and would like to get the 
configs for every site using it.

Right now I'm working with Howie -- a usability expert -- to make a list 
of all the use cases and affected interfaces. That's no problem for the 
English site, which we presume will be like the current labs site. But 
when we were in the offices last week, Brion showed me a substantial 
list of sites using the FlaggedRevs extension. I know the German 
configuration is importantly different, but have no idea about the rest.

Does anybody know where that list of FlaggedRevs sites is? And what's 
the best way for me to get copies of the site configs so I can see how 
they're using FlaggedRevs?

My shell-fu is good, so I'm glad to go digging; I just don't know where 
to start, or what access I might need.

Thanks,

William



Re: [Wikitech-l] Viewing site configs

2009-10-19 Thread William Pietri
Simon Walker wrote:
 2009/10/19 William Pietri will...@scissor.com:
   
 Does anybody know where that list of FlaggedRevs sites is? And what's
 the best way for me to get copies of the site configs so I can see how
 they're using FlaggedRevs?
 

 is this not the relevant config?

 http://noc.wikimedia.org/conf/highlight.php?file=flaggedrevs.php

 I don't know if that's up-to-date or not, you'll have to get one of
 the real devs to confirm that, though I think it's pulled from the
 real live config files.
   

Just what I'm looking for, thanks. I figured it would be spread out 
across a bunch of per-site files, so this is perfect for my needs.

Going one level up, I see the message "Here are some Wikimedia 
configuration files which are not in Subversion. The files are 
dynamically generated and are perfectly up-to-date." So I'll assume 
these are close enough to hot for my purposes unless somebody tells me 
otherwise.

Is there a good place for me to read up on basic site operations stuff 
like this so I can avoid more noob questions?

Thanks,

William



Re: [Wikitech-l] FlaggedRevs en.wp deployment update

2009-10-16 Thread William Pietri
Aryeh Gregor wrote:
 On Fri, Oct 16, 2009 at 2:35 PM, William Pietri will...@scissor.com wrote:
   
 And I see at least a brief mention of testing here:

 http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Testing
 

 I've updated that to be more accurate.
   

Thanks. That's helpful to know.

If I end up with enough time to do something about the lack of tests, I'll 
start by bringing it up here to see if there's a way to do it that 
avoids the fate of previous efforts.


Thanks,

William