Re: [Wikitech-l] speed of Vector in en.wikipedia
On 05/20/2010 09:15 AM, Amir E. Aharoni wrote:
> On various Vector feedback pages as well as on OTRS many people report that since the switch to Vector it takes significantly more time for Wikipedia pages to load.
> [...]
> Are there any more precise measurements?

I don't know how useful it is, but recently I helped a client build some JS-based, in-browser page load performance monitoring. It tracks various rendering events of a chosen percentage of pageviews. The only server-side code processes web server logs in batch, so it is pretty low impact, and works with cached pages. It's been served in a few hundred million pageviews with no obvious problems yet.

I think most of the server-side code is pretty particular to their needs, but if somebody wants it for Wikipedia, I'm sure they'd be willing to give up the client-side stuff and my rough-and-ready initial pass at the log parsing, which is in Ruby. If that's useful, let me know off-list and I'll ask 'em for permission.

William

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
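To make the approach above concrete, here is a minimal sketch of the two halves: picking a fixed percentage of pageviews to instrument, and summarizing timing beacons from web server logs in batch. It is not the client's code (the original log parsing was in Ruby and is not shown in this thread). The `/perf.gif` beacon URL, its `event` and `ms` parameters, and the 5% sample rate are all assumptions made up for illustration.

```python
import re
import zlib
from collections import defaultdict
from statistics import median

SAMPLE_PERCENT = 5  # hypothetical share of pageviews to instrument


def is_sampled(pageview_id: str, percent: int = SAMPLE_PERCENT) -> bool:
    """Deterministically select roughly `percent`% of pageview ids."""
    return zlib.crc32(pageview_id.encode("utf-8")) % 100 < percent


# Hypothetical beacon request as it might appear inside an access log line.
BEACON = re.compile(r"GET /perf\.gif\?event=(?P<event>\w+)&ms=(?P<ms>\d+)")


def summarize(log_lines):
    """Batch step: return {event name: median milliseconds} from log lines."""
    timings = defaultdict(list)
    for line in log_lines:
        match = BEACON.search(line)
        if match:
            timings[match.group("event")].append(int(match.group("ms")))
    return {event: median(values) for event, values in timings.items()}


if __name__ == "__main__":
    sample_log = [
        '10.0.0.1 - - "GET /perf.gif?event=domready&ms=800 HTTP/1.1" 200 43',
        '10.0.0.2 - - "GET /perf.gif?event=domready&ms=1200 HTTP/1.1" 200 43',
        '10.0.0.3 - - "GET /wiki/Main_Page HTTP/1.1" 200 51234',
    ]
    print(summarize(sample_log))
```

Because the beacon is just a logged request, nothing runs on the server at page-view time, which is what keeps the approach low impact and compatible with cached pages.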
Re: [Wikitech-l] speed of Vector in en.wikipedia
On 05/21/2010 03:39 PM, Roan Kattouw wrote:
> I'm not sure we need this. I don't see a reason why one of us usability developers can't just load pages, find out whether they're slower, and where the slowness is. If the slowness is present for everyone (many different people reporting slowness seems to indicate that, and I have no reason to believe otherwise) you don't need to jump through hoops to gather data from random users when you can easily reproduce it yourself.

I'm not sure you need it either; I just thought I'd offer what I had on hand.

But the reason we built it was to quantify the problem so we could narrow down and prioritize issues. There are a lot of variables in browser performance, and we found it frustrating to try to simulate various user conditions (OS versions, browser versions, physical location, bandwidth) and get solid, statistically valid measurements that we thought correlated well with what people were actually experiencing. After enough futzing with that, it was a relief to get some actual numbers rolling in automatically.

Last I heard they were going to set it up to graph aggregate client-side performance over time, so that they could easily see if their normal feature changes had unexpected browser performance impact. They want to solve the problems before users complain. So few of them do, especially about something subtle like performance.

William
Re: [Wikitech-l] hiphop! :)
On 02/28/2010 01:33 PM, Domas Mituzas wrote:
> not 10x. I did concurrent benchmarks for API requests (e.g. opensearch) on modern boxes, and saw:
>
> HipHop: Requests per second: 1975.39 [#/sec] (mean)
> Zend: Requests per second: 371.29 [#/sec] (mean)
>
> these numbers seriously kick ass. I still can't believe I observe 2000 mediawiki requests/s from a single box ;-)

Bravo! That's fantastic. Thanks for both the work and the testing.

William
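For reference, the arithmetic behind "not 10x" is a one-liner. The two throughput figures are the ones quoted above; the variable names are just for illustration.

```python
hiphop_rps = 1975.39  # requests per second reported for HipHop
zend_rps = 371.29     # requests per second reported for Zend

speedup = hiphop_rps / zend_rps
print(f"HipHop handled about {speedup:.2f}x the requests per second of Zend")
```

That works out to a bit over 5x on this particular benchmark.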
Re: [Wikitech-l] User-Agent:
On 02/15/2010 05:54 PM, Domas Mituzas wrote:
> Hi! from now on specific per-bot/per-software/per-client User-Agent header is mandatory for contacting Wikimedia sites.

Two questions:

Was there some urgent production impact that required doing this with no notice?

Was any impact analysis done on this?

Given Wikipedia's mission, we can't be as casual about rejecting traffic as a commercial site would be. If a commercial site accidentally gets rid of some third-world traffic running behind a shoddy ISP, it's no loss; nobody wants to advertise to them anyhow. But for us, those are the people who gain the most from being able to reach us.

William
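For client authors affected by the requirement quoted above, this is roughly what sending a specific User-Agent looks like with Python's standard library. It is only a sketch: the bot name, URL, and contact address are placeholders invented for this example, and the thread does not spell out what format the header should take.

```python
import urllib.request

# Hypothetical client identity; substitute your own tool name and contact.
USER_AGENT = "ExampleWikiTool/0.1 (https://example.org/tool; tool-owner@example.org)"


def build_request(url: str) -> urllib.request.Request:
    """Return a request that carries an explicit, client-specific User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": USER_AGENT})


if __name__ == "__main__":
    request = build_request("https://en.wikipedia.org/wiki/Main_Page")
    # urllib stores header names capitalized as "User-agent".
    print(request.get_header("User-agent"))
```

Pass the resulting object to `urllib.request.urlopen` to actually make the request.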
Re: [Wikitech-l] User-Agent:
On 02/15/2010 07:25 PM, Domas Mituzas wrote:
>> Was there some urgent production impact that required doing this with no notice?
>
> Actually we had User-Agent header requirement for ages, it just failed to do what it had to do for a while. Consider this to be a bugfix.

Ok. I'm going to take that as no. In the future, I think it would be better to let people know in advance about non-urgent changes that may break things for them.

>> Was any impact analysis done on this?
>
> Yup!

Would you care to share the results with us? In the future, I'd suggest giving basic info like that as part of an announcement.

> Actually, at the moment this mostly affects crap sites that hot-load data from us to display spamvertisements on hacked sites on internet.

That's another good thing to share as part of a change announcement: motivation for the change.

> I don't know where your 'shoddy ISP' speculation fits in.

Last I looked, there were a lot of poorly maintained proxies out there, some of which mangle headers. It seemed reasonable to me that some of those are on low-rent ISPs in poor countries.

If you have already done the work to prove that no legitimate users anywhere in the world are impacted by this change, then perhaps you could save us further discussion and just explain that?

Thanks,

William
Re: [Wikitech-l] User-Agent:
On 02/15/2010 06:50 PM, Steve Summit wrote:
> You're trying to detect / guard against malicious behavior using *User-Agent*?? Good grief. Have fun with the whack-a-mole game, then.

Yes, a simple restriction like this tends to create smarter villains rather than less villainy. Filtering on an obvious, easy-to-change characteristic also destroys a useful source of information on who the bad people are, making future abuse prevention efforts harder.

William
Re: [Wikitech-l] User-Agent:
On 02/15/2010 07:55 PM, Domas Mituzas wrote:
>> Yes, a simple restriction like this tends to create smarter villains rather than less villainy. Filtering on an obvious, easy-to-change characteristic also destroys a useful source of information on who the bad people are, making future abuse prevention efforts harder.
>
> Thanks for insights. But no. We don't use UA as first step of analysis, it was helpful tertiary tool, that put these people into ignorant or malicious category. If they'd have spoofed their UAs, we'd block the IPs and inform upstreams, as fully malicious behavior. If they had nice UA, we might have attempted to contact them or have isolated their workload until the issue is fixed ;-)

I am saying that going forward you have eliminated WMF's ability to use a tertiary tool that you agree was helpful. Having spent a lot of time dealing with abuse early in the Web's history, I wouldn't have done it that way. But it's not really my problem and you don't appear to be looking for input, so godspeed.

William
Re: [Wikitech-l] search ranking
On 01/10/2010 06:12 PM, Gregory Maxwell wrote:
> If anyone feels adventurous:
> http://www.joachims.org/publications/joachims_02c.pdf
> http://www.cs.cornell.edu/People/tj/svm_light/svm_rank.html

Ooh, that looks fun. If I wanted to investigate, I'd start here, yes?

http://svn.wikimedia.org/svnroot/mediawiki/branches/lucene-search-2.1/

Is the click data available, too?

Thanks,

William
Re: [Wikitech-l] [Foundation-l] Boing Boing applauds stats.grok.se!
On 01/08/2010 09:02 AM, David Gerard wrote:
> http://www.boingboing.net/2010/01/07/wikibumps.html

And the poster, who is a Boing Boing guest editor, is one of our own, an English Wikipedia contributor since 2004:

http://en.wikipedia.org/wiki/User:Jokestress

William
Re: [Wikitech-l] downloading wikipedia database dumps
On 01/07/2010 01:40 AM, Jamie Morken wrote:
> I have a suggestion for wikipedia!! I think that the database dumps including the image files should be made available by a wikipedia bittorrent tracker so that people would be able to download the wikipedia backups including the images (which currently they can't do) and also so that wikipedia's bandwidth costs would be reduced. [...]

Is the bandwidth used really a big problem? Bandwidth is pretty cheap these days, and given Wikipedia's total draw, I suspect the occasional dump download isn't much of a problem.

Bittorrent's real strength is when a lot of people want to download the same thing at once. E.g., when a new Ubuntu release comes out. Since Bittorrent requires all downloaders to be uploaders, it turns the flood of users into a benefit. But unless somebody has stats otherwise, I'd guess that isn't the problem here.

William
Re: [Wikitech-l] Flagged revs on en:wp?
On 12/13/2009 03:28 PM, William Pietri wrote:
> That seems like a fine thing to do. I've already promised to post an update here once we have a clearer picture; I'll write up a more general-audience version for the blog, too.

As promised, here's an update:

http://techblog.wikimedia.org/2010/01/flagged-revisions-your-questions-answered/

Feel free to drop me a line with questions or comments.

William
Re: [Wikitech-l] Failed to Download any pages-articles.xml.bz2 file
On 12/15/2009 10:32 AM, Rob Giberson wrote:
> I failed several times and ended up downloading exactly the same number of bytes: 1,465,454KB (1.46GB), even for different versions of that file. Is it possible that the files themselves are corrupted? Anyone successfully downloaded one of these big files recently?

Just to confirm that this works fine, I started up an EC2 instance and successfully downloaded this file:

http://download.wikimedia.org/enwiki/latest/enwiki-latest-pages-articles.xml.bz2

I ended up with 5.4 GB of data, and an MD5 checksum of 802c79045801bfb5f77fab3170af8efc. Just to be sure the file was uncorrupted, I ran it through bzcat and got a checksum of 5dd71caf9ed0b3387b351783325b788a.

William
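Anyone wanting to repeat the two checks above without shell tools could do something like the following. This is a sketch, not the commands actually run in the message: it assumes the second checksum was an MD5 of the decompressed stream (as `bzcat file | md5sum` would give), and the function names and chunk size are my own.

```python
import bz2
import hashlib

CHUNK_SIZE = 1024 * 1024  # read 1 MiB at a time so a multi-GB dump never sits in memory


def md5_of_file(path: str) -> str:
    """MD5 of the file exactly as downloaded (comparable to `md5sum file`)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_of_bz2_contents(path: str) -> str:
    """MD5 of the decompressed data; raises an error if the archive is damaged."""
    digest = hashlib.md5()
    with bz2.open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    dump = "enwiki-latest-pages-articles.xml.bz2"
    print(md5_of_file(dump))
    print(md5_of_bz2_contents(dump))
```

A download that stops at the same byte count every time will also fail the second check, since a truncated bzip2 stream cannot be decompressed to the end.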
Re: [Wikitech-l] Flagged revs on en:wp?
On 12/13/2009 05:59 AM, David Gerard wrote:
> Would it be possible for someone to write a techblog post about what's in the way of flagged revs on en:wp? Quite a few people are wondering what's up and why someone hasn't pulled the switch.

That seems like a fine thing to do. I've already promised to post an update here once we have a clearer picture; I'll write up a more general-audience version for the blog, too.

William
Re: [Wikitech-l] Flagged revs on en:wp?
On 12/10/2009 01:14 PM, David Gerard wrote:
> Do we have any time frame for flagged revisions on en:wp?

Especially on a project where previous expectations haven't been met, I'm very reluctant to give dates without reasonably good data to back them up. However, we're making good progress toward having a date.

Aaron is finishing exams this week, and will be joining us in San Francisco next week. While he's here, we'll be meeting with the usability folks to come up with final-ish designs that improve FlaggedRevs usability and fit it in to the upcoming usability work. That in hand, we'll make a list of all the remaining work and put some relative estimates on it. From there we can build a release plan. As we make progress along the plan, I can measure completed work and project an actual release date.

Without data, the best answer I can give is soon-ish, I hope. I expect that the usability improvements won't be massive amounts of work. There are performance concerns we should look at. There may be a show-stopper bug that I want to either eliminate or prove nonexistent. Because the next version should address a lot of the labs feedback, I'd like to do at least one more labs release. We'd like to add some more statistics, so we can have clear indicators of the effect that FlaggedRevs has on en:wp. The major wildcard is patrolled revisions which are part of the current proposal; I'll learn more next week, but my hazy understanding is that will require a substantial amount of new work in the core.

I'll post an update here at the end of next week or early the following week.

William
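The "measure completed work and project a release date" step described above comes down to simple arithmetic on relative estimates. A minimal sketch follows; the dates and point counts are invented purely for illustration and are not actual FlaggedRevs figures.

```python
import math
from datetime import date, timedelta


def project_release(start: date, today: date,
                    points_done: float, points_remaining: float) -> date:
    """Extrapolate a finish date from the pace of work completed so far."""
    elapsed_days = (today - start).days
    if elapsed_days <= 0 or points_done <= 0:
        raise ValueError("need some completed work before projecting a date")
    velocity = points_done / elapsed_days  # estimate points finished per day
    days_left = math.ceil(points_remaining / velocity)
    return today + timedelta(days=days_left)


if __name__ == "__main__":
    # Hypothetical numbers: 7 points done in 14 days, 20 points still to go.
    projected = project_release(
        start=date(2009, 12, 14),
        today=date(2009, 12, 28),
        points_done=7,
        points_remaining=20,
    )
    print(projected)  # 2010-02-06
```

The projection is only as good as the data behind it, which is why no date can be given before any work has been measured against the plan.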
Re: [Wikitech-l] Flagged revs on en:wp?
On 12/10/2009 07:21 PM, Steve Bennett wrote:
> On Fri, Dec 11, 2009 at 2:18 PM, William Pietri will...@scissor.com wrote:
>> Because the next version should address a lot of the labs feedback, I'd like to do at least one more labs release.
>
> Yes, please! I had some pretty big concerns about the last version...

Great! We did too. To make sure that they're covered, have you posted them anywhere? E.g., here:

http://flaggedrevs.labs.wikimedia.org/wiki/Talk:Main_Page

If it looks like the upcoming work will be reasonably quick, then hopefully one push to labs will be enough. If it will take a while, then I'd like to release to labs at least monthly, so that progress is more transparent.

And either way, although I'd like to ship to en:wp as soon as possible, I'd rather discover and resolve as many problems as possible in the labs context. No sense having millions of people tell us something dozens can.

Thanks,

William
Re: [Wikitech-l] Production machine config
Someone recently suggested this to me:

> In terms of configuration this lists most of whats public: http://noc.wikimedia.org/conf/

I just went back to look at these again, and I see that link is now a 404. I take it the configs have moved. Does anybody know the best place to get that now?

Also, when new boxes are added, how do we build them?

Thanks,

William
[Wikitech-l] Production machine config
Hi, all. The FlaggedRevs lab site mysteriously got better, which makes me nervous. To counteract that, I'd like to put together some basic smoke and load tests. My plan is to do that on Amazon's cloud infrastructure. A few questions:

1. Does anybody have an AMI (basically, a virtual machine config) that's already like WMF production servers?
2. If not, is there something like a Puppet config that I can use to help build the AMI?
3. What's the best way for me to find out details on production server configs? I see the info in Special:Version, but I'm wondering a) where to find those packages, and b) if there are other requirements (e.g., particular kernel versions, supporting software, etc)?

Once I get the AMI working, I'll post details here so others can easily spin up pseudo-production boxes.

William
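A basic smoke test of the kind mentioned above can be very small: request a handful of pages and report any that don't come back with HTTP 200. The sketch below is one assumed shape for it, not an existing WMF tool; the URL in the usage section, the user agent string, and the function names are placeholders.

```python
import urllib.error
import urllib.request


def http_status(url: str) -> int:
    """Return the HTTP status for `url`, or 0 if no response arrived at all."""
    request = urllib.request.Request(url, headers={"User-Agent": "SmokeTestSketch/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except OSError:
        return 0  # DNS failure, refused connection, timeout, and similar


def smoke_test(urls, fetch_status=http_status):
    """Return a list of (url, status) pairs for every page that is not 200."""
    failures = []
    for url in urls:
        status = fetch_status(url)
        if status != 200:
            failures.append((url, status))
    return failures


if __name__ == "__main__":
    # Placeholder URL; substitute pages from the site under test.
    pages = ["https://example.org/"]
    for url, status in smoke_test(pages):
        print(f"FAIL {status} {url}")
```

Taking the fetch function as a parameter means the check itself can be exercised without a network, and the same loop could later be pointed at a pseudo-production box.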
Re: [Wikitech-l] Production machine config
Tim Starling wrote:
> I can only assume by this post that you're completely out of the loop on FlaggedRevs and have received no support from Wikimedia tech staff.

Naturally, I would appreciate any assistance at getting more in the loop. I've certainly tried, but you (or anybody) should feel free to send me suggestions off list. However, Erik and Aaron have been very helpful, so any fault here would surely be mine.

> There's no need for performance testing since we've been running it on our second-biggest wiki for 18 months and the hardware requirements there can be extrapolated to the English Wikipedia.

That's a reasonable notion. However, given that the labs site was basically broken for a week for no reason that's apparent yet, an alternative notion is that there is something about the labs code or the labs config that blows up, possibly due to use.

If you have an easier way of proving that the new code and config works just as well as the old code and German config do, I'm all ears. Automated smoke and load tests were just the best solution that I came up with on my own.

> Domas tells me that that extrapolation leads to some concern that we might approach our hardware limits on the s1 cluster. The sums need to be done and the hardware ordered, if necessary. I think we need to discuss this off list, preferably in real time with Aaron and Domas.

I would love that. I'll contact you off list about arranging that.

William
Re: [Wikitech-l] Labs site down?
Steve Bennett wrote:
> Hmm, maybe it's time big complicated intrusive features like this were designed by BA*, rather than by committee.

I think it's possible to get the best of both worlds. The community has a lot of expertise in what's going on, what they need, and what the risks are. But professional software designers have a much bigger library of potential solutions, plus a variety of skills and hard-won experience. Not just in designing the software, but planning the rollout, managing the risks, and working with developers.

Most software is either internal business software, where users are obliged to put up with almost anything, or consumer-oriented, where users are mostly uninvolved and fickle. The product management methods in either of those spaces probably wouldn't work well here: Wikipedians, as volunteers, can't be ordered around. They also aren't just consumers; they're a community, one that wants to engage deeply. But I think we can take tools from both and figure out something that works here.

Are there historical examples of WMF development projects that have gone particularly well? I'd love to look at them in detail.

William
[Wikitech-l] Labs site down?
Hi! It appears the labs site for FlaggedRevs is down:

http://flaggedrevs.labs.wikimedia.org/w/index.php?title=Hurricane_Vince_(2005)

It was also down last Friday:

https://bugzilla.wikimedia.org/show_bug.cgi?id=21500

And I figure it has been sad for the 5 days in between. Does anybody know if this is intentional? And if not, what's the best way for me to help get it back up again?

Thanks,

William
Re: [Wikitech-l] Labs site down?
Jake Wartenberg wrote:
> As a side note, what exactly is holding up implementation of this on enwiki?

That's an excellent question, and one I'm hopefully getting close to answering. As background, Erik recently brought me on for a few hours a week to move things along.

Part of the problem is definitely the size and complexity of the proposed feature set. There are also a number of outstanding questions about the best user interface, including issues with naming, clarity of the conceptual model, ease of use, and impact on various audiences, especially including readers. Plus, the extension is now in production use on 24 WMF projects, with nearly as many different configurations.

Those are all solvable problems, but resource constraints have kept us from progressing as fast as we'd like. Happily, the usability folks -- especially Howie Fung -- have had a little time to lend a hand with the UI issues, and I think we'll shortly have a proposal for something we can get into production soon. That would include enough measurement that we can see what the impact is, and provide for quick follow-ups if there are issues with the first release.

Once I have more solid info, I'll definitely mention it here. But in the meantime, the labs site being down is definitely holding us up some, so I'd love to find out how I can fix that.

Thanks,

William
[Wikitech-l] Viewing site configs
Hi! I'm working on the FlaggedRevs stuff, and would like to get the configs for every site using it.

Right now I'm working with Howie -- a usability expert -- to make a list of all the use cases and affected interfaces. That's no problem for the English site, which we presume will be like the current labs site. But when we were in the offices last week, Brion showed me a substantial list of sites using the FlaggedRevs extension. I know the German configuration is importantly different, but have no idea about the rest.

Does anybody know where that list of FlaggedRevs sites is? And what's the best way for me to get copies of the site configs so I can see how they're using FlaggedRevs? My shell-fu is good, so I'm glad to go digging; I just don't know where to start, or what access I might need.

Thanks,

William
Re: [Wikitech-l] Viewing site configs
Simon Walker wrote:
> 2009/10/19 William Pietri will...@scissor.com:
>> Does anybody know where that list of FlaggedRevs sites is? And what's the best way for me to get copies of the site configs so I can see how they're using FlaggedRevs?
>
> is this not the relevant config?
> http://noc.wikimedia.org/conf/highlight.php?file=flaggedrevs.php
>
> I don't know if that's up-to-date or not, you'll have to get one of the real devs to confirm that, though I think it's pulled from the real live config files.

Just what I'm looking for, thanks. I figured it would be spread out across a bunch of per-site files, so this is perfect for my needs.

Going one level up, I see the message "Here are some Wikimedia configuration files which are not in Subversion. The files are dynamically generated and are perfectly up-to-date." So I'll assume these are close enough to hot for my purposes unless somebody tells me otherwise.

Is there a good place for me to read up on basic site operations stuff like this so I can avoid more noob questions?

Thanks,

William
Re: [Wikitech-l] FlaggedRevs en.wp deployment update
Aryeh Gregor wrote:
> On Fri, Oct 16, 2009 at 2:35 PM, William Pietri will...@scissor.com wrote:
>> And I see at least a brief mention of testing here: http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Testing
>
> I've updated that to be more accurate.

Thanks. That's helpful to know. If I end up with enough time to do something about the lack of tests, I'll start by bringing it up here to see if there's a way to do it that avoids the fate of previous efforts.

Thanks,

William