Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Feb 6, 2013, at 9:32 PM, David Schoonover wrote:

> Just want to summarize and make sure I've got the right conclusions, as
> this thread has wandered a bit.
>
> *1. X-MF-Mode: Alpha/Beta Site Usage*
>
> We'll roll this into the X-CS header, which will now be KV-pairs (using
> normal URL encoding), and set by Varnish. This will avoid an explosion of
> cryptic headers for analytic purposes.
>
> Questions:
> - It seems there's some confusion around "bypassing Varnish". If I
> understand correctly, it's not that Varnish is ever bypassed, just that the
> upstream response is not cached if cookies are present. Is that right?

Yes.

> - Since we're repurposing X-CS, should we perhaps rename it to something
> more apt to address concerns about cryptic non-standard headers flying
> about?

I'd like to propose to define *one* request header to be used for all
analytics purposes. It can be key/value pairs, and be set client side where
applicable. Varnish can append to it where needed, later keys overriding
earlier ones. Then we can log that one header across all HTTP/caching
clusters without having to change the log stream all the time, and without
wasting much space, and caching edge configuration changes are kept to a
minimum as well. And we might as well be transparent in its naming.
Header name "Log-Parameters:"?

> *2. X-MF-Req: Primary vs Secondary API Requests*
>
> This header will be replaced with a query parameter set by the client-side
> JS code making the request. Analytics will parse it out at processing time
> and Do The Right Thing.

I think the question of using a URL param vs a request header should mainly
take into account whether the response varies on the value of the parameter.
If the responses are otherwise identical, and the value is only used for
analytics purposes, I would prefer to put that into the above header
instead, as it will impair cacheability / cache size otherwise (even if
those requests are currently not cacheable for other reasons). If the
responses are actually different based on this parameter, I would prefer to
have it in the URL where possible.

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
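To make the appending mechanics a bit more concrete, here is a minimal VCL sketch. The header name "X-Log-Params" and the example keys are hypothetical, purely to illustrate the idea; this is not a worked-out configuration:

    sub vcl_recv {
        # An earlier layer (e.g. client-side JS) may already have set some
        # key/value pairs such as "mf_mode=beta"; Varnish appends its own,
        # and the log processor lets later keys override earlier ones.
        if (req.http.X-Log-Params) {
            set req.http.X-Log-Params = req.http.X-Log-Params + ";zero=1";
        } else {
            set req.http.X-Log-Params = "zero=1";
        }
    }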
Re: [Wikitech-l] [Engineering] RFC: Introducing two new HTTP headers to track mobile pageviews
On Feb 9, 2013, at 11:21 PM, Asher Feldman wrote:

> For this particular case, the API requests are for either getting specific
> sections of an article as opposed to either the whole thing, or the first
> section as part of an initial pageview. I might not have grokked the
> original RFC email well, but I don't understand why this was being
> discussed as a logging challenge or necessitating a request header. A
> mobile api request to just get section 3 of the article on otters should
> already utilize a query param denoting that section 3 is being fetched, and
> is already clearly not a "primary" request.

Yes, that part remains a bit unclear to me as well - some more details
would be welcome.

> Whether or not it makes sense for mobile to move in the direction of
> splitting up article views into many api requests is something I'd love to
> see backed up by data. I'm skeptical for multiple reasons.

What is the main motivation used here? Reducing article sizes/transfers at
the expense of more latency?

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [Labs-l] Maria DB
On Feb 14, 2013, at 5:02 PM, Petr Bena wrote:

> Keeping debug symbols in binaries will result in poor performance, or it
> should

That's bollocks. It results in a larger binary _on disk_. The symbol table
isn't even loaded into memory and doesn't affect performance.

Debug information is *highly useful* in a production setup, and we try to
run all our core applications with it so we have a chance to debug issues
when they occur. I think the only reason distributions omit debug
information is to save disk space.

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Mobile caching improvements are coming
Hi Max,

On Mar 29, 2013, at 10:45 AM, Max Semenik wrote:

> Hi, we at the mobile team are currently working on improving our
> current hit rate, publishing the half-implemented plan here for review:
>
> == Proposed strategy ==
> * We don't vary pages on X-Device anymore.
> * Because we still need to give really ancient WAP phones WML output, we
> create a new header, X-WAP, with just two values, yes or no[1]
> * And we vary our output on X-WAP instead of X-Device[2]
> * Because we still need to serve device-specific CSS but can't use device
> name in page HTML, we create a single ResourceLoader module,
> mobile.device.detect, which outputs styles depending on X-Device.[2] This
> does not affect bits cache fragmentation because it simply changes the way
> the same data is varied, but does not add new fragmentation factors. Bits
> hit rate currently is very high, by the way.

Yes. It does add Vary processing on the bits caches for these requests,
though. But we can change that by including the X-Device header into the
hash for cache lookups, if we want to.

> * And because we need X-Device, we will need to direct mobile load.php
> requests to the mobile site itself instead of bits. Not a problem because
> mobile domains are served by Varnish just like bits.
> * Since now we will be serving ResourceLoader to all devices, we will
> blacklist all the incompatible devices in the startup module to prevent
> them from choking on the loads of JS they can't handle (and even if they
> degrade gracefully, still no need to force them to download tens of
> kilobytes needlessly)[3]

Good work! This should help a great deal.

> Your comments are highly appreciated! :)

I've been pondering a bit about the two options for serving mobile
ResourceLoader requests with Varnish: on the bits caches or on the mobile
caches. I don't fully like either option, to be honest. On one hand I'd like
to keep mobile device detection off our currently very efficient bits
caches; on the other hand I don't like the idea of mixing the RL requests
into the high churn of mobile request LRU cache eviction on the frontend
caches. Unfortunately Varnish currently doesn't allow us to
separate/specify cache backends for objects well.

So let's go with Asher's suggestion indeed, and add the device detection to
the bits servers. Let's keep it such that it'll always be easy to
distinguish these requests, so we can easily decide to move them to another
Varnish cluster at any point.

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
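For reference, "including the X-Device header into the hash for cache lookups" would look roughly like this in VCL (a simplified sketch, not the actual bits configuration):

    sub vcl_hash {
        hash_data(req.url);
        hash_data(req.http.host);
        # Fold the detected device class into the cache key for these
        # requests, instead of relying on Vary: X-Device processing.
        if (req.http.X-Device) {
            hash_data(req.http.X-Device);
        }
        return (hash);
    }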
Re: [Wikitech-l] ZERO architecture
Hi Yuri,

Thanks for writing this up. I'll put some comments and questions inline.

On May 30, 2013, at 7:16 PM, Yuri Astrakhan wrote:

> *== Technical Requirements ==*
> * increase Varnish cache hits / minimize cache fragmentation
> * Set up and configure new partners without code changes
> * Use partner-supplied IP ranges as a preferred alternative to the geo-ip
> database for fundraising & analytic teams

Note that a Varnish VMOD to support the latter is being written at the
moment by Brandon Black.

> *== Current state ==*
> Zero domain requests set X-Subdomain="ZERO", and treat the request as
> mobile. The backend uses X-Subdomain and X-CS headers to customize result.
> The cache is heavily fragmented due to its variance on both of these
> headers in addition to the variance set by MobileFrontend extension and
> MediaWiki core.

...and also, variance due to the different hostname (and thus URL).

> *== Proposals ==*
> In order to reduce Zero-caused fragmentation, we propose to shrink from one
> bucket per carrier (X-CS) to three general buckets:
> * smart phones bucket -- banner and site modifications are done on the
> client in javascript
> * feature phones -- HTML only, the banner is inserted by the ESI
> ** for carriers with free images
> ** for carriers without free images
>
> *=== Varnish logic ===*
> * Parse User-Agent to distinguish between desktop / mobile / feature phone:
> X-Device-Type=desktop|mobile|legacy

Using the OpenDDR library?

> * Use IP -> X-CS lookup (under development by OPs) to convert client's IP
> into X-CS header
> * If X-CS && X-Device-Type == 'legacy': Use IP -> X-Images lookup (same
> lookup plugin, different database file) to determine if carrier allows
> images

Hopefully we can set the X-Images header straight from the ip database.

> Since each carrier has its own list of free languages, language links on
> feature phones will point to origin, which will either silently redirect
> or ask for confirmation.

Perhaps we can store the list of supported languages for the carrier in the
ip database as well?

> *=== ZERO vs M ===*
> Even though I think zero. and m. subdomains should both go the way of the
> dodo to make each article have just one canonical location (no more
> linking & Google issues), this won't happen until we are fully migrated
> to Varnish and make some mobile code changes (and possibly other changes
> that I am not aware of).

What do you mean by "until we are fully migrated to Varnish"?
MobileFrontend has always exclusively been on Varnish.

> At the same time, we should try to get rid of ZERO wherever possible.
> There are two technical differences between m & zero: zero shows a link to
> image instead of the actual image, and a big red zero warning is shown if
> the carrier is not detected. There is also an organizational difference --
> some carriers only whitelist zero, some - only m, and some -- both zero
> & m subdomains.

I'm still a little confused about "m" vs "ZERO" and "images" vs "no
images". That probably means others are too. :) Can you elaborate a little
on that? I thought that was pretty much the same, but according to your
spreadsheet that doesn't seem to be the case?

Overall this sounds reasonable I think, we'll just need to work out the
details. As Arthur also said in this thread, I'd like to keep zero & m
completely aligned, ideally sharing the Varnish cache objects and the
mobile device detection at the Varnish level as much as possible. I don't
think we disagree here.
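As a strawman for the User-Agent step, the Varnish side could be as simple as the following (the patterns are illustrative only; a real implementation would presumably sit on top of a proper device database such as OpenDDR):

    sub vcl_recv {
        # Very rough classification into the three proposed buckets.
        if (req.http.User-Agent ~ "(?i)wap|wml|docomo|netfront|midp") {
            set req.http.X-Device-Type = "legacy";
        } elsif (req.http.User-Agent ~ "(?i)iphone|ipod|android|mobile") {
            set req.http.X-Device-Type = "mobile";
        } else {
            set req.http.X-Device-Type = "desktop";
        }
    }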
-- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] ZERO architecture
>> * feature phones -- HTML only, the banner is inserted by the ESI
>> ** for carriers with free images
>> ** for carriers without free images
>
> What about including ESI tags for banners for smart devices as well as
> feature phones, then either use ESI to insert the banner for both device
> types or, alternatively, for smart devices don't let Varnish populate the
> ESI chunk and instead use JS to replace the ESI tags with the banner? That
> way we can still serve the same HTML for smart phones and feature phones
> with images (one less thing for which to vary the cache).

I think the jury is still out on whether it's better to use ESI for banners
in Varnish, or to do that client-side with JS. I guess we'll have to test
and see.

> Are there carrier-specific things that would result in different HTML for
> devices that do not support JS, or can you get away with providing the same
> non-js experience for Zero as MobileFrontend (aside from the
> banner, presumably handled by ESI)? If not currently, do you think its
> feasible to do that (eg make carrier-variable links get handled via special
> pages so we can always rely on the same URIs)? Again, it would be nice if
> we could just rely on the same HTML to further reduce cache variance. It
> would be cool if MobileFrontend and Zero shared buckets and they were
> limited to:
>
> * HTML + images
> * HTML - images
> * WAP

That would be nice.

> Since we improved MobileFrontend to no longer vary the cache on X-Device,
> I've been surprised to not see a significant increase in our cache hit
> ratio (which warrants further investigation but that's another email). Are
> there ways we can do a deeper analysis of the state of the varnish cache to
> determine just how fragmented it is, why, and how much of a problem it
> actually is? I believe I've asked this before and was met with a response
> of 'not really' - but maybe things have changed now, or others on this list
> have different insight. I think we've mostly approached the issue with a
> lot more assumption than informed analysis, and if possible I think it
> would be good to change that.

Yeah, we should look into that. We've already flagged a few possible
culprits, and we're also working on the migration of the desktop wiki
cluster from Squid to Varnish, which has some of the same issues with
variance (sessions, XVO, cookies, Accept-Language...) as MobileFrontend
does. After we've finished migrating that and confirmed that it's working
well, we want to unify those clusters' configurations a bit more, and that
by itself should give us additional opportunity to compare some strategies
there.

We've since also figured out that the way we calculate cache efficiency
with Varnish is not exactly ideal; unlike with Squid, cache purges are done
as HTTP requests to Varnish. Therefore in Varnish, those cache lookups are
counted in the cache hit rate, which isn't very helpful. To make things
worse, the few hundred purges a second vs. actual client traffic matter a
lot more on the mobile cluster (with much less traffic but a big content
set) than they do for our other clusters. So until we can factor that out
in the Varnish counters (might be possible in Varnish 4.0), we'll have to
look at other metrics.

More useful therefore is to check the actual backend fetches
("backend_req"), and these appear to have gone down some. Annoyingly, every
time we restart a Varnish instance we get a spike in the Ganglia graphs,
making the long-term graphs pretty much unusable. To fix that we'll either
need to patch Ganglia itself or move to some other stats engine (statsd?).
So we have a bit of work to do there on the Ops front.

Note that we're about to replace all Varnish caches in eqiad with (fewer)
newer, much bigger boxes, and we've decided to also upgrade the 4 mobile
boxes to those same specs. And we're doing the same in our new west coast
caching data center as well as in esams. This will increase the mobile
cache size a lot, and will hopefully help by throwing resources at the
problem.

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
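To make the purge accounting issue above a bit more concrete: purges arrive at Varnish as ordinary HTTP requests and are handled via a normal cache lookup, along these lines (simplified Varnish 3 sketch, not our production VCL; the ACL range is made up):

    acl purgers { "10.0.0.0"/8; }    # hypothetical internal range

    sub vcl_recv {
        if (req.request == "PURGE") {
            if (!client.ip ~ purgers) {
                error 405 "Not allowed";
            }
            # The purge goes through a regular cache lookup, so every one
            # of the hundreds of purges/s ends up in the hit/miss counters.
            return (lookup);
        }
    }

    sub vcl_hit {
        if (req.request == "PURGE") { purge; error 200 "Purged"; }
    }

    sub vcl_miss {
        if (req.request == "PURGE") { purge; error 404 "Not in cache"; }
    }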
Re: [Wikitech-l] The summary of new zero architecture proposal
Hi Yuri,

On Jun 14, 2013, at 7:16 PM, Yuri Astrakhan wrote:

> Based on many ideas that were put forth, I would like to seek comments on
> this ZERO design. This HTML will be rendered for both M and ZERO subdomains
> if varnish detects that request is coming from a zero partner. M and ZERO
> will be identical except for the images - ZERO substitutes images with
> links to File:xxx namespace through a redirector.
>
> * All non-local links always point to a redirector. On javascript capable
> devices, it will load carrier configuration and replace the link with local
> confirmation dialog box or direct link. Without javascript, redirector will
> either silently 301-redirect or show confirmation HTML. Links to images on
> ZERO.wiki and all external links are done in similar way.

For M, you only want to do this when it's a zero carrier, I guess? If not,
just a straight link?

> * The banner is an ESI link to */w/api.php?action=zero&banner=250-99* -
> returns HTML blob of the banner. (Not sure if banner ID should be
> part of the URL)
>
> Expected cache fragmentation for each wiki page:
> * per subdomain (M|ZERO)
> * if M - per "isZeroCarrier" (TRUE|FALSE). if ZERO - always TRUE.
> 3 variants is much better than one per carrier ID * 2 per subdomain.

I'm wondering, is there any HTML difference between "M & isZeroCarrier ==
TRUE" and "ZERO"? Links maybe? Can we make those protocol-relative perhaps?
We might be able to kill the cache differences for the domain completely,
while still supporting both URLs externally.

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
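On the ESI mechanics: the backend would emit something like <esi:include src="/w/api.php?action=zero&banner=250-99"/> in the page HTML (the URL taken from the proposal above), and Varnish would be told to process it. A rough sketch; the URL condition here is made up:

    sub vcl_fetch {
        # Process ESI tags in article HTML so the banner fragment is
        # fetched and cached separately from the page itself.
        if (req.url ~ "^/wiki/" && beresp.http.Content-Type ~ "text/html") {
            set beresp.do_esi = true;
        }
    }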
[Wikitech-l] wikivoyage (and wikidata) served by Varnish in eqiad
Last week, we moved wikidata traffic in eqiad (so in practice, all non-European traffic) from Squid to the new text Varnish cluster. A few issues were found and fixed, and we haven't seen any new issues for several days. Today I've done the same for Wikivoyage. Non-European Wikivoyage traffic, served by our eqiad cluster, is now served by Varnish. Wikivoyage has a bigger portion of normal users vs. API/bot traffic, so some new issues could surface. Please let us know if you see any problems on Wikivoyage that might be related to the Varnish migration; file a Bugzilla ticket or mail me directly. Thanks! -- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] wikivoyage (and wikidata) served by Varnish in eqiad
On Aug 5, 2013, at 5:24 PM, David Gerard wrote: > On 5 August 2013 16:17, Mark Bergsma wrote: > >> Last week, we moved wikidata traffic in eqiad (so in practice, all >> non-European traffic) from Squid to the new text Varnish cluster. A few >> issues were found and fixed, and we haven't seen any new issues for several >> days. >> Today I've done the same for Wikivoyage. Non-European Wikivoyage traffic, >> served by our eqiad cluster, is now served by Varnish. Wikivoyage has a >> bigger portion of normal users vs. API/bot traffic, so some new issues could >> surface. >> Please let us know if you see any problems on Wikivoyage that might be >> related to the Varnish migration; file a Bugzilla ticket or mail me directly. > > > Somewhat ignorant question: once we go all-Varnish, will logs be > generated in a similar format to eventually end up at stats.grok.se? Yes, that's generated from our UDP log data, which we have for Squid, Varnish and nginx alike. -- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [outages] www.wikipedia.com from Level3 via IPv6 not working
Hi George, On Aug 7, 2013, at 8:31 PM, George Herbert wrote: > Not sure if this is real or not, but report that some IPv6 ingress to WMF > not working at the moment from Level3 networks. We had the same result in the Level3 looking glass, but while we were debugging it and trying to gather more info or hosts/networks affected, it started working again in the L3 LG as well. So it appears that the problem was resolved. If anyone is still seeing issues reaching us over IPv6 via Level 3, then please let us know. -- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] [outages] www.wikipedia.com from Level3 via IPv6 not working
On Aug 7, 2013, at 10:07 PM, George Herbert wrote: > > The original reporter saw the same restoration of service (I assume)... > > Question - Are the WMF ops folks on the NANOG and outages lists? Was this > redundant reporting? 8-) On... yes. Actually reading them? Not really. :) -- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Data center move in Amsterdam: expect some downtime
In the upcoming days until New Year's we will be moving our servers and
other equipment in the Amsterdam data center location to a new data center.
Unfortunately this might result in some downtime and hiccups of certain web
sites & services, although we will try to keep this to a minimum.

On Sunday the 28th, between 09:00 and 11:00 UTC, we will migrate our network
in Amsterdam to new equipment. All services located there will be
unreachable for a brief period. Traffic for the main wikis will be rerouted
to the Florida cluster, however, and should remain unaffected.

In the days after we will be moving the servers themselves. Some services,
such as the mailing lists server, the subversion server and the toolserver
cluster, will be down for a number of hours while the equipment is being
moved. Traffic for the wikis should again remain largely unaffected.

We hope to have the entire migration finished before we enter the last few
hours of 2008... and start 2009 with a clean sheet.

Happy Holidays everyone!

--
Mark Bergsma
System & Network Administrator, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Downtime due to network maintenance, Friday July 31st 12:00 UTC
Hello, Due to a problem in one of our core routers in our Tampa cluster we need to perform some network maintenance tomorrow, Friday July 31st around 12:00 UTC. We will be performing a software upgrade and reboot of the router. This should not take more than a few minutes if everything goes well. Unfortunately this means that practically all sites and services will be down during that time. For those interested: one of the line cards in the router failed earlier this week. A replacement has arrived, but does not boot up correctly after hot plugging. Because we want to upgrade the firmware anyway, we will reboot the entire box. Cheers, -- Mark Bergsma System & Network Administrator, Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] scaled media (thumbs) as *temporary* files, not stored forever
To revive this old thread...

On Sep 5, 2012, at 9:35 PM, Asher Feldman wrote:

> On Tue, Sep 4, 2012 at 3:11 PM, Platonides wrote:
>
>> On 03/09/12 02:59, Tim Starling wrote:
>>> I'll go for option 4. You can't delete the images from the backend
>>> while they are still in Squid, because then they would not be purged
>>> when the image is updated or action=purge is requested. In fact, that
>>> is one of only two reasons for the existence of the backend thumbnail
>>> store on Wikimedia. The thumbnail backend could be replaced by a text
>>> file that stores a list of thumbnail filenames which were sent to
>>> Squid within a window equivalent to the expiry time sent in the
>>> Cache-Control header.
>>> -- Tim Starling
>>
>> The second one seems easy to fix. The first one should IMHO be fixed in
>> squid/varnish by allowing wildcard purges (ie. PURGE
>> /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/* HTTP/1.0)
>
> fast.ly implements group purge for varnish like this via a proxy daemon
> that watches backend responses for a "tag" response header (i.e. all
> resolutions of Tim_starling.jpg would be tagged that) and builds an
> in-memory hash of tags->objects which can be purged on. I've been told
> they'd probably open source the code for us if we want it, and it is
> interesting (especially to deal with the fact that we don't purge articles
> at all of their possible url's) albeit with its own challenges. If we
> implemented a backend system to track thumbnails that exist for a given
> orig, we may be able to remove our dependency on swift container listings
> to purge images, paving the way for a second class of thumbnails that are
> only cached.

How about this idea:

Just "purge all images with this prefix" doesn't really work in Squid or
Varnish, because they don't store their cache database in a format that
makes it cheap to determine which objects would match that. Varnish could
do it with their "bans", but each ban is kept around for a long time, and
with the tens, sometimes hundreds of purges a second we do, this would
quickly add up to a massive ban list.

But... Varnish allows you to customize how it hashes objects into its
object hash table (vcl_hash). What we could do, is hash thumbnails to the
same hash key as their original. Because of our current URL structure,
that's pretty much a matter of stripping off the thumbnail postfix. Then
the original and all its associated thumbnails end up at the same hash key
in the hash table, and only a single purge for the original would nuke them
all out of the cache.

This relies on Varnish having an efficient implementation for multiple
objects at a single hash key. It probably does, since it implements Vary
processing this way. We would essentially be doing the same, Vary-ing on
the thumbnail size. But I'll check the implementation to be sure.

Of course this won't work for Squid, but I'm pretty close to being able to
replace Squid by Varnish entirely for upload.

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
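In VCL terms, the idea would look roughly like this (a simplified, untested sketch only):

    sub vcl_hash {
        # Hash a thumbnail under the key of its original by stripping the
        # "thumb/" component and the size-specific last path element, e.g.
        #   /wikipedia/commons/thumb/5/5c/Tim_starling.jpg/200px-Tim_starling.jpg
        #   -> /wikipedia/commons/5/5c/Tim_starling.jpg
        # All sizes then live as variants on one hash key, and a single
        # purge for the original nukes them all.
        if (req.url ~ "^/[^/]+/[^/]+/thumb/.+/[^/]+$") {
            hash_data(regsub(req.url,
                "^/([^/]+)/([^/]+)/thumb/(.+)/[^/]+$", "/\1/\2/\3"));
        } else {
            hash_data(req.url);
        }
        hash_data(req.http.host);
        return (hash);
    }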
Re: [Wikitech-l] scaled media (thumbs) as *temporary* files, not stored forever
On Oct 24, 2012, at 11:36 AM, Mark Bergsma wrote: > How about this idea: > > Just "purge all images with this prefix" doesn't really work in Squid or > Varnish, because they don't store their cache database in a format that makes > it cheap to determine which objects would match that. Varnish could do it > with their "bans", but each ban is kept around for a long time, and with the > tens, sometimes hundreds of purges a second we do, this would quickly add up > to a massive ban list. > > But... Varnish allows you to customize how it hashes objects into its object > hash table (vcl_hash). What we could do, is hash thumbnails to the same hash > key as their original. Because of our current URL structure, that's pretty > much a matter of stripping off the thumbnail postfix. Then the original and > all its associated thumbnails end up at the same hash key in the hash table, > and only a single purge for the original would nuke them all out of the cache. > > This relies on Varnish having an efficient implementation for multiple > objects at a single hash key. It probably does, since it implements Vary > processing this way. We would essentially be doing the same, Vary-ing on the > thumbnail size. But I'll check the implementation to be sure. I checked, and Varnish stores all variant objects in a linked list per hash table entry. So once it looks up the hash entry for the URL of the original, it'll have to do a linear search for the right thumbnail size, matching each against a Vary header string. If we do this, we'll need to restrict the number of variants (thumb sizes) so we don't get hundreds/thousands on a single hash key. Here's a little proof of concept to demonstrate how it could work: https://gerrit.wikimedia.org/r/#/c/29805/2 -- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Wikimedia logging infrastructure
On 10-08-10 07:16, Rob Lanphier wrote:

> At any rate, there are a couple of problems with the way that it works:
> 1. Once we saturate the NIC on the logging machine, the quality of
> our sampling degrades pretty rapidly. We've generally had a problem
> with that over the past few months.

As already stated elsewhere, we didn't really saturate any NICs, just some
socket buffers. Because of the large number of configured log pipes, the
software (udp2log) could not empty the socket buffers fast enough.

> If this were your typical commercial operation, the answer would be
> "why aren't you just logging into Streambase?" (or some other data
> warehousing storage solution). I'm not suggesting that we do that (or
> even look at any of the solutions that bill themselves as open source
> alternatives), since, while our needs are increasing, we still aren't
> planning to be anywhere near as sophisticated as a lot of data
> tracking orgs. Still, it's worth asking questions about our existing
> setup. Should we be looking optimize our existing single-box setup,
> extending our software to have multi-node collection, or looking at a
> whole new collection strategy?

Besides the ideas that are currently being kicked around of improving or
rewriting the udp log collection software, there's also always the
short-term, easy option of sending a multicast UDP stream, and having
multiple collectors with distinct log pipes setup. E.g. one machine for the
sampled logging, and another, independent machine to do all the special
purpose log streams.

I do like more efficient software solutions rather than throwing more iron
at the problem, though. :)

--
Mark Bergsma
Operations Engineer, Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Thumbnail issues being resolved
(I just posted the following to the tech blog, http://techblog.wikimedia.org)

Last Monday, our Solaris server that contains all image thumbnails
developed problems. It ran out of memory, became too slow and eventually
even started to crash. (For the technically inclined: we think the kernel
is leaking some file system structure in kernel memory.) This caused
missing thumbnails across Wikimedia projects.

We addressed these problems in the following ways:

* We decreased the load on this server by adapting the Squid configuration,
so it would have to handle fewer requests.
* We ordered more memory, in order to double the total physical memory in
the relevant systems.
* We set up two new Linux servers that will eventually replace the Solaris
server.

At first, the addition of these Linux servers in a partially caching setup
seemed enough to fix the immediate problem, while gradually copying all
thumbnail files, allowing us to replace the Solaris server completely.
However, on Saturday night the Solaris server started crashing repeatedly,
making it necessary to engage the image scalers to regenerate a large part
of the missing thumbnails. This is causing some slowness of loading and
generating new (uncached) thumbnails. Fortunately, most users have not
experienced serious problems while using the site, since most thumbnails
are cached by our HTTP caching layer. It is impossible to determine exactly
how long it will take to recover completely from the slower service, but we
expect that this will take no more than a few days.

Over the past months we have been developing a new and more scalable
architecture for media storage, which will solve these problems once and
for all. We hope to deploy this new architecture within a few months, also
utilizing the new data center. Please watch the Tech Blog for updates on
this project.

--
Mark Bergsma
Operations Engineering Program Manager
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Mailing lists server migration today
Hi, Today I will be migrating the mailing lists from a very old server (lily) in Amsterdam, to a new server (sodium) in our new Ashburn data center. Mailman will be upgraded to version 2.1.13 along the way. During the migration, mail will be delayed as all data will need to be transferred to the new host. No mail should go lost, but no new mails will be sent out during the process until done, and the web interface will be unavailable. This shouldn't take more than one hour, if all goes well. I will report here when things should be back up and running. Afterwards, please let us know of any new issues, in bugzilla or on IRC (#wikimedia-tech). We don't expect any problems, but as with any software upgrade or migration, this can't be guaranteed... Thanks, -- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] CANCELED: Mailing lists server migration today
On Jan 13, 2012, at 2:54 PM, Mark Bergsma wrote: > Hi, > > Today I will be migrating the mailing lists from a very old server (lily) in > Amsterdam, to a new server (sodium) in our new Ashburn data center. Mailman > will be upgraded to version 2.1.13 along the way. ...and right after I sent this mail, I rebooted the new server once more before starting the maintenance. But suddenly it refused to come back up, or even reinstall. Likely the server has hardware issues. Therefore the maintenance is canceled for today, until we've figured out what the problem is. The migration will probably happen next week, possibly using different hardware. -- Mark Bergsma Lead Operations Architect Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] COMPLETED: Mailing lists server migration today
On Jan 18, 2012, at 2:54 PM, Mark Bergsma wrote:

> Today I will be migrating the mailing lists from a very old server (lily)
> in Amsterdam, to a new server (sodium) in our new Ashburn data center.
> Mailman will be upgraded to version 2.1.13 along the way.
>
> During the migration, mail will be delayed as all data will need to be
> transferred to the new host. No mail should go lost, but no new mails will
> be sent out during the process until done, and the web interface will be
> unavailable. This should take about one hour, if all goes well.
>
> I will report here when things should be back up and running. Afterwards,
> please let us know of any new issues, in bugzilla or on IRC
> (#wikimedia-tech). We don't expect any problems, but as with any software
> upgrade or migration, this can't be guaranteed...

The mailing lists server migration is now complete - Mailman is now running
on server sodium.

As some people pointed out, my message earlier today was indeed sent out
with the wrong Date header. I simply redirected my old mail and edited it a
bit, forgetting that the Date header would not be adjusted by my mail
client. Sorry for that. :)

The Mailman migration went smoothly, and I'm not aware of any problems.
Please let us know in Bugzilla or on IRC (#wikimedia-tech) if you're
experiencing any new issues.

Unfortunately we needed to change the IP address of lists.wikimedia.org for
this migration. Some large e-mail providers (e.g. Google) are rate limiting
reception of mail messages from the new IP (208.80.154.4) because it's not
known and whitelisted yet. To prevent further mail delivery delays today,
I've configured the new lists server to forward mails that would otherwise
be delayed via the old mail server and old source IP address again, for the
time being.

Thanks,

--
Mark Bergsma
Lead Operations Architect
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Faidon Liambotis promoted to Principal Operations Engineer
I'm pleased to announce that Faidon Liambotis has been promoted to Principal Operations Engineer. From the very first week he was hired, Faidon has expressed great interest in understanding and improving the complete infrastructure stack of the Wikimedia Foundation, covering not only the domain of the Operations team, but far beyond. I distinctly remember how, a few days after he was hired (which at the time, I didn't take any part in), he approached me for the first time on IRC and said: "Hi Mark! Nice to meet you. I see you just wrote this nice new director for consistent URL hashing to backends in Varnish. Let me help you get that upstreamed!" I believe in that same week he fixed some bugs in our nginx setup and solved our scalability issues with Puppet's external (Nagios) resources, amongst other things. Ever since, Faidon has taken on many projects, large and small, and completed them in ways going far beyond his duties. He has spent enormous amounts of time reviewing other people's patch sets, discussing their ideas, and mentoring them in their work. He's instrumental in coordinating efforts across multiple groups and making sure everyone arrives at the best possible solution. In discussions he's noticed for being analytical and methodical, and calmly working towards a common goal. This is reflected in his architecting work too, where he contributes with sensible ideas and a great knowledge of the open source software and solutions landscape. Outside of Wikimedia, Faidon has been active in Debian and other open source projects since 2004. He cares deeply about our use of open source solutions and helping our software extensions get upstreamed and made available to others. I think it's only appropriate that we recognize his role with this promotion. The biggest problem we may have with him is that he works too much and is involved with almost everything. Fortunately that is a good fit for his new role. :) Please join me in congratulating Faidon. — Mark Bergsma Lead Operations Architect Acting Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Welcome Chase Pettet to the Wikimedia Operations Team
I'm very pleased to announce that Chase Pettet is joining the Wikimedia Foundation as Operations Engineer. Chase comes to us from DeviantArt, where he was responsible for their general server management infrastructure, monitoring and networking, as well as supporting the development team(s). Within Wikimedia Operations he will have similar responsibilities, working on Operations infrastructure projects and supporting other Engineering teams with their Operations needs. Chase will be working remotely from his home in Missouri. He started with us yesterday. Please join me in welcoming Chase! — Mark Bergsma Lead Operations Architect Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Welcome Giuseppe Lavagetto to Wikimedia Operations
I'm pleased to announce that today, Giuseppe Lavagetto will be joining the Operations Team as an Operations Engineer. Giuseppe is based in Rome, Italy and will be working with us remotely. He's coming from Venere, a daughter company of Expedia, and has greatly helped streamline Operations and improve service reliability there. Giuseppe is very passionate about free and open source, free content and user privacy, and these aspects are strong motivations for him to join the Wikimedia Foundation. In his free time, he's an active volunteer with Autistici[1], a project that provides users communications privacy and helps avoid censorship. He also likes to contribute to various small FLOSS projects and loves music, blues, soul and hip-hop in particular. He's happily living with his wife and 11 year old step-daughter in Rome. Giuseppe will be joining us next week in our off-site team meeting in Athens, which should be a short trip for him. :) Please welcome Giuseppe! [1] http://www.autistici.org — Mark Bergsma Lead Operations Architect Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Please welcome Filippo Giunchedi to Wikimedia TechOps
I'm very happy to announce that Filippo Giunchedi is joining us as an
Operations Engineer in the Technical Operations team.

Filippo is Italian, but he lives in Dublin where he interned at Google and
worked at Amazon before coming to Wikimedia. He gained a lot of experience
working with large-scale distributed systems and infrastructure there.

Filippo will be working with us remotely. Today is his start day, but we
were lucky to have him join us at our Ops off-site meeting in Athens a few
weeks ago, where he helped improve our monitoring of system metrics with
Graphite.

Fiddling with machines has always been his passion - it led to being
fascinated by computers in the late 90s. He got involved in free software
projects (e.g. Debian, as a Debian Developer) in the mid-2000s. System
level technologies, infrastructure, distributed systems and networking are
his main interests. On a different level, he's also interested in online
privacy and secure/anonymous communications (e.g. Tor).

You can find Filippo on IRC (Freenode), using the nickname "godog".

Please join me in welcoming Filippo!

— Mark Bergsma
Lead Operations Architect
Director of Technical Operations
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] SSL 3.0 disabled on Wikimedia sites
Hi all,

Due to the POODLE vulnerability in SSL 3.0 that was announced this week and
has made its rounds through the media, we decided that we needed to disable
SSL 3.0 on all our HTTPS services today, to protect the security of all our
users. The bulk of that change was deployed today at 15:00 UTC for the
wikis, and the remaining HTTPS services are getting the same treatment
throughout the day. Please see our blog post on this topic for details:

http://blog.wikimedia.org/2014/10/17/protecting-users-against-poodle-by-removing-ssl-3-0-support/

If you see or hear about anyone having issues connecting to our sites over
HTTPS or logging in, please direct them to the link above, and urge them to
upgrade their software. Unfortunately, due to the nature of HTTPS, we're
not able to provide a fallback when users get an error message due to this.
We're still looking into the possibility of providing affected users with
an informative error message upon login, however, before they get
redirected from HTTP to HTTPS.

As a side note, we also deployed Google's SCSV SSL extension[1] on our
servers yesterday, so that the attack surface for such vulnerabilities will
be reduced in the future for clients which support this extension.

[1] http://googleonlinesecurity.blogspot.nl/2014/10/this-poodle-bites-exploiting-ssl-30.html

Thanks,

--
Lead Operations Architect
Director of Technical Operations
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Announcement: Yuvi Panda joins Ops
Hi all,

I'm very pleased to announce that as of this week, Yuvi Panda is part of
the Wikimedia Technical Operations team, to work on our Wikimedia Labs
infrastructure.

Yuvi originally joined the Wikimedia Foundation Mobile team in December
2011, where he led development of the original Wikipedia App and its
rewrite, amongst many other projects. Besides his work in Mobile, Yuvi has
been volunteering for Ops work in Wikimedia Labs for a long time now.

One of the notable examples of his work is a seamlessly integrated Web
proxy system that allows public web requests from the Internet to be
proxied to Labs instances on private IPs without requiring public IP
addresses for each instance. This very user-friendly system, which he built
on top of NGINX, LUA, redis, sqlite and the OpenStack API, sees a lot of
usage and has dramatically reduced the need for Labs users to request
(scarce) public IP address resources via a manual approval process.

Another example of his work that has made a big difference is the
initiation of the Labs-Vagrant project; bringing the virtues of the
Mediawiki:Vagrant project to Wikimedia Labs, and allowing anyone to bring a
MediaWiki development environment up in Labs with great ease.

More recently Yuvi has been working on our much-needed infrastructure in
Labs for monitoring metrics (Graphite) and service availability (Shinken).
We expect this will give us a lot more insight into the internals and
availability of software and services running in Wikimedia Labs and its
many projects, and we should be able to deploy it in Production as well.

Of course all of this work didn't go unnoticed, and about half a year ago
we asked Yuvi if he was interested in moving to Ops. With his extensive
development experience and his demonstrated ability to combine this with
solid Ops work to create stable and highly useful solutions, we think he's
a great fit for this role.

Yuvi recently had his visa application accepted, and is planning to move to
San Francisco in March 2015. Until then he will be working with us remotely
from India.

Please join me in congratulating Yuvi!

--
Lead Operations Architect
Director of Technical Operations
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Tweet of site outage
Hi all,

We've indeed had a total site outage for roughly 30 minutes. We're still
collecting all data, but we've tracked down the cause to multiple cascading
issues, including loss of power to a critical SPOF network switch and HHVM
MediaWiki application servers getting blocked due to multiple suboptimal
timeout settings. We'll post a full incident report soon, and work to
correct the underlying issues as soon as possible.

Our apologies,

On Thu, Feb 5, 2015 at 7:03 PM, Guillaume Paumier wrote:

> Hi,
>
> On Thursday, February 5, 2015 at 09:58:01, George Herbert wrote:
> > I saw a WMF tweet of a site outage (network?) around 9:30am Pacific
> > time, by the time I could check now things seem ok on en
>
> Sites are mostly back up but there are still issues with login, so the Ops
> team hasn't had time to write a postmortem yet.
>
> --
> Guillaume Paumier
>
> ___
> Wikitech-l mailing list
> Wikitech-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l

--
Mark Bergsma
Lead Operations Architect
Director of Technical Operations
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Tweet of site outage
The incident report is now posted on wikitech: https://wikitech.wikimedia.org/wiki/Incident_documentation/20150205-SiteOutage On Thu, Feb 5, 2015 at 7:57 PM, Mark Bergsma wrote: > Hi all, > > We've indeed had a total site outage for roughly 30 minutes. We're still > collecting all data, but we've tracked down the cause to multiple cascading > issues including loss of power to a critical SPOF network switch and HHVM > MediaWiki application servers getting blocked due to multiple unoptimal > timeout settings. We'll post a full incident report soon, and work to > correct the underlying issues as soon as possible. > > Our apologies, > > On Thu, Feb 5, 2015 at 7:03 PM, Guillaume Paumier > wrote: > >> Hi, >> >> Le jeudi 5 février 2015, 09:58:01 George Herbert a écrit : >> > I saw a WMF tweet of a site outage (network?) around 9:30am Pacific >> time, by >> > the time I could check now things seem ok on en >> >> Sites are mostly back up but there are still issues with login, so the Ops >> team hasn't had time to write a postmortem yet. >> >> -- >> Guillaume Paumier >> >> _______ >> Wikitech-l mailing list >> Wikitech-l@lists.wikimedia.org >> https://lists.wikimedia.org/mailman/listinfo/wikitech-l >> > > > > -- > Mark Bergsma > Lead Operations Architect > Director of Technical Operations > Wikimedia Foundation > -- Mark Bergsma Lead Operations Architect Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Moritz Muehlenhoff joins as Ops Security Engineer
Hi all,

I'm very pleased to announce that as of yesterday, Moritz Mühlenhoff has
joined the Ops team in the role of Operations Security Engineer. We're
excited, as for the first time we'll have an engineer on our team able to
focus on enhancing the security of our infrastructure. Some of you Debian
users may recognize his name; in his spare time he's very active in the
Debian Security Team and sends out a large portion of their security
advisory mails. ;)

Moritz lives in Bremen, North Germany (internationally perhaps best known
for being the home of Beck's beer) with his spouse Silvia and their 16 m/o
son Tjark. Besides being a Debian Developer, he also very much enjoys Rugby
Union and plays tighthead prop in his local club "Union 60 Bremen" in the
third division of Germany. He used to be a frequent visitor to film
festivals such as the San Sebastian festival, but with the baby around,
home theatre has become more prevalent. :-)

Moritz is working with us remotely, and can usually be found using his nick
"jmm" on Freenode.

Please join me in welcoming Moritz to the team!

--
Mark Bergsma
Lead Operations Architect
Director of Technical Operations
Wikimedia Foundation

___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
[Wikitech-l] Please welcome Jaime Crespo
Hi all, I'm very pleased to announce that we've recently hired Jaime Crespo as Sr. Database Administrator. Jaime has joined the Technical Operations team to strengthen our DBA capacity. He will be working closely with Sean, and will join responsibility for our production database infrastructure, the Wikimedia Labs replicas and the Analytics/research databases. His addition to the team will also allow us to support our developers better with code review and advice about database queries and schema tuning. Before he joined us Jaime has been a MySQL/MariaDB DBA consultant, both at Percona and later as an independent contractor. In that role he has supported many database environments, large and small. Being a fan of the free software and open data movements, Jaime is excited to be employing his experience in such an environment. Jaime lives in the Zaragoza area in Spain, and will be working with us remotely from home. Outside of work, he is an active contributor to the Spanish Wikipedia and the OpenStreetMap projects as well. His other hobbies include photography, cycling, astronomy, reading and acting in theater. Jaime can be found on IRC under the nickname 'jynus'. Please join me in welcoming him! -- Mark Bergsma Lead Operations Architect Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Data center switch-over moving ahead next week: please stay available :)
Hi everyone, After we've been successfully serving our sites from our backup data-center codfw (Dallas) for the past two days, we're now starting our switch back to eqiad (Ashburn) as planned[1]. We've already moved cache traffic back to eqiad, and within the next minutes, we'll disable editing by going read-only for approximately 30 minutes - hopefully a bit faster than 2 days ago. [1] http://blog.wikimedia.org/2016/04/11/wikimedia-failover-test/ On Tue, Apr 19, 2016 at 6:00 PM, Mark Bergsma wrote: > Hi all, > > Today the data center switch-over commenced as planned, and has just fully > completed successfully. We are now serving our sites from codfw (Dallas, > Texas) for the next 2 days if all stays well. > > We switched the wikis to read-only (editing disabled) at 14:02 UTC, and > went back read-write at 14:48 UTC - a little longer than planned. While > edits were possible then, unfortunately at that time Special:Recent Changes > (and related change feeds) were not yet working due to an unexpected > configuration problem with our Redis servers until 15:10 UTC, when we found > and fixed the issue. The site has stayed up and available for readers > throughout the entire migration. > > Overall the procedure was a success with few problems along the way. > However we've also carefully kept track of any issues and delays we > encountered for evaluation to improve and speed up the procedure, and > reducing impact to our users - some of which will already be implemented > for our switch back on Thursday. > > We're still expecting to find (possibly subtle) issues today, and would > like everyone who notices anything to use the following channels to report > them: > > 1. File a Phabricator issue with project #codfw-rollout > 2. Report issues on IRC: Freenode channel #wikimedia-tech (if urgent) > 3. Send an e-mail to the Operations list: o...@lists.wikimedia.org > > We're not done yet, but thanks to all who have helped so far. :-) > > Mark > -- Mark Bergsma Lead Operations Architect Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] Data center switch-over moving ahead next week: please stay available :)
We've just completed the switch back, and all services are running from our main data center eqiad (Ashburn) again. The process went very smooth this time around. In the past two days leading up to this, we've been able to either fix or work around the most important issues we encountered on Tuesday. This meant that we had no real setbacks or unanticipated delays today, and therefore were able to complete the most time pressing and user-impacting part (during which MediaWiki is read-only) in 20 minutes, down from ~45 minutes two days ago. However, we'll be doing this again in the future, and until then we'll work on improving and further automating this process to get it down to hopefully much lower levels of impact and duration. Please let us know if you see any issues which may be caused by the switch-over(s). Thanks much to everyone involved! Mark On Thu, Apr 21, 2016 at 3:53 PM, Mark Bergsma wrote: > Hi everyone, > > After we've been successfully serving our sites from our backup > data-center codfw (Dallas) for the past two days, we're now starting our > switch back to eqiad (Ashburn) as planned[1]. > > We've already moved cache traffic back to eqiad, and within the next > minutes, we'll disable editing by going read-only for approximately 30 > minutes - hopefully a bit faster than 2 days ago. > > [1] http://blog.wikimedia.org/2016/04/11/wikimedia-failover-test/ > > On Tue, Apr 19, 2016 at 6:00 PM, Mark Bergsma wrote: > >> Hi all, >> >> Today the data center switch-over commenced as planned, and has just >> fully completed successfully. We are now serving our sites from codfw >> (Dallas, Texas) for the next 2 days if all stays well. >> >> We switched the wikis to read-only (editing disabled) at 14:02 UTC, and >> went back read-write at 14:48 UTC - a little longer than planned. While >> edits were possible then, unfortunately at that time Special:Recent Changes >> (and related change feeds) were not yet working due to an unexpected >> configuration problem with our Redis servers until 15:10 UTC, when we found >> and fixed the issue. The site has stayed up and available for readers >> throughout the entire migration. >> >> Overall the procedure was a success with few problems along the way. >> However we've also carefully kept track of any issues and delays we >> encountered for evaluation to improve and speed up the procedure, and >> reducing impact to our users - some of which will already be implemented >> for our switch back on Thursday. >> >> We're still expecting to find (possibly subtle) issues today, and would >> like everyone who notices anything to use the following channels to report >> them: >> >> 1. File a Phabricator issue with project #codfw-rollout >> 2. Report issues on IRC: Freenode channel #wikimedia-tech (if urgent) >> 3. Send an e-mail to the Operations list: o...@lists.wikimedia.org >> >> We're not done yet, but thanks to all who have helped so far. :-) >> >> Mark >> > > -- > Mark Bergsma > Lead Operations Architect > Director of Technical Operations > Wikimedia Foundation > -- Mark Bergsma Lead Operations Architect Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l
Re: [Wikitech-l] 2017-10-25 Scrum of Scrums meeting notes
On Wed, Nov 1, 2017 at 9:31 AM, Federico Leva (Nemo) wrote: > Thanks. "Procurement for Asia datacenter has started" is big news! Is > energy efficiency a criterion for the product/supplier selection? > > (This needs not be something very complicated. For instance Dell asks a > small surcharge if you want a more efficient PSU, IIRC. < > http://www.dell.com/learn/uk/en/ukbsdt1/help-me-choose/hmc- > power-supply-unit-12g?ref=CFG>) > We buy servers from two standard vendors (Dell and HP), and select the most energy-efficient components (including PSUs) available for the task - not only because it's better for the environment, but also because it allows us to achieve higher rack density (more servers within the same space) and therefore also saves costs over time. These server configurations have been carefully sized, selected and tested/measured in practice with actual work loads to provide the most optimal usage of resources. For example, through consolidation and optimization of cache clusters onto fewer, but somewhat higher capacity new servers we've reduced the amount of equipment and power required to approximately 60%, also shrinking data center space, supportive infrastructure and management needs in the process. -- Mark Bergsma Lead Operations Architect Director of Technical Operations Wikimedia Foundation ___ Wikitech-l mailing list Wikitech-l@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikitech-l