Re: [Wikitech-l] ZERO architecture

Mark Bergsma Fri, 31 May 2013 02:45:44 -0700

>> * feature phones -- HTML only, the banner is inserted by the ESI
>> ** for carriers with free images
>> ** for carriers without free images
>> 
> 
> What about including ESI tags for banners for smart devices as well as
> feature phones, then either use ESI to insert the banner for both device
> types or, alternatively, for smart devices don't let Varnish populate the
> ESI chunk and instead use JS to replace the ESI tags with the banner? That
> way we can still serve the same HTML for smart phones and feature phones
> with images (one less thing for which to vary the cache).


I think the jury is still out on whether it's better to use ESI for banners 
in Varnish or to use JS for that client-side. I guess we'll have to test and see.

> Are there carrier-specific things that would result in different HTML for
> devices that do not support JS, or can you get away with providing the same
> non-js experience for Zero as MobileFrontend (aside from the
> banner, presumably handled by ESI)? If not currently, do you think it's
> feasible to do that (eg make carrier-variable links get handled via special
> pages so we can always rely on the same URIs)? Again, it would be nice if
> we could just rely on the same HTML to further reduce cache variance. It
> would be cool if MobileFrontend and Zero shared buckets and they were
> limited to:
> 
> * HTML + images
> * HTML - images
> * WAP

That would be nice.

> Since we improved MobileFrontend to no longer vary the cache on X-Device,
> I've been surprised to not see a significant increase in our cache hit
> ratio (which warrants further investigation but that's another email). Are
> there ways we can do a deeper analysis of the state of the varnish cache to
> determine just how fragmented it is, why, and how much of a problem it
> actually is? I believe I've asked this before and was met with a response
> of 'not really' - but maybe things have changed now, or others on this list
> have different insight. I think we've mostly approached the issue with a
> lot more assumption than informed analysis, and if possible I think it
> would be good to change that.

Yeah, we should look into that. We've already flagged a few possible culprits, 
and we're also working on the migration of the desktop wiki cluster from Squid 
to Varnish, which has some of the same issues with variance (sessions, XVO, 
cookies, Accept-Language...) as MobileFrontend does. After we've finished 
migrating that and confirmed that it's working well, we want to unify those 
clusters' configurations a bit more, and that by itself should give us 
additional opportunity to compare some strategies there.

We've since also figured out that the way we calculate cache efficiency with 
Varnish is not exactly ideal; unlike with Squid, cache purges reach Varnish as 
HTTP requests. Therefore in Varnish, those cache lookups are counted in the 
cache hit rate, which isn't very helpful. To make things worse, the few hundred 
purges a second relative to actual client traffic matter a lot more on the 
mobile cluster (with much less traffic but a big content set) than they do 
for our other clusters. So until we can factor that out in the Varnish counters 
(might be possible in Varnish 4.0), we'll have to look at other metrics.
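
As a rough illustration of that dilution effect, here is a minimal sketch. 
The function names and all numbers are hypothetical, and it assumes purge 
lookups could be counted separately (for example from logs), which the 
Varnish counters themselves do not provide:

```python
def measured_hit_rate(client_rps, client_hit_rate, purge_rps, purge_hit_rate):
    """Hit rate as the counters see it when purge lookups are mixed in."""
    hits = client_rps * client_hit_rate + purge_rps * purge_hit_rate
    lookups = client_rps + purge_rps
    return hits / lookups


def client_only_hit_rate(total_hits, total_lookups, purge_hits, purge_lookups):
    """Hit rate for client traffic alone, with purge lookups factored out."""
    return (total_hits - purge_hits) / (total_lookups - purge_lookups)


# Hypothetical figures: the same 300 purges/s against a low-traffic cluster
# and a high-traffic cluster, both with a real client hit rate of 80%.
low_traffic = measured_hit_rate(1000, 0.8, 300, 0.1)
high_traffic = measured_hit_rate(20000, 0.8, 300, 0.1)
```

With these made-up numbers the low-traffic cluster reports roughly 0.64 
and the high-traffic one roughly 0.79, although clients see the same 80% 
in both cases.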

More useful therefore is to check the actual backend fetches ("backend_req"), 
and these appear to have gone down some. Annoyingly, every time we restart a 
Varnish instance we get a spike in the Ganglia graphs, making the long-term 
graphs pretty much unusable. To fix that we'll either need to patch Ganglia 
itself or move to some other stats engine (statsd?). So we have a bit of work 
to do there on the Ops front.
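
A minimal sketch of how a stats pipeline could avoid such spikes, assuming 
they come from a cumulative counter like "backend_req" dropping back to 
zero on restart. The function name and sample data are hypothetical, and 
this is not how Ganglia or statsd actually handle it:

```python
def counter_rates(samples):
    """Turn (timestamp_seconds, counter_value) samples into per-second rates.

    A counter that goes backwards is treated as a restart: only what has
    accumulated since the restart is counted for that interval.
    """
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            continue  # skip duplicate or out-of-order samples
        if v1 < v0:
            delta = v1  # restart assumed: counter began again from zero
        else:
            delta = v1 - v0
        rates.append((t1, delta / elapsed))
    return rates


# Hypothetical samples with a restart between t=10 and t=20.
samples = [(0, 1000), (10, 1500), (20, 200), (30, 700)]
rates = counter_rates(samples)
```

The restart interval then shows up as a modest dip (an undercount, since 
requests served before the restart are lost) rather than a spike.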

Note that we're about to replace all Varnish caches in eqiad with (fewer) newer, 
much bigger boxes, and we've decided to also upgrade the 4 mobile boxes to 
those same specs. We're doing the same in our new west coast caching data 
center and in esams. This will increase the mobile cache size a lot, and 
will hopefully help by throwing resources at the problem.

-- 
Mark Bergsma <m...@wikimedia.org>
Lead Operations Architect
Wikimedia Foundation





_______________________________________________
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
