G'day, I'm happy to commit this to Squid-2.HEAD as-is. Can you throw it in a Bugzilla report and spit me the number?
Thanks, Adrian On Fri, Feb 08, 2008, Tim Starling wrote: > There are two major sources of suboptimal hit rate on Wikipedia which > relate to the Vary header: > > * In Accept-Encoding, we only care whether "gzip" is present or not, but > IE and Firefox use different whitespace conventions and so each get > separate entries in the cache > * We only care whether the user is logged in or not. Other cookies, such > as pure-JavaScript cookies used by client-side code to store > preferences, unnecessarily degrade our hit rate. > > There have been other patches related to this problem, but as far as I'm > aware, they're all special-case, site-specific hacks. My patch adds an > X-Vary-Options response header (hereafter XVO), and thus gives the > origin server fine control over cache variance. In the patch, the XVO > header overrides the Vary header, so the Vary header can still be sent > as usual for compatibility with caches that don't support this feature. > > The format of the XVO header is inspired by the format of the Accept > header. As in Vary, XVO is separated by commas into parts which relate > to different request headers. Then those parts are further separated by > semicolons. The first semicolon-separated part is the request header > name, and subsequent parts give name/value pairs separated by equals > signs, defining options relating to the variance of that header. > > Two option names are currently defined: > > list-contains: splits the request header into comma-separated parts > and varies depending on whether the resulting list contains the option value > string-contains: performs a simple search of the request header and > varies depending on whether it matches. > > Multiple such options per header are allowed. > > So for example: > > X-Vary-Options: Cookie; string-contains=UserID; > string-contains=_session, Accept-Encoding; list-contains=gzip > > This would vary the cache on three tests: > * whether the Cookie header contains the string "UserID" > * whether the Cookie header contains the string "_session" > * whether the Accept-Encoding header, interpreted as a comma-separated > list, contains the item "gzip" > > The patch refactors all references to the Vary and X-Accelerator-Vary > headers into the functions httpHeaderHasVary() and httpHeaderGetVary() > in HttpHeader.c. It then adds X-Vary-Options to these functions, > interpreting it as a string rather than a list to avoid inappropriate > splitting on whitespace. It puts the resulting combined header into > X-Vary-Options instead of Vary in the base vary marker object, again to > avoid inappropriate list-style interpretation. httpMakeVaryMark() then > interprets this combined header in the way described above. > > The added features of the patch are conditional, and are enabled by the > configure option --enable-vary-options. Autoconf and automake will need > to be run after applying this patch. > > The interpretation of some non-standards-compliant Vary headers (those > containing semicolons) is changed slightly by this patch regardless of > --enable-vary-options. > > The patch is attached and also available at: > http://noc.wikimedia.org/~tstarling/patches/vary_options_upstream.patch > > For your review and consideration. > > -- Tim Starling > diff -Xdiffx -ru squid-2.6.18.orig/configure.in squid-2.6.18/configure.in > --- squid-2.6.18.orig/configure.in 2008-01-10 23:34:23.000000000 +1100 > +++ squid-2.6.18/configure.in 2008-02-07 19:43:23.000000000 +1100 > @@ -1507,6 +1507,16 @@ > fi > ]) > > +dnl Enable vary options > +AC_ARG_ENABLE(vary_options, > +[ --enable-vary-options > + Enable support for the X-Vary-Options header.], > +[ if test "$enableval" = "yes" ; then > + echo "Enabling support for vary options" > + AC_DEFINE(VARY_OPTIONS, 1, [Enable support for the X-Vary-Options > header]) > + fi > +]) > + > AC_ARG_ENABLE(follow-x-forwarded-for, > [ --enable-follow-x-forwarded-for > Enable support for following the X-Forwarded-For > diff -Xdiffx -ru squid-2.6.18.orig/src/client_side.c > squid-2.6.18/src/client_side.c > --- squid-2.6.18.orig/src/client_side.c 2008-02-07 19:28:38.000000000 > +1100 > +++ squid-2.6.18/src/client_side.c 2008-02-08 14:39:38.000000000 +1100 > @@ -735,10 +735,7 @@ > request_t *request = http->request; > const char *etag = httpHeaderGetStr(&mem->reply->header, HDR_ETAG); > const char *vary = request->vary_headers; > - int has_vary = httpHeaderHas(&entry->mem_obj->reply->header, > HDR_VARY); > -#if X_ACCELERATOR_VARY > - has_vary |= httpHeaderHas(&entry->mem_obj->reply->header, > HDR_X_ACCELERATOR_VARY); > -#endif > + int has_vary = httpHeaderHasVary(&entry->mem_obj->reply->header); > if (has_vary) > vary = httpMakeVaryMark(request, mem->reply); > > @@ -4948,10 +4945,7 @@ > varyEvaluateMatch(StoreEntry * entry, request_t * request) > { > const char *vary = request->vary_headers; > - int has_vary = httpHeaderHas(&entry->mem_obj->reply->header, HDR_VARY); > -#if X_ACCELERATOR_VARY > - has_vary |= httpHeaderHas(&entry->mem_obj->reply->header, > HDR_X_ACCELERATOR_VARY); > -#endif > + int has_vary = httpHeaderHasVary(&entry->mem_obj->reply->header); > if (!has_vary || !entry->mem_obj->vary_headers) { > if (vary) { > /* Oops... something odd is going on here.. */ > diff -Xdiffx -ru squid-2.6.18.orig/src/enums.h squid-2.6.18/src/enums.h > --- squid-2.6.18.orig/src/enums.h 2008-02-07 19:28:38.000000000 +1100 > +++ squid-2.6.18/src/enums.h 2008-02-07 21:35:18.000000000 +1100 > @@ -256,6 +256,9 @@ > #if X_ACCELERATOR_VARY > HDR_X_ACCELERATOR_VARY, > #endif > +#if VARY_OPTIONS > + HDR_X_VARY_OPTIONS, > +#endif > HDR_X_ERROR_URL, /* errormap, requested URL */ > HDR_X_ERROR_STATUS, /* errormap, received HTTP status line > */ > HDR_FRONT_END_HTTPS, > diff -Xdiffx -ru squid-2.6.18.orig/src/http.c squid-2.6.18/src/http.c > --- squid-2.6.18.orig/src/http.c 2008-02-07 19:28:38.000000000 +1100 > +++ squid-2.6.18/src/http.c 2008-02-08 14:48:44.000000000 +1100 > @@ -353,20 +353,29 @@ > String vstr = StringNull; > > stringClean(&vstr); > - hdr = httpHeaderGetList(&reply->header, HDR_VARY); > - if (strBuf(hdr)) > - strListAdd(&vary, strBuf(hdr), ','); > - stringClean(&hdr); > -#if X_ACCELERATOR_VARY > - hdr = httpHeaderGetList(&reply->header, HDR_X_ACCELERATOR_VARY); > - if (strBuf(hdr)) > - strListAdd(&vary, strBuf(hdr), ','); > - stringClean(&hdr); > -#endif > + vary = httpHeaderGetVary(&reply->header); > + debug(11,3) ("httpMakeVaryMark: Vary: %s\n", strBuf(vary)); > + > while (strListGetItem(&vary, ',', &item, &ilen, &pos)) { > - char *name = xmalloc(ilen + 1); > - xstrncpy(name, item, ilen + 1); > - Tolower(name); > + const char *sc_item, *sc_pos = NULL; > + int sc_ilen; > + String str_item; > + char *name = NULL; > + String value_spec = StringNull; > + int need_value = 1; > + > + stringLimitInit(&str_item, item, ilen); > + > + /* Get the header name */ > + if (strListGetItem(&str_item, ';', &sc_item, &sc_ilen, &sc_pos)) { > + name = xmalloc(sc_ilen + 1); > + xstrncpy(name, sc_item, sc_ilen + 1); > + Tolower(name); > + } else { > + name = xmalloc(1); > + *name = '\0'; > + } > + > if (strcmp(name, "accept-encoding") == 0) { > aclCheck_t checklist; > memset(&checklist, 0, sizeof(checklist)); > @@ -381,22 +390,76 @@ > if (strcmp(name, "*") == 0) { > /* Can not handle "Vary: *" efficiently, bail out making the > response not cached */ > safe_free(name); > + stringClean(&str_item); > stringClean(&vary); > stringClean(&vstr); > break; > } > - strListAdd(&vstr, name, ','); > + > + /* Fetch the header string */ > hdr = httpHeaderGetByName(&request->header, name); > - safe_free(name); > - value = strBuf(hdr); > - if (value) { > - value = rfc1738_escape_part(value); > - stringAppend(&vstr, "=\"", 2); > - stringAppend(&vstr, value, strlen(value)); > - stringAppend(&vstr, "\"", 1); > + > + /* Process the semicolon-separated options */ > +#ifdef VARY_OPTIONS > + while (strListGetItem(&str_item, ';', &sc_item, &sc_ilen, &sc_pos)) { > + char *opt_name = xmalloc(sc_ilen + 1); > + char *opt_value; > + char *eqpos; > + xstrncpy(opt_name, sc_item, sc_ilen + 1); > + eqpos = strchr(opt_name, '='); > + if (!eqpos) { > + opt_value = NULL; > + } else { > + *eqpos = '\0'; > + opt_value = eqpos + 1; > + } > + Tolower(opt_name); > + > + if (strcmp(opt_name, "list-contains") == 0 && opt_value) { > + if (strBuf(hdr) && strListIsMember(&hdr, opt_value, ',')) { > + opt_value = rfc1738_escape_part(opt_value); > + strListAdd(&value_spec, "list-contains[\"", ';'); > + stringAppend(&value_spec, opt_value, strlen(opt_value)); > + stringAppend(&value_spec, "\"]", 2); > + } > + need_value = 0; > + } else if (strcmp(opt_name, "string-contains") == 0 && opt_value) { > + if (strBuf(hdr) && strIsSubstr(&hdr, opt_value)) { > + opt_value = rfc1738_escape_part(opt_value); > + strListAdd(&value_spec, "string-contains[\"", ';'); > + stringAppend(&value_spec, opt_value, strlen(opt_value)); > + stringAppend(&value_spec, "\"]", 2); > + } > + need_value = 0; > + } else { > + debug(11,3) ("httpMakeVaryMark: unrecognised vary option: > %s\n", opt_name); > + } > + safe_free(opt_name); > } > +#endif > + > + if (need_value) { > + value = strBuf(hdr); > + if (value) { > + value = rfc1738_escape_part(value); > + strListAdd(&value_spec, "\"", ';'); > + stringAppend(&value_spec, value, strlen(value)); > + stringAppend(&value_spec, "\"", 1); > + } > + } > + > + strListAdd(&vstr, name, ','); > + stringAppend(&vstr, "=", 1); > + if (strBuf(value_spec)) { > + stringAppend(&vstr, strBuf(value_spec), strLen(value_spec)); > + } > + > stringClean(&hdr); > + stringClean(&value_spec); > + stringClean(&str_item); > + safe_free(name); > } > + > safe_free(request->vary_hdr); > safe_free(request->vary_headers); > if (strBuf(vary) && strBuf(vstr)) { > @@ -514,11 +577,7 @@ > /* non-chunked. Handle as one single big chunk (-1 if terminated by > EOF) */ > httpState->chunk_size = > httpReplyBodySize(httpState->orig_request->method, reply); > } > - if (httpHeaderHas(&reply->header, HDR_VARY) > -#if X_ACCELERATOR_VARY > - || httpHeaderHas(&reply->header, HDR_X_ACCELERATOR_VARY) > -#endif > - ) { > + if (httpHeaderHasVary(&reply->header)) { > const char *vary = NULL; > if (Config.onoff.cache_vary) > vary = httpMakeVaryMark(httpState->orig_request, reply); > diff -Xdiffx -ru squid-2.6.18.orig/src/HttpHeader.c > squid-2.6.18/src/HttpHeader.c > --- squid-2.6.18.orig/src/HttpHeader.c 2007-12-21 20:56:53.000000000 > +1100 > +++ squid-2.6.18/src/HttpHeader.c 2008-02-08 14:49:24.000000000 +1100 > @@ -133,6 +133,9 @@ > #if X_ACCELERATOR_VARY > {"X-Accelerator-Vary", HDR_X_ACCELERATOR_VARY, ftStr}, > #endif > +#if VARY_OPTIONS > + {"X-Vary-Options", HDR_X_VARY_OPTIONS, ftStr}, > +#endif > {"X-Error-URL", HDR_X_ERROR_URL, ftStr}, > {"X-Error-Status", HDR_X_ERROR_STATUS, ftInt}, > {"Front-End-Https", HDR_FRONT_END_HTTPS, ftStr}, > @@ -210,6 +213,9 @@ > #if X_ACCELERATOR_VARY > HDR_X_ACCELERATOR_VARY, > #endif > +#if VARY_OPTIONS > + HDR_X_VARY_OPTIONS, > +#endif > HDR_X_SQUID_ERROR > }; > > @@ -1185,6 +1191,54 @@ > return tot; > } > > +/* Get the combined Vary headers as a String > + * Returns StringNull if there are no vary headers > + */ > +String httpHeaderGetVary(const HttpHeader * hdr) > +{ > + String hdrString = StringNull; > +#if VARY_OPTIONS > + HttpHeaderEntry *e; > + if ((e = httpHeaderFindEntry(hdr, HDR_X_VARY_OPTIONS))) { > + stringInit(&hdrString, strBuf(e->value)); > + return hdrString; > + } > +#endif > + > + hdrString = httpHeaderGetList(hdr, HDR_VARY); > +#if X_ACCELERATOR_VARY > + { > + String xavString = StringNull; > + xavString = httpHeaderGetList(hdr, HDR_X_ACCELERATOR_VARY); > + if (strBuf(xavString)) > + strListAdd(&hdrString, strBuf(xavString), ','); > + stringClean(&xavString); > + } > +#endif > + return hdrString; > +} > + > +/* > + * Returns TRUE if at least one of the vary headers are present > + */ > +int httpHeaderHasVary(const HttpHeader * hdr) > +{ > +#if VARY_OPTIONS > + if (httpHeaderHas(hdr, HDR_X_VARY_OPTIONS)) { > + return TRUE; > + } > +#endif > +#if X_ACCELERATOR_VARY > + if (httpHeaderHas(hdr, HDR_X_ACCELERATOR_VARY)) { > + return TRUE; > + } > +#endif > + if (httpHeaderHas(hdr, HDR_VARY)) { > + return TRUE; > + } > + return FALSE; > +} > + > /* > * HttpHeaderEntry > */ > @@ -1438,3 +1492,5 @@ > assert(id >= 0 && id < HDR_ENUM_END); > return strBuf(Headers[id].name); > } > + > + > diff -Xdiffx -ru squid-2.6.18.orig/src/HttpReply.c > squid-2.6.18/src/HttpReply.c > --- squid-2.6.18.orig/src/HttpReply.c 2006-06-11 10:28:19.000000000 +1000 > +++ squid-2.6.18/src/HttpReply.c 2008-02-08 14:42:04.000000000 +1100 > @@ -325,8 +325,7 @@ > return squid_curtime; > } > } > - if (Config.onoff.vary_ignore_expire && > - httpHeaderHas(&rep->header, HDR_VARY)) { > + if (Config.onoff.vary_ignore_expire && httpHeaderHasVary(&rep->header)) { > const time_t d = httpHeaderGetTime(&rep->header, HDR_DATE); > const time_t e = httpHeaderGetTime(&rep->header, HDR_EXPIRES); > if (d == e) > diff -Xdiffx -ru squid-2.6.18.orig/src/protos.h squid-2.6.18/src/protos.h > --- squid-2.6.18.orig/src/protos.h 2008-02-07 19:28:38.000000000 +1100 > +++ squid-2.6.18/src/protos.h 2008-02-08 14:46:21.000000000 +1100 > @@ -444,6 +444,8 @@ > extern squid_off_t httpHeaderGetSize(const HttpHeader * hdr, http_hdr_type > id); > extern time_t httpHeaderGetTime(const HttpHeader * hdr, http_hdr_type id); > extern TimeOrTag httpHeaderGetTimeOrTag(const HttpHeader * hdr, > http_hdr_type id); > +extern String httpHeaderGetVary(const HttpHeader * hdr); > +extern int httpHeaderHasVary(const HttpHeader * hdr); > extern HttpHdrCc *httpHeaderGetCc(const HttpHeader * hdr); > extern HttpHdrRange *httpHeaderGetRange(const HttpHeader * hdr); > extern HttpHdrContRange *httpHeaderGetContRange(const HttpHeader * hdr); > diff -Xdiffx -ru squid-2.6.18.orig/src/store.c squid-2.6.18/src/store.c > --- squid-2.6.18.orig/src/store.c 2008-02-07 19:28:38.000000000 +1100 > +++ squid-2.6.18/src/store.c 2008-02-08 14:55:06.000000000 +1100 > @@ -721,7 +721,12 @@ > state->e = storeCreateEntry(url, log_url, flags, method); > httpBuildVersion(&version, 1, 0); > httpReplySetHeaders(state->e->mem_obj->reply, version, HTTP_OK, > "Internal marker object", "x-squid-internal/vary", -1, -1, squid_curtime + > 100000); > +#if VARY_OPTIONS > + /* Can't put a string into a list header */ > + httpHeaderPutStr(&state->e->mem_obj->reply->header, HDR_X_VARY_OPTIONS, > vary); > +#else > httpHeaderPutStr(&state->e->mem_obj->reply->header, HDR_VARY, vary); > +#endif > storeSetPublicKey(state->e); > if (!state->oe) { > /* New entry, create new unique ID */ > @@ -1039,20 +1044,8 @@ > } > newkey = storeKeyPublicByRequest(mem->request); > if (mem->vary_headers && !EBIT_TEST(e->flags, KEY_EARLY_PUBLIC)) { > - String vary = StringNull; > vary_id_t vary_id; > - String varyhdr; > - varyhdr = httpHeaderGetList(&mem->reply->header, HDR_VARY); > - if (strBuf(varyhdr)) > - strListAdd(&vary, strBuf(varyhdr), ','); > - stringClean(&varyhdr); > -#if X_ACCELERATOR_VARY > - /* This needs to match the order in http.c:httpMakeVaryMark */ > - varyhdr = httpHeaderGetList(&mem->reply->header, > HDR_X_ACCELERATOR_VARY); > - if (strBuf(varyhdr)) > - strListAdd(&vary, strBuf(varyhdr), ','); > - stringClean(&varyhdr); > -#endif > + String vary = httpHeaderGetVary(&mem->reply->header); > /* Create or update the vary object */ > vary_id = storeAddVary(mem->url, mem->log_url, mem->method, newkey, > httpHeaderGetStr(&mem->reply->header, HDR_ETAG), strBuf(vary), > mem->vary_headers, mem->vary_encoding); > if (vary_id.create_time) { > -- - Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support - - $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -