Here's yet another design version, following many helpful suggestions from Henrik.
Gzip Content-Encoding in Squid

Design Version Choice

The goal will be to get these changes into Squid3 HEAD.

Content-Encoding Protocol

The content-encoding protocol is described in …

Header field cases from client:

If the client request contains an Accept-Encoding field:
    If a cached response is already available whose Content-Encoding field lists only encodings that the client accepts:
        Forward the cached response to the client unchanged.
    Else (no cached response with a suitable Content-Encoding):
        If no uncoded response is available:
            Forward the client request to the server/cache.
            If the server/cache response contains a Content-Encoding field:
                Forward the new response to the client.
            Else (the server/cache response has no Content-Encoding):
                Encode the server response.
                Send the encoded response to the client.
        Else (an uncoded server response is already available):
            Encode the uncoded response.
            Send the encoded response to the client.
Else (no Accept-Encoding in the client request):
    If an uncoded server response is already available:
        Forward it unchanged to the client.
    Else if a coded server response is already available:
        Decode the server response.
        Send the decoded response to the client.
    Else (no response available yet):
        Forward the request to the server or cache, and behave as if this protocol were not in effect.

There will be no explicit links between objects that are different encodings of the same resource. Instead, the StoreKey of a coded object will be chosen specifically as MD5(OriginalStoreKey, Content-Encoding). This allows the StoreKeys of all possible encodings, including the original, to be derived from the original StoreKey alone, without knowing the requested URL.

Searching for an uncoded version of an object is done by generating the uncoded StoreKey and looking up an object with that key. This is needed on a cache miss (see the protocol above).

Upon an update or PURGE of either the original or an encoded object, all possible encoding variants are deleted.
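The header-field cases above amount to a small decision function. The following is a minimal C++ sketch of that logic, assuming a hypothetical CacheState summary of what is already cached for a URL; the names chooseAction, shouldEncodeReply, CacheState, and Action are illustrative only and are not Squid APIs. It treats Accept-Encoding as a plain set of codings and ignores q-values.

```cpp
#include <algorithm>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of what the cache already holds for one URL.
using Codings = std::vector<std::string>;  // a Content-Encoding list, e.g. {"gzip"}

struct CacheState {
    bool hasUncoded = false;            // an uncoded response is stored
    std::vector<Codings> codedVariants; // Content-Encoding of each stored coded variant
};

enum class Action {
    ForwardCoded,    // send a cached coded variant unchanged
    ForwardUncoded,  // send the cached uncoded response unchanged
    EncodeUncoded,   // encode the cached uncoded response, then send it
    DecodeCoded,     // decode a cached coded response, then send it
    ForwardRequest   // nothing usable cached: forward the request to the server/cache
};

// acceptEncoding is std::nullopt when the request has no Accept-Encoding field.
Action chooseAction(const std::optional<std::set<std::string>> &acceptEncoding,
                    const CacheState &cache) {
    if (acceptEncoding) {
        // A coded variant is usable only if the client accepts every coding it lists.
        for (const Codings &variant : cache.codedVariants) {
            bool accepted = std::all_of(variant.begin(), variant.end(),
                [&](const std::string &c) { return acceptEncoding->count(c) > 0; });
            if (accepted)
                return Action::ForwardCoded;
        }
        return cache.hasUncoded ? Action::EncodeUncoded : Action::ForwardRequest;
    }
    if (cache.hasUncoded)
        return Action::ForwardUncoded;
    if (!cache.codedVariants.empty())
        return Action::DecodeCoded;
    return Action::ForwardRequest;
}

// After a forwarded request: encode the server/cache reply only if the client
// sent Accept-Encoding and the reply carries no Content-Encoding of its own.
bool shouldEncodeReply(bool clientSentAcceptEncoding, bool replyHasContentEncoding) {
    return clientSentAcceptEncoding && !replyHasContentEncoding;
}
```

Splitting the "forward to server" branch into a separate shouldEncodeReply() check mirrors the protocol: the choice between forwarding and encoding can only be made once the server response headers have arrived.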
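Because every variant key is a function of the original StoreKey and a coding name, deleting all variants only requires recomputing their keys. The sketch below illustrates that idea under loudly stated assumptions: 64-bit FNV-1a stands in for the MD5 the design specifies, a std::map stands in for the store, and kLocalEncodings, variantKey, and deleteCodedCopies are hypothetical names (the last loosely modelled on the storeDeleteCodedCopies() function proposed below). Only single codings are enumerated, not combinations.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using StoreKey = std::uint64_t;  // illustrative; the design derives keys with MD5

// Stand-in for MD5: 64-bit FNV-1a over the bytes of s.
StoreKey fnv1a(const std::string &s) {
    StoreKey h = 14695981039346656037ull;
    for (unsigned char c : s) {
        h ^= c;
        h *= 1099511628211ull;
    }
    return h;
}

// Key of the variant coded with `encoding`, derived from the original key only
// (no URL needed), in the spirit of MD5(OriginalStoreKey, Content-Encoding).
StoreKey variantKey(StoreKey original, const std::string &encoding) {
    return fnv1a(std::to_string(original) + '\0' + encoding);
}

// Hypothetical finite list of codings applied locally; the design only covers gzip.
const std::vector<std::string> kLocalEncodings = {"gzip"};

// Erase every coded variant of `original`; also erase the original itself when
// includeOriginal is true (PURGE). On an update of the original, the new
// original is kept and only the stale coded copies are removed.
std::size_t deleteCodedCopies(std::map<StoreKey, std::string> &store,
                              StoreKey original, bool includeOriginal) {
    std::size_t removed = includeOriginal ? store.erase(original) : 0;
    for (const std::string &enc : kLocalEncodings)
        removed += store.erase(variantKey(original, enc));
    return removed;
}
```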
As the encodings are applied locally, the possible combinations are known and finite, so purging them all at once is not a problem. If the number of encodings grows significantly, an additional mechanism may be needed to keep that check under control. Deletion triggered by an update of the original will happen on swapout of the new original object (when it gets a public key).

ETags: encoded objects will be given new, unique entity tags.

There will be a configuration option to turn off content-encoding.

Content-Encoding Implementation

A new HttpHdrContCode module parses the related HTTP headers and arranges for encoding or decoding as appropriate. It includes the following functions:
  codeParseRequest(): called from client_side:parseHttpRequest() after the clientStreamInit() call. Checks for and parses Accept-Encoding headers, instantiates content_coding appropriately, and calls codeClientStreamInit().
  codeClientStreamInit(): adds a new node to the clientStream with the codeStreamRead(), codeStreamCallback(), and codeStreamStatus() functions.
  codeStreamCallback(): sets up the encoding/decoding state depending on the combination of Content-Encoding and Accept-Encoding fields seen.
  codeStreamRead(): calls the HttpContentCoder transformation functions as appropriate.
  codeStreamStatus(): reports status to the stream.

A new HttpContentCoder abstract type, with the functions:
  encodeStart()
  encodeEnd()
  encodeChunk()
  decodeStart()
  decodeEnd()
  decodeChunk()

A new per-coded-object ContentCoderState to hold the coding state. It will be referenced from the clientStream and include the fields:
  HttpContentCoder *coder
  off_t codedOffset

Objects will be stored in both unencoded and encoded formats. An object stays in the format in which Squid receives it until a client requests a different Content-Encoding that Squid supports (this could be immediate). When this happens, the object is coded as it is streamed into a separate StoreEntry and on to the client.

Other changes needed:
  Add a new content_coding field to HttpReply.
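The HttpContentCoder type and ContentCoderState record described above map naturally onto a C++ abstract class plus a small state struct. This is a sketch only: the six operation names come from the design, but their signatures (input pointer and length in, output appended to a std::string), the IdentityCoder test class, and the encode() helper are assumptions, as is reading codedOffset as "coded bytes produced so far".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <sys/types.h>  // off_t

// Abstract coder. Operation names are from the design; signatures are assumed.
// Each call consumes `len` input bytes (possibly none) and appends output to `out`.
class HttpContentCoder {
public:
    virtual ~HttpContentCoder() = default;
    virtual void encodeStart(const char *in, std::size_t len, std::string &out) = 0;
    virtual void encodeChunk(const char *in, std::size_t len, std::string &out) = 0;
    virtual void encodeEnd(const char *in, std::size_t len, std::string &out) = 0;
    virtual void decodeStart(const char *in, std::size_t len, std::string &out) = 0;
    virtual void decodeChunk(const char *in, std::size_t len, std::string &out) = 0;
    virtual void decodeEnd(const char *in, std::size_t len, std::string &out) = 0;
};

// Hypothetical pass-through coder, useful only for exercising the plumbing.
class IdentityCoder : public HttpContentCoder {
    static void copy(const char *in, std::size_t len, std::string &out) {
        if (len > 0)
            out.append(in, len);
    }
public:
    void encodeStart(const char *in, std::size_t len, std::string &out) override { copy(in, len, out); }
    void encodeChunk(const char *in, std::size_t len, std::string &out) override { copy(in, len, out); }
    void encodeEnd(const char *in, std::size_t len, std::string &out) override { copy(in, len, out); }
    void decodeStart(const char *in, std::size_t len, std::string &out) override { copy(in, len, out); }
    void decodeChunk(const char *in, std::size_t len, std::string &out) override { copy(in, len, out); }
    void decodeEnd(const char *in, std::size_t len, std::string &out) override { copy(in, len, out); }
};

// Per-coded-object state with the two fields named in the design.
struct ContentCoderState {
    HttpContentCoder *coder = nullptr;  // not owned here
    off_t codedOffset = 0;              // assumed meaning: coded bytes produced so far

    // Hypothetical helper a codeStreamRead()-style caller might use.
    void encode(const char *in, std::size_t len, std::string &out) {
        assert(coder != nullptr);
        const std::size_t before = out.size();
        coder->encodeChunk(in, len, out);
        codedOffset += static_cast<off_t>(out.size() - before);
    }
};
```

Keeping the offset in the per-object state rather than in the coder lets one coder implementation serve many concurrent streams, each with its own position in the coded output.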
  A new httpHeaderGetContentEncoding(HttpReply *) function in HttpHeader.cc.
  A new configuration flag to turn content-encoding off, if desired.
  A new object flag, "encoded". Whenever an encoded or decoded object is created, it is tagged as "encoded", so an object that was decoded locally can be recognized as such.
  A new store.cc function, storeDeleteCodedCopies(), to delete all (un)coded copies as described above.

Gzip

A new GzipContentCoder module, which will be an implementation of HttpContentCoder. Data encoding and decoding will be handled by the gzip.org zlib library (http://www.gzip.org/zlib/). Functions:
  gzEncodeStart: call deflateInit2(), write header
  gzEncodeEnd: write trailer
  gzEncodeChunk: call deflate()
  gzDecodeStart: call inflateInit2(), read and verify header
  gzDecodeEnd: verify trailer
  gzDecodeChunk: call inflate()
  gzDoSaveEncoded(): true

Test Strategy

Must pass the test suite. Appropriate tests must be added, including successfully sending gzipped content to oneself. The implementation will also be tested against the Apache mod_gzip implementation, and possibly against gunzip.
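gzEncodeStart and gzEncodeEnd are described as writing the gzip header and trailer. To make those bytes concrete (RFC 1952), here is an illustrative, dependency-free sketch that emits a valid gzip stream using stored (uncompressed) deflate blocks. It is not the proposed GzipContentCoder, which would call zlib's deflate(); GzipStoredWriter and crc32Update are hypothetical names. Note that zlib can write the gzip header and trailer itself when deflateInit2() is given a windowBits value of 16 + 15, and inflateInit2() accepts 16 + 15 (gzip only) or 32 + 15 (automatic zlib/gzip detection), which may make the manual header and trailer steps unnecessary.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CRC-32 as used in the gzip trailer (reflected polynomial 0xEDB88320).
// Pass 0 for the first call and the previous result to continue over more data.
std::uint32_t crc32Update(std::uint32_t crc, const std::uint8_t *data, std::size_t len) {
    crc = ~crc;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

// Illustrative gzip writer: valid gzip framing around uncompressed stored blocks.
class GzipStoredWriter {
public:
    // gzEncodeStart analogue: 10-byte member header (RFC 1952, section 2.3).
    void start(std::vector<std::uint8_t> &out) {
        const std::uint8_t header[10] = {
            0x1f, 0x8b,   // ID1, ID2
            8,            // CM = deflate
            0,            // FLG: no optional fields
            0, 0, 0, 0,   // MTIME: not available
            0,            // XFL
            255};         // OS: unknown
        out.insert(out.end(), header, header + 10);
    }

    // gzEncodeChunk analogue: wrap input in non-final stored blocks of at most 65535 bytes.
    void chunk(const std::uint8_t *data, std::size_t len, std::vector<std::uint8_t> &out) {
        crc_ = crc32Update(crc_, data, len);
        isize_ += static_cast<std::uint32_t>(len);  // ISIZE is the input size modulo 2^32
        while (len > 0) {
            const std::uint16_t n = static_cast<std::uint16_t>(len > 65535 ? 65535 : len);
            storedBlock(false, data, n, out);
            data += n;
            len -= n;
        }
    }

    // gzEncodeEnd analogue: empty final block, then CRC32 and ISIZE (little-endian).
    void end(std::vector<std::uint8_t> &out) {
        storedBlock(true, nullptr, 0, out);
        putLE32(crc_, out);
        putLE32(isize_, out);
    }

private:
    std::uint32_t crc_ = 0;
    std::uint32_t isize_ = 0;

    static void putLE32(std::uint32_t v, std::vector<std::uint8_t> &out) {
        for (int i = 0; i < 4; ++i)
            out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
    }

    // Stored block: BFINAL bit, BTYPE = 00, pad to byte boundary, LEN, NLEN, raw bytes.
    static void storedBlock(bool last, const std::uint8_t *data, std::uint16_t n,
                            std::vector<std::uint8_t> &out) {
        const std::uint16_t nlen = static_cast<std::uint16_t>(~n);
        out.push_back(static_cast<std::uint8_t>(last ? 1 : 0));
        out.push_back(static_cast<std::uint8_t>(n & 0xff));
        out.push_back(static_cast<std::uint8_t>(n >> 8));
        out.push_back(static_cast<std::uint8_t>(nlen & 0xff));
        out.push_back(static_cast<std::uint8_t>(nlen >> 8));
        if (n > 0)
            out.insert(out.end(), data, data + n);
    }
};
```

Because its output is plain, predictable gzip, a writer like this could also help build known-good fixtures for the "sending gzipped content to oneself" tests, and its output should be decodable by gunzip.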