On 07/11/2014 05:47 PM, Alex Rousskov wrote:
On 07/11/2014 05:27 AM, Tsantilas Christos wrote:
The PageSpeed example is a better fit for a post-cache RESPMOD feature.
I do not think so. Post-cache RESPMOD does not allow Squid to cache the
adapted variants. Please let me know if I missed how post-cache RESPMOD
can do that.
I misread the problem you want to solve. I had in mind a proxy that
caches the original content and then adapts the cached content
according to client rules.
But you want to cache adapted content.
However, I am still not sure I understand how post-cache REQMOD
will help.
Assume the following scenario:
- Client A requests original web page
- Client B requests optimized web page (removed spaces and comments)
I am expecting a solution that stores two copies of the web page in
the cache: the optimized copy and the original copy.
One solution could be a mechanism similar to Vary headers: for example,
define an ICAP header that should be included in Vary.
I did not look at the StoreID feature, but it can probably be used for
the same purpose.
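
To make the two-copy scenario concrete, here is a minimal Python sketch of a cache keyed on the URL plus the value of a variant-selecting request header, in the spirit of Vary. The header name X-Adapt-Variant and the VariantCache class are hypothetical, invented only for illustration; this is not how the Squid store or StoreID is implemented.

```python
class VariantCache:
    """Toy cache storing one entry per (URL, variant) pair."""

    def __init__(self, variant_header="X-Adapt-Variant"):
        # Hypothetical header name; a real deployment would pick its own.
        self.variant_header = variant_header
        self.entries = {}

    def key(self, url, request_headers):
        # Requests without the header map to the "original" variant.
        variant = request_headers.get(self.variant_header, "original")
        return (url, variant)

    def store(self, url, request_headers, body):
        self.entries[self.key(url, request_headers)] = body

    def lookup(self, url, request_headers):
        return self.entries.get(self.key(url, request_headers))


cache = VariantCache()
url = "http://example.com/index.html"

# Client A asks for the original page, client B for the optimized one.
cache.store(url, {}, "<html>  <!-- comment -->  </html>")
cache.store(url, {"X-Adapt-Variant": "optimized"}, "<html></html>")
```

With this keying, both copies live in the cache side by side, and each client gets the copy matching its own request headers.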
The key here is that PageSpeed and similar services want to create (and
cache) many adapted responses out of a single virgin response. Neither
HTTP itself nor the Squid architecture support that well. Post-cache
REQMOD allows basic PageSpeed support (the first request for "small"
adapted content gets "large" virgin content, but the second request for
small content fetches it from the PageSpeed cache, storing it in Squid
cache). To optimize PageSpeed support further (so that the first request
can get small content), we will need to add another generally useful
feature, but I would rather not bring it into this discussion (there
will be a separate RFC if we get that far).
I probably do not understand well how PageSpeed works or what a
PageSpeed cache means. But in the above scenario, it looks like Squid
will store only one version of the content (the small content).
Is this all that is required?
What am I missing?
The alternative is to create a completely new interface (not a true
vectoring point) that allows an adaptation service to push multiple
adapted responses into the Squid cache _and_ tell Squid which of those
responses to use for the current request. While I have considered
proposing that, I still think we would be better off supporting
"standard" and "well understood" building blocks (such as standard
adaptation vectoring points) rather than such highly-specialized
interfaces. Please let me know if you disagree.
Is post-cache REQMOD just a first step towards supporting all
post-cache vectoring points?
You can certainly view it that way, but I do not propose or promise
adding post-cache RESPMOD :-).
Thank you,
Alex.
On 07/11/2014 01:15 AM, Alex Rousskov wrote:
Hello,
I propose adding support for a third adaptation vectoring point:
post-cache REQMOD. Services at this new point receive cache miss
requests and may adapt them as usual. If a service satisfies the
request, the service response may get cached by Squid. As you know,
Squid currently supports pre-cache REQMOD and pre-cache RESPMOD.
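
As a rough illustration of where the proposed point would sit, the following Python sketch models a single request: a cache hit never reaches the new service, while a cache miss is offered to it before anything is forwarded. All names here (handle_request, demo_service, demo_origin, the "satisfied"/"forward" verdicts) are assumptions made up for this sketch, and the existing pre-cache points are left out for brevity. It is not Squid code.

```python
def handle_request(request, cache, post_cache_reqmod, fetch_from_origin):
    """Serve one request; returns (body, source)."""
    key = request["url"]
    if key in cache:
        # Cache hit: the post-cache REQMOD service is never consulted.
        return cache[key], "squid-cache"

    # Cache miss: give the post-cache REQMOD service a chance.
    verdict, payload = post_cache_reqmod(request)
    if verdict == "satisfied":
        # The service answered the request itself; its response may be cached.
        cache[key] = payload
        return payload, "service"

    # Otherwise payload is the (possibly adapted) request to forward.
    body = fetch_from_origin(payload)
    cache[key] = body
    return body, "origin"


def demo_service(request):
    # Hypothetical service: answers /banner itself, tags everything else.
    if request["url"].endswith("/banner"):
        return "satisfied", "banner from service"
    headers = {**request.get("headers", {}), "X-Seen": "1"}
    return "forward", dict(request, headers=headers)


def demo_origin(request):
    return "origin body for " + request["url"]


cache = {}
banner = {"url": "http://example.com/banner"}
first = handle_request(banner, cache, demo_service, demo_origin)
second = handle_request(banner, cache, demo_service, demo_origin)
third = handle_request({"url": "http://example.com/page"}, cache,
                       demo_service, demo_origin)
```

Here the first banner request is satisfied by the service and cached, the second is a plain cache hit, and the page request is adapted and then forwarded to the origin.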
We have received many requests for post-cache adaptation support
throughout the years, and I personally resisted the temptation of adding
another layer of complexity (albeit an optional one) because it is a lot
of work and because many use cases could be addressed without post-cache
adaptation support.
The last straw (and the motivation for this RFC) was PageSpeed[1]
integration. With PageSpeed, one can generate various variants of
"optimized" content. For example, mobile users may receive smaller
images. Apache and Nginx support PageSpeed modules. It is possible to
integrate Squid with PageSpeed (and similar services) today, but it is
not possible for Squid to _cache_ those generated variants unless one is
willing to pay for another round trip to the origin server to get
exactly the same unoptimized content.
The only way to support Squid caching of PageSpeed variants without
repeated round trips to the origin server is using two Squids. The
parent Squid would cache origin server responses while the child Squid
would adapt parent's responses and cache adapted content. Needless to
say, running two Squids (each with its own cache) instead of one adds
significant performance/administrative overheads and complexity.
As far as internals are concerned, I am currently thinking of launching
an adaptation job for this vectoring point from FwdState::Start(). This
way, its impact on the rest of Squid would be minimal and some adapters
might even affect FwdState routing decisions. The initial code name for
the new class is MissReqFilter, but that may change.
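
The point about routing can be sketched as follows: if adaptation runs at the very start of forwarding, the next hop is chosen from the adapted request rather than the original one. This is a toy Python model, not the proposed C++ design. Only the MissReqFilter name comes from the text above; its interface, the next_hop_hint field, and the helper functions are hypothetical.

```python
class MissReqFilter:
    """Toy stand-in for the proposed class; runs adapters on a miss request."""

    def __init__(self, adapters):
        self.adapters = adapters

    def run(self, request):
        # Apply each adapter in order; each returns a (possibly new) request.
        for adapt in self.adapters:
            request = adapt(request)
        return request


def select_next_hop(request):
    # Hypothetical routing rule: honor a hint set by an adapter, else go direct.
    return request.get("next_hop_hint", "DIRECT")


def start_forwarding(request, miss_filter):
    # Adaptation runs before routing, so adapters can influence the next hop.
    adapted = miss_filter.run(request)
    return adapted, select_next_hop(adapted)


def route_images_to_optimizer(request):
    # Hypothetical adapter: steer image requests to an optimizing peer.
    if request["url"].endswith(".png"):
        return dict(request, next_hop_hint="optimizer-peer")
    return request


miss_filter = MissReqFilter([route_images_to_optimizer])
_, image_hop = start_forwarding({"url": "http://example.com/a.png"}, miss_filter)
_, page_hop = start_forwarding({"url": "http://example.com/a.html"}, miss_filter)
```

In this model the image request is routed to the optimizing peer while the HTML request goes direct, which is the kind of influence on routing decisions mentioned above.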
The other candidate location for plugging in the new vectoring point is
the Server class. However, that class is already complex. It handles
communication with the next hop (with child classes doing
protocol-specific work and confusing things further) as well as
the pre-cache RESPMOD vectoring point with caching initiation on top of
that. The Server code already has trouble distinguishing various content
streams it has to juggle. I am worried that adding another vectoring
point there would make that complexity significantly worse.
It is possible that we would be able to refactor/encapsulate some of the
code so that it can be reused in both the existing Server and the new
MissReqFilter classes. I will look out for such opportunities while
trying to keep the overall complexity in check.
Any objections to adding post-cache REQMOD or better implementation
ideas?
Thank you,
Alex.
[1] https://developers.google.com/speed/pagespeed/