On 2014-09-18 9:24, Alexander Todorov wrote:
> I'm running an experiment which needs to collect http objects (html
> pages, images, CSS, JavaScript, etc) and store them in some easy to
> access/analyze structure. Something like:
> .../device-mac-addr/timestamp/url-or-domain-would-be-nice/content/
> under content/ goes
> * the actual content
> * the headers
> * any referenced content in a subdir if this is an HTML page
Hi, Alex,
The purpose of a cache is to serve collected HTTP resources to large
numbers of clients as quickly as possible, while minimizing duplication
and keeping the content as fresh as possible; this is then complicated
by the HTTP Vary mechanism.
What you're asking for is very different from this, so mod_cache_disk
is not a good fit. For example, a MAC address is irrelevant to a cache
lookup, and a timestamp in the path would be actively harmful, because
a later request for the same URL could not find the stored copy.
mod_cache_disk does, however, use the Host header, port, URL path and
query string to create a hash that it uses for its filenames and
directory names -- this permits mod_cache_disk to find cached resources
quickly while avoiding problems with URL length or special characters in
filenames.
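To illustrate the general idea of hashing a request key into a fixed-length,
filesystem-safe path, here is a minimal Python sketch. It is not
mod_cache_disk's actual algorithm or on-disk layout: the key format, the
choice of SHA-256, the two-level directory split and the function name
cache_path are all assumptions made for illustration.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(root, host, port, path, query="", levels=2, length=2):
    """Map a request key to a hashed path (illustrative layout only)."""
    # Hypothetical key format; the real module builds its own key.
    key = f"{host.lower()}:{port}{path}?{query}"
    # Hex digest: fixed length, and only the characters 0-9 and a-f,
    # no matter how long or unusual the URL is.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    # The leading characters become nested directory names so that no
    # single directory ends up holding every cached file.
    dirs = [digest[i * length:(i + 1) * length] for i in range(levels)]
    filename = digest[levels * length:]
    return PurePosixPath(root, *dirs, filename)
```

The same host, port, path and query always yield the same path, which is
what makes lookups fast; it is also why such a layout cannot be browsed by
URL or by date.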
In case it is helpful, you can see what is in the cache by running the
command "htcacheclean -a -D -p/path/to/your/disk/cache". You can also
get more detailed information by using the "-A" option instead of "-a".
You could then use the output from this as an index to what is in the
cache at a particular point in time. See
https://httpd.apache.org/docs/2.4/programs/htcacheclean.html
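As a rough sketch of turning that listing into an index, the following
Python groups the URLs by host name. It assumes the "-a" output contains
one URL per line; the function name index_by_host and the grouping by
host are just illustrative choices, not anything htcacheclean provides.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def index_by_host(listing):
    """Group the URLs in a text listing (one URL per line) by host name."""
    index = defaultdict(list)
    for line in listing.splitlines():
        url = line.strip()
        if not url:
            continue  # skip blank lines
        # Fall back to a placeholder if the line has no parsable host.
        host = urlsplit(url).hostname or "(unknown)"
        index[host].append(url)
    return dict(index)


# Possible usage (the cache path is the placeholder from the command above):
#   import subprocess
#   out = subprocess.run(
#       ["htcacheclean", "-a", "-D", "-p/path/to/your/disk/cache"],
#       capture_output=True, text=True, check=True,
#   ).stdout
#   index = index_by_host(out)
```

Running this periodically and saving each result under a timestamp would
give a series of snapshots of the cache contents, though not the content
itself.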
If this doesn't meet your need, you might want to look into writing your
own module to do exactly what you need for your experiment.
--
Mark Montague
m...@catseye.org