Robert Collins wrote:
On Tue, 2003-10-21 at 21:31, Martin Ritchie wrote:

Sorry if this is a total newbie question, but I want to store the actual page content in a database. Has anyone out there done anything like this? Do you have any pointers on where I should start?


Well, there are a few approaches. The simplest would be to tail
store.log and copy out the objects as they are completed. You can use
ufsdump in the squid3 sources (cd src && make ufsdump) as a sample
application for examining a single cached object. Only a little work
would be needed to list all the metadata and the byte offset where the
actual data starts - from there you can insert it into your database.
(Be sure to take a local copy (not a hardlink) first, so as to minimise
the occurrences of the object being recycled before you get to it. You
can't do that with COSS, though.) A second approach would be a hacked
squid with an external call-out of some sort - perhaps iCap, although
the iCap patches are still only for 2.5.
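[For the first approach, a rough sketch of the tail-and-copy idea in Python - my illustration, not code from this thread. The store.log field positions, the SWAPOUT action name, the file-number encoding, and the default L1=16/L2=256 UFS cache_dir layout are all assumptions that should be checked against your Squid version before relying on this:]

```python
import os
import shutil
import time

CACHE_DIR = "/var/spool/squid"     # assumption: your cache_dir path
STORE_LOG = "/var/log/squid/store.log"
DEST_DIR = "/tmp/squid-copies"
L1, L2 = 16, 256                   # default UFS directory fan-out

def swap_path(fileno):
    """Map a swap file number to its on-disk path (default UFS layout)."""
    d1 = (fileno // L2 // L2) % L1
    d2 = (fileno // L2) % L2
    return os.path.join(CACHE_DIR, "%02X" % d1, "%02X" % d2,
                        "%08X" % fileno)

def follow(path):
    """Yield lines appended to a file, like `tail -f`."""
    with open(path) as f:
        f.seek(0, os.SEEK_END)
        while True:
            line = f.readline()
            if not line:
                time.sleep(0.5)
                continue
            yield line

def main():
    os.makedirs(DEST_DIR, exist_ok=True)
    for line in follow(STORE_LOG):
        fields = line.split()
        # assumption: fields[1] is the action and fields[2] the hex file
        # number; the exact layout varies between Squid versions.
        if len(fields) > 2 and fields[1] == "SWAPOUT":
            try:
                fileno = int(fields[2], 16)
            except ValueError:
                continue
            src = swap_path(fileno)
            if os.path.exists(src):
                # a real copy, not a hardlink, so Squid recycling the
                # slot doesn't pull the data out from under us
                shutil.copy(src, os.path.join(DEST_DIR, "%08X" % fileno))

# main()  # uncomment to start following store.log
```

[From the copied file you would then use ufsdump-style parsing to skip the metadata header and feed the body, plus URL and headers, into the database.]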

My CVS HEAD ufsdump doesn't want to compile; I'm getting a number of multiple-definition errors involving several comm_select methods. I'm still new to C++, so please go easy on me. I'm also not sure that even getting this working will solve our problem, as only cached pages will be in the cache.


If I want to go for the second approach of 'patching' squid with an external call, where would I start? Are 2.5 and iCap the best approach, or should I be looking at v3?

I guess the HTML is sent to the client as it arrives, but is it ever available fully in memory? And is it possible to add DB processing once the content has been fully retrieved?

tia




-- Martin Ritchie

the Kelvin Institute
50, George Street

+44 (0) 141 548 5719
