On 09/08/2012 01:36 AM, Alex Rousskov wrote:
On 09/07/2012 01:32 PM, Eliezer Croitoru wrote:
On 09/07/2012 05:10 PM, Alex Rousskov wrote:

     I am not sure what "option" you are referring to in the above. The
     Store::get(key) API I have described is not optional -- it is the
     primary way of detecting a hit.

+1 to use the api.
are we talking about the "Store::Root().get" ?
That is where it starts, but other Store kids implement the get() method
as well IIRC. From the caller perspective, you should also look at
storeGetPublic*() and similar functions.
basic review Done already.
     The URL rewriting must happen before or during Store key calculation.

The problem is that if I am rewriting the url, unlike
url_rewrite\redirect the connection should be initiated based on the
original url and not the rewritten one.
Sure, but that is not a problem you need to worry about much. Since you
are not going to alter the original URL, the connection initiation code
will not be affected by your changes (except for some minor adjustments
if some data members get renamed or Store::get() API-related calls get
adjusted as discussed below).
Got my answers during the review.


What I do not know yet is what and where and based on what the request
is done.
For this project, you do not need to know much about connection
initiation code. You need to make a copy of the original URL, rewrite
that copy using the new helper, and put it in a new HttpRequest data
member (for example). This should probably happen (or be initiated from)
client_side.cc or client_side_request.cc.
Well after the review it seems like there is no need to copy the request but to use the char * storeurl. it makes more sense since you never touch any other part of the object while you are getting it.
You then need to make sure that every time Store calculates a store key,
the new member (if any) is used instead of the original URL.
this is the original reason I wanted to know\find about the initiation code.
but I managed to find what I need for now.

Since the current code is using the original request URL where you want
the rewritten/store URL to be used, I recommend _renaming_ all members
that keep or extract the original/request URL (e.g., MemObject::url).
This will force you to go through the relevant callers and make sure
that the right URL is used, depending on the caller context.
Exactly what I was getting into.
Once you are done, we can review whether the vast majority of changes in
the diff are a simple renaming of "url" to "requestUrl". If yes, you can
revert that renaming change. If there are many cases where old "url"
became "storeUrl" and many cases where it became "requestUrl" then the
change should stay.

Please note that the above is just a sketch. I am not suggesting any
specific new names, and I am using existing class names only as an example.
Will look into it more.
as far as I can tell most of the current code MemObject::url calls are from debug.



so we need to store the original url and the store_url at least for the
period of time the request is being served.
Yes, we need to keep both URLs around during the transaction lifetime.
When I was talking about storing a URL, I was only talking about Store
(i.e., storing something in the cache).
A reason I can think of is to make it possible to use mgr:store_url_option that will maybe give some data about in cache objects such as urls etc. For most cases the access.log is what will be used but if there is a need it's for it it's for that.

by the way the Purge tool uses the URL part of the metadata to find urls and purge them in a ufs cache_dir. since I stored objects in my cache with URL meta data as the rewritten one I could do lookups on the cache objects using purge. It gives you the ability to find out exactly at this moment what objects urls are in the cache and in my case also rewritten ones.


(answer about storing or not the store_url in meta)
That is a separate question. I cannot answer it, unfortunately. I do not
know why Squid2 stored either of those URLs. I know Squid3 does not
store the original/request URL.
It does in ufs cahe_dir as a fact. (see the below quotation from a cache file)


url_rewrite api gets the http requests as the background to any action
so it cannot be used for store_url since any change to the request will
lead to the change we are trying to not do exactly.

mem_object contains:
{...
char *url;
HttpRequest *request;
...}
and the url is being used to make the hash but I still dont know what is
being taken from the mem_object to do the request.
HttpStateData::buildRequestPrefix() builds request headers. It may use
mem_object::url. I do not think you should change that code (apart from
renaming and Store::Get() API changes discussed elsewhere).

You should only be concerned with calls to storeGetPublic*() and such.
Make sure the callers do not use mem_object::url without checking
whether there is a "rewritten" store URL as well. You will probably want
to change profile of most of those calls so that the caller does not
need to worry about picking the right URL to use (the decision would
happen inside storeGetPublic() and friends).
Got it on the review.
How is that metadata used in Squid2? What is it used for? Nobody is
suggesting that Squid2 does something wrong here, of course. We just
need to understand _what_ it does so that we can either mimic what it
does in Squid3 or do something better.
Better +1
since squid3 has another approach to the code and I can see it.

even a mimic for me is more then just "cp /.unknown /.superunknown_partly_known"

since couple hours ago I noticed that it was done in order to make
things possible and not really to think in a more wide angle about the
effect of it.
What things does it make possible?
what I meant was to solve the problem of de-duplication.
It was the soul purpose of the specific project\feature and it was late added. I can assume that maybe then the approach was to store in meta data thing for a purpose. if somebody do knows why the original url was embedded into the meta data then it would maybe answer on the possible reason for adding the storeurl into it.

I hope we will not have to commit some code without understanding its
effects.
Not a chance we will commit anything with this effect unless I will actually understand what it effects.
I dont like to commit untested things.
IIRC, when Squid rebuilds in-memory index from swap.state or from the
directory directly, it does not need URLs. It needs keys which are
stored in swap.state and in directory entries.
Good.
and means no helper in any part of the way should be except request from: http,icp,htcp.(all methods)

If a corrupted swap.state entry is detected, it is removed/ignored. If
the whole swap.state is missing or its header is corrupted, then we
rebuild the in-memory index from disk instead. Please note that not all
cache_dir types use swap.state.
OK, but if the data is in the meta data it answers all the questions needed from many angles of the glass. 1. we do not in any was need to store in the meta data the store url beacuse all that is needed is seems in it already. I tested the thing about the meta data and the results for the trunk(updated for 2\3 days ago)

> f = File.open('../root/var/cache/squid/00/00/0000002A')
>l1 = f.gets
=> "\u0003\xAE\u0000\u0000\u0000\u0003\u0010\u0000\u0000\u0000\xFAiUQ|\x918\xFFs\u007F\x86-7p;^\t,\u0000\u0000\u00001\xCAIP\u0000\u0000\u0000\u00001\xCAIP\u0000\u0000\u0000\u0000\xFF\xFF\xFF\xFF\xFF\xFF\xFF\xFF\v\xC0\xFCL\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000`\u0004\u0004B\u0000\u0000\u0000http://walla.yad2.co.il/walla.co.il/js/function.js?rnd=1624874682\u0000\u0004\n";
> l2 = f.gets
=> "\u0000\u0000\u0000STORETEST\u0000\n"
# the above lines is what originally stored in the meta data of a ufs cache_dir while testing with a simple small addition of STORETEST to make sure what the fate of this meta data.

the next is from a 3.head original code:
> f = File.open('../root/var/cache/squid/00/00/00000000')
> l1 = f.gets
=> "\u0003\xA5\u0000\u0000\u0000\u0003\u0010\u0000\u0000\u0000+\eH\u001Ay\\\xEBC\x86\xBE[\xDBq\u001E\u0005\xEA\t,\u0000\u0000\u0000\xB2\xE4JP\u0000\u0000\u0000\u0000\xB3\xE4JP\u0000\u0000\u0000\u0000\x87\xF4\u001Fr\u0000\u0000\u0000\u0000\xD2KJP\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001\u0000`\u0004\u0004H\u0000\u0000\u0000http://l1.yimg.com/dh/ap/default/120907/yangtze_river_red_getty_uni.jpg\u0000\n";
> l2 = f.gets
=> "\b\u0000\u0000\u0000\x98U\u0000\u0000\u0000\u0000\u0000\u0000HTTP/1.0 200 OK\r\n"

so it's clear that also there the original url is stored in meta data.
again dont know about other stores form.

in the storeSwapMetaBuild() at store_swapmeta.cc it's clearly used to store the original url.
about the other stores I dont know since I dont use them.


HTH,

Alex.

Sure it is!

Eliezer

Reply via email to