Good stuff!

>
> bad word! bad word! Squid in this context is a "reverse proxy". To all
> intents and purposes, as far as the client can see, it *is* the web server.
>
> The various common meanings of "transparent" have nothing to do with it.
>

Hehe... correct terminology noted.

>> 3) It is protected behind local AUTH applications which perform
>> complex access checks before passing the request onto Squid
>
> You might be able to reduce your server box overheads by merging that into a
> Squid auth helper. Doing so may or may not be a big job.
>

Yup - big issue at this time - I might move in this direction at a later time.
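
For when I do get to it, my rough understanding is that the merge would
look something like the following - the helper path, ACL name, and format
tokens are placeholders of mine, and the helper itself would have to
reimplement the current access checks:

```
# Hypothetical auth-helper integration (names and paths are placeholders).
# The helper reads "%SRC %PATH" lines on stdin and answers OK or ERR.
external_acl_type complex_auth ttl=60 %SRC %PATH /usr/local/bin/auth-check
acl authed external complex_auth
http_access allow authed
http_access deny all
```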

>> #------
>> http_port 3128 act-as-origin accel vhost http11
>
> The use of vhost here forces Squid to process the Host: header and cache
> content under URLs built from that domain name.
>
> To meet criteria (4) "All documents ... cached as [http://127.0.0.1/URL]"
>
> You need to be using:
>
>  http_port 80 act-as-origin accel http11 defaultsite=127.0.0.1
>

Hmmm.  Removing vhost and setting dstdomain may be worthwhile (I'll
try it) but shouldn't the Squid listening port and the origin server
port be different?
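
In case it helps clarify my question, what I had in mind was Squid taking
over port 80 and the origin server moving off to another port - roughly
this (8080 is just an example port, and I'm assuming defaultsite= is the
right option for pinning the domain):

```
# Squid listens on 80; origin moved to e.g. 8080 on the same box
http_port 80 act-as-origin accel http11 defaultsite=127.0.0.1
cache_peer 127.0.0.1 parent 8080 0 name=localweb originserver no-query http11
```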

>> icp_port 3130
>> cache_dir ufs /cache/data 2048 16 256
>
> aufs please.
>

I'll test this out. Are there disadvantages to using aufs?
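
For my own notes, I gather the change is just a drop-in replacement of the
store type (assuming my Squid build has the aufs store module compiled in):

```
# Same layout as before, but with the threaded async store
cache_dir aufs /cache/data 2048 16 256
```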


>>
>> ## Origin server
>> cache_peer 127.0.0.1 parent 80 0 name=localweb max-conn=250 no-query
>> no-netdb-exchange originserver http11
>> cache_peer_access localweb allow localnet
>> cache_peer_access localweb deny all
>> ## Sibling Caches
>> #   cache_peer [IP_OF_SIBLING_1] sibling 3128 3130 proxy-only
>> cache_peer [IP_OF_SIBLING_2] sibling 3128 3130 proxy-only
>> cache_peer [IP_OF_SIBLING_3] sibling 3128 3130 proxy-only
>> cache_peer [IP_OF_SIBLING_4] sibling 3128 3130 proxy-only
>>
>
> #1 rule of reverse proxies:
>   If the reverse-proxy access rules are not above the generic forward-proxy
> rules they risk producing false error pages.
>

Could you expand on this? Are you referring to the order of the
directives in the squid.conf or something else?
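
My current guess at what you mean, in squid.conf terms (the ACL name is my
own, and I'm assuming "localnet" is the existing forward-proxy ACL):

```
# Reverse-proxy (accelerator) rules first:
acl our_site dstdomain 127.0.0.1
http_access allow our_site

# Generic forward-proxy rules after:
http_access allow localnet
http_access deny all
```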

>> - Does the ICP sibling setup makes sense or will it limit the number
>> of servers in the cluster? Or should this be redesigned to work with
>> multiple parent caches instead of siblings? Or perhaps multicast ICP?
>> Or I could try digests?
>
> You want it to be scalable AND fast? multicast or digests.
>
> You want to maximize bandwidth capacity? digests or CARP.
>

I want it to solve world hunger... but I'll settle for scalable and
fast.  I think I'll give digests a try - but I have a concern
regarding the sheer number of cached items that will be in the digest.
Any advice or formulas on how to calculate digest size?
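
In the meantime, here is the digest-related config I'm planning to
experiment with, plus my back-of-envelope sizing (this assumes Squid was
built with cache-digest support, and the numbers are just defaults I've
read about):

```
# Requires a build with --enable-cache-digests
digest_generation on
digest_bits_per_entry 5
digest_rebuild_period 1 hour

# Rough sizing: digest_bytes ~= (objects_in_cache * bits_per_entry) / 8
# e.g. 1,000,000 objects * 5 bits / 8 ~= 625 KB per digest
```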

>>
>> - Would using 'icp_hit_stale' and 'allow-miss' improve hit-ratios
>> between the shards? Is there a way to force a given Squid server to be
>> the ONLY server storing a cached document (stale, fresh, or
>> otherwise)?
>
> icp_hit_stale  allows peers to say "I have it!" when what they really have
> is an old stale copy. Useful if the peer is close and the object can be
> served stale while a fresher one is fetched. Bad if it causes spreading of
> non-cacheable objects.
>
> allow-miss  allows peers to send the "I have it" message on stale objects
> and fetch a new copy from their fast source when they are asked for the full
> thing, thus refreshing the object in two caches instead of just one and
> mitigating the total effect of having that one fetch be extra slow.
>

Are these still relevant when using digests instead of ICP?
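
For completeness, my understanding of how those two are spelled in the
config - icp_hit_stale being a global directive and allow-miss a per-peer
option (sibling address is a placeholder as before):

```
icp_hit_stale on
cache_peer [IP_OF_SIBLING_2] sibling 3128 3130 proxy-only allow-miss
```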

>>
>> - I have been using this basic setup for about a month now and I am
>> getting strange squid access.log entries when the load goes up:
>>
>> 2009-04-04 11:13:47 504 GET "http://127.0.0.1:3128/[URL]" TCP_HIT NONE
>> 3018 0 "127.0.0.1" "127.0.0.1:3128" "-" "-"
>
> This is due to your website being hosted on 127.0.0.1 port 3128.
>
> The Host: header contains domain:port unless the port is the http default
> port 80.
>
> The new http_port line I gave you above should fix this as a by-product.
>

The current origin servers are listening on port 80 (unfortunately so
are the AUTH applications - in fact they are actually the same web
server instance).

I think my confusion may be related to the AUTH application and how it
is currently designed to handle request forwarding. The original plan
was to allow the AUTH application to determine whether it wants to
send the request through squid (on port 3128) or bypass it altogether
and forward the request directly to the origin webserver (on port 80)
on the same machine.

I want to make sure that the AUTH application can still get the
content from the local origin server even if the local Squid does not
come back with a response. Am I overthinking this?

---------------------------------------------------------

Thanks again for the info - very useful!

Rob
