Dan Jacobson <[EMAIL PROTECTED]> writes:

> Can we please have a way of dealing with sites where case doesn't
> matter, and over the years there have been all kinds of links made to
> the same file, that we click, and wastefully get the same file, or
> don't click, fearing a big download, while all along the file already
> was gotten by us long ago.
> 
> $ wwwoffle-ls http://www.dgt.gov.tw
> /Chinese/About-dgt/publication.shtml
> /CHINESE/About-dgt/publication.shtml
> /Chinese/About-dgt/Publication.shtml
> /Chinese/About-Dgt/publication.shtml...
> 
> Probably a declaration in wwwoffle.conf about what sites are to be
> considered case-insensitive, and map all wwwoffle transactions
> regarding those sites into all lower case (except %B3%E9 stuff? so
> maybe all upper case? but hard on eyes, so lower case?).
> 
> Thereafter, even links that have already been fetched would show
> proper color, despite their case.
> 
> We would probably also need to run a script to rename pages already in
> the cache to lower case, picking which ones of multiple copies
> (probably the newest one) we want to become the lower case version.

Even though I don't really like the idea, I think that it might work.

When I first saw the subject of this e-mail I was all prepared to tell
you how difficult it would be: that all of the cached files in
WWWOFFLE are stored with names that are generated by a one-way mapping
from the URL, and that it is not possible to check whether another URL
is already cached with a case-insensitive match without checking
against every cached file on that host.

You have diverted me from all that by thinking of the solution:
mapping all of the URLs to lower case before creating the filename,
but keeping them in the original form everywhere else.  Provided that
the cache is initially replaced with files whose names are generated
from the lower-case URLs there should be no problem.
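
As a rough illustration of that idea (not WWWOFFLE's actual code), the
lower-casing step could sit just in front of the one-way mapping.  In
this sketch MD5 merely stands in for whatever one-way mapping is used,
and CASE_INSENSITIVE_HOSTS and cache_filename are made-up names for a
list that would presumably come from wwwoffle.conf.

```python
import hashlib
from urllib.parse import urlsplit

# Hypothetical: hosts declared case-insensitive, e.g. read from wwwoffle.conf.
CASE_INSENSITIVE_HOSTS = {"www.dgt.gov.tw"}


def cache_filename(url):
    """Return a one-way cache name for url.

    MD5 is an assumption standing in for the real one-way mapping.
    The URL is lower-cased only to build the name; the caller keeps
    the original spelling for everything else.
    """
    host = urlsplit(url).hostname or ""  # hostname is already lower-cased
    key = url.lower() if host in CASE_INSENSITIVE_HOSTS else url
    return hashlib.md5(key.encode("utf-8")).hexdigest()


a = cache_filename("http://www.dgt.gov.tw/Chinese/About-dgt/publication.shtml")
b = cache_filename("http://www.dgt.gov.tw/CHINESE/About-dgt/publication.shtml")
print(a == b)  # both spellings share one cache entry
```

Hosts that are not listed keep their case-sensitive names, so only the
declared sites would need their existing cache entries renamed.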

... Unless ...

Are all of the files that you see in your cache really the same file?

I remember a web-site once that allowed you to use any case for the
URLs.  If you asked for http://www.foo/foo/bar/index.html it would
send you a re-direction for http://www.foo/Foo/bar/index.html and this
would send a redirection for http://www.foo/Foo/Bar/index.html and
this would then redirect you to http://www.foo/Foo/Bar/Index.html.  So
you needed to fetch all four of these files to get to the one that
held the prize, the real contents.

If you asked for http://www.foo/foo/bar/index.html and WWWOFFLE cached
it, the browser would then ask WWWOFFLE for
http://www.foo/Foo/bar/index.html and WWWOFFLE would say that the
existing lower-case URL matched, so it would send its contents back.
But this would be the same redirection that it had just sent to the
browser.  The browser could never get to the real content of the page.
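
The loop can be reproduced with a toy model.  Everything below is
hypothetical: REDIRECTS imitates the site described above, origin()
plays the remote server, and browse() plays a browser following
redirections through a cache whose keys are optionally lower-cased.

```python
# Hypothetical site: each URL redirects to a more capitalised spelling.
REDIRECTS = {
    "http://www.foo/foo/bar/index.html": "http://www.foo/Foo/bar/index.html",
    "http://www.foo/Foo/bar/index.html": "http://www.foo/Foo/Bar/index.html",
    "http://www.foo/Foo/Bar/index.html": "http://www.foo/Foo/Bar/Index.html",
}
START = "http://www.foo/foo/bar/index.html"


def origin(url):
    """Pretend remote server: a redirection, or the real contents."""
    if url in REDIRECTS:
        return ("redirect", REDIRECTS[url])
    return ("content", "the real contents")


def browse(start, fold_case, max_hops=10):
    """Follow redirections through a cache; None means we gave up."""
    cache = {}
    url = start
    for _ in range(max_hops):
        key = url.lower() if fold_case else url
        if key not in cache:
            cache[key] = origin(url)  # fetched once, then served from cache
        kind, value = cache[key]
        if kind == "content":
            return value
        url = value  # the browser follows the redirection
    return None


print(browse(START, fold_case=False))  # reaches the real contents
print(browse(START, fold_case=True))   # None: the same redirection every time
```

With case folding every spelling maps to the first cached entry, which
is itself a redirection, so the request never leaves the cache.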

You might say that you can't cache any redirection pages from sites
that are handled without case-sensitivity.  But this would not work
since a URL like http://www.foo/Foo/Bar/ will probably redirect the
browser to http://www.foo/Foo/Bar/Index.html, so it must be cached.  If
it is not cached then you end up with a new outgoing request for a URL
that you already have the contents of.

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html
