Dan Jacobson <[EMAIL PROTECTED]> writes:

> Can we please have a way of dealing with sites where case doesn't
> matter, and over the years there have been all kinds of links made to
> the same file, that we click, and wastefully get the same file, or
> don't click, fearing a big download, while all along the file already
> was gotten by us long ago.
>
> $ wwwoffle-ls http://www.dgt.gov.tw
> /Chinese/About-dgt/publication.shtml
> /CHINESE/About-dgt/publication.shtml
> /Chinese/About-dgt/Publication.shtml
> /Chinese/About-Dgt/publication.shtml...
>
> Probably a declaration in wwwoffle.conf about what sites are to be
> considered case-insensitive, and map all wwwoffle transactions
> regarding those sites into all lower case (except %B3%E9 stuff? so
> maybe all upper case? but hard on eyes, so lower case?).
>
> Thereafter, even links that have already been fetched would show
> proper color, despite their case.
>
> We would probably also need to run a script to rename pages already in
> the cache to lower case, picking which ones of multiple copies
> (probably the newest one) we want to become the lower case version.
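The clean-up step Dan describes (keep the newest of several case-variant copies as the lower-case version) can be sketched as below. This is only an illustration under assumptions: `CachedPage` and `pick_lowercase_survivors` are hypothetical names, and reading URLs and fetch times out of the real WWWOFFLE cache directories is not shown.

```python
from dataclasses import dataclass


@dataclass
class CachedPage:
    url: str      # URL exactly as it is stored in the cache (hypothetical record)
    mtime: float  # time this copy was fetched, in seconds since the epoch


def pick_lowercase_survivors(pages):
    """Group cached copies whose URLs differ only in case.

    Returns a dict mapping each lower-cased URL to the newest copy in its
    group; that copy is the one to rename, the others can be discarded.
    """
    survivors = {}
    for page in pages:
        key = page.url.lower()
        best = survivors.get(key)
        if best is None or page.mtime > best.mtime:
            survivors[key] = page
    return survivors


pages = [
    CachedPage("http://www.dgt.gov.tw/Chinese/About-dgt/publication.shtml", 100.0),
    CachedPage("http://www.dgt.gov.tw/CHINESE/About-dgt/publication.shtml", 300.0),
    CachedPage("http://www.dgt.gov.tw/Chinese/About-Dgt/publication.shtml", 200.0),
]
keep = pick_lowercase_survivors(pages)
for lower_url, page in keep.items():
    print(lower_url, "<-", page.url)
```

All three variants collapse into one lower-case entry, and the copy fetched at time 300 is the one chosen.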
Even though I don't really like the idea, I think that it might work.

When I first saw the subject of this e-mail I was all prepared to tell you how difficult it would be: that all of the cached files in WWWOFFLE are stored with names generated by a one-way mapping from the URL, and that it is not possible to check whether another URL is already cached under a case-insensitive match without checking against every cached file on that host.

You have diverted me from all that by thinking of the solution: map all of the URLs to lower case before creating the filename, but keep them in their original form everywhere else. Provided that the existing cache is first converted so that its files are named from lower-case URLs, there should be no problem.

... Unless ...

Are all of the files that you see in your cache really the same file?

I remember a web-site once that allowed you to use any case for the URLs. If you asked for http://www.foo/foo/bar/index.html it would send you a redirection to http://www.foo/Foo/bar/index.html, this would send a redirection to http://www.foo/Foo/Bar/index.html, and this would then redirect you to http://www.foo/Foo/Bar/Index.html. So you needed to fetch all four of these files to get to the one that held the prize, the real contents.

If you asked for http://www.foo/foo/bar/index.html and WWWOFFLE cached the reply, the browser would then ask WWWOFFLE for http://www.foo/Foo/bar/index.html, and WWWOFFLE would say that it matched the existing lower-case URL and send back its contents. But that would be the same redirection page that had just been sent to the browser, so the browser could never get to the real content of the page.

You might say that redirection pages from sites that are handled case-insensitively should simply not be cached. But this would not work, since a URL like http://www.foo/Foo/Bar/ will probably redirect the browser to http://www.foo/Foo/Bar/Index.html, so it must be cached.
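Both the proposed mapping and the redirection trap can be sketched in a few lines. The assumptions are as follows. `cache_key`, `browse` and `SERVER` are hypothetical names. An MD5 hex digest stands in for WWWOFFLE's real filename hashing. The set of case-insensitive hosts stands in for the wwwoffle.conf declaration Dan suggests, which is not an existing option.

```python
import hashlib
from urllib.parse import urlsplit


def cache_key(url, case_insensitive_hosts):
    """Return a hypothetical cache filename for `url`.

    The name is a one-way hash of the URL, so two URLs share a cache
    entry only if they hash identically. For hosts listed in
    `case_insensitive_hosts` the URL is lower-cased before hashing;
    the original URL is left untouched everywhere else.
    """
    host = urlsplit(url).hostname or ""
    if host in case_insensitive_hosts:
        url = url.lower()
    return hashlib.md5(url.encode("utf-8")).hexdigest()


# The case-insensitive server from the story: each request is redirected
# to a URL with one more capital letter until the real page is reached.
SERVER = {
    "http://www.foo/foo/bar/index.html": ("redirect", "http://www.foo/Foo/bar/index.html"),
    "http://www.foo/Foo/bar/index.html": ("redirect", "http://www.foo/Foo/Bar/index.html"),
    "http://www.foo/Foo/Bar/index.html": ("redirect", "http://www.foo/Foo/Bar/Index.html"),
    "http://www.foo/Foo/Bar/Index.html": ("page", "the real contents"),
}


def browse(url, case_insensitive_hosts, max_hops=10):
    """Follow redirections through a fresh cache; None means we gave up."""
    cache = {}
    for _ in range(max_hops):
        key = cache_key(url, case_insensitive_hosts)
        if key not in cache:               # miss: fetch from the server
            cache[key] = SERVER[url]
        kind, value = cache[key]           # hit: reuse the stored reply
        if kind == "page":
            return value
        url = value                        # follow the redirection


print(browse("http://www.foo/foo/bar/index.html", set()))        # the real contents
print(browse("http://www.foo/foo/bar/index.html", {"www.foo"}))  # None
```

With case folding turned on, every URL in the chain hashes to the same entry. The first cached redirection is therefore replayed on every hop and the real page is never reached. Without folding, the four distinct entries lead to the contents as expected.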
If it is not cached then you end up with a new outgoing request for a URL that you already have the contents of.

-- 
Andrew.

----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html
