Paul Slootman <[EMAIL PROTECTED]> writes:

> Apparently http://archive.neotonic.com/archive/ still doesn't work.
> I've traced what's exchanged:
> 
> request:
>     GET /archive/ HTTP/1.1
...
>     Accept-Encoding: x-gzip; q=1.0, gzip; q=1.0, x-deflate; q=0.9, deflate; q=0.9, identity; q=0.1
>     TE: chunked
> 
> response:
>     HTTP/1.1 200 OK
...
>     Content-Encoding: deflate
>     Transfer-Encoding: chunked
>     Content-Type: text/html
> 
>     621
>     �Xmo�6[...unreadable...]
> 
> I get a blank page, and this error is logged:
> 
> wwwoffles[28773]: Error reading reply body from remote host [IO(zlib): unknown compression method].

After all of the problems with handling compressed data in previous
versions of WWWOFFLE, I have spent a lot of time looking at the types
of reply data that servers actually send.  The method used by WWWOFFLE
handles four different formats.

1) If gzip is used then it is easy to recognise since the first two
   bytes always have the values 0x1f 0x8b.  The data here is not of
   that format.

2) If deflate is used then the HTTP specification says that there
   should be a zlib header present that describes the compression
   options.  This header can be recognised by the relationship between
   the first two bytes: read as a 16-bit big-endian number they are
   always a multiple of 31 (a check value).

3) In reality many servers that use deflate compression omit the two
   byte zlib header.  This is contrary to the HTTP specification, but
   seems to be the common practice.

4) The other format that I have found is generated by certain versions
   of PHP that claim to provide deflate compression, but in truth use
   a 10 byte gzip header followed by the 2 byte zlib header.

Any format that WWWOFFLE receives that cannot be recognised by the
first two bytes is assumed to be deflate compression using the default
options (3).
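
To illustrate the checks described above, here is a minimal sketch in
Python (not the C code that WWWOFFLE actually uses).  The function
names classify() and decompress() are made up for this example, and
the handling of format (4) as a fallback after a failed gzip decode is
an assumption about one way to do it, not a description of WWWOFFLE.

```python
import zlib

def classify(body: bytes) -> str:
    """Guess the compression format from the first two bytes only."""
    if len(body) >= 2:
        b0, b1 = body[0], body[1]
        # (1) gzip: fixed magic bytes 0x1f 0x8b.
        if b0 == 0x1F and b1 == 0x8B:
            return "gzip"
        # (2) zlib header: method nibble is 8 (deflate) and the two
        # bytes, read as a big-endian 16-bit value, are a multiple of 31.
        if (b0 & 0x0F) == 8 and ((b0 << 8) | b1) % 31 == 0:
            return "zlib"
    # (3) anything unrecognised is assumed to be headerless deflate.
    return "raw-deflate"

def decompress(body: bytes) -> bytes:
    """Decompress a reply body using the format guessed by classify()."""
    kind = classify(body)
    if kind == "gzip":
        try:
            # wbits = 16 + MAX_WBITS tells zlib to expect a gzip wrapper.
            return zlib.decompress(body, 16 + zlib.MAX_WBITS)
        except zlib.error:
            # (4) assumed fallback: a 10 byte gzip header followed by a
            # zlib stream.  Skip the header and tolerate trailing bytes.
            if classify(body[10:]) != "zlib":
                raise
            return zlib.decompressobj().decompress(body[10:])
    if kind == "zlib":
        return zlib.decompress(body, zlib.MAX_WBITS)
    # A negative wbits value means raw deflate with no header or trailer.
    return zlib.decompress(body, -zlib.MAX_WBITS)
```

Data that is none of these formats makes zlib raise zlib.error from
the final call, which corresponds to the sort of failure reported in
the log line quoted above.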


In the case shown above WWWOFFLE would have assumed that the data is
deflate, but since zlib cannot decompress it (the error comes from
zlib, not from WWWOFFLE) it would seem that it is not.

If anybody else finds any sites that don't work with the WWWOFFLE
decompression functions in version 2.8 I would like to know the
address so that I can do some testing.


> Completely unrelated, a couple of other points:
> 
> - This is given twice in the conf/wwwoffle.conf file:
> 
> # [<URL-SPEC>] only-same-host-images = yes | no
> #         If the only images that are fetched are the ones that are on
> #         the same
> #         host as the page that references them (default=no).

Are you sure about this?  I don't see it.  The conf/wwwoffle.conf
file is automatically generated from the conf/wwwoffle.conf.template
and doc/README.CONF files.  Neither of these has the repeated text in
it (and nowhere does it have the strange formatting that you show).

> - upgrade-config.pl changes e.g.
> 
>     User-Agent = WWWOFFLE/2.7h
> 
>   into
> 
>     User-Agent = WWWOFFLE/2.8h
> 
>   That doesn't make much sense; I'd say forget about trailing letters in
>   the original.

I have updated the regular expression that it uses to match the old
version number so that it should pick up the right thing now.
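
As a rough illustration of the idea (in Python rather than the Perl of
upgrade-config.pl, and with a made-up pattern and function name rather
than the ones in the script), matching an optional trailing letter as
part of the old version means that it is replaced along with the
number:

```python
import re

# Hypothetical pattern: "WWWOFFLE/" followed by major.minor and an
# optional single trailing letter such as the "h" in "2.7h".
OLD_VERSION = re.compile(r"WWWOFFLE/\d+\.\d+[a-z]?")

def upgrade_user_agent(line: str, new_version: str) -> str:
    """Replace the whole old version string, trailing letter included."""
    return OLD_VERSION.sub(lambda m: "WWWOFFLE/" + new_version, line)
```

With this sketch "User-Agent = WWWOFFLE/2.7h" becomes
"User-Agent = WWWOFFLE/2.8" instead of keeping the stray "h".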

-- 
Andrew.
----------------------------------------------------------------------
Andrew M. Bishop                             [EMAIL PROTECTED]
                                      http://www.gedanken.demon.co.uk/

WWWOFFLE users page:
        http://www.gedanken.demon.co.uk/wwwoffle/version-2.8/user.html
