Hi Markus,

Which version of Nutch are you referring to? I'm not seeing this exact
code in the master branch.
Is this roughly the code you are referencing?
https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L304-L318

Thanks
lewismc

On Tue, Jul 30, 2024 at 8:14 AM <[email protected]> wrote:

> ---------- Forwarded message ----------
> From: Markus Jelsma <[email protected]>
> To: user <[email protected]>
> Cc:
> Bcc:
> Date: Tue, 30 Jul 2024 17:13:01 +0200
> Subject: Protocol-http not storing response headers
> Hi,
>
> Protocol-http does this (not storing HTTP response headers if the response
> is compressed):
>
>           // store the headers verbatim only if the response was not compressed
>           // as the content length reported does not match otherwise
>           if (httpHeaders != null) {
>             headers.add(Response.RESPONSE_HEADERS, httpHeaders.toString());
>           }
>           if (Http.LOG.isTraceEnabled()) {
>             Http.LOG.trace("fetched " + content.length + " bytes from " + url);
>           }
>
> And I do not agree with it. Almost all content is compressed now, so this
> will never work. We need the headers and response code stored for WARC
> export and do not care about an incorrect length header.
>
> Before patching this up and breaking that code out of the compression
> condition, I do ask myself: is that a good idea? I don't see okhttp having
> the same condition.
>
> Markus

-- 
http://people.apache.org/keys/committer/lewismc
