Hi Markus,

Which version of Nutch are you referring to? I'm not seeing this exact code in the master branch. Is this roughly the code you are referencing?
https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L304-L318

Thanks
lewismc

On Tue, Jul 30, 2024 at 8:14 AM <[email protected]> wrote:

> ---------- Forwarded message ----------
> From: Markus Jelsma <[email protected]>
> To: user <[email protected]>
> Cc:
> Bcc:
> Date: Tue, 30 Jul 2024 17:13:01 +0200
> Subject: Protocol-http not storing response headers
> Hi,
>
> Protocol-http does this (not storing HTTP response headers if the response
> is compressed):
>
>     // store the headers verbatim only if the response was not compressed
>     // as the content length reported does not match otherwise
>     if (httpHeaders != null) {
>       headers.add(Response.RESPONSE_HEADERS, httpHeaders.toString());
>     }
>     if (Http.LOG.isTraceEnabled()) {
>       Http.LOG.trace("fetched " + content.length + " bytes from " + url);
>     }
>
> And I do not agree with it. Almost all content is compressed now, so this
> will never work. We need the headers and response code stored for WARC
> export and do not care about an incorrect length header.
>
> Before patching this up and breaking that code out of the compression
> condition, I do ask myself: is that a good idea? I don't see okhttp having
> the same condition.
>
> Markus

--
http://people.apache.org/keys/committer/lewismc

