Re: Protocol-http not storing response headers

2024-08-01 Thread Markus Jelsma
Yeah, I've overwritten the Content-Length header with the length of the decompressed content byte array. Luckily, our clients are modest in what they demand of their WARCs. Many thanks, Markus On Wed, 31 Jul 2024 at 14:22, Sebastian Nagel wrote: > Hi Markus, > > >> And i do not agree
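Below is a minimal sketch of the header rewrite Markus describes. It is written in Java because Nutch is a Java project, but it does not use any Nutch classes. The headers are modeled as a plain case-insensitive Map, and the helper names (gunzip, gzip, rewriteForDecodedBody) are hypothetical. Dropping Content-Encoding is an assumption that goes beyond what the thread states. The reasoning is that once the stored body is decompressed, it is no longer gzip-encoded, so that header no longer describes it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class HeaderRewriteSketch {

    // Hypothetical helper: decompress a gzip-encoded response body held in memory.
    static byte[] gunzip(byte[] compressed) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return in.readAllBytes();
        }
    }

    // Hypothetical helper: gzip bytes (used here only to build a sample body).
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (OutputStream out = new GZIPOutputStream(buf)) {
            out.write(plain);
        }
        return buf.toByteArray();
    }

    // Hypothetical helper: return a copy of the headers that matches the decoded body.
    // Content-Length becomes the decoded byte count; Content-Encoding is dropped
    // (an assumption: the stored body is no longer gzipped). Names are case-insensitive.
    static Map<String, String> rewriteForDecodedBody(Map<String, String> headers, byte[] decoded) {
        Map<String, String> out = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        out.putAll(headers);
        out.remove("Content-Encoding");
        out.put("Content-Length", Integer.toString(decoded.length));
        return out;
    }

    public static void main(String[] args) throws IOException {
        byte[] body = "hello, warc".repeat(100).getBytes(StandardCharsets.UTF_8);
        byte[] wire = gzip(body); // what the server actually sent

        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        headers.put("content-type", "text/html");
        headers.put("content-encoding", "gzip");
        headers.put("content-length", Integer.toString(wire.length));

        byte[] decoded = gunzip(wire);
        // prints {content-length=1100, content-type=text/html}
        System.out.println(rewriteForDecodedBody(headers, decoded));
    }
}
```

Because the map is a TreeMap with a case-insensitive comparator, overwriting "Content-Length" replaces the value under the existing lower-case key instead of adding a second, conflicting header.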

Re: Protocol-http not storing response headers

2024-07-31 Thread Sebastian Nagel
Hi Markus, >> And i do not agree with it. Almost all content is compressed now, so this >> will never work. We need the headers and response code stored for WARC >> export and do not care about an incorrect length header. No, don't do this. You need to rewrite the header. There are many WARC rea

Re: Protocol-http not storing response headers

2024-07-31 Thread Markus Jelsma
Aah, thanks Lewis. We're still on Nutch 1.15. Glad to see this was already fixed, and that I would have patched it in exactly the same way. Thanks! On Tue, 30 Jul 2024 at 18:42, lewis john mcgibbney wrote: > Hi Markus, > > Which version of Nutch are you referring to? I'm not seeing this exact > code in

Re: Protocol-http not storing response headers

2024-07-30 Thread lewis john mcgibbney
Hi Markus, Which version of Nutch are you referring to? I'm not seeing this exact code in master branch. Is this roughly the code you are referencing? https://github.com/apache/nutch/blob/master/src/plugin/protocol-http/src/java/org/apache/nutch/protocol/http/HttpResponse.java#L304-L318 Thanks le