patrick peck created NUTCH-2067:
-----------------------------------

             Summary: HttpFormAuthentication unable to decode login page when server responds with GZIP encoding
                 Key: NUTCH-2067
                 URL: https://issues.apache.org/jira/browse/NUTCH-2067
             Project: Nutch
          Issue Type: Bug
          Components: plugin, protocol
    Affects Versions: 1.10
            Reporter: patrick peck

The method org.apache.nutch.protocol.httpclient.HttpFormAuthentication#httpGetPageContent(), which downloads the login page during form authentication, does not take into account that the response body may be gzip-encoded. This can happen because Http.configureClient() sets the Accept-Encoding header to "x-gzip, gzip, deflate".

It is also not possible to override the Accept-Encoding header through the configuration, because the configured value does not replace the default. To be more exact: if you add

    <additionalPostHeaders>
      <field name="Accept-Encoding" value="identity" />
    </additionalPostHeaders>

to the configuration, the HTTP client sends the Accept-Encoding header twice, first with the configured value and then with the default value.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
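
For illustration, the kind of decoding step that the login page download described above appears to be missing could look roughly like the sketch below. It is not the Nutch code and not a proposed patch: the class ContentEncodingSketch and its methods decodeBody, readAll and gzip are hypothetical names, and the sketch assumes the raw response bytes and the value of the Content-Encoding response header are already available. It also assumes that "deflate" means zlib-wrapped data, which is what java.util.zip.InflaterInputStream expects by default.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper, not part of Nutch: decodes a response body
// according to the Content-Encoding header sent by the server.
final class ContentEncodingSketch {

    // Returns the decoded body. An unknown or missing encoding is treated as identity.
    static byte[] decodeBody(byte[] body, String contentEncoding) throws IOException {
        String enc = contentEncoding == null
                ? "identity"
                : contentEncoding.trim().toLowerCase(Locale.ROOT);
        InputStream in = new ByteArrayInputStream(body);
        if (enc.equals("gzip") || enc.equals("x-gzip")) {
            in = new GZIPInputStream(in);
        } else if (enc.equals("deflate")) {
            // Assumption: zlib-wrapped deflate data, not a raw deflate stream.
            in = new InflaterInputStream(in);
        }
        try (InputStream src = in) {
            return readAll(src);
        }
    }

    // Reads a stream to the end and returns its bytes.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Only used to produce sample input: gzip-compresses a byte array.
    static byte[] gzip(byte[] plain) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        }
        return out.toByteArray();
    }
}
```

With a helper along these lines, the login page would be parsed from decodeBody(rawBytes, contentEncodingHeaderValue) instead of from the raw bytes. An uncompressed response passes through unchanged, so the same path would work whether or not the server compresses the page.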