Tim Allison created NUTCH-3001:
----------------------------------

             Summary: protocol-selenium requires Content-Type header 
                 Key: NUTCH-3001
                 URL: https://issues.apache.org/jira/browse/NUTCH-3001
             Project: Nutch
          Issue Type: Bug
            Reporter: Tim Allison


It looks like protocol-selenium requires the response to carry a Content-Type header.

The logic seems to be: If the content type is html or xhtml, use selenium, 
otherwise just grab the bytes.  

If the Content-Type header is missing (null), nothing is pulled.

My guess is that the logic should be: if the content type is not null and
is html or xhtml, use selenium; otherwise grab the bytes.
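
A minimal sketch of that decision, to make the proposal concrete. The class and method names here are hypothetical and not taken from the plugin, and the substring matches on "text/html" and "application/xhtml" are an assumption about how HTML and XHTML would be recognized:

```java
import java.util.Locale;

// Hypothetical stand-alone illustration; not the plugin's actual code.
final class ContentTypeCheck {

  /**
   * Returns true only when a Content-Type header is present and names
   * HTML or XHTML. A null header means "do not use Selenium", so the
   * caller falls through to fetching the raw bytes instead of skipping
   * the content entirely.
   */
  public static boolean shouldUseSelenium(String contentType) {
    if (contentType == null) {
      return false; // missing header: grab the bytes
    }
    // Assumed matching rule: case-insensitive substring test.
    String ct = contentType.toLowerCase(Locale.ROOT);
    return ct.contains("text/html") || ct.contains("application/xhtml");
  }

  public static void main(String[] args) {
    System.out.println(shouldUseSelenium("text/html; charset=UTF-8")); // true
    System.out.println(shouldUseSelenium("application/pdf"));          // false
    System.out.println(shouldUseSelenium(null));                       // false
  }
}
```

With a single boolean like this, the null case and the non-HTML case both end up in the same "grab the bytes" branch, which is the behavior suggested above.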

Right?

{noformat}
      String contentType = getHeader(Response.CONTENT_TYPE);

      // handle with Selenium only if content type in HTML or XHTML
      if (contentType != null) {
{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
