I was able to follow the Nutch tutorial and get the bin/crawl command working with sites that don't require authentication, including loading the results into a Solr installation. I also checked that I could query the Solr index and get back the expected information.
However, I can't figure out how to get it to use Kerberos authentication to fetch urls. I'm using apache-nutch-1.8, which appears to have the necessary version of Apache HttpClient (httpclient-4.1.1.jar).

Here's what I see:

./bin/nutch org.apache.nutch.parse.ParserChecker https://myhost.example.com
fetching: https://myhost.example.com
Fetch failed with protocol status: access_denied(17), lastModified=0: Authentication required: https://myhost.example.com

In logs/hadoop.log:

2014-05-27 20:35:53,866 INFO parse.ParserChecker - fetching: https://myhost.example.com
2014-05-27 20:35:54,071 ERROR protocol.RobotRulesParser - Agent we advertise (My Nutch Spider) not listed first in 'http.robots.agents' property!
2014-05-27 20:35:54,071 INFO httpclient.Http - http.proxy.host = null
2014-05-27 20:35:54,071 INFO httpclient.Http - http.proxy.port = 8080
2014-05-27 20:35:54,071 INFO httpclient.Http - http.timeout = 10000
2014-05-27 20:35:54,071 INFO httpclient.Http - http.content.limit = 65536
2014-05-27 20:35:54,071 INFO httpclient.Http - http.agent = My Nutch Spider/Nutch-1.8
2014-05-27 20:35:54,071 INFO httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2014-05-27 20:35:54,071 INFO httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2014-05-27 20:35:54,651 WARN httpclient.HttpMethodDirector - Unable to respond to any of these challenges: {negotiate=Negotiate}

I enabled protocol-httpclient in conf/nutch-default.xml. I expect I need to put something in conf/httpclient-auth.xml, but I can't figure out what. I found the http://wiki.apache.org/nutch/HttpAuthenticationSchemes page, but all the examples there seem to assume that credentials consist of a username and password, which is of course not the case with Kerberos.

How do I tell Nutch to use Negotiate authentication?

Thanks,
Eric

