Yes, I will check that. I cranked up the logging and ran it again, in case you can spot something odd:
2016-11-02 14:23:01,652 INFO parse.ParserChecker - fetching: http://iis75.intranet.org
2016-11-02 14:23:01,684 INFO plugin.PluginRepository - Plugins: looking in: /opt/nutch/plugins
2016-11-02 14:23:01,684 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/creativecommons/plugin.xml
2016-11-02 14:23:01,693 DEBUG plugin.PluginRepository - plugin: id=creativecommons name=Creative Commons Plugins version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,705 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.creativecommons.nutch.CCParseFilter
2016-11-02 14:23:01,705 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.creativecommons.nutch.CCIndexingFilter
2016-11-02 14:23:01,706 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/feed/plugin.xml
2016-11-02 14:23:01,708 DEBUG plugin.PluginRepository - plugin: id=feed name=Feed Parse/Index/Query Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,708 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.feed.FeedParser
2016-11-02 14:23:01,709 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.feed.FeedIndexingFilter
2016-11-02 14:23:01,709 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/headings/plugin.xml
2016-11-02 14:23:01,711 DEBUG plugin.PluginRepository - plugin: id=headings name=Headings Parse Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,711 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.headings.HeadingsParseFilter
2016-11-02 14:23:01,712 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-anchor/plugin.xml
2016-11-02 14:23:01,714 DEBUG plugin.PluginRepository - plugin: id=index-anchor name=Anchor Indexing Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,714 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.anchor.AnchorIndexingFilter
2016-11-02 14:23:01,715 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-basic/plugin.xml
2016-11-02 14:23:01,732 DEBUG plugin.PluginRepository - plugin: id=index-basic name=Basic Indexing Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,732 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.basic.BasicIndexingFilter
2016-11-02 14:23:01,733 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-geoip/plugin.xml
2016-11-02 14:23:01,735 DEBUG plugin.PluginRepository - plugin: id=index-geoip name=GeoIP2 Indexing Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,735 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.geoip.GeoIPIndexingFilter
2016-11-02 14:23:01,736 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-links/plugin.xml
2016-11-02 14:23:01,738 DEBUG plugin.PluginRepository - plugin: id=index-links name=Index inlinks and outlinks version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,738 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.links.LinksIndexingFilter
2016-11-02 14:23:01,738 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-metadata/plugin.xml
2016-11-02 14:23:01,754 DEBUG plugin.PluginRepository - plugin: id=index-metadata name=Index Metadata version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,754 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.metadata.MetadataIndexer
2016-11-02 14:23:01,754 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-more/plugin.xml
2016-11-02 14:23:01,756 DEBUG plugin.PluginRepository - plugin: id=index-more name=More Indexing Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,757 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.more.MoreIndexingFilter
2016-11-02 14:23:01,757 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-replace/plugin.xml
2016-11-02 14:23:01,764 DEBUG plugin.PluginRepository - plugin: id=index-replace name=Replace Indexer version=1.0 provider=PeterCiuffetticlass=null
2016-11-02 14:23:01,764 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.replace.ReplaceIndexer
2016-11-02 14:23:01,764 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/index-static/plugin.xml
2016-11-02 14:23:01,779 DEBUG plugin.PluginRepository - plugin: id=index-static name=Index Static version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,779 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.staticfield.StaticFieldIndexer
2016-11-02 14:23:01,779 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/indexer-cloudsearch/plugin.xml
2016-11-02 14:23:01,782 DEBUG plugin.PluginRepository - plugin: id=indexer-cloudsearch name=CloudSearchIndexWriter version=1.0.0 provider=nutch.apache.orgclass=null
2016-11-02 14:23:01,783 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexWriter class=org.apache.nutch.indexwriter.cloudsearch.CloudSearchIndexWriter
2016-11-02 14:23:01,784 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/indexer-dummy/plugin.xml
2016-11-02 14:23:01,785 DEBUG plugin.PluginRepository - plugin: id=indexer-dummy name=DummyIndexWriter version=1.0.0 provider=nutch.apache.orgclass=null
2016-11-02 14:23:01,786 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexWriter class=org.apache.nutch.indexwriter.dummy.DummyIndexWriter
2016-11-02 14:23:01,786 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/indexer-elastic/plugin.xml
2016-11-02 14:23:01,804 DEBUG plugin.PluginRepository - plugin: id=indexer-elastic name=ElasticIndexWriter version=1.0.0 provider=nutch.apache.orgclass=null
2016-11-02 14:23:01,804 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexWriter class=org.apache.nutch.indexwriter.elastic.ElasticIndexWriter
2016-11-02 14:23:01,805 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/indexer-solr/plugin.xml
2016-11-02 14:23:01,807 DEBUG plugin.PluginRepository - plugin: id=indexer-solr name=SolrIndexWriter version=1.0.0 provider=nutch.apache.orgclass=null
2016-11-02 14:23:01,808 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexWriter class=org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-11-02 14:23:01,809 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/language-identifier/plugin.xml
2016-11-02 14:23:01,811 DEBUG plugin.PluginRepository - plugin: id=language-identifier name=Language Identification Parser/Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,811 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.analysis.lang.HTMLLanguageParser
2016-11-02 14:23:01,811 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.analysis.lang.LanguageIndexingFilter
2016-11-02 14:23:01,811 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/lib-htmlunit/plugin.xml
2016-11-02 14:23:01,833 DEBUG plugin.PluginRepository - plugin: id=lib-htmlunit name=HTTP Framework version=1.0 provider=org.apache.nutchclass=null
2016-11-02 14:23:01,838 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/lib-http/plugin.xml
2016-11-02 14:23:01,839 DEBUG plugin.PluginRepository - plugin: id=lib-http name=HTTP Framework version=1.0 provider=org.apache.nutchclass=null
2016-11-02 14:23:01,840 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/lib-nekohtml/plugin.xml
2016-11-02 14:23:01,841 DEBUG plugin.PluginRepository - plugin: id=lib-nekohtml name=CyberNeko HTML Parser version=1.9.19 provider=net.sourceforge.nekohtmlclass=null
2016-11-02 14:23:01,842 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/lib-regex-filter/plugin.xml
2016-11-02 14:23:01,872 DEBUG plugin.PluginRepository - plugin: id=lib-regex-filter name=Regex URL Filter Framework version=1.0 provider=org.apache.nutchclass=null
2016-11-02 14:23:01,872 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/lib-selenium/plugin.xml
2016-11-02 14:23:01,874 DEBUG plugin.PluginRepository - plugin: id=lib-selenium name=HTTP Framework version=1.0 provider=org.apache.nutchclass=null
2016-11-02 14:23:01,878 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/lib-xml/plugin.xml
2016-11-02 14:23:01,880 DEBUG plugin.PluginRepository - plugin: id=lib-xml name=XML Libraries version=1.0 provider=org.apache.nutch.xmlclass=null
2016-11-02 14:23:01,880 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/microformats-reltag/plugin.xml
2016-11-02 14:23:01,902 DEBUG plugin.PluginRepository - plugin: id=microformats-reltag name=Rel-Tag microformat Parser/Indexer/Querier version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,902 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.microformats.reltag.RelTagParser
2016-11-02 14:23:01,902 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.microformats.reltag.RelTagIndexingFilter
2016-11-02 14:23:01,902 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/mimetype-filter/plugin.xml
2016-11-02 14:23:01,904 DEBUG plugin.PluginRepository - plugin: id=mimetype-filter name=Filter indexed documents by the detected MIME version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,904 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.filter.MimeTypeIndexingFilter
2016-11-02 14:23:01,905 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/nutch-extensionpoints/plugin.xml
2016-11-02 14:23:01,906 DEBUG plugin.PluginRepository - plugin: id=nutch-extensionpoints name=the nutch core extension points version=2.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,907 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parse-ext/plugin.xml
2016-11-02 14:23:01,909 DEBUG plugin.PluginRepository - plugin: id=parse-ext name=External Parser Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,909 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.ext.ExtParser
2016-11-02 14:23:01,909 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.ext.ExtParser
2016-11-02 14:23:01,910 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parse-html/plugin.xml
2016-11-02 14:23:01,935 DEBUG plugin.PluginRepository - plugin: id=parse-html name=Html Parse Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,935 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.html.HtmlParser
2016-11-02 14:23:01,935 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parse-js/plugin.xml
2016-11-02 14:23:01,937 DEBUG plugin.PluginRepository - plugin: id=parse-js name=JavaScript Parser version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,937 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.js.JSParseFilter
2016-11-02 14:23:01,937 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.js.JSParseFilter
2016-11-02 14:23:01,938 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parse-metatags/plugin.xml
2016-11-02 14:23:01,939 DEBUG plugin.PluginRepository - plugin: id=parse-metatags name=MetaTags version=1.0 provider=digitalpebble.comclass=null
2016-11-02 14:23:01,939 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parse.metatags.MetaTagsParser
2016-11-02 14:23:01,939 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parse-swf/plugin.xml
2016-11-02 14:23:01,941 DEBUG plugin.PluginRepository - plugin: id=parse-swf name=SWF Parse Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,941 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.swf.SWFParser
2016-11-02 14:23:01,942 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parse-tika/plugin.xml
2016-11-02 14:23:01,944 DEBUG plugin.PluginRepository - plugin: id=parse-tika name=Tika Parser Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,944 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.tika.TikaParser
2016-11-02 14:23:01,966 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parse-zip/plugin.xml
2016-11-02 14:23:01,968 DEBUG plugin.PluginRepository - plugin: id=parse-zip name=Zip Parse Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,968 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.Parser class=org.apache.nutch.parse.zip.ZipParser
2016-11-02 14:23:01,968 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parsefilter-naivebayes/plugin.xml
2016-11-02 14:23:01,970 DEBUG plugin.PluginRepository - plugin: id=parsefilter-naivebayes name=Naive Bayes Parse Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,970 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parsefilter.naivebayes.NaiveBayesParseFilter
2016-11-02 14:23:01,971 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/parsefilter-regex/plugin.xml
2016-11-02 14:23:01,973 DEBUG plugin.PluginRepository - plugin: id=parsefilter-regex name=Regex Parse Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:01,999 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.parse.HtmlParseFilter class=org.apache.nutch.parsefilter.regex.RegexParseFilter
2016-11-02 14:23:02,000 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/protocol-file/plugin.xml
2016-11-02 14:23:02,001 DEBUG plugin.PluginRepository - plugin: id=protocol-file name=File Protocol Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,002 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.file.File
2016-11-02 14:23:02,002 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/protocol-ftp/plugin.xml
2016-11-02 14:23:02,003 DEBUG plugin.PluginRepository - plugin: id=protocol-ftp name=Ftp Protocol Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,003 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.ftp.Ftp
2016-11-02 14:23:02,004 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/protocol-htmlunit/plugin.xml
2016-11-02 14:23:02,005 DEBUG plugin.PluginRepository - plugin: id=protocol-htmlunit name=HtmlUnit Protocol Plug-in version=1.0.0 provider=nutch.apache.orgclass=null
2016-11-02 14:23:02,005 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.htmlunit.Http
2016-11-02 14:23:02,005 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.htmlunit.Http
2016-11-02 14:23:02,006 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/protocol-http/plugin.xml
2016-11-02 14:23:02,007 DEBUG plugin.PluginRepository - plugin: id=protocol-http name=Http Protocol Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,007 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
2016-11-02 14:23:02,007 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.http.Http
2016-11-02 14:23:02,008 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/protocol-httpclient/plugin.xml
2016-11-02 14:23:02,009 DEBUG plugin.PluginRepository - plugin: id=protocol-httpclient name=Http / Https Protocol Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,009 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
2016-11-02 14:23:02,009 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.httpclient.Http
2016-11-02 14:23:02,010 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/protocol-interactiveselenium/plugin.xml
2016-11-02 14:23:02,031 DEBUG plugin.PluginRepository - plugin: id=protocol-interactiveselenium name=Http Protocol Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,031 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.interactiveselenium.Http
2016-11-02 14:23:02,031 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/protocol-selenium/plugin.xml
2016-11-02 14:23:02,040 DEBUG plugin.PluginRepository - plugin: id=protocol-selenium name=Http Protocol Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,041 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.protocol.Protocol class=org.apache.nutch.protocol.selenium.Http
2016-11-02 14:23:02,041 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/scoring-depth/plugin.xml
2016-11-02 14:23:02,042 DEBUG plugin.PluginRepository - plugin: id=scoring-depth name=Scoring plugin for depth-limited crawling. version=1.0.0 provider=ant.comclass=null
2016-11-02 14:23:02,042 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.scoring.ScoringFilter class=org.apache.nutch.scoring.depth.DepthScoringFilter
2016-11-02 14:23:02,043 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/scoring-link/plugin.xml
2016-11-02 14:23:02,044 DEBUG plugin.PluginRepository - plugin: id=scoring-link name=Link Analysis Scoring Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,044 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.scoring.ScoringFilter class=org.apache.nutch.scoring.link.LinkAnalysisScoringFilter
2016-11-02 14:23:02,044 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/scoring-opic/plugin.xml
2016-11-02 14:23:02,046 DEBUG plugin.PluginRepository - plugin: id=scoring-opic name=OPIC Scoring Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,046 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.scoring.ScoringFilter class=org.apache.nutch.scoring.opic.OPICScoringFilter
2016-11-02 14:23:02,046 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/scoring-similarity/plugin.xml
2016-11-02 14:23:02,047 DEBUG plugin.PluginRepository - plugin: id=scoring-similarity name=Similarity based Scoring Plug-in version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,048 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.scoring.ScoringFilter class=org.apache.nutch.scoring.similarity.SimilarityScoringFilter
2016-11-02 14:23:02,048 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/subcollection/plugin.xml
2016-11-02 14:23:02,062 DEBUG plugin.PluginRepository - plugin: id=subcollection name=Subcollection indexing and query filter version=1.0.0 provider=apache.orgclass=null
2016-11-02 14:23:02,062 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.subcollection.SubcollectionIndexingFilter
2016-11-02 14:23:02,062 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/tld/plugin.xml
2016-11-02 14:23:02,064 DEBUG plugin.PluginRepository - plugin: id=tld name=Top Level Domain Plugin version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,064 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.tld.TLDIndexingFilter
2016-11-02 14:23:02,064 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.scoring.ScoringFilter class=org.apache.nutch.scoring.tld.TLDScoringFilter
2016-11-02 14:23:02,064 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-automaton/plugin.xml
2016-11-02 14:23:02,066 DEBUG plugin.PluginRepository - plugin: id=urlfilter-automaton name=Automaton URL Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,066 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.urlfilter.automaton.AutomatonURLFilter
2016-11-02 14:23:02,066 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-domain/plugin.xml
2016-11-02 14:23:02,067 DEBUG plugin.PluginRepository - plugin: id=urlfilter-domain name=Domain URL Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,068 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.urlfilter.domain.DomainURLFilter
2016-11-02 14:23:02,068 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-domainblacklist/plugin.xml
2016-11-02 14:23:02,069 DEBUG plugin.PluginRepository - plugin: id=urlfilter-domainblacklist name=Domain Blacklist URL Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,069 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.urlfilter.domainblacklist.DomainBlacklistURLFilter
2016-11-02 14:23:02,069 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-ignoreexempt/plugin.xml
2016-11-02 14:23:02,071 DEBUG plugin.PluginRepository - plugin: id=urlfilter-ignoreexempt name=External Domain Ignore Exemption version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,071 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLExemptionFilter class=org.apache.nutch.urlfilter.ignoreexempt.ExemptionUrlFilter
2016-11-02 14:23:02,071 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-prefix/plugin.xml
2016-11-02 14:23:02,086 DEBUG plugin.PluginRepository - plugin: id=urlfilter-prefix name=Prefix URL Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,087 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.urlfilter.prefix.PrefixURLFilter
2016-11-02 14:23:02,087 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-regex/plugin.xml
2016-11-02 14:23:02,088 DEBUG plugin.PluginRepository - plugin: id=urlfilter-regex name=Regex URL Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,088 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.urlfilter.regex.RegexURLFilter
2016-11-02 14:23:02,089 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-suffix/plugin.xml
2016-11-02 14:23:02,090 DEBUG plugin.PluginRepository - plugin: id=urlfilter-suffix name=Suffix URL Filter version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,090 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.urlfilter.suffix.SuffixURLFilter
2016-11-02 14:23:02,090 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlfilter-validator/plugin.xml
2016-11-02 14:23:02,097 DEBUG plugin.PluginRepository - plugin: id=urlfilter-validator name=URL Validator version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,097 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLFilter class=org.apache.nutch.urlfilter.validator.UrlValidator
2016-11-02 14:23:02,097 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlmeta/plugin.xml
2016-11-02 14:23:02,098 DEBUG plugin.PluginRepository - plugin: id=urlmeta name=URL Meta Indexing Filter version=1.0.0 provider=sgonyeaclass=null
2016-11-02 14:23:02,098 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.indexer.IndexingFilter class=org.apache.nutch.indexer.urlmeta.URLMetaIndexingFilter
2016-11-02 14:23:02,098 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.scoring.ScoringFilter class=org.apache.nutch.scoring.urlmeta.URLMetaScoringFilter
2016-11-02 14:23:02,099 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-ajax/plugin.xml
2016-11-02 14:23:02,100 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-ajax name=AJAX URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,100 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.ajax.AjaxURLNormalizer
2016-11-02 14:23:02,100 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-basic/plugin.xml
2016-11-02 14:23:02,102 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-basic name=Basic URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,102 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.basic.BasicURLNormalizer
2016-11-02 14:23:02,102 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-host/plugin.xml
2016-11-02 14:23:02,103 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-host name=Host URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,104 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.host.HostURLNormalizer
2016-11-02 14:23:02,104 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-pass/plugin.xml
2016-11-02 14:23:02,105 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-pass name=Pass-through URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,105 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.pass.PassURLNormalizer
2016-11-02 14:23:02,105 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-protocol/plugin.xml
2016-11-02 14:23:02,121 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-protocol name=Protocol URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,121 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.protocol.ProtocolURLNormalizer
2016-11-02 14:23:02,121 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-querystring/plugin.xml
2016-11-02 14:23:02,122 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-querystring name=Querystrings URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,123 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.querystring.QuerystringURLNormalizer
2016-11-02 14:23:02,123 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-regex/plugin.xml
2016-11-02 14:23:02,124 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-regex name=Regex URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,124 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.regex.RegexURLNormalizer
2016-11-02 14:23:02,124 DEBUG plugin.PluginRepository - parsing: /opt/nutch/plugins/urlnormalizer-slash/plugin.xml
2016-11-02 14:23:02,126 DEBUG plugin.PluginRepository - plugin: id=urlnormalizer-slash name=Slash URL Normalizer version=1.0.0 provider=nutch.orgclass=null
2016-11-02 14:23:02,126 DEBUG plugin.PluginRepository - impl: point=org.apache.nutch.net.URLNormalizer class=org.apache.nutch.net.urlnormalizer.slash.SlashURLNormalizer
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: index-geoip
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: lib-http
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: nutch-extensionpoints
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: lib-xml
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: language-identifier
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: indexer-dummy
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: lib-nekohtml
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: subcollection
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: urlfilter-validator
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: urlmeta
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: scoring-depth
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: indexer-cloudsearch
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: microformats-reltag
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: urlfilter-ignoreexempt
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: protocol-interactiveselenium
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: protocol-ftp
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: parsefilter-regex
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: parse-ext
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: parse-zip
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: lib-htmlunit
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: urlnormalizer-querystring
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: feed
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: index-more
2016-11-02 14:23:02,127 DEBUG plugin.PluginRepository - not including: urlnormalizer-slash
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: headings
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: index-links
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: creativecommons
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: parse-metatags
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: parse-swf
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: urlnormalizer-protocol
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: lib-selenium
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: protocol-htmlunit
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: urlnormalizer-ajax
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: index-metadata
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: protocol-selenium
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: parsefilter-naivebayes
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: mimetype-filter
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: urlfilter-suffix
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: urlfilter-domain
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: urlfilter-domainblacklist
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: parse-js
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: index-static
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: tld
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: lib-regex-filter
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: urlfilter-automaton
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: urlfilter-prefix
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: scoring-link
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: protocol-http
2016-11-02 14:23:02,128 DEBUG plugin.PluginRepository - not including: scoring-similarity
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - not including: urlnormalizer-host
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - not including: protocol-file
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - not including: index-replace
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - not including: indexer-elastic
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.indexer.IndexingFilter
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.indexer.IndexWriter
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.parse.Parser
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.parse.HtmlParseFilter
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.protocol.Protocol
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.net.URLFilter
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.net.URLExemptionFilter
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.net.URLNormalizer
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.scoring.ScoringFilter
2016-11-02 14:23:02,129 DEBUG plugin.PluginRepository - Adding extension point org.apache.nutch.segment.SegmentMergeFilter
2016-11-02 14:23:02,129 INFO plugin.PluginRepository - Plugin Auto-activation mode: [true]
2016-11-02 14:23:02,129 INFO plugin.PluginRepository - Registered Plugins:
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Regex URL Filter (urlfilter-regex)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Html Parse Plug-in (parse-html)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - HTTP Framework (lib-http)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Http / Https Protocol Plug-in (protocol-httpclient)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - the nutch core extension points (nutch-extensionpoints)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Basic Indexing Filter (index-basic)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Anchor Indexing Filter (index-anchor)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Tika Parser Plug-in (parse-tika)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Basic URL Normalizer (urlnormalizer-basic)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Regex URL Filter Framework (lib-regex-filter)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Regex URL Normalizer (urlnormalizer-regex)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - CyberNeko HTML Parser (lib-nekohtml)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - OPIC Scoring Plug-in (scoring-opic)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Pass-through URL Normalizer (urlnormalizer-pass)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - SolrIndexWriter (indexer-solr)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Registered Extension-Points:
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Nutch Content Parser (org.apache.nutch.parse.Parser)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Nutch URL Filter (org.apache.nutch.net.URLFilter)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Nutch Protocol (org.apache.nutch.protocol.Protocol)
2016-11-02 14:23:02,130 INFO plugin.PluginRepository - Nutch URL Ignore Exemption Filter (org.apache.nutch.net.URLExemptionFilter)
2016-11-02 14:23:02,147 INFO plugin.PluginRepository - Nutch Index Writer (org.apache.nutch.indexer.IndexWriter)
2016-11-02 14:23:02,147 INFO plugin.PluginRepository - Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2016-11-02 14:23:02,147 INFO plugin.PluginRepository - Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2016-11-02 14:23:02,148 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache
2016-11-02 14:23:02,196 DEBUG params.DefaultHttpParams - Set parameter http.useragent = Jakarta Commons-HttpClient/3.1
2016-11-02 14:23:02,197 DEBUG params.DefaultHttpParams - Set parameter http.protocol.version = HTTP/1.1
2016-11-02 14:23:02,198 DEBUG params.DefaultHttpParams - Set parameter http.connection-manager.class = class org.apache.commons.httpclient.SimpleHttpConnectionManager
2016-11-02 14:23:02,198 DEBUG params.DefaultHttpParams - Set parameter http.protocol.cookie-policy = default
2016-11-02 14:23:02,198 DEBUG params.DefaultHttpParams - Set parameter http.protocol.element-charset = US-ASCII
2016-11-02 14:23:02,198 DEBUG params.DefaultHttpParams - Set parameter http.protocol.content-charset = ISO-8859-1
2016-11-02 14:23:02,215 DEBUG params.DefaultHttpParams - Set parameter http.method.retry-handler = org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@7fad8c79
2016-11-02 14:23:02,215 DEBUG
params.DefaultHttpParams - Set parameter http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, dd-MMM-yy HH:mm:ss zzz, EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy HH:mm:ss z] 2016-11-02 14:23:02,222 DEBUG httpclient.HttpClient - Java version: 1.8.0_111 2016-11-02 14:23:02,222 DEBUG httpclient.HttpClient - Java vendor: Oracle Corporation 2016-11-02 14:23:02,222 DEBUG httpclient.HttpClient - Java class path: /opt/nutch:/opt/nutch/conf:/usr/lib/jvm/jre-1.8.0/lib/tools.jar:/opt/nutch/lib/activation-1.1.jar:/opt/nutch/lib/aopalliance-1.0.jar:/opt/nutch/lib/apache-nutch-1.12.jar:/opt/nutch/lib/args4j-2.0.16.jar:/opt/nutch/lib/asm-3.3.1.jar:/opt/nutch/lib/avro-1.7.4.jar:/opt/nutch/lib/bootstrap-3.0.3.jar:/opt/nutch/lib/cglib-2.2.1-v20090111.jar:/opt/nutch/lib/cglib-2.2.2.jar:/opt/nutch/lib/closure-compiler-v20130603.jar:/opt/nutch/lib/commons-cli-1.2.jar:/opt/nutch/lib/commons-codec-1.10.jar:/opt/nutch/lib/commons-collections-3.2.1.jar:/opt/nutch/lib/commons-collections4-4.0.jar:/opt/nutch/lib/commons-compress-1.9.jar:/opt/nutch/lib/commons-configuration-1.8.jar:/opt/nutch/lib/commons-daemon-1.0.13.jar:/opt/nutch/lib/commons-el-1.0.jar:/opt/nutch/lib/commons-httpclient-3.1.jar:/opt/nutch/lib/commons-io-2.4.jar:/opt/nutch/lib/commons-jexl-2.1.1.jar:/opt/nutch/lib/commons-lang-2.6.jar:/opt/nutch/lib/commons-lang3-3.1.jar:/opt/nutch/lib/commons-logging-1.1.3.jar:/opt/nutch/lib/commons-math3-3.1.1.jar:/opt/nutch/lib/commons-net-3.1.jar:/opt/nutch/lib/crawler-commons-0.6.jar:/opt/nutch/lib/cxf-core-3.0.4.jar:/opt/nutch/lib/cxf-rt-bindings-soap-3.0.4.jar:/opt/nutch/lib/cxf-rt-bindings-xml-3.0.4.jar:/opt/nutch/lib/cxf-rt-databinding-jaxb-3.0.4.jar:/opt/nutch/lib/cxf-rt-frontend-jaxrs-3.0.4.jar:/opt/nut
ch/lib/cxf-rt-frontend-jaxws-3.0.4.jar:/opt/nutch/lib/cxf-rt-frontend-simple-3.0.4.jar:/opt/nutch/lib/cxf-rt-transports-http-3.0.4.jar:/opt/nutch/lib/cxf-rt-transports-http-jetty-3.0.4.jar:/opt/nutch/lib/cxf-rt-ws-addr-3.0.4.jar:/opt/nutch/lib/cxf-rt-wsdl-3.0.4.jar:/opt/nutch/lib/cxf-rt-ws-policy-3.0.4.jar:/opt/nutch/lib/dom4j-1.6.1.jar:/opt/nutch/lib/dsiutils-2.0.12.jar:/opt/nutch/lib/fastutil-6.5.2.jar:/opt/nutch/lib/geronimo-servlet_3.0_spec-1.0.jar:/opt/nutch/lib/guava-16.0.1.jar:/opt/nutch/lib/guice-3.0.jar:/opt/nutch/lib/guice-servlet-3.0.jar:/opt/nutch/lib/h2-1.4.180.jar:/opt/nutch/lib/hadoop-annotations-2.4.0.jar:/opt/nutch/lib/hadoop-auth-2.4.0.jar:/opt/nutch/lib/hadoop-client-2.2.0.jar:/opt/nutch/lib/hadoop-common-2.4.0.jar:/opt/nutch/lib/hadoop-hdfs-2.4.0.jar:/opt/nutch/lib/hadoop-mapreduce-client-app-2.2.0.jar:/opt/nutch/lib/hadoop-mapreduce-client-common-2.4.0.jar:/opt/nutch/lib/hadoop-mapreduce-client-core-2.4.0.jar:/opt/nutch/lib/hadoop-mapreduce-client-jobclient-2.4.0.jar:/opt/nutch/lib/hadoop-mapreduce-client-shuffle-2.4.0.jar:/opt/nutch/lib/hadoop-yarn-api-2.4.0.jar:/opt/nutch/lib/hadoop-yarn-client-2.4.0.jar:/opt/nutch/lib/hadoop-yarn-common-2.4.0.jar:/opt/nutch/lib/hadoop-yarn-server-common-2.4.0.jar:/opt/nutch/lib/hadoop-yarn-server-nodemanager-2.4.0.jar:/opt/nutch/lib/htmlparser-1.6.jar:/opt/nutch/lib/httpclient-4.3.5.jar:/opt/nutch/lib/httpcore-4.3.2.jar:/opt/nutch/lib/icu4j-55.1.jar:/opt/nutch/lib/jackson-annotations-2.5.0.jar:/opt/nutch/lib/jackson-core-2.5.1.jar:/opt/nutch/lib/jackson-core-asl-1.8.8.jar:/opt/nutch/lib/jackson-databind-2.5.1.jar:/opt/nutch/lib/jackson-dataformat-cbor-2.5.1.jar:/opt/nutch/lib/jackson-jaxrs-1.8.8.jar:/opt/nutch/lib/jackson-jaxrs-base-2.5.1.jar:/opt/nutch/lib/jackson-jaxrs-json-provider-2.5.1.jar:/opt/nutch/lib/jackson-mapper-asl-1.8.8.jar:/opt/nutch/lib/jackson-module-jaxb-annotations-2.5.1.jar:/opt/nutch/lib/jackson-xc-1.8.8.jar:/opt/nutch/lib/jasper-compiler-5.5.23.jar:/opt/nutch/lib/jasper-runtime-5.5.23.ja
r:/opt/nutch/lib/javassist-3.12.1.GA.jar:/opt/nutch/lib/javax.annotation-api-1.2.jar:/opt/nutch/lib/javax.inject-1.jar:/opt/nutch/lib/java-xmlbuilder-0.4.jar:/opt/nutch/lib/javax.persistence-2.0.0.jar:/opt/nutch/lib/javax.ws.rs-api-2.0.1.jar:/opt/nutch/lib/jaxb-api-2.2.2.jar:/opt/nutch/lib/jaxb-core-2.1.14.jar:/opt/nutch/lib/jaxb-impl-2.2.3-1.jar:/opt/nutch/lib/jersey-client-1.9.jar:/opt/nutch/lib/jersey-core-1.9.jar:/opt/nutch/lib/jersey-guice-1.9.jar:/opt/nutch/lib/jersey-json-1.9.jar:/opt/nutch/lib/jersey-server-1.9.jar:/opt/nutch/lib/jettison-1.1.jar:/opt/nutch/lib/jetty-6.1.26.jar:/opt/nutch/lib/jetty-continuation-8.1.15.v20140411.jar:/opt/nutch/lib/jetty-http-8.1.15.v20140411.jar:/opt/nutch/lib/jetty-io-8.1.15.v20140411.jar:/opt/nutch/lib/jetty-security-8.1.15.v20140411.jar:/opt/nutch/lib/jetty-server-8.1.15.v20140411.jar:/opt/nutch/lib/jetty-util-6.1.26.jar:/opt/nutch/lib/jetty-util-8.1.15.v20140411.jar:/opt/nutch/lib/joda-time-2.3.jar:/opt/nutch/lib/jquery-2.0.3-1.jar:/opt/nutch/lib/jquerypp-1.0.1.jar:/opt/nutch/lib/jquery-selectors-0.0.3.jar:/opt/nutch/lib/jquery-ui-1.10.2-1.jar:/opt/nutch/lib/jsap-2.1.jar:/opt/nutch/lib/jsch-0.1.42.jar:/opt/nutch/lib/json-20131018.jar:/opt/nutch/lib/jsp-api-2.1.jar:/opt/nutch/lib/jsr305-1.3.9.jar:/opt/nutch/lib/juniversalchardet-1.0.3.jar:/opt/nutch/lib/libidn-1.15.jar:/opt/nutch/lib/log4j-1.2.17.jar:/opt/nutch/lib/lucene-analyzers-common-4.10.2.jar:/opt/nutch/lib/lucene-core-4.10.2.jar:/opt/nutch/lib/maven-parent-config-0.3.4.jar:/opt/nutch/lib/modernizr-2.6.2-1.jar:/opt/nutch/lib/neethi-3.0.3.jar:/opt/nutch/lib/netty-3.6.2.Final.jar:/opt/nutch/lib/ormlite-core-4.48.jar:/opt/nutch/lib/ormlite-jdbc-4.48.jar:/opt/nutch/lib/oro-2.0.8.jar:/opt/nutch/lib/paranamer-2.3.jar:/opt/nutch/lib/protobuf-java-2.5.0.jar:/opt/nutch/lib/reflections-0.9.8.jar:/opt/nutch/lib/servlet-api-2.5.jar:/opt/nutch/lib/slf4j-api-1.7.9.jar:/opt/nutch/lib/slf4j-log4j12-1.7.5.jar:/opt/nutch/lib/snappy-java-1.0.4.1.jar:/opt/nutch/lib/spring-aop-4.0.4.REL
EASE.jar:/opt/nutch/lib/spring-beans-4.0.4.RELEASE.jar:/opt/nutch/lib/spring-context-4.0.4.RELEASE.jar:/opt/nutch/lib/spring-core-4.0.4.RELEASE.jar:/opt/nutch/lib/spring-expression-4.0.4.RELEASE.jar:/opt/nutch/lib/spring-web-4.0.4.RELEASE.jar:/opt/nutch/lib/stax2-api-3.1.4.jar:/opt/nutch/lib/stax-api-1.0-2.jar:/opt/nutch/lib/t-digest-3.1.jar:/opt/nutch/lib/tika-core-1.12.jar:/opt/nutch/lib/typeaheadjs-0.9.3.jar:/opt/nutch/lib/warc-hadoop-0.1.0.jar:/opt/nutch/lib/webarchive-commons-1.1.5.jar:/opt/nutch/lib/wicket-bootstrap-core-0.9.2.jar:/opt/nutch/lib/wicket-bootstrap-extensions-0.9.2.jar:/opt/nutch/lib/wicket-core-6.16.0.jar:/opt/nutch/lib/wicket-extensions-6.13.0.jar:/opt/nutch/lib/wicket-ioc-6.16.0.jar:/opt/nutch/lib/wicket-request-6.16.0.jar:/opt/nutch/lib/wicket-spring-6.16.0.jar:/opt/nutch/lib/wicket-util-6.16.0.jar:/opt/nutch/lib/wicket-webjars-0.4.0.jar:/opt/nutch/lib/woodstox-core-asl-4.4.1.jar:/opt/nutch/lib/wsdl4j-1.6.3.jar:/opt/nutch/lib/xercesImpl-2.11.0.jar:/opt/nutch/lib/xml-apis-1.4.01.jar:/opt/nutch/lib/xmlenc-0.52.jar:/opt/nutch/lib/xmlParserAPIs-2.6.2.jar:/opt/nutch/lib/xml-resolver-1.2.jar:/opt/nutch/lib/xmlschema-core-2.2.1.jar 2016-11-02 14:23:02,222 DEBUG httpclient.HttpClient - Operating system name: Linux 2016-11-02 14:23:02,222 DEBUG httpclient.HttpClient - Operating system architecture: amd64 2016-11-02 14:23:02,222 DEBUG httpclient.HttpClient - Operating system version: 3.10.0-327.36.3.el7.x86_64 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - SUN 1.8: SUN (DSA key/parameter generation; DSA signing; SHA-1, MD5 digests; SecureRandom; X.509 certificates; JKS & DKS keystores; PKIX CertPathValidator; PKIX CertPathBuilder; LDAP, Collection CertStores, JavaPolicy Policy; JavaLoginConfig Configuration) 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - SunRsaSign 1.8: Sun RSA signature provider 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - SunJSSE 1.8: Sun JSSE provider(PKCS12, SunX509/PKIX key/trust factories, 
SSLv3/TLSv1/TLSv1.1/TLSv1.2) 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - SunJCE 1.8: SunJCE Provider (implements RSA, DES, Triple DES, AES, Blowfish, ARCFOUR, RC2, PBE, Diffie-Hellman, HMAC) 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - SunJGSS 1.8: Sun (Kerberos v5, SPNEGO) 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - SunSASL 1.8: Sun SASL provider(implements client mechanisms for: DIGEST-MD5, GSSAPI, EXTERNAL, PLAIN, CRAM-MD5, NTLM; server mechanisms for: DIGEST-MD5, GSSAPI, CRAM-MD5, NTLM) 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - XMLDSig 1.8: XMLDSig (DOM XMLSignatureFactory; DOM KeyInfoFactory; C14N 1.0, C14N 1.1, Exclusive C14N, Base64, Enveloped, XPath, XPath2, XSLT TransformServices) 2016-11-02 14:23:02,229 DEBUG httpclient.HttpClient - SunPCSC 1.8: Sun PC/SC provider 2016-11-02 14:23:02,253 INFO protocol.RobotRulesParser - Whitelisted hosts: [iis75.intranet.org] 2016-11-02 14:23:02,253 INFO httpclient.Http - http.proxy.host = null 2016-11-02 14:23:02,253 INFO httpclient.Http - http.proxy.port = 8080 2016-11-02 14:23:02,253 INFO httpclient.Http - http.proxy.exception.list = false 2016-11-02 14:23:02,265 INFO httpclient.Http - http.timeout = 36000 2016-11-02 14:23:02,265 INFO httpclient.Http - http.content.limit = 65536 2016-11-02 14:23:02,265 INFO httpclient.Http - http.agent = APL-Nutch-Spider/Nutch-1.12 ([email protected]) 2016-11-02 14:23:02,265 INFO httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3 2016-11-02 14:23:02,265 INFO httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 2016-11-02 14:23:02,268 DEBUG params.DefaultHttpParams - Set parameter http.connection.timeout = 36000 2016-11-02 14:23:02,268 DEBUG params.DefaultHttpParams - Set parameter http.socket.timeout = 36000 2016-11-02 14:23:02,268 DEBUG params.DefaultHttpParams - Set parameter http.socket.sendbuffer = 8192 2016-11-02 14:23:02,268 DEBUG params.DefaultHttpParams - Set parameter 
http.socket.receivebuffer = 8192 2016-11-02 14:23:02,268 DEBUG params.DefaultHttpParams - Set parameter http.connection-manager.max-total = 50 2016-11-02 14:23:02,269 DEBUG params.DefaultHttpParams - Set parameter http.connection-manager.max-per-host = {HostConfiguration[]=10} 2016-11-02 14:23:02,269 DEBUG params.DefaultHttpParams - Set parameter http.connection-manager.timeout = 36000 2016-11-02 14:23:02,270 DEBUG params.DefaultHttpParams - Set parameter http.default-headers = [Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3 , Accept-Charset: utf-8,ISO-8859-1;q=0.7,*;q=0.7 , Accept: text/html,application/xml;q=0.9,application/xhtml+xml,text/xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 , Accept-Encoding: x-gzip, gzip, deflate ] 2016-11-02 14:23:02,279 TRACE httpclient.Http - Credentials - username: domainuser; set as default for realm: domain; scheme: ntlm 2016-11-02 14:23:02,296 TRACE httpclient.HttpState - enter HttpState.getCredentials(AuthScope) 2016-11-02 14:23:02,296 TRACE httpclient.Http - Pre-configured credentials with scope - host: iis75.intranet.org; port: 80; not found for url: http://iis75.intranet.org 2016-11-02 14:23:02,297 TRACE httpclient.HttpState - enter HttpState.setCredentials(AuthScope, Credentials) 2016-11-02 14:23:02,352 TRACE methods.GetMethod - enter GetMethod(String) 2016-11-02 14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.version = HTTP/1.0 2016-11-02 14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.unambiguous-statusline = false 2016-11-02 14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.single-cookie-header = false 2016-11-02 14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.strict-transfer-encoding = false 2016-11-02 14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.reject-head-body = false 2016-11-02 14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.warn-extra-input = false 2016-11-02 
14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.status-line-garbage-limit = 2147483647 2016-11-02 14:23:02,352 DEBUG params.DefaultHttpParams - Set parameter http.protocol.content-charset = UTF-8 2016-11-02 14:23:02,353 DEBUG params.DefaultHttpParams - Set parameter http.protocol.cookie-policy = compatibility 2016-11-02 14:23:02,353 DEBUG params.DefaultHttpParams - Set parameter http.protocol.single-cookie-header = true 2016-11-02 14:23:02,353 DEBUG params.DefaultHttpParams - Set parameter http.useragent = APL-Nutch-Spider/Nutch-1.12 ([email protected]) 2016-11-02 14:23:02,353 TRACE httpclient.HttpClient - enter HttpClient.executeMethod(HttpMethod) 2016-11-02 14:23:02,353 TRACE httpclient.HttpClient - enter HttpClient.executeMethod(HostConfiguration,HttpMethod,HttpState) 2016-11-02 14:23:02,373 TRACE httpclient.HttpMethodBase - HttpMethodBase.addRequestHeader(Header) 2016-11-02 14:23:02,373 TRACE httpclient.HttpMethodBase - HttpMethodBase.addRequestHeader(Header) 2016-11-02 14:23:02,374 TRACE httpclient.HttpMethodBase - HttpMethodBase.addRequestHeader(Header) 2016-11-02 14:23:02,374 TRACE httpclient.HttpMethodBase - HttpMethodBase.addRequestHeader(Header) 2016-11-02 14:23:02,374 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.getConnectionWithTimeout(HostConfiguration, long) 2016-11-02 14:23:02,374 DEBUG httpclient.MultiThreadedHttpConnectionManager - HttpConnectionManager.getConnection: config = HostConfiguration[host=http://iis75.intranet.org], timeout = 36000 2016-11-02 14:23:02,374 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration) 2016-11-02 14:23:02,374 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration) 2016-11-02 14:23:02,375 DEBUG httpclient.MultiThreadedHttpConnectionManager - Allocating new connection, 
hostConfig=HostConfiguration[host=http://iis75.intranet.org] 2016-11-02 14:23:02,391 TRACE httpclient.HttpMethodDirector - Attempt number 1 to process request 2016-11-02 14:23:02,391 TRACE httpclient.HttpConnection - enter HttpConnection.open() 2016-11-02 14:23:02,391 DEBUG httpclient.HttpConnection - Open connection to iis75.intranet.org:80 2016-11-02 14:23:02,441 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.execute(HttpState, HttpConnection) 2016-11-02 14:23:02,441 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequest(HttpState, HttpConnection) 2016-11-02 14:23:02,441 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequestLine(HttpState, HttpConnection) 2016-11-02 14:23:02,441 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, String) 2016-11-02 14:23:02,442 DEBUG wire.header - >> "GET / HTTP/1.0[\r][\n]" 2016-11-02 14:23:02,442 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,443 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,443 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,443 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequestHeaders(HttpState,HttpConnection) 2016-11-02 14:23:02,443 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addRequestHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,443 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addUserAgentRequestHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,443 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addHostRequestHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,443 DEBUG httpclient.HttpMethodBase - Adding Host request header 2016-11-02 14:23:02,443 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addCookieRequestHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,455 TRACE httpclient.HttpState - enter 
HttpState.getCookies() 2016-11-02 14:23:02,455 TRACE cookie.CookieSpec - enter CookieSpecBase.match(String, int, String, boolean, Cookie[]) 2016-11-02 14:23:02,456 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addProxyConnectionHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,456 DEBUG wire.header - >> "Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3[\r][\n]" 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,456 DEBUG wire.header - >> "Accept-Charset: utf-8,ISO-8859-1;q=0.7,*;q=0.7[\r][\n]" 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,456 DEBUG wire.header - >> "Accept: text/html,application/xml;q=0.9,application/xhtml+xml,text/xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[\r][\n]" 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,456 DEBUG wire.header - >> "Accept-Encoding: x-gzip, gzip, deflate[\r][\n]" 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,456 DEBUG wire.header - >> "User-Agent: APL-Nutch-Spider/Nutch-1.12 ([email 
protected])[\r][\n]" 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,456 DEBUG wire.header - >> "Host: iis75.intranet.org[\r][\n]" 2016-11-02 14:23:02,456 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,457 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,457 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,457 TRACE httpclient.HttpConnection - enter HttpConnection.writeLine() 2016-11-02 14:23:02,457 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,457 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,457 DEBUG wire.header - >> "[\r][\n]" 2016-11-02 14:23:02,457 TRACE httpclient.HttpConnection - enter HttpConnection.flushRequestOutputStream() 2016-11-02 14:23:02,457 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponse(HttpState, HttpConnection) 2016-11-02 14:23:02,457 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readStatusLine(HttpState, HttpConnection) 2016-11-02 14:23:02,457 TRACE httpclient.HttpConnection - enter HttpConnection.readLine() 2016-11-02 14:23:02,458 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,458 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,487 DEBUG wire.header - << "HTTP/1.1 401 Unauthorized[\r][\n]" 2016-11-02 14:23:02,487 DEBUG wire.header - << "HTTP/1.1 401 Unauthorized[\r][\n]" 2016-11-02 14:23:02,488 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseHeaders(HttpState,HttpConnection) 2016-11-02 14:23:02,488 TRACE httpclient.HttpConnection - 
enter HttpConnection.getResponseInputStream() 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HeaderParser.parseHeaders(InputStream, String) 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,488 DEBUG wire.header - << "Server: Microsoft-IIS/7.5[\r][\n]" 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,488 DEBUG wire.header - << "WWW-Authenticate: Negotiate[\r][\n]" 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,488 DEBUG wire.header - << "WWW-Authenticate: NTLM[\r][\n]" 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,488 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,489 DEBUG wire.header - << "WWW-Authenticate: Basic realm="iis75.intranet.org"[\r][\n]" 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,489 DEBUG wire.header - << "X-Powered-By: ASP.NET[\r][\n]" 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,489 DEBUG wire.header - << "Date: Wed, 02 Nov 2016 19:23:03 GMT[\r][\n]" 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 
14:23:02,489 DEBUG wire.header - << "Connection: close[\r][\n]" 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,489 DEBUG wire.header - << "Content-Length: 0[\r][\n]" 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,489 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,489 DEBUG wire.header - << "[\r][\n]" 2016-11-02 14:23:02,489 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processResponseHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,489 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processCookieHeaders(Header[], HttpState, HttpConnection) 2016-11-02 14:23:02,489 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpState, HttpConnection) 2016-11-02 14:23:02,489 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpConnection) 2016-11-02 14:23:02,489 TRACE httpclient.HttpConnection - enter HttpConnection.getResponseInputStream() 2016-11-02 14:23:02,489 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.canResponseHaveBody(int) 2016-11-02 14:23:02,490 DEBUG httpclient.HttpMethodDirector - Authorization required 2016-11-02 14:23:02,490 TRACE httpclient.HttpMethodDirector - enter HttpMethodBase.processAuthenticationResponse(HttpState, HttpConnection) 2016-11-02 14:23:02,496 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic] 2016-11-02 14:23:02,496 INFO auth.AuthChallengeProcessor - ntlm authentication scheme selected 2016-11-02 14:23:02,496 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm 2016-11-02 14:23:02,496 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed 2016-11-02 14:23:02,496 DEBUG httpclient.HttpMethodDirector - 
Authentication scope: NTLM <any realm>@iis75.intranet.org:80 2016-11-02 14:23:02,496 TRACE httpclient.HttpState - enter HttpState.getCredentials(AuthScope) 2016-11-02 14:23:02,497 DEBUG httpclient.HttpMethodDirector - Retry authentication 2016-11-02 14:23:02,499 DEBUG httpclient.HttpMethodBase - Should close connection in response to directive: close 2016-11-02 14:23:02,499 TRACE httpclient.HttpConnection - enter HttpConnection.close() 2016-11-02 14:23:02,499 TRACE httpclient.HttpConnection - enter HttpConnection.closeSockedAndStreams() 2016-11-02 14:23:02,499 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@iis75.intranet.org:80 2016-11-02 14:23:02,499 TRACE httpclient.HttpState - enter HttpState.getCredentials(AuthScope) 2016-11-02 14:23:02,499 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod) 2016-11-02 14:23:02,501 DEBUG params.HttpMethodParams - Credential charset not configured, using HTTP element charset 2016-11-02 14:23:02,504 TRACE httpclient.HttpMethodBase - HttpMethodBase.addRequestHeader(Header) 2016-11-02 14:23:02,504 TRACE httpclient.HttpMethodDirector - Attempt number 1 to process request 2016-11-02 14:23:02,504 TRACE httpclient.HttpConnection - enter HttpConnection.open() 2016-11-02 14:23:02,504 DEBUG httpclient.HttpConnection - Open connection to iis75.intranet.org:80 2016-11-02 14:23:02,507 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.execute(HttpState, HttpConnection) 2016-11-02 14:23:02,507 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequest(HttpState, HttpConnection) 2016-11-02 14:23:02,507 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequestLine(HttpState, HttpConnection) 2016-11-02 14:23:02,507 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, String) 2016-11-02 14:23:02,507 DEBUG wire.header - >> "GET / HTTP/1.1[\r][\n]" 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter 
HttpConnection.print(String) 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,508 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequestHeaders(HttpState,HttpConnection) 2016-11-02 14:23:02,508 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addRequestHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,508 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addUserAgentRequestHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,508 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addHostRequestHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,508 DEBUG httpclient.HttpMethodBase - Adding Host request header 2016-11-02 14:23:02,508 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addCookieRequestHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,508 TRACE httpclient.HttpState - enter HttpState.getCookies() 2016-11-02 14:23:02,508 TRACE cookie.CookieSpec - enter CookieSpecBase.match(String, int, String, boolean, Cookie[]) 2016-11-02 14:23:02,508 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addProxyConnectionHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,508 DEBUG wire.header - >> "Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3[\r][\n]" 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,508 DEBUG wire.header - >> "Accept-Charset: utf-8,ISO-8859-1;q=0.7,*;q=0.7[\r][\n]" 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,508 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,508 TRACE 
httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,508 DEBUG wire.header - >> "Accept: text/html,application/xml;q=0.9,application/xhtml+xml,text/xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[\r][\n]" 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,509 DEBUG wire.header - >> "Accept-Encoding: x-gzip, gzip, deflate[\r][\n]" 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,509 DEBUG wire.header - >> "User-Agent: APL-Nutch-Spider/Nutch-1.12 ([email protected])[\r][\n]" 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,509 DEBUG wire.header - >> "Authorization: NTLM TlRMTVNTU <snip by bob> MENPQUNE[\r][\n]" 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,509 DEBUG wire.header - >> "Host: iis75.intranet.org[\r][\n]" 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 
2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.writeLine() 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,509 DEBUG wire.header - >> "[\r][\n]" 2016-11-02 14:23:02,509 TRACE httpclient.HttpConnection - enter HttpConnection.flushRequestOutputStream() 2016-11-02 14:23:02,509 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponse(HttpState, HttpConnection) 2016-11-02 14:23:02,510 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readStatusLine(HttpState, HttpConnection) 2016-11-02 14:23:02,510 TRACE httpclient.HttpConnection - enter HttpConnection.readLine() 2016-11-02 14:23:02,510 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,510 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,603 DEBUG wire.header - << "HTTP/1.1 401 Unauthorized[\r][\n]" 2016-11-02 14:23:02,604 DEBUG wire.header - << "HTTP/1.1 401 Unauthorized[\r][\n]" 2016-11-02 14:23:02,604 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseHeaders(HttpState,HttpConnection) 2016-11-02 14:23:02,604 TRACE httpclient.HttpConnection - enter HttpConnection.getResponseInputStream() 2016-11-02 14:23:02,604 TRACE httpclient.HttpParser - enter HeaderParser.parseHeaders(InputStream, String) 2016-11-02 14:23:02,604 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,604 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,604 DEBUG wire.header - << "Content-Type: text/html; charset=us-ascii[\r][\n]" 2016-11-02 14:23:02,604 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,604 TRACE 
httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,604 DEBUG wire.header - << "Server: Microsoft-HTTPAPI/2.0[\r][\n]" 2016-11-02 14:23:02,604 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,604 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,605 DEBUG wire.header - << "WWW-Authenticate: NTLM TlRMTVNTUAACAAAABQAFADgAAAAGAoECr+K/ <snip by bob> AAAAA[\r][\n]" 2016-11-02 14:23:02,605 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,605 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,605 DEBUG wire.header - << "Date: Wed, 02 Nov 2016 19:23:03 GMT[\r][\n]" 2016-11-02 14:23:02,605 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,605 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,605 DEBUG wire.header - << "Content-Length: 341[\r][\n]" 2016-11-02 14:23:02,605 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,605 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,605 DEBUG wire.header - << "[\r][\n]" 2016-11-02 14:23:02,605 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processResponseHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,605 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processCookieHeaders(Header[], HttpState, HttpConnection) 2016-11-02 14:23:02,605 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpState, HttpConnection) 2016-11-02 14:23:02,605 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpConnection) 2016-11-02 14:23:02,605 TRACE httpclient.HttpConnection - enter HttpConnection.getResponseInputStream() 2016-11-02 14:23:02,605 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.canResponseHaveBody(int) 2016-11-02 14:23:02,605 DEBUG 
httpclient.HttpMethodDirector - Authorization required 2016-11-02 14:23:02,605 TRACE httpclient.HttpMethodDirector - enter HttpMethodBase.processAuthenticationResponse(HttpState, HttpConnection) 2016-11-02 14:23:02,606 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm 2016-11-02 14:23:02,606 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed 2016-11-02 14:23:02,606 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@iis75.intranet.org:80 2016-11-02 14:23:02,606 TRACE httpclient.HttpState - enter HttpState.getCredentials(AuthScope) 2016-11-02 14:23:02,606 DEBUG httpclient.HttpMethodDirector - Retry authentication 2016-11-02 14:23:02,606 DEBUG wire.content - << "<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN""http://www.w3.org/TR/html4/strict.dtd">[\r][\n]" 2016-11-02 14:23:02,606 DEBUG wire.content - << "<HTML><HEAD><TITLE>Not Authorized</TITLE>[\r][\n]" 2016-11-02 14:23:02,606 DEBUG wire.content - << "<META HTTP-EQUIV="Content-Type" Content="text/html; charset=us-ascii"></HEAD>[\r][\n]" 2016-11-02 14:23:02,606 DEBUG wire.content - << "<BODY><h2>Not Authorized</h2>[\r][\n]" 2016-11-02 14:23:02,606 DEBUG wire.content - << "<hr><p>HTTP Error 401. 
The requested resource requires user authentication.</p>[\r][\n]" 2016-11-02 14:23:02,606 DEBUG wire.content - << "</BODY></HTML>[\r][\n]" 2016-11-02 14:23:02,606 DEBUG httpclient.HttpMethodBase - Resorting to protocol version default close connection policy 2016-11-02 14:23:02,606 DEBUG httpclient.HttpMethodBase - Should NOT close connection, using HTTP/1.1 2016-11-02 14:23:02,607 TRACE httpclient.HttpConnection - enter HttpConnection.isResponseAvailable() 2016-11-02 14:23:02,607 DEBUG httpclient.HttpMethodDirector - Authenticating with NTLM <any realm>@iis75.intranet.org:80 2016-11-02 14:23:02,607 TRACE httpclient.HttpState - enter HttpState.getCredentials(AuthScope) 2016-11-02 14:23:02,607 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod) 2016-11-02 14:23:02,607 DEBUG params.HttpMethodParams - Credential charset not configured, using HTTP element charset 2016-11-02 14:23:02,658 TRACE httpclient.HttpMethodBase - HttpMethodBase.addRequestHeader(Header) 2016-11-02 14:23:02,658 TRACE httpclient.HttpMethodDirector - Attempt number 1 to process request 2016-11-02 14:23:02,660 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.execute(HttpState, HttpConnection) 2016-11-02 14:23:02,660 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequest(HttpState, HttpConnection) 2016-11-02 14:23:02,660 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.writeRequestLine(HttpState, HttpConnection) 2016-11-02 14:23:02,660 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.generateRequestLine(HttpConnection, String, String, String, String) 2016-11-02 14:23:02,660 DEBUG wire.header - >> "GET / HTTP/1.1[\r][\n]" 2016-11-02 14:23:02,660 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,660 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,661 TRACE 
httpclient.HttpMethodBase - enter HttpMethodBase.writeRequestHeaders(HttpState,HttpConnection) 2016-11-02 14:23:02,661 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addRequestHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,661 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addUserAgentRequestHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,661 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addHostRequestHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,661 DEBUG httpclient.HttpMethodBase - Adding Host request header 2016-11-02 14:23:02,661 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addCookieRequestHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,661 TRACE httpclient.HttpState - enter HttpState.getCookies() 2016-11-02 14:23:02,661 TRACE cookie.CookieSpec - enter CookieSpecBase.match(String, int, String, boolean, Cookie[]) 2016-11-02 14:23:02,661 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.addProxyConnectionHeader(HttpState, HttpConnection) 2016-11-02 14:23:02,661 DEBUG wire.header - >> "Accept-Language: en-us,en-gb,en;q=0.7,*;q=0.3[\r][\n]" 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,661 DEBUG wire.header - >> "Accept-Charset: utf-8,ISO-8859-1;q=0.7,*;q=0.7[\r][\n]" 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,661 DEBUG wire.header - >> "Accept: text/html,application/xml;q=0.9,application/xhtml+xml,text/xml;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5[\r][\n]" 2016-11-02 
14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,661 DEBUG wire.header - >> "Accept-Encoding: x-gzip, gzip, deflate[\r][\n]" 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,661 DEBUG wire.header - >> "User-Agent: APL-Nutch-Spider/Nutch-1.12 ([email protected])[\r][\n]" 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,661 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,661 DEBUG wire.header - >> "Authorization: NTLM TlRMTVNTUAADAAAAGAAYAFUAAAAAAAAAbQAAAAUABQBAAAAABQAF <snip by bob> A==[\r][\n]" 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,662 DEBUG wire.header - >> "Host: iis75.intranet.org[\r][\n]" 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.print(String) 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.writeLine() 2016-11-02 14:23:02,662 TRACE 
httpclient.HttpConnection - enter HttpConnection.write(byte[]) 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.write(byte[], int, int) 2016-11-02 14:23:02,662 DEBUG wire.header - >> "[\r][\n]" 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.flushRequestOutputStream() 2016-11-02 14:23:02,662 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponse(HttpState, HttpConnection) 2016-11-02 14:23:02,662 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readStatusLine(HttpState, HttpConnection) 2016-11-02 14:23:02,662 TRACE httpclient.HttpConnection - enter HttpConnection.readLine() 2016-11-02 14:23:02,662 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,662 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,953 DEBUG wire.header - << "HTTP/1.1 401 Unauthorized[\r][\n]" 2016-11-02 14:23:02,954 DEBUG wire.header - << "HTTP/1.1 401 Unauthorized[\r][\n]" 2016-11-02 14:23:02,954 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseHeaders(HttpState,HttpConnection) 2016-11-02 14:23:02,954 TRACE httpclient.HttpConnection - enter HttpConnection.getResponseInputStream() 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HeaderParser.parseHeaders(InputStream, String) 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,954 DEBUG wire.header - << "Server: Microsoft-IIS/7.5[\r][\n]" 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,954 DEBUG wire.header - << "WWW-Authenticate: Negotiate[\r][\n]" 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 
14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,954 DEBUG wire.header - << "WWW-Authenticate: NTLM[\r][\n]" 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,954 DEBUG wire.header - << "WWW-Authenticate: Basic realm="iis75.intranet.org"[\r][\n]" 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,954 DEBUG wire.header - << "X-Powered-By: ASP.NET[\r][\n]" 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,954 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,954 DEBUG wire.header - << "Date: Wed, 02 Nov 2016 19:23:03 GMT[\r][\n]" 2016-11-02 14:23:02,955 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,955 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,955 DEBUG wire.header - << "Content-Length: 0[\r][\n]" 2016-11-02 14:23:02,955 TRACE httpclient.HttpParser - enter HttpParser.readLine(InputStream, String) 2016-11-02 14:23:02,955 TRACE httpclient.HttpParser - enter HttpParser.readRawLine() 2016-11-02 14:23:02,955 DEBUG wire.header - << "[\r][\n]" 2016-11-02 14:23:02,955 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processResponseHeaders(HttpState, HttpConnection) 2016-11-02 14:23:02,955 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.processCookieHeaders(Header[], HttpState, HttpConnection) 2016-11-02 14:23:02,955 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpState, HttpConnection) 2016-11-02 14:23:02,955 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.readResponseBody(HttpConnection) 
2016-11-02 14:23:02,955 TRACE httpclient.HttpConnection - enter HttpConnection.getResponseInputStream() 2016-11-02 14:23:02,955 TRACE httpclient.HttpMethodBase - enter HttpMethodBase.canResponseHaveBody(int) 2016-11-02 14:23:02,955 DEBUG httpclient.HttpMethodDirector - Authorization required 2016-11-02 14:23:02,955 TRACE httpclient.HttpMethodDirector - enter HttpMethodBase.processAuthenticationResponse(HttpState, HttpConnection) 2016-11-02 14:23:02,955 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm 2016-11-02 14:23:02,955 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed 2016-11-02 14:23:02,955 DEBUG httpclient.HttpMethodDirector - Authentication scope: NTLM <any realm>@iis75.intranet.org:80 2016-11-02 14:23:02,955 DEBUG httpclient.HttpMethodDirector - Credentials required 2016-11-02 14:23:02,955 DEBUG httpclient.HttpMethodDirector - Credentials provider not available 2016-11-02 14:23:02,955 INFO httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@iis75.intranet.org:80 2016-11-02 14:23:02,958 DEBUG httpclient.HttpMethodBase - Resorting to protocol version default close connection policy 2016-11-02 14:23:02,959 DEBUG httpclient.HttpMethodBase - Should NOT close connection, using HTTP/1.1 2016-11-02 14:23:02,959 TRACE httpclient.HttpConnection - enter HttpConnection.isResponseAvailable() 2016-11-02 14:23:02,959 TRACE httpclient.HttpConnection - enter HttpConnection.releaseConnection() 2016-11-02 14:23:02,959 DEBUG httpclient.HttpConnection - Releasing connection back to connection manager. 
2016-11-02 14:23:02,959 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.releaseConnection(HttpConnection) 2016-11-02 14:23:02,959 DEBUG httpclient.MultiThreadedHttpConnectionManager - Freeing connection, hostConfig=HostConfiguration[host=http://iis75.intranet.org] 2016-11-02 14:23:02,959 TRACE httpclient.MultiThreadedHttpConnectionManager - enter HttpConnectionManager.ConnectionPool.getHostPool(HostConfiguration) 2016-11-02 14:23:02,959 DEBUG util.IdleConnectionHandler - Adding connection at: 1478114582959 2016-11-02 14:23:02,959 DEBUG httpclient.MultiThreadedHttpConnectionManager - Notifying no-one, there are no waiting threads 2016-11-02 14:23:02,959 TRACE httpclient.Http - url: http://iis75.intranet.org; status code: 401; bytes received: 0; Content-Length: 0 2016-11-02 14:23:03,239 DEBUG util.ObjectCache - No object cache found for conf=Configuration: core-default.xml, core-site.xml, nutch-default.xml, nutch-site.xml, instantiating a new object cache 2016-11-02 14:23:03,356 TRACE httpclient.Http - 401 Authentication Required

-----Original Message----- From: Furkan KAMACI [mailto:[email protected]] Sent: Wednesday, November 02, 2016 2:20 PM To: [email protected] Cc: Bell, Bob <[email protected]> Subject: Re: Nutch 1.12 NTLM authentication IIS 7.5 Intranet Hi Bob, The server may require the domain as part of the username. For example, "domain\\user". Could you check that? Kind Regards, Furkan KAMACI On Wed, Nov 2, 2016 at 9:11 PM, Bell, Bob <[email protected]> wrote: > <iis75.intranet> is just a string replacement for our > actual intranet name, something like blah.intranet.org, and I use the > <> convention when I am obscuring actual data. > > What might the log4j.properties entry for httpclient.Http be? I see it > is only at INFO level logging, but I do not know the proper object > path to set it up. > > Thanks, > Bob > > >Hi Bob, > > > >Do you write host as <iis75.intranet> or iis75.intranet ?
> > > >Kind Regards, > >Furkan KAMACI > > -----Original Message----- > From: Bell, Bob > Sent: Wednesday, November 02, 2016 12:17 PM > To: '[email protected]' <[email protected]> > Cc: Bell, Bob <[email protected]> > Subject: Nutch 1.12 NTLM authentication IIS 7.5 Intranet > > I have been trying for more than a year to get NTLM to work with IIS 7.5 > without success. I was > happy to see the recent 1.12 release, and thought OK, I will give it > a shot again. I am almost to the point where I do not believe it works with > NTLM, or it does not know how to handle the multiple 401s > that are returned, or I have some fundamental problem somewhere. I > have tried everything I > could think of, and am at a loss on how to solve this mystery. My Nutch > server is CentOS 7 in a > VirtualBox VM. I am using the httpclient as indicated in the docs but > with no love. I can fetch with > anonymous, but I need NTLM to work. > > I am using plugin.includes = >protocol-httpclient > > nutch-site.xml: > <property> > <name>http.auth.file</name> > <value>httpclient-auth.xml</value> > <description>Authentication configuration file for 'protocol-httpclient' > plugin. > </description> > </property> > > httpclient-auth.xml for local user: > <auth-configuration> > <credentials username="nutch" password="<somepassword>"> > <default scheme="basic" port="80"/> > </credentials> > </auth-configuration> > > Here is the output with a local user account on the server. One thing I > notice is that I cannot force authentication to be anything other > than ntlm, even though I support ntlm, basic, and > digest. Notice the scheme was basic, > but it goes through ntlm regardless.
> > [root@localhost nutch]# nutch parsechecker http://<iis75.intranet> > fetching: http://<iis75.intranet> > Whitelisted hosts: [<iis75.intranet>] > http.proxy.host = null > http.proxy.port = 8080 > http.proxy.exception.list = false > http.timeout = 36000 > http.content.limit = 65536 > http.agent = APL-Nutch-Spider/Nutch-1.12 http.accept.language = > en-us,en-gb,en;q=0.7,*;q=0.3 http.accept = > text/html,application/xhtml+ > xml,application/xml;q=0.9,*/*;q=0.8 > Credentials - username: nutch; set as default for realm: ; scheme: > basic Pre-configured credentials with scope - host: <iis75.intranet>; > port: 80; not found for url: http://<iis75.intranet> Authorization > required Supported authentication schemes in the order of preference: > [ntlm, digest, basic] ntlm authentication scheme selected Using > authentication scheme: > ntlm Authorization challenge processed Authentication scope: NTLM <any > realm>@<iis75.intranet>:80 Credentials required Credentials provider > realm>not > available No credentials available for NTLM <any > realm>@<iis75.intranet>:80 > url: http://<iis75.intranet>; status code: 401; bytes received: 0; > Content-Length: 0 > 401 Authentication Required > Fetch failed with protocol status: access_denied(17), lastModified=0: > Authentication required: http://<iis75.intranet> [root@localhost > nutch]# > > > httpclient-auth.xml for domain user: > <auth-configuration> > <credentials username="<domainuser>" password="<domainpassword> > <default host="<iis75.intranet>" scheme="ntlm" port="80" > realm="<domain>"/> > </credentials> > </auth-configuration> > > note: doesn’t matter what I put in the host, doesn’t seem to change > anything. 
> > [root@localhost nutch]# nutch parsechecker http://<iis75.intranet> > fetching: http://<iis75.intranet> > Whitelisted hosts: [<iis75.intranet>] > http.proxy.host = null > http.proxy.port = 8080 > http.proxy.exception.list = false > http.timeout = 36000 > http.content.limit = 65536 > http.agent = APL-Nutch-Spider/Nutch-1.12 http.accept.language = > en-us,en-gb,en;q=0.7,*;q=0.3 http.accept = > text/html,application/xhtml+ > xml,application/xml;q=0.9,*/*;q=0.8 > Credentials - username: <domainuser>"; set as default for realm: > =<domain>; scheme: ntlm Pre-configured credentials with scope - host: > <iis75.intranet>; port: 80; not found for url: http://<iis75.intranet> > Authorization required Supported authentication schemes in the order > of > preference: [ntlm, digest, basic] ntlm authentication scheme selected > Using authentication scheme: ntlm Authorization challenge processed > Authentication scope: NTLM <any realm>@<iis75.intranet>:80 Retry > authentication Authenticating with NTLM <any > realm>@<iis75.intranet>:80 enter NTLMScheme.authenticate(Credentials, > HttpMethod) Authorization required Using authentication scheme: ntlm > Authorization challenge processed Authentication scope: NTLM <any > realm>@<iis75.intranet>:80 Retry authentication Authenticating with > NTLM <any realm>@<iis75.intranet>:80 enter > NTLMScheme.authenticate(Credentials, HttpMethod) Authorization > required Using authentication scheme: ntlm Authorization challenge > processed Authentication scope: NTLM <any realm>@<iis75.intranet>:80 > Credentials required Credentials provider not available Failure > authenticating with NTLM <any realm>@<iis75.intranet>:80 > url: http://<iis75.intranet>; status code: 401; bytes received: 0; > Content-Length: 0 > 401 Authentication Required > Fetch failed with protocol status: access_denied(17), lastModified=0: > Authentication required: http://<iis75.intranet> > > Last entry in Hadoop.log: > > 2016-11-02 12:08:49,568 INFO parse.ParserChecker - 
fetching: http:// > <iis75.intranet> > 2016-11-02 12:08:50,040 DEBUG util.ObjectCache - No object cache found > for > conf=Configuration: core-default.xml, core-site.xml, > nutch-default.xml, nutch-site.xml, instantiating a new object cache > 2016-11-02 12:08:50,119 INFO protocol.RobotRulesParser - Whitelisted > hosts: [<iis75.intranet>] > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.host = null > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.proxy.port = 8080 > 2016-11-02 12:08:50,119 INFO httpclient.Http - > http.proxy.exception.list = false > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.timeout = 36000 > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.content.limit = > 65536 > 2016-11-02 12:08:50,119 INFO httpclient.Http - http.agent = > APL-Nutch-Spider/Nutch-1.12 ([email protected]) > 2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept.language = > en-us,en-gb,en;q=0.7,*;q=0.3 > 2016-11-02 12:08:50,120 INFO httpclient.Http - http.accept = > text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 > 2016-11-02 12:08:50,133 TRACE httpclient.Http - Credentials - username: > <domainuser>; set as default for realm: <domain>; scheme: ntlm > 2016-11-02 12:08:50,134 TRACE httpclient.Http - Pre-configured > credentials with scope - host: <iis75.intranet>; port: 80; not found > for url: http:// <iis75.intranet> > 2016-11-02 12:08:50,313 DEBUG httpclient.HttpMethodDirector - > Authorization required > 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Supported > authentication schemes in the order of preference: [ntlm, digest, > basic] > 2016-11-02 12:08:50,320 INFO auth.AuthChallengeProcessor - ntlm > authentication scheme selected > 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - Using > authentication scheme: ntlm > 2016-11-02 12:08:50,320 DEBUG auth.AuthChallengeProcessor - > Authorization challenge processed > 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - > Authentication scope: NTLM 
<any realm>@<iis75.intranet>:80 > 2016-11-02 12:08:50,320 DEBUG httpclient.HttpMethodDirector - Retry > authentication > 2016-11-02 12:08:50,321 DEBUG httpclient.HttpMethodDirector - > Authenticating with NTLM <any realm>@<iis75.intranet>:80 > 2016-11-02 12:08:50,321 TRACE auth.NTLMScheme - enter > NTLMScheme.authenticate(Credentials, HttpMethod) > 2016-11-02 12:08:50,351 DEBUG httpclient.HttpMethodDirector - > Authorization required > 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - Using > authentication scheme: ntlm > 2016-11-02 12:08:50,352 DEBUG auth.AuthChallengeProcessor - > Authorization challenge processed > 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - > Authentication scope: NTLM <any realm>@<iis75.intranet>:80 > 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - Retry > authentication > 2016-11-02 12:08:50,352 DEBUG httpclient.HttpMethodDirector - > Authenticating with NTLM <any realm>@<iis75.intranet>:80 > 2016-11-02 12:08:50,352 TRACE auth.NTLMScheme - enter > NTLMScheme.authenticate(Credentials, HttpMethod) > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - > Authorization required > 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - Using > authentication scheme: ntlm > 2016-11-02 12:08:50,393 DEBUG auth.AuthChallengeProcessor - > Authorization challenge processed > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - > Authentication scope: NTLM <any realm>@<iis75.intranet>:80 > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - > Credentials required > 2016-11-02 12:08:50,393 DEBUG httpclient.HttpMethodDirector - > Credentials provider not available > 2016-11-02 12:08:50,393 INFO httpclient.HttpMethodDirector - Failure > authenticating with NTLM <any realm>@<iis75.intranet>:80 > 2016-11-02 12:08:50,395 TRACE httpclient.Http - url: > http://<iis75.intranet>; status code: 401; bytes received: 0; > Content-Length: 0 > 2016-11-02 12:08:50,681 DEBUG util.ObjectCache - No 
object cache found > for > conf=Configuration: core-default.xml, core-site.xml, > nutch-default.xml, nutch-site.xml, instantiating a new object cache > 2016-11-02 12:08:50,804 TRACE httpclient.Http - 401 Authentication > Required > > Any help is appreciated, as I am about to move on to another spider > for Solr. > > Thanks, > Bob > >
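
For comparison with the two httpclient-auth.xml variants quoted above, here is a minimal sketch of a host-scoped NTLM entry. It assumes the `<authscope>` element shown in the comments of the stock conf/httpclient-auth.xml, and that `<default>` only reads `realm` and `scheme`. If that holds, the `host` and `port` attributes on `<default>` would be ignored, which would fit the "Pre-configured credentials with scope ... not found" trace. DOMAINUSER, DOMAINPASSWORD and DOMAIN are placeholders, and passing the Windows domain through `realm` is an assumption, not something confirmed in this thread.

```xml
<?xml version="1.0"?>
<!-- Sketch only: placeholder values, not a verified working config. -->
<auth-configuration>
  <!-- Every attribute value needs its closing quote; a missing one on
       password makes the file malformed. -->
  <credentials username="DOMAINUSER" password="DOMAINPASSWORD">
    <!-- Assumption: host and port belong on authscope, not on default. -->
    <authscope host="iis75.intranet.org" port="80"
               realm="DOMAIN" scheme="ntlm"/>
  </credentials>
</auth-configuration>
```

If the scope is picked up, the "not found for url" trace line should no longer appear for that host; whether IIS then accepts the NTLM type 3 message is a separate question.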

