Thanks for responding! 

I've hit it again with TRACE logging enabled. Here are the results:

2019-04-25 08:53:10,261 INFO  parse.ParserChecker - fetching: http://url.com/crawltest.html
2019-04-25 08:53:10,268 INFO  plugin.PluginRepository - Plugins: looking in: C:\nutch\apache-nutch-1.5.1\plugins
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository - Plugin Auto-activation mode: [true]
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository - Registered Plugins:
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Html Parse Plug-in (parse-html)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         HTTP Framework (lib-http)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Http / Https Protocol Plug-in (protocol-httpclient)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Regex URL Filter (urlfilter-regex)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         the nutch core extension points (nutch-extensionpoints)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Basic Indexing Filter (index-basic)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Anchor Indexing Filter (index-anchor)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Tika Parser Plug-in (parse-tika)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Basic URL Normalizer (urlnormalizer-basic)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Regex URL Filter Framework (lib-regex-filter)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Regex URL Normalizer (urlnormalizer-regex)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         URL Validator (urlfilter-validator)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         CyberNeko HTML Parser (lib-nekohtml)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Pass-through URL Normalizer (urlnormalizer-pass)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         OPIC Scoring Plug-in (scoring-opic)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Http Protocol Plug-in (protocol-http)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository - Registered Extension-Points:
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Nutch Content Parser (org.apache.nutch.parse.Parser)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Nutch URL Filter (org.apache.nutch.net.URLFilter)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         HTML Parse Filter (org.apache.nutch.parse.HtmlParseFilter)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Nutch Scoring (org.apache.nutch.scoring.ScoringFilter)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Nutch URL Normalizer (org.apache.nutch.net.URLNormalizer)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Nutch Protocol (org.apache.nutch.protocol.Protocol)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Nutch Segment Merge Filter (org.apache.nutch.segment.SegmentMergeFilter)
2019-04-25 08:53:10,350 INFO  plugin.PluginRepository -         Nutch Indexing Filter (org.apache.nutch.indexer.IndexingFilter)
2019-04-25 08:53:10,377 INFO  httpclient.Http - http.proxy.host = null
2019-04-25 08:53:10,377 INFO  httpclient.Http - http.proxy.port = 8080
2019-04-25 08:53:10,378 INFO  httpclient.Http - http.timeout = 10000
2019-04-25 08:53:10,379 INFO  httpclient.Http - http.content.limit = -1
2019-04-25 08:53:10,379 INFO  httpclient.Http - http.agent = Spider/Nutch-1.5.1
2019-04-25 08:53:10,379 INFO  httpclient.Http - http.accept.language = en-us,en-gb,en;q=0.7,*;q=0.3
2019-04-25 08:53:10,380 INFO  httpclient.Http - http.accept = text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
2019-04-25 08:53:10,385 TRACE httpclient.Http - Credentials - username: user; set as default for realm: ntdomain; scheme: 
2019-04-25 08:53:10,392 TRACE httpclient.Http - Pre-configured credentials with scope -  host: url.com; port: 80; not found for url: http://url.com/crawltest.html
2019-04-25 08:53:10,449 DEBUG auth.AuthChallengeProcessor - Supported authentication schemes in the order of preference: [ntlm, digest, basic]
2019-04-25 08:53:10,449 INFO  auth.AuthChallengeProcessor - ntlm authentication scheme selected
2019-04-25 08:53:10,450 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2019-04-25 08:53:10,450 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2019-04-25 08:53:10,452 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
2019-04-25 08:53:10,460 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2019-04-25 08:53:10,460 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2019-04-25 08:53:10,461 TRACE auth.NTLMScheme - enter NTLMScheme.authenticate(Credentials, HttpMethod)
2019-04-25 08:53:10,952 DEBUG auth.AuthChallengeProcessor - Using authentication scheme: ntlm
2019-04-25 08:53:10,953 DEBUG auth.AuthChallengeProcessor - Authorization challenge processed
2019-04-25 08:53:10,955 INFO  httpclient.HttpMethodDirector - Failure authenticating with NTLM <any realm>@url.com:80
2019-04-25 08:53:10,959 TRACE httpclient.Http - url: http://url.com/crawltest.html; status code: 401; bytes received: 6322; Content-Length: 6322
2019-04-25 08:53:11,033 TRACE httpclient.Http - 401 Authentication Required
2019-04-25 08:53:11,133 INFO  crawl.SignatureFactory - Using Signature impl: org.apache.nutch.crawl.MD5Signature
2019-04-25 08:53:11,135 INFO  parse.ParserChecker - parsing: http://urlcom/crawltest.html
2019-04-25 08:53:11,135 INFO  parse.ParserChecker - contentType: application/xhtml+xml
2019-04-25 08:53:11,138 INFO  parse.ParserChecker - signature: 495abb7f991fb4dd6a056f748908a2d9

Regarding what shows up in the server's security events, a couple of interesting things:
1. The server sees the request, but the failure reason is "Unknown user name or bad
password". The user name and password being sent from httpclient-auth.xml are
exactly the same as the ones I'm sending with the curl command.
2. Unlike with the curl command, the Account Name being sent over is all upper
case! I suspect this has something to do with it. Again, though, the user name
in httpclient-auth.xml is NOT all upper case.

--
Sent from: http://lucene.472066.n3.nabble.com/Nutch-User-f603147.html