Dear all,

I've been trying to crawl and index an HTTPS intranet, but the generator
keeps reporting 0 links to fetch after authenticating and parsing the
first page. It seems that something is wrong with the parser when it is
used with HTTPS (httpclient).

Here's the command that I'm using to reproduce the error:

bin/nutch org.apache.nutch.parse.ParserChecker http://server/user/library

cmd output: http://pastebin.com/h5e7wAZ5

hadoop.log: http://pastebin.com/S7ieS2TT (you can see that the page is
fetched, and its contents appear around line 300)

Any ideas/help will be appreciated,

Alfredas
