Not finding links when using HTTPS (httpclient)

Alfredas Chmieliauskas Fri, 07 Oct 2011 01:17:28 -0700

Dear all,

I've been trying to crawl and index an HTTPS intranet, but the generator
keeps reporting 0 links to fetch after authenticating and parsing the
first page. Something seems to be wrong with the parser when it is used
with HTTPS (httpclient).


Here's the command I'm using to reproduce the error:

bin/nutch org.apache.nutch.parse.ParserChecker http://server/user/library

cmd output:  http://pastebin.com/h5e7wAZ5

hadoop.log: http://pastebin.com/S7ieS2TT (around line 300 you can see that
the page is fetched, along with its contents)

Any ideas/help will be appreciated,

Alfredas
