Which parser are you using for HTML: parse-html or parse-tika?

On 1 August 2011 20:00, webdev1977 <[email protected]> wrote:

> I had protocol-httpclient working in 1.2 and sending certificates for a
> group of sites. I moved the plugin over to the 1.3 environment and it
> won't work. I am having the same issue as the OP: no content is parsed
> for the seed URL. I see it come in on debug.wire... <html>....
> https://domain.com/test.php?id=123 link ...</html>..
> but then it does nothing with the links here. I have tried changing my
> filters multiple times and it just won't parse them. I also ran the
> ParseChecker class and I get "0" outlinks.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Fetched-pages-has-no-content-tp3171881p3216762.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
