There haven't been any changes to it between 1.2 and 1.3, and it was already
broken then. It does not handle multithreading well, leading to all sorts of
random exceptions. A good replacement would be to take the code that Ken
contributed to Crawler-Commons and wrap it as a protocol endpoint. I am not
entirely sure whether it can already handle certificates, but if not this
would be a good thing to add to CC.
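For what it's worth, the certificate side need not depend on any particular
HTTP library: a fetcher can present a client certificate by building an
SSLContext with plain JDK classes and handing it to whichever client it uses.
A rough sketch (the class name is made up for illustration, and an empty
in-memory keystore stands in for a real JKS/PKCS12 file holding the cert):

```java
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;
import java.security.KeyStore;

public class ClientCertSetup {

    public static SSLContext buildContext(KeyStore clientKeys, char[] keyPass)
            throws Exception {
        // Key managers present our client certificate during the TLS handshake
        KeyManagerFactory kmf =
            KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientKeys, keyPass);

        // Trust managers decide which server certificates we accept;
        // initialising with null falls back to the JRE's default cacerts store
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null);

        SSLContext ctx = SSLContext.getInstance("TLS");
        ctx.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return ctx;
    }

    public static void main(String[] args) throws Exception {
        // Demo only: an empty keystore; in practice you would load the
        // user's keystore file, e.g. ks.load(new FileInputStream(path), pass)
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        ks.load(null, null);
        SSLContext ctx = buildContext(ks, new char[0]);
        System.out.println(ctx.getProtocol());
    }
}
```

The resulting SSLContext can then be plugged into the socket factory of
whatever HTTP client the protocol plugin ends up using.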

Sorry if you've already done so, but would you mind explaining what no longer
works for you and what exceptions you are getting?


On 1 August 2011 20:28, webdev1977 <[email protected]> wrote:

> I have just recently learned that it is recommended not to use
> protocol-httpclient due to the underlying commons http library and problems
> with this.
>
> I am very disappointed to learn this as about half of my domains to crawl
> use https and require certs.  Does anyone know how much of an effort it
> would be to port to the apache http client?
>
> Also, are there any JIRA issues open that might describe some of the
> problems we are having with it.  I had it working perfectly fine in 1.2,
> upgraded to 1.3 and now it is not working :-(
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/protocol-httpclient-tp3216821p3216821.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
