Re: Continue Nutch Crawling After Exception

Lewis John Mcgibbney Tue, 05 Mar 2013 10:59:28 -0800

Hi,

On Tue, Mar 5, 2013 at 7:22 AM, raviksingh <ravisingh.air...@gmail.com> wrote:


> I am new to Nutch. I have already configured Nutch with MySQL. I have a few
> questions:
>

I would like to start by saying that this is not a great idea. If you read
this list you will see why.


>
> 1. Currently I am crawling all the domains from my SEED.TXT. If an
> exception occurs, the crawling stops and some domains are not crawled, just
> because of one domain/webpage. Is there a way to force Nutch to continue
> crawling after an exception occurs?
>

What are the exceptions?


>
> 2. I want domains/URLs to be crawled from the DB. Currently I am reading
> from the DB and writing to SEED.TXT before starting to crawl. Is there a
> better way?
>

Not yet; this has also been discussed pretty thoroughly.


>
> 3. Is there a way to provide a URLFilter for scanning/restricting a
> particular domain/URL programmatically? I have checked
> org.apache.nutch.net.URLFilter. I was unable to make it work.
>
>
Please give an example of what you are trying to do here. Are you using the
de facto scripts provided with Nutch or something else to run your Nutch
server?
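
For reference while you put that example together: the contract behind
org.apache.nutch.net.URLFilter is small. Its filter(String) method returns
the URL to keep it, or null to drop it. Below is a minimal standalone sketch
of a per-domain filter following that contract. The class name
DomainAllowFilter and the allowed-host list are hypothetical, and the class
deliberately does not depend on the Nutch jars; a real plugin would implement
the Nutch interface and would also need to be enabled through plugin.includes
in nutch-site.xml.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical standalone class for illustration only. In a real Nutch
// plugin this would implement org.apache.nutch.net.URLFilter.
public class DomainAllowFilter {

    // Hosts we are willing to crawl (an assumption, not from the thread).
    private final Set<String> allowedHosts;

    public DomainAllowFilter(Set<String> allowedHosts) {
        this.allowedHosts = new HashSet<>(allowedHosts);
    }

    // Same contract as URLFilter.filter: return the URL to accept it,
    // or null to reject it.
    public String filter(String urlString) {
        try {
            String host = new URI(urlString).getHost();
            if (host == null) {
                return null; // no host component, reject
            }
            host = host.toLowerCase();
            for (String allowed : allowedHosts) {
                // Accept the domain itself and any of its subdomains.
                if (host.equals(allowed) || host.endsWith("." + allowed)) {
                    return urlString;
                }
            }
            return null; // host not in the allow list
        } catch (URISyntaxException e) {
            return null; // unparsable URL, reject
        }
    }
}
```

If the goal is only to restrict the crawl to certain domains, the stock
regex URL filter configured through regex-urlfilter.txt may already be
enough without writing any code.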
-- 
*Lewis*
