Hi,

On Tue, Mar 5, 2013 at 7:22 AM, raviksingh <ravisingh.air...@gmail.com> wrote:
> I am new to Nutch. I have already configured Nutch with MySQL. I have a few
> questions:

I would like to start by saying that this is not a great idea. If you read
this list you will see why.

> 1. Currently I am crawling all the domains from my SEED.TXT. If some
> exception occurs the crawling stops and some domains are not crawled, just
> because of one domain/webpage. Is there a way to force Nutch to continue
> crawling after an exception occurs?

What are the exceptions?

> 2. I want domains/URLs to be crawled from the DB. Currently I am reading
> from the DB and writing to SEED.TXT before starting to crawl. Is there a
> better way?

Not yet; this has also been discussed pretty thoroughly.

> 3. Is there a way to provide a URLFilter for scanning/restricting a
> particular domain/URL programmatically? I have checked
> org.apache.nutch.net.URLFilter. I was unable to make it work.

Please give an example of what you are trying to do here. Are you using the
de facto scripts provided with Nutch or something else to run your Nutch
server?

--
*Lewis*