Hi

I'm hoping for some help getting sitemap.xml crawling to work.
I'm using this command to crawl (Nutch 1.18):

$NUTCH_HOME/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch \
  --sitemaps-from-hostdb always -s $NUTCH_HOME/urls/ $NUTCH_HOME/Crawl 10

If the flag --sitemaps-from-hostdb always is used, this error occurs:

Generator: number of items rejected during selection:
Generator:    201  SCHEDULE_REJECTED
Generator: 0 records selected for fetching, exiting ...

Without this flag present, it crawls the site without issue.

In nutch-default.xml I set the fetch interval to 2 seconds from the default of 30 days:

  <property>
    <name>db.fetch.interval.default</name>
    <value>2</value>
  </property>
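
If it helps, this is the kind of check I was hoping to run to confirm the next
fetch times actually stored in the crawldb (the crawldb path is just where I
expect the crawl script to have left it):

$NUTCH_HOME/bin/nutch readdb $NUTCH_HOME/Crawl/crawldb -dump crawldb_dump -format csv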

I also don't understand why the crawldb is automatically deleted after each
crawl, so I cannot run any commands against the URLs that were not crawled.
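For example, something along these lines (assuming the crawldb were still
present under $NUTCH_HOME/Crawl):

$NUTCH_HOME/bin/nutch readdb $NUTCH_HOME/Crawl/crawldb -stats
$NUTCH_HOME/bin/nutch readdb $NUTCH_HOME/Crawl/crawldb -dump unfetched_dump -status db_unfetched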

Any help would be appreciated.

-- 

Andrew MacKay

