Hello,

I'm using Nutch to crawl a mailing list index that we have here
internally. I'd like to be able to force Nutch to recrawl just the
index page, so it can find the mailing list posts that are new since
the last crawl. Will re-injecting the URL into the crawldb accomplish
this, or is there some other way to do it? I'd like to set the max
recrawl age high enough that the other pages would, in theory, never
be recrawled (there's no point, because it's an email archive that's
never going to change), but I can't do that until I'm sure that I can
force a recrawl of this one specific page.

Thanks,
Eddie