I'm seeing many cases where ParserChecker finds outlinks in a document, but when I run a crawl on that same document, those outlinks never appear in the crawldb (and are not indexed).
My URL filters are trivial as far as I can tell, and the missing links are not special in any way that I can see.

For example, `/bin/nutch parsechecker -dumpText "http://corporate.exxonmobil.com/"` finds, among others, the URLs https://energyfactor.exxonmobil.com/ and http://corporate.exxonmobil.com/en/investors/corporate-governance.

However, when I run `bin/crawl urls_yossi yossi 2` with only http://corporate.exxonmobil.com/ in urls_yossi, and then dump yossi/crawldb (using `nutch readdb`), neither of those two URLs is found. When the crawl finishes, the crawldb contains 786 entries, which is far below topN.

Any idea what could be causing these URLs to be ignored?
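
In case it helps anyone reproduce the check, here is a minimal sketch of how the comparison against the crawldb dump could be scripted. It assumes that each record in the plain-text dump starts with a line of the form `<url><TAB>Version: <n>`; that layout is an assumption about the dump format and should be verified against the actual output. The function names and the sample dump lines are made up for illustration and are not real crawl output.

```python
def urls_in_dump(lines):
    """Collect the record URLs from a plain-text crawldb dump.

    Assumption: every record begins with "<url>\tVersion: <n>".
    All other lines (status, fetch time, metadata) are skipped.
    """
    urls = set()
    for line in lines:
        url, sep, rest = line.rstrip("\n").partition("\t")
        if sep and rest.startswith("Version:"):
            urls.add(url)
    return urls


def missing_urls(expected, lines):
    """Return the expected URLs that have no record in the dump, in order."""
    found = urls_in_dump(lines)
    return [url for url in expected if url not in found]


# Made-up sample standing in for a real dump file (illustration only).
sample_dump = [
    "http://corporate.exxonmobil.com/\tVersion: 7\n",
    "Status: 2 (db_fetched)\n",
    "\n",
]

# The two outlinks that parsechecker reports for the seed page.
expected = [
    "https://energyfactor.exxonmobil.com/",
    "http://corporate.exxonmobil.com/en/investors/corporate-governance",
]

if __name__ == "__main__":
    for url in missing_urls(expected, sample_dump):
        print("not in crawldb dump:", url)
```

To run this against a real dump, replace `sample_dump` with the lines of the dumped file, for example `open(path_to_dump_file)`, where `path_to_dump_file` is whatever file the dump step wrote.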