I kicked off a bunch of web crawls on Friday to run over the weekend. They all started fine but didn't finish; all activity seemed to stop after a couple of hours, and I can't find any errors in the logs. Each job is configured as a complete crawl that runs every 24 hours.
I don't expect you to have an answer to what went wrong with such limited information, but I did see a problem with robots.txt (at the bottom of this email). Does it mean robots.txt was not used at all for the crawl, or just that one part of it was ignored? (I half expected this kind of error to kill the crawl, but maybe I just don't understand it.) If the crawl were ignoring the robots.txt, or a part of it, and the crawled site banned my crawler, what would I see in the MCF logs?

Thanks,
Mark

02-09-2014 09:54:48.679  robots parse  somesite.gov:80  ERRORS  0  1  Unknown robots.txt line: 'Sitemap: < http://www.somesite.gov/sitemapindex.xml>'
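For what it's worth, judging from the quoted message, I'd guess the site's robots.txt contains something like the snippet below (the User-agent and Disallow lines are just my guesses for context; only the Sitemap line is taken verbatim from the error). The angle brackets around the sitemap URL aren't part of the standard Sitemap syntax, so that may be what the parser is objecting to:

    # hypothetical rules for context; only the Sitemap line is from the log
    User-agent: *
    Disallow: /private/
    Sitemap: < http://www.somesite.gov/sitemapindex.xml>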
