Hello,

I am using Nutch 2.3.1.

I run the following commands:
./nutch inject ../urls/seed.txt
./nutch generate -topN 2500
./nutch fetch -all

The problem is that the stored data contains only the raw HTML from the first
URL/page. None of the other URLs that were accumulated by the generate command
are actually crawled.

I cannot get Nutch to crawl the other generated URLs, and I also cannot get
it to crawl the entire website. Which options do I need to use to crawl an
entire site?

Does anyone have any insights or recommendations?

Thank you so much for your help,
-T
