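Hi - I am not familiar with 2.x, but if those are all of your commands, then two steps are missing. First, nothing is parsed: you need either a separate parse job or fetcher.parse=true. Second, you are not running an updatedb job to write the discovered records back to the DB. Without both, the outlinks from the first page never become fetchable records, so the next generate has nothing new to select.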
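As a rough sketch of the full cycle, assuming the 2.x parse and updatedb jobs accept -all the same way fetch does in your commands (please check the usage output of ./nutch parse and ./nutch updatedb on 2.3.1). The NUTCH, SEED and TOPN variables and the crawl_round and crawl function names are made up for illustration:

```shell
# Sketch of a Nutch 2.x crawl loop. NUTCH, SEED, TOPN and the function
# names are hypothetical; the parse/updatedb arguments are assumptions.
NUTCH="${NUTCH:-./nutch}"          # path to the nutch launcher script
SEED="${SEED:-../urls/seed.txt}"   # seed list, as in the original commands
TOPN="${TOPN:-2500}"               # max URLs selected per generate

# One generate -> fetch -> parse -> updatedb cycle.
crawl_round() {
  "$NUTCH" generate -topN "$TOPN" || return 1
  "$NUTCH" fetch -all || return 1
  "$NUTCH" parse -all || return 1      # extracts outlinks from fetched pages
  "$NUTCH" updatedb -all || return 1   # writes discovered URLs back to the DB
}

# Inject the seeds once, then run the cycle a fixed number of times.
# Each round can reach pages one link further away from the seeds.
crawl() {
  rounds="$1"
  "$NUTCH" inject "$SEED" || return 1
  i=1
  while [ "$i" -le "$rounds" ]; do
    crawl_round || return 1
    i=$((i + 1))
  done
}
```

Calling crawl 5 would inject once and then run five rounds. The sketch does not detect when no new URLs are left, so for a whole site you would pick a round count at least as large as the site's link depth.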
Markus

-----Original message-----
> From:Tom Running <runningt...@gmail.com>
> Sent: Tuesday 1st March 2016 5:39
> To: user@nutch.apache.org
> Subject: Nutch cannot crawl entire website
>
> Hello,
>
> I am using nutch 2.3.1
>
> I perform the commands:
> ./nutch inject ../urls/seed.txt
> ./nutch generate -topN 2500
> ./nutch fetch -all
>
> The problem is, the data only displays the raw HTML from the first
> URL/page. All the other URLs that were accumulated by the generate command
> are not actually crawled.
>
> I cannot get nutch to crawl the other generated urls...I also cannot get
> nutch to crawl the entire website. What are the options that I need to use
> to crawl an entire site?
>
> Does anyone have any insights or recommendations?
>
> Thank you so much for your help,
> -T
>