Hi, I'm using Nutch 2.2.1.
Each of the 4 jobs in the crawl cycle, as explained at http://wiki.apache.org/nutch/Nutch2Crawling, needs to reread the entire webtable to get started. This is a serious bottleneck for my use case.

I know that the fetch and parse jobs can be combined via the Nutch config. This removes the need to run the parse job separately, so the webtable does not need to be read again for parsing.

The page I linked to states that a future development might combine the generate and fetch stages so that only one read of the webtable is required. Has anyone attempted to do this? Is there a patch out there for a combined generator and fetch job?

Thanks,
Az

