That result count doesn't seem unreasonable to me if you're running in local mode. Assuming you're partitioning by host, all of those URLs are on the same host, and you have a 3-second politeness delay, you should end up with a crawl lasting about

21497 * 3 / 60 / 60 = 17.9 hours

There's a wiki page on crawl optimization that might help you out:

https://wiki.apache.org/nutch/OptimizingCrawls

As for configuration documentation, check the property descriptions in nutch-default.xml and try poking around the wiki for more info. I think your best bet is going to be what's in the conf files.

-- Jimmy

On Tue, Sep 29, 2015 at 6:13 PM, Pramod Setlur <set...@usc.edu> wrote:

> Hello,
>
> We had left Nutch to crawl with 25 urls and 7 rounds. After around 15 hrs
> it was able to fetch only 10% of the URLs.
>
> I have attached a screenshot for a better reference. Can you guide us on
> what other configurations need to be added to improve crawling?
>
> Also where can I learn more about configurations of Nutch. Eg: increasing
> the threads for crawling, etc.
>
> Thank you,
> Best Regards,
> Pramod P. Setlur
>
> alt email id: pramodset...@gmail.com
> [M] - +1-(323)-637-5256
> USC ID: 7369871317
> LinkedIn <http://www.linkedin.com/pub/pramod-setlur/3b/726/270/>
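
P.S. In case it helps to redo the estimate in this reply with your own numbers, here is a minimal Python sketch of the same arithmetic. The function name is made up for illustration and is not part of Nutch. It assumes every URL sits on one host and is fetched one at a time with a fixed delay, and it ignores download and parse time, so treat the result as a rough lower bound.

```python
def estimated_crawl_hours(url_count: int, delay_seconds: float) -> float:
    """Rough lower bound on fetch time for a single-host, serial crawl.

    Hypothetical helper: assumes one request per politeness delay and
    ignores the time spent downloading and parsing each page.
    """
    return url_count * delay_seconds / 3600  # seconds -> hours


# Numbers from the estimate in this reply: 21497 fetches, 3-second delay.
hours = estimated_crawl_hours(21497, 3)
print(f"{hours:.1f} hours")  # prints "17.9 hours"
```

Halving the delay, or spreading the URLs across more hosts so they can be fetched in parallel, shrinks the estimate proportionally.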