Yes, as long as you crawl only web pages (not .pdf, .doc, and similar files).
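
Since you want a rough time per page, here is a minimal sketch of the arithmetic in Java. The class name `CrawlTiming`, the method `averageMillisPerPage`, and the 120-second crawl duration are all hypothetical and only for illustration; the page count of 40 is the "Number of documents" value you saw in Luke. Measure the real crawl duration yourself (for example from the start and end timestamps of the crawl run).

```java
public class CrawlTiming {

    /**
     * Average wall-clock time per page in milliseconds.
     * crawlMillis: total crawl duration (measured by you, not by Nutch here).
     * pageCount:   number of documents in the index, e.g. as reported by Luke.
     */
    static double averageMillisPerPage(long crawlMillis, int pageCount) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive");
        }
        return (double) crawlMillis / pageCount;
    }

    public static void main(String[] args) {
        int pageCount = 40;          // "Number of documents" from Luke in this thread
        long crawlMillis = 120_000L; // hypothetical duration, replace with your own measurement
        System.out.printf("about %.0f ms per page%n",
                averageMillisPerPage(crawlMillis, pageCount));
    }
}
```

This is only an average: it folds fetch delays, parsing, and indexing into one number, so treat it as a rough estimate rather than a per-page measurement.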
2011/1/31 .: Abhishek :. <[email protected]>:
> Hi,
>
> Thanks for the update. I tried using the Luke tool.
>
> It shows the "Number of documents" as 40. So is this the number of pages?
>
> Thanks,
> Abhi
>
> On Mon, Jan 31, 2011 at 1:01 PM, 黄淑明 <[email protected]> wrote:
>
>> Nutch describes each page as a "document", so you can get the total
>> document count with an index tool such as Luke ("Number of documents"),
>> or you can get it in code, for example:
>> IndexSearcher searcher = new IndexSearcher(dir);
>> searcher.maxDoc();
>>
>> Hope this helps.
>>
>> tiger
>> 2011/01/31
>>
>> 2011/1/31 .: Abhishek :. <[email protected]>:
>> > Hi folks,
>> >
>> > How do I find out the number of pages Nutch has crawled?
>> >
>> > I see from the tutorial below,
>> >
>> > http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
>> >
>> > that readdb gives the number of pages and URLs. I am using Nutch 1.2
>> > and I am unable to get the number of pages crawled using the readdb
>> > command.
>> >
>> > I actually need to roughly calculate the time taken to crawl a single
>> > page, so the number of pages would be a great help.
>> >
>> > Thanks,
>> > Abhishek

