Thanks a bunch 黄淑明

2011/1/31 黄淑明 <[email protected]>
> Yes, if you only crawl web pages (not including .pdf, .doc, ...).
>
> 2011/1/31 .: Abhishek :. <[email protected]>:
> > Hi,
> >
> > Thanks for the update. I tried using the Luke tool.
> >
> > It shows the "Number of documents" as 40. So is this the number of pages?
> >
> > Thanks,
> > Abhi
> >
> > On Mon, Jan 31, 2011 at 1:01 PM, 黄淑明 <[email protected]> wrote:
> >
> >> Nutch describes each page as a "document", so you can get the total
> >> document count with an index tool such as Luke ("Number of documents"),
> >> or you can get it in code, for example:
> >>
> >>     IndexSearcher searcher = new IndexSearcher(dir);
> >>     searcher.maxDoc();
> >>
> >> Hope this helps.
> >>
> >> tiger
> >> 2011/01/31
> >>
> >> 2011/1/31 .: Abhishek :. <[email protected]>:
> >> > Hi folks,
> >> >
> >> > How do I find out the number of pages Nutch has crawled?
> >> >
> >> > I see from the tutorial below,
> >> >
> >> > http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
> >> >
> >> > that readdb gives the number of pages and URLs. I am using Nutch 1.2
> >> > and I am unable to get the number of pages crawled using the readdb
> >> > command.
> >> >
> >> > I actually need to roughly calculate the time taken to crawl a single
> >> > page, so the number of pages would be a great help.
> >> >
> >> > Thanks,
> >> > Abhishek
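
For the rough per-page timing asked about in the original question, here is a minimal sketch in plain Java. It assumes the page count is the "Number of documents" value from Luke (40 in this thread) and that the total crawl time was measured separately. The class name `CrawlTiming`, the method `averageMillisPerPage`, and the 10-minute duration are made up for illustration and are not part of Nutch or Lucene.

```java
import java.time.Duration;

public class CrawlTiming {

    /**
     * Average wall-clock time per page, in milliseconds.
     * Hypothetical helper for illustration; not a Nutch or Lucene API.
     */
    static double averageMillisPerPage(Duration totalCrawlTime, int pageCount) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive");
        }
        return (double) totalCrawlTime.toMillis() / pageCount;
    }

    public static void main(String[] args) {
        // 40 is the "Number of documents" value Luke reported in this thread.
        int pageCount = 40;
        // Assumed total crawl time; replace with the duration you measured.
        Duration totalCrawlTime = Duration.ofMinutes(10);

        double perPage = averageMillisPerPage(totalCrawlTime, pageCount);
        System.out.println("Average ms per page: " + perPage);
    }
}
```

This is only an average over the whole run. As noted above, the document count matches the page count only if the crawl covers plain web pages.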

