Nutch stores each page as a "document", so you can get the total number of documents
with an index tool such as Luke (see "number of documents"),
or you can get it in code, for example:
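IndexSearcher searcher = new IndexSearcher(dir); // dir: the Directory holding the Nutch index
int total = searcher.maxDoc(); // note: maxDoc() also counts deleted docs; IndexReader.numDocs() excludes them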
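Since the question below is really about a rough time per page, here is a minimal Java sketch of that arithmetic: divide the total elapsed crawl time by the document count. The class name CrawlRate, the method secondsPerPage and the sample numbers are hypothetical. If you time the whole crawl command, the elapsed time covers every stage (not just fetching), so treat the result only as a rough average.

```java
import java.util.Locale;

// Hypothetical helper: rough average crawl time per page.
public class CrawlRate {

    /** Average seconds per page, or NaN if no pages were indexed. */
    static double secondsPerPage(long elapsedMillis, int pageCount) {
        if (pageCount <= 0) {
            return Double.NaN; // avoid dividing by zero
        }
        return (elapsedMillis / 1000.0) / pageCount;
    }

    public static void main(String[] args) {
        long elapsedMillis = 2L * 60 * 60 * 1000; // hypothetical: a 2-hour crawl
        int pageCount = 7200;                      // hypothetical: the value from maxDoc()
        // Locale.ROOT keeps the decimal separator as '.' regardless of system locale
        System.out.println(String.format(Locale.ROOT, "%.2f s/page",
                secondsPerPage(elapsedMillis, pageCount)));
    }
}
```

With these sample numbers it prints 1.00 s/page.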

Hope this helps.

tiger
2011/01/31



2011/1/31 .: Abhishek :. <[email protected]>:
> Hi folks,
>
>  How do I get to know the number of pages Nutch has crawled?
>
>  I see from the tutorial below,
>
> http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
>
>  that the readdb gives the number of pages and urls. I am using Nutch 1.2
> and I am unable to get the number of pages crawled using the readdb command.
>
> I actually need to roughly calculate the time taken to crawl a single page,
> so the number of pages would be a great help.
>
> Thanks,
> Abhishek
>
