Thanks a bunch 黄淑明

2011/1/31 黄淑明 <[email protected]>

> Yes, if you only crawl web pages (not .pdf, .doc, etc.).
>
>
> 2011/1/31 .: Abhishek :. <[email protected]>:
> > Hi,
> >
> >  Thanks for the update. I tried using the Luke tool.
> >
> >  It shows the "Number of documents" as 40. So is this the number of pages?
> >
> >
> > Thanks,
> > Abhi
> >
> >
> > On Mon, Jan 31, 2011 at 1:01 PM, 黄淑明 <[email protected]> wrote:
> >
> >> Nutch refers to each page as a "document", so you can get the total number
> >> of documents with an index tool such as Luke ("Number of documents"),
> >> or you can get it in code, for example:
> >> import org.apache.lucene.search.IndexSearcher;
> >> IndexSearcher searcher = new IndexSearcher(dir);
> >> searcher.maxDoc(); // document count for the index, deleted documents included
> >>
> >> Hope this helps.
> >>
> >> tiger
> >> 2011/01/31
> >>
> >>
> >>
> >> 2011/1/31 .: Abhishek :. <[email protected]>:
> >> > Hi folks,
> >> >
> >> >  How do I get to know the number of pages Nutch has crawled?
> >> >
> >> >  I see from the tutorial below,
> >> >
> >> > http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
> >> >
> >> >  that readdb gives the number of pages and URLs. I am using Nutch 1.2
> >> > and I am unable to get the number of pages crawled using the readdb command.
> >> >
> >> > I actually need to roughly calculate the time taken to crawl a single page,
> >> > so the number of pages would be a great help.
> >> >
> >> > Thanks,
> >> > Abhishek
> >> >
> >>
> >
>

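For the per-page timing asked about at the start of the thread, a rough sketch in Java: divide the total crawl time by the document count. The 40 is the "Number of documents" figure Luke reported above; the two-minute crawl duration and the names CrawlTiming and averageMillisPerPage are made up for illustration, so substitute your own measured wall-clock time.

```java
import java.time.Duration;

class CrawlTiming {
    // Average wall-clock time per page, in milliseconds.
    // Both inputs are supplied by the caller; nothing here talks to Nutch or Lucene.
    public static double averageMillisPerPage(Duration totalCrawlTime, int pageCount) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive");
        }
        return (double) totalCrawlTime.toMillis() / pageCount;
    }

    public static void main(String[] args) {
        int pageCount = 40;                              // document count reported by Luke in this thread
        Duration totalCrawlTime = Duration.ofMinutes(2); // hypothetical value, replace with a measured one
        double perPage = averageMillisPerPage(totalCrawlTime, pageCount);
        System.out.println(perPage + " ms per page on average");
    }
}
```

This is only an average over the whole run, so it probably includes fetch delays and indexing time rather than the cost of any single page.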