Re: Number of pages crawled?

黄淑明 Sun, 30 Jan 2011 22:45:00 -0800

Yes, provided you crawled only web pages (not .pdf, .doc, and similar files).
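
For the rough time-per-page estimate you mentioned, the arithmetic is just the total crawl time divided by the document count. Below is a minimal sketch, not anything Nutch provides: the class name CrawlTiming and the method averageSecondsPerPage are made up for illustration, and the 600-second total is a placeholder you would replace with your own measured crawl duration. Only the count of 40 comes from this thread.

```java
public class CrawlTiming {

    /**
     * Average wall-clock seconds spent per page.
     * Both values are supplied by the caller; nothing is read from Nutch here.
     */
    static double averageSecondsPerPage(double totalCrawlSeconds, int pageCount) {
        if (pageCount <= 0) {
            throw new IllegalArgumentException("pageCount must be positive");
        }
        return totalCrawlSeconds / pageCount;
    }

    public static void main(String[] args) {
        // "Number of documents" reported by Luke in this thread.
        int pageCount = 40;
        // Hypothetical value: substitute the duration you measured for your crawl.
        double totalCrawlSeconds = 600.0;

        double average = averageSecondsPerPage(totalCrawlSeconds, pageCount);
        System.out.printf("%.2f seconds per page%n", average);
    }
}
```

Keep in mind this is only an average over the whole run; fetch delays and parallel fetcher threads mean an individual page may take more or less time.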


2011/1/31 .: Abhishek :. <[email protected]>:
> Hi,
>
>  Thanks for the update. I tried using the Luke tool.
>
>  It shows the "Number of documents" as 40. So is this the number of pages?
>
>
> Thanks,
> Abhi
>
>
> On Mon, Jan 31, 2011 at 1:01 PM, 黄淑明 <[email protected]> wrote:
>
>> Nutch calls each page a "document", so you can get the total number of
>> documents with an index tool such as Luke ("Number of documents"),
>> or you can get the count in code, for example:
>> IndexSearcher searcher = new IndexSearcher(dir);
>> // maxDoc() also counts deleted documents; IndexReader.numDocs() excludes them
>> searcher.maxDoc();
>>
>> Hope this helps.
>>
>> tiger
>> 2011/01/31
>>
>>
>>
>> 2011/1/31 .: Abhishek :. <[email protected]>:
>> > Hi folks,
>> >
>> >  How do I get to know the number of pages Nutch has crawled?
>> >
>> >  I see from the tutorial below,
>> >
>> > http://today.java.net/pub/a/today/2006/01/10/introduction-to-nutch-1.html
>> >
>> >  that the readdb gives the number of pages and urls. I am using Nutch 1.2
>> > and I am unable to get the number of pages crawled using the readdb command.
>> >
>> > I actually need to roughly calculate the time taken to crawl a single page,
>> > so the number of pages would be a great help.
>> >
>> > Thanks,
>> > Abhishek
>> >
>>
>
