Hi Stefan,
thanks for your reply.
I've tried a depth of 20 and it works better; it can crawl almost all of the pages. However, it still has not crawled all of them.
I'll try a bigger depth like 30 later...
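For reference, a re-crawl with a larger depth would look roughly like this (a sketch assuming the Nutch 0.7 crawl tool options; urls.txt is a placeholder for the seed list file, not my actual file name):

  bin/nutch crawl urls.txt -dir crawl-haha365 -depth 30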
Stefan Groschupf wrote:
Hi,
maybe you can try a much higher depth, something like 20?
However, in general check:
+ the regex URL filter file (see the sketch below).
+ the robots.txt of the site.
+ nofollow tags in the pages.
+ the number of outlinks to extract, in nutch-default.xml (see below).
Stefan
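For the filter file, a minimal crawl-urlfilter.txt that keeps the crawl inside the target site might look like this (a sketch; the host pattern is just filled in for kevin's example site):

  # skip file:, ftp:, and mailto: URLs
  -^(file|ftp|mailto):
  # accept everything under haha365.com
  +^http://([a-z0-9]*\.)*haha365.com/
  # skip everything else
  -.

And for the outlink limit, the setting I believe Stefan means is db.max.outlinks.per.page; overriding it in nutch-site.xml would look something like the following (the value 1000 is just an example, the stock default is much lower):

  <property>
    <name>db.max.outlinks.per.page</name>
    <value>1000</value>
    <description>The maximum number of outlinks to take from a page.</description>
  </property>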
On 06.07.2006, at 19:12, kevin pang wrote:
I set up Nutch to crawl the URL http://www.haha365.com/gd_joke/
but after the crawl completed, only 54 pages were fetched.
Here is the log info:
060705 154332 parsing file:/C:/cygwin/nutch-0.7.2/conf/nutch-default.xml
060705 154332 parsing file:/C:/cygwin/nutch-0.7.2/conf/crawl-tool.xml
060705 1