Re: why i can't crawl all the linked pages in the specified page to crawl.

2006-07-07 Thread kevin
Hi Stefan, thanks for your reply. I've tried a depth of 20 and it works better: it can crawl almost all the pages. However, it has not crawled all of them yet. I'll try a bigger depth like 30 later... Stefan Groschupf wrote: Hi, maybe you can try a much higher depth, something like 20? However in

Re: why i can't crawl all the linked pages in the specified page to crawl.

2006-07-06 Thread Tonal Communications (Stijn Amundsen)
tefan Groschupf" <[EMAIL PROTECTED]> To: Sent: Friday, July 07, 2006 1:59 AM Subject: Re: why i can't crawl all the linked pages in the specified page to crawl. > Hi, > maybe you can try a much higher depth, something like 20? > However in general check: > + t

Re: why i can't crawl all the linked pages in the specified page to crawl.

2006-07-06 Thread Honda-Search Administrator
: Thursday, July 06, 2006 10:59 PM Subject: Re: why i can't crawl all the linked pages in the specified page to crawl. Hi, maybe you can try a much higher depth, something like 20? However in general check: + the regex url filter file. + the robots.txt + nofollow tags in the pages +

Re: why i can't crawl all the linked pages in the specified page to crawl.

2006-07-06 Thread Stefan Groschupf
Hi, maybe you can try a much higher depth, something like 20? However in general check: + the regex url filter file. + the robots.txt + nofollow tags in the pages + the number of outlinks to extract in nutch-default.xml Stefan On 06.07.2006, at 19:12, kevin pang wrote: i set up the nutch to
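
To illustrate why a higher depth helps on a paginated site, here is a minimal Python sketch (not Nutch code). It assumes that each unit of depth corresponds to one fetch round that follows links found in the previous round, so depth 1 fetches only the seed. The site layout, the page names, and the helpers `build_paginated_site` and `crawl_reachable` are hypothetical and exist only for this example.

```python
def build_paginated_site(num_index_pages):
    """Build a toy link graph: each index page links to one joke page
    and, except for the last one, to the next index page."""
    links = {}
    for i in range(num_index_pages):
        targets = [f"joke{i}"]
        if i < num_index_pages - 1:
            targets.append(f"index{i + 1}")
        links[f"index{i}"] = targets
    return links


def crawl_reachable(links, seed, depth):
    """Return the set of pages fetched in `depth` rounds from `seed`.

    Round 1 fetches the seed; each later round fetches the not yet
    seen pages linked from the pages fetched in the previous round.
    """
    fetched = {seed}
    frontier = [seed]
    for _ in range(depth - 1):
        next_frontier = []
        for page in frontier:
            for target in links.get(page, []):
                if target not in fetched:
                    fetched.add(target)
                    next_frontier.append(target)
        if not next_frontier:
            break  # nothing new was discovered, so further rounds add nothing
        frontier = next_frontier
    return fetched


# A toy site with 10 index pages and 10 joke pages (20 pages in total).
site = build_paginated_site(10)

for d in (1, 3, 11, 20):
    print(d, len(crawl_reachable(site, "index0", d)))
```

In this toy layout the last joke page sits 11 rounds away from the seed, so any smaller depth misses pages no matter how the URL filters are configured. A real crawl can still miss pages at a large depth if a URL filter, robots.txt, nofollow, or an outlink limit drops links, which is why the checklist above is worth going through as well.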

why i can't crawl all the linked pages in the specified page to crawl.

2006-07-06 Thread kevin pang
I set up Nutch to crawl the URL http://www.haha365.com/gd_joke/ but after the crawl completed, only 54 pages were fetched. Here is the log info: 060705 154332 parsing file:/C:/cygwin/nutch-0.7.2/conf/nutch-default.xml 060705 154332 parsing file:/C:/cygwin/nutch-0.7.2/conf/crawl-tool.xml 060705 1