Try changing the value of this parameter in nutch-site.xml:

<property>
  <name>db.max.outlinks.per.page</name>
  <value>100</value>
  <description>The maximum number of outlinks that we'll process for a page.
  If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
  will be processed for a page; otherwise, all outlinks will be processed.
  </description>
</property>
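If the page has more outlinks than the cap allows, one option is to lift the cap entirely. Below is a minimal sketch of a nutch-site.xml override, assuming that processing every outlink is acceptable for your crawl. The value -1 is only an illustrative choice: according to the property description, any negative value means all outlinks are processed.

```xml
<?xml version="1.0"?>
<!-- nutch-site.xml: site-specific overrides of the Nutch defaults -->
<configuration>
  <property>
    <name>db.max.outlinks.per.page</name>
    <!-- Assumption: -1 chosen for illustration; any negative value
         disables the per-page outlink limit. -->
    <value>-1</value>
  </property>
</configuration>
```

Pages with very many links will then contribute all of them to the crawl db, so the crawl may grow considerably.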
Julien

On 31 January 2012 02:56, mina <[email protected]> wrote:

> I crawled a site with Nutch 1.4, but Nutch doesn't crawl all the links on
> this site. The language of this site is not English. For example, Nutch
> doesn't crawl this link:
>
> http://www.irna.ir/News/30786427/سوء-استفاده-از-نام-كمیته-امداد-برای-جمع-آوری-رای-در-مناطق-محروم/سياسي/
>
> How can I solve this problem? What configuration should I change?
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/error-in-crawl-all-link-in-no-English-language-sites-tp3702014p3702014.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com

