I set this property to -1, but Nutch still doesn't crawl those links. I also have a problem
with Arabic sites:
I can crawl an Arabic site like http://www.sahafa.com/
but I can't crawl another site like http://www.aljazeera.net/Portal/
Please help.
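For reference, setting this property to -1 in nutch-site.xml would look roughly like the fragment below. This is a minimal sketch: only this one property is shown, and it assumes the standard Hadoop-style `<configuration>` root element used by Nutch's configuration files. According to the property description quoted in the reply, any negative value means all outlinks of a page are processed.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- A negative value removes the per-page limit, so all outlinks are processed -->
  <property>
    <name>db.max.outlinks.per.page</name>
    <value>-1</value>
  </property>
</configuration>
```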

On 1/31/12, Julien Nioche-4 [via Lucene]
<[email protected]> wrote:
>
>
> Try changing the value of this parameter in nutch-site.xml
>
> <property>
>   <name>db.max.outlinks.per.page</name>
>   <value>100</value>
>   <description>The maximum number of outlinks that we'll process for a page.
>   If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
>   will be processed for a page; otherwise, all outlinks will be processed.
>   </description>
> </property>
>
>
> Julien
>
> On 31 January 2012 02:56, mina <[email protected]> wrote:
>
>> I crawled a site with Nutch 1.4, but Nutch doesn't crawl all the links on this
>> site. The site is not in English. For example, Nutch doesn't
>> crawl this link:
>>
>>
>> http://www.irna.ir/News/30786427/سوء-استفاده-از-نام-كمیته-امداد-برای-جمع-آوری-رای-در-مناطق-محروم/سياسي/
>>
>> How can I solve this problem? What configuration should I change?
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/error-in-crawl-all-link-in-no-English-language-sites-tp3702014p3702014.html
>> Sent from the Nutch - User mailing list archive at Nabble.com.
>>
>
>
>
> --
> Open Source Solutions for Text Engineering
>
> http://digitalpebble.blogspot.com/
> http://www.digitalpebble.com
>

