Hi,

Regarding politeness, 3 threads per queue is not really polite :)
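
To put a rough number on that, here is a minimal sketch of the per-host request rate under a deliberately simplified model: each thread fetches a page, waits for the delay, then fetches the next one. This is an assumption for illustration, not Nutch's actual scheduler, and the function name and the 0.5 s response time are made up. Also note that, as far as I recall, once fetcher.threads.per.queue is greater than 1 Nutch applies fetcher.server.min.delay rather than fetcher.server.delay, so check which value is actually in effect.

```python
def max_requests_per_second(threads_per_queue, delay_s, avg_response_s):
    """Rough upper bound on the request rate against a single host.

    Simplified model (an assumption, not Nutch's real scheduling logic):
    every thread repeats "fetch one page, then wait delay_s seconds".
    """
    cycle_s = avg_response_s + delay_s
    if cycle_s <= 0:
        raise ValueError("response time plus delay must be positive")
    return threads_per_queue / cycle_s


# Hypothetical average response time of 0.5 s per page.
polite = max_requests_per_second(1, 3.0, 0.5)      # one thread, 3 s delay
aggressive = max_requests_per_second(3, 0.0, 0.5)  # three threads, no delay
print(f"1 thread, 3 s delay:  {polite:.2f} requests/s per host")
print(f"3 threads, 0 s delay: {aggressive:.2f} requests/s per host")
```

Under this model, three threads with no delay hit a host roughly twenty times harder than one thread with a 3 second delay.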

Cheers

-----Original message-----
> From:jc <jvizu...@gmail.com>
> Sent: Fri 01-Mar-2013 15:08
> To: user@nutch.apache.org
> Subject: Re: a lot of threads spinwaiting
> 
> Hi Roland and lufeng,
> 
> Thank you very much for your replies. I have already tested lufeng's advice,
> with results pretty much as expected.
> 
> By the way, my Nutch installation is based on version 2.1, with HBase as the
> crawldb storage.
> 
> Roland, maybe the fetcher.server.delay param has something to do with that as
> well. I set it to 3 seconds; would setting it to 0 be impolite?
> 
> All the info you provided has helped me a lot. Only one issue remains unfixed:
> there are more than 60 URLs from different hosts in my seed file, but only 20
> queues. It might seem that the other 40 hosts have no more URLs to generate,
> but I haven't seen a single URL from those hosts since the crawldb was
> created.
> 
> Based on my limited experience, the following params should allow 60
> queues for my vertical crawl. Am I missing something?
> 
> topN = 1 million
> fetcher.threads.per.queue = 3
> fetcher.threads.per.host = 3 (just in case, I remember you told me to use
> per.queue instead)
> fetcher.threads.fetch = 200
> seed urls of different hosts = 60 or more (regex-urlfilter.txt allows only
> urls from these hosts, they're all there, I checked)
> crawldb record count > 1 million
> 
> Thanks again for all your help
> 
> Regards,
> JC
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/a-lot-of-threads-spinwaiting-tp4043801p4043988.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
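
On the queue count: assuming queues are grouped by host, the fetcher cannot have more queues than there are distinct hosts in the generated fetch list, whatever the thread settings are. A quick way to check is to dump the URLs of one generated batch and count them per host. The sketch below does that counting; the function name and the sample URLs are hypothetical placeholders, not output from a real crawl.

```python
from collections import Counter
from urllib.parse import urlsplit


def queue_sizes(urls):
    """Count URLs per host, mirroring the idea of one fetch queue per host."""
    hosts = (urlsplit(url).hostname for url in urls)
    # Skip entries without a parsable host (e.g. blank or malformed lines).
    return Counter(host for host in hosts if host)


# Hypothetical fetch list; replace with the URLs dumped from a real batch.
fetch_list = [
    "http://www.example.com/a.html",
    "http://www.example.com/b.html",
    "http://www.example.org/index.html",
]

sizes = queue_sizes(fetch_list)
print(f"{len(sizes)} distinct hosts (upper bound on queues)")
for host, count in sizes.most_common():
    print(f"{host}\t{count}")
```

If only 20 hosts show up here, the missing 40 are being dropped before the fetch step (for example at inject or generate time), not by the fetcher settings listed above.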
