RE: hung threads in big nutch crawl process

Markus Jelsma Mon, 03 Dec 2012 12:15:31 -0800
This page explains the individual steps:
http://wiki.apache.org/nutch/NutchTutorial#A3.2_Using_Individual_Commands_for_Whole-Web_Crawling
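
As a rough sketch of what that page walks through, one crawl cycle split into separate jobs might look like the script below. The crawl directory, seed directory, Solr URL and topN are assumptions taken from the one-shot command quoted further down; `run`, `crawl_cycle` and the `DRY_RUN` switch are illustrative names, not part of Nutch. By default it only prints the commands, so nothing is executed until you set `DRY_RUN=0`.

```shell
#!/bin/sh
# Sketch: one Nutch 1.x crawl cycle as separate jobs, so a hanging step is easy to spot.
# Assumed values (from the quoted one-shot command): crawl dir, seed dir, Solr URL, topN.
CRAWL_DIR=crawl
SEED_DIR=urls
SOLR_URL=http://localhost:8080/solr
TOPN=100000

# run (hypothetical helper): print the command when DRY_RUN=1 (the default),
# otherwise execute it and report the exit status and elapsed seconds.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
    return 0
  fi
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  echo "== exit $rc after $((end - start))s: $*"
  return "$rc"
}

# crawl_cycle (hypothetical helper):
# generate -> fetch -> parse -> updatedb -> invertlinks -> solrindex.
crawl_cycle() {
  run bin/nutch generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN "$TOPN" || return
  if [ "${DRY_RUN:-1}" = "1" ]; then
    segment="$CRAWL_DIR/segments/SEGMENT"   # placeholder; real names are timestamps
  else
    segment=$(ls -d "$CRAWL_DIR"/segments/* | tail -n 1)   # newest segment
  fi
  run bin/nutch fetch "$segment" || return
  # Separate parse step; skip it if your fetcher is configured to parse.
  run bin/nutch parse "$segment" || return
  run bin/nutch updatedb "$CRAWL_DIR/crawldb" "$segment" || return
  run bin/nutch invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments" || return
  run bin/nutch solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" -linkdb "$CRAWL_DIR/linkdb" "$segment"
}

# Inject the seed URLs once, then run one cycle (repeat the cycle for more depth).
run bin/nutch inject "$CRAWL_DIR/crawldb" "$SEED_DIR" && crawl_cycle
```

Run with `DRY_RUN=0` from the Nutch runtime directory, the per-step timing line should show which job is the slow or hanging one.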
 
 
-----Original message-----
> From:Eyeris Rodriguez Rueda <eru...@uci.cu>
> Sent: Mon 03-Dec-2012 21:08
> To: user@nutch.apache.org
> Subject: RE: hung threads in big nutch crawl process
> 
> Thanks, Markus, for your answer.
> I have always run Nutch from the console as one complete cycle:
> bin/nutch crawl urls -dir crawl -depth 10 -topN 100000 -solr http://localhost:8080/solr
> Could you explain how to run the jobs as separate processes? I read the wiki, 
> but it did not work for me because I don't understand the commands. I also want to 
> run Nutch in distributed mode; could you point me to good documentation for it?
> 
> _____________________________________________________________________
> Ing. Eyeris Rodriguez Rueda
> Telephone: 837-3370
> Universidad de las Ciencias Informáticas
> _____________________________________________________________________
> 
> -----Original message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io] 
> Sent: Monday, 03 December 2012 1:42 PM
> To: user@nutch.apache.org
> Subject: RE: hung threads in big nutch crawl process
> 
> Hi - Hadoop manages some threads of its own, but in Nutch the only job that uses 
> threads is the fetcher. Parsing is done using an executor service.
> 
> It is quite possible that some of your regexes are very complex and 
> Nutch takes a long time processing them, especially if you parse in the 
> fetcher job.
> 
> You should run the Nutch jobs separately to find out which job is giving you 
> trouble.
> 
> -----Original message-----
> > From:Eyeris Rodriguez Rueda <eru...@uci.cu>
> > Sent: Mon 03-Dec-2012 20:31
> > To: user@nutch.apache.org
> > Subject: hung threads in big nutch crawl process
> > 
> > Hi all.
> > I have noticed that in a big Nutch crawl (depth: 10, topN: 100,000) some 
> > threads hang at certain points of the crawl cycle, for example while 
> > normalizing by regex and also while fetching URLs.
> > I'm using Nutch 1.5.1 and Solr 3.6.
> > RAM: 2 GB
> > CPU: Core i3
> > OS: Ubuntu 12.04 (server)
> > 
> > My question: how does Nutch manage threads during a crawl cycle? 
> > Are the generate, fetch, and parse steps multithreaded? 
> > 
> > PS: Sorry for my English; it is not my native language.
> 
> 