This page explains the individual steps:
http://wiki.apache.org/nutch/NutchTutorial#A3.2_Using_Individual_Commands_for_Whole-Web_Crawling
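
As a rough sketch of what that page describes, the single "crawl" command can be split into one command per job, so that a hang can be pinned on a specific job. The subcommands below are the ones the Nutch 1.x tutorial uses; the variable names, the helper functions, the SEGMENT placeholder and the dry-run switch are illustrative assumptions, not part of Nutch. Check the usage output of bin/nutch for your version before relying on the exact arguments.

```shell
#!/bin/sh
# Sketch: run each Nutch 1.x job as its own command so that a hang can be
# attributed to one job. All defaults below are assumptions for illustration.
NUTCH="${NUTCH:-bin/nutch}"        # path to the nutch launcher
CRAWL_DIR="${CRAWL_DIR:-crawl}"    # same directory as "-dir crawl"
SEED_DIR="${SEED_DIR:-urls}"       # directory holding the seed URL list
TOPN="${TOPN:-100000}"             # same value as "-topN 100000"
ROUNDS="${ROUNDS:-1}"              # plays the role of "-depth"
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr}"
DRY_RUN="${DRY_RUN:-1}"            # 1 = only print the commands

# Print the command; execute it only when DRY_RUN is 0.
run_step() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

# Newest segment directory (segment names are timestamps), or a
# placeholder name when no segment exists yet, e.g. during a dry run.
latest_segment() {
    seg=$(ls -d "$CRAWL_DIR"/segments/* 2>/dev/null | tail -n 1)
    echo "${seg:-$CRAWL_DIR/segments/SEGMENT}"
}

# One generate/fetch/parse/updatedb round.
one_round() {
    run_step "$NUTCH" generate "$CRAWL_DIR/crawldb" "$CRAWL_DIR/segments" -topN "$TOPN"
    segment=$(latest_segment)
    run_step "$NUTCH" fetch "$segment"
    run_step "$NUTCH" parse "$segment"
    run_step "$NUTCH" updatedb "$CRAWL_DIR/crawldb" "$segment"
}

run_step "$NUTCH" inject "$CRAWL_DIR/crawldb" "$SEED_DIR"

i=1
while [ "$i" -le "$ROUNDS" ]; do
    one_round
    i=$((i + 1))
done

run_step "$NUTCH" invertlinks "$CRAWL_DIR/linkdb" -dir "$CRAWL_DIR/segments"
run_step "$NUTCH" solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" \
    -linkdb "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/*
```

By default the script only prints the commands it would run. Running it with DRY_RUN=0 executes them one by one, and the last "+ ..." line printed before a stall shows which job is stuck.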
-----Original message-----
> From:Eyeris Rodriguez Rueda <eru...@uci.cu>
> Sent: Mon 03-Dec-2012 21:08
> To: user@nutch.apache.org
> Subject: RE: hung threads in big nutch crawl process
>
> Thanks, Markus, for your answer.
> I have always used Nutch from the console, running a complete cycle:
> bin/nutch crawl urls -dir crawl -depth 10 -topN 100000 -solr http://localhost:8080/solr
> Could you explain how to run the steps as separate processes? I read the
> wiki, but it did not work for me because I don't understand the commands.
> I also want to use Nutch in distributed mode; could you point me to good
> documentation for it?
>
> _____________________________________________________________________
> Ing. Eyeris Rodriguez Rueda
> Phone: 837-3370
> Universidad de las Ciencias Informáticas
> _____________________________________________________________________
>
> -----Original message-----
> From: Markus Jelsma [mailto:markus.jel...@openindex.io]
> Sent: Monday, 03 December 2012 1:42 PM
> To: user@nutch.apache.org
> Subject: RE: hung threads in big nutch crawl process
>
> Hi - Hadoop organizes some threads, but in Nutch the only job that uses
> threads is the fetcher. Parsing is done using an executor service.
>
> It is quite possible that some of your regexes are very complex, and Nutch
> can take a long time processing them, especially if you parse in the
> fetcher job.
>
> You should run the Nutch jobs separately to find out which job is giving
> you trouble.
>
> -----Original message-----
> > From:Eyeris Rodriguez Rueda <eru...@uci.cu>
> > Sent: Mon 03-Dec-2012 20:31
> > To: user@nutch.apache.org
> > Subject: hung threads in big nutch crawl process
> >
> > Hi all.
> > I have noticed that in a big Nutch crawl (depth: 10, topN: 100,000) some
> > threads hang in certain parts of the crawl cycle, for example while
> > normalizing URLs by regex and also while fetching URLs.
> > I'm using Nutch 1.5.1 and Solr 3.6.
> > RAM: 2 GB
> > CPU: Core i3
> > OS: Ubuntu 12.04 (server)
> >
> > I have a question: how does Nutch manage threads during a crawl cycle?
> > Are the generate, fetch, and parse steps multithreaded?
> >
> > P.S. Sorry for my English; it is not my native language.
>